United States
Environmental Protection
Agency
Office of Air Quality
Planning and Standards
Research Triangle Park NC 27711
EPA-450/4-87-010
May 1987
Air
Protocol For
Applying And
Validating  The
CMB  Model

-------
                                           EPA-450/4-87-010
PROTOCOL FOR APPLYING AND
 VALIDATING THE CMB MODEL
                       By

             Air Management Technology Branch
             Monitoring And Data Analysis Division
                      And
                 Desert Research Institute
                  Reno, Nevada 89506
           U.S. ENVIRONMENTAL PROTECTION AGENCY
                Office Of Air And Radiation
           Office Of Air Quality Planning And Standards
              Research Triangle Park NC 27711
                     May 1987

-------
This report has been reviewed by the Office Of Air Quality Planning And Standards, U.S. Environmental
Protection Agency, and approved for publication as received from the contractor. Approval does not signify that
the contents necessarily reflect the views and policies of the Agency, neither does the mention of trade names
or commercial products constitute endorsement or recommendation for use.

-------
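                            TABLE OF CONTENTS

                                                                    Page

List of Figures ...................................................     v
List of Tables ....................................................    vi
Executive Summary .................................................   vii
1.0   Introduction ................................................     1
2.0   Planning the CMB Analysis ...................................     3
2.1   Other Useful Receptor Methods ...............................     3
2.2   Levels of CMB Analysis ......................................     3
3.0   Applications and Validation Protocol ........................     5
3.1   Determining Applicability of the CMB ........................     6
3.2   CMB Model Setup and Initial Run .............................     7
      3.2.1  Ambient Species and Their Uncertainty ................     7
      3.2.2  Source Identification - The Fitting Sources ..........     9
      3.2.3  Source Profile Representativeness and Uncertainty ....    10
      3.2.4  Initial Run of the CMB Model .........................    11
3.3   Outputs, Statistics and Diagnostics - Definitions
           and Interpretation .....................................    11
      3.3.1  Percent Total Mass Accounted For .....................    12
      3.3.2  R-Square and Chi-Square - Goodness of Fit ............    12
      3.3.3  t-Statistic ..........................................    16
      3.3.4  Uncertainty/Similarity Clusters ......................    16
      3.3.5  Ratios and Residuals of Fitting Species ..............    17
      3.3.6  Source Species Contributions .........................    18
3.4   Deviations from Model Assumptions ...........................    18
3.5   Identifying and Correcting Problems
           by Changing the Model Inputs ...........................    19
      3.5.1  Correcting the Ambient Data - Gross Errors ...........    19
      3.5.2  Correcting Source Profiles - Gross Errors ............    20
      3.5.3  Changing the Source List .............................    20
      3.5.3.1  Missing Sources ....................................    20
      3.5.3.2  Noncontributing Sources ............................    21
      3.5.4  Improving Source Profiles - Uncertainty/Similarity ...    22
      3.5.5  Problem Identification and Correction Strategy .......    23
3.6   Consistency/Stability of the Model Results ..................    25
      3.6.1  Source Profile Sensitivity ...........................    25
      3.6.2  Receptor Concentration Sensitivity ...................    26
      3.6.3  Fitting Species Sensitivity ..........................    26
3.7   Evaluating Results of the CMB Analyses ......................    26
4.0   Acknowledgments .............................................    27
5.0   References ..................................................    28
Appendix A - Determining Source Profiles and Their Uncertainties ..   A-1
Appendix B - Examples and Quick Guide for Identifying Problems
                  and Corrective Actions ..........................   B-1
Appendix C - CMB Model Assumptions ................................   C-1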

-------
                               LIST OF TABLES

Number                                                           Page

  A       Summary of Model Displays (Outputs, Statistics,
          Diagnostics and Status) ..............................  xii

 2.1      Useful Receptor Methods ..............................    4

 3.1      Summary of Model Displays (Outputs, Statistics,
          Diagnostics and Status) ..............................   13

-------
                              LIST OF FIGURES

Number                                                           Page

  A        CMB Application and Validation Flowchart .............   ix

  B        Flowchart for Problem Identification and Correction ..    x

  1        Flowchart for Problem Identification and Correction ..   24

-------
                            EXECUTIVE SUMMARY
     The objectives of this applications and validation protocol  are to
provide technical  guidance on:

   o  The selection of source categories to be apportioned by the model
and the selection  of model input data.

   o  Methods to determine the validity and uncertainty of a specific
model application.

   o  Methods to improve the quality and reduce the uncertainty of a
specific model  application.

     The objective of a CMB application is to use information about the
chemical composition of sources in an airshed (source profiles) along
with data on the chemical composition of the ambient air to estimate the
source contributions which would best "explain" the chemical properties
of measured ambient data (species).  These source profiles and ambient
species data are the primary model inputs.  Effective variance weighted
least squares, which considers the uncertainties in both source and
ambient data, is used to derive the source contribution estimates.

     The CMB consists of a set of linear equations which express the
ambient concentrations of chemical species as the sum of products of
source compositions and source contributions.  These equations are a
direct consequence of the law of conservation of mass.  This set  of
equations is overdetermined because the number of measured chemical
species generally exceeds the number of contributing source types.  The
source contributions are generally the unknowns in these equations.
However, a unique solution cannot be found for this set of equations
because measurement uncertainty precludes determination of exact  values
for source and receptor data.  When these uncertainties are estimated for
both source and receptor measurements,  an additional physical constraint
is applied which will yield a unique solution.  This constraint is the
minimization of the difference between calculated and measured receptor
concentrations  by using a weighting scheme, effective variance.  The
weighting has a physical significance in that it is derived from  the
measurement uncertainties of both source and receptor chemical species.
Though the CMB solution is identical to some statistical inference methods,
it is not entirely dependent on statistical principles.  The basic model
equations which represent the source receptor relationship, the effective
variance weighting, and the error propagation are all based on physical
principles which are then incorporated into a statistical estimation
procedure.
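
     The relationships described above can be written compactly.  The
notation below is an assumption made for illustration, since this summary
does not define symbols: C_i is the ambient concentration of species i,
F_ij is the fraction of species i in the emissions of source j, S_j is the
contribution of source j, and the sigma terms are one-standard-deviation
measurement uncertainties.  The form of the effective variance shown is the
one commonly cited for the CMB and should be checked against the User's
Manual before use.

```latex
% Mass balance: one equation per measured species (I species, J sources)
C_i = \sum_{j=1}^{J} F_{ij}\, S_j , \qquad i = 1, \ldots, I, \quad I > J

% Effective variance weighted least squares
\min_{S_1, \ldots, S_J} \; \sum_{i=1}^{I}
    \frac{\bigl( C_i - \sum_{j=1}^{J} F_{ij} S_j \bigr)^2}{V_{e,i}} ,
\qquad
V_{e,i} = \sigma_{C_i}^2 + \sum_{j=1}^{J} \sigma_{F_{ij}}^2\, S_j^2
```

     Because the weights V_e,i depend on the unknown contributions S_j,
the minimization is normally carried out iteratively, starting from an
initial guess for the S_j.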

     The guidance provided by this document consists of a seven-step
process which is described in Sections 3.1 through 3.7 of this protocol.
This guidance is necessary because there is no single goodness-of-fit
parameter which can assess the model's reliability.  Application of the
CMB is ideally preceded by other receptor methods or dispersion models.
This is described in Section 2.0.  Also, the CMB results must be compared
with other receptor and dispersion model results and any differences must
be reconciled.  Finally, a control strategy must be developed.  The model
reconciliation and control strategy development processes are described
in other protocols.  The CMB must have a strong physical basis in order
for its results to be meaningful.  The seven steps listed below are
summarized in Figures A and B of this summary.

(1) assessing the general  applicability of the CMB model to the situation
    under study;

(2) configuring the model  with appropriate sources, source profiles, and
    chemical  species concentrations at receptor sites;

(3) examining model statistics and diagnostics;

(4) determining agreement  with model assumptions;

(5) identifying problems,  changing the model configuration and  rerunning;

(6) testing the consistency and stability of model results; and

(7) evaluating the validity of model results.

     The applicability of the CMB  model must be determined for  each
application.   This determination concerns such issues as sample quantity,
quality and appropriateness for chemical  analysis.  Also, the model can
only estimate the contributions (to ambient mass concentration) of source
categories which are chemically distinguishable (different) from each
other.  Their chemical characteristics must be obtainable.

     The model  provides three primary outputs.  These are:  (1) contribution
estimates (to ambient concentrations) of the sources or source  categories
which are included in the fit (SCE); (2)  the standard errors of these
source contribution estimates (STDERR); and (3) the species concentrations
calculated from the fit (CALC).  There are three statistics which can be
used to evaluate how well  the model's calculated species concentrations
"fit" or match the ambient measurements for these species.  These are:
(1) the percent of total mass explained by the fit (%MASS); (2) R-SQUARE;
and (3) CHI-SQUARE.  Generally, it is desirable to obtain a good "fit" of
the data and obtain SCE's which have STDERR that are low relative to the
size of the SCE.
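
     As an illustration of the three fit statistics, the following Python
sketch computes them for one sample.  The formulas are the definitions
commonly cited for the CMB and are assumptions here (this summary names the
statistics but does not define them); the function name, argument names and
all numbers are hypothetical.

```python
import numpy as np

def fit_statistics(meas, calc, eff_var, sce, total_mass):
    """Return (%MASS, R-SQUARE, CHI-SQUARE) for one fit.

    meas       : measured concentrations of the fitting species
    calc       : concentrations calculated from the fit (CALC)
    eff_var    : effective variance of each fitting species (assumed given)
    sce        : source contribution estimates (SCE)
    total_mass : measured total mass concentration
    """
    meas = np.asarray(meas, dtype=float)
    calc = np.asarray(calc, dtype=float)
    eff_var = np.asarray(eff_var, dtype=float)
    sce = np.asarray(sce, dtype=float)

    # Degrees of freedom: fitting species minus fitting sources.
    dof = meas.size - sce.size
    # Weighted sum of squared residuals between measured and calculated values.
    weighted_ss = np.sum((meas - calc) ** 2 / eff_var)

    pct_mass = 100.0 * sce.sum() / total_mass
    r_square = 1.0 - weighted_ss / np.sum(meas ** 2 / eff_var)
    chi_square = weighted_ss / dof
    return pct_mass, r_square, chi_square

# Hypothetical values, for illustration only.
meas = [3.0, 2.6, 1.5, 1.0]
calc = [3.1, 2.5, 1.5, 1.1]
eff_var = [0.01, 0.01, 0.01, 0.01]
pct_mass, r_square, chi_square = fit_statistics(
    meas, calc, eff_var, sce=[10.0, 5.0], total_mass=16.0
)
print(pct_mass, r_square, chi_square)
```

     In this sketch a good fit shows as a %MASS near 100, an R-SQUARE near
1 and a small CHI-SQUARE; the target ranges themselves are a matter for
Section 3.3.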

     Four diagnostics are available to help identify the data which are
responsible for a poor fit so that improved data might be obtained or
included to improve the fit.  These are:   (1) the uncertainty/similarity
clusters (U/S Clusters);  (2) the ratio of calculated to measured species
concentrations (RATIO C/M); (3) the ratio of the residual (calculated-measured)
to the uncertainty of this difference  (RATIO R/U); and  (4) the portion of
a  calculated species concentration that is attributed by the model to
each source (SSCONT).  Table A summarizes the information available on the
model displays.
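
     Three of the four diagnostics are simple ratios and can be sketched in
a few lines of Python.  This is an illustration under stated assumptions,
not the model's own code: the uncertainty of the calculated concentration
is taken as a given input, the two uncertainties are assumed independent,
and the function name and all numbers are hypothetical.  The U/S clusters
are not sketched because this summary does not describe how they are
formed.

```python
import numpy as np

def species_diagnostics(meas, sigma_meas, F, sce, sigma_calc):
    """Return (RATIO C/M, RATIO R/U, SSCONT) for the fitting species.

    meas, sigma_meas : measured concentrations and their uncertainties
    F                : source profiles, shape (species, sources)
    sce              : source contribution estimates, one per source
    sigma_calc       : uncertainty of each calculated concentration
    """
    meas = np.asarray(meas, dtype=float)
    sigma_meas = np.asarray(sigma_meas, dtype=float)
    sigma_calc = np.asarray(sigma_calc, dtype=float)
    F = np.asarray(F, dtype=float)
    sce = np.asarray(sce, dtype=float)

    contrib = F * sce              # contribution of each source to each species
    calc = contrib.sum(axis=1)     # calculated species concentrations (CALC)

    ratio_cm = calc / meas
    # Residual divided by the uncertainty of the difference (independent errors).
    ratio_ru = (calc - meas) / np.sqrt(sigma_calc ** 2 + sigma_meas ** 2)
    # Portion of each calculated concentration attributed to each source.
    sscont = contrib / calc[:, None]
    return ratio_cm, ratio_ru, sscont

# Hypothetical two-source, four-species case.
F = [[0.30, 0.02], [0.05, 0.40], [0.10, 0.10], [0.01, 0.20]]
meas = [3.0, 2.6, 1.5, 1.0]
sigma = [0.1, 0.1, 0.1, 0.1]
ratio_cm, ratio_ru, sscont = species_diagnostics(meas, sigma, F, [10.0, 5.0], sigma)
print(ratio_cm, ratio_ru, sscont, sep="\n")
```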

-------
Figure A.  CMB Application and Validation Flowchart

                    Preliminary Analyses
                       (Section 2.0)
                             |
                 Determine CMB Applicability
                       (Section 3.1)
                             |
          +-->      Set Up and Run Model
          |    (Section 3.2 and User's Manual)
          |                  |
          |         Examine Diagnostics
          |            (Section 3.3)
          |                  |
          |       Check Model Assumptions
          |            (Section 3.4)
          |                  |
          +---       Identify Changes
                       (Section 3.5)
                             |
        Evaluate Consistency and Stability of Results
                       (Section 3.6)
                             |
                  Establish Model Validity
                       (Section 3.7)
                             |
             Reconcile With Other Model Results
                (see Reconciliation Protocol)
                             |
                  Develop Control Strategy
                    (see SIP Guideline)

(Expanded flowchart in Figure B.)

-------
Figure B.  Flowchart for Problem Identification and Correction

-------
     There are four main categories of situations that can be addressed
to improve model  performance.   These are:   (1)  incorrect ambient data,
(2) incorrect source profiles, (3)  incorrect source list, and (4) profile
uncertainty/collinearity.  These situations are inferred from the
diagnostics and statistics listed above.  The indicators and appropriate
corrective action for each situation are suggested in Section 3.5.  After
corrective action has been taken, the model's fit of the measured species
data is reevaluated.  If the fit is acceptable  for that particular use,
the stability of  the model results  should  be assessed.  This includes
evaluating the sensitivity of  the model's  results to errors in  the sources,
source profiles  and the ambient data.  Finally,  the model results can be
declared valid for that application.
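
     One way to picture a stability check on the ambient data is to
perturb the receptor concentrations within their stated uncertainties and
refit many times.  The Python sketch below does this under loudly stated
assumptions: random perturbation is a swapped-in illustration and is not
necessarily the procedure of Section 3.6, the fit is an ordinary weighted
least squares using ambient uncertainties only (not the effective variance
solution), and the function name and all numbers are hypothetical.

```python
import numpy as np

def receptor_sensitivity(C, sigma_C, F, n_trials=500, seed=0):
    """Refit after random perturbation of the ambient concentrations.

    Returns the unperturbed contributions, and the mean and standard
    deviation of the contributions over the perturbed trials.
    """
    C = np.asarray(C, dtype=float)
    sigma_C = np.asarray(sigma_C, dtype=float)
    F = np.asarray(F, dtype=float)
    rng = np.random.default_rng(seed)
    w = 1.0 / sigma_C  # weights from ambient uncertainty only (simplification)

    def solve(c):
        # Weighted least squares: scale each species equation by its weight.
        sol, *_ = np.linalg.lstsq(F * w[:, None], c * w, rcond=None)
        return sol

    base = solve(C)
    trials = np.array(
        [solve(C + rng.normal(0.0, sigma_C)) for _ in range(n_trials)]
    )
    return base, trials.mean(axis=0), trials.std(axis=0, ddof=1)

# Hypothetical two-source, four-species case with 10 percent uncertainties.
F = np.array([[0.30, 0.02], [0.05, 0.40], [0.10, 0.10], [0.01, 0.20]])
C = F @ np.array([10.0, 5.0])
base, mean, spread = receptor_sensitivity(C, 0.1 * C, F)
print(base, mean, spread)
```

     A spread that is large relative to the corresponding contribution
would suggest that the result is sensitive to errors in the ambient data.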

     If (a) the  CMB model is determined to be applicable (Section 3.1),
(b) the summary  statistics and diagnostics are  generally within target
ranges (Section 3.3), (c) there are no significant deviations from model
assumptions (Section 3.4), and (d) the sensitivity tests in Section 3.6
uncovered no unacceptable instability or consistency problems,  the CMB
analysis is considered valid.

-------

Table A.  Summary of Model Displays (Outputs, Statistics, Diagnostics
          and Status)

-------
         PROTOCOL FOR APPLYING AND VALIDATING THE CMB MODEL

1.0  INTRODUCTION

     The Chemical Mass Balance receptor model (CMB) and Dispersion Models
(DM) are two primary methods used to provide the source apportionment
which forms the basis of control strategy development for PM10.  Other
methods are allowed (US EPA 1987C).  However, this document deals only with
the application and validation of the CMB model.

    The CMB model recommended by  the EPA is a refinement of a model
originally developed by Watson (1979) at the Oregon Graduate Center.
This model has undergone five major revisions, with major contributions
by Patrick Hanrahan and John Core of Oregon's Department of Environmental
Quality; Hugh J. Williamson and Dennis A. DuBose of Radian Corporation;
Luke Wijnberg of PEI Associates;  John Watson, Norman Robinson, and Judith
Chow of the Desert Research Institute; and Thompson G. Pace of the U.S.
EPA.  Version 6 of the CMB model, on which this protocol is based, was
prepared and tested under a cooperative agreement between EPA's Office of
Air Quality Planning and Standards and the Desert Research Institute of
the University of Nevada.  The software and protocols have been extensively
tested by research, regulatory, and consultant users who have been identified
in the acknowledgments.  The Version 6 software is described in a User's
Manual (U.S. EPA, 1987B).

     The fundamental CMB model equations have been subjected to verification
and evaluation using both real and simulated data as a part of the Quail
Roost II Conference (Stevens and  Pace, 1984).  Additional verification and
evaluation efforts have been undertaken by several investigators (Watson
et al., 1984; DeCesar and Cooper, 1982; Dzubay et al., 1984; Gerlach et
al., 1982; Currie et al., 1984; Watson and Robinson, 1984; Javitz and
Watson, 1986; Watson and Chow, 1986; Henry and Kim, 1986).  These efforts
provide a basis for CMB applications in regulatory assessments of PM10
levels.

     The CMB consists of a set of linear equations which express the
ambient concentrations of chemical species as the sum of products of
source compositions and source contributions.  These equations are a
direct consequence of the law of  conservation of mass.  This set of
equations is overdetermined (more equations than unknowns) because the
number of measured chemical species generally exceeds the number of
contributing source types.  The source contributions are generally the
unknowns in these equations.  However, a unique solution cannot be found
for this set of equations because measurement uncertainty precludes
determination of exact values for source and receptor data.  When these
uncertainties are estimated for both source and receptor measurements,
additional physical constraints can be applied which will yield a unique
solution.  This solution minimizes the difference between calculated and
measured receptor concentrations  by using a weighting scheme, effective
variance.  The weighting has a physical significance in that it is derived
from the measurement uncertainties of both source and receptor chemical
species.  Though the CMB solution is identical to some statistical inference
methods, it is not dependent on statistical principles.  The basic model
equations which represent the source receptor relationship, the effective
variance weighting, and the error propagation are all based on physical
principles.

     The CMB provides a source contribution  estimate (SCE)  and associated
uncertainty (STDERR) for each source category that is chosen beforehand
for consideration with the model.  The model  produces this  estimate by
making an effective variance weighted least  squares "fit" between the
chemical composition of the ambient sample and  the composition of the
sources.  It estimates what amounts of each  source (the  SCE's) will
collectively best "explain" the chemical  composition of  the  ambient
sample.
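
     The fit described above can be sketched in Python.  This is a
minimal illustration and not the Version 6 software: it assumes the
commonly cited form of the effective variance (the ambient variance plus
the profile variances scaled by the squared contributions), iterates from
zero contributions, and does not guard against negative estimates.  The
function name, argument names and the numbers are hypothetical.

```python
import numpy as np

def effective_variance_fit(C, sigma_C, F, sigma_F, max_iter=50, tol=1e-8):
    """Iterative effective variance weighted least squares (sketch).

    C, sigma_C : ambient species concentrations and uncertainties, shape (I,)
    F, sigma_F : source profiles and uncertainties, shape (I, J)
    Returns (SCE, STDERR, CALC).
    """
    C = np.asarray(C, dtype=float)
    sigma_C = np.asarray(sigma_C, dtype=float)
    F = np.asarray(F, dtype=float)
    sigma_F = np.asarray(sigma_F, dtype=float)

    S = np.zeros(F.shape[1])  # initial guess: no source contributes
    for _ in range(max_iter):
        # Effective variance of each species for the current contributions.
        V = sigma_C ** 2 + (sigma_F ** 2) @ (S ** 2)
        cov = np.linalg.inv(F.T @ (F / V[:, None]))
        S_new = cov @ (F.T @ (C / V))
        converged = np.allclose(S_new, S, rtol=tol, atol=0.0)
        S = S_new
        if converged:
            break

    # Recompute the covariance at the final contributions for the errors.
    V = sigma_C ** 2 + (sigma_F ** 2) @ (S ** 2)
    cov = np.linalg.inv(F.T @ (F / V[:, None]))
    return S, np.sqrt(np.diag(cov)), F @ S

# Hypothetical two-source, four-species case built from known contributions.
F = np.array([[0.30, 0.02], [0.05, 0.40], [0.10, 0.10], [0.01, 0.20]])
C = F @ np.array([10.0, 5.0])
sce, stderr, calc = effective_variance_fit(C, 0.1 * C, F, 0.1 * F)
print(sce, stderr, calc)
```

     Because the ambient data in this example were generated exactly from
the profiles, the fit recovers the contributions used to build them; with
real measurements the SCE and STDERR depend on the uncertainties supplied.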

     The objectives of this applications  and  validation  protocol  are to
provide technical guidance on:

   o  The selection of source categories  to  be  apportioned  by the model
and the selection of model input data.

   o  Methods to determine the validity and uncertainty of a specific
model  application.

   o   Methods to improve the quality and reduce the uncertainty of a
specific model application.

     This guidance consists of a seven-step  procedure.  This procedure is
necessary because there is no single parameter  or indicator  which can assess
the model's validity.  The seven steps are presented below  and discussed
further in Section 3.0.

     Section 1.0 serves as an introduction to this protocol  and summarizes
the steps which constitute the protocol.   Section 2.0 discusses useful  pre-
liminary analyses.  Section 3.0 presents  detailed guidance  associated with
each of the seven steps specified above.   Methods for assessing applicability
of CMB to the problems at hand are discussed  in Section  3.1.  Initial choice
of source categories and fitting species  for  use in CMB  are  addressed in
Section 3.2.  Section 3.3 describes the diagnostics and  statistics  which
measure the performance of the model and  identify possible  deviations from
model  assumptions.  Section 3.4 discusses the problems with  model assumptions
which are often encountered in an individual  application.

     Section 3.5 describes the rationale for identifying problems and
making changes to the model inputs if statistics and diagnostics indicate
that the model assumptions are not being adhered to.  These problems include:
incorrect ambient data; incorrect source profiles; incorrect source
identification; and uncertainty in source profiles and collinearity (similarity
of source profiles).  Section 3.6 describes methods to test the consistency
and stability of the model.  Section 3.7 discusses the need to examine the
model results with respect to the preliminary analyses and the model
statistics to determine its validity.   Appendix B includes examples and a
"quick guide" for changing the model inputs.

     Discussions of technical  issues concerning the use of the DM and CMB
models are found in the Guideline for Air Quality Models (Revised)
(U.S. EPA, 1986A) and the Chemical  Mass Balance User's Manual  (U.S. EPA,
1987B).  Guidance on the use of models in SIPs is discussed in U.S. EPA
1987C and guidance for reconciling  differences in DM and CMB results  is
found in U.S. EPA 1987D.

2.0  PLANNING THE CMB ANALYSIS

     2.1  Other Useful Receptor Methods

     A CMB application is often preceded by a preliminary examination of
existing data.  This preliminary analysis is  intended to identify the
source types and chemical species to be included in the model.  This  in
turn is used to support the physical basis for the sources included in the
fit.  It also provides the basis for determining the applicability of the
CMB model to the situation under study and to the available data.  Table
2.1 lists the receptor analysis methods which are the most useful.  Further
information on these methods can be found in  U.S. EPA 1987C, and technical
information can be found in the appropriate references listed  in Table 2.1.

     Another necessary activity which  precedes the CMB is an estimate of
PM10 background concentrations using the procedures in Appendix D of the
PM10 SIP Development Guideline (U.S. EPA, 1987C).

     2.2  Levels of CMB Analysis

     After the CMB has been determined to be  appropriate for a given
situation, it can be applied with three sequential  levels of complexity
(U.S. EPA 1984A), each level being  more costly, but supplying  more accurate
and precise information than the previous level.  The levels are useful as
a shorthand notation of the general level of comprehensiveness of a CMB
study but have no regulatory significance.  A given level may  not provide
valid results because of data limitations.  In such cases, the next higher
level may need to be undertaken to  complete the CMB.

     The basic level of CMB application (Level I) uses existing data  or
data that can be readily obtained from analyses of existing samples (Gordon
et al., 1984).  This effort confirms the selection of contributing sources
from the preliminary analysis and eliminates minor contributors from
further scrutiny.  If the sources contributing to the high concentrations
of PM10 are apparent and sufficiently certain, no further work will be
needed.  Otherwise, this effort serves to reduce the areas to  be studied in
greater detail under an intermediate (Level  II) analysis.

-------
                               TABLE 2.1
                        USEFUL RECEPTOR METHODS

Analyses                  Most Useful W/Suspected Sources        References

Chem Mass Bal (CMB)       Source categories                      U.S. EPA 1987B
Factor Analysis (FA)      Sources unknown                        U.S. EPA 1985B
Optical Microscopy (OM)   Biologicals, aged vs. new dust,        U.S. EPA 1983A
                          fugitive process emissions
Scanning Electron         Inorganic (non-carbon) source(s)       U.S. EPA 1983A
 Microscopy (SEM)
X-Ray Diffraction (XRD)   Crystalline sources (e.g., fugitive    U.S. EPA 1983A
                          dust)
Trajectory Anal. (TA)     Regional sources                       Chow 1985
Emission Inventory        Any local source                       U.S. EPA 1986A, 1987C
Carbon Analysis           Combustion or biological source        U.S. EPA 1985A
Sulfate Analysis          Regional sources                       U.S. EPA 1981C,D
Temporal Analyses         Seasonal or intermittent source        U.S. EPA 1978, 1984A
Spatial Analyses          Single source, line source             U.S. EPA 1978, 1984A,
                                                                 1986A
Activity Patterns         Seasonal or intermittent source        U.S. EPA 1984A
Episode Days              Regional sources, exceptional events   U.S. EPA 1984A, 1986
Source Profiles           Any local source                       U.S. EPA 1985C
Background Analysis       Local vs. regional components          U.S. EPA 1987C
Meteorological Data       Supports DM, Spatial Analyses, FA      U.S. EPA 1984A, 1986A
Tracers (Simple CMB)      See the source/tracer list below

                          Source                  Tracer   Size

                          Auto                    Pb       F
                          Fuel oil                V        F
                          Aluminum                F        F or C
                          Burning biomass         K        F or C
                          Paint pigments          Ti       C
                          Galvanizing,            Zn       F
                           refuse incin.
                          Marine aerosol          Na, Cl   C
                          Fertilizer              P        C
                          Resid. oil              Ni, V    F







-------
     The intermediate (or Level  II) analysis involves additional chemical
analyses on existing samples or  the acquisition of additional  samples
from existing sampling sites.  It is intended to fill the gaps in model
input data which may have been discovered in Level I so as to reduce
uncertainty in results of the Level I source apportionment.  A comprehensive
CMB analysis (Level  III) involves the acquisition of new data  from new
source and ambient sampling activities.

     The CMB applications and validation protocol described here is
appropriate to all three levels of PM10 assessment.  It provides estimates
of precision and validity which serve to define the measurement requirements
for the next level of analysis.  These estimates can also be used to
determine whether or not the model results at a given level of PM10
assessment are certain enough to eliminate the need for more extensive
assessment.

3.0  APPLICATIONS AND VALIDATION PROTOCOL

     It is important that each application of the CMB be validated.   The
steps in a CMB analysis which will  increase the assurance that the results
are valid are:

(1) determine the general applicability of the CMB model to the application
    at hand;

(2) set up the model by identifying and assembling the source types, source
    profiles, and receptor concentrations needed for model  input.  Make a
    preliminary application of the model  to these data;

(3) examine the model's statistics and diagnostics to identify potential
    deviations from the model assumptions;

(4) evaluate problems which might result from model input data or from
    deviations from model assumptions;

(5) make any model input changes which can be justified to resolve the
    identified problems—rerun the model;

(6) assess the stability of the model  results and their consistency with
    the preliminary analyses; and

(7) evaluate the model results by comparing them with other receptor or
    dispersion model results and reconcile any differences (U.S. EPA 1987D).

Successful execution of the CMB model  using these seven steps will increase
assurance that the CMB model is valid  for the particular application and
for individual cases within that application.

-------
     3.1  Determining Applicability of the CMB

     The applicability of the CMB model  must  be determined  for use with
each specific ambient and source configuration.  At  this  point, applicability
is qualitatively determined.   After the model  is run,  quantitative measures
are available, as discussed in Section 3.4.   The following  conditions  must
be met for the CMB to be applicable:

(1) a sufficient number of PM10 receptor samples have been taken with an
    accepted sampling method to evaluate compliance with the annual and
    24-hour standards (U.S. EPA 1987C) or to fulfill other objectives of the
    study;

(2) samples are amenable to or have been analyzed for a variety of chemical
    species.  Minimal analyses include concentrations of Al, Si, S, Cl, K, Ca,
    Ti, V, Mn, Fe, Ni, Cu, Zn, Br, and Pb.  Preferable additional analyses
    would include other elements such  as Cr,  As, and Se;  elements; cations;
    anions; and elemental and organic  carbon.   Analytical methods  should be
    chosen such that the concentrations of the majority of  these species
    will exceed the detection limit and the  variability of  the filter
    blank.  The uncertainties of these concentrations  must  also be available
    or estimable.  Sampling and analysis procedures  are discussed  in
    greater detail by Gordon  et al. (1984) and U.S.  EPA (1981B);

(3) the potential source contributors  can be  identified and grouped  into
    source categories of distinct chemical compositions with respect to the
    receptor species available from requirement 2.   The degree of  difference
    in source compositions necessary for successful  CMB model  application is
    discussed in Section 3.3.6;

(4) compositions for the source categories are obtainable which represent
    the source profile as it  is perceived at  the receptor for the  chemical
    species available from requirement 2.  Changes  in  source composition
    between source and receptor must be accommodated in order for  the  model
    to be physically meaningful.  Appendix C  of this protocol  discusses the
    possible changes to the aerosol during transport and  discusses methods
    to deal with these changes.  Initial source profiles  for Level I assess-
    ment may be obtained from the EPA source  composition  library (U.S.  EPA
    1985C).  Whenever possible, these source  profiles  should be determined
    by actual sampling and analysis of sources in the  airshed by appropriate
    methods (Chow et al., 1986, provide a summary of such methods);  and

(5) the number of source types in a single application of the CMB  must be
    fewer than the number of  chemical  species measured above lower quantifiable
    limits at the receptor.

     Unless all five of the above requirements are met, the Chemical Mass
Balance receptor model is not applicable to the situation under study.
These are necessary, but not sufficient, requirements, and it may still be
found that even though these requirements are met, the precision and validity
of CMB results are not adequate for control strategy decisions.  The remaining
steps in the applications and validation protocol must be taken to arrive
at this conclusion, however.
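
     Requirement (5) above lends itself to a simple numerical check.  The
following is a minimal sketch only; the function name, the dictionary layout,
and the example concentrations are hypothetical and are not part of the CMB
software.

```python
def species_exceed_sources(concentrations, lower_quantifiable_limits, n_source_types):
    """Return True when the number of source types is fewer than the number
    of species measured above their lower quantifiable limits."""
    quantifiable = [
        name for name, value in concentrations.items()
        if value > lower_quantifiable_limits[name]
    ]
    return n_source_types < len(quantifiable)


# Hypothetical receptor sample (ug/m3) and hypothetical quantifiable limits.
ambient = {"Al": 0.42, "Si": 1.10, "S": 2.30, "K": 0.15, "Pb": 0.002}
lql = {"Al": 0.05, "Si": 0.05, "S": 0.02, "K": 0.01, "Pb": 0.01}

print(species_exceed_sources(ambient, lql, n_source_types=3))
print(species_exceed_sources(ambient, lql, n_source_types=4))
```

In this example Pb falls below its limit, so only four species count toward
the comparison.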


     3.2  CMB Model Setup and Initial Run

     The CMB model is configured by selecting up to 35 chemical species and
up to 35 source profiles for inclusion in the input data files.  These are
the prepared files from which a user selects species and sources to be
included in a particular fit.  The number of individual samples which may
be examined in a single application is limited only by computer file storage
constraints.  Very few ambient data sets will contain measurements of more
than 35 species in quantities above detection limits.  A subset of these
species and source profiles is used in each weighted least squares "fit" of
the source profiles to the receptor data.  A maximum of 21 species and 16
source profiles is allowed in an individual "fit" with the Version 6
software.  The selection of which species and source profiles to include in
the "fit" may vary from sample to sample.  Both the source profile and receptor
measurements must be of specified quality.  Several other EPA documents
offer guidance on data quality:

o  The spatial and temporal representativeness of  the data are consistent
   with U.S. EPA, 1987C and 1986B.

o  The mass and chemical data used in  the CMB should be subjected to
   thorough quality assurance procedures in  accordance with an approved
   Quality Assurance Project Plan, prepared  prior  to the data collection
   (U.S. EPA, 1980).

o  Measurement precision should be determined from replicate measurements
   as prescribed in the QA plan (e.g., Watson et al., 1983).

     Quality assurance must be applied to source characterization measurements
and to meteorological measurements as  well as to receptor data.  The CMB
model can return results which are no  more precise or valid than the data
with which it is supplied.  Incorrect  data and uncertainty estimates are
major causes of inconsistent CMB results.  The applications and validation
protocol identifies some (but not all) of the cases in which the input data
are inadequate.
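
     The size limits quoted in this section (35 species and 35 profiles in
the input files; 21 species and 16 source profiles in an individual fit) can
be checked before a run.  The sketch below is illustrative only: the function
and constant names are hypothetical, and the Version 6 software performs its
own input handling.

```python
MAX_FILE_SPECIES = 35    # species allowed in the input data files
MAX_FILE_PROFILES = 35   # source profiles allowed in the input data files
MAX_FIT_SPECIES = 21     # species allowed in one fit (Version 6)
MAX_FIT_SOURCES = 16     # source profiles allowed in one fit (Version 6)


def check_fit_selection(file_species, file_profiles, fit_species, fit_sources):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if len(file_species) > MAX_FILE_SPECIES:
        problems.append("too many species in input file")
    if len(file_profiles) > MAX_FILE_PROFILES:
        problems.append("too many source profiles in input file")
    if len(fit_species) > MAX_FIT_SPECIES:
        problems.append("too many fitting species")
    if len(fit_sources) > MAX_FIT_SOURCES:
        problems.append("too many fitting sources")
    # Every item selected for the fit must exist in the prepared files.
    if not set(fit_species) <= set(file_species):
        problems.append("fitting species missing from input file")
    if not set(fit_sources) <= set(file_profiles):
        problems.append("fitting source missing from input file")
    return problems


# Hypothetical labels standing in for real species and profile names.
all_species = [f"SP{i:02d}" for i in range(30)]
all_profiles = [f"PR{i:02d}" for i in range(20)]

print(check_fit_selection(all_species, all_profiles, all_species[:18], all_profiles[:6]))
print(check_fit_selection(all_species, all_profiles, all_species[:22], all_profiles[:6]))
```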

           3.2.1  Ambient Species and  Their  Uncertainty

     Each measured species provides information to help the model "fit" the
data and arrive at the best set of source contribution estimates.  All
available validated ambient species data (up to the 21 maximum allowed by the
Version 6 software) should be included as fitting species in the CMB.  There
may be reasons to change the ambient data or delete a species from the fit
after the initial model run, as discussed in Section 3.5.

-------
     It  is important  that:

(1)  values for all  species  are  present  in  the  source  profiles  and  in  the
    receptor data  used  in the  fit.   A default  value of  zero with a  standard
    deviation equal to  an analytical detection limit  may  be assigned  to a
    species in a source profile if  that  species  is known  to be absent  from
    that source type  from previous  tests of  similar sources.   If values are
    not  assigned to a species  in  the source  profile file,  the  Version  6
    software automatically  assigns  a value of  zero.   This  may  bias  the
    source contribution estimates  if that  species  is  actually  present  in
    a nonzero amount.

(2)  only one of several alternative measurements of the same species (such
    as elemental and total carbon, or sulfur and sulfate) is included in the
    fit.  If more than one measurement of the same species is included in the
    CMB solution, then that species influences the source contribution
    estimates more than it should;

(3)  species with  values below  detection  limits may be included only if
    their uncertainty is also  included.  Minimum detection  limits  may
    be used to estimate this uncertainty if  it is  not otherwise reported.
    If the uncertainty  is underestimated or  is not specified  (and  given a
    default value  of  zero), then  these  very  imprecise measurements  will
    have an excessive influence on  the  source  contribution  estimates;  and

(4)  if a compound  which is  secondarily  formed  or is normally  associated
    with regional  scale pollution (such  as sulfate) is  included as  a  fitting
    species, a "single  constituent  source  type"  (after  Watson, 1979)  must
    also be included  in the fit,  unless  that compound is  felt  to originate
    directly from sources which are included as  fitting sources.
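
     Items (1) and (3) above describe two default assignments that can be
made explicit when preparing input data.  The following sketch assumes that
profiles are stored as species-to-(value, uncertainty) pairs and that
detection limits are expressed in the same units as the values; these data
structures and function names are hypothetical.

```python
def fill_missing_profile_species(profile, fit_species, detection_limit):
    """Give each fitting species absent from a profile a value of zero with an
    uncertainty equal to its detection limit.  Appropriate only when the
    species is known to be absent from that source type."""
    filled = dict(profile)
    for species in fit_species:
        filled.setdefault(species, (0.0, detection_limit[species]))
    return filled


def below_detection_uncertainty(value, reported_uncertainty, mdl):
    """Return the uncertainty to use for an ambient value, falling back to the
    minimum detection limit only for values below that limit."""
    if reported_uncertainty is not None and reported_uncertainty > 0.0:
        return reported_uncertainty
    if value < mdl:
        return mdl
    raise ValueError("uncertainty missing for a value above its detection limit")


# Hypothetical profile (mass fractions) and detection limits.
profile = {"Si": (0.25, 0.03), "Al": (0.08, 0.01)}
limits = {"Si": 0.001, "Al": 0.001, "Pb": 0.0005}
filled = fill_missing_profile_species(profile, ["Si", "Al", "Pb"], limits)
print(filled["Pb"])
print(below_detection_uncertainty(0.002, None, mdl=0.01))
```

Raising an error for a missing uncertainty above the detection limit is a
design choice of this sketch; it prevents a default of zero from being passed
to the fit unnoticed.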

     Uncertainties assigned to the measurements for use in the CMB should
be reviewed to ensure that they are realistic estimates.  Measurement
uncertainties should be provided as part of the measurement process.
Typical measurement uncertainties are on the order of ±5% to ±20%, with
some species being more uncertain than others because of analytical
interferences and proximity to detection limits.  Uncertainties in source
profiles could be much greater, as discussed in Section 3.2.3.  The model
considers these uncertainties when it develops the "fit".  Species with
high uncertainties are unlikely to be very influential in the fit.
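
     The reduced influence of highly uncertain species can be illustrated
with a one-source weighted least squares estimate.  This is a simplified
sketch: it weights each species by the inverse square of its ambient
uncertainty only and ignores source profile uncertainties, which the model
also takes into account.  All numbers are hypothetical.

```python
def one_source_wls(fractions, measured, uncertainties):
    """Weighted least squares contribution of a single source.

    Minimizes sum((C_i - F_i * S)**2 / sigma_i**2) over S, which gives
    S = sum(F_i * C_i / sigma_i**2) / sum(F_i**2 / sigma_i**2).
    """
    numerator = sum(f * c / u ** 2 for f, c, u in zip(fractions, measured, uncertainties))
    denominator = sum(f ** 2 / u ** 2 for f, u in zip(fractions, uncertainties))
    return numerator / denominator


fractions = [0.20, 0.10, 0.05]   # source profile (mass fractions)
measured = [2.0, 1.0, 2.0]       # ambient concentrations; the third is inconsistent

# Same data, but the third species is given a small or a large uncertainty.
tight = one_source_wls(fractions, measured, [0.1, 0.05, 0.1])
loose = one_source_wls(fractions, measured, [0.1, 0.05, 2.0])
print(round(tight, 3), round(loose, 3))
```

When the inconsistent species carries a large uncertainty, the estimate stays
close to the value implied by the other two species.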

     Chemical measurements on PM10 samples are usually reported with their
measurement uncertainties, which are normally determined via error propagation
of chemical analysis and flow rate uncertainties (e.g., Watson et al.,
1983).  These uncertainties are determined from periodic performance tests
and replicate analyses.  The reporting of these uncertainties should be
specified when the measurements are made.  If chemical concentrations are
available without uncertainties, typical uncertainties may be assigned
based on those reported in previous analyses (e.g., Mueller et al., 1983).
The value of the diagnostics provided by the CMB software is substantially
decreased without an adequate and accurate definition of measurement
uncertainties in receptor data.
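
     As an illustration of the error propagation mentioned above, the sketch
below combines an analytical (mass) uncertainty with a sampled-volume
uncertainty.  It assumes independent errors and first-order propagation for
a concentration computed as mass divided by volume; the procedures in the
cited references may include further terms (for example, blank variability).
The function name and numbers are hypothetical.

```python
import math


def concentration_with_uncertainty(mass_ug, mass_unc_ug, volume_m3, volume_unc_m3):
    """Return (concentration, uncertainty) in ug/m3 for C = mass / volume,
    assuming independent mass and volume errors."""
    concentration = mass_ug / volume_m3
    # Partial derivatives: dC/dm = 1/V and dC/dV = -m/V**2.
    term_mass = mass_unc_ug / volume_m3
    term_volume = mass_ug * volume_unc_m3 / volume_m3 ** 2
    return concentration, math.sqrt(term_mass ** 2 + term_volume ** 2)


# Hypothetical sample: 48 ug +/- 5% collected from 24 m3 +/- 5% of air.
conc, unc = concentration_with_uncertainty(48.0, 2.4, 24.0, 1.2)
print(conc, round(unc, 4))
```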

     The individual samples should be run separately in the CMB, in most
cases.  Compositing or combining the data from several samples will usually
decrease the number of sources that the CMB can resolve.  Likewise, separate
analysis of the fine and coarse ambient samples collected by a size
fractionating sampler is preferable to analysis of a "total" sample which
combines the two size fractions.  The sources contributing to these two size
fractions are generally quite different.

           3.2.2  Source Identification - The Fitting Sources

     A careful selection of sources for the initial run of the CMB will
minimize errors in the fitting source list.  Collinearity may be introduced
if extra or incorrect sources are included.  Section 2.0 summarized the
types of preliminary analyses which can be carried out to assemble a model
input file of up to 35 source profiles.

     The following is a reasonable approach to selecting a set of up to
35 source types and profiles which can be stored in the source file:

(1) include ubiquitous area sources, such as motor vehicle exhaust,  residual
    oil combustion,  and resuspended dust.  These sources are almost  universally
    present in all urban areas;
(2) include natural sources, such as sea salt, if the receptor is in an area
    likely to be affected by such sources;
(3) include point sources which have been identified from an emissions
    inventory; and
(4) include "single constituent source types" (after Watson, 1979) in cases
    where substantial amounts of secondary nitrate, sulfate, and organic
    carbon are expected.  These are profiles which represent only a single
    compound such as sulfate.  They are used to account for that portion of
    the compound which is not accounted for by sources in the airshed,
    mainly for secondary aerosol and sometimes for carbon compounds for
    which a source cannot be readily identified.  Each such profile contains
    only that compound, and its fractional composition in the profile is set
    to 1.0.  A compound which is secondarily formed or is largely emitted by
    sources outside of the airshed should generally not be included as a
    fitting species unless it is also included as a single constituent
    source type.
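
     A single constituent source type, as described in item (4), can be
sketched as a profile whose only nonzero entry is the compound itself.  The
function name and the dictionary representation are hypothetical; the actual
profile file format is defined in the user's manual.

```python
def single_constituent_profile(compound, fit_species):
    """Build a profile containing only `compound`, with fraction 1.0, and
    zero for every other fitting species."""
    if compound not in fit_species:
        raise ValueError(f"{compound} must also be a fitting species")
    return {species: (1.0 if species == compound else 0.0) for species in fit_species}


# Hypothetical fitting species list that includes sulfate.
species = ["Al", "Si", "K", "Pb", "SO4"]
sulfate_profile = single_constituent_profile("SO4", species)
print(sulfate_profile)
```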

-------
     Several  source profiles for each  source type may  be included  in  the
source profile input files,  but  only one profile from  each  type  should  be
included in a fit.

     The following procedures can be followed to select a subset of these
35 source types for inclusion in the initial CMB "fit":

(1) review wind direction data and eliminate sources that have virtually no
    chance of contributing a detectable concentration because they are downwind
    of the receptor;

(2) based on the emission rates, stack height, distance from the receptor,
    microscopy and other preliminary analyses, eliminate those sources or
    source categories that are minor contributing sources [minor might be
    considered those sources in the airshed that individually account for
    less than 1 to 2% of the local source contribution to a receptor and
    are also minor contributors (< 5 to 10%) to each measured species
    concentration];

(3) eliminate those source types which are not likely to be emitting during
    the period of time being studied (e.g., woodsmoke emissions during hot
    summer months); and

(4) select only one source profile per source type (develop a weighted
    average profile, if necessary).
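
     The first three screening steps above can be sketched as a simple
filter.  The record fields, the cutoff values (taken as the upper ends of
the "1 to 2%" and "5 to 10%" ranges), and the example sources are assumptions
made for illustration; the judgments behind each field come from the
preliminary analyses, not from this code.

```python
def is_minor_source(share_of_local_total, max_share_of_any_species,
                    total_cutoff=0.02, species_cutoff=0.10):
    """True when a source is minor by both rule-of-thumb criteria in step (2)."""
    return (share_of_local_total < total_cutoff
            and max_share_of_any_species < species_cutoff)


def screen_sources(candidates):
    """Return names of sources kept after steps (1) through (3)."""
    kept = []
    for src in candidates:
        if not src["can_reach_receptor"]:      # step (1): wind direction review
            continue
        if is_minor_source(src["share_total"], src["max_species_share"]):  # step (2)
            continue
        if not src["active_in_period"]:        # step (3): not emitting in period
            continue
        kept.append(src["name"])
    return kept


candidates = [
    {"name": "motor vehicle exhaust", "can_reach_receptor": True,
     "share_total": 0.30, "max_species_share": 0.60, "active_in_period": True},
    {"name": "woodsmoke", "can_reach_receptor": True,
     "share_total": 0.15, "max_species_share": 0.40, "active_in_period": False},
    {"name": "small boiler", "can_reach_receptor": True,
     "share_total": 0.01, "max_species_share": 0.03, "active_in_period": True},
    {"name": "smelter", "can_reach_receptor": False,
     "share_total": 0.20, "max_species_share": 0.50, "active_in_period": True},
]
print(screen_sources(candidates))
```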

     The final selection of the most appropriate source types and the profiles
to represent those types results from interactive applications of the CMB
with an evaluation of the diagnostic measures (see Section 3.3).  It is
possible that more than one subset of  source types and source profiles  will
fit the receptor data equally well. The interactive  application of the
model to different  source subsets will identify these  cases.

           3.2.3  Source Profile Representativeness and  Uncertainty

     The quality of the chemical measurements (elements  or  species) used in
the source profiles is just as important as the ambient  data quality.
Source profile values must not only be accurately and  precisely measured,
they must also represent the range of  variability expected  from a number of
individual emitters in the same source type category.

     Some sources have emissions that  are chemically  similar or consistent
over time - that is, although the absolute magnitude  of the emissions may
vary, the relative composition of many of the measured species present
in a source may be sufficiently stable.  However, the chemistry of some
species could be variable if the source changes its operating conditions,
feedstock or fuel.   This variability must be reflected in the uncertainties
that are assigned to each species in the profile.  (These concerns about
source profile variability are analogous to those faced by  the dispersion
modeler when estimating emission rates or dispersion  parameters).
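
     One simple way to let profile uncertainties reflect emitter-to-emitter
variability is to form a composite profile from several source tests, using
the mean as the profile value and the sample standard deviation as its
uncertainty.  This is a sketch under that assumption only; other weighting
schemes are possible, and the test data below are hypothetical.

```python
from statistics import mean, stdev


def composite_profile(tests):
    """Combine source tests (dicts of species -> mass fraction) into a profile
    of species -> (mean fraction, sample standard deviation)."""
    species = sorted(tests[0])
    profile = {}
    for sp in species:
        values = [test[sp] for test in tests]
        profile[sp] = (mean(values), stdev(values))
    return profile


# Three hypothetical tests of emitters in the same source type category.
tests = [
    {"Pb": 0.10, "Br": 0.04},
    {"Pb": 0.12, "Br": 0.05},
    {"Pb": 0.14, "Br": 0.06},
]
profile = composite_profile(tests)
print(profile)
```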



-------
     Because the CMB model  uses the information provided by all  species
included in the fit, mis-estimation of a single species, even so-called
"tracer" species, may not appreciably affect  the source contribution
estimates.  This is especially true if these  species have been assigned
uncertainties which reflect their variability.   When these uncertainties
are adequately estimated, other, less variable  species provide a larger
influence on the source contribution estimates.  Appendix A contains
additional discussion on the source profiles  and their uncertainties.

           3.2.4  Initial Run of the CMB Model

     Once the ambient data and source profiles  and uncertainties have  been
assigned, it is time to make the initial model  run.   The user's  manual
(U.S. EPA 1987B) provides instructions for running the model.  All  selected
sources and all fitting species are initially included.  Following  the
initial run, the source contribution estimates  (SCE) and the statistics and
diagnostics will help to determine the validity of the model results.   This
step is similar to comparing the results of a dispersion model to measured
data in that it provides feedback on the model's performance.  Three output
displays are provided:  1) the source contribution display; 2) the Uncertainty/
Similarity display; and 3) the species concentration display.  Also, several
commands are available to provide the user with additional diagnostics and
detail.
     3.3 Outputs, Statistics and Diagnostics - Definitions and Interpretation

     The primary outputs or variable values estimated by the CMB are:
(1) source contribution estimates (SCE), (2) the standard error of the SCE,
and (3) the calculated concentration for each species.  These outputs are
discussed in detail in the user's manual to the CMB software (U.S. EPA 1987B).

     The  CMB also provides several  performance statistics and diagnostics
that indicate how well the model "fits" the specified data and, also,  what
problems may be affecting the model  outputs.  They can also be used to
identify data which may be in error and deviations from the model assumptions.
These include:  (1) percent of total measured mass accounted for; (2)
R-Square and Chi-Square; (3) t-Statistic; (4) Source Uncertainty Clusters;
(5) RATIOS and residuals of the species; and (6) Source-species contributions.

     The following is a working summary of the meaning or interpretation
of each diagnostic and performance statistic.  A technical description of
how each is calculated is presented in an Appendix to the CMB Version  6
User's Manual (U.S. EPA 1987B), and Appendix B to this document illustrates
their use.  Table 3.1 provides a useful reference summary of the outputs,
diagnostics and statistics.  Target values are suggested for each diagnostic/
statistic.  These rules-of-thumb are generally based on a combination  of
experience and theory.

-------
           3.3.1   Percent Total  Mass  Accounted For

     The sum of the source contribution estimates should account for
essentially all of the measured mass within the uncertainty of that sum
and of the measured mass.  Total mass is never used as a fitting species;
thus, it is very useful as a goodness-of-fit indicator.  Percentages outside
of the range of 100 ± 20% could be caused by incorrect source profiles,
incorrect ambient mass or species data, too many or too few source types,
or lack of measurements for high concentration species.  A % mass outside
of 100 ± 20% coupled with a low R-Square and a high Chi-Square (see Section
3.3.2) is cause for concern.  However, a % near 100 could be misleading
because a poor fit (or collinearity) can "force" the % mass to be near
100%.  Sources containing species which have high residuals (Section 3.3.5)
may be misspecified.  Low values for percent mass could be caused by unmeasured
species.  Also, primary emissions from combustion sources often contain
substantial quantities of elemental or organic carbon, and their gaseous
emissions often result in secondary particles such as sulfate or nitrate
which comprise significant fractions of total mass.  Measurement of these
species and inclusion of single constituent source types as fitting sources
can increase the value of the percent mass accounted for.  [Target 100% ±
20%].
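
     The percent mass check reduces to a short calculation.  The sketch
below uses hypothetical function names and example values; it tests only the
100 ± 20% target and does not account for the uncertainties of the sum or of
the measured mass.

```python
def percent_mass(source_contributions, measured_mass):
    """Percent of measured mass accounted for by the sum of the SCE."""
    return 100.0 * sum(source_contributions) / measured_mass


def within_target(percent, target=100.0, tolerance=20.0):
    """True when the percent mass lies within target +/- tolerance."""
    return abs(percent - target) <= tolerance


# Hypothetical SCE values and measured total mass, all in ug/m3.
sce = [12.0, 8.5, 4.0]
pct = percent_mass(sce, measured_mass=30.0)
print(round(pct, 1), within_target(pct))
```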

           3.3.2  Chi-Square and R-Square -- Goodness of Fit

     Both Chi-Square and R-Square statistics are indicators of the goodness-
of-fit of the model estimates.

     The value of Chi-Square is inversely proportional  to the squares of
the uncertainties in the source profiles and in the receptor data.  A high
Chi-Square (generally greater than 4.0) also suggests that the model has
not  adequately explained  the values  for the fitting species.  A high
Chi-Square may result if the source or ambient uncertainties have been
underestimated, even though the fit is totally correct  and is the best
obtainable, given the true uncertainty of the input data.  [Target 0.0 to
4.0].

     R-Square is a measure of the variance in the  ambient species data that
is explained by the species values used in the fit.  This is done by
comparing the original species concentrations to those  calculated from the
model estimates.  A low R-Square  (generally less than 0.8) usually suggests
that the model has done a  poor job of  explaining the receptor concentrations
of the fitting species.  This could be due to errors outside of the stated
uncertainty intervals for  source profiles or ambient data, or the poor fit
could be due to "missing"  sources (i.e., sources that should have been
included in the fit but were not).  If the percent of total mass accounted
for is low when R-Square is low (see  Section 3.4.2), then "missing" sources
are possible causes.

-------
[Table 3.1  Reference summary of CMB outputs, statistics, and diagnostics,
with target values.  The table body is not legible in this copy.]

-------
     R-Square is mathematically related to the reduced Chi-Square as  shown
in Appendix A.  It will generally approach a value of 1.0 when the reduced
Chi-Square indicates a good fit.   Generally speaking, the Chi-Square  statistic
is more revealing because it indicates  how well  the model  has  explained the
receptor concentrations  relative  to the source and receptor measurement
uncertainties.  R-Square does not contain this relationship to measurement
uncertainty.  [Target 0.8 to 1.0].
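
     The relationship between the two statistics can be illustrated with a
short calculation.  The sketch below is illustrative only.  It assumes the
forms usually quoted for CMB software: reduced Chi-Square as the weighted sum
of squared residuals divided by the degrees of freedom (species minus
sources), and R-Square as one minus the ratio of that weighted sum to the
weighted sum of squared measured concentrations.  Appendix A remains the
authoritative statement; the function and argument names are hypothetical.

```python
import numpy as np

def fit_statistics(C, F, S, v_eff, total_mass):
    """Reduced chi-square, R-square and percent mass for one CMB fit.

    C: measured species concentrations, shape (I,)
    F: source profiles (species fractions), shape (I, J)
    S: source contribution estimates, shape (J,)
    v_eff: effective variance of each species, shape (I,)
    total_mass: measured total mass concentration
    """
    C, F, S, v_eff = (np.asarray(a, dtype=float) for a in (C, F, S, v_eff))
    n_species, n_sources = F.shape
    # Weighted sum of squared residuals (measured minus calculated).
    weighted_ss = np.sum((C - F @ S) ** 2 / v_eff)
    chi_square = weighted_ss / (n_species - n_sources)
    # Assumed form of R-square; it tends to 1.0 as the residuals vanish.
    r_square = 1.0 - weighted_ss / np.sum(C ** 2 / v_eff)
    percent_mass = 100.0 * S.sum() / total_mass
    return chi_square, r_square, percent_mass

# Hypothetical three-species, two-source example.
profiles = [[0.2, 0.0], [0.1, 0.3], [0.0, 0.5]]
print(fit_statistics([2.1, 2.2, 1.9], profiles, [10.0, 4.0],
                     [0.04, 0.04, 0.04], 20.0))
```

     Because both statistics share the same weighted residual term, a fit
with small residuals relative to the effective variances drives Chi-Square
toward zero and R-Square toward 1.0 together.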

           3.3.3   t-Statistic

     The standard error of the source contribution estimates is an important
indicator of the precision, or certainty, of the model estimates.  The "t"
statistic is used to interpret the standard error.  Both statistics are
found on the source contribution display.  A "t" value of less than 2.0
is generally used to identify model estimates that are not significantly
different from zero.  Source estimates with small "t" values may indicate
that the source is not contributing a quantity which exceeds the detection
limits of the modeled system.  A low t-Statistic also implies that the
standard error estimates themselves are uncertain representations of the
true standard error.

     The presence of low t values for several sources within a run may also
result from collinearities among  the source profiles or imprecision in
either the source profiles or the ambient data.   Section  3.3.4 discusses a
more definitive indicator of collinearity that is  provided  on  the source
uncertainty clusters display.  The preliminary analyses are useful in
corroborating whether a  suspected source is a likely contributor  at the
monitoring site.  Corroborative use of  the results of preliminary analyses
is described in Section  3.6.  [Target t > 2.0].
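
     As a minimal sketch of the screening step, the code below assumes that
the "t" value is the source contribution estimate divided by its standard
error (the usual definition; this section does not restate it) and flags
sources that fall below the 2.0 target.  The source names and numbers are
hypothetical.

```python
def t_statistics(sce, std_err):
    """Ratio of each source contribution estimate to its standard error."""
    return [s / e for s, e in zip(sce, std_err)]

def flag_low_t(names, sce, std_err, threshold=2.0):
    """Names of sources whose t value falls below the threshold."""
    return [n for n, t in zip(names, t_statistics(sce, std_err))
            if t < threshold]

names = ["source A", "source B", "source C"]   # hypothetical sources
print(flag_low_t(names, [12.0, 1.5, -0.4], [2.0, 1.0, 0.5]))
```

     A flagged source is only a candidate for review; the text above notes
that collinearity or imprecise data can also produce low t values.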

           3.3.4  Uncertainty/Similarity Clusters

     The interaction of  certain source  profiles  may lead  to high  standard
errors (and low t-Statistics) for one or more of the sources.   High standard
errors associated with a particular source contribution estimate  may  be
attributable to the profiles of other sources as well as  to the uncertainty
of its own profile.  The uncertainty/similarity display helps  to  identify
clusters (U/S clusters), consisting of  other possible sources  whose profiles
may be interacting to cause high  standard errors on one or  more of the
sources in the cluster.   It also  contains an estimate of  the SUM  of the
SCE's (and the uncertainty of the SUM)  for the sources in each cluster.

     The uncertainty/similarity display identifies sources  which  may  be
collinear or whose source profile uncertainties are large,  making it
difficult for the model  to distinguish  between those particular sources.
These difficult-to-distinguish sources  are the "clusters" and  the sources
in the cluster are identified by number.  Only those clusters which contain
a source whose SCE is uncertain (i.e., a t-Statistic < 2.0) are displayed;
other clusters may be present but they are not displayed because
                                    16

-------
the SCE's for their sources are acceptable as is.  A Singular Value
Decomposition (SVD) analysis is performed to determine these clusters and
the results are summarized in this display.  The clusters are listed in
rank order such that the cluster which causes the largest overall
uncertainty/similarity problem is listed first.  The criteria for screening
the SVD to identify clusters are discussed in the user's manual.  The
inherent uncertainty is used to make this ranking (after Henry, 1982).

     Reducing the values of the standard errors of the SCE's is the ultimate
goal once the correct sources have been included in the model run.  The U/S
clusters display identifies those sources whose profiles, if improved, would
most substantially affect the model's ability to estimate SCE's with reduced
standard errors.  The sources in these clusters are candidates for improve-
ment to the source profiles, either by more precise measurement of the
profiles or by adding additional species to make the profiles less similar
(less collinear).  Further discussion of collinearity and source profile
uncertainty can be found in Appendix A and Section C.3 of Appendix C.  The
calculations are documented in an Appendix to the CMB Version 6 User's
Manual.

     SCE's of sources within groups which are near the top of the  list will
have the highest standard errors or uncertainties.  However, the estimate
of the "SUM" of the SCE's of all of the sources in a group is known with
more relative certainty than are the individual SCE's.  The program computes
this sum and a revised uncertainty for the sum and presents this information
on the right side of the display.  This sum may prove useful for a particular
application because of the reduction in uncertainty.  The sum is equivalent
to combining the source types in the uncertainty grouping into a single
source type.  Source type resolution is lost, but greater confidence in the
sum of the source contributions is gained.  [Target - no clusters].
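
     The reason the SUM is better determined than its parts can be shown
with ordinary error propagation.  The sketch below is an assumption about
how such a sum could be computed, not a description of the program's own
calculation: it adds the SCE's of the cluster members and takes the variance
of the sum from the corresponding block of the SCE covariance matrix.  The
numbers are hypothetical.

```python
import numpy as np

def cluster_sum(sce, cov, members):
    """Sum of the SCE's in a cluster and the standard error of that sum.

    sce: source contribution estimates, shape (J,)
    cov: covariance matrix of the estimates, shape (J, J)
    members: indices of the sources in the cluster
    """
    sce = np.asarray(sce, dtype=float)
    cov = np.asarray(cov, dtype=float)
    idx = list(members)
    total = sce[idx].sum()
    # Variance of a sum = sum of all variances and covariances in the block.
    variance = cov[np.ix_(idx, idx)].sum()
    return total, np.sqrt(variance)

# Two similar sources (strong negative covariance) plus one unrelated source.
cov = [[16.0, -14.0, 0.0],
       [-14.0, 16.0, 0.0],
       [0.0, 0.0, 1.0]]
print(cluster_sum([10.0, 8.0, 5.0], cov, members=[0, 1]))
```

     In this example each of the two similar sources has a standard error of
4, yet the negative covariance between them leaves the sum with a standard
error of only 2.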

         3.3.5  Ratios and Residuals of Fitting Species

     The RATIO C/M of the calculated (CALC) species mass to measured (MEAS)
species mass is a convenient indicator of the magnitude of the residual.
Ideally, it is equal to 1.0.  A ratio » 1.0 means that more mass  for a
given species was accounted for by the model  than was measured on  the
ambient sample.  This statistic is found on the species contribution display.
[Target - 0.5 to 2.0].

     The RATIO R/U statistic, also found on the species contribution display,
is useful for interpreting the significance of the ratio of calculated to
measured receptor species concentration.  The residuals (R) are the signed
difference between the CALC and the MEAS values for each species at  the
receptor location.  The uncertainty (U) is the uncertainty in the  estimate
of the residual.  If the absolute value of the residuals statistic exceeds
2.0, the residual is high enough to be of concern.  [Target |R/U| < 2.0].
                                     17

-------
     A RATIO R/U » 2.0 or « -2.0 for a species  could be caused by the
following:

1) incorrect ambient measurement  of that species;*

2) incorrect source profiles for  that species;*

3) inclusion of noncontributing source in the fit ("high positive"  residual
   (> 2.0) only); or

4) absence of a contributing source from the fit ("high negative" residual
   (< -2.0) only).

*This would include underestimating the uncertainty of that species in
 either the ambient data or one of the source profiles.

     Both the RATIOS C/M and R/U provide insight into the magnitude of
the difference between the measured and calculated mass (the residual) for
each species.  The RATIO R/U is generally a more  revealing indicator since
it combines both the magnitude of the residual  and the uncertainty  in the
estimate of the residual into a single measure.
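
     The two ratios can be computed side by side as follows.  This is a
sketch under one stated assumption: the uncertainty of the residual is taken
as the root-sum-square of the uncertainties of the calculated and measured
values, which is the usual propagation for a difference of independent
quantities.  All names and values are hypothetical.

```python
import math

def species_ratios(calc, meas, calc_unc, meas_unc):
    """Return (RATIO C/M, RATIO R/U) for each fitting species."""
    rows = []
    for c, m, uc, um in zip(calc, meas, calc_unc, meas_unc):
        residual = c - m                       # signed: CALC minus MEAS
        residual_unc = math.sqrt(uc ** 2 + um ** 2)
        rows.append((c / m, residual / residual_unc))
    return rows

calc, meas = [1.00, 0.30], [1.10, 0.10]
for c_over_m, r_over_u in species_ratios(calc, meas, [0.10, 0.03], [0.10, 0.04]):
    flag = "review" if abs(r_over_u) >= 2.0 or not 0.5 <= c_over_m <= 2.0 else "ok"
    print(round(c_over_m, 2), round(r_over_u, 2), flag)
```

     The second species in this example is over-accounted for on both
measures; the first has a nonzero residual that is small relative to its
uncertainty.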

           3.3.6  Source-Species  Contributions

     The SSCONT command shows the percent contribution of each source to
each species included in the fit.  This information is used to identify
potentially incorrect profiles or an incorrect source list which might be
responsible for large species residuals (RATIO R/U).  For example, if a
species has a large residual and the SSCONT indicates that a particular
source "accounts for" almost all of that species' CALCULATED value, the
profile value for that species should be carefully reviewed along with the
ambient data for that species to  determine which  is in error.
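
     A source-species contribution table of this kind can be sketched in a
few lines.  The code assumes that each entry is the product of the profile
value and the source contribution estimate, expressed as a fraction of the
measured concentration of that species; the exact normalization used by the
SSCONT display is not restated here, so treat the denominator as an
assumption.

```python
import numpy as np

def source_species_contributions(F, S, C):
    """Fraction of each measured species attributed to each source.

    F: source profiles, shape (I, J); S: SCE's, shape (J,);
    C: measured species concentrations, shape (I,).
    Returns an (I, J) array; row i, column j is F[i, j] * S[j] / C[i].
    """
    F, S, C = (np.asarray(a, dtype=float) for a in (F, S, C))
    return (F * S) / C[:, None]

# Hypothetical two-species, two-source case.
table = source_species_contributions([[0.2, 0.0], [0.1, 0.3]],
                                     [10.0, 4.0], [2.0, 2.0])
print(np.round(100.0 * table, 1))   # percent contributions
```

     A row dominated by a single column identifies the source whose profile
value for that species deserves the closest review.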


     3.4  Deviations from Model Assumptions

     The CMB diagnostics and statistics discussed in  Sections 3.3.1 through
3.3.6 indicate when deviations from model assumptions may have occurred.
These deviations do not necessarily invalidate the CMB results; they merely
indicate the potential for invalidity.  This is why a separate step is
necessary in the applications and validation protocol which evaluates the
effects of these deviations from  assumptions and  determines whether or not
these effects can be tolerated.

     The CMB model  is based on several assumptions and the model should be
applied in a manner that is as consistent as possible with those assumptions.
These assumptions are explicitly  listed below and are further discussed in
Appendix C to this document.  Deviations from these assumptions may result
in unacceptably large errors in the source contribution estimates.   Therefore,
these assumptions should be reviewed when the model is applied to ensure
that expected deviations from them will not significantly bias the  source
contribution estimates.  The assumptions of the CMB with an effective variance
solution are:

                                     18

-------
     1.  Compositions of source emissions are constant over the period of
         ambient and source sampling.

     2.  Chemical species do not react with each other, i.e., they add
         linearly.

     3.  All sources with a potential for significantly contributing to the
         receptor have been identified and have had their emissions
         characterized.

     4.  The number of sources or source categories is less than the number
         of species.

     5.  The source compositions are linearly independent of each other.

     6.  Measurement uncertainties are random, uncorrelated, and normally
         distributed.
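
     To show where these assumptions enter the calculation, the following is
a minimal sketch of an effective variance least squares solution as it is
commonly described in the receptor modeling literature: the weight of each
species combines its ambient variance with the profile variances scaled by
the squared source contributions, and the fit is iterated until the
estimates stop changing.  It is not the CMB program's code; the function
name, tolerances and test data are hypothetical.

```python
import numpy as np

def effective_variance_cmb(C, sigma_C, F, sigma_F, max_iter=50, tol=1e-6):
    """Iterative effective-variance weighted least squares (sketch).

    C, sigma_C: measured concentrations and uncertainties, shape (I,)
    F, sigma_F: source profiles and uncertainties, shape (I, J)
    Returns the SCE's, their standard errors and the covariance matrix.
    """
    C, sigma_C, F, sigma_F = (np.asarray(a, dtype=float)
                              for a in (C, sigma_C, F, sigma_F))
    n_species, n_sources = F.shape
    if n_sources >= n_species:
        raise ValueError("need fewer sources than species (assumption 4)")
    if np.linalg.matrix_rank(F) < n_sources:
        raise ValueError("profiles are linearly dependent (assumption 5)")
    S = np.zeros(n_sources)
    for _ in range(max_iter):
        # Effective variance: ambient variance plus profile variances
        # scaled by the current squared source contributions.
        v_eff = sigma_C ** 2 + (sigma_F ** 2) @ (S ** 2)
        W = F.T / v_eff                      # F^T V^-1 for diagonal V
        cov = np.linalg.inv(W @ F)
        S_new = cov @ (W @ C)
        converged = np.allclose(S_new, S, rtol=tol, atol=1e-12)
        S = S_new
        if converged:
            break
    return S, np.sqrt(np.diag(cov)), cov

F = np.array([[0.2, 0.0], [0.1, 0.3], [0.0, 0.5]])
C = F @ np.array([10.0, 4.0])                # synthetic, error-free receptor data
S, std_err, _ = effective_variance_cmb(C, 0.1 * C, F, 0.1 * F)
print(S, std_err)
```

     The linear sum F @ S reflects assumptions 1 through 3, the two checks
reflect assumptions 4 and 5, and the diagonal weighting reflects the
uncorrelated uncertainties of assumption 6.
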

     3.5  Identifying and Correcting Problems by Changing the Model Inputs

     There are four main categories of  problems which,  once  they  have  been
identified, can be addressed to improve the  performance of the model.   The
problem categories are:  1) incorrect  ambient data;  2)  incorrect source
profiles; 3) incorrect source list; and 4) profile  uncertainty/similarity.
These are discussed briefly below and with   a  Quick  Guide  and  examples in
Appendix B.  The following subsections  discuss ways  in  which the  model's
diagnostics and statistics discussed  in Section 3.3  may indicate  possible
problems with the model  input, their  possible  causes and corrective  action.
Note that in some cases, not all  "indications" must  persist  for a problem
to be present.  The more "indications"  that  persist, the more  evidence of
a problem.  Because of the complex interactions of  all  of  the  data in  a
least squares estimate,  the statistics  or diagnostics  may  not  always be
adequate to conclusively isolate a problem with model  input.  Additional
physical evidence is also very helpful.  A flowchart is presented in Section
3.5.5 to provide a systematic approach  to identifying  and  correcting
problems.

           3.5.1  Correcting the Ambient Data  - Gross  Errors

     There may be inaccuracies in the ambient  species  that have not  been
uncovered in the routine data validation. If  the data  are "suspect" and
there are no apparent data entry or analytical errors,  the next step would
be to eliminate  the suspect species  from the  fit and  rerun  the model.
Examine the changes in the estimates  for each  source.   If  the  estimate
changes by more than one standard error, and if the  receptor concentration
or a source profile value for the removed species is suspect,  then either
remeasure the species or use the SCE  calculated without that species in the
fit.  Example B.1 in Appendix B illustrates the identification of incorrect
ambient data.
                                     19

-------
INDICATION:

         o RATIO R/U « -2.0  for  a  species  suggests  either  the  ambient
           data are high or the profile  data  are  low for  that species;*

         o RATIO R/U » 2.0 for a species would  imply that  the  ambient
           data are low or the profile data for  that species are  high.   If
           profile data are suspect, see Section 3.5.2 and Example B.2.*

ACTION:     1.  Review the uncertainty  assigned to  the species with the high
              residuals.  Make any  justifiable and appropriate  changes and
              rerun the CMB.   If  this improves the RATIO  R/U, Step  2 is  not
              necessary.

           2.  Delete the suspect  species from the list of fitting species
              and rerun.  If  the  SCE  changes  by  at least  one standard
              error, do not use this  species  in  the  fit until it  has been
              remeasured.

           *  (NOTE:  RATIO R/U can also indicate an incorrect source list -
                      see Section 3.5.3; also, it can be due to an under-
                      estimated uncertainty for that species in either
                      the ambient data or one of the source profiles.)
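
     Step 2 of the action above can be sketched as a refit with the suspect
species removed, followed by a comparison of the change in each SCE with its
standard error.  For brevity the sketch uses a plain weighted least squares
fit with ambient uncertainties only, which is a simplification of the
effective variance solution; all names and data are hypothetical.

```python
import numpy as np

def weighted_fit(C, sigma, F):
    """Weighted least squares fit; returns SCE's and standard errors."""
    W = F.T / sigma ** 2                 # F^T V^-1 with diagonal V
    cov = np.linalg.inv(W @ F)
    return cov @ (W @ C), np.sqrt(np.diag(cov))

def species_deletion_test(C, sigma, F, suspect):
    """Refit without one species; flag SCE's that move by >= 1 std. error."""
    C, sigma, F = (np.asarray(a, dtype=float) for a in (C, sigma, F))
    keep = np.arange(len(C)) != suspect
    sce_all, se_all = weighted_fit(C, sigma, F)
    sce_cut, _ = weighted_fit(C[keep], sigma[keep], F[keep])
    moved = np.abs(sce_cut - sce_all) >= se_all
    return sce_all, sce_cut, moved

F = [[0.2, 0.0], [0.1, 0.3], [0.0, 0.5], [0.05, 0.05]]
C = [4.0, 2.2, 2.0, 0.7]      # first species carries a deliberate gross error
sce_all, sce_cut, moved = species_deletion_test(C, [0.2] * 4, F, suspect=0)
print(sce_all, sce_cut, moved)
```

     When an entry of the last array is true, the protocol's advice applies:
leave the species out of the fit until it has been remeasured.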

           3.5.2  Correcting  Source Profiles  - Gross Errors

     A gross error in the value of one or more species in a profile might
result in a high standard error in the SCE and a high residual for those
species.  Therefore, one or more high residual values suggest that the
uncertain source profile (and the associated species in particular) be checked
and remeasured if necessary.  The species with the high residual is the most
likely one to check for errors.  Appendix B contains Example B.2, which
illustrates the problem of gross errors in a profile.  Indications of the
problem are given below.

INDICATION:

          o SCE that is inconsistent  with preliminary analyses  or physical
            evidence;

          o one or more species has a "high (pos. or neg.)" residual which
            cannot be attributed  to incorrect ambient data; further evidence
            of species error  if the SSCONT  reveals that one source  contribution
            dominates that species.

ACTION:    Review profile data for the suspect species carefully.  Correct
          or remeasure profile if necessary.
                                     20

-------
           3.5.3  Changing the Source List

           3.5.3.1  Missing Sources

     Missing source types are identified by a low percent mass explained
(e.g., less than 80%) and/or a RATIO R/U « -2.0 for chemical species which
are in the missing source.  A "high negative" residual for one or more
species and a high Chi-Square are also indicative of missing sources.  The
key to identifying these sources resides in the calculated to measured
chemical concentrations listed in the SPECIES CONCENTRATION display.  "High
negative" residuals imply that a source is needed which will supply a
larger quantity of that species.  The PMATRIX command lists all of the
source profiles in the model's input data file.  These profiles can be
examined to determine which ones would supply sufficient quantities
of the missing concentrations if they were added to the set of fitting
sources.  The CMB can be reapplied as many times as is necessary to determine
which source types and source profiles best account for the underestimated
receptor concentrations.  A source should not be included in the final fit
just because it "explains" the data, however; there must be a physical
justification for the source's contribution at a receptor if it is to be
included in the fit.

     The source list can be changed by adding  or deleting  sources.  Example
B.3 illustrates the identification of a missing  source.   Indication  of a
missing source is given by the following conditions.

INDICATION:

          o High Chi-Square;

          o Low percent mass explained;

          o RATIO R/U  « -2.0 (a  "high negative" residual)  for  one or  more
            species that are known to be present in  the suspect  source.

ACTION: Add source profiles to the fit and reevaluate.
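
     The profile review described above can be organized as a simple
screening step.  The sketch below is an illustration, not part of the CMB
software: it lists the species with "high negative" residuals and ranks the
unused profiles by how much of those species they contain.  The profile
names and abundances are hypothetical, and a high rank does not replace the
physical justification the protocol requires.

```python
def rank_candidate_profiles(species, r_over_u, profiles, threshold=-2.0):
    """Rank unused source profiles by their abundance of deficient species.

    species: names of the fitting species
    r_over_u: RATIO R/U for each species, same order
    profiles: {profile name: {species name: mass fraction}}
    """
    deficient = [sp for sp, r in zip(species, r_over_u) if r < threshold]
    scores = {name: sum(prof.get(sp, 0.0) for sp in deficient)
              for name, prof in profiles.items()}
    ranking = sorted(scores, key=scores.get, reverse=True)
    return deficient, ranking

candidates = {                                  # hypothetical unused profiles
    "profile X": {"Pb": 0.10, "Si": 0.01},
    "profile Y": {"Si": 0.30, "K": 0.02},
}
print(rank_candidate_profiles(["Pb", "Si", "K"], [-3.5, 0.4, -0.2], candidates))
```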

           3.5.3.2  Noncontributing Sources

     Noncontributing source types, or better stated, source types with
contributions lower than detection limits, are identified by T-STAT values
below 2.  Such source types may be eliminated from the fit if the source
contribution is indeed small.

     The noncontributing source situation is illustrated in Example B.5 of
Appendix B, and the indications are summarized below.

INDICATION:

          o T-STAT between -2.0 and 2.0

          o RATIO R/U » 2.0 (a "high positive" residual) for a species which
            is attributed to the suspect source by the SSCONT diagnostic

                                     21

-------
          o Negative SCE

          o Physical basis for the source's  contribution  is weak.

ACTION:  Delete source from fit.*  However,  if the source's contribution
         (SCE) is large and there is a strong physical basis for its
         presence, the profile should be remeasured to  reduce its  uncertainty.

         *NOTE:  If the source is present but makes a very small contribution
          to total mass, it should only be removed from the fit if the SSCONT
          shows that none of the species in the source accounts for more than
          5 to 10% of the ambient concentration for those species.

           3.5.4  Improving Source Profiles - Uncertainty/Similarity

     As discussed in Appendix C.3, there are two reasons (other than gross
errors) to improve the source profiles:  (1) high profile uncertainty;
(2) collinearity with low profile uncertainty.  This section discusses
methods of identifying collinear sources and ways to reduce the uncertainty
in SCEs.

     The uncertainty/similarity display identifies those source types which
interact sufficiently to contribute to large standard errors in the source
contribution estimates in that group.  An indication that the interaction
may be significant can be seen by noting the sources on the U/S cluster
display.

     A simple test is proposed to determine if the uncertainty in the SCE is
due to high profile uncertainty:  reduce the uncertainties in the profile to
levels that might be reasonable to achieve if the source profiles were measured
more precisely; then, rerun the CMB.  If the clusters containing those
sources are no longer listed, it is likely that collinearity per se is not
significant.  Remeasurement of the profile will probably improve the
uncertainties of the source contribution estimates.  It is possible that
reducing the uncertainty will not eliminate the clusters but the SCE
uncertainty will likely be improved somewhat.  This would suggest that
collinearity is also present.  Appropriate action is discussed below.
Example B.4 illustrates this problem.

INDICATION:

          o Two or more sources listed in a  U/S cluster

          o T-statistic < 2.0 for one or more sources in that cluster;
            if the T-STAT becomes > 2.0 when the species uncertainties in the
            profile for that source are arbitrarily reduced to a potentially
            achievable level, this indicates that the uncertainty in the source
            profile is at least partially responsible for the "apparent"
            collinearity.

ACTION:  Remedies for unacceptably high uncertainties due to collinearity
         can take five forms ranked from most to least  desirable.
                                     22

-------
(1) The profile of one or more of the cluster sources could be improved by
measuring additional  species.

(2) Reduce the uncertainties in the source profiles of the cluster sources.
If the T-STAT becomes > 2.0, and if these profile uncertainties are realistically
achievable by remeasurement, then the "apparent" collinearity can be improved
in large part by improving the uncertainty in the profiles.  Ideally, the
U/S cluster for that group of sources would disappear.  Remeasure and rerun
the CMB with the improved measurements.   More precise source profile
measurements must be  obtained before re-applying the model.

(3) The estimate of the SUM of the source categories in the U/S cluster can
be used.  Obtain independent estimates of the contributions of the individual
source categories and use them to apportion the SUM into the source categories.

(4) Combine the profiles of the collinear sources into a single
signature of a "composite source category" that chemically represents  the
source categories identified by the U/S  cluster.  For example, resuspended
road dust and windblown soil dust are chemically similar, and some modelers
include a single term to represent "crustal material" instead of the two
individual source types.  This would result in improved source estimates of
the crustal component, which can then serve as an estimate of the combined
impact of the two sources.  This aggregated estimate might then be partitioned
into its components by another method (e.g., dispersion modeling, microscopy,
or wind trajectory analysis).

(5) Species which are causing the similarity in source profiles might  be
deleted from the fit.  These species can often be determined from the display
produced by the SSCONT command.  Often one of the cluster sources will  be
» 100% for that species and the other will be negative.  Unfortunately,
eliminating too many  species from the fit may cause the model to fail  the
applicability requirements in Section 3.1.  Also, the results should
acknowledge that the  deleted source may  be present.
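
     The test behind remedy (2) can be sketched as two fits, one with the
original profile uncertainties and one with the uncertainties of a single
source's profile scaled down to a level judged achievable.  The compact
effective variance fit used here is an illustrative stand-in for the CMB
program, and the scaling factor of 0.25 is an arbitrary assumption.

```python
import numpy as np

def ev_fit(C, sigma_C, F, sigma_F, n_iter=20):
    """Compact effective-variance fit; returns SCE's and standard errors."""
    S = np.zeros(F.shape[1])
    for _ in range(n_iter):
        v_eff = sigma_C ** 2 + (sigma_F ** 2) @ (S ** 2)
        W = F.T / v_eff
        cov = np.linalg.inv(W @ F)
        S = cov @ (W @ C)
    return S, np.sqrt(np.diag(cov))

def profile_uncertainty_test(C, sigma_C, F, sigma_F, source, factor=0.25):
    """t values before and after shrinking one profile's uncertainties."""
    C, sigma_C, F, sigma_F = (np.asarray(a, dtype=float)
                              for a in (C, sigma_C, F, sigma_F))
    s0, e0 = ev_fit(C, sigma_C, F, sigma_F)
    reduced = sigma_F.copy()
    reduced[:, source] *= factor          # hypothetical achievable precision
    s1, e1 = ev_fit(C, sigma_C, F, reduced)
    return s0 / e0, s1 / e1

F = np.array([[0.2, 0.0], [0.1, 0.3], [0.0, 0.5]])
C = F @ np.array([10.0, 4.0])             # synthetic receptor data
t_before, t_after = profile_uncertainty_test(
    C, np.full(3, 0.1), F, 0.2 * F + 0.005, source=0)
print(t_before, t_after)
```

     If a t value rises above 2.0 only in the second fit, profile
uncertainty is at least part of the apparent collinearity, and the result
still has to be confirmed with actual remeasured profiles.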

          3.5.5  Problem Identification  and Correction Strategy

     Figure 1 shows the order in which the above actions should be taken.
Generally, it is best to make only one change at a time to the model  setup
before rerunning.  An exception is that  data errors should all be corrected
when they are identified.  This stepwise procedure may necessitate cycling
through the steps several times.  Each time a change is made, it may "clarify"
the need to make a change that was not evident on a previous iteration  (e.g.,
you may address a collinearity problem and reveal a data problem previously
unidentified because  of the collinearity).  The cycling process is repeated
until no changes are justified by the criteria in Sections 3.5.1 through
3.5.4.  Example B.6 in Appendix B illustrates a solution where multiple
problems are present  after the initial run.  The use of this flowchart  will
greatly increase the  consistency of CMB  application among users.  However,
some operator judgments are necessary regarding data validity and corrective
action(s) to address  collinearity.


                                     23

-------
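[Figure 1 is a flowchart.  Only the following box labels survive in the scan:
 TEMPORARY "FIX" - DELETE SOURCE FROM TOP U/S CLUSTER W/ SMALLEST SCE;
 PERMANENT SOLUTION - REFER TO SECTION 3 51 (section number garbled in scan);
 RE-INCLUDE DELETED SOURCES, RERUN.]

Figure 1.  Flowchart for Problem Identification and Correction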
                              24

-------
     3.6   Consistency/Stability of the Model  Results

     The CMB source contributions should be compared to the results of
other receptor methods to provide corroboration of results.  In addition to
providing corroboration of CMB estimates, qualitative receptor analyses can
provide clues to the causes of unresolved issues in the sensitivity and
summary statistics reviews.  For instance, meteorological  data and spatial
emissions inventories could be used to identify potential  missing sources
of a particular fitting element, as could microscopy or factor analysis.

     In the event that significant inconsistencies are observed,  all  results
should be reviewed, focusing on those particular sources that appear to be
inconsistent among the different methods.  If there is compelling evidence
that the CMB model inputs should be changed, the deviations from model
assumptions, model sensitivity and summary statistics (Sections 3.3, 3.4
and 3.5) should be reevaluated in addition to rerunning the CMB model.

     The CMB estimates should be tested to see how sensitive they are  to
the various input data.  Unstable estimates (source contribution  estimates
that change by more than one standard error estimate) are an indication
that the model may not be providing stable results.  For CMB validation,
model stability tests are usually taken to mean the evaluation of the
sensitivity of model estimates to changes in input parameters, such as the
selected sources and their profiles, as well as the selection of fitting
species used to reach a solution with the CMB model.  The following is a
discussion of three types of parameter changes that should be included in a
model stability (sensitivity) test, and of ways to make the model less
sensitive (more stable).

           3.6.1  Source Profile Sensitivity

     The CMB model's effective variance fitting procedure uses estimates
of the source profile and receptor concentration uncertainties to "weight"
their effect in arriving at source contribution estimates.   It is helpful
to explore how sensitive the source contribution estimates  are to changes
in the source profiles and these uncertainties.  This can  be done by  intro-
ducing changes into the source profiles and rerunning the model for each
change.

     The model user can select several  species from a source(s) of
particular regulatory interest and assign worst case values to those species
in the profile.  The model can then be rerun with the worst case  profile(s).
A practical way to accomplish this sensitivity analysis is  to include  a
"worst case" source profile along with the "best estimate"  profile in  the
"FS" or "CS" data file.  The resulting source  estimate(s)  can be  considered
"brackets" to the source contribution estimates and can be  compared to the
uncertainty intervals calculated for each run.  If the bracketing interval
is greater than the calculated uncertainty interval, then  the model may be
sensitive to changes in the source profiles.
                                     25

-------
            3.6.2  Receptor Concentration Sensitivity

      The  stability of source contribution estimates with respect to receptor
 concentrations  is best tested with collocated chemical measurements from
 one  of the  sampling sites.  These collocated measurements are usually
 included  as part of the quality assurance plan for a subset of all samples.
 If nearly equivalent source contribution estimates are derived from these
 two  independent measurements of the same ambient air, then the receptor
 data are  not  likely causing instabilities in the CMB results.

      Lacking these collocated data, portions of the input data may be
 perturbed randomly or systematically in proportion to their uncertainty
 (e.g., Javitz and Watson, 1986; Watson and Robinson, 1984).  The source
 contribution  estimates for the sources of regulatory interest should not
 change by more  than one standard error in response to small perturbations
 if the results are stable.  (A "small" perturbation is defined as one standard
 error of the ambient species concentrations.)  If the results are not stable,
 the validity of the CMB result for that particular data set is questionable.
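
     A systematic version of this perturbation test can be sketched as
follows.  Each species in turn is raised by one standard error of its
ambient concentration, the fit is repeated, and the shift in each SCE is
expressed in units of its standard error.  The sketch uses a plain weighted
least squares fit with ambient uncertainties only, a simplification of the
effective variance solution; names and data are hypothetical.

```python
import numpy as np

def weighted_fit(C, sigma, F):
    """Weighted least squares fit; returns SCE's and standard errors."""
    W = F.T / sigma ** 2
    cov = np.linalg.inv(W @ F)
    return cov @ (W @ C), np.sqrt(np.diag(cov))

def one_sigma_shifts(C, sigma, F):
    """Shift of each SCE (in standard errors) per one-sigma species bump.

    Returns an (I, J) array: row i is the response to perturbing species i.
    """
    C, sigma, F = (np.asarray(a, dtype=float) for a in (C, sigma, F))
    sce, se = weighted_fit(C, sigma, F)
    shifts = np.zeros(F.shape)
    for i in range(len(C)):
        bumped = C.copy()
        bumped[i] += sigma[i]             # one standard error, one species
        sce_i, _ = weighted_fit(bumped, sigma, F)
        shifts[i] = np.abs(sce_i - sce) / se
    return shifts

F = [[0.2, 0.0], [0.1, 0.3], [0.0, 0.5], [0.05, 0.05]]
C = np.array([2.0, 2.2, 2.0, 0.7])
print(np.round(one_sigma_shifts(C, 0.1 * C, F), 2))
```

     With this simplified weighting the squared shifts for each source add
up to one, so no single entry can exceed one standard error; an entry close
to 1.0 singles out a species that largely controls that source's estimate.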

            3.6.3  Fitting  Species Sensitivity

     The stability of CMB model results to the fitting species can be evaluated
 by identifying  a species which SSCONT attributes in large part to a single
 source.   Eliminate this species from the fit and examine how much the cor-
 responding  source contribution changes.  If this change is greater than the
 STDERR, then that species  must be greatly influencing the "fit."  Review the
 quality of  both the source and ambient measurements for that species carefully
 because of  its  influence on the model estimates.


      3.7  Evaluating Results of the CMB Analyses

      If  (a) the CMB model  is determined to be applicable  (Section 3.1),
 (b)  the  summary statistics and diagnostics are  generally within target
 ranges (Section 3.3), (c) there are no significant deviations from model
 assumptions (Section 3.5), and  (d) the sensitivity tests in Section 3.6
 uncovered no unacceptable  instability or consistency problems, the CMB
 analysis is considered valid.  If the uncertainties associated with the source
 estimates are too high for decision-making purposes even after taking the steps
 recommended in this protocol, then the source compositions being used are not
 representative of the sources in the airshed, or they contain too much
 uncertainty associated with the influential species.
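
      Condition (b) lends itself to a mechanical check.  The sketch below
 collects the target ranges quoted in Section 3.3 of this part of the
 protocol, plus the 80% figure given as an example of low percent mass in
 Section 3.5.3.1.  The Chi-Square target is not restated in this part and is
 therefore left out.  The function is hypothetical, and an empty result
 covers condition (b) only, not conditions (a), (c) or (d).

```python
def check_targets(r_square, percent_mass, t_stats, c_over_m, r_over_u,
                  n_clusters):
    """List departures from the target ranges for one CMB run."""
    problems = []
    if not 0.8 <= r_square <= 1.0:
        problems.append("R-Square outside 0.8 to 1.0")
    if percent_mass < 80.0:
        problems.append("percent mass explained below 80")
    if any(t <= 2.0 for t in t_stats):
        problems.append("t-Statistic not above 2.0 for at least one source")
    if any(not 0.5 <= r <= 2.0 for r in c_over_m):
        problems.append("RATIO C/M outside 0.5 to 2.0")
    if any(abs(r) >= 2.0 for r in r_over_u):
        problems.append("|RATIO R/U| not below 2.0")
    if n_clusters > 0:
        problems.append("U/S clusters present")
    return problems

# Hypothetical run that meets every target listed above.
print(check_targets(0.95, 92.0, [5.1, 3.2], [1.0, 0.9], [0.3, -1.1], 0))
```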

      It  is  recommended that both a dispersion model and receptor model be
 used in  a collaborative manner to perform an apportionment, provided that
 the  dispersion model  is applicable and the receptor model is valid for the
 particular  application  (U.S.  EPA  1987C).
                                      26

-------
4.0  ACKNOWLEDGMENTS

     The primary authors of this document were Thompson G.  Pace,  P.E., of
the U.S. EPA and Dr. John G. Watson of the Desert Research  Institute.
However, the experience, talent and ideas of the co-authors played a
large role in its development.   The co-authors participated in a  workshop
to review and revise the document in May 1986 in San Francisco.  The co-
authors are:  Dr. Kit Wagner, Mr. Bart Croes, Mr. Duane Ono, Dr.  Hal
Javitz, Mr. Mike Naylor, Dr. Judith C. Chow, Dr. David Maughan, Dr. Edwin
Meyer, Mr. Luke Wijnberg, Mr. Ken Axetell, Mr. Mike Anderson, Mr. John
Core, Mr. Pat Hanrahan, Dr. Ron Henry, Mr. Bong Mann Kim,  Mr. Chung Liu,
and Dr. Andy Gray.

     Several others provided review and comment outside of  the workshop.
The reviews by Dr. Glen Gordon, Mr. William Cox, Mr. Chuck  Lewis,
Dr. Thomas D. Dzubay, Mr. Robert K. Stevens, Mr. Johnnie Pearson,
Dr. Richard DeCesar, Dr. Sylvia Edgerton, and Ms. Bridget  Landry  are
greatly appreciated.  The typing and revisions by Ms. Cathy Coats,
Ms. Jo Harris, and Ms. Linda Ferrell are much appreciated.
                                    27

-------
5.0  References

Blumenthal, D., Watson, J. G., Richards, L. W., Hering, S. V., and
Chow, J. C. (1986), "Southern California Air Quality Study:  Suggested
Program Plan," Sonoma Technology, Santa Rosa, CA.

Chow, J. C. (1985), "A Composite Modeling Approach to Assess Air Pollution
Source/Receptor Relationships," Doctor of Science Dissertation, Harvard
University, Boston, MA, July 1985.

Chow, J. C., Watson, J. G., Egami, R. T., Wright, B., Ralph, C., Naylor,
M., Smith, J.,  and Serdoz,  R. (1986), "Program Plan for State of Nevada
Air Pollution Study (SNAPS)," DRI Document 8086-100, Energy  and Environ-
mental Engineering Center,  Desert Research Institute, NV  89506.

Chow, J. C., Watson, J. G., and Frazier, C. A. (1986A), "A Survey of Existing
Fugitive/Area Source Characterization Methods for Receptor Modeling," to be
printed in Proceedings, Particulate Matter Fugitive Dusts-Measurement and
Control in Western Arid Regions, Air Pollution Control  Association, Tucson,
Arizona, Oct. 16-17, 1986.

Currie, L. A., Gerlach, R. W., Lewis, C. W. et al. (1984), "Interlaboratory
Comparison of Source Apportionment Procedures:  Results for Simulated Data
Sets," Atm. Env., 18, (8): 1517-1537.

Currie, L. A.,  Gerlach, R.  W.,  Lewis, C.  W., Balfour, W.  D., Cooper, J. A.,
Dattner, S. L., DeCesar, R. T., Gordon, G. E., Heisler, S. L., Hopke, P. K.,
Shah, J. J., Thurston, G. D., and Williamson, H. J. (1984),  "Interlaboratory
Comparison of Source Apportionment  Procedures:  Results for  Simulated Data
Sets," Atm. Env., 18, 1517.

DeCesar, R. T., and Cooper, J.  A. (1982), "Evaluation of  Multivariate and
Chemical Mass Balance Approaches to Aerosol  Source Apportionment, Using
Synthetic Data  and an Expanded PACS Data  Set," Proceedings,  Receptor Models
Applied to Contemporary Pollution Problems,  Specialty Conference, Air
Pollution Control Association, Danvers, MA.

Dzubay, T. G.,  Stevens, R.  K. et al. (1982A), "Intercomparison of Results
of Several Receptor Models  for Apportioning  Houston Aerosol", Proceedings,
Receptor Models Applied to  Contemporary Pollution  Problems,  Specialty
Conference, Air Pollution Control Association, Denver,  CO, 1982.

Dzubay, T. G.,  Stevens, R.  K. et al. (1984), "Interlaboratory Comparison of
Receptor Model Results for Houston Aerosol," Atm. Env., 18 (8): 1555-1566.

Friedlander, S. K. (1981), "New Developments in Receptor Modeling Theory,"
Atmospheric Aerosol:  Source/Air Quality Relationships, ed. by E. S. Macias
and P. K. Hopke, ACS Symposium Series 167, American Chemical Society,
Washington, DC, p. 1-p. 19.

Gerlach, R. W., Currie, L. A. and Lewis, C. W. (1982), "Review of the Quail
Roost II Receptor Model Simulation Exercise," Proceedings, Receptor Models
Applied to Contemporary Pollution Problems, Specialty Conference, APCA,
Danvers, MA.

                                     28

-------
Gordon, G. E. (1984), "Atmospheric Tracers of Opportunity from Important
Classes of Air Pollution Sources," DOE Workshop on Atmospheric Tracers,
Santa Fe, NM.

Gordon, G. E., Pierson, W. R., Daisey, J.  M., Lioy, P.  J.,  Cooper,  J.  A.,
Watson, J. G., and Cass, G. R. (1984), "Consideration for Design of Source
Apportionment Studies," Atm. Env., 18:1567-1582.

Gordon, G. E. and Ann Sheffield,  University of Maryland, Chemistry  Building,
College Park, Maryland, 20742, Personal  communication to Tom Pace,  April
1987.

Henry, R. C. (1982), "Stability Analysis of Receptor Models That Use Least
Squares Fitting" in Receptor Models Applied to Contemporary Pollution  Problems,
edited by S. L. Dattner and P. K.  Hopke, Air Pollution  Control Association,
Pittsburgh, PA, p. 141.

Henry, R. C. and Kim, B. M. (1986), "Evaluation of Receptor Model  Performance,"
Report to U.S. Environmental Protection Agency, Technology  Development
Section, Air Management Technology Branch, Monitoring and Data Analysis
Division, Office of Air Quality Planning and Standards, Research Triangle
Park, NC.

Hopke, P. K. (1985), Receptor Modeling in  Environmental Chemistry,
John Wiley and Sons, New York, 1985.

Javitz, H. S. and Watson, J. G. (1986) "Feasibility Study of Receptor
Modeling for Apportioning Utility  Contributions to Air Constituents,
Deposition Quality and Light Extinction,"  Draft Report  for  Electric Power
Research Institute, prepared by SRI International, Menlo Park, CA.

Lewis, C. W. and Stevens, R. K. (1985) "Hybrid Receptor Model for Secondary
Sulfate from an SO2 Point Source," Atmospheric Environment, 19, 917-924.

Liu, C. S., Gray, H. A., Grisinger, J. E., and Davidson, A. (1986), "Draft
1987 AQMP Revision Working Paper No. 2:  PM10 Modeling Approach," prepared
by the South Coast Air Quality Management District, El  Monte, CA.

Mueller, P. K. and Hidy, G. M. et  al. (1983), "The Sulfate  Regional
Experiment:  Report of Findings,"  Electric Power Research Institute Report
#EA-1901, Palo Alto, CA.

Scheff, P. and Wadden, R. A. (1986), "Predicting Unidentified and Secondary
Sources With Chemical Mass Balance Receptor Modeling,"  Receptor Methods
for Source Apportionment: Real World Issues and Applications, Edited by
T. G. Pace, APCA, Pittsburgh, PA,  1986.

Stafford, M. A. and Liljestrand,  H. M. (1984), "On the Distinction  of  Secondary
Species in Acid Deposition," presented at the 77th Annual Meeting of the
Air Pollution Control Association, San Francisco, CA.
                                     29

-------
Stevens, R. K. and Pace, T. G. (1984), "Review of the Mathematical and
Empirical Receptor Models Workshop (Quail  Roost II)."  Atm. Env. 18, 1499-1506.

Trijonis, J., "Model Reconciliation,"  Special  Report prepared for U.S.  EPA
and TRC Environmental Consultants under Contract 68-02-3886, Work Assignment
No. 13, Research Triangle Park, NC 27711,  September 1985.

U.S. EPA, 1978, Digest of Ambient Particulate  Analysis and Assessment
Methods, EPA-450/3-78-013, U.S. EPA,  Research  Triangle Park, NC 27711,
September 1978.

U.S. EPA, 1980, Interim Guidelines and Specifications for Preparing Quality
Assurance Project Plans, QAMS-005/80,  U.S. EPA, Office of Research and
Development, Research Triangle Park,  NC 27711, December 1980.

U.S. EPA, 1981A, Receptor Model Technical  Series,  Volume I:  Introduction
to Receptor Models, EPA-450/4-81-016a, U.S. EPA, Research Triangle Park, NC
27711, July 1981.

U.S. EPA, 1981B, Receptor Model Technical Series, Volume II:  Chemical Mass
Balance, EPA-450/4-81-016b, U.S. EPA,  Research Triangle Park, NC 27711,
July 1981.

U.S. EPA, 1981C, "SOP for Technicon Determination  of Sulfate in Suspended
Particulate Matter Collected on Glass  Fiber Filters," EMSL/RTP SOP-EMD-005,
U.S. EPA, Research Triangle Park, NC 27711, November 1981.

U.S. EPA, 1981D, "SOP for the Extraction of Sulfate and Nitrate and for  the
Technicon Determination of Sulfate or  Suspended Particulate Matter Collected
on Dichotomous Filters," EMSL/RTP SOP-EMD-006, U.S. EPA, Research Triangle
Park, NC 27711, November 1981.

U.S. EPA, 1983A, Receptor Model Technical  Series,  Volume IV:  Summary of
Particle Identification Techniques, EPA-450/4-83-018, U.S. EPA, Research
Triangle Park, NC 27711, June 1983.

U.S. EPA, 1984A, Receptor Model Technical  Series,  Volume V:  Source
Apportionment Techniques and Considerations in Combining Their Use, EPA-
450/4-84-020, U.S. EPA, Research Triangle  Park, NC 27711, July 1984.

U.S. EPA, 1984B, Interim Procedures for Evaluating Air Quality Models, EPA-
450/4-85-023, U.S. EPA, Research Triangle Park, NC 27711, September 1984.

U.S. EPA, 1985A, Technical Support Document for Residential Wood Combustion
EPA-450/4-85-XXX, U.S. EPA, Research Triangle  Park, NC 27711, June 1985, Draft.

U.S. EPA, 1985B, Receptor Model Technical  Series,  Volume VI:  Multivariate
Methods, EPA-450/4-85-007, U.S. EPA, Research  Triangle Park, NC 27711,
July 1985.

U.S. EPA, 1985C, Receptor Model Source Composition Library, EPA 450/4-85-
002, U.S. EPA, Research Triangle Park, NC  27711, November 1984.
                                     30

-------
U.S. EPA, 1986A, Guideline on Air Quality Models  (Revised)  U.S.  EPA,  Research
Triangle Park, NC  27711,  revision in preparation.

U.S. EPA, 1986B, Procedures for Estimating Probability  of Nonattainment  of
a PM10 NAAQS Using Total Suspended Particulate or PM10 Data, EPA-450/4-86-
017, U.S. EPA, Research Triangle Park, NC 27711, December 1986.

U.S. EPA, 1986C, Guideline on the Identification  and Use of Air  Quality  Data
Affected by Exceptional Events, EPA-450/4-86-007, U.S. EPA, Research Triangle
Park, NC 27711, July 1986.

U.S. EPA, 1987B, Receptor Model Technical Series, Vol. III (Revised), Chemical
Mass Balance Receptor Model User's Manual, EPA-450/x-xx-xxx,  U.S.  EPA,
Research Triangle Park, NC 27711, May 1987.

U.S. EPA, 1987C, PM10 SIP Development Guideline, U.S. EPA, EPA 450/2-87-
001, Research Triangle Park, NC 27711, January 1987.

U.S. EPA, 1987D, Procedures for Reconciling Differences in  Receptor and
Dispersion Models, U.S. EPA, EPA 450/4-87-008, Research Triangle Park, NC
27711, May 1987.

Watson, J. G. (1979), "Chemical Element Balance Receptor Model Methodology
for Assessing the Sources of Fine and Total Suspended Particulate Matter in
Portland, Oregon," Ph.D. dissertation, Oregon Graduate Center, Beaverton, Oregon.

Watson, J. G., Lioy, P. J. and Mueller, P. K. (1983), "The Measurement
Process:  Precision, Accuracy and Validity," in Air Sampling  Instruments for
Evaluation of Atmospheric Contaminants. Sixth Edition,  American  Conference of
Governmental Industrial Hygienists, Cincinnati, OH.

Watson, J. G. and Robinson, N. F. (1984), "A Method to  Determine Accuracy
and Precision Required of Receptor Model Measurements," APCA/ASQC  Specialty
Conference on:  Quality Assurance in Air Pollution  Measurements.  Boulder, CO.

Watson, J. G. and Robinson, N. F. (1984), "A Method to Specify Measurements
for Receptor Models," Proceedings of the National Symposium on Recent Advances
in Pollutant Monitoring of Ambient Air and Stationary  Sources, U.S. EPA,
Research Triangle Park, NC, April, 1984.

Watson, J. G., Cooper, J. A., and Huntzicker, J. J. (1984), "The Effective
Variance Weighting for Least Squares Calculations Applied to the Mass
Balance Receptor Model," Atm. Env., 18:1347-1355.

Watson, J. G. and Chow, J. C. (1986), "Volume 4 - An Evaluation  of Ambient
Aerosol Chemistry in the Western United States,"  Draft  Report of Western
States and Deposition Project, Phase I of SYSAPP-86-129, prepared  for
Western Governor's Association, Denver, Colorado.
                                     31

-------
                                 APPENDIX A
            Determining Source Profiles  and  Their Uncertainties

      Source profile variability needs to be assessed  on  a  case-by-case
 basis by reviewing the nature of the fuels  and  raw materials  used by the
 sources.  Also,  values and uncertainties of those species  which  are the
 most influential  in the "fit" should be carefully examined to ensure that
 both the ambient  data and source profiles are  accurate,  valid and precise.
 The SSCONT command (see Section 3.3.6)  shows the percent contribution of
 each source to each species in the fit.  This  provides  some indication of
 each species' influence on the fit.

      Most combustion processes use fuels with  variable  chemical  compositions.
 Coal and fuel  oil  combustion source profiles may change  substantially when
 the origin of the fuel changes (e.g., South American  vs. Middle  East Crude).
 Process upsets and control  device malfunctions  also affect source profiles,
 but usually not as much as they affect the mass emission rates.  Manufacturing
 process emissions may be stable over time,  but  the use  of  different raw
 materials or changing product specifications could cause source  profiles to
 vary.

      The most accurate and reasonably  "certain" information about sources
 in an airshed is  derived from samples of the effluents  from those sources,
 preferably at the same time that the receptor  measurements are being
 taken.  This may  only be possible during a  comprehensive (Level  III) CMB
 application.  The alternatives are to measure  the source profiles from
 representative sources at representative times  in the airshed under study,
 or to use profiles measured on similar  sources  in other  airsheds  and compiled
 in a source composition library (U.S. EPA,  1985C).

      Source sampling to obtain source profile information is not as
 complicated as sampling to determine the mass emission rate, because
 source profiles require only that species be characterized as a percentage
 of the total mass of sample collected, not in terms of their absolute
 emission rate.  Hot exhaust sampling, diluted (cooled) exhaust sampling,
 airborne plume sampling, ground-based plume sampling, and grab sampling
 methods have been developed and are applicable to different source types
 (Chow et al., 1986A).

      "Grab sampling" is an especially useful and inexpensive  method of
 characterizing soil dust or storage piles.   The procedure  is  so  named
 because samples  of the effluent or raw  material are "grabbed" in  bulk
 instead of sampled from an exhaust stream.   These sources  can be  sampled
 in bulk and resuspended onto filters in the laboratory  so  that chemical
 analysis can be  performed.   In some cases,  it  can be  determined  that the
 effluent captured by the source's control device is chemically representative
 of the emissions.   In these cases, a bulk sample of the  captured  effluent
 may be likewise  resuspended.  The "grab sample" procedure  is  generally much
 less expensive than stack sampling and  can  yield acceptable profiles for
 some types of sources.
                                     A-l

-------
     Any source profile contains both measurement and analytical  uncertainty.
Use of any profile data which are not representative of  the source at  the
time the ambient sample was collected introduces  additional  (location  and/or
time) uncertainty in the source profiles.   The combination  of these (the over-
all uncertainty) is assigned to each species  in a source profile  and the CMB
model uses these uncertainties to weight  a  species'  influence on  the solution.
The uncertainty estimates for each species  in the source profile  are just
as important as the source profile values themselves.  Uncertainty estimates
can be based on the judgment of an engineer or scientist knowledgeable
about each particular source and its operating characteristics.   This
judgment takes advantage of natural  lower and upper  limits  to the possible
uncertainty.  The analytical uncertainty  derived  from error propagation
provides a lower limit of the uncertainty estimate in a  source profile.  An
upper limit is imposed by the constraint  that the sum of all  the  fractional
chemical compositions in a source profile cannot  exceed  unity, and random
uncertainties cannot be so large that this  might  occur.   A  value  between
these extremes is often appropriate for CMB analysis.  Repeated source
tests over a range of operating variables provide a better estimate of the
average profile of a source type, and the standard deviation of that average
is a good estimate of its uncertainty.
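
The repeated-test estimate described above can be sketched in a few lines of Python.  This is only an illustration: the function name `profile_from_tests`, the test values, and the choice of the sample standard deviation across tests (never smaller than the analytical uncertainty) are assumptions made here, not part of the CMB software.

```python
from statistics import mean, stdev


def profile_from_tests(tests, analytical_unc):
    """Average repeated source tests into one profile (hypothetical helper).

    tests          -- list of tests; each test is a list of mass fractions,
                      one value per species
    analytical_unc -- per-species analytical uncertainty, used as the lower
                      limit on the reported uncertainty
    """
    n_species = len(tests[0])
    fractions, uncertainties = [], []
    for i in range(n_species):
        values = [test[i] for test in tests]
        fractions.append(mean(values))
        # Sample standard deviation across tests, never below the
        # analytical (error-propagation) lower limit.
        uncertainties.append(max(stdev(values), analytical_unc[i]))
    # The fractional compositions in a profile cannot sum to more than unity.
    if sum(fractions) > 1.0:
        raise ValueError("profile fractions sum to more than 1")
    return fractions, uncertainties


# Three hypothetical tests of one source, two species each.
fractions, uncertainties = profile_from_tests(
    [[0.10, 0.02], [0.12, 0.04], [0.14, 0.03]],
    analytical_unc=[0.005, 0.015],
)
```

For the second species the scatter between tests (0.01) is smaller than the assumed analytical uncertainty (0.015), so the larger value is kept.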

      The EPA Source Composition Library  presents typical compositions and
uncertainties for the most common source  types.  These uncertainties may be
understated when the profile is used to represent conditions  at  another
location.  Uncertainties of magnitude similar to or even greater than those in
the Library should be assigned when source composition data without uncer-
tainties have been obtained.  The value of the CMB performance diagnostics
is substantially reduced without accurate estimates  of source profile
uncertainty.

     Previous applications of the CMB model suggest that typical uncertainties
in both receptor and source measurements of up to ±30% in each of the species
are tolerable; large uncertainties of ±50 to 100% or more for some species
may also be tolerable (Watson 1979).  As noted, the effective variance
weighted least squares algorithm incorporated in  the CMB model considers
both ambient and source data uncertainties.  This fitting procedure gives
less emphasis to those highly variable elements in the fitting process.   If
most elements in a source are highly variable, the source contribution
estimate for that source is likely to have  a  high uncertainty.
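
The weighting idea can be illustrated with a short sketch.  It follows the general effective variance approach cited above (Watson et al., 1984), in which each species is weighted by the inverse of its ambient variance plus the profile variances scaled by the current source contributions.  The function names, the fixed iteration count, and the two-source data set are assumptions made for illustration; this is not the code of the CMB program.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def effective_variance_fit(conc, conc_unc, prof, prof_unc, n_iter=20):
    """Iterative effective-variance weighted least squares (sketch)."""
    n_species, n_sources = len(conc), len(prof[0])
    sce = [0.0] * n_sources  # first pass ignores profile uncertainty
    for _ in range(n_iter):
        # Effective variance of each species: ambient variance plus the
        # profile variances scaled by the current source contributions.
        v_eff = [
            conc_unc[i] ** 2
            + sum((sce[j] * prof_unc[i][j]) ** 2 for j in range(n_sources))
            for i in range(n_species)
        ]
        # Weighted normal equations (F' V^-1 F) s = F' V^-1 c.
        lhs = [
            [sum(prof[i][j] * prof[i][k] / v_eff[i] for i in range(n_species))
             for k in range(n_sources)]
            for j in range(n_sources)
        ]
        rhs = [
            sum(prof[i][j] * conc[i] / v_eff[i] for i in range(n_species))
            for j in range(n_sources)
        ]
        sce = solve_linear(lhs, rhs)
    return sce


# Hypothetical two-source, three-species case built so that the exact
# answer is 50 and 5 ug/m3.
profiles = [[0.20, 0.01], [0.05, 0.30], [0.10, 0.10]]
profile_unc = [[0.02, 0.002], [0.01, 0.03], [0.01, 0.02]]
ambient = [10.05, 4.00, 5.50]
ambient_unc = [0.5, 0.2, 0.3]
estimates = effective_variance_fit(ambient, ambient_unc, profiles, profile_unc)
```

Because a species with a large effective variance receives a small weight, a highly variable element has little influence on the estimates, which is the behavior described in the paragraph above.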
                                    A-2

-------
       UNCERTAINTY/SIMILARITY CLUSTERS
               SUM OF COMB. SOURCES

SPECIES CONCENTRATIONS -  SITE:PACS2        DATE:0124 78   SIZE:COARSE
SAMPLE DURATION        24      START HOUR         0
       R SQUARE       .99    PERCENT MASS     101.8
     CHI SQUARE      1.76              DF         9

SPECIES I  -------MEAS-------  -------CALC-------  -RATIO C/M-  -RATIO R/U-
 1 TOT     73.30000+- 1.10000   74.59217+- 4.83016   1.02+-  .07  TOT    .26
 9 F   *     .03600+-  .01700     .09646+-  .05719   2.68+- 2.03  F     1.01
11 NA  *     .97000+-  .14000    1.05442+-  .05228   1.09+-  .17  NA     .56
12 MG        .85000+-  .12000     .90611+-  .07983   1.07+-  .18  MG     .39
13 AL  *    4.80000+-  .16000    4.40322+-  .40123    .92+-  .09  AL    -.92
14 SI  *   14.50000+-  .50000   15.46888+-  .82643   1.07+-  .07  SI    1.00
16 S         .56000+-  .14000     .52574+-  .04939    .94+-  .25  S     -.23
17 CL  *     .33000+-  .05000     .32922+-  .06526   1.00+-  .25  CL    -.01
19 K   *     .54000+-  .03000     .60618+-  .03237   1.12+-  .09  K     1.50
20 CA  *    2.00000+-  .07000    1.75155+-  .12300    .88+-  .07  CA   -1.76
22 TI  *     .40000+-  .02000     .55770+-  .09907   1.39+-  .26  TI    1.56
23 V   *     .02600+-  .00200     .02551+-  .00349    .98+-  .15  V     -.12
24 CR        .02600+-  .00200     .02529+-  .00937    .97+-  .37  CR    -.07
25 MN  *     .08300+-  .00400     .08149+-  .00779    .98+-  .11  MN    -.17
26 FE  *    3.50000+-  .13200    3.23851+-  .16125    .93+-  .06  FE   -1.25
28 NI  *     .02200+-  .00300     .02248+-  .00403   1.02+-  .23  NI     .09
29 CU        .05200+-  .00400     .02541+-  .00689    .49+-  .14  CU    -3.3
30 ZN        .11000+-  .01000     .07245+-  .02069    .66+-  .20  ZN   -1.63
35 BR  *     .22000+-  .02000     .15044+-  .04846    .68+-  .23  BR   -1.33
82 PB  *     .62000+-  .07000     .76834+-  .11825   1.24+-  .24  PB    1.08
91 OC  *   13.10000+- 4.30000   13.10000+- 1.04393   1.00+-  .34  OC     .00
92 EC  *    1.70000+-  .90000    1.14293+-  .45013    .67+-  .44  EC    -.55
93 SO4 *    1.70000+-  .50000    1.70000+-  .15074   1.00+-  .31  SO4    .00
94 NO3 *    1.60000+-  .30000    1.60000+-  .17316   1.00+-  .22  NO3    .00
                            B-2

-------
                                 APPENDIX B
            Examples and Quick Guide for Identifying Problems
                          and Corrective Action

     There are five situations which, once they have been identified, can
be addressed to improve the performance of the CMB.  If they are not
satisfactorily addressed, the model cannot be considered valid for a par-
ticular application or the source contribution estimates will have to be
used with unacceptable uncertainties.  The situations are:  1) incorrect
ambient data; 2) incorrect source profiles; 3) missing source in the
solution; 4) profile uncertainty/collinearity; and 5) noncontributing source
in the solution.

     The following provides a Quick  Guide and examples to  illustrate  the
process of using the model's statistics and diagnostics  to help  identify
these situations.  The indicators proposed in these examples are derived
from a consensus of experienced model users and should be considered as
"guides", not as "rules".  They are not a "cure-all".  They are only
included to assist the discovery of errors in the input data which may
cause the model to provide incorrect Source Contribution Estimates, and they
may be modified and improved in future revisions to this Protocol as a wider
range of experience is gained in practical applications.

     Example B is assumed to be the "correct" solution for the data set
used in these examples.  Examples B.1 - B.5 show how the results would appear
if the data were modified to include (one at a time) the situations identified
above.  Example B.6 shows a composite of several of these situations.

EXAMPLE B:  THE "CORRECT" SOLUTION

SOURCE CONTRIBUTION ESTIMATES -  SITE:PACS2        DATE:0124 78   SIZE:COARSE
SAMPLE DURATION        24      START HOUR         0
       R SQUARE       .99    PERCENT MASS     101.8
     CHI SQUARE      1.76              DF         9

SOURCE
 * TYPE    SCE(UG/M3)   STD ERR     TSTAT
 3 UDUST      55.0169    1.8952   29.0298
 4 AUTPB       2.8220     .5755    4.9033
 5 RDOIL        .2834     .0811    3.4932
 6 VBRN1       3.9257    1.5100    2.5998
11 ALPRO       2.1796     .9945    2.1917
13 FERMN        .1251     .0531    2.3535
17 SO4         1.3812     .5240    2.6357
18 NO3         1.3541     .3545    3.8194
19 OC          7.5042    4.5103    1.6638

MEASURED CONCENTRATION FINE/COARSE/TOTAL:
  42.60000+-    .600/  73.30000+-   1.100/ 115.90000+-   1.253
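
The summary statistics in this listing can be cross-checked with a few lines of Python.  The two relationships used below (PERCENT MASS as the sum of the source contribution estimates divided by the measured coarse mass, and TSTAT as the estimate divided by its standard error) are inferred from the printed values rather than taken from the program listing, so treat them as assumptions.

```python
# Source contribution estimates and standard errors as printed in Example B.
sce = {
    "UDUST": (55.0169, 1.8952),
    "AUTPB": (2.8220, 0.5755),
    "RDOIL": (0.2834, 0.0811),
    "VBRN1": (3.9257, 1.5100),
    "ALPRO": (2.1796, 0.9945),
    "FERMN": (0.1251, 0.0531),
    "SO4": (1.3812, 0.5240),
    "NO3": (1.3541, 0.3545),
    "OC": (7.5042, 4.5103),
}
measured_mass = 73.3  # measured coarse-fraction mass, ug/m3

# Percent of the measured mass explained by the fitted sources.
percent_mass = 100.0 * sum(value for value, _ in sce.values()) / measured_mass

# Ratio of each estimate to its standard error.
tstat = {name: value / err for name, (value, err) in sce.items()}
```

With the printed inputs, `percent_mass` rounds to the 101.8 shown in the listing, and the OC contribution has the lowest ratio of estimate to standard error.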
                                    B-l

-------
PROBLEM B.1:  INCORRECT AMBIENT DATA

INDICATION:

         o RATIO R/U << -2.0 for a species suggests either the ambient
           data are high or the profile data are low for the flagged species;

         o RATIO R/U >> 2.0 for a species would imply that the ambient
           data are low or the profile data for that species are high.  If
           profile data are suspect, see Section 3.5.2 and Problem B.2.

ACTION:    1. Review the uncertainty assigned to the species with the high
              residuals.  Make any justifiable and appropriate changes and
              rerun the CMB.  If this reduces the RATIO R/U, Step 2 is not
              necessary.

           2. Delete the suspect species from the list of fitting species
              and rerun.  If the SCE changes by one standard error, do not
              use this species in the fit until it has been remeasured.


EXAMPLE B.1:  IDENTIFYING INCORRECT AMBIENT DATA

SUMMARY:  A CHI SQUARE >> 4 and a % MASS of ~120% suggest that the fit
is not satisfactory.  The RATIO R/U for Si was << -2.0, implying high ambient
data for that species (or low profile data for the major source of that
species, or absence of a source of Si in the source list).  The profile data
were checked and were believed to be correct.  The ambient data were then
reviewed and it was found that the Si ambient value was erroneously entered
as 29 ug/m3 for this example when the correct value was 14.5.  Notice that
the SCE's for UDUST and ALPRO are significantly changed with respect to
Example B.  The high ambient Si data raised the UDUST SCE to 69.6 ug/m3 and
lowered the ALPRO SCE.
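
The two diagnostic ratios can be reproduced from the printed Si values for this example (measured 29.0 +- .5, calculated 19.54129 +- 1.04456).  The formulas below are inferred from the tabulated output, not quoted from the program documentation, and the function names are hypothetical: RATIO C/M is taken as calculated over measured, and RATIO R/U as the residual divided by the root-sum-square of the two uncertainties.

```python
from math import sqrt


def ratio_c_over_m(calc, meas):
    """Calculated-to-measured concentration ratio."""
    return calc / meas


def ratio_r_over_u(calc, calc_unc, meas, meas_unc):
    """Residual (calculated minus measured) over its combined uncertainty."""
    return (calc - meas) / sqrt(calc_unc ** 2 + meas_unc ** 2)


# Si in Example B.1, with the erroneous ambient value of 29 ug/m3.
si_cm = ratio_c_over_m(19.54129, 29.0)
si_ru = ratio_r_over_u(19.54129, 1.04456, 29.0, 0.5)
```

The large negative R/U for Si is the flag that leads to the review of the ambient data described in the summary.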
SOURCE CONTRIBUTION ESTIMATES -  SITE:PACS2        DATE:0124 78   SIZE:COARSE
SAMPLE DURATION        24      START HOUR         0
       R SQUARE       .93    PERCENT MASS     118.4
     CHI SQUARE     11.54              DF         9

SOURCE
 * TYPE    SCE(UG/M3)   STD ERR     TSTAT
 3 UDUST      69.5898    2.1824   31.8870
 4 AUTPB       2.6964     .6017    4.4809
 5 RDOIL        .2797     .0801    3.4912
 6 VBRN1       3.4292    1.4281    2.4012
11 ALPRO        .5877     .8251     .7123
13 FERMN        .0257     .0580     .4433
17 SO4         1.4130     .5244    2.6947
18 NO3         1.3834     .3490    3.9634
19 OC          7.3824    4.5131    1.6358
                                    B-4

-------
ENTER COMMAND
SSCONT
                             CALC SPECIES(PER SOURCE)
         INDIVIDUAL RATIO -  ------------------------
                             MEAS SPECIES(ALL SOURCES)
SPEC/SOURCE   3      4      5      6     11     13     17     18     19
 1 TOTAL   .751   .038   .004   .054   .030   .002   .019   .018   .102
 9 F       .122   .000   .004   .000  2.543   .010   .000   .000   .000
11 NA      .993   .000   .010   .026   .054   .004   .000   .000   .000
12 MG      .997   .000   .000   .000   .069   .000   .000   .000   .000
13 AL      .756   .006   .000   .012   .142   .000   .000   .000   .000
14 SI     1.062   .002   .000   .002   .000   .000   .000   .000   .000
16 S       .000   .020   .067   .034   .000   .004   .814   .000   .000
17 CL      .000   .257   .000   .660   .079   .002   .000   .000   .000
19 K      1.049   .004   .001   .044   .000   .024   .000   .000   .000
20 CA      .825   .018   .002   .021   .009   .001   .000   .000   .000
22 TI     1.389   .000   .001   .000   .004   .000   .000   .000   .000
23 V       .571   .000   .375   .000   .034   .001   .000   .000   .000
24 CR      .952   .000   .005   .000   .013   .002   .000   .000   .000
25 MN      .663   .000   .002   .057   .000   .261   .000   .000   .000
26 FE      .901   .017   .002   .002   .002   .001   .000   .000   .000
28 NI      .100   .023   .690   .000   .208   .000   .000   .000   .000
29 CU      .317   .040   .004   .068   .059   .001   .000   .000   .000
30 ZN      .550   .090   .010   .000   .002   .007   .000   .000   .000
35 BR      .020   .641   .000   .009   .012   .001   .000   .000   .000
82 PB      .328   .910   .001   .000   .000   .000   .000   .000   .000
91 OC      .140   .108   .002   .177   .000   .001   .000   .000   .573
92 EC      .502   .063   .005   .081   .021   .001   .000   .000   .000
93 SO4     .024   .022   .080   .037   .022   .003   .812   .000   .000
94 NO3     .007   .016   .001   .125   .000   .004   .000   .846   .000
                                     B-3

-------
MEASURED CONCENTRATION FINE/COARSE/TOTAL:
  42.60000+-    .600/  73.30000+-   1.100/ 115.90000+-   1.253
       UNCERTAINTY/SIMILARITY CLUSTERS
               SUM OF COMB. SOURCES

SPECIES CONCENTRATIONS -  SITE:PACS2        DATE:0124 78   SIZE:COARSE
SAMPLE DURATION        24      START HOUR         0
       R SQUARE       .93    PERCENT MASS     118.4
     CHI SQUARE     11.54              DF         9

SPECIES I  -------MEAS-------  -------CALC-------  -RATIO C/M-  -RATIO R/U-
 1 TOT     73.30000+- 1.10000   86.78726+- 4.94793   1.18+-  .07  TOT   2.66
 9 F   *     .03600+-  .01700     .03048+-  .03639    .85+- 1.09  F     -.14
11 NA  *     .97000+-  .14000    1.26480+-  .06380   1.30+-  .20  NA    1.92
12 MG        .85000+-  .12000    1.08755+-  .09877   1.28+-  .21  MG    1.53
13 AL  *    4.80000+-  .16000    4.85758+-  .48899   1.01+-  .11  AL     .11
14 SI  *   29.00000+-  .50000   19.54129+- 1.04456    .67+-  .04  SI   -8.17
16 S         .56000+-  .14000     .53117+-  .05034    .95+-  .25  S     -.19
17 CL  *     .33000+-  .05000     .27837+-  .05884    .84+-  .22  CL    -.67
19 K   *     .54000+-  .03000     .74277+-  .04039   1.38+-  .11  K     4.03
20 CA  *    2.00000+-  .07000    2.16760+-  .15426   1.08+-  .09  CA     .99
22 TI  *     .40000+-  .02000     .70362+-  .12529   1.76+-  .33  TI    2.39
23 V   *     .02600+-  .00200     .02865+-  .00407   1.10+-  .18  V      .58
24 CR        .02600+-  .00200     .03155+-  .01183   1.21+-  .46  CR     .46
25 MN  *     .08300+-  .00400     .07828+-  .00871    .94+-  .11  MN    -.49
26 FE  *    3.50000+-  .13200    4.06171+-  .20300   1.16+-  .07  FE    2.32
28 NI  *     .02200+-  .00300     .01949+-  .00400    .89+-  .22  NI    -.50
29 CU        .05200+-  .00400     .02697+-  .00851    .52+-  .17  CU   -2.66
30 ZN        .11000+-  .01000     .08731+-  .02599    .79+-  .25  ZN    -.81
35 BR  *     .22000+-  .02000     .14299+-  .04630    .65+-  .22  BR   -1.53
82 PB  *     .62000+-  .07000     .79707+-  .13206   1.29+-  .26  PB    1.18
91 OC  *   13.10000+- 4.30000   13.10000+- 1.09582   1.00+-  .34  OC     .00
92 EC  *    1.70000+-  .90000    1.31958+-  .55518    .78+-  .52  EC    -.36
93 SO4 *    1.70000+-  .50000    1.70000+-  .15206   1.00+-  .31  SO4    .00
94 NO3 *    1.60000+-  .30000    1.60000+-  .16353   1.00+-  .21  NO3    .00
                                    8-5

-------
ENTER COMMAND
SSCONT
                             CALC SPECIES(PER SOURCE)
         INDIVIDUAL RATIO -  ------------------------
                             MEAS SPECIES(ALL SOURCES)
SPEC/SOURCE   3      4      5      6     11     13     17     18     19
 1 TOTAL   .949   .037   .004   .047   .008   .000   .019   .019   .101
 9 F       .155   .000   .004   .000   .686   .002   .000   .000   .000
11 NA     1.255   .000   .010   .023   .015   .001   .000   .000   .000
12 MG     1.261   .000   .000   .000   .019   .000   .000   .000   .000
13 AL      .957   .006   .000   .010   .038   .000   .000   .000   .000
14 SI      .672   .001   .000   .001   .000   .000   .000   .000   .000
16 S       .000   .019   .066   .029   .000   .001   .833   .000   .000
17 CL      .000   .245   .000   .577   .021   .000   .000   .000   .000
19 K      1.327   .004   .001   .038   .000   .005   .000   .000   .000
20 CA     1.044   .017   .002   .018   .002   .000   .000   .000   .000
22 TI     1.757   .000   .001   .000   .001   .000   .000   .000   .000
23 V       .723   .000   .370   .000   .009   .000   .000   .000   .000
24 CR     1.204   .000   .005   .000   .004   .000   .000   .000   .000
25 MN      .838   .000   .002   .050   .000   .054   .000   .000   .000
26 FE     1.139   .016   .002   .002   .001   .000   .000   .000   .000
28 NI      .127   .022   .681   .000   .056   .000   .000   .000   .000
29 CU      .401   .038   .004   .059   .016   .000   .000   .000   .000
30 ZN      .696   .086   .010   .000   .000   .001   .000   .000   .000
35 BR      .025   .613   .000   .008   .003   .000   .000   .000   .000
82 PB      .415   .870   .000   .000   .000   .000   .000   .000   .000
91 OC      .177   .103   .001   .154   .000   .000   .000   .000   .564
92 EC      .634   .060   .005   .071   .006   .000   .000   .000   .000
93 SO4     .030   .021   .079   .032   .006   .001   .831   .000   .000
94 NO3     .009   .015   .001   .109   .000   .001   .000   .865   .000
                                     B-6

-------
PROBLEM B.2:   GROSS ERROR IN A SOURCE PROFILE

INDICATION:

          o  SCE that is inconsistent with preliminary analyses or physical
             evidence;

          o  one or more species has a "high (pos. or neg.)" residual which
             cannot be attributed to incorrect ambient data; further evidence
             if the SSCONT reveals that one source dominates that species.

ACTION:   Review profile data for the suspect species carefully.  Correct
          or remeasure profile if necessary.


EXAMPLE B.2:  GROSS ERROR IN A SOURCE PROFILE

SUMMARY:  The Ca RATIO R/U is << -2.0, implying high ambient data or low profile
data for Ca and several other species.  In SSCONT, the Ca is dominated by
UDUST.  Both the ambient and profile data were checked and no problem was
found with Ca.  The RATIO R/U for Fe was >> 2.0, implying low ambient data
or high profile data for Fe.  The data were reviewed and Fe was
found to be high in the UDUST profile (.114 instead of .057).  This example
points out that the diagnostics can only suggest possible sources of error.
One real data error (in Fe) caused other species (Ca) to "appear" to be
incorrect.
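
The SSCONT entry for Fe from UDUST in this example can be reproduced with the short sketch below.  It assumes that each SSCONT entry is the source contribution estimate times the profile fraction of the species, divided by the measured concentration of that species; this reading is inferred from the printed values, and the function name is hypothetical.  The UDUST estimate is entered to three decimals (38.868) because the last printed digit is illegible.

```python
def species_source_ratio(sce, profile_fraction, measured_conc):
    """Fraction of a measured species attributed to one source (sketch)."""
    return sce * profile_fraction / measured_conc


# UDUST estimate (ug/m3), erroneous Fe fraction in the UDUST profile,
# and measured coarse Fe (ug/m3), all taken from Example B.2.
fe_from_udust = species_source_ratio(38.868, 0.114, 3.5)
```

A value above 1 means that UDUST alone accounts for more Fe than was measured, which is consistent with the positive Fe residual discussed in the summary.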
SOURCE CONTRIBUTION ESTIMATES -  SITE:PACS2        DATE:0124 78   SIZE:COARSE
SAMPLE DURATION        24      START HOUR         0
       R SQUARE       .94    PERCENT MASS      86.8
     CHI SQUARE     11.51              DF         9

SOURCE
 * TYPE    SCE(UG/M3)   STD ERR     TSTAT
 3 UDUST      38.868     1.2080   32.1758
 4 AUTPB       2.7047     .5265    5.1368
 5 RDOIL        .2403     .0878    2.7354
 6 VBRN1       7.1187    2.1008    3.3886
11 ALPRO       5.7457    1.2366    4.6463
13 FERMN        .2239     .0638    3.5111
17 SO4         1.2995     .5278    2.4620
18 NO3         1.1902     .4004    2.9723
19 OC          6.2125    4.5956    1.3518

MEASURED CONCENTRATION FINE/COARSE/TOTAL:
  42.60000+-    .600/  73.30000+-   1.100/ 115.90000+-   1.253
                                    8-7

-------
       UNCERTAINTY/SIMILARITY CLUSTERS
               SUM OF COMB. SOURCES

SPECIES CONCENTRATIONS -  SITE:PACS2        DATE:0124 78   SIZE:COARSE
SAMPLE DURATION        24      START HOUR         0
       R SQUARE       .94    PERCENT MASS      86.8
     CHI SQUARE     11.51              DF         9

SPECIES I  -------MEAS-------  -------CALC-------  -RATIO C/M-  -RATIO R/U-
 1 TOT     73.30000+- 1.10000   63.60400+- 4.72030    .87+-  .07  TOT  -2.00
 9 F   *     .03600+-  .01700     .24521+-  .13036   6.81+- 4.84  F     1.59
11 NA  *     .97000+-  .14000     .87972+-  .05058    .91+-  .14  NA    -.61
12 MG        .85000+-  .12000     .75371+-  .06538    .89+-  .15  MG    -.70
13 AL  *    4.80000+-  .16000    4.49870+-  .39619    .94+-  .09  AL    -.71
14 SI  *   14.50000+-  .50000   10.97881+-  .58834    .76+-  .05  SI   -4.56
16 S         .56000+-  .14000     .50959+-  .05195    .91+-  .25  S     -.34
17 CL  *     .33000+-  .05000     .54612+-  .10871   1.65+-  .41  CL    1.81
19 K   *     .54000+-  .03000     .46919+-  .02488    .87+-  .07  K    -1.82
20 CA  *    2.00000+-  .07000    1.32928+-  .09279    .66+-  .05  CA   -5.77
22 TI  *     .40000+-  .02000     .39731+-  .07006    .99+-  .18  TI    -.04
23 V   *     .02600+-  .00200     .02111+-  .00273    .81+-  .12  V    -1.44
24 CR        .02600+-  .00200     .01862+-  .00669    .72+-  .26  CR   -1.06
25 MN  *     .08300+-  .00400     .08626+-  .00988   1.04+-  .13  MN     .31
26 FE  *    3.50000+-  .13200    4.53500+-  .11542   1.30+-  .06  FE    5.90
28 NI  *     .02200+-  .00300     .02699+-  .00466   1.23+-  .27  NI     .90
29 CU        .05200+-  .00400     .02835+-  .00598    .55+-  .12  CU   -3.29
30 ZN        .11000+-  .01000     .05500+-  .01482    .50+-  .14  ZN   -3.08
35 BR  *     .22000+-  .02000     .14940+-  .04763    .68+-  .23  BR   -1.37
82 PB  *     .62000+-  .07000     .68511+-  .10008   1.11+-  .20  PB     .53
91 OC  *   13.10000+- 4.30000   13.10000+- 1.06037   1.00+-  .34  OC     .00
92 EC  *    1.70000+-  .90000    1.05713+-  .39077    .62+-  .40  EC    -.66
93 SO4 *    1.70000+-  .50000    1.70000+-  .16139   1.00+-  .31  SO4    .00
94 NO3 *    1.60000+-  .30000    1.60000+-  .24324   1.00+-  .24  NO3    .00
                                   B-8

-------
ENTER COMMAND
SSCONT
                             CALC SPECIES(PER SOURCE)
         INDIVIDUAL RATIO -  ------------------------
                             MEAS SPECIES(ALL SOURCES)
SPEC/SOURCE   3      4      5      6     11     13     17     18     19
 1 TOTAL   .530   .037   .003   .097   .078   .003   .018   .016   .085
 9 F       .086   .000   .004   .000  6.703   .018   .000   .000   .000
11 NA      .701   .000   .009   .048   .142   .007   .000   .000   .000
12 MG      .704   .000   .000   .000   .183   .000   .000   .000   .000
13 AL      .534   .006   .000   .021   .375   .000   .000   .000   .000
14 SI      .751   .002   .000   .004   .000   .000   .000   .000   .000
16 S       .000   .019   .057   .061   .000   .007   .766   .000   .000
17 CL      .000   .246   .000  1.197   .209   .003   .000   .000   .000
19 K       .741   .004   .001   .079   .000   .044   .000   .000   .000
20 CA      .583   .017   .002   .038   .023   .001   .000   .000   .000
22 TI      .981   .000   .001   .000   .011   .000   .000   .000   .000
23 V       .404   .000   .318   .000   .088   .002   .000   .000   .000
24 CR      .673   .000   .004   .000   .035   .004   .000   .000   .000
25 MN      .468   .000   .001   .103   .000   .467   .000   .000   .000
26 FE     1.266   .016   .002   .004   .006   .001   .000   .000   .000
28 NI      .071   .022   .585   .000   .548   .000   .000   .000   .000
29 CU      .224   .038   .003   .123   .155   .002   .000   .000   .000
30 ZN      .389   .086   .009   .000   .005   .012   .000   .000   .000
35 BR      .014   .615   .000   .017   .031   .002   .000   .000   .000
82 PB      .232   .872   .000   .000   .000   .000   .000   .000   .000
91 OC      .099   .103   .001   .321   .000   .002   .000   .000   .474
92 EC      .354   .060   .004   .147   .054   .002   .000   .000   .000
93 SO4     .017   .021   .068   .067   .057   .006   .764   .000   .000
94 NO3     .005   .015   .001   .227   .000   .008   .000   .744   .000
                                   B-9

-------
PROBLEM B.3:   MISSING SOURCE

INDICATION:

          o  High Chi-Square;

          o  Low percent mass explained;

          o  RATIO R/U << -2.0 (a "high negative" residual) for one or more
             species that are known to be present in the suspect source.

ACTION: Add sources to the fit and reevaluate.
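
The indicators used in Problems B.1 through B.3 can be collected into a simple screening helper.  This is a hedged sketch: the function name, the returned keys, and the percent-mass range of 80 to 120 are assumptions chosen for illustration, while the limits of 4 for chi-square and 2.0 for RATIO R/U follow the wording of the examples in this appendix.  As noted at the start of the appendix, such limits are guides, not rules.  The inputs shown are a subset of the values printed for Example B.2.

```python
def screen_fit(chi_square, percent_mass, ratio_ru,
               chi_limit=4.0, mass_range=(80.0, 120.0), ru_limit=2.0):
    """Flag the indicators discussed in Appendix B (illustrative only)."""
    low, high = mass_range
    return {
        "high_chi_square": chi_square > chi_limit,
        "low_percent_mass": percent_mass < low,
        "high_percent_mass": percent_mass > high,
        # Calculated well below measured: ambient high, profile low,
        # or a source of that species missing from the fit.
        "large_negative_ru": [s for s, r in ratio_ru.items() if r < -ru_limit],
        # Calculated well above measured: ambient low or profile high.
        "large_positive_ru": [s for s, r in ratio_ru.items() if r > ru_limit],
    }


# Summary statistics and selected R/U ratios printed for Example B.2.
flags = screen_fit(
    chi_square=11.51,
    percent_mass=86.8,
    ratio_ru={"SI": -4.56, "CA": -5.77, "FE": 5.90, "NI": 0.90, "PB": 0.53},
)
```

The flagged species only point to where the ambient data, source profiles, and source list should be reviewed; they do not identify which of those inputs is in error.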
EXAMPLE B.3:   IDENTIFYING A MISSING SOURCE
SUMMARY:  The Chi -Square is hig
« -2.0.  Using the flowchart p
and no problems were found.  Th
for missing sources.  These low
motor vehicle exhaust source.
high, in fact), suggesting that
The absence of motor vehicle ex
butions to be different from th
has few species in common with
                               h  and  two  species,  Br  and
                               rocedure in  Figure  1,  all
                                                         Pb  have RATIO R/U
                                                         data was rechecked
                               us,  the source  list  was  reviewed to check
                                ratios for 3r  and Pb suggest  a  missing
                               ~he  percent mass  explained  is  not low (a  bit
                                a missing source may be a  minor contributor.
                               haust does not  cause the other source contri-
                               e "correct" ones  because its  source profile
                               cue  other sources.
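
The residual and Chi-Square diagnostics used in this example can be sketched in a few lines of Python. This is an illustration only, not the CMB program's own code. The formula assumed for RATIO R/U (calculated minus measured, divided by the root-sum-square of the two uncertainties) and the assumed Chi-Square (sum of squared R/U over the fitting species, divided by DF) are not stated in this appendix; they are inferred because they reproduce the values printed in the Example B.3 listing. The function and variable names are hypothetical.

```python
import math


def ratio_r_u(meas, meas_unc, calc, calc_unc):
    """Residual (calculated - measured) over the combined uncertainty.

    Assumed form; it reproduces the RATIO R/U column of the listing.
    """
    return (calc - meas) / math.sqrt(calc_unc ** 2 + meas_unc ** 2)


def reduced_chi_square(r_u_values, degrees_of_freedom):
    """Sum of squared R/U over the fitting species, divided by DF (assumed form)."""
    return sum(r * r for r in r_u_values) / degrees_of_freedom


# (measured, measured unc., calculated, calculated unc.) in ug/m3,
# copied from the Example B.3 species listing.
species = {
    "BR": (0.22000, 0.02000, 0.01138, 0.00832),
    "PB": (0.62000, 0.07000, 0.20541, 0.08319),
    "FE": (3.50000, 0.13200, 3.20850, 0.16094),
}
ratios = {name: ratio_r_u(*values) for name, values in species.items()}

# Species with a "high negative" residual point toward a missing source.
flagged = sorted(name for name, r in ratios.items() if r < -2.0)

# Printed R/U values of the 18 fitting species (those marked * in the listing).
fitting_r_u = [1.17, 0.82, -0.21, 1.11, 0.27, 1.75, -1.72, 1.60, -0.19,
               -0.20, -1.40, 0.14, -9.63, -3.81, 0.00, -0.57, 0.00, 0.00]
chi_square = reduced_chi_square(fitting_r_u, degrees_of_freedom=10)

print(flagged, round(chi_square, 2))
```

Under these assumptions, Br and Pb alone account for most of the Chi-Square, which is why the example points to a single missing source rather than a general data problem.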

SOURCE CONTRIBUTION ESTIMATES -  SITE:PACS2    DATE:0124 78   SIZE:COARSE
SAMPLE DURATION        24      START HOUR         0
       R SQUARE       .91    PERCENT MASS     102.5
     CHI SQUARE     12.16              DF        10

SOURCE
 * TYPE     SCE(UG/M3)     STD ERR       TSTAT
 3 UDUST       55.4251      1.9200     28.8676
 5 RDOIL         .2620       .0831      3.1540
 6 VBRN1        5.7480      1.7679      3.2513
11 ALPRO        3.0749      1.1644      2.6408
13 FERMN         .1072       .0610      1.7588
17 SO4          1.3842       .5259      2.6322
18 NO3          1.2880       .3729      3.4536
19 OC           7.8295      4.5613      1.7165

MEASURED CONCENTRATION FINE/COARSE/TOTAL:
  42.60000+-    .600/  73.30000+-   1.100/ 115.90000+-   1.253
                                    B-10

-------
UNCERTAINTY/SIMILARITY CLUSTERS
SUM OF COMB. SOURCES

SPECIES CONCENTRATIONS -  SITE:PACS2    DATE:0124 78   SIZE:COARSE
SAMPLE DURATION        24      START HOUR         0
       R SQUARE       .91    PERCENT MASS     102.5
     CHI SQUARE     12.16              DF        10

SPECIES-I-M--------MEAS-----------------CALC-----------RATIO C/M------RATIO R/U
 1 TOT       73.30000+-  1.10000    75.11888+-  4.86535    1.02+-   .07   TOT     .36
 9 F    *      .03600+-   .01700      .13403+-   .08205    3.72+-  2.88   F      1.17
11 NA   *      .97000+-   .14000     1.09359+-   .05449    1.13+-   .17   NA      .82
12 MG          .85000+-   .12000      .93657+-   .07992    1.10+-   .18   MG      .60
13 AL   *     4.80000+-   .16000     4.70535+-   .41893     .98+-   .09   AL     -.21
14 SI   *    14.50000+-   .50000    15.57675+-   .83378    1.07+-   .07   SI     1.11
16 S           .56000+-   .14000      .52105+-   .05130     .93+-   .25   S      -.26
17 CL   *      .33000+-   .05000      .35636+-   .08534    1.08+-   .31   CL      .27
19 K    *      .54000+-   .03000      .61736+-   .03257    1.14+-   .09   K      1.75
20 CA   *     2.00000+-   .07000     1.75470+-   .12433     .88+-   .07   CA    -1.72
22 TI   *      .40000+-   .02000      .56247+-   .09977    1.41+-   .26   TI     1.60
23 V    *      .02600+-   .00200      .02523+-   .00341     .97+-   .15   V      -.19
24 CR          .02600+-   .00200      .02560+-   .00944     .98+-   .37   CR     -.04
25 MN   *      .08300+-   .00400      .08100+-   .00929     .98+-   .12   MN     -.20
26 FE   *     3.50000+-   .13200     3.20850+-   .16094     .92+-   .06   FE    -1.40
28 NI   *      .02200+-   .00300      .02271+-   .00403    1.03+-   .23   NI      .14
29 CU          .05200+-   .00400      .02634+-   .00714     .51+-   .14   CU    -3.13
30 ZN          .11000+-   .01000      .06291+-   .02051     .57+-   .19   ZN    -2.06
35 BR   *      .22000+-   .02000      .01138+-   .00832     .05+-   .04   BR    -9.63
82 PB   *      .62000+-   .07000      .20541+-   .08319     .33+-   .14   PB    -3.81
91 OC   *    13.10000+-  4.30000    13.10000+-  1.11467    1.00+-   .34   OC      .00
92 EC   *     1.70000+-   .90000     1.11920+-   .47212     .66+-   .45   EC     -.57
93 SO4  *     1.70000+-   .50000     1.70000+-   .15639    1.00+-   .31   SO4     .00
94 NO3  *     1.60000+-   .30000     1.60000+-   .20264    1.00+-   .23   NO3     .00
                            B-11

-------
ENTER COMMAND
SSCONT
                                       CALC SPECIES(PER SOURCE)
                  INDIVIDUAL RATIO = -------------------------
                                       MEAS SPECIES(ALL SOURCES)
SPEC  SOURCE       3       5       6      11      13      17      18      19
 1 TOTAL        .756    .004    .078    .042    .001    .019    .018    .107
 9 F            .123    .004    .000   3.587    .009    .000    .000    .000
11 NA          1.000    .009    .039    .076    .003    .000    .000    .000
12 MG          1.004    .000    .000    .098    .000    .000    .000    .000
13 AL           .762    .000    .017    .201    .000    .000    .000    .000
14 SI          1.070    .000    .004    .000    .000    .000    .000    .000
16 S            .000    .062    .049    .000    .003    .816    .000    .000
17 CL           .000    .000    .967    .112    .001    .000    .000    .000
19 K           1.057    .001    .064    .000    .021    .000    .000    .000
20 CA           .831    .002    .031    .012    .001    .000    .000    .000
22 TI          1.399    .001    .000    .006    .000    .000    .000    .000
23 V            .576    .347    .000    .047    .001    .000    .000    .000
24 CR           .959    .005    .000    .019    .002    .000    .000    .000
25 MN           .668    .001    .083    .000    .224    .000    .000    .000
26 FE           .907    .002    .003    .003    .001    .000    .000    .000
28 NI           .101    .638    .000    .294    .000    .000    .000    .000
29 CU           .320    .004    .099    .083    .001    .000    .000    .000
30 ZN           .554    .010    .000    .003    .006    .000    .000    .000
35 BR           .020    .000    .014    .017    .001    .000    .000    .000
82 PB           .331    .000    .000    .000    .000    .000    .000    .000
91 OC           .141    .001    .259    .000    .001    .000    .000    .598
92 EC           .505    .005    .118    .029    .001    .000    .000    .000
93 SO4          .024    .074    .054    .031    .003    .814    .000    .000
94 NO3          .007    .001    .183    .000    .004    .000    .805    .000
                                   B-12

-------
PROBLEM B.4:  COLLINEARITY

INDICATION:

          o Two or more sources listed in a U/S cluster

          o T-statistic < 2.0 for one or more sources in that cluster.
            If the T-STAT becomes > 2.0 when the species uncertainties in the
            profile for that source are arbitrarily reduced to a potentially
            achievable level, this indicates that the uncertainty in the source
            profile is at least partially responsible for the "apparent"
            collinearity.

ACTION:  Remedies for unacceptably high uncertainties due to collinearity
         can take four forms ranked from most to least desirable.  There is an
         additional  remedy if the collinearity is associated with high
         uncertainties as discussed above.

EXAMPLE B.4:  COLLINEARITY

SUMMARY:  VBRN2 is a source that was added to the source list to introduce
collinearity.  In this case, VBRN2 was actually not contributing but it
illustrates the possibility that two potentially legitimate sources may
have similar signatures causing collinearity.  Following the Figure 1
flowchart, all source profile and species data have been checked and no
missing sources are  apparent at this time.  The only indications of a
problem are the negative SCE for VBRN2 and the fact that VBRN1,  VBRN2 and
OC sources were all  flagged in U/S clusters, indicating collinearity.
The SSCONT screen suggests that removing Cl from the fit might reduce the
collinearity.  Other approaches would include deleting one or two of the
sources from the fit.  The uncertainty/similarity display suggests that
the combined contribution of sources 6 & 19 is 13.9 +- 5.1 and the sum of
6,7, and 19 is only  12.1 +- 4.7.  This suggests that VBRN2 (source #7) may
be an insignificant  contributing source and could be deleted.  Other
options for dealing  with collinearity are discussed in Section 3.5.4.
Only VBRN1's SCE is significantly affected by the addition of VBRN2, but
its STD. ERROR nearly triples with respect to the "true" solution (Example
B).  Notice that the "true" VBRN1 still falls within one uncertainty
interval of the VBRN1 contribution in this example.  This example graphically
illustrates the effects of collinearity on source contribution estimates
and their uncertainties.
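
The arithmetic behind this summary can be checked with a short Python sketch. It assumes only that TSTAT is the SCE divided by its standard error and that a cluster's combined contribution is the plain sum of its members' SCEs; both assumptions reproduce the printed Example B.4 values. The names `t_stat` and `cluster_sum` are hypothetical. The uncertainty of a combined contribution cannot be rebuilt this way, because it depends on the covariance between the estimates, which the listing does not print.

```python
# SCE and standard error (ug/m3) by source number, copied from the
# Example B.4 SOURCE CONTRIBUTION ESTIMATES listing.
sce = {3: 55.9124, 4: 2.7711, 5: 0.2748, 6: 8.0754, 7: -1.7924,
       11: 2.3581, 13: 0.1061, 17: 1.4063, 18: 1.1798, 19: 5.8962}
std_err = {3: 2.1142, 4: 0.5742, 5: 0.0838, 6: 4.3667, 7: 1.6701,
           11: 1.2212, 13: 0.0758, 17: 0.5395, 18: 0.4333, 19: 4.9430}


def t_stat(source):
    """T-statistic, assumed to be SCE divided by its standard error."""
    return sce[source] / std_err[source]


def cluster_sum(sources):
    """Combined contribution of the sources in one U/S cluster."""
    return sum(sce[s] for s in sources)


# Sources whose estimate is not clearly distinguishable from zero.
low_t = sorted(s for s in sce if abs(t_stat(s)) < 2.0)

# Combined contribution with and without VBRN2 (source 7).
without_vbrn2 = cluster_sum((6, 19))
with_vbrn2 = cluster_sum((6, 7, 19))

print(low_t, round(without_vbrn2, 3), round(with_vbrn2, 3))
```

Adding VBRN2 changes the combined contribution by far less than the printed uncertainty of either sum, which supports treating VBRN2 as an insignificant contributor.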
                                    B-13

-------
SOURCE CONTRIBUTION ESTIMATES -  SITE:PACS2    DATE:0124 78   SIZE:COARSE
SAMPLE DURATION        24      START HOUR         0
       R SQUARE       .99    PERCENT MASS     103.9
     CHI SQUARE      1.45              DF         8

SOURCE
 * TYPE     SCE(UG/M3)     STD ERR       TSTAT
 3 UDUST       55.9124      2.1142     26.4464
 4 AUTPB        2.7711       .5742      4.8263
 5 RDOIL         .2748       .0838      3.2791
 6 VBRN1        8.0754      4.3667      1.8493
 7 VBRN2       -1.7924      1.6701     -1.0733
11 ALPRO        2.3581      1.2212      1.9310
13 FERMN         .1061       .0758      1.4003
17 SO4          1.4063       .5395      2.6067
18 NO3          1.1798       .4333      2.7226
19 OC           5.8962      4.9430      1.1928

MEASURED CONCENTRATION FINE/COARSE/TOTAL:
  42.60000+-    .600/  73.30000+-   1.100/ 115.90000+-   1.253

       UNCERTAINTY/SIMILARITY CLUSTERS          SUM OF COMB. SOURCES
        6 19                                      13.972+-    5.144
        6  7 19                                   12.179+-    4.700
        7 11                                        .566+-    2.172
        6  7 11                                    8.641+-    3.305

SPECIES CONCENTRATIONS -  SITE:PACS2    DATE:0124 78   SIZE:COARSE
SAMPLE DURATION        24      START HOUR         0
       R SQUARE       .99    PERCENT MASS     103.9
     CHI-SQUARE      1.45              DF         8

SPECIES-I-M--------MEAS-----------------CALC-----------RATIO C/M------RATIO R/U
 1 TOT       73.30000+-  1.10000    76.18761+-  5.01175    1.04+-   .07   TOT     .56
 9 F    *      .03600+-   .01700      .09823+-   .09247    2.73+-  2.87   F       .66
11 NA   *      .97000+-   .14000     1.09454+-   .05457    1.13+-   .17   NA      .83
12 MG          .85000+-   .12000      .92472+-   .08303    1.09+-   .18   MG      .51
13 AL   *     4.80000+-   .16000     4.56913+-   .41389     .95+-   .09   AL     -.52
14 SI   *    14.50000+-   .50000    15.74725+-   .84354    1.09+-   .07   SI     1.27
16 S           .56000+-   .14000      .52360+-   .06079     .93+-   .26   S      -.24
17 CL   *      .33000+-   .05000      .38261+-   .15282    1.16+-   .50   CL      .33
19 K    *      .54000+-   .03000      .52175+-   .08708     .97+-   .17   K      -.20
20 CA   *     2.00000+-   .07000     1.80675+-   .12900     .90+-   .07   CA    -1.32
22 TI   *      .40000+-   .02000      .56560+-   .10069    1.41+-   .26   TI     1.61
23 V    *      .02600+-   .00200      .02552+-   .00349     .98+-   .15   V      -.12
                                   B-14
                                   B-14

-------
PROBLEM B.5:   NONCONTRIBUTING SOURCE IN FIT CAUSING COLLINEARITY

INDICATION:

          o  T-STAT between -2.0 and 2.0

          o  RATIO R/U >> 2.0 ("high positive" residual) for a species
             which is attributed to the suspect source by the SSCONT
             diagnostic

          o  Negative SCE

          o  Physical basis for the source's contribution is weak.

          o  SCE's, statistics and diagnostics do not change if the suspect
             source is deleted

ACTION:  Delete source from fit

EXAMPLE B.5:    REMOVING NONCONTRIBUTING SOURCES FROM THE FIT

SUMMARY:  All  source profile and species data  were reviewed and found to
be correct.   At this point, no justification can be made that a missing
source must  be added.  The SCE for the GLASS source is negative and has
the lowest T-STAT of any SCE.  The organic carbon (OC) source has  a
T-STAT < 2.0 and the magnitude of the STDERR is high (4.5), so it  may
well be a minor or noncontributor.  The uncertainty/similarity display
shows that there is significant collinearity involving the GLASS source.
There is no  "high positive" RATIO R/U to indicate that GLASS is a  noncon-
tributor.  In  this case, the fact that deletion of the GLASS source did
not affect the SCE's, statistics and diagnostics coupled with its  negative
SCE and low  T-STAT were used to conclude that  the GLASS  was noncontributing.
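
A first screening pass over the indications listed for Problem B.5 can be sketched as follows. This is a hypothetical helper, not part of the CMB program: it flags sources with a negative SCE or a T-STAT between -2.0 and 2.0, using the Example B.5 estimates, and assumes that T-STAT is the SCE divided by its standard error. A flag only marks a candidate; as the summary notes, the decision to delete rests on rerunning the fit without the source and confirming that the remaining SCE's and diagnostics do not change.

```python
# (SCE, standard error) in ug/m3, copied from the Example B.5 listing.
estimates = {
    "UDUST": (55.9343, 2.0525),
    "AUTPB": (2.8157, 0.5775),
    "RDOIL": (0.2801, 0.0814),
    "VBRN1": (4.0938, 1.5473),
    "ALPRO": (2.2675, 1.0213),
    "FERMN": (0.1217, 0.0542),
    "GLASS": (-2.0215, 1.6002),
    "SO4": (2.6921, 1.2022),
    "NO3": (1.3582, 0.3565),
    "OC": (7.3781, 4.5155),
}


def screen_noncontributors(source_estimates, t_limit=2.0):
    """Return (name, t_stat, is_negative) for each candidate noncontributor."""
    candidates = []
    for name, (sce, std_err) in source_estimates.items():
        t = sce / std_err  # assumed definition of T-STAT
        if sce < 0 or abs(t) < t_limit:
            candidates.append((name, round(t, 2), sce < 0))
    return candidates


candidates = screen_noncontributors(estimates)
for name, t, negative in candidates:
    print(name, t, "negative SCE" if negative else "low T-STAT only")
```

With these inputs, GLASS is the only source that shows both a negative SCE and a low T-STAT, while OC is flagged on T-STAT alone, matching the distinction drawn in the summary.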

SOURCE CONTRIBUTION ESTIMATES -  SITE:PACS2    DATE:0124 78   SIZE:COARSE
SAMPLE DURATION        24      START HOUR         0
       R SQUARE       .99    PERCENT MASS     102.2
     CHI SQUARE      1.73              DF         8

SOURCE
 * TYPE     SCE(UG/M3)     STD ERR       TSTAT
 3 UDUST       55.9343      2.0525     27.2514
 4 AUTPB        2.8157       .5775      4.8755
 5 RDOIL         .2801       .0814      3.4399
 6 VBRN1        4.0938      1.5473      2.6458
11 ALPRO        2.2675      1.0213      2.2202
13 FERMN         .1217       .0542      2.2429
15 GLASS       -2.0215      1.6002     -1.2633
17 SO4          2.6921      1.2022      2.2394
18 NO3          1.3582       .3565      3.8095
19 OC           7.3781      4.5155      1.6340

MEASURED CONCENTRATION FINE/COARSE/TOTAL:
  42.60000+-    .600/  73.30000+-   1.100/ 115.90000+-   1.253

                                   B-16

-------
SPECIES-I-M--------MEAS-----------------CALC-----------RATIO C/M------RATIO R/U
24 CR          .02600+-   .00200      .02550+-   .00952     .98+-   .37   CR     -.05
25 MN   *      .08300+-   .00400      .08324+-   .01157    1.00+-   .15   MN      .02
26 FE   *     3.50000+-   .13200     3.29570+-   .16395     .94+-   .06   FE     -.97
28 NI   *      .02200+-   .00300      .02242+-   .00399    1.02+-   .23   NI      .08
29 CU          .05200+-   .00400      .02864+-   .00763     .55+-   .15   CU    -2.71
30 ZN          .11000+-   .01000      .07313+-   .02101     .66+-   .20   ZN    -1.58
35 BR   *      .22000+-   .02000      .14953+-   .04775     .68+-   .23   BR    -1.36
82 PB   *      .62000+-   .07000      .76144+-   .11811    1.23+-   .24   PB     1.03
91 OC   *    13.10000+-  4.30000    13.10000+-  1.34876    1.00+-   .34   OC      .00
92 EC   *     1.70000+-   .90000     1.22355+-   .53045     .72+-   .49   EC     -.46
93 SO4  *     1.70000+-   .50000     1.70000+-   .19156    1.00+-   .31   SO4     .00
94 NO3  *     1.60000+-   .30000     1.60000+-   .24294    1.00+-   .24   NO3     .00

ENTER COMMAND
SSCONT
                                       CALC SPECIES(PER SOURCE)
                  INDIVIDUAL RATIO = -------------------------
                                       MEAS SPECIES(ALL SOURCES)
SPEC  SOURCE       3       4       5       6       7      11      13      17      18      19
 1 TOTAL        .763    .038    .004    .110   -.024    .032    .001    .019    .016    .080
 9 F            .124    .000    .004    .000   -.159   2.751    .009    .000    .000    .000
11 NA          1.009    .000    .010    .054   -.006    .058    .003    .000    .000    .000
12 MG          1.013    .000    .000    .000    .000    .075    .000    .000    .000    .000
13 AL           .769    .006    .000    .024   -.002    .154    .000    .000    .000    .000
14 SI          1.080    .002    .000    .005   -.001    .000    .000    .000    .000    .000
16 S            .000    .020    .065    .069   -.051    .000    .003    .829    .000    .000
17 CL           .000    .252    .000   1.358   -.538    .086    .001    .000    .000    .000
19 K           1.066    .004    .001    .090   -.216    .000    .021    .000    .000    .000
20 CA           .839    .017    .002    .043   -.008    .010    .001    .000    .000    .000
22 TI          1.412    .000    .001    .000   -.003    .004    .000    .000    .000    .000
23 V            .581    .000    .364    .000    .000    .036    .001    .000    .000    .000
24 CR           .968    .000    .005    .000   -.008    .015    .002    .000    .000    .000
25 MN           .674    .000    .002    .117   -.010    .000    .221    .000    .000    .000
26 FE           .915    .017    .002    .004    .000    .003    .001    .000    .000    .000
28 NI           .102    .023    .669    .000    .000    .225    .000    .000    .000    .000
29 CU           .323    .039    .004    .140   -.019    .063    .001    .000    .000    .000
30 ZN           .559    .088    .010    .000    .000    .002    .006    .000    .000    .000
35 BR           .020    .630    .000    .019   -.004    .013    .001    .000    .000    .000
82 PB           .334    .894    .000    .000    .000    .000    .000    .000    .000    .000
91 OC           .143    .106    .001    .364   -.064    .000    .001    .000    .000    .450
92 EC           .510    .062    .005    .166   -.046    .022    .001    .000    .000    .000
93 SO4          .024    .021    .078    .076   -.053    .024    .003    .827    .000    .000
94 NO3          .007    .016    .001    .257   -.022    .000    .004    .000    .737    .000
                                   B-15

-------
       UNCERTAINTY/SIMILARITY CLUSTERS          SUM OF COMB. SOURCES
        3 15 17                                   56.605+-    2.026
        3  6 15 17                                60.699+-    2.431
        3  6 11 15                                60.274+-    2.384
       15 17                                        .671+-     .3?7
SPECIES CONCENTRATIONS -  SITE:PACS2    DATE:0124 78   SIZE:COARSE
SAMPLE DURATION        24      START HOUR         0
       R SQUARE       .99    PERCENT MASS     102.2
     CHI SQUARE      1.73              DF         8

SPECIES-I-M--------MEAS-----------------CALC-----------RATIO C/M------RATIO R/U
 1 TOT       73.30000+-  1.10000    74.91993+-  4.35608    1.02+-   .07   TOT     .33
 9 F    *      .03600+-   .01700      .09912+-   .05956    2.75+-  2.10   F      1.02
11 NA   *      .97000+-   .14000      .88748+-   .05694     .91+-   .14   NA     -.55
12 MG          .85000+-   .12000      .92261+-   .08360    1.09+-   .18   MG      .50
13 AL   *     4.80000+-   .16000     4.48975+-   .40866     .94+-   .09   AL     -.71
14 SI   *    14.50000+-   .50000    15.72217+-   .84027    1.08+-   .07   SI     1.25
16 S           .56000+-   .14000      .73020+-   .10215    1.30+-   .37   S       .98
17 CL   *      .33000+-   .05000      .33818+-   .06746    1.02+-   .26   CL      .10
19 K    *      .54000+-   .03000      .58858+-   .03336    1.09+-   .09   K      1.08
20 CA   *     2.00000+-   .07000     1.77534+-   .12508     .89+-   .07   CA    -1.57
22 TI   *      .40000+-   .02000      .56702+-   .10073    1.42+-   .26   TI     1.63
23 V    *      .02600+-   .00200      .02558+-   .00351     .98+-   .15   V      -.10
24 CR          .02600+-   .00200      .02188+-   .00955     .84+-   .37   CR     -.42
25 MN   *      .08300+-   .00400      .08198+-   .00799     .99+-   .11   MN     -.11
26 FE   *     3.50000+-   .13200     3.29103+-   .16388     .94+-   .06   FE     -.99
28 NI   *      .02200+-   .00300      .02252+-   .00403    1.02+-   .23   NI      .10
29 CU          .05200+-   .00400      .02585+-   .00702     .50+-   .14   CU    -3.24
30 ZN          .11000+-   .01000      .07319+-   .02103     .67+-   .20   ZN    -1.58
35 BR   *      .22000+-   .02000      .15025+-   .04838     .68+-   .23   BR    -1.33
82 PB   *      .62000+-   .07000      .76278+-   .11909    1.23+-   .24   PB     1.03
91 OC   *    13.10000+-  4.30000    13.10000+-  1.04598    1.00+-   .34   OC      .00
92 EC   *     1.70000+-   .90000     1.16405+-   .45855     .68+-   .45   EC     -.53
93 SO4  *     1.70000+-   .50000     1.70000+-   .34105    1.00+-   .36   SO4     .00
94 NO3  *     1.60000+-   .30000     1.60000+-   .17648    1.00+-   .22   NO3     .00
                                   B-17

-------
ENTER COMMAND
SSCONT
                                       CALC SPECIES(PER SOURCE)
                  INDIVIDUAL RATIO = -------------------------
                                       MEAS SPECIES(ALL SOURCES)
SPEC  SOURCE       3       4       5       6      11      13      15      17      18      19
 1 TOTAL        .763    .038    .004    .056    .031    .002   -.028    .037    .019    .101
 9 F            .124    .000    .004    .000   2.645    .010   -.030    .000    .000    .000
11 NA          1.009    .000    .010    .027    .056    .004   -.192    .000    .000    .000
12 MG          1.013    .000    .000    .000    .072    .000    .000    .000    .000    .000
13 AL           .769    .006    .000    .012    .148    .000   -.001    .000    .000    .000
14 SI          1.080    .002    .000    .003    .000    .000    .000    .000    .000    .000
16 S            .000    .020    .067    .035    .000    .004   -.408   1.586    .000    .000
17 CL           .000    .256    .000    .689    .082    .002   -.004    .000    .000    .000
19 K           1.067    .004    .001    .045    .000    .024   -.051    .000    .000    .000
20 CA           .839    .018    .002    .022    .009    .001   -.003    .000    .000    .000
22 TI          1.412    .000    .001    .000    .004    .000    .000    .000    .000    .000
23 V            .581    .000    .371    .000    .035    .001   -.004    .000    .000    .000
24 CR           .968    .000    .005    .000    .014    .002   -.143    .000    .000    .000
25 MN           .674    .000    .002    .059    .000    .254   -.001    .000    .000    .000
26 FE           .916    .017    .002    .002    .002    .001    .000    .000    .000    .000
28 NI           .102    .023    .682    .000    .216    .000    .000    .000    .000    .000
29 CU           .323    .040    .004    .071    .061    .001   -.002    .000    .000    .000
30 ZN           .559    .090    .010    .000    .002    .006   -.002    .000    .000    .000
35 BR           .020    .640    .000    .010    .012    .001   -.001    .000    .000    .000
82 PB           .334    .908    .000    .000    .000    .000   -.012    .000    .000    .000
91 OC           .143    .107    .001    .184    .000    .001    .000    .000    .000    .563
92 EC           .510    .063    .005    .084    .021    .001    .000    .000    .000    .000
93 SO4          .024    .022    .079    .039    .023    .003   -.773   1.584    .000    .000
94 NO3          .007    .016    .001    .130    .000    .004   -.008    .000    .849    .000
                                   B-18

-------
EXAMPLE B.6:  COMPOSITE - PROBLEMS B.1, B.2, B.3, B.4, B.5

SOURCE CONTRIBUTION ESTIMATES -  SITE:PACS2    DATE:0124 78   SIZE:COARSE
SAMPLE DURATION        24      START HOUR         0
       R SQUARE       .73    PERCENT MASS     106.8
     CHI SQUARE     44.78              DF         8

SOURCE
 * TYPE     SCE(UG/M3)     STD ERR       TSTAT
 3 UDUST       41.9295      1.5526     27.0065
 5 RDOIL         .2729       .1488      1.8343
 6 VBRN1       63.8163     18.1976      3.5069
 7 VBRN2       -6.8923      5.5143     -1.2499
11 ALPRO        3.3383      2.1902      1.5242
13 FERMN        -.2333       .4555      -.5122
15 GLASS       -1.6067      2.1410      -.7504
17 SO4          1.8586      1.6012      1.1608
18 NO3         -1.5037      1.8503      -.8127
19 OC         -22.7108     12.6036     -1.8019

INDICATIONS:

     o Chi-Square is >> 4, indicating that the fit of the data is not
satisfactory.

     o RATIO R/U << -2.0 for silicon (Si), suggesting that either:
1) the ambient Si data are high; 2) profile(s) containing Si are low; or
3) there is a missing source which is dominated by Si.

     o RATIO R/U << -2.0 for Pb and Br, suggesting bad ambient data, bad
profile data, or a missing source, as above.

     o RATIO R/U >> 2.0 for Fe, suggesting that either: 1) ambient Fe data
are low; or 2) profiles containing Fe are high.

     o Minor R/U problems for Cl, Zn.

     o Evidence of collinearity involving sources 6, 7, and 19.

     o Evidence of an undetermined problem with source 15.

ACTION:  Using the flowchart in Figure 1, the data problems would be
first addressed:  Si would be resolved as in problem B.1, and Fe would be
resolved as in problem B.2.  Then, the Pb-Br problem is resolved as a
missing source as in problem B.3.  Following this, the collinearity
problems would be addressed as in B.4 and B.5.
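
The PERCENT MASS figure and the count of negative estimates in this composite example can be reproduced with a brief Python sketch. It assumes that percent mass is the sum of all SCE's divided by the measured coarse mass, times 100; that definition is not given in this appendix but matches the printed value. The variable names are hypothetical.

```python
# SCE (ug/m3) by source, copied from the Example B.6 listing.
sce = {
    "UDUST": 41.9295, "RDOIL": 0.2729, "VBRN1": 63.8163, "VBRN2": -6.8923,
    "ALPRO": 3.3383, "FERMN": -0.2333, "GLASS": -1.6067, "SO4": 1.8586,
    "NO3": -1.5037, "OC": -22.7108,
}
measured_coarse_mass = 73.3  # ug/m3, from MEASURED CONCENTRATION

# Assumed definition: calculated total mass as a percentage of measured mass.
calculated_mass = sum(sce.values())
percent_mass = 100.0 * calculated_mass / measured_coarse_mass

# Physically impossible (negative) contributions signal trouble in the fit.
negative_sources = [name for name, value in sce.items() if value < 0]

print(round(percent_mass, 1), negative_sources)
```

A percent mass near 100 therefore says little on its own here: large positive and negative estimates cancel, so the Chi-Square, R/U, and U/S cluster diagnostics have to be read together.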
                                   B-19

-------
MEASURED CONCENTRATION FINE/COARSE/TOTAL:
  42.60000+-    .600/  73.30000+-   1.100/ 115.90000+-   1.253

       UNCERTAINTY/SIMILARITY CLUSTERS          SUM OF COMB. SOURCES
        6 19                                      41.106+-   12.539
        6  7 19                                   34.213+-   10.441
        6  7                                      56.924+-   15.480
       11 15 17                                    3.590+-    2.510
        3 11 15 17                                45.520+-    2.895
        3 11                                      45.268+-    2.595
       15 17                                        .252+-    1.226

SPECIES CONCENTRATIONS -  SITE:PACS2    DATE:0124 78   SIZE:COARSE
SAMPLE DURATION        24      START HOUR         0
       R SQUARE       .73    PERCENT MASS     106.8
     CHI SQUARE     44.78              DF         8

SPECIES-I-M--------MEAS-----------------CALC-----------RATIO C/M------RATIO R/U
 1 TOT       73.30000+-  1.10000    78.26896+-  9.61944    1.07+-   .13   TOT     .51
 9 F    *      .03600+-   .01700      .12011+-   .64140    3.34+- 17.89   F       .13
11 NA   *      .97000+-   .14000     1.06045+-   .10329    1.09+-   .19   NA      .52
12 MG          .85000+-   .12000      .73585+-   .09458     .87+-   .17   MG     -.75
13 AL   *     4.80000+-   .16000     4.69708+-   .62825     .98+-   .13   AL     -.16
14 SI   *    29.00000+-   .50000    12.27399+-   .94406     .42+-   .03   SI   -15.66
16 S           .56000+-   .14000      .66017+-   .22953    1.18+-   .50   S       .37
17 CL   *      .33000+-   .05000     2.89758+-  1.00184    8.78+-  3.31   CL     2.56
19 K    *      .54000+-   .03000      .32103+-   .31140     .59+-   .58   K      -.70
20 CA   *     2.00000+-   .07000     1.90081+-   .26835     .95+-   .14   CA     -.36
22 TI   *      .40000+-   .02000      .42139+-   .07564    1.05+-   .20   TI      .27
23 V    *      .02600+-   .00200      .02191+-   .00296     .84+-   .13   V     -1.14
24 CR          .02600+-   .00200      .01555+-   .00720     .60+-   .28   CR    -1.40
25 MN   *      .08300+-   .00400      .07500+-   .07681     .90+-   .93   MN     -.10
26 FE   *     3.50000+-   .13200     4.91307+-   .14060    1.40+-   .07   FE     7.33
28 NI   *      .02200+-   .00300      .02331+-   .00407    1.06+-   .23   NI      .26
29 CU          .05200+-   .00400      .07101+-   .02638    1.37+-   .52   CU      .71
30 ZN          .11000+-   .01000      .04598+-   .01553     .42+-   .15   ZN    -3.46
35 BR   *      .22000+-   .02000      .03764+-   .02928     .17+-   .13   BR    -5.14
82 PB   *      .62000+-   .07000      .14923+-   .06299     .24+-   .11   PB    -5.00
91 OC   *    13.10000+-  4.30000    13.10001+-  7.24982    1.00+-   .64   OC      .00
92 EC   *     1.70000+-   .90000     2.63859+-  2.14457    1.55+-  1.51   EC      .40
93 SO4  *     1.70000+-   .50000     1.70000+-   .71868    1.00+-   .52   SO4     .00
94 NO3  *     1.60000+-   .30000     1.60000+-  1.61313    1.00+-  1.03   NO3     .00
                                   B-20

-------
ENTER COMMAND
SSCONT
                                       CALC SPECIES(PER SOURCE)
                  INDIVIDUAL RATIO = -------------------------
                                       MEAS SPECIES(ALL SOURCES)
SPEC  SOURCE       3       5       6       7      11      13      15      17      18      19
 1 TOTAL        .572    .004    .871   -.094    .046   -.003   -.022    .025   -.021   -.310
 9 F            .093    .004    .000   -.613   3.895   -.019   -.024    .000    .000    .000
11 NA           .756    .010    .428   -.023    .083   -.007   -.152    .000    .000    .000
12 MG           .760    .000    .000    .000    .106    .000    .000    .000    .000    .000
13 AL           .577    .000    .191   -.006    .218    .000   -.001    .000    .000    .000
14 SI           .405    .000    .020   -.001    .000    .000    .000    .000    .000    .000
16 S            .000    .065    .547   -.197    .000   -.007   -.324   1.095    .000    .000
17 CL           .000    .000  10.733  -2.068    .121   -.003   -.003    .000    .000    .000
19 K            .800    .001    .709   -.830    .000   -.045   -.041    .000    .000    .000
20 CA           .629    .002    .341   -.032    .014   -.002   -.002    .000    .000    .000
22 TI          1.059    .001    .000   -.012    .006    .000    .000    .000    .000    .000
23 V            .435    .361    .000    .000    .051   -.002   -.003    .000    .000    .000
24 CR           .726    .005    .000   -.032    .021   -.004   -.117    .000    .000    .000
25 MN           .505    .002    .923   -.039    .000   -.486    .000    .000    .000    .000
26 FE          1.366    .002    .035   -.001    .004   -.001    .000    .000    .000    .000
28 NI           .076    .665    .000    .000    .319    .000    .000    .000    .000    .000
29 CU           .242    .004   1.105   -.072    .090   -.002   -.001    .000    .000    .000
30 ZN           .419    .010    .000    .000    .003   -.012   -.002    .000    .000    .000
35 BR           .015    .000    .154   -.014    .018   -.002    .000    .000    .000    .000
82 PB           .250    .000    .000    .000    .000    .000   -.010    .000    .000    .000
91 OC           .107    .001   2.874   -.247    .000   -.002    .000    .000    .000  -1.734
92 EC           .382    .005   1.314   -.178    .031   -.002    .000    .000    .000    .000
93 SO4          .018    .077    .601   -.203    .033   -.006   -.614   1.093    .000    .000
94 NO3          .005    .001   2.034   -.086    .000   -.008   -.006    .000   -.940    .000
                                   B-21

-------
                                 APPENDIX C


                           CMB Model  Assumptions


     C.1  Constant Composition of Source Emissions

     The chemical  profile of particulate emissions must  remain  constant
during transport between source and receptor.   There are two types  of changes
which may occur.  First, if the chemical composition of  emitted particles
differs substantially with particle size, deposition (settling  or removal)
of the larger particles could alter the source profile as  perceived at the
receptor.  For instance, a particular element  might be much  more prevalent
in the fine particles (< 2.5 um) than in the coarse ones (> 2.5 um) because
of the nature of the source's operation.  If the  source  profiles were avail-
able for the fine and coarse particles separately, there would  be little
cause for concern; however, if only a "total"  profile (fine  and coarse
together) were available, deposition, which primarily affects the coarse
particles, would effectively alter the profile between source and receptor.

     Remedial  actions could include re-measurement of the  profile in  the
separate size fractions, or deletion  from the  profile of species that are
affected by the selective deposition  of large  particles.

     Second, gases that react and transform into  particles during transport
between source and receptor are often not represented in the source profiles.
One such gas that requires careful consideration is SO2, because of its
potential for conversion to sulfate.  This is discussed further in
Section 3.2.1.  It may be possible to construct a model that fits on both
particulate and gaseous sulfur (e.g., Chow, 1985; Scheff and Wadden, 1985).

     C.2  Proper Source Identification and  Characterization

     Incorrect source identification  takes  one of three  forms:

     o  Contributing source types are missing.

     o  Non-contributing source types have  been included.

     o  Several sets of source profiles provide equally  precise but
        significantly different source composition estimates.

     C.2.1  Missing Source Types

     The most commonly missing source types are those of the secondary
compounds of sulfate, nitrate, and organic carbon.  These substances are
emitted as gases which later turn into particles which are measured at the
receptor.
Most source profiles apply only to primary  species.   Several methods  have
been proposed for dealing with secondary species, none of  which are
completely satisfactory or proven.
                                    C-1

-------
     The most common and easiest method is the single constituent
source type proposed by Watson (1979).  This source profile contains only
one entry for the constituent (i.e., sulfate, nitrate, or organic carbon)
which is the result of secondary aerosol formation.  This "source" will
account for the secondary contributors to these species, but it cannot
identify the individual source types contributing the secondary components.
The sulfate, nitrate and organic carbon which are primary (i.e., emitted
as that species as opposed to being formed in the atmosphere) will be
accounted for by the source profile measured at the emission point.  These
individual contributors can be identified by using the PSCONT command.
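
How a single constituent source type appears in the SSCONT display can be illustrated with a small Python sketch. It assumes that each SSCONT entry is the source's SCE times the species fraction in its profile, divided by the measured species concentration, and that the OC source profile carries a unit fraction for OC and nothing else. Both are assumptions, chosen because they reproduce the OC column of the Example B.3 SSCONT listing in Appendix B; the function name is hypothetical.

```python
def species_contribution_ratio(sce, fraction, measured):
    """Share of a measured species attributed to one source (assumed form)."""
    return sce * fraction / measured


# Single constituent profile: only OC is present (assumed unit fraction).
oc_profile = {"OC": 1.0}

sce_oc = 7.8295                          # ug/m3, OC source SCE, Example B.3
measured = {"OC": 13.1, "EC": 1.7}       # ug/m3, measured concentrations
measured_total_mass = 73.3               # ug/m3, measured coarse mass

ratio_oc = species_contribution_ratio(
    sce_oc, oc_profile.get("OC", 0.0), measured["OC"])
ratio_ec = species_contribution_ratio(
    sce_oc, oc_profile.get("EC", 0.0), measured["EC"])
# For total mass, every unit of SCE counts, so the fraction is 1.
ratio_total = species_contribution_ratio(sce_oc, 1.0, measured_total_mass)

print(round(ratio_oc, 3), round(ratio_ec, 3), round(ratio_total, 3))
```

Because the profile has a single entry, the source can absorb whatever OC the primary profiles leave unexplained without disturbing any other species, which is why it cannot say where the secondary material came from.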

     Another method, proposed by Chow (1985) and Scheff and Wadden (1985),
is the combination of gaseous and particulate compounds containing the
secondary species.  Very few source profiles  and ambient data which are
currently available contain adequate measurements of gaseous and particulate
phases, and this method would only be applicable in a Level III study (e.g.,
Blumenthal and Watson, 1986; Chow et  al., 1986; Liu et al., 1986).

     A final method for dealing with  the missing secondary source types
involves modifying the source profiles by a fractionation coefficient cal-
culated from transport times and deposition and transformation rates.
These fractionation factors might also  be determined from emissions-aging
experiments in enclosed environments  such as  smog chambers (Liu et al.,
1986).  Research on such fractionation  coefficients is in progress (e.g.,
Friedlander, 1981; Stafford and Liljestrand,  1984; Lewis and Stevens, 1985;
Chow, 1985) which might be applicable in certain situations.  These
fractionation coefficients are unvalidated, however, and may add more
uncertainty than they eliminate.

     C.2.2  Non (or Low) - Contributing Source Types

     Low contributing sources generally do not affect the SCE for other
sources if their profiles are not similar to  them.  Therefore, they need
not be eliminated from the fit unless they are shown to be truly very minor
contributors (see 3.5.3.2) and their profiles are collinear with other sources.

     C.2.3  Conflicting Results

     In the process of applying the CMB to different sets of chemical
species and source types for a given receptor sample, it is common to find
more than one "fit" which possesses a low Chi-Square, has a high R-Square,
and which accounts for most of the chemical concentrations measured in the
receptor samples.  This was demonstrated in an example by Watson (1979)
from which he concluded that "the receptor model tells what could be the
contributors, not necessarily what are  the contributors."  It is important
to try all suspected source types in the fit  to help identify if this
situation exists.  In such a case, additional physical information is then
needed to determine which of them is likely to be true.  This may be derived
from wind direction analysis, microscopic examination of the ambient sample,
operating schedules, dispersion modeling, etc.
                                    C-2

-------
     C.3  Linear Independence of Profiles

     The source profiles must be linearly independent  or as nearly so as
possible.  Collinear, or chemically similar source profiles can interact
to introduce large uncertainties into the SCE.   There  is always some
similarity among source profiles, because there are a  finite number of
elements or species that are readily measurable, and a much larger number
of sources.  High source profile uncertainty can exacerbate collinearity
caused by similar profiles or can result in large standard errors.

     As explained in Section 3.3.4, inherent uncertainty is a measure of
this collinearity and high uncertainty in source profiles and, thus, the
capability of the model to distinguish among the sources included in the
model run.  The sources within each U/S cluster are those which are
largely responsible for the inherent uncertainty of the cluster through
their interaction with each other and the uncertainty associated with
their individual species (after Henry, 1982).

     Inherent uncertainty is not a concept which the CMB user must understand
in detail.  The user must, however, appreciate  that a  high inherent uncer-
tainty is caused by groups of sources that have somewhat similar profiles
and/or by profile uncertainties which are sufficiently large such that the
model cannot distinguish among them with acceptable standard errors.  This
ultimately results in higher standard errors for the SCE's.  If source
profiles were known exactly (profile uncertainty equal to zero), the
model would tolerate a fairly high collinearity and still provide stable
and acceptable SCE's.  However, source profiles generally are not known
exactly and this imprecision can both reduce the amount of collinearity the
model can tolerate and increase the standard error of  the SCE's for the
sources in the uncertainty cluster.
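
Profile similarity can be made concrete with a simple measure. The Python sketch below computes the cosine of the angle between two profiles; this is not the diagnostic used by the CMB program (the U/S clusters follow Henry, 1982), only a rough stand-in for the idea of "chemically similar" profiles. The three profiles and their species fractions are invented for illustration.

```python
import math


def cosine_similarity(profile_a, profile_b):
    """Cosine of the angle between two profiles (1.0 = same direction)."""
    species = sorted(set(profile_a) | set(profile_b))
    a = [profile_a.get(s, 0.0) for s in species]
    b = [profile_b.get(s, 0.0) for s in species]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical mass fractions, not taken from any measured source profile.
burn_1 = {"OC": 0.45, "EC": 0.10, "K": 0.03, "CL": 0.02}
burn_2 = {"OC": 0.40, "EC": 0.12, "K": 0.05, "CL": 0.03}
dust = {"SI": 0.25, "AL": 0.08, "FE": 0.05, "CA": 0.03}

similar = cosine_similarity(burn_1, burn_2)
distinct = cosine_similarity(burn_1, dust)

print(round(similar, 3), round(distinct, 3))
```

Two profiles with a similarity near 1 leave the fit little information to separate them, and large profile uncertainties make matters worse, as described above.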

     There are two possible means to reduce the standard errors of the
SCE's of sources listed within U/S clusters.  The first is to measure
additional species (at both the source and the receptor) which will allow
these sources to be differentiated from each other by the CMB.  The
second is to reduce the uncertainties in the source profiles in the cluster
by making more precise source profile measurements.

     In order to identify the appropriate means to improve the source
contribution estimate (i.e., reduce uncertainty in the profile or measure
additional species to reduce collinearity) one  must determine whether the
high inherent uncertainty of the cluster must be attributed to collinearity
or to source profile uncertainties.  A simple test is proposed in section
3.5.4.

     C.4  Additional  CMB Assumptions

     In addition to the above four assumptions, the number of source
categories must be less than the number of species included in the fit.  Also,
measurement errors must be random, uncorrelated and normally distributed.
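
The requirement on the number of sources can be checked before a run with a small helper. The Python sketch below is hypothetical and not part of the CMB program. It also assumes that the DF value printed in the Appendix B listings is the number of fitting species minus the number of sources, which is consistent with those examples (18 fitting species with 8 sources gives DF 10, and with 10 sources gives DF 8).

```python
def degrees_of_freedom(n_fitting_species, n_sources):
    """Fitting species minus sources; the fit requires a positive result."""
    if n_sources >= n_fitting_species:
        raise ValueError(
            "the number of sources must be less than the number of "
            "species included in the fit")
    return n_fitting_species - n_sources


df_b3 = degrees_of_freedom(n_fitting_species=18, n_sources=8)
df_b4 = degrees_of_freedom(n_fitting_species=18, n_sources=10)

print(df_b3, df_b4)
```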
                                    C-3

-------
                                   TECHNICAL REPORT DATA
                            (Please read Instructions on the reverse before completing)
 1. REPORT NO.
 2.
 3. RECIPIENT'S ACCESSION NO.
 4. TITLE AND SUBTITLE
      Protocol for Applying and Validating the CMB Model
 5. REPORT DATE
      May 1987
 6. PERFORMING ORGANIZATION CODE
 7. AUTHOR(S)
      Thompson G. Pace and Dr. John G. Watson
 8. PERFORMING ORGANIZATION REPORT NO.
 9. PERFORMING ORGANIZATION NAME AND ADDRESS
      Air Management Technology Branch
      Monitoring and Data Analysis Division
      U.S. Environmental Protection Agency
      Research Triangle Park, NC  27711

      Desert Research Institute
      Reno, NV  89506
10. PROGRAM ELEMENT NO.
11. CONTRACT/GRANT NO.
      CX-813087-01-1
12. SPONSORING AGENCY NAME AND ADDRESS
      U.S. Environmental Protection Agency
      OAQPS, MDAD, MD-14
      Research Triangle Park, NC  27711
13. TYPE OF REPORT AND PERIOD COVERED
14. SPONSORING AGENCY CODE
15. SUPPLEMENTARY NOTES
16. ABSTRACT
 This protocol  is  intended to supplement the User's  Manual  for the CMB model  by pro-
 viding technical  guidance on:  (a) the selection  of model  input data, (b)  determina-
 tion of validity  and  uncertainty of a specific model  application, and (c)  reduction of
 the uncertainty associated with the results of a  specific  application.  The  objective
 of a CMB application  is  to use information about  the  chemical composition  of sources
 in an airshed  (source profiles) along with data on  the chemical composition  of the
 ambient air to estimate  the source contributions  which would best "explain"  the chemical
 properties of measured ambient data (species).

 The guidance provided by this document consists of  a  seven-step process:
 1.  assessing the general  applicability of the CMB  model to the situation  under study;
 2.  configuring the model  with appropriate sources, source profiles, and chemical
     species concentrations at receptor sites;
 3.  examining model statistics and diagnostics;
 4.  determining agreement with model  assumptions;
 5.  identifying problems,  changing the model configuration and rerunning;
 6.  testing the consistency and stability of model  results; and
 7.  evaluating the validity of model  results.
17. KEY WORDS AND DOCUMENT ANALYSIS
    a. DESCRIPTORS
         Receptor Models
         Chemical Mass Balance
         Source Apportionment
         Least Squares
         Multiple Linear Regression
    b. IDENTIFIERS/OPEN ENDED TERMS
    c. COSATI Field/Group
18. DISTRIBUTION STATEMENT
19. SECURITY CLASS (This Report)
      Unlimited
20. SECURITY CLASS (This page)
      Unlimited
21. NO. OF PAGES
      70
22. PRICE
EPA Form 2220-1 (Rev. 4-77)   PREVIOUS EDITION IS OBSOLETE

-------