RESEARCH TRIANGLE INSTITUTE
RTI Project 41U-556-18
August 31, 1972
Statistical Analysis of the Concentration
of Particulate Matter in Air
by
A. C. Nelson, Jr.
S. B. White
K. Poole
Final Report
Contract No. CPA 70-147
Task No. 18
Impact of Secondary Air Quality Standards
on Selected AQCR's
Prepared for:
National Air Pollution Control Administration
411 West Chapel Hill Street
Durham, North Carolina 27701
RESEARCH TRIANGLE PARK, NORTH CAROLINA 27709
-------
STATISTICAL ANALYSIS OF THE CONCENTRATION
OF PARTICULATE MATTER IN AIR
Table of Contents
1. Introduction
2. Discussion of Lognormality of 24-hour Average Particle Concentration (µg/m³)
3. Analysis of Relationship Between GM's and SGD's, Post 1967 Data
4. Time Trends
5. Distance Correlations
6. Time Series Analysis
7. Summary
References
Appendix
-------
1. Introduction
This report extends the work reported in the interim report on May 15,
1972. In that report it was assumed that the distribution of particle
concentration (µg/m³) is lognormal without any supporting discussion of
this assumption. In Section 2 of this report one statistical test of this
assumption is made and some basic considerations are discussed. In Sec-
tion 3, the results are given of further analyses of the type performed in
Section 3 of the interim report, e.g., a scatter plot of the standard geo-
metric deviations (SGD's) versus the geometric means (GM's) for data from
the cities of Philadelphia and Pittsburgh, Pennsylvania; Wilmington, Dela-
ware; New York, New York; Newark, New Jersey; and Chicago, Illinois, for
the years 1968 to 1971 as available. Also a comparison is made of the
1967 and 1968 Air Quality Data, annual GM particle concentrations for 116
stations. Section 4 contains some analyses of time trends for the years
1957-1971, as available, for the same cities listed above. The correla-
tions between daily particle concentrations were calculated for several
pairs of neighboring stations in New York, Chicago, and Philadelphia and
related to the distance between stations; these results are given in Sec-
tion 5. Section 6 contains some time series analyses for data collected
and recorded daily for selected stations in Philadelphia, Pennsylvania.
A summary of results, remarks, and conclusions is given in Section 7. A
condensed version of the interim report is included as Appendix A in this
report.
-------
2. Discussion of Lognormality of 24-hour Average Particle Concentration
(µg/m³)
It was assumed in the interim report that the daily average particle
concentrations were lognormally distributed. This was based on recent
studies and several plots made of the cumulative frequency distributions
from several urban and rural stations (sites). For example, three stations
were selected from the 1967 Air Quality Data for particle concentrations
with considerably different GM's, and in each case they appear to be very
well approximated by the lognormal distribution (straight line on log-
probability paper, Figure 1). As a further check on the applicability of
the lognormal distribution, the value of T_n, the "studentized extreme
deviate," was computed for each site-year combination.
The value of T_n is given by the relationship

    T_n = \frac{x_{(n)} - \bar{x}}{s_x}

where x = log (particle concentration),
x_{(n)} = largest value of the x's,
\bar{x} = mean value of the x's, and
s_x = standard deviation of the x's (the 1/2 power of their variance).
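As an illustrative sketch (not the authors' program), the studentized extreme deviate for one site-year might be computed as follows. The function name, the use of base-10 logarithms, and the n - 1 divisor in the standard deviation are assumptions made here; the report does not state them in legible form.

```python
import math


def studentized_extreme_deviate(concentrations):
    """Return T_n = (largest log value - mean log value) / std. dev. of logs.

    Assumes base-10 logs and the (n - 1) divisor for the standard deviation.
    """
    x = [math.log10(c) for c in concentrations]
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    return (max(x) - mean) / s


# Hypothetical concentrations (µg/m³), chosen only to make the arithmetic easy.
t_n = studentized_extreme_deviate([10.0, 100.0, 1000.0])
```

The value obtained for each site-year would then be compared with tabulated percentage points of T_n for the corresponding sample size.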
-------
[Figure 1. Cumulative Distribution Function for Three Stations, Based on Air Quality Data for 1967. Log-probability plot of particle concentration against cumulative percentage (2% to 98%) for:
Charleston, W. Va. (1957-1969), 212 California (272 Samples)
San Francisco (1967-1969), 101 Grove St. (333 Observations-Samples)
Cheyenne, Wyoming (1957-1969), 23rd and Central Ave. (278 Samples)]
-------
The distribution of T_n is a function of sample size, but changes very little with small
variations in sample sizes. Thus the data for samples of size n = 21 to
34 were pooled, and similarly for samples of size n = 47 to 76. The observed
distributions of the T_n are given in Figure 2 along with the
available percentage points of the theoretical distributions [1]. It
appears there is very good agreement between the sampled distribution and
the theoretical distribution of T_n. This (in addition to previous
analyses) would imply that the logarithms of the particle concentrations
follow the normal distribution very closely.
It would be desirable to substantiate the use of the lognormal (LN)
distribution through a physical interpretation rather than justifying its
adequacy only on the basis of statistical tests. For example, it is well
known that the distributions of particle sizes, weights, etc., are ade-
quately approximated using the LN distribution and the physical mechanisms
for the use of this distribution have been given [2]. In our case we
might start with the assumption that the particle sizes and weights are
LN distributed, and then explain why the 24-hour averages are so distributed.
Thus we need to look for possible physical explanations for the day-to-day
variations. A fundamental property of the LN is that if X_1 and
X_2 are two independent positive variates such that their product X_1 X_2 is
an LN variate, then both X_1 and X_2 are LN variates (or as a special case,
one of the variates may be constant) [3]. Of course, this statement may
be extended to a finite number of variates. Thus, we can state that if
the resulting measurement, particle concentration in µg/m³, is LN, and if
the particle size and weight distributions are LN, then there must be a
mechanism which samples these particles (and deposits them on the filter)
in such a manner that the sampling may be described by another variate
-------
[Figure 2. Comparison of Observed and Theoretical Distribution of T_n = (x_(n) - x̄)/s, plotted on probability paper in two panels: n = 21 to 34 and n = 47 to 76.
Legend: • = observed distribution of T_n; x = percentiles of the theoretical distribution of T_n for n = 24; o = percentiles of the theoretical distribution of T_n for n = 57.]
-------
having the LN distribution or the degenerate case, i.e., a constant. Two
possible hypotheses are suggested for consideration but not tested, nor
has any supporting discussion been found in a preliminary search of the
literature. One hypothesis is that the meteorological conditions may vary
from day to day in such a manner that the sampling variation may be de-
scribed by a LN variate. For example, the sampling variation may be
described by the extreme values of the magnitude of the wind velocity.
Another hypothesis is that the growth of the sample on the filter in the
high-volume samplers might be described by an LN variate. These and possibly
other plausible explanations need to be further investigated for the purpose
of a better understanding of the distribution of particles in our environ-
ment. Hopefully, an improved understanding will result in better air
quality control procedures or strategies.
-------
3. Analysis of Relationship Between GM's and SGD's, Post 1967 Data
The GM's and the SGD's for each year (1968 and later) and for each of
the cities given in Section 1 are plotted in Figures 3A and 3B. It is
evident from these data that there is no correlation between the GM and
SGD--that is, the SGD does not change with changes in the GM. The same
conclusion was noted from the 1967 air quality data as tabulated in
Table 2.1 of [4]. (See Figures 2A and 2B in Appendix A.) The primary and
secondary standards are also shown on the plots in order to provide some
insight into the likelihood of complying with the standards. The computa-
tion of the values of the GM's and SGD's which will result in v violations
per year (assuming the LN distribution) is described in Appendix A.
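The Appendix A computation is not reproduced in this section, but the idea can be sketched under the stated LN assumption: if the daily concentration is lognormal with geometric mean GM and standard geometric deviation SGD, the chance that a day exceeds a standard S is 1 - Φ(ln(S/GM)/ln(SGD)), and the expected number of violations in 365 samples is 365 times that. The function names below are hypothetical, and this is only one plausible reading of the Appendix A method.

```python
from math import log
from statistics import NormalDist


def expected_violations(gm, sgd, standard, days=365):
    """Expected number of daily values above `standard` for an LN variate."""
    z = log(standard / gm) / log(sgd)
    return days * (1.0 - NormalDist().cdf(z))


def gm_for_violations(sgd, v, standard, days=365):
    """GM which, with the given SGD, yields v expected violations per year."""
    z = NormalDist().inv_cdf(1.0 - v / days)
    return standard / sgd ** z


# Hypothetical inputs: the GM that gives one expected violation per year of
# the 260 µg/m³ daily standard when the SGD is 1.6.
gm_one_violation = gm_for_violations(1.6, 1.0, 260.0)
```

Sweeping the SGD over a range for fixed v traces out one curve of the kind drawn in Figures 3A and 3B.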
If one compares the data of Figures 3A and 3B of this report with
those of the corresponding Figures 2A and 2B of Appendix A, two inferences
are readily drawn: (1) the GM's for the data in this report are considerably
higher and (2) there is little change in the variation of the SGD's. The
GM's are higher because
the GM's in this report are based on data from large urban areas, whereas
the data in the interim report were based on all urban areas in the air
quality sampling network. However, the main point of comparison is that
the distribution of SGD's does not change appreciably.
Another analysis compared the GM's based on 1967 and 1968 air
quality data. A small, statistically insignificant decrease
was indicated; that is, the 1968 GM's tended to be slightly lower for the
116 urban stations for which data were available for the two years. The
frequency distribution of the differences is given in Table 1. This difference
between the 1968 and 1967 GM's is consistent with a slight
downward trend in the GM's for the stations in the cities being studied in
this report. This trend will be discussed in the following section.
-------
[Figure 3A. Values of GM and SGD which yield the Specified Average Number of Violations (V) of Primary Standard (Daily Average 260 µg/m³) Based on 365 Samples - Also Scatter Plot of SGD vs. GM for Years 1968-1971 for Selected Cities. Horizontal axis: geometric mean (GM), µg/m³.]
-------
[Figure 3B. Values of GM and SGD which yield the Specified Average Number of Violations (V) of Secondary Standard (Daily Average 150 µg/m³) Based on 365 Samples - Also Scatter Plot of SGD vs. GM for Years 1968-1971 for Selected Cities Listed in Section 1. Horizontal axis: geometric mean (GM).]
-------
TABLE 1
Frequency Distribution of the Differences in the GM's for 1968 and 1967
(1968 GM - 1967 GM)

Interval (µg/m³)    Frequency
 -49 to  -40            1
 -39 to  -30            5
 -29 to  -20            8
 -19 to  -10           23
  -9 to    0           31
   1 to   10           24
  11 to   20           10
  21 to   30            4
  31 to   40            7
  41 to   50            1
  51 to   60            0
  61 to   70            0
  71 to   80            0
  81 to   90            0
  91 to  100            1
 101 to  110            1
Total                 116
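As a rough check on Table 1, the grouped frequencies can be summarized as below. This is a sketch only: each difference is replaced by the midpoint of its interval, an approximation introduced here and not a method taken from the report.

```python
# (lower bound, upper bound, frequency) for each interval of Table 1
TABLE_1 = [
    (-49, -40, 1), (-39, -30, 5), (-29, -20, 8), (-19, -10, 23),
    (-9, 0, 31), (1, 10, 24), (11, 20, 10), (21, 30, 4),
    (31, 40, 7), (41, 50, 1), (51, 60, 0), (61, 70, 0),
    (71, 80, 0), (81, 90, 0), (91, 100, 1), (101, 110, 1),
]

total = sum(freq for _, _, freq in TABLE_1)

# Number of stations whose 1968 GM did not exceed the 1967 GM
not_higher = sum(freq for _, hi, freq in TABLE_1 if hi <= 0)

# Grouped-data mean, using interval midpoints in place of the raw differences
grouped_mean = sum((lo + hi) / 2 * freq for lo, hi, freq in TABLE_1) / total
```

With midpoints the grouped mean is very close to zero, while 68 of the 116 differences fall at or below zero; the few large positive differences offset the more numerous small negative ones.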
-------
4. Time Trends
The behavior of particulate concentration with respect to time was
examined through the use of linear regression analysis. This technique
assumes a constant change in logarithm of particle concentration per unit
change in time, and is adequate for evaluating long-term trends. A linear
regression model was fitted to the quarterly mean of the logarithms of
particle concentrations for the years 1957 to 1971 for which data were
available. This model or prediction equation was assumed to be of the
form
    Y = a + b (t - \bar{t})

where Y = predicted quarterly mean of the log of particle concentration
(µg/m³),
a = weighted mean of all of the quarterly means,
b = slope of the fitted equation,
t = time measured in the number of quarters referenced to the first
quarter for which data are available, at which t = 0,
\bar{t} = mean of the time in number of quarters.
An example is given below to illustrate the computations required in this
analysis. Since the number of observations on which the means are based
varies from one quarter to another, a weighted linear regression was performed.
If n_i is the number of observations made during the ith quarter, then the
following formulae apply.

    \bar{t} = \frac{\sum_i n_i t_i}{\sum_i n_i}

    a = \frac{\sum_i n_i \bar{y}_i}{\sum_i n_i}

    b = \frac{\sum_i n_i (t_i - \bar{t}) \bar{y}_i}{\sum_i n_i (t_i - \bar{t})^2}

where y_{ij} = jth observation during the ith quarter and
\bar{y}_i = \sum_j y_{ij} / n_i = mean for the ith quarter.
The significance of b is tested with

    t = b / s_b    (Student t)

where s_b is the estimated standard error of b.
For the following data [State 8, Area 260 (Wilmington, Delaware), Site 3,
Agency A01, Years 1970-71], the results of applying these formulae are given
below. The results are graphically shown in Figure 4, and a sample portion
of the computer printout is shown as Table 2.

  t_i     \bar{y}_i     n_i
   0       2.0677        6
   1       2.0836        6
   2       2.0029        7
   3       2.0110        5
   4       2.1689        7
   5       1.9837        4
   6       1.8692        1
   7       2.0183        6

\bar{t} = 3.098
a = 2.0465
b = -.00708
s_b = 0.01074
Student t = 0.068 (not significant)
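The weighted regression described above can be sketched as follows. The function name is hypothetical, and the weights are taken to be the quarterly sample sizes n_i, which is the natural reading of "weighted linear regression" here. The standard error s_b is omitted because it needs the within-quarter observations, which are not tabulated. The values below are copied from the example as transcribed and are not guaranteed to reproduce the printed results exactly.

```python
def weighted_trend(t, ybar, n):
    """Fit Y = a + b (t - t_bar), weighting each quarterly mean by its n_i.

    Returns (t_bar, a, b): the weighted mean time, the weighted mean of the
    quarterly means, and the slope.
    """
    total = sum(n)
    t_bar = sum(ni * ti for ni, ti in zip(n, t)) / total
    a = sum(ni * yi for ni, yi in zip(n, ybar)) / total
    sxx = sum(ni * (ti - t_bar) ** 2 for ni, ti in zip(n, t))
    sxy = sum(ni * (ti - t_bar) * yi for ni, ti, yi in zip(n, t, ybar))
    return t_bar, a, sxy / sxx


# Quarterly values as transcribed from the example above
quarters = [0, 1, 2, 3, 4, 5, 6, 7]
log_means = [2.0677, 2.0836, 2.0029, 2.0110, 2.1689, 1.9837, 1.8692, 2.0183]
counts = [6, 6, 7, 5, 7, 4, 1, 6]

t_bar, a, b = weighted_trend(quarters, log_means, counts)
```

Centering time at its weighted mean makes the intercept a equal to the weighted mean of the quarterly means, which is why a and b can be computed separately.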
-------
[Figure 4. Prediction Equation for log GM vs. Time (Data in Table 2). Fitted line: Y = 2.047 - 0.0071 (t - \bar{t}). Vertical axis: log GM; horizontal axis: time in quarters.]
-------
[Table 2. Portion of Computer Printout. State 8, Area 260, Site 3, Agency A01, Years 70 and 71. For each year the printout lists the number of items and observations and, by quarter, the SGD with the number of observations >150 and >260, followed by the regression summary for the site.]
-------
The prediction equation is Y = 2.0465 - .00708(t_i - 3.098).
For this example, b = -.00708, indicating that the quarterly mean of the
log of particle concentration is decreasing at the rate of .00708 units per
quarter. A statistical test to determine if b differs significantly from
zero is made with the test statistic t = b/s_b, which in this case shows b
is not significant.
A total of 38 regression analyses were performed. In all cases, the
coefficient b was negative, indicating a downward trend in quarterly mean
particle concentration with respect to time. Individually, the trends
were not significant; however, collectively the trend is significant since
all stations showed a downward trend.
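The collective argument can be made concrete with a sign test. The sketch below assumes the 38 analyses are independent and that, with no trend, each slope is equally likely to be positive or negative. The independence assumption is made here for illustration only; the report does not state it, and neighboring stations are in fact correlated (Section 5).

```python
# Probability that all 38 slopes are negative when each sign is a fair coin flip
n_analyses = 38
p_all_negative = 0.5 ** n_analyses

# Two-sided version: all 38 slopes share the same sign, in either direction
p_all_same_sign = 2 * p_all_negative
```

Even with substantial allowance for dependence between stations, a probability of this order supports the statement that the downward trend is significant collectively.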
Figures 5A and 5B contain plots of GM's and SGD's versus Time for
Philadelphia, Pa., Site 1, Agency A01, to illustrate the trends at one
site for several years. The trend of SGD versus time was analyzed for
seven sites for which data were available for ten or more years. Four
of the seven trends were upward (b was positive), and only two of the b's
were significantly different from zero (one positive and one negative value).
Thus the fact that the trends in the GM's were all downward, but not sig-
nificant, does not imply a downward trend in the SGD's.
-------
[Figure 5A. Prediction Equation for log GM and Plot of Quarterly log GM's vs. Time in Quarters (Site 1 in Philadelphia, Pa.). Fitted line: log Y = 2.1646 - 0.00306 (t_i - \bar{t}). Horizontal axis: year, 1957 to 1971.]
[Figure 5B. Prediction Equation for log SGD and Plot of log SGD's vs. Time in Years (Site 1 in Philadelphia, Pa.). Horizontal axis: time, 1957 to 1972.]
-------
5. Distance Correlations
It is intuitively clear that the data observed on the same day at two
stations located close to one another should be highly correlated and that
these correlations should decrease as the distance between the stations in-
creases. This, in fact, has been observed for a few stations selected in
Philadelphia, New York, and Chicago. Figures 6A and 6B show the relative
locations of the stations and the distances between them. Plots of the
sample correlations versus distance for several pairs of stations in these
three cities are given in Figure 7. For the station-pairs in Chicago,
the 1969 and 1970 values are plotted separately. A major consequence of
this observed relationship is that the mean particle concentration (µg/m³)
over several nearby stations will not be distributed with variance equal
to 1/n times the variance for a single observation, as would be the case
for independent data. Because of the correlation, the variance will be
between 1/n and 1 times the variance for a single observation. In any case,
it is obvious that the likelihood that the mean particle concentration over
a specified area or region will exceed a specified value, say 150 µg/m³,
would be less than that for the largest observation over the same area.
The importance of this result is dependent on the basis of the air quality
standard for particle concentration. For example, if the standard is
based on visibility and if the latter is dependent on the average concen-
tration rather than the maximum concentration over a specified area, then
it is reasonable to use the information about the distance-correlations to
infer the likelihood of the average exceeding the standard. On the other
hand, if the standard is based on the worst case or maximum value for a
specified area, such as a health damage effect (if a single exposure to
-------
Figure 6A. Relative Locations of Stations Used
in Distance Correlation Study

New York (distance marked on map: 7.5 miles)
Station, Location
12 New York University, 1911 Osborn Place
16 Hillside Avenue and 231st Street
25 Bedford Avenue and Campus Road

Philadelphia (distance marked on map: 3.5 miles)
Station, Location
12 1501 Lycoming Street
13 CAMP
69 3200 Frankford
72 Mobile Trailer (10th and Patterson)
-------
Figure 6B. Relative Locations of Stations Used
in Distance Correlation Study

Chicago (distances marked on map: 4.3 miles, 6.8 miles)
Station, Location
2 445 South Plymouth Court
5 GSA Building, 538 S. Clark Street
12 Cooley Voc. High School, 1225 N. Sedwick St.
8 Hyde Park High School, 6220 S. Stony Island Ave.
14 Farr Dormitory, 3300 S. Michigan Avenue
15 Kelly High School, 4136 S. California Avenue
13 Crane High School, 2245 West Jackson Boulevard
18 Carver High School, 801 E. 133rd Place
19 Clay Elementary School, 13231 S. Burley Avenue
-------
[Figure 7. Sample Correlations vs. Distance (Miles). Vertical axis: sample correlation (0.2 to 1.0); horizontal axis: distance (miles).
Legend: x = Philadelphia, Pa.; o = New York, N.Y.; Δ = Chicago, Ill.]
-------
this is critical), the largest value will be critical. Even for a health
effect it is not obvious that the maximum should be used because the site
at which the maximum occurs varies over time. The data in Table 3 illustrate
the variation of the observed maxima with time, and also the
comparison of the frequencies of exceeding 150 µg/m³ for the means and the
maxima. Typically, the maximum concentration occurs at a particular site
for a few days and then at another site, clearly not a random phenomenon
but dependent on the meteorological patterns and distance from the sources.
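The statement that the variance of the area mean lies between 1/n and 1 times the single-station variance can be illustrated with a simplified model. The sketch assumes every station has the same variance and every pair of stations has the same correlation rho; both are simplifications introduced here, since the observed correlations fall off with distance.

```python
def variance_of_mean(sigma2, n, rho):
    """Variance of the mean of n equally correlated observations.

    sigma2 : variance of a single observation
    rho    : common pairwise correlation (assumed equal for all pairs)
    """
    return sigma2 / n * (1.0 + (n - 1) * rho)


# Hypothetical case: three stations, unit variance, pairwise correlation 0.6
v = variance_of_mean(1.0, 3, 0.6)
```

With rho = 0 the expression reduces to sigma2/n (independent data), and with rho = 1 it equals sigma2, which are the two bounds quoted in the text.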
-------
Table 3. Comparison of Frequency of Exceeding Secondary
Standard (150 µg/m³) for Neighboring Sites

                                           No. of Cases
                                           for Which        No. of Cases for Which
                                   No.     Max.    Mean     Maximum is Located        Freq. (Mean >150
Area               Year  Sites     Cases   >150    >150     at Site No. (site: cases) | Max. >150)*
Chicago, Ill.      1969  2,5,12      53     44      35      2: 30   5: 4    12: 19          0.80
                   1970  2,5,12      31     22      16      2: 25   5: 3    12: 3           0.71
                   1969  8,14,15     93     60      35      8: 56   14: 9   15: 28          0.58
                   1970  8,14,15    119     62      31      8: 59   14: 30  15: 30          0.50
                   1969  13,18,19   100     72      45      13: 50  18: 30  19: 20          0.63
                   1970  13,18,19    99     53      34      13: 57  18: 27  19: 15          0.64
New York, N.Y.     1969  12,16,25    33      7       1      12: 13  16: 2   25: 18          0.14
Philadelphia, Pa.  1969  12,13,69,72 97     49      27      12: 7   13: 41  69: 32  72: 17  0.55

* This conditional frequency is obtained by dividing the frequency for which
the mean is greater than 150 by that for which the maximum value is greater
than 150. It is dependent on the number of sites, the GM's at each site,
closeness of the sites, etc. Thus these conditional frequencies should only
be used to provide an estimate of the range of values which one may expect.
-------
6. Time Series Analysis
This section is not meant to be a penetrating analysis of the data
contained herein. Rather it is intended to reveal the kinds of questions
one may answer by using time series analysis.
Figures 8A, 8B, and 8C below are plots of the common logarithms of
the daily particle concentrations (µg/m³) for three sampling sites in
Philadelphia during the year 1968. Although daily observations were made,
only the first 56 for each site are shown for illustrative purposes. The
behavior of the series for the remainder of the year is remarkably similar
to that for the first 56 days. Previous analysis of the data revealed
that the logarithms of particle concentrations closely follow a normal
distribution. Since the graphs indicate that the series appear reasonably
stationary, they may be subjected to a lag covariance and a spectral
analysis. The means and variances of the three series are:

            Set 1    Set 2    Set 3
Mean        2.097    2.037    2.020
Variance     .030     .030     .041
Lag Covariance Analysis. The lag covariances of a time series are the
covariances of the observations in the series which are 1, 2, 3, ... units
of time apart. These are plotted in Figures 9A, 9B, and 9C below for the
series in Figures 8A, 8B, and 8C, respectively. Although the plots could have
been made for lags up to 364 days, only the first fifty-five lags are given
for illustrative purposes. The ordinate (y axis) in the plot is the
covariance and the abscissa (x axis) is the time lag. Note that the value
of the plot at the origin is the variance of the series. In order to compute
the lag correlations (correlogram) for the series, the ordinates of
Figures 9A, 9B, and 9C need to be divided by the ordinate at the origin
(i.e., 0 lag).
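A minimal sketch of the lag covariance and correlogram computation is given below. The function names are hypothetical, and the divisor n (rather than n - k) at each lag is an assumption; the report does not say which estimator its program used.

```python
def lag_covariances(x, max_lag):
    """Sample covariances of the series with itself at lags 0..max_lag.

    Uses the divisor n at every lag (an assumed convention).
    """
    n = len(x)
    mean = sum(x) / n
    return [
        sum((x[i] - mean) * (x[i + k] - mean) for i in range(n - k)) / n
        for k in range(max_lag + 1)
    ]


def correlogram(x, max_lag):
    """Lag correlations: each lag covariance divided by the lag-0 value."""
    cov = lag_covariances(x, max_lag)
    return [c / cov[0] for c in cov]


# Hypothetical short series of log10 concentrations
series = [2.10, 2.05, 1.98, 2.02, 2.15, 2.20, 2.08, 1.95]
r = correlogram(series, 3)
```

The first element of the covariance list is the variance of the series, matching the remark that the plot's value at the origin is the variance.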
The three lag covariance plots reveal essentially the same thing. The
correlations of observations 1 day apart are:

Correlation of Observations 1 Day Apart (Lag = 1)
Figure 9A    .0097/.0301 = .32
Figure 9B    .0116/.0302 = .38
Figure 9C    .0138/.0417 = .33
Though these correlations are not impressively large, they are signifi-
cantly different from zero and must be taken as indicating a dependency
amongst the daily readings. More striking than the moderate but significant
correlation of observations one day apart are the persistent cycles in the
covariance function indicating that perhaps there are significant cycles in
the data. Hence, in order to determine the presence of cycles or periodicities
in the data a spectral analysis was performed.
Spectral Analysis. Any real stationary time series, X_t, may be written
as a (perhaps infinite) linear combination of cosine functions in the
following manner:

    X_t = \sum_{k=0}^{\infty} \xi_k \cos \lambda_k t        (1)

Very often, however, only a finite number of the coefficients ξ_k will
be non-zero and one of the purposes of the spectral analysis is to identify
these coefficients.
-------
Mathematically, the spectrum of a time series is the Fourier transform
of the lag covariance function. A plot of the spectral density
gives the density of the power of the process at the frequencies λ_k in
Equation (1) above, and significant peaks in the density correspond to
important λ_k. The spectral analysis may also be used to determine the
proportion of the variance of the process which is due to the different
frequencies.
Spectral densities for the three series discussed above are given in
Figures 10A, 10B, and 10C. The frequencies (radians/day) are usually computed
from 0 to π but only the first 2.1 radians are given in these figures.
Frequencies beyond these were visually determined to be unimportant.
From the method of calculation of the spectral density f̂(λ), it was
determined that if in fact no peak were present then the estimates f̂(λ)
would be distributed proportional to a chi-squared variate with ten degrees
of freedom. Specifically,

    \hat{f}(\lambda) \propto \hat{\sigma}^2 \chi^2_{10}

where σ̂² is the estimated variance of the process and χ²_10 is the chi-squared
variate with 10 degrees of freedom. This may be used as a basis for testing
if observed peaks are significant.
Assuming that the variance is around .03 (actually process three has
a variance of .04) then the critical value for the estimated spectral
density at the one percent level of significance is .14. From Figures 10A,
10B, and 10C the significant peaks are calculated and given in the following
table. Radians per day are converted to days per cycle by

    Days per cycle = 2π/Radians per day.
Peaks at zero frequency are ignored since the method of calculation for
these particular spectra often gives spurious results at this frequency.
Series    Radians/Day    Days/Cycle
  1           .31           20.3
              .64            9.8
              .84            7.5
  2           .38           16.5
              .64            9.8
              .88            7.1
             1.20            5.2
             1.79            3.5
  3           .31           20.3
              .38           16.5
              .62           10.1
              .88            7.3
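The critical value and the frequency conversion used for this table can be sketched as follows. The proportionality constant relating the spectral estimate to the chi-squared variate is not legible in the source; the factor 1/5 (that is, 2/10) used below is an assumption, chosen because it reproduces the quoted critical value of .14 for a variance of .03. The chi-squared percentage point is the standard tabulated value.

```python
import math

# Upper 1 percent point of chi-squared with 10 degrees of freedom (tabulated)
CHI2_10_UPPER_1PCT = 23.209


def critical_density(variance, scale=1.0 / 5.0):
    """Critical spectral density at the 1 percent level.

    `scale` is an assumed proportionality constant, not taken from the report.
    """
    return scale * variance * CHI2_10_UPPER_1PCT


def days_per_cycle(radians_per_day):
    """Convert an angular frequency (radians/day) to a period in days."""
    return 2.0 * math.pi / radians_per_day


critical = critical_density(0.03)
periods_series_1 = [round(days_per_cycle(w), 1) for w in (0.31, 0.64, 0.84)]
```

Peaks in Figures 10A to 10C whose height exceeds the critical value would be the ones listed in the table.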
The first thing to note is the remarkable consistency in the results
for the three series. The estimated peaks at around 20 days and 16.5 days
are probably in reality estimating the same peak, hence one may say that
all series consist of three basic superimposed periodicities: one between
16.5 - 20 days, one at around 10 days, and one at seven days. Series 2
appears to have additional periodicities at around five and three days but
analysis of additional data at this site would be required before making a
final decision.
If one agrees that the peaks at 20.3 and 16.5 days are in fact estimating
the same frequency then the model for series one and three is:
-------
    X_t = \xi_0 + \xi_1 \cos \lambda_1 t + \xi_2 \cos \lambda_2 t + \xi_3 \cos \lambda_3 t

where
    \lambda_1 \approx .35
    \lambda_2 \approx .63
    \lambda_3 \approx .87
and the ξ_i's are parameters to be estimated by standard least squares.
Series two may be modeled in a similar fashion.
One consequence of the spectral analysis is that the sampling rate
(number of days per year, or per week) should be determined so that it
does not coincide with the observed cycles, 7 days, 10 days, etc.
Other calculations of interest which may be carried out using
information furnished by the estimated spectrum are:
i) the estimated number of times the log (particle
concentration) exceeds some level u during the
year, and
ii) the proportion of time the log (particle concen-
tration) spends above some level u during the
year.
Let
U_u = the number of violations of the level u;
then the expected value of U_u in a year is

    E[U_u] = \frac{365}{2\pi} \, \frac{\lambda_2^{1/2}}{\sigma} \, \exp\left[ -\frac{(u - m)^2}{2\sigma^2} \right]

where σ² is the variance of the series, m is the mean
of the series, and λ_2^{1/2} is the standard deviation
of the spectral density.
For example, in series 1 above:
σ² = .03
m = 2.097
λ_2^{1/2} = 0.2506
where λ_2 is obtained by summing over the estimated spectral density f(λ_i)
of Figure 10A.
If u is taken as log10 260, then the expected number of crossings
(i.e., violations of the level u) is estimated as 15.6 per year.
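The quoted figure of 15.6 crossings per year can be checked numerically. The formula in the sketch below, E[U] = (365 / 2π) (λ_2^{1/2} / σ) exp[-(u - m)² / (2σ²)], is an assumed reconstruction, adopted here because it reproduces the quoted result from the stated inputs; the function name is hypothetical.

```python
import math


def expected_crossings(u, mean, variance, lambda2_sqrt, days=365):
    """Expected number of times per year the series crosses above level u.

    Assumed form: (days / 2 pi) * (lambda2_sqrt / sigma)
                  * exp(-(u - mean)^2 / (2 * variance)).
    """
    sigma = math.sqrt(variance)
    rate = lambda2_sqrt / (2.0 * math.pi * sigma)
    return days * rate * math.exp(-((u - mean) ** 2) / (2.0 * variance))


# Series 1 inputs from the text: variance .03, mean 2.097, lambda_2^(1/2) = 0.2506
crossings = expected_crossings(math.log10(260.0), 2.097, 0.03, 0.2506)
```

With these inputs the result rounds to the 15.6 per year given in the text.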
Further, let
P_u(t) = estimated proportion of time the process
is above the level u.
Then the expected value of P_u(t) is

    E[P_u(t)] = 1 - \Phi\left( \frac{u - m}{\sigma} \right)
and its variance is given by:
2
365 r(t)/a ,
where r(t) is the covariance function and Φ is the standard normal distribution.
The variance is difficult to calculate but a simple calculation gives
the estimated proportion, P_u(t), for series one as three percent if u is
taken as log10 260.
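The "simple calculation" can be sketched as below, assuming the expected proportion is the upper tail area of a normal distribution with the mean and variance of the log series, 1 - Φ((u - m)/σ). That form is an assumption consistent with the reference to Φ and with the three percent result; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def proportion_above(u, mean, variance):
    """Expected proportion of time a normal process spends above level u."""
    return 1.0 - NormalDist(mean, math.sqrt(variance)).cdf(u)


# Series 1: mean 2.097, variance .03, level u = log10(260)
p = proportion_above(math.log10(260.0), 2.097, 0.03)
```

For series 1 this gives a proportion of roughly three percent, in line with the value quoted above.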
The utility of these formulae lies in the fact that they give estimates
appropriate for a continuous time series (as the level of particulates is) even
though the actual observations may be taken at discrete time points (e.g.,
every three days). This, of course, becomes very important when samples are
taken rather far apart, for in these cases individual violations of the
level u may be missed by looking at the data whereas their frequency may
still be estimated by the above formulae.
-------
[Figure 8A. Log10 (Daily Particle Concentration) vs. Time (Days). Computer printer plot of time series 1: ordinate (log10 concentration) against time in days, first 56 days.]
-------
[Figure 8B. Log10 (Daily Particle Concentration) vs. Time (Days). Computer printer plot of time series 2: ordinate (log10 concentration) against time in days, first 56 days.]
-------
[Figure 8C. Log10 (Daily Particle Concentration) vs. Time (Days). Computer printer plot of time series 3: ordinate (log10 concentration) against time in days, first 56 days.]
-------
[Figure 9A. Lag Covariance Function for Series 1. Computer printer plot of autocovariances: ordinate (covariance) against lag in days, lags 0 to 55.]
-------
[Figure 9B. Lag Covariance Function for Series 2. Computer printer plot of autocovariances: ordinate (covariance) against lag in days, lags 0 to 55.]
-------
Figure 9C. Lag Covariance Function for Series 3
[Line-printer plot of autocovariances: ordinate versus lag]
-------
Figure 10A. Spectral Density for Series 1
[Plot of spectral density versus frequency (Radians/Day); frequency
label legible in the scan: 0.64]
-------
Figure 10B. Spectral Density for Series 2
[Plot of spectral density versus frequency (Radians/Day); frequency
labels legible in the scan: 0.38, 0.64, 0.88, 1.20]
-------
Figure 10C. Spectral Density for Series 3
[Plot of spectral density versus frequency (Radians/Day); frequency
labels legible in the scan: 0.31, 0.38, 0.62, 0.88]
-------
7. Summary
A brief summary of the results contained in both the interim report
and in this report on analysis of particle concentrations is given herein.
(1) The SGD's behave independently of the GM's, that is, reductions
in the GM's do not imply corresponding reductions in the SGD's. As a re-
sult, introduction of air quality control technology could reduce the
annual GM without changing the SGD.
(2) One statistical test and several empirical tests of lognormality
of the particle concentration have been made. The lognormal (LN)
distribution fits the daily average particle concentrations very
adequately. No definite physical explanation is given for this empirical
observation; however, some hypotheses are offered to support it.
(3) The GM's show a downward trend in each of the several large urban
areas considered in this analysis, although none of the trends is
significant.
(4) The 1968 GM's are slightly less than the 1967 GM's but not
significantly less. This agrees with (3) above.
(5) The SGD's do not exhibit the same downward trend as the GM's;
some increase and others decrease, and very few of the trends are
significant. This further supports (1) above.
(6) There are significant correlations between daily average
concentrations obtained at neighboring urban sites, say within 5 or 10
miles. The correlations range from about 0.30 to 0.80 depending upon
the distance between sites, their relative locations, etc.
-------
(7) The conditional frequency that the mean particle concentration
over several neighboring sites (i.e., less than 5 miles apart) exceeds
150 µg/m³, given that at least one site has a value exceeding 150 µg/m³
(i.e., the maximum exceeds 150 µg/m³), ranges from 0.14 to 0.82 depending
on the site location, closeness of the sites, annual GM's, etc.
(8) The primary and secondary standards are not equivalent in the
sense that if an area (station) meets the annual standard, the likelihood
of exceeding (violating) the daily average standard is considerably higher
for the secondary standard than for the primary standard.
If one allows a greater number of violations per year, the likelihood
of 15 or more violations of the daily average for the secondary standard
150 µg/m³ is approximately consistent with one (1) violation of the primary
standard 260 µg/m³, given that the mean satisfies the secondary standard
for the annual GM.
(9) The time series analyses for three stations in Philadelphia,
Pennsylvania, show remarkable similarity: the correlation between
observations on successive days is between 0.33 and 0.38, and periodicities
of approximately 7 days/cycle, 10 days/cycle, and 16-20 days/cycle are
indicated in the spectral analyses. Further analyses of this type would
need to be performed for other station locations in order to make any
general inferences.
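
As an illustration of the quantities in item (9), the sketch below
converts a period in days/cycle to the angular frequency in radians/day
used on the spectral density plots, and computes a lag-k sample
autocorrelation. The function names and the divisor n in the
autocovariance are assumptions made for this sketch; the report does not
state which estimator was used.

```python
import math


def period_to_angular_frequency(period_days):
    """Angular frequency (radians/day) of a cycle lasting period_days."""
    return 2.0 * math.pi / period_days


def lag_autocorrelation(series, lag):
    """Sample autocorrelation at the given lag.

    Uses the divisor n for every lag (an assumed convention, not
    necessarily the one used in the report).
    """
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev) / n                       # lag-0 autocovariance
    ck = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / n
    return ck / c0


# Periods quoted in item (9), expressed in radians/day.
for period in (7, 10, 16, 20):
    omega = period_to_angular_frequency(period)
    print(f"{period:>2} days/cycle -> {omega:.2f} radians/day")
```

Under this conversion, periods of 7, 10, 16, and 20 days/cycle correspond
to roughly 0.90, 0.63, 0.39, and 0.31 radians/day.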
-------
References
[1] Grubbs, Frank E., "Table of Critical Values for T (One-sided Test)
When Standard Deviation is Calculated from the Same Sample,"
Technometrics, Vol. 11, No. 1, February 1969, p. 4.
[2] Herdan, G., Small Particle Statistics, Amsterdam: Elsevier, 1953.
[3] Aitchison, J. and J. A. C. Brown, The Lognormal Distribution,
Cambridge: University Press, 1957.
[4] Environmental Protection Agency, "Air Quality Data for 1967 from
the National Air Surveillance Networks (Revised 1971)," Research
Triangle Park, North Carolina, August 1971.
-------
Appendix A
Condensed Interim Report
on
Impact of Secondary Air Quality Standards
on Selected AQCR's
-------
1. Introduction
The first part of Phase I of this task has as its objective the
investigation of the statistical relationships between the present
secondary standard for suspended particles (150 µg/m³) maximum 24-hour
concentration not to be exceeded more than once per year, and a hypotheti-
cal standard with 2, 5, and 10 violations allowed annually. The frequency
distributions satisfying the above are then to be compared to both the
secondary and primary standards.
This condensed interim report is divided into three sections. Following
this introduction, Section 2 compares the primary and secondary standards
as to their consistency; and Section 3 contains an analysis of data on
concentrations of particles for urban regions,1/ and some other pertinent
analyses.
Throughout this study the average 24-hour concentration of particles
is assumed to have the LN distribution. Thus if X is the measure of the
concentration of particles, log X = Y has a normal distribution. Hence,
all computations for percentiles of Y can be immediately transformed to
the corresponding percentiles for X by the relationship X = 10^Y. The
geometric mean (GM) and the standard geometric deviation (SGD) of X are
the transforms of the mean (µ) and standard deviation (σ) of Y, i.e.,
GM = 10^µ and SGD = 10^σ.
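
The relationships above can be sketched in a few lines of Python. The
sample values are hypothetical and chosen only to make the arithmetic
easy to follow; the function names and the n - 1 divisor for the
standard deviation are assumptions of this sketch, not taken from the
report.

```python
import math


def geometric_mean_and_sgd(concentrations):
    """Return (GM, SGD) of positive values via base-10 logarithms."""
    logs = [math.log10(x) for x in concentrations]          # Y = log X
    n = len(logs)
    mu = sum(logs) / n                                      # mean of Y
    # Sample standard deviation of Y (n - 1 divisor is an assumption).
    sigma = math.sqrt(sum((y - mu) ** 2 for y in logs) / (n - 1))
    return 10 ** mu, 10 ** sigma                            # GM, SGD


def lognormal_percentile(gm, sgd, z):
    """X at standard normal deviate z: 10**(mu + z*sigma) = GM * SGD**z."""
    return gm * sgd ** z


# Hypothetical concentrations, for illustration only.
gm, sgd = geometric_mean_and_sgd([10.0, 100.0, 1000.0])
print(f"GM = {gm:.1f}, SGD = {sgd:.1f}")
```

Because a percentile of Y is µ + zσ, the corresponding percentile of X
is GM times SGD raised to the power z, which is what the second function
returns.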
1/ Air Quality Data for 1967 from the National Air Surveillance Networks
and Contributing State and Local Networks, Revised 1971. U.S. Environmental
Protection Agency.
-------
2. Comparison of the Standards
It is immediately clear that the primary and secondary standards
are not statistically equivalent, nor should they be, since they were
set independently of statistical data, for health and visibility
reasons. However, when the ability of a region to comply with the
standards is considered, it becomes necessary to compare them and to
evaluate the cost of compliance versus the benefits. The following
table illustrates how the standards are inconsistent.
                              Table 1
      Expected Numbers of Samples Having a Suspended Particle
       Concentration Exceeding the Daily Average Requirement
                      (n = 365 samples/year)

                    Primary Standard    Secondary Standard
                       (75, 260)            (60, 150)
                        GM = 75              GM = 60
    SGD = 1.82            6.9                 23.0
    SGD = 1.59            1.3                  8.5
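
The entries of Table 1 can be approximated under the lognormal
assumption as 365 times the probability that a daily value exceeds the
daily standard. The following is a minimal sketch of that calculation,
not the authors' program; the function name is hypothetical, and
`statistics.NormalDist` from the Python standard library supplies the
normal distribution function.

```python
import math
from statistics import NormalDist


def expected_exceedances(gm, sgd, daily_standard, n_samples=365):
    """Expected number of samples above daily_standard for lognormal X."""
    # Standardized distance of the daily standard from the GM, in log units.
    z = math.log(daily_standard / gm) / math.log(sgd)
    return n_samples * (1.0 - NormalDist().cdf(z))


for sgd in (1.82, 1.59):
    primary = expected_exceedances(75, sgd, 260)     # GM = 75, limit 260
    secondary = expected_exceedances(60, sgd, 150)   # GM = 60, limit 150
    print(f"SGD = {sgd}: primary {primary:.1f}, secondary {secondary:.1f}")
```

This sketch gives values close to the tabulated ones; for SGD = 1.59 and
the secondary standard it comes out slightly higher than the tabulated
8.5.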
3. Analysis of Data
Based on the data in Table 2.1,1/ which give the frequency distributions
of particle concentration for urban areas (328 stations), the distributions
of the GM and SGD are summarized in Table 2. The distributions of the
observed GM's and SGD's for the 328 stations located in urban areas are
given in Figure 1. From these distributions, the following percentiles
are estimated.
1/ Air Quality Data for 1967 from the National Air Surveillance Networks
and Contributing State and Local Networks, Revised 1971. U.S. Environmental
Protection Agency.
-------
                     Table 2
           Percentiles of Distributions

                   10th      50th      90th
    GM (µg/m³)      51        80       125
    SGD            1.45      1.60      1.82
In order to relate the observed results to the likelihood of compliance
with the standards, two sets of curves were developed (one set for each
standard) to give the expected number of violations based on 365 samples
from a LN distribution with specified GM and SGD given by points of the
curves. That is, the values of the GM and SGD were determined so that
the expected number of violations in 365 samples was exactly 1, 2, 5, 10
(and 20 for the case of the secondary standard). Figures 2A and 2B give
these curves for the primary and secondary standards.
The following computational procedure was used in determining the
two sets of curves shown in Figures 2A and 2B. The daily average
concentrations of particles were assumed to have a lognormal distribution
with specified GM. The SGD was then determined as a function of the
standard for daily averages and the allowed number of violations. The
standard for the daily average was taken to be 260 µg/m³ for Figure 2A
and 150 µg/m³ for Figure 2B. A series of values of the GM were assumed and
the corresponding SGD's computed. The following figures indicate the
approach, and a specific computation is given for GM = 75, one violation
in 365 samples.
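
One way to carry out this kind of computation is sketched below. It
solves for the SGD at which the expected number of exceedances in 365
samples equals the allowed number, by inverting the normal distribution
function. The function name is hypothetical and the code is an
illustration under the lognormal assumption, not the original
calculation; it assumes the GM is below the daily standard and that
fewer than half of the samples are allowed to exceed it.

```python
import math
from statistics import NormalDist


def sgd_for_expected_violations(gm, daily_standard, violations, n_samples=365):
    """SGD giving the stated expected number of exceedances of daily_standard."""
    # Normal deviate exceeded with probability violations / n_samples.
    z = NormalDist().inv_cdf(1.0 - violations / n_samples)
    # ln(standard / GM) = z * ln(SGD), solved for SGD.
    return math.exp(math.log(daily_standard / gm) / z)


sgd = sgd_for_expected_violations(gm=75, daily_standard=260, violations=1)
print(f"GM = 75, one violation in 365 samples: SGD = {sgd:.2f}")
```

Repeating the call over a range of GM values, with violations set to 1,
2, 5, 10, or 20, would trace out curves of the kind described for
Figures 2A and 2B.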
-------
These curves of Figures 2A and 2B permit one to quickly count the
number of stations "currently" (based on 1967 data) able to comply with
the standards, both daily average and annual GM. The following values
were obtained by these counts.
                                 Table 3
                    Percentage of Stations Complying
                        with Specified Standards

                      Secondary Standard            Primary Standard
                       (60, 150 µg/m³)               (75, 260 µg/m³)
    Average No.    Percent of     Percent of     Percent of     Percent of
    of Violations  Stations       Stations       Stations       Stations
    Allowed in     Complying      Complying      Complying      Complying
    365 Samples    with Mean      with Both      with Mean      with Both
                   Requirement    Requirements   Requirement    Requirements
         1             15             11             56             39
         2             20             15             69             42
         5             27             18             76             43
        10             37             21             81             44
        20             48             23             —              —
-------
Figure 1. Distribution of Geometric Means and Standard Geometric Deviations
Source: Air Quality Data for 1967
[Probability plot; axes: percentage and probits]
-------
Figure 2A. Values of GM and SGD Which
Yield the Specified Average Number of
Violations (V), Daily Averages Exceeding
260 µg/m³, Based on 365 Samples.
[Horizontal axis: Geometric Mean (GM), µg/m³]
-------
Figure 2B. Values of GM and SGD Which
Yield the Specified Average Number of
Violations (V), Daily Averages Exceeding
150 µg/m³, Based on 365 Samples.
[Horizontal axis: Geometric Mean (GM)]
-------