GUIDELINE SERIES
OAQPS NO. 1.2-015
GUIDELINES FOR THE EVALUATION
OF AIR QUALITY DATA
U.S. ENVIRONMENTAL PROTECTION AGENCY
Office of Air Quality Planning and Standards
Monitoring and Data Analysis Division
Research Triangle Park, North Carolina 27711
450R74005
-------
TABLE OF CONTENTS
PREFACE
1. INTRODUCTION
2. BASIC CONVENTIONS FOR HANDLING AIR QUALITY DATA
2.1. Significant Figures
2.2. Minimum Detectable Limit
3. CHARACTERISTIC PATTERNS OF AIR QUALITY DATA
3.1. Seasonal Patterns
3.2. Diurnal Patterns
3.3. Frequency Distributions
4. SUMMARIZING AIR QUALITY DATA
4.1. Indicating Typical Values
4.2. Indicating Maximum Values
4.3. Indicators of Spread
5. MAKING INFERENCES FROM AIR QUALITY DATA
5.1. Inferences About a Particular Site
5.2. Inferences About a Region
6. SOME STATISTICAL TESTS
6.1. Student's t-test
6.2. Non-Parametric Quantile Test
7. BASIC MEANS OF OBTAINING AIR QUALITY DATA
-------
LIST OF TABLES AND FIGURES
TABLE 1 Suggested Reporting Accuracy for Raw Data
TABLE 2 Minimum Detectable Limits for Selected Measurement Techniques
TABLE 3 Number of Hours Above Oxidant Standard by Month and Time
        of Day (1971 Data)
TABLE 4 Maximum and Second High Values (Phila.) for Various
        Sampling Schemes
TABLE 5 Geometric Means, Medians, and 90th Percentile Values for
        Table 4
TABLE 6 Summary Criteria for Continuous Measurements
TABLE 7 Probability of Selecting Two or More Days When Site is
        Above Standard
TABLE 8 NADB Output for Common Questions on Air Quality
FIGURE 1 Graphs of Monthly Averages for Various Pollutants at a
         Particular Site
FIGURE 2 Graphs of Seasonal Patterns for Various Pollutants at a
         Particular Site
FIGURE 3 Frequency Distribution - TSP (Phila.)
-------
PREFACE
The Monitoring and Data Analysis Division of the Office
of Air Quality Planning and Standards has prepared this
guideline entitled "Guidelines for the Evaluation of Air
Quality Data" for use by the Regional Offices of the Environmental
Protection Agency. The purpose of the report is to provide guidance
on current air quality data evaluation techniques. Adherence to the
guidance presented in the report should help ensure mutually
compatible ambient air quality data evaluation by all States and
Regions. Further, it should minimize any risks involved in policy
decisions concerning National Ambient Air Quality Standards.
This report will serve on an interim basis until more
specific and detailed guidance on this subject is developed.
-------
1. INTRODUCTION
The purpose of this guideline document is to present
the basic elements of air quality data analysis that are
essential in preparing reports describing the air quality
status of a given region. With this aim in mind, emphasis has
been placed upon describing both the conventions and the
methodology to be employed, with minimum discussion of the
associated statistical theory. Much of the material presented
has been treated before but, for the sake of completeness, is
reiterated in this document with appropriate references indicated.
Since the phrase "air quality data" covers a variety
of possible data sets, it is convenient to indicate the
exact meaning of this phrase as used in this paper. For present
purposes, the term "air quality data" refers to a set of
observations for a particular pollutant having the following
properties:
1. All measurements were made at the same site.
2. Uniform methodology was employed.
3. All measurements have the same averaging time.
It should be noted that the statistical treatments described
here for such a data set constitute a minimum effort. There
are a variety of more sophisticated techniques available that
could be used to extract more information from the data. In
general, the degree of effort devoted to data analysis should
be consistent with the value associated with the data. This
can be viewed in financial terms as the cost of data analysis
versus the cost of data collection, or the cost of data analysis
versus the potential cost of control strategies, etc. In most
cases, the extent of the data analysis phase is determined by a
subjective judgment of what is appropriate. It should also be
noted that no matter how extensive the data analysis effort is,
the end result can be no better than the original data. This
point is particularly important because the following
discussions do not analyze the errors inherent in the
measurement method. Therefore, it is essential that the air
quality data analyst be aware of the shortcomings in the data
and that conclusions that are "statistically significant" be
carefully evaluated to determine whether they are "really
significant."
2. BASIC CONVENTIONS FOR HANDLING AIR QUALITY DATA
Before discussing the analysis of air quality data, it
is essential that certain basic conventions be presented for
handling the raw data. These conventions are introduced to
prevent the air quality summaries from appearing to be more
accurate than the data warrants. These conventions have been
discussed previously (Nehls and Akland, 1973) and are repeated
here since they are the procedures presently employed by EPA
in maintaining the National Aerometric Data Bank.
-------
The two topics treated in this section both relate to
the relative precision of the raw data with respect to the
methodology employed in obtaining the measurement. The first
topic concerns the number of significant figures that should
be reported, while the second deals with values that are below
the minimum detectable limit.
2.1. Significant Figures
The number of significant figures that are meaningful
for a particular air quality measurement is limited by the
methodology employed. To use more significant figures than
is warranted by the sensitivity of the analytical procedure
adds no real information and can often be misleading.
Table 1 presents the suggested reporting accuracy for raw data
for various pollutants. While the conventions apply to the
raw data, it is also useful to specify the accuracy of geometric
and annual means. For simplicity, the general convention is
that all means be reported to one more significant digit than
the raw data.
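The reporting convention can be illustrated with a short sketch. It assumes half-up rounding (the guideline does not state a rounding rule) and reads "one more significant digit" as one more decimal place than the raw data; the zero-decimal setting for suspended particulate follows Table 1, while the helper names and sample values are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP
from statistics import mean


def round_to_places(value, places):
    """Round to a fixed number of decimal places, half-up (rounding rule assumed)."""
    quantum = Decimal(1).scaleb(-places)  # places=0 -> 1, places=1 -> 0.1
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)


def report_raw_and_mean(raw_values, raw_places):
    """Round raw data to the tabulated accuracy; report the mean one place finer."""
    rounded = [round_to_places(v, raw_places) for v in raw_values]
    mean_reported = round_to_places(mean(rounded), raw_places + 1)
    return rounded, mean_reported


# Hypothetical suspended particulate values (ug/m3), reported to 0 decimal places.
rounded, mean_reported = report_raw_and_mean([101.6, 87.2, 144.5], raw_places=0)
print([str(v) for v in rounded], mean_reported)  # ['102', '87', '145'] 111.3
```

Using `Decimal` rather than the built-in `round` avoids binary floating-point surprises and Python's default round-half-to-even behavior.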
2.2. Minimum Detectable Limit
Some reported pollutant measurements are below the limit
of detection for the analytical procedure. In such cases,
the reported number should be viewed as representing a range
from zero to the minimum detectable. However, in order to
use such data in computing annual summary statistics such as
-------
TABLE 1 - SUGGESTED REPORTING ACCURACY FOR RAW DATA

                                    Number of Decimal Places
Pollutant                                    ug/m3
Suspended Particulate Matter                   0
Benzene Soluble Organic Matter                 1
Sulfates                                       1
Nitrates                                       1
Ammonium                                       1
Sulfur Dioxide                                 0
Nitrogen Dioxide                               0
Nitric Oxide                                   0
Carbon Monoxide                                1
Total Oxidants                                 0
Total Hydrocarbons                             1
Ozone                                          0
Methane                                        1
1
2
2
2
0
2
1
3
.1-
-------
geometric means, it is convenient to have a convention indicating
what value should be substituted for a measurement below the
minimum detectable. As a general rule, each value below the
minimum detectable is replaced by a value approximately equal
to one-half the minimum detectable. Table 2 indicates selected
minimum detectable limits used by the National Aerometric Data
Bank (NADB) for various analytical methods. A complete listing
may be obtained from the National Air Data Branch, EPA, Research
Triangle Park, N.C. 27711.
The mid-point substitution was selected after examining the
statistical distribution of the data (Nehls and Akland, 1973).
It should be noted that in comparing data over several years,
a standard minimum detectable should be used unless it has
changed by an order of magnitude.
In preparing summary statistics, if more than 25% of the
observations are less than the minimum detectable, no statistics
are computed from the data.
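The two rules above (substitute one-half the minimum detectable, and compute nothing if more than 25% of values fall below it) can be sketched as follows. This assumes that values below the limit are recorded as numbers smaller than the minimum detectable; how the NADB actually flags them is not stated here. The function names, sample values, and the 5.0 ug/m3 limit are hypothetical.

```python
from statistics import geometric_mean  # available in Python 3.8+


def substitute_below_mdl(values, mdl):
    """Replace each value below the minimum detectable with one-half the MDL."""
    return [mdl / 2 if v < mdl else v for v in values]


def geometric_mean_with_mdl(values, mdl, max_fraction_below=0.25):
    """Geometric mean after substitution, or None if more than 25% are below the MDL."""
    n_below = sum(1 for v in values if v < mdl)
    if n_below > max_fraction_below * len(values):
        return None  # too many values below the MDL: no statistics are computed
    return geometric_mean(substitute_below_mdl(values, mdl))


# Hypothetical 24-hour values (ug/m3) with an assumed MDL of 5.0 ug/m3.
print(geometric_mean_with_mdl([12.0, 3.0, 20.0, 8.0], mdl=5.0))  # 1 of 4 below: computed
print(geometric_mean_with_mdl([3.0, 4.0, 20.0, 8.0], mdl=5.0))   # 2 of 4 below: None
```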
3. CHARACTERISTIC PATTERNS OF AIR QUALITY DATA
Before summarizing any data, some thought should be given
to the characteristics of the raw data. This is particularly
true of air quality data, for which strong seasonal and diurnal
patterns may affect the interpretation of the data. For
example, the maximum hourly oxidant value for a year based on
4,000 observations could have completely different meanings,
depending upon whether the observations were made primarily
during the winter or the summer. This section presents
-------
TABLE 2
MINIMUM DETECTABLE LIMITS FOR SELECTED MEASUREMENT TECHNIQUES

                        Collection                                       Minimum
Pollutant               Method         Analysis Method           Units   Detectable
Suspended Particulate   Hi-Vol         Gravimetric               ug/m3   1.0
Nitrate                 Hi-Vol         Reduction-Diazo Coupling  ug/m3   .05
Sulfate                 Hi-Vol         Colorimetric              ug/m3   .5
Carbon Monoxide         Instrumental   Nondispersive Infra-Red   mg/m3   .575
Sulfur Dioxide          Gas Bubbler    West-Gaeke Sulfamic Acid  ug/m3   5.0
Total Oxidants          Instrumental   Colorimetric Neutral KI   ug/m3   19.6
-------
examples of some of these patterns. The analysis of these
patterns can frequently be an end in itself, since they provide
insight into the behavior of the pollutant. An awareness
of these patterns also provides a means of screening the data
for anomalous values. It should be noted that while the
following discussion is general in nature, the characteristic
pattern at a given site is a function of local factors such
as emissions and meteorology; as a consequence, the characteristic
pattern may be specific to that site or locality.
3.1. Seasonal Patterns
Figure 1 displays graphs of monthly averages for various
pollutants at a particular site. Superimposed on these graphs
is a smooth curve selected to emphasize the long-term trend in
the data. Figure 2 displays smoothed curves illustrating the
seasonal patterns in the data. The intensity of the seasonal
pattern for a particular pollutant may vary from site to site
within an area, depending upon factors such as proximity to point
sources. A knowledge of the seasonality of a pollutant can
provide useful information for interpreting the data, since it
suggests the season in which maximum concentrations would be
expected.
3.2. Diurnal Patterns
In addition to seasonal patterns, some pollutants also have
pronounced diurnal patterns. These patterns may be due to
factors such as solar radiation and traffic density, which
influence pollution levels.
-------
FIGURES 1 AND 2 - GRAPHS OF MONTHLY AVERAGES AND SEASONAL PATTERNS
FOR VARIOUS POLLUTANTS AT A PARTICULAR SITE
[Graphs not reproduced. Legible panel titles: nitric oxide
(instrumental colorimetric); suspended particulate (Hi-Vol
gravimetric); total hydrocarbons (flame ionization); total oxidants
(instrumental colorimetric neutral KI); sulfur dioxide (instrumental
conductimetric); nitrogen dioxide (instrumental colorimetric); oxides
of nitrogen (instrumental colorimetric). Concentrations are in
ug/cu meter (25 C); horizontal axes are in years.]
-------
Table 3 summarizes the 1971 oxidant data for the Downtown
Los Angeles site operated by the Los Angeles Air Pollution Control
District. The number of times that the national oxidant standard
was exceeded is presented by month and hour of the day.
The marginal totals indicate both the diurnal pattern and the
seasonal pattern.
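A month-by-hour tally with marginal totals, of the kind shown in Table 3, can be produced from hourly records. The sketch below is illustrative only: it assumes each record is a (timestamp, concentration) pair and that "above the standard" means strictly greater than the threshold passed in. The function name, the sample records, and the threshold value are hypothetical.

```python
from collections import Counter
from datetime import datetime


def exceedance_table(hourly, standard):
    """Count hours above `standard` by (month, hour of day), plus marginal totals.

    hourly: iterable of (datetime, concentration) pairs, one per hour.
    """
    cells = Counter()
    for stamp, value in hourly:
        if value > standard:  # "above" taken as strictly greater (an assumption)
            cells[(stamp.month, stamp.hour)] += 1
    by_month, by_hour = Counter(), Counter()
    for (month, hour), count in cells.items():
        by_month[month] += count  # row totals: seasonal pattern
        by_hour[hour] += count    # column totals: diurnal pattern
    return cells, by_month, by_hour, sum(cells.values())


# Hypothetical hourly oxidant records and threshold, for illustration only.
records = [
    (datetime(1971, 7, 1, 12), 200.0),
    (datetime(1971, 7, 1, 13), 170.0),
    (datetime(1971, 8, 2, 12), 100.0),
    (datetime(1971, 8, 3, 12), 165.0),
]
cells, by_month, by_hour, total = exceedance_table(records, standard=160.0)
print(dict(by_month), dict(by_hour), total)
```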
3.3. Frequency Distributions
One characteristic pattern of air quality data that is
particularly important becomes apparent after examining some
frequency distributions. Many quantities are assumed to have
a distribution that is symmetric about the average, such as the
normal distribution. Figure 3 shows the frequency distribution
for total suspended particulate data from Philadelphia; it is
apparent that this distribution is not symmetric. However,
Figure 4 shows the frequency distribution for the logarithms of
these same data. This distribution is more symmetric and may
be approximated by a normal curve. Data having this property
is said to be log-normally distributed, and this is a common
assumption regarding air quality data (Larsen, 1971).
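A simple numerical companion to Figures 3 and 4 is the skewness coefficient, which is near zero for a symmetric sample and positive for a long upper tail. This is a minimal sketch using hypothetical values constructed so that their logarithms are evenly spaced. Skewness is offered here as one convenient check, not as a procedure from this guideline.

```python
import math


def skewness(xs):
    """Moment coefficient of skewness: about 0 if symmetric, > 0 for a long upper tail."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


# Hypothetical data constructed so that the logarithms are evenly spaced.
raw = [math.exp(v) for v in (4.0, 4.3, 4.6, 4.9, 5.2)]
print("raw data:", skewness(raw))
print("log data:", skewness([math.log(x) for x in raw]))
```

The raw values show positive skewness, while their logarithms are symmetric. This is the same contrast the two figures display for the Philadelphia data.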
4. SUMMARIZING AIR QUALITY DATA
In preparing a summary of air quality data, one of the
most important steps is to determine the purpose of the
summary. The usual use of these summaries is to indicate
typical levels and peak levels. This section discusses
some of the basic statistics that can be used for this purpose.
-------
TABLE 3 NUMBER OF HOURS ABOVE OXIDANT STANDARD
BY MONTH AND TIME OF DAY (1971 DATA)
DOWNTOWN LOS ANGELES

Total by month:
JAN  FEB  MAR  APR  MAY  JUN  JUL  AUG  SEP  OCT  NOV  DEC  TOTAL
  8   16   12   44   16   65   83   70   46   31    2    0    393

Total by hour (hour-of-day columns, in table order):
  1   10   43   73   82   84   59   31    7    3            393
-------
FIGURE 3 - FREQUENCY DISTRIBUTION - TSP (PHILADELPHIA-1969)
[Histogram not reproduced.]
FIGURE 4 - FREQUENCY DISTRIBUTION - LOG OF TSP DATA (PHILADELPHIA-1969)
[Histogram not reproduced.]
-------
The first two subsections discuss the treatment of typical
and peak values. The third discusses the spread of the data.
4.1. Indicating Typical Values
This section discusses the arithmetic mean, the median,
and the geometric mean as indicators of typical values. The
geometric mean and the median are frequently used in air
pollution studies because of certain properties of the log-normal
distribution. In choosing the appropriate statistic, the purpose
of the summary must be considered. While all three may indicate
typical values, if the purpose of the summary is to compare
the data to the National Ambient Air Quality Standards, then
the standard suggests the appropriate statistic. Another commonly
used statistic for indicating typical values is the mode, the
value that occurs most frequently. The mode is not discussed
here, since it is frequently of little value in summarizing air
quality data. For example, the mode for oxidant could be near
the minimum detectable because of low values throughout the night.
Arithmetic Mean
Given a set of n observations, say $X_1, X_2, \ldots, X_n$,
the arithmetic mean is simply
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
When the term "average" is used, the arithmetic mean
is usually what is meant.
-------
Median
The median is the middle value of the data. That
is, if the data is ranked in order of magnitude so that
$X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$, then the median is
$$X_{((n+1)/2)} \text{ if } n \text{ is odd, and } \tfrac{1}{2}\left(X_{(n/2)} + X_{(n/2+1)}\right) \text{ if } n \text{ is even.}$$
The median is a convenient statistic that is not
influenced as much as the arithmetic mean by changes in the
extremely high or low values of the distribution.
Geometric Mean
Given a set of n observations, say $X_1, X_2, \ldots, X_n$,
the geometric mean is
$$g = \left(X_1 X_2 \cdots X_n\right)^{1/n}$$
Since this is probably the least intuitive of the
statistics presented, it is worthwhile to discuss it in
more detail.
If a distribution is symmetric, such as the normal
distribution, then the expected values of the arithmetic mean
and the median are identical. However, for a log-normally
distributed variable, it is the expected value of the geometric
mean that approximates the expected value of the median.
Therefore, since some air pollutants have a distribution that
is approximately log-normal, the geometric mean came into use as
a convenient method of summarizing the data, and for total
suspended particulate the annual standards are expressed as
geometric means.
-------
As an alternate computational formula, it should
be noted that
$$\log g = \frac{1}{n}\sum_{i=1}^{n} \log X_i \quad\text{or}\quad g = \exp\left[\frac{1}{n}\sum_{i=1}^{n} \ln X_i\right]$$
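The three indicators of typical values can be compared on a small sample. This sketch uses Python's `statistics.mean` and `statistics.median`, along with two hypothetical helpers for the geometric mean: one follows the product definition and the other the alternate logarithmic form. The data values are hypothetical.

```python
import math
from statistics import mean, median


def geometric_mean_product(xs):
    """g = (X1 * X2 * ... * Xn) ** (1/n); the product can overflow for a long record."""
    return math.prod(xs) ** (1 / len(xs))


def geometric_mean_log(xs):
    """g = exp[(1/n) * sum of ln Xi]; the alternate form, numerically safer."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


data = [50.0, 100.0, 200.0]  # hypothetical TSP values, ug/m3
print(mean(data), median(data), geometric_mean_log(data))
```

For this right-skewed sample, the arithmetic mean (about 116.7) exceeds both the median and the geometric mean (both 100). This illustrates why the latter two are preferred for log-normal data.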
4.2. Indicating Maximum Values
As in the previous section, the purpose of the summary
is a critical factor in determining the appropriate statistic.
Maximum values may be indicated by listing the maximum and/or
the second highest value. The second highest value is important
since compliance with the short-term air quality standards is
determined by this value. However, there are other statistics
that are useful for indicating maximum values. The principal
difficulty with using the second highest value is that it does
not allow for differences in sample sizes. For example, if
two monitoring devices are side by side and one monitors every
day of the year while the other monitors only every sixth day,
the second high value for the every-day device would be expected
to be higher than that for the every-sixth-day device, even
though both monitored the same air. Table 4 illustrates how the
second high value may vary with sampling frequency, based upon
total suspended particulate data from a Philadelphia site that
sampled daily.
To allow for this dependence upon sample size, various
percentiles are sometimes used to indicate maximum values.
For example, the 99th percentile might be used for hourly
data, while the 90th might be appropriate for daily measurements.
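The effect of sampling frequency on the second highest value, compared with a percentile, can be explored by thinning a daily record. This sketch uses the nearest-rank percentile definition, which is an assumption because the guideline does not specify one. The helper names and the steadily rising "daily" series are hypothetical placeholders, not the Philadelphia data.

```python
import math


def second_highest(xs):
    """Second largest value in the sample."""
    return sorted(xs, reverse=True)[1]


def percentile_nearest_rank(xs, p):
    """p-th percentile by the nearest-rank rule (one of several definitions)."""
    ordered = sorted(xs)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def every_kth_day(daily, k, start=0):
    """Simulate an intermittent schedule by keeping every k-th daily value."""
    return daily[start::k]


daily = list(range(1, 366))  # hypothetical, steadily rising daily values
sixth = every_kth_day(daily, 6)
for label, xs in (("daily", daily), ("every sixth day", sixth)):
    print(label, len(xs), second_highest(xs), percentile_nearest_rank(xs, 90))
```

Varying `start` from 0 to 5 reproduces the idea behind the six "every sixth day" rows in Table 4, where each offset is a different subsample of the same daily record.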
-------
TABLE 4
MAXIMUM AND SECOND HIGH VALUES (PHILADELPHIA-1969)
FOR VARIOUS SAMPLING SCHEMES

Sampling Schedule      Observations   Maximum   Second Highest
Everyday                   365          325          244
Every Sixth Day             61          219          215
   "                        61          195          171
   "                        61          244          238
   "                        61          215          211
   "                        61          325          234
   "                        60          239          205
Every Fifteenth Day         25          205          176
   "                        25          325          207
   "                        25          239          191
   "                        25          219          196
   "                        25          234          165
   "                        24          201          198
   "                        24          215          211
   "                        24          195          183
   "                        24          183          173
   "                        24          195          169
   "                        24          160          154
   "                        24          244          139
   "                        24          215          201
   "                        24          179          171
   "                        24          238           ?
-------
By using a percentile value, allowance is made for varying
sampling frequencies from site to site and year to year.
Table 5 indicates the 90th percentile for the sampling
schedules used in Table 4.
4.3. Indicators of Spread
In addition to an indication of typical and peak values,
it is also desirable to have a measure of how variable the
data is. Did it fluctuate widely, or were all values fairly
uniform? The customary statistics for this purpose are either
the arithmetic standard deviation or the geometric standard
deviation. Ranges or percentiles could also be used, depending
upon the desired use of the summary, but they are not discussed
here. The basic formulas for the arithmetic and the geometric
standard deviations are given below.
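Let $X_1, X_2, \ldots, X_n$ be a set of n observations.
Then the arithmetic standard deviation is:
$$s = \left[\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2\right]^{1/2}, \quad\text{where}\quad \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
and the geometric standard deviation is
$$s_g = \exp\left\{\left[\frac{1}{n}\sum_{i=1}^{n}\left(\ln X_i - \ln g\right)^2\right]^{1/2}\right\}$$
where g is the geometric mean.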
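Both spread formulas translate directly into code. This minimal sketch keeps the divisor n, as written in the formulas, rather than n - 1. The function names and sample values are hypothetical.

```python
import math


def arithmetic_sd(xs):
    """[(1/n) * sum (Xi - Xbar)^2] ** 1/2, using divisor n as in the guideline's formula."""
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)


def geometric_sd(xs):
    """exp{[(1/n) * sum (ln Xi - ln g)^2] ** 1/2}, where g is the geometric mean."""
    n = len(xs)
    logs = [math.log(x) for x in xs]
    ln_g = sum(logs) / n  # ln of the geometric mean
    return math.exp(math.sqrt(sum((v - ln_g) ** 2 for v in logs) / n))


print(arithmetic_sd([50.0, 100.0, 200.0]), geometric_sd([50.0, 100.0, 200.0]))
```

The geometric standard deviation is a dimensionless factor. For example, a value near 1.76 means typical values lie within a factor of about 1.76 above or below g.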
5. MAKING INFERENCES FROM AIR QUALITY DATA
Once the air quality data has been summarized, it is in a
convenient form to be examined so that conclusions can be drawn
regarding air quality. At this point the data is either
-------
TABLE 5 GEOMETRIC MEANS, MEDIANS, AND 90TH PERCENTILE VALUES
FOR SAMPLING DATA OF TABLE 4

                                       Geometric             90th
Sampling Schedule      Observations      Mean      Median   Percentile
Everyday                   365           102.5        97       171
Every Sixth Day             61            99.3       105       162
   "                        61            95.2        93       155
   "                        61           113.6       113       188
   "                        61           107.2       101       177
   "                        61           106.4       105       171
   "                        60              ?         94       153
Every Fifteenth Day         25           100.2       111       175
   "                        25           114.6       121       178
   "                        25           125.0       130        ?
   "                        25           104.9        95        ?
   "                        25           100.3       105        ?
   "                        24            99.8        90       190
   "                        24           104.4        98        ?
   "                        24           102.4        99        ?
   "                        24            92.1         ?        ?
   "                        24           100.8         ?       162
   "                        24              ?          ?       140
   "                        24           104.6        97       186
   "                        24           107.2       109       173
   "                        24            94.1         ?        ?
   "                        24              ?         98       165
-------
extremely useful or extremely dangerous, depending upon the
quality of the summary. This section discusses these inferences
to illustrate the potential dangers that can result from
inadequate summaries. For convenience, the discussion is divided
into two parts. The first deals with inferences about a
particular site, while the second deals with inferences about
a region.
5.1. Inferences About a Particular Site
This section discusses inferences that can be made about
a given site from one year's data for a particular pollutant.
Since any conclusions based upon the data can be no better
than the data itself, the most important part of the summary
is to decide whether the data gives adequate annual coverage.
This relates directly to the previous discussion of characteristic
patterns. If an annual average is to be computed from the
data, then it is essential that all portions of the year be
represented equally. An examination of the seasonality that
exists for certain pollutants shows why this is essential.
As a convenient rule, it may be assumed that if each calendar
quarter contains at least 20% of the total observations, then
the sample is adequately balanced. If this is not the case,
then a more appropriate way to determine the annual average is to
use a weighted mean calculated as follows:
(1) determine the average for each quarter, and
(2) compute the average of these four quarterly averages.
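The balance rule and the quarterly weighting described above can be sketched as follows. This assumes observations are given as (month, value) pairs. The function names and sample record are hypothetical, and the function simply raises an error if a quarter has no data at all, a case the rule does not address.

```python
from statistics import mean


def quarter_of(month):
    """Calendar quarter (1-4) for a month number (1-12)."""
    return (month - 1) // 3 + 1


def annual_mean(observations, min_share=0.20):
    """Annual mean with the quarterly balance rule.

    observations: list of (month, value) pairs.
    If every quarter holds at least 20% of the observations, the plain mean is
    used; otherwise the mean of the four quarterly means is returned (this
    raises StatisticsError if a quarter has no data at all).
    """
    quarters = {q: [] for q in (1, 2, 3, 4)}
    for month, value in observations:
        quarters[quarter_of(month)].append(value)
    n = len(observations)
    if all(len(vals) >= min_share * n for vals in quarters.values()):
        return mean(value for _, value in observations)
    return mean(mean(vals) for vals in quarters.values())


# Hypothetical record with the first quarter over-represented.
unbalanced = [(1, 10), (2, 10), (3, 10), (4, 20), (7, 30), (10, 40)]
print(annual_mean(unbalanced))  # quarterly weighting gives 25, not the plain mean of 20
```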
While the previous constraint applies to the seasonal balance
of the sample, it is also essential to have a restriction on
the minimum number of observations required to compute
an annual mean. Such constraints are employed in the National
Aerometric Data Bank system (Nehls and Akland, 1973) and, to
maintain uniformity, they are repeated here. For continuous
measurements, at least 75% of the total possible observations
should be present before summary statistics are calculated.
The exact requirements are given in Table 6. For intermittent
sampling data, there must be at least five observations
per quarter, and if one month has no observations, the remaining
two months in that quarter must both have at least two
observations. While these conventions are used in general, it is
of course possible to modify them for certain applications.
For the most part, the general intention of these restrictions
is to ensure that the observations are sufficiently representative
of the entire year to calculate an annual mean. For
peak value statistics, such as the number of times a certain
value is exceeded, the constraint is not essential in showing
violations. For example, two hourly oxidant values in excess
of the standard are sufficient to show non-compliance even if
there were no other observations that year. Nevertheless, to
assess the extent of the problem, data sufficient to meet the
requirements for determining a mean would be advantageous,
although for seasonal pollutants it could suffice to summarize
only particular quarters or months.
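The intermittent-sampling criteria can be expressed as a per-quarter check. This is a minimal sketch. The rule as stated covers only the case of a single empty month, so treating a quarter with two or more empty months as incomplete is an assumption. The function names and sample months are hypothetical.

```python
from collections import Counter


def quarter_is_complete(month_counts):
    """month_counts: observation counts for the three months of one quarter."""
    if sum(month_counts) < 5:
        return False  # at least five observations per quarter
    empty = month_counts.count(0)
    if empty >= 2:
        return False  # assumption: not addressed by the stated rule
    if empty == 1:
        # One empty month: each of the other two months needs at least two.
        return all(c >= 2 for c in month_counts if c > 0)
    return True


def year_is_complete(months):
    """months: list of observation months (1-12), one entry per sample."""
    counts = Counter(months)
    for q in range(4):
        quarter = [counts[3 * q + m] for m in (1, 2, 3)]
        if not quarter_is_complete(quarter):
            return False
    return True


# Hypothetical schedule: counts of (2, 2, 1) in every quarter.
months = [1, 1, 2, 2, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8, 9, 10, 10, 11, 11, 12]
print(year_is_complete(months), year_is_complete(months[:-1]))
```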
In discussing the inferences that can be made from a given
sample, it is worth observing that while the annual mean can be
either under- or overestimated, the maximum and second
high values can only be underestimated, assuming no instrumental
error. For example, if a simple hypergeometric probability
-------
TABLE 6
SUMMARY CRITERIA FOR CONTINUOUS MEASUREMENTS

Time Interval             Minimum Number of Observations
3-hour running average    3 consecutive hourly observations
8-hour running average    6 hourly observations
24-hour                   18 hourly observations
Monthly                   21 daily averages
Quarterly                 3 consecutive monthly averages
Yearly                    9 monthly averages with at least
                          two monthly averages per quarter
-------
model is assumed, Table 7 shows the probability of detecting
violations of the short-term standard as a function of
sampling frequency. From this table it may be seen that if
samples are taken every sixth day, the probability of detecting
two excursions above the standard is less than 50% unless the
site actually exceeds the standard 10 days per year. This
illustrates the weaknesses associated with determining maximum
values on the basis of intermittent sampling.
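The hypergeometric calculation behind Table 7 can be reproduced in a few lines. Like the text, this sketch assumes the sampled days are a random selection, without replacement, from a 365-day year. `prob_two_or_more` is a hypothetical helper name. Rounded to two decimals, its results match several Table 7 entries, such as .03, .13, and .52 in the 61/365 column. The table appears to report very high probabilities as .99.

```python
from math import comb  # Python 3.8+


def prob_two_or_more(days_in_year, sampled_days, excursion_days):
    """P(at least 2 excursion days among the sampled days), hypergeometric model."""
    total = comb(days_in_year, sampled_days)
    clean = days_in_year - excursion_days
    p0 = comb(clean, sampled_days) / total  # no excursion day sampled
    p1 = excursion_days * comb(clean, sampled_days - 1) / total  # exactly one sampled
    return 1.0 - p0 - p1


for k in (2, 4, 10):
    print(k, round(prob_two_or_more(365, 61, k), 2))
```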
Two possible solutions to this problem are (1) to
intensify sampling schedules or (2) to use mathematical
equations to extrapolate from the data to predict maximum
values. At the present time, there is no convenient predictive
formula that can be applied on a general basis to give sufficiently
accurate maximum values. As a guide, the predictive formula
developed by Larsen (1971), based on the log-normal distribution,
may be used to determine the possible magnitude of the
underestimation due to intermittent sampling. However, this
empirical model assumes log-normality and independence and
should not be used to determine compliance with the standards,
since its predictive accuracy has not been fully documented.
5.2. Inferences About a Region
Once conclusions have been drawn for each site in a region,
the next step is to draw conclusions concerning the region. If
any one of the sites exceeds the NAAQS, then the region is not
in compliance. It should also be pointed out that the worst
-------
TABLE 7 PROBABILITY OF SELECTING TWO OR MORE DAYS WHEN SITE
IS ABOVE STANDARD

Actual No.          Sampling Frequency - Days per Year
of Excursions       61/365     122/365     183/365
      2               .03        .11         .25
      4               .13        .41         .69
      6               .26        .65         .89
      8               .40        .81         .96
     10               .52        .90         .99
     12               .62        .95         .99
     14               .71        .97         .99
     16               .78        .98         .99
     18               .83        .99         .99
     20               .87        .99         .99
     22               .91        .99         .99
     24               .93        .99         .99
     26               .95        .99         .99
-------
site in the region may still underestimate the magnitude of
the air pollution problem. The only way in which a site may
overestimate the air pollution problem is if it is not
representative of the air to which receptors are exposed. Other
guideline documents discuss this subject. While it is
relatively easy to compare the air quality in a region with
the NAAQS, it is not so easy to compare one region with another.
For example, one region may choose to concentrate most of its
monitoring efforts at sites having high pollution potential, while
another region may have numerous sites monitoring background
levels. Therefore, extreme caution should be used if such
comparisons must be made, and particular attention should be
given to the placement of monitoring sites.
6. SOME STATISTICAL TESTS
When making inferences from air quality data, it is frequently
necessary to have some objective means of making judgments. This
is the point at which statistical inference becomes useful. The
previous treatment has used statistics merely for descriptive
purposes, in order to conveniently summarize the data. The
purpose of statistical inference is to objectively substantiate
generalizations made from the data. For this reason, two basic
statistical tests are discussed.
While these statistical tests are relatively straightforward,
a certain degree of caution is required regarding the underlying
assumptions that determine their validity. Since one of these
assumptions is particularly important in applications dealing
with air quality data, it will be discussed in detail.
In statistics, it is commonly assumed that the data to be
analyzed is a random sample of all the data and that the
measurements are independent. While this may be approximately
true for intermittent data collected on a sampling scheme
comparable to that employed by the NASN, it may not be true for
all samples. For the most part, these statistical assumptions
are merely a mathematical formulation of common-sense ideas.
Certainly, if data were only collected on Sundays, it would
not be expected that the average of these numbers is truly
representative of the annual average. Sampling schedules that
only monitor certain days of the week result in non-random
samples, and their degree of usefulness is inherently limited.
The problem of independence is somewhat more subtle. For
example, successive hourly oxidant measurements are not
independent. While the concept of statistical independence
may be clearly defined in mathematical terms, it is possible
to present an intuitive notion of what it entails. Two
numbers may be thought of as being independent if knowing
one of the numbers does not help in guessing what the other
number is. The classical example of this is rolling dice, in
which knowing what number occurred on one die does not improve
a guess of what number occurred on the other. With this in
mind, it is apparent that knowing one hourly oxidant value
helps in guessing what the next hourly value will be. It
should be noted that it is not necessary that it make the
guess a certainty, only that it improve the chances of guessing
correctly.
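One common way to quantify how much one value helps predict the next is the lag-1 autocorrelation coefficient. For a long record of independent values it tends to be near zero. When high values follow high values, as with successive hourly oxidant readings, it is positive. This is a minimal sketch with hypothetical series. The coefficient is offered as an illustration of the idea rather than as a test prescribed by this guideline.

```python
def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation of a series of numbers."""
    n = len(xs)
    m = sum(xs) / n
    num = sum((xs[t] - m) * (xs[t + 1] - m) for t in range(n - 1))
    den = sum((x - m) ** 2 for x in xs)
    return num / den


smooth = [1, 2, 3, 4, 5, 4, 3, 2, 1]   # hypothetical: rises and falls gradually
alternating = [1, 5, 1, 5, 1, 5]       # hypothetical: each value "undoes" the last
print(lag1_autocorrelation(smooth), lag1_autocorrelation(alternating))
```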
With the ideas of randomness and independence in mind, it
is possible to present two statistical techniques that are
generally useful in practice. The first test is commonly known
as Student's t-test and is useful for examining the mean. The
second test is the non-parametric quantile test; despite the
rather elegant name, it is a convenient test for the median
and other percentiles and is very easy to use.
6.1. Student's t-test
The Student's t-test is a commonly used statistical test for data that may be assumed to be normally distributed. As mentioned earlier, air pollution concentrations are frequently assumed to be log-normally distributed, so the t-test may be applied to the logarithms of the data. The application of this technique to determine confidence intervals for annual geometric means has been discussed by Hunt (1972) and is briefly treated here. The present discussion examines the construction of a confidence interval for an annual mean. Comparisons of two means may also be performed but are not treated here, since the approach is almost identical and can be found in basic statistical texts. More general tests concerning trends at a site are examined in the guideline document for trend analysis.
The basic application is that a set of data from an intermittent monitoring device has been obtained. This data has been used to determine the annual geometric mean. Since this data represents only a fraction of the total number of days in the year, the question arises as to how close the mean of the data is to the actual annual mean. The statistical technique employed for this purpose is the confidence interval, which allows a probability statement to be made regarding the range of the true annual mean.
To calculate a 95% confidence interval for the geometric mean, the interval is first constructed for the arithmetic mean of the logarithms (natural logarithms, to match the exponential used below). To do this, the following calculations are necessary:
Let
$$\bar{x}_{\log} = \frac{1}{n}\sum_{i=1}^{n} \log x_i ,$$
where n is the sample size.
Let
$$s_{\log} = \left[\frac{\sum_{i=1}^{n}\left(\log x_i - \bar{x}_{\log}\right)^2}{n-1}\right]^{1/2} .$$
Let
$$d = t_{1-\alpha/2}\,\frac{s_{\log}}{\sqrt{n}}\left(1 - \frac{n}{N}\right)^{1/2} ,$$
where $t_{1-\alpha/2}$ is obtained from a table for Student's t-test with n - 1 degrees of freedom, 1 - α is the confidence level, and N is the possible number of samples, e.g., 365 for daily samples.
Then the lower and upper confidence limits for the geometric mean, denoted L and U respectively, are given by
$$L = \exp\left(\bar{x}_{\log} - d\right)$$
and
$$U = \exp\left(\bar{x}_{\log} + d\right) .$$
It should be noted that the above formulas include the finite population correction factor, (1 - n/N), applied to the variance of the mean (hence its square root in d), since it is assumed that the population size is finite rather than infinite. For example, in considering daily measurements it is assumed that the population size is 365, i.e., the total number of days in the year.
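The calculation above can be sketched in Python as follows. This is a minimal sketch, assuming natural logarithms (to match EXP) and assuming the user supplies the tabled value of t for n - 1 degrees of freedom, as the text describes; the function name `geometric_mean_ci` and the sample values are hypothetical. The Python standard library has no Student's t quantile function, so the table value is passed in rather than computed.

```python
import math
import statistics


def geometric_mean_ci(values, t_value, population_size):
    """Confidence limits (L, U) for the annual geometric mean.

    values: positive measurements from an intermittent sampler
    t_value: t_{1-alpha/2} with n - 1 degrees of freedom, from a table
    population_size: N, the possible number of samples (e.g. 365 for daily)
    """
    n = len(values)
    if n < 2 or n > population_size:
        raise ValueError("need 2 <= n <= N")
    if any(x <= 0 for x in values):
        raise ValueError("measurements must be positive to take logarithms")
    logs = [math.log(x) for x in values]
    mean_log = statistics.fmean(logs)          # x-bar_log
    s_log = statistics.stdev(logs)             # uses the n - 1 divisor
    fpc = math.sqrt(1 - n / population_size)   # finite population correction
    d = t_value * s_log / math.sqrt(n) * fpc
    return math.exp(mean_log - d), math.exp(mean_log + d)


if __name__ == "__main__":
    # Hypothetical TSP values (ug/m3); 2.262 is the tabled 95% t value for 9 df.
    sample = [45, 62, 38, 90, 55, 71, 49, 66, 103, 58]
    low, high = geometric_mean_ci(sample, t_value=2.262, population_size=365)
    print(f"95% limits for the geometric mean: {low:.1f} to {high:.1f}")
```

As n approaches N the correction factor drives d toward zero: if every possible day were sampled, the sample geometric mean would be the annual geometric mean, apart from measurement error.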
6.2. Non-Parametric Quantile Test
In discussing the t-test it was pointed out that it is necessary to assume that the logarithms of the air pollution measurements are normally distributed. In some cases, it may not be desirable to make this assumption; for example, an examination of the data may show that such an assumption is unwarranted. For such cases, non-parametric statistical tests are appropriate, since they do not require any assumptions regarding the form of the underlying distribution. Moreover, non-parametric tests are frequently quite easy to employ, since many of the calculations are relatively simple. A variety of non-parametric tests are available; a more detailed description of the test discussed here is given in the text by Conover (1971).
Quantile is a more general term than percentile: the p quantile (0 < p < 1) is the 100p-th percentile, so the median is the .5 quantile. For the present discussion, the test is used to examine the median, but it may also be applied to any other percentile or quantile. It is also assumed that there are more than 20 observations, since this is generally true for air quality problems and permits a normal approximation to be used in place of tables.
Let $x_1, x_2, \ldots, x_n$ be a sample of air quality measurements, and suppose it is desired to test whether the annual median is greater than a specific value, say s.
Then it is only necessary to calculate the following two values:
T = the number of sample values less than or equal to s,
and
$$t = pn + w\sqrt{np(1-p)} ,$$
where n is the sample size, p is the quantile value, and w is the α quantile of a standard normal random variable.
For tests at the .05 level, w = -1.645.
For tests concerning the median, the quantile value is p = .5, so the above formula becomes
$$t = 0.5n - 1.645\sqrt{0.25n} \approx 0.5n - 0.822\sqrt{n} .$$
If T is less than t then the conclusion may be stated that
"the median is greater than s" and that the result was obtained
by employing the quantile test "at the 5% level."
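The quantile test above can be sketched in Python using only the standard library. This is a minimal sketch under the same large-sample assumption (more than 20 observations); the function name `quantile_test_greater` and the example data are hypothetical, and `statistics.NormalDist().inv_cdf` is used to obtain w in place of a normal table. Setting p to a value other than .5 tests another percentile in the same way.

```python
import math
from statistics import NormalDist


def quantile_test_greater(values, s, p=0.5, alpha=0.05):
    """Large-sample quantile test of whether the p quantile exceeds s.

    Returns (T, t, conclude_greater), where T counts values <= s and
    t = p*n + w*sqrt(n*p*(1-p)) with w the alpha quantile of N(0, 1).
    """
    n = len(values)
    if n <= 20:
        raise ValueError("the normal approximation assumes more than 20 observations")
    T = sum(1 for x in values if x <= s)
    w = NormalDist().inv_cdf(alpha)  # about -1.645 when alpha = .05
    t = p * n + w * math.sqrt(n * p * (1 - p))
    return T, t, T < t


if __name__ == "__main__":
    # Hypothetical data: 100 readings with values 1, 2, ..., 100.
    readings = list(range(1, 101))
    # T = 30 and t is about 41.78, so conclude the median is greater than 30.
    print(quantile_test_greater(readings, s=30))
    # T = 48 is not below t, so no conclusion at the 5% level.
    print(quantile_test_greater(readings, s=48))
```

For n = 100 the computed t agrees with the median formula above, 0.5(100) - 0.822(10), to within rounding.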
7. Basic Means of Obtaining Air Quality Data
One station continuously monitoring oxidant can produce 8,760 hourly observations in a year (24 hours x 365 days). Therefore, considerable caution should be exercised when requesting air quality data, since there is a considerable risk of being inundated with unnecessary numbers. Usually, when questions arise concerning air quality, the answer may be given in terms of summary statistics, and it is not necessary to review the raw data. Certain basic sources include the various periodic reports from State and local agencies as well as EPA's reports on the NASN and CAMP monitoring efforts. Overview reports with extensive appendices, such as The National Air Monitoring Program: Air Quality and Emissions Trends Annual Report, are also available.
The National Aerometric Data Bank (NADB) provides many summary files that may be accessed by time-sharing terminals. In addition, the NADB provides printouts containing general information that may be easily looked up with no need to access the computer. Table 8 lists frequent questions and a readily available source for each.
TABLE 8
NADB OUTPUT FOR COMMON QUESTIONS ON AIR QUALITY
Question
What data is available nationwide for a
particular pollutant?
Source
Inventory by pollutant
What data is available for a particular
geographical region?
Inventory by site
What was the maximum value at a site (annual)?
Any inventory
What was the mean value at a site (annual)?
Any inventory
(if valid year)
How many observations (annual)?
Any inventory
Status of a site with respect to NAAQS?
Frequency Distribution
Time Sharing Option (TSO)
Quarterly or monthly data
Raw data
Description of the site, such as UTM coordinates, county, operating agency, etc.
Site File
REFERENCES
1. Conover, W. J., "Practical Nonparametric Statistics," John Wiley and Sons, Inc., New York, 1971.
2. Hunt, W. F., Jr., "The Precision Associated with the Sampling Frequency of Log-Normally Distributed Air Pollutant Measurements," Journal of the Air Pollution Control Association, Volume 22, No. 9, September 1972.
3. Larsen, R. I., "A Mathematical Model for Relating Air Quality Measurements to Air Quality Standards," AP-89, 1971.
4. Nehls, G. J. and G. G. Akland, "Procedures for Handling Aerometric Data," Journal of the Air Pollution Control Association, Volume 23, No. 3, March 1973.
5. "Guidelines for the Evaluation of Air Quality Trends, Interim Guidelines," U. S. Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, N. C., OAQPS No. 1.2-014, December 1973.