OAQPS 78-9, II-A-9
ANALYSIS OF PEAK DAILY MAXIMUM NO2 VALUES
AND ASSOCIATED ANNUAL AVERAGES IN 1979-1981 DATA
FEBRUARY 1983
Thomas McCurdy
Richard B. Atherton
Ambient Standards Branch
Strategies and Air Standards Division
U.S. Environmental Protection Agency
Research Triangle Park, N.C.
-------
This document has been prepared by staff of the Office of Air
Quality Planning and Standards and is being circulated for technical
review and public comment. Anyone wishing to comment on this report
may address their remarks to Thomas McCurdy, MD-12, U.S. Environmental
Protection Agency, Research Triangle Park, N.C. 27711.
-------
INTRODUCTION
The Staff Paper¹ developed as part of the ongoing review of the
current nitrogen dioxide (NO2) national ambient air quality standard contains
material on monitored NO2 air quality. Data used in the Staff Paper are from
the 1977-1979 time period. More recent NO2 data are now available and were
analyzed to determine whether the relationships and conclusions discussed in
the Staff Paper still hold. This brief report documents one such analysis,
focused mainly on whether the relationships shown in Figure 5 of the
Paper² are seen in current air quality data.
THE NEW DATA BASE
The Environmental Protection Agency (EPA) maintains numerous computer
files of aerometric data. The file of interest for our purposes is called
SAROAD, or Storage and Retrieval of Aerometric Data. SAROAD is updated
periodically by State and local air pollution control agencies. The most
recent full year of NO2 data in SAROAD as of this writing is
1981. Since EPA usually allows States to use three years of data in
determining compliance with a NAAQS, the data base used here is for the
time period 1979-1981.³
The data were first screened to identify all urbanized areas of the
country that exceeded the current annual average standard of 0.053 ppm
(100 µg/m3). Only sites having a valid annual average were considered.⁴
Six urban areas and one non-urban area in the U.S. were found to exceed the
annual average standard in at least one of the last three years. Only two of
these areas exceed the standard level using a continuous monitoring method;
the remaining five areas that exceed the annual average use a non-continuous
(24-hour) method.⁵ In all cases where one of these areas has both continuous
and non-continuous data, the area meets the annual standard under the first
method but does not under the second. This situation indicates that the
non-continuous sampling method may be overestimating NO2 daily concentrations
(and, consequently, annual averages) or that the continuous sampling method
underestimates NO2 concentrations (and, consequently, annual averages).
The methods appear to be biased in different directions.⁶
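The equivalence between 0.053 ppm and 100 µg/m3 quoted above can be checked with a short calculation. This is a minimal sketch that assumes reference conditions of 25 °C and 1 atm (molar volume of about 24.45 L/mol); the report does not state the conditions it used, and the function name is illustrative only.

```python
# Convert an NO2 mixing ratio in ppm to a mass concentration in ug/m3.
# Assumption (not stated in the report): reference conditions of 25 deg C
# and 1 atm, at which one mole of ideal gas occupies about 24.45 litres.

NO2_MOLAR_MASS_G_PER_MOL = 46.0055
MOLAR_VOLUME_L_PER_MOL = 24.45  # ideal gas at 25 deg C, 1 atm


def ppm_to_ug_per_m3(ppm, molar_mass=NO2_MOLAR_MASS_G_PER_MOL):
    """Return the mass concentration (ug/m3) for a mixing ratio in ppm."""
    return ppm * 1000.0 * molar_mass / MOLAR_VOLUME_L_PER_MOL


annual_standard = ppm_to_ug_per_m3(0.053)
print(f"0.053 ppm NO2 is about {annual_standard:.1f} ug/m3")
```

Under these assumptions the result falls just under 100 µg/m3, which is consistent with the rounded figure in the text.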
If only the most recent valid year of data are used in the seven areas
instead of the highest year in the last three, only three areas exceed the
0.053 ppm annual average standard. Only three areas exceed the standard if
the average of the valid 1979-1981 annual averages is computed. (However,
two of the seven areas have only one year of data so a three year average
could not be computed.)
After the annual average data were examined, the maximum 1-hour NO2
values were screened to identify all monitoring sites in the U.S. that had
at least one 1-hour NO2 value ≥ 0.15 ppm in 1979-1981. From this list,
the site with the highest 1-hour value and a valid annual average in each
county was identified. These 63 sites are listed in Table 1. This table
forms the basis for most of the analyses presented in this report.
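The county-level screening described above can be sketched as follows. The records, county names, and site identifiers are hypothetical and exist only to illustrate the selection rule (highest 1-hour value per county among sites with a valid annual average and a peak of at least 0.15 ppm); this is not the SAROAD retrieval actually used.

```python
# Hypothetical monitoring records:
# (county, site_id, highest_1hr_ppm, has_valid_annual_average)
records = [
    ("County A", "site-1", 0.18, True),
    ("County A", "site-2", 0.22, False),  # highest peak, but no valid annual average
    ("County A", "site-3", 0.16, True),
    ("County B", "site-4", 0.14, True),   # below the 0.15 ppm screen
    ("County C", "site-5", 0.31, True),
]


def screen_sites(records, cutpoint=0.15):
    """Return {county: (site_id, peak)} for the highest qualifying site per county."""
    best = {}
    for county, site, peak, valid in records:
        if not valid or peak < cutpoint:
            continue
        if county not in best or peak > best[county][1]:
            best[county] = (site, peak)
    return best


selected = screen_sites(records)
for county, (site, peak) in sorted(selected.items()):
    print(county, site, peak)
```

Because the rule maximizes the 1-hour peak rather than the number of days over a cutpoint, a county's selected site is not necessarily the one with the most exceedance days, as footnote 7 points out.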
-------
Table 1
NO2 SITES WITH 1ST HIGH HOUR ≥ 0.15 PPM AND
A VALID ANNUAL AVERAGE* IN 1979-1981

[The site number, county, year, and annual average entries for the 25 sites
on this page are not legible in this copy. The legible state codes are 01,
03, 05, and 06. Rows are numbered in their original order.]

         Number of Daily Max 1-Hour
         Values ≥ Value Indicated (ppm)
Row      0.15  0.20  0.25  0.30  0.35  0.40
  1        18     7     3     2     1     1
  2         4     1     0     0     0     0
  3         1     1     0     0     0     0
  4         2     0     0     0     0     0
  5         9     1     0     0     0     0
  6         4     1     0     0     0     0
  7        57    20     6     3     2     1
  8         1     0     0     0     0     0
  9        38    24    16     8     4     2
 10         4     1     1     1     0     0
 11         3     0     0     0     0     0
 12        19     2     1     0     0     0
 13        38    11     1     0     0     0
 14         3     0     0     0     0     0
 15         2     0     0     0     0     0
 16        28     7     1     0     0     0
 17         1     0     0     0     0     0
 18         3     1     1     0     0     0
 19         3     1     0     0     0     0
 20         2     1     1     1     0     0
 21        23     4     1     0     0     0
 22         1     0     0     0     0     0
 23         8     2     0     0     0     0
 24         1     0     0     0     0     0
 25        10     4     3     1     0     0
-------
                                                         Number of Daily Max 1-Hour
                                               Annual    Values ≥ Value Indicated (ppm)
Site #            County               Year    Average   0.15  0.20  0.25  0.30  0.35  0.40
14 1220 039 F01   Cook, IL             1981     .032        1     1     0     0     0     0
15 0640 003 F01   Clark, IN            1981     .016        1     0     0     0     0     0
15 2040 030 F01   Marion, IN           1981     .030        5     3     2     1     1     1
15 4220 002 J02   Vanderburgh, IN      1981     .014        1     0     0     0     0     0
18 2380 034 G01   Jefferson, KY        1979     .035        1     0     0     0     0     0
21 0120 004 H01   Baltimore, MD        1979     .032        7     2     1     0     0     0
21 0680 001 G01   Balti-Essex, MD      1981     .025        1     0     0     0     0     0
22 2640 012 F01   Worcester, MA        1979     .026        4     1     1     0     0     0
23 1180 020 G01   Wayne, MI            1980     .036        1     0     0     0     0     0
26 4280 064 H01   St. Louis, MO        1980     .035       16     6     2     1     1     0
26 ?040 001 G01   St. Louis City, MO   1979     .028       11     4     1     0     0     0
29 0320 001 G01   Clark, NV            1980     .038       23    12    10     7     6     5
31 5560 002 F01   Cumberland, NJ       1979     .024        3     0     0     0     0     0
31 3480 008 F01   Essex, NJ            1979     .044        7     3     1     0     0     0
31 0180 003 F01   Hudson, NJ           1981     .029        1     0     0     0     0     0
32 0040 015 H02   Bernalillo, NM       1980     .020        1     0     0     0     0     0
33 4680 003 F01   Bronx, NY            1979     .038        6     1     0     0     0     0
33 0660 005 F01   Erie, NY             1979     .024        1     0     0     0     0     0
33 5760 004 F01   Monroe, NY           1979     .019        1     0     0     0     0     0
36 1460 004 F01   Franklin, OH         1980     .020       10     4     1     1     0     0
36 1220 019 H05   Hamilton, OH         1979     .036        3     2     1     0     0     0
36 7760 007 I01   Mahoning, OH         1979     .050       43    25    12     2     1     0
36 1000 001 H02   Stark, OH            1980     .029        1     0     0     0     0     0
38 1460 002 F01   Multnomah, OR        1980     .028        1     0     0     0     0     0
39 7260 031 G01   Allegheny, PA        1981     .028        2     0     0     0     0     0
-------
                                                         Number of Daily Max 1-Hour
                                               Annual    Values ≥ Value Indicated (ppm)
Site #            County               Year    Average   0.15  0.20  0.25  0.30  0.35  0.40
39 7620 009 F01   Berks, PA            1979     .033        2     0     0     0     0     0
39 1080 012 F01   Bucks, PA            1979     .029        3     0     0     0     0     0
39 3880 401 F01   Delaware, PA         1979     .031        2     0     0     0     0     0
39 9430 016 F01   Luzerne, PA          1979     .035        4     1     0     0     0     0
39 0780 017 F01   Northampton, PA      1979     .029        3     2     0     0     0     0
45 3910 022 F01   Ector, TX            1979     .013        1     0     0     0     0     0
45 1880 003 F01   Tarrant, TX          1980     .021        8     2     0     0     0     0
46 0060 001 F01   Davis, UT            1980     .023        6     1     0     0     0     0
46 0920 001 F01   Salt Lake, UT        1979     .031        4     0     0     0     0     0
48 0080 009 H01   Alexandria, VA       1979     .035       10     4     2     0     0     0
48 0200 020 G01   Arlington, VA        1979     .030        3     2     0     0     0     0
48 1060 014 G01   Fairfax, VA          1979     .029        3     0     0     0     0     0
50 0280 004 F01   Kanawha, WV          1981     .021        1     0     0     0     0     0

Mean Value                                      .031      8.8   3.0   1.3   0.5   0.2   0.2
Median Value                                    .030      3.8   1.0     0     0     0     0

*See footnote 4 in the text for the definition of a valid annual
average. The unit of the annual average in this Table is parts per
million (ppm).
-------
The first three major columns of Table 1 present monitoring site and
county identifiers and the year from which the NO2 data in the
remaining columns come. The next column lists the annual average in
parts per million. The remaining six columns contain simple counts of the
number of days in the year of interest with a daily maximum 1-hour value
equal to or exceeding certain specified "cutpoint" values. The cutpoint
values run from 0.15 ppm to 0.40 ppm in 0.05 ppm increments.
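The counts in the last six columns can be reproduced from a site's daily maxima with a few lines of code. This is a minimal sketch; the daily maximum values below are hypothetical and are not taken from any site in Table 1.

```python
# Cutpoints from Table 1, written as literals to avoid floating-point drift
# that would arise from building them as 0.15 + 0.05 * k.
CUTPOINTS_PPM = (0.15, 0.20, 0.25, 0.30, 0.35, 0.40)


def days_at_or_above(daily_max_ppm, cutpoints=CUTPOINTS_PPM):
    """Return {cutpoint: number of days with a daily maximum >= cutpoint}."""
    return {c: sum(1 for v in daily_max_ppm if v >= c) for c in cutpoints}


# Hypothetical daily maximum 1-hour NO2 values (ppm) for one site.
daily_max = [0.08, 0.15, 0.21, 0.12, 0.30, 0.17, 0.26]

counts = days_at_or_above(daily_max)
for cutpoint, n_days in counts.items():
    print(f">= {cutpoint:.2f} ppm: {n_days} day(s)")
```

The counts can only stay the same or fall as the cutpoint rises, which is the pattern visible across each row of Table 1.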
Every site in Table 1 has at least one day with a 1-hour maximum of
0.15 ppm or higher, since that was the basis for developing the list
originally. The county with the most days over a 0.15 ppm daily maximum
has 57 days over this cutpoint.⁷ The mean number of days over 0.15 ppm is
8.8, but the median is a much lower 3.8 days.
As the daily maximum cutpoint values increase, the number of days at or
over the cutpoint value decreases dramatically. For a 0.20 ppm 1-hour daily
maximum cutpoint, the number of days over that value varies from zero
to 25. The mean value for the data set is 3 days over a 0.20 ppm daily
maximum, and the median value is one day. The mean number of days over a
0.25 ppm cutpoint is 1.3 and the median value is zero,
because most of the sites have no days over 0.25 ppm. In fact, the median
value for the remaining daily maximum cutpoints is zero for the same reason.
A graph of the mean number of days over the daily maximum cutpoints appears
as Figure 1.
REPETITIVE DAYS WITH 0.15 PPM OR HIGHER 1-HOUR MAXIMUM
The Staff Paper expresses concern about multiple-day exposure to NO2
hourly peaks above certain cutpoints. This issue was investigated to some
extent in this analysis.
All sites listed in Table 1 that had more than 15 days with a 1-hour
peak of 0.15 ppm or higher were identified. These 10 sites are listed in
Table 2. A simple count of the number of consecutive days with a 1-hour
daily maximum ≥ 0.15 ppm also appears in the Table. Not unexpectedly,
most days with a maximum 1-hour value over 0.15 ppm occur in isolation.
Next come two days in succession, then three, and so forth. (The
monotonically declining pattern stops at six days, however.)
There are a few relatively long-term events having numerous consecutive
days with a daily maximum 1-hour value of 0.15 ppm or higher. They are
highlighted in the Table 2 column entitled "Other." Three sites have 20,
13, and 8 days in succession, respectively, over a 0.15 ppm daily maximum value.
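The consecutive-day tally can be illustrated with a run-length count. This is a sketch under stated assumptions: the daily series is hypothetical, and the report does not say how days with missing data were handled, so the sketch simply treats the input as a gap-free sequence. The last check uses the run counts printed for site 01 1860 in Table 2.

```python
from collections import Counter


def run_lengths(daily_max_ppm, cutpoint=0.15):
    """Count runs of consecutive days with a daily maximum >= cutpoint.

    Returns a Counter mapping run length (in days) to number of runs.
    """
    runs = Counter()
    current = 0
    for value in daily_max_ppm:
        if value >= cutpoint:
            current += 1
        else:
            if current:
                runs[current] += 1
            current = 0
    if current:  # close a run that reaches the end of the series
        runs[current] += 1
    return runs


def total_days(runs):
    """Total number of days represented by a run-length tally."""
    return sum(length * count for length, count in runs.items())


# Hypothetical daily maxima (ppm): runs of 1, 2, 3, and 1 days.
series = [0.16, 0.10, 0.17, 0.18, 0.09, 0.20, 0.15, 0.16, 0.05, 0.19]
tally = run_lengths(series)
print(dict(tally), total_days(tally))

# Run counts for site 01 1860 as printed in Table 2: the total should be 18.
site_01_1860 = Counter({1: 7, 2: 2, 3: 1, 4: 1})
print(total_days(site_01_1860))
```

Multiplying each run length by its frequency and summing recovers the "Total" column of Table 2, which is a convenient consistency check on the tallies.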
An examination of the 0.20, 0.25, and 0.30 ppm cutpoint values was
made concerning the repetitive days issue. The data are listed in Table 3.
-------
Figure 1
MEAN NUMBER AND RANGE OF THE NUMBER OF DAYS ≥
SPECIFIED 1-HOUR NO2 DAILY MAXIMUM VALUES IN
U.S. COUNTIES IN 1979-1981
[Plot. Horizontal axis: Daily Maximum 1-Hour NO2 Value (ppm), 0.15 to 0.40.
Vertical axis: number of days, scale marked from 10 to 60.]
-------
Table 2
NUMBER OF CONSECUTIVE DAYS HAVING A
DAILY MAXIMUM 1-HOUR NO2 VALUE
≥ 0.15 PPM IN 1979-1981 FOR SELECTED SITES*

             Number of Consecutive Days with a
                 Daily Maximum ≥ 0.15 ppm
                                                   Other
Site #       1    2    3    4    5    6    7   (# Days/Freq.)   Total
01 1860      7    2    1    1    0    0    0        0             18
05 5820     12    6    2    0    0    0    1       20/1           57
05 0230      5    6    0    2    0    0    0       13/1           38
05 8440      3    3    2    1    0    0    0        0             19
05 6800      7    6    3    0    2    0    0        0             38
05 6980      4    3    2    0    1    0    1        0             28
06 0580     12    2    1    1    0    0    0        0             23
26 4280      7    2    0    0    1    0    0        0             16
29 0320      1    0    2    2    0    0    0        8/1           23
36 7760     13    9    1    1    1    0    0        0             43

Total       71   39   14    8    5    0    2        -            303

*Sites listed in Table 1 having more than 15 days with a daily maximum peak
≥ 0.15 ppm.
-------
Table 3
NUMBER OF CONSECUTIVE DAYS HAVING
A DAILY MAXIMUM 1-HOUR NO2 VALUE
≥ SPECIFIED CUTPOINTS IN 1979-1981
FOR SELECTED SITES*

                     Number of Consecutive Days
                     With a Daily Maximum
                     ≥ Value Shown at Left
Value                                              Other
(ppm)    Site #       1    2    3    4         (# Days/Freq.)
0.20     05 0230      3    4    2    0              7/1
         05 5820      7    2    0    1              5/1
         05 6800      5    3    0    0              0
         29 0320      2    1    0    0              3/1
         36 7760     13    4    0    1              0
0.25     05 0230      2    2    1    0              7/1
         05 5820      3    0    1    0              0
         29 0320      0    1    0    0              8/1
         36 7760      4    2    0    1              0
0.30     05 0230      5    1    0    0              0
         29 0320      1    1    0    1              0

*Sites listed in Table 1 having more than the following number of days with
a daily maximum peak ≥ the value shown:
0.20 ppm = 7 days
0.25 ppm = 4 days
0.30 ppm = 3 days
-------
REGRESSION & OTHER ANALYSES OF THE DATA
Three regression analyses were undertaken on the data presented in
Table 1 in order to determine if a reasonable relationship could be obtained
between NO2 annual average and number of days at or above the cutpoints
mentioned before.⁸ In general, the regressions were not strong and got
worse as the cutpoints increased. For example, for the 0.15 ppm daily
maximum cutpoint, the following R² (coefficient of determination) values
were obtained:

Form of the Regression     R²
Linear                     0.44
Semi-Logarithm             0.32
Log-Log                    0.35

Regressions on the 0.20 ppm set of data showed even lower R² values.
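The three regression forms can be compared with a short script. This is a sketch only: it uses eight (annual average, days ≥ 0.15 ppm) pairs copied from Table 1 rather than all 63 sites, so its R² values will not match those above, and the exact transformation behind the report's "Semi-Logarithm" form is assumed here to be the logarithm of the day count against the untransformed annual average.

```python
import math


def r_squared(x, y):
    """Coefficient of determination for a least-squares line y = a + b*x.

    For a straight-line fit with an intercept this equals the squared
    Pearson correlation between x and y.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


# (annual average in ppm, days >= 0.15 ppm) for eight Table 1 sites.
pairs = [(0.032, 7), (0.026, 4), (0.035, 16), (0.028, 11),
         (0.038, 23), (0.044, 7), (0.020, 10), (0.050, 43)]
x = [avg for avg, _ in pairs]
y = [float(days) for _, days in pairs]
log_x = [math.log(v) for v in x]
log_y = [math.log(v) for v in y]  # every listed site has at least one day

fits = {
    "linear": r_squared(x, y),
    "semi-log": r_squared(x, log_y),   # assumed form: ln(days) vs. average
    "log-log": r_squared(log_x, log_y),
}
for name, value in fits.items():
    print(f"{name:9s} R^2 = {value:.2f}")
```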
Plots of the data indicated a wide scatter, particularly for sites
with annual averages below 0.04 ppm. On the advice of statisticians in
EPA's Data Analysis Section,⁹ the median values of the number of days above
a 0.15 ppm daily maximum were plotted against median values of grouped NO2
annual averages. The groups of annual averages were developed simply by
dividing the annual averages into 0.01 ppm intervals, so that averages
between 0.000-0.009 ppm formed one group, 0.010-0.019 ppm formed another, and
so forth. This plot, along with the observed data, for the 0.15 ppm cutpoint
is depicted as Figure 2. As can be seen, there is a lot of scatter around
the grouped median relationship, which generally follows an exponential
curve.
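The grouping step can be sketched as below. The data are the 25 (annual average, days ≥ 0.15 ppm) pairs for the Illinois through Pennsylvania (Allegheny) rows of Table 1, with annual averages expressed in whole ppb so that the 0.01 ppm bins can be formed with integer arithmetic. Because only part of Table 1 is used, the medians are illustrative and will not reproduce Figure 2.

```python
from collections import defaultdict
from statistics import median

# (annual average in ppb, days >= 0.15 ppm) for 25 rows of Table 1.
sites = [
    (32, 1), (16, 1), (30, 5), (14, 1), (35, 1), (32, 7), (25, 1),
    (26, 4), (36, 1), (35, 16), (28, 11), (38, 23), (24, 3), (44, 7),
    (29, 1), (20, 1), (38, 6), (24, 1), (19, 1), (20, 10), (36, 3),
    (50, 43), (29, 1), (28, 1), (28, 2),
]


def grouped_medians(sites, bin_width_ppb=10):
    """Group sites into annual-average bins and return, for each bin's
    lower edge, (median annual average, median number of days)."""
    groups = defaultdict(list)
    for avg_ppb, days in sites:
        lower_edge = (avg_ppb // bin_width_ppb) * bin_width_ppb
        groups[lower_edge].append((avg_ppb, days))
    return {
        edge: (median(a for a, _ in members), median(d for _, d in members))
        for edge, members in sorted(groups.items())
    }


medians = grouped_medians(sites)
for edge, (med_avg, med_days) in medians.items():
    print(f"{edge:2d}-{edge + 9} ppb: median average {med_avg} ppb, "
          f"median days {med_days}")
```

Taking medians within bins damps the influence of the few sites with very many exceedance days, which is why the grouped curve is smoother than the raw scatter.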
The data in Table 1 do not reflect the amount of monitoring that
occurred during the year of analysis. One site had as many as 361 days of
valid data while another had as few as 257.¹⁰ Consequently, analyses of
the number of days at or above any cutpoint without considering the number of
days of available data can be misleading. The statistic that is more
relevant, and probably more stable, is the proportion of days in a 365-day
year at or above a cutpoint. Looked at another way, this would be the
"expected number" of days in a full year at or above a specified cutpoint.
Such a normalized statistic was developed for the county monitoring
sites listed in Table 1.¹¹ A series of linear, non-linear, and logistic
regression analyses were undertaken on the normalized data to obtain a
"best-fit" relationship between the expected number of days with a daily
maximum NO2 value ≥ specified cutpoints and annual average NO2 concentration.
The analyses used programs in the BMDP statistical package.¹²
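The normalization given in footnote 11 is a simple proportion. The sketch below applies it to a hypothetical count of 10 exceedance days at the two data-completeness extremes mentioned in the text (361 and 257 valid days); the function names are illustrative, and the 18-hour rule for a valid day comes from footnote 10.

```python
def is_valid_day(hours_with_data):
    """A day is valid if at least 75% of its hours (18 of 24) have data."""
    return hours_with_data >= 18


def expected_days(observed_days, valid_days, days_per_year=365):
    """Scale an observed exceedance count to a full 365-day year."""
    if valid_days <= 0:
        raise ValueError("valid_days must be positive")
    return observed_days / valid_days * days_per_year


# The same hypothetical count of 10 observed days yields different expected
# values depending on how many valid days of data the site reported.
for valid in (361, 257):
    print(valid, round(expected_days(10, valid), 1))
```

The comparison shows why raw counts can mislead: the site with fewer valid days has the larger expected number of days even though the observed counts are equal.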
The models listed below were hypothesized to fit the data.¹³ In all
cases the following symbols apply:
-------
Figure 2
GROUPED MEDIAN RELATIONSHIP BETWEEN NUMBER
OF DAYS WITH A DAILY MAXIMUM 1-HOUR NO2
VALUE ≥ 0.15 PPM AND NO2 ANNUAL AVERAGE
[Plot. Horizontal axis: NO2 Annual Average (ppm), 0.00 to 0.06.]
-------
$\hat{y}$ = Estimated expected number of days with a daily maximum
            1-hour NO2 ≥ a specified cutpoint (the dependent
            variable).
$p_i$     = Parameters of the regression equation, where i is from
            1 to 3 depending upon the model being analyzed.
exp       = The base of the natural logarithm, e (2.7183), raised to
            the power contained in the following parentheses.
$x$       = NO2 annual average concentration (the independent
            variable).

Model 1.  Linear Regression.
          $\hat{y} = p_1 x + p_2$
Model 2.  Two-Parameter Exponential.
          $\hat{y} = p_1 \exp(p_2 x)$
Model 3.  Three-Parameter Exponential.
          $\hat{y} = p_1 \exp(\ldots)$
          [the argument of the exponential is illegible in this copy]
Model 4.  Logistic Regression.
          $\hat{y} = 365 \cdot \dfrac{\exp(p_1 + p_2 x)}{1 + \exp(p_1 + p_2 x)}$
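The legible model forms translate directly into code. This is a minimal sketch of Models 1, 2, and 4 as written above; Model 3 is omitted because its full expression cannot be read in this copy. The example call uses the rounded Model 2 parameters printed in Table 4 for the 0.15 ppm cutpoint, with the annual average in parts per billion as the Table 4 footnote requires, so the result is approximate.

```python
import math

DAYS_PER_YEAR = 365


def linear(x, p1, p2):
    """Model 1: y = p1 * x + p2."""
    return p1 * x + p2


def two_parameter_exponential(x, p1, p2):
    """Model 2: y = p1 * exp(p2 * x)."""
    return p1 * math.exp(p2 * x)


def logistic(x, p1, p2):
    """Model 4: y = 365 * exp(p1 + p2*x) / (1 + exp(p1 + p2*x)).

    The result is bounded between 0 and 365 days.
    """
    z = math.exp(p1 + p2 * x)
    return DAYS_PER_YEAR * z / (1.0 + z)


# Model 2 at an annual average of 53 ppb (0.053 ppm), using the rounded
# Table 4 parameters for the 0.15 ppm cutpoint.
estimate = two_parameter_exponential(53, 0.42, 0.08)
print(f"Model 2 estimate at 53 ppb: about {estimate:.0f} days")
```

The logistic form is the only one of the three that cannot predict more than 365 days or fewer than zero, which is the boundedness property cited in footnote 14.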
In addition, a 4-parameter exponential model was attempted, but it
would not converge for the 0.20 ppm or higher daily maximum cutpoints.
Results of the regression analyses are listed in Table 4. The R² values
for Models 2, 3, and 4 are reasonable for the 0.15 and 0.20 ppm daily maximum
cutpoints, but are not very good for the remaining models and cutpoints.
Model 3, the three-parameter exponential, performs slightly better
than the others. However, the improvement in the R² statistic for it is only
marginal over the two-parameter exponential and the logistic regression.
The improvement in curve fitting is not a sufficient reason to favor using
the three-parameter exponential model instead of the logistic model, which
is theoretically more appealing.¹⁴ Consequently, the remainder of this
analysis will assume that the logistic model provides an adequate model of
the relationship of interest between NO2 annual average and expected
number of days with a daily maximum 1-hour NO2 value at or above a specified
concentration.
-------
Table 4
RESULTS OF FITTING SELECTED CURVES
TO EXPECTED NUMBER OF DAYS WITH
A DAILY MAXIMUM 1-HOUR NO2 VALUE ≥
SPECIFIED CUTPOINT VS. NO2 ANNUAL AVERAGE

Daily
Maximum                 Fitted Parameters of the Model
Cutpoint    Model
(ppm)       Number*       p1         p2        p3        R²
0.15          1          0.39     -18.95        -       0.47
              2          0.42       0.08        -       0.66
              3          0.25       0.09      1.43      0.67
              4          5.69      -0.08        -       0.66
0.20          1          0.37      -8.55        -       0.38
              2          0.10       0.09        -       0.54
              3          0.13       0.09     -0.29      0.54
              4          8.38      -0.10        -       0.54
0.25          1          0.17      -4.06        -       0.27
              2          0.05       0.09        -       0.35
              3          0.15       0.07     -0.66      0.36
              4          9.61      -0.11        -       0.34
0.30          1          0.64      -1.48        -       0.14
              2          0.01       0.10        -       0.22
              3          0.01       0.10      0.02      0.22
              4         10.10      -0.10        -       0.22

*See the text for a description of the model associated with each model
number. Model 1 is linear regression, Models 2 and 3 are two- and three-
parameter exponentials, and Model 4 is a logistic regression. For these
equations to be used properly, the NO2 annual average independent variable
must be in units of parts per billion.
-------
A computerized plot of the logistic regression curve appears as Figure 3.
Its shape is quite similar to the grouped median relationship shown in
Figure 2.
To use any parametric model to predict outcomes from sampled data
requires that chance variation in the data be assessed and incorporated
into the model. For many statistical models, this is done using confidence
intervals, or confidence bands. Computing a confidence band for a logistic
curve is a complex undertaking due to the non-normal distribution of the
underlying data. One method to compute the band was found in the literature;
it was developed by statisticians at the University of California.¹⁵
Applying this confidence band computational method to the NO2 air
quality data base provides the 90% and 95% confidence intervals shown in
Figure 4. The confidence bands are quite wide due to the large
scatter in the underlying data. The bands are narrowest at the mid-point
of the NO2 annual average data, as expected.
The algorithms used to develop the confidence bands shown in Figure 4
were exercised to provide estimates of the expected number of days with a
daily maximum 1-hour NO2 value ≥ 0.15 ppm at the 90% and 95% confidence
levels. These estimates are provided in Table 5. For a 0.053 ppm annual
average, the best estimate is 35 days, and the 95% and 90% confidence bands
are 2-230 and 3-200 days, respectively.
One final regression analysis was undertaken on the data. It involved
treating the cutpoint specification (i.e., 0.15 ppm, 0.20 ppm, etc.) as an
independent variable as well as the annual average NO2 concentration. This
formulation uses all the data in the regression analysis instead of partitioning
the data set into different equations for each cutpoint. The resulting output
can be viewed as a three-dimensional relationship (a surface) rather than a
two-dimensional relationship. The three dimensions are (1) number of days per
year at or above a specified daily maximum cutpoint, (2) NO2 annual average,
and (3) the specified daily maximum 1-hour concentration cutpoint. The last
dimension is comprised of a discrete variable having four possible values:
0.15, 0.20, 0.25, and 0.30 ppm. (Thus, the surface is discontinuous.)
Various linear models using exponential independent variables were tested.
In general, they were of the form:

$\hat{Y} = a + \sum_{i=0}^{6} b_{1i}\, x_1^{\,i} + \sum_{i=0}^{6} b_{2i}\, x_2^{\,i}$
-------
Figure 3
LOGISTIC RELATIONSHIP FITTED TO DATA
REPRESENTING THE EXPECTED NUMBER OF DAYS ≥ 0.15 PPM
DAILY MAXIMUM 1-HOUR NO2 CONCENTRATION
[Computer-printed plot. Horizontal axis: NO2 Annual Average Concentration
(ppm), marked from 0.02 to 0.06.]
-------
Figure 4
90% AND 95% CONFIDENCE BANDS FOR THE
LOGISTIC CURVE FITTED TO DATA REPRESENTING
DAILY MAXIMUM 1-HOUR NO2 CONCENTRATIONS ≥ 0.15 PPM
[Plot. Horizontal axis: NO2 Annual Average Concentration (ppm), 0.0 to 0.06.
Vertical axis: scale marked from 50 to 350.]
-------
Table 5
ESTIMATED EXPECTED NUMBER OF DAYS*
WITH A DAILY MAXIMUM 1-HOUR NO2
VALUE ≥ 0.15 PPM, WITH CONFIDENCE INTERVALS

              Lower Estimate for the                    Upper Estimate for the
NO2 Annual    Specified Confidence Level      Best      Specified Confidence Level
Average          95%         90%            Estimate       90%         95%
(ppm)
0.020             0           0                 2           72         106
0.030             0           1                 6           53          71
0.040             1           2                11           58          72
0.050             2           3                28          166         195
0.053             2           3                35          200         230
0.060             1           2                59          315         332

*Rounded to nearest whole day.
-------
where:
$\hat{Y}$ = (as before)
a = Intercept constant
b = Regression coefficients for independent variables $x_1$
    ($b_1$) and $x_2$ ($b_2$) corresponding to the different exponents
    tested (the superscript i).
x = Independent variables: $x_1$ = annual average NO2 concentration;
    $x_2$ = specified daily maximum 1-hour NO2 concentration cutpoint
    (e.g., 0.15, 0.20, 0.25, 0.30 ppm). These variables are raised
    to the various powers denoted by i.
The regression analyses indicated that most variables were not significant
(at p = 0.05) and could be eliminated. The best coefficient of determination
obtained by the above regression model was R² = 0.47. This
coefficient is lower than those obtained earlier for the 0.15 and 0.20 ppm
cutpoints (see Table 4).
Another model form was attempted that treated the two independent variables
as a multiplicative function. The general form of the regression model was:

$\hat{Y} = a + b\,[\,f(x_1 \cdot x_2)\,]$

Four exponential variants were attempted, and the best one (the exponents
applied to $x_1$ and $x_2$ are illegible in this copy) produced an R² = 0.63.
This R² is almost as high as those obtained
earlier for the 0.15 ppm daily maximum 1-hour NO2 cutpoint (see Table 4), and
is much higher than those obtained for other cutpoints. The multiplicative
model, therefore, provides the best overall fit to the data base taken as a
whole. The coefficients for the best fit model are:

$\hat{Y} = -1.5 + 0.05\,(x_1^{(\cdot)} \cdot x_2^{(\cdot)})$   [exponents illegible in this copy]

Plugging specified daily maximum 1-hour NO2 cutpoints and alternative NO2 annual
averages into this equation produces the results shown in Table 6. Confidence
bands were not computed for this regression analysis as a formula to calculate
confidence levels was not readily available. The non-linear nature of the
multiplicative relationship precludes using an "off-the-shelf" method. The
data in Table 6 are shown graphically in Figure 5.
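Because the exponents of the best-fit multiplicative model cannot be read directly in this copy, the sketch below tests one candidate against Table 6. The candidate is an inference, not something the report states: the annual average cubed (in parts per billion) divided by the cutpoint squared (in parts per hundred million, i.e., ppm × 100), with results rounded half up and negative values shown as zero. Exact rational arithmetic is used so that values falling exactly on a half are rounded consistently.

```python
from fractions import Fraction
import math

# Table 6 as printed: annual average (ppb) -> expected days at the
# 0.15, 0.20, 0.25, and 0.30 ppm cutpoints.
TABLE_6 = {
    20: (0, 0, 0, 0),
    30: (5, 2, 1, 0),
    40: (13, 7, 4, 2),
    50: (27, 14, 9, 6),
    53: (32, 17, 11, 7),
    60: (47, 26, 16, 11),
}
CUTPOINTS_PPHM = (15, 20, 25, 30)  # 0.15 ... 0.30 ppm expressed as ppm x 100


def predicted_days(annual_avg_ppb, cutpoint_pphm):
    """Candidate model (assumed exponents 3 and -2):
    Y = -1.5 + 0.05 * x1**3 / x2**2, rounded half up, floored at zero."""
    value = Fraction(-3, 2) + Fraction(1, 20) * Fraction(
        annual_avg_ppb ** 3, cutpoint_pphm ** 2)
    return max(0, math.floor(value + Fraction(1, 2)))


matches = sum(
    predicted_days(avg, cut) == printed
    for avg, row in TABLE_6.items()
    for cut, printed in zip(CUTPOINTS_PPHM, row)
)
print(f"{matches} of 24 Table 6 entries reproduced")
```

Under these assumptions 21 of the 24 printed entries are reproduced and the remaining three differ by one day, which would be consistent with the published coefficients having been rounded. The agreement supports, but does not prove, this reading of the exponents and units.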
DISCUSSION
All of the analyses undertaken and reported in this paper point to one
conclusion: there is a lot of variance in the data relating expected (and
actual) number of days at or above a specified daily maximum NO2 concentration
and annual average NO2 value. There is wide site-to-site variation, which
results in rather weak functional relationships having wide confidence bands.
-------
Table 6
ESTIMATED EXPECTED NUMBER OF DAYS*
WITH A DAILY MAXIMUM 1-HOUR NO2
VALUE ≥ SPECIFIED CONCENTRATIONS

NO2
Annual       Daily Maximum 1-Hour Concentration (ppm)
Average
(ppm)        0.15    0.20    0.25    0.30
0.020          0       0       0       0
0.030          5       2       1       0
0.040         13       7       4       2
0.050         27      14       9       6
0.053         32      17      11       7
0.060         47      26      16      11

*Rounded to the nearest whole day. Values shown as 0 in the table
include some that were negative in sign.
-------
Figure 5
RELATIONSHIPS AMONG EXPECTED NUMBER OF DAYS WITH A
DAILY MAXIMUM 1-HOUR NO2 CONCENTRATION ≥ SPECIFIED
CUTPOINTS, THE CUTPOINTS, AND ANNUAL AVERAGE NO2 CONCENTRATION
[Plot. Axis labels are not legible in this copy.]
-------
In those counties in the data base that meet the 0.053 ppm NO2 annual
average standard, the highest observed number of days ≥ 0.15 ppm daily
maximum is 43 (47 on a normalized year basis). The mean number of days
≥ 0.15 ppm for attaining counties, however, is only 7.1 (and the median is 3.5
days). Thus, only a few areas that meet the 0.053 ppm NO2 annual average
standard have very many days with a daily maximum 1-hour value ≥ 0.15 ppm.
Unhappily, these few areas have a disproportionately large impact on functional
relationships that are fit to the data. This can perhaps best be seen by a
cumulative plot of the expected number of days ≥ 0.15 ppm daily maximum
1-hour value. This plot appears as Figure 6. The Figure indicates that in
the "raw" data for NO2 attainment areas, almost 90% of the values are less
than 20 days and 95% of the values are less than 30 days. This fact becomes
obscured when relationships are fitted and confidence bands are developed.
The pattern shown in Figure 6 holds for the other daily maximum
cutpoints also. In the expected number data base for NO2 attaining areas,
90% of the counties have less than 1.1 days per year with a daily maximum
1-hour value ≥ 0.30 ppm. In addition, 95% of the counties have less than
1.3 days with a daily maximum 1-hour value ≥ 0.30 ppm. These numbers are
certainly quite small, but the site with the maximum number of days ≥ 0.30
ppm has 9.1 expected days, which "warps" the distribution acutely. This,
in turn, adversely affects any relationship fitted to the data. The
functional form analyses developed in this paper must be used with this
caveat in mind.
-------
Figure 6
[Cumulative plot for the 0.15 ppm cutpoint. Horizontal axis: Expected Number
of Days with a Daily Maximum 1-Hour NO2 Value ≥ 0.15 ppm, 10 to 50.
Vertical axis: scale marked from 40 to 100.]
-------
FOOTNOTES & REFERENCES
1. Office of Air Quality Planning and Standards. Review of the National
Ambient Air Quality Standards for Nitrogen Oxides: Assessment of
Scientific and Technical Information. Research Triangle Park, N.C.:
U.S. Environmental Protection Agency, 1981 (EPA-450/5-82-002). Hereafter
cited as Staff Paper.
2. Ibid., p. 54. The Figure is titled: "Expected Number of Days on Which
Maximum 1-Hour NO2 Concentrations Exceed 0.15 and 0.30 ppm Associated with
Annual Average Concentrations (based on SAROAD data for 14 sites during
1979-1980)."
3. National Air Data Branch. "National Aerometric Data Bank: Quick Look
Report" (Computer printout). Research Triangle Park, N.C.: U.S. Environmental
Protection Agency, January 21, 1983. Hereafter cited as the "Quick Look
Report."
4. A valid annual average meets certain data reporting requirements set by the
National Air Data Branch, EPA. The requirements depend upon the sampling
interval used. Since NO2 ambient data are monitored using two sampling
periods, one-hour and 24-hour, two reporting requirements for a valid
annual average apply. For continuous sampling (only intervals less than
24 hours), data representing an annual period must reflect a minimum
of 75% of the total possible observations. For NO2, this means a
continuous monitoring site must have at least 6,570 hourly values for
a valid annual average.
For non-continuous sampling (a sampling interval of 24 hours or more),
data representing an annual period must have four quarters of data having
at minimum five observations per quarter. If no measurements were made
during one month of the quarter, each remaining month must have at least
two observations. For NO2, this means that a non-continuous monitoring site
must have at least 20 daily averages, with each quarter having at least
5 daily averages spread throughout the quarter as stated above.
For more information, see: National Air Data Branch. AEROS Manual
Series; Volume III: Summary and Retrieval. Research Triangle Park,
N.C.: U.S. Environmental Protection Agency, 1981 (EPA-450/2-76-009b),
p. 2.3.0-5 ff.
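The two validity rules in footnote 4 can be expressed as simple checks. This is a sketch, not the National Air Data Branch's actual procedure: the function names are illustrative, a non-leap year of 8,760 hours is assumed, and the footnote does not say how a quarter with two empty months is treated, so the sketch assumes such a quarter fails.

```python
HOURS_PER_YEAR = 8760  # non-leap year; 75% of this is 6,570 hours


def continuous_average_is_valid(n_hourly_values, required_fraction=0.75):
    """Continuous sampling: at least 75% of possible hourly values."""
    return n_hourly_values >= required_fraction * HOURS_PER_YEAR


def noncontinuous_average_is_valid(monthly_counts):
    """Non-continuous sampling: every quarter needs at least five
    observations; if one month of a quarter has none, each remaining
    month needs at least two.

    monthly_counts: 12 observation counts, January through December.
    """
    if len(monthly_counts) != 12:
        raise ValueError("expected 12 monthly counts")
    for q in range(4):
        months = monthly_counts[3 * q: 3 * q + 3]
        if sum(months) < 5:
            return False
        n_empty = sum(1 for m in months if m == 0)
        if n_empty > 1:
            return False  # assumption: not addressed in the footnote
        if n_empty == 1 and any(0 < m < 2 for m in months):
            return False
    return True


print(continuous_average_is_valid(6570))
print(noncontinuous_average_is_valid([2, 2, 1] * 4))
print(noncontinuous_average_is_valid([0, 4, 1] * 4))
```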
5. NO2 monitoring methods and their associated calibration procedures have
changed greatly over the years. Currently there are three methods
that are accepted by EPA as being a "Reference Method" or equivalent
for NO2. The Reference Method is a continuous (one-hour) chemiluminescence
procedure; the two EPA-designated equivalent methods are both non-continuous
procedures: NASN sodium arsenite and TGS-ANSA. All of the 24-hour data
used in this analysis are obtained using one of the equivalent methods,
primarily the sodium arsenite procedure. The continuous data, on the
other hand, are based upon the Reference Method and another procedure
used extensively in California and a few other States: colorimetric-Lyshkow,
a modification of the Griess-Saltzman technique. The
colorimetric approach is not an equivalent method according to EPA,
but data from it have to be used if data from California are included
in any comprehensive study of ambient NO2 concentrations.
6. Environmental Criteria and Assessment Office. Air Quality Criteria for
Oxides of Nitrogen. Research Triangle Park, N.C.: Office of Research
and Development, 1982 (EPA-600/8-82-026F).
7. There may be a site in the U.S. with more than 57 days over a 0.15 ppm
daily maximum, because sites were selected to maximize the 1-hour
value in a county, not to maximize the number of days over, say, 0.15
ppm. The NADB "Quick Look" program used to develop Table 1 did not
contain the latter type of information.
8. For purposes of the regression analyses, NO2 annual average is treated
as the independent variable and number of days with a daily maximum
≥ specified cutpoints is treated as the dependent variable. We treat
the variables in this manner because we want to be able to predict,
with some known degree of confidence, how many days over a certain
cutpoint can be expected if the annual average NO2 concentration is
kept below a specified value. Doing this in no way implies a causal
relationship. In other words, we do not imply, nor do we believe,
that annual average NO2 concentrations cause daily maximum NO2 values.
The two variables are related, however, and we are interested in how
strong this relationship is.
9. Neil Frank, personal communication; February 10, 1983.
10. A valid day is one that has data for at least 75% of total hours in
the day (i.e., 18 hours or more).
11. The calculation is simply:

$\text{Expected days} = \dfrac{\text{Number of observed days in the year} \ge \text{specified cutpoint}}{\text{Number of valid days of data}} \times 365$
12. W.J. Dixon, et al. BMDP Statistical Software 1981. Berkeley, CA: University
of California Press, 1981. The regression routines were programmed by
James Capel of PEDCo Environmental, Inc.
13. Ted Johnson of PEDCo Environmental, Inc. was responsible for developing
the hypothesized models.
14. However, there is no reason to expect the logistic curve to be the most
theoretically correct model for the data being analyzed. The theory of
why air quality data are distributed in the manner they are, especially
for the instant case, is really in an immature state. However, the
relationship between expected number of days with a 1-hour NO2 daily
maximum at or above a specified cutpoint and NO2 annual average does
meet certain characteristics of a logistic curve. Those properties
include (1) lower and upper bounds to possible values of the dependent
variable and (2) an overall sigmoidal, or S, shape to the curve.
The authors are indebted to Dr. William Biller for these comments
and to Ted Johnson for related insights into the logistic model.
15. Richard J. Brand, et al. "Large Sample Confidence Bands for the Logistic
Response Curve and Its Inverse," The American Statistician, 27(4):157-160
(October 1973). The algorithms comprising the confidence band solution
methods were programmed by Ted Johnson and Jim Capel of PEDCo Environmental,
Inc.
16. This procedure was suggested to us by Joseph Padgett, Division Director
of the Strategies and Air Standards Division of EPA. The stepwise
regression program BMDP1R of the BMDP package was used for the analysis.
Ted Johnson and Jim Capel of PEDCo Environmental, Inc. were responsible
for developing the hypothesized models and programming the routines.
-------