OAQPS 78-9, II-A-9
                ANALYSIS OF PEAK DAILY MAXIMUM NO2 VALUES
             AND ASSOCIATED ANNUAL AVERAGES IN 1979-1981 DATA
                              FEBRUARY 1983
Thomas McCurdy
Richard B. Atherton
                         Ambient Standards Branch
                  Strategies and Air Standards Division
                   U.S. Environmental  Protection Agency
                       Research Triangle Park, N.C.

-------
     This document has been prepared  by staff of  the  Office  of  Air
Quality Planning and Standards  and is being  circulated  for technical
review and public comment.   Anyone wishing to comment on  this report
may address their remarks to Thomas McCurdy, MD-12, U.S.  Environmental
Protection Agency, Research Triangle  Park, N.C.   27711.

-------
INTRODUCTION

     The Staff Paper [1] developed as part of the ongoing review of the
current nitrogen dioxide (NO2) national ambient air quality standard contains
material on monitored NO2 air quality.  Data used in the Staff Paper are from
the 1977-1979 time period.  More recent NO2 data are now available and were
analyzed to determine whether the relationships and conclusions discussed in
the Staff Paper still hold.  This brief report documents one such analysis,
focused mainly on whether the relationships shown in Figure 5 of the Staff
Paper [2] are seen in current air quality data.

THE NEW DATA BASE

     The Environmental Protection Agency (EPA) maintains numerous computer
files of aerometric data.  The file of interest for our purposes is called
SAROAD, or Storage and Retrieval of Aerometric Data.  SAROAD is updated
periodically by State and local air pollution control  agencies.  The most
recent NO2 data for a full year in SAROAD as of this writing are for
1981.  Since EPA usually allows States to use three years of data in
determining compliance with a NAAQS, the data base used here is for the
time period 1979-1981. [3]

     The data were first screened to identify all urbanized areas of the
country that exceeded the current annual average standard of 0.053 ppm
(100 ug/m3).  Only sites having a valid annual average were considered. [4]
Six urban areas and one non-urban area in the U.S. were found to exceed the
annual average standard in at least one of the last three years.  Only two of
these areas exceed the standard level using a continuous monitoring method;
the remaining five areas that exceed the annual average use a non-continuous
(24-hour) method. [5]  In all cases where one of these areas has both
continuous and non-continuous data, the area meets the annual standard under
the first method but does not under the second.  This situation indicates
that the non-continuous sampling method may be overestimating NO2 daily
concentrations (and, consequently, annual averages) or that the continuous
sampling method underestimates NO2 concentrations (and, consequently, annual
averages).  The methods appear to be biased in different directions. [6]

     If only the most recent valid year of data are used in the seven areas
instead of the highest year in the last three, only three areas exceed the
0.053 ppm annual average standard.  Only three areas exceed the standard if
the average of the valid 1979-1981 annual averages is computed.  (However,
two of the seven areas have only one year of data so a three year average
could not be computed.)
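
     The three ways of judging an area against the annual standard
described above (highest year, most recent valid year, and average of the
valid years) can be sketched as follows.  This is only an illustration:
the annual averages are hypothetical rather than SAROAD values, the
function names are invented here, and the rule that at least two valid
years are needed before an average is computed is an assumption based on
the parenthetical remark above.

```python
# Sketch only: the annual averages below are hypothetical, not SAROAD values.
STANDARD_PPM = 0.053  # annual average standard cited in the text


def valid_years(averages):
    """Return {year: average} for years that have a valid annual average."""
    return {year: avg for year, avg in averages.items() if avg is not None}


def exceeds_highest_year(averages):
    """True if the highest valid annual average is above the standard."""
    valid = valid_years(averages)
    return bool(valid) and max(valid.values()) > STANDARD_PPM


def exceeds_most_recent_year(averages):
    """True if the most recent valid annual average is above the standard."""
    valid = valid_years(averages)
    return bool(valid) and valid[max(valid)] > STANDARD_PPM


def exceeds_multi_year_mean(averages):
    """Compare the mean of the valid years with the standard.

    Returns None when fewer than two valid years exist (assumed rule).
    """
    valid = valid_years(averages)
    if len(valid) < 2:
        return None
    return sum(valid.values()) / len(valid) > STANDARD_PPM


# Hypothetical areas; None marks a year without a valid annual average.
area = {1979: 0.058, 1980: 0.055, 1981: 0.049}
single_year_area = {1979: None, 1980: 0.060, 1981: None}
```

     For the hypothetical area shown, the highest-year and multi-year
tests flag an exceedance while the most-recent-year test does not, which
is the kind of difference the paragraph above describes.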

     After the annual average data were reviewed, the maximum 1-hour NO2
values were screened to identify all monitoring sites in the U.S. that had
at least one 1-hour NO2 value >= 0.15 ppm in 1979-1981.  From this list,
the site with the highest 1-hour value and a valid annual average in each
county was identified.  These 63 sites are listed in Table 1.  This table
forms the basis for most of the analyses presented in this report.
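
     The county-level screening step described above can be sketched as
follows.  The record layout (`Site`), the function name, and the example
records are assumptions made for illustration; they do not reproduce the
SAROAD file format or the "Quick Look" program.

```python
from collections import namedtuple

# Hypothetical record layout; SAROAD's actual format is not reproduced here.
Site = namedtuple("Site", "site_id county max_1hr_ppm has_valid_annual_avg")


def screen_sites(sites, cutpoint=0.15):
    """Keep, for each county, the site with the highest 1-hour value.

    Only sites with a valid annual average and a maximum 1-hour value at
    or above the cutpoint are eligible.
    """
    best = {}
    for site in sites:
        if not site.has_valid_annual_avg or site.max_1hr_ppm < cutpoint:
            continue
        current = best.get(site.county)
        if current is None or site.max_1hr_ppm > current.max_1hr_ppm:
            best[site.county] = site
    return best


# Illustrative records only (not taken from Table 1).
example_sites = [
    Site("A-001", "County A", 0.22, True),
    Site("A-002", "County A", 0.31, False),  # highest, but no valid average
    Site("A-003", "County A", 0.18, True),
    Site("B-001", "County B", 0.12, True),   # below the 0.15 ppm screen
]
selected = screen_sites(example_sites)
```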

-------
                                                        Table 1

                                       NO2 SITES WITH 1ST HIGH HOUR >= .15 PPM AND
                                          A VALID ANNUAL AVERAGE* IN 1979-1981

                      Number of Daily Max 1-Hour
                   Values >= Value Indicated (ppm)
Site #
(State)      0.15   0.20   0.25   0.30   0.35   0.40
  01          18      7      3      2      1      1
  03           4      1      0      0      0      0
  05           1      1      0      0      0      0
  05           2      0      0      0      0      0
  05           9      1      0      0      0      0
  05           4      1      0      0      0      0
  05          57     20      6      3      2      1
  05           1      0      0      0      0      0
  05          38     24     16      8      4      2
  05           4      1      1      1      0      0
  05           3      0      0      0      0      0
  05          19      2      1      0      0      0
  05          38     11      1      0      0      0
  05           3      0      0      0      0      0
  05           2      0      0      0      0      0
  05          28      7      1      0      0      0
  05           1      0      0      0      0      0
  05           3      1      1      0      0      0
  06           3      1      0      0      0      0
  06           2      1      1      1      0      0
  06          23      4      1      0      0      0
  --           1      0      0      0      0      0
  --           8      2      0      0      0      0
  --           1      0      0      0      0      0
  --          10      4      3      1      0      0

-------
                                                  Table 1 (continued)

                                                                Number of Daily Max 1-Hour
                                                              Values >= Value Indicated (ppm)
                                               Annual
Site #            County                Year   Average    0.15   0.20   0.25   0.30   0.35   0.40
14 1220 039 F01   Cook, IL              1981    .032        1      1      0      0      0      0
15 0640 003 F01   Clark, IN             1981    .016        1      0      0      0      0      0
15 2040 030 F01   Marion, IN            1981    .030        5      3      2      1      1      1
15 4220 002 J02   Vanderburgh, IN       1981    .014        1      0      0      0      0      0
18 2380 034 G01   Jefferson, KY         1979    .035        1      0      0      0      0      0
21 0120 004 H01   Baltimore, MD         1979    .032        7      2      1      0      0      0
21 0680 001 G01   Balti-Essex, MD       1981    .025        1      0      0      0      0      0
22 2640 012 F01   Worcester, MA         1979    .026        4      1      1      0      0      0
23 1180 020 G01   Wayne, MI             1980    .036        1      0      0      0      0      0
26 4280 064 H01   St. Louis, MO         1980    .035       16      6      2      1      1      0
26 L040 001 G01   St. Louis City, MO    1979    .028       11      4      1      0      0      0
29 0320 001 G01   Clark, NV             1980    .038       23     12     10      7      6      5
31 5560 002 F01   Cumberland, NJ        1979    .024        3      0      0      0      0      0
31 3480 008 F01   Essex, NJ             1979    .044        7      3      1      0      0      0
31 0180 003 F01   Hudson, NJ            1981    .029        1      0      0      0      0      0
32 0040 015 H02   Bernalillo, NM        1980    .020        1      0      0      0      0      0
33 4680 003 F01   Bronx, NY             1979    .038        6      1      0      0      0      0
33 0660 005 F01   Erie, NY              1979    .024        1      0      0      0      0      0
33 5760 004 F01   Monroe, NY            1979    .019        1      0      0      0      0      0
36 1460 004 F01   Franklin, OH          1980    .020       10      4      1      1      0      0
36 1220 019 H05   Hamilton, OH          1979    .036        3      2      1      0      0      0
36 7760 007 I01   Mahoning, OH          1979    .050       43     25     12      2      1      0
36 1000 001 H02   Stark, OH             1980    .029        1      0      0      0      0      0
38 1460 002 F01   Multnomah, OR         1980    .028        1      0      0      0      0      0
39 7260 031 G01   Allegheny, PA         1981    .028        2      0      0      0      0      0

-------
                                                  Table 1 (continued)

                                                                Number of Daily Max 1-Hour
                                                              Values >= Value Indicated (ppm)
                                               Annual
Site #            County                Year   Average    0.15   0.20   0.25   0.30   0.35   0.40
39 7620 009 F01   Berks, PA             1979    .033        2      0      0      0      0      0
39 1080 012 F01   Bucks, PA             1979    .029        3      0      0      0      0      0
39 3880 401 F01   Delaware, PA          1979    .031        2      0      0      0      0      0
39 9430 016 F01   Luzerne, PA           1979    .035        4      1      0      0      0      0
39 0780 017 F01   Northampton, PA       1979    .029        3      2      0      0      0      0
45 3910 022 F01   Ector, TX             1979    .013        1      0      0      0      0      0
45 1880 003 F01   Tarrant, TX           1980    .021        8      2      0      0      0      0
46 0060 001 F01   Davis, UT             1980    .023        6      1      0      0      0      0
46 0920 001 F01   Salt Lake, UT         1979    .031        4      0      0      0      0      0
48 0080 009 H01   Alexandria, VA        1979    .035       10      4      2      0      0      0
48 0200 020 G01   Arlington, VA         1979    .030        3      2      0      0      0      0
48 1060 014 G01   Fairfax, VA           1979    .029        3      0      0      0      0      0
50 0280 004 F01   Kanawha, WV           1981    .021        1      0      0      0      0      0

Mean Value                                      .031       8.8    3.0    1.3    0.5    0.2    0.2
Median Value                                    .030       3.8    1.0    0      0      0      0

   *See footnote 4 in the text for the definition of a valid annual
    average.  The unit of the annual average in this Table is parts
    per million (ppm).

-------
     The first three major columns of Table 1 present the monitoring site
and county identifiers and the year from which the NO2 data presented in the
remaining columns come.  The next column lists the annual average in
parts per million.  The remaining six columns contain simple counts of the
number of days in the year of interest with a daily maximum 1-hour value
equal to or exceeding certain specified "cutpoint" values.  The cutpoint
values run from 0.15 ppm to 0.40 ppm in 0.05 ppm increments.
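
     The counts in the last six columns of Table 1 amount to the
following tally.  This is a minimal sketch: the function name and the
daily maximum values are illustrative and are not taken from any site in
the table.

```python
# The six cutpoints used in Table 1, in ppm.
CUTPOINTS = [0.15, 0.20, 0.25, 0.30, 0.35, 0.40]


def count_days_at_or_above(daily_max_ppm, cutpoints=CUTPOINTS):
    """Count the days whose daily maximum 1-hour value is >= each cutpoint."""
    return {c: sum(1 for v in daily_max_ppm if v >= c) for c in cutpoints}


# Hypothetical daily maximum 1-hour values (ppm) for six days.
example_days = [0.08, 0.15, 0.22, 0.31, 0.12, 0.41]
example_counts = count_days_at_or_above(example_days)
```

     Because each higher cutpoint is a subset of the lower ones, the
counts in any row can only stay the same or fall from left to right.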

     Every site in Table 1 has at least one day with a 1-hour maximum of
0.15 ppm or higher, since that was the basis for developing the list
originally.  The county with the most days over a 0.15 ppm daily maximum
has 57 days over this cutpoint. [7]  The mean number of days over 0.15 ppm
is 8.8, but the median is a much lower 3.8 days.

     As the daily maximum cutpoint values increase, the number of days at or
over the cutpoint value decreases dramatically.  For a 0.20 ppm 1-hour daily
maximum cutpoint, the number of days over that value varies from zero
to 25.  The mean value for the data set is 3 days over a 0.20 ppm daily
maximum, and the median value is one day.  The mean value for the data set
of days over a 0.25 ppm cutpoint is 1.3 and the median value is zero,
because most of the sites have no days over 0.25 ppm.  In fact, the median
value for the remaining daily maximum cutpoints is zero for the same reason.
A graph of the mean number of days over the daily maximum cutpoints appears
as Figure 1.

REPETITIVE DAYS WITH 0.15 PPM OR HIGHER 1-HOUR MAXIMUM

     The Staff Paper expresses concern about multiple-day exposure to NO2
hourly peaks above certain cutpoints.  This issue was investigated to some
extent in this analysis.

     All sites listed in Table 1 that had more than 15 days with a 1-hour
peak of 0.15 ppm or higher were identified.  These 10 sites are listed in
Table 2.  A simple count of the number of consecutive days with a 1-hour
daily maximum >= 0.15 ppm also appears in the Table.  Not unexpectedly,
most days with a maximum 1-hour value over 0.15 ppm occur in isolation.
Next most frequent are two days in succession, then three, and so forth.
(The monotonically declining pattern stops at six days, however.)
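
     The consecutive-day tally reported in Table 2 can be sketched as a
run-length count.  The function names are invented for illustration, and
treating a day with no data (None) as ending a run is an assumption; the
report does not say how missing days were handled.

```python
from collections import Counter


def run_length_counts(daily_max_ppm, cutpoint=0.15):
    """Tally runs of consecutive days with a daily maximum >= cutpoint.

    Returns a Counter mapping run length (days) to number of occurrences.
    A missing day (None) ends the current run (assumed handling).
    """
    runs = Counter()
    current = 0
    for value in daily_max_ppm:
        if value is not None and value >= cutpoint:
            current += 1
        else:
            if current:
                runs[current] += 1
            current = 0
    if current:  # close a run that reaches the end of the record
        runs[current] += 1
    return runs


def total_days(runs):
    """Total days at or above the cutpoint implied by a run-length tally."""
    return sum(length * freq for length, freq in runs.items())


# Hypothetical eight-day record: runs of 1, 2, and 3 days.
example_runs = run_length_counts([0.16, 0.10, 0.17, 0.18, 0.05, 0.20, 0.20, 0.20])

# Run-length tally for site 01 1860 as given in Table 2.
site_01_1860 = {1: 7, 2: 2, 3: 1, 4: 1}
```

     Applying `total_days` to the Table 2 tally for site 01 1860 gives
18 days, which agrees with the Total column for that site.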

     There are a few relatively long-term events having numerous  consecutive
days with a daily maximum 1-hour value of 0.15 or higher.  They are
highlighted in the Table 2 column entitled "Other."  Three sites  have  20,
13, and 8 days in succession, respectively, over a 0.15 ppm  daily maximum value.

     An examination of the 0.20, 0.25, and 0.30 ppm  cutpoint values was
made concerning the repetitive days issue.  The data are listed in  Table 3.

-------
                                       Figure 1
                    MEAN NUMBER AND RANGE OF THE NUMBER OF DAYS >=
                     SPECIFIED 1-HOUR NO2 DAILY MAXIMUM VALUES IN
                              U.S. COUNTIES IN 1979-1981

     [Figure: horizontal axis, Daily Maximum 1-Hour NO2 Value (ppm), from
     0.15 to 0.40; vertical axis tick marks from 10 to 60.]

-------
                                  Table 2

                    NUMBER OF CONSECUTIVE DAYS HAVING A
                       DAILY MAXIMUM 1-HOUR NO2 VALUE
                 >= .15 PPM IN 1977-1979 FOR SELECTED SITES*

                     Number of Consecutive Days with a
                          Daily Maximum >= .15 ppm
                                                            Other
Site #         1     2     3     4     5     6     7    (# Days/Freq.)    Total
01 1860        7     2     1     1     0     0     0          0             18
05 5820       12     6     2     0     0     0     1        20/1            57
05 0230        5     6     0     2     0     0     0        13/1            38
05 8440        3     3     2     1     0     0     0          0             19
05 6800        7     6     3     0     2     0     0          0             38
05 6980        4     3     2     0     1     0     1          0             28
06 0580       12     2     1     1     0     0     0          0             23
26 4280        7     2     0     0     1     0     0          0             16
29 0320        1     0     2     2     0     0     0         8/1            23
36 7760       13     9     1     1     1     0     0          0             43

Total         71    39    14     8     5     0     2          -            303

*Sites listed in Table 1 having more than 15 days with a daily maximum peak
>= 0.15 ppm.

-------
                                  Table 3

                     NUMBER OF CONSECUTIVE DAYS HAVING
                      A DAILY MAXIMUM 1-HOUR NO2 VALUE
                     >= SPECIFIED CUTPOINTS IN 1977-1979
                            FOR SELECTED SITES*

                        Number of Consecutive Days
                        With a Daily Maximum
Value                   >= Value Shown at Left               Other
(ppm)     Site #         1      2      3      4          (# Days/Freq.)
0.20      05 0230        3      4      2      0               7/1
          05 5820        7      2      0      1               5/1
          05 6300        5      3      0      0                0
          29 0320        2      1      0      0               8/1
          36 7760       13      4      0      1                0

0.25      05 0230        2      2      1      0               7/1
          05 5820        3      0      1      0                0
          29 0320        0      1      0      0               8/1
          36 7760        4      2      0      1                0

0.30      05 0230        5      1      0      0                0
          29 0320        1      1      0      1                0

*Sites listed in Table 1 having more than the following number of days with
a daily maximum peak >= the value shown:

            0.20 ppm = 7 days
            0.25 ppm = 4 days
            0.30 ppm = 3 days

-------
REGRESSION & OTHER ANALYSES OF THE DATA

     Three regression analyses were undertaken on the data presented in
Table 1 to determine whether a reasonable relationship could be obtained
between the NO2 annual average and the number of days at or above the
cutpoints mentioned before. [8]  In general, the regressions were not strong
and got worse as the cutpoints increased.  For example, for the 0.15 ppm
daily maximum cutpoint, the following R2 (coefficient of determination)
values were obtained:

            Form of the Regression                      R2

            Linear                                      0.44
            Semi-Logarithm                              0.32
            Log-Log                                     0.35

Regressions on the 0.20 ppm set of data showed even lower R2 values.
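
     A comparison of this kind can be sketched as below.  The report does
not say which variable was log-transformed in the semi-logarithmic form,
so taking the logarithm of the number of days is an assumption, as are
the function names and the synthetic data.  Each R2 is computed in the
transformed space of its own regression.

```python
import math


def r_squared(xs, ys):
    """R2 of a least-squares straight line fitted to (xs, ys).

    For a one-variable fit with an intercept this equals the squared
    correlation coefficient.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


def compare_forms(annual_avgs, days):
    """R2 for linear, semi-log (assumed: log of days), and log-log forms."""
    log_avgs = [math.log(a) for a in annual_avgs]
    log_days = [math.log(d) for d in days]  # needs days > 0
    return {
        "linear": r_squared(annual_avgs, days),
        "semi-log": r_squared(annual_avgs, log_days),
        "log-log": r_squared(log_avgs, log_days),
    }


# Synthetic, exactly exponential data (not from Table 1).
synthetic_avgs = [0.02, 0.03, 0.04, 0.05]
synthetic_days = [math.exp(60 * a) for a in synthetic_avgs]
synthetic_r2 = compare_forms(synthetic_avgs, synthetic_days)
```

     Every site in Table 1 has at least one day at or above 0.15 ppm, so
the logarithm of the day count is defined at that cutpoint; at higher
cutpoints many counts are zero and a log transform would need some
additional convention.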

     Plots of the data indicated a wide scatter, particularly for sites
with annual averages below 0.04 ppm.  On the advice of statisticians in
EPA's Data Analysis Section, [9] the median values of the number of days
above a 0.15 ppm daily maximum were plotted against median values of grouped
NO2 annual averages.  The groups of annual averages were developed simply by
dividing the annual averages into 0.01 ppm intervals, so that averages
between 0.000-0.009 ppm formed one group, 0.010-0.019 ppm formed another,
and so forth.  This plot, along with the observed data, for the 0.15 ppm
cutpoint is depicted as Figure 2.  As can be seen, there is a lot of scatter
around the grouped median relationship, which generally follows an
exponential curve.
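
     The grouping step can be sketched as follows.  The function name is
invented for illustration, and the six (annual average, days >= 0.15 ppm)
pairs are a small subset of Table 1 rows (Cook, Clark IN, Marion,
Vanderburgh, Jefferson, and Mahoning), so the output is not the curve
plotted in Figure 2.

```python
import statistics
from collections import defaultdict


def grouped_medians(pairs, width_ppb=10):
    """Median annual average and median day count per 0.01 ppm group.

    pairs: iterable of (annual_average_ppm, days_at_or_above_cutpoint).
    Binning is done on integer ppb to avoid floating-point edge effects.
    """
    groups = defaultdict(list)
    for avg_ppm, days in pairs:
        key = int(round(avg_ppm * 1000)) // width_ppb
        groups[key].append((avg_ppm, days))
    result = []
    for key in sorted(groups):
        avgs = [a for a, _ in groups[key]]
        days = [d for _, d in groups[key]]
        result.append((statistics.median(avgs), statistics.median(days)))
    return result


# Subset of Table 1 rows: (annual average in ppm, days >= 0.15 ppm).
subset = [(0.032, 1), (0.016, 1), (0.030, 5), (0.014, 1), (0.035, 1), (0.050, 43)]
subset_medians = grouped_medians(subset)
```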

     The data in Table 1 do not reflect the amount of monitoring that
occurred during the year of analysis.  One site had as many as 361 days of
valid data while another had as few as 257. [10]  Consequently, analyses of
the number of days at or above any cutpoint without considering the number of
days of available data can be misleading.  The statistic that is more
relevant, and probably more stable, is the proportion of days in a  365-day
year at or above a cutpoint.  Looked at another way, this would be  the
"expected number" of days in a full year at or above a specified cutpoint.
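
     The normalization described above is a simple proportion scaled to a
365-day year.  In the sketch below the function name is invented and the
valid-day counts are hypothetical; in particular, the figure of 334 valid
days is an assumption chosen only to show the size of the adjustment.

```python
def expected_days(days_at_or_above, valid_days, year_length=365):
    """Scale an observed day count to a full year of monitoring."""
    if valid_days <= 0:
        raise ValueError("valid_days must be positive")
    return float(year_length) * days_at_or_above / valid_days


# 43 observed days over a hypothetical 334 valid days of data.
scaled_43 = expected_days(43, 334)

# The same observed count reads very differently at the two extremes of
# data completeness mentioned in the text (257 and 361 valid days).
low_completeness = expected_days(10, 257)
high_completeness = expected_days(10, 361)
```

     With these assumed inputs, 43 observed days scale to roughly 47
expected days, and 10 observed days correspond to about 14 expected days
at 257 valid days but only about 10 at 361.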

     Such a normalized statistic was developed for the county monitoring
sites listed in Table 1. [11]  A series of linear, non-linear, and logistic
regression analyses was undertaken on the normalized data to obtain a
"best-fit" relationship between the expected number of days with a daily
maximum NO2 value >= specified cutpoints and the annual average NO2
concentration.  The analyses used programs in the BMDP statistical
package. [12]

     The models listed below were hypothesized to fit the data. [13]  In all
cases the following symbols apply:

-------
                                  Figure 2
                 GROUPED MEDIAN RELATIONSHIP BETWEEN NUMBER
                  OF DAYS WITH A DAILY MAXIMUM 1-HOUR NO2
                  VALUE >= 0.15 PPM AND NO2 ANNUAL AVERAGE

     [Figure: horizontal axis, NO2 Annual Average (ppm), from 0.00 to 0.06.]


-------
     ^
     y    =  Estimated expected number of days with a daily maximum
             1-hour NO2 value >= a specified cutpoint (the dependent
             variable).

     pi   =  Parameters of the regression equation, where i is from
             1 to 3 depending upon the model being analyzed.

     exp  =  The base of the natural logarithm, e (2.7183), raised to
             the power contained in the following parentheses.

     x    =  NO2 annual average concentration (the independent
             variable).

     Model 1.  Linear Regression.

                    y  =  p1 x  +  p2

     Model 2.  Two-Parameter Exponential.

                    y  =  p1 exp (p2 x)

     Model 3.  Three-Parameter Exponential.

                    y  =  p1 exp

     Model 4.  Logistic Regression.

                                 exp (p1 + p2 x)
                    y  =  365  -------------------
                               1 + exp (p1 + p2 x)

     In addition, a 4-parameter exponential  model  was attempted, but it
would not converge for the 0.20 ppm or higher daily maximum cutpoints.
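
     Models 1, 2, and 4 as written above can be expressed as plain
functions.  This is a sketch rather than the BMDP formulation: the
function names are invented, the parameter values used in the example are
placeholders rather than fitted values, and Model 3 is omitted because
its full expression is not legible in this copy.

```python
import math


def model_linear(x, p1, p2):
    """Model 1: y = p1 x + p2."""
    return p1 * x + p2


def model_exponential(x, p1, p2):
    """Model 2: y = p1 exp(p2 x)."""
    return p1 * math.exp(p2 * x)


def model_logistic(x, p1, p2):
    """Model 4: y = 365 exp(p1 + p2 x) / (1 + exp(p1 + p2 x)).

    Written in the algebraically equivalent form 365 / (1 + exp(-z)).
    """
    z = p1 + p2 * x
    return 365.0 / (1.0 + math.exp(-z))


# Placeholder parameters, for illustration only.
example_curve = [model_logistic(x, -6.0, 0.08) for x in (20, 30, 40, 50, 60)]
```

     One property worth noting is that the logistic form keeps the
estimate between 0 and 365 days for any annual average, which the linear
and exponential forms do not guarantee.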
     Results of the regression analyses are listed in Table 4.  The R2
values for Models 2, 3, and 4 are reasonable for the 0.15 and 0.20 ppm daily
maximum cutpoints, but are not very good for the remaining models and
cutpoints.

     Model 3, the three-parameter exponential, performs slightly better
than the others.  However, the improvement in R
-------
                                  Table 4
                     RESULTS OF FITTING SELECTED CURVES
                      TO EXPECTED NUMBER OF DAYS WITH
                     A DAILY MAXIMUM 1-HOUR NO2 VALUE >=
                 SPECIFIED CUTPOINT VS. NO2 ANNUAL AVERAGE

Daily
Maximum                    Fitted Parameters of the Model
Cutpoint      Model
(ppm)         Number*        p1          p2          p3          R2
0.15             1          0.39      -18.95         -          0.47
                 2          0.42        0.08         -          0.66
                 3          0.25        0.09        1.43        0.67
                 4          5.69      - 0.08         -          0.66

0.20             1          0.37      - 8.55         -          0.38
                 2          0.10        0.09         -          0.54
                 3          0.13        0.09      - 0.29        0.54
                 4          8.38      - 0.10         -          0.54

0.25             1          0.17      - 4.06         -          0.27
                 2          0.05        0.09         -          0.35
                 3          0.15        0.07      - 0.66        0.36
                 4          9.61      - 0.11         -          0.34

0.30             1          0.64      - 1.48         -          0.14
                 2          0.01        0.10         -          0.22
                 3          0.01        0.10        0.02        0.22
                 4         10.10      - 0.10         -          0.22

*See the text for a description of the model associated with each model
number.  Model 1 is linear regression, Models 2 and 3 are two- and three-
parameter exponentials, and Model 4 is a logistic regression.  For these
equations to be used properly, the NO2 annual average independent variable
must be in units of parts per billion.

-------
     A computerized plot of the logistic regression curve appears as Figure 3.
Its shape is quite similar to the grouped median relationship shown in
Figure 2.

     To use any parametric model to predict outcomes from sampled data
requires that chance variation in the data be assessed and incorporated
into the model.  For many statistical models, this is done using confidence
intervals, or confidence bands.  Computing a confidence band for a logistic
curve is a complex undertaking due to the non-normal distribution of the
underlying data.  One method to compute the band was found in the literature;
it was developed by statisticians at the University of California. [14]

     Applying this confidence band computational method to the NO2 air
quality data base provides the 90% and 95% confidence intervals shown in
Figure 4.  The confidence bands obviously are quite wide due to the large
scatter in the underlying data.  The bands are narrowest at the mid-point
of the NO2 annual average data, as expected.

     The algorithms used to develop the confidence bands shown in Figure 4
were exercised to provide estimates of the expected number of days with a
daily maximum 1-hour NO2 value >= 0.15 ppm at the 90% and 95% confidence
levels.  These estimates are provided in Table 5.  For a 0.053 ppm annual
average, the best estimate is 35 days, and the 95% and 90% confidence bands
are 2-230 and 3-200 days, respectively.

     One final regression analysis was undertaken on the data.  It involved
treating the cutpoint specification (i.e., 0.15 ppm, 0.20 ppm, etc.) as an
independent variable as well as the annual average NO2 concentration.  This
formulation uses all the data in the regression analysis instead of partitioning
the data set into different equations for each cutpoint.  The resulting output
can be viewed as a three-dimensional relationship, a surface, rather than a
two-dimensional relationship.  The three dimensions are (1) number of days per
year at or above a specified daily maximum cutpoint, (2) NO2 annual average,
and (3) the specified daily maximum 1-hour concentration cutpoint.  The last
dimension consists of a discrete variable having four possible values:
0.15, 0.20, 0.25, and 0.30 ppm.  (Thus, the surface is discontinuous.)

     Various linear models using exponential independent variables were tested.
In general, they were of the form:

                  6                   6
     Y  =  a  +  SUM  b1i x1^i   +   SUM  b2i x2^i
                 i=0                 i=0

-------
                     Figure 3

        LOGISTIC RELATIONSHIP FITTED TO DATA
REPRESENTING THE EXPECTED NUMBER OF DAYS >= 0.15 PPM
      DAILY MAXIMUM 1-HOUR NO2 CONCENTRATION

     [Figure: horizontal axis, NO2 Annual Average Concentration (ppm),
     from 0.02 to 0.06.]

-------
                                           Figure 4

                           90% AND 95% CONFIDENCE BANDS FOR THE
                        LOGISTIC CURVE FITTED TO DATA REPRESENTING
                    DAILY MAXIMUM 1-HOUR NO2 CONCENTRATIONS >= 0.15 PPM

     [Figure: horizontal axis, NO2 Annual Average Concentration (ppm),
     from 0.0 to 0.06; vertical axis tick marks from 50 to 350.]

-------
                                        Table 5

                           ESTIMATED EXPECTED NUMBER OF DAYS*
                            WITH A DAILY MAXIMUM 1-HOUR NO2
                      VALUE >= 0.15 PPM, WITH CONFIDENCE INTERVALS

                Lower Estimate for the                    Upper Estimate for the
              Specified Confidence Level                Specified Confidence Level
NO2 Annual                                     Best
 Average          95%        90%             Estimate        90%        95%
  (ppm)

  0.020            0          0                  2            72        106
  0.030            0          1                  6            53         71
  0.040            1          2                 11            58         72
  0.050            2          3                 28           166        195
  0.053            2          3                 35           200        230
  0.060            1          2                 59           315        332

  *Rounded to nearest whole day.

-------
where:
     ^
     Y =  (as before)

     a =  Intercept constant

     b =  Regression coefficients for independent variables x1 (b1)
          and x2 (b2) corresponding to the different exponents
          tested (the superscript i).

     x =  Independent variables: x1 = annual average NO2 concentration;
          x2 = specified daily maximum 1-hour NO2 concentration cutpoint
          (e.g., 0.15, 0.20, 0.25, 0.30 ppm).  These variables are raised
          to the various powers denoted by i.

The regression analyses indicated that most variables were not significant
(at p = 0.05) and could be eliminated.  The best coefficient of determination
(R2) obtained with the above regression model was 0.47.  This
coefficient is lower than those obtained earlier for the 0.15 and 0.20 ppm
cutpoints (see Table 4).

     Another model form was attempted that treated the two independent variables
as a multiplicative function.  The general form of the regression model was:


     Y =  a  + b [ f (x1 • x2) ]

Four exponential variants were attempted,  and the best one, x^ and
x2 , produced an R2 = 0.63.  This R2 is almost as high as those obtained
earlier for the 0.15 ppm daily maximum 1-hour NO2 cutpoint (see Table 4), and
is much higher than those obtained for other cutpoints.  The multiplicative
model, therefore, provides the best overall fit to the data base taken as a
whole.  The coefficients for the best fit model are:


     Y =  -1.5 + [ 0.05 (xj • X22) ]

Plugging specified daily maximum 1-hour NO2 cutpoints and alternative NO2 annual
averages into this equation produces the results shown in Table 6.  Confidence
bands were not computed for this regression analysis, as a formula to calculate
confidence levels was not readily available.  The non-linear nature of the
multiplicative relationship precludes using an "off-the-shelf" method.  The
data in Table 6 are shown graphically in Figure 5.

DISCUSSION

     All of the analyses undertaken and reported in this paper point to one
conclusion:  there is a lot of variance in the data relating expected (and
actual) number of days at or above a specified daily maximum NO2 concentration
and annual average NO2 value.  There is wide site-to-site variation, which
results in rather weak functional relationships having wide confidence bands.

-------
                              Table 6

                 ESTIMATED EXPECTED NUMBER OF DAYS*
                  WITH A DAILY MAXIMUM 1-HOUR NO2
                  VALUE >= SPECIFIED CONCENTRATIONS

NO2
Annual                     Daily Maximum 1-Hour Concentration (ppm)
Average
(ppm)                    0.15        0.20        0.25        0.30

0.020                       0           0           0           0
0.030                       5           2           1           0
0.040                      13           7           4           2
0.050                      27          14           9           6
0.053                      32          17          11           7
0.060                      47          26          16          11

*Rounded to the nearest whole day.  Values shown as 0 in the table
 include some that were negative in sign.

-------
                                      Figure 5

                  RELATIONSHIPS AMONG EXPECTED NUMBER OF DAYS WITH A
                  DAILY MAXIMUM 1-HOUR NO2 CONCENTRATION >= SPECIFIED
            CUTPOINTS, THE CUTPOINTS, AND ANNUAL AVERAGE NO2 CONCENTRATION

-------
     In those counties in the data base that meet the 0.053 ppm NO2 annual
average standard, the highest observed number of days >= 0.15 ppm daily
maximum is 43 (47 on a normalized year basis).  The mean number of days
>= 0.15 ppm for attaining counties, however, is only 7.1 (and the median is
3.5 days).  Thus, only a few areas that meet the 0.053 ppm NO2 annual average
standard have very many days with a daily maximum 1-hour value >= 0.15 ppm.
Unhappily, these few areas have a disproportionately large impact on
functional relationships that are fit to the data.  This can perhaps best be
seen in a cumulative plot of the expected number of days >= 0.15 ppm daily
maximum 1-hour value.  This plot appears as Figure 6.  The Figure indicates
that in the "raw" data for NO2 attainment areas, almost 90% of the values are
less than 20 days and 95% of the values are less than 30 days.  This fact
becomes obscured when relationships are fitted and confidence bands are
developed.

     The pattern shown in Figure 5 holds for the other daily maximum
cutpoints also.  In the expected number data base for NO2 attaining areas,
90% of the counties have fewer than 1.1 days per year with a daily maximum
1-hour value >= 0.30 ppm.  In addition, 95% of the counties have fewer than
1.3 days with a daily maximum 1-hour value >= 0.30 ppm.  These numbers are
certainly quite small, but the site with the maximum number of days >= 0.30
ppm has 9.1 expected days, which "warps" the distribution acutely.  This,
in turn, adversely affects any relationship fitted to the data.  The
functional form analyses developed in this paper must be used with this
caveat in mind.
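
     The effect of a few extreme sites on summary statistics can be
illustrated with a short sketch.  The ten values below are hypothetical
expected-day counts, not the attainment-area data behind Figure 6, and
the function name is invented for illustration.

```python
import statistics


def percent_below(values, threshold):
    """Percentage of values strictly less than the threshold."""
    return 100.0 * sum(1 for v in values if v < threshold) / len(values)


# Hypothetical expected-day counts for ten sites; one site is extreme.
hypothetical_days = [1, 2, 2, 3, 4, 5, 8, 12, 19, 47]

share_under_20 = percent_below(hypothetical_days, 20)
mean_days = statistics.mean(hypothetical_days)
median_days = statistics.median(hypothetical_days)
```

     In this made-up sample, 90% of the sites fall below 20 days, yet the
single extreme site pulls the mean to more than twice the median.  A
curve fitted by least squares responds to such a site in much the same
way.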

-------
                                  Figure 6

     [Figure: cumulative plot for the 0.15 ppm cutpoint.  Horizontal axis,
     Expected Number of Days with a Daily Maximum 1-Hour NO2 Value
     >= 0.15 ppm, from 10 to 50; vertical axis scaled up to 100.]

-------
                           FOOTNOTES & REFERENCES


1.  Office of Air Quality Planning and Standards.  Review of the National
    Ambient Air Quality Standards for Nitrogen Oxides:   Assessment  of
    Scientific and Technical  Information.   Research Triangle Park,  N.C.:
    U.S. Environmental  Protection Agency,  1981 (EPA-450/5-82-002).   Hereafter
    cited as Staff Paper.

2.  Ibid., p. 54.  The Figure is titled:  "Expected Number of Days on Which
    Maximum 1-Hour NO2 Concentrations Exceed 0.15 and 0.30 ppm Associated with
    Annual Average Concentrations (based on SAROAD data for 14 sites during
    1979-1980)."

3.  National Air Data Branch.  "National  Aerometric Data Bank:  Quick Look
    Report" (Computer printout).  Research Triangle Park, N.C.:   U.S. Environmental
    Protection Agency,  January 21, 1983.   Hereafter cited as the "Quick  Look
    Report."

4.  A valid annual average meets certain data reporting requirements set by the
    National Air Data Branch, EPA.  The requirements depend upon the sampling
    interval used.  Since NO2 ambient data are monitored using two sampling
    periods, one-hour and 24-hour, two reporting requirements for a valid
    annual average apply.  For continuous sampling (only intervals less than
    24 hours), data representing an annual period must reflect a minimum
    of 75% of the total possible observations.  For NO2, this means a
    continuous monitoring site must have at least 6,570 hourly values for
    a valid annual average.

        For non-continuous sampling (a sampling interval of 24 hours or more),
    data representing an annual period must have four quarters of data having
    at minimum five observations per quarter.  If no measurements were made
    during one month of the quarter, each remaining month must have at least
    two observations.  For NO2, this means that a non-continuous monitoring site
    must have at least 20 daily averages, with each quarter having at least
    5 daily averages spread throughout the quarter as stated above.

        For more information, see:  National Air Data Branch. AEROS Manual
    Series; Volume III:  Summary and Retrieval.  Research Triangle  Park,
    N.C.:  U.S. Environmental Protection  Agency,  1981 (EPA-450/2-76-009b),
    p. 2.3.0-5 ff.
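
        The two validity criteria above can be sketched in code.  This is
    a minimal illustration, not the National Air Data Branch's actual
    screening program; the function names and the input layout (monthly
    observation counts grouped by quarter) are assumptions made for the
    example.  The footnote does not say how a quarter with two empty
    months is treated, so the sketch simply applies the same rule to it.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 possible hourly values in a non-leap year


def continuous_year_is_valid(n_hourly_values, min_fraction=0.75):
    """Continuous sampling: at least 75% of possible hourly values (6,570)."""
    return n_hourly_values >= min_fraction * HOURS_PER_YEAR


def quarter_is_valid(monthly_counts):
    """Non-continuous sampling: check one quarter from its three monthly counts."""
    if sum(monthly_counts) < 5:
        return False
    if 0 in monthly_counts:
        # A month with no measurements: every remaining month needs at least two.
        return all(count >= 2 for count in monthly_counts if count > 0)
    return True


def noncontinuous_year_is_valid(quarters):
    """quarters: four sequences, each holding three monthly observation counts."""
    return len(quarters) == 4 and all(quarter_is_valid(q) for q in quarters)
```

        For example, a quarter with monthly counts of 0, 4, and 1 has
    five observations but fails, because one of the two remaining months
    has fewer than two.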

5.  NO2 monitoring methods and their associated calibration procedures have
    changed greatly over the years.  Currently, EPA accepts three methods
    as a "Reference Method" or equivalent for NO2.  The Reference Method is
    a continuous (one-hour) chemiluminescence procedure; the two
    EPA-designated equivalent methods are both non-continuous procedures:
    NASN sodium arsenite and TGS-ANSA.  All of the 24-hour data used in this
    analysis were obtained using one of the equivalent methods, primarily
    the sodium arsenite procedure.  The continuous data, on the other hand,
    are based upon the Reference Method and another procedure used
    extensively in California and a few other States:  colorimetric-Lyshkow,
    a modification of the Griess-Saltzman technique.  The colorimetric
    approach is not an equivalent method according to EPA, but data from it
    have to be used if data from California are included in any
    comprehensive study of ambient NO2 concentrations.

6.  Environmental Criteria and Assessment Office.  Air Quality Criteria for
    Oxides of Nitrogen.  Research Triangle Park, N.C.:  Office of Research
    and Development, 1982 (EPA-600/8-82-026F).

7.  There may be a site  in the U.S. with more than 57 days  over a 0.15 ppm
    daily maximum, because sites were selected to maximize  the  1-hour
    value in a county, not to maximize the number of days  over, say, 0.15
    ppm.  The NADB "Quick Look" program used to develop Table  1 did not
    contain the latter type of information.

8.  For purposes of the regression analyses, NO2 annual average is treated
    as the independent variable and number of days with a daily maximum
    ≥ specified cutpoints is treated as the dependent variable.  We treat
    the variables in this manner because we want to be able to predict,
    with some known degree of confidence, how many days over a certain
    cutpoint can be expected if the annual average NO2 concentration is
    kept below a specified value.  Doing this in no way implies a causal
    relationship.  In other words, we do not imply, nor do we believe,
    that annual average NO2 concentrations cause daily maximum NO2 values.
    The two variables are related, however, and we are interested in how
    strong this relationship is.
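
        The roles of the two variables can be illustrated with an
    ordinary least-squares fit.  This is only a sketch: a straight line
    stands in for whatever model forms were actually fitted, the function
    name is hypothetical, and the numbers are invented for illustration
    and are not data from this report.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical site values (NOT from this report):
annual_avg_ppm = [0.02, 0.03, 0.04, 0.05]   # independent variable
days_at_or_over = [1.0, 3.0, 5.0, 7.0]      # dependent variable

intercept, slope = fit_line(annual_avg_ppm, days_at_or_over)

# Predicted number of days for a site whose annual average is 0.045 ppm.
predicted_days = intercept + slope * 0.045
```

        The fit predicts days from the annual average, which is the
    direction of prediction described above; it says nothing about
    causation.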

9.  Neil Frank, personal  communication; February  10, 1983.

10. A valid day is one that has data for at least 75% of total  hours in
    the day (i.e., 18 hours or more).

11. The calculation is simply:

    Number of observed days in the
    year ≥ specified cutpoint
    ------------------------------  x  365
    Number of valid days of data
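
        A minimal sketch of this adjustment, together with the valid-day
    criterion of footnote 10, is given below.  The function names and the
    input format (a list of daily maximum 1-hour values for valid days
    only) are assumptions made for illustration.

```python
def is_valid_day(n_hours_with_data, min_fraction=0.75):
    """A valid day has data for at least 75% of its 24 hours (18 or more)."""
    return n_hours_with_data >= min_fraction * 24


def expected_days_at_or_above(daily_maxima, cutpoint, days_in_year=365):
    """Scale the observed count of days >= cutpoint up to a full year.

    daily_maxima: daily maximum 1-hour values (ppm), one per valid day.
    """
    n_valid = len(daily_maxima)
    if n_valid == 0:
        raise ValueError("no valid days of data")
    n_at_or_above = sum(1 for value in daily_maxima if value >= cutpoint)
    return n_at_or_above / n_valid * days_in_year
```

        For example, 2 days at or above the cutpoint out of 292 valid
    days gives (2 / 292) x 365 = 2.5 expected days.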

12. W.J. Dixon, et al.  BMDP Statistical Software 1981.  Berkeley, CA:  University
    of California Press, 1981.  The regression routines were programmed by
    James Capel of PEDCo Environmental, Inc.

13. Ted Johnson of PEDCo Environmental, Inc. was  responsible for developing
    the hypothesized models.

14. However, there is no reason to expect the logistic curve to be the most
    theoretically correct model for the data being analyzed.  The theory of
    why air quality data are distributed as they are, especially in the
    present case, is still immature.  Nevertheless, the relationship
    between expected number of days with a 1-hour NO2 daily maximum at or
    above a specified cutpoint and NO2 annual average does share certain
    characteristics of a logistic curve.  Those properties include
    (1) lower and upper bounds to possible values of the dependent
    variable and (2) an overall sigmoidal, or S, shape to the curve.
    The authors are indebted to Dr. William Biller for these comments
    and to Ted Johnson for related insights into the logistic model.
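
        The two properties can be seen in a generic logistic curve
    bounded by 0 and 365 days.  The parameterization below is a textbook
    form chosen for illustration; the function name, the parameters a
    and b, and their values are hypothetical and are not the fitted
    model from this analysis.

```python
import math


def logistic_days(annual_avg_ppm, a, b, upper=365.0):
    """Generic logistic curve: upper / (1 + exp(-(a + b * x))).

    The result always lies strictly between 0 and `upper`, and for
    b > 0 it rises in an S shape as the annual average increases.
    """
    return upper / (1.0 + math.exp(-(a + b * annual_avg_ppm)))


# Hypothetical parameters, for illustration only.
a, b = -6.0, 100.0
curve = [logistic_days(x / 1000.0, a, b) for x in range(0, 121, 10)]
```

        With these illustrative parameters the curve passes through
    half of its upper bound (182.5 days) where a + b*x = 0, that is, at
    an annual average of 0.06 ppm.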

15. Richard J. Brand, et al.  "Large Sample Confidence Bands for the Logistic
    Response Curve and Its Inverse," The American Statistician, 27(4):157-160
    (October 1973).  The algorithms comprising the confidence band solution
    methods were programmed by Ted Johnson and Jim Capel of PEDCo Environmental,
    Inc.

16. This procedure was suggested to us by Joseph Padgett, Division Director
    of the Strategies and Air Standards Division of EPA.  The stepwise
    regression program BMDP1R of the BMDP package was used for the analysis.
    Ted Johnson and Jim Capel of PEDCo Environmental, Inc. were responsible
    for developing the hypothesized models and programming the routines.

-------