National Institute of Statistical Sciences

          Modeling High Threshold Exceedances
                   of Urban Ozone

         Richard L. Smith and Li-Shan Huang

              Technical Report #6
                December, 1993

    National Institute of Statistical Sciences
              P.O. Box 14162
      Research Triangle Park, N.C. 27709

 MODELING HIGH THRESHOLD EXCEEDANCES OF URBAN OZONE

                                       by

                     Richard L. Smith and Li-Shan Huang

                   National Institute of Statistical Sciences
                                       and
                         University of North Carolina

    First version: December 28, 1993. Research supported by the U.S. Environmental
Protection Agency under Cooperative Agreement #CR819638-01-0. Address for
correspondence: Richard L. Smith, Department of Statistics, University of North
Carolina, Chapel Hill, N.C. 27599-3260; email RS@STAT.UNC.EDU.

    Abstract. Urban ozone arises as a consequence of the emissions of nitrogen oxides and
hydrocarbons into the atmosphere, but it is also very strongly affected by meteorological
conditions. In analyzing ozone data, it is hard to separate genuine trends, which might be
explained by changes in the levels of emissions, from apparent ones determined by
meteorology. Previous statistical studies of this problem have used classification and
regression techniques. In particular, Bloomfield et al. (1993) have developed a detailed
nonlinear regression model fitted to weighted averages of measured ozone concentrations
at 45 monitoring stations in the Chicago area. Here, based on the same data set, we
develop an alternative analysis aimed specifically at characterizing the probability of
exceeding a high threshold.

    Keywords:  Ozone concentration, Meteorological adjustment, Regression,  Extreme
value analysis, Threshold exceedances.


  1. Introduction
  2. The data
  3. Exceedances of a single threshold
  4. Comparison with the nonlinear regression approach
  5. Excesses over a threshold
  6. Identifying the extreme ozone days adjusted for meteorology
  7. Recurrence of 1988
  8. Sums of excesses over a threshold
  9. Conclusions and summary
  References
  Captions to figures
  Figures

    1. Introduction

    A major difficulty in the analysis of urban ozone data is how to separate genuine
trends, which might be the consequence of changes in precursor emissions, from the effects
of meteorological variability. This problem has been reviewed in Chapter 2 of the report
of the National Research Council (1991), which discussed three broad approaches to it:

    • Measurement approaches: Find some other ozone indicator (besides the present
national ambient air quality standard, which is based on the number of daily ozone maxima
exceeding a threshold of 120 ppb), which would be more "robust" against meteorological
variability — for example, it might be beneficial to monitor the 80th or 95th percentile of
the ozone distribution, or exceedances of lower thresholds;

    • Classification approaches: Find a suitable classification of meteorological "types",
and consider ozone exceedances separately within each type;

    • Regression approaches: Construct a regression model linking ozone level to meteo-
rological  variables, and use that to identify days which have high ozone levels relative to
the background meteorology.

    Of the three approaches, the regression approach is the one most  amenable to a de-
tailed statistical analysis. In earlier work as part of the current project, Bloomfield, Royle
& Yang (1993) used principal  components analysis to determine suitable weighted averages
of ozone  levels at 45 urban stations in the  Chicago area, and then constructed a nonlin-
ear regression model to relate the resulting weighted average to measured meteorological
variables. Their model will be described in  more detail in Section 4.

    One possible disadvantage of regression  approaches is that they may fail to charac-
terize well what is happening in the extremes of the distribution. Since the current ozone
standard is defined in terms of extremes,  this is a natural concern.  The report of the
National Research Council (1991) specifically mentioned this as a drawback of regression
analysis,  pointing out that high ozone levels have been underpredicted by  such analyses,
and suggesting that the  explanation  lies in the standard method of fitting a regression
model, which is based on minimizing the sum of squares of residuals over the whole  data
set, not giving enough attention to the extremes of the data. Cox & Chu (1992) recognised
this as a difficulty with standard regression analyses based on the normal distribution, and
proposed an alternative model based on the Weibull distribution, using a maximum like-
lihood fit. The  reports of Cox & Chu (1992)  and Bloomfield et al. (1993) considered the
power of their models to predict the 95th and 99th percentiles of the ozone distribution,
with seemingly satisfactory results, but this leaves open the question of whether alternative
analyses, focussed specifically on extreme quantiles or threshold crossings,  could produce
better results. The main purpose of the present paper is to consider alternative methods
of analysis with just such an objective.

    Extreme value theory is that branch of statistics concerned with distributions of
extreme values in random samples. Classical extreme value theory is based on asymptotic
distributions for extremes in large samples (Gumbel 1958). Although there have been
applications of classical extreme value techniques in air quality problems (Singpurwalla 1972,
Roberts 1979), these are somewhat restricted in their applications because data series are
generally too short for analyses based on annual maxima, which is the classical domain of
the theory.  A more specific objection to classical extreme value theory, in the context of
the present  study, is that it would be almost impossible to figure out how to incorporate
the meteorological effects which are our main interest in the context of a classical analysis.

     An alternative method of extreme value analysis which has been developed extensively
in recent years, however, is that based on exceedances over high thresholds. Among the
papers developing this approach are Smith (1989), Davison & Smith (1990) and a recent
review by Smith  (1993). The broad aim of these methods is to find suitable probability
distributions for  exceedance times and excess values,  which  can then be estimated by
maximum likelihood.  A specific selling point  of these methods is  that  they  can easily
be extended to include covariates  through a regression analysis, and indeed the paper of
Smith (1989) showed how such an analysis could be used to study trends in the extreme
values of urban ozone, though in that paper without making any attempt to account for
meteorology.

     Davison & Hemphill (1987) analyzed the crossings of a single threshold level, taking
account  of missing values, through a generalized linear model for binary data. In their
analysis, temperature  was admitted as a covariate.  Shively (1991) adapted a  Poisson
process  viewpoint proposed by Smith  (1989) to incorporate a number of meteorological
variables, but still only for crossings of a single threshold level. In other words, his analysis
accounted for the probability of crossing a given threshold, but did not attempt to model
the actual values or excesses of those ozone readings above the threshold. This would be
adequate if it were possible  to formulate the entire study in terms of exceedances of  a
single threshold level, but in most cases, to calculate adequate extremal  properties, it is
necessary to extrapolate to higher thresholds than the one on which the statistical analysis
is based, and for that, we need to model the excesses as well.

     In  the present report, we apply and extend these techniques to a number  of data
sets  extracted from  the Chicago ozone study.  Specifically, we pick out three stations at
which ozone levels are high, and also consider daily maxima across the Chicago network.
A detailed description of the data used is in Section 2. Section 3 presents an analysis of
the exceedances of a single threshold.  In Section 4, this  is compared with the model of
Bloomfield et al.; essentially, our conclusion is that there are difficulties in applying the
Bloomfield model to determine probabilities of crossing a high threshold, which do  appear
to be resolved by the model we are proposing. Section 5 then extends the analysis to include
excesses over the threshold. In Sections 6 and 7, we propose some interpretations of these
results in the light of the broader scientific aims of the project. More specifically, Section 6
discusses ways of identifying extreme ozone days after making adjustments for meteorology.
This information may be useful in establishing control strategies or in suggesting ways that
the interpretation of the ozone standard might  be  modified to account for meteorology.
Section 7 presents a methodology for establishing how extreme a particular year was, in
the light of past patterns of meteorology — this is of particular interest with regard to the
year 1988, which  has seen the highest ozone levels of recent years. Section 8 presents an
alternative analysis,  based on sums of exceedances over a threshold rather than the daily
maximum. Finally, Section 9 summarizes the results of the report.
    2. The data

    The ozone data consist of hourly averages at 45 stations in the Chicago area. A detailed
description of the stations and their locations is given by Bloomfield et al. (1993). For the
present study, it is of particular interest to focus on stations with high ozone levels. Table
2.1 shows the sample size (i.e. total number of complete days available) and the number
of exceedances of the daily maximum above the levels 120 and 150 ppb, for each of the
45 stations.
                 Table 2.1: Numbers of exceedances by station
Station ID    Sample size    Exc. of 120    Exc. of 150
170310001          655             4              0
170310003          446             1              1
170310009          330             1              0
170310032          970            13              2
170310037         1880             8              3
170310038          129             1              0
170310044          825             2              0
170310045         1222             2              0
170310050         2286             5              1
170310053         1062             4              2
170310062           49             0              0
170310063          130             0              0
170310064          467             2              0
170311002         1851             6              1
170311003         1515            15              1
170311601         1351             7              1
170312301          500             1              1
170313005          431             3              0
170314002         1595            12              1
170314003         1285             5              0
170315001          282             0              0
170316002          131             1              0
170317002         3448            41             11
170318003          164             0              0
170431002          765             1              1
170436001         2288             6              0
170890005         3284             5              0
170970001         3180            15              6
170971002         3269            38              9
170971003          275             0              0
170973001         3345            17              4
171110001         3374             6              2
171971007         1770             3              0
171971008         3192             7              1
180890011          190             0              0
180891016         1435             8              2
180892001          235             1              0
180892002          258             0              0
180892008         1056            11              2
181270020          686             7              2
181270021          596             7              1
181270024         1053            23              5
181270903           31             0              0
181271004          422             8              3
181271005          266             0              0

     Also computed were exceedances of the level 180 ppb, but only four stations had any
exceedances here: 170971002, 180892008 and 181270024 had one each, and 170317002 had
three. Based on these results, and taking sample sizes into account, three stations were
selected for detailed analysis: 170317002, 170971002 and 181270024. In the rest of the
report these are referred to as stations P, Q and R, respectively. Figure 2.1 (based on
Figure 1 of Bloomfield et al., 1993) shows these stations.

     In addition to the daily maxima from these three stations, we have  analyzed daily
 maxima for the whole network. Network maxima are not easy to define, because different
 stations  have been in operation for different periods of time, and there are many missing
 values. However, Bloomfield et al. (1993, Section 4.2) applied median polish kriging to
 interpolate missing values within a subset  of 16 stations and subsequently (Section 6.6)
 used the same data to construct network maxima based on those 16 stations.  We use the
 same constructed data set here.

     The meteorological data consist of surface weather data at a single station in Chicago.
 Data are available on an hourly  basis, but for the purpose of the present study, only the
 noon values are considered. Bloomfield et  al.  (1993) consider extending the analysis to
 incorporate lagged data (i.e.  using previous days' meteorology,  as  well  as that of the
 current day, as covariates in predicting the ozone level of a specific day) and to use upper
 air data, but these have not been brought into the present analysis.

     The measured meteorological variables are given in Table 2.2. Amongst these, TOT-
 COV and CHT were omitted from the analysis, TOTCOV because it is generally considered
 that OPCOV is  a  more relevant  measure of cloud cover for the ozone problem, and CHT
 because it is difficult to interpret (a missing value or 0 actually meaning that the ceiling is
 too high to be measured).  Both omissions are supported by statements in Bloomfield et
 al. (1993).


                  Table 2.2: Measured meteorological variables

 Variable name    Description

 TOTCOV        Total cloud cover (%)
 OPCOV         Opaque cloud  cover (%)
 CHT            Ceiling height  (m.)
 PR              Barometric pressure (mb.)
 T               Temperature (°F)
 TD              Dewpoint Temperature (°F)
 RH              Relative humidity (%)
 Q               Specific humidity (g./kg.)
 VIS              Visibility (km.)
WSPD           Wind speed (m./sec.)
WDIR           Wind direction (° from North)

     In addition, several additional variables were created from the above, and are described
in Table 2.3. These are mainly motivated by the results in Bloomfield et al. (1993),
who factored windspeed into directional components, and also used up to cubic terms
in temperature (the divisors 10 and 1000 being introduced into T2 and T3 to improve
numerical stability of the coefficients). Finally, the inclusion of the T.WSPD variable is
motivated by the conclusion of Bloomfield et al. (1993) that there is a strong temperature-
windspeed interaction, though in their case the actual model is highly nonlinear (see Section
4) and it is open to question whether a single cross-product term, as here, will capture
that effect.
             Table 2.3: Additional variables created from the data

WIND.U         -WSPD x sin(2 x pi x WDIR/360)
WIND.V         -WSPD x cos(2 x pi x WDIR/360)
T2             (T-60)^2/10
T3             (T-60)^3/1000
T.WSPD         WSPD x (T-60)
    In addition to the meteorological covariates, YEAR was used as a covariate (=1 for
1981, through to 11 for 1991), and two variables CDAY and SDAY, defined by

       CDAY = cos(2 x pi x DAY/365.25),   SDAY = sin(2 x pi x DAY/365.25),

where DAY represents the day within the year (=1 for January 1, etc.). The inclusion of
CDAY and SDAY is intended to reflect the fact, also discussed by Bloomfield et al. (1993),
that there remains a residual seasonal effect even when the direct influence of season on
meteorology has been taken into account.
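For concreteness, the constructed covariates of Table 2.3 and the seasonal terms CDAY and SDAY can be computed as follows. This is a minimal sketch in Python (the analysis in the report itself used Fortran), and the function name is ours:

```python
import math

def derived_covariates(t, wspd, wdir, day):
    """Derived covariates of Table 2.3 plus the seasonal terms, from
    temperature T (degrees F), wind speed WSPD, wind direction WDIR
    (degrees from North) and DAY (day within the year)."""
    rad = 2 * math.pi * wdir / 360.0
    return {
        "WIND.U": -wspd * math.sin(rad),
        "WIND.V": -wspd * math.cos(rad),
        "T2": (t - 60) ** 2 / 10.0,
        "T3": (t - 60) ** 3 / 1000.0,
        "T.WSPD": wspd * (t - 60),
        "CDAY": math.cos(2 * math.pi * day / 365.25),
        "SDAY": math.sin(2 * math.pi * day / 365.25),
    }
```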
    3. Exceedances of a single threshold

    Suppose now we are concerned to model the probability, as a function of the covariates,
that the ozone on a specified day exceeds a specified threshold. For most of the
discussion that follows, the threshold will be taken as the ozone standard 120 ppb, though
the methodology applies to any high threshold and, especially in conjunction with the
analysis of excesses over the threshold (Section 5), there can be advantages in adopting a
lower threshold so as to capture more data.

    For the initial analysis we shall assume that separate days are independent.  Such
an assumption is not unreasonable, because it is widely believed that persistence of high-
ozone conditions has more to do with persistence of meteorology than with the ozone itself.
Nevertheless, the independence assumption is not exactly satisfied, and further on we shall
describe ways to get around it.

     Under this independence assumption, the likelihood for the data may be defined by

              L = prod_i p_i^delta_i (1 - p_i)^(1 - delta_i),            (3.1)

where p_i denotes the probability that the threshold is exceeded on day i, and delta_i is an
indicator of whether the threshold is in fact exceeded on day i (delta_i = 1 if the threshold
is exceeded, 0 otherwise).

     As a model for p_i, we assume the familiar logit model:

              log( p_i / (1 - p_i) ) = sum_j beta_j x_ij,                (3.2)

where x_ij is the value of the jth covariate on day i and beta_j is the corresponding
coefficient. We always assume x_i1 = 1, i.e. there is always a constant term in the model,
but the other covariates are selected from those described in Section 2.

     For a given set of covariates, estimation proceeds by maximum likelihood, i.e. choose
beta_1, beta_2, ... to maximize L as defined by (3.1) and (3.2). In most of the examples in
this paper, this has been achieved using Fortran programs employing the DFPMIN (Davidon-
Fletcher-Powell variable metric function minimization) algorithm of Press et al. (1986),
Chapter 10, to minimize the objective function -log L. The algorithm was converted to
double precision with EPS reset to 10^-14, and for most runs the convergence parameter
FTOL was taken as 10^-8 (the algorithm is taken to converge as soon as two consecutive
function evaluations are within FTOL). The algorithm requires first-order derivatives
of the objective function, as well as the function itself, but these have been adequately
approximated using simple first-order differences (df/dx ~ {f(x + delta) - f(x)}/delta, where
we have taken delta = 10^-6). The other modification of the published DFPMIN algorithm
was to retain the HESSIN matrix as an argument of the function. This matrix contains
an approximation to the inverse of the Hessian matrix of -log L. In maximum likelihood
theory this Hessian is the observed information matrix, and its inverse is used as an
approximation to the variance-covariance matrix of the parameter estimates {beta_j,
j = 1, 2, ...}. In particular, the square roots of its diagonal entries define approximate
standard errors for the parameters. However, it should be remembered that these standard
errors are only an approximation and that alternative methods of obtaining standard errors,
for instance via jackknife or bootstrap procedures, may well give better results. The
procedure requires specification of starting values, but it does not appear to be sensitive to
these, and even starting from beta_j = 0 for all j produced convergence in all cases tried,
though sometimes rather slowly (50+ iterations). The only other programming point that
requires care is to guard against exponential overflow; a trap was built into the function
minimization routine to test for this, the objective function being set equal to a very large
number (10^10) if the test failed.
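The fitting procedure can be sketched as follows. This is a simplified stand-in in Python (plain steepest descent rather than the DFPMIN quasi-Newton routine, and all names are ours), but it shows the negative log likelihood with the overflow trap and the forward-difference gradient described above:

```python
import math

def nllh(beta, x, delta):
    """Negative log likelihood from (3.1)-(3.2), with the overflow trap:
    if the linear predictor is extreme, return a very large number."""
    total = 0.0
    for xi, di in zip(x, delta):
        eta = sum(b * v for b, v in zip(beta, xi))
        if abs(eta) > 30.0:          # guard against exponential overflow
            return 1e10
        p = 1.0 / (1.0 + math.exp(-eta))
        total -= di * math.log(p) + (1 - di) * math.log(1.0 - p)
    return total

def grad(beta, x, delta, h=1e-6):
    """Forward-difference gradient: df/dbeta_j ~ (f(beta + h*e_j) - f(beta))/h."""
    f0 = nllh(beta, x, delta)
    g = []
    for j in range(len(beta)):
        b = list(beta)
        b[j] += h
        g.append((nllh(b, x, delta) - f0) / h)
    return g

def fit(x, delta, steps=5000, rate=0.1):
    """Minimize the NLLH starting from beta = 0, as in the text.
    Plain steepest descent stands in for the quasi-Newton DFPMIN."""
    beta = [0.0] * len(x[0])
    for _ in range(steps):
        beta = [b - rate * g for b, g in zip(beta, grad(beta, x, delta))]
    return beta
```

With only a constant covariate, the fitted exceedance probability reproduces the empirical exceedance frequency, which gives a quick sanity check on any implementation.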

     For this specific model, specialized programs are not in fact required as  it  is a well-
known model for which a number of packaged routines exist, for instance the glm command

 to fit generalized linear models in Splus, or the LOGISTIC procedure of SAS. Shreffler
 (1993) successfully used the latter to obtain a very similar analysis of data from St. Louis,
 Missouri. However, the more complicated models  to be discussed in Section 5 do not fit
 within the generalized linear models framework,  and so do require specialized program-
 ming.

     Stepwise selection of variables

     Although the maximization of the likelihood function is mechanical, given the covari-
 ates, the process of deciding which covariates to include is  not. The standard method of
 doing this is  a stepwise selection, in which the variables are introduced one at a time, the
 minimized values of the negative log likelihood — log  L, hereafter referred  to as NLLH,
 being used to decide which variables to introduce and when to stop.  This  is sometimes
 combined with, or replaced by, a backward selection procedure, in which initially too many
variables are included, and the least significant ones are then dropped until all the
remaining variables are significant. As a test of significance, one widely used measure is
the t ratio, defined as the value of the parameter divided by its standard error. As a rough
guide, any variable with a t ratio greater than 2 in absolute value is considered significant.
However, as mentioned above, the standard errors obtained from the function minimization
routine are only approximate, and so should not be used as the sole criterion for including
or excluding any variable. As an alternative to t ratios, NLLH may be used directly. Here,
a suitable rule of thumb is that a variable is significant if its inclusion in the model reduces
the NLLH by more than 2. This is based on the approximate chi-squared distribution with
1 degree of freedom for the deviance statistic (twice the difference of NLLHs for the two
models being compared). If more than one variable is introduced at a time, then the
chi-squared distribution with 1 degree of freedom is replaced by one with q degrees of
freedom, where q is the number of new variables introduced.
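The rule of thumb can be made precise as a deviance test. A small sketch (the 5% chi-squared critical values are standard; the function itself is ours):

```python
# 5% critical values of the chi-squared distribution for small q.
CHI2_5PCT = {1: 3.841, 2: 5.991, 3: 7.815}

def significant(nllh_without, nllh_with, q):
    """Deviance test: twice the drop in NLLH from adding q covariates,
    compared with the 5% chi-squared point on q degrees of freedom.
    (The informal 'NLLH drop of more than 2' rule corresponds to a
    deviance above 4, slightly beyond the 3.841 needed for q = 1.)"""
    deviance = 2.0 * (nllh_without - nllh_with)
    return deviance > CHI2_5PCT[q]
```

For example, in the stepwise analysis for Station P that follows, adding T to the YEAR-only model (NLLH 221.740 down to 136.369) is overwhelmingly significant, whereas adding PR (221.740 down to 221.722) is not.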

     As an indication of  how  these ideas are applied,  we now go in detail through  the
 selection of variables for Station P, threshold 119.9 (the threshold was set slightly below
 120 to ensure that a daily maximum of exactly 120 would be counted as an  exceedance).
 In other cases considered in this paper, a  similar procedure was followed, though in most
 cases, not in  so much detail.

     One other point to be noted is that there are certain combinations of variables which
do not make sense. For instance, it would not make sense to include either of CDAY or
SDAY without the other, since the relation between them depends on the quite arbitrary
decision to define DAY with respect to January 1, rather than any other fixed date of
the year. Similarly, WIND.U and WIND.V go together, but we do not include both of
them together with WSPD, since the latter is determined completely (albeit nonlinearly)
by the former. Also, we do not include the T.WSPD interaction term unless one of WSPD
or the WIND.U/WIND.V pair is present.

    With these  considerations in mind, the detailed analysis is as follows.

    Step 1. Since YEAR can be considered as a trend variable, an initial analysis is made
using just YEAR as a covariate. This results in NLLH=221.740.

     Step 2. Various other variables were added to the model,  in each case  calculating
NLLH.  For example, when CDAY and SDAY were added together, NLLH dropped to
168.680. Other tries were OPCOV (NLLH=210.537), PR (221.722),  T (136.369), TD
(186.833), RH (202.383), Q (191.617), WSPD (206.610), WIND.U+WIND.V (220.636),
VIS (221.481). T is clearly the most significant, so it is added to the model.

     The model therefore now contains variables YEAR, T and has NLLH=136.369.

     Step 3. New variables added to model: CDAY+SDAY (133.455), OPCOV (135.040),
PR (134.204), TD (132.040), RH (132.045), Q (130.936), WIND.U+WIND.V+T.WSPD
(123.821), WSPD+T.WSPD (123.369), VIS (136.250). The combination of variables
WSPD+T.WSPD gives the biggest reduction in NLLH and so is selected. The model
now contains variables YEAR, T, WSPD, T.WSPD and has NLLH=123.369.

     Step 4. New variables: CDAY+SDAY (121.061), PR (122.925), OPCOV (122.076),
TD (118.505), RH (118.438), Q (117.019), VIS (122.861). Q selected. The model now
contains variables YEAR, T, Q, WSPD, T.WSPD and has NLLH=117.019.

     Step 5. New variables: CDAY+SDAY (114.553), PR (117.018), OPCOV (116.936),
TD  (115.536), RH (115.514), VIS (116.972).   CDAY+SDAY selected.   The model now
contains variables YEAR, CDAY, SDAY, T, Q, WSPD, T.WSPD and has NLLH=114.553.

     Step 6. Once again PR (114.553), OPCOV (114.323), TD (113.382), RH (113.545),
VIS (114.485) were added to the model, but it is clear that none of these is significant, so
they were not included.

     Step 7. The model is now complete in terms  of the original variables tried, but there
are still some additional models to be considered. All the current variables appear to be
significant — the only one with a t ratio less than 2 in magnitude was SDAY (—0.9), but
as discussed above  it would not make sense to drop SDAY and retain CDAY, while the
latter has a t ratio of —2.04, which clearly indicates it should be retained. However, we
still have not used T2 and  T3. In fact, including  T2 reduces the NLLH  to 114.215 while
adding T3 as well creates no further reduction. Thus we do not include T2 and T3.

     Step 8. So far, we are still modeling the days  as independent.  One way to improve on
that assumption is to include an additional variable PDAY, defined to be 1 if the previous
day  is an exceedance (of the  same threshold) and 0 otherwise.  If the previous  day is
missing, this variable is simply omitted. Although this can be incorporated with minimal
change in the software used to fit the model,  the actual effect is to turn the model from
independent observations into a first-order Markov  chain — of which, more will be said
in Section 5. In this case,  adding PDAY reduces NLLH to 112.729, indicating that this
variable is borderline significant.  However, adding another variable PDAY2  (=1 if the
previous day but one was an exceedance, otherwise 0) does not have any further effect
(NLLH=112.618), indicating that the first-order Markov chain is adequate.
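Constructing PDAY (and, analogously, PDAY2) is straightforward bookkeeping. A sketch, assuming exceedance indicators indexed by day number with gaps for missing days (names ours):

```python
def pday_indicator(exceed, lag=1):
    """PDAY covariate: 1 if the day `lag` days earlier was an exceedance,
    0 if it was not, and None (i.e. the case is simply omitted) if that
    day is missing.  `exceed` maps day index -> True/False for observed days."""
    out = {}
    for day in exceed:
        prev = exceed.get(day - lag)
        out[day] = None if prev is None else int(prev)
    return out
```

Calling with lag=2 gives PDAY2, the indicator for the previous day but one.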

     Step 9. A final step, considering that a major purpose of this whole analysis is to
determine the significance of the YEAR variable in the presence of the meteorological
factors, is to repeat the most important analyses without YEAR. For the model of Step 7,
if YEAR is omitted, NLLH rises from 114.553 to 117.503. For the model of Step 8, it goes
from 112.729 to 115.451. Either way, YEAR is still significant.

     The two principal models  that have been identified (with and without  PDAY) are
summarized in Tables  3.1 and 3.2.
         Table 3.1: Station P, threshold 119.9 (NLLH=114.553)

Variable             Estimate            Stand. error        t Ratio

CONST                -38.58              4.837               -7.977
YEAR                 -.1574              .06184              -2.545
CDAY                 -2.995              1.471               -2.036
SDAY                 -.4992              .5331               -.9364
T                    .4476               .05628              7.954
Q                    -.2166              .06017              -3.599
WSPD                 .4800               .1781               2.695
T.WSPD               -.03308             .006211             -5.326
   Table 3.2: Station P including PDAY, threshold 119.9 (NLLH=112.729)

Variable             Estimate            Stand. error        t Ratio

CONST                -36.66              5.142               -7.13
YEAR                 -.1548              .06739              -2.297
CDAY                 -2.857              1.557               -1.835
SDAY                 -.5030              .6109               -.8233
PDAY                 1.044               .5261               1.984
T                    .4248               .05992              7.089
Q                    -.215               .06278              -3.424
WSPD                 .4402               .1872               2.351
T.WSPD               -.03144             .006639             -4.735
    Similar analysis is applied to Station Q and Station R. The resulting models are the
following. Note that T2 is included among the variables for station Q. In neither of these
cases was PDAY significant.
         Table 3.3: Station Q, threshold 119.9 (NLLH=99.317)

Variable             Estimate            Stand. error        t Ratio

CONST                -171.2              46.05               -3.72
YEAR                 -0.09339            0.0699              -1.34
T                    2.256               0.6182              3.65
WIND.U               0.02198             0.09156             0.24
WIND.V               0.05041             0.09536             0.53
T2                   -0.3335             0.1056              -3.16
T.WSPD               -0.02139            0.005378            -3.98
         Table 3.4: Station R, threshold 119.9 (NLLH=67.524)

Variable             Estimate            Stand. error        t Ratio

CONST                -25.76              3.637               -7.083
YEAR                 -.3851              .1614               -2.385
T                    .2846               .04539              6.270
PR                   .1660               .06312              2.623
WIND.U               .2024               .09925              2.040
WIND.V               -.1822              .08821              -2.066
T.WSPD               -.01382             .006699             -2.063
    If the variable YEAR is omitted, then NLLH rises to 100.359 in the case of station Q
and 70.166 for station R. This is still significant in the case of station R, but for station
Q, whether judged from the t ratio or by the increase in NLLH, the trend does not appear
to be significant.

    One possible disadvantage of this analysis is that the actual number of exceedances
(41, 38 and 23 respectively for stations P, Q and R),  although in excess of the standard,
is still rather small for making a definitive assessment of the trend in the light of so many
covariates.  We therefore consider also an analysis based on threshold 100, for which there
are far more exceedances (94, 84 and 55).  As we shall see in Section 5, it is still  possible
to construct estimates of the probability  of exceeding the level 120 (or any higher level)
by combining these results with models for the excesses over a threshold.
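The extrapolation works multiplicatively: the probability of exceeding 120 is the modeled probability of exceeding 100 times the conditional probability that the excess over 100 is at least 20. As a sketch, and anticipating Section 5, suppose the conditional excess is modeled by a generalized Pareto distribution (an assumption at this point; names and parameterization ours):

```python
import math

def prob_exceed_higher(p_low, u0, u1, sigma, xi):
    """P(X > u1) = P(X > u0) * P(X - u0 > u1 - u0 | X > u0), with the
    excess over u0 modeled as generalized Pareto with scale sigma > 0
    and shape xi."""
    y = (u1 - u0) / sigma
    if xi == 0.0:
        tail = math.exp(-y)                      # exponential limit case
    else:
        base = 1.0 + xi * y
        tail = base ** (-1.0 / xi) if base > 0 else 0.0
    return p_low * tail
```

Here p_low would come from the logistic model at the lower threshold, so the covariate effects enter the extrapolated probability through that factor.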

    Tables 3.5, 3.6 and  3.7 show the results of the best model, after repeating the entire
process of variable selection, for each of the three stations.
         Table 3.5: Station P, threshold 99.9 (NLLH=220.315)

Variable             Estimate            Stand. error        t Ratio

CONST                -28.99              3.684               -7.87
YEAR                 -0.08892            0.0432              -2.06
CDAY                 -1.572              0.719               -2.19
SDAY                 -0.2333             0.317               -0.74
PDAY                 1.192               0.3201              3.72
T                    0.3058              0.02899             10.55
TD                   0.1805              0.09277             1.95
Q                    -0.6315             0.2178              -2.90
WIND.U               0.06596             0.04931             1.34
WIND.V               0.07411             0.05648             1.31
VIS                  -0.05836            0.02333             -2.50
T.WSPD               -0.02171            0.003469            -6.26
         Table 3.6: Station Q, threshold 99.9 (NLLH=205.190)

Variable             Estimate            Stand. error        t Ratio

CONST                -34.84              4.917               -7.08
YEAR                 -0.01963            0.04365             -0.45
CDAY                 -1.717              0.8759              -1.96
SDAY                 -0.4778             0.3835              -1.25
PDAY                 1.178               0.3648              3.23
T                    0.3056              0.0327              9.35
TD                   0.2692              0.1179              2.28
Q                    -0.7543             0.2675              -2.82
WIND.U               0.06855             0.05627             1.22
WIND.V               0.1563              0.063               2.48
T.WSPD               -0.02176            0.003745            -5.81
                                      12

-------
             Table 3.7: Station R, threshold 99.9 (NLLH=106.066)

Variable             Estimate             Stand. error         t Ratio

CONST                -68.16               13.8                 -4.94
YEAR                 -0.184               0.1278               -1.44
T                    0.8645               0.1639               5.27
RH                   0.2951               0.07919              3.73
Q                    -1.344               0.3067               -4.38
WIND.U               0.08165              0.07059              1.16
WIND.V               -0.1845              0.07138              -2.58
VIS                  -0.09188             0.03728              -2.46
T.WSPD               -0.01191             0.00487              -2.45

    If the variable YEAR is dropped, then the NLLH rises to 222.289, 205.289 and 107.202
respectively for stations P, Q and R. This is still significant in the case of P (just!), but still
not significant for Q and also no longer significant for R. There is thus some suggestion
that it is harder to establish significant trends when the threshold is lowered.  We shall
return to this point later (Section 5).

    One more issue is whether  a linear trend is adequate — for instance, should we be
using a quadratic trend? This can be tested  by adding a new variable YEAR2, defined as
the square of YEAR, and repeating the above analysis.  In only one of the cases considered
so far was this found to be significant, namely for station P at threshold 119.9, but in view
of its  importance for later discussion, we give the full result here:

   Table 3.8: Station P including YEAR2, threshold 119.9 (NLLH=110.410)

Variable             Estimate             Stand. error         t Ratio

CONST              -41.45              5.274                -7.86
YEAR               .7496                .3505                2.14
YEAR2              -.0782              .02997               -2.61
CDAY               -3.07               1.576                -1.95
SDAY               -.4892              .6051                -.81
T                   .461                 .05928               7.78
Q                   -.2318              .06393               -3.63
WSPD               .468                 .1859                2.52
T.WSPD             -.03463              .006423              -5.39

-------
     Comparison with a Poisson process formulation

     Shively (1991) approached exactly the same problem, but using slightly different
meteorological variables and data from Houston instead of Chicago, from a Poisson process
viewpoint. This was apparently motivated by the procedure in Smith (1989), which itself
was motivated by the fact that much of the modern probabilistic theory of extremes (Lead-
better et al. 1983, Resnick 1987) uses point process formulations as the most natural way
of looking  at the exceedances of a high level.  Indeed, this approach unifies the  different
approaches towards the limit theorems of extreme value theory.
     A nonhomogeneous Poisson process on (0, ∞) with intensity function λ(t), t > 0,
is defined by the property that the probability of an "event" (in the present context, an
exceedance of the threshold) in a short time interval (t, t + δt) is of the form λ(t)δt + o(δt),
the probability of more than one such event being o(δt), and such events are independent
over different time intervals.  For such a process, the number of events observed in any
finite time interval, (t1, t2) say, has a Poisson distribution with mean

                                ∫_{t1}^{t2} λ(t) dt.

The intuitive reason why we might expect such a process to be relevant to high-level
exceedances of ozone stems from the fact that exceedances are "rare events" (at least,
they are if the threshold is high enough), and if we also assume that the independence
assumption is at least approximately satisfied, then a nonhomogeneous Poisson process is
the natural model which results.

    Suppose now we have such a process with intensity function λ(t; β) depending on an
unknown parameter vector β. Suppose we observe the process on an interval (0, T) and in
that time observe N events (a random number) at times T1, ..., TN. The likelihood function
to be maximized is given by

                 L = ∏_{i=1}^{N} λ(Ti; β) · exp{ -∫_0^T λ(t; β) dt }.              (3.3)

For a careful but non-measure-theoretic derivation of this, see Section 3.3 of Cox & Lewis
(1966).  A modern and mathematically rigorous treatment of inference from point processes
is the book by Karr (1986).

    Shively (1991) modeled the dependence on the covariates through a logarithmic link
function of the form

                              log λ(t) = Σ_j x_{tj} β_j,                          (3.4)

where x_{tj} is the value of the j'th covariate at time t.  Using a discrete approximation to
the integral in (3.3), he was then able to define a likelihood function for the parameters
β_j, j = 1, 2, ..., and so to obtain maximum likelihood estimates.
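This discretized likelihood is straightforward to program. The sketch below is illustrative only, with a made-up design matrix and coefficients rather than the paper's data: under a log-linear intensity recorded once per day, the negative log of (3.3) reduces to a sum of daily terms.

```python
import numpy as np

def nllh_poisson_process(beta, X, delta):
    """Negative log-likelihood of a nonhomogeneous Poisson process,
    discretized to days, with log lambda_i = sum_j x_ij * beta_j.
    X is (days x covariates); delta[i] counts exceedances on day i."""
    log_lam = X @ beta
    # -log L = sum_i exp(log lambda_i) - sum_i delta_i * log lambda_i
    return np.sum(np.exp(log_lam)) - np.sum(delta * log_lam)

# Toy illustration: five days, an intercept and one covariate
X = np.array([[1.0, 0.2], [1.0, 1.5], [1.0, 0.7], [1.0, 2.1], [1.0, 0.1]])
delta = np.array([0, 1, 0, 1, 0])
print(nllh_poisson_process(np.array([-2.0, 1.0]), X, delta))  # ≈ 2.699
```

Minimizing this function over β (for example by Newton's method) yields the maximum likelihood estimates.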


-------
     An alternative viewpoint on this approach, however, is to note that the nonhomo-
geneous Poisson process, with each of the covariates being defined on a daily basis, is
equivalent to assuming that the number of exceedances on day i, for each i within the
observed period, has a Poisson distribution with mean λ_i, which we may write, imitating
(3.4), in the form

                              log λ_i = Σ_j x_{ij} β_j.                           (3.5)

If δ_i denotes the observed number of exceedances on day i, then the likelihood is propor-
tional to

                              ∏_i λ_i^{δ_i} e^{-λ_i}.                             (3.6)

Of course, in reality δ_i will only be 0 or 1, since we are applying these ideas to the daily
maxima.
     Comparing the likelihood defined by equations (3.1) and (3.2) with that defined by
(3.5) and (3.6), the difference is seen to be the following: defining Δ_i = Σ_j x_{ij} β_j, the
component of the likelihood arising from day i in the binomial model is

                              e^{Δ_i δ_i} / (1 + e^{Δ_i}),

whereas that arising in the Poisson model is

                              e^{Δ_i δ_i} exp(-e^{Δ_i}).

Thus, going from the binomial model to the Poisson model amounts to replacing 1 + e^{Δ_i}
by exp(e^{Δ_i}).  This will make little difference if the p_i or λ_i are uniformly small, i.e. if we are
genuinely in the situation where we might expect the Poisson and binomial distributions
to be indistinguishable, but otherwise the two models could produce substantially different
results.
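The comparison above is easy to verify numerically. A minimal sketch (the function name and numerical values are illustrative, not taken from the paper):

```python
import numpy as np

def day_contrib(Delta, delta):
    """Per-day likelihood contributions given the linear predictor
    Delta_i = sum_j x_ij beta_j and the 0/1 exceedance indicator delta_i."""
    binom = np.exp(Delta * delta) / (1.0 + np.exp(Delta))     # logistic (binomial) model
    poiss = np.exp(Delta * delta) * np.exp(-np.exp(Delta))    # Poisson model
    return binom, poiss

# Rare-event day: the two contributions nearly coincide
print(day_contrib(-5.0, 0))   # both ≈ 0.993
# Day with weather conducive to an exceedance: they differ noticeably
print(day_contrib(1.0, 1))    # ≈ 0.731 (binomial) vs ≈ 0.179 (Poisson)
```

The second case illustrates why the two models can diverge: once exceedance probabilities are no longer small, 1 + e^Δ and exp(e^Δ) are far apart.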
    To make a comparison between the two kinds of model, each of the models in Tables
3.1-3.7 was refitted, both with and without the YEAR variable, under this alternative Pois-
son assumption. The results  are summarized in Table 3.9. Note  that the model selection
process was not repeated, though it is possible that the optimum  choice of variables under
the Poisson model will be different from that under the binomial model. The results do
not change any of our broad  conclusions about the significance of YEAR, but quite a few
of the individual parameter estimates and their levels of significance were changed under
the new model. Table 3.10 gives one instance of this, in which the complete results for the
model of Table 3.2 are repeated under the Poisson assumption.


-------
             Table 3.9: Neg. log. likelihoods under Poisson model

 Model of Table             NLLH with YEAR          NLLH without YEAR
 3.1                        122.979                 125.788
 3.2                        122.487                 125.181
 3.3                        105.784                 106.446
 3.4                        71.571                  74.118
 3.5                        248.234                 249.952
 3.6                        226.114                 226.276
 3.7                        133.627                 134.319
           Table 3.10:  Model of Table 3.2, under Poisson assumption

Variable            Estimate            Stand. error         t Ratio
CONST               -28.11              5.443                -5.17
YEAR                -0.1405             0.06103              -2.30
CDAY                -2.241              1.311                -1.71
SDAY                -0.4288             0.5317               -0.81
PDAY                0.4277              0.4212               1.01
T                   0.3157              0.06302              5.01
Q                   -0.1535             0.05165              -2.97
WSPD                0.2571              0.3325               0.77
T.WSPD              -0.02072            0.0118               -1.76
    These results suggest that the comparability between the binomial and Poisson models
is not as good as had been previously supposed.  The binomial model is preferred, both
on the reported NLLH values and because it more directly reflects the nature of the data,
being based on daily maxima so that by definition there is no more than one exceedance
per day.  The reason for the difference between the models would seem to be that the p_i
or λ_i are not uniformly small. There are some days when the weather will be conducive
to a high ozone level and for these days  the difference between the Poisson and binomial
distributions is important.

    For this reason, we do not pursue the Poisson model any further in the present paper,
though in order  to establish a firm framework for future studies of the same nature, it
would seem to be important to continue  to consider both approaches.

-------
     Analysis of network maxima

     As explained in Section 2, we have also considered model fits to a data file of network
maxima  constructed from a subset  of 16 ozone stations (April-October only), using a
median polish technique  to interpolate missing values (Bloomfield et al.  1993, Section
6.6).  The same analysis  was applied to these data,  using a threshold 119.9.  The best
model found is given in Table 3.11. Note that YEAR, YEAR2 and PDAY are all included
in this analysis  and all appear significant as judged by the standard errors.

                 Table 3.11: Network maxima, threshold 119.9

Variable             Estimate            Stand. error         t Ratio

CONST             -22.9               2.357               -9.72
YEAR              .3416               .1632               2.09
YEAR2             -.04268             .01395              -3.06
CDAY              -.8855              .7036               -1.26
SDAY              .5628               .2911               1.93
PDAY              .6829               .2736               2.50
T                  .2928               .02747              10.66
RH                 -.01799             .01058              -1.70
WIND.U            -.03688             .04011              -.92
WIND.V            -.1887              .0449               -4.20
VIS                 -.06959             .02065              -3.37
T.WSPD            -.01945             .002992             -6.50


    As a comparison of different models, Table 3.12 gives the NLLH values for six models
in which  PDAY is either present or absent, and there is no trend, a linear trend (YEAR)
or a quadratic trend (YEAR and YEAR2).

                Table 3.12:  Network maxima, threshold 119.9
                      Comparison of models using  NLLH

Trend/PDAY               No                        Yes

None                      319.299                  313.211
Linear                    309.023                  305.025
Quadratic                 302.984                  299.934

    Table 3.12 reinforces the conclusion that YEAR, YEAR2 and PDAY are all significant.
However, adding PDAY2 did not improve the fit.

    Our  results up to this point reinforce the overall conclusion that there has been a
downward trend in the rate of ozone  exceedances, after accounting for meteorological


-------
 effects, during the period of study. In the two cases (station P and the network maxima)
 where we found evidence of a quadratic trend, the fitted model supports a slight increase
 in ozone levels up to 1984 or 1985, followed by a decrease in the subsequent years.
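As a quick arithmetic check on this statement, the turning point of a fitted quadratic trend b1·YEAR + b2·YEAR² lies at YEAR = -b1/(2·b2). The sketch below applies this to the coefficients reported in Tables 3.8 and 3.11, assuming YEAR is coded 1, ..., 11 for 1981-1991 (an assumption; the coding is not restated in this section):

```python
def trend_peak(b_year, b_year2):
    """Vertex of the quadratic b_year*YEAR + b_year2*YEAR^2."""
    return -b_year / (2.0 * b_year2)

# Station P (Table 3.8) and network maxima (Table 3.11)
print(trend_peak(0.7496, -0.0782))    # ≈ 4.8: around 1985 under the assumed coding
print(trend_peak(0.3416, -0.04268))   # ≈ 4.0: around 1984
```

Both turning points land in 1984-85, consistent with the fitted increase up to 1984 or 1985 followed by a decrease.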

     Predictive assessment of model fit

     An important part  of any statistical modeling is to be able to assess the fit of the
 model adopted.  In the case of continuous data, there are a number of approaches, both
 informal ones such as graphical plots of residuals, and more formal procedures such as the
 Kolmogorov-Smirnov and Cramer-von Mises tests, which are widely used.  For inhomoge-
 neous binary data, such  as we have here, these approaches are not so easily applied.

     Nevertheless, there is an extensive literature on the subject of probability forecasting,
 well exemplified by the references Dawid (1982,  1984, 1986), DeGroot  & Fienberg (1983)
 and Seillier-Moiseiwitsch & Dawid (1993).  The usual paradigm is weather forecasting.
 Each day, a forecaster quotes the percentage probability  of rainfall, in  practice invariably
 expressed to the nearest 10%. After many days, we are  able to compare the sequence of
 forecasts with the sequence of binary  variables representing  the observation of whether or
 not it rained on each day. How should we assess how well the forecaster is performing?

     One criterion is that the forecaster should  be  well-calibrated.  This means that  ob-
 served frequencies should be close to forecast probabilities. For example, if we were to look
 at all the days on which the forecaster had quoted a 30% chance of rain, then the actual
 frequency of rain on those days should be close to 30%.

     However, being well-calibrated is  not enough to make a good forecaster. For example,
 a forecaster could be perfectly calibrated by making exactly the same forecast every day.
 If this forecast probability  matched  the true long-run probability of rainfall, then  the
 forecaster would indeed be well calibrated, but of course such a forecast would be entirely
 useless in practice.

     A second criterion is resolution (also called refinement). This is  a measure of how
 well the forecaster discriminates.  A forecaster who  always quoted the probability of rain
 as either 0 or 1 would have maximum resolution, while one  who always quoted the same
 probability would have the minimum.

     It is not  too  difficult to measure how  well a forecaster is calibrated.  Chi-squared
goodness of fit statistics can be constructed based on groupings of forecast values, and
 we shall give examples of this below.  The main technical difficulty associated with this
 method is the justification of an  asymptotic chi-squared distribution when the forecasts
 are sequential and possibly dependent on past data.  However, much is now known about
 this problem, cf. Seillier-Moiseiwitsch & Dawid (1993).

     Resolution is much  harder to measure. Dawid  (1982, 1986) argued that a  criterion
for a forecaster to be both well-calibrated and to have good  resolution should be that  her


-------
 forecasts remain well-calibrated when restricted to any subsequence of the data, subject
 to a selection rule that requires that the decision whether or not to include a particular
 day in the subsequence depend only on the information available to the forecaster herself,
 which in most cases means the data available from previous days.  DeGroot & Fienberg (1983)
 defined rules for comparing two forecasters in terms of whether one was more refined than
 the other. It is not so easy, however, to construct specific tests based on these concepts.

     An older idea is that of scoring rules.  A scoring rule is simply a measure or score
 of how well a forecaster is performing.  There are a number of such rules known, but
 the best known are the Brier scoring rule Σ_i (a_i - p_i)² and the logarithmic scoring rule
 Σ_i {-a_i log p_i - (1 - a_i) log(1 - p_i)}, where a_i is the observed value (0 or 1) on day i.  These
 are both proper scoring rules, which have the property that if there is such a thing as a
 "true model" generating the data, then the minimum expected score is achieved by quoting
 the probabilities given by that model.
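Both rules are one-liners to compute. A minimal sketch with toy forecasts (not data from the paper):

```python
import numpy as np

def brier_score(p, a):
    """Brier score: sum_i (a_i - p_i)^2."""
    return np.sum((a - p) ** 2)

def log_score(p, a):
    """Logarithmic score: sum_i {-a_i log p_i - (1 - a_i) log(1 - p_i)};
    numerically delicate when forecasts sit very close to 0 or 1."""
    return -np.sum(a * np.log(p) + (1 - a) * np.log(1 - p))

p = np.array([0.1, 0.8, 0.3])   # forecast probabilities
a = np.array([0, 1, 1])         # observed 0/1 outcomes
print(brier_score(p, a))        # 0.01 + 0.04 + 0.49 = 0.54
print(log_score(p, a))          # ≈ 1.532
```

Lower scores are better under either rule; the logarithmic rule's sensitivity to forecasts near 0 or 1 is the difficulty noted later in this section.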

     In the context of the present study,  prediction of ozone exceedances is not one  of
 our primary  aims.  We are much more concerned with  long-term issues such as whether
 there is a trend in the data or whether a particular year such as 1988 represents unusual
 conditions when judged against long-term climatology. Nevertheless, the models that have
 been used in this section are very much  of a predictive nature, since  they allow us to
 quote  a "probability of exceedance" based on current weather conditions and (if PDAY
 is included) past ozone exceedances.  Therefore, we  can use ideas from the probability
 forecasting literature to assess their goodness of fit. In the models  with PDAY,  this is a
 genuinely sequential problem, so we do need the recent theory of sequential tests developed
 by Dawid (1984)  and Seillier-Moiseiwitsch & Dawid (1993).

     There is, however, one further complication in  applying these ideas here. The litera-
 ture we have described is  concerned with an "honest" forecasting procedure in which the
 forecast for day i depends only on data observed preceding day i.  This is not the case if
 we are considering a parametric model whose parameters have been estimated from the
 whole  data set. The issue is similar to the familiar one in goodness of fit  testing, that
 tests constructed on the assumption that the model is known are not valid without some
 adjustment if the model depends on parameters which are estimated from the data.
     One version  of this problem has been considered by Seillier-Moiseiwitsch (1993) for
the case of a linear logistic model. She considers a procedure in which, before making a
forecast for day i, the parameters of the model are re-estimated based on all the data up to
time i - 1. Under this set-up, the asymptotic properties of the procedure are identical to
those of a sequential forecasting scheme as in Seillier-Moiseiwitsch & Dawid (1993).  The
 difference between the two papers is that the analysis  of Seillier-Moiseiwitsch & Dawid
(1993) is based on the null hypothesis that the sequential forecasting scheme is indeed
the exact model  that generated the data, whereas Seillier-Moiseiwitsch (1993)  assumes
that the parametric model is correct and makes explicit allowance for the fact that the
parameters at each stage are estimated.


-------
     In the present context, it is not practicable to update the model after every
 observation, but we have employed a variant which seems to achieve the same effect: for each
 year, the model was refitted to the data omitting that year and then used to produce prob-
 ability forecasts for the year. All the resulting forecasts were then combined to assess how
 well they performed on the observed data on threshold exceedances. Thus the analysis is
 honest in the sense that no forecast probability of exceedance depends on the exceedance
 itself,  or on nearby correlated values.
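The leave-one-year-out scheme just described can be sketched as below; `fit` and `predict` are stand-ins for the actual logistic-model routines, and the data are toy values rather than the paper's:

```python
import numpy as np

def honest_forecasts(X, y, years, fit, predict):
    """For each year, refit the model on all other years, then forecast
    that year's exceedance probabilities from the held-out fit."""
    p = np.empty(len(y), dtype=float)
    for yr in np.unique(years):
        hold = years == yr
        params = fit(X[~hold], y[~hold])
        p[hold] = predict(params, X[hold])
    return p

# Toy illustration with a constant-rate "model"
X = np.zeros((6, 1))
y = np.array([1, 0, 0, 1, 1, 0])
years = np.array([1981, 1981, 1982, 1982, 1983, 1983])
fit = lambda X, y: y.mean()                          # "estimate" = overall rate
predict = lambda params, X: np.full(len(X), params)  # constant forecast
probs = honest_forecasts(X, y, years, fit, predict)
print(probs)
```

No forecast depends on the year being forecast, which is the "honesty" property required by the calibration theory.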

     As an example of these ideas, Table 3.13 is concerned with the calibration of proba-
 bility forecasts produced by the model of Table 3.1. This table is very similar to Table 1 of
 Seillier-Moiseiwitsch & Dawid (1993). Each row of the table represents a specific interval of
 forecast probabilities, denoted (p_min, p_max].  The frequency n denotes the number of days
 for which the forecast lay in that interval, and r is the observed number of exceedances
 based on forecasts within the interval.  The next value e is the expected number of ex-
 ceedances Σ_i p_i I(p_min < p_i ≤ p_max), and w is its variance Σ_i p_i(1 - p_i) I(p_min < p_i ≤ p_max),
 under the assumption that the p_i do indeed represent true forecast probabilities.
 The final column gives the test statistic z = (r - e)/√w.  In large samples, these will have
 approximately standard normal distributions, independent for non-overlapping probability
 intervals. The main difference from Seillier-Moiseiwitsch & Dawid (1993) is that we have
 used unequal probability intervals  to reflect the fact that the forecast probabilities are
 heavily weighted towards 0. Finally, the last row of Table 3.13 gives an overall assessment
 of calibration by combining the intervals together.
                 Table 3.13:  Calibration of probability forecasts
            Station P, threshold 119.9, linear trend without PDAY

Pmin         Pmax         n           r          e           w           z

0.000        0.050        3210        9          8.602       8.406       0.137
0.050        0.100        76          2          5.508       5.095       -1.554
0.100        0.300        72          17         11.182      9.302       1.908
0.300        0.500        20          9          8.042       4.723       0.441
0.500        1.000        14          4          9.084       3.091       -2.892

0.000        1.000        3392        41         42.418      30.617      -0.256
    Considering the z values in the last column, it can be seen that there is some indication
of significant values in the third and fifth rows. Computing Σ z² based on the first 5 rows,
we obtain 14.63, which is significant against the χ²_5 null distribution (5% point 11.07, 1%
point 15.09). Thus there is some indication that the model is not well calibrated.
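The interval statistic z = (r - e)/√w used in Table 3.13 can be sketched as follows (toy forecasts, not the paper's data):

```python
import numpy as np

def calibration_z(p, a, p_min, p_max):
    """Calibration check restricted to forecasts in (p_min, p_max]:
    r = observed exceedances, e = sum of p_i, w = sum of p_i(1 - p_i),
    and z = (r - e)/sqrt(w), approximately N(0, 1) in large samples."""
    mask = (p > p_min) & (p <= p_max)
    r = a[mask].sum()
    e = p[mask].sum()
    w = (p[mask] * (1.0 - p[mask])).sum()
    return r, e, w, (r - e) / np.sqrt(w)

# Toy illustration: five forecasts, checking the (0.1, 0.5] band
p = np.array([0.02, 0.04, 0.20, 0.40, 0.60])
a = np.array([0, 0, 1, 0, 1])
print(calibration_z(p, a, 0.1, 0.5))   # r=1, e=0.6, w=0.4, z ≈ 0.632
```

Summing z² across non-overlapping bands gives the chi-squared statistic used in the text.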

    The same calculation has been repeated for the models with no trend, a quadratic
trend, and a linear trend including PDAY. The overall pattern of results was similar to
Table 3.13 in each case, but the values of Σ z² were respectively 3.35, 14.27 and 7.62.

-------
 Thus it seems, contrary to our earlier evidence, that the model with no trend is the best
 calibrated amongst these four, but also that the model with linear trend and PDAY passes
 the test.  For the moment we treat these results cautiously since it is not at all clear how
 sensitive a test this is (for example, we do not know how sensitive it is to the grouping
 intervals). The other values for the overall z value (last row) are 0.137, -0.090 and -0.245,
 none of them remotely significant.

     A second way of classifying the data is by year. Table 3.14 shows a table constructed
 similarly to  Table 3.13 but where the rows represent individual years of data.  Recall that
 each year's entry is based on a model fit excluding that year's data. The z values suggest
 that the model significantly  underpredicted in 1988, and (surprisingly) overpredicted in
 1991. The other clear feature of the z values is that they go from negative to positive and
 back to negative - it was in fact this feature which induced us  to consider the quadratic
 trend.
             Table 3.14: Calibration of probability forecasts by year
             Station P, threshold 119.9, linear trend without PDAY

Year         n            r            e           w            z

1981         268          2            2.775       2.022        -0.545
1982         280          1            3.274       2.754        -1.370
1983         307          7            10.209      6.164        -1.293
1984         329          4            2.507       2.320        0.980
1985         324          3            1.648       1.553        1.085
1986         324          2            1.690       1.588        0.246
1987         323          6            3.130       2.791        1.718
1988         328          14           8.466       5.293        2.406
1989         331          1            2.726       2.095        -1.193
1990         317          0            0.677       0.653        -0.838
1991         261          1            5.316       3.384        -2.346
    Tables 3.15-3.17 show the corresponding results for no trend and for the quadratic
trend without PDAY, and for the linear trend with PDAY. For no trend (Table 3.15), the
individual z values again do not look too bad, except for 1991, but there is a clear trend.
Except for 1982, the z values  are all positive at the beginning and negative at the end,
and this reinforces the fact that we really do need a model  with trend to fit these data
adequately.  For the quadratic trend (Table 3.16), there is no evident trend and the only
apparently significant z value is the first (1981),  though since this is based on a very small
expected number of exceedances  it should probably not be taken too seriously.  Finally,
the linear trend with PDAY (Table 3.17) shows similar results to Table 3.14 but generally
a slightly better fit - in particular, the z values for 1988 and 1991, though still the largest
two in absolute value, are reduced compared with Table 3.14.

-------
            Table 3.15: Calibration of probability forecasts by year
                Station P, threshold 119.9, no trend, no PDAY

Year         n            r            e           w            z

1981         268          2            1.400       1.199        0.548
1982         280          1            1.664       1.531        -0.537
1983         307          7            6.168       4.447        0.395
1984         329          4            1.941       1.832        1.521
1985         324          3            1.483       1.410        1.278
1986         324          2            1.766       1.664        0.181
1987         323          6            3.851       3.375        1.170
1988         328          14           9.974       6.258        1.609
1989         331          1            3.821       2.812        -1.682
1990         317          0            1.292       1.219        -1.170
1991         261          1            6.891       4.226        -2.866
            Table 3.16: Calibration of probability forecasts by year
          Station P, threshold 119.9, quadratic trend without PDAY

Year         n            r            e           w            z

1981         268          2            0.509       0.473        2.168
1982         280          1            1.842       1.635        -0.659
1983         307          7            10.833      6.054        -1.558
1984         329          4            3.390       3.050        0.349
1985         324          3            2.727       2.469        0.174
1986         324          2            3.020       2.697        -0.621
1987         323          6            4.862       4.047        0.566
1988         328          14           10.168      5.760        1.597
1989         331          1            2.573       1.833        -1.162
1990         317          0            0.287       0.282        -0.541
1991         261          1            1.276       1.062        -0.268

-------
             Table 3.17: Calibration of probability forecasts by year
               Station P, threshold 119.9, linear trend with PDAY

Year         n            r            e           w            z

1981         268          2            2.514       1.932        -0.370
1982         280          1            3.088       2.664        -1.279
1983         307          7            10.079      5.494        -1.314
1984         329          4            2.653       2.390        0.871
1985         324          3            1.903       1.734        0.833
1986         324          2            1.604       1.519        0.321
1987         323          6            3.085       2.755        1.756
1988         328          14           9.589       5.481        1.884
1989         331          1            2.398       1.971        -0.996
1990         317          0            0.632       0.613        -0.807
1991         261          1            4.792       3.180        -2.127
    Finally, we consider the Brier scores.  For the four models of Tables 3.14-3.17, the
Brier scores are 33.95 (0.97), 34.63 (1.00), 31.50 (0.55) and 34.36 (1.15). Here the figures
in parentheses are z values for the Brier scores, corresponding to the quantity called Y^  in
Seillier-Moiseiwitsch & Dawid (1993), page 359. None of these values are significant, but
on the other hand this is just another measure of calibration.  Of more interest than the
z value is the Brier score itself, and we note again that the model with no trend appears
to do better than either of the models with a linear trend, but from this point of view the
quadratic trend model really does perform the best of the four.

    Now let us consider the same analysis based on network maxima. For the best model
we found (quadratic trend including  PDAY),  the  calibration results are summarized in
Tables 3.18 and 3.19.  In Table 3.18, the forecast probabilities have been classified into
a greater  number of intervals to reflect the greater number of  exceedances. The only z
value to look remotely significant is that for the first row,  but  the overall value of ^z2
(9.51) is clearly not significant against its nominal χ²_10 distribution.  Table 3.19 is a little
more disturbing since the model has  clearly overpredicted  the exceedances for  1989 and
greatly underpredicted for 1991, though for the most important years, 1983  and 1988,
the agreement  of observed  and predicted  numbers of exceedances  is remarkably good.
The  underprediction for 1991 should make us cautious about extrapolations  based  on
the quadratic trend, especially when contrasted with the overprediction for 1991 that  we
observed in Table 3.14 when using a linear  trend for station P.

-------
                 Table 3.18: Calibration of probability forecasts
         Network maxima, threshold 119.9, quadratic trend with PDAY

Pmin         Pmax         n          r         e           w           z

0.000        0.025        1660       13        7.724       7.621       1.911
0.025        0.050        182        10        6.574       6.327       1.362
0.050        0.075        102        3         6.282       5.890       -1.352
0.075        0.100        65         5         5.640       5.147       -0.282
0.100        0.200        132        18        18.949      16.131      -0.236
0.200        0.300        70         21        17.692      13.161      0.912
0.300        0.400        45         15        15.832      10.227      -0.260
0.400        0.500        28         11        12.629      6.913       -0.620
0.500        0.750        46         26        28.813      10.551      -0.866
0.750        1.000        24         21        20.889      2.614       0.069

0.000        1.000        2354       143       141.024     84.581      0.215
             Table 3.19: Calibration of probability forecasts by year
        Network maxima, threshold 119.9, quadratic trend with PDAY

Year         n          r         e           w           z

1981         214        8         11.119      7.642       -1.128
1982         214        11        11.129      7.847       -0.046
1983         214        26        28.326      12.358      -0.662
1984         214        18        12.870      8.695       1.740
1985         214        13        8.398       6.597       1.792
1986         214        9         10.058      7.907       -0.376
1987         214        18        19.274      10.312      -0.397
1988         214        24        23.433      11.509      0.167
1989         214        3         11.909      7.750       -3.200
1990         214        3         2.207       2.026       0.557
1991         214        10        2.301       1.938       5.530
    This analysis was repeated for all six models in Table 3.12, and in all six cases the
basic calibration table was similar to Table 3.18, with no significant discrepancies. For the
analysis by year, the model with no trend showed a distinct pattern of z values similar
to that for station  P, with a significant underprediction of exceedances in 1984 and  1985
and a significant overprediction  in 1989 and 1991.  The model with  linear trend again
showed a pattern of negative z values at the beginning, then positive, and then negative
again, with the most significant z values being -2.823 (1981), 2.546 (1984), 2.886 (1985)

-------
and -3.192 (1989). These results are all for models including PDAY; the results without
PDAY showed identical patterns in each of the three cases, but generally more extreme z
values. Finally, the Brier scores and associated z values were:

            No trend, no PDAY                       94.82       (0.52)
            No trend, include PDAY                   92.54       (0.52)
            Linear trend, no PDAY                    93.28       (0.40)
            Linear trend, include PDAY               91.63       (0.40)
            Quadratic trend, no PDAY                 90.81       (0.75)
            Quadratic trend, include PDAY            89.67       (0.86)

     These results confirm the pattern that  the model improves with increasing order of
trend, and is better including PDAY than omitting it.

     The logarithmic scoring rule has also been considered, but does not seem to produce
satisfactory results.  Seillier-Moiseiwitsch & Dawid (1993)  comment on the difficulty of
applying this rule when many of the forecasts are near 0 or 1. Our situation is better than
theirs, since we are not obliged to deal with probabilities rounded to the nearest 0.1, but
nevertheless we suspect that the results are highly sensitive to a few forecast probabilities
very close to 0 or 1.

     In Figure 3.1, we have shown the observed numbers of exceedances for station P
together with the expected exceedances under each of the models with no trend, linear
trend and quadratic trend. Figure 3.2 shows the same thing for the network maxima. To
avoid the picture becoming too cluttered, we have shown only the models without PDAY
in the case of station P, and only the models with PDAY in the case of the network maxima.
    4. Comparison with the nonlinear regression approach

    A natural question is how the preceding analysis ties in with the nonlinear regression
approach of Bloomfield et al. (1993). The purpose of this section is to investigate that.

    Bloomfield et al.  (1993) constructed weighted averages of ozone over a set of the
monitoring stations, and  then fitted a nonlinear regression model.  The model used in this
section does not correspond exactly to theirs: we have used a slightly different
parametrization to improve numerical stability, and have not included all the variables they did. For
example, we have not included any lagged meteorological variables in our approach, and
with some of the ones  that are common to the two models, our definition differs slightly
from theirs, for  example, by being based on noon values rather than daily averages or
maxima. An additional  feature is that, in view of earlier results, we have included the
possibility of a quadratic  trend (not just linear). The model as we consider it is as follows:

-------
$$
\mathrm{O3} = \left[\, m_0 + m_1\,\mathrm{WSPD}
  + \frac{t_1(\mathrm{T}-60) + t_2(\mathrm{T}-60)^2/1000 + t_3(\mathrm{T}-60)^3/1000}
         {1 + v_h\,\mathrm{WSPD}/100} \right]
$$
$$
\times \exp\left\{ \frac{r(\mathrm{RH}-50)}{1000}
  + \frac{o_p(\mathrm{OPCOV}-50)}{1000}
  + \frac{v(\mathrm{VIS}-50)}{1000}
  + \frac{m_u\,\mathrm{WIND.U} + m_v\,\mathrm{WIND.V}}{1000}
  + \frac{y_1(\mathrm{YEAR}-1985)}{10}
  + \frac{y_2(\mathrm{YEAR}-1985)^2}{1000} \right\}
$$
$$
+\, a_1\cos\!\left(\frac{2\pi\,\mathrm{DAY}}{365.25}\right)
 + b_1\sin\!\left(\frac{2\pi\,\mathrm{DAY}}{365.25}\right)
 + a_2\cos\!\left(\frac{4\pi\,\mathrm{DAY}}{365.25}\right)
 + b_2\sin\!\left(\frac{4\pi\,\mathrm{DAY}}{365.25}\right)
 + \text{random error}.
\tag{4.1}
$$
    Here we are following the convention of letting lower-case italic letters denote param-
eters, while upper case Roman letters are the covariates, defined as in Section 2. Although
Bloomfield et al. proposed this model for weighted averages of ozone, it seems reasonable
that it would apply to ozone from a single station as well. The main feature of the model is
the complicated interaction term between temperature and windspeed, intended to reflect
the facts that (a) ozone is a nonlinear function of temperature, the slope increasing as tem-
perature rises, (b) high winds tend to nullify the effect of temperature. In addition, there
are multiplicative terms reflecting the influence of the other meteorological variables, and
additive terms reflecting seasonal  effects. This form of model was determined by Bloom-
field et al. after extensive initial examination of the data, and subsequently confirmed to
be a suitable fit.

    For this section we consider the network maxima first, and fit the model (4.1) to the
data set by nonlinear least squares, assuming the random errors to be independent normal
with mean 0 and variance σ². Also, initially we assume just a linear trend (y2 = 0). This
produces the parameter estimates shown in Table 4.1. In this and subsequent model fits,
we quote the logarithm of the scale parameter, log σ, in place of σ, as this has again been
found to improve numerical stability. The results are comparable to Table 9 of Bloomfield
et al. (1993): although using slightly different covariates from theirs, there is not much
difference in the overall R² (.64 as against their .67).

-------
                 Table 4.1:  Nonlinear regression model of (4.1),
                  Normal errors, fitted to all network maxima

Parameter           Estimate            Stand. error         t Ratio

m0                  56.38               2.888                19.52
m1                  13.28               1.6                  8.30
t1                  2.003               0.154                13.00
t2                  63.31               6.141                10.31
t3                  -0.1989             0.221                -0.90
vh                  21.87               2.239                9.76
r                   -2.933              0.4282               -6.85
op                  -0.8051             0.171                -4.71
v                   -9.19               0.8704               -10.56
mu                  -3.268              1.45                 -2.25
mv                  -3.013              1.519                -1.98
y1                  -0.1317             0.01566              -8.41
a1                  -5.53               2.289                -2.42
b1                  6.715               0.9121               7.36
a2                  -1.443              1.115                -1.29
b2                  0.3204              0.9041               0.35
log σ               2.788               0.01471              189.50
     A natural question to ask is to what extent this model adequately represents the
probability of crossing a high threshold. To study this, first let us rewrite equation (4.1)
in the form
$$ \mathrm{O3}(i) = f(x_i; \beta) + Z_i \tag{4.2} $$
where O3(i) represents observed daily maximum ozone on day i, x_i is the vector of covari-
ates on day i, β is the vector of unknown parameters in (4.1), and the Z_i are assumed
independent N(0, σ²). Then the probability p_i of exceeding a high threshold u, on day i,
is given by
$$ p_i = 1 - \Phi\!\left(\frac{u - f(x_i;\beta)}{\sigma}\right) \tag{4.3} $$
where Φ is the standard normal distribution function. The likelihood function is defined
by (3.1), with p_i defined by (4.1)-(4.3). With u = 119.9 as before, on the basis of the
parameter values defined in Table 4.1, NLLH turns out to have the value 357.787, which
is completely uncompetitive with any of the values quoted for this data set in Table 3.12.
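The probability calculation of (4.3), and the binary likelihood it feeds into, can be sketched directly in code; the regression mean f_i below is a made-up value, and only the scale σ = exp(2.788) is taken from Table 4.1.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function Phi, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def exceedance_prob(f_i, sigma, u=119.9):
    """Eqn (4.3): p_i = 1 - Phi((u - f(x_i; beta)) / sigma), the
    probability that ozone on day i crosses the threshold u, given the
    regression mean f_i and error standard deviation sigma."""
    return 1.0 - norm_cdf((u - f_i) / sigma)

def binary_nllh(p, y):
    """Negative log likelihood of the binary exceedance indicators y_i
    under forecast probabilities p_i (the likelihood of (3.1))."""
    return -sum(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
                for pi, yi in zip(p, y))

# sigma from the log-sigma estimate in Table 4.1; f_i = 100 is made up
sigma = math.exp(2.788)
print(round(exceedance_prob(100.0, sigma), 2))  # → 0.11
```

Refitting the model, as done next, amounts to minimizing `binary_nllh` over the parameters entering f and σ, rather than plugging in the least squares estimates.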

    However, this should not be surprising.  We have chosen a high threshold corresponding
to less than 5% of the data set; it is only natural that the model should have to be refitted
to obtain good results at this threshold level. What we might expect is that the form of


-------
 the model will be an improvement on the linear dependence on covariates of the models
 in Section 3.

     With this in mind, we have re-estimated the parameters by maximizing the likelihood
 defined by (3.1) and (4.1)-(4.3), using just the binary information on exceedances over u.
 This leads to NLLH=308.152, which is competitive with, but still not as good as, the best
 of the fits given in Section 3. The parameter estimates under this refitted model are given
 in Table 4.2, and it can be seen that they are completely different from those in  Table 4.1.
 In effect, what we have here is a nonlinear probit model.
                 Table 4.2: Nonlinear regression model of (4.1),
                Normal errors, exceedances over threshold 119.9
                                  NLLH=308.152

Parameter           Estimate            Stand. error         t Ratio

m0                  103.6               42.69                2.43
m1                  6.163               7.5                  0.82
t1                  0.5609              1.417                0.40
t2                  16.7                45.93                0.36
t3                  0.005362            0.298                0.02
vh                  7.575               4.018                1.89
r                   -0.5859             1.735                -0.34
op                  -0.05895            0.2791               -0.21
v                   -2.738              7.706                -0.36
mu                  -1.497              5.116                -0.29
mv                  -8.071              22.63                -0.36
y1                  -0.0654             0.1829               -0.36
a1                  2.197               11.2                 0.20
b1                  3.186               8.811                0.36
a2                  2.33                7.283                0.32
b2                  0.7938              3.187                0.25
log σ               2.12                2.61                 0.81
    An alternative model widely used in binary data analysis is based on the logistic distri-
bution, i.e. replace Φ(·) in (4.3) by Λ(·), where Λ(x) = e^x/(1 + e^x). In fact this is precisely
the logit model of Section 3, but with a nonlinear instead of linear regression function. Op-
timizing this form of the likelihood produces a NLLH of 307.763, with parameter estimates
shown in Table 4.3.
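The swap from probit to logit amounts to one changed line; a minimal sketch, assuming the same threshold construction as in (4.3), with made-up inputs.

```python
import math

def logistic_cdf(x):
    """Lambda(x) = exp(x) / (1 + exp(x)), the logistic distribution
    function replacing Phi in (4.3)."""
    return 1.0 / (1.0 + math.exp(-x))

def exceedance_prob_logit(f_i, sigma, u=119.9):
    """Logit analogue of the probit exceedance probability:
    p_i = 1 - Lambda((u - f_i) / sigma)."""
    return 1.0 - logistic_cdf((u - f_i) / sigma)

# When the regression mean sits exactly on the threshold, p_i = 0.5
print(exceedance_prob_logit(119.9, 1.0))  # → 0.5
```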

-------
                 Table 4.3:  Nonlinear regression  model of (4.1),
                Logistic errors, exceedances over threshold 119.9
                                  NLLH=307.763

Parameter           Estimate             Stand. error          t Ratio

m0                  118.4               4.196                28.23
m1                  3.939               4.213                0.93

-------
 seemingly performing badly by every other criterion, is competitive from the point of
 view of Brier score, a result for which we have no ready explanation.
                 Table 4.4: Nonlinear regression model of (4.1),
                Logistic errors, exceedances over threshold 119.9
                    Include quadratic term: NLLH=301.613

Parameter           Estimate            Stand. error         t Ratio

m0                  118.8               2.86                 41.52
m1                  5.21                4.267                1.22
t1                  0.04127             0.1042               0.40
t2                  1.175               2.946                0.40
t3                  -0.01565            0.04044              -0.39
vh                  4.473               3.756                1.19
r                   -0.03539            0.09017              -0.39
op                  -0.002989           0.01427              -0.21
v                   -0.1584             0.3854               -0.41
mu                  -0.1022             0.2649               -0.39
mv                  -0.44               1.068                -0.41
y1                  -0.001986           0.005009             -0.40
y2                  -0.1007             0.2473               -0.41
a1                  -0.1613             1.014                -0.16
b1                  0.2046              0.5507               0.37
a2                  0.03245             0.3326               0.10
b2                  0.04814             0.2094               0.23
log σ               -1.361              2.414                -0.56


    For station P, the NLLH values for the nonlinear probit and logit models, correspond-
ing to Tables 4.2-4.4, were 110.407, 109.014 and 103.292. Since the model based on a
linear trend does not depend explicitly on past values (such as PDAY), the appropriate
comparison is with Table 3.1, where we had NLLH=114.553. Thus, the fit does appear to
be better under the nonlinear models. Under the quadratic fit, the model improves even
more. Note, however, that this is at the cost of introducing more parameters into the
model. In the present case it appears possible to remove t1 and t3 without significantly
worsening the fit, but no other parameters.

    The predictive diagnostics again showed that the nonlinear model in which the pa-
rameters are not re-estimated significantly underpredicts the total number of exceedances:
e = 13.050 against r = 41, resulting in a highly significant z = 10.4, whereas the nonlinear
probit model had e = 48.375, which is not a significant discrepancy (z = -1.35). The
calibration by year for the probit model resulted in Table 4.5, which is to be compared
with Table 3.14 for the corresponding linear-logistic model:


-------
             Table 4.5: Calibration of probability forecasts by year
                 Station P, threshold 119.9, model of Table 4.2

Year          n             r             e             w             z

1981          268           2             5.071         2.681         -1.876
1982          280           1             2.366         2.002         -0.965
1983          307           7             9.245         4.868         -1.018
1984          329           4             2.293         1.997         1.208
1985          324           3             2.771         1.517         0.186
1986          324           2             2.523         1.959         -0.374
1987          323           6             4.732         3.358         0.692
1988          328           14            12.729        5.994         0.519
1989          331           1             2.332         1.977         -0.947
1990          317           0             0.894         0.873         -0.957
1991          261           1             3.420         2.763         -1.456
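The rows of these calibration tables can be reproduced from the daily forecast probabilities. A minimal sketch, assuming e = Σp_i and w = Σp_i(1−p_i) (the binomial mean and variance under the forecasts); using the 1981 totals quoted in Table 4.5 (r = 2, e = 5.071, w = 2.681), this assumption gives back the tabulated z exactly.

```python
import math

def calibration_row(p, y):
    """One row of the calibration-by-year tables: observed count r,
    expected count e = sum of p_i, variance w = sum of p_i(1 - p_i),
    and the standardized discrepancy z = (r - e) / sqrt(w)."""
    r = sum(y)
    e = sum(p)
    w = sum(pi * (1.0 - pi) for pi in p)
    return r, e, w, (r - e) / math.sqrt(w)

# Check of the z formula against the 1981 totals of Table 4.5
r, e, w = 2, 5.071, 2.681
print(round((r - e) / math.sqrt(w), 3))  # → -1.876
```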
    Looking at the z values in the last column, the overall pattern is similar to that of
Table 3.14, but there are fewer significant values and Σz² is smaller (11.8 as against 22.4
for Table 3.14). The Brier scores were 37.08 (nonlinear model without refitting), 31.53
(nonlinear probit model).

    Again, a logit  model was fitted with both linear and quadratic trends. For the logit-
linear case,  the overall fit has e =  46.686 (z  = —1.014), and  a  calibration by year as
follows:
             Table 4.6: Calibration of probability forecasts by year
                 Station P, threshold 119.9, model of Table 4.3

Year          n             r             e             w             z

1981          268           2             2.400         1.718         -0.305
1982          280           1             2.021         1.854         -0.750
1983          307           7             10.690        4.963         -1.656
1984          329           4             2.496         2.115         1.034
1985          324           3             1.641         1.525         1.101
1986          324           2             2.022         1.856         -0.016
1987          323           6             3.144         2.741         1.725
1988          328           14            11.499        7.507         0.913
1989          331           1             4.089         2.828         -1.837
1990          317           0             1.044         1.019         -1.034
1991          261           1             5.642         3.317         -2.549

-------
     In the case of the logit-quadratic model, we had e = 49.361 (z  = —1.507), and the
following table for calibration by year:
             Table 4.7: Calibration of probability forecasts by year
                 Station  P, threshold 119.9, model of Table 4.4

Year          n             r             e             w             z

1981          268           2             0.728         0.722         1.498
1982          280           1             1.466         1.401         -0.394
1983          307           7             10.345        5.099         -1.481
1984          329           4             3.002         2.444         0.639
1985          324           3             3.235         2.478         -0.150
1986          324           2             3.064         2.516         -0.671
1987          323           6             5.025         3.733         0.505
1988          328           14           15.899        7.076         -0.714
1989          331           1             3.158         1.978         -1.534
1990          317           0             0.901         0.888         -0.956
1991          261           1             2.539         2.456         -0.982
    The Brier scores for the two logit models were 36.49 (linear trend model) and 32.54
(quadratic trend model). In this case, the linear trend model seems to pass all the tests
except for the clear pattern of z values in Table 4.6, while the quadratic trend model seems
fine from every point of view.

    Figure 4.1 illustrates these results for the four main models we have applied to station
P. Thus, we show the actual  number of exceedances in each year (solid dot), and  the
predicted numbers under each of the models, in each case dropping the year being predicted
from the data so that the prediction is based on the model refitted to the remaining years.
The two logit models being fitted here omitted the terms t1 and t3. It can be seen that the
original least squares fit significantly underpredicts most of the exceedances, but the other
three fits all follow the  actual counts reasonably well, with the one based on the quadratic
model perhaps identifiable as the best.  Figure 4.2 shows the same thing for the network
maxima, with a less clear-cut distinction between the different fits but the quadratic model
still doing best if we exclude the one year (1991) for which it performs badly. However,
since 1991 is the most recent year in the data, we should be concerned about its failure to
predict correctly in that year.

    For station Q, we obtained NLLH=93.402 for the nonlinear probit model, which is
to be compared with 99.317 in Table 3.3, but were unable to obtain a fit under the logit
model. For station R, we have NLLH=66.184 (probit), 66.788 (logit) against 67.524 (Table
3.4). Thus the nonlinear model does  appear to improve the fit for station Q, but not for


-------
 station R. Because of the computational time required, we did not compute the predictive
 diagnostics for stations Q and R.

     We can now attempt some interpretation of these results. The failure of the direct
 attempt to apply (4.3), with the parameter estimates from the nonlinear least squares fit, is
 disappointing but was anticipated in view of two facts made perfectly clear by Bloomfield
 et al. (1993): σ is not constant, and the normal distribution does not fit the extreme
 residuals. One could,  in theory, extend the Bloomfield et al. analysis to model both of
 these features, but then the model runs the danger of becoming seriously overparametrized.
 At any rate, Bloomfield et al.  themselves made no attempt to do this.

     The nonlinear probit and logit models might be considered a more reasonable attempt
 to profit from the work of Bloomfield et al., using it to establish the appropriate form of a
 binary data regression model.  The results are mixed: for stations P and Q the evidence is
 that the nonlinear model really does improve things, whereas for station R and the network
 maxima it does not. The apparent improvement in the fit when we use a quadratic trend
 model, in the case of station P and the network maxima, should also be noted.

     One point we  have not emphasized, however, is that there are computational stability
 problems with these models  that do not appear to be present with the models of section 3.
 We pointed out already that we were unable to obtain a fit for the nonlinear logistic model
 with station Q, and the same difficulty arose in one of the instances of station P, when we
 attempted to refit the nonlinear logit model (with either linear or quadratic trend) with
 the year 1982 omitted.  For this reason, the 1982 projections for these two models in Tables
 4.6, 4.7 and Figure 4.1 are based on the full data set.
     Moreover, even in the cases when we were able to fit a model, it appears that small
 perturbations  of the model  fit resulted in substantial changes in the parameters. As an
 example of this, after obtaining Table 4.4, we tried rerunning the nonlinear optimization
 program with the previous parameter estimates as starting values. The reduction in NLLH
 when we did this  was  minimal (from 301.613 to 301.611) but the changes in  individual
 parameters were non-trivial. Figure 4.3 shows the ratios of all 18 parameters under the
 two fits, and it can be seen that several differ substantially from 1. However, the forecast
 probabilities for the two fits are very similar:  Figure 4.4 plots forecast probabilities of
 exceedance from one model  against those from the other.  The two sets are in very good
 agreement and in fact the maximum discrepancy is less than 0.002. Thus there appear to
 be instabilities  in the model  which do not affect the final conclusions. One consequence of
 this should be noted, however: although standard errors have been quoted in this section
just as they have in other sections of the paper, they should not be trusted! We might also
 add that the computational time required for these models was substantially larger than
for those of Section 3.

    It  is difficult to reach any definitive conclusions from these results, but in comparing
the models of this section with those of section 3, we can note that the models of this section
do indeed appear to be slightly better when exploited to their fullest extent, but there is


-------
not  a substantial difference, and the associated computational  and stability difficulties
might well lead one in practice to prefer the approach of Section  3.

     Of course, there remain many points of contact between the two approaches, especially
concerning the identification of suitable covariates — as will already be apparent from the
discussion in Sections 2 and 3, we have made extensive use of the results of Bloomfield et
al. in this respect. Where the comparison is less certain is in trying to use their functional
form directly to model threshold crossings.
     5.  Excesses over a threshold

     So far, the analysis has been concerned solely with the probability of crossing a high
threshold.  We now extend the analysis so that the excess, or the amount by which the
ozone level exceeds the threshold when it is crossed, is also modeled.  This is important
for a number of reasons, all connected with the desirability of calculating probabilities
of crossing levels higher than the ones  to which the methods of Sections 3 and 4 would
be applicable.  For example, even if there are still too many exceedances of the official
ozone standard of 120 ppb, it is possible that the excesses are getting smaller so that we
can say that, from the point  of view of its effect on human health, the ozone  situation is
improving. A further  reason for considering excesses is to allow us to perform comparisons
based on the maximum levels achieved in different years (Sections 6 and 7). The method
developed here provides a possible means of extrapolating to higher thresholds than could
be justified in a direct analysis. At the same time, it is important to be aware that an
extrapolation is involved in this, and it would be unwise to attempt to extrapolate too
far.  The discussion that follows will, it is hoped, provide some insight into how far it is
reasonable to extrapolate.

     To motivate the development of models for excesses over a threshold, consider first
the simple case in which the underlying observations, X_1, ..., X_n say, are independent with
common distribution function F. The conditional probability that Y_i = X_i - u ≤ y, given
Y_i > 0, is represented by
$$ F_u(y) = \frac{F(u+y) - F(u)}{1 - F(u)}, \qquad y > 0. \tag{5.1} $$
The question is what parametric form would be appropriate for F_u. In this connection,
Pickands (1975) introduced the generalized Pareto distribution (henceforth GPD), whose
distribution function is given by
$$ G(y;\sigma,\xi) = 1 - \left(1 + \frac{\xi y}{\sigma}\right)_+^{-1/\xi}, \qquad y > 0, \tag{5.2} $$
where σ > 0, ξ is any real number, and x_+ = max(x, 0). Thus the range of y is 0 < y < ∞
for ξ ≥ 0 and 0 < y < -σ/ξ if ξ < 0. The exponential distribution, 1 - e^{-y/σ}, appears as
a limiting case when ξ → 0. This is of historical importance, because most early attempts


-------
 at this kind of analysis (e.g. in connection with Dutch sea level modeling after 1953) were
 based on the exponential distribution.
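The GPD of (5.2), together with its exponential limit, translates directly into code; a minimal sketch, not code from the paper.

```python
import math

def gpd_cdf(y, sigma, xi):
    """Generalized Pareto distribution function of (5.2):
    G(y; sigma, xi) = 1 - (1 + xi*y/sigma)_+^(-1/xi) for y > 0,
    with the exponential cdf 1 - exp(-y/sigma) as the xi -> 0 limit."""
    if abs(xi) < 1e-12:
        return 1.0 - math.exp(-y / sigma)
    t = 1.0 + xi * y / sigma
    if t <= 0.0:          # beyond the upper endpoint -sigma/xi (xi < 0)
        return 1.0
    return 1.0 - t ** (-1.0 / xi)

# xi = 0.5, sigma = 1: G(2) = 1 - (1 + 1)^(-2) = 0.75
print(gpd_cdf(2.0, 1.0, 0.5))  # → 0.75
```

Note that a negative shape ξ, as estimated repeatedly below, implies a finite upper endpoint −σ/ξ for the excesses.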

     The theoretical justification for the GPD is based on a theorem proved by Pickands
 (1975), which may be interpreted as saying that the GPD occurs as a limiting distribution
 of F_u(·), for high u, if and only if the distribution F is in the domain of attraction of
 one of the classical extreme value distributions described, for example, in the books of
 Gumbel (1958) or Leadbetter et al. (1983). Since the "domain of attraction" condition
 is ubiquitous in extreme value analysis, this implies that the GPD is a natural choice
 for excesses over high thresholds in any context where one might try to apply extreme
 value theory.  Estimation of the parameters σ and ξ can be carried out by the method of
 maximum likelihood, which has regular properties whenever ξ > -1/2, a condition which is
 almost always met in environmental applications.

     In the present context, of course, we want to extend this to include covariates in
 the analysis.  A general framework for this was laid out by Davison & Smith (1990).
 They considered regression models in which the excess (if there is one) on day i, say, is
 represented by the GPD G(·; σ_i, ξ_i), with σ_i and ξ_i depending on covariates. In practice it is
 usually adequate (and a lot simpler) to assume ξ constant, while the interpretation of σ_i as
 a scale parameter suggests naturally that a logarithmic link function would be appropriate.
 Thus we are led to consider models of the form
$$ \log \sigma_i = \sum_j \gamma_j x_{ij} \tag{5.3} $$
in terms of new coefficients {γ_j, j = 1, 2, ...}.  There is no reason why the significant
covariates should be the same as in the binary analysis of Section 3, so in general we would
expect to repeat the variable-selection procedure based on the excesses over the threshold.
For the time being, we assume the daily values are independent.
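A sketch of the resulting likelihood, assuming a constant shape ξ ≠ 0 and a log-linked scale; the covariate matrix and excesses below are invented for illustration.

```python
import math

def gpd_regression_nllh(gamma, X, Y, xi):
    """Negative log likelihood of excesses Y_i under the GPD of (5.2),
    with constant shape xi (nonzero here) and log-linked scale:
    log sigma_i = sum_j gamma_j * x_ij."""
    nllh = 0.0
    for x, y in zip(X, Y):
        log_sigma = sum(g * xj for g, xj in zip(gamma, x))
        sigma = math.exp(log_sigma)
        t = 1.0 + xi * y / sigma
        if t <= 0.0:
            return float("inf")   # excess outside the GPD support
        # minus log of the GPD density (1/sigma)(1 + xi*y/sigma)^(-1/xi - 1)
        nllh += log_sigma + (1.0 / xi + 1.0) * math.log(t)
    return nllh

# Invented covariates (intercept, trend) and excesses, for illustration
X = [[1.0, -2.0], [1.0, 0.0], [1.0, 2.0]]
Y = [5.0, 12.0, 3.0]
print(round(gpd_regression_nllh([2.5, -0.05], X, Y, -0.28), 3))
```

Fitting then amounts to minimizing this function numerically over the coefficients γ and the shape ξ.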

    A more general model, whose significance will become clear later on, is to extend the
GPD to include a power-law transformation of Y_i. That is, we assume Y_i^θ, for some θ > 0,
follows the GPD, so the distribution function of the excess Y_i is given by
$$ 1 - \left(1 + \frac{\xi y^{\theta}}{\sigma}\right)_+^{-1/\xi}, \qquad y > 0. \tag{5.4} $$
To give this a specific name, we call it the TGPD (transformed GPD).
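A direct transcription of (5.4); setting θ = 1 recovers the ordinary GPD.

```python
def tgpd_cdf(y, sigma, xi, theta):
    """Transformed GPD of (5.4): Y^theta is assumed to follow the GPD,
    so P(Y <= y) = 1 - (1 + xi * y**theta / sigma)_+^(-1/xi), y > 0."""
    t = 1.0 + xi * (y ** theta) / sigma
    if t <= 0.0:          # beyond the upper endpoint when xi < 0
        return 1.0
    return 1.0 - t ** (-1.0 / xi)

# theta = 1 is the plain GPD: 1 - (1 + 0.5 * 2.0)^(-2) = 0.75
print(tgpd_cdf(2.0, 1.0, 0.5, 1.0))  # → 0.75
```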

    We now consider the application of these models to the three stations we have been
considering, as  well as the network maxima.  In the case of the three stations P, Q and
R, it  was found more satisfactory  to use a lower threshold, so we present most of our
results with respect to the threshold 99.9 rather than 119.9. Again, we did not use 100
or 120 as a threshold  so as to avoid the difficulties caused by having observations exactly


-------
on the threshold. One reason for using the lower threshold was that there are many more
exceedances in the data set  -  94, 84 and 55 respectively for stations P, Q  and R, as
compared with 41, 38 and 23  for threshold 119.9 - and this improves our ability to fit and
compare different models. For the network maxima, there are plenty of exceedances of the
higher threshold - 143 in all - so we continued to use threshold 119.9 for that data set.

    We now present specific  results, starting with station P. All the  analyses have used
YEAR as a covariate, and a  process of variable selection, similar to  that carried out in
Section 3, showed that T, WSPD and T.WSPD were also significant variables. Here are
the results for the model defined by equations (5.2) and (5.3):
               Table 5.1: Station P: Excesses over threshold 99.9

Variable            Estimate            Stand. error         t ratio

CONST              -19.03              5.293                -3.60
YEAR               -.09418             .0331                -2.85
T                  .2631               .06151               4.28
WSPD               1.06                .3693                2.87
T.WSPD             -.03967             .01468               -2.70
ξ                  -.2846              .138                 -2.06
    Here NLLH=365.376. If YEAR is omitted, this rises to 368.904. Thus, the evidence of
a downward trend in the excesses, and not just in the probability of crossing the threshold,
seems clear from this.

    As an indication of the effect of £ and Q, with the other covariates remaining the same,
the following results were obtained:
                      Table 5.2: Station P, threshold 99.9
                    Comparison of different models for ξ and θ

ξ               Stand. err.      θ               Stand. err.     NLLH

—               —                —               —               367.016
-.285           .137             —               —               365.376
—               —                .930            .080            366.650
    No fit was obtained for the TGPD model including both parameters ξ and θ. With
this model, the iterative numerical routine moved into the region ξ < -1 for which no


-------
local maximum of the GPD likelihood exists (Smith 1985).  Although there are theoretical
procedures for obtaining model estimates in this situation (Smith 1993), past experience
with these kinds of models has suggested that the phenomenon is more likely to be due to
having too small a sample to fit the models adequately. Indeed, with the original threshold
of 119.9, this happened even with the GPD model (without the transformation parameter
0), and with several of the data sets.  Since a probability plot (Figure 5.1) suggests that
the GPD model fits perfectly well, for the rest of the analysis we  adopt that model and do
not attempt to resolve the difficulty concerning the TGPD model.

     To construct Figure 5.1, we defined a "residual" Y_i/σ̂_i with respect to the i'th excess
Y_i and the associated estimated scale parameter σ̂_i.  The residuals were then arranged in
order and plotted against expected order statistics under each of the four models.  In the
case of the TGPD model, although not finding a maximum likelihood fit, we did plot the
residuals corresponding to the best model found (with ξ ≈ -1). If the model fits the data,
then the residuals should be tightly clustered around a straight line of unit  slope through
the origin (also shown on the plots).  The  results show clear deviation from this in the
upper tail of the exponential and transformed exponential data sets, but a good fit for the
GPD. In the light of this, we believe that the GPD forms a good fit overall.
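The plotting positions can be sketched as follows for the exponential case, where the expected order statistics have the closed form E[Y_(i)] = Σ_{j=n−i+1}^n 1/j; for the other models one would substitute expected order statistics (or quantiles) of the fitted distribution instead.

```python
def expected_exponential_order_stats(n):
    """Expected order statistics of n iid unit-exponential variables:
    E[Y_(i)] = sum_{j=n-i+1}^{n} 1/j (Renyi's representation)."""
    out = []
    total = 0.0
    for i in range(1, n + 1):
        total += 1.0 / (n - i + 1)
        out.append(total)
    return out

def probability_plot_points(residuals, expected):
    """Pair sorted residuals Y_i / sigma_hat_i with expected order
    statistics; points near the line y = x indicate a good fit."""
    return list(zip(expected, sorted(residuals)))

print(expected_exponential_order_stats(3))
```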

     For the other stations, in all cases the same model seems to produce a good fit. Tables
5.3-5.5 show GPD model fits for stations Q, R and the network maxima, and Figures 5.2-
5.4 are the corresponding probability plots for residuals under each of the four models for
the marginal distribution.
               Table 5.3: Station Q: Excesses over threshold 99.9

Variable            Estimate            Stand. error         t ratio

CONST              -8.511              5.414                -1.57
YEAR               -.09546             .02657               -3.59
T                  .1502               .06346               2.37
WSPD               .1658               .3652                .45
T.WSPD             -.01252             .01424               -.88
ξ                  -.3603              .09193               -3.92

-------
               Table 5.4: Station R: Excesses over threshold 99.9

Variable            Estimate            Stand. error         t ratio

CONST              -6.911              3.654                -1.89
YEAR               -.2613              .1245                -2.10
T                  .1424               .04434               3.21
WSPD               .4443               .3097                1.44
T.WSPD             -.01896             .008808              -2.15
ξ                  -.4019              .1718                -2.34
          Table 5.5: Network maxima: Excesses over threshold 119.9

Variable            Estimate            Stand. error         t ratio

CONST              -.8527              1.685                -.51
YEAR               -.04                .03098               -1.29
T                  .05795              .02026               2.86
WSPD               -.1005              .1307                -.77
T.WSPD             -.004424            .005415              -.82
ξ                  -.2333              .08884               -2.63
    In all four cases, the GPD model was a significant improvement on the exponential
model, and better than the transformed exponential model. Results for the TGPD were
somewhat mixed: in the case of station Q, the TGPD model was fitted successfully, but
with a negligible improvement in log likelihood (0.003) over the GPD model. For station R,
we encountered the same difficulty as with station P, no fit being obtained. For the network
maxima, the TGPD was successfully fitted and  did provide a significant improvement in
log likelihood over the GPD (3.363). On the other hand, the probability plot in Figure 5.4
still shows an adequate fit for the GPD, and in view of the difficulties in fitting the TGPD
in the other cases, it was decided to adopt the GPD as the principal model for this  case
also. As a comparison, here is the detailed table of results for the TGPD model:

-------
          Table 5.6: Network maxima: Excesses over threshold 119.9
                                  TGPD model

Variable            Estimate            Stand. error         t ratio

CONST              .2615               1.226                .21
YEAR               -.002885            .05484               -.05
T                  .04926              .01385               3.56
WSPD               -.1785              .09756               -1.83
T.WSPD             -.004063            .004196              -.97
ξ                  -.5786              .2435                -2.38
θ                  .7222               .1252                5.77
    In three of the four cases, there appears to be a significant improvement as a result
of including YEAR in the model. Using the GPD model as the standard, the differences in
log likelihood when YEAR is omitted are 3.528 (station P), 5.107 (Q), 2.115 (R) and 0.860
(network maxima). Only the last of these is not clearly significant.  The result is especially
interesting for station Q, since in that case our earlier analyses based on exceedances of a
single threshold have cast doubt on whether the trend is significant.  The present analysis
shows that, for high enough ozone levels, there is a significant downward trend in that case
also.

    One additional comment  should be  made concerning the GPD. Earlier analyses of
ozone data, including those of Smith (1989), have suggested that the high-level exceedances
of ozone have an approximately exponential tail.  The present analysis contradicts that,
since in all four cases  the GPD is a significant improvement on the exponential tail. This
is clear from the plots in Figures 5.1-5.4, and from the differences in log likelihoods (1.640,
4.420, 2.157 and 2.829 respectively for stations P, Q, R and the network maxima). As a
comparison, we reran all the models without any covariates,  focussing just on the com-
parison of the exponential and GPD  models, obtaining log likelihood differences of 0.574,
1.041, 0.000 and 0.122.  None of these are significant, so we would accept an exponential
tail of the distribution if we ignored the covariates. Thus, our conclusion is that the form
of tail function adopted depends on whether covariates are included in the model, but
in the case of interest to us, where the covariates are included, the exponential is too long
a tail and the GPD with ξ < 0 is a significantly better fit.
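This comparison can be sketched numerically. The following Python fragment is illustrative only, not the code used in the study: the excess values and parameter values are invented, and in practice σ would depend on covariates and both models would be fitted by maximum likelihood before twice the drop in NLLH is referred to a χ²₁ distribution.

```python
import math

def gpd_logpdf(y, sigma, xi):
    """Log density of the generalized Pareto excess distribution,
    g(y) = (1/sigma)(1 + xi*y/sigma)^(-1/xi - 1); at xi = 0 this is
    the exponential density (1/sigma)exp(-y/sigma)."""
    if xi == 0.0:
        return -math.log(sigma) - y / sigma
    return -math.log(sigma) + (-1.0 / xi - 1.0) * math.log(1.0 + xi * y / sigma)

def nllh(excesses, sigma, xi):
    """Negative log likelihood of a sample of excesses over the threshold."""
    return -sum(gpd_logpdf(y, sigma, xi) for y in excesses)

# Invented excesses (ppb above the threshold); the exponential model is the
# xi = 0 special case, so the models are nested and the likelihood ratio
# statistic is 2 * (NLLH_exponential - NLLH_GPD).
excesses = [3.1, 8.0, 0.4, 15.2, 5.5]
lr = 2.0 * (nllh(excesses, 6.0, 0.0) - nllh(excesses, 6.0, -0.2))
```

Here both models are evaluated at fixed parameter values purely to show the bookkeeping; the log likelihood differences quoted above come from maximizing each model separately.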
                                        39

-------
     Predictive assessment of model fit

     Now that we have a model for both threshold crossing probabilities and the distri-
bution of excesses over a threshold, we can compute crossing probabilities and expected
numbers of exceedances with respect to any level larger than the original threshold. This
makes it possible to repeat the predictive diagnostics which were first used in Section 3,
with respect to higher ozone levels.

     As an example, here is what happens when the models of Tables 3.11 (quadratic trend
for exceedance probabilities)  and 5.5 (linear trend for the distribution of excesses), both
of which were calculated with respect to the threshold 119.9, are combined and used to
model exceedances of the level 139.9. At this level, the expected numbers of exceedances
are still large enough for standard χ² tests to be applicable.

     The overall calibration table (compare Table 3.13) is as  follows:


                 Table 5.7:  Calibration of probability forecasts
                Network maxima, level 139.9, models with trend
Interval            n       r        e         w        z

0.000-0.050      2123      10     8.004     7.813    0.714
0.050-0.100        84       5     6.108     5.647   -0.466
0.100-0.300       106      22    18.572    14.974    0.886
0.300-0.500        30      10    10.947     6.864   -0.362
0.500-1.000        11       8     6.663     2.537    0.839

0.000-1.000      2354      55    50.295    37.834    0.765
    Here, based on the first five rows, Σz² = 2.35, clearly not significant as a χ²₅ variable.
The final z value (0.765) represents the overall fit, while the Brier score is 40.42 with
a corresponding z value of 0.61. All of these point to the model as being fully
satisfactory in terms of its overall calibration.
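The quantities r, e, w and z in these calibration tables can be computed as in the sketch below (Python; the forecast probabilities and 0/1 outcomes are invented). Taking w to be the binomial variance Σ p(1 − p) is consistent with the tabulated values, with z = (r − e)/√w as in Seillier-Moiseiwitsch and Dawid (1993).

```python
import math

def calibration(forecast_probs, outcomes):
    """Calibration summary for one cell of the table: r observed
    exceedances, e expected exceedances, w the forecast variance
    sum p(1 - p), and z = (r - e)/sqrt(w)."""
    r = sum(outcomes)
    e = sum(forecast_probs)
    w = sum(p * (1.0 - p) for p in forecast_probs)
    return r, e, w, (r - e) / math.sqrt(w)

# Invented daily probabilities of exceeding the level, with 0/1 outcomes
probs = [0.02, 0.40, 0.15, 0.60, 0.05]
hits = [0, 1, 0, 1, 0]
r, e, w, z = calibration(probs, hits)
```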

    The calibrations by year, corresponding to Table 3.14,  are as follows:
                                         40

-------
             Table 5.8: Calibration of probability forecasts by year
                Network maxima, level 139.9, models with trend
Year         n       r        e         w        z

1981       214       4     4.346     3.430   -0.187
1982       214       4     3.873     3.273    0.070
1983       214      13    14.527     8.803   -0.515
1984       214       8     4.151     3.489    2.061
1985       214       3     2.482     2.191    0.350
1986       214       1     2.749     2.478   -1.111
1987       214       7     7.002     5.212   -0.001
1988       214      13     6.104     4.742    3.167
1989       214       0     3.357     2.707   -2.040
1990       214       0     0.532     0.513   -0.742
1991       214       2     1.172     0.996    0.830
    Here there do appear to be some discrepancies, especially in 1988, where the model
underpredicts the true rate of exceedance, and to a lesser extent in 1984 and 1989. We have
Σz² = 21.34, which is significant as a χ²₁₁ variable (5% point 19.68, 2.5% point 21.92),
though this is somewhat questionable as several of the expected values in the e column
are less than 5. If we group the years together as {1981, 1982}, 1983, {1984, 1985, 1986},
1987, 1988, {1989, 1990, 1991}, then we obtain the following table:
            Table 5.9: Calibration of forecasts by year (regrouped)
               Network maxima, level 139.9, models with trend

Year         n       r        e         w        z

1981-2     428       8     8.219     6.703   -0.085
1983       214      13    14.527     8.803   -0.515
1984-6     642      12     9.382     8.158    0.917
1987       214       7     7.002     5.212   -0.001
1988       214      13     6.104     4.742    3.167
1989-91    642       2     5.061     4.216   -1.491
Here Σz² = 13.36, which is significant against χ²₆ at the 5% level but not at the 2.5% level.
    On the other hand, if we fit the same models without any trend parameters (YEAR
and YEAR²) and repeat the same calculations, we obtain the following results. For the
overall calibration we have:
                                       41

-------
                 Table 5.10: Calibration of probability forecasts
              Network maxima, level 139.9,  models without trend
Interval            n       r        e         w        z

0.000-0.050      2098       8     8.562     8.378   -0.194
0.050-0.100       110      11     7.942     7.348    1.128
0.100-0.300       108      16    18.760    15.185   -0.708
0.300-0.500        28      15    10.739     6.545    1.666
0.500-1.000        10       5     6.137     2.311   -0.748

0.000-1.000      2354      55    52.141    39.767    0.453
    Here Σz² = 5.15, the Brier score is 41.47, and the z value corresponding to that is
0.38, none of them significant. The calibration by year is as follows:
            Table 5.11: Calibration of probability forecasts by year
               Network maxima, level 139.9, models without trend

Year          n            r             e             w             z
1981          214          4             3.006         2.498         0.629
1982          214          4             2.831         2.503         0.739
1983          214          13            10.134        6.618         1.114
1984          214          8             2.875         2.528         3.224
1985          214          3             1.895         1.721         0.842
1986          214          1             2.179         2.014         -0.831
1987          214          7             6.942         5.208         0.025
1988          214          13            8.522         6.368         1.775
1989          214          0             4.711         3.724         -2.441
1990          214          0             1.763         1.621         -1.385
1991          214          2             7.283         4.964         -2.371

    Here Σz² = 30.62 if the data are grouped as shown, or Σz² = 22.81 if the data are
grouped in the same way as in Table 5.9. Clearly, these values are much more highly signifi-
cant than for the fit with the YEAR and YEAR² covariates. Even though the fit for 1988 is
better, the overall calibration shows a clear downward pattern in the z values.

    Similar, but mixed, results were obtained for stations P, Q and R. In every case,
the predictive fit of the model becomes questionable at higher levels when assessed on a
"calibration by year" basis, and it is not always the case that the model including a trend
dominates over the one that does not. As a graphical comparison, Figures 5.5-5.8 show
                                       42

-------
actual versus predicted numbers of exceedances for several ozone levels. In each case the
models were based on those of Section 3 and the present section, with different assumptions
about  trend. In all cases, the predicted numbers follow the general pattern well, but the
results for stations P, R and the network maxima clearly fail to reflect the very high values
for 1988, while for station Q, 1988 was not a particularly extreme year.

    Some attempt has also been made to do the same thing using nonlinear models, as
in Section 4, for the probability of crossing the initial threshold, but this does not appear
to improve the results. The difficulty is that the nonlinear models  we have tried are still
based on a constant scale parameter,  so they cannot  reflect relative changes in crossing
probabilities as the level gets higher.
    Allowing for dependence

    The preceding analysis has allowed for serial dependence in a crude way, by including
the variable PDAY as a covariate in the probability of crossing the initial threshold. However,
this does not adequately reflect  the dependence that may occur at higher thresholds.  In
this section we outline an alternative approach based on a first-order Markov structure
incorporated into  a threshold analysis. A recent review paper by Smith (1993) contains
necessary background material.
    Suppose we observe data {X_1, ..., X_n} from a first-order discrete-time Markov chain,
i.e. the conditional distribution of each X_i given X_1, ..., X_{i-1} depends just on X_{i-1} through
a conditional density f(X_i | X_{i-1}); here the variables may be discrete or continuous, the
density referring to a transition probability in the discrete case or a conditional p.d.f. in
the continuous case. We also assume that the process is stationary and that its stationary
distribution is represented by a density f_1 say. Then the joint distribution of X_1, ..., X_n is

            L = f_1(X_1) ∏_{i=2}^{n} f(X_i | X_{i-1})                           (5.5)

and this may be used to define a likelihood for the data.

    A more convenient way to write (5.5) is in the form

            L = ∏_{i=2}^{n} f_2(X_{i-1}, X_i) / ∏_{i=2}^{n-1} f_1(X_i)          (5.6)

where f_2(·, ·) denotes the joint density of two consecutive values of the series. We write out
equations (5.5) and (5.6) merely to make explicit an obvious point: that for a stationary
first-order Markov chain, to define the joint likelihood of the data, it is necessary to specify
the first- and second-order marginal densities of the process.
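The equivalence of (5.5) and (5.6) is easily checked numerically. In the sketch below (Python), a two-state chain with an invented transition matrix stands in for the ozone process; f_2 is the joint law of two consecutive states.

```python
import math

pi = [0.6, 0.4]                     # stationary distribution
P = [[0.8, 0.2], [0.3, 0.7]]        # transition matrix; pi P = pi holds here

def log_f1(a):
    """Log of the stationary (first-order marginal) density f_1."""
    return math.log(pi[a])

def log_f2(a, b):
    """Log of the joint (second-order marginal) density f_2 of two
    consecutive states."""
    return math.log(pi[a] * P[a][b])

def loglik_55(x):
    """Form (5.5): first marginal times the transition densities."""
    return log_f1(x[0]) + sum(math.log(P[a][b]) for a, b in zip(x[:-1], x[1:]))

def loglik_56(x):
    """Form (5.6): pairwise joints divided by the interior marginals."""
    return (sum(log_f2(a, b) for a, b in zip(x[:-1], x[1:]))
            - sum(log_f1(v) for v in x[1:-1]))
```

The two functions agree on any state sequence, which is exactly the point made in the text: specifying f_1 and f_2 specifies the whole likelihood.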

                                         43

-------
    In the present context we are, in effect, working not with the original process {X_i} of
daily ozone maxima, but with a censored process {δ_i, Y_i}, where δ_i = 1 if X_i > u and
δ_i = 0 otherwise, and Y_i = X_i − u if δ_i = 1 and is undefined otherwise. To model this
within the framework of (5.6), we therefore have to think of the observed process as a
mixed discrete-continuous process. Suppose, on day i, the probability of exceeding the
threshold u is p_i, and the conditional distribution of the excess, given the threshold is
exceeded, is represented by a GPD G(y; σ_i, ξ) from (5.2), with density g = dG/dy. Then
the relevant density component f_1 is

            f_1 = 1 − p_i                     if δ_i = 0,
                  p_i g(Y_i; σ_i, ξ)          if δ_i = 1.                       (5.7)
To complete the specification of the problem, then, we need to define a joint density f_2.
To do this, we need to find a suitable form of the joint distribution function F_2(x_1, x_2) =
Pr{X_1 ≤ x_1, X_2 ≤ x_2}, which will be well-defined for x_1 > u, x_2 > u, and that ideally
will have the properties (a) the marginal distributions of the excesses have the GPD, (b)
the distribution will contain independence as a special case, (c) the model should also
incorporate some meaningful dependence among the extreme values. One model that has
all these properties, given as equation (3.5) in Smith's (1993) review paper, is defined by

    F_2(x_1, x_2) = exp(−[{−log(1 − p_1(1 + ξ(x_1 − u)/σ_1)^{−1/ξ})}^{1/α}
                        + {−log(1 − p_2(1 + ξ(x_2 − u)/σ_2)^{−1/ξ})}^{1/α}]^α)  (5.8)

valid in x_1 > u, x_2 > u. Here α is a parameter representing the strength of dependence
between the two components. The permissible range is 0 < α ≤ 1, where α = 1 represents
independence and the limit α ↓ 0 the total-dependence case in which the two components
are equal with probability 1. The justification of the parametric form of (5.8) is that it
represents a threshold version of the "logistic" dependence structure which has proved to
be a valuable contribution to  the literature on bivariate extreme  value  theory (see e.g.
Tawn,  1988).

    With F_2 defined by (5.8), the appropriate form of f_2 for insertion in (5.6) is

    f_2 = F_2(u, u)                                  if δ_1 = 0, δ_2 = 0,
          ∂F_2(u + Y_1, u)/∂Y_1                      if δ_1 = 1, δ_2 = 0,
          ∂F_2(u, u + Y_2)/∂Y_2                      if δ_1 = 0, δ_2 = 1,       (5.9)
          ∂²F_2(u + Y_1, u + Y_2)/∂Y_1 ∂Y_2          if δ_1 = 1, δ_2 = 1.


    We thus have a model whose components are

                                        44

-------
    • the probability of exceedance p_i, of the form (3.2) (with no PDAY, PDAY² covariates
in this case),

    • the GPD G(·; σ_i, ξ) for the daily excesses, defined by (5.2) and (5.3),

    • a first-order dependence parameter α, defined by (5.8), that captures the strength
of dependence from day to day.
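The dependence structure (5.8) can be sketched directly; the parameter values below are illustrative, not the fitted ones, and for simplicity p and σ are held constant across the two days. At α = 1 the joint df factorizes into the product of the two marginals, while α < 1 pushes it above that product (positive dependence).

```python
import math

def marginal_cdf(x, u, p, sigma, xi):
    """Marginal df implied by the threshold model for x > u:
    Pr{X <= x} = 1 - p(1 + xi(x - u)/sigma)^(-1/xi)."""
    return 1.0 - p * (1.0 + xi * (x - u) / sigma) ** (-1.0 / xi)

def F2(x1, x2, u, p, sigma, xi, alpha):
    """Joint df (5.8) of two consecutive days under logistic dependence;
    alpha = 1 is independence, small alpha strong dependence."""
    t1 = (-math.log(marginal_cdf(x1, u, p, sigma, xi))) ** (1.0 / alpha)
    t2 = (-math.log(marginal_cdf(x2, u, p, sigma, xi))) ** (1.0 / alpha)
    return math.exp(-((t1 + t2) ** alpha))

# Illustrative values only: threshold 119.9, crossing probability 0.1
u, p, sigma, xi = 119.9, 0.1, 20.0, -0.2
```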

    This model was then fitted to the network maxima data using the models of Tables
3.11 (without PDAY)  and 5.5  to define starting values for, respectively, the  exceedance
probability parameters and the excess parameters. The final values from the fitted models
are given in Tables 5.12 and 5.13. The estimated value of α is .940 with a standard error
of .032, and the value of NLLH is 859.564, compared with an initial value of 862.729 with
α = 1. For technical reasons (J. Tawn, private communication) the likelihood ratio test
statistic of the null hypothesis α = 1 against the alternative α < 1 does not follow the
usual asymptotic χ²₁ distribution, but the drop in NLLH would seem big enough to indicate
a significant result. On the other hand, there is no evidence of a very strong dependence.
   Table 5.12: Exceedance probability parameters under dependence model
                       Network maxima: threshold 119.9
Parameter           Estimate            Stand. error         t Ratio

CONST               -21.85              1.575                -13.87
YEAR                -0.1545             0.03762              -4.11
CDAY                -1.047              0.7126               -1.47
SDAY                0.472               0.2954               1.60
T                   0.2894              0.02157              13.42
RH                  -0.01627            0.00967              -1.68
WIND.U              -0.0297             0.03868              -0.77
WIND.V              -0.1851             0.04311              -4.29
VIS                 -0.06701            0.01955              -3.43
T.WSPD              -0.01795            0.002721             -6.60
                                       45

-------
           Table 5.13:  Excess parameters under dependence model
                      Network maxima: threshold 119.9
Parameter           Estimate            Stand. error         t Ratio

CONST               0.04538             1.556                0.03
YEAR                -0.04698            0.02982              -1.57
T                   0.047               0.01877              2.50
WSPD                -0.1312             0.1283               -1.02
T.WSPD              -0.002606           0.005078             -0.51
ξ                   -0.2367             0.07145              -3.31
    Once again, we believe that the predictive diagnostics form the best guide as to how
well this model is doing.  Corresponding to the overall calibration in Table 5.7,  we have
the following:
                Table 5.14: Calibration of probability forecasts
               Network maxima, level 139.9, dependent model
Interval            n       r        e         w        z

0.000-0.050      2107       7     7.902     7.722   -0.325
0.050-0.100        93       5     6.477     6.009   -0.602
0.100-0.300       116      23    20.410    16.422    0.639
0.300-0.500        30      14    11.618     7.002    0.900
0.500-1.000         8       6     4.803     1.887    0.871

0.000-1.000      2354      55    51.210    39.042    0.607
    Here Σz² = 2.45 based on the first five rows, and the Brier score is 38.70 (z = −0.08)
compared with the earlier value of 40.42 for the model without dependence. The
calibration by year leads to:
                                       46

-------
            Table 5.15:  Calibration of probability forecasts by year
               Network maxima, level 139.9, dependent model
Year         n       r        e         w        z

1981       214       4     4.039     3.268   -0.021
1982       214       4     4.365     3.736   -0.189
1983       214      13    12.284     8.009    0.253
1984       214       8     4.520     3.778    1.790
1985       214       3     2.613     2.301    0.255
1986       214       1     2.580     2.349   -1.031
1987       214       7     6.548     4.982    0.203
1988       214      13     8.930     6.217    1.632
1989       214       0     2.463     2.053   -1.719
1990       214       0     0.561     0.540   -0.763
1991       214       2     2.306     1.809   -0.227
    The z values do appear to be better than those in the earlier Table 5.8. In this case
Σz² = 10.728, clearly not significant against χ²₁₁, though the interpretation of this is still
a little dubious as several years have small e values. Regrouping in the same way as in
Table 5.9, we have:
            Table 5.16: Calibration of forecasts by year (regrouped)
                Network maxima, level 139.9, dependent model

Year         n       r        e         w        z

1981-2     428       8     8.404     7.004   -0.153
1983       214      13    12.284     8.009    0.253
1984-6     642      12     9.713     8.428    0.788
1987       214       7     6.548     4.982    0.203
1988       214      13     8.930     6.217    1.632
1989-91    642       2     5.330     4.402   -1.587
Here Σz² = 5.93, not significant against χ²₆. Note that the e values here are computed
sequentially as we go through the data, i.e. the predicted value for day n does depend
on the observed value for day n − 1. However, this does appear to be a valid method of
comparison, since this kind of sequential analysis is precisely the situation that the theory
of Seillier-Moiseiwitsch and Dawid (1993) is designed for. As in our earlier analyses, we
have dealt with the problem of parameter estimation by dropping one year at a time, each
year's predictions being calculated using the model fitted to the other ten years' data.
                                        47

-------
    The conclusion at this point is that the dependent analysis does appear to deal with the
extrapolation to higher thresholds, in the sense that any discrepancies between predictions
and data are consistent with normal statistical variability. To extend the comparison, for
example, at level 159.9, the calibration by year yields Σz² = 10.97, not significant as χ²₁₁,
or if we group the years in the same way as in Table 5.16, Σz² = 8.03, still nowhere near
significant against χ²₆. On the other hand, the individual values for 1988 in this case are
r = 7, e = 3.187, w = 2.58 for z = 2.37. So the 1988 value is significant when taken on its
own, but not when viewed as just one year amongst 11 years' data.

    A similar analysis has also been carried out for station P, based on threshold 99.9,
with linear trends (the YEAR covariate) in both the exceedance probabilities and excess
values. In this case we found α̂ = 0.94, standard error 0.048, for a log likelihood difference
of 0.832 compared with the independent (α = 1) case. In this case we do not believe the
difference in the model fits to be significant, but again the predictive diagnostics do point
to some improvement. For example, at level 139.9, we find Σz² = 10.29 for the calibration
by year, with 1988 values r = 10, e = 5.093, z = 2.635, and a Brier score for this level
of 17.72. The results for the corresponding model with no dependence are Σz² = 23.17,
e = 3.421 and z = 4.252 for 1988, and Brier score 18.54.

     Plots of the year-by-year comparisons of observed with expected exceedances, based
on the dependent models for station P and the network maxima, are shown in Figures 5.9
and 5.10. In each case the agreement is better than for the corresponding plots in Figures
5.5 and 5.8.  (For  station P, the value for 1988 in the dependent linear trend model of
Figure 5.9 is similar to that  for the independent quadratic trend model of Figure 5.5, but
the overall fit is much better in the case of Figure 5.9.)

     As  a final evaluation of the  dependent models, Figures 5.11  and 5.12 show plots of
the actual data on exceedances  over 120, for station P and the network maxima, together
with five simulations from our final dependent model. The purpose of this is to allow us to
judge by eye how well the simulations are producing data sets that look like the real thing.
The plot here is of day within the period (day 1 = 1/1/81) on the x axis, ozone in ppb on
the y axis. For station P, the simulations generally produce similar 1988 values to those
observed, and higher ones for 1983. In the case of network maxima (Figure 5.12), none
of the simulations produces values as high as those actually observed for 1988, though the
agreement elsewhere is reasonably good.

     The overall conclusion from this section is that the independent models,  or those that
merely incorporate dependence through the inclusion of the PDAY variable  in the model
of Section 3, seem to fit the data when judged by probability  plots of the excesses, but
do  not do an  entirely satisfactory job  of reproducing the higher levels as judged by the
predictive diagnostics.  The Markovian model introduced in this last part of the section
does seem to get around that difficulty as well: although the models still underpredict the
1988 values, they are consistent with the data as judged by overall goodness of fit criteria.
                                         48

-------
     6.  Identifying the extreme ozone days adjusted for meteorology


    Here, some preliminary thoughts are given as to how the preceding results could be
interpreted as a means of identifying extreme ozone days when allowance is made for
meteorology.


    Suppose we have a model which defines the probability p_i(x) that the ozone on day i
is over a level x. For the models in this paper, this function is defined for any x above the
initial threshold, by a combination of the models defined in Sections 3 and 5, and in the
case of a dependent model as at the end of Section 5, it is to be interpreted as a conditional
probability given past ozone values. If there is indeed an exceedance of the threshold on
that day, with daily maximum at level Y_i, then the quantity η_i = p_i(Y_i) can be thought of
as the tail probability associated with the value Y_i. A very small value of η_i thus indicates
some additional factor that must have produced extreme ozone on that day.
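For the models of this paper, evaluating η_i amounts to multiplying a threshold-crossing probability by the GPD survivor function of the excess. A sketch with invented parameter values (the fitted p_i and σ_i of course vary with the meteorology of day i, and the dependent model would condition on the previous day as well):

```python
def tail_prob(x, u, p_u, sigma, xi):
    """p_i(x) for x above the threshold u: the probability of crossing u
    times the GPD survivor function evaluated at the excess x - u."""
    return p_u * (1.0 + xi * (x - u) / sigma) ** (-1.0 / xi)

# Invented values: a day with crossing probability 0.2 that recorded 152 ppb
eta = tail_prob(152.0, 119.9, 0.2, 18.0, -0.24)
```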


    This η_i has a rather different interpretation according to whether or not trend is
included in the model. If trend is not included, then it represents purely an adjustment
for meteorology, not allowing for any other factors that may have affected ozone levels. If
trend is included then, at least within the confines of the current study, it represents the
best available indicator of how extreme a particular day is. The qualification "within the
confines of the  current study" is added because, of course, this is one aspect which could
be changed  considerably if additional information, such as data  on precursor emissions,
were included.
    Table 6.1 lists all ozone days over 170 for the network maxima, with associated tail
probabilities calculated both with a trend in the model (probability 1) and without (proba-
bility 2). The model with trend was the same as in Section 5 for threshold 119.9: quadratic
trend in the binomial exceedance probabilities, linear trend in the GPD for excesses, with
the dependence parameter α also included. Thus, the probability η_i is actually a condi-
tional probability given the previous day's ozone. The model without trend is the same
model refitted without the YEAR and YEAR² covariates.
    It can be seen that there are considerable differences among the probabilities associ-
ated with the highest ozone days. The table includes several values from 1988 with very
small η_i values, reinforcing the fact that this year's very high ozone readings cannot be
explained solely in terms of meteorology. In contrast, the 1983 values are associated with
much more modest values of η.
    Table 6.2 is similar to Table 6.1 but computed for, not the overall highest ozone days,
but the highest within each year (including some ties). The effect of the trend on the tail
probabilities can be more clearly seen in this table.

                                        49

-------
         Table 6.1: 14 highest ozone days and associated probabilities
                                Network maxima
Date                 Ozone     Prob. 1    Prob. 2

July 7 1988           223      0.0012     0.0019
July 8 1988           215      0.0055     0.0071
June 23 1983          188      0.0533     0.0239
July 6 1988           186      0.1387     0.1573
July 15 1983          180      0.3319     0.2919
June 18 1987          178      0.0194     0.0170
June 24 1987          177      0.0312     0.0345
July 7 1981           176      0.1036     0.0485
July 19 1983          175      0.3272     0.2875
August 11 1988        173      0.0063     0.0104
August 1 1981         172      0.0116     0.0062
August 4 1984         172      0.0714     0.0422
July 5 1988           170      0.0685     0.0679
May 8 1982            170      0.0002     0.0002
      Table 6.2:  Annual highest ozone days and associated probabilities
                               Network maxima
Date                 Ozone     Prob. 1    Prob. 2

July 7 1981           176      0.1036     0.0485
May 8 1982            170      0.0002     0.0002
June 23 1983          188      0.0533     0.0239
August 4 1984         172      0.0714     0.0422
June 7 1985           148      0.0699     0.0485
June 8 1985           148      0.0844     0.0592
July 28 1986          142      0.0356     0.0295
June 18 1987          178      0.0194     0.0170
July 7 1988           223      0.0012     0.0019
July 6 1989           139      0.4565     0.5130
August 22 1990        130      0.0015     0.0080
June 1 1991           152      0.0701     0.2589
June 20 1991          152      0.2049     0.4038
    An alternative approach to identifying high ozone days might be to identify any day
with an exceptionally small value of η_i as an extreme ozone event. This does not appear
                                       50

-------
to be a sensible thing to do on its own, however, for two reasons. The first is that it
is quite possible for a very small η_i to arise on a day when the ozone is not at all high
in absolute terms, if it is merely a lot higher than would have been expected from the
background conditions. Identifying such days would appear to be of little use in dealing
with the overall ozone problem. The second reason for not giving too much attention to
the very small values of η_i is that they are likely to be very sensitive to arbitrary features
of the model, such as which threshold was used or which covariates were included in the
models for p_i and G_i.

    However, another possibility is to adopt a joint criterion: select some intermediate
ozone level, say 130, and some intermediate critical value of η_i, say .05, and flag all days
where both criteria are violated. The idea is that this will help identify days where the
ozone was high but, in a sense, it should not have been. The results of this are shown in
Table 6.3, arranged in order of increasing η_i.
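The joint criterion itself is a simple filter-and-sort; in this sketch the records are a few (date, ozone, η_i) triples taken from Tables 6.1 and 6.3, and the cutoffs are the ones just described.

```python
def flag_extreme_days(days, level=130.0, eta_cut=0.05):
    """Keep days whose ozone reaches `level` ppb and whose
    meteorology-adjusted tail probability eta falls below `eta_cut`,
    sorted in order of increasing eta as in Table 6.3."""
    kept = [rec for rec in days if rec[1] >= level and rec[2] < eta_cut]
    return sorted(kept, key=lambda rec: rec[2])

# (date, ozone in ppb, eta) triples taken from Tables 6.1 and 6.3
days = [("July 7 1988", 223, 0.0012),
        ("July 15 1983", 180, 0.3319),
        ("May 8 1982", 170, 0.0002),
        ("August 22 1990", 130, 0.0015)]
flagged = flag_extreme_days(days)
```

July 15 1983 drops out because its η is large: the day was high in absolute terms but well explained by meteorology.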

    Similar results for station P are presented in Tables 6.4-6.6. In this case, for the
models with trend, a linear trend in both the exceedance probability and excess value
components was adopted. In Table 6.4, it can be seen that four of the top five values are
from July 5-8 1988, and that despite the fact that we are using the dependent model, so
that the quoted probabilities take previous high values into account, the η_i values for July
7 and 8 are still very small, whereas those for July 5 and 6 are much more moderate. The
conclusion from this is that the meteorology through our models will "explain" the values
for July 5 and 6, which are still very high compared with the rest of the record, but that it
still cannot explain the exceptionally high ozone readings of July 7 and 8. Table 6.5 again
presents the annual maxima, and Table 6.6 lists all the days for which the ozone level was
at least 120 and η_i < 0.1.

    The results are necessarily highly tentative. As is already clear from comparing the
results for station P and the network maxima, the results are highly sensitive to the selection
of model, and the interpretation of a small value of η_i is far from clear. Nevertheless, it
seems valuable to try to identify those days on which a high ozone level occurred which could
not be explained by meteorology, as this gives at least indirect evidence that additional
factors may have been responsible for the high ozone level on those days, which could in
turn point the way towards improved control strategies for the future.
                                         51

-------
                Table 6.3: Highest ozone days by joint criterion
            All days with Ozone ≥ 130 and η_i < 0.05 are tabulated
                                Network maxima

Date                 Ozone     Probability η_i

May 8 1982            170      0.0002
June 30 1981          136      0.0008
July 7 1988           223      0.0012
August 22 1990        130      0.0015
June 6 1988           164      0.0048
July 23 1983          162      0.0050
August 15 1982        135      0.0051
July 8 1988           215      0.0055
August 17 1986        131      0.0063
August 11 1988        173      0.0063
July 19 1991          135      0.0078
May 28 1982           132      0.0087
June 23 1984          160      0.0088
April 18 1987         139      0.0101
August 1 1981         172      0.0116
May 25 1985           146      0.0126
June 18 1987          178      0.0194
August 7 1988         144      0.0201
June 21 1991          134      0.0205
June 28 1984          130      0.0212
August 18 1984        151      0.0233
July 7 1985           130      0.0236
August 16 1982        147      0.0311
June 24 1987          177      0.0312
August 14 1991        134      0.0348
July 3 1982           134      0.0356
July 28 1986          142      0.0356
July 12 1982          144      0.0366
August 15 1984        136      0.0420
June 19 1986          131      0.0444
August 29 1983        155      0.0450
                                      52

-------
         Table 6.4: 11 highest ozone days and associated probabilities
                                   Station P

    Date               Ozone    Prob. 1    Prob. 2
    July 7 1988         215      0.0292     0.0784
    July 8 1988         215      0.0607     0.1192
    July 6 1988         186      0.3230     0.4052
    August 4 1984       172      0.0195     0.0129
    July 5 1988         170      0.1568     0.1789
    June 6 1988         164      0.0022     0.0104
    June 23 1983        161      0.0169     0.0078
    August 3 1988       160      0.0473     0.0676
    July 22 1983        158      0.0171     0.0060
    July 29 1983        157      0.1712     0.0567
    August 1 1981       151      0.0102     0.0016
      Table 6.5: Annual highest ozone days and associated probabilities
                                   Station P

    Date                 Ozone    Prob. 1    Prob. 2
    August 1 1981         151      0.0102     0.0016
    July 12 1982          144      0.0026     0.0020
    June 23 1983          161      0.0169     0.0078
    August 4 1984         172      0.0195     0.0129
    June 8 1985           143      0.0413     0.0319
    July 28 1986          123      0.0210     0.0198
    June 18 1987          149      0.1150     0.1176
    June 6 1988           164      0.0022     0.0104
    July 6 1989           121      0.3644     0.4480
    September 10 1990     107      0.0311     0.0573
    May 15 1991           123      0.0101     0.0273
                                       53

-------
                Table 6.6: Highest ozone days by joint criterion
             All days with Ozone > 120 and η_i < 0.1 are tabulated
                                    Station P

    Date               Ozone    Probability η_i
    June 20 1983        122      0.0012
    June 6 1988         164      0.0022
    July 12 1982        144      0.0026
    August 8 1985       133      0.0048
    September 2 1983    141      0.0049
    July 27 1985        128      0.0068
    May 15 1991         123      0.0101
    August 1 1981       151      0.0102
    June 20 1987        124      0.0123
    August 11 1988      149      0.0143
    June 23 1983        161      0.0169
    July 22 1983        158      0.0171
    August 4 1984       172      0.0195
    July 28 1986        123      0.0210
    August 3 1984       145      0.0258
    July 7 1988         215      0.0292
    August 7 1988       144      0.0312
    August 9 1986       117      0.0327
    July 24 1988        144      0.0368
    June 8 1985         143      0.0413
    August 15 1984      126      0.0433
    August 3 1988       160      0.0473
    July 13 1984        136      0.0576
    July 8 1988         215      0.0607
    June 19 1987        133      0.0736
    7. Recurrence of 1988

    One question that has repeatedly occurred in discussion of the ozone problem is "how
extreme a year was 1988?". Of course we could ask the same question with respect  to any
other  year in the study, but  1988 is the focus because it is clearly the worst ozone year
in the current record.  In this section we attempt to answer this question with respect to
historical climatological data going back to 1959.

    Each of the models we have considered can be regarded as providing an algebraic
expression for

                                  P_n(t | x_n),

defined as the probability of crossing level t on day n, given the meteorology vector x_n
for that day. We can therefore define an expected number of crossings for year N by the
formula

                       S_N(t) =        Σ        P_n(t | x_n).
                                 day n ∈ year N

This quantity can be calculated with respect to each year N for which the meteorological
data are available, and can be thought of as the "ozone potential" for that year.
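The ozone-potential calculation is simply a sum of fitted daily crossing probabilities within each year. A minimal sketch follows; `crossing_prob` is a hypothetical stand-in for a fitted model giving P_n(t | x_n), not code from the study:

```python
from collections import defaultdict

def ozone_potential(days, crossing_prob, t):
    """Expected number of crossings S_N(t) of level t, per year N.

    days          : iterable of (year, met_vector) pairs, one per day
    crossing_prob : fitted model giving P(ozone > t | met_vector)
    """
    S = defaultdict(float)
    for year, x in days:
        S[year] += crossing_prob(x, t)   # sum P_n(t | x_n) over days in year N
    return dict(S)

# toy check: a constant daily probability 0.01 over 365 days gives S = 3.65
days = [(1988, None)] * 365
S = ozone_potential(days, lambda x, t: 0.01, 120)
```
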

     For an analysis of this nature, it would not make sense to include trends in the model,
 so we use only models for which no trend is included.  Also, since the model is not being
 applied directly to the ozone record, we cannot include PDAY in the model.  However, we
 can calculate the function S_N(t) with respect to different ozone models that have been
 fitted to different stations.

     For our main evaluation, we used the models refitted to the network maxima data as
 in Tables 3.11 and 5.5, omitting YEAR, YEAR2 and PDAY from the models. The function
 S_N(t) was then plotted against N, for each of t = 120, 140, 160, 180. The results are shown
 in Figure 7.1. The figure shows substantial fluctuations, with high projected ozone levels
 for 1966, 1975, 1983 and 1988. Overall 1988 has the highest projected exceedances, though
 at the higher levels (160, 180), 1983 is higher. This is consistent with what we have seen
 in earlier comparisons.  A similar figure computed using model fits for station P (Figure
 7.2) gives a very similar message, though in this case with 1988 giving the highest number
 of projected exceedances at all levels.

     These models were  based on independent days.  However, the calculations were re-
 peated using the Markov models of Section 5, with results indistinguishable from those
 in Figures  7.1 and 7.2.  There is a distinction between using the Markov model in  this
 context and for the predictive diagnostics of Section 5, because there, the diagnostics were
 computed sequentially (one day's ozone reading affects the diagnostic for  the next). Our
 analysis in  this section is based purely on expected numbers of exceedances, so it is only to
 be anticipated that the dependence in the model has minimal influence on the conclusions.

     There  are a number of possible interpretations of the results. There appears to be
 a general increase of ozone potential over the whole period of the study, which implies
 climatological variability.  This  could therefore be tied  in with much broader issues of
 climate change, including the greenhouse effect and its possible effect on global warming.
The results do lend some support to the contention that  1988 was the most extreme year
in terms of meteorology in the record. On the other hand, we have seen in Section 5 that
the models still underpredict  the actual ozone levels that occurred in 1988,  so there is still
some support for the notion that additional factors were responsible for the exceptionally
high levels  of that year.
                                        55

-------
     8.  Sums of excesses over a threshold

     A different characterization of the "extremeness" of an ozone day, other than the daily
maximum, is some form of criterion based on the persistence of hourly ozone readings over
a lower threshold than the ozone standard 120. For example, the sum of exceedances over
the threshold 60 is used in calculating the SUM06 criterion. In general, we may define a
threshold u_0 (e.g. u_0 = 60) and consider either of the criteria

                         S_i = Σ_k (Y_ik − u_0)_+                           (8.1)
or
                         S̃_i = Σ_k Y_ik I(Y_ik > u_0),                     (8.2)

where Y_ik is the k'th hourly value on day i. In (8.1) we use the notation, as previously,
x_+ = max(x, 0), and in (8.2), I(Y_ik > u_0) denotes the indicator function (1 if Y_ik > u_0,
0 otherwise).
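For a single day's hourly readings the two criteria can be computed directly; a small sketch (function and variable names are ours, not from the report):

```python
def excess_sum(hourly, u0=60):
    """Criterion (8.1): sum of excesses (Y_ik - u0)+ over the day's hours."""
    return sum(max(y - u0, 0) for y in hourly)

def sum06_style(hourly, u0=60):
    """Criterion (8.2): sum of the hourly values themselves on hours
    exceeding u0, i.e. sum of Y_ik * I(Y_ik > u0), as in SUM06."""
    return sum(y for y in hourly if y > u0)

hourly = [40, 55, 70, 90, 65, 50]       # a toy day of six hourly readings
s1 = excess_sum(hourly)                  # (70-60) + (90-60) + (65-60) = 45
s2 = sum06_style(hourly)                 # 70 + 90 + 65 = 225
```
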

    Note: This is the first time in the whole report that we consider hourly ozone values.
Up to now, all  the analysis has been in terms of daily maxima.

    In this analysis we confine ourselves to (8.1), whose distribution is easier to characterize
than (8.2). However, current interest in SUM06 is focussed on (8.2),  so it would seem to
be important in future work  to try to characterize that  distribution as well.

    The current status of work in extreme value theory is that it has a good deal  less
to say about quantities such as (8.1) than it  does about overall maxima.  Nevertheless,
a recent paper  by  Anderson  & Dancy (1992) has  provided a theoretical characterization
and some practical results. In particular, they proposed the  TGPD  (equation 5.4) as a
distributional form that was consistent with the theoretical characterization and performed
well in a practical (hydrological) data study. Although their result should not be considered
a complete treatment of the problem, it does suggest that an analysis based on the TGPD
would have both theoretical and practical justification, and we therefore pursue that here.

    The possibility has also been suggested of trying other values of u_0 besides 60.  In
particular, one could define a similar criterion based on u_0 = 80, so that is tried as well.

    In Tables 8.1-8.4, the results of the TGPD model are given for u_0 = 60 and 80, for
stations P and R. In both cases the full model (including both ξ and θ) is quoted, though in
no case are both of these parameters actually required. In all four cases, if θ is included in
the model, then a test of H_0: ξ = 0 produces a result which is not significant.  In fact, only
in the case of station P and u_0 = 80 is the null hypothesis of an exponential distribution
(ξ = 0, θ = 1) rejected by a likelihood ratio test.  In all cases the other parameters were
chosen by a variable selection procedure similar to that used in section 3, though assuming
a linear trend in the YEAR variable.
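The likelihood ratio test referred to here compares twice the gain in log-likelihood against a chi-squared distribution whose degrees of freedom equal the number of parameters restricted under the null. A minimal sketch with hypothetical log-likelihood values (the numbers are ours, not from the fitted tables):

```python
from scipy.stats import chi2

def lr_test(loglik_full, loglik_null, df):
    """Likelihood ratio test statistic and p-value for nested models."""
    stat = 2.0 * (loglik_full - loglik_null)
    return stat, chi2.sf(stat, df)

# e.g. testing the exponential null (xi = 0, theta = 1) inside the TGPD
# restricts two parameters, so df = 2; toy log-likelihood values:
stat, pval = lr_test(-250.0, -252.5, df=2)   # stat = 5.0
```
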

                                         56

-------
    Probability plots are shown for all four data sets and all four models in Figures 8.1-
8.4. In Figure 8.1, there is no discernible difference among the models but there are three
apparent outliers, corresponding to (from the top) July 12 1982, July 22 1983 and June 19
1983. From Table 6.6 we see that the first and third of these were also events corresponding
to a very low tail probability with respect to their daily maxima, while the July 22 1983
event also appears in both Tables 6.4 and 6.6.  Thus it seems, as we might expect, that
there is a high association between extreme events from the daily maximum analysis and
ones from the present results. On the other hand there are no obvious outliers in the other
plots, except in Figure 8.4 where it appears that there is one outlier for the date of June
19 1986. No attempt has been made to remove outliers in this analysis.

    To complete the description of sums over a high threshold, we  also need a model for
the probability of crossing  the threshold.  Tables 8.5-8.8 present models that have been
identified for that,  using the same methods as in section 3 but repeating the variable
identification with the lower thresholds.
                          Table 8.1: u_0 = 60, Station P

    Variable       Estimate      Stand. error    t ratio
    CONST          -4.66          1.537           -3.03
    YEAR           -.05252         .01584         -3.32
    CDAY           -.8205          .1987          -4.13
    SDAY            .06538         .09482           .69
    PR              .02687         .01149          2.34
    T               .1094          .01648          6.64
    TD              .04833         .02229          2.17
    Q              -.2004          .05481         -3.66
    VIS            -.01439         .008711        -1.65
    WSPD           -.05544         .06709          -.83
    T.WSPD         -.004823        .002938        -1.64
    ξ              -.0318          .07157          -.44
    θ               .9293          .05059         18.37
                                        57

-------
                          Table 8.2: u_0 = 80, Station P

    Variable       Estimate      Stand. error    t ratio
    CONST         -14.77          2.957           -4.99
    YEAR           -.09933         .02954         -3.36
    CDAY           -.749           .4882          -1.53
    SDAY            .1772          .2132            .83
    PR              .03685         .02101          1.75
    T               .2131          .03049          6.99
    TD              .05107         .05694           .90
    Q              -.1762          .133           -1.33
    WSPD            .5068          .1681           3.02
    T.WSPD         -.02725         .006264        -4.35
    ξ              -.04198         .1634           -.26
    θ               .8169          .08419          9.70
                          Table 8.3: u0=60, Station R
Variable
Estimate
Stand, error
CONST
YEAR
CDAY
SDAY
T
TD
Q
WIND.U
WIND.V
VIS
£
6
-2.648
-.05715
-.4452
.1314
.09399
.05699
-.2594
.03126
-.05123
-.03928
-.04785
1.039
1.009
.03077
.2597
.1178
.009624
.0258
.06871
.01559
.01589
.01057
.08586
.07002
t ratio

-2.62
-1.86
-1.72
1.12
9.77
2.21
-3.78
2.01
-3.22
-3.72
-.56
14.84
                                       58

-------
                          Table 8.4: u_0 = 80, Station R

    Variable       Estimate      Stand. error    t ratio
    CONST          -1.84          1.223           -1.50
    YEAR           -.1212          .06906         -1.76
    CDAY          -1.249           .5977          -2.09
    SDAY           -.5489          .2709          -2.03
    T               .09555         .01805          5.29
    Q              -.1306          .04333         -3.02
    WIND.U          .06009         .03706          1.62
    WIND.V         -.07609         .03556         -2.14
    VIS            -.05623         .01981         -2.84
    ξ               .1531          .1921            .80
    θ              1.018           .1232           8.27
                Table 8.5: Exceedance probability parameters
                           Station P, Threshold 59.9

    Variable       Estimate      Stand. error    t ratio
    CONST         -18.34          1.417          -12.95
    YEAR           -.04824         .0223          -2.16
    CDAY           -.9727          .2512          -3.87
    SDAY            .7299          .1332           5.48
    OPCOV          -.009412        .002484        -3.79
    PR              .04982         .0162           3.07
    T               .2499          .01799         13.89
    TD              .07584         .03041          2.49
    Q              -.3279          .08207         -4.00
    WIND.U         -.01097         .02112          -.52
    WIND.V          .1102          .02204          5.00
    VIS            -.0896          .01331         -6.73
    T.WSPD         -.01122         .002127        -5.27
                                      59

-------
                Table 8.6: Exceedance probability parameters
                           Station P, Threshold 79.9

    Variable       Estimate      Stand. error    t ratio
    CONST         -25.89          2.372          -10.92
    YEAR           -.06021         .02917         -2.06
    CDAY          -1.644           .4253          -3.87
    SDAY            .01026         .2043            .05
    OPCOV          -.008993        .003259        -2.76
    PR              .03602         .02308          1.56
    T               .3             .02319         12.94
    TD              .1617          .05357          3.02
    Q              -.5966          .1331          -4.48
    WIND.U          .04669         .03084          1.51
    WIND.V          .1414          .03447          4.10
    VIS            -.09568         .01674         -5.72
    T.WSPD         -.02015         .002457        -8.20
                Table 8.7: Exceedance probability parameters
                           Station R, Threshold 59.9

    Variable       Estimate      Stand. error    t ratio
    CONST         -16.21          2.218           -7.31
    YEAR           -.07686         .05045         -1.52
    CDAY           -.3045          .2978          -1.02
    SDAY            .6717          .1599           4.20
    T               .2864          .03318          8.63
    Q              -.2503          .04173         -6.00
    VIS            -.1051          .01786         -5.88
    WSPD            .1216          .07249          1.68
    T.WSPD         -.01032         .004714        -2.19
                                      60

-------
                 Table 8.8: Exceedance probability parameters
                            Station R, Threshold 79.9

    Variable       Estimate      Stand. error    t ratio
    CONST         -25.73          4.964           -5.18
    YEAR           -.07884         .06826         -1.16
    CDAY           -.177           .5151           -.34
    SDAY            .8467          .2346           3.61
    T               .3789          .06297          6.02
    RH              .07593         .03169          2.40
    Q              -.5981          .1476          -4.05
    WIND.U          .04774         .03499          1.36
    WIND.V         -.156           .03975         -3.93
    VIS            -.1117          .02276         -4.91
    9. Conclusions and summary

    In this section we give an overall summary of what the objectives of the study were,
what methods were used, and what we think has been achieved as a result. The section
can be read independently of the rest of the paper.

    The study was conceived as a follow-on to the study by Bloomfield, Royle & Yang
(1993), in which nonlinear regression techniques were used to predict ozone levels as a
function of a suite of meteorological variables.  Our selection of meteorological variables
does not exactly correspond to theirs - we do not include lagged meteorological variables
or upper-air data, and we base all results on 12:00 noon measurements rather than any
kind of daily average - but the main variables are the same and achieve a similar R² in
the only direct comparison made between the two papers, when the nonlinear model of
equation (4.1) is fitted to the network maxima by least squares.

    For our study we have concentrated  on ozone daily maxima from three stations labelled
P, Q and R, chosen because  they had high ozone levels, and also the network maxima as
defined by Bloomfield et al.  The  overall objective of the study is to fit models that are
based solely on exceedances of a high threshold, so as to answer questions relating to trends
in the high levels of the ozone series. For much of the study, the threshold used was 119.9
(i.e. just below the standard level 120), though for parts of the  study which used excess
values as well (Section 5 onwards), it was often found convenient to use a threshold of 99.9.
At the lower threshold, there are more exceedances and this helps us fit a good model.

    The  simplest model of this kind is to fit a logit model to the binary data consisting of
1 if the threshold is exceeded  on a given day, 0 otherwise. In Section 3, we fitted this model

                                        61

-------
 by maximum likelihood, assuming the logit probability is linear in the covariates and with
 stepwise selection of covariates.  However, we also considered  a model with PDAY as a
 covariate, this being  an indicator of whether the previous day  was an ozone exceedance,
 so introducing an element of serial dependence into the model.  For stations P and R, the
 linear trend represented by the covariate YEAR was significant, but not (in this analysis)
 for station Q. Also, in station P  the variable PDAY was found to be significant, but  not
 for the other two.  An alternative analysis based on threshold 99.9  in fact reduces  the
 significance of the YEAR variable, suggesting it is no longer significant for R and only just
 for P. A quadratic trend was also fitted and found significant for station P.


     Comparison with an alternative Poisson process formulation showed less satisfactory
 fits for the Poisson model, so this was not pursued further in the paper. A corresponding
 analysis for network maxima showed that linear and quadratic trends were both significant,
 as was the PDAY variable indicating dependence.

     Some ideas from the literature on probability forecasting were adapted to construct
 predictive measures of goodness of fit for these models.  The results generally supported
 the conclusions that had been drawn by a likelihood analysis.

     In  Section 4, a direct attempt was made to compare  the logit  models with linear
 covariate structure with the  nonlinear models proposed by Bloomfield et al.  Fitting a
 nonlinear model by ordinary least squares, and using that to predict high exceedances,  did
 not give good results. On  the other hand, rewriting the model as a  nonlinear probit or
 logit model, and fitting  that to the exceedances over a high threshold, did produce results
 that were at  least as good as the  ones from the approach of Section 3. In some cases,  the
 nonlinear models did better as judged by the predictive diagnostics.  On the other hand,
 fitting the nonlinear models requires much more computing time and there are problems
 with numerical stability of the parameter estimates.  Our conclusion is that there may be
 advantages in using a nonlinear model  as measured by the overall fit, but these do  not
 seem to compensate for the difficulties of fitting such a model.

     In Section 5, the  analysis was extended to excesses over a threshold. The Generalized
 Pareto distribution (GPD) with covariates does  uniformly well as  a model for the distri-
 bution. In the case of station Q, where we earlier found no significant evidence of a trend,
 we do find that the trend in the excess values is significant.  The predictive assessments of
 model fit show good results at lower levels, but suggest that the models significantly  fail
 to predict what is happening at much higher levels.  However, we also introduce a model
 for dependence, in which daily values are assumed to follow  a Markov chain in which  the
joint distribution of successive pairs lies in the domain of attraction of a bivariate extreme
 value distribution. This model (the dependent model) does indeed  seem to produce a sat-
 isfactory fit at higher levels. Although there are still observed discrepancies, especially in
 1988, the overall deviation from the model is not significant as measured by the predictive
 diagnostics.
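As an illustration of the excess-value model (a plain GPD without the covariate structure or the Markov dependence used in the paper), the GPD survival function can be written as follows; parameter names σ (scale) and ξ (shape) follow the usual convention:

```python
import math

def gpd_survival(y, sigma, xi):
    """P(excess > y) under the Generalized Pareto distribution.
    sigma > 0 is the scale, xi the shape; xi -> 0 recovers the exponential."""
    if abs(xi) < 1e-12:
        return math.exp(-y / sigma)                 # exponential limit
    inner = 1.0 + xi * y / sigma
    if inner <= 0.0:                                # beyond the upper endpoint (xi < 0)
        return 0.0
    return inner ** (-1.0 / xi)

# with xi = 0 the GPD tail is exponential; a positive xi gives a heavier tail
p_exp = gpd_survival(10.0, 20.0, 0.0)               # exp(-0.5)
p_gpd = gpd_survival(10.0, 20.0, 0.2)               # (1.1)^(-5), heavier than p_exp
```
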

                                         62

-------
     In Section 6, some preliminary and tentative suggestions were made as to how the
results might be interpreted from  the point of view of identifying extreme ozone days
adjusted for meteorology.

     In Section 7, we addressed the issue of "recurrence of 1988" by using the models
(refitted without trend)  to project  ozone exceedances across the whole period for which
meteorological data are available (1959-1991). The results confirm that 1988 and 1983 were
the two most extreme years in the whole period, from the point of view of meteorological
conditions conducive to high ozone. On the other hand, we have also seen that the models
did not fully explain the very high ozone levels that actually occurred in 1988. Thus we
still leave open the possibility that additional factors, over and above anything that could
be explained by meteorology, were responsible for the ozone levels of that year.

    Finally, in section 8  we have made some preliminary proposals for the distribution of
sums over a high threshold, though we concentrated on the definition (8.1), for which some
probabilistic theory is available, rather than (8.2) which corresponds to SUM06. In view
of continuing work on this topic, we have not attempted  a fully comprehensive analysis
here.
                                    References

    Anderson,  C.W. &  Dancy, G.  (1992), The severity of extreme events.   Preprint,
Sheffield University.

    Bloomfield, P., Royle, A. & Yang, Q. (1993), Accounting for meteorological effects in
measuring urban ozone levels and trends. National Institute of Statistical Sciences Technical
Report #1.

    Cox, D.R. & Lewis, P.A.W.  (1966),  The Statistical Analysis of Series of Events.
Methuen, London.

    Cox, W.M. & Chu,  S.-H.  (1992), Meteorologically adjusted ozone trends  in urban
areas: A probability  approach.  U.S. Environmental Protection Agency Technical Support
Division  MD-14, Research Triangle Park, NC 27711.

    Davison, A.C. & Hemphill, M.W. (1987), On the statistical analysis of ambient ozone
data when measurements are missing. Atmospheric Environment 21, 629-639.

    Davison, A.C. & Smith, R.L. (1990),  Models for exceedances  over high thresholds
(with discussion). J.R. Statist.  Soc. B 52, 393-442.

    Dawid, A.P. (1982), The well-calibrated Bayesian (with discussion). J.  Amer. Statist.
Assoc. 77, 605-613.

                                         63

-------
     Dawid, A.P. (1984),  Statistical theory:  the prequential approach (with discussion).
J.R. Statist.  Soc. A 147, 278-292.

     Dawid, A.P. (1986), Probability forecasting. In Encyclopedia of Statistical Sciences,
Vol. 7, eds. S. Kotz, N.L. Johnson and C.B. Read. New York: Wiley-Interscience, 210-218.

     DeGroot, M.H. & Fienberg, S.E. (1983), The comparison and evaluation of forecasters.
The Statistician 32, 12-22.

     Gumbel, E.J. (1958), Statistics of Extremes. Columbia University Press, New York.

     Karr,  A.F.  (1986), Point Processes and their Statistical Inference.  Marcel Dekker,
New York.

     Leadbetter, M.R., Lindgren, G. &  Rootzén, H. (1983), Extremes and Related Prop-
erties of Random Sequences and Processes. Springer Verlag, New York.

     National Research Council (1991). Rethinking the Ozone Problem in Urban and
Regional Air Pollution. National Academy Press, Washington, D.C.

     Pickands, J. (1975), Statistical inference using extreme order statistics. Ann. Statist.
3, 119-131.

     Press, W.H., Flannery, B.P., Teukolsky, S.A. & Vetterling, W.T.  (1986), Numerical
Recipes: The Art of Scientific Computing.  Cambridge University Press.

     Resnick, S. (1987), Extreme Values, Point Processes and Regular Variation. Springer
Verlag, New York.

     Roberts, E.M. (1979), Review of statistics of extreme values with applications to air
quality data, I: review, II: applications. J. Air  Pollut.  Control Assoc.  29, 632-637 and
733-740.

     Seillier-Moiseiwitsch,  F. & Dawid,  A.P. (1993), On testing the validity of sequential
probability forecasts.  J. Amer. Statist.  Assoc. 88, 355-359.

     Seillier-Moiseiwitsch, F. (1993), Predictive assessment of logistic models. Unpublished
technical report.

     Shively, T.S. (1991), An analysis of the trend in ground-level ozone using nonhomo-
geneous Poisson processes.  Atmospheric Environment 25B, 387-396.

     Shreffler, J.H. (1993),  Trends in high ozone concentrations in St.  Louis, 1983-1991.
Department of Statistics, University  of North Carolina.

                                         64

-------
    Singpurwalla, N.D. (1972), Extreme values from a lognormal law with applications to
air pollution problems. Technometrics 14, 703-711.

    Smith, R.L. (1985), Maximum likelihood estimation in a class of nonregular cases.
Biometrika, 72, 67-90.

    Smith, R.L. (1989), Extreme value analysis of environmental time series: An appli-
cation to trend detection in ground-level ozone (with discussion).  Statistical Science 4,
367-393.

    Smith, R.L. (1993), Multivariate threshold  methods. Paper presented at the NIST-
Temple University Conference on Extreme Value Theory and its Applications, Gaithers-
burg,  May 1993.

    Tawn, J.A. (1988), Bivariate extreme value  theory - models and estimation.
Biometrika 75, 397-415.
                                        65

-------
                                  Figure Captions

     Figure 2.1 Locations of 43 monitoring stations around Chicago.  The  three specific
stations identified for this study have been marked with the letters P, Q and R.

     Figure 3.1 Number of exceedances per year over threshold 119.9 for Station P. Actual
(observed) counts denoted by solid points; model fits by solid and broken lines.

     Figure 3.2 Same as Figure 3.1, but for network maxima.

     Figure 4.1 Same as Figure 3.1, but using four examples of nonlinear fits.  The solid line
represents  a least squares fit to the whole data, analogous to the method of  Bloomfield et
al. (1993).  The three broken lines represent models refitted to the binary data consisting of
excesses over the threshold, using the probit model with linear trend and the logit models
with both linear and quadratic trend.

     Figure 4.2 Same as Figure 4.1, but for network maxima.

     Figure 4.3 Illustration  of instability in  the model of Table  4.4.   The same model
was refitted to the same data starting from the previous  fit. The result was a reduction
in negative log likelihood of just  0.002, but with substantial changes  in the  individual
parameter  estimates. In this plot,  the ratios of all 18 parameter estimates for the two fits
are plotted.

    Figure 4.4 Same pair of models as Figure 4.3, but  the forecast probabilities of exceeding
the threshold, under the two models, for each of the 2354 days in the data set, are plotted
against each other. The plotted points lie almost exactly on the straight line of unit slope
through the origin; maximum discrepancy 0.002. We conclude  that, although the two
model fits differ substantially in their parameter values, they are indistinguishable in their
predictive performance.

    Figure 5.1 Probability plot of normalized residuals for excesses over threshold 99.9 for
station P. Four models: (a) exponential distribution, (b) GPD, (c) exponential distribution
with power transformation, (d) TGPD.

    Figure 5.2 Same as Figure 5.1, but for station Q.

    Figure 5.3 Same as Figure 5.1, but for station R.

    Figure 5.4 Same as Figure 5.1, but for network maxima and threshold 119.9.

    Figure 5.5 Observed (solid points) and  predicted (solid and dotted  lines) numbers
of exceedances of four levels, based on the logit models with no trend and with quadratic
trend, for the threshold crossing probabilities and the GPD for excesses, fitted with respect
to threshold 99.9 assuming independence of daily values. The predicted values for each

                                         66

-------
year are based on refitting the model to the rest of the data with that year omitted, and
using the refitted model to determine the expected number of exceedances in that year.

     Figure 5.6 Same as Figure 5.5,  for station Q with no-trend  and linear-trend model
fits.

     Figure 5.7 Same as Figure 5.5, for station R with no-trend and linear-trend model fits.

     Figure 5.8 Same as Figure 5.5,  for network maxima fitted to threshold  119.9,  with
no-trend, linear-trend and quadratic-trend model fits.

     Figure 5.9 Same as Figure 5.5, for station P with model fits based on the dependent
model with linear trends in both the probability of exceedance and excess value compo-
nents.

     Figure 5.10 Same as  Figure 5.8, for network maxima  with model fits based on the
dependent model with linear trends in both the probability of exceedance and excess value
components.

     Figure 5.11  Simulations of the final (dependent)  model for station P. Shown is a plot
of all values over 120,  plotted against day from 1/1/81, for the real data (top left  plot)
and five independent simulations.

    Figure 5.12 Same as Figure 5.11,  but for network maxima and the corresponding fitted
model.

    Figure 7.1 Expected exceedances for four thresholds in each year from 1959 (year  1) to
1991 (year 33). The calculations use actual meteorological data and predicted exceedances
according to the model.   The model in this case is the one fitted to network maxima
assuming independent days, but refitted without any trend  component.

    Figure 7.2 Same as Figure 7.1, but using the model fitted to Station P.

    Figure 8.1 Probability plots for  normalized residuals  for sums of excesses over the
threshold u0 =60, station  P, based on stepwise selection of covariates. Four models: (a)
exponential distribution, (b) GPD, (c) exponential distribution with power transformation,
(d) TGPD.

    Figure 8.2 Same as Figure 8.1, but relative to UQ = 80.

    Figure 8.3 Same as Figure 8.1 for station R.

    Figure 8.4 Same as Figure 8.2 for station R.
                                        67

-------
    [Figure 2.1: Location of measuring stations with stations P, Q and R marked.
     Horizontal axis: Longitude (degrees W).]

-------
    [Fig. 3.1: Exceedances and projections, station P]
-------
    [Fig. 4.1: Exceedances and projections, station P - nonlinear models
     (least squares, probit, logit-L, logit-Q fits vs. actual counts, 1982-1992)]

    [Fig. 4.2: Exceedances and projections, network maxima - nonlinear models
     (least squares, probit, logit-L, logit-Q fits vs. actual counts, 1982-1992)]

-------
    [Fig. 4.3: Ratio of parameters under two fits - network maxima,
     logit-quadratic model]

    [Fig. 4.4: Scatterplot of forecast probabilities from two models in Fig. 4.3]

-------
    [Fig. 5.1: Station P excesses (u=99.9) - panels (a) Exp., (b) GPD,
     (c) T-Exp., (d) TGPD; observed vs. expected probability plots]

-------
   [Fig. 5.2: Station Q excesses (u=99.9). Four quantile plots of observed vs.
    expected values: (a) Exp., (b) GPD, (c) T-Exp., (d) TGPD.]

-------
   [Fig. 5.3: Station R excesses (u=99.9). Four quantile plots of observed vs.
    expected values: (a) Exp., (b) GPD, (c) T-Exp., (d) TGPD.]

-------
   [Fig. 5.4: Network maxima excesses (u=119.9). Four quantile plots of
    observed vs. expected values: (a) Exp., (b) GPD, (c) T-Exp., (d) TGPD.]

-------
   [Fig. 5.5: Station P exceedances over several levels. Panels for levels
    99.9, 119.9, 139.9 and 159.9; exceedances vs. year, 1982-1990; curves:
    No trend, Quadratic.]

-------
   [Fig. 5.6: Station Q exceedances over several levels. Panels for levels
    99.9, 119.9, 139.9 and 159.9; exceedances vs. year, 1982-1990; curves:
    No trend, Linear.]

-------
   [Fig. 5.7: Station R exceedances over several levels. Panels for levels
    99.9, 119.9, 139.9 and 159.9; exceedances vs. year, 1982-1990; curves:
    No trend, Linear.]

-------
   [Fig. 5.8: Network maxima, exceedances over several levels. Panels for
    levels 119.9, 139.9, 159.9 and 179.9; exceedances vs. year, 1982-1990;
    curves: No trend, Linear, Quadratic.]

-------
   [Fig. 5.9: Station P exceedances, dependent model. Panels for levels 99.9,
    119.9, 139.9 and 159.9; exceedances vs. year, 1982-1990.]

-------
   [Fig. 5.10: Network maxima exceedances, dependent model. Panels for levels
    119.9, 139.9, 159.9 and 179.9; exceedances vs. year, 1982-1990.]

-------
   [Fig. 5.11: Simulation comparisons for Station P. Six panels: actual data
    and Simulations 1-5; daily values plotted against day (0-4000), vertical
    scale 120-240.]

-------
   [Fig. 5.12: Simulation comparisons for network maxima. Six panels: actual
    data and Simulations 1-5; daily values plotted against day (0-4000),
    vertical scale 150-250.]

-------
   [Fig. 7.1: Expected exceedances 1959-1991, model fitted to network maxima.
    Expected exceedances vs. year for thresholds 120, 140, 160 and 180.]

-------
   [Fig. 7.2: Expected exceedances 1959-1991, model fitted to Station P.
    Expected exceedances vs. year for thresholds 100, 120, 140 and 160.]

-------
   [Fig. 8.1: Station P, sums of excesses over 60. Four quantile plots of
    observed vs. expected values: (a) Exp., (b) GPD, (c) T-Exp., (d) TGPD.]

-------
   [Fig. 8.2: Station P, sums of excesses over 80. Four quantile plots of
    observed vs. expected values: (a) Exp., (b) GPD, (c) T-Exp., (d) TGPD.]

-------
   [Fig. 8.3: Station R, sums of excesses over 60. Four quantile plots of
    observed vs. expected values: (a) Exp., (b) GPD, (c) T-Exp., (d) TGPD.]

-------
   [Fig. 8.4: Station R, sums of excesses over 80. Four quantile plots of
    observed vs. expected values: (a) Exp., (b) GPD, (c) T-Exp., (d) TGPD.]

-------