United States      Office of Air Quality      EPA-450/4-81-031e
           Environmental Protection  Planning and Standards     September 1981
           Agency        Research Triangle Park NC 27711

           Air
EPA       Evaluating Simple Oxidant
               Prediction Methods
                  Using Complex
             Photochemical Models

              Cluster Analysis Applied
          To Urban Ozone Characteristics

-------
This report was furnished to the U.S.  Environmental  Protection
Agency by Systems Applications,  Incorporated in fulfillment of
Contract 68-02-2870.  The contents of this report are reproduced
as received from Systems Applications, Incorporated.   The opinions,
findings and conclusions expressed are those of the  author and not
necessarily those of the Environmental Protection Agency.  Mention
of company or product names is not to be considered  as an endorsement
by the Environmental Protection  Agency.

-------
                                 EPA-450/4-81-031e
Evaluating Simple Oxidant Prediction
        Methods Using Complex
         Photochemical Models
    Cluster Analysis Applied To Urban
          Ozone Characteristics
             EPA Project Officer: Edwin L. Meyer, Jr.
                    Prepared by

              U.S. Environmental Protection Agency
               Office of Air, Noise and Radiation
            Office of Air Quality Planning and Standards
            Research Triangle Park, North Carolina 27711

                   September 1981

-------
                             EXECUTIVE  SUMMARY
     Control  of urban ozone pollution poses a unique problem because ozone
is not directly emitted into the atmosphere by anthropogenic sources.
Rather, it results from atmospheric photochemical  reactions involving
hydrocarbons and nitrogen oxides.  Because the reactions involved in ozone
formation take several  hours to produce maximum ozone levels, the
dispersion during such periods results in ozone's  being a regional, rather
than a local, problem.  Thus, the processes involved in determining the
location and severity of ozone concentrations include the temporal and
spatial characteristics of the emission rates of the two precursor
species, transport and dispersion by meteorological  and topological
effects in the region, and photochemical reactions dependent on solar
intensity, temperature, and the like.

     Currently, the only satisfactory way to characterize the temporal and
spatial nature of ozone pollution in an urban area is through the use of
complex mathematical models.  Because of the complexity of these models,
the costs associated with data gathering and their exercise on a computer
can be substantial.  Moreover, considerable expertise is needed to
successfully mount a full-scale study of an urban  area.  It is thus
fruitful to search for ways in which knowledge gained in one application
can be transferred to application in another urban area.  Specifically, if
two urban areas can be shown to have sufficiently similar characteristics
with respect to their ozone problems, a control strategy developed for one
through application of a complex model could also be applied to the other,
thus potentially obviating the need for a second costly study.

     This report covers two exploratory studies that apply multivariate
clustering techniques to the identification of similarities between urban
areas.  The results showed promise in assigning urban areas to distinct,
relatively homogeneous classes; however, no clear-cut classification was
achieved.  The more qualitative of the two techniques showed a greater
ability to classify areas into well-defined groups.  The results indicate
that further work is needed to refine the choice of classificatory
variables, and that other techniques might be applied with greater
success.

-------
                                 CONTENTS


EXECUTIVE SUMMARY 	 ii
LIST OF ILLUSTRATIONS	 iv
LIST OF TABLES	  v
  I   INTRODUCTION	  1
 II   CLASSIFICATION OF URBAN AREAS BY PROFILE ANALYSIS	  4
III   CLASSIFICATION OF URBAN AREAS BY HIERARCHICAL CLUSTERING	 14
      A.  Stepwise Discriminant Analysis	 17
      B.  Cluster Analysis	 21
 IV   SUMMARY AND RECOMMENDATIONS	 33
REFERENCES	R-1

-------
                               ILLUSTRATIONS



 1    Profile of Cluster 1	  7

 2    Profile of Cluster 2	  8

 3    Profile of Cluster 3	  9

 4    Profile of Cluster 4	  10

 5    Profile of Cluster 5	  11

 6    Profiles of Denver, Phoenix, and Salt Lake City 	  12

 7    Urban Areas Included in Hierarchical Cluster
      Analysis	  16

 8    Dendrogram Based on All Variables	  24

 9    Dendrogram Based on Meteorological and Emissions
      Variables	  25

10    Dendrogram Based on Meteorological Variables Excluding
      Temperature Variables	  27

11    Dendrogram Based on Ozone Level Variables	  29

12    Dendrogram Based on Meteorological Variables	  30

13    Dendrogram Based on Emissions Variables	  31

-------
                                  TABLES
1    Urban Areas Included in the Profile Analysis	  5

2    Urban Areas Included in Cluster Analysis	 15

3    Ozone Monitors Corresponding to Certain Urban Areas	 18

4    Urban Area Classifications	 18

5    Identification of Urban Area Classifications	 19

6    Variables Entered and Percent of Cases Classified
     Correctly at Each Step of Discriminant Analysis	 22

7    Summary of Clusters Based on Ozone, Meteorological,
     and Emission Variables	 32

-------
                              I    INTRODUCTION
      Ozone  is  unique  among  regulated air pollutants  in that it is not
 directly  emitted  into the atmosphere from anthropogenic sources.  Rather
 it  results  from atmospheric photochemical reactions  involving hydrocarbon
 (HC)  and  nitrogen oxide  (NOX) precursors, which are  emitted in varying
 amounts by  industrial, utility, and automotive sources.   Levels of atmos-
 pheric ozone thus depend not only on the usual factors of atmospheric
 transport and  dispersion and on the amount of pollutant emitted, but also
 on  the relative amounts  and spatial distribution of  emissions of two pre-
 cursors and on the  level of solar radiation necessary to  initiate the
 ozone-producing reactions.   The speed as well as the extent of the reac-
 tions depend on the ratio of HC to NOX and on the  level of solar radia-
 tion.

      Because the  ozone-producing reactions take several hours to produce
 the maximum amounts of ozone, by the time this maximum has been reached
 much  atmospheric  dispersion has taken place.  Thus,  ozone tends to be a
 regional, rather  than a  local, problem.  Furthermore, since a period of
 maximum ozone  concentration depends on the amount  of solar radiation
 available to sustain  the photochemical reactions leading  to its produc-
 tion, the highest concentrations of ozone are reached during summer and
 early fall  when higher insolation is observed.

      The  regional nature of the problem and the lack of a direct source-
 receptor  relationship make  the direct control of ozone concentrations,
 using emissions control  strategies, more difficult.  Currently, the
 concentration  level at which the NAAQS is set is 0.12 ppm, and many urban
 areas exceed this level  more than the prescribed one time per year.
 However,  control  of HC or NOX emissions, or both,  does not necessarily
 produce corresponding reductions in ozone (EPA, 1977a).   In fact, the
 benefit to  be  derived from  controlling a given set of sources in an urban
 area  depends on current  levels of HC and NOX as well as on the location of
 those sources  relative to the location of observed ozone  maxima.  To
 assess potential  benefits of different strategies, many methods have been
 developed and  applied in recent years.

      The  simplest of  these  methods is that of proportional rollback, which
 assumes that a reduction in HC emissions will result in a proportional
 reduction in ozone.   As  pointed out above, this approach  does not work,


-------
not only because of the influence of NOX levels  on ozone-producing  reac-
tions, but also because the relationship between HC and  ozone  is  non-
linear.  Other rollback methods that take nonlinearity into account (e.g.,
the Appendix J method, 40 CFR)  also fail because of their  lack of con-
sideration of NOX.
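
     To make the proportional rollback assumption concrete, the short
sketch below computes the HC emission reduction that linear rollback
would imply for a given peak ozone level.  It is an illustration only,
not a recommended method: the function name and the optional background
term are assumptions introduced here, and the 0.12 ppm default is the
NAAQS level quoted above.

```python
def rollback_reduction(peak_ozone_ppm, standard_ppm=0.12, background_ppm=0.0):
    """Fractional HC emission reduction implied by linear rollback.

    Assumes ozone above background falls in direct proportion to HC
    emissions, which the text notes is not a realistic assumption.
    """
    if peak_ozone_ppm <= standard_ppm:
        return 0.0  # standard already met, so no reduction is implied
    return (peak_ozone_ppm - standard_ppm) / (peak_ozone_ppm - background_ppm)


# Hypothetical peak of 0.24 ppm with no background ozone
print(round(rollback_reduction(0.24), 3))  # 0.5
```

     Because the calculation ignores NOX and the nonlinearity of the
HC-ozone relationship, the value it returns should be read only as what
the rollback assumption predicts, not as a reliable control requirement.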

     More complex methods that account for the dependence of ozone
concentrations on both hydrocarbons and nitrogen oxides (EPA, 1977a) are
more successful in describing or predicting the  results  of potential ozone
control strategies.  However, these models concentrate on  the  chemistry of
the problem and do not account  for the spatial relationships between emis-
sions sources and transport and dispersion phenomena in  the region  of
interest.  To include all aspects of the problem, simulation models have
been developed that account for emissions and their spatial and temporal
relationships, atmospheric photochemical reactions, and  atmospheric trans-
port and dispersion.  Such models are required to solve  a  large and com-
plex set of equations describing all of the atmospheric  phenomena listed
above.  Moreover, these equations are solved many times  through a series
of time steps to yield temporally- and spatially-averaged  ozone
concentrations.

     Such photochemical simulation models are computationally  extremely
complex, and for use in simulating a major urban area they require  access
to a large computer.  In addition, because of the complexity of the com-
puter programs, a knowledge of atmospheric pollution processes and  com-
puter programming is necessary.  An additional characteristic  of  these
models that discourages their use for many potential ozone problems is  the
cost of setting up and running the program.  Before applying such a model
to an urban area, an extensive data base containing emissions  and meteoro-
logical data on a temporally- and spatially-resolved basis must be
developed.

     Because of the cost and the difficulties involved in  applying  complex
photochemical models to a large number of urban  areas, it  seems fruitful
to investigate ways in which knowledge and experience gained by model
application to one city could have transfer value when a second city is
being considered.  To that end, we have applied  multivariate clustering
techniques to pertinent emissions, meteorological, and ozone-level  data
from several cities in the United States.  The objective of the study  was
to determine whether urban areas could be objectively classified  by
characteristics relevant to photochemical pollution.  Identification of
city groups with similar characteristics would permit a small  number of
prototypical cities to be subjected to detailed  analysis using complex
photochemical models.  Results obtained for one  city within a  group could
be used in evaluating possible control strategies for the rest of the
group.  In addition, the performance of a model  when applied to a proto-
typical city could be used as an indicator of its likely performance when
applied to other cities in the same group.


-------
      The work described in this report covered two essentially exploratory
 studies using different clustering techniques:  first, a study employing
 profile analysis to identify similarities between 29 urban areas (chapter
 II), and second, a further analysis that applies a hierarchical  clustering
 technique to data from 45 urban areas (chapter III).

      Cluster analysis comprises a set of mathematical  techniques that are
 used to examine and develop underlying structure in multivariate data.
 The techniques range from fairly mathematical  to almost purely descrip-
 tive, and have been applied to an extremely wide range of data sets
 (Hartigan, 1975).  Clustering techniques can be thought of as a qualita-
 tive analog of regression analysis in terms of describing structural  con-
 tent of multivariate data.  In carrying out regression analysis a well-
 defined mathematical model of the data structure is used; however, in
 cluster analysis one frequently has no preconceived notions about an
 appropriate model.  Thus, whereas the goal  of regression analysis is  to
 estimate the parameters of the model, the purpose of cluster analysis is
 often merely to see whether or not the data fall into any reasonably  well-
 defined groups.
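
     As a rough illustration of what an agglomerative (hierarchical)
clustering procedure does, the sketch below merges the two closest groups
repeatedly until one group remains.  Several choices here are assumptions
made for the example and are not taken from the study: the variables are
standardized to zero mean and unit variance, Euclidean distance is used,
groups are compared by average linkage, and the four cases "A" to "D"
with their two variables are hypothetical.

```python
import math


def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def agglomerate(names, rows):
    """Average-linkage agglomerative clustering; returns the merge history."""
    z = standardize(rows)
    clusters = {i: [i] for i in range(len(names))}

    def linkage(a, b):
        # Mean Euclidean distance over all cross-cluster pairs of cases
        pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
        return sum(math.dist(z[i], z[j]) for i, j in pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        keys = sorted(clusters)
        candidates = [(p, q) for k, p in enumerate(keys) for q in keys[k + 1:]]
        a, b = min(candidates, key=lambda pq: linkage(*pq))
        height = linkage(a, b)
        members = clusters[a] + clusters.pop(b)
        clusters[a] = members
        merges.append((sorted(names[i] for i in members), height))
    return merges


# Hypothetical cases: (morning mixing height in m, afternoon wind speed in m/s)
names = ["A", "B", "C", "D"]
rows = [[300.0, 8.0], [320.0, 9.0], [700.0, 3.0], [650.0, 5.0]]
merges = agglomerate(names, rows)
for members, height in merges:
    print(members, round(height, 3))
```

     The merge history, read from first to last, is the information a
dendrogram displays: which cases join at which distance.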

-------
          II   CLASSIFICATION OF URBAN AREAS BY PROFILE ANALYSIS
     In this study the urban areas chosen were those where the ozone NAAQS
was exceeded by more than 100 percent during the period 1974 to 1976 (EPA,
1977b).  The areas included are shown in table 1.   For classification
variables, five meteorological and two emissions-related variables were
chosen.  These were:

     >  Summer morning mixing height.  This quantity was chosen as
        a measure of the volume into which morning emissions are
        injected.  These emissions are mainly responsible for
        ozone formation in the afternoon.

     >  The difference between summer afternoon and summer morning
        mixing height.  This variable represents the degree to
        which morning emissions are diluted by the increase in the
        depth of the mixed layer.

     >  Summer afternoon wind speed. This measure represents the
        dilution of the pollutant  cloud in the afternoon, when
        ozone reaches its peak in  some areas.

     >  Normal daily July maximum  temperature.  Ozone formation is
        a strong function of temperature, and areas having high
        maximum temperatures are expected to have a greater
        potential for ozone formation.

     >  Mean daily July solar radiation.  Ozone is formed photo-
        chemically, and the amount of solar radiation is an
        important measure of its formation potential.

     >  Ratio of hydrocarbon to nitrogen oxides emissions.  This
        ratio affects the rates of atmospheric photochemical reac-
        tions and the amount of ozone that can be formed.

     >  Percentage of nitrogen oxides from transportation
        sources.  This variable was chosen as a surrogate for the
        mix of point and mobile sources in the area.  This factor
        can have important effects on ozone impacts.

-------
          TABLE 1.    URBAN AREAS INCLUDED IN THE PROFILE ANALYSIS
                   Hartford-New Haven, Connecticut
                   Philadelphia, Pennsylvania
                   Chicago, Illinois
                   Milwaukee, Wisconsin
                   Houston, Texas
                   Denver, Colorado
                   San Francisco Bay Area, California
                   Fresno, California
                   Boston, Massachusetts
                   Northern New Jersey
                   District of Columbia
                   Erie, Pennsylvania
                   Richmond, Virginia
                   Newport News, Virginia
                   Huntsville, Alabama
                   Tampa, Florida
                   Louisville, Kentucky
                   Nashville, Tennessee
                   Kingsport, Tennessee
                   Detroit, Michigan
                   Minneapolis-St. Paul, Minnesota
                   Cincinnati, Ohio
                   Cleveland, Ohio
                   Baton Rouge, Louisiana
                   Dallas, Texas
                   Wichita, Kansas
                   St. Louis, Missouri
                   Salt Lake City, Utah
                   Phoenix, Arizona

-------
     The values of the variables for the different urban areas were taken
from Holzworth (1972) (mixing heights and wind speeds), the Climatic Atlas
of the United States (1968), and the 1973 Emissions Trends report
(EPA, 1974).  The meteorological data were interpolated from the maps
given in the two compilations.

     Profile analysis, as a cluster analysis technique, falls into the more
qualitative class and is very simple in both concept and execution.  First,
a set of axes, one for each variable, is laid out.  Scales on these axes
are chosen so that the ranges of the variables over the complete set of
urban areas (or "cases") cover approximately equal lengths.  Each case is
then plotted on each axis according to its value for that variable.  A
"profile" for each case can then be constructed by joining  all of the
plotted points for that case.  The similarities between the profiles can
then be used to judge similarities between urban areas.  In addition,
points of difference can be identified by discrepancies between
profiles.  Using this technique, we identified five clusters of cities,
which are illustrated in figures 1 through 5.
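
     The scaling step described above can be sketched in a few lines.
This is an illustration only: the area names and values are hypothetical,
min-max scaling is assumed as one way of making the variable ranges cover
equal lengths, and profile_gap is a hypothetical numerical stand-in for
the visual comparison of profiles used in the study.

```python
def build_profiles(cases):
    """Map {name: [v1, v2, ...]} to {name: [values scaled to 0..1 per axis]}."""
    columns = list(zip(*cases.values()))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]  # avoid divide by zero
    return {name: [(v - lo) / sp for v, lo, sp in zip(vals, lows, spans)]
            for name, vals in cases.items()}


def profile_gap(p, q):
    """Largest per-axis difference between two profiles (0 means identical)."""
    return max(abs(a - b) for a, b in zip(p, q))


# Hypothetical values: mixing height (m), wind speed (m/s), July maximum (deg F)
cases = {
    "Area A": [400.0, 6.0, 85.0],
    "Area B": [450.0, 6.5, 87.0],
    "Area C": [800.0, 3.0, 100.0],
}
profiles = build_profiles(cases)
print(profile_gap(profiles["Area A"], profiles["Area B"]))
print(profile_gap(profiles["Area A"], profiles["Area C"]))
```

     In this made-up example Areas A and B have nearly coincident
profiles, whereas Area C differs from both on every axis.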

     In figure 1, Los Angeles and San Francisco are identified as forming
a fairly good cluster of coastal California urban areas.  Some discrepancy
is noted in July maximum temperatures, but the value obtained from the
climatic atlas has more marine influence than do the values obtained in
the area having the highest ozone concentration.  In figure 2, a cluster
of midwestern urban areas consisting of Chicago, Milwaukee, Minneapolis-
St. Paul, Detroit, and Cleveland is identified.  Although their climates
are predictably similar, the emissions-related variables also reinforce
the clustering.  Figure 3 shows a group of urban areas, all of which are
on the eastern seaboard of the United States, stretching from Connecticut
to Virginia.  In figure 4 we have a cluster of eastern United States
cities consisting of Washington, D.C., Louisville, Nashville, Kingsport,
Cincinnati, and St. Louis.  These cities can be differentiated from those
in the previous cluster on the basis of lower wind speeds,  which are
presumably related to greater distance from the ocean.  This cluster seems
to be closer in terms of meteorology than it is in terms of emissions-
related variables.  Figure 5 shows profiles of three urban areas located
near the Gulf of Mexico:  Houston, Tampa, and Baton Rouge.   While these
cities do not form a particularly tight cluster, the similarities between
them are evident.

     In figure 6 we show an example of three profiles that, though they
might be expected to show similarities, in fact show large discrepan-
cies.  These profiles are for Denver, Phoenix, and Salt Lake City.  The
largest discrepancies are for July maximum temperature, July insolation,
and the ratio of HC to NOX emissions.

-------
                  FIGURE 1.   PROFILE OF CLUSTER 1
   [Profile plot.  Axes: summer morning mixing height; change in mixing
   height; summer afternoon wind speed; July maximum temperature; July
   insolation; HC/NOx; % NOx from transportation.]

-------
                  FIGURE 2.   PROFILE OF CLUSTER 2
   [Profile plot.  Axes: summer morning mixing height; change in mixing
   height; summer afternoon wind speed; July maximum temperature; July
   insolation; HC/NOx; % NOx from transportation.]

-------
                  FIGURE 3.   PROFILE OF CLUSTER 3
   [Profile plot.  Axes: summer morning mixing height; change in mixing
   height; summer afternoon wind speed; July maximum temperature; July
   insolation; HC/NOx; % NOx from transportation.]

-------
                  FIGURE 4.   PROFILE OF CLUSTER 4
   [Profile plot.  Axes: summer morning mixing height; change in mixing
   height; summer afternoon wind speed; July maximum temperature; July
   insolation; HC/NOx; % NOx from transportation.]
-------
                  FIGURE 5.   PROFILE OF CLUSTER 5
   [Profile plot.  Axes: summer morning mixing height; change in mixing
   height; summer afternoon wind speed; July maximum temperature; July
   insolation; HC/NOx; % NOx from transportation.]

-------
FIGURE 6.   PROFILES OF DENVER, PHOENIX, AND SALT LAKE CITY
   [Profile plot.  Axes: summer morning mixing height; change in mixing
   height; summer afternoon wind speed; July maximum temperature; July
   insolation; HC/NOx; % NOx from transportation.]

-------
     This preliminary analysis demonstrated the feasibility of grouping
urban areas according to the factors contributing to oxidant problems.
However, profile analysis gives no information of a quantitative nature
about the degree to which cases in a given cluster resemble each other and
differ from cases in other clusters.  Moreover, since the clusters  are
identified by visual inspection of profiles, there is an arbitrary  element
in the selection of cases.  We therefore carried out a further study  in  an
attempt to achieve a more quantitative clustering and to consider more
variables in the clustering process.

-------
      III   CLASSIFICATION OF URBAN AREAS BY HIERARCHICAL CLUSTERING
     For this further analysis,  we examined data for 45 urban areas.   They
were selected as follows:   First,  we took those major urban areas that
requested an extension to  1987 of  their attainment date for the ozone
NAAQS (Federal Register, 44, 65667).  Of these, we eliminated Wilmington,
Delaware, because some of  the required data were not available.  To this
list of urban areas we added six more, to have more comprehensive geo-
graphical coverage of the  United States.  The urban areas included are
shown in table 2 and figure 7.

     Three types of data were compiled for each urban area:  emissions,
climatological, and ozone  levels.   Emissions data for each area, obtained
from the National Emissions Data System, were taken for each county within
that area.  Three emissions variables were included: total HC emissions,
the ratio of HC to NOX emissions,  and CO emissions from transportation
sources.  The HC and NOX emissions influence the ozone-producing chemical
reactions as detailed above and, thus, should be important in classifying
urban ozone problems.  We  included CO emissions from transportation
sources as a surrogate for vehicle miles traveled in an area.  The amount
of transportation-related emissions is an important facet of urban ozone
problems.

     Climatological data were again obtained from Holzworth (1972) and the
Climatic Atlas of the United States, 1968.  Data were interpolated from
maps or taken from tabular compilations in these documents.  Since ozone
is a regional problem, regional climatology is likely to be more apposite;
it therefore seems appropriate to use data interpreted on a large-scale,
rather than local-scale, climatology.  The climatological temperature data
used for the analysis were June, July, August, and September maximum
temperatures, the annual maximum temperature, and the average maximum for
June through September.  This late-summer period generally produces the
highest ozone concentrations.  Three variables related to the amount of
sunlight at each location  were obtained:  the total hours of insolation in
the summer months, the average daily summer insolation, and the average
percent of cloud cover.

     Since the ozone-producing reactions are initiated and sustained by
sunlight, ozone formation should be sensitive to the amount of sunlight
incident at a specific location.  Average summer morning and afternoon


-------
            TABLE  2.    URBAN  AREAS  INCLUDED  IN  CLUSTER  ANALYSIS
              1    Allentown, PA
              2    Baltimore, MD
              3    Boston, MA
              4    Bridgeport, CT
              5    Butte, MT
              6    Chicago, IL
              7    Cincinnati, OH
              8    Cleveland, OH
              9    Dallas, TX
             10    Dayton, OH
             11    Denver, CO
             12    Detroit, MI
             13    Fresno, CA
             14    Hartford, CT
             15    Houston, TX
             16    Indianapolis, IN
             17    Kansas City, KN
             18    Los Angeles, CA
             19    Louisville, KY
             20    Miami, FL
             21    Milwaukee, WI
             22    Minneapolis, MN
             23    New Haven, CT
             24    New Orleans, LA
             25    New York, NY
             26    Philadelphia, PA
             27    Phoenix, AZ
             28    Pittsburgh, PA
             29    Portland, OR
             30    Providence, RI
             31    Richmond, VA
             32    Sacramento, CA
             33    Salt Lake City, UT
             34    San Bernardino, CA
             35    San Diego, CA
             36    San Francisco, CA
             37    Scranton, PA
             38    Seattle, WA
             39    Springfield, MO
             40    St. Louis, MO
             41    Trenton, NJ
             42    Ventura-Oxnard, CA
             43    Washington, DC
             44    Worcester, MA
             45    Youngstown, OH

-------
   FIGURE 7.   URBAN AREAS INCLUDED IN HIERARCHICAL CLUSTER ANALYSIS
-------
wind speeds were obtained because a higher average wind speed should favor
dispersion of precursor emissions and thus limit the concentrations of
ozone that can be formed.  Average summer morning and afternoon mixing
heights were also recorded, as well as the change in average mixing height
from morning to afternoon.  The height of the mixing layer gives a measure
of the effective volume into which emissions are discharged, and the con-
centrations reached are to a first-order approximation inversely propor-
tional to this volume.  Moreover, the change in mixing height is a measure
of the dilution of morning precursor emissions.  In some hot, interior
locations a low morning inversion is largely dissipated by afternoon,
whereas in a coastal location an inversion can persist into the afternoon,
trapping pollutants into a concentrated layer near the ground.
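
     The first-order relationship between mixing height and concentration
can be illustrated with a simple calculation.  The sketch below is a
minimal example under stated assumptions: the horizontal extent of the
air mass is fixed, no fresh emissions or chemical reactions are
considered, and the function name and input values are hypothetical.

```python
def diluted_concentration(morning_conc, morning_height_m, afternoon_height_m):
    """First-order estimate: concentration scales inversely with mixing height."""
    if morning_height_m <= 0 or afternoon_height_m <= 0:
        raise ValueError("mixing heights must be positive")
    return morning_conc * morning_height_m / afternoon_height_m


# Hypothetical interior site: mixing height grows from 500 m to 2000 m
print(diluted_concentration(1.0, 500.0, 2000.0))  # 0.25
```

     A persistent coastal inversion corresponds to an afternoon height
close to the morning height, so the same calculation yields little
dilution.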

     The ozone data used in this study were obtained from the Monitoring
and Data Analysis Division of the Office of Air Quality Planning and
Standards.  They consisted of the maximum and second highest ozone level,
the average ozone level, and the number of exceedances of the ozone
standard for 1978.  In cases where data from more than one station were
available for an urban area, the stations with the readings most represen-
tative of the area's ozone problems were chosen.  The areas for which a
differently located monitor was used are shown in table 3.
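
     The four ozone-level variables can be illustrated with a short
calculation on a series of readings.  This is a sketch under assumptions
that are not stated in the text: the input is taken to be a list of daily
maximum concentrations in ppm, the average is taken over those daily
maxima, an exceedance is counted when a value is greater than 0.12 ppm,
and the readings shown are hypothetical.

```python
def ozone_summary(daily_max_ppm, standard_ppm=0.12):
    """Summarize one monitor's readings (needs at least two values)."""
    ranked = sorted(daily_max_ppm, reverse=True)
    return {
        "maximum": ranked[0],
        "second_highest": ranked[1],
        "average": sum(daily_max_ppm) / len(daily_max_ppm),
        "exceedances": sum(1 for v in daily_max_ppm if v > standard_ppm),
    }


# Hypothetical daily maxima (ppm) for four days
summary = ozone_summary([0.08, 0.15, 0.13, 0.10])
print(summary)
```
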
A.   STEPWISE DISCRIMINANT ANALYSIS

     We first attempted to reduce the number of variables to be considered
by ascertaining which of the total number were most effective in discrimi-
nating between levels of severity of ozone problems.  To do this we
applied stepwise discriminant analysis, using the variables related to
ozone concentration level to classify the cases.  The cases were classi-
fied into five groups of approximately equal size using the variable
values shown in table 4; the urban areas in each group are identified in
table 5.  It may be seen that the groups do vary according to the variable
used for classification.
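
     The grouping by second highest ozone level in table 4 amounts to
binning each case by value.  The sketch below shows one way to do this.
The treatment of values that fall exactly on a shared boundary (16 or 18
pphm) is an assumption, since the table does not say which group they
belong to, and the function name is hypothetical.

```python
def second_highest_group(value_pphm):
    """Assign a table 4 group number from the second highest ozone level.

    Assumption: a value on a shared boundary (16 or 18) goes to the
    higher-numbered group; the end groups follow the table wording
    ("less than 12" and "more than 20").
    """
    if value_pphm < 12:
        return 1
    if value_pphm > 20:
        return 5
    if value_pphm < 16:
        return 2
    if value_pphm < 18:
        return 3
    return 4


# Hypothetical second highest readings (pphm)
for reading in (10.5, 14.0, 17.2, 19.9, 23.0):
    print(reading, second_highest_group(reading))
```
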

     Since we carried out the discriminant analysis in a stepwise manner,
those variables entered early in the analysis should be the most influen-
tial in discriminating between the groups shown in table 4 (an analogy can
be drawn using stepwise regression).  Ideally, the results of the three
classifications would show the same variables to be important, but the
results obtained allowed only general conclusions to be drawn.

     Table 6 shows the order of entry of variables for the three cases run
and the percentage of cases correctly classified, for the first 14
steps.  Entry of variables was halted when an entering variable had a
squared multiple correlation coefficient (R²), with the other variables,
of more than 0.99.  At this stage, more than 60 percent of the cases were

-------
      TABLE 3.   OZONE MONITORS CORRESPONDING TO CERTAIN URBAN AREAS
                    Area             Monitor Used
                New York            Richmond County
                Philadelphia        Morristown
                Springfield         Amherst
                Cleveland           Painesville
                San Diego           Escondido
                Ventura             Simi Valley
                New Haven           Derby
                Bridgeport          Greenwich
                San Francisco       San Jose
                Dallas              Arlington
                  TABLE 4.  URBAN AREA CLASSIFICATIONS

          Second Highest         Average Ozone            Number of
              Ozone              Concentration           Exceedances
Group     Values     No. of     Values     No. of     Values      No. of
 No.      (pphm)     Cases      (pphm)     Cases                  Cases
  1    Less than 12     6    Less than 6      9    Less than  5     13
  2           12-16    11            6-7      8            5-10      8
  3           16-18    10            7-8     10           10-15      9
  4           18-20     9            8-9     12           15-20      6
  5    More than 20     9    More than 9      6    More than 20      9

-------
TABLE 5.  IDENTIFICATION OF URBAN AREA CLASSIFICATIONS

    (a)   Based on Second Highest Ozone Concentration

Group 1         Group 2         Group 3         Group 4         Group 5
Portland        Boston          Washington      New York        Houston
Miami           Springfield     Pittsburgh      Philadelphia    Los Angeles
New Orleans     Worcester       Detroit         Baltimore       Cleveland
Dallas          Trenton         San Diego       Chicago         Ventura
Minneapolis     Youngstown      Providence      St. Louis       New Haven
Butte           Dayton          Hartford        Cincinnati      Bridgeport
                Indianapolis    Allentown       Milwaukee       Richmond
                Denver          Scranton        Sacramento      Salt Lake City
                Phoenix         San Francisco   Louisville      San Bernardino
                Fresno          Kansas City
                Seattle

    (b)   Based on Average Ozone Concentration

Group 1         Group 2         Group 3         Group 4         Group 5
Boston          Springfield     New York        Philadelphia    Houston
Worcester       Providence      Washington      Baltimore       St. Louis
Trenton         Hartford        Cincinnati      Chicago         Los Angeles
Seattle         Denver          Detroit         Pittsburgh      Ventura
Portland        Phoenix         Milwaukee       Cleveland       Salt Lake City
Miami           San Francisco   Sacramento      San Diego       San Bernardino
New Orleans     Fresno          New Haven       Bridgeport
Minneapolis     Dallas          Youngstown      Allentown
Butte                           Dayton          Scranton
                                Kansas City     Richmond
                                                Indianapolis
                                                Louisville

-------
TABLE 5 (Concluded)

    (c)   Based on Number of Exceedances

Group 1         Group 2         Group 3         Group 4         Group 5
Springfield     Boston          New York        Washington      Philadelphia
Worcester       Baltimore       Detroit         Chicago         Houston
Trenton         Cincinnati      Milwaukee       Pittsburgh      St. Louis
Youngstown      Providence      Sacramento      Bridgeport      Los Angeles
Denver          Dayton          San Diego       Allentown       Cleveland
Phoenix         Indianapolis    Hartford        Louisville      Ventura
Seattle         San Francisco   New Haven                       Richmond
Portland        Fresno          Scranton                        Salt Lake City
Miami                           Kansas City                     San Bernardino
New Orleans
Dallas
Minneapolis
Butte

-------
correctly classified.  However, with 5 to 6 variables, over 50 percent
could be correctly classified.  The data in table 6 show that somewhat
different variables are important in discriminating between groups based
on the three criteria, as would be expected given the different composi-
tion of the groups for different classification variables.
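
The halting rule described above can be illustrated with a small Python sketch that computes the squared multiple correlation of a candidate variable with the variables already entered, and refuses entry above 0.99. This is only a sketch of the tolerance test under stated assumptions: the function names and the toy data are hypothetical, and the report does not describe how its discriminant analysis program computed the statistic.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def squared_multiple_correlation(candidate, entered):
    """R^2 from regressing `candidate` on the `entered` variables plus a constant."""
    n = len(candidate)
    rows = [[1.0] + [var[i] for var in entered] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * y for r, y in zip(rows, candidate)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean = sum(candidate) / n
    ss_res = sum((y - f) ** 2 for y, f in zip(candidate, fitted))
    ss_tot = sum((y - mean) ** 2 for y in candidate)
    return 1.0 - ss_res / ss_tot

def may_enter(candidate, entered, limit=0.99):
    """True if the candidate is not almost a linear combination of entered variables."""
    return squared_multiple_correlation(candidate, entered) <= limit

# Hypothetical values of two entered variables for six cases.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
redundant = [2 * a + 3 * b for a, b in zip(x1, x2)]  # exact linear combination
distinct = [5.0, 1.0, 4.0, 2.0, 6.0, 3.0]
print(may_enter(redundant, [x1, x2]), may_enter(distinct, [x1, x2]))
```

A variable that fails this test carries almost no information beyond the variables already in the analysis, which is why entry is halted at that point.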

     Some general conclusions can be drawn from these discriminant
analyses, however:  First, the effect that appears to be most important
overall is the insolation; cloud cover and total and average insolation
are among the first variables to be entered in each case.  Next most
important appear to be precursor emissions, since all three of these vari-
ables are brought in among the first eight or so.  After these two
factors, it appears that some measure of ventilation (that is, a wind
speed or a mixing height, or both) is brought in.  Summer temperatures do
not appear to have great importance in the classifications; they are only
used after many other variables have been brought in.

B.   CLUSTER ANALYSIS

     We had hoped that the discriminant analysis would give us a clear
picture of the most influential variables to include in a cluster analy-
sis.  Because this did not happen, we tried clustering the cases on the
basis of several  different sets of variables.  According to Hartigan
(1975), this method can be used to test the stability of the clustering
process; that is, clusters that persist for different combinations of
variables have a greater probability of representing a real effect.
Accordingly, we carried out clustering using the program BMDP2M (Dixon and
Brown, 1979), with the following sets of variables:

     1)  All variables (ozone levels, meteorological variables,
         emissions).

     2)  Meteorological and emissions variables.

     3)  Meteorological variables excluding temperature variables.

     4)  Ozone level variables.

     5)  Meteorological variables.

     6)  Emissions variables.

     The clustering based on all variables should give an indication of
overall effects.  Analyses 2 and 3 give a clustering based on ozone forma-
tion potential, analysis 3 being restricted by eliminating temperatures,
which were shown to be relatively unimportant by the discriminant analy-
sis.  Analyses 4, 5, and 6 should show groups of cities that have similar
overall ozone problems, similar meteorology, and similar emissions,
respectively.

[TABLE 6.  Rotated full-page table; contents not legible in the source scan.]

     Output from BMDP2M includes a dendrogram (Everitt, 1977).  The
dendrogram based on all variables is given in figure 8.  Identification of
clusters is still, to a degree, a matter of judgment, but the dendrogram
gives quantitative information on the similarities between cases.  The
distance measure used is the Euclidean distance between cases k and l:

     d_{kl} = \left[ \sum_{i=1}^{p} (x_{ik} - x_{il})^2 \right]^{1/2}

where x_{ik} is the value of the i-th variable in the k-th case and p is
the number of variables.  The program standardizes the variables to
z-scores (subtract the mean and divide by the standard deviation) so that
distances are comparable for all variables.  The dendrogram produced by the
program is based on the single-linkage algorithm (Everitt, 1977).  In this
method, cases are joined according to the distance between them, with the
closest being joined first.  (The separation between cases is read from the
distance scale on the figure.)  Clusters are identified visually, and
several can be seen in figure 8, though the appearance of this dendrogram
is indicative of little group structure (Everitt, 1977).  Five clusters can
be tentatively identified:

     1)  Boston, Hartford, Bridgeport,  New Haven, Providence,
         Worcester, Springfield, Scranton.

     2)  St. Louis, Kansas City.

     3)  Cleveland, Detroit, Milwaukee.

     4)  Baltimore, Washington, Allentown, Richmond, Youngstown,
         Dayton, Indianapolis, Louisville, Cincinnati,
         Philadelphia.

     5)  Fresno, Sacramento.

The  first  cluster  would represent urban areas in New England, and the
second, the midcontinent.  The third cluster has cities in the Great Lakes
area, and  the fourth includes  the Ohio  river valley and the East Coast.
Cluster 5  has warm, dry, interior California areas.  Thus, these clusters
can  be interpreted mainly on a geographical  basis.
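
The distance calculation described earlier in this section (z-scores followed by Euclidean distance) can be sketched in Python as follows. The function names and the three-case data set are hypothetical, and the report does not say whether BMDP2M divides by n or n - 1 when computing the standard deviation; the sketch assumes the sample (n - 1) form.

```python
from math import sqrt
from statistics import mean, stdev

def standardize(cases):
    """Convert each variable (column) to z-scores: subtract mean, divide by SD."""
    columns = list(zip(*cases))
    mus = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]  # assumption: sample SD (n - 1)
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in cases]

def euclidean(a, b):
    """Euclidean distance between two cases given as equal-length lists."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical cases: (second highest ozone in pphm, summer temperature in deg F).
cases = [[12.0, 70.0], [16.0, 80.0], [20.0, 90.0]]
z = standardize(cases)
print(euclidean(z[0], z[1]), euclidean(z[0], z[2]))
```

Without the standardization step, the temperature column would dominate the distance simply because its numerical range is larger.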

     Figure 9 shows the dendrogram based on meteorological and emissions
variables.  Again, there is a lack of obvious clusters, though more group

-------
[Figure 8.  Dendrogram based on all variables (vertical axis: distance).  Graphic not recoverable from the source scan.]

-------
[Figure 9.  Dendrogram based on meteorological and emissions variables.  Graphic not recoverable from the source scan.]

-------
structure is apparent than in figure 8.   However, it is hard to identify
many clear clusters.  Possible clusters  are:

     1)  Boston, Scranton, Springfield,  Worcester, Milwaukee,
         Minneapolis, Cleveland, Bridgeport,  New Haven, Hartford,
         Trenton, Philadelphia, Providence.

     2)  St. Louis, Indianapolis, Louisville, Cincinnati.

     3)  Dayton, Allentown, Washington,  Youngstown, Richmond,
         Baltimore.

After these three clusters are identified, the remaining cases show several
pairs and small groups of similar cases:

     4)  Detroit, Chicago.

     5)  Ventura, San Francisco.

     6)  Butte, Salt Lake City, Denver.

     7)  Miami, New Orleans, Dallas, Houston.

     8)  Fresno, Sacramento.

Again, there are the obvious geographical connotations to the clusters,
except for cluster 1, which consists mostly of the New England area but
also includes Milwaukee, Minneapolis, and Cleveland.

     When temperatures are left out of the analysis, we obtain the dendro-
gram in figure 10.  There is a little more structure in this diagram, and
we identify these clusters:

     1)  Springfield, Worcester,  Bridgeport, New Haven, Hartford,
         Providence.

     2)  Scranton, Trenton.

     3)  St. Louis, Indianapolis, Louisville, Dayton, Cincinnati,
         Milwaukee, Minneapolis,  Cleveland.

     4)  Baltimore, Richmond, Youngstown, Washington, Allentown.

     5)  Seattle, Philadelphia.

     6)  Fresno, Sacramento.

[Figure 10.  Dendrogram for the analysis with temperature variables left out.  Graphic not recoverable from the source scan.]
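
Hartigan's stability idea can be checked against the three tentative cluster lists above by counting how often each pair of cities falls in the same cluster. The Python sketch below is illustrative only: the function name is hypothetical, and only three clusters from each list are transcribed, so the result applies to those clusters alone.

```python
from collections import Counter
from itertools import combinations

def co_membership(analyses):
    """Count, for every pair of cities, the analyses that cluster them together."""
    counts = Counter()
    for clusters in analyses:
        for cluster in clusters:
            for pair in combinations(sorted(cluster), 2):
                counts[pair] += 1
    return counts

analyses = [
    # Figure 8 (all variables): clusters 2, 4, and 5.
    [["St. Louis", "Kansas City"],
     ["Baltimore", "Washington", "Allentown", "Richmond", "Youngstown",
      "Dayton", "Indianapolis", "Louisville", "Cincinnati", "Philadelphia"],
     ["Fresno", "Sacramento"]],
    # Figure 9 (meteorological and emissions variables): clusters 2, 3, and 8.
    [["St. Louis", "Indianapolis", "Louisville", "Cincinnati"],
     ["Dayton", "Allentown", "Washington", "Youngstown", "Richmond", "Baltimore"],
     ["Fresno", "Sacramento"]],
    # Figure 10 (temperatures left out): clusters 3, 4, and 6.
    [["St. Louis", "Indianapolis", "Louisville", "Dayton", "Cincinnati",
      "Milwaukee", "Minneapolis", "Cleveland"],
     ["Baltimore", "Richmond", "Youngstown", "Washington", "Allentown"],
     ["Fresno", "Sacramento"]],
]

counts = co_membership(analyses)
stable = sorted(pair for pair, n in counts.items() if n == len(analyses))
for pair in stable:
    print(pair)
```

Pairs that appear in every analysis, such as Fresno and Sacramento, are the ones most likely to reflect a real similarity rather than an artifact of the variable set.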

     On the basis of the analyses performed to this point,  no obvious  pat-
tern emerges.  As with the discriminant analysis,  the results obtained
appear to depend more on the  details of the analysis than on any under-
lying structure in the data.   The problem may lie  with the  algorithm used
in BMDP2M, which does not deal  effectively with noisy data  even when there
is clear structure (Everitt,  1977).   Possibly a different algorithm, or
use of a divisive rather than an agglomerative technique, would be more
successful.
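
For reference, the single-linkage procedure can be sketched in a few lines of Python. This is a generic textbook version, not the BMDP2M implementation; the function name and the one-dimensional toy data are hypothetical. The example also shows the chaining behaviour that makes the method sensitive to noise: a run of closely spaced intermediate cases joins two cases that are far apart, and does so at a small distance.

```python
def single_linkage(points, distance):
    """Agglomerative single-linkage clustering.

    Returns the merge history as (distance, cluster_a, cluster_b) tuples,
    closest pair first. The distance between two clusters is the smallest
    distance between any member of one and any member of the other.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(distance(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((d, clusters[a], clusters[b]))
        merged = tuple(sorted(clusters[a] + clusters[b]))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges

# Hypothetical one-dimensional cases: a chain 0, 1, 2, 3 and an outlier at 10.
points = [0.0, 1.0, 2.0, 3.0, 10.0]
history = single_linkage(points, lambda x, y: abs(x - y))
print([d for d, _, _ in history])
```

In this toy example the cases at 0 and 3 end up in one cluster at a linkage distance of 1, even though they are 3 units apart, because each neighbouring pair along the chain is only 1 unit apart.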

     Dendrograms based on ozone levels, meteorological variables, and emis-
sions variables alone are given in figures 11, 12, and 13.   In these cases
the algorithm has been more successful in identifying clusters, and these
clusters are listed in table 7.  As  would be expected, clustering based  on
meteorology alone produces geographically close groups.  The other two
types of variables, however,  produce clusters that do not have any geo-
graphical component to them at all.   For instance, it appears that Boston
and Seattle resemble each other in terms of their  ozone levels.  The
values of the variables for these two cities are,  respectively, maximum
ozone, 16.9 and 16.0 pphm; second highest ozone, 13.8 and 14.0 pphm;
average ozone, 5.7 and 4.3 pphm; and numbers of exceedances, six and
four.  Similarly, based on emissions, St. Louis and San Diego are in the
same cluster.  Their emissions are,  respectively:   HC, 127,000 and
137,000; HC/NOX, 1.42 and 1.42; and CO, 495,000 and 1,000.

-------
[Figure 11.  Dendrogram based on ozone level variables.  Graphic not recoverable from the source scan.]

-------
[Figure 12.  Dendrogram based on meteorological variables.  Graphic not recoverable from the source scan.]
-------
[Figure 13.  Dendrogram based on emissions variables.  Graphic not recoverable from the source scan.]
-------
TABLE 7.   SUMMARY OF CLUSTERS BASED ON OZONE, METEOROLOGICAL, AND
           EMISSION VARIABLES

          (a)   Clusters Based on Ozone Variables  (figure 11)

  1)    Boston, Trenton, Worcester, Seattle, Phoenix, Fresno,
        Denver, Dallas, Youngstown.
  2)    Philadelphia, Louisville, Chicago, San Diego, Scranton,
        Allentown, Pittsburgh, Washington, Providence, San
        Francisco, Detroit, Hartford, Sacramento, Kansas City,
        Baltimore, Cincinnati, Milwaukee, New York.
  3)    Cleveland, St. Louis, Bridgeport, New Haven, St. Louis,
        Houston.
  4)    Miami, Minneapolis, Butte, Portland.

          (b)   Clusters Based on Meteorological Variables  (figure 12)

  1)    Boston, Worcester, Providence, New Haven, Bridgeport,
        Hartford.
  2)    Chicago, Milwaukee, Detroit, Minneapolis, Cleveland.
  3)    Washington, Richmond, Louisville, Cincinnati,
        Indianapolis, Dayton, St. Louis.
  4)    Pittsburgh, Youngstown.
  5)    Baltimore, Allentown, Trenton, Philadelphia.
  6)    Ventura, Los Angeles.

          (c)   Clusters Based on Emissions Variables  (figure 13)

  1)    St. Louis, San Diego, San Francisco, Milwaukee.
  2)    Ventura, New Haven, Indianapolis, Hartford, Denver,
        Louisville, Scranton, New Orleans, Kansas City,
        Trenton.
  3)    St. Louis, San Bernardino, Pittsburgh.
  4)    Baltimore, Allentown, Richmond, Youngstown, Providence,
        Washington.
  5)    Springfield, Portland, Fresno, Dayton, Worcester,
        Sacramento, Bridgeport, Cincinnati.
  6)    Phoenix, Minneapolis, Seattle, Miami, Dallas,
        Philadelphia.



-------
                      IV    SUMMARY  AND  RECOMMENDATIONS
     The analyses carried out for this study do not lead to a definite
conclusion about the possibility of classifying cities by using combina-
tions of characteristics such as we have used here.  On the one hand, the
profile analysis appears qualitatively to show that there are definite
resemblances and differences, and the discriminant analysis was reasonably
successful in classifying the ozone problems of the cities on the basis of
a set of variables that included both meteorology and emissions.  On the
other hand, the agglomerative hierarchical clustering algorithm with which
we attempted some quantitative clustering failed to achieve a clear-cut
classification.  This technique is known to be susceptible to failure in
the presence of noisy data, and it is possible that a different agglomera-
tive algorithm (e.g., Ward, 1963) or a divisive technique such as the
Automatic Interaction Detector (A.I.D.) method (Sonquist and Morgan, 1963,
1964) could give better results.  These methods are more robust in the
presence of noisy data.
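
As a point of comparison, Ward's (1963) method merges, at each step, the two clusters whose union gives the smallest increase in the within-cluster sum of squares. A minimal Python sketch of that merge cost is given below; the function names and sample points are hypothetical, and the sketch shows only the cost calculation, not a complete clustering program.

```python
def centroid(cluster):
    """Mean of each variable over the cases in a cluster."""
    n = len(cluster)
    return [sum(p[k] for p in cluster) / n for k in range(len(cluster[0]))]

def within_ss(cluster):
    """Sum of squared distances of the cases from their cluster centroid."""
    c = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in cluster)

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq

# Hypothetical standardized cases in two variables.
a = [[0.0, 0.0], [2.0, 0.0]]
b = [[5.0, 4.0]]
# The shortcut formula should equal the sum of squares after merging
# minus the sums of squares before merging.
print(ward_cost(a, b), within_ss(a + b) - within_ss(a) - within_ss(b))
```

Because the cost depends on cluster centroids and sizes rather than on the single closest pair of cases, one stray intermediate case has less influence on which clusters are joined than it does under single linkage.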

     We believe that the results presented here indicate that classifica-
tion techniques can be used to identify urban areas with similar ozone
problems.  However, more work is necessary to determine the best group-
ings.  One possible approach would be to apply principal components or
factor analysis to identify groups of variables that best account for
variations in the data.  An alternative to this approach would be to apply
insights into the physical nature of the problem.  Once an appropriate set
of variables has been identified, clustering algorithms could be applied
to the data; many of these algorithms can be found in the work of Hartigan
(1975).
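
The principal components suggestion can be illustrated with a short Python sketch that extracts the first component of a correlation matrix by power iteration. This is a generic illustration, not a procedure used in this study; the function names and the three toy variables are hypothetical, and a real application would use an established statistical package.

```python
from math import sqrt
from statistics import mean, stdev

def correlation_matrix(columns):
    """Correlation matrix of the variables; each variable is a list of case values."""
    n = len(columns[0])
    z = []
    for col in columns:
        m, s = mean(col), stdev(col)
        z.append([(x - m) / s for x in col])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]

def first_component(matrix, iterations=200):
    """Leading eigenvector (loadings) and eigenvalue, found by power iteration."""
    v = [1.0] * len(matrix)
    for _ in range(iterations):
        w = [sum(m * x for m, x in zip(row, v)) for row in matrix]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(m * x for m, x in zip(row, v)) for row in matrix]
    eigenvalue = sum(a * b for a, b in zip(v, mv))
    return v, eigenvalue

# Hypothetical data: the first two variables are perfectly correlated.
variables = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [3.0, 1.0, 4.0, 1.0, 5.0],
]
loadings, eigenvalue = first_component(correlation_matrix(variables))
share = eigenvalue / len(variables)  # fraction of total standardized variance
print(loadings, share)
```

Variables with similar loadings on a component (here the first two) carry largely the same information, so one representative from each such group could be passed to the clustering step.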

-------
                                REFERENCES
Dixon, W. J., and M. B. Brown, eds. (1979), Biomedical Computer Programs
     P-Series, Systems, Program and Statistical Development  (University of
     California Press, Berkeley, California).

EPA (1977a), "Uses, Limitations and Technical Basis of Procedures for
     Quantifying Relationships Between Photochemical  Oxidants and
     Precursors," EPA-450/2-77-021a, U.S. Environmental Protection Agency,
     Research Triangle Park, North Carolina.

EPA (1977b), "National Air Quality and Emissions Trends Report, 1976,"
     EPA-450/1-77-002, U.S. Environmental Protection  Agency, Research
     Triangle Park, North Carolina.

EPA (1974), "Monitoring and Air Quality Trends Report, 1973," EPA-450/1-
     74-007, U.S. Environmental Protection Agency, Research  Triangle Park,
     North Carolina.

Federal Register (1979), Vol. 44, No. 221, Nov. 14, 1979.

Everitt, B. (1977), Cluster Analysis (Heinemann Educational  Books, London,
     England).

Hartigan, J. A. (1975), Clustering Algorithms  (John Wiley &  Sons, New
     York, New York).

Holzworth, G. C. (1972), "Mixing Heights, Wind Speeds, and Potential for
     Urban Air Pollution Throughout the Contiguous United States," AP-101,
     U.S. Environmental Protection Agency, Research Triangle Park, North
     Carolina.

The Climatic Atlas of the United States, 1968  (U.S. Department of
     Commerce, Washington, D.C.).

Sonquist, J. A., and J. N. Morgan (1964), "The Determination of
     Interaction Effects," Survey Research Centre, Institute of Social
     Research, University of Michigan.

Sonquist, J. A., and J. N. Morgan (1963), "Problems in the Analysis of
     Survey Data and a Proposal," J. Am. Stat. Assoc., Vol. 58,
     pp. 415-435.

Ward, J. H. (1963), "Hierarchical Grouping to Optimize an Objective
     Function," J. Am. Stat. Assoc., Vol. 58, pp. 236-244.

-------
                          TECHNICAL REPORT DATA
        (Please read Instructions on the reverse before completing)

1.  REPORT NO.
    EPA-450/4-81-031e
2.
3.  RECIPIENT'S ACCESSION NO.
4.  TITLE AND SUBTITLE
    Evaluating Simple Oxidant Prediction Methods Using
    Complex Photochemical Models: Cluster Analysis Applied
    to Urban Ozone Characteristics
5.  REPORT DATE
    August 1981
6.  PERFORMING ORGANIZATION CODE
7.  AUTHOR(S)
    Martin J. Hillyer
8.  PERFORMING ORGANIZATION REPORT NO.
    SAI No. 81176
9.  PERFORMING ORGANIZATION NAME AND ADDRESS
    Systems Applications, Incorporated
    101 Lucas Valley Road
    San Rafael, California 94903
10. PROGRAM ELEMENT NO.
11. CONTRACT/GRANT NO.
    68-02-2870
12. SPONSORING AGENCY NAME AND ADDRESS
    U.S. Environmental Protection Agency
    Office of Air Quality Planning and Standards
    Research Triangle Park, North Carolina 27711
13. TYPE OF REPORT AND PERIOD COVERED
14. SPONSORING AGENCY CODE
15. SUPPLEMENTARY NOTES

16. ABSTRACT
This report describes efforts to classify cities observing ozone levels greater than
0.12 ppm into distinct subgroups. Cluster analysis, using such factors as mixing height,
wind speed, temperature, NMOC/NOx ratio, and type of precursor sources, is used to
identify subgroups of cities. Identification of a limited number of such subgroups
could provide a means for more convincingly demonstrating the general applicability
of complex photochemical models by conducting validation exercises in cities repre-
sentative of each subgroup. The report indicates that the technique shows promise
but, nevertheless, requires some further refinement before it can be used to identify
the most appropriate subgroups.

17. KEY WORDS AND DOCUMENT ANALYSIS
    a. DESCRIPTORS
       Photochemical models
       Ozone
       Cluster analysis
    b. IDENTIFIERS/OPEN ENDED TERMS
    c. COSATI Field/Group
18. DISTRIBUTION STATEMENT
    Unlimited
19. SECURITY CLASS (This Report)
20. SECURITY CLASS (This page)
21. NO. OF PAGES
    42
22. PRICE

EPA Form 2270-1 (Rev. 4-77)    PREVIOUS EDITION IS OBSOLETE

-------