EPA-650/4-74-038

October 1974                  Environmental Monitoring Series
                PROCEEDINGS
            OF THE SYMPOSIUM
         ON STATISTICAL  ASPECTS
           OF AIR QUALITY  DATA
                  Meteorology Laboratory
              National Environmental Research Center
               Office of Research and Development
               U.S. Environmental Protection Agency
                Research Triangle Park, N.C. 27711

-------
Research reports of the Office of Research and Development, Environmental
Protection Agency, have been grouped into five series.  These five broad
categories were established to facilitate further development and application
of environmental technology.  Elimination of traditional grouping was
consciously planned to foster technology transfer and a maximum interface
in related fields. The five series  are:

      1. Environmental Health Effects Research
      2. Environmental Protection Technology
      3. Ecological Research
      4. Environmental Monitoring
      5. Socioeconomic Environmental Studies

This report has been assigned to the ENVIRONMENTAL MONITORING series.
This series  describes research  conducted to develop new or improved
methods and instrumentation for the identification and quantification of
environmental pollutants at the lowest conceivably significant  concentrations.
It also includes  studies to determine the ambient concentrations of pollutants
in the environment and/or the variance of pollutants as a function of time
or meteorological factors.
Copies of this report are available free of charge to Federal employees,
current contractors  and grantees, and nonprofit organizations - as supplies
permit - from the Air Pollution Technical Information Center, Environmental
Protection Agency, Research Triangle Park, North Carolina 27711. This
document is also available to the public for sale through the Superintendent
of Documents, U. S. Government Printing Office, Washington, D. C.  20402.

-------
                                    EPA-650/4-74-038
               PROCEEDINGS

           OF THE  SYMPOSIUM

       ON  STATISTICAL  ASPECTS

         OF  AIR  QUALITY DATA
                      Editor:
               Dr. Lawrence D. Kornreich
          Executive Director, Triangle Universities
              Consortium on Air Pollution
                Symposium Sponsors:
Meteorology Laboratory, National Environmental Research Center
                       and
      Triangle Universities Consortium on Air Pollution
               Contract No. 68-02-0994
       (with University of North Carolina, Chapel Hill)
                   ROAP No. 21ADO
             Program Element No. 1AA009

          EPA Project Officer: Dr. Ralph Larsen
                    Prepared for

      U.S. ENVIRONMENTAL PROTECTION AGENCY
           Office of Research and Development
          National Environmental Research Center
            Research Triangle Park, N.C. 27711

                   October 1974

-------
This report has been reviewed by the Environmental
Protection Agency and approved for publication.   Ap-
proval does not signify that the contents necessarily
reflect the views and policies of the Agency,  nor
does mention of trade names or commercial products
constitute endorsement or recommendation for use.
        Publication No. EPA-650/4-74-038

-------
 PREFACE

     The Symposium on Statistical Aspects of Air Quality Data was held on November
 9 and 10, 1972, at the Carolina Inn, in Chapel Hill, N. C., in accordance with the
 terms of a contract between the Division of Meteorology* of the U. S. Environ-
 mental Protection Agency (EPA) and the University of North Carolina at Chapel
 Hill (UNC).
     Although UNC was the contractor, it was agreed that the symposium would
 be  sponsored  by  the  Triangle Universities Consortium  on  Air  Pollution
 (TUCAP), an association of Duke University,  North Carolina State University
 and the University of North Carolina at Chapel Hill. The project officer for
 EPA was Mr. Charles R. Hosier; the  responsible officer for TUCAP was Dr.
 Lawrence D. Kornreich. The detailed planning for the symposium was done by a
 steering committee representing both TUCAP and EPA.
     All papers that were presented at the symposium are included in this vol-
 ume. Most of the technical papers were reprinted and distributed to the partici-
 pants prior to the meeting. An open discussion followed the presentation of each
 technical paper, and the questions and answers were recorded and transcribed.
 Each discussant was given the  opportunity to review and edit his comments.
 Only those comments which were  reviewed and approved  by the discussants
 appear in this volume.
     For their outstanding performances at the banquet, special thanks are due
 Mr. Donald Pack of the National Oceanic and Atmospheric Administra-
 tion, who acted as Toastmaster, and Dr. Roy Kuebler of the UNC Department of
 Biostatistics who was the featured speaker.
     The registration of participants  and preparation of information packets was
 ably handled by Continuing Education and Field Service of the UNC School of
 Public Health. The audio-visual arrangements throughout  the meeting were ca-
 pably managed  by Mr. Lewis Kontnick and Mr. E. James Dale, graduate students
 in the air pollution curriculum at UNC.
     I  am  especially grateful  to  Professor  Arthur  C.  Stern  of  the  UNC
 Department of Environmental Sciences for his help and guidance in preparing
 this volume.  I  also  want  to  thank  my  secretaries for  their  outstanding
 service—Mrs. Jean Lang during the planning and holding of the symposium, and
 Mrs. Ann Harrell during the preparation of this volume.
                                                   Lawrence D. Kornreich
                                                   Chapel Hill, N.  C.
                                                   February, 1974
*Now the Meteorology Laboratory of the National  Environmental Research Center, Re-
 search Triangle Park, N. C.
                                                                       iii

-------
STEERING COMMITTEE
Kenneth L. Calder, Chief Scientist, Division of Meteorology, Environmental Pro-
    tection Agency.
Kenneth R. Knoerr, Associate Professor, Biometeorology, Duke University.
Lawrence D. Kornreich, Executive Director, Triangle Universities Consortium on
    Air Pollution
Ralph  I. Larsen,  Environmental Research Engineer, Division of  Meteorology,
    Environmental Protection Agency.

Arthur C. Stern,  Professor of  Air Hygiene, University  of North Carolina at
    Chapel Hill
Allen H. Weber,   Associate  Professor,  Geosciences,  North  Carolina  State
    University
SESSION CHAIRMEN


Irving Singer, Smith-Singer Meteorologists

Donald Rote, Argonne National Laboratory


James Arvesen, Purdue University


Arnold Court, California State University


George Tiao, University of Wisconsin


Ray Wanta, Consulting Meteorologist


 iv

-------
WORKSHOP LEADERS


Ralph Larsen, Division of Meteorology, U. S. Environmental Protection Agency


Donald McNeil, Princeton University
Warren Johnson, Division of Meteorology,  U. S.  Environmental Protection
    Agency
Victor Hasselblad, Human Studies Laboratory, U. S. Environmental Protection
    Agency

-------
CONTENTS


 1.   Robert A. McCormick                                          1-1
     Welcoming Remarks


 2.   Arthur C. Stern                                                2-1
     Keynote Address: Statistical Analysis of Air Quality Data
 3.   Frank A. Gifford, Jr.                                           3-1
      The Form of the Frequency Distribution of Air Pollution
      Concentrations
 4.  F. K. Wipperman                                               4-1
     Meteorological Parameters Relevant in a Statistical Analysis
     of Air Quality Data
 5.  Michael Benarie                                                5-1
     The Use of the Relationship between Wind Velocity and
     Ambient Pollutant Concentration Distributions for the
     Estimation of Average Concentrations from Gross
     Meteorological Data
 6.  Richard E. Barlow and Nozer D. Singpurwalla                     6-1
     Averaging Time and Maxima for Dependent Observations
 7.  Allan H. Marcus                                               7-1
     A Stochastic Model for Estimating Pollutant Exposure by
     Means of Air Quality Data
 8.  Harold E. Neustadter, Steven M. Sidik and John C. Burr, Jr.         8-1
     Evaluating Conformity with Two-Point Air Quality Standards,
     Polludex
 VI

-------
 9.   Joseph B. Knox and Richard I. Pollack                           9-1
     An Investigation of the Frequency Distributions of Surface
     Air-Pollutant Concentrations
10.   D. Bruce Turner                                                10-1
     Air Quality Frequency Distributions from Dispersion Models
     Compared with Measurements
11.   Bernard E. Saltzman                                            11-1
     Fourier Analysis of Air Monitoring Data
12.   F. Barry Smith and G. H. Jeffrey                                 12-1
     The Prediction of High Concentrations of Sulfur Dioxide in
     London and Manchester Air
13.   David A. Lynn                                                  13-1
     Fitting Curves to Urban Suspended Particulate Data
14.   Joseph R. Visalli, David L. Brenchley and Howard Reiquam        14-1
     A Proposed Ambient Air Quality Sampling Strategy and
     Methodology for the Design of Surveillance Networks
15.   Yuji Horie and John H. Overton                                  15-1
     The  Effect on Rollback Models Due to Distribution of Pollutant
     Concentrations
16.   Symposium Participants                                         16-1


17.   Bibliographic Datasheet                                         17-1
                                                                    VII

-------
                     1. WELCOMING REMARKS
                      ROBERT A. McCORMICK*

                        Division of Meteorology
               National Environmental Research Center
                  Environmental Protection Agency
                Research Triangle Park, North Carolina

    On behalf of the National Environmental Research Center in the Research
Triangle Park, North Carolina, and in particular the Division of Meteorology,
which has actually sponsored the Symposium, I would like to warmly welcome you
all and say how pleased  we are  that so many distinguished investigators have
found time to attend, especially those from overseas.
    Before Professor Stern's opening remarks, I would like to say a few words as
to why we in the Division of Meteorology (DMT)** were anxious to support this
Symposium. Following the  pioneering efforts of Fran Pooler and  Bruce Turner,
our  primary  efforts  have  been  in  the  development  of  source-oriented
diffusion-type  models. Because of their  source-oriented  structure, they  have
potentially wide application in the  consideration of the effects on air quality of
hypothetical and arbitrary  emission control strategies. They  thus  provide  a
rational basis for air quality management through control of selected sources of
pollution. At the present time the development  and improvement of such air
quality simulation models has the  highest priority of all  items in the research
program of the DMT.
    In contrast are those models that primarily involve some form of statistical
regression analysis and that depend entirely  on  the  availability of  extensive
meteorological and  air quality data for a particular urban location.  Over the
years these  statistical approaches have become increasingly sophisticated and
now  include  such  things  as  "multiple-discriminant  analysis",  "empirical
orthogonal functions",  "factor  analysis", and most  recently "computerized
adaptive  pattern classification". Although these developments have had useful
applications to  specific problems, the fact that they are  receptor- rather than
source-oriented  and do not  involve any explicit input of information concerning
pollution emissions, so far makes them not applicable to comparative studies of
control  strategies.  This  is the reason  for their lower priority  in our  research
program  as compared with  the source-oriented dispersion-type models. I  have
intentionally said "so far"  in the preceding  as it  seems to us that a question
 *On assignment from the National Oceanic and Atmospheric Administration, U. S. Depart-
  ment of Commerce at the time of the Symposium. Mr. McCormick has since retired from
  Federal service.
**Now the Meteorology Laboratory.
                                   1-1

-------
which ought to be resolved is whether some form of statistical predictive scheme
could be evolved that would incorporate hypothetical changes in the pollution
emissions distribution without any direct use of meteorological diffusion theory.
Such a  model would,  of course, then possess the desirable source-oriented
property.
     By analogy with the  striking developments of the  last decade  or  so  in
turbulent  fluid  mechanics,  we   can   hope  that advances  and  improved
understanding of air quality simulation  will  result from a  marriage of statistical
techniques with more precise physical formulations of  the problems. It was this
strong feeling that suggested  the need for the present Symposium. Perhaps this
will  stimulate a stronger interaction between some of the more meteorologically
inclined  of our air quality modelers and non-meteorological statistical experts.
No less important is the hope that the workshop sessions will more clearly define
for our air quality modeling fraternity those areas and approaches where
improved cooperation between meteorologist and statistician might  be most
helpful  and fruitful  in the immediate future, in much the same manner as the
clarification now being achieved between the meteorologists and atmospheric
chemists.
1-2

-------
                           Keynote Address
      2. STATISTICAL ANALYSIS OF AIR QUALITY DATA
                         ARTHUR C. STERN

       Department of Environmental Sciences and Engineering
                     University of North Carolina
                      Chapel Hill, North Carolina
Introduction

    The need for a Symposium on Statistical Analysis of Air Quality Data arises
from  a  quite  diverse assortment  of  challenges to  our  understanding of the
meaning of air quality data now coming into sharper focus. A closer look at
these  challenges will give a better understanding of some of the answers we are
seeking at this Symposium.
    Much  of the research effort  in the field of air  pollution is and has been
directed to the development of air quality criteria,  and through them to the
establishment of air quality standards. Since air quality criteria are explorations
of the relationship  between levels of air quality and the adverse effects found in
receptors exposed to  these levels, it is essential that there be precise description
of both these levels and their associated adverse effects. It is not the function of
this Symposium to discuss the precise description of the adverse effects, but it is
our function to discuss the precise description of the levels of exposure that
cause  them.
Simple Chamber Atmospheres

    Our simplest task is to describe the exposure in a chamber in which one or
more receptors are being experimentally exposed. These receptors may variously
be materials specimens, such as textiles, paper  or  leather; vegetation,  such as
plants,  lichens, or bacteria; animals, such as monkeys, guinea  pigs, or mice; or
human  volunteers. However, even  in this  most simple  situation an adequate
description of the quality  of the air in the chamber must reflect the variance in
the system generating the chamber atmosphere; the decay in contaminant level
in the chamber due to wall effects and absorption by the chamber's contents, such as
the fur of exposed animals and their urine and feces; and the cyclic response associated
with the effect of activity on animal uptake and of light intensity on plant uptake.
                                  2-1

-------
Complex Chamber Atmospheres

    Description of the  level of exposure becomes  more complicated  when
long-term exposure is adjusted to the working hours of the experimenters, as
when exposures are for  8 hours a day, 5 days a week, with occasional hours or
days lost due to staff holidays, equipment maintenance or malfunction, or other
causes.
    A still higher level of skill is needed to describe the level that results when
the experimenter,  instead  of  attempting  to  maintain a constant  level,
deliberately attempts to  simulate in  the chamber some of the  changes  in air
quality  level observed in the ambient air. This is a problem with  which some of
my colleagues at UNC-Chapel Hill  are now having to cope in connection with the
introduction of reactants into a 12,000 cubic foot, naturally irradiated "Teflon"
chamber under construction  a  few miles from here. The intent is to introduce
the reactants at a rate that will simulate what happens  in the ambient air of a
community  such  as  Los Angeles on  a typical weekday  morning. Precise
description of the quality of  the contents of the chamber will be as complex as
describing the quality of ambient air.
    In effects research,  be it of materials, plants, animals, or humans, one of the
things  we most need to know  is the relative influence  on the adverse effects
observed  of  short-duration,  high-concentration  spikes  superimposed on  long
sustained  average  levels. Very few, if any, chamber experiments attempt this
type of superimposition and raise the problem of air quality description which it
would  impose.  The importance of this kind  of  information  arises  because we
need to know  in which  classes of receptors adverse effects are proportional to
integrated dose, and  in which classes  protective or  defense mechanisms are
inhibited by short-duration, high-concentration spikes so that adverse effect is
more than proportional to integrated dose.
Ambient Air Quality Data for Specific Effects

    As great as is the challenge of providing precise description of exposure
chamber atmospheres, even more challenging is the task of precisely describing
the ambient air. Air quality criteria development also requires that the exposure
of materials, livestock, forests, crops, and human populations to the ambient air
be described during both short-term episodic conditions and for long periods of
time, up to the lifetimes of  viable receptors. Here we have three situations of
increasing complexity. The first, and least complex, is providing a description of
air  quality at a  fixed location  in the field where material specimens are being
exposed; test  crops are  being  grown; or animals are being maintained under
observation and air quality is being monitored.
    Next in level of complexity is providing a description of the quality of the
air  associated with a  specific observed adverse effect occurring at other than a
2-2

-------
fixed experimental location, as in the former case. Typical situations are:
     (a) Gradual or sudden increase in clinic or hospital admissions for asthmatic
attack, respiratory or cardiac disease.
     (b) Gradual or sudden awareness of specific damage to trees, crops, or
livestock.
     (c) Gradual or sudden awareness of specific damage to materials.
     In  such  situations, it is rare that  air was being monitored  at the precise
location where the person who was hospitalized lived,  the damaged crop was
being grown, or the damaged material  was in use. However, part of the reason
why we are here today is to better define the kinds of air quality measurements
that  must be  made to provide air  quality data to relate to these types of
situations. A closely related reason for our Symposium is the as-yet-unresolved
role of air quality measurement in the prediction  and management of short-term
air  pollution episodes. There is no question but that we need good description of
air  quality to  understand what has  happened  and  is happening during air
episodes. However, the extent to which air quality data can be used to forecast the
occurrence or course of an episode is still not clear and is an area into which a
group such as this should  be able to provide some  insight.
Ambient Air Quality Data for Epidemiological Studies

    Our most  difficult and  imperative task in the statistical  analysis of air
quality  data  is to provide descriptors of air quality that are  meaningful for
understanding epidemiological data on human  mortality and morbidity, since
this is the description of air safe to breathe. We need to know where to measure,
what  to measure, how frequently to measure, and how to analyze and interpret
the data we measure.
    Once air quality  criteria are endowed with meaningful descriptions of air
quality, we are in a position to select some of these descriptors as air quality
standards. To date, we have chosen such descriptors rather sparingly, using  only
averaging time  and simple statements of frequency of  occurrence. Can  you
supply better ones?
Federal Air Quality Standards for CO, HC, and NO2

    So  much for  the generalities. Now let's get down to a few examples of
specifics—to  some real  problems created by the way air  quality data  and air
quality standards are presently described. Among these problems have been the
descriptors used for the Federal Air Quality Standards for CO, HC, and NO2.
First let's look at the CO standard of 9 ppm (8-hour average), not to be
exceeded more than once a year. This  is ambiguous since it does not specify
which 8-hour period  to use.  There  are  an  infinite number of possible 8-hour
                                                                      2-3

-------
running  average  values  if  the  running  averages  may  start randomly, not
necessarily on the hour, at any instant of time during the year. In practice, to
determine compliance with this national standard, data must be organized. Since
there is  also a 1-hour average national standard, the unit of time apparently
intended was the clock hour rather than any  randomly started period of 60
minutes. For clock hour data, the 8-hour average possibilities start with the use
of one specific 8-hour period (e.g. 8 a.m. -4 p.m.) per calendar day. Since there
are 365 such periods possible per year, if one period exceeds the standard, the
standard represents the 99.7 percentile value. There are 1095 possible non-overlapping 8-hour periods per calendar year,
and between 8752 and 8760 possible running 8-hour periods, which if similarly
used would  set the standard at the 99.9 and  99.99 percentiles, respectively. It is
thus unclear whether the 99.7, 99.9 or 99.99 percentile values were intended as
the standard. The problem for the person who  must establish compliance with
the National standard  is then to  determine which  of  these possibilities was
intended and is acceptable.
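    To make the arithmetic concrete, here is a minimal sketch (in Python, purely illustrative
and not part of any standard) of the percentile implied by "not to be exceeded more than
once a year" under each of the three interpretations just described:

    # Percentile implied by allowing a single exceedance per year, for three
    # ways of counting 8-hour averaging periods (counts follow the text above).
    for label, n in [("one fixed 8-hour period per day", 365),
                     ("non-overlapping 8-hour periods", 1095),
                     ("running 8-hour periods, hourly starts", 8760)]:
        print(f"{label:40s}: {100.0 * (1.0 - 1.0 / n):.2f}th percentile")
    # prints roughly 99.7, 99.9, and 99.99, as quoted above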
    Next let's look at the NO2 standard of 0.05 ppm—annual average. In order
to relate this value meaningfully to  the  National  standard for hydrocarbons
(non-methane)  of 0.24 ppm (3-hour  average—6-9 a.m.),  and to  the NOX
reduction required in  automobiles, it is necessary to convert the Federal NO2
standard to its equivalent 3-hour average NOX value. This requires a double
conversion—first of annual average NO2 to equivalent 3-hour average NO2, then
to equivalent 3-hour average NOX. The hydrocarbon standard with which this
latter value must be  considered  is for non-methane hydrocarbons,  while the
National automobile emission standard which  is intended to achieve it, is for
hydrocarbons including methane. Finally the National oxidant standard that is
intended to be achieved by the control of HC (3-hour average) and NO2 (annual
average)  is expressed as a 1-hour average. We hope that this Symposium will help
provide more rational bases for  understanding  and  expressing air quality data
and air quality standards.
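    A double conversion of the kind described above has to rest on an assumed frequency-
distribution model. The sketch below (Python) shows one plausible way the first step,
relating an annual average to an expected maximum short-term average, could be carried
out under a lognormal assumption of the sort used by Larsen; the standard geometric
deviation and the period count are assumed, illustrative values, not prescribed by any
standard:

    import math
    from statistics import NormalDist

    def expected_max_short_term(annual_arith_mean, sgd, n_periods):
        # Lognormal assumption: median = arithmetic mean / exp(0.5 * ln(sgd)**2)
        ln_s = math.log(sgd)
        median = annual_arith_mean * math.exp(-0.5 * ln_s ** 2)
        # z-score of the expected highest of n_periods values (plotting position 1 - 0.5/n)
        z = NormalDist().inv_cdf(1.0 - 0.5 / n_periods)
        return median * sgd ** z

    # illustrative only: 0.05 ppm annual mean, assumed sgd of 1.8, 2920 3-hour periods per year
    print(expected_max_short_term(0.05, 1.8, 2920))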
Conclusion

    In October, 1969, we ran a predecessor to this Symposium, one on Multiple-
Source Urban  Diffusion Models.  There are several important ties between these
two symposia.  Both were under the same joint sponsorship of the  Division of
Meteorology (EPA) and the Triangle Universities Consortium on Air Pollution.
Some of the areas opened up at the 1969 symposium form the basis for research
papers at this one. Finally as diffusion modeling comes of age, its requirements
for real air quality data for model calibration and validation become increasingly
important. Some  of  the  air quality data analysis techniques discussed  here
should help improve the quality  of diffusion models just as diffusion modeling
should help improve the analysis, tabulation and presentation of air quality data.
2-4

-------
One can envision a combined air quality monitoring-air quality modeling effort
in which more accurate air quality data for a community can be obtained at
lower cost in monitoring equipment and manpower, by using the model to fill in
the data where there are no monitoring stations, and using the monitoring
stations to calibrate and validate the model. In the past, air quality monitoring
and modeling have been  considered quite separate and disparate operations.
What  is proposed  is that they can be operated as a joint enterprise with benefit
to both aspects.
    If this Symposium can point the way to these kinds of interactions, we may
well be planting the seed  from which another symposium on such interactions
may arise a few years hence, just as this one arose from the seeds planted in 1969.
                                                                      2-5

-------
      3. THE FORM OF THE  FREQUENCY DISTRIBUTION
             OF AIR POLLUTION CONCENTRATIONS
                        FRANK A. GIFFORD

   Air Resources Atmospheric Turbulence & Diffusion Laboratory
          National Oceanic & Atmospheric Administration
                        Oak Ridge, Tennessee
Introduction

    The practical need to be able to estimate the frequency distribution of air
pollution concentrations doesn't have  to be  pointed out to the participants in
this symposium. It will  I'm sure be emphasized in  many of the papers that we
will hear. Instead,  I'm going to try to bring out some of the possible reasons for
the concentration frequency distributions that we observe. Along the way I hope
to emphasize the basic difference between the frequency distribution of urban
air pollution, which results from the  combined effects of many sources, and that
of concentrations from  a single, isolated source of  pollution, like an electric
power plant. I'll also mention several different proposals that have been given in
the literature for the mathematical form of these distributions, and will try to
bring out some relationships among several of  these.
The Lognormal Distribution of Urban Air Pollution

    Larsen (1970;  1971)  has established, by means of a large number of data
comparisons, that observed air pollution concentration distributions are closely
approximated by the lognormal function. This is an interesting fact about air
pollution, which calls for some kind of explanation.
    The concentration X due to an urban air pollutant, the ambient air quality
in other words, is ordinarily observed over successive, short, time intervals. A
record of  X at an air pollution sampling point consists of a series of observed
values, X_i, i = 0, 1, 2, . . ., n. The irregular change in these numbers from one
sampling interval to the next reflects all the obviously complex variability of
source, meteorological, and other factors. But at any time it will most strongly
depend on the existing air pollution concentration level. This is true for at least
two reasons. First,  relevant  meteorological factors,  principally the wind  speed
and direction, tend  to be strongly self-correlated. Second, urban air pollution is
                                 3-1

-------
more uniform than that from isolated rural sources because the urban source is
distributed more uniformly, over a very large area.
    This suggests that the X_i are generated by the following simple process:

    X_1 = X_0 + y_1 X_0,   X_2 = X_1 + y_2 X_1,   . . . ,   X_n = X_{n-1} + y_n X_{n-1}        (1)

The quantities y_i are specified to be irregular, stochastic variables, known only in
terms of their means and standard deviations. Nothing more definite than this
needs to be said about the y_i, but the implication is that y_i results from a large
number of small, irregular effects acting on X_{i-1}, such as brief shifts in the wind,
changes in traffic, and so on. The lognormal distribution of X follows directly.
    From Equation 1 it is seen by rearranging that

                 z_n = Σ_{i=1}^{n} y_i = Σ_{i=1}^{n} (X_i − X_{i-1})/X_{i-1}             (2)
According to the central limit theorem, z_n, the sum of n independent stochastic
variates, will be normally distributed for large n; but Equation 2 is equivalent to

                     z_n = ∫_{X_0}^{X_n} X^{-1} dX = ln(X_n/X_0)               (3)
That is, X_n has a lognormal distribution. The conditions of validity of the central
limit theorem are very general. The y_i do not have to be normally distributed, or
even to have the same distribution.
    This derivation of the lognormal distribution is just a particularization of
the standard explanation of how skew  distributions are generated; see  for
example Hald (1952). Since the cause of the irregular changes in y_i did not have
to be specified exactly, the derivation only gives the form and not the pa-
rameters of the distribution. Nevertheless it seems to be an adequate explanation
of the observed strong tendency towards lognormality of air pollution concen-
tration distributions.
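    A small numerical experiment makes the argument easy to check. The sketch below
(Python with NumPy; the step count, the number of records, and the uniform choice for the
y_i are arbitrary illustrative assumptions) iterates Equation 1 many times and compares the
skewness of X_n with that of ln X_n:

    import numpy as np

    rng = np.random.default_rng(0)
    n_steps, n_records = 500, 20000
    # small, irregular relative changes y_i; their exact distribution is immaterial
    y = rng.uniform(-0.05, 0.05, size=(n_records, n_steps))
    X = np.prod(1.0 + y, axis=1)          # Equation 1 iterated with X_0 = 1

    def skewness(a):
        d = a - a.mean()
        return (d ** 3).mean() / (d ** 2).mean() ** 1.5

    # X is noticeably skewed; ln X is nearly symmetric, i.e. X is close to lognormal
    print(skewness(X), skewness(np.log(X)))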


The  Distribution  of Concentration from  an Isolated Point Source of
Pollution

    In some contrast to the above simple, but rather general result, I proposed a
specific model of the frequency distribution of  concentrations from an isolated
point source (Gifford (1959)).  See also Scriven (1965).  In this model,  the
concentration X_p at a point (x, y, z) due to a fluctuating plume of contaminant
originating at (0, 0, 0) is given by a randomly positioned, spreading-disk,
Gaussian plume model:

    X_p = Q (2π σ_y σ_z U)^{-1} exp{−[(y − D_y)^2/(2σ_y^2) + (z − D_z)^2/(2σ_z^2)]}        (4)


3-2

-------
Q is the source strength, U is the (constant) mean wind speed, and D_y and D_z
are the fluctuating distances of the center of the instantaneous plume from the
mean plume axis (x, 0, 0). D_y and D_z are assumed to be normally distributed,
and σ_y and σ_z are the standard deviations of the instantaneous plume spreading.
If the new variables

     Y = (y − D_y)/(2σ_y^2)^{1/2}   and   Z = (z − D_z)/(2σ_z^2)^{1/2}

are defined, it follows from Equation 4 that

                     L = Y^2 + Z^2 = −ln(c X_p/Q)                 (5)

where

                           c = 2π σ_y σ_z U
    For suitably standardized values of the variables, and for  concentrations
measured on  or  near the mean plume axis,  it was demonstrated that the
distribution of L is

                     p(L) = e^{−L/2}/2                       (6)

Thus  the logarithm  of concentration  from  a single source is distributed
exponentially, which is the same as chi-square with two degrees of freedom. The
degrees  of freedom correspond to  the two directions, y and z,  into which the
plume fluctuations are resolved by the  model.  The mathematical  form of the
distribution for sampling points off the mean axis turns out to be considerably
more complicated. However the parameters of the distribution, the mean and
standard deviation, are given explicitly in terms of the plume parameters. See the
references for details.
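    The exponential form of Equation 6 can likewise be checked by simulation. In the
sketch below (Python with NumPy), Y and Z are taken as standardized normal plume
displacements for a receptor on the mean plume axis, and the sampled values of
L = Y^2 + Z^2 are compared with p(L) = e^{−L/2}/2; the sample size is arbitrary:

    import numpy as np

    rng = np.random.default_rng(0)
    Y = rng.standard_normal(200000)       # standardized crosswind displacement
    Z = rng.standard_normal(200000)       # standardized vertical displacement
    L = Y ** 2 + Z ** 2                   # Equation 5: L = -ln(c * Xp / Q)

    hist, edges = np.histogram(L, bins=40, range=(0.0, 10.0), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    expected = 0.5 * np.exp(-centers / 2.0)    # Equation 6 (chi-square, 2 d.f.)
    print(np.abs(hist - expected).max())       # small: the agreement is essentially exact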
    Gartrell  (1966) pointed out that the distribution of concentrations from an
isolated source is qualitatively different  from that due to urban air pollution.
The most probable concentration from an isolated source is clearly zero, and the
distribution is strongly skewed. Gartrell found that a semilogarithmic diagram, in
which concentration is plotted against the logarithm of frequency, yielded a good
linear correlation of large amounts of TVA SO2, i.e., essentially isolated
point-source, data. Urban air pollution distributions are, on the other hand,
flatter, and richer in the  low concentration range with a higher modal value; high
concentrations are less frequent.
    More recently, Barry (1971) has also plotted extensive amounts of argon-41
data semilogarithmically and shows an excellent fit to the semilogarithmic
distribution

                            P(X_p) = a e^{bX_p}                         (7)

The sources of these concentrations are isolated, tall stacks. Since there is little
or no background contamination, these data give a particularly clear-cut example
of the isolated point source. He and Scriven (1971) discussed this result,
concluding that because of the simplicity of Equation  7 and the high quality of
the agreement with data,  this empirical, semilogarithmic distribution is to be
preferred  to  Equation  6.  Actually  the two  distributions refer  to  different
quantities.
                                                                      3-3

-------
The mean wind is assumed to be constant in Equation 6, over a time period of
the order of an hour, whereas in Equation 7 all wind fluctuations are included.
For this reason Equation 7 is a more  practically useful result. There must be
some fairly direct relationship between the two  results, but so far it has not been
found.

The Distribution from n Point Sources

    If there are a large number, n, of sources of a pollutant whose plumes affect
a particular point, the i-th one will make a contribution to the concentration at
the point given by
    X_pj/Q_j = (2π σ_yj σ_zj U)^{-1} exp{−[(y_j − D_yj)^2/(2σ_yj^2) + (z_j − D_zj)^2/(2σ_zj^2)]}        (8)

from Equation 4. Following the same procedure,

                     −ln(c_j X_pj/Q_j) = Y_j^2 + Z_j^2                  (9)

Summing over n such sources gives

               L = Σ_{j=1}^{n} (Y_j^2 + Z_j^2) = −ln Π_{j=1}^{n} (c_j X_pj/Q_j)          (10)

     If the quantities D_yj, D_zj are independently normally distributed with mean
= 0 and σ = (D̄^2)^{1/2}, then the quantities Y_j = (y_j − D_yj)/(2σ_yj^2)^{1/2} and
Z_j = (z_j − D_zj)/(2σ_zj^2)^{1/2} are also normally distributed, with mean
y_j/(2σ_yj^2)^{1/2} and σ = (D̄^2_j/(2σ_yj^2))^{1/2} for the y-term, and
similarly for the z-term. By the central limit theorem, the quantity L is therefore
asymptotically normally distributed, with mean and standard deviation obtained
by summing those for the individual sources. Equation 10 can be written

               ln[ Π_{j=1}^{n} (c_j X_pj/Q_j) ]^{1/n} = −L/n                  (11)

and the quantity in brackets is just the geometric mean of the concentration,
weighted by the c_j. If the arithmetic mean is simply related to the geometric mean
(for  instance if they are proportional), then Equation  11 says that the logarithm
of the  concentration due  to  a  large  number of point  sources is  normally
distributed.
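    The limiting behavior claimed here is easy to illustrate numerically. In this sketch
(Python with NumPy; the number of sources, the sample size, and the displacement means are
arbitrary illustrative choices), L is built up from many standardized source contributions
and its near-normality is checked through its skewness:

    import numpy as np

    rng = np.random.default_rng(1)
    n_sources, n_samples = 50, 20000
    # standardized displacements Y_j, Z_j for each source, with arbitrary nonzero means
    mu_y = rng.uniform(0.0, 1.0, n_sources)
    mu_z = rng.uniform(0.0, 1.0, n_sources)
    Y = rng.normal(mu_y, 1.0, size=(n_samples, n_sources))
    Z = rng.normal(mu_z, 1.0, size=(n_samples, n_sources))
    L = np.sum(Y ** 2 + Z ** 2, axis=1)    # Equation 10

    d = L - L.mean()
    print((d ** 3).mean() / (d ** 2).mean() ** 1.5)   # skewness near 0: L is close to normal,
                                                      # so exp(-L/n) is close to lognormal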
    A distribution  function for point sources based on the Poisson distribution
was proposed by Wipperman (1966). His model essentially assumed a uniform,
"top-hat" distribution of the instantaneous plume, which lends itself very neatly
to the Poisson representation. Similarly Prinz and Stratmann (1966) developed a
model using the negative binomial  distribution. These have also been used to
describe multiple-source, urban pollution data. Probably any of these general,
skewed, frequency  distributions could be used successfully to describe urban air
pollution  distributions. Most  of the  empirical comparisons, as a  result  of
3-4

-------
Larsen's extensive studies, have been made with the lognormal distribution. The
fit of urban pollution  data  to the lognormal curve, while good, is not perfect.
Systematic departures occur, particularly for low concentration values. For this
reason extrapolation of the lognormal, or any other distribution function, out of
the usual range of observed frequencies, should be made cautiously.
    A final point is that  many  observed  air pollution frequency distributions
must  be composites, reflecting the presence of both the multiple, distributed,
urban pollution sources and  nearby, strong, isolated point sources as well,  in
varying degrees.  Study  of  all  the  resulting distribution types  should   be
rewarding,  not only for their theoretical interest but also as clues to the nature
of urban air pollution sources. Concentration distributions are, so to speak, the
"fingerprints" of air pollution, and their characteristics may help us detect and
analyze urban air pollution patterns.

Acknowledgement

    This research  was performed  under an  agreement  between  the Atomic
Energy Commission and the National Oceanic and Atmospheric Administration.

References

Barry, P. J., 1971:  Use of argon-41 to study the dispersion of stack effluents.
      Proc. of Symposium on Nuclear Techniques in Environmental Pollution,
      Int. Atomic Energy Agency, pp. 241-253.
Cramer, H., 1946:  Mathematical methods of statistics. Princeton Univ.  Press,
      Princeton, N. J.
Gartrell, F. E., 1966: Control of air pollution from large thermal power stations.
      Revue Mensuelle 1966 de  la Soc. Beige des Ingenieurs et des Industriels
      Bruxelles, pp.  1-12.
Gifford,  F. A.,  1959:  Statistical properties  of  a fluctuating  plume model.
      Proceedings of Symposium on Atmospheric  Diffusion and Air Pollution,
      Advances in Geophysics. 6: 117-137.
Hald,  A.,  1952: Statistical  theory with engineering applications. Wiley, New
      York, N. Y.
Larsen, R., 1970: Relating air pollutant effects to concentration and control. J.
      Air Pollution Control Association. 20: 214-225.
Larsen, R., 1971: A mathematical model for relating air quality measurements
      to air quality  standards. Office of Air Programs, USEPA  Pub. No. AP-89.
Prinz, B. and Stratmann,  H., 1966: The statistics of propagation conditions in
      the  light of  continuous  concentration measurements  of gaseous   air
      pollutants. Staub. 26: 4-12 (English translation edition).
Scriven, R. A., 1965: Some  properties of ground level pollution  patterns based
      upon a fluctuating plume model. CEGB Lab.  Note No. RD/L/N 60/65.
                                                                      3-5

-------
Scriven, R. A., 1971:  Use of argon-41 to study the dispersion of stack effluents.
      Proc. of Symposium on Nuclear Techniques in Environmental Pollution,
      Int. Atomic Energy Agency, pp. 254-255.
Wipperman, F., 1966: On the distribution of concentration  fluctuations of a
      harmful gas propagating in the atmosphere. (Translation of unpublished
      MSS.) 17 p.
 DISCUSSION
Arnold Court: Your closing comment that the lognormal does not fit very well
at low  concentrations is obvious in the derivation. That derivation is not valid
when x can approach 0, because you would be dividing by 0 in your derivation.
The lognormal will fit  only when the fluctuations are  small compared to the
concentration value.
Gifford: I have no comment; that seems reasonable.
F. B. Smith: Referring back to one of your very first equations, I don't think
this equation, which relates the concentration at time t_i to the concentration at
time t_{i-1}, can be unique. In fact the relationship normally used, for example
with wind velocities, is that the velocity, or the concentration  in this case, is
related  to  the  velocity  or concentration of  the  previous  time through a
correlation coefficient. In other words, looking at this first equation, X_1 would
equal the value X_0 times some sort of decay function, which might be
exponential but related to the correlation between the concentrations at those
two times, plus some random element, which would have zero mean and some
specified standard deviation. I think this would probably give quite a different
distribution, a different answer. I'm not quite sure whether it would give a
lognormal.
Gifford: Yes, I would be interested to know the answer to this. I think that
certainly  the important  problem is to try to say something more about why,
other than just that it's an irregular function. I would certainly think it would be
a  good idea to use different kinds of generating functions and see what the
resulting distributions are.
M.  M.  Benarie:  I  really  am not  here to disprove  or attack  lognormal
distributions, which I use in the next  paper to a  great extent. All this discussion
about  the  genesis  of  describing functions reminds me  very  much  of  the
discussion in aerosol physics which began about 25 years ago, but was luckily
ended about 10  years ago, about the exponential, Risen-Rammler,  lognormal
and other descriptive functions for the  distribution of aerosols. For me, any
function which is mathematically easy to handle and is a good approximation is
good. So why not take the lognormal?
James Arvesen:  I have two questions - one is for information, the other might
open a  Pandora's Box. The first one is for your equation 4. Why is there circular
3-6

-------
symmetry?  Why not allow the possibility of some  kind  of correlation in the
model? The second  question  is  why  is there a need  to have a parametric
formulation for these distributions? Perhaps we should be content to deal with
just quantiles of generally non-parametric distributions. Why is there the need to
fit a lognormal, a Weibull, etc.?
Gifford: This  equation  4  doesn't have  cross-correlation terms  in  it simply
because, in the usual form, the distribution of the concentration with respect to y
is assumed not to involve a cross-product term in σ_y and σ_z. In
conditions of strong stability this is undoubtedly a poor assumption and I don't
think there  is any particularly good reason for  not setting up the model on the
basis that you suggest; I  just haven't done it here.  As to the second question,  I
don't  see  offhand—it  goes back  to  what the  introductory  speakers were
saying—you  want to be able to  rationalize what you observe in terms of physical
variables.  Now  the exact details of the  model for doing this could be debated,
but it seems to me that whether you  use...I  am trying to see rather desperately,
not being a statistician,  how what you would call a non-parametric approach
would apply here,  but it seems to me that  it is something like, "Look Ma,  no
hands." What you want is a way of relating the atmosphere, which is the
physical medium, to the observed concentration values, because you need to
specify  the transfer function in the atmosphere, and  I certainly don't have any
strong feeling about how that should  be done.  Here  I intended to show only a
couple of possibilities that occurred to me.
Ron Snee:  I agree  that  if you  had to pick a single distribution  function, the
lognormal would be  the one  to pick.  I  would point out, however, that the
Pearson system of distribution functions includes a variety  of distribution curves
and  has been used  by statisticians for many years  in the characterization of
empirical distributions. You did not mention the Pearson  system. I wonder if it
hasn't been used or if perhaps there is some reason why it shouldn't be used?
Gifford: I don't know of any reason why any rational approximation to what is
observed couldn't be used and I  certainly didn't intend to imply that it shouldn't
be. They're  all equivalent.  If you, for instance, look at a table of the parameters
of distributions, you  will find  that all of the skew frequency distributions are
related. The main difference among the different families of distributions has to
do with whether they are discrete or continuous variables.  For the rest of it they
are mostly more or less all inter-related.
Snee: I would encourage the group to investigate the Pearson system. I believe
the Pearson  system will help get us out  of the problem of deciding whether the
lognormal distribution is appropriate in a given instance.
Gifford: The problem isn't the  form of the distribution, in my opinion. It's how
you go about specifying the physical mechanisms involved, and that's the reason
that I don't  really care too much about this little explanation here without some
rational way of characterizing the y's, which is what Dr. Smith was getting at.
The problem is to be able to express the parameters in some suitable distribution
in terms of atmospheric physical variables.
                                                                       3-7

-------
     4. METEOROLOGICAL PARAMETERS RELEVANT IN A
                     STATISTICAL ANALYSIS OF
                       AIR QUALITY DATA
                           F. WIPPERMANN

                     Department of Meteorology
              Technical University, Darmstadt, Germany
Introduction

    Today many measurements (continuous or at discrete times) of air quality
are  carried out at many places in industrial countries. Statistical evaluations of
these measurements are as different (and therefore incomparable) as the places
are. Mostly, the concentration of a gaseous component or of particulate matter is
considered as depending on surface wind (speed and direction), on surface
temperature, and sometimes on humidity. However, these three or four
parameters cannot completely describe the turbulent state in the
atmospheric boundary layer and, therefore, cannot describe the diffusion conditions in it.
    This paper   intends to show  which  meteorological  parameters can be
considered as relevant  in a statistical  analysis of air quality data. However the
conclusions drawn are valid only if the atmospheric boundary layer satisfies the
conditions of a planetary boundary  layer (PBL), i.e., stationarity and horizontal
homogeneity. Since the actual boundary layer in general differs from the PBL,
the conclusions are only approximately correct.
    This basic concept has been developed together with Dr.  Yordanov  (Sofia,
Bulgaria) and  is  the subject of a recent joint paper (Wippermann and Yordanov
(1972)).
 Planetary Boundary Layer (PBL) and Rossby Number Similarity

    The PBL is defined  as a steady state horizontally homogeneous boundary
 layer. In a PBL there exists, for z ≫ z_0 (z_0 = roughness length), a so-called
 Rossby number similarity, which means that the vertical profiles of certain
 variables (non-dimensionalized correctly by internal parameters) are
 independent of the given external parameters. They depend only on an internal
 parameter μ for thermal stratification, and on two internal parameters λ_x and λ_y
                                                                   4-1

-------
for baroclinicity in the PBL. Of course, height above ground as an independent
variable has also to be made nondimensional by a scale height H of the PBL:

                        Z = z/H,     H = κu_*/f                    (1)

where κ is the von Kármán constant; u_* = (τ_0/ρ)^{1/2} the friction velocity; τ the
Reynolds stress; ρ the density; and f the Coriolis parameter. Variables for which
Rossby number similarity is valid are for instance

       P = κ(u − u_g)/u_*,     Q = κ(v − v_g)/u_*,     K_m = k_m/(H^2 f),

       E_x f_x(n),   E_y f_y(n),   E_z f_z(n),

       T = (ϑ_T − ϑ)/ϑ_*,     S = (s_T − s)/s_*                                   (2)

    P and Q are non-dimensional velocity defects; u and v are velocity
components in a coordinate system, the x-axis of which coincides with the
direction of the surface stress τ_0 (an internal parameter); and E is the rate of
dissipation of turbulent energy; u″, v″ and w″ are the fluctuations, and E_x, E_y
and E_z are the three parts of turbulent kinetic energy; f_x, f_y and f_z are spectral
density functions with a frequency n. T is the non-dimensional difference of
temperature from the temperature at the top of the PBL (index T), and S is the
non-dimensional difference of water vapor from the water vapor content at the top
of the PBL. ϑ_* = −q_0/(κρc_p u_*) is a characteristic temperature fluctuation, with
c_p the specific heat and q_0 the turbulent heat flux at the ground; s_* = j_0/(κρu_*)
is a characteristic moisture fluctuation, with j_0 the turbulent moisture flux at
the ground, i.e., the rate of evaporation.
    All the variables listed in Equation 2 form universal vertical profiles,
depending only on the three internal parameters μ, λ_x, λ_y. For instance

                T = T(Z, μ, λ_x, λ_y)    for Z ≫ Z_0            (3)

where

                μ = H/L_*                                        (4)

is the internal parameter for thermal stratification, with L_* = −c_p ρ u_*^3/(κβq_0), the
Monin-Obukhov stability length; β = g/ϑ̄ with ϑ̄ a reference temperature.
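    As a concrete illustration of these internal scales, the sketch below (Python) evaluates
H, L_*, and the stratification parameter from surface values; the relation μ = H/L_* is the
reconstruction of Equation 4 assumed here, and the input numbers are arbitrary:

    import math

    KAPPA = 0.4                       # von Karman constant

    def pbl_scales(u_star, f, q0, rho=1.2, cp=1005.0, theta_ref=288.0, g=9.81):
        H = KAPPA * u_star / f                                   # Equation 1
        beta = g / theta_ref
        L_star = -cp * rho * u_star ** 3 / (KAPPA * beta * q0)   # Monin-Obukhov length as defined above
        mu = H / L_star                                          # assumed form of Equation 4
        return H, L_star, mu

    # e.g. u_* = 0.3 m/s, f = 1e-4 1/s, upward surface heat flux q0 = 100 W/m^2
    print(pbl_scales(0.3, 1.0e-4, 100.0))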
 4-2

-------
The parameters λ_x and λ_y (Equation 5) are the two internal parameters for
baroclinicity of the PBL. They are internal ones because they contain the
components of dV_g/dz in a coordinate system oriented with the x-axis in the
direction of the internal parameter τ_0.
    For all the variables listed in Equation 2 the same as in Equation 3 is valid.
This means that the vertical profiles depend only on μ, λ_x, λ_y, and furthermore
that the state of turbulence in the PBL, and therefore also the turbulent
diffusion, is completely described by these three parameters.
The Vertical Profile of Concentration Caused by a Horizontal Surface Source

    The condition of horizontal homogeneity of the PBL is satisfied as long as
the source of a gas or of particulate matter is a horizontal and infinite surface
source. This means that the concentration r_s [g cm^{-3}], made dimensionless by
the characteristic fluctuation of concentration r_* = i_0/(κρu_*), with i_0 [g cm^{-2}
sec^{-1}] the source strength, must have a universal vertical profile

              R_s = r_s/r_* = R_s(Z, μ, λ_x, λ_y)    for Z ≫ Z_0         (6)

    Actually the Rossby number similarity is valid for the non-dimensional
difference (R_s)_T − R_s. Here (R_s)_T, the concentration at the top of the PBL, is
made zero (vanishing background concentration). Examples of natural sources of
this kind are an evaporating sea surface or a very large sand desert with sustained
strong winds. Water vapor or sand is added to the air. The vertical profile of
concentration of these admixtures depends only on the three parameters μ, λ_x, λ_y.
 The Concentration in the Case of a Continuous Point Source

     If one  considers a continuous point source,  the condition of horizontal
 homogeneity is violated and Rossby similarity can no longer be used to conclude
 on which meteorological parameters the concentration pattern depends. (In the
 case of an instantaneous point source, the condition of stationarity is also
 violated).
     One can try to  make  a statement concerning the relevant  meteorological
 parameters  by  assuming  that  the diffusion process is described  by  the
                                                                     4-3

-------
 steady-state Fickian diffusion equation

    u ∂r_p/∂x + v ∂r_p/∂y = ∂/∂x(k_x ∂r_p/∂x) + ∂/∂y(k_y ∂r_p/∂y) + ∂/∂z(k_z ∂r_p/∂z)        (7)

 and by replacing the velocities u and v and the turbulent diffusion coefficients
 k_x, k_y, k_z by variables for which universal vertical profiles exist. The horizontal
 coordinates should be made dimensionless by the internal scale height H given in
 Equation 1, X = x/H and Y = y/H, and the diffusion coefficients by H^2 f.
    Since the components of the geostrophic wind at the surface are

                   u_g0 = |v_g0| cos(α_0),     v_g0 = −|v_g0| sin(|α_0|),

there results

              u/(Hf) = [P + κ cos(α_0)/C_g]/κ^2          (8)

              v/(Hf) = [Q − κ sin(|α_0|)/C_g]/κ^2

α_0 is the cross-isobar angle and C_g = u_*/|v_g0| the geostrophic drag
coefficient. Both α_0 and C_g can be eliminated in Equation 8 by making use of
the resistance law for the PBL

       κ cos(α_0)/C_g = −M_m(μ, λ_x, λ_y) + ln(Ro_0 C_g)          (9a)

       κ sin(|α_0|)/C_g = N(μ, λ_x, λ_y)                          (9b)

where Ro_0 = |v_g0|/(f z_0) is the surface Rossby number, a non-dimensional
4-4

-------
combination of given external parameters. The functions N and M_m appearing in
this law are universal functions, i.e., they are independent of external parameters
and depend only on μ, λ_x, λ_y. The resistance law was first derived by Kazanskii
and Monin (1961) for the barotropic and neutral case. It was extended to the
diabatic case by Blackadar (1967) and by Monin and Zilitinkevich (1967).
Recently it was extended to baroclinic cases by Yordanov and Wippermann
(1972). By using the resistance law, Equations 9a and 9b, and the definition

                        Z_0 = (κ Ro_0 C_g)^{-1}                    (10)
one obtains for the diffusion Equation 7 the following form

    (P − M_m + λ_x Z − ln(κZ_0)) ∂R_p/∂X + (Q − N + λ_y Z) ∂R_p/∂Y =
         κ^2 [ ∂/∂X(K_x ∂R_p/∂X) + ∂/∂Y(K_y ∂R_p/∂Y) + ∂/∂Z(K_z ∂R_p/∂Z) ]        (11)

The boundary conditions are

         X = 0,    Y = 0,    Z = Z_b:      R_p = I

         Z = Z_0:                          ∂R_p/∂Z = 0
    R_p = r_p/(b f^{-1} H^{-3}) is the non-dimensional concentration caused by a
continuous point source; b [g sec^{-1}] is the strength of the source; and Z_b = z_b/H
is the non-dimensional effective height of the source. If one assumes that the
vertical profiles of the non-dimensional diffusion coefficients for matter K_x, K_y,
K_z are universal ones like the vertical profile of the diffusion coefficient K_m for
momentum, all coefficients in Equation 11—except Z_0—depend only on μ, λ_x
and λ_y and, of course, some of them on the independent variable Z. They all are
universal functions. However the non-dimensional roughness length Z_0 depends
on Ro_0 and C_g as seen in Equation 10, and C_g itself depends on Ro_0, μ, λ_x, λ_y,
as seen in the resistance law, Equations 9a and 9b. Therefore, since Z_0 depends
on Ro_0, μ, λ_x, λ_y, the non-dimensional concentration R_p must also depend on
Ro_0.
                R_p = R_p(X, Y, Z, Z_b; Ro_0, μ, λ_x, λ_y)          (13)

     Universality is now lost; dependence on Ro_0 (i.e., on external
parameters) enters because of violation of the condition of horizontal
                                                                    4-5

-------
homogeneity by having a continuous point source. A comparison of R_p in
Equation 13 with R_s in Equation 6 shows the difference.
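    To show how a diffusion equation of the type of Equation 7 can be turned into
concentration profiles in practice, the following sketch (Python with NumPy) marches a
strongly simplified version downwind; the crosswind term is dropped, the wind and the
diffusivity are taken constant rather than given by the universal PBL profiles, and all grid
values are arbitrary. It is only an illustration of the numerical idea, not the treatment used
in the paper:

    import numpy as np

    def march_plume(u=5.0, kz=1.0, dz=5.0, dx=2.0, nx=500, nz=80, jsrc=10, q=1.0):
        # Explicit downwind marching of  u * dr/dx = d/dz ( kz * dr/dz ),
        # a simplified form of Equation 7.  Stability requires dx <= u*dz**2/(2*kz).
        r = np.zeros(nz)
        r[jsrc] = q / (u * dz)                  # crude slab representation of the source
        profiles = [r.copy()]
        for _ in range(nx - 1):
            flux = kz * np.diff(r) / dz         # turbulent flux between adjacent levels
            dr = np.zeros(nz)
            dr[:-1] += flux                     # flux divergence, with zero flux at the
            dr[1:] -= flux                      # ground and at the top of the layer
            r = r + (dx / (u * dz)) * dr
            profiles.append(r.copy())
        return np.array(profiles)               # r(x, z) on the marching grid

    ground_level = march_plume()[:, 0]          # ground-level concentration versus distance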
 The Concentration (at a Fixed Point) Caused by Multiple Sources

     A  statement can  be made only if one assumes that the sources do not
 change their coordinates and their strengths during the period of measurement.
 Furthermore one has to assume that the period of measurement is long enough
 to cover all possible cases (wind direction, Rossby number, thermal stratification
 and, possibly, the baroclinicity) in almost equal parts. This latter assumption has
 to  be  made in almost all statistical analyses of  measured concentrations.
 However, the first assumption will be only  incompletely fulfilled and therefore
 causes errors.
     For each direction (of the geostrophic surface wind) the mean
concentration r_d at the point under consideration should be evaluated and a
non-dimensional concentration

                               R_d = r/r_d                           (14)

should be formed therefrom. Fluctuations of R_d should be independent of the
source positions (x_b)_j, (y_b)_j, (z_b)_j and of the source strengths b_j. They should
depend on the remaining parameters in Equation 13.

                      R_d = R_d(Ro_0, μ, λ_x, λ_y)                (15)

     The surface Rossby number Ro_0 = |v_g0|/(f z_0) can be determined directly
from the given geostrophic surface wind, when z_0 in the denominator is known.
 This can be obtained by conventional measurements of the wind profile near the
 ground, but should be representative of the whole area in which diffusion takes
 place. It may vary with the wind direction  (and is therefore a "meteorological"
 parameter) and, possibly, with  the vegetation period.
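     As a minimal sketch of this step (in Python; the numerical values below are
purely illustrative and are not from any survey discussed here), the surface
Rossby number can be formed from a geostrophic surface wind, a latitude, and a
roughness length:

    import math

    def surface_rossby_number(v_g0, latitude_deg, z0):
        """Ro0 = |v_g0| / (f * z0), with f the Coriolis parameter."""
        omega = 7.292e-5                                         # Earth's rotation rate [1/s]
        f = 2.0 * omega * math.sin(math.radians(latitude_deg))   # Coriolis parameter [1/s]
        return v_g0 / (f * z0)

    # e.g. a 10 m/s surface geostrophic wind at 45 deg latitude over z0 = 0.1 m
    print(surface_rossby_number(10.0, 45.0, 0.1))                # roughly 1e6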
     The internal parameter μ for thermal stratification has to be found by
 converting the external parameter a for thermal stratification, which is defined
 in terms of the temperature difference δθ = θT − θ0 across the PBL. When the
 temperature difference δθ from top to bottom and the geostrophic wind |vg0| at
 the ground are given, the parameter a can then be
4-6

-------
formed. This parameter must be converted to the parameter μ. Zilitinkevich (1970)
gives diagrams for the conversion of the given external parameter a into the
wanted internal parameter μ. For this conversion Ro0 is again needed.
     Of course, difficulties will arise in forming the external parameter a from the
given parameters θT, θ0, |vg0|, because the first two are difficult to find. A PBL
can only have a monotonically decreasing or increasing temperature profile θ(z).
It cannot contain temperature inversions or layers with unstable stratification or
similar features very frequently observed in the real atmospheric boundary layer.
Therefore an observed temperature profile has to be smoothed in order to obtain
the corresponding profile belonging to the PBL. The difference θT − θ0 should
be taken from such a profile.
     The baroclinicity seems to be less important. It has to be considered only
for highly elevated sources, e.g., very tall stacks. An example has been given by
Wippermann and Yordanov (1972) showing a case with a pronounced minimum
of eddy diffusivity at 320 m caused by baroclinicity. When baroclinicity must be
considered, one has first to form the two external parameters of baroclinicity (in
a coordinate system x*, y* oriented with the x-axis in the direction of the
geostrophic wind at the surface); these parameters involve zT, the height of the
top of the PBL. For a conversion into the internal parameters Λx, Λy of
baroclinicity, the cross-isobar angle α0 is needed:

        Λx =  Λ*x cos(α0) + Λ*y sin(|α0|)
        Λy = −Λ*x sin(|α0|) + Λ*y cos(α0)                                     (19)

The angle α0 can be obtained from the resistance law, Equations 9a and 9b.
Concluding Remarks

     The baroclinicity of the boundary layer flow has an effect on the
concentration pattern only when the concentrations are caused by highly elevated
sources; the effect can be neglected in most cases. The remaining meteorological
parameters are the internal parameter μ for the thermal stratification of the
boundary layer and the surface Rossby number Ro0. Both these parameters
determine the concentration uniquely (as long as the assumptions of a PBL are
satisfied).
                                                                      4-7

-------
    It seems that the parameter μ is just the one which has been sought as a
measure  of  diffusion  characteristics  which  depend  mainly  on  thermal
stratification. The empirical "dispersion categories" can possibly be replaced by
this parameter, if we  succeed in  determining the PBL which is equivalent to the
(measured) actual boundary layer.
References

Blackadar, A. K.,  1967:  External parameters of the wind flow in the barotropic
     boundary layer. GARP Study Conference Report, Stockholm, Sweden,
     Appendix IV, 11 p.
Kazanskii, A. B., and Monin, A. S., 1961: On the dynamic interaction between
     the atmosphere and the earth surface. Bull. (Izv.) Acad. Sci. USSR,
     Geophys. Ser. No. 5, 786-788; English translation, 514-515.
Monin, A. S., and Zilitinkevich, S. S., 1967: Planetary boundary layer and large
     scale atmospheric dynamics. GARP Study Conference Report, Stockholm,
     Sweden, Appendix V, 37 p.
Wippermann, F., and Yordanov, D., 1972: A perspective of a routine prediction
     of concentration patterns. Atmospheric Environment. 6: 877-888.
Yordanov,  D., and  Wippermann, F.,  1972:  The parameterization  of the
     turbulent fluxes of  momentum, heat and moisture  at the ground in a
     baroclinic planetary boundary layer. Beitr. Phys. Atm. 45: 58-65.
Zilitinkevich, S. S., 1970: The dynamics of the atmospheric boundary layer.
      Gidrometeoizdat, Leningrad (in Russian), 291 pp.
 DISCUSSION

 Smith: Could I ask you, Dr. Wippermann, if you consider that the depth of the
 boundary layer could be adequately given by the Rossby similarity theory? My
 experience with this is that one can use the theory very adequately to give you
 estimates of the surface stress and the turning of the wind at the surface, but in
 unstable conditions it doesn't normally give very good estimates of the depth  of
 the boundary layer which  you've used in your scaling. Usually the depth of the
 boundary layer depends  much  more  on the  historical  development of the
 boundary layer due to the input of heat over the daytime period.
 Wippermann: This is a question of how one defines the depth of the boundary
 layer. So if you have, for instance, an inversion at the top and you consider the
 height of this inversion as the depth of the boundary layer, this could not be
 considered in a planetary boundary layer, in which an inversion is not possible.
 I'm just choosing this "H" as a scale height for the boundary layer, which does
 not mean that it is somewhere the actual top of this boundary layer.
4-8

-------
      5. THE USE OF THE RELATIONSHIP BETWEEN WIND
  VELOCITY AND AMBIENT POLLUTANT CONCENTRATION
     DISTRIBUTIONS FOR THE ESTIMATION OF AVERAGE
 CONCENTRATIONS FROM GROSS METEOROLOGICAL DATA
                          M. M. BENARIE

         Institut National de Recherche Chimique Appliquee
                        Vert-le-Petit, France
Introduction

    A  synonym  for  "computation  of  pollutant  concentrations  from
meteorological data" is atmospheric modeling. In this matter, one has on the one
hand mechanistic (or explaining) models, which seek the breakdown as well as
the comprehension of the  elementary physical processes of dispersion. On the
other hand, one has formal (or phenomenological) models, which look for the
necessary and sufficient coefficients for  the computation of some  mean or
probability of given concentrations. After explaining the reasons why I have not
chosen  a mechanistic  model for concentration  frequency computation,  I will
deal further with one specific model.
    This distinction is very near to the one defined by Stern (1970) at this place
just  two  years  ago,  in  his Symposium Summary on Multiple-Source Urban
Diffusion Models: mechanistic models are source-oriented and the
phenomenological ones are receptor-oriented. It should nevertheless be stressed
that phenomenological and statistical  models do not necessarily mean the same
thing. Mechanistic (source-oriented) models are constrained by considerations
of material balance, as opposed to statistical (empirical)  ones, which are not
(Calder  (1970)).  The receptor-oriented  phenomenological  model proposed
herewith does not implicitly avoid the use of the law of the conservation of
matter  (even  if  in the  present paper we fail  yet to attain  its ultimate
consequences) and it has  the pretension to give more insight  into physical
processes than just a correlation between measured pollutant concentrations and
simultaneously observed meteorological parameters.
    Concentration frequency distributions are generally obtained by calculating
the  concentration values  for  all possible combinations of meteorological
parameters: wind direction, wind speed and stability category. Afterwards, we
take the sum of the joint frequencies of all combinations of classes that give rise to a
given concentration. The first objection to this way of proceeding is economical.
                                 5-1

-------
Since the chosen dispersion equation has to be evaluated numerically at least a
few  hundred times for each  receptor  location of any interest, such extensive
computation can  quickly  involve  prohibitive time, even  with a  high-speed
computer.
    Secondly,  the  relevant  meteorological  data  or statistics  have  to  be
extensively known for the given location. While these data may be available in
some cases, in others no less important the nearest meteorological
station—perhaps a hundred miles away—may not be at all representative.
    Thirdly, the result of individual computations of the dispersion equation has
the character of a  differential.  In the case of important point sources, the result
justifies  careful consideration  of the  constants, which  are  often  based  on
extensive surveys  of  emission  and meteorological  parameters.  As maximum
concentrations are mostly  sought, the  results are acceptable even when off the
target by a factor of two or even more. But if concentration frequency
distributions, or one of their derivatives such as a mean, are sought, the input errors in
the dispersion equations (including lack of basic meteorological information) can
easily be amplified by the effect of the summation. As experience shows, the
final result  is, as often as not, off by ±50%. This error is unacceptable since any
air  pollution engineer, worthy of this name, should  be able to estimate  in ten
minutes  on a slide rule from very few data (such as population density, space
heating  habits,   industrial  context  and  some  general  knowledge   about
climatology), a mean with the specified accuracy.
     The fourth objection that can be  made is that the basic concept of plume
computations is a  short-time process, say typically of 30 minutes. The passage to
long-time averages is especially awkward at  the level of transition from gaussian
plume of constant direction to the normally meandering plume. Next follows
another  transition  to changing wind directions, as described by a specific wind
rose. At least two different  physical processes are  involved, one of which is
definitely not gaussian.
     The fifth and last, but not least, objection is that a  model should require a
minimum of basic assumptions and discard everything not absolutely needed.
This should apply to stability classes, as far as averages are concerned. Not that a
frequency  table of the  stability classes for  towns and most populated areas
should  be so difficult to obtain. However such tables are usually  unavailable
when and where they  are most needed, e.g., for the location of a new plant (see
also the second argument above).
     And thus arises  the question  as  to whether frequency tables  of stability
classes are necessary. Looking at Table I, we are inclined to answer that, at least
for the  limited purpose of  computation of  averages, they are not absolutely
indispensable.
     It would be highly interesting to supplement the data of Table I (the only
ones we have been able to locate), with non-European statistics and with figures
concerning other  than temperate climates.  From what is available for the time
being, we  can  see that  the  near-neutral  classes  represent  76% ±5%  of all
5-2

-------
 situations*. We carried out a few trial computations which show that the mean
 is but slightly—and the median not at all—sensitive to the frequency changes in
 the stable and unstable classes. If this is the case, and the overall class frequency
 distribution is an approximately constant property of the temperate climates,
 then  why  bother to split  up first into categories and afterwards totalize them
 during expensive computer hours?
    The purpose of the present paper is  to provide a simplified method of
 estimation of pollutant  concentrations, in cases where detailed meteorological
 data are not at hand. As far as possible, the method is an empirical one and has
 no pretension  to give  insight  into the  physical  processes of  atmospheric
 dispersion. The purpose is to provide an easy and rapid means for  atmospheric
 modeling around point sources.
 The Experimental  Data

     The experimental basis for the present work is due to Prof. P. Bourbon and
 his staff who obligingly permitted us to make the following statistical analysis of
 the  data of their survey around  the  natural gas sweetening plant at  Lacq
 (southwest  France).  Gratitude  is expressed at this point for this  important
 contribution.
     For several years, 24-hour  mean  concentrations of SO2, NO2,  H2S and
 other pollutants were measured at 37 points. Figure 1 shows their distribution
 around the plant. For the time being, we will use only the data for the two
 years, 1968-1969, which are complete and homogeneous insofar as SO2 and
 NO2 concentrations are concerned. The former  was analysed by the very specific
 nitroprussate-pyridine method  (Bourbon  et  al.   (1971));  the  latter by the
 Jacobs-Hochheiser method.
     The concentrations  were represented  in  the form of cumulative frequency
 diagrams, one for each survey point and pollutant.  Of the 74 diagrams, 54 are
 very nearly straight  lines on the lognormal coordinates which  were used.  In
 Figure 1 the survey points with two straight-line distributions and those with
 only one are marked with different symbols; Figures 2, 3, 4 and 5 give samples of these distributions.
Discussion

     It has been found that the cumulative frequency distributions of suspended
particulates at  CAMP (urban)  sites have  a tendency  toward  lognormality
(U.S.D.H.E.W.  (1958)). Earlier  applications, referring also  to other pollutants

*) With the restrictive condition that classification  criteria should be the same. This meant
   that we had to leave the Szepesi (1964) data out of Table I because they are based on a
   perhaps better but nevertheless  different  criterion,  i.e.,  the  measurement  of  the
   temperature gradient.
                                                                        5-3

-------
but always to receptors located in or at area sources, are to be found in Zimmer
et al.  (1959)  and Gould  (1961). As stated by  Gifford (1969), the lognormal
concentration   distribution   can   be   mathematically   derived   by   the
particularization of the general explanation of skew distributions.
     It was shown by Benarie (1969) (1971): (a) that the lognormal distribution
is strictly valid for concentration  frequency distributions in any given direction
around a point source. This is a consequence of the facts that wind velocity
distributions in any given  direction may be approximated by a lognormal and of
some very general mathematical  properties of this function.* (b) that in the
special  case of  the point source without thermal plume  rise, the geometrical
standard  deviations  of  the  wind velocity  distribution  and  that of  the
concentration distribution are numerically equal. From this equality it follows as
a corollary: the concentration frequency distributions for receptors, situated at
various distances along the same radius and from the point source, should have
the same geometrical standard deviation. The general  case of the source with
plume rise will be discussed further when speaking about SO2. (c) that the
observed lognormal distributions for area sources follow directly by summation
of the effect of a large number of similarly distributed point sources.
    At this point, we should distinguish between NO2 emitted at nearly ambient
temperature  and  the SO2  contained  in plumes  of  higher temperature.  The
former contributes  evidence  to  points  (a)  and (b) above.  As these general
affirmations  were obtained from  relatively few data, this further evidence is
useful. The discussion of the S02 results below will add a new contribution.
    Figures 2, 3, 4 and 5 illustrate the above affirmations (a) and (b). Geometric
standard  deviations  for  (unperturbed)  NO2 receptors  in  the  same  radial
orientation are identical and nearly equal  to the geometrical  standard deviation
of corresponding wind velocity just as  required by the corollary to the theorem
cited in the Appendix.
     Figures 2, 3, 4 and 5 are only a fractional sample of the evidence on hand.
Although they are quite convenient for interpolation, as will be shown below,
space does not permit displaying similar figures for all 37 survey points. Instead,
the principal information from them has been summarized in Figure 6,
which displays values of kNO2 = σNO2/σw (σNO2 and σw are the respective
geometrical standard deviations for NO2 concentrations and wind velocity).
Values of kNO2 deviating from unity are found in the western half of the
pattern, where topographical accidents are more pronounced. In the eastern half,
which is level, the behaviour of kNO2 is, to a first approximation, as theoretically
expected.
    As for the SO2, which is definitely associated with a thermal plume rise and
is emitted by 60 to 80 m high stacks, the hypothesis that concentration is
proportional to w^-1 cannot be assumed, and a value of kSO2 = σSO2/σw
different from unity should be expected. The numerical value of kSO2 may
readily be computed by applying a
*See Appendix,  for discussion of the frequency  distribution of wind velocity and the
 properties of the lognormal distribution which are of interest here.

5-4

-------
dispersion formula to a plume rise expression. As one has a rather wide choice
from both sorts of equations, and naturally an even larger one of combinations, it
is easy to find one or more to "prove" that the kSO2 values reported in Figure
7 are correct. For us, insofar as they were empirically observed, they are indeed.
     It has already been mentioned that cumulative frequency diagrams such as
Figures 2, 3, 4 and 5 are convenient for interpolation purposes. At least for level
terrain, the  sequence of (almost) parallel, straight lognormal representations is
related to distance. This is easy to understand,  as concentrations diminish with
distance, or, what is equivalent, a given concentration  occurs with  decreasing
frequency and with  increasing distance.
     This relation between concentration at constant frequency and distance is
illustrated by Figure 8. As most survey points show some (topographical)
singularity, it is not easy to align enough points in order to judge the form of
the regression and the exactitude of fit of some function. Therefore Figure 8
should be considered as an empirical data collection and will be used in the
following as such, as a means of interpolation.
     We may now investigate whether there is a correlation for a given distance
between the frequency of exceeding  some concentration and the frequency of
wind blowing from the source in the direction of the receptor.
     Figure 9 is the wind rose observed between 1961 and  1965 near the plant
site. Frequencies corresponding to the opposite wind directions are the abscissae
of  Figure 10, concerning only receptors at approximately  the same distance,
between 5 and 7 km in this case. The ordinates are the frequencies with which
some given concentration—here 50 μg NO2/m3—is exceeded. It might be
expected that a correlation should exist between these frequencies. At first
approximation, this assumption seems to be verified.
     The observed scatter is due, among other causes, presumably to the lateral
wind turbulence  and its directional change  during a 24-hour  sampling period.
Probably, with shorter sampling times, this scatter would diminish.
Example of Application

    Up to this point, we have presented these experimental data somewhat
differently from the usual tabular or isoconcentration-map form. How far
reaching is this special presentation? Should it be called a model or, more
modestly, a relationship between wind and ambient concentration?
    Suppose we ask for a cumulative concentration frequency diagram for the
point marked  X on Figure  1, a location  at which  no receptor was operated.
From Figure 9 it can be seen  that the frequency of wind blowing from the stacks
in the direction of the receptor is 5.5%. Entering Figure 10 at this abscissa value,
the frequency of 25% is read at the ordinate. This would be the frequency of
                                                                      5-5

-------
exceeding 50 μg NO2/m3 if the receptor were 6±1 km distant from the source.
Actually it is ≈ 3.5 km from the source. The concentration corresponding to this
distance (and direction) is interpolated as shown in Figure 8. Thus 62 μg
NO2/m3 is found. This pair of values (62 μg NO2/m3, 25%) is one point of the
cumulative frequency diagram. As its geometric standard deviation should be
equal to that of the wind, the concentration distribution at this hypothetical
receptor is defined.
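     A minimal sketch of this construction in Python follows (the geometric
standard deviation used below is illustrative only; in practice it would be taken
from the wind-speed distribution as described above):

    import math
    from statistics import NormalDist

    def geometric_mean_from_point(conc, exceed_freq, sigma_g):
        """Geometric mean of a lognormal concentration distribution passing
        through one known point: P[C > conc] = exceed_freq, with geometric
        standard deviation sigma_g."""
        z = NormalDist().inv_cdf(1.0 - exceed_freq)      # standard normal quantile
        return math.exp(math.log(conc) - z * math.log(sigma_g))

    def exceedance_frequency(conc, c_g, sigma_g):
        """P[C > conc] for a lognormal with geometric mean c_g, geometric sd sigma_g."""
        z = (math.log(conc) - math.log(c_g)) / math.log(sigma_g)
        return 1.0 - NormalDist().cdf(z)

    sigma_g = 2.0                                        # assumed, for illustration only
    c_g = geometric_mean_from_point(62.0, 0.25, sigma_g) # the (62 ug/m3, 25%) point
    print(c_g)                                           # geometric mean of the distribution
    print(exceedance_frequency(62.0, c_g, sigma_g))      # recovers 0.25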
     For a thermal source, the same procedure should  be followed except that an
experimental k-value should  also be  determined and stack height,  effluent
temperature and velocity also taken into account. In our case this  is defined only
by Figure 7 and therefore cannot be considered of general validity. Further
details will be supplied in a subsequent paper.
Outlook

     These interpolations and perhaps slight extrapolations have limited uses in
a dense survey network such as the one just discussed. However, the empirical
relations (a) wind direction frequency versus frequency of exceeding an arbitrary
concentration  (Fig.  9)  and  (b) concentration versus  distance  (Fig. 8)  are
generally established. Then a few  points  per diagram,  perhaps three, will  be
sufficient to obtain  concentration versus frequency diagrams for a multiplicity
of geographically scattered points. The only additional information needed is
wind roses and frequency distributions of wind velocities. This seems true for
plane, undisturbed topography. The evidence under consideration is just enough
to say that topographic relief does something to the constants. But for the time
being, we are unable  to express this effect in a general and quantitative way.
    If, with more  evidence at  hand,  the  functional  form  and the general
constants of both these relations can be found, we shall have a modeling method
which will  need very little  meteorological input.  In this way, computer time can
be replaced by equivalent graph reading time, and, what is even more important,
results will be of irrefutable empirical character. Its advantage over purely
statistical models is a greater generality, as cause and effect are more evidently
related in the present model. We hope to continue working in this direction.
Acknowledgement

     I wish to express my gratitude to Mr. P. Bessemoulin and Mrs. T. Menard for
all  their help,  computations,  computer programs, tedious graph  drawing, etc.
involved in the  present work.
 5-6

-------
                Table 1. FREQUENCIES OF STABILITY CLASSES

        Frequency, %
  Unstable  Near-neutral  Stable     Principle of       Year      Reference             Site
   (A,B)     (C,D,E)      (F,G)      classification
    4.2       76.8        19.0       Pasquill-Turner    1961-62   Nester (1966)         Frankfort, G.
   10.1       69.6        20.3       Pasquill              ?      Bryant (1964)            ? , Br.
    5.0       85.7         6.7       t measured         1958-63   Szepesi (1964)        Budapest, Hung.
   14.3       70.6        15.1       Pasquill           1964      Polster-Vogt (1965)   Julich, G.
   10.2       82.2         7.6       Pasquill-Turner    1965-70   Hodin                 Trappes, F.
    7.8       81.4        10.7       Pasquill           1967-71   Bessemoulin (1972)    Rouen, F.
Figure 5-1.  Location of sampling stations around gas sweetening plant at Lacq,
      France.
                                                                         5-7

-------
Figure 5-2. Cumulative frequency diagram for SO2 concentrations at stations
      located in the 289°-308° sector from the source, at Lacq, France.
Figure 5-3. Cumulative frequency  diagram for NO2 concentrations at stations
      located in the 289°-308° sector from the source, at Lacq, France.
5-8

-------
Figure 5-4. Cumulative frequency diagram for SO2 concentrations at stations
      located in the 108°-121° sector from the source, at Lacq, France.
Figure 5-5. Cumulative frequency diagram for NO2 concentrations and wind
      speed at stations located in the 108°-121°  sector from the source at Lacq,
      France.
                                                                        5-9

-------
Figure 5-6. Values of kNO2 = σNO2/σwind for sampling stations at Lacq, France.

Figure 5-7. Values of kSO2 = σSO2/σwind for sampling stations at Lacq, France.
5-10

-------
Figure 5-8. The concentration at three frequencies as a function of distance from
      the source.
      Figure 5-9. Wind rose for 1961 - 1965 at plant site in Lacq, France.
                                                                       5-11

-------
Figure 5-10. Frequency of exceeding 50 μg NO2/m3 as a function of wind
      frequency from the source to the receptor for eight sampling stations.
Figure 5-11. Cumulative frequency of resultant wind speeds for three averaging
      times.
5-12

-------
 References
Aitchison, J. and Brown, J.A.C., (1969): The lognormal distribution. Cambridge
      University Press, p. 9 ff.
Benarie, M., (1969): Le calcul de la dose et de la nuisance du polluant émis par
      une source ponctuelle. Atmospheric Environment. 3: 467.
Benarie, M., (1971): Sur la validité de la distribution logarithmico-normale des
      concentrations de polluant. Proceedings of the 2nd Internat. Clean Air
      Congress, 1970, Washington, D. C., Academic Press, New York, N. Y., pp.
      68-70.
Bessemoulin, P., (1972): Contribution à l'étude de la diffusion des polluants
      gazeux dans l'atmosphère. Thesis, Paris, France.
Bourbon, P., Malbosc, R., Bel, M. J., Roufiol, F. and Rouzaud, J. F., (1971):
      Contribution à la détermination spécifique dans l'atmosphère du dioxyde
      de soufre. Poll. Atm. 52: 271-275.
Brooks,  C.E.P.  and Carruthers, N., (1953): Handbook of statistical methods in
      meteorology, Her Majesty's Stationery Office,  London, England,  chapters
      3 and 11.
Bryant, P.  M., (1964): Methods of estimation of the dispersion of windborne
      material and data to assist in their application. AHSB (RP) R42.
Calder,  K.  L., (1970): Some miscellaneous aspects of current urban pollution
      models. Proc. Symposium on Multiple-Source  Urban Diffusion  Models,
      Research Triangle Park, N. C., APCO Pub. No. AP-86.
Gifford,  F.  A. Jr.,  (1969):   The lognormal  distribution of air pollution
      concentrations.  Air  Resources Atmospheric  Turbulence and Diffusion
      Laboratory, ESSA, Oak Ridge, Tennessee (preprint, 3 p.).
Gould, G., (1961):  The statistical analysis  and interpretation of dustfall data.
      Proc. 54th Annual Meeting Air Pollution Control Association, New York,
      N.Y.
Hodin, M.: Personal Communication.
Nester, K., (1966): Häufigkeitsstatistische Aussagen über Maximalkonzentrationen
      von Schornsteinabgasen auf Grund synoptischer Wetterbeobachtungen.
      Staub. 26: 521.
Polster, G. and Vogt, K. J., (1965): Grundsätze und Untersuchungen zur
      Beurteilung der Ausbreitung radioaktiver Abluft. Protokoll zur
      Informations- und Arbeitstagung im Kernforschungszentrum Karlsruhe,
      Germany, p. 15.
Stern,  A. C.,   (1970):  Utilization  of air pollution models. Proc.  of the
      Symposium  on Multiple-Source  Urban  Diffusion Models,  Research
      Triangle Park, N. C., APCO Pub. No. AP-86.
Szepesi,  P., (1964): Computations of concentrations  around a single source.
      Idojaras. 68: 257.
                                                                     5-13

-------
U.S.D.H.E.W., (1958): Air Pollution Measurements of the National Air Sampling
     Network - Analyses of Suspended Particulates, 1953-57. PHS Publication
     No. 637. p. 245.
Zimmer, C.  E., Tabor,  E. C. and Stern, A. C., (1959): Particulate pollutants in
     the air of the United States. J. Air Pollution Control Association. 9: 136.

Appendix

    It  is fairly  well known  in  meteorology  (Brooks,  et al. (1953)) that the
distribution  of wind velocities is skew in a given direction, with high frequencies
at  low  velocities.  Several  two  or  more  parameter  laws  present  a  fair
approximation of the experimentally observed distributions.
    It  has  been observed that  among two-parameter  skew distributions the
logarithmic  normal function  is an experimentally convenient representation of
the wind velocity (Benarie (1969)).
    At first, it seems singular to use a mathematical function which will not
accommodate the zero value of the variable for wind. Measured with the usual cup
anemometer, wind velocity values almost everywhere show a high percentage of
"calm" periods. Closer scrutiny of sensitive thermoanemometric data seems to
suggest that this abundance of calms is purely instrumental. In reality, very low
velocities occur with finite frequencies, and a true zero does not  physically exist.
As our present  purpose is not to  get  into meteorological arguments, we avoid
this difficulty by defining wind velocity classes as "less than 1 m/sec."  (the
starting point of the anemometer)  and by including in this class all observations
between 0 and  1  m/sec. The fraction of observed "calms" is  proportionately
attributed to each directional  frequency.
    A  second difficulty arises from the fact that (except for specially conceived
survey networks which we do not possess) wind data are from meteorological
stations, following international meteorological conventions (i.e., one
observation every three hours), while pollutant concentration data are
integrated for shorter or longer periods (24  hours in our case).  Figure 11 which
presents the cumulative frequency distributions of: (a) 3-hour,  (b) 24-hour and
(c) 1-week wind vectors from the same station, shows that the error  committed
by using (a) instead of (b) is slight.  Anyway,  this is a minor point, as the 24-hour
wind vector, which should be physically better justified, can  be obtained easily
from the original data by a minor computational program.
    This rather lengthy argument about the approximation of the observed
distribution of wind velocity frequency by a lognormal function was necessary
because of the interesting reproductive properties of this two-parameter
distribution (Aitchison and Brown (1969)), which are the immediate
consequences of those for the normal distribution.
    Theorem: if w is Λ(wg, σ), i.e., lognormally distributed with geometric mean
wg and geometric standard deviation σ, and b and c > 0 are constants, then c·w^b
is again lognormal, with geometric standard deviation σ^|b|.
    This theorem implies the corollary: if w is Λ(wg, σ), then w^-1 is Λ(1/wg, σ);
in particular, w and w^-1 have the same geometric standard deviation σ.
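    The corollary is easily checked numerically. The following Python sketch
(with arbitrary, illustrative parameters for the wind distribution) simulates a
lognormal wind speed w and verifies that w and w^-1 have the same geometric
standard deviation:

    import math
    import random
    import statistics

    random.seed(1)
    mu, sigma = math.log(4.0), 0.5                    # parameters of log w (illustrative)
    w = [random.lognormvariate(mu, sigma) for _ in range(100000)]

    def geometric_sd(values):
        """exp of the standard deviation of the logarithms."""
        return math.exp(statistics.stdev(math.log(v) for v in values))

    print(geometric_sd(w))                            # about exp(0.5) = 1.65
    print(geometric_sd([1.0 / v for v in w]))         # essentially the same value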

5-14

-------
DISCUSSION
Donald Rote: I'm not completely sure but it seems to me that your approach
depends very heavily upon having a uniform wind field. If you have topographic
features that in some way influence the wind field, you will have differences at
the same time between wind directions at the source, and at a given receptor. As
a consequence, this will  greatly distort your capability of generating curves  of
constant percentile. Could you comment on that please.
Benarie: I fully agree with you. The fact, in Figure 7 I think, of having ratios
different from the expected value of 1 in the western part of the pattern, and
about 1 in the eastern part, which is level, illustrates your point very well. But
survey data are very expensive; I had to make do with the data I had, and these
were the meteorological data available. The correct experiment to verify this would be
to have had wind vanes at at least 8 stations.  Then the conclusion would  be
immediate or almost immediate. I agree fully with your point.
Harold Neustadter: Have you had an opportunity yet to attempt any internal
check on the validity of your conclusion? Namely, taking three or four of your
receptors and seeing if you can generate the results of your dense network
within the set you already have?
Benarie: Sure,  I  did. That was the first check, and it was as  reliable as the
receptors  and measurement results. You know, of course, that manual chemical
and  analytical methods  are good  to, say, plus or minus 20 percent.  I can't
pretend more.
Singer: Work like this is being done by Brookhaven National Lab where they are
studying a network outside of New York City. A paper was just presented by Gil
Raynor at the Philadelphia  meeting where he had concentration vs.  distance
from New York City out to a hundred kilometers and it is very similar to yours.
Predictions were done very similar to yours, and it is related to Frank Gifford's
paper  this morning.  Using the lognormal distribution, the predictions worked
very well as long as you stayed  near the mean. But when you  went to  the
extremes, if you tried to predict the  extremes near ninety-nine percent,  which is
needed for many problems, the whole system fell apart, while near the means it
worked very well.
Benarie: It's quite a general statistical property that if you don't have
infinite samples, then at the ends of the sample distribution you go wrong. That's
sure. One more point, I stressed that I am interested here in long-time means,
and as you  have seen in the first table,  I neglected the stable and  the unstable
situations, saying that the mean is  mainly influenced by the 75% of neutral
situations. I know that I can't do anything with the extremes.
Singer: It's true,  but I  know  the normal situation. People will  then take your
curve and  extrapolate it to 99 percent.
                                                                     5-15

-------
Benarie: No, they shouldn't do that.
Arnold Court:  The  apparent  relation  between  the  distributions  of  the
concentrations and the wind speeds may be valid, but this does not mean that
either  is  necessarily  lognormal.  For one thing,  we  as  meteorologists and
climatologists cannot accept lognormality for wind  speeds. By the argument
which  the  speaker made earlier in  the discussion, we are looking for the most
simple  relation.  Winds  are  basically  three-dimensional  vectors.  The third
dimension, up, is generally  one or two orders of magnitude less than  the two
horizontal  vectors, so we tend to ignore it. However our general attitude toward
wind  is that we have two  orthogonal components, and we represent it by a
bivariate distribution. We generally accept the bivariate normal largely in default
of any other bivariate distribution that we can handle. If winds and components
are bivariate normal, then the wind speed itself, independent of direction, has a
Chi distribution of two degrees of freedom, also called the Rayleigh distribution.
Now  this  is quite similar  in  appearance to  a lognormal, but is a  different
distribution. On the other hand if you accept  lognormality for wind speed, you
have a very difficult time deriving the distribution for winds by components.
Therefore  I think that if the speaker's argument holds that the distributions of
concentrations  and  wind   speeds  must be  similar,  this  indicates  that
concentrations also may have a Rayleigh rather than a lognormal distribution.
Benarie: Thank you very much and mostly I agree with you, Mr. Court. Firstly,  I
stressed one of your points  in my appendix which I  didn't read here. Normally
it's known that wind  speed having  lots of  zeros  is not  a function to  be
represented by a lognormal. I am asking the meteorologists present here if they
can  provide  me any data.  I  have made some experiments with  a  sensitive
thermistor  anemometer in a wind field. Because it's  not a cup anemometer, it
registers lots of values down  near zero. It seems there are values everywhere. As I
told already  at the end of  Mr. Gifford's paper,  I am looking for a convenient
engineering fit and an easy mathematical manipulation and not a theoretical
explanation.  Lognormal is good for me, but the argument is open as to how far
it's physically good, and I leave it open.
Joseph Knox: I would like to ask you a question, if I may, about two of the
figures pertaining to direction 108° to 121° in regard to pollutants SO2 and
N02. These figures have  different slopes for the pollutants on lognormal paper,
and the wind is shown as being approximately a  lognormal function paralleling
the NO2 distribution. Since the slopes for these two pollutants are different, it
doesn't parallel the SO2 distribution.
Benarie: It should not be. I stressed in the paper that for the NO2, which is a
non-thermal emission, parallelism is required. For a thermal emission, if we put
concentration against distance with the parameter of wind speed, by a
combination of the effective stack height with a formula like Briggs', and a
diffusion formula, you get .... (writing on board) things like that. In this case
the concentrations have a slope in logarithmical representation. The exponent is
-1 only when there is no thermal elevation. With thermal elevation it is different
 5-16

-------
from 1. But I stress the point that you can choose a chimney thermal elevation
formula which fits just the numerical value which gives you a good slope.
Knox:  My  point  is  this,  I  would also  be afraid of chemical reactivity or
photochemical  reactivity  affecting these distributions.  As noted  by Larsen
several years ago, the pollutants with the steepest slopes for the largest standard
geometrical  deviations on lognormal paper are the most reactive pollutants. And
so,  I really  want to comment that I see some  cause for caution about dealing
with photochemical reactive pollutants in this manner.
Benarie: It has to be remembered; your argument is quite valid. I asserted up to
now that only the thermal rise is a cause of the variation of this ratio from 1.
Another could be a sink, a reaction. Because I cannot yet give a quantitative
evaluation or a theoretical explanation of the differing values, I note them only,
so any tentative explanation is good.
                                                                      5-17

-------

-------
              6. AVERAGING TIME AND MAXIMA
               FOR DEPENDENT OBSERVATIONS
                      RICHARD E. BARLOW

   Department of Industrial Engineering and Operations Research
                   and Department of Statistics
            University of California, Berkeley, California

                               and

                   NOZER D. SINGPURWALLA

                Department of Operations Research
 The George Washington  University, Washington, District of Columbia.


 Introduction


 Monitoring Air Pollutant Concentrations

    Under the Continuous Air  Monitoring  Program  of  the Environmental
 Protection Agency, pollutant concentrations are punched into a computer tape
 every five minutes. Let t1, t2, . . ., tk, . . ., tn denote the instants of time, spaced
 five minutes apart, at which concentrations of a certain pollutant, say xt1, xt2, . . .,
 xtk, . . ., xtn, are recorded on the tape (Larsen (1969)).
     We assume, for now, that the observations represent a time series in which
 the successive observations are highly correlated. Consider averages of length k,
 where k ≪ n.
    For  purposes of evaluating  air quality,  it is  important to know  the
                               6-1

-------
probability of maximum pollutant concentrations exceeding state standards
which are stated for various averaging times. Let

        ηk,n =    max     k^-1 ( xti + xti+1 + . . . + xti+k−1 )
               1≤i≤n−k+1

We are interested in obtaining the distribution of ηk,n for k moderate and n
large.
A Survey of Results Assuming Independence

     If the sequence of observations xti , i = 1, 2, . . ., n were independent, as
was assumed by Barlow (1972) and Singpurwalla (1972), we could use extreme
value theory to determine the limiting distribution of ηk,n as a function of the
averaging time k. Under the hypothesis of independence, it is easy to verify that
when the distribution of pollutant concentration, F, is assumed to be either a
normal, a lognormal, a gamma or a Weibull,

        lim  P[ (ηk,n − βk,n)/αk,n ≤ x ]  =  exp(−e^−x)  =  Λ(x) ,    −∞ < x < ∞        (1)
        n→∞

exists and is nondegenerate, where αk,n > 0 and βk,n are a sequence of norming
constants.
     Let G(x) = 1 − e^−x for x > 0 and

                Rk(x) = G^-1 Fk(x) = −log [ 1 − Fk(x) ]

where Fk is the k-fold convolution of F with itself. Gnedenko (1943) (cf.
Marcus and Pinsky (1969)) showed that the norming constants could be
expressed as

                βk,n = Rk^-1 (log n)                                          (2)
and
                αk,n = Rk^-1 (1 + log n) − Rk^-1 (log n)                      (3)
6-2

-------
Hence for large n,

                P[ ηk,n ≤ x ]  ≈  Λ( (x − βk,n)/αk,n )                        (4)

βk,n is the location parameter and also approximately the 37th percentile of
ηk,n, and thus provides a convenient way of summarizing ηk,n.
     The main difficulty in using βk,n occurs in computing the convolution Fk.
In the case where F is the gamma (normal) distribution, then of course Fk is
again a gamma (normal) distribution and there is no problem in computing
Rk(x).
     For large n and k ≪ n, Gurland (1955) has approximated βk,n and αk,n
when F is the gamma distribution, i.e.,

        F(x) = ∫(0 to x)  u^(λ−1) e^(−u/θ) / ( Γ(λ) θ^λ )  du ,    x > 0.

For this case

        βk,n ≈ (θ/k) [ log n + (kλ − 1) log log n − log Γ(kλ) ]    and    αk,n ≈ θ/k        (5)
If we let

        F(x) = Φ[ (x − μ)/σ ] ,    −∞ < x < ∞,

where

        Φ(y) = ∫(−∞ to y)  e^(−u²/2) / √(2π)  du ,

that is, if F is a normal distribution with mean μ and variance σ², then we can
immediately verify (cf. Cramer (1951), pp. 374-375) that for large n

        βk,n ≈ σ k^(-1/2) (2 log n)^(1/2) + μ     and     αk,n ≈ σ k^(-1/2) (2 log n)^(-1/2)        (6)
                                                                6-3

-------
     Barlow (1972), Corollary 4.3, has obtained bounds on βk,n when F is
continuous, F(0) = 0, and R(x) is convex (concave). He has shown that βk,n is
bounded above and below in terms of F^-1, G and Γk (Equation 7), where

        Γk(x) = 1 − e^−x [ Σ(j=0 to k−1) x^j / j! ]    for x > 0

is the gamma distribution and G = Γ1.
     For example, if we let

        F(x) = 1 − exp[ −(x/δ)^(1/b) ] ,    x ≥ 0,

that is, if F is a Weibull distribution with scale parameter δ and shape parameter
1/b, then for large n Equation 7 gives

        k^-1 δ (log n)^b  ≤  βk,n  ≤  k^-b δ (log n)^b                        (8)

when 0 < b < 1.
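     A minimal sketch of evaluating these bounds in Python (the parameter
values below are illustrative only, not taken from any data in this paper):

    import math

    def weibull_beta_bounds(k, n, delta, b):
        """Equation 8: bounds on beta_{k,n} for a Weibull F with scale delta
        and shape 1/b, valid for 0 < b < 1."""
        base = delta * math.log(n) ** b
        return base / k, base / k ** b          # (lower bound, upper bound)

    print(weibull_beta_bounds(k=8, n=2160, delta=5.0, b=0.5))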
 Motivation and Summary

     Since air pollutant data are often correlated, as will be illustrated in the next
 section, the assumption of independence for the sequence xti , i = 1, 2, . . ., n
 is clearly incorrect. We can overcome this difficulty if it is reasonable to assume
 that the sequence of observations [xti] is associated. Association is a
 strengthening of the concept of positive correlation and is defined and discussed
 in the next section. In that section we show that certain air pollutant data can be
 modelled by an autoregressive process of suitable order. In the next section and
 the one following it, we show that the extreme value approximation function
 given by Equation 4 is a lower bound on the distribution function of the
 maxima of averages of associated observations. Based on this result, βk,n (or its
 upper bound) is an upper bound on the 37th percentile of the distribution of
 the maxima of averages of associated observations.
6-4

-------
 Time Series Models for Air Pollutant Concentrations
 Preliminaries

     Suppose that n observations

        xt1 , xt2 , . . . , xtk , . . . , xtn

which are generated sequentially in time represent a discrete time series. We
regard these observations as a particular realization of a stochastic process.
     We focus attention on those processes which are strictly stationary. For
such processes, the joint distribution of any set of observations is unaffected by
shifting all the times of observation forward or backward by any integer
amount k. The mean μ of the process can be estimated as

        x̄ = (1/n) Σ(i=1 to n) xti

and the variance σ² of the process can be estimated as

        σ̂² = (1/n) Σ(i=1 to n) ( xti − x̄ )²
     The covariance between Xti and Xti+k is called the autocovariance at lag k,
and is defined as (capital Xti's are random variables)

        γk = Cov[ Xti , Xti+k ] = E[ (Xti − μ)(Xti+k − μ) ]

For a stationary process, the autocorrelation at lag k is defined as

        ρk = γk / γ0

     The most satisfactory estimate of ρk is given as (cf. Box and Jenkins
(1969))

                              rk = ck / c0                                    (9)
where

        ck = (1/n) Σ(i=1 to n−k) [ xti − x̄ ][ xti+k − x̄ ] ,    k = 0, 1, 2, . . ., K.

    In practice, to obtain a useful estimate of the autocorrelation function, we
would need at least 50 observations and the estimated autocorrelations rk would
be calculated for k = 0,1,2, . .  ., K, where K is not larger than n/4. (cf. Box and
Jenkins (1969), p. 33.)
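     A minimal sketch of this estimate in Python (the series below is illustrative
only and much shorter than the 50 observations recommended above):

    def sample_autocorrelation(x, K):
        """r_k = c_k / c_0 of Equation 9 for lags k = 0, 1, ..., K."""
        n = len(x)
        xbar = sum(x) / n
        def c(k):
            return sum((x[i] - xbar) * (x[i + k] - xbar) for i in range(n - k)) / n
        c0 = c(0)
        return [c(k) / c0 for k in range(K + 1)]

    # e.g. daily high-hour averages of a primary pollutant (illustrative values)
    series = [3.1, 2.8, 3.5, 4.0, 3.7, 3.2, 2.9, 3.3, 3.8, 4.1, 3.9, 3.4]
    print(sample_autocorrelation(series, K=3))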
                                                                  6-5

-------
 Associated Processes and Air Pollutant Measurements

     Random variables X1, X2, . . ., Xn are said to be associated if

        Cov[ Γ(X), Δ(X) ]  ≥  0

for all pairs of binary, increasing functions Γ and Δ, where

        X = ( X1, X2, . . ., Xn ).

(Binary functions are 0 and 1 valued functions.) Essentially, this is a
strengthening of the concept of nonnegative correlation. The definition is due to
Esary, Proschan and Walkup (1967), who also prove many important properties
of associated random variables. For example, two binary random variables X and
Y are associated if and only if

                             Cov (X, Y) ≥ 0
This is not true for arbitrary random variables. They also show that independent
random variables are associated.
    It  follows  easily from  the definition  that  increasing  functions  (not
necessarily binary)  of associated  random variables are associated. Hence if air
pollutant measurements
        Xt1 , Xt2 , . . . , Xtn

are associated, then so are their averages.
     Now let [Xτ ; τ ∈ D] be a stochastic process, where D = [1, 2, 3, . . .] or D =
[0, ∞], for example. The process is said to be associated if, for all [τ1, τ2, . . .,
τn] ∈ D (the τi's need not be equally spaced) and all n ≥ 1, the random variables
Xτ1 , Xτ2 , . . ., Xτn are associated. The definition can be found in Esary and Proschan (1970). They
study special performance processes of interest in reliability theory. It follows
from the definition of an associated process that the autocorrelation function
ρ(t) ≥ 0. However, ρ(t) ≥ 0 does not of course imply in general that the process
is associated. Additional restrictions on ρ(t) are required, in general, to assure
association.
    Air pollutant  concentrations follow a diurnal cycle which  results  in an
autocorrelation which may assume both positive and negative values. Hence we
cannot expect hourly averages to be associated. If we record only the high-hour
daily  average the association  concept is more reasonable if we also  confine
observation to a single season. Figure 1  is a plot of oxidant data for Livermore,
California covering the period June-August 1970. The sample autocorrelation
shown in Figure 2 shows the  existence of a 6 - 8 day weather cycle phenomenon.
 6-6

-------
 Since  the autocorrelation shows negative values, it is unreasonable to assume
 oxidant values are associated  in time  according  to our definition. However,
 oxidant is  a secondary  pollutant and highly dependent on  meteorological
 conditions.  Figures 3 and 4 are plots of carbon monoxide data for Livermore,
 California covering the period June-August, 1970.  The autocorrelation seems to
 remain  positive  within  the range  of sampling  error. The assumption  of
 association  may  be  reasonable for primary pollutants over a time  period not
 exceeding a season.  Also, the  less  dependent the pollutant is on the weather
 cycle,  the more likely the assumption  of association will be valid. As we shall
 see, the association  assumption, when valid, will  enable us to obtain useful
 bounds on quantities of interest.
     Ash, Bloomfield and  McNeil (1972) have used a fourth root transformation
 on S02 data. The resulting data was modelled using a Brownian motion process.
 Such processes have  independent increments and are always associated, since
 independent random variables are associated.
 The Autoregressive Process

     Most of the time series occurring in practice can be reasonably well explained
 by an autoregressive process. In this section, abstracted from Box and Jenkins
 (1969), we review some well-known properties of such processes.
     The models that are usually employed in time series analysis are based on
the idea that a time series in which the successive values Xt1, Xt2, . . . are highly
dependent can be regarded as generated from a series of independent shocks
[ati], i = 1, 2, . . . . These shocks are random drawings from a fixed distribution,
usually assumed normal, and having a mean zero and a variance σa². The [ati]
process is transformed to the [Xti] process by what is known as a linear filter.
     A [Xti] process which is extremely useful for representing certain
practically occurring situations is called the autoregressive process. Let X̃ti =
Xti − μ , i = 1, 2, . . . . Then the process

        X̃ti = φ1 X̃ti−1 + φ2 X̃ti−2 + . . . + φp X̃ti−p + ati

is called an autoregressive process of order p. In the next section we establish
conditions under which an autoregressive process is associated.
     If we define a backward shift operator B as

                              B Xtn = Xtn−1

then the above autoregressive process can be written as

                              φ(B) X̃tn = atn

where φ(B) = (1 − φ1 B − φ2 B² − . . . − φp B^p).
    The equation 0(B) = 0 is called the characteristic equation of the process.
    Several properties of the autoregressive process have been given by Box and
Jenkins (1969). We summarize below a few pertinent ones.
                                                                      6-7

-------
    (a) An autoregressive process is stationary if the roots of its characteristic
equation lie outside the unit circle.
     (b) The autocorrelation function ρk of an autoregressive process satisfies a
difference equation whose general solution is

        ρk = A1 G1^k + A2 G2^k + . . . + Ap Gp^k ,

where the Gi^-1 are the roots of the characteristic equation. Thus, the
autocorrelation function of an autoregressive process tails off either
exponentially, or as a mixture of exponentials and damped sine waves, depending
on the nature of the roots Gi^-1 (or equivalently, the parameters φi).
     (c) If we let ρp = (ρ1, ρ2, . . ., ρp)' denote the vector of autocorrelations and
Pp the p × p matrix whose (i, j) element is ρ|i−j| (with ρ0 = 1), then

                              φ = Pp^-1 ρp

can be used to obtain what are known as the Yule-Walker estimates of the
parameters φj, by replacing the ρi by their estimates ri.
     (d) The partial autocorrelation function of an autoregressive process is
defined as

                        φkk = det( Pk* ) / det( Pk )                          (10)

where Pk is the k × k matrix of autocorrelations [ρ|i−j|] and Pk* is Pk with its
last column replaced by (ρ1, ρ2, . . ., ρk)'.
 6-8

-------
For an autoregressive process of order p, the partial autocorrelation function
φkk will be non-zero for k less than or equal to p, and will be zero for k greater
than p.
     Estimates of the partial autocorrelation function can be obtained by using
rk in place of ρk. If φ̂kk is an estimator of φkk then

        Var( φ̂kk ) ≈ 1/n ,    k ≥ p + 1 ,

and this can be used to test if the partial autocorrelation function has a cut-off
at lag (p + 1).
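     A minimal sketch of Equation 10 in Python, applied to the sample
autocorrelations quoted below for the carbon monoxide data (the determinant-
ratio form used here is the standard one and is consistent with those quoted
values, but the exact notation is an assumption):

    import numpy as np

    def partial_autocorrelation(r, k):
        """phi_kk as a ratio of determinants (Equation 10); r[j-1] is the
        lag-j sample autocorrelation r_j."""
        rho = [1.0] + list(r)                           # rho[j] = r_j, rho[0] = 1
        P = np.array([[rho[abs(i - j)] for j in range(k)] for i in range(k)])
        P_star = P.copy()
        P_star[:, -1] = rho[1:k + 1]                    # last column -> (r_1, ..., r_k)'
        return np.linalg.det(P_star) / np.linalg.det(P)

    r = [0.736, 0.676, 0.560, 0.461]                    # carbon monoxide estimates from the text
    print(partial_autocorrelation(r, 2))                # about 0.29
    print(partial_autocorrelation(r, 3))                # about -0.02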
Examples

     As an example, we consider an autoregressive process of order two. Most of
the time series commonly occurring in practice can be described by this process.
The process can be written as

        X̃ti = φ1 X̃ti−1 + φ2 X̃ti−2 + ati

     For stationarity, the roots of 1 − φ1 B − φ2 B² = 0 must be outside the unit
circle. This implies that the parameters φ1 and φ2 must lie in the triangular
region given by

        φ2 + φ1 < 1 ,
        φ2 − φ1 < 1 ,
        −1 < φ2 < 1 .

If G1^-1 and G2^-1 are the roots of the characteristic equation, the autocorrelation
function is

        ρk = A1 G1^k + A2 G2^k

     When the roots are real (i.e., φ1² + 4φ2 ≥ 0), the autocorrelation function
consists of a mixture  of damped exponentials. Additionally, if <^>1 and $2 are
both positive, the process is associated and the autocorrelation function remains
positive as it damps out. If the roots are complex the autocorrelation function
                                                                       6-9

-------
damps out sinusoidally. A necessary condition for the association of an autore-
gressive process of order two with positive coefficients is that its autocorrelation
function remain positive as it damps out. The coefficients φ_1 and φ_2 can be
estimated using the relationships

        φ̂_1 = r_1(1 − r_2)/(1 − r_1²),    φ̂_2 = (r_2 − r_1²)/(1 − r_1²).
Application to Carbon Monoxide Data
    We estimate the autocorrelation function ρ_k of the carbon monoxide data
given in Figure 3 using Equation 9. The estimates are r_1 = .736, r_2 = .676,
r_3 = .560, r_4 = .461, . . . . The partial autocorrelation function φ_kk is estimated
using Equation 10, replacing the ρ_i's by the r_i's, for k = 2 and k = 3. These
estimates are φ̂_22 = .294 and φ̂_33 = −.018. Since φ̂_33 ≈ 0, it is reasonable to
conclude that the carbon monoxide data can be reasonably well described by an
autoregressive process of order 2.
     Estimators of the parameters of the autoregressive process are obtained from
the Yule-Walker relationships as φ̂_1 = .520 and φ̂_2 = .293, which implies φ̂_i > 0.
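     The arithmetic above is easy to script. The following Python sketch is a minimal
illustration (the series name co and the helper functions are hypothetical; the
autocorrelation estimator is the usual lag-k sample form, which is assumed to match
Equation 9). It computes the sample autocorrelations r_k, the partial autocorrelations
φ_kk by the Durbin-Levinson recursion, and the Yule-Walker estimates for an AR(2) model.

    import numpy as np

    def sample_acf(x, max_lag):
        """Lag-k sample autocorrelations r_1, ..., r_max_lag of a series."""
        x = np.asarray(x, dtype=float) - np.mean(x)
        c0 = np.sum(x * x)
        return np.array([np.sum(x[k:] * x[:-k]) / c0 for k in range(1, max_lag + 1)])

    def pacf_from_acf(r):
        """Partial autocorrelations phi_kk from r_1, r_2, ... via the
        Durbin-Levinson recursion."""
        p = len(r)
        phi = np.zeros((p + 1, p + 1))
        pacf = np.zeros(p)
        phi[1, 1] = pacf[0] = r[0]
        for k in range(2, p + 1):
            num = r[k - 1] - np.dot(phi[k - 1, 1:k], r[k - 2::-1])
            den = 1.0 - np.dot(phi[k - 1, 1:k], r[:k - 1])
            phi[k, k] = pacf[k - 1] = num / den
            phi[k, 1:k] = phi[k - 1, 1:k] - phi[k, k] * phi[k - 1, k - 1:0:-1]
        return pacf

    # r-values quoted in the text for the carbon monoxide data; in practice they
    # would come from sample_acf(co, 4) for the hourly series `co`.
    r = np.array([0.736, 0.676, 0.560, 0.461])
    phi1_hat = r[0] * (1 - r[1]) / (1 - r[0] ** 2)    # Yule-Walker AR(2), ~0.520
    phi2_hat = (r[1] - r[0] ** 2) / (1 - r[0] ** 2)   # ~0.293
    print(phi1_hat, phi2_hat, pacf_from_acf(r))

With the r-values quoted in the text this reproduces φ̂_22 ≈ .29 and the AR(2)
estimates given above.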
 Lemma 1:

     If φ_i > 0 (i = 1, 2, . . ., p), then the autoregressive process of order p is
 associated. (See also Theorem 2.)
 Proof:

     Esary, Proschan  and Walkup  (1967)  prove  that  independent  random
 variables are associated and also that increasing functions of associated random
 variables are associated. Hence
        X_{t_1} = a_{t_1},
        X_{t_2} = φ_1 a_{t_1} + a_{t_2},

and, more generally, X_{t_1}, . . ., X_{t_n} (each an increasing function of
a_{t_1}, . . ., a_{t_n} when the φ_i are positive)
are associated. The lemma follows by induction. ||
    Clearly, it  follows from the lemma and  the previous remarks that an
autoregressive process of order 1 is associated if and only if φ_1 > 0.
Bounds on the Distribution of the Maxima of Averages for Stationary
Associated Processes

    Let X_{t_1}, X_{t_2}, . . ., X_{t_k}, . . ., X_{t_n} be an associated process, and
let η̂_{k,n} denote the maximum of the averages of k consecutive observations.
Lemma 2:
    Let [X_{t_1}, X_{t_2}, . . .] be a stationary associated process with marginal
distribution F, and β_{k,n} and α_{k,n} as defined in the introductory section. If F_k is such
that Equation 1 holds, then

        P[(η̂_{k,n} − β_{k,n})/α_{k,n} ≤ x] ≥ exp(−e^{−x})
                                                                   6-11

-------
for sufficiently large n.
Proof:

     Let X_1, X_2, . . ., X_n be associated random variables. Esary, Proschan and
Walkup (1967, pp. 1472-73) prove that

        P[Max(X_1, X_2, . . ., X_n) ≤ x] ≥ ∏_{i=1}^{n} P(X_i ≤ x)
-------
standard will be violated. The standard deviation of this data, σ, was estimated as
2.8. Taking our total sample to be hourly observations for 90 days, we have n =
2160, and considering averages of length 8 (because of the 8-hour averaging
time), we take k = 8. Thus
        x = [20 − 5.16 − 8^{−1/2}(2.8)(2 log 2160)^{1/2}] / [8^{−1/2}(2.8)(2 log 2160)^{−1/2}] = 43.39,

and hence

        P[η̂_{8,2160} < 20] ≥ exp(−e^{−43.39}) ≈ 1,
and this bears out the fact that there were no violations of the specified standard.
     In  the light of the  observed data it appears that the specified standard is
 unreasonably high.

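    The bound used above is simple to evaluate for other levels, averaging times, and
sample sizes. The Python sketch below assumes the normalizing constants β_{k,n} and
α_{k,n} have the Gaussian form implied by the numerical example; that is an assumption
about Equation 6, which is not reproduced here.

    import math

    def max_average_bound(level, mean, sigma, k, n):
        """Lower bound on P[all k-hour averages in n hours stay below `level`]
        for a stationary associated process, using the normalization
        beta = mean + sigma * k**-0.5 * (2 log n)**0.5,
        alpha = sigma * k**-0.5 * (2 log n)**-0.5
        (assumed form of Equation 6)."""
        root = math.sqrt(2.0 * math.log(n))
        beta = mean + sigma * root / math.sqrt(k)
        alpha = sigma / (math.sqrt(k) * root)
        x = (level - beta) / alpha
        return math.exp(-math.exp(-x))

    # Carbon monoxide example from the text: 90 days of hourly data (n = 2160),
    # 8-hour averages, sigma = 2.8, mean = 5.16, standard of 20 ppm.
    print(max_average_bound(20.0, 5.16, 2.8, 8, 2160))   # ~ 1.0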
    In general, if F is difficult to convolute and if R(x) = −log[1 − F(x)] is
convex, then the bound noted in Equation 7 applies. Let ξ_{.37} be the 37th
percentile of P[η̂_{k,n} ≤ x]. Thus, even in the presence of association,
ξ_{.37} ≤ β_{k,n}.
Additional Associated Processes

    It  is difficult, in  general, to verify that a process is associated from the
definition  of association. Another useful concept which implies association  is
that of conditionally increasing in sequence.
Definition:

  Random variables X_1, X_2, . . ., X_n are conditionally increasing in sequence if

        P(X_i > x | X_1 = x_1, X_2 = x_2, . . ., X_{i−1} = x_{i−1})

is increasing in x_1, x_2, . . ., x_{i−1}, for i = 1, 2, . . ., n.
    A stochastic process is conditionally increasing in sequence if any subset of
random variables based on  the process is conditionally increasing in sequence.
                                                                  6-13

-------
    This concept is due to Esary and Proschan (1968) who also proved the
following theorem.
Theorem 1: (Esary and Proschan)


     If X_1, X_2, . . ., X_n are conditionally increasing in sequence, then X_1, X_2, . . .,
X_n are associated.
     The concept has an immediate application to autoregressive processes of
order p, which, according to Theorem 2, are associated if φ_i > 0, for i = 1, 2, . . ., p.

Theorem 2:

    Autoregressive processes of order p are conditionally increasing in sequence
if and only if φ_i > 0, i = 1, 2, . . ., p.
Proof:

    For an autoregressive process of order p, it is easy to verify that

        P[X_{t_n} > x | X_{t_{n−1}} = x_{t_{n−1}}, . . ., X_{t_{n−p}} = x_{t_{n−p}}]

is increasing in x_{t_{n−1}}, . . ., x_{t_{n−p}} if and only if φ_i ≥ 0 for i = 1, 2, . . ., p. ||


Lemma 3:
    If [X_t; t ∈ D] is a Markov process and if

        P(X_t > x | X_s = y)

is increasing in y for s < t, then the process is associated.


Proof:

    It  is  sufficient to prove that the process  is  conditionally increasing in
sequence,  i.e.,

        P[X_{t_n} > x | X_{t_1} = x_{t_1}, . . ., X_{t_{n−1}} = x_{t_{n−1}}]  is increasing in x_{t_1}, . . ., x_{t_{n−1}}.

But the Markov property implies that this equals P[X_{t_n} > x | X_{t_{n−1}} = x_{t_{n−1}}],
which is increasing in x_{t_{n−1}} by hypothesis; this completes the proof.
6-14

-------
Theorem 3:

    A stationary, Gaussian process with autocorrelation function
        ρ(t) = ∫_0^∞ e^{−λt} dH(λ)

for some distribution, H, on [0, ∞), is associated. (Note that time may be either
continuous or discrete.)
Proof:

    It is well known that a stationary Gaussian process is completely determined
by its autocorrelation function together with the marginal mean and variance.
By Lemma 3, the stationary Gaussian Markov process with autocorrelation

        ρ(t) = e^{−λt},    λ ≥ 0,

is associated. (Note that P[X_t > x | X_s = y] = ∫_{x−ρy}^{∞} exp[−u²/(2(1−ρ²))] du / [2π(1−ρ²)]^{1/2},
which is increasing in y.)
    To complete the proof, let p_i > 0, i = 1, 2, . . ., k, with Σ_{i=1}^{k} p_i = 1. Also
specify λ_i > 0, i = 1, 2, . . ., k. Let [X_i(t); t ≥ 0] be a stationary Gaussian process
with autocorrelation

        ρ_i(t) = e^{−λ_i t}

for i = 1, 2, . . ., k. Assume that the k processes are mutually independent. Since
each process is associated, it follows that the process
        Y_t = Σ_{i=1}^{k} √p_i X_i(t)
is associated. (Recall that increasing functions of associated random variables are
associated.) Also

        Cov[Y(t), Y(t+s)] = Cov[ Σ_{i=1}^{k} √p_i X_i(t), Σ_{i=1}^{k} √p_i X_i(t+s) ] = Σ_{i=1}^{k} p_i e^{−λ_i s}.
By a limiting argument we can show that if
        ρ(t) = ∫_0^∞ e^{−λt} dH(λ)
                                                                 6-15

-------
and the process is a stationary Gaussian process, then the process is associated. ||
    The previous theorem has useful applications to data which is believed to be
generated by a stationary Gaussian process. If we can approximate the sample
autocorrelation function by a convex combination of exponentials then this is
evidence that the process is associated.
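    One informal way to carry out this check is to fit the sample autocorrelation
function by a nonnegative combination of decaying exponentials. The Python sketch
below does this with a grid of decay rates and a nonnegative least-squares fit; both
choices are illustrative and not part of the paper.

    import numpy as np
    from scipy.optimize import nnls

    def exponential_mixture_fit(r, rates=None):
        """Fit r_k ~ sum_i w_i exp(-lambda_i k) with w_i >= 0 over a grid of
        decay rates; a small residual with nonnegative weights is informal
        evidence for the mixture-of-exponentials form of Theorem 3."""
        lags = np.arange(1, len(r) + 1)
        if rates is None:
            rates = np.linspace(0.05, 2.0, 40)            # illustrative grid
        design = np.exp(-np.outer(lags, rates))           # columns exp(-lambda*k)
        weights, residual = nnls(design, np.asarray(r, dtype=float))
        return rates, weights, residual

    # Sample autocorrelations quoted for the carbon monoxide data:
    rates, weights, residual = exponential_mixture_fit([0.736, 0.676, 0.560, 0.461])
    print("residual norm:", residual, "total weight:", weights.sum())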
Discrete State Markov Processes

    An example of a discrete state Markov process is the birth and death process
assuming states [0, 1, 2, . . .]. Such processes, it turns out, are always associated if
the time variable t can assume any non-negative value. When such processes are
restricted to integer time values they of course remain associated. On the other
hand, a random walk process in discrete time with transition matrix
        | b   c   0   0   . . .     |
        | a   b   c   0   0   . . . |
        | 0   a   b   c   0   . . . |
        | .   .   .   .             |

is associated if and only if b² ≥ ac.
    The above remarks follow from
Theorem 4:

     If [X_t; t ∈ D] is a Markov process with transition probability matrix (P_ij(t))
which is totally positive in i and j for all t > 0, i.e.,

        P_{i_1 j_1}(t) P_{i_2 j_2}(t) − P_{i_1 j_2}(t) P_{i_2 j_1}(t) ≥ 0,

then the process is associated. (The time variable may be either continuous or
discrete.) We assume here that i_1 < i_2 and j_1 < j_2.
    Theorem 4 was proved by D. J. Daley (1968).
    Karlin (1968) showed that birth and death processes with state space [0,1,2,
. . .]  always satisfy the conditions of Theorem 4 and hence are associated. Esary
and Proschan (1970) showed that  two-state birth and death processes are asso-
ciated.
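    The total-positivity condition of Theorem 4 is easy to check numerically for a
given transition matrix. The Python sketch below tests the 2 × 2 minor condition;
the truncated random-walk matrix and the values of a, b, c are illustrative choices.

    import numpy as np
    from itertools import combinations

    def is_totally_positive_order2(P, tol=1e-12):
        """Check the Theorem 4 condition: every 2 x 2 minor taken with
        i1 < i2 and j1 < j2 is nonnegative."""
        P = np.asarray(P, dtype=float)
        for i1, i2 in combinations(range(P.shape[0]), 2):
            for j1, j2 in combinations(range(P.shape[1]), 2):
                if P[i1, j1] * P[i2, j2] - P[i1, j2] * P[i2, j1] < -tol:
                    return False
        return True

    # Truncated random-walk matrix with a = 0.2, b = 0.5, c = 0.3, for which
    # b**2 = 0.25 >= a*c = 0.06; the boundary rows are an illustrative choice.
    a, b, c = 0.2, 0.5, 0.3
    P = np.array([[b + a, c, 0.0, 0.0],
                  [a, b, c, 0.0],
                  [0.0, a, b, c],
                  [0.0, 0.0, a, b + c]])
    print(is_totally_positive_order2(P))   # True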
 6-16

-------
 Conclusion

     Our objective in this paper has been to present a new and different approach
 to the analysis of air pollution data, which can be, and perhaps should be,
 modelled as a time series. The results presented here are based on more realistic
 considerations than  those of a similar nature  presented before, and should be
 useful in setting and monitoring air pollution standards.
     Though the primary motivation in  this paper has  been the analysis of air
 pollution data,  the results obtained  here should be of  a more general interest.
 The results on associated stochastic  processes presented in the section headed
 "Associated  Stochastic Processes"  should  have applications in  time  series
 analysis, queueing theory, and reliability theory.
     By showing that the  extreme value distribution  is a lower bound on the
 distribution function of the maxima  of observations generated by an associated
 stochastic process, we have expanded the scope of applications of extreme value
 theory. However, the extreme value approximation may be too conservative in
 many applications.
     Theorem 3 asserts that if [X_t; t ≥ 0] is a stationary Gaussian process and
ρ(t) can be represented as a mixture of exponentials, then

        P[Max(X_{t_1}, X_{t_2}, . . ., X_{t_n}) ≤ x] ≥ ∏_{i=1}^{n} P[X_{t_i} ≤ x].

S. M. Berman [Annals of Mathematical Statistics, Vol. 35, pp. 502-516, (1964)]
has shown that, in general, if EX_t = 0, EX_t² = 1 and EX_t X_{t+n} = r_n, then the difference

        P[Max(X_{t_1}, . . ., X_{t_n}) ≤ x] − ∏_{i=1}^{n} P[X_{t_i} ≤ x]

can be bounded in terms of a two-dimensional normal density with mean vector 0 and
correlation |r_j|. Berman further shows that if either

        lim_{n→∞} r_n log n = 0    or    Σ_{n=1}^{∞} r_n² < ∞,

then the normalized maximum has the limiting extreme value distribution exp(−e^{−x}),
with β_{k,n} and α_{k,n} given by Equation 6 where μ = 0 and σ = 1.
                                                                    6-17

-------
Figure 6-1. Oxidant Concentrations in ppm for Livermore, California, June - August, 1970.
Figure 6-2. Sample Autocorrelation Function for Oxidant Data from Livermore,
     California, June - August, 1970.
 6-18

-------
Figure 6-3. Carbon Monoxide Concentrations in ppm for Livermore, California,
     June - August, 1970.

Figure 6-4. Sample Autocorrelation Function for Carbon Monoxide Data from
     Livermore, California, June - August, 1970.
                                                                     6-19

-------
Acknowledgement

     This research has been partially supported by the Office of Naval Research
under Contract N00014-69-A-0200-1036 and the National Science Foundation
under Grants GP-29123 and GK-23153 with the University of California.
     Research  supported in part by  the Office  of Naval  Research  under
Contract N00014-67-A-0214 Task 001, Project NR 347 020 and the  National
Science  Foundation Institutional Grant GU3287 with  the  George Washington
University, D. C. 20006. This work was begun while the author (N.D.S.) was a
visitor at  the Operations Research Center,  University  of California, Berkeley.
Reproduction in whole or in part  is  permitted for any purpose of the United
States Government.
 References

Ash, D., Bloomfield, P., and McNeil, D. R., 1972: On the Statistical Analysis of
     Air  Pollution  Data.  Department  of Statistics, Princeton University,
     Princeton, N. J., Technical Report 19, Series 2.
Barlow,  R.  E.,   1972:  Averaging  Time  and  Maxima  for Air  Pollution
     Concentrations. Proceedings of  the Sixth  Berkeley  Symposium  on
     Mathematical Statistics and Probability, Vol. VI, pp. 433-442.
Berman, S. M., 1964: Limit Theorems for the Maximum Term in Stationary
     Sequences. The Annals of Mathematical Statistics. 35: 502-516.
Box, G. E. P., and Jenkins, G. M., 1969: Time Series Analysis: Forecasting and
     Control. Holden-Day, Inc., San Francisco, California.
Cramer, H., 1951: Mathematical  Methods of Statistics. Princeton University
     Press, Princeton, New Jersey.
Daley,    D.  J.,   1968:   Stochastically   Monotone  Markov   Chains.  Z.
     Wahrscheinlichkeitsth. 10: 305-317.
Daley, D. J., 1969: Integral Representations of Transition Probabilities and
     Serial Covariances of Certain Markov Chains. J. Appl. Prob. 6: 648-659.
Esary,  J. D., and Proschan, F., 1972: Relationships Among Some Concepts of
     Bivariate  Dependence.  The Annals of Mathematical Statistics. 43:
     651-655.
Esary,  J.  D.,  and Proschan, F.,  1970: A  Reliability  Bound for Systems of
     Maintained,  Interdependent Components.  Journal  of the  American
     Statistical Association. 65: 329-338.
Esary, J. D., and Proschan, F., 1968: Generating Associated Random Variables.
     Boeing Scientific  Research Laboratories, Doc. D1-82-0696.
Esary, J. D., Proschan, F., and Walkup, D. W., 1967: Association of Random
     Variables, With Applications. The Annals of Mathematical Statistics. 38:
     1466-1474.
Gurland, J., 1955: Distribution of the Maximum of the  Arithmetic Mean of
     Correlated Random Variables. The Annals of Mathematical Statistics. 26:
     294-300.

 6-20

-------
Karlin, S., 1968: Total Positivity, Volume I, Stanford University Press, Stanford,
     California.
Larsen, R. I., 1969: A New Mathematical Model of Air Pollutant Concentration
     Averaging  Time and  Frequency.  J. Air Pollution Control Association.
      75/24-30.
                                                                    .j^
Marcus, M.  and Pinsky,  M.,  1969:  On  the Domain of  Attraction of e"e  . J.
     Math. Anal. Appl. 28: 440-449.
Singpurwalla,  N. D., 1972:  Extreme Values from  a  Lognormal  Law with
     Applications to Air  Pollution Problems, Technometrics. 14: 703-711.
DISCUSSION
Don Pack: The question is as follows—in working with a time series of data you
know that it  is contaminated  in various ways  either by the position of the
sampler or otherwise but I'd like to particularly direct the question toward the
instrumental contamination. We have remarked that the tails of the distribution
are of particular interest. Yet, if I understand instrumentation, if the dynamic
range of  the  instrument  is  just about that of the range  of the  pollution
concentrations you can expect, the  maximum error will  occur at the threshold
and  at the very high values—such  things as poisoning bubblers by a spike of
concentration. Can the statistician clean up the time series distribution through
examination of the instrument characteristics so you don't have to examine each
individual observation for its validity?
Singpurwalla: If  I understand your question correctly, I would say possibly yes.
Pack: Do  you know how long it takes to go through each one of these? And yet,
you  know that there are errors in  here. Can you establish the probability that
the observation is real and not instrumental?
Singpurwalla:  There  are techniques, outlier  techniques, or there are probably
techniques of  some  kind  of pattern recognition, discriminant analysis,  which
could be  used to put a particular piece of data in category  A or category B,
where category A might be something  that is real, and Category B might be
something that is phony for somebody else. I  would imagine yes.
Pack: I simply haven't seen it done. That is why I  asked.
Singpurwalla: I think it could be done.
                                                                    6-21

-------

-------
   7. A STOCHASTIC MODEL FOR ESTIMATING POLLUTANT
         EXPOSURE BY MEANS OF AIR QUALITY DATA
                      ALLAN H. MARCUS

               Department of Mathematical Sciences
             University of Maryland-Baltimore County
                        Baltimore, Maryland
Introduction

    Air quality data  can and  should be made more useful in determining the
public health implications of air pollution control strategies. The problem is that
the performance of pollution control strategies is tied to time-averaged pollutant
concentrations at spatially  fixed  sampling or  monitoring stations.  Different
individuals in the  population receive vastly different exposures from the  same
polluted environment. For example, an executive who drives from a nearly rural
suburb to a polluted urban center business district in an  enclosed air-conditioned
car, and works in the upper stories of an air-conditioned office building, receives
a much smaller pollutant exposure than does, say, a traffic policeman working in
the same urban center. The executive  may actually receive most of his dosage
while waiting in a poorly ventilated parking garage for his car. An individual
exposure thus  depends significantly on the personal "trajectory" or movement
of the individual  in  the urban area  through  space and  time, and  on other
hard-to-predict  factors. There are also important differences in individual
response to exposure, such as the age and state of  health of the person, history
of smoking,  and the  time required for intake and  elimination of pollutants by
various body organs.  For this reason, the urban poor (who have to live with a
variety of  environmental stresses, and who have a  high proportion  of children,
elderly, ill and other susceptible types) are particularly vulnerable (U.S.E.P.A.
(1971)). Age, health,  income, travel patterns and other elements of "lifestyle"
are interdependent, and make the prediction of exposure and response  more
difficult.
    The purpose of this paper is to show that many of these questions can be
formulated  mathematically  in terms of the excursions of filtered stochastic
integrals  of  pollutant concentration. When  pollutant  concentrations are
functions of a  Gaussian random field, some of the questions raised  can be
answered analytically, and most can be studied  numerically.
                                  7-1

-------
    The data base required to actually use the model for predictive purposes is
enormous. It includes: meteorological variables such as wind speed, height of
the mixing layer, and ground-level turbulence; an inventory of the major point,
line and area emission sources; demographic data for the estimation of personal
trajectories for various population types; and reliable dosage-response data for
various pollutants. The advantage of the analytical approach is that it may prove
possible to combine much  of the above data into a relatively small number of
parameters  which  determine the level-crossing properties of the stochastic
integrals.  In this way  it  may prove  possible  to study  simultaneously the
performance of air quality  standards and health  effects on  various segments of
the population of alternative pollution control strategies, without resorting to
extensive (and expensive) computer simulations.
Performance of Air Quality Standards

    Air quality standards are defined in terms of average pollutant
concentrations with respect to a specified averaging time T, which are not to
exceed a threshold level L_T more than n_T times in a period of length S_T. We can
define the air quality standards problem in the following rather formal way. Let
C(R,t) be the pollutant concentration at a point R at a time t. The time-averaged
concentration is

        C_T(R,t) = (1/T) ∫_{t−T}^{t} C(R,u) du                                    (1)
Define N_T(R,t) as the number of excursions of C_T(R,t) above L_T during the
interval of time (t−S_T, t]. That is, for u in the interval (t−S_T, t], there will be a
random number N_T(R,t) of episodes in which C_T(R,u) exceeds L_T
continuously. Let D_j be the duration of the j-th such excursion, which starts at
time t_j (t−S_T < t_j ≤ t), so that C_T(R,u) > L_T for t_j < u < t_j + D_j. One measure
of the severity of the pollution problem at R is then

        1 − P[N_T(R,t) ≤ n_T, t_0 < t ≤ t_f]                                      (3)
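    To make these definitions concrete, the Python sketch below computes the running
T-step average C_T from an equally spaced concentration record, counts its excursions
above L_T, and records each excursion's duration; the synthetic hourly series is an
assumption used only for illustration.

    import numpy as np

    def excursions(c, T, L_T):
        """Number and durations (in time steps) of excursions of the running
        T-step average of `c` above the level L_T."""
        c_T = np.convolve(np.asarray(c, dtype=float), np.ones(T) / T, mode="valid")
        durations, run = [], 0
        for above in c_T > L_T:
            if above:
                run += 1
            elif run:
                durations.append(run)
                run = 0
        if run:
            durations.append(run)
        return len(durations), durations

    # Synthetic hourly record (90 days) and an 8-hour averaging time:
    rng = np.random.default_rng(0)
    c = 5.0 + 2.8 * rng.standard_normal(2160)
    n_T, d = excursions(c, T=8, L_T=12.0)
    print(n_T, d[:5])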
 7-2

-------
          Now, if there are k monitoring stations at points R_1, . . ., R_k in the
region, the standards are harder to interpret. What might be meant is either that
the standards are satisfied for all sites, so that the measure of severity of the
pollution problem is

        1 − P[N_T(R_1,t) ≤ n_T, . . ., N_T(R_k,t) ≤ n_T],    t > 0,               (4)

or, alternatively, that some suitably combined regional concentration C_j(u) not
exceed L_j more than n_j times. Thus, letting N_j(t) be the number of times
C_j(u) exceeds L_j for t−S < u ≤ t, the corresponding measure is

        1 − P[N_j(t) ≤ n_j]                                                       (6)

The quantities may differ substantially.

    One approach to these problems is by modeling. We could start by assuming
that C(R,t) is a stationary stochastic process, and develop the needed results in
terms of familiar level-crossing probabilities, but this is a very difficult problem
(Marcus (1972)). Even here, what we would need is the matrix of
cross-correlations ρ_ij(t) between the transformed concentration at R_i and the
transformed concentration t units of time later at R_j.
    For  purposes  of  evaluation  of  performance  probabilities, it may be
sufficient to consider only a two-state stochastic process

        I_T(R,t) = 1  if  C_T(R,t) > L_T                                          (7)
                 = 0  if  C_T(R,t) ≤ L_T;

we could  then enquire whether the times between successive crossings of level
LT constitute  a realization of an alternating renewal process.  If so, the intensity
function (i.e.,  renewal  density) and distributions of duration and frequency of
exceedances are  easily  estimated. We  could then answer the  usual "quality
control" question of whether  or  not  a  certain adversely high concentration
proves that the underlying process is out of bounds.
    The analysis  of pollutant concentrations as a (multi-variate) time series is
essential with  regard to the stationarity  of the underlying processes  (i.e.,
statistical homogeneity with  respect to time). We should examine concentrations
at each site for:
    (a)  secular variations, such as trends (increase  due to  increasing regional
population;  decrease  due  to  movement  of  industry or  conversion to
low-polluting fuels).
                                                                     7-3

-------
    (b)  cyclical variations, including seasonal, weekly, daily, and other regular
periodic climatic or human movements.
    (c) other persistent but irregular events.
    Some useful first steps in the time series analysis of air pollution data have
been made by Merz, et al. (1972) and by Ash, et al. (1972). We will discuss these
in more detail in a later section on "Stochastic Models for Pollutant
Concentrations."
Individual Dosage Histories

    What  we  are  really interested in are the public  health implications  of
pollutant  control  policies. Policies are often tied to  physical  performance
characteristics  of spatially fixed monitoring and  sampling stations. These only
indirectly  characterize individual physiological response  to pollutants. It would
be more directly meaningful to measure cumulative dosage  or dosage-response
on an individual history basis. Let P_i be the space-time "trajectory" of person i,
i.e., the entire  history of his or her movements in the metro  region during some
interval of time. Most  people travel  extensively  during the day and  may  be
exposed to quite different concentrations at various times and places.
    There are several possible indicators of total dosage in the interval (t_0, t_f).
Total dosage is given by

        Q(P_i) = ∫_{t_0}^{t_f} ∫_{P_i} C(R,u) du dR                               (8)

If the threshold level L is important, then in terms of an indicator function

        I_L(C) = 1  if  C > L                                                      (9)
               = 0  if  C ≤ L
-------
If there is a possible non-linear physiological response at time t due to the after-effect
of a pollutant concentration C(R,u) at some point R at an earlier time u, then we
define an after-effect or response function f(C, t−u) so that the response at t is

        r(P_i, t) = ∫_{−∞}^{t} ∫_{P_i} f[C(R,u), t−u] du dR                       (11)

which is a variable, stochastic response to a stochastically variable exposure for
each trajectory. Because of the highly non-stationary nature of the exposures, an
analytical study of the distribution of the indicators Q, Q_L, D_L, or r seems less
promising than a simulation study.  The required individual trajectories can be
estimated from demographic data.
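    Such a simulation can be sketched in a few lines. In the Python fragment below,
the concentration field conc(x, y, t) and the hourly trajectory are hypothetical
stand-ins for the outputs of a dispersion model and of travel-demand estimation;
the code accumulates total dosage and above-threshold dosage along a personal
trajectory in the spirit of Equations 8 to 10.

    import math

    def dosage_along_trajectory(conc, trajectory, dt_hours=1.0, threshold=None):
        """Accumulate total dosage (Equation 8) and, if a threshold L is given,
        the dosage received while the concentration exceeds L (Equations 9-10),
        along a list of (x, y, t) trajectory points sampled every dt_hours."""
        total, above = 0.0, 0.0
        for x, y, t in trajectory:
            c = conc(x, y, t)
            total += c * dt_hours
            if threshold is not None and c > threshold:
                above += c * dt_hours
        return total, above

    # Hypothetical diurnally varying field and a commuter drifting toward the
    # urban center over 24 hourly steps:
    conc = lambda x, y, t: 3.0 + 2.0 * math.exp(-0.1 * (x * x + y * y)) * (1.0 + math.sin(2.0 * math.pi * t / 24.0))
    trajectory = [(10.0 - 0.4 * h, 0.0, float(h)) for h in range(24)]
    print(dosage_along_trajectory(conc, trajectory, threshold=4.0))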
    These  simulated histories  could  be compared with  personal  pollutant
monitoring devices analogous to individual total radiation dosimeters.  It should
then be possible to more adequately evaluate epidemiological studies, e.g., data
collection by the CHESS or CHAMP networks.
Stochastic Models for Pollutant Concentrations

    Much of what is called "random" variation in a system is merely due to
ignorance—we often do  not know which  factors affect the evolution of the
system, or else we know (or  suspect) that certain factors are  significant, but
cannot relate them precisely to system performance, and so choose  incorrect
functional  relationships.  If  important variable factors are not  included in
predictions  of  system  performance, they  may  contribute  greatly to  the
unexplained "random" variation, and their exclusion could greatly modify the
structure of the statistical data analysis. This is, in fact, the most serious problem
in  finding  a stochastic  model  for  statistical   interpretation  of   pollutant
concentration data.
    The  state of the art in predicting urban air pollution by  multiple-source
diffusion models was thoroughly explored in a symposium held here in 1969
(Stern (1970))  and the field has continued to develop rapidly. We assume the
usual  continuous point- and line-source Gaussian plume dispersion model. Let
the ground-level monitor be affected by np point sources and by  nL line sources,
both emitting continuously but with a possibly slowly varying emission rate. Let
the mean wind speed be U m/sec. The ith point source emits Q_{p_i} micrograms/sec
of a given pollutant, and is located at elevation H_i and distance S_i meters from
the monitor. Let φ_i be the angle between the mean wind direction and the line
from the source to the monitor. The downwind distance is then x_i = S_i cos φ_i
and the crosswind distance is y_i = S_i sin φ_i. Similarly, let the shortest distance
from the monitor to the ith infinite continuous line source be R_i, and let Q_{L_i} be
its emission rate in micrograms/sec/m. If θ_i is the angle between the line source
                                                                      7-5

-------
and the direction of the mean wind, then x_i = R_i/(sin θ_i) is the downwind
distance of the monitor. Then, defining the crosswind dispersion variance σ_y² by

        σ_y² = a_y² x^{2−n}                                                       (12)

and the vertical dispersion variance σ_z² by

        σ_z² = a_z² x^{2−n}                                                       (13)
-------
large number of roughly equal sources, individual source strength and wind
direction may contribute less to the distributional variation of C(t) than do
wind speed and turbulence. The latter situation may be approximated in urban
centers.
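    For orientation, here is a minimal Python sketch of the ground-level contribution
of the point sources described above, using the familiar Gaussian plume form together
with the power-law σ_y and σ_z of Equations 12-13; the source list, wind speed, and
power-law coefficients are illustrative assumptions rather than values from the paper.

    import math

    def point_source_concentration(Qp, H, S, phi, U, a_y=0.3, a_z=0.15, n=0.5):
        """Ground-level Gaussian plume concentration at the monitor from one
        continuous point source:
        C = Qp / (pi U sigma_y sigma_z) exp(-y^2/(2 sigma_y^2) - H^2/(2 sigma_z^2)),
        with sigma_y = a_y x^((2-n)/2) and sigma_z = a_z x^((2-n)/2) as in
        Equations 12-13 (the coefficients a_y, a_z, n are illustrative)."""
        x = S * math.cos(phi)     # downwind distance, m
        y = S * math.sin(phi)     # crosswind distance, m
        if x <= 0.0:
            return 0.0            # monitor is not downwind of this source
        sigma_y = a_y * x ** ((2.0 - n) / 2.0)
        sigma_z = a_z * x ** ((2.0 - n) / 2.0)
        return (Qp / (math.pi * U * sigma_y * sigma_z)
                * math.exp(-0.5 * (y / sigma_y) ** 2 - 0.5 * (H / sigma_z) ** 2))

    # Two hypothetical sources (Qp in micrograms/sec, H and S in meters, phi in
    # radians) and a mean wind speed of 3 m/sec:
    sources = [(5.0e4, 30.0, 2000.0, 0.1), (2.0e4, 10.0, 800.0, -0.2)]
    print(sum(point_source_concentration(*s, U=3.0) for s in sources))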
    Some studies are available which give the distribution of wind speeds and
turbulence in urban environments (Holzworth (1967); Brook (1972); Luna and
Church (1972)). The wind speed distribution is significantly non-Gaussian and
positively skewed (Brook (1972)), and may possibly be of lognormal form.
However, wind speed and turbulence parameters are strongly dependent. Let σ_A
be the azimuth standard deviation and σ_E the elevation standard deviation of the
wind direction, in radians (note that if n = 0, then a_y = σ_A and a_z = σ_E). The
product σ_A σ_E U (and, equivalently, its reciprocal) does not
have a lognormal distribution except for stability classes E and F, corresponding
to low winds and night or overcast conditions (Luna and Church (1972)). These
are, however, conditions of very  high air pollution potential. One might suspect
then,  that urban air pollutant  concentration distributions  are a  mixture of
distributions, but  with  an  approximately  lognormal  upper tail.  This is  also
suggested by Fig. 7 of Holzworth  (1967).
    Larsen (1971) has shown that the lognormal concentration distribution is
applicable to many sets of observations, such as hourly averages of S02 at the
CAMP monitor in Washington, D. C. from 1961 to 1968. However, examination
of smaller data  sets shows that some are well described as lognormal, but others
are not (see e.g., U.S.P.H.S. (1966)). A recent, very  thorough study  by Ash,
Bloomfield and McNeil  (1972) of CO and S02 concentrations in  Camden and
Bayonne, N. J. in  1970-1971, shows that the lognormal  distribution is not
particularly satisfactory, especially at low concentrations. They suggest that a
"fourth-root Gaussian" distribution may be better (although suggesting a
lognormal distribution for NO2); the fourth-root transformed concentrations
also   have  the  property  of  approximating  a  stationary  Gauss-Markov
(Ornstein-Uhlenbeck) random process.
    What is needed is a combination of the deterministic diffusion modeling and
the purely empirical statistical data analysis. The deterministic model provides a
structural framework  for  predicting pollutant concentrations, with important
variables such  as wind speed and  direction, and turbulence parameters, as
predictors  rather  than  unexplained  sources  of  variation. The   residual
unexplained variation is the true "random" variation,  and understanding its
structure should suggest the most useful statistical analysis methods.
Some Useful Results on Level Crossings and Exceedances

    It would be convenient to study the frequency, duration, and  intervals
between  pollution episodes using well-known  results about curve-crossings of
random processes  (see Cramer and  Leadbetter  (1967)  for a rather complete
exposition of known results). However, we can  use the extensive body of results
                                                                     7-7

-------
about Gaussian  random  processes  only  if  we can first  find a  monotone
transformation  of  pollutant  concentrations  such that   the  transformed
concentrations approximate a realization of  a Gaussian process. Explicitly, let
g(C) be a monotone increasing function of C. We are looking for a representation
        g[C_T(t)] = μ(T) + σ(T)Z(t)                                               (15)

where Z(t) is a zero-mean, unit-variance, Gaussian random process with
autocorrelation function ρ_Z(t); the parameters may depend on the averaging
time T and implicitly on the location of the monitor. Hence, if L_T is the T-hour
standard, a pollution episode occurs if

        C_T(t) > L_T                                                              (16)

i.e.,

        Z(t) > h = [g(L_T) − μ(T)]/σ(T)                                           (17)

Assume that the autocorrelation function of Z(t) can be expanded as

        ρ_Z(t) = 1 − λ_2(T) t²/2! + λ_4(T) t⁴/4! + o(t⁴)                          (18)
 We then have, e.g.:
     (a) The expected number of episodes in (t_0, t_f),

        E[N_T(R, t_f − t_0)] = λ(t_f − t_0)                                       (19)

where

        λ = [λ_2(T)/2π]^{1/2} φ(h)                                                (20)

     (b) The expected duration of an episode,

        E[D_j] = [1 − Φ(h)]/λ                                                     (21)

where

        Φ(h) = ∫_{−∞}^{h} φ(x) dx                                                 (22a)

and

        φ(x) = (2π)^{−1/2} exp(−x²/2)                                             (22b)
Higher moments of these and related random variables have more complicated
formulae. Simple asymptotic results are available if h is very large.
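    The expected episode frequency and duration of Equations 19 to 21 reduce to a
few lines of code, as in the Python sketch below; the numerical values of λ_2(T) and
h are illustrative.

    import math

    def episode_statistics(lambda2, h, hours):
        """Expected number of episodes in `hours` and expected episode
        duration (Equations 19-21) for a standardized Gaussian process with
        second spectral moment lambda2 and standardized threshold h."""
        phi = math.exp(-0.5 * h * h) / math.sqrt(2.0 * math.pi)    # normal density
        Phi = 0.5 * (1.0 + math.erf(h / math.sqrt(2.0)))           # normal c.d.f.
        rate = math.sqrt(lambda2 / (2.0 * math.pi)) * phi          # Equation 20
        return rate * hours, (1.0 - Phi) / rate                    # Equations 19, 21

    # Illustrative values: lambda2 = 0.05 hr^-2, h = 2.5, one year of hours.
    print(episode_statistics(0.05, 2.5, 8760))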
    Integral  functionals of CT(t)  are  particularly  important for  estimating
cumulative dosage and dosage-response. The mean and variance of the so-called
"Zn-exceedance  measures" are relatively easy  to compute  (Cramer  and
 7-8

-------
Leadbetter (1967)). These are random variables defined by

        Z_n(z,T) = (1/T) ∫_0^T [Z(t) − z]^n I_z[Z(t)] dt                          (23)

where I_z(Z) is the Heaviside step function defined in the previous section on
Individual Dosage Histories. Thus

        E[Z_n(z,T)] = ∫_z^∞ (x − z)^n φ(x) dx                                     (24)

and a more complicated formula holds for the variance. In particular, for:
    (a) Lognormal distribution, g(C) = log_e C,

        C_T(t) = exp[σZ(t) + μ]                                                   (25)

so that the dosage functionals of C_T(t) above the level exp(μ + zσ) can be expressed
in terms of the Z_n(z,T) of Equation 23 (Equations 26 to 28).
Note that if (as is often the case) physiological response is proportional to the
logarithm of the stimulus above a certain threshold level μ + zσ, then Equation 26
is most appropriate for health impact predictions.
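    The mean exceedance measure of Equation 24 is a one-dimensional integral against
the standard normal density and can be evaluated numerically, as in the Python sketch
below.

    import math
    from scipy.integrate import quad

    def mean_exceedance_measure(n, z):
        """E[Z_n(z,T)]: integral from z to infinity of (x - z)^n phi(x) dx
        (Equation 24), evaluated numerically."""
        phi = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        value, _ = quad(lambda x: (x - z) ** n * phi(x), z, math.inf)
        return value

    # n = 0 gives the expected fraction of time above level z; n = 1 the mean
    # standardized dose in excess of z.
    print(mean_exceedance_measure(0, 2.0), mean_exceedance_measure(1, 2.0))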
    These results  are readily extended to curve  crossings by  non-stationary
Gaussian  processes. This generalization is needed to discuss  the effects of
pollution episode control strategies.  An episode is declared on the basis of values
of C_T(t) and its time derivative C_T′(t) over some interval, or equivalently, on the
basis of the values of the jointly Gaussian process Z(t), Z′(t) during some
interval. Once the episode is declared, the effect of the controls is to decrease
                                                                  7-9

-------
the mean value with time. The correlation matrix of the controlled process is the
matrix of partial correlations conditioned on the history of the process up to the
time the control strategy is initiated. The results are rather complicated and
details will be presented elsewhere.
    The correlation structure of the process Z(t) is crucial in predicting episode
frequency and duration, but is not well known. Merz, et al. (1972) find that in
Los Angeles, there is a weak, yearly trend in weekly averages of hourly maxima
for oxidants, CO, NO, and HC; superimposed on this are strong semi-annual
(seasonal) and weekly regressions, and a possible bi-weekly regression for CO and
NO. Ash, et al. (1972) find that the fourth-root transformation reduces the
process C_T(t) to a stationary Gaussian random process with approximately
Markov dependence,

        ρ_Z(t) = exp(−w|t|)                                                       (29)

for T = 1 hour, CO and SO2 in Camden and Bayonne in 1970-1971.
Unfortunately, Equation 29 cannot apply for small times, since it implies that
the process Z(t) is not differentiable. One possible solution is that Z(t) is doubly
stochastic, and that w is itself a random variable. Larsen has observed (1971)
that the standard deviation and the maxima of log C_T(t) are very slowly
decreasing power functions of T. This suggests (Marcus (1972)) an average
correlation function for large times t of the form

        ρ_Z(t) = a|t|^{−b} + o(|t|^{−b}),    0 < b < 1,    T ≥ 1 hour             (30)
7-10

-------
The predictions are:

    Standard     Averaging     Standard        Avg. No. Yearly    Avg. Episode
                 Time T, hr    L_T, ppm SO2    Exceedances        Duration, hr
    Primary      24            0.14            1.95               210
    Secondary    3             0.50            0.20               7.84
    Secondary    24            0.10            6.74               176
These  predictions could, in principle, be  compared  with  observations, e.g.,
Highway  Research  Board  (1966).  Unfortunately, my research has not been
funded  or supported and  I  do not  personally have  the time or computing
resources  needed  to carry out the data analysis. The predicted values are typical
and plausible.
    The  importance  of the correlation structure  of the process,  and  the
formulation of health impact problems in  terms of stochastic integrals, suggests
that it may not be useful to study only the  maxima of a series of  independent
random variables  (e.g.  Barlow  (1971) and Singpurwalla  (1972)). Maximum
concentrations, and maxima of integral functionals, are of considerable interest,
but it would be more informative to study  them for non-stationary correlated
random processes.
Applications to Human Populations: Some Problems

    One of the  first problems is that  of defining an appropriate physiological
response function  for  exposure to pollutant  concentrations which vary with
time.  This depends significantly on the pollutant, the most sensitive organ,  the
time  scale and  method of elimination  as well as other factors. These  are
discussed  in a review by Saltzman  (1970). See also Rossin and Roberts (1971).
    The second  problem is the personal trajectory estimation for different types
of individuals. We first need to classify  individuals by potential health effects
(the preceding problem), and then to relate these to demographic characteristics
of  the  individual—age, sex,  income,  state of  health,  occupation,  etc. These
demographic factors, and an inventory  of land uses in the metropolitan region,
largely determine the personal trajectory. The techniques for doing this are an
essential part of travel demand forecasting (e.g., Hanafani (1972); Wooten and
Pick (1967); Highway  Research Board  (1972)). A recent, very useful approach
involves the  estimation of personal  trip  patterns  as  a Markov chain (Sasaki
(1972)).
                                                                     7-11

-------
    Finally, meteorology and human activities interact in complicated ways not
readily accessible to modeling.  For example, on a hot, calm, overcast summer
day  (light  winds and  stable turbulence conditions being conducive to high
pollutant concentrations), an unusually large  number of people might absent
themselves   from   downtown  offices,  reducing  motor   vehicles  emissions
downtown  but possibly increasing them near roads leading to parks and beaches.
Power plant emissions might also  change as a result of the redistribution of air
conditioning demands, etc. A severe winter storm would introduce a different
set of interactions.
    The interpolation and extrapolation of pollution in space and time from air
quality  data  at  fixed  monitors  is not an  extremely  difficult problem.  The
prediction  of multivariate time series is well  known (Cramer and Leadbetter
(1967)). The extrapolation of spatial random fields can be conveniently done by
the use of empirical eigenvectors (Peterson (1970), (1972)); exposure along
trajectories can thus also be estimated.
References

Highway  Research  Board,  1972:   Transportation  Demand  and   Analysis
      Techniques. 18 reports. Highway Research Record No. 392.
Ash, D., Bloomfield, P., and McNeil, D. R., 1972: On the statistical analysis of
      air pollution data. Statistics Dept., Princeton Univ., Princeton, N. J., Tech.
      Rept. 19, Ser. 2.
Barlow,  R.  E.,   1971:  Averaging  time  and  maxima  for  air  pollution
      concentrations.  Univ. of  Calif.,  Berkeley, Calif.,  Operations   Research
      Center Rept. ORC-71-17.
Brook,  R. R., 1972: The measurement of turbulence in a city environment. J.
     Applied Meteorology. 11: 443-450.
Cramer, H., and  Leadbetter,  M.,  1967: Stationary and Related Stochastic
      Processes. John Wiley, New York, N. Y.
Hanafani, A.  K., 1972:  An aggregative model  of trip making.  Transportation
      Research. 6: 119-124.
Holzworth,  G. C., 1967: Mixing depths, wind speeds and air pollution potential
      for selected locations in  the United States. J. Applied Meteorology. 6:
      1039-1044.
Larsen,  R.  I.,  1971:  A  Mathematical  Model  for  Relating  Air  Quality
      Measurements  to  Air Quality  Standards.  U.  S. Env. Prot. Agency  Publ.
      AP-89.
Luna, R. E., and Church, H. W., 1972: A comparison of turbulence intensity and
      stability  ratio measurements  to Pasquill stability  classes. J. Applied
      Meteorology. II: 663-669.
Marcus, A. H., 1972: Air pollutant averaging times: Notes on a statistical model,
      and predicting the frequency and duration of air pollution emergencies: A
7-12

-------
      statistical model.  Johns Hopkins Univ., Baltimore, Md., Statistics Dept.
      Tech. Repts.
Merz, P. H., Painter, L. J., and Ryason, R. R., 1972: Aerometric data
      analysis-time  series  analysis  and  forecast and an  atmospheric smog
      diagram. Atmospheric Environment. 6: 319-342.
Peterson, J.  R., 1970: Distribution of sulfur dioxide over metropolitan St. Louis
      as described by empirical eigenvectors and  its  relation to meteorological
      parameters. Atmospheric Environment. 4: 501-518.
Peterson,  J.  T.,  1972: Calculations  of  sulfur  dioxide  concentrations over
      metropolitan St. Louis. Atmospheric Environment. 6: 433-442.
Rossin, A. D., and Roberts, J. J., 1971: Episode control criteria and strategy for
      carbon  monoxide. Paper 71-55, presented at 64th Annual Meeting of the
      Air Pollution Control Association.
Saltzman, B., 1970: Significance of sampling  time  in  air monitoring. J. Air
      Pollution Control Association. 20: 660-665.
Sasaki, T., 1972:  Estimation of person trip pattern through Markov chains. Proc.
      Fifth   International  Symposium   on   Traffic   flow   Theory   and
      Transportation, G. W. Newell (ed.), American Elsevier, New York, N. Y.
Singpurwalla,  N.  D.,   1972:  Extreme values from a  lognormal law  with
      applications to air pollution problems. Technometrics.  14: 703-712.
Stern, A. C. (ed.), 1970: Proceedings of Symposium  on  Multiple-Source Urban
      Diffusion Models,  U. S. Env. Prot. Agency Publ.  AP-86.
U.S.E.P.A., 1971: Our  Urban Environment and Our Most  Endangered People.
      Report, U. S. Environmental Protection Agency.
U.S.P.H.S.,  1966: Continuous Air Monitoring  Program, Washington,  D.  C.,
      1962-1963.  U. S. Public Health Service Publ. AP-23.
Wooten, H. J., and Pick, G. W., 1967: A model for trips generated by
      households. J. Trans. Econ. Policy. 1: 137-153.
DISCUSSION

Joseph Vasalli: I noticed that in a great many of the purely statistical papers that
have  been presented there's  been  a concern with  the time variance  of  air
pollution concentration. If you think of it in a slightly different fashion there is
a spatial variation that is superimposed on the time variance such that if you  are
looking at a time-series average concentration over an area with time, you get a
band  instead  of a line. I am  wondering what is the effect of superimposing  the
spatial variance on the time variance?
Marcus: You mean  a spatial variance in  the individual movements or what? I
don't quite follow that.
Visalli: If you attempt to sample over an area, instead of at one point .  . .
                                                                    7-13

-------
Marcus: Yes. You are right. If you attempted to sample over an area instead of
at a single point you would have a problem. That would be exactly analogous in
a formal sense to the moving average problem that we've had in here. That is, if
 instead of an instantaneous spatial  point concentration,  you were somehow
able to simultaneously  accumulate measurements  over a very large area, then
you would have the two dimensional analog of  the moving average sort of
formulation we  have here.  You would have a moving average with respect not
only to the time axis, but also to three dimensions. This sort of thing could be
dealt  with if   you  went  over to,  say,  Gaussian  random  processes  with
multi-dimensional  index  sets,  say,  time  and  three  space  dimensions.  In
laboratory studies of  turbulence,  this  kind of  representation is necessary.
Unfortunately  it  becomes very difficult  to do  anything  in  the   spatially
completely isotropic  and homogeneous  case, and this doesn't   describe very
many  cities I know—Los Angeles, maybe. Even in  Los Angeles there are places
that are distinctive from other places.
Ralph  Larsen: Your observation  that if 1-hour concentrations of a pollutant are
lognormal   then   the   24-hour   observations   by   theory   cannot   be
lognormal, but that the fit may be still good to lognormal, is confirmed  in
another field by R. L. Mitchell in the September, 1968, issue of the Journal of
the  Optical  Society  of  America,  in an  article  titled  "Permanence  of the
lognormal  distribution." His abstract states that the distribution of the sum  of
lognormal  variates is  shown for most  cases  of interest to  be accurately
represented by  a  lognormal  distribution instead  of a normal or  Rayleigh
distribution that  might  be expected from the  Central  Limit Theorem. Then he
goes on to show in his analysis that he does tend to get summations which look
lognormal, but they're not perfect, they're just quite close.
Marcus: I  wasn't aware of that  paper, I would  be interested  in  seeing it. The
lognormal  distribution  has a number of strange properties which  haven't come
out  yet,  which   I  think  should be  mentioned.  It  doesn't have  a  moment
generating function that has a unique inverse, which is rather embarrassing in
some applications. As for heavy tails, I wasn't aware that the sums of
lognormal variates don't converge to normality very quickly. I suppose it
shouldn't be too surprising.  I'd like to believe in the Central  Limit  Theorem. But
there  is another  family of heavy   tailed distributions which hasn't  been
mentioned yet, the so-called stable distribution laws, which have a number of
extremely  awkward properties like having infinite variance  and, in many cases,
infinite mean values. On the other hand, besides being a very heavily skewed sort
of distribution, they do have one good property and that is that sums, or  more
generally, moving averages, of stably distributed variates have a stable
distribution law. This gives us  a  useful  kind of  reproductive  property. The
problem about dealing  with distributions that don't have a theoretical finite
variance I  find rather horrifying and would prefer not to look into. The question
of the underlying distribution structure is one I absolutely haven't  discussed. I
did it  in the paper a little bit. I  even tried to  go into  how, starting out from a
7-14

-------
fundamental diffusion  model, you could try to derive a lognormal distribution
either by  assuming that some of the components of  the  concentration like
reciprocal  wind speed, and  reciprocal  product  of  the azimuthal  times  the
elevation  standard deviation  times  wind speed  might   be  approximately
lognormally  distributed. But  when you  have a large number of point, line and
area sources, as you do in an urban region, you have a combination of both
multiplicative and additive factors which  will give you a distribution which is not
evidently lognormal or anything else for that matter, and I don't  know how to
handle that. I'm afraid  you have a mixture of distributions with a heavy tail, and
that's about  all  I can say.
Benarie: Being  an engineer and not at all a mathematician I would have checked
the theory by available monitors. In radiation contamination protection personal
monitors that  are  movable  with the person are  very extensively used, which
could be used as a check of your theory first of all. In industrial  hygiene there
are several types of portable particulate monitors, and it should be checked on
them.
Marcus:  I  thoroughly  agree  with that, and  the analogy  with  the radiation
dosimeters is perfect.
                                                                     7-15

-------

-------
     8. EVALUATING CONFORMITY WITH TWO-POINT AIR
              QUALITY STANDARDS, POLLUDEX*
       HAROLD E. NEUSTADTER AND STEVEN M. SIDIK

                       Lewis Research Center
           National Aeronautics and Space Administration
                           Cleveland, Ohio

                                and

                        JOHN C. BURR, JR.

                   Air Pollution Control Division
                       City of Cleveland, Ohio
Introduction

    This report presents  the  results of various  statistical  analyses of data
obtained by  the Air Pollution  Control Division (APCD) of Cleveland, Ohio. It
contains a tabulation  of averages, statistics relevant to lognormal distributions,
and goodness-of-fit statistics. In addition, a pollution-level index is introduced
which relates the measured pollution levels over a year to the existing air quality
standards.

    The  air  sampling  program of  APCD  is currently in  its sixth  year.
Twenty-four-hour  samplings have been made of  total  suspended particulate
(TSP) since January 1967, and of nitrogen dioxide (NO2) and sulfur dioxide
(SO2) since  January  1968. The sampling methods used  are high-volume  air
sampling, Jacobs-Hochheiser, and West-Gaeke, respectively. The geographic  de-
ployment of  sampling sites is shown in Figure 1. The meandering heavy  line in
the center of the city is the Cuyahoga River, about which is centered most of  the
region's heavy industry.
    At present, there are 21 stations monitoring the air. Fifteen of these stations
monitor  all three pollutants, while the remaining six (stations O to T in Figure 1)
"This paper has also been released as a LeRC publication, NASA TN D-6935 entitled
 "Statistical Summary and Trend Evaluation of Air Quality Data for Cleveland, Ohio, in
 1967 to 1971: Total Suspended Particulate, Nitrogen Dioxide, and Sulfur Dioxide."
                                 8-1

-------
measure TSP only. Seventeen of these sites have been in operation for more than
5 years. Stations  B, D,  K, and N have undergone  relocation since  their initial
installation.  However, because of  the proximity  of their present sites to  their
former sites, we have assumed that essentially the  same environment has  been
measured throughout the 5-year period. Currently, the air is sampled every third
day, although the sampling frequency has varied over the 5 years and has been as
low as once-a-week. Some of these data have been presented elsewhere in a more
preliminary  manner  (Neustadter, et al.  (1972)).  The data analysis reported
herein was performed by the Environmental Research Office of the NASA Lewis
Research Center (LeRC) as part of the preliminary phase of a joint APCD-LeRC
program to study trace elements and compounds in airborne particulate matter.
 Cleveland Aerometric Data

     Pertinent results are presented  in Tables I,  II, and  III for TSP, N02, and
 SO2, respectively. In each table, the first column gives an alphabetic designation
 of the monitoring site corresponding to the code shown in  Figure 1. The second
 column lists the various parameters of interest for each of  the pollutants. These
 parameters are (a) number of days observed (readings); (b) geometric (TSP) or
 arithmetic (SO2 and NO2) averages; (c) standard geometric deviation; (d)
 estimated value of the second largest pollution level for the year; and (e) an
 adjusted Kolmogorov-Smirnov goodness-of-fit statistic for lognormality, denoted
 as (N)1/2D.
     Air  quality standards are set nationally by  the  Environmental  Protection
 Agency (EPA)  of the Federal Government (Anon. (1971))  and statewide by the
 Air Pollution Control Board of the Department of Health (DoH) of the State of
 Ohio (Ohio,  (1972)). Whenever these two standards differ, we have chosen to
 work  with the DoH (more stringent) standard, which  is listed  in  the  third
 column. In the remaining five columns are the various statistics for each  of the
 years 1967 to 1971.
 Number of Readings

     For each pollutant, both EPA and DoH require a minimum of one sampling
 every sixth day,  or an equivalent set of at  least 61  random samples per year.
 Thus, we designate this standard as > 60 in the tables. Even though early in the
 program some  stations did not achieve 60 samples per year for each pollutant,
 we have included the analyses of these data sets in this report. At present, the
 nominal schedule of APCD calls  for monitoring the environmental  air every
 third day. In practice,  this procedure generally allows sufficient margin for
 unanticipated disruptions  (e.g.,  equipment failure)  while still  exceeding  60
 readings per year.
  8-2

-------
Geometric and Arithmetic Averages

    The geometric average is used in Table I, and the arithmetic average is used
in Tables II and  III. This  corresponds to  the particular averaging method
stipulated by EPA and  DoH standards.  Calculations were performed  whenever
the number  of readings exceeded 10. The values listed as standards are the DoH
primary standards, which correspond to the EPA secondary standards.
Standard Geometric Deviation (SGD)

    It has  been noted that, irrespective of sampling duration or location,  air
sampling data are generally distributed lognormally (Larsen (1971)). When such
is actually the case, the entire data set is sufficiently described by its geometric
average and  SGD. The higher the SGD, the greater the spread between the lower
and higher values. As with the averages,  SGD was calculated  for data  sets of
more  than 10 readings.
Second Largest Value

    Both EPA and DoH standards for TSP and SO2 specify that a certain level
of pollution is  ". .  . not to be exceeded more than one time per year." This
implies that for the 365 daily pollution levels per year (366 for leap years), there
is no  upper bound on  the  largest single level. However, the next largest value
(i.e., the second most polluted day of the year) is required to be at or below the
standard. Thus, Tables I, II, and III include estimates of the second highest
pollution level for each year. As with the averages, the values listed here are the
DoH primary standards, which correspond to EPA secondary standards. While
NO2 has only a standard for the annual average, we believe the estimated second
largest level for a year  is useful  information and we have included it in Table II.
    An approximation to the second largest pollution level estimate, for a year
of n days, and a sample of N observations, is obtained by the following
procedure. (The transformation to the logarithms of the data values is made
because the expected values of normal order statistics are well developed in the
literature, whereas we are not aware of any comparable development for
lognormal distributions.) The logarithms y_i = ln(x_i) of the pollution levels x_i are
computed. According to the assumption of lognormality, the y_i values follow a
normal distribution. The sample mean ȳ and sample standard deviation s_y of the
set of logarithms are computed. From Harter (1961), the expected value of the
second largest observation in a sample of 365 (366 in a leap year) independent
values from a normal distribution is 2.63 (to three significant digits) standard
deviations from the mean. This value, along with the average ȳ and the standard
                                                                    8-3

-------
deviation s_y of the set of logarithms, is used in the following equation to obtain
the estimate of the second largest pollution level of the year:

        y_2nd = ȳ + 2.63 s_y                                                      (1)

The values of x_2nd listed in Tables I, II, and III are obtained by exponentiation, as

        x_2nd = exp(y_2nd)                                                        (2)
     Because of the decreased precision  which occurs when extrapolating to the
tail of a distribution and because the sample mean and standard deviation are
used, the  minimum number of  readings for this calculation was increased to 30
as opposed to 10  used for  the averages. Implicit  in using Equation  1 is the
assumption of lognormality of the data, which leads us to the final entry in these
tables.
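
    The estimate of Equations 1 and 2 can likewise be sketched. Again this is our own
illustration: the factor 2.63 is the Harter (1961) value quoted above, the 30-reading minimum
follows the rule just stated, and the sample of readings is synthetic.

    import numpy as np

    def second_largest_estimate(readings, k=2.63):
        # Estimate of the year's second largest level under lognormality (Equations 1 and 2).
        # k is the expected value, in standard deviations above the mean, of the second
        # largest of 365 independent normal observations (Harter, 1961).
        y = np.log(np.asarray(readings, dtype=float))
        if y.size < 30:                        # at least 30 readings required for this estimate
            return None
        y_2nd = y.mean() + k * y.std(ddof=1)   # Equation 1, on the logarithmic scale
        return float(np.exp(y_2nd))            # Equation 2, back to concentration units

    # Synthetic readings: lognormal with geometric mean 100 and SGD 1.6
    rng = np.random.default_rng(0)
    readings = np.exp(rng.normal(np.log(100.0), np.log(1.6), size=60))
    print(round(second_largest_estimate(readings), 1))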
Kolmogorov-Smirnov Statistic

    The Kolmogorov-Smirnov statistic is a goodness-of-fit statistic which can be
applied  to  any  distribution  (Noether  (1967)).  In  testing for a  lognormal
distribution,  it is easier for calculation purposes to take the logarithms of the
values and test for goodness-of-fit to  a normal distribution. This statistic was
originally intended for use when the distribution which the data is suspected of
following is completely specified. For the normal distribution, this is equivalent
to knowing the mean μ and the standard deviation σ. In this case, the
Kolmogorov-Smirnov statistic is denoted D and is calculated as

                   D =    max     | F_N(y_i) - Φ((y_i - μ)/σ) |              (3)
                        i=1,...,N

where F_N denotes the observed (sample) cumulative distribution function and
the function Φ(z) denotes the cumulative standard normal distribution
function.
     The statistic D measures the maximum deviation of the observed cumulative
distribution  function  from the  theoretical  cumulative  distribution  function.
Thus, D is always a value between 0 and 1. A value of 0 would indicate a perfect
fit of the sampled data to a lognormal distribution, and larger values indicate an
increasing deviation from lognormality.
     When the mean and the standard deviation are unknown, it is common to
use the estimates ȳ and s_y = [Σ_i (y_i - ȳ)² / (N - 1)]^(1/2) in place of μ and σ.
Lilliefors (1967) has studied the use of the Kolmogorov-Smirnov statistic in this
situation. Table IV of this report  presents the significance levels of (N)1/2D
from Lilliefors (1967) for samples of N > 30. Thus, the statistics in Tables I, II,
and  III  are presented as (N)1/2D.
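
    A minimal sketch of the statistic, with the mean and standard deviation estimated from
the data as in Lilliefors (1967), is given below. It is our own construction; the particular
empirical-distribution-function convention (comparing at the ordered values from both sides)
is an assumption on our part, since the report does not spell out the computational details.

    import numpy as np
    from math import erf, sqrt

    def sqrtN_D(readings):
        # (N)^(1/2) D for a lognormality test with the mean and standard deviation
        # estimated from the data (Lilliefors form).
        y = np.sort(np.log(np.asarray(readings, dtype=float)))
        n = y.size
        z = (y - y.mean()) / y.std(ddof=1)
        phi = np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in z])  # standard normal CDF
        i = np.arange(1, n + 1)
        d_plus = np.max(i / n - phi)           # empirical CDF above the fitted normal
        d_minus = np.max(phi - (i - 1) / n)    # fitted normal above the empirical CDF
        return sqrt(n) * max(d_plus, d_minus)

    # Values exceeding 0.736 would be flagged at the 0.20 significance level (Table IV)
    rng = np.random.default_rng(1)
    sample = np.exp(rng.normal(np.log(100.0), np.log(1.6), size=75))
    print(round(sqrtN_D(sample), 3))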
8-4

-------
    It should be recognized that the observed pollution levels are but a sample
of levels from some distribution. Thus, even if the distribution of the complete
set of pollution levels is indeed lognormal, some of the samples will lead to large
values of (N)1/2D. The interpretation of the tabulated significance levels α is
that if the distribution is indeed lognormal, then about 100α percent of the
samples tested will lead to a value of (N)1/2D which exceeds the critical value
((N)1/2D)α, whereas about 100(1 - α) percent will lead to a value of (N)1/2D lower
than ((N)1/2D)α. Because subsequent calculations in this report depend heavily on
the assumption of lognormality, the value α = 0.20 was chosen. Choosing this
large value for α has the drawback of rejecting the assumption of lognormality a
substantial proportion of the times that the distribution is lognormal. However,
it has the compensating advantage of being more discriminating against
distributions which are not lognormal.
Lognormality

Lognormal Plots
    As a graphical  means  of assessing the goodness-of-fit of  the  data  to a
lognormal distribution, we can enter the observed data on lognormal probability
graphs. Figures 2 and 3 show two plots for TSP. The solid line indicates the plot
of  the cumulative sample  distribution of  all measurements over the 5-year
period. The data  points present the separate sample distributions for the 5 years
(1967 to 1971). Any steady increase or decrease  in the pollutant concentration
would be discernible as a vertical sequence of the data points representing those
years. In the two cases shown, there is no overall trend. Figure 2 is for station I
in the industrial valley.  The overprinting of the data points shows the TSP levels
to be fairly  uniform at a rather high average level for the 5-year period.  Figure 3
represents station K, in  a residential neighborhood, predominantly upwind  from
the industrial region.
    A full  set of lognormal curves for all  21 stations for the 3 pollutants is
available on  microfiche from the authors upon request.
Goodness of Fit

    To indicate the decreasing likelihood of lognormality as (N)1/2D increases,
all values calculated on the assumption of lognormality for which the
goodness-of-fit statistic exceeds the critical value at the 20-percent significance
level (i.e., the data having (N)1/2D > 0.736) are footnoted in the tables. For a further indication of
lognormality, as well as for a check on the consistency of our data, we examined
the distribution of sets for which (N)1/2D > 0.736.
                                                                      8-5

-------
    Table V summarizes the results of the goodness-of-fit tests in which the
α = 0.20 significance level was used. The first column lists the station identification.
The remaining columns list for each of the pollutants the number of yearly tests
which  were performed, and  the  number of  these tests which  rejected the
assumption of lognormality.  For TSP,  there  are 85 tests, of which  20 were
rejections. This is very close  to the expected number of rejections and implies
that the distribution  of TSP may very safely  be considered to be lognormal. For
NO2 and S02, however, there are more  than twice as many rejections as would
be expected, and hence their closeness to a lognormal distribution is somewhat
suspect. On the basis of an examination  of the lognormal plots of SO2 and the
fact that the SO2 departure from lognormality, as indicated by (N)1/2D, is not
severe, we will proceed on the assumption  that the lognormal is still a useful
approximation to the distribution of SO2.
    Further examination of Table  V shows that the lognormality of TSP, SO2
and NO2  is most  questionable at  stations  E, F, and I. Benarie  (1970) and
Mitchell (1968) have each considered the additivity of lognormal distributions.
Mitchell has shown that under certain conditions the sum of independent and
identically distributed lognormal variates also follows a lognormal distribution.
Benarie has considered a more general situation,  where the lognormal variates
have  differing geometric  means  and   standard  geometric  deviations.  His
conclusions are that when a large number (>10) of lognormal variates with
slightly differing geometric means are superimposed, the resulting distribution is
still well approximated by a lognormal distribution. However, when a small
number (<10) of lognormal variates with differing means are superimposed, the
resulting distribution generally is not a lognormal. Thus, it is possible to assume
that pollution levels at stations E, F, and I are dominated by a small number of
major sources, whereas the remaining stations reflect the influence of either a
single large source or  a superposition of many sources.
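
    Benarie's point can be illustrated numerically. The sketch below is our own construction:
it reads "superimposed" as the day-by-day sum of independent source contributions (an
assumption on our part), uses arbitrary geometric means and spreads, and reuses the sqrtN_D
function from the earlier sketch, so the particular numbers it prints carry no weight.

    import numpy as np

    def mixture_departure(n_sources, mean_spread, gsd=1.6, n_obs=75, seed=2):
        # (N)^(1/2) D for daily totals formed by superimposing lognormal source contributions.
        rng = np.random.default_rng(seed)
        # Geometric means of the individual sources, spread around a common level of 50
        log_means = np.log(50.0) + rng.uniform(-mean_spread, mean_spread, size=n_sources)
        contributions = np.exp(rng.normal(log_means, np.log(gsd), size=(n_obs, n_sources)))
        totals = contributions.sum(axis=1)        # superposition of the sources, day by day
        return sqrtN_D(totals)                    # sqrtN_D as sketched earlier in this section

    print(mixture_departure(3, mean_spread=1.5))   # few sources with widely differing means
    print(mixture_departure(20, mean_spread=0.2))  # many sources with slightly differing means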
Air Quality

     Among the  goals  of APCD are monitoring  of the environmental  air,
determination of its quality, and initiation  of action to improve the local air
quality, where indicated. There  are well established techniques for analyzing
lognormal  plots to extract information  pertinent to determining compliance
with  air quality  standards and/or the existence of long-term trends  (Larsen
(1971)). However, it is often desirable to have available  some single number, or
index, which presents as simply as possible a maximum  of information. To this
end we have  developed an index, which we call Polludex, which  gages the
conformity of the measured environment  to the established standards.
 8-6

-------
Polludex, An Air Pollution Index

    Many  indices  have  been proposed and  a  number  are  in use by  various
agencies (Babcock  (1970)). Polludex is a variation of an index proposed by Pikul
(1971). The rationale for constructing this modified index is as follows. The
standards for TSP and S02 specify values for the annual mean which may not be
exceeded and also  values which may not be exceeded more than once per year.
In relation to a lognormal plot of the underlying population, these standard
values specify  the coordinates of  two points  on a straight line. If the data
obtained during a  1-year period conform to lognormality and conform to the
required standards, the plot of the  data will closely approximate a straight line
falling entirely below (or on) the line segment joining the standard points.
    For each of the three pollutants, define

        r = (sample average) / (standard for the average)

        s = (estimate of second largest level) / (standard not to be exceeded more than once yearly)

Then Polludex, P(pollutant), is defined for TSP and SO2 by

        P(TSP, SO2) = 50 × [max(0, r - 1) + max(0, s - 1)]              (4)

and for NO2 by

        P(NO2) = 100 × max(0, r - 1)                                    (5)
where max(a,b) means that the larger of the two values, a or b, is to be used. The
geometric average  is to be used in calculating r  for TSP and the arithmetic
average is to be used in calculating r for SO2 and NO2.  For the estimate of the
second largest level to be used  for s,  we used the approximate value listed in
Table I for TSP and in Table III for S02.
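
    Equations 4 and 5 reduce to a few lines of arithmetic. The following sketch is ours; the
standards and sample values in the example are hypothetical and serve only to make the
definition explicit.

    def polludex(pollutant, average, average_standard,
                 second_largest=None, once_per_year_standard=None):
        # Polludex as defined in Equations 4 and 5; the caller supplies the standards.
        # The geometric average is used for TSP and the arithmetic average for SO2 and NO2.
        r = average / average_standard
        if pollutant.upper() in ("TSP", "SO2"):
            s = second_largest / once_per_year_standard
            return 50.0 * (max(0.0, r - 1.0) + max(0.0, s - 1.0))   # Equation 4
        return 100.0 * max(0.0, r - 1.0)                            # Equation 5

    # Hypothetical example: TSP geometric average 90 against a standard of 60, and an
    # estimated second largest level of 300 against a once-per-year standard of 150
    print(polludex("TSP", 90.0, 60.0, 300.0, 150.0))                # 50 x (0.5 + 1.0) = 75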
    With this definition, the same weight is given to the long-term (chronic)
effects of pollution as is given to the severe short-term (episode) incident. The
standards  for these  pollutants  have  presumably  been  set  with  regard to
maximum acceptable levels for reasons of public health and/or welfare. Thus, we
assume that normalization of the estimated mean and second highest values by
the standards will, in  a sense, put each P on an equal basis with respect to the
potential harm caused by excesses. If the air quality is equal to or better than
the standards, Polludex =  0. A  value of Polludex =100 can be understood to
mean that the air is, in a sense,  100 percent polluted, in that a value of 100 is
obtained when the average and the second highest values are each 100 percent
                                                                     8-7

-------
higher than their respective permissible levels. Of course, Polludex = 100 would
also result from a continuum of other combinations, as, for example, when the
second highest value is three times its standard, provided the average was at or
below its standard. Figure 4 graphically illustrates several of these  possibilities.
Figure 4(a) shows three possible examples which have P = 0. Figure 4(b) shows a
line having P = 100, where both the mean and second largest standards are
exceeded. Figure 4(c) shows a line where again P  = 100, but the standard for the
mean  has been met. Finally, Figure 4(d)  shows a line with P = 50, where the
standard for the mean is not  met but the other standard is.
Four-Year Trends

    Polludex was evaluated for the  APCD  data  and is listed  for  all three
pollutants  in  Table  VI.  The  State  of Ohio standards  were used  in these
calculations.
    Where there are adequate data, the 1968 and 1971 values are also presented
as bar graphs overprinted on the Cleveland map. The Polludex values for TSP,
NO2, and S02  are shown in Figures 5(a), (b), and (c), respectively. If there are
two bars, the left bar represents 1968 and the right bar 1971. With the exception
of site M of Figure 5(c), a single bar represents 1971. It is clear that, in general,
TSP levels have increased to the west of the Cuyahoga River and decreased to
the east. The most pronounced improvements are  downwind of the valley  (in
Cleveland, the winds are predominantly out of the southwest) at sites A, I, and
E. The levels of N02 show much less variation, except for the increased levels at
sites H and C. With one exception, there has been  a significant  reduction in the
levels of S02  throughout the  city, with the most pronounced improvements
occurring,  as with TSP,  at sites  A,  I, and E.  Since space  heating  is fueled
primarily  by natural gas, this  implies  a  reduction  in  SO2 contamination  by
industrial  and power-producing sources. At this time we do not have sufficient
information to  determine whether the improvements in the valley are due to the
general decline  in business activity in recent years, the abatement efforts by the
industrial  community,  both of these  reasons,  or, possibly, neither of these
reasons.
Concluding Remarks

    Air quality  data  (total suspended particulate, nitrogen dioxide, and sulfur
dioxide) for Cleveland, Ohio, for the period of 1968 to 1971 have been collated
and subjected  to statistical  analysis. It is apparent that the  data for total
suspended particulate and, to a lesser degree, the data for sulfur dioxide and
nitrogen dioxide are  lognormally distributed. The air quality standards of the
 8-8

-------
State of Ohio are met only sporadically by sulfur dioxide in isolated residential
neighborhoods. The available data indicate  that definite improvement  in air
quality has taken place in the industrial region. Overall, there appears to be a net
improvement in air quality, which would be a reflection primarily of the striking
reduction in sulfur dioxide levels.
    A pollution  index has been  introduced which directly displays information
regarding  the degree to which the environmental  air conforms to the mandated
standards for the  environment. As  such,  it  is  a  useful  tool  in  air quality
monitoring programs.
                                                                       8-9

-------
    Table I. Total Suspended Particulate Data Summary for 1967 to 1971
Monitoring
station (m
fig. 1)
A




B




C




D




E




F




G




H




|




1
Statistic
Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-f it statistic, (NP^D
Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-f it statistic, (N)1'2D
Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-f it statistic, (N)1'2D
Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-f it statistic, (N)1/2D
Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-f it statistic, (N)1/2D
Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-f it statistic, (N)1/2D
Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-f it statistic, (N)1/2D
Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N)1'2D
Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N)1'2D

Standard
>60
60

150

>60
60

150

>60
60

150

>60
60

150

>60
60

150

>60
60

150

>60
60

150

>60
60

150

>60
60

150


1967
19
190
1.4


36
112
1.5
351
0.76
64
124
1.6
343
0.55
44
134
1.5
371
0.37
61
139
1.4
352
0.59
64
101
1.5
303
1.0
8









55
210
1.4
a543
1.08

1968
70
242
1.7
919
0.53
64
104
1.6
349
0.72
79
121
1.6
a429
0.76
72
126
1.5
390
0.42
75
147
1.5
a410
0.83
75
103
1.6
357
0.67
75
99
1.6
317
0.56
65
83
1.6
280
0.53
75
232
1.5
694
0.60

1969
73
199
1.6
"711
0.84
66
94
1.4
226
0.63
72
107
1.6
346
0.50
74
123
1.5
378
0.50
75
119
1.4
276
0.61
75
88
1.6
297
0.64
73
82
1.6
a292
0.79
68
84
1.6
299
0.59
75
223
1.5
8639
0.97

'970
76
188
1.6
a682
0.81
b?2
113
1.6
370
0.48
97
124
1.6
420
0.39
b62
154
1.6
487
0.40
93
136
1.5
a395
0.80
82
109
1.5
307
0.07
103
94
1.7
358
0.59
96
94
1.7
384
0.48
101
225
1.5
701
0.51

1971
69
183
1.7
730
0.73
63
92
1.6
319
0.53
89
121
1.7
502
0.65
C30
163
1.8


80
120
1.5
a328
0.80
74
105
1.5
304
0.72
83
91
1.6
337
0.57
70
89
1.7
352
0.68
93
196
1.6
a658
0.83
8-10

-------
Table I (cont'd). Total Suspended Particulate Data Summary for 1967 to 1971
Monitoring
station (see
Fig. 1)
j




K




L




M




N




O




P




Q




Ft




S






Statistic
Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N)1/2D
Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N)1/2D
Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (NI1/2D
Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N)1/2D
Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N)1'2D
Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N)1'2D
Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N)1/2D
Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (Nt1/2D
Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N)1/2D
Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N)1/2D


Standard
>60
60

150

>60
60

260

>60
60

260

>60
60

260

>60
60

260

>60
60

150

>60
60

150

>60
6O

150

>60
60

150

>60
60

150



1967
63
174
1.5
474
0.62
74
53
2.5
399
0.55





53
50
1.9
220
0.72





69
92
1.5
265
0.62
62
135
1.4
343
0.71
63
105
1.5
310
0.62
57
81
1.6
265
0.44







1968
76
161
1.6
"538
0.78
75
58
2.1
320
0.57





73
55
1.9
235
0.67





75
86
1.6
298
0.39
74
139
1.5
390
0.40
69
95
1.5
277
0.42
72
80
1.7
304
0.69







1969
74
151
1.7
"613
0.76
b105
&
1.9
258
0.64
42
157
1.7
569
0.62
98
58
2.3
309
0.67
35
68
2.6
a548
0.76
72
79
1.6
"270
0.83
•72
?27
1.6
407
0.64
70
96
1.4
241
0.67
65
81
1.6
285
0.52







1970
103
156
1.6
"530
0.98
81
49
2.4
8 359
0.83
79
1«I6
2.6
a1013
0.98
58
41
2.6
"372
0.74
81
72
2.9
"755
0.90
90
89
1.7
333
0.71
93
137
1.5
412
0.55
88
106
1.8
8495
0.97
90
89
1.6
309
0.49







1971
90
163
1.7
645
0.73
78
92
1.6
312
0.52
73
212
1.6
637
0.64
72
82
1.6
284
0.59
86
138
2.0
905
0.71
76
90
1.8
422
0.55
74
146
1.4
371
0.60
79
101
1.4
256
0.65
66
89
1.7
384
0.60
51
92
1.5
290
0.71
                                                                   8-11

-------
Table I  (cont'd). Total Suspended Paniculate Data Summary for 1967 to 1971
Monitoring
station (see
Fig. 1) Statistic Standard 1967 1968 1969 1970 1971
T
U
Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N)1/2b
Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N)1'2D
>60
60
150
>60
60
150
41
170
2.0
1014
0.48
d34
114
2.3
137
0.55
aThe calculation used to obtain this estimate assumed lognormality despite 60
100



>60
100



>60
100



>60
100



>60
100



>60
100



1968
71
211
1.4
517
0.60





76
177
1.5
a495
0.87
55
207
1.4
497
1.65
69
203
1.4
497
0.70
47
212
1.4
a511
0.78
1969
73
220
1.4
470
0.57





75
248
1.3
a454
0.88
70
219
1.3
424
0.70
74
237
1.3
a437
0.90
74
197
1.3
a370
0.76
1970
84
214
1.4
464
0.61
9




115
234
1.4
a576
0.88
b83
217
1.5
576
1.03
108
217
1.4
a504
1.39
96
215
1.3
444
0.70
1971
86
202
1.5
538
0.59
81
190
1.5
a539
0.77
96
255
1.6
835
0.64
C47
205
1.6
686
0.62
9b
205
1.6
a686
1.69
86
203
1.5
a518
0.93
  8-12

-------
     Table II  (cont'd). Nitrogen Dioxide Data Summary for 1968 to 1971
Monitoring
station (see
Fig. 1) Statistic
G Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N)1/2D
H Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N)1'2D
I Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N)1/2D
J Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N)1/2D
K Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N)1'2D
L Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N)1/2D
M Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N)1'2D
N Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N)1/2D
U Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N)1|/2D


Standard 1968
>60 72
100 201
1.5
571
0.56
>60 66
100 166
1.5
a471
1.03
>60 67
100 247
1.4
535
0.45
>60
100



>60 74
100 162
1.5
433
0.53
>60




>60 55
157
a1'4
342
0.80
>60




>6U
100





1969
72
221
1.3
a432
0.91
71
225
1.3
a443
0.75
76
253
1.3
495
0.71
52
225
1.4
488
0.65
74
192
1.4
417
0.67





74
168
1.3
335
0.60












1970
104
224
1.3
453
0.43
114
213
1.4
464
0.70
111
238
1.3
a495
1.1
113
255
1.4
a548
0.82
b104
209
1.4
a486
0.76
41
220
1.4
513
0.68
96
176
1.3
341
0.65
39
208
1.6
647
0.65







1971
89
203
1.5
516
0.65
78
202
1.6
a633
1.1
88
217
1.5
a615
0.93
93
240
1.5
600
0.58
88
183
1.6
565
0.67
80
219
1.5
572
0.71
73
159
1.6
507
0.54
88
223
1.6
a712
0.95
d36
230
1.9
a1030
1.34
aThe calculation used to obtain this estimate assumed lognormality despite (N)1/2D > 0.736.
bSampling site was relocated within same general neighborhood in midyear. It is assumed that for sampling
 purposes the environmental air was the same at both locations.
cTemporarily discontinued because of construction at sampling site.
^Sampling was initiated in the latter part of the year.
                                                                                      8-13

-------
          Table III. Sulfur Dioxide Data Summary for 1968 to 1971
Monitoring
station
Fig.
A




B




C




D




E




f




Q




H



I




J




(see
1 ) Statistic
Number of readings
Arithmetic average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N|1/2D
Number of readings
Arithmetic average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N)1'2D
Number of readings
Arithmetic average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N)1/2D
Number of readings
Arithmetic average
Standard geometric deviation
Second highest reading
Goodness-of-fi-. statistic, (N)1/2D
Number of readings
Arithmetic average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N)1/2D
Number of readings
Arithmetic average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N)"2D
Number of readings
Arithmetic average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N)1'2D
Number of readings
Arithmetic average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N)1'2D
Number of readings
Arithmetic average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N)1/2D
Number of readings
Arithmetic average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, (N)1/2D

Standard
>60
60

260

>60
60

260

>60
60

260

>60
60

260

>60
60

260

>60
60

260

>60
6O

260

>6Cr
60

260

>60
60

260

>60
60

260


1968
71
137
2.4
8972
0.75





72
95
2.4
644
0.61
53
106
1.8
413
0.52
/I
112
1.9
476
0.68
47
84
1.9
"364
0.80
69
77
2.1
414
0.57
62
64
2.3
a416
0.85
64
129
1.8
"522
1.04






1969
74
135
2.0
a674
0.96





76
85
2.3
546
0.48
72
103
1.7
278
0.47
75
107
1.6
314
0.42
75
76
2.1
8 409
1.04
71
58
2.0
294
0.70
71
63
2.3
390
0.69
77
110
1.8
467
0.64
32
113
1.9
543
0.53

1970
82
116
1.9
"518
0.88
9




105
74
2.3
476
0.54
b79
109
2.0
a538
0.91
107
96
1.8
8 397
0.88
97
90
1.8
373
0.68
105
63
1.9
295
0.70
113
66
2.2
408
0.47
108
101
1,9
8449
0.87
113
124
1.8
504
0.70

1971
88
84
2.2
523
0.66
86
50
2.1
284
0.70
93
67
2.4
485
0.73
C45
89
2.0
8469
0.76
U4
65
2.1
375
0.71
86
59
2.3
8401
0.83
86
5O
2.4
8 363
0.75
72
48
2.4
336
0.72
83
67
2.1
8358
0.90
93
79
2.0
8410
1.23
8-14

-------
         Table III (cont'd). Sulfur Dioxide Data Summary for 1968 to 1971
Monitoring
station (see
Fig. 1) Statistic
K Total suspended paniculate
Nitrogen dioxide
Sulfur dioxide
L Total suspended paniculate
Nitrogen dioxide
Sulfur dioxide
M Total suspended paniculate
Nitrogen dioxide
Sulfur dioxide
N Total suspended paniculate
Nitrogen dioxide/
Sulfur dioxide
O Total suspended paniculate


Standard 1968
*SS a59
62
27



61 62
57
0
205 293


65 71


1969
43
92
11



37
68
0
268


a56


1970
b59
b109
bo
222
120
141
70
76
9
b436
108
a62
85


1971
81
83
819
280
119
a192
63
59
a 22
317
a127
a105
116
   P        Total suspended paniculate            127      146       142       151       145

   Q        Total suspended paniculate             91       71        60      a153        69

   R        Total suspended paniculate             56       68       62        77       102

   S        Total suspended paniculate                                                    73

   T        Total suspended paniculate                                                   380

   U        Nitrogen dioxide                                                           d129
            Sulfur dioxide                                                             d138

aThe calculation used to obtain this estimate assumed lognormality despite (N)1/2D > 0.736.
bSampling site was relocated within same general neighborhood in midyear. It is assumed that for sampling
 purposes the environmental air was the same at both locations.
cTemporarily discontinued because of construction at sampling site.
dSampling was initiated in the latter part of the year.
                                                                                      8-15

-------
                    Table IV. Significance Levels for the
                 Kolmogorov-Smirnov Goodness-of-Fit Statistic
                           [From Lilliefors (1967)]


         Significance level, α          0.20     0.15     0.10     0.05     0.01

         Statistic, ((N)1/2D)α          0.736    0.768    0.805    0.886    1.031
             Table V. Summary of Results of Goodness-of-Fit Tests

                    Total suspended
 Monitoring           particulate       Nitrogen dioxide       Sulfur dioxide
 station (see      Number               Number                 Number
 fig. 1)           of tests  Rejected   of tests  Rejected     of tests  Rejected
 A                    4         2          4         0            4         3
 B                    5         0          1         1            1         0
 C                    5         1          4         3            4         0
 D                    4         0          4         2            4         2
 E                    5         3          4         3            4         1
 F                    5         2          4         3            4         3
 G                    4         1          4         1            4         1
 H                    4         0          4         3            4         1
 I                    5         3          4         2            4         3
 J                    5         3          3         1            3         1
 K                    5         2          4         1            4         1
 L                    2         0          2         0            2         1
 M                    5         0          4         1            4         1
 N                    5         1          2         1            2         2
 O                    5         1
 P                    5         0
 Q                    5         1
 R                    5         0
 S                    1         0
 T                    1         0
 U                                         1         1            1         0

 Total               85        20         49        23           49        20
 Percentage of
 tests rejected                24                   47                     41
 Expected number
 of rejections                 17                   9.8                    9.8
8-16

-------
                       Table VI. Polludex Values for 1967 to 1971.
Monitoring
station (see
fig. 1)
A


B


C


D


E


F


G


H


1


J




Pollutant
Total suspended paniculate
Nitrogen dioxide
Sulfur dioxide
Total suspended participate
Nitrogen dioxide
Sulfur dioxide
Total suspended paniculate
Nitrogen dioxide
Sulfur dioxide
Total suspended paniculate
Nitrogen dioxide
Sulfur dioxide
Total suspended paniculate
Nitrogen dioxide
Sulfur dioxide
Total suspended paniculate
Nitrogen dioxide
Sulfur dioxide
Total suspended particulate
Nitrogen dioxide
Sulfur dioxide
Total suspended particulate
Nitrogen dioxide
Sulfur dioxide
Total suspended particulate
Nitrogen dioxide
Sulfur dioxide
Total suspended particulate
Nitrogen dioxide
Sulfur dioxide


1967 1968
408
111
a201
111 103


117 a144
77
103
135 135
107
68
133 a159
103
85
a85 104
112
a40
89
101
44
62
66
a34
"255 324
147
a108
203 a213




1969
a303
120
a142
54


105
148
75
129
119
58
91
137
50
72
97
a42
a66
121
7
70
125
27
8299
153
82
a230
125
99


1970
a284
114
a97
b117


144
134
55
b191
b117
a,b94
a145
117
a56
a93
115
47
98
124
10
106
113
34
321
138
8 70
"207
155
100


1971
296
102
70
82
90
5
167
155
49
(c)
C99
a.c64
a109
105
26
89
103
27
89
103
a20
91
102
15
a283
117
25
251
140
a45
aThe calculation used to obtain this estimate assumed lognormality despite (N)1/2D > 0.736.
bSampling site was relocated within same general neighborhood in midyear. It is assumed that for sampling
 purposes the environmental air was the same at both locations.
cTemporarily discontinued because of construction at sampling site.
dSampling was initiated in the latter part of the year.
                                                                                          8-17

-------
              [Figure 8-1 carries a legend identifying the monitoring sites A through U
               by building name and street address (schools, fire stations, recreation
               centers, a museum, and a hospital), together with the neighboring suburbs
               of Lakewood and Rocky River.]
Figure 8-1.  Air pollution monitoring sites for Cleveland, Ohio. The heavy line
       down the center is the Cuyahoga River. The municipal boundaries have
       been straightened somewhat but are accurate in their essential features.
              [Figure 8-2: lognormal probability plot; vertical axis, concentration on a
               logarithmic scale; horizontal axis, cumulative frequency in percent.]
Figure 8-2.  Lognormal plot of distribution by weight of total suspended
       particulate (24-hr sampling) for monitoring station I (see Fig. 1)
       downwind of the industrial region.
8-18

-------
              [Figure 8-3: lognormal probability plot; vertical axis, concentration on a
               logarithmic scale; horizontal axis, cumulative frequency in percent;
               separate plotting symbols mark the years 1967 through 1971.]
Figure 8-3.  Lognormal plot of distribution by weight of total suspended
       particulate (24-hr sampling) for monitoring station K (see Fig. 1) upwind
       of the industrial region.
              [Figure 8-4: lognormal plots against the standard line; panel (a), standards
               for air quality are met (P = 0); panel (b), pollution levels twice the
               allowed standards (P = 100); the remaining panels, described in the text,
               show P = 100 and P = 50 cases; horizontal axis, frequency.]

                       Figure 8-4. Examples of Polludex levels.
                                                                             8-19

-------
           (a) Total suspended particulate
Figure 8-5. Bar graph presentations of Polludex values for the three pollutants at
      the various  monitoring stations.   Left  bar  represents  1968  level  of
      pollution; right bar or a single bar represents 1971 level. Alphabetic coding
      of  monitoring sites corresponds to that of Figure  1.
8-20

-------
References
Anon., 1971:  Federal Register. 36: 8186.
Babcock, L. R.,  1970:  A Combined Pollution Index for Measurement of Total
      Air Pollution. J. Air Pollution Control Association. 10: 653.
Benarie, M., 1971: Sur la Validité de la Distribution Logarithmico-normale des
      Concentrations de Polluant. Proc. Second Intern. Clean Air Congress, pp.
     68-70, eds., H.M. Englund and W. T. Beery, Academic Press, New York.
Harter, H.  L., 1961: Expected Values  of Normal Order Statistics. Biometrika.
     48: 151-165.
Larsen,  R.  I.,  1971:  A  Mathematical  Model  for Relating  Air Quality
     Measurements  to  Air  Quality  Standards.  Environmental  Protection
     Agency, Office of Air Programs, U. S. Rep. AP-89.
Lilliefors, H.  W., 1967: On the Kolmogorov-Smirnov Test for Normality With
      Mean and Variance Unknown. J. Am. Stat. Assoc. 62: 399-402.
Mitchell, R. I., 1968: Permanence of the Log-Normal Distribution. J. Opt. Soc.
     Am. 58: 1267-1272.
Neustadter, H. E.;  King,  R. B.; Fordyce,  J. S.; and Burr, J. C., Jr., 1972: Air
     Quality  Aerometric Data for the  City of Cleveland from 1967 to 1970 for
     Sulfur Dioxide, Suspended Particulates, and Nitrogen Dioxide.  NASA TM
     X-2496.
Noether, G. E., 1967: Elements of Nonparametric Statistics. John Wiley & Sons,
     Inc. New York, N. Y.
Ohio,  1972:   State  of Ohio,  Department  of Health,  Air  Pollution  Unit.
     Regulations AP-3-02, AP-7-01.
Pikul,  R.,  1972: Development  of Environmental Indices. Mitre Corp.,  Rep.
     M71-47.
DISCUSSION
Singpurwalla: When you  tried these Kolmogorov-Smirnov goodness of fit tests
could you care to tell which tables you used for the levels of significance?
Neustadter:  I believe they are referenced in the report.  I don't have it in my
head. The statisticians did it, but I believe the tables are in the report and the
reference is there.
Singpurwalla: The reason for questioning this is  because if  you estimate the
parameters from the data and go  ahead and use Kolmogorov-Smirnov tables
which are generally available, then you are apt to make some kind of an error.
But there are modified tables.
Neustadter: We are aware  of that. We used the modified  table.
                                                                    8-21

-------
Singpurwalla: O.K., and I  was wondering if that might be any reason why it
might change the answer.
Neustadter:  No. We are aware of the problem of using estimated parameters and
we did use modified tables. [See Lilliefors (1967).]
Rustagi:  This morning I have heard quite a  bit of glorification of lognormal
distribution.  I want to add one  more reference  to that.  In  1964, Archives of
Environmental  Health  I have  the  paper,  titled  "Stochastic behavior  of trace
substances," in  which many of these substances have been studied including air
pollutants.  The amazing thing was that many substances were  in liquids.  For
example, amino acids in urine also  followed lognormal distribution.  The second
point I want to ask is for Mr. Marcus, who used the concept of a very interesting
cumulative dose. I  think most of the trace metals such as lead, about which I am
familiar with, in  the human body or biological  systems  are excreted in also
certain random fashion. Could the  deposition of a substance like lead  or other
gases be put into  the model?  A  simple  model in that  connection was also
mentioned  by me in Archives of Environmental Health giving a model  of body
burden where intake and output were used in the model, however, not as the
formal stochastic  processes,  rather as  probability  distributions without any
assumptions for parametric form such  as lognormal. I would like to mention the
physiological experiments connected  with  air pollutants. There are very few
studies but I think the audience  should be aware of two famous studies—one is
on human subjects over the past thirty years on lead by Dr. R. A. Kehoe and I
think it is given in a series of lectures by Dr. Kehoe, "Metabolism of  lead in man
in health and disease," where he studied whatever metabolism could be studied
in man. In animals Professor Herman  Cember of Northwestern has  studied the
metabolism  of  mercury in rats  over  a  period of time and  I think these two
studies should be noted. I'm not aware  of others.
 8-22

-------
 9. AN INVESTIGATION OF THE FREQUENCY DISTRIBUTIONS
      OF SURFACE AIR-POLLUTANT CONCENTRATIONS
                             J. B. KNOX

                           R. I. POLLACK

                   Lawrence Livermore Laboratory
            University of California, Livermore, California
    In several papers, Larsen and co-workers (1965; 1967; 1969) have discussed
the frequency distributions of various pollutant concentrations calculated from
data taken at CAMP  (Continuous Air Monitoring Program) sites for 3 years in
various cities. The data consist of instantaneous measurements taken at 5-minute
intervals. When used in  this  fashion, or averaged over any period of time, the
resulting frequency distribution is in all cases approximately lognormal. It was
also noted that median  concentration is proportional  to averaging time to an
exponent.
    This result  allowed  these investigators  to  relate  the geometric mean,
standard geometric deviation and averaging time to the probable number of
times during a year that a given level of pollution would be exceeded. This type
of data presentation is useful because ambient air quality standards are set in the
form of a maximum allowable average over a given period of time e.g., 0.03 ppm
for 8-hr-duration samples would be allowed once a year for oxidant.
    The CAMP data indicate that reactive pollutants may have a larger SGD
(standard geometric deviation) because of the additional variability
introduced by the nature of chemical or photochemical reactions. It was noted
by Larsen and verified by Knox and Lange (1972) that continuous point sources
give concentration  distributions with larger SGD's  than area  sources. This is
attributed to the dependence of the pollutant concentration on the lateral and
vertical standard deviation of the plume.
    Barlow  (1971)  suggested  that the lognormal  distribution may not be
appropriate because the averaging process implies that the sum of lognormally
distributed random variables is itself lognormal.  This is contrary  to statistical
theory. He suggests that a Weibull distribution  would be more appropriate. This
suggestion is supported  by Milokaj (1972) who  has argued the validity of the
Weibull distribution  for  a  variety  of  situations involving  pollutant
concentrations. He emphasizes the importance of the threshold parameter (γ).
                                 9-1

-------
The probability of occurrence of a value of the random variable smaller than γ is
essentially zero. The lognormal distribution also has a formulation including a
threshold parameter,  however it is  usually ignored due to the fact that  low
concentrations are  ordinarily   beneath  the  sensitivity  of  the  measuring
instruments. Also,  the most interesting cases are the higher concentration levels
where  the lognormal  fits well, and the significance of a threshold  parameter is
minimal. There is some indication, justifying  further  investigation, that  the
lognormal fits well at both high and  low concentration levels, but  with slightly
different parameters. This suggests  that two  adjacent lognormal  distributions
may be present, perhaps caused by different types of meteorology.
    The motivation for using the Weibull distribution is largely empirical. This
distribution, with density function

                    f(x) = k x^m exp[-k x^(m+1)/(m+1)] ,
-------
until their increased weight causes them to fall out. This growth process dictates
that the size of a particle is a function of its previous size multiplied by the rate
of coagulation—a multiplicative process. This can be shown to yield a lognormal
distribution. The rate of coagulation is a function of a variety of other variables.
    Another important variable which has been found to be lognormally
distributed is the rate of dissipation of energy in turbulence (ε). This distribution
was postulated by Kolmogorov (1941) based on the assumption of a cascade
process by which energy is transferred from a large-scale turbulent motion to
progressively smaller-scale motions.  The transfer stages are assumed similar and
independent. Thus, the  amount of energy dissipated at any stage is a function of
the amount dissipated at the previous stage. The process may be viewed as

                            ε_i = ε_{i-1} Y_i                          (1)

where ε_i is the energy dissipated at stage i and Y_i is a characteristic of the
transfer stage causing the change in ε_{i-1}.
    Due to the reproductive properties of the lognormal distribution, the
distribution of ε implies the lognormality of several related variables including
the dissipation of temperature by thermal eddy conduction, the squared-space
differences of temperature and velocity, which imply that the differences them-
selves  are  lognormally  distributed  on either  side  of  the  origin, and the
horizontal  eddy  diffusivity. These  imply that the diffusive  transfer between
adjacent  volumes  of  air  is  lognormally  distributed.  Furthermore,  the
lognormality of wind speeds, which has been demonstrated empirically, implies
that the advective transfer is also lognormally distributed. These distributions
have all been verified experimentally  (Knox and  Lange (1972);  Gibson, et al.
(1970) and (1970a)).
The Lognormal Process

    The fundamental  question has  yet  to be  answered:  Why are all these
variables lognormally distributed?  What underlying physical  phenomena cause
the lognormality of these variables? The  answer to these questions requires a
basic  understanding of  the  theory  behind the generation  of  the lognormal
distribution.

    Consider a stochastic process of the form

                        X_i = X_{i-1} + X_{i-1} Y_i                         (2)

where Y_i is an independent stochastic variable, arbitrarily distributed.
    If we solve Equation 2 for Y_i,

                        Y_i = (X_i - X_{i-1}) / X_{i-1}
                                                                       9-3

-------
and sum both sides,

                Σ_{i=1}^{N} (X_i - X_{i-1}) / X_{i-1}  =  Σ_{i=1}^{N} Y_i .

We can approximate* the left side by

                ∫_{X_0}^{X_N} dX/X  =  ln X_N - ln X_0 ,

and by the Central Limit Theorem Σ Y_i is normally distributed; hence X_N is
lognormally distributed.
    This is known as the law of proportional effect; the percentage change in a
variable is equal to a constant plus an error.  If the absolute change had been
equal to this same constant + error  term,  the normal distribution would have
resulted. Hence,  the lognormal distribution is the result  of  a multiplicative
process whereas the normal distribution results from an additive process.
    The basic properties of the lognormal are all multiplicative analogies to the
normal distribution.  This includes the reproductive properties.  In particular the
product of two lognormal distributions is lognormal,  the sum,  however, is not.
Aitchison and Brown (1957) discuss these matters more fully,  but the above is
sufficient for the purposes herein.
    We recognize at this point that the processes leading to the particle size
distribution, and the lognormality of ε, are similar to Equation 2. In the latter
case we need merely to replace Y_i by (1 + Y_i). This adds a constant to the error
term, but makes no fundamental change in the process.
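
    The law of proportional effect is easy to demonstrate numerically. The sketch below is
our own illustration with an arbitrary choice for the distribution of Y_i: iterating Equation 2
over many independent series yields values whose logarithms are nearly symmetric (normal-like)
while the values themselves are strongly right-skewed.

    import numpy as np

    def proportional_effect(n_steps=200, n_series=2000, x0=100.0, seed=3):
        # Simulate X_i = X_{i-1} + X_{i-1} Y_i (Equation 2) for many independent series.
        rng = np.random.default_rng(seed)
        x = np.full(n_series, x0)
        for _ in range(n_steps):
            y = rng.uniform(-0.2, 0.25, size=n_series)   # arbitrary choice for Y_i
            x = x + x * y                                # proportional (multiplicative) change
        return x

    x_final = proportional_effect()
    logs = np.log(x_final)
    skew = lambda v: float(((v - v.mean()) ** 3).mean() / v.std() ** 3)
    print("skewness of ln(X):", round(skew(logs), 2))    # near zero if ln(X) is normal
    print("skewness of X    :", round(skew(x_final), 2)) # strongly positive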

A Simple Model of Pollutant Concentrations

     It has been found that

                          ψ(ν) = KQ/u(ν)                        (3)

    where ψ is pollutant surface concentration
          Q is source strength
          u is wind speed
          ν is frequency

*Approximation error small as Δt → 0.
9-4

-------
 is an appropriate model for predicting concentration of S02 and particulates in a
 well mixed urban environment (Gifford and Hanna (1972)). The constant K has
 been  measured  rather extensively  and a  range constructed  for  K for each
 pollutant.
    Based upon the lognormality of  1/u, which has been verified empirically
 (Knox and Lange (1972)), and the relative  invariance of K, which has been said
to be a weak function of city size, we conclude that ψ is lognormally
distributed.
    Knox and Lange (1972) have estimated K' = KQ by experiment and by
using a box model to predict concentrations. Their findings indicate that a
suitable value of K can be found either by using the box model or comparing ψ
and 1/u by visual superposition, and adjusting K' so that ψ and K'/u have
approximately equivalent geometric means. In addition, with this value the
variances of ψ and K'/u are approximately equal. (See Figs. 2-6 of Knox and
 Lange (1972)).
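
    The fitting step just described can be sketched as follows. This is our own construction
with synthetic wind and concentration series; it chooses K' so that the geometric means of ψ
and K'/u coincide, which is a numerical analogue of the visual superposition mentioned above.

    import numpy as np

    def fit_k_prime(psi, u):
        # Choose K' so that K'/u and psi have equal geometric means.  Since the geometric
        # mean of K'/u is K' divided by the geometric mean of u, K' = gm(psi) * gm(u).
        psi = np.asarray(psi, dtype=float)
        u = np.asarray(u, dtype=float)
        return float(np.exp(np.log(psi).mean() + np.log(u).mean()))

    # Synthetic hourly data: lognormal wind speed and a concentration roughly proportional to 1/u
    rng = np.random.default_rng(4)
    u = np.exp(rng.normal(np.log(3.0), 0.4, size=500))         # wind speed
    psi = 80.0 / u * np.exp(rng.normal(0.0, 0.2, size=500))    # concentration with scatter
    print(round(fit_k_prime(psi, u), 1))                       # should fall near the assumed 80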
    For continuous point sources Knox and Lange (1972) fitted the  model
                     Q        2  
-------
    ∂ψ_a/∂t + u ∂ψ_a/∂x + v ∂ψ_a/∂y + w ∂ψ_a/∂z
        = ∂/∂x (K_x ∂ψ_a/∂x) + ∂/∂y (K_y ∂ψ_a/∂y) + ∂/∂z (K_z ∂ψ_a/∂z) + (S_a + P)/V

where ψ_a is the concentration of pollutant a; u, w, v are the velocity
components; K_x, K_y are the lateral eddy diffusivities, which are
lognormally distributed based upon the lognormality of ε and the reproductive
properties; K_z is the vertical eddy diffusivity; S_a is the source term for pollutant
a; P is the term representing changes in concentration due to photochemistry; V
is the volume of air for which S and P act.
     This equation can be manipulated to represent a box model formulation
(MacCracken, et al. (1972)) where we are concerned with the concentration
averaged over a box which is surrounded by M other boxes.

    dψ_k(m,t)/dt = - Σ_{j=0}^{M} [T_A(m,j) + T_D(m,j)] ψ_k(m,t)
                   + Σ_{j=0}^{M} [T_A(j,m) + T_D(j,m)] ψ_k(j,t) + S_k(m,t)      (6)

where T_A(m,j) and T_D(m,j) are the advective and eddy diffusive transfer
coefficients from box m to box j. The lognormal distribution can be argued for
these latter variables in a similar manner as for K_x and K_y.
     This equation is also consistent with the generating process, Equation 2,
when certain reasonable restrictions hold:
     (a) The contribution of advection and diffusion terms are larger than  the
contribution of the source term. It has been found empirically that if this is  not
the case, lognormality does not result (Hopper (1972)).
     (b) The concentrations in the surrounding boxes  are, on the average over
long periods of time,  close to that of the box we are  interested in, due to  the
fact that they are subjected to similar stimuli.
 9-6

-------
     These restrictions transform Equation 6 to

    dψ_k(m,t)/dt = - Σ_{j=0}^{M} [T_A(m,j) + T_D(m,j)] ψ_k(m,t)
                   + Σ_{j=0}^{M} [T_A(j,m) + T_D(j,m)] ψ_k(j,t)                 (7)

    Suppose we let

                ψ_k(j,t) = ψ_k(m,t) + E_k(j,t)                                  (8)

the equation becomes

    dψ_k(m,t)/dt = - Σ_{j=0}^{M} [T_A(m,j) + T_D(m,j)] ψ_k(m,t)
                   + Σ_{j=0}^{M} [T_A(j,m) + T_D(j,m)] [ψ_k(m,t) + E_k(j,t)]    (9)

When the winds are strong the air is well mixed, so the differences between
ψ(j,t) and ψ(m,t) will be small. Hence the error term tends to zero. Conversely,
in the case where the error term E_k(j,t) is large, the constant term will usually
be small, indicating light winds. Furthermore, in either case, or any combination
of cases occurring between T and TR, we can expect that the sign of the term
will vary, implying that the positive and negative terms will cancel each other.
    This  argument implies that Equation  9 is consistent with  the  law of
proportional effect.
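
    A toy realization of this argument may help fix ideas. The sketch below is ours, not the
MacCracken et al. model: a ring of boxes is advanced with Equation 6 under lognormally
distributed transfer coefficients and a deliberately small source term (restriction (a)), and
the recorded series is summarized by its geometric mean and standard geometric deviation.
The geometry, coefficient statistics, time step, and source strength are all arbitrary assumptions.

    import numpy as np

    def simulate_boxes(n_boxes=9, n_steps=5000, dt=0.1, seed=5):
        # Advance Equation 6 for a ring of boxes with random (lognormal) transfer coefficients.
        rng = np.random.default_rng(seed)
        psi = np.full(n_boxes, 100.0)              # initial concentration in every box
        source = np.zeros(n_boxes)
        source[0] = 1.0                            # small source term, per restriction (a)
        history = np.empty(n_steps)
        for t in range(n_steps):
            # Transfer coefficients from each box to its two neighbours, redrawn each step
            T = np.exp(rng.normal(np.log(0.05), 0.5, size=(n_boxes, 2)))
            outflow = T.sum(axis=1) * psi
            inflow = np.roll(T[:, 0] * psi, 1) + np.roll(T[:, 1] * psi, -1)
            psi = psi + dt * (inflow - outflow + source)
            history[t] = psi[4]                    # record one interior box
        return history

    h = simulate_boxes()
    print("geometric mean:", round(float(np.exp(np.log(h).mean())), 1))
    print("SGD:          ", round(float(np.exp(np.log(h).std())), 2))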
    The solution will be  source-dominated only when  the magnitude of the
source terms is comparable  to the magnitude of the current concentration. There
                                                                 9-7

-------
is  reason to believe  (Hopper (1972)) that in such cases the concentrations will
not  be  lognormally  distributed,  as  the model  indicates. This result has been
noted in investigations of  particle size distributions also (Blifford and Gillette
(1971)).
    This reasoning is most easily justified for a well mixed urban region. It is not
clear that the lognormal distribution  will fit as well for non-urban, poorly mixed
areas. We do feel, however, that the  characteristics of an area's topography and
typical meteorology would have to be highly unusual for (10) to be so large that
the lognormal distribution would fit poorly.
     We have not yet discussed the Δt interval necessary for these results. We
recognize that it must be sufficiently small not to obscure the generating
process. If, for an extreme example, Δt were 6 months, we would not see the
process defined by Equation 2 because the effect of ψ_{i-1} on ψ_i would have long
since died out. Larsen's (1965) data is for 5-minute instantaneous readings. We
accept this as an appropriate time scale for our purposes, based on the fact that
meteorology certainly  does  not change  enough in a 5-minute  period  to
obscure  the relevant  correlations.  Of course,  the distribution of  pollutant
concentration remains unaltered.
    When the  data  is  averaged  over  other time periods within the realm  of
atmospheric motion, the averaging  time acts as a  filter  which smooths out
motions of  a smaller time scale.  This has the effect of allowing us to see only
motion  of  a time scale comparable  to the averaging time in the averaged data.
Hence the process described by Equation 2 still  holds for larger averaging times,
but the TA, TD  terms now represent motion of a larger scale. This results  in
lognormality over a large spectrum of averaging times.
    We  are  presently   investigating  the   magnitudes of the  first order
multiplicative  autocorrelations  for  all  averaging times.   Preliminary results
indicate that significant positive autocorrelation is present for averaging times up
to at least  2  weeks. This lends  credence  to the assumption that Equation  2
acts over a large spectrum of averaging times.
Applications

    Ambient air quality standards (AAQS) are set in terms of the number of
times a concentration of a particular pollutant shall exceed  a  specified limit,
averaged over a specified number of hours. For example, an 8-hour average of
CO may not exceed 30 ppm more than once per year.
    Concentration  distributions  must  be calculated to compare ambient air
quality with these types of standards. An air quality prediction has meaning only
when the averaging  time and level of confidence of the estimate are included.
This requires knowledge of the concentration distributions. Thus,  whether we
are interested in prescribing standards, describing levels, real time monitoring or
land use planning, knowledge of concentration distributions is indispensable.
9-8

-------
     The foregoing discussion  indicates that  there is an increasing amount of
 evidence  supporting  the contention that surface air pollutant concentration
 frequency  distributions  are  lognormal.  This  evidence  includes empirical
 investigative results,  arguments  regarding  the  relationship of meteorological
 variable distributions to pollutant frequency distributions from simple diffusion
 models, and deductions of the nature  of the pollutant frequency distributions
 from considerations of the complete set of governing equations for a multiple
 box model of photochemical pollutants. The possible exceptions to lognormality
 of  pollutant distributions have been indicated. However, it is now pertinent to
 explore the practical and  research implications of large portions of air quality
 regions having pollutant distributions that are lognormal;  significant implications
 include:
     (a) The application of air quality  simulation models  to  land-use planning
assessments for consistency with  AAQS or to  the  design  of measures to achieve
consistency with AAQS in growing areas should be  expedited in principle.
     (b) The validation tests of air quality simulation  models should include the
requirement that calculated pollutant frequency distributions, or key portions of
those distributions, correspond to reality.
     (c) Knowledge that the pollutant concentration distributions are lognormal,
 should eventually lead  to  simplifications in data  acquisition by air monitoring
 networks and to the feasibility of real time control  mechanisms.
Land Use Plan Assessment

    Consider the future when a verified and acceptable numerical simulation
model for air pollution exists. The question then is, how can such an acceptable
numerical simulation model be employed in land-use plan assessments? Given a
region of interest for planning purposes and a suite of pollutants of concern, one
could examine, for  instance, the frequency distribution of hourly-average values
of surface air concentrations and identify the portion of the distribution which
is  equal  to or  greater  than  the  ambient  air  quality  standard involved.
Conceptually, the days or episodes involved in that part of the distribution could
be composited into mesoscale or regional  weather types.  The meteorological
fields and air quality data from those days or  episodes would constitute case
studies for model calculations. In Figure 3, three weather types are illustrated,
corresponding to high, moderate, and low levels of pollution. The solutions of
such numerical modeling case studies would delineate a spatial distribution  of
the excess over ambient air quality  standards in the region which might not
necessarily be defined by the network of monitoring stations. From examining
those excesses and their spatial distributions, one could determine the degree  of
control  and a location of control necessary to remedy the excess. In principle,
the same set of analytical  steps could be applied to forecast emission  zonings
                                                                       9-9

-------
associated with either growth or alternative land-use plans for the same set of
identified days. Hence, in this manner, one could evaluate the degree of control
necessary for an existing situation in a region of interest to bring the air quality
of that region into line with ambient air quality standards, or else to assess the
excess of ambient air  quality standards in need of control that correspond to
various land-use plans.
    Kennedy, et al.  (1971) developed such a program for Chicago  utilizing a
sub-model to predict the effects of a certain type of emission zone in a particular
place. These  "coupling coefficients" are essentially a linear model of dispersion.
They are used as coefficients in  a  linear program, the objective of which is to
minimize  the social and  financial  burden of  restrictions while  satisfying  air
quality  constraints.  Of course,  non-linearities  caused by  interaction between
pollutants and such are overlooked, and extremes are calculated through the use
of coupling coefficients and extrapolation of the frequency distribution. But the
model seems appropriate for making a land-use plan assessment or corresponding
emission zoning.
Model Validation

    Application of the model  to such economically sensitive problems as land
use  planning  requires that  the  model  predict  the surface concentration
distributions quite  accurately. In order  to  discuss  validation  of numerical
simulation models of regional air pollution we reference some recent results in
the development and initial verification  of an air pollutant model  for the San
Francisco Bay Area (MacCracken, et al. (1972); Gelinas (1972)). This model uses
historical  meteorological data to predict the mean and surface air concentration
in each of the model cells, including transport and diffusion by the ambient
wind  field between  the irregular earth surface and the time  and space variable
marine inversion layer. (See Figs. 3-4 of MacCracken, et al. (1972)). The verifica-
tion work was carried out on a 48-hour test period during July, 1968. Figure 4
displays the  observed hourly-average  concentrations of CO in parts per million
during the case study, as well  as the computed vertical average and computed
surface  hourly-average   CO  concentration.  There  is  very  reasonable
agreement between the observed and  the computed surface concentrations. This
comparison of calculated versus observed concentrations can also be displayed
as a lognormal frequency  distribution plot, Figure 5.  The significant feature to
be noted  here is that the frequency distribution of the predicted hourly-average
concentrations on  lognormal paper  parallels  the observed  (Knox and  Lange
(1972)). In addition, it is  parallel to that obtained  by  Larsen  for the frequency
distribution  of   hourly  averages  of  CO  for  a  year. Frank Gifford
(ARATDL-NOAA)  has recently noted that several  of the numerical simulation
models under development at  this time render numerical solutions which  are
 9-10

-------
"noisier"  than  the  observed distributions.  A  numerical  solution, that  is
contaminated  with  noise will, in  general, not be able to predict the frequency
distribution of the surface pollutant and, therefore, will have severe limitations
in regard to a comparison  of predicted frequency distributions to ambient air
quality standards. Hence, one criterion for  an acceptable model for numerical
simulation  of air  pollution  is whether the  model is  able  to  reproduce the
frequency  distribution  characteristics  of the pollutants involved and in the
region of interest.
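    As a rough illustration of this criterion, one could compare predicted and observed hourly averages on lognormal coordinates and check whether the fitted geometric standard deviations (the slopes on log-probability paper) agree. A minimal sketch follows; the data are synthetic stand-ins, not the Bay Area model results.

```python
# Illustrative check of the lognormality/slope criterion discussed above.
import numpy as np

rng = np.random.default_rng(0)
observed = rng.lognormal(mean=1.0, sigma=0.5, size=48)    # 48 hourly averages (placeholders)
predicted = rng.lognormal(mean=1.1, sigma=0.5, size=48)

def lognormal_fit(x):
    """Geometric mean and standard geometric deviation of a sample."""
    logs = np.log(x)
    return np.exp(logs.mean()), np.exp(logs.std(ddof=1))

gm_obs, sgd_obs = lognormal_fit(observed)
gm_pred, sgd_pred = lognormal_fit(predicted)
print(f"observed:  GM={gm_obs:.2f}  SGD={sgd_obs:.2f}")
print(f"predicted: GM={gm_pred:.2f}  SGD={sgd_pred:.2f}")
# Parallel lines on log-probability paper correspond to equal SGDs; a "noisy" model
# would show a larger SGD (steeper slope) than the observations.
```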
Monitoring

    Knowledge of  the  particular distribution and its parameters allows us to
make statistical comparisons  between  predicted  air quality and  air quality
standards.  Alternatively, we may simplify the procedure by taking  random
samples and manipulating only  this reduced  volume  of  data. The resulting
estimates would be measures of typical long-term concentrations and variability.
Figure 6 shows estimates of the  distribution of hourly averages of CO in San
Francisco from 1968 through 1970. The good agreement of the estimates made
from 100 random samples and from 10 random samples with the distribution
obtained from continuous monitoring suggests the possibility that an
appropriate  spatial  and temporal  random sampling  scheme  would  allow one
movable receptor  to estimate annual averages  in a number of locations. This
method has potential for use with land use models where long-term information
is desired. Methods of sampling local air  quality, as contrasted to  continuous
monitoring of local air  quality, are not well suited to comparison of predicted
concentrations from a model to short term AAQS.
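    A minimal sketch of this idea follows, using synthetic hourly CO data in place of the San Francisco record and assuming, as argued above, a lognormal population; the comparison of small-sample estimates with the full record is the point of interest.

```python
# Compare distribution parameters estimated from small random samples with those
# from "continuous monitoring" (the full synthetic record). Illustrative only.
import numpy as np

rng = np.random.default_rng(1)
hourly_co = rng.lognormal(mean=1.0, sigma=0.6, size=20_100)   # stand-in for ~3 years of hourly data

def geometric_summary(x):
    logs = np.log(x)
    return np.exp(logs.mean()), np.exp(logs.std(ddof=1))       # geometric mean, SGD

for n in (10, 100, len(hourly_co)):
    sample = rng.choice(hourly_co, size=n, replace=False) if n < len(hourly_co) else hourly_co
    gm, sgd = geometric_summary(sample)
    label = "continuous monitoring" if n == len(hourly_co) else f"{n:>3d} random samples"
    print(f"{label}: geometric mean = {gm:.2f} ppm, SGD = {sgd:.2f}")
```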
    Nonparametric methods were also investigated, but they tend  to be less
powerful than parametric methods in cases where the assumptions of parametric
statistics apply. The latter  methods  also  have  the  advantage  of  ease of
manipulation and the simplicity of exact specification of the distribution.
    A natural extension of the principles of air pollution monitoring is real-time
control. This is a  potentially effective method  for controlling air pollution
episodes.  It  requires  a model with  the ability to predict  future pollutant
concentration distributions at all points in the region sufficiently far in advance
so that control actions can  be  taken  to  avoid an  impending episode.  These
actions  may be  quite  selective, in that they  need  only  be taken  during
emergencies and then only  in offending emission zones. We recognize that this is
not within present capabilities, but we  look ahead to the construction of such
"feed forward" control schemes.
                                                                      9-11

-------
Conclusions

    There  is  increasing evidence  to  support the theory  that air  pollutant
concentrations are  lognormally  distributed in areas devoid of strong sources,
whether they be passive or photochemical, and in the absence of meteorological
or topographic effects resulting  in sharp differences in concentrations between
adjacent  volumes of  air.  This  lognormal distribution  is supported by  (a)
empirical  evidence,  (b) the simple  model of urban  pollutant  concentrations
proposed by Gifford when examined in the light of the lognormal distribution of
the reciprocal  of wind speed verified by Knox and  Lange, and (c) the theoretical
derivation  from  the full set of equations  governing  the time evolution of
pollutant concentrations presented herein.
    Weibull wrote  ". . . it  is utterly hopeless to  expect a theoretical basis for
distribution functions such as ... particle sizes," and yet one has been provided.
In fact, it seems reasonable to expect that the physics describing a process
should be consistent with a distribution function describing the results of that
process; indeed, anything else would be suspect. This is what has been provided
here: a consolidation of empirical evidence with physical theory.
    Knowledge of pollutant concentration distributions  is necessary for land-use
plan assessment  to compare predicted air  quality with ambient air quality
standards.  It  is useful  as a method of verification of  a numerical simulation
model of air pollutant evolution, and it is a potentially valuable tool for use in a
real-time model predicting short-term  fluctuations in pollutant concentrations.
Acknowledgement

     This work was performed under the auspices of the U. S. Atomic Energy
Commission.
 9-12

-------
                             Figure 9-1. Weibull probability plot of CO concentration vs frequency for San Francisco.

                             Figure 9-2. Lognormal probability plot of CO concentration vs frequency for San Francisco.
                                                                      9-13

-------
Figure 9-3. Carbon monoxide concentration vs frequency for San Francisco for
      various categories of pollution-days (high, moderate, and low pollution; annual average shown).

Figure 9-4. Carbon monoxide concentrations for San Francisco, July 10-11,
      1968 (computed vertical average and observed).
9-14

-------
Figure 9-5. Carbon monoxide concentration vs frequency (% of time concentration is exceeded) for San Francisco
      (observed surface concentration, Bay Area model surface and vertically averaged concentrations, and Larsen model concentration).

Figure 9-6. Carbon monoxide vs frequency for San Francisco—hourly averages
      (10 random samples, 100 random samples, and continuous monitoring of 20,100 data points).
                                                                         9-15

-------
References

Aitchison and  Brown,  1957:  The  Lognormal Distribution. Cambridge Univ.
      Press, 176.
Barlow, R. E.,  1971:  Averaging time  and maxima for air pollution concentra-
      tion. NTIS AD-729 413, ORC 71-17.
Blifford, I.  H., and Gillette, D. A., 1971:  Applications of the lognormal  fre-
      quency distribution to the chemical composition and  size distribution of
      naturally  occurring atmospheric  aerosols. Water, Air & Soil Pollution. 1:
      106-114.
Friedlander, S.  K., 1960: On the particle size spectrum of atmospheric aerosols.
      Journal of Meteorology. 17: 373-374.
Gelinas, R. J., 1972: Stiff systems of kinetic equations, a practitioner's view. J. of
      Computational Physics. 9: 222-236.
Gibson, Stegen, and Williams, 1970: Statistics of the fine structure of turbulent
      velocity and temperature fields measured at high Reynolds number. J. Fl.
      Mech. 41: 153-167.
Gibson, Stegen, and McConnel, 1970: Physics of Fluids. 13: No. 10.
Gifford, F. A., and Hanna, S. R., 1972: Modeling urban air pollution. ARATDL
      Contribution No. 63.
Hopper, C., 1972: personal communication.
Kennedy, A. S., Cohen, A. S., Croke,  F. J., Croke, K. G., Stork, J., and Hurter,
      A. P.,  1971: Air pollution-land use planning project, phase I. Final Report,
      ANL/ES-7.
Knox, J. B.  and Lange,  R.,  1972: Surface air pollutant concentration-frequency
      distribution: implications for urban air pollution modelling. University of
      California, Lawrence Livermore Laboratory, Report UCRL-73887.
Kolmogorov, A. N., 1941: Dokl AN SSSR. 30: 301.
Larsen,  R. I., Zimmer, C. E., Lynn,  D. A., and Blemel, K. G., 1967: Analyzing
      air pollutant concentration and dosage data. J. Air Pollution Control As-
      sociation.  17: 85-93.
Larsen,  R. I., 1969: A new  mathematical  model of air pollutant concentration,
      averaging  time, and frequency. J. Air Pollution Control Association.  19:
      24-30.
MacCracken, M. C., Crawford, T. V., Peterson, K. R., and Knox, J. B., 1972:
      Initial   Application of a  Multi-Box  Air Pollution  Model  to  the San
      Francisco Bay Area. Univ. of California, Lawrence Livermore Laboratory,
      Report UCRL-73348.
Mikolaj, P. G., 1972: Environmental applications of the Weibull distribution
      function:  oil pollution. Science. 176: 1019-1021.
9-16

-------
Singpurwalla,  N.  D.,  1972:  Extreme  values  from  a lognormal  law with
     applications to air pollution problems. Technometrics. 14: 3.
Weibull, W., 1951: A distribution function of wide applicability. J. Appl. Mech.
     293-297.
Zimmer, C. E., and Larsen, R. I., 1965: Calculating air quality and its control. J.
     Air Pollution Control Association. 15: 565-572.
 DISCUSSION
J. Arvesen: Regarding your model itself, the box model that you applied to the
San Francisco data, how do you go  about estimating the parameters  in that
model to fit that data? Is there a problem involved with that? It would seem to
be a problem to me. There seemed to be a lot of parameters in there and I was
wondering how you can estimate them reasonably well on 2 days' data. Am I
missing something?
Knox: Let  me  see if I  can  answer the question. The predicted  frequency
distribution  for  the 48-hour test period was generated from the predicted 48
1-hour average CO  concentrations for San  Francisco receptor from the model.
This distribution was compared  to the actual data from San Francisco—the 48
average hourly values at the sampling station. And so the frequency associated
with the highest CO value corresponds to 1 in 48. There is an interesting aspect
of this: the obvious question is how do we know that the model has an averaging
time that is  appropriate to be compared to the average hourly data. If one looks
at the boxes used, they are "T" shaped, "L" shaped, or any arbitrary shape that
fits the area roughness or source strength. Their average dimension divided by
wind velocity is about an hour, so that the travel time across the boxes is
comparable to the sampling period. If we had used 5-minute integrations, then the
comparison to actual data should be performed with 5-minute average CO data.
                                                                    9-17

-------

-------
    10. AIR QUALITY FREQUENCY DISTRIBUTIONS FROM
  DISPERSION MODELS COMPARED WITH MEASUREMENTS
                         D. BRUCE TURNER*

                 Environmental Protection Agency
              National Environmental Research Center
                      Division of Meteorology
               Research Triangle Park, North Carolina
Introduction

    Cumulative  frequency distributions (hereinafter abbreviated CFD)  of air
quality can be estimated by dispersion models. By comparison with CFD's from
air quality measurements at the same location, some indication of the accuracy
of these estimates can be made.  Extremes  of the estimated CFD for specific
locations can be compared with air quality standards. Not only can estimates be
made for existing pollution sources, but  projected  estimates can be made for
expected degrees of control  of  existing sources and  inclusion of additional
sources. These projected  estimates  can  also be compared with air quality
standards.
    It is the purpose of this paper to present CFD's estimated from short-term
dispersion models and  determined from measurements for the same locations,
periods  of record  and averaging  times, and to compare these, especially the
maximum value, to indicate the accuracy of the estimates.
Background

    National ambient air  quality  standards have  been set in response to the
Clean Air Act. In most cases the standards consist of a long-term average, usually
the annual average, and a  short-term standard, such as a maximum 24-hour or
3-hour concentration not to be exceeded more than once per year. For existing
sources, it  is possible to monitor ambient  air quality  at selected sites to
determine if air quality standards  are met at these locations. Due to the  small
*On assignment from the National Oceanic and Atmospheric Administration, Department
 of Commerce

                                10-1

-------
number of monitoring stations, it is highly likely that maximum concentrations
occur that exceed those measured at these stations.
    It is desirable to supplement present air quality measurements by estimating
concentrations at additional locations. It is also desirable to estimate projected
ambient air quality  at a number of locations for proposed source configurations
including both additional sources and various assumptions as to degree of control.
These  estimates can also be compared with  air quality standards. Air quality
dispersion models have been developed to meet this need.
    Long-term or climatological models have been used to estimate mean annual
concentrations at specific locations. These models typically require mean annual
emission rates from point and area sources and joint frequency distributions of
wind direction, wind speed, and stability.  The relative accuracy of these models
is discussed elsewhere (Turner, Zimmerman and Busse (1972)). Summarizing this
paper, comparison  of model  estimates with  measurements at a number of
sampling locations  indicates that the ratio of root mean  square error to the
measured  mean  for all stations is typically from 0.3 to 0.5. This indicates that
annual  means can  be estimated quite well.  These  estimated means can be
compared with the standards for the annual mean.
    Dispersion models that calculate concentrations for averaging times of 1 to
2  hours can  be  used to make estimates  for comparison with  short-term
standards. Calculations can be made for each hour of the period  of record and,
in addition to determining the extreme concentration occurring once during the
period, a frequency  distribution of  concentrations can be obtained. Hourly
concentrations  can also  be averaged  for  any  longer  averaging  time, such as
24 hours, and a frequency distribution determined for this longer averaging time.
These  short-term dispersion models require both meteorological and emission
information. Meteorological information typically consists of: (a) wind speed and
direction  or wind flow fields, and (b) atmospheric stability class and mixing
height or  temperature variation  with  height.  Emission  information typically
consists of emission rates for both significant point sources and  all other sources
considered collectively as area sources. To be realistic, the variations in emissions
from  season to  season, weekday to weekend and for  various times of the day
should  be included. It has  been the  experience of  the author  that this
information is difficult to obtain and  also difficult to organize into a convenient
form. Stack parameter data are usually included for the point sources in order to
calculate plume rise. Because of inclusion of most emissions near the ground into
area sources, the resulting concentration estimates represent concentrations
averaged over an area the size of the smallest area source, usually 1 km2. On the
other hand, air quality measurements represent the concentration at the specific
point  of measurement and are therefore particularly sensitive to any nearby
sources. Validation of dispersion models in  urban  areas is therefore difficult,
since it is necessary to compare the point measurements with estimates that are
more representative of an area.
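    As a concrete illustration of how such distributions are assembled, the short sketch below builds a CFD from hourly estimates and from 24-hour averages of those estimates. The data are synthetic placeholders standing in for model output; the averaging and percentile logic is the part of interest.

```python
# Build cumulative frequency distributions from hourly model estimates, and
# from 24-hour averages of those estimates. Synthetic data, illustrative only.
import numpy as np

rng = np.random.default_rng(2)
hourly = rng.lognormal(mean=0.5, sigma=0.7, size=24 * 89)    # 89 days of hourly estimates

daily = hourly.reshape(89, 24).mean(axis=1)                  # 24-hour averages

def cfd(values, percentiles=(50, 90, 99)):
    """Concentrations at selected cumulative frequencies."""
    return {p: round(float(np.percentile(values, p)), 2) for p in percentiles}

print("1-hour CFD:", cfd(hourly))
print("24-hour CFD:", cfd(daily))
print("1-hour maximum:", hourly.max(), " 24-hour maximum:", daily.max())
```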
 10-2

-------
 Frequency Distributions From Dispersion Models

     Fortak  (1970) and  Koch and  Thayer (1972) have  estimated CFD's for
 locations in urban areas from short-term dispersion models. Both used Gaussian
 plume models, making separate calculations for point and area sources.
     Fortak had 30-minute measurements of sulfur dioxide for four locations in
 the city of  Bremen, Germany. He made estimates using  short-term dispersion
 models for the same locations and averaging times and determined the frequency
 distributions over various periods.  The following  results are for the heating
 period (20 September 1967 - 31 May 1968). At two stations estimates are higher
 than  measurements for corresponding percentiles throughout the  distribution.
 At another station,  estimates  are less  than measurements over the entire
 distribution. For the remaining station, estimates are higher than measurements
 except beyond the 99.6 percentile of the CFD where estimates are too low. At
 the extreme end  of the  CFD, at the 99.5 percentile, Fortak's estimates for all
 four  stations are well  within a  factor of 2  of the measurements. The worst
 estimate is off by a factor of 1.7.
    Koch  and Thayer  (1972) of Geomet, working on a contract for EPA, also
 used  a short-term dispersion model to estimate  1-hour  concentrations for 8
 locations in Chicago for a 1-month period,  (January 1967), and to estimate
 2-hour concentrations  for  10  locations in   St.  Louis for a 3-month  period
 (December 1964 - February 1965). CFD's were determined from these estimates
 and compared with CFD's from measurements  at the same locations.
    In Chicago, the model underestimates concentrations  for the entire CFD at
 one  of the  stations. Four stations  have concentrations overestimated for  the
 entire CFD. At one of  the stations, concentrations are overestimated at the low
 end of the CFD with slight underestimates past the 55 percentile. The other two
 stations have concentrations  underestimated  at the low end of the CFD and
 overestimated beyond the 65 percentile for one station and beyond about the 90
 percentile for the other. Only one station has an estimate at the 90 percentile off
 by more than a factor of 2. The  error at this station is a factor of 2.8. For these
 CFD's the 90  percentile  is  the highest cumulative frequency for which data is
 presented.
    In St.  Louis,  the  model underestimates concentrations for the entire CFD
 for five of the stations. One  station has concentrations overestimated for the
 entire distribution.  At  the  other  four stations concentrations are generally
 underestimated, but are  overestimated at the top end of the CFD  with  the
cross-over ranging from the 55 percentile to the 90 percentile. Only one station
has an estimate at the 90 percentile off  by more than a  factor of 2 from  the
 measurement at the same point in the  CFD. The error at this station is a factor
of 2.6.
                                                                    10-3

-------
24-Hour Frequency Distributions

    The  author, using a short-term Gaussian plume model similar to that used
by Koch and Thayer (1972), calculated 2-hour concentrations for 40 locations
in St. Louis. Measurements of 24-hour sulfur dioxide concentrations were made
at  these  stations  during  89 consecutive days  in  December,  1964 through
February, 1965. Estimates of 24-hour concentrations at these stations were
made by averaging  12 successive 2-hour  estimates.  Frequency distributions of
24-hour  concentrations for the period  were  determined for both  estimated
concentrations and for measured concentrations for all 40 stations.
     Because of the interest in the extreme end of the CFD (at the frequency of
the  air  quality standards), the  extreme  estimated  value  and  the extreme
measured value were compared. These are near the 99 percentile for these three
months of 24-hour concentrations.  The ratio of calculated concentration to
observed concentration was determined for each station. These ratios for the
extreme, arranged  in  ascending order,  are 0.63, 0.70, 0.78, 0.81, 0.82, 0.84,
0.84, 0.87, 0.88, 0.88, 0.90, 0.97, 1.07, 1.07, 1.09, 1.10, 1.11, 1.20, 1.23, 1.24,
1.30, 1.32, 1.42, 1.42, 1.44, 1.45, 1.45, 1.46, 1.59, 1.63, 1.65, 1.67, 1.69, 1.79,
1.90, 2.05, 2.23, 2.34, 2.35, 2.37.
     Note that  35 of the 40 stations have estimated  extreme values within a
factor of 2 of the measured  extreme (ratio  between 0.5  and 2.0). Also 15
stations have errors of less than or equal to ±20%.
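     The tallies quoted above can be reproduced mechanically from the list of ratios; a small sketch, using exactly the ratios given in the text:

```python
# Count stations whose estimated extreme is within a factor of 2 of the measured
# extreme, and those within +/-20%, from the calculated/observed ratios above.
ratios = [0.63, 0.70, 0.78, 0.81, 0.82, 0.84, 0.84, 0.87, 0.88, 0.88, 0.90, 0.97,
          1.07, 1.07, 1.09, 1.10, 1.11, 1.20, 1.23, 1.24, 1.30, 1.32, 1.42, 1.42,
          1.44, 1.45, 1.45, 1.46, 1.59, 1.63, 1.65, 1.67, 1.69, 1.79, 1.90, 2.05,
          2.23, 2.34, 2.35, 2.37]

within_factor_2 = sum(0.5 <= r <= 2.0 for r in ratios)
within_20_pct = sum(0.8 <= r <= 1.2 for r in ratios)
print(f"{within_factor_2} of {len(ratios)} stations within a factor of 2")   # 35 of 40
print(f"{within_20_pct} of {len(ratios)} stations within +/-20%")            # 15 of 40
```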
     Examples of agreement of estimates  from the model and measurements at
the extreme (around the 99 percentile) are shown  in Figures 1  through 3. Station
4 (Fig. 1) has the  best agreement (a ratio of 0.97). Station 23 (Fig. 2) has the
highest overestimate  (off by a factor  of 2.37). Station 27 (Fig. 3)  has the
greatest underestimate (a ratio of 0.63).
     The  comparison  of  the  CFD's for the  40  locations is characterized
subjectively as  follows:  At  ten  stations  the CFD's for  estimates  and
measurements are  close. At ten stations overestimates occur throughout the
entire CFD. At three  stations  underestimates  occur throughout the entire
distribution. At four stations overestimates occur primarily, but underestimates
occur at the higher percentiles (beyond the 88, 93,  95, and 96 percentiles). At
10  stations both underestimates  and overestimates occur, with overestimates
beyond the crossover points of 7,  10,  25, 25, 25,  25,  30,  40, 83, and 90
percentiles.  At two stations, although both underestimates and overestimates
occurred, the comparison could be described as mostly underestimates. At one
station underestimation occurred except at each end of the distribution.
     Other visual comparisons of the estimated and measured CFD's for 24-hour
concentrations  are  given  in Figures 4 and 5. Station 8 (Fig. 4) has the  best
agreement between estimates and  measurements  over the whole CFD and has a
ratio of  1.07 at the extreme. Station 23 (Fig. 2), discussed previously, has the
 10-4

-------
worst overestimate regardless of place in the distribution, with the estimate four
times the measurement at the  7 percentile. Station  38 (Fig. 5) has the worst
underestimate with an estimated 2 and a measured 46 at the 4 percentile, off by
a factor of 23. This is probably because low levels of background concentration
exist  due to  emissions  from  distant  sources that  are not included  in  the
calculations made by the model.
    It is also desirable to  consider if the CFD's appear to be lognormal (straight
lines on  log-probability plots),  particularly in  view  of  the frequent use of the
Larsen statistical model (Larsen  (1971)) to estimate extremes of concentrations
in urban areas.  It  appears that there  is  some deviation  from the lognormal
distribution  in the  figures previously discussed, especially Stations 27  (Fig. 3)
and 38 (Fig. 5). Stations 28 (Fig. 6) and 16 (Fig. 7) seem to have two different
slopes  in their distributions  of measured  concentrations, with  the  transition
taking place in the vicinity of the 50 percentile. Stations 2 (Fig. 8) and 6 (Fig. 9)
have a sudden transition to higher measured concentrations around the 95 to 97
percentiles. Station 19 (Fig. 10)  has  two portions  of the  CFD  of measured
concentrations with the same slope but with a displacement occurring near the
50 percentile. For the most  part,  measured concentration CFD's appear to be
near lognormal. Although  many of the CFD's from estimated concentrations are
also nearly lognormal, some of them appear to deviate more than those of the
measurements and to have an  "S" shape such as station 28 (Fig. 6).
Two-Hour Frequency Distributions

    At 10 of the 40 measurement stations in St. Louis, 2-hour measurements of
sulfur  dioxide  were  also  made.  At  these  10  locations,  estimates  and
measurements were used to determine CFD's for 2-hour concentrations over the
89 day period (December 1964 - February 1965).
    For each  station, the extreme estimated value and the extreme measured
value  were compared. Since the data period consisted of 12 periods per day for
89 days, the extreme represents a frequency near the 99.9 percentile. The ratio
of calculated concentration to observed concentration was determined for each
station. These ratios for the extreme, arranged in ascending order, are 0.52, 0.66,
0.74,  0.75, 1.12, 1.47,  1.51, 1.60, 1.76,  1.88. All 10 of the stations have the
estimated extreme value within a factor of  2 of the measured extreme (ratio
between 0.5 and 2.0).
    A selected number of these 2-hour CFD's are shown in Figures 11 through
13. Station 17 (Fig. 11) has the best agreement at the 99.9 percentile. Station 36
(Fig. 12) has the highest overestimate at the 99.9 percentile  (off by a factor of
1.88). Station 10 (Fig. 13) has the greatest underestimate at the 99.9  percentile
(a ratio of 0.52).
    The comparison of  the 2-hour CFD's for the ten  locations is characterized
subjectively  as follows:  Two stations  (4  and  12)  have cumulative frequency
                                                                     10-5

-------
distributions close to those  of the measurements. At two  stations  (3 and 23)
overestimates of concentration occur for the entire CFD with the largest errors
less  than a  factor  of  3.  At  two  stations  (10 and  33)  underestimates of
concentration occur for the  entire distribution with errors as large as a factor of
4. At three of  the stations  (17, 28  and 36)  concentrations are overcalculated
beyond  the  following  percentiles:  99, 56,  and 63.  At one  station (15)
concentrations are undercalculated beyond the 45 percentile.
    Other visual comparisons of the  estimated and measured CFD's for 2-hour
concentrations  are given in  Figures 14 through 16. Station 4 (Fig. 14) has the
best  agreement  between  estimates  and  measurements  over the  whole
distribution. Stations 23 and 28 (Fig. 15 and  16)  have poor agreement between
estimates and  measurements throughout  most of the  CFD.  At  station  23
concentrations  are primarily overestimated. At station 28 concentrations  are
underestimated at low percentiles and overestimated at high percentiles.
    The two  measured CFD's that are least lognormal occur at stations 33 and
36. Station  33 (Fig.  17)  appears  to  have  two slopes,  and  at the highest
concentration   (greater  than  99.8 percentile)  there  is  a  sudden increase  in
concentration.  Station 36 (Fig.  12) also seems to have two different slopes with
the transition occurring around the 70 percentile. For estimated CFD's, station
28, 33, and 36 (Figs. 16, 17, and 12) appear to be least lognormal.
Discussion

    These  CFD's from measured air quality data and from dispersion model
estimates have been determined  for  averaging times from  30 minutes to 24
hours, for  periods of record  from 1 month to a heating season. These are all
for locations within urban areas. These cannot be compared  directly to present
U. S. air quality standards since  the  standards specify periods of record of  1
year.  However, it is quite likely that during the heating season in Bremen, and
during December through  February  in St. Louis,  the  highest sulfur dioxide
concentration of the  year occurs, due to the number of space heating sources
that produce sulfur dioxide. Concentrations with the extreme frequency of once
per year should  be expected to vary considerably from year to year, due to the
high  variability  of  occurrence of stagnant or  other  special meteorological
conditions  that cause the extreme.
    The number of stations with  extreme estimates from the dispersion models
within  a   factor of  2  of the extreme  measurements  for the investigators
mentioned  herein are summarized  in Table  I.
    Dr. Frank Pasquill  (1971)  in his presidential address delivered before the
Royal Meteorological  Society on April 21, 1971 stated, "The agreement as close
as  20  or   30  percent  which  may  be  achievable in  the most  favorable
circumstances for a long-term multi-station average, is obviously unattainable in
respect of an individual value even when this is averaged over an hour or so. In
 10-6

-------
this  case  the  only prospect of useful  prediction lies in the statistics of the
cumulative frequency distribution of a large number of such values, and it would
appear. . . that prediction of the rather extreme high concentrations encountered
only occasionally may be achievable with an error factor of about two."
    Since most of the extreme value estimates are within a factor of two of the
extreme measurements,  these results  are  in  agreement  with  Dr. Pasquill's
statement.
    It should  be pointed out that these model results mentioned here contain
both overestimates and underestimates so that no constant correction factor can
be used to bring the estimate of these extremes more in line with the measured
extremes. Errors  in both directions with regard to emissions and small sources
near the receptor probably account for a large proportion  of the differences.
Keep in mind  that model estimates representative of areas a square kilometer or
larger are being compared with point measurements.
    There are many other comparisons and statistical tests that can be
performed with  these  CFD's in addition to the consideration of the extremes
and the rather cursory examination of the lognormality of them. Some of the
possibilities for further examination of this data follow: Perform statistical tests
to determine how close the given CFD's are to lognormal. Determine standard
geometric  deviations  (slope  of  distribution)  from  two percentiles  in  the
distribution and see how these vary with location in the urban area. From the
40-station sampling network determine  measured and estimated concentration
patterns at various percentile levels. Determine what meteorological conditions
cause  the  extreme  value  estimated concentrations and  the extreme  value
measured concentrations at each station.
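    One of these follow-ups, the slope (standard geometric deviation) estimated from two percentiles, can be computed directly. The sketch below uses the conventional 15.87 and 84.13 percentile points of a lognormal distribution; the concentrations are placeholders, not the St. Louis data.

```python
# Standard geometric deviation (slope on log-probability paper) estimated from
# two percentiles of a concentration distribution. Synthetic data, illustrative only.
import numpy as np

rng = np.random.default_rng(4)
conc = rng.lognormal(mean=3.0, sigma=0.6, size=89)

c16, c50, c84 = np.percentile(conc, [15.87, 50.0, 84.13])
sgd = np.sqrt(c84 / c16)      # one log-standard-deviation step on either side of the median
print(f"median = {c50:.1f}, SGD estimated from percentiles = {sgd:.2f}")
```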
Conclusions

    Gaussian  plume  dispersion  models  for  urban areas produce  CFD's  at
individual  sampling  locations  similar  to the distributions determined  from
measurements.  These distributions subjectively  appear  similar to lognormal
distributions. The maximum 24-hour concentration  estimated during an 89-day
period was within a factor of 2 of the measured maximum at 35 of 40 sampling
stations in St. Louis, Missouri. The maximum 2-hour concentration estimated
during the same 89-day  period was within a factor of 2 of the measured 2-hour
maximum at all 10 sampling stations having 2-hour measurements
available.  Estimates of air quality concentration at  a downwind receptor for a
given  hour from a point source are generally regarded as accurate only within a
factor of  2  because of uncertainties in  estimates of emission  rate, turbulence
structure, plume height, wind direction and wind speed. It is encouraging to find
similar accuracies  for the extreme value  (99  percentile for 24-hour,  99.9
percentile for 2-hour)  estimates for  urban locations influenced  by  multiple
sources.  (Note that the maximum estimate may  be calculated for a different
                                                                     10-7

-------
2-hour period than the period that has the maximum measured concentration.)
This gives  somewhat  increased  confidence  to  the  air  pollution
meteorologist asked to estimate urban air quality concentrations to be compared
with standards. One must keep in mind that good estimates of concentrations
from dispersion models can only  result from  good emission estimates and
reliable measurements of meteorological parameters.

   Acknowledgements
    The author wishes to thank Adrian D. Busse for his development some years
ago of a computer program to routinely produce a cumulative frequency
distribution from a time series of data; Dale H. Coventry for programming the
computer-plotter routine to produce log-probability plots; Ralph I. Larsen for
suggesting the preparation of this paper; and Lea Prince for her valuable
assistance.

                                  Table I
         Number of stations with extreme estimates from models within
           a factor of two of extreme measurements, and worst error.

  Investigator      City        Averaging  Extreme     Within a      Worst error,
                                Time       Percentile  factor of 2   a factor of:
  Fortak            Bremen      30-min.    99.5        4 of 4        1.7
  Koch and Thayer   Chicago     1-hour     90          7 of 8        2.8
  Koch and Thayer   St. Louis   2-hour     90          9 of 10       2.6
  Turner            St. Louis   2-hour     99.9        10 of 10      1.9
  Turner            St. Louis   24-hour    99          35 of 40      2.4
              Figure 10-1. Best agreement at the 99th percentile (frequency distribution, St. Louis 24-hour SO2, Dec. 64-Feb. 65, Station 4).
 10-8

-------
   Figure 10-2. Highest overestimate at the 99th percentile (frequency distribution, St. Louis 24-hour SO2, Dec. 64-Feb. 65, Station 23).

   Figure 10-3. Greatest underestimate at the 99th percentile (frequency distribution, St. Louis 24-hour SO2, Dec. 64-Feb. 65, Station 27).
-------
Figure 10-4. Best agreement over the whole cumulative frequency distribution (St. Louis 24-hour SO2, Dec. 64-Feb. 65, Station 8).

                       Figure 10-5. Greatest underestimate (St. Louis 24-hour SO2, Dec. 64-Feb. 65, Station 38).
10-10

-------
"*4
0»-

o: *°"
t—
x_
0
i*
w E
E"

o
a;
0 jg.
E
-•b
oj S


0 „

o
tt
g-b
M *.
«H
O uj.
0


t>
w





















• N





















• N





















f "


















'


r **















M 8 <





» 0















XX)
x si





« M















f




.














g
r


















f
-^~
Wf
I

















-ft
r
ji
r


















^
ffM




















/*











!_U








V*
^
j



















^*»°°
^*
--^X


















^
,e*

xx"


















1 0 *

X X
























































Li





















fM





















                                                        M.O M.O M.B M.t ft.t
FREQUENCY  DISTRIBUTION ST. LOUIS 24-HOUR 302.  DEC64-FEB65   STRTION   28

        Figure 10-6. Example of two different slopes.
    0.1 i.v f.o   i.o  i«.«   tg^ M.a ««.• M.B M • TO   NO   00.0   ooo   004 00:0 oo^o oo o o
FREOUENCr DISTRIBUTION ST. LOUIS 24-HOUR S02. DEC64-FEB6S  STflTION    16
        Figure 10-7.  Example of two different slopes.
                                                                    10-11

-------
 Figure 10-8. Example of sudden transition to higher measured concentrations (St. Louis 24-hour SO2, Dec. 64-Feb. 65, Station 2).

  Figure 10-9. Example of sudden transition to higher measured concentrations (St. Louis 24-hour SO2, Dec. 64-Feb. 65, Station 6).
10-12

-------

a: "'
h~
J_
0
OCtt)
X *>-
£


t>









































































O

















o
o
X
















Oft
0 u
x-^
















bi^
o
xxxxx

















^
w"**

















-^.0
T8"


















^
P5


















^



















s











0 iO






_.w
ss











C 1C







oo^""











0 1





m»*
N * * x












a i





e e
x x












,0 1




a














..0 0



















fl 0



















ft 0



















        O'.l  01  10  I 0    SO  10 0   200 300*00 MO 000 10.0  M 0   BO .0  K.O   M 0 M.O M.i  N •
       FREQUENCY  DISTRIBUTION ST. LOUIS 21-HOUR  S02.  DEC64-FEB65   STflTION   19

Figure 10-10. Example of sudden  transition (both portions have same slope).
•b

h-
uJ
0 tT

tt
CD


(O f~-

Q

O
CE
* &

CJ


•b















.



















xx'
ea<


















w X

e
e

















, x

b *
















	 r


_,
e *
i
















:ss£

^
«e
















vX'
. « *

• e
w

















x«»^
n^



















x-^
jit*



















K-^
J*








































^
•**



















xx^x'
e«°*











0 10






n'""
keo0"












e »j





wX
ji®













0 t






»"













0 1




F-M—
1 w















0 0,




i_*L















• a




















i »,•




















         O.t  C.I  1.0  1.9   k.O  10.0   MC  X.O  400 U.O 00.0 700 OC 0   td.0  %.0   0*0 004 010  <• • 00 0
        hSfcOUENCf DISTRIBUTION ST. LOUIS  2-HOUR S02. DEC64-FEPGb   STflTION   17
             Figure 10-11. Best agreement at the 99.9 percentile.
                                                                           10-13

-------
            Figure 10-12. Highest overestimate at the 99.9 percentile (St. Louis 2-hour SO2, Dec. 64-Feb. 65, Station 36).

            Figure 10-13. Greatest underestimate at the 99.9 percentile (St. Louis 2-hour SO2, Dec. 64-Feb. 65, Station 10).
10-14

-------

Figure 10-14. Best agreement over whole cumulative frequency distribution (St. Louis 2-hour SO2, Dec. 64-Feb. 65, Station 4).

          Figure 10-15. Poor agreement, primarily overestimation (St. Louis 2-hour SO2, Dec. 64-Feb. 65, Station 23).
                                                                       10-15

-------

        Figure 10-16. Poor agreement, underestimates and overestimates (St. Louis 2-hour SO2, Dec. 64-Feb. 65, Station 28).

 Figure 10-17. Example of sudden transition to higher measured concentrations (St. Louis 2-hour SO2, Dec. 64-Feb. 65, Station 33).
10-16

-------
References

Fortak, G., 1970: Numerical simulation of temporal and spatial distributions of
     urban  air  pollution  concentration.  Proceedings  of Symposium  on
     Multiple-Source  Urban  Diffusion Models.  EPA, Air  Pollution Control
     Office Publication No. AP-86.
Koch, R. C., and Thayer, S.  D., 1972: Validation and sensitivity analysis of the
     Gaussian  plume multiple-source  urban diffusion model.  Final  Report
     prepared under Contract Number CPA 70-94, Geomet, Inc. EPA Office of
     Air Programs Publication No. APTD-0935.
Larsen, R. I., 1971: A mathematical model for relating air quality measurements
     to air quality standards. Environmental Protection Agency, Office of Air
     Programs Publication No. AP-89.
Pasquill,  F., 1971: Atmospheric dispersion of pollution. Quarterly J.  Royal
     Meteorol. Soc. 97: 369-395.
Turner, D. B., Zimmerman, J. R. and Busse, A. D., 1972: An evaluation of some
     climatological  dispersion models. Presented at 3rd meeting of Panel on
     Modeling  of  NATO  Committee on the Challenges  of Modern Society,
     Paris, France.
DISCUSSION

J. Visalli: It has been suggested in some papers that were presented earlier that,
particularly for SO2, the body is susceptible to even shorter-term fluctuations
than  2 hours;  I'm  talking about fluctuations more  on the order of 2 to 4
minutes. I was wondering how good you feel your model would be in predicting
variations for this short of a time interval.
Turner:  I   would  like  to  have  good representative 2-  to 4-minute
meteorological  information  and good  emission information that includes  the
variability of all sources with  the time interval of 2  to 4 minutes,  in order to
attempt such short-term concentration estimates.
Singer: One comment touched on a bit before by Don Pack and also questioned
by Frank Gifford, and  a slight comment  which you made at the end of your
paper, pleased me. Everyone has been dealing with numbers and just using them
blindly without making any  comment about  the accuracy of  the data  and
bringing in the statistics. The source term, Q, many times is out by an  order of
magnitude when you actually look  at the  data itself. The meteorology may be
out by a factor of 2. When you start verifying it and looking at  the SO2 data,
which can also easily be out. I  would like to see  someone say what  is the
accuracy of the data and try to bring  that into the statistics, the meteorology
                                                                  10-17

-------
and also the final  verification.  You said you were out by 20, this could easily
just simply  be an instrument  error.  But I  would like to see this  aspect  of
statistics.  I mean,  Don asked that question this morning—can you bring in the
error into the analysis or can you verify it. I  have always heard the answer; yes
(it can  be done) but I have never seen anyone do it.
Turner: With regard  to the accuracy of the data  Irv,  I have to give the people
credit  who did the sampling in  St. Louis for  their care to obtain good data. For
24-hour samples they did have replicate bubblers side by side. Air was drawn
through them by the same pump but with two different critical  orifices, one for
each of the samplers. These duplicate samples were compared in the laboratory.
I  forget the exact numbers, but on the order of 3 to 5 percent of the  total
number of  samples  deviated by more than 10  percent. Most of these  were
thrown out for the reason that they did not duplicate within 10%.
Singer: I knew your data was  good,  but it was just  a general warning to the
statisticians who take the number that we provide and blindly use it. We know
better  in that respect.
J. Rustagi: I also notice this kind of behavior is lognormal. Actually, there are
outlier-prone distributions. I don't know whether the lognormal is one of them, but
Professor Neyman has given a detailed analysis of outlier proneness of
distributions in a symposium which was conducted at Columbus in 1971.* The
gamma distribution is one of them. This is one approach which could be taken.
Secondly, as has been mentioned before, that if the concentration is  too low or
too high we have different kinds of errors of measurement. Suppose that you use
the same model as lognormal and you have the variance dependent on the mean.
In my data it was noticed that at low levels the errors were proportional to that
of the mean. So if you put in the model the variance as a certain function of the
mean,  the estimate of mean and so on can be correspondingly calculated.
Helmut Lieth: I can verify your statement for the low values from our analysis
of the national  air pollution  network data, but what do  you  do with the
variability in the high levels?
Rustagi: As you  said if  there are different kinds of behavior at lower  end and
upper  end one could put the variance as a quadratic function of the mean, cubic
function  of the   mean,  some  other function of  the mean,  or some other
complicated function. What I mean is that the variance should  reflect the error
in measurement as noticed by instrumentalists.
Lieth: Yes, but there is a problem. There is a logical difference in the production
of the high values and subsequently their variability, and the low values. It  is a
factual problem  as  well that  you have more of  some kind of pollutant at a
certain weather condition. So this is not plainly a  statistical deviance. How can
you get this logical difference out with a straightforward statistical method?
D. McNeil:  We attempted to look at that problem with some data in New Jersey
and the point of  view that we  adopted was to try and find a transformation of
the  pollutant concentration which would make the variance constant. In fact,
that was how we arrived at the fourth root transformation. In doing so we rather
*J. S. Rustagi (Editor), Symposium on Optimizing Methods in Statistics, Academic Press,
 New York, N. Y., 1971.

10-18

-------
fortuitously found it made the mean value of the increment in the level linear as
the function of time. That would be one way of solving that problem. You can
do that .  . .  you  might find you need a different transformation depending on
the weather conditions, but we did it just by lumping all the values for one year
together.
Don Pack: I  think we could belabor this to death, but I  would like to point out
one thing that when one is dealing in trying to identify extreme values you want
the longest  possible period of record. I did  a little arithmetic back there.  New
York is generating around three million two  hundred thousand estimates of an
individual pollutant for about ten  pollutants. Alright, you've got thirty million
values,  very attractive  to people  with  large computers.  However,  the error
function in  a real measurement system is not stationary.  It trends with time.
Initially, and I point specifically to Mr. Turner's  data—research data carefully
controlled with operators who were dedicated to producing the best possible
information. On the other hand, the kind of data  that is becoming available to us
in the many  urban  areas of order 10 to 20 cities are not of that kind at all. The
technicians may be very devoted initially and the equipment will be new. But
with time everything deteriorates  so that we would have error functions such
that as the length of your record increases, errors also increase. The only point
that I am  trying to make is that deductions on the kinds of distributions can be
markedly  affected by the character of data.
L. Crow: In studying extreme values and measurements of particulates in a
natural background in Wyoming, some important meteorological influences have
been noted.  Natural dust produces the very high extremes due to high winds, but
the high winds are not neatly distributed throughout this year or any other year.
The extremely low  values are affected by precipitation. Wind blown dust can be
locked-in during winter by heavy snow cover. Is there a way that we as
meteorologists and statisticians can treat these extreme ends using real data
instead of some arbitrary formula? Can  we bring about an  adjustment for the
extremes  that  do  occur if we add the  meteorological parameters to actual
instances of  extreme data?
Rustagi: There would be a way of mixing distributions if we know enough about
the distribution at the  other  extremes. There are procedures available  for
estimating parameters with mixtures and  you can put the distributions in two
different tails with corresponding probabilities—that would be one possible way.
Lieth:  I think we have listed, in the paper that we handed out here a little while
ago, the  new program  NONREG, which   probably  solves  that  problem
mathematically for  you. NONREG is in a package available here on the UNC
campus.
Court: I can't help being impressed by the great similarity of the problems being
discussed today and those with which we have been dealing for many years in
the field of hydro-meteorology: such problems as the inaccuracies of rain gages,
the non-normality of rainfall, the various procedures such as cube root and
fourth root to obtain homoscedasticity for regressions, and many other similar
relations.
                                                                   10-19

-------
Neustadter:  I would desire to talk with anyone who might be able to help me
with a problem that we have almost at hand in our program.  I mentioned very
passingly at  the beginning of my presentation that we have hundreds of samples
which  we're subjecting to analysis. These are samples collected on high-vols on
high quality analysis paper, and we are doing a lot of analysis. Essentially what
we are going to end up with is a set of hundreds of items, each characterized by
tens of parameters. We have been looking for techniques and so far we don't
know that much about it, but pattern recognition seems to sound like the best
thing. The only thing we are aware of is one article from Livermore that seems
to  indicate that pattern recognition  is now coming into the field of chemistry
and handling multiple parameter chemical reactions and phase changes.
10-20

-------
      11. FOURIER ANALYSIS OF AIR MONITORING DATA
                      BERNARD E. SALTZMAN

                Department of Environmental Health
                        University of Cincinnati
                           Cincinnati, Ohio
    The proliferation of pollutant monitoring activities is now providing massive
amounts  of data.  Effective utilization  of  this information requires  proper
analysis. This may  be as costly as the collection of the data. A major problem
has been to obtain  the "signal" from the data in the presence of overwhelming
amounts of "noise" produced by  environmental fluctuations. Computers have
been  utilized to provide  information on the  statistical distributions  of the
numbers. Tabulations also have been presented of data averaged by time of day
and/or by  season (NAPCA (1969)). The purpose of this paper is to explore the
application of another technique,  Fourier analysis of data, which  offers the
promise of extracting significant new types of information.
    In Figure 1, a plot is given of monitoring data for particulate  lead in air
(Cholak, et al. (1968)). This was obtained by continuously sampling outside air
from  the second floor window at the Kettering Laboratory, Cincinnati, Ohio,
through 4-inch membrane filters. The  filters were  changed  on  Mondays,
Wednesdays and Fridays; thus the  lead analyses represented values averaged for
2- or 3-day periods. Application of the usual statistical calculations provided the
following information: mean 1.07 µg/m³, standard deviation 0.55 µg/m³,
geometric mean 0.95 µg/m³, standard geometric deviation 1.60. A plot of
cumulative and  differential  frequency distributions is  given in Figure 2, which
shows a  tendency   to  a lognormal  distribution.  What  other  significant
information can be  extracted from this data?
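    For reference, the kind of summary just quoted is easily reproduced from a series of averaged filter values; a sketch follows, in which the array is a placeholder and not the Kettering Laboratory record.

```python
# Arithmetic and geometric summary statistics for a series of particulate lead
# concentrations (ug/m3). Values are placeholders, not the Cincinnati data.
import numpy as np

lead = np.array([0.9, 1.3, 0.7, 1.1, 1.6, 0.8, 1.2, 0.6, 1.0, 1.4])

mean = lead.mean()
std = lead.std(ddof=1)
geo_mean = np.exp(np.log(lead).mean())
sgd = np.exp(np.log(lead).std(ddof=1))          # standard geometric deviation
print(f"mean = {mean:.2f}, std dev = {std:.2f}, "
      f"geometric mean = {geo_mean:.2f}, SGD = {sgd:.2f}")
```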
    Examination of Figure 1  indicates  irregular fluctuations with time.  These
can be regarded as  analogous to colored light, comprised of the sum  of a mean
value  and of a series of fluctuations of differing periods, amplitudes and phases.
In the case of a mixture of colors of visible light, resolution into a spectrum can
be obtained by the means of a spectroscope. In the case of sound or radio wave
mixtures,  tuned circuits  can be utilized  to obtain the  spectra. Recent
developments in computer science now make practical Fourier analysis of data,
                                  11-1

-------
which  is  the  equivalent of  a  spectroscope in  providing  the spectra  of
fluctuations. A good  explanation of  this technique  for  chemists  has been
presented by Horlick  (1972). In order to explore these possibilities, computer
programs were prepared utilizing a Wang computer and  plotter, which was
available and convenient for  program development. Table I  provides a summary
of the programs that were developed.
 Explanation of Program

     Data for this program  should consist of a series of values at uniform time
 intervals. The  time units are usually hours or days. Provisions are made for
 missing data. The fluctuations are resolved as the sum of a series of sine and
 cosine waves of different amplitudes and periods. Thirty-six periods are used
 covering 7 octaves (doublings) from 3 times to 384 times the time interval of the
 data; each octave is divided into 5 equal, logarithmically-spaced steps.
 Data Processing

     For each data point, the time from the middle of its interval to a selected
 initial  reference time and date (e.g., midnight on a Sunday) is calculated. This
 time is divided by the first period (3 time units), and converted to a time
 phase angle; the data value is multiplied by the sine of the angle and stored in
 one register, and by the cosine and stored in a second register. This calculation is
 repeated for successively longer periods (up to 384 time units), and the data
 stored in 70 other registers. About 10,000 computer steps are required for each
 data point. Each of the 72 data registers accumulates sine or cosine products for
 its assigned period. Mathematically, the calculations are as follows:
     For each value of data Xj taken  at time tj, a series of 36 calculations is made
 by assigning to an  index, i, consecutive integral values from 0 to 35:

                       Period:        p_i = 3 × 2^(i/5)

                       Sine term:     S_ij = X_j sin(360 t_j / p_i)

                       Cosine term:   C_ij = X_j cos(360 t_j / p_i)

 The 36 pairs of registers are each assigned to a specific period, p_i. They
 accumulate the corresponding sums ΣS_ij and ΣC_ij over all data points.
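     The register arithmetic just described can be summarized in a short
 sketch. The fragment below is for illustration only: it is written in Python
 rather than in the Wang programming used for the actual work, the function and
 variable names are illustrative, and missing data are assumed simply to be
 omitted from the input.

     import math

     def accumulate_fourier_sums(samples):
         # samples: list of (t_j, x_j) pairs, where t_j is the time from the
         # middle of the sampling interval to the reference time (in the chosen
         # time unit) and x_j is the measured value.
         periods = [3.0 * 2.0 ** (i / 5.0) for i in range(36)]   # 3 to 384 time units
         sine_sums = [0.0] * 36
         cosine_sums = [0.0] * 36
         for t_j, x_j in samples:
             for i, p_i in enumerate(periods):
                 angle = math.radians(360.0 * t_j / p_i)         # time phase angle
                 sine_sums[i] += x_j * math.sin(angle)           # accumulates sum of S_ij
                 cosine_sums[i] += x_j * math.cos(angle)         # accumulates sum of C_ij
         return periods, sine_sums, cosine_sums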
11-2

-------
 Data Printout

     In the data printout, the final time, t, from the reference time to the end of
 the  last data value is given. The number of time units of data, d, is tabulated.
 The mean value is calculated as follows:

                              X̄ = ΣX_j / d

     Rather than presenting the sine and  cosine results separately, a  clearer
 picture is obtained  by combining them into a vector sum and a phase angle. The
 latter is combined with  the period  to calculate the first peak time after  the
 selected reference time.
     For each period tabulated, the results are calculated as follows:
 amplitude,
                        A_i = (2/d) √[ (ΣS_ij)² + (ΣC_ij)² ]

 The peak time, t_i (past the reference time), is calculated as follows:

 peak degrees,
                        θ_i = arc tan (ΣS_ij / ΣC_ij)

 peak time,
                        t_i = (θ_i / 360) p_i

     If the fluctuations are in phase with the cosine wave (peak at reference
time), the resulting angle and peak time are zero.
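     The printout stage can be sketched in the same illustrative way (again in
Python, with names chosen here; the 2/d scaling of the amplitude follows the
conventional Fourier correlation formula and is an interpretation rather than a
quotation of the original program).

     import math

     def fourier_spectrum(periods, sine_sums, cosine_sums, d):
         # d is the number of time units of data; the sums are those returned
         # by accumulate_fourier_sums above.
         results = []
         for p_i, s, c in zip(periods, sine_sums, cosine_sums):
             amplitude = (2.0 / d) * math.hypot(s, c)             # vector sum of the two registers
             peak_degrees = math.degrees(math.atan2(s, c)) % 360.0
             peak_time = peak_degrees / 360.0 * p_i               # first peak after the reference time
             results.append((p_i, amplitude, peak_degrees, peak_time))
         return results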

Data Plotting

     In both types of data plots, the horizontal scale is a logarithmic scale of
periods. The initial point represents 3 time units, and each inch represents 1
octave (doubling of period). In the amplitude plot, the vertical scale above the
origin is a linear scale, on which  5 inches is equal  to  the amplitude range
selected. The plotted points present the spectrum of fluctuation intensities. In
the  peak  time  plot, the vertical scale below the origin is a linear time scale
beginning at the time selected. Each 0.1 inch represents 6 time units. The scale
can be marked off in appropriate divisions, e.g., days of the week or months of
the  year.  The plotted points  represent the first  peak time and  6  consecutive
                                                                     11-3

-------
subsequent ones. For periods exceeding 64 time units, fewer peaks are plotted
because the maximum ordinate is 384 time units, or 6.4 inches downward.
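     For illustration, the plotting scales just described can be written as a
small coordinate mapping (Python, with names chosen here; the original plots
were drawn by the Wang Model 702 plotter, and the optional shift of the vertical
time scale is not included).

     import math

     def plot_coordinates(period, first_peak_time, max_time=384.0):
         # Horizontal scale: 1 inch per octave, starting at a period of 3 time units.
         # Vertical scale: 0.1 inch per 6 time units, i.e. 1 inch per 60 time units.
         x_inches = math.log2(period / 3.0)
         peak_inches = []
         t = first_peak_time
         while t <= max_time and len(peak_inches) < 7:    # first peak and 6 subsequent ones
             peak_inches.append(t / 60.0)
             t += period
         return x_inches, peak_inches

For the 96-day period discussed below, with its first peak 51 days after the
reference time, this gives x = 5 inches and peak marks at 0.85, 2.45, 4.05, and
5.65 inches down the time scale.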
Results

    Table II illustrates the data printout of the computer program. This program
required that  the data  values be entered as integers.  The values, which  were
expressed in  micrograms per cubic  meter  to  the nearest hundredth,  were
therefore multiplied by 100 before entry. The first column gives the selected
values  of  period,  which   are  the same for  all data  processing.  Each  is
approximately  1.149  times  the previous one. Each fifth line represents exactly
1  octave, or a  doubling  of the period.  It will  be noted  that  due to slight
inaccuracies in  the computer,  the  values  for  48, 192, and 384 are listed
respectively  as 47.999, 191.999, and 383.999. Amplitude values are given in the
second  column.  It can be  seen that there are  several peak values. The  third
column presents the peak times as phase angles.  Since  these are not convenient
to visualize, the fourth column  presents the times for the  first  peak past the
selected reference time. In this case, the units are days after January 1, 1967.
    A clearer visualization of  these  results can be seen  from the plot of the data
in the first  and second columns of  Table  II, shown in Figure 3. Surprisingly,
there  is a dip  in the  amplitude  line at a  period  of 7  days, although there are
peaks at 3 1/2, 6, and 8 days. There are also successive amplitude peaks at the
following multiples of 8 days: 2, 4, 6,  8, 12. Since the Fourier calculations are
not accurate unless the  data cover a time interval of at least  4 periods, the
plotted values for periods exceeding 100 days cannot be considered as accurate.
These data represent 364 days of measurement.
    The computer output also includes phase information. Figure 4 is a plot of
values in the first and fourth columns of Table II. The vertical scale downward
represents a linear peak time scale, which is marked off in months of the  year.
The horizontal scale  is a logarithmic representation of cycle periods, identical
with that of the horizontal scale in  Figure 3. To understand the significance of
this plot, one may visualize  a straight line vertically downward  for the period 96
days.  It can be seen from column 4, Table II that the  first peak time occurs 51
days after January  1st (or February 21st). This is indicated in  Figure 4 by  a dot
towards the bottom of the box representing February.  There are successive dots
vertically  downward  for each  96 days  thereafter.  Thus  the  dots and the
connecting lines represent the times during the year when each cycle maximum
occurs. If Figure 4  is  viewed vertically  below Figure 3, peak times are shown  in
correspondence  with each  amplitude  value  plotted  in  Figure 3.  To avoid
crowding on the left side of the figure, for each period no  more than 7 peaks
are plotted. In this plot the vertical time scale begins  at January 1, 1967. The
computer program  also permits starting this vertical scale at  any desired time
after the reference time. This is the equivalent of shifting the plotted lines and
11-4

-------
the time scale vertically  upward and viewing any selected lower portion. The
lower end can be understood to extend to infinite time.
    In the preceding  discussion  it was indicated that data for many cycles of
period were required for accurate  results.  Figure 5 presents Fourier spectral
amplitude data for the  3-month period of October-December,  1968 for total
hydrocarbons in Cincinnati. The data for this and the two following figures were
hourly-averaged  values  reported  (NAPCA  (1969))  by the Continuous  Air
Monitoring Program  of  the  National  Air  Pollution  Control Administration.
Surprisingly, again there is no peak at 7 days. Major amplitude peaks can be seen
at 12 hrs., 18 hrs., and 1, 3 1/2, 6, 8, and 12 days. Similar plots were made for
the hydrocarbon  data for  each  individual month  of  October,  November and
December. The patterns  of amplitude peaks showed a similarity  although their
proportions were altered for the different months. The combined  data for  the
3 months eliminated  some  of the erroneous high peaks  that were obtained  for
periods exceeding 1 week. As the amount of data increases, the sharpness of
the "tuning" of the calculations for each period increases, and some of the
peaks are reduced.
    Figure  6   shows the   Fourier  amplitude spectrum  for  sulfur  dioxide
concentrations in Cincinnati,  for the month of October,  1968. Amplitude peaks
are evident at periods of 12 and 18 hours, and 1, 4 1/2, 6, and 8 days. If this
figure is compared with  the hydrocarbon results in Figure 5, it can be seen that
there is a remarkable similarity, even though these pollutants come from entirely
different sources. This suggests that the atmospheric dispersion processes, which
are similar for  both pollutants, exert the major controlling role in  determining
the  atmospheric  levels  of  these  pollutants.  Figure  3  also shows  peaks at
corresponding  periods. All of these figures show a dip at 7 days  and peaks at 6
and 8 days. They all show evidences of peaks at 3 1/2 days.
    Figure 7 shows the phase results for the sulfur dioxide data. The downward
time scale in this case is from  0 to 16 days.  The days of the week are indicated
on it. The computer program can view any portion of these results, which can be
assumed to extend downwards to infinity.


Discussion

     The significance  of the  Fourier spectra presented  will become clearer after
more types of data from  more locations are analyzed. Interesting possibilities are
opened up by this technique.  Common factors operating to determine pollutant
levels should become  evident by amplitude spectra peaks in alignment. Differing
periods indicate differing sources of variation. The Fourier analysis technique
also offers a   means of correcting  the data for  the  incomplete  response
characteristics of the sampling methods or of the instrumentation. It has been
shown (Saltzman (1970); Schnelle and Neeley (1972); Horlick (1972)) that the
resultant data  include distortions because of failure to respond to rapid changes.
                                                                    11-5

-------
If the transient and frequency responses are known, they can be incorporated in
the Fourier computer program. This may permit recalculation of the data to
correct for the distortions and  more closely approximate the actual levels in the
atmosphere.
    The  calculations described above were carried  out on a small computing
system  which  was  readily accessible  and convenient to  rapidly develop  a
program.  The  Wang  system requires  approximately  1  millisecond  for  each
step. Approximately  10,000 steps were required for the calculations on  each
data point. Thus the calculations for hourly data for a 1-month period (720
data points) required  100 minutes of computer  time. A program is being
developed for an IBM S/360/65 computer. Preliminary  results are in agreement
with those already presented. The IBM  computer, of course, has a much greater
capacity, and can calculate for  more intervals of period, allowing finer detail or
greater range.  Calculating time  was found to be 500 times as fast  as that of the
Wang system.  Future work should show whether  results in  other cities are
parallel to those in Cincinnati.
Acknowledgement

    This work was supported in part by the Center for the Study of the Human
Environment, under U. S. Public Health Service Grant ES00159, and in part by
the Environmental Protection Agency Grant R800869.
                                   Table I
                       Summary of Fourier Programs for
                         Wang Model 700B Computer
                    With Model 702 Plotting Output Writer
  Name of Program              No. Blocks   Total No.   Functions
                                            of Steps

  3-Digit Data Recording                      718        Records, edits, and retrieves 3-digit
                                                         numbers on magnetic tape cassette; 48
                                                         numbers in each block, 100 blocks on
                                                         each side of cassette.

  Fourier Analysis of Data                    911        Retrieves 3-digit numbers from tape
                                                         cassette, performs Fourier calculations,
                                                         tabulates and plots results.

  Recording and Retrieval          1          352        Records on magnetic tape cassette and
    of Fourier Data                                      retrieves contents of 74 Fourier
                                                         calculation registers before they are
                                                         altered by data printout and plotting.
                                                         Permits adding and subtracting of
                                                         blocks of data.
 11-6

-------
                                  Table II
                         Data Printout of Program
                 Cincinnati, Lead in Air (µg/m3 × 100)
                    2nd Floor, Kettering Laboratory
                         1/13/67 to 12/31/67
              Reference Time: Sunday, January 1, 1967

        Mean 106.812,   Data Time 352.00,   Final Time 364.00

        Period      Amplitude     Peak, Degrees     Peak Time
         3.000        1.666           65.082            .542
         3.446        2.550          181.068           1.733
         3.958        2.000           58.181            .639
         4.547        3.789          282.193           3.564
         5.223        5.270          141.800           2.057
         6.000        7.042           90.440           1.507
         6.892        3.955          313.694           6.005
         7.917       10.084          263.186           5.787
         9.094        4.754           91.834           2.319
        10.446        5.065          239.137           6.939
        12.000        8.175          242.803           8.093
        13.784        8.230          152.877           5.853
        15.834       12.176          132.093           5.809
        18.188        2.727          341.883          17.273
        20.893        8.819          265.902          15.432
        24.000        7.131          330.826          22.055
        27.568        9.846           68.748           5.264
        31.668       15.490           45.089           3.966
        36.377        3.478           56.746           5.734
        41.786        9.145           58.009           6.733
        47.999       15.615          156.919          20.922
        55.137       10.204           84.015          12.867
        63.336       23.046          170.716          30.034
        72.754        6.397          299.028          60.432
        83.572        9.391           98.225          22.802
        96.000       12.771          190.476          50.793
       110.275       10.484          121.677          37.272
       126.672       18.866          156.635          55.115
       145.508       28.376          103.599          41.874
       167.145       15.357           67.377          31.282
       191.999       16.456          120.906          64.483
       220.550       33.781           92.264          56.524
       253.345       38.536           48.657          34.241
       291.017       32.181          355.468         287.354
       334.291       28.957          281.102         261.027
       383.999       44.370          216.058         230.462
                                                      11-7

-------
    [Plot: lead concentration in air (µg/m3, scale 0.5 to 2.5) versus time, with
     the months January through August labeled on the time axis.]
Figure  11-1.  Concentrations of  lead  in air sampled  from  the second  floor
     window at the Kettering Laboratory, Cincinnati, for the period January 1
     to December 31, 1967.

    [Plot: percent of time equal to or greater, versus lead concentration
     (µg/m3, 0.5 to 2.5); differential and cumulative distribution curves.]
Figure  11-2.  Differential  and  cumulative  frequency  distributions  of the
      concentrations of lead in air shown in Figure 1.
11-8

-------
    [Plot: Fourier amplitude (µg/m3, 0 to 0.5) versus period (3 to 400 days,
     logarithmic scale).]

Figure 11-3. Fourier amplitude spectrum of the concentrations of lead in  air
     shown in Figure 1.

    [Plot: peak time versus period (3 to 400 days, logarithmic scale).]
Figure 11-4. Fourier peak time data for concentrations of lead in air shown in
      Figure 1.
                                                                     11-9

-------
    [Plot: Fourier amplitude (0 to 0.8) versus period (6 hours to 16 days,
     logarithmic scale).]
Figure  11-5.  Fourier  amplitude  spectrum  of concentrations of  total
      hydrocarbons in Cincinnati for Oct.-Dec., 1968 (as reported by CAMP).
    [Plot: Fourier amplitude versus period (hours to days, logarithmic scale).]
Figure 11-6. Fourier amplitude spectrum for concentrations of sulfur dioxide in
     air in Cincinnati for October 1968 (as reported by CAMP).
11-10

-------
    [Plot: peak time, marked in days of the week, versus period (6 hours to 16
     days, logarithmic scale).]
Figure 11-7.  Fourier peak  time data for  concentrations of sulfur dioxide  in
Cincinnati for October,  1968.


References

Cholak, J., Schafer, L. J., and Yeager, D., 1968: The air transport of lead
      compounds present in automobile exhaust gases. Amer. Industrial Hygiene
      Assoc. J. 29: 562-8.
Horlick, G., 1971: Fourier transform approaches to spectroscopy. Anal. Chem.
      43: 61A-66A, July.
Horlick, G., 1972: Digital data handling of spectra utilizing Fourier trans-
      formations. Anal. Chem. 44: 943-7.
NAPCA, 1969: 1968 Data  Tabulations and Summaries, Cincinnati, Continuous
      Air Monitoring Projects. National Air Pollution Control Administration,
      Raleigh, N. C., Publication No. APTD 69-16.
Saltzman, B. E., 1970: Significance of sampling time in air monitoring. J. Air
      Pollution Control Association. 20: 660-5.
Schnelle, K. B., Jr., and Neeley, R. D., 1972:  Transient and frequency response
      of air monitors. J. Air Pollution Control Association.  22: 551-5.
                                                                   11-11

-------
DISCUSSION
Marcus: I  think this approach of time series analysis of pollutant concentration
data is absolutely essential. One thing about focusing on periodicities, is that it is
another way  of  looking  at  or trying  to  look at some  of  the fundamental
mechanisms that produce these diurnal  patterns. Have you tried a logarithmic
transformation of the concentrations?  To  use the concentrations as they  are
gives you a Fourier analysis of the  . . .
Saltzman: The virtue of using a linear transformation is that it simplifies adding
all the components.  One can then add  all  the amplitudes and reconstruct  the
original data.  I  don't  see any  advantage to converting to logarithms. It just
complicates everything. You can get an exact representation with  the linear data
input.
Marcus: Have you done any  statistical  analysis to test the significance or  the
reality of the existence of these peaks?
Saltzman: No. What you  see are preliminary results. As a matter of  fact I want
to mention how the calculations were made. You may laugh,  but this was done
on a Wang computer with a plotter output. To run one month's data required 7
million steps. It took 100 minutes  on the Wang to execute, but it was convenient
for me. We are now putting it on  the IBM/65 which is about  500 times as fast.
We hope  to  get  this program  going  by about March. What  you see now  are
preliminary results. I can say that we are getting spectra and that they do persist
for data time periods as long as 3 months.
Marcus: One advantage perhaps of going  into a transformation  of the data would
be  to reduce  the  concentration observations  to a somewhat more nearly
Gaussian-distributed form and then you could .. .
Saltzman: This procedure has nothing to do with statistics. This is an analytical
representation. I am  not talking about probabilities here.  This  is an  exact
representation. If you use linear terms, you can add and subtract everything.
Marcus: We have an exact representation of intrinsically noisy data and perhaps
just transforming that, and trying to again  extract a  signal,  we can get rid of
some of the uncertainties that are  built into the observations at the beginning.
Benarie:  I  am very  impressed  with this spectral representation of air pollution
data.  It is a great idea.  Some amplitudes  can be caused by meteorological factors
or by human  activity. For  instance, the  24-hour peak  amplitude is almost
certainly produced by the early  morning inversions. It appears  in the Fourier
spectrum of every pollutant. The human activity has a weekly  cycle, so it can be
easily recognized in  the amplitudes,  even if the data are not  for a long period.
Secondly, such analysis gives us immediately the  answer, following  Shannon's
communication  theory on what  should be the  sampling  frequency of  the
apparatus.  It should be 2N if the highest frequency is N. So we don't have to ask
any more if the best sampling time is 5  minutes or 30 minutes or 24 hours. We
get the answer out of these spectra.
11-12

-------
    For the lead, you took 2- and 3-day data. As the concentrations are related
to sampling time, an  artificial sampling component which is not real has been
introduced.  All sampling times should be either uniform or random, but not 2
and 3.
Saltzman: In this particular case  each data item was weighted for the length of
its sampling period  for calculations. Now with  regard to the proper sampling
time, in my paper published in the October, 1970 issue of the Journal of the Air
Pollution Control Association, the viewpoint was that if we are interested in the
effects on the body, then we are only interested in the frequencies that the body
can see.  For contaminants with  a long biological half-life such as lead (which
could  be  several  years),  high   frequency  fluctuations are attenuated and
determining the high frequency components is a  waste of money. So we would
only sample to determine the significant frequencies remaining after attenuation
by the biological  window through which the body views the data  if this is the
purpose of sampling.
                                                                    11-13

-------

-------
     12. THE PREDICTION OF HIGH CONCENTRATIONS OF
    SULFUR DIOXIDE IN LONDON AND MANCHESTER AIR
              F. BARRY SMITH AND G. H. JEFFREY

           Meteorological Office, Bracknell, United Kingdom

 Introduction

    High concentrations of sulfur dioxide  (SO2)  in the atmosphere can cause
considerable upset to  people  with  bronchial troubles,  particularly  if the
concentrations are maintained over a period  of days. One of the most unpleasant
results of  the famous London smog of 1952 was the very high mortality rate
caused by bronchitis and other related ailments during  the subsequent week;
overall it was estimated the smog caused between  3,500  and 4,000 deaths (see
"Air Pollution and Health",  1970).  Hospital  places  were  also  in tremendous
demand by less seriously affected sufferers.
    Quoting from the same  source,  absenteeism  tends to rise  rapidly among
London factory and office workers whenever the daily average SO2
concentration exceeds 250 µg/m3 (not a particularly high value in London) and,
in Salford, absenteeism is twice the daily average amongst all workers
when the concentration reaches 1000 µg/m3 (a rather more exceptional level).
    Even the most cursory investigation of  weather conditions on days of high
S02 concentration reveals that cold and  relatively  calm  days in winter are
frequently the most  dangerous. The Meteorological Office in  London was
therefore  asked over 8  years ago to provide a  forecasting service of  those
meteorological  conditions which  were  likely  to lead  to significantly  high
concentrations and a subsequent demand on hospital beds. The criterion chosen
was for a concentration of at least 1000 µg/m3. During the 1952 smog the
maximum daily average SO2 concentration over 10 sampling stations was
approximately 2000 µg/m3; however the effects of the Clean Air Act are such
that this is about double the greatest SO2 concentration experienced from 1968
to  1970  inclusive when averaged  over  4  stations,  with  a  typical  mean
somewhat above the Inner London  average. In the original scheme developed  to
meet this demand, the meteorological conditions which were expected to lead to
critical concentrations were as follows:

     (a) an expectation of less than 2/8ths of cloud, or of sky obscured by fog, at
18Z, 00Z and 06Z.
                                12-1

-------
     (b) an expectation of a mean of surface wind speeds at 18Z, 00Z and 06Z of
less than 3 knots, the actual speed at each of these hours being less than 5 knots,

      (c) an instability index S = (2Tx − 3Tn − 12) > 0, where Tx is the highest
temperature expected at midnight at Crawley at any level up to and including
900 mb (but excluding the surface) and Tn is the forecast minimum temperature
at Heathrow (temperatures are in °C). If (a), (b) and (c) are satisfied, a forecast
of high pollution is issued, but if in (c) we only have −3 < S < 0 then a more
cautious forecast is made.
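     As a rough illustration only, these criteria can be collected into a short
decision sketch (written in Python; the variable names and input conventions are
illustrative and not part of the original scheme).

     def original_scheme_forecast(cloud_ok, mean_wind_3h, winds_3h, t_x, t_n):
         # cloud_ok: True when less than 2/8ths of cloud (or sky obscured by fog)
         #           is expected at 18Z, 00Z and 06Z
         # mean_wind_3h: mean of the surface wind speeds at those hours (knots)
         # winds_3h: the three individual speeds (knots)
         # t_x, t_n: temperatures (deg C) defining the instability index
         s_index = 2 * t_x - 3 * t_n - 12
         calm_enough = mean_wind_3h < 3 and all(w < 5 for w in winds_3h)
         if cloud_ok and calm_enough:
             if s_index > 0:
                 return "forecast of high pollution"
             if s_index > -3:
                 return "more cautious forecast"
         return "no forecast of high pollution"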

    The London Weather Centre, which is responsible for making the forecasts,
has felt that a  re-appraisal of the scheme is called  for, partially because the
scheme did  not appear to be highly successful and partially because the  Clean
Air Act has reduced the overall SO2 low-level emission rates.
    With  growing  concern  all  over  the  world  over the  state  of  urban
environments, many alternative forecasting schemes  have  been developed, and
several of these are reported in the literature. These generally fall into one of
three groups.
    (a) Numerical  models. Whenever source  distributions are reasonably well
known  both in time  and space, the  equations  of  diffusion  may be used to
calculate  spatial distributions of pollution, provided the wind and  turbulence
characteristics can  be adequately prescribed  and predicted. Such  calculations
require considerable computer facilities, and can  only be meaningful on a scale
that is large compared with the typical  distance separating those  sources which
are not individually represented but are merged into area sources.
    (b) Physical models. Detailed models  of  urban  areas  have been created in
large wind tunnels and the dispersion  of pollution  emitted in life-like manner
from one or  more  sources studied. The advantage of this  system  is that the
proposed addition  of a new  major  source  into  an urban  environment can be
studied fairly realistically,  even when  the local  topography is quite complex.
Perhaps their chief disadvantage lies in the difficulty in simulating the wide range
of meteorological parameters that affect dispersion:  low-level inversions, fogs,
solar radiation, wind direction and so on. Their use is therefore more in the  urban
planning field than in routine day-by-day predictive work.
    (c) Empirical models. The scheme outlined  above is one such model. The
physics of the whole dispersion process only enters in at a comparatively  low
level, but the scheme does have the advantage that it is based on real data taken
in real situations. Considering the very  considerable complexity of the problem
in an  urban environment, the empirical  approach  may  be the only  really
practical  one  on  a  day-by-day  basis  whenever a sufficient body of data is
available  for post-facto analysis (say at least 2 years of measurements of SO2
and the weather). Since such  measurements are readily available in London, our
revised  scheme described in  later sections is also of this type.
12-2

-------
The Measurements of Sulfur Dioxide in London

    Inner  (central)  London  as defined by  Weatherley and  Gooriah (1970)
comprises an area 30 km by 20 km encompassing Hendon in the NW, Dagenham
in the NE, Sidcup in the  SE  and  Wimbledon in  the SW. Within this area the
National Survey sampling network has nearly one hundred sites in  operation (the
exact number varies  between  90 and 100 from year to  year.) The area contains
industry, scattered mainly  around the River Thames and along the Lee Valley, as
well as housing and commerce regions with substantial  fuel consumption.  Parks
and comparatively low density housing areas (less than  5000 people per square
km) are also present, so that the source distribution and the actual concentration
distribution are far from simple (see  Figs.  1, 2 and 8). Inspection of the Figures
shows  that the  correspondence  between  source  and  concentration,  as
represented, is not particularly strong on a scale of 1  or 2 kilometers, but is
much  better on  a  scale of 5 to  10 kilometers. This  perhaps  indicates that
individual  sites  may often be significantly influenced by one  or two  fairly
dominant local sources, and only  when the concentrations are averaged  over,
say, four or more sites do they begin to have an obvious meaning in relation  to
broad  area source-values.  Figure  1  shows population  density  and  the  main
industrial areas and comes from Weatherley and Gooriah. Figure 2 shows values
of the mean SO2 winter-values derived from the ten yearly values for each  Inner
London station in which the smoothed overall trend over the period  is linearly
extrapolated one year to 1969-70. The mean for all stations is 231 µg/m3;
however the area-density of stations is not uniform and if isopleths of mean
concentration are drawn (ignoring all the possible pitfalls in doing this) the mean
concentration determined on an area basis is approximately 213 µg/m3. The
overall pattern appears to  change little from year-to-year but on  a shorter time
scale  significant changes from day-to-day  probably  occur due  to changes  in
source strength and wind direction. If Figure  2 is representative, concentrations
within Inner London vary from at least half, to  twice the area mean on any
occasion.   The  highest values are  in  Westminster,  where,  since  industrial
undertakings are few, road traffic and office-block central heating systems may
be the most significant polluters of  the urban environment.
    Figure 3 shows two concentration-direction roses, one for  Kensington (site
4),  the  other for Deptford (site 3).  The radius in any  direction  represents the
smoothed mean concentration, relative to the mean for  all conditions, when the
wind  is coming  from that direction.  An almost 3  to 1 variation  in  mean
concentration with wind direction is implied at both stations, and this appears  to
be fairly typical.
                                                                    12-3

-------
Meteorological Parameters for London

    The  analysis of S02 concentrations  at  a  rural site which preceded the
present London  analysis, revealed that day-to-day values depended significantly
upon the following parameters:
    (a) wind direction. Effective source strengths may vary appreciably with
direction, as illustrated in the last section.
    (b) temperature. Source  strength in the  UK tends to be greater at lower
temperatures. Temperature is also correlated  with other meteorological factors
that influence the dispersion of the SO2.
    (c) wind speed. Wind speed affects the stability of the atmosphere and
hence the vertical dispersion of SO2. For a specified emission rate of SO2 the
concentration  immediately  downwind  of the  source tends to be  inversely
proportional  to  the wind speed.  It is probable that  when  ventilation by the
exterior wind significantly affects offices and homes, the production of SO2
increases, following the increase in compensatory heating. Some of these trends
are clearly in opposing directions and, at the rural site investigated, were almost
self-canceling. In London itself wind speed appears to remain important,
particularly at light winds, when accumulation of SO2 within the same mass of
air leads to the highest concentrations recorded.
    (d) mixing depth or stability. Dispersion through the vertical of SO2
depends  on the intensity of vertical turbulence. Quite frequently a layer near the
ground which is well mixed  by turbulence is "capped" by a thermal inversion
which  inhibits  further  spreading  of  the pollutant  to  greater heights.  The
pollutant is thus  trapped,  and  concentrations  tend  to  a   value  inversely
proportional to  the height of the inversion. At places well away from the major
source of pollution, the mixing depth is one  of the most important parameters,
since the approach  to uniform mixing below the inversion has time to take place.
Within London itself where the typical distance  between  source and receptor is
much  less, the  mixing depth ceases to have this importance, except when it is
very small. (See  (d)  below.)
    The  post-facto  meteorological data have been obtained from Kew records in
Parts I and II, and from London Airport in  Part III of the forecasting scheme.
    After consideration and  experiment  it seemed  that the most  relevant
parameters could be defined as follows:
    (a) wind direction. 10 meter wind directions, using the tabulated mean over
the preceding hour, averaged over 12 hours centered at 15Z during the day when
the concentration sample is started.  (National Survey  1-day samples start in the
morning  at an assumed time between 09Z and 10Z and finish 24 hours later). If
12-4

-------
the wind direction varied by more than 60° during the period, the direction is
described as "Variable" and treated as a separate category. Further, if there are
at least 3 hours of calm (wind speed effectively zero) during the period, the
direction is described as "Calm" and treated as a further separate category.
     (b) temperature. In Parts I and  II of the forecasting scheme the minimum
 hourly temperature, during the period 10Z to 24Z on the day when the sample
 is started, is used. The reasons for this choice are:
        (i)  temperatures after midnight are not  expected to  be  very  relevant
     since  emission rates are then normally quite low.
        (ii) the minimum temperature is likely to be well-correlated with the
     overall coldness of the late afternoon and evening, and hence the domestic
     heating output.
 In Part III of the forecasting scheme the minimum temperature for  the whole
 24-hour period is used.
     (c) wind speed. Two wind speed parameters  are extracted. The first is the
number of hours when the hourly-mean wind speed (10-meter value) is 2 knots
or less (Parts I and II) or less than 5 knots (Part III). For simplicity we  call this
the number of hours of calm.  The second parameter is the mean wind speed for
the full day on which the sample is started. Ideally a mean speed over the
precise period of the sample should have been taken, but the sidereal-day mean
was already tabulated and thus saved quite an amount of laborious computation
at the expense of some accuracy.
     (d) the  mean  reciprocal  mixing depth  (MRMD). The  London  analysis
indicates that only in situations with low mixing depths did the MRMD become
significant as a predictor. During the winter months one or other of the following
criteria is almost always necessary and sufficient for a significant MRMD:
        (i) Surface inversion sets in before 18Z, and during the  day cloud height
    at or below 500m, or
        (ii) Surface inversion sets in between 18 and 21Z, and during the day in-
    version or cloud  height at or below 300m.
The rules for surface  inversions during the winter are:
        (i)  At  18Z, assume  a surface inversion unless wind speed > 8 kts or
    cloud amount > 5/8ths.
        (ii)  At 21Z  and 24Z, assume a surface inversion unless wind speed >
    8 kts or cloud amount 8/8ths.
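A sketch of how these mixing-depth rules might be encoded is given below (in
Python, for illustration; the function names, argument conventions, and the
treatment of a missing daytime inversion height are assumptions, not part of the
original scheme).

     def surface_inversion_assumed(hour_z, wind_speed_kts, cloud_eighths):
         # Winter rules for assuming a surface inversion at 18Z, 21Z or 24Z.
         if hour_z == 18:
             return not (wind_speed_kts > 8 or cloud_eighths > 5)
         if hour_z in (21, 24):
             return not (wind_speed_kts > 8 or cloud_eighths == 8)
         return False

     def mrmd_significant(inversion_onset_z, day_cloud_height_m,
                          day_inversion_height_m=None):
         # True when the mean reciprocal mixing depth is treated as significant (H).
         if inversion_onset_z is None:
             return False
         if inversion_onset_z < 18 and day_cloud_height_m <= 500:
             return True
         heights = [h for h in (day_cloud_height_m, day_inversion_height_m)
                    if h is not None]
         return 18 <= inversion_onset_z <= 21 and min(heights) <= 300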
The SO2-Concentration Data

    Ideally all sampling stations in the Inner London area should have been used
in the analysis. However certain factors weighed against this. For various reasons
not all stations maintain  a regular day-by-day sampling routine. Further it was
decided in this exploratory analysis to limit the amount of data to that which
                                                                     12-5

-------
could be handled and analyzed fairly easily using a desk electronic computer, the
Olivetti Programma 101.
    Consequently  4 stations with a good record  of completeness were selected,
and permission to  use their data was kindly granted by the Councils concerned.
These stations are:
    Kensington, Site 4
    City of London, Site 17
    Hackney, Site  2
    Deptford, Site 3.
    Mean concentrations for a particular day were evaluated whenever either
3 or 4 of the stations gave readings. In the former case the mean was given the
appropriate weighting to balance the omission of one of the readings:

    Expected mean concentration when C4 is missing

                      C1 + C2 + C3     m1 + m2 + m3 + m4
                 =    ------------  ×  -----------------
                            4             m1 + m2 + m3

where C1, C2 and C3 are the day's readings at the 3 given sites; m1, m2, m3 and
m4 are the long-term mean concentrations. For the 2 winter periods that are
studied in detail in this analysis (the winter of 1968-69 and that of 1969-70)
they take the following values:
    m (Kensington) = 364 µg/m3
    m (City of London) = 415 µg/m3
    m (Hackney) = 376 µg/m3
    m (Deptford) = 253 µg/m3
Winter covers the months from October to March inclusive.
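The weighting can be illustrated with a short sketch (in Python; the formula
implemented is the reconstruction shown above, and the input conventions and
the example figures are purely illustrative).

     def four_station_mean(readings, long_term_means):
         # readings: site -> today's reading (a site with no reading is omitted or None)
         # long_term_means: site -> long-term winter mean for that site
         present = [s for s in long_term_means if readings.get(s) is not None]
         if len(present) < 3:
             return None                      # fewer than 3 readings: no daily mean formed
         total_m = sum(long_term_means.values())
         present_m = sum(long_term_means[s] for s in present)
         scaled_sum = sum(readings[s] for s in present) * total_m / present_m
         return scaled_sum / len(long_term_means)

With the winter means above, a hypothetical day on which Deptford is missing
and the other three sites read 300, 350 and 320 µg/m3 gives an estimated
4-station mean of (300 + 350 + 320)/4 × 1408/1155, i.e. about 296 µg/m3.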
    No readings were taken on  Saturdays  or Sundays,  an
-------
at one site, and C  is a 1-day concentration at the same site) has a statistical
day-by-day distribution which is virtually the same irrespective of site, then

                        C̄ = Cm exp(σ²/2)                             (1)

where σ is the standard deviation of the natural logarithms of the concentrations,
and

                        s² = (C̄² − Cm²) C̄²/Cm²                       (2)
Cm is calculated by forming the geometric mean of all the concentration values
in the sample, and is a theoretically better estimate of the parent-population
median concentration than C̄, the arithmetic mean, is of the parent-population
mean concentration. Similarly σ is more reliable than s.
    Applied to the 73 data values involved in Figure 5:

          FROM THE DATA                        CALC. FROM EQ. (1) AND (2)
          Cm = 227.1     C̄ = 238.6             C̄ = 238.0
          σ  = 0.313     s = 76.8              s = 74.6
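As an arithmetic check on relations (1) and (2) (the substitution below is added
for illustration), the sample values give

          C̄ = Cm exp(σ²/2) = 227.1 × exp(0.313²/2) ≈ 238.5
          s  = C̄ √(C̄²/Cm² − 1) = 238.0 × √(238.0²/227.1² − 1) ≈ 74.6

in reasonable agreement with the calculated column.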

The  median evaluated by its fundamental definition, namely by the value which
equally divides the data points  (50% having a higher concentration and 50% a
lower), is Cm = 231 µg/m3. However this is a less accurate method of estimating
the parent population  Cm  from a sample on the assumption of a lognormal
distribution.
    The  close agreement between the calculated and derived values of C and s
strongly  support the lognormal hypothesis. The advantage of this hypothesis is
that  it enables us to  estimate  the likely area in  Inner  London in  which the
concentration  of SO2  may exceed some  defined critical  level at any time.
However the hypothesis must  remain  of doubtful validity "out on its tails", i.e.,
when the area  becomes smaller than  about 10 sq km, and too much reliance
should not be  placed on forecasts in these circumstances  without a much more
detailed investigation than is given here.
    One  final  point concerning  these statistics may be made. The geographical
distributions of
    (a) the mean concentrations for the winters under analysis, and
    (b) the number of days with concentrations exceeding 500 µg/m3 shown in
Figures 7 and 8, are very similar. The following approximate correspondences
apply:

                Number of days with              Mean concentration for
                C > 500 µg/m3 over               the same two winters,
                two winters                      µg/m3

                      0  ..........................  150
                     10  ..........................  200
                     25  ..........................  300
                     50  ..........................  360
                    100  ..........................  400

These  relations should  be  roughly  consistent  with  the  lognormal  time
distributions of concentration.
12-8

-------
 The Data for Manchester

    Sulfur dioxide data were obtained for the same 2 winters as for London
 from  7  regular  sampling  stations  in   the  National  Survey Network  in
 Manchester: Numbers 11, 13, 15, 16, 17, 18 and 19.
    Meteorological data came from Manchester Airport (Ringway) some 9 miles
 to the south of the city.
    Both sets of data were treated in the same way  as in the analysis of the
 London data and thus no further explanations will be given.

 The  Variation of Concentration with Meteorological Parameters
    The  previous  section  headed  "Meteorological parameters for  London"
 described the meteorological parameters that appeared to be significant.
    Table I gives the variation of  mean concentration,  averaged over the 4 sites,
 with wind direction.
       TABLE I. Variation of Mean Concentration With Wind Direction

     Wind Direction    Mean concentration,     Wind Direction    Mean concentration,
                             µg/m3                                      µg/m3
        001-030               243                  181-210               235
        031-060               271                  211-240               204
        061-090               351                  241-270               223
        091-120               395                  271-300               232
        121-150               302                  301-330               323
        151-180               268                  331-360               279
        Variable              307                  Calm                  306
     Table II sets out in detail all the basic data, some of which has already been
defined in section "Meteorological parameters for London." The column headed
MRMD gives  the  mean reciprocal  mixing  depth  described as H  when it is
significantly important. The penultimate column represents the results of the
objective post-facto forecasting scheme (Part I).

    The forecast  scheme  was  developed  empirically by  considering the
concentration values and the appropriate meteorological parameters for the first
winter 1968-69. When applied to the second winter 1969-70, the scheme proved
to be equally successful without any further  modification or elaboration of the
rules. The rules may be stated quite simply as  follows.
The Forecasting Scheme: Part I

    (a) A concentration averaged over the usual 24-hour period at the 4
stations: Kensington 4, City of London 17, Hackney 2, Deptford 3, will exceed
400 µg/m3 (or, in the case of those wind directions which on average have low

                                                                    12-9

-------
SO2 concentrations, a normalized concentration exceeding 1.5), whenever at
least one of the following conditions is fulfilled:
         (i) the number of hours with mean wind speed less than or equal to 2
     knots is greater than or equal to 8 (see column 5, Table II)
         (ii) minimum temperature (col 7) < 0°C, and at least 1 hour of light
     winds (col 5)
         (iii) the MRMD (col 8) = H, minimum temperature (col 7) < 6°C, and
     mean wind (col 6) < 10 knots
         (iv) (3 tcalm − 2 Tmin) is between 0 and 25 if C (previous day) > 600
     µg/m3, or between 10 and 35 if C (previous day) > 400 µg/m3
     (b) The concentration defined in (a) above will exceed 600 µg/m3 whenever
         (i) the minimum temperature (col 7) is less than 5°C, and light winds
     (col 5) persist for 19 or more hours
         (ii) (3 tcalm − 2 Tmin) exceeds 25 if C (previous day) > 600 µg/m3, or
     exceeds 35 if C (previous day) > 400 µg/m3
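     For illustration, these rules can be collected into a single decision sketch
(in Python; the parameter names are illustrative, borderline values are treated
as inclusive, and a B forecast is taken to imply an A forecast).

     def part_one_forecast(t_calm, t_min, mean_wind, mrmd_is_h, c_prev):
         # t_calm: hours with mean wind <= 2 kt (col 5); t_min: minimum temperature
         # up to midnight, deg C (col 7); mean_wind: daily mean wind, kt (col 6);
         # mrmd_is_h: True when MRMD = H (col 8); c_prev: previous day's 4-station
         # mean concentration, ug/m3.
         index = 3 * t_calm - 2 * t_min
         forecast_a = (
             t_calm >= 8
             or (t_min < 0 and t_calm >= 1)
             or (mrmd_is_h and t_min < 6 and mean_wind < 10)
             or (c_prev > 600 and 0 <= index <= 25)
             or (c_prev > 400 and 10 <= index <= 35)
         )
         forecast_b = (
             (t_min < 5 and t_calm >= 19)
             or (c_prev > 600 and index > 25)
             or (c_prev > 400 and index > 35)
         )
         if forecast_b:
             return "B: expect C > 600 ug/m3"
         if forecast_a:
             return "A: expect C > 400 ug/m3 (or normalized C > 1.5)"
         return "no forecast of high pollution"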
    The results displayed  in Table  II  may be  summarized in  the following
 tables:

(A) Contingency Table for Success in Forecasting A*
    (i.e., London: either C > 400 µg/m3, or normalised C > 1.5; Manchester: C > 270 µg/m3)

                            Forecast A               Forecast not A               Total
                       (high concentration)       (lower concentration)
                        London       M/C           London       M/C           London       M/C

 Forecasting success       55          49             106         144            161         193
                          28%         21%             54%         61%            82%         82%
 Forecasting failure       15          20              17          22             32          42
                           8%          9%              9%          9%            18%         18%
 Total                     70          69             123         166            193         235
                          36%         30%             64%         70%

    Exactly equivalent information is included for the Manchester data, without
giving the basic data equivalent to Table II.
    In both cases when a forecast of high pollution is made, a success rate of
about 80% is achieved.
    Some important points must be made:
    (a) The London threshold values 400 and 600 µg/m3 are not universal
values. They are  only meaningful in so far as the source distribution and output
remains basically unaltered. While it is virtually impossible over a short period of
time to identify any such change, it is recommended that the two values be
12-10

-------
(B) Contingency Table for Success in Forecasting B*
    (i.e., C > 600 µg/m3 for London; C ≥ 450 µg/m3 for Manchester)

                            Forecast B               Forecast not B               Total
                    (very high concentration)     (lower concentration)
                        London       M/C           London       M/C           London       M/C

 Forecasting success       11          11             182         212            192         223
                           6%          5%            93.5%        90%           99.5%        95%
 Forecasting failure        0           8               1           4              1          12
                            0          3%             0.5%         2%            0.5%         5%
 Total                     11          19             182         216            193         235
                           6%          8%              94%         92%
* Percentages in general have been rounded to the nearest whole number.
suitably  modified if  necessary  once every 5 years in the light of the overall
changes in mean winter concentration at the 4 sites over the preceding 5 years.
    (b) No attempt has been made in  this analysis to relate the concentration
values to the effect on people's health and the likely demands on the facilities at
the two hospitals concerned. This is largely a medical problem and lies outside
our capabilities.
    (c) In the  previous scheme a forecast  had to be made before 1600Z of the
chances  of high pollution during the evening and night  that followed. We have
moved to a different problem, partially because our basic concentration data are
daily mean values (rather than hour-by-hour values) and also because we feel
that the problem is not solely a night-time problem. At  night many people, and
particularly bronchitic  sufferers, are likely  to be in  the shelter of their own
homes where  they  can to some extent regulate the condition of the air they
breathe, whereas during the day they are more likely to  be out and about, being
affected by atmospheric concentrations of SO2 which  are not necessarily a great
deal less than  the evening concentration. Our aim has therefore been to forecast
the mean concentration for the  whole  24-hour  day. The forecast  of  the
meteorological  conditions  is therefore longer-range and to  that extent more
liable to error.
                   Percentage of forecasts made by      Percentage of actual cases
     Forecast      the scheme that were correct         correctly forecast by the scheme
                     London        Manchester             London        Manchester
     C > A              79
     C > B             100
     C
-------
Ideally, then, a  forecast should  be made  in the early morning, before 1000Z,
of the  meteorological  conditions for the  next 24 hours and hence the likely
mean concentration. Since many of the criteria in the forecasting scheme relate
to evening conditions (and hence are not altogether different in intent from the
previous scheme), some revision of the concentration forecast could be made as
late as 1600Z whenever this seemed called for.
     (d) The  results set out above refer to a post-facto application of the scheme
using meteorological data as it actually occurred. In day-by-day application  of
the scheme in the future these data will have to be forecast and this is bound  to
introduce further significant  errors.
     Some  of the parameters, such as the  minimum temperature and the cloud
amounts, are already  estimated on a  routine basis for other purposes.  The
criterion of the  number of hours when the mean wind falls to 2 knots or below
is probably the  hardest to estimate with any certainty, and for this reason Part
III explores the effect of a relaxation of this condition.

A scheme designed to forecast actual concentration values: Part II

     This part is concerned with predicting actual concentration, as distinct from
forecasting whether or not certain  threshold values are exceeded. The variation
of concentration with the same meteorological  parameters that were successfully
used  in  the  "threshold" method was studied  for the 2 winter periods for the
London  data. The  following  fairly simple formula yielded reasonably
satisfactory estimates of the  average daily concentrations at the 4 sites:
where C = long term mean concentration
     Cest = estimated concentration (24-hour average)
     Cp  = concentration for the previous 24 hours
     T   = minimum temperature (°C) expected up to midnight
     t    = number of hours of mean wind less than 3 knots during the 24 hours
     a = 2/3, b = 2/9  if the mean wind for the day exceeds 6 knots
     a = 3/7, b = 9/21 otherwise
    δd  = 1, if the mean wind comes from the "dirty" sector, 060 to 120
            0, otherwise
    δm  = 1, if the mixing depth is low (as defined at the end of the section on
            London)
            0, otherwise
12-12

-------
Although  the formula has been  verified only for the 4  sites, it is probably
equally applicable to any group of sites in  Inner London, and can be applied to
other cities provided the appropriate value of C, the mean concentration, is
inserted. This has been done for the Manchester data.
    The formula can either be expressed graphically  as a nomogram (see Figure
9) or it can be programmed for a desk electronic calculator.
    The root-mean-square errors have been evaluated for the 4 sites  over the 2
winters. The significance of the errors has to be assessed in relation to the
inherent "error" due to local quasi-random variations in concentration at the 4
sites, which can be estimated from the inter-site correlation studies described in
the section about SO2 concentration data. The  inherent error in  the 4-site
average concentration was shown to be about 50 µg/m3.
    The root-mean-square error in the formula-estimates is only 66 µg/m3 (little
more than  the  inherent error) if the actual value  of Cp, the previous day's
concentration, is known and used, but rises to nearly 80 µg/m3 when Cp is only
known by the application of the formula using the actual meteorological data at
the end of the previous 24-hour day (see Figure 10). This is still a satisfactorily
small  margin of error when compared with the  inherent error, and is certainly
considerably less than that obtained using a persistence forecast  (142 jug/m3) i.e.
by using yesterday's concentration as an estimate for today's. If Et is the total
error, Ej is the inherent error, and Es is the basic error of the scheme, then
For the whole of Inner London  (nearly 100 sites), Ej will fall from 50jug/m3 to
about 10 /Ltg/m3. The expected value of Et would then be
                       E2=(80)2-(50)*+(IO)S
i.e. a little over 60
    The formula displays the relative importance of the basic parameters. It is
clear that an error of 4°C in  T, the minimum temperature, would introduce an
error in Cest of about only 50 jug/m3. A similar error would follow from an error
of 3 hours in  t, the hours of light winds, or of about 150 jUQ/m3 in Cp. The
method does not therefore demand impossible precision in evaluating the basic
meteorological parameters. Nevertheless if it has to be used at  an operational
office, such as the London Weather Centre, evaluation  of these parameters may
take rather more time than the fully occupied staff may wish to spend. The next
section  (Part  III)  describes  a simplified  scheme designed to  overcome this
difficulty.

                                                                    12-13

-------
The simplified forecasting scheme: Part III

    In  order  to help the operational forecaster by  minimizing the analysis
required in making a pollution forecast, an investigation has been made into the
effect on the accuracy of the scheme when:
    (a) London Airport meteorological data is used instead  of  Kew data for
London. Unlike Kew, London Airport data is  received by  London Weather
Centre on an hourly basis.
    (b) The "hours of calm" criterion is  relaxed to  include all hours when the
mean wind falls below 5 knots. This criterion should be much easier to forecast.
    (c) The minimum temperature up to midnight is replaced by the minimum
temperature over the 24 hour period of the forecast. The forecaster will already
have this temperature estimate for other reasons.
    (d) The effects of the daily mean wind speed and direction are ignored.
    The empirical  formula for forecasting concentration  in terms of the revised
parameters is
where δm = 1 if MRMD = H, and is 0 otherwise
      T  = minimum temperature 09Z to 09Z
      t   = hours when mean wind falls below 5 kts
      C  = mean concentration
      Cp = yesterday's concentration
    The results using this scheme are summarized in Table III. As expected the
errors are somewhat bigger than in Part  II, but not appreciably so. It seems that
this very simple scheme  still gives a very  satisfactory means of forecasting
pollution.
     For many  cities,  including Manchester,  the  number  of  days  with  a low
mixing  depth (MRMD =  H)  is very  small  (only a  few days per year), and
experience may show that the factor (1 + δm/6) can then be fairly safely
ignored.
12-14

-------
TABLE II.  Basic Forecasting Data
Date
1968
Oct. 1
2
3
4
10
11
15
16
17
18
22
23
24
25
29
30
31
Nov. 1
5
6
7
8
12
13
Ca

127
104
121
198
172
123
182
174
161
314
493
548
480
331
253
197
204
202
418
378
311
283
546
506
ddb

260
240
240
260
220
240
260
230
260
250
140
360
050
070
190
230
180
200
030
070
070
060
150
110
C/c~c

0.57
0.51
0.59
0.89
0.84
0.60
0.82
0.85
0.72
1.41
1.63
1.96
1.77
0.94
1.08
0.96
0.76
0.86
1;72
1.08
0.89
1.04
1.81
1.28
'calm01

0
0
0
4
1
0
2
0
0
13
15
18
8
0
0
7
7
2
12
2
2
2
11
0
ve

13
13
14
9
8
13
8
12
9
7
5
3
6
10
12
12
8
12
11
9
14
11
6
10
W MRMD«

14 H
15
15
16
14
12
10
11
9
8
10
12
11
12
12
11 -
14
13
2
7 H
7
6
4
4
' F/Ch

=
=
=
=
=
5=
=
=
=
Ax
A=
A=
A=
=
as
=
=
=
A=
=
=
=
A=
	 x 	
Estimated C'

—
121
119
164
—
155
—
166
185
367
—
497
477
259
—
277
236
177
—
431
360
357
—
	 405 	
 aCorrected mean S02 
-------
TABLE II (continued). Basic Forecasting Data
Date










Dec.











14
15
19
20
21
22
26
27
28
29
3
4
5
6
10
11
12
13
17
18
24
31
Ca
431
394
284
425
382
317
225
250
585
529
892
443
303
481
618
374
469
864
419
300
278
570
ddb
100
090
340
320
150
160
190
190
120
030
080
160
150
140
030
050
340
060
220
180
270
330
C/cc
1.09
1.12
1.02
1.21
1.26
1.18
0.96
1.06
1.48
2.18
2.54
1.65
1.00
1.59
2.54
1.38
1.68
3.19
2.05
1.12
1.25
1.76
t d u e
Tcalm w
0
0
3
20
5
1
0
2
17
12
12
10
2
12
20
1
16
13
7
2
3
12
14
17
10
5
8
9
12
11
6
4
7
5
6
8
5
6
7
5
9
10
13
7
min
6
5
6
7
7
6
10
12
7
8
2
4
4
1
2
3
4
0
0
6
6
0
MRMD9 F/Ch
—
—
—
—
H
H
—
H
H
H
H
—
—
—
—
—
H
H
—
—
H
—
X
=
=
A=
=
Ax
=
=
A=
A=
B=
A=
=
A=
B=
=
A=
B=
A=
—
=
A=
Estimated C'
362
344
—
519
382
310
—
222
616
557
—
668
318
470
—
349
612
723
—
289
—
—
1969
Jan.


















Feb






1
2
3
7
8
9
10
14
15
16
17
21
22
23
24
28
29
30
31
. 4
5
6
7
11
12
13
677
565
499
500
346
435
421
321
271
287
248
290
319
351
281
274
353
341
197
449
670
706
329
635
318
233
340
270
330
090
150
190
120
190
190
170
270
150
190
230
240
200
270
270
240
340
280
280
240
320
V
360
2.43
2.53
1.54
1.42
1.14
1.85
1.06
1.36
1.15
1.07
1.11
0.96
1.36
1.72
1.38
1.16
1.58
1.53
0.96
1.61
2.89
3.04
1.61
1.96
1.03
0.83
17
2
14
0
0
10
0
0
0
3
1
0
0
8
13
0
9
1
0
4
17
16
0
5
0
0
7
8
8
10
14
12
8
16
13
10
13
8
12
10
5
12
8
8
14
14
6
4
6
9
11
14
2
0
5
1
6
2
2
8
4
4
3
6
10
11
10
7
3
4
7
-1
-1
4
2
-3
1
0
H
H
—
—
—
—
H
—
—
—
—
—
—
H
H
—
H
H
—
—
H
H
—
—
—
—
B=
A=
A=
X
=
A=
A=
=
=
=
=
=
=:
A=
Ax
=
A=
A=
=
A=
B=
B=
X
B=
=
=
537
493
546
—
271
443
477
—
261
296
280
—
191
357
428
—
461
330
232
—
733
783
362
—
361
304
12-16

-------
TABLE II (continued). Basic Forecasting Data
Date
14
18
19
20
21
25
26
27
28
Mar. 4
5
6
7
11
12
13
14
18
19
20
21
25
26
27
28
Oct. 1
2
3
7
8
9
10
14
15
16
17
21
22
23
24
28
29
30
31
Nov. 4
5
6
7
11
Ca
503
379
409
346
475
591
381
276
289
387
396
396
581
552
399
525
216
472
498
369
372
280
251
171
280
175
235
165
195
149
255
380
207
177
197
255
328
429
301
126
333
223
263
205
135
224
417
296
248
ddb C/Cc tcg|md
360
360
050
050
180
060
040
030
350
360
040
050
150
080
V
070
220
070
070
020
010
360
030
030
050
280
310
250
220
210
200
130
120
240
190
160
170
040
250
250
270
200
340
280
230
300
270
160
230
1.80
1.36
1.51
1.28
1.77
2.18
1.40
1.13
1.03
1.39
1.46
1.46
2.14
1.57
1.30
1.50
1.06
1.34
1.42
1.52
1.53
1.00
1.03
0.70
1.03
0.75
0.73
0.74
0.95
0.63
1.08
1.26
0.52
0.97
0.84
0.95
1.22
1.58
1.35
0.56
1.50
0.95
0.94
0.88
0.66
0.96
1.87
1.10
1.21
15
5
0
0
1
9
0
0
0
0
0
0
12
0
0
0
1
0
0
2
4
0
0
0
4
4
9
0
2
2
14
19
1
2
1
11
19
17
12
0
15
4
7
0
0
2
13
4
0
ve T . f
v 'mm
11
9
10
21
13
10
6
9
11
11
14
14
8
13
11
10
12
10
10
6
2
10
10
12
12
3
4
7
3
7
5
1
5
7
7
6
1
1
2
7
2
3
7
6
17
11
4
3
9
-1
-1
0
-1
2
6
4
0
2
3
0
1
-3
4
3
3
8
6
6
5
5
1
2
1
1
7
3
12
15
15
10
7
14
9
10
8
9
12
11
13
5
10
5
9
14
5
-2
5
8
MRMD9 F/Ch
H
-
—
H
—
H
—
—
—
—
—
H
—
—
—
—
—
H
H
—
—
—
—
—
—
—
—
H
H
—
—
—
—
—
—
H
—
—
H
—
—
—
H
—
—
H
—
—
-
A=
Ax
X
=
X
A=
=
=
=
=
=
=
A=
X
=
X
=
A=
A=
X
X
=
=
=
=
=
Ax
=
=
=
Ax
Ax
=
=
=
Ax
Ax
A=
Ax
=
A=
=
Ax
=
=
=
A=
=
=
Estimated C1
594
-
318
392
304
—
313
318
274
—
319
394
549
—
318
383
312
—
413
317
297
—
274
279
246
—
291
190
—
186
278
420
—
217
189
326
—
391
441
161
—
233
387
198
—
283
405
315
-
                                                                      12-17

-------
TABLE II (continued). Basic Forecasting Data
Date










Dec.














12
13
14
18
19
20
25
26
27
28
2
3
4
9
10
11
12
16
17
18
19
23
24
30
31
Ca
163
192
279
461
292
161
320
280
395
287
529
356
218
661
1161
632
357
304
421
411
463
320
318
428
265
ddb
180
220
210
290
210
230
320
310
320
240
230
230
280
330
V
170
180
260
340
060
020
290
230
090
040
C/cc
0.61
0.94
1.61
1.99
1.24
0.79
0.99
0.87
1.22
1.41
2.59
1.74
0.94
2.05
3.78
2.36
1.33
1.36
1.51
1.52
1.90
1.30
1.56
1.22
0.98
tcalm
2
0
16
9
0
0
0
0
1
0
1
0
0
6
24
13
2
0
7
0
10
0
0
0
0
ve
14
10
4
8
8
11
10
12
9
8
4
7
9
7
1
1
4
12
6
13
9
12
5
12
17
^min
12
6
0
-2
2
10
1
1
-1
5
2
5
4
1
3
0
6
3
-1
1
-2
6
4
0
0
MRMD9 F/Ch Estimated C'
	
—
—
H
—
—
—
—
H
—
H
—
—
—
H
H
—
—
H
—
—
H
H
—
-
=
=
A=
A=
=
=
=
=
Ax
=
A=
X
=
X
B=
B=
=
=
A=
X
A=
=
A=
X
=
194
212
402
_
312
192
—
294
374
264
—
288
268
—
981
1140
346
—
486
420
510
_
272
—
328
1970
Jan.
















Feb.





1
2
6
7
8
9
13
14
15
16
20
21
22
23
27
29
30
3
4
5
6
10
11
215
387
453
556
659
438
284
249
232
283
178
286
250
295
423
399
257
173
278
277
317
197
421
030
V
270
270
C
090
170
140
140
130
130
160
160
190
230
120
100
230
250
230
350
250
270
0.88
1.26
2.03
2.49
2.15
1.25
1.06
0.82
0.77
0.94
0.59
1.07
0.93
1.25
2.07
1.01
0.65
0.85
1.25
1.36
1.14
0.88
1.89
0
0
1
7
4
0
5
0
2
8
0
7
2
3
11
0
0
0
0
5
1
0
9
16
9
4
9
4
13
9
7
10
4
8
5
9
5
3
6
11
13
12
9
7
11
8
-1
-1
-4
-2
-5
-2
5
8
8
8
7
7
8
3
4
3
5
7
4
3
2
2
-4
—
—
H
H
H
—
H
—
—
—
—
H
—
H
H
—
—
—
—
—
—
—
-
=
=
A=
A=
B=
A=
Ax
=
=
=
=
=
ss
Ax
A=
=
=
=
=
=
=
=
A=
303
291
—
554
582
543
—
212
235
266
—
280
242
298
—
—
353
—
233
337
314
—
441
12-18

-------
TABLE II (continued). Basic Forecasting Data
Date
Ca   ddfc
C/c
                          calm
Tminf  MRMD9
F/Ch   Estimated C1
12
13
17
18
19
20
24
25
26
27
Mar. 3
4
5
6
10
11
12
13
17
18
19
20
24
26
505
318
366
253
243
158
247
269
307
239
262
354
340
447
683
495
444
429
231
240
189
212
282
314
V
030
300
270
230
220
250
290
330
350
320
300
V
320
V
210
200
070
280
240
270
270
V
010
1.67
1.31
1.58
1.13
1.19
0.77
1.11
1.16
0.95
0.86
0.81
1.52
1.11
1.38
2.22
2.11
1.89
1.22
0.99
1.18
0.85
0.95
0.92
1.29
3
2
6
0
0
0
0
2
5
0
0
0
0
0
12
1
3
0
0
0
0
0
3
8
4
16
8
13
9
12
8
9
5
9
10
8
11
7
3
4
8
8
7
11
13
11
5
7
-1 H
1
-3 H
3
4
8
6
3
1 -
2
-1
-1
0
-1
-1 H
1
3 H
2 	
7
10
5
7 H
6
-1
A=
=
A=
=
=
=
=
=
=
=
=
X
=
X
B=
A=
A=
X
=
=r
=
=
=
Ax
429
368
—
281
248
205
—
286
307
280
—
302
312
320
—
416
418
411
—
183
235
241
—
425
                                    TABLE III
     Results of applying the Forecasting Schemes to London and Manchester data

                                              Correlation between actual        Standard error,
                                              and forecast concentrations            μg/m³
                                               London        Manchester        London   Manchester
 Detailed Scheme: Part II
   Inherent error from taking only
     4 sites:                                   0.94             —               50         —
   All data: actual conc. for previous
     day known:                                 0.87            0.80             76         65
   As above, excluding 11.12.69:                0.90             —               66         —
   Previous day's concentration
     replaced by mean C̄:                        0.80             —               95         —
   Previous day's conc. replaced by
     f/c value for that day:                    0.85             —               79         —
   Persistence f/c: previous day's
     conc. used as f/c for today:               0.55             —              142         —

 Simplified Scheme: Part III
   All data: actual conc. for previous
     day known:                                 0.81            0.79             88         66
   As above, excluding 11.12.69:                0.84             —               83         —
   Standard deviation of all the
     observations:                                —               —             150        106
                                                                      12-19

-------
Figure 12-1. Population density of Inner London in thousands per square
      kilometer.
Figure 12-2. Mean winter concentrations for Inner London for 1969-70, based on
      extrapolation from data of the previous ten years. Sulfur dioxide
      concentrations in μg/m³.
12-20

-------
Figure 12-3. Normalized concentration-direction roses for Kensington &
      Deptford.
Figure 12-4. Histogram based on 290 values (mean winter concentrations) for all
      Inner London sites for the winters 1965-66 to 1969-70. Data grouped into
      20 μg/m³ classes; best-fit lognormal superimposed (abscissa: average
      concentration).
                                                                       12-21

-------
Figure 12-5. Spatial distribution for mean winter 1965-70, all Inner London
      sites (concentration vs. cumulative frequency).
Figure 12-6. Daily values averaged over the 4 sites, winters 1968-69 & 1969-70
      (concentration vs. cumulative frequency).
12-22

-------
Figure 12-7. Number of days when C > 500 for the winters 1968-69 & 1969-70.
    Figure 12-8. Mean winter concentration for Inner London 1968-69-70.
                                                                12-23

-------
Figure 12-9. Nomogram for SO2 concentrations at the four sites in Inner
      London.
          Figure 12-10. Results of the forecasting scheme of Part II (abscissa:
                estimated concentration, μg/m³).
12-24

-------
Acknowledgements

    The authors are grateful to the Warren Spring Laboratory for their helpful
cooperation in the supply of their National Survey data, and to the following
Councils who gave permission for the data from their areas to be used:
    The Corporation of Manchester
    The Royal Borough of Kensington and Chelsea
    The Borough of Hackney
    The London Borough of Lewisham
    The Corporation of London
    The Rural District Council of Epping and Ongar.
    We also wish to acknowledge the very helpful work done by Mr. P. Bushby,
a vacation student, who analysed the Manchester data. This paper is published
by permission of the Director-General of the Meteorological Office.
References

Annual Reports of the National Survey, Warren Spring Laboratory.
The Royal College of Physicians, 1970: Air Pollution and Health. Pitman.
Weatherly, M. L. and Gooriah, B. D., 1970: National survey of smoke and
     sulphur dioxide: The greater London area. Warren Spring Laboratory.
DISCUSSION
Gifford: You mentioned in your written presentation that you tried to avoid a
numerical  scheme because of the  complexity  that the computer required. Of
course we have a scheme that is perfectly numerical  which doesn't require  a
computer and since you had been good enough to include a whole rather long
series of data we thought it  would be fun to compare our scheme with yours.
The  results are summarized in the form of a comment to the preprint of your
paper, which  is presented after the discussion of the paper. Our scheme is very
simple. In your notation it simply says that the concentration is proportional to
the source strength divided by the wind speed. This might look like a box model
to you, and in fact it is. This number here (c in Equation 1 of my comments on
the preprinted paper), which can be expanded, depends rather weakly on the
city size and on stability, and if you are interested in seeing some other
comparisons, these are mentioned in the references. This model really works
very  well. For  instance, it  gives a correlation with the data that  Joe  Knox
showed yesterday for carbon  monoxide  in San Francisco that is just about as
good as the one from his model. Unfortunately, as you pointed out, you don't
                                                                  12-25

-------
have the source strength to be entered into the model, so we simply turn the
formula around and use yesterday's source strength, which I call Q0, to put into
today's formula. So if 0 represents yesterday and 1 represents today, our
formula says that today's concentration is given by the ratio of yesterday's to
today's wind speed, times the existing concentration level. Your model gives
a  0.9 (rounded  off) correlation; our  model  gives a  0.7 correlation.  Your
simplified  model is 0.81  or perhaps it  is 0.84,  I  don't remember. Anyway,
persistence was low, 0.55. I  put in confidence  limits to indicate the fact that all
of these models beat persistence. I discovered in going through the data that our
model had a serious tendency to overpredict. Being so simple, it is very
subject, for instance, to error at low wind speeds. When v1 is 1 meter per
second, things got pretty bad. So I tried an arbitrary correction, namely, a
modified model that takes the square root of the wind ratio, and found this
gave a correlation of 0.76. I would expect that these correlations, whichever one
you choose to use, would repeat, since ours is essentially an a priori model. The
modification, I admit, was somewhat suggested by our experience with your data, but I would
expect that ours would repeat, that yours would come down a little bit, and that
Manchester is probably more representative of the sort of result you are
likely to get with it. I would guess that in all likelihood the best way to
improve predictions of this kind would be to go to work on the source
strength term. In fact I tried to incorporate some information, and this is the
question: Do you not think some information on the spatial and temporal
variability of the source strength, and so on, would improve the model?
    I made the  point in my prepared comment on  your written paper that I
thought that you had used both  of your winters to develop this model and
you've now told us that you only used one winter so I have to that extent more
confidence. Just the  same,  yours  is a very high correlation.  If  it holds  up
anywhere near that value it is going to be an awfully tough forecast to beat.
Smith: I  am very pleased to see this and I would certainly agree that wind speed
is  important  in this sense as you saw  in the nomograms I showed you. Wind
speed obviously was playing a very important role in that both in the number of
hours of calm and in the influence of the previous day's concentration.  I also
agree that if one could  somehow  get a better  understanding  of the  Q
distribution, the source distribution  in London, one  might very easily improve
the situation. The only hesitation I  have in this  is that both Dr. Gifford's and my
correlations  depend  on  using  meteorological data which  has  really  been
measured, its real  data. Whereas the forecaster's problem is he has to actually
forecast this data which is not an easy task and I am quite sure the correlations
will drop in  practice  quite significantly when the forecaster has to face the
music and  try to forecast the wind speeds and temperatures  for the following
period. And perhaps there comes the level in this  when it's not worth going to
great effort and expense to get a Q distribution when you know fairly well that
even if you had the most perfect Q values the meteorology would let you down
and you'll still not get a very accurate forecast.
12-26

-------
Comment by F. A. Gifford on  the Paper,  "The Prediction  of High
Concentrations of Sulphur Dioxide in London and Manchester Air,"
by F.B. Smith and G.H. Jeffrey.

    I  would like first to record my agreement with Dr. Smith's remarks on the
need for a simple approach, guided by the data, in  air pollution forecasting. And
also to second his remarks concerning the limited effect that mixing depth has
on urban air concentrations, as a rule.
    The authors approach this  problem using  the empirical  techniques  of
objective weather forecasting and one can only applaud both their methodology
and the workmanlike result that they obtain. They mention as alternatives to
their empirical approach, numerical models and physical models, rejecting the
former  because  they require  a large computer facility. One  seemingly valid
numerical urban  air pollution  model does not however require such a computer
facility, namely the simple ATDL model described by my colleague Dr. Hanna
and me in a series of papers. (See all the references  cited.)
    It is of  some interest to compare our  model  with the  present results,
particularly as Smith and Jeffrey have  included a fairly long series of London
SO2 air concentration data, together with related meteorological data, the
developmental data for  their  model.  Application of  our simple model to the
present  data is similar to the  applications discussed in the first four references
cited.  Chemical  S02  removal,  which  could  be  included using  the scheme
suggested by Hanna (1972), will not specifically be taken  into account. Then the
simple model gives, in the notation of Smith and Jeffrey,
                              C= cQ/v                             (1)
where Q is source strength and c is a dimensionless parameter. Unfortunately no
data on Q are included. To apply our model to forecasting London SO2
concentration, we have to use today's value of C as a measure of Q, i.e.,

                              Q0 = v0 C0 / c                            (2)

where subscript zero means today. Then

                             C1 = (v0 / v1) C0                          (3)

is the prediction of our model.

    Table I displays the results of this comparison, in the form of correlation
coefficients (with 95% confidence limits) between predicted and observed
concentration values, the predictions coming from Equation 3 and the data from
Smith and Jeffrey's Table II. In this
table, "ATDL-modified" refers to a second prediction using the following
modification of Equation 3:

                        C1 = (v0 / v1)^(1/2) C0                         (4)
                                                                  12-27

-------
This entirely arbitrary modification attempts to account for the lack of data on
source strength variability. Variation of Q with  several of  the meteorological
parameters of Table II is probable. For instance, higher winds in winter are likely
to be correlated with higher domestic fuel consumption. The modification to
our simple model is an arbitrary attempt to take this into account.
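For readers who wish to repeat the comparison on another record, the forecasts of Equations 3 and 4 and the persistence forecast take only a few lines. The sketch below is illustrative only; the function names and the sample numbers are invented, not drawn from Smith and Jeffrey's Table II.

```python
import numpy as np

def atdl_forecast(C, v, modified=False):
    """Forecast the next day's concentration from today's (Eq. 3),
    or with the square-root damping of the wind ratio (Eq. 4)."""
    ratio = v[:-1] / v[1:]              # v0/v1: today's over tomorrow's wind speed
    if modified:
        ratio = np.sqrt(ratio)          # arbitrary modification of Eq. 4
    return ratio * C[:-1]               # C1 = (v0/v1) * C0

def corr(a, b):
    """Pearson correlation coefficient between two series."""
    return np.corrcoef(a, b)[0, 1]

# Illustrative (made-up) daily series of concentration and wind speed:
C = np.array([310., 280., 420., 390., 260., 330., 450.])
v = np.array([3.1, 4.0, 2.2, 2.5, 4.4, 3.0, 2.1])
obs = C[1:]
print(corr(atdl_forecast(C, v), obs))                 # ATDL, Eq. 3
print(corr(atdl_forecast(C, v, modified=True), obs))  # ATDL-modified, Eq. 4
print(corr(C[:-1], obs))                              # persistence forecast
```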

              Table I - Correlations of concentration predictions by various models
              with 140 measurements of London 24-hour SO2 concentrations.

              Model                        Correlation Coefficient     95% Limits
              Met. Office                            .87               .91-.82
              ATDL                                   .70               .79-.60
              ATDL-Modified                          .76               .82-.68
              Met. Office-Simplified                 .81               .86-.74
              Persistence                            .55               .66-.42
    Several conclusions can  be drawn  from  Table  I.  None  of the other
correlation coefficients fall within the confidence limits of persistence, so in this
sense all methods "beat" persistence. Of these methods "Met. Office" correlates
best with  the  data  sample.  But remember that this is an empirical  method,
developed  from the given data sample. The "ATDL" correlation can on the
other  hand be considered a genuine test and should repeat on independent
London data or on data from other cities. "ATDL-Modified"  incorporates an a
priori  hypothesis and should also hold up. However in all honesty it has to be
pointed out that the data suggested the hypothesis.
    More important, it seems safe to suggest that there would be no  inherent
difficulty in proposing further empirical modifications to "ATDL" to bring it up
to the level of "Met. Office,"  based  on the data sample.  "ATDL-Modified" is
already quite competitive.  But notice the following  implication of that fact.
"ATDL" says that only Q, v, and c can vary.  The last varies with stability. To
whatever extent Q,  the  source strength pattern, is  a variable factor, "Met.
Office" takes  this  into account  indirectly, through  correlations  of  Q with
meteorological factors. If you believe "ATDL" on the  other hand, it seems that
the most sensible way  to  improve SO2 forecasts would be to include the best
available estimates of Q(x,y,t).

 References

Gifford, F. A., and Hanna, S.  R., 1971:  Modeling urban air pollution. Atmos.
      Environ. 6.
 Gifford,  F.  A., and   Hanna,  S.  R.,  1971:  Urban  air pollution modeling.
       Proceedings 2nd  Int. Clean Air Congress, Washington, D. C., Academic
       Press, pp. 1146-1151.
 12-28

-------
Gifford,  F.  A.,  1972:  Application  of  a simple urban pollution  model.
     Proceedings of Conference on Urban Environment and Second Conference
     on  Biometeorology,  Philadelphia, Pa., American Meteorological Society,
     pp. 62-63.
Hanna, S. R., 1971: Simple methods of calculating dispersion from urban area
     sources. J. Air Pollution Control Association. 21: 774-777.
Hanna, S.  R., 1972: A simple model for  the analysis of photochemical smog.
     Proceedings of Conference on Urban  Environment and Second Conference
     on  Biometeorology,  Philadelphia,  Pa., American Meteorological Society,
     pp. 120-123.
                                                                   12-29

-------

-------
                  13. FITTING CURVES TO URBAN
                SUSPENDED PARTICULATE DATA
                           DAVID A. LYNN

                       Department of Statistics
                          Harvard University
                       Cambridge, Massachusetts
Introduction

    The effort described here is  a comparison of a  number  of  theoretical
frequency distributions with respect to their ability to fit bodies of urban air
quality  data. I  have used almost exclusively  some quite extensive suspended
particulate data from Philadelphia, and  so  the  work, and  of  course the
conclusions, are to this extent limited and in a sense preliminary. It is presented
at this meeting in particular, principally in the hope that it might interest others
in actually trying out various distributions on their data.
    The reaction of people in the air pollution field to the mention of this topic
is often one of  surprise.  The question is usually believed to have been long
settled  by the  use  of the  lognormal distribution and its parameters, the
geometric  mean and geometric standard deviation. It is certainly true that the
lognormal  is widely accepted in this role. I think we have all observed that air
quality  data  is typically  positively skewed, so  that the normal  distribution
doesn't  fit at all, and then taken the lognormal as the simplest positively-skewed
alternative, without ever really considering alternative distributions very deeply.
(Phinney and Newman (1972) is a recent example.) We need to  remember that
the decision to use the lognormal distribution was made mainly by the staff of
the National Air Sampling Network back in the 1950's, when computer handling
of data  was almost unheard of. At that time, it was a major accomplishment to
achieve  enough  computer processing to publish the  NASN  data summary
reports, let alone to do anything very lengthy or complicated.
    Now,  however, the situation is much different.  Not only do most sizable
agencies and installations  have  computerized  data  processing, but we are
beginning  to be asked for increasingly sophisticated statistical inferences from
our air  quality  data.  This will likely  become  increasingly true  as the typical
pollution levels in  our cities decline to the general  vicinity of the National
Ambient Air Quality Standards. It is my view that as we in the field of data
                                  131

-------
analysis begin to provide more sophisticated methodologies to be used for these
inferences, we should re-examine one of our most basic assumptions.  In a sense,
the increasing importance of gaseous pollutants during the 1960's has already
forced this re-examination on us somewhat, and the results are not completely
consistent—we have to use arithmetic means with sulfur dioxide data in order to
avoid the embarrassment of too many logs of zero.
    We might also consider briefly, by way of further motivation, just what we
do now, and might in the future do, with a distributional form once we've chosen
it, and how we verify our choice. When the NASN first introduced the
lognormal, they used it in two ways. They included  the geometric parameters in
their  data summaries, and they used logarithmic spacing of tallying intervals in
the frequency distribution program that calculated the familiar summary line of
percentiles.  More recently,  however,  statistical  applications have included
significance tests, confidence  intervals  (Hunt  (1972)),  and  extreme value
statistics (Singpurwalla (1972)). In  the future  we  may need inferences about
spatial distributions, decision-theoretic-inference procedures, and so  on, all of
which involve making distributional assumptions of some sort. We might note
for the sake of completeness that in some fields, notably actuarial science, some
of the jobs done years ago with fitted theoretical curves have more recently been
approached with smoothed empirical distributions.  In fact at least one of these
smoothing techniques has been applied to air quality  data, specifically NASN
particulate data (Spirtas and Levin (1970)). The reasons we might prefer a
theoretical model to a set of smoothed data are the availability of  probability
theory, or at least more probability  theory, the ability to use somewhat smaller
data sets, and the ability to follow trends in the value of a parameter over time.
Distributions and Fitting Methods

    Our purpose here is to compare the several theoretical distributions under
consideration by actually fitting each to a number of sets of data, seeing which
typically fits best, or which fits best in some overall sense, if indeed any do. The
data used are suspended particulate data gathered at three sites in Philadelphia
by the City control agency. The data have been gathered daily for many years.
Here  I have used data from 1960  to 1968,  comprising  25 annual data sets in
all—nine years from three stations with two sets missing.
    There are a number of  ways to determine the parameter values that will fit a
specified distribution to a set of observed data. The use of the word "determine"
rather than "estimate" is deliberate. For most of the data
under consideration we have essentially  complete data for the entire year, so we
consider the year's frequency distribution as  "known", and view the problem as
fitting a  distribution  with known  parameters  rather than  as estimating the
parameters from a  sample. Of  the  various methods available, I  have used the
"method  of moments," which selects the specific distribution out of the given
 13-2

-------
family that  has the same moments as our observed distribution, in  each case
equating as many moments as there are parameters in the algebraic formulation
of the distribution. The first two moments are the mean and the variance, so this
is of course  just what we routinely do when we calculate the sample  mean and
standard deviation  and then  use a  normal distribution with that mean and
standard deviation, similar  to what we  would  do  with  the  two  geometric
parameters.  Used  in this way, the mean and variance of a normal distribution are
effective as location and scale parameters. That is, changes in the mean are
equivalent to changing the location of the distribution back and forth along the
axis, while changes in the standard deviation are equivalent to changes in scale,
such as changes in the size of the units. Beyond
these simple additive and multiplicative variations, however, a two-parameter
distribution  has no flexibility  left to change its shape.
    We can, however, have more than two parameters, and can determine their
values by equating more than two moments from the observed distribution to
the algebraic expressions for their theoretical counterparts. In common practice,
four moments  are the  most used, because the sensitivity  of the higher-order
moments to small sampling errors is very great. The third and fourth moments,
when used are commonly used in  modified forms rather than  in  their raw
numerical form. If we let μ2, μ3, and μ4 be the variance and the 3rd and 4th
central moments respectively, we commonly deal instead with the coefficients

                         β1 = μ3² / μ2³ ,        β2 = μ4 / μ2²                    (1)

The division by the proper power of the variance makes β1 and β2 small
dimensionless numbers, while squaring μ3 makes β1 independent of its sign.
These two constants (β1 and β2) are called the coefficients of skewness and
kurtosis, respectively. Figures 1 and 2 give some indication of how they operate
to measure the shape of a distribution. The point to note here is that every β1, β2
pair represents a different possible distribution shape, if the theoretical
distributional form we are using has enough parameters to make use of them.
This is of course not to say that any observed distribution with the same mean,
standard deviation, skewness, and kurtosis will be the same. We have reduced
some 300-plus data points down to four numbers, and have given up some
information in the process, but we've given up less than if we dealt with only
two parameters. The first few columns in Table I present the mean, standard
deviation, and coefficients of skewness and kurtosis for the various data sets.
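For a data set treated as a "known" annual distribution, the four summary quantities can be computed directly. The sketch below uses synthetic values, not the Philadelphia record, and the function name is ours.

```python
import numpy as np

def moment_summary(x):
    """Mean, standard deviation, and the coefficients
    beta1 = mu3**2 / mu2**3 and beta2 = mu4 / mu2**2."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    mu2, mu3, mu4 = (d**2).mean(), (d**3).mean(), (d**4).mean()
    return x.mean(), np.sqrt(mu2), mu3**2 / mu2**3, mu4 / mu2**2

# A synthetic year of daily suspended-particulate values (ug/m3):
x = np.random.default_rng(0).lognormal(np.log(150), 0.45, 365)
mean, sd, beta1, beta2 = moment_summary(x)
```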
    To pass from  general considerations to the specific distributions used here,
let's begin briefly with the lognormal.  We  have used  not only the ordinary
lognormal,  here  called   the  two-parameter  or   2-p  lognormal,   but   a
three-parameter  version which  includes  as the third parameter  a  location
parameter, additive in the μg/m³ scale before the logs are taken. The density, y,
                                                                     13-3

-------
written in terms of the parameters GM3 and GSD3, and θ, is given in Equation
2, with GM3 and GSD3 denoting the parameters analogous to the geometric
mean and standard deviation. The 2-p lognormal is obviously just a special case
of the more general form. It doesn't really have a true location parameter,
because it's tied to the fixed origin of the coordinate system by the nature of the
log function. As the geometric mean varies along the axis, it does take the bulk
of the density with it, but the shape also changes, as illustrated in Figure 3. The
third parameter, here θ, permits the distribution to slide along the axis
independent of changes in its general size and shape.


   y = [1 / (√(2π) (X−θ) ln GSD3)] exp{ −[ln(X−θ) − ln GM3]² / [2 (ln GSD3)²] }      (2)

    The other three-parameter distribution was the Gamma distribution, with
density

   y = [(X−γ)^(α−1) / (β^α Γ(α))] exp[−(X−γ)/β],        α > 0,  β > 0,  X > γ        (3)

As one can see in the term (X−γ)^(α−1), the parameters γ and β are location and scale
parameters, respectively; α is a shape parameter that is determined from the
skewness coefficient β1. If the shape parameter is an integral multiple of 1/2, say
n/2, and β = 2, γ = 0, the Gamma becomes a chi-square distribution with n
degrees of freedom. As we see in Figure 4, the Gamma distribution can take on a
variety of shapes with changes in α. For α < 1, it becomes "J-shaped", that is,
has an infinite ordinate at x = γ; as α gets very large, the shape approaches that
of a normal distribution. Although it often isn't apparent in the graphs, the
lower tail of the density curve does come down tangent to the axis.
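The moment fit of this three-parameter Gamma is short enough to show explicitly: the shape comes from the skewness coefficient (for the Gamma, β1 = 4/α), the scale from the variance, and the location from the mean. A minimal sketch, assuming positively skewed data; the function name is ours, not the author's.

```python
import numpy as np

def fit_gamma_by_moments(x):
    """Moment fit of the Gamma density of Equation 3.
    Returns (alpha, beta, gamma_loc): shape, scale, and location."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    mu2 = ((x - m)**2).mean()
    mu3 = ((x - m)**3).mean()
    beta1 = mu3**2 / mu2**3          # skewness coefficient (sign dropped)
    alpha = 4.0 / beta1              # for the Gamma, beta1 = 4/alpha
    beta = np.sqrt(mu2 / alpha)      # variance = alpha * beta**2
    gamma_loc = m - alpha * beta     # mean = gamma_loc + alpha * beta
    return alpha, beta, gamma_loc

# e.g.:  alpha, beta, gamma_loc = fit_gamma_by_moments(daily_values)
```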
    The other distribution considered was the four-parameter Beta distribution
or, as it turned out, the Beta distribution and another four-parameter
distribution in different instances. The Beta is the closest there is to a common
four-parameter distribution, although it's most commonly used in a
two-parameter form. Its density,

   y = (X−A)^m1 (B−X)^m2 / [B(p,q) (B−A)^(p+q−1)],       m1 = p−1,  m2 = q−1        (4)

13-4

-------
depends on two parameters, A and B, that are very clearly the minimum and
maximum, and two others, p and q, that are symmetrical, representing a sort of
relative contribution from the two ends. The shape of the distribution and its β1
and β2 coefficients are functions of the latter two parameters, and are quite
flexible. The various distributions in Figures 1 and 2 are all Betas. The bounded
nature of the Beta and the p-q symmetry make it a sort of continuous analogue
of a binomial random variable, and it is most often used as a Bayesian prior
distribution for the binomial probability, with the max and min
parameters set at 0 and 1.
    Actually, the Beta distribution is specifically a Type I member of the system
of theoretical density curves developed by Karl Pearson many years ago.
Pearson's system is an attempt to provide a theoretical density function for
every point in the β1, β2 plane, that is, for every possible combination of the
skewness and kurtosis coefficients. As we see in Figure 5, his system consists of
three "main types", I, IV, and VI, and a number of "transition types". The three
main types are four-parameter distributions, representing the areas in Figure 5,
while the transition curves are represented by lines and points at the boundaries
of the areas. The three-parameter curves, such as the Gamma distribution, which
is Pearson's Type III curve, appear as lines in the plane, while the two-parameter
Normal distribution is only a point. Thus if one is fitting Pearson curves to data,
the correct main type to be used depends on the values of β1 and β2; if we try
to fit the wrong one, we get imaginary numbers during the calculation process.
As it turned out, the Philadelphia data used here, as often as not, fell outside the
range of the Beta distribution, so we also included Type VI when needed. There
developed no need for Type IV, though there very well might have been. The Type VI
density function is most simply written in terms of an arbitrary origin (Equation
5). A more complicated formulation, with the data expressed in their normal
way, is more logical (Eq. 6). In either case, it's apparent why Type VI is not
often used.
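In practice the choice among the main types can be automated with the classical criterion κ given by Elderton and Johnson (1969). The routine below is only a sketch of that rule; the boundary cases are handled crudely, and the function name is ours, not from the paper.

```python
def pearson_main_type(beta1, beta2):
    """Choose the Pearson main type from the skewness and kurtosis coefficients.
    The Type III (Gamma) line is 2*beta2 = 3*beta1 + 6; the Normal is the point (0, 3)."""
    d = 2.0 * beta2 - 3.0 * beta1 - 6.0
    if abs(d) < 1e-9:
        return "III (Gamma)"
    kappa = beta1 * (beta2 + 3.0)**2 / (4.0 * (4.0 * beta2 - 3.0 * beta1) * d)
    if kappa < 0.0:
        return "I (Beta)"
    if kappa > 1.0:
        return "VI"
    return "IV"   # 0 < kappa < 1; kappa = 0 is the Normal point, kappa = 1 is Type V

# Station 1, 1960 of Table I has beta1 = 1.75, beta2 = 5.85, which falls in Type VI:
print(pearson_main_type(1.75, 5.85))
```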
    We note in closing that the lognormal densities, though not a part of
Pearson's system, can still be represented by a line in the β1-β2 plane.

TYPE VI DENSITIES

   y  ∝  (X* − A*)^q2 (X*)^(−q1)                                                  (5)

with X* and A* referred to an origin at (q1 − 1)A*/(…);

                                                                     13-5

-------
with origin at 0 μg/m³,

   y  =  Γ(q1) X0^(q1−q2−1) (X − X0)^q2 X^(−q1) / [Γ(q2+1) Γ(q1−q2−1)]            (6)
-------
to be, however; the symmetric  normal fits the skewed data so badly that the
effect of the poor  fit dominated the  effect of the  roughness  of the observed
data.  This comparison did, however, provide an interesting way of viewing the
process of choosing a distribution. With the value of the criterion for the normal
being 100%, the choice of any of the skewed distributions would reduce this to
50-70%, while the distributions typically differed among themselves by less than
10%, and rarely by more than 30%.
    In fact, when we look at  the results in Table III, it's  apparent  that there is
frequently very little choice among the distributions. The figures in the table are
the values of the absolute difference criterion at the 20 μg/m³ level of
aggregation; to  provide  a simple  summary,  averages for each station and for the
whole  group are  provided. It's clear, first  of all, that the normal  distribution
doesn't fit,  which  is of course no  surprise.  It is also apparent that the  two
lognormal distributions have noticeably  lower  values than the  Gamma and the
four-parameter  Pearson. It is  not really  clear that the differences between the
two distributions in each pair are real because position changes of 1 to 2 units or
so did  sometimes occur just in changing  from one  to another of the various fit
criteria.
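The exact tallying of the criterion is described on the preceding page; as a rough guide, the sketch below computes one plausible version of it, the sum of absolute differences between observed and fitted class counts in 20 μg/m³ classes. The helper names are invented, and this is not the program actually used for Tables II and III.

```python
import numpy as np
from scipy import stats

def abs_diff_criterion(data, cdf, width=20.0):
    """Sum of |observed - fitted| class counts for classes of the given width."""
    data = np.asarray(data, dtype=float)
    edges = np.arange(0.0, data.max() + width, width)
    observed, _ = np.histogram(data, bins=edges)
    expected = len(data) * np.diff(cdf(edges))
    return np.abs(observed - expected).sum()

# Example against a 2-p lognormal fitted by its geometric parameters:
x = np.random.default_rng(1).lognormal(np.log(140), 0.4, 363)
gm, gsd = np.exp(np.log(x).mean()), np.exp(np.log(x).std())
value = abs_diff_criterion(x, stats.lognorm(s=np.log(gsd), scale=gm).cdf)
```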
    Because  the  data  used here are quite a narrow sampling  from the many
monitoring  sites  (with their  possibly-different  distributional shapes),  we
probably want to attach relatively little weight to choosing an overall  "winner"
among the distributions, and rather consider just why the various distributions
react the way they do  to the different  data sets. In Table  I are tabulated the
fitted values of  the  parameters of the various  distributions, and with  this
information in  conjunction  with Table III and some figures we'll consider the
performance of the various distributions.
    First we consider the  Pearson curves, the Gamma distribution and the  two
types of four-parameter curves.  They are usually closer together than  they are
close  to the performance  of  the two lognormals. But except for a  very  few
instances, they  aren't too  far  from the lognormals, at least in comparison with
the normal. The  extremely  bad  instances are the  cases where  one or both of
them becomes J-shaped. Recalling that in the β1-β2 plane, the Gamma represents
the dividing line between the Type I and Type VI areas, we can view the Gamma
as an approximation to either. The equation of the Type III, or Gamma, line is
2β2 = 6 + 3β1, so we would expect the approximation to be better, the closer β1
and β2 for our distribution are to that line.
    In  fact,  the  Gamma and  the Type  I or VI are  quite  close for most of the
data sets (or station-years), and those that are farthest apart are in fact those
farthest from the  line. Considering the data  sets where the Gamma and the
four-parameter  are  the   most  different, we find  that   those  where  the
four-parameter is better (1-60,  1-65, 2-65, 3-65)  are  all  in the Type VI area,
while those where the Gamma distribution does better (1-68, 2-62, 2-67, 2-68,
3-68)  are  all in the  Type  I  area. In  fact,  with  a single exception, closer
examination  shows this to be true for all the data sets.  Except for 3-66, the
                                                                      13-7

-------
Gamma always does better than the Type I and worse than the Type VI, though
in some cases  the  differences are trivial. It's not immediately clear why this
should be, and given the fact that the years 1965 and 1968 occur so often in the
lists above, it is likely nothing more than the fact that a year's meteorology may
give a distinct shape to the distributions at the several stations.
    Figure 6 presents histograms and the fitted curves for two examples of cases
where the Pearson Type VI does much better than the Gamma. Figure 7 presents
two cases where the Gamma does better than the Pearson Type I. In Figure 6 it's
clear why the  Gamma does poorly - it makes a much larger peak than necessary,
and then underestimates on the downslope of the upper side of the distribution.
In contrast, in Figure  7, where sharper peaks are needed, the Pearson Type I
tends to exaggerate the  peak or, as in the 2-62  case,  even to  go off to  an
infinite  ordinate.  In these cases, the curves  are  almost identical on the upper
side of  the peak. On the lower side, the Gamma does better with its moderate
peak, though  neither  does really  well. Interestingly, the factor which controls
these  cases is  hardly evident in the resulting curves. Those where the Type  VI
does better than the Gamma are typically those with high maxima, or long tails,
and those where the Gamma does better than the Pearson Type I are those with
quite short tails, those with few or no values over 400 μg/m³. The 1-65 case in
Figure 6 is a good illustration. The Gamma comes more steeply  down the upper
side of the curve in order to throw more probability mass out into the tail, and
in order to do so, builds itself a higher peak,  losing good fit not only in the peak
but on the downslope  as well.
    Before considering the two  lognormal  distributions more thoroughly, it
might be of value to consider when and why  the two pairs of distributions differ
more or less.  Although the two Pearson distributions are poorer overall here,
they have a number of possibly-useful  properties,  and ought not be discarded
outright. Clearly, the situation where  they differ most is  in those high-skewness
cases where the Pearson curves peak wildly or become J-shaped.  No matter what
the skewness,  the lognormals retain their zero ordinate at the lower boundary,
and hence fit much better.
    The two pairs, however, do sometimes differ fairly markedly even when the
Pearson curves don't have infinite ordinates. Data set 3-68 in Figure 8 is an
example where the two lognormals are better, and set 3-66 in Figure 9 is one where
the two Pearson curves are better. These two sets of data illustrate fairly well the
typical result in those situations when there is a noticeable difference among the
distributions, typically the cases where the distributions' tails are moderate: well
out into the 300-400 μg/m³ range, but not into the 600 μg/m³ range. On the
upper side of the distribution, there is very little difference among the several
distributions.  To  accomplish this,  the two  Pearson  curves  typically  have
somewhat sharper peaks than the two lognormals, and more abrupt slopes on the
lower tails.  Thus if the observed distribution has a clean, sharp  peak as  does
3-66, the Pearson curves fit better, while if  it has  a broader peak as in 3-68 or
2-64, the lognormals fit better. Actually, in these data, the former is a rarity; the
 13-8

-------
3-66 data set is quite unusual, being the only case where the Pearson Type I is
the best  of the four, and the only exception to the pattern of the Gamma doing
better than the Type I, regardless of the rank.
    If  we consider the differences between Figures 8 and 9, we would rather
expect that those observed  distributions with  peaks, in some sense, midway
between  would produce roughly the same fit with all the curves. This is precisely
what happens, higure ID shows  two data sets  tor which the results are quite
close, with only the best and worst of the four distributions plotted. Their peaks
do seem midway, being a little more blocky on the high than on the low side. In
addition, there is less noise than  in some of the distributions, leaving less oppor-
tunity for one of the distributions to  appear  better or worse, fortuitously. In
fact, the 3-62 data set can be considered the best in this sense, because by a clear
margin, it had the lowest overall value of the fit criterion.
    There are several such data sets where all four distributions fit about equally
well;  in  about half of all the sets, three  of  the four could  be described as
essentially tied. The  balance of cases on either  side of this center is far from
even, though; there are several data sets with square peaks, while only one (3-66)
with a sharp peak. It is largely this predominance of relatively square,  blocky
peaks  (low kurtosis)  that gives the lognormals  their overall advantage, at least
among these  data. And at this  point  it might be  prudent to point out that
shafper  peaks often seem to go with overall lower concentration levels.  This is
the case  here, and since this is relatively high data (especially for the early years),
                                       r
we might expect somewhat different overall results with other data.
    We now turn to a closer look at the differences and similarities between the
two lognormals. The first general observations are most readily seen in the list of
parameters in Table I, comparing the geometric mean and standard deviation
with their analogues in the three-parameter model, labelled GM3 and GSD3. The
location of the distributions remains strikingly constant: the sum of the
threshold location parameter θ and GM3 is rarely more than 3 or 4 μg/m³ from
the two-parameter geometric mean. The parameter θ takes on both positive and
negative values, many near zero but with a few sizeable cases in each direction.
We also note that when the value of θ is negative, GSD3 is less than GSD, and
vice versa, in rough proportion to the magnitude of θ.
    In terms  of performance,  we've  already  noted that the two-parameter
lognormal does overall slightly better than  the three-parameter, and in fact does
the best  of all four distributions  tried. This is, of course, in conflict with what
might  be expected, since with only two parameters, it should have the  least
flexibility. We've  seen   why this  happens,  though.  The  three-parameter
lognormal, the  Gamma,  and the  four-parameter  Pearson  are in order  of
increasing flexibility, but they use this flexibility  primarily to adjust  to  the
upper tail of the distribution.  While the 3-p lognormal can't go off to a wild peak
or an  infinite ordinate even remotely as easily as the Pearson curves, it can do so
more  readily than  the 2-p lognormal, as  in  Figure 11.  This permits the  2-p
distribution to "win" by  default, even though  it fails  badly  to fit  the blocky
                                                                       13-9

-------
peak. Thus  the success of the lognormals is a bit left-handed, as it were. They
succeed not because they are sensitive to the bulk of the data but simply because
they are insensitive to the upper tail, and are tied down at the lower tail.
    Figure  12 also illustrates this  peaking tendency, but  here the result is the
opposite;  the observed  distribution (2-62)  has  a very  sharp  peak,  and the
flexibility of the 3-p lognormal  reaches it a little better, though neither does
really well. It is also possible to associate this tendency toward high peaks and
long tails with the value of θ. The larger positive values are associated with those
that have shorter peaks and shorter tails than their 2-p counterparts, such as 3-66
in Figure 9.
     An attempt to summarize is not as difficult  as it first appears. The overall
level of performance is the reverse of what would be expected on the basis of the
number of  parameters  involved  in the distributions,  largely because of  the
sensitivity of the distributions to the extreme upper  tail. As a consequence,
square, blocky distributions can't  be fit well unless they have a very short tail.
And I should add as a closing footnote, though this isn't  the place to discuss it,
that this  sensitivity to the extreme upper tail is characteristic of fitting by the
method of  moments. Thus the first place to look for improvements is likely in
less touchy, though possibly less simple, fitting methods.
 13-10

-------
     TABLE I(a)
Fitted Parameter Values
Sta/Yr
1 1960
1961
1962
1963
1964
1965
1966
1967
1968
2 1960
1961
1962
1963
1964
1965
1966
1967
1968
3 1960
1961
1962
1963
1964
1965
1966
1967
1968
9 1968
11 1968
Mean
169
167
156
167
174
166
174
152
136
142
137
136

137
141
147
127
118
151
153
138
152

134
138
131
117
123
152
MOMENTS
Std.
Dev. ]3i
71
74
58
80
69
68
69
69
55
58
54
55

56
56
61
57
49
67
70
53
65

65
67
64
56
59
74
1.75
2.84
1.35
5.81
1.65
2.88
1.54
1.99
1.43
2.19
2.08
2.96

1.71
2.56
2.52
1.85
1.61
2.18
2.14
1.33
1.70

3.87
1.61
1.71
1.78
1.42
1.58
02
5.85
7.28
4.85
12.69
5.47
8.77
5.45
6.08
4.83
6.61
6.07
6.88

5.36
7.90
7.18
5.38
5.05
6.60
5.93
4.99
5.37

11.76
4.92
5.44
5.37
4.57
4.87
Geo.
Mean
156
153
147
153
162
154
162
139
125
133
128
128

127
132
136
116
109
138
139
129
140

122
124
117
105
110
137
LOGNORMALS
GSD GM3 GSD3
1.50
1.51
1.43
1.51
1.46
1.46
1.46
1.53
1.49
1.46
1.45
1.43

1.47
1.44
1.46
1.53
1.49
1.52
1.53
1.45
1.50

1.55
1.59
1.60
1.60
1.62
1.59
158
128
146
95
157
117
163
142
136
113
109
93

125
102
111
123
114
133
140
135
147

95
155
143
123
147
171
1.49
1.63
1.43
1.87
1.48
1.63
1.46
1.53
1.44
1.55
1.54
1.64

1.49
1.60
1.59
1.51
1.47
1.55
1.54
1.43
1.48

1.72
1.47
1.49
1.50
1.44
1.47
0
-2
23
0
57
4
34
-1
-3
-10
18
17
31

2
27
23
-6
-5
4
-1
-6
-7

24
-30
-24
-17
-34
-32
                                      13-11

-------
                             TABLE I(b)
                        Fitted Parameter Values
Sta/Yr
1








2








3








9
11
1960
1961
1962
1963
1964
1965
1966
1967
1968
1960
1961
1962
1963
1964
1965
1966
1967
1968
1960
1961
1962
1963
1964
1965
1966
1967
1968
1968
1968
a
2.29
1.41
2.96
0.69
2.42
1.39
2.59
2.01
2.79
1.82
1.92
1.35

2.33
1.56
1.58
2.15
2.48
1.83
1.87
3.00
2.36

1.03
2.48
2.34
2.24
2.82
2.53
GAMMA
&
47.1
62.7
33.6
96.7
44.4
58.0
42.7
48.6
33.1
42.7
38.9
47.5

36.4
45.0
48.1
39.0
31.4
49.7
51.3
30.5
42.4

63.4
42.6
41.7
37.6
35.4
46.2
7
61
79
57
101
67
85
63
55
43
65
62
72

52
71
70
43
40
60
57
46
52

69
32
33
33
23
35
Criterion
4.04
87.89
PEARSON
X. (VI) A
1553
30908
-3.82
5.46
1396
-104.
1.30
5
11
-2
4
-24
-3

-4
1
.74
.92
.31
.06
.70
.49

.48
.51
3.89
227
2346
4493

1134




272
1064
-2.49
-2
3
-4
-48
-5

1
-1
.32
.85
.31
.48
.24

.01
.71

1249




26

-6.99
-3
-1
-1
.19
.31
.65



50
78
64
93
67
43
56
52
55
56
63
80

58
44
61
54
50
49
65
47
58

-11
50
38
41
44
56
TYPES I
m1/q1
44.03
500.5
1.32
19.93
1.39
16.57
66.60
101.84
0.87
36.07
0.84
-0.06

0.86
17.79
30.87
0.43
0.64
34.73
0.44
1.94
0.94

47.32
0.43
1.01
0.63
6.86
9.73
& VI
n\2/c\2
2.06
0.43
38.83
-0.04
1001.
4.44
2.14
1.21
19.86
1.47
184.1
13.87

35.97
3.42
1.21
15.99
17.37
1.53
26.74
575.1
43.49

38.32
11.48
59.26
22.92
0.44
0.44
B(l)
1731

1731

44973



1034

7569
1040

1703


996
876

1839
17954
2302


903
2901
1228
922
1122
13-12

-------
                              TABLE II
                Typical Summary Tables With Criterion At
                    5, 10, and 20 μg/m³ Class Widths

STATION 2, 1964
                      Normal      2-P Log     3-P Log     Pearson      Gamma
        K SUM        360.5632    363.0000    363.0000    363.0208    362.8891
5 μg/m³    DIFF      146.4098    105.9059    106.0737    120.8522    114.4267
           REL         1.0000      0.7234      0.7245      0.8254      0.7816
10 μg/m³   DIFF      125.9653     66.1073     67.0827     86.2695     80.6849
           REL         1.0000      0.5248      0.5325      0.6849      0.6405
20 μg/m³   DIFF      125.6084     38.8479     39.1794     62.3205     53.4492
           REL         1.0000      0.3093      0.3119      0.4961      0.4255

STATION 3, 1961
                      Normal      2-P Log     3-P Log     Pearson      Gamma
        K SUM        320.3271    324.9995    324.9990    325.7625    324.5805
5 μg/m³    DIFF      168.2665    109.4671    110.7158    121.5336    114.4060
           REL         1.0000      0.6506      0.6580      0.7223      0.6799
10 μg/m³   DIFF      148.1114     79.7891     81.1142    101.8797     90.2807
           REL         1.0000      0.5387      0.5477      0.6879      0.6095
20 μg/m³   DIFF      126.3808     39.1026     39.2929     56.9197     53.0919
           REL         1.0000      0.3094      0.3109      0.4504      0.4201
                                                              13-13

-------
                             TABLE III
          Summary of Total Absolute Deviations (20/ig/m3 classes)
Normal
Dist.
Station
1







Station
2







Station
3







Station
9
Station
11
1960
1961
1962
1963
1964
1965
1966
1967
1968
Average
1960
1961
1962
1963
1964
1965
1966
1967
1968
Average
1960
1961
1962
1963
1964
1965
1966
1967
1968
Average
1968

1968
Average
109.8
135.1
129.9
159.4
119.5
129.1
129.0
131.0
120.3
129.2
114.2
125.7
177.2

125.6
119.5
143.5
143.7
134.5
135.5
108.6
126.4
106.0
122.9

123.8
134.9
131.2
136.9
123.8
94.0

60.2
125.6
Lognormals
2-P 3-P
43.4
56.6*
45.0*
60.8*
59.3*
59.0
52.2
51.2*
60.2
54.2
46.6
41.6*
82.2

38.8*
43.5*
43.3
59.8*
60.4*
52.0
47.3*
39.1*
27.7*
56.4

56.0
58.6
53.7
61.1*
50.0
60.3

31.6*
51.7
43.2*
62.6
46.7
93.1
60.3
55.9
52.4
52.5
59.4*
58.5
39.5
46.1
64.4*

39.2
46.5
35.8*
63.6
61.3
49.6
47.8
39.3
28.1
57.8

54.6*
66.8
53.8
65.3
51.7
58.9*

36.0
53.0
Pearson
4-P Gamma
45.6
109.3
49.7
131.6
70.9
55.1*
48.3*
60.6
86.6
73.1
36.4*
69.3
105.7

62.3
48.3
52.1
75.6
80.9
66.3
57.2
56.9
29.6
48.0

57.4
50.0*
52.2
82.5
54.2
72.8

36.9
64.1
54.0
110.1
49.4
210.6
70.5
84.8
48.4
63.9
69.3
84.6
39.1
68.1
68.2

53.4
63.5
67.0
62.9
68.9
61.4
66.7
53.1
29.1
45.1*

89.4
51.7
49.7*
73.1
57.2
60.7

32.0
66.8
13-14

-------
Figure 13-1. Effect of skewness β1 on distribution shape (probability density curves).
                                                    13-15

-------
Figure 13-2. Effect of kurtosis β2 on distribution shape (curves share x̄ = 150.0,
      variance = 3000.0, β1 = 1.0; β2 = 3.0 corresponds to the normal).
13-16

-------
Figure 13-3. Change of shape of the 2-p lognormal as the geometric mean varies.
                                        13-17

-------
Figure 13-4. Variation of Gamma distribution with shape parameter α
      (β = 40.0, γ = 0; concentration in μg/m³).
13-18

-------
Figure 13-5. The Pearson system of curves in the β1 (skewness) - β2 (kurtosis) plane.
                                                                   13-19

-------
               Figure 13-6. Pearson VI does better than gamma (Station 1-1965 and
                     Station 3-1965; concentration in μg/m³).
13-20

-------
          Figure 13-7. Gamma does better than Pearson I (Station 1-1968 and
                Station 2-1962; concentration in μg/m³).
                                                           13-21

-------
Figure 13-8. Lognormals do better than the Pearson curves (e.g., data set 3-68).
13-22

-------
Figure 13-9. Pearson curves do better than the lognormals (data set 3-66).
13-23

-------
               Figure 13-10. Cases where all distributions fit (Station 3-1962:
                     2-P Log and Pearson I; Station 3-1967: Gamma and 3-P Log).
13-24

-------
Figure 13-11. 2-P lognormal does better than 3-P.
13-25

-------
Figure 13-12. 3-P lognormal does better than 2-P (data set 2-62).

-------
Acknowledgement
    My appreciation  is  gratefully  extended to  the  Harvard Department of
Statistics for their support of the computing necessary for this effort, and to the
Philadelphia Department of Public Health for permission to use their data.
References

Elderton,  W. P., and Johnson,  N.  L.,  1969: Systems of  Frequency Curves,
      Cambridge University Press.
Hunt, W.  F., Jr., 1972: The Precision Associated with the Sampling Frequency
      of Log-Normally Distributed Air Pollution Measurements. J. Air Pollution
      Control Association. 22: 687.
Johnson, N. L., and Kotz, S., 1970: Continuous Univariate Distributions, Vol. I
      and  II. Houghton-Mifflin.
Phinney, D. E.,  and  Newman, J. E., 1972:  The precision associated with the
      sampling frequencies of total particulate at Indianapolis, Indiana. J. Air
      Pollution Control Association. 22: 692.
Singpurwalla,  N. D.,  1972: Extreme  Values from a  Lognormal  Law with
      Applications to Air Pollution Problems, Technometrics. 14: 703.
Spirtas, R.  and Levin,  H.  J.,   1970:  Characteristics of  Particulate  Patterns
      1957-1966. PHS, NAPCA, Publication AP-61.
DISCUSSION
Larsen: As you mentioned in the beginning, Dave, there are several distributions
that could be used for  expressing skewed data.  Now that  we are  considering
controlling air pollution, there is one simple characteristic where the lognormal
is handy for the man in the regional office. For instance, if we have a lognormal
distribution of  SO2 concentration and we use a control plan that reduces all
sources by 90%, then we expect the new distribution to be parallel and  just
down one log cycle on the plot. So the lognormal provides a very simple way to
predict  what is going to happen to a future distribution based on  the  present
distribution. Also it gives a physical feeling of what might happen if, for instance,
only tall sources were reduced, maybe all the power plants in the region, and all
the low, area sources were  not controlled. You might tend to chop off the
highest observed concentrations to give a new distribution with a shallow slope.
Or  if only all  the  low sources  were  controlled, a  steeper slope would be
expected.
                                                                    13-27

-------
Lynn: There's not much I would say; I certainly agree that nothing much fancier
than the lognormal  is ever going to  be what we call handy for that type of field
calculation.  I think  that the place of more complicated distributions is in fitting
large  bodies of data as entire bodies of data, rather than making assumptions
about control strategies. And of course the case when  one wants to do some
mathematics which  the lognormal may not be amenable to, or just that nobody
has done that particular derivation yet.  I  hesitated  there  in saying whether
another  distribution would ever  be of use in this  field  because  I  think the
extreme  value distribution can be handy and do well if one is interested only in
predictions about the maximum levels.
Hershfield: We have been talking a  great deal about the  2-parameter lognormal
distribution  but we haven't mentioned some important characteristics  of the
distribution. We have a distribution which is  skewed to the right. It is  outlier
prone. If you plot  the coefficient of skew (Cs) on the  vertical axis versus the
coefficient of variation (Cv) on the horizontal axis, the lognormal  distribution
has the relationship given by the equation Cs = 3Cv + Cv³. Some mention has
been  made of the extreme-value or  Fisher-Tippett type I (Gumbel) distribution.
Gumbel, in his work, postulates a coefficient of skew of 1.139, and at a Cv of about
37% the estimates from the lognormal and the Gumbel distributions are equal.
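The relation quoted here can be checked numerically. The short sketch below is not part of the original discussion; it simply evaluates the lognormal skew 3Cv + Cv³ and locates the coefficient of variation at which it matches the skew of the type I (Gumbel) extreme-value distribution, taken as about 1.14 (1.1396).

```python
# A minimal numerical check of the relation quoted above, Cs = 3*Cv + Cv**3 for
# the 2-parameter lognormal, and of the Cv at which this skew matches the
# Fisher-Tippett type I (Gumbel) skew of about 1.14.

def lognormal_skew(cv):
    """Coefficient of skew of a 2-parameter lognormal with coefficient of variation cv."""
    return 3.0 * cv + cv ** 3

GUMBEL_SKEW = 1.1396          # skew of every type I (Gumbel) extreme-value distribution

# Solve 3*Cv + Cv**3 = 1.1396 for Cv by bisection on (0, 1); the relation is increasing.
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if lognormal_skew(mid) < GUMBEL_SKEW:
        lo = mid
    else:
        hi = mid

print(f"skews match at Cv = {lo:.3f} (about 36-37 percent)")
```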
13-28

-------
      14. A PROPOSED AMBIENT AIR QUALITY SAMPLING
      STRATEGY AND METHODOLOGY FOR THE DESIGN
                OF SURVEILLANCE NETWORKS
        JOSEPH R. VISALLI AND DAVID L. BRENCHLEY

               Environmental Engineering Laboratory
                     School of Civil Engineering
                         Purdue University
                      West Lafayette, Indiana

                                and
                       HOWARD REIQUAM

                     Battelle Memorial Institute
                          Columbus, Ohio
Introduction

    Determination of the concentration of a specific pollutant in the ambient
air over a defined area  is a complex problem because of spatial and temporal
variations in air quality. Many factors such as local topographical differences,
atmospheric chemistry  phenomena, fluctuating  emission rates, non-stationary
sources, and varying meteorological conditions contribute to these  variations.
This space-time dependence complicates immensely the  acquisition  of statisti-
cally accurate estimates of true air quality. This in turn confounds the inter-
pretation and utility of the data.
    The Federal Clean Air Act of 1970 requires the monitoring of ambient air
for various pollutants. Well defined and vigorous federal directives  have been
issued  indicating  which pollutants are to be  monitored,  and  the preferred
reference testing methods and procedures for measuring these pollutants (Anon.
(1971)). In contrast,  the guidelines  for  the  establishment  of air quality
surveillance networks are essentially of a  subjective nature as implied by the
following quotation:  "Experience and technical judgment  are  essential for
determining the  number and  location  of  sampling  sites because adequate
mathematical models or other models have not been  formulated"  (E.P.A.
(1971)). The importance and high costs associated with ambient air monitoring
                               14-1

-------
demand that mathematically rigorous methods for determining the numbers and
placement  of sampling  instrumentation be developed, such that the precision
associated with the test methods and procedures is of equivalent magnitude to
that of  the  actual  data  collection. It  is  of  little value  to take  precise
measurements in  a subjective manner. The result is high cost data, having poor
reliability and statistical validity, and of  little utilitarian  value  in determining
compliance with Federal standards.
The Problem

    Federal air quality regulations essentially limit the concentration of specific
pollutants to certain levels at all locations within defined areas (Anon. (1971)).
    Basically then, the problem is reduced to one of finding the point or points
within the defined areas at which the concentration of the specific pollutant is
the greatest. Noting  however,  that an infinite number of  points exist  in the
defined  space,  and  that  the  concentration  levels  vary  both  spatially  and
temporally,  the deterministic  approach  of  finding the  points where  the
concentration is the highest is  a practical impossibility. Instead, a probabilistic
approach is indicated. A random sample of pollutant concentration within the
space over the defined area should be taken. This sample must take into account
both space and  time factors. From the resultant distribution, an estimate of the
probability that a certain level of pollutant concentration has been exceeded can
be  made. Two  problems arise as  a result  of  choosing this approach. One is
concerned with the mechanics of designing and conducting a  simple random
sample  with respect  to an  infinite population that is a function of time and
space.  The  other  pertains to a strategy and methodology that will enable an
unbiased estimate  of  the  mean and variance of pollutant concentration to be
calculated, with pre-determined accuracy and confidence in the statistical sense.
An approach to the first problem can  be devised  by taking into account the
nature of the turbulence regime at the microscale level. To accomplish the latter
requires an  advance estimate of the sample size needed for assurance that the
desired  degree of precision is attained (Cochran (1963)). This in turn requires an
advance estimate of the variance  of the random variable under consideration.
With  this  advance information, a  random sample  of sufficient  size can be
conducted,  such that an  accurate  and precise estimate of the  air pollutant
concentration distribution over the defined  area can be made. The probabilistic
question of whether  air quality standards have been  exceeded for the sampled
area can  then  be asked.  More important  is that  with such a procedure, the
question  can  be  answered  with  a  mathematically  determined  degree of
confidence.
 14-2

-------
The Strategy and Methodology

    Strictly  speaking, federal standards can be  interpreted to include not just
ground level concentrations,  but concentrations as a function of the vertical
coordinates  up  to the depth of the mixing layer.  To accomplish such a  task
would  require,  as  will  soon  be evident, a  virtual armada  of airplanes or
helicopters, or a remote sensing method. The primary purpose of establishing
federal standards is to protect public health and welfare. Thus we feel it can be
reasonably  argued that  the  interests  of  public health  and  welfare will be
adequately served if pollutant concentration is monitored with respect to both
space and time at people (ground) level only. Dispensing with the vertical aspect
brings the problem  into  the realm of practicality as far as economics, and the
satisfaction of sampling criteria  (to be discussed later) are concerned. The scope
of the  problem,  even with  this significant deletion,  remains however, beyond
practical consideration at this point. The results of  the Nashville study (Stalker
et al.  (1962)) revealed that  at least 245 sampling stations, approximately  four
per square mile, would be required to estimate the daily mean concentrations of
sulfur dioxide for  the whole city  with  95%  confidence  of  ±20% accuracy.
Establishment of this many  monitoring stations is probably beyond the financial
capabilities  of  most  communities.  The only  alternative then, if statistical
accuracy and precision are  to be maintained, is to consider that only  certain
sub-areas within the whole bounded area require air monitoring. These sub-areas
would be specific sectors of the whole bounded area, that by some subjective or
objective  criteria  are  judged  to  contain  the highest   levels  of  pollutant
concentration.1  Effectively  then, we would be utilizing the statistical technique
of stratification, but departing  from the usual method of  analysis. Ordinarily,
sampling would be conducted in randomly selected stratified sub-areas, and an
estimate of the whole area  mean value for some variable calculated from these
samples. In this approach, we purposely select the stratified sub-areas judged to
contain the  highest values of the variable under consideration, and attempt to
1A certain amount of subjectivity is inherent in most engineering approaches or solutions to
 problems.  This  is  due  to the  two classical compromises between  mathematical
 requirements and technical capabilities, and between this resolution and available financial
 resources.  It will  become  evident later, that  the  approach  taken  in  the proposed
 methodology would  theoretically  eliminate all  subjectivity if the imposed economic
 demands could be met. In any case, deciding what constitutes a critical area is a complex
 question that can be approached in any one of, or a combination of the following ways. It
 could  be based on  existing  air  quality isopleth  data, simulation dispersion  model
 predictions, attitudinal-behavioral studies of the citizenry, demographical-medical-specific
 area data,  wind  rose data in conjunction  with industrial location distribution, or prior
 complaints. The point is that some criteria can be developed.
                                                                         14-3

-------
establish from  the  sampling of these sub-areas an upper bound to the variable.
We can then  conclude that the remaining sub-areas, based on the developed
criteria, will be enjoying better air quality (the variable) than that indicated by
the established upper bound.2 With this approach, the scope of the problem is
within the limits of practical economic consideration, the legal aspect is satisfied,
and the mathematical accuracy and precision that is desired can be attained.
    Consider  one  of  the  sub-areas discussed earlier. It is a defined area of
constant (with time)  topographical features.  Assume a hypothetical  situation
where the  emissions polluting this sub-area are of  constant strength  and flow
rate.  If the meteorological factors that influence the transport and dispersion of
these pollutants are also assumed to remain constant with time, then the mean
and variance of pollutant concentration across the sub-area will remain constant
with  time. Larsen (1971) indicates such a situation for a single point. One can
extrapolate the concept to  cover an area simply by taking the average of many
single points over the area.
    Consider now  the  hypothetical case where emission factors polluting this
sub-area remain  invariant  with  time, but the meteorological  conditions  are
changing with time. During time periods when the meteorological conditions are
similar, our previous argument of a constant mean  and variance of pollutant
concentration  is valid. That is to say,  varying  meteorological conditions can be
classified  according to some finite scheme that attempts  to categorize these
varying conditions  into specific meteorological regimes (e.g. Pasquill's scheme).
If these regimes  are  described  by  the factors that influence transport  and
dispersion phenomena, then during periods when a specific regime is dominating
the weather,  a specific  mean  and  variance  of pollutant concentration  will
accompany it.
    In reality,  however, emission characteristics will vary with time in a manner
which  is  very  difficult  to  predict.   Thus  even during periods  of  "similar
meteorology",  the  mean pollution concentrations will differ. However, since the
variance is not directly proportional to the mean it is reasonable to hypothesize
that  the variance of pollutant concentration will  remain  constant with time
during periods of "similar meteorology". Effectively  we are hypothesizing  that
since  the variance of  a group  of numbers  depends only  on their relative
magnitudes,  and  not  their  absolute  values, the  variance  of  pollutant
concentration  is not  dependent  upon the  mean  pollution level. Hence the
variance is independent of emission factors, and the original argument for the
variance of pollutant  concentration  holds.3 Thus, differing concentrations of
2Considering the extreme complexity of the problem, it is entirely possible that points in
 the sub-areas not sampled could violate the standards. Nothing can be done about this
 problem using  this methodology.  Thus it is important  to consider this aspect  when
 determining the sampling criteria.
3The strategy and methodology developed in this paper does not consider the effect of
 non-stationary emission sources such  as automobiles. Hence this approach  is applicable
 only to those pollutants (such as SO2) emitted primarily from stationary sources.
14-4

-------
pollutants will be dispersed over a given area in a similar manner during periods
of "similar meteorology".
     This hypothesis parallels  closely the  statistical concepts of homogeneity
and stationarity  as applied  to horizontal turbulence  levels  over  an  area.
Stationarity  essentially means that  the statistical characteristics which  describe
the  frequency distribution of the horizontal turbulent eddies remain constant
with time. Thus the mean, variance, and the rest of the statistical moments that
characterize  the frequency distribution remain constant with changes in  time.
Homogeneity  implies  that the  frequency  distributions  of  these turbulent
eddies are the same throughout the area under consideration. Dispersion models
derived from both  gradient transport and  statistical theory must employ  these
concepts in  order to solve their respective  equations. It is  well known  that the
statistical  properties of  turbulence vary  in the vertical  direction. However,
horizontal turbulence characteristics at constant height do fulfill to a certain
extent, the statistical  concepts of homogeneity and stationarity provided that
major  changes in the meteorological regime do not occur (U.S. Atomic Energy
Commission  (1968)).4 The first hypothesis contained in our methodology states
that  a  stationary  process will occur,   provided  that  no  changes in the
meteorological  characteristics  that  govern  the  transport  and  dispersion
phenomena  occur.  This  is a  less restrictive stationary process  than  the one
postulated for the turbulence regime.  We  have  proposed the concept  that the
variance of pollutant concentration, not the mean, will remain  constant with
time under specific conditions. In many stationary processes, assuming the mean
of a random variable to remain constant with time is a questionable assumption.
The stationary  process we propose does not require the mean value to remain
constant over time, and thus is less restrictive than the typical case.
     Consider an infinitesimal section of a defined area. The variance of pollutant
concentration will  be  very small, and  in the limit will equal zero as the area
approaches zero. As the  section is  increased in size, the variance will exhibit a
tendency to increase  according to  some function  toward the variance of the
entire  defined  area. A decrease appears  unlikely  because  the  variance will
decrease in  magnitude only  if observations tend to cluster about the actual
mean.  This  indicates that the concentration gradient is tending toward  zero.
Experience demonstrates  the  reverse.  There is a tendency for pollutants  to
accumulate in certain  areas,  creating  large gradients  when  the whole area  is
considered. Thus it  is hypothesized  that an  increasing area-variance relationship
will  exist. This is also in accordance with turbulent theory, which indicates that
the statistical properties of turbulence in the horizontal will tend toward greater
dissimilarity  as larger areas are considered.  Thus the variance between turbulent
eddies will in all likelihood increase as  larger areas are considered (U. S. Atomic
4It must be noted that these concepts are generally assumed to hold for flat plains and
 rolling countryside. Little is known about the effects of irregular terrain.
                                                                       14-5

-------
Energy  Commission  (1968)).  As shown  in  Figure  1,  knowledge  of the
variance-area relationship could  be  combined with  cost  data  and available
financial resources to maximize  the area covered by  the  sampling strategy.
Knowing what  financial  resources are available for sampling purposes and the
cost per sample, we can easily determine the sample size we must work with. If
we select a confidence interval (precision) and a margin of error (accuracy) that
are deemed adequate for the purpose, then from the equation:

        n = (t/d)² (variance) = k₁ (variance)

where:
     n = sample size, if the population size is very large
     t = the abscissa of the normal curve that cuts off an area α at the tails
     d = chosen margin of error
 1 - α = confidence probability
    k₁ = (t/d)², where the value of t depends upon the sample size
 we can  determine an  estimate  of  the variance  that  would  be required  to
 accomplish  our  sampling  task.  Knowing  this  and  the  variance-area size
 relationship,  we can  determine the size  of  the  area we should cover in our
 random  sample  in order to achieve the prescribed precision and accuracy with
 the financial  resources available. We obviously would like to maximize the area
 covered  since this reduces the amount of subjectivity that has entered our
 strategy  and methodology.
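     As an illustration of the sample-size relation above, the following sketch computes n from an advance estimate of the variance, a chosen margin of error d, and a confidence probability 1 - α, and compares it with the number of samples a given budget can support. The numerical values, cost figures, and function names are assumptions introduced here for illustration only.

```python
# A minimal sketch of the sample-size relation n = (t/d)^2 * (variance), together
# with a budget check.  The advance variance estimate, margin of error, and cost
# figures are illustrative assumptions, not values from the paper.
from statistics import NormalDist

def required_sample_size(advance_variance, margin_of_error, alpha=0.05):
    # t is the normal abscissa cutting off a total area alpha in the two tails.
    t = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    k1 = (t / margin_of_error) ** 2
    return k1 * advance_variance          # n, assuming a very large population

def affordable_sample_size(budget, cost_per_sample):
    return budget // cost_per_sample

# Example: advance variance 400 (ug/m3)^2, margin of error +/- 5 ug/m3, 95% confidence.
n_needed = required_sample_size(advance_variance=400.0, margin_of_error=5.0, alpha=0.05)
n_afford = affordable_sample_size(budget=30000.0, cost_per_sample=25.0)
print(f"samples required: {n_needed:.0f}; samples affordable with the budget: {n_afford:.0f}")
```

In practice the advance variance estimate would come from the variance-area relationship for the sub-area under consideration, so that the area covered can be chosen to match the affordable sample size.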
 The Experiment

     The correct method  of  actually obtaining the samples  to be  used  in
 estimating the  mean  and variance  of pollutant  concentration  is not  at  all
 obvious. To apply probability theory to the data with any degree of confidence,
 it is imperative that random sampling techniques be employed (Cochran (1963)).
 Keeping this in mind, one might propose  that to overcome the spatial-temporal
 dependence noted earlier, continuous air monitors should be randomly scattered
 throughout the defined  sub-area.  This  approach  would be  correct  if the
 continuous monitors were not fixed at every  randomly chosen  location. Two
 reasons  dictate  that  continuous monitors should  be continually  moved in
 random fashion, if accurate  estimates of pollutant concentration and variance
 are to be obtained. First, the population (molecules of SO2) is continually
 changing its spatial distribution  with time, and its size (number of molecules) is
 also   continually  changing   with   time   (molecules  are   added,  deposited,
 transformed to other forms, remain with the area, etc.). Essentially then, a
 14-6

-------
new random sample is needed for each  new population.  Secondly, and perhaps
more important, the  response  of any monitoring  instrument placed in an
irregular topographical setting will be biased. This is due to specific effects of
building geometry   on diffusion  parameters.  If  a monitor is fixed  at  one
point—even if randomly located—the response of the instrument will include this
bias on a continual basis with time. Effectively this states that no single point
can  be representative of a large area (Corn (1970)). This is especially true over
relatively short periods of time such as  a day. The Nashville study (Stalker et al.
(1962)) demonstrated  that relatively few instruments were needed to accurately
estimate seasonal  and yearly concentration levels  over  an  area.  This  is a
predictable  result, as long term sampling tends to average out fluctuations and
converge on the true mean.  The study also revealed how inaccurate their results
were when a few instruments were used to estimate 24 hour averages. This was
due, in part, to the  bias encountered from fixed point sampling. Thus, to
continually  "remove" or reduce the inaccuracy due to this time factor bias, one
must continually relocate  the  monitors in a  random  manner with time.
Randomized mobile monitoring is thus essential to proper sampling procedure in
this case.
     In accordance  with  this type of reasoning,  we  should also note  that a
restriction  to  the interpretation of data collected from continuous fixed point
monitoring is indicated. If the purpose of the monitoring is merely to indicate
trends at that specific location with time, continuous fixed station monitoring is
certainly reasonable. Caution must be  exercised however, if such data is to be
used to determine why the trends occur over time. Changes in  the topography,
both local and  in  the vicinity of the continuous  fixed point  monitor  (urban
renewal, road construction, etc.), the addition of new sources, etc. can confound
meaningful interpretation. It would be impossible to discern whether changes in
air  quality are due to an effective (or ineffective) air conservation management
program, or merely due to changes in topography that alter the bias suggested
previously. Problems of this nature would not arise if a mobilized monitoring
approach  was  used.  This   is  because the mobilized  monitoring  approach
eliminates (or at least reduces) this bias.
Data analysis and prediction models

    The utility of this whole approach is  dependent upon an ability to predict
accurate advance estimates of the variance. The choice of a suitable prediction
model depends to a great extent upon the nature of variance data presented as a
time  series.  Figure  2  depicts  some of  the possible  results  of graphically
representing such data. All of these possibilities have distinct features that would
influence the choice of a suitable stochastic  prediction model. Figure 2A,  for
example, illustrates a process that incorporates features of both time stationarity
and non-stationarity. This time series depicts discrete periods of stationarity that
                                                                      14-7

-------
obviously change mean values in an almost instantaneous jump fashion. Figure
2B exhibits  a  similar  tendency, but  has a  smoother transition between time
stationarity intervals. Figure 2C shows a random fluctuation process. Figure 2D
depicts a  time series  that is stationary in  the wide sense.5 Certainly other
possibilities exist. Subsequent data  analysis would reveal  if the random variables
are dependent  or independent, and  if long period, seasonal, or cyclic trends exist
along  with  inherent  random fluctuations.6  A  procedure  outlining a  general
method of determining the nature of a  particular  set of data can be found in
many texts on  time series analysis (e.g. Box and Jenkins (1971)).
    At this  point we can only  speculate (since data  is not available) as to  the
nature of the time series. Since  many  possibilities exist, it is not feasible here to
outline a data analysis for each situation. Instead, a single case in which the time
series is in accordance with the hypotheses stated earlier will be considered.
One possible approach to the data analysis will be outlined. The stress will be on
simplicity.
    Table I  represents the type of data required for the approach outlined in
this paper. The variance estimates and concomitant meteorological parameters
 are arranged  in  the  sequence  in  which  the data  were taken.  Each variance
 estimate  is calculated from the average of simultaneous measurements taken over
 equal  averaging times if discrete sampling  techniques are employed, or  the
 average of the  measurements over equal time intervals  if continuous monitors are
 utilized.7 As shown in Figure  3, the variance can be plotted  as a continuous
 function  of  time. Since the  sequence does not appear to fluctuate in a totally
 random manner, and  many atmospheric phenomena are known to be dependent
 processes (U. S. Atomic Energy Commission (1968))  it will be assumed that the
 variance measurements are dependent random variables. The data analysis should
 concentrate  on achieving five objectives:
 5The concept of stationarity in the wide sense essentially provides for a process where only
  up to a certain statistical moment is constant with time.  A good possibility exists that
  many atmospheric  processes exhibiting stationary features are actually stationary in the
  wide sense.
 6The first hypothesis described in this paper provides for transient variations due to short
  term meteorological conditions. Cyclic trends due to diurnal effects and seasonal weather
  trends may also affect the time series. Long term trends, due perhaps to changing topography
  (urban renewal, etc.) or new emission sources, are also a possibility that should be
  considered.

 7Averaging time is defined as some time interval over which the continuously varying trace
  of the variable  may be represented by a constant (average) value. We note also that this
  averaging time (or the time interval if continuous monitors are used) should be as small as
  is technically possible to avoid the time bias discussed previously.
 14-8

-------
                        Table I. Data Sequence Required

          V1     . . .     Vi     . . .     Vn
          x11    . . .     xi1    . . .     xn1
          .                 .                .
          x1m    . . .     xim    . . .     xnm

 where:
     Vi  = spatial variance of pollutant concentration at time i
     xij = recorded values of meteorological variables assumed to have a
           significant relationship to the variance
      i  = time period number
      j  = representative of a specific meteorological parameter


     1.   Determine periods of  time stationarity over  which  the variance  is
homogeneous or constant.
     2.   Determine which  meteorological  parameters influence the changes  in
variance  levels.
     3.   Characterize  the  meteorology  during  each period  of stationarity
according to some finite classification scheme based on the parameters found to
influence variance changes.  Test the hypothesis  of "similar meteorology" based
on this classification scheme.
     4.  Choose an algorithm that will predict advance estimates of the variance
across  the sub-area, given  that the  variance of  the  time period  immediately
preceding it is known, and the estimated meteorological parameters of the time
period to be predicted are known.
     5.  Develop estimates of  the length of  the stationary time period for the
predicted variance.
Keeping these objectives in mind, the data analysis can proceed as follows:
     (a)   Find the longest  possible continuous  time  intervals over which the
variance  is constant in the statistical  sense, i.e.,  no significant difference exists,
or limit the standard deviation of the data to a certain percentage, etc. This will
have the effect of breaking  down the data  into  discrete step functions of time,
where  the variance will be constant over the  interval.  These resultant  time
intervals  may not, as a consequence of the analysis, be  of equal length, but
should  be  as  long  as  statistically  possible.  The overall  result could  look
something like Figure 4.
                                                                      14-9

-------
    (b) Run an analysis of variance or a regression analysis on all of the data to
determine the meteorological variables that significantly influence the variance
change.
    (c) Based  on  the results of  the  analysis of variance (or the regression
analysis), devise a classification scheme of a finite number of divisions or classes.
The  scheme  should   probably  be  both  qualitative  and   quantitative in
nature—perhaps similar to that proposed  by Pasquill (1962). After each time
interval is classified,  we can  proceed  to test  the  hypothesis  of "similar
meteorology" by simple inference tests on the variance values of each class.
    (d) Assuming that the hypothesis is not rejected, we are now in  a position to
propose a  simple algorithm.  This will predict an estimate of the variance in
advance, given that we know what the variance was at the previous time interval,
and  have  available an  estimate of the values of the pertinent meteorological
variables for the time interval to be predicted. This can  be accomplished in the
following manner:
    (1)  Breaking  down the data into  a finite number of classifications, say six
(A,B,C,D,E,F), will  result in six frequency  distributions  being developed (see
Figure 5). The variance values for each class stem directly from re-arranging and
grouping the data after classification. We thus  have a mean value and a variance
associated with each variance class. These  distributions need not be similar. The
only  requirement for  further  analysis is that  the variance  of  the variance
distributions be finite.
    (2) Choose a simple linear least-squares algorithm such as:

        (VTi+1) = E[VTi+1] + (cov[VTi, VTi+1] / var[VTi]) ((VTi) - E[VTi])

 where:
        (VTi+1) = variance estimate for time period Ti+1
       E[VTi+1] = expected value of the distribution of the variance for
                  time period Ti+1
 cov[VTi, VTi+1] = covariance of the random variables VTi, VTi+1
       var[VTi] = variance of the distribution of the variance for time period Ti
          (VTi) = variance for time period Ti
         E[VTi] = expected value of the distribution of the variance for time
                  period Ti
 14-10

-------
    We possess all the inputs to such an equation from the six frequency
distributions. The estimate for E[VTi VTi+1] would be obtained from the bivariate
distributions resulting from a Markov chain analysis of the order in which the
classes followed one another:

    Time interval     1     2     3     4     5     etc.
    Class             A     C     D     A     B     etc.

We can, as a result of this sequence, determine the bivariate frequency
distributions of AC, CD, DA, AB, etc. for all possible combinations of
classifications to determine the cross-correlation function above.8
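A minimal sketch of the predictor chosen in (2) is given below. It takes the class-conditional means and variances from the classified variance sequence and estimates the lag-one covariance from the observed transition pairs. The function and variable names are illustrative, and enough observations of each class and of the particular transition are assumed to exist.

```python
# A minimal sketch of the one-step predictor in (2): class-conditional moments
# from the "frequency distributions" and the lag-one covariance from observed
# transition pairs.  Names are illustrative assumptions.
import numpy as np

def predict_next_variance(v_now, class_now, class_next, variances, classes):
    variances = np.asarray(variances, dtype=float)
    classes = np.asarray(classes)

    # Moments of the class-conditional distributions of the spatial variance.
    mean_now = variances[classes == class_now].mean()
    var_now = variances[classes == class_now].var(ddof=1)
    mean_next = variances[classes == class_next].mean()

    # Bivariate (lag-one) sample for the observed transition class_now -> class_next.
    pairs = [(variances[i], variances[i + 1])
             for i in range(len(variances) - 1)
             if classes[i] == class_now and classes[i + 1] == class_next]
    x, y = np.array(pairs).T
    cov = np.cov(x, y, ddof=1)[0, 1]

    # V_hat(T_i+1) = E[V_Ti+1] + (cov / var[V_Ti]) * (V_Ti - E[V_Ti])
    return mean_next + (cov / var_now) * (v_now - mean_now)
```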
     This  whole  analysis thus  far  depends upon the availability  of advance
weather forecasts.  These forecasts  must include estimates of the parameters
found by the analysis of variance to be pertinent to variance changes. It must be
noted that  in the event such forecasted parameters are  not available,  we can
determine from the analysis just  outlined,  the k-step transitional probabilities
needed to estimate the  probability that a certain predicted state will follow the
present state. We thus have meteorological class prediction capabilities inherent
within the normal data analysis routine.
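The k-step transitional probabilities can be obtained by estimating the one-step transition matrix from the observed class sequence and raising it to the k-th power. The sketch below uses an assumed example sequence purely for illustration.

```python
# A minimal sketch of the k-step transitional probabilities: estimate the
# one-step Markov transition matrix from the observed sequence of meteorological
# classes, then take its k-th matrix power.  The example sequence is assumed data.
import numpy as np

def transition_matrix(class_sequence, labels):
    index = {c: i for i, c in enumerate(labels)}
    counts = np.zeros((len(labels), len(labels)))
    for a, b in zip(class_sequence[:-1], class_sequence[1:]):
        counts[index[a], index[b]] += 1
    # Normalize each row to estimated probabilities (rows with no data left uniform).
    rows = counts.sum(axis=1, keepdims=True)
    return np.where(rows > 0, counts / np.where(rows > 0, rows, 1), 1.0 / len(labels))

labels = ["A", "B", "C", "D", "E", "F"]
sequence = ["A", "C", "D", "A", "B", "A", "C", "C", "D", "A"]   # assumed example data
P1 = transition_matrix(sequence, labels)
P3 = np.linalg.matrix_power(P1, 3)      # 3-step transition probabilities
print(P3[labels.index("A")])            # probability of each class three periods after class A
```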
     The remaining problem is to predict the length  of time interval that will be
associated with the predicted meteorological class (and hence the variance level).
The approach to this problem lies in determining whether the length of the time
interval  of  successive  meteorological  classes is described  by a  dependent or
independent random process.  If the  process  is independent, then the  best
estimate for the length  of the time interval of the predicted meteorological class
is merely the mean value  of the  time interval  frequency distribution for  that
class.  If  the process is found  to be  dependent on  the  preceding event, an
algorithm similar to the one proposed for the variance estimates can be  used. If
the process  is found to  be dependent in a more complex manner, such as being
dependent upon  the length of  the time interval for the previous 2 or 3 or n
classes, then a probability model such as
        LTi+1 = f1 LTi + f2 LTi-1 + . . . + fn LTi-n + ri+1
 8If the time series is found to be an independent random process, the covariance
  would be zero.
 The best  advance estimate for the variance  would then  be the  mean value  of the
 distribution for the predicted meteorological class.
                                                                      14-11

-------
can be used,
    where:
    LTi+1  = length of the time interval to be predicted
      LTi  = length of the time interval of the present meteorological class9
        f  = weighted factor of previous values of LT
      ri+1 = the random element that must be taken into account
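One simple way to obtain the weighting factors f1 ... fn is an ordinary least-squares fit on the historical interval lengths, as sketched below. The lag order and all names are assumptions made for illustration, not part of the proposed methodology itself.

```python
# A minimal sketch of fitting the weights f_1 ... f_n in the interval-length model
#   LT_{i+1} = f_1*LT_i + f_2*LT_{i-1} + ... + f_n*LT_{i-n} + r_{i+1}
# by ordinary least squares on historical interval lengths.  Lag order and names
# are illustrative assumptions.
import numpy as np

def fit_interval_model(lengths, n_lags=3):
    lengths = np.asarray(lengths, dtype=float)
    rows = [lengths[i - n_lags:i][::-1] for i in range(n_lags, len(lengths))]
    X = np.array(rows)                      # lagged lengths, most recent first
    y = lengths[n_lags:]                    # the length that followed each lag vector
    f, *_ = np.linalg.lstsq(X, y, rcond=None)
    return f

def predict_next_length(recent_lengths, f):
    # recent_lengths: most recent interval first, same ordering as the fitted lags.
    return float(np.dot(f, recent_lengths[:len(f)]))
```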
    There  are some distinct advantages to a program and an analysis such as the
one proposed in this paper.  The method  is  obviously extremely  flexible, as
mobile monitors can  be moved wherever desired. The method can  be adopted
by  almost any  community  with   limited  financial resources due  to  the
variance-area relationship.10  The analysis is  basically simple and is  Bayesian in
nature. That  is,  new data are continually put into the system  so  that better
and better  estimates  of predicted values can be  realized.  Most important, a
mathematical statement of accuracy and confidence can be associated with each
estimate of the upper  bound concentration level for a specific pollutant.
Summary

     A  sampling strategy  and  methodology  were  proposed that enable  an
optimum size air  monitoring surveillance  network to be developed at a  given
cost. The estimates of pollutant concentrations that  result from such a network
are statistically accurate and valid to a predetermined level. The model develops
two hypotheses into a prediction algorithm that enables advance estimates of the
variance of pollutant concentration to be made. Knowing this, one  is able to
allocate the correct number of samples to best estimate pollution concentration
levels with predetermined accuracy and confidence.
     The mobilized monitoring approach taken by this strategy is probably the
method that first should  have  been  implemented. Unfortunately, the fast
response time and portability features that were necessary for such an  approach
were not  available  when  ambient air  monitoring was  first started.  These
requirements  are  being met  by  the new instrumentation available now, and
planned for the  future. If the  mobilized  approach by  any methodology  is
deemed the proper way to go, we should not  let tradition stand in the way.
 9Note that LTi must be estimated also, as we are dealing with a real time situation. That is,
  we don't really know how long LTi will actually last until it ends. We can then get a
  revised estimate for LTi+1.

 "It should be noted that as a result of the analysis, the variance-area relationship becomes a
  family of curves—one curve for each classification. Thus the final graph depicted in Figure
  1 will contain a family of curves.
 14-12

-------
    (Ordinate: sample size and variance; abscissa: sub-area size. Limiting value
     for total bounded area indicated.)
Figure 14-1. Use of variance-area relationship in sampling strategies.
                                                              14-13

-------
-------
                 (Abscissa: time period number.)
                Figure 14-3. Spatial variance as a function of time.
-------
    (Two panels shown, Class A and Class B. Abscissa: spatial variance for a
     specific sub-area; ordinate: frequency.)
Figure 14-5. Frequency distributions for various meteorological classes (similar
     plots for all classes).
14-16

-------
 REFERENCES

 Aitchison, J. and Brown, J.A.C., 1957: The Lognormal Distribution. Cambridge
      University Press.
 Anon.,  1971:  Requirements  for  Preparation,  Adoption,  and Submittal of
      Implementation Plans. Federal Register. 36: No. 158.
 Box,  G.E.P., and Jenkins, G.M., 1971: Time Series Analysis Forecasting and
      Control. Holden-Day.
 Cochran, W. G., 1963: Sampling Techniques. John Wiley.
 Corn, M., 1970: Measurement of Air Pollution Dosage to Human Receptors in
      the Community. Environmental Research. 3: 218-233.
 Davenport, W. B., 1970: Probability and Random Processes. McGraw-Hill.
 E.P.A.,  1971:  Guidelines: Air Quality  Surveillance Networks.  United States
      Environmental Protection Agency, O.A.P., AP-98.
 Larsen,   R.  I.,1971:  A  Mathematical   Model  for   Relating  Air  Quality
      Measurements  to  Air  Quality  Standards.  Environmental  Protection
      Agency, O.A.P. Publication No. AP-89.
 Pasquill, F., 1962: Atmospheric Diffusion. D. Van Nostrand Co. Ltd.
 Stalker, W. W., Dickerson, R. C., and Kramer, G. D., 1962: Sampling Station and
      Time  Requirements for Urban  Air  Pollution  Surveys. J. Air Pollution
      Control Association. 12: 361-375.
 United States  Atomic  Energy Commission, 1968:  Meteorology and  Atomic
      Energy, 1968. A.E.C./Division of Technical Information.
DISCUSSION
Court: A third hypothesis that you should consider is that the variance, your
spatial variance, will decrease as the sampling time increases. If you take samples
for an entire year over your area, the difference between the sampling sites will
be much less than if you read them minute by minute. This all comes back then
to the question of what you are trying to measure. If all that is wanted is the
total concentration over an area, then your procedure may  be valid. If we want
to know where the hot spots are so we can track them down  to a certain emitter,
then we want to  keep the stationary sampling sites. Furthermore,  if we have
stationary sites we can use the correlation between stations (the covariance) to
look  at the  measure of the  variability  of the concentrations over area. A
covariance  between   mobile   stations  would  be  meaningless  whereas  the
covariance between fixed stations will indicate just how spotty the pattern is.
Visalli: Yes, I think you're absolutely correct. I would like  to comment on one
other thing, that is the concept of the biological half-life. If indeed the primary
purpose of  monitoring air is to protect public health, then the time period of
                                                                   14-17

-------
analysis should be something that is going to be effective, that is, comparable to
the biological half-life. In the case of SO2 the biological half-life happens to be
on the order of several minutes—I think 2-4 minutes. Well, if we're trying to get
an estimate  of what the air quality is in an area, I don't think that there is a way
that we can  do this by placing a monitor at a fixed point. We've got to have at
least several  monitors, in one area, whether fixed or not, to get an idea of what
the  probable  concentration  is  for the  entire area;  and then see  if we get
meaningful cause and effect relationships. However I do agree with your point.
Marcus:  One  thing  I would just like to point out  is that as far  as  getting
correlations  between moving monitors, this seems to be a little bit analogous to
the association between the Eulerian and Lagrangian approaches to correlation
turbulence theory.
Dave Spiegler: One of the problems that National  Weather Service has (I'm not
with the  National  Weather Service)  is to determine the optimum  spacing of
stations for an analysis which is similar to the problem you're talking about.
What they do is have an objective analysis of an area and the spacing depends on
the scale and how big the grid is. Here again it depends on whether, as Dr. Court
said, you want  to analyze  over a period of  an hour, or 6 hours, or a  day, or
what. The space scale is related  to the time scale. The longer the period of time
that you want to analyze, the fewer the number of stations necessary to describe
the analysis over the area; i.e., the shorter wavelength features are less important
as the time period  increases. An objective analysis procedure  has initial guess
values  on a  grid of points and uses the station observations near those points to
adjust  the initial guess. I think  that something like this  might  be useful in the
work you are doing.
I. Enger: Just a small comment about the role of sampling, based upon just a
very preliminary result that we came across a few weeks  ago. Apparently a very
small number of samples can approximate the distribution of an annual average
very closely. What we did was for each  10 points we  took a random sample of
100 and then a random sample of 10; and the means for the two samples came very
close. Not that I'm trying to say something statistical about it, just that it is an
indication that a very small number of samples can estimate a distribution quite
effectively.  I think that's a point favorable to mobile sampling, provided  you
have the sampler returned to the same place. On  a randomized scheme,  100 or
200 times a  year you can get an annual average for a whole variety  of places with
only one instrument.
Wanta: I suppose that with the passage of time one will become acquainted with
the characteristics of each of the sampling sites in the sense that one does with a
single site. He  learns what the near and more distant sources are.
14-18

-------
       15. THE EFFECT ON ROLLBACK MODELS DUE TO
      DISTRIBUTION OF POLLUTANT CONCENTRATIONS
               YUJI HORIE AND JOHN OVERTON

       Department of Environmental Sciences and Engineering
                      School of Public Health
                    University of North Carolina
                     Chapel Hill, North Carolina
Introduction

    One of the important uses  of air quality data is determination  of  the
emission reduction required to achieve desired air quality. So-called "rollback
equations" are  often employed to compute a reduction in  emissions. Basic
questions have been raised on the application of rollback equations to emission
standards for automobiles, and the effect of the distributions of present, future,
and desired concentrations on the applicability of these equations.
    The rollback equations that have been proposed are expressed as

         RL = (gL Xp - Xd) / (gL Xp - B)                                    (1)

         Rj = [gj (Xp - B) - (Xd - B)] / [gj (Xp - B)]                      (2)

where
    RL = reduction ratio (according to Larsen (1969))
    Rj = reduction ratio (according to Jensen (1971))
    Xp = present air quality
    Xd = desired air quality
    B  = background concentration
    gL, gj = growth factors over the period from the present to the goal year in
            which the desired air quality will be achieved.
The reason for the differences between these two equations is not explored
here. The differences, however, are negligible when the background
                               15-1

-------
concentration, B,  is much smaller than the  present air  quality, Xp.  Both
equations, for most practical purposes, can be reduced to

                     R ≈ RL ≈ Rj ≈ (g Xp - Xd) / (g Xp)                     (3)
    where g = gL = gj.
   Once the reduction ratio, R, is determined, the permissible emissions can be
calculated by using

                            ed = ef(1 - R)                         (4)

where
    ed = future desired emission per unit source
    ef = future emission  without controls per unit source.
If the future emission without controls, ef, can be assumed to be the same as for
the present (this is the case for automotive emissions), then  Equation 4 can be
written as
                            ed = ep(1 - R)                         (5)

where
    ep = present emission per unit source.
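The arithmetic of Equations 3 through 5 is straightforward; the sketch below works through it with assumed numbers (growth factor, present air quality, desired air quality) chosen only to illustrate the calculation.

```python
# A minimal sketch of the simplified rollback arithmetic in Equations 3-5:
# reduction ratio R = (g*Xp - Xd)/(g*Xp) when background B << Xp, and the
# permissible emission ed = ef*(1 - R), with ef taken equal to ep as for the
# automotive case.  All numbers below are assumed for illustration only.

def reduction_ratio(g, xp, xd):
    """Equation 3: fraction by which emissions must be reduced."""
    return (g * xp - xd) / (g * xp)

def permissible_emission(ef, r):
    """Equations 4 and 5: allowable future emission per unit source."""
    return ef * (1.0 - r)

g = 1.5          # assumed growth factor to the goal year
xp = 0.12        # present air quality (e.g. ppm, maximum hourly value)
xd = 0.08        # desired air quality (standard)
ep = 1.0         # present emission per unit source (normalized)

r = reduction_ratio(g, xp, xd)
ed = permissible_emission(ep, r)     # ef assumed equal to ep
print(f"required reduction R = {r:.2f}; permissible emission = {ed:.2f} of present")
```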
    The major question  in  using a rollback equation is how to compute the
growth factor, g (gL or gj). According to Equation  1 the growth factor appears
to be defined by

                                 gL = Xf / Xp                           (6)
where

    Xf = future air quality without controls.
According to Equation 2 the growth factor appears to be defined by

                            gj = (Xf - B) / (Xp - B)                    (7)

    Values of pollutant concentration are  described by  both  their averaging
times and percentiles. The  national air quality standards, for example, are stated
as "D milligrams  per cubic meter -maximum t-hour concentrations not to be
exceeded more than once per year" (Anon. (1971)). Therefore the values of Xp,
Xd, and B in the rollback equations must have the same averaging times, t-hours,
etc.  In addition to this, the growth factor defined either by Equations 6 or 7
should be calculated using the values of Xf, Xp, and  B  that correspond to the
same percentile at which the value of Xd is designated,  i.e., "once per year."
    The  usual method for determining  the growth factor (Ott et al.  (1967);
Larsen (1961)) is: First, future emissions are estimated by some method such as
15-2

-------
projection  of   past  car  registrations  to  the  goal   year.  Second,  future
concentrations are estimated by applying an air pollution display model to the
future emissions using  presently available meteorological data. Third, growth
factor is determined from the ratio of the estimated future concentration to the
present air quality  that  is either estimated  by the  same air pollution model or
actually measured.
    A question  arises as to whether the ratio of future concentration to present
concentration remains the same at every percentile value. This question may be
restated—does the linear relation between emissions and concentrations, which
leads  to the rollback equations, extend to the percentile values of present, future
and desired concentrations?  In order to explore the assumption of linearity,
which is implicitly employed in the usual method,  a simple proportional model
is assumed. As  will  be seen, a  proportional model does not in general imply
linearity between emissions and concentrations.
    The discussion  and consequence of  this model are  developed  in the next
section. Using the assumption  of linearity over percentile values, the effect of
concentration distributions  on  the rollback  equations  is  investigated in  the
section titled "Rollback  equations for  percentile values."  Several numerical
examples are given in the succeeding section in which imaginary cities have been
constructed and empirical distributions  of  the "city's" present  and future
concentrations  have  been calculated;  growth  factor  for each percentile is
graphically displayed showing the dependence of the growth factor on percentile
and emission growth pattern. The implications of these calculations are discussed
in the last section.
Simple Proportional Model

    The air quality concentration at any location in a city is assumed to be
given by  (Appendix I)

                               X = B + e F                             (8)
where
    e = emission per unit source
    F = function  of all  relevant variables such as weather factors and source
         distribution.
Even  though B has a distribution, its value (even at the relevant high percentiles)
is assumed much smaller than  e F and may be neglected. We assume it to be
constant and independent of city growth.
                                                                      15-3

-------
    In Equation 8, F is assumed not to be a function of e; thus the distribution
of  F  determines the distribution  of X.  Future  concentration at the a-th
percentile, without emission controls, is

                          Xfa=B + efFfa                         (9)

Xfa depends, in part, on the future source distribution. This source distribution
includes the effect of any city planning  and  regulatory  practices. The desired
future concentration due to an emission reduction at the same percentile is

                              Xda = B + ed Fda                         (10)

Since F is independent of emission per unit source, we have

                              Fda = Ffa                                (11)

Using Equations 5 and 11, Equations 9 and 10 can be transformed to

                              Xfa - B = ep Ffa                         (12)

                              Xda - B = (1 - R) ep Ffa                 (13)

Thus we have

                  R = [(Xfa - B) - (Xda - B)] / (Xfa - B)              (14)

The reduction ratio given by the above equation is independent of percentile, a.
There  is,  however,  no established  means to estimate  the percentile value  of
future concentration, Xfa.
     A growth factor, Ga, can be defined as

                          Ga = (Xfa - B) / (Xpa - B)                   (15)
This growth factor is a measure of the growth of concentration value at different
percentiles due only to the sources  that can be controlled and thus excludes the
background which is assumed to be independent of any growth. Substitution  of
the growth factor into Equation 14 yields
                R = [Ga (Xpa - B) - (Xda - B)] / [Ga (Xpa - B)]        (16)
Now the reduction ratio  is expressed in terms of  known variables except the
growth factor.
    Since (Xpa - B) = ep Fpa and (Xfa - B) = ep Ffa, the growth factor can be
written as (Appendix I)

                              Ga = Ffa / Fpa                           (17)
15-4

-------
 There  is  no reason  to assume  that the ratio, (Ffa/Fpa),  is independent of
 percentile since both Fpa and Ffa are dependent on the emission inventories.
 The spatial  distribution   of  emission  sources changes  as  the  city  grows.
 Consequently, the distributions of Fpa and Ffa may be significantly different.
 In  this sense, the  proportional  model  does  not  necessarily  imply  a linear
 relationship between emissions and concentrations.
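 For empirical data, the percentile growth factor of Equation 15 can be computed directly from the present and future concentration distributions. The sketch below does so for two assumed lognormal samples; the sample data, parameter values, and names are illustrative only.

```python
# A minimal sketch of the percentile growth factor of Equation 15,
#   G_a = (X_fa - B) / (X_pa - B),
# computed from empirical present and future concentration samples.
# Sample data and names are assumptions for illustration.
import numpy as np

def growth_factor_by_percentile(present, future, background=0.0,
                                percentiles=np.arange(1, 100)):
    x_pa = np.percentile(present, percentiles)
    x_fa = np.percentile(future, percentiles)
    return percentiles, (x_fa - background) / (x_pa - background)

# Illustrative lognormal "present" and "future" concentration samples.
rng = np.random.default_rng(0)
present = rng.lognormal(mean=3.0, sigma=0.6, size=2000)
future = rng.lognormal(mean=3.3, sigma=0.7, size=2000)

a, g_a = growth_factor_by_percentile(present, future)
print(g_a[[9, 49, 89]])   # growth factors near the 10th, 50th, and 90th percentiles
```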
 Rollback Equations for Percentile Values

     This section is concerned with the dependence of the rollback equations on
 percentile and in particular the relation of these equations to the proportional
 model of the previous section.
     For simplicity the following assumptions are made: (a) The growth factor is
independent of percentile, i.e., G = (Ffa/Fpa) = constant. (b) The rollback
 equations are valid for some percentile, say the 50-th. Then, the 50-th percentile
 reduction ratios in terms of any other  percentile concentration values can be
 expressed as (Appendix II)


    RL = [gL Xpa - Xda + ep ξa (G - gL)] / [gL Xpa - B + ep ξa (G - gL)]
       ≠ (gL Xpa - Xda) / (gL Xpa - B)                                      (18)

and

    Rj = [gj (Xpa - B) - (Xda - B) + ep ξa (G - gj)] / [gj (Xpa - B) + ep ξa (G - gj)]      (19)


 where

                    ξa = (Xpa - Xp50) / ep                    (20)

One can see that the form of Equations 18 and 19 is not invariant with a change
in percentile a nor are RL and Rj independent of a unless the growth factors,
gL  and gj are properly  defined. A  "natural"  definition  of gL and  gj  is

                           gL = gj = Ffa / Fpa                         (21)

In this case Equations 18 and 19 reduce to the same form as the original rollback
equations, Equations 1 and 2.
                                                                  15-5

-------
    In a more general case the growth factor, G, will be a function of percentile.
In this situation the reduction ratio, Rj, defined in Equation 2 is the same as R
in Equation 16 when the growth factor, gj, is defined as in Equation 21, while
the reduction ratio, RL, is different from R even when the definition, Equation
21, is used.
Numerical Examples

    A simple  diffusion  model   (Hanna  (1971))  is  used to  simulate  the
distribution of concentrations due to several source density configurations. In
this model, as used, concentrations are a function only of the source density and
wind velocity; i.e.,

                        X = (c/u) (p1 + p2 + ... + pN)                    (22)

where
    u = wind speed
    pi = density of sources in the i-th grid upwind from the central block "o"
    c = constant
    N = the number of upwind grid blocks included in the sum.
The source  density, p, has been calculated  using the sum of several Gaussian
functions with  the same parameters, each centered at different positions. This
gives some structure to the "city." Future "cities" differ from present "cities"
by  the  addition  of a Gaussian  function to the present  city's source density
configuration. The statistical  distributions of concentrations are determined for
each city  by using Equation 22 with 2000 "wind" velocity vectors taken from a
normal  population; i.e.,  the X  and  Y  components  of wind velocity  were
computed  independently  by the  use  of a  normal  random  number generator.
From  this  pair  of values the  wind  speed and  direction  were calculated.
Concentrations were determined, using Equation 22, from 2000 pairs of wind
speed  and  direction.  Using the   2000 values of  concentrations,  empirical
distributions were formed. Then the growth factors, Ga, were determined by the
use of  Equation  15, with zero background concentration, and the empirical
distributions.
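A minimal sketch of this simulation is given below. It takes Equation 22 in the simple form X = (c/u)(p1 + ... + pN), builds a "city" from Gaussian source functions, draws 2000 normal wind vectors, and forms the empirical growth factors of Equation 15 for one emission growth pattern. The constant c, grid step, number of upwind blocks, and source-center coordinates are illustrative assumptions rather than the values used in the paper.

```python
# A minimal sketch of the simulation: Equation 22 in the simple form
# X = (c/u) * (p1 + ... + pN), a "city" built from Gaussian source functions,
# and 2000 normally distributed wind vectors.  All parameter values are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
c, N, grid_step = 1.0, 10, 1.0                    # constant, upwind blocks, block size (km)

def source_density(x, y, centers, spread=2.0):
    # Sum of Gaussian functions with the same parameters, each at a different center.
    return sum(np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * spread ** 2))
               for cx, cy in centers)

def concentration(ux, uy, centers):
    u = max(np.hypot(ux, uy), 0.1)                # wind speed, floored to avoid division by ~0
    steps = np.arange(1, N + 1) * grid_step       # distances of the N blocks stepping upwind
    xs, ys = -steps * ux / u, -steps * uy / u     # block centers, opposite the wind vector
    return (c / u) * sum(source_density(x, y, centers) for x, y in zip(xs, ys))

present_city = [(0.0, 0.0), (5.0, 0.0)]           # two Gaussians, receptor at the origin
future_city_2 = present_city + [(5.0, 5.0)]       # growth 7.1 km northeast of the origin

# Ux ~ N(0, 25), Uy ~ N(0, 100): the "no preferred direction" wind case.
winds = rng.normal([0.0, 0.0], [5.0, 10.0], size=(2000, 2))
x_p = np.array([concentration(ux, uy, present_city) for ux, uy in winds])
x_f = np.array([concentration(ux, uy, future_city_2) for ux, uy in winds])

percentiles = np.arange(1, 100)
g_a = np.percentile(x_f, percentiles) / np.percentile(x_p, percentiles)   # Equation 15, B = 0
print(g_a[[9, 49, 89]])    # growth factors near the 10th, 50th, and 90th percentiles
```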
    Three different source density  configurations as shown in  Figure 1 are used
in  the  numerical examples.  The  present city's source densities  (first source
density  configuration) are given  by the sum of two  Gaussian functions.  The
center  of the second function is located at 5.0 kilometers east of that of the
first. The receptor is at the center of the first function, the origin. The second
configuration is the present city's configuration plus another Gaussian centered
at 7.1 kilometers northeast of the  origin.  In the third configuration, the growth
15-6

-------
is represented  by another Gaussian centered  at  5.0  kilometers north of the
origin.
    The growth  factors,  Ga, have been calculated and plotted, in Figure 2
through 6, as a function of percentile for several normal velocity distributions.
Of the 2000 values of growth  factor,  only seventy-five evenly  spaced  (along
percentile axis) values have been plotted. On each figure are two  sets of growth
factors corresponding to  the  two  different emission  growth  patterns.  The
solid circles give the growth factor when the future city is the second configura-
tion, while the open circles give the growth factor when the future city
is the third configuration as described above. The wind velocity distribution is
indicated on each diagram by Ux ~ N (mean wind in x-direction, variance of
x-wind component) and Uy ~ N (mean wind in y-direction, variance of y-wind
component). The growth factor diagram for the case of no preference in wind
direction is shown in Figure 2. Beginning with a southerly dominant wind in
Figure 3, the dominant wind direction keeps rotating clockwise by 90° for each
of the remaining growth factor diagrams shown in Figures 4 through 6. Thus the
effect of differing wind patterns relative to the cities' source configurations can
be observed. The variances have been held constant, 25 and 100, respectively,
for the wind components in East-West and North-South directions.
    The growth  factors  appear  to be  more dependent on  emission  growth
pattern than  on concentration distribution. Although the two  future cities have
the same  amount of increase in emissions, the growth factors of future city 2
(third  configuration) are appreciably greater in  every wind pattern and at every
percentile  than  those  of future city  1  (second configuration). The reason
probably is that the third configuration (ρ(r = 0) = 1.90) yields a higher source
density around the receptor point than the second configuration (ρ(r = 0) =
1.65) does, and that pollutant concentrations are influenced to a greater extent
by nearby sources than by remote sources.
    The dependence of the growth factor on percentile is fairly sensitive to the
source  configuration relative to the  wind  pattern.  As the dominant  wind
direction  rotates,  the  slope  of the  trends changes.  Strong  dependence  on
percentile occurs when the dominant wind blows from the North or  South (see
Figures 3  and 5 respectively). Weak or no dependence on percentile occurs when
the dominant wind blows from  the East or West. When wind does  not have a
preferred direction the growth factors show a mild dependence on percentile.
    It  is difficult to determine how realistic the model simulations  are. The wind
roses were constructed from the wind speeds and wind directions generated  by
the method mentioned  above. The simulated wind frequency distributions are
not much  different in nature from those observed at CAMP cities.
                                                                      15-7

-------
Discussion and Conclusions

    The effect of statistical distribution of pollutant concentrations on rollback
equations has  been investigated by using the concept of the proportional model
as well as by using numerical examples. The form of the rollback equations, in
general, changes with percentiles of pollutant  concentration  when  incorrect
growth factors are used. This conclusion derives from the percentile forms of the
rollback equations that have been obtained for the case of a constant growth
factor by using the proportional model.
    Most cities may  not be expected to grow  in any simple fashion. To see the
dependency of the growth factor on percentiles and emission growth patterns,
the growth factors at different percentiles of an  imaginary city were calculated
for several emission  growth patterns and for several different wind velocity
distributions. The percentiles were constructed from 2000 concentration values
that were calculated  using a simple model and  normal random wind velocity
vectors as meteorological input. The results are  shown in Figures 2 through 6.
    In all cases the figures show that the growth factors of future city 2 (open
circles) are greater than those of future city 1 (solid circles) although the amount
of emission growth  is the same for the two  future  cities. The reason is that
emission  growth around the receptor point is larger in the 2nd future city than
in the first, and that the receptor concentrations are affected to a greater extent
by nearby sources than by remote sources. Here, the source densities at the origin
where the receptor is located are 1.45, 1.65 and 1.90, respectively, for the present
city, future city 1, and future city 2 (Fig. 1). The "emission growth factors" at
the receptor point, therefore, are 1.14 and 1.31 (1.65/1.45 and 1.90/1.45),
respectively, for future cities 1 and 2. This indicates that not only the amount of
emission growth but also the emission growth pattern is important for correctly
estimating the growth factor.
This in  turn  suggests  that redistribution of emission sources through  city
planning can be an effective measure to improve the air quality at dirty spots in
a city.
    The  growth factor may increase or decrease with percentile depending upon
the emission growth pattern and wind pattern. When  there is no preference in
wind direction (Fig.  2), the growth factors gradually increase with percentile.
The reason is  that a weak wind, which results  in higher concentrations, tends to
magnify  the effect of  emission growth on concentrations.  When  a dominant
wind blows from the East or West (Figs. 4 and  6), the growth factors become
less dependent on percentile. Strong dependence on percentile  occurs when the
dominant wind blows from the North or South. When the dominant wind blows
from  the South  (Fig. 3), the growth  factors increase with percentile.  For a
northerly dominant wind the growth factors  decrease with percentile (Fig. 5).
This can  be explained as follows:
15-8

-------
    (a)   Lower  percentile  concentrations in  Figure 3  result from a  strong
southerly wind that blows over the southern part of the city where the emission
growth is lower than in the other part of the city. Thus, the concentration growth
factors at lower percentiles are smaller than the "emission growth factors" at the
receptor.
    (b)   Higher  percentile  concentrations  in Figure 3  result from a  weak
northerly wind that carries  pollutant from the northern part of the city where
the emission growth is higher than in the other part. Thus, the concentration
growth  factors  at  higher percentiles  are greater than the  "emission  growth
factor"  at the receptor.
    (c)  As a result of (a) and (b), the growth factors in Figure 3 increase rapidly
with percentile.
    (d)  Lower percentile concentrations of future city 2  result from a strong
northerly wind that carries  pollutant from the  high  emission growth northern
part to  the receptor. Thus, the concentration growth factors at lower percentiles
are much greater than the "emission growth factor" at the  receptor (=1.31). On
the other hand, higher percentile concentrations of future city 2 result from a
weak  southerly wind that blows over the low emission growth southern part of a
city.  Since  a  weak wind tends to magnify the  effect of  emission  growth on
concentrations as mentioned before, the concentration growth factors at higher
percentiles are about the same or a  little higher than  the  "emission  growth
factor"  at the receptor.  As  a result of this, the growth factors of future city 2
decrease sharply with percentile as seen from Figure 5.
    (e)  The growth factors of future city 1 decrease with percentile for a reason
similar to the above. However, the downward trend is much milder than that of
future city 2 because the center of the emission growth is located to the
northeast of the receptor instead of to the north, and is more distant from the
receptor point than that of future city 2.
    From the preceding discussion the  following qualitative statement can be
made as to percentile dependence of growth factors. When there is no preference
in wind direction  and speed, growth  factors  tend to increase with percentile
unless source  density  configuration  is  point symmetric  and  the receptor  is
located  at the  symmetric point. When  there is a dominant  wind direction from
which the wind blows strongly and frequently, growth factors at downwind receptors
from  the  emission growth center decrease with percentile and those at upwind
receptors increase with percentile.
    The effects of  concentration distribution and emission growth  pattern on
growth  factors have been discussed mainly because the growth factors are used
extensively in  the  literature (Larsen (1961) (1969);  Jensen  (1971); Ott et al.
(1967)). The real concern, however, is the effects of concentration distribution
and growth  pattern  on the reduction ratio that is given either by Equations 1 or
2 or 16. The sensitivity of the reduction ratio, R, to the growth factor, G, can
                                                                      15-9

-------
be checked by expanding R in a Taylor series about the correct value of G, G0,
and in terms of the deviation of G from the correct value, δ = G - G0:

                R(G) ≈ R(G0) + δ (dR/dG)|G0                          (23)

Setting B = 0 in Equations 1, 2, and 16 we can obtain

                (dR/dG)|G0 = [1 - R(G0)] / G0                        (24)

Thus we can write the reduction ratio as

                R(G) ≈ R(G0) + δ [1 - R(G0)] / G0                    (25)
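    Equation 24 can be verified directly; the following sketch differentiates the
zero-background rollback form symbolically (an illustration, not part of the original
paper):

    # Verify (a sketch) that dR/dG = (1 - R)/G for the rollback ratio with B = 0,
    # where R = (G*Xp - Xd)/(G*Xp) = 1 - Xd/(G*Xp).
    import sympy as sp

    G, Xp, Xd = sp.symbols('G X_p X_d', positive=True)
    R = (G*Xp - Xd) / (G*Xp)
    dRdG = sp.diff(R, G)
    print(sp.simplify(dRdG - (1 - R)/G))   # -> 0, consistent with Equation 24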
    Suppose that one made a mistake in estimating the future growth pattern and
ignored the effect of concentration distribution on growth factors. Thus, from
Figure 3 the difference between the growth factors at the 50-th percentile of
future city 1 and the 99-th percentile of future city 2 is about -0.25. Taking
R(G0) = 0.90, then from Equation 25, R(G) ≈ 0.87. This appears to be a negligible
effect. The real question, however, is how much of the total amount of
pollutants will remain when emissions are reduced according to the reduction
ratio. The relative difference in the amounts of remaining pollutants according
to the correct and incorrect reduction ratios is given by

        {[1 - R(G0)] - [1 - R(G)]} ep ∫ ρf (dr')² / {[1 - R(G0)] ep ∫ ρf (dr')²}
                = [R(G) - R(G0)] / [1 - R(G0)]                                   (26)

Substituting the values into Equation 26, the example yields about a 30%
difference in the amounts of remaining pollutants. This is not a negligible
effect.
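    A quick check of the 30% figure, using the two reduction ratios quoted above:

    # Equation 26 with R(G0) = 0.90 and R(G) = 0.87: the relative difference in
    # the amount of pollutant remaining.
    R_correct, R_incorrect = 0.90, 0.87
    relative_difference = (R_incorrect - R_correct) / (1.0 - R_correct)
    print(relative_difference)   # prints approximately -0.30, i.e. about 30% more
                                 # pollutant remains than intended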
    The results of this work are not conclusive as to the extent of the effects of
concentration distribution and emission growth pattern on calculating reduction
ratios. It is, however, obvious that a correct reduction ratio cannot be obtained
from emission growth only. Consideration of emission growth patterns and
concentration distributions should be included in rollback calculations.
15-10

-------
[Figure 15-1. Source configuration of imaginary city. The source density is a sum of
Gaussian functions, ρ(r) = Σ ρn(r - Rn), n = 1 to N. Present city: N = 2, with centers
R1 = 0 and R2 = 5000i. Future city 1: a third center at R3 = 5000i + 5000j. Future
city 2: a third center at R3 = 5000j. Distances in meters; i and j are unit vectors
toward the east and north.]
                                                15-11

-------
[Figure 15-2. Cumulative frequency diagram for the growth factor for case I:
Ux ~ N(0,100), Uy ~ N(0,100). Growth factor (ordinate) versus frequency in
percent (abscissa, 0.0 to 99.95).]
[Figure 15-3. Cumulative frequency diagram for the growth factor for case II:
Ux ~ N(0,25), Uy ~ N(5,100). Growth factor (ordinate) versus frequency in
percent (abscissa, 0.0 to 99.95).]
15-12

-------
[Figure 15-4. Cumulative frequency diagram for the growth factor for case III:
Ux ~ N(5,25), Uy ~ N(0,100). Growth factor (ordinate) versus frequency in
percent (abscissa, 0.0 to 99.95).]
[Figure 15-5. Cumulative frequency diagram for the growth factor for case IV:
Ux ~ N(0,25), Uy ~ N(-5,100). Growth factor (ordinate) versus frequency in
percent (abscissa, 0.0 to 99.95).]
                                                                     15-13

-------
[Figure 15-6. Cumulative frequency diagram for the growth factor for case V:
Ux ~ N(-5,25), Uy ~ N(0,100). Growth factor (ordinate) versus frequency in
percent (abscissa, 0.0 to 99.95).]

Acknowledgments
    The authors wish to thank Mr. Richard Kamens for fruitful discussions and
Professor Arthur  C.  Stern for his encouragement and helpful suggestions. This
work  was supported by the Environmental Protection Agency research project
R-800901.

References

Anon, 1971:  National Primary and Secondary Ambient Air Quality Standards.
      Federal Register. 36: 8187-8188.
Hanna,  S. R., 1971: A  Simple Method of Calculating Dispersion from Urban
      Area Sources. J. Air Pollution Control Association. 21: 774-777.
Jensen,  D., 1971: From  Air Quality Criteria to Control Regulations. Ford Motor
      Co. Publication No. 710303, pp. 67-74.
Larsen,  R. I., 1961:  A Method for Determining  Source Reduction Required to
      Meet  Air Quality  Standards. J.  Air Pollution Control Association.  11:
      71-76.
Larsen,  R. I., 1969: A New Mathematical Model of Air Pollutant Concentration
      Averaging Time and  Frequency. J. Air Pollution Control Association. 19:
      24-30.
Ott, W., Clarke,  J.  F., and Ozolins,  G.,  1967:  Calculating Future Carbon
      Monoxide Emissions and Concentrations from Urban Traffic Data. U. S.
      Dept. of HEW, Public Health Service, Bureau of Disease Prevention  and
      Environmental  Control, Public Health Service Publication No. 999-AP-41.
15-14

-------
APPENDIX I

Simple Multiple-Source Model
    The concentration at position r and time t due to all relevant physical
factors w can be expressed for a single type of source as

        Xs(r, t, w) = e ∫∫ ρ(r', t', w) A(r - r', t - t', w) (dr')² dt'          (I-1)

where
    Xs = concentration due to one type of source
    e = emission per unit source
    ρ = density of emission sources
    A = transfer function that relates sources at r', t' to the concentration at r, t.
    The total concentration is given by the sum of the background
concentration, B, and the concentration due to the one type of source, Xs.
Defining a function

        F(r, t, w) = ∫∫ ρ(r', t', w) A(r - r', t - t', w) (dr')² dt'             (I-2)

the total concentration can be written as

        X = B + e F                                                              (8)

Equation 8 indicates that the distribution of the total concentration, X, is
determined by the distribution of F through the physical factors w, position r,
and time t. Thus the a-th percentile value of X, Xa, is determined by the a-th
percentile of F, Fa, where the background concentration, B, is assumed to be
constant.
    The growth factor defined by the ratio (Xfa - B)/(Xpa - B) can be
expressed as

        Ga = (Xfa - B)/(Xpa - B) = ef Ffa / (ep Fpa)
           = [ef ∫∫ ρf(r', t', wf) A(r - r', tf - t', wf) (dr')² dt']
             / [ep ∫∫ ρp(r', t', wp) A(r - r', tp - t', wp) (dr')² dt']          (I-3)

where the integrals in Equation I-3 can be computed with the variables t and w that
would give the a-th percentiles of Fp and Ff.
                                                                  15-15

-------
APPENDIX II

Derivation of the Percentile Form of the Rollback Equations
     Assume that G = (Ffa/Fpa) is independent of percentile a and that the
rollback equations are valid for the 50-th percentile, i.e.,

                RL = [gL Xp50 - Xd50] / [gL Xp50 - B]                        (II-1)

                RJ = [gJ(Xp50 - B) - (Xd50 - B)] / [gJ(Xp50 - B)]            (II-2)

where
    Xp50 = the 50-th percentile value of Xp
    Xd50 = the 50-th percentile value of Xd.
The proportional  model, in general, can be written as

                          Xj  =  B + ejFj                       (11-3)

where subscript i  is used to generalize quantities for future, present and desired.
From the preceding discussions we have

                        Ffa = Fda = G Fpa                              (II-4)

                        ef = ep

                        ed = (1 - R) ep

                        R = RL or RJ

The a-th and 50-th percentiles of the present and desired air qualities can be
obtained, using RL, from Equations II-3 and II-4 as

                        Xpa = B + ep Fpa                               (II-5)

                        Xda = B + (1 - RL) ep G Fpa                    (II-6)

                        Xp50 = B + ep Fp50                             (II-7)

                        Xd50 = B + (1 - RL) ep G Fp50                  (II-8)
15-16

-------
    Any percentile value of Fp can be related to the 50-th percentile value of
Fp, Fp50, by a constant, βa:

                        Fpa = Fp50 + βa                                (II-9)

Substitutions of Equations II-5 and II-9 into Equation II-7 and Equations II-6
and II-9 into Equation II-8 yield, respectively,

                        Xp50 = Xpa - ep βa                             (II-10)

and

                        Xd50 = Xda - (1 - RL) ep G βa                  (II-11)

Substitution of Equations II-10 and II-11 into Equation II-1 yields

        RL = [gL(Xpa - ep βa) - {Xda - (1 - RL) ep G βa}] / [gL(Xpa - ep βa) - B]   (II-12)

Solving for RL we obtain

        RL = [gL(Xpa - ep βa) - Xda + ep G βa] / [gL(Xpa - ep βa) + ep G βa - B]

By similar steps Equation 19 can be derived from Equation II-2.
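    As a check on the algebra (an illustration, not part of the original derivation),
Equation II-12 can be solved for RL with a computer algebra system and compared with
the form obtained above:

    # Verify (a sketch) that solving Equation II-12 for R_L gives
    #   R_L = [g_L(X_pa - e_p*beta_a) - X_da + e_p*G*beta_a]
    #         / [g_L(X_pa - e_p*beta_a) + e_p*G*beta_a - B]
    import sympy as sp

    gL, Xpa, Xda, ep, G, beta, B, RL = sp.symbols('g_L X_pa X_da e_p G beta_a B R_L')
    eq = sp.Eq(RL, (gL*(Xpa - ep*beta) - (Xda - (1 - RL)*ep*G*beta))
                   / (gL*(Xpa - ep*beta) - B))
    solved = sp.solve(eq, RL)[0]
    expected = (gL*(Xpa - ep*beta) - Xda + ep*G*beta) \
               / (gL*(Xpa - ep*beta) + ep*G*beta - B)
    print(sp.simplify(solved - expected))   # -> 0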
                                                            15-17

-------
DISCUSSION
Larsen:  Thank you, Dr. Horie, for a thorough look at this problem. We might
consider one factor. Three  rollback equations could be used according to the
behavior  of  background  concentrations. We  have  heard of  two  rollback
equations, Rj which  refers to Jensen's article and RL  which refers to Larsen's
equation. Jensen's equation is the correct equation if background now remains
the same as  background later; a second rollback equation would be one in which
the concentration might  be doubling  and  the background would also be
doubling. In other words, a second rollback equation would be one in which the
background  growth factor equaled the urban growth factor. The Larsen rollback
equation is between these two, between a background growth factor of one and
a background growth  factor equal to the urban growth factor. For a situation
involving no background growth, the Jensen equation should be used. If growth
is intermediate the Larsen equation should be used. This intermediate situation
might be experienced, for instance, with  a northeast wind  blowing from Boston
to New  York to  Philadelphia to Baltimore to Washington.  As these places grow
together, the background grows together and background  increases. The Larsen
equation could be used for places growing together. In the middle of the great
plains, not affected by background from other cities, the Jensen equation could
be used.
Horie: Thank you very much. This is exactly so. We noticed this difference when
we considered the growth of the background concentration.
Smith: I think one fact which ought to be taken into account in this sort of
calculation is the variation due to the modification of the weather by the
pollutants themselves. If you reduce the  concentration of  such things as smoke
or photochemical components, then you may very well change the statistics of
the weather  and hence, get a change in your reduction factor.
15-18

-------
16. SYMPOSIUM PARTICIPANTS
Pat Adomitis
Dept. of ESE
School of Public Health
University of North Carolina
Chapel Hill, N. C. 27514

Gerald G. Akland
EPA — Div. of Atmospheric Surveillance
Research Triangle Park, N. C. 27711

James N. Arvesen
Dept. of Statistics
Math Science Bldg.
Purdue University
Lafayette, Indiana 47907

C. W. Ash
Dept. of Biostatistics
School of Public Health
University of North Carolina
Chapel Hill, N. C. 27514

R. E. Barlow
University of California
Berkeley, Ca. 94700

Joel  Barnett
Div.  of Air Pollution Control
C2-212 Cordell Hull Building
Nashville, Tennessee 37219

John J. Beauchamp
Oak  Ridge National Laboratory
Statistics Department
P. O. Box Y
Bldg. 9704-1
Oak  Ridge, Tennessee 37830

Michael M. Benarie
IRCHA Research Center
Boite Postale 1
91 Vert-le-Petit
France
Bernard Bloom
Allegheny Cty. Air Poll. Control Board
301 39th St.
Pittsburgh, Penn. 15201

John M. Bowman
501 N. 9th St.
Room  130
Safety, Health & Welfare Building
Richmond, Virginia 23219

Franklin Briese
Div. of Biometrics
Mail Container 2355
Univ. of Colorado Med. Center
Denver, Colorado 80220

Kenneth Calder
EPA — Div. of Meteorology
Research Triangle Park, N. C. 27711

Norman L. Canfield
19 Bayberry Ave.
Garden City, New York 11530

Charles R. Case
TRC
125 Silas Deane Highway
Wethersfield, Conn. 06109

C. Andrew Clayton
EPA
Research Triangle Park, N. C. 27711

David C. Collins
Technology Service Corp.
225 Santa Monica Blvd.
Santa Monica, Ca. 90401

R. E. Cooper
Environmental Analysis and Planning
Savannah River Laboratory
Aiken,  S. C. 29801
                                    16-1

-------
Arnold Court
California State University
Northridge, California

J. M. Craig
Southern Services Inc.
P. O. Box 2625
Birmingham, Alabama 35202

Alexander R. Craw
National Bureau of Standards
Washington, D. C. 20234

T. V. Crawford
Environmental Analysis and Planning
Savannah River Laboratory
Aiken, S. C. 29801

Loren W. Crow
2422 South Downing St.
Denver, Colorado 80210

E. James Dale
Dept. of ESE
School of Public Health
University of North Carolina
Chapel Hill, N. C. 27514

Eugene M. Darling, Jr.
DOT Transportation Systems Center
55 Broadway
Cambridge, Massachusetts 02147

William Delaware
Dept. of Envir. Conservation
Div. of Air  Resources
50 Wolf Road
Albany, New York

John B. Edwards
Chrysler Corp.
CIMS 418-18-22
P. O. Box 1118
Detroit, Michigan 48231

Isodore Enger
Geomet, Inc.
50 Monroe St.
Rockville, Maryland 20850
16-2
James A. Fay
Room 3246
MIT
Cambridge, Massachusetts 02139

Robert Faoro
EPA
Research Triangle Park, N. C. 27711

Martin A. Ferman
Research Laboratories
Department  E.V.
General Motors Technical Center
Warren, Michigan 48090

Doug Fox
EPA/NERC
Research Triangle Park, N. C. 27711

Neil H. Frank
EPA
Research Triangle Park, N. C. 27711

Joel Frockt
Department  of Biostatistics
School of Public Health
University of North Carolina

A. S. Galbraith
EPA
Research Triangle Park, N. C. 27711

Frank A. Gifford
NOAA-ATDL
P. O. Box E
Oak Ridge, Tennessee 37830

Steve Goranson
Office of Statistical Services
EPA/NERC
Research Triangle Park, N. C. 27711

Rebecca A. Gray
EPA/NERC
Div. of Chemistry and Physics
Research Triangle Park, N. C. 27711

-------
George W. Griffing
EPA/Div. of Meteorology
Research Triangle Park, N. C. 27711

Nathaniel Guttman
National Climatic Center
Federal Building
Asheville, North Carolina 28801

Howard R. Hammond
Baltimore Gas and Electric
Room 231
531 E. Madison St.
Baltimore, Maryland

David M. Hershfield
ARS-W Soils Building
Beltsville, Maryland 20705

C. Doyce Hester
Reynolds Metals Co.
P.O. Box 9177
Corpus Christi, Texas 78408

M. Eugene Hoffman
North Carolina State University
Raleigh, North Carolina 27607

George C. Holzworth
EPA/ Div. of Meteorology
Research Triangle Park, N. C. 27711

Yuji Horie
ESE, School of Public Health
University of North Carolina
Chapel Hill,  N. C.  27514

William F. Hunt
EPA
Research Triangle Park, N. C. 27711

Peter Imrey
Department of Biostatistics
School of Public Health
University of North Carolina
Chapel Hill, N. C. 27514
John S. Irwin
University of North Carolina
Chapel Hill, N. C. 27514

Warren G. Johnson
EPA
Research Triangle Park, N. C. 27711

Howard C. Jones
Dept. of Environmental Conservation
Div. of Air Resources
50 Wolf Road
Albany, New York

James Jones
Dept. of Chemical Engineering
University of Kentucky
Lexington, Kentucky

Surendra Joshi
ESE, School of Public Health
University of North Carolina
Chapel Hill, N. C. 27514

R. H. Ketterer
General Electric
Schenectady, New York

K. R. Knoerr
308 Biological Sciences
Duke University
Durham, North Carolina 27706

Joseph B. Knox
Lawrence Livermore Laboratory
University of California
P. O. Box 808
Livermore, California 94550

Lewis Kontnik
ESE, School of Public Health
University of North Carolina
Chapel Hill, North Carolina

Stanley  L. Kopczynski
EPA/NERC
Research Triangle Park, N. C. 27711
                               16-3

-------
 Lawrence L. Kupper
 Dept. of Biostatistics,
   School of Public Health
 University of North Carolina
 Chapel Hill, N. C.

 Ralph Larsen
 EPA/Met. Laboratory
 Research Triangle Park, N. C. 27711

 Russell  F. Lee
 EPA/OAP
 Research Triangle Park, N. C. 27711

 Stephen A. Lesh
 ESE, School of Public Health
 University of North Carolina
 Chapel  Hill, N. C. 27514

 Helmut Lieth
 Department of Botany
 University of North Carolina
 Chapel  Hill, N. C. 27514

 Jiumn W. Lin
 Room 402,
 Dept. of Environ. Control
320 N. Clark St.
 Chicago, Illinois 60610

 James W. Lingle
 Industrial Bio-Test Labs Inc.
 1810 Frontage Rd.
 Northbrook, Illinois 60002

 Gene R. Lowrimore
 EPA
 Research Triangle Park, N. C.  27711

David A. Lynn
33 Cogswell Ave.
Cambridge, Massachusetts 02140


George Manire
EPA
Research Triangle Park, N. C. 27711
16-4
Allan H. Marcus
Dept. of Mathematical Sciences
Johns Hopkins University
Baltimore, Maryland 21218

David McLeod
EPA
Research Triangle Park,  N. C. 27711

Thomas E. McMullen
EPA
Research Triangle Park,  N. C. 27711

Donald McNeil
Dept. of Statistics
Princeton University
Princeton, New Jersey 08540

David McNeils
Dept. of ESE
University of North Carolina
Chapel Hill, N. C. 27514

W. S. Meisel
Technology Service Corp.
225 Santa Monica Blvd.
Santa Monica, Ca. 90401

Edwin  L. Meyer
EPA/OMP
Research Triangle Park, N. C. 27711

Mark Murray
Dept. of Biostatistics
School of Public  Health
University of North Carolina
Chapel Hill, N. C. 27514

Carl  Nelson
Research Triangle Institute
Box  12194
Research Triangle Park, N. C. 27711

Harold E. Neustadter
NASA-LERC
21000 Brookpark Road
Cleveland, Ohio 44135

-------
Everett C. Nickerson
EPA
Research Triangle Park, N. C. 27711

Lawrence Niemeyer
EPA/Div. of Meteorology
Research Triangle Park, N. C. 27711

John Overton
ESE, School of Public Health
University of North Carolina
Chapel Hill, N. C. 27514

Don Pack
NOAA
8060 13th Street
Silver Spring, Maryland 20910

John J. Paulus
Westinghouse Electric Corp.
P. O. Box 9533
Raleigh, North Carolina

Robert Papetti
EPA
Waterside Mall
4th and M St. S.W.
Washington, D. C.

M. M. Pendergrast
Envir. Analysis and Planning
Savannah River Laboratory
Aiken, S. C.  29801

Bier Peters
North Carolina State  University
Raleigh, N. C. 27607

James Peterson
EPA/Div. of Meteorology
Research Triangle Park, N. C. 27711

John M. Pierrard
Petroleum Laboratory
E. I. DuPont de Nemours
Wilmington, Delaware 19898
Richard I. Pollack
University of California
Lawrence Livermore Laboratory
Livermore, California

Charles Proctor
Statistics Department
North Carolina State University
Raleigh, North Carolina 27607

William J. Raynor
Dept. of Biostatistics
School of Public Health
University of North Carolina
Chapel Hill, N. C. 27514

John H. Reynolds
2615 Selwyn Ave.
Charlotte, N. C. 28209

Donald M. Rote
Argonne National Laboratory
9700 S. Cass Ave.
Argonne, Illinois 60439

R. Ruff
EPA/Div. of Meteorology
Research Triangle Park, N. C. 27711

J. S. Rustagi
Division of Statistics
The Ohio State University
Columbus, Ohio 43210

Bernard E. Saltzman
Kettering Laboratory
Eden and Bethesda Ave.
Cincinnati, Ohio 45219

Don Thomas
Air Management Branch
880 Bay St.  4th Floor
Toronto, Ontario

Irving A. Singer
Smith-Singer Meteorologists, Inc.
189 Brooklyn Ave.
Massapequa, N. Y. 11758
                                                                     16-5

-------
Nozer Singpurwalla
School of Engineering
George Washington University
Washington, D. C. 20006

Bjarne Sivertsen
135 Clinton
Apt. 1U
Hempstead, New York 11550

Ralph C. Sklarew
EPA
Research Triangle Park, N. C. 27711

F. Barry Smith
Meteorological Office
Met-O-14
Bracknell, Berkshire, ENGLAND

Vernon M. Smith
Box 2723
Geography  Department
East Carolina University
Greenville, N.C. 27834

Ronald D. Snee
Engineering Dept.
E. I. DuPont de Nemours
Wilmington, Delaware 19898

David B. Spiegler
22 Fiske Rd.
Lexington,  Mass. 02173

Robert Spirtas
Department of Biostatistics
School of Public Health
University of North Carolina
Chapel Hill, N.C.  27514

William Stasick
Dept. of Environmental Conservation
Div. of Air  Resources
50 Wolf Road
Albany, New York
16-6
David J. Svendsgaard
Dept. of Biostatistics
School of Public Health
University of North Carolina
Chapel Hill, N. C. 27514

Arthur C. Stern
ESE, School of Public Health
University of North Carolina
Chapel Hill, N. C. 27514

Philip Sticksel
Battelle Columbus Laboratories
505 King Ave.
Columbus,  Ohio  43201

George C. Tiao
Dept. of Statistics
University of Wisconsin
Madison, Wisconsin 53706

George Touchton
ESE, School of Public Health
University of North Carolina
Chapel Hill, N.C. 27514

D. Bruce Turner
EPA/Div. of Meteorology
Research Triangle Park, N. C. 27711

Joseph R. Visalli
School of Civil Engineering
Purdue University
Lafayette,  Indiana 47907

Raymond C. Wanta
Consulting Meteorologist
28 Hayden Lane
Bedford, Massachusetts

Lowell Wayne
Pacific Environmental Services
2932 Wilshire Blvd.
Santa Monica, California 90403

-------
Robert Wevodau
Air Pollution Consulting Group
E. I. Dupont De Nemours
Wilmington, Delaware

J.G.Williams, Jr.
501 N. 9th St.
Room 130
Safety, Health and Welfare Bldg.
Richmond, Va. 23219

Peggy Wingard
VEPCO
P. O. Box 26666
Richmond, Virginia 23261

F. K. Wippermann
Technische Hochschule Darmstadt
Sektion Meterologie
6100 Darmstadt, GERMANY

Donald F. Worley
EPA
Research Triangle Park, N. C. 27711

Charles E. Zimmer
8160 Trabant Dr.
Cincinnati, Ohio 45242
                                                                 16-7

-------

-------
                                    TECHNICAL REPORT DATA
                            (Please read Instructions on the reverse before completing)
 1. REPORT NO.                 EPA-650/4-74-038
 3. RECIPIENT'S ACCESSION NO.
 4. TITLE AND SUBTITLE         Proceedings of the Symposium on Statistical Aspects
                               of Air Quality Data
 5. REPORT DATE                October 1974
 6. PERFORMING ORGANIZATION CODE
 7. AUTHOR(S)                  Lawrence D. Kornreich, Editor, Executive Director,
                               Triangle Universities Consortium on Air Pollution
 8. PERFORMING ORGANIZATION REPORT NO.
 9. PERFORMING ORGANIZATION NAME AND ADDRESS
                               Triangle Universities Consortium on Air Pollution
                               Post Office Box 2284
                               Chapel Hill, North Carolina 27514
10. PROGRAM ELEMENT NO.        1AA009
11. CONTRACT/GRANT NO.         68-02-0994
12. SPONSORING AGENCY NAME AND ADDRESS
                               Meteorology Laboratory
                               National Environmental Research Center
                               U.S. Environmental Protection Agency
                               Research Triangle Park, North Carolina 27711
13. TYPE OF REPORT AND PERIOD COVERED    Symposium Proceedings
14. SPONSORING AGENCY CODE
15. SUPPLEMENTARY NOTES
16. ABSTRACT
    The 15 papers in these proceedings analyze air quality data as a function of
    frequency, maxima, the form of the frequency distribution, and averaging time.
    Concentrations and frequency distributions calculated with meteorologic diffusion
    models are compared with observed values. Discussions that followed the paper
    presentations are included.
17. KEY WORDS AND DOCUMENT ANALYSIS
    a. DESCRIPTORS: Air quality data; Statistical analyses; Frequency distribution;
       Lognormal; Averaging time; Meteorology; Diffusion; Sulfur dioxide;
       Suspended particulate
    b. IDENTIFIERS/OPEN ENDED TERMS
    c. COSATI Field/Group
18. DISTRIBUTION STATEMENT     Unlimited
19. SECURITY CLASS (This Report)    Unclassified
20. SECURITY CLASS (This Page)      Unclassified
21. NO. OF PAGES               266
22. PRICE
EPA Form 2220-1 (9-73)
                                            17-1

-------

-------