Studies Of Pollutant Concentration Frequency Distributions


EPA-650/4-75-004

January 1975
Environmental Monitoring Series
                      Meteorology Laboratory
                National Environmental Research Center
                  Office of Research and Development
                 U.S. Environmental Protection Agency
                     Research Triangle Park, N.C.

-------

-------
                             EPA-650/4-75-004
             STUDIES
        OF  POLLUTANT
       CONCENTRATION
FREQUENCY  DISTRIBUTIONS
                 by
            Richard I. Pollack

        Lawrence Livermore Laboratory
          University of California
         Livermore, California 94550
         Program Element No. 1AA009
     National Environmental Research Center
       Office of Research and Development
      U.S. Environmental Protection Agency
   Research Triangle Park, North Carolina 27711


              January 1975

-------
                   RESEARCH REPORTING SERIES


Research reports of the Office of Research and Development, U. S. Environ-
mental Protection Agency,  have been grouped into series.   These broad
categories were established to facilitate further development and applica-
tion of environmental technology.  Elimination of traditional grouping was
consciously planned to foster technology transfer and maximum interface
in related fields.  These series are:

          1.  ENVIRONMENTAL HEALTH EFFECTS RESEARCH
          2.  ENVIRONMENTAL PROTECTION TECHNOLOGY

          3.  ECOLOGICAL RESEARCH

          4.  ENVIRONMENTAL MONITORING

          5.  SOCIOECONOMIC ENVIRONMENTAL STUDIES

          6.  SCIENTIFIC AND TECHNICAL ASSESSMENT REPORTS

          9.  MISCELLANEOUS

This report has been assigned to the ENVIRONMENTAL MONITORING
series.  This series describes research conducted to develop new or
improved methods and instrumentation for the identification and quanti-
fication of environmental pollutants at the lowest conceivably significant
concentrations.  It also includes studies to determine the ambient con-
centrations of pollutants in the environment and/or the variance of
pollutants as  a function of time or meteorological factors.

Copies of this report are available free  of charge to Federal employees,
current  contractors  and grantees, and nonprofit organizations - as supplies
permit - from the Air Pollution Technical Information Center, Environmental
Protection Agency, Research Triangle Park, North Carolina 27711.  This
document is also available to the public for sale through the Superintendent
of Documents, U.S.  Government Printing Office, Washington, D.C. 20402.
                        EPA REVIEW NOTICE

This report has been reviewed by the Office of Research and Develop-
ment, Environmental Protection Agency,  and  approved for publication.
Approval does not signify that the  contents necessarily reflect the
views and policies of the Environmental Protection Agency, nor does
mention of trade names constitute endorsement or recommendation for
use.

                  Publication No. EPA-650/4-75-004

-------
                                      PREFACE
    Air quality data have been analyzed as a function of frequency, maxima, the
form of the frequency distribution, and averaging time in the 15 papers composing
the "Proceedings of the Symposium on Statistical Aspects of Air Quality Data" (U.S.
Environmental Protection Agency Report No.  EPA-650/4-74-038, Research Triangle
Park, North Carolina 27711, October 1974).

    Dr. Pollack has drawn on these and other data in his analysis of air pollutant
concentration data and the frequency  distributions used to describe such data.   His
dissertation identifies the nature of the frequency distributions for both reactive and
inert pollutants, for both point and area sources, and to some extent for different
types of atmospheric conditions, using a substantially non-empirical approach.
Because of the valuable information presented in his dissertation, Dr.  Pollack
and the Lawrence Livermore Laboratory have given their kind permission to the
Meteorology Laboratory of EPA to publish  it for wider distribution.

-------

-------
                      TABLE OF CONTENTS
  I   INTRODUCTION
        The Problem
        The Research
        The Significance
 II   AIR QUALITY MEASUREMENTS
        The Derivation of a Frequency Distribution of a
          Pollutant Emitted from a Point Source
        Area Source
        An Extension of the "Gaussian Plume" Point Source
          Distribution Derivation
        A New Approach to the Derivation of Frequency
          Distributions of Pollutant Concentrations
        Lognormality Over Various Averaging Times
        Summary of Derivations
        Conclusion
III   FREQUENCY DISTRIBUTIONS OF RELATED
      VARIABLES
        Advection
        Diffusion
        Particle Sizes
        Conclusion
 IV   FREQUENCY DISTRIBUTIONS FOR VARIOUS
      POLLUTANTS AND SOURCE TYPES
        Reactive versus Inert Pollutants
        Point versus Area Sources
        Summary
  V   THE FREQUENCY DISTRIBUTIONS
        Lognormal Distribution
        Weibull Distribution
        Gamma Distribution
        Pearson Distribution
        Mathematical Similarity
        Summary

-------
 VI   ILLUSTRATIVE APPLICATIONS
        Analysis of Meteorological Patterns for
           Pollution Level Forecasting
             Selecting the Clusters
             Classifying New Days
             Recalibration
             Spatial Interpolation
             An Example
             Development
        Transition Matrices
        Random Sampling
VII   SUMMARY AND CONCLUSIONS
        Area Sources
        Point Sources
        Related Variables
        Other Distributions
        Applications
        Future Research
      LITERATURE CITED

-------
                 TABLE OF FIGURES AND GRAPHS
Fig. 1(a).    Autocorrelation versus lag for CO concentrations
             in San Francisco,  1970 hourly averages.              14

Fig. 1(b).    Autocorrelation versus lag for CO concentrations
             in San Francisco,  1970 hourly averages (log plot).     15

Fig. 1(c).    Autocorrelation at lag 1 versus averaging time
             for  CO concentrations in San Francisco, 1970
             hourly averages.                                    16

Fig. 2.      Concentration and windspeed frequency distributions
             for  CO and windspeed for San Francisco,  1970
             hourly averages.                                    22

Fig. 3.      Probability distribution of the squared temperature
             difference compared with lognormality.
             $P(\epsilon < \epsilon_0)$,  $\epsilon = (\Delta T)^2/\langle(\Delta T)^2\rangle$  (15).  Separation - 2 cm,
             104  samples per plot.                                24

Fig. 4(a).    Oxidant  concentration versus time in Los Angeles,
             hourly averages.                                    28

Fig. 4(b).    CO  concentrations versus time in San Francisco,
             hourly averages.                                    29

Fig. 5(a).    SO2 concentrations, direction 289°-308°, Lacq,
             France,   1968-1969 (3).                              31

Fig. 5(b).    NO2 concentrations, direction 108°-121°,  Lacq,
             France,  1968-1969 (3).                              32

Fig. 5(c).    SO2 concentrations, direction 108°-121°, Lacq,
             France,   1968-1969 (3).                              33

Fig. 5(d).    NO2 concentrations, direction 289°-308°,  Lacq,
             France,   1968-1969 (3).                              34

Fig. 6(a).    Log probability plot of oxidant concentrations in
             Los Angeles,  11/11/70, hourly averages.              35

Fig. 6(b).    Oxidant  concentrations in Riverside,  California and
             Los Angeles,  California,  1967 hourly averages.       35

Fig. 7.       Frequency curves  of the normal and lognormal
             distributions.                                        41

Fig. 8.       Frequency curves  of the lognormal distribution
             for three values of $\sigma^2$.                               42



-------
                                                                 Page
Fig. 9.      Frequency curves of the lognormal distribution
             for three values of $\mu$.                                 43

Fig. 10(a).   Regions of convergence where the sum of n
             lognormal variates is approximately lognormal.
             (A) Convergence for both normal and lognormal
             approximations, (B) convergence for the log-
             normal approximation,  (C) convergence
             uncertain (21).                                        45

Fig. 10(b).   CO concentration in San Francisco,  hourly
             averages.                                            46

Fig. 10(c).   CO concentration for various  categories of
             pollution days in San Francisco,  1970  hourly
             averages.                                            47

Fig. 11.     Frequency curves for Weibull (top) and Rayleigh
             probability distributions.                             51

Fig. 12.     Cumulative Weibull distribution plotted on log
             probability paper.                                    52

Fig. 13(a).   Frequency curves for gamma probability
             distribution for various values of $\alpha$.                   54

Fig. 13(b).   Cumulative gamma distributions plotted on log
             probability paper.                                    55

Fig. 14.     Skewness-kurtosis plane in Pearson's system.         57

Fig. 15.     Cumulative beta distribution plotted on log
             probability paper.                                    59

Fig. 16.     A possible set of air quality patterns.                  63

Fig. 17.     Geometric mean versus standard geometric
             deviation for individual days for oxidant con-
             centration in Los Angeles, California,  1970
             hourly averages.                                     65

Fig. 18.     A set of clusters of air quality day-types from
             the data in Fig.  17.                                   69

Fig. 19.     An example of the form of the chart to be
             developed in comparing the clusters generated
             in Fig. 18 with windspeed and temperature.           71

-------
ABSTRACT

Early air pollution research focused on determining the identity of the concentration
distributions for a variety of pollutants and locations and the relationships between attributes
of the data, e.g., mean values, maximum levels and averaging times, from an empirical
standpoint. This report attempts to identify the nature of the frequency distributions
for both reactive and inert pollutants, for both point and area sources, and to some
extent for different types of atmospheric conditions using a substantially non-empirical
approach. As an illustration of the applicability of these results, a predictive model and
a monitoring scheme are proposed based upon knowledge developed by studying the fre-
quency distributions.

It is found that a theory of the genesis of pollutant concentrations based upon the
Fickian diffusion equation predicts that concentration distributions due to area sources
will be approximately lognormal over a diurnal cycle in the absence of nearby strong
sources. It is determined that reactive pollutants will have larger standard geometric
deviations than relatively inert pollutants. Empirical observations are in good agree-
ment with these results. The frequency distribution of the logarithms of concentrations
due to point sources is derived and shown to be a sum of normal and chi-squared com-
ponents, with the identity of the dominant term determined by meteorological conditions.
This result provides a framework for resolving apparently conflicting results in the lit-
erature. The lognormality of other meteorological variables, notably windspeeds and the
rate of energy dissipation in turbulent flow, and their relation to air quality frequency
distributions is discussed. There is considerable discussion in the literature concern-
ing whether the lognormal distribution provides the best fit. Other distributions that
fit air quality data fairly well are investigated, and their mathematical similarity to the
lognormal is demonstrated.

As an illustration of the significance of the results developed herein, a predictive
scheme that uses concentration frequency distributions as a basis for classifying meteorological
patterns is presented. This scheme uses natural clustering of the distribution
parameters to identify meteorological and emission patterns. Finally, an air quality
monitoring random sampling scheme based upon the distributions identified in the litera-
ture and this work is presented and its improvement over non-parametric techniques is
demonstrated.

-------

-------
                    CHAPTER I —INTRODUCTION

                             The Problem
      In recent years public interest in the quality of ambient air has
increased.  As information concerning deleterious health effects has
gained wider acceptance, government has moved to specify standards
for the  quality of ambient air.  The standards are most often given in
terms of a maximum value which may be exceeded only once a
year for concentrations averaged over a specified period of time.
      The approach taken in relating air quality data to standards is  to
calculate the frequency distribution for the air quality data,  from
which concentrations at various averaging times can be derived.  It is
essential that these distributions  be very precise because of the
stunning economic  impact on a  region which must change its way of
life to conform to air quality standards.  As a result,  considerable
attention is being paid to this problem.
      The earliest  work on this problem consisted of the  empirical
identification  of the frequency distributions of surface air pollutant
concentrations.   Various distributions were proposed  with different
degrees of success.  The most widely accepted of these distributions
is the lognormal,  primarily due to the work of Larsen (1), who presented
data indicating that concentrations of all pollutants in all tested
cities for all averaging times are approximately lognormally distributed.
It was also noted from these data, however, that some
pollutants tended to fit better than others, there were  differences
between cities, and it was not clear why averages of lognormal vari-
ables should be lognormal rather than normal as the Central Limit
Theorem would indicate.
      Later work determined that different distributions and/or dif-
ferent ranges  of parameter values were appropriate in different cir-
cumstances.  Marked differences were noted between  inert and re-
active pollutants and point and area sources.   Conflicting results are
presented in the literature concerning these distributions.  It is clear
that an  understanding of these results is important because of the
economic impact of the decision that ambient air quality standards

-------
(AAQS) are being violated.   Further,  an understanding of the nature of
the distributions and their parameters in the various cases can serve
only to enhance our understanding of the fundamental principles involved.
      There are,  of course, more pragmatic applications of this re-
search.  In particular, the formulation and validation of air pollution
models cannot proceed without some knowledge of the form of the
output to be expected.  The effects of various types of sources on
ambient air quality can be estimated through knowledge of the form of
the resulting concentration distributions and how the  parameters vary
with source type,  pollutant type and distance from the source.  Con-
siderable savings in time and money can be made in air quality
monitoring, prediction and modeling through applications of the tech-
niques presented herein.
      At present,  the scientific community has not reached a consensus
concerning the points raised above.  There are a number of conflicting
empirical results, and there is little work on a more theoretical level.
The present work seeks to add new information to the discussion,
derived from a non-empirical viewpoint.

                            The  Research
      The objective of this work  is to present a model which binds to-
gether previous theoretical and empirical findings within a  unified
framework.
      First, the frequency distribution of surface air pollutant con-
centrations  is derived starting from the differential equation describing
the time evolution of air pollutants through the  atmosphere.  It is shown
that for certain fairly general conditions, the distribution is lognormal.
      Using the "Gaussian Plume" equation, which describes the disper-
sion of a pollutant from a point source as a spatial bivariate normal
distribution, the concentration distribution resulting  from  a point source
is derived.  It  is shown here that the identity of the distribution is
dependent upon the distance from the source, the atmospheric stability
conditions,  and the magnitude of the windspeeds.   Within this framework
several apparently conflicting results from the literature (2), (3) can be
reconciled.

-------
      One of the most significant of Larsen's empirical results was the
fact that pollutant concentrations are approximately lognormally distributed
for a wide spectrum of averaging times.  This appears to
contradict the Central Limit Theorem of mathematical statistics.  However,
through the model presented herein the averaging process can
be seen as a filter of various scales of atmospheric motion, each of
which results in the lognormal.
      Several investigators have proposed other distributions for describing
air quality data.  The Weibull, beta, and gamma distributions
are the most often suggested.  An obvious question concerns the reason,
in mathematical terms, that these distributions fit the same data fairly
well.  It is shown later that a transformation can be found which reduces
the lognormal, gamma, and Weibull distributions to very similar
forms.   This suggests  that there is little significant difference between
these distributions for  the parameter values often observed.
      A variety of other meteorological variables are approximately
lognormally distributed, particularly those  describing atmospheric
motion,  or motion of substances suspended  in the atmosphere.  The
relationship between several of these variables and pollutant concentrations
is discussed.
      Rather than merely stating these derived results,  considerable
attention has been paid to empirical evidence both from the literature
and compiled for this work.  The assertions made herein are supported
by data  which are presented concomitant with the non-empirical results.

                           The Significance
      This study treats the problem of the identification of the frequency
distributions of air quality data comprehensively.   The following are
discussed:
      1.  Which parametric distributions are appropriate to characterize
          pollutant concentrations from point and area sources and for
          inert and reactive pollutants?  Why?
      2.  What is the effect of averaging time on these frequency
          distributions?

-------
      3.   How are these distributions affected by other meteorological
          variables?
      4.   How can this information be applied?
      It is concluded that the  information developed here can be applied
to developing models for alert level forecasting,  air quality monitoring,
and more.

-------
           CHAPTER II— AIR QUALITY MEASUREMENTS

      Air quality data are continuously monitored,  with the receptors
punching out one reading every 5 minutes.  These raw data are then
averaged as follows:
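      Let
$$x_1, x_2, \ldots, x_i, \ldots, x_n$$

be a sequence of 5-minute observations.  The averages
$$\frac{x_1 + x_2 + \cdots + x_K}{K}, \qquad \frac{x_{K+1} + x_{K+2} + \cdots + x_{2K}}{K}, \qquad \frac{x_{2K+1} + x_{2K+2} + \cdots + x_{3K}}{K}, \qquad \ldots$$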

are then calculated and referred to as the averages of time K.  These
averages are the standard concentration measurements used  by re-
searchers and analysts.
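
The block-averaging step described above can be sketched in a few lines of Python. This is only an illustration: the function name `block_averages` and the sample readings are hypothetical, and dropping a trailing partial block is an assumption of the sketch, not something the report specifies.

```python
from statistics import fmean


def block_averages(readings, k):
    """Average consecutive, non-overlapping blocks of k readings.

    A trailing partial block is dropped (an assumption of this sketch),
    so every average covers exactly k readings.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    n_blocks = len(readings) // k
    return [fmean(readings[i * k:(i + 1) * k]) for i in range(n_blocks)]


# Hypothetical 5-minute CO readings (ppm); 12 readings make one hourly average.
five_minute = [2.0, 2.4, 2.2, 2.6, 3.0, 2.8, 3.2, 3.4, 3.0, 2.8, 2.6, 2.4,
               2.2, 2.0, 1.8]
hourly = block_averages(five_minute, 12)
```

With K = 12 the fifteen readings yield a single hourly average; the last three readings form an incomplete block and are ignored.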
      Air quality standards are  set based upon the relationship between
air pollution exposure and health effects. A comprehensive exposition
of these  standards can be found  in Ref. (4).   Updated information is
published by the Environmental  Protection Agency (EPA) Office of Air
Programs.
      To compare ambient air quality to standards, frequency distribu-
tions  of the  averages are calculated.   The cumulants of these distribu-
tions  are used to determine the  probability of exceeding a particular
standard.
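
As a rough illustration of how a fitted frequency distribution translates into an exceedance probability, the sketch below assumes the averages are lognormally distributed and are summarized by a geometric mean and a standard geometric deviation. The function name, the standard of 35 ppm, and the parameter values are hypothetical and chosen only for demonstration.

```python
from math import log
from statistics import NormalDist


def lognormal_exceedance(standard, geo_mean, geo_sd):
    """Probability that a lognormally distributed concentration exceeds `standard`."""
    if standard <= 0 or geo_mean <= 0 or geo_sd <= 1:
        raise ValueError("need standard > 0, geo_mean > 0 and geo_sd > 1")
    # ln(concentration) is normal with mean ln(geo_mean) and std dev ln(geo_sd).
    z = (log(standard) - log(geo_mean)) / log(geo_sd)
    return 1.0 - NormalDist().cdf(z)


# Hypothetical hourly averages: geometric mean 3 ppm, standard geometric deviation 1.8.
p_exceed = lognormal_exceedance(standard=35.0, geo_mean=3.0, geo_sd=1.8)
# Expected number of hourly averages above the standard in a year of 8760 hours.
expected_hours_per_year = 8760 * p_exceed
```

Comparing `expected_hours_per_year` with the allowed number of exceedances is one way to relate a fitted distribution to a once-per-year standard; it treats the hourly averages as draws from a single distribution, which ignores their serial correlation.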
      This chapter concerns itself with determining the nature of these
distributions from a non-empirical standpoint.

            The Derivation of a  Frequency Distribution of a
                Pollutant Emitted from a Point Source
      The earliest discussion of pollutant concentration frequency dis-
tributions was by Frank Gifford in 1958 (5).  Gifford started with the
equation describing the diffusion of a plume of stack effluent.  This
"Gaussian Plume" equation,

-------
$$\frac{X}{Q} = (2\pi Y^2 U)^{-1} \exp\left[-\frac{(y - D_y)^2 + (z - D_z)^2}{2Y^2}\right] \tag{II-1}$$
where
           $Y^2$ is the variance of the material in individual disk
               elements,
           $Q$ is the continuous rate of emission,
           $X/Q$ is the instantaneous relative concentration,
           $U$ is the magnitude of the wind vector,  and
           $D_y$, $D_z$ are the distances of the plume center from the
               origin,
can be simplified by defining
$$\mathcal{Y} = \frac{y - D_y}{(2Y^2)^{1/2}}, \qquad \mathcal{Z} = \frac{z - D_z}{(2Y^2)^{1/2}}. \tag{II-2}$$

Therefore,
$$\mathcal{Y}^2 + \mathcal{Z}^2 = -\ln\left(C_1 \frac{X}{Q}\right), \tag{II-3}$$

where
$$C_1 = 2\pi Y^2 U. \tag{II-4}$$

The terms $\mathcal{Y}^2$ and $\mathcal{Z}^2$ are each chi-squared variables, and by the reproductive
property of  chi-squared variates the result of the convolution
is also chi-squared. Hence the natural log of the concentration is
directly proportional to a chi-squared random variable.
      Note that this result applies to a point source only.
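
The chi-squared claim can be checked numerically. The sketch below assumes, for illustration only, that the plume-center offsets y - D_y and z - D_z are independent, zero-mean normal variates with variance Y^2. Under that assumption each scaled offset has variance 1/2, so the sum of the two squared scaled offsets is half a chi-squared variate with 2 degrees of freedom, that is, an exponential variate with mean 1 and variance 1. The function name and the plume variance of 4.0 are hypothetical.

```python
import random
from math import sqrt
from statistics import fmean, pvariance


def scaled_offset_sums(n_samples, plume_variance, seed=0):
    """Draw values of the sum of the two squared scaled plume offsets."""
    rng = random.Random(seed)
    sigma = sqrt(plume_variance)        # assumed std dev of (y - D_y) and (z - D_z)
    scale = sqrt(2.0 * plume_variance)  # the (2 Y^2)^(1/2) divisor in the definitions
    sums = []
    for _ in range(n_samples):
        y_scaled = rng.gauss(0.0, sigma) / scale
        z_scaled = rng.gauss(0.0, sigma) / scale
        sums.append(y_scaled ** 2 + z_scaled ** 2)
    return sums


samples = scaled_offset_sums(20000, plume_variance=4.0)
sample_mean = fmean(samples)     # close to 1 for a unit-mean exponential
sample_var = pvariance(samples)  # close to 1 for a unit-mean exponential
```

Because the offsets are divided by a scale built from the same variance, the result does not depend on the value chosen for `plume_variance`.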
                             Area Source
      In 1972 Gifford expanded this derivation to include area sources
by summing a number of point sources (6).

-------
      For n sources which affect concentration at the same point,
Eq.  (II-3) is summed over all n  sources yielding:
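$$-\sum_{i=1}^{n} \ln\left(C_i \frac{X_i}{Q_i}\right) = \sum_{i=1}^{n} \left(\mathcal{Y}_i^2 + \mathcal{Z}_i^2\right) \tag{II-5}$$
which can be written
$$\ln \prod_{i=1}^{n} \left[C_i \frac{X_i}{Q_i}\right]^{1/n} = -\frac{L}{n} \tag{II-6}$$
where $L$ denotes the sum on the right-hand side of Eq. (II-5), the term within the brackets is the weighted geometric mean, and
the right-hand side is normally distributed by the  Central Limit Theorem.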
If the geometric and arithmetic means are  simply related, e.g. propor-
tional, then the natural logarithm of  concentration is normally dis-
tributed.
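
The appeal to the Central Limit Theorem can be illustrated with a small simulation. The sketch assumes that each source contributes an independent exponential term with unit mean; this equal-scale assumption and the function names are hypothetical simplifications. The skewness of the mean of n such terms is 2 divided by the square root of n, so it shrinks toward the normal value of zero as more sources are summed.

```python
import random
from statistics import fmean, pstdev


def skewness(values):
    """Sample skewness: the mean cubed standardized deviation."""
    m = fmean(values)
    s = pstdev(values, mu=m)
    return fmean([((v - m) / s) ** 3 for v in values])


def mean_of_sources(n_sources, n_trials, seed=0):
    """Simulate the per-source mean of n_sources unit-mean exponential terms."""
    rng = random.Random(seed)
    return [fmean([rng.expovariate(1.0) for _ in range(n_sources)])
            for _ in range(n_trials)]


skew_single = skewness(mean_of_sources(1, 5000))   # one source: strongly skewed
skew_many = skewness(mean_of_sources(50, 5000))    # fifty sources: nearly symmetric
```

A single source leaves the log-concentration term markedly skewed, while fifty sources bring it close to symmetric, which is the sense in which the area-source result tends to the lognormal.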

                An Extension of the  "Gaussian Plume"
                 Point Source Distribution Derivation
      The standard Gaussian Plume equation uses U, the mean wind-
speed, as a parameter.  However, there is considerable evidence  to the
effect that windspeeds are approximately lognormally distributed,  and
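this may be expected to affect the concentration distribution.  If we return to
$$\mathcal{Y}^2 + \mathcal{Z}^2 = -\ln\left(C_1 \frac{X}{Q}\right) \tag{II-7}$$
where
$$C_1 = 2\pi Y^2 U, \tag{II-8}$$
$\mathcal{Y}$ and $\mathcal{Z}$ are defined in terms of the parameters of the Gaussian Plume
equation and are assumed normally distributed.
      This equation can be written
$$-\ln\left(\frac{X}{Q}\right) = \ln\left(2\pi Y^2 U\right) + \mathcal{Y}^2 + \mathcal{Z}^2. \tag{II-9}$$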

-------
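Defining $K_1 = 2\pi Y^2$, one finds
$$-\ln\left(\frac{X}{Q}\right) = \ln K_1 + \ln U + \mathcal{Y}^2 + \mathcal{Z}^2. \tag{II-10}$$
The term $\mathcal{Y}^2 + \mathcal{Z}^2$ is exponentially distributed or chi-squared with 2 degrees of freedom,
$\ln K_1$ is a constant, and $\ln U$ is normally distributed.  If the $\ln U$ term dominates
the $\mathcal{Y}^2 + \mathcal{Z}^2$ term [condition (II-11)],
then the lognormal distribution results.  Thus the two results are reconciled.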
      Equation (II-10) is intuitively reasonable for periods of non-negligible
wind because $\ln U$ is a measure of advective flux while $\mathcal{Y}^2 + \mathcal{Z}^2$
is a measure of diffusive flux.  Except in periods of extremely low winds,
the advective flux will be the larger.  During low wind periods,
apparently the lognormal approximation will be poor.  The author knows
of no such  empirical analysis or theoretical analysis at this  time.  In-
deed, atmospheric modeling during calm conditions is in its infant stage.
      In short,  the exponential distribution is appropriate under the
assumption of constant wind velocity.  However,  windspeeds are ordinar-
ily lognormally distributed, and the advective flux is of greater signif-
icance than the diffusive flux.  Under these more general conditions we
see that X/Q is  approximately lognormally distributed for periods of non-
negligible wind.  No conclusion is reached for calm periods  save that the
lognormal  approximation is likely to be poor.  We realize, of course,
that the concentration must be nonzero at all times for the lognormal to
be correct.  For most pollutants the background  concentration is enough
to satisfy this condition.
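
The competition between the wind term and the diffusive term can be sketched numerically. The simulation below adds a normally distributed log-windspeed to a unit-mean exponential diffusive term and measures the skewness of the negative log of the relative concentration. The unit-mean exponential, the function names, and the two spreads of the log-windspeed are assumptions chosen to display both regimes; they are not measured wind statistics. For an independent sum of this kind the skewness is 2/(1 + s^2)^(3/2), where s is the standard deviation of the log-windspeed.

```python
import random
from statistics import fmean, pstdev


def skewness(values):
    """Sample skewness: the mean cubed standardized deviation."""
    m = fmean(values)
    s = pstdev(values, mu=m)
    return fmean([((v - m) / s) ** 3 for v in values])


def neg_log_relative_concentration(n, log_k1, mu_ln_u, sigma_ln_u, seed=0):
    """Simulate ln K1 + ln U + (diffusive term), with ln U normally distributed."""
    rng = random.Random(seed)
    return [log_k1 + rng.gauss(mu_ln_u, sigma_ln_u) + rng.expovariate(1.0)
            for _ in range(n)]


# Nearly constant wind: the exponential (chi-squared) term sets the shape.
skew_steady = skewness(neg_log_relative_concentration(20000, 0.0, 1.0, 0.1))
# Strongly variable wind: the normal ln U term dominates, so X/Q is near lognormal.
skew_variable = skewness(neg_log_relative_concentration(20000, 0.0, 1.0, 3.0))
```

The first case reproduces the skewed, exponential-like behavior of the constant-wind derivation, while the second is close to symmetric, corresponding to an approximately lognormal relative concentration.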

                 A New Approach to the Derivation of
         Frequency Distributions of Pollutant Concentrations
     Also in 1972, Knox and Pollack (7) derived the following relation-
ship from theoretical  considerations, using a substantially different
approach.

-------
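      Consider a stochastic process of the form:
$$X_j - X_{j-1} = Y_j X_{j-1} \tag{II-12}$$
where $Y_j$ is an independent stochastic variable,  arbitrarily distributed.
      If we solve Eq. (II-12) for $Y_j$
$$\frac{X_j - X_{j-1}}{X_{j-1}} = Y_j \tag{II-13}$$
and sum both sides
$$\sum_{j=1}^{N} \frac{X_j - X_{j-1}}{X_{j-1}} = \sum_{j=1}^{N} Y_j, \tag{II-14}$$
we can approximate the left side by
$$\int_{X_0}^{X_N} \frac{dX}{X} = \ln X_N - \ln X_0, \tag{II-15}$$
so that
$$\ln X_N = \ln X_0 + \sum_{j=1}^{N} Y_j. \tag{II-16}$$
By the Central Limit Theorem,  $\sum_{j=1}^{N} Y_j$ is normally distributed,  hence
$X_N$ is lognormally distributed.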
      This is known as the law of proportional effect; the percentage
change in a variable is equal to a constant plus an error.  If the absolute
change had been equal to this  same constant-plus-error term,  the
normal distribution would have resulted.  Hence the lognormal distribu-
tion  is the result of a multiplicative process, whereas the normal dis-
tribution results from an additive process.

-------
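      Consider the differential equation describing the time evolution
of pollutants in the  atmosphere:
$$\frac{\partial \psi_a}{\partial t} + u \frac{\partial \psi_a}{\partial x} + v \frac{\partial \psi_a}{\partial y} + w \frac{\partial \psi_a}{\partial z} = \frac{\partial}{\partial x}\left(K_x \frac{\partial \psi_a}{\partial x}\right) + \frac{\partial}{\partial y}\left(K_y \frac{\partial \psi_a}{\partial y}\right) + \frac{\partial}{\partial z}\left(K_z \frac{\partial \psi_a}{\partial z}\right) + \frac{S_a + P}{V} \tag{II-17}$$
where $\psi_a$ is the concentration of pollutant a; u, v, and w are the velocity
components; $K_x$, $K_y$ are the lateral eddy diffusivities, which are
lognormally distributed based  upon the lognormality of $\epsilon$ and  the reproductive
properties; $K_z$ is the vertical eddy diffusivity; $S_a$ is the source
term for pollutant a; P is the term representing changes in concentration
due to photochemistry; and V is the volume of air for  which S and
P act.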
      This equation can be manipulated to represent a box model
formulation (8) where  we are concerned with the concentration averaged
over a box which is  surrounded by M other boxes.

$$\frac{d\psi_k(m,t)}{dt} = -\sum_{j=0}^{M} \left[T_A(m,j) + T_D(m,j)\right] \psi_k(m,t) + \sum_{j=0}^{M} \left[T_A(j,m) + T_D(j,m)\right] \psi_k(j,t) + S_k(m,t) + P_k\left[\psi_a(m,t), \ldots, \psi_n(m,t), t\right] \tag{II-18}$$

where $T_A(m,j)$ and $T_D(m,j)$ are the advective and eddy diffusive transfer
coefficients from box  m to box j.  The lognormal distribution can be
argued for these latter variables in a similar manner as for $K_x$ and $K_y$.
      This equation is also  consistent with the generating process,
Eq.  (II-12), when certain reasonable restrictions hold.
      1.   The contributions of the advection and diffusion terms are larger
          than the contribution of the source term.  It has been found
          empirically that if this is not the case, lognormality does not
          result (9).

-------
      2.   The concentrations in the surrounding boxes are on the aver-
           age over long periods of time close to that of the box we are
           interested in because they are subjected to similar stimuli.
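      These restrictions transform Eq. (II-18) to:
$$\frac{d\psi_k(m,t)}{dt} = -\sum_{j=0}^{M} \left[T_A(m,j) + T_D(m,j)\right] \psi_k(m,t) + \sum_{j=0}^{M} \left[T_A(j,m) + T_D(j,m)\right] \psi_k(j,t). \tag{II-19}$$

Suppose we let

$$\psi_k(j,t) = \psi_k(m,t) + E_k(j,t); \tag{II-20}$$

the equation becomes

$$\frac{d\psi_k(m,t)}{dt} = -\sum_{j=0}^{M} \left[T_A(m,j) + T_D(m,j)\right] \psi_k(m,t) + \sum_{j=0}^{M} \left[T_A(j,m) + T_D(j,m)\right] \psi_k(m,t) + \sum_{j=0}^{M} \left[T_A(j,m) + T_D(j,m)\right] E_k(j,t). \tag{II-21}$$

When we sum both sides to show lognormality,  we have for the third
term,
$$\sum_{t} \sum_{j=0}^{M} \left[T_A(j,m) + T_D(j,m)\right] E_k(j,t). \tag{II-22}$$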
      From meteorological reasoning we note that if the flux term is
large, indicating strong winds,  the difference between $\psi(j,t)$ and $\psi(m,t)$
will be small.  Hence the term tends to zero.  Conversely,  in the case

-------
where the error term $E_k(j,t)$ is large the flux term will usually be small,
indicating light winds.  Furthermore, in either case or any combination
of cases occurring between $T_A$ and $T_D$,  we can expect that the sign of
the term will vary over a diurnal,  weekly or seasonal cycle, implying that
the positive and negative terms will cancel each other.
      This argument implies that Eq. (II-21) is essentially equivalent to
Eq.  (II-12), which is consistent with the law of proportional  effect.
      The solution will be  source-dominated only when the magnitude of
the source terms  is comparable to the magnitude of the current con-
centration.  There is reason to believe  (9) that in such cases the con-
centrations will not be lognormally distributed, as the model indicates.
This result has also been noted in  investigations of particle size dis-
tributions (10).
      This reasoning is most easily justified for a well mixed urban
region.  It is not clear that the lognormal distribution will fit as well for
non-urban,  poorly mixed areas.  We do feel, however, that the  charac-
teristics of an area's topography and typical meteorology would have to
be highly unusual for Eq. (II-22) to be so large that the lognormal distribution
would fit poorly.
      We have not yet discussed the Δt interval necessary for these
results. We recognize that it must be sufficiently small not to obscure
the generating process. If, for an extreme example, Δt were six months,
we would not see the effect of Eq. (II-12) because the effect of ψ_{i-1} on
ψ_i would have long since died out. Larsen's (1) data are for 5-minute
instantaneous readings. We accept this as an appropriate time scale for
our purposes, based on the fact that meteorology certainly does not
change enough in a 5-minute period to obscure the relevant correlations.

             Lognormality Over Various Averaging Times
      When the data are averaged over other time periods within the
realm of atmospheric motion,  the  averaging time acts as a filter which
smooths out motions of a smaller time scale.  This has the effect of
allowing us to see only motion of a time scale comparable to the averaging
time in the averaged data. Hence the process described by
Eq. (II-21) still holds for larger averaging times, but the TA, TD terms
now represent motion of a larger scale. This results in lognormality
over a large spectrum of averaging times.
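
The filtering effect of the averaging time can be sketched numerically. The following is a minimal illustration, not the procedure used in this study: the hourly series is synthetic, and the autoregressive coefficient (0.87), the shock scale (0.25), and the log-mean (1.1) are assumed values chosen only to resemble hourly CO data. The helper names `block_average` and `lag1_autocorrelation` are hypothetical.

```python
import numpy as np

def block_average(series, width):
    """Mean of consecutive, non-overlapping blocks of `width` samples."""
    series = np.asarray(series, dtype=float)
    n_blocks = series.size // width          # drop an incomplete final block
    return series[: n_blocks * width].reshape(n_blocks, width).mean(axis=1)

def lag1_autocorrelation(series):
    """Sample autocorrelation at lag 1."""
    d = np.asarray(series, dtype=float) - np.mean(series)
    return float(np.dot(d[:-1], d[1:]) / np.dot(d, d))

# Synthetic hourly concentrations (assumed stand-in for measured data):
# the log-concentration follows a first-order autoregressive process.
rng = np.random.default_rng(1)
log_c = np.zeros(20000)
for i in range(1, log_c.size):
    log_c[i] = 0.87 * log_c[i - 1] + 0.25 * rng.normal()
hourly = np.exp(1.1 + log_c)

# Averaging time in hours, number of averages, variance, lag-1 autocorrelation of logs
for hours in (1, 4, 12, 84, 168):
    averaged = block_average(hourly, hours)
    print(hours, averaged.size,
          round(float(averaged.var()), 3),
          round(lag1_autocorrelation(np.log(averaged)), 3))
```

Longer averaging times leave fewer data points and a smaller variance, because fluctuations faster than the averaging time are smoothed out.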
      An essentially equivalent relation to Eq. (II-12) is

            x_i = x_{i-1} \epsilon_i .

This equation can be transformed to represent a first-order autoregressive
stochastic process by taking the logs of both sides to yield:

            \ln x_i = \ln x_{i-1} + \ln \epsilon_i .

This stochastic process is identified by examining the autocorrelation
function to verify that it decays exponentially, and by calculating the
partial correlation function to verify that it cuts off after lag 1. The
partial correlation coefficient at lag k can be thought of as a measure of
the independent predictive capability of x_{i-k}, without regard to
information which is "passed through" the intervening observations
x_{i-1}, ..., x_{i-k+1}.
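
The identification step can be sketched as follows. This is an illustrative sketch under stated assumptions, not the computation performed for this report: the log-series is simulated with an assumed autoregressive coefficient of 0.87, and the partial correlations are obtained from the sample autocorrelations by the Durbin-Levinson recursion, which is one standard way to compute them. The function names are hypothetical.

```python
import numpy as np

def autocorrelation(series, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag."""
    d = np.asarray(series, dtype=float) - np.mean(series)
    denom = np.dot(d, d)
    return np.array([np.dot(d[:-k], d[k:]) / denom
                     for k in range(1, max_lag + 1)])

def partial_autocorrelation(r):
    """Partial autocorrelations from r_1 .. r_K (Durbin-Levinson recursion)."""
    r = np.asarray(r, dtype=float)
    pacf = np.empty(r.size)
    phi = np.array([r[0]])                   # coefficients of the order-1 fit
    pacf[0] = r[0]
    for k in range(2, r.size + 1):
        num = r[k - 1] - np.dot(phi, r[k - 2::-1])
        den = 1.0 - np.dot(phi, r[: k - 1])
        phi_kk = num / den
        phi = np.append(phi - phi_kk * phi[::-1], phi_kk)
        pacf[k - 1] = phi_kk
    return pacf

# Simulated log-concentrations (assumed first-order autoregressive process).
rng = np.random.default_rng(0)
n, coefficient = 20000, 0.87
log_x = np.zeros(n)
for i in range(1, n):
    log_x[i] = coefficient * log_x[i - 1] + rng.normal()

acf = autocorrelation(log_x, 12)
pacf = partial_autocorrelation(acf)
print(np.round(acf[:3], 3))    # decays roughly geometrically with lag
print(np.round(pacf[:3], 3))   # first value equals acf[0]; the rest are near zero
```

For a first-order process the autocorrelations fall off roughly as a power of the lag-1 value, while the partial correlations beyond lag 1 are indistinguishable from zero.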
      To verify the hypothesis presented above, these statistics were
calculated for natural logs of CO concentrations in San Francisco for
1970 as well as the untransformed observations. The autocorrelation
function for lags 1 to 12 appears in Fig. 1(a) and Table 1,
and is plotted on a log scale in Fig. 1(b). The agreement with the
exponential curve appears good. The partial correlation coefficient
is equal to the autocorrelation coefficient for lag 1, of course, but
thereafter it is statistically negligible. In particular, the values are
0.874, -0.046, and 0.015 respectively for the first three lags. The
additive model does not appear to fit the first-order autoregressive
model as well.
      Figure 1(c) indicates that the multiplicative model appears to
be more appropriate for averaging times of 1 hour to 180 hours. A
statistical test on the differences between the autocorrelations from
the multiplicative as opposed to the additive model was performed. The
results indicated that the autocorrelations are indeed significantly
different. Table 2 presents several such calculations for representative
lags and averaging times.

Fig. 1(a). Autocorrelation versus lag for CO concentrations in San
          Francisco, 1970 hourly averages. [Plot not reproduced: curves
          for log-transformed and untransformed data; horizontal axis,
          lag time in hours (0 to 12).]

Table 1.  Autocorrelation and averaging time for CO concentrations in
          San Francisco, 1970 hourly averages.

Autocorrelation   Variance   Standard    Mean     No. data   Averaging
(lag 1)                      deviation            points     time (days)

Untransformed data
0.834             3.943      1.986       3.488    20073       0.042
0.648             3.153      1.776       3.488     5018       0.167
0.576             2.285      1.512       3.488     1672       0.500
0.623             1.267      1.125       3.486      238       3.500
0.642             1.000      1.000       3.486      119       7.000
0.604             0.826      0.909       3.491       59      14.000
0.626             0.677      0.823       3.500       29      28.000
0.686             0.544      0.738       3.476       14      56.000

Log-transformed data
0.874             0.288      0.536       1.109    20073       0.042
0.705             0.243      0.493       1.109     5018       0.167
0.641             0.184      0.429       1.109     1672       0.500
0.675             0.116      0.340       1.109      238       3.500
0.706             0.095      0.308       1.109      119       7.000
0.664             0.081      0.285       1.110       59      14.000
0.664             0.068      0.261       1.112       29      28.000
0.726             0.059      0.243       1.104       14      56.000

Total number of hourly measurements = 20448
Number of missing measurements      =   375

Table 2.  Differences between autocorrelations of log-transformed and
          untransformed CO concentrations in San Francisco, 1970
          hourly averages.

Autocorrelation       Averaging    α level of      Lag
                      time (hr)    significance

0.834a    0.874b        1          21.1            1
0.576a    0.641b       12           4.23           1
0.626a    0.678b       84           1.4            1
0.648a    0.712b      168           1.3            1
0.681a    0.753b        1          20.9            2
0.566a    0.650b        1          18.9            3
0.487a    0.405b       84           1.60           3

a  No transformation of data.
b  Natural log transformation of data.

                       Summary of Derivations
      The latter derivation will serve as the basis for the reasoning
presented later concerning the nature of the frequency distributions re-
sulting from various types of sources and pollutants.
      The other derivation supports the latter one, as is to be expected
since the Gaussian Plume equation is a solution to the Fickian diffusion
equation. It is included for the sake of completeness and to demonstrate
the consistency of the new derivation with existing theories.

      The new derivation is based on reasoning which is clearer and
 more flexible than the Gaussian Plume derivation.  These features will
 be used to great advantage in the explanation of various empirical re-
 sults which could not be easily justified using the earlier derivation.
      Other than these,  no theoretical explanations for the lognormal
 distribution or any other have been suggested.
                            Conclusion
      It is shown that considerable theoretical and empirical support
exists for the lognormal distribution as the most appropriate for the
characterization of pollutant concentrations for a wide range of averaging
times.  In later chapters other distributions are discussed, but these
alternate  results are demonstrated to be consistent with the material
presented in this chapter.

-------
           CHAPTER III —FREQUENCY DISTRIBUTIONS OF
                        RELATED VARIABLES
      Pollutants can be viewed as tracers of atmospheric movements.
Since we know that pollutant concentration frequency distributions are
fit well by the lognormal,  we suspect that some descriptors of the
atmosphere are also lognormally distributed.   Indeed, this is the case.
      Fundamentally,  atmospheric processes  are structured differently
than ordinary engineering-type processes.  In particular,  the change in
a variable describing an atmospheric process is most often proportional
to the level of that variable. This is written:

           X_i = X_{i-1} \epsilon_i .                               (III-1)

This is a multiplicative  process.  Many descriptions of atmospheric
motion  can be described by a process of this form.
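
A short simulation illustrates why a multiplicative process of this form leads to a lognormal distribution. It is a sketch under assumed conditions, not part of the original analysis: the shocks are drawn from a uniform distribution on (0.8, 1.25), the starting level is 1, and the number of series and steps are arbitrary. The logarithm of the final level is a sum of independent terms, so its distribution is nearly symmetric, while the level itself is strongly skewed.

```python
import numpy as np

def skewness(values):
    """Sample skewness (third standardized central moment)."""
    d = np.asarray(values, dtype=float) - np.mean(values)
    return float(np.mean(d ** 3) / np.mean(d ** 2) ** 1.5)

rng = np.random.default_rng(42)
n_series, n_steps = 10000, 50

# Each row is one realization: the level after n_steps multiplicative shocks,
# starting from a level of 1. The shock distribution is an assumption.
shocks = rng.uniform(0.8, 1.25, size=(n_series, n_steps))
x_final = shocks.prod(axis=1)

print("skewness of X:    ", round(skewness(x_final), 2))          # clearly positive
print("skewness of ln X: ", round(skewness(np.log(x_final)), 2))  # close to zero
```

The shocks themselves are not lognormal; the near-normality of ln X comes only from the accumulation of many proportional changes.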
      We are primarily  interested in variables describing the transport
of pollutant through the atmosphere.   Such transport is described by the
advective and eddy diffusive transfer rates.   We are interested also in
the removal of pollutants from the atmosphere,  but few results are
available except for particulates.  Particle  sizes, which partially govern
the deposition of particulates, will be treated  below.
      Through this investigation of other meteorological variables we
shall lend further credence to the conclusions presented in the preceding
chapter concerning the identity of the concentration distributions.
In addition, the study of these variables yields greater insight into the
nature of atmospheric motion and pollutant transport which leads to new
approaches to identifying pollutant concentration frequency distributions.

                              Advection
      The lack of knowledge concerning mesoscale atmospheric motions
has hindered the formulation of a mathematical  model describing such
motion. For this reason it is difficult to demonstrate from theoretical
considerations that windspeeds, measured continuously at a point over an
interval of time, are lognormally distributed. However, extensive
empirical analysis has been performed, indicating that the
lognormal is a reasonable assumption.
      It has been shown by Gifford and Hanna (11), (12) that pollutant
concentrations are inversely proportional to windspeeds, a relation that is
also consistent with the fact that both are lognormally distributed (see
Fig. 2). The correlation coefficient for nonreactive primary pollutants like
CO is extremely high (~0.90), slightly less for the more reactive pollutants
like oxides of nitrogen (~0.85), and least for the secondary pollutants like
oxidant (0.66). These investigators evaluated the constants of
proportionality for many cities for several pollutants. The results are
summarized in Refs. (11) and (12).
      In comparative studies, it has been found that this simple model

            \phi = KQ/U                                            (III-2)

where
            Q = total emissions,
            U = windspeed,
            \phi = pollutant concentration and
            K = empirical constant

performs as well as many more complicated models for primary pollutants.
This is a testimonial to the fact stated above, i.e., pollutants are
tracers of atmospheric motion. Although this may seem obvious, one
should realize that processes other than advection also influence
concentration, including the time history of the source, the deposition, and
the distance the pollutant has been transported.
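
The simple model can be sketched in a few lines. The numbers below are assumptions for illustration only: the windspeeds are drawn from a lognormal distribution with an assumed median of 3 and log-standard-deviation of 0.5, and the values of K and Q are arbitrary. Because the concentration is a constant divided by the windspeed, its logarithm is a constant minus the logarithm of the windspeed, so the two variables have the same standard geometric deviation.

```python
import numpy as np

def simple_concentration(K, Q, windspeed):
    """Concentration from the simple model: K * Q / U."""
    return K * Q / np.asarray(windspeed, dtype=float)

def geometric_sd(values):
    """Standard geometric deviation: exp of the std. dev. of the logs."""
    return float(np.exp(np.std(np.log(values))))

rng = np.random.default_rng(7)
# Hypothetical lognormal windspeeds (median 3, log-std 0.5).
windspeed = rng.lognormal(mean=np.log(3.0), sigma=0.5, size=5000)
# K and Q are arbitrary illustrative constants, not fitted values.
conc = simple_concentration(K=50.0, Q=2.0, windspeed=windspeed)

print(round(geometric_sd(windspeed), 3), round(geometric_sd(conc), 3))
```

The two printed values agree, which is the sense in which a lognormal windspeed carries over to a lognormal concentration under this model.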

                              Diffusion
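      The windspeed is a measure of advection, while diffusion is described
by eddy diffusivities. In the Fickian diffusion equation,

  \frac{\partial \phi}{\partial t} + u \frac{\partial \phi}{\partial x}
      + v \frac{\partial \phi}{\partial y} + w \frac{\partial \phi}{\partial z}
      = \frac{\partial}{\partial x}\left(K_x \frac{\partial \phi}{\partial x}\right)
      + \frac{\partial}{\partial y}\left(K_y \frac{\partial \phi}{\partial y}\right)
      + \frac{\partial}{\partial z}\left(K_z \frac{\partial \phi}{\partial z}\right)    (III-3)

K_x, K_y, K_z are constants which are the eddy diffusivities describing
diffusive flux. We may suspect that these are also lognormally distributed,
and the process of demonstrating this is now presented.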

Fig. 2.  Concentration and windspeed frequency distributions for CO and
        windspeed for San Francisco, 1970 hourly averages. [Plot not
        reproduced: curves labeled "Wind" and "Observed concentration";
        horizontal axis, % of concentration exceeded.]

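      The local viscous dissipation in turbulent flow, ε, is given by

            \epsilon = \frac{\nu}{2} \sum_{i,j} \left( \frac{\partial u_i}{\partial x_j}
                       + \frac{\partial u_j}{\partial x_i} \right)^2              (III-4)

where ν is viscosity, u is velocity, and i and j refer to direction.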
      In his original similarity hypothesis, Kolmogoroff formed length
and time scales of turbulent motion without taking the variability of ε into
consideration. In 1962, Kolmogoroff refined this hypothesis to take into
account random fluctuations in ε (13). In turbulent flow, energy is
transferred from one stage of motion to the next; this transfer is what ε
represents. The amount of energy transferred at each stage has been
argued to be a function of the relative magnitude of the state (14). This
describes a multiplicative process. If the transfer stages are similar
and are independent, then by the law of proportional effect, ln(ε) is
distributed normally.
      Now, recalling the reproductive properties of the distribution we
can argue the lognormality of several other variables for which the
relations to ε are well known. The dissipation of temperature variance
by thermal conduction is given by

            X = 2\alpha (\mathrm{grad}\, T)^2                         (III-5)

where α is thermal diffusivity. If both X and ε are lognormal [X is
argued lognormal by Gurvich (15)] then ΔT and ΔU, the temperature and
velocity differences between two points in space separated by a distance r,
are lognormal on either side of the origin, or more concisely, (ΔT)^2 and
(ΔU)^2 are lognormal due to the relationships:
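
            (\Delta U)^2 \propto \epsilon^{2/3} r^{2/3} , \qquad
            (\Delta T)^2 \propto X \, \epsilon^{-1/3} r^{2/3}          (III-6)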

from Kolmogoroff and the reproductive properties of the distribution.
      Measurements (15) for r = 2 cm at 4 m above ground yielded the
distribution function plotted on lognormal probability paper in Fig. 3.
A straight line on such paper indicates perfect lognormality; clearly, the
lognormal is a good approximation.

Fig. 3.  Probability distribution of the squared temperature difference
        compared with lognormality, P(ε < ε0), where ε = (ΔT)^2/[(ΔT)^2]
        (15). Separation = 2 cm, 104 samples per plot. [Plot not
        reproduced; the vertical axis shows cumulative probabilities
        from 0.10 to 0.990.]

      An approximation to the horizontal eddy diffusion coefficient based
upon similarity theory is (16)

            K_h = \epsilon^{1/3} \sigma^{4/3}                        (III-7)

where σ is the root-mean-square dispersion of the particles in a pollutant
puff.
      Since a lognormal variable raised to any exponent is also lognormal
and since σ^{4/3} is invariant, it can be inferred that K_h is lognormally
distributed. Note that σ has been approximated at 0.7 ΔS,
where ΔS is an intergrid-square distance in a compartmentalized model.
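
The inference can be written out as a short worked step. This is an added illustration, not part of the original derivation: K_h denotes the horizontal eddy diffusion coefficient, ε the dissipation, and σ the root-mean-square dispersion treated as a fixed constant, while m and s^2 are assumed symbols for the mean and variance of ln ε.

```latex
\ln K_h = \tfrac{1}{3}\ln\epsilon + \tfrac{4}{3}\ln\sigma ,
\qquad
\ln\epsilon \sim N(m,\, s^2)
\;\Longrightarrow\;
\ln K_h \sim N\!\left(\tfrac{1}{3}m + \tfrac{4}{3}\ln\sigma,\; \tfrac{1}{9}s^2\right).
```

Raising a lognormal variable to the power 1/3 therefore scales the standard deviation of its logarithm by 1/3, and multiplying by a constant only shifts the mean of the logarithm.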
      Further, another approximation which is used in atmospheric
modeling is

            [equation illegible in the source scan]                (III-8)

which was  developed without reference to statistical considerations.
This relationship indicates the lognormality of windspeed based upon
the lognormality of ε, or vice versa (17). Also, the vertical eddy
diffusivity has been approximated as

            K_z = 400 u_1                                          (III-9)

where u_1 is the horizontal wind velocity at 1 meter, determined by use
of a power law vertical profile from the mean layer wind. This
illustrates the likelihood that K_z, a quantity which has not yet been
accurately measured in the atmosphere, is lognormally distributed.
      It is  worthwhile to notice that the results obtained from models
using these approximations have been encouraging (8).

                            Particle Sizes
      It has been clearly established (18) that  particle size distributions
in atmospheric aerosols are approximately lognormal.  There is  some
disagreement between scientists as to the exact size range  covered, or
whether there are two or three overlapping lognormal distributions  (19).
However,  the lognormal approximation is widely  accepted.

      Particles are the result of a multistage grinding process where
the size of a particle at any stage is a function of its size at the
immediately previous stage. This can be represented by the multiplicative
process which results in the lognormal distribution examined in
Chapter II.
      The deposition rate of the particles is a function of particle size,
primarily, and  is therefore approximately lognormally distributed by
virtue of the reproductive properties of the distribution.  Therefore,  the
negative portion of the source term in Eq. (II-3) due to deposition is
approximately lognormally distributed.

                              Conclusion
      Pollutant concentrations are a function of emissions, chemical
change, deposition and transport.  Several of the variables describing
transport and deposition are discussed here and are found to be con-
sistent with the  lognormal assumption.
      Hence it is clear that pollutant concentrations are followers of the
overall character of motion in the atmosphere. This result has not yet
received adequate attention. Full realization of its significance
indicates that air quality data, which is essentially simple to collect and
useful for a pragmatic purpose, may have applications in the
understanding of atmospheric processes.

-------
     CHAPTER IV —FREQUENCY DISTRIBUTIONS FOR VARIOUS
                 POLLUTANTS AND SOURCE TYPES
      In this chapter we shall examine several aspects in which pollu-
tants differ, in light of the model presented in Chapter II.  It will be
shown that certain seemingly conflicting results can be explained in this
context.

                   Reactive versus Inert Pollutants
      When we examine frequency distributions of air pollutant con-
centrations, we notice that the parameters vary from day to day and
from pollutant to pollutant.  This variation is a result of the nature of
the meteorology of which the pollutant is a tracer,  the sources of the
pollutant, and its reactivity.
      Figures 4(a)  and 4(b) show daily profiles of several pollutants.  The
profiles with the steepest slopes and highest peaks yield  the highest
standard geometric deviation (SGD).  We can see that pollutants with
similar reactivity trace the meteorological conditions in the same way,
provided that they are from the same type of source.  Carbon monoxide
and hydrocarbon are an example of this.  Shuck et al. (20) report a
correlation coefficient of 0.99.
      These principles are simple to  see; anything which causes large
fluctuations in the daily profile will cause a large SGD  in the  frequency
distribution of that pollutant.   The significant causes are unstable
meteorological conditions and volatile pollutants.  This volatility is
caused by either the basic chemical structure of the pollutants, as is the
case with oxidants and  oxides of nitrogen, or by the temperature of the
pollutant, as is the case with thermal plumes of SO2.
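
The link between the shape of the daily profile and the standard geometric deviation can be illustrated with two made-up profiles. This is a sketch only: both 24-hour profiles are hypothetical sinusoids with the same mean level, and the SGD is computed as the exponential of the sample standard deviation of the log concentrations, which is the usual definition but is stated here as an assumption.

```python
import numpy as np

def standard_geometric_deviation(concentrations):
    """SGD: exp of the sample standard deviation of ln(concentration)."""
    logs = np.log(np.asarray(concentrations, dtype=float))
    return float(np.exp(np.std(logs, ddof=1)))

hours = np.arange(24)
cycle = np.sin(2 * np.pi * (hours - 8) / 24)

# Two hypothetical daily profiles with the same mean level (3.0):
flat_profile = 3.0 + 0.5 * cycle      # mild diurnal swing
peaked_profile = 3.0 + 2.5 * cycle    # steep slopes and a high peak

sgd_flat = standard_geometric_deviation(flat_profile)
sgd_peaked = standard_geometric_deviation(peaked_profile)
print(round(sgd_flat, 2), round(sgd_peaked, 2))
```

The profile with the larger swings gives the larger SGD, even though the two profiles share the same arithmetic mean.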
      If we return to the Fickian diffusion equation:

  \frac{\partial \psi_a}{\partial t} + u \frac{\partial \psi_a}{\partial x}
      + v \frac{\partial \psi_a}{\partial y} + w \frac{\partial \psi_a}{\partial z}
      = \frac{\partial}{\partial x}\left(K_x \frac{\partial \psi_a}{\partial x}\right)
      + \frac{\partial}{\partial y}\left(K_y \frac{\partial \psi_a}{\partial y}\right)
      + \frac{\partial}{\partial z}\left(K_z \frac{\partial \psi_a}{\partial z}\right)
      + S_a(x, y, z, t) + P_a(\psi_a, \psi_b, \ldots, \psi_n)

we see that the chemical change term, P_a, is larger and more variable for
reactive pollutants. This affects the argument presented in Chapter II
through its direct effect on the daily profile predicted from the equation.
The increased volatility causes larger fluctuations in concentration
which are seen in the box model diffusion [Eq. (II-18)] both through the
magnitudes of ψ_k(m,t) and ψ_k(j,t), and in the magnitude of the difference
E_k(j,t). The absolute value of this term is larger, but it still changes
sign over the diurnal cycle, which results in lognormality (Fig. 5).

Fig. 4(a). Oxidant concentration versus time in Los Angeles, hourly
          averages. [Plot not reproduced; horizontal axis, hours (2 to 24).]

Fig. 4(b). [Plot and caption not legible in the source scan.]

      This explains the results of Benarie (3) and Knox and Lange (2)
who  noticed changes  in SGD between pollutants  for the same meteorology.
The  model described in Chapter II provides a theoretical framework
through which these results can be understood.
      Although Larsen's analysis indicated that oxidant concentrations
are lognormally distributed for all cities for all averaging times, as in
Fig. 6(a), oxidant concentrations often depart from lognormality in a
consistent way, as is depicted in Fig. 6(b). At the present time, no
definitive explanation has been set forth to explain this anomaly. A
brief investigation has suggested two possible explanations, which are
not mutually exclusive.
      The first is based upon the theory that the photochemical reactions
producing oxidant in the atmosphere are  self-limiting.  This is given
some support by the fact  that oxidant concentrations have never been
recorded at 1 ppm or higher in the atmosphere, even when such con-
centrations might be  expected because of source and meteorological
conditions.
      The second  explanation is that the averaging time of 5 minutes is
so long in comparison to  the reaction  rates that it averages out short-
duration high concentrations which might otherwise cause the cumulative
distribution curve, as in Fig. 6(b), to straighten out. This is supported
by the fact that the "angle" between the two adjacent straight lines in
Fig. 6(b) becomes sharper as averaging time increases, and there is no
reason why the reverse should not hold for averaging times shorter than
5 minutes.
      The shape of the curve in Fig. 6(b) can be compared to the shape
of the curves for the gamma, Weibull, and Pearson-IV distributions
plotted on log probability paper. It appears that the Weibull distribution
would provide a better fit, although there is no apparent theoretical
reason for this to be the case. The beta and gamma distributions do
not appear to fit the tail of the oxidant concentration distribution at all
well. The nature of these distributions is discussed more extensively
in Chapter V.

[Figures on original pages 31 to 36, including Figs. 5, 6(a), and 6(b), are
not reproduced; only fragments of station legends and axis labels survive
in the source scan.]

Point versus Area Sources
The question of the difference in concentration distributions
between point and area sources is of great interest from a practical
standpoint. One must be able to evaluate the contribution of a large
point source toward air pollution to determine, for example, the
feasibility of a particular location for a polluting industry. It is, how-
ever, a difficult question for which the literature contains conflicting
answers.
Gifford (5) has proved theoretically and has presented a limited
amount of data to the effect that logs of concentrations of a pollutant
from a point source are proportional to a chi-squared distribution for
suitably normalized data based on the Gaussian Plume equation.
Benarie (3) has proven that such concentrations are lognormally dis-
tributed on the basis that they are merely tracers of a lognormally dis-
tributed windfield. Knox and Lange (2) analyzed a 5-year release of
Argon-41 from the Chalk River reactor in Ontario, Canada, (Argon-41
has a zero background concentration) and determined that the lognormal
distributions fit poorly, although they did not propose an alternate dis-
tribution. An analysis of their data indicates that the chi-squared dis-
tribution did not fit well either.
In Chapter II a modified version of the derivation of the frequency
distribution from a point source is presented. The result of this
derivation, which considers the variability in the windfield, is a dis-
tribution composed of a sum of chi-squared and lognormal components
determined by the magnitude and direction of the windfield, the stability
conditions, and the distance from the source.
In the cases presented in the literature, these factors are not con-
trolled. There is not yet enough data to determine the distribution
identity as a function of the relevant variables. The point to be made
here is that the presented equations do indeed predict such differences as
are reported, and that future research may yield a quantitative treatment
of these differences.
A final point is that the diffusion-equation-based model is con-
sistent with this result. The form of the predicted frequency distribution
is dependent on the relative source strength, and is sensitive to stability
conditions and windspeeds through the advective and eddy diffusive trans-
fer fluxes. However, the exact form is more difficult to predict from
this model, which is more appropriate for area sources.

Summary
This chapter discusses the differences in pollutant concentrations
resulting from point and area sources. It also discusses fundamental
differences in the distributions measured for reactive and inert
pollutants.
These differences are explained within the framework of the
models discussed in Chapter II. The fact that the observed distributions
appear to be consistent with model predictions lends further support to
the validity of the concepts presented here.
-------
CHAPTER V —THE FREQUENCY DISTRIBUTIONS

Lognormal Distribution
The natural logarithm of a lognormally distributed
random variable is normally distributed. This relationship implies
that the lognormal is the multiplicative analog to the normal distribution.
In particular, where the process

$$x_i = x_{i-1} + \epsilon \qquad \text{(V-1)}$$

generates a normally distributed random variable, the process

$$x_i = x_{i-1}\,\epsilon \qquad \text{(V-2)}$$

generates a lognormally distributed random variable; $\epsilon$ is an arbitrarily
distributed random shock.
Indeed, many physical processes are best described by Eq. (V-2)
and hence their result is lognormally distributed. The lognormal is
more than a variation of the normal distribution; in fact, it is one of the
most fundamental distributions of mathematical statistics. Increasingly
it is being found that the outputs of physical processes are lognormally
distributed. It is a distribution that physicists, meteorologists and
engineers all encounter.
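The lognormal distribution is given by

$$\Lambda(x) = N(\log x), \qquad x > 0 \qquad \text{(V-3)}$$

and

$$d\Lambda(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2\sigma^2}(\log x - \mu)^2\right] dx, \qquad x > 0. \qquad \text{(V-4)}$$

The jth moment about the origin is given by

$$m_j = \int_0^{\infty} x^j \, d\Lambda(x) \qquad \text{(V-5)}$$

$$= \int_{-\infty}^{\infty} e^{jy} \, dN(y) \qquad \text{(V-6)}$$

$$= e^{j\mu + \frac{1}{2} j^2 \sigma^2}. \qquad \text{(V-7)}$$

Therefore, the mean and variance are given by

$$\alpha = e^{\mu + \frac{1}{2}\sigma^2} \qquad \text{(V-8)}$$

$$\beta^2 = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right) = \alpha^2 \eta^2 \qquad \text{(V-9)}$$

where $\eta^2 = e^{\sigma^2} - 1$. The third moment about the mean is

$$M_3 = \alpha^3\left(\eta^6 + 3\eta^4\right) \qquad \text{(V-10)}$$

and the fourth moment about the mean is

$$M_4 = \alpha^4\left(\eta^{12} + 6\eta^{10} + 15\eta^8 + 16\eta^6 + 3\eta^4\right) \qquad \text{(V-11)}$$

which results in nonzero coefficients of skewness $S_1$ and kurtosis $S_2$,

$$S_1 = \frac{M_3}{\beta^3} = \eta^3 + 3\eta \qquad \text{(V-12)}$$

$$S_2 = \frac{M_4}{\beta^4} - 3 = \eta^8 + 6\eta^6 + 15\eta^4 + 16\eta^2. \qquad \text{(V-13)}$$

Skewness and kurtosis are both positive and both increase as the
variance increases.
The mode of the distribution is given by $e^{\mu - \sigma^2}$, the median by $e^{\mu}$,
and the mean by $e^{\mu + \frac{1}{2}\sigma^2}$; hence the curve appears as in Fig. 7.
Figures 8 and 9 illustrate the effect of varying the parameters.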
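
The closed-form expressions for the lognormal can be checked numerically. The following Python sketch is an illustration only: the function name `lognormal_summary`, the parameter values, and the sample size are assumptions chosen for the example and do not come from the original analysis. It evaluates the mode, median, mean, variance, skewness and kurtosis from the log-mean and log-standard deviation, and compares the mean and median with a simulated sample.

```python
import math
import random
import statistics


def lognormal_summary(mu, sigma):
    """Closed-form summary of a lognormal with log-mean mu and log-sd sigma."""
    eta2 = math.exp(sigma ** 2) - 1.0            # eta^2 = exp(sigma^2) - 1
    eta = math.sqrt(eta2)
    mean = math.exp(mu + 0.5 * sigma ** 2)       # Eq. (V-8)
    return {
        "mode": math.exp(mu - sigma ** 2),
        "median": math.exp(mu),
        "mean": mean,
        "variance": mean ** 2 * eta2,            # Eq. (V-9)
        "skewness": eta ** 3 + 3.0 * eta,        # Eq. (V-12)
        # Eq. (V-13), written in powers of eta^2
        "kurtosis": eta2 ** 4 + 6 * eta2 ** 3 + 15 * eta2 ** 2 + 16 * eta2,
    }


# Assumed example parameters (not taken from any data set in this report)
summary = lognormal_summary(mu=1.0, sigma=0.5)

# Monte Carlo comparison of the mean and median
rng = random.Random(0)
sample = [rng.lognormvariate(1.0, 0.5) for _ in range(100_000)]
sample_mean = statistics.fmean(sample)
sample_median = statistics.median(sample)

print(summary)
print("simulated mean:", sample_mean, "simulated median:", sample_median)
```

For any positive sigma the mode lies below the median, which lies below the mean; this ordering is what gives the frequency curve its long right tail.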
Most important to the present studies are the reproductive prop-
erties of the distribution. The necessary theorems will be stated with
outlines of the proofs as required.

Theorem 1. If $x_1$ and $x_2$ are independent $\Lambda$ variates, then the product
$x_1 x_2$ is also a $\Lambda$ variate.
Proof: Take logs to convert the distributions to normal distributions,
convolve the resulting variables, and convert the result back using
an antilog transform.
Fig. 7. Frequency curves of the normal and lognormal distributions.
Fig. 8. Frequency curves of the lognormal distribution for three values of $\sigma^2$.
Fig. 9. Frequency curves of the lognormal distribution for three values of $\mu$.
Theorem 2. If $\{x_j\}$ is a sequence of independent positive variates having
the same probability distribution and such that

$$E\{\log x_j\} = \mu \qquad \text{(V-14)}$$

$$\mathrm{Var}\{\log x_j\} = \sigma^2 \qquad \text{(V-15)}$$

and both exist, then the product $\prod_{j=1}^{n} x_j$ is asymptotically distributed as
$\Lambda(n\mu, n\sigma^2)$.
Proof: By analogy to the additive normal Central Limit Theorem.
For limited numbers of lognormal variates it has been demonstrated
that their sum is approximately lognormally distributed for certain ranges
of the coefficient of variation (21). Figure 10(a) gives these conditions.
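
Theorem 2 can be illustrated by simulation. The sketch below is a hedged example rather than a proof: the uniform factor distribution on (0.5, 1.5), the number of factors, and the number of trials are all assumptions made for the illustration. It forms products of independent positive factors and checks that the logarithm of the product has mean near $n\mu$, variance near $n\sigma^2$, and small skewness, as a normal variate would.

```python
import math
import random
import statistics

rng = random.Random(1)
n = 50             # factors per product (assumed)
trials = 20_000    # number of simulated products (assumed)

# Log of each simulated product of n positive factors
log_products = []
for _ in range(trials):
    log_products.append(sum(math.log(rng.uniform(0.5, 1.5)) for _ in range(n)))

# Mean and variance of the log of a single factor, estimated by simulation
single_logs = [math.log(rng.uniform(0.5, 1.5)) for _ in range(200_000)]
mu = statistics.fmean(single_logs)
sigma2 = statistics.pvariance(single_logs)

# Compare with the asymptotic parameters n*mu and n*sigma^2
mean_gap = statistics.fmean(log_products) - n * mu
var_ratio = statistics.pvariance(log_products) / (n * sigma2)

# Sample skewness of the log-products; a normal variate has zero skewness
centre = statistics.fmean(log_products)
spread = statistics.pstdev(log_products)
skew = statistics.fmean([((v - centre) / spread) ** 3 for v in log_products])

print("mean gap:", mean_gap, "variance ratio:", var_ratio, "skewness:", skew)
```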
Goodness-of-fit tests have been derived for the lognormal. The
chi-squared test is appropriate of course; the Kolmogoroff-Smirnov
test is a nonparametric technique for determining a confidence band
around an empirical distribution function. Another useful test is to plot
the data on lognormal probability paper on which truly lognormal data
will plot as a straight line. Figures 10(b) and 10(c) illustrate several
such plots.
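
The Kolmogoroff-Smirnov comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the helper name `ks_statistic_lognormal`, the synthetic data, and the use of the large-sample 95% band half-width $1.36/\sqrt{n}$ are choices made for this example. Because the lognormal parameters are estimated from the same data, that band is conservative and should be read as indicative only.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def ks_statistic_lognormal(data):
    """Largest gap between the empirical CDF and a lognormal fitted to the logs."""
    logs = sorted(math.log(x) for x in data)
    n = len(logs)
    fitted = NormalDist(fmean(logs), stdev(logs))
    d = 0.0
    for i, y in enumerate(logs):
        cdf = fitted.cdf(y)
        # The empirical CDF steps from i/n to (i+1)/n at y
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


rng = random.Random(2)
lognormal_data = [rng.lognormvariate(3.0, 0.6) for _ in range(500)]   # synthetic
uniform_data = [rng.uniform(1.0, 100.0) for _ in range(500)]          # synthetic

d_lognormal = ks_statistic_lognormal(lognormal_data)
d_uniform = ks_statistic_lognormal(uniform_data)
band_95 = 1.36 / math.sqrt(500)   # large-sample 95% half-width, fully specified case

print("D (lognormal data):", d_lognormal)
print("D (uniform data):  ", d_uniform)
print("95% band half-width:", band_95)
```

Plotting the sorted logs against normal quantiles gives the probability-paper view of the same comparison: truly lognormal data fall close to a straight line.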
Up to this point we have discussed primarily the lognormal dis-
tribution which appears to have considerable empirical and theoretical
support. There are, however, other distributions which fit the same
data quite well and are therefore deserving of mention. It is interesting
to note, however, that no non-empirical support for these distributions
has yet been published.
Lynn (22) used data from Philadelphia, Pennsylvania to estimate
the parameters of several distributions by the method of moments. The
distributions are the normal, two-parameter lognormal, three-parameter
lognormal, gamma, and four-parameter Pearson. The goodness-of-fit
statistics are summarized in Table 3.
Not considered here is the Weibull distribution, which has considerable
support from several sources, including Milokaj (23) and Barlow
(24).
In Table 3, notice that the normal distribution was clearly the
worst. The two-parameter lognormal was the best by a small margin
Fig. 10(a). Regions of convergence where the sum of n lognormal variates
is approximately lognormal, as a function of the coefficient of variation.
(A) Convergence for both normal and lognormal approximations,
(B) convergence for the lognormal approximation, (C) convergence
is uncertain (21).
Figs. 10(b) and 10(c). Data plotted on lognormal probability paper.
Table 3. Summary of total absolute deviations (20 µg/m³ classes) (22).

                                  Lognormals            Pearson
Station      Year      Normal     2-P       3-P       4-P       Gamma
                       dist.
Station 1    1960      109.8      43.4      43.2a     45.6      54.0
             1961      135.1      56.6a     62.6      109.3     110.1
             1962      129.9      45.0a     46.7      49.7      49.6
             1963      159.4      60.8a     93.1      131.6     210.6
             1964      119.5      59.3a     60.3      70.9      70.5
             1965      129.1      59.0      55.9      55.1a     84.8
             1966      129.0      52.2      52.4      48.3a     48.4
             1967      131.0      51.2a     52.5      60.6      63.9
             1968      120.3      60.2      59.4a     86.6      69.3
             Average   129.2      54.2a     58.5      73.1      84.6
Station 2    1960      114.2      46.6      39.5      36.4a     39.1
             1961      125.7      41.6a     46.1      69.3      68.1
             1962      177.2      82.2      64.4a     105.7     68.2
             1963      —          —         —         —         —
             1964      125.6      38.8a     39.2      62.3      53.4
             1965      119.5      43.5a     46.5      48.3      63.5
             1966      143.5      43.3      35.8a     52.1      67.0
             1967      143.7      59.8a     63.6      75.6      62.9
             1968      134.6      60.4a     61.3      80.9      68.9
             Average   135.5      52.0      49.6a     66.3      61.4
Station 3    1960      108.6      47.3a     47.8      57.2      66.7
             1961      126.4      39.1a     39.3      56.9      53.1
             1962      106.0      27.7a     28.1      29.6      29.1
             1963      122.9      56.4      57.8      48.0      45.1a
             1964      —          —         —         —         —
             1965      123.8      56.0      54.6a     57.4      89.4
             1966      134.9      58.6      66.8      50.0a     51.7
             1967      131.2      53.7      53.8      52.2      49.7a
             1968      136.9      61.1a     65.3      82.5      73.1
             Average   123.8      50.0      51.7      54.2      57.2
Station 9    1968      94.0       60.3      58.9a     72.8      60.7
Station 11   1968      60.2       31.6a     36.0      36.9      32.0

Average                125.6      51.7a     53.0      64.1      66.8

a Best fit.
over the three-parameter lognormal, and the Pearson distributions
fared considerably worse. Note, however, that each of the lognormal and
Pearson distributions provided the best fit in some cases. It is a curious
fact that the fitting method employed assigns a nonzero value to the
location parameter even though zero would provide a better fit according
to the sum of absolute deviations criterion. This is seen from the fact
that the 2-p lognormal fit better than the 3-p, of which it is a special
case. It provides a caveat: this table might have been considerably
different had a different fitting method or criterion been employed.
Note too that the 2-p lognormal fared better than the 4-p distributions,
a surprising result given the greater flexibility of the 4-p distributions.
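
A goodness-of-fit criterion of this kind can be sketched as follows. This is an illustration under explicit assumptions, not a reconstruction of Lynn's procedure: the definition used here (sum over 20-unit classes of the absolute difference between observed and expected counts), the helper name `total_absolute_deviation`, and the synthetic data are all choices made for the example. Both distributions are fitted by the method of moments.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def total_absolute_deviation(data, cdf, class_width=20.0):
    """Sum over classes of |observed count - expected count| (assumed definition)."""
    n = len(data)
    n_classes = int(max(data) // class_width) + 1
    observed = [0] * n_classes
    for x in data:
        observed[int(x // class_width)] += 1
    total = 0.0
    for k in range(n_classes):
        expected = n * (cdf((k + 1) * class_width) - cdf(k * class_width))
        total += abs(observed[k] - expected)
    return total


rng = random.Random(5)
data = [rng.lognormvariate(4.4, 0.5) for _ in range(2000)]   # synthetic concentrations

mean, sd = fmean(data), pstdev(data)

# Normal distribution fitted by the method of moments
normal = NormalDist(mean, sd)
tad_normal = total_absolute_deviation(data, normal.cdf)

# Two-parameter lognormal fitted by the method of moments
sigma2 = math.log(1.0 + (sd / mean) ** 2)
log_normal = NormalDist(math.log(mean) - 0.5 * sigma2, math.sqrt(sigma2))


def lognormal_cdf(x):
    return log_normal.cdf(math.log(x)) if x > 0 else 0.0


tad_lognormal = total_absolute_deviation(data, lognormal_cdf)

print("normal:", tad_normal, "2-p lognormal:", tad_lognormal)
```

Because the result depends on the class width, the fitting method, and the criterion, a different choice for any of them can change the ranking of closely matched distributions.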

Weibull Distribution
In 1951, Waloddi Weibull (25) published a paper in which the
applicability of the distribution commonly written

$$f(x) = K x^m \exp\left[-K x^{m+1}/(m+1)\right] \qquad \text{(V-16)}$$

was demonstrated. Although it had been known before 1951, the distribution
has come to bear his name. The derivation presented therein is
an interesting one since it is not derived from a single theoretical
principle. Weibull approached the problem of finding the probability of
failure $P_n$ of a chain consisting of n links. He noted that the probability
of nonfailure of the chain, $1 - P_n$, is equal to the probability of nonfailure of
all the links simultaneously, $(1 - p)^n$, where p is the probability of failure of
an individual link. Therefore, if each link has a distribution function
governing its failure of the form

$$F(x) = 1 - e^{-\varphi(x)}, \qquad \text{(V-17)}$$

the distribution function for the chain will be

$$P_n = 1 - e^{-n\varphi(x)}. \qquad \text{(V-18)}$$

The remaining problem is to specify $\varphi(x)$. The only necessary condition
is that it be a positive nondecreasing function, vanishing at $x_u$, which is
not necessarily equal to 0. Weibull then stated that the simplest function
satisfying this condition is

$$\varphi(x) = \left(\frac{x - x_u}{x_0}\right)^m. \qquad \text{(V-19)}$$

The remarkable fact about this is that there is no theoretical justification
for using this form; indeed, Weibull states: ". . . it is utterly hopeless
to expect a theoretical basis for distribution functions such as ...
particle sizes."
As it happens, the Weibull distribution has been used to fit a large
number of naturally occurring phenomena quite well. These include oil
spill data, particle sizes, distances in cotton fibers, molecular weights,
and solution concentrations.
The Weibull has both two- and three-parameter forms, the former,
Eq. (V-16), being more common, where the parameter m determines the
shape of the curve and K is the scaling factor. Figure 11 illustrates
the Weibull distributions for several values of m. Note that for m = 0
the Weibull reduces to the exponential, and for m = 1 it is equivalent to
the Rayleigh distribution. When m = 1 or 2 the distributions
appear similar to the lognormal. In fact, in cases where both
distributions fit the same data, the Weibull shape parameter is always
near this range.
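The mean of the Weibull distribution is given by

$$E(x) = \left(\frac{m+1}{K}\right)^{1/(m+1)} \Gamma\left(\frac{m+2}{m+1}\right) \qquad \text{(V-20)}$$

and the variance is given by

$$\mathrm{Var}(x) = \left(\frac{m+1}{K}\right)^{2/(m+1)} \left[\Gamma\left(\frac{m+3}{m+1}\right) - \Gamma^2\left(\frac{m+2}{m+1}\right)\right]. \qquad \text{(V-21)}$$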
As is usually the case with skew distributions in practical applica-
tions, the median is used as a measure of central tendency rather than
the mean. The latter is extraordinarily sensitive to values in the tail of
the skew distribution. This is true also for the lognormal.
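For the Rayleigh,

$$E(x) = \sqrt{\frac{\pi}{2K}} \qquad \text{(V-22)}$$

$$\mathrm{Var}(x) = \frac{2}{K}\left(1 - \frac{\pi}{4}\right). \qquad \text{(V-23)}$$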
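
The special cases of the Weibull density in Eq. (V-16) can be verified numerically. The sketch below is illustrative only: the function names, the value of K, and the simple midpoint integration are assumptions made for this example. It uses the mean and variance implied by Eq. (V-16), checks the exponential case (m = 0) and the Rayleigh case (m = 1), and compares the closed-form mean for m = 2 with a direct numerical integration.

```python
import math


def weibull_pdf(x, m, K):
    """Density of Eq. (V-16)."""
    return K * x ** m * math.exp(-K * x ** (m + 1) / (m + 1))


def weibull_mean(m, K):
    """Mean implied by Eq. (V-16)."""
    return ((m + 1) / K) ** (1 / (m + 1)) * math.gamma((m + 2) / (m + 1))


def weibull_variance(m, K):
    """Variance implied by Eq. (V-16)."""
    scale2 = ((m + 1) / K) ** (2 / (m + 1))
    g1 = math.gamma((m + 2) / (m + 1))
    g2 = math.gamma((m + 3) / (m + 1))
    return scale2 * (g2 - g1 ** 2)


def integrate(f, a, b, steps=20_000):
    """Midpoint-rule integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


K = 2.0   # assumed scale factor for the illustration

exponential_mean = weibull_mean(0, K)      # m = 0: exponential with rate K
rayleigh_mean = weibull_mean(1, K)         # m = 1: Rayleigh
rayleigh_variance = weibull_variance(1, K)

# Direct numerical check of the mean for m = 2
numeric_mean = integrate(lambda x: x * weibull_pdf(x, 2, K), 0.0, 10.0)

print(exponential_mean, rayleigh_mean, rayleigh_variance, numeric_mean)
```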
Fig. 11. Frequency curves for Weibull (top) and Rayleigh probability distributions.
It is interesting that the fitted Weibull shape parameter usually falls
within a narrow range from one application to another. This suggests
that the Rayleigh distribution may provide a fair fit, a surprising result
because it implies that a complex physical process is adequately described
with only one parameter in a distribution with no theoretical
foundation.
Figure 12 demonstrates the appearance of the Weibull probability
distribution on log probability paper for parameter values typically
obtained in pollution work.

Gamma Distribution
The gamma distribution is also used in air pollution work. We
can easily see why from Fig. 13(a). The gamma has the ability to
appear quite similar to both the lognormal and Weibull, depending on
the values of the scale parameter $\beta$ and the shape parameter $\alpha$ in

$$f(x) = \frac{1}{\alpha!\,\beta^{\alpha+1}}\, x^{\alpha} e^{-x/\beta} \qquad \text{(V-24)}$$

$$\alpha > -1, \qquad \beta > 0, \qquad x > 0.$$

The distribution can be derived as the distribution of the sum of
n independent, identically distributed exponential random variables
(with $\alpha = n - 1$). The mean and variance of this distribution are given by

$$E(x) = \beta(\alpha + 1), \qquad \mathrm{Var}(x) = \beta^2(\alpha + 1). \qquad \text{(V-25)}$$
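
The derivation of the gamma as a sum of exponential variates can be illustrated by simulation. In the sketch below the number of terms, the scale parameter, and the number of trials are assumed values chosen for the example. The sample mean and variance of the sums are compared with $\beta(\alpha + 1)$ and $\beta^2(\alpha + 1)$, taking $\alpha$ as one less than the number of terms.

```python
import random
import statistics

rng = random.Random(3)
n_terms = 4        # exponential variates per sum (assumed)
beta = 2.5         # mean of each exponential, i.e. the gamma scale parameter (assumed)
alpha = n_terms - 1

# random.expovariate takes the rate, which is the reciprocal of the mean
sums = [sum(rng.expovariate(1.0 / beta) for _ in range(n_terms))
        for _ in range(100_000)]

sample_mean = statistics.fmean(sums)
sample_variance = statistics.pvariance(sums)
theory_mean = beta * (alpha + 1)            # Eq. (V-25)
theory_variance = beta ** 2 * (alpha + 1)   # Eq. (V-25)

print("mean:", sample_mean, "expected", theory_mean)
print("variance:", sample_variance, "expected", theory_variance)
```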

The gamma distribution also has a three-parameter form. The three-parameter
form is derived by subtracting a location parameter $\gamma$ from
x, a process which also results in three-parameter forms for the
other distributions. The three-parameter form is given by

$$f(x) = \frac{1}{\alpha!\,\beta^{\alpha+1}}\,(x - \gamma)^{\alpha} e^{-(x-\gamma)/\beta} \qquad \text{(V-26)}$$

$$x > \gamma, \qquad \alpha > -1, \qquad \beta > 0.$$

The gamma distribution is most widely used in reliability theory.
Figure 13(b) demonstrates the appearance of the gamma
probability distribution on log probability paper for parameter values
typically obtained in pollution work.

Fig. 12. Weibull probability distribution on log probability paper.
Fig. 13(a). Frequency curves of the gamma distribution.
Fig. 13(b). Gamma probability distribution on log probability paper.
Pearson Distribution
Pearson's system provides a theoretical density function for
every possible combination of skewness and kurtosis ($\beta_1$, $\beta_2$) (see
Fig. 14). There are three main types, I, IV, VI. Type I is the beta,
type IV is the gamma. The procedure is to calculate $\beta_1$ and $\beta_2$ and see
which part of the plane is indicated. For air quality data, type I often
occurs. Type VI has been investigated also, although type IV was not
needed.
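The type I density, in its 4-p form, is given by

$$f(x) = \frac{\Gamma(p+q)}{\Gamma(p)\,\Gamma(q)} \, \frac{(x - A)^{m_1} (B - x)^{m_2}}{(B - A)^{m_1 + m_2 + 1}} \qquad \text{(V-27)}$$

where

$$m_1 = p - 1, \qquad m_2 = q - 1. \qquad \text{(V-28)}$$

The type VI is given by

$$y = \frac{(q_2 + 1)^{q_2}\,(q_1 - q_2 - 2)^{q_1 - q_2}\,\Gamma(q_1)}{x_0\,(q_1 - 1)^{q_1}\,\Gamma(q_1 - q_2 - 1)\,\Gamma(q_2 + 1)} \left(1 + \frac{x}{A_1}\right)^{-q_1} \left(1 + \frac{x}{A_2}\right)^{q_2} \qquad \text{(V-29)}$$

where

$$A_1 = \frac{x_0\,(q_1 - 1)}{(q_1 - 1) - (q_2 + 1)}$$

and

$$A_2 = \frac{x_0\,(q_2 + 1)}{(q_1 - 1) - (q_2 + 1)}.$$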
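For the Weibull,

$$\ln[f(x)] = \ln K + m \ln x - K x^{m+1}/(m+1) \qquad \text{(V-36)}$$

$$\frac{\partial \{\ln[f(x)]\}}{\partial x} = \frac{m}{x} - K x^m \qquad \text{(V-37)}$$

$$= \frac{1}{x}\left(m - K x^{m+1}\right). \qquad \text{(V-38)}$$

For the gamma,

$$f(x) = \frac{1}{\alpha!\,\beta^{\alpha+1}}\, x^{\alpha} e^{-x/\beta}, \qquad \text{(V-39)}$$

$$\ln[f(x)] = \ln 1 - \ln \alpha! - (\alpha + 1)\ln\beta + \alpha \ln x - x/\beta, \qquad \text{(V-40)}$$

$$\frac{\partial \{\ln[f(x)]\}}{\partial x} = \frac{\alpha}{x} - \frac{1}{\beta} \qquad \text{(V-41)}$$

$$= \frac{1}{x}\left(\alpha - x/\beta\right). \qquad \text{(V-42)}$$

Summarizing the final result for each:

$$\text{lognormal} = \frac{1}{x}\left(-1 + \frac{\mu}{\sigma^2} - \frac{\ln x}{\sigma^2}\right)$$

$$\text{Weibull} = \frac{1}{x}\left(m - K x^{m+1}\right)$$

$$\text{gamma} = \frac{1}{x}\left(\alpha - x/\beta\right)$$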
In each case there is a constant term, which is a function of the shape
parameter, and a term which is a function of x. From experience in fitting
these distributions, we know that the values of the appropriate parameters
adjust so that the constant term is often between 1 and 2. The remaining term
is of higher order, and serves to provide the differences in goodness of
fit noticed between these distributions.
Thus, the required theoretical results have been provided, with an
additional section describing the similarity between these distributions.
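
The common form of the three logarithmic derivatives can be confirmed by finite differences. The following sketch is an illustration under assumed parameter values; the function names are introduced here for the example only. For each distribution it differentiates the log-density numerically and compares the result with the corresponding bracketed expression divided by x.

```python
import math

# Illustrative parameter values (assumed, not fitted to any data)
mu, sigma = 1.0, 0.6      # lognormal
m, K = 1.5, 0.05          # Weibull, Eq. (V-16)
alpha, beta = 2.0, 1.5    # gamma


def lognormal_log_pdf(x):
    return (-math.log(x * sigma * math.sqrt(2 * math.pi))
            - (math.log(x) - mu) ** 2 / (2 * sigma ** 2))


def weibull_log_pdf(x):
    return math.log(K) + m * math.log(x) - K * x ** (m + 1) / (m + 1)


def gamma_log_pdf(x):
    # lgamma(alpha + 1) is the log of alpha!
    return (-math.lgamma(alpha + 1) - (alpha + 1) * math.log(beta)
            + alpha * math.log(x) - x / beta)


def lognormal_form(x):
    return (-1.0 + mu / sigma ** 2 - math.log(x) / sigma ** 2) / x


def weibull_form(x):
    return (m - K * x ** (m + 1)) / x


def gamma_form(x):
    return (alpha - x / beta) / x


def numerical_derivative(f, x, h=1e-6):
    """Central finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)


x = 3.0
comparisons = {
    "lognormal": (numerical_derivative(lognormal_log_pdf, x), lognormal_form(x)),
    "Weibull": (numerical_derivative(weibull_log_pdf, x), weibull_form(x)),
    "gamma": (numerical_derivative(gamma_log_pdf, x), gamma_form(x)),
}
for name, (numeric, closed_form) in comparisons.items():
    print(name, numeric, closed_form)
```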
Summary
In this chapter the air quality distributions presented earlier are
discussed from a mathematical viewpoint. This serves to illustrate
more clearly the nature of air quality distributions.
A result of significance to air pollution data analysis is the transformation
which indicates a fundamental similarity between the various
distributions used to fit air quality data. Future research may therefore
be directed at modifying the second-order term to provide a more
accurate distribution in cases where the two-parameter lognormal is
inadequate because of the magnitude of the source term or the reactive
nature of the pollutant.
-------
CHAPTER VI —ILLUSTRATIVE APPLICATIONS

There is considerably greater utility in these results than the
simple comparison of air quality to standards. Knowledge of the
nature of these distributions and of parameter variations under various
conditions allows the construction of models for a variety of purposes.
Several are outlined in the following sections.

Analysis of Meteorological Patterns
for Pollution Level Forecasting
A particularly useful application of these techniques is discussed
below. It is included as an example of the power of the techniques pre-
sented earlier. Note that the fundamental philosophy of this application,
i.e., that pollutant concentration distributions can be used as a partial
substitute for meteorological data, is motivated by the arguments pre-
sented in this dissertation.
It has been established in Chapter II that the only variables affect-
ing future pollutant concentrations are emissions, meteorological
variables, photochemical change, and current concentration levels. It
was further established that for all pollutants for all averaging times,
concentrations are approximately lognormally distributed by both
theoretical and empirical arguments. These assumptions can be used in
the formulation of a predictive model. If one plots cumulative distribution
functions on lognormal probability paper for individual days using hourly
averages, the resulting curves are based on 24 points and, as expected
from the theoretical argument in Chapter II, are relatively flat, straight
lines. If one plots a great many such curves taken from data at one
location over a certain period of time, it is possible that the resulting
diagram will appear similar to Fig. 16. For regions where the clima-
tology is very persistent, the clustering at the lines will be clear.
These lines can be mapped into points by plotting geometric mean
(GM) versus SGD as in Fig. 17. In this plot, taken from actual Los
Angeles (downtown) oxidant data for 1970, the degree of clustering is
clear. It is also clear that changing the metric in which these points are
plotted alters the degree of clustering seen in the plot. If we can find
the most independent clusters, with the total number of clusters constrained,
we have then identified days with similar air quality patterns.

Fig. 16. Cumulative distribution functions for individual days on lognormal probability paper.
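
The mapping of one day's distribution to a single (GM, SGD) point can be sketched as follows. The hourly values below are hypothetical and are not measured data, and the function name `gm_sgd` is introduced only for this example. The geometric mean is taken as the exponential of the mean of the logs, and the SGD as the exponential of the sample standard deviation of the logs.

```python
import math
import statistics


def gm_sgd(concentrations):
    """Geometric mean and standard geometric deviation of positive values."""
    logs = [math.log(c) for c in concentrations]
    gm = math.exp(statistics.fmean(logs))
    sgd = math.exp(statistics.stdev(logs))
    return gm, sgd


# Hypothetical hourly oxidant averages for one day (24 values)
day = [2, 2, 1, 1, 1, 2, 3, 4, 6, 9, 13, 17,
       20, 19, 16, 12, 9, 7, 5, 4, 3, 3, 2, 2]

gm, sgd = gm_sgd(day)
print("GM:", gm, "SGD:", sgd)
```

Applying the function to each day in a record yields one point per day, which is the form of data the clustering step operates on.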
The significance of this is clear from Chapter II. Days with
similar air quality patterns tend to have similar meteorological and
emission patterns. Therefore, if we can identify the meteorology and
emissions in each cluster and can then predict tomorrow's meteorolog-
ical and emission pattern, we can determine in which cluster tomorrow's
air quality will fall. Of course, each cluster refers to a particular GM
and SGD which, as shown in Chapter II, completely describe a day's air
quality from the point of view of air quality standards.
There is evidence (26) to the effect that emission patterns are
primarily dependent on the day of week and time of year. Consequently,
if we stratify our clustering graphs by day of week and season, we can
then relate each cluster directly to a meteorological pattern.
The problems remaining are twofold:
1. How do we find the "best" clusters?
2. How do we determine into which cluster tomorrow's air
quality falls?
Both of these problems could be handled pragmatically by an
"eyeball" solution. In areas with high climatological persistence, this
coarse method might give acceptable results. There are, however,
exact methods to deal with these problems also.
If we refer to Fig. 17 and treat the points as nodes in a graph, we
immediately realize that the clustering problem that has arisen from our
air quality classification problem is mathematically equivalent to the
clustering problem in modern graph theory. In fact, a great number of
papers have been written concerning methods of solving for the optimal
clusters. In general, the techniques do not propose to solve a large
problem completely, but rather they deal with effective compromises
which utilize the tradeoff between the distance the algorithm comes from
the true optimum and the computing time necessary to reach that point.
In our case, the problem will not usually be very large. It is
unlikely that more than three or four years of data would be analyzed
together because of the gradual change in emissions. That leaves us with
approximately 1400 points. As a working approximation, we can use only
two significant figures which will serve to make many points identical.
Hence we may expect to have about $10^3$ points, which we will try to
divide into between 10 and 20 clusters. Several algorithms are
available which will handle these numbers in a reasonable amount of
computing time (27).

Fig. 17. Geometric mean (GM) versus SGD for oxidant data in downtown Los Angeles, 1970.

Selecting the Clusters
It is beyond the scope of this work to discuss the details of these
algorithms; however, a simple explanation of a general method is in
order. First, arbitrary clusters are selected and the bivariate median
within each is calculated. Then, for each cluster, we calculate the sum
of the distances from the bivariate median to each point within the
cluster and the sum of the distances to each point out of the cluster. The
objective function is to maximize the between-cluster differences minus
the within-cluster differences. Each algorithm has a rule by which
incremental changes in the clusters are made; at each stage the objective
function is recalculated. Each algorithm also has a stopping rule based
upon the size of the marginal improvement of a single or of a series of
changes in the clusters.
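
The general method can be sketched in Python as follows. This is one simple instance of the scheme described, not any of the published algorithms of Ref. 27: the single-point reassignment rule, the stopping tolerance, the use of the componentwise median as the bivariate median, and the small hypothetical (GM, SGD) data set are all assumptions made for the illustration.

```python
import math
import statistics


def bivariate_median(points):
    """Componentwise median, used here as a simple bivariate median."""
    return (statistics.median([p[0] for p in points]),
            statistics.median([p[1] for p in points]))


def objective(points, labels, k):
    """Between-cluster distances minus within-cluster distances."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        center = bivariate_median(members)
        for p, lab in zip(points, labels):
            d = math.dist(center, p)
            total += -d if lab == c else d
    return total


def improve_clusters(points, labels, k, tolerance=1e-9):
    """Move one point at a time while the objective improves by more than tolerance."""
    labels = list(labels)
    best = objective(points, labels, k)
    improved = True
    while improved:
        improved = False
        for i in range(len(points)):
            keep = labels[i]
            for c in range(k):
                if c == keep:
                    continue
                labels[i] = c
                value = objective(points, labels, k)
                if value > best + tolerance:
                    best, keep, improved = value, c, True
            labels[i] = keep
    return labels, best


# Hypothetical (GM, SGD) points forming two groups
points = [(1.0, 1.2), (1.1, 1.3), (0.9, 1.25), (1.05, 1.15),
          (3.0, 2.0), (3.1, 2.1), (2.9, 1.9), (3.05, 2.05)]
initial_labels = [0, 0, 0, 1, 1, 1, 1, 0]   # arbitrary starting clusters

initial_value = objective(points, initial_labels, 2)
final_labels, final_value = improve_clusters(points, initial_labels, 2)
print(final_labels, initial_value, final_value)
```

A greedy rule of this kind can stop at a local optimum, which is the tradeoff between computing time and distance from the true optimum mentioned earlier.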

Classifying New Days
The next problem to be dealt with arises after the final clustering
arrangement has been identified. When the model goes into operation,
the National Weather Service (NWS) forecasts are examined and the
values of the predictor variables are selected. Now, from these values
we must decide into which cluster the new day should be classified. This
question should be answered even before the meteorological data are
reduced.
The simplest method is to reduce data for the variables which are
thought to be most important and construct a range of each variable for
each category by "eyeball." Then the forecaster looks at the ranges thus
selected and selects the cluster which the day most closely matches. A
problem arises when the cluster is not distinct; in this case more
variables or narrower ranges are needed. This information could also
be displayed concisely in a series of nomographs, which would eliminate
one source of error. A refinement of this technique would call for
probability distributions of the ranges so that a value occurring near the
center of the range would be weighted more heavily than one near the
extreme. Then point scores based upon probability could be used as the
selection procedure.
A more rigorous method would be one that examines the proba-
bility distributions systematically and calculates both the classification
and its probability of error. A well known statistical technique for
which programs are available and which performs the required opera-
tions is multiple discriminant analysis (MDA).
In discriminant analysis, linear functions are developed which
classify a new set of observations into one of several existing categories.
The basic philosophy of the method is to define the categories to be used,
in this case each cluster. Then, the values of the predictor variables
associated with each point within each cluster are examined to discover
patterns which will aid in the classification of a new set of predictors.
A new metric for the predictors is found which maximizes the discrim-
ination between the classes of predictors. Then based upon the within
and between class distributions, conditional probability functions can be
constructed which give the probability of membership in the ith class
for a new set of predictor variables. A more complete discussion of the
statistical techniques involved may be found in Ref. 28.
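
A minimal sketch of linear discriminant classification with two predictors is given below. It is an illustration, not the MDA programs referred to above: the pooled-covariance linear discriminant with class priors proportional to cluster size is a standard textbook form assumed here, and the cluster names and the windspeed and temperature values are hypothetical.

```python
import math


def fit_discriminant(samples):
    """samples maps a cluster label to a list of (x1, x2) predictor pairs."""
    total = sum(len(rows) for rows in samples.values())
    means, priors = {}, {}
    sxx = sxy = syy = 0.0
    for label, rows in samples.items():
        mx = sum(r[0] for r in rows) / len(rows)
        my = sum(r[1] for r in rows) / len(rows)
        means[label] = (mx, my)
        priors[label] = len(rows) / total
        for x, y in rows:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
    dof = total - len(samples)                  # pooled degrees of freedom
    sxx, sxy, syy = sxx / dof, sxy / dof, syy / dof
    det = sxx * syy - sxy ** 2
    inverse = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    return means, priors, inverse


def membership_probabilities(model, x):
    """Probability of membership in each cluster for predictor pair x."""
    means, priors, inv = model
    scores = {}
    for label, (mx, my) in means.items():
        # Weight vector: inverse pooled covariance times the class mean
        wx = inv[0][0] * mx + inv[0][1] * my
        wy = inv[1][0] * mx + inv[1][1] * my
        scores[label] = (x[0] * wx + x[1] * wy
                         - 0.5 * (mx * wx + my * wy)
                         + math.log(priors[label]))
    top = max(scores.values())
    weights = {k: math.exp(v - top) for k, v in scores.items()}
    norm = sum(weights.values())
    return {k: w / norm for k, w in weights.items()}


# Hypothetical (windspeed in knots, temperature in deg) pairs for three clusters
samples = {
    "episode": [(2, 92), (4, 94), (3, 91), (1, 95), (5, 93)],
    "moderate": [(7, 84), (9, 82), (8, 86), (6, 85), (10, 83)],
    "clean": [(13, 72), (15, 74), (12, 70), (14, 75), (16, 73)],
}
model = fit_discriminant(samples)
probabilities = membership_probabilities(model, (3, 93))
predicted = max(probabilities, key=probabilities.get)
print(predicted, probabilities)
```

One minus the largest membership probability gives a rough indication of the probability of misclassification for that day.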

Recalibration
It is conceivable that over a period of several years the patterns
of emissions and/or meteorology of an area will change in such a way as
to affect the accuracy of the predictions made with this model. Fortunately,
recalibration is accomplished in a relatively simple, straightforward
manner.
Throughout the operation of the predictive model, records should be
kept, perhaps in a small notebook, noting the values of the variables used
as predictors, the prediction, and the observed ambient air quality
(AAQ). When the predicted and the observed values differ unacceptably,
the model is recalibrated, perhaps by the addition of a new category,
either by a heuristic method or the "clustering" program, or by a full
recalibration performed exactly as the original. Since the new data are
available and the programs are already written, this procedure presents
no problems. It is unlikely that it would be performed more often than
biannually.
Spatial Interpolation
It should be clear that the foregoing analysis can predict concentration
levels at only the receptor locations for which data sufficient for
calibration are available. To predict air quality throughout a region, an
interpolation scheme is required.
Because of the high correlation between wind velocities and concentrations
of inert pollutants, especially over areas with relatively
simple topography, interpolations can be made. It is necessary to make
the assumption that concentration isopleths can be determined from the
streamlines of the windfield, source location and available meteorological
and concentration measurements (3). Given this and several receptors,
one can identify the value of the isopleths passing through the receptors
and interpolate for locations between the identified lines. Linear
interpolation is adequate for areas which appear to be uniform in
topography and emissions. Otherwise, experimental sampling, randomized
spatially and temporally, may be employed to determine better
interpolation functions. For inert pollutants from a point source,
Benarie (3) has discussed this interpolation problem with respect to the
frequency distributions. In his work the same fundamental assumption
is made, but he makes another assumption which simplifies the calculation
of the frequency distribution of an intermediate point. In particular,
he assumes that the SGD, represented by the slope of the plot of the
distribution function on lognormal probability paper, is constant throughout
a streamline of the windfield. Therefore he requires only one point
on the distribution function to estimate both parameters at an intermediate
point along a windfield streamline for which he has the SGD
calculated at another point. This method provides additional information
with little effort and seems promising for use in conjunction with the
random sampling scheme mentioned above.
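
The constant-SGD assumption reduces the estimation at an intermediate point to a one-line calculation, sketched below. The function names and all numerical values are hypothetical and serve only to illustrate the arithmetic: for a lognormal distribution, the concentration at cumulative probability p equals GM times SGD raised to the standard normal quantile of p, so one known point on the distribution function fixes the GM once the SGD is given.

```python
from statistics import NormalDist


def gm_from_one_point(concentration, cumulative_probability, sgd):
    """Geometric mean implied by one point on a lognormal distribution function."""
    z = NormalDist().inv_cdf(cumulative_probability)
    return concentration / sgd ** z


def lognormal_quantile(gm, sgd, cumulative_probability):
    """Concentration not exceeded with the given cumulative probability."""
    z = NormalDist().inv_cdf(cumulative_probability)
    return gm * sgd ** z


sgd_on_streamline = 1.8   # SGD calculated at another point on the streamline (hypothetical)
c90 = 12.0                # concentration not exceeded 90% of the time here (hypothetical)

gm_here = gm_from_one_point(c90, 0.90, sgd_on_streamline)
c99 = lognormal_quantile(gm_here, sgd_on_streamline, 0.99)
print("GM:", gm_here, "99th percentile:", c99)
```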

An Example
As an illustration of the techniques described herein, we shall
present a simplified and to some extent hypothetical application of the
model. Figure 17 is a graph of GM versus SGD for oxidant data in
downtown Los Angeles in 1970. In Fig. 18 "eyeball" estimates of the
clusters have been made.

Fig. 18. "Eyeball" estimates of the clusters in the data of Fig. 17.
Now, according to the procedures outlined above, one takes a
random sample of the points in each cluster and analyzes the meteorology
for the day that point represents. This analysis requires a large
amount of data reduction, and functions best with more than one year of
data, stratified by season and day of week, to determine the values of
windspeed, temperature and other variables used. Some of these data
are recorded only in the archives of the NWS on synoptic scale maps
which require a trained meteorologist to read. This is clearly beyond
the scope of this work, despite the fact that this model is comparatively
simple to calibrate. However, an actual run of the model is not
essential for illustrative purposes. Instead we shall now turn to a
hypothetical discussion of the kind to be expected in real application.
If we assume that we have successfully reduced data for a random
sample of the days in each category, we can examine the range of values
of the variables for each category. We might see, for example, that for the
topmost cluster in Fig. 18, where the oxidant concentrations are at episode
levels, windspeeds are between 0 and 5 knots and the temperature varies
between 90 and 95 deg. We record these values and proceed to the next
cluster, in each case noting the mean, range and standard deviation
of each variable.
Upon completion of this analysis we will be prepared to draw
Fig. 19 which is a graph of the ranges of the two variables for each
cluster and the identification of the cluster. Note that a nomographical
technique would be required for a case with more than two predictors.
The shaded portion of the figure indicates an area of uncertainty
where more than one cluster applies. The decision can be made using
rigorous techniques such as discriminant analysis which gives both the
classification and the probability of error, or by an heuristic
technique such as determining how many standard deviations the center
of the shaded portion is from the center of each of the two clusters and
selecting the smaller. In effect, the latter method is a simplified
"discriminant analysis."
Each cluster has its expected air quality level and a range of un-
certainty, hence once the graph (Fig. 19) has been entered with the NWS
predicted values, the problem is complete.
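
The heuristic selection rule, choosing the cluster whose center lies the fewest standard deviations from the forecast, can be sketched as follows. The cluster summaries and the forecast values are hypothetical, and combining the predictors by a root-sum-of-squares of standardized differences is an assumption made for this example.

```python
import math

# Hypothetical cluster summaries: (mean, standard deviation) of each predictor
clusters = {
    "episode": {"windspeed": (3.0, 1.5), "temperature": (93.0, 1.5)},
    "moderate": {"windspeed": (8.0, 1.5), "temperature": (84.0, 1.5)},
}


def standardized_distance(forecast, summary):
    """Distance from the cluster center in units of standard deviations."""
    return math.sqrt(sum(((forecast[name] - mean) / sd) ** 2
                         for name, (mean, sd) in summary.items()))


def nearest_cluster(forecast):
    distances = {label: standardized_distance(forecast, summary)
                 for label, summary in clusters.items()}
    return min(distances, key=distances.get), distances


# Hypothetical forecast falling between the two clusters
forecast = {"windspeed": 5.0, "temperature": 90.0}
label, distances = nearest_cluster(forecast)
print(label, distances)
```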
Fig. 19. An example of the form of the chart to be developed in comparing the clusters generated in Fig. 18 with windspeed and temperature. (Horizontal axis: windspeed, knots.)
Development
The model described above takes into consideration the expense of
data collection and development work and the availability of computer
time in that it minimizes the calibration data requirement, does not
need a mesoscale weather prediction model because it adapts the standard
NWS predictions, and does not require a computer on-line for pre-
dictions.
Further, the development effort required by the predictive model
is also applicable to land use planning models and the comparison of
present air quality with that of a benchmark year. For the latter
application, the model eliminates bias in the comparisons due to
differences in the meteorology of the years being compared.
This model was proposed to the California State Air Resources
Board for use in the South Coast Air Basin to predict oxidant concen-
trations 24 hours in advance. The funding request was $80,000. In-
cluded in this amount was a full-time statistician, a part-time
programmer and a full-time meteorologist.
Other modeling concepts are likely to be more expensive. For
example, the multibox modeling concept requires considerably more
effort and computer time, and least-squares analysis requires sub-
stantially more data.

Transition Matrices
A further application of the material presented herein involves the
application of the categories defined above in land use planning. In
particular, the air quality categories, stratified by emission-day-types,
will be used to determine typical meteorological patterns, a problem
previously tractable only subjectively by meteorologists examining large
numbers of weather maps or by statisticians analyzing huge amounts of
data much of which must be reduced from maps by trained meteorol-
ogists (29).
It can be argued that each of the clusters defined in the calibration
of the predictive model represents a particular meteorological pattern.
It is conceivable that two or more different patterns could constitute the
same cluster; however, this is not necessarily significant because, for
many purposes, it is unnecessary to distinguish between meteorological
patterns yielding identical air quality.
Therefore, if one calculates the frequency of occurrence of each
pattern stratified by day of week and season, the results thus ob-
tained allow one to simulate a year of "typical" meteorology which, in
actuality, is a composite of all years for which air quality data exist.
For most large cities, the continuous air monitoring program (CAMP)
began in 1961. This "year," which is actually created as a Markov
model using a state transition matrix, can be used in conjunction with a
dispersion model as a benchmark year for comparing air quality over
time, or in land use planning to determine future annual average pollu-
tant concentrations.
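The construction of such a "typical" year can be sketched in a few lines. The following is a minimal illustration only, assuming that each day has already been assigned to one cluster; the category labels and the short observed sequence are hypothetical, and the stratification by day of week and season described above is omitted for brevity.

```python
import random
from collections import Counter, defaultdict


def transition_matrix(states):
    """Estimate P[i][j] = Pr(next = j | current = i) from an observed sequence."""
    counts = defaultdict(Counter)
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    matrix = {}
    for current, row in counts.items():
        total = sum(row.values())
        matrix[current] = {nxt: n / total for nxt, n in row.items()}
    return matrix


def simulate(matrix, start, n_days, rng):
    """Generate n_days of categories by walking the Markov chain."""
    day, year = start, [start]
    for _ in range(n_days - 1):
        row = matrix[day]
        day = rng.choices(list(row), weights=list(row.values()))[0]
        year.append(day)
    return year


# Hypothetical daily cluster labels (A, B, C); these are not data from the report.
# Every label here has at least one observed successor, so each has a matrix row.
observed = list("AABBBCABBCCCAABBBBCAABCCBBAAABBC")
P = transition_matrix(observed)
typical_year = simulate(P, start="A", n_days=365, rng=random.Random(0))
print(P)
print("".join(typical_year[:30]))
```

In practice one matrix would presumably be estimated per stratum (for example, per season), and the simulated sequence of categories would then drive the dispersion model.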

Random Sampling
The justification of the lognormal assumption permits the use of
parametric methods to operate on air quality data. An example of the
usefulness of parametric methods as opposed to distribution-free
methods is seen in the sample size required to estimate population
parameters within a specified accuracy.
Table 4 indicates the sample sizes required under the various
assumptions for oxidant data taken in Los Angeles in 1970. The
efficiency of the parametric methods suggests that random sampling is
an efficient method of characterizing regional air quality when an
appropriate randomized scheme is used. Note that the measurements
consisted of hourly averages every hour for an entire year. Clearly,
a more cost effective scheme was possible.
Table 4. Sample size required for various confidence limits on
estimates of GM and SGD.

                                      Confidence   Accuracy    Samples
PARAMETRIC
  Mean (Z-statistic)                  95%          10%         90
                                                   5%          359
                                      99%          10%         155
                                                   5%          621
  Variance (chi-squared statistic)    95%          20%         200
                                                   10%         750
                                                   5%          5,000
NONPARAMETRIC
  Mean (using Chebyshev inequality)   95%          10%         467
                                                   5%          1,869
                                      99%          10%         2,336
                                                   5%          9,344
  Variance [using Kolmogoroff's       95%          10% on σ    3,000 (approx)
  method (D-statistic)]                            5% on σ     18,500 (approx)
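The contrast between the parametric and distribution-free sample sizes for the mean can be illustrated with the standard textbook formulas. The report does not state the formulas or the variance it used, so the sketch below is an assumption throughout: it takes n = (z cv / e)^2 for the Z-statistic and n = cv^2 / (alpha e^2) for the Chebyshev inequality, where cv is the coefficient of variation, e the relative error and alpha one minus the confidence level. The value of cv^2 is not given in the report; it was chosen here only because it yields figures close to those in Table 4.

```python
from statistics import NormalDist


def n_parametric_mean(cv_squared, rel_error, confidence):
    """Sample size from the normal (Z) interval: n = (z * cv / e)^2."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * z * cv_squared / rel_error**2


def n_chebyshev_mean(cv_squared, rel_error, confidence):
    """Sample size from Chebyshev's inequality: n = cv^2 / (alpha * e^2)."""
    alpha = 1 - confidence
    return cv_squared / (alpha * rel_error**2)


# Assumed squared coefficient of variation (variance / mean^2). Not stated in
# the report; picked because it lands near the Table 4 entries.
CV2 = 0.2336

for conf in (0.95, 0.99):
    for err in (0.10, 0.05):
        z_n = n_parametric_mean(CV2, err, conf)
        c_n = n_chebyshev_mean(CV2, err, conf)
        print(f"{conf:.0%} confidence, {err:.0%} error: "
              f"Z about {z_n:.0f}, Chebyshev about {c_n:.0f}")
```

Under these assumptions the Chebyshev bound requires several times as many samples as the Z-statistic at 95% confidence and more than ten times as many at 99%, which is the efficiency argument made in the text.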
CHAPTER VII —SUMMARY AND CONCLUSIONS

The object of this work is to provide a model which binds together
previous theoretical and empirical findings in a unified framework, and
in so doing provides a deeper understanding of the physical processes
which affect frequency distributions of air pollutant concentrations.
To this end, the frequency distributions of air pollutant concen-
trations have been derived from first principles for both point and area
sources, for both reactive and inert pollutants. These results have been
compared to published findings and have been found to be consistent.

Area Sources
Both Larsen's data analysis and Gifford and Hanna's simple model
indicate that pollutant concentrations are approximately lognormally
distributed. The former work consists of the examination of large
quantities of data for all pollutants, for all cities and for all averaging
times. The latter indicates the high degree of correlation between
windspeeds, which are approximately lognormally distributed, and
pollutant concentrations.
From the nonempirical standpoint, this distribution can be de-
rived from the Fickian Diffusion equation by manipulating it into a finite
difference form and demonstrating its consistency with the law of pro-
portional effect. This method predicts the lognormal distribution will
fit best for inert pollutants from area sources, and least well for re-
active, secondary pollutants. Larsen's results and those of Gifford
and Hanna are in good agreement with this assertion. This derivation
also predicts that the lognormal distribution will not fit as well close to
a source as it will further away. Recent data collected near large
sources bear this out.
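The role of the law of proportional effect can be illustrated numerically. The sketch below is not the finite difference derivation itself; it only assumes that a quantity is changed at each step by a random fraction of its current value, x_j = x_(j-1) (1 + eps_j), and shows that the logarithm of the result is then close to symmetric while the quantity itself is strongly skewed. The step count, the spread of eps and the sample size are illustrative assumptions.

```python
import math
import random
import statistics


def proportional_effect(n_steps, rng, x0=1.0, spread=0.1):
    """Apply n_steps random proportional changes: x_j = x_{j-1} * (1 + eps_j)."""
    x = x0
    for _ in range(n_steps):
        x *= 1.0 + rng.uniform(-spread, spread)
    return x


def skewness(values):
    """Sample skewness (third standardized moment)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd**3)


rng = random.Random(1)
# Step count, spread and sample size are illustrative assumptions.
samples = [proportional_effect(200, rng) for _ in range(5000)]
logs = [math.log(v) for v in samples]

print("skewness of x     :", round(skewness(samples), 2))
print("skewness of log x :", round(skewness(logs), 2))
```

Because log x is a sum of many small independent terms, the central limit theorem makes it approximately normal, which is the sense in which proportional effects lead to a lognormal distribution.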
Also, a generalization of Gifford's point source model based upon
the Gaussian Plume solution to the Fickian Diffusion equation indicates
that pollutant concentrations are lognormally distributed if the geometric
and arithmetic means are simply related.
Peripheral to the main discussion is an explanation of the surprising
fact that pollutant concentrations are approximately lognormally
distributed for all averaging times. This is explained through an analysis
of the averaging process as a window through which atmospheric motion
of various scales can be seen.
As a result of this investigation, we are prepared to assert
strongly that the lognormal is an appropriate distribution to use to
characterize air quality data. We recognize that this will not
significantly affect current practice, which has been proceeding on this
basis, but will serve to quell the arguments concerning the correctness
of this assumption, and lend further empirical and nonempirical
support to those who are currently using the lognormal assumption. These
users include parties responsible for monitoring air quality,
meteorologists who are modeling atmospheric transport, and others who have
use for these distributions along the lines suggested in Chapter VI.

Point Sources
No general agreement exists on the identity of the frequency
distributions of air pollutants emanating from a point source. The empirical
findings of Knox and Lange and Benarie are at odds with the theoretical
prediction by Gifford. At present there is no explanation of these
discrepancies in the literature.
In Chapter II a derivation is presented which indicates that any of
the distributions mentioned in the literature may result depending on
atmospheric stability, windspeeds, and the distance from source to
receptor. Within this framework, each of the results in the literature
may be obtained for the appropriate values of the relevant variables.
This suggests strongly that the eventual quantification of these
relationships will proceed along the lines outlined here. This model is
the first to reconcile the conflict through a treatment which provides
understanding of the fundamental physical processes involved. It allows
air pollution engineers to state with some certainty that pollutant
concentration distributions resulting from a point source are neither
lognormal nor chi-squared, but rather a subtle combination which depends
upon the particular conditions under which the pollutant is measured.
Related Variables
The main point to be made here is that pollutant concentrations
are tracers of atmospheric motion. As such, air quality frequency
distribution data can be used as a partial substitute for meteorological
data under certain conditions. An example is presented in Chapter VI.
To illustrate this point the cases of advection, diffusion and
deposition are treated. Advective transport rates have been investigated
empirically, and the lognormal distribution appears to fit quite well.
Eddy diffusive transport rates can be demonstrated to be approximately
lognormal by Kolmogorov's similarity theory argument, based upon
energy exchange between different scales of turbulent motion. Deposi-
tion is based upon particle size distributions, which can be shown to be
approximately lognormally distributed from both empirical and non-
empirical arguments.
According to the "simple model" proposed and effectively applied
by Gifford and Hanna, there is substantial correlation between windspeeds
and pollutant concentrations. Based upon the lognormality of transport
this statement is well motivated for nonreactive pollutants, especially
in areas with relatively small source terms. In some cases the deposi-
tion component of the source term is also lognormally distributed,
further contributing to the argument for the lognormality of pollutant
concentrations.
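The link between lognormal windspeeds and lognormal concentrations can be made concrete. The sketch below assumes the simple model in the form commonly written as concentration = C Q / u, with C a dimensionless constant, Q the area source strength and u the windspeed; the numerical values of C, Q and the windspeed parameters are hypothetical. Since log(concentration) = log(C Q) - log(u), a lognormal windspeed yields a lognormal concentration with the same standard geometric deviation and a geometric mean of C Q divided by the geometric mean windspeed.

```python
import math
import random
import statistics

rng = random.Random(2)

# Illustrative assumptions, not values from the report.
C_CONST = 225.0                    # dimensionless constant of the simple model
Q = 1.0e-6                         # area source strength (g m^-2 s^-1)
U_GM, U_SGD = 3.0, 1.8             # windspeed geometric mean (m/s) and SGD

winds = [rng.lognormvariate(math.log(U_GM), math.log(U_SGD)) for _ in range(20000)]
conc = [C_CONST * Q / u for u in winds]


def geometric_stats(values):
    """Return (geometric mean, standard geometric deviation)."""
    logs = [math.log(v) for v in values]
    return math.exp(statistics.fmean(logs)), math.exp(statistics.pstdev(logs))


print("wind GM, SGD:", geometric_stats(winds))
print("conc GM, SGD:", geometric_stats(conc))
```

The identity holds only for a constant source term; a variable or reactive source term would add its own contribution to the spread, which is consistent with the restriction to nonreactive pollutants made above.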

Other Distributions
A number of authors have investigated the use of other frequency
distributions to fit air quality data. The gamma, Weibull and beta dis-
tributions have received a good deal of attention. These distributions
tend to fit marginally worse than the two-parameter lognormal, according
to an extensive study by Lynn.
The fundamental questions are then: Why do these distributions do
as well as indicated in the literature, and is there any theoretical sup-
port for these distributions to be used to characterize air quality data?
There are no reports in the literature presenting any nonempirical
support for these distributions. Nevertheless, some mathematical
similarity between these distributions must exist for the observed
fits to occur.
In this work a transformation has been found which transforms the
lognormal, Weibull and gamma distributions to approximately the same
form for typical parameter values observed in air quality data fits. This
is indicative of a fundamental mathematical similarity between the dis-
tributions which demonstrates that if any one of the distributions fits the
data the others must also, with only small differences in the goodness of
fit.
This argument is significant in terms of the long standing discus-
sion in the scientific community concerning which distribution is most
appropriate, in that it gives a greater mathematical understanding of the
goodness of fit observations. It also suggests a path for future research
to determine the basic differences between the higher order terms of
each distribution and how they relate to the physical processes at hand.
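The practical meaning of this similarity can be seen in a small numerical comparison. The sketch below does not reproduce the transformation found in this work; it only assumes a lognormal distribution with an illustrative standard geometric deviation of 1.65, matches a gamma distribution to it by the method of moments, and compares the deciles of random samples drawn from each.

```python
import math
import random
import statistics

rng = random.Random(3)
N = 20000

# Illustrative lognormal: geometric mean 1, standard geometric deviation 1.65.
MU, SIGMA = 0.0, math.log(1.65)
lognormal = [rng.lognormvariate(MU, SIGMA) for _ in range(N)]

# Gamma with the same mean and variance (method of moments).
mean = math.exp(MU + SIGMA**2 / 2)
cv2 = math.exp(SIGMA**2) - 1          # squared coefficient of variation
shape, scale = 1 / cv2, mean * cv2
gamma = [rng.gammavariate(shape, scale) for _ in range(N)]


def deciles(values):
    """Return the nine cut points dividing the sample into ten equal parts."""
    return statistics.quantiles(values, n=10)


for p, a, b in zip(range(10, 100, 10), deciles(lognormal), deciles(gamma)):
    print(f"{p}th percentile: lognormal {a:.3f}, gamma {b:.3f}")
```

For parameter values in this range the two sets of deciles lie close together over the central part of the distribution, so a goodness-of-fit comparison on typical data would be expected to separate them only weakly; larger differences appear in the extreme tails.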

Applications
It is instructive to examine some of the modeling possibilities
opened by the results presented here to demonstrate their utility. The
applications outlined herein are actually being developed, or have been
proposed for future development, in the author's work at the Lawrence
Livermore Laboratory. It is expected that their utility will be dem-
onstrated as the Laboratory effort progresses over the next several
years.
Frequency distributions can be used to characterize meteorological
and air quality patterns which have application in land use modeling and
pollution level forecasting. They can also be used in air pollution dis-
persion modeling, and in the validation of such models.
The results presented herein and the accepted results in the litera-
ture justify these modeling concepts more firmly than the latter alone.

Future Research
For area sources a major question to be addressed is the pre-
diction of the parameters of the concentration distribution. Research
in this area is being conducted at present (30)(31).
The author has further work planned in studying the relationship of
meteorological parameters to concentration distribution, particularly in
the relationship of windspeeds to concentration.
The point source question is more complicated because of the
changing identity of the distribution. The fundamental question here
concerns the change in shape and parameters of the distribution as a
function of windspeed, stability, reactivity of pollutant and distance.
Perhaps other variables like stack height will also be significant. It
will take a good deal more data than is currently available to produce
definitive results on this matter.
The author has new work planned to delve more deeply into the
matter of concentration distributions resulting from a point source.
Simulation experiments are planned using the ADPIC (32) model to cal-
culate such pollutant concentrations at various distances from the source
under various meteorological regimes. The resulting frequency dis-
tributions will be compared with those predicted in Chapter II. This
work should be completed in 1974.
The fundamental question concerning the identity of the distribu-
tions is not yet completely resolved. Additional theoretical and
empirical support is still welcome, despite the strong arguments made
in this work and in previous published reports.
LITERATURE CITED

1. R. I. Larsen and C. E. Zimmer, "Calculating Air Quality and Its
Control," JAPCA, 15, 565 (1965).
R. I. Larsen, "Analyzing Air Pollutant Concentration and Dosage
Data," JAPCA, 17, 85 (1967).
R. I. Larsen and C. E. Zimmer, "A New Mathematical Model of
Air Pollutant Concentration Averaging Time and Frequency,"
JAPCA, 19, 24 (1969).
2. J. B. Knox and R. Lange, "Surface Air Pollutant Concentration
Frequency Distribution: Implications for Urban Air Pollution
Modelling," University of California, Lawrence Livermore Lab-
oratory, Report UCRL-73887 (1972).
3. M. Benarie, "The Use of the Relationship Between Wind Velocity
and Ambient Pollutant Concentration Distributions for the Estima-
tion of Average Concentrations from Gross Meteorological Data,"
Proceedings of the Symposium on Statistical Aspects of Air Quality
Data, Chapel Hill, North Carolina, November 1972.
M. Benarie, "Sur La Validite De La Distribution Logarithmico-
Normale Des Concentrations De Pollutant," Second International
Clean Air Congress, 1970.
4. A. C. Stern, "Air Pollution," Vol. III, 2nd Ed. (Academic Press,
New York, 1968).
5. F. Gifford, "Statistical Properties of a Fluctuating Plume Dis-
persion Model," Proceedings of the Symposium on Atmospheric
Diffusion and Air Pollution, Oxford, August 1958.
6. F. Gifford, "The Form of the Frequency Distributions of Air
Pollutant Concentrations," Proceedings of the Symposium on
Statistical Aspects of Air Quality Data, Chapel Hill, North
Carolina, November 1972.
7. J. B. Knox and R. I. Pollack, "An Investigation of the Frequency
Distributions of Surface Air Pollutant Concentrations," Symposium
on Statistical Aspects of Air Quality Data, Chapel Hill, North
Carolina, November 1972.
8. M. C. MacCracken, T. V. Crawford, K. R. Peterson and
J. B. Knox, "Initial Application of a Multi-Box Air Pollution
Model to the San Francisco Bay Area," University of California,
Lawrence Livermore Laboratory, Report UCRL-73994 (1972).
9. C. Hopper, personal communication, 1972.
10. N. A. Fuchs, The Mechanics of Aerosols, (Pergamon Press,
New York, 1964).
11. F. A. Gifford and S. R. Hanna, "Modeling Urban Air Pollution,"
ARATDL Contribution No. 63, 1972.
12. F. A. Gifford and S. R. Hanna, "Urban Air Pollution Modelling,"
presented at the 1970 International Air Pollution Conference of
the International Union of Air Pollution Prevention Associations.
13. A. N. Kolmogoroff, Dokl. AN SSSR, 30, 301 (1941).
14. A. M. Yaglom, Dokl. AN SSSR, 166, 49 (1966).
15. A. S. Gurvich, Dokl. AN SSSR, 172, 554 (1967).
16. G. K. Batchelor, "The Application of the Similarity Theory of
Turbulence to Atmospheric Diffusion," Quart. J. Roy. Met. Soc.,
76, 133 (1950).
17. T. V. Crawford, "Atmospheric Diffusion of Large Clouds,"
Proceeding of the USAEC Meteorological Information Meeting,
September 1967, Chalk River, Ontario Canada, Rept. AECL-2787
(1968).
18. T. V. Crawford, "A Computer Program for Calculating the
Atmospheric Dispersion of Large Clouds," University of California,
Lawrence Livermore Laboratory Report UCRL-50179 (1966).
19. I. H. Blifford and D. A. Gillette, "Applications of the Lognormal
Frequency Distribution to the Chemical Composition and Size
Distribution of Naturally Occurring Atmospheric Aerosols," Water,
Air and Soil Pollution, 1, 106 (1971).
20. E. A. Shuck, J. N. Pitts, and J. K. S. Wan, "Relationships Between
Certain Meteorological Factors and Photochemical Smog," Intern. J.
Air Water Pollution, 10, 689 (1966).
21. R. L. Mitchell, "Permanence of the Lognormal Distribution,"
J. Opt. Soc. Am., 58, 1267 (1968).
22. D. S. Lynn, "Fitting Curves to Suspend Particulate Data," Proceed-
ings of the Symposium on Statistical Aspects of Air Quality Data,
Chapel Hill, North Carolina, November 1973.
23. P. G. Milokaj, "Environmental Applications of the Weibull
Distribution Function: Oil Pollution," Science, 176, 1019 (1972).
24. R. E. Barlow, "Averaging Time and Maxima for Air Pollution
Concentration," NTIS AD-729 413, ORC 71-17.
25. W. Weibull, "A Distribution Function of Wide Applicability,"
J. Appl. Mech., 293 (1951).
26. E. Lawrence, "Urban Climate and Day of the Week," Atmos.
Environ., 5, 935 (1971).
27. J. C. Gower, "A Comparison of Some Methods of Cluster
Analysis," Biometrics, 23, 623 (1967).
28. R. G. Miller, "Statistical Predictions by Discriminant Analysis,"
Meteorol. Monographs, 4, 25 (1962).
29. C. L. Smalley, "A Survey of Air Flow Patterns in the San
Francisco Bay Region 1952-1955," Bay Area Air Pollution Control
District Technical Services Division Report.
30. R. Thullier, "Air Quality Statistics in Land Use Planning Applica-
tions," 3rd Conf. on Probability and Statistics in Atmospheric
Science, Boulder, Colo., June 19-22, 1973.
31. W. B. Johnson, "The Status of Air Quality Simulation Modeling,"
Proceedings of the Interagency Conference on the Environment,
Livermore, California, October 1972.
32. R. Lange, personal communication, 1973.
TECHNICAL REPORT DATA
(Please read Instructions on the reverse before completing)
1. REPORT NO.
EPA-650/4-75-004
3. RECIPIENT'S ACCESSION NO.
4. TITLE AND SUBTITLE
STUDIES OF POLLUTANT CONCENTRATION FREQUENCY
DISTRIBUTIONS
5. REPORT DATE
January 1975
6. PERFORMING ORGANIZATION CODE
7. AUTHOR(S)
Richard I. Pollack
8. PERFORMING ORGANIZATION REPORT NO.
9. PERFORMING ORGANIZATION NAME AND ADDRESS
Lawrence Livermore Laboratory
University of California
Livermore, California 94550
10. PROGRAM ELEMENT NO.

1AA009
11. CONTRACT/GRANT NO.
12. SPONSORING AGENCY NAME AND ADDRESS
Office of Research and Development
U.S. Environmental Protection Agency
Research Triangle Park, N.C. 27711
13. TYPE OF REPORT AND PERIOD COVERED
Final
14. SPONSORING AGENCY CODE
15. SUPPLEMENTARY NOTES
16. ABSTRACT
air pollution research focused on determining the identity of the con-
centration distributions for a variety of pollutants and locations and the relation-
ships between attributes of the data, e.g. mean values, maximum levels and averaging
times, from an empirical standpoint. This report attempts to identify the nature of
the frequency distributions for both reactive and inert pollutants, for both point and
area sources, and to some extent for different types of atmospheric conditions using a
substantially non-empirical approach. As an illustration of the applicability of these
results, a predictive model and a monitoring scheme are proposed based upon knowledge
developed by studying the frequency distributions.
It is found that a theory of the genesis of pollutant concentrations based upon
the Fickian diffusion equation predicts that concentration distributions due to area
sources will be approximately lognormal over a diurnal cycle in the absence of nearby
strong sources. It is determined that reactive pollutants will have larger standard
geometric deviations than relatively inert pollutants. Empirical observations are in
good agreement with these results. The frequency distribution of the logarithms of
concentrations due to point sources is derived and shown to be a sum of normal and chi-
squared components, with the identity of the dominant term determined by meteorological
conditions. This result provides a framework for resolving apparently conflicting re-
sults in the literature. The lognormality of other meteorological variables, notably
windspeeds and the rate of energy dissipation in turbulent flow, and their relation to
air quality frequency distributions is discussed. There is considerable discussion in
the literature concerning whether the lognormal distribution provides the best fit.
Other distributions that fit air quality data fairly well are investigated, and their
mathematical similarity to the lognormal is demonstrated.
17. KEY WORDS AND DOCUMENT ANALYSIS
a. DESCRIPTORS
Air pollutants
Frequency distribution
Monitoring
Modeling
b. IDENTIFIERS/OPEN ENDED TERMS
c. COSATI Field/Group
18. DISTRIBUTION STATEMENT
Unlimited
19. SECURITY CLASS (This Report)
Unclassified
21. NO. OF PAGES
94
20. SECURITY CLASS (This page)
Unclassified
22. PRICE
EPA Form 2220-1 (9-73)

U.S. GOVERNMENT PRINTING OFFICE: 1975 - 640-881/659 - Region 4