EPA-650/4-75-004

January 1975
Environmental Monitoring Series
            STUDIES OF POLLUTANT CONCENTRATION
                 FREQUENCY DISTRIBUTIONS
                      Meteorology Laboratory
                National Environmental Research Center
                  Office of Research and Development
                 U.S. Environmental Protection Agency
                  Research Triangle Park, N.C. 27711

-------

-------
                             EPA-650/4-75-004
             STUDIES
        OF  POLLUTANT
       CONCENTRATION
FREQUENCY  DISTRIBUTIONS
                 by
            Richard I. Pollack

        Lawrence Livermore Laboratory
          University of California
         Livermore, California 94550
         Program Element No. 1AA009
     National Environmental Research Center
       Office of Research and Development
      U.S. Environmental Protection Agency
   Research Triangle Park, North Carolina 27711


              January 1975

-------
                   RESEARCH REPORTING SERIES


Research reports of the Office of Research and Development, U. S. Environ-
mental Protection Agency,  have been grouped into series.   These broad
categories were established to facilitate further development and applica-
tion of environmental technology.  Elimination of traditional grouping was
consciously planned to foster technology transfer and maximum interface
in related fields.  These series are:

          1.  ENVIRONMENTAL HEALTH EFFECTS RESEARCH
          2.  ENVIRONMENTAL PROTECTION TECHNOLOGY

          3.  ECOLOGICAL RESEARCH

          4.  ENVIRONMENTAL MONITORING

          5.  SOCIOECONOMIC ENVIRONMENTAL STUDIES

          6.  SCIENTIFIC AND TECHNICAL ASSESSMENT REPORTS

          9.  MISCELLANEOUS

This report has been assigned to the ENVIRONMENTAL MONITORING
series.  This series describes research conducted to develop new or
improved methods and instrumentation for the identification and quanti-
fication of environmental pollutants at the lowest conceivably significant
concentrations.  It also includes studies to determine the ambient con-
centrations of pollutants in the environment and/or the variance of
pollutants as  a function of time or meteorological factors.

Copies of this report are available free  of charge to Federal employees,
current  contractors  and grantees, and nonprofit organizations - as supplies
permit - from the Air Pollution Technical Information Center, Environmental
Protection Agency, Research Triangle Park, North Carolina 27711.  This
document is also available to the public for sale through the Superintendent
of Documents, U.S.  Government Printing Office, Washington, D.C. 20402.
                        EPA REVIEW NOTICE

This report has been reviewed by the Office of Research and Develop-
ment, Environmental Protection Agency,  and  approved for publication.
Approval does not signify that the  contents necessarily reflect the
views and policies of the Environmental Protection Agency, nor does
mention of trade names constitute endorsement or recommendation for
use.

                  Publication No. EPA-650/4-75-004

-------
                                      PREFACE
    Air quality data have been analyzed as a function of frequency, maxima, the
form of the frequency distribution, and averaging time in the 15 papers composing
the "Proceedings of the Symposium on Statistical Aspects of Air Quality Data" (U.S.
Environmental Protection Agency Report No.  EPA-650/4-74-038, Research Triangle
Park, North Carolina 27711, October 1974).

    Dr. Pollack has drawn on these and other data in his analysis of air pollutant
concentration data and the frequency  distributions used to describe such data.   His
dissertation identifies the nature of the frequency distributions for both reactive and
inert pollutants, for both point and area sources, and to some extent for different
types of atmospheric conditions, using a substantially non-empirical approach.
Because of the valuable information presented in his dissertation, Dr.  Pollack
and the Lawrence Livermore Laboratory have given their kind permission to the
Meteorology Laboratory of EPA to publish  it for wider distribution.

-------

-------
                      TABLE OF CONTENTS
  I   INTRODUCTION
        The Problem
        The Research
        The Significance
 II   AIR QUALITY MEASUREMENTS
        The Derivation of a Frequency Distribution of a
          Pollutant Emitted from a Point Source
        Area Source
        An Extension of the "Gaussian Plume" Point Source
          Distribution Derivation
        A New Approach to the Derivation of Frequency
          Distributions of Pollutant Concentrations
        Lognormality Over Various Averaging Times
        Summary of Derivations
        Conclusion
III   FREQUENCY DISTRIBUTIONS OF RELATED
      VARIABLES
        Advection
        Diffusion
        Particle Sizes
        Conclusion
 IV   FREQUENCY DISTRIBUTIONS FOR VARIOUS
      POLLUTANTS AND SOURCE TYPES
        Reactive versus Inert Pollutants
        Point versus Area Sources
        Summary
  V   THE FREQUENCY DISTRIBUTIONS
        Lognormal Distribution
        Weibull Distribution
        Gamma Distribution
        Pearson Distribution
        Mathematical Similarity
        Summary
 VI   ILLUSTRATIVE APPLICATIONS
        Analysis of Meteorological Patterns for
          Pollution Level Forecasting
             Selecting the Clusters
             Classifying New Days
             Recalibration
             Spatial Interpolation
             An Example
             Development
        Transition Matrices
        Random Sampling
VII   SUMMARY AND CONCLUSIONS
        Area Sources
        Point Sources
        Related Variables
        Other Distributions
        Applications
        Future Research
      LITERATURE CITED

-------
                 TABLE OF FIGURES AND GRAPHS
Fig. 1(a).    Autocorrelation versus lag for CO concentrations
             in San Francisco, 1970 hourly averages.

Fig. 1(b).    Autocorrelation versus lag for CO concentrations
             in San Francisco, 1970 hourly averages (log plot).

Fig. 1(c).    Autocorrelation at lag 1 versus averaging time
             for CO concentrations in San Francisco, 1970
             hourly averages.

Fig. 2.      Concentration and windspeed frequency distributions
             for CO and windspeed for San Francisco, 1970
             hourly averages.

Fig. 3.      Probability distribution of the squared temperature
             difference compared with lognormality.
             $P(\epsilon < \epsilon_0)$.  $\epsilon = (\Delta T)^2/[(\Delta T)^2]$ (15).  Separation = 2 cm,
             104 samples per plot.

Fig. 4(a).    Oxidant concentration versus time in Los Angeles,
             hourly averages.

Fig. 4(b).    CO concentrations versus time in San Francisco,
             hourly averages.

Fig. 5(a).    SO2 concentrations, direction 289°-308°, Lacq,
             France, 1968-1969 (3).

Fig. 5(b).    NO2 concentrations, direction 108°-121°, Lacq,
             France, 1968-1969 (3).

Fig. 5(c).    SO2 concentrations, direction 108°-121°, Lacq,
             France, 1968-1969 (3).

Fig. 5(d).    NO2 concentrations, direction 289°-308°, Lacq,
             France, 1968-1969 (3).

Fig. 6(a).    Log probability plot of oxidant concentrations in
             Los Angeles, 11/11/70, hourly averages.

Fig. 6(b).    Oxidant concentrations in Riverside, California and
             Los Angeles, California, 1967 hourly averages.

Fig. 7.      Frequency curves of the normal and lognormal
             distributions.

Fig. 8.      Frequency curves of the lognormal distribution
             for three values of $\sigma^2$.

Fig. 9.      Frequency curves of the lognormal distribution
             for three values of $\mu$.

Fig. 10(a).   Regions of convergence where the sum of n
             lognormal variates is approximately lognormal.
             (A) Convergence for both normal and lognormal
             approximations, (B) convergence for the log-
             normal approximation, (C) convergence
             uncertain (21).

Fig. 10(b).   CO concentration in San Francisco, hourly
             averages.

Fig. 10(c).   CO concentration for various categories of
             pollution days in San Francisco, 1970 hourly
             averages.

Fig. 11.     Frequency curves for Weibull (top) and Rayleigh
             probability distributions.

Fig. 12.     Cumulative Weibull distribution plotted on log
             probability paper.

Fig. 13(a).   Frequency curves for gamma probability
             distribution for various values of $\alpha$.

Fig. 13(b).   Cumulative gamma distributions plotted on log
             probability paper.

Fig. 14.     Skewness-kurtosis plane in Pearson's system.

Fig. 15.     Cumulative beta distribution plotted on log
             probability paper.

Fig. 16.     A possible set of air quality patterns.

Fig. 17.     Geometric mean versus standard geometric
             deviation for individual days for oxidant con-
             centration in Los Angeles, California, 1970
             hourly averages.

Fig. 18.     A set of clusters of air quality day-types from
             the data in Fig. 17.

Fig. 19.     An example of the form of the chart to be
             developed in comparing the clusters generated
             in Fig. 18 with windspeed and temperature.

-------
                                    ABSTRACT


    Early air pollution research focused on determining the identity of the concentration
distributions for a variety of pollutants and locations and the relationships between
attributes of the data, e.g., mean values, maximum levels and averaging times, from an
empirical standpoint.  This report attempts to identify the nature of the frequency
distributions for both reactive and inert pollutants, for both point and area sources, and to some
extent for different types  of atmospheric conditions using a substantially non-empirical
approach.  As an illustration of the applicability of these results, a predictive model and
a monitoring scheme are proposed based upon knowledge developed by studying the fre-
quency  distributions.

    It is found that a theory of the genesis of pollutant concentrations based upon the
Fickian  diffusion equation predicts that concentration distributions due to area sources
will be approximately  lognormal over a diurnal cycle in the  absence of nearby strong
sources.  It is determined that reactive pollutants will have  larger standard geometric
deviations than relatively inert pollutants.  Empirical observations are in good agree-
ment with these results.  The frequency distribution of the logarithms of concentrations
due to point sources is derived and shown to be a sum of normal and chi-squared com-
ponents, with the identity of the dominant term determined by meteorological  conditions.
This result provides a framework for resolving apparently conflicting results in the lit-
erature. The lognormality of other meteorological variables, notably windspeeds and the
rate of energy dissipation in turbulent flow, and their relation to air quality frequency
distributions is  discussed.  There is considerable discussion in the literature concern-
ing whether the lognormal distribution provides the best fit. Other distributions that
fit air quality data fairly well are  investigated, and their mathematical similarity to the
lognormal is demonstrated.

    As an illustration  of the significance of the results developed herein, a predictive
scheme that uses concentration frequency distributions as a basis for classifying
meteorological patterns is presented.  This scheme uses natural clustering of the distribution
parameters to identify meteorological and emission patterns. Finally, an air quality
monitoring random sampling scheme based upon the distributions identified in the litera-
ture and this work is presented and its improvement over non-parametric techniques is
demonstrated.

-------

-------
                    CHAPTER I —INTRODUCTION

                             The Problem
      In recent years public interest in the quality of ambient air has
increased.  As information concerning deleterious health effects has
gained wider acceptance, government has moved to specify standards
for the  quality of ambient air.  The standards are most often given in
terms of a maximum value which may be exceeded only once a
year for concentrations averaged over a specified period of time.
      The approach taken in relating air quality data to standards is  to
calculate the frequency distribution for the air quality data,  from
which concentrations at various averaging times can be derived.  It is
essential that these distributions  be very precise because of the
stunning economic  impact on a  region which must change its way of
life to conform to air quality standards.  As a result,  considerable
attention is being paid to this problem.
      The earliest  work on this problem consisted of the  empirical
identification  of the frequency distributions of surface air pollutant
concentrations.   Various distributions were proposed  with different
degrees of success.  The most widely accepted of these distributions
is the lognormal, primarily due to the work of Larsen (1), who presented
data indicating that concentrations of all pollutants in all tested
cities for all averaging times are approximately lognormally distributed.
It was also noted from these data, however, that some
pollutants tended to fit better than others, that there were differences
between cities, and that it was not clear why averages of lognormal
variables should be lognormal rather than normal, as the Central Limit
Theorem would indicate.
      Later work determined that different distributions and/or dif-
ferent ranges  of parameter values were appropriate in different cir-
cumstances.  Marked differences were noted between  inert and re-
active pollutants and point and area sources.   Conflicting results are
presented in the literature concerning these distributions.  It is clear
that an  understanding of these results is important because of the
economic impact of the decision that ambient air quality standards

-------
(AAQS) are being violated.   Further,  an understanding of the nature of
the distributions and their parameters in the various cases can serve
only to enhance our understanding of the fundamental principles
involved.
      There are,  of course, more pragmatic applications of this re-
search.  In particular, the formulation and validation of air pollution
models cannot proceed without some knowledge of the form of the
output to be expected.  The effects of various types of sources on
ambient air quality can be estimated through knowledge of the form of
the resulting concentration distributions and how the  parameters vary
with source type,  pollutant type and distance from the source.  Con-
siderable savings in time and money can be made in air quality
monitoring, prediction and modeling through applications of the tech-
niques presented herein.
      At present,  the scientific community has not reached a consensus
concerning the points raised above.  There are a number of conflicting
empirical results, and there is little work on a more theoretical level.
The present work seeks to add new information to the discussion,
derived from a non-empirical viewpoint.

                            The  Research
      The objective of this work  is to present a model which binds to-
gether previous theoretical and empirical findings within a  unified
framework.
      First, the frequency distribution of surface air pollutant con-
centrations  is derived starting from the differential equation describing
the time evolution of air pollutants through the  atmosphere.  It is shown
that for certain fairly general conditions, the distribution is lognormal.
      Using the "Gaussian Plume" equation, which describes the disper-
sion of a pollutant from a point source as a spatial bivariate normal
distribution, the concentration distribution resulting  from  a point source
is derived.  It  is shown here that the identity of the distribution is
dependent upon the distance from the source, the atmospheric stability
conditions,  and the magnitude of the windspeeds.   Within this framework
several apparently conflicting results from the literature (2), (3) can be
reconciled.

-------
      One of the most significant of Larsen's empirical results was the
fact that pollutant concentrations are approximately lognormally dis-
tributed for a wide spectrum of averaging times.  This appears to
contradict the central limit theorem of mathematical statistics.  How-
ever, through the model presented herein the averaging process can
be seen as a filter of various scales of atmospheric motion, each of
which results in the lognormal.
      Several investigators have proposed other distributions for de-
scribing air quality data.  The Weibull, beta, and gamma distributions
are the most often suggested.  An obvious question concerns the reason,
in mathematical terms, that these distributions fit the same data fairly
well.  It is shown later that a transformation can be found which re-
duces the lognormal, gamma, and Weibull distributions to very similar
forms.  This suggests that there is little significant difference between
these distributions for the parameter values often observed.
      A variety of other meteorological variables are approximately
lognormally distributed, particularly those describing atmospheric
motion, or motion of substances suspended in the atmosphere.  The
relationship between several of these variables and pollutant concen-
trations is discussed.
      Rather than merely stating these derived results,  considerable
attention has been paid to empirical evidence both from the literature
and compiled for this work.  The assertions made herein are supported
by data  which are presented concomitant with the non-empirical results.

                           The Significance
      This study treats the problem of the identification of the frequency
distributions of air quality data comprehensively.   The following are
discussed:
      1.  Which parametric distributions are appropriate to characterize
          pollutant concentrations from point and area sources and for
          inert and reactive pollutants?  Why?
      2.  What is the effect of averaging time on these frequency dis-
          tributions?

-------
      3.   How are these distributions affected by other meteorological
          variables?
      4.   How can this information be applied?
      It is concluded that the  information developed here can be applied
to developing models for alert level forecasting,  air quality monitoring,
and more.

-------
           CHAPTER II— AIR QUALITY MEASUREMENTS

      Air quality data are continuously monitored,  with the receptors
punching out one reading every 5 minutes.  These raw data are then
averaged as follows:
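      let

$$x_1, x_2, \ldots, x_i, \ldots, x_n$$

be a sequence of 5-minute observations.  The averages

$$\frac{x_1 + x_2 + \cdots + x_K}{K}, \qquad \frac{x_{K+1} + x_{K+2} + \cdots + x_{2K}}{K}, \qquad \frac{x_{2K+1} + x_{2K+2} + \cdots + x_{3K}}{K}, \ \ldots$$

are then calculated and referred to as the averages of time K (each
spans K consecutive readings, i.e. 5K minutes).  These
averages are the standard concentration measurements used by re-
searchers and analysts.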
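
The averaging step above can be sketched in a few lines. This is a minimal illustration, not the procedure used by any particular monitoring network: the function name `block_averages` and the sample readings are hypothetical, and a trailing partial block is simply dropped.

```python
def block_averages(readings, k):
    """Average consecutive, non-overlapping blocks of k readings.

    A trailing partial block is dropped so that every average
    covers exactly k readings.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    n_blocks = len(readings) // k
    return [sum(readings[i * k:(i + 1) * k]) / k for i in range(n_blocks)]


# Hypothetical 5-minute readings; twelve of them make one hourly average.
five_minute = [2.0, 2.5, 3.0, 2.5, 2.0, 1.5, 1.0, 1.5, 2.0, 2.5, 3.0, 2.5,
               4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0]
hourly = block_averages(five_minute, 12)
```

Calling the same function with k = 36 or k = 96 would give 3-hour or 8-hour averages from the same raw sequence.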
      Air quality standards are  set based upon the relationship between
air pollution exposure and health effects. A comprehensive exposition
of these  standards can be found  in Ref. (4).   Updated information is
published by the Environmental  Protection Agency (EPA) Office of Air
Programs.
      To compare ambient air quality to standards, frequency distribu-
tions  of the  averages are calculated.   The cumulants of these distribu-
tions  are used to determine the  probability of exceeding a particular
standard.
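
As a rough illustration of how a fitted distribution is compared with a standard, the sketch below computes the probability of exceeding a given level, assuming the averages are lognormal with a known geometric mean and standard geometric deviation. The function name and the numerical values are hypothetical and are not taken from the report.

```python
import math


def lognormal_exceedance(standard, geo_mean, geo_sd):
    """P(concentration > standard) for a lognormal distribution.

    geo_mean: geometric mean of the averages.
    geo_sd:   standard geometric deviation (must be > 1).
    """
    # Standardize the log of the standard, then take the upper normal tail.
    z = (math.log(standard) - math.log(geo_mean)) / math.log(geo_sd)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical hourly averages: geometric mean 3.0, geometric s.d. 2.0,
# compared with a hypothetical standard of 9.0 (same concentration units).
p_exceed = lognormal_exceedance(standard=9.0, geo_mean=3.0, geo_sd=2.0)
expected_hours_per_year = p_exceed * 8760
```

A standard stated as "exceeded no more than once a year" corresponds, for hourly averages, to an exceedance probability of about 1/8760.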
      This chapter concerns itself with determining the nature of these
distributions from a non-empirical standpoint.

            The Derivation of a  Frequency Distribution of a
                Pollutant Emitted from a Point Source
      The earliest discussion of pollutant concentration frequency dis-
tributions was by Frank Gifford in 1958 (5).  Gifford started with the
equation describing the diffusion of a plume of stack effluent.  This
"Gaussian Plume" equation,

$$\frac{\chi}{Q} = (2\pi Y^2 U)^{-1} \exp\left[-\frac{(y - D_y)^2 + (z - D_z)^2}{2Y^2}\right], \tag{II-1}$$

where
           $Y^2$ is the variance of the material in individual disk
               elements,
           $Q$ is the continuous rate of emission,
           $\chi/Q$ is the instantaneous relative concentration,
           $U$ is the magnitude of the wind vector,  and
           $D_y, D_z$ are the distances of the plume center from the
               origin,
can be simplified by defining

$$\mathcal{Y} = \frac{y - D_y}{(2Y^2)^{1/2}}, \qquad \mathcal{Z} = \frac{z - D_z}{(2Y^2)^{1/2}}. \tag{II-2}$$

Therefore,

$$\mathcal{Y}^2 + \mathcal{Z}^2 = -\ln(C_1 \chi/Q), \tag{II-3}$$

where

$$C_1 = 2\pi Y^2 U. \tag{II-4}$$

The terms $\mathcal{Y}^2$ and $\mathcal{Z}^2$ are each chi-squared variables, and by the re-
productive property of chi-squared variates the result of the convolu-
tion is also chi-squared.  Hence the natural log of the concentration is
directly proportional to a chi-squared random variable.
      Note that this result applies to a point source only.
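
The chi-squared result can be checked numerically. The sketch below is an illustration under stated assumptions rather than a reproduction of Gifford's calculation: the receptor is placed at the origin, the plume-center displacements are assumed to be independent zero-mean normal variates with a common standard deviation `sigma_d`, and all names and parameter values are hypothetical. With these choices the sum of the two squared normalized displacements is chi-squared with 2 degrees of freedom, whose mean and standard deviation are both 2.

```python
import math
import random


def simulate_neg_log_concentration(n, sigma_d, y_var, seed=0):
    """Draw n values of the sum of squared normalized displacements,
    i.e. -ln(C1 * chi/Q), at a receptor located at the origin.

    sigma_d: assumed std. dev. of the plume-center displacements D_y, D_z.
    y_var:   Y^2, the variance of material in a disk element.
    """
    rng = random.Random(seed)
    scale = math.sqrt(2.0 * y_var)
    out = []
    for _ in range(n):
        y_norm = (0.0 - rng.gauss(0.0, sigma_d)) / scale
        z_norm = (0.0 - rng.gauss(0.0, sigma_d)) / scale
        out.append(y_norm * y_norm + z_norm * z_norm)
    return out


# sigma_d = 1 and Y^2 = 0.5 make both normalized displacements standard normal.
samples = simulate_neg_log_concentration(20000, sigma_d=1.0, y_var=0.5)
mean = sum(samples) / len(samples)
sd = math.sqrt(sum((s - mean) ** 2 for s in samples) / (len(samples) - 1))
```

Equality of the sample mean and standard deviation is the signature of the exponential distribution, which is the chi-squared distribution with 2 degrees of freedom up to a scale factor.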
                             Area Source
      In 1972 Gifford expanded this derivation to include area sources
by summing a number of point sources (6).

-------
      For n sources which affect concentration at the same point,
Eq.  (II-3) is summed over all n  sources yielding:

$$-\sum_{i=1}^{n} \ln(C_i \chi_i / Q_i) = \sum_{i=1}^{n} (\mathcal{Y}_i^2 + \mathcal{Z}_i^2) \tag{II-5}$$

which can be written

$$\ln \prod_{i=1}^{n} [C_i \chi_i / Q_i]^{1/n} = -L/n \tag{II-6}$$

where $L$ denotes the sum on the right-hand side of Eq. (II-5), the
term within the brackets is the weighted geometric mean, and
the right-hand side is normally distributed by the  Central Limit Theorem.
If the geometric and arithmetic means are  simply related, e.g. propor-
tional, then the natural logarithm of  concentration is normally dis-
tributed.
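
The appeal to the Central Limit Theorem can be illustrated by simulation. In the sketch below each source contributes an independent chi-squared term with 2 degrees of freedom (an idealization: real sources would differ in strength and distance), and the skewness of the mean over n sources is estimated for several n. The helper names are hypothetical. The skewness should fall toward zero, the value for a normal distribution, roughly as 2 divided by the square root of n.

```python
import random


def skewness(values):
    """Sample skewness (third standardized moment)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def mean_of_sources(n_sources, rng):
    """Mean over n sources of the chi-squared (2 df) term of each source."""
    total = 0.0
    for _ in range(n_sources):
        total += rng.gauss(0.0, 1.0) ** 2 + rng.gauss(0.0, 1.0) ** 2
    return total / n_sources


rng = random.Random(1)
skew_by_n = {n: skewness([mean_of_sources(n, rng) for _ in range(5000)])
             for n in (1, 4, 25)}
```

With only a handful of comparable sources the logarithm of concentration remains visibly skewed, so the normal approximation depends on many sources contributing.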

                An Extension of the  "Gaussian Plume"
                 Point Source Distribution Derivation
      The standard Gaussian Plume equation uses U, the mean wind-
speed, as a parameter.  However, there is considerable evidence  to the
effect that windspeeds are approximately lognormally distributed,  and
this may be expected to affect the concentration distribution.  If we re-
turn to

$$\mathcal{Y}^2 + \mathcal{Z}^2 = -\ln(C_1 \chi/Q), \tag{II-7}$$

where

$$C_1 = 2\pi Y^2 U, \tag{II-8}$$

$\mathcal{Y}$ and $\mathcal{Z}$ are defined in terms of the parameters of the Gaussian Plume
equation and are assumed normally distributed.
      This equation can be written

$$(2\pi Y^2 U)\,\chi/Q = \exp[-(\mathcal{Y}^2 + \mathcal{Z}^2)]. \tag{II-9}$$

Defining $K_1 = 2\pi Y^2$, one finds

$$-\ln(\chi/Q) = \ln K_1 + \ln U + \mathcal{Y}^2 + \mathcal{Z}^2. \tag{II-10}$$

The term $\mathcal{Y}^2 + \mathcal{Z}^2$ is exponentially distributed or chi-squared with 2 df,
$\ln K_1$ is a constant, and $\ln U$ is normally distributed.  If

                                                               (II-11)
then the lognormal distribution results.  Thus the two results are recon-
ciled.
      Equation (II-10) is intuitively reasonable for periods of non-
negligible wind because $\ln U$ is a measure of advective flux while $\mathcal{Y}^2 + \mathcal{Z}^2$
is a measure of diffusive flux.  Except in periods of extremely low winds,
the advective flux will be the larger.  During low wind periods,
apparently the lognormal approximation will be poor.  The author knows
of no such  empirical analysis or theoretical analysis at this  time.  In-
deed, atmospheric modeling during calm conditions is in its infancy.
      In short,  the exponential distribution is appropriate under the
assumption of constant wind velocity.  However,  windspeeds are ordinar-
ily lognormally distributed, and the advective flux is of greater signif-
icance than the diffusive flux.  Under these more general conditions we
see that $\chi/Q$ is approximately lognormally distributed for periods of non-
negligible wind.  No conclusion is reached for calm periods  save that the
lognormal  approximation is likely to be poor.  We realize, of course,
that the concentration must be nonzero at all times for the lognormal to
be correct.  For most pollutants the background  concentration is enough
to satisfy this condition.
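
The competition between the windspeed term and the diffusive term can be explored with a small simulation. The sketch below samples the logarithm of relative concentration as minus the sum of a constant, a normally distributed log windspeed, and a chi-squared (2 df) diffusive term. Reading "the advective flux is larger" as "the windspeed term carries most of the variance" is an interpretation made for this example, and the function names and parameter values are hypothetical.

```python
import math
import random


def skewness(values):
    """Sample skewness (third standardized moment)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log_concentration_samples(n, sigma_ln_u, sigma_yz, k1=1.0, seed=0):
    """Sample ln(chi/Q) = -ln K1 - ln U - (sum of two squared normal terms)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        ln_u = rng.gauss(0.0, sigma_ln_u)  # lognormal windspeed
        diffusive = rng.gauss(0.0, sigma_yz) ** 2 + rng.gauss(0.0, sigma_yz) ** 2
        out.append(-math.log(k1) - ln_u - diffusive)
    return out


# Windspeed term dominates: ln(chi/Q) is close to normal (skewness near 0).
variable_wind = skewness(
    log_concentration_samples(20000, sigma_ln_u=1.0, sigma_yz=0.3))
# Nearly constant wind: the chi-squared term dominates (strong negative skew).
near_constant_wind = skewness(
    log_concentration_samples(20000, sigma_ln_u=0.1, sigma_yz=1.0))
```

A skewness of the log concentration near zero is consistent with a lognormal concentration, while a value near -2 reflects the exponential (chi-squared, 2 df) form obtained under constant wind.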

                 A New Approach to the Derivation of
         Frequency Distributions of Pollutant Concentrations
     Also in 1972, Knox and Pollack (7) derived the following relation-
ship from theoretical  considerations, using a substantially different
approach.

-------
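      Consider a stochastic process of the form:

$$X_\ell - X_{\ell-1} = Y_\ell X_{\ell-1}, \tag{II-12}$$

where $Y_\ell$ is an independent stochastic variable,  arbitrarily distributed.
      If we solve Eq. (II-12) for $Y_\ell$

$$\frac{X_\ell - X_{\ell-1}}{X_{\ell-1}} = Y_\ell \tag{II-13}$$

and sum both sides

$$\sum_{\ell=1}^{N} \frac{X_\ell - X_{\ell-1}}{X_{\ell-1}} = \sum_{\ell=1}^{N} Y_\ell, \tag{II-14}$$

We can approximate the left side by

$$\int_{X_0}^{X_N} \frac{dX}{X} = \ln X_N - \ln X_0, \tag{II-15}$$

so that

$$\ln X_N = \ln X_0 + \sum_{\ell=1}^{N} Y_\ell. \tag{II-16}$$

By the Central Limit Theorem,  $\sum_{\ell=1}^{N} Y_\ell$ is approximately normally
distributed for large N,  hence
$X_N$ is approximately lognormally distributed.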
      This is known as the law of proportional effect; the percentage
change in a variable is equal to a constant plus an error.  If the absolute
change had been equal to this  same constant-plus-error term,  the
normal distribution would have resulted.  Hence the lognormal distribu-
tion  is the result of a multiplicative process, whereas the normal dis-
tribution results from an additive process.
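
The contrast between multiplicative and additive generating processes can be demonstrated by simulation. In the sketch below the shocks are drawn from a uniform distribution, chosen only to show that the shocks themselves need not be normal; the function names, step counts and shock range are hypothetical. Under the multiplicative rule the final values are strongly right-skewed while their logarithms are nearly symmetric; under the additive rule the final values themselves are nearly symmetric.

```python
import math
import random


def skewness(values):
    """Sample skewness (third standardized moment)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def simulate_final_values(n_paths, n_steps, multiplicative, seed=0):
    """Return the final value of n_paths independent paths started at 1."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_paths):
        x = 1.0
        for _ in range(n_steps):
            y = rng.uniform(-0.1, 0.1)  # arbitrary, non-normal shock
            # Proportional change (multiplicative) versus absolute change.
            x = x * (1.0 + y) if multiplicative else x + y
        finals.append(x)
    return finals


mult = simulate_final_values(4000, 200, multiplicative=True)
add = simulate_final_values(4000, 200, multiplicative=False)

skew_mult = skewness(mult)                               # clearly positive
skew_log_mult = skewness([math.log(x) for x in mult])    # near zero
skew_add = skewness(add)                                 # near zero
```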

-------
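      Consider the differential equation describing the time evolu-
tion of pollutants in the  atmosphere:

$$\frac{\partial \psi_a}{\partial t} + u\frac{\partial \psi_a}{\partial x} + v\frac{\partial \psi_a}{\partial y} + w\frac{\partial \psi_a}{\partial z} = \frac{\partial}{\partial x}\left(K_x \frac{\partial \psi_a}{\partial x}\right) + \frac{\partial}{\partial y}\left(K_y \frac{\partial \psi_a}{\partial y}\right) + \frac{\partial}{\partial z}\left(K_z \frac{\partial \psi_a}{\partial z}\right) + \frac{S_a + P}{V}, \tag{II-17}$$

where $\psi_a$ is the concentration of pollutant a; u, v, and w are the velocity
components; $K_x$, $K_y$ are the lateral eddy diffusivities, which are
lognormally distributed based  upon the lognormality of $\epsilon$ and  the repro-
ductive properties; $K_z$ is the vertical eddy diffusivity; $S_a$ is the source
term for pollutant a; P is the term representing changes in concentra-
tion due to photochemistry; and V is the volume of air for  which S and
P act.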
      This equation can be manipulated to represent a box model
formulation (8) where  we are concerned with the concentration averaged
over a box which is  surrounded by M other boxes.

$$\begin{aligned}
\frac{d\psi_k(m,t)}{dt} = {} & -\sum_{j=0}^{M} [T_A(m,j) + T_D(m,j)]\,\psi_k(m,t) \\
 & + \sum_{j=0}^{M} [T_A(j,m) + T_D(j,m)]\,\psi_k(j,t) + S_k(m,t) \\
 & + P_k[\psi_a(m,t), \ldots, \psi_n(m,t), t],
\end{aligned} \tag{II-18}$$

where $T_A(m,j)$ and $T_D(m,j)$ are the advective and eddy diffusive transfer
coefficients from box  m to box j.  The lognormal distribution can be
argued for these latter variables in a similar manner as for $K_x$ and $K_y$.
      This equation is also  consistent with the generating process,
Eq.  (II-12), when certain reasonable restrictions hold.
      1.   The contributions of the advection and diffusion terms are larger
          than the contribution of the source term.  It has been found
          empirically that if this is not the case, lognormality does not
          result (9).

-------
      2.   The concentrations in the surrounding boxes are on the aver-
           age over long periods of time close to that of the box we are
           interested in because they are subjected to similar stimuli.
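      These restrictions transform Eq. (II-18) to:

$$\frac{d\psi_k(m,t)}{dt} = -\sum_{j=0}^{M} [T_A(m,j) + T_D(m,j)]\,\psi_k(m,t) + \sum_{j=0}^{M} [T_A(j,m) + T_D(j,m)]\,\psi_k(j,t). \tag{II-19}$$

If we let

$$\psi_k(j,t) = \psi_k(m,t) + E_k(j,t), \tag{II-20}$$

the equation becomes

$$\begin{aligned}
\frac{d\psi_k(m,t)}{dt} = {} & -\sum_{j=0}^{M} [T_A(m,j) + T_D(m,j)]\,\psi_k(m,t) \\
 & + \sum_{j=0}^{M} [T_A(j,m) + T_D(j,m)]\,\psi_k(m,t) \\
 & + \sum_{j=0}^{M} [T_A(j,m) + T_D(j,m)]\,E_k(j,t).
\end{aligned} \tag{II-21}$$

When we sum both sides to show lognormality,  we have for the third
term,

$$\sum_{j=0}^{M} [T_A(j,m) + T_D(j,m)]\,E_k(j,t). \tag{II-22}$$
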
      From meteorological reasoning we note that if the flux term is
large, indicating strong winds,  the difference between $\psi(j,t)$ and $\psi(m,t)$
will be small.  Hence the term tends to zero.  Conversely,  in the case

-------
where the error term $E_k(j,t)$ is large the flux term will usually be small,
indicating light winds.  Furthermore, in either case or any combination
of cases occurring between T,. and TR,  we can expect that the sign of
the term will vary over a diurnal,  weekly or seasonal cycle, implying that
the positive and negative terms will cancel each other.
      This argument implies that Eq. (II-21) is essentially equivalent to
Eq.  (II-12), which is consistent with the law of proportional  effect.
      The solution will be  source-dominated only when the magnitude of
the source terms  is comparable to the magnitude of the current con-
centration.  There is reason to believe  (9) that in such cases the con-
centrations will not be lognormally distributed, as the model indicates.
This result has also been noted in  investigations of particle size dis-
tributions (10).
      This reasoning is most easily justified for a well mixed urban
region.  It is not clear that the lognormal distribution will fit as well for
non-urban,  poorly mixed areas.  We do feel, however, that the  charac-
teristics of an area's topography and typical meteorology would have to
be highly unusual for the term (II-22) to be so large that the lognormal distribution
would fit poorly.
      We have not yet discussed the Δt interval necessary for these re-
sults.  We recognize that it must be sufficiently small not to obscure
the generating process.  If, for an extreme example, Δt were six months
we would not see the effect of Eq. (II-12) because the effect of ψ_{i-1} on
ψ_i would have long since died out.  Larsen's (1) data are for 5-minute
instantaneous readings.  We accept this as an appropriate time scale for
our purposes, based on the fact that meteorology certainly does not
change enough in a 5-minute period to obscure the relevant correlations.

             Lognormality Over Various Averaging Times
      When the data are averaged over other time periods within the
realm of atmospheric motion, the averaging time acts as a filter which
smooths out motions of a smaller time scale.  This has the effect of
allowing us to see only motion of a time scale comparable to the aver-
aging time in the averaged data.  Hence the process described by
Eq. (II-21) still holds for larger averaging times, but the T_A, T_D terms
now represent motion of a larger scale.  This results in lognormality
over a large spectrum of averaging times.
      An essentially equivalent relation to Eq. (II-12) is

$$x_i = x_{i-1}\,\varepsilon_i .$$

This equation can be transformed to represent a first-order autore-
gressive stochastic process by taking the logs of both sides to yield:

$$\ln x_i = \ln x_{i-1} + \ln \varepsilon_i .$$

This stochastic process is identified by examining the autocorrelation
function to verify that it decays exponentially,  and by calculating the
partial  correlation function to verify that it cuts off after lag 1.  The
partial correlation coefficient can be thought of as a measure of the
independent predictive capabilities of x_{i-k}, without regard to informa-
tion which is "passed through" x_{i-k+1}.
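
The identification procedure just described can be tried on synthetic data. The following is only a sketch under stated assumptions: the logs are generated by a stationary first-order autoregression with a coefficient of 0.874 (the lag-1 value quoted for the CO data), the noise scale 0.26 is arbitrary, and the helper names `autocorr`, `partial_autocorr` and `simulate_log_ar1` are hypothetical. Partial correlations are obtained with the Durbin-Levinson recursion, which is one standard way to compute them and is not necessarily the method used for the report.

```python
import random


def autocorr(y, lag):
    """Sample autocorrelation of the sequence y at the given lag."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y)
    cov = sum((y[i] - mean) * (y[i - lag] - mean) for i in range(lag, n))
    return cov / var


def partial_autocorr(y, max_lag):
    """Partial autocorrelations for lags 1..max_lag (Durbin-Levinson)."""
    r = [autocorr(y, k) for k in range(max_lag + 1)]  # r[0] is 1
    pacf = []
    phi_prev = []  # coefficients of the fitted AR(k-1) model
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                   for j in range(k - 1)]
        phi_new.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi_new
    return pacf


def simulate_log_ar1(n, phi, sigma, seed=0):
    """Synthetic log-concentrations: y[i] = phi * y[i-1] + Gaussian noise."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0.0, sigma))
    return y


# Assumed parameters, chosen only for illustration.
log_x = simulate_log_ar1(20000, phi=0.874, sigma=0.26)
acf = [autocorr(log_x, k) for k in range(1, 4)]
pacf = partial_autocorr(log_x, 3)
print("autocorrelation, lags 1-3:     ", [round(v, 3) for v in acf])
print("partial correlation, lags 1-3: ", [round(v, 3) for v in pacf])
```

For a series of this kind the autocorrelations should fall off roughly geometrically with lag, while the partial correlations beyond lag 1 should be close to zero.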
      To verify the hypothesis presented above, these statistics were
calculated for natural logs of CO concentrations in San Francisco for
1970 as well as the untransformed observations.  The autocorrelation
function for lags 1 to 12 appears in Fig. 1(a) and Table 1,
and is plotted on a log scale in Fig. 1(b).  The agreement with the
exponential curve appears good.  The partial correlation coefficient
is equal to the autocorrelation coefficient for lag 1, of course, but
thereafter it is negligible statistically.  In particular, the values are
0.874, -0.046, and 0.015 respectively for the first three lags.  The
additive model does not appear to fit the first-order autoregressive
model as well.
      Figure 1(c) indicates that the multiplicative model appears to
be more appropriate for averaging times of 1 hour to 180 hours.  A
statistical test on the differences between the autocorrelations from
the multiplicative as opposed to the additive model was performed.  The
results indicated that the autocorrelations are indeed significantly
different.  Table 2 presents several such calculations for representative
lags and averaging times.
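
The text does not name the statistical test that was applied. One common choice for comparing two correlation coefficients is Fisher's z-transformation, sketched below as an assumption rather than as the procedure actually used. The function name `fisher_z_statistic` is hypothetical. The formula treats the two coefficients as coming from independent samples, which is only an approximation here because both are computed from the same record. The inputs are the lag-1 autocorrelations and sample size listed for hourly averages in Table 1; the result is not claimed to reproduce the significance levels in Table 2.

```python
import math


def fisher_z_statistic(r_a, r_b, n_a, n_b):
    """Approximate normal test statistic for the difference between two
    correlation coefficients, using Fisher's z = atanh(r)."""
    z_a = math.atanh(r_a)
    z_b = math.atanh(r_b)
    # Standard error of the difference, assuming independent samples.
    std_err = math.sqrt(1.0 / (n_a - 3) + 1.0 / (n_b - 3))
    return (z_b - z_a) / std_err


# Untransformed (0.834) versus log-transformed (0.874) hourly data.
stat = fisher_z_statistic(0.834, 0.874, 20073, 20073)
print(f"difference in units of its standard error: {stat:.1f}")
```

A value of several standard errors indicates that a difference of this size would be very unlikely to arise from sampling variation alone.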

Fig. 1(a). Autocorrelation versus lag for CO concentrations in San
           Francisco, 1970 hourly averages.  [Curves: log transformed,
           no transform; horizontal axis: lag time in hours, 0 to 12.]

Table 1.  Autocorrelation and averaging time for CO concentrations in
          San Francisco, 1970 hourly averages.

                         Untransformed data

Autocorrelation              Standard             No. data    Averaging
    (lag 1)      Variance    deviation    Mean     points    time (days)

     0.834         3.943       1.986      3.488     20073       0.042
     0.648         3.153       1.776      3.488      5018       0.167
     0.576         2.285       1.512      3.488      1672       0.500
     0.623         1.267       1.125      3.486       238       3.500
     0.642         1.000       1.000      3.486       119       7.000
     0.604         0.826       0.909      3.491        59      14.000
     0.626         0.677       0.823      3.500        29      28.000
     0.686         0.544       0.738      3.476        14      56.000

                         Log-transformed data

Autocorrelation              Standard             No. data    Averaging
    (lag 1)      Variance    deviation    Mean     points    time (days)

     0.874         0.288       0.536      1.109     20073       0.042
     0.705         0.243       0.493      1.109      5018       0.167
     0.641         0.184       0.429      1.109      1672       0.500
     0.675         0.116       0.340      1.109       238       3.500
     0.706         0.095       0.308      1.109       119       7.000
     0.664         0.081       0.285      1.110        59      14.000
     0.664         0.068       0.261      1.112        29      28.000
     0.726         0.059       0.243      1.104        14      56.000

Total number of hourly measurements = 20448
Number of missing measurements      =   375

Table 2.  Differences between autocorrelations of log-transformed and
          untransformed CO concentrations in San Francisco, 1970
          hourly averages.

   Autocorrelation      Averaging     σ level of
     (a)       (b)      time (hr)    significance    Lag

    0.834     0.874          1           21.1          1
    0.576     0.641         12            4.23         1
    0.626     0.678         84            1.4          1
    0.648     0.712        168            1.3          1
    0.681     0.753          1           20.9          2
    0.566     0.650          1           18.9          3
    0.487     0.405         84            1.60         3

   (a) No transformation of data.
   (b) Natural log transformation of data.

                       Summary of Derivations
      The latter derivation will serve as the basis for the reasoning
presented later concerning the nature of the frequency distributions re-
sulting from various types of sources and pollutants.
      The other derivation supports the latter one, as is to be expected
since the Gaussian Plume equation is a solution to the Fickian Diffusion
equation.  It is included for the sake of completeness and to dem-
onstrate the consistency of the new derivation with existing theories.

      The new derivation is based on reasoning which is clearer and
 more flexible than the Gaussian Plume derivation.  These features will
 be used to great advantage in the explanation of various empirical re-
 sults which could not be easily justified using the earlier derivation.
      Other than these,  no theoretical explanations for the lognormal
 distribution or any other have been suggested.
                            Conclusion
      It is shown that considerable theoretical and empirical support
exists for the lognormal distribution as the most appropriate for the
characterization of pollutant concentrations for a wide range of averaging
times.  In later chapters other distributions are discussed, but these
alternate  results are demonstrated to be consistent with the material
presented in this chapter.

-------
           CHAPTER III —FREQUENCY DISTRIBUTIONS OF
                        RELATED VARIABLES
      Pollutants can be viewed as tracers of atmospheric movements.
Since we know that pollutant concentration frequency distributions are
fit well by the lognormal,  we suspect that some descriptors of the
atmosphere are also lognormally distributed.   Indeed, this is the case.
      Fundamentally, atmospheric processes are structured differently
than ordinary engineering-type processes.  In particular, the change in
a variable describing an atmospheric process is most often propor-
tional to the level of that variable.  This is written:

$$X_i = X_{i-1}\,\varepsilon_i . \tag{III-1}$$

This is a multiplicative process.  Many aspects of atmospheric
motion can be described by a process of this form.
      We are primarily  interested in variables describing the transport
of pollutant through the atmosphere.   Such transport is described by the
advective and eddy diffusive transfer rates.   We are interested also in
the removal of pollutants from the atmosphere,  but few results are
available except for particulates.  Particle  sizes, which partially govern
the deposition of particulates, will be treated  below.
      Through this investigation of other meteorological variables we
shall lend further credence to the conclusions presented in the pre-
ceding chapter concerning the identity of the concentration distributions.
In addition, the study of these variables yields greater insight into the
nature of atmospheric motion and pollutant transport which leads to new
approaches to identifying pollutant concentration frequency distributions.

                              Advection
      The lack of knowledge concerning mesoscale atmospheric motions
has hindered the formulation of a mathematical model describing such
motion.  For this reason it is difficult to demonstrate from theoretical
considerations that windspeeds, measured continuously at a point over an
interval of time, are lognormally distributed.  However, extensive
empirical analysis has been performed, indicating that the
lognormal is a reasonable assumption.
      It has been shown by Gifford and Hanna (11), (12) that pollutant con-
centrations are inversely proportional to windspeeds, which is consistent
with the fact that both are lognormally distributed (see Fig. 2).  The correla-
tion coefficient for nonreactive primary pollutants like CO is extremely
high (~0.90), slightly less for the more reactive pollutants like NOx
(~0.85), and least for the secondary pollutants like oxidant (0.66).  These
investigators evaluated the constants of proportionality for many cities
for several pollutants.  The results are summarized in Refs. (11) and
(12).
      In comparative studies, it has been found that this simple model

$$\psi = KQ/U \tag{III-2}$$

where
            Q = total emissions,
            U = windspeed,
            ψ = pollutant concentration and
            K = empirical constant
performs as well as many more complicated models for primary pollu-
tants.  This is a testimonial to the fact stated above, i.e., pollutants are
tracers of atmospheric motion.  Although this may seem obvious, one
should realize that processes other than advection also influence con-
centration, including the time history of the source, the deposition, and
the distance the pollutant has been transported.
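
The link between Eq. (III-2) and lognormality can be checked numerically. In the sketch below the windspeed is drawn from a lognormal distribution and the concentration is computed as KQ/U. Taking logs gives ln(KQ) minus ln U, so the concentration has the same spread on a log scale as the windspeed. All numerical values (K, Q, the windspeed parameters and the sample size) are hypothetical, as are the helper names.

```python
import math
import random
import statistics


def geometric_mean(values):
    """Exponential of the mean of the logs."""
    return math.exp(statistics.mean(math.log(v) for v in values))


def geometric_sd(values):
    """Standard geometric deviation: exponential of the std. dev. of the logs."""
    return math.exp(statistics.stdev(math.log(v) for v in values))


rng = random.Random(1)
K, Q = 1.0, 50.0  # hypothetical empirical constant and total emissions

# Assumed lognormal windspeeds: ln U has mean 1.0 and std. dev. 0.5.
wind = [rng.lognormvariate(1.0, 0.5) for _ in range(5000)]
conc = [K * Q / u for u in wind]

sgd_wind = geometric_sd(wind)
sgd_conc = geometric_sd(conc)
gm_wind = geometric_mean(wind)
gm_conc = geometric_mean(conc)

print(f"SGD of windspeed:     {sgd_wind:.4f}")
print(f"SGD of concentration: {sgd_conc:.4f}")
print(f"geometric mean conc x geometric mean wind = {gm_conc * gm_wind:.4f}")
```

The two standard geometric deviations agree to rounding error, and the product of the geometric means recovers KQ, which is what the inverse relation implies.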

                              Diffusion
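      The windspeed is a measure of advection, while diffusion is de-
scribed by eddy diffusivities.  In the Fickian diffusion equation,

$$\frac{\partial \psi}{\partial t} + u \frac{\partial \psi}{\partial x} + v \frac{\partial \psi}{\partial y} + w \frac{\partial \psi}{\partial z} = \frac{\partial}{\partial x}\left(K_x \frac{\partial \psi}{\partial x}\right) + \frac{\partial}{\partial y}\left(K_y \frac{\partial \psi}{\partial y}\right) + \frac{\partial}{\partial z}\left(K_z \frac{\partial \psi}{\partial z}\right) \tag{III-3}$$

K_x, K_y, K_z are constants which are the eddy diffusivities describing
diffusive flux.  We may suspect that these are also lognormally dis-
tributed, and the process of demonstrating this is now presented.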

Fig. 2.  Concentration and windspeed frequency distributions for CO and
         windspeed for San Francisco, 1970 hourly averages.  [Curves:
         wind, observed concentration; horizontal axis: % of
         concentration exceeded.]

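      The local viscous dissipation in turbulent flow, ε, is given by

$$\varepsilon = \frac{\nu}{2} \sum_{i,j} \left( \frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i} \right)^2 \tag{III-4}$$

where ν is viscosity, u is velocity, and i and j refer to direction.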
      In his original similarity hypothesis, Kolmogoroff formed length
and time scales of turbulent motion without taking the variability of ε into
consideration.  In 1962, Kolmogoroff refined this hypothesis to take into
account random fluctuations in ε (13).  In turbulent flow, energy is
transferred from one stage of motion to the next; this transfer is what ε
represents.  The amount of energy transferred at each stage has been
argued to be a function of the relative magnitude of the state (14).  This
describes a multiplicative process.  If the transfer stages are similar
and are independent, then by the law of proportional effect, ln(ε) is
distributed normally.
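
The law of proportional effect invoked here can be illustrated with a toy cascade. The sketch below multiplies together independent random factors, one per transfer stage, and compares the skewness of the product with that of its logarithm. The choice of 50 stages and of factors drawn uniformly between 0.5 and 1.5 is an assumption made purely for illustration and is not a model of turbulence. The log of the product is a sum of independent terms, so by the central limit theorem it should be close to normal.

```python
import math
import random
import statistics


def cascade(n_stages, rng):
    """Product of n_stages independent positive random factors."""
    value = 1.0
    for _ in range(n_stages):
        value *= rng.uniform(0.5, 1.5)  # assumed factor distribution
    return value


def skewness(data):
    """Sample skewness (third standardized moment)."""
    m = statistics.mean(data)
    s = statistics.pstdev(data)
    return sum(((d - m) / s) ** 3 for d in data) / len(data)


rng = random.Random(42)
samples = [cascade(50, rng) for _ in range(4000)]
logs = [math.log(s) for s in samples]

skew_raw = skewness(samples)
skew_log = skewness(logs)
print(f"skewness of the product:     {skew_raw:.2f}")
print(f"skewness of the log product: {skew_log:.2f}")
```

The raw products are strongly skewed to the right, while their logs are nearly symmetric, as expected for a variable that is approximately lognormal.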
      Now, recalling the reproductive properties of the distribution we
can argue the lognormality of several other variables for which the
relations to ε are well known.  The dissipation of temperature variance
by thermal conduction is given by

$$\chi = 2\alpha\,(\operatorname{grad} T)^2 \tag{III-5}$$

where α is thermal diffusivity.  If both χ and ε are lognormal [χ is
argued lognormal by Gurvich (15)] then ΔT and ΔU, the temperature and
velocity differences between two points in space separated by a distance r,
are lognormal on either side of the origin, or more concisely, (ΔT)^2 and
(ΔU)^2 are lognormal due to the relationships:
                                                                 (III-6)

from Kolmogoroff and the reproductive properties of the distribution.
      Measurements (15) for r = 2 cm at 4 m above ground yielded the
distribution function plotted on lognormal probability paper in Fig. 3.
Clearly, the lognormal is a good approximation; a straight line on this
paper indicates perfect lognormality.

Fig. 3.  Probability distribution of the squared temperature difference
         compared with lognormality, P(ε < ε_0), where
         ε = (ΔT)^2/[(ΔT)^2] (15).  Separation = 2 cm, 104 samples
         per plot.

      An approximation to the horizontal eddy diffusion coefficient based
upon similarity theory is (16)

$$K_h = \varepsilon^{1/3}\,\sigma^{4/3} \tag{III-7}$$

where σ is the root-mean-square dispersion of the particles in a pollut-
ant puff.
      Since a lognormal variable raised to any exponent is also log-
normal and since σ^{4/3} is invariant, it can be inferred that K_h is log-
normally distributed.  Note that σ has been approximated at 0.7 ΔS
where ΔS is an intergrid-square distance in a compartmentalized model.
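
The inference can be written out explicitly. As a sketch, assume that ln ε is normally distributed with mean μ and variance s² (symbols introduced here only for illustration) and that σ is held fixed. Taking logs of Eq. (III-7) then gives:

```latex
\begin{aligned}
\ln K_h &= \tfrac{1}{3}\ln\varepsilon + \tfrac{4}{3}\ln\sigma, \\
\ln\varepsilon \sim N(\mu,\, s^2)
  \;&\Rightarrow\;
  \ln K_h \sim N\!\left(\tfrac{1}{3}\mu + \tfrac{4}{3}\ln\sigma,\; \tfrac{1}{9}s^2\right), \\
\mathrm{SGD}(K_h) &= e^{s/3} = \left[\mathrm{SGD}(\varepsilon)\right]^{1/3}.
\end{aligned}
```

Under these assumptions K_h is lognormal, and its standard geometric deviation is the cube root of that of ε.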
      Further, another approximation which is used in atmospheric
modeling is

                                                                 (III-8)

which was developed without reference to statistical considerations.
This relationship indicates the lognormality of windspeed based upon
the lognormality of ε, or vice versa (17).  Also, the vertical eddy
diffusivity has been approximated as

$$K_z = 400\,u_1 \tag{III-9}$$

where u_1 is the horizontal wind velocity at 1 meter, determined by use
of a power law vertical profile from the mean layer wind.  This
illustrates the likelihood that K_z, a quantity which has not yet been
accurately measured in the atmosphere, is lognormally distributed.
      It is worthwhile to notice that the results obtained from models
using these approximations have been encouraging (8).

                            Particle Sizes
      It has been clearly established (18) that  particle size distributions
in atmospheric aerosols are approximately lognormal.  There is  some
disagreement between scientists as to the exact size range  covered, or
whether there are two or three overlapping lognormal distributions  (19).
However,  the lognormal approximation is widely  accepted.

      Particles are the result of a multistage grinding process where
the size of a particle at any stage is a function of its size at the im-
mediately previous stage.  This can be represented by the multiplicative
process which results in the lognormal distribution examined in
Chapter II.
      The deposition rate of the particles is a function of particle size,
primarily, and  is therefore approximately lognormally distributed by
virtue of the reproductive properties of the distribution.  Therefore,  the
negative portion of the source term in Eq. (II-3) due to deposition is
approximately lognormally distributed.

                              Conclusion
      Pollutant concentrations are a function of emissions, chemical
change, deposition and transport.  Several of the variables describing
transport and deposition are discussed here and are found to be con-
sistent with the  lognormal assumption.
      Hence it is clear that pollutant concentrations are followers of the
overall character of motion in the atmosphere.  This result has not yet
received adequate attention.  Full realization of its significance in-
dicates that air quality data, which are essentially simple to collect and
useful for a pragmatic purpose, may have applications in the under-
standing of atmospheric processes.

-------
     CHAPTER IV —FREQUENCY DISTRIBUTIONS FOR VARIOUS
                 POLLUTANTS AND SOURCE TYPES
      In this chapter we shall examine several aspects in which pollu-
tants differ, in light of the model presented in Chapter II.  It will be
shown that certain seemingly conflicting results can be explained in this
context.

                   Reactive versus Inert Pollutants
      When we examine frequency distributions of air pollutant con-
centrations, we notice that the parameters vary from day to day and
from pollutant to pollutant.  This variation is a result of the nature of
the meteorology of which the pollutant is a tracer,  the sources of the
pollutant, and its reactivity.
      Figures 4(a)  and 4(b) show daily profiles of several pollutants.  The
profiles with the steepest slopes and highest peaks yield  the highest
standard geometric deviation (SGD).  We can see that pollutants with
similar reactivity trace the meteorological conditions in the same way,
provided that they are from the same type of source.  Carbon monoxide
and hydrocarbon are an example of this.  Shuck et al. (20) report a
correlation coefficient of 0.99.
      These principles are simple to see: anything which causes large
fluctuations in the daily profile will cause a large SGD in the frequency
distribution of that pollutant.  The significant causes are unstable
meteorological conditions and volatile pollutants.  This volatility is
caused by either the basic chemical structure of the pollutants, as is the
case with oxidants and oxides of nitrogen, or by the temperature of the
pollutant, as is the case with thermal plumes of SO2.
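
The connection between the daily profile and the SGD can be made concrete with a small calculation. The sketch below builds two synthetic 24-hour profiles that differ only in the height of a morning peak and computes the SGD of each as the exponential of the standard deviation of the log concentrations. The profile shape, the base level and the peak heights are hypothetical values chosen for illustration, and the function names are not from the report.

```python
import math
import statistics


def standard_geometric_deviation(concentrations):
    """SGD: exponential of the standard deviation of the log values."""
    logs = [math.log(c) for c in concentrations]
    return math.exp(statistics.stdev(logs))


def daily_profile(base, peak, peak_hour=8, width=2.0):
    """Synthetic hourly profile: a constant base plus one bell-shaped peak."""
    return [base + peak * math.exp(-((hour - peak_hour) / width) ** 2)
            for hour in range(24)]


flat = daily_profile(base=2.0, peak=1.0)     # gentle fluctuation
peaked = daily_profile(base=2.0, peak=10.0)  # steep slopes, high peak

sgd_flat = standard_geometric_deviation(flat)
sgd_peaked = standard_geometric_deviation(peaked)
print(f"SGD, gentle profile: {sgd_flat:.3f}")
print(f"SGD, peaked profile: {sgd_peaked:.3f}")
```

The profile with the higher peak and steeper slopes yields the larger SGD, in line with the statement above.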
      If we return to the Fickian Diffusion equation:

$$\frac{\partial \psi_a}{\partial t} + u \frac{\partial \psi_a}{\partial x} + v \frac{\partial \psi_a}{\partial y} + w \frac{\partial \psi_a}{\partial z} = \frac{\partial}{\partial x}\left(K_x \frac{\partial \psi_a}{\partial x}\right) + \frac{\partial}{\partial y}\left(K_y \frac{\partial \psi_a}{\partial y}\right) + \frac{\partial}{\partial z}\left(K_z \frac{\partial \psi_a}{\partial z}\right) + S(x,y,z,t) + P(\psi_a, \psi_b, \ldots, \psi_n)$$

we see that the chemical change term is larger and more variable for
reactive pollutants.  This affects the argument presented in Chapter II
through its direct effect on the daily profile predicted from the equation.
The increased volatility causes larger fluctuations in concentration
which are seen in the box model diffusion [Eq. (II-18)] both through the
magnitudes of ψ_k(m, t) and ψ_k(j, t), and in the magnitude of the difference
E_k(j, t).  The absolute value of this term is larger, but it still changes
sign over the diurnal cycle which results in lognormality (Fig. 5).

Fig. 4(a). Oxidant concentration versus time in Los Angeles, hourly
           averages.  [Horizontal axis: hours, 2 to 24.]

      This explains the results of Benarie (3) and Knox and Lange (2)
who  noticed changes  in SGD between pollutants  for the same meteorology.
The  model described in Chapter II provides a theoretical framework
through which these results can be understood.
      Although Larsen's analysis indicated that oxidant concentrations
are lognormally distributed for all cities for all averaging times, as in
Fig. 6(a), oxidant concentrations often depart from lognormality in a
consistent way, as is depicted in Fig. 6(b).  At the present time, no
definitive explanation has been set forth for this anomaly.  A
brief investigation has suggested two possible explanations, which are
not mutually exclusive.
      The first is based upon the theory that the photochemical reactions
producing oxidant in the atmosphere are  self-limiting.  This is given
some support by the fact  that oxidant concentrations have never been
recorded at 1 ppm or higher in the atmosphere, even when such con-
centrations might be  expected because of source and meteorological
conditions.
      The second explanation is that the averaging time of 5 minutes is
so long in comparison to the reaction rates that it averages out short-
duration high concentrations which might otherwise cause the cumulative
distribution curve, as in Fig. 6(b), to straighten out.  This is supported
by the fact that the "angle" between the two adjacent straight lines in
Fig. 6(b) becomes sharper as averaging time increases, and there is no
reason why the reverse should not hold for averaging times shorter than
5 minutes.
      The shape of the curve  in Fig. 6(b) can be compared to  the shape
of the curves for the  gamma, Weibull, and Pearson-IV distributions
plotted on log probability paper.   It appears that the Weibull distribution

-------
                                                                                                               31
T—I—I—I
          CO   CM   •—   O
c   c    c
O   0    O
                           c
                           Q>
          5   5    £   P
          oo   oo   t/»   co

          O   O    +    •
                                                                                     a.
                                                                                     oo
                                                               J	1-
oo
o

to
cs

o
                                                                                     o
                                                                                     >o
                                                                                     o
                                                                                     in
                                                                                     o
                                                                                     "^•
                                                                                     o
                                                                                     n

                                                                                     o
                                                                                     cs
                                                                                     in
     ~o
     0>
     0>
     U

     £


     _0
     4-

     E
     •*-

     0)
     U

     o
     U
                                                                                             
                                                                                                      O   en
                                                                                                      C/2   •-•
                                                                                                       as
                                                                                                       W)
                                   LU

-------
                                                                 32
                      i i  i  i—i—i	1	r
     "- O 00
•— CM CO CO CM

 C C  C  C  C

.2 .2 .2 .2 .2
'•C *ZT "zr *^r tn

 o o  o  o  o
•»- *- •*- 4- *-
liO LO CO CO l/>
>  0
         0 +
   //-/7'x   f

?'//'*'
                                    Os


                                     •

                                    Os
                                                 Os

                                                 8:
                                                 a.
                                                 Os


                                                 00
                                                 Os
                                   JO
                                   Os
                                                 o
                                                 00
                                   8
                                   o
                                   10
                                                 o
                                                 CO


                                                 8
                                                 m
                                                 CM
                       p 1  1  1   1

                                                 O

                                               od
TJ
4>

"8
O
U
X
V

c
.o


1

IE
0)
o

o
o
968-1969
ee
                                              0*
                                              o
                                              <&
                                                            o

                                                            00

                                                            0
                                              |

                                              •+J

                                              O

                                              O)

                                              S-,
        CO

        §
        •rH
        13
concenl
                "/ ON 6Tl)
                                                            txo
                                                            -r-l
                                                            fo

-------
                                                                     33
  ....  1    ,
      i—  O  00
r—  CN  CO  CO  CM
 c  c  c  c  c
 o  o  o  o  ^o
•j;  '£.  '^  '£•  '•£
 °  °  5  S.  £
I/I  l/>  l/">  CO  CO
 t>  a  •  o  +
                                   .
                               ,.*/*/*'•
1 1  1  1  1  1   t
                         _Li_L
                                                     o
                                                     Os

-------
                                                                                      34

-------
                                                                                                                         35
      
-------
                                                                                                                      36

-------
                                                                    37
would provide a better fit, although there is no apparent theoretical
reason for this to be the case.  The beta and gamma distributions do
not appear to fit the tail of the oxidant concentration distribution at all
well.  The nature of these distributions is discussed more extensively
in Chapter V.

                      Point  versus Area Sources
      The question of the difference in concentration distributions
between  point and area sources is of great interest from  a practical
standpoint.   One must be able to evaluate the contribution of a large
point source toward air pollution to determine,  for example,  the
feasibility of a particular location for a polluting industry.  It is, how-
ever, a difficult question for which the literature contains conflicting
answers.
      Gifford (5) has proved theoretically, and has presented a limited
amount of supporting data, that the logarithms of suitably normalized
concentrations of a pollutant from a point source are proportional to a
chi-squared variate; the derivation is based on the Gaussian plume
equation.  Benarie (3) has proved that such concentrations are
lognormally distributed on the basis that they are merely tracers of a
lognormally distributed windfield.  Knox and Lange (2) analyzed a 5-year
release of Argon-41 from the Chalk River reactor in Ontario, Canada
(Argon-41 has a zero background concentration), and determined that the
lognormal distribution fit poorly, although they did not propose an
alternative distribution.  An analysis of their data indicates that the
chi-squared distribution did not fit well either.
      In  Chapter II a modified version of the derivation of the frequency
distribution from  a point source is presented.  The  result of this
derivation,  which considers the variability in the windfield,  is a dis-
tribution composed of a sum of chi-squared and lognormal components
determined by the magnitude and direction of the windfield, the stability
conditions,  and the distance from the source.
      In  the cases presented in the literature, these factors are not con-
trolled.  There is not yet enough data to determine the distribution
identity as a function of the relevant variables.  The point to be made
here is that the presented equations do indeed predict such differences as

-------
                                                                     38
are reported, and that future research may yield a quantitative treatment
of these differences.
      A final point is that the diffusion-equation-based model is con-
sistent with this result.  The form of the predicted frequency distribution
is dependent on the relative source strength, and is sensitive to stability
conditions and  windspeeds through the advective and eddy diffusive trans-
fer fluxes.  However, the  exact form is more difficult to predict from
this model, which is more appropriate for area sources.

                              Summary
      This chapter discusses the differences in pollutant concentrations
resulting  from point and area sources.  It also discusses fundamental
differences  in the distributions measured for reactive and inert
pollutants.
      These differences are explained within the framework  of the
models discussed in Chapter II.  The fact that the observed distributions
appear to be consistent with model predictions lends further support to
the validity  of the concepts presented here.

-------
          CHAPTER V —THE FREQUENCY DISTRIBUTIONS

                        Lognormal Distribution
      The natural logarithm of a lognormally distributed
random variable is normally distributed.  This relationship implies
that the lognormal is the multiplicative analog of the normal
distribution.  In particular, where the process

            $x_i = x_{i-1} + \epsilon$                                 (V-1)

generates a normally distributed random variable, the process

            $x_i = x_{i-1}\,\epsilon$                                  (V-2)

generates a lognormally distributed random variable; $\epsilon$ is an
arbitrarily distributed random shock.
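
The contrast between Eqs. (V-1) and (V-2) can be illustrated numerically.  The following is a minimal Python sketch, not part of the original analysis: the uniform shock on (0.5, 1.5), the number of steps and paths, and the helper names `simulate_processes` and `skewness` are all assumptions chosen for illustration.  A skewness near zero is used here only as a rough indicator of normality.

```python
import math
import random
import statistics


def simulate_processes(n_steps=50, n_paths=5000, seed=0):
    """Run the additive (V-1) and multiplicative (V-2) processes side by side."""
    rng = random.Random(seed)
    additive, multiplicative = [], []
    for _ in range(n_paths):
        x_add, x_mul = 0.0, 1.0
        for _ in range(n_steps):
            # Assumed shock: uniform on (0.5, 1.5), i.e. positive and non-normal.
            shock = rng.uniform(0.5, 1.5)
            x_add += shock   # x_i = x_{i-1} + shock
            x_mul *= shock   # x_i = x_{i-1} * shock
        additive.append(x_add)
        multiplicative.append(x_mul)
    return additive, multiplicative


def skewness(values):
    """Sample coefficient of skewness (third standardized moment)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])


additive, multiplicative = simulate_processes()
log_multiplicative = [math.log(v) for v in multiplicative]

print("skewness, additive process:       ", round(skewness(additive), 3))
print("skewness, multiplicative process: ", round(skewness(multiplicative), 3))
print("skewness, log of multiplicative:  ", round(skewness(log_multiplicative), 3))
```

Under these assumptions the additive result and the logarithm of the multiplicative result should both show skewness near zero, while the multiplicative result itself is strongly right-skewed.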
      Indeed, many physical processes are best described by Eq. (V-2),
and hence their result is lognormally distributed.  The lognormal is
more than a variation of the normal distribution; in fact, it is one of
the most fundamental distributions of mathematical statistics.
Increasingly, the outputs of physical processes are being found to be
lognormally distributed.  It is a distribution that physicists,
meteorologists, and engineers all encounter.
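      The lognormal distribution is given by:

            $\Lambda(x) = N(\log x), \quad x > 0$                        (V-3)
and
            $d\Lambda(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2\sigma^2}(\log x - \mu)^2\right] dx, \quad x > 0.$      (V-4)

The jth moment about the origin is given by

            $m_j = \int_0^{\infty} x^j \, d\Lambda(x)$                      (V-5)

                $= \int_{-\infty}^{\infty} e^{jy} \, dN(y)$                 (V-6)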
                               39

-------
                                                                     40
                $= e^{j\mu + \frac{1}{2} j^2 \sigma^2}.$                    (V-7)

Therefore, the mean and variance are given by

            $\alpha = e^{\mu + \sigma^2/2}$                                (V-8)

            $\beta^2 = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right) = \alpha^2 \eta^2$         (V-9)

where $\eta^2 = e^{\sigma^2} - 1$.  The third moment about the mean is

            $\lambda_3 = \alpha^3 \left(\eta^6 + 3\eta^4\right)$                  (V-10)

and the fourth moment about the mean is

            $\lambda_4 = \alpha^4 \left(\eta^{12} + 6\eta^{10} + 15\eta^8 + 16\eta^6 + 3\eta^4\right)$   (V-11)

which results in nonzero coefficients of skewness $S_1$ and kurtosis $S_2$,

            $S_1 = \frac{\lambda_3}{\beta^3} = \eta^3 + 3\eta$                   (V-12)

            $S_2 = \frac{\lambda_4}{\beta^4} - 3 = \eta^8 + 6\eta^6 + 15\eta^4 + 16\eta^2.$      (V-13)

Skewness and kurtosis are both positive and both increase as the
variance increases.
      The mode of the distribution is given by $e^{\mu - \sigma^2}$, the median by
$e^{\mu}$, and the mean by $e^{\mu + \sigma^2/2}$; hence the curve appears as in Fig. 7.
Figures 8 and 9 illustrate the effect of varying the parameters.
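
The moment expressions above lend themselves to a direct calculation.  The sketch below is illustrative only; the function name `lognormal_summary` and the dictionary keys are assumptions, and the formulas used are the standard lognormal results in terms of $\mu$, $\sigma$, and $\eta^2 = e^{\sigma^2} - 1$.

```python
import math


def lognormal_summary(mu, sigma):
    """Mode, median, mean, variance, skewness and excess kurtosis of a lognormal."""
    var_log = sigma ** 2
    eta2 = math.exp(var_log) - 1.0           # eta^2 = e^{sigma^2} - 1
    eta = math.sqrt(eta2)
    mean = math.exp(mu + 0.5 * var_log)      # alpha
    return {
        "mode": math.exp(mu - var_log),
        "median": math.exp(mu),
        "mean": mean,
        "variance": mean ** 2 * eta2,        # beta^2 = alpha^2 eta^2
        "skewness": eta ** 3 + 3.0 * eta,
        "excess_kurtosis": eta2 ** 4 + 6.0 * eta2 ** 3 + 15.0 * eta2 ** 2 + 16.0 * eta2,
    }


summary = lognormal_summary(mu=0.0, sigma=1.0)
for name, value in summary.items():
    print(f"{name:>16s}: {value:.4f}")
```

For any $\sigma > 0$ the ordering mode < median < mean holds, which is the asymmetry shown in Fig. 7.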
      Most important to the present studies are the reproductive prop-
erties of the distribution.  The necessary theorems will be stated  with
outlines of the proofs as required.

Theorem 1.  If $x_1$ and $x_2$ are independent $\Lambda$ variates, then the product
$x_1 x_2$ is also a $\Lambda$ variate.
      Proof: This is proved by taking logs to convert the distributions to
             normal distributions, convolving the resulting variables, and
             converting the result back using an antilog transform.

-------
                                                         41
Fig.  7.  Frequency curves of the normal and
        lognormal distributions.

-------
                                                           42
Fig. 8. Frequency curves of the lognormal
       distribution for three values of $\sigma^2$.

-------
                                                               43
Fig. 9. Frequency curves of the lognormal
       distribution for three values of $\mu$.

-------
                                                                     44
 Theorem 2.  If $\{x_j\}$ is a sequence of independent positive variates having
 the same probability distribution and such that:

            $E\{\log x_j\} = \mu$                                   (V-14)

            $\mathrm{Var}\{\log x_j\} = \sigma^2$                          (V-15)

 and both exist, then the product $\prod_{j=1}^{n} x_j$ is asymptotically distributed as
 $\Lambda(n\mu, n\sigma^2)$.
      Proof: By analogy to the additive normal Central Limit Theorem.
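
Theorem 2 can be checked numerically.  The following sketch is an illustration under stated assumptions rather than a proof: the variates are taken to be uniform on (1, 3), for which the mean and variance of the logarithm have closed forms, and the names `MU`, `SIGMA2`, and `log_product_moments` are hypothetical.

```python
import math
import random
import statistics

# Closed-form moments of log(U) for U uniform on (1, 3) (assumed test case).
L3 = math.log(3.0)
MU = (3.0 * L3 - 2.0) / 2.0                          # E{log U}
SIGMA2 = (3.0 * L3 ** 2 - 6.0 * L3 + 4.0) / 2.0 - MU ** 2   # Var{log U}


def log_product_moments(n, n_trials=4000, seed=1):
    """Mean and variance of log(x_1 * ... * x_n) over repeated trials."""
    rng = random.Random(seed)
    logs = []
    for _ in range(n_trials):
        total = 0.0
        for _ in range(n):
            total += math.log(rng.uniform(1.0, 3.0))
        logs.append(total)
    return statistics.fmean(logs), statistics.pvariance(logs)


n = 20
mean_log, var_log = log_product_moments(n)
print("log-mean of product:", round(mean_log, 3), " theory n*mu:", round(n * MU, 3))
print("log-var of product: ", round(var_log, 3), " theory n*sigma^2:", round(n * SIGMA2, 3))
```

The simulated log-mean and log-variance of the product should lie close to $n\mu$ and $n\sigma^2$, the parameters of the limiting $\Lambda$ distribution.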
      For limited numbers of lognormal variates in a sum, it has been
 demonstrated that the sum is itself approximately lognormally distributed
 for certain ranges of the coefficient of variation (21).  Figure 10(a)
 gives these conditions.
      Goodness-of-fit tests have been derived for the lognormal.  The
 chi-squared test is appropriate, of course; the Kolmogorov-Smirnov
 test is a nonparametric technique for determining a confidence band
 around an empirical distribution function.  Another useful test is to plot
 the data on lognormal probability paper, on which truly lognormal data
 will plot as a straight line.  Figures 10(b) and 10(c) illustrate several
 such plots.
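
As an illustration of the Kolmogorov-Smirnov approach, the sketch below computes the largest gap between the empirical distribution function and a lognormal fitted on the log scale.  It is a minimal example, not the procedure used in the studies cited: the function name `ks_statistic_lognormal`, the synthetic samples, and the use of the large-sample 5 percent value 1.36/sqrt(n) are assumptions.  That critical value strictly applies when the parameters are specified in advance; with parameters estimated from the same data it is conservative.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def ks_statistic_lognormal(data):
    """Largest gap between the empirical CDF of the data and a fitted lognormal CDF."""
    logs = sorted(math.log(x) for x in data)       # data must be strictly positive
    fitted = NormalDist(fmean(logs), stdev(logs))  # normal fit on the log scale
    n = len(logs)
    d = 0.0
    for i, y in enumerate(logs):
        cdf = fitted.cdf(y)
        # The empirical CDF jumps from i/n to (i+1)/n at y; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


rng = random.Random(3)
n = 500
lognormal_sample = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
uniform_sample = [rng.uniform(0.001, 1.0) for _ in range(n)]   # deliberately not lognormal

critical = 1.36 / math.sqrt(n)   # approximate 5 percent band half-width
print("lognormal sample: D =", round(ks_statistic_lognormal(lognormal_sample), 4))
print("uniform sample:   D =", round(ks_statistic_lognormal(uniform_sample), 4))
print("approximate critical value:", round(critical, 4))
```

A statistic below the critical value means the fitted lognormal stays inside the confidence band around the empirical distribution function; the uniform sample is included only to show a case that falls outside it.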
      Up to this point we have discussed primarily the  lognormal dis-
 tribution which appears to have considerable empirical and theoretical
 support.  There  are,  however, other distributions which fit the same
 data quite well and are therefore deserving of mention.  It is interesting
 to note, however,  that no non-empirical support  for these distributions
 has yet been published.
      Lynn (22) used data from Philadelphia, Pennsylvania, to estimate
 the parameters of several distributions by the method of moments.  The
 distributions are the normal, two-parameter lognormal, three-parameter
 lognormal, gamma, and four-parameter Pearson.  The goodness-of-fit
 statistics are summarized in Table 3.
      Not considered here is the Weibull distribution, which has
 considerable support from several sources, including Milokaj (23) and
 Barlow (24).
      In Table  3,  notice that the normal distribution was clearly the
 worst.  The  two-parameter lognormal was the best by  a small margin

-------
                                                             45
[Plot for Fig. 10(a): horizontal axis, coefficient of variation, 0.1 to 100 on a logarithmic scale; vertical axis label not legible.]

Fig. 10(a). Regions of convergence where
           the  sum of n lognormal vari-
           ates is approximately lognormal.
           (A)  Convergence for both normal
           and lognormal approximations,
           (B)  convergence for the lognormal
           approximation, (C) convergence
           is uncertain (21).

-------
                                                                                                              46
-------
                                                                                                                  47

-------
                                                                     48
 Table 3.  Summary of total absolute deviations (20 μg/m³ classes) (22).

                          Normal      Lognormals          Pearson
 Station     Year         dist.      2-P      3-P       4-P     Gamma
 ----------------------------------------------------------------------
 Station 1   1960         109.8     43.4     43.2a     45.6     54.0
             1961         135.1     56.6a    62.6     109.3    110.1
             1962         129.9     45.0a    46.7      49.7     49.6
             1963         159.4     60.8a    93.1     131.6    210.6
             1964         119.5     59.3a    60.3      70.9     70.5
             1965         129.1     59.0     55.9      55.1a    84.8
             1966         129.0     52.2     52.4      48.3a    48.4
             1967         131.0     51.2a    52.5      60.6     63.9
             1968         120.3     60.2     59.4a     86.6     69.3
             Average      129.2     54.2a    58.5      73.1     84.6

 Station 2   1960         114.2     46.6     39.5      36.4a    39.1
             1961         125.7     41.6a    46.1      69.3     68.1
             1962         177.2     82.2     64.4a    105.7     68.2
             1963           -        -        -         -        -
             1964         125.6     38.8a    39.2      62.3     53.4
             1965         119.5     43.5a    46.5      48.3     63.5
             1966         143.5     43.3     35.8a     52.1     67.0
             1967         143.7     59.8a    63.6      75.6     62.9
             1968         134.6     60.4a    61.3      80.9     68.9
             Average      135.5     52.0     49.6a     66.3     61.4

 Station 3   1960         108.6     47.3a    47.8      57.2     66.7
             1961         126.4     39.1a    39.3      56.9     53.1
             1962         106.0     27.7a    28.1      29.6     29.1
             1963         122.9     56.4     57.8      48.0     45.1a
             1964           -        -        -         -        -
             1965         123.8     56.0     54.6a     57.4     89.4
             1966         134.9     58.6     66.8      50.0a    51.7
             1967         131.2     53.7     53.8      52.2     49.7a
             1968         136.9     61.1a    65.3      82.5     73.1
             Average      123.8     50.0     51.7      54.2     57.2

 Station 9   1968          94.0     60.3     58.9a     72.8     60.7
 Station 11  1968          60.2     31.6a    36.0      36.9     32.0

 Average                  125.6     51.7a    53.0      64.1     66.8
 ----------------------------------------------------------------------
   a Best fit.
over the three-parameter lognormal,  and the Pearson distributions
fared considerably worse.  However,  note that in some cases each of
the distributions provided the best fit.  It is a curious fact that the

-------
                                                                     49
fitting method employed assigns a nonzero value to the location
parameter despite the fact that zero would provide a better fit according
to the sum of absolute deviations criterion.  This is seen from the fact
that the 2-p lognormal fit better than the 3-p, of which it is a special
case.  The caveat is that this table might have been considerably
different had a different fitting method or criterion been employed.
Note too that the 2-p lognormal fared better than the 4-p distributions,
which is surprising given the greater flexibility of the 4-p distributions.
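
The fitting and scoring steps discussed above can be sketched as follows.  This is a hypothetical illustration and not a reproduction of Lynn's calculation: the data are synthetic, the criterion is assumed to be the sum over 20-unit classes of the absolute difference between observed and expected counts, and the names `fit_lognormal_by_moments` and `total_absolute_deviation` are invented for the example.  The method-of-moments step inverts the lognormal mean and variance relations, alpha = exp(mu + sigma^2/2) and beta^2 = alpha^2 (exp(sigma^2) - 1).

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def fit_lognormal_by_moments(data):
    """Two-parameter lognormal fit from the sample mean and variance."""
    mean = fmean(data)
    variance = pstdev(data) ** 2
    sigma2 = math.log(1.0 + variance / mean ** 2)   # from beta^2 = alpha^2 (e^{sigma^2} - 1)
    mu = math.log(mean) - 0.5 * sigma2              # from alpha = e^{mu + sigma^2/2}
    return mu, math.sqrt(sigma2)


def total_absolute_deviation(data, cdf, width=20.0):
    """Sum over classes of |observed count - expected count| (assumed criterion).

    Classes run from 0 up to the class containing the largest observation;
    fitted probability outside that range is ignored.
    """
    n = len(data)
    n_classes = int(max(data) // width) + 1
    observed = [0] * n_classes
    for x in data:
        observed[int(x // width)] += 1
    total = 0.0
    for k in range(n_classes):
        expected = n * (cdf((k + 1) * width) - cdf(k * width))
        total += abs(observed[k] - expected)
    return total


rng = random.Random(4)
data = [rng.lognormvariate(4.0, 0.5) for _ in range(2000)]   # synthetic concentrations

mu_hat, sigma_hat = fit_lognormal_by_moments(data)
log_fit = NormalDist(mu_hat, sigma_hat)
normal_fit = NormalDist(fmean(data), pstdev(data))

tad_lognormal = total_absolute_deviation(
    data, lambda x: log_fit.cdf(math.log(x)) if x > 0 else 0.0)
tad_normal = total_absolute_deviation(data, normal_fit.cdf)
print("2-p lognormal:", round(tad_lognormal, 1), " normal:", round(tad_normal, 1))
```

Because the score depends on both the fitting method and the class width, a different choice of either could change the ranking of the candidate distributions, which is the caveat raised in the text.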

                         Weibull Distribution
      In 1951, Waloddi Weibull (25) published a paper in which the
applicability of the distribution commonly written:

            $f(x) = K x^m \exp\left[-K x^{m+1}/(m+1)\right]$               (V-16)

was demonstrated.  Although it had been known before 1951, the
distribution has come to bear his name.  The derivation presented therein
is an interesting one, since it does not proceed from a single theoretical
principle.  Weibull approached the problem of finding the probability of
failure $P_n$ of a chain consisting of $n$ links.  He noted that the
probability of nonfailure of the chain, $1 - P_n$, is equal to the
probability of nonfailure of all the links simultaneously, $(1 - p)^n$,
where $p$ is the probability of failure of an individual link.  Therefore,
if each link has a distribution function governing its failure of the form

            $F(x) = 1 - e^{-\phi(x)},$                                (V-17)

the distribution function for the chain will be

            $P_n = 1 - e^{-n\phi(x)}.$                                (V-18)

The remaining problem is to specify $\phi(x)$.  The only necessary condition
is that it be a positive nondecreasing function, vanishing at a value $x_u$
which is not necessarily equal to 0.  Weibull then stated that the simplest
function satisfying this condition is

            $\phi(x) = \left(\frac{x - x_u}{x_0}\right)^m.$                 (V-19)

-------
                                                                     50
The remarkable fact about this is that there is no theoretical
justification for using this form; indeed, Weibull states: ". . .  it is
utterly hopeless to expect a theoretical basis for distribution functions
such as ... particle sizes."
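
The chain argument can be verified with a few lines of arithmetic.  In the sketch below the link exponent is assumed to take the power form phi(x) = ((x - x_u)/x_0)^m for x > x_u, and the parameter values and the names `link_failure` and `chain_failure` are hypothetical.  The check is that the probability of at least one of n independent links failing, 1 - (1 - p)^n, equals 1 - exp(-n phi(x)).

```python
import math


def phi(x, x_u=0.0, x_0=1.0, m=2.0):
    """Assumed exponent function: zero up to x_u, then a power of (x - x_u)/x_0."""
    if x <= x_u:
        return 0.0
    return ((x - x_u) / x_0) ** m


def link_failure(x, **params):
    """Failure probability of a single link, F(x) = 1 - exp(-phi(x))."""
    return 1.0 - math.exp(-phi(x, **params))


def chain_failure(x, n, **params):
    """Failure probability of a chain of n independent links."""
    p = link_failure(x, **params)
    return 1.0 - (1.0 - p) ** n


x, n = 0.7, 5
direct = chain_failure(x, n)
closed_form = 1.0 - math.exp(-n * phi(x))
print("1 - (1 - p)^n      =", direct)
print("1 - exp(-n phi(x)) =", closed_form)
```

The two printed values agree to rounding error, which is the step from the single-link distribution function to that of the chain.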
      As it happens,  the Weibull distribution has been used to fit a large
number of naturally occurring phenomena quite well.  These include oil
spill data,  particle sizes, distances in cotton fibers,  molecular weights,
and solution  concentrations.
      The Weibull has both two- and three-parameter forms, the former,
Eq. (V-16), being more common, where parameter m determines the
shape of the curve and K is the scaling factor.  Figure 11 illustrates
the Weibull distributions for several values of m.  Note that for m = 0
the Weibull reduces to the exponential, and for m = 1 it is equivalent to
the Rayleigh distribution.  When m = 1 or 2 the distributions appear
similar to the lognormal.  In fact, in cases where both distributions fit
the same data, the Weibull shape parameter is always near this range.
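      The mean of the Weibull distribution is given by

            $E(x) = \left(\frac{K}{m+1}\right)^{-1/(m+1)} \Gamma\left(\frac{m+2}{m+1}\right)$           (V-20)

and the variance is given by

            $\mathrm{Var}(x) = \left(\frac{K}{m+1}\right)^{-2/(m+1)} \left[\Gamma\left(\frac{m+3}{m+1}\right) - \Gamma^2\left(\frac{m+2}{m+1}\right)\right].$    (V-21)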
      As is usually the case with skew distributions in practical applica-
tions,  the median is used as a measure of central tendency rather than
the mean.   The latter is extraordinarily sensitive to values in the tail of
the skew distribution.  This is true also for the lognormal.
      For the Rayleigh (m = 1),

            $E(x) = \sqrt{\frac{\pi}{2K}}$                               (V-22)

            $\mathrm{Var}(x) = \frac{2}{K}\left(1 - \frac{\pi}{4}\right).$                  (V-23)
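
The Weibull moments can be evaluated with the gamma function and cross-checked by numerical integration of the density f(x) = K x^m exp[-K x^(m+1)/(m+1)].  The sketch below is illustrative; the function names `weibull_mean_var` and `numeric_mean`, the parameter values, and the midpoint integration rule are assumptions.

```python
import math


def weibull_mean_var(K, m):
    """Mean and variance of f(x) = K x^m exp(-K x^(m+1) / (m+1))."""
    c = m + 1.0
    scale = (c / K) ** (1.0 / c)          # equals (K/(m+1))^(-1/(m+1))
    g1 = math.gamma((m + 2.0) / c)
    g2 = math.gamma((m + 3.0) / c)
    return scale * g1, scale ** 2 * (g2 - g1 ** 2)


def numeric_mean(K, m, upper=10.0, steps=20000):
    """Midpoint-rule estimate of the mean, as an independent check."""
    dx = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        density = K * x ** m * math.exp(-K * x ** (m + 1) / (m + 1))
        total += x * density * dx
    return total


K = 2.0
print("exponential (m = 0):", weibull_mean_var(K, 0))
print("Rayleigh    (m = 1):", weibull_mean_var(K, 1))
print("Rayleigh mean, numerical:", round(numeric_mean(K, 1), 6))
print("Rayleigh mean, sqrt(pi/(2K)):", round(math.sqrt(math.pi / (2 * K)), 6))
```

For m = 0 the result reduces to the exponential values 1/K and 1/K^2, and for m = 1 to the Rayleigh values sqrt(pi/(2K)) and (2/K)(1 - pi/4).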

-------
                                                           51
Fig. 11.  Frequency curves for Weibull (top)
         and Rayleigh probability distributions.

-------
                                                                  52
It is interesting that the Weibull is usually fit with a narrow range for
the shape parameter from one application to another.  This suggests
that the Rayleigh distribution may provide  a fair fit, a surprising fact
because  it implies that a complex physical process is adequately de-
scribed with only one parameter in a distribution with no theoretical
foundation.
      Figure 12 demonstrates the appearance of the Weibull probability
distribution on log probability paper for parameter values typically
obtained  in pollution work.

                         Gamma Distribution
      The gamma distribution is also used in air pollution work.  We
can easily see why from Fig.  13(a).  The gamma has the ability to
appear quite similar to both the lognormal and Weibull,  depending on
the values of the scale parameter $\beta$ and the shape parameter $\alpha$ in

            $f(x) = \frac{x^{\alpha} e^{-x/\beta}}{\beta^{\alpha+1}\,\Gamma(\alpha+1)}$                  (V-24)

            $\alpha > -1, \quad \beta > 0$

            $0 < x < \infty.$

      The distribution can be derived as the distribution of the sum of
n independent, identical exponentially distributed random variables.  The
mean and variance of this distribution are given by

            $E(x) = \beta(\alpha + 1)$
            $\mathrm{Var}(x) = \beta^2(\alpha + 1).$                         (V-25)
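
The sum-of-exponentials construction can be simulated to check the stated mean and variance.  The sketch below assumes an integer shape parameter, so that the gamma variate is the sum of alpha + 1 independent exponential variates each with mean beta; the function name `gamma_by_exponential_sum`, the parameter values, and the sample size are hypothetical.

```python
import random
import statistics


def gamma_by_exponential_sum(alpha, beta, n_samples=20000, seed=2):
    """Draw gamma variates as sums of (alpha + 1) exponentials with mean beta.

    alpha must be a nonnegative integer for this construction.
    """
    rng = random.Random(seed)
    n_terms = alpha + 1
    return [
        sum(rng.expovariate(1.0 / beta) for _ in range(n_terms))  # rate = 1 / mean
        for _ in range(n_samples)
    ]


alpha, beta = 2, 1.5
sample = gamma_by_exponential_sum(alpha, beta)
print("sample mean:    ", round(statistics.fmean(sample), 3),
      " theory:", beta * (alpha + 1))
print("sample variance:", round(statistics.pvariance(sample), 3),
      " theory:", beta ** 2 * (alpha + 1))
```

The sample moments should fall close to beta (alpha + 1) and beta^2 (alpha + 1), the expressions given for the mean and variance.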

The gamma distribution also has a three-parameter form.  The three-
parameter form is derived by subtracting a location parameter from  the
mean, a process which also results in three-parameter forms for the
other distributions.  The three-parameter form is given by


[Fig. 12.  Weibull probability distribution on log probability paper (concentration in ppm).]

[Fig. 13(a).  Gamma distribution.]

[Fig. 13(b).  Gamma probability distribution on log probability paper (concentration in ppm).]

           f(x) = \frac{(x - x_0)^{\alpha} e^{-(x - x_0)/\beta}}{\alpha! \, \beta^{\alpha+1}} ,        (V-26)
           x > x_0 ,
           \alpha > -1 ,
           \beta > 0 .
The gamma distribution is most widely used in reliability theory.
      Figure 13(b) demonstrates the appearance of the gamma
probability distribution on log probability paper for parameter values
typically obtained in pollution work.
                         Pearson Distribution
      Pearson's system provides a theoretical density function for
every possible combination of skewness and kurtosis (B1, B2) (see
Fig. 14).  There are three main types, I, IV, VI.  Type I is the beta,
type IV is the gamma.  The procedure is to calculate B1 and B2 and see
which part of the plane is indicated.  For air quality data, type I often
occurs.  Type VI has been investigated also, although type IV was not
needed.
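      The type I density is given by
            f(x) = \frac{\Gamma(p + q)}{\Gamma(p)\,\Gamma(q)} \, \frac{(x - A)^{m_1} (B - x)^{m_2}}{(B - A)^{m_1 + m_2 + 1}}   (4-parameter form),      (V-27)
            m_1 = p - 1 ,
            m_2 = q - 1 .                                          (V-28)
The type VI is given by
            y = \frac{(q_2 + 1)^{q_2} (q_1 - q_2 - 2)^{q_1 - q_2} \, \Gamma(q_1)}{x_0 (q_1 - 1)^{q_1} \, \Gamma(q_1 - q_2 - 1) \, \Gamma(q_2 + 1)} \left(1 + \frac{x}{A_1}\right)^{-q_1} \left(1 + \frac{x}{A_2}\right)^{q_2} ,      (V-29)
where
            A_1 = \frac{x_0 (q_1 - 1)}{(q_1 - 1) - (q_2 + 1)}
and
            A_2 = \frac{x_0 (q_2 + 1)}{(q_1 - 1) - (q_2 + 1)} .

[Fig. 14.  Pearson system types in the (B1, B2) plane.]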

[Fig. 15.  Cumulative data distribution (concentration in ppm).]

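            \ln[f(x)] = \ln K + m \ln x - K x^{m+1}/(m + 1)              (V-36)

            \frac{d\{\ln[f(x)]\}}{dx} = \frac{m}{x} - K x^{m}            (V-37)

                      = \frac{1}{x} (m - K x^{m+1}) .                    (V-38)

For the gamma,

            f(x) = \frac{1}{\alpha! \, \beta^{\alpha+1}} x^{\alpha} e^{-x/\beta}           (V-39)

            \ln[f(x)] = \ln 1 - \ln \alpha! - (\alpha + 1) \ln \beta + \alpha \ln x - x/\beta ,    (V-40)

            \frac{d\{\ln[f(x)]\}}{dx} = \frac{\alpha}{x} - \frac{1}{\beta}    (V-41)

                      = \frac{1}{x} (\alpha - x/\beta) .                 (V-42)

Summarizing the final result for each:

            lognormal = \frac{1}{x} \left( -1 + \frac{\mu}{\sigma^2} - \frac{\ln x}{\sigma^2} \right)

            Weibull   = \frac{1}{x} (m - K x^{m+1})

            gamma     = \frac{1}{x} (\alpha - x/\beta)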

      In each case there is a constant term, which is a function of the
shape parameter, and a term that is a function of x.  From experience in
fitting these distributions, we know that the values of the appropriate
parameters adjust so that the constant term is often between 1 and 2.
The remaining term is of higher order and provides the differences in
goodness of fit noticed between these distributions.
      Thus, the required theoretical results have been provided, with an
additional section describing the similarity between these distributions.
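
The three summarized slopes can be checked against a numerical derivative of each log density. This is a sketch only: the lognormal is assumed to be parametrized by the mean mu and standard deviation sigma of ln x, the Weibull and gamma forms are those of (V-36) and (V-39), and all parameter values and function names are illustrative.

```python
import math

def ln_lognormal(x, mu, sigma):
    # Assumed form: ln of exp(-(ln x - mu)^2 / (2 sigma^2)) / (x sigma sqrt(2 pi))
    return (-math.log(x * sigma * math.sqrt(2 * math.pi))
            - (math.log(x) - mu) ** 2 / (2 * sigma ** 2))

def ln_weibull(x, K, m):
    # ln of K x^m exp(-K x^(m+1) / (m+1)), as in (V-36)
    return math.log(K) + m * math.log(x) - K * x ** (m + 1) / (m + 1)

def ln_gamma(x, alpha, beta):
    # ln of x^alpha exp(-x/beta) / (Gamma(alpha+1) beta^(alpha+1)), as in (V-40)
    return (-math.lgamma(alpha + 1) - (alpha + 1) * math.log(beta)
            + alpha * math.log(x) - x / beta)

def slope(ln_f, x, h=1e-6):
    """Central-difference estimate of d ln f / dx."""
    return (ln_f(x + h) - ln_f(x - h)) / (2 * h)

def closed_forms(x, mu, sigma, K, m, alpha, beta):
    """The three summarized expressions, each of the form (1/x)(constant - g(x))."""
    return {
        "lognormal": (-1 + mu / sigma ** 2 - math.log(x) / sigma ** 2) / x,
        "weibull": (m - K * x ** (m + 1)) / x,
        "gamma": (alpha - x / beta) / x,
    }

x = 0.8
cf = closed_forms(x, mu=-2.0, sigma=0.9, K=1.5, m=1.2, alpha=1.4, beta=0.3)
print(cf["lognormal"], slope(lambda t: ln_lognormal(t, -2.0, 0.9), x))
print(cf["weibull"], slope(lambda t: ln_weibull(t, 1.5, 1.2), x))
print(cf["gamma"], slope(lambda t: ln_gamma(t, 1.4, 0.3), x))
```

Each printed pair should agree to within the accuracy of the finite difference.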

                              Summary
      In this chapter the air quality distributions presented earlier are
discussed from a mathematical viewpoint.  This serves to illustrate
more clearly the nature of air quality distributions.
      A result of significance to air pollution data analysis is the
transformation which indicates a fundamental similarity between the various
distributions used to fit air quality data.  Future research may
therefore be directed at modifying the second-order term to provide a more
accurate distribution in cases where the two-parameter lognormal is
inadequate because of the magnitude of the source term or the reactive
nature of the pollutant.

-------
           CHAPTER VI —ILLUSTRATIVE APPLICATIONS

      There is considerably greater utility in these results than the
simple comparison of air quality to standards.  Knowledge of the
nature of these distributions and parameter variations under various
conditions allows the construction of models for a variety of purposes.
Several are outlined in the following section.

                 Analysis of Meteorological Patterns
                   for Pollution Level Forecasting
      A particularly useful application of these techniques is discussed
below.  It is included as  an example of the power of the techniques  pre-
sented earlier.  Note that the fundamental philosophy of this  application,
i.e., that pollutant concentration distributions can be used as a partial
substitute for meteorological data,  is motivated by the arguments pre-
sented in this dissertation.
      It has been established in Chapter II that the only variables
affecting future pollutant concentrations are emissions, meteorological
variables, photochemical change, and current concentration levels.  It
was further established, by both theoretical and empirical arguments,
that for all pollutants and all averaging times concentrations are
approximately lognormally distributed.  These assumptions can be used in
the formulation of a predictive model.  If one plots cumulative distribution
functions on lognormal probability paper for individual days using hourly
averages, the resulting curves are based on 24 points and, as expected
from the theoretical argument in Chapter II, are relatively flat, straight
lines.  If one plots a great many such curves taken from data at one
location over a certain period of time, it is possible that the resulting
diagram will appear similar to Fig. 16.  For regions where the
climatology is very persistent, the clustering of the lines will be clear.
      These  lines can be  mapped into points  by plotting geometric mean
(GM) versus SGD as in Fig.  17.   In this plot, taken from actual Los
Angeles  (downtown) oxidant data for  1970, the degree of clustering  is
clear.  It is  also clear that changing the metric in which these points are
plotted alters the degree  of clustering  seen from a plot.  If we can  find

the most independent clusters, with the total number of clusters
constrained, we have then identified days with similar air quality patterns.

[Fig. 16.  Cumulative distribution curves for individual days on lognormal probability paper (concentration in ppm).]

      The significance of this is clear from Chapter II.  Days with
similar air quality patterns tend to have similar meteorological and
emission patterns.  Therefore, if we can identify the meteorology and
emissions in each cluster and can then predict tomorrow's meteorological
and emission pattern, we can determine in which cluster tomorrow's
air quality will fall.  Of course, each cluster refers to a particular GM
and SGD which, as shown in Chapter II, completely describe a day's air
quality from the point of view of air quality standards.
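
As an illustration of how one day's curve maps to a single (GM, SGD) point, the following sketch computes both quantities from hourly averages. It assumes SGD is the exponential of the sample standard deviation of the log concentrations (with an n - 1 divisor); the function name and the hourly values are hypothetical.

```python
import math

def gm_sgd(hourly):
    """Geometric mean and standard geometric deviation of positive values."""
    logs = [math.log(c) for c in hourly]
    n = len(logs)
    mean_log = sum(logs) / n
    # Sample variance of the logs (n - 1 divisor is an assumption here).
    var_log = sum((v - mean_log) ** 2 for v in logs) / (n - 1)
    return math.exp(mean_log), math.exp(math.sqrt(var_log))

# Hypothetical 24 hourly oxidant averages in ppm for one day.
day = [0.02, 0.02, 0.01, 0.01, 0.01, 0.02, 0.03, 0.04, 0.06, 0.09, 0.12, 0.15,
       0.18, 0.17, 0.14, 0.10, 0.07, 0.05, 0.04, 0.03, 0.03, 0.02, 0.02, 0.02]
gm, sgd = gm_sgd(day)
print(gm, sgd)   # one point on a GM-versus-SGD plot
```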
      There is evidence (26) to the effect that emission  patterns are
primarily dependent on the day of week  and time of year. Consequently,
if we  stratify our clustering graphs by day of week and season,  we can
then relate  each cluster directly to a meteorological pattern.
      The problems remaining are twofold:
            1.  How do we find the "best"  clusters?
            2.  How do we determine into  which cluster tomorrow's air
               quality falls?
      Both of these problems could be handled pragmatically by an
"eyeball"  solution.  In areas with high climatological persistence, this
coarse method might give acceptable results.  There are,  however,
exact methods to deal with these problems also.
      If we  refer to Fig.  17 and treat the points as nodes in a graph, we
immediately realize that the clustering  problem that has arisen from our
air quality classification problem  is  mathematically equivalent to the
clustering problem in modern graph theory.  In fact, a great number of
papers have been written concerning methods  of solving for the optimal
clusters.  In general, the techniques do not propose to solve a large
problem completely,  but rather they deal  with effective compromises
which utilize the tradeoff between the distance the algorithm comes from
the true optimum and the computing time necessary to reach that point.
      In our case, the problem will not usually be very large.  It is
unlikely that more than three or four years of data would be analyzed
together because of the gradual change in  emissions. That leaves us with
approximately 1400 points.   As a working approximation, we  can use only
two significant figures which will serve to  make many points identical.

Hence we may expect to have about 10^3 points which we will try to
divide into between 10 and 20 clusters.  Several algorithms are
available which will handle these numbers in a reasonable amount of
computing time (27).

[Fig. 17.  Geometric mean (GM) versus SGD for oxidant data, downtown Los Angeles, 1970.]
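
The two-significant-figure approximation can be sketched as follows. The helper names and the sample points are hypothetical; the idea is only to show how rounding makes nearby (GM, SGD) pairs coincide so that fewer distinct points need to be clustered.

```python
from math import floor, log10

def round_sig(x, sig=2):
    """Round x to the given number of significant figures."""
    if x == 0:
        return 0.0
    return round(x, sig - 1 - floor(log10(abs(x))))

def collapse(points, sig=2):
    """Count how many (GM, SGD) pairs fall on each rounded point."""
    counts = {}
    for gm, sgd in points:
        key = (round_sig(gm, sig), round_sig(sgd, sig))
        counts[key] = counts.get(key, 0) + 1
    return counts

# Hypothetical daily (GM, SGD) pairs.
days = [(0.123, 2.41), (0.118, 2.44), (0.05, 1.9)]
print(collapse(days))
```

The counts can serve as weights if the clustering routine accepts them.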

Selecting the Clusters
      It is beyond the scope of this work to discuss the details of these
algorithms; however, a simple explanation of a general method  is in
order.  First,  arbitrary clusters are selected and the bivariate median
within each is calculated.   Then, for  each cluster,  we calculate the sum
of the distances from the bivariate median to each point within the
cluster and the sum of the distances to each point out of the cluster.  The
objective function is to  maximize the  between-cluster differences minus
the within-cluster differences.  Each  algorithm has a rule by which
incremental changes in the clusters are made; at each stage the objective
function is recalculated.  Each algorithm also has a stopping rule based
upon the size of the  marginal improvement of a single or of a series of
changes  in the  clusters.
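
The general method described above can be sketched in a few lines. This is not any of the published algorithms of Ref. 27. It is a minimal greedy version under stated assumptions: the bivariate median is taken component-wise, distances are Euclidean, the incremental change is the reassignment of a single point, and the stopping rule is that no single reassignment improves the objective by more than a tolerance. All names and data are hypothetical.

```python
import math
import statistics

def bivariate_median(points):
    """Component-wise median (an assumed definition of the bivariate median)."""
    return (statistics.median(p[0] for p in points),
            statistics.median(p[1] for p in points))

def objective(points, labels, k):
    """Sum of out-of-cluster distances minus sum of within-cluster distances,
    each measured from a cluster's bivariate median."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        med = bivariate_median(members)
        for p, lab in zip(points, labels):
            d = math.dist(med, p)
            total += -d if lab == c else d
    return total

def improve(points, labels, k, tol=1e-9):
    """Move one point at a time; stop when no move gains more than tol."""
    labels = list(labels)
    best = objective(points, labels, k)
    improved = True
    while improved:
        improved = False
        for i in range(len(points)):
            keep = labels[i]
            for c in range(k):
                if c == keep:
                    continue
                labels[i] = c
                score = objective(points, labels, k)
                if score > best + tol:
                    best, keep, improved = score, c, True
            labels[i] = keep   # retain the best assignment found for point i
    return labels, best

# Hypothetical (GM, SGD) points with arbitrary starting clusters.
pts = [(0.02, 1.5), (0.03, 1.6), (0.02, 1.7), (0.10, 2.8), (0.11, 3.0), (0.12, 2.9)]
start = [0, 1, 0, 1, 0, 1]
final, score = improve(pts, start, k=2)
print(final, score)
```

A greedy rule of this kind can stop at a local optimum, which is the tradeoff between computing time and distance from the true optimum noted earlier.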

Classifying New Days
      The next problem to be dealt with arises after the final clustering
arrangement has been identified.  When the model goes into operation,
the National Weather Service (NWS) forecasts are  examined and the
values of the predictor variables are selected.  From these values
we must decide into which cluster the new day should be classified.  This
question should be answered even before the meteorological data are
reduced.
      The simplest method is to reduce data for the variables which are
thought to be most important and construct a range of each variable for
each category by "eyeball."  Then the forecaster looks at the ranges thus
selected and selects the cluster which the day most closely matches.  A
problem arises when the cluster is not distinct; in this case more
variables or narrower ranges are needed.  This information could also
be displayed concisely in a series of nomographs, which would eliminate
one source of error.  A refinement of this technique would call for
probability distributions of the ranges so that a value occurring near the
center of the range would be weighted more heavily than one near the
extreme.  Then point scores based upon probability could be used as the
selection procedure.
      A more rigorous method would be one that examines the probability
distributions systematically and calculates both the classification
and its probability of error.  A well known statistical technique for
which programs are available and which performs the required operations
is multiple discriminant analysis (MDA).
      In discriminant analysis, linear functions are developed which
classify a new set of observations into one of several existing categories.
The basic philosophy of the method is to define the categories to be used,
in this case each cluster.  Then, the values of the predictor variables
associated with each point within each cluster are examined to discover
patterns which will aid in the classification of a new set of predictors.
A new metric for the predictors is found which maximizes the
discrimination between the classes of predictors.  Then, based upon the within
and between class distributions, conditional probability functions can be
constructed which give the probability of membership in the ith class
for a new set of predictor variables.  A more complete discussion of the
statistical techniques involved may be found in Ref. 28.
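
A minimal sketch of linear discriminant classification with two predictors follows. It is not the MDA program referred to above. It assumes normally distributed predictors with a common (pooled) covariance matrix and priors proportional to class size, and the windspeed and temperature values and all function names are hypothetical.

```python
import math

def mean_vec(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def pooled_cov(groups, means):
    """Pooled within-class 2x2 covariance matrix."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    dof = 0
    for rows, mu in zip(groups, means):
        for r in rows:
            d = [r[0] - mu[0], r[1] - mu[1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
        dof += len(rows) - 1
    return [[s[a][b] / dof for b in range(2)] for a in range(2)]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def fit(groups):
    means = [mean_vec(g) for g in groups]
    inv = inv2(pooled_cov(groups, means))
    total = sum(len(g) for g in groups)
    priors = [len(g) / total for g in groups]
    return means, inv, priors

def posterior(x, model):
    """Probability of membership in each class for predictor vector x."""
    means, inv, priors = model
    scores = []
    for mu, p in zip(means, priors):
        w = [inv[0][0] * mu[0] + inv[0][1] * mu[1],
             inv[1][0] * mu[0] + inv[1][1] * mu[1]]
        # Linear discriminant score: x'S^-1 mu - mu'S^-1 mu / 2 + ln(prior)
        scores.append(w[0] * x[0] + w[1] * x[1]
                      - 0.5 * (w[0] * mu[0] + w[1] * mu[1]) + math.log(p))
    top = max(scores)
    expd = [math.exp(s - top) for s in scores]
    return [e / sum(expd) for e in expd]

# Hypothetical (windspeed in knots, temperature in deg) for days in two clusters.
cluster_a = [(2, 92), (3, 94), (4, 91), (1, 93)]
cluster_b = [(10, 75), (12, 78), (11, 72), (13, 76)]
model = fit([cluster_a, cluster_b])
print(posterior((5, 88), model))   # membership probabilities for a new day
```

One minus the largest membership probability gives a rough indication of the chance of misclassifying the new day.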

 Recalibration
      It is conceivable that over a period of several years the patterns
of emissions and/or meteorology of an area will change in such a way as
to affect the accuracy of the predictions made with this model.
Fortunately, recalibration is accomplished in a relatively simple,
straightforward manner.
      Throughout the operation of the predictive model, data should be
kept, perhaps in a small notebook, noting the values of the variables used
as predictors, the prediction, and the observed ambient air quality
(AAQ).  When the predicted and the observed values vary unacceptably,
the model is recalibrated, perhaps by the addition of a new category,
either by an heuristic method or the "clustering" program, or by a full
recalibration performed exactly as the original.  Since the new data are
available and the programs are already written, this procedure presents
no problems.  It is unlikely that it would be performed more often than
biannually.

Spatial Interpolation
      It should be clear that the foregoing analysis can predict
concentration levels at only the receptor locations for which data sufficient for
calibration are available.  To predict air quality throughout a region, an
interpolation scheme is required.
      Because of the high correlation between wind velocities and
concentrations of inert pollutants, especially over areas with relatively
simple topography, interpolations can be made.  It is necessary to make
the assumption that concentration isopleths can be determined from the
streamlines of the windfield, source location and available meteorological
and concentration measurements (3).  Given this and several receptors,
one can identify the value of the isopleths passing through the receptors
and interpolate for locations between the identified lines.  Linear
interpolation is adequate for areas which appear to be uniform in
topography and emissions.  Otherwise, experimental sampling,
randomized spatially and temporally, may be employed to determine better
interpolation functions.  For inert pollutants from a point source,
Benarie (3) has discussed this interpolation problem with respect to the
frequency distributions.  In his work the same fundamental assumption
is made, but he makes another assumption which simplifies the
calculation of the frequency distribution of an intermediate point.  In particular,
he assumes that the SGD, represented by the slope of the plot of the
distribution function on lognormal probability paper, is constant
throughout a streamline of the windfield.  Therefore he requires only one point
on the distribution function to estimate both parameters at an
intermediate point along a windfield streamline for which he has the SGD
calculated at another point.  This method provides additional information
with little effort and seems promising for use in conjunction with the
random sampling scheme mentioned above.
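
The one-point estimate can be sketched as follows, assuming the concentrations at the intermediate point are lognormal with the SGD carried over from another point on the same streamline. Under that assumption a concentration c exceeded a given percent of the time satisfies ln c = ln GM + z ln SGD, where z is the corresponding standard normal deviate. The function name and the numbers are hypothetical.

```python
from statistics import NormalDist

def gm_from_one_point(conc, pct_exceeded, sgd):
    """GM of a lognormal from one exceedance point and a known SGD."""
    # z is the standard normal deviate below which (100 - pct_exceeded)%
    # of the distribution lies.
    z = NormalDist().inv_cdf(1.0 - pct_exceeded / 100.0)
    return conc / sgd ** z

# Hypothetical: 0.20 ppm is exceeded 10 percent of the time, SGD = 2.0.
print(gm_from_one_point(0.20, 10.0, 2.0))
```

With GM estimated this way and SGD assumed constant, both parameters of the distribution at the intermediate point are available.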

An Example
      As an illustration of the techniques described herein, we shall
present a simplified and to some extent hypothetical application of the
model.  Figure 17 is a graph of GM versus SGD for oxidant data in
downtown Los Angeles in 1970.  In Fig. 18 "eyeball" estimates of the
clusters have been made.


[Fig. 18.  "Eyeball" estimates of the clusters in the GM versus SGD plot of Fig. 17.]

      Now, according to the procedures outlined above, one takes a
 random sample of the points in each cluster and analyzes the meteorology
 for the day that point represents.  This analysis requires a large
 amount of data reduction,  and functions best with more than one year of
 data, stratified by season and day of week, to determine the values of
 windspeed, temperature and other variables used.  Some of these data
 are  recorded only in the archives of the NWS on synoptic  scale maps
 which require a trained meteorologist to read. This is clearly beyond
 the scope of this work,  despite the fact that this model is comparatively
 simple to calibrate.  However, an actual run of the model is not
 essential for illustrative purposes.  Instead we shall now turn to a
 hypothetical discussion of the kind to be expected in real application.
      If we assume that we have successfully reduced data for a random
 sample of the days in each category,  we can examine the  range of values
 of the variables for each category.  If we see, for example, that for the
 topmost cluster in Fig. 18 windspeeds are between 0 and 5 knots and the
 temperature varies between 90 and 95 deg, we conclude that under those
 conditions the oxidant concentrations are at episode levels.  We record
 these values and proceed to the next cluster, in each case noting the
 mean, range and standard deviation of each variable.
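
      The bookkeeping for this step can be sketched as below.  The cluster
 labels, the sample days and the function name are hypothetical; the sketch
 only shows how the mean, range and standard deviation of each variable
 might be tabulated per cluster once the data have been reduced.

```python
from statistics import mean, stdev

# Hypothetical reduced data: (cluster label, windspeed in knots, temperature in deg)
days = [
    ("A", 2.0, 92.0), ("A", 4.0, 94.0), ("A", 3.0, 90.0),
    ("B", 8.0, 80.0), ("B", 10.0, 84.0), ("B", 12.0, 82.0),
]


def summarize(days):
    """Return {cluster: {variable: (mean, minimum, maximum, sample std)}}."""
    by_cluster = {}
    for label, wind, temp in days:
        cols = by_cluster.setdefault(label, {"windspeed": [], "temperature": []})
        cols["windspeed"].append(wind)
        cols["temperature"].append(temp)
    return {
        label: {name: (mean(v), min(v), max(v), stdev(v)) for name, v in cols.items()}
        for label, cols in by_cluster.items()
    }


summary = summarize(days)
```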
      Upon completion of this analysis we will be prepared to draw
 Fig.  19 which is a graph of the ranges of the two  variables for each
 cluster and the identification of the cluster.  Note that a nomographical
 technique would be required for a case with more than two predictors.
      The shaded portion of the figure indicates an area of uncertainty
 where more than  one cluster applies.  The decision can be made using
 rigorous techniques such as discriminant  analysis which gives both the
 classification and the probability  of error, or by an heuristic
 technique such as determining how many standard deviations the center
 of the shaded portion is from the  center of each of the two clusters and
 selecting the smaller.   In effect,  the latter method is a simplified
"discriminant analysis."
      Each cluster has its expected air quality level and a range of un-
 certainty,  hence once the  graph (Fig. 19)  has  been entered with the NWS
 predicted values,  the problem is  complete.
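
      The heuristic rule for the area of uncertainty can be sketched as
 follows.  The cluster centers and standard deviations are hypothetical, and
 combining the two predictors into a single Euclidean distance measured in
 standard deviations is our assumption; the text specifies only that the
 cluster with the smaller number of standard deviations is selected.

```python
import math

# Hypothetical cluster statistics: (windspeed in knots, temperature in deg)
clusters = {
    "A": {"center": (3.0, 92.0), "std": (1.0, 2.0)},
    "B": {"center": (10.0, 82.0), "std": (2.0, 2.0)},
}


def standardized_distance(point, center, std):
    """Distance from point to center, each variable scaled by its standard deviation."""
    return math.sqrt(sum(((p - c) / s) ** 2 for p, c, s in zip(point, center, std)))


def classify(point, clusters):
    """Select the cluster whose center is the fewest standard deviations away."""
    return min(
        clusters,
        key=lambda k: standardized_distance(point, clusters[k]["center"], clusters[k]["std"]),
    )


predicted = classify((5.0, 90.0), clusters)  # forecast windspeed and temperature
```

 Unlike discriminant analysis, this rule gives no probability of error; it
 only ranks the candidate clusters.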

-------
Fig. 19. An example of the form of the chart
        to be developed in comparing the clusters
        generated in Fig. 18 with windspeed and
        temperature.  (Horizontal axis: windspeed, knots.)

-------
Development
      The model described above takes into consideration the expense of
data collection and development work and the availability of computer
time in that it minimizes the calibration  data requirement, does not
need a mesoscale weather prediction model because it adapts the standard
NWS predictions,  and does not require a computer on-line for pre-
dictions.
      Further, the development effort required by the predictive model
is also applicable to land use planning models and the comparison of
present air quality with that of a benchmark year.  For the latter
application, the  model eliminates bias in the comparisons due to
differences in the meteorology of the years being compared.
      This model was proposed to the California State Air  Resources
Board for use in the South Coast Air Basin to predict oxidant concen-
trations 24 hours  in advance.   The funding request was $80,000.  In-
cluded in this amount was a full-time statistician, a part-time
programmer and a full-time meteorologist.
      Other modeling concepts are likely to be more  expensive.  For
example,  the multibox modeling concept  requires  considerably more
effort and computer time, and least-squares analysis requires sub-
stantially more data.

                         Transition Matrices
      A further application of the material presented herein  involves the
application of the categories defined above in land use planning.  In
particular, the air quality categories,  stratified by emission-day-types,
will be used to determine typical meteorological patterns,  a problem
previously tractable only subjectively by meteorologists examining large
numbers of weather maps or by statisticians analyzing huge  amounts of
data much  of which must be reduced from maps by trained meteorol-
ogists (29).
      It can be argued that each of the clusters defined in the calibration
of the predictive model represents a particular meteorological pattern.
It is conceivable that two or more different patterns could  constitute the
same cluster; however, this is not necessarily significant because it is
unnecessary to distinguish between meteorological patterns yielding
identical air quality for many purposes.
      Therefore, if one calculates the frequency of occurrence of each
pattern stratified by day of week and season, the results thus ob-
tained allow one to simulate a year of "typical"  meteorology which,  in
actuality, is a composite of all years for which  air quality data exist.
For most large cities,  the  continuous air monitoring program (CAMP)
began in 1961.  This "year," which is actually created as a Markov
model using a state transition matrix,  can  be used  in conjunction with a
dispersion model as a benchmark year for  comparing air quality over
time, or in land use planning to determine  future annual average pollu-
tant concentrations.
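
      A minimal sketch of the Markov construction is given below.  The
pattern labels, the transition probabilities and the function names are
hypothetical, and the stratification by day of week and season described
above is omitted for brevity; a fuller version would keep one matrix per
stratum.

```python
import random
from collections import Counter, defaultdict


def estimate_transitions(sequence):
    """Estimate a state transition matrix from an observed daily pattern sequence."""
    counts = defaultdict(Counter)
    for today, tomorrow in zip(sequence, sequence[1:]):
        counts[today][tomorrow] += 1
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }


def simulate_year(transition, start, n_days=365, seed=0):
    """Generate a 'typical' year of daily patterns from the transition matrix."""
    rng = random.Random(seed)
    state = start
    sequence = [state]
    for _ in range(n_days - 1):
        next_states = list(transition[state])
        weights = [transition[state][s] for s in next_states]
        state = rng.choices(next_states, weights=weights)[0]
        sequence.append(state)
    return sequence


# Hypothetical two-pattern matrix; each row sums to 1.
transition = {"A": {"A": 0.6, "B": 0.4}, "B": {"A": 0.3, "B": 0.7}}
year = simulate_year(transition, "A")
```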

Random Sampling
      The justification of the lognormal assumption permits the use of
parametric methods to operate on air quality data.  An example of the
usefulness of parametric methods as opposed to distribution-free
methods is seen in the sample size  required to estimate population
parameters within a specified accuracy.
      Table 4 indicates the  sample sizes required under the various
assumptions for oxidant data taken in Los Angeles in 1970.  The
efficiency of the parametric methods suggests that  random sampling is
an efficient method of characterizing regional air quality when an
appropriate randomized scheme  is used. Note that the measurements
consisted of hourly averages every  hour for an entire year.  Clearly,
a more cost effective scheme was possible.

-------
Table 4.   Sample size required for various confidence limits on
          estimates of GM and SGD.

                                      Confidence
                                        level      Limit      Samples required
PARAMETRIC
  Mean (Z-statistic)                     95%        10%               90
                                         95%         5%              359
                                         99%        10%              155
                                         99%         5%              621
  Variance (χ² statistic)                95%        20%              200
                                         95%        10%              750
                                         95%         5%            5,000
NONPARAMETRIC
  Mean (using Chebyshev inequality)      95%        10%              467
                                         95%         5%            1,869
                                         99%        10%            2,336
                                         99%         5%            9,344
  Variance [using Kolmogoroff's          95%        10% on σ       3,000 (approx)
    method (D-statistic)]                95%         5% on σ      18,500 (approx)
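
The sample sizes for the mean in Table 4 can be approximated with the
sketch below.  It assumes the usual normal-theory formula
n = (z s / (e m))^2 for the parametric case and the Chebyshev bound
n = s^2 / (a e^2 m^2) for the distribution-free case, where e is the
relative limit, a is one minus the confidence level, and s/m is the
coefficient of variation.  The squared coefficient of variation used here
(0.2336) is not stated in the text; it is our assumption, chosen because it
reproduces the tabulated values to within about one sample.

```python
import math
from statistics import NormalDist

CV_SQUARED = 0.2336  # assumed (s/m)^2, back-calculated from Table 4


def n_normal(confidence, rel_limit, cv_squared=CV_SQUARED):
    """Sample size for the mean using the Z-statistic."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.ceil(cv_squared * (z / rel_limit) ** 2)


def n_chebyshev(confidence, rel_limit, cv_squared=CV_SQUARED):
    """Sample size for the mean using the Chebyshev inequality."""
    alpha = 1.0 - confidence
    return math.ceil(cv_squared / (alpha * rel_limit ** 2))


for conf, lim in [(0.95, 0.10), (0.95, 0.05), (0.99, 0.10), (0.99, 0.05)]:
    print(conf, lim, n_normal(conf, lim), n_chebyshev(conf, lim))
```

The ratio of the two results at a given confidence level and limit is the
price paid for dropping the distributional assumption.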

-------
            CHAPTER VII —SUMMARY AND CONCLUSIONS

      The object of this work is to provide a model which binds together
 previous theoretical and empirical findings in a unified framework,  and
 in so doing provides a deeper understanding of the physical processes
 which affect frequency distributions of air pollutant concentrations.
      To this end,  the frequency distributions of air pollutant concen-
 trations have been derived from first principles for both point and area
 sources, for both reactive and  inert pollutants.  These results have been
 compared to published findings and have been found to  be consistent.

                            Area Sources
      Both Larsen's data analysis and Gifford and  Hanna's simple model
 indicate that pollutant concentrations are approximately lognormally
 distributed.   The former work  consists of the examination of large
 quantities of data for all pollutants,  for all  cities and for all averaging
 times.   The latter indicates the high degree of correlation between
 windspeeds,  which are approximately lognormally distributed, and
 pollutant concentrations.
      From the nonempirical standpoint,  this distribution can be de-
 rived from the Fickian Diffusion equation by manipulating it into a finite
 difference form and demonstrating its consistency with the law of pro-
 portional effect. This method predicts the  lognormal distribution will
 fit best for inert pollutants from area sources, and least well for re-
 active,  secondary pollutants.  Larsen's results and those of Gifford
 and Hanna are in good agreement with this assertion.  This derivation
 also predicts that the lognormal distribution will not fit as well close to
 a source as it will  further away.  Recent  data collected near large
 sources bear this out.
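
      The law of proportional effect can be illustrated numerically.  The
 sketch below is not the finite difference derivation itself; it assumes
 only that each step changes the concentration by a random fraction of its
 current value, and the step count, fraction size and seed are arbitrary
 choices.  Under that assumption the logarithm of the concentration is a
 sum of many small independent terms and so is approximately normal.

```python
import math
import random


def skewness(values):
    """Moment coefficient of skewness (zero for a symmetric distribution)."""
    n = len(values)
    m = sum(values) / n
    s2 = sum((v - m) ** 2 for v in values) / n
    s3 = sum((v - m) ** 3 for v in values) / n
    return s3 / s2 ** 1.5


def proportional_effect_sample(n_steps, rng, max_fraction=0.1):
    """Apply n_steps random proportional changes to a unit concentration."""
    c = 1.0
    for _ in range(n_steps):
        c *= 1.0 + rng.uniform(-max_fraction, max_fraction)
    return c


rng = random.Random(1)
concentrations = [proportional_effect_sample(200, rng) for _ in range(2000)]
raw_skew = skewness(concentrations)                         # strongly right-skewed
log_skew = skewness([math.log(c) for c in concentrations])  # close to zero
```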
      Also, a generalization of Gifford's point source model based upon
 the Gaussian Plume solution to the Fickian Diffusion equation indicates
 that pollutant concentrations are lognormally distributed  if the geometric
and arithmetic means are simply related.
      Peripheral to the main discussion is an explanation of the surprising
fact that pollutant concentrations are approximately lognormally
distributed for all averaging times.  This is explained through an analysis
of the averaging process as a window through which atmospheric motion
of various scales can be seen.
      As a result of this investigation, we are prepared to assert
strongly that the lognormal is an appropriate distribution to use to
characterize air quality data.  We recognize that this will not
significantly affect current practice, which has been proceeding on this
basis, but will serve to quell the arguments concerning the correctness
of this assumption, and lend further empirical and nonempirical support
to those who are currently using the lognormal assumption.  These
users include parties responsible for monitoring air quality,
meteorologists who are modeling atmospheric transport, and others who have
use for these distributions along the lines suggested in Chapter VI.

                            Point Sources
      No general agreement exists on the identity of the frequency
distributions of air pollutants emanating from a point source.  The empirical
findings of Knox and Lange and Benarie are at odds with the theoretical
prediction by Gifford.  At present there is no explanation of these
discrepancies in the literature.
      In Chapter II a derivation is presented which indicates that any of
the distributions mentioned in the literature may result depending on
atmospheric stability, windspeeds, and the distance from source to
receptor.  Within this framework, each of the results in the literature
may be obtained for the appropriate values of the relevant variables.
      This suggests strongly that the eventual quantification of these
relationships will proceed along the lines outlined here.  This model is
the first to reconcile the conflict through a treatment which provides
understanding of the fundamental physical processes involved.  It allows
air pollution engineers to state with some certainty that pollutant
concentration distributions resulting from a point source are neither
lognormal nor chi-squared, but rather a subtle combination which depends
upon the particular conditions under which the pollutant is measured.

-------
                          Related Variables
      The main point to be made here is that pollutant concentrations
are tracers of atmospheric motion.  As  such, air quality frequency
distribution data can be used as a partial substitute for meteorological
data under certain conditions. An example is presented in Chapter VI.
      To illustrate this point the cases of advection, diffusion and
deposition are treated.  Advective transport rates have been  investigated
empirically, and the lognormal distribution appears to fit  quite well.
Eddy diffusive transport rates can be demonstrated to be approximately
lognormal by Kolmogorov's  similarity theory argument, based  upon
energy exchange between different scales of turbulent motion.  Deposi-
tion is based upon particle size distributions, which can be shown to be
approximately lognormally distributed from both empirical and non-
empirical arguments.
      According to the "simple model" proposed and effectively applied
by Gifford and Hanna, there is substantial  correlation between windspeeds
and pollutant concentrations. Based upon the lognormality of transport
this statement is well motivated for nonreactive pollutants, especially
in areas with relatively small source terms.  In some cases  the deposi-
tion component of the source term is also lognormally distributed,
further contributing to the argument for  the lognormality of pollutant
concentrations.

                         Other Distributions
      A number of authors have investigated the use of other  frequency
distributions to fit air quality data.  The gamma, Weibull  and beta dis-
tributions have received a good deal of attention.  These distributions
tend to fit marginally worse than the two-parameter lognormal, according
to an extensive study by Lynn.
      The fundamental questions are then:  Why do  these distributions do
as well as indicated in the literature, and is there any theoretical sup-
port for these distributions to be used to characterize air  quality data?
      There are no reports in the literature presenting any nonempirical
support for these distributions.  However, there must be some
mathematical similarity between these distributions to account for
the fits which have been observed.

-------
      In this work a transformation has been found which transforms the
lognormal, Weibull and gamma distributions to approximately the same
form for typical parameter values observed in air quality data fits.  This
is indicative of a fundamental mathematical similarity between the dis-
tributions which demonstrates that if any one of the distributions fits the
data the others must also, with only small differences in the goodness of
fit.
      This argument is significant in terms of  the long standing discus-
sion in  the scientific community concerning which distribution is most
appropriate, in that it gives a greater  mathematical understanding of the
goodness of fit observations.  It also suggests  a path for future research
to determine the basic differences between the higher order terms of
each distribution and how they relate to the physical processes at hand.

                             Applications
      It is  instructive to examine some of the modeling possibilities
opened  by the results presented here to demonstrate their utility.  The
applications outlined herein  are actually being  developed, or have been
proposed for future development, in the author's work at the Lawrence
Livermore Laboratory.  It is expected that their utility  will be dem-
onstrated as the Laboratory effort progresses  over the next several
years.
      Frequency distributions can be used to characterize meteorological
and air  quality patterns  which have application in land use modeling and
pollution level forecasting.  They can also be used in air pollution dis-
persion modeling,  and in the validation of such models.
      The results  presented herein and the accepted results in the litera-
ture justify these modeling concepts more firmly than the latter alone.

                           Future Research
      For area sources  a major question to be addressed is the pre-
diction  of the parameters of the concentration distribution.   Research
in this area is being conducted at present (30)(31).
      The author has further work planned in studying the relationship of
meteorological parameters to concentration distribution, particularly in
the relationship of windspeeds to concentration.

-------
      The point source question is more complicated because of the
changing identity of the distribution.  The fundamental question here
concerns the change in shape and parameters of the distribution as a
function of windspeed, stability, reactivity of pollutant and distance.
Perhaps other variables like stack height will also be significant.  It
will take a good deal more data than is currently available to produce
definitive results on this matter.
      The author has new  work planned to delve more deeply into the
matter  of concentration distributions resulting from a point source.
Simulation experiments are  planned  using the ADPIC (32) model to cal-
culate such pollutant concentrations  at various distances from the source
under various meteorological regimes.  The resulting frequency dis-
tributions will be compared  with those predicted  in Chapter II.   This
work  should be completed  in 1974.
      The fundamental question concerning the identity of the distribu-
tions  is not yet completely resolved.  Additional  theoretical and
empirical support is still  welcome,  despite the strong arguments made
in this work and in previous  published reports.

-------
                       LITERATURE CITED

1.   R. I. Larsen and C. E. Zimmer, "Calculating Air Quality and Its
     Control," JAPCA, 15, 565 (1965).
     R. I. Larsen, "Analyzing Air Pollutant Concentration and Dosage
     Data," JAPCA, 17, 85 (1967).
     R. I. Larsen and C. E. Zimmer, "A New Mathematical Model of
     Air Pollutant Concentration Averaging Time and Frequency,"
     JAPCA, 19, 24 (1969).
2.   J. B. Knox and R. Lange, "Surface Air Pollutant Concentration
     Frequency Distribution:  Implications for Urban Air Pollution
     Modelling," University of California, Lawrence Livermore
     Laboratory, Report UCRL-73887 (1972).
3.   M. Benarie, "The Use of the Relationship Between Wind Velocity
     and Ambient Pollutant Concentration Distributions for the
     Estimation of Average Concentrations from Gross Meteorological Data,"
     Proceedings of the Symposium on Statistical Aspects of Air Quality
     Data, Chapel Hill, North Carolina, November 1972.
     M. Benarie, "Sur La Validite De La Distribution Logarithmico-
     Normale Des Concentrations De Pollutant," Second International
     Clean Air Congress, 1970.
4.   A. C. Stern, "Air Pollution," Vol. III, 2nd Ed. (Academic Press,
     New York, 1968).
5.   F. Gifford, "Statistical Properties of a Fluctuating Plume
     Dispersion Model," Proceedings of the Symposium on Atmospheric
     Diffusion and Air Pollution, Oxford, August 1958.
6.   F. Gifford, "The Form of the Frequency Distributions of Air
     Pollutant Concentrations," Proceedings of the Symposium on
     Statistical Aspects of Air Quality Data, Chapel Hill, North
     Carolina, November 1972.
7.   J. B. Knox and R. I. Pollack, "An Investigation of the Frequency
     Distributions of Surface Air Pollutant Concentrations," Symposium
     on Statistical Aspects of Air Quality Data, Chapel Hill, North
     Carolina, November 1972.

-------
 8.   M. C. MacCracken, T. V. Crawford, K. R. Peterson and
      J. B. Knox, "Initial Application of a Multi-Box Air Pollution
      Model to the San Francisco Bay Area," University of California,
      Lawrence Livermore Laboratory, Report UCRL-73994 (1972).
 9.   C. Hopper, personal communication, 1972.
10.   N. A. Fuchs, The Mechanics of Aerosols (Pergamon Press,
      New York, 1964).
11.   F. A. Gifford and S. R. Hanna, "Modeling Urban Air Pollution,"
      ARATDL Contribution No. 63, 1972.
12.   F. A. Gifford and S. R. Hanna, "Urban Air Pollution Modelling,"
      presented at the 1970 International Air Pollution Conference of
      the International Union of Air Pollution Prevention Associations.
13.   A. N. Kolmogoroff, Dokl. AN SSSR, 30, 301 (1941).
14.   A. M. Yaglom, Dokl. AN SSSR, 166, 49 (1966).
15.   A. S. Gurvich, Dokl. AN SSSR, 172, 554 (1967).
16.   G. K. Batchelor, "The Application of the Similarity Theory of
      Turbulence to Atmospheric Diffusion," Quart. J. Roy. Met. Soc.,
      76, 133 (1950).
17.   T. V. Crawford, "Atmospheric Diffusion of Large Clouds,"
      Proceeding of the USAEC Meteorological Information Meeting,
      September 1967, Chalk River, Ontario, Canada, Rept. AECL-2787
      (1968).
18.   T. V. Crawford, "A Computer Program for Calculating the
      Atmospheric Dispersion of Large Clouds," University of California,
      Lawrence Livermore Laboratory Report UCRL-50179 (1966).
19.   I. H. Blifford and D. A. Gillette, "Applications of the Lognormal
      Frequency Distribution to the Chemical Composition and Size
      Distribution of Naturally Occurring Atmospheric Aerosols," Water,
      Air and Soil Pollution, 1, 106 (1971).
20.   E. A. Shuck, J. N. Pitts, and J. K. S. Wan, "Relationships Between
      Certain Meteorological Factors and Photochemical Smog," Intern. J.
      Air Water Pollution, 10, 689 (1966).
21.   R. L. Mitchell, "Permanence of the Lognormal Distribution,"
      J. Opt. Soc. Am., 58, 1267 (1968).
22.   D. S. Lynn, "Fitting Curves to Suspend Particulate Data,"
      Proceedings of the Symposium on Statistical Aspects of Air Quality Data,
      Chapel Hill, North Carolina, November 1973.

-------
23.   P. G. Mikolaj, "Environmental Applications of the Weibull
      Distribution Function:  Oil Pollution," Science, 176, 1019 (1972).
24.   R. E. Barlow, "Averaging Time and Maxima for Air Pollution
      Concentration," NTIS AD-729 413, ORC 71-17.
25.   W. Weibull, "A Distribution Function of Wide Applicability,"
      J. Appl. Mech., 293 (1951).
26.   E. Lawrence, "Urban Climate and Day of the Week," Atmos.
      Environ., 5, 935 (1971).
27.   J. C. Gower, "A Comparison of Some Methods of Cluster
      Analysis," Biometrics, 23, 623 (1967).
28.   R. G. Miller, "Statistical Predictions by Discriminant Analysis,"
      Meteorol. Monographs, 4, 25 (1962).
29.   C. L. Smalley, "A Survey of Air Flow Patterns in the San
      Francisco Bay Region 1952-1955," Bay Area Air Pollution Control
      District Technical Services Division Report.
30.   R. Thullier, "Air Quality Statistics in Land Use Planning
      Applications," 3rd Conf. on Probability and Statistics in Atmospheric
      Science, Boulder, Colo., June 19-22, 1973.
31.   W. B. Johnson, "The Status of Air Quality Simulation Modeling,"
      Proceedings of the Interagency Conference on the Environment,
      Livermore, California, October 1972.
32.   R. Lange, personal communication, 1973.

-------
                                   TECHNICAL REPORT DATA
 1. REPORT NO.
    EPA-650/4-75-004
 3. RECIPIENT'S ACCESSION NO.
 4. TITLE AND SUBTITLE
    STUDIES OF POLLUTANT CONCENTRATION FREQUENCY DISTRIBUTIONS
 5. REPORT DATE
    January 1975
 6. PERFORMING ORGANIZATION CODE
 7. AUTHOR(S)
    Richard I. Pollack
 8. PERFORMING ORGANIZATION REPORT NO.
 9. PERFORMING ORGANIZATION NAME AND ADDRESS
    Lawrence Livermore Laboratory
    University of California
    Livermore, California  94550
10. PROGRAM ELEMENT NO.
    1AA009
11. CONTRACT/GRANT NO.
12. SPONSORING AGENCY NAME AND ADDRESS
    Office of Research and Development
    U.S. Environmental Protection Agency
    Research Triangle Park, N.C.  27711
13. TYPE OF REPORT AND PERIOD COVERED
    Final
14. SPONSORING AGENCY CODE
15. SUPPLEMENTARY NOTES
16. ABSTRACT
                 air pollution research focused on determining the identity of the
  concentration distributions for a variety of pollutants and locations and the
  relationships between attributes of the data, e.g. mean values, maximum levels and averaging
  times, from an empirical standpoint.  This report attempts to identify the nature of
  the frequency distributions for both reactive and inert pollutants, for both point and
  area sources, and to some extent for different types of atmospheric conditions using a
  substantially non-empirical approach.  As an illustration of the applicability of these
  results, a predictive model and a monitoring scheme are proposed based upon knowledge
  developed by studying the frequency distributions.
      It is found that a theory of the genesis of pollutant concentrations based upon
  the Fickian diffusion equation predicts that concentration distributions due to area
  sources will be approximately lognormal over a diurnal cycle in the absence of nearby
  strong sources.  It is determined that reactive pollutants will have larger standard
  geometric deviations than relatively inert pollutants.  Empirical observations are in
  good agreement with these results.  The frequency distribution of the logarithms of
  concentrations due to point sources is derived and shown to be a sum of normal and
  chi-squared components, with the identity of the dominant term determined by meteorological
  conditions.  This result provides a framework for resolving apparently conflicting
  results in the literature.  The lognormality of other meteorological variables, notably
  windspeeds and the rate of energy dissipation in turbulent flow, and their relation to
  air quality frequency distributions is discussed.  There is considerable discussion in
  the literature concerning whether the lognormal distribution provides the best fit.
  Other distributions that fit air quality data fairly well are investigated, and their
  mathematical similarity to the lognormal is demonstrated.
    KEY WORDS AND DOCUMENT ANALYSIS
    DESCRIPTORS
      Air pollutants
      Frequency distribution
      Monitoring
      Modeling
    b. IDENTIFIERS/OPEN ENDED TERMS
    c. COSATI Field/Group
18. DISTRIBUTION STATEMENT
    Unlimited
19. SECURITY CLASS (This Report)
    Unclassified
20. SECURITY CLASS (This page)
    Unclassified
21. NO. OF PAGES
    94
22. PRICE
EPA Form 2220-1 (9-73)

U.S. GOVERNMENT PRINTING OFFICE: 1975 - 640-881/659 - Region 4

-------

-------