The maximum likelihood approach to probabilistic modeling of air quality data

United States Environmental Protection Agency
Environmental Monitoring and Support Laboratory
Research Triangle Park NC 27711
EPA-600/4-79-044
July 1979
Research and Development

The Maximum Likelihood Approach to Probabilistic Modeling of Air Quality Data

-------
                RESEARCH REPORTING SERIES

Research reports of the Office of Research and Development, U.S. Environmental
Protection Agency, have been grouped into nine series. These nine broad
categories were established to facilitate further development and application of
environmental technology. Elimination of traditional grouping was consciously
planned to foster technology transfer and a maximum interface in related fields.
The nine series are:

      1.  Environmental Health Effects Research
      2.  Environmental Protection Technology
      3.  Ecological Research
      4.  Environmental Monitoring
      5.  Socioeconomic Environmental Studies
      6.  Scientific  and Technical Assessment Reports (STAR)
      7.  Interagency Energy-Environment Research and Development
      8.  "Special" Reports
      9.  Miscellaneous Reports

This report has been assigned to the ENVIRONMENTAL MONITORING series.
This series describes research conducted to develop new or improved methods
and instrumentation for the identification and quantification of environmental
pollutants at the lowest conceivably significant concentrations. It also includes
studies to determine the ambient concentrations of pollutants in the environment
and/or the variance of pollutants as a function of time or meteorological factors.
This document is available to the public through the National Technical
Information Service, Springfield, Virginia 22161.

-------
       THE MAXIMUM LIKELIHOOD APPROACH TO
   PROBABILISTIC MODELING OF AIR QUALITY DATA
                        by
    Terence Fitz-Simons and David M. Holland
Environmental Monitoring and Research Laboratory
         Environmental Protection Agency
       Research Triangle Park, N. C. 27711
 ENVIRONMENTAL MONITORING AND SUPPORT LABORATORY
       OFFICE OF RESEARCH AND DEVELOPMENT
      U.S.  ENVIRONMENTAL PROTECTION AGENCY
       RESEARCH TRIANGLE PARK, N.  C.  27711

-------
                                 DISCLAIMER
     This report has been reviewed by the Environmental Monitoring and
Support Laboratory, U.S. Environmental Protection Agency, and approved
for publication.  Mention of trade names or commercial products does
not constitute endorsement or recommendation for use.

-------
                                  FOREWORD

     Measurement and monitoring research efforts are designed to anticipate
potential environmental problems, to support regulatory actions by developing
an in-depth understanding of the nature and processes that impact health and
the ecology, to provide innovative means of monitoring compliance with
regulations, and to evaluate the effectiveness of health and environmental
protection efforts through the monitoring of long-term trends.  The Environmental
Monitoring and Support Laboratory, Research Triangle Park, North Carolina, is
responsible for development of: environmental monitoring technology and
systems; agency-wide quality assurance programs for air pollution measurement
systems; and technical support to the Agency's operating functions, including
the Office of Air, Noise and Radiation, the Office of Toxic Substances, and
the Office of Enforcement.

     In order for the laboratory to effectively analyze the data generated
by these activities, statistical assumptions must be made regarding the
underlying distribution of these data.  This report presents a tool to aid
in evaluating these assumptions.  Any changes in the material presented in
this report will be presented in future reports.
                                                Thomas R. Hauser
                                                   Director
                                          Environmental Monitoring and
                                               Support Laboratory

-------
                                ABSTRACT
     Software developed by the authors using maximum likelihood estimation
to fit six probabilistic models is presented.  The software is designed
as a tool for the air pollution researcher to determine what assumptions
are valid in the statistical analysis of air pollution data for the
purposes of standard setting, roll-back calculations, estimation of
maximum concentrations, threshold approximations, and handling missing
observations.  The program fits the user's data to the normal distribution,
the 3-parameter lognormal distribution, the 3-parameter Weibull
distribution, the 3-parameter gamma distribution, the Johnson SB
distribution (a 4-parameter lognormal distribution), and the 4-parameter
beta distribution.  The parameters are estimated using standard closed-form
solutions to the maximizing equations, Gaussian elimination to solve
non-linear maximizing equations where possible, and a golden section search
for all other parameters.  Graphical output contains a histogram of the data
with the fitted density of each model superimposed.  Six goodness-of-fit
criteria are supplied and ranked by the program to aid in selecting the
most appropriate of the six models.  These criteria are absolute deviations
(AD statistic), weighted absolute deviations (WAD statistic), the
Kolmogorov-Smirnov statistic, the Cramer-von Mises-Smirnov statistic, the
log-likelihood function, and the observed significance level of the
Chi-square goodness-of-fit test.  The results of applying the program to
several subsets of the Los Angeles Catalyst Study data base are presented.

-------
                                 CONTENTS

Foreword
Abstract
Figures
     1.  Introduction
     2.  Methodology
     3.  Comparing Distributions
     4.  Application of MAXFIT
     5.  Future Developments
     6.  Summary
References
Appendix A
Appendix B

-------
                                  FIGURES
Number
  1  MAXFIT Output:  Fit of normal distribution for data in Example 1
  2  MAXFIT Output:  Fit of 3-parameter lognormal distribution for data in Example 1
  3  MAXFIT Output:  Fit of Gamma distribution for data in Example 1
  4  MAXFIT Output:  Fit of Weibull distribution for data in Example 1
  5  MAXFIT Output:  Fit of Johnson SB distribution for data in Example 1
  6  MAXFIT Output:  Fit of Beta distribution for data in Example 1
  7  MAXFIT Output:  Fit of the normal distribution for data in Example 2
  8  MAXFIT Output:  Fit of the 3-parameter lognormal distribution for data in Example 2
  9  MAXFIT Output:  Fit of the Gamma distribution for data in Example 2
 10  MAXFIT Output:  Fit of the Weibull distribution for data in Example 2
 11  MAXFIT Output:  Fit of the Johnson SB distribution for data in Example 2
 12  MAXFIT Output:  Fit of the Beta distribution for data in Example 2

                                TABLES
Number
  1  Available Distributions and Identifying Numbers
  2  MAXFIT Output:  Comparison Statistics for Data in Example 1
  3  MAXFIT Output:  Comparison Statistics for Data in Example 2

-------
                                 SECTION 1

                               INTRODUCTION


     There is a growing interest in the application of probability models to
air quality data in areas such as:

          standard setting,
          emission roll-back calculations,
          estimation of maximum concentrations,
          threshold approximations, and
          handling missing observations.

Selection of a valid probability model to describe air quality data is a
difficult problem involving many factors.  Larsen (1,2,3,4,5) proposed the
two-parameter and three-parameter univariate lognormal models for describing
air quality data collected in urban areas.  Mage and Ott (6,7) introduced
censoring to the three-parameter lognormal probability model whenever the
third parameter was negative.  These authors concluded that this model was a
superior, general-purpose model suitable for a large variety of environmental
phenomena.  Standard logarithmic probability paper was used to fit their
proposed probability model.  Curran and Frank (8) used semi-log graph paper
to plot 1-F(x) vs. pollutant concentration as a technique for examining the
behavior of the upper tails of air quality data.  Curran and Frank pointed
out that fitting techniques based upon two selected percentiles may be
sensitive to the choice of an underlying distribution, and that the stability
of these estimates for varying choices of percentiles needs investigation.
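The semi-log technique attributed to Curran and Frank can be sketched in a few lines: sort the observations, estimate F at each one, and pair each concentration with log10(1 - F) so that the upper tail can be inspected on a linear axis. This is a minimal Python illustration, not code from MAXFIT; the plotting position i/(n+1) used to estimate F and the helper name `exceedance_points` are assumptions.

```python
import math

def exceedance_points(x):
    """Pair each sorted observation with its estimated exceedance probability.

    F is estimated with the plotting position i/(n+1) (an assumption; other
    plotting positions are in use), so 1 - F stays strictly inside (0, 1)
    and its logarithm is always defined.
    """
    xs = sorted(x)
    n = len(xs)
    points = []
    for i, xi in enumerate(xs, start=1):
        exceed = 1.0 - i / (n + 1)
        points.append((xi, exceed, math.log10(exceed)))
    return points

# Hypothetical concentrations (ppm), for illustration only.
for conc, p, logp in exceedance_points([0.031, 0.012, 0.024, 0.047, 0.019]):
    print(f"{conc:.3f}  1-F = {p:.3f}  log10(1-F) = {logp:.3f}")
```

An exponential-type upper tail shows up as a roughly straight line of log10(1 - F) against concentration.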

     Computer software providing efficient estimation of parameters and
evaluation of goodness-of-fit for several probability models for ambient air
quality data has not been available.  A computer program, MAXFIT, is presented
here for the purpose of identifying a suitable probability model to describe
ambient air quality data using maximum likelihood techniques.

-------
                                 SECTION 2

                                METHODOLOGY


     The approach employed is maximum likelihood estimation.  This is a
simple method that yields estimates for almost any parametric model, and its
estimators have several statistically desirable properties.  The various
distribution models are compared by goodness-of-fit statistics, providing
an objective basis for choosing one of the six models for a particular
application.

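     The main idea behind maximum likelihood estimation may be stated very
simply.  Given a random sample $x_1, \ldots, x_n$ from a population distributed
according to a density $f(x;\alpha,\beta,\lambda)$ with parameters $\alpha$, $\beta$, and
$\lambda$, the maximum likelihood estimates $\hat\alpha$, $\hat\beta$, and $\hat\lambda$ are found such
that the likelihood that the random sample came from $f(x;\hat\alpha,\hat\beta,\hat\lambda)$ is
greater than or equal to the likelihood that the sample came from
$f(x;\alpha^*,\beta^*,\lambda^*)$, where $\alpha^*$, $\beta^*$, and $\lambda^*$ are any other estimates of
$\alpha$, $\beta$, and $\lambda$:

$$\prod_{i=1}^{n} f(x_i;\hat\alpha,\hat\beta,\hat\lambda) \;\ge\; \prod_{i=1}^{n} f(x_i;\alpha^*,\beta^*,\lambda^*).$$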

     Under certain assumptions discussed in Rao (9), maximum likelihood
estimators have statistically desirable characteristics.  First, they are
consistent.  Consistent estimators become "better" as the sample size
increases: if they are biased, the bias decreases as the sample size gets
larger, and their variance lessens as the sample size increases.  Second,
they converge to the true values of the parameters with probability one; if
the entire population were included, the estimators would equal the
parameters with probability one.  Third, as n (the sample size) approaches
infinity, the sampling distribution of the estimators approaches a normal
distribution.

     The method of fitting using maximum likelihood is described below:

     •    The likelihood function is defined as the joint density of the
          random sample.

     •    The logarithm of the likelihood function is determined.  This
          facilitates differentiation, and since log x increases whenever
          x does, the likelihood and the log-likelihood reach a maximum at
          the same point.

     •    Differentiation techniques are employed to maximize the
          log-likelihood.  If these techniques fail to yield equations which
          can be solved algebraically, the correct values of the parameters
          must be found by an iterative search.  This usually requires a
          computer.
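For the normal distribution the three steps above can be carried out by hand: setting the derivatives of the log-likelihood with respect to μ and σ² to zero gives μ̂ = x̄ and σ̂² = Σ(x_i - x̄)²/n (divisor n, not n - 1). The Python sketch below is illustrative only and not part of MAXFIT; it evaluates the log-likelihood and checks that other parameter values do no better. The sample values are hypothetical.

```python
import math

def normal_loglik(x, mu, sigma2):
    """Log-likelihood of the sample x under a normal(mu, sigma2) model."""
    n = len(x)
    ss = sum((xi - mu) ** 2 for xi in x)
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - ss / (2.0 * sigma2)

def normal_mle(x):
    """Closed-form solution of the likelihood equations for the normal model."""
    n = len(x)
    mu = sum(x) / n
    sigma2 = sum((xi - mu) ** 2 for xi in x) / n  # MLE uses divisor n
    return mu, sigma2

x = [0.012, 0.018, 0.021, 0.025, 0.030, 0.041]  # hypothetical data (ppm)
mu_hat, s2_hat = normal_mle(x)
best = normal_loglik(x, mu_hat, s2_hat)

# Any other choice of (mu, sigma2) gives a log-likelihood no larger than the MLE.
for mu, s2 in [(mu_hat + 0.002, s2_hat), (mu_hat, 1.5 * s2_hat), (0.02, 1e-4)]:
    assert normal_loglik(x, mu, s2) <= best
print(mu_hat, s2_hat, best)
```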

     FORTRAN software utilizing these concepts was developed based on a
program written by Schreuder et al. (10) making use of several IMSL
(International Mathematical and Statistical Library, 1975) subroutines.  While
modifying this program to run on the EPA UNIVAC 1110 situated at Research
Triangle Park, North Carolina, several new capabilities were incorporated.
These include the estimation of the lower boundary parameter in the
3-parameter lognormal (Johnson SL), gamma, Weibull, 4-parameter beta, and
4-parameter lognormal (Johnson SB) distributions.  Boundary parameters are
estimated by a golden section search (11).  The golden section search is a
fairly efficient technique, accomplishing a 38 percent reduction in the
interval of uncertainty for each iteration and requiring one evaluation of
the function per iteration.  Graphic output was also included to provide a
visual display of the fit.
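A golden section search for the maximum of a one-dimensional function can be sketched as below. This is a generic Python illustration, not the MAXFIT routine, and it assumes the function is unimodal on the starting interval; the name `golden_section_max` and the stopping rule (interval width below `tol`) are choices made here. Each iteration keeps a fraction r = (√5 - 1)/2 ≈ 0.618 of the interval, which is the 38 percent reduction quoted above, and reuses one interior point so that only one new function evaluation is needed.

```python
import math

R = (math.sqrt(5.0) - 1.0) / 2.0  # ~0.618, fraction of the interval kept per step

def golden_section_max(f, a, b, tol=1e-4):
    """Return an approximate maximizer of a unimodal function f on [a, b]."""
    c = b - R * (b - a)
    d = a + R * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc >= fd:
            # Maximum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - R * (b - a)
            fc = f(c)          # the only new evaluation this iteration
        else:
            # Maximum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + R * (b - a)
            fd = f(d)          # the only new evaluation this iteration
    return (a + b) / 2.0

print(round(1.0 - R, 3))  # fraction of the interval removed per iteration
print(golden_section_max(lambda t: -(t - 2.0) ** 2, 0.0, 5.0))
```

For a boundary parameter, f would presumably be the log-likelihood viewed as a function of the lower bound alone, with the remaining parameters set to their closed-form estimates for each trial value; that wiring is an assumption, not a description of MAXFIT's code.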

     A list of the distributions that were fitted and the estimates used to
fit them is shown in Appendix A.  All summations are from 1 to the number of
observations in the data set (n).

-------
                                 SECTION 3

                          COMPARING DISTRIBUTIONS


     The program MAXFIT supplies the user with six goodness-of-fit statistics
as a basis for comparing fitted distributions.  The statistics supplied are:
absolute deviations, weighted absolute deviations, the observed significance
level of the Chi-square (χ²) test, the Kolmogorov-Smirnov statistic, the
Cramer-von Mises-Smirnov statistic, and the maximum value of the
log-likelihood function.  All these statistics are computed on grouped data.
However, the log-likelihood function is also computed using ungrouped data
when the original data are available.

     It should be noted that model validity tests are designed to reject the
null hypothesis that the data follow a certain distribution as seldom as
possible when the hypothesis is true.  These statistics are therefore designed
to indicate with some certainty when data do not follow the distribution
stated in the null hypothesis, rather than how closely they follow it.
However, the literature discloses no other method to reveal how closely the
data fit a distribution.  The six statistics are defined and discussed
briefly below.


ABSOLUTE DEVIATIONS (AD)

     This statistic puts less emphasis on large deviations between expected
and observed frequencies than does the classic χ² statistic.

where  P_oi = observed proportion of data falling in the i-th interval
       P_ei = expected proportion of data falling in the i-th interval
          n = total number of observations in the data set
          N = total number of intervals.


WEIGHTED  ABSOLUTE DEVIATIONS (WAD)

     The  WAD statistic puts less emphasis upon the fit in the tails of the
distribution.

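CHI-SQUARE STATISTIC (χ²)

     The χ² statistic is very popular and widely used; however, it has poor
power characteristics and is considered unreliable unless intervals have
expected frequencies equal to or greater than five.  In terms of the
observed and expected proportions P_oi and P_ei,

$$\chi^2 = n\sum_{i=1}^{N}\frac{(P_{oi}-P_{ei})^2}{P_{ei}}.$$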
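The regrouping and the statistic can be sketched in Python as follows, working with frequencies (counts), for which the Pearson statistic is Σ(O - E)²/E. The merging rule here (combine adjacent classes from left to right until the expected frequency reaches five, then fold any short remainder into the last class) is an assumption made for illustration; MAXFIT's exact regrouping rule is not spelled out in this section, and the helper names are hypothetical.

```python
def regroup(observed, expected, min_expected=5.0):
    """Merge adjacent classes until each has expected frequency >= min_expected.

    Greedy left-to-right merging (an assumed rule); a short tail left over
    at the end is folded into the last merged class.
    """
    obs_out, exp_out = [], []
    o_acc = e_acc = 0.0
    for o, e in zip(observed, expected):
        o_acc += o
        e_acc += e
        if e_acc >= min_expected:
            obs_out.append(o_acc)
            exp_out.append(e_acc)
            o_acc = e_acc = 0.0
    if o_acc > 0 or e_acc > 0:
        if exp_out:
            obs_out[-1] += o_acc
            exp_out[-1] += e_acc
        else:
            obs_out.append(o_acc)
            exp_out.append(e_acc)
    return obs_out, exp_out

def chi_square(observed, expected):
    """Pearson chi-square statistic on (regrouped) frequencies."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical class frequencies, for illustration only.
obs, exp = regroup([2, 3, 10, 8, 1, 1], [1, 4, 9, 9, 2, 1])
print(obs, exp, chi_square(obs, exp))
```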
KOLMOGOROV-SMIRNOV STATISTIC (d)

     The Kolmogorov-Smirnov statistic is an empirical distribution function
(EDF) statistic; the EDF is the observed cumulative distribution function.
The statistic is the maximum distance between the observed and the expected
cumulative frequency distributions:

$$d = \max_i \left| S_n(x_i) - F(x_i) \right|$$

where  S_n(x_i) = observed cumulative probability at the i-th observed value
         F(x_i) = expected cumulative probability at the i-th observed value.
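The definition translates directly into code. This Python sketch works on ungrouped observations and takes the fitted distribution function as a callable; the helper name `ks_d` is hypothetical, and MAXFIT itself computes its statistics on grouped data.

```python
import bisect

def ks_d(x, cdf):
    """d = max over observations of |S_n(x_i) - F(x_i)|.

    S_n(x_i) is the fraction of observations <= x_i, so tied values
    share the same observed cumulative probability.
    """
    xs = sorted(x)
    n = len(xs)
    return max(abs(bisect.bisect_right(xs, xi) / n - cdf(xi)) for xi in xs)

# Hypothetical check against a uniform(0, 1) model, F(x) = x.
print(ks_d([0.1, 0.5, 0.9], lambda t: t))
```

This follows the definition above, which compares the two functions only at the observed values; the usual two-sided statistic also compares F(x_i) with the observed cumulative probability just below each observation, (i - 1)/n, and so can be slightly larger.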



CRAMER-VON MISES-SMIRNOV STATISTIC (W²)

     The Cramer-von Mises-Smirnov statistic is the sum of squares of the
distances between the observed cumulative distribution and the expected
cumulative distribution function, weighted by the expected probabilities.
This test puts less emphasis upon the tails of the distribution when
measuring the fit.
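Read literally, the description above suggests the grouped-data computation sketched below in Python. It is an interpretation, not MAXFIT's code: the exact formula is not given in this section, and grouped versions of the statistic in the literature are often multiplied by the number of observations n, which would not change the ranking of models fitted to the same data. The helper name `cvm_grouped` is hypothetical.

```python
def cvm_grouped(p_obs, p_exp):
    """Grouped-data Cramer-von Mises-Smirnov sketch (one reading of the text).

    p_obs, p_exp: observed and expected class proportions in class order.
    Sums the squared gaps between cumulative observed and cumulative
    expected proportions, each weighted by the expected class proportion.
    """
    cum_obs = cum_exp = 0.0
    w2 = 0.0
    for po, pe in zip(p_obs, p_exp):
        cum_obs += po
        cum_exp += pe
        w2 += (cum_obs - cum_exp) ** 2 * pe
    return w2

# Hypothetical two-class example.
print(cvm_grouped([0.5, 0.5], [0.25, 0.75]))
```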

LIKELIHOOD

     The likelihood is also presented as a goodness-of-fit statistic.  Since
the fitting is done by the maximum likelihood criterion, it is consistent to
prefer the distribution with the larger maximized likelihood when choosing
between two or more distributions having the same number of parameters.  Such
a comparison is also the basis of the Neyman-Pearson theorem (9), which deals
with the construction of a most powerful test of a simple hypothesis against
a simple alternative involving two distributions.

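     The maximized log-likelihood functions of the various distributions are
shown below, where a circumflex (^) denotes the maximum likelihood estimate
of a parameter and all summations run from i = 1 to n.  For the Johnson SL
and SB distributions, σ̂² is the estimated variance of log[x - λ] and of
log[(x - λ)/(C - x)], respectively.

•  Normal distribution:

$$\ell = -n\left(\tfrac{1}{2} + \tfrac{1}{2}\log[2\pi] + \tfrac{1}{2}\log[\hat\sigma^2]\right)$$

•  Johnson SL (3-parameter lognormal) distribution:

$$\ell = -n\left(\tfrac{1}{2} + \tfrac{1}{2}\log[2\pi] + \tfrac{1}{2}\log[\hat\sigma^2] + \tfrac{1}{n}\sum\log[x_i-\hat\lambda]\right)$$

•  Johnson SB distribution:

$$\ell = -n\left(\tfrac{1}{2} + \tfrac{1}{2}\log[2\pi] + \tfrac{1}{2}\log[\hat\sigma^2] - \log[\hat C-\hat\lambda] + \tfrac{1}{n}\sum\log[\hat C-x_i] + \tfrac{1}{n}\sum\log[x_i-\hat\lambda]\right)$$

•  Gamma distribution:

$$\ell = -n\left(\hat\alpha\log[\hat\beta] + \log[\Gamma(\hat\alpha)] - \frac{\hat\alpha-1}{n}\sum\log[x_i-\hat\lambda] + \hat\alpha\right)$$

•  Weibull distribution:

$$\ell = -n\left(1 - \frac{\hat\alpha-1}{n}\sum\log[x_i-\hat\lambda] + [\hat\alpha-1]\log[\hat\beta] - \log[\hat\alpha/\hat\beta]\right)$$

•  Beta distribution:

$$\ell = n\left(\log\left[\frac{\Gamma(\hat\alpha+\hat\beta)}{\Gamma(\hat\alpha)\,\Gamma(\hat\beta)}\right] - [\hat\alpha+\hat\beta-1]\log[\hat C-\hat\lambda]\right) + [\hat\alpha-1]\sum\log[x_i-\hat\lambda] + [\hat\beta-1]\sum\log[\hat C-x_i]$$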
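A closed-form log-likelihood of this kind can be checked numerically. The Python sketch below does this for the 3-parameter lognormal (Johnson SL) case: for a fixed lower bound λ it computes the maximum likelihood estimates of μ and σ² from log(x_i - λ), evaluates the closed-form expression, and compares it with a direct sum of log densities. The data and trial values of λ are hypothetical, λ is held fixed rather than searched, and the helper names are illustrative only.

```python
import math

def lognormal3_logpdf(xi, mu, sigma2, lam):
    """Log density of the 3-parameter lognormal: log(x - lam) ~ N(mu, sigma2)."""
    y = math.log(xi - lam)
    return -y - 0.5 * math.log(2.0 * math.pi * sigma2) - (y - mu) ** 2 / (2.0 * sigma2)

def lognormal3_closed_form(x, lam):
    """MLEs of mu and sigma2 for a fixed lam, and the closed-form log-likelihood."""
    n = len(x)
    logs = [math.log(xi - lam) for xi in x]
    mu = sum(logs) / n
    sigma2 = sum((v - mu) ** 2 for v in logs) / n  # divisor n, as in the MLE
    loglik = -n * (0.5 + 0.5 * math.log(2.0 * math.pi)
                   + 0.5 * math.log(sigma2) + sum(logs) / n)
    return mu, sigma2, loglik

x = [0.007, 0.012, 0.020, 0.024, 0.031, 0.046]  # hypothetical data (ppm)
for lam in (0.0, -0.005, 0.005):                # lam must stay below min(x)
    mu, sigma2, closed = lognormal3_closed_form(x, lam)
    direct = sum(lognormal3_logpdf(xi, mu, sigma2, lam) for xi in x)
    print(lam, closed, direct)
```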

-------
                                 SECTION 4

                           APPLICATION OF MAXFIT
INPUTS

     Subroutine USER must be supplied by the user to pass certain control
variables to MAXFIT.  An example of this subroutine is given in Appendix B.

     The subroutine argument list must be as follows:

      SUBROUTINE USER(X,P,XB,NGRP,TF,N,NC,BEGIN,STEP,TOL,TITLE,XMIN,
     &                XMAX,NDIST)

     The variables that must be provided by the user are defined as follows:

    X =   array of length m ≤ 2000 of class midpoints if the data are grouped
          into classes, or of individual observations if the data are not
          grouped.

    P =   an array of length n ≤ 100 of class probabilities for data grouped
          into classes.  This array is not defined for ungrouped data.

   XB =   an array of length p = n+1, consisting of the lower limits of each
          class, with the last element being the upper limit of the last
          class.  It is only needed if NGRP=1.  Two additional classes will
          be created by MAXFIT to cover the entire range of the distribution
          being fitted.

 NGRP =   1 if data are grouped, 0 if data are individual observations.
          This variable indicates whether the statistics calculated by MAXFIT
          are to be based on individual observations or on class frequencies.

NDIST =   an array of length 6, containing the identifying numbers for the
          desired distributions.

   NC =   number of classes if class limits, XB, are specified by the user.
          These limits must be specified if NGRP=1.  If NC=0, MAXFIT
          calculates class limits based on the values of BEGIN and STEP.  The
          last class calculated by MAXFIT contains the maximum observed data
          point.

BEGIN =   The lower boundary of the first class for which predictions are
          desired.  Default value is slightly less than the  observed data
          minimum value.

 STEP =   The class width.  Default value is the (data range)/12.

  TOL =   primary cutoff point (convergence tolerance) used in the searching
          subroutines.  Optional; default value is .0001.

TITLE =   an array of length 10 which contains a 60-character title in elements
          of length 6.  Default value is blanks.

    N =   number of classes for which the user wants predictions if NGRP=1,
          or the number of observations if NGRP=0.

     If the user specifies the class limits, he gets predicted frequencies
for the range of classes he specifies, and the variable N should indicate
the number of classes covered by that range.  If the user does not specify
classes, the program creates class limits based on the values of BEGIN and
STEP.  The variable TOL is used as the measure of precision desired in the
maximum likelihood estimation processes where iteration is required.  The
available distributions and their identifying numbers are in Table 1.
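The automatic class construction used when NC=0 can be sketched as below. This is a hypothetical Python helper, not MAXFIT's routine: the default STEP of (data range)/12 follows the variable list above, while the default BEGIN offset (one thousandth of STEP below the minimum) is an assumption standing in for "slightly less than the observed data minimum."

```python
def class_limits(x, begin=None, step=None):
    """Sketch of automatic class limits when NC = 0 (hypothetical helper).

    STEP defaults to (data range)/12; the default BEGIN offset below the
    minimum (one thousandth of STEP) is an assumption.  Classes are added
    until the last one contains the maximum observation.
    """
    lo, hi = min(x), max(x)
    if step is None:
        step = (hi - lo) / 12.0
    if step <= 0:
        raise ValueError("STEP must be positive (the data range may be zero)")
    if begin is None:
        begin = lo - 1e-3 * step
    limits = [begin]
    while limits[-1] <= hi:
        limits.append(limits[-1] + step)
    return limits

limits = class_limits([0.007, 0.018, 0.024, 0.059])
print(len(limits) - 1, "classes; first limit", limits[0], "last limit", limits[-1])
```

For a range of 0.007 to 0.059 ppm, the default width is 0.052/12 ≈ 0.0043 ppm.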


         TABLE 1.  AVAILABLE DISTRIBUTIONS AND IDENTIFYING NUMBERS

               Distribution Name        Distribution Number

               Normal                           1
               Lognormal                        2
               Gamma                            3
               Weibull                          4
               Johnson SB                       5
               Beta                             6

     The program provides the following output for each data set:

1.   Observed minimum and maximum data values, number of observations, mean,
     variance, standard deviation, index of skewness (√b1), skewness
     squared (b1), and index of kurtosis (b2).

2.   For each distribution fitted:

     •    Estimated parameters for the distribution (notations and
          descriptions of the distributions are given in Appendix A).

     •    By classes of the variable, the observed and predicted
          probabilities, the difference between them (residual), the
          cumulative observed and predicted probabilities, and the observed
          and predicted frequencies and their difference (residual frequency).

     •    A CALCOMP plot with the density of the fitted distribution
          superimposed upon the histogram of the data.

3.   A comparison of the distributions and their ranking from best (1) to
     worst (6) in terms of absolute deviations, weighted absolute deviations,
     Chi-square, Kolmogorov-Smirnov, Cramer-von Mises-Smirnov, and
     log-likelihood statistics.
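The summary statistics of item 1 can be sketched with moment formulas. In this Python sketch the variance and moments use divisor n, the index of skewness is √b1 = m3/m2^(3/2), and the index of kurtosis is b2 = m4/m2², which equals 3 for a normal distribution. The divisor and the helper name `summary_stats` are assumptions; this section does not state which divisors MAXFIT uses.

```python
import math

def summary_stats(x):
    """Moment statistics of the kind listed in item 1 (divisor n assumed)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    sqrt_b1 = m3 / m2 ** 1.5  # index of skewness
    return {
        "min": min(x), "max": max(x), "n": n, "mean": mean,
        "variance": m2, "std_dev": math.sqrt(m2),
        "skewness": sqrt_b1, "skewness_squared": sqrt_b1 ** 2,
        "kurtosis": m4 / m2 ** 2,  # b2, equal to 3 for a normal distribution
    }

print(summary_stats([0.007, 0.012, 0.020, 0.024, 0.031, 0.046]))  # hypothetical data
```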

     The comparison statistics in item 3 are ranked not as goodness-of-fit
statistics, but as absolute measurements of fit.  They may be used to
calculate valid goodness-of-fit statistics when degrees of freedom are taken
into account.  The observed significance level of the Chi-square
goodness-of-fit test, defined as the probability that a Chi-square random
variable will be greater than the observed value, is printed out.  The
observed value is calculated after regrouping classes to ensure that the
predicted frequency of each class is greater than 5.  If N is the number of
classes used to compute the statistic, the degrees of freedom for each model
are as follows:

                    normal                   N-3
                    3-parameter lognormal    N-4
                    3-parameter Gamma        N-4
                    3-parameter Weibull      N-4
                    Johnson SB               N-5
                    4-parameter Beta         N-5
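The degrees-of-freedom rule above amounts to df = N - 1 - (number of estimated parameters), and the observed significance level is the upper-tail probability of a Chi-square variable with df degrees of freedom. The Python sketch below computes both without external libraries, using the standard series and continued-fraction expansions of the regularized incomplete gamma function. The model keys and helper names are hypothetical, and the example values are illustrative only.

```python
import math

# Estimated parameters per model, matching the table above (df = N - 1 - k).
N_PARAMS = {"normal": 2, "lognormal3": 3, "gamma3": 3,
            "weibull3": 3, "johnson_sb": 4, "beta4": 4}

def degrees_of_freedom(n_classes, model):
    return n_classes - 1 - N_PARAMS[model]

def _reg_lower_gamma_series(a, x):
    """Regularized lower incomplete gamma P(a, x) by its series (x < a + 1)."""
    term = total = 1.0 / a
    ap = a
    for _ in range(1000):
        ap += 1.0
        term *= x / ap
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def _reg_upper_gamma_cf(a, x):
    """Regularized upper incomplete gamma Q(a, x) by continued fraction (x >= a + 1)."""
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return math.exp(-x + a * math.log(x) - math.lgamma(a)) * h

def chi2_sf(stat, df):
    """Observed significance level: P(chi-square with df degrees of freedom > stat)."""
    if stat <= 0.0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    if x < a + 1.0:
        return 1.0 - _reg_lower_gamma_series(a, x)
    return _reg_upper_gamma_cf(a, x)

# Hypothetical example: 9 regrouped classes, Weibull fit, chi-square = 4.2.
df = degrees_of_freedom(9, "weibull3")
print(df, chi2_sf(4.2, df))
```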


EXAMPLES OF OUTPUT

Description of Data Sets

     A short-term study was conducted by the Environmental Monitoring
and Support Laboratory (EMSL) at the Los Angeles Catalyst Study (LACS) site
from 7/20/78 to 8/30/78.  Its purpose was to quantify the influence of NOx
emissions from automobiles on ozone (O3) and nitrogen dioxide (NO2)
measurements made in close proximity to the freeway (12).  The data base was
stratified into low, medium, and high O3 categories by the background O3
level at the upwind site.

     Oxides of nitrogen (NOx) and NO2 measurements taken at the upwind LACS
site 3 were used to fit the probability models.  The ranges of the two data
sets are equivalent because NOx is the sum of NO and NO2 and the nitric oxide
(NO) measurements at the upwind site are low.  However, the results of
fitting probability models to these two data sets are very different.

     The first example is NO2 measurements recorded at site 3.  The 92
observations range from 0.007 to 0.059 ppm.  The output of MAXFIT is shown in
Figures 1-6 and Table 2.  The Weibull distribution appears to be the "best"
model for these data.  It has the highest log-likelihood value and the highest
observed significance level of the Chi-square test, and it does fairly well
when compared to other models by the remaining fitting criteria.  The Weibull
is also a useful model because the data can be transformed to an exponential
distribution and analyses may be performed assuming an exponential
distribution.
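The transformation follows from the Weibull distribution function: if X has shape α, scale β, and lower bound λ, then F(x) = 1 - exp(-((x - λ)/β)^α), so Y = ((X - λ)/β)^α has F(y) = 1 - exp(-y), a unit exponential. The Python sketch below checks this on quantiles. The shape/scale/lower-bound parameterization is assumed and may differ from the notation of Appendix A; the parameter values and helper names are hypothetical.

```python
import math

def weibull_quantile(p, alpha, beta, lam):
    """p-quantile of a Weibull with shape alpha, scale beta, lower bound lam."""
    return lam + beta * (-math.log(1.0 - p)) ** (1.0 / alpha)

def to_unit_exponential(x, alpha, beta, lam):
    """Map a Weibull(alpha, beta, lam) value to the unit exponential scale."""
    return ((x - lam) / beta) ** alpha

# Hypothetical parameter values, for illustration only.
alpha, beta, lam = 1.7, 0.02, 0.003
for p in (0.1, 0.5, 0.9):
    y = to_unit_exponential(weibull_quantile(p, alpha, beta, lam), alpha, beta, lam)
    # The transformed quantile equals the unit exponential quantile -log(1 - p).
    print(p, y, -math.log(1.0 - p))
```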

     The second example is NOx measurements recorded at site 3.  Again, the
92 observations range from 0.007 to 0.059 ppm.  The output of MAXFIT for this

-------
  [MAXFIT printout and CALCOMP plot: LACS SITE 3 NO2, HIGH O3 LEVEL, normal
  distribution.  Data summary: 92 observations, mean = .0246, standard
  deviation = .0112, index of skewness = .7315, index of kurtosis = 3.3555.
  Fitted MU = 0.0246, SIGMA2 = 0.0001.  Printout columns by class of X:
  observed, predicted, and residual probabilities; cumulative observed and
  predicted probabilities; observed, predicted, and residual frequencies.
  Plot: histogram of X with the fitted density F(X) superimposed.]

         FIGURE 1  MAXFIT OUTPUT: FIT OF NORMAL DISTRIBUTION FOR DATA IN EXAMPLE 1

-------
  [MAXFIT printout and CALCOMP plot: LACS SITE 3 NO2, HIGH O3 LEVEL, LNORM3
  (3-parameter lognormal) distribution.  Fitted MU = -3.5020, SIGMA2 = 0.1213
  (standard deviation .3483), searched lower bound = -0.0074.  Printout
  columns by class of X: observed, predicted, and residual probabilities;
  cumulative observed and predicted probabilities; observed, predicted, and
  residual frequencies.  Plot: histogram of X with the fitted density F(X)
  superimposed.]

   FIGURE 2  MAXFIT OUTPUT: FIT OF 3-PARAMETER LOG NORMAL DISTRIBUTION FOR DATA IN EXAMPLE 1

-------
LACS SITE 3 NO2   HIGH O3 LEVEL
GAMMA DISTRIBUTION
ALPHA = 3.1864   BETA = .0066   SEARCHED LOWER BOUND = .003606

                                                   CUMULATIVE CUMULATIVE   OBSERVED  PREDICTED   RESIDUAL
X1 - X2             OBSERVED  PREDICTED   RESIDUAL   OBSERVED  PREDICTED  FREQUENCY  FREQUENCY  FREQUENCY
.004 - .007          .000000    .010529   -.010529    .000000    .010529       .000       .969      -.969
.007 - .011          .086957    .079766    .007190    .086957    .090295      8.000      7.338       .662
.011 - .016          .119565    .145624   -.026059    .206522    .235919     11.000     13.397     -2.397
.016 - .020          .195652    .168444    .027208    .402174    .404363     18.000     15.497      2.503
.020 - .024          .173913    .157326    .016588    .576087    .561688     16.000     14.474      1.526
.024 - .029          .054348    .129842   -.075494    .630435    .691530      5.000     11.945     -6.945
.029 - .033          .130435    .098862    .031573    .760870    .790393     12.000      9.095      2.905
.033 - .037          .130435    .071127    .059308    .891304    .861520     12.000      6.544      5.456
.037 - .042          .043478    .049068   -.005590    .934783    .910588      4.000      4.514      -.514
.042 - .046          .021739    .032771   -.011032    .956522    .943359      2.000      3.015     -1.015
.046 - .050          .010870    .021331   -.010461    .967391    .964690      1.000      1.962      -.962
.050 - .055          .010870    .013597   -.002727    .978261    .978286      1.000      1.251      -.251
.055 - .059          .010870    .008518    .002352    .989130    .986804      1.000       .784       .216
.059 - .063          .010870    .005258    .005611   1.000000    .992062      1.000       .484       .516
.063 - INFINITY      .000000    .007938   -.007938   1.000000   1.000000       .000       .730      -.730
[Plot: F(X) versus X (.007 to .063) for LACS SITE 3 NO2, HIGH O3 LEVEL; GAMMA DISTRIBUTION, LOWER BOUND = 0.0036, ALPHA = 3.1864, BETA = 0.0066]
                  FIGURE 3 MAXFIT OUTPUT: FIT OF GAMMA DISTRIBUTION FOR DATA IN EXAMPLE 1

-------
LACS SITE 3 NO2   HIGH O3 LEVEL
WEIBULL DISTRIBUTION
BETA = .0212   C = 1.7325   SEARCHED LOWER BOUND = .005694

                                                   CUMULATIVE CUMULATIVE   OBSERVED  PREDICTED   RESIDUAL
X1 - X2             OBSERVED  PREDICTED   RESIDUAL   OBSERVED  PREDICTED  FREQUENCY  FREQUENCY  FREQUENCY
.006 - .007          .000000    .007848   -.007848    .000000    .007848       .000       .722      -.722
.007 - .011          .086957    .087676   -.000719    .086957    .095524      8.000      8.066      -.066
.011 - .016          .119565    .141050   -.021485    .206522    .236574     11.000     12.977     -1.977
.016 - .020          .195652    .159711    .035941    .402174    .396285     18.000     14.693      3.307
.020 - .024          .173913    .153650    .020263    .576087    .549935     16.000     14.136      1.864
.024 - .029          .054348    .132469   -.078121    .630435    .682405      5.000     12.187     -7.187
.029 - .033          .130435    .104811    .025624    .760870    .787216     12.000      9.643      2.357
.033 - .037          .130435    .077111    .053324    .891304    .864326     12.000      7.094      4.906
.037 - .042          .043478    .053188   -.009710    .934783    .917515      4.000      4.893      -.893
.042 - .046          .021739    .034591   -.012852    .956522    .952105      2.000      3.182     -1.182
.046 - .050          .010870    .021298   -.010429    .967391    .973404      1.000      1.959      -.959
.050 - .055          .010870    .012454   -.001585    .978261    .985858      1.000      1.146      -.146
.055 - .059          .010870    .006934    .003936    .989130    .992792      1.000       .638       .362
.059 - .063          .010870    .003683    .007186   1.000000    .996475      1.000       .339       .661
.063 - INFINITY      .000000    .003525   -.003525   1.000000   1.000000       .000       .324      -.324
[Plot: F(X) versus X (.007 to .063) for LACS SITE 3 NO2, HIGH O3 LEVEL; WEIBULL DISTRIBUTION, LOWER BOUND = 0.0057, ALPHA = 1.7325, BETA = 0.0212]
                  FIGURE 4 MAXFIT OUTPUT: FIT OF WEIBULL DISTRIBUTION FOR DATA IN EXAMPLE 1

-------
LACS SITE 3 NO2   HIGH O3 LEVEL
SB DISTRIBUTION
MEAN = -1.0364   VARIANCE = .6181   MAX = .079366   MIN = .002743

                                                   CUMULATIVE CUMULATIVE   OBSERVED  PREDICTED   RESIDUAL
X1 - X2             OBSERVED  PREDICTED   RESIDUAL   OBSERVED  PREDICTED  FREQUENCY  FREQUENCY  FREQUENCY
.003 - .007          .000000    .011040   -.011040    .000000    .011040       .000      1.016     -1.016
.007 - .011          .086957    .083043    .003913    .086957    .094084      8.000      7.640       .360
.011 - .016          .119565    .144029   -.024464    .206522    .238112     11.000     13.251     -2.251
.016 - .020          .195652    .161326    .034327    .402174    .399437     18.000     14.842      3.158
.020 - .024          .173913    .150945    .022968    .576087    .550382     16.000     13.887      2.113
.024 - .029          .054348    .128155   -.073807    .630435    .678537      5.000     11.790     -6.790
.029 - .033          .130435    .101948    .028486    .760870    .780485     12.000      9.379      2.621
.033 - .037          .130435    .076939    .053496    .891304    .857424     12.000      7.078      4.922
.037 - .042          .043478    .055223   -.011744    .934783    .912646      4.000      5.080     -1.080
.042 - .046          .021739    .037540   -.015800    .956522    .950186      2.000      3.454     -1.454
.046 - .050          .010870    .023917   -.013047    .967391    .974103      1.000      2.200     -1.200
.050 - .055          .010870    .014018   -.003148    .978261    .988120      1.000      1.290      -.290
.055 - .059          .010870    .007325    .003545    .989130    .995445      1.000       .674       .326
.059 - .063          .010870    .003231    .007638   1.000000    .998676      1.000       .297       .703
.063 - .079          .000000    .001324   -.001324   1.000000   1.000000       .000       .122      -.122
[Plot: F(X) versus X (.007 to .063) for LACS SITE 3 NO2, HIGH O3 LEVEL; SB DISTRIBUTION, MU = -1.0364, SIGMA² = 0.6181, LOWER BOUND = 0.0027, UPPER BOUND = 0.0794]
                  FIGURE 5 MAXFIT OUTPUT: FIT OF JOHNSON SB DISTRIBUTION FOR DATA IN EXAMPLE 1

-------
LACS SITE 3 NO2   HIGH O3 LEVEL
BETA DISTRIBUTION
ALPHA = 1.8673909   BETA = 5.5897199   MAX = .081378   MIN = .005647

                                                   CUMULATIVE CUMULATIVE   OBSERVED  PREDICTED   RESIDUAL
X1 - X2             OBSERVED  PREDICTED   RESIDUAL   OBSERVED  PREDICTED  FREQUENCY  FREQUENCY  FREQUENCY
.006 - .007          .000000    .008148   -.008148    .000000    .008148       .000       .750      -.750
.007 - .011          .086957    .092974   -.006018    .086957    .101122      8.000      8.554      -.554
.011 - .016          .119565    .143431   -.023866    .206522    .244553     11.000     13.196     -2.196
.016 - .020          .195652    .155967    .039685    .402174    .400520     18.000     14.349      3.651
.020 - .024          .173913    .147240    .026673    .576087    .547760     16.000     13.546      2.454
.024 - .029          .054348    .127375   -.073027    .630435    .675135      5.000     11.718     -6.718
.029 - .033          .130435    .103017    .027417    .760870    .778152     12.000      9.478      2.522
.033 - .037          .130435    .078469    .051966    .891304    .856621     12.000      7.219      4.781
.037 - .042          .043478    .056330   -.012852    .934783    .912951      4.000      5.182     -1.182
.042 - .046          .021739    .037954   -.016215    .956522    .950906      2.000      3.492     -1.492
.046 - .050          .010870    .023793   -.012923    .967391    .974699      1.000      2.189     -1.189
.050 - .055          .010870    .013676   -.002806    .978261    .988375      1.000      1.258      -.258
.055 - .059          .010870    .007042    .003827    .989130    .995417      1.000       .648       .352
.059 - .063          .010870    .003129    .007741   1.000000    .998546      1.000       .288       .712
.063 - .081          .000000    .001454   -.001454   1.000000   1.000000       .000       .134      -.134
[Plot: F(X) versus X (.007 to .063) for LACS SITE 3 NO2, HIGH O3 LEVEL; BETA DISTRIBUTION, LOWER BOUND = 0.0056, UPPER BOUND = 0.0814, ALPHA = 1.8674, BETA = 5.5897]
                  FIGURE 6 MAXFIT OUTPUT: FIT OF BETA DISTRIBUTION FOR DATA IN EXAMPLE 1

-------
LACS SITE 3 NO2   HIGH O3 LEVEL

                                 NORMAL     LOG NORMAL          GAMMA        WEIBULL             SB           BETA
ABSOLUTE DEVIATION        .36327934+002  .26840004+002  .27568650+002  .26914336+002  .28404869+002  .28944865+002
                                    (6)            (1)            (3)            (2)            (4)            (5)
WEIGHTED ABS. DEVIATION   .36439217+001  .26141019+001  .27436755+001  .27989979+001  .28615729+001  .29539632+001
                                    (6)            (1)            (2)            (3)            (4)            (5)
CHI-SQUARE (*)            .12294319-001  .77316610-001  .49182300-001  .97144187-001  .36629757-001  .34430578-001
                                    (6)            (2)            (3)            (1)            (4)            (5)
KOLMOGOROV-SMIRNOV        .87066924-001  .65347441-001  .61095604-001  .51969778-001  .48101939-001  .44699905-001
                                    (6)            (5)            (4)            (3)            (2)            (1)
CRAMER-VON MISES-SMIRNOV  .18261896+000  .82409978-001  .76377309-001  .68976456-001  .64916543-001  .69541976-001
                                    (6)            (5)            (4)            (2)            (1)            (3)
LOG LIKELIHOOD            .28306338+003  .28868319+003  .28923631+003  .29017593+003  .28974790+003  .29012675+003
                                    (6)            (5)            (4)            (1)            (3)            (2)

* OBSERVED SIGNIFICANCE LEVEL
Figures in parentheses rank the six distributions on each criterion (1 = best fit).
            TABLE 2 MAXFIT OUTPUT:  COMPARISON STATISTICS FOR DATA IN EXAMPLE 1

-------
data set is shown in Figures 7-12 and Table 3.  For this data set there is
no obvious "best" model.  The beta distribution has the highest
log-likelihood value, whereas the gamma distribution has the lowest
Kolmogorov-Smirnov statistic, and the 3-parameter lognormal ranks best on the
chi-square test and the weighted absolute deviation.  The 3-parameter
lognormal distribution would therefore probably serve as the "best" model in
this example, since it is simple to work with and scores reasonably well on
several fit criteria.
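Two of the comparison statistics in Table 3 can be recomputed directly from the frequency columns of an individual fit. The sketch below assumes, only because it reproduces the tabulated values and not because this report defines them, that the absolute deviation is the sum of |observed - predicted| class frequencies and that the Kolmogorov-Smirnov statistic is the largest gap between the cumulative observed and cumulative predicted proportions. The weighted deviation, chi-square, and Cramer-von Mises-Smirnov entries are not attempted, since their grouped-data definitions are not given here. The inputs are the gamma-fit frequencies for Example 2 (Figure 9); the function names are illustrative, not MAXFIT routines.

```python
# Recompute two Table 3 statistics for the gamma fit (Example 2) from its
# class frequencies. The statistic definitions are assumptions that happen to
# match the tabulated values, not documented MAXFIT formulas.

OBSERVED = [0, 8, 7, 22, 16, 4, 12, 9, 8, 1, 2, 1, 0, 2, 0]
PREDICTED = [0.757, 6.709, 12.859, 15.249, 14.492, 12.131, 9.354,
             6.808, 4.749, 3.205, 2.108, 1.357, 0.859, 0.535, 0.828]


def absolute_deviation(obs, pred):
    """Sum of |observed - predicted| class frequencies."""
    return sum(abs(o - e) for o, e in zip(obs, pred))


def ks_grouped(obs, pred):
    """Largest |cumulative observed - cumulative predicted| proportion."""
    n = sum(obs)
    cum_obs = 0.0
    cum_pred = 0.0
    largest = 0.0
    for o, e in zip(obs, pred):
        cum_obs += o
        cum_pred += e
        largest = max(largest, abs(cum_obs - cum_pred) / n)
    return largest


print(f"absolute deviation   = {absolute_deviation(OBSERVED, PREDICTED):.3f}")
print(f"Kolmogorov-Smirnov D = {ks_grouped(OBSERVED, PREDICTED):.5f}")
```

Run as is, it gives about 38.21 and 0.0579, which agree with the gamma column of Table 3 (.38209408+002 and .57881671-001) to within the rounding of the printed frequencies. The largest cumulative gap falls in the .011-.016 class.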

-------
LACS SITE 3 NOX   HIGH O3 LEVEL
OBS. MIN. X = .007000   OBS. MAX. X = .059000   NO. OF OBS. = 92.0000
NORMAL DISTRIBUTION
MEAN = .0252   VARIANCE = .0001   STANDARD DEVIATION = .0114
INDEX OF SKEWNESS = .7463   INDEX OF KURTOSIS = 3.2460   SKEWNESS SQUARED = .5570

                                                   CUMULATIVE CUMULATIVE   OBSERVED  PREDICTED   RESIDUAL
X1 - X2             OBSERVED  PREDICTED   RESIDUAL   OBSERVED  PREDICTED  FREQUENCY  FREQUENCY  FREQUENCY
-INFINITY - .007     .000000    .055187   -.055187    .000000    .055187       .000      5.077     -5.077
.007 - .011          .086957    .056838    .030118    .086957    .112025      8.000      5.229      2.771
.011 - .016          .076087    .089794   -.013707    .163043    .201819      7.000      8.261     -1.261
.016 - .020          .239130    .122932    .116199    .402174    .324750     22.000     11.310     10.690
.020 - .024          .173913    .145847    .028066    .576087    .470597     16.000     13.418      2.582
.024 - .029          .043478    .149950   -.106472    .619565    .620547      4.000     13.795     -9.795
.029 - .033          .130435    .133602   -.003167    .750000    .754149     12.000     12.291      -.291
.033 - .037          .097826    .103156   -.005330    .847826    .857305      9.000      9.490      -.490
.037 - .042          .086957    .069023    .017934    .934783    .926328      8.000      6.350      1.650
.042 - .046          .010870    .040022   -.029152    .945652    .966350      1.000      3.682     -2.682
.046 - .050          .021739    .020110    .001629    .967391    .986460      2.000      1.850       .150
.050 - .055          .010870    .008756    .002114    .978261    .995216      1.000       .806       .194
.055 - .059          .000000    .003304   -.003304    .978261    .998519       .000       .304      -.304
.059 - .063          .021739    .001080    .020659   1.000000    .999599      2.000       .099      1.901
.063 - INFINITY      .000000    .000401   -.000401   1.000000   1.000000       .000       .037      -.037

[Plot: F(X) versus X (.007 to .063) for LACS SITE 3 NOX, HIGH O3 LEVEL; NORMAL DISTRIBUTION, MU = 0.0252, SIGMA² = 0.0001]
                  FIGURE 7 MAXFIT OUTPUT: FIT OF NORMAL DISTRIBUTION TO DATA IN EXAMPLE 2

-------
LACS SITE 3 NOX   HIGH O3 LEVEL
LOG NORMAL DISTRIBUTION
MEAN = -3.5370   VARIANCE = .1337   STANDARD DEVIATION = .3657   SEARCHED LOWER BOUND = -.005922

                                                   CUMULATIVE CUMULATIVE   OBSERVED  PREDICTED   RESIDUAL
X1 - X2             OBSERVED  PREDICTED   RESIDUAL   OBSERVED  PREDICTED  FREQUENCY  FREQUENCY  FREQUENCY
-.006 - .007         .000000    .013136   -.013136    .000000    .013136       .000      1.209     -1.209
.007 - .011          .086957    .063109    .023847    .086957    .076246      8.000      5.806      2.194
.011 - .016          .076087    .130498   -.054411    .163043    .206744      7.000     12.006     -5.006
.016 - .020          .239130    .168756    .070374    .402174    .375500     22.000     15.526      6.474
.020 - .024          .173913    .166534    .007379    .576087    .542034     16.000     15.321       .679
.024 - .029          .043478    .139403   -.095925    .619565    .681437      4.000     12.825     -8.825
.029 - .033          .130435    .105139    .025296    .750000    .786576     12.000      9.673      2.327
.033 - .037          .097826    .074096    .023730    .847826    .860672      9.000      6.817      2.183
.037 - .042          .086957    .049934    .037023    .934783    .910606      8.000      4.594      3.406
.042 - .046          .010870    .032668   -.021799    .945652    .943274      1.000      3.005     -2.005
.046 - .050          .021739    .020962    .000777    .967391    .964237      2.000      1.929       .071
.050 - .055          .010870    .013287   -.002417    .978261    .977523      1.000      1.222      -.222
.055 - .059          .000000    .008360   -.008360    .978261    .985883       .000       .769      -.769
.059 - .063          .021739    .005241    .016498   1.000000    .991124      2.000       .482      1.518
.063 - INFINITY      .000000    .008876   -.008876   1.000000   1.000000       .000       .817      -.817
[Plot: F(X) versus X (.007 to .063) for LACS SITE 3 NOX, HIGH O3 LEVEL; LNORM3 DISTRIBUTION, MU = -3.5370, SIGMA² = 0.1337, LOWER BOUND = -0.0059]
                  FIGURE 8 MAXFIT OUTPUT: FIT OF 3-PARAMETER LOG NORMAL DISTRIBUTION FOR DATA IN EXAMPLE 2

-------
LACS SITE 3 NOX   HIGH O3 LEVEL
GAMMA DISTRIBUTION
ALPHA = 3.1819   BETA = .0067   SEARCHED LOWER BOUND = .003853

                                                   CUMULATIVE CUMULATIVE   OBSERVED  PREDICTED   RESIDUAL
X1 - X2             OBSERVED  PREDICTED   RESIDUAL   OBSERVED  PREDICTED  FREQUENCY  FREQUENCY  FREQUENCY
.004 - .007          .000000    .008224   -.008224    .000000    .008224       .000       .757      -.757
.007 - .011          .086957    .072925    .014031    .086957    .081149      8.000      6.709      1.291
.011 - .016          .076087    .139776   -.063689    .163043    .220925      7.000     12.859     -5.859
.016 - .020          .239130    .165745    .073385    .402174    .386670     22.000     15.249      6.751
.020 - .024          .173913    .157521    .016392    .576087    .544192     16.000     14.492      1.508
.024 - .029          .043478    .131863   -.088385    .619565    .676055      4.000     12.131     -8.131
.029 - .033          .130435    .101669    .028766    .750000    .777724     12.000      9.354      2.646
.033 - .037          .097826    .074000    .023826    .847826    .851724      9.000      6.808      2.192
.037 - .042          .086957    .051615    .035342    .934783    .903339      8.000      4.749      3.251
.042 - .046          .010870    .034840   -.023970    .945652    .938179      1.000      3.205     -2.205
.046 - .050          .021739    .022913   -.001174    .967391    .961092      2.000      2.108      -.108
.050 - .055          .010870    .014754   -.003884    .978261    .975846      1.000      1.357      -.357
.055 - .059          .000000    .009335   -.009335    .978261    .985181       .000       .859      -.859
.059 - .063          .021739    .005820    .015919   1.000000    .991002      2.000       .535      1.465
.063 - INFINITY      .000000    .008998   -.008998   1.000000   1.000000       .000       .828      -.828
[Plot: F(X) versus X (.007 to .063) for LACS SITE 3 NOX, HIGH O3 LEVEL; GAMMA DISTRIBUTION, LOWER BOUND = 0.0039, ALPHA = 3.1819, BETA = 0.0067]
                  FIGURE 9 MAXFIT OUTPUT: FIT OF GAMMA DISTRIBUTION FOR DATA IN EXAMPLE 2

-------
LACS SITE 3 NOX   HIGH O3 LEVEL
WEIBULL DISTRIBUTION
BETA = .0218   C = 1.7517   SEARCHED LOWER BOUND = .005742

                                                   CUMULATIVE CUMULATIVE   OBSERVED  PREDICTED   RESIDUAL
X1 - X2             OBSERVED  PREDICTED   RESIDUAL   OBSERVED  PREDICTED  FREQUENCY  FREQUENCY  FREQUENCY
.006 - .007          .000000    .006655   -.006655    .000000    .006655       .000       .612      -.612
.007 - .011          .086957    .081266    .005691    .086957    .087921      8.000      7.476       .524
.011 - .016          .076087    .134664   -.058577    .163043    .222584      7.000     12.389     -5.389
.016 - .020          .239130    .155648    .083482    .402174    .378232     22.000     14.320      7.680
.020 - .024          .173913    .152516    .021397    .576087    .530748     16.000     14.031      1.969
.024 - .029          .043478    .133809   -.090331    .619565    .664557      4.000     12.310     -8.310
.029 - .033          .130435    .107686    .022749    .750000    .772243     12.000      9.907      2.093
.033 - .037          .097826    .080556    .017271    .847826    .852799      9.000      7.411      1.589
.037 - .042          .086957    .056479    .030477    .934783    .909278      8.000      5.196      2.804
.042 - .046          .010870    .037324   -.026454    .945652    .946602      1.000      3.434     -2.434
.046 - .050          .021739    .023344   -.001604    .967391    .969945      2.000      2.148      -.148
.050 - .055          .010870    .013861   -.002991    .978261    .983806      1.000      1.275      -.275
.055 - .059          .000000    .007832   -.007832    .978261    .991638       .000       .721      -.721
.059 - .063          .021739    .004221    .017519   1.000000    .995859      2.000       .388      1.612
.063 - INFINITY      .000000    .004141   -.004141   1.000000   1.000000       .000       .381      -.381

[Plot: F(X) versus X (.007 to .063) for LACS SITE 3 NOX, HIGH O3 LEVEL; WEIBULL DISTRIBUTION, LOWER BOUND = 0.0057, ALPHA = 1.7517, BETA = 0.0218]
                  FIGURE 10 MAXFIT OUTPUT: FIT OF WEIBULL DISTRIBUTION FOR DATA IN EXAMPLE 2

-------
LACS SITE 3 NOX   HIGH O3 LEVEL
SB DISTRIBUTION
MEAN = -1.0339   VARIANCE = .6214   MAX = .080503   MIN = .002990

                                                   CUMULATIVE CUMULATIVE   OBSERVED  PREDICTED   RESIDUAL
X1 - X2             OBSERVED  PREDICTED   RESIDUAL   OBSERVED  PREDICTED  FREQUENCY  FREQUENCY  FREQUENCY
.003 - .007          .000000    .008625   -.008625    .000000    .008625       .000       .794      -.794
.007 - .011          .086957    .076215    .010741    .086957    .084841      8.000      7.012       .988
.011 - .016          .076087    .138778   -.062691    .163043    .223619      7.000     12.768     -5.768
.016 - .020          .239130    .158836    .080294    .402174    .382455     22.000     14.613      7.387
.020 - .024          .173913    .150719    .023194    .576087    .533174     16.000     13.866      2.134
.024 - .029          .043478    .129460   -.085982    .619565    .662634      4.000     11.910     -7.910
.029 - .033          .130435    .104146    .026289    .750000    .766780     12.000      9.581      2.419
.033 - .037          .097826    .079535    .018291    .847826    .846315      9.000      7.317      1.683
.037 - .042          .086957    .057857    .029099    .934783    .904172      8.000      5.323      2.677
.042 - .046          .010870    .039966   -.029096    .945652    .944138      1.000      3.677     -2.677
.046 - .050          .021739    .025978   -.004238    .967391    .970116      2.000      2.390      -.390
.050 - .055          .010870    .015631   -.004762    .978261    .985747      1.000      1.438      -.438
.055 - .059          .000000    .008472   -.008472    .978261    .994219       .000       .779      -.779
.059 - .063          .021739    .003946    .017793   1.000000    .998164      2.000       .363      1.637
.063 - .081          .000000    .001836   -.001836   1.000000   1.000000       .000       .169      -.169

[Plot: F(X) versus X (.007 to .063) for LACS SITE 3 NOX, HIGH O3 LEVEL; SB DISTRIBUTION, MU = -1.0339, SIGMA² = 0.6214, LOWER BOUND = 0.0030, UPPER BOUND = 0.0805]
                  FIGURE 11 MAXFIT OUTPUT: FIT OF JOHNSON SB DISTRIBUTION FOR DATA IN EXAMPLE 2

-------
LACS SITE 3 NOX   HIGH O3 LEVEL
BETA DISTRIBUTION
ALPHA = 1.9339036   BETA = 6.7444366   MAX = .083143   MIN = .005647

                                                   CUMULATIVE CUMULATIVE   OBSERVED  PREDICTED   RESIDUAL
X1 - X2             OBSERVED  PREDICTED   RESIDUAL   OBSERVED  PREDICTED  FREQUENCY  FREQUENCY  FREQUENCY
.006 - .007          .000000    .006710   -.006710    .000000    .006710       .000       .617      -.617
.007 - .011          .086957    .084586    .002371    .086957    .091296      8.000      7.782       .218
.011 - .016          .076087    .137076   -.060989    .163043    .228372      7.000     12.611     -5.611
.016 - .020          .239130    .153160    .085970    .402174    .381532     22.000     14.091      7.909
.020 - .024          .173913    .147323    .026590    .576087    .528856     16.000     13.554      2.446
.024 - .029          .043478    .129321   -.085843    .619565    .658176      4.000     11.897     -7.897
.029 - .033          .130435    .105908    .024527    .750000    .764084     12.000      9.743      2.257
.033 - .037          .097826    .081618    .016208    .847826    .845702      9.000      7.509      1.491
.037 - .042          .086957    .059291    .027666    .934783    .904992      8.000      5.455      2.545
.042 - .046          .010870    .040477   -.029607    .945652    .945469      1.000      3.724     -2.724
.046 - .050          .021739    .025777   -.004038    .967391    .971246      2.000      2.371      -.371
.050 - .055          .010870    .015120   -.004251    .978261    .986366      1.000      1.391      -.391
.055 - .059          .000000    .008008   -.008008    .978261    .994375       .000       .737      -.737
.059 - .063          .021739    .003710    .018030   1.000000    .998084      2.000       .341      1.659
.063 - .083          .000000    .001916   -.001916   1.000000   1.000000       .000       .176      -.176

[Plot: F(X) versus X (.007 to .063) for LACS SITE 3 NOX, HIGH O3 LEVEL; BETA DISTRIBUTION, LOWER BOUND = 0.0056, UPPER BOUND = 0.0831, ALPHA = 1.9339, BETA = 6.7444]
                  FIGURE 12 MAXFIT OUTPUT: FIT OF BETA DISTRIBUTION FOR DATA IN EXAMPLE 2

-------
LACS SITE 3 NOX   HIGH O3 LEVEL

                                 NORMAL     LOG NORMAL          GAMMA        WEIBULL             SB           BETA
ABSOLUTE DEVIATION        .39876353+002  .37706056+002  .38209408+002  .36539779+002  .37849178+002  .37050406+002
                                    (6)            (3)            (5)            (1)            (4)            (2)
WEIGHTED ABS. DEVIATION   .40290285+001  .39116101+001  .40558162+001  .40038714+001  .40795364+001  .40345331+001
                                    (3)            (1)            (5)            (2)            (6)            (4)
CHI-SQUARE (*)            .80831527-003  .27928927-001  .23569159-001  .91304910-002  .11633393-001  .92297671-002
                                    (6)            (1)            (2)            (5)            (3)            (4)
KOLMOGOROV-SMIRNOV        .10548994+000  .61872242-001  .57881671-001  .59540748-001  .60575323-001  .65328118-001
                                    (6)            (4)            (1)            (2)            (3)            (5)
CRAMER-VON MISES-SMIRNOV  .25241871+000  .11858179+000  .11282250+000  .11458265+000  .10825762+000  .11509788+000
                                    (6)            (5)            (2)            (3)            (1)            (4)
LOG LIKELIHOOD            .28121652+003  .28741674+003  .28792753+003  .28859950+003  .28846644+003  .28860871+003
                                    (6)            (5)            (4)            (2)            (3)            (1)

* OBSERVED SIGNIFICANCE LEVEL
Figures in parentheses rank the six distributions on each criterion (1 = best fit).
                    TABLE 3 MAXFIT OUTPUT: COMPARISON STATISTICS FOR DATA IN EXAMPLE 2

-------
                               SECTION 5

                          FUTURE DEVELOPMENTS
     MAXFIT is not the answer to every air quality data distribution
problem.  Continuing efforts are planned to improve the program in several
areas.
IMPROVEMENT OF THE SOFTWARE'S GROUPED DATA HANDLING CAPACITY

     At present, the software treats grouped data as repeated observations
at the class mid-points.  This simplistic method causes the estimates to
become unstable when a large proportion of the data falls in one class.  The
problem can be avoided by maximizing the probability that the sample values
fall in their observed classes, instead of maximizing a likelihood function
evaluated at the class mid-points.
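The difference between the two approaches can be made concrete. A minimal sketch, using a normal model only for concreteness; MAXFIT's own routines are not reproduced in this report, and the function names are hypothetical. With class boundaries a_j < b_j and class counts n_j, the mid-point approach sums n_j log f(m_j) at the class mid-points m_j, while the class-probability approach sums n_j log[F(b_j) - F(a_j)], which is the multinomial log likelihood of the counts up to a constant.

```python
import math


def normal_cdf(x, mu, sigma):
    """Normal distribution function F(x), computed with math.erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def normal_logpdf(x, mu, sigma):
    """Log of the normal density f(x)."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def midpoint_loglik(edges, counts, mu, sigma):
    """Current approach: class j counts as counts[j] repeated observations
    at its mid-point (edges[j] + edges[j+1]) / 2."""
    total = 0.0
    for j, n in enumerate(counts):
        mid = 0.5 * (edges[j] + edges[j + 1])
        total += n * normal_logpdf(mid, mu, sigma)
    return total


def class_probability_loglik(edges, counts, mu, sigma):
    """Class-probability approach: sum of counts[j] * log P(edges[j] < X <= edges[j+1])."""
    total = 0.0
    for j, n in enumerate(counts):
        if n == 0:
            continue  # empty classes contribute nothing
        p = normal_cdf(edges[j + 1], mu, sigma) - normal_cdf(edges[j], mu, sigma)
        total += n * math.log(max(p, 1e-300))  # guard against log(0) far in the tails
    return total
```

The class-probability form is bounded above by zero, because each class probability is at most one. The mid-point form is not: if every observation falls in one class, n log f(m) grows without limit as the fitted standard deviation shrinks toward zero, which is one way the instability described above can appear.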
TRUNCATION AND CENSORING

     If the underlying variate X cannot be observed in part or parts of its
range, the distribution of X is usually termed truncated.  For example, if
observations cannot be recorded below a level R, the variate X is truncated
on the left.  Similarly, the distribution of X can be truncated on the right,
or doubly truncated (on both sides).  Observations from any type of truncated
distribution are in fact drawn from an incomplete distribution.  A truncated
variate does not differ in kind from other variates, but it is treated in a
special way because its distribution is generated by an underlying
untruncated variable.
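For a variate doubly truncated at known points a and b, each observation's density is rescaled by the probability mass between the truncation points, f(x;θ)/[F(b;θ) - F(a;θ)], so the log likelihood is Σ log f(x_i;θ) - n log[F(b;θ) - F(a;θ)]. A minimal sketch, assuming a normal underlying variate purely for illustration; the function names are hypothetical.

```python
import math


def normal_cdf(x, mu, sigma):
    """Normal distribution function F(x), computed with math.erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def normal_logpdf(x, mu, sigma):
    """Log of the normal density f(x)."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def truncated_normal_loglik(xs, mu, sigma, a=-math.inf, b=math.inf):
    """Log likelihood of a sample from a normal variate truncated to [a, b].
    a = -inf or b = inf gives one-sided truncation."""
    if any(x < a or x > b for x in xs):
        raise ValueError("a truncated sample cannot contain values outside [a, b]")
    mass = normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)  # P(a <= X <= b)
    return sum(normal_logpdf(x, mu, sigma) for x in xs) - len(xs) * math.log(mass)
```

Dropping the -n log[F(b) - F(a)] term and fitting the sample as if it were untruncated would, for a left-truncated sample, overstate the mean of the underlying variate.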

     Sometimes an experimenter can place little faith in sample values that
fall above or below certain levels.  In such cases, the experimenter may not
wish to use the actual values of those observations in estimating parameters.
This situation arises, for example, when a value is below the minimum
detectable limit of an instrument or measurement method.  As with truncation,
censoring may occur on the left (when a certain minimum response is necessary
to make a valid measurement), on the right, or on both sides (double
censoring).  Censoring can be defined more completely by distinguishing Type I
and Type II censoring.  Type I censoring occurs when the number of censored
observations is a random variable (the censoring level is fixed), whereas in
Type II censoring the number of censored observations is fixed.  Censoring is
a property of the sample, whereas truncation is a property of the distribution
(14,15).
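Unlike a truncated sample, a censored sample records how many values fell beyond the limit, so those values still enter the likelihood, through the distribution function. A minimal sketch for Type I left censoring at a known detection limit c, again assuming a normal model for illustration only (function names hypothetical): each measured value contributes log f(x;θ) and each value reported only as "below c" contributes log F(c;θ).

```python
import math


def normal_cdf(x, mu, sigma):
    """Normal distribution function F(x), computed with math.erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def normal_logpdf(x, mu, sigma):
    """Log of the normal density f(x)."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def left_censored_normal_loglik(measured, n_censored, limit, mu, sigma):
    """Type I left censoring at a fixed detection limit:
    sum of log f(x) over measured values + n_censored * log F(limit)."""
    if any(x < limit for x in measured):
        raise ValueError("measured values should not lie below the detection limit")
    ll = sum(normal_logpdf(x, mu, sigma) for x in measured)
    if n_censored > 0:
        ll += n_censored * math.log(normal_cdf(limit, mu, sigma))
    return ll
```

In the Type I setting the count `n_censored` is itself random; the function simply takes whatever count was observed.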

     In both the truncated and censored cases, the parameters of frequency
functions can be estimated by maximum likelihood.  For a continuous variate
with frequency function f(x;θ), the following likelihood function can be
maximized when f(x;θ) is doubly truncated at known points a and b, with a
-------
                                  SECTION 6

                                   SUMMARY


     MAXFIT is a vehicle for applying a statistically valid method of fitting
distributions.  The software uses the maximum likelihood method of estimation
to fit the normal distribution, the 3-parameter lognormal distribution, the
3-parameter gamma distribution, the 3-parameter Weibull distribution, the
Johnson SB distribution, and the 4-parameter beta distribution.  With the
increased availability of high-speed computers, we feel that maximum
likelihood is a better method for fitting probability models than outdated,
inefficient graphical techniques.
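One way to organize such a fit, following the abstract's description (closed-form solutions of the likelihood equations where they exist, a golden section search for the remaining parameters), is sketched below for the 3-parameter lognormal. For a fixed threshold τ, the maximum likelihood estimates of μ and σ are the mean and the divide-by-n standard deviation of log(x_i - τ), so only τ has to be searched. This is a minimal sketch under those assumptions, not MAXFIT's code; the function names, the bracket [lo, hi] and the tolerance are hypothetical choices left to the caller.

```python
import math


def lognormal_profile(xs, tau):
    """For a fixed threshold tau (< min(xs)), return (loglik, mu, sigma) of the
    3-parameter lognormal, with mu and sigma at their closed-form ML values."""
    ys = [math.log(x - tau) for x in xs]
    n = len(ys)
    mu = sum(ys) / n
    sigma = math.sqrt(sum((y - mu) ** 2 for y in ys) / n)  # ML (divide-by-n) estimate
    loglik = -sum(ys) - n * math.log(sigma) - 0.5 * n * math.log(2.0 * math.pi) - 0.5 * n
    return loglik, mu, sigma


def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximize f, assumed unimodal on [lo, hi], by golden section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi, about 0.618
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc > fd:  # maximum lies in [lo, d]; old c becomes the new d
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:  # maximum lies in [c, hi]; old d becomes the new c
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return 0.5 * (lo + hi)


def fit_lognormal3(xs, lo, hi):
    """Search tau in [lo, hi] (hi must lie below min(xs)); mu and sigma follow in closed form."""
    tau = golden_section_max(lambda t: lognormal_profile(xs, t)[0], lo, hi)
    loglik, mu, sigma = lognormal_profile(xs, tau)
    return tau, mu, sigma, loglik
```

Golden section search assumes the profile log likelihood is unimodal on the bracket. For the 3-parameter lognormal the likelihood is known to increase without bound as τ approaches the smallest observation, so the upper end of the bracket should stop short of min(x_i); the search is then aimed at an interior local maximum.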

     Many managers within EPA must make statistical assumptions about the
distribution of air pollution data, and these assumptions may affect agency
policy.  They bear on standard setting, emission roll-back calculations,
estimation of maximum concentrations, threshold approximations, and the
handling of missing observations.  It is hoped that MAXFIT can be used as a
tool to aid in making such assumptions.

     In fitting air quality data, we have found large variations in the
shape of the distributions suggested by the data.  We consider fitting one
distribution to all air quality data to be inadvisable.   Even the two highly
similar data sets presented as examples in this report led to the selection
of different models as "best."  With this software, several models can be
fit and the goodness-of-fit of each can be compared in several ways.  Thus, a
rational decision can be made as to which model would be adequate for a given
data base and purpose.
-------
                                 REFERENCES


1.   Larsen, R. I.  A Mathematical Model for Relating Air Quality
     Measurements to Air Quality Standards.  Publication No. AP-89, U.S.
     Environmental Protection Agency, Research Triangle Park, North Carolina,
     1971.

2.   Larsen, R. I.  An Air Quality Data Analysis System for Interrelating
     Effects, Standards, and Needed Source Reductions.  J. Air Poll. Control
     Assoc. 23:993, 1973.

3.   Larsen, R. I.  An Air Quality Data Analysis System for Interrelating
     Effects, Standards, and Needed Source Reductions.  Part 2, J. Air Poll.
     Control Assoc. 24:551, 1974.

4.   Larsen, R. I.  An Air Quality Data Analysis System for Interrelating
     Effects, Standards, and Needed Source Reductions.  Part 3, J. Air Poll.
     Control Assoc. 26:325, 1976.

5.   Larsen, R. I.  An Air Quality Data Analysis System for Interrelating
     Effects, Standards, and Needed Source Reductions.  Part 4, J. Air Poll.
     Control Assoc. 27:454, 1977.

6.   Mage, D. T., and W. R. Ott.  Refinements of the Lognormal Probability
     Model for Analysis of Aerometric Data.  J. Air Poll. Control Assoc.
     28(8):796-798, 1978.

7.   Ott, W. R., and D. T. Mage.  A General Purpose Univariate Probability
     Model for Environmental Data Analysis.  Comput. & Ops. Res.
     3:209-216, 1976.

8.   Curran, T. C., and N. H. Frank.  Assessing the Validity of the Lognormal
     Model When Predicting Maximum Air Pollution Concentrations.  Annual
     Meeting of the Air Pollution Control Association, Boston, Massachusetts,
     1975.

9.   Rao, C. R.  Linear Statistical Inference and Its Applications.  2nd Ed.,
     John Wiley and Sons, New York City, New York, 1973.

10.  Schreuder, H. T., W. L. Hafley, E. W. Whitehorne, and B. T. Dare,
     Maximum Likelihood Estimation for Selected Distributions (MLESD).
     Tech. Report No. 61, School of Forest Resources, North Carolina State
     University, Raleigh, North Carolina, 1978.

11.  Box, M. J., D. Davies, and W. H. Swann.  Non-linear Optimization
     Techniques.  Oliver & Boyd, Edinburgh, Scotland, 1969.

12.  Rodes, C. R., and D. M. Holland.  NCu/CL Sampler Siting Study.   (To be
     published as an EPA Environmental Monitoring Series Report).

13.  IMSL Subroutine Library, Vol. 1 & 2, International  Mathematical  and
     Statistical Library, Inc., Houston, Texas, 1975.

14.  Hald, A.  Maximum Likelihood Estimation of the Parameters of  a  Normal
     Distribution which is Truncated at a Known Point.  Skandinavisk
     Aktuarietidskrift, Vol. 32:119, 1949.

15.  Kendall, M. G., and A. Stuart.  The Advanced Theory of Statistics.
     Vol. 2, 2nd Ed.  Hafner Publishing Co., New York City, New York, 1967.

16.  Johnson, N. L., and S. Kotz.  Continuous Univariate Distributions - 1.
     Houghton Mifflin Co., Boston, Massachusetts, 1970.
-------
                  APPENDIX A.  DISTRIBUTION DESCRIPTIONS




Normal Distribution:
-------
APPENDIX A (continued)
     Estimates:
          grouped data:  ξ and λ are searched for, y = Σ log( … )
          ungrouped data:  ξ and λ are searched for, y = Σ log( … )
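Gamma Distribution:
     Density:
$$f(x) = \frac{(x-\xi)^{\alpha-1}\,\exp[-(x-\xi)/\beta]}{\beta^{\alpha}\,\Gamma(\alpha)}, \qquad x > \xi, \qquad \text{where} \quad \Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx.$$
     Estimates:
          grouped data:  as for ungrouped data, with x_i the class mid-points,
                         each term weighted by the class frequency n_i, and n = Σ n_i.
          ungrouped data:  ξ is searched for; for each trial ξ, α must satisfy
$$\ln\alpha - \psi(\alpha) = \ln\Big[\frac{1}{n}\sum_{i=1}^{n}(x_i-\xi)\Big] - \frac{1}{n}\sum_{i=1}^{n}\ln(x_i-\xi), \qquad \beta = \frac{\sum_{i=1}^{n}(x_i-\xi)}{n\,\alpha},$$
                         where ψ is the digamma function.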
-------
APPENDIX A (continued)

Beta Distribution:
     Density:   f(x) =
     Estimates:
           ungrouped data:  α, β, ξ, and λ are searched for.
           grouped data:    α, β, ξ, and λ are searched for.
-------
                  APPENDIX B
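      SUBROUTINE USER(X,XB,NGRP,TF,N,NC,BEGIN,STEP,TOL,
     *TITLE,XMIN,XMAX,NDIST)
C     READ NOX VALUES FOR SITE 3 RECORDS FLAGGED IFL (HIGH O3 LEVEL)
C     INTO X AND SET THE CONTROL VALUES PASSED BACK TO MAXFIT
      IMPLICIT DOUBLE PRECISION (A-H,O-Z)
      REAL TITLE(6)
      DATA IFL/'H'/
      DIMENSION X(1),XB(1),NDIST(1),P(1)
      I=0
C     START THE RUNNING MAXIMUM AND MINIMUM FAR OUTSIDE THE DATA RANGE
      XMAX=-9.D6
      XMIN=9.D6
      TOL=.0001D0
C     PLOT TITLE, SIX CHARACTERS PER ARRAY ELEMENT
      TITLE(1)='LACS S'
      TITLE(2)='ITE 3 '
      TITLE(3)=' NOX  '
      TITLE(4)=' HIGH '
      TITLE(5)='O3 LEV'
      TITLE(6)='EL    '
C     READ ONE RECORD; AT END OF FILE GO TO 3
    2 READ(5,10,END=3) IDZL,TC,AVGSP,ISITE,RNOX
   10 FORMAT(R1,T17,F7.0,F6.2,I3,T45,F5.3)
C     SKIP OTHER SITES, RECORDS WITHOUT THE IFL FLAG,
C     AND NOX VALUES NOT ABOVE 1.D-6
      IF(ISITE.NE.3.OR.IDZL.NE.IFL) GO TO 5
      IF(RNOX.LE.1.D-6) GO TO 5
      I=I+1
      X(I)=RNOX
      IF(X(I).LT.XMIN) XMIN=X(I)
      IF(X(I).GT.XMAX) XMAX=X(I)
    5 GO TO 2
C     SET NDIST(J)=1 FOR ALL SIX DISTRIBUTIONS
    3 DO 4 J=1,6
    4 NDIST(J)=1
      NGRP=0
      N=I
      TF=N
      NC=0
C     HISTOGRAM: START JUST BELOW THE SMALLEST VALUE AND DIVIDE
C     THE DATA RANGE INTO 12 EQUAL CLASSES
      BEGIN=XMIN-1.D-5
      STEP=(XMAX-XMIN)/12.D0
      RETURN
      END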
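The selection logic of subroutine USER can be restated compactly. A minimal Python sketch, assuming the card images have already been parsed into (flag, site, NOx) tuples, so the FORMAT column positions play no role here; the function name and tuple layout are hypothetical, and the flag value 'H' is the value assumed for IFL.

```python
def select_nox(records, site=3, flag="H", floor=1.0e-6):
    """Keep NOx values for the given site and flag that exceed `floor`, and return
    them with the histogram start (just below the minimum) and a 12-class width."""
    xs = [nox for rec_flag, rec_site, nox in records
          if rec_site == site and rec_flag == flag and nox > floor]
    if not xs:
        raise ValueError("no records selected")
    xmin, xmax = min(xs), max(xs)
    begin = xmin - 1.0e-5          # BEGIN=XMIN-1.D-5
    step = (xmax - xmin) / 12.0    # STEP=(XMAX-XMIN)/12.D0
    return xs, begin, step


# Usage: only the site 3, flag 'H', positive-NOx records survive.
records = [("H", 3, 0.012), ("L", 3, 0.5), ("H", 2, 0.3), ("H", 3, 0.0), ("H", 3, 0.036)]
xs, begin, step = select_nox(records)
```

Here `xs` is [0.012, 0.036], so `begin` is 0.01199 and `step` is 0.002.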
-------
                                   TECHNICAL REPORT DATA
                            (Please read Instructions on the reverse before completing)
1. REPORT NO.
  EPA-600/4-79-044
2.
                             3. RECIPIENT'S ACCESSION NO.
4. TITLE AND SUBTITLE
  THE MAXIMUM LIKELIHOOD APPROACH  TO  PROBABILISTIC
  MODELING OF AIR QUALITY DATA
                             5. REPORT DATE
                               July  1979
                             6. PERFORMING ORGANIZATION CODE
7. AUTHOR(S)
  Terence Fitz-Simons
  David M. Holland
                             8. PERFORMING ORGANIZATION REPORT NO.
9. PERFORMING ORGANIZATION NAME AND ADDRESS
                                                           10. PROGRAM ELEMENT NO.
                                                               IAD 883
                                                           11. CONTRACT/GRANT NO.
12. SPONSORING AGENCY NAME AND ADDRESS
  Environmental Monitoring  and  Support Laboratory
  Office of Research and  Development
  U.S.  Environmental Protection Agency
  Research Triangle Park, N.C.  27711
                              13. TYPE OF REPORT AND PERIOD COVERED

                                Final
                              14. SPONSORING AGENCY CODE
                                  EPA-600/08
15. SUPPLEMENTARY NOTES
16. ABSTRACT
            Software using maximum  likelihood  estimation to fit six probabilistic
  models  is presented.  The software  is designed  as a tool for the air pollution
  researcher to determine what assumptions  are valid in the statistical analysis
  of air  pollution data for the purposes of standard setting, roll-back calculations,
  estimation of maximum concentrations, threshold approximations, and handling
  missing observations.  The program  fits user's  data to the normal distribution,
  the 3~parameter lognormal distribution, the 3~parameter Weibull distribution,
  the 3~parameter gamma distribution,  the Johnson SR distribution  (a 4-parameter
  lognormal distribution), and the ^-parameter beta distribution.  The parameters
  are estimated using standard closed  solutions to maximizing equations, and a
  golden  section search for all other  parameters.  Graphical output contains a
  histogram of the data superimposed  by the fitted density for each model.  Six
  goodness-of-f it criteria are supplied and ranked by the program to aid  in the
  selection of the most appropriate choice  among  the six models.  These criteria
  are absolute deviations  (AD statistic), weighted absolute deviations  (WAD
  statistic), Kolmogorov-Smirnov statistic, Cramer-von Mises-Smirnov statistic,
  the log-likelihood function, and the observed significance level of the Chi-
  square  goodness-of-f it test.  The results of applying the program to several
  subsets of the Los Angeles Catalyst  Study data  base are presented.
17.
                                KEY WORDS AND DOCUMENT ANALYSIS
a.                DESCRIPTORS
                                              b. IDENTIFIERS/OPEN ENDED TERMS  c. COSATI Field/Group
  Maximum  Likelihood estimations,
  Lognormal,  gamma, Weibull, beta,
  Johnson SB, Johnson SL Distribution
                  Statistics
                  Statistical  Modeling
                               43F
                               68A
18. DISTRIBUTION STATEMENT
   Release  to  Public
19. SECURITY CLASS (This Report)
  Unclassified
                                           21. NO. OF PAGES
                                              33
                                              20. SECURITY CLASS (This page)
                                                Unclassified
                                                                        22. PRICE
EPA Form 2220-1 (9-73)
-------