EPA
United States
Environmental Protection
Agency
Office of Monitoring and Technical Support
Washington DC 20460
EPA-600/4-79-040
June 1979
Research and Development
Testing the
Validity of the
Lognormal
Probability Model
Computer
Analysis of Carbon
Monoxide Data from
U.S. Cities
-------
RESEARCH REPORTING SERIES
Research reports of the Office of Research and Development, U.S. Environmental
Protection Agency, have been grouped into nine series. These nine broad cate-
gories were established to facilitate further development and application of en-
vironmental technology. Elimination of traditional grouping was consciously
planned to foster technology transfer and a maximum interface in related fields.
The nine series are:
1 Environmental Health Effects Research
2 Environmental Protection Technology
3 Ecological Research
4 Environmental Monitoring
5 Socioeconomic Environmental Studies
6 Scientific and Technical Assessment Reports (STAR)
7 Interagency Energy-Environment Research and Development
8 "Special" Reports
9 Miscellaneous Reports
This report has been assigned to the ENVIRONMENTAL MONITORING series.
This series describes research conducted to develop new or improved methods
and instrumentation for the identification and quantification of environmental
pollutants at the lowest conceivably significant concentrations. It also includes
studies to determine the ambient concentrations of pollutants in the environment
and/or the variance of pollutants as a function of time or meteorological factors.
-------
EPA-600/4-79-040
June 1979
TESTING THE VALIDITY OF THE LOGNORMAL PROBABILITY MODEL:
COMPUTER ANALYSIS OF CARBON MONOXIDE DATA FROM U.S. CITIES
Wayne R. Ott
David T. Mage
Victor W. Randecker*
OFFICE OF MONITORING AND TECHNICAL SUPPORT
OFFICE OF RESEARCH AND DEVELOPMENT
U.S. ENVIRONMENTAL PROTECTION AGENCY
WASHINGTON, D.C. 20460
* Currently with the Food Safety and Quality Service, U.S. Department
of Agriculture, Washington, D.C.
-------
DISCLAIMER
This report has been reviewed by the Office of Research
and Development, U.S. Environmental Protection Agency, and
approved for publication. Mention of trade names or commercial
products does not constitute endorsement or recommendation for
use.
ii
-------
FOREWORD
The U.S. Environmental Protection Agency (EPA) was created
because of increasing public and governmental concern about the
dangers of pollution to the health and welfare of the American
people. Polluted air, foul water, and spoiled land are tragic
testimony to the deterioration of our natural environment. The
complexity of that environment and the interplay among its com-
ponents require a concentrated and integrated attack on the problem.
Physical, chemical, and biological measurements of environ-
mental quality are the most important means available for deter-
mining the state-of-the-environment. These monitoring data pro-
vide the basis for evaluating the Nation's progress in improving
environmental quality and are essential to environmental managers
and decision makers. Although great quantities of monitoring
data are collected routinely across the Nation, the statistical
and mathematical analyses performed upon these data usually are
quite rudimentary. In recent years, there has been increased
concern over the need to carry out more in-depth analyses of these
data. Thus, EPA has increased its emphasis on the development
of new data analysis techniques.
This report describes one of a series of projects that
EPA's research program is conducting to demonstrate new methods
for handling, analyzing, and interpreting environmental data.
By providing greater insight into the statistical properties
of environmental measurements, we hope that the results of this
work will provide information that is useful in the regulatory
process for setting policies and standards. We also hope that
this work will help stimulate more comprehensive analyses of
environmental data by the institutions that collect these
data—municipalities, counties, states, federal agencies, pri-
vate firms, and universities.
Albert C. Trakowski
Deputy Assistant Administrator
for Monitoring and Technical Support
iii
-------
PREFACE
This report is one of several that examine the frequency
distributions of observed environmental pollutants. This report
focuses on a particular probability model — the two-parameter
lognormal (LN2) probability model — and it evaluates the suit-
ability of this model for representing U.S. carbon monoxide (CO)
air quality frequency distributions. Subsequent reports will
examine other probability models, other pollutants, and other
approaches (such as graphical techniques) for applying and
evaluating various candidate models.
Although the LN2 model frequently has been used by air
pollution field personnel, few air quality studies have treated
the properties of the LN2 model in detail or examined how well
the model fits the raw observations. In this study, a computer
program has been developed which can apply the LN2 model to any
CO data set, generating summary information about the goodness-
of-fit of the model to the data. This computer program has been
applied to a carefully selected nationwide sample of CO data
sets. To make our work as complete as possible, this report
fully documents the properties of the LN2 model, deriving all
necessary formulas and equations. Thus, we hope it will serve
as a reference document for others in the air pollution field.
This report has required approximately five years for its
completion, primarily because the authors have full-time duties
that involve other aspects of EPA's research program. A substan-
tial portion of this work has been carried out on the authors'
own time, and the present report was typed by the senior author.
Despite the slow progress of our work, we hope that the present
study is the most exhaustive and complete treatment currently
available of the application of the LN2 model to CO frequency
distributions. We also hope that, by demonstrating one of the
many possible analytical techniques that can be applied to
environmental data, we will help increase interest in the develop-
ment of a formal, fully-funded statistical research and develop-
ment activity within the EPA.
iv
-------
ABSTRACT
A stratified sample consisting of 11 data sets from an
original list of 166 carbon monoxide (CO) air quality data sets
in SAROAD was selected as a "national cross section" of observed
U.S. CO concentrations, along with a longitudinal group of data
sets from a single air monitoring station. The adequacy of the
two-parameter lognormal (LN2) probability model then was evaluated
using these data. A special-purpose computer program was devel-
oped for calculating the parameters of the LN2 model using four
different techniques: (1) direct "Method of Moments," (2) Larsen's
"Method of Fractiles," (3) "Maximum Likelihood Estimation" (MLE)
for grouped data using an MLE approximation technique, and (4)
MLE for grouped data using computer optimization.
The goodness-of-fit of the LN2 model to the data was eval-
uated using frequency-based approaches (e.g., chi-square, magni-
tude of the log-likelihood function, Kolmogorov-Smirnov measures)
and variate-based approaches (e.g., the difference between the
concentration predicted by the model for each interval and the
concentration actually observed). The findings show that the
method of calculating parameters for the LN2 model exerts a pro-
found influence on the goodness-of-fit of the model to the data,
particularly at the higher concentrations of greatest interest
for protecting public health. The method of fractiles performs
best for the variate-based tests but poorest for the frequency-
based tests, suggesting that it is responding more to randomness
at the extremes than to the underlying distribution of the process.
v
-------
ACKNOWLEDGMENTS
Grateful appreciation is expressed to the following indi-
viduals, all employees of the Environmental Protection Agency,
who were kind enough to review this report and offer many useful
suggestions and comments: Ralph Larsen, Lance Wallace, Teri
Gardinier, Robert Papetti, William F. Hunt, Gerald Akland,
Terence Fitz-Simons, David Holland, and Irene Kiefer (consulting
technical editor). We wish to thank them for their excellent
technical reviews and the many thoughtful suggestions we re-
ceived.
vi
-------
CONTENTS
FOREWORD iii
PREFACE iv
ABSTRACT v
ACKNOWLEDGMENTS vi
I. INTRODUCTION 1
STRUCTURE OF REPORT 4
II. LITERATURE REVIEW 6
III. STRUCTURE OF THE NORMAL AND LOGNORMAL PROBABILITY
MODELS 14
MOMENTS 16
NORMAL PROBABILITY MODEL 21
LOGNORMAL PROBABILITY MODEL 29
IV. CALCULATION OF PARAMETERS 45
MOMENTS OF THE DATA 45
METHOD OF MOMENTS 49
METHOD OF FRACTILES 52
MAXIMUM LIKELIHOOD ESTIMATION 61
MLE Approximation 64
MLE Optimization 67
V. GOODNESS-OF-FIT 70
FREQUENCY-BASED MEASURES 70
VARIATE-BASED MEASURES 74
VI. DESCRIPTION OF THE COMPUTER PROGRAM 81
MAIN PROGRAM 81
SUBROUTINES 83
VII. SELECTION OF THE DATA 95
GEOGRAPHICAL GROUP 98
LONGITUDINAL GROUP 109
VIII. RESULTS FOR THE GEOGRAPHICAL DATA GROUP 111
FINDINGS FOR INDIVIDUAL CITIES 111
Barstow, CA 113
Norfolk, VA 118
Alexandria, VA 122
Denver, CO 126
Philadelphia, PA 130
Napa, CA 133
Phoenix, AZ 136
Newhall, CA 141
Springfield, MA 144
Pasadena, CA 147
New York City, NY 150
Discussion 153
vii
-------
OVERALL FINDINGS 155
Discussion 170
IX. RESULTS FOR THE LONGITUDINAL DATA GROUP 173
FINDINGS FOR INDIVIDUAL YEARS 173
1966 175
1968 176
1969 182
1970 183
1971 186
1972 187
1973 194
OVERALL FINDINGS 195
Discussion 197
X. CONCLUSIONS 204
REFERENCES 208
APPENDICES
A. LISTING OF THE COMPUTER PROGRAM 215
B. SAMPLE OUTPUT FOR NEW YORK CITY 231
C. BASIC STATISTICS, HISTOGRAMS, AND AUTOCORRELATIONS
FOR THE GEOGRAPHICAL DATA GROUP 259
D. BASIC STATISTICS, HISTOGRAMS, AND AUTOCORRELATIONS
FOR THE LONGITUDINAL DATA GROUP 305
viii
-------
I. INTRODUCTION
Our review of the research literature (Chapter II) shows
that numerous claims have been made that air pollution concen-
tration data, measured in the ambient air, are "lognormally
distributed." Despite the frequent use of the phrase "log-
normally distributed," few authors state specifically what they
mean by the phrase. When is a distribution judged "lognormal,"
and how does one make this determination? If some objective
criterion were available for lognormality, to what degree would
air quality data for cities throughout the U.S. meet this cri-
terion?
A reader carefully reviewing the air pollution literature
may be struck even more by the lack of standard statistical
nomenclature and a lack of application of well known goodness-
of-fit tests. The standard chi-square test, for example, which
is useful for examining the hypothesis that the underlying
distribution from which a random sample was drawn is lognormal,
seldom has been applied in articles on data collected by the
air pollution control community. Rather, the usual approach has
been to plot the data on logarithmic probability paper. If,
upon subjective examination, the resulting line "looks straight,"
the claim is made that the data are "lognormally distributed."
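The chi-square procedure mentioned above can be sketched in a few lines. This is an illustrative reconstruction, not the report's own program: the equiprobable binning, the bin count, and the function names are assumptions.

```python
import math
import random
from statistics import NormalDist

def ln2_chi_square(data, n_bins=10):
    """Pearson chi-square statistic for the hypothesis that `data`
    follow a two-parameter lognormal (LN2) distribution.
    Bins are chosen to be equiprobable under the fitted model."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / (n - 1))
    fitted = NormalDist(mu, sigma)
    # Interior bin edges at equally spaced quantiles of the fitted model.
    edges = [fitted.inv_cdf(k / n_bins) for k in range(1, n_bins)]
    counts = [0] * n_bins
    for v in logs:
        counts[sum(1 for e in edges if v >= e)] += 1
    expected = n / n_bins  # equal expected count in every bin
    return sum((c - expected) ** 2 / expected for c in counts)

random.seed(1)
sample = [random.lognormvariate(1.0, 0.5) for _ in range(500)]
chi2 = ln2_chi_square(sample)
print(round(chi2, 2))  # compare against a chi-square critical value
```

A statistic well above the critical value for the appropriate degrees of freedom would lead a statistician to reject lognormality, however straight the probability-paper plot may look.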
After careful consideration, we have concluded that two
general philosophies of data analysis currently are being
-------
followed: (1) the "statisticians' approach," and (2) the
"engineers' approach." Each philosophy addresses a different
question. The statistician, generally having a random sample
from some larger population, seeks to make inferences about the
characteristics of the larger population. The engineer, on the
other hand, often possessing the entire population of data, is
interested in representing the population graphically or esti-
mating the approximate impact of regulatory pollution control
decisions upon this population. The two philosophies, because
they are fundamentally different, produce interesting inconsis-
tencies. For example, the engineer may plot environmental data
on logarithmic probability paper and decide that the resulting
line "looks straight" and that the lognormal model is a good
candidate for his application. In contrast, the statistician
may apply the chi-square test to the same data and conclude that
he should reject the hypothesis that a given data set has a log-
normal distribution (i.e., arises from a lognormally distributed
population). Very early in this investigation, we observed that
data sets which appear as straight lines on logarithmic proba-
bility paper can seriously fail the chi-square and other sta-
tistical tests.
We began considering the implications of the two approaches
in 1975. Although the theoretical basis for the statisticians'
approach is well developed, we were troubled that the engineers'
approach appears to have slipped into the scientific literature
with little fanfare and no formal theoretical explanation.
-------
Except for a recent technical note by Mage and Ott, few papers
have discussed the existence of these two fundamentally different
approaches. Furthermore, no objective criteria are available to
assist the engineer in determining whether a given model is suit-
able for a given application. Thus, the engineers' approach
potentially may be misunderstood by the statistical community,
resulting in criticism of a methodology that may be reasonable
from an engineering standpoint.
Because of the existence of two disparate data analysis
philosophies, there currently is much ambiguity about the claim
that a given data set "fits" or "does not fit" a particular
distribution, and there is much opportunity for confusion. An
important purpose of the present study was to alleviate possible
confusion arising from these two philosophies, with particular
emphasis on the engineers' approach. By giving additional devel-
opment and structure to the engineers' approach to air quality
data analysis, we hope that we will move one step further to pro-
viding engineers with objective, uniform tools by which they can
carry out their jobs.
In planning this study, our goal was to conduct the most
thorough and extensive evaluation ever undertaken of the suitabil-
ity of the 2-parameter lognormal (LN2) probability model for
representing carbon monoxide (CO) air quality frequency distri-
butions in the United States. Although some past studies have
dealt extensively with the LN2 and other distributions, few have
applied formal goodness-of-fit tests to the data. In addition,
-------
most past studies were "data limited"; they often used just one
or two data sets that were not representative of any particular
universe. This study has sought to improve upon previous studies,
thereby advancing the state of knowledge about this topic, by
incorporating the following features:
• Two data groups are used: (1) a "geographical data group"
designed to represent a cross section of U.S. cities in
a particular year, and (2) a "longitudinal data group"
designed to represent a number of different years in one
U.S. city.
• Four different ways to compute the parameters of the
2-parameter lognormal probability model are compared and
examined in detail: (1) method of moments, (2) method of
fractiles, (3) an approximation of maximum likelihood
estimation for grouped data, and (4) exact maximum likelihood
estimation using computer optimization.
• A number of different goodness-of-fit tests are used to
evaluate the adequacy of the model for use with air
quality data from U.S. cities.
• An engineering error test (EET) is developed that enables
the performance of a model to be evaluated in the context
of a criterion familiar to the air pollution professional,
namely, the difference between the concentration pre-
dicted by the model and the concentration actually
observed, expressed in parts-per-million (ppm).
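The flavor of a variate-based error criterion can be sketched as follows. The report defines its own interval-based EET; this version, including the i/(n+1) plotting position and the max-error summary, is a simplified assumption in the same spirit (model-predicted minus observed concentration, in ppm).

```python
import math
import random
from statistics import NormalDist

def variate_errors(observed_ppm, mu, sigma):
    """For each sorted observation, the difference (in ppm) between the
    concentration an LN2 model predicts at the same cumulative
    frequency and the concentration actually observed."""
    xs = sorted(observed_ppm)
    n = len(xs)
    fitted = NormalDist(mu, sigma)
    errors = []
    for i, x in enumerate(xs, start=1):
        p = i / (n + 1)                          # plotting position
        predicted = math.exp(fitted.inv_cdf(p))  # LN2 quantile, in ppm
        errors.append(predicted - x)
    return errors

random.seed(7)
co_ppm = [random.lognormvariate(1.2, 0.6) for _ in range(200)]
errors = variate_errors(co_ppm, mu=1.2, sigma=0.6)
worst = max(abs(e) for e in errors)
print(round(worst, 2))
```

Expressing the worst-case disagreement in ppm, rather than as an abstract test statistic, is what makes such a criterion directly interpretable to the air pollution professional.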
STRUCTURE OF REPORT
The overall structure of this report should be relatively
easy for the reader to follow. Chapter II consists of a liter-
ature review, including a historical summary of applications of
4
-------
the lognormal probability model. Chapter III presents the basic
structure of the normal and lognormal probability models, and
the equations commonly associated with these models are derived
and discussed in detail. Chapter IV describes the methodology
used in this study to calculate parameters for the lognormal
probability model from the CO data sets, and Chapter V describes
the methodology used to evaluate the goodness-of-fit of the
model to the data. The methodology described in Chapters III,
IV, and V is incorporated into a computer program, HIST.IIB,
which is described in Chapter VI (and listed in Appendix A).
Chapter VII tells how the air quality data used in this inves-
tigation were selected. The remaining three chapters cover the
results obtained by applying the computer program to the air
quality data sets described in Chapter VII. Chapter VIII pre-
sents the results for the geographical data group, and Chapter
IX presents the results for the longitudinal data group. Finally,
the overall conclusions of this research investigation can be
found in Chapter X. The appendices include the computer program
along with a sample of the full output for one city, and portions
of the outputs containing the basic statistics for all cities
in the geographical and longitudinal data groups.
2. A companion report that is planned for publication in 1980
discusses these data sets in the context of a variety of other
probability models and presents graphical techniques for selecting
and applying the models.
-------
II. LITERATURE REVIEW
As noted by Aitchison and Brown, the two-parameter lognor-
mal distribution has a long history of application in the field
of small particle statistics. The size distribution of small
particles produced in crushing and grinding operations has been
found to be approximately lognormal,[4,5] and a theoretical ration-
ale for the lognormality of particles resulting from breakage
processes has been given by Kolmogorov. Analyses of the dust
generated by industrial grinding processes have shown the LN2
model to be a good descriptor of the size distribution of these
particles.[7]
It was logical to expect that this work would be extended
to describe the distributions of particles in the atmosphere. In
1956, Harris and Tabor applied the LN2 model to air pollution
particle statistics, and in 1959 Zimmer, Tabor, and Stern[9] de-
scribed concentrations of total suspended particulate (TSP) in
both urban and nonurban ambient atmospheres as approximately
LN2. Although they noted that there were some marked deviations
from lognormality, they concluded that "...in general, concen-
trations of particulate matter are lognormally distributed."[9]
More recently, deNevers, Lee, and Frank have suggested that
the marked deviations from lognormality associated with TSP data
may be caused by sampling from two separate lognormal distri-
butions.
-------
Larsen extended the application of the LN2 model from
TSP to the gaseous air pollutants, and, in a 1961 paper, he
stated that carbon monoxide and oxidant concentrations in the
Los Angeles area "...indicate that the variables tend to be
logarithmically distributed." In subsequent papers, Larsen[12,13]
continued to apply the LN2 model to air quality data, and in
1964 Larsen[14] expressed the suitability of the LN2 model for
air quality data in general terms: "Air pollutant concentration
data usually fit the bell (Gaussian) shape, if concentration is
plotted on a logarithmic scale." Larsen and his associates[15-17]
applied the LN2 model to additional air quality data sets, and
in a 1967 paper they generalized even further about the suitabil-
ity of the LN2 model for air quality data: "...concentrations
are approximately lognormally distributed for all pollutants in
all cities for all averaging times."[18] Larsen[19-23] applied the
LN2 model to a variety of pollutants and cities and further
refined his "averaging time model" so that the LN2 parameters
and maxima could be calculated for any averaging time. In a
1971 report, Larsen[24] presented the averaging time model in
considerable detail and concluded: "Pollutant concentrations are
lognormally distributed for all averaging times."
In 1972, Lowrimore[25] derived the probability distribution
of the sums of lognormally distributed random variables and pro-
posed a compound Poisson-lognormal distribution for some environ-
mental applications.[26] From 1972 to 1975 Larsen and others
continued to develop the LN2 model and apply it to air quality
7
-------
data, generally to the exclusion of any other probability models.
During this period and earlier, some authors attempted to
explain the apparent lognormality of air quality data in theo-
retical terms. A 1969 paper by Gifford[34] offered a theoretical
basis for the LN2 model, and several papers containing theo-
retical explanations were presented at a 1972 symposium on
statistical aspects of air quality data; others have
appeared elsewhere in the literature. The explanations usually
are based either on statistical theory or on properties
of the meteorological data.[40,41] The statistical explanations
usually involve the central limit theorem and the diffusion
law, while the meteorological explanations assume that one of
the meteorological variables, such as wind speed, is lognormally
distributed.
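The central-limit rationale can be illustrated by simulation: if a concentration arises as the product of many independent positive random factors, its logarithm is a sum of independent terms, which the central limit theorem drives toward normality. The factor distribution and counts below are arbitrary illustrations, not taken from the cited papers.

```python
import math
import random
from statistics import mean, stdev

random.seed(42)

def diluted_concentration(n_factors=50):
    """A concentration formed as the product of many independent
    positive random factors; its log is a sum of independent terms."""
    c = 100.0
    for _ in range(n_factors):
        c *= random.uniform(0.5, 1.5)  # arbitrary positive factor
    return c

logs = [math.log(diluted_concentration()) for _ in range(2000)]
m, s = mean(logs), stdev(logs)
# Sample skewness of the logs; near zero suggests approximate normality
# of log concentration, i.e., approximate lognormality of concentration.
skew = sum(((v - m) / s) ** 3 for v in logs) / len(logs)
print(round(skew, 2))
```

The untransformed concentrations themselves are strongly right-skewed; it is only their logarithms that the summation argument pushes toward the symmetric normal shape.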
Although a large number of applications of the LN2 model
to air quality data had been undertaken by the mid-1970's, few
authors had examined the goodness-of-fit of the model to the data
or had considered other candidate models. The first comprehen-
sive comparison of the LN2 model with other models was made by
Lynn[42] in 1974. Lynn examined TSP data from Philadelphia and
examined the suitability of the normal, LN2, and three-parameter
lognormal (LN3) probability models, along with the Pearson
Types I and IV distributions and the Gamma distribution. Lynn
concluded that "...the two-parameter lognormal does overall
slightly better than the three-parameter and in fact does the
best of all four distributions (LN2, LN3, Gamma and four-
8
-------
parameter Pearson)." This result was unexpected, because the
LN2 model really is a special case of the LN3 model for which
the LN3 third parameter is zero.
The near-unanimous acceptance of the LN2 model continued
until 1974 when Mage[43] demonstrated that the LN3 model was
superior to the LN2 model for every air quality data set con-
sidered, provided that the model is censored when the third pa-
rameter is negative. Lack of censoring apparently explains why
Lynn[42] did not find the LN3 model to be superior to the LN2 model.
In 1975 and 1976, Mage and Ott[44-46] recommended the censored
three parameter lognormal (LN3C) model as a general-purpose
model which could represent air quality frequency distributions
in a large number of circumstances. The LN3 model is a special
case of the LN3C model for which the third parameter is positive,
and the LN2 model is a special case for which the third parameter
is zero. Mage and Ott[44-46] also suggested a possible theoretical
rationale for this model: They transformed the second-order
partial differential equation for diffusion into a first-order
linear differential equation (Langevin equation) and found that,
in one special case at least, the LN3C model might naturally
arise. They also applied the LN3C model to water quality data.
In the mid-1970's, several investigators began to apply
goodness-of-fit criteria to the models, and, upon careful inspec-
tion, were finding that the LN2 model did not fit air quality
data especially well. In 1976, Kalpasanov and Kurchatova[47]
applied the LN2 model to Bulgarian air quality data and found
-------
that it was rejected by the Kolmogorov-Smirnov goodness-of-fit
criterion. In some cases, the normal probability model exhibited
a goodness-of-fit that was superior to the LN2 model. Using the
exponential model as a reference, Curran and Frank[48] analyzed the
"tails" of distributions of air quality data. They found that,
in general, the cumulative frequencies of the data approached
unity more rapidly than the cumulative distribution function
(CDF) of the exponential probability model. They called this
property "light tailed." They showed that the LN2 was "heavy
tailed" since its CDF approached unity more slowly than the
exponential model. They reasoned that a light-tailed distribu-
tion, such as the Weibull probability model, should be considered
for fitting light-tailed data sets.
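The heavy/light tail distinction can be made concrete by evaluating survival functions, 1 − CDF, for three unit-mean distributions far out in the upper tail. The choice of x = 5 and the parameter values below are illustrative assumptions.

```python
import math
from statistics import NormalDist

def surv_exponential(x):           # mean 1: S(x) = exp(-x)
    return math.exp(-x)

def surv_lognormal(x, sigma=1.0):  # mean 1 requires mu = -sigma**2 / 2
    mu = -sigma ** 2 / 2.0
    return 1.0 - NormalDist(mu, sigma).cdf(math.log(x))

def surv_weibull(x, k=2.0):        # mean 1 requires scale 1/Gamma(1 + 1/k)
    lam = 1.0 / math.gamma(1.0 + 1.0 / k)
    return math.exp(-((x / lam) ** k))

x = 5.0  # far out in the upper tail; all three models have unit mean
heavy = surv_lognormal(x) > surv_exponential(x)  # LN2: heavy tailed
light = surv_weibull(x) < surv_exponential(x)    # Weibull, k>1: light tailed
print(heavy, light)  # prints: True True
```

The lognormal survival probability exceeds the exponential's at this point while the Weibull's falls far below it, which is exactly the heavy- versus light-tailed contrast motivating the use of the Weibull for light-tailed data.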
The Weibull distribution was applied to environmental prob-
lems by Mikolaj,[49] who modeled the lengths of oil slicks caused by
oil tanker spills. Johnson[50] has reported that ozone data appear
better suited to the Weibull model than to the LN2 model. Bencala
and Seinfeld[51] have applied the Weibull, Gamma, LN2, and LN3 dis-
tributions to air quality data and compared their relative good-
ness-of-fit. Although they state that the LN3 model gives the
best fit of all candidate models, they conclude that the air
quality data are approximately LN2 in distributional form.
In 1977, Larsen[52] adopted the LN3 and LN3C models and
recommended their use over the standard LN2 model for analysis
of air quality data. As noted by Mage and Ott,[44-46] Larsen's imple-
mentation of the LN3C differed from previous authors in the
manner in which the third parameter was defined, and his initial
10
-------
paper contained several errors. Larsen[53-56] continued to apply
the LN3 and LN3C models to a variety of air quality data sets
and generally favored the use of the three-parameter form.
More complicated models than the three-parameter types
also have been applied. Mage[57] has extended the lognormal
application to the four-parameter lognormal probability (LN4)
model, which also is known as the Johnson S_B distribution.[58]
He showed that introduction of the fourth parameter allows
inclusion of a finite concentration which is approached asymp-
totically as the CDF approaches unity, thereby giving the light
tail that is necessary to fit higher air pollutant concentrations.
In a recent study, Ledolter and his colleagues[59] applied
a power transform to CO air quality data. This transformation,
also known as the Box-Cox transform, makes it possible to test
whether various power functions, such as the square root or the
cube root, are more suitable than the normal (untransformed)
distribution. In the limit, as the nth root is taken, and |n|
becomes very large (i.e., 1/n approaches 0), the distribution
approaches the lognormal probability model. For the data they
examined, they found that maximum likelihood estimates gave
relatively large values of |n|, and they concluded that "...the
lognormal distribution (or the class of power transforms in
general) can provide a good overall description of the frequency
distribution." These authors also addressed the problem of fit-
ting the upper tail of the distribution in which the CDF exceeds
0.95. They applied a Pareto distribution to these high frequen-
cies and tested their results using a chi-square test, concluding
11
-------
that "... the Pareto distribution provides an excellent fit to
the upper 5 percent of CO air quality data."
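The power (Box-Cox) transform and its lognormal limit can be sketched directly: as the exponent approaches zero, the transform approaches the natural logarithm, which is why very large |n| corresponds to the lognormal model. The sample value below is arbitrary.

```python
import math

def power_transform(x, lam):
    """Box-Cox power transform of a positive value x.
    The lam -> 0 limit is the natural logarithm."""
    if abs(lam) < 1e-12:
        return math.log(x)
    return (x ** lam - 1.0) / lam

x = 7.3  # arbitrary positive concentration
for lam in (0.5, 0.1, 0.01, 0.001):
    # The transformed value approaches log(x) as lam shrinks.
    print(lam, round(power_transform(x, lam), 4))
print("limit:", round(math.log(x), 4))
```

Testing whether some intermediate exponent (a square or cube root, say) fits better than either the untransformed or the fully logged data is precisely the comparison the power-transform approach formalizes.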
The extreme value distribution also has been proposed for
application to air quality data by Singpurwalla and Roberts.
Its use seems especially well suited to situations in which the
maxima or second highest maxima of air quality data sets are
compared with environmental standards. Roberts[62] has provided
a number of examples of its application to air quality data.
Although the gamma distribution, when applied by Lynn[42] and
Bencala and Seinfeld,[51] was reported to not perform as well as the
LN2 and LN3 models, Trajonis[63] analyzed nitrogen dioxide (NO2)
data and observed the "light tail" earlier reported by Curran and
Frank.[48] This result suggested that the gamma distribution was
better suited to representing these NO2 data than the heavy-
tailed LN2 model.
At the present time, there appears to be a growing consen-
sus that the data analyst should not automatically choose any
one probability model (such as the LN2 model) to the exclusion of
others, but, rather, that he should carefully evaluate the prob-
lem he is addressing and the data at hand before selecting a
probability model.[1,64,65] We realize that it may be tempting to
use the LN2 model, because many articles are available to facili-
tate its application. For example, Hunt has developed formulas
and tables to assist the analyst in determining the precision
associated with random samples from the LN2 model. Although sim-
ilar aids have not appeared in the air pollution literature for
the other models mentioned above, their absence should not dis-
12
-------
courage the analyst from using a particular model if it fits the
data better than the LN2 model. We believe that the choice of
the model should be based on the purpose for which the model is
intended, the findings of other investigators regarding similar
applications, the goodness-of-fit of the model to the data, and
the professional judgment of the data analyst. The report by
Trajonis[63] contains a good example of a procedure in which dif-
ferent models are tested, and the investigator makes a selection
that takes into account the fit of the model to the data.
Although, as we have seen, extensive applications of the
LN2 model to air quality data have been made, the use of goodness-
of-fit measures is a relatively recent phenomenon in the air
pollution literature, and no systematic study of the LN2 model's
goodness-of-fit to a large number of data sets has been under-
taken. In addition, no investigators have examined the impact
of different methods of selecting the model's parameters on the
resulting fit. To fill this gap, the present study has been
undertaken. It includes a national cross section of air quality
data and it considers four different methods of calculating
parameters for the LN2 model. We hope that future investi-
gators will consider extending the general approach presented in
this study to other pollutants and to additional probability
models. The findings of this work are intended to assist data
analysts in selecting and applying probability models to meet
their particular needs.
13
-------
III. STRUCTURE OF THE NORMAL AND LOGNORMAL PROBABILITY MODELS
As noted by Akland,[67] probability models have a variety
of environmental applications:
• Evaluating environmental standards
• Calculating emission "rollback" levels
• Estimating maximum concentrations
• Approximating threshold concentrations
• Estimating missing observations
Other applications include calculating confidence intervals about
certain point estimates, analyzing environmental trends, and
estimating environmental health risks. A variety of probability
models have been developed for representing the distributions of
continuous random variables.
Let X denote a continuous random variable (such as atmos-
pheric concentration levels) and x denote a particular observation
(such as 12 ppm measured at a station during 12:00 midnight to
1:00 a.m.). Then a particular probability model designed to
represent the distribution of the random variable usually is
written in terms of its Probability Density Function (PDF), which
is denoted as f_X(x). If P(x < X < x + Δx) denotes the probability
that the concentration lies within the range x and x + Δx, then
the relationship between the PDF and probability is as follows:

f_X(x) = lim_{Δx→0} P(x < X < x + Δx) / Δx        (1)
-------
Thus, for a continuous random variable (such as pollutant concen-
tration), the PDF is the first derivative of the probability P
with respect to the random variable X. Consequently, the probability
associated with any point X = x is undefined and cannot be calcu-
lated directly from the PDF. Rather, f_X(x)dx can be interpreted
loosely as the probability that X lies within the infinitesimal
interval (x, x + dx). The fact that f_X(x) is a limiting function
causes no difficulty in environmental applications, because we
usually are interested in the probability that X lies above or
below a particular value, such as an environmental standard X = x_s.
The probability that a given random variable is less than
or equal to a specific value X = x_s is obtained by integrating the
PDF from negative infinity to X = x_s:

P(X ≤ x_s) = ∫_{-∞}^{x_s} f_X(x) dx = F_X(x_s)    (2)
Because of the importance of the integral of the PDF, it is for-
mally defined as the Cumulative Distribution Function (CDF) and
is denoted as F_X(x). In the PDF and the CDF, the subscript X
is used to show explicitly that a random variable is under con-
sideration. Some authors use the notation f_X(y) and F_X(y) to
emphasize that y is a specific member from a sample set denoted
by X. If it is understood that a random variable is being
considered, the subscript may be deleted, and f_X(x), F_X(x) may
be denoted simply by f(x), F(x). For purposes of emphasis, the
subscript X will be used throughout this report.

Because F_X(b) gives the probability that X ≤ b, the
probability that X lies within a specified range between a and
b is obtained by subtracting respective CDF's:

P(a < X ≤ b) = F_X(b) - F_X(a) = ∫_a^b f_X(x) dx    (3)
Although a strict inequality is given at the left side of X in
the statement P(a < X ≤ b), the distinction makes no practical
difference here, because the probability associated with any
single point is zero for a continuous random variable. Every PDF
must satisfy two basic properties:

f_X(x) ≥ 0 for all x    (5)

∫_{-∞}^{∞} f_X(x) dx = 1    (5a)
Because the CDF is the integral of the PDF, F_X(x) is a monotonically
increasing function that begins at 0 and approaches 1 in the limit
as x approaches ∞.
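As a concrete illustration of Equations 2 and 3, the short Python sketch below (illustrative only; the exponential PDF and the interval (1, 2) are assumptions, not taken from this report) compares the closed-form difference of CDF values with direct numerical integration of the PDF:

```python
import math

# Assumed illustrative PDF: exponential with unit rate, f(x) = e^(-x), x >= 0
pdf = lambda x: math.exp(-x)
cdf = lambda x: 1.0 - math.exp(-x)   # closed-form integral of the PDF

a, b = 1.0, 2.0

# Equation 3: P(a < X <= b) = F(b) - F(a)
p_closed = cdf(b) - cdf(a)

# The same probability by midpoint integration of the PDF over (a, b)
n = 100000
h = (b - a) / n
p_numeric = sum(pdf(a + (i + 0.5) * h) for i in range(n)) * h

print(round(p_closed, 6), round(p_numeric, 6))   # both 0.232544
```

The agreement of the two values illustrates that the probability over an interval is the area under the PDF between the interval's endpoints.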
MOMENTS
The "moments" of a probability model are single values
which characterize its properties. The first moment about the
origin is known as the "expected value," or arithmetic mean, and
is an important measure of "central tendency" of the distribution.
For random variable X, the expected value E{X} is calculated from
the PDF as follows:

E{X} = ∫_{-∞}^{∞} x f_X(x) dx    (6)

Because of its importance, the notation μ = E{X} often is used
to refer to the arithmetic mean of the probability model.
The expectation operator shown above can be applied
to other functions of X, say g(X):

E{g(X)} = ∫_{-∞}^{∞} g(x) f_X(x) dx    (7)

If g(X) = X², then Equation 7 gives the second moment of random
variable X about the origin. Similarly, if g(X) = X³, the third
moment about the origin is obtained. In general, the jth moment
about the origin, in which j is a positive integer, is obtained
as follows:

E{X^j} = ∫_{-∞}^{∞} x^j f_X(x) dx    (8)
If we choose the function g(X) = (X - E{X})^j = (X - μ)^j, then
the expectation of (X - μ)^j is known as the jth moment about
the mean of the random variable X:

E{(X - μ)^j} = ∫_{-∞}^{∞} (x - μ)^j f_X(x) dx    (9)
Because of its importance in probability theory, the second
moment about the mean is given a special name, the "variance,"
or simply var(X):

var(X) = E{(X - μ)²} = ∫_{-∞}^{∞} (x - μ)² f_X(x) dx    (10)

If the square in Equation 10 is expanded,

var(X) = E{(X - μ)²} = E{X²} - 2μE{X} + μ²    (11)

Noting that μ = E{X}, we obtain a useful result:

var(X) = E{X²} - μ²    (11a)

In this report, we shall use the notation μ_j to denote the
jth moment about the origin, while m_j denotes the jth moment
about the mean. That is,

μ_j = E{X^j}    (12)

m_j = E{(X - μ)^j}    (13)
Thus, μ_1, or simply μ, denotes the arithmetic mean, and m_2
denotes the variance, var(X). The term σ² also commonly is used
to denote the variance of a probability distribution; that is,
m_2 = σ².
Other measures that are useful for describing a probability
distribution are the coefficient of skewness √β_1 and the coefficient
of kurtosis β_2:

√β_1 = m_3 / m_2^{3/2}    (14)

β_2 = m_4 / m_2²    (15)

These measures may be viewed as indices of the shape of a
distribution. The skewness is a measure of the symmetry of
the distribution. For symmetrical distributions, √β_1 = 0.
For distributions that have tails extending to the right,
√β_1 is positive; for distributions that have tails extending
to the left, √β_1 is negative. The kurtosis β_2 is a measure
of the "peakedness" of a distribution: the relationship of
its height to its width. For the uniform distribution, β_2 =
1.8; for the normal distribution, β_2 = 3; and for the exponential
distribution, β_2 = 9.
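These reference values of β_2 can be checked numerically. The following Python sketch (added for illustration; the integration limits are assumptions chosen to capture essentially all of each distribution's probability) estimates the kurtosis of Equation 15 by midpoint integration of each PDF:

```python
import math

def beta2(pdf, lo, hi, n=60000):
    """Estimate the kurtosis beta_2 = m4 / m2^2 of a PDF (Equation 15)
    by midpoint-rule integration over [lo, hi]."""
    h = (hi - lo) / n
    xs = [lo + (i + 0.5) * h for i in range(n)]
    mean = sum(x * pdf(x) for x in xs) * h
    m2 = sum((x - mean) ** 2 * pdf(x) for x in xs) * h
    m4 = sum((x - mean) ** 4 * pdf(x) for x in xs) * h
    return m4 / m2 ** 2

uniform = lambda x: 1.0                                   # uniform on (0, 1)
expon   = lambda x: math.exp(-x)                          # exponential, rate 1
normal  = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

print(round(beta2(uniform, 0.0, 1.0), 2))    # 1.8
print(round(beta2(expon, 0.0, 60.0), 2))     # 9.0
print(round(beta2(normal, -10.0, 10.0), 2))  # 3.0
```

The three estimates reproduce the values of 1.8, 9, and 3 quoted in the text.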
As we shall see, the values of β_1 and β_2 for a given
distribution can be plotted in the "moment plane" to draw
conclusions about the shapes of different probability
models in relation to each other. Application of the moment
plane to the data presented in this report is covered in a
companion report by Mage and Ott.2
If g(X) = e^{tX} in Equation 7, then the expectation operator,
if it exists for a given probability model, is called the "moment
generating function," M{X}:

M{X} = E{e^{tX}} = ∫_{-∞}^{∞} e^{tx} f_X(x) dx    (16)

The moment generating function is a useful tool for obtaining
the moments of a probability model. For some probability models,
the moments are more easily obtained using the expectation oper-
ator, Equation 6, than the moment generating function, Equation 16.
For others, the reverse is true. Once an analytical expression
for the moment generating function is obtained, the jth moment of
the probability model can be found by taking the jth derivative
of M{X} with respect to t and setting t = 0:

E{X^j} = d^j M{X}/dt^j |_{t=0}    (17)
Normal Probability Model
The normal, or Gaussian, probability model is a distribution
of particular interest in many fields. If Y is a random variable,
the PDF of the normal probability model is symmetrical about the
origin (Figure 2), and its PDF is written as follows:

f_Y(y) = (1/√(2π)) e^{-y²/2}    for -∞ < y < +∞    (18)
A random variable which has a distribution that can be represented
by the normal probability model is said to be normally distributed.
Usually, it is of interest to consider normally distributed
random variables which are translated such that they are
symmetrical about values other than the origin. Thus, we
employ the transformation Y = (X - μ)/σ, which contains location
parameter μ and dispersion parameter σ. With this transfor-
mation, dY/dX = 1/σ, and the PDF of the transformed variate X
is written as follows:

f_X(x) = (1/(σ√(2π))) e^{-(1/2)((x - μ)/σ)²}    (19)
The moments for this distribution are calculated using
the techniques described above. The arithmetic mean is calcu-
lated as follows using the expectation operator:

E{X} = ∫_{-∞}^{∞} x (1/(σ√(2π))) e^{-(1/2)((x - μ)/σ)²} dx    (20)

Since y = (x - μ)/σ, we solve for x and substitute x = σy + μ
into Equation 20. Noting that dx = σ dy,
FIGURE 2. PDF OF THE NORMAL PROBABILITY MODEL.
E{X} = ∫_{-∞}^{∞} (σy + μ) (1/(σ√(2π))) e^{-y²/2} σ dy

     = σ ∫_{-∞}^{∞} y (1/√(2π)) e^{-y²/2} dy
       + μ ∫_{-∞}^{∞} (1/√(2π)) e^{-y²/2} dy    (20a)

E{X} = μ    (20b)

The simple result given in Equation 20b is obtained from Equation
20a by the following reasoning. The left-hand integral in
Equation 20a is the integral of y multiplied by the PDF of
random variable Y, in which Y has mean 0 and standard deviation
of unity. The PDF is symmetrical about 0, and the function within
the integral sign is negative for -∞ < y < 0 and positive
for 0 < y < +∞. Because the values obtained by integrating over
these two ranges are equal and of opposite sign, the integral
over the entire range -∞ < y < +∞ is zero. The second term in
Equation 20a is μ times the integral of the PDF of Y over
-∞ < y < +∞, which, by definition, is 1.
The variance, or second moment about the mean, is calculated
by first calculating E{X²} using Equation 8:

E{X²} = ∫_{-∞}^{∞} x² (1/(σ√(2π))) e^{-(1/2)((x - μ)/σ)²} dx    (21)
Because x = σy + μ, the square of x is x² = σ²y² + 2μσy + μ².
Substituting this equation into Equation 21, we obtain an expres-
sion for E{X²} that contains three integrals:

E{X²} = σ² ∫_{-∞}^{∞} y² (1/√(2π)) e^{-y²/2} dy
        + 2μσ ∫_{-∞}^{∞} y (1/√(2π)) e^{-y²/2} dy
        + μ² ∫_{-∞}^{∞} (1/√(2π)) e^{-y²/2} dy    (22)

The first integral on the right is the variance of the standardized
variable Y, which is unity; the second integral is zero by the
symmetry argument given above; and the third integral is the total
area under the PDF of Y, which is 1. Thus E{X²} = σ² + μ², and,
from Equation 11a, var(X) = E{X²} - μ² = σ².

The same results can be obtained from the moment generating
function. Substituting the normal PDF, Equation 19, into
Equation 16,

M{X} = ∫_{-∞}^{∞} e^{tx} (1/(σ√(2π))) e^{-(1/2)((x - μ)/σ)²} dx    (23)

Combining the two exponents over the common denominator 2σ² gives

M{X} = ∫_{-∞}^{∞} (1/(σ√(2π))) exp[-(x² - 2(μ + tσ²)x + μ²)/(2σ²)] dx    (23b)
To form a perfect square for the numerator of the expression
inside the brackets in Equation 23b, we must modify the numerator
so that the terms which do not involve x are equal to the square
of one-half the coefficient of x. That is, the last term must
be equal to (μ + tσ²)² = μ² + 2μtσ² + t²σ⁴. Because μ² already is
present as the last term in the numerator, we can form the perfect
square by adding 2μtσ² + t²σ⁴ to the numerator and then subtracting
an equivalent term from the quantity within parentheses:

M{X} = ∫_{-∞}^{∞} (1/(σ√(2π))) exp[-(x² - 2(μ + tσ²)x
       + (μ² + 2μtσ² + t²σ⁴) - 2μtσ² - t²σ⁴)/(2σ²)] dx    (24)

     = e^{μt + σ²t²/2} ∫_{-∞}^{∞} (1/(σ√(2π)))
       e^{-(1/2)[(x - (μ + tσ²))/σ]²} dx    (24a)
The integral part of Equation 24a is equivalent to the CDF of
a normally distributed random variable with mean μ + tσ² and
variance σ². Because this integral is evaluated over the range
-∞ to +∞, its value is unity, and the moment generating function
for the normal probability model becomes:

M{X} = e^{μt + σ²t²/2}    (25)

To determine the moments of the distribution, we differentiate
Equation 25 with respect to t and set t = 0. Thus the first
moment, or the arithmetic mean, is calculated from the moment
generating function as follows:

dM{X}/dt = (μ + σ²t) e^{μt + σ²t²/2}    (26)

E{X} = dM{X}/dt |_{t=0} = μ    (27)

The second moment about the origin is calculated in a similar
manner:

d²M{X}/dt² = σ² e^{μt + σ²t²/2} + (μ + σ²t)² e^{μt + σ²t²/2}    (30)

E{X²} = d²M{X}/dt² |_{t=0} = σ² + μ²    (31)

Using the same approach as above, the variance is obtained as
follows:

var(X) = E{X²} - [E{X}]² = σ²    (32)

For the normal probability model, the higher moments about the
origin are obtained from M{X} in the same fashion, giving the
following results:

E{X³} = d³M{X}/dt³ |_{t=0} = 3σ²μ + μ³    (33)

E{X⁴} = d⁴M{X}/dt⁴ |_{t=0} = 3σ⁴ + 6σ²μ² + μ⁴    (34)

The notation used in Equations 31, 33, and 34 indicates that we
first differentiate M{X} with respect to t, and then we substitute
t = 0 into the result.
General equations for the third and fourth moments about the
origin can be derived in a manner analogous to the derivation of
the variance given in Equation 11a:

E{(X - μ)³} = E{X³} - 3μE{X²} + 2μ³    (35)

E{(X - μ)⁴} = E{X⁴} - 4μE{X³} + 6μ²E{X²} - 3μ⁴    (36)

Substitution of Equation 31 for E{X²} and Equation 33 for E{X³}
into Equation 35 gives the following result:

E{(X - μ)³} = 3σ²μ + μ³ - 3μ(σ² + μ²) + 2μ³
            = 3σ²μ + μ³ - 3μσ² - 3μ³ + 2μ³ = 0    (37)

Similarly, substitution of Equations 31, 33, and 34 into Equation 36
gives the following result:

E{(X - μ)⁴} = 3σ⁴ + 6σ²μ² + μ⁴ - 4μ(3σ²μ + μ³) + 6μ²(σ² + μ²) - 3μ⁴
            = 3σ⁴ + 6σ²μ² + μ⁴ - 12σ²μ² - 4μ⁴ + 6σ²μ² + 6μ⁴ - 3μ⁴
            = 3σ⁴    (38)

Substituting the results of Equations 37 and 38 into Equations 14
and 15 gives simple expressions for √β_1 and β_2 for the normal
distribution:

√β_1 = 0 / σ³ = 0    (39)

β_2 = 3σ⁴ / σ⁴ = 3    (40)
Lognormal Probability Model

If the logarithmic transformation Y = (ln X - μ)/σ is
employed, then we say that the logarithm of X is normally dis-
tributed, or X is "lognormally" distributed. Noting that
dY = dX/(σX), the PDF of the transformed variate is written
as follows, in which ln denotes the logarithm to the base e:

f_X(x) = (1/(xσ√(2π))) e^{-(1/2)((ln x - μ)/σ)²}    (41)

Note that this PDF differs from that for the normal distribution
in that x appears in the denominator and ln x is substituted
for x. The lognormal PDF is skewed to the right, and X has the
range 0 < x < ∞ (see Figure 4, page 43).

If the quantity in parentheses in Equation 41 is set
equal to 0, we obtain an expression for the median m of the log-
normal distribution:

(ln m - μ)/σ = 0    ∴ ln m = μ, or m = e^μ    (42)
In another common form of the lognormal probability model, the
PDF does not involve μ; rather, x is expressed as a ratio to the
median m:

f_X(x) = (1/(xσ√(2π))) e^{-(1/2)(ln(x/m)/σ)²}    (43)

The two forms are equivalent, since the expressions within the
parentheses are equivalent.
To calculate the expected value, or first moment about the
origin, of the lognormal distribution, we apply the expectation
operator, Equation 6:

E{X} = ∫_0^∞ x f_X(x) dx
     = ∫_0^∞ (1/(σ√(2π))) e^{-(1/2)((ln x - μ)/σ)²} dx    (44)

Note that the integration is performed between 0 and ∞, because
the lognormal PDF is defined as 0 from -∞ to 0. To evaluate this
integral, we shall employ the transform Z = ln X, or X = e^Z.
Then, dX = e^z dZ, and Equation 44 can be written as follows:

E{X} = E{e^Z} = ∫_{-∞}^{∞} (1/(σ√(2π))) e^z e^{-(1/2)((z - μ)/σ)²} dz    (44a)

We now observe that, if the substitutions x = z and t = 1 are made
in the expression for the moment generating function for the
normal distribution given by Equation 23, the result will be
identical to Equation 44a. Thus, we can use the same approach
that was used to evaluate Equation 23 to obtain a result for
Equation 44a. By setting t = 1 in Equation 25, we obtain the
following expression for the expected value, or arithmetic mean,
of the lognormal probability model:

E{X} = e^{μ + σ²/2}    (45)
To obtain the higher moments, we apply Equation 8, the expecta-
tion operator for the jth moment about the origin:

E{X^j} = ∫_0^∞ (x^{j-1}/(σ√(2π))) e^{-(1/2)((ln x - μ)/σ)²} dx    (46)

The exponent j-1 appears in the numerator instead of j, because
the lognormal PDF contains x in the denominator. Using the same
transform Z = ln X as before, X = e^Z, dX = e^z dZ, and
x^{j-1} = e^{(j-1)z}. Thus,
E{X^j} = ∫_{-∞}^{∞} (1/(σ√(2π))) e^{(j-1)z} e^z e^{-(1/2)((z - μ)/σ)²} dz

       = ∫_{-∞}^{∞} (1/(σ√(2π))) e^{jz} e^{-(1/2)((z - μ)/σ)²} dz    (47)

In a similar fashion, we conclude that Equation 47 is identical
in form to Equation 23, and we use the result obtained for that
situation; namely, Equation 25 with t = j:

E{X^j} = e^{jμ + j²σ²/2}    (48)
This result implies that the expectation operator for a lognormally
distributed random variable X happens to be identical to the
moment generating function M{Z} for a normally distributed random
variable Z = ln X.

The general result given by Equation 48 allows us to calculate
all the moments about the origin for the lognormal probability
model with relative ease:

μ_1 = e^{μ + σ²/2}    (49)

μ_2 = e^{2μ + 2σ²}    (50)
μ_3 = e^{3μ + 9σ²/2}    (51)

μ_4 = e^{4μ + 8σ²}    (52)

The variance, or second moment about the mean, is calculated
from the above relationships using the moment expansion given
in Equation 11a:

var(X) = μ_2 - μ_1² = e^{2μ + 2σ²} - e^{2μ + σ²} = e^{2μ + σ²}(e^{σ²} - 1)    (53)
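Equations 48 and 53 can be spot-checked by numerical integration of the lognormal PDF, Equation 41. The Python sketch below is illustrative only; the parameter values μ = 0.5 and σ = 0.4 and the integration limits are assumptions:

```python
import math

MU, SIG = 0.5, 0.4   # assumed illustrative parameter values

def ln_pdf(x):
    """Lognormal PDF, Equation 41."""
    return math.exp(-0.5 * ((math.log(x) - MU) / SIG) ** 2) / (
        x * SIG * math.sqrt(2.0 * math.pi))

def moment(j, lo=1e-9, hi=50.0, n=100000):
    """jth moment about the origin by midpoint integration (Equation 8)."""
    h = (hi - lo) / n
    return sum((lo + (i + 0.5) * h) ** j * ln_pdf(lo + (i + 0.5) * h)
               for i in range(n)) * h

for j in (1, 2, 3):
    closed = math.exp(j * MU + j * j * SIG * SIG / 2.0)   # Equation 48
    print(j, round(moment(j), 4), round(closed, 4))       # the two values agree

# Variance check against Equation 53
var_closed = math.exp(2 * MU + SIG ** 2) * (math.exp(SIG ** 2) - 1.0)
print(round(moment(2) - moment(1) ** 2, 4), round(var_closed, 4))
```

For each j the numerically integrated moment matches the closed form of Equation 48, and μ_2 - μ_1² matches the variance of Equation 53.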
Similarly, the third moment about the mean can be obtained
using the moment expansion given by Equation 35:

m_3 = μ_3 - 3μ_1μ_2 + 2μ_1³
    = e^{3μ + 9σ²/2} - 3e^{μ + σ²/2} e^{2μ + 2σ²} + 2e^{3μ + 3σ²/2}
    = e^{3μ + 9σ²/2} - 3e^{3μ + 5σ²/2} + 2e^{3μ + 3σ²/2}
    = e^{3μ + 3σ²/2}(e^{3σ²} - 3e^{σ²} + 2)    (54)
Using Equations 53 and 54, we can derive an expression for the
coefficient of skewness:

√β_1 = m_3 / m_2^{3/2} = (e^{3σ²} - 3e^{σ²} + 2) / (e^{σ²} - 1)^{3/2}
     = (e^{σ²} + 2)(e^{σ²} - 1)^{1/2}    (55)

Squaring both sides,

β_1 = (e^{σ²} + 2)²(e^{σ²} - 1)    (55a)

Finally, the fourth moment about the mean can be obtained by
using the moment expansion given by Equation 36:

m_4 = μ_4 - 4μ_1μ_3 + 6μ_1²μ_2 - 3μ_1⁴
    = e^{4μ + 8σ²} - 4e^{4μ + 5σ²} + 6e^{4μ + 3σ²} - 3e^{4μ + 2σ²}
    = e^{4μ + 2σ²}(e^{6σ²} - 4e^{3σ²} + 6e^{σ²} - 3)
    = e^{4μ + 2σ²}(e^{σ²} - 1)²(e^{4σ²} + 2e^{3σ²} + 3e^{2σ²} - 3)    (56)

Using Equations 53 and 56, we obtain the following expression
for the coefficient of kurtosis β_2:

β_2 = m_4 / m_2² = e^{4σ²} + 2e^{3σ²} + 3e^{2σ²} - 3    (56a)

If we add 3 to Equation 56a and then subtract 3, another factor
can be obtained, and Equation 56a can be written as follows:

β_2 = 3 + (e^{σ²} - 1)(e^{3σ²} + 3e^{2σ²} + 6e^{σ²} + 6)    (56b)

Some authors express √β_1 and β_2 in a simplified form using the
substitution ω = e^{σ²}:

√β_1 = (ω + 2)(ω - 1)^{1/2}    (57)

β_2 = ω⁴ + 2ω³ + 3ω² - 3    (58)
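The factorizations behind Equations 55 and 56b can be verified numerically at an arbitrary value of σ (a Python sketch; the value σ = 0.7 is an assumption for illustration):

```python
import math

sigma = 0.7                          # arbitrary assumed value
w = math.exp(sigma ** 2)             # omega = e^(sigma^2)

# Factorization used in Equation 55: w^3 - 3w + 2 = (w - 1)^2 (w + 2)
lhs3 = w ** 3 - 3 * w + 2
rhs3 = (w - 1) ** 2 * (w + 2)
print(abs(lhs3 - rhs3) < 1e-9)                                # True

# Equation 58 versus the factored form of Equation 56b
beta2 = w ** 4 + 2 * w ** 3 + 3 * w ** 2 - 3
factored = 3 + (w - 1) * (w ** 3 + 3 * w ** 2 + 6 * w + 6)
print(abs(beta2 - factored) < 1e-9)                           # True
```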
When plotting the lognormal distribution, interest also may focus
on the mode x_m, which is defined as the value of X for which the
PDF has its maximum. To obtain an expression for x_m, we dif-
ferentiate Equation 41, the PDF of the lognormal distribution, and
set the result equal to 0:

df_X(x)/dx = -(f_X(x)/x)[(ln x - μ)/σ² + 1] = 0    (59)

Then, solving Equation 59 for x, we obtain a simple result for
the mode of the lognormal distribution:

(ln x_m - μ)/σ² + 1 = 0    ∴ x_m = e^{μ - σ²}    (60)
Because the expressions for the lognormal distribution usually
involve μ and σ, the mean and standard deviation of a normal
distribution that describes the logarithm of the variable X of
interest, we shall introduce the following notation to avoid
confusion:

a = arithmetic mean of X

ξ = arithmetic standard deviation of X
Another statistic of interest is the coefficient of variation v,
which is the ratio of the arithmetic standard deviation to the
arithmetic mean:

v = ξ/a    (61)

Expressing a in terms of Equation 45 and ξ² in terms of Equation
53, the arithmetic mean and variance ξ² of a lognormally distributed
random variable can be written as follows:

a = e^{μ + σ²/2}    (62)

ξ² = e^{2μ + σ²}(e^{σ²} - 1)    (63)

Thus, the coefficient of variation can be written as the ratio of
ξ to a, as obtained from Equations 62 and 63:

v = (e^{σ²} - 1)^{1/2}    (64)

Often the arithmetic mean and standard deviation of the variable
of interest are known, and it is necessary to calculate μ and σ.
To obtain expressions for μ and σ as a function of a and ξ, we
first solve Equation 62 for μ:

μ = ln a - σ²/2    (65)
Substituting Equation 65 into Equation 63, we can derive an
expression for σ² as a function of a and ξ:

ξ² = e^{2[ln a - σ²/2] + σ²}(e^{σ²} - 1) = e^{2 ln a}(e^{σ²} - 1)
   = a²(e^{σ²} - 1)

ξ²/a² = e^{σ²} - 1

σ² = ln(ξ²/a² + 1)    (66)

Equation 66 also can be written in terms of the coefficient of
variation:

σ = [ln(v² + 1)]^{1/2}    (67)

If ξ and a are known, we first use Equation 66 to compute σ.
Then, we substitute σ and a into Equation 65 to compute μ. Once
μ and σ are known, most of the other equations given in this
chapter can be applied.
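This recipe can be sketched in a few lines of Python. The numbers below reproduce the first case of Table 1 (a = 3.0 ppm, ξ = 0.5 ppm):

```python
import math

def lognormal_params(a, xi):
    """Convert the arithmetic mean a and arithmetic standard deviation xi
    of a lognormally distributed variable into mu and sigma
    (Equations 66 and 65)."""
    v = xi / a                                  # coefficient of variation, Eq. 61
    sigma = math.sqrt(math.log(v * v + 1.0))    # Eq. 67
    mu = math.log(a) - 0.5 * sigma * sigma      # Eq. 65
    return mu, sigma

mu, sigma = lognormal_params(3.0, 0.5)          # first case of Table 1
print(round(sigma, 4))                          # 0.1655
print(round(mu, 4))                             # 1.0849
print(round(math.exp(mu), 3))                   # median, Eq. 42: 2.959
print(round(math.exp(mu - sigma ** 2), 4))      # mode, Eq. 60: 2.8792
print(round(math.exp(sigma), 4))                # SGD, Eq. 71: 1.18
```

The printed values match the first row of Table 1.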
The CDF for the lognormal probability model plots as a
straight line on logarithmic probability paper.7 To determine
the vertical position of this line, it is useful to employ
the equation for the median of the lognormal distribution,
thus obtaining a value for X which corresponds to the 50 percen-
tile of the CDF. That is, we use Equation 42 to find the median,
m = x_50, which is the value for which F_X(x_50) = 0.50:

x_50 = e^μ    (68)

A second point on the straight line can be obtained by finding
the value of X which corresponds to plus one standard deviation
for the normally distributed random variable Y. This gives the
value of X which corresponds to the 84.13 percentile of the CDF
(see Figure 2); that is, F_X(x_84) = 0.8413:

y = (ln x_84 - μ)/σ = +1    (69)

Solving Equation 69 for x_84, we obtain the following simple
relationship for the second point on the line:

x_84 = e^{μ + σ} = e^σ x_50    (70)
Some authors define the dimensionless factor e^σ in Equation 70 as
the "standard geometric deviation (SGD)," denoted as σ_g:

σ_g = e^σ    (71)

If Equation 69 is set equal to y = -1, we obtain a value of X
which corresponds to minus one standard deviation for the normally
distributed random variable Y. This gives the value of X which
corresponds to the 15.87 percentile of the CDF (see Figure 2);
that is, F_X(x_16) = 0.1587. Solving Equation 69 for y = -1, we
obtain the following result:

x_16 = e^{μ - σ} = x_50 / σ_g    (72)

Equation 72 actually gives a third point on the straight line,
which can be used to check the correctness of the other two
points. Notice that the following log-symmetric relationship
exists among the three points:

x_84 / x_50 = x_50 / x_16 = σ_g    (73)
We can illustrate the method of plotting the lognormal probability
model by selecting a simple example in which the arithmetic mean
of X remains constant at a = 3.0 ppm CO, and the arithmetic stan-
dard deviation takes on four successive values, ξ = 0.5, 1.0, 1.5,
and 2.0 ppm. Applying the various formulas given in this chapter,
the results listed in Table 1 are obtained.

Notice that μ and σ are different for every case, even
though the arithmetic mean is constant. The median m is always
less than the arithmetic mean a, a general property of the log-
normal distribution, although the median becomes increasingly
close to the arithmetic mean as the coefficient of variation v
becomes smaller. For values of v less than about one-sixth
Table 1

Example of Values Calculated for the Lognormal Model

 arith.   std.    coef.                         mode     median
 mean a   dev. ξ  var. v       σ        μ       x_m        m      SGD σ_g
 (ppm)    (ppm)                                (ppm)     (ppm)
  3.0      0.5    0.16667   0.1655   1.0849   2.8792     2.959    1.1800
  3.0      1.0    0.33333   0.3246   1.0459   2.5614     2.846    1.3835
  3.0      1.5    0.50000   0.4724   0.9870   2.1466     2.683    1.6038
  3.0      2.0    0.66667   0.6064   0.9148   1.7281     2.496    1.8338
(0.16667), the PDF of the lognormal probability model appears
almost symmetrical when plotted and resembles the PDF of the
normal distribution (see Figure 4).

Figure 3 shows the straight lines plotted on logarithmic
probability paper for the four cases given in Table 1. This paper
was created by the authors of the present study by cutting and
pasting 2-cycle logarithmic probability papers manufactured by
Keuffel and Esser Co. This was necessary in order to obtain
3-cycle logarithmic probability paper that could cover the entire
range of CO concentrations with a wide range for the cumulative
frequencies (that is, from 0.01 to 99.99 percent).
Sometimes it may be necessary to obtain points that are
further from the median than those given by Equations 70 and
72 in order to plot the straight line with greater precision.
One convenient way to do this is to solve Equation 69 for additional
integer multiples of the standard deviation of the normally dis-
tributed random variable Y (that is, y = ±2, ±3, etc.). Then
FIGURE 3. EXAMPLE OF LOGARITHMIC PROBABILITY PLOTS FOR FOUR CASES
OF THE 2-PARAMETER LOGNORMAL PROBABILITY MODEL.
FIGURE 4. PDF'S FOR THE FOUR CASES OF THE 2-PARAMETER LOGNORMAL
PROBABILITY MODEL GIVEN IN TABLE 1.
the points are plotted on logarithmic probability paper for
x = σ_g^y x_50 at the appropriate cumulative frequencies, as shown
in Table 2. In Figure 3, these points are shown as dots along each
of the four straight lines for the four cases in the example.
Table 2

Cumulative Frequencies at Various Plotting Positions
for the 2-Parameter Lognormal Probability Model

 y:        -3          -2          -1         0        +1          +2          +3
 x:    x_50/σ_g³   x_50/σ_g²   x_50/σ_g    x_50    σ_g·x_50   σ_g²·x_50   σ_g³·x_50
 cum.
 freq:  0.135%      2.275%      15.87%      50%      84.13%     97.725%     99.865%
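The plotting positions of Table 2 can be generated from the standard normal CDF. The Python sketch below assumes, for illustration, the first case of Table 1 (x_50 = 2.959 ppm, σ_g = 1.18):

```python
import math

def phi(y):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

x50, sg = 2.959, 1.18    # assumed: first case of Table 1

for y in range(-3, 4):
    x = x50 * sg ** y                 # plotting position, x = sg^y * x50
    freq = 100.0 * phi(y)             # cumulative frequency, percent
    print(y, round(x, 3), round(freq, 3))
```

The frequencies reproduce the bottom row of Table 2 to the precision shown there.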
Figure 4 shows the PDF's for the four cases given in Table 1.
The PDF's are skewed to the right, a general trait of the
lognormal probability model, and the skewness increases as the
standard deviation increases. From an examination of this figure,
it is not at all obvious that the arithmetic mean is the same,
a = 3.0 ppm, for all four PDF's. As the standard deviation
decreases from ξ = 2.0 to ξ = 0.5, it is apparent that the PDF
moves toward a symmetrical distribution resembling the normal
distribution. Figures 3 and 4 will be useful in helping the reader
understand the results presented in subsequent chapters.
IV. CALCULATION OF PARAMETERS
The two parameters μ and σ fully define the PDF and CDF of
the lognormal probability model. If we wish to use this model
to represent the frequency distribution of a particular data set,
it first is necessary to compute the parameters of the model in
some fashion. Although the statistical literature usually
describes this process as "estimation of parameters" to empha-
size that the parameters of an underlying distribution are being
approximated, we shall use the more general phrase "calculation
of parameters" to denote any method that is used to obtain the
numerical values of the parameters, regardless of whether tests
for underlying distributions are involved.
In this research investigation, four general techniques for
calculating the parameters were investigated: (1) the "method of
moments," in which the moments of the model are set equal to
the moments of the data; (2) the "method of fractiles," in which
the CDF of the model is set equal to the cumulative frequencies
of the data in the manner suggested by Larsen;21-24 (3) an approxi-
mation of the maximum likelihood estimation (MLE) technique for
grouped data; and (4) a more exact MLE technique for grouped data
using computer optimization.
MOMENTS OF THE DATA
Before describing the four parameter calculation techniques,
we shall briefly discuss several statistical calculations which
can be made from the raw data in order to describe the data in a
parsimonious form. These calculations were incorporated in the
computer program that was developed for this investigation (see
Appendix A).
The moments of the observations are computed in a manner
similar to that used to calculate the moments of probability
models, except that the PDF in the expectation operator is
replaced by 1/n, and the actual observed values are used in
place of x. The jth moment of the data about the origin μ'_j
is computed in a fashion analogous to Equation 8:

μ'_j = (1/n) Σ_{i=1}^{n} x_i^j    (74)

where x_i = the ith observation
      n   = total number of observations

The arithmetic mean x̄ of the observations is the first moment
about the origin:

x̄ = μ'_1 = (1/n) Σ_{i=1}^{n} x_i    (75)

Similarly, the second moment about the origin is computed by
setting j = 2 in Equation 74:

μ'_2 = (1/n) Σ_{i=1}^{n} x_i²    (76)
The jth moment of the data about the mean, which we shall denote
as m'_j, is calculated in a manner analogous to Equation 9:

m'_j = (1/n) Σ_{i=1}^{n} (x_i - x̄)^j    (77)

Of particular importance is the second moment about the mean,
or the variance s̃²:

s̃² = (1/n) Σ_{i=1}^{n} (x_i - x̄)²    (78)

Because only n-1 independent pieces of information are used to
calculate Equation 78, s̃² is called a "biased" estimate of the
variance, and an "unbiased" estimate is obtained by dividing the
second moment about the mean by n-1 instead of n:

s² = (1/(n-1)) Σ_{i=1}^{n} (x_i - x̄)²    (79)

The square root of the variance is known as the standard deviation
of the observations, either s̃ or s. Because s̃² and s² are related
to each other as follows, the biased and unbiased estimates of the
variance are approximately the same for large n:

s̃² = s² (n-1)/n    (80)
In some practical applications, computation of the variance
is more convenient if the following expansion of Equation 78 is
used:

s̃² = (1/n) Σ_{i=1}^{n} (x_i - x̄)²
   = (1/n) Σ_{i=1}^{n} x_i² - 2x̄ (1/n) Σ_{i=1}^{n} x_i + x̄²
   = (1/n) Σ_{i=1}^{n} x_i² - x̄² = μ'_2 - x̄²    (81)

Equation 81 is similar in form to Equation 11a, the expression
for the variance of a probability model. If we solve Equation 80
for s² and substitute the result into Equation 81, the following
computational expression for the unbiased estimate of the variance
is obtained:

s² = (1/(n-1)) [Σ_{i=1}^{n} x_i² - (1/n)(Σ_{i=1}^{n} x_i)²]    (82)

Notice that Equations 81 and 82 permit the variance to be cal-
culated by summing the squares of the observed values rather
than summing the squares of the differences between observed
values and the mean, as in Equation 79. This expression is
very useful when hand calculations are necessary or when certain
computer programming efficiencies are required.
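The equivalence of the two computational routes can be demonstrated on a small data set (a Python sketch; the five observations are made up for illustration):

```python
data = [2.0, 3.0, 5.0, 4.0, 6.0]    # assumed illustrative observations
n = len(data)
xbar = sum(data) / n

# Equation 78: biased variance from squared deviations about the mean
s2_biased = sum((x - xbar) ** 2 for x in data) / n

# Equation 81: the same quantity from the sum of squares
s2_sum_sq = sum(x * x for x in data) / n - xbar * xbar

# Equation 79: unbiased variance, dividing by n - 1
s2_unbiased = sum((x - xbar) ** 2 for x in data) / (n - 1)

print(s2_biased, s2_sum_sq, s2_unbiased)   # 2.0 2.0 2.5
```

The first two results agree, and the biased and unbiased estimates differ by the factor (n-1)/n of Equation 80.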
Using Equation 77, the third and fourth moments of the
data about the mean are calculated as follows:

m'_3 = (1/n) Σ_{i=1}^{n} (x_i - x̄)³    (83)

m'_4 = (1/n) Σ_{i=1}^{n} (x_i - x̄)⁴    (84)

The coefficient of skewness √b_1 and the coefficient of kurtosis
b_2 are calculated in a manner analogous to Equations 14 and 15:

√b_1 = m'_3 / (m'_2)^{3/2}    (85)

b_2 = m'_4 / (m'_2)²    (86)
METHOD OF MOMENTS

In the "method of moments," the moments of the model are set
equal to the moments computed from the data. To apply this tech-
nique to the lognormal probability model, we set Equation 49 equal
to Equation 75, μ_1 = μ'_1, and we set Equation 50 equal to
Equation 76, μ_2 = μ'_2:

μ_1 = e^{μ + σ²/2} = (1/n) Σ_{i=1}^{n} x_i    (87)

μ_2 = e^{2μ + 2σ²} = (1/n) Σ_{i=1}^{n} x_i²    (88)
Taking logarithms of Equations 87 and 88,

μ + σ²/2 = ln [(1/n) Σ_{i=1}^{n} x_i]    (89)

2μ + 2σ² = ln [(1/n) Σ_{i=1}^{n} x_i²]    (90)

Multiplying Equation 89 by 2 and subtracting the result from
Equation 90, we obtain the following expression for σ²:

σ² = ln [(1/n) Σ_{i=1}^{n} x_i²] - 2 ln [(1/n) Σ_{i=1}^{n} x_i]    (91)

Multiplying Equation 89 by 4 and subtracting Equation 90 from
the result, we obtain the following expression for μ:

μ = 2 ln [(1/n) Σ_{i=1}^{n} x_i] - (1/2) ln [(1/n) Σ_{i=1}^{n} x_i²]    (92)
Rewriting Equations 91 and 92 using the simplified notation for
the moments about the origin, we obtain two equations with two
unknowns:

σ² = -2 ln μ'_1 + ln μ'_2    (93)

μ = 2 ln μ'_1 - (1/2) ln μ'_2    (94)

Note that μ'_1 = x̄ from Equation 75, and s̃² = μ'_2 - x̄² from
Equation 81; therefore, μ'_2 = s̃² + x̄². Substituting these values
for μ'_1 and μ'_2 into Equation 93, the following expression for
σ² is obtained:

σ² = ln(s̃² + x̄²) - 2 ln(x̄)    (95)

In a similar manner, substituting these values into Equation 94
gives the following expression for μ:

μ = 2 ln x̄ - (1/2) ln(s̃² + x̄²) = ln x̄ + ln x̄ - (1/2) ln(s̃² + x̄²)
  = ln x̄ + ln [x̄² / (s̃² + x̄²)]^{1/2}
  = ln x̄ - (1/2) ln [(s̃² + x̄²)/x̄²]
  = ln x̄ - (1/2) ln (s̃²/x̄² + 1) = ln x̄ - σ²/2    (96)
If s̃ and x̄ in Equation 95 are replaced by ξ and a, respectively,
then Equation 95 will be identical to Equation 66 (Chapter III).
Similarly, if x̄ in Equation 96 is replaced by a, then Equation 96
will be identical to Equation 65. These results imply that the
method of moments can be applied to experimental data by either
of two approaches. One approach is to calculate the moments
of the data about the origin and use Equations 93 and 94 to
determine μ and σ. Another approach is to set the arithmetic
mean of the model equal to the arithmetic mean of the data, a = x̄,
and the standard deviation of the model equal to the biased
estimate of the standard deviation of the data, ξ = s̃. In the
computer program developed for this study (Appendix A), the second
approach was used. The arithmetic mean and standard deviation
were calculated from the data, and Equations 95 and 96 were used
to compute μ and σ. Although the unbiased estimate of the variance
s² generally should be used in such computations, the difference
between the biased and unbiased values will be negligible when n
is large, as it usually is for a year of air quality data.
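The second approach can be sketched as a short Python function (the data list is made up for illustration). As a consistency check, the fitted model's mean, Equation 45, and variance, Equation 53, should reproduce the arithmetic mean and biased variance of the data exactly:

```python
import math

def method_of_moments(data):
    """Fit mu and sigma of the 2-parameter lognormal model by the
    method of moments, Equations 95 and 96."""
    n = len(data)
    xbar = sum(data) / n
    s2 = sum((x - xbar) ** 2 for x in data) / n            # biased variance
    sigma2 = math.log(s2 + xbar * xbar) - 2.0 * math.log(xbar)   # Eq. 95
    mu = math.log(xbar) - 0.5 * sigma2                            # Eq. 96
    return mu, math.sqrt(sigma2)

data = [2.5, 3.5, 3.0, 2.5, 3.5]          # assumed illustrative data
mu, sigma = method_of_moments(data)

# The model mean (Eq. 45) and variance (Eq. 53) recover the data moments
model_mean = math.exp(mu + sigma ** 2 / 2.0)
model_var = math.exp(2.0 * mu + sigma ** 2) * (math.exp(sigma ** 2) - 1.0)
print(round(model_mean, 6), round(model_var, 6))   # 3.0 0.2
```

The recovered mean and variance equal x̄ = 3.0 and s̃² = 0.2 for this data set, which is the defining property of the method of moments.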
METHOD OF FRACTILES

In the "method of fractiles," the two parameters of the
model are chosen in a manner such that the CDF of the model is
equal to the cumulative frequencies of the observed data at two
points, or fractiles. In this context, a "fractile" is a number
between 0 and 1 which represents either the CDF of the model or
the percentiles (divided by 100) of the observations. Associated
with each fractile f is a value X = x_f of the CDF, such that
F_X(x_f) = f. For example, if a fractile is selected as f = 0.70,
then F_X(x_70) = 0.70, and we set x_70 equal to the value occurring
at the 70-percentile point of the cumulative frequencies of the
observations.
Let f_a and f_b represent the two fractiles selected, with
corresponding CDF values of x_a and x_b:

F_X(x_a) = f_a    (97)

F_X(x_b) = f_b    (98)

where f_b > f_a

If we consider the random variable Y, which is the logarithmic
transform Y = (ln X - μ)/σ (see Chapter III), then we can write
Equations 97 and 98 as follows:

F_Y[(ln x_a - μ)/σ] = f_a    (99)

F_Y[(ln x_b - μ)/σ] = f_b    (100)

Solving Equations 99 and 100 for the quantity in parentheses and
multiplying by σ, we obtain the following pair of equations, in
which F_Y^{-1}(f) denotes the inverse CDF of the Gaussian, or
normal, distribution:

ln x_a - μ = σ F_Y^{-1}(f_a)    (101)

ln x_b - μ = σ F_Y^{-1}(f_b)    (102)
Subtracting Equation 101 from Equation 102, we obtain an
equation which has σ as its only unknown:

ln x_b - ln x_a = σ F_Y^{-1}(f_b) - σ F_Y^{-1}(f_a)    (103)

Then, solving for σ,

σ = (ln x_b - ln x_a) / [F_Y^{-1}(f_b) - F_Y^{-1}(f_a)]    (104)

Likewise, either Equation 101 or Equation 102 can be solved for μ:

μ = ln x_a - σ F_Y^{-1}(f_a)    (105)

μ = ln x_b - σ F_Y^{-1}(f_b)    (106)

Equation 104, when used in conjunction with Equation 105 (or
Equation 106), provides a useful approach for calculating the
parameters μ and σ for the two-parameter lognormal probability
model by the method of fractiles. Any two fractiles f_a and f_b
are selected, and the inverse CDF values F_Y^{-1}(f_a) and
F_Y^{-1}(f_b) are obtained from tables of the normal distribution.
Then the pollutant concentrations x_a and x_b associated with
these fractiles are obtained from the cumulative frequencies of
the raw observations. We substitute these four values into
Equation 104 in order to calculate σ. Then σ is substituted into
Equation 105, and μ is calculated.
Larsen21-24 has recommended use of the two fractiles
f_a = 0.70 and f_b = 0.999 for analyzing air quality data. Although
Aitchison and Brown3 show that the maximum efficiencies for
estimating μ and σ do not occur at these particular fractiles,*
we shall use the two Larsen values in this study. By consulting
tables of the CDF of the normal distribution, and using interpola-
tion as necessary, we obtain the following inverse CDF values
for the two Larsen fractiles:

F_Y^{-1}(0.70) = 0.52440    (107)

F_Y^{-1}(0.999) = 3.09023    (108)

Substituting these values into Equation 104, we obtain a relatively
simple expression for σ:

* Aitchison and Brown3 show that the maximum efficiency attain-
able for estimating μ occurs at f_a = 0.27 and f_b = 0.73, while
the maximum efficiency attainable for estimating σ occurs at
f_a = 0.07 and f_b = 0.93 when symmetrical fractiles, or quantiles,
are chosen.
σ = (ln x_b - ln x_a)/(3.09023 - 0.52440) = (ln x_b - ln x_a)/2.56583 = 0.389737 ln(x_b/x_a)   (109)

Once σ is calculated from Equation 109, we calculate µ using
either Equation 105 or 106:

µ = ln x_b - 3.09023σ                                        (110)

µ = ln x_a - 0.52440σ                                        (111)
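As an illustration only (this sketch is not part of the original FORTRAN program; the function and variable names are ours), Equations 109 and 110 amount to a few lines of Python. The example values are the New York City concentrations derived later in this chapter:

```python
import math

# Inverse CDF of the standard normal at the two Larsen fractiles
# (Equations 107 and 108).
ZA = 0.52440   # F_Y^-1(0.70)
ZB = 3.09023   # F_Y^-1(0.999)

def larsen_fractile_parameters(xa, xb):
    """LN2 parameters mu and sigma from the concentrations xa and xb
    at the 70 and 99.9 percentiles (Equations 109 and 110)."""
    sigma = math.log(xb / xa) / (ZB - ZA)   # Equation 109
    mu = math.log(xb) - ZB * sigma          # Equation 110
    return mu, sigma

# New York City example from the text: xa = 19 ppm, xb = 48 ppm.
mu, sigma = larsen_fractile_parameters(19.0, 48.0)
# mu is approximately 2.7550 and sigma approximately 0.3612,
# matching the values quoted in Chapter V.
```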
Larsen^24 suggests the following equation for determining
which observation should be plotted at a particular frequency:

r = Int[(f/100)n + 0.999]                                    (112)

where r = the rank order of the observations (where r is
          truncated to give an integer value)
      f = cumulative frequency (percent)
      n = number of observations

He states that the constant 0.999 is used "...to give uniform
computer results," which select the observation closest to the
desired frequency, but no theoretical basis for this approach is
presented.
The previous discussion assumes that the 70 percentile of
the cumulative frequencies of observed concentrations is readily
available. However, because the data usually are grouped,
there may not be a concentration which corresponds exactly to
the 70 percentile and 99.9 percentile. Larsen^24 circumvents
this problem by approximating the values of x_a and x_b as the
concentrations falling closest to the 70 percentile and the
99.9 percentile, respectively. A graphical interpolation
approach would be subjective, and too vulnerable to error.
Therefore, we seek a way to determine the 70 percentile and
99.9 percentile points that can be adapted to the computer.
Obviously, a variety of different ways can be used to approxi-
mate these two percentiles.
In our computer program (see Appendix A), the method of
fractiles was applied to the data by the subroutine RALPH. The
rank order for the 70 and 99.9 percentiles was computed by
substituting 70 and 99.9 into Equation 112, along with the
number of values n in the data set. This subroutine summed
up the number of observations in each interval i, storing the
result in JSUM_i, and found the interval which met the following
condition:

JSUM_i-1 < r ≤ JSUM_i                                        (113)
Consider, for example, the cumulative frequencies of CO
observations obtained for New York City (Figure 5). There are
8,364 hourly observations in this data set. Using Equation 112,
the ranking positions, denoted as M1 and M2, are computed by
the subroutine RALPH in the following manner. (The notation
"Int" denotes the FORTRAN function IFIX(z), which truncates the
decimal portion of any floating point number z.)

70 percentile

M1 = r = Int[0.7(8364) + 0.999] = Int[5855.80] = 5855        (114)

99.9 percentile

M2 = r = Int[0.999(8364) + 0.999] = Int[8356.64] = 8356      (115)

In the computer program, the number of observations in each
interval are stored in the vector JH(I), in which I is an index.
As I is incremented, the number of observations in JH are suc-
cessively added and stored in JSUM. This subroutine finds the
first interval in which the cumulative number of observations
just equals or exceeds r. For New York City, the following
values were determined:

70%:   5675 < 5855 ≤ 6054   (I = 20, x_a = 19 ppm)           (116)

99.9%: 8354 < 8356 ≤ 8356   (I = 49, x_b = 48 ppm)           (117)
The operation of this subroutine is described in greater
detail in Chapter V.
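The ranking and search logic of Equations 112 and 113 can be sketched in Python as follows; the actual FORTRAN subroutine RALPH is listed in Appendix A, and the names below are ours:

```python
def rank_position(f_percent, n):
    """Larsen ranking position (Equation 112): truncate
    (f/100)*n + 0.999 to an integer, as IFIX does in FORTRAN."""
    return int(f_percent / 100.0 * n + 0.999)

def find_interval(counts, r):
    """Find the first interval whose cumulative count just equals or
    exceeds rank r (Equation 113): JSUM[i-1] < r <= JSUM[i]."""
    jsum = 0
    for i, c in enumerate(counts, start=1):
        jsum += c
        if r <= jsum:
            return i
    return len(counts)

# New York City example: n = 8,364 hourly observations.
m1 = rank_position(70.0, 8364)    # 5855, as in Equation 114
m2 = rank_position(99.9, 8364)    # 8356, as in Equation 115
```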
Figure 5 shows the cumulative frequencies of the obser-
vations for New York City. Notice that the observations are
stored in this program at intervals that are offset by 0.5
ppm. Interval Nos. 20 and 49 actually are at 19.5 ppm and
48.5 ppm, respectively, with exact cumulative frequencies of
72.382% and 99.904%. The concentrations used in Equation 109
are the midpoints of the intervals, x_a = 19 ppm and x_b = 48 ppm.

[Figure 5. Histogram of the 8,364 hourly CO observations (ppm) for New
York City, showing the number of observations and the cumulative number
of observations for intervals 1 through 50 (-0.5 ppm to 49.5 ppm).]
A problem arises if 70 percent or more of the observations
are zero. If, for this situation, x_a = 0 is substituted into
Equation 109, the result is undefined. In the present study,
this problem occurred with one data set, Barstow, CA, for which
70.08% of the observations were zero. For this situation,
Larsen recommends that the arithmetic mean be substituted
for the 70 percentile value; that is, x_a = x̄. The subroutine
RALPH was designed to make this substitution automatically in
these instances.
Although the approach for determining x_a and x_b is patterned
after the report by Larsen,^24 the theoretical basis for some of
these steps is not entirely clear. Thus, we recommend that
other formulas for the ranking position be explored, such as
the following simple equation, in which i is the cumulative
count of the number of observations:

f = i/(n + 1)                                                (118)

Another approach for selecting the fractiles would be to deter-
mine the cumulative frequency closest to the 70 percentile or
99.9 percentile. Then the exact cumulative frequencies for
these points might be substituted into Equation 104, rather
than f_a = 0.70 and f_b = 0.999 that have been incorporated
into Equation 109. Using this approach with New York City's
data, we would substitute f_a = 0.67842 (that is, 5675/8365)
and f_b = 0.99892 (that is, 8356/8365) into Equation 104, with
x_a = 18.5 ppm and x_b = 48.5 ppm. Use of Equation 104 is made
computationally easy by the existence of convenient algorithms
for computing the CDF and inverse CDF of the normal distri-
bution, such as those incorporated into the computer program used
in this study (Appendix A). One advantage of this approach is
that the two points selected will coincide exactly with two
of the points that are plotted on logarithmic probability paper,
because the ranking position given by Equation 118 is the one
that is commonly used in graphical work.
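This exact-fractile variant of Equations 104 and 105 can be sketched with Python's `statistics.NormalDist` standing in for the CDF algorithms of Appendix A; the function name is ours, and the usage values are taken from the New York City example:

```python
import math
from statistics import NormalDist

def fractile_parameters(xa, fa, xb, fb):
    """General method of fractiles (Equations 104 and 105), using the
    inverse CDF of the standard normal distribution."""
    za = NormalDist().inv_cdf(fa)
    zb = NormalDist().inv_cdf(fb)
    sigma = math.log(xb / xa) / (zb - za)   # Equation 104
    mu = math.log(xa) - sigma * za          # Equation 105
    return mu, sigma

# New York City: exact cumulative frequencies at the interval boundaries.
mu, sigma = fractile_parameters(18.5, 0.67842, 48.5, 0.99892)
```

By construction, the fitted model passes exactly through both selected points, which is why they coincide with two of the points plotted on logarithmic probability paper.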
Another technique for calculating parameters for the
lognormal model that has been suggested by several authors
is a combination of the method of moments and the method of
fractiles. Here, the arithmetic mean of the model is set equal
to the arithmetic mean of the data (that is, α = x̄), and the
second parameter is chosen by the method of fractiles in the
manner described above. Because the mean of the model then
matches the mean of the data, this approach is useful when other
averaging times are considered, as they are in the averaging time
models. However, this approach needs further development and
evaluation before it can be recommended as a preferred technique.
MAXIMUM LIKELIHOOD ESTIMATION (MLE)

Assume that a set of n independent observations (x_1, x_2, ..., x_n)
is obtained from a distribution whose basic form is known. Using
the knowledge of these observations, what can we determine about
most likely values of the parameters of this distribution? For
example, if we know that the observations arise from a lognormally
distributed random variable X, what are the most probable values
of µ and σ, given this particular set of observations? The
method of maximum likelihood estimation (MLE) seeks to determine
the optimal values of µ and σ that maximize the probability P(A)
of the event A = (x_1, x_2, ..., x_n).
Assuming that a random variable is lognormally distributed,
we form the product of the PDF values for the n observations. We
call this product the likelihood function L:

L = Π(i=1 to n) f_X(x_i) = Π(i=1 to n) [1/(x_i σ√(2π))] exp{-(1/2)[(ln x_i - µ)/σ]²}

  = [1/(σ√(2π))]^n [Π(i=1 to n) 1/x_i] exp{-(1/(2σ²)) Σ(i=1 to n) (ln x_i - µ)²}   (119)

The product L usually is a very small number. For computational
purposes, it generally is more convenient to work with the logarithm
of L. Thus, we form the log-likelihood function L*:

L* = n ln[1/(σ√(2π))] - Σ(i=1 to n) ln x_i - (1/(2σ²)) Σ(i=1 to n) (ln x_i - µ)²   (120)

Maximizing the likelihood function L is equivalent to maximizing
the log-likelihood function L*. To find the values of µ and σ
which maximize L*, we take first partial derivatives of L* with
respect to µ and σ and then set the resulting functions equal to
zero. Differentiating Equation 120 with respect to µ and setting
the result equal to zero:

∂L*/∂µ = (1/σ²) Σ(i=1 to n) (ln x_i - µ) = 0

Therefore,

Σ(i=1 to n) (ln x_i - µ) = Σ(i=1 to n) ln x_i - nµ = 0

Or,

µ = (1/n) Σ(i=1 to n) ln x_i                                 (121)

Differentiating Equation 120 with respect to σ and setting the
result equal to zero:

∂L*/∂σ = -n/σ + (1/σ³) Σ(i=1 to n) (ln x_i - µ)² = 0

Therefore,

(1/σ²) Σ(i=1 to n) (ln x_i - µ)² = n

Or,

σ² = (1/n) Σ(i=1 to n) (ln x_i - µ)² = (1/n) Σ(i=1 to n) (ln x_i)² - µ²   (122)
This result shows that the maximum likelihood estimators for
the lognormal distribution are obtained by first taking logarithms
of the observations. Then the mean of the logarithms is calcu-
lated using Equation 121, and the (biased) estimate of the variance
of the logarithms is calculated using Equation 122.
The underlying theory of maximum likelihood estimation assumes
that the observations are independent. In the present investi-
gation, the data consist of hourly observations which are known
to be serially correlated, and assumptions about independence are
violated. Here, the MLE approach is assumed to be just one of
a series of possible candidate techniques for calculating the
parameters of the lognormal distribution. Although the approach
is satisfactory for our purposes, it should be noted that the
resulting estimators, under these circumstances, will not neces-
sarily exhibit the properties associated with the theory.
MLE Approximation
A serious problem arises when we attempt to apply Equations
121 and 122 to data sets containing observations that are zero,
because the logarithm of zero is undefined. In many of the data
sets used in this investigation, and in most environmental
measurements, one frequently encounters observations that are
reported as zero. To apply the MLE approach for the lognormal
model, it is necessary to develop a means for handling these
cases.
In developing our technique, we must note that concentrations
reported as zero are not necessarily exactly zero ppm. One reason
is that all observations have been rounded off at the time of
the measurement to one or two significant figures; thus, 0 ppm
might have been reported even though the true observation was
0.237 ppm. This "automatic rounding" usually is a result of the
limited precision of the measuring method, and there is no way
to obtain more precise values from today's measuring methods. A
second problem is that most measuring instruments have "minimum
detectable limits" that cause them to respond poorly, or not at
all, at very low concentrations. A final reason that true zeros
are unlikely is that even the purest sample of air is likely to
contain at least a few molecules of a given pollutant, and, if
sufficient precision were possible, we would expect to observe
at least several parts-per-billion (ppb) of the pollutant rather
than exactly 0.0 ppm.
Nehls and Akland^69 have suggested procedures for the
analysis of aerometric data that include ways to handle obser-
vations below the minimum detectable limit (MDL) of a given
instrument. Although a theoretical basis is not given, they
propose that a value that is the mid-point between zero and the
MDL of the measurement method be substituted for all observations
below the MDL, and they provide a table of the MDL values for
various air pollutants. For carbon monoxide, which usually is
measured by nondispersive infrared absorption (NDIR) techniques,
they recommend substitution of 286 µg/m³ (approximately 0.25 ppm)
for values below the CO MDL of 573 µg/m³ (approximately 0.5 ppm).
Kushner^70 has compared this assumption that all measurements below
the MDL are equal to one-half the MDL, which he calls the
"L/2 approximation," with a more exact maximum likelihood
approach suggested by Hald.^71 He notes that the L/2 approximation
is exact if the concentration below the MDL is a uniformly dis-
tributed random variable; if it is not, the L/2 approximation
introduces slight error. However, for most applications he
concludes that this error "...would probably be overshadowed by
the error due to finite sample and measurement noise."
In this study, we apply the concept of the L/2 approximation
to the first interval (-0.5 ppm to +0.5 ppm). Since negative
concentrations are impossible, the first interval of the histogram
actually extends from 0.0 ppm CO to 0.5 ppm CO. Thus, the L/2
approximation is applied by substituting one-half of 0.5 ppm,
or X = 0.25 ppm, for any concentration that appears in the first
interval. Because all CO concentrations in the first interval
are zero values, the result is to substitute 0.25 ppm for any
CO concentration reported as 0 ppm. Because of the way in which
intervals were chosen, this value happens to coincide with the
value suggested by Nehls and Akland.^69
In the "MLE approximation" technique for calculating the
model's parameters, which is implemented in the subroutine AMLE
(see Appendix A), logarithms of the raw observations are taken,
and Equations 121 and 122 are used to calculate µ and σ. When-
ever a zero value is encountered, the subroutine automatically
substitutes 0.25 ppm for the observation. It should be noted
that the magnitude of the substitution value, if many zeros are
present, has a significant effect on µ and σ, because logarithms
are taken. For this reason, this study also examined the effect
of using a different substitution value, X = 0.5 ppm, on all data
sets.
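The substitution step of AMLE can be sketched as a pre-processing pass (our names; the substitution value is a parameter so that the 0.25 ppm and 0.5 ppm runs described above use the same code):

```python
import math

def substitute_zeros(xs, sub=0.25):
    """Apply the L/2 approximation: replace any observation reported
    as 0 ppm with the substitution value before logarithms are taken."""
    return [sub if x == 0.0 else x for x in xs]

# Logarithms can then be taken safely for the MLE computations.
logs = [math.log(x) for x in substitute_zeros([0.0, 3.0, 5.0])]
```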
MLE Optimization

To examine a somewhat different approach for implementing
the MLE concept, a computer algorithm was developed that would
iteratively find the values of µ and σ that maximize the log-
likelihood function L*. In this algorithm, which is contained
in subroutine BMLE, L* was calculated by first calculating the
probability associated with each interval. Because the inter-
vals are offset by 0.5 ppm, which is one-half the interval
width (see Figure 5), the probability that X lies within the
first interval is given as follows:

P_1 = P(-0.5 ppm < X ≤ 0.5 ppm) = F_X(0.5) - F_X(0.0)        (123)

Similarly, the probability that X lies within the second and
third intervals is given as follows:

P_2 = P(0.5 ppm < X ≤ 1.5 ppm) = F_X(1.5) - F_X(0.5)         (124)

P_3 = P(1.5 ppm < X ≤ 2.5 ppm) = F_X(2.5) - F_X(1.5)         (125)
In general, the probability P_j that X lies within the jth
interval is given as follows, in which Δx denotes the inter-
val width, and m denotes the total number of intervals:

P_j = P([j - 3/2]Δx < X ≤ [j - 1/2]Δx)
    = F_X([j - 1/2]Δx) - F_X([j - 3/2]Δx)                    (126)

for j = 2, 3, ..., m

Using the logarithmic transform Y = (ln X - µ)/σ, P_2 is cal-
culated as follows:

P_2 = F_Y[(ln 1.5 - µ)/σ] - F_Y[(ln 0.5 - µ)/σ]              (127)

For example, if µ = 2 and σ = 1, P_2 is computed as follows:

P_2 = F_Y(ln 1.5 - 2.0) - F_Y(ln 0.5 - 2.0)
    = F_Y(0.405465 - 2.0) - F_Y(-0.693147 - 2.0)
    = F_Y(-1.594535) - F_Y(-2.693147)
    = 0.055406 - 0.003539 = 0.051867
For each observation in a given interval, we form the likelihood
function L by multiplying by the probability associated with the
interval. For example, if interval No. 2 contains 12 observations,
the factor of L for this interval is P_2^12, or (0.051867)^12. The
factors for all intervals then are multiplied together:

L = Π(j=1 to k) P_j^(m_j)                                    (128)

where P_j = probability associated with the jth interval
      m_j = number of observations within the jth interval
      k = total number of intervals

Because Equation 128 generates numbers that are too small to be
stored on modern computers, the subroutine BMLE actually works
with the logarithms of the probabilities. Thus, the log-likeli-
hood function is computed as follows:

L* = Σ(j=1 to k) m_j log P_j                                 (129)

The computer program listed in Appendix A and described in
Chapter VI uses an iterative procedure to find the values of
µ and σ which maximize L*.
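A sketch of the BMLE objective function, the grouped log-likelihood of Equations 126 through 129, with Python's `statistics.NormalDist` in place of the program's own CDF routine (names ours). The worked example above (µ = 2, σ = 1) gives P_2 ≈ 0.0519:

```python
import math
from statistics import NormalDist

def grouped_log_likelihood(counts, mu, sigma, dx=1.0):
    """Log-likelihood L* (Equation 129) of grouped lognormal data.
    counts[j-1] holds the number of observations in interval j, whose
    upper boundary is (j - 1/2)*dx (Equation 126); interval 1 starts
    at zero because negative concentrations are impossible."""
    nd = NormalDist()
    ll = 0.0
    lower = 0.0
    for j, m in enumerate(counts, start=1):
        upper = (j - 0.5) * dx
        p_hi = nd.cdf((math.log(upper) - mu) / sigma)
        p_lo = nd.cdf((math.log(lower) - mu) / sigma) if lower > 0 else 0.0
        if m > 0:
            ll += m * math.log(p_hi - p_lo)   # Equation 129
        lower = upper
    return ll
```

An outer optimization loop (as in BMLE) would call this function repeatedly, adjusting µ and σ until L* is maximized.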
V. GOODNESS-OF-FIT
Once the model's parameters have been calculated from a
particular set of observations, it is important to determine how
well the model fits the observations. Such tests may reveal that
the parameters have been calculated incorrectly or that the model
is unsuited to a particular application. Thus, the investigator
requires objective measures of the "goodness-of-fit" of the model
to the data.
Two basic types of goodness-of-fit measures are available:
(1) frequency-based measures, which depend on the number of
observations within intervals, and (2) variate-based measures,
which compare concentrations predicted by the model with concen-
trations observed at each interval. Because of different educa-
tional backgrounds, statisticians generally prefer frequency-based
measures, while engineers often prefer variate-based measures.
FREQUENCY-BASED MEASURES
One possible frequency-based measure of goodness-of-fit is
the difference between the number of observations predicted by
the model in each interval of the histogram and the number
actually observed. Because this difference sometimes will be
negative, it is convenient to form the square of the difference.
To obtain a single measure for all k intervals of the histogram,
we can add up the squares of the differences, divide by k, and
then take the square root of the result, giving the "root-mean-
square (RMS)" of predicted minus observed values:
RMS = {(1/k) Σ(j=1 to k) [(n_p - n_o)_j]²}^(1/2)             (130)

where n_p = number of values predicted by the model in the
            jth interval
      n_o = number of values actually observed in the jth
            interval
      k = number of intervals
The RMS may be viewed as simply an "index" of goodness-of-fit,
and no particular probabilistic interpretation is attached to
it.
If, instead of dividing the difference squared by k, we
divide by the predicted number of observations n_p in each
interval and then sum the result, we obtain the chi-square (χ²)
measure of goodness-of-fit:

χ² = Σ(j=1 to k) [(n_p - n_o)_j]²/(n_p)_j                    (131)
If the observations are independent, the result computed from
Equation 131 will have a chi-square distribution with k - r - 1
degrees of freedom, in which r is the number of parameters in the
model. Although the histogram intervals used in this study are
of uniform width, it is not necessary that each interval be the
same size. Thus, several intervals could be grouped together. Of
course, grouping reduces the number of degrees of freedom. The
usual requirement is that at least five expected values should
appear in each interval.

Tables of the chi-square CDF are available^72 for various
degrees of freedom. By use of the tables, one can make a proba-
bility statement such as the following: "We can reject the hypothe-
sis that the observations arise from a lognormal distribution with
95-percent certainty (P = 0.95)." Here, for the LN2 model, r = 2.
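Sketched in Python (our names, not the program's), the RMS and chi-square measures of Equations 130 and 131 are:

```python
import math

def rms(pred, obs):
    """Root-mean-square of predicted minus observed counts
    (Equation 130)."""
    k = len(pred)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / k)

def chi_square(pred, obs):
    """Chi-square goodness-of-fit measure (Equation 131); each
    predicted count should be at least about five for the usual
    distributional interpretation to hold."""
    return sum((p - o) ** 2 / p for p, o in zip(pred, obs))
```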
Because the CO hourly observations used in this study are
serially correlated, the assumption of independence does not apply,
making a probabilistic interpretation of χ² more difficult. How-
ever, χ² still provides a useful index for comparing the empirical
goodness-of-fit of different parameter calculation techniques.
Another frequency-based measure of goodness-of-fit is the
log-likelihood function itself, Equation 129, which always is
a negative number, usually of large magnitude:

L* = Σ(j=1 to k) m_j log P_j                                 (132)

The best fit of the model to the observations occurs when L*
is greatest; that is, when the magnitude of L* is smallest.
A final type of frequency-based measure of goodness-of-fit
is derived from the absolute value of the difference between the
CDF and the cumulative frequencies of the observations. In the
Kolmogorov-Smirnov test, for example, the investigator computes
the maximum difference between the CDF and the cumulative fre-
quencies of individual observations. By referring to tables, he
can test the hypothesis that the observations arise from the
model under study. As discussed earlier, the hourly CO con-
centrations measured at urban air monitoring stations are
intrinsically grouped. Thus, the Kolmogorov-Smirnov computation
must be applied to the interval steps rather than to each obser-
vation:

D = max over all j of |CDF_j - CUMFREQ_j|                    (133)

where CDF_j = CDF of the model for the jth interval
      CUMFREQ_j = cumulative frequency of the observations
                  for the jth interval
Some authors also recommend computation of the sum of the
absolute value of the differences:

Σ(j=1 to k) |CDF_j - CUMFREQ_j|                              (134)
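For the grouped data of this study, the statistic of Equation 133 and the sum of Equation 134 reduce to the following sketch (our names; both arguments are lists of fractions evaluated at the interval boundaries):

```python
def ks_grouped(cdf, cumfreq):
    """Maximum absolute difference between the model CDF and the
    cumulative frequencies at the interval steps (Equation 133)."""
    return max(abs(c - f) for c, f in zip(cdf, cumfreq))

def abs_diff_sum(cdf, cumfreq):
    """Sum of the absolute differences over all intervals
    (Equation 134)."""
    return sum(abs(c - f) for c, f in zip(cdf, cumfreq))
```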
The above discussion covers five different measures of
goodness-of-fit, all of which were calculated by the computer
program developed for this investigation (see, for example,
the program output given in Appendix B). However, not all
of these goodness-of-fit measures are discussed in detail in
the summary of findings (Chapters VIII and IX).
VARIATE-BASED MEASURES
The upper boundary of each interval corresponds to a par-
ticular cumulative frequency of observations, and we can use
the model to calculate the predicted concentration x_p for the
interval and compare it with the actual concentration of the
interval x_o. For example, if (x_o)_j denotes the observed con-
centration and CUMFREQ_j denotes the cumulative frequency of
observations for the jth interval, then we can use the model
to calculate the concentration predicted for that interval
by taking the inverse of the CDF: (x_p)_j = F_X^-1(CUMFREQ_j). The
difference between the predicted and observed concentrations
for each interval can be treated as a random variable that is
itself a measure of the goodness-of-fit of the model to the
observations:

d = (x_p - x_o)                                              (135)

By examining the histogram of d, its mean d̄, its standard
deviation s_d, its range, and other characteristics, the inves-
tigator can make statements about the practical suitability
of the model for representing a particular data set.
This approach can be illustrated by applying it to the
first 10 intervals of the New York City CO data set (see
Figure 5, page 59). Although the entire data set con-
sists of 8,364 hourly observations (see Appendix B or C
for the complete histogram), the first 10 intervals contain only
1,397 observations (Table 2).
Table 2

Number of Observations and Cumulative Frequencies for
the First 10 Intervals of the New York City CO Data Set

 j   Interval    No. of         Cumulative     Cumulative
     (ppm CO)    Observations   No. of Obs.    Frequency
 1   -0.5-0.5          0             0           0.000%
 2    0.5-1.5          0             0           0.000%
 3    1.5-2.5          8             8           0.096%
 4    2.5-3.5         14            22           0.263%
 5    3.5-4.5         63            85           1.016%
 6    4.5-5.5        139           224           2.678%
 7    5.5-6.5        174           398           4.758%
 8    6.5-7.5        265           663           7.927%
 9    7.5-8.5        359          1022          12.219%
10    8.5-9.5        375          1397          16.703%
For New York City, the arithmetic mean of the hourly CO obser-
vations was x̄ = 16.2074 ppm and the arithmetic standard deviation
was s = 6.8687 ppm. Calculating the parameters for the LN2 model
by the Larsen method of fractiles, we obtain µ = 2.7550 and
σ = 0.36119 (see Chapter IV). When the model's parameters are
calculated in this fashion, its arithmetic mean is α = 16.7812
ppm and its arithmetic standard deviation is 6.2644 ppm.
If, for example, we wish to calculate x_p for the seventh inter-
val (j = 7), we note that the cumulative frequency of obser-
vations for values less than 6.5 ppm (x < 6.5) is 4.758%.
Taking the inverse CDF of the normally distributed random
variable Y which describes the logarithms of concentrations,
Y = (ln X - µ)/σ, we obtain F_Y^-1(0.04758) = -1.6691. Then, x_p
and y for this interval are related as follows:

y = (ln x_p - 2.7550)/0.36119 = -1.6691

Solving for x_p,

ln x_p = (-1.6691)(0.36119) + 2.7550 = 2.1521

x_p = e^2.1521 = 8.603 ppm

Because x_o = 6.5 ppm for this interval, the model overpredicts
the concentration by d_7 = 8.603 - 6.5 = 2.103 ppm.
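This calculation for the seventh interval can be reproduced with `statistics.NormalDist` (a sketch only; the program's own inverse-CDF algorithm is listed in Appendix A, and the function name here is ours):

```python
import math
from statistics import NormalDist

def predicted_concentration(cumfreq, mu, sigma):
    """Predicted concentration at an interval boundary:
    x_p = exp(mu + sigma * F_Y^-1(CUMFREQ))."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(cumfreq))

# Seventh interval of the New York City data set.
xp = predicted_concentration(0.04758, 2.7550, 0.36119)
d7 = xp - 6.5
# xp is approximately 8.60 ppm, so d7 is approximately 2.10 ppm,
# matching the worked values in the text.
```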
The predicted values and the differences d_j for the first
10 intervals are calculated in a similar fashion (Table 3). The
results for the entire histogram (the first 50 intervals) can
be presented graphically by plotting x_p versus x_o (Figure 6). Here,
the 45 degree line, x_p = x_o, denotes an exact correspondence
between predicted and observed values. The vertical distance of
any point above the line is the value of d_j. For the New York
City data set, the model, with parameters calculated by the
method of fractiles, tends slightly to overestimate CO concen-
trations below 15 ppm, with very good correspondence over the
range from 15 ppm to 50 ppm. Of course, an exact correspondence
[Figure 6. Predicted versus observed CO concentrations for New York City,
with parameters of the LN2 model calculated by the method of fractiles.
Predicted concentration x_p (ppm) is plotted against observed concentration
x_o (ppm); both axes run from 0 to 50 ppm.]
Table 3

EET Values for the First 10 Intervals of the New York City Data
Set, with LN2 Parameters Calculated by the Method of Fractiles

 j    x_o        Cumulative   F_Y^-1(CUMFREQ_j)    x_p        d
      (ppm CO)   Frequency                         (ppm CO)   (ppm CO)
                 CUMFREQ_j
 1    0.5         0.000%           a                  a          a
 2    1.5         0.000%           a                  a          a
 3    2.5         0.096%        -3.1037            5.1243     2.6243
 4    3.5         0.263%        -2.7910            5.7370     2.2370
 5    4.5         1.016%        -2.3207            6.7991     2.2991
 6    5.5         2.678%        -1.9308            7.8275     2.3275
 7    6.5         4.758%        -1.6691            8.6034     2.1034
 8    7.5         7.927%        -1.4103            9.4465     1.9465
 9    8.5        12.219%        -1.1642           10.3246     1.8246
10    9.5        16.703%        -0.9659           11.0911     1.5911

a Undefined.
occurs at the intervals corresponding to the 70 percentile
(x_o = 19.5 ppm) and the 99.9 percentile (x_o = 48.5 ppm).

The characteristics of the differences can be examined in
greater detail by considering the histogram of the d's and the
statistical properties of this measure (Figure 7). The average
difference (d̄ = 0.5233 ppm) was positive, but small. No dif-
ference was less than -0.42 ppm, and the maximum positive dif-
ference was +2.62 ppm. Of the 50 differences, 58% were in the
range from -0.5 ppm to +0.5 ppm, and 80% were in the range from
-0.5 ppm to +1.0 ppm. Because the precision of the NDIR measuring
instrument is, at best, on the order of ±0.5 ppm, the correspon-
dence of the model to the observations, using this measure of
goodness-of-fit, appears relatively high.

[Figure 7. Histogram of the differences d (ppm) between predicted and
observed CO concentrations for the New York City data set, with summary
statistics printed by the program.]
We shall call this measure of goodness-of-fit the
Engineering Error Test (EET). The EET, because it depends on
the difference between predicted and observed concentrations
at each interval, is a goodness-of-fit measure that has
practical significance. When one plots a set of observations
on logarithmic probability paper and then draws a straight line
to depict the LN2 model, the eye tends visually to weight each
point equally and to focus on the concentration difference
between the line and the observations. Thus, a line that
appears to be a good fit by such graphical approaches will
also do well when judged by the EET approach. Unfortunately,
not every point plotted on a logarithmic-probability plot
represents the same number of observations. Intervals near the
center of the distribution contain many observations, while
those at the tails, or extremes, contain very few. The EET
does not take this fact into account. Thus, as we shall see,
the EET tends to provide an excellent measure of how well the
model fits a particular empirical data set, which is just one
realization of the process under study, but a poor measure of
how well the model fits the underlying distribution of the
process itself.
VI. DESCRIPTION OF THE COMPUTER PROGRAM

The computer program, which is listed in Appendix A, con-
sists of a main program and seven subroutines, each of which
carries out a well-defined function (Table 4). Appendix B
contains an example of the output of this program for the
data from New York City. This chapter is intended to provide
brief documentation of the computer program to assist persons
wishing to use the software or the concepts contained in the
program.
Table 4
Subroutines Used in the Computer Program and Their Functions
Subroutine Function
STAT Computes basic statistics (mean, variance,
higher moments, etc.)
HISTOG Generates and prints a histogram
RALPH Computes parameters by the method of frac-
tiles
AMLE Computes parameters by the MLE approximation
BMLE Computes parameters by the MLE optimization
LN Examines goodness-of-fit of the model to
the data
AUTCOR Calculates autocorrelation function
MAIN PROGRAM
The main program reads the raw observations into the program
and stores them. Format Statement #110 specifies the format
for reading the data. In the present version of the program,
12 values are read per record (one-half day of CO observations).
An alphanumeric label identifying the data set also can be read
and printed at the beginning of the output. If a label is used,
it is inserted immediately before the beginning of the data,
and a record containing the integer "11" in columns 1-2 is
inserted before the label record. It is assumed that all missing
values in the raw data file have been coded with a missing value
code, which is denoted in the program as DUMMY. In the present
example, DUMMY = 9999.99. Although all observations, including
the missing values, are read into the program and counted (L
contains the total count), two different vectors are filled.
Observations with missing values are saved in the vector
STORE(9000), while a second vector, VALUE(9000), contains only
the valid observations. Here, missing values have been deleted,
and N is the total number of valid observations stored in VALUE.
Once the values have been read and stored, the subroutine
STAT is executed in order to compute basic statistics from the
raw data. Then the subroutine HISTOG partitions the observations
into intervals and prints out a histogram, along with the cumu-
lative frequencies, on the line printer. The statistics generated
by STAT and the counts of observations within intervals are used
in subsequent steps of the program.
After STAT and HISTOG have been executed, the main program
carries out four computation cycles, one for each parameter cal-
culation technique. The computation cycles are executed in
the following order: (1) method of moments, (2) method of
fractiles, (3) MLE approximation, and (4) MLE optimization.
Within each of the computation cycles, the first step is to
obtain the values of the parameters, µ and σ; then three sub-
routines are executed in the following order: LN, HISTOG, and
STAT. In the second computation cycle, for example, the subrou-
tine RALPH computes the values of µ and σ (which are represented
in the program as U2 and SIG2). Next, LN is executed to deter-
mine the goodness-of-fit of a lognormal model with these param-
eters to the observations. Then, HISTOG is executed to provide
a histogram of the differences (d's) for the Engineering Error
Test (see Chapter V). Finally, STAT is executed to compute
basic statistics of the differences. At the end of the four
computation cycles, the subroutine AUTCOR is executed to compute
the autocorrelation function.
SUBROUTINES
The following sections describe each subroutine briefly.
STAT(N,VALUE)
The subroutine STAT carries out statistical computations
on the N observations stored in the vector VALUE. It computes
the arithmetic mean, standard deviation, coefficient of
variation (ratio of standard deviation to mean), third and
fourth moments about the mean, and coefficients of skewness
and kurtosis. It also determines the minimum and maximum
observations and the second highest observation. The resulting
values are printed out.
The arithmetic mean (AVE) and the arithmetic standard
deviation (STD) are retained in common storage for later use
by means of the statement COMMON /F/AVE,STD. Because the
method of moments requires a biased estimate of the variance,
the value s₁² = s²(n-1)/n is computed in the main program for
use in the first computation cycle.
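In modern notation, the computations performed by STAT amount to the following (a Python sketch, not a transcription of the FORTRAN; the choice of the unbiased divisor n-1 for the printed standard deviation is an assumption, with the biased variance s₁² shown alongside as in the text):

```python
import math

def stat(values):
    """Sketch of STAT's summary statistics for the N observations in VALUE."""
    n = len(values)
    mean = sum(values) / n
    # Unbiased sample variance (divisor n - 1), assumed for the printed STD.
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    cv = std / mean                                   # coefficient of variation
    m2 = sum((x - mean) ** 2 for x in values) / n     # biased variance s1^2 = s^2 (n-1)/n
    m3 = sum((x - mean) ** 3 for x in values) / n     # third moment about the mean
    m4 = sum((x - mean) ** 4 for x in values) / n     # fourth moment about the mean
    skew = m3 / m2 ** 1.5                             # coefficient of skewness
    kurt = m4 / m2 ** 2                               # coefficient of kurtosis
    ranked = sorted(values)
    return {"mean": mean, "std": std, "cv": cv, "skew": skew, "kurt": kurt,
            "min": ranked[0], "max": ranked[-1], "second_highest": ranked[-2]}
```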
HISTOG(IN,N,VALUE,BASE,DELTA,LIMIT,X1,JH,FV,CUM,HI,IH)
The subroutine HISTOG partitions the observations into
intervals of size DELTA and prints out a histogram on the
line printer. Like BASE, which is the lower limit of
the observations in the histogram, and LIMIT, which is the
number of intervals in the histogram, DELTA is a user option
that is set in the main program. In this study, BASE = -0.5
ppm; DELTA = 1.0 ppm; and LIMIT = 60 intervals. Thus, the
highest value which could be included on the histogram was
(60)(1.0) - 0.5 = 59.5 ppm. Values above this range are
stored in the vector HI(1000), and IH contains an integer
count of the number of values above the range. If the user
wishes to have the values above the range listed after the
histogram is printed, he sets IN = 1 in the main program;
otherwise the listing will be suppressed. In this study, only
one city, Phoenix, had any observations above 59.5 ppm, and
this one value was stored in HI in the Phoenix run.
The vector X1(100) contains the concentration values which
comprise the abscissa of the histogram. Here, for example,
X1(1) = 0.5 ppm, X1(2) = 1.5 ppm, X1(3) = 2.5 ppm, and so on. The
vector JH(100) contains the number of observations in each
interval; the vector FV(100) contains the percentage of the
observations in each interval; and the vector CUM(100) contains
the cumulative frequency, by interval, expressed as a percentage.
The operation of this subroutine is described in greater detail
in a report by Ott [74].
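The binning logic can be sketched as follows (Python for illustration; BASE, DELTA, and LIMIT play the same roles as in the FORTRAN subroutine, and values below BASE are assumed not to occur):

```python
def histog(values, base=-0.5, delta=1.0, limit=60):
    """Sketch of HISTOG's binning: returns interval upper bounds (X1),
    counts (JH), percentages (FV), cumulative percentages (CUM), and
    the values above the histogram range (HI)."""
    counts = [0] * limit
    above = []                                # HI: values above the range
    top = base + limit * delta                # e.g. -0.5 + 60(1.0) = 59.5 ppm
    for x in values:                          # assumes x >= base
        if x >= top:
            above.append(x)
        else:
            counts[int((x - base) / delta)] += 1
    n = len(values)
    uppers = [base + (j + 1) * delta for j in range(limit)]  # X1: 0.5, 1.5, ...
    pct = [100.0 * c / n for c in counts]
    cum, running = [], 0.0
    for p in pct:
        running += p
        cum.append(running)
    return uppers, counts, pct, cum, above
```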
RALPH(U,SIG,AVE1,LIMIT,X1,JH,FV,CUM,N)
The subroutine RALPH computes the two parameters of the
lognormal model using the method of fractiles in the manner
suggested by Larsen (see Chapter IV). In this computation,
the subroutine uses the vectors X1 and JH to find the inter-
vals that contain the ranked observations corresponding to
the 70th percentile and the 99.9th percentile of the data. Once
the concentrations for the proper intervals have been determined,
µ and σ are computed. These floating-point values are repre-
sented in the program as U and SIG, respectively.
A problem arises if more than 70 percent of the observa-
tions are zero, for Larsen's approach cannot be applied when
one of the two concentrations is zero. This situation occurred
in the data set for Barstow, CA. In this situation, Larsen
recommends that the arithmetic mean of the data set be substi-
tuted for the 70th-percentile value. Thus, AVE1, the observed
arithmetic mean, is transferred to this subroutine, and the
substitution is made automatically for this case.
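Under the lognormal model, ln x_p = µ + z_p·σ at every fractile p, so the two selected fractiles determine both parameters. A minimal sketch (Python; the standard-normal deviates z for the 70th and 99.9th percentiles are taken from tables, and the step of locating the fractile concentrations in the histogram is omitted):

```python
import math

# Standard-normal deviates for the two fractiles (from tables).
Z70, Z999 = 0.5244, 3.0902

def fractile_parameters(x70, x999):
    """Sketch of the method-of-fractiles computation in RALPH: given the
    concentrations at the 70th and 99.9th percentiles, solve the pair of
    equations ln(x_p) = mu + z_p * sigma for mu and sigma."""
    sigma = (math.log(x999) - math.log(x70)) / (Z999 - Z70)
    mu = math.log(x70) - Z70 * sigma
    return mu, sigma
```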
AMLE(U,SIG,N,VALUE)
The subroutine AMLE computes the LN2 parameters by the
method of MLE approximation described in Chapter IV. This is
accomplished by taking the logarithm of each observation and
calculating µ as the mean of the logarithms and σ as the
(biased) estimate of the standard deviation of the logarithms.
For observations reported as zeros, the "L/2 approximation"
is used. That is, zero values are set to the midpoint between
0.0 ppm and 0.5 ppm, or 0.25 ppm, which is the midpoint of the
first interval.
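The computation reduces to a few lines (a Python sketch of the logic just described; the FORTRAN details differ):

```python
import math

def amle(values):
    """Sketch of the MLE approximation in AMLE: take logs, substituting
    0.25 ppm (midpoint of the first interval) for zero readings."""
    logs = [math.log(x if x > 0 else 0.25) for x in values]
    n = len(logs)
    mu = sum(logs) / n
    # Biased (divisor n) estimate of the standard deviation of the logs.
    sigma = math.sqrt(sum((y - mu) ** 2 for y in logs) / n)
    return mu, sigma
```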
BMLE(U,SIG,LIMIT,X1,JH,FV,CUM)
In contrast with AMLE, the subroutine BMLE is relatively
complex. It finds the values of µ and σ that maximize L*, the
log-likelihood function, by means of an iterative approach.
It accomplishes this by systematically incrementing (or decre-
menting) µ and σ until a maximum is found and the absolute
value of the increment (or decrement) reaches an arbitrarily
small number. To determine whether to increment or decrement,
it tests to see which of the two choices will cause L* to
increase. For example, if adding the amount Δµ to µ causes
L* to increase (become less negative), the subroutine will con-
tinue to "advance" (that is, to increment µ by Δµ) on each
iteration until L* no longer increases. Then, it will divide
Δµ by 2 and repeat the process. Once the maximum for µ has
been found, the process is repeated for σ. When the optimum
value of σ has been found, it returns to µ, and so on, ultimately
finding the values of µ and σ that correspond to a very small
increment (here, the process stops when Δµ or Δσ is less than
0.000001). Because this subroutine was developed especially
for this study as an integral part of the computer program,
the comments in the program refer to it as Ott's optimization
method.
The steps within the subroutine are reasonably straight-
forward and should be easy to follow. The value of the log-
likelihood function L* is calculated by the DO loop that ends
in Statement #20 and is stored in SUM. This DO loop is con-
tained within a second DO loop that ends in Statement #10 and
causes the log-likelihood function to be computed three times
upon each iteration. To describe this computation, we shall
assume that z denotes the value of either µ or σ, depending on
which one is being optimized. If L = 1, z = µ, and if L = 2,
z = σ. On the first computation (I = 1), the DO loop ending in
Statement #10 computes L*(z) and stores the result in SUM1; on
the second computation (I = 2), it computes L*(z - d), where
d is the increment (represented in the program as D or DELTA),
and stores the result in SUM2; on the third computation (I = 3),
it computes L*(z + d) and stores the result in SUM3.
Here, SUM1, SUM2, and SUM3 are test values of the log-like-
lihood function which are used to determine whether z should be
incremented ("advance forward") or decremented ("move backward")
or modified ("divide by 2.0 and recalculate"). The integer MEM
is intended to "remember" the direction. Initially MEM = 0, and
if SUM3 is greater than SUM1, indicating that computation should
advance, the program sets MEM = 1. Conversely, if SUM2 is
greater than SUM1, the program sets MEM = 2. Depending on the
outcome of these tests, the program either increments or decre-
ments z (either U or SIG, depending on the value of the integer
L) and returns to Statement #5. Once MEM has been set, the
increment (or decrement) cycle is repeated over and over, until
"overshoot" occurs. Overshoot is determined, if MEM = 1
(advance mode), by testing whether SUM3 is less than SUM1; con-
versely, if MEM = 2 (backward mode), by testing whether SUM2 is
less than SUM1. Overshoot can be interpreted as follows. If
the program is in the advance mode, overshoot implies that the
log-likelihood function has been successively incremented a
number of times when suddenly it is found that, for the next
increment, the value of L*(z + d) is less than L*(z). When this
happens, control transfers to Statement #25, and the following
four steps occur: (1) d is divided by 2.0, (2) MEM is again set
to zero, (3) the program tests to see if d is less than its
stopping-point value (that is, 0.000001), and, if not, (4) con-
trol returns to Statement #5.
In order to insure convergence, the algorithm also must
handle the situation in which z is very close to its optimum
value, and both L*(z - d) and L*(z + d) are less than L*(z).
In this case, which we call a "trough," the program tests to
see if both SUM2 and SUM3 are less than or equal to SUM1. If
so, control transfers to Statement #25, and the four steps
listed above are executed. That is, d is divided by 2.0, MEM
is reset, the convergence test is made, and control returns to
Statement #5.
This subroutine always begins with SIG as the variable to
be optimized (that is, L = 2). Once the convergence test has
been satisfied for SIG, the program sets L = 1, and control
transfers to Statement #1 so the process can be repeated with
U as the variable to be optimized. When the convergence test
has been satisfied for U, the program again sets L = 2, control
transfers to Statement #1, and the process is repeated on SIG.
This process of reversing from U to SIG and back again could
be carried on indefinitely, but certain stops have been built
into the subroutine. Each time that L is changed, the old
value of the log-likelihood function L* is stored in SUMT so
that it can be compared with the new value of L*, which is
stored in SUM1. If the absolute value of the difference between
the old and new values of L*--that is, ABS(SUM1 - SUMT)--is less
than 0.0001, then the counter K is incremented. If K is 3 or
more, all computations of the subroutine are concluded, and
control returns to the main program.
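Stripped of its bookkeeping (MEM, ICOUNT, the printed columns), the search is a coordinate ascent with step halving. The following compact sketch captures the same idea (Python for illustration, not a transcription of the FORTRAN; it starts with the first coordinate rather than with σ, and simply alternates until neither parameter moves):

```python
def coordinate_ascent(f, mu, sigma, step=0.1, tol=1e-6):
    """Sketch of the step-halving coordinate search in BMLE (the report's
    "Ott's optimization method"); f stands in for the log-likelihood L*."""
    params = [mu, sigma]
    for _ in range(100):                     # outer alternation between mu and sigma
        moved = False
        for i in (0, 1):                     # i = 0 optimizes mu, i = 1 optimizes sigma
            d = step
            while d >= tol:                  # stop when the increment is tiny
                for sign in (1.0, -1.0):     # try advancing, then backing up
                    trial = params[:]
                    trial[i] += sign * d
                    while f(*trial) > f(*params):   # keep stepping while L* increases
                        params = trial[:]
                        moved = True
                        trial[i] += sign * d
                d /= 2.0                     # overshoot (or trough): halve the step
        if not moved:
            break
    return params
```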
As the steps of this subroutine are executed, considerable
output is generated (see Appendix B). On each iteration, the
program prints out the following information in 9 columns:
the index number of the iteration (the value of ICOUNT); the
code indicating whether µ or σ is being optimized (the value of
L); the number of times that the probability is zero for an
interval computed in the log-likelihood computation (the value
of K1); the value of L*(z), which is listed as "SUM1"; the
value of L*(z + d), which is listed as "SUM3"; the mean µ,
which is listed as "MEAN"; the standard deviation σ, which
is listed as "SIGMA"; and the increment d, which is listed as
"CHANGE." By examining this last column, we can see that the
increment, which begins as d = 0.1, rapidly becomes very small,
and convergence is obtained very quickly.
Although many improvements could be made to this subroutine
to make it more efficient and flexible, we found that it worked
very well for our purposes. It always converged, and it gave
a high degree of precision quite rapidly. Thus, once it was
operational, we felt that further refinements would take addi-
tional time that could better be spent on other parts of the
study. We feel we can recommend this approach for other appli-
cations similar to our own, and we feel that other investigators
also may wish to refine it more so it can be used for more
general-purpose applications.
LN(U,SIG,N,LIMIT,K,X,DIFF,X1,JH,FV,CUM,HI)
The subroutine LN calculates the goodness-of-fit of a LN2
model with parameters µ = U and σ = SIG to the observations con-
tained in the vector JH with intervals specified by the vector
X1. Prior to computing these results, it calculates summary
information for the LN2 model that is specified by these two
parameters (that is, the arithmetic mean, standard deviation,
median, mode, geometric standard deviation, and coefficients of
skewness and kurtosis). Then it prints out a table on the line
printer that enables one to compare the number of observations
predicted for each interval with the number of observations
actually observed. In this table, the first column, which is
listed as "INTERVAL", gives the abscissa of the histogram
(obtained from the vector X1); the next column, which is listed
as "LN(X)", is the natural logarithm of the upper bound of the
interval. This is followed by "Y", the number of standard
deviations from the mean for the normally distributed random
variable Y = (ln X - µ)/σ. Then the CDF of Y is given and is
listed as "F(Y)". By subtracting values of F(Y) for successive
intervals, we obtain the probability associated with each inter-
val, which is listed as "P(Y)". Then the expected number of
observations in the interval, listed as "PREDICTED N", is obtained
by multiplying the probability for each interval times the total
number of observations N. The next column, "OBSERVED N", lists
the actual number of observations contained in each interval;
this is followed by a column for the difference between predicted
and observed values, called "DIFFERENCE". Finally, the last two
columns contain the difference squared, listed as "DIFF. SQ.",
and the value for chi-square (χ²), listed as "CHI SQUARE."
The remaining observations are grouped into one last inter-
val, and statistics for this last interval, along with summary
information, are printed at the bottom of the table. Information
that relates to the Kolmogorov-Smirnov goodness-of-fit statistics
also is printed out at the bottom of the table.
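The frequency-based portion of the table can be summarized as follows (a Python sketch in which the standard library's math.erf stands in for the program's Hastings-based GAUCDF; the final catch-all interval is omitted for brevity):

```python
import math

def chi_square_table(x_upper, observed, mu, sigma, n):
    """Sketch of the frequency-based fit table built by LN.
    x_upper: interval upper bounds (X1); observed: counts per interval (JH)."""
    def F(y):                                   # standard normal CDF
        return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))
    prev = 0.0
    chi2 = 0.0
    rows = []
    for x, obs in zip(x_upper, observed):
        y = (math.log(x) - mu) / sigma          # Y = (ln X - mu)/sigma
        cdf = F(y)                              # F(Y)
        p = cdf - prev                          # P(Y): probability of the interval
        expected = p * n                        # PREDICTED N
        diff = expected - obs                   # DIFFERENCE
        if expected > 0:
            chi2 += diff * diff / expected      # CHI SQUARE contribution
        rows.append((x, y, cdf, p, expected, obs, diff))
        prev = cdf
    return rows, chi2
```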
After the table giving frequency-based measures of goodness-
of-fit has been printed, a second table giving variate-based
measures of goodness-of-fit is printed. This table lists the
cumulative frequency ("CUM. FREQ."), the number of standard
deviations from the mean for the normally distributed random
variable Y ("PROBITS"), the concentration predicted by the
model for each interval ("PREDICTED X"), the actual concentra-
tion observed at the interval ("OBSERVED X"), the difference
between predicted and observed concentrations ("DIFFERENCE"),
and the square of this difference ("DIFF. SQUARED"). The
difference values d_j = (x_predicted - x_observed)_j for each
interval j are stored in the vector DIFF(100) so that statistics
and other information can be computed from them in subsequent
steps of the program. These values are required for the
Engineering Error Test (EET) described in Chapter V.
The computations in this subroutine require the ability
to calculate the CDF of the normal distribution F_Y(y) and the
inverse CDF of the normal distribution F_Y^-1(y). These calcu-
lations are carried out, with reasonably high precision, using
approximations suggested by Hastings [75]: the approximation for
F_Y(y) is contained in the function GAUCDF(X), while the approxi-
mation for F_Y^-1(y) has been programmed into the function GCDF1(Q).
Both these functions are listed directly after the subroutine
LN.
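The report does not reproduce the approximations themselves, but a commonly used Hastings-type pair, as tabulated in Abramowitz and Stegun (formulas 26.2.17 and 26.2.23), illustrates the kind of computation involved; whether these are the exact forms used by GAUCDF and GCDF1 is an assumption:

```python
import math

def gaucdf(x):
    """Hastings-type approximation to the standard normal CDF
    (Abramowitz & Stegun 26.2.17, |error| < 7.5e-8)."""
    t = 1.0 / (1.0 + 0.2316419 * abs(x))
    poly = t * (0.319381530 + t * (-0.356563782 + t * (1.781477937
               + t * (-1.821255978 + t * 1.330274429))))
    tail = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi) * poly
    return 1.0 - tail if x >= 0 else tail

def gcdf_inv(q):
    """Hastings rational approximation to the inverse normal CDF
    (Abramowitz & Stegun 26.2.23, |error| < 4.5e-4)."""
    p = q if q < 0.5 else 1.0 - q
    t = math.sqrt(-2.0 * math.log(p))
    num = 2.515517 + t * (0.802853 + t * 0.010328)
    den = 1.0 + t * (1.432788 + t * (0.189269 + t * 0.001308))
    x = t - num / den
    return -x if q < 0.5 else x
```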
AUTCOR(L,N,STORE,ISTART,JUMP)
The subroutine AUTCOR computes the autocorrelation function
for the L observations stored in the vector STORE, which con-
tains both the valid concentrations and the missing values
(which are coded as DUMMY). ISTART specifies the starting
point of the autocorrelation, and JUMP is a multiplication
factor for the lags. In most applications, ISTART = 1, and
JUMP = 1, causing the autocorrelation process to begin at the
first hour of the year, with successive lags corresponding to
integer multiples of 1 hour. That is, for lag M, there will
be L - M pairs of observations used in the correlation compu-
tation, and each pair will be separated by M hours.
In the output for New York City (Appendix B), two auto-
correlation results are given. In the first case, ISTART = 1,
and JUMP = 1; in the second case, ISTART = 6 and JUMP = 6. The
latter values cause the pairs of values for which the autocor-
relation is computed to be separated by even multiples of 6
hours, beginning at the sixth hour (that is, 6, 12, 18, and so on).
The multiplier was included in the subroutine to enable the
autocorrelation to be computed for lags that are relatively
long in time.
In its present form, this subroutine computes the auto-
correlation function for 1 to 75 lags. In the first case, lag
75 corresponds to 75 hours; in the second case, lag 75 corre-
sponds to 6 × 75 = 450 hours. The line printer output shows
the lags in hours ("LAG HOURS"), the lags in days ("LAG DAYS"),
the number of pairs of values included in the computation of
each correlation coefficient ("PAIRS"), the covariance ("COVAR-
IANCE"), and the correlation coefficient r ("CORRELATION COEF-
FICIENT"). A graph of the correlation coefficients, as a
function of the lags, also is printed on the line printer to
the right of the columnar output.
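The lagged-correlation computation, with missing-value exclusion, can be sketched as follows (Python for illustration; the ISTART offset is omitted for brevity, and the JUMP multiplier plays the same role as in the subroutine):

```python
def autocor(series, max_lag=75, jump=1, missing=None):
    """Sketch of AUTCOR: for each lag, correlate the series with itself
    shifted by the lag, excluding any pair that contains a missing value
    (so the pair count varies from lag to lag)."""
    results = []
    for k in range(1, max_lag + 1):
        m = k * jump                          # lag in hours
        pairs = [(a, b) for a, b in zip(series, series[m:])
                 if a is not missing and b is not missing]
        n = len(pairs)
        if n < 2:
            results.append((m, n, None))
            continue
        xs = [a for a, _ in pairs]
        ys = [b for _, b in pairs]
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((a - mx) * (b - my) for a, b in pairs) / n
        sx = (sum((a - mx) ** 2 for a in xs) / n) ** 0.5
        sy = (sum((b - my) ** 2 for b in ys) / n) ** 0.5
        r = cov / (sx * sy) if sx > 0 and sy > 0 else None
        results.append((m, n, r))             # (lag, PAIRS, CORRELATION COEFFICIENT)
    return results
```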
Because missing values cannot be included directly in the
computation, any pair of values that has at least one missing
value (that is, a value coded as DUMMY) is excluded from the
computation. As a result, the number of pairs of values
(listed under "PAIRS") varies from lag to lag. In this report,
the results from the autocorrelation subroutine are not analyzed
in detail. The subroutine was included in the program simply
to establish whether the observations were serially correlated
and the degree of correlation. It was found that most of the
data sets exhibited a strong pattern of serial correlation,
usually with a major periodicity of 24 hours and a minor per-
iodicity of 12 hours, although the latter was not very strong
or regular (see Appendices C and D). The magnitudes of the
correlation coefficients appeared to be related to the arith-
metic mean of the data set: the higher the mean, the greater the
magnitudes of the correlation coefficients.
VII. SELECTION OF THE DATA
When this study was initially planned, we hoped to include
as many different pollutants, locations, and years as possible.
In particular, we hoped that the data sets ultimately selected
would be as representative of the U.S. population as possible.
We also hoped that the analytical work would be as thorough as
possible, thereby answering the questions addressed in this
study in a rigorous and definitive manner. These two competing
goals -- the desire for an in-depth analysis and the desire to
include as many data sets as possible -- required compromise, for
it was not possible to carry out a detailed analysis on every
data set in the Nation.
Prior to selecting CO as the pollutant of interest, we
examined all candidate air pollutants and data sets available in
the U.S. through the Environmental Protection Agency (EPA). The
largest source of air quality data on a national basis is the
Storage And Retrieval Of Aerometric Data (SAROAD) system main-
tained within EPA by the Office of Air Quality Planning and
Standards (OAQPS) in Research Triangle Park, N.C. In selecting
the pollutants, we limited the scope of the study to those pol-
lutants for which National Ambient Air Quality Standards (NAAQS)
had been promulgated: total suspended particulates (TSP), sulfur
dioxide (SOp), ozone (0^), carbon monoxide, nitrogen dioxide
(NOp), and total hydrocarbons. In 197^, TSP was represented by
95
-------
more stations in the SAROAD data bank than any other pollutant.
When we looked into the characteristics of monitoring stations in
the data bank, we found considerable variation from station to
station. Only 100 stations could be found which measured at
least 5 of the 6 NAAQS pollutants. The measurement method,
"type" of station, and height of the intake probe above the
ground varied. (The "type" denotes whether a station is "urban,"
"rural," etc.). The measurement methods used for SO2, O3, CO,
and NO2 varied from station to station. For example, in 1974, all
California stations measured total oxidant instead of ozone. A
few stations were mobile while most were stationary. Some were
urban and some were rural. Probe heights, for those stations in
which the probe height information was available, varied across
the Nation from ground level to 100 feet (Figure 8).
Although we sought stations which measured as many NAAQS
pollutants as possible, so few stations were available that
measured at least 5 of the 6 NAAQS pollutants, and so much varia-
tion occurred among them, that we decided to relax the site
selection criteria. Thus, we limited our objective to finding
all stations which measured just two air pollutants -- O3 and
CO -- at the same location. These pollutants represent both ends
of a spectrum with regard to chemical reactivity. Of the 245
stations that measured either O3 or CO in 1974, only 50 stations
could be found in SAROAD that measured both at the same location.
In our judgment, this small number of stations could not be con-
sidered representative of the U.S. population, and again it was
[Figure 8. Probe heights above ground of CO monitoring stations in the United States, 1974 (number of stations versus probe height).]
necessary to limit our objectives and relax the selection cri-
teria. Finally, we decided to select only one pollutant for our
analysis: carbon monoxide. Because CO is a primary air pollutant
and is relatively inert, it has highly favorable modeling charac-
teristics; other reasons for selecting CO are as follows:
• It has no "surrogates" (i.e., no other pollutants are
substituted for CO as an "indicator" of CO levels)
• Almost all air monitoring stations use the same
measurement method — nondispersive infrared
absorption (NDIR)
• The measurement method has known validity and reliability
• CO ordinarily is monitored continuously, giving a large
number of hourly observations per year at each station
• Considerable literature is available on CO sources,
sinks, behavior, and effects
• CO's transport properties, because of its relative non-
reactivity, are relatively easy to handle analytically
Once CO was selected as the pollutant of interest in this
investigation, two general groups of data sets were taken from
SAROAD:
1. Geographical Group - A number of stations in one year
that were widely distributed geographically
2. Longitudinal Group - A number of different years at
one station
The geographical group consisted of 11 stations, and the longi-
tudinal group consisted of 8 years at one station, Washington,
D.C.
GEOGRAPHICAL GROUP
To obtain a geographically diverse group of stations for
analysis, we inventoried all CO data sets contained in SAROAD for
which a full year (four quarters) of data was available. The
result was 167 stations distributed throughout the Nation. We
then ranked these stations by their arithmetic means, from
lowest to highest (Table 5). The annual arithmetic means ranged
from 0.2 mg/m3* in Yuba City, CA, to 18.6 mg/m3 in New York City,
NY. Although four quarters were represented in each data set,
there were sometimes many missing values due to calibration and
instrument malfunction, and the percentage of observations avail-
able for the year ranged from 75% to 100%.
Because the 166-station list was too large to carry out
detailed analyses of all stations, we sought to select, in a
systematic manner, a representative subset of the stations on the
list. We decided to use a stratified sample consisting of
the lowest station, the highest station, and the deciles in
between (that is, the 10th percentile, 20th percentile, and so on,
to the 90th percentile).** Because the mean values had been rounded
off to whole ppm numbers at the monitoring station, there were
many repeated values on the list, and each decile contained more
*
Examination of this station by the authors revealed that all
values were exactly the same - 0.2 mg/m3. EPA substitutes 0.2
mg/m3 for all observations in the data bank that are zero values
to reflect minimum detectable limits of the instrument. Thus,
this data set consisted of nothing but zero values, and it was
eliminated from consideration, giving the resulting 166 stations.
**
The percentiles (and deciles) are calculated from the ranked
data such that the Pth percentile is defined as the P(n+1)/100th
ordered observation. For example, the 80th percentile (8th decile)
is obtained by computing 80(166+1)/100 = 133.6, which then is
rounded to 134. Then the 134th observation is obtained from the
list, which is 4.3 mg/m3, the value for Springfield, MA.
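The footnote's rule, in executable form (a Python sketch; Python's built-in rounding stands in for the rounding step described above):

```python
def percentile_from_ranked(ranked, p):
    """The report's percentile rule: the Pth percentile is the observation
    at rank P*(n+1)/100, rounded to the nearest integer (1-based ranks)."""
    rank = round(p * (len(ranked) + 1) / 100.0)
    return ranked[rank - 1]
```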
TABLE 5
SUMMARY OF CARBON MONOXIDE MONITORING DATA FROM U.S. CITIES IN 1974*

                        PROBE    DATA     ANNUAL        STANDARD    MAXIMUM
                        HEIGHT   AVAIL.   ARITH. MEAN   DEVIATION   (mg/m3)
CITY, STATE             (feet)   (%)      (mg/m3)       (mg/m3)

YUBA CITY, CA              5      98         0.2          0.00         0.2
BARSTOW, CA               25      80         0.7          0.91         9.2
MILWAUKEE, WI             --      90         0.8          0.90        12.6
CHESAPEAKE, VA            --      86         0.9          0.74        17.8
WEST PALM BEACH, FL       12      77         0.9          1.22        12.1
MILWAUKEE, WI             --      93         1.2          1.32        14.3
MILWAUKEE, WI             --      84         1.2          1.48        31.5
RACINE, WI                15      96         1.3          1.20        25.0
KANSAS CITY, KS           15      99         1.3          0.99        13.0
CAMARILLO, CA             26      91         1.3          1.15        11.5
KANSAS CITY, MO           30      93         1.3          1.42        11.6
MINNEAPOLIS, MN           16      91         1.3          1.23        20.7
TOPEKA, KS                12      97         1.4          1.17        20.0
SEVEN CORNERS, VA          4      96         1.4          1.16        16.1
ROCK HILL, SC              8      76         1.4          2.76       202.0
SCRANTON, PA              12      80         1.5          1.25        20.0
DETROIT, MI               12      86         1.5          1.81        20.9
ALLEN PARK, MI            12      78         1.6          1.40        14.4
NORFOLK, VA                3      94         1.6          1.20        21.8
MEMPHIS, TN                9      97         1.7          1.68        19.5
BAYONNE, NJ               12      88         1.7          1.11        20.0
STOCKTON, CA              42      95         1.7          1.48        18.4
MONTEREY, CA              25      92         1.7          1.05        18.4
LANCASTER, CA             18      98         1.7          1.35        20.7
FRESNO, CA                85      95         1.7          1.43        18.4
MODESTO, CA               88      85         1.8          1.26        20.7
SACRAMENTO, CA            78      98         1.8          1.49        21.8
REDDING, CA               30      99         1.8          0.86        13.8
SAN LUIS OBISPO, CA       40      95         1.8          1.10        17.2
WICHITA, KS               20      94         1.9          2.50        37.0
PITTSFIELD, MA            20      95         1.9          1.36        16.1
SALEM, OR                 15      81         1.9          2.10        23.0
HAMPTON, VA                8      91         2.0          1.64        23.6
ALEXANDRIA, VA            30      98         2.0          5.33       461.4
KANSAS CITY, MO           20      92         2.0          1.97        29.9
CHICO, CA                 40      99         2.0          1.87        26.4
INDIO, CA                 40      98         2.1          1.46        19.5
ST PAUL, MN               32      91         2.1          1.49        18.4
DETROIT, MI               12      81         2.1          1.59        14.7
PENNS GROVE, NJ           15      90         2.1          1.99        24.3
VANCOUVER, WA             15      89         2.1          1.63        26.4

*Ranked in order of increasing arithmetic means.
TABLE 5 (Cont'd.)
SUMMARY OF CARBON MONOXIDE MONITORING DATA FROM U.S. CITIES IN 1974

                        PROBE    DATA     ANNUAL        STANDARD    MAXIMUM
                        HEIGHT   AVAIL.   ARITH. MEAN   DEVIATION   (mg/m3)
CITY, STATE             (feet)   (%)      (mg/m3)       (mg/m3)

ELIZABETH, NJ             12      89         2.2          1.88        26.2
ST PAUL, MN               13      95         2.2          2.29        28.7
CAMDEN, NJ                12      89         2.2          2.10        38.5
ROCHESTER, NY             --      79         2.2          1.87        23.5
LIVERMORE, CA              5      94         2.2          1.40        17.2
TUCSON, AZ                80      96         2.2         10.85       696.9
DENVER, CO                15      91         2.3          2.67        34.5
SANTA BARBARA, CA         40      96         2.3          2.51        31.0
SALINAS, CA               31      90         2.3          1.19        14.9
SCHENECTADY, NY           30      86         2.3          1.48        24.1
CAMDEN CO., NJ            45      83         2.3          1.13        12.5
NASHUA, NH                30      90         2.3          1.44        17.2
MANCHESTER, NH            35      91         2.4          1.57        18.4
RICHMOND, CA              20      97         2.4          1.36        12.6
FAIRBANKS, AK             10      89         2.4          3.07        35.6
SYRACUSE, NY              --      88         2.4          1.30        16.6
SEATTLE, WA               15      91         2.5          1.80        19.5
RICHMOND, VA              12      86         2.5          1.91        24.7
JACKSONVILLE, FL          11      79         2.5          2.31        30.5
WELBY, CO                 15      95         2.5          2.89        27.6
LAS VEGAS, NV             35      85         2.5          2.82        57.5
OMAHA, NB                 15      99         2.5          2.28        24.7
LAS CRUCES, NM            50      95         2.6          2.27        23.6
MINNEAPOLIS, MN           40      86         2.6          1.92        35.6
VISALIA, CA               26      97         2.6          1.34        12.6
BAKERSFIELD, CA           56      87         2.6          2.61        29.9
UPLAND, CA                25      97         2.7          1.92        23.0
SUNNYVALE, CA             38      95         2.7          1.88        23.0
PHILADELPHIA, PA          11      78         2.7          2.16        33.3
SYRACUSE, NY              14      89         2.8          1.30        13.2
PHILADELPHIA, PA          17      76         2.8          2.12        29.9
PITTSBURG, CA             25      99         2.8          1.21        13.8
FREMONT, CA               18      98         2.8          1.30        12.6
ANAHEIM, CA               15      94         2.8          2.03        19.5
SHIVLEY, KY               20      97         2.8          2.51        23.0
NEWPORT, KY               15      75         2.9          2.10        23.6
NEW YORK CITY, NY         --      78         2.9          1.31         9.2
NEW YORK CITY, NY         --      85         2.9          1.57        23.0
DES MOINES, IA            20      92         2.9          3.63        33.3
UTICA, NY                 --      80         2.9          1.56        21.8
TACOMA, WA                12      94         2.9          2.52        32.2
RENSSELAER, NY            15      92         3.0          1.17        12.4
TABLE 5 (Cont'd.)
SUMMARY OF CARBON MONOXIDE MONITORING DATA FROM U.S. CITIES IN 1974

                        PROBE    DATA     ANNUAL        STANDARD    MAXIMUM
                        HEIGHT   AVAIL.   ARITH. MEAN   DEVIATION   (mg/m3)
CITY, STATE             (feet)   (%)      (mg/m3)       (mg/m3)

NEW YORK CITY, NY         65      81         3.0          1.66        14.9
FONTANA, CA               20      90         3.1          1.78        17.2
NAPA, CA                  10      98         3.1          1.60        17.2
DENVER, CO                15      88         3.2          3.54        31.0
ARVADA, CO                15      86         3.2          3.42        58.6
SAN FRANCISCO, CA        104      97         3.2          1.95        18.4
TRENTON, NJ               15      88         3.2          2.12        24.6
MEMPHIS, TN                9      93         3.2          2.50        32.2
SAN RAFAEL, CA            26      99         3.3          1.81        20.7
ATLANTA, GA               10      95         3.3          2.14        32.1
REDWOOD CITY, CA          14     100         3.3          1.78        18.4
CONCORD, CA               21      99         3.3          1.50        20.7
LOS ANGELES, CA           25      99         3.3          3.69        40.2
COSTA MESA, CA            10      94         3.4          2.93        26.4
KANSAS CITY, KS           40      98         3.4          1.72        22.0
SAN BERNARDINO, CA        90      93         3.4          2.63        23.0
UPLAND, CA                30     100         3.4          1.99        13.8
PHILLIPSBURG, NJ          18      79         3.4          1.54        18.2
NEW YORK CITY, NY         --      94         3.4          1.60        14.9
CAMBRIDGE, MA             15      90         3.4          2.47        33.3
AZUSA, CA                  7      96         3.4          1.73        17.2
PHOENIX, AZ               30      94         3.4          4.67        70.1
BURLINGTON, CA            36     100         3.5          1.79        18.4
SAN JOSE, CA              28      90         3.5          2.57        26.4
LAUDERDALE LAKES, FL      24      92         3.5          1.83        27.6
GREENWICH, CT             45      95         3.6          3.21        46.0
VALLEJO, CA               20      96         3.6          2.11        24.1
MAMARONECK, NY            35      93         3.6          2.12        27.3
BUFFALO, NY                5      92         3.6          1.78        27.6
SOMERVILLE, NJ            15      88         3.7          3.32        32.3
NIAGARA FALLS, NY         12      83         3.7          1.61        22.0
SANTA ROSA, CA            24      99         3.7          1.58        14.9
CHINO, CA                 18      98         3.7          2.86        40.2
NORCO, CA                 15      89         3.7          1.52        11.5
NEWHALL, CA               18      98         3.8          1.95        16.1
RUBIDOUX, CA              25      83         3.8          2.35        16.1
WHITTIER, CA              15      97         3.8          3.31        29.9
TOMS RIVER, NJ            15      88         3.8          3.25        26.6
OWENSBORO, KY             12      92         3.8          2.27        30.5
HACKENSACK, NJ            15      81         3.9          2.97        30.3
PAULSBORO, NJ             15      88         3.9          2.53        61.5
NEW YORK CITY, NY         --      84         3.9          2.27        20.7
TABLE 5 (Cont'd.)
SUMMARY OF CARBON MONOXIDE MONITORING DATA FROM U.S. CITIES IN 1974

                        PROBE    DATA     ANNUAL        STANDARD    MAXIMUM
                        HEIGHT   AVAIL.   ARITH. MEAN   DEVIATION   (mg/m3)
CITY, STATE             (feet)   (%)      (mg/m3)       (mg/m3)

ASBURY PARK, NJ           14      87         3.9          1.88        28.1
POMONA, CA                21     100         3.9          2.25        19.5
LONG BEACH, CA            21      98         3.9          3.00        27.6
LOS ANGELES CO, CA        24      98         4.1          3.80        35.6
NEW YORK CITY, NY         --      89         4.1          1.96        21.8
BEDFORD, MA               15      80         4.1          2.76        29.9
SEATTLE, WA               12      85         4.1          2.75        28.7
PATTERSON, NJ             15      89         4.2          3.55        40.5
PERTH AMBOY, NJ           15      84         4.2          3.25        37.9
SPRINGFIELD, MA           15      95         4.3          2.97        34.5
PORTLAND, OR              18      98         4.3          3.73        27.6
PORTLAND, OR               7      90         4.4          3.89        47.3
CAMDEN, NJ                15      90         4.4          2.20        27.5
LOUISVILLE, KY            35      99         4.5          4.01       120.9
FREEHOLD, NJ              15      89         4.6          3.87        37.2
HEMPSTEAD, NY             15      94         4.6          2.28        30.6
NEWARK, NJ                12      89         4.7          3.33        35.5
WORCHESTER, MA            13      98         4.7          2.82        25.3
BURLINGTON, NJ            15      89         4.7          3.09        28.6
NEW YORK CITY, NY         --      89         4.8          1.97        21.8
RIVERSIDE, CA             12      89         4.8          3.08        24.1
DENVER, CO                30      96         5.0          4.19        67.8
SEATTLE, WA               15      91         5.0          4.48        41.4
PORTLAND, OR              14      85         5.1          3.79        27.5
PASADENA, CA              18      99         5.1          3.99        37.9
LA HABRA, CA               6      97         5.1          3.92        37.9
SAGINAW, MI               10      86         5.2         13.51       276.6
FAIRBANKS, AK             10      78         5.3          4.54        37.4
LOS ANGELES, CA           72      98         5.5          4.19        34.5
ELIZABETH, NJ             15      88         5.6          4.73        45.9
JERSEY CITY, NJ           15      89         5.6          4.64        32.8
LYNWOOD, CA               22      98         5.9          5.23        46.0
DENVER, CO                 9      75         6.2          5.24        80.5
LENNOX, CA                20      99         6.3          6.16        52.9
RENO, NV                   9      82         6.3         34.05       810.7
MORRISTOWN, NJ            20      88         6.4          5.89        50.5
BURBANK, CA               15      99         6.9          4.22        42.5
BOSTON, MA                10      76         7.0          3.33        36.8
CHICAGO, IL               10      90         7.4          3.67        31.0
NEW YORK CITY, NY          5      88        10.7          5.01        51.7
NEW YORK CITY, NY          5      99        14.6          6.12        49.4
NEW YORK CITY, NY         --      95        18.6          7.90        65.5
than one station (Table 6). For example, at the sixth decile
(60th percentile), there were 9 stations with the same arithmetic
mean of 3.4 mg/m3. It was appropriate, in our sampling plan,
to select any of the stations that appeared within a particular
decile, because they all had the same arithmetic means. This
abundance of stations was desirable, because it permitted us to
implement a second station selection criterion.
The second criterion sought to emphasize geographical
diversity. It was implemented by arranging the 166 data sets
in a different manner (Table 7). The left side of Table 7 shows
the data sets for four major Air Quality Control Regions (AQCR's):
Los Angeles, the New Jersey-New York-Connecticut area, San Fran-
cisco, and Philadelphia; the right side shows the remaining
stations, listed alphabetically by state. A total of 73 of the
166 stations, or 44%, were in these four AQCR's, with 25 stations,
or 15%, in the Los Angeles AQCR, and another 25 stations in the
New Jersey-New York-Connecticut AQCR. With so great a proportion
of the stations concentrated in just four AQCR's, it was apparent
that the 166 data sets were not necessarily representative of the
distribution of the U.S. population. Notice, for example, that
the Chicago AQCR, which contains a substantial U.S. urban popula-
tion, does not have a station among the 166 data sets and there-
fore is not represented on the original list at all.
In selecting the final group of 11 data sets, we decided
to include at least one data set from each of the four AQCR's at
the left side of Table 7. These four AQCR's (Los Angeles, New
Jersey-New York-Connecticut, San Francisco, and Philadelphia)
TABLE 6
CO AIR MONITORING STATIONS IN 1974 AT EACH DECILE

Percentile                       Arithmetic   Data        Probe    Coefficient
(%)         City and State       Mean         Available   Height   of           Station
                                 (mg/m3)      (%)         (feet)   Variation    Type*

Lowest      Barstow, CA             0.7         80          25       1.30         CC
10          Allen Park, MI          1.6         78          12       0.88         SM
            Norfolk, VA             1.6         94           3       0.75         CR
20          Hampton, VA             2.0         91           8       0.82         SI
            Alexandria, VA          2.0         98          30       2.67         CC
            Kansas City, MO         2.0         92          20       0.98         SC
            Chico, CA               2.0         99          40       0.94         CR
30          Denver, CO              2.3         91          15       1.16         SR
            Santa Barbara, CA       2.3         96          40       1.09         CC
            Salinas, CA             2.3         90          31       0.52         SC
            Schenectady, NY         2.3         86          30       0.64         CR
            Camden Co., NJ          2.3         83          45       0.49         RA
            Nashua, NH              2.3         90          30       0.63         CC
40          Upland, CA              2.7         97          25       0.71         CC
            Sunnyvale, CA           2.7         95          38       0.70         CC
            Philadelphia, PA        2.7         78          11       0.80         CM
50          Rensselaer, NY          3.0         92          15       0.39         SC
            New York City, NY       3.0         81          65       0.55         RI
            Fontana, CA             3.1         90          20       0.57         CC
            Napa, CA                3.1         98          10       0.52         SR
60          Costa Mesa, CA          3.4         94          10       0.86         CC
            Kansas City, KS         3.4         98          40       0.51         CC
            San Bernardino, CA      3.4         93          90       0.77         CC
            Upland, CA              3.4        100          30       0.59         CC
            Phillipsburg, NJ        3.4         79          18       0.45         CI
            Cambridge, MA           3.4         90          15       0.73         SI
            Azusa, CA               3.4         96           7       0.51         CC
            Phoenix, AZ             3.4         94          30       1.37         SC
70          Newhall, CA             3.8         98          18       0.51         RC
            Rubidoux, CA            3.8         83          25       0.62         CC
            Whittier, CA            3.8         97          15       0.86         CC
            Toms River, NJ          3.8         88          15       0.86         CC
            Owensboro, KY           3.8         92          12       0.60         CI
80          Springfield, MA         4.3         95          15       0.69         CC
            Portland, OR            4.3         98          18       0.87         CC
90          Portland, OR            5.1         85          14       0.74         CC
            Pasadena, CA            5.1         99          18       0.78         SR
            La Habra, CA            5.1         97           6       0.77         --
Highest     New York City, NY      18.6         95          --       0.42         --

*First letter: C-Center City, S-Suburban, R-Rural; Second letter:
C-Commercial, I-Industrial, M-Mobile, R-Residential
Table 7
CO AIR QUALITY DATA SETS IN 1974, GROUPED BY AIR QUALITY CONTROL REGION (AQCR)

[For each of the 166 stations the table gives city, state, probe height (feet),
percent of data, arithmetic mean (mg/m³), standard deviation (mg/m³), and
maximum (mg/m³). The left side groups the 73 stations in the four major AQCR's:
Los Angeles (25 stations), New Jersey-New York-Connecticut (25), San Francisco
Bay Area (14), and Metropolitan Philadelphia (9); the right side lists the
remaining stations alphabetically by state. The individual rows could not be
realigned from this scan. Maxima discussed in the text include Alexandria, VA
(461.4 mg/m³), Reno, NV (810.7), Tucson, AZ (696.9), Saginaw, MI (276.6), and
Rock Hill, SC (202.0).]
106
-------
represent about 60 million people, or 30% of the U.S. population.
It was decided to distribute the remaining 7 stations among the
smaller AQCR's. The final listing of the 11 data sets (Table 8)
therefore reflects the use of three selection criteria:
• The percentage of observations is as high as possible
(that is, as high above the 75% minimum as possible).
• The stations are a stratified sample (stratified by
arithmetic mean) from the universe of 166 CO stations
in SAROAD in 1974.
• The stations are as geographically diverse as possible, to
avoid clustering in any one AQCR.
By emphasizing geographical diversity, we felt that the group
of data sets would better reflect the variety of meteorological
conditions and source emissions encountered in different parts
of the U.S. than if the stations were concentrated in a few
AQCR's.
Careful examination of the original list of 166 stations
(Table 5) reveals many peculiarities and raises serious quality
control questions. For example, Alexandria, VA, one of the data
sets included in the geographical data group, has a reported max-
imum of 461.4 mg/m³ (hourly average). However, measurements of
urban air quality by NDIR seldom give values above 80 mg/m³ and
almost never give values above 110 mg/m³. Thus, we were sus-
picious about this value, and subsequent discussions with air
monitoring field personnel revealed that it was an error, probably
caused by keypunching and transfer of the data from the local air
pollution control agencies to EPA. Similar problems apparently
107
-------
TABLE 8
[Final listing of the 11 CO data sets selected for the geographical data
group; the table itself is not recoverable from this scan.]
occurred in the data set from Reno, NV, which contained a maximum
value of 810.7 mg/m³, and the data set for Tucson, AZ, which con-
tained a maximum value of 696.9 mg/m³. Unusual and doubtful max-
ima also appeared in the data sets for Saginaw, MI (276.6 mg/m³)
and Rock Hill, SC (202.0 mg/m³). Unfortunately, for the lower
maxima it is difficult to determine with certainty whether the
value is a valid outlier or a keypunch error.
Overall, the existence of these peculiar and erroneous
values casts doubt on all the values comprising the data sets,
and it is evident that EPA should undertake a quality assurance
data checking program to compare the observations stored in the
data bank with those originally submitted by state and local air
pollution control agencies. One simple test, which could be
carried out at relatively low cost, would be to compare the
minimum, maximum, arithmetic mean, and arithmetic standard devia-
tion for the data set with the same statistics computed by the
agency submitting the data. Then inconsistencies could be
resolved by changing any incorrect hourly values on an observa-
tion-by-observation basis. Hopefully, the offices within EPA
that maintain the SAROAD data bank have taken these steps since
the preparation of this study. For our purposes, the erroneous
maximum of 461.4 mg/m³ was handled by substituting a missing
value code, thereby treating it as a missing value and giving
8606 observations instead of 8607 observations for Alexandria.
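The cross-check proposed above can be sketched as follows; this is an illustration, not an EPA program, and the field names and the 5% tolerance are our assumptions.

```python
import math

def summary_stats(values):
    """Minimum, maximum, arithmetic mean, and standard deviation."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return {"min": min(values), "max": max(values),
            "mean": mean, "sd": math.sqrt(var)}

def consistent(bank_values, agency_stats, tol=0.05):
    """Compare statistics recomputed from the data bank's hourly values
    against those reported by the submitting agency; a large disagreement
    flags the data set for observation-by-observation review."""
    ours = summary_stats(bank_values)
    return all(abs(ours[k] - agency_stats[k]) <= tol * max(1.0, abs(agency_stats[k]))
               for k in ("min", "max", "mean", "sd"))
```

A keypunched outlier such as the spurious 461.4 mg/m³ maximum would fail the maximum (and standard deviation) comparison immediately.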
LONGITUDINAL GROUP
The longitudinal data group consisted of eight years of
109
-------
observations at the Continuous Air Monitoring Project (CAMP)
station in Washington, DC. This group covered the time period
1966-1974, but data were not available for one year, 1967 (see
Table 30 in Chapter IX, page 173). In years 1972 and 1973,
the data sets were not complete and contained only 4947 and
4269 observations, respectively. In 1974, only the first quarter
of data was available, and the data set consisted of only 1848
observations. Because these observations were not randomly dis-
tributed throughout the year, the mean of these data is not a
valid estimate of the annual mean, and the procedure described
by Hunt for computing the precision of the calculated mean
cannot be used. Consequently, the 1974 data set was not included
in the analysis, and the results for the 1972 and 1973 data sets
are not really comparable with the results for the other data
sets in this study.
110
-------
VIII. RESULTS FOR THE GEOGRAPHICAL DATA GROUP
Previous chapters have discussed the theory behind the
methodology used in this investigation, details of the methodol-
ogy itself, the computer program developed to incorporate this
methodology, and the selection of the data used in this investi-
gation. This chapter presents the resulting analyses of the 11
CO hourly data sets which comprise the geographical data group.
As indicated in Chapter VII (pages 98-109), these data sets were
selected to represent a geographically diverse cross section of
U.S. CO data sets for a given year, 1974. In this chapter, we
first discuss the results for each of the 11 cities; then the
results for all cities grouped together are considered.
FINDINGS FOR INDIVIDUAL CITIES
The 11 cities comprising the geographical data group con-
sisted of stations with the lowest arithmetic mean, highest
arithmetic mean, and values at each decile selected from the
166-station list in the manner described in the previous chap-
ter (see Table 8). The basic statistics (Table 9) indicate that
the arithmetic mean ranges from 0.442 ppm to 16.207 ppm, and the
arithmetic standard deviation ranges from 0.859 ppm to 6.869 ppm.
The coefficient of skewness, √b₁, ranges from 0.805 to 4.977, and
the coefficient of kurtosis, b₂, ranges from 4.17 to 51.47. The
following sections discuss the results for each city in the order
of their arithmetic means, from lowest (Barstow) to highest (New
111
-------
TABLE 9
[Basic statistics for the 11 CO data sets of the geographical data group;
the table itself is not recoverable from this scan.]
112
-------
York City).
Barstow, CA
When the cumulative frequencies of the observations for
Barstow are plotted on logarithmic probability paper (Figure 9),
a distinct downward curvature is evident. That is, the data
points (plotted as small circles) describe the locus of a curve
that is concave downward. The data set for Barstow is somewhat
unusual in that over 70% (actually, 70.08%; see Appendix C) of the
observations are zero. All the zero values have been grouped
into the first interval and plotted at the point whose coordi-
nates are 70.08% (plotting position of 70.07%) and 0.5 ppm.
Because the LN2 probability model appears on logarithmic
probability paper as a straight line (see Chapter III), each of
the four parameter calculation techniques gives a straight line
in Figure 9. Each line differs in slope and vertical intercept.
Because none of the straight lines passes through all the points,
none of the parameter calculation techniques gives a line that
unequivocally describes the observations for Barstow. The LN2
lines obtained by the MLE optimization technique and the method
of moments both give serious overpredictions at the higher con-
centrations. For example, at the concentration of 7.5 ppm (inter-
val between 6.5 ppm and 7.5 ppm; plotting position of 99.957%),
the model with parameters calculated by MLE optimization pre-
dicts 20.5 ppm, an overestimation of 273%, and the model with
parameters calculated by the method of moments predicts 19 ppm,
an overestimation of 253%.
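For ungrouped observations, two of the four parameter calculation techniques can be written down directly. The sketch below is only an illustrative analogue: the report's MLE approximation and MLE optimization operate on grouped (interval) frequencies, which this fragment does not reproduce.

```python
import math

def ln2_moments(x):
    """Method of moments: match the sample's arithmetic mean and variance
    to those of a two-parameter lognormal (LN2)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    sigma2 = math.log(1.0 + var / mean ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    # Report the median m = exp(mu) and geometric standard deviation s_g.
    return math.exp(mu), math.exp(math.sqrt(sigma2))

def ln2_mle(x):
    """Ungrouped maximum likelihood: mu and sigma are the mean and
    standard deviation of ln(x); requires strictly positive values."""
    logs = [math.log(v) for v in x]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((l - mu) ** 2 for l in logs) / n)
    return math.exp(mu), math.exp(sigma)
```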
113
-------
FIGURE 9. LOGARITHMIC PROBABILITY PLOT OF CO HOURLY OBSERVATIONS FOR BARSTOW, CA, 1974 (n = 6992).
(Plotted lines: method of moments, method of fractiles, MLE approximation, MLE optimization.)
-------
When the results for the two sets of tests for goodness-of-
fit (the frequency-based tests and the variate-based tests)
are compared, a great contrast is observed. The frequency-based
measures indicate that the best fit is obtained with parameters
estimated by MLE optimization, while the variate-based Engineer-
ing Error Test (EET) does not show that any one method is clearly
superior.
Table 10
Frequency-based Measures of Goodness-of-fit for Barstow, CA

Method        m̂       s_g     d.f.   RMS     χ²      K-S max   L*
Moments       0.202   3.495    6     181     228     0.065     -6272
Fractiles     0.259   2.765    4     120     370     0.040     -6299
MLE Approx.   0.409   2.210    4     439     912     0.101     -6551
MLE Opt.      0.271   3.296    7      27¹     36¹    0.005¹    -6188¹

¹Indicates best result in each column.
For the frequency-based measures (Table 10), the MLE opti-
mization technique gives the best results for all tests. The
first two columns list the median (m̂) and the geometric standard
deviation (s_g), which correspond to the 50th percentile and the
slope, respectively, of the straight lines plotted in Figure 9.
The third column lists the number of degrees of freedom (d.f.),
and the last four columns list the goodness-of-fit tests, begin-
ning with the root-mean-square (RMS) of the difference between
the number of observations predicted and observed in each inter-
val, chi-square (χ²), the maximum difference between the CDF and
115
-------
the observed cumulative frequency (K-S max), and the log-likeli-
hood function (L*). In each column, a footnote indicates the
"best" measure of goodness-of-fit.
Table 11
Variate-based (EET) Measures of Goodness-of-fit for Barstow, CA (ppm)

Method        n      d̄        s_d      d_min    d_max
Moments       7      0.546¹   1.787    -0.56     4.41
Fractiles     7     -0.695    0.581¹   -1.32     0.13
MLE Approx.   7     -0.991    0.713    -1.79     0.12¹
MLE Opt.      7      1.158    2.127    -0.15¹    5.67

¹Indicates best result in each column.
Despite the fact that MLE optimization causes the model to
overestimate concentrations at the higher end of the range, the
statistical, or frequency-based, tests of goodness-of-fit indi-
cate that it gives the best fit and greatly surpasses the fit of
all the other methods of calculating the LN2 model's parameters
for the Barstow data set. By contrast, the results for the EET
measures are not so clearcut. Table 11 shows the average dif-
ference (d̄) between the predicted and observed concentrations,
the standard deviation (s_d) of the differences, the minimum dif-
ference (d_min), and the maximum difference (d_max). The smallest
values indicate the best fit. Both the method of moments and the
MLE optimization approach cause the model to overpredict CO con-
centrations, with MLE optimization causing the largest over-
prediction. One reason for the overprediction is the tendency
116
-------
for the locus of points which comprises the Barstow data to show
downward curvature. The method of fractiles gives the smallest
standard deviation of differences, largely because it passes
directly through the areas where curvature is greatest. It fits
the data at the 99.9% cumulative frequency which, for Barstow,
corresponds to a relatively high concentration level (6.5 ppm).
By contrast, the lines for MLE optimization and the method of
moments pass well above the data points at these higher levels
(Figure 9), largely because their slopes are determined by the
bulk of the data at the lower concentrations.
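The four EET summary statistics can be computed directly from the predicted and observed concentrations. A minimal sketch, assuming the difference is taken as predicted minus observed (so overprediction gives a positive d):

```python
import math

def eet_measures(predicted, observed):
    """Mean difference, standard deviation of the differences, and the
    minimum and maximum differences, all in the variate's units (ppm)."""
    d = [p - o for p, o in zip(predicted, observed)]
    n = len(d)
    dbar = sum(d) / n
    sd = math.sqrt(sum((v - dbar) ** 2 for v in d) / (n - 1))
    return dbar, sd, min(d), max(d)
```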
Overall, we can conclude that much of the difference
among the four parameter calculation techniques for Barstow
can be attributed to our difficulty in drawing straight lines
through points that are better described by a curve. That is,
the poor goodness-of-fit that is evident for Barstow may be
attributed largely to inadequacies of the LN2 model, which plots
as a straight line, when applied to data that show downward
curvature. This result suggests that the LN3 or LN4 models,
which are covered in other reports,2,45-46 may be more suitable
for this application.
The results for Barstow also illustrate a basic difference
in the parameter calculation techniques. The method of MLE opti-
mization selects model parameters in a manner that weights heavily
those intervals which contain a large number of observations. By
contrast, the method of fractiles, as suggested by Larsen, weights
the higher concentrations at the tails more heavily. Because the
tails are likely to exhibit considerable random fluctuation, the
-------
method of fractiles appears to fit a particular data set rela-
tively well, when, in fact, it may be responding more to random-
ness in the data than to underlying distributional properties.
If the underlying distribution generating the data is LN2 in form,
parameters calculated by the method of fractiles are likely to
have the greatest variance of the four parameter calculation
techniques, and hence the method of fractiles is a poor approach
for estimating parameters of the underlying distribution. It
fits the idiosyncrasies at the tails of each data set quite well,
reflecting the randomness contained in a given realization, but
it does not respond to the underlying process generating the data
as well as the other approaches.
-------
FIGURE 10. LOGARITHMIC PROBABILITY PLOT OF CO HOURLY OBSERVATIONS FOR NORFOLK, VA, 1974 (n = 8241).
(Plotted lines: method of moments, method of fractiles, MLE approximation, MLE optimization.)
119
-------
describe a slight zig-zag pattern with reversals of curvature.
The four straight lines representing the models are closer to
each other than they were for Barstow, and the lines for the
MLE approximation and the method of moments are quite close
together. In contrast with the Barstow results, these two lines,
as well as the one for MLE optimization, cause the model to
underestimate CO concentrations at the higher levels. As discus-
sed above, the method of fractiles, by design, gives a line that
lies very close to the observed values at the higher concentra-
tions (in this case, 10 to 20 ppm). However, it tends to over-
predict concentrations in the range from 2 to 10 ppm for this
data set.
When we examine the frequency-based measures of goodness-
of-fit for Norfolk (Table 12), we find that the MLE optimization
approach gives the best result in three of the four measures of
goodness-of-fit, while the method of fractiles gives the best
Table 12
Frequency-based Measures of Goodness-of-fit for Norfolk, VA

Method        m̂       s_g     d.f.   RMS     χ²       K-S max   L*
Moments       1.091   1.978    8     682     2706     0.194     -11230
Fractiles     1.364   2.074   10     484¹    1953     0.126     -11063
MLE Approx.   1.155   1.886    7     681     2183     0.174     -11020
MLE Opt.      1.347   1.797    7     525     1465¹    0.086¹    -10785¹

¹Indicates best result in each column.
120
-------
result (i.e., the lowest value) in the root-mean-square (RMS)
measure of goodness-of-fit. Unfortunately, the RMS values cannot
be compared in this situation, because the RMS value is very sen-
sitive to the number of intervals used in the computation, which
varies from method to method, as reflected by the differing num-
ber of degrees of freedom. The difference between the predicted
and observed number of observations, (n_p − n_o)_j for the jth inter-
val, appears in the numerator of both the RMS and chi-square
computations (see Chapter V). For the method of fractiles, the
sum of the numerators for all intervals gives 3,047,590; for the
method of MLE optimization, the sum of the numerators gives
2,752,804. Thus, if the sum of the squares of the differences
between the predicted and observed numbers of values is used as
a measure of goodness-of-fit instead of the RMS values, the MLE
optimization approach again gives the best results. As in the
case of Barstow, the frequency-based measures indicate that the
MLE optimization approach gives the best performance for Norfolk.
The results for the variate-based EET measures of goodness-
of-fit present a very different picture (Table 13). In three of
the four categories, the method of fractiles gives the best per-
formance. On the average, all approaches cause the model to
underpredict CO concentrations, with the greatest underpre-
diction occurring for the MLE approximation. In Figure 10, it
can be seen that the MLE optimization approach fits the lowest
five concentrations best, since these values account for the
bulk of the observations. By comparison, the line for the
121
-------
Table 13
Variate-based (EET) Measures of Goodness-of-fit for Norfolk, VA (ppm)

Method        n       d̄        s_d      d_min    d_max
Moments       14     -1.412    1.585    -4.45     0.38
Fractiles     14      0.896¹   0.770¹   -0.42¹    1.87
MLE Approx.   14     -1.781    1.863    -5.25     0.35¹
MLE Opt.      14     -1.625    1.929    -5.22     0.60

¹Indicates best result in each column.
method of fractiles is closest to the two lowest concentrations
(0.5 ppm and 1.5 ppm) and also is closest to the highest 10 con-
centrations, which produce the large differences (d_min) for the
other three approaches. As for Barstow, the LN2 model does not
appear to fit the Norfolk data set very well for any of the
parameter calculation approaches, and the LN2 model may not be
a suitable representation of the underlying distribution.
Alexandria, VA
The CO data set for Alexandria, when plotted on logarithmic-
probability paper (Figure 11), resembles the data for Norfolk more
than it does the data for Barstow. The zig-zag pattern of the data
points is similar to what one might expect as a result of sampling
from a lognormal distribution. The lines obtained by the method
of moments and the method of fractiles both fit the observations
quite well at the higher concentrations. However, the method of
moments line diverges from the data points at the lowest concentra-
122
-------
FIGURE 11. LOGARITHMIC PROBABILITY PLOT OF CO HOURLY OBSERVATIONS FOR ALEXANDRIA, VA, 1974 (n = 8606).
(Plotted lines: method of moments, method of fractiles, MLE approximation, MLE optimization.)
123
-------
Table 14
Frequency-based Measures of Goodness-of-fit for Alexandria, VA

Method        m̂       s_g     d.f.   RMS     χ²      K-S max   L*
Moments       1.133   2.362   12     306     872     0.114     -13265
Fractiles     1.308   2.249   12     221     467     0.061     -12986
MLE Approx.   1.184   2.195   11     215     628     0.080     -13072
MLE Opt.      1.387   2.055   10     165¹    338¹    0.038¹    -12879¹

¹Indicates best result in each column.
tions, while the method of fractiles line appears to fit reasonably
well at the lower concentrations. We would expect a good fit to
exhibit both positive and negative deviations about the line. By
this criterion, the MLE approximation fails completely because it
underestimates every data point. Inspection of the MLE optimiza-
tion line suggests that it gives a relatively good overall fit,
Table 15
Variate-based (EET) Measures of Goodness-of-fit for Alexandria, VA (ppm)

Method        n       d̄         s_d      d_min    d_max
Moments       18     -0.016¹    0.660    -1.01     1.45
Fractiles     18      0.103     0.507¹   -0.72¹    1.33
MLE Approx.   18     -1.271     0.740    -2.40    -0.11
MLE Opt.      18     -1.384     1.039    -3.12    +0.13¹

¹Indicates best result in each column.
124
-------
showing a good relationship to the data at the middle range and
slight underestimation of the higher concentrations.
The results for the frequency-based measures of goodness-
of-fit (Table 14) bear out these findings. The MLE optimization
approach gives the best fit by all goodness-of-fit measures. The
method of fractiles gives the second-best fit, but it is well
behind MLE optimization. The results for the variate-based
goodness-of-fit tests (Table 15) present a different picture. For
the method of moments, the average deviation (d̄) is nearly zero,
and the method of fractiles also gives an average deviation that
is close to zero. Because the method of fractiles also has the
smallest standard deviation (s_d) and the smallest range (d_max −
d_min), the variate-based measures indicate that it gives the best
fit. The MLE approximation technique always underpredicts the
concentration levels; its maximum deviation (d_max = -0.11 ppm)
is still negative, so the fit is relatively poor, as was noted
in the discussion of Figure 11. According to the variate-based
goodness-of-fit measures, the MLE optimization approach gives the
poorest fit; it has the lowest average deviation, the largest
standard deviation, the most undershoot (largest negative d_min),
and the largest range. Thus, the frequency-based measures sug-
gest that the best fit is obtained by the MLE optimization
approach, while the variate-based measures suggest that the best
fit is obtained by the method of fractiles. One way to resolve
this paradox is to recall that the two sets of goodness-of-fit
measures are designed to test for different things. The frequency-
based measures are for the purpose of testing whether the hypothe-
125
-------
sis that the underlying distribution is lognormal can be rejec-
ted. The variate-based measures are for determining how well the
model represents a particular data set, or realization, in prac-
tical terms: the difference between the predicted and observed
concentration at each interval. If a hypothesis test for log-
normality were applied to the frequency-based measures, the log-
normal assumption would be rejected with a high degree of cer-
tainty. For example, to reject the lognormal hypothesis at a
confidence level of P = 0.995 with 10 degrees of freedom, the
chi-square value must exceed χ² = 25.2. For the Alexandria data,
the MLE optimization approach gives χ² = 338, indicating that the
underlying distribution is not lognormal. Of course, statistical
independence does not necessarily apply to these data, thereby
invalidating the hypothesis testing procedure.
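The rejection rule above can be checked numerically. The sketch below reproduces the quoted critical value of 25.2 with the Wilson-Hilferty approximation to the chi-square quantile; the hard-coded normal quantile for P = 0.995 is our addition, not from the report.

```python
import math

def chi2_critical(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile."""
    z = {0.995: 2.5758}[p]          # standard-normal quantile for p
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * math.sqrt(a)) ** 3

def reject_lognormal(chi_square, df, p=0.995):
    """Reject lognormality when the computed statistic exceeds the
    critical value at confidence level p."""
    return chi_square > chi2_critical(p, df)
```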
Denver, CO
The observations for Denver (Figure 12) exhibit a pattern
of downward curvature that is similar to that for Barstow. As
with the Barstow data, the two MLE techniques (MLE optimization
and MLE approximation) fit the observations reasonably well for
cumulative frequencies below 90%, but they cause serious overpre-
dictions for the higher levels, sometimes overpredicting the con-
centrations by a factor of two or more. The lines for the method
of moments and the method of fractiles, in contrast, fit the data
points quite well at the higher concentrations. Here, the two
lines lie relatively close together, which differs from the
Barstow results, where the method of moments line differed in
126
-------
FIGURE 12. LOGARITHMIC PROBABILITY PLOT OF CO HOURLY OBSERVATIONS FOR DENVER, CO, 1974 (n = 7952).
127
-------
Table 16
Frequency-based Measures of Goodness-of-fit for Denver, CO

Method        m̂       s_g     d.f.   RMS     χ²      K-S max   L*
Moments       1.204   2.617   15     310     770     0.088     -15013
Fractiles     1.237   2.500   14     356     1036    0.107     -15127
MLE Approx.   1.152   2.969   19     207     475     0.055¹    -14875
MLE Opt.      1.110   3.086   19     190¹    460¹    0.064     -14866¹

¹Indicates best result in each column.
slope from the method of fractiles line, causing overpredictions
at the higher concentration levels. According to the frequency-
based measures of goodness-of-fit (Table 16), the MLE optimization
approach gave the best fit for three of the measures, but the
Kolmogorov-Smirnov measure (K-S max) indicated that the MLE
approximation gave the best fit. The MLE approaches tend to
Table 17
Variate-based (EET) Measures of Goodness-of-fit for Denver, CO
(ppm)

Method        n     d̄        sd      dmin     dmax
Moments      24    0.680    1.376   -0.43     4.99
Fractiles    24   -0.472¹   0.627¹  -1.18     1.68¹
MLE Approx.  24    5.041    5.139   -0.21¹   17.75
MLE Opt.     24    6.250    6.319   -0.25    21.65

¹Indicates best result in each column.
128
-------
give good fits according to the frequency-based measures, because
the MLE approaches weight more heavily those intervals which con-
tain large numbers of observations, and the frequency-based mea-
sures are sensitive to the number of observations in each inter-
val. Because most of the observations lie in intervals in the
middle concentration ranges, the MLE approaches tend to fit the
middle ranges better than the high ranges. When the logarithmic-
probability plot of the observations shows downward curvature,
as in the case of Barstow and Denver, the MLE optimization
approach causes the model to overpredict at the higher concentra-
tions. In Denver, for example, the MLE optimization approach
causes serious overpredictions at concentrations above 6 ppm,
which corresponds approximately to a cumulative frequency of
95%, but the long tail of the distribution for concentrations
above this level contains only 5% of the observations. Because
the method of fractiles does not respond to the number of obser-
vations in each interval, it fits the tail quite well, and the
variate-based measures of goodness-of-fit, which also are not
responsive to the number of observations in each interval, indi-
cate that the method of fractiles gives the best fit (Table 17).
Although the method of fractiles, on the average, causes a slight
underestimation (d̄ = -0.472 ppm), the line fits the observations
between -1.18 ppm and +1.68 ppm over the entire range of concen-
trations, which is far superior to the other methods. As in the
previous data sets, the frequency-based measures of goodness-of-
fit indicate that the MLE approaches give the best fit, while the
variate-based measures indicate that the method of fractiles gives
129
-------
the best fit. The analyst might conclude that the LN2 model,
which plots as a straight line, may not be appropriate for repre-
senting the curved locus of observations obtained for Barstow and
Denver.
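The variate-based (EET) statistics tabulated throughout this section (d̄, sd, dmin, dmax) summarize the per-interval differences between predicted and observed concentrations. A sketch with illustrative values, assuming sd is the sample standard deviation of the differences:

```python
import math

def eet_measures(predicted, observed):
    """Variate-based measures: differences d_i = predicted - observed (ppm)
    at each interval; returns (d_bar, s_d, d_min, d_max)."""
    d = [p - o for p, o in zip(predicted, observed)]
    d_bar = sum(d) / len(d)
    s_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (len(d) - 1))
    return d_bar, s_d, min(d), max(d)

# Illustrative predicted and observed concentrations (ppm) at six intervals
# (hypothetical numbers, not the Denver data):
predicted = [1.2, 2.1, 3.4, 5.0, 7.9, 12.3]
observed  = [1.5, 2.0, 3.5, 4.8, 7.0, 10.1]
d_bar, s_d, d_min, d_max = eet_measures(predicted, observed)
print(round(d_bar, 2), round(d_min, 2), round(d_max, 2))
```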
Philadelphia, PA
The observations for Philadelphia (Figure 13) exhibit
several reversals of curvature that appear similar to the obser-
vations for Norfolk (Figure 10). However, the Norfolk data tend
to fluctuate around a curve that is concave upward, while the
Philadelphia data exhibit the more desirable "snake on a stick"
pattern of variation about a line. Because of this tendency
of the observations to follow a straight line, the frequency-based
measures of goodness-of-fit (RMS, χ², and K-S max) in Table 18
all are significantly lower than the corresponding measures for
Norfolk given in Table 12. Because the lines are relatively
Table 18
Frequency-based Measures of Goodness-of-fit for Philadelphia, PA

Method        m̂      ŝg    d.f.   RMS     χ²    K-S max      L*
Moments      1.835  2.027   12    218     638    0.066¹    -12335
Fractiles    2.015  2.135   15    168¹    718    0.073     -12390
MLE Approx.  1.832  2.158   14    217     617    0.076     -12350
MLE Opt.     1.844  2.066   13    211     612¹   0.067     -12331¹

¹Indicates best result in each column.
130
-------
FIGURE 13. LOGARITHMIC PROBABILITY PLOT OF CO HOURLY OBSERVATIONS FOR PHILADELPHIA, PA, 1974 (n = 6836).
131
-------
close together, the values of the frequency-based measures also
are very similar to each other, and no single method of calcu-
lating parameters emerges as clearly superior. Actually, each
frequency-based measure indicates a different method of calcu-
lating parameters is "best." The same ambivalence occurs in the
variate-based measures (Table 19), in which each measure selects
a different calculation procedure as "best." For example, the
method of moments gives minimum overprediction (dmax = 0.76 ppm);
the method of fractiles gives the least underprediction (dmin =
-0.08 ppm); the MLE approximation approach gives the smallest
standard deviation (sd = 0.776 ppm); and the MLE optimization
technique gives the smallest magnitude of the average difference
(d̄ = -0.396 ppm). If a hypothesis test were applied to these
findings, we would be forced to reject the LN2 model as a repre-
sentation of the underlying distribution, because of the large
magnitudes of the frequency-based measures of goodness-of-fit.
Table 19
Variate-based (EET) Measures of Goodness-of-fit for Philadelphia, PA
(ppm)

Method        n     d̄        sd      dmin     dmax
Moments      21   -0.919    1.502   -4.29     0.76¹
Fractiles    21    1.521    0.861   -0.08¹    2.67
MLE Approx.  21    0.742    0.776¹  -0.86     1.92
MLE Opt.     21   -0.396¹   1.211   -3.25     1.13

¹Indicates best result in each column.
132
-------
Rejection of the LN2 model may seem surprising in view of its
excellent visual fit to the observations. Rejection means that,
if one were to draw the same number of observations, at random,
from a real lognormal distribution, frequency-based measures of
goodness-of-fit as large as the ones we have obtained would be
very rare. Of course, the Philadelphia data set, like the other
data sets in this report, is not really a random sample, and
therefore application of the theory of hypothesis testing appears
questionable.
Napa, CA
The data set for Napa (Figure 14) resembles the data for
Philadelphia (Figure 13) in that the observations display the
desirable "snake on a stick" pattern about a straight line. The
method of fractiles straight line appears visually to fit the
observations quite well — from the lowest value of 0.5 ppm to
the highest value of 15.5 ppm.* Because almost 85% of the obser-
vations occur in the range from 1.5 ppm to 5.5 ppm, and nearly
half of the observations lie in the range from 1.5 ppm to 2.5
ppm, the method of moments and the MLE approaches are greatly
affected by concentrations in these ranges. The particular
shape of the zig-zag pattern of the observations causes lines
which fit well in these ranges to lie below the observations at
the higher concentrations, thereby causing underpredictions.
Because the method of fractiles ignores the number of observations

*Note that the actual range for Napa is from 0.0 ppm to 15.0 ppm
and that the 0.5 ppm increment results from data grouping.
133
-------
Table 20
Frequency-based Measures of Goodness-of-fit for Napa, CA

Method        m̂      ŝg    d.f.   RMS     χ²    K-S max      L*
Moments      2.433  1.618   10    179     334    0.043     -13436
Fractiles    2.260  1.717   11    329     800    0.109     -13712
MLE Approx.  2.442  1.607    9    171     313¹   0.037     -13422
MLE Opt.     2.466  1.560    9    110¹    493    0.027¹    -13393¹

¹Indicates best result in each column.

in each interval, the EET measures of goodness-of-fit (Table 21)
indicate that the method of fractiles gives the best fit. In
contrast, the frequency-based measures (Table 20) indicate that
the method of fractiles gives the worst fit, with the two MLE
approaches, particularly MLE optimization, performing best. If
we examine Figure 14, we note that the minimum underprediction
Table 21
Variate-based (EET) Measures of Goodness-of-fit for Napa, CA
(ppm)

Method        n     d̄        sd      dmin     dmax
Moments      13   -0.338    0.477   -1.08     0.26
Fractiles    13    0.065¹   0.229¹  -0.32¹    0.56
MLE Approx.  13   -0.413    0.549   -1.23     0.23
MLE Opt.     13   -0.747    0.861   -2.05     0.19¹

¹Indicates best result in each column.
134
-------
FIGURE 14. LOGARITHMIC PROBABILITY PLOT OF CO HOURLY OBSERVATIONS FOR NAPA, CA, 1974 (n = 8558).
135
-------
reported in Table 21 (dmin = -0.32 ppm) for the method of frac-
tiles occurs at the observed value of 1.5 ppm, which corresponds
to a cumulative frequency of 11.5%. If we examine the figure's
horizontal fit, the method of fractiles line predicts a cumula-
tive frequency of about 21% at 1.5 ppm, a discrepancy of 9.5%.
Notice that a difference in cumulative frequency of 9.5% is
quite serious, because it is equivalent to 0.095 x 8558 = 813
observations for the Napa data set. Thus, we have the paradox
that a small error (0.32 ppm) according to a variate-based
criterion is equivalent to a large error (9.5%) according to
a frequency-based criterion. This single data point (1.5 ppm)
accounts for the good performance of the method of fractiles
when judged by the variate-based measures of goodness-of-fit
and the poor performance of the method of fractiles when judged
by the frequency-based measures of goodness-of-fit. As with the
other data sets, the frequency-based measures would lead to
rejection of lognormality if a hypothesis test could be validly
applied.
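The arithmetic behind this paradox is simple: a discrepancy in cumulative frequency translates directly into a count of observations. A sketch using the figures quoted above:

```python
# A frequency discrepancy of 9.5% at a single plotted point, for a data
# set of n = 8558 hourly observations (Napa, 1974), corresponds to:
n = 8558
delta_frequency = 0.095
equivalent_observations = round(delta_frequency * n)
print(equivalent_observations)  # 813
```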
Phoenix, AZ
When the grouped data for Phoenix are plotted on logarith-
mic-probability paper (Figure 15), they exhibit consistent down-
ward curvature similar to the patterns for the Barstow and
Denver data sets, but somewhat more abrupt in the range between
cumulative frequencies of 90% and 99.9%. When we first plotted
this distribution, we were struck by the two extreme values of
50 ppm and 61 ppm which do not seem to lie on the curve at all.
136
-------
FIGURE 15. LOGARITHMIC PROBABILITY PLOT OF CO HOURLY OBSERVATIONS FOR PHOENIX, AZ, 1974 (n = 8246).
137
-------
Table 22
Frequency-based Measures of Goodness-of-fit for Phoenix, AZ

Method        m̂      ŝg    d.f.   RMS     χ²    K-S max      L*
Moments      1.710  2.830   24    299    2224    0.150     -19025
Fractiles    1.900  2.388   19    436    6969    0.206     -20262
MLE Approx.  1.407  3.617   30    145     530    0.058     -18406
MLE Opt.     1.311  4.020   32    103¹    412¹   0.042¹    -18366¹

¹Indicates best result in each column.
The next lower value after these two data points, 34 ppm, seems
to lie on the curve with the other observations. We were puz-
zled by these two anomalies, and we initially concluded that
these observations must be from a different population. That is,
perhaps very unusual micrometeorological conditions and sources
caused an event that was quite different from the process which
generated the other observations. When we obtained the original
Table 23
Variate-based (EET) Measures of Goodness-of-fit for Phoenix, AZ
(ppm)

Method        n     d̄        sd      dmin     dmax
Moments      32    3.196    6.582   -2.26    18.97
Fractiles    32   -2.237¹   1.918¹  -4.89     1.26¹
MLE Approx.  32   14.41    19.56    -0.73    59.58
MLE Opt.     32   21.82    28.50    -0.44¹   87.68

¹Indicates best result in each column.
138
-------
data sheets which had been submitted to EPA by Arizona, we found,
to our dismay, that these two high values apparently resulted
from transcription errors introduced when the data were key-
punched by EPA. In the following list of values, the top line
corresponds to the true values that were measured in Phoenix on
December 21, 1974, and the bottom line contains the values that
actually were stored by EPA in the SAROAD data bank:
Correct: (6,2,1,0,0,1,1,1,1,2,2,1,0,0,0,1,1,1,2,5,6,10,11,10)
SAROAD: (6,2,1,0,0,1,1,1,1,2,2,1,0,0,0,10,10,10,20,50,61,2,2,1)
Notice that erroneous values were substituted in all of the last
9 hours in the 24-hour period, and the reason for such a striking
distortion of the correct measurements is unclear. Although
these errors were easy to detect because they gave high values
which stood out from the rest of the observations, their exis-
tence made us worry about other errors that may be present in
these data sets but may be more difficult to discover (for exam-
ple, high values recorded erroneously as low values).
Unfortunately, we had completed all the computer analyses
by the time that these errors had been discovered, and, rather
than changing the values and repeating the computer runs, we
have chosen to present briefly the results for the uncorrected
SAROAD data set. These findings should convey a warning to all
data analysts to be suspicious of all data that they do not
personally collect and process. It is clear that implementation
of high levels of quality control at the measuring location does
139
-------
not prevent the data from being distorted by the time they are
stored in the EPA data bank. Again, these findings have con-
vinced us that EPA should institute a quality assurance data
checking program to compare the observations stored in its data
banks with the values originally submitted by state and local
pollution control agencies (see page 109).
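A data-bank audit of the kind recommended here can be sketched as a record-by-record comparison of the submitted and stored values; applied to the Phoenix series above, it flags the nine distorted hours:

```python
# Values submitted by the state versus values stored in the data bank
# (Phoenix, December 21, 1974, from the listing above):
correct = [6,2,1,0,0,1,1,1,1,2,2,1,0,0,0,1,1,1,2,5,6,10,11,10]
saroad  = [6,2,1,0,0,1,1,1,1,2,2,1,0,0,0,10,10,10,20,50,61,2,2,1]

# Flag every hour where the stored value differs from the submitted one.
mismatches = [(hour, c, s)
              for hour, (c, s) in enumerate(zip(correct, saroad))
              if c != s]
print(len(mismatches))  # 9
```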
Because of the sharp downward curvature of the Phoenix
observations, the two MLE approaches, influenced by the number
of observations at the lower concentrations, cause serious over-
predictions at concentrations above 12 ppm, sometimes overpre-
dicting by 300% or more. The method of moments causes overpre-
dictions at concentrations above 20 ppm, but these are not as
serious as for the MLE techniques.
Once again, the frequency-based measures of goodness-of-
fit (Table 22) indicate that the MLE optimization approach gives
the best fit, and the variate-based measures indicate that the
method of fractiles gives the best fit (Table 23). Study of the
data points in Figure 15 reveals that none of the parameter cal-
culation techniques gives a very good fit, because the LN2 model
attempts to impose a straight line on observations that are
better fit by a curve. The poor fit is reflected by the large
values obtained for the goodness-of-fit measures. For example,
the method of fractiles gives a chi-square value of χ² = 6988,
and the MLE optimization gives an overprediction of dmax = 87.67
ppm.
140
-------
Newhall, CA
The data sets discussed thus far all have contained some
observations reported as 0 ppm; that is, observations in the
first interval (-0.5 < X < 0.5). The Newhall data set contains
no observations in the first interval, although there are 938
observations in the second interval (0.5 < X < 1.5). A possible
explanation is that vehicular activity 24 hours a day in the
vicinity of the monitoring location may prevent concentrations
from dropping below 0.5 ppm, even on days with high winds. The
overall range of measured concentrations for Newhall was from
1 ppm to 14 ppm, and the data points representing the intervals
appear to scatter about a straight line (Figure 16). If we care-
fully examine the data points at 1.5 ppm and 14.5 ppm, there is
some indication that these two end points lie below the others,
giving the appearance of slight downward curvature, but one can-
not be sure, and the LN2 model would appear to be a reasonable
representation of the observations. All four of the straight
lines fit the observations reasonably well, but, because the
method of fractiles must fit exactly at cumulative frequencies
of 70% and 99.9%, it gives larger deviations at the lower fre-
quencies than the other methods. Because the other three methods
of calculating parameters are weighted heavily by concentrations
near the median (cumulative frequency of 50%) and the arithmetic
mean (cumulative frequency of approximately 70%), their goodness-
of-fit measures give values that are quite similar to each other
(Table 24). For example, the Kolmogorov-Smirnov maximum differ-
ence gives 0.036, 0.038, and 0.034, respectively, for the method
141
-------
Table 24
Frequency-based Measures of Goodness-of-fit for Newhall, CA

Method        m̂      ŝg    d.f.   RMS     χ²    K-S max      L*
Moments      2.966  1.617   12    152     284    0.036     -15991
Fractiles    3.196  1.534   11    200    1214    0.070     -16280
MLE Approx.  2.918  1.699   13    139¹    281    0.036     -16002
MLE Opt.     2.953  1.650   12    142     245¹   0.034¹    -15979¹

¹Indicates best result in each column.
of moments, MLE approximation, and MLE optimization. Similarly,
the values of the RMS, chi-square, and log-likelihood functions
have very narrow ranges. These results indicate that the MLE
optimization approach gives the best fit. As in many of the
other data sets, the EET variate-based measures (Table 25) indi-
cate that the method of fractiles gives the best fit. Notice
Table 25
Variate-based (EET) Measures of Goodness-of-fit for Newhall, CA
(ppm)

Method        n     d̄        sd      dmin     dmax
Moments      13    0.185    0.376   -0.16     1.30
Fractiles    13   -0.055¹   0.313¹  -0.65     0.38¹
MLE Approx.  13    0.811    1.016   -0.17     3.39
MLE Opt.     13    0.450    0.619   -0.15¹    2.15

¹Indicates best result in each column.
142
-------
FIGURE 16. LOGARITHMIC PROBABILITY PLOT OF CO HOURLY OBSERVATIONS FOR NEWHALL, CA, 1974.
-------
FIGURE 17. LOGARITHMIC PROBABILITY PLOT OF CO HOURLY OBSERVATIONS FOR SPRINGFIELD, MA, 1974 (n = 8339).
145
-------
Table 27
Variate-based (EET) Measures of Goodness-of-fit for Springfield, MA
(ppm)

Method        n     d̄        sd      dmin     dmax
Moments      26   -0.238¹   0.873   -2.77     0.68¹
Fractiles    26    0.366    0.604¹  -0.95     1.32
MLE Approx.  26    4.559    3.450   -0.31     9.52
MLE Opt.     26    2.854    2.057   -0.27¹    5.45

¹Indicates best result in each column.
and give large overpredictions. Neither the frequency-based
measures of goodness-of-fit (Table 26) nor the variate-based
measures (Table 27) show any parameter calculation approach
emerging as clearly superior. The goodness-of-fit measures, in
general, seem relatively large, and, if hypothesis tests were
possible, would strongly reject the LN2 model. For example, to
reject the LN2 hypothesis at P = 0.995 with 21 degrees of free-
dom, chi-square must exceed χ² = 41.4; yet the chi-square value
for the MLE optimization approach is χ² = 1,334. Examination of
the chi-square values on an interval-by-interval basis reveals
that the first interval makes the largest contribution. For the
MLE optimization approach, the first interval contributes χ² =
853, which is more than half the total. Thus, elimination of
the first interval from the data set would improve the goodness-
of-fit measures in all categories and would bring the lines
closer together. In the present situation, it is difficult to
146
-------
claim that any one parameter calculation approach performs best.
Despite the poor performance of all four parameter calculation
approaches when judged by the frequency-based measures, notice
that the variate-based measures do not present as negative a
picture. Although no one approach is superior in all categories,
the method of fractiles seems to fit reasonably well; the model
is within 1.3 ppm (dmax = 1.32 ppm) over the entire range of
observations, with very small average overprediction (d̄ = 0.366
ppm). One reason that the variate-based measures give a more
favorable result than the frequency-based measures is that they
are less affected by the poor fit that occurs in the first inter-
val.
Pasadena, CA
When plotted on logarithmic-probability paper, the grouped
observations for Pasadena give a gently S-shaped curve (Figure
18). Like the Newhall data set, the minimum concentration for
Table 28
Frequency-based Measures of Goodness-of-fit for Pasadena, CA

Method        m̂      ŝg    d.f.   RMS     χ²    K-S max      L*
Moments      3.487  1.996   22    109     327    0.037     -19561
Fractiles    3.570  1.901   20     83     284    0.037     -19518
MLE Approx.  3.542  1.918   21     85     272¹   0.034¹    -19519
MLE Opt.     3.558  1.900   20     82¹    275    0.035     -19517¹

¹Indicates best result in each column.
-------
Table 29
Variate-based (EET) Measures of Goodness-of-fit for Pasadena, CA
(ppm)

Method        n     d̄        sd      dmin     dmax
Moments      28   -0.141¹   1.496   -1.65¹    5.43
Fractiles    28   -1.304    1.420   -3.39     1.27
MLE Approx.  28   -1.152    1.372¹  -3.11     1.87
MLE Opt.     28   -1.359    1.438   -3.48     1.14¹

¹Indicates best result in each column.
Pasadena was 1 ppm (695 observations), plotted in the 1.5 ppm
interval, and there were no observations in the 0.5 ppm interval.
Of course, if more observations were taken from the same station
(for example, several years of data), we would expect more values
to appear in the first interval. The results for the four meth-
ods give lines that are extremely close together; the lines for
the method of fractiles, MLE approximation, and MLE optimization
are almost coincident. For these three approaches, m̂ and ŝg are
almost identical (Table 28), and the values for the frequency-
based measures of goodness-of-fit are approximately the same.
The variate-based measures of goodness-of-fit (Table 29) also
yield similar values for these three approaches, and no one
approach stands out as clearly superior. Overall, the negative
values of d̄ suggest that the model tends to underpredict concen-
trations for all four approaches, although the underpredictions
148
-------
FIGURE 18. LOGARITHMIC PROBABILITY PLOT OF CO HOURLY OBSERVATIONS FOR PASADENA, CA, 1974 (n = 8632).
149
-------
Table 30
Frequency-based Measures of Goodness-of-fit for New York City, NY

Method        m̂      ŝg    d.f.   RMS     χ²    K-S max      L*
Moments      14.92  1.501   44     48     893    0.039¹    -27798
Fractiles    15.72  1.435   41     62    4010    0.085     -28284
MLE Approx.  14.72  1.581   48     43     416¹   0.046     -27710
MLE Opt.     14.75  1.567   47     43¹    430    0.044     -27707¹

¹Indicates best result in each column.
are not great. The slight tendency toward underprediction occurs
because, although the observations lie close to the straight
lines, more data points lie above (upper left) than lie below
(lower right) the lines.
New York City
The New York City data set was the highest of any CO data
Table 31
Variate-based (EET) Measures of Goodness-of-fit for New York City, NY
(ppm)

Method        n     d̄        sd      dmin     dmax
Moments      50    1.357    1.731   -0.67     4.99
Fractiles    50    0.523¹   0.833¹  -0.42¹    2.62¹
MLE Approx.  50    3.730    4.623   -0.74    13.83
MLE Opt.     50    3.284    4.071   -0.70    12.18

¹Indicates best result in each column.
150
-------
FIGURE 19. LOGARITHMIC PROBABILITY PLOT OF CO HOURLY OBSERVATIONS FOR NEW YORK CITY, NY, 1974 (n = 8364).
-------
set in the SAROAD data bank in 1974. When plotted on logarith-
mic probability paper (Figure 19), these observations, which
range from 2 ppm to 56 ppm, showed consistent downward curvature.
Unlike the high (erroneous) outliers for Phoenix, all of the New
York City observations seem to fall on a smooth, though gradual,
curve. Like the other data sets that exhibited downward curva-
ture (Phoenix, Denver, and Barstow), the line for the method of
fractiles fits the observations quite well at cumulative frequen-
cies between 70% and 99.99%, but seriously overpredicts concen-
trations at the lower levels. The two MLE approaches fit best
in the middle ranges (10-20 ppm) between cumulative frequencies
of 20% and 80%, where most of the observations are located. The
MLE optimization and MLE approximation lines almost coincide,
and the method of moments line seems somewhat intermediate
between the line for the method of fractiles and the two MLE lines.
The frequency-based measures of goodness-of-fit indicate that
the best fit is obtained from the MLE optimization approach,
although the K-S max measure selects the method of moments as best
(Table 30). According to the frequency-based measures, the
method of fractiles gives the worst fit. The variate-based mea-
sures indicate the opposite: the method of fractiles gives the
best fit (Table 31). Inspection of Figure 19 suggests that none
of the parameter calculation approaches can be very satisfactory
over the full range of the observations because of the basic
difficulty of drawing a straight line through observations that
are better described by a curve.
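The Kolmogorov-Smirnov measure used in these comparisons is simply the largest absolute gap between the model's and the empirical cumulative frequencies at the interval boundaries. A minimal sketch with illustrative frequencies:

```python
def ks_max(model_cdf, empirical_cdf):
    """Kolmogorov-Smirnov maximum difference between the fitted model's
    and the empirical cumulative frequencies at the interval boundaries."""
    return max(abs(m - e) for m, e in zip(model_cdf, empirical_cdf))

# Illustrative cumulative frequencies at five interval boundaries
# (hypothetical, not the New York City data):
model     = [0.20, 0.46, 0.71, 0.90, 0.985]
empirical = [0.24, 0.50, 0.68, 0.88, 0.990]
print(round(ks_max(model, empirical), 2))  # 0.04
```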
152
-------
Discussion
If the results for the 11 data sets comprising the geo-
graphical data group are compared (Table 32), it is seen that
no single parameter calculation approach emerges as superior
in all situations. According to the frequency-based measures
of goodness-of-fit, the MLE optimization approach is best in
a majority of the data sets, although the MLE approximation per-
forms almost as well in two cases. According to the variate-
based measures of goodness-of-fit, the method of fractiles was
the best approach in a majority of the data sets, although no
single approach emerged as superior in four of these. The
method of moments was never judged to be the best approach for
calculating parameters, although it offers the practical advan-
tage that the mean of the model matches the mean of the data.
Four of the 11 data sets (36%) exhibited concave downward cur-
vature, with the remainder showing a zig-zag pattern that varied
from slight reversals of curvature to more distinct S-shaped
patterns.
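For reference, the method of moments sets the LN2 parameters so that the model reproduces the sample mean and standard deviation. A sketch of the standard moment-matching formulas (the report's computational details may differ):

```python
import math

def moments_params(mean, sd):
    """Two-parameter lognormal by the method of moments: choose the mean
    and variance of ln X so the model reproduces the sample mean and sd."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)   # variance of ln X
    mu = math.log(mean) - sigma2 / 2.0          # mean of ln X
    m = math.exp(mu)                     # geometric mean (median), ppm
    sg = math.exp(math.sqrt(sigma2))     # geometric standard deviation
    return m, sg

# Illustrative sample statistics (ppm), not taken from any data set here:
m, sg = moments_params(3.0, 3.0)
```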
Thus, the method of fractiles seems best when judged by
variate-based criteria, and the MLE optimization approach seems
best when judged by frequency-based criteria. This dichotomy
occurs because the two sets of criteria measure different things.
Frequency-based measures, like chi-square, are sensitive to the
number of values within each interval, and variate-based measures,
like the EET tests, are sensitive to the predicted and observed
concentrations at each interval. When an observer examines a
plot of the observations on logarithmic-probability paper, the
153
-------
Table 32
Comparison of Results for the 11 Data Sets of the Geographical Data Group (data pattern and best-fitting parameter calculation approach for each city)
-------
eye tends to emphasize the variate-based approach, because the
paper displays the logarithm of the concentration on its variate
(vertical) scale, but it greatly compresses the (horizontal)
frequency scale in those areas (e.g., 20% to 80% cumulative
frequencies) where large numbers of observations occur. As a
result, the user of logarithmic-probability paper tends to give
considerable emphasis to the tail of the distribution, which is
represented by a large number of intervals but relatively few
observations. Because the tail of the distribution is subject
to considerable randomness, resulting variate-based fits, such
as the method of fractiles, tend also to be heavily influenced
by randomness in the data. Thus they fit the specific data set
generated by a process rather than the underlying distribution of
the process itself. In this investigation, we unfortunately can-
not be certain about the nature of this underlying distribution,
so it is not possible to attach any particular significance to
our findings for frequency-based fits such as the MLE optimiza-
tion approach.
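The contrast drawn here can be illustrated with the method of fractiles itself: the LN2 line is forced exactly through two (concentration, cumulative frequency) points, with no weighting by interval counts. A sketch, using the 70% and 99.9% fractiles mentioned earlier for the Newhall fit:

```python
import math
from statistics import NormalDist

def fractiles_params(x_p, p, x_q, q):
    """Fit the LN2 line exactly through two fractiles: ln x = mu + sigma*z,
    where z is the standard normal score of the cumulative frequency."""
    z_p, z_q = NormalDist().inv_cdf(p), NormalDist().inv_cdf(q)
    sigma = (math.log(x_q) - math.log(x_p)) / (z_q - z_p)
    mu = math.log(x_p) - sigma * z_p
    return math.exp(mu), math.exp(sigma)  # geometric mean, geometric sd

# Round trip: a lognormal with geometric mean 3 ppm and geometric sd 2
# is recovered exactly from its 70% and 99.9% fractiles.
z70, z999 = NormalDist().inv_cdf(0.70), NormalDist().inv_cdf(0.999)
x70 = 3.0 * 2.0 ** z70
x999 = 3.0 * 2.0 ** z999
m, sg = fractiles_params(x70, 0.70, x999, 0.999)
```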
OVERALL FINDINGS
Tables 33-36 summarize the overall findings for the 11
cities and the four parameter calculation approaches. In
general, the values for the frequency-based measures of goodness-
of-fit are quite large, indicating a poor overall fit of the
LN2 model to the 11 data sets, regardless of the approach used
to calculate its parameters. For example, all chi-square values
155
-------
156
-------
Q
O
I
I
I-
00
UJ
DC
U.
O
tr
<
(A 0)<— *
S.2S.
^« ro CX
r^
i"* Q.
O >^»
—1
•"•o c
oj *o *j e
.11 .2 a
•r- +J Q)
i- l/> Q
U
+•* — *
Js i.
(U CL
r~
o o
o~ t>
^lf
1
> a*
o > o
I- 0 S C
o c 3 a
O i- •<— OJ
£ S X t<—
i— 00 <0 CU
O 3<*-
0 0-t»-
ce to •«-
a
u c:
OJ ro 4-> =
S -o 10 a.
~c c -i- a.
•r- +J QJ
t. iy> Q
CO CM
^ CM
^- r—
r- r—
S S
^" """
^
> «"
• I.
.X "O
r- C
o
a
5
CM
§
O
1
CO
o
CM
,~
O
CO
CM
1
CO
o
o
CO
CO
CM
cn
CM
<
a.
-
a.
"ai
-o
a.
s
o
CM
CO
O
CO
CM
O
s
0
CM
CO
'
CM
O
O
o
a»
CM
CO
CM
,_
lO
CM
o
a.
v>
cvj
^™
cn
T
CM
at
•—
CM
CM
1
CM
CM
O
1
UD
O
CM
O
IO
cn
^o
(0
CO
^-
m
cn
CM
CO
CM
a
X
c
0)
o
Q.
S
0
in
vo
o
CO
o
to
o
o
1
o
VO
1
o
o
o
CM
o
o
CM
in
—
o
CO
5
(O
1
CM
CO
^*
in
cn
O
s
o
P-.
CO
0
in
CM
00
1
CM
O
O
CO
>>.
r-^
ii^
-
CSJ
CO
m
CO
^
-o
"oi
i,
c
(.
ex
I/)
CM
*~
cn
CO
CO
i
CM
!—
S
^_
1
CO
in
cn
i
j^
CO
o
o
CO
CM
CO
CO
*
CO
cn
CO
o
c
d)
•o
rtJ
Ul
(0
CL.
CM
VO
CM
CM
O
1
3
o
S
O
CO
CM
co
CM
S
o
o
o
CM
to
to
CM
to
CO
to
>:
•
*>
o
i.
o
1
^s
CM
"cn
1
r**
0
CM
O
S
~
o
o
o
CO
CO
CM
^
<
U
h-
i
157
-------
Z
o
5
5
X
o
tr
a.
a.
<
UJ
i
5
u.
O
Q
in O
<"> I
Ml 1-
"j 1
< uj
H I
^
oc
£
S
3
UJ
DC
U.
o
>-
<
5
5
i—
g
ae
LiJ
3£
OS
UJ
LU
i— l
C3
Z
LL!
00
1—
to
1^
5!
a
00
5
1—
00
DC
UJ
1—
1
_J
s
2
UJ
1—
to
J
o
in OI~-»
01 3 E
fsS
+> OJ— .
in 3 E
Ol i— Q.
-I1*'"'
u c
JZ C *r- Q.
4J (O > ^
^ S Q
O
4-» ^*
f^ p
S §.
01 Q.
4-> Z^—
^
•o
§§
-d '^
0*^0
£ 3
H"-
,
> Ol
2> L>
O S C
O C 3 01
i3> L. S &.
g -r- -i- — a.
•^ 4-> 0)
t- to o
u
1 si
t_
-s^ssiassass
^*-0S0*'"2
9)i-OOr— ioforor-^>r~^-^
^ ^ ^" CM 00 CM P"* ^~ fO i~" 1*^
^m lO CM O O "^™ O O O PO O
11111
^SSSSSSgSRS
o>— a^oouif— roi— «
^
SSc\ioi?55Sln|2f2
, . . , "
SCM 10 CM m 0> 0
CMOOgr-'—
ior- co»ccMtncoioa>o~
Sg:5?^;p;s5;22§
00^— e^CMr-Jl0r-'QOin
O^^^JCMCMfOrOCO^^
>_
2 < *
5 • ^
•* * «r i a 5 2" * S
-^- (_> Q. (U *
Illllllllli
CO
•°s
CA
in
"vn
CM
in
1
8
CO
in
CM
CM
s
in
1
i
o
CM
o
CM
CM
X
U|
o
ARITHMEl
i 1
'o "o
u u
c c
"S 1
5 S
vi in
"lO <9
> >
•s -s
E E
^ X
f £
158
-------
2
O
I—
<
N
S
t
O
UJ
_i
S
u.
O
Q
O
* £
UJ S
_i
£ f
DC
o
U.
CO
L_
Li
O
UJ
DC
U.
O
oc
s
3
CO
h-
UJ
p-
1
UJ
i
cc
UJ
Ul
I/I
u
UJ
h-
si
O
r^
S
t—
S
S
t/>
UJ
PARAMET
_j
0
£
5
to
|
g
S«J~^
3 B
^H ^ &
5"~
«s?
$'« Q.
-I5"""
u c
S?.2~
oi i» *> E
E -o «o B.
Jc c t- a.
ii 2 J^*
u
•lie's
I s §.
x K a.
^
<
EC
o o
i?l^ ti
Ji 3
1
> ai
o c 3 a>
oil. = u
§•^•^01
£ X *i
— IO 10U-
O £ ••-
^ a
o>
!c fe
°§-
to
C if. 01
no u
* u 1
•M -»^
•*- +J- W
U (/> O
u
1 si
£ CM ^™ 00 »~ co co rw
inoocnocMinm-grr-jCM
inioooo^r^^cooocon
O>— '-CMCMCMCOCOCO'TtO
< _ z
2 •
^'< « . ^ « j«
g^^.UOX^^C^
°5S>*«"S5-5'S
feljssis-iScxScS
Sz
f^
00
a
tf>
1
S
»
in
r~
CM
to
in
in
i
S
S
0
in
in
m
s
5
*!
o
I
i
• •
"o "o
o u
c c
1 1
>
"S "S
C K
159
-------
but one are greater than x² = 100, and, if hypothesis testing
were valid (and taking into account the number of degrees of
freedom), the LN2 model would be rejected at a high level of
confidence (P > 0.995) for all cities and all parameter calcu-
lation procedures. The one data set for which chi-square is
less than 100 is Barstow, CA (x² = 36 for the MLE optimization
approach with 7 degrees of freedom), and, to avoid rejecting the
LN2 hypothesis for Barstow at P = 0.995, chi-square would have
to be less than x² = 20.3. Although chi-square gives values
that are too large for all data sets, the "best" (that is, low-
est) values of this frequency-based measure of goodness-of-fit
always are obtained with the MLE optimization approach, and
this approach clearly would be the preferred way to calculate
(that is, "estimate") parameters if hypothesis tests were to
be undertaken. Of course, hypothesis testing is questionable
in view of the high serial correlation, and consequent lack of
statistical independence, of the observations.
Tables 33-36 also briefly summarize the results obtained
from the Engineering Error Test. The method of fractiles gave
the smallest average difference between predicted and observed
values (d̄ = -0.12 ppm), while the method of moments, MLE approx-
imation, and MLE optimization approaches gave average overpre-
dictions that were increasingly large (d̄ = 0.26 ppm, 2.15 ppm,
and 2.75 ppm, respectively). The method of fractiles had a minimum
difference of -4.89 ppm and a maximum difference of 2.67 ppm for
all data sets (Table 34), indicating that this parameter calcu-
lation approach fits the observations quite well, according to
the variate-based concepts. The other three approaches had
maxima of 18.97, 59.95, and 87.68 ppm, although their minima
were similar to the minimum for the method of fractiles.
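The variate-based quantities quoted above (the average difference d̄, its standard deviation, and the extreme differences) can be computed in a few lines. A minimal sketch; the sample differences are made up for illustration, not taken from the report's tables:

```python
import math

# Summary statistics of EET differences d = xp - xo (in ppm): the
# average difference, its sample standard deviation, and the minimum
# and maximum difference. The values below are hypothetical.
d = [-0.8, -0.3, 0.1, 0.4, 0.9, 1.6, -1.2, 0.2, 2.4, -0.5]

n = len(d)
d_bar = sum(d) / n                                    # average difference
s_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))
d_min, d_max = min(d), max(d)

print(round(d_bar, 2), round(s_d, 2), d_min, d_max)
```

A positive d̄ indicates that the model overpredicts on average; a negative d̄, that it underpredicts.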
Another view of the EET results can be obtained by devel-
oping histograms of the differences for each parameter calcula-
tion approach, with all 11 cities grouped together. To do this,
the differences were first listed in tabular form (Tables 37-40).
Each column gives the count for the number of occurrences of
differences within intervals of size 0.5 ppm; for example, the
first interval in the center of the table represents the number
of differences in the range (0 ppm < d < 0.5 ppm). For each
table, there was a total of 246 differences, and the numerical
values of differences of 5 ppm or more were listed below the
table.
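The tabulation step just described can be sketched as follows; the differences here are made up, not the report's 246 values:

```python
import math
from collections import Counter

# Count differences (ppm) in 0.5-ppm intervals, as in Tables 37-40.
# Interval index k covers the half-open range [0.5*k, 0.5*(k+1)) ppm,
# so k = 0 is the interval (0 ppm <= d < 0.5 ppm).
differences = [-1.2, -0.3, 0.1, 0.4, 0.7, 1.6, 2.2, 0.2, -0.6, 3.1]
counts = Counter(math.floor(d / 0.5) for d in differences)

for k in sorted(counts):
    print(f"[{0.5 * k:+.1f}, {0.5 * (k + 1):+.1f}) ppm: {counts[k]}")
```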
The histograms generated from these tables (Figure 20)
allow the EET results for the four approaches to be compared
visually. It is seen that the histograms for all approaches but
one, the method of fractiles, have long tails extending to the
right, with very striking right tails for the two MLE approaches.
For example, the MLE approximation approach has 17 differences
that overpredict the concentration by 12 ppm or more, and the
MLE optimization approach has 19 differences that overpredict
the concentration by 12 ppm or more. In contrast, all 246 dif-
ferences for the method of fractiles are less than 3.0 ppm, and
this is the only approach showing a slight tendency to underpredict,
reflected by the leftward skewness of the histogram. The method
of fractiles also is unique for the overall "narrowness" of the
histogram.

[Tables 37-40: counts of the EET differences, by city and by 0.5-ppm interval, for the method of moments, the method of fractiles, the MLE approximation, and the MLE optimization approaches (246 differences per table); individual entries not legible.]

[Figure 20: Histograms of the deviations d = xp - xo between predicted and observed concentrations for four different methods of calculating LN2 parameters for the geographical data group. Panel statistics (n = 246 each): method of moments, d̄ = 0.26 ppm, sd = 1.68 ppm; method of fractiles, d̄ = -0.12 ppm, sd = 0.79 ppm; MLE approximation, d̄ = 2.15 ppm, sd = 3.62 ppm; MLE optimization, d̄ = 2.75 ppm, sd = 4.56 ppm.]
These results also can be compared graphically by plotting
the cumulative counts of the EET differences (Figure 21). For
ease of plotting, curves have been drawn through the points. All
curves lie close together for differences to the left of d = 0.0
ppm, but the curves show distinct variations for positive dif-
ferences. The method of fractiles has a very small projection
to the right, but the two MLE approaches have striking right-
ward tails and almost coincide. The curve for the method of
moments is intermediate between the other curves but lies fairly
close to the method of fractiles.
To determine the percentage of the differences which lie
within stated error ranges, we examine the bottom row marked
"CUMULATIVE" in each of Tables 37-40. For example, to find the
percentage of the differences that fall in the range ±2.0 ppm
for the Larsen method of fractiles, we read from Table 38 that
27 differences were less than d = -2.0 ppm and 234 differences
were less than d = +2.0 ppm. By subtracting the two values,
we determine that 234 - 27 = 207 differences were within the range
d = ±2.0 ppm. Because the table contains 246 differences, we
conclude that 207/246 = 84% of the differences were between
±2.0 ppm for the method of fractiles.
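The subtraction just described can be written out directly, using the cumulative counts quoted above from Table 38:

```python
# Cumulative counts for the method of fractiles, read from the
# "CUMULATIVE" row of Table 38.
below_minus_2 = 27     # differences less than d = -2.0 ppm
below_plus_2 = 234     # differences less than d = +2.0 ppm
n_total = 246

within_band = below_plus_2 - below_minus_2    # 207 differences
percent = 100.0 * within_band / n_total
print(f"{within_band} differences, {percent:.1f}% within +/-2.0 ppm")
# prints: 207 differences, 84.1% within +/-2.0 ppm
```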
It is possible, in the same fashion, to construct a table
for various "error ranges" which summarizes the percentage of
the differences falling into various ranges (Table 41). It
is seen that the first two techniques — the method of
moments and the method of fractiles — have similar error
characteristics, although the method of fractiles performs
slightly better than the method of moments. The results for
the last two approaches are quite similar to each other and
are distinctly different from the first two approaches.

[Figure 21: Cumulative counts of the EET differences (d = xp - xo) for the four methods of calculating LN2 parameters, geographical data group.]

[Table 41: Percent of the EET differences falling within stated error bands for each of the four parameter calculation methods; individual entries not legible.]

If the analyst intends to
represent the frequency distribution of a particular data set
by the LN2 model, he can gain some insight from this table. The
NDIR instrument used to make CO measurements customarily is
assumed to have a measurement error on the order of ±1 ppm
(hourly averages). If the analyst wishes the model's predic-
tions to be within ±1.0 ppm over the entire range of concentra-
tions, it can be seen from Table 41 that all methods perform
rather poorly. Only two-thirds of the predictions lie within
this range for the Larsen method of fractiles, and fewer than
half of the predictions are within ±1.0 ppm of the observed values
for the two MLE approaches. If the allowable error range is
expanded to ±1.5 ppm, then 77.2% of the predictions using
Larsen's approach will be considered successful. Finally, with
a ±2.0 ppm error band, 84.1% of the predictions from the method
of fractiles are successful, and 60.2% of the predictions from
MLE optimization are successful. It is not until the error band
is expanded to ±3.0 ppm that we can state that at least 90% of
the predictions are successful, and, even with this relatively
wide error band, only 71% of the predictions from the MLE opti-
mization technique are successful.
Discussion
Whether the above error bands are considered "too wide" or
"acceptable" Is a matter of opinion and should be determined by
the purpose for which the model is intended. It should be empha-
sized that the word "prediction," as used here, does not mean
predicting future values and should be interpreted in the fol-
lowing narrow sense.
Suppose that the analyst obtains a CO data set containing
8760 hourly observations (ignoring missing values), and he
wishes to represent the year of data as parsimoniously as pos-
sible. If he partitions the observations into 50 intervals and
makes a histogram, he can represent the frequency distribution
of the data set by just 50 numerical values: the counts of the
number of observations in each interval. In effect, 8760 numer-
ical values have been reduced to just 50 numerical values, a
significant reduction. Although all information on the time
series of the observations has been removed, the histogram
retains information that is very useful for enforcement pur-
poses. If the analyst now wishes to apply the LN2 model to the
data set, he can reduce the information he must store to just
two numerical values: the two parameters of the LN2 model. By
doing so, he has reduced 8760 numerical values to just 2 numer-
ical values. With the model and the two parameters, he can
reconstruct the entire histogram. Great parsimony has been
attained, and the cost is paid in terms of the error that may
result. As we have seen, the LN2 model can be used to predict
the concentrations for 90% of the intervals within ±3 ppm and
for 84% of the intervals within ±2 ppm for the method of frac-
tiles. Because the geographical data group was selected as a
cross section of U.S. CO data sets in 1974, it is likely that
the LN2 model, with parameters calculated by the method of
fractiles, will attain the same performance for other data sets
in the population.
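The reduction described above can be sketched directly: given only the two stored LN2 parameters (the median and the standard geometric deviation), the expected count in any concentration interval is recovered from the lognormal cumulative distribution function. The parameter values and interval edges below are illustrative assumptions, not values taken from the report:

```python
import math

def ln2_cdf(x, m, sg):
    """CDF of the two-parameter lognormal model with median m (ppm)
    and standard geometric deviation sg."""
    z = (math.log(x) - math.log(m)) / math.log(sg)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

m, sg = 2.8, 1.9     # the two stored parameters (illustrative)
n = 8760             # hourly observations in a year

# Reconstruct expected counts in 1-ppm-wide intervals from the model.
edges = [0.5 + k for k in range(11)]          # 0.5, 1.5, ..., 10.5 ppm
counts = [n * (ln2_cdf(edges[k + 1], m, sg) - ln2_cdf(edges[k], m, sg))
          for k in range(10)]
print([round(c) for c in counts])
```

Two numbers thus stand in for the 50 interval counts of the histogram, at the cost of whatever error the model makes in each interval.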
Notice that the data analyst, because he is "predicting"
concentrations for a data set that is available to him, can
determine immediately how well his model fits the observations.
Thus, the goodness-of-fit tests described in this report can
immediately be used to see how well the LN2 model predicts the
true concentrations. It is important to stress that a model
applied in this manner should not be used to predict the future
concentrations for the following year, nor should it be used
to assess the impact of control strategies. Variate-based tests
are useful techniques for assessing how well a model describes
the concentrations of a particular data set, which is one
realization of some underlying process. In order to use a model
for predictions, we must be certain that the basic distribution
of the process has been modeled correctly, not just one reali-
zation. Thus, it is possible, even likely, that the MLE opti-
mization approach gives the best estimate of the underlying dis-
tribution of the process, even though it may not perform well
on a given realization when judged by variate-based tests. Of
course, an even more likely result is that the LN2 model is the
wrong distribution for modeling air quality data, and more com-
plex models are needed to describe the underlying process. At
the present time, we are inclined toward this view.
IX. RESULTS FOR THE LONGITUDINAL DATA GROUP
The longitudinal data group was included in this study to
enable us to examine the distributions of hourly CO observations
at a single air monitoring station in successive years. The
data were obtained from SAROAD for the Continuous Air Monitoring
Project (CAMP) station in Washington, D.C., and covered the
period from 1966 to 1974 (Table 42). However, data for 1967 were
not available from SAROAD, and the data set for 1974 included
only several months of observations and therefore was not analyzed
in detail. Although the data sets for 1972 and 1973 were incom-
plete, they were included in the analysis. Over the eight years
of data, the arithmetic means were relatively similar, ranging
from 3.31 ppm to 4.35 ppm. The arithmetic standard deviations
also were close to each other, ranging from 2.47 ppm to 3.80 ppm.
This chapter first presents the findings for the individ-
ual years; then the overall findings are discussed briefly. The
numerical analysis for the longitudinal data group is not quite
as detailed as that for the geographical data group (Chapter
VIII).
FINDINGS FOR INDIVIDUAL YEARS
The results for the individual years, from 1966 to 1973,
are discussed in the following sections.
[Table 42: Summary statistics for the Washington, D.C. (CAMP) CO data sets used in the longitudinal data group, 1966-1974 (number of observations, arithmetic mean, standard deviation, and related quantities by year); entries not legible.]
Table 43
Frequency-based Measures of Goodness-of-fit for Washington, D.C. - 1966

Method        m̂      sg     d.f.   RMS    x²     K-S max   L*
Moments       2.653  1.946  16     124    520    0.040     -15881
Fractiles     2.798  1.977  18     108    575    0.048     -15903
MLE Approx.   2.664  1.987  17     134    497¹   0.046     -15893
MLE Opt.      2.689  1.922  16     107¹   589    0.030¹    -15877¹

¹ Indicates best result in each column.
1966
The observations for 1966, when plotted on logarithmic
probability paper (Figure 22), appear to scatter narrowly about
a straight line. Because the parameters that were calculated
by the four methods were nearly identical, the resulting LN2
lines all had similar slopes and vertical intercepts, and, when
Table 44
Variate-based (EET) Measures of Goodness-of-fit for Washington, D.C. - 1966
(ppm)

Method        n    d̄        sd      dmin     dmax
Moments       25   -0.786   1.110   -2.79    0.61
Fractiles     25    0.356   0.721¹  -0.95¹   1.74
MLE Approx.   25   -0.110¹  0.783   -1.62    0.96
MLE Opt.      25   -0.985   1.267   -3.19    0.55¹

¹ Indicates best result in each column.
plotted, almost coincided. The medians (m̂) were very similar to
each other, as were the standard geometric deviations (sg), and
the resulting frequency-based goodness-of-fit measures also were
very close in value (Table 43). Although the differences among
the measures were small, three of the four frequency-based mea-
sures indicated that the MLE optimization approach gave the best
fit. If hypothesis tests were undertaken, the horizontal scatter
about the lines in Figure 22, although small to the eye, would be
too large to be caused by chance alone, assuming that we really
were sampling from a lognormal distribution. This can be seen
from the relatively high values of chi-square (x²) and the Kol-
mogorov-Smirnov maximum difference (K-S max). The variate-based
measures of goodness-of-fit also are very close to each other,
with the method of fractiles having a slight, though debatable,
advantage over the other three methods. According to these
results, the LN2 model, with parameters calculated by the method
of fractiles, can represent the entire distribution with a mini-
mum underprediction of 0.95 ppm and a maximum overprediction of
1.74 ppm, and an average overprediction of only 0.36 ppm. It
appears that the LN2 model could be used to represent this
particular data set adequately for many practical applications.
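The Kolmogorov-Smirnov maximum difference (K-S max) reported in these tables is the largest absolute gap between the empirical cumulative frequencies and the fitted LN2 cumulative distribution. A minimal sketch, with made-up observations and illustrative parameter values:

```python
import math

def ln2_cdf(x, m, sg):
    """CDF of the two-parameter lognormal with median m and
    standard geometric deviation sg."""
    z = (math.log(x) - math.log(m)) / math.log(sg)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical sorted hourly CO observations (ppm), not the report's data.
obs = sorted([1.2, 1.8, 2.1, 2.6, 3.0, 3.4, 4.1, 5.0, 6.5, 9.0])
n = len(obs)
m, sg = 2.8, 1.9    # fitted LN2 parameters (illustrative)

# Largest gap between the empirical step function (checked just below
# and just above each observation) and the model CDF.
ks_max = max(max(abs(ln2_cdf(x, m, sg) - i / n),
                 abs(ln2_cdf(x, m, sg) - (i + 1) / n))
             for i, x in enumerate(obs))
print(round(ks_max, 3))
```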
1968
The Washington, D.C., data for 1968, when plotted on loga-
rithmic-probability paper (Figure 23), appear very similar to the
1966 data (Figure 22), and the similarity suggests that they may
arise from the same underlying distribution. The medians for the
FIGURE 22. LOGARITHMIC PROBABILITY PLOT OF CO HOURLY OBSERVATIONS FOR WASHINGTON, DC, 1966 (n = 7822).
Table 45
Frequency-based Measures of Goodness-of-fit for Washington, D.C. - 1968

Method        m̂      sg     d.f.   RMS    x²     K-S max   L*
Moments       2.707  1.976  17     67     541    0.023     -15499
Fractiles     2.823  1.943  17     65¹    853    0.028     -15547
MLE Approx.   2.640  2.132  19     91     309¹   0.035     -15476
MLE Opt.      2.662  2.072  18     80     327    0.023¹    -15468¹

¹ Indicates best result in each column.
two data sets are very similar (m̂ between 2.6 and 2.8 ppm), and
their standard geometric deviations are quite close (sg between
1.9 and 2.1 ppm). However, the 1968 data show some tendency
toward concave downward curvature while the 1966 data do not.
The frequency-based goodness-of-fit measures (Table 45) give much
Table 46
Variate-based (EET) Measures of Goodness-of-fit for Washington, D.C. - 1968
(ppm)

Method        n    d̄        sd      dmin     dmax
Moments       25   -0.101   0.682   -1.36    1.48
Fractiles     25   -0.101¹  0.678¹  -1.45    1.17¹
MLE Approx.   25    2.147   2.144   -0.13¹   7.69
MLE Opt.      25    1.237   1.393   -0.16    5.16

¹ Indicates best result in each column.
FIGURE 23. LOGARITHMIC PROBABILITY PLOT OF CO HOURLY OBSERVATIONS FOR WASHINGTON, DC, 1968 (n = 7266).
Table 47
Frequency-based Measures of Goodness-of-fit for Washington, D.C. - 1969

Method        m̂      sg     d.f.   RMS    x²     K-S max   L*
Moments       2.935  1.846  15     103¹   2547   0.046¹    -12969
Fractiles     2.879  1.872  15     118    1933   0.060     -12951
MLE Approx.   2.862  2.041  18     147    959¹   0.074     -12955
MLE Opt.      2.899  1.947  17     124    1308   0.061     -12931¹

¹ Indicates best result in each column.
lower values for the two MLE approaches in the 1968 data set
than they do in the 1966 data set. This occurs because the
observations in the lower and central ranges of the 1968 data
(cumulative frequencies of 2% to 90%) happen to fit the model
quite well for the MLE approaches, giving correspondingly lower
values of chi-square and the Kolmogorov-Smirnov maximum differ-
Table 48
Variate-based (EET) Measures of Goodness-of-fit for Washington, D.C. - 1969
(ppm)

Method        n    d̄        sd      dmin     dmax
Moments       22   -0.306   0.636   -1.42    0.52¹
Fractiles     22   -0.148¹  0.563¹  -1.19    1.07
MLE Approx.   22    2.316   2.118   -0.32    7.80
MLE Opt.      22    1.014   1.010   -0.25¹   4.13

¹ Indicates best result in each column.
FIGURE 24. LOGARITHMIC PROBABILITY PLOT OF CO HOURLY OBSERVATIONS FOR WASHINGTON, DC, 1969 (n = 6056).
ence. According to the variate-based EET measures (Table 46),
the method of fractiles gives the best results, but the values
are so similar to those for the method of moments that it would
be difficult to choose the method of fractiles over the method
of moments. The relatively large overpredictions for the MLE
approximation and MLE optimization approaches (7.69 ppm and 5.16
ppm, respectively) occur because of the downward curvature at
the higher concentrations in 1968. As discussed in the previous
chapter, downward curvature at the extremes has little influence
on the computed values of the MLE parameters, causing a charac-
teristic overprediction when evaluated by the EET tests.
1969
The observed CO values for Washington, D.C., CAMP in 1969,
when plotted on logarithmic-probability paper (Figure 24), appear
quite similar to the 1966 and 1968 data sets. As given in Table
47, the medians (approximately m̂ = 2.9 ppm) are slightly higher
than those for the 1966 and 1968 data, and the standard geometric
deviations (approximately sg = 1.9 ppm) are almost identical to
those for the 1966 and 1968 data. As with the 1966 data set, the
observations do not appear to have definite curvature in either
direction, although the scatter of the points about the four
straight lines is larger. As a result of the scatter, the values
of x² and K-S max for 1969 are larger than those for 1966 and
1968. According to the frequency-based measures of goodness-of-
fit (Table 47), no one approach emerges as clearly superior,
partly because the two parameters for the LN2 model are nearly
the same for every approach. Surprisingly, the method of moments
gives the best fit in two of the categories (RMS and K-S max).
According to the EET variate-based measures (Table 48), no single
approach gives the best fit, although the method of moments and
the method of fractiles give fairly good fits. For the method of
fractiles, the LN2 model predicts the observed hourly CO concen-
tration over the full range of the observations within 2.3 ppm
(dmax - dmin = 1.07 - [-1.19] = 2.26 ppm), with an average under-
prediction of less than 0.2 ppm (d~ = -0.148 ppm) and a standard
deviation of only 0.6 ppm (sd = 0.563 ppm).
1970
The 1970 data from Washington, D.C., CAMP, when plotted on
logarithmic-probability paper (Figure 25), also resemble the data
for the previous years (1966, 1968, and 1969). As reported in
Table 49, the value of the model's median (approximately m̂ =
2.8 ppm) and standard geometric deviation (approximately sg =
2.0 ppm) are nearly identical to those in the previous years,
and the plot of the four straight lines resembles the previous
plots very closely. If we consider the characteristics that all
four of these data sets (1966, 1968, 1969, and 1970) share in
common, we note that, in all cases, 98% of the data are 10 ppm
or less, and all medians lie in the narrow range of 2.64 ppm
< m̂ < 2.94 ppm. Thus, all four of these distributions at the
CAMP station appear quite similar.
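The method-of-moments parameters used throughout these tables follow directly from the arithmetic mean and standard deviation through the standard moment relations for the two-parameter lognormal. A sketch with illustrative inputs of the same order as the CAMP data, not an exact reproduction of any table entry:

```python
import math

# Method of moments for the LN2 model: recover the median m and the
# standard geometric deviation sg from the arithmetic mean and
# standard deviation of the observations. Inputs are illustrative.
mean, sd = 4.0, 3.0     # ppm

cv2 = (sd / mean) ** 2                      # squared coefficient of variation
m = mean / math.sqrt(1.0 + cv2)             # median (geometric mean)
sg = math.exp(math.sqrt(math.log(1.0 + cv2)))

print(round(m, 2), round(sg, 2))            # prints: 3.2 1.95
```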
The frequency-based measures of goodness-of-fit for the
1970 data (Table 49) do not identify any one approach as clearly
Table 49
Frequency-based Measures of Goodness-of-fit for Washington, D.C. - 1970

Method        m̂      sg     d.f.   RMS    x²     K-S max   L*
Moments       2.805  1.937  17     143¹   1897   0.050¹    -16623
Fractiles     2.668  2.164  20     216    1086¹  0.088     -16699
MLE Approx.   2.773  2.093  20     181    1120   0.063     -16638
MLE Opt.      2.811  1.996  18     155    1450   0.051     -16609¹

¹ Indicates best result in each column.
superior. The two lines for the method of moments and the method
of MLE optimization are very close together and therefore produce
similar values of K-S max and the log-likelihood function (L*).
Although the method of fractiles performs poorly with three of
the four frequency-based measures, surprisingly it gives the best
(lowest) value of chi-square (x² = 1086). Of course, this chi-
Table 50
Variate-based (EET) Measures of Goodness-of-fit for Washington, D.C. - 1970
(ppm)

Method        n    d̄        sd      dmin     dmax
Moments       34   -2.235   3.331   -8.95    1.00¹
Fractiles     34    2.038   1.808¹  -0.83¹   4.51
MLE Approx.   34    1.048   1.931   -2.77    3.54
MLE Opt.      34   -0.955¹  2.687   -6.54    2.01

¹ Indicates best result in each column.
FIGURE 25. LOGARITHMIC PROBABILITY PLOT OF CO HOURLY OBSERVATIONS FOR WASHINGTON, DC, 1970 (n = 7748).
square value, like the others in Table 49, is quite large, and,
if hypothesis tests were undertaken, would strongly lead to rejec-
tion of the lognormal hypothesis. The EET variate-based measures
(Table 50) tend to indicate that the method of fractiles has the
smallest standard deviation (sd = 1.808 ppm) and the least serious
underprediction (dmin = -0.83 ppm), but it has the most serious
overprediction (dmax = 4.51 ppm) and the second highest average
overprediction (d̄ = 2.038 ppm). Overall, the goodness-of-fit
measures suggest that the observations exhibit scatter about a
straight line that is somewhat greater than might be expected by
chance alone and that no one parameter calculation approach is
clearly superior for this data set.
1971
The data for 1971, when plotted on logarithmic-probability
paper (Figure 26), once again resemble the plots for the previous
years at this station (1966, 1968, 1969, and 1970). As seen in
Table 51, the median of the LN2 model (approximately m̂ = 2.8 ppm)
and standard deviation of the model (approximately sg = 1.9 ppm)
lie in the same range as those for the other four years. Accord-
ing to the frequency-based measures of goodness-of-fit (Table 51),
the MLE optimization approach gives the best performance in all
four categories (RMS, x², K-S max, and L*). Conversely, the
variate-based measures of goodness-of-fit (Table 52) indicate
that the method of fractiles gives the best fit. The superior
performance of the method of fractiles with this data set occurs
because the data points above 10 ppm describe an upward curve, and
the method of fractiles fits this upward curve at a cumulative
frequency of 99.9%. Because the EET measures are more strongly
influenced by the tail of the distribution than the body of the
distribution, they indicate that the method of fractiles performs
quite well. The other three approaches are influenced more by
the body of the distribution and therefore do not perform as well
when judged by the EET measures.
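The tail-tracking behavior described above follows directly from how the method of fractiles computes its parameters. A minimal sketch, assuming Larsen's 70% and 99.9% fractile points (the two points named in the conclusions of this report) and a simple nearest-rank plotting position, which is our assumption:

```python
import math
from statistics import NormalDist

def fractile_parameters(sorted_obs, f1=0.70, f2=0.999):
    """Median m and geometric standard deviation s_g of an LN2 model fitted
    through two empirical fractiles.  On log-probability paper a lognormal
    is the straight line  ln x = ln m + z_f * ln s_g,  where z_f is the
    standard normal quantile of cumulative frequency f."""
    n = len(sorted_obs)
    x1 = sorted_obs[min(n - 1, int(f1 * n))]   # nearest-rank empirical fractile
    x2 = sorted_obs[min(n - 1, int(f2 * n))]
    z1, z2 = NormalDist().inv_cdf(f1), NormalDist().inv_cdf(f2)
    ln_sg = (math.log(x2) - math.log(x1)) / (z2 - z1)
    ln_m = math.log(x1) - z1 * ln_sg
    return math.exp(ln_m), math.exp(ln_sg)
```

Because the 99.9% point lies far out in the tail, the fitted line follows any upward curvature there, which is why the tail-weighted EET measures favor this method for the 1971 data.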
1972
When the CAMP data for 1972 are plotted on logarithmic-
probability paper (Figure 27), the resulting plot stands in sharp
contrast with the plots for the period 1966-1971. The lines for
the four parameter calculation approaches have considerable
spread, and each line has a different slope. In addition, a
sharp discontinuity, or "bump", appears at the higher concentra-
tions. Finally, as indicated by Table 53, the concentrations for
this data set are higher than those for the previous five years;
the medians for the four approaches range from 3.0 ppm to 3.5 ppm,
with an average value of 3.25 ppm, which is significantly higher
than the medians for the period 1966-1971 (approximately m̂ = 2.8
ppm). Similarly, the standard deviations for the four approaches
in 1972 have an average value of 2.31 ppm, which is significantly
higher than the standard geometric deviations for the period
1966-1971 (approximately s_g = 1.9 ppm).
The anomaly, or bump, which occurs at the higher concen-
trations appears to be an abrupt deviation from the remaining
pattern of observations. Examination of the histogram (Appendix
187
-------
Table 51
Frequency-based Measures of Goodness-of-fit for Washington, D.C. - 1971

Method        m̂      s_g    d.f.   RMS    χ²     K-S_max   L*
Moments       2.809  1.932  17     87     171    0.037     -14691
Fractiles     2.728  2.074  19     144    449    0.073     -14838
MLE Approx.   2.856  1.885  16     63     126    0.022     -14664
MLE Opt.      2.883  1.850  15     46¹    123¹   0.018¹    -14658¹

¹Indicates best result in each column.
D) indicates that there are 2 observations of 24 ppm, followed by
14 observations of 25 ppm, followed by 21 observations of 26 ppm,
and then followed by only 1 observation of 27 ppm. From our
analysis of the geographical data group (Chapter VIII), it is
clear that this behavior is highly unusual, and one suspects a
possible error in the measurement or data transfer process, or
Table 52
Variate-based (EET) Measures of Goodness-of-fit for Washington, D.C. - 1971 (ppm)

Method        n    d̄       s_d     d_min   d_max
Moments       30   -1.425  1.947   -5.44   0.55
Fractiles     30   0.919¹  0.800¹  -0.43¹  1.96
MLE Approx.   30   -2.059  2.408   -6.91   0.37
MLE Opt.      30   -2.563  2.768   -8.05   0.23¹

¹Indicates best result in each column.
188
-------
FIGURE 26. LOGARITHMIC PROBABILITY PLOT OF CO HOURLY OBSERVATIONS FOR WASHINGTON, DC, 1971 (n = 7254). (Plotted: MLE optimization, MLE approximation, method of moments, and method of fractiles lines.)
189
-------
Table 53
Frequency-based Measures of Goodness-of-fit for Washington, D.C. - 1972

Method        m̂      s_g    d.f.   RMS    χ²      K-S_max   L*
Moments       3.286  2.115  21     155    5874    0.088     -12909
Fractiles     3.542  1.929  19     140¹   25122   0.084¹    -13350
MLE Approx.   3.043  2.669  28     169    2009¹   0.111     -12720
MLE Opt.      3.112  2.508  26     167    2148    0.105     -12703¹

¹Indicates best result in each column.
else a very unusual event at the air monitoring station.
Because of this anomaly in the 1972 data, the frequency-
based measures of goodness-of-fit (Table 53) all give very large
values, indicating a poor fit for all four approaches. Notice
that the chi-square values for all four approaches exceed χ² =
2000, with the method of fractiles giving the extremely large
value of χ² = 25122. According to the EET variate-based mea-
Table 54
Variate-based (EET) Measures of Goodness-of-fit for Washington, D.C. - 1972 (ppm)

Method        n    d̄       s_d      d_min   d_max
Moments       35   0.808¹  3.021    -4.58   7.35
Fractiles     35   -1.893  2.355¹   -7.29   0.94¹
MLE Approx.   35   13.848  15.253   -0.84   41.39
MLE Opt.      35   9.631   10.942   -0.75¹  29.45

¹Indicates best result in each column.
190
-------
FIGURE 27. LOGARITHMIC PROBABILITY PLOT OF CO HOURLY OBSERVATIONS FOR WASHINGTON, DC, 1972 (n = 4947).
191
-------
Table 55
Frequency-based Measures of Goodness-of-fit for Washington, D.C. - 1973

Method        m̂      s_g    d.f.   RMS   χ²     K-S_max   L*
Moments       2.802  2.093  17     55¹   373    0.030¹    -9591
Fractiles     2.728  2.074  17     63    381    0.044     -9598
MLE Approx.   2.751  2.251  19     68    265¹   0.044     -9586
MLE Opt.      2.774  2.182  18     62    274    0.038     -9580¹

¹Indicates best result in each column.
sures (Table 54), the method of fractiles gives a better fit than
the method of moments, and both methods perform much better than
the two MLE approaches, which give serious overpredictions. The
MLE approaches perform poorly because they are greatly affected
by the middle ranges of the data; with the downward curvature
at the low end and the anomaly at the high end, the MLE lines
do not fit the overall distribution very well. Of course, it
Table 56
Variate-based (EET) Measures of Goodness-of-fit for Washington, D.C. - 1973 (ppm)

Method        n    d̄        s_d     d_min   d_max
Moments       38   -0.514¹  2.398   -7.89   3.11
Fractiles     38   -1.439   2.652   -9.47   1.53¹
MLE Approx.   38   3.288    2.974   -0.78¹  10.22
MLE Opt.      38   1.580    2.344¹  -3.99   7.01

¹Indicates best result in each column.
192
-------
FIGURE 28. LOGARITHMIC PROBABILITY PLOT OF CO HOURLY OBSERVATIONS FOR WASHINGTON, DC, 1973 (n = 4269).
193
-------
should be borne in mind that the 1972 data set is incomplete,
with only 4947 observations out of a possible 8760 hourly obser-
vations for the year.
1973
The CAMP 1973 data set, like the 1972 data set, also is
incomplete, but it exhibits no serious anomalies. Like the 1972
data, its levels are significantly higher than those for the
earlier five years, but the medians of the LN2 model for the
four approaches are not quite as high as those for the 1972 data.
Unlike the 1972 data, the straight lines for the four approaches,
when plotted on logarithmic-probability paper (Figure 28), have
the same close proximity to each other that was characteristic
of the first five years of data (1966-1971). The data points
tend to scatter about a straight line, and the lines for the
method of moments and the method of fractiles tend, subjectively
at least, to give the best visual fit. The frequency-based mea-
sures (Table 55) do not indicate that any one approach is supe-
rior, although the method of moments performs surprisingly well.
One reason that no single approach emerges as superior is that
the lines are sufficiently close together that the resulting
goodness-of-fit measures give very similar values. For example,
the log-likelihood function varies from L* = -9580 to L* = -9598,
only an 18 point spread. The variate-based measures (Table 56)
also do not indicate that any one of the four approaches is
clearly superior, and each approach is identified as "best" by a
different variate-based measure of goodness-of-fit.
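The two families of measures used in all of these comparisons can be sketched as follows. The formulas are standard definitions; the 1-ppm interval grouping and the set of fractiles at which the EET deviations are evaluated are assumptions on our part, not the report's exact computer code.

```python
import math
from statistics import NormalDist

def ln2_cdf(x, m, s_g):
    """CDF of the LN2 model with median m and geometric std deviation s_g."""
    return NormalDist().cdf(math.log(x / m) / math.log(s_g))

def frequency_measures(counts, m, s_g):
    """Chi-square and K-S_max from counts of observations in 1-ppm
    intervals, where interval i covers [i, i+1) ppm."""
    n = sum(counts)
    chi2, ks_max, cum = 0.0, 0.0, 0.0
    prev_cdf = 0.0                    # lognormal CDF is 0 at x = 0
    for i, obs in enumerate(counts):
        cdf = ln2_cdf(i + 1.0, m, s_g)
        expected = n * (cdf - prev_cdf)
        if expected > 0:
            chi2 += (obs - expected) ** 2 / expected
        cum += obs
        ks_max = max(ks_max, abs(cum / n - cdf))
        prev_cdf = cdf
    return chi2, ks_max

def eet_measures(observed, predicted):
    """Variate-based (EET) statistics on the deviations d = x_p - x_o
    at each evaluation point: mean, std deviation, minimum, maximum."""
    d = [xp - xo for xp, xo in zip(predicted, observed)]
    d_bar = sum(d) / len(d)
    s_d = math.sqrt(sum((v - d_bar) ** 2 for v in d) / (len(d) - 1))
    return d_bar, s_d, min(d), max(d)
```

The chi-square sum weights each interval by its expected count, which is why frequency-based measures are dominated by the heavily populated body of the distribution, while the EET deviations weight every evaluation point equally and so respond to the sparse tail.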
194
-------
OVERALL FINDINGS
If we combine the EET differences for all data sets at the
CAMP station over the period 1966-1973, it is seen that the
resulting histograms for the longitudinal data group (Figure 29,
page 196) resemble those for the geographical data group (Fig-
ure 20, page 166), although some differences also are evident.
Unlike the findings for the geographical data group, the histo-
gram for the method of moments (top of Figure 29) does not show
any overpredictions greater than 12 ppm, and the distribution is
slightly skewed to the left. The histogram for the method of
fractiles, like the corresponding histogram for the geographical
data group, does not show any tendency for overprediction (that
is, it does not have any noticeable tail to the right), and its
average deviation (d̄ = -0.04 ppm) is the smallest of the four
methods. In the geographical data group, the method of fractiles
also produces the smallest average deviation (d̄ = -0.12 ppm). In
both data groups, the method of moments gives the next best aver-
age deviation (d̄ = -0.50 ppm for the longitudinal data group and
d̄ = 0.26 ppm for the geographical data group), with similar stan-
dard deviations (1.88 ppm and 1.68 ppm, respectively). The over-
all EET statistics for the MLE approximation approach are similar
in the two data groups (d̄ = 2.93 ppm and s_d = 3.94 ppm for the
longitudinal data group and d̄ = 2.15 ppm and s_d = 3.71 ppm for
the geographical data group), and the histograms are similar to
each other. For the MLE optimization approach, both data groups
show a similar tendency for large overpredictions above 12 ppm,
although the average overprediction is much smaller in the longi-
195
-------
Figure 29. Histograms of the deviations d = x_p - x_o (ppm) between predicted and observed CO concentrations for four different methods of calculating parameters, Washington, D.C., CAMP, 1966-1973. (Method of moments: d̄ = -0.50, s_d = 1.88, n = 209. Method of fractiles: d̄ = -0.04, s_d = 1.37. MLE approximation: d̄ = 2.93, s_d = 3.94, n = 209, 10 values greater than 12 ppm. MLE optimization: d̄ = 1.29, s_d = 3.22, n = 209, 9 values greater than 12 ppm.)
196
-------
tudinal data group (d̄ = 1.29 ppm) than in the geographical data
group (d̄ = 2.75 ppm). The standard deviation also is somewhat
less (s_d = 3.22 ppm for the longitudinal data group and s_d =
4.56 ppm for the geographical data group).
According to the EET measures of goodness-of-fit reflected
in Figure 29, the method of fractiles gives the best overall
performance. How good is this? For the longitudinal data group,
61% of the predictions from the LN2 model, with parameters calcu-
lated by the method of fractiles, are within ±1 ppm of the true
value; 81% are within ±2 ppm; and 91% are within ±4 ppm. These
findings are similar to those for the geographical data group
(see Table 41, page 169), in which 92% of the observations are
within ±3 ppm of the true value and 97% are within ±4 ppm. Using
the method of fractiles to represent a data set within either
data group, therefore, we can be 80% certain that the LN2 model
will fit the entire range of observations within ±2 ppm. We can
be at least 90% certain that the model will fit the entire
distribution within ±4 ppm, and it usually will do better than
this for the geographical data group. Whether this result is
satisfactory for representing a data set will depend on the
particular application to be made by the data analyst (see
related discussion on pages 170-172).
Discussion
An important application of probability models in air
pollution control is to estimate the value of the maximum concen-
197
-------
tration, or, sometimes, the second highest maximum concentration
for the year. In such applications, we usually are interested
not in the maximum for a particular data set, which can be viewed
as one realization from some underlying distribution, but in the
expected maximum of the underlying distribution. Assume, for
example, that the LN2 model correctly describes the underlying
distribution. Assume, also, that the air pollution problem
(the magnitude of source emissions and their spatial configura-
tion) remains unchanged from year to year. Then, meteorological
events alone (ignoring any error introduced by the measurement
system) would be responsible for changes in the observed concen-
trations from year to year. If we treat these events as random
phenomena, we can assume that each year is a realization from
the same underlying LN2 distribution. For air pollution control
purposes, we are interested in the average of the maxima that
would result from this process in a number of successive years.
That is, we are interested in the expected value of the maxima
for a number of years and not in the maximum caused by the
particular meteorological conditions occurring in any one year.
If we actually were sampling, at random, from an LN2 dis-
tribution, the method of MLE optimization would be the best
approach for estimating parameters, for the resulting model
would give estimates of the expected value of the maxima that
have the lowest possible variance. Because the CAMP data for
the period 1966-1971 seemed to arise from similar distributions,
and because emissions may not have changed significantly over
this brief period, it is of interest to examine the maxima pre-
198
-------
dicted by the model using two different approaches for calculat-
ing parameters: the method of MLE optimization and the method of
fractiles.
Using the notation for the LN2 model introduced in Chap-
ter III (see pages 29 to 44), X denotes concentration, and Y is
a normally distributed random variable that describes the loga-
rithm of X; that is, Y = (ln X - μ)/σ. Then the maximum concen-
tration predicted by the model for each data set is the value
X = x_max which corresponds to the probability associated with
the highest value in the data set. In a data set containing n
observations (assuming that missing values are randomly distri-
buted over the year), we find the inverse CDF corresponding to
the probability (or cumulative frequency) of

    f_max = (2n - 1)/2n                                      (136)

Because tables are readily available to evaluate the CDF of the
normal distribution, we first determine the inverse CDF of Y:

    y_max = F_Y^-1(f_max) = (ln x_max - μ)/σ                 (137)

Solving Equation 137 for x_max:

    x_max = e^μ e^(σ y_max)                                  (138)

Noting, from Equation 42, that m̂ = e^μ and, from Equation 71, that
σ_g = e^σ, we can write Equation 138 very simply in terms of the
model's median and standard geometric deviation:

    x_max = m̂ (σ_g)^y_max                                   (139)

To apply this relationship to the five similar CAMP data sets,
we calculate y_max from the inverse normal CDF for each data set
using Equation 136. Then we use the values of m̂ and s_g given in
Tables 43-52 of this chapter to calculate x_max using Equation
139 and setting σ_g = s_g. The results are given in Table 57.
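Equations 136 through 139 can be collected into a few lines of code. This is a sketch: the plotting position used for f_max is our assumption of a common convention, and Chapter III may define it differently.

```python
import math
from statistics import NormalDist

def predicted_maximum(m, s_g, n):
    """Maximum concentration predicted by the LN2 model for a data set of
    n values (Equations 136-139)."""
    f_max = (2 * n - 1) / (2 * n)        # Eq. 136 (assumed plotting position)
    y_max = NormalDist().inv_cdf(f_max)  # Eq. 137: inverse normal CDF
    return m * s_g ** y_max              # Eqs. 138-139: x_max = m * s_g^y_max

# For example, with m = 2.8 ppm, s_g = 1.9, and n = 7748 hourly values
# (the 1970 CAMP figures), the predicted maximum is on the order of 30 ppm.
x_max = predicted_maximum(2.8, 1.9, 7748)
```

Because y_max grows only slowly with n, the predicted maximum is far less sensitive to a single extreme hour than the observed maximum is.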
The resulting maxima calculated by the two approaches
are quite different from each other. The average of the observed
maxima for the five years of data is 36.2 ppm (last column of
Table 57). The average value of the maxima for the method of
fractiles is 35.3 ppm, which comes relatively close to the ob-
served average of 36.2 ppm. The average value of the maxima
for the MLE optimization approach is 32.2 ppm, which underesti-
mates the observed average value. Notice, however, that the stan-
dard deviation of the maxima for the MLE optimization approach
is smaller than the standard deviations of the maxima for either
the method of fractiles or the actual observations. Thus, the
actual observations and the fractiles result show considerably
more variation from year to year than does the MLE optimization
approach. The observed maximum for any one year is a relatively
poor estimate of the expected value of the maximum, since it
may be considerably higher or lower than the true value. Appli-
cation of the LN2 model in any year tends to reduce the variance,
-------
Table 57
Maximum CO Concentrations by Year Computed from the LN2 Model with Parameters Calculated by the Method of Fractiles and by MLE Optimization, Washington, D.C. CAMP

[Columns: year; n; y_max; maximum from the method of fractiles (ppm); maximum from MLE optimization (ppm); observed maximum (ppm).]
Mean of maxima: 35.3 ppm (fractiles), 32.2 ppm (MLE optimization), 36.2 ppm (observed).
Standard deviation of maxima: 6.7 ppm (fractiles), 4.3 ppm (MLE optimization), 9.8 ppm (observed).
201
-------
thereby increasing our chances of accurately estimating the true
expected value of the maximum concentration. At first glance, it
might appear from these findings that the method of fractiles is
the best approach for making this estimation.
The confidence interval about the average value of the
maxima can be calculated as x̄_max ± ts/√n, in which t is obtained
from (two-sided) tables of the Student's t-distribution and s is
the standard deviation of the maxima. For the observed maxima and
n = 5, the 95% confidence interval about the mean is calculated as
36.2 ± (2.776)(4.4) = 36.2 ± 12.2 ppm. Thus, we can say that the
true expected value of the maximum CO concentration lies, with
95% assurance, between 24.0 ppm and 48.4 ppm. Although the
average value obtained from the method of fractiles is closer to
the observed average value, this finding is not conclusive because
of the wide confidence interval that exists around the observed
average value. Notice that, in three of the five years of data
(1968, 1969, and 1970), the MLE estimate of the maximum actually
is closer to the average of the observed maxima than is the method
of fractiles estimate, and the MLE probably is the better approach.
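The confidence-interval arithmetic above can be checked directly; the values are taken from the text, with t = 2.776 the two-sided 95% Student's t value for n - 1 = 4 degrees of freedom.

```python
import math

# Mean and standard deviation of the five observed yearly maxima (ppm),
# and the two-sided 95% Student's t value for 4 degrees of freedom.
mean_max, s, n, t = 36.2, 9.8, 5, 2.776

half_width = t * s / math.sqrt(n)        # (2.776)(9.8/sqrt(5)) = 12.2 ppm
lower, upper = mean_max - half_width, mean_max + half_width
print(round(lower, 1), round(upper, 1))  # 24.0 48.4
```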
However, this five-year time series appears to be too brief
to compare the performance of the two parameter calculation pro-
cedures. Furthermore, our assumption that the underlying distri-
bution of these observations is LN2 in form may be false, and
our assumption that the air pollution problem remained constant
over this period may not be valid. Also, we have disregarded the
slight difference in the number of observations available for
202
-------
each year of data. Nevertheless, the findings do suggest that
the LN2 model, when applied in this manner, can be useful in
estimating the expected value of the maximum concentration and
therefore can be used for predictive purposes. A chief advan-
tage of using the model (and various other candidate probability
models as well) is that it reduces the variance of the estimate.
In this example, the LN2 model, with parameters calculated by
the method of fractiles, reduces the variance, or square of the
standard deviation, from (9.8)² = 96.0 to (6.7)² = 44.9, a reduc-
tion of 53%. The MLE optimization approach for calculating
parameters performs even better, reducing the variance to (4.3)²
= 18.5, a reduction of 81%.
Thus, if only one year of observations is available, the
predicted maximum concentration should lie closer to the average
value of the maxima if we use the model than if we use the actual
maximum for that year. The observed maximum, because it is sub-
ject to considerable variability, is not as good an estimate of
the average of the maxima as is the value predicted by the model.
Because of the importance attached to maximum concentrations by
regulatory officials, these results deserve greater attention.
The advantages of the MLE optimization approach, and the need to
find other models that are superior to the LN2 probability model,
also should receive greater attention. It would be useful, in
future research, to examine a time series covering a larger
span than five years and to investigate, in greater detail, the
significance of the assumptions used in this example.
203
-------
X. CONCLUSIONS
The authors' conclusions about the results of this investi-
gation are summarized briefly as follows:
* * *
When the lognormal (LN2) probability model is applied to
real air quality data, the method chosen for calculating its
parameters exerts considerable influence on the predictions and
hence on the goodness-of-fit of the model to the data.
* * *
Air pollution concentration data for CO (hourly observa-
tions), when plotted on logarithmic-probability paper, show
either (1) a zig-zag pattern about a straight line, or (2) cur-
vature that is concave downward. Concave upward curvature is
very uncommon.
* * *
For data sets that show concave downward curvature, the
method of fractiles approach for calculating parameters gives
overpredictions at the lower concentrations, and the MLE optimi-
zation approach gives overpredictions at the higher concentra-
tions.
* * *
According to frequency-based measures of goodness-of-fit
(for example, tests based on the difference between the number
of observations predicted and the number observed in each inter-
val), the MLE approach gives the best fit.
* * *
204
-------
According to variate-based measures of goodness-of-fit
(that is, statistics based on the difference between the pre-
dicted and observed concentrations at each interval), the method
of fractiles — with parameters calculated at the 70% and 99.9%
cumulative frequencies in the manner suggested by Larsen — gives
the best fit.
* * *
According to all goodness-of-fit measures, the method of
moments and the method of MLE approximation seldom give the best
fit.
* * *
Graphical plots on logarithmic-probability paper, when
examined subjectively by an observer, tend to emphasize variate-
based goodness-of-fit measures and underemphasize frequency-
based goodness-of-fit measures.
* * *
Variate-based measures of goodness-of-fit, because they
weight each interval equally regardless of the number of obser-
vations it contains, are unduly influenced by the randomness,
errors, and fluctuation which occur at the tails of the distri-
bution.
* * *
Frequency-based measures of goodness-of-fit, because they
weight the number of observations in each interval, are less
sensitive to randomness at the tails of the distribution.
* * *
If hypothesis tests are undertaken, the MLE optimization
205
-------
approach gives the best values for chi-square and other fre-
quency-based measures of goodness-of-fit.
* * *
Using the LN2 model with the method of fractiles to repre-
sent an hourly CO air quality data set, one can be 80% certain
that the model will fit the entire range of observations within
±2 ppm. One can be at least 90% certain that the model will
fit the entire distribution within ±4 ppm.
* * *
If hypothesis testing is undertaken for urban air quality
data, and if problems of statistical independence are ignored,
then the LN2 model would be rejected with a high degree of con-
fidence (P > 0.995) for all data sets and all cities considered.
* * *
Although a particular data set may not be "lognormal" in
the statistical sense (that is, it does not satisfy frequency-
based measures of goodness-of-fit), it may be lognormal in the
engineering sense (that is, it satisfies variate-based EET mea-
sures sufficiently for its intended application); we shall call
such data sets "pseudo-lognormal."
* * *
Few U.S. air quality data sets appear to be lognormal in
the conventional statistical sense; whether they are considered
pseudo-lognormal will depend on the particular application facing
the data analyst and the criteria he establishes for that appli-
cation.
* * *
206
-------
The analyst should not automatically choose the LN2 model
for use with air quality data. Rather, he should evaluate can-
didate models and consider each data set on an individual basis.
Other probability models for possible consideration include the
gamma, beta, Weibull, LN3C, and Johnson distributions.
* * *
Many of the data sets in this investigation exhibited their
largest chi-square values for intervals that were close to the
origin, suggesting that probability models which fit well at the
lower concentrations (such as the LN3 and LN3C models) would sig-
nificantly improve the frequency-based measures of goodness-of-
fit.
* * *
Application of a probability model (such as the LN2 model)
to air quality data appears to be helpful in predicting the
expected value of the maximum concentration. If only one year
of data is available, for example, application of the LN2 model
can reduce the variance of the prediction, thereby reducing the
chance of error in predicting the (long-term) expected value of
the maximum hourly CO concentration.
* * *
A quality assurance data checking program should be imple-
mented by EPA to compare data contained in the SAROAD data bank
with the data originally submitted by state and local air pollu-
tion control agencies (see pages 107-109 and 138-140). All dis-
crepancies should be investigated and corrected.
207
-------
REFERENCES
1. Mage, D.T., and W.R. Ott, "Refinements of the Lognormal Probability Model for Analysis of Aerometric Data," J. Air Poll. Control Assoc., 28, p. 796 (1978).
2. Mage, D.T., and W.R. Ott, "Testing the Validity of Selected Probability Models: Graphical Analysis of Carbon Monoxide Data from U.S. Cities," Environmental Protection Agency, Research Triangle Park, N.C., to be published in 1980.
3. Aitchison, J., and J.A.C. Brown, The Lognormal Distribution, Cambridge University Press, New York (1955).
4. Hatch, T., and S.P. Choate, "Statistical Description of the Size Properties of Non-uniform Particulate Substances," J. Franklin Inst., 207, p. 369 (1929).
5. Krumbein, W.C., "Application of Logarithmic Moments to Size Frequency Distributions of Sediments," J. Sediment. Petrol., 6, p. 35 (1936).
6. Kolmogorov, A.N., "Über das logarithmisch normale Verteilungsgesetz der Dimensionen der Teilchen bei Zerstückelung," C.R. Acad. Sci. U.R.S.S., 31, p. 99 (1941).
7. Drinker, P., and T. Hatch, Industrial Dust, 2nd Ed., McGraw-Hill, New York, p. 149 (1954).
8. Harris, E.D., and E.C. Tabor, "Statistical Considerations Related to the Planning and Operation of a National Air Sampling Network," Proceedings of the 49th Annual Meeting of the Air Pollution Control Association, Buffalo, N.Y. (1956).
9. Zimmer, C.E., E.C. Tabor, and A.C. Stern, "Particulate Pollutants in the Air of the United States," J. Air Poll. Control Assoc., 9, p. 136 (1959).
10. deNevers, N., K.W. Lee, and N.H. Frank, "Patterns in TSP Distribution Functions," J. Air Poll. Control Assoc., 29, p. 32 (1979).
11. Larsen, R.I., "A Method for Determining Source Reduction Required to Meet Air Quality Standards," J. Air Poll. Control Assoc., 11, p. 71 (1961).
12. Larsen, R.I., "Parameters of Aerometric Measurements for Air Pollution Research," Am. Ind. Hyg. Assoc. J., 22, p. 97 (1961).
208
-------
13. Larsen, R.I., "United States Air Quality," Arch. Environ. Health, 8, p. 325 (1964).
14. Larsen, R.I., "Determining Basic Relationships Between Variables," Symposium on Environmental Measurements, PHS Publication No. 999-AP-15, Sanitary Engineering Center, Cincinnati, Ohio, pp. 251-263 (1964).
15. Zimmer, C.E., and R.I. Larsen, "Calculating Air Quality and Its Control," J. Air Poll. Control Assoc., 15, p. 565 (1965).
16. Larsen, R.I., "Air Pollution from Motor Vehicles," Annals of the New York Academy of Sciences, 136(12) (1966).
17. Larsen, R.I., "Determining Source Reduction Needed to Meet Air Quality Standards," Proceedings, Part I, of the International Clean Air Congress, pp. 56-74, London (1966).
18. Larsen, R.I., C.E. Zimmer, D.A. Lynn, and K.G. Blemel, "Analyzing Air Pollutant Concentration and Dosage Data," J. Air Poll. Control Assoc., 17, p. 85 (1967).
19. Larsen, R.I., "Determining Reduced-Emission Goals Needed to Achieve Air Quality Goals—A Hypothetical Case," J. Air Poll. Control Assoc., 17, p. 823 (1967).
20. Larsen, R.I., "Future Air Quality Standards and Industrial Control Requirements," Proceedings: The Third National Conference on Air Pollution, PHS Publication No. 1649, U.S. Government Printing Office, Washington, D.C. (1967).
21. Larsen, R.I., "A New Mathematical Model of Air Pollutant Concentration, Averaging Time, and Frequency," J. Air Poll. Control Assoc., 19, p. 24 (1969).
22. Larsen, R.I., and H.W. Burke, "Ambient Carbon Monoxide Exposures," APCA Paper No. 69-167, presented at the 62nd Annual Meeting of the Air Pollution Control Association (1969).
23. Larsen, R.I., "Relating Air Pollutant Effects to Concentration and Control," J. Air Poll. Control Assoc., 20, p. 214 (1970).
24. Larsen, R.I., "A Mathematical Model for Relating Air Quality Measurements to Air Quality Standards," Publication No. AP-89, U.S. Environmental Protection Agency, Research Triangle Park, N.C. (1971).
25. Lowrimore, G.R., "The Probability Distribution of the Logarithm of the Sum of Two or More Lognormal Random Variables,
209
-------
and a Compound Poisson-lognormal Probability Distribution with Applications to the Environmental Sciences," Ph.D. Thesis, Virginia Polytechnic University, University Microfilms, Ann Arbor, Michigan, No. 74-29,135 (1972).
26. Larsen, R.I., "Discussion of Important Factors for the Sulfur Dioxide Concentration in Central Stockholm," Atmospheric Environment, 6, p. 423 (1972).
27. Larsen, R.I., "Air Quality Frequency Distributions and Meteorology," Proceedings of the Fourth Meeting of the Expert Panel on Air Pollution Modeling, NATO Committee on the Challenges of Modern Society, May 28-30, 1973.
28. Larsen, R.I., "An Air Quality Data Analysis System for Interrelating Effects, Standards, and Needed Source Reductions," J. Air Poll. Control Assoc., 23, p. 933 (1973).
29. Larsen, R.I., "An Air Quality Data Analysis System for Interrelating Effects, Standards, and Needed Source Reductions — Part 2," J. Air Poll. Control Assoc., 24, p. 551 (1974).
30. Larsen, R.I., "Relating Air Pollutant Effects to Concentration and Control," in Detection and Control of Air Pollution, pp. 94-122, edited by J.C. Webb, J. Leroux, et al., MSS Information Corp., 655 Madison Avenue, New York, N.Y. 10021 (1974).
31. Larsen, R.I., "Relating Air Quality Data to Effects, Standards, and Needed Source Reductions," Proceedings of the Conference on Ambient Air Quality Measurements, pp. IV:40-55, edited by W.B. Chadick and G.F. Hoffnagle, Southwest Section of the Air Pollution Control Association, March 10-11 (1975).
32. Chapman, L.D., G.G. Akland, J.F. Finklea, R.I. Larsen, T.D. Mount, W.C. Nelson, D.C. Quigley, and W.C. Wilson, "Electricity Demand: Project Independence and the Clean Air Act," Publication No. ORNL-NSF-EP-89, Oak Ridge National Laboratory, Oak Ridge, Tennessee (1975).
33. Phinney, D.E., and J.E. Newman, "The Precision Associated with the Sampling Frequencies of Total Particulate at Indianapolis, Indiana," J. Air Poll. Control Assoc., 22, p. 692 (1972).
34. Gifford, F.A., Jr., "The Lognormal Distribution of Air Pollution Concentrations," Air Resources Atmospheric Turbulence and Diffusion Laboratory, ESSA, Oak Ridge, Tennessee (preprint, 3 p.) (1969).
210
-------
35. Proceedings of the Symposium on Statistical Aspects of Air Quality Data, edited by L.D. Kornreich, U.S. Environmental Protection Agency, Research Triangle Park, N.C., No. EPA-650/4-74-038 (1974).
36. Gifford, F.A., Jr., "The Form of the Frequency Distribution of Air Pollution Concentrations," Proceedings of the Symposium on Statistical Aspects of Air Quality Data, U.S. Environmental Protection Agency, Research Triangle Park, N.C., No. EPA-650/4-74-038, pp. 3-1 to 3-6 (1974).
37. Knox, J.B., and R.I. Pollack, "An Investigation of the Frequency Distribution of Surface Air Pollutant Concentrations," Proceedings of the Symposium on Statistical Aspects of Air Quality Data, U.S. Environmental Protection Agency, Research Triangle Park, N.C., No. EPA-650/4-74-038, pp. 9-1 to 9-17 (1974).
38. Kahn, H.D., "Note on the Distribution of Air Pollutants," J. Air Poll. Control Assoc., 23, p. 973 (1973).
39. Pollack, R.I., "Studies of Pollutant Concentration Frequency Distributions," U.S. Environmental Protection Agency, Research Triangle Park, N.C., No. EPA-650/4-74-004 (1975).
40. Benarie, M., "Sur la Validité de la Distribution Logarithmico-normale des Concentrations de Polluant," Proceedings of the 2nd International Clean Air Congress, Washington, D.C., 1970, Academic Press, New York, N.Y., pp. 68-70 (1971).
41. Benarie, M., "The Use of the Relationship Between Wind Velocity and Ambient Pollutant Concentration Distributions for the Estimation of Average Concentrations from Gross Meteorological Data," Proceedings of the Symposium on Statistical Aspects of Air Quality Data, U.S. Environmental Protection Agency, Research Triangle Park, N.C., No. EPA-650/4-74-038, pp. 5-1 to 5-17 (1974).
42. Lynn, D.A., "Fitting Curves to Urban Suspended Particulate Data," Proceedings of the Symposium on Statistical Aspects of Air Quality Data, U.S. Environmental Protection Agency, Research Triangle Park, N.C., No. EPA-650/4-74-038, pp. 13-1 to 13-27 (1974).
43. Mage, D.T., "On the Lognormal Distribution of Air Pollutants," Proceedings of the Fifth Meeting of the Expert Panel on Air Pollution Modeling, NATO/CCMS N.35, Roskilde, Denmark, June 4-6, 1974.
44. Mage, D.T., and W.R. Ott, "An Improved Statistical Model for Analyzing Air Pollution Concentration Data," Paper No.
211
-------
75-51-^ presented at the 68th Annual Meeting of the Air
Pollution Control Association, Boston, MA (1975).
45. Mage, D.T., and W.R. Ott, "An Improved Model for Analysis of
    Air and Water Pollution Data," International Conference on
    Environmental Sensing and Assessment, Volume 2, IEEE No.
    75-CH 1004-1 ICESA, p. 20-5, September (1975).
46. Ott, W.R., and D.T. Mage, "A General Purpose Univariate
    Probability Model for Environmental Data Analysis," Comput.
    and Ops. Res., 3, p. 209 (1976).
47. Kalpasanov, Y., and G. Kurchatova, "A Study of the Statisti-
    cal Distributions of Chemical Pollutants in Air," J. Air
    Poll. Control Assoc., 26, p. 981 (1976).
48. Curran, T.C., and N.H. Frank, "Assessing the Validity of the
    Lognormal Model When Predicting Maximum Air Pollutant Con-
    centrations," Paper No. 75-51.3, presented at the 68th
    Annual Meeting of the Air Pollution Control Association,
    Boston, MA (1975).
49. Mikolaj, P.G., "Environmental Applications of the Weibull
    Distribution Function: Oil Pollution," Science, 176, p.
    1019 (1972).
50. Johnson, T., "A Comparison of the Two-parameter Weibull and
    Lognormal Distributions Fitted to Ambient Ozone Data,"
    Proceedings, Quality Assurance in Air Pollution Measure-
    ments, New Orleans, LA, edited by the Air Pollution Control
    Association, March 11-14, pp. 312-321 (1979).
51. Bencala, K.E., and J.H. Seinfeld, "On Frequency Distributions
    of Air Pollutant Concentrations," Atmos. Environ., 10, p.
    941 (1976).
52. Larsen, R.I., "An Air Quality Data Analysis System for Inter-
    relating Effects, Standards, and Needed Source Reductions:
    Part 4, A Three-parameter Averaging-time Model," J. Air
    Poll. Control Assoc., 27, p. 454 (1977).
53. Larsen, R.I., "An Air Quality Data Analysis System for Inter-
    relating Effects, Standards, and Needed Source Reductions
    — A Summary," Proceedings of the Fourth International
    Clean Air Congress, edited by S. Kasuga et al., The
    Japanese Union of Air Pollution Prevention Associations,
    Tokyo, Japan, pp. 323-325 (1977).
54. Finklea, J.F., C.M. Shy, J.B. Moran, W.C. Nelson, R.I.
    Larsen, and G.G. Akland, "The Role of Environmental Health
    Assessment in the Control of Pollution," in Advances in
    Environmental Science and Technology, Vol. 7, edited by
    J.N. Pitts, Jr., and R.L. Metcalf, John Wiley and Sons,
    New York, N.Y., pp. 315-389 (1977).
55. Larsen, R.I., "Relating Data to Effects, Standards, and
    Source Reductions," in Quality Assurance Practices for
    Health Laboratories, edited by S.L. Inhorn, American
    Public Health Association, Washington, D.C., pp. 329-
    338 (1978).
56. Larsen, R.I., "An Air Quality Data Analysis System for Inter-
    relating Effects, Standards, and Needed Source Reductions
    — Part 6, Calculating Needed Source Reductions from an
    Analysis of Several Years of Air Quality Data," APCA
    Paper No. 78-63.6, presented at the 71st Annual Meeting
    of the Air Pollution Control Association, June (1978).
57. Mage, D.T., "Data Analysis by Use of Univariate Probability
    Models," Paper No. 1-G/75, presented at the Seminar on Air
    Pollution Control, Ministry of Health and Social Affairs,
    Seoul, Korea (1975).
58. Johnson, N.L., "Systems of Frequency Curves Generated by
    Methods of Translation," Biometrika, 36, p. 149 (1949).
59. Ledolter, J., G.C. Tiao, G.B. Hudak, J.T. Hsieh, and S.B.
    Graves, "Statistical Analysis of Multiple Time Series
    Associated with Air Quality Data: New Jersey CO Data,"
    Technical Report No. 529, Department of Statistics,
    University of Wisconsin, Madison (1978).
60. Singpurwalla, N.D., "Extreme Values from a Lognormal Law
    with Applications to Air Pollution Problems," Techno-
    metrics, 14, p. 703 (1972).
61. Roberts, E.M., "Review of Statistics of Extreme Values with
    Applications to Air Quality Data, Part I, Review," J. Air
    Poll. Control Assoc., 29, p. 632 (1979).
62. Roberts, E.M., "Review of Statistics of Extreme Values with
    Applications to Air Quality Data, Part II, Applications,"
    J. Air Poll. Control Assoc., 29, p. 733 (1979).
63. Trijonis, John, "Empirical Relationships between Atmospheric
    Nitrogen Dioxide and its Precursors," U.S. Environmental
    Protection Agency, Research Triangle Park, N.C., No. EPA-
    600/3-78-018 (1978).
64. "Guidelines for the Interpretation of Air Quality Standards,"
    U.S. Environmental Protection Agency, Research Triangle
    Park, N.C., OAQPS No. 1.2-008, revised February 1977
    (1977).
65. Mage, D.T., T.R. Fitz-Simons, D.M. Holland, and W.R. Ott,
    "Techniques for Fitting Probability Models to Experimental
    Data," Proceedings, Quality Assurance in Air Pollution
    Measurement, New Orleans, Louisiana, edited by the Air
    Pollution Control Association, March 11-14, pp. 304-311
    (1979).
66. Hunt, W.F., Jr., "The Precision Associated with the Sampling
    Frequency of Log-Normally Distributed Air Pollutant Mea-
    surement," J. Air Poll. Control Assoc., 22, p. 687 (1972).
67. Akland, Gerald, "Applications of Probabilistic Models to Air
    Quality Data," internal memorandum, U.S. Environmental
    Protection Agency, Environmental Monitoring and Systems
    Laboratory, Research Triangle Park, N.C., September 15,
    1978.
68. Larsen, R.I., U.S. Environmental Protection Agency, Research
    Triangle Park, N.C., personal communication (1978).
69. Nehls, Gerald J., and Gerald G. Akland, "Procedures for
    Handling Aerometric Data," J. Air Poll. Control Assoc.,
    23, p. 180 (1973).
70. Kushner, E.J., "On Determining the Statistical Parameters
    for Pollution Concentration from a Truncated Data Set,"
    Atmos. Environ., 10, p. 975 (1976).
71. Hald, A., "Maximum Likelihood Estimation of the Parameters
    of a Normal Distribution which is Truncated at a Known
    Point," Skandinavisk Aktuarietidskrift, 32, pp. 119-134
    (1949).
72. Hahn, Gerald J., and Samuel S. Shapiro, Statistical Models
    in Engineering, John Wiley & Sons, Inc., New York, N.Y.
    (1967).
73. Green, J.R., and U.A.S. Hegazy, "Powerful Modified-EDF
    Goodness-of-Fit Tests," J. American Statistical Assoc.,
    71, pp. 204-209 (1976).
74. Ott, Wayne R., "A Fortran Program for Computing the Pollu-
    tant Standards Index (PSI)," U.S. Environmental Protection
    Agency, Washington, D.C., No. EPA-600/4-78-001, May (1978).
75. Hastings, C., Jr., assisted by J.T. Hayward and J.P. Wong,
    Jr., Approximations for Digital Computers, Princeton
    University Press, Princeton, N.J. (1955).
APPENDIX A
LISTING OF THE COMPUTER PROGRAM
C PROGRAM NAME = HISTOGRAM.11  VERSION B  MAY 20, 1979
C (VERSION B READS DATA IN INDEX.PLOT FORMAT)
C PROGRAMMER IS WAYNE R. OTT, U. S. ENVIRONMENTAL PROTECTION AGENCY
C
C PROGRAM ANALYZES ENVIRONMENTAL DATA SETS:
C 1. COMPUTES BASIC STATISTICS
C 2. GENERATES AND PLOTS A HISTOGRAM
C 3. EXAMINES GOODNESS OF FIT FOR LOGNORMAL MODEL
C 4. CALCULATES AND PLOTS THE AUTOCORRELATION FUNCTION
C
C
C * * * USER INSTRUCTIONS * * *
C
C MODIFY INPUT FORMAT 110 AS APPROPRIATE FOR ENTERING DATA
C IF LABEL IS DESIRED, INSERT 2 EXTRA LINES AT BEGINNING OF DATA SET:
C FIRST LINE MUST HAVE '11' IN COL. 1-2
C SECOND LINE MUST HAVE LABEL IN COL. 1-80
C SET 'DUMMY' TO THE CODE FOR MISSING VALUES
      DUMMY=9999.99
C SET 'BASE' AS LOWEST VALUE FOR CALCULATION
C SET 'DELTA' AS INTERVAL SIZE AND LIMIT AS NUMBER OF INTERVALS
C (NOT MORE THAN 100 INTERVALS ARE POSSIBLE)
      BASE=-0.50
      DELTA=1.0
      LIMIT=60
C IF VALUES ABOVE RANGE ARE TO BE PRINTED, SET INSTR1=1
      INSTR1=1
C SET DIMENSION VALUE(...) LARGE ENOUGH TO HOLD ALL DATA
C SET DIMENSION STORE(...) LARGE ENOUGH TO HOLD ALL DATA
C
C * * * PROGRAM HOUSEKEEPING * * *
C
      DIMENSION STORE(9000)
      DIMENSION VALUE(9000)
      DIMENSION LABEL(80)
      DIMENSION TEMP(12)
      DIMENSION XI(100),JH(100),PV(100),CUM(100),HI(1000)
      DIMENSION XB(100),JB(100),PB(100),CUB(100),HB(50)
      DIMENSION X(100),DIFF(100)
      COMMON /B/MARK1,MARK2,IBLANK,LINE(50)
      COMMON /F/AVE,STD
      COMMON /G/DUMMY
C INITIALIZE VALUES (L IS NO. OF VALUES READ; N IS NO. OF VALID VALUES)
    1 L=0
      N=0
C CLEAR ARRAYS
      DO 10 I=1,80
      LABEL(I)=IBLANK
   10 CONTINUE
C
C
C * * * DATA INPUT SECTION * * *
C
C READ INPUT VALUE ACCORDING TO SAROAD CARBON MONOXIDE FORMAT
    5 READ(5,110) ITEST,TEMP
  110 FORMAT(I2,6X,12F6.2)
C FINISH READING DATA IF ITEST=88 OR ITEST=99
      IF(ITEST.EQ.88) GO TO 30
      IF(ITEST.EQ.99) GO TO 30
C READ AND WRITE LABEL IF ITEST=11
      IF(ITEST.NE.11) GO TO 20
      READ(5,120) LABEL
  120 FORMAT(80A1)
      WRITE(6,220) LABEL
  220 FORMAT(1H0,80A1)
C LABEL HAS BEEN READ; RETURN FOR ANOTHER RECORD
      GO TO 5
   20 CONTINUE
C ALL 12 VALUES READ ARE STORED IN 'STORE'
      DO 24 I=1,12
      L=L+1
      STORE(L)=TEMP(I)
   24 CONTINUE
C ONLY THE VALID VALUES READ ARE STORED IN 'VALUE'
C DELETE MISSING VALUES CODED AS DUMMY
      DO 25 I=1,12
      IF(TEMP(I).EQ.DUMMY) GO TO 25
      N=N+1
      VALUE(N)=TEMP(I)
   25 CONTINUE
C ONE RECORD HAS BEEN STORED; RETURN FOR ANOTHER RECORD
      GO TO 5
C LAST VALUE HAS BEEN DETECTED
   30 CONTINUE
C LIST SUMMARY INFORMATION
      WRITE(6,230) L,N
  230 FORMAT(1H0,'NO. OF VALUES READ =',I6,7X,
     1'NO. OF VALID VALUES STORED =',I6)
C
C
C * * * MAIN COMPUTATION * * *
C
      WRITE(6,301)
  301 FORMAT(///,1X,'-----BASIC STATISTICS-----',///)
      CALL STAT(N,VALUE)
      WRITE(6,302)
  302 FORMAT(1H1,1X,'HISTOGRAM OF RAW DATA-----')
      CALL HISTOG(INSTR1,N,VALUE,BASE,DELTA,LIMIT,XI,JH,PV,CUM,HI,IH)
      CALL STAT(K,DIFF)
      WRITE(6,306)
  306 FORMAT(1H1,1X,'PARAMETERS EST. BY OPTIMIZATION OF M.L.E.-----')
      CALL BMLE(U3,SIG3,LIMIT,XI,JH,PV,CUM)
      CALL LN(U3,SIG3,N,LIMIT,K,X,DIFF,XI,JH,PV,CUM,IH)
      CALL HISTOG(0,K,DIFF,-5.0,0.5,20,XB,JB,PB,CUB,HB,IHD)
      CALL STAT(K,DIFF)
C COMPUTE AUTOCORRELATION FUNCTION
      CALL AUTCOR(L,N,STORE,1,1)
      CALL AUTCOR(L,N,STORE,6,6)
C
C IF ITEST=88, BEGIN COMPUTATION OVER AGAIN
      IF(ITEST.EQ.88) GO TO 1
C
      END
      BLOCK DATA
      COMMON /B/MARK1,MARK2,IBLANK,LINE(50)
      DATA MARK1/'* '/,MARK2/'I '/,IBLANK/'  '/
      END
      SUBROUTINE STAT(N,VALUE)
C SUBROUTINE TO COMPUTE STATISTICAL MOMENTS
      COMMON /F/AVE,STD
      DIMENSION VALUE(9000)
      IF(N.EQ.1) GO TO 50
C...COMPUTE MEAN
      T=FLOAT(N)
      SUM=0.0
      DO 10 I=1,N
      SUM=SUM+VALUE(I)
   10 CONTINUE
      AVE=SUM/T
      WRITE(6,200) N
      WRITE(6,210) AVE
C...COMPUTE OTHER MOMENTS
      SUM2=0.0
      SUM3=0.0
      SUM4=0.0
      DO 20 I=1,N
      D=VALUE(I)-AVE
      Z2=D*D
      Z3=D*Z2
      Z4=D*Z3
      SUM2=SUM2+Z2
      SUM3=SUM3+Z3
      SUM4=SUM4+Z4
   20 CONTINUE
C...COMPUTE VARIANCE, STANDARD DEVIATION, AND COEFFICIENT OF VARIATION
      S2=SUM2/(T-1.0)
      STD=SQRT(S2)
      COF=STD/AVE
      WRITE(6,220) S2,STD
      WRITE(6,225) COF
C...COMPUTE COEFFICIENT OF SKEWNESS
      IF(S2.EQ.0.0) GO TO 50
      S3=SUM3/T
      SKEW=S3/(S2**1.5)
      WRITE(6,230) S3,SKEW
C...COMPUTE COEFFICIENT OF KURTOSIS
      S4=SUM4/T
      BETA2=S4/(S2*S2)
      WRITE(6,240) S4,BETA2
C...FIND LOWEST AND HIGHEST VALUES
      VMAX=-99999999999.0
      VMAX2=-99999999999.0
      VMIN=99999999999.0
      DO 30 I=1,N
      IF(VALUE(I).LT.VMIN) VMIN=VALUE(I)
      IF(VALUE(I).GT.VMAX) IMAX=I
      IF(VALUE(I).GT.VMAX) VMAX=VALUE(I)
   30 CONTINUE
C...FIND SECOND HIGHEST MAXIMUM VALUE
      DO 40 I=1,N
      IF(I.EQ.IMAX) GO TO 40
      IF(VALUE(I).GT.VMAX2) VMAX2=VALUE(I)
   40 CONTINUE
      WRITE(6,245) VMIN
      WRITE(6,246) VMAX
      WRITE(6,247) VMAX2
      GO TO 60
   50 CONTINUE
      WRITE(6,250)
  250 FORMAT(1H ,5X,'NO ADDITIONAL STATISTICS POSSIBLE')
   60 CONTINUE
  200 FORMAT(1H0,5X,'N =',I6)
  210 FORMAT(1H0,5X,'MEAN =',F15.8)
  220 FORMAT(1H0,5X,'2ND MOMENT (VAR.) =',F22.10,12X,'STANDARD DEV. =',
     1F20.9)
  225 FORMAT(1H0,10X,'COEFFICIENT OF VARIATION =',F15.8)
  230 FORMAT(1H0,5X,'3RD MOMENT =',F20.9,9X,'COEF. OF SKEWNESS =',F15.8)
  240 FORMAT(1H0,5X,'4TH MOMENT =',F20.9,9X,'COEF. OF KURTOSIS =',F15.8)
  245 FORMAT(1H0,5X,'LOWEST VALUE =',F18.8)
  246 FORMAT(1H0,5X,'HIGHEST VALUE =',F18.9)
  247 FORMAT(1H0,10X,'SECOND HIGHEST VALUE =',F18.9)
      RETURN
      END
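As a reading aid (a Python re-expression, not part of the original program), the moment computations in STAT translate as follows. Note the program's mixed convention: an N-1 divisor for the variance, but N divisors for the third and fourth moments used in the skewness and kurtosis coefficients.

```python
import math

def stat(values):
    """Moment statistics as subroutine STAT computes them:
    mean, variance (n-1 divisor), standard deviation,
    coefficient of variation, and the coefficients of
    skewness and kurtosis (moments about the mean over n)."""
    n = len(values)
    mean = sum(values) / n
    d2 = sum((v - mean) ** 2 for v in values)
    d3 = sum((v - mean) ** 3 for v in values)
    d4 = sum((v - mean) ** 4 for v in values)
    s2 = d2 / (n - 1)             # variance
    std = math.sqrt(s2)
    cv = std / mean               # coefficient of variation
    skew = (d3 / n) / s2 ** 1.5   # coefficient of skewness
    kurt = (d4 / n) / s2 ** 2     # coefficient of kurtosis
    return mean, s2, std, cv, skew, kurt
```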
      SUBROUTINE HISTOG(IN,N,VALUE,BASE,DELTA,LIMIT,XI,JH,PV,CUM,HI,IH)
C THIS SUBROUTINE SORTS EACH OF N VALUES INTO INTERVALS OF SIZE DELTA
C IF IN=1, THE VALUES ABOVE THE RANGE ARE LISTED
C...XI CONTAINS THE HISTOGRAM ABSCISSA
C...PV CONTAINS THE INDIVIDUAL FREQUENCIES
C...CUM CONTAINS THE CUMULATIVE FREQUENCIES
C...ILOW IS THE NUMBER OF VALUES BELOW THE RANGE
C...IZERO IS THE NUMBER OF ZERO VALUES
C...IEXCES IS THE NUMBER OF VALUES ABOVE THE RANGE
C...HI CONTAINS THE INDIVIDUAL VALUES ABOVE THE RANGE
      DIMENSION VALUE(9000)
      COMMON /B/MARK1,MARK2,IBLANK,LINE(50)
      DIMENSION XI(100),JH(100),PV(100),CUM(100)
      DIMENSION HI(1000)
C CLEAR ARRAYS
      DO 1 I=1,100
      JH(I)=0
      PV(I)=0.0
      XI(I)=0.0
      CUM(I)=0.0
    1 CONTINUE
C...CLEAR VALUES
      ILOW=0
      IZERO=0
      IEXCES=0
C...SET TOP OF RANGE
      TOP=FLOAT(LIMIT)*DELTA+BASE
C BEGIN SORTING ROUTINE
      DO 10 I=1,N
C...DETERMINE IF POINT IS A ZERO VALUE
      IF(VALUE(I).EQ.0.0) IZERO=IZERO+1
C...MODIFY VALUE TO ADJUST FOR MACHINE DECIMAL CONVERSION ERROR
      PVALUE=VALUE(I)+0.000001
C...DETERMINE IF POINT IS BELOW RANGE
      IF(PVALUE.GE.BASE) GO TO 15
      ILOW=ILOW+1
      GO TO 10
   15 CONTINUE
C...DETERMINE IF POINT IS ABOVE RANGE
      IF(PVALUE.LT.TOP) GO TO 20
      IEXCES=IEXCES+1
      IF(IEXCES.GT.1000) GO TO 28
      IF(IN.EQ.1) HI(IEXCES)=VALUE(I)
      GO TO 10
   20 CONTINUE
C...POINT IS WITHIN RANGE; CARRY OUT SORTING
      SDELTA=0.0
      DO 25 J=1,100
      SDELTA=DELTA+SDELTA
      TDELTA=SDELTA+BASE
      IF(PVALUE.LT.TDELTA) GO TO 27
   25 CONTINUE
   27 CONTINUE
C TDELTA IS GT THAN PVALUE; THEREFORE INTERVAL HAS BEEN FOUND
      JH(J)=JH(J)+1
      GO TO 10
   28 CONTINUE
C ERROR; NUMBER OF POINTS ABOVE RANGE EXCEEDS ARRAY SIZE
      IF(IEXCES.EQ.1001) WRITE(6,2099)
 2099 FORMAT(1H1,2X,'TOO MANY POINTS ARE OUTSIDE RANGE; INCREASE DELTA')
   10 CONTINUE
C TOTAL HISTOGRAM POINTS (THOSE ABOVE BASE)
      NUM=0
      DO 30 I=1,LIMIT
      NUM=NUM+JH(I)
   30 CONTINUE
      NUM=NUM+IEXCES
      S=FLOAT(NUM)
C FILL ABSCISSA, INDIVIDUAL, AND CUMULATIVE FREQUENCY VECTORS
      SUM=0.0
      DO 40 I=1,LIMIT
      XI(I)=(FLOAT(I))*DELTA+BASE
      PV(I)=(FLOAT(JH(I))*100.0)/S
      SUM=SUM+PV(I)
      CUM(I)=SUM
   40 CONTINUE
C CLEAR LINE ARRAY
      DO 50 K=1,50
      LINE(K)=IBLANK
   50 CONTINUE
C PRINT RESULTS
      WRITE(6,2000) DELTA
      WRITE(6,2001) BASE,ILOW,IZERO
      WRITE(6,2002)
      DO 60 I=1,LIMIT
      M=IFIX(Z)
      IF(M.EQ.0) GO TO 80
      IF(M.GT.50) M=50
C...FILL LINE ARRAY WITH INDIVIDUAL FREQUENCIES
      DO 70 J=1,M
      LINE(J)=MARK1
   70 CONTINUE
   80 CONTINUE
C...FILL LINE ARRAY WITH CUMULATIVE FREQUENCIES
      Q=(CUM(I)/2.)+0.5
      L=IFIX(Q)
      IF(L.NE.0) LINE(L)=MARK2
C...PRINT ONE LINE
      A=XI(I)-DELTA
      B=XI(I)
      IC=JH(I)
      D=PV(I)
      E=CUM(I)
      F=E*(S/(S+1.0))
      IF(DELTA.LT.0.1) WRITE(6,2003) A,B,IC,D,(LINE(K),K=1,50),E,F
      IF(DELTA.GE.0.1) WRITE(6,2004) A,B,IC,D,(LINE(K),K=1,50),E,F
C...CLEAR LINE ARRAY
      IF(M.EQ.0) GO TO 90
      DO 90 K=1,M
      LINE(K)=IBLANK
   90 CONTINUE
      IF(L.NE.0) LINE(L)=IBLANK
C...CHANGE PAGE
      IF(I.NE.51) GO TO 60
      WRITE(6,2005)
      WRITE(6,1999)
      WRITE(6,2002)
   60 CONTINUE
C...PRINT BOTTOM OF GRAPH
      WRITE(6,2005)
C...PRINT NO. OF VALUES ABOVE RANGE
      PEXCES=
 2002 FORMAT(1H0,5X,'INTERVAL',5X,'NUMBER',3X,'PERCENT',
     12X,'+----------HISTOGRAM----------+',
     21X,'CUM. FREQ.',2X,'PLOT
 2003 FORMAT(1H ,1X,F7.4,'-'
 2004 FORMAT(1H ,1X,F7.2,'-',F6.2,1X,I8,2X,F7.3,
 2005 FORMAT(1H ,36X,'+------------------------------------------------
     1--+')
 2006 FORMAT(1H0,2X,'NO. OF VALUES EQUAL TO OR GREATER THAN
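The sorting logic of subroutine HISTOG amounts to binning each value into intervals of width DELTA starting at BASE, with separate tallies for below-range and above-range points and a small offset guarding against machine decimal-conversion error. A Python sketch (illustrative only, not part of the original program):

```python
def histogram(values, base, delta, limit):
    """Bin values into `limit` intervals of width `delta` starting
    at `base`, as subroutine HISTOG does; returns the counts plus
    the numbers of below-range and above-range points."""
    counts = [0] * limit
    low = high = 0
    for v in values:
        # 1e-6 offset mirrors HISTOG's decimal-conversion guard
        j = int((v + 1e-6 - base) // delta)
        if j < 0:
            low += 1
        elif j >= limit:
            high += 1
        else:
            counts[j] += 1
    return counts, low, high
```

With the program's defaults (BASE = -0.5, DELTA = 1.0, LIMIT = 60), a carbon monoxide reading of 3.2 ppm falls in the interval 2.5-3.5.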
      X1=AVE1
      WRITE(6,200)
   16 X2=XI(IL2)-0.5
      WRITE(6,202) X1,Y1,X2,Y2
      SIG=(ALOG(X2/X1))/(Y2-Y1)
      U=ALOG(X1)-SIG*Y1
      GO TO 30
  200 FORMAT(1H0,'70 % ARE ZEROS; MEAN IS USED FOR 70 PERCENTILE')
   20 WRITE(6,201)
  201 FORMAT(1H0,'LARSEN METHOD NOT POSSIBLE; NO 99.9 PERCENTILE')
   30 CONTINUE
  202 FORMAT(1H0,5X,'FOR THE 70 PERCENTILE, X =',F7.3,' AND Y =',
     1F9.5,'; FOR THE 99.9 PERCENTILE, X =',F7.3,' AND Y =',F9.5)
      RETURN
      END
      SUBROUTINE AMLE(U,SIG,N,VALUE)
C SUBROUTINE TO COMPUTE PARAMETERS BY THE APPROXIMATE M.L.E. METHOD
C IGNORES GROUPING; ASSUMES ZEROS ARE AT MINIMUM DETECTABLE LIMIT (.50)
C TAKES LOGARITHMS OF RAW DATA; CALCULATES MOMENTS FROM RESULTING LOGS
      DIMENSION VALUE(9000)
      SUM=0.0
      SUM2=0.0
      DO 10 I=1,N
      P=VALUE(I)
      IF(P.EQ.0.0) P=0.25
      Y=ALOG(P)
      SUM=SUM+Y
      SUM2=SUM2+Y*Y
   10 CONTINUE
      Q=FLOAT(N)
      U=SUM/Q
      VAR=SUM2/Q-U*U
      SIG=SQRT(VAR)
      RETURN
      END
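The approximate M.L.E. of AMLE is simply the mean and population standard deviation of the logarithms of the raw data, with zeros replaced by the substitute value 0.25 as in the listing. A Python re-expression (not part of the original program):

```python
import math

def amle(values, zero_sub=0.25):
    """Approximate maximum-likelihood estimates (U, SIG) of the
    lognormal parameters, as subroutine AMLE computes them:
    zeros are replaced by `zero_sub`, then the mean and the
    population (divide-by-n) standard deviation of the logs
    are returned."""
    logs = [math.log(v if v > 0.0 else zero_sub) for v in values]
    n = len(logs)
    u = sum(logs) / n
    var = sum(y * y for y in logs) / n - u * u
    return u, math.sqrt(var)
```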
      SUBROUTINE BMLE(U,SIG,LIMIT,XI,JH,PV,CUM)
C SUBROUTINE TO COMPUTE PARAMETERS BY M.L.E. OPTIMIZATION METHOD
C THE LIKELIHOOD FUNCTION IS COMPUTED FOR THE INTERVALS
      DIMENSION XI(100),JH(100),PV(100),CUM(100)
      ICOUNT=0
      C=1.0/SQRT(3.141592654)
      SUM=0.0
      WRITE(6,200)
C SET ARBITRARY STARTING POINTS
C     U=1.0
C     SIG=0.5
C SET L=1 IF U IS TO BE INCREMENTED
C SET L=2 IF SIG IS TO BE INCREMENTED
      L=2
      K=0
      SUMT=0.0
    1 CONTINUE
      MEM=0
      D=0.1
    5 CONTINUE
C COMPUTE LOG-LIKELIHOOD FUNCTION FOR U, SIG
      TU=U
      TSIG=SIG
      K1=0
      DO 10 I=1,3
      TEMP=0.0
      DO 20 J=1,LIMIT
      XL=ALOG(XI(J))
      Y=(XL-TU)/TSIG
      CDF=GAUCDF(Y)
      PROB=CDF-TEMP
      TEMP=CDF
C......DETERMINE OCCURRENCE OF ZEROS
      IF(PROB.EQ.0.0) K1=K1+1
      IF(PROB.EQ.0.0) GO TO 20
      IF(JH(J).EQ.0) GO TO 20
C......CALCULATE LOG-LIKELIHOOD FUNCTION
      Z=ALOG(PROB)*FLOAT(JH(J))
      SUM=SUM+Z
   20 CONTINUE
C SUM1 IS SUM(X); SUM2 IS SUM(X-D); SUM3 IS SUM(X+D)
      IF(I.EQ.1) SUM1=SUM
      IF(I.EQ.2) SUM2=SUM
      IF(I.EQ.3) SUM3=SUM
      SUM=0.0
      IF(I.EQ.1.AND.L.EQ.1) TU=U-D
      IF(I.EQ.1.AND.L.EQ.2) TSIG=SIG-D
      IF(I.EQ.2.AND.L.EQ.1) TU=U+D
      IF(I.EQ.2.AND.L.EQ.2) TSIG=SIG+D
   10 CONTINUE
      IF(MEM.NE.2) DELTA=D
      IF(MEM.EQ.2) DELTA=-1.0*D
C PRINT RESULTS OF EACH ITERATION
      WRITE(6,201) ICOUNT,L,K1,MEM,SUM1,SUM3,U,SIG,DELTA
      ICOUNT=ICOUNT+1
C LIMIT THE MAXIMUM NUMBER OF RUNS
      IF(ICOUNT.GT.200) GO TO 50
C OTT'S OPTIMIZATION METHOD:
C THE GOAL IS TO FIND THE MAXIMUM VALUE OF SUM
C MODIFY U,SIG BY COMPARING THE MAGNITUDE OF SUM2 AND SUM3 WITH SUM1
C IF SUM3 IS GREATER THAN SUM1, ADVANCE FORWARD (SET MEM=1)
C IF SUM2 IS GREATER THAN SUM1, MOVE BACKWARD (SET MEM=2)
C IF YOU HAVE OVERSHOT, DIVIDE D BY 2.0 AND RECALCULATE
C IF NEITHER SUM2 NOR SUM3 EXCEEDS SUM1, DIVIDE D BY 2.0 AND RECALCULATE
C......SET DIRECTION
      IF(MEM.EQ.0.AND.SUM3.GT.SUM1) MEM=1
      IF(MEM.EQ.0.AND.SUM2.GT.SUM1) MEM=2
C......TEST FOR OVERSHOOT
      IF(MEM.EQ.1.AND.SUM3.LT.SUM1) GO TO 25
      IF(MEM.EQ.2.AND.SUM2.LT.SUM1) GO TO 25
C......TEST FOR TROUGH
      IF(SUM3.LE.SUM1.AND.SUM2.LE.SUM1) GO TO 25
C......INCREMENT VARIABLE
      IF(MEM.EQ.1.AND.L.EQ.1) U=U+D
      IF(MEM.EQ.1.AND.L.EQ.2) SIG=SIG+D
      IF(MEM.EQ.2.AND.L.EQ.1) U=U-D
      IF(MEM.EQ.2.AND.L.EQ.2) SIG=SIG-D
      GO TO 29
   25 D=D/2.0
      MEM=0
   29 CONTINUE
C RETURN IF CONVERGENCE IS NOT YET OBTAINED
      TEST=ABS(DELTA)
      IF(TEST.GE.0.000001) GO TO 5
C ALLOW FOR ONLY 3 REVERSALS OF L AFTER CONVERGENCE
      T=ABS(SUM1-SUMT)
      SUMT=SUM1
      IF(T.LT.0.0001) K=K+1
      IF(K.GE.3) GO TO 50
C CONVERGENCE IS OBTAINED; REVERSE L
      GO TO (30,40), L
   30 L=2
      GO TO 1
   40 L=1
      GO TO 1
   50 CONTINUE
C CHANGE PAGE
      WRITE(6,203)
  203 FORMAT(1H1)
  200 FORMAT(1H0,1X,'NUMBER',3X,'L',2X,'K1',1X,'MEM',9X,'SUM1',15X,
     1'SUM3',17X,'MEAN',15X,'SIGMA',13X,'CHANGE')
  201 FORMAT(1H ,2X,I3,5X,I1,2X,I2,2X,I1,1X,2(1X,F10.6),3(1X,E18.6))
      RETURN
      END
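The objective that BMLE climbs by its alternating one-dimensional search is the grouped-data log-likelihood: the sum, over histogram intervals, of the observed count times the logarithm of the interval probability under the lognormal model. A Python sketch of that objective (the use of Python's statistics.NormalDist is a modern convenience for the normal CDF, not part of the original program):

```python
import math
from statistics import NormalDist

def log_likelihood(u, sig, xi, jh):
    """Grouped-data log-likelihood for a lognormal model with
    parameters (u, sig): xi holds the interval upper bounds and
    jh the observed counts; each interval contributes
    count * ln(P(interval)), empty or zero-probability intervals
    being skipped as in subroutine BMLE."""
    nd = NormalDist(u, sig)
    total, prev = 0.0, 0.0
    for x, n in zip(xi, jh):
        cdf = nd.cdf(math.log(x))   # lognormal CDF at interval top
        prob = cdf - prev
        prev = cdf
        if n and prob > 0.0:
            total += n * math.log(prob)
    return total
```

BMLE evaluates this quantity at the current point and at a step D above and below it in one coordinate (U or SIG), moves toward the larger value, and halves D on an overshoot until the change falls below 1E-6.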
      SUBROUTINE LN(U,SIG,N,LIMIT,K,X,DIFF,XI,JH,PV,CUM,IH)
C SUBROUTINE TO CALCULATE LOGNORMAL FREQUENCIES FOR PARAMETERS U,SIG
      DIMENSION XI(100),JH(100),PV(100),CUM(100)
      DIMENSION X(100),DIFF(100)
C...INITIALIZE VALUES
      TEMP=0.0
      Q=FLOAT(N)
      SUMCHI=0.0
      SUMSQ=0.0
      SUMS=0.0
      SUMZ=0.0
      JSUM=0
      JOSUM=0
      NSUM=0
      ISUM=0
      ISUMD=0
      ISTOP=LIMIT
      DSUM=0.0
      CUMAX=0.0
      M=0
      S=0.0
      Z=0.0
C...COMPUTE PARAMETERS
      VAR=SIG*SIG
      AVE=EXP(U+0.5*VAR)
      STD=SQRT(EXP(2.0*U+VAR)*(EXP(VAR)-1.0))
      GMED=EXP(U)
      GMODE=EXP(U-VAR)
      SGD=EXP(SIG)
      W=EXP(VAR)
      B1SQ=(W+2.0)*SQRT(W-1.0)
      B2=W*W*W*W+2.0*W*W*W+3.0*W*W-3.0
      WRITE(6,200) AVE,STD
      WRITE(6,201) AVE,STD
      WRITE(6,202) GMED,GMODE,SGD
      WRITE(6,203) U,SIG
      WRITE(6,207) B1SQ,B2
      WRITE(6,199)
      WRITE(6,204)
      WRITE(6,205)
      WRITE(6,206)
      WRITE(6,204)
C...MAIN COMPUTATION
      DO 10 I=1,LIMIT
      M=M+1
      XL=ALOG(XI(I))
      Y=(XL-U)/SIG
      CDF=GAUCDF(Y)
      PROB=CDF-TEMP
C...J IS THE PREDICTED NO. OF VALUES IN THE INTERVAL
C...JO IS THE OBSERVED NO. OF VALUES IN THE INTERVAL
      J=IFIX(PROB*Q+0.5)
      JO=JH(I)
C...STOP COMPUTATION IF NO. OF PREDICTED VALUES IS LESS THAN 5 AFTER MEDIAN
      VAL=XI(I)
      IF(J.LT.5.AND.VAL.GT.GMED) ISTOP=I
      IF(J.LT.5.AND.VAL.GT.GMED) GO TO 12
      JSUM=JSUM+J
      JOSUM=JOSUM+JO
      NDIF=J-JO
      NSUM=NSUM+NDIF
      NDIF2=NDIF*NDIF
      ISUMD=ISUMD+NDIF2
      E=FLOAT(J)
      O=FLOAT(JO)
      SQ=(E-O)*(E-O)
      IF(E.EQ.0.0) CHISQ=0.0
      IF(E.EQ.0.0) GO TO 15
      CHISQ=SQ/E
C...COMPUTE SUM OF SQUARES OF DIFF. BETWEEN PREDICTED AND OBSERVED NO.
   15 SUMSQ=SUMSQ+SQ
C...COMPUTE SUM OF CHI-SQUARE
      SUMCHI=SUMCHI+CHISQ
      WRITE(6,210) XI(I),XL,Y,CDF,PROB,J,JH(I),NDIF,NDIF2,CHISQ
C......FIND KOLMOGOROV-SMIRNOV RELATED STATISTICS
      CUMDIF=ABS(CDF-FLOAT(JOSUM)/Q)
      IF(CUMDIF.GT.CUMAX) INT=I
      IF(CUMDIF.GT.CUMAX) CUMAX=CUMDIF
      DSUM=DSUM+CUMDIF
C......CALCULATE LOG-LIKELIHOOD FUNCTION
      TEMP=CDF
      IF(PROB.EQ.0.0) GO TO 10
      IF(JH(I).EQ.0) GO TO 10
      Z=ALOG(PROB)*FLOAT(JH(I))
      SUMZ=SUMZ+Z
   10 CONTINUE
   12 CONTINUE
      WRITE(6,204)
      WRITE(6,209) JSUM,JOSUM,NSUM,ISUMD,SUMCHI
C SUM UP REMAINING VALUES
C...COMPUTE SUM OF OBSERVED NO. OF VALUES FOR LAST INTERVALS
      DO 20 I=ISTOP,LIMIT
      ISUM=ISUM+JH(I)
C......CALCULATE LOG-LIKELIHOOD FUNCTION FOR LAST INTERVALS
      XL=ALOG(XI(I))
      Y=(XL-U)/SIG
      CDF=GAUCDF(Y)
      PROB=CDF-TEMP
      TEMP=CDF
      IF(PROB.EQ.0.0) GO TO 20
      IF(JH(I).EQ.0) GO TO 20
      Z=ALOG(PROB)*FLOAT(JH(I))
      SUMZ=SUMZ+Z
   20 CONTINUE
      ISUM=ISUM+IH
      J=N-JSUM
      PROB=FLOAT(J)/Q
      NDIF=J-ISUM
      NDIF2=NDIF*NDIF
      E=FLOAT(J)
      O=FLOAT(ISUM)
      SQ=(E-O)*(E-O)
      IF(E.EQ.0.0) GO TO 25
      CHISQ=SQ/E
   25 SUMSQ=SUMSQ+SQ
      SUMCHI=SUMCHI+CHISQ
      WRITE(6,212) PROB,J,ISUM,NDIF,NDIF2,CHISQ
      RMS1=SQRT(FLOAT(ISUMD)/FLOAT(M))
      WRITE(6,211) SUMSQ,RMS1,SUMCHI
      NDF=ISTOP-3
      WRITE(6,213) NDF
      WRITE(6,214)
      WRITE(6,215) CUMAX,INT
      WRITE(6,216) DSUM
C COMPARISON OF X FOR OBSERVED FREQUENCIES WITH X FOR MODEL
      WRITE(6,225) SUMZ
      WRITE(6,198)
      WRITE(6,220) AVE,STD
      WRITE(6,199)
C...DETERMINE LAST INTERVAL WITH NUMERICAL VALUES
      DO 30 I=1,LIMIT
      IF(CUM(I).LT.99.95) K=I
   30 CONTINUE
C...FIND CORRESPONDING X VALUE FOR EACH CUMULATIVE FREQUENCY
      WRITE(6,221)
      WRITE(6,222)
      DO 40 I=1,K
      CDF=CUM(I)/100.0
      IF(CDF.EQ.0.0) X(I)=0.0
      IF(CDF.EQ.0.0) DIFF(I)=0.0
      IF(CDF.EQ.0.0) Y=0.0
      IF(CDF.EQ.0.0) GO TO 35
      Y=GCDFI(CDF)
      X(I)=EXP(SIG*Y+U)
      DIFF(I)=X(I)-XI(I)
      S=DIFF(I)*DIFF(I)
      SUMS=SUMS+S
   35 WRITE(6,223) CUM(I),Y,X(I),XI(I),DIFF(I)
   40 CONTINUE
      WRITE(6,222)
      RMS2=SQRT(SUMS/FLOAT(K))
      WRITE(6,224) RMS2
      WRITE(6,198)
  198 FORMAT(1H1)
  199 FORMAT(1H0)
  200 FORMAT(1H0,1X,'COMPARISON OF LOGNORMAL PROBABILITY MODEL (',
     1F7.4,',',F7.4,') WITH OBSERVED DATA')
  201 FORMAT(1H0,4X,'ARITH. MEAN =',F15.8,6X,'ARITH. STD. DEV. =',F15.8)
  202 FORMAT(1H0,4X,'MEDIAN =',F15.8,8X,'MODE =',F16.8,8X,
     1'GEO. STD. DEV. =',F16.8)
  203 FORMAT(1H0,4X,'MEAN OF NORMAL DIST. =',F15.8,
     17X,'STD. DEV. OF NORMAL DIST. =',F15.8)
  204 FORMAT(1H ,1X,'-------------------------------------------------
     1------------------------------------------------------------
     2----')
  205 FORMAT(1H ,3X,'(CUM)',6X,'LN(X)',8X,'Y',9X,'F(Y)',8X,'P(Y)',
     17X,'PREDICTED',5X,'OBSERVED',4X,'DIFFERENCE',6X,'DIFF. SQ.',
     23X,'CHI SQUARE')
  206 FORMAT(1H ,1X,'INTERVAL',56X,'N',12X,'N')
  207 FORMAT(1H0,4X,'COEF. OF SKEWNESS =',F15.8,7X,
     1'COEF. OF KURTOSIS =',F15.8)
  209 FORMAT(1H ,45X,'SUBTOTAL:',7X,3(I5,8X),I9,5X,F7.2)
  210 FORMAT(2X,3(F7.4,4X),2(F9.7,3X),3X,3(I5,8X),I9,5X,F7.2)
  211 FORMAT(/,6X,'SUM OF SQUARES =',F12.2,8X,'ROOT MEAN SQUARE OF DIFFE
     1RENCE =',F9.3,8X,'CHI SQUARE = ',F9.2)
  212 FORMAT(1H0,5X,'RESULTS FOR REMAINING INTERVALS:',9X,F9.7,
     16X,3(I5,8X),I9,5X,F7.2)
  213 FORMAT(1H0,5X,'DEGREES OF FREEDOM = ',I4)
  214 FORMAT(1H0,///,1X,'KOLMOGOROV-SMIRNOV RELATED MEASURES:')
  215 FORMAT(1H0,1X,'MAXIMUM ABSOLUTE DIFFERENCE BETWEEN CDF AND CUMULAT
     1IVE FREQUENCY OF OBSERVATIONS =',F12.7,' (AT INTERVAL NO.',I3,')')
  216 FORMAT(1H0,1X,'SUM OF ABSOLUTE DIFFERENCES BETWEEN CDF AND CUMULAT
     1IVE FREQUENCY OF OBSERVATIONS =',F12.7)
  220 FORMAT(1H0,1X,'COMPARISON OF X FOR OBSERVED FREQUENCIES WITH X FOR
     1 LOGNORMAL PROBABILITY MODEL (',F7.4,',',F7.4,')')
  221 FORMAT(3X,'CUM. FREQ.',4X,'PROBITS',6X,'PREDICTED X',4X,'OBSERVED
     1X',5X,'DIFFERENCE',4X,'DIFF. SQUARED')
  222 FORMAT(1H ,1X,'-------------------------------------------------
     1-----------------------------------')
  223 FORMAT(4X,F7.3,6X,F7.4,3(4X,F11.6),5X,F13.6)
  224 FORMAT(/,9X,'ROOT MEAN SQUARE OF DIFFERENCE =',F7.3)
  225 FORMAT(1H0,/,1X,'LOG-LIKELIHOOD FUNCTION =',F18.7)
      RETURN
      END
      FUNCTION GAUCDF(X)
C COMPUTES THE STANDARD NORMAL CUMULATIVE DISTRIBUTION FUNCTION
C BY THE HASTINGS FIVE-TERM POLYNOMIAL APPROXIMATION
      DIMENSION A(5)
      DATA A/.31938153,-.356563782,1.781477937,
     *-1.821255978,1.330274429/
      IF(X.NE.0.0) GO TO 1
      GAUCDF=0.5
      RETURN
    1 MEMORY=0
      PDFN=0.39894228*EXP(-0.5*X*X)
      IF(X.GT.0.0) MEMORY=1
      T=1.0/(1.0+0.2316419*ABS(X))
      SUM=0.0
      V=1.0
      DO 10 I=1,5
      V=T*V
      SUM=SUM+A(I)*V
   10 CONTINUE
      GAUCDF=SUM*PDFN
      IF(MEMORY.EQ.1) GAUCDF=1.0-GAUCDF
      RETURN
      END
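FUNCTION GAUCDF evaluates the standard normal cumulative distribution function with the five-term Hastings polynomial approximation (see reference 75), whose absolute error is below about 1E-7. The same computation in Python (a re-expression for clarity, not part of the original program):

```python
import math

# Hastings coefficients, as in the DATA statement of GAUCDF
A = (0.31938153, -0.356563782, 1.781477937, -1.821255978, 1.330274429)

def gaucdf(x):
    """Standard normal CDF via the Hastings polynomial: the upper-tail
    probability for |x| is pdf(|x|) * (b1*t + ... + b5*t**5) with
    t = 1/(1 + 0.2316419*|x|), reflected for positive x."""
    if x == 0.0:
        return 0.5
    pdf = 0.39894228 * math.exp(-0.5 * x * x)
    t = 1.0 / (1.0 + 0.2316419 * abs(x))
    tail = pdf * sum(a * t ** (i + 1) for i, a in enumerate(A))
    return 1.0 - tail if x > 0.0 else tail
```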
      FUNCTION GCDFI(Q)
C INVERSE GAUSSIAN CUMULATIVE DISTRIBUTION FUNCTION
      DIMENSION A(3),B(3)
      DATA A/2.515517,0.802853,0.010328/
      DATA B/1.432788,0.189269,0.001308/
      P=Q
      IF(P.GT..5) P=1.-P
      ETA=SQRT(ALOG(1./(P*P)))
      XNUM=0.
      DENOM=1.
      ETAP=1.0
      DO 10 I=1,3
      XNUM=XNUM+A(I)*ETAP
      ETAP=ETAP*ETA
   10 DENOM=DENOM+B(I)*ETAP
      GCDFI=XNUM/DENOM-ETA
      IF(P.NE.Q) GCDFI=-GCDFI
      RETURN
      END
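FUNCTION GCDFI inverts the normal CDF with the familiar rational approximation (absolute error below 4.5E-4), mapping a cumulative frequency to the corresponding number of standard deviations (the probits printed by subroutine LN). A Python equivalent (not part of the original program):

```python
import math

# Rational-approximation coefficients, as in the DATA statements of GCDFI
A = (2.515517, 0.802853, 0.010328)
B = (1.432788, 0.189269, 0.001308)

def gcdfi(q):
    """Return y such that the standard normal CDF at y is about q
    (0 < q < 1). Folds q into the lower tail, computes
    eta = sqrt(ln(1/p**2)), applies the rational correction, and
    restores the sign for q above 0.5."""
    p = q if q <= 0.5 else 1.0 - q
    eta = math.sqrt(math.log(1.0 / (p * p)))
    num = A[0] + A[1] * eta + A[2] * eta * eta
    den = 1.0 + B[0] * eta + B[1] * eta ** 2 + B[2] * eta ** 3
    y = num / den - eta          # negative or near zero in the lower tail
    return -y if q > 0.5 else y
```

Subroutine LN uses this to convert each observed cumulative frequency into a predicted concentration, X = EXP(SIG*Y + U).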
      SUBROUTINE AUTCOR(L,N,STORE,ISTART,JUMP)
C SUBROUTINE TO COMPUTE AUTOCORRELATION FUNCTION FOR LAGS 1-75
      COMMON /B/MARK1,MARK2,IBLANK,LINE(50)
      COMMON /G/DUMMY
      DIMENSION STORE(9000)
      WRITE(6,201)
      WRITE(6,202)
      WRITE(6,203)
C CALCULATE MEAN AND VARIANCE
      SUM1=0.0
      SUM2=0.0
      IF(N.EQ.0) GO TO 90
      T=FLOAT(N)
      DO 5 I=1,L
      IF(STORE(I).EQ.DUMMY) GO TO 5
      Z=STORE(I)
      SUM1=SUM1+Z
      SUM2=SUM2+Z*Z
    5 CONTINUE
      AVE=SUM1/T
      VAR=SUM2/T-AVE*AVE
C BEGIN COMPUTATION
      J=75*JUMP
      DO 10 M=ISTART,J,JUMP
      LAST=L-M
C INITIALIZE VALUES
      COUNT=0.0
      SUM3=0.
C CALCULATE COVARIANCE FOR LAG M
      DO 20 I=1,LAST
      IF
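The surviving part of AUTCOR shows the approach: compute the mean and variance from the valid observations, then for each lag form the covariance of pairs separated by that lag, skipping any pair that contains the missing-value code. Since the listing is truncated at this point, the following Python sketch is a plausible reading of the method, not a transcription of the original subroutine:

```python
def autocor(series, max_lag=10, dummy=9999.99):
    """Lag-1..max_lag autocorrelations of a series whose missing
    values are coded as `dummy`: moments come from the valid points,
    and a lagged product is skipped if either member is missing."""
    valid = [z for z in series if z != dummy]
    n = len(valid)
    mean = sum(valid) / n
    var = sum(z * z for z in valid) / n - mean * mean
    out = []
    for m in range(1, max_lag + 1):
        s, cnt = 0.0, 0
        for i in range(len(series) - m):
            a, b = series[i], series[i + m]
            if a != dummy and b != dummy:
                s += (a - mean) * (b - mean)
                cnt += 1
        out.append(s / (cnt * var) if cnt and var else 0.0)
    return out
```

The main program calls AUTCOR twice, once with unit lag spacing and once with a spacing of six hours, to examine both short-term and diurnal persistence in the hourly CO data.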
APPENDIX B
SAMPLE OUTPUT FOR NEW YORK CITY
[Pages 232 through 245 of the original report reproduce the program's line-printer output for the New York City carbon monoxide data set: the basic statistics, the histogram of the raw data, the comparisons of the lognormal probability model with the observed frequencies, and the autocorrelation function plots. The printout did not survive conversion to text and is omitted here.]
r
u.
o
z
O
< ^4*^*^*^0000000000000000000000000000000000000000^000000
r
>-i i i i i i i i I i I I i I I I I i I I
»—
a.
o
£ >jJOOOOf\JO**OfMOOOOOOO9OOOOOOOOO9~*orvJOO-*dOoOOO~«OOOO'OOGOOoO
X
l_ **QOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOGOOOOOOO
to ^
ua
OT ji^rufvjrvjrvjnjrurxjrvjrurururyrunjru^fXI^tvi'y--*--"*" — -- — -• — *•*- — *« — *"-^ — « — *« — — i\irvjryA)(\ii\jcviru
246
-------
OOOO
— — Or\J.OCOO
— o
«> o- o-
ooo
o K» — i
ru o innj
ru — o i
ooooooooo
oooo
*lit
UJ 'If UJ
ooo
oooo
oooo
__-_._.— ~ — — •— «*•!»• r«l»imK19999»ninin4>O
ooooooooooooooooooooooooooooooooo
Illlfllllfllll+lllllllllllllttll^
k&J ttt UJ UJ UJUJ UJ UJ UJ UJ UJ Ul ID t*J 1*1 ill '•!>•>> •->>
uj lal ul UJ uJUj uJ uj uj uj UJ UJ uj
~ ooinrtr** — — ~
oomtnni— -
omAJru^ftmAi**
•o o —
> o o o
oooooooooooo
UJ UJ UJ Ul
— •-> oru -o co
in r- co
~* o ro — in
9 «x oin ru
9 ru — o in
— •« to —
lij U> UJ
000
>» o o
O o O
IM O O
•o o o
ooooooo oooo
efo0000§Sg§§S§00000000000oo«,oooooooooooooooooooooo
***«-*********-*** + «-***»***»0000r000f0f000°oooooooo5
ujuuiuiuJujujujuujuJUJbJuJuJbJUJuJuJuj ujui uiuJUJujiijui^iiIiui u,r" ^ ui uj uj ui ui uj uj uiuiuj
tf»jy> * ««v« ^.*« «« « « **-»**-i>>*<*-*«SS00000 00000000oo°o ooooooo
J> ? S S £** S S 2 2 2 S S •**•••"*•••«««•«««» S «cS S 5 2 2 » *"° ***** *°
„ , , _ _ F^wwwwwwtvww?wt.ga?VCOQ«>fDtf>«D4>B«><,,,D«aDC0a>«>«O«D«OflOCO««oeOooco«»co««DcoflOaDcoao eofloco
^.-^ ^«»«»«««'«»1»*«'«'»w«'«'5r^*<»'*J*««!»^*««<»*«'»<.ttT9999V99Sf4>99^r9 9^^
Q OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO OOOO
ooooooo
ooooooo
***»*»
---
UJUJUJUJUJUJUJUJUJUJUJUJUJ UJUJ UJUJUJUJbJUJ
OOOOOOOOOOOOO 00
•a -o •a
nint
oooooooooooooooooooo
o*or\ja>t>co9* oKt^o-porxKOinKt
t}>i^-M.OMn/t1*«X>t/t> \J
ft***ftJAi—«rOiO«
ooooooo ooooo^iin *^^» ooooooo
r^r^r-r^r^r-r^'""^ ^^^or^r^r^r^K.f^r^r^h-i'*-
AJ (V-f\| fMr\|r\jr\j 'MfMfMf«fM/\|/Nj
&&&&^3r9^^zr&*t&&&& I»^=T
_ _ _-_^^.^ — .--j -* zr zr
f-ir»»T»»f»imroK>^ookO'^ofc^O'»^^{rooi-^»O' O'o-^
^^V^^^M^O OOOOO O O O O O O ^OOOOOOO
ooooo^oooooo»»-^--*'-^-~---— - -
f\j r\j ry
I I I
-0^0000000000000000000000000000000000000000=000000
ooooooooooooooooooooooooooooooooooooooooo,
ooooooooo
247
-------
loooooooooooooooooooooeo
•nioiontnioooinirirti«<«oin4>«9ni
•»*«if»fti**'prt»*r*io»«<*9i^
OOOOOOOOOQOOOOOOOOOOOOOOOOOOOOOOOOOOOOQOOOOOOOOOOO
oooooooooooooooooooooooooooooooooooooooooooooooooo
OOOOOOOOOOOOOOOOOQOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOQOQOQQOQOQOOOOOi
99999999999999999999999999999999999999999999999999
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
oooooooooooooooooooooooooooooooooooooooooooooooooo
ial i-i frl I-1 frp \*f 1^1 frf I.I l«i m Ui t^f u t t«l ^p t^i JtJ i»l *«' *•' «'* tf I 1^1 itl lftt ^ffl l>t 1.1 tti lfl Ul 1*1 It I tli HI l»l Ul *ti UJ t»l in t-t 1*1 HI in itl I.I l*> l«l
Oooooooooooooooooooooooooooooooooooooooooooooooooo
I -O -O -O
t^OOOOOO^~3^^^^>^OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO3OOOOOOO
------ - ...... - !
I I
OOOOOOOOO*4OOOO9«*OOO9O'O>OOOOOOOO£>OOOOO OOO 39O 3OO 3OO OO
oooooooooooooooooooooooooooooooooooooooooooooooooo
248
-------
-
ooooOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
oooooooooooooooooooooooooooooooooooooooooooooooooo
••• ••••
99OO€>GO^C^oGOv4OOOGOOOOOOOOOOOQ O OOOOOOOOOOOOOOOOOOftIO
00090000000000000000000000000 o oooooooooooooooooooo
249
-------
IP
ni
in
•o
in
D
UJ
a
to ui
o
i u
i ac
i <
is
I 3
I to
I
I O
I o
I <0
IU.
111.
UJ
o
Z
I U>
t a
lu
la.
IU.
I -i
I O
a
I UJ
i >
i ac
I Ul Z
I to
m
o
o
I u
a:
i a
ui o -»
a to »
o 9 o
r — o-
» r-
•o ni
• nt
ni in
ct
o
(3
o
to
»-«
t£
•«
a.
x
o
-o i^ t- n
•-« f\i to
in »-» to
i- o
• ui
9 J Z
n — < 2
r ui
z cr *
< o to
Ul Z
r it u.
u. o
• z o
I < «
>- -• z u.
*^ o < w
z 4j ui o
< z z u
— iAOin-oo-o—
*-*-4> oin «*
-o in 1^10 «o in i>- 9
X I 9 (9 O O O O M 1.3 n C C « O O /I /> ? 9 -O 'V M M
ooooooooooooooooooooooooo
ooooooooooooooooooooooooo
o ru KI o •
o o ao in •
O O KM- I
o o o -o
O O O O i
o o o o
o o o o
• • • •
ooooooooooooooooooooooooo
oAj9.o<09> (Mr^tn«^9inrnmov^K)ojV4
•OP-OO oi^9»r>-to oo- o9»tn-O<7kfO^«930l
4>inAi9-r^inro-^o*-4ruK^i
~4«***OOOOOOOOG
^ -"inK»to-r^r--»eO9ivi
X
z i •O9o-njinr~'X>0'«'\!rn9in-o-or-a>K>9-9-oo>*»*«
ooo-*-M^^^^rMrurvruM'M*\l?\lf\iAinjr\iA)fOKiKiK»rOi
I
O.OOO9O9OOOOOOOOOO9OO9OOOO
IZ>IOOOOOOOOOOOOOOOOOOOOOOOOO
O 9 '9 O >O O O O O O >9 O O O -O
ui i inininininininininininininmininintninminmininin!
O- O -« IM Kl <
250
-------
•Ai*.o-Aoo9'KioO'**9r-o
r-» or- ooniito o — — .o
o- « o- « _.» —
nii/tniooniB.»'=> '» O O O I
ooooooooooooooooooo ooooo
ooooooooooooooooooo ooooo
oooooooooooooooooooooooo
omin o*n»or*ir^**t»*i^ in*
O9OOOOO9O9OOOOOOOO9 O'OOO
ooooooooooooooooooooooo
ooooo .
<\l
-O
m
in
o-
IV
ac
u
^~
z
a
«o
x.
o
ro
9
o
w
r«-
>- r«
3 S Z
<
O
•
o
to U
Z Z
UJ Ul
3 3
3 C3
UJ ILl
Z Z
u. u.
;>
z
3
o
o
IX
> o
Z O
z « r-
M P 9
z
10 z a
ui 3 ui
Z to 3
U Ul
a z u
ui ui z
I- Z Ul
< ui z
-JU.nl
ui u. u.
Z *•« u-
O !-•
* o
O UJ
Z — UJ
OE 3 *-
i-> _l 3
r o _i
•y> u.
a r o
o >-t
E x r
_j < o
O Z CO
IM
I
Z
o
O
O
z
I
u
o
251
-------
* o
4> 111
_
Ul ">
O
o •
1 U.
u.
>• «
»- o
CO U
o z
Q. Ul
Q. ac
Ul
-I U.
< U.
JC M
ac o
3
Z
(9
O
_l
x
03
A. Ul
X CE
Ul
I «>
i- a>
ui
l-t X
u
z o
ui ui
K o
U. Ul
OC
o a.
ui
>
ct
Ul
OOOOOOOOOOOOOOOOOOOOOQOOOOOOOOOOO
I O 13 O (3 .3 -3 O .3 .3 13 (3 13 '3 O 3 3 ,3 3 o O O '3 O '3 .3 13 '3 '3 13 O '3 3 13 .3 3 O 13
lOO-OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
|,3OI= 1313 O 13 13 O 13 O OI3O 13 13 13 O <3 <3 O 13 13 i3 13 (3 O 13 13 O 13 3 13 (3 13 O 13
»ooooooooooooooooooooooooooooooooooooo
II
UJ
z
tu
X
Ul
u.
UJ
ac
C9
(O
z
UJ
r
o
o
cr
252
-------
o
a.
.oniinin9onini*9oo
OoooooooCT-«f»ir-t-
Q.
OOOO9OQOOOOQ
ooooooooeooo
oooooooooooooooooooo
o
o
o
in
I
i—
o
a:
UJ
i-k
I
z
o
« «
« « «
« « «
o
IV
o
o
o
9
o
z
to co
co —•
ui
uj
_l
»
UI
_)
<
>
u.
o
•
o
a: o o o o o
31
O
3
UI
UJ
3
O
•
o
=T N U.
CO O
tM »%
• • »•»
HI * Z
•» UI
z u.
UJ UJ
O
o
in
fu
•«
<0
•
in
«
in
z
UJ
H
2
Z
•<
UJ
I
o
z
a
z
CO
UI
X
o
ni
a*
to
UJ
z
»*
z
a
z
o
CO
UI
z
253
-------
z
O
DC
oc
O
u
O
K-
3
«
«
*
« K> to a rvi " -e o M in « o a> — < « o r~ m »
000000000
O O O O O O O O O O O O O
254
-------
255
-------
•o
z *
o
u +
z
h. m
*
O
kl (VI
IT I
an
o
•9
IU
1C
o
DC
O
O I
Z I-
o z
UJIL.
a K.
Kill
oo
uo
o tooo
w
«J
z
o
u
.— -ccr——
nj -o
a m
l
t l(u
«
ae
cs >•
< <
co
<9 a;
258
-------
*
«
*
*
I
»
I
I
I
*
I
I
*
I
I
*
• u
o
IV
u
^ * o — — »
I I I At I I
i tnii I inii I i
ui
tr
i • • — I I i
a
o
99 4 94 9
257
-------
APPENDIX C
BASIC STATISTICS, HISTOGRAMS, AND AUTOCORRELATIONS
FOR THE GEOGRAPHICAL DATA GROUP
NEW YORK CITY, NY

[Basic statistics, histogram, and autocorrelation printouts for this station; the tabular output is not legible in the scanned original.]
PASADENA, CA

[Basic statistics, histogram, and autocorrelation printouts for this station; the tabular output is not legible in the scanned original.]
SPRINGFIELD, MA

[Basic statistics, histogram, and autocorrelation printouts for this station; the tabular output is not legible in the scanned original.]
NEWHALL, CA

[Basic statistics, histogram, and autocorrelation printouts for this station; the tabular output is not legible in the scanned original.]
PHOENIX, AZ

[Basic statistics, histogram, and autocorrelation printouts for this station; the tabular output is not legible in the scanned original.]
NAPA, CA

[Basic statistics, histogram, and autocorrelation printouts for this station; the tabular output is not legible in the scanned original.]
PHILADELPHIA, PA

[Basic statistics, histogram, and autocorrelation printouts for this station; the tabular output is not legible in the scanned original.]
DENVER, CO

[Basic statistics, histogram, and autocorrelation printouts for this station; the tabular output is not legible in the scanned original.]
ALEXANDRIA, VA

[Basic statistics, histogram, and autocorrelation printouts for this station; the tabular output is not legible in the scanned original.]
NORFOLK, VA

[Basic statistics, histogram, and autocorrelation printouts for this station; the tabular output is not legible in the scanned original.]
BARSTOW, CA

[Basic statistics, histogram, and autocorrelation printouts for this station; the tabular output is not legible in the scanned original.]
APPENDIX D
BASIC STATISTICS, HISTOGRAMS, AND AUTOCORRELATIONS
FOR THE LONGITUDINAL DATA GROUP
WASHINGTON, DC - 1966

[Basic statistics, histogram, and autocorrelation printouts for this year; the tabular output is not legible in the scanned original.]
WASHINGTON, DC - 1968

[Basic statistics, histogram, and autocorrelation printouts for this year; the tabular output is not legible in the scanned original.]
WASHINGTON, DC - 1969

[Basic statistics, histogram, and autocorrelation printouts for this year; the tabular output is not legible in the scanned original.]
WASHINGTON. DC - 1970
— =» 4> r-
«»-O
c<»~
uj to
z c
UJ
x
o>
a
1C
o
o
o
o
II
r
o:
UI
(9 «
Z A
•-i
X O
CO "-•
< -I
c u.
u o
c
c r
r-
o-
u. u.
o o
UJ U)
O O O
CO U U
o
o
r»
rv
u
CO
UJ
tsl
U.
o
tr
C5
O
*-
CO
•-*
X
in
in
•o
o
•o
•
•c
u.
o
z
o
ZJ
CD
« * «
o « *
z —
o —
O
c
•c
o
o
o
»- in — c
4 1C " O C
^^ CO 9 • •
cc. . . o in
< in o> 9
u.
O
u
c
UJ
ae
co
UJ
u.
O
O
z
CO
u
co
u
fe*
CO
«
te
ec
»
r-
z
UJ
~ O
»^
i- u.
7. U.
n
w
II
UJ
o
S.
o
z
nj
z •< >
u. >
z •-
O K CO
£ CO UJ
UJ X
I I IB
t- O •-
9 _l X
U
UJ
CO
Ul
X
o
t-»
I
c
z
o
u
in
O
o
If
•a
c
o
i.
o
co
••*
X
CO
CO
u.
o
o
z
oocooooooeo
»-ooooooooooo
z
318
-------
WASHINGTON, DC - 1970
MXXXXXXXMXXXXXKXMXMMXXXXXXXMXXXMXXXXXXXXXXXXXXXXM
XXXXXXXMXXXWXWXXXXXXXXMMXXXXXXMXMXXXXXXXXXXMXXXXX
X
o
siac— •• — ococoooooooeocccocoecccoocecocccccocoecoeco
OCCOOOOOOOC;OOOOOOOOCOOOCOCOCCOOOOOOOCCOCCOOOOCOOO
If
•
o-
z
«
z
a
U)
a
IS
a:
c
coooooocoocoooccocooooococooooococeccc-oococococc
c
lu
n
lu
3
ooooooooooooooooooooo^oooooooooooooocoooooooooooo
u.
O
319
-------
WASHINGTON, DC - 1970
•O *
§* I
•
>-* i
U •*•
« *
« « «
« « « «
«
« « *
(9
t-
_>
IU
te
O
UJ
oc
CE
O
U
o
in ni *
oc »
o
U 9
O I
-o *
CO •*>
oz
l-i UJ
ID It. O
ac u.
ce IK
oo
oo
oooooooooooooocooooooooooooooooooooooooeooocooooo
«
«
*
«
«
UJ
u
c
u
— -foct^ccc—e —
— —(MfV" — occooo
0)
CE
<
a.
(9 >• O O — — I
«t « f»..«^. ...>>..••••.<..•
jo oooeoooooooooooooooococ.
320
-------
WASHINGTON. DC • 1970
«•
«
«
«
•
•e
« ««*«««*« «««
« « « « « «
t>J
u
z
«
*rt
>
O
<
r>
o
9
•
«
oocooooooooooolllltooocol z
z
z
o
»^
i. ....................... a
iCOCCCCOOCOCOCOCOCCCCOCCC IK
I I I I I I O
u
o
-O « IT IT 51
> Kl C
321
-------
WASHINGTON, DC - 1971

[Basic statistics, histogram, and autocorrelation printouts for this year; the tabular output is not legible in the scanned original.]
WASHINGTON, DC - 1972

[Basic statistics, histogram, and autocorrelation printouts for this year; the tabular output is not legible in the scanned original.]
WASHINGTON. DC - 1973
O *) CO O CC 9 H> CO «• It in •£
_)
a.
<0
Kl
K>
•
M
—a IT r~«s « »•
o
DC
a
z o
« ILI
O a
O
f-
LJ CO
O
05
Z b.
O ^
»- -J
ir «
z =»
•-*
X O
CO 1-1
C U-
st
IM x
t- IM
O- C
O M
IT -C
•u in
« (0
O> l-l
UJ CO
Z O
3E I-
Ul IK
at 3
to x.
u. u.
o c
C U. U.
—. tfi t.t
IM O O
^} O U
9
oc
I/I
O
IM
CO
o
o
in
SI
IV
1C
o- • u.
r^ C
•o «
-c
•
IV
fV
c
•
o
o
c
o
c
in
u •«
I
o •
X 03
Ul
_l =>
< -I
> <
a =»
UJ
•- o
z a:
o •
o
u. z
o
o
I- O
^
»*
Z
UJ
O
Z
I
>-
7
r-
_i
«
>
t~
CO
UJ
2
0
_l
«
>
»-
CO
UJ
z
^*
I
o
z
c
«^
it.
«0
or
u.
c
Z
4
I
z
o> n
05 z
CO
U.'
ooooooocooo
r • * ******
« >
» a.
o
•
o
330
-------
WASHINGTON, DC - 1973
oooooocoooooccoooooooooooooooooooooococccooooococ
ooooooooooooooooooooooooooooooocoooooooooooooccoc v>
ooooooc^oooooooooooooooooooooooooooooooooooooooooo
331
-------
WASHINGTON, DC - 1973
[Computer-printout plot page, continued; plot characters not recoverable from the scanned copy]
332
-------
WASHINGTON, DC - 1973
[Computer-printout plot page, continued; plot characters not recoverable from the scanned copy]
333
-------
TECHNICAL REPORT DATA
(Please read Instructions on the reverse before completing)
1. REPORT NO.
EPA-600/4-79-040
2.
3. RECIPIENT'S ACCESSION NO.
4. TITLE AND SUBTITLE
TESTING THE VALIDITY OF THE LOGNORMAL PROBABILITY MODEL:
COMPUTER ANALYSIS OF CARBON MONOXIDE DATA FROM U.S. CITIES
5. REPORT DATE
June 1979
6. PERFORMING ORGANIZATION CODE
7. AUTHOR(S)
Wayne R. Ott, David T. Mage, and Victor W. Randecker
8. PERFORMING ORGANIZATION REPORT NO.
9. PERFORMING ORGANIZATION NAME AND ADDRESS
10. PROGRAM ELEMENT NO.
1AD8TI
11. CONTRACT/GRANT NO.
N/A
12. SPONSORING AGENCY NAME AND ADDRESS
Office of Monitoring and Technical Support
Office of Research and Development
U.S. Environmental Protection Agency
Washington, D.C. 20460
13. TYPE OF REPORT AND PERIOD COVERED
14. SPONSORING AGENCY CODE
EPA/600/19
15. SUPPLEMENTARY NOTES
16. ABSTRACT
A stratified sample consisting of 11 data sets from an original list of 166 CO
air quality data sets in SAROAD was selected as a "national cross section" of observed
U.S. CO concentrations, along with a longitudinal group of data sets from a single
air monitoring station. The adequacy of the 2-parameter lognormal probability model
(LN2) was tested using these data. A special-purpose computer program was developed
for calculating the parameters of the LN2 model using four different techniques:
(1) direct "method of moments," (2) Larsen's "method of fractiles," (3) "maximum
likelihood estimation" (MLE) for grouped data using an MLE approximation technique,
and (4) MLE for grouped data using computer optimization.
The goodness-of-fit of the LN2 model to the data was evaluated using frequency-
based approaches (e.g., chi-square, magnitude of log-likelihood function, Kolmogorov-
Smirnov measures) and variate-based approaches (e.g., the difference between the con-
centration predicted by the model for each interval and the concentration actually
observed). The findings show that the method of calculating parameters for the LN2
model exerts a profound influence on the goodness-of-fit of the model to the data,
particularly at the higher concentrations of greatest interest for protecting public
health. The method of fractiles performs best for the variate-based tests but poorest
for the frequency-based tests, suggesting that it is responding more to the randomness
at the extremes than to the underlying distribution of the process.
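The first of the parameter-calculation techniques named above, the direct method of moments, has a simple closed form: for the 2-parameter lognormal model, the log-scale parameters follow directly from the sample arithmetic mean and variance. The sketch below is a minimal illustration of that relationship only; it is not the report's special-purpose program (which worked on grouped SAROAD data), and the sample concentrations are invented for the example.

```python
import math

def ln2_method_of_moments(data):
    """Estimate the LN2 parameters (mu, sigma of the underlying normal
    distribution of ln x) from the sample arithmetic mean and variance,
    using the standard lognormal moment relations."""
    n = len(data)
    m = sum(data) / n                          # arithmetic mean
    s2 = sum((x - m) ** 2 for x in data) / n   # (population) variance
    sigma2 = math.log(1.0 + s2 / (m * m))
    mu = math.log(m) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)

# Hypothetical hourly CO concentrations in ppm (not from the report's data sets).
sample = [2.1, 3.4, 1.8, 5.6, 4.2, 2.9, 7.5, 3.1, 2.4, 6.0]
mu, sigma = ln2_method_of_moments(sample)
gm = math.exp(mu)      # geometric mean of the fitted LN2 model
gsd = math.exp(sigma)  # geometric standard deviation
```

By construction the fitted model reproduces the sample's arithmetic mean, since exp(mu + sigma^2/2) equals the mean of a lognormal variate; the fractile and maximum-likelihood methods compared in the report generally yield different (mu, sigma) pairs for the same data.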
17. KEY WORDS AND DOCUMENT ANALYSIS
a. DESCRIPTORS
Mathematical models, Computer simulation, Systems analysis, Statistical analysis,
Pollution - Air pollution, Sanitary engineering, Environmental engineering,
Civil engineering
b. IDENTIFIERS/OPEN ENDED TERMS
Environmental modeling, Data analysis, Computer techniques, Probability models,
Environmental monitoring, Environmetrics, Decision analysis
c. COSATI Field/Group
04B, 05C, 06F, 12A, 12B, 13B
18. DISTRIBUTION STATEMENT
RELEASE TO PUBLIC
19. SECURITY CLASS (This Report)
UNCLASSIFIED
20. SECURITY CLASS (This page)
UNCLASSIFIED
21. NO. OF PAGES
22. PRICE
EPA Form 2220-1 (9-73)
334
-------