EPA-650/4-75-005
OBJECTIVE PROCEDURES FOR OPTIMUM LOCATION
OF AIR POLLUTION OBSERVATION STATIONS
by
C. Eugene Buell
Kaman Sciences Corporation
1500 Garden of the Gods Road
P. O. Box 7463
Colorado Springs, Colorado 80933
Contract No. 68-02-0699
EPA Project Officer: Kenneth L. Calder
Meteorology and Assessment Division
Environmental Sciences Research Laboratory
Research Triangle Park, North Carolina 27711
Prepared for
U.S. ENVIRONMENTAL PROTECTION AGENCY
Office of Research and Development
Environmental Sciences Research Laboratory
Research Triangle Park, North Carolina 27711
June 1975
-------
DISCLAIMER
This report has been reviewed by the Environmental Sciences
Research Laboratory, U.S. Environmental Protection Agency, and approved
for publication. Approval does not signify that the contents necessarily
reflect the views and policies of the U.S. Environmental Protection
Agency, nor does mention of trade names or commercial products constitute
endorsement or recommendation for use.
-------
ACKNOWLEDGMENTS
We express our appreciation for the assistance of
Mrs. Barbara Byerly for basic programming and especially
to Mr. Edwin Bathke for invaluable assistance in solving
our major problems and preparing this report.
-------
ABSTRACT
This document is concerned with the development of
linear regression techniques for interpolation of air pollutant
concentrations over an area and, using these techniques, the
construction of a computer program for determining the
optimum location of air pollution observing stations. The
general interpolation problem is surveyed in the first
chapter and the advantages of using linear regression formulas
as interpolation formulas are discussed. Special emphasis is
placed on the case in which the observations contain errors
of observation or effects of limited range of influence.
Since the use of linear regression methods depends on know-
ledge of the two-point correlation function for pollutant
concentration measures, the construction of correlation co-
efficient functions from synthetic data is taken up, together
with methods for interpolation of this information. Con-
siderable attention is given to the estimation of residual
variances or the effects of limited range of influence. It
is pointed out that certain aspects of Factor Analysis can
be used for this purpose. These methods are extended to
a continuous formulation of the problem in integral equation
form and it is shown that the lack of accuracy in the strictly
mathematical process of solution of the integral equation tends
to be more important than the statistical significance of
the data unless the residual variances are removed. If this
is done, then the tests for accuracy and statistical significance
are reconciled.
The computer program depends heavily on this last point.
It appears to work well when the residual variances are care-
fully handled. Many of the difficulties encountered in this
program were traced to this source so that users of this program
should be aware of this in constructing the input materials.
The reader's attention is directed to Chapter IV where this
is discussed in detail.
iii
-------
PREFACE
The spatial distribution of air quality over an urban
area represents a statistically random field of pollutant
concentration since, even in principle, it is not possible
to specify its structure in a deterministic fashion, but
only in terms of various statistical properties. The space-
and time-variability of air quality thus present difficult
questions that are very inadequately and simplistically
analyzed at the present time, and of which a fundamental
understanding is still lacking. The accuracy of any analysis
that utilizes air quality data evidently depends on the
intrinsic accuracy of the data and the density of the sampling
network at which the data are available, since based on
the latter values, it will normally be necessary to interpolate
values for intermediate locations. An analysis of the criteria
for objective selection, i.e., that does not involve personal
or subjective judgment, of the optimum sampling network does
not exist at the present time and is urgently needed. An
"optimum network" is here meant in the sense of a network
that is free from redundant observations, namely, data that
could be derived with "sufficient accuracy" by some specified
interpolation procedures from the given sampling network.
Similar decisions are required in establishing meteorological
observation networks and in this case have received a great
deal of attention that is reported in an extensive modern
literature. It is now required to develop and extend the
appropriate statistical methodology so that it will be directly
applicable to the selection of air quality sampling networks,
and having due regard to both the cost and informational
content of the network. At the time the present study was
initiated this appeared to be of particular importance in
view of the then forthcoming EPA Regional Air Pollution
Study which required the establishment of both a large air
quality sampling network and also an extensive meteorological
network.
iv
-------
Very early in the research described in this report
the exceptional subtleties and sophisticated difficulties of
the optimization problem became apparent, and an unexpectedly
large number of unforeseen statistical, mathematical and com-
putational problems were uncovered. Several different approaches,
even involving variations of the logical structure of the
problem, were explored by the contractor in a highly innovative
fashion. Because of time limitations it was finally decided
to specialize the problem and to study in some detail a one-
step-at-a-time add-on method of locating sampling stations. For
this it is assumed at the outset that there are a few existing
observation points, or at least a few points at which obser-
vations will be made based on prior considerations. On the
basis of this starting network of observation points, a pro-
cedure is then developed to determine the location where the
statistical error of estimate, using a simple linear-type
interpolation formula in terms of the observed network con-
centration values, would be largest. This point would then
be accepted as a best location for a new observation point,
and the process repeated in an iterative fashion until the
required number of station locations had been determined.
Unfortunately, a fully operational and purely objective
computerized program was not achieved within the scope of
the present contract. However, in support of further study
of the problem, it is considered very desirable to make
readily available a complete account of the present research.
This is now offered in the hope that it will become a major
contribution towards future resolution of an exceptionally
difficult and subtle problem that continues to be of major
importance in defining the spatial distribution of air quality.
Research Triangle Park,
North Carolina
December 1975
Kenneth L. Calder
Chief Scientist
Meteorology & Assessment Division
Environmental Sciences Research Lab,
v
-------
TABLE OF CONTENTS
Page
ABSTRACT iii
PREFACE iv
TABLE OF CONTENTS vi
CHAPTER I. INTERPOLATION OF POLLUTANT CONCENTRATIONS 1
A. Interpolation Between Observation Points 1
B. Linear Regression as an Interpolation Formula 6
1. Derivation of the Basic Relations 6
2. Interpretation of Results 9
a. The point P at a point of P_i 9
b. Continuous Correlation Coefficient 10
c. Extremely Discontinuous Correlation
Coefficient 11
d. Discontinuous Correlation Coefficient 12
e. Range of Influence 13
f. Small Scale Effects 14
g. Summary 15
3. Error of Interpolated Values 16
a. Implications for Measurement Locations 17
CHAPTER II. DETAILED FORMULATION OF THE PROBLEM 19
A. Area Covered 19
B. Observation Collection 20
C. Source Locations, Types, Rates 21
D. Meteorological Conditions 21
E. Type of Pollution Measurements 22
F. Interpolation/Extrapolation Methods 22
G. Optimization Method 28
vi
-------
TABLE OF CONTENTS (Cont'd)
Page
CHAPTER III. THE CORRELATION OF POLLUTANT CONCENTRATION BETWEEN POINTS 30
A. The Model Used to Generate Synthetic Data 30
B. Examples of the Correlation Coefficients 32
C. Interpolation of the Correlation Coefficients 36
D. Generation of Synthetic Pollution Correlation Coefficients 42
1. Main Program 42
2. Subroutines 43
CHAPTER IV. THE ANALYSIS OF THE COVARIANCE MATRICES 56
A. Evaluation of the Effect of Errors of Measurement or of Small Scale Phenomena 56
B. Determination of the Residual Variances 61
1. Representation of a Matrix in Terms of Proper Functions/Proper Values 62
2. Factor Analysis Methods 64
3. Integral Equation Methods 70
a. The Jump Discontinuity 72
b. The Quadrature Factors 74
c. The Product Integral Technique 85
d. Test for Accuracy of Solutions 88
C. Analysis of St. Louis SO₂ Data 91
1. The Basic Data 91
2. Quadrature Factors and Techniques 98
3. Factor Analysis Methods 111
D. Summary and Conclusions 123
CHAPTER V. PROGRAM BAST AND ITS SUBROUTINES 125
A. Program BAST 126
B. Subroutine CIRCUM 148
C. Subroutine FMFP 151
D. Subroutine FUNB 158
E. Subroutine FUNCT 161
vii
-------
TABLE OF CONTENTS (Cont'd)
Page
F. Subroutine CORFUN 167
G. Subroutine INT2D 173
H. Subroutine MATINV 175
I. Subroutine ADOPT(XO,YO) 178
J. Subroutine TRIFIX 184
K. Subroutine PORDR 188
L. Subroutine AR2 190
M. Subroutine TRITST 192
REFERENCES 194
APPENDIX A: Conditions on an Empirical Formula
for a Correlation Coefficient A-1
APPENDIX B: Note on the Rank of a Covariance Matrix B-1
viii
-------
CHAPTER I
INTERPOLATION OF POLLUTANT CONCENTRATIONS
The relationships between the correlation function of
pollutant concentrations at two points and the spacing of
observation points are discussed in the first two sections.
The first section is devoted to a simple analysis of the error
of interpolation using standard interpolation methods. This
serves to introduce some of the basic ideas involved. The
next section is devoted to the use of a linear regression as
an optimum interpolation formula. Several important aspects
of the problem are covered, particularly the role of errors
and small scale effects. The use of the linear regression as
an interpolation formula is intimately connected with the idea
of a Wiener Filter (Wiener, 1949) for both smoothing and
interpolation.
A. Interpolation Between Observation Points
The pollutant concentration is observed at a network of
points P. and has values x. at these points. The total field
of pollutant concentration is depicted by drawing contours of
equal concentration values which are determined by the observed
values. Usually, the values are sketched "by eye". To make
the procedure quantitative for analysis purposes, it is neces-
sary to specify a particular procedure used to determine the
interpolated pollution field. This is done by specifying an
interpolation formula that is used to describe the process.
Examples of such formulas are given in Abramowitz & Stegun,
1964, p. 882. Linear interpolation between two points:
    f(x_0 + ph) = (1-p)f_0 + p f_1
-------
Three point formula (plane fitted to the data):
    f(x_0+ph, y_0+qk) = (1-p-q)f_0,0 + p f_1,0 + q f_0,1
Four point formula (hyperbolic paraboloid fitted to the data):
    f(x_0+ph, y_0+qk) = (1-p)(1-q)f_0,0 + p(1-q)f_1,0 + q(1-p)f_0,1 + pq f_1,1
Six point formula (general quadratic fit):
    f(x_0+ph, y_0+qk) = [q(q-1)/2]f_0,-1 + [p(p-1)/2]f_-1,0
                        + (1 + pq - p² - q²)f_0,0 + [p(p - 2q + 1)/2]f_1,0
                        + [q(q - 2p + 1)/2]f_0,1 + pq f_1,1
In the above formulas f_a,b represents the concentration
of the pollutant at the point with coordinates x = a, y = b,
with the point (x_0,y_0) taken as (0,0). The points (x,y)
are on a rectangular grid spaced h and k units apart in the
x and y directions respectively. The parameters (p,q) are
usually confined to the range (0,1), but not necessarily
so, depending on the formula.
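As a concrete illustration of the four point (hyperbolic paraboloid) formula, the following short sketch evaluates it for one grid cell. It is written in modern Python purely for illustration; the function name and the sample values are not part of the original study.

    def bilinear(f00, f10, f01, f11, p, q):
        # Four point (hyperbolic paraboloid) interpolation on one grid cell.
        # f00, f10, f01, f11 are the values at the corners (0,0), (1,0), (0,1), (1,1);
        # p and q are the fractional distances (0 to 1) in the x and y directions.
        return ((1 - p) * (1 - q) * f00 + p * (1 - q) * f10
                + q * (1 - p) * f01 + p * q * f11)

    # At the centre of the cell the result is simply the mean of the four corners.
    print(bilinear(1.0, 2.0, 3.0, 4.0, 0.5, 0.5))   # 2.5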
The efficiency of the interpolation formula may be
evaluated by estimating the error that would occur. We use
the simple linear interpolation formula as an example to
keep the arithmetic within bounds and because it illustrates
the essential features of the problem. The mean square
difference between the actual value and the interpolated
value may be written as
    E² = [y - (1-p)x_0 - p x_1]²
where the over-bar indicates a suitable average value
-------
Expanding this, and assuming that the mean field has been
removed so that x_0, x_1, and y are departures from the mean,
then
    E² = y² - 2(1-p) x_0y - 2p x_1y + (1-p)² x_0² + 2p(1-p) x_0x_1 + p² x_1²
each term being averaged as above. To further simplify the situation, assume that the concentration
variances at P, where y is measured (the interpolated point),
at P_0 where x_0 is measured, and at P_1 where x_1 is measured are
all the same. We denote this common variance by the symbol
σ². Then note that
    x_0y = σ² r(y,x_0)
    x_1y = σ² r(y,x_1)
    x_0x_1 = σ² r(x_0,x_1)
where r(a,b) is the ordinary correlation coefficient relating
concentrations at the points A and B, where a and b are measured,
respectively. Then
    E² = 2σ²[1 - p + p² - (1-p)r(y,x_0) - p r(y,x_1) + p(1-p)r(x_0,x_1)]
The correlation coefficients not only relate the measured
concentrations, but are also functions of the relative locations
at which the concentrations are measured. Thus, we write
    r(y,x_0) = r[ph]
    r(y,x_1) = r[(1-p)h]
    r(x_0,x_1) = r[h]
where h is the appropriate scale factor, the distance between
the points P_0 and P_1. (p is a dimensionless parameter that is
zero at P_0 and 1 at P_1.) Then the mean square error of the
interpolated value is E². It is readily seen that at p = 0,
r[ph] = 1, and E² = 0 and that at p = 1, r[(1-p)h] = 1 so
-------
that again E² = 0; that is, the interpolation error is zero
at the two data points, which is as it should be.
The important feature of the above relation is that the
mean square error of interpolation depends on the correlation
coefficients as functions of the spacing between data points,
r[h], and of the distance of the interpolated point from the
data points, r[ph] and r[(l-p)h].
Consider now a particular example of a correlation
coefficient which is 1 at zero distance and reduces linearly to
0 at a distance ℓ, and we assume that ℓ is larger than h, the
distance between P_0 and P_1. Then
    r[h] = 1 - h/ℓ
    r[ph] = 1 - ph/ℓ ,   r[(1-p)h] = 1 - (1-p)h/ℓ
The mean square error of the interpolated value may then
be written as
    E² = 2σ² p(1-p) h/ℓ
It is readily seen that the error of the interpolated value
is a maximum at p = 1/2 and has the value there of E² = σ²h/2ℓ.
Under these conditions, we have an explicit expression that
can be used to determine the spacing of the observation points.
Thus, if we specify the maximum allowable mean square error
of interpolation, E², then the distance between data points
may not exceed the value h = 2ℓ(E²/σ²).
The value of ℓ may be thought of as a "range of influence"
of the correlation coefficient. The larger the value of ℓ,
the farther apart the observation points may be spaced. The
spacing also depends on the inherent variability of the data
through the term σ². The more highly variable the data the
-------
closer the observation points must be spaced to achieve the same maximum mean
square error of interpolation.
Other analytical expressions may be used for the
correlation coefficient and the location of the point of
maximum error may be found. It is readily shown that the
point of maximum mean square error is at p = 1/2 and that the
mean square error of interpolation will be given by
    E² = 2σ²{1 - (1/4)[1 - r(h)] - r(h/2)}.
This expression makes it possible to compute the spacing
between observation points that must not be exceeded when a
mean square error of interpolation is specified and the
correlation coefficient function is known.
The more complicated interpolation formulas for a two
dimensional array of points lead to vastly more complicated
arithmetic, but do not change the essential ideas brought
out by the above elementary analysis. The main idea is that
the mean square error of interpolation depends on the structure
of the correlation coefficient as a function of the distance
separating the points at which the pollutant concentration is
measured. When this structure is known, the spacing of the
observation point to achieve a given mean square interpolation
error may be specified.
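The spacing computation described above can be carried out numerically. The sketch below is modern Python for illustration only; the triangular correlation function and the numerical values of ℓ, h, and the allowable error are assumed, not taken from the study. It evaluates E² as a function of p, recovers the maximum error σ²h/2ℓ, and computes the admissible spacing h = 2ℓ(E²/σ²).

    import numpy as np

    def interp_mse(p, h, r, sigma2=1.0):
        # Mean square error of linear interpolation at fractional position p
        # between two stations a distance h apart, for a correlation function r(d).
        return 2.0 * sigma2 * (1 - p + p**2
                               - (1 - p) * r(p * h)
                               - p * r((1 - p) * h)
                               + p * (1 - p) * r(h))

    ell = 40.0                                   # assumed range of influence, km
    h = 10.0                                     # assumed station spacing, km
    r = lambda d: max(0.0, 1.0 - d / ell)        # triangular correlation function

    p = np.linspace(0.0, 1.0, 101)
    E2 = np.array([interp_mse(pi, h, r) for pi in p])
    print(E2.max(), h / (2.0 * ell))             # both 0.125 = sigma2*h/(2*ell)

    # Largest spacing that keeps the error below a prescribed value E2_max:
    E2_max = 0.05
    print(2.0 * ell * E2_max)                    # h = 2*ell*(E2_max/sigma2) = 4 km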
-------
B. Linear Regression As An Interpolation Formula
This section is devoted to an elementary derivation of
the linear regression estimate of pollutant concentration at a
point P based on observed pollutant concentrations at a network
of points P_i, i = 1, ..., n. It is initially assumed that the
values of pollutant concentration at P are observed. The point
of view is then reversed and the regression equation is considered
from the point of view of an interpolation formula which is used
to estimate the pollutant concentration at P when it is not
observed there.
1 . Derivation of the Basic Relations
Let Y be the pollutant concentration at P and let X_i be
the pollutant concentrations at points P_i, where the P_i are a
network of n observation points, i = 1, ..., n. To simplify
the situation we consider the standardized variables (departure
from the mean divided by the standard deviation)
    y = (Y - Ȳ)/σ_Y ,   x_i = (X_i - X̄_i)/σ_Xi ,   i = 1, ..., n
and we consider the relation
    ŷ = b_1 x_1 + ... + b_n x_n                                    (1)
where ŷ is the estimate of y given the values x_1, ..., x_n.
The least squares procedure for determining the coefficients b_i
leads to the set of equations
    (y x_1) = b_1 (x_1 x_1) + b_2 (x_1 x_2) + ... + b_n (x_1 x_n)
    .......
    (y x_n) = b_1 (x_n x_1) + b_2 (x_n x_2) + ... + b_n (x_n x_n)  (2)
where the parentheses indicate a mean value and
where y is assumed to be a measured value at P. Since
the variables are normalized, these are correlation coefficients.
-------
It will be convenient to write these equations also in the
standard form
    b_1 a_11 + b_2 a_12 + ... + b_n a_1n = g_1
    .......
    b_1 a_n1 + b_2 a_n2 + ... + b_n a_nn = g_n                     (2a)
where a_ij = (x_i x_j) and g_i = (y x_i), and we note that the
matrix of coefficients is symmetric.
The solution for the b_i's may be written in the long
form using Cramer's rule,
    b_i = D_i / D                                                  (3)
where D is the determinant of the matrix of coefficients (a_ij)
and D_i is the same determinant with its i'th column replaced
by the column of values g_j = (y x_j), and also in the form
(Kenney and Keeping, 1951)
    b_i = Σ_j a^ij g_j                                             (3a)
where a^ij is the element from the inverse of the matrix of
coefficients (a_ij). This follows immediately from (3) by
expanding the determinant in the numerator in terms of the sum
of the products of the elements in the i'th column and the
cofactors of this column. The ratio of the cofactor of the
element of the i'th row of the j'th column to the value of the
determinant yields the element a^ij of the inverse (Turnbull,
1960). The solution for finding ŷ may then be written as
    ŷ = Σ_i b_i x_i = Σ_i Σ_j a^ij g_j x_i                         (4)
-------
The important point to be considered now is that the
correlation coefficients (y x_i) = g_i (and also the correlation
coefficients (x_i x_j) = a_ij) are functions of the locations of
the point P and the point P_i (or of the points P_i and P_j).
Since the points P_i are fixed, we focus our attention on
the point P and write g_i = g_i(P,P_i). Then (4) may be expressed as
    ŷ = Σ_i x_i [Σ_j a^ij g_j(P,P_j)] = Σ_j g_j(P,P_j) [Σ_i x_i a^ij]   (4a)
Thus, (4a) may be considered as an interpolation formula since
the location of the point P at which ŷ is estimated appears
explicitly in the correlation coefficient g_j(P,P_j). If we
know the functional form of the correlation coefficients as
dependent on coordinates, then (4a) may be looked on as deter-
mining ŷ from a linear combination of the observed concen-
trations x_i with coefficients which depend on the location of P,
or it may be considered as a linear combination of correlation
coefficients, functions of the location P where the concen-
tration y is estimated, weighted by factors that depend on
the observed concentrations at the points P_i.
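In practice, (2a) through (4a) amount to a single linear solve. The following sketch is modern Python for illustration only; the exponential correlation function, the station coordinates, and the observed values are assumed, and it is not the correlation model developed in Chapter III.

    import numpy as np

    def corr(d, scale=30.0):
        # Assumed isotropic correlation function, for illustration only.
        return np.exp(-d / scale)

    # Station coordinates (km) and standardized observed concentrations x_i.
    stations = np.array([[0.0, 0.0], [20.0, 0.0], [0.0, 25.0]])
    x = np.array([1.2, -0.4, 0.3])

    # a_ij: correlations between stations; g_j: correlations between P and stations.
    A = corr(np.linalg.norm(stations[:, None, :] - stations[None, :, :], axis=-1))
    P = np.array([8.0, 10.0])
    g = corr(np.linalg.norm(stations - P, axis=1))

    # Normal equations (2a): A b = g; interpolated value (4): y_hat = sum_i b_i x_i.
    b = np.linalg.solve(A, g)
    print("weights:", b, " estimate:", b @ x)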
-------
2. Interpretation of Results
The results of the simple linear regression for pollutant
concentration ŷ at a variable point P in terms of observed
pollutant concentrations x_i at a fixed network of points P_i
are discussed below. The point of view in this discussion
are discussed below. The point of view in this discussion
is that the linear regression is a particular realization of
a "Wiener filter" for the interpolation and smoothing of ob-
served information on pollutant concentrations (Wiener, 1949).
The elaborate mathematical apparatus of N. Wiener's original
treatment is abandoned in favor of a more general point of
view so that more general results are obtained (at least in
a limited sense). We discuss several particular situations
that illustrate the kind of results that can be obtained.
a) The point P at a point of P_i
If the point P at which concentrations y are measured
coincides with a point of the observing network, say P_k, and
if the data on y is identical with the values of x_k at P_k, it
is readily seen that (y x_i) = (x_k x_i) and that, from (3), b_k = 1
and b_j = 0 if j ≠ k. (In the first case the kth column of the
numerator in (3) is exactly that of the denominator; in the
second the kth column of the numerator is exactly the same as
the jth column and hence the determinant has the value zero.) In
this case ŷ = x_k is the result of the use of the regression,
which is precisely what it should be.
-------
b) Continuous Correlation Coefficient
Consider the case in which the correlation coefficients
g_j(P,P_j) are continuous functions of P and for which g_j → 1 as
P → P_j. A one dimensional schematic of this situation is shown
in Fig. I-1a.
Fig. I-1a. Schematic illustration of the Correlation Coefficient
g_j(P,P_j) as a function of (P - P_j).
The use of the interpolation
formula (4) or (4a) leads to a smooth interpolation of the data
at points which lie between the data points P_i, and the
interpolated values lie on a surface that passes through the
data values x_i. This is shown schematically in Fig. I-1b for
a one dimensional situation.
Fig. I-1b. Schematic illustration of the interpolation between
data points for correlation coefficients that are continuous.
10
-------
c) Extremely Discontinuous Correlation Coefficient
The correlation coefficient as a function of the location
of P with respect to P_j, as in g_j(P,P_j), must be a continuous
function of P - P_j, except that it may have a jump discontinuity
at the origin P - P_j = 0. An extreme case is that in which
g_j(P,P_j) = 1 when P = P_j and is equal to zero if P ≠ P_j. This is
illustrated in Fig. I-2a.
Fig. I-2a. Schematic illustration of the extreme discontinuity
possible for a correlation coefficient in which it has the
value 1 at P - P_j = 0 and has the value 0 at P - P_j ≠ 0.
The resulting interpolation for
this kind of a correlation coefficient is illustrated in Fig. I-2b.
The interpolated values are zero between the data points, but
at the data point the observed data values are obtained.
Fig. I-2b. Schematic of the interpolation between data points
when the correlation coefficient is that illustrated in Fig. I-2a.
11
-------
d) Discontinuous Correlation Coefficient
When measured values are subject to independent, random
errors, the correlation coefficient has a (small) jump discon-
tinuity at P = P_j which is dependent on the relative values
-------
e) Range of Influence
The "range of influence" of a correlation coefficient
is of importance in the discussion of observation network density,
This is loosely defined as the distance P - P_j over which the
correlation coefficient is significantly different from zero.
A correlation coefficient with limited range of influence is
illustrated in Fig. I-4a.
Fig. I-4a. Schematic of a correlation coefficient with a limited
"range of influence".
When the data points are spread out
so that no one of them is within the "range of influence" of another,
the interpolation formulas (4) and (4a) lead to the results illustrated
in Fig. I-4b.
Fig. I-4b. Schematic of interpolation when data points are so
sparsely located that one is not within the "range of influence"
of another.
It is immediately apparent that for an adequate
network of observation points, the distance between points
must be small enough that each data point is within
the "range of influence" of some other data point.
13
-------
f) Small Scale Effects
In dealing with atmospheric problems, the influence
of small scale effects must be adequately accounted for (or
adequately smoothed out). These show up as a small hump on
the correlation coefficient peaking it sharply upward at
P - P. = 0 as illustrated in Fig. I-5a.
0
0
P - P.
Fig. I-5a. Schematic illustration of a correlation coefficient
showing both large and small scale effects.
The effect on the interpolation formula (4) or (4a) is shown
in Fig. I-5b.
Fig. I-5b. Schematic illustration showing the results of small
scale effects on the interpolation of data.
14
-------
g) Summary
The implications of the above illustrative examples for
the determination of observation network density are immediate.
The first result is that the network of observation points
should be sufficiently dense that each data point is within the
"range of influence" of another data point. The second important
result is that the effects of errors and small scale effects
must be carefully determined and the interpolation formula used
in such a way that there is adequate filtering of these effects
in interpolating the results of observations.
15
-------
3. Error of Interpolated Values
The mean square error of the interpolated values may be
written at once from the results of the least squares
estimation (interpolation) formulas (1), (2), (2a), (3), and
(3a). The expression for the mean square error is
    E² = (y²) - (b_1 g_1 + ... + b_n g_n)                          (5)
    E² = (y²) - Σ_i b_i g_i                                        (5a)
and using (3a) for the value of the b_i's, one obtains
    E² = (y²) - Σ_i Σ_j g_i a^ij g_j                               (5b)
The expression (5b) gives the mean square error E² in
terms of the mean square deviation of pollutant concentration
at the point P, (y²), the correlation coefficients involving con-
centration at the point P and the measurement points P_i,
g_i(P,P_i), and constants involving the geometry of the measure-
ment points, a^ij. The value of the mean square deviation, (y²),
is readily estimated from the field of observation point values
(x_i²). Since we have used the normalized form, the values of (x_i²)
are all 1 and the value of (y²) would be taken as 1 also.
The expression (5b) may be written to show the dependence
of E² on the location of the point P, by displaying this
dependence in the correlation coefficients g_i, g_j:
    E²(P) = (y²) - Σ_i Σ_j g_i(P,P_i) a^ij g_j(P,P_j)              (5c)
16
-------
The mean square error of interpolation for pollutant
concentration is illustrated schematically for a one-dimensional
example in Fig. I-6 (assuming a continuous correlation coefficient
function as in Fig. I-1b or Fig. I-5b).
Fig. I-6. Schematic illustration of the mean square error of
interpolated values for a one-dimensional case.
a) Implications for Measurement Locations
The implications of the above for determining the location
of additional measurement points is reasonably obvious. We
assume that it is desired to find the location of a few more
observation points to improve the observation network. On the
basis of (5c) one may easily compute the field of values of E2
as a function of the location of an interpolation point P. The
point where E2(P) is a maximum is the point where there is the
greatest error in interpolated values. This could then be a
point where an observation of pollutant concentrations could
contribute the most information to the augmented network system.
There are many practical considerations that must be
taken into account in the location of observation points. These
may be readily accounted for in the computation procedures. One
may, for example, prescribe in advance the location of many
"feasible" observation points and compute the mean square errors
17
-------
of interpolation at these "feasible" points. The "feasible" points
at which the mean square error of interpolation is largest
would then be candidates for the augmented set of additional
observation points.
Note that the selection of additional observation points
is a step-wise procedure. One locates the first additional
observation point where the mean square error of interpolation
is maximum (a "feasible" point). This point is then added to
the network so that there are now n+1 points considered and the
computation of E2 is redone on the basis of the augmented network.
The maximum of E2 is found for this augmented case and a second
(feasible) observation point located. The network now contains
n+2 points and E² is computed on this basis, etc., etc.
(Note: This kind of iterative procedure can be shown
to lead to a solution that is not necessarily optimum in the
general case, but it is a very practical approach and the
results are usually not far from an optimum.)
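A compact sketch of this step-wise procedure is given below, in modern Python for illustration only. The correlation function, the starting network, and the grid of "feasible" candidate points are all assumed here; the report's own programs are described in Chapter V.

    import numpy as np

    def corr(d, scale=30.0):
        # Assumed correlation function; in the report it comes from Chapter III.
        return np.exp(-d / scale)

    def mse_field(stations, candidates):
        # E^2(P) from (5c), with normalized variance (y^2) = 1, at each candidate point.
        A = corr(np.linalg.norm(stations[:, None] - stations[None, :], axis=-1))
        Ainv = np.linalg.inv(A)
        G = corr(np.linalg.norm(candidates[:, None] - stations[None, :], axis=-1))
        return 1.0 - np.einsum('pi,ij,pj->p', G, Ainv, G)

    # "Feasible" points on a regular grid over the area of interest (assumed).
    xs, ys = np.meshgrid(np.linspace(0.0, 80.0, 17), np.linspace(0.0, 80.0, 17))
    candidates = np.column_stack([xs.ravel(), ys.ravel()])

    stations = np.array([[10.0, 10.0], [60.0, 20.0]])    # assumed starting network
    for _ in range(3):                                   # add three stations, one at a time
        E2 = mse_field(stations, candidates)
        new = candidates[np.argmax(E2)]                  # point of largest error
        stations = np.vstack([stations, new])
        print("added station at", new, "max E^2 was", round(E2.max(), 3))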
18
-------
CHAPTER II
DETAILED FORMULATION OF THE PROBLEM
The problem of determining an optimum network of air
pollution observation sites consists, first of all, of
determining what particular characteristics of the pollution
field are of primary interest, how the data are to be used
to obtain an estimate of these characteristics, and then to
specify what it is that is to be optimized and with what
restrictions this optimum is to be obtained. An extensive
monograph might be written in the exploration of the various
aspects of the problems involved in the items that have been
enumerated above. Rather than go into all of this detail,
we will short-cut these considerations and set down some
rather simplified ground rules that will be followed.
A. Area Covered
In order to formulate the problem of specifying the
optimum location of air pollution observation stations it
is necessary to start by defining the specific area over
which the pollution is to be observed. This is sufficiently
obvious that it scarcely needs any explanation. If one is
to optimize the location of air pollution stations in the
area of Dirty City, they need to be located in the vicinity
of Dirty City and not in Clean County, because Clean County
is not the area with which we are concerned. In fact, one
must be even more specific and define exactly what constitutes
"in the area of Dirty City". This involves specifying a size,
shape, and location. It may be, for example, that Dirty
City is a compact area of more or less uniform diameter in
any direction with this area centered approximately at the
City Hall. In such a case, an adequate specification might
well consist of the statement that the area to be covered
is a circular area centered at the Dirty City City Hall
19
-------
with a radius of 23 km (say). This implies that one is
to optimize the location of air pollution observation stations
throughout this particular circular area. It may well be
that one is also interested in air pollution measurements
at distances farther than 23 km from City Hall, but these
locations will not be a part of this particular air pollution
station location problem.
B. Observation Collection
It is assumed that the data collected from the air
pollution measuring sites will be measured simultaneously
at all locations. The term simultaneous is used somewhat
loosely and depends to some extent on the nature of the
measurements to be used. If, for example, one is concerned
with one-hour integrated air pollutant concentrations, the
averaging should be done over the same clock hour at all sites;
but a difference of five minutes or so between stations for
the beginning and ending of this one hour integration period
is relatively unimportant. In other words, the differences
in time of beginning and ending of the totalizing interval
are a relatively small fraction of this interval itself. If
the averaging is considered on a 24 hour basis, then the
exact beginning and ending of the averaging interval at each
location can be correspondingly relaxed. On the other hand,
if observation data at the different locations are taken at
different clock hours, then it may well be that a comparison
between stations is meaningless.
20
-------
C. Source Locations, Types, Rates
It is assumed that the locations, types and emission
rates of the pollution sources are known or at least can be
estimated with reasonable accuracy. The basic reason for this
is the fact that it is completely impossible to even approach
the problem of the intelligent location of pollution obser-
vation sites without this kind of information. If one were
interested in pollution at only one point, then obviously
one locates a measuring site at this point. When the problem
is to locate a network of measuring points, then the question
immediately arises concerning the relation of pollution
measurements at the several points with each other. If two
(or more) pollution measurements essentially duplicate each
other, then one is unnecessary (or at least of much less
value). On the other hand, if measurements at two locations
are completely unrelated to each other, then obviously
additional measurements are desirable between them. The
locations, types and emission rates of the sources involved
are quantities which determine how the measurements at related
points are going to be related, or unrelated, to each other.
A more exact specification of the relationship between the
pollution measurements at different locations is discussed
at some length in Section G below.
D. Meteorological Conditions
The meteorological conditions that prevail over the
area concerned are responsible for determining how the
pollutants are carried from their source locations to other
points and whether they tend to be concentrated near the
ground or carried high into the atmosphere. The principal
factors concerned here are the winds in the lower atmosphere
and the stability (or instability) of the lower layers of the
atmosphere. These conditions need to be carefully specified
21
-------
and should adequately describe the conditions over the area
concerned. If in the area concerned the wind has a strongly
predominant direction, it would seem reasonable to locate
pollution observation points down-wind from the more important
pollution sources. Since stable conditions tend to hold
pollutants near the ground, it would seem reasonable to locate
observation points in the down-wind direction from the important
sources where this wind direction is that which prevails
when the air is stable. Consequently, the meteorological
conditions need to be specified as the frequency of occurrence
of various classes dependent on wind direction, wind speed,
and stability.
E. Type of Pollution Measurements
The type of pollution measurements to be made is another
factor contributing to the optimum location of pollution
measuring locations. If hourly average concentrations are
of primary interest, it would appear necessary to have more
closely located observation points than if, say, 24 hour
average concentrations are considered the more important.
F. Interpolation/Extrapolation Methods
The method that is used to interpolate data between
observation points is critical for the formulation of the
optimization technique for site locations. There is an
almost limitless number of techniques that may be used. The
method of interpolation used in this study consists of using
a linear estimate of the pollution concentration based on
the concentrations at the points where the concentrations
are observed and for which the correlation coefficient
function or covariance function is known (or at least can
be approximated with reasonable accuracy). Various aspects
of this process were discussed in Chapter I with illustrations
22
-------
in the case of a one-dimensional field (to simplify the
figures). This provides a highly flexible procedure that can
be adapted to a wide variety of different situations.
The interpolation/extrapolation method then consists
of estimating a measure of pollutant concentration at a point
(ξ,η) in the region to be covered, say Ŷ = Ŷ(ξ,η), from a linear
combination of observations, X_i = X(ξ_i,η_i), made at observation
points P_i with coordinates (ξ_i,η_i). The formula for this
estimate is
    Ŷ = b_0 + b_1 X_1 + ... + b_n X_n                              (1)
(where it is assumed that there are n observation points). The
coefficients b_i, i = 0, 1, ..., n, will be determined later.
Because the numbers that describe the pollution concentration vary
over several orders of magnitude and are by no means normally
distributed, the logarithm of the pollutant concentration is
actually used as the measure of concentration. The logarithm
of concentration is more nearly normally distributed (i.e.,
the pollution concentrations tend to be distributed in a
log-normal way).
It is assumed that reasonably accurate estimates of
the mean concentration measure are known. These will be
denoted by Ȳ, X̄_1, ..., X̄_n so that one may use the departures
from the mean, y = Y - Ȳ, x_1 = X_1 - X̄_1, ..., x_n = X_n - X̄_n, with
the result that
    ŷ = b_1 x_1 + ... + b_n x_n                                    (2)
23
-------
The coefficient b_0 that appeared in (1) is determined from
    Ȳ = b_0 + b_1 X̄_1 + ... + b_n X̄_n                              (3)
after the coefficients b_1, ..., b_n have been determined.
(Actually, the problem at hand does not require that any
of the coefficients be computed explicitly; the technique
used is more clearly described if they are included at this
point.) If one had an observation point at the point P,
coordinates (ξ,η), where Ŷ (or ŷ) is being estimated, the
observed value would be Y (or y). If we assume that
actual values y, x_1, ..., x_n are observed in an ensemble
of situations that have been defined by the conditions of
Sections C and D above, then the coefficients b_1, ..., b_n
would be determined from the normal equations
    (x_1 y) = b_1 (x_1 x_1) + b_2 (x_1 x_2) + ... + b_n (x_1 x_n)
    .......
    (x_n y) = b_1 (x_n x_1) + b_2 (x_n x_2) + ... + b_n (x_n x_n)  (4)
where (x_i x_j) is the covariance of the concentration measure
at the points P_i and P_j and (x_i y) is that for the points
P_i and P. The normal equations (4) are obtained by requiring
that the square of the difference between the concentration
measure and the estimate from equation (2), summed over
the ensemble of situations, be a minimum in terms of the
coefficients b_1, ..., b_n. (See standard texts on this
subject, for example Kenney and Keeping, 1951.)
24
-------
If we let a_ij = (x_i x_j) and g_i = (x_i y), then equations
(4) may be written as
    a_i1 b_1 + a_i2 b_2 + ... + a_in b_n = g_i ,   i = 1, ..., n   (5)
These equations may be solved for the coefficients b_1, ...,
b_n, the formal solution being expressed as
    b_i = Σ_j a^ij g_j ,   i, j = 1, ..., n                        (6)
The mean square error of estimate is given by the
expression
    e² = ((y - ŷ)²)
where ŷ is the expression (2). This may be expressed in
the form
    e² = (y²) - Σ_i b_i g_i                                        (7)
(Kenney and Keeping, 1951, or other suitable text). When
the expression (6) for the b_i's is substituted into (7),
the result is
    e² = (y²) - Σ_i Σ_j g_i a^ij g_j                               (8)
In (7) and (8) the term (y²) is the variance of the pollution
concentration measure at the point P.
(Digression. The quantities y, x_i above have been
expressed in terms of departures from the mean as per the
expressions in the text between equations (1) and (2).
The quantities (y²) and (x_i x_i) are the variances of the con-
centration measures at the various points concerned. In
terms of standard deviations, σ and σ_i, (y²) = σ², (x_i x_i) = σ_i².
These may be used to convert the concentration measures to
standardized form (Y - Ȳ)/σ, (X_i - X̄_i)/σ_i, and the argument
remains unchanged with the exception that in (7) and (8)
the value 1 is substituted for (y²) and the error of estimate
e² becomes e²/σ².)
The role of the coordinates of the points concerned in
the preceding relations is important, but is concealed in
the notation. The covariances that appear in the normal
equations (4) and subsequently are functions of the locations
of the points for which the index number appears as a sub-
script with the exception of the point P for which no sub-
script appears. Thus a_ij = (x_i x_j) = a_ij(P_i,P_j) or
= a_ij(ξ_i,η_i; ξ_j,η_j), while g_i = (x_i y) = g_i(P_i,P) or = g_i(ξ_i,η_i; ξ,η).
The elements of the inverse matrix, a^ij, are exceptions to
this. In the process of matrix inversion, all of the elements
of the matrix are involved so that each element of the inverse,
a^ij, involves all of the points P_1, ..., P_n, but not the
point P at which the concentration measure is being estimated.
This point appears only in the terms g_i = g_i(P_i,P). If one
substitutes the expression (6) for b_i into the equation for
the estimated concentration measure, (2), the result is
    ŷ = Σ_i b_i x_i = Σ_i (Σ_j a^ij g_j(P_j,P)) x_i                (9)
where in the second form the location of the point P at
which ŷ is estimated is explicitly shown. If the covariance
26
-------
function g_i(P_i,P), as a function of the points P_i and P,
is known or can be estimated reasonably well, it may be
interpreted as an interpolation formula for ŷ based on
a linear combination of the observed concentration measures,
x_i, at the points P_i. The expression in parentheses gives
the "weight" of each observed measure, x_i, in terms of the
location of the point P with respect to the other points,
P_i, i = 1, ..., n. This is quite analogous to the standard
two-dimensional interpolation formulas, for example those
shown in Section A of Chapter I.
One may also note that if in (9) the point P is the same
as the point P_k, then g_j(P_j,P) becomes g_j(P_j,P_k). Now
by virtue of its definition [preceding equation (5)] this
is only another way of writing a_jk, that is
    g_j(P_j,P_k) = a_jk
The summation in parentheses in equation (9) then reduces
to
    Σ_j a^ij a_jk = δ_ik
where δ_ik = 0 if i ≠ k and = 1 if i = k. This means that
the expression (9), when summed on the index i, ignores all of
the observed values at other points and assigns to the point
P (= P_k) the estimate ŷ = x_k, i.e., the value observed. Note,
however, that the covariance function g_k(P_k,P) is not
necessarily continuous at the point P_k, so that as P approaches
P_k the value ŷ need not approach x_k. The details of this
situation were discussed in Chapter I.
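The reduction above is easy to verify numerically. The short sketch below is modern Python with an assumed correlation function and assumed station coordinates; it checks that Σ_j a^ij a_jk behaves as the unit vector δ_ik and that the estimate at a station point reproduces the value observed there.

    import numpy as np

    corr = lambda d: np.exp(-d / 30.0)                     # assumed correlation function
    pts = np.array([[0.0, 0.0], [15.0, 5.0], [5.0, 20.0]]) # assumed station coordinates
    x = np.array([0.7, -1.1, 0.4])                         # standardized observed values

    A = corr(np.linalg.norm(pts[:, None] - pts[None, :], axis=-1))
    Ainv = np.linalg.inv(A)

    k = 1                                    # place the estimation point P at station k
    g = corr(np.linalg.norm(pts - pts[k], axis=1))   # then g_j(P_j,P) = a_jk
    print(Ainv @ g)                          # approximately the unit vector delta_ik
    print(x @ (Ainv @ g), x[k])              # the estimate equals the observed x_k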
27
-------
G. Optimization Method
The expression for the mean square error of estimate
given by equation (8) forms the basis of the optimization
method that will be used. The development of this relation
was based on the assumption that the covariance of the con-
centration measurements between any two points, P_i and P_j,
was known, a_ij = (x_i x_j), so that the elements of the inverse
matrix, a^ij, could be found. It was also assumed that the
covariance function for concentration measure between an
observation point, P., and any other point, P, in the area
concerned was also known or at least could be reasonably well
estimated. (The details of how these assumptions are realized
are discussed in Chapter III.) The equation (8) then permits
one to compute the error of estimate using (9) as an inter-
polation formula for any arbitrary location of the point P.
One may say that the selection of the n observation
points P_1, ..., P_n has been made in an optimal manner
if the largest mean square error of estimate, e², at any
point P not an observation point has been reduced to a satis-
factory level. There are many ways in which such an optimi-
zation procedure can be carried out. The one adopted here
is a one-step-at-a-time add-on method. It is presumed to
start with that there are a few existing observation points
or at least a few points at which observations will be made
based on prior considerations. On the basis of this given
starting network of observation points, the equation (8)
is used to locate the point at which the error of estimate
is largest. This point is then accepted as a best location
for a new observation point. This point is then added to
the list of observation points that are used to determine the
error of estimate from equation (8). The process is then
28
-------
repeated; the location of the point of maximum mean square
error is found, a new observation point is located, etc.
The iterations of this process are terminated when the largest
mean square error of estimate has been reduced to an acceptable
level.
29
-------
CHAPTER III
THE CORRELATION OF POLLUTANT CONCENTRATION BETWEEN POINTS
In order to implement equation (8) of Chapter II for
use in estimating the mean square error of estimate, it is necessary
to know the correlation coefficient for pollutant concen-
tration measures between observation points and between an
observation point and an arbitrary point in the area of
interest. The use of actual data from past measurements of
pollution concentrations is desirable, but, in dealing with
an area over which observations have not been made, it is
not possible to follow such a procedure. Consequently,
the correlation coefficients for pollution concentrations
were determined by using a model to generate synthetic data
and the correlation coefficients were computed from the
synthetic data.
A. The Model Used to Generate Synthetic Data
The model that was used to generate the synthetic
data on which the correlation coefficients were based was
a simple Gaussian plume model described by equation (3.2),
page 6, of Turner (1970), as follows:
    X(x,y,0;H) = [Q/(π σ_y σ_z u)] exp{-(1/2)[(y/σ_y)² + (H/σ_z)²]}    (1)
where (x,y) are coordinates of the point at which the con-
centration X(x,y,0;H) is calculated (x is measured down-wind
from the source, y is measured cross-wind from the down-wind
axis), Q is the source strength, H is the source (stack)
height, u is the wind speed, and σ_y, σ_z are dispersion coefficients.
The dispersion coefficients σ_y, σ_z were computed from the
formulas developed by Eimutis and Konicek (1972).
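A minimal numerical sketch of equation (1) is given below, in modern Python for illustration only. The σ_y, σ_z expressions used here are stand-in power-law fits for a single stability class, not the Eimutis and Konicek (1972) formulas actually used in the study, and the source parameters are assumed.

    import numpy as np

    def plume_conc(x, y, Q, H, u):
        # Ground-level concentration from one source, equation (1) (Turner, 1970, eq. 3.2).
        if x <= 0.0:
            return 0.0                                   # receptor is upwind of the source
        sigma_y = 0.08 * x / np.sqrt(1.0 + 0.0001 * x)   # assumed class-D fit, metres
        sigma_z = 0.06 * x / np.sqrt(1.0 + 0.0015 * x)   # assumed class-D fit, metres
        return (Q / (np.pi * sigma_y * sigma_z * u)
                * np.exp(-0.5 * ((y / sigma_y) ** 2 + (H / sigma_z) ** 2)))

    # Example: 100 g/s source, 50 m effective height, 5 m/s wind,
    # receptor 2000 m downwind and 100 m off the plume axis.
    print(plume_conc(2000.0, 100.0, Q=100.0, H=50.0, u=5.0), "g/m**3")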
30
-------
The wind conditions were computed on the basis of a
double circularly normal density function. This is expressed
as
    f(u,θ) u du dθ = [k_1 f_1(u,θ; w_1, φ_1, a_1) + k_2 f_2(u,θ; w_2, φ_2, a_2)] u du dθ
where k_1, k_2 represent the proportion of the time the wind
is in the state described by f_1(u,θ; ...) and f_2(u,θ; ...)
respectively and k_1 + k_2 = 1. The functions f_1(u,θ; ...),
f_2(u,θ; ...) are the circularly normal bivariate density
functions with parameters as shown after the semicolon, but
expressed in terms of wind speed, u, and direction, θ, instead
of in terms of rectangular wind components. Thus
    f(u,θ; w, φ, a) = (π a²)⁻¹ exp{-a⁻²[u² + w² - 2uw cos(θ - φ)]}
where u is the wind speed, θ is the wind direction, w is the
mean resultant wind speed, φ is the mean resultant wind
direction, and a is the vector standard deviation. (See
Brooks, et al., 1946, or Brooks, et al., 1950.) A single
circularly normal density function for wind at St. Louis
is inadequate since there are two distinct modes of the
over-all wind density. One of these is at w_1 = 7.90, φ_1 = 198.5°,
a_1 = 6.50, k_1 = 0.4 while the other is at w_2 = 9.00, φ_2 = 325°,
a_2 = 6.50, k_2 = 0.6. Stability class 4 was used throughout
(Turner, 1970).
The use of the probability density function for wind
enables one to use far fewer parameters to describe the
wind direction and velocity frequency tables. For 16
direction categories and 9 speed categories, a total of 144
31
-------
entries are required. In this case of a bimodal density
function, only 8 parameters are needed while the speed and
direction categories may be divided into arbitrarily small
intervals.
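The bimodal density described above can be written down directly. The sketch below, in modern Python for illustration, uses the St. Louis parameters quoted in the text and checks by a coarse numerical integration that the mixture integrates to approximately one over u du dθ.

    import numpy as np

    def circ_normal(u, theta, w, phi, a):
        # Circularly normal bivariate density expressed in wind speed and direction.
        return np.exp(-(u**2 + w**2 - 2.0*u*w*np.cos(theta - phi)) / a**2) / (np.pi * a**2)

    def wind_density(u, theta):
        # Two-mode St. Louis model with the parameters given in Section A.
        f1 = circ_normal(u, theta, w=7.90, phi=np.radians(198.5), a=6.50)
        f2 = circ_normal(u, theta, w=9.00, phi=np.radians(325.0), a=6.50)
        return 0.4 * f1 + 0.6 * f2

    u = np.linspace(0.0, 60.0, 601)
    th = np.linspace(0.0, 2.0 * np.pi, 721)
    U, TH = np.meshgrid(u, th, indexing="ij")
    total = np.sum(wind_density(U, TH) * U) * (u[1] - u[0]) * (th[1] - th[0])
    print(total)    # roughly 1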
There were 21 pollutant sources used to compute the
correlation coefficients. Their locations, strength and
stack height are listed in Table III-l. Correlation coefficients
between points were computed for a 9X9 point grid of locations
centered at the "arch" in St. Louis. The spacing between
points was 10 km. Thus an 81X81 matrix of correlation co-
efficients was obtained. (See Section D below for further detail
of the program.)
Since the logarithm of pollutant concentration rather
than concentration itself was to be used in computing the
correlation coefficients, a very small background pollutant
concentration was added in each case to avoid the difficulty
presented by taking the logarithm of zero.
B. Examples of the Correlation Coefficients
Some of the correlation coefficients obtained in the
way described in Section A are illustrated in Figures III-l
and III-2. The points of the grid were numbered serially
starting in the lower left corner with point no. 1 and num-
bering upward in each column. Thus the bottom row of points
are those numbered 1, 10, 19, 28, 37, 46, 55, 64, 73. The
top right corner is point 81. The arch is at location 41.
Actual correlation coefficients for observed 24 hr SO₂
concentrations were also computed using information from the
St. Louis 1964-1965 sulphur dioxide field data (Ruff, 1973a).
32
-------
TABLE III-1
SOURCE STRENGTH DATA
[The table lists, for each of the 21 sources (POINT), its STRENGTH, its
coordinates XS and YS, and its stack height HS; the individual numerical
entries are not legible in this copy.]
MINIMUM CONCENTRATION = 1.000E-20
(1) The data were provided by EPA from the 1964-1965 St.
Louis Air Pollution Study.
(2) Source strength and meteorology were assumed to be
independent.
(3) XS, YS are the coordinates of the source locations in
grid units. HS is the source height in meters. Source
strength is in units of grams per second.
33
-------
[Contour plot of the correlation coefficient field; numerical labels not reproduced.]
FIGURE III-1. Unsmoothed Contours of Correlation
Coefficient Centered at Location 40
34
-------
[Contour plot of the correlation coefficient field; numerical labels not reproduced.]
FIGURE III-2. Unsmoothed Contours of Correlation
Coefficient Centered at Location 42
35
-------
The correlation coefficient contours from actual SO₂
measurements are shown in Figures III- 3 and III-4 for
locations No. 7 and 12 respectively. Location No. 7 has
coordinates X=18.2, Y=22.50 and is described as "Power pole
with transformer in front of 1914 Obear St., between 20th
and Blair". Location No. 12 has coordinates X=22.48,
Y=18.64 and is described as "Pole South of East Side Health
District, 628 N. 20th St., East St. Louis, supplies power
to building approximately 100" east of street". The scale
is shown on these figures in the lower right hand corner
since it is quite different from that of Figures III-1 and
III-2. On these charts, the entire area corresponds to
an area approximately 3 squares wide and 2 1/2 squares high
on the preceding charts. The station numbers and locations
for Figures III-3 and III-4 are shown in Figure IV-5, p. 92.
Figures III-1 and III-3 together with Figures III-2
and III-4 illustrate the large changes that occur in the
correlation coefficient contours when the location of the
point with which all other locations are being correlated
crosses the central part of the area in which the pollution
sources lie.
C. Interpolation of the Correlation Coefficients
The use of equation (8), Chapter II, to obtain the
mean square error of estimate of pollutant concentration
requires that the correlation coefficients be those between
selected observation points or between observation points
and an arbitrary point. The correlation coefficients
(synthetic) calculated are those for points on a 9X9 grid
with 10 km separation between rows/columns of points. To go
from the latter to the former requires that some kind of
36
-------
[Contour plot: ST. LOUIS SO₂ POLLUTION STATISTICS, STATION 7 CORRELATION COEFFICIENT; values not reproduced.]
FIGURE III-3. Observed Correlation Coefficient of 24 hr. SO₂
with that at Station No. 7. Similar to the Simulated
Correlation Coefficients in Figure III-1.
37
-------
[Contour plot: ST. LOUIS SO₂ POLLUTION STATISTICS, STATION 12 CORRELATION COEFFICIENT MATRIX; values not reproduced.]
FIGURE III-4. Observed Correlation Coefficient of 24 hr. SO₂
with that at Station No. 12. Similar to the Simulated
Correlation Coefficients in Figure III-2.
38
-------
interpolation procedure be used. The method finally adopted
is based on the proper function/proper value representation
of the correlation coefficient matrix.
To illustrate what is involved in the interpolation
process, it is desired to obtain the correlation coefficient
r(x_i,y_i; x_j,y_j) where (x_i,y_i) are the coordinates of one
observation site, P_i, and (x_j,y_j) are the coordinates of
another observation site, P_j. We have available correlation
coefficients between points on a 9X9 grid. Let these points
have coordinates (ξ_m,η_m), (ξ_n,η_n) so that the correlation
coefficients are functions r(ξ_m,η_m; ξ_n,η_n). One is then faced
with an interpolation procedure that involves four separate
coordinates. This in itself is a reasonably formidable task.
In view of the great irregularity of the correlation field,
as illustrated in Figures III-l and III-2, the task is even
further complicated.
The proper function/proper value representation of the
correlation coefficient matrix reduces the problem to that
of performing two interpolations, each in a two-dimensional
field, in succession. It also has the advantage that a
certain amount of smoothing of the correlation coefficient
field is being done at the same time. The details of this
process are discussed in Chapter IV, so that only a brief
extract of the essentials is given here. The essence of
the situation is that the correlation coefficient matrix
with elements c_mn = (x_m x_n) = r(ξ_m,η_m; ξ_n,η_n) may be represented in
the form
    c_mn = Σ_k λ_k φ_k(ξ_m,η_m) φ_k(ξ_n,η_n)                       (2)
where, on the right hand side, the summation over the index
k involves the proper values λ_1, λ_2, ..., λ_k, ..., λ_n,
39
-------
all positive numbers and in decreasing order, λ_1 > λ_2 > ... > λ_n,
and the proper functions φ_k(ξ_m,η_m), one for each of the
proper values. (These are also referred to as eigenfunctions
and eigenvalues, principal values and principal components,
empirical orthogonal functions, etc.) These are computed
at the points of the 9X9 grid over which the matrix of
synthetic correlation coefficients was obtained (81X81). Each
of these is dependent on only the two coordinates of a grid
point. Then to obtain the synthetic correlation coefficient
between observation sites, a_ij = r(x_i,y_i; x_j,y_j), one simply
interpolates among the grid points to obtain each φ_k(x_i,y_i)
and each φ_k(x_j,y_j) and substitutes back in equation (2).
To obtain the factors g_i, g_j that appear in (8), Chapter II,
one notes that these are also correlation coefficients, but
involve an observation site (x_i,y_i) and an arbitrary point
(x,y). The coordinates of the arbitrary point are simply
used to get an interpolated value of each φ_k(x,y), which is
then substituted in equation (2) in the same way.
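The two-stage interpolation just described can be sketched as follows, in modern Python for illustration only. The 81X81 matrix used here is an assumed smooth stand-in for the synthetic correlation matrix of Section A, and simple bilinear interpolation of the proper functions is used in place of the six point quadratic formula adopted in the report.

    import numpy as np

    grid1d = np.arange(9) * 10.0                   # the 9X9 grid, 10 km spacing
    gx, gy = np.meshgrid(grid1d, grid1d, indexing="ij")
    pts = np.column_stack([gx.ravel(), gy.ravel()])
    C = np.exp(-np.linalg.norm(pts[:, None] - pts[None, :], axis=-1) / 40.0)  # stand-in matrix

    lam, phi = np.linalg.eigh(C)                   # proper values / proper functions
    lam, phi = lam[::-1], phi[:, ::-1]             # largest proper values first
    K = 10                                         # number of terms kept in equation (2)
    fields = [phi[:, k].reshape(9, 9) for k in range(K)]

    def bilin(field, x, y, h=10.0):
        # Bilinear interpolation of one gridded proper function to the point (x, y).
        i, j = min(int(x // h), 7), min(int(y // h), 7)
        p, q = x / h - i, y / h - j
        return ((1-p)*(1-q)*field[i, j] + p*(1-q)*field[i+1, j]
                + (1-p)*q*field[i, j+1] + p*q*field[i+1, j+1])

    def corr_between(P, Q):
        # Equation (2): r(P,Q) ~ sum_k lambda_k phi_k(P) phi_k(Q).
        return sum(lam[k] * bilin(fields[k], *P) * bilin(fields[k], *Q) for k in range(K))

    print(corr_between((12.0, 37.0), (55.0, 20.0)))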
-------
[Diagram of the six-point array and its rotations; coordinate labels not reproduced.]
FIGURE III-5
THE SIX POINT ARRAY FOR QUADRATIC INTERPOLATION (A) AND
THE THREE 90° ROTATIONS ABOUT (0,0). THE AREA OF BEST
INTERPOLATION IS INDICATED IN EACH.
41
-------
The arrangement of the points is shown in (a) of Figure III-5.
This formula gives the best results over the triangle bounded
by (0,0), (1,0), and (0,1) (shaded in the figure). In order
to maintain this degree of interpolation accuracy, corres-
ponding formulae were written for the three 90° rotations of
this unsymmetrical array of points as shown in (b), (c),
and (d) of the figure. To carry out an interpolation, the
point (0,0) on the grid was located so that (x,y) fell within
or on the boundary of one of these triangles and the corres-
ponding interpolation formula was applied.
D. Generation of Synthetic Pollution Correlation Coefficients
The synthetic correlation coefficients for two-point
pollutant concentrations were generated by program PSCOV
which is reproduced on the pages following this discussion.
The pages following the program contain a print-out of the
input parameters that were used.
1. Main Program
The input parameters are read into the program in
lines 15 through 87. The wind selector, JW, was used (line 20)
to provide for wind information from either a frequency
function (JW=1) or from frequency tables (JW=2) and provided
a termination of the computation (JW=0). The subdivision
of wind directions and speeds, stability classes, and
inversion heights were read in on lines 29 through 35. The
parameters of the wind frequency function are read on lines
45 through 53 (option JW=1, the one actually used). The provision
for the wind frequency tables is contained in lines 55
through 62. The ground coordinates of the locations at which
the correlation coefficients are to be computed are read
42
-------
on line 67. Pollution sources, strength, effective stack
height, and coordinates are read on line 82. A minimum
pollutant concentration was read on line 86. The purpose
of this was to provide a very small, but non-zero pollutant
concentration at all points in all situations so as to avoid
the possibility of an impossible logarithm of pollutant con-
centration which might occur in subsequent calculations.
The accumulation of data is initialized on line 90.
The data accumulation is carried out in a sequence of
nested DO-loops beginning on line 96 and ending on line 170.
Subroutines FREQF (line 112) and SIGMAS (line 152) are
called in this section and are discussed subsequently.
The statistical parameters are accumulated and reduced
in the section from lines 174 through line 190. The section
from line 190 to the end consists of output instructions.
Subroutine MAPI is used to printout the correlation co-
efficients, covariances, and other parameters in a "map"
format so that contours of equal correlation coefficient (for
example) could be rapidly drawn.
2. Subroutines
a) Subroutine FREQF
The wind frequency function (probability density) is a
combination of circularly normal bivariate distributions
    f(v,θ) = k_1 f(v,θ; w_1, θ̄_1, a_1) + ... + k_n f(v,θ; w_n, θ̄_n, a_n)
in which k_1 + ... + k_n = 1, the k_i's being the fraction of
the total wind population in the i-th category, and the
parameters w_i, θ̄_i, a_i are the speed and direction of the
mean wind vector and the vector standard deviation of wind
speed in the i-th category. The probability density functions
43
-------
on the right differ only in the values assigned to the
parameters. Thus

    f(v,θ; w̄, θ̄, σ) = (πσ²)^{-1} exp{-σ^{-2}[v² + w̄² - 2vw̄ cos(θ - θ̄)]}

The common multiplier for all the density functions, v dv dθ,
is omitted in the above. It has been found that the most
complex wind distributions can be represented succinctly
and with reasonable accuracy in the above form. This saves
storage of extensive tables of observed frequency distributions.
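A minimal sketch of the evaluation of this mixture density (modern Python, for illustration only) is:

    import numpy as np

    def wind_density(v, theta, k, wbar, tbar, sigma):
        """Mixture of circularly normal bivariate wind distributions.
        v, theta   : wind speed and direction (theta in radians)
        k          : fractions of the total wind population, summing to 1
        wbar, tbar : mean wind speed and direction of each category
        sigma      : vector standard deviation of wind speed of each category
        Returns f(v, theta); the factor v dv dtheta common to all terms is
        omitted, as in the text."""
        k, wbar, tbar, sigma = map(np.asarray, (k, wbar, tbar, sigma))
        terms = (k / (np.pi * sigma**2)) * np.exp(
            -(v**2 + wbar**2 - 2.0 * v * wbar * np.cos(theta - tbar)) / sigma**2)
        return terms.sum()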
b) Subroutine SIGMAS
This subroutine provides for the computation of the
"sigmas" that appear in the pollution concentration formula.
The data were taken from E.C. Eimutis and M.G. Konicek,
Derivations of Continuous Functions for the Lateral and
Vertical Atmospheric Dispersion Coefficients, Atmospheric
Environment, Pergamon Press, 1972, Vol. 6, pp. 859-863.
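A sketch of the form of such a computation is given below (modern Python, illustrative only). The functional form shown, a power law in downwind distance with one coefficient set per stability class, and the argument names are assumptions; the actual coefficient tables must be taken from the cited paper and are not reproduced here.

    def sigma_yz(x, stability, coeffs_y, coeffs_z):
        """Lateral and vertical dispersion coefficients as simple functions of
        downwind distance x, one coefficient set per stability class.
        coeffs_y[stability] = (a, b)     ->  sigma_y = a * x**b
        coeffs_z[stability] = (a, b, c)  ->  sigma_z = a * x**b + c
        The coefficient tables are to be filled from Eimutis and Konicek (1972)."""
        ay, by = coeffs_y[stability]
        az, bz, cz = coeffs_z[stability]
        return ay * x**by, az * x**bz + cz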
c) Subroutine MAP
This subroutine prints out on the standard page printer
the data input to it at the proper coordinates (as nearly
as the printer will allow). Since we are primarily interested
in correlation coefficients, the decimal point serves not
only to fix the magnitude of the quantity concerned, but
also as a "fix" for the point location.
44
-------
PROGRAM PSCOV   (FORTRAN listing, pages 45-51; comment cards only)

C     COMPUTES MEANS, STD. DEV., COV., AND COR. OVER POINTS FROM
C     UP TO 30 POINT SOURCE INPUTS.
C     INPUT QUANTITIES
C     JW = WIND DATA SELECTOR; JW=0 FOR STOP, JW=1 FOR FREQ. FUNCTION,
C          JW=2 FOR TABLES
C     ID = NO. OF WIND DIRECTIONS, DIR(I) = DIRECTION TABLE
C     IV = NO. OF WIND SPEEDS, VEL(I) = SPEED TABLE, MAX=10
C     IST = NO. OF STABILITY TYPES, ISTAB(I) = STABILITY TABLE,
C          AL(I) = INVERSION HEIGHT, MAX=6, 0. = NO INVERSION
C     READS FREQ. FUNCTION DATA
C     F(I,J,K) = FREQUENCY TABLE ENTRIES
C     READS GROUND COORDINATES
C     IG = NO. OF GROUND POINTS, X(I), Y(I) = COORDINATES
C     READS SOURCE POINTS AND PARAMETERS
C     IS = NO. OF SOURCE POINTS
C     LOOP ON SPEED, DIR., AND STABILITY
C     LOOP ON COORDINATES
C     ROTATES TO WIND COORDINATES
C     LOOP ON SOURCES
C     PUTS STATISTICS IN STANDARD FORM AND UNITS

SUBROUTINE FREQF    (FORTRAN listing, page 52)
SUBROUTINE SIGMAS   (FORTRAN listing, page 53)
SUBROUTINE MAP, ENTRY MAP1   (FORTRAN listing, pages 54-55)
(Input data on point source statistics are listed in Table III-1, p. 33.)
55
-------
CHAPTER IV
THE ANALYSIS OF THE COVARIANCE MATRICES
In the use of the covariance function or correlation
coefficient function it is important to evaluate the part
that may be due to small scale effects and to random errors.
The problem is analogous to the problem in communication
theory in which a signal is observed in a noisy background.
In order to determine the signal, it is also required that
a considerable amount of information be available concerning
the nature of the noise.
The situation in general was described in qualitative
terms in Chapter I where the effect of a jump discontinuity
at zero distance or a part with small "range of influence"
on the statistical interpolation was discussed. Up to this
point the method of finding the magnitude of this discontinuity
at zero distance, or the magnitude of the effects with small
"range of influence," has not been discussed. They are the
specific subject of this chapter.
A. Evaluation of the Effect of Errors of Measurement or of
Small Scale Phenomena
To evaluate the effect of errors of measurement or
of small scale phenomena, we carry further the procedures
that led to the mean square estimate of Chapter II, equation
(8). The expression for the estimate of pollution concentration,
ŷ, at P was given in (9), Chapter II, as

    ŷ = Σ_i Σ_j x_i a^{ij} g_j

where the x_i's are the observed concentration measures on
a particular occasion, g_i = (x_i y) is the covariance of the
concentration measures at P_i and P, a^{ij} is an element (row
i, column j) of the inverse of the covariance matrix {a_ij},
56
-------
a_ij = (x_i x_j), where a_ij is the covariance of concentration
measures at P_i and P_j. It was pointed out following
equation (9) of Chapter II that if we let P → P_k, an ob-
servation point, and assume that g_j → a_jk at the same time,
then ŷ → x_k, i.e., the estimate at P approaches the observed
value at P_k when P approaches P_k.
We consider now the situation illustrated in Figures
I-3a, I-3b and Figures I-5a, I-5b. To express the ideas shown
there quantitatively we need an explicit formulation for the
covariances g_j = g_j(P_j,P). Thus, let

    g_j = g_j^(1) + g_j^(2)                                  (1)

where g_j^(1) is the part of the covariance g_j which describes
the overall variation of g_j as a function of the location
of P with respect to P_j, while g_j^(2) is the part that represents
the amount of discontinuity in g_j at P_j on the one hand or
the part of g_j that has a limited "range of influence". The
expression "limited" also needs definition (or at least
needs to be made explicit). For our purposes, "limited range
of influence" will be taken to mean that at distances from
P_j to P that are of the order of magnitude of the spacing
between prospective observation sites the magnitude of
g_j^(2) is negligible (i.e., the presence of g_j^(2) cannot be
distinguished from sampling variations). With this speci-
fication of "limited range of influence", the situation
is treated as though g_j^(2) were simply the magnitude of the
jump discontinuity of g_j(P_j,P) for P → P_j. Then we may write
57
-------
    g_j(P_j,P_j) = g_j^(1)(P_j,P_j) + g_j^(2)(P_j,P_j)

or

    g_j^(2)(P_j,P) = 0                        if P ≠ P_j
    g_j^(2)(P_j,P) = g_j^(2)(P_j,P_j)         if P = P_j

Equation (1) then becomes

    g_j(P_j,P_k) = g_j^(1)(P_j,P_k) + g_j^(2)(P_j,P_j) δ_jk,   δ_jk = 1 if P_k = P_j and 0 otherwise,

where P_k may be any one of the points P_j.
Consider now the element of the covariance matrix {a_ij},
a_ij = (x_i x_j). Note that these covariances are essentially
the same as those of g_j = (x_j y) except for the fact that the
wandering point P involved in g_j = g_j(P_j,P) is now restricted
to one of the observation sites. As long as the points P_i
and P_j at which the covariance a_ij = (x_i x_j) is computed
are distinct, the values concerned are just those of
g_j(P_j,P_i). The part of the covariance that had "limited
range of influence" is not involved. But throughout the
principal diagonal of {a_ij} one has i = j. For these elements
of the matrix {a_ij} both parts are involved. These elements
of the matrix will be written as

    a_ii = a_ii^(1) + e_i²

where a_ii^(1) is what we have called g_i^(1)(P_i,P_i) above and
58
-------
e_i² is what we have called g_i^(2)(P_i,P_i). (Note: the
quantity e_i² is not to be confused with the mean square
error of estimate, e², without subscript, used in equation
(8), Chapter II.) The terms e_i² will be referred to as the
"residual variances". They are the quantities that must
be determined in order to separate the part g_j^(2) from g_j.
The method used to determine the residual variances
is an important part of Factor Analysis. It is unfortunate
that in the physical sciences the importance of these residual
variances has been, until recently, neglected to a large
extent. At least a part of this is due to the fact that
physical scientists also tend to be instrument designers and
have felt that to recognize "errors of measurement" was to
cast doubt on the quality of their instruments. Factor
Analysis is traceable to the psychologist Charles Spearman
(1904) and the recognition of the importance of the residual
variances originates with him also. As used by psychologists,
the covariances (or correlation coefficients) that make up
the matrix elements a.. are those between "test i" and
"test j" when given to a group of subjects. The psychologists,
like the physical scientists, were hesitant to admit that
these tests involved "errors of measurement". They did,
however, recognize that each test had something about it
that was unique (belonged to it alone). Consequently what
we here call the residual variances were called by the
psychologists the "uniquenesses".
Factor Analysis, as such, is devoted to things quite
different from the determination of the residual variances.
These are rather incidental, but have an important bearing
on the prime objective in that subject. We confine our
attention to finding the residual variances alone and ignore
59
-------
all other aspects of Factor Analysis. The best treatment
that we have found is that of Lawley and Maxwell (1963) and
(1971) who treat the subject from the point of view of a
statistician and include discussions of the applicable
tests of significance. The ordinary texts on Factor Analysis
tend to emphasize the computational and interpretive aspects
of the subject and neglect the significance tests. For
example, Horst (1965) is a 730 page compendium of computing
programs and the associated mathematical manipulations which
does not mention a single test for significance. On the other
hand, adequate tests of significance only go back as far as
Bartlett (1951) and Lawley (1956) . The methods used here
depend chiefly on Joreskog (1962) and may also be found in
Lawley and Maxwell (1971) [but not in (1963)]. As far as
we have been able to determine, the treatment of the method
of finding the residual variances using the formulation of
the problem as an integral equation rather than as a matrix
equation has not appeared elsewhere except for a brief note
by the writer (Buell, 1972) which covers only a small part
of the work reported here.
60
-------
B. Determination of the Residual Variances
The discussion of Section A preceding leads us now
to consider the matrix A = {a_ij + e_i² δ_ij}, where δ_ij = 0
if i ≠ j and = 1 if i = j, so that we may write

    A = | a_11+e_1²   a_12        ...   a_1n       |
        | a_21        a_22+e_2²   ...   a_2n       |
        |   .           .                 .        |
        | a_n1        a_n2        ...   a_nn+e_n²  |

and in which a_ij = a_ji, i.e., the matrix A is symmetric.
The matrix A is also positive definite. To keep the notation
from becoming excessively complex, the diagonal terms shown
as a_ii here were denoted by a_ii^(1) in the previous section.
We note now that one may write the matrix A as a sum of two
matrices, A = C + D, where C is the matrix {a_ij} and D is the
diagonal matrix {e_i² δ_ij}. The problem at hand consists of
finding the elements of D and the diagonal elements of C
(i.e., the values of a_ii) so that their sum, a_ii + e_i², will
have the known value that appears on the diagonal of the
given (known) matrix A and such that all of the diagonal
elements of D are positive (or perhaps zero) and such that
the resulting matrix C will be positive definite.
61
-------
1. Representation of a Matrix in Terms of Proper Functions/
Proper Values
In this section some background material is intro-
duced which forms the basis on which the following sections
depend. We consider a symmetric positive definite matrix
B = {b_ij}. It is well known that such a matrix may be written
in the form

    B = ΦΛΦ',   Φ' = transpose of Φ                          (2)

in which the matrix Λ is a diagonal matrix such that the
elements on the principal diagonal, λ_i, are all positive
(or at least non-negative) real numbers. It is also further
specified that these be written in decreasing order of
magnitude:

    λ_1 ≥ λ_2 ≥ λ_3 ≥ ... ≥ λ_n ≥ 0
These are the proper values belonging to the matrix B.
(Also called eigenvalues, latent roots, characteristic roots,
etc.) The matrix B is assumed to have n rows and columns.
There are then n proper values, not necessarily all distinct.
The proper values are the solutions of the determinantal
equation
    |B - λI| = 0

where |·| stands for the determinant and I is the n×n unit
matrix. This equation is a polynomial of degree n in λ.
With each of the proper values, there is associated a
proper function (or proper vector) which appears in the
column of Φ that corresponds to the position of the associated
62
-------
proper value in Λ. Thus to λ_i there corresponds the
column vector col{φ_1i, φ_2i, ..., φ_ni} which stands in the
i'th column of Φ. (These are also called eigenvectors, latent
vectors, characteristic vectors, etc.) These are the solutions
of the homogeneous equations

    Σ_j b_ij φ_jk = λ_k φ_ik
(It is also said that the matrices Λ, Φ are the solutions
of the matrix equation BΦ = ΦΛ.)

The proper vectors or proper functions have the property
of being an orthonormal set of vectors. That is to say, they
are normalized,

    Σ_k φ_ki φ_ki = 1,

and they are orthogonal to each other,

    Σ_k φ_ki φ_kj = 0,   i ≠ j.

The original statement that the matrix B may be ex-
pressed in terms of its proper values and proper functions
is written in explicit summation notation as

    b_ij = Σ_k λ_k φ_ik φ_jk.
The decomposition of a matrix into its proper values/
proper vectors is an important aspect of matrix analysis,
of which we have presented only the bare essentials for a
particular case. See any standard text such as Bellman (1960),
Gantmacher (1960a), MacDuffee (1949), Perlis (1952),
Turnbull (1960), etc. for a general treatment. Not only
63
-------
is it important for the subject at hand, but it also has many
applications, particularly to oscillatory systems from simple
strings and mechanical systems (Gantmacher, 1960b) to nuclear
spectra (Mehta, 1967).
In the above, the terms proper vectors and proper
functions have been used interchangeably. This usage
is intentional and is to emphasize the fact that what is
termed a "vector" with n discrete "components" in this dis-
cussion will turn out to be a function on a continuum and
it just happens that our computing procedure is limited to
the direct computation of values of this function at only
n points.
2. Factor Analysis Methods
Modern factor analysis techniques provide methods for
finding the individual values of e_i², i = 1, ..., n. Most
of the methods for finding these quantities (i.e., those that
are statistically sound) require the solution of the matrix
equation

    (A - D)Φ = ΦΛ                                            (7)

where D is the diagonal matrix with elements e_i² on the
principal diagonal and 0's elsewhere. All of D, Λ, Φ are to
be found while only A is given. Nearly all of the methods
concerned are iterative, and have the characteristic feature
that (with only a few exceptions) they converge very slowly
(and in some cases, some methods fail to converge at all)
(Lawley and Maxwell, 1963).
The method used here is straightforward and has given
results that appear to be satisfactory. It is that developed
by K.G. Joreskog (1962). The technique is as follows. It
64
-------
is assumed that the covariance matrix, A, may be written in
the form

    A = ΦΛΦ' + D                                             (8)

where Φ' is the transpose of Φ, which in turn is the matrix
of proper functions of A - D, Λ is the diagonal matrix of proper
values, and D = {e_i² δ_ij} is the diagonal matrix of residual
variances. The proper functions (or vectors) corresponding
to the proper values are ordered in the same way as the
associated proper values [col(φ_1i, φ_2i, ..., φ_ni) corresponding
to λ_i]. The values of λ_i, i = 1, ..., n, are the roots of the
determinantal equation

    |A - D - λI| = 0
where I is the n x n unit matrix. Now consider the matrix
A - D instead of the matrix A and assume that all of the diagonal
terms of D are identical, e_i² = e², i = 1, ..., n. It is rather
easily shown that (a) the matrix A has proper values which
are those of A - D but increased by e² and (b) the proper
functions of A are exactly those of A - D. Thus, if μ_i,
i = 1, ..., n, are the proper values of A, and λ_i, i = 1, ..., n,
are those of A - D, then

    μ_i = λ_i + e² .

(Anderson, 1958, p. 287, problem 7)
If, now, we were to find the proper values of A as
μ_1, ..., μ_n and if we had some way of determining that the
last n-k of these proper values were not significantly
different from each other, it would then seem reasonable that
we could lump these last n-k values into one batch with
average value (μ_{k+1} + ... + μ_n)/(n-k), take this value as the
estimate of e², and then consider the values of μ_1-e², ..., μ_k-e²
as proper values of the matrix A - D, which will then be of
rank k (< n); the first k proper values are λ_i = μ_i - e²,
i = 1, ..., k, and the last n-k proper values, λ_i, would be
zeros. It just so happens that just such a test as required
above is available. Its specification is deferred until later
since it is used in a somewhat more general sense than is
required by the heuristic argument above.
Return now to the technique of Joreskog (1962) in which
the elements of D = {e_i² δ_ij} may differ from each other. It is
assumed that the matrix D is given by the expression
D = θ[diag(A^{-1})]^{-1}, where θ is a scalar constant to be determined;
that is, D is proportional to the diagonal matrix which is the inverse
of the diagonal of the inverse of the covariance matrix that
we started with. We reduce this notation to the form D = θΔ^{-1},
where Δ = diag(A^{-1}), and (8) then becomes

    A = ΦΛΦ' + θΔ^{-1}.                                      (9)

Since A is a covariance matrix, the diagonal elements of Δ will
all be positive, and if we take the positive square roots of
the elements of Δ and denote this diagonal matrix by Δ^{1/2},
it then follows that (9) may be written as

    Δ^{1/2}AΔ^{1/2} = (Δ^{1/2}ΦΛ^{1/2})(Δ^{1/2}ΦΛ^{1/2})' + θI

or

    Δ^{1/2}AΔ^{1/2} = CC' + θI

where

    C = Δ^{1/2}ΦΛ^{1/2} .

The matrix Δ^{1/2}AΔ^{1/2} is then to be expressed in the form
described in the heuristic argument preceding where the
66
-------
single scalar coefficient, θ, corresponds to the common
residual variance, e², used there. To find the value of θ,
one finds the proper functions and values of the matrix
Δ^{1/2}AΔ^{1/2}, or

    (Δ^{1/2}AΔ^{1/2})F* = F*Λ*

where Λ* is the required diagonal matrix of proper values
(properly ordered) and F* is the matrix with proper functions
appearing in the columns in the same order as the proper
values.
The test for the last n-k proper values being different
from each other is to the effect that the quantity Q is
approximately distributed as χ², where

    Q = N'{-log_e(λ*_{k+1} λ*_{k+2} ... λ*_n) + (n-k) log_e[(λ*_{k+1} + ... + λ*_n)/(n-k)]}

and

    N' = N - k - [2(n-k) + 1 + 2/(n-k)]/6

and the number of degrees of freedom for χ² is

    d.f. = (n-k+2)(n-k-1)/2.

(N+1 = number of observations on which A is based.) The
criterion operates in an inverse sense. If the value of
χ² for a given value of k (the number of proper values
accepted as being different from each other) is exceeded by Q,
then there is at least one more significant proper value that
should be included. If it is found that at a given significance
level the number of significantly different proper values, k,
is adequate for the representation of the matrix Δ^{1/2}AΔ^{1/2},
67
-------
then the value of θ is taken as

    θ = (λ*_{k+1} + ... + λ*_n)/(n-k),

the average of the last (n-k) proper values.
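For concreteness, the following minimal sketch (modern Python; not part of the original programs) evaluates this test. It assumes the proper values of Δ^{1/2}AΔ^{1/2} are supplied in decreasing order and that N+1 observations were used.

    import numpy as np
    from scipy.stats import chi2

    def remaining_roots_test(lam, k, N, alpha=0.05):
        """Test whether the last n-k proper values lam[k:] (lam sorted in
        decreasing order) can be regarded as equal; N+1 = number of observations.
        Returns the statistic Q, its degrees of freedom, and True if the last
        n-k roots are accepted as equal at level alpha."""
        lam = np.asarray(lam, dtype=float)
        n = lam.size
        q = n - k
        tail = lam[k:]
        Nprime = N - k - (2.0 * q + 1.0 + 2.0 / q) / 6.0
        Q = Nprime * (-np.log(tail).sum() + q * np.log(tail.sum() / q))
        df = (q + 2) * (q - 1) / 2.0
        accept = Q < chi2.ppf(1.0 - alpha, df)
        return Q, df, accept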
We have now succeeded in representing the matrix
Δ^{1/2}AΔ^{1/2} in the form

    F*_k Λ*_k (F*_k)' + θI

where Λ*_k, F*_k consist of only the first k proper values and
proper functions. The matrix A may then be written as

    A = (Δ^{-1/2}F*_k) Λ*_k (Δ^{-1/2}F*_k)' + θΔ^{-1},   D = θΔ^{-1} = {e_i² δ_ij}   (10)

Since the matrix Δ^{-1/2}F*_k has columns not orthogonal to
each other, these are no longer proper functions of any matrix.
To obtain proper functions and values, it is necessary to
recompute values of the k proper values/functions from
scratch, only using as the matrix concerned the matrix
(Δ^{-1/2}F*_k)Λ*_k(Δ^{-1/2}F*_k)'. There are, of course, only k non-zero
proper values and proper functions, Λ_k, Φ_k, which are usually
quite close to those already obtained for Λ*_k, F*_k (except
for a scale factor for the proper values).
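The complete procedure just described can be summarized in a short sketch (modern Python; the original computations were carried out in FORTRAN). It assumes that the covariance matrix A and the accepted number k of significant proper values are given.

    import numpy as np

    def joreskog_decomposition(A, k):
        """Given a covariance matrix A and the number k of proper values accepted
        as significant, estimate D = theta * Delta**-1 with Delta = diag(A**-1),
        following equations (9) and (10), and recompute the proper values/functions
        of the reduced matrix so that A ~ Phi_k Lam_k Phi_k' + D."""
        A = np.asarray(A, dtype=float)
        delta = np.diag(np.linalg.inv(A))           # diagonal of A**-1
        s = np.sqrt(delta)                          # Delta**(1/2) as a vector
        B = (s[:, None] * A) * s[None, :]           # Delta**(1/2) A Delta**(1/2)
        lam, F = np.linalg.eigh(B)
        lam, F = lam[::-1], F[:, ::-1]              # decreasing order
        theta = lam[k:].mean()                      # average of the last n-k roots
        e2 = theta / delta                          # residual variances e_i**2
        # reduced matrix (Delta**-1/2 F_k) Lam_k (Delta**-1/2 F_k)' of rank k
        G = F[:, :k] / s[:, None]
        C = G @ np.diag(lam[:k]) @ G.T
        lam_k, phi = np.linalg.eigh(C)
        lam_k, phi = lam_k[::-1][:k], phi[:, ::-1][:, :k]
        return e2, C, lam_k, phi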
To recapitulate, it was pointed out that the use of
equation (9) of Chapter II as an interpolation formula for
estimating the measure of pollutant concentration involves
two important items that are frequently overlooked. The
first is that the variance of the small scale effects, the
effects of limited range of influence, must be specified
68
-------
quantitatively. The Joreskog (1963) representation of
the covariance matrix does this. The required values are
exactly those specified by (10). The second item is that
a method of interpolation was required that would preserve
the character of the matrix A as a covariance matrix. This
is provided when the first matrix expression on the right
of (10) is recomputed in terms of its proper functions, $,,
and corresponding proper values, Λ_k, so that

    A = Φ_k Λ_k Φ_k' + D .                                  (11)

In (11) note that the subscript k is not a summation index.
It is used to indicate that only the first k proper values
and functions are used. The matrices concerned have dimensions
as follows: Φ_k is (n×k), Λ_k is (k×k), Φ_k' is (k×n), so that
Φ_k Λ_k Φ_k' is (n×n) as are A and D.
It appears that the solution to the problems concerned
is at hand, but this is not necessarily the case. The Factor
Analysis method for finding the values e_i² is applicable
to quite general covariance matrices. We are concerned with
covariance matrices of a stochastic process on a continuum.
The Factor Analysis method is strongly dependent on the
value used for the number of statistically significant proper
values, k. The problem will next be considered using a con-
tinuum formulation. The effect on the evaluation of k is
found to be important.
In Sub-section 3, Integral Equation Methods, immediately
following, the problems involved in the solution of the
integral equation corresponding to (7) are considered. A
method for testing whether the solutions of this integral
69
-------
equation obtained by two (or more) only slightly different
procedures are consistent with each other is developed. The
section is strictly theoretical. The practical application
of the techniques of Sub-section 3 is made in Sub-section 4
to actual SO2 measurements.
3. Integral Equation Methods
The factor analysis method for determining the values
of the residual variances fails to take into account the
fact that there are strong geometric relations connecting
the pollution concentration covariances at the various
points of the observing network. To bring this into the
picture, the problem is re-stated in terms of a continuum of
values rather than in a totally discrete form that ignores
this situation. To illustrate, the values x_i, x_j that enter
into the covariances (x_i x_j) can be (in the factor analysis
case frequently are) test scores from the i'th and the j'th
tests given to a batch of subjects. The i'th test might be
on arithmetic ability and the j'th test on manual dexterity.
If a third test is considered, say x_k for the score on the
k'th test, which is on social adaptiveness, it is silly to
place this in a strictly geometrical relation with respect
to the other two. On the other hand, when x_i, x_j, x_k are
pollution concentrations at points P_i, P_j, P_k, we know all
about the strictly geometrical relations between these
points, i.e., we can say that P_k is such and such a distance
from P_i and also from P_j and that P_i and P_j are so far from
each other and that the line joining them has a certain
direction. The natural generalization of the proper value/
function analysis of the covariance matrix is the integral
equation formulation for proper values and functions in a
continuum. Thus

    λφ(x) = ∫_R K(x,x') φ(x') dx'                            (12)
70
-------
where λ is a proper value, φ(x) is the corresponding proper
function, and K(x,x') is the kernel, the exact analogue of
the covariance matrix. The kernel, K(x,x'), is precisely the
covariance of concentrations at the points x and x'. Note
that x in the above may be a multidimensional variable, in
which case dx' is multidimensional and R is a region of the
same dimensions. We are concerned here with the two-dimen-
sional field of pollution concentrations. The kernel function,
K(x,x'), is now a positive definite symmetrical [K(x,x') =
K(x',x)] function of the location coordinates. The fact
that we are now working on a continuum is also reflected in
the fact that (12) has a denumerable infinity of solutions,
λ_n, φ_n(x), of proper values and functions. The proper
values are considered as ordered

    λ_1 ≥ λ_2 ≥ ... ≥ λ_n ≥ ... ≥ 0

and the proper functions φ_n(x) are taken in the same order.
The standard method of solving (12) is to reduce the
problem to a matrix problem where the values x, x' are
specified on a network of points and the integral is replaced
by a quadrature formula. This approximation may be written
as

    λφ(x_i) = Σ_j K(x_i,x_j) φ(x_j) A_j                      (13)

where the factors A_j correspond to an element of area about
the point x_j such that Σ A_j = R = area of the region of
integration (for a two-dimensional network of points). If
the kernel function were given by an explicit formula, then
one could evaluate it at any number of point pairs x_i and x_j,
i,j = 1, ..., n, and the number of points could be taken as
very large. The larger the number of points selected, the
71
-------
more accurate would be the estimate of the solution for a
proper value/function of some specific order, n. On the other
hand, one is faced with the fact that one can only obtain a
finite number of solutions when it is known in advance that
there are a denumerable infinity of such. Thus, for a fixed
number of points, x., i=l, ..., n, n fixed, there must always
be some point, say n*, beyond which the solutions depart
more and more from the theoretical or exact solution. When
one applies the technique to a kernel function that is known
only at point pairs from a fixed number of points and for
which the value of the kernel function is known only to within
a certain sampling error, other considerations limit the
accuracy of the solutions. One of these is the accuracy of
the values of the kernel function itself and another is the
choice of the quadrature factors, A_i, and still a third is
the fact that K(x,x') may have a "jump discontinuity" along
x=x' (or at least an effective "jump discontinuity" as far
as the observation station spacing is concerned). These
several points will be considered separately in the following
paragraphs. None of these points is trivial. Considered
from the integral equation point of view, the proper values/
functions obtained from the matrix equations by Factor Analysis
methods are simply approximations to a correct solution in
which several of the factors concerned have been overlooked
due to the over-simplification of the problem.
a) The Jump Discontinuity
The fact that there may be a jump discontinuity along
x=x' may be handled by the following integration technique.
Subtract from the equation (12) the identity

    ∫_R K(x,x') φ(x) dx' = φ(x) ∫_R K(x,x') dx'
72
-------
to obtain

    φ(x)[λ - ∫_R K(x,x') dx'] = ∫_R K(x,x')[φ(x') - φ(x)] dx'          (14)
The discontinuity of the kernel function in the integrand on
the right of (14) is no longer effective since the factor
[φ(x') - φ(x)] is zero on x = x'. It is still present in the
integral term on the left side. In this integral we use
quadrature factors that are dependent on x in addition to
x', in such a way that B_i^(x) = 0 if x = x_i. Quadrature formulas
using such factors are said to be of "open type". See
Abramowitz and Stegun (1964). If these quadrature factors
are indicated by B_j^(x_i), while those on the right are A_j, then
the matrix equivalent of (14) may be written as

    φ(x_i)[λ - Σ_j K(x_i,x_j) B_j^(x_i)] = Σ_j K(x_i,x_j)[φ(x_j) - φ(x_i)] A_j          (15)
Rearranging the terms of (15), one finds that

    λφ(x_i) = Σ_{j≠i} K(x_i,x_j) φ(x_j) A_j + Σ_{j≠i} K(x_i,x_j) φ(x_i) [B_j^(x_i) - A_j]          (16)

where neither summation contains the term i = j. This term
is missing from the first summation because, regardless of A_i,
it would be canceled by the corresponding term of the second
summation; it is missing from the second to also account for
the fact that B_i^(x_i) = 0. The second term on the right is the
equivalent of substituting for K(x_i,x_i)A_i the expression

    Σ_{j≠i} K(x_i,x_j) [B_j^(x_i) - A_j].                              (16a)

This is equivalent to using an interpolation formula to
obtain K(x_i,x_i) from values of K(x_i,x_j) that make no use
of x_j = x_i explicitly. One must have Σ_j B_j^(x_i) = R and
73
-------
Σ_{j≠i} A_j = R - A_i so that Σ_{j≠i} [B_j^(x_i) - A_j] = A_i, i.e., the sum of the
weights given to each of the terms K(x_i,x_j) of (16a) is just exactly
the weight ascribed to K(x_i,x_i) itself.
In order to obtain an explicit solution using the symmetric
positive definite matrix algorithms (which are much faster
than those for an unsymmetric matrix) , this form may be made
symmetrical using the device described in the following
section.
When the proper functions and values, φ*_k(x_i), λ*_k, have been
found from (16), then a new kernel K*(x_i,x_j) can be constructed,
thus

    K*(x_i,x_j) = Σ_k λ*_k φ*_k(x_i) φ*_k(x_j)

and the amount of the jump discontinuity along x_i = x_j may
be determined from

    J_i = J(x_i) = K(x_i,x_i) - K*(x_i,x_i).
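A minimal sketch of this construction follows (modern Python, illustrative only). It takes the sampled kernel K(x_i,x_j), the closed-type factors A_j, and the open-type factors B_j^(x_i) (supplied as a matrix with zero diagonal) as given data, forms the matrix of equation (16) directly in its unsymmetric form, and estimates the jump J_i; the symmetrization device of the next section is not applied here.

    import numpy as np

    def jump_discontinuity(K, A, B):
        """K : (n, n) sampled kernel (covariance) matrix
        A : (n,) closed-type quadrature factors
        B : (n, n) open-type factors, B[i, j] = B_j^(x_i), with B[i, i] = 0
        Builds the matrix of equation (16), solves the proper value problem,
        and returns the jump estimates J_i = K(x_i,x_i) - K*(x_i,x_i)."""
        K, A, B = np.asarray(K, float), np.asarray(A, float), np.asarray(B, float)
        M = K * A[None, :]                       # off-diagonal terms K(xi,xj) Aj
        np.fill_diagonal(M, 0.0)                 # the term i = j is omitted
        extra = (K * (B - A[None, :])).sum(axis=1) - K.diagonal() * (B.diagonal() - A)
        M = M + np.diag(extra)                   # coefficient of phi(x_i) in (16)
        lam, phi = np.linalg.eig(M)              # unsymmetric problem; roots ~ real
        order = np.argsort(lam.real)[::-1]
        lam, phi = lam.real[order], phi.real[:, order]
        phi = phi / np.sqrt((phi**2 * A[:, None]).sum(axis=0))   # sum_i phi_k(xi)**2 Ai = 1
        K_star = (phi * lam[None, :]) @ phi.T    # K*(xi,xj) = sum_k lam_k phi_k(xi) phi_k(xj)
        return lam, phi, K.diagonal() - K_star.diagonal()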
b) The Quadrature Factors
The unsymmetrical formulation for the matrix approximation
to the integral equation using a quadrature formula, (13),
may be made completely symmetric by multiplying both sides
by √A_i. Thus (13) becomes

    λ[√A_i φ(x_i)] = Σ_j {√A_i K(x_i,x_j) √A_j}[√A_j φ(x_j)]          (17)

If we let √A_i φ(x_i) = θ(x_i), then the values of θ(x_i), λ are
proper functions/values corresponding to the weighted
74
-------
covariance matrix {√A_i K(x_i,x_j) √A_j}. Note that if we have
proper functions θ_k(x_i) and θ_l(x_i) corresponding to proper
values λ_k, λ_l, these functions are orthonormal, i.e.,

    Σ_i θ_k(x_i) θ_l(x_i) = δ_kl,   δ_kl = 0 if k ≠ l, = 1 if k = l.

If now we substitute the expressions for θ_k(x_i), θ_l(x_i) in terms
of φ_k(x_i), φ_l(x_i), one obtains

    Σ_i φ_k(x_i) φ_l(x_i) A_i = δ_kl

which corresponds to the orthonormality condition in integral
form which is satisfied by the proper functions belonging
to the integral equation (12),

    ∫_R φ_k(x) φ_l(x) dx = δ_kl

provided that the same quadrature factors are used to evaluate
this integral as were used to reduce the integral equation (12)
to matrix form as in (13) or (17).
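In a modern setting the symmetrized problem (17) can be solved directly; a minimal sketch (Python, illustrative only) is:

    import numpy as np

    def weighted_proper_functions(K, A):
        """Solve (17): lam * theta = {sqrt(Ai) K(xi,xj) sqrt(Aj)} theta, and return
        the proper values and the proper functions phi(x_i) = theta(x_i)/sqrt(A_i),
        which then satisfy sum_i phi_k(xi) phi_l(xi) A_i = delta_kl."""
        s = np.sqrt(np.asarray(A, dtype=float))
        lam, theta = np.linalg.eigh(s[:, None] * np.asarray(K, float) * s[None, :])
        lam, theta = lam[::-1], theta[:, ::-1]       # decreasing order
        return lam, theta / s[:, None]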
The quadrature factors that are to be used in going
from (12) to (13) or (17) need to be very carefully considered.
First, even in the case of a single variable x, the choice
of quadrature factors is by no means unique. The standard
handbooks of mathematical formulae list the quadrature factors
for the Trapezoid rule and Simpson's rule. The more complete
work, Abramowitz and Stegun (1964), lists eight additional
sets of quadrature factors, not to mention the open-type
formulae which would be used to obtain quadrature factors
like the B_j^(x_i) of the preceding section. In the case of
these formulae an error term is listed which generally is
proportional to some derivative of the integrand evaluated
75
-------
in the interval of integration. This, at least, gives the
impression that if the integrand is sufficiently "smooth",
the higher the order of this derivative, the more exact the
quadrature formula. All of these formulae are for equally-
spaced abscissae, x_{i+1} - x_i = constant. When the abscissae are
not equally spaced, one must be prepared to go to the basic
relations from which such quadrature formulae are usually
derived to obtain adequate expressions.
The situation when the parameter of integration is
multidimensional is worse. Even the compendious Abramowitz
and Stegun (1964) list only one quadrature formula for
points located on a square grid and which are also on the
boundary of the area over which one is integrating. It is
usually mentioned that in dealing with a rectangular grid of
points, one may make a multiple application of the one
dimensional quadrature formulae. None of these cases describe
the situation at hand, which we now explore to some extent.
To obtain the experimental values of the kernel K(x_i,x_j)
we have a fixed number of points, P_i, with coordinates (ξ_i, η_i)
that are arbitrarily located. We cannot increase the
number of points and we can do nothing about their location.
The domain of integration, R, is determined by this network
of points to a large extent, but not exactly. We may as
well simplify the problem as much as possible by specifying
that the boundary of R is a polygon obtained by connecting
the points on the perimeter of this area. Note that the
network of data points does not really define such a boundary
uniquely. We need to specify something like a "convex"
boundary before even the area of integration becomes uniquely
defined. We do not consider all of the ramifications of the
boundary selection any further since it would only make a
difficult situation more difficult.
76
-------
The ambiguity of the relation between the points, P., and
the domain of integration is illustrated by the simple
example of a re-entrant quadrilateral as shown in Figure IV-1.
In A, the four points are considered to be bounded by a simple
triangular region with the fourth point as an interior point
and the region divided into three non-overlapping triangles.
In B, the fourth point is considered to lie on the boundary
of a re-entrant quadrilateral which is subdivided into two
non-overlapping triangles.
Inside and on the boundary one has the points P_i. Now
subdivide the region into elementary triangles with a point
P_i at a vertex of each triangle. It is readily proved that
if P = total number of points, B = number of boundary points,
S = number of triangle sides, T = number of triangles, then
S = 3(P-1)-B
T = 2(P-1)-B
The fact that the boundary is not uniquely defined is reflected
in the fact that B (which is included in the total number
of points, P) is also a parameter that needs specification.
Consider now a single triangle that has for vertices
the points 1, 2, 3. In order to carry out the integration
of f(ξ,η) over this triangle when only the values f_1,
f_2, f_3 at the vertices are known, it is reasonable to
consider a generalization (slight) of the trapezoid rule.
Let the function be approximated by a plane over this triangle.
It is easily shown that the integral over the triangle is
given by

    ∫∫_A f(ξ,η) dξ dη = (f_1 + f_2 + f_3)·(A/3)                        (17)
where A is the area of the triangle.
77
-------
( A )
( B )
FIGURE IV-1
THE AMBIGUITY OF THE RELATION FOR THE NUMBER
OF TRIANGLES AND SIDES DEPENDING ON BOUNDARY
POINT ASSIGNMENT
78
-------
Next consider the array of points that is subdivided
into non-overlapping triangles that completely cover the
region of integration. With each point, P_i, there will be
one or more triangles with this point as a common vertex.
The integral of f(ξ,η) over the entire region R may then be
approximated by the quadrature formula

    ∫∫_R f(ξ,η) dξ dη ≈ Σ_i f_i A_i

where i is the point index and A_i is given by

    A_i = [Σ_k T_k(i)]/3

where T_k(i) represents the area of the k'th triangle that
has the point P_i as a vertex and Σ_k is the sum over all such
triangles.

The situation is amenable to a simple geometric con-
struction to visualize the areas thus associated with the
points P_i. In an individual triangle, the lines joining
the vertices with the mid-point of the opposite side divide
the triangle into three quadrilaterals each with the same
area. Thus, for the triangle 123, Figure IV-2, the points
A,B,C are the midpoints of the sides and lines Al, B2, C3
all meet at the point Q, the "center of gravity" of the
triangle. The quadrilaterals QB1C, QC2A, QA3B each have
an area equal to 1/3 the area of the triangle 123.
When several triangles are put together to form a region
subdivided into triangles, the area associated with each point
is illustrated in Figure IV-3.
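A minimal sketch of this assignment of quadrature factors follows (modern Python, illustrative only; scipy.spatial.Delaunay is used here merely to produce one admissible triangle coverage, whereas the assignments used in this study were constructed by hand).

    import numpy as np
    from scipy.spatial import Delaunay

    def quadrature_factors(points):
        """points: (n, 2) array of station coordinates.
        Returns A_i = (1/3) * sum of the areas of all triangles having point i
        as a vertex, for one (non-unique) triangle coverage of the region."""
        pts = np.asarray(points, dtype=float)
        tri = Delaunay(pts)
        A = np.zeros(len(pts))
        for simplex in tri.simplices:                 # vertex indices of each triangle
            p1, p2, p3 = pts[simplex]
            area = 0.5 * abs((p2[0]-p1[0])*(p3[1]-p1[1]) - (p2[1]-p1[1])*(p3[0]-p1[0]))
            A[simplex] += area / 3.0                  # one third of the area to each vertex
        return A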
79
-------
FIGURE IV-2
THE THREE EQUAL AREA QUADRILATERALS, 1CQB ,
2AQC , AND 3BQA , RESULTING FROM JOINING THE
VERTICES WITH THE MID - POINTS OF THE OPPOSITE
SIDES OF THE TRIANGLE.
80
-------
FIGURE IV-3
AN AREA SUBDIVIDED INTO TRIANGLES DETERMINED
BY THE POINT LOCATIONS AND THE AREAS ASSIGNED TO
EACH POINT ON THE BASIS OF THE QUADRILATERALS IN
EACH TRIANGLE ILLUSTRATED IN FIGURE IV-2.
81
-------
The situation seems to be well under control at this
point, but now consider how we divide a very simple convex
quadrilateral into triangles. There are two choices of the
way in which this is done, as shown in Figure IV-4. It
is readily seen that the quadrature factors assigned to the
points 1,2,3,4 in these two cases are vastly different from
each other. When a region covered by many data points is
considered, the number of ways that it may be divided into
non-overlapping triangles becomes large, and each of these
methods of subdivision will be associated with a different
assignment of quadrature factors to the points. The problem
then resolves itself into the question of what is the best
way of subdividing the area covered by given points P_i into
triangles so that the resulting quadrature formula will best
approximate the integral concerned.
As an example of the wide variety of quadrature factors,
consider a square with data points at (±h,±h), (±h,0), (0,±h),
and (0,0). There are four sub-squares and each can be divided
into two triangles in two ways. There are thus 16 possible
ways of dividing the area into non-overlapping triangles.
Each of these will give a different system of quadrature
factors based on the integration of a plane approximation over
the triangles. These are listed in the following table.
The quadrature factors in each case add to 24, so they are to
be multiplied by h²/6, where h is the spacing between points,
to get the correct units.
82
-------
FIGURE IV-4
THE TWO WAYS THAT A CONVEX QUADRILATERAL MAY
BE DIVIDED INTO TWO TRIANGLES AND THE RESULTING
DIFFERENCE IN THE AREAS ASSIGNED TO EACH POINT.
83
-------
Quadrature Factors for the 9 Points Covering a Square

 Coordinates   Number:  1     1     2     4     4     4      1        1
                                                           (Trap.)  (Simp.)
  (-h,+h)               1     2     1     1     1     1     3/2      2/3
  ( 0,+h)               4     2     3     4     3     4      3       8/3
  (+h,+h)               1     2     2     1     2     1     3/2      2/3
  (-h, 0)               4     2     3     3     3     3      3       8/3
  ( 0, 0)               4     8     6     6     7     5      6      32/3
  (+h, 0)               4     2     3     3     2     4      3       8/3
  (-h,-h)               1     2     2     2     2     2     3/2      2/3
  ( 0,-h)               4     2     3     2     2     3      3       8/3
  (+h,-h)               1     2     1     2     2     1     3/2      2/3
The "number" row indicates the possible number of cases
of each type: 1 is associated with a symmetrical arrangement,
2 indicates that a 90° rotation gives another arrangement but
that a second rotation of 90° reproduces the original
arrangement; 4 indicates that the original is not reproduced
until the 4'th rotation through 90°.
The last two columns are symmetrical quadrature factors
for this point array that are derived from the double
application of the trapezoid rule and Simpson's rule, respectively.
The quadrature factors for the trapezoid rule (applied
twice) are the average of those shown in the first two
columns.
In this example, there appears to be no good over-all
criterion for preferring one assignment of quadrature factors
over another. In the case of each square, the two possible
triangle subdivisions are symmetrical. One might prefer a
symmetrical arrangement as in the first two columns, but this
seems to be justified more on the basis of taste than
-------
a good substantiation of the better accuracy of the quadrature
formula. When the points are not regularly arranged on a
grid, but are more or less at random, one might prefer a sub-
division into triangles that would favor a tendency toward
equilateral triangles over long, skinny triangles, but again
this appears to be justified more on the basis of taste than
mathematics.
The standard hyperbolic paraboloid interpolation scheme
(sometimes called double linear interpolation or iterated
linear interpolation) over convex quadrilaterals might be
used, but the same ambiguity on the resulting quadrature
factors still exists because of the many ways that the
observation net can be divided into convex quadrilaterals.
In this case, one may be stuck with a few triangles since
the quadrilateral subdivision need not "come out even".
c) The Product Integral Technique
If instead of interpolating the whole integrand on a
triangle to obtain the quadrature factors one interpolates
separately the two terms of the product, one obtains the
relation

    I_A = ∫∫_A K(x,y;ξ,η) φ(ξ,η) dξ dη = 2A ∫_0^1 ∫_0^{1-q} K(x,y;p,q) φ(p,q) dp dq

where

    φ(p,q) = φ_1 + p(φ_2 - φ_1) + q(φ_3 - φ_1)
    K(x,y;p,q) = K_1 + p(K_2 - K_1) + q(K_3 - K_1)
85
-------
and in which φ_1, φ_2, φ_3 and K_1, K_2, K_3 are the values of the
proper function and the kernel function at the triangle
corners and A is the triangle area. The results of this
integration lead to

    I_A = (A/12)[φ_1(2K_1 + K_2 + K_3) + φ_2(K_1 + 2K_2 + K_3) + φ_3(K_1 + K_2 + 2K_3)]

which may be expressed in matrix/vector form as

                                     | 2  1  1 | | φ_1 |
    I_A = (A/12) (K_1, K_2, K_3)     | 1  2  1 | | φ_2 |
                                     | 1  1  2 | | φ_3 |
where now one has a matrix of quadrature factors. Sum over
all of the triangles that cover the region of integration to
obtain the quadrature form for the integral equation as

    λφ(x_i) = Σ_j [Σ_k K(x_i,x_k) A_kj] φ(x_j),   or   λΦ = KAΦ,          (18)

where Φ is the column vector of values φ(x_i), K = {K(x_i,x_j)},
and A = {A_ij} is the assembled matrix of quadrature factors. Now
assume for the moment that A is a positive definite matrix.
(It is symmetric.) It will then have a square root which we
will denote as A^{1/2}. Multiply on each side by A^{1/2} to obtain
86
-------
    λ(A^{1/2}Φ) = (A^{1/2}KA^{1/2})(A^{1/2}Φ).

If, now, we let A^{1/2}Φ = θ, we have the relation

    λθ = (A^{1/2}KA^{1/2})θ
which is the analogue of the relation (17) obtained when the
quadrature factors were simple scalars associated with each
point.
The solutions of the above matrix problem, θ_k(x_i), λ_k,
are such that, as before,

    Σ_i θ_k(x_i) θ_l(x_i) = δ_kl .

Then in terms of the values φ(x_i), one has

    θ_k(x_i) = Σ_m {A^{1/2}}_im φ_k(x_m)

so that

    δ_kl = Σ_m Σ_n φ_k(x_m) [Σ_i {A^{1/2}}_mi {A^{1/2}}_in] φ_l(x_n)

where, in the above, {A^{1/2}}_ij stands for the elements of A^{1/2},
the square root matrix of the matrix A (not the square
roots of the elements of A). Since A^{1/2} is symmetric
(as was A), the summation on i in the above simply gives the
elements of A. Thus

    δ_kl = Σ_m Σ_n φ_k(x_m) A_mn φ_l(x_n).
87
-------
This is the same result that would have been obtained had
we evaluated the "product integral" expression

    ∫_R φ_k(x) φ_l(x) dx

using the same individual interpolation formula for the
factors in the integrand that was used to evaluate the
product integral that appears in the initial integral equation.

The square root matrix of the matrix A is readily
obtained by recourse to the proper functions and values
of the matrix A. If the proper values/functions of A are
μ_k, ψ_k, so that M is the diagonal matrix of the μ_k's and
Ψ the matrix of which the ψ_k are the column vectors, then
A = ΨMΨ'. Since for a symmetric positive definite matrix
the proper values are real and positive, the proper values
have square roots (use the positive sign). Then the
matrix ΨM^{1/2}Ψ', where M^{1/2} is the diagonal matrix with elements
μ_k^{1/2} on the principal diagonal and zeros elsewhere, is the
square root matrix of A, i.e., A^{1/2} = ΨM^{1/2}Ψ'. Note that

    A^{1/2}A^{1/2} = (ΨM^{1/2}Ψ')(ΨM^{1/2}Ψ') = ΨM^{1/2}M^{1/2}Ψ' = ΨMΨ' = A.
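A compact sketch of the product integral procedure follows (modern Python, illustrative only): the matrix of quadrature factors is assembled from the elementary matrix (A/12)[[2,1,1],[1,2,1],[1,1,2]] over the triangles, its square root is formed from its proper values and functions, and the symmetrized problem λθ = (A^{1/2}KA^{1/2})θ is solved.

    import numpy as np

    ELEM = np.array([[2.0, 1.0, 1.0],
                     [1.0, 2.0, 1.0],
                     [1.0, 1.0, 2.0]]) / 12.0

    def product_integral_solution(K, points, triangles):
        """K: (n, n) sampled kernel; points: (n, 2) coordinates;
        triangles: list of vertex-index triples covering the region.
        Returns the proper values lam and proper functions phi(x_i)."""
        pts = np.asarray(points, float)
        n = len(pts)
        Aq = np.zeros((n, n))                        # matrix of quadrature factors
        for i, j, k in triangles:
            p1, p2, p3 = pts[i], pts[j], pts[k]
            area = 0.5 * abs((p2[0]-p1[0])*(p3[1]-p1[1]) - (p2[1]-p1[1])*(p3[0]-p1[0]))
            idx = np.ix_([i, j, k], [i, j, k])
            Aq[idx] += area * ELEM
        mu, Psi = np.linalg.eigh(Aq)                 # Aq = Psi M Psi'
        Ah = Psi @ np.diag(np.sqrt(np.clip(mu, 0.0, None))) @ Psi.T   # square root matrix
        lam, theta = np.linalg.eigh(Ah @ np.asarray(K, float) @ Ah)
        lam, theta = lam[::-1], theta[:, ::-1]       # decreasing order
        phi = np.linalg.solve(Ah, theta)             # phi = Aq**(-1/2) theta
        return lam, phi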
d) Test for the Accuracy of Solutions
In view of the fact that the quadrature factors for
the numerical solution of an homogeneous Fredholm integral
equation of the second kind are not well-defined quantities,
a test of the validity of solutions seems to be in order.
The usual mathematical error estimates are unsatisfactory
for this since they all require more information than is
available from experimental data on the kernel function. A
test is available, however; namely a simple comparison of
the proper functions arising from two equally valid choices
88
-------
for the quadrature factors. Thus, let there be two choices
of quadrature factors, A_j and B_j. The corresponding
algebraic equations to be solved are

    λ^# [√A_i φ^#(x_i)] = Σ_j [√A_i K(x_i,x_j) √A_j][√A_j φ^#(x_j)]
    λ^* [√B_i φ^*(x_i)] = Σ_j [√B_i K(x_i,x_j) √B_j][√B_j φ^*(x_j)]

and let

    θ^#(x_i) = √A_i φ^#(x_i)
    θ^*(x_i) = √B_i φ^*(x_i)

be the proper functions at the points x_i and let λ^#_k, λ^*_k
be the corresponding proper values. Then consider the sum
of the squares of the differences of these solutions. If
the solutions are reasonably similar, this quantity should
be very close to zero. On expanding the square and summing
over the points x_i, one obtains

    Σ_i [θ^#_k(x_i) - θ^*_k(x_i)]² = Σ_i θ^#_k(x_i)² - 2 Σ_i θ^#_k(x_i) θ^*_k(x_i) + Σ_i θ^*_k(x_i)²

The first and last terms on the right are each unity so that

    Σ_i [θ^#_k(x_i) - θ^*_k(x_i)]² = 2[1 - Σ_i θ^#_k(x_i) θ^*_k(x_i)].
The second term in brackets on the right is simply the
"correlation coefficient" for the point by point comparison
of the two solutions. (Each has unit second moment about
zero, but need not have a first moment of zero, although
89
-------
for proper functions of order greater than a small number
this is essentially the case. We compute the sum of products
as indicated ignoring the fact that the first moment may
not be zero. It is an immediate consequence of the Schwartz
inequality that the value of this sum of products lies between
+1 and -1 so it "looks like" a correlation coefficient
anyway . )
The quantity

    C_k = Σ_i θ^#_k(x_i) θ^*_k(x_i)

serves as a test parameter for the "self consistency" of the
two solutions θ^#_k(x_i) and θ^*_k(x_i). If C_k is close to 1, it
may be said that the solutions are self consistent. If it
departs from 1 by a significant amount, then the solutions
are not self consistent. A better condition for self con-
sistency is much stronger than this. We have a sequence of
values C_k, k = 1, ..., n. It is expected that for k small,
the values of the test parameter will be close to one. If
at some value of k, say k*, the value of C_{k*} has decreased
abruptly compared to the previous values, then we can say
that not more than k*-1 solutions are self consistent. This
is regardless of whether or not the subsequent values of
C_k, k > k*, may be close to 1. This is because in each
sequence of solutions every solution is orthogonal to all
previous solutions of the sequence.
One cannot say which of the solutions is incorrect if
C_k < 1 when the quadrature factors are equally valid. One
can only say that k*-1 solutions appear to be equally valid
and that any further solutions in the sequence are suspect.
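A minimal sketch of this comparison (modern Python, illustrative only) follows; it assumes the two solution sets have already been computed, e.g. by the routines sketched above, with corresponding proper functions in the same column order.

    import numpy as np

    def consistency_coefficients(theta_a, theta_b):
        """Columns of theta_a and theta_b are the normalized proper functions from
        two choices of quadrature factors.  Returns C_k = sum_i theta_a[i,k]*theta_b[i,k].
        Since a proper function is determined only up to sign, |C_k| is the quantity
        usually examined; an abrupt drop at some index k* marks the last
        self-consistent solution."""
        return (np.asarray(theta_a, float) * np.asarray(theta_b, float)).sum(axis=0)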
90
-------
From the preceding, it is concluded that by formulating
the problem of matrix reduction (outlined in Sub-section 1
and treated from the Factor Analysis point of view in Sub-
section 2) as an integral equation, there are ambiguities of
technique that have an important bearing on the accuracy of
determining the proper functions. These ambiguities may be
used to test for self-consistent solutions. If the solutions
are self-consistent one can say nothing about their accuracy.
Accurate solutions must be self-consistent, but inaccurate
ones may be also. On the other hand, if a pair of solutions
are inconsistent, then one or the other or both must be
inaccurate. This has an important bearing on the value of k
determined from the Factor Analysis method (k = number of
statistically significant solutions to the matrix problem)
as outlined in Section 2. It seems reasonable that if there
are not more than k* solutions of the integral equation
formulation that can be reasonably accurate from a mathematical
point of view, then using more than k* solutions to the
matrix problem appears to be of doubtful value, even if the
appropriate statistical tests indicate that they are "statis-
tically" significant (k*
-------
92
-------
covariance matrix would be of rank 27 instead of a more
desirable rank of 40 (see Appendix B). Inspection of the
data for missing days by stations revealed that there were a
large number of days on which data for only one or two
stations were missing. It was felt that it would be desirable
to "interpolate" data for the missing stations under such
circumstances. This would increase the validity of the data
as a whole without seriously degrading that for the single
station at which the interpolated data was inserted. The
result of this synthetic increase in the data base resulted
in an increase to 59 effective data days, which in turn
assured a full rank of 40 for the covariance matrix.
The area covered by the stations was divided into two
systems of non-overlapping triangles covering a region with
common exterior boundary in each case. The station coordinates
and the triangle assignments are listed in Tables IV-1 and
IV-2. It will be noted in Table IV-1 that though the areas
assigned to points are correlated, there are also some large
variations between the areas assigned to a given point, the
ratio in some instances being larger than 2:1. The arrange-
ments of the triangles are also shown in Figures IV-6 and
IV-7. The station coordinates are in grid units as in Ruff
(1973a). The areas assigned to the points in Table IV-1
are in square grid units and represent one third of the area
of the triangles which have the point concerned as a common
vertex as on page 79.
The triangle assignments were made in order to obtain
quadrature factors as outlined in Sections Ib and Ic. Only
the points shown in Figure IV-5 with coordinates given in
Table IV-1 at which covariance data are available are given
a priori. To obtain quadrature factors which are associated
with each of these points (that is, the areas that have been
tabulated in Table IV-1 under the columns headed 1 and 2
93
-------
Table IV-1. Station Coordinates and Area Assignments

Station        Coordinates             Area
 Index          X        Y          1        2
    1         19.42    20.86       1.77     3.20
    2         20.16    20.06       2.00     1.15
    3         18.66    19.16       1.50     2.89
    4         20.24    22.36       3.53     3.50
    5         17.72    20.14       3.35     1.76
    6         21.18    21.20       2.48     3.16
    7         18.12    22.50       4.70     5.04
    8         20.76    19.20       3.89     2.46
    9         16.22    21.16       3.65     4.96
   10         20.48    17.88       1.53     3.60
   11         16.52    18.92       3.22     3.40
   12         22.48    18.64       1.59     4.52
   13         18.04    17.76       5.68     4.48
   14         22.32    16.64       8.63     4.87
   15         16.32    24.12      11.14    10.85
   16         20.42    25.20       9.87     7.77
   17         14.14    21.46       9.19     7.29
   18         21.02    23.84       3.88     3.94
   19         14.16    20.26       4.97     6.39
   20         22.44    20.64       6.50     2.83
   21         14.96    18.04       4.80     6.20
   22         24.14    19.78       3.62     5.24
   23         16.88    15.88       4.48     6.34
   24         20.24    15.80       3.42     4.77
   25         18.50    28.42       4.20     6.10
   26         18.88    14.92       5.90     3.88
   27         15.42    27.88       6.01     3.46
   28         22.30    26.86       0.90     2.38
   29         13.78    25.06       2.47     4.59
   30         23.26    25.04       3.20     2.44
   31         11.14    23.64       5.08     3.61
   32         24.76    23.22       5.71     5.43
   33         10.34    20.34       4.01     4.71
   34         27.10    19.98       5.96     3.56
   35         10.68    17.18       5.10     4.97
   36         26.04    17.06       3.14     5.15
   37         13.96    16.06       3.51     3.09
   38         23.54    14.52       4.70     4.70
   39         14.74    14.32       3.75     1.77
   40         16.64    13.04       2.71     3.57
94
-------
Table IV-2. Triangle Assignments

              Assignment No. 2       Assignment No. 1
Triangle
 Index          1    2    3            1    2    3
    1          26   38   40           26   38   40
    2          24   26   38           24   26   38
    3          14   24   38           14   24   38
    4          14   36   38           14   36   38
    5          14   34   36           12   14   36
    6          14   22   34           12   22   36
    7          20   22   34           22   34   36
    8          20   32   34           22   32   34
    9          18   20   32           20   22   34
   10          16   18   32            6   20   32
   11          16   30   32            4    6   32
   12          16   25   30            4   18   32
   13          25   28   30           18   30   32
   14          16   25   27           16   18   30
   15          15   16   27           16   28   30
   16          15   27   29           16   25   28
   17          15   29   31           15   16   25
   18          15   17   31           15   25   27
   19          17   31   33           15   27   29
   20          17   33   35           15   17   29
   21          17   19   35           17   29   31
   22          19   35   37           17   31   33
   23          19   21   37           17   19   33
   24          21   37   39           19   33   35
   25          21   23   39           19   21   35
   26          23   26   39           21   35   37
   27          13   23   26           21   23   37
   28          26   39   40           23   37   39
   29          13   24   26           23   39   40
   30          13   14   24           23   26   40
   31          10   13   14           23   24   26
   32           8   10   14           13   23   24
   33           8   12   14           10   13   24
   34          12   14   22           10   14   24
   35           8   12   22           10   12   14
   36           8   20   22            8   10   12
   37           2    8   20            8   12   20
   38           2    6   20           12   20   22
   39           6   18   20            6    8   20
   40           4    6   18            2    6    8
   41           4   16   18            1    2    6
   42           4   15   16            1    4    6
   43           4    7   15            1    4    7
   44           7   15   17            4    7   18
   45           7    9   17            7   16   18
   46           9   17   19            7   15   16
   47           9   19   21            7    9   15
   48           9   11   21            9   15   17
   49          11   21   23            9   17   19
   50          11   13   23            9   11   19
   51           5   11   13           11   19   21
   52           3    5   13           11   13   21
   53           3    8   13           13   21   23
   54           8   10   13            3   11   13
   55           2    3    8            3   10   13
   56           2    3    5            3    8   10
   57           1    2    5            2    3    8
   58           1    2    6            1    2    3
   59           1    6    7            1    3    5
   60           4    6    7            3    5   11
   61           1    5    7            5    9   11
   62           5    7    9            1    5    9
   63           5    9   11            1    7    9
95
-------
96
-------
97
-------
since we are to compare two different ways of obtaining
the quadrature factors or areas) the entire region is covered
by non-overlapping triangles and to each point is assigned
the area equal to one third of the area of all triangles with
a vertex at this point. This covering of the region by
triangles is not unique, so we say that we make triangle
"assignment" or "assignments" since there is some freedom
of choice here. For the purpose at hand, only two such
"assignments" are made. Many others could have been made.
The areas, or quadrature factors, for each of the two
assignments of the triangle coverage (Table IV-1) that are
associated with each point are "correlated" in the sense that
if for one assignment of triangle coverage the quadrature
factor (or area) associated with a given point is small (large)
then for another assignment of triangle coverage it will
also tend to be small (or large). In other words, if one were
to compute the product moment correlation coefficient for
the areas (quadrature factors) shown in columns 1 and 2
under Area in Table IV-1, this correlation coefficient would
be positive and significantly different from zero. This is
simply due to the fact that, as shown in Figure IV-5, in some
areas of the region the points are more dense than in others.
The statement that the areas (or quadrature factors) are
correlated would be true for any two ways of covering the
area with non-overlapping triangles, not just the two that
happen to have been selected here.
2. Quadrature Factors and Techniques
In order to obtain a comparison between the proper
values/functions for the different area assignments (triangle
coverages) these were computed and compared using the "trap-
ezoid method" and the "product integral technique". The
"trapezoid method" refers to the assignment of quadrature
98
-------
factors on the basis of fitting the entire integrand as a
plane over each triangle. The "product integral technique"
refers to fitting each factor in the integrand separately as
a plane over each triangle.
With reference to the quadrature factors used in the
"trapezoid method", a search was made of the literature on
the subject. It was found that the texts Davis and Rabinowitz
(1967) and Stroud (1971) were devoted to more mathematical
aspects of the quadrature problem in several dimensions.
This was true also of papers found in the recent literature
such as Ewing (1941), Tyler (1953), Synge (1953), Mises (1954),
Thacher (1957), Hammer and Wymore (1957), Hammer and Stroud
(1958), Albrecht and Collatz (1958), Stroud (1960a and 1960b),
Ceschino and Letin (1972) to mention only a few. Even
Dixon (1973), a survey paper, did not list multidimensional
quadrature formulas that could be applied. Almost invariably
mathematical interest has been confined to quadrature formulas
wherein the function values are given at arrays of points
prescribed in advance. These are, of course, useless when
the data points are given in an essentially random manner. It
was found that Mises (1936) did give a formula that could be
applied to an arbitrarily given triangle. It is for this
reason that in Section Ib the derivation of the quadrature
factors for the "trapezoid method" was developed in some
detail from "first principles" and the implications of the
use of such a formula (especially the ambiguity of the triangle
selections) were pointed out. We have not seen this kind of
treatment of the problem in any published work. (On the
other hand, it is so elementary that we feel sure that it
must have been treated before somewhere and that our search
has not been sufficiently exhaustive to find it.)
99
-------
With respect to the "product integral technique", it
was found after the method of Section Ic had been developed
that a somewhat similar procedure had been developed by
Boland and Duris (1969 and 1971) and Boland (1972) but,
again, only for the case in which the function is evaluated
at a prescribed regular array of points in one dimension.
See also Beard (1947) for an earlier treatment, also in one
dimension. We have not found anything treating the case at
hand, arbitrarily preassigned points in two dimensions.
The proper values so obtained are shown in Figure IV-8
for the trapezoid method and Figure IV-9 for the product
integral method. Each figure shows the proper values for
each triangle assignment for index values 20-40, one by a
triangle and the other by a dot which appear one above the
other. For the index values 1-19 the differences were so
small that the points are scarcely distinguishable and
consequently only the dots are shown. The differences in
the assignments of the areas to each data point apparently
make little difference in the proper values obtained. On
the other hand, comparison between the figures shows at once
that for indices 5 through 40 the proper values obtained by
using the product integral method of evaluating the integral
are distinctly less than those
obtained by the trapezoid method. The first ten proper
values are compared in more detail in Table IV-3 in which
the ratio decreases somewhat irregularly from 0.9 at index 1
to near 0.5 at index 5 and higher. This is discussed in
more detail on p. 104.
The "knee" in the curve of log proper value vs index at
index 5 or 6 represents an interesting phenomenon which has
been noted by Craddock and Flintoff (1970) and investigated
100
-------
[Figure graphics not reproduced: log-scale plot of proper value (0.001 to 10.0) against proper value index (0 to 70).]
FIGURE IV-8
PROPER VALUES FOR DIFFERENT TRIANGLE ASSIGNMENTS USING
THE TRAPEZOID METHOD.
101
-------
[Figure graphics not reproduced: log-scale plot of proper value (0.001 to 10.0) against proper value index (0 to 70).]
FIGURE IV-9
PROPER VALUES FOR THE DIFFERENT TRIANGLE ASSIGNMENTS
USING THE PRODUCT INTEGRAL METHOD
102
-------
Table IV-3. Comparison of the First Ten Proper Values by
            the Different Computing Methods

                     Proper Values
Index     Trapezoid     Product Integral     Ratio
  1        6.28810          5.71721         0.90921
  2        3.02629          2.49737         0.82522
  3        2.23924          1.88175         0.84035
  4        0.98689          0.68170         0.69075
  5        0.57566          0.24709         0.42923
  6        0.35671          0.18841         0.52819
  7        0.32275          0.16377         0.50742
  8        0.29378          0.16263         0.55358
  9        0.26159          0.11669         0.44608
 10        0.23855          0.11485         0.48145
103
-------
synthetically by Farmer (1971) . Craddock and Flintoff
remark to the effect that this marks the point beyond which
the proper functions appear to be more or less random, but
they give no firm criteria on which they base their measure
of randomness. We will be able to provide a criterion that
seems to fit the situation rather well.
In order to compare the proper functions obtained by
the two triangle subdivisions and the two quadrature techniques
the "self consistency" test parameter, C, , i.e., the corre-
lation of the proper functions for the methods being compared,
was computed. The values of |ck| as a function of index
number are illustrated in Figures IV-10 and IV-11. The
absolute value is used since there is always a basic ambiguity
in the sign of the proper functions.* In Figure IV-10 it is
to be noted that the first significant drop in the self
consistency test parameter occurs after the 7'th index where
the quadrature by the trapezoid method is used and after the
6'th index where quadrature is by the product integral
method. These are one index higher than the knee in the
curves of log proper value against index. Although some of
the test parameter values are reasonably large for higher
values of the index, it does not seem prudent to admit
validity to the corresponding proper functions. Each proper
function is orthogonal to those of lower index and low
values of the self consistency criterion precede these
larger self consistency criterion values.
*The proper functions are the solutions of the integral
equation (12), page 70 (or of its algebraic counterpart
when formulated in discrete terms). This integral
equation is "homogeneous". It is readily seen that
if φ(x) is a solution, then -φ(x) is also a solution. When
solved in discrete terms, a standard eigenvalue/eigenfunction
computation routine may be used. Whether one obtains φ(x) or
-φ(x) from such a computation routine is more or less a matter
of chance. Either one is valid.
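A minimal sketch of the self-consistency parameter described above, in Python/NumPy. It assumes the two solutions are supplied with one proper function per column; the absolute value removes the arbitrary sign discussed in the footnote.

# Sketch: self-consistency parameter |C_k| between the k-th proper functions
# obtained from two different (but equally valid) computations.
import numpy as np

def self_consistency(phi_a, phi_b):
    # phi_a, phi_b: (n_points, n_functions) arrays, one proper function per column.
    num = np.abs(np.sum(phi_a * phi_b, axis=0))
    den = np.sqrt(np.sum(phi_a ** 2, axis=0) * np.sum(phi_b ** 2, axis=0))
    return num / den        # values near 1 indicate mutually consistent functions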
104
-------
[Figure IV-10 appeared here: correlation between proper functions (the self-consistency test parameter |C_k|) plotted against proper value index, comparing the two triangle assignments for each quadrature method; graphics and caption not reproduced.]
105
-------
The self consistency criterion between the different
quadrature methods for the two triangle subdivisions is
illustrated in Figure IV-11. The situation seems to be
similar to that of the comparison between triangle subdivisions
of the preceding figure except that the first "break" in the
test parameter appears to occur following the fourth index.
It is to be noted that this is the point at which the ratio
of the proper values obtained took a very abrupt drop to
below 0.5 (and remained in the neighborhood of 0.5 thereafter).
The reasons for both the abrupt drop in the ratio of
proper values for the two different computing procedures
(trapezoid vs product-integral methods) and the drop in the
correlation between proper functions for these two computing
procedures lies in the fact that the product-integral technique
is in effect a procedure which "smooths" the kernel function
as compared with the trapezoid method. Thus, from (13),
page 71, for the trapezoid method we have (slight obvious
change in notation)
A<|> . = Z K. .A. .
Vi j ID IT J
while from (18), page 86, for the product-integral technique
X. = £ (ZK. A )
i j k IK K: j
where A. is defined on page 79 and A, . on page 86. In order
J K3
that the relations above be equivalent, it needs to be shown
that
Z K., A, . = K ..A.
k ik kj rj 3
*
where A. and A, . are defined as above and K .. is the required
D KD iO
smoothed value. In other words, it will be necessary to
show that
106
-------
[Figure IV-11 appeared here: correlation between proper functions computed by the two quadrature methods (trapezoid and product-integral), plotted against proper value index for each triangle subdivision; graphics and caption not reproduced.]
107
-------
    Σ_k A_kj = A_j

so that K*_ij will be the weighted average of the values
K_ik. Rather than go through the details of a formal
demonstration in the abstract, we consider only a concrete
example which illustrates what is involved. In the diagram
(no number) consider the point j and for the example we
take j=6. Let the triangle selection be such that the
triangles with 6 as the common vertex be (2,3,6), (3,8,6),
(8,9,6), (9,2,6). Since A_k6 = 0 unless k = j = 6 or the edge
k6 is involved, there are only five values that are not zero:
A_26, A_36, A_66, A_86, A_96, and these have the values as
defined on page 86:

    A_26 = [(9,2,6) + (2,3,6)]/12
    A_36 = [(2,3,6) + (3,8,6)]/12
    A_66 = 2[(2,3,6) + (3,8,6) + (8,9,6) + (9,2,6)]/12
    A_86 = [(3,8,6) + (8,9,6)]/12
    A_96 = [(8,9,6) + (9,2,6)]/12

where (i,j,k) indicates the area of the triangle with
vertices P_i, P_j, P_k and (i,j,k) must be a legitimate triplet
from the specific triangle assignment concerned. Adding
these up, their total is

    A_26 + A_36 + A_66 + A_86 + A_96
        = [(2,3,6) + (3,8,6) + (8,9,6) + (9,2,6)]/3
108
-------
which is exactly the area (quadrature factor) A_6 as specified
on page 79. This demonstrates the assertion in this particular
example. The general case is not difficult to handle.
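The arithmetic of this example is easy to check numerically. The sketch below (Python; the coordinates assigned to points 2, 3, 6, 8 and 9 are hypothetical) forms the four triangle areas, the five factors A_k6, and confirms that their total equals one third of the combined triangle area.

# Sketch: the factors A_k6 of the product-integral technique sum to the
# trapezoid factor A_6 (one third of the area of the triangles meeting at 6).
def area(p, q, r):
    return 0.5 * abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1]))

P = {2: (1., 1.), 3: (1., -1.), 6: (0., 0.), 8: (-1., -1.), 9: (-1., 1.)}   # hypothetical
T = {t: area(P[t[0]], P[t[1]], P[t[2]])
     for t in [(2, 3, 6), (3, 8, 6), (8, 9, 6), (9, 2, 6)]}

A26 = (T[(9, 2, 6)] + T[(2, 3, 6)]) / 12.0
A36 = (T[(2, 3, 6)] + T[(3, 8, 6)]) / 12.0
A66 = 2.0 * sum(T.values()) / 12.0
A86 = (T[(3, 8, 6)] + T[(8, 9, 6)]) / 12.0
A96 = (T[(8, 9, 6)] + T[(9, 2, 6)]) / 12.0

print(A26 + A36 + A66 + A86 + A96, sum(T.values()) / 3.0)   # the two totals agree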
The ordered ensemble λ*_i will start with reasonably large
values λ*_1, λ*_2, but these will decrease rapidly. On the
other hand, the ordered ensemble of proper values, μ_i, will
generally not be nearly as large as those of λ*_i to start,
but will decrease rather slowly (the nature of the decrease
will depend on the ratio of the number of observations to
the order of the matrix concerned). As a consequence, the
ordered proper values of the total matrix K_ij, λ_i, will
approximate the values of λ*_i for small i, but will be
dominated by the proper values μ_i for larger values of i.
The "change-over" point will be in the neighborhood of the
"knee" of the curve of log λ_i vs i as illustrated in Figures
IV-8 and IV-9 but cannot be exactly located there. The
effect of the "smoothing" property of the product-integral
technique is to drastically reduce the effect of the proper
values of the matrix of departure from the true covariances,
k_ij (i.e., all of the proper values μ_i are reduced in size
by smoothing) while the effect on the true values λ*_i is much
smaller. The net effect is then to decrease the size of the
larger computed total proper values, λ_i, when the product-
integral technique is used.
Now that the smoothing of the covariance kernel in
the product-integral technique is established, we proceed
to consider the effect that this would have on the proper
values and proper functions. Since we are dealing with an
empirically determined function, an important consideration
is the fact that every covariance value K.. is affected by
sampling variations and the values K.. (i.e., when i=j)
contain the residual variances in addition to the variance
109
-------
of the true values. The next section is devoted to the
method for accounting for residual variances, but they have
not been accounted for at this point. Thus, the matrix of
observed variances and covariances may be written as the sum
of two matrices K_ij = K*_ij + k_ij, where K*_ij are the variances
and covariances of the "true values" and k_ij are the departures
from the true values (which on the diagonal i=j may be quite
large). Now the proper values of the matrix K_ij, say λ_i,
are associated with the proper values of the matrix K*_ij, say λ*_i,
and the proper values of the matrix k_ij, say μ_i, but we cannot
write λ_i = λ*_i + μ_i where λ_i, λ*_i, μ_i are all three ordered. We
can only write λ_i = λ*_a + μ_b where λ_i is ordered, λ_1 ≥ λ_2 ≥ λ_3 ≥ ... ≥ λ_n,
and the subscripts a and b fall in some order to suit the
situation. It will be generally true that the effect of
the residual variances and sampling variation on the proper
functions for the trapezoid method as compared with the
product-integral technique is of a somewhat different
character. These are dependent in a large measure on the
nature of the sampling variation displayed by k_ij at the
off-diagonal points, i ≠ j. In the product-integral technique
these values are strongly smoothed while in the trapezoid
method they are not. As a consequence, one would expect
that there would be (after a certain undetermined index
value) a larger difference (smaller correlation) between
proper functions computed by the different quadrature
methods for the same triangle assignments than for different
triangle assignments for the same quadrature method. This
is, of course, what is being illustrated in Figures IV-10
and IV-11.
The conclusion to be drawn from the above comparison of
two quadrature methods for two apparently equally valid
triangle assignments is that there are certainly no more
110
-------
than 7 (or possibly 6) proper functions that are adequately
self-consistent and that if one wishes to take a more con-
servative attitude, this can be reduced to only 4.
3. Factor Analysis Methods
In the preceding section it was shown that both the
assignment of the triangle coverage of the region concerned
and the quadrature method used to evaluate the integral
had a strong effect on the self-consistency of the proper
functions computed. It was also pointed out there that at
least one reason for this was due to the effect of sampling
variation and the fact that the residual variances had not
been removed. Of these two, one may at least approximate
the residual variances and remove their effect by using
Factor Analysis methods. This approach is considered in this
section.
In order to obtain an estimate of the residual variances
contained in the St. Louis SO2 data, resort was made to the
abbreviated method due to Jöreskog (1962). The method of
Jöreskog was modified in that, in order to keep the advantages
of the integral equation formulation of the problem, it was
applied after the equations were formulated in a symmetrical
matrix form. (That is, the covariance matrix was modified
by the proper quadrature factors for the trapezoid method;
the straight principal component formulation based on "equal
quadrature factors for all points" was not used.)
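A minimal sketch (Python/NumPy) of the matrix modification just described, under the assumption that Kw already contains the covariance matrix weighted by the trapezoid quadrature factors; the scaling matrix is the square root of the diagonal of the inverse, applied "ahead and behind".

# Sketch: multiply the weighted covariance matrix ahead and behind by the
# square root of the diagonal of its inverse before extracting proper values.
import numpy as np

def modified_matrix(Kw):
    d = np.sqrt(np.diag(np.linalg.inv(Kw)))   # square root of diagonal of inverse
    D = np.diag(d)
    return D @ Kw @ D

# Proper values and proper functions of the modified matrix:
# lam, phi = np.linalg.eigh(modified_matrix(Kw))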
The comparison of proper functions for the two methods
is shown in Figure IV-12. It is to be noted in this figure
that the covariance of proper functions for the two triangle
subdivisions is reasonably high up through the 8'th proper
function, after which it takes an abrupt dip. On the basis
of the argument that once this dip occurs, the proper
functions are essentially irrelevant due to the mathematical
111
-------
[Figure IV-12 appeared here: correlation between corresponding proper functions for the two triangle subdivisions, after the Jöreskog modification of the weighted covariance matrix, plotted against proper function index; graphics and caption not reproduced. The lines A and B referred to in the text mark the cut-offs at indices 8 and 13.]
-------
(not statistical) ambiguities of the quadrature formula,
we can assert that there are not more than 8 significant
proper functions. This is indicated by the line A in Figure
IV-12.
On the basis of the analysis of Jöreskog's method using
the 5% level, there are at least 13 significantly different
proper values. This is indicated by the line B in Figure IV-12.
It is to be noted that the drop in the correlation of the
values of the proper functions between 8 and 13 is apparently
transient in that the correlation between functions 12 and
13 is as high as that for those of index less than 8. It
would appear that the statistical test for significantly
different proper values is far less restrictive than the
test based on the ambiguity of the mathematical problem.
After the 13'th proper function has been passed, the cor-
responding proper functions are very poorly correlated,
uniformly until the last one is reached.
The correlation of proper functions shown in Figure IV-12
corresponds to that of Figure IV-10 (the points indicated
by the dots) except for the fact that in the case of Figure
IV-12 the proper functions correspond to the weighted co-
variance matrix modified by multiplying ahead and behind by
the square root of the diagonal of the inverse of the weighted
covariance matrix as described on page 66. As shown there
(page 66 and following) this has the effect of making the
resulting "residual variances" (that appear on the diagonal
after modification) all equal. It is well known that if a
positive quantity is added to the diagonal of a symmetric
matrix, all of the proper values are increased by this amount
while the proper functions remain unchanged (Anderson, 1958).
Figure IV-12 then implies that after the Jöreskog modification
(as described on page 66) the effect of the two different
113
-------
triangle assignments is to effectively make all corresponding
(same index) proper functions after the 13'th essentially
uncorrelated, while those with index 13 or less are highly
correlated (the three exceptions are discussed later).
This, of course, is primarily due to sampling variations alone
since the effect of the residual variances has been made
uniform after the matrix modification. One is thence led
to conclude that the early "cut off" of the correlation
between corresponding proper functions shown in Figure IV-10
was due to the fact that the point to point variation of the
residual variances was a dominant factor.
To summarize the situation in different terms, the
computation of the proper functions and proper values without
removing the residual variances (more exactly, without
modifying the covariance matrix weighted by the quadrature
factors to give a uniform equivalent of the residual variances)
results in proper functions that become inconsistent (low
correlation of corresponding proper functions for different
weight factors of equal validity) at a very low index and for
which the consistency check (correlation) behaves in a very
irregular way (sometimes high, sometimes low) for the higher
index numbers. On the other hand, if the Jöreskog method is
used (wherein the weighted covariance matrix is modified to
obtain a uniform equivalent of the residual variances) then
the consistency check (correlation of proper functions for
different weight choices) clearly divides the proper functions
into two distinct groups; those with index lower than a certain
value (here, 13) that are rather uniformly self-consistent
(highly correlated for different weight choices), and those
with higher index value (here, 14 and above) that are uniformly
inconsistent (low correlation for different weight factor
choices). This is precisely the result that we had hoped
to achieve.
114
-------
The low values of correlation between proper functions at
index values 9, 10, 11 of Figure IV-12 are due to the fact
that the matrix method does not give self-consistent results
for the solution of the integral equation type problem when
different quadrature factors are used. In other words,
the modification of the original matrix by the quadrature
factors has introduced considerations that are not accounted
for by the sole question of statistical significance.
It is to be noted that the Jöreskog test is based on
proper values alone while that of the correlation test is
based on the proper functions alone. On the other hand,
applications are all essentially based on the proper functions
themselves. As a consequence, it is necessary to consider
further the question of the application of the proper functions
9-13.
To see more clearly what is taking place in the con-
sistency check for different quadrature factors (weights)
the elements near the principal diagonal of the matrix
    C_ij = Σ_k θ*_i(x_k) θ_j(x_k)

are tabulated in Table IV-4. In this table the central column
headed "same" contains the values C_ii on the principal
diagonal while the column on the left headed "order" is the
value of i. The tabulated values are thence the correlation
of the proper functions of the same index but with different
but equally valid quadrature factors (weights). The diagonals
of the matrix C_ij that border the principal diagonal appear
in the adjoining columns of Table IV-4. The values of C_i,i+k
are listed under the columns headed by the values k = -3, -2, -1,
+1, +2, +3. The absolute values of the numbers in the column
"same" are those shown in Figure IV-12.
115
-------
Table IV-4. Correlation Between Proper Functions for Different
Area Assignments. The central column indicates
proper functions of same order. Adjacent columns
are the correlations between proper functions of
different orders.
Order     -3      -2      -1      Same     +1      +2      +3
  1      ___     ___     ___     .982    -.005   +.049   -.013
  2      ___     ___     .002    .974     .010   -.061    .015
  3      ___    -.050    .007    .974    -.007   +.031   -.009
  4      .013    .062    .006    .967     .133    .014   -.072
  5      .003   -.027   -.134    .935    -.016    .024    .229
  6      .001   -.025    .082    .928     .087   -.242    .139
  7      .045    .001   -.176    .858    -.212    .254    .223
  8     -.189    .226    .308    .862    -.091   -.128   -.041
  9      .070    .050   -.017   -.671     .674*  -.092   -.191
 10      .177   -.103   -.297   -.193     .880*   .028   -.018
 11      .200   -.585*  -.575*  -.391     .184    .024   -.026
 12      .044   -.231   -.043   -.929     .060    .009    .030
 13     -.054    .039    .074    .965     .057    .025    .025
 14     -.011    .015   -.002   -.098     .099   -.207    .534
 15     -.038    .095   -.824    .072    -.011   -.103    .068
 16     -.021   -.002   -.040   -.292    -.199    .398    .047
 17      .055   -.529    .007   -.338     .057   -.037    .062
 18     -.011   -.118   -.037    .163    -.027   -.162    .190
 19      .114   -.061   -.445   -.004     .341   -.018    .115
 20      .075    .198   -.367    .397     .445    .024   -.017
* See text for a discussion of these values
116
-------
In Table IV-4 the large off-diagonal correlations for
proper functions 9, 10 and 11 are marked with an asterisk, *.
It is indicated that proper function 9 is highly correlated
with itself and 10, 10 with 11, and 11 with 10 and 9. In
other words, as the weights (quadrature factors) are changed
the proper functions 9, 10 and 11 shift about among themselves.
In more detail, if θ*_i(x_k) = θ*_i, θ_i(x_k) = θ_i are the i'th proper
functions for two different but equally valid quadrature
factor assignments, we have the approximate relations

    θ*_9  ≈ -0.671 θ_9 + 0.674 θ_10
    θ*_10 ≈ +0.880 θ_11
    θ*_11 ≈ -0.585 θ_9 - 0.575 θ_10

(The relations would be exact if all 40 proper functions
were listed on the right with appropriate coefficients from
the full table.) Thus θ*_10 shifts to θ_11 while θ*_9 and θ*_11
go into θ_9 and θ_10 (via a rotation of about 135°). (All
of this is, of course, dependent on the relations between the
two quadrature factor selections and would be invalid for
any other pair of such quadrature factors.) Further information
on this phenomenon is provided by the proper values of the
modified weighted covariance matrix which are illustrated
in Figure IV-13. It is seen there that proper values 9
and 10 are very nearly equal.
When two (or more) proper values of a matrix (symmetric,
positive definite) are equal, the proper functions cor-
responding to these are not both uniquely defined (Anderson,
1958). Apparently this pair of proper values is sufficiently
close to each other that the change in the quadrature factors
used was sufficient to show up this "near indeterminacy".
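The effect can be reproduced with a small numerical sketch (Python/NumPy, hypothetical numbers): a symmetric matrix with one isolated and two nearly equal proper values is perturbed slightly, and the proper functions of the close pair rotate strongly into one another.

# Sketch: proper functions belonging to nearly equal proper values are not
# stable; a small change in the matrix mixes them strongly.
import numpy as np

A = np.diag([5.0, 1.001, 1.000])        # one isolated, two nearly equal proper values
B = A.copy()
B[1, 2] = B[2, 1] = 0.01                # small symmetric perturbation

_, va = np.linalg.eigh(A)
_, vb = np.linalg.eigh(B)
print(np.round(np.abs(va.T @ vb), 2))
# The isolated proper value keeps its proper function (correlation 1.0); the close
# pair rotates by roughly 45 degrees, so its index-to-index correlations fall to
# about 0.7, much as observed at indices 9-11 of Table IV-4.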
117
-------
Table IV-4 for order numbers 14-20 shows the small
correlations for C_ii of Figure IV-12 in the column headed
"same", generally small correlations in the off-diagonal
positions, and a few scattered larger values, but no apparent
pattern.
The residual variances depend strongly on the point
at which the number of significant proper values/functions
is terminated. These are shown in Table IV-5 for a termination
of 8 or 13 such. Only the values for one subdivision of
the area into non-overlapping triangles are shown since, to
the number of digits shown in this table, the results of the
two subdivisions into triangles were identical. It is to be
noted that, since SO2 concentrations are approximately log-
normally distributed, the variances are those of the logarithm
of the SO2 concentration. The station-to-station variation
of the fraction of the residual variance compared with the
total residual variance is to be noted. These are particularly
large at stations 2, 9, 10, and 30. It is presumed that
this is due to station instrumentation or instrument exposure
at these locations since these locations are not obviously
related to each other (i.e., other stations with relatively
small residual variances lie between each pair; see Figures
IV-5, 6, 7).
The residual variance computed for station 9 clearly
indicates that all 13 of the proper functions are significant
in the case at hand. If only the first 8 proper functions
are used, the computed residual variance is larger than the
total variance, which is quite impossible.
It was pointed out above that the residual variance
at locations 2, 9, 10, 30 appear to be quite large. A remark
on the significance of the ratio of residual variance to
118
-------
Table IV-5. Total Variances, Residual Variance and the
            Fraction of the Total Variance for St. Louis
            SO2 Data at 40 Stations. The abbreviation P.V./F.
            stands for proper values/functions. The number
            preceding this abbreviation indicates the number
            of P.V./F.'s used in computing the residual variances
            shown. See text, p. 118.

                         8 P.V./F.                 13 P.V./F.
           Total     Residual                 Residual
Station    Variance  Variance    Fraction     Variance    Fraction
    1       .0497     .0049        .10          .0026       .05
    2       .0829     .0351        .42          .0183       .22
    3       .0374     .0072        .19          .0037       .10
    4       .0583     .0121        .21          .0063       .11
    5       .0269     .0029        .11          .0015       .06
    6       .0982     .0151        .15          .0079       .08
    7       .0738     .0086        .12          .0045       .06
    8       .0784     .0048        .06          .0025       .03
    9       .0945     .1050       1.11          .0548       .58
   10       .0376     .0208        .55          .0109       .29
   11       .0295     .0026        .09          .0014       .05
   12       .0758     .0028        .04          .0014       .02
   13       .0352     .0049        .14          .0026       .07
   14       .1118     .0229        .20          .0120       .11
   15       .0976     .0114        .12          .0059       .06
   16       .1938     .0184        .09          .0096       .05
   17       .0269     .0060        .22          .0031       .12
   18       .0532     .0051        .10          .0027       .05
   19       .1155     .0341        .30          .0178       .15
   20       .0567     .0090        .16          .0047       .08
   21       .0778     .0040        .05          .0021       .03
   22       .0684     .0074        .11          .0038       .06
   23       .0854     .0087        .10          .0045       .05
   24       .1468     .0159        .11          .0083       .06
   25       .1527     .0139        .09          .0073       .05
   26       .1183     .0071        .06          .0037       .03
   27       .1352     .0070        .05          .0036       .03
   28       .1765     .0378        .21          .0197       .11
   29       .1195     .0047        .04          .0024       .02
   30       .1067     .0752        .70          .0392       .36
   31       .1100     .0111        .10          .0058       .05
   32       .0743     .0129        .17          .0067       .09
   33       .1299     .0186        .14          .0097       .07
   34       .0976     .0126        .13          .0066       .07
   35       .0861     .0222        .26          .0116       .13
   36       .1051     .0141        .13          .0074       .07
   37       .1114     .0059        .05          .0031       .03
   38       .0570     .0126        .22          .0066       .12
   39       .1016     .0126        .12          .0066       .06
   40       .0428     .0125        .29          .0065       .15
119
-------
total variance may be appropriate at this point. The ratios
in the column headed fraction, F, are the values
    F = σ_r² / (σ_r² + σ²)

where σ_r² is the residual variance and σ² is the "true"
variance. The ratio of the residual to "true" variance is
given by

    σ_r²/σ² = F/(1-F) .
The ratio of the standard deviation of the residuals to the
"true" standard deviation is the square root of the above.
Some values are tabulated in Table IV-6.
Table IV-6. Ratio of Standard Deviation of Residuals (σ_r)
            to the Standard Deviation of "True" Values
            (σ) as a Function of the Ratio of Residual
            Variance to Total Variance (F)

      F      σ_r/σ          F      σ_r/σ
    0.00     0.00         0.30     0.66
    0.05     0.23         0.40     0.82
    0.10     0.33         0.50     1.00
    0.15     0.42         0.60     1.22
    0.20     0.50         1.00      ∞
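The entries of Table IV-6 follow directly from the square-root relation above; a short check in Python is:

# Sketch: ratio of residual to "true" standard deviation as a function of F.
import numpy as np
F = np.array([0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.50, 0.60])
print(np.round(np.sqrt(F / (1.0 - F)), 2))    # compare with Table IV-6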
It is to be noted that in the case of the four locations
noted above all have a value of F in excess of 0.20 which
means that the standard deviation of the residuals is more
than 50% of the standard deviation of "true" values.
Table IV-5 lists 13 locations with F values of 0.10 or more
for which the standard deviation of residuals is more than
33% of the standard deviation of "true" values. These seem
to be rather large.
120
-------
The significance of the residuals should also be noted
here. In the type of analysis that has been made, the term
residual includes not only the errors of observation, but
also the ability of the station network to resolve small
scale effects. Thus, the size of the residuals is dependent
on the density of the pollutant concentration measurement
points.
It was pointed out earlier that the proper functions
were not comparable with each other for various triangle
subdivisions or quadrature techniques after a certain index
had been reached and that this index seemed to be associated
with the index at which a "knee" appeared in the plot of
log-proper value against index number. The proper values of
the matrix after being modified by multiplying ahead and
behind by the square root of the diagonal of the inverse
matrix, an essential feature of the Jöreskog technique, are
shown in Figure IV-13. The numerical values of the proper
values have been radically changed, but the shape of the
curve of the logarithm of the proper value against its index
remains about the same with the exception that the "knee"
has been removed and is replaced by a gradual change of
slope between index numbers 5 and 10.
In view of the preceding analysis, we would then conclude
that the criterion of Craddock and Flintoff (1970) and
Farmer (1971) is not really a measure of the significant
number of proper values/functions but is a phenomenon associated
with the fact that the residual variances have not been
adequately treated by a principal component analysis. The
Factor Analysis procedures used here give different results
that seem to be self consistent and which in addition provide
a quantitative estimate of the residual variances that were
hitherto ignored.
121
-------
[Figure IV-13 appeared here. Caption: LOGARITHMS OF PROPER VALUE PLOTTED
AGAINST ITS INDEX NUMBER FOR ST. LOUIS SO2
DATA. THESE VALUES APPLY AFTER THE CO-
VARIANCE MATRIX HAS BEEN MULTIPLIED AHEAD
AND BEHIND BY THE SQUARE ROOT OF THE DIA-
GONAL OF THE INVERSE. Graphics not reproduced.]
-------
Note that in Figures IV-10, IV-11 and IV-12 the
correlation of the 40'th proper function for different
quadrature methods (Figure IV-11) and for different quadrature
factors (Figures IV-10 and IV-12) is extremely large and is
an isolated point in Figure IV-12. This is a mathematical
phenomenon and has nothing to do with the fields involved.
As a matter of practical experience, the larger the index,
the more oscillatory the proper functions. This is strictly
true of what has been called an "oscillatory" matrix (Gant-
macher, 1960b). We have not found a proof for any other type
of matrices and one can in fact construct symmetric positive
definite matrices for which the opposite is true. We strongly
suspect that "oscillatory" matrices do not exhaust the class
of matrices with this property, but characterize a particular
class of matrices for which this property could be proved.
With the fact that experience indicates a highly oscillatory
behavior for the last proper function, we also suspect that
there is a tendency for these oscillations to either be in
or out of phase for matrices that are similar to each other,
as is the case for the matrices concerned here. This would
account for the high correlation observed for proper functions
No. 40.
D. Summary and Conclusions
In this chapter, the subject of the analysis of co-
variance matrices has been treated in some detail, since it
is of great importance in the analysis of data fields of
all kinds and in particular to the fields of pollution con-
centration data, especially in connection with the problem
of optimum observation station locations to which it is
applied in the computer program that is discussed in detail
in the final chapter. Section A was devoted to a general
look at the problem, Section B to a detailed discussion
of how to determine the residual variances and the number
123
-------
of significant proper values and proper functions required
to describe the statistical properties of the pollutant con-
centration field using Factor Analysis methods but modified
by the fact that one is dealing with a two dimensional field
and should use a basic integral equation formulation to
account for variable data density. The techniques developed
in Section B were applied to St. Louis SO2 concentration data
in Section C.
As a result of the analysis in Section C, it was found
that the Factor Analysis technique as modified by the integral
equation formulation gave (1) a quantitative evaluation
of the number of significant proper values and proper functions
required to describe the pollution field, (2) a quantitative
evaluation of the residual variances in such detail that
unique values were assigned each observation location
separately (much more than a general estimate), (3) that the
method was consistent in spite of the fact that there are
mathematical ambiguities in the quadrature process (evaluation
of the integrals involved), and (4) that the self-consistency
of the solution from the mathematical point of view cor-
responded very closely to the tests for significance from
the statistical point of view. It was further concluded
that the Factor Analysis approach is a prime requisite for
the success of the analysis since when this approach is
disregarded the tests for mathematical self-consistency
and for statistical significance are not completely compatible
with each other; and it was shown that this results from
ignoring the importance of the residual variances.
124
-------
CHAPTER V
PROGRAM BAST AND ITS SUBROUTINES
The program for an optimal location of pollution observation
points (BAST) and its twelve associated subroutines are described
in some detail in the following sections. The interrelations
between these subroutines and the main program are shown in the
following table. The order in which they are discussed follows
TABLE 1
THE RELATIONS BETWEEN PROGRAM BAST
AND ITS SEVERAL SUBROUTINES
SUBROUTINES CALLED:  CIRCUM (B), FMFP (C), FUNB (D), FUNCT (E),
                     CORFUN (F), INT2D (G), MATINV (H), ADOPT (I),
                     TRIFIX (J), PORDR (K), AR2 (L), TRITST (M)
CALLING ROUTINES:    PROGRAM BAST (A), CIRCUM, FMFP, FUNB, FUNCT,
                     ADOPT, TRIFIX, PORDR
[The check marks of the original table, indicating which calling routine
uses which subroutine, did not survive reproduction.]
that indicated by the letters in parentheses in the subroutines
called column, BAST itself being considered first.
125
-------
As shown in the table, the subroutines are divided into
two categories: those involved in computation of the station
location [(B) through (H)], and those involved with book-
keeping required to preserve, at all times, a satisfactory
subdivision of the area concerned into completely covering
but non-overlapping triangles [(I) through (M)].
A. PROGRAM BAST
This program is set up to accept NSTN coordinates
XS(I), YS(I), I=1, NSTN, of already located observation stations
of which NBDY of these form the convex boundary of the area
covered by these station locations (line 38). The index
numbers of the boundary stations are entered as IBDY(I), I=1,
NBDY, and should be in counterclockwise order around the
boundary but with any initial starting point (line 41). The
boundary of the area covered by the initial stations is a
convex polygon with the boundary stations at the vertices.
The area is convex in the sense that it must not have any
re-entrant corners. Put in a different way, if one traverses
the boundary in the counterclockwise direction, then at each
vertex, the line segment that forms the boundary is rotated
in a counterclockwise direction. (The mathematical statement
is that the area is convex if the line joining any pair of
points in the area lies entirely within the area.)
The area covered by the existing station location is
subdivided into non-overlapping triangles with stations
at the vertices (line 52). The boundary segments are sides
of some of these triangles. There are NTR such triangles.
The index numbers of their vertices are ITR(I,1), ITR(I,2),
ITR(I,3), I=1, NTR, and these must be in counterclockwise
order about each triangle.
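The counterclockwise requirement is easy to verify before the data are submitted. The sketch below (Python; xs, ys and itr are hypothetical stand-ins for the program's XS, YS and ITR inputs) uses the sign of twice the triangle area; the same signed-area test applied to successive boundary vertices checks the counterclockwise traversal of IBDY.

# Sketch: a triangle's vertices are in counterclockwise order when the signed
# (doubled) area is positive.
def signed_area2(x1, y1, x2, y2, x3, y3):
    return (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)

def clockwise_triangles(xs, ys, itr):
    # xs, ys: station coordinates; itr: vertex-index triplets (0-based here).
    return [t for t in itr
            if signed_area2(xs[t[0]], ys[t[0]], xs[t[1]], ys[t[1]],
                            xs[t[2]], ys[t[2]]) <= 0.0]   # empty list means all OK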
126
-------
The entire region is interior to a circle with center
at (XC,YC) and of radius R (line 56). Some of the stations
listed previously may lie on this circle. If this is the
case, their azimuth must be specified. The number of such
is NCIR and the point azimuth AZ(I), I=1, NCIR is measured
counterclockwise (degrees) from East.
Additional information to be specified (line 81) are:
the number of additional stations to be located (NADD), the
percent reduction required between successive maximum errors
of estimate (PMIN), the expected absolute error of the
minimization subroutine (EPS), the maximum number of iterations
allowed for this subroutine (LIMIT), and a control for print-
out of details (IPRNT).
Since the program calls for computation of correlation
coefficients via their proper value/function representation
at a grid of points covering a square 80 km on a side, the
coordinates of the grid points (lines 87,88), the proper
values (line 93), proper functions (line 96), and standard
deviations for weight parameters (line 101) are required
inputs. The program, as it stands, is set up for 15 proper
values/functions on a 9x9 grid with 10 km spacing between
points. The weight parameter (line 101) corresponds to the
real statistical situation where expected standard deviations
of pollutant concentrations are used, but the weight parameter
may be arbitrary. The program assumes that it may be
differentiated using finite (1 km) differences with sufficient
accuracy.
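The finite differencing of the weight parameter can be sketched as follows (Python). The callable w stands for any interpolated weight function and is an assumption of the illustration, not the program's internal routine; central differences with the 1 km step are used here.

# Sketch: central finite differences of the weight parameter with a 1 km step.
def grad_w(w, x, y, h=1.0):
    # w: callable returning the weight at a point (coordinates in km).
    dwdx = (w(x + h, y) - w(x - h, y)) / (2.0 * h)
    dwdy = (w(x, y + h) - w(x, y - h)) / (2.0 * h)
    return dwdx, dwdy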
If less than two existing stations are listed on the
circle that defines the region being considered, then two
such points are located on the circle (lines 114-135).
127
-------
The criterion for station location is to the effect
that a "best" location is at a point where the error of
estimate using a linear least squares regression on sample
values at existing stations would be a maximum. Since such
errors of estimate are bounded between zero at an existing
station and the variance of pollution concentration, it is
reasonable to look for such maxima at locations distant from
existing stations. Consequently, candidate locations are
taken to be (a) at the center of gravity of the station
triangles or (b) midway on the circle between stations
located on the circular boundary (lines 135-173). Of the
errors of estimate found at these locations, the largest
three are selected and the neighborhood of each is searched
to find the value of the local maximum in its neighborhood.
These local maxima are then compared and the new station
location assigned at that point which has the largest local
maximum (lines 174-252). A bit of manipulation is required
(lines 253-275) to insure that new points on the circular
boundary do not wander outside the circle.
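The generation of candidate locations can be sketched as follows (Python/NumPy; the argument names mirror the program's XS, YS, ITR, XC, YC, R and AZ inputs but the sketch is illustrative only, and it assumes at least two points already lie on the circle).

# Sketch: candidate points are (a) the centroids of the covering triangles and
# (b) points midway (in azimuth) between adjacent stations on the circle.
import numpy as np

def candidate_points(xs, ys, itr, xc, yc, r, az):
    cands = [((xs[i] + xs[j] + xs[k]) / 3.0, (ys[i] + ys[j] + ys[k]) / 3.0)
             for i, j, k in itr]                        # triangle centroids
    az = np.sort(np.asarray(az))
    for a0, a1 in zip(az, np.r_[az[1:], az[0] + 2.0 * np.pi]):
        mid = 0.5 * (a0 + a1)                           # midway along the arc
        cands.append((xc + r * np.cos(mid), yc + r * np.sin(mid)))
    return cands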
The total number of stations located (NSTN) is then
checked against the total number required (NSTOP) (line 287).
If this number has not been reached the process is repeated
by returning to line 135. If the required number has been
reached the remainder of the program (lines 290 through 322)
prints out the results and terminates. The final lines 323
through 345 are devoted to various diagnostics.
Note: The subroutine FMFP is a standard subroutine
that locates the minimum of a function of several variables.
The procedure described above involves locating a maximum.
To "trick" FMFP into thinking a maximum is a minimum, the
mean square error of estimate is given (internally) a negative
sign. Whenever the term minimum appears in subsequent des-
criptions it is to be understood in this sense.
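A one-line sketch of the idea (Python, hypothetical names): the quantity handed to the minimizer is the negative of the quantity actually being maximized.

# Sketch: maximizing the mean square error of estimate with a minimizer.
def objective(x, err_var):
    return -err_var(x)      # FMFP minimizes this, i.e. maximizes err_var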
128
-------
[Flow chart boxes, in order:
READ INPUTS
ADD BOUNDARY POINTS VIA CIRCUM
LOCATE C.G.'S OF TRIANGLES
GET FUNCTION VALUES VIA FUNCT
SELECT BOUNDARY POINTS
GET FUNCTION VALUES VIA FUNB
FIND THREE SMALLEST FUNCTION VALUES
GET LOCAL MIN. FOR EACH OF THESE VIA FMFP
PICK SMALLEST AS THE GLOBAL MIN.
ADD THIS TO STATION LIST AS A BEST LOCATION VIA ADOPT
followed by a YES/NO test (not legible in this copy) that either loops back or terminates.]
FIGURE V-1
ABBREVIATED FLOW CHART OF MAIN PROGRAM BAST
129
-------
'DECK
20
35
PROGRAM BAST( INPUT, OUTPUT )
C     DETERMINES LOCATION OF POINTS OF LARGEST ERROR OF ESTIMATE FROM
C     EXISTING NET AND LOCATES NEW STATION THERE. PROCEDURE ITERATES
C     UNTIL EITHER TOTAL NO. OF STATIONS REACHES PRESCRIBED SIZE OR
C     TILL THE PERCENT REDUCTION OF THE MAX EOFE IS BELOW PRESCRIBED
C     AMOUNT
CGMiIGN / 3LK1 / NSTN, N6JY, NTR, XS(^0», YS(30), ITR(pO,3)
, , XC, YC, R, I3DY(20)
GC.IMON / BLK? / XD(9), YD(9), PH(15,9,9), ALAM(15), WO,9)
CCMMON / BLK3 / NCIRt AZ(20)
CLMrtON / PLK
-------
3AST
tO
65
70
75
85
90
95
100
105
110
C NCI* = NUM3EF OF POINTS ON THE CIRCLE
READ 1058, XC, YC, R, NCIR
1058 FORMAT <3F10.0,I5)
C AZIMUTH OF POINTS ON THE CIRCLE
C AZ(I», 1*1, NCIR, AZIMUTH IN DEGREES CC FROM EAST
C THESE MUST BE BOUNDARY POINTS APPEARING IN THE LISTS XS(II, YSU).
C NONE OF THE BOUNDARY POINTS MAY LIE OUTSIOE OF THE CIRCLE.
C SOME OR ALL MAY LIE INSUE. IF ALL LIE INSIDE THEN NCIR = 0.
C THEY MUST APPEAR IN ORDER OF INCREASING AZIMUTH, 0 TO 360 uEGREtS.
IF ( NCIR .LE. 0 ) GO TO 15
READ 1060, ( AZ(I>. 1=1, NCIR )
1060 FORMAT ( 9F8.0)
DC 1 1=1, NCIR
1 AZ(I) = AZU) * .017<»53292
C MISCELLANEOUS OTHER INPUT PARAMETERS
C NAOO « NUMBER OF POINTS TO 3t AJOEO.
C IF NCIR = 0, 1, THEN 2 OR 1 POINTS HILL BE ADOEO ON THE CIRCLE
C REGARDLESS OF NAOO. IF THIS IS NOT THE STOPPING CRITERION, NAUO=0
C PMlH * PERCENT REDUCTION 3ETHEEN SUCCESSIVE MAXIMUM EPPORS OF
C ESTIMATE. NO POINTS HILL BE ADDED AFTER PERCENT REDUCTION GOES
C BELOH THIS VALUE.
C EPS = EXPECTED ABSOLUTE ERROR FOR SU8RPOUTINE FMFP
C LIMIT = MAX. NO. OF ITERATIONS FOR SUBROUTINE FMFP
C SUGGESTED i/ALUES FOR EPS = .01, AND LIMIT = 5
C IPRNT = REQUEST FOR PRINTOUT OF DETAILS OF THt INPUT OTHER
C THAN COORDINATES OF INITIAL STATIONS
15 REAO 1062, NAOO, PMlN, EPS, LIMIT, IPRUT
1062 FORMAT (15 ,F10 . 0,£10.1,215)
TPI = 6.2831853
NSTOP = NSTN * NAOO
C COORDINATES FOR THE CORRELATION COEFFICIENT
C PARAMETERS XOU), YD1I>, I = 1, 9
REAJ 1056, ( XO(I), I = 1, 9 )
REAJ 1056, ( YOU) , I = 1, 9 )
C
C CORRELATION COEFFICIENT DATA INPUT IN TERMS OF
C PROPER VALUES AND PROPER FUNCTIONS
C
REAO 1055, ( ALAM(I), I = 1, 15 I
1055 FORMAT (8F10.0)
00 2 I * 1, 15
2 READ 1057, ( ( PH(I,J,K>, K = 1, 9), J = 1, 9 )
1057 FORMAT ( 16X, *»£16. 5/5E16. 8/ <<»E16. 8/5E16. 8 11
1056 FORMAT (9F8.0)
C HEIGHT FUNCTION TABLE FOR VALUES AT POINTS XO(I), YD(Jl
00 3 J = 1, 9
REAO 1056, ( H(I,J), I = 1, 9 )
3 CONTINUE
C OUTPUT OF THE INPUT STATIONS
2000 FORMAT (1H1)
2002 FORMAT
PRINT 2000
PRINT 2010
2010 FORMAT (5X,»A. COORDINATES OF INITIAL STATIONS*/5X,6HSERIAL,2X,
lllHCOOROINATES/5Xt6HNUM8£R,5X,lHX,5X,1HYJ
131
-------
3AST
i?o
12?
130
13
1<*0
150
155
160
165
13
PRINT 2012, < I, XS(I), YSd), 1 = 1, NSTN )
2312 FORMAT (I10,F7.1,F6.1)
C Me. A3 ING FOR RESULTS
Fn.EF = 1.
IF ( NCIR .GT. 1 ) GO TO 3
C ACJS ONE CR TWO POINTS ON TH£ CIRCLE
10 = 1 ? 1C = 2
CALL CIRCUMJ XC, YC, R, NCIR 1
PRINT 2018
2018 FORMAT (5x,*e. COORDINATES OF CIRCLE FCINTS AJCEO*)
C FRISTS COORDINATES ANb SERIAL NUMBERS OF POINTS ADDED ON CIRCLE
OC 6 I = 1, NCIR
II = NSTN - NCIR * I
PRINT 2012, lit XS(I1), YSdl)
6 CONTINUE
PRINT 201 » AZdl) ) /
2
LT. AZ(I) > ANG
IF ( AZ(I1)
CALL FUN6( 1, ANG, F, G >
II = NTR *• I
XT (ID = ANG
SAMP(I1> = F
NTOT = NTR * NCIR
PRINT 13, ( I, ( ITRd.JI,
,1=1, NTR )
FORMAT UI5t2E15.6tl5X,£15
NTR1 = NTR * 1
DO 304 I = NTR1, NTOT
X(l) = XC » R * COS( XT(II
ANG
3. li»15926E<»
J = 1, 3J, XT(I), YT(I), SAMP(I)
6l
132
-------
3AST
170
175
180
185
190
195
200
205
210
215
220
X(2> = XC * R » SIN( XT(I) )
304 PRINT 11, I, X(l», X(2), XTU), SAMP(I)
11 FORMAT (I5il5X,4£15.6l
c PIC
-------
2?5
230
235
2 *.0
503
£02
50*
C
C
26
28
30
32
3k
3b
38
C
C
kO
IF 1 YT1PU) .EQ. 99.9
YTMP(J) = YC +• * » SIN
XTMD = XC * R * COS
CONTINUE
PRINT 50<*, ( XTMF(J) ,
PRINT bOi», ( XMAX(J>,
) 503, 502
( XTMP(J) 1
( XTMPCJ1 )
YT*P, STAMP(J), J = 1, 3 )
YMAX ( J), FMAV( J) , J = 1, 3 )
FORMAT (3(2F10.5, 2X,E12. 5) )
END TcMP CARDS
NOW ORDER THESE
DC 30 K = 1, 3
SML = 1.
00 28 J = 1, 3
IF f FMAX(J» .LT, SML
SML = FMAXtJ)
Jl = J
CONTINUE
JTRY (M = Jl
FM2(K» = SML
FMAX(Jl) = 1.
CONTINUE
IF ( F-3EF .GT. 0. ) 32
PER = 999.9
GO TO 36
PER = ( FBEF - FM2UJ
DO 38 J = 1, 3
XAU) = XMAXf JTRV(J»
YA(J) = YMAXC JTRYUI
FM2(J» = -FM2(J)
FBcF = -FM2<1)
ADDS NEW POINT TO LIST
XO = XA(1)
YO = YA(1)
CALL AOOPT( XO, YO )
ADDS TO CIRCLE LIST IF
oisr = SORT ( ( xo - xc
IF { OIST .Gt. ( R - 0
ANG = ATAN2( YO - YC,
IF ( ANG .LT. 0. ) ANG
DO kk I = 1, NCIR
) 26, 28
, 3 <»
) / FBEF
>
»
NECESSARY
)**2 » < YO - YC )**2 )
.1 ) ) <+0, 50
XO - XC )
= TPI <• ANG
11 = I *• 1 $ IF ( I ,£Q. NCIR > 11 = 1
IF ( AZ(I1) .LT. AZdl
IF 1 ANG .LT. AZ(I1) .
) GO TO <»2
ANO. ANG .GT. AZ(I» ) <<6, kk
k2 IF ( ANG .GT. Azdi .AND. ANG .LE. TPI » GO TO <*&
kk
2017
<+6
IF ( ANG .GE. 0. .AND.
CONTINUE
PRINT 2017, ANG
ANG .LT. AZ(Il) ) GO TO U6
FORMAT (* FAILED TO FINO NEW CIRCLE POINT FOR ANG =
CALL EXIT
IA = 11
12 = NCIR - 11 * 1
NCI* = NCIR + 1
00 *fl I = 1, 12
11 = NCIR t 1 - I
IM = 11 - 1
250
255
260
265
PRINT 2017, ANG
F10.5I
CALL EXIT
V6 IA = II
270
275 k& AZ (ID = AZ(IM)
134
-------
PROGRAM 3AST
AZfIA) = ANG
50 CONTINUE
C PRINTS OUT THE RESULTS OF THIS PASS
PRINT 2020, NSTN, XA(1), YA(1), FM2C1), PER, ( XA(I), YA(I),
2£0 , FH2(I) , I = 2, 3 )
2020 FORMAT (I10,Fa.l,F6.1,tll.3,F8.2,F8.1,F6.1,E11.3,Fr.l,F6.1,£11.3)
ISPACE = ISPACE * 1
IF ( ISPACE .EQ. 5 ) 52, 5*
52 ISPACE = 0
265 PRINT 2002
5* CONTINUE
C HAS THE NUMBER ASKED FOR BEEN REACHED
IF ( NSTN .GE. NSTOP ) GO TO 900
C TRY AGAIN
290 GO TO 8
C PRINTS REST OF INPUTS
900 IF ( IPRNT .NE. 1 ) GO TO 999
PRINT 2000
PRINT 202*
295 PRINT 2026, ( J, ITR(J,1), ITRU.2), ITR(J,3), J = 1, NTR )
202* FORtlAT (8X,*lNOEX NUMBERS OF TRIANGLE VERTICES')
2026 FORHAT <8(IX,12,1HI,13,1H,,I 3,1H,,13,1H/))
PRINT 2000
PRINT 2028
300 2028 FORMAT (5X,'PARAMETERS OF THE EMPIRICAL CORRELATION COEFFICENTSV
/dX,*COOROINATES OF THE DATA GIRO'»
PRINT 2030, ( XD(I), I = 1,9 )
2030 FORMAT (* X = *,9{F8.2,1H,M
PRINT 203?, ( YO(I), I = 1, 9 )
305 2032 FORMAT (* X = *,9(F8.2,1H,I)
2038 FORMAT (1P9E12.*)
PRINT 20*0
20*0 FORMAT ('WEIGHT FUNCTION ARRAY IN FORM OF COORDINATE GRID*)
00 91* J = 1, 9
310 Jl = 10 - J
PRINT 2038, ( W(I,J1), I = 1, 9 )
91* CONTINUE
PRINT 2000
PRINT 20*2, XC, YC, R
315 20*2 FORHATC* CUTER CIRCLE HAS CENTER AT X = *,F7.3,2Xf» Y = *,F7.3,2X,
,' AND RADIUS = *,F«.3)
PRINT 2002
PRINT 20**, NADO, PMIN, EPS, LIMIT
20** FORMAT (* NUMBER OF POINTS TO BE ADOtD = *,I5/» PERCENT CHANGE OF
320 'MAXIMUM ERROR OF ESTIMATE BETWEEN ITERATIONS = »,F7.3/
»* EXPECTED ABSOLUTE ERROR FOR FMFP = ',El0.3/» MAXIMUM NUM3ER OF I
'TERATIONS OF FMFP = *,I5)
GO TO 999
C DIAGNOSTICS FROM FMFP
325 950 Nl = NSTN * 1
IF ( IER ,£Q. 1 ) 952, 95*
952 PRINT 2050, Nl, LIMIT
2050 FCRMATC NO CONVERGENCE FOR POINT NO. » ,I3,2X ,*IN» , 13, 2X,
"ITERATIONS')
330 GO TO 960
135
-------
33?
3<+0
3AST
95-* IF ( IER .£0. 2 ) 956, 959
95o PRINT 2052, Nl, LIMIT
20^2 FORMAT (* NO MAXIMUM, POINT NO. = *,i3,2x,» ITERATIONS = *,I3»
GO TO 960
958 PRINT 205<», Nl, LIMIT
205<+ FORMAT(* ERRORS IN GRADIENT. POINT NC. = *,I3,2X»* ITERATIONS = *
,,I3»
960 IF ( II ,GT. NTR ) GO TO 9b2
PRINT 2056, XT(T1), YTdl), STAMP(J), X(l», X(2), F, G<1), G(2)
2Q56 FORMAT (» INITIAL VALUES*/2F10.5,E15.5/» FINAL VALUES ANC GRADIENT*
**/2F10.5,3E15.5l
Gu TO 23
962 PRINT 2058, XTflll, STAMP(J), X(l), F, G(l)
2058 FGRMAT(» INITIAL VALUES*/F10.5,E15.5/» FINAL VALUES AND GRADIENT*/
/F10.5,?E15.5)
Go TO 25
999 STOP
ENO
136
-------
BAST Code Sample Problem
The input required for the BAST code is defined on the
following page. Reference to the main subprogram and to
the text is helpful for obtaining further detail. A sample
input case follows. The case consists of eight fixed stations,
the last four of which are boundary points, i.e., their
coordinates define the convex region containing the eight
stations (data card groups 1-3). A total of 10 triangles
cover the region defined by the eight stations (card
groups 4 and 5).
The region for which correlation coefficient data is
input, and to which additional stations are to be optimally
added, is defined by a circle centered at (0,0), with a
radius of 30 km (card 6). No input points on the circle
are indicated by card 6, so card 7 is not required. In
this event, the code will automatically define two points
on the perimeter of the circle.
Input card 8 specifies the number of stations to be
added, 17, plus program controls for the minimization routine
(FMFP) and printing option. Cards 9 and 10 contain the
standard set of coordinates for this problem, a 9 X 9 grid
with 10 km spacing in both the X and Y directions. Card
group 11 lists the proper values of the correlation coefficient
matrix, in monotonic decreasing order. The desired signif-
icance is obtained with the first 15 values. Card group 12
contains the proper function coefficients, at each point in
the 9X9 grid, corresponding to the 15 proper values.
Last, card group 13 is a 9 X 9 array of weight parameters
for the correlation coefficients. The normal situation of
equal weighting, all weights being 1.0, is input in this
case.
137
-------
A sample output follows the input. The fixed stations,
the stations added on the circle, and then the remaining
stations to be added are listed. For each added station
the three prospective coordinate pairs of maximum estimate
of error (E of E) are listed. The optimization procedure,
if successful, varies these coordinates in obtaining the
location for which the greatest reduction in estimate of
error can be achieved. Optionally, extensive mathematical
detail can also be output.
133
-------
CA3D GROUP
NUMBER
FORMAT
OF
MAXIMA OF 3o
9
10
XS, YS
N8DY
I°DY
NTR
ITR CCW FKOM EAST* OF POINTS LYING ON 9E8.0
CIRCLE, TOTAL Qf NClR POINTS, SKIP T^IS CARD IF
= 0.
NUMBE» OF STATION^ TO HE fiDOED IS,FlO . 0,E10. 1
PF«CEMT PEOUCTIOM BtTwEEM MAXIMUM E-RRORS OF
ESTIMATE, uo POINTS ADOEI AFTER ^RECENT -?EDUCTIO\J
GOES ?ELOd HMIN
EXPECTED ABSOLUTE E^^oRt SU^ROUTINJP rMF=
M/sXIMJM NJMaER Or ITERATIONS IN SUBROUTINE FMFP
PPINT OPTION FLAG. o» OUTPUT STATION COORDINATES
ONLY, i, DLTAILED JIAGMOSTICS PRINTFO.
FOR CORRELATION C°£FFICIEVT INPUT 9F8.0
Y_CORPlDNATtS FOR CORRELATION COEFFICIENT INPUT 9F8.0
DATA. COOE ASSUMES 9 VALJES 0^" XO AND YD, FQR A
9X9 GRID rflTH lo KM SoACI\|fi BETWEEN
II
12
13
8F10.0
ALAM PROPER VALUtS FOR CORRELATION COEFFICIENT DATA,
1* VALUES.
PH(I,J,K) COEFFICIENTS OF 1= =>R03E3 FUNCTIONS, I = 1, 15,
AT GRID POINTS SPEC^IED ^Y (XD»YD), ORDERED FOR SElb.9/
XD VALUES. KOR EACH YD VALUE < I.E.. UEI^.B,
(( PH(J,J,K), K = i,9 ), J = 1,9 ) >. 5E16.S)
FUNCTION TaPLE FOR POINTS (XO,YD),
9F8.0
139
-------
t 1NJDJT
C A '7 n =5 1. T J «
o. :..
-.5 u.b
-4.75 -2.5
-.5 -<+.
t.5 -8.
4.5 -2.5
3.5 9.
-22. 7.5
tt » # »
»o
p
5
I
3
1
2
1
2
2
•»»««
0.
U4HH1
17
•!>»«•«•
-40.
-40.
»»•»«•
CAPO SET
5
4
6
4
6
6
2
7
8
CA^D 6
0.
CARD H
.01
-3C.
-30.
CA3D cFT
3
3
1
i
1
2
7
3
8
3
»»4«tf»»«»»«»»»«»«»»««ftft»««ft««»
30. 0
«««*»»»»«*«»*»«»»»»»»»»*»*»««»
i ^ •
• * !
-20. -10. 0. 10. 20. 30. 40.
-20. -10. 0. 10. 20. 30. 4o.
*i»«*»»»*o***»«»»»»<»»«*»#***»»**
28.909829930.4315128
.73^2799 .5133393
.3574174
2.6616002 2.196S837 1.220i544 1.1634856
331^036 .2512566 .2123830 .1081351
PROPER FJNC. 1
7.8273254PE-02
4.91204939E-03
9.71762U28E-02
-1.13289246E-02
9.54481463E-02
9. 72^90823^-^2
1 . H649P 5^4r_o2
1 . ! lSOi4^o^-'J 1
1 . 3oB771 76---03
1 . 31 071 3
2.°1
1 . 0*
3
1
2
t -02
72--0 1
- -0 1
-5.52873886E-02 -5.2485044l--.Gc; -
4.523>^0567E-02
1 . 0^6lH454p-0 1
b « 792^98 53r-02
1 . 152C>31 37r-01
6. 20^60851 c--02
1 . 41
2.
&.57938454r-0
-------
2.3Q534203t-02 1.
-1.12859460E-01 -1.
-1.53931315E-01
-1.19492211E-01
-1.63982476E-01
-1.11995777E-01
-1.66174576E-01
-9.32753081E-02
-1.57249352E-01
-B.98516291E-02
PROPER FUNC. 2
1.&8599575F-01
2.58123666=:-|J3
i.o937«»<»l7t-01
P.P7039Q67---02
01 1.58256^97^-01
02
1.57131973E-01 1.595157l7r-0l
01
01 i.42279!05r-01
01
01 1.2225b099;.--0l
-3.«»5392130P-02
3.51570809£-C
1.13955398F-U1
1.51106930E-01
1.30655144E-01
1.47567607E-01
1.45989865E-01
1.51173858E-01
1.17485730E-01
1.25469179E-01
2.26306189E-02
9.34450852E-0
-------
9. 59*) -^450^-02
9.63918369E-OJ 1.: 91 7*690 ~—jtL -5.1
D.4336^953t-02
-1.09d276b7E-o:
-2.54435856E-03
-2.31335639E-Q1
PROPER FJNC. 5
2.9470556GE-02
-4.96574543E-02
5.71023314E-02
5.2938?061E-02
2.462669Q9E-02
1.04562477E-01 9
-2.09007517E-01 5
6.88108760E-02 3
-9.8l58579o£-0^ 2
-3.79426S54E-02 -4
2.53804551E-01 3
-5.16735579E-02 -4
1.07424413E-OI n
-6.49296552E-02 -6
-5.39149076E-03 -1
-1.24685118E-01 -i
2.25283490E-02
PROPER FUNC. 6
2.32072364t-01
-1.41854046E-OI
1.34712190E-01
-1.99453596E-01
-- '1
6.7179?61«£-03 -»
--01
1.0461S8&4F-01
8.67563695E-Q2 >.
-3.5569862CE-02
PROPER FUNC. 7
1.02962321E-01
-1.15002120E-01
f.55958905E-02
7.28336007"-U«!
1.74^91039
«.H3
2.00
01
02
:lo--n -;
- -01
-9,99^89954--(
l.!6°2o330-'-01
3.4iS03l7?5--02
4.79'7^3fl8r-02
5.7B0998Q9E-02
1.11^74066^-02
«. 0735^-707^-02
1 .b200°794£-01
1.31951755^-01
1.36663303^-01
A.26642007--01
1.72210495^-01
-i.09465773^-01
-1.3618B963--01
-d.B3940150---02
-J.172934Q5--02
3.42138412---J3
J.07191213--02
1.27163154^-01
2.30392728r-ol
-4.93870213=--02
1130F-02 -t
-1.03363937F-01
-7.«?994430E.-02 -1.01727667--01
3.324H062E-02
7.23906777E-02 -4
1.05074735E-01 1.53376502--')2
02 -5.1507"4955-02
02 2.06959993--02
1.33120038^-01
9.03230076F-02
2.Q5791661F-02
-b.39lPR916E-02
. 08357975-1-01
.49J54654- -02
6.2S410204F.-02
-2.61305459F-02
-1.4704Q130F-02
1.39613657E-03
1.21864921^-01
-7.98054349K-02
3.05759671F-02
.9«o76201r-02 -6.67310735F-02
-1.2n5?6417F-01
4.00501930F-0?
2.24656696F-01
2.74285861E-02
3
(».543b4bl8r_o<::
4.84856476F-02
t>.7544S»bo7--02
».04872955r-02
0.85158455^-02
1.46079962 — 01
-B. 81331683^-02
-4.95471852^-02
t. 58478857^-02
142
-------
•2.82799542E-OJ l.u
•3.73925194E-0> -2.1
3.12^83205E-01 -5.<*
-1 .641'-* i74B r-01 ~J
1.29770559E-01 -<
1.97011792E-0; 5
1.08734877E-01 -I
-1.91314695E-02 5
1.8339613EE-02
PROPER FUNC. 11
1.33987406E-oi
1.26697376E-01
2.37635991E-01
1.4857974RE-01
5.084Q9717E-02
3.18742356E-02
2.47013442E-02 -5
1.4265127GE-01
-4.72893111E-02
2.27701383E-01
-1.28835119E-01
2.12842445E-Q1
2.47074943E-02
-2.28151<*84E-02
3.45244999E-02
3.7066Q780E-02
PROPER FUNC. It
3.24804198E-02
5.1061Q381E-02
1.2425Q744E-02
.10554399E-OI
.03066976E-Q1
16872594E-02
.53692248E-02
S.1672Q501E-02
2.20869430E-01
8.45785730E-03
2.35151781E-01
-1.0277P290E-03
-2.25835953E-01
-7.42316933E-02
4.23024553E-02
5.39579118E-02
2.58440917E-02
PROPEP FUNC. 13
-7.75100209E-02
-1
1
3
8
-1.81153422E-01
-7.33847355E-02
-6.6077182<*E-02
3.
2.n997537lt-0l
1.13905516-1-01 -]
-1 ,58924964F-i>3 -'
1.58939523F-ol
2.15871R39--01
3.uQ329735f.-Jc:
2. ?
1.02347831-:-ul
2.37b396l2E-Oc!
1. 11951373?. «jl
l.R0641«44E:-01
i.l705hfe49r:-01
3.76b84478£-C2
-01 l.36424767--0ci
-1.33664454F-01
-1.6435^495^-01 -3
4.(t7-»72112"-02 -8.761«9945F-02 i
-1.35o39o8«E-Ol -!•4^6^442^^-01 -i
-1.33352487E-02
-»» i 94E-02 --
3.cH
-------
-2.33113562E-01 7.4'
3.63910562E-02 -z.t' ... . - .-._.._.
-2.63571796E-CI 2.'+579222c; -- • 1 « .61 775769-- 02 -"S
-1.09626299t-0i -2 . ^79^1 83b- -(, 2 S . 4Q 1 ^1 0 1 1-'-02 -<•
-1.88591191E-11 -3.55l3Q74fU--jl -2 .<-4dl 33B2~-03 3 . 7*7qa7snr-«Q 37F-02 i . 950 1 70 73- -OtJ
-2.44341837E-02 >.oQ334029-:-v2 !. 1 4*29427^-0 1 1.4421 GO 41r-01
3.4l58«728E-02 -!.'>7dt>«063F-ol 1 .1 6CJG5036'-0 1 1. 5H,f T0824E-01
1.6l45o89oE-02 5.15112172F-02 1 . o^**76fep^---0 1 b . 686^*)839F-02
-1.4440R245E-01 -2.15220237F-U1 <*.24oSl544r-02 1.n^6^«fl43E-01
PpOPEp FjMC. ^ -2.'+103i:>7:>6E-01 -2.15 J39b3«r-01 -7.5A27/+555F-02 1.50342204^-01
-1.01529704E-0* -:.'
7.63922850E-02 5.610E041 o-T-03 -3.7=)t»781 46--02 -b.8^5^3554^-02 -7. 92641030'-02
-1.76022413E-01 -1. O^OOl '*58F-v I -7.41219842F-02 -7. 00 7?0379F-02 -i
3.7339Q343E-02 3. C305b805^- Jti -2 .0^«935^9;-02 -1. lb9S?558r-0 1
-2.6547755«E-oi -2.707l = o3=---0 1 2. ?<57l I 95«£-02 1.1 ^244430E-01 7.5b327438r-n2
-6.57879989E-02 -3.05731488E-02 -1.P49Q9231F-0 1 -1.15494860^-01
1.86053399E-Q1 -1. 10037346 —oc -1 .oo^74fca7---0 1 a.3^50^599^-02 '<
-2.92134157E-02 5.4503596Q--C2 -7. 1 ?«:9QiJ7h--02 -9.8714^237^-03
5.30&31914E-02 -1 .900382B6--0 1 -2.4 1 aSl 05^^-02 -•* . ^7904733.F-02 -:
3.32083674E-Q2 5.0225]035F-02 -4.41b2o220--02 -1.079^^474^-01
-1.81341067E-01 -2.T0244037E--a -1 .3«*19223?~-0l -8.5Q9ST1 71 E-02 -:
1.09953670E-01 9.3237044?=-'^ -3.31'*67204F-02 -1 . 570 n20«--01
-4.30207760E-02 -1 . l4897422'-'> 1 .32^153«S--01 -1. 3^1 C543F-02 -:
3.40049886E-D2 1.49254773--u2 -fr .21o1S472E-02 -1.1?17n645^-01
-1.3670528lE-Oi 7.2660894^--02 2.C136797«--oI 3.Q37S4746E-02 -1*.35972384r-o2
PROPER FuNC. 9 2.12001231F-J1 1.83d7970c"r-01 ^.&20^C0?9F-02 -i.69783053--01
-2.07739310^-01 -1. T2879796--01 3.01 «*59o57--o2 1.l?394bl1£-01 1.16637655~-01
3.26869989E-QJ 9.o5219960E-0£i 6 .Q*> 1 *\ 729F-02 -b. 895a?951 F.-02
-2.61299804E-02 5.c>0876378E-uc' 9.8«3327l 1 "-02 9.52506936F-02 7.95o45234r_03
-1.14254261E-01 -1.28714919^-,1 -1 . 74b733«5---0 1 -b. 7.3760548E-02
1.60890183E-01 2.77820621 c- J1* -1 .2?8374b9--ol -1 . Pfc 1
1.36130580E-01 -9. q3326779E-C2 -^. = 92e:5470--02 -1.03637936F-01 -1.29696080---01
1.20347300E-OI 1.52708981F.-01 9.4042738H--02 -8.QO l=062bc-03
-1.53732209E-01 4. 86043964F-02 1 .89^17113E-01 8. U9Q7791 E-02 -'....,
PROPER FjNC. 10 5.19245337E-J2 5.25<»45873E-02 -».5334«695E-03 -b.10131986--02
-7.86148557E-02 -5.685l678oE-02 ]
02 2.ol7970l4t-02 -£
-6.06995400^-03 -7.7o83£»937E-'>3 -5.7643Q355E-02 -8
02 7.35752000F-OC -2.02°l4430~-03 -3.7543Q098F-02 -'
02 7.72921194F-02 1,0?'*77750E-01 4.6796141 Or-02
144
-------
( further columns of proper-function values, through PROPER FUNC. 15; the
printed figures are not legible in this copy )
-------
( a printed field of values, essentially all equal to 1.0; the headings on
this page are not legible in this copy )
146
-------
( sample program output: INITIAL STATIONS and INITIAL COORDINATES for
stations 1 - 25, COORDINATES OF CIRCLE POINTS, and columns of X, Y and
E OF E before and after each selection with PERCENT REDUCT. and
ALTERNATIVES REJECTED; the numerical entries are not legible in this copy )
147
-------
B. Subroutine CIRCUM
Subroutine CIRCUM is used to locate points on the
circular boundary if not more than one is already located
there. If one point is already on the boundary the sub-
routine locates the second point on the boundary (line 35)
at the point of maximum residual error. The search is
started at a point opposite the point already located on the
boundary. If no point has been located on the boundary, the
"center of gravity" of the already located interior stations
is found. The first point is then located tentatively at
the point in which the line through the "center of gravity"
of already located points and the center of the circle meets
the circumference of the circular boundary. A minimum of
the residual variances is then sought from that starting
point.
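The starting rule of the preceding paragraph can be put in a few FORTRAN
statements.  The sketch below is illustrative only; the name START does not
occur in the program, and the one-unit test on the distance of the center of
gravity from the circle center follows the listing.

C     ILLUSTRATIVE SKETCH OF THE CIRCUM STARTING RULE.  THE FIRST
C     TRIAL BOUNDARY POINT ( XB, YB ) IS PLACED ON THE CIRCLE
C     DIAMETRICALLY OPPOSITE THE CENTER OF GRAVITY OF THE NSTN
C     STATIONS ALREADY LOCATED.
      SUBROUTINE START( XS, YS, NSTN, XC, YC, R, XB, YB )
      DIMENSION XS(NSTN), YS(NSTN)
      SUM1 = 0.
      SUM2 = 0.
      DO 10 I = 1, NSTN
      SUM1 = SUM1 + XS(I)
   10 SUM2 = SUM2 + YS(I)
      XCG = SUM1 / FLOAT( NSTN )
      YCG = SUM2 / FLOAT( NSTN )
C     IF THE C.G. IS ESSENTIALLY AT THE CENTER, START DUE WEST
      DIST = SQRT( ( XCG - XC )**2 + ( YCG - YC )**2 )
      IF ( DIST .GT. 1. ) GO TO 20
      XB = XC - R
      YB = YC
      RETURN
C     OTHERWISE PROJECT THE OPPOSITE DIRECTION ONTO THE CIRCLE
   20 XB = XC - R * ( XCG - XC ) / DIST
      YB = YC - R * ( YCG - YC ) / DIST
      RETURN
      END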
148
-------
LOCATES C.G. OF ALL POINTS AVAILABLE
I
PICKS POINT ON BOUNDARY OPPOSITE THE C.G.
LOCATES LOCAL MINIMUM ON THE
BOUNDARY VIA FMFP ( FUNB )
ADDS TO LIST VIA ADOPT
LOCATES POINT ON BOUNDARY
OPPOSITE FIRST MINIMUM
LOCATES LOCAL MINIMUM
ON THE BOUNDARY VIA FMFP (FUNB)
| ADDS TO LIST VIA ADOPT |
RETURN
FIGURE V-2
ABBREVIATED FLOW CHART FOR SUBROUTINE CIRCUM.
149
-------
10
15
20
25
30
35
c
20
•DECK CIRCUM
SU3*JUTIN£ CIRCUMC XC, YC, 3, I )
C LOCATES POINTS ON THE CIRCUMFERENCE IF ^OT MORE THAN ONE 15
C ALREADY THERE. ( XC, YC ) IS THE CIRCLE CENTER, < = RA
COMMON / BLKi / iJSTN, NdJY , NTC, XS(30>, YS(30>, 113(50,"?)
CoMMON / 8LK3 / NCIR, AZ(20)
CCMMGN / 8LK<* / EST, EPS, LIMIT. ItR, F
CGMHuN / LINKS / ExRl, XO(fcO», OIFF(60)
EXTERNAL FUNS
JATA PI, T»I / 1.1-^15927, 6.231155"5 /
IF ( I ,tO. 1 ) GO TO 20
C ^IN£ CG OF POINTS AVAILABLE
ANJ = NSTN
LU^l = SUM2 r 0.
JO 10 I = 1, NSTN
5U11 = SUM1 * XS(I»
10 SUM2 = SUM2 * YS(D
XCG = SUMl / ANO
tCG = SUM2 / ANO
JIST = SQPT ( < XCG - XC )**2 * ( YCG - YC )*»2 )
C SELECTS PCINT OPPOSITE CG OR WEST
IF ( OIST ,LE. 1. ) 12, 1*
12 X = XC - R
Y = YC
Gu TO 16
14 X = XC - ( XCG - XC ) * R / OIST
Y = YC - ( YCG - YC ) * R / OIST
16 ANG = ATAN2( Y-YC, X-XC )
18 CALL FMFP( FUNS, 1, ANG, F, G, EST, EPS, LIMIT, IER, M )
XO = XC * R » COS( ANG »
YO = YC «• R * SIN( ANG »
CALL AOOPK XO, YO )
IF ( ANG ,LT. 0. ) ANG = ANG * TPI
NCI*? = 1
AZ(1) = ANG
PUTS ANOTHER POINT
ANG = AZC1I * PI
CALL FMFP( FUNB, i,
IF ( ANG .GT. TPI >
AZ(2) = ANG
NCIR = 2
XO = XC «• R • COS<
YO = YC » R » SIN(
CALL ADOPT < XO, YO
RETURN
tNO
ON CIRCLE
ANG, F, G, EST,
ANG = ANG - TFI
EPS, LIMIT, IER, H >
ANG
ANG
)
150
-------
C. Subroutine FMFP
Subroutine FMFP is a "standard" method of finding the
minimum of a function of several independent variables and
was extracted from the IBM System/360 Scientific Subroutine
Package (360-CM-03X) Version III, Programmer's Manual. The
details of the minimization procedure are described in the
above and also in R. Fletcher and M.J.D. Powell, "A Rapidly
Convergent Descent Method of Minimization", Computer Journal,
Vol. 6, No. 2, 1963, pp. 163-168. The subroutine as
listed in the first reference above has been modified at
lines 195-200. In our application the standard FMFP
iterative procedure did not meet the convergence criterion.
A loop was added to check the magnitude of t·h_i/x_i, and if
it was found to be less than ε for all i, then a branch out
of the iteration was made by transferring to statement 36.
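The added test can be pictured as in the sketch below, which is illustrative
only and not the modification as it appears in the listing; the name SMALL
is invented.

C     ILLUSTRATIVE FORM OF THE ADDED RELATIVE-STEP TEST.  T IS THE
C     STEP LENGTH, H THE DIRECTION VECTOR, X THE CURRENT ARGUMENT,
C     EPS THE CONVERGENCE TOLERANCE.
      LOGICAL FUNCTION SMALL( T, H, X, N, EPS )
      DIMENSION H(N), X(N)
      SMALL = .TRUE.
      DO 10 I = 1, N
      IF ( X(I) .EQ. 0. ) GO TO 10
      IF ( ABS( T * H(I) / X(I) ) .GE. EPS ) SMALL = .FALSE.
   10 CONTINUE
      RETURN
      END
C     WITHIN FMFP THE TEST WOULD BE USED AS
C         IF ( SMALL( T, H, X, N, EPS ) ) GO TO 36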
151
-------
B ROUTINE FMFH(Fut,CT,N,X,F,G,EST»t.PS,LIMIT,
10
30
35
f 0
C
c
C
c
c
c
c
c
c
c
c
c
c
c
C
c
c
c
c
c
C
C
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
FUNCT -
TO FINu A LOCAL MINIMUM OF A FUNCTION OF SEVERAL VARIABLES
3Y THE METHOD OF FLETCHER ANO POWELL
USAGE
CALL FMFP(FUNCT,N,X,F,G,EST,EPS,LIMIT,IER,H)
CF PARAMETERS
USER-WRITTEN SUBROUTINE CONCERNING THE FUNCTION TO
3F MINIMIZED. IT MUST tic uF THE FORM
SUBROUTINE FUNCT(N,ARG,VAL,GRADJ
ANO MULT SERVE THE FOLLOWING PURPOSE
FOR EACH N-3IMENSIONAL ARGUMENT VECTOR ARG,
FUNCTION VALUE ANO GRADIENT VECTOR MUST 3E COMPUTED
AND, ON RETURN, STORED IN VAL ANO GRAD RESPECTIVELY
NUMBER OF VARIABLES
VECTOR OF DIMENSION N CONTAINING THE INITIAL
ARGUMENT WHERE THE ITERATION STARTS. ON RETURN,
X HOLDS THt ARGUMENT CORRESPONDING TO THE
COMPUTED MINIMUM FUNCTION VALUE
SINGLE VARIABLE CONTAINING THE MINIMUM FUNCTION
VALUE ON RETURN,, I.E. F=F{X).
VECTOR OF DIMENSION N CONTAINING THE GRADIENT
VECTOR CORRESPONDING TO THE MINIMUM ON RETURN,
I.E. G=G(X».
I£ AN ESTIMATE OF THE MINIMUM FUNCTION VALUE.
TESTVALUE REPRESENTING THE EXPECTED ABSOLUTE ERROR.
A REASONABLE CHOICE IS 10**(-6>, I.E.
SOMEWHAT GREATER THAN 10»*(-0>, WHERE D IS THE
NUMBER OF SIGNFICANT DIGITS IN FLOATING POINT
REPRESENTATION.
MAXIMUM NUMBER OF ITERATIONS.
ERROR PARAMETER
IE"? = 0 MEANS CONVERGENCE WAS OBTAINED
IER = 1 MEANS NO CONVERGENCE IN LlPlT ITERATIONS
IER =-1 MEANS ERROffS IN GRADIENT CALCULATION
IER = ?. MEANS LINEAR SEARCH TECHNIQUE INDICATES
IT IS LIKELY THAT THERE EXISTS NO MINIMUM.
WORKING STORAGE OF DIMENSION N*(N*7)/2.
f
G
EST
EPS
LIMIT -
IER
H
KtMARKS
I) THE SUPROUNTINE NAME REPLACING THE DUMMY ARGUMENT FUNCT
MUST 9E DECLARED AS EXTERNAL IN THE CALLING PROGRAM.
II) IER IS SET TO 2 IF t STEPPING IN ONE OF THE COMPUTED
DIRECTIONS, THE FUNCTION WILL NEVER INCREASE WITHIN
A TOLERABLE RANGE OF ARGUMENT.
IER = 2 MAY OCCUR ALSO IF THE INTERVAL WHERE F
INCREASES Ib SMALL ANO THE INITIAL ARGUMENT WAS
RELATIVELY FAR AWAY FROM THE MINIMUM SUCH THAT THE
MINIMUM WAS OVE»LEAP£0. THIS IS DUE TO THE SEARCH
TECHNIQUE WHICH DOUBLES THE STEPSIZE UNTIL A POINT
IS FOUND WHERE THE FUNCTION INCREASES.
152
-------
SU3KQUTINE FMFP ^
c
C SUBROUTINES AND FUNCTION SU3PROGRAMS REQUIRED
c FUNCT
c
tO C METHOD
C THE METHOD IS GESCRI9EC IN THE FOLLOWING ARTICLE
C R. FLETCHER AND M.J.O. POWELL, A RAPIC DESCENT METHOD FOR
C MINIMIZATION,
C COMPUTER JOURNAL VOL.6, ISS. 2, 1963, PP. 161-168.
65 C
C
C
COMMON / LINKS / ERR1, XOI60), OIFF(60»
C
70 C DIMENSIONED DUMMY VARIABLES
DIMENSION H(1),X(1),G(1)
C
C COMPUTE FUNCTION VALUE ANO GRADIENT VECTOR FOR INITIAL ARGUMENT
CALL FUNCT
100 K=K»N
H(K)=XIJ>
C
C DETERMINE DIRECTION VECTOR H
105 T=0.
00 3 L=1,N
T=T-G(L)*H(K)
IF(L-J)6,7,7
6 K=<*N-L
110 GO TO 8
153
-------
LUJROUTINE F-lFP
7 < = <«•!
3 CONTINUE
9 H( J» =T
C
115 C CHtCX WHETH£R FUNCTION WILL 0£Cx£ASt STEPPlN ALONG H.
1?0 C CALCULATE DIRECTIONAL DERIVATIVE ANO TESTVALU^S FOR DIRECTION
C VECTOR H AND GRADIENT VECTOR G.
CO 10 J=1»N
GNRM=GNRM*ABS
10 OY = DY4-H(J)*G(J)
C
C REPEAT SEARCH IN DIRECTION OF STEEPcST DESCENT IF DIRECTIONAL
C DERIVATIVE APPEARS TO 3E POSITIVE OR ZERO.
IF(QYH1,51,51
130 C
C REPEAT SEARCH lU DIRECTION OF STEEPEST DESCENT IF DIRECTION
C VECTOR H IS SMALL COMPARED TO GRADIENT VECTOR G.
11 IF51tpl,12
C
135 C SEARCH MINIMUM ALONG DIRECTION H
C
C SEARCH ALONG H FOR POSITIVE OIRECTICNAL DERIVATIVE
12 FY=F
ALFA=2.*(£ST-F)/DY
AM13A=1.
C
C USE ESTlMATt FOR STEPSIZE ONLY IF IT IS POSITIVE. AND LESS THAN
C 1. OTHERWISE TAKE 1. AS STEPSIZE
IFCALFAJ 15, 15, 13
13 IF(ALFA-AMBDA) 14,15,15
1* AMBOA=ALFA
15 ALFA=0.
C
C SAVE FUNCTION AND DERIVATIVE VALUES FOR OLD ARGUMENT
150 16 FX=FY
DX = 3Y
C
C STEP ARGUMENT ALONG H
00 17 1=1, N
155 17 X(I»=X(I)*AMBOA»H(I|
C
c COMPUTE FUNCTION VALUE AND GRADIENT FOR NEW ARGUMENT
CALL FUNCT(N,X,F,G)
FY=F
160 C
C COMPUTE DIRECTIONAL DERIVATIVE DY FOR NEW ARGUMENT. TERMINATE
C SEARCH, IF OY IS POSITIVE, IF OY IS ZERO THE MINIMUM IS FOUND
OY = 0.
DO 18 1=1, N
165 13 OY=OY»G(I)*H{I)
154
-------
FMFP
IF(DY)19,36,22
C
C TERMINATE SEARCH ALSO IF THE FUNCTION VALUE INDICATES THAT
C A MINIMUM HAS 3tEN PASSED
170 19 IF(FY-FX)20,22,22
C
C *EPtAT SEARCH AND DOU3LE STtPSIZE FOR FURTHER SEARCHES
20 AM80A=AM6DA*ALFA
0 ALFA=AM8DA
175 C £NO OF SEARCH LOOP
C
C TERMINATE IF T Ht. CHANGE IN ARGUMENT GETS VERY LARGE
IF(HNRM»AM9DA-1.£10)16,16,21
C
180 C LINEAR SEARCH TECHNIQUE INDICATES THAT NO MINIMUM EXISTS
21 I£* = 2
RETURN
C
C INTERPOLATE CU3ICALLV IN THE INTERVAL DEFINED 3Y THE SEARCH
165 c ABOVE AND COMPUTE THE ARGUMENT x FOR WHICH THE INTERPOLATION
C POLYNOMIAL IS MINIMIZED
22 T=0.
23 IFUM8DA)2<*,3fc,2<»
2«« Z=3.*(FX-FY)^AM30A+OXfDY
190 ALFA=AMAX1(A3S,A8S(OX),A3S(0Y ) )
OALFA=Z/ALFA
OALFA=OALFA*GALFA-OX/ALFA*OV/ALFA
IF(OALFA)51,25,25
25 W=ALFA»SQRT(DALFA)
195 ALFA=
IF(OALFA)30,33,33
30 IF(F-FX)32,31,33
31 IF(OX-DALFA)32,36,32
220 32 FX=F
155
-------
SUBROUTINE FMFP,
OX=OALFA
T = Al_FA
AMSOAsALFA
GO TO 23
225 33 IF(Fr-F)35, 3^,35
3<« IF 51,38,38
C
C TtST LENGTH OF ARGUMENT DIFFERENCE VECTOR AND DIRECTION VECTOR
C IF AT LEAST N ITERATIONS HAVE BEEN EXECUTED. TERMINATE, IF
2*5 C 30TH ARE LtSS THAN EPS
36 IER=0
IF(KOUNT-N»<*2,39,39
39 T=0.
Z=0.
250 DO +0 J=1,N
T=T*A8S(H(KI I
255 i»0 Z=Z«-H»H(K)
Wl IF(T-EPS)56,56,<»3,50,50
C
C PREPARE UPDATING OF MATRIX H
«»3 ALFA = 0.
DO k7 J=1,N
265 *=J«-N3
H = Q.
DC -»b L = 1,N
KL=N*L
270 IF(L-J)<*<», 45,
<»(» K=<»N-L
GO TO «*6
kf> CONTINUE
K=N*J
156
-------
SUBROUTINE FMFP
ALFA=ALFA+H*H
-------
D. Subroutine FUNB
When a prospective minimum point reaches the boundary,
it is constrained to remain on the boundary. This requires
that the (x,y) coordinates be exchanged for an angle
coordinate, θ (the azimuth of the point concerned), and the
derivative with respect to θ replaces the derivatives with
respect to x and y. This is done by FUNB which calls FUNCT
and modifies the derivatives from it so that they are directed
tangentially to the boundary.
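On the boundary x = XC + R cos θ and y = YC + R sin θ, so that
dF/dθ = -R sin θ (∂F/∂x) + R cos θ (∂F/∂y).  The sketch below illustrates
this conversion; the name FUNBSK and the argument list are invented (the
actual FUNB obtains the circle parameters from COMMON and matches the
calling sequence required by FMFP).

C     SKETCH OF THE CONVERSION FROM THE CARTESIAN GRADIENT TO THE
C     DERIVATIVE WITH RESPECT TO AZIMUTH ON THE CIRCULAR BOUNDARY.
      SUBROUTINE FUNBSK( ANG, XC, YC, R, F, GANG )
      DIMENSION X(2), G2(2)
      X(1) = XC + R * COS( ANG )
      X(2) = YC + R * SIN( ANG )
C     FUNCTION VALUE AND CARTESIAN GRADIENT FROM FUNCT
      CALL FUNCT( 2, X, F, G2 )
C     CHAIN RULE FOR THE AZIMUTH DERIVATIVE
      GANG = -R * SIN( ANG ) * G2(1) + R * COS( ANG ) * G2(2)
      RETURN
      END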
158
-------
CHANGES POLAR COORDINATES
TO CARTESIAN COORDINATES
GETS FUNCTION AND GRADIENT FROM FUNCT
CHANGES GRADIENT IN CARTESIAN
COORDINATES TO GRADIENT IN AZIMUTH
FIGURE V-3
ABBREVIATED FLOW CHART FOR FUNB
159
-------
»OEO. FUN?
SUBROUTINE FUNBf I, ANG, F, G )
C TO CONVERT FuNCTC N, X, F, G > INTO A ONE DIMENSIONAL
c FUNCTION on THE CIRCULAR BOUNDARY
5 COMMON / BLK1 / NSTNt NBJY, NTR, XS<30), YS(30), ITR(50,3>
, ,
CALL FUNCT( 2, X, F, G? )
G(ll = -G2
-------
E. Subroutine FUNCT
Subroutine FUNCT(N,X,F,G) is used by FMFP to get the
function value and its partial derivatives at the point X(I),
I = 1, N, where N is the number of independent variables X(I)
that are the arguments of the function, F. The function
value is scalar and is returned in F. Its partial derivatives
with respect to X(I) are returned in the gradient vector
G(I), I = 1, N.
The function value and its partial derivatives are
evaluated using the subroutine CORFUN (q.v.) from the proper
values/functions of the correlation coefficient at discrete
points and from the field of pollution concentration
variance W(I,J), I,J = 1, 9, which is available in INT2D
via COMMON /BLK2/.
The computation process is carried out on the following
basis.  Let a_ij be the elements of the correlation coefficient
matrix for station locations i, j; let g_i be the correlation
coefficient of pollution concentration between station i
and the point (x_0, y_0), the starting point for the minimization
procedure (or some point during the search for the minimum);
let σ² be the variance of pollution concentration at (x_0,y_0).
The least square error of estimate of pollution concentration
at (x_0,y_0) from linear regression on the already located stations i,
at (x_i,y_i), is given by

    F = e² = σ²( 1 - Σ_i Σ_j g_i a^{ij} g_j )                        (1)
which is the function value required. Its partial derivatives
are given by
161
-------
    G(1) = ∂e²/∂x = 2σ(∂σ/∂x)( 1 - Σ_i Σ_j g_i a^{ij} g_j )
                    - 2σ² Σ_i Σ_j g_i a^{ij} (∂g_j/∂x)               (2)

    G(2) = ∂e²/∂y = 2σ(∂σ/∂y)( 1 - Σ_i Σ_j g_i a^{ij} g_j )
                    - 2σ² Σ_i Σ_j g_i a^{ij} (∂g_j/∂y)               (3)

where a^{ij} are the elements of the inverse of the matrix {a_ij}.
The correlation coefficient function obtained from
CORFUN is such that if C(x,y;ξ,η) represents the correlation
between pollution concentrations at (x,y) and (ξ,η), then
the limit for ξ → x, η → y is not 1, but a value somewhat
less than 1 (a jump discontinuity at ξ = x, η = y).  This means
that the correlation function is represented in the form

    C(x,y;ξ,η) = C*(x,y;ξ,η) + Δ(x,y) δ(x,y;ξ,η)                     (4)

where C*(x,y;ξ,η) is the continuous part of the correlation
coefficient function, Δ(x,y) is the amount of the jump
discontinuity, and δ(x,y;ξ,η) is the two-dimensional Dirac function
which is zero if x ≠ ξ, y ≠ η and is 1 at x = ξ, y = η.  Thus,
the limit for ξ → x, η → y of C*(x,y;ξ,η) exists and is
C*(x,y;x,y) = C*(x,y).  In the limit sense C(x,y;ξ,η) → C*(x,y), but
the actual value of C(x,y;x,y) is C*(x,y) + Δ(x,y).  This means
that in (1) the terms g_i have this jump discontinuity,
so that g_i = C(x_i,y_i;x_0,y_0) → C*(x_i,y_i) in the limit sense,
but should actually take the value C*(x_i,y_i) + Δ(x_i,y_i) when
x_0 = x_i, y_0 = y_i.  Thus, F(x_0,y_0) in the limit sense does
not approach zero when (x_0,y_0) approaches the location of
an already located pollution concentration observation
station.
162
-------
To force the function to zero in such circumstances
the following modified form of the error of estimate (1)
was used.  Let F_0 be the value computed from

    F_0 = 1 - Σ_i Σ_j g_i a^{ij} g_j .

Then

    F = σ²( F_0^{1/2} - Δ/F_0^{1/2} )²

will be zero when the point (x_0,y_0) approaches an already
located station.
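The quantity involved can be sketched in a few lines of FORTRAN.  The
function below is illustrative only (the names ERREST, AINV, G, VSQ and DEL
are not those of the listing), and the actual FUNCT returns the negative of
this value because FMFP seeks a minimum while a maximum of the error of
estimate is wanted.

C     ILLUSTRATIVE EVALUATION OF THE MODIFIED ERROR OF ESTIMATE.
C     AINV = INVERSE CORRELATION MATRIX, G = CORRELATIONS BETWEEN
C     THE TRIAL POINT AND THE N LOCATED STATIONS, VSQ = VARIANCE
C     OF CONCENTRATION AT THE POINT, DEL = JUMP DELTA AT THE POINT.
      FUNCTION ERREST( AINV, G, N, VSQ, DEL )
      DIMENSION AINV(N,N), G(N)
      SUM = 0.
      DO 10 I = 1, N
      DO 10 J = 1, N
   10 SUM = SUM + G(I) * AINV(I,J) * G(J)
      F0 = 1. - SUM
C     SAFEGUARD AGAINST ROUNDOFF DRIVING F0 NEGATIVE
      IF ( F0 .GT. 0. ) GO TO 20
      ERREST = 0.
      RETURN
   20 FQ = SQRT( F0 )
      ERREST = VSQ * ( FQ - DEL / FQ )**2
      RETURN
      END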
In lines 17-38 the matrix {a_ij} is set up and inverted.
The terms g_i are computed in lines 39-45, σ² is obtained on
line 48, the error variance on line 55 and the value of F
on line 62.  If only the function value is required, IC = 1,
and the rest of the subroutine is skipped.  When derivatives
are required, the derivative of the variance of pollution
concentration is computed in lines 69-76, the correlation
coefficient derivatives are then obtained (77-83) and the
remaining terms of the gradient vector are added (78-90).
When diagnostics have been returned by any of the sub-
routines, their sense is printed (lines 91-106).  Since the
coordinates of already located stations remain fixed, pro-
vision is made in IQ to hold the values of the inverse matrix {a^{ij}} (IQ = 2)
during a given minimization, but to renew them (IQ = 1) when
a new minimization is started.
163
-------
GETS CORRELATION COEFFICIENTS FOR STATION PAIRS
TO FORM MATRIX OF COEFFICIENTS VIA CORFUN
GETS INVERSE OF MATRIX
GETS CORRELATION COEFFICIENTS FOR
NON-HOMOGENEOUS TERMS
FINDS ERROR OF ESTIMATE ( VALUE OF FUNCT )
NEED DERIVATIVES ?
( NO )  RETURN
( YES ) GETS CORRELATION COEFFICIENT DERIVATIVES VIA CORFUN
        COMPUTES FUNCTION DERIVATIVES
        RETURN
FIGURE V-4
ABBREVIATED FLOW CHART OF FUNCT.
164
-------
GO
00
XM
YM
00
XI
Yl
TO ( 30,
61*1,
= XSIII
= VSCI)
<» J * 1,
= XSU)
= YS(J)
32 » ,
NSTN
I
•DECK FUNCT
SUBROUTINE FUNCT( N, X, F, G )
C TO COMPUTE THE FUNCTION TO BE MINIMIZED
C N = NO. OF VARIA3LES IN ARGUMENT! N = 2
r^ C XCI), I = If N, ARGUMENT
C F = FUNCTION VALUE ( SCALAR )
C G(I)( I = It N, GRADIENT OF FUNCTION
c ic, CONTROL; i = GET F ONLY, 2 « GET F AND cm
C 10 = CONTROL, 1 = 00 MHOLE THING, 2 = USE OLD APAM, AA, AINV.
10 C SINCE BASIC STATION LOCATIONS AR£ NOT CHANGED.
C NG PENALTY FUNCTION INVOLVED
C
COMMON / 8LK1 / NSTN, N90Y, NTR, XS130), YSC30), ITRf5C,3»
, , XC, YC, R, IBDYC20)
15 COMMON / Q / 1C, IQ
DIMENSION G(2), AlNV(30,30), GGC30), OGOX(30>, DGOY(30), AA(30,30)
DIMENSION X(1)
xo - xdi
VO = X(21
20 C GETS CORRELATION COEFFICIENT ARRAYS
IQ
30
25
CALL CORFUN( XM, YM, XI, Yl, COR, DCOX, OCOY, 10 )
IF ( 10 ,NE. 0 i GO TO 102
30 AA(I,J) - COR
IF ( I . EQ. J ) AA(I,J) z 1.
k CONTINUE
6 CONTINUE
C FILLS OUT CORCOEF MATRIX AND GETS INVERSE, ETC.
35 oo 9 i - i, NSTN
Ji = I * i
oo a j = ji, NSTN
8 AA(I,J) = AA(J,I)
CALL MATINVC AA, AINV, NSTN, 40 )
<+0 C GET NONHOMOGENtOUS TERMS
32 00 36 I > 1, NSTN
XM = XSCII
VM * vsm
CALL CORFUN< XM, YM, XO, YO, COR, OCOX, OCOY, 10 )
45 IF ( ID .Nt. 0 ) GO TO 102
36 GG(I) s COR
CALL INT2D( XO, YO, VAL, 10 )
IF ( 10 .N£. 0 ) GO TO 10<*
VSQ = VAL * VAL
50 C COMPUTES ERROR OF ESTIMATE AND ASSIGNS NEGATIVE SIGN SINCE
C SUBROUTINE FMFP GOES FOR MIN AND WE WANT MAX
SUM = 0.
00 10 I * It NSTN
00 10 J = 1, NSTN
55 10 SUM = SUM + GG(I1 * AlNVd,JI * GG(JI
165
-------
~u
FUNCT
11
500
C
C
75
22
•90
ic;
IPS
110
C
200
C
102
1002
10*
1QO<*
106
1006
300
t> = 1. - SUM
IF ( ERR .LT. 0.
CONTINUE
CALL CORFUNK
A = 1. - COR
FQ = SQRT< ERR )
IF ( FQ .£Q. 0. ) F = 0.
IF ( FO .Nc. 0. ) F = -VSQ * (
PPINT 500, ERR, CO*, A, FO, F
(5X,*FO = *,£13.5, * COR =
I GO TO 106
XO, YO, XO, YO, COR, OCCX, OCCY, 3
0.
FO - A / FO )**2
.5.
A =
bQRT{FO
CALL
IF (
CALL
IF {
G(2)
INT2C(
TO .NE
INT2C(
ID .N£
= -2.
= -2.
•
•
*
#
XI,
0 )
XO,
0 )
VAL
Y
Y
0,
GO
1,
GO
*
*
vx,
TO
VY,
TO
( VX
ID
10
10^
VAL )
VAL »
* ERR
* ERR
NSTN
l) = »,E13.5,* Fl - FO = »,£13.5)
IF ( 1C .EQ. 1 ) GO TO 200
-STIMATES DERIVATIVES USING 1 UNIT INCREMENT ON THE
WEIGHT FUNCTION
XI - XO 4- 1.
tl = YO * 1.
:<
<£
;(
<£
»
I
uC 22 I = 1,
XM = XS(I)
YM = YS(I)
CALL CORFUM
IF ( ID .N£.
DGOX(I) = dCCX
uGGY(I) = OCDY
SUMl = SUM2 = 0.
CO 2<* I = 1, NSTN
00 2«+ J = 1, NSTN
SUMl = SUMl *• GG(I)
5UM2 = SUH2 *• GG = &(?) *2. »
0 )
YM, XO, YO,
GO TO 102
COR, OCDX, DCDY, ID )
*
*
AINV(I,J)
»INV(I,J)
ViQ * SUMl
VSO * SUM2
OGOX(J)
OGOYCJI
THESE GRADIENT CO^IPNENTS HAVE SIGN RiiVERSEC TO CORRESPOND TO F.
RETURN
OIAGNOSTICS PRINT OUTS
PRINT 1002, ID
FOR1AT (* COORDINATE *,I3,» OUT OF RANGE IN CG^F'JN FROf SUBROUTINE
* FJNCT*)
GO TO 300
PRINT lOOi*, ID
FORMAT (* COORDINATE », 13, * OUT OF RANGE IN INT20 FROM SU3ROUTIN
*E FJNCT*>
GO TO 300
CONTINUE
PRINT 1006, ERR, XO, YQ
FORMAT (»
* »,F10.5,<
£RR = 0.
GO TO 11
CALL EXIT
END
INTERPOLATION ERR
Y = *,F10.3l
*,E10.3,« FROM SU3ROUTINE FUNCT AT X =
166
-------
F. Subroutine CORFUN
Subroutine CORFUN is used to determine the correlation
coefficient relating pollution concentration at two arbitrary
points neither of which need be a point of the 9x9 grid.
The technique used is that of a quadratic interpolation pro-
cedure based on the description of the correlation function
in terms of its proper functions and values. If the proper
values are λ_i, i = 1, ..., k, (λ_1 > λ_2 > ... > λ_k), and the proper
functions corresponding thereto are φ_i(x,y), then the correlation
function may be expressed in the form

    K(x,y;ξ,η) = Σ_{i=1}^{k} λ_i φ_i(x,y) φ_i(ξ,η)                   (1)

where (x,y) and (ξ,η) are the points between which pollution
concentration is being correlated.  The value of k is that
of the statistically significant proper values/functions.
The proper functions are known only at the points of a 9 x 9
square grid of points (10 km separation).  The values of
φ_i(x,y) and φ_i(ξ,η) are found by means of a simple bivariate
quadratic interpolation procedure.
Had the correlation coefficient function itself been
used, K(x,y;ξ,η), a four-variable interpolation method would
be required and additional constraints imposed to preserve
the basic character of the correlation coefficient function
(such as having a value not exceeding +1 and a "horizontal"
tangent plane at any point x = ξ, y = η).  The use of the
proper functions greatly simplifies the interpolation pro-
cedure and guarantees that the characteristics of the
correlation function are preserved.
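Once the proper-function values have been interpolated to the two points,
the sum (1) reduces to a single loop.  The function below is an illustrative
sketch with invented names; it is not part of the program.

C     CORRELATION BETWEEN TWO POINTS FROM THE K PROPER VALUES
C     ALAM(L) AND THE PROPER-FUNCTION VALUES PHI1(L), PHI2(L)
C     ALREADY INTERPOLATED TO THE TWO POINTS.
      FUNCTION CORSUM( ALAM, PHI1, PHI2, K )
      DIMENSION ALAM(K), PHI1(K), PHI2(K)
      CORSUM = 0.
      DO 10 L = 1, K
   10 CORSUM = CORSUM + ALAM(L) * PHI1(L) * PHI2(L)
      RETURN
      END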
167
-------
The interpolation algorithm is standard and is given by
M. Abramowitz and I.A. Stegun, Handbook of Mathematical Functions,
U.S. Government Printing Office, Washington, D.C., June 1964,
p. 882, formula 25.2.67. This requires six points for a
rectangular grid in the arrangement of Figure V-5. The
formula is
    f(x0+ph, y0+qk) = (1/2)q(q-1) f_{0,-1} + (1/2)p(p-1) f_{-1,0}
        + (1+pq-p²-q²) f_{0,0} + (1/2)p(p-2q+1) f_{1,0}
        + (1/2)q(q-2p+1) f_{0,1} + pq f_{1,1}

which has been rearranged into the form

    f(x0+ph, y0+qk) = f_{0,0} + (p/2)(f_{1,0} - f_{-1,0}) + (q/2)(f_{0,1} - f_{0,-1})
        + (p²/2)(f_{1,0} + f_{-1,0} - 2f_{0,0}) + (q²/2)(f_{0,1} + f_{0,-1} - 2f_{0,0})
        + pq(f_{0,0} + f_{1,1} - f_{1,0} - f_{0,1})
In the above, (h,k) are the lengths of the sides of the
rectangular grid, and f_{i,j} are function values at the grid
points with units indicated by the subscripts and scale factors
h for index i, k for index j.  The values of p and q are
determined from p = (x - x_0)/h, q = (y - y_0)/k.
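In coded form the rearranged formula is a single arithmetic statement, as in
the illustrative function below; the name QINT6 is invented, and F00, F10,
FM10, F01, F0M1, F11 stand for f(0,0), f(1,0), f(-1,0), f(0,1), f(0,-1),
f(1,1).

C     SIX-POINT QUADRATIC INTERPOLATION AT ( P, Q ) IN GRID UNITS.
      FUNCTION QINT6( P, Q, F00, F10, FM10, F01, F0M1, F11 )
      QINT6 = F00 + .5 * P * ( F10 - FM10 )
     1            + .5 * Q * ( F01 - F0M1 )
     2            + .5 * P * P * ( F10 + FM10 - 2. * F00 )
     3            + .5 * Q * Q * ( F01 + F0M1 - 2. * F00 )
     4            + P * Q * ( F00 + F11 - F10 - F01 )
      RETURN
      END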
The arrangement of points in Figure V-5 provides for
values of p and q in the ranges 0
-------
FIGURE V-5
THE ARRANGEMENT OF DATA POINTS FOR 6-POINT QUADRATIC
INTERPOLATION.  THE AREA OF MOST ACCEPTABLE VALUES OF
P AND Q.
( diagram of the grid points (0,-1), (-1,0), (0,0), (+1,0), (0,+1), (+1,+1);
not legible in this copy )

( A )                    ( B )                    ( C )
FIGURE V-6
EQUIVALENT ARRANGEMENTS OF DATA POINTS TO ACCOMMODATE
OTHER VALUES OF P, Q LYING CLOSE TO THE POINT ( X0,Y0 )
169
-------
The partial derivatives of the correlation coefficient
function are obtained by differentiating (1),

    ∂K/∂ξ = Σ_{i=1}^{k} λ_i φ_i(x,y) [∂φ_i(ξ,η)/∂ξ]

    ∂K/∂η = Σ_{i=1}^{k} λ_i φ_i(x,y) [∂φ_i(ξ,η)/∂η]
-------
10
15
20
30
35
50
55
*0£CK CORFUN
SUBROUTINE CORFUN( XH, YM, XI, Yl, COR, OCOX, OCOY, 10 )
c Gtis CORRELATION FUNCTIONS ANO DERIVATIVES wo xi, YI FROM TABLES
C OF PROPER FUNCTIONS / VALUES.
C ID = 0 FOR COORDINATES IN RANGE, 10 .NE. 0, OUT OF RANGE
OIMENSION PHK151, PHHU5), OPH1DX(15), DPHlDYdS)
COMMON / 8LK2 / X0(9), YD(9), PH(15,9,9), ALAM(15I, H(9,9)
COMMON / Q / 1C, IQ
C PH
C ALAM(L2) * L2TH PROPER VALUE
6
C
6
10
11
12
13
16
C
ia
20
22
2k
26
28
30
15 /
, YH .EQ. Yl
$ OCOY = 0.
) <»,
DATA L2 /
10 = 0
IF ( XH .EQ. XI .ANO
COR = 1. S DCOX = 0.
GO TO 100
ENTRY CORFUN1
XA = XM $ YA * YM f K * 1
LOCATE LOWER LEFT CORNER OF SQUARE CONTAINING XA, YA
IF I XA .LT. XOI1) ) GO TO 11
12
00 10 I * 2,
IF ( XA .Lt.
CONTINUE
10 * i * 2 *
GO TO 100
IF ( YA .LT.
DO 13 J = 2,
IF ( VA ,LE.
CONTINUE
10 = 2 * 2 *
GO TO 100
12 * I " 1 $
9
xom
i < -
YOU)
9
YOU)
( K -
JZ *
) GO
1 )
) GO
) GO
1 )
J - 1
TO
TO
TO
16
INTERPOLATION ON PROPER FUNCTIONS
P0= .1 » I XA - XO(IZ) )
Q0= .1 * ( YA - YOCJZI I
00 3k L = 1, L2
IF ( PO.LE. .5 ) 18,
IF { QO.LE. .5 ) 22,
IF ( QO.LE. .5 I 26,
ICASE * 1 t P * PO t
A * PHU,IZ,JZ) 18 =
0 > PH(L«IZ,JZ+1) S E
GO TO 30
ICASE »3SP=POtQ=l. -QO
A a PH(L,IZ,JZ»1) S B = PH(L,IZ*1,JZ+1)
20
2<»
28
Q x QO
PH(L,IZ*1,JZ) S C *
PHU,IZ,JZ-1) t F
PH(L,IZ-1,JZ)
« PH(L,IZ*l,JZ+i)
$ C = PH(L,IZ-1,JZ*1)
i E = PH(L,IZ,JZ*2) IF* PH(L,IZ*1,JZ)
i) * PH(L,IZ,JZ)
GO TO 30
ICASE * 2 t P = 1. - PO
A * PH(L,IZ+1,JZ) $ 9 *
0 = PH(L,IZ*1,JZ*1) t E
GO TO 30
ICASE = k S P - 1. - POl Q * 1. - QO
A > PH(L,IZ+1,JZ»1) SB* PH(L,IZ,JZ+1) $ C
0 = PHIL,IZ*1,JZ) S E = PH(L,IZ*1,JZ*2) S F
IF { K .NE. 1 ) PHKCLI = PHKL)
t Q * QO
PH(L,IZ,JZ) f C * PHCL,IZ+2,JZ)
* PH(L,IZ*i,JZ-U $ F a PH(L,IZ,JZ+1)
PHIL,IZ+2,JZ*1)
FH(L,IZ,JZ>
171
-------
SU1RJJTINE CCRFUN
(P*(=5-C) *• Q » < D - = > f ? * F * < •?
* C - ?t * A ) + Q * Q » lC*E-2. *A)H-P*GI* (A *F-3-3
IF « K .EQ. 1 ) GO TO 3<»
GO TO ( 3**, 32 ), 1C
32 bf.NP = 1. 5 SGNQ = 1.
IF ( ICASE . £Q. 3 .OP. ICASE .EQ. i, ) SGNQ = -1.
IF ( ICASE .£0. 2 .0*. ICAS£ .£Q. % J SGN*3 = -1.
JPHIQX(L) = SGNP »«.5*(l-C>**=*<3*C-2. *A| +
4-Q* ( A * F - 3 - 0 ) )
= SGfjQ *(.5*(0-£)+Q»{D»e-2. *A) *
3*» CONTINUE
GO TO ( 36, 38 ) , N
36 < = 2
71 Xfl = Xl $ rA = Yl
GO TO 3
38 C03 = 0. f OCOX = 0. S OCOY = 0.
00 %2 L = 1, L2
COR = COR «• PHM(LI * ALAHCLI * PHKD
7? GO TO C *»2, 40 I, 1C
•+0 JCOX = OCDX + PHN(L> * ALAM(L> * OPH10XIL)
JCOV = OCOY * PHfl(U) * ALAM(L) *
t? CONTINUE
100 RETURN
e" EHO
172
-------
G. Subroutine INT2D
This is a two-dimensional interpolation subroutine
using the standard formula
    f = f_{0,0}(1-p)(1-q) + f_{1,0} p(1-q) + f_{0,1}(1-p)q + f_{1,1} pq

Four points f_{0,0}, f_{0,1}, f_{1,0}, f_{1,1} at the corners of a
rectangle are used.  The values p and q are given by
p = (x_a-x_1)/(x_2-x_1), q = (y_a-y_1)/(y_2-y_1), where (x_a,y_a) is the
point at which the value of f is required and (x_1,y_1),
(x_2,y_1), (x_1,y_2), (x_2,y_2) are the coordinates of the corners
of the rectangle. The surface fitted to the data is a
hyperbolic paraboloid.
In the form INT2D(XA,YA,VAL,IDIOG) the coordinates
(x_a,y_a) are input as XA, YA, and the result is output in
VAL. IDIOG=0 for successful interpolation, =1 for XA
outside the table range, =2 for YA outside the table range.
In this case the table, W(I,J), I,J=1,9, appears in the
COMMON statement. The quantity PH(15,9,9) of the COMMON
statement is not used in this subroutine.
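A minimal sketch of this rule, with invented names, is the following.

C     BILINEAR ( HYPERBOLIC PARABOLOID ) INTERPOLATION.  W1, W2,
C     W3, W4 ARE THE VALUES AT (X1,Y1), (X2,Y1), (X1,Y2), (X2,Y2)
C     AND (XA,YA) IS THE POINT AT WHICH THE VALUE IS REQUIRED.
      FUNCTION BILIN( XA, YA, X1, X2, Y1, Y2, W1, W2, W3, W4 )
      P = ( XA - X1 ) / ( X2 - X1 )
      Q = ( YA - Y1 ) / ( Y2 - Y1 )
      BILIN = W1 * ( 1. - P ) * ( 1. - Q ) + W2 * P * ( 1. - Q )
     1      + W3 * ( 1. - P ) * Q          + W4 * P * Q
      RETURN
      END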
173
-------
*OEO INT?D
SUBROUTINE INT 20
Co^ON / BLK2 / X(9), Y(9), PH(15,9,9», ALAM{15), W(9,9J
G INTERPOLATES \/ALUE OF w AT XA,YA, FROP VALUES GI^EN AT X,Y
5 C N=9, IOIOG = DIAGNOSTIC, 0=GK, 1=X OUT OF RANGt, 2= Y OUT OF RANGE
IOIJG=0
30 10 1=1,9
IF(XA.LT.Xd) » GO TO 1?
10 CONTINUE
10 ICIOG-1
GU TO 18
1? UO It J=l,9
IF(YA.LT.Y-X(Il))
20 Q=(YA-Y(J1» »/(Y(J2)-Y(JD)
HI=W(II Jl)
H2=H(I2,Jl)
W3=W(I1,J2>
M4=W(I2,J2)
25 VAL
16 RETURN
END
174
-------
H. Subroutine MATINV
This is a standard subroutine for matrix inversion.
In MATINV(A,B,N,M) the matrix to be inverted is contained
in A and N gives the number of rows/columns of A. The inverse
of A is returned in B. Both A and B are listed in full
form.
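A convenient way to exercise the subroutine is to verify that the product of
A and its computed inverse reproduces the identity matrix to within roundoff.
The test program below is illustrative only; the text does not state the
meaning of the fourth argument, and it is passed here simply as the order of
the (3 x 3) test matrix.

C     ILLUSTRATIVE CHECK OF MATINV:  A * B SHOULD BE THE IDENTITY.
      PROGRAM TSTINV
      DIMENSION A(3,3), B(3,3), C(3,3)
      DATA A / 4., 1., 0.,  1., 3., 1.,  0., 1., 2. /
      CALL MATINV( A, B, 3, 3 )
      DO 10 I = 1, 3
      DO 10 J = 1, 3
      C(I,J) = 0.
      DO 10 K = 1, 3
   10 C(I,J) = C(I,J) + A(I,K) * B(K,J)
      PRINT 100, ( ( C(I,J), J = 1, 3 ), I = 1, 3 )
  100 FORMAT ( 3F10.5 )
      END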
175
-------
•OECK
j'J3^0UTINt MATINiHA, i,N, M)
1C
15
?0
50
eo
30
35
65
70
C
C
C
50
95
100
105
110
115
INITIALIZATION
GO IE I=ltN
30 1C J=1,N
a (I, J)=A (I, J>
Jf-IVUT0
00 50 J = 1,N
IF(JFIVOT.GT. A8S(8(I,J) ) ) GO TO 50
PIVOT = 8(1, Jl
JCOL=J
CONTINUE
CONTINUE
JFIrfoT ( JCOLJ-IROW
PIVOT COLUMN WITH ROM MULTIPLIERS
CO 70 1=1, N
*=-* (I, JCOL)
IFIIROW.NE. I) GO TO 70
X=l.
3 GO TO 115
00 110 1=1, N
IF(I*OW.EQ.I) GO TO 105
d(I,J) = B(I, J)+3(I, JCOLt *C( J)
GO TO 110
d
-------
Sim OUTING 1ATINV
L = IPIVOTIM MATI 5*0
120 CIO = 3(1,L) MATI 550
00 130 K=1,N MATI 5oQ
139 3(I,K) = C(K) MATI 570
cQ OG 150 1=1,N MATI 530
00 1*4.0 K = 1,N MATI 590
L = JPItfOKK) MATI 603
1*0 C«<) = 3(L,I) MATI 610
OC 150 K=1,N MATI 620
£5 150 S(K,I) = CfiO MATI b30
•
-------
I. Subroutine ADOPT(XO,YO)
This subroutine adds the station location with co-
ordinates XO,YO to the station list and rearranges the list
so that it is in canonical form, i.e., the triangle assignments
include the new point and vertices of all triangles are in
counterclockwise order; if it is a new boundary point it is
listed as such and the number of boundary points incremented
and the counterclockwise ordering of the boundary points is
maintained.
The logic is somewhat involved and many exceptional
cases occur. The subroutine print-out contains many ex-
planatory comments on what is taking place. These should
be sufficient to unravel the situation. Extensive use is
made of subroutines AR2, TRITST, PORDR, and TRIFIX.
One of the basic ideas is to the effect that if XO,YO
is the point P and if I1, I2 are neighboring boundary points
listed in counterclockwise order, then the "area" of the
triangle formed by the ordered points P, I1, I2 will be
"positive" (i.e., P, I1, I2 are in counterclockwise order
around a triangle) for all boundary point pairs I1, I2
(neighboring) if P lies inside the boundary, and will be
"negative" for at least some pair I1, I2 if P is outside
the boundary, and will be "zero" if P lies on I1, I2
(interior or exterior). If the point is clearly interior
to the boundary of the points already located, it is
required then to find the triangle of already located
points within which it lies (or on a side of which it lies).
One then subdivides this triangle. If the new point lies
outside the boundary of already located points, then additional
triangles are to be formed. The many exceptional cases
require careful handling since experience indicates that
they occur with distressing frequency.
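The signed-area test of the preceding paragraph can be sketched as follows;
the function and its argument list are invented and do not appear in the
program, which performs the test through repeated calls to AR2.

C     RETURNS 1 IF ( X0, Y0 ) IS INSIDE THE BOUNDARY POLYGON WHOSE
C     NBDY VERTICES, LISTED COUNTERCLOCKWISE IN IBDY, HAVE THE
C     COORDINATES XS, YS; RETURNS 0 OTHERWISE.
      FUNCTION INSIDE( X0, Y0, XS, YS, IBDY, NBDY )
      DIMENSION XS(*), YS(*), IBDY(NBDY)
      INSIDE = 1
      DO 10 I = 1, NBDY
      I1 = I + 1
      IF ( I .EQ. NBDY ) I1 = 1
      J = IBDY(I)
      K = IBDY(I1)
C     TWICE THE SIGNED AREA OF THE ORDERED TRIANGLE P, I1, I2.
C     A ZERO AREA ( P ON THE BOUNDARY ) IS TREATED HERE AS INSIDE.
      A = ( XS(J) - X0 ) * ( YS(K) - Y0 )
     1  - ( XS(K) - X0 ) * ( YS(J) - Y0 )
      IF ( A .LT. 0. ) INSIDE = 0
   10 CONTINUE
      RETURN
      END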
178
-------
TESTS POINT COORDINATES AGAINST EACH BOUNDARY (P)
SEGMENT VIA AR2 TO FIND IF INSIDE OR OUTSIDE THE
BOUNDARY (P)
INSIDE
OUTSIDE
I
IDENTIFY WHICH TRIANGLE CONTAINS
THE NEW POINT VIA TRITST
INSIDE
ON SIDE
ADD NEW POINT AND TWO
NEW TRIANGLES TO THE LISTS
® »i
CHECK POINT ARRANGEMENTS
VIA PORDR
I
IMPROVE TRIANGLES
VIA TRIFIX
RETURN
INTERIOR
/is
V
SIDE INTERIOR OR
ON BOUNDARY (P)
BOUNDARY
FIND OTHER TRIANGLE
WITH THIS SIDE COMMON
I
MAKE FOUR TRIANGLES
OUT OF THESE TWO
i
LOCATE ADJACENT
BOUNDARY (P) POINTS
I
REASSIGNS BOUNDARY (P)
POINTS AND ADDS ONE
i
MAKES TWO TRIANGLES
OUT OF ONE AND ADDS
NEW POINT
FINDS LEFT MOST AND RIGHT MOST
BOUNDARY (P) POINTS "SEEN" FROM
NEW POINT, THERE ARE K SUCH
I
FORMS K-1 NEW TRIANGLES
ADDS BOUNDARY POINT
FIGURE V-7
ABBREVIATED FLOW CHART FOR ADOPT
179
-------
*OcC* A03PT
SUBROUTINE ADOPTI xo, YO >
C ADDS POINT TO AN ARRAY IN STANDARD FORM SO THAT THE STANDARD FORM
C IS PRESERVED AND CHECKS TRIANGLE ARRANGEMENTS
5 C CHECKS TRIANGLE ARRANGEMENT SO IT IS ,»GOQQ,«.
COMMON / 8LK1 / NSTN, N8QY, NTR, XSJ30), YS<30!, ITRC5Q.3)
, , XC, YC, R, IBOY(2fl>
INTEGER P
DIMENSION ISIG(2fll, PI2), IHI20)
10 C PUTS ( XO, YO ) IN XS, YS(NSTN«1)
Nl = NSTN * 1
XSCN1J * XO
YS(N1» x YO
c CHECKS WHETHER ( xo, YO » is INSIDE OR OUTSIOE THE BOUNDARY
15 00 10 I * It N80Y
II = I «• 1 J IF ( I .EO. N8DY I II * 1
CALL AR2< leOYlII, I30Y(I1I, Nl» A, 10 )
IF ( ID .Lt. 0 ) GO TO 54
10 CONTINUE
20 C ALL VALUES OF ID ARE * SO IS INSIDE. FINO WHICH TRIANGLE
00 12 I « 1, NTR
Jl = ITRCI.l) I J2 * ITR * IA 1 ITRII4,3» * M
C J03 DONE. TWO TRIANGLES, ONE POINT, NO 3DRY PTS AOOED
GO TO 12C
C NOW TAKE CARE OF EXCEPTIONAL CASES
50 20 IF ( ID .LT. 0 ) GO TO 22
C HA/E A DEGENERATE TRIANGLE
PRINT 1870, I, Jl, J2, J3
1070 FORMAT (» TRIANGLE *,I5,» VERTICES*,315,* IS A DEGENERATE*)
GC TO 126
55 C THE POINT ( XO, Y8 1 IS ON A SIDE OF THE TRIANGLE
180
-------
ACC^T
C
22
2*
2*
32
36
<.o
•»2
C
C
<*<»
FINJ
10
GO
IA
IA
IA
IA
IA
IA
=
TO
-
=
=
=
£
S
POINT
TH
00
IF
00
Jl
IF
IF
IS
•+«
(
<*6
=
(
(
THE
-10
(
Jl
J3
J2
Jl
J3
J2
(
OTHER TKIANGLL
2<»,
« IB
J IB
S 18
* IB
S 18
« IB
XO,
28, 3
= J3
= Jl
- Jl
= J2
= J?
- J3
YO )
^,
S
B
S
S
t
S
36,
1C
1C
1C
1C
1C
1C
WITH T
HO,
= J2
= J2
= J3
= J3
= Jl
= Jl
<*2
R
s
s
B
S
LIES ON SIDE
HIS SlDc ANu MAOc t OF 2.
)
GO
GO
GO
GO
GO
IA
, ID
TC *»<*
TO t+b
TO <*<*
TO <»<«
TO *«<»
, IB. FIND ANOTHER TRIANGLE WITH
SIDE
I
I .
J
J «•
IA
13
= If
EQ.
= 1,
1 S
.£0.
.EQ.
NT*?
IT )
3
IF (
ITR<
ITR(
GO
J
I,
If
TO
.EQ
J) .
J) .
1*3
. 3 )
AND.
ANO.
Jl
19
IA
•
•
= 1
EQ. ITR(IfJl) ) GO TO 50
EQ. ITRd.Jl) 1 GO TO 50
65
70
<*6 CONTINUE
w8 CONTINUE
7b C THE TRIANGLE MUST HA = Nl
ITRdFA.U = IA t ITR(ITA,2I = I£ S ITR IM = NBOY
IP = I * 1 S IF ( I .EQ. NBOY ) IP = 1
IF ( { ISIG(IM) .£Q. 1 > .ANO. ( ISIGJIP) .EQ. 1 ) ) GQ TO 90
GO TO 60
58 CONTINUE
105 GO TO 62
60 ISIGU1 = 1
C CHECKS OUT LEFT AND RIGHT POINTS ,,SE£N,, FRCM NEW POINT
62 K = 0
00 66 I = 1* N30Y
110 II = I * 1 f IF ( I .EQ. NBOY I II = 1
181
-------
SUBROUTINE ADOPT
IF { I3IGCI) .£0. ISIGtll) ) 66, c*
t><» K = K * 1
P(*» = II
66 CONTINUE
115 IF « ISIG(P(1) ) .EQ. -1 ) 68, 70
63 IL = P(l»
IR = P<2>
GO TO 72
70 IL = P(2)
120 IR = P(1I
C IL = FIRST POINT TO LEFT ,,SEEN,, FROM NEW POINT
C IR = LAST POINT TO RIGHT ,,SEEN,, FROM NEW POINT
C K = NUMBER OF POINTS , .SEEN,, FRCH NEW POINT
72 K = IR - IL + 1
125 IF ( K .LT. 0 > K. = NBOV «• K
IF ( K .LE. 1 ) 7t, 76
7<* II - IBDY(IL) { 12 = ISOYCIR)
PRINT 1075, II, 12, XO, YO
1075 FORHAT {» LEFT AND RIGHT BOUNDARY POINTS*/2I5/»ARE SEEN FROM*/
130 .2F10.5)
GO TO 126
C THtRt ARE K - 1 NEW TRIANGLES
76 12 = K- 1 5 N2 = NTR
DO 78 I = 1, 12
135 N2 = N2 + 1
Jl = IL * I - 1 $ IF ( Jl .GT. N3DY > Jl = Jl - NBOY
J2 = Jl * 1 $ IF ( J2 .GT. N30Y ) J2 = J2 - N8CY
ITRJN2,!) = IBDY(J2)
ITR(N2,2) = IEDY = Nl
12 = NBOY * 3 - K
DO 3% I = 2, 12
II - IR » I - 2 I IF { II .GT. N9CY » II = II - NBOY
150 8<» IBOY
-------
SUBROUTINE ADOPT
90 K2 = I8QY(IP)
II = IP - 1 $ IF 1 II .EQ. 0 ) II = 1
N! = IBOYdl)
DC 92 I - 1, NTR
170 00 92 J s l, 3
Jl = J »• 1 S IF < J ,£Q. 3 ) Jt = 1
IF ( Kl .£Q. ITRd.J) .ANu. K2 .tQ, ITR(I.Jl) ) GO TO 9<+
IF < K2 .EQ. ITRd.J) .AND. Kl ,EQ. ITR(I,J1> ) GO TO 9*
92 CONTINUE
175 PRINT 1085. Kl, K2
1035 FORMATC FAILS TO LOCATE TRIANGLE HITH SEGMENT'/?I5/» ON
»30UNDA«r»)
GO TO 126
94 J2 = J » 2 $ IF < J2 .GT. -T ) J2 = J2 - 3
180 IT = I
IA - ITR(ITtJ) « Id = ITR(IT.Jl) $ 1C = ITR(IT,J2>
96 IT^(IT,1I = 1C f ITRdT,?) = IA $ IT«
-------
J. Subroutine TRIFIX
This subroutine readjusts the triangle assignments
to maintain a network of non-overlapping triangles with
observation points at their vertices to prevent the occurrence
of triangles of unnecessarily small area. It finds triangle
pairs with a common side. When such a pair is located, they
are combined into a quadrilateral.  This quadrilateral may
be divided in two ways.  The triangle subdivision that has
the shorter of the two diagonals of the quadrilateral as the
common side is preferred (Figure V-8).
The quadrilateral formed by two triangles with a common
side may be re-entrant. In this case the second diagonal
lies "outside" the quadrilateral and the subdivision is
accepted as is (line 36), see Figure V-9. The case is
identified by the fact that when the outside diagonal is
used, one of the triangle areas resulting is larger than
the sum of the areas of the original triangle pairs.
When the re-entrant quadrilateral test fails but still
the area of the quadrilateral obtained by the second sub-
division exceeds that obtained by the first subdivision by
more than 0.01% a diagnostic is printed. This test is
required to account for the effect of roundoff errors in the
computer.
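For a convex (non-re-entrant) quadrilateral the stated criterion amounts to
comparing the squared lengths of the two diagonals.  The sketch below is
illustrative only; TRIFIX itself works with the triangle areas returned by
AR2, which also allows the re-entrant case of Figure V-9 to be recognized.

C     FOR THE QUADRILATERAL A, B, C, D TAKEN IN ORDER AROUND ITS
C     BOUNDARY, ISWAP = 1 MEANS THE DIAGONAL B-D IS SHORTER THAN
C     A-C AND THE SUBDIVISION ABD, BCD IS TO BE PREFERRED.
      SUBROUTINE DIAG( XA, YA, XB, YB, XC, YC, XD, YD, ISWAP )
      D1 = ( XC - XA )**2 + ( YC - YA )**2
      D2 = ( XD - XB )**2 + ( YD - YB )**2
      ISWAP = 0
      IF ( D2 .LT. D1 ) ISWAP = 1
      RETURN
      END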
184
-------
FIGURE V-8
THE TWO POSSIBLE SUBDIVISIONS OF QUADRILATERAL ABCD INTO
TWO TRIANGLES. ON THE LEFT, THE SUBDIVISION ABC, ACD RESULTS
IN TRIANGLES WITH A LARGER COMMON SIDE AC WHILE THE
SUBDIVISION ABD, BCD ON THE RIGHT RESULTS IN A SMALLER
COMMON SIDE BD AND IS THEREFORE PREFERRED.
FIGURE V-9
THE REENTRANT QUADRILATERAL CASE. THE ORIGINAL TRIANGLES
ARE ABC AND ACD. THE SECOND DIAGONAL BD LIES " OUTSIDE "
THE QUADRILATERAL AND IS REJECTED.
185
-------
*OECK TRIFIX
SU3-40UTINE TRIFIX
C
C
C
C
10
15
20
25
30
35
12
1002
1001
CHiCKS THE TRIANGLE SUBDIVISION ANO TfiYS TO IMPROVE THE
TRIANGLE ARRANGEMENTS. FINOS TRIANGLE PAIRS WITH COMMON SIDE ANO
TESTS THE JIAGONALS OF THE QUADRILATERAL TO SEE IF MORE NEARLY
EQUAL AREA SUBDIVISION IS POSSIBLE
COMMON / BLK1 / NSTN, NBOY, NTR, XS<30), YS130), ITR(50,3)
, , XC, YC, R, I80Y(20>
DIMENSION XU3), X2(3), Yl(1), Y2{3)
12 = NTR - 1
00 20 I * 1, 12
2
H
6
a
10
11
C
C
<1 = I » 1
00 20 K = Kit NTR
DC 16 J = It 3
Jl = JtJ2 = J*-l$IF(J .EQ.
JA = ITRCI,J1) S J3 = ITR(I,J2>
00 16 L * It 3
Ll=LfL2=L+i$IFCL .£Q.
LA - ITRCKtLl) I Ld = ITR(K,L2)
IF ( ( JA .EQ. LA ) .AND. ( JB .EQ
ISM - 1 $ GO 10 A
IF ( ( JA .EQ. LB » .ANO. < JB .EC
ISH = 2
J3 = J2 * 1 $ IF ( J3 .GT. 3 1 J3
L3 = L2 * 1 $ IF I L3 .GT. 3 > L3
JA = ITR(ItJl) « JB = ITR(I,J2I t
LC = ITR«k,L3)
GO TO { lit 10 ), ISM
LA = ITR(K,L2) $ LB = ITRIK.L1
CALL AR2( JA, JB, JC, Al, 101
CALL AR2( LA, LB, LC, A2, 102
CALL AR2C JB, JC, LC, A3, 103
CALL AR2( JC, JA, LC, A<* , 104
AT = Al + A2
TESTS IF QUADRILATERAL REENTRANT.
OTHERWISE SIMPLY DIVIDE
3 ) J2 = 1
3 > L2 = 1
. LB > 1 2, H
. LA 1 ) 6, 16
= 1
= 1
JC = ITR(I,J3»
IF IS CANNOT
.GE. AT I 1 GO TC 20
BT
IF
DO
= A3 *
( A8S(
12 M =
AT I .LT,
0001 * AT ) GO TO
$ YKMI = YSUTRCItMH
$ Y2(M) = YSCITRCKtMM
M = 1, 3 )
Ak
BT -
It 3
= XS(ITR
-------
SUBROUTINE TRIFIX
16 CONTINUE
20 CONTINUE
RETURN
END
187
-------
K. Subroutine PORDR
This subroutine checks the arrangement of the vertex
points for counterclockwise ordering and rearranges the
ordering if a clockwise ordering is found.
188
-------
*DECK PORDR
      SUBROUTINE PORDR
C     CHECKS THE ARRANGEMENT OF POINTS AROUND THE TRIANGLES TO
C     ASSURE CC ORDERING
      COMMON / BLK1 / NSTN, NBDY, NTR, XS(30), YS(30), ITR(50,3)
      DO 4 I = 1, NTR
      J1 = ITR(I,1) $ J2 = ITR(I,2) $ J3 = ITR(I,3)
      CALL AR2( J1, J2, J3, A, IDIOG )
      IF ( IDIOG .EQ. 1 ) GO TO 4
      ITR(I,1) = J2 $ ITR(I,2) = J1
    4 CONTINUE
      RETURN
      END
189
-------
L. Subroutine AR2
This subroutine computes twice the area of the triangle
with vertices at the points with indices I,J,K and outputs
this as A with a diagnostic IDIOG which takes the value of
+1 if I,J,K in counterclockwise order about the triangle,
-1 if in clockwise order, and 0 if A=0.
190
-------
*DECK AR2
      SUBROUTINE AR2( I, J, K, A, IDIOG )
C     COMPUTES TWICE THE AREA OF THE TRIANGLE WITH STATIONS I, J, K AS
C     VERTICES.  RETURNS ABSOLUTE VALUE AT A AND SIGN IN IDIOG AS +1 OR
C     -1.  IDIOG = +1 MEANS I, J, K IN CC ORDER.
      COMMON / BLK1 / NSTN, NBDY, NTR, XS(30), YS(30), ITR(50,3)
     ,              , XC, YC, R, IBDY(20)
      X1 = XS(I) $ X2 = XS(J) $ X3 = XS(K)
      Y1 = YS(I) $ Y2 = YS(J) $ Y3 = YS(K)
      A = ( X2 - X1 ) * ( Y3 - Y1 ) - ( X3 - X1 ) * ( Y2 - Y1 )
      IDIOG = SIGN( 1., A )
      IF ( A .EQ. 0. ) IDIOG = 0.
      A = ABS( A )
      RETURN
      END
191
-------
M. Subroutine TRITST
This subroutine determines the location of the point
input, P, with coordinates X, Y with respect to a triangle
with vertices at the points Jl, J2, J3 where these are the
index numbers for the coordinates of an already located
observation point. The output consists of a diagnostic
(IDIOG), and twice the areas of the following triangles: A for
J1,J2,J3, P for J1,P,J3, Q for J1,J2,P. The diagnostic
indicates as follows:
IDIOG =   2       J1,J2,J3 in counterclockwise order, (X,Y) inside
          1       J1,J2,J3 in clockwise order, (X,Y) inside
          0       J1,J2,J3 in a straight line
         -1,-2    P lies on side J1,J3
         -3,-4    P lies on side J1,J2
         -5,-6    P lies on side J2,J3
         -7,-8    P is outside, opposite J2
         -9,-10   P is outside, opposite J3
        -11,-12   P is outside, opposite J1
192
-------
*D£CK TRUST
SUBROUTINE TRITSTl X, V, Jl, J2, Jl, IOIGG, A, P, Q >
C TESTS WHERE F $ Y2 = YSCJ2) t Y3 = VS(J3>
15 Al = X2 - XI « A2 = X3 - XI « A3 = X - XI
81 = Y2 - Yl f B2 s Y3 - VI $ 83 = Y - VI
A = Al * B2 - A2 * 81
P = A3 * B2 - B3 • A2
Q = Al * 83 - 31 * A3
20 IF ( A I 2, 12, <»
2 K s 1
A = -A $ P = -P f Q = -Q
H IF { P I 20, 14, 6
6 IF ( Q ) 22, 16, 8
25 8 IF { A - P - Q I 2<*, 18, 10
10 IDIOG = 2 - K
GO TO 26
12 IOIOG = 0
GO TO 26
30 1% IDIOG = -2 * K
IF ( 0 .LT. 0. 1 GO TO 20
IF C Q .GT. A » GO TO 2<*
GO TO 26
16 IDIOG = -<* * K
35 IF I P .GT. A ) GO TO 22
GO TO 26
18 IOIOG = -6 «• K
GO TO 26
20 IOIOG = -8 * K
*»0 GO TO 26
22 IOIOG = -10 * K
GO TO 26
2
-------
REFERENCES
1. Abramowitz, M. and Stegun, I.A. (1964), Handbook of
Mathematical Functions, N.B.S. Applied Mathematics
Series No. 55, Superintendent of Documents, U.S.G.P.O.,
Washington, D.C., xiv + 1046 pp.
2. Albrecht, J. and L. Collatz (1958), Zur numerischen
Auswertung mehrdimensionaler Integrale, Zeitschrift
fur Angewandte Mathematik und Mechanik, Vol. 38, pp. 1-15.
3. Anderson, T.W. (1958), An Introduction to Multivariate
Statistical Analysis, John Wiley & Sons, Inc., New
York, xii + 374 pp.
4. Bartlett, M.S. (1951), "The effect of standardization
on an approximation in factor analysis", Biometrika,
Vol. 38, pp. 337-344.
5. Beard, R.E. (1947), Some Notes on Approximate Product-
Integration, Institute of Actuaries, London, Vol. 73,
pp. 356-416.
6. Bellman, Richard (1960), Introduction to Matrix Analysis,
McGraw-Hill Book Co., Inc., New York, xx + 328 pp.
7. Boland, W. Robert and C.S. Duris (1969), Product
Type Quadrature Formulas, Mathematical Report 69-9,
Department of Mathematics, Drexel Institute of Technology,
Philadelphia, PA 19104, 48 pp.
8. Boland, W. Robert and C.S. Duris (1971), Product
Type Quadrature Formulas, B.I.T., Vol. 11, pp. 139-158.
9. Boland, W. Robert (1972), The Convergence of Product-type
Quadrature Formulas, SIAM J. Numerical Analysis, Vol. 9,
pp. 6-13.
10. Brooks, C.E.P., Durst, C.S., Carruthers, N., and Dewar, D.
(1950a), "Upper Winds over the World", Geophysical
Memoirs, No. 85, Meteorological Office, Air Ministry,
MO499e, London, 57 pp.
11. Brooks, C.E.P., Durst, C.S., and Carruthers, N. (1946),
"Upper Winds over the World, Part 1, The frequency dis-
tribution of winds at a point in the free air", Quarterly
Journal of the Royal Meteorological Society, London,
Vol. 72, pp. 55-73.
194
-------
12. Buell, C. Eugene (1972) , "Integral Equation Representation
for 'Factor Analysis'", Journal of the Atmospheric Sciences,
Vol. 28, No. 8, pp. 1502-1505.
13. Ceschino, F. and M. Letin (1972), Generalisation de la
methode de Havie au calcul des integrales multiples,
C.R. Acad. Sc. Paris, Series A, Vol. 275, pp. 505-508
14. Craddock, J.M. and Flintoff, S. (1970), "Eigenvector
Representation of Northern Hemisphere Fields", Quarterly
Journal of the Royal Meteorological Society, Vol. 96,
pp. 124-129.
15. Davis, Philip J. and Philip Rabinowitz (1967), Numerical
Integration, Blaisdell Pub. Co., Waltham, Mass, 225 pp.
16. Dixon, Valerie A. (1973), Numerical Quadrature, A
Survey of Available Algorithms, National Physics
Laboratory, Division of Numerical Analysis and Computing,
Oxford, NPL Report NAC36, 28 pp.
17. Eimutis, E.G., and Konicek, M.G. (1972), "Derivations
of continuous functions for the lateral and vertical
atmospheric dispersion coefficients", Atmospheric
Environment, Vol. 6, pp. 859-863.
18. Ewing, G.M. (1941), On Approximate Cubature, Amer.
Math. Monthly, Vol. 48, pp. 134-136.
19. Farmer, S.A. (1971), "An Investigation of the Results
of Principal Component Analysis of Data Derived from
Random Numbers", The Statistician, Vol. 20, pp. 63-74.
20. Gantmacher, F.R. (1960a), The Theory of Matrices, I,
Chelsea Publishing Co., New York, x + 374 pp.
21. Gantmacher, F.R. (1960b), The Theory of Matrices, II,
Chelsea Publishing Co., New York, ix + 276 pp.
22. Hammer, P.C. and A.W. Wymore (1957), Numerical Evaluation
of Multiple Integrals, I., Mathematics of Computation,
Vol. 11, pp. 59-67.
23. Hammer, P.C. and A.H. Stroud (1958), Numerical Evaluation
of Multiple Integrals, II, Mathematics of Computation,
Vol. 12, pp. 272-280.
24. Horst, Paul (1965), Factor Analysis of Data Matrices,
Holt, Rinehart and Winston, Inc., New York, xix + 730 pp.
25. Jöreskog, K.G. (1962), "On the statistical treatment
of residuals in factor analysis", Psychometrika, Vol. 27,
pp. 335-354.
195
-------
26. Kenney, J.F. and Keeping, E.S. (1951), Mathematics of
Statistics, Part Two, D. Van Nostrand Co., Inc., New
York, xii + 429 pp.
27. Lawley, D.N. (1956), "Tests of significance for the
latent roots of covariance and correlation matrices,
Biometrika, Vol. 43, pp. 128-136.
28. Lawley, D.N. and Maxwell, A.E. (1963), Factor Analysis
as a Statistical Method, Butterworths, London, vii + 113 pp.
29. Lawley, D.N. and Maxwell, A.E. (1971), Factor Analysis
as a Statistical Method, Butterworths, London, viii + 153 pp.
30. Levy, Paul (1965), Processus Stochastiques et Mouvement
Brownien, Gauthier-Villars et Co., Paris, vi + 438 pp.
31. MacDuffee, C.C. (1949), Vectors and Matrices, The
Mathematical Association of America, Providence, R.I.,
xi + 203 pp.
32. Mehta, M.L. (1967), Random Matrices and the Statistical
Theory of Energy Levels, Academic Press, New York,
x + 259 pp.
33. Mises, R. von (1936), Formules de cubature, Revue
Mathematique de l'Union Interbalkanique, Vol. 1, pp. 2-27.
34. Mises, R. von (1954), Numerische Berechnung mehrdimensionaler
Integrale, Zeitschrift fur Angewandte Math. und Mech.,
Vol. 34, pp. 201-210.
35. Panchev, S. (1971), Random Functions and Turbulence,
Pergamon Press, Oxford, xiii + 444 pp.
36. Perlis, S. (1952), Theory of Matrices, Addison-Wesley
Press, Inc., Cambridge, Mass., xiv + 237 pp.
37. Ruff, Ronald E (1973a), Memorandum of January 10 with
attachments.
38. Ruff, Ronald E (1973b), Memorandum of 23 January with
attachments.
39. Slepian, David (1962), "The one sided barrier problem
for Gaussian Noise", The Bell System Technical Journal,
Vol. 41, March 1962, pp. 463-501.
196
-------
40. Stroud, A.H. (1971), Approximate Calculation of Multiple
Integrals, Prentice-Hall, Inc., Englewood Cliffs, N.J.,
418 pp.
41. Stroud, A.H. (1960a), Quadrature Methods for Functions
of More than one Variable, Annals of the New York
Academy of Sciences, Vol. 86, pp. 776-791.
42. Stroud, A.H. (1960b), Numerical Integration Formulas
of Degree Two, Mathematics of Computation, Vol. 14,
pp. 21-26.
43. Spearman, Carl (1904), "General intelligence objectively
determined and measured", Amer. J. Psychol., Vol. 15,
pp. 201-293.
44. Synge, J.L. (1953), Simple bounding formulas for integrals,
Canadian Journal of Mathematics, Vol. 5, pp. 46-52.
45. Thacher, H.C. (1957), Optimum Quadrature Formulas in
8 Dimensions, Mathematical Tables and Other Aids to
Computation, Vol. 11, pp. 189-194.
46. Turnbull, H.W. (1960), The Theory of Determinants,
Matrices and Invariants, Dover Publications, Inc., New
York, xviii + 374 pp. (From the second edition of
the original, 1945).
47. Turner, D. Bruce (1970), Workbook of Atmospheric
Dispersion Estimates, Air Resources Field Research
Office, Environmental Science Services Administration,
U.S. Department of Health, Education and Welfare,
National Air Pollution Central Administration, Cin-
cinnati, Ohio, vii + 84 pp.
48. Tyler, G.W. (1953), Numerical Integration of Functions
of Several Variables, Canadian Journal of Mathematics,
Vol. 5, pp. 393-412.
49. Wiener, N. (1949), Extrapolation, Interpolation, and
Smoothing of Stationary Time Series, The Technology
Press and John Wiley & Sons, New York, ix + 163 pp.
197
-------
APPENDIX A
Conditions on an Empirical Formula for a Correlation
Coefficient
Before adopting the use of the proper function/proper
value expansion of the correlation coefficients as described
in Chapter III, it was felt that an empirical formula could
be developed that would represent the correlation coefficients
involved. This effort was unsuccessful. At least a part
of the reasons for this lie in the following analysis of the
structure of correlation coefficients or covariances.
An empirical formula to represent a correlation co-
efficient field may not be chosen in a perfectly arbitrary
way even though the expression leads to values that are con-
fined to the range (+1,-1) and has a maximum of +1 at zero
separation between the points concerned. There are several side
conditions that the functions must satisfy. They are stated
here in terms of only one independent variable to keep the
notation reasonably simple, but may be extended to two
or three independent variables with no difficulty. We
consider the correlation of a field property, p, at two
points, x1 and x2, or at x and x+ξ.  Thus the correlation
coefficient concerned will be written as r(x2,x1), r(x+ξ,x),
or as r(ξ;x), all of which will be considered as equivalent.
The usual conditions that must be satisfied are:
1. The correlation coefficient takes on the value +1
when x1 = x2 or at ξ = 0; i.e., r(x1,x1)=+1, r(x+0,x)=+1,
r(0;x)=+1.  On the other hand it is not necessary that the
correlation coefficient be continuous at this point.  It
may have a jump discontinuity here.  In other words, it may
be that the limit, for ξ → 0, may be less than +1,

    lim r(ξ;x) = a < 1.
    ξ→0
A-l
-------
2. The value of the correlation coefficient must be
within the range (+1,-1).
3. It must satisfy the symmetry condition r(x2,x1) =
r(x1,x2).  This may also be expressed as r(x+ξ,x) = r(x,x+ξ)
and also as r(ξ;x) = r(-ξ;x+ξ).  This condition is occasionally
overlooked.
4. It must be a positive definite (or at least a non-
negative definite) function. In the case of an homogeneous
field, this implies that its Fourier transform be positive
(or non-negative).
Elementary statistics texts do not usually discuss these
points in detail. The reader is referred to Cramer and Lead-
better (1967), Panchev (1971), and others. Slepian (1962)
is particularly interesting though it deals principally with
another subject. Levy (1965) is especially good.
In constructing an empirical formula for a correlation
coefficient, one is tempted to use an expression in the form
r(ξ;x) = r(ξ; α(x), β(x), ...) where α(x), β(x) are parameters
which in turn are functions of the location of the point x.
To understand what goes on, consider a simple series
expansion of the correlation coefficient function,

    r(ξ;x) = 1 + A1(x)ξ + A2(x)ξ² + A3(x)ξ³ + ...
A-2
-------
where P1 has coordinate x and P2 is at x+ξ.  The coefficient
A2(x) is negative.  If we reverse the roles of P1 and P2,
then P1 will have coordinate (x+ξ)-ξ and P2 will be x+ξ,
so that one has r(ξ,x) = r(-ξ,x+ξ) for the equality r(P2,P1) =
r(P1,P2).  Then one must have equality of the two series

    1 + A1(x)ξ + A2(x)ξ² + A3(x)ξ³ + ...
        = 1 - A1(x+ξ)ξ + A2(x+ξ)ξ² - A3(x+ξ)ξ³ + ...
^* O ^ -J
If the coefficients also have valid power series expansion
about x (or ξ=0), we expand the coefficients on the right and
collect terms in ascending powers of ξ to obtain

    1 + A1ξ + A2ξ² + A3ξ³ + A4ξ⁴ + ...
        = 1 - A1ξ + (A2 - A1')ξ² + (A2' - A3 - A1''/2!)ξ³
            + (A4 - A3' + A2''/2! - A1'''/3!)ξ⁴ + ...

Equating coefficients of like powers of ξ on each side, then

    A1 = -A1
    A3 = -A3 + A2' - A1''/2!
    A5 = -A5 + A4' - A3''/2! + A2'''/3! - A1''''/4!
It is readily seen that these lead to a system of
equations for the coefficients of the odd order terms in
terms of the derivatives of the even ordered terms, and that
A-3
-------
they give no information on the even order terms.  If we
write

    A_{2n-1} = a1 A'_{2n-2} + a2 A'''_{2n-4} + a3 A^(5)_{2n-6} + ...

the coefficients a_j are given by the system of equations

    2a1 = 1/1!
    a1/2! + 2a2 = 1/3!
    a1/4! + a2/2! + 2a3 = 1/5!
    a1/6! + a2/4! + a3/2! + 2a4 = 1/7!
    . . . . .
    a1/(2n-2)! + a2/(2n-4)! + ... + a_{n-1}/2! + 2a_n = 1/(2n-1)!

which are easily solved recursively; the first few solutions
are a1 = 1/2!, a2 = -1/4!, a3 = 3/6!, a4 = -17/8!, a5 = 155/10!, etc.
The general expression for the solution is not immediately
obvious.
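The recursion is, however, easy to evaluate numerically.  The short program
below is purely illustrative; it solves the system above for the first five
coefficients and reproduces the values just quoted.

C     SOLVES 2*A(N) = 1/(2N-1)FACTORIAL - SUM OF A(K)/(2N-2K)FACTORIAL,
C     K = 1, ..., N-1, AND PRINTS A(1) THROUGH A(5).
      PROGRAM ACOEF
      DIMENSION A(5)
      DO 20 N = 1, 5
      S = 0.
      IF ( N .EQ. 1 ) GO TO 15
      N1 = N - 1
      DO 10 K = 1, N1
   10 S = S + A(K) / FACT( 2*N - 2*K )
   15 A(N) = .5 * ( 1. / FACT( 2*N - 1 ) - S )
   20 PRINT 100, N, A(N)
  100 FORMAT ( ' A(', I1, ') =', E15.7 )
      END
      FUNCTION FACT( M )
      FACT = 1.
      IF ( M .LT. 2 ) RETURN
      DO 10 K = 2, M
   10 FACT = FACT * FLOAT( K )
      RETURN
      END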
A-4
-------
A stochastic field of a property is said to be homogeneous
(in the extended sense) if its statistical parameters are
independent of the location of the point P1 (here, x) with
which the points P2 (here, x+ξ) are correlated.  This means
that the correlation coefficient r(ξ;x) is a function of ξ
alone, r(ξ), and (of course) the standard deviations are
also constant.
The fact that the correlation coefficient for a non-
homogeneous process contains odd order terms in its series
expansion does not imply that a function form with odd order
terms necessarily represents a non-homogeneous process.
For example, nearly all simple empirical expressions for a
correlation coefficient are applicable only to an homogeneous
process.  The very popular expression r = exp(-|ξ|/L), L =
scale parameter, can only apply to an homogeneous process.
Thus, if we assume the contrary and let r(x2,x1) = exp(-|x2-x1|/L(x1)),
then r(x1,x2) = exp(-|x1-x2|/L(x2)), and since these correlation
coefficients are equal, it follows at once that L(x1) = L(x2),
which means that L is a constant and hence that the process
is homogeneous.
The same kind of analysis may be made for the two point
covariance functions. It is not necessary that the covariance
function have a maximum at the point P1 = P2.  In the case
of a one-dimensional function one may use
    cov(P2,P1) = cov(ξ;x) = C0(x) + C1(x)ξ + C2(x)ξ² + ...

and

A-5
-------

    cov(P1,P2) = cov(-ξ;x+ξ) = C0(x+ξ) - C1(x+ξ)ξ + C2(x+ξ)ξ²
                 - C3(x+ξ)ξ³ + C4(x+ξ)ξ⁴ - ...

Now rearranging in terms of powers of ξ,

    cov(P1,P2) = C0 + (C0' - C1)ξ + (C0''/2! - C1' + C2)ξ²
                 + (C0'''/3! - C1''/2! + C2' - C3)ξ³
                 + (C0''''/4! - C1'''/3! + C2''/2! - C3' + C4)ξ⁴ + ...
Equating coefficients in the first and last of these expressions
since cov(P.,,P, ) = cov(P, ,P-) one obtains the system of
relations
    C0 = C0
    C1 = -C1 + C0'
    C2 = C2 - C1' + C0''/2!
    C3 = -C3 + C2' - C1''/2! + C0'''/3!
    C4 = C4 - C3' + C2''/2! - C1'''/3! + C0''''/4!
A-6
-------
which lead to expressions for the odd order terms as
    2C1 = C0'
    2C3 = C2' - C1''/2! + C0'''/3!
    2C5 = C4' - C3''/2! + C2'''/3! - C1''''/4! + C0^(5)/5!
This system of equations is the same as those obtained for
the terms in the expansion of the correlation coefficient
with the exception that one starts here with C1 while before
one started with A2, and we have rather general values for
C0 and C1, while in the previous example A1 = 0 and for any
correlation coefficient A0 = 1.
The point of these exercises is to emphasize the fact
that care must be exercised in selecting an empirical formula
to represent a correlation coefficient function. Otherwise
one may have an expression that cannot represent a correlation
coefficient function.  This is particularly the case when the
field concerned is not homogeneous.
A-7
-------
APPENDIX B
NOTE ON THE RANK OF A COVARIANCE MATRIX
Let A be the covariance matrix and let X be a data
matrix in which the element x_ij is the departure from the
mean of the pollutant concentration at station i on day j.
Let there be n stations (i = 1, ..., n) and d reporting days
(j = 1, ..., d).  Then the covariance matrix may be written
in the form

    A = XX'/d,     X' = transpose of X,

so that the element a_ij is the mean sum of products
and is the covariance of pollutant concentration at stations
i and j.  The matrix A is n x n while X is n x d and X' is d x n.
It is a well known theorem that the rank of a matrix
cannot exceed the smaller of the number of rows and the
number of columns.  Thus, the rank of X (and of X') is the
smaller of d and n.  Also, the rank of a matrix product
is no greater than that of either factor (Perlis (1952),
Ex. 7, p. 58). (The division by the scalar, d, to obtain A
does not affect the rank.)
As a result, the rank of the covariance matrix A, which
is always nxn, cannot exceed the smaller of n and d. Thus,
if there are 40 locations, but only one day of observations,
the rank of A would be 1. If there are 27 days of observations,
the rank of A would not exceed 27. If there are 59 days
of observations, the rank of A would not exceed 40. If
there were 1241 days of observations, the rank of A would
not exceed 40.
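The bound is easily exhibited numerically.  In the illustration below the
departure values are arbitrary; with n = 3 stations and d = 2 days the 3 x 3
matrix A = XX'/d is necessarily singular, so its determinant vanishes except
for roundoff.

C     A = X*TRANSPOSE(X)/D FOR N = 3 STATIONS AND D = 2 DAYS; THE
C     DETERMINANT OF THE 3 X 3 MATRIX A IS ZERO ( UP TO ROUNDOFF ),
C     SO THE RANK OF A IS AT MOST 2.
      PROGRAM RNKDEM
      DIMENSION X(3,2), A(3,3)
      DATA X / 1., -2., 0.5,  3., 1., -1.5 /
      DO 10 I = 1, 3
      DO 10 J = 1, 3
      A(I,J) = 0.
      DO 10 K = 1, 2
   10 A(I,J) = A(I,J) + X(I,K) * X(J,K) / 2.
      DET = A(1,1) * ( A(2,2)*A(3,3) - A(2,3)*A(3,2) )
     1    - A(1,2) * ( A(2,1)*A(3,3) - A(2,3)*A(3,1) )
     2    + A(1,3) * ( A(2,1)*A(3,2) - A(2,2)*A(3,1) )
      PRINT 100, DET
  100 FORMAT ( ' DETERMINANT OF A =', E15.7 )
      END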
B-l
-------
TECHNICAL REPORT DATA
(Please read Instructions on the reverse before completing)
1 REPORT NO.
EPA-650/4-75-005
3. RECIPIENT'S ACCESSION NO.
4. TITLE AND SUBTITLE
OBJECTIVE PROCEDURES FOR OPTIMUM LOCATION OF AIR
POLLUTION OBSERVATION STATIONS
5. REPORT DATE
June 1975
6. PERFORMING ORGANIZATION CODE
7. AUTHOR(S)
8. PERFORMING ORGANIZATION REPORT NO.
C. Eugene Buell
9. PERFORMING ORGANIZATION NAME AND ADDRESS
Kaman Sciences Corporation
P. 0. Box 7463
Colorado Springs, Colorado 80933
10. PROGRAM ELEMENT NO.
1AA009
11. CONTRACT/GRANT NO.
EPA-68-02-0699
12. SPONSORING AGENCY NAME AND ADDRESS
13. TYPE OF REPORT AND PERIOD COVERED
Environmental Sciences Research Laboratory
Office of Research and Development
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
Final
14. SPONSORING AGENCY CODE
EPA-ORD
15. SUPPLEMENTARY NOTES
16. ABSTRACT
This document is concerned with developing linear regression techniques for
interpolation of air pollutant concentrations over an area and, using these
techniques in a computer program for determining the optimum location of air
pollution observing stations. The general interpolation problem is surveyed and
the advantages of using linear regression formulas as interpolation formulas are
discussed. The case of observations containing errors of observation or effects
of limited range of influence is emphasized. Since the use of linear regression
methods depends on knowledge of the two-point correlation function for pollutant
concentration measures, the construction of correlation coefficients from
synthetic data is taken up. Attention is given to the estimation of residual
variances or the effects of limited range of influence, using Factor Analysis.
In extending these methods to a continuous formulation in integral equation form,
the lack of accuracy in the integral equation solution is more important than the
statistical significance of the data unless the residual variances are removed.
If this is done, then the tests for accuracy and statistical significance are
reconciled. If the user carefully handles the residual variances in constructing
program input, difficulties encountered in code development are avoidable.
KEY WORDS AND DOCUMENT ANALYSIS
DESCRIPTORS
b.IDENTIFIERS/OPEN ENDED TERMS C. COSATI Field/Group
Air pollution
Linear regression
Interpolation
Factor analysis
Optimization
13B
12A
18. DISTRIBUTION STATEMENT
RELEASE TO PUBLIC
19. SECURITY CLASS (This Report)
UNCLASSIFIED
21. NO. OF PAGES
215
20. SECURITY CLASS (Thispage)
UNCLASSIFIED
22. PRICE
EPA Form 2220-1 (9-73)
-------