&EPA
United States
Environmental Protection
Agency
February 1986
A Primer on Kriging
-------
A PRIMER ON KRIGING
Prepared for
The Statistical Policy Branch
Office of Standards and Regulations
U. S. Environmental Protection Agency
Washington, D. C. 20460
By
Robert W. Jernigan
Department of Mathematics
Statistics and Computer Science
The American University
Washington, D. C. 20016
-------
ABSTRACT
Kriging is the name given to least squares prediction of spatial processes. It is one of
the defining tools of the geostatistician. It has been called surface fitting, trend surface
analysis, contouring sparse spatial data, and a generalization of spline interpolation. But
it comes down to being best linear unbiased estimation of a stochastic process by
generalized l^ast squares. We will examine the assumptions, tools, and methods involved
in kriging, and present several examples of its use. Kriging has applications in mapping
spatial environmental data, including, toxic wastes, acid rain, groundwater pollution, etc.
11
-------
PREFACE
This paper presents the essence of a statistical technique for the prediction and mapping
of spatially variable data, known as kriging. This work has grown out of an attempt to
consolidate results, methods, and applications of kriging that are widely scattered in
many journals of application. The presentation is intended for statisticians familiar with
the workings of least squares estimation through the use of matrix algebra. The results
that follow try to relate the language and techniques of kriging to more standard
statistical terms and methods. It is hoped this document will serve to expand the
audience for these procedures by providing a thorough presentation of statistical ideas
involved. Many sections are quite technical, but the essential approach can often be
garnered from just the text.
111
-------
Table of Contents
1.0...Introduction 1
2.0...Landfi11 Example 5
3.0...Definitions and Fundamental Tools 10
3.1...Covariance Function 10
3.2...Simplyfing Assumptions 12
3.2.1 Isotropy 12
3.2.2 Intrinsic Hypothesis 16
3.3...Properties of the Covariance 16
3.4 Nugget Effect 17
3.5...Variogram Models IS
3.6...Bounds on Variograms 21
3.7...Hole Effects 24
4.0...Estimation of the Variogram 26
4.1... Estimation of the Variogram 26
4.2...Robust Estimation 28
4.3...Graphical Estimation Techniques 29
4.4...Numerical Estimation Procedures 33
4.5...Small Data Set Variogram Estimation 35
5.0...Estimation by Kriging 4O
5.1...Simple Kriging 4O
5.2...Example from Time Series 42
5.3...Local vs. Global Estimation 44
5.4...Kriging with a Non—zero Mean 46
5.5...Ex amp1e 47
6.O...Universal Kriging (U.K.) 49
6.1...Universal Kriging 49
6.2...Example 52
6.3...Circular Estimation with U.K. 56
7.0...Generalized Covariances 57
8.0...A Robust Trend Technique 63
9. 0. ..Kriging and Splines 64
1O.O..Kriging the Landfill pH Data 66
Appendix
Simulation of Isotropic Processes 7O
References 81
-------
1.0 Introduction
The pattern and extent of many environmental variables are o-ften
examined by the mapping of spatial data. The set of contours of equal
temperature or pressure on weather maps is a common example. Other
diverse examples include: the mapping of mineral deposits by geologists,
species ranges by biologists, stellar radiation emissions by
astronomers, and locations of cancerous growths by physicians. Examples
common to pollution monitoring are contouring of groundwater pollution
from toxic waste dumps or landfills, mapping the impact of acid rain on
forest and aquatic life, spatial studies of air pollution, and regional
maps of birth defects to locate toxic sites. All of these examples are
involved with spatial relationships.
A common problem encountered in spatial analysis is the estimation of
the surface that is generated by a set of spatial data. This surface can
be helpful in finding explanations of spatial patterns or in predicting
data measurements in regions that were previously inaccessible or
unobserved (interpolation).
The problem of estimation of a spatial process is often called trend
surface analysis. Many techniques, both statistical and numerical, have
been applied to this problem of fitting a surface. Regression or
response surface analysis is one such technique. It typically estimates
-------
an average underlying surface with individual data points deviating from
a surface of best overall fit. Smoothing techniques, for example by
moving averages, are another way of producing a local average surface
value. Many interpolation schemes from numerical analysis have also been
applied to estimate a surface. Fourier or polynomial interpolation
produce surfaces that exactly interpolate the data, but unfortunately,
they can produce quite erratic behavior. Osculatory and some other
piecewise polynomial techniques also produce exact interpolations, but
they sometimes have no sound statistical background. Interpolation by
cubic splines, a related technique, does have a close tie to the
prediction of stochastic processes. In this paper we will consider a
technique that combines the statistical properties of regression and
interpolating properties of splines. It is called kriging.
Kriging is the name given to the least squares prediction of spatial
processes. It is a form of surface fitting using regression techniques
that include spline interpolation as a special case. In contrast to
smoothing or simple polynomial regression techniques it does not produce
a surface of overall general fit. Rather it produces a surface that is
"tied down" to interpolate the data points exactly and to estimate
optimally the process between the data points. Statistically, kriging is
best linear unbiased estimation of a spatial stochastic process using
generalized least squares. As such, as Watson<1969) has pointed out its
novelty does not lie in statistics. Its origins and development are
found in geology.
Kriging was developed by "geostatisticians", and in many ways it is a
technique that defines the term "geostatistics." A geostatistician or a
-------
geostatistical approach uses kriging. The name comes from D.R. Krige, a
mining engineer in South Africa, who in 1951 first applied the technique
to the estimation of gold deposits. The mathematical theory associated
with kriging was advanced by G. Matheron in the early 196O's at the
Fountainbleau Mining School in France, under the heading "Regionalized
Variables."
Kriging uses essentially the same techniques to model and predict
spatial variation that are used to model and predict temporal variation
in time series analysis. Although many of the techniques for time series
analysis can also be applied to spatial data, many new and different
problems remain. In time series analysis we consider past events,
resulting in dependencies that extend only in one direction. In . spatial
analysis, even if on a straight line in the plane, dependencies can
extend in two directions at once.
Our aim in this paper is to summarize the essential theory behind
kriging. We will examine its fundamental definitions, assumptions, and
tools. We will be most interested in considering how useful the
technique is in small sample estimation of environmental data. Some of
the simplifying assumptions used in kriging are not always reasonable in
many environmental settings. Without these assumptions a much greater
amount of data is needed. One difficulty is that often economic or other
considerations require that we work with sample sizes that are extremely
small. Much of the traditional work in kriging has been in developing
techniques for use with large data sets. Little previous work has gone
into techniques for use with small data sets. Kriging procedures need
modification and further exploration when used with small amounts of
-------
data. We will examine some approaches to this problem.
We will present several examples to show the essence of kriging. Some
examples connect kriging and time series prediction. Other examples
illustrate some of the unique problems of spatial estimation in the
presence of a trend. We also have a particular environmental example
that will be carried throughout the paper to illustrate various
techniques.
-------
2.0 LANDFILL EXAMPLE
To -Fix ideas on a particular practical problem we will consider data
collected on a land-fill in Dade County(Miami) Florida -for the
Environmental Protection Agency by Technos, Inc.(19S4). The land-fill
covers an area of about one square mile and is located in the Biscayne
Aquifer. This aquifer is relatively homogenous and near the ground
surface. It is tapped by two county well fields about 1.5 miles
down—gradient from the landfill.
It was of interest to determine the effects of the landfill on the
aquifer, and to measure these effects eight wells were drilled over the
neighboring area (Figure 2.1). Figure 2.2 shows measurements of pH taken
from samples of the water at 1O foot depths. We would like to examine
the structure of these measurements and, with the assumptions, tools and
techniques of kr-ging, estimate the pH at the unobserved positions. In
particular, we will compute estimates of pH at all the "x" points in the
grid of Figure 2.3. This appears to be an ambitious goal: to estimate
160 grid points from S data points. We shall see that this estimation is
quite poor in the corners of the map, far from the data, and better in
regions surrounded by the data points. Furthermore, kriging, being an
exact interpolator, will reproduce the pH estimate at well M7A exactly
as its given value of 11.4. This large value of pH may well be
questionable, but kriging will assume all of the data are correct and
produce an interpolating surface.
As mentioned earlier, the practice of kriging, developed mainly in
-------
FIGURE 2.1
Dade County (Miami) Florida Landfill
WASTE
RECOVERY
PLANT
0
M2
SCALE
i?5fl t*.
BORROW PITS
NW 58TH STREET
LANDFILL 0
H
DORAL GOLF COURSE
ROADS
0
M8
0 0
4A MSA MSA
0
MQ
™\J
X
"^0-
^X
J
WELLFIELD/
e ~~
/
M7A /
,
0
M18
CANALS
r
.
,.;;.,
1
I.J.J
L.
•*•*•»
,,.,1
I.J..I
ii|
-------
February 1986
Washington, D. C.
This publication is available free upon request, in single copies. For further information write
to N. Phillip Ross, Chief, Statistical Policy Branch, Office of Policy, Planning, and Evaluation,
U. S. Environmental Protection Agency, 401 M Street, S. W., Washington, D. C. 20460.
-------
FIGURE 2.2
PH levels at 10 foot depths
7.
o
0
7.4
.8 6.7 6.6
o
6.8
o
11.4
7.0
-------
FIGURE 2.3
Grid of Points to Estimate
X )
X )
X >
K )
x M2 )
0
X )
X )
X X Of? X
: x x x x
X X XX
X X XX
; x x xx
H4fi
X X X X °
X X X X
: x x x x
X * X X X X
—
X X X X X
X X X X X
X X XX X
\ A /"» Mft ^
0
< X X X X
MSA N6A
X ° X X° X X
V
X X X X X
X XX X X
u M9
X X X X X
X X
X X
X X
x £
UM
X X
x y
#\x x
X X "%£
>: x x /
\ • \.' •.;.•'/
l\ A .»/
7A /
X X " X
X X X
M16
X X
X X
X X
4* W—
XXX
XXX
XXX
.1 . W .. i W . "ii . . .
00
-------
geological settings, has been with large data sets. Little has been done
with small data sets. The problems encountered with data sets of this
size are considerable, and the confidence in the estimation can be low.
Given our choice we would much prefer to use more data, but this is
often not possible. We are often forced to analyze small samples. Some
approaches to this problem will be presented.
-------
ro
3.0 DEFINITIONS AND FUNDAMENTAL TOOLS
3.1 COVARIANCE FUNCTION
We will begin by considering a stochastic process, that is a collection
o-f random variables Z(x) indexed by a parameter x. The random variable
Z(x) could represent the prices o-f gold and silver at a certain time x.
In this case we would be using time series analysis, where x is
typically a time index with a natural ordering, and Z and VAR(Z(x)) denote the mean and variance, respectively o-f
the random variable associated with the spatial position x. A
fundamental tool in the study of stochastic processes is the covariance
function:
K(x,y> = E[ Z(x) - E(Z(x»][ Z(y) - E(ZCy))]
This function measures the joint variation of the measurements at
spatial positions x and y. For example, if when, on the average, a
measurement Z(x) at spatial position x is above its mean, another
measurement Z(y) at position y is below its mean, then the covariance
-------
11
between the two points would be negative. Positive covariances indicate
joint variability on the same side of the means, on the average. The
magnitude of this covariance is best considered by dividing it by the
standard deviation o-f both Z(x) and Z(y). This would produce a
correlation function, p(x,y). Points with high positive correlation would
tend to stay above(or below) their respective means together, on the
average. Points with high negative correlation would tend to lie with
one above and one below their means.
Calculation of the covariance function requires knowledge of the means
of Z(x) and Z(y), so that, geologists have made special use of another
measure of joint variation that can be calculated without the means.
This function, called the variogram, is the variance of the difference
or increment Z(x>—Z(y), and is written:
2'Y - Z(y) )
^(Xjy) is called the semi—variogram. This function is often called the
serial variation function in older statistical literature see
Jowett(1952,1955,1957), and a special case of this serial variation was
investigated by Von Neuman, et.al.(1941) as the mean square sucessive
difference. It is often used in tests of randomness. The semi—variogram
also measures covariation. Two measurements that tend to be above their
means together would likely have a small amount of variability in their
difference. Measurements that tend not to be correlated would have a
difference that was quite variable. As we will see, the variogram is, in
essence, the negative of the covariance function. Points with large
covariance have a small variogram value, and paints with low covariance
-------
12
have a high variogram value.
Several simplifying assumptions help us to model and predict spatial
behavior. If the covariance function of a stochastic process depends
only on the positions of two points relative to each other and not on
their particular fixed locations, we say that the process is stationary.
To be more specific, a stochastic process is said to be stationary of
order two if it has a constant mean and its covariance function K(x,y)
depends only on the difference vector h=x-y and not on the particular x
and y chosen. We could then write K(x,y) as some C(h) or C(| hi ,<*), where
I h| is the length of the vector h (distance between the vectors x and y),
and oc is the direction angle. All points separated by the vector h would
have the same covariance. Figure 3.1.1 shows an example of points
separated by the same distance and pointing in the same direction and
would thus have the same covariance for a stationary or homogenous
process. This type of covariance might be applicable to the flow of
water down a gradient, as illustrated in the next section.
3.2 SIMPLIFYING ASSUMPTIONS
3.2.1 ISOTROPY
Many of the equations developed in kriging are simplified greatly "if we
assume that the process has the same covariance structure in every
direction. Such a stochastic process is called homogenous and isotropic,
i.e., its probability structure does not change with a change in spatial
position or a rotation about a spatial position. Isotropy indicates some
uniformity in every direction. For example, at midday in a rural
-------
Fi 9 ure 3.1.1 P o i nts a long a grad i e nt where the
co^ariance might depend'on distance and direotii
U)
-------
14
neighborhood -the sky might be uniformly the same shade o-F blue at any
point just above the horizon. If we consider an urban neighborhood close
to an industrial factory, this uniform blue shade might turn
considerably darker when we look towards the factory's smokestacks. This
lack of uniformity is called anisotropy.
Isotropy is of great use with small data sets where it is difficult to
model and estimate covariances or variograms that depend on both the
distance and direction between two points in space. Much more data can
be combined if we assume that the covariance between two points depends
only on the distance between the points. The assumption of istropy is
not required to estimate the process with kriging; it serves to simplify
the equations and calculations. In fact, in many applications such an
assumption would not be realistic. Figure 3.2.1 shows two directions of
measurement of a flow of some pollutant down a stream. Also displayed
are typical sample variograms for the downstream direction (D) and the
across stream direction (A). Notice that the measurements across the
t
flow of the pollutant exhibit a much greater variability at short
distances than the measurements down the stream. The downstream
measurements are much more nearly alike and therefore have much less
variability within the pollutant flow. An assumption of isotropy here
would not be reasonable. The variability in both directions would have
to be taken into account.
The form of the covariance function for a homogeneous, isotropic process
is quite restrictive. Ripley<19Sl) has pointed out that in the plane
isotropic correlations are bounded below by -.403. Further bounds relate
isotropic covariances to Bessel functions, see Matern(196O). Unless
-------
VAR1QGRAHS
ACROSS STREAH
DQHNSTREAH
DISTANCE
FIGURE 3.2.1
-------
16
explicitly stated, we will assume that the process we are modeling is
both homogenous and isotropic.
3.2.2 INTRINSIC HYPOTHESIS
Geostatisticians sometimes encounter processes with an increasingly
large capacity for spatial variation, i.e. points at great distance from
one another have an increasingly larger variability. When dealing with
these processes with an unbounded variance, geostatisticians often make
use of another property called the intrinsic property. A stochastic
process is said to be intrinsic if the mean is constant and the
increment or difference Z(Xj+h)—Z(x1> has a finite variance which does
not depend on x-. This is equivalent to the increment being a stationary
stochastic process of order two. This same idea is often encountered in
time series analysis, where a process may exhibit nonstationary
behavior, but its first order difference is stationary. In a sense we
assume that the nonstationary behavior that produces the unbounded
variances can be subtracted out of the process. Thus even if the
covariance function was infinite, the variogram, being the variance of a
difference, would be finite.
3.3 PROPERTIES OF THE COVARIANCE FUNCTION
For a stationary stochastic process with mean p, the covariance function
C(h), depending on a vector h, has the following properties, (see
Journel and Huijbreghts(1978)>:
1) C(h) is positive definite, that is for any linear combination
-------
17
y=a1*Z(x1>+ • • '-»-a *Z(x ) the -function C(h) is such that the variance of y
is always greater than or equal to zero.
2) C(0)=VAR(Z(x)> is nonnegative.
3) C(h)=C(— h) the covariance function is an even function.
4) I C(h)| < C(O) Schwarz's inequality.
Many other properties relating the covariance function to spectral
representations of stochastic processes can be found in Ripley(1981> and
Koopmans(1976).
The variogram is related to the covariance function when the latter
exists by:
•Y< h ) = C(O) - C(h)
Furthermore,
1) 'Y(0)=0, although the limit 'Y(h) as h approaches zero is not necessarily
zero.
2)
The variogram may exist even in the case of a process with infinite
variance (C(0)>, and its calculation does not require knowledge of the
mean. For these reasons it is often favored by geostatisticians for
those processes with "unlimited capacity for variability," see Journel
and Huijbregts(1978).
3.4 NUGGET EFFECT
Dne would expect that as the distance between two points in space x,
and x0 decreases to zero, the variability of the increment
-------
18
ZOO—Ztx^,) would also decrease to zero. This continuity o-f the
variogram might well be expected in a region with smooth uniform
measurements. But the variogram need not be continuous at zero, and this
fact has practical advantages to a geologist. This property is termed
the "nugget effect." Its origins can be due to measurement errors or to
what Journel and Huijbregts(197B) call micro—variabilities. Such
discontinuities are reasonable if, for example, a geologist's core
sample at a given depth has hit a "nugget" of mineral, figure 3.3.1.
Suppose that this sample core has removed a concentrated nugget of
mineral, leaving behind a region perhaps devoid of any other measurable
quantities. No matter how close another sample is taken to the first,
quite different measurements would result. We would then measure
significant nonzero variability of the increments at arbitarily short
distances. Of course the variability between the original core sample
and itself is zero, but the variability between two arbitarily close
points may be quite positive. This discontinuity of the variogram at the
origin helps to rodel both continuous and discrete changes in spatial
processes.
3.5 VARIOGRAM MODELS
Geostatisticians have developed a catalog of forms for isotropic
covariances and variograms. There are eight basic variogram models
usually considered, see Journel and Huijbreghts(1978):
(Here we will use h=| h| since we are only considering isotropic models.)
1) The spherical or Matheron variogram is given by:
-------
FIGURE 3.3.1
\
THE NUGGET EFFECT
Location of core sample
Underground
^Av/AVV^xVv/AlV^
Nugget removed
V
^VVX
\:
My
x Surrounding region
with only residue
-------
20
= C [<3h/2a)-h3/<2a3>] -for h<=a
= C for h>a
The value a is termed the range, a distance beyond which two points in
space are no longer correlated. The value C is termed the sill and is
equal to the variance o-f the process C(0).
2) The exponential model is given by: 'Y(h)=C-(l—exp(—h/a»
3) A somewhat inappropriate name is given to the Gaussian model, with
2 2
the form: (Y(h)=C-(l-exp(-h /a )). These last two models reach their sills
asymptotically.
Variogram models without a sill, useful in modeling data with
increasingly larger spatial variability with increasing distance, are:
4) the linear model: 'Y
-------
21
3) generalized covariance model: a specialized linear combination o-F odd
powers o-F h, used in filtering of trends in kriging. (See section 7.0)
Figures 3.5.1 and 3.5.2 show the relationships between some of the
variogram models. Figure 3.5.1 shows examples of variograms with sills,
and Figure 3.5.2 shows those without sills. Notice the different rates
at which the variograms approach the sill. The more rapid the approach,
the smaller is the covariance between distance points, and the spatial
dependencies remain quite local. The rate of approach and the sill value
are often related to estimate geometrically the parameters of the model.
The exponential model reaches 95/C of its sill at a value of 3a, so the
scale parameter, a, is taken to be one-third of this distance. For the
gaussian model, the scale parameter, a, is taken to be 1/V3 of the
distance of 95% of the sill. For the spherical model the range
corresponds to the intersection of a tangent line at the origin and the
line of the sill. This value is 2/3 of the range, a, so that a is taken
to be 3/2 of this distance. These purely geometric properties are used
extensively by geostatisticians to fit variogram models, see Journel and
Huijbregts(1978).
3.6 BOUNDS ON VARIOGRAMS
Variograms that increase too rapidly are often indicative of a process
with a nonconstant mean. For example, consider a one—dimensional process
of the form:
Z(x) = |l(x) + e )=O.
The variogram for such a homogeneous process would be given by:
-------
FIGURE 3.5.1
URRIQGRflM MODELS UITH SILLS
EXPONENTIAL
D
2 3
H (DISTflNCEl
-------
FIGURE 3.5.2
LINERR RND PDUER FUNCTION UflRIDGRHMS
H 1 H-
CM
o
0
234
H (DISTflNCEl
to
-------
24
2-Y(h) = E( Z - Z(x) >2
= E( e )2 + E( p(x+h> - p )2
The e term is simply the variogram o-F the residual errors. The p term is
the bias due to a drift in the mean. If, for example, the mean is
linear, i.e. ^(x)=x, rather than a constant, then this bias term becomes
2
h , and the variogram would grow quadratically. A quadratic mean would
produce a variogram that grows as the fourth power of h, etc., see
Starks and Fang(1982) for further discussion of the effect of
nonconstant means, as well as Neuman and Jacobson(1984) and Aboufirassi
and Marino(1983).
Matheron<1973> showed for spatial processes the fact that the covariance
is positive definite insures that for a vector h,
lim -YtfD/lhl2 = O
I h| ->oo
That is, -Y
-------
25
when one is below its mean the other is above its mean. This is termed
the hole—e-f-fect. The bounds mentioned earlier for isotropic covariances
limit the size of negative covariances and these hole effects, and
therefore considering only positive covariances is not as restrictive as
it sounds. Variogram models with hole effects are typically
exponentially damped sines and cosines.
The theoretical bounds on isotropic covariances discussed in section 3.4
are often taken as rules for distinguishing hole—effects from random
variability in estimated variograms. This variability in estimated
variograms needs further study. It is clear that a more sound
statistical technique is needed to identify hole—effects.
-------
26
4.1 ESTIMATION OF THE VARIOGRAM
Geostatisticians use the term structural analysis to denote the
estimation, fit, and study of variogram models. Empirical variograms are
calculated from a given set of data and used to fit the various models
presented earlier. The empirical variogram is calculated as:
— -- 2
CZ(x.-t-h) - Z(x-)l
nh i=l 1 x
where n. is the number of pairs of points separated by the distance h.
Figure 4.1 shows a sample variogram calculated from the simulated
istropic process discussed in the appendix. A cubic spline has been fit
to the data, merely to connect smoothly the points for easier viewing.
Notice that the variability increases as the distance between points
increases. This corresponds to a decreasing correlation with increasing
distance. Notice the approach to what appears to be a flat sill value of
about 25, around a distance of about 4 units. Notice also the random
variability as the values fluctuate and do not increase in a strict
monotonic fashion. Whether a strict monotonic variogram should be fit in
these situations, or one with a hole effect (as with the splines)
deserves more study. Most practioneers would dismiss hole effects in
this example.
Some studies of the statistical behavior of the empirical variogram have
been undertaken by Borgman and Davis(1978,1981). They have shown that,
for processes with a finite variance, *Y(h) is asymptotically normally
-------
CD
CO
in
CM
o
CM
FIGURE 4.1 VARIOGRAM CALCULATED FROM SIMULATED DATA
H 1 H 1 1 I-
O
0
H (DISTRNCE)
-------
28
distributed. They have also produced, for one-dimensional normal
processes, tables of the exact sampling distribution of the variogram by
numerical inversion of the characteristic function.
4.2 ROBUST ESTIMATION
Small sample estimation can lead to estimates that are quite erratic.
Estimators that minimize the effects of single influential data points
might prove useful in such problems. Cressie and Hawkins(198O)
investigated several of these, so—called, robust estimators of the
variogram. Their approach was to consider variogram estimation, not as a
problem of variance or scale parameter estimation, but rather as a
o
problem of estimating the mean of the quantities W.=CZ(x-)—Z(x-+h)>
where Z(x.) are normally distributed. Power transformations of Y. to
symmetry and studies of cumulants suggested that
1/2
Y^l Z-Z(xi+h)| is approximately normal with a skewness of -OS and a
kurtosis excess of -.52. Several estimators of the point of symmetry of
the Y. distribution were then considered, including the mean, median,
trimmed means and M—estimators of Huber, Tukey, Hampel, and Andrews (for
studies of robust estimation see Hogg(1974), Huber<1972)). The mean
Y of the Y. is approximately normally distributed, by the central
limit theorem, and through series expansions Cressie and Hawkins found
that for n=n. ,
E[Y4/2'Y
-------
29
Z =.6Z ,+u where the u terms have various distributions with
n rt i f\ r?
progressively heavier tails. Cressie and Hawkins interpreted these
realizations as traverses across a spatial process and considered the
estimation of 'Yd) only. Their findings revealed that the M-estimators of
the mean of Y. are best, but considering its good performance and ease
of calculation, suggested that the variogram be calculated using:
= Y A.457 + .494n * + .O45n 2)
where Y = ^ £| Z(xi+h)-Z(xi>| 1/2 with "=nn
Figure 4.2 shows the Cressie and Hawkins variogram for the simulated
process discussed in the appendix. Notice that for the one distance that
they investigated (h=l) their estimation of 'Y(l) is slightly larger than
the empirical variogram, but for all other distances (h>l) it is
uniformly lower.
In one— dimension Sharp(1982) has examined estimation of variograms with
the maximum entropy method from time series analysis.
The previous techniques have brought much of modern statistical
estimation theory to the study of the variogram, but unfortunately most
techniques used in practice are either grossly subjective or blindly
numerical.
Journel and Huijbregts(1978) present several case studies in which
-------
CO
in
Cxi
(XI
CI5
FIGURE 4.2
EMPIRICnLtTDP] flND CRE5SIE RND HRUKINS UflRIDGRflMS
-f
2
3 4
H miSTRNCE]
I
6
U)
o
-------
FIGURE 4.3 GEOMETRICALLY FITTED VARIOGRAMS
CO
_
CM
Cxi
D 1
7
to
r-J
H [DISTRNCE1
-------
32
simple geometric properties of variograms are used to estimate its
functional form. For example, subjective, graphically chosen values for
the sill and the range are used in the variogram forms listed earlier.
Figure 4.3 shows the results of fitting the spherical, exponential, and
gaussian variogram models to the simulated data presented earlier. The
sill for all of the model was chosen graphically to be 25.25. For the
spherical model the range parameter, a, was calculated, following
Journel and Huijbregts(1978>, as three—halves of the distance (h) value
obtained from the line tangent to the variogram at the origin. This
graphically fit variogram fits the first two values quite well at the
expense of the remaining data. For points from 2 to 4 units apart, this
spherical variogram suggests more variability and therefore less
correlation between these points. The exponential scale parameter was
taken as one-third of the range(taken to be 4 units). The gaussian scale
parameter was taken as 1//3 of the 4 unit range.
The gaussian fit seems to pass more closely to the data values beyond a
distance of 2 units, while suggesting more highly correlateddess
variable) values under 2 units apart. The exponential fits well only
beyond the range. Since local behavior is important for most estimation
methods by kriging, probably the spherical or gaussian models would be
chosen. The expected behavior at the origin of the process being studied
would decide between the two, with higher expected correlation for
extremely close values going in favor of the gaussian model.
Geostatisticians have defended these graphical practices with the
evidence of much empirical work. They argue that estimation of the
variogram is not their primary goal, but only a tool for estimation by
-------
33
kriging, see David(1975). They are also in the unique position of being
able to compare directly their predictions with the true measurement. If
an estimate of the mean grade of a mineral is obtained for a particular
point they can then mine that region and directly measure the mean
grade. This method of direct checking of the estimation has been a
tremendous benefit to the development of the practice of kriging.
Unfortunately, these same checking procedures are impossible to obtain
in estimating, say, acid rain deposits.
4.4 NUMERICAL ESTIMATION PROCEDURES
In the above graphical procedures the choice of suitable sills and
ranges was surely in the eye of the beholder. The biases of different
researchers would produce different variograms and result in different
kriging estimates. To eliminate these biases, many regression techniques
have been used to fit the variogram. Devary and Doctor(1981) used
non—linear least squares to fit the gaussian model to hydrologic field
data. Since variogram estimates at different points are likely to be
correlated, the using of techniques that assume uncbrrelated data is
questionable. If one had knowledge of the correlation between variogram
estimates at different distances, these could be incorporated into a
generalized least squares estimation procedure. Unfortunately, these
correlations are not known. Figure 4.4 shows the results of fitting
variogram models to the simulated data presented earlier using nonlinear
least squares. Notice that each numerical technique seeks its own sill
level in an attempt to obtain the best fit.
Some researchers have tried to improve on these fitting procedures.
-------
FIGURE 4.4 NUMERICALLY FITTED VARIOGRAMS
CD j L 1 1 ] h
in
CD
CM
D
0
2 3
5
6
-H
7
U)
H miSTRCEl
-------
35
Aboufirassi and Marino(1983) determined the range of a spherical model
graphically, but estimated the sill using cross—validation. They began
with a sill value of one, then removed and reestimated(using kriging)
each data point from the spatial data. The resulting estimation variance
can be shown to be 1/C times the "true" estimation variance when the
variogram with the "true" sill value of C is used. For each data point,
the difference between the observed and estimated data values was
divided by individual estimation standard deviations to produce what
were called reduced errors. The sample variance of these reduced errors
was taken as the value of the sill. Sample variability of the variance
estimate was not considered. They further used geometric properties to
fit a gaussian model.
Other techniques used in the kriging of nonstationary processes can be
used to estimate generalized covariances and thus eliminate the need for
direct variogram calculation. We shall see these techniques in section
7.0
4.5 SMALL DATA SET VARIOGRAM ESTIMATION
In small data sets these techniques are difficult to apply. In our
landfill example, not only do we have a small amount of data, but the
data arrangement is extremely unequally spaced. For a given distance h
there may be at most one (n. =1) pair of points separated by h. In larger
data sets we could increase the number of terms (n.) by grouping
together terms in neighborhoods of various distances. The average of
these terms would then be assigned to the average distance of the point
pairs. In large data sets this averaging procedure has been investigated
-------
in
CM
CM
it:
I.—V
n
FIGURE 4.5.1
i
INDIVIDUAL VARIOGRAM TERMS
.__ i i i
r ~i • ~ • r
D
n
n o
FOR pH
D
10
?0
n
3D
1Q
BO
U)
cr»
H
-------
37
by Myers, et.al.(1982).
To illustrate this technique, Figure 4.5.1 shows the values of W. =
o
•tZ(x.) — Z(x-+. )>"*" for all 28 distances from our original 8 pH
measurements from the landfill example. Notice the large variability in
these terms. Figure 4.5.2 dispays the results of averaging disjoint
collections of about 6 of the W terms. A gaussian variogram has been
fitted to the resulting 6 data points using nonlinear least squares.
With such a small data set there is a drastic reduction in the useable
data to estimate the variogram. Furthermore the resulting variance, as
given by the sill, is more than a factor of two greater than the simple
variance of the pH measurements.
This is one technique that can be used to obtain an estimated variogram
from such a small set of data. Some other approaches that have proved
useful to some degree are suggested below.
1) A scatterplot smoothing of the W terms with respect to' h could be
applied. This has been done with the robust smoother, LOWESS, which
stands for LOcally WEighted Scatterplot Smoother. This smoothing
technique was developed by Cleveland(1978). It is essentially an
iterative regression prediction weighted most heavily by neighboring
points and least heavily, by points with large residuals. The general
results are a smoothed estimation of the variogram from the (h,W)
scatterplot that shows a linear increase for small h, followed by a
relatively flat range for middle—sized h, ending in a slight downward
turn for large h. The degree to which this smoothing could estimate the
true underlying variogram needs further investigation. A simulation of
-------
CD
FIGURE 4.5.2
GRUSSIRN UflRIQGRflM NDNLIN REG FIT TD PH
H 1 1 1 H——h
GO
3EI
cc
CE
CD
1Q
IB
20 25 30
DISTflNCE
35
45 50
U)
00
-------
39
this estimation procedure with small samples and a known parent
variogram is needed.
2) Other robust regression techniques could also be applied to the (h,W)
scatterplot. A discussion o-f many iteratively re—weighted least square
techniques that can yield a variety o-f -fitting criteria is presented in
Hosteller and Tukey(1979).
3) A resampling plan such as the jackknife or the bootstrap could be
used to obtain estimates o-f the variability of any o-f these estimation
schemes. The resulting variance could be used to place confidence
intervals on the variograms and in turn introduce the estimation error
into the kriging estimates.
-------
40
5.0 ESTIMATION BY KRIBINB
5.1 SIMPLE KRIBING
Kriging is the name given by Matheron(1965) to the calculation of best
linear unbiased estimators of various functions of spatial processes. It
is the time series prediction theory of Wiener<1949) and
Kolmogorov<1941) applied to spatial data, and as such is estimation in
the presence of correlated errors. The basic statistical technique is
widely used by statisticians as generalized least squares. The
application of this technique has been studied extensively as a time
series prediction problem by Srenander and Rosenblatt(1957),
Koopmans(1974), and Box and Jenkins(1976), to name only three.
Consider a spatial process of the form
Z(x) = p(x) + e
2) ^l(x)=unknown constant
3) [l(x)=unknown polynomial in space.
-------
41
For observed values Z(x~),...,Z(x ) we wish to estimate(interpolate)
the value of the process at an unobserved point x by a linear
combination o-f the observed values, i.e.
o,
Z(x> =
The coefficients A- are chosen so that Z(x> is:
1) unbiased, E[Z(x>] = Z(x>, and
2
2) mininum variance, ) -
where
-------
42
where K is the covariance matrix Her. -3, x=' and
This system has a solution provided that K is positive definite. The
minimal variance is given by:
= Var = 0, consider the following time
-------
43
series prediction problem. Let Z(x) = e(x) + ,5e(x—1) be a
one—dimensional first order moving average time series. The e(x> terms
are taken to be independent, normally distributed error terms with a
mean of zero and a variance of one. The covariance of this process is
given by:
5/4 for h=0
Cov(Zx,Zx+h) = 1/2 for h=l
0 for h>2
Consider a sample Z,,...,Z- of size n=4 from this process. The
covariance matrix K for this sample is given by:
I 5/4 1/2 0 O I
I 1/2 5/4 1/2 01
I O 1/2 5/4 1/2 |
I 0 0 1/2 5/4 |
Suppose that we wish to predict Zg. The covariance vector
(o-j ,,...,'=<-.047, .117, -.246, .498)'. The estimate
of Zg then becomes Zg = -.O^721 + .117Z2 - .246Z3 + .498Z4. This
estimate has a mean of zero, since each Z^ has a mean of zero.
-------
44
This is kriging and it is said that we have just "kriged" an esimate of
Z5-
Note the relationship between Box and Jenkins time series analysis
prediction of Z^ and kriging, Box and Jenkins (1976). They assume an
infinite past of available data and estimate Z— with weights (.5, -.25,
.125, -.0625, ... ,(-.5)j+1,...)'. Our kriging weights of (.498, -.246,
.117, -.047)' are close and differ due to the fact that we are using
only four data points to predict Zg.
5.3 LOCAL vs. GLOBAL ESTIMATION
The above approach assumes the use of all of the data, and since K is an
nxn matrix, large samples can severely limit the practical use of this
technique. Several approaches have been taken to limit the size of the
matrices involved and thus minimize the calculation. Journel and
Huijbregts(1978) and many others suggest using local estimation by
considering only data points close to the estimation point x. If the
covariance between nearby points decreases rapidly this technique can
limit the size of the kriging system of equations. For isotropic
processes the matrix K is completely determined by the relative geometry
of this local neighborhood. If this geometry remains constant, say by
moving over a regularly spaced grid, then the A constants will not
change for estimating similarly positioned points. For example, if the
dots in the figure below represent the only data points used to estimate
the points represented by x's, only one K matrix needs to be calculated
for one x value. Estimates of the others can be obtained by rotation.
-------
45
x x
X X
As long as we are restricted to a regular grid this simplification is
great, but it also requires a large amount of regularly spaced data.
Sparse regions are not conducive to this procedure. Small data sets can
easily make use of all of the data in the covariance matrix. In our
landfill example with 8 data points, we will need to solve a system of 8
linear equations, if we assume the mean is zero. More elaborate forms
for the mean of the spatial process will add more equations.
For geostatisticians, the large and regular data requirements for
kriging are net 'Lhat restrictive when you consider that a mining
engineer must make a decision to census the whole spatial region.
Compared to this eventual census, large sample sizes seem to be
necessary and probably cost effective.
The local neighborhood approach is somewhat less useful for
interpolating environmental variables such as acid rain or groundwater
pollution. Measurements of these variables often have large regions of
irregular spacing. For these applications the global technique, using
all of the data, which can span the sparse areas, is needed.
5.4 KRIGING WITH H EQUAL TO A CONSTANT
Now let us consider the estimation of Z(x>=n when n is a
-------
46
nonzero constant. For our estimator Z(x)=EA-Z(x-> to be unbiased we need
the additional restriction that £A-=!. This requires that we minimize:
E[Z(x)-2(x)]2 + 2v[
where v is a Lagrange multiplier. The resulting system is
K e I I A I I T. I
Art
e' O I I v I I II
where e = (!,...,!>' of dimension Ixn. The minimal variance is given by:
ZCx:> - !k'K ^x) - v
Geostatisticians prefer the variogram to describe spatial dependencies,
since the variance o-F the process may not exist. The kriging system can
be written in terms of the variogram, but for reasons of numerical
stability the covariance form of the kriging system is best. Under the
intrinsic hypothesis of stationary increments this system can be written
as:
'ij3 • I I x I ^ I ^ I
e' O I I u I I0|
-------
47
where *Y. . is the semi—variogram evaluated for points x. and x., that
is •y,Z
5.5 EXAMPLE OF KRIBING WITH A CONSTANT MEAN
As a simple example of this technique, let us again consider a
one—dimensional process and calculate the kriging system.
Let Z,, = e,. + .5e 1 + 15
*\ A yv~"* X
This is a simple, first order moving average process with a mean value o-f
15 and error terms € with a mean o-f zero and variance o-f one. The
covariance for this process is given by:
5/4 for h=0
Cov = 1/2 for h=l
O for h>2
Consider a sample Z,,...,Z4 of size n=4 from this process. The
covariance matrix K for this sample is given by:
5/4 1/2 0 0 I
1/2 5/4 1/2 0 I
0 1/2 5/4 1/2 I
O O 1/2 5/4 |
-------
4'8
Suppose that we wish to predict Z,-. The covariance vector
t(7lx'*"» *or x=4 is *0,O,0,l/2>'. The kriging system then
becomes:
5/4
1/2
0
0
1
1/2
5/4
1/2
0
1
O
1/2
5/4
1/2
1
0
0
1/2
5/4
1
1
1
1
1
0
I X, I 10
I x- I 10
1 x_ | = | 0
I x. | 11/2
I v I 11
with a solution '=(.164, .244, -.119, .710)'
Notice that E^ = 1, so that our estimator Z5=.164Z1 + -244Z2 -
.119Z-T + .710Z4 is an unbiased estimator o-f Z~. Note that the
coefficients are quite different from the p(x)=O case. Since the mean of
each Z. is equal to 15, the constraint £,\-=l insures that the mean of
.'*.
Z(5) is also equal to 15.
-------
49
6.1 UNIVERSAL KRIGING
Now let us assume that |i(x) takes the form of a spatial polynomial with
unknown coefficients. Kriging with this form of a mean has been called
Universal Kriging.
In universal kriging of a planar process the mean, |i(x> at a vector point
x^Xj^x—)' is assumed to be represented by a linear combination of the
form
, f <
= * *
where the fg(x) are known, linearly independent functions and the a.'s
are unknown. Usually the f .(x) functions are taken to be monomials of
order k. For example, for k=l, Jj=3 and f.,(x)=l, f7(x)=x1, f^.(x)=x0.
These are the p. oper variables for the construction of a plane of the
form a,. •+• a0x1 + a-^x,.,. For quadratic surfaces with k=2, we have J0=6
X *- 1 *-f ji. . A.
"2 "?
and f•(x)=l,x1,x-, xt, x^, xtx0, for k=3, J^=1O, for k=4, J.=15,
X X jiL x j&. X ^ i^ *T
etc., where J. = ^ i. If the estimation is done globally this
functional form for [i(x) must apply over the whole spatial range. On the
other hand, differing polynomial coefficients can be estimated for
subregions if the kriging is done locally.
To estimate values of a spatial process with a polynomial trend as the
mean, classical statistical techniques yield the solution. In a global
setting, it seems appropriate to estimate the polynomial trend, subtract
it from the data, and then apply kriging for a process with a mean of
-------
56
zero, to the residuals. This is indeed the optimal solution provided
that the trend is also estimated with generalized least squares
accounting for the known covariance structure. For example, consider our
A.
process Z(x) = (Jl(x) + £. If [l(x> is an estimate of the trend from
vX
generalized least squares, we form W(x) = Z(x) — |j(x) and apply our
kriging estimation scheme for a process with a zero mean as seen
-••s.
earlier. If W(x) represents the kriging prediction for W(x), then the
.A. .A. ^
resulting estimate of Z(x) is given by Z(x) = W(x) + |J(x).
XV. v^.
The variance of this estimate is not Var(W(x»+Var] subject to the unbiased
condition given above. This results in the following system of equations
involving Lagrange multipliers v. :
for i=1'~"n
-------
51
and
£x-f« - IA^. -
This technique can also be applied locally, in which case different
-------
52
forms of the given polynomial trend are fit to different regions. One
restriction of this technique is that adequate amounts of data must
exist in each of the local regions. This often restricts this local
application to regularly spaced sampling grids.
6.2 EXAMPLE OF UNIVERSAL KRIGING
As an example of universal kriging in the plane, let us consider a
regular grid of points that we will index by (i,j). Define a
two— dimensional process by:
a. •+• a^i •*• a--j
where again the e terms are independent, normally distributed error terms
with a mean of zero and a variance of one. Notice that the mean of this
Z process is given by a plane with unknown coefficients a-. The
covariance structure in this process is generated by the common terms
that define each Z(i,j). These terms arise from the error terms
associated with positions immediately above, below, and to the left and
right of the (i,j) point, see Figure 6.2.
For this process the covariance, which depends only on the distance
between the two points, is given by:
17/16 h=O
1/4 h=l
C( h ) = 1/32 h=v'2
1/64 h=2
0 h>3
-------
euj+i)
€(I-1J) • €(IJ) 6U+1J)
6UJ-1)
FIGURE 6.5.2 Structure of Error Terms in example
of Universal Kriging
Ul
-------
54
Consider the following array of data:
(0,0) *
(!,-!> * (2,-l) *
* (2,-2) x
The points with the symbol <*) are points at which we will assume that
data exists. The point with the symbol (x) will represent a point that
we wish to predict from the other four. For convenience of reference we
will label the points as follows:
2 3
4 (5)
The covariance matrix of these four data values is given by:
I 17/16 1/32 0 0
! 1/32 17/16 1/4 • 1/4 |
I O 1/4 17/16 1/32 I
I O 1/4 1/32 17/16 I
By calculating the distances between the fifth point and the four data
points, we find the covariance vector of the fifth point with the four
data points is KJx)' = ( 0 , 1/32 , 1/4 , 1/4 X
The unbiasedness condition imposes restrictions on tha coefficients of
-------
55
the four data points that are used to estimate the fi-fth. In particular
to estimate the intercept of the planar mean we need the restriction
that Xj+X2"l"^3"l"^4 = 1- To estimate the slope in the (i) or first coordinate
direction we need the restriction that Ox«+l \-+2\^+lx- = 2. This is merely
a x linear combination of the first coordinates of the data points set
equal to the first coordinate of the point to predict. Similarly, for
the second coordinates we have the restriction that
0\1-lx2-lx3-2x4 = -2.
With these restriction the system that we must solve is given by:
I17/
16
1/32
O
0
1
0
0
1/32
17/16
1/4
1/4
1
1
-1
0
1/4
17/16
1/32
1
2
-1
0
1/4
1/32
17/16
1
1
-2
1
1
1
1
0
O
0
O
1
2
1
0
O
0
0 II Xl
-l II x2
-l II x3
-211 X4
0 J| Vj
0 | 1 v0
0 II v"
1 0 1
1 1/321
1 1/4 |
1 1/4 |
1 1 1
1 2 1
1 -2 1
The added restrictions are included with Lagrange multipliers
The x coefficients that solve this system are (x1,x2»A3»X4) =
(— .305,— .084f.694,.694). If we arrange these coefficients in the pattern
of the original data we obtain:
-.305
-.084 .694
.694
Notice that the largest coefficients are given to the two data values
immediately above and the left of the point to be estimated. This is not
-------
56
surprising when you consider the manner in which the process is
constructed. What is surprising is that the next largest coe-f-ficient in
absolute value is given by the point that is farthest away from the
point to be estimated. The low weight applied to the middle point is
because much of the information that it provides is already well
represented by the points with the largest weights. The farthest point
brings information that is not used in the construction of the others.
6.3 CIRCULAR ESTIMATION WITH UNIVERSAL KRIBIN6
An important assumption behind kriging is that the covariance matrix of
the process is assumed known exactly. If it is estimated the optimal
properties of the estimator are questioned. Furthermore, even the
estimation of the covariance is not straightforward. We first must know
the covariance of the residuals e(x> before we can estimate the trend
accurately. On the other hand, we need to know the trend before we can
estimate the covariance of the residuals. The circular nature trend and
residual covariance estimation is a problem that has produced several
approximate solutions.
In applying universal kriging, Journel and Huijbregts(197B) suggested
that the sample covariance of the Z(x) values be used in place of the
true residual covariance matrix. If the trend is locally constant, this
provides numerical estimates, but again optimality is questioned. They
also suggested and Aboufirassi and Marino(19S3) demonstrated that if the
trend is mainly in one direction, the perpendicular direction may be
well behaved enough to allow calculation of a sample residual
covariance.
-------
57
Devary and Doctor(1981) calculated the residual covariance matrix from
the residuals of a standard least squares estimation of the trend. The
difficulty here is that least squares assumes uncorrelated errors, which
is quite inconsistent with the existence of a nontrivial covariance
matrix of residuals.
Meuman and Jacobson<1984> suggested an iterative approach to the
simultaneous estimation of the trend and the residual covariance. They
first 'assumed the order of the drift is k=l, and then estimated the
drift by standard least squares. They then calculated the variogram of
the residuals. If this variogram increased too rapidly or exhibits
anisotropic behavior, i.e. different covariance structure in different
directions, they would increase the order of the drift and re-estimate
until the variogram of the residuals had the "desired properties." The
resulting covariance matrix was then used to estimate the trend using
generalized least squares. This results in new residuals, with their own
covariance matrix, which were used to re—estimate the trend. This
xs
iteration was repeated until the estimated drift H
-------
58
remove nonstationary behavior and estimation of the covariance -from
resulting standard forms. They began by defining generalized increments.
For a one-dimensional process the first order difference Z(x+h)— Z(x) is
a zero order increment that will filter out a trend that is constant.
Notice also that for data values x,,...,x the sum EB-Z(x-) where
E8-=O will eliminate a constant in the data. Much of the literature of
generalized covariances confuses notation between the kriging weights x
and these generalized increment weights B by using the symbol x for
both. We have chosen to eliminate the confusion here. For a point
x-=
-------
59
The most important result -for the application of these increments is
Xv /N
that if Z is a kriging estimate of Z(x) then Z(x>—Z(x>, the
estimation or kriging error, is exactly a generalized increment. Written
as -Z(x) + Ex-Z(x-) we see that ^=-1 and 8-=,\- for all the remaining
coefficients.
Generalized increments turn out to have covariance functions of a
standard form. In Matheron(1973) intrinsic functions of order k were
defined as functions for which k order increments are weakly
stationary. Furthermore, it is possible to define a function K(h> such
that the variance of the increment is a quadratic form in K(h), that is:
This has the same form if K was an ordinary covariance function for the
Z(x) process, rather than its increments. It is called a generalised
covariance function. Matheron<1973> showed that any isotropic
generalized covariance function defines a class of functions which are
all equivalent up to an addition of an arbitrary even—powered polynomial
of degree less than or equal to 2k. For example, for k=l, K(h)=-h and
2
K(h)=—h+h are equivalent. The following table lists the essential odd
powers for generalized covariance functions of several low orders.
Order (k) K(h)
0 C6 + aQh, with aQ<0
1 C& + aQh + axh3, with a^a^O
2 CS + aQh + atn3 + a2h5, with aQ,a2
-------
60
and in R a^ > -
and in R a1 >
For all orders C8=C if h=O, and O otherwise.
The constraints insure that the variance expressed in terms of K(h) is
nonnegative.
Suppose that we choose a generalized covariance from the forms given
above, then delete each value Z(x.) in turn and calculate(krige)
estimates of Z(x.) from the remaining points. The resulting estimation
(r)
errors (indexed by r), Z(8 >=Z(x->—Z (x.) are collected. The variance
of the kriging errors is given by:
Var= Z I
The generalized covariance function K(h> is of the form:
K(h) = CS •• ^ -„.
p=0 P
So that the Var(Z(B(r>» becomes:
where
for p=O,l,...,k
Delfiner(1976) noted that the variance is linear in the parameters C and
-------
61
a , so that regressing Var(Z(8(r>»=E[Z(B(r>)]2 upon T£ and T£ +1, for
p=O,l,...,k will produce estimates of C and the a 's. Delfiner
suggested performing a weighted regression with weights inversely
proportional to the squares of the T's. Devary and Rice<1982) found the
regression to be quite robust with respect to these weights and suggest
performing standard regressions on all of the generalized covariances of
order k. For example, for k=l they would use regressions with K(h)=CS,
C&+aQh, C5+aQh+a..tV~', C6+a.h , aQh, and C5+ajh^'. Any given regression
model results in C and a coefficients that are reused to estimate new
residuals and Tr values, which are then used to re—estimate the C and
a coefficients. The iterations are continued until convergence. Devary
and Rice found the convergence to be quite erratic, sometimes converging
to inadmissible values.
Starting this iteration requires a determination of k, the order of the
trend. Devary and Rice(19S2) suggested starting with a generalized
covariance function, K(h)=-h, which is valid for all values of k. The
best choice for k depends on the behavior of the residuals obtained by
deleting and re—estimating each data point in turn. For various orders
of k, they examined mainly the absolute residuals and minimum mean
squared residuals to determine k. For the former, absolute residuals of
all orders considered(typically only k=O,l,2) were ranked by magnitude
and the value of k possessing the smallest average rank was chosen. Once
k was chosen, the iterative regression proceeded.
It may well happen that several sets of generalized covariance
parameters are obtained from the various models of order k tested. To
choose the best set Delfiner(1976) recommended comparing the
-------
62
estimation(kriging) variance, E[Z(& r )]"1" to the quadratic form in
K(h) that de-Fines the kriging variance. If the correct K(h) is chosen,
-s
the ratio of these values should be close to one. No distribution
results were quoted for this ratio, only rules of thumb i.e., the ratio
should lie between .75 and 1.25 for a parameter set to be selected. It
seems odd that a ratio is given bounds that are arithmetically symmetric
about 1 rather than the more reasonable bounds of 3/4 and 4/3.
There have been some objections and reluctance to use this technique
because the resulting covariance function is difficult to interpret. It
applies to the differenced process and not to the original data. Most
practioneers prefer the structural analysis that requires the
straightforward estimation and use of a variogram or a covariance
function. Much can be learned about the variability in the data from
these tools.
The generalised covariance technique with small data sets is probably
not very useful. The differencing that must be performed and the
resulting regressions have been seen to behave quite erratically even in
large data sets. The use with very small samples would certainly not
alleviate these problems.
-------
63
8.0 A ROBUST TREND TECHNIQUE
As noted earlier in the context of variogram estimation, small data sets
and their variable behavior suggest the use o-f robust procedures of
analysis. Cressie<1984) suggested a robust technique for estimating the
trend. Using techniques from Tukey's<1977) exploratory data analysis,
Cressie estimated the trend using what is called median polish. In this
technique medians of the values of a two—dimensional process on a
regular grid are calculated across one direction, say the rows. These
row medians are then subtracted from each value in the row. Then medians
of these residuals are calculated down the columns and then
subtracted from each value in the column. This process is then repeated
*•
until there is no further change. Cressie suggested using this technique
to estimate the trend at position general idea of using a robust method of estimating the
trend and then standard kriging to estimate values of the process would
-------
64
be quite useful. What is unknown is the form of the resulting estimation
variance when robust trend estimation is used. More work in this area is
needed.
9.0 KRIGING AND SPLINES
It has been shown that, for a suitable choice of the generalized
covariance function, the estimates of the values of a random process
obtained by kriging are equivalent to a deterministic interpolation of
those values by a cubic spline, see Wahba and Kimelhart(1970),
Matheron<1981), Dubrule(1983), and Watson(1984). The spline method
examines a class of functions that interpolates the data and minimize
the integral of their squared second derivative. This is thought of as a
measure of smoothness. The second derivative measures the rate of change
of the slope, arJ the faster the slope changes the rougher the function.
Squaring and integrating produces a total positive measure of
"roughness" that is to be minimized. Watson(1984) presented a simple
example to show the relationship between kriging and a special function
derived from the study of differential equations of splines, called a
Green's function. The Green's function, obtained as the solution to an
»
interpolating spline differential equation, and the covariance function,
used in kriging, are closely connected. It turns out that the spline
Green's function is simply an uncentered covariance function of some
random process. The converse though is not true. There may well be
random processes with covariances which are not Green's functions. This
makes kriging somewhat more general than interpolating splines.
-------
65
In R the generalized covariance that produces kriging estimates that are
3 "*
equivalent to interpolating splines is K(h)=| h| . In R ,
K(h)=h^log| hi and in R , K(h)=| hi, see Matheron(1981). Again, since these are
generalized covariance functions they define an entire class equivalent
o
up to the addition of an even polynomial. In R Davis and Grivet(1984)
used:
K+ [h2/(2R)log
-------
6-6
10.0 KRIGING THE LANDFILL pH DATA
The empirical variogram that we will use in kriging the pH data was
calculate in section 4.5. In that example the averaging o-f disjoint
distance ranges overestimated the variability in the original data. We
must expect a large variance of the kriging estimates. The covariances
between two points are calculated using the gaussian variogram fitted in
figure 4.5.2. The covariance matrix depends only on the distance between
the sampled data points, and the covariance vector depends only on the
distances between the point to be estimated and the given data
positions. These covariances are arranged in a matrix equation as
illustrated in section 6.2. Assuming a constant mean, this matrix
j-x
equation is solved for the x coefficients and estimates Z(x) are
j*>
calculated by Z(x)=E\-Z(x-> where Z(x-) are the pH measurements at
each of the sample points. Figure 1O.1 shows the contours of the kriging
estimates obtair«=d using this variogram (represented as a covariance
function). The contours follow the lines of equal pH. As in time series
analysis we will expect a prediction that lies far from the data to be
equal to this mean with a variance equal to the estimated variance of
the process. The closer we come to a cluster of data the more the
estimate is influenced by the data, and the smaller the estimation
variance. Recall that kriging is an exact interpolator, so that the data
values are reproduced exactly. Notice that the contour line of pH=7
passes directly through the location of well M1O. The large pH value of
11.4 causes a rather rapid falling of the coutours down to the more
reasonable range of 7. This produces what could be regarded as a small
"puddle" of highly alkaline water near well M7A. As mentioned in the
-------
FIGURE 10.1
Contours of PH obtained by Krlglng
7.
'- ' \\\
"'. ''• 'i\\
'•- \'-S.,Ss
» » •-.. ^y
Q • •. .
* . . • .
* .. .,
0 •
6
0 0
,8 6,7 .-6:6
« ».
•I N I II
." ?
6.5
•6,8
I I • • • •
-------
FIGURE 10.2
Contours of the Krlglng Variances of PH
M,««'**t*""HM««i»MMi«im,Mtll .MM"'
„-"'••. ' \
. - -" .. '• - „ • v \
»• •
' '
, .5
0
7,4- \
1 ' 11'. 4 ' ; '
i , . .
i i « •
i i • •
-,&.6,7 6,6. /
• * •
o • • • i
v f I • I
.11 ii it ••
I
'
s .'•.»/••
\ • '-.**..
C flf
O.v" / •
• ' .. •
. - • ..-• >
...•••• M---" /
„.»...."-'" ^/^
t • • ••
A 'L_
00
-------
69
introduction to these data, this value may well be in error but kriging
is an interpolation technique and will predict the data as given.
Figure 10.2 shows the contours of equal kriging variance. The variance
at each well is zero, and it increases as the distance away from the
wells increase. Far from the data cluster, in the corners of the map,
the variance is largest. It must be emphasized that this is the variance
of the estimation assuming that the variogram or covariance used is
exact. It does not account for the variation in the covariance
estimation. This additional variance would decrease somewhat the
confidence in the estimates.
In kriging the pH data from the Dade County landfill we are subject to a
very small data set. This is not at all uncommon in many pollution
monitoring situations. Data collection from groundwater is quite
expensive, and economic considerations often overcome statistical
objections to sir^ll samples. We have tried to present several ideas for
and approaches to dealing with small data sets. These approaches have
not been investigated extensively. Much more work is required to
determine their usefulness, variability, and reliability.
-------
70
APPENDIX
SIMULATION OF ISOTROPIC PROCESSES
It is of interest to simulate realizations of isotropic processes to
gain insight into their structure and the restrictions imposed by the
assumptions of isotropy. Matheron(1973) has developed what he calls the
"turning bands" technique of simulation. An example of this technique is
illustrated in Figures A.I through A.4. Figure A.I is a two—dimensional
lattice throughout which we will define our isotropic process by the
following procedure: Each point in this portion of the plane, not just
the lattice points, is projected onto a realization of a one-dimensional
stochastic process with a covariance of C.(h), (Figure A.2). All
points lying between the lattice points are projected onto the same
value. Hence points in an entire band of the plane are given equal
contributions. The entire lattice is then rotated through some random
angle uniformly distributed around a circle, (Figure A.3). The points in
this rotated lattice are then projected onto another realization of the
stochastic process with the same covariance C^h). The results of the
accumulation of these rotations is shown in Figure A.4. Here all points
in the plane are assigned simulated values corresponding to the sum of
the contributions from each rotation. The banded nature of these random
"turns" gives the procedure its name.
In d-dimensional Euclidean space the technique involves producing a
stationary process Z^ with covariance C., on the real line, then for a
point X=. A contribution to the
-------
FIGURE A.I
ISOTROPIC PROCESS SIMULATION BY MATHERON'S TURNING BANDS
-------
FIGURE A.2
BAND PROJECTION ON A ONE-DIMENSIONAL REALIZATION
,8 -,3 -2 ,5 1,1 ,2 -3 -,8 M -1
HITH COVARIANCE C( H )
NJ
-------
FIGURE A.3
ROTATION OF THE GRID
AND PROJECTION ON ANOTHER
REALIZATION
WITH THE SAME COVARIANCE
U)
-------
FIGURE A.4
SPATIAL
ACCUMULATION OF THE
REALIZATIONS
POINTS III
THIS REGION HAVE
VALUES OF ,8+,5
ACCUMULATIONS
CONTINUE FOR MANY
ROTATIONS
,8 -,3 -2
-1
-------
75
process for a point in d—space is given by the value of the first
coordinate projected onto the one—dimensional realization. The whole
realization is then given an independent uniformly distributed rotation
about the origin in d—space, see Ripley(1981). An accumulation of these
realizations produces a normal isotropic process. Matheron(1973) showed
that, as the number of rotations approaches infinity, the resulting
isotropic covariance in d-space is given by C(h)=E(C1(hV)), where E is
the expectation operator, Cj is the covariance for the Z, process on the
real line, and V is the first coordinate of a point on the surface of
the unit sphere in d—space chosen randomly with a uniform distribution.
A line originating within the lattice and radiating outward would fall
close to one of the rotations. Points along this line would have the
covariance structure imposed by the one—dimensional process used in the
generation. No matter what direction the line points, the covariance is
the same along this line and depends only on the distance between the
points.
Ripley(1981,p.l7) has suggested a modification of this procedure for
simple simulation of isotropic processes with covariance function C on a
lattice in the plane. Here eight independent realizations of a
stationary stochastic process with covariance C/8 are generated on the
real line. Then for any point
-------
76
about the origin.
Figure A.5 show a realization of an isotropic process on a 3Ox30 lattice
in the plane generated by rotations and projections suggested by
Ripley(1981) onto realizations of a normal, second order moving average
process of the form:
Z. = a. + .5a. _| •+• ,75a. _
where the a. values are normally distributed with a mean of zero and a
variance of one. Notice the prominent line of peaks running parallel to
the x0 axis for small values of x.. Closer examination also reveals
diagonal peaks and valleys (note middle left to upper center line of
peaks). These diagonal lines 'can be seen even more dramatically if the
process is scaled downward. Figure A.6 shows the result of scaling the
process by one tenth. Here we also have placed lines pointing to the
diagonals. The diagonal lines are a result of the eight rotations used
to construct the simulation.
Some of this behavior might be expected as the projections spread the
extreme values of the one— dimensional realizations throughout various
diagonal lines in the lattice. The difficulty with this method of
simulation comes from the small number of rotations used and their equal
spacing. Eight equally spaced rotations of ir/4 produce diagonals that
i
are too prominent.
A better procedure follows the technique that Matheron(1973) originally
-------
FIGURE A.s ISOTROPIC PROCESS SIMULATION BY FIXED ROTATION
TURNING BANDS
-------
FIGURE A.6
ISOTROPIC PROCESS SIMULATION BY FIXED ROTATION
TURNING BANDS
oo
-------
79
suggested. Instead of equally spaced rotations we should select several
randomly distributed rotations in the plane. Figure A.7 shows a
realization with 16 randomly selected rotations. A. line of peaks is not
prominent on the fixed diagonals given by Ripley's method.
The moving average process used for the one—dimensional realizations
possesses a varied covariance structure depending on the parameters
chosen. Journal and Huijbregts(1978) show how these parameters can be
chosen in one—dimension to produce many of the standard variogram models
in higher dimensions using the turning bands technique.
-------
FIGURE A. 7
ISOTROPIC PROCESS SIMULATION BY RANDOM ROTATION
TURNING BANDS
A
00
o
X
1
-------
81
Aboufirassi, M. and Marino, M. (1983), "Kriging of Water Levels
in the Souss Aquifer, Morocco," Mathematicai_Gegiggy, 15,
No.4, 537-551.
Borgman, L.E. and Davis, B.M.(1982), "A Note on the Asymptotic
Distribution of the Sample Variogram," MathematicaI_GegIogy,
14, No.2, 189-193.
Box, G.E.P. and Jenkins, Q. (1976), Time Series Analysis:.
Fgrecasti.ng_and_Cgn trgjl, Hoi den—Day, New York.
Cressie, N. and Hawkins, D.M.(198O), "Robust Estimation of the
Variogram: I," Mathemat ical._Geal.ogy, 12, No. 2, 115-125.
Cressie, N.(1984), "Kriging Nonstationary Data," presented at
the Amer. Stat. Assoc. meeting Philadelphia, August, 1984.
David, M.(1976), "The Practice of Kriging," Advanced
Gegstatistics_in_the Mining Industry, D. Reidel Publishing
Co., Dordrecht, Holland.
Davis, B.M. and Borgman, L.E.(1979), "Some Exact Sampling
Distributions for Variogram Estimators," Mathematical.
Geology, 11, No.6, 643-653.
Davis, M.W. and Grivet, C.(1984), "Kriging in a Global
Neighborhood," Mathematical._GegIggy, 16, No.3, 249-265.
Delfiner, P.(1975), "Linear Estimation of Non Stationary
Spatial Phenomena," Advanced Geostatistics i.n the Mining
Industry., Reidel, Boston.
Devary, J.L. and Doctor, P.G.(1981), "Geostatistical Modeling
of Pore Velocity," Battelle—Pacific Northwest Laboratory,
Report #PNL-3789.
Devary, J.L. and Rice, W.A.(1982), "Geostatistics Software
User's Manual for the Geosciences Research and Engineering
11/7O Computer System," Battelle-Pacific Northwest Laboratory
Report.
Dubrule, 0.(1983), "Two Methods with Different Objectives:
Splines and Kriging," Mathematical. 6eoiQ9y» 15» No.2,
245-257.
Efron, B.(1979),"Bootstrap Methods: Another look at the
jackknife," Annals of Statistics, vol.7, no.1.
Grenander, U. and Rosenblatt, M.(1957), Statistical. Analysis
of_Statignary_Time_Series, J. Wiley and Sons, New York.
Hogg, R.V.(1974), "Adaptive Robust Procedures: A Partial Review
-------
82
and some Suggestions for Future Applications and Theory," J._
.._St at^_ Assgc .. , 69, 9O9-923.
Huber, P. J. (1972), "Robust Statistics-Review," Annal,s_gf __ Math...
Stat..., 43, 1041-1067.
Journel , A. and Huijbrechts, C. ,(1978), MiQi.ng_6eost at i_st i cs ,
Academic Press, London.
Jowett, G.H. ,(1952), "The Accuracy of Systematic Sampling from
Conveyor Belts," ABB iied_St atistics , *» 5O-59.
Jowett, G.H. ,(1955), "Multiple Regression between Variables
Measured at points of a Plane Sheet," Ag_Biied_Stat i_st i.cs , 4,
80-89.
Jowett, G.H. ,(1957), "Sampling Properties of Local Statistics
in stationary Stochastic Series," Bigmetrika, 42, 160-169.
Kimeldorf, G. and Wahba, G. (1970), "A Correspondence between
Bayesian Estimation of Stochastic Processes and Smoothing by
Splines," Annal.s_gf _Math..._St at... , 41, 495-5O2.
Kolmogorov, A. N. ,(1951), "Interpolation und Extrapolation,"
lyiiet iQ_de_ilac ademie_des_sciences_de __ Ui_S ..S ..R .. » Ser . Math . 5 ,
3-14.
Koopmans, B. (1976), SBectral. __ Anal.v.si.5 __ gf_ ___ lime __ Seri.es,
Academic Press, New York.
Krige, D.G. (1951), "A Statistical Approach to Some Mine
Valuations and Allied Problems at Witwatersrand, " unpublished
Master's Thesis, University of Witwatersrand, South Africa.
Matern, B. (I960), Spatial. __ Variation, Almaenia Foerlaget,
Stockholm.
Matheron, G. (1965), Les __ __ Variabl.es ___ ReaiQQal.ise.es __ et __ Lsur
Estimation, Masson et Cie, Paris, France.
Matheron , G. (1973), "The Intrinsic Random Functions and their
Applications," Ad y an c es_in _ ABBiiecJ _Pr ob ab iiit y , 5 , 439-468 .
Matheron, G. (1981), "Splines and Kriging:. Their Formal
Equivalence," Syracuse University Geological Contribution #8,
edited by D.F. Merriam.
Mosteller, F. and Tukey, J.W. (1977), Data ___ Analysis __ and
Regression, Add i son-Wesley, Reading, MA.
Myers, D.E. , Begovich, C.L. , Butz , T.R., Kane, V.E. (1982),
"Variogram Models for Regional Groundwater Geochemical Data,"
Mat hematic ai_Geg!ggy. , 14, No. 6, 629-644.
Neuman, S.H. and Jacobson, E. A. (1984), "Analysis of
-------
83
Nonintrinsic Spatial Variability by Residual Kriging with
Application to Regional Broundwater Levels," Mat.hemati.cai
GegJLggv., 16, No. 5, 499-521.
Ripley, B. (1981), Sgat i.al _St atist i.cs , J. Wiley and Sons, New
York.
Sharp, W.E.(19S2), "Estimation of Semi variograms by the Maximum
Entropy Method," Mat hemati.ca!_Gegigg¥ , 14, No. 5, 457-474.
Starks, T.H. and Fang, J.H. (1982), "The Effect of Drift on the
Experimental Semivariogram, " Mat hemat i.cal._Gegl_ggY , 14, No. 4,
309-319.
Technos, Inc. (1984), "Detection and Mapping of Contaminated
Soils and Ground Water Using Geophysics," report to U.S.
Environmental Protection Agency.
Tukey, J.W. (1977), Exgigratgry __ Data_ Analysis , Add i son-Wesley,
Reading, Massachusetts.
Von Neumann, J., Kent, R.H. , Belliason, H.R. , Hart,
B. I., (1941), "The Mean Square Successive Difference," Annais
of _Mat hi_S t atj. , 12, 1 53 .
Watson, 6. S. (1971), "Trend-Surface Analysis," Mathematical.
., 3, No. 3, 215-226.
Watson, G.S. (1984), "Smoothing and Interpolation by Kriging and
with Splines," £ jthematicai_Gegiggy , 16, No. 6, 601-615.
Weiner, N. (1949), The ___ Extraggl.ati.gnj. ___ i.D.teregl.atign^ ___ and
Smggthing_gf _Statignar^_Ti.me_Series, J. Wiley, New York.
------- |