Primer on Kriging


&EPA
              United States
              Environmental Protection
              Agency
                             February 1986
A Primer on  Kriging

-------
      A PRIMER ON KRIGING
            Prepared for

     The Statistical Policy Branch
 Office of Standards and Regulations
U. S. Environmental Protection Agency
     Washington, D. C. 20460
                 By

         Robert W. Jernigan
     Department of Mathematics
   Statistics and Computer Science
      The American University
      Washington, D.  C. 20016

-------
                               ABSTRACT








Kriging is the name given to least squares prediction of spatial processes.  It is one of




the defining tools of the geostatistician.  It has been called surface fitting, trend surface




analysis, contouring sparse spatial data, and a generalization  of spline interpolation.  But




it comes down to being best linear unbiased estimation of a stochastic process by




generalized l^ast squares.  We will examine the assumptions,  tools, and methods involved




in kriging, and present several examples of its use.  Kriging  has applications in mapping




spatial environmental data, including, toxic wastes, acid rain, groundwater pollution, etc.
                                      11

-------
                                PREFACE









This paper presents the essence of a statistical technique for the prediction  and mapping




of spatially variable data, known as kriging.  This work has grown out of an attempt to




consolidate results, methods, and applications of kriging that are widely  scattered in




many journals of application. The presentation is intended for statisticians familiar with




the workings of least squares estimation through the use of matrix algebra.  The results




that follow try to relate the language and techniques of kriging to more  standard




statistical terms  and methods.  It is hoped this document will serve to  expand the




audience for these  procedures by providing a thorough presentation of statistical ideas




involved. Many sections are quite technical,  but the essential approach can often be




garnered from just the text.
                                    111

-------
                  Table of Contents

1.0...Introduction                              1

2.0...Landfi11 Example                          5

3.0...Definitions and Fundamental  Tools       10

   3.1...Covariance Function                   10
   3.2...Simplyfing Assumptions                12
     3.2.1 Isotropy                            12
     3.2.2 Intrinsic Hypothesis                16
   3.3...Properties of the Covariance          16
   3.4	Nugget Effect                         17
   3.5...Variogram Models                      IS
   3.6...Bounds on Variograms                  21
   3.7...Hole Effects                          24

4.0...Estimation of the Variogram              26

   4.1... Estimation of the Variogram           26
   4.2...Robust Estimation                     28
   4.3...Graphical Estimation Techniques       29
   4.4...Numerical Estimation Procedures       33
   4.5...Small Data Set Variogram  Estimation   35

5.0...Estimation by Kriging                    4O

   5.1...Simple Kriging                        4O
   5.2...Example from Time Series              42
   5.3...Local vs. Global Estimation           44
   5.4...Kriging with a Non—zero Mean          46
   5.5...Ex amp1e                               47

6.O...Universal Kriging  (U.K.)                 49

   6.1...Universal Kriging                     49
   6.2...Example                               52
   6.3...Circular Estimation with  U.K.         56

7.0...Generalized Covariances                  57

8.0...A Robust Trend Technique                 63

9. 0. ..Kriging and Splines                      64

1O.O..Kriging the Landfill pH Data            66

Appendix

Simulation of Isotropic Processes              7O

References                                     81

-------
1.0  Introduction









The pattern  and extent of many  environmental  variables  are o-ften




examined by  the  mapping  of spatial  data. The set of contours of  equal




temperature  or pressure  on weather maps  is a common example.  Other




diverse examples  include: the mapping of mineral deposits by geologists,




species ranges by biologists, stellar radiation emissions by




astronomers,  and locations of cancerous growths by physicians. Examples




common  to  pollution monitoring are  contouring  of groundwater pollution




from toxic waste  dumps or landfills, mapping the impact of acid rain on




forest  and aquatic  life,  spatial  studies of air pollution, and regional




maps of birth  defects  to locate toxic sites. All of these  examples are




involved with spatial relationships.








A common  problem encountered in spatial analysis is the estimation of




the surface  that is generated by a set of spatial  data. This surface can




be helpful in finding explanations of spatial patterns or in predicting




data measurements in  regions that  were previously  inaccessible or




unobserved (interpolation).









The problem  of estimation of a  spatial  process is often called  trend




surface analysis. Many techniques,  both statistical and numerical, have




been applied  to this problem of  fitting  a surface. Regression or




response  surface analysis is one such  technique. It typically estimates

-------
an average underlying  surface with individual data points  deviating from




a surface  of best overall fit. Smoothing techniques, for example  by




moving averages,  are  another way  of  producing  a local  average surface




value. Many interpolation schemes  from  numerical  analysis  have also been




applied to estimate a  surface. Fourier  or  polynomial interpolation




produce surfaces that exactly interpolate the data, but unfortunately,




they  can produce quite erratic behavior. Osculatory and some  other




piecewise polynomial techniques also produce exact  interpolations, but




they  sometimes have no  sound statistical background. Interpolation by




cubic splines,  a related technique, does have a close tie  to the




prediction  of stochastic processes. In this paper we will consider a




technique that  combines  the statistical properties  of regression and




interpolating properties of splines. It is called kriging.








Kriging is  the name given to  the least  squares prediction of spatial




processes. It is a form  of  surface fitting using regression techniques



that  include spline interpolation as a special case. In  contrast to




smoothing or  simple polynomial regression techniques it does not produce




a surface  of overall  general fit. Rather it produces a  surface that is




"tied  down" to interpolate the data points exactly and  to  estimate




optimally  the process  between the  data points.  Statistically,  kriging is




best  linear unbiased estimation of a  spatial stochastic process using




generalized least squares. As such, as  Watson<1969) has pointed out its




novelty does not lie in  statistics. Its  origins  and development are



found in geology.








Kriging was developed  by "geostatisticians", and in many ways it is a




technique that  defines the  term  "geostatistics."  A geostatistician or a

-------
geostatistical approach uses kriging. The name  comes from D.R.  Krige, a




mining engineer in  South Africa,  who in 1951  first applied the technique




to the estimation  of  gold  deposits. The  mathematical theory associated




with kriging  was advanced  by G. Matheron in  the early 196O's  at the




Fountainbleau Mining School in France, under the heading "Regionalized




Variables."








Kriging  uses essentially the same  techniques to model and predict




spatial variation that are used  to model and predict temporal  variation



in time series analysis. Although many of the techniques  for  time series




analysis can also  be  applied to  spatial  data, many  new and different




problems remain. In time series analysis we consider past  events,




resulting in  dependencies that extend  only in one direction. In  . spatial




analysis, even if on a straight line in the plane, dependencies  can




extend in two directions at  once.








Our aim  in this paper is to  summarize the essential theory behind




kriging. We will  examine  its fundamental definitions,  assumptions,  and



tools. We will be most interested in considering how useful the




technique is in small  sample estimation of environmental data. Some of




the simplifying  assumptions used in kriging are  not  always reasonable in




many  environmental settings. Without these assumptions a much  greater




amount of  data is  needed.  One difficulty is that often economic or other




considerations require that  we work with sample sizes that are extremely




small. Much of the traditional  work in  kriging has been in developing




techniques for  use with large  data sets. Little previous  work has gone




into techniques for use with small data  sets. Kriging procedures need




modification  and further exploration when used  with  small amounts of

-------
data. We will  examine  some approaches  to this  problem.








We will  present several examples to  show the essence of kriging. Some




examples connect kriging and time series prediction.  Other examples




illustrate  some of the unique problems of spatial  estimation  in  the




presence of a trend.  We  also have a particular environmental example




that will be carried  throughout the paper to illustrate  various




techniques.

-------
2.0 LANDFILL  EXAMPLE








To -Fix ideas on a  particular  practical problem we will consider data




collected on a land-fill  in Dade County(Miami) Florida -for the




Environmental Protection  Agency by Technos, Inc.(19S4). The land-fill




covers  an area of about  one square  mile and is  located  in the  Biscayne




Aquifer. This  aquifer is relatively  homogenous and  near  the ground




surface. It is tapped by  two  county well fields about 1.5  miles



down—gradient from the landfill.








It was  of  interest to determine the effects of the landfill on the




aquifer, and to measure these effects eight wells  were  drilled over  the




neighboring area (Figure 2.1).  Figure 2.2  shows measurements of pH taken




from samples  of  the water at 1O foot depths. We  would like to examine



the structure of these measurements and, with the assumptions,  tools and




techniques of kr-ging,  estimate the pH at the unobserved  positions. In




particular, we will compute  estimates of pH at  all  the "x" points in the




grid  of Figure 2.3. This appears to be an ambitious goal:  to  estimate




160 grid points from S  data  points. We shall see  that this estimation is




quite poor in the  corners of the map, far from the data,  and better in




regions surrounded by the data points. Furthermore,  kriging,  being  an




exact interpolator, will reproduce  the pH estimate  at well M7A exactly




as its  given value of 11.4. This large value of  pH may well be




questionable,  but kriging  will assume  all of the data are  correct and




produce an interpolating  surface.








As mentioned  earlier, the practice  of kriging, developed  mainly in

-------
         FIGURE 2.1



Dade County (Miami) Florida Landfill


WASTE
RECOVERY
PLANT






0
M2


SCALE
i?5fl t*.

BORROW PITS





NW 58TH STREET
LANDFILL 0
H



DORAL GOLF COURSE



ROADS



0
M8
0 0
4A MSA MSA

0
MQ
™\J

X
"^0-
^X
J
WELLFIELD/

e ~~
/
M7A /
,
0
M18

CANALS



r

.









,.;;.,
	 1
I.J.J
L.
•*•*•»
,,.,1
I.J..I
ii|

-------
February 1986
Washington,  D.  C.
This publication is available free upon request, in single copies. For further information write
to N. Phillip Ross, Chief, Statistical Policy Branch, Office of Policy, Planning, and Evaluation,
U. S. Environmental Protection Agency, 401 M Street, S. W., Washington, D. C. 20460.

-------
                            FIGURE 2.2
                     PH levels  at  10  foot  depths
7.
                                   o
                      0
                                    7.4
                         .8  6.7     6.6
                                   o
                                    6.8
                                                    o
11.4
7.0

-------
        FIGURE 2.3
Grid of Points to Estimate
X )
X )
X >
K )

x M2 )
0
X )
X )

X X Of? X
: x x x x
X X XX
X X XX
; x x xx
H4fi
X X X X °

X X X X
: x x x x

X * X X X X
—
X X X X X
X X X X X
X X XX X
\ A /"» Mft ^
0
< X X X X
MSA N6A
X ° X X° X X
V
X X X X X
X XX X X
u M9
X X X X X
X X
X X
X X
x £
UM
X X
x y
#\x x
X X "%£
>: x x /
\ • \.' •.;.•'/
l\ A .»/
7A /
X X " X
X X X
M16
X X
X X

X X
4* 	 W—
XXX
XXX

XXX
.1 . W .. i W 	 . "ii . . .
                                                      00

-------
geological settings, has  been with large data sets. Little  has been done




with small data sets.  The problems encountered with  data sets of this




size are considerable, and the confidence in the estimation can be low.




Given our  choice we would much prefer to use  more data, but this is




often not  possible. We are often  forced to  analyze small samples. Some



approaches to  this problem will be presented.

-------
                                                                          ro
3.0 DEFINITIONS AND  FUNDAMENTAL TOOLS









3.1  COVARIANCE FUNCTION









We  will begin by considering a stochastic process,  that is a collection




o-f  random  variables  Z(x) indexed by a parameter x.  The random variable




Z(x) could represent  the prices o-f gold and silver  at  a certain  time x.




In this case we  would be using time series analysis, where  x  is




typically a time index with a natural ordering, and Z  and  VAR(Z(x)) denote the mean and variance,  respectively  o-f




the random  variable  associated with the spatial position x.  A




fundamental  tool in the  study of stochastic processes is the covariance




function:
   K(x,y> =  E[  Z(x)  - E(Z(x»][ Z(y) - E(ZCy))]
This function measures the joint variation of the measurements at




spatial positions x and y. For example, if when,  on the average, a




measurement Z(x) at spatial position x is above  its  mean,  another




measurement Z(y) at position y is below  its  mean,  then the covariance

-------
                                                                       11





between  the two points would be negative. Positive covariances indicate




joint variability on the same side of the means, on  the average. The




magnitude of  this covariance is best considered by dividing it by the




standard deviation  o-f  both Z(x)  and Z(y). This would produce  a




correlation function,  p(x,y). Points with high positive  correlation would




tend to  stay above(or below) their respective means  together, on the




average. Points with high negative correlation would  tend to lie with




one  above  and one  below their means.








Calculation of the covariance function  requires  knowledge of the means




of Z(x) and  Z(y), so  that, geologists  have made special use of  another




measure  of joint  variation that can be calculated without the means.




This function, called the variogram,  is  the variance of the  difference




or increment  Z(x>—Z(y),  and  is written:








  2'Y -  Z(y)  )








^(Xjy) is  called the  semi—variogram. This function is often called the




serial variation function in older  statistical literature  see




Jowett(1952,1955,1957),  and  a special case of this serial variation was




investigated by Von Neuman, et.al.(1941)  as the mean square sucessive




difference.  It is often used in  tests of randomness.  The semi—variogram




also measures covariation. Two measurements that tend to be above  their




means  together would likely  have  a small amount  of variability in their




difference.  Measurements that  tend not to be correlated would have a




difference  that was quite  variable. As  we  will see, the variogram is, in




essence, the  negative  of the covariance function. Points with large




covariance  have  a small variogram value, and paints with low  covariance

-------
                                                                       12





have  a high  variogram value.








Several simplifying  assumptions help us to model  and  predict  spatial




behavior.  If  the covariance function of a stochastic  process depends




only on the  positions of two  points relative  to each other and  not on




their particular fixed locations, we  say that  the process  is  stationary.




To be more specific, a stochastic  process is  said to be stationary of




order  two if it has  a constant mean and  its  covariance function K(x,y)




depends only on the difference vector h=x-y  and  not  on the particular x




and y  chosen. We could then write  K(x,y) as some  C(h)  or C(| hi ,<*), where




I h|  is the length of the vector h  (distance between the vectors x  and y),




and oc  is the direction angle.  All points separated by the vector h  would




have  the  same covariance.  Figure 3.1.1  shows  an example of points




separated by the same distance and pointing  in the same  direction  and




would  thus have the same  covariance for a stationary or homogenous



process. This type of covariance might be applicable  to the  flow of



water  down a gradient, as  illustrated  in  the  next section.








3.2  SIMPLIFYING  ASSUMPTIONS








      3.2.1 ISOTROPY








Many  of the  equations developed in kriging are simplified greatly "if we




assume that  the process has  the same covariance structure in every




direction. Such  a stochastic process is called homogenous  and isotropic,




i.e.,  its probability structure does not change with a change  in  spatial



position or a rotation about  a spatial position. Isotropy indicates some




uniformity in every direction.  For  example, at midday  in a  rural

-------

Fi 9 ure 3.1.1  P o i nts a long  a  grad i e nt  where the
               co^ariance might depend'on  distance and direotii
                                                                             U)

-------
                                                                       14




neighborhood -the sky  might be  uniformly the  same shade o-F blue at  any




point just above the  horizon. If we consider an urban  neighborhood  close




to an industrial factory, this  uniform  blue shade might turn




considerably darker when we  look  towards the factory's smokestacks. This




lack of uniformity  is  called  anisotropy.








Isotropy is of great  use with  small data  sets where it is difficult to




model and estimate  covariances or variograms that  depend on both the




distance and direction between  two points in space. Much  more data can




be combined if we assume that the covariance between  two points depends



only on the distance  between the points.  The assumption of istropy is




not required to estimate the process with kriging; it serves  to  simplify




the equations and  calculations. In fact, in many applications such an




assumption would not  be realistic. Figure  3.2.1 shows two  directions of




measurement of  a flow of some  pollutant  down a  stream. Also  displayed




are typical sample variograms  for the  downstream direction (D) and the



across stream direction (A).  Notice that the  measurements across the
                                t


flow of the pollutant  exhibit a much greater variability at short




distances than the  measurements  down  the stream. The  downstream



measurements are much  more  nearly alike  and therefore have much less



variability within the pollutant flow. An assumption of  isotropy here




would not be reasonable.  The variability  in both  directions would have



to be taken into account.








The form of the  covariance function  for  a homogeneous, isotropic process



is quite  restrictive.  Ripley<19Sl) has pointed  out that  in the plane



isotropic  correlations are bounded below  by  -.403. Further bounds relate




isotropic  covariances to  Bessel functions, see Matern(196O). Unless

-------
                              VAR1QGRAHS
                                    ACROSS  STREAH
                                       DQHNSTREAH
                                       DISTANCE
FIGURE 3.2.1

-------
                                                                       16
explicitly stated, we will assume  that  the process we  are modeling is




both homogenous and isotropic.








       3.2.2 INTRINSIC HYPOTHESIS








Geostatisticians sometimes  encounter processes  with an increasingly




large capacity for spatial variation, i.e.  points  at great distance from




one another have an increasingly larger variability. When dealing with




these  processes with an unbounded  variance, geostatisticians  often  make




use of another property called the intrinsic property.  A stochastic




process is  said to be  intrinsic if the mean is constant and the




increment or difference Z(Xj+h)—Z(x1>  has a finite  variance which does




not depend  on  x-. This is equivalent to the increment being a  stationary




stochastic  process of  order two. This  same idea   is often  encountered in




time series analysis, where a process  may exhibit nonstationary




behavior, but its first order difference is stationary.  In a sense we



assume that the nonstationary behavior that produces the  unbounded




variances can  be  subtracted out  of the process. Thus even if the




covariance  function was  infinite,  the variogram,  being the variance of a



difference,  would  be  finite.








3.3 PROPERTIES OF THE COVARIANCE FUNCTION








For a  stationary  stochastic process with mean  p, the covariance function




C(h), depending  on a vector  h, has the  following properties, (see




Journel and Huijbreghts(1978)>:








1)  C(h)  is positive definite,  that  is  for any linear combination

-------
                                                                       17






y=a1*Z(x1>+ • • '-»-a *Z(x  ) the -function C(h) is such that the  variance of y




is always greater than or equal  to  zero.




2) C(0)=VAR(Z(x)> is nonnegative.




3) C(h)=C(— h)  the covariance function is an  even function.




4) I C(h)|  < C(O)  Schwarz's inequality.









Many  other  properties relating the covariance function to spectral




representations of stochastic processes can be found in Ripley(1981> and




Koopmans(1976).








The variogram is related to the  covariance function when the  latter




exists by:




           •Y<  h )  = C(O) - C(h)




Furthermore,




1) 'Y(0)=0,  although the limit 'Y(h) as h approaches zero is not necessarily




zero.




2)
The variogram may exist even  in the  case of a  process with infinite




variance (C(0)>,  and its calculation does not require knowledge  of the




mean. For these reasons it is often  favored by geostatisticians  for




those  processes with  "unlimited  capacity for variability," see  Journel




and Huijbregts(1978).








3.4 NUGGET  EFFECT









Dne would expect that as the  distance between  two  points in space x,




and x0 decreases to zero, the variability of the increment

-------
                                                                       18





ZOO—Ztx^,) would also decrease to zero.  This  continuity  o-f  the




variogram might well be expected in a region with  smooth uniform




measurements.  But the variogram  need not be continuous at  zero, and  this




fact has practical advantages to a geologist. This property is termed




the "nugget effect."  Its origins  can be  due to measurement errors  or  to




what Journel and  Huijbregts(197B) call micro—variabilities. Such




discontinuities are reasonable if, for example, a geologist's core




sample at a  given depth has hit a "nugget" of mineral,  figure 3.3.1.




Suppose  that this sample  core has removed a concentrated nugget of




mineral,  leaving behind a  region  perhaps devoid of any  other measurable




quantities. No  matter how close  another sample is taken to  the first,




quite different measurements would result. We would then measure




significant nonzero variability of the increments at arbitarily  short




distances.  Of course the  variability  between  the original core  sample




and itself  is zero,  but the variability  between two arbitarily close




points may be  quite  positive. This discontinuity  of the variogram at the



origin  helps  to rodel both continuous and discrete changes in spatial




processes.








3.5 VARIOGRAM  MODELS








Geostatisticians have  developed  a catalog of forms for isotropic



covariances and variograms.  There are  eight basic variogram  models




usually considered, see Journel  and Huijbreghts(1978):








(Here we will use  h=| h|  since we  are only considering isotropic models.)








1)  The  spherical or Matheron variogram  is given by:

-------
                 FIGURE 3.3.1
             \
THE NUGGET EFFECT
                                   Location of core sample
Underground
^Av/AVV^xVv/AlV^

               Nugget  removed
                                                                    V
                                               ^VVX
                                               \:
                                                             My
                                 x Surrounding region
                                   with only residue

-------
20
= C [<3h/2a)-h3/<2a3>] -for h<=a

= C for h>a
The value a is termed the range, a distance beyond which two points in

space are no longer correlated. The value C is termed the sill and is

equal to the variance o-f the process C(0).

2) The exponential model is given by: 'Y(h)=C-(l—exp(—h/a»

3) A somewhat inappropriate name is given to the Gaussian model, with

2 2
the form: (Y(h)=C-(l-exp(-h /a )). These last two models reach their sills

asymptotically.

Variogram models without a sill, useful in modeling data with

increasingly larger spatial variability with increasing distance, are:

4) the linear model: 'Y
-------
21

3) generalized covariance model: a specialized linear combination o-F odd

powers o-F h, used in filtering of trends in kriging. (See section 7.0)
Figures 3.5.1 and 3.5.2 show the relationships between some of the

variogram models. Figure 3.5.1 shows examples of variograms with sills,

and Figure 3.5.2 shows those without sills. Notice the different rates

at which the variograms approach the sill. The more rapid the approach,

the smaller is the covariance between distance points, and the spatial

dependencies remain quite local. The rate of approach and the sill value

are often related to estimate geometrically the parameters of the model.

The exponential model reaches 95/C of its sill at a value of 3a, so the

scale parameter, a, is taken to be one-third of this distance. For the

gaussian model, the scale parameter, a, is taken to be 1/V3 of the

distance of 95% of the sill. For the spherical model the range

corresponds to the intersection of a tangent line at the origin and the

line of the sill. This value is 2/3 of the range, a, so that a is taken

to be 3/2 of this distance. These purely geometric properties are used

extensively by geostatisticians to fit variogram models, see Journel and

Huijbregts(1978).

3.6 BOUNDS ON VARIOGRAMS

Variograms that increase too rapidly are often indicative of a process

with a nonconstant mean. For example, consider a one—dimensional process

of the form:

Z(x) = |l(x) + e )=O.

The variogram for such a homogeneous process would be given by:
-------
FIGURE 3.5.1

URRIQGRflM MODELS UITH SILLS
EXPONENTIAL
D
2 3
H (DISTflNCEl
-------
FIGURE 3.5.2

LINERR RND PDUER FUNCTION UflRIDGRHMS

H 1 H-
CM
o
0
234
H (DISTflNCEl
to
-------
24
2-Y(h) = E( Z - Z(x) >2

= E( e )2 + E( p(x+h> - p )2
The e term is simply the variogram o-F the residual errors. The p term is

the bias due to a drift in the mean. If, for example, the mean is

linear, i.e. ^(x)=x, rather than a constant, then this bias term becomes

2
h , and the variogram would grow quadratically. A quadratic mean would

produce a variogram that grows as the fourth power of h, etc., see

Starks and Fang(1982) for further discussion of the effect of

nonconstant means, as well as Neuman and Jacobson(1984) and Aboufirassi

and Marino(1983).

Matheron<1973> showed for spatial processes the fact that the covariance

is positive definite insures that for a vector h,

lim -YtfD/lhl2 = O

I h| ->oo

That is, -Y
-------
25

when one is below its mean the other is above its mean. This is termed

the hole—e-f-fect. The bounds mentioned earlier for isotropic covariances

limit the size of negative covariances and these hole effects, and

therefore considering only positive covariances is not as restrictive as

it sounds. Variogram models with hole effects are typically

exponentially damped sines and cosines.

The theoretical bounds on isotropic covariances discussed in section 3.4

are often taken as rules for distinguishing hole—effects from random

variability in estimated variograms. This variability in estimated

variograms needs further study. It is clear that a more sound

statistical technique is needed to identify hole—effects.
-------
26
4.1 ESTIMATION OF THE VARIOGRAM

Geostatisticians use the term structural analysis to denote the

estimation, fit, and study of variogram models. Empirical variograms are

calculated from a given set of data and used to fit the various models

presented earlier. The empirical variogram is calculated as:

— -- 2
CZ(x.-t-h) - Z(x-)l
nh i=l 1 x
where n. is the number of pairs of points separated by the distance h.

Figure 4.1 shows a sample variogram calculated from the simulated

istropic process discussed in the appendix. A cubic spline has been fit

to the data, merely to connect smoothly the points for easier viewing.

Notice that the variability increases as the distance between points

increases. This corresponds to a decreasing correlation with increasing

distance. Notice the approach to what appears to be a flat sill value of

about 25, around a distance of about 4 units. Notice also the random

variability as the values fluctuate and do not increase in a strict

monotonic fashion. Whether a strict monotonic variogram should be fit in

these situations, or one with a hole effect (as with the splines)

deserves more study. Most practioneers would dismiss hole effects in

this example.

Some studies of the statistical behavior of the empirical variogram have

been undertaken by Borgman and Davis(1978,1981). They have shown that,

for processes with a finite variance, *Y(h) is asymptotically normally
-------
CD

CO
in
CM
o
CM
FIGURE 4.1 VARIOGRAM CALCULATED FROM SIMULATED DATA

H 1 H 1 1 I-
O
0
H (DISTRNCE)
-------
28

distributed. They have also produced, for one-dimensional normal

processes, tables of the exact sampling distribution of the variogram by

numerical inversion of the characteristic function.

4.2 ROBUST ESTIMATION

Small sample estimation can lead to estimates that are quite erratic.

Estimators that minimize the effects of single influential data points

might prove useful in such problems. Cressie and Hawkins(198O)

investigated several of these, so—called, robust estimators of the

variogram. Their approach was to consider variogram estimation, not as a

problem of variance or scale parameter estimation, but rather as a

o
problem of estimating the mean of the quantities W.=CZ(x-)—Z(x-+h)>

where Z(x.) are normally distributed. Power transformations of Y. to

symmetry and studies of cumulants suggested that

1/2
Y^l Z-Z(xi+h)| is approximately normal with a skewness of -OS and a

kurtosis excess of -.52. Several estimators of the point of symmetry of

the Y. distribution were then considered, including the mean, median,

trimmed means and M—estimators of Huber, Tukey, Hampel, and Andrews (for

studies of robust estimation see Hogg(1974), Huber<1972)). The mean

Y of the Y. is approximately normally distributed, by the central

limit theorem, and through series expansions Cressie and Hawkins found

that for n=n. ,
E[Y4/2'Y
-------
29
Z =.6Z ,+u where the u terms have various distributions with
n rt i f\ r?
progressively heavier tails. Cressie and Hawkins interpreted these

realizations as traverses across a spatial process and considered the

estimation of 'Yd) only. Their findings revealed that the M-estimators of

the mean of Y. are best, but considering its good performance and ease

of calculation, suggested that the variogram be calculated using:
= Y A.457 + .494n * + .O45n 2)
where Y = ^ £| Z(xi+h)-Z(xi>| 1/2 with "=nn

Figure 4.2 shows the Cressie and Hawkins variogram for the simulated

process discussed in the appendix. Notice that for the one distance that

they investigated (h=l) their estimation of 'Y(l) is slightly larger than

the empirical variogram, but for all other distances (h>l) it is

uniformly lower.

In one— dimension Sharp(1982) has examined estimation of variograms with

the maximum entropy method from time series analysis.
The previous techniques have brought much of modern statistical

estimation theory to the study of the variogram, but unfortunately most

techniques used in practice are either grossly subjective or blindly

numerical.

Journel and Huijbregts(1978) present several case studies in which
-------
CO
in
Cxi
(XI
CI5
FIGURE 4.2

EMPIRICnLtTDP] flND CRE5SIE RND HRUKINS UflRIDGRflMS

-f

2
3 4
H miSTRNCE]
I

6
U)
o
-------
FIGURE 4.3 GEOMETRICALLY FITTED VARIOGRAMS
CO
_
CM
Cxi
D 1
7
to
r-J
H [DISTRNCE1
-------
32

simple geometric properties of variograms are used to estimate its

functional form. For example, subjective, graphically chosen values for

the sill and the range are used in the variogram forms listed earlier.

Figure 4.3 shows the results of fitting the spherical, exponential, and

gaussian variogram models to the simulated data presented earlier. The

sill for all of the model was chosen graphically to be 25.25. For the

spherical model the range parameter, a, was calculated, following

Journel and Huijbregts(1978>, as three—halves of the distance (h) value

obtained from the line tangent to the variogram at the origin. This

graphically fit variogram fits the first two values quite well at the

expense of the remaining data. For points from 2 to 4 units apart, this

spherical variogram suggests more variability and therefore less

correlation between these points. The exponential scale parameter was

taken as one-third of the range(taken to be 4 units). The gaussian scale

parameter was taken as 1//3 of the 4 unit range.

The gaussian fit seems to pass more closely to the data values beyond a

distance of 2 units, while suggesting more highly correlateddess

variable) values under 2 units apart. The exponential fits well only

beyond the range. Since local behavior is important for most estimation

methods by kriging, probably the spherical or gaussian models would be

chosen. The expected behavior at the origin of the process being studied

would decide between the two, with higher expected correlation for

extremely close values going in favor of the gaussian model.

Geostatisticians have defended these graphical practices with the

evidence of much empirical work. They argue that estimation of the

variogram is not their primary goal, but only a tool for estimation by
-------
33

kriging, see David(1975). They are also in the unique position of being

able to compare directly their predictions with the true measurement. If

an estimate of the mean grade of a mineral is obtained for a particular

point they can then mine that region and directly measure the mean

grade. This method of direct checking of the estimation has been a

tremendous benefit to the development of the practice of kriging.

Unfortunately, these same checking procedures are impossible to obtain

in estimating, say, acid rain deposits.

4.4 NUMERICAL ESTIMATION PROCEDURES

In the above graphical procedures the choice of suitable sills and

ranges was surely in the eye of the beholder. The biases of different

researchers would produce different variograms and result in different

kriging estimates. To eliminate these biases, many regression techniques

have been used to fit the variogram. Devary and Doctor(1981) used

non—linear least squares to fit the gaussian model to hydrologic field

data. Since variogram estimates at different points are likely to be

correlated, the using of techniques that assume uncbrrelated data is

questionable. If one had knowledge of the correlation between variogram

estimates at different distances, these could be incorporated into a

generalized least squares estimation procedure. Unfortunately, these

correlations are not known. Figure 4.4 shows the results of fitting

variogram models to the simulated data presented earlier using nonlinear

least squares. Notice that each numerical technique seeks its own sill

level in an attempt to obtain the best fit.

Some researchers have tried to improve on these fitting procedures.
-------
FIGURE 4.4 NUMERICALLY FITTED VARIOGRAMS

CD j L 1 1 ] h
in
CD
CM
D
0
2 3
5
6
-H

7
U)
H miSTRCEl
-------
35

Aboufirassi and Marino(1983) determined the range of a spherical model

graphically, but estimated the sill using cross—validation. They began

with a sill value of one, then removed and reestimated(using kriging)

each data point from the spatial data. The resulting estimation variance

can be shown to be 1/C times the "true" estimation variance when the

variogram with the "true" sill value of C is used. For each data point,

the difference between the observed and estimated data values was

divided by individual estimation standard deviations to produce what

were called reduced errors. The sample variance of these reduced errors

was taken as the value of the sill. Sample variability of the variance

estimate was not considered. They further used geometric properties to

fit a gaussian model.

Other techniques used in the kriging of nonstationary processes can be

used to estimate generalized covariances and thus eliminate the need for

direct variogram calculation. We shall see these techniques in section

7.0

4.5 SMALL DATA SET VARIOGRAM ESTIMATION

In small data sets these techniques are difficult to apply. In our

landfill example, not only do we have a small amount of data, but the

data arrangement is extremely unequally spaced. For a given distance h

there may be at most one (n. =1) pair of points separated by h. In larger

data sets we could increase the number of terms (n.) by grouping

together terms in neighborhoods of various distances. The average of

these terms would then be assigned to the average distance of the point

pairs. In large data sets this averaging procedure has been investigated
-------
in
CM
CM
it:
I.—V
n
FIGURE 4.5.1

i
INDIVIDUAL VARIOGRAM TERMS

.__ i i i
r ~i • ~ • r

D
n
n o
FOR pH
D
10
?0
n
3D
1Q
BO
U)
cr»
H
-------
37

by Myers, et.al.(1982).

To illustrate this technique, Figure 4.5.1 shows the values of W. =

o
•tZ(x.) — Z(x-+. )>"*" for all 28 distances from our original 8 pH

measurements from the landfill example. Notice the large variability in

these terms. Figure 4.5.2 dispays the results of averaging disjoint

collections of about 6 of the W terms. A gaussian variogram has been

fitted to the resulting 6 data points using nonlinear least squares.

With such a small data set there is a drastic reduction in the useable

data to estimate the variogram. Furthermore the resulting variance, as

given by the sill, is more than a factor of two greater than the simple

variance of the pH measurements.

This is one technique that can be used to obtain an estimated variogram

from such a small set of data. Some other approaches that have proved

useful to some degree are suggested below.

1) A scatterplot smoothing of the W terms with respect to' h could be

applied. This has been done with the robust smoother, LOWESS, which

stands for LOcally WEighted Scatterplot Smoother. This smoothing

technique was developed by Cleveland(1978). It is essentially an

iterative regression prediction weighted most heavily by neighboring

points and least heavily, by points with large residuals. The general

results are a smoothed estimation of the variogram from the (h,W)

scatterplot that shows a linear increase for small h, followed by a

relatively flat range for middle—sized h, ending in a slight downward

turn for large h. The degree to which this smoothing could estimate the

true underlying variogram needs further investigation. A simulation of
-------
CD
FIGURE 4.5.2

GRUSSIRN UflRIQGRflM NDNLIN REG FIT TD PH

H 1 1 1 H——h
GO
3EI
cc
CE
CD
1Q
IB
20 25 30
DISTflNCE
35
45 50
U)

00
-------
39

this estimation procedure with small samples and a known parent

variogram is needed.

2) Other robust regression techniques could also be applied to the (h,W)

scatterplot. A discussion o-f many iteratively re—weighted least square

techniques that can yield a variety o-f -fitting criteria is presented in

Hosteller and Tukey(1979).

3) A resampling plan such as the jackknife or the bootstrap could be

used to obtain estimates o-f the variability of any o-f these estimation

schemes. The resulting variance could be used to place confidence

intervals on the variograms and in turn introduce the estimation error

into the kriging estimates.
-------
40
5.0 ESTIMATION BY KRIBINB

5.1 SIMPLE KRIBING

Kriging is the name given by Matheron(1965) to the calculation of best

linear unbiased estimators of various functions of spatial processes. It

is the time series prediction theory of Wiener<1949) and

Kolmogorov<1941) applied to spatial data, and as such is estimation in

the presence of correlated errors. The basic statistical technique is

widely used by statisticians as generalized least squares. The

application of this technique has been studied extensively as a time

series prediction problem by Srenander and Rosenblatt(1957),

Koopmans(1974), and Box and Jenkins(1976), to name only three.

Consider a spatial process of the form

Z(x) = p(x) + e

2) ^l(x)=unknown constant

3) [l(x)=unknown polynomial in space.
-------
41
For observed values Z(x~),...,Z(x ) we wish to estimate(interpolate)

the value of the process at an unobserved point x by a linear

combination o-f the observed values, i.e.
o,

Z(x> =
The coefficients A- are chosen so that Z(x> is:

1) unbiased, E[Z(x>] = Z(x>, and

2
2) mininum variance, ) -
where
-------
42

where K is the covariance matrix Her. -3, x=' and
This system has a solution provided that K is positive definite. The

minimal variance is given by:
= Var = 0, consider the following time
-------
43

series prediction problem. Let Z(x) = e(x) + ,5e(x—1) be a

one—dimensional first order moving average time series. The e(x> terms

are taken to be independent, normally distributed error terms with a

mean of zero and a variance of one. The covariance of this process is

given by:

5/4 for h=0

Cov(Zx,Zx+h) = 1/2 for h=l

0 for h>2
Consider a sample Z,,...,Z- of size n=4 from this process. The

covariance matrix K for this sample is given by:
I 5/4 1/2 0 O I
I 1/2 5/4 1/2 01
I O 1/2 5/4 1/2 |
I 0 0 1/2 5/4 |
Suppose that we wish to predict Zg. The covariance vector

(o-j ,,...,'=<-.047, .117, -.246, .498)'. The estimate

of Zg then becomes Zg = -.O^721 + .117Z2 - .246Z3 + .498Z4. This

estimate has a mean of zero, since each Z^ has a mean of zero.
-------
44
This is kriging and it is said that we have just "kriged" an esimate of

Z5-
Note the relationship between Box and Jenkins time series analysis

prediction of Z^ and kriging, Box and Jenkins (1976). They assume an

infinite past of available data and estimate Z— with weights (.5, -.25,

.125, -.0625, ... ,(-.5)j+1,...)'. Our kriging weights of (.498, -.246,

.117, -.047)' are close and differ due to the fact that we are using

only four data points to predict Zg.

5.3 LOCAL vs. GLOBAL ESTIMATION

The above approach assumes the use of all of the data, and since K is an

nxn matrix, large samples can severely limit the practical use of this

technique. Several approaches have been taken to limit the size of the

matrices involved and thus minimize the calculation. Journel and

Huijbregts(1978) and many others suggest using local estimation by

considering only data points close to the estimation point x. If the

covariance between nearby points decreases rapidly this technique can

limit the size of the kriging system of equations. For isotropic

processes the matrix K is completely determined by the relative geometry

of this local neighborhood. If this geometry remains constant, say by

moving over a regularly spaced grid, then the A constants will not

change for estimating similarly positioned points. For example, if the

dots in the figure below represent the only data points used to estimate

the points represented by x's, only one K matrix needs to be calculated

for one x value. Estimates of the others can be obtained by rotation.
-------
45
x x
X X
As long as we are restricted to a regular grid this simplification is

great, but it also requires a large amount of regularly spaced data.

Sparse regions are not conducive to this procedure. Small data sets can

easily make use of all of the data in the covariance matrix. In our

landfill example with 8 data points, we will need to solve a system of 8

linear equations, if we assume the mean is zero. More elaborate forms

for the mean of the spatial process will add more equations.

For geostatisticians, the large and regular data requirements for

kriging are net 'Lhat restrictive when you consider that a mining

engineer must make a decision to census the whole spatial region.

Compared to this eventual census, large sample sizes seem to be

necessary and probably cost effective.

The local neighborhood approach is somewhat less useful for

interpolating environmental variables such as acid rain or groundwater

pollution. Measurements of these variables often have large regions of

irregular spacing. For these applications the global technique, using

all of the data, which can span the sparse areas, is needed.

5.4 KRIGING WITH H EQUAL TO A CONSTANT

Now let us consider the estimation of Z(x>=n when n is a
-------
46
nonzero constant. For our estimator Z(x)=EA-Z(x-> to be unbiased we need

the additional restriction that £A-=!. This requires that we minimize:
E[Z(x)-2(x)]2 + 2v[
where v is a Lagrange multiplier. The resulting system is
K e I I A I I T. I
Art

e' O I I v I I II
where e = (!,...,!>' of dimension Ixn. The minimal variance is given by:
ZCx:> - !k'K ^x) - v
Geostatisticians prefer the variogram to describe spatial dependencies,

since the variance o-F the process may not exist. The kriging system can

be written in terms of the variogram, but for reasons of numerical

stability the covariance form of the kriging system is best. Under the

intrinsic hypothesis of stationary increments this system can be written

as:
'ij3 • I I x I ^ I ^ I

e' O I I u I I0|
-------
47
where *Y. . is the semi—variogram evaluated for points x. and x., that

is •y,Z

5.5 EXAMPLE OF KRIBING WITH A CONSTANT MEAN

As a simple example of this technique, let us again consider a

one—dimensional process and calculate the kriging system.
Let Z,, = e,. + .5e 1 + 15
*\ A yv~"* X
This is a simple, first order moving average process with a mean value o-f

15 and error terms € with a mean o-f zero and variance o-f one. The

covariance for this process is given by:
5/4 for h=0

Cov = 1/2 for h=l

O for h>2
Consider a sample Z,,...,Z4 of size n=4 from this process. The

covariance matrix K for this sample is given by:
5/4 1/2 0 0 I
1/2 5/4 1/2 0 I
0 1/2 5/4 1/2 I
O O 1/2 5/4 |
-------
4'8
Suppose that we wish to predict Z,-. The covariance vector

t(7lx'*"» *or x=4 is *0,O,0,l/2>'. The kriging system then

becomes:
5/4
1/2
0
0
1
1/2
5/4
1/2
0
1
O
1/2
5/4
1/2
1
0
0
1/2
5/4
1
1
1
1
1
0
I X, I 10
I x- I 10
1 x_ | = | 0
I x. | 11/2
I v I 11
with a solution '=(.164, .244, -.119, .710)'

Notice that E^ = 1, so that our estimator Z5=.164Z1 + -244Z2 -
.119Z-T + .710Z4 is an unbiased estimator o-f Z~. Note that the

coefficients are quite different from the p(x)=O case. Since the mean of

each Z. is equal to 15, the constraint £,\-=l insures that the mean of
.'*.
Z(5) is also equal to 15.
-------
49
6.1 UNIVERSAL KRIGING

Now let us assume that |i(x) takes the form of a spatial polynomial with

unknown coefficients. Kriging with this form of a mean has been called

Universal Kriging.

In universal kriging of a planar process the mean, |i(x> at a vector point

x^Xj^x—)' is assumed to be represented by a linear combination of the

form
, f <

= * *
where the fg(x) are known, linearly independent functions and the a.'s

are unknown. Usually the f .(x) functions are taken to be monomials of

order k. For example, for k=l, Jj=3 and f.,(x)=l, f7(x)=x1, f^.(x)=x0.

These are the p. oper variables for the construction of a plane of the

form a,. •+• a0x1 + a-^x,.,. For quadratic surfaces with k=2, we have J0=6
X *- 1 *-f ji. . A.

"2 "?
and f•(x)=l,x1,x-, xt, x^, xtx0, for k=3, J^=1O, for k=4, J.=15,
X X jiL x j&. X ^ i^ *T

etc., where J. = ^ i. If the estimation is done globally this

functional form for [i(x) must apply over the whole spatial range. On the

other hand, differing polynomial coefficients can be estimated for

subregions if the kriging is done locally.

To estimate values of a spatial process with a polynomial trend as the

mean, classical statistical techniques yield the solution. In a global

setting, it seems appropriate to estimate the polynomial trend, subtract

it from the data, and then apply kriging for a process with a mean of
-------
56

zero, to the residuals. This is indeed the optimal solution provided

that the trend is also estimated with generalized least squares

accounting for the known covariance structure. For example, consider our

A.
process Z(x) = (Jl(x) + £. If [l(x> is an estimate of the trend from

vX
generalized least squares, we form W(x) = Z(x) — |j(x) and apply our

kriging estimation scheme for a process with a zero mean as seen

-••s.
earlier. If W(x) represents the kriging prediction for W(x), then the

.A. .A. ^
resulting estimate of Z(x) is given by Z(x) = W(x) + |J(x).

XV. v^.
The variance of this estimate is not Var(W(x»+Var] subject to the unbiased

condition given above. This results in the following system of equations

involving Lagrange multipliers v. :
for i=1'~"n
-------
51

and

£x-f« - IA^. -
This technique can also be applied locally, in which case different
-------
52
forms of the given polynomial trend are fit to different regions. One

restriction of this technique is that adequate amounts of data must

exist in each of the local regions. This often restricts this local

application to regularly spaced sampling grids.

6.2 EXAMPLE OF UNIVERSAL KRIGING

As an example of universal kriging in the plane, let us consider a

regular grid of points that we will index by (i,j). Define a

two— dimensional process by:
a. •+• a^i •*• a--j
where again the e terms are independent, normally distributed error terms

with a mean of zero and a variance of one. Notice that the mean of this

Z process is given by a plane with unknown coefficients a-. The

covariance structure in this process is generated by the common terms

that define each Z(i,j). These terms arise from the error terms

associated with positions immediately above, below, and to the left and

right of the (i,j) point, see Figure 6.2.

For this process the covariance, which depends only on the distance

between the two points, is given by:
17/16 h=O
1/4 h=l
C( h ) = 1/32 h=v'2
1/64 h=2
0 h>3
-------
euj+i)
€(I-1J) • €(IJ) 6U+1J)
6UJ-1)
FIGURE 6.5.2 Structure of Error Terms in example
of Universal Kriging
Ul
-------
54
Consider the following array of data:

(0,0) *

(!,-!> * (2,-l) *

* (2,-2) x

The points with the symbol <*) are points at which we will assume that

data exists. The point with the symbol (x) will represent a point that

we wish to predict from the other four. For convenience of reference we

will label the points as follows:
2 3

4 (5)

The covariance matrix of these four data values is given by:
I 17/16 1/32 0 0
! 1/32 17/16 1/4 • 1/4 |
I O 1/4 17/16 1/32 I
I O 1/4 1/32 17/16 I
By calculating the distances between the fifth point and the four data

points, we find the covariance vector of the fifth point with the four

data points is KJx)' = ( 0 , 1/32 , 1/4 , 1/4 X

The unbiasedness condition imposes restrictions on tha coefficients of
-------
55
the four data points that are used to estimate the fi-fth. In particular

to estimate the intercept of the planar mean we need the restriction

that Xj+X2"l"^3"l"^4 = 1- To estimate the slope in the (i) or first coordinate

direction we need the restriction that Ox«+l \-+2\^+lx- = 2. This is merely

a x linear combination of the first coordinates of the data points set

equal to the first coordinate of the point to predict. Similarly, for

the second coordinates we have the restriction that

0\1-lx2-lx3-2x4 = -2.

With these restriction the system that we must solve is given by:
I17/
16
1/32
O
0
1
0
0

1/32
17/16
1/4
1/4
1
1
-1
0
1/4
17/16
1/32
1
2
-1
0
1/4
1/32
17/16
1
1
-2
1
1
1
1
0
O
0
O
1
2
1
0
O
0
0 II Xl
-l II x2
-l II x3
-211 X4
0 J| Vj
0 | 1 v0
0 II v"
1 0 1
1 1/321
1 1/4 |
1 1/4 |
1 1 1
1 2 1
1 -2 1
The added restrictions are included with Lagrange multipliers
The x coefficients that solve this system are (x1,x2»A3»X4) =

(— .305,— .084f.694,.694). If we arrange these coefficients in the pattern

of the original data we obtain:

-.305

-.084 .694

.694

Notice that the largest coefficients are given to the two data values

immediately above and the left of the point to be estimated. This is not
-------
56

surprising when you consider the manner in which the process is

constructed. What is surprising is that the next largest coe-f-ficient in

absolute value is given by the point that is farthest away from the

point to be estimated. The low weight applied to the middle point is

because much of the information that it provides is already well

represented by the points with the largest weights. The farthest point

brings information that is not used in the construction of the others.

6.3 CIRCULAR ESTIMATION WITH UNIVERSAL KRIBIN6

An important assumption behind kriging is that the covariance matrix of

the process is assumed known exactly. If it is estimated the optimal

properties of the estimator are questioned. Furthermore, even the

estimation of the covariance is not straightforward. We first must know

the covariance of the residuals e(x> before we can estimate the trend

accurately. On the other hand, we need to know the trend before we can

estimate the covariance of the residuals. The circular nature trend and

residual covariance estimation is a problem that has produced several

approximate solutions.

In applying universal kriging, Journel and Huijbregts(197B) suggested

that the sample covariance of the Z(x) values be used in place of the

true residual covariance matrix. If the trend is locally constant, this

provides numerical estimates, but again optimality is questioned. They

also suggested and Aboufirassi and Marino(19S3) demonstrated that if the

trend is mainly in one direction, the perpendicular direction may be

well behaved enough to allow calculation of a sample residual

covariance.
-------
57
Devary and Doctor(1981) calculated the residual covariance matrix from

the residuals of a standard least squares estimation of the trend. The

difficulty here is that least squares assumes uncorrelated errors, which

is quite inconsistent with the existence of a nontrivial covariance

matrix of residuals.

Meuman and Jacobson<1984> suggested an iterative approach to the

simultaneous estimation of the trend and the residual covariance. They

first 'assumed the order of the drift is k=l, and then estimated the

drift by standard least squares. They then calculated the variogram of

the residuals. If this variogram increased too rapidly or exhibits

anisotropic behavior, i.e. different covariance structure in different

directions, they would increase the order of the drift and re-estimate

until the variogram of the residuals had the "desired properties." The

resulting covariance matrix was then used to estimate the trend using

generalized least squares. This results in new residuals, with their own

covariance matrix, which were used to re—estimate the trend. This

xs
iteration was repeated until the estimated drift H
-------
58

remove nonstationary behavior and estimation of the covariance -from

resulting standard forms. They began by defining generalized increments.

For a one-dimensional process the first order difference Z(x+h)— Z(x) is

a zero order increment that will filter out a trend that is constant.

Notice also that for data values x,,...,x the sum EB-Z(x-) where

E8-=O will eliminate a constant in the data. Much of the literature of

generalized covariances confuses notation between the kriging weights x

and these generalized increment weights B by using the symbol x for

both. We have chosen to eliminate the confusion here. For a point

x-=
-------
59
The most important result -for the application of these increments is

Xv /N

that if Z is a kriging estimate of Z(x) then Z(x>—Z(x>, the

estimation or kriging error, is exactly a generalized increment. Written

as -Z(x) + Ex-Z(x-) we see that ^=-1 and 8-=,\- for all the remaining

coefficients.

Generalized increments turn out to have covariance functions of a

standard form. In Matheron(1973) intrinsic functions of order k were

defined as functions for which k order increments are weakly

stationary. Furthermore, it is possible to define a function K(h> such

that the variance of the increment is a quadratic form in K(h), that is:
This has the same form if K was an ordinary covariance function for the

Z(x) process, rather than its increments. It is called a generalised

covariance function. Matheron<1973> showed that any isotropic

generalized covariance function defines a class of functions which are

all equivalent up to an addition of an arbitrary even—powered polynomial

of degree less than or equal to 2k. For example, for k=l, K(h)=-h and

2
K(h)=—h+h are equivalent. The following table lists the essential odd

powers for generalized covariance functions of several low orders.

Order (k) K(h)

0 C6 + aQh, with aQ<0

1 C& + aQh + axh3, with a^a^O

2 CS + aQh + atn3 + a2h5, with aQ,a2
-------
60
and in R a^ > -

and in R a1 >

For all orders C8=C if h=O, and O otherwise.
The constraints insure that the variance expressed in terms of K(h) is

nonnegative.

Suppose that we choose a generalized covariance from the forms given

above, then delete each value Z(x.) in turn and calculate(krige)

estimates of Z(x.) from the remaining points. The resulting estimation

(r)
errors (indexed by r), Z(8 >=Z(x->—Z (x.) are collected. The variance

of the kriging errors is given by:
Var= Z I
The generalized covariance function K(h> is of the form:
K(h) = CS •• ^ -„.

p=0 P
So that the Var(Z(B(r>» becomes:
where
for p=O,l,...,k
Delfiner(1976) noted that the variance is linear in the parameters C and
-------
61

a , so that regressing Var(Z(8(r>»=E[Z(B(r>)]2 upon T£ and T£ +1, for

p=O,l,...,k will produce estimates of C and the a 's. Delfiner

suggested performing a weighted regression with weights inversely

proportional to the squares of the T's. Devary and Rice<1982) found the

regression to be quite robust with respect to these weights and suggest

performing standard regressions on all of the generalized covariances of

order k. For example, for k=l they would use regressions with K(h)=CS,

C&+aQh, C5+aQh+a..tV~', C6+a.h , aQh, and C5+ajh^'. Any given regression

model results in C and a coefficients that are reused to estimate new

residuals and Tr values, which are then used to re—estimate the C and

a coefficients. The iterations are continued until convergence. Devary

and Rice found the convergence to be quite erratic, sometimes converging

to inadmissible values.
Starting this iteration requires a determination of k, the order of the

trend. Devary and Rice(19S2) suggested starting with a generalized

covariance function, K(h)=-h, which is valid for all values of k. The

best choice for k depends on the behavior of the residuals obtained by

deleting and re—estimating each data point in turn. For various orders

of k, they examined mainly the absolute residuals and minimum mean

squared residuals to determine k. For the former, absolute residuals of

all orders considered(typically only k=O,l,2) were ranked by magnitude

and the value of k possessing the smallest average rank was chosen. Once

k was chosen, the iterative regression proceeded.

It may well happen that several sets of generalized covariance

parameters are obtained from the various models of order k tested. To

choose the best set Delfiner(1976) recommended comparing the
-------
62
estimation(kriging) variance, E[Z(& r )]"1" to the quadratic form in
K(h) that de-Fines the kriging variance. If the correct K(h) is chosen,
-s
the ratio of these values should be close to one. No distribution

results were quoted for this ratio, only rules of thumb i.e., the ratio

should lie between .75 and 1.25 for a parameter set to be selected. It

seems odd that a ratio is given bounds that are arithmetically symmetric

about 1 rather than the more reasonable bounds of 3/4 and 4/3.

There have been some objections and reluctance to use this technique

because the resulting covariance function is difficult to interpret. It

applies to the differenced process and not to the original data. Most

practioneers prefer the structural analysis that requires the

straightforward estimation and use of a variogram or a covariance

function. Much can be learned about the variability in the data from

these tools.

The generalised covariance technique with small data sets is probably

not very useful. The differencing that must be performed and the

resulting regressions have been seen to behave quite erratically even in

large data sets. The use with very small samples would certainly not

alleviate these problems.
-------
63
8.0 A ROBUST TREND TECHNIQUE

As noted earlier in the context of variogram estimation, small data sets

and their variable behavior suggest the use o-f robust procedures of

analysis. Cressie<1984) suggested a robust technique for estimating the

trend. Using techniques from Tukey's<1977) exploratory data analysis,

Cressie estimated the trend using what is called median polish. In this

technique medians of the values of a two—dimensional process on a

regular grid are calculated across one direction, say the rows. These

row medians are then subtracted from each value in the row. Then medians

of these residuals are calculated down the columns and then

subtracted from each value in the column. This process is then repeated

*•
until there is no further change. Cressie suggested using this technique

to estimate the trend at position general idea of using a robust method of estimating the

trend and then standard kriging to estimate values of the process would
-------
64

be quite useful. What is unknown is the form of the resulting estimation

variance when robust trend estimation is used. More work in this area is

needed.
9.0 KRIGING AND SPLINES
It has been shown that, for a suitable choice of the generalized

covariance function, the estimates of the values of a random process

obtained by kriging are equivalent to a deterministic interpolation of

those values by a cubic spline, see Wahba and Kimelhart(1970),

Matheron<1981), Dubrule(1983), and Watson(1984). The spline method

examines a class of functions that interpolates the data and minimize

the integral of their squared second derivative. This is thought of as a

measure of smoothness. The second derivative measures the rate of change

of the slope, arJ the faster the slope changes the rougher the function.

Squaring and integrating produces a total positive measure of

"roughness" that is to be minimized. Watson(1984) presented a simple

example to show the relationship between kriging and a special function

derived from the study of differential equations of splines, called a

Green's function. The Green's function, obtained as the solution to an
»
interpolating spline differential equation, and the covariance function,

used in kriging, are closely connected. It turns out that the spline

Green's function is simply an uncentered covariance function of some

random process. The converse though is not true. There may well be

random processes with covariances which are not Green's functions. This

makes kriging somewhat more general than interpolating splines.
-------
65
In R the generalized covariance that produces kriging estimates that are

3 "*
equivalent to interpolating splines is K(h)=| h| . In R ,

K(h)=h^log| hi and in R , K(h)=| hi, see Matheron(1981). Again, since these are

generalized covariance functions they define an entire class equivalent

o
up to the addition of an even polynomial. In R Davis and Grivet(1984)

used:

K+ [h2/(2R)log
-------
6-6
10.0 KRIGING THE LANDFILL pH DATA

The empirical variogram that we will use in kriging the pH data was

calculate in section 4.5. In that example the averaging o-f disjoint

distance ranges overestimated the variability in the original data. We

must expect a large variance of the kriging estimates. The covariances

between two points are calculated using the gaussian variogram fitted in

figure 4.5.2. The covariance matrix depends only on the distance between

the sampled data points, and the covariance vector depends only on the

distances between the point to be estimated and the given data

positions. These covariances are arranged in a matrix equation as

illustrated in section 6.2. Assuming a constant mean, this matrix

j-x
equation is solved for the x coefficients and estimates Z(x) are

j*>
calculated by Z(x)=E\-Z(x-> where Z(x-) are the pH measurements at

each of the sample points. Figure 1O.1 shows the contours of the kriging

estimates obtair«=d using this variogram (represented as a covariance

function). The contours follow the lines of equal pH. As in time series

analysis we will expect a prediction that lies far from the data to be

equal to this mean with a variance equal to the estimated variance of

the process. The closer we come to a cluster of data the more the

estimate is influenced by the data, and the smaller the estimation

variance. Recall that kriging is an exact interpolator, so that the data

values are reproduced exactly. Notice that the contour line of pH=7

passes directly through the location of well M1O. The large pH value of

11.4 causes a rather rapid falling of the coutours down to the more

reasonable range of 7. This produces what could be regarded as a small

"puddle" of highly alkaline water near well M7A. As mentioned in the
-------
FIGURE 10.1
Contours of PH obtained by Krlglng
7.
'- ' \\\

"'. ''• 'i\\
'•- \'-S.,Ss
» » •-.. ^y
Q • •. .
* . . • .
* .. .,
0 •

6
0 0

,8 6,7 .-6:6
« ».

•I N I II
." ?

6.5
•6,8
I I • • • •
-------
FIGURE 10.2
Contours of the Krlglng Variances of PH
M,««'**t*""HM««i»MMi«im,Mtll .MM"'
„-"'••. ' \
. - -" .. '• - „ • v \
»• •
' '
, .5

0
7,4- \
1 ' 11'. 4 ' ; '
i , . .

i i « •
i i • •
-,&.6,7 6,6. /
• * •
o • • • i
v f I • I
.11 ii it ••
I

'
s .'•.»/••
\ • '-.**..
C flf
O.v" / •
• ' .. •
. - • ..-• >
...•••• M---" /
„.»...."-'" ^/^
t • • ••
A 'L_
00
-------
69

introduction to these data, this value may well be in error but kriging

is an interpolation technique and will predict the data as given.

Figure 10.2 shows the contours of equal kriging variance. The variance

at each well is zero, and it increases as the distance away from the

wells increase. Far from the data cluster, in the corners of the map,

the variance is largest. It must be emphasized that this is the variance

of the estimation assuming that the variogram or covariance used is

exact. It does not account for the variation in the covariance

estimation. This additional variance would decrease somewhat the

confidence in the estimates.

In kriging the pH data from the Dade County landfill we are subject to a

very small data set. This is not at all uncommon in many pollution

monitoring situations. Data collection from groundwater is quite

expensive, and economic considerations often overcome statistical

objections to sir^ll samples. We have tried to present several ideas for

and approaches to dealing with small data sets. These approaches have

not been investigated extensively. Much more work is required to

determine their usefulness, variability, and reliability.
-------
70
APPENDIX

SIMULATION OF ISOTROPIC PROCESSES

It is of interest to simulate realizations of isotropic processes to

gain insight into their structure and the restrictions imposed by the

assumptions of isotropy. Matheron(1973) has developed what he calls the

"turning bands" technique of simulation. An example of this technique is

illustrated in Figures A.I through A.4. Figure A.I is a two—dimensional

lattice throughout which we will define our isotropic process by the

following procedure: Each point in this portion of the plane, not just

the lattice points, is projected onto a realization of a one-dimensional

stochastic process with a covariance of C.(h), (Figure A.2). All

points lying between the lattice points are projected onto the same

value. Hence points in an entire band of the plane are given equal

contributions. The entire lattice is then rotated through some random

angle uniformly distributed around a circle, (Figure A.3). The points in

this rotated lattice are then projected onto another realization of the

stochastic process with the same covariance C^h). The results of the

accumulation of these rotations is shown in Figure A.4. Here all points

in the plane are assigned simulated values corresponding to the sum of

the contributions from each rotation. The banded nature of these random

"turns" gives the procedure its name.

In d-dimensional Euclidean space the technique involves producing a

stationary process Z^ with covariance C., on the real line, then for a

point X=. A contribution to the
-------
FIGURE A.I

ISOTROPIC PROCESS SIMULATION BY MATHERON'S TURNING BANDS
-------
FIGURE A.2
BAND PROJECTION ON A ONE-DIMENSIONAL REALIZATION
,8 -,3 -2 ,5 1,1 ,2 -3 -,8 M -1
HITH COVARIANCE C( H )
NJ
-------
FIGURE A.3
ROTATION OF THE GRID
AND PROJECTION ON ANOTHER
REALIZATION
WITH THE SAME COVARIANCE
U)
-------
FIGURE A.4
SPATIAL
ACCUMULATION OF THE
REALIZATIONS
POINTS III
THIS REGION HAVE
VALUES OF ,8+,5
ACCUMULATIONS
CONTINUE FOR MANY
ROTATIONS
,8 -,3 -2
-1
-------
75
process for a point in d—space is given by the value of the first

coordinate projected onto the one—dimensional realization. The whole

realization is then given an independent uniformly distributed rotation

about the origin in d—space, see Ripley(1981). An accumulation of these

realizations produces a normal isotropic process. Matheron(1973) showed

that, as the number of rotations approaches infinity, the resulting

isotropic covariance in d-space is given by C(h)=E(C1(hV)), where E is

the expectation operator, Cj is the covariance for the Z, process on the

real line, and V is the first coordinate of a point on the surface of

the unit sphere in d—space chosen randomly with a uniform distribution.

A line originating within the lattice and radiating outward would fall

close to one of the rotations. Points along this line would have the

covariance structure imposed by the one—dimensional process used in the

generation. No matter what direction the line points, the covariance is

the same along this line and depends only on the distance between the

points.

Ripley(1981,p.l7) has suggested a modification of this procedure for

simple simulation of isotropic processes with covariance function C on a

lattice in the plane. Here eight independent realizations of a

stationary stochastic process with covariance C/8 are generated on the

real line. Then for any point
-------
76

about the origin.
Figure A.5 show a realization of an isotropic process on a 3Ox30 lattice

in the plane generated by rotations and projections suggested by

Ripley(1981) onto realizations of a normal, second order moving average

process of the form:
Z. = a. + .5a. _| •+• ,75a. _
where the a. values are normally distributed with a mean of zero and a

variance of one. Notice the prominent line of peaks running parallel to

the x0 axis for small values of x.. Closer examination also reveals

diagonal peaks and valleys (note middle left to upper center line of

peaks). These diagonal lines 'can be seen even more dramatically if the

process is scaled downward. Figure A.6 shows the result of scaling the

process by one tenth. Here we also have placed lines pointing to the

diagonals. The diagonal lines are a result of the eight rotations used

to construct the simulation.

Some of this behavior might be expected as the projections spread the

extreme values of the one— dimensional realizations throughout various

diagonal lines in the lattice. The difficulty with this method of

simulation comes from the small number of rotations used and their equal

spacing. Eight equally spaced rotations of ir/4 produce diagonals that
i
are too prominent.

A better procedure follows the technique that Matheron(1973) originally
-------
FIGURE A.s ISOTROPIC PROCESS SIMULATION BY FIXED ROTATION
TURNING BANDS
-------
FIGURE A.6
ISOTROPIC PROCESS SIMULATION BY FIXED ROTATION
TURNING BANDS
oo
-------
79

suggested. Instead of equally spaced rotations we should select several

randomly distributed rotations in the plane. Figure A.7 shows a

realization with 16 randomly selected rotations. A. line of peaks is not

prominent on the fixed diagonals given by Ripley's method.

The moving average process used for the one—dimensional realizations

possesses a varied covariance structure depending on the parameters

chosen. Journal and Huijbregts(1978) show how these parameters can be

chosen in one—dimension to produce many of the standard variogram models

in higher dimensions using the turning bands technique.
-------
FIGURE A. 7
ISOTROPIC PROCESS SIMULATION BY RANDOM ROTATION

TURNING BANDS

A
00
o
X
1
-------
81
Aboufirassi, M. and Marino, M. (1983), "Kriging of Water Levels
in the Souss Aquifer, Morocco," Mathematicai_Gegiggy, 15,
No.4, 537-551.

Borgman, L.E. and Davis, B.M.(1982), "A Note on the Asymptotic
Distribution of the Sample Variogram," MathematicaI_GegIogy,
14, No.2, 189-193.

Box, G.E.P. and Jenkins, Q. (1976), Time Series Analysis:.
Fgrecasti.ng_and_Cgn trgjl, Hoi den—Day, New York.

Cressie, N. and Hawkins, D.M.(198O), "Robust Estimation of the
Variogram: I," Mathemat ical._Geal.ogy, 12, No. 2, 115-125.

Cressie, N.(1984), "Kriging Nonstationary Data," presented at
the Amer. Stat. Assoc. meeting Philadelphia, August, 1984.

David, M.(1976), "The Practice of Kriging," Advanced
Gegstatistics_in_the Mining Industry, D. Reidel Publishing
Co., Dordrecht, Holland.

Davis, B.M. and Borgman, L.E.(1979), "Some Exact Sampling
Distributions for Variogram Estimators," Mathematical.
Geology, 11, No.6, 643-653.

Davis, M.W. and Grivet, C.(1984), "Kriging in a Global
Neighborhood," Mathematical._GegIggy, 16, No.3, 249-265.

Delfiner, P.(1975), "Linear Estimation of Non Stationary
Spatial Phenomena," Advanced Geostatistics i.n the Mining
Industry., Reidel, Boston.

Devary, J.L. and Doctor, P.G.(1981), "Geostatistical Modeling
of Pore Velocity," Battelle—Pacific Northwest Laboratory,
Report #PNL-3789.

Devary, J.L. and Rice, W.A.(1982), "Geostatistics Software
User's Manual for the Geosciences Research and Engineering
11/7O Computer System," Battelle-Pacific Northwest Laboratory
Report.

Dubrule, 0.(1983), "Two Methods with Different Objectives:
Splines and Kriging," Mathematical. 6eoiQ9y» 15» No.2,
245-257.

Efron, B.(1979),"Bootstrap Methods: Another look at the
jackknife," Annals of Statistics, vol.7, no.1.

Grenander, U. and Rosenblatt, M.(1957), Statistical. Analysis
of_Statignary_Time_Series, J. Wiley and Sons, New York.

Hogg, R.V.(1974), "Adaptive Robust Procedures: A Partial Review
-------
82
and some Suggestions for Future Applications and Theory," J._
.._St at^_ Assgc .. , 69, 9O9-923.
Huber, P. J. (1972), "Robust Statistics-Review," Annal,s_gf __ Math...
Stat..., 43, 1041-1067.

Journel , A. and Huijbrechts, C. ,(1978), MiQi.ng_6eost at i_st i cs ,
Academic Press, London.

Jowett, G.H. ,(1952), "The Accuracy of Systematic Sampling from
Conveyor Belts," ABB iied_St atistics , *» 5O-59.

Jowett, G.H. ,(1955), "Multiple Regression between Variables
Measured at points of a Plane Sheet," Ag_Biied_Stat i_st i.cs , 4,
80-89.

Jowett, G.H. ,(1957), "Sampling Properties of Local Statistics
in stationary Stochastic Series," Bigmetrika, 42, 160-169.

Kimeldorf, G. and Wahba, G. (1970), "A Correspondence between
Bayesian Estimation of Stochastic Processes and Smoothing by
Splines," Annal.s_gf _Math..._St at... , 41, 495-5O2.

Kolmogorov, A. N. ,(1951), "Interpolation und Extrapolation,"
lyiiet iQ_de_ilac ademie_des_sciences_de __ Ui_S ..S ..R .. » Ser . Math . 5 ,
3-14.

Koopmans, B. (1976), SBectral. __ Anal.v.si.5 __ gf_ ___ lime __ Seri.es,
Academic Press, New York.

Krige, D.G. (1951), "A Statistical Approach to Some Mine
Valuations and Allied Problems at Witwatersrand, " unpublished
Master's Thesis, University of Witwatersrand, South Africa.

Matern, B. (I960), Spatial. __ Variation, Almaenia Foerlaget,
Stockholm.

Matheron, G. (1965), Les __ __ Variabl.es ___ ReaiQQal.ise.es __ et __ Lsur
Estimation, Masson et Cie, Paris, France.

Matheron , G. (1973), "The Intrinsic Random Functions and their
Applications," Ad y an c es_in _ ABBiiecJ _Pr ob ab iiit y , 5 , 439-468 .

Matheron, G. (1981), "Splines and Kriging:. Their Formal
Equivalence," Syracuse University Geological Contribution #8,
edited by D.F. Merriam.

Mosteller, F. and Tukey, J.W. (1977), Data ___ Analysis __ and
Regression, Add i son-Wesley, Reading, MA.

Myers, D.E. , Begovich, C.L. , Butz , T.R., Kane, V.E. (1982),
"Variogram Models for Regional Groundwater Geochemical Data,"
Mat hematic ai_Geg!ggy. , 14, No. 6, 629-644.

Neuman, S.H. and Jacobson, E. A. (1984), "Analysis of
-------
83
Nonintrinsic Spatial Variability by Residual Kriging with
Application to Regional Broundwater Levels," Mat.hemati.cai
GegJLggv., 16, No. 5, 499-521.

Ripley, B. (1981), Sgat i.al _St atist i.cs , J. Wiley and Sons, New
York.

Sharp, W.E.(19S2), "Estimation of Semi variograms by the Maximum
Entropy Method," Mat hemati.ca!_Gegigg¥ , 14, No. 5, 457-474.

Starks, T.H. and Fang, J.H. (1982), "The Effect of Drift on the
Experimental Semivariogram, " Mat hemat i.cal._Gegl_ggY , 14, No. 4,
309-319.

Technos, Inc. (1984), "Detection and Mapping of Contaminated
Soils and Ground Water Using Geophysics," report to U.S.
Environmental Protection Agency.

Tukey, J.W. (1977), Exgigratgry __ Data_ Analysis , Add i son-Wesley,
Reading, Massachusetts.

Von Neumann, J., Kent, R.H. , Belliason, H.R. , Hart,
B. I., (1941), "The Mean Square Successive Difference," Annais
of _Mat hi_S t atj. , 12, 1 53 .

Watson, 6. S. (1971), "Trend-Surface Analysis," Mathematical.
., 3, No. 3, 215-226.
Watson, G.S. (1984), "Smoothing and Interpolation by Kriging and
with Splines," £ jthematicai_Gegiggy , 16, No. 6, 601-615.

Weiner, N. (1949), The ___ Extraggl.ati.gnj. ___ i.D.teregl.atign^ ___ and
Smggthing_gf _Statignar^_Ti.me_Series, J. Wiley, New York.
-------