&EPA
              United States
              Environmental Protection
              Agency
                             February 1986
A Primer on  Kriging

-------
      A PRIMER ON KRIGING
            Prepared for

     The Statistical Policy Branch
 Office of Standards and Regulations
U. S. Environmental Protection Agency
     Washington, D. C. 20460
                 By

         Robert W. Jernigan
     Department of Mathematics
   Statistics and Computer Science
      The American University
      Washington, D.  C. 20016

-------
                               ABSTRACT








Kriging is the name given to least squares prediction of spatial processes.  It is one of




the defining tools of the geostatistician.  It has been called surface fitting, trend surface




analysis, contouring sparse spatial data, and a generalization  of spline interpolation.  But




it comes down to being best linear unbiased estimation of a stochastic process by




generalized l^ast squares.  We will examine the assumptions,  tools, and methods involved




in kriging, and present several examples of its use.  Kriging  has applications in mapping




spatial environmental data, including, toxic wastes, acid rain, groundwater pollution, etc.
                                      11

-------
                                PREFACE









This paper presents the essence of a statistical technique for the prediction  and mapping




of spatially variable data, known as kriging.  This work has grown out of an attempt to




consolidate results, methods, and applications of kriging that are widely  scattered in




many journals of application. The presentation is intended for statisticians familiar with




the workings of least squares estimation through the use of matrix algebra.  The results




that follow try to relate the language and techniques of kriging to more  standard




statistical terms  and methods.  It is hoped this document will serve to  expand the




audience for these  procedures by providing a thorough presentation of statistical ideas




involved. Many sections are quite technical,  but the essential approach can often be




garnered from just the text.
                                    111

-------
                  Table of Contents

1.0...Introduction                              1

2.0...Landfi11 Example                          5

3.0...Definitions and Fundamental  Tools       10

   3.1...Covariance Function                   10
   3.2...Simplyfing Assumptions                12
     3.2.1 Isotropy                            12
     3.2.2 Intrinsic Hypothesis                16
   3.3...Properties of the Covariance          16
   3.4	Nugget Effect                         17
   3.5...Variogram Models                      IS
   3.6...Bounds on Variograms                  21
   3.7...Hole Effects                          24

4.0...Estimation of the Variogram              26

   4.1... Estimation of the Variogram           26
   4.2...Robust Estimation                     28
   4.3...Graphical Estimation Techniques       29
   4.4...Numerical Estimation Procedures       33
   4.5...Small Data Set Variogram  Estimation   35

5.0...Estimation by Kriging                    4O

   5.1...Simple Kriging                        4O
   5.2...Example from Time Series              42
   5.3...Local vs. Global Estimation           44
   5.4...Kriging with a Non—zero Mean          46
   5.5...Ex amp1e                               47

6.O...Universal Kriging  (U.K.)                 49

   6.1...Universal Kriging                     49
   6.2...Example                               52
   6.3...Circular Estimation with  U.K.         56

7.0...Generalized Covariances                  57

8.0...A Robust Trend Technique                 63

9. 0. ..Kriging and Splines                      64

1O.O..Kriging the Landfill pH Data            66

Appendix

Simulation of Isotropic Processes              7O

References                                     81

-------
1.0  Introduction









The pattern  and extent of many  environmental  variables  are o-ften




examined by  the  mapping  of spatial  data. The set of contours of  equal




temperature  or pressure  on weather maps  is a common example.  Other




diverse examples  include: the mapping of mineral deposits by geologists,




species ranges by biologists, stellar radiation emissions by




astronomers,  and locations of cancerous growths by physicians. Examples




common  to  pollution monitoring are  contouring  of groundwater pollution




from toxic waste  dumps or landfills, mapping the impact of acid rain on




forest  and aquatic  life,  spatial  studies of air pollution, and regional




maps of birth  defects  to locate toxic sites. All of these  examples are




involved with spatial relationships.








A common  problem encountered in spatial analysis is the estimation of




the surface  that is generated by a set of spatial  data. This surface can




be helpful in finding explanations of spatial patterns or in predicting




data measurements in  regions that  were previously  inaccessible or




unobserved (interpolation).









The problem  of estimation of a  spatial  process is often called  trend




surface analysis. Many techniques,  both statistical and numerical, have




been applied  to this problem of  fitting  a surface. Regression or




response  surface analysis is one such  technique. It typically estimates

-------
an average underlying  surface with individual data points  deviating from




a surface  of best overall fit. Smoothing techniques, for example  by




moving averages,  are  another way  of  producing  a local  average surface




value. Many interpolation schemes  from  numerical  analysis  have also been




applied to estimate a  surface. Fourier  or  polynomial interpolation




produce surfaces that exactly interpolate the data, but unfortunately,




they  can produce quite erratic behavior. Osculatory and some  other




piecewise polynomial techniques also produce exact  interpolations, but




they  sometimes have no  sound statistical background. Interpolation by




cubic splines,  a related technique, does have a close tie  to the




prediction  of stochastic processes. In this paper we will consider a




technique that  combines  the statistical properties  of regression and




interpolating properties of splines. It is called kriging.








Kriging is  the name given to  the least  squares prediction of spatial




processes. It is a form  of  surface fitting using regression techniques



that  include spline interpolation as a special case. In  contrast to




smoothing or  simple polynomial regression techniques it does not produce




a surface  of overall  general fit. Rather it produces a  surface that is




"tied  down" to interpolate the data points exactly and  to  estimate




optimally  the process  between the  data points.  Statistically,  kriging is




best  linear unbiased estimation of a  spatial stochastic process using




generalized least squares. As such, as  Watson<1969) has pointed out its




novelty does not lie in  statistics. Its  origins  and development are



found in geology.








Kriging was developed  by "geostatisticians", and in many ways it is a




technique that  defines the  term  "geostatistics."  A geostatistician or a

-------
geostatistical approach uses kriging. The name  comes from D.R.  Krige, a




mining engineer in  South Africa,  who in 1951  first applied the technique




to the estimation  of  gold  deposits. The  mathematical theory associated




with kriging  was advanced  by G. Matheron in  the early 196O's  at the




Fountainbleau Mining School in France, under the heading "Regionalized




Variables."








Kriging  uses essentially the same  techniques to model and predict




spatial variation that are used  to model and predict temporal  variation



in time series analysis. Although many of the techniques  for  time series




analysis can also  be  applied to  spatial  data, many  new and different




problems remain. In time series analysis we consider past  events,




resulting in  dependencies that extend  only in one direction. In  . spatial




analysis, even if on a straight line in the plane, dependencies  can




extend in two directions at  once.








Our aim  in this paper is to  summarize the essential theory behind




kriging. We will  examine  its fundamental definitions,  assumptions,  and



tools. We will be most interested in considering how useful the




technique is in small  sample estimation of environmental data. Some of




the simplifying  assumptions used in kriging are  not  always reasonable in




many  environmental settings. Without these assumptions a much  greater




amount of  data is  needed.  One difficulty is that often economic or other




considerations require that  we work with sample sizes that are extremely




small. Much of the traditional  work in  kriging has been in developing




techniques for  use with large  data sets. Little previous  work has gone




into techniques for use with small data  sets. Kriging procedures need




modification  and further exploration when used  with  small amounts of

-------
data. We will  examine  some approaches  to this  problem.








We will  present several examples to  show the essence of kriging. Some




examples connect kriging and time series prediction.  Other examples




illustrate  some of the unique problems of spatial  estimation  in  the




presence of a trend.  We  also have a particular environmental example




that will be carried  throughout the paper to illustrate  various




techniques.

-------
2.0 LANDFILL  EXAMPLE








To -Fix ideas on a  particular  practical problem we will consider data




collected on a land-fill  in Dade County(Miami) Florida -for the




Environmental Protection  Agency by Technos, Inc.(19S4). The land-fill




covers  an area of about  one square  mile and is  located  in the  Biscayne




Aquifer. This  aquifer is relatively  homogenous and  near  the ground




surface. It is tapped by  two  county well fields about 1.5  miles



down—gradient from the landfill.








It was  of  interest to determine the effects of the landfill on the




aquifer, and to measure these effects eight wells  were  drilled over  the




neighboring area (Figure 2.1).  Figure 2.2  shows measurements of pH taken




from samples  of  the water at 1O foot depths. We  would like to examine



the structure of these measurements and, with the assumptions,  tools and




techniques of kr-ging,  estimate the pH at the unobserved  positions. In




particular, we will compute  estimates of pH at  all  the "x" points in the




grid  of Figure 2.3. This appears to be an ambitious goal:  to  estimate




160 grid points from S  data  points. We shall see  that this estimation is




quite poor in the  corners of the map, far from the data,  and better in




regions surrounded by the data points. Furthermore,  kriging,  being  an




exact interpolator, will reproduce  the pH estimate  at well M7A exactly




as its  given value of 11.4. This large value of  pH may well be




questionable,  but kriging  will assume  all of the data are  correct and




produce an interpolating  surface.








As mentioned  earlier, the practice  of kriging, developed  mainly in

-------
         FIGURE 2.1



Dade County (Miami) Florida Landfill


WASTE
RECOVERY
PLANT






0
M2


SCALE
i?5fl t*.

BORROW PITS





NW 58TH STREET
LANDFILL 0
H



DORAL GOLF COURSE



ROADS



0
M8
0 0
4A MSA MSA

0
MQ
™\J

X
"^0-
^X
J
WELLFIELD/

e ~~
/
M7A /
,
0
M18

CANALS



r

.









,.;;.,
	 1
I.J.J
L.
•*•*•»
,,.,1
I.J..I
ii|





-------
February 1986
Washington,  D.  C.
This publication is available free upon request, in single copies. For further information write
to N. Phillip Ross, Chief, Statistical Policy Branch, Office of Policy, Planning, and Evaluation,
U. S. Environmental Protection Agency, 401 M Street, S. W., Washington, D. C. 20460.

-------
                            FIGURE 2.2
                     PH levels  at  10  foot  depths
7.
                                   o
                      0
                                    7.4
                         .8  6.7     6.6
                                   o
                                    6.8
                                                    o
11.4
7.0

-------
        FIGURE 2.3
Grid of Points to Estimate
X )
X )
X >
K )

x M2 )
0
X )
X )

X X Of? X
: x x x x
X X XX
X X XX
; x x xx
H4fi
X X X X °

X X X X
: x x x x

X * X X X X
—
X X X X X
X X X X X
X X XX X
\ A /"» Mft ^
0
< X X X X
MSA N6A
X ° X X° X X
V
X X X X X
X XX X X
u M9
X X X X X
X X
X X
X X
x £
UM
X X
x y
#\x x
X X "%£
>: x x /
\ • \.' •.;.•'/
l\ A .»/
7A /
X X " X
X X X
M16
X X
X X

X X
4* 	 W—
XXX
XXX

XXX
.1 . W .. i W 	 . "ii . . .
                                                      00

-------
geological settings, has  been with large data sets. Little  has been done




with small data sets.  The problems encountered with  data sets of this




size are considerable, and the confidence in the estimation can be low.




Given our  choice we would much prefer to use  more data, but this is




often not  possible. We are often  forced to  analyze small samples. Some



approaches to  this problem will be presented.

-------
                                                                          ro
3.0 DEFINITIONS AND  FUNDAMENTAL TOOLS









3.1  COVARIANCE FUNCTION









We  will begin by considering a stochastic process,  that is a collection




o-f  random  variables  Z(x) indexed by a parameter x.  The random variable




Z(x) could represent  the prices o-f gold and silver  at  a certain  time x.




In this case we  would be using time series analysis, where  x  is




typically a time index with a natural ordering, and Z  and  VAR(Z(x)) denote the mean and variance,  respectively  o-f




the random  variable  associated with the spatial position x.  A




fundamental  tool in the  study of stochastic processes is the covariance




function:
   K(x,y> =  E[  Z(x)  - E(Z(x»][ Z(y) - E(ZCy))]
This function measures the joint variation of the measurements at




spatial positions x and y. For example, if when,  on the average, a




measurement Z(x) at spatial position x is above  its  mean,  another




measurement Z(y) at position y is below  its  mean,  then the covariance

-------
                                                                       11





between  the two points would be negative. Positive covariances indicate




joint variability on the same side of the means, on  the average. The




magnitude of  this covariance is best considered by dividing it by the




standard deviation  o-f  both Z(x)  and Z(y). This would produce  a




correlation function,  p(x,y). Points with high positive  correlation would




tend to  stay above(or below) their respective means  together, on the




average. Points with high negative correlation would  tend to lie with




one  above  and one  below their means.








Calculation of the covariance function  requires  knowledge of the means




of Z(x) and  Z(y), so  that, geologists  have made special use of  another




measure  of joint  variation that can be calculated without the means.




This function, called the variogram,  is  the variance of the  difference




or increment  Z(x>—Z(y),  and  is written:








  2'Y -  Z(y)  )








^(Xjy) is  called the  semi—variogram. This function is often called the




serial variation function in older  statistical literature  see




Jowett(1952,1955,1957),  and  a special case of this serial variation was




investigated by Von Neuman, et.al.(1941)  as the mean square sucessive




difference.  It is often used in  tests of randomness.  The semi—variogram




also measures covariation. Two measurements that tend to be above  their




means  together would likely  have  a small amount  of variability in their




difference.  Measurements that  tend not to be correlated would have a




difference  that was quite  variable. As  we  will see, the variogram is, in




essence, the  negative  of the covariance function. Points with large




covariance  have  a small variogram value, and paints with low  covariance

-------
                                                                       12





have  a high  variogram value.








Several simplifying  assumptions help us to model  and  predict  spatial




behavior.  If  the covariance function of a stochastic  process depends




only on the  positions of two  points relative  to each other and  not on




their particular fixed locations, we  say that  the process  is  stationary.




To be more specific, a stochastic  process is  said to be stationary of




order  two if it has  a constant mean and  its  covariance function K(x,y)




depends only on the difference vector h=x-y  and  not  on the particular x




and y  chosen. We could then write  K(x,y) as some  C(h)  or C(| hi ,<*), where




I h|  is the length of the vector h  (distance between the vectors x  and y),




and oc  is the direction angle.  All points separated by the vector h  would




have  the  same covariance.  Figure 3.1.1  shows  an example of points




separated by the same distance and pointing  in the same  direction  and




would  thus have the same  covariance for a stationary or homogenous



process. This type of covariance might be applicable  to the  flow of



water  down a gradient, as  illustrated  in  the  next section.








3.2  SIMPLIFYING  ASSUMPTIONS








      3.2.1 ISOTROPY








Many  of the  equations developed in kriging are simplified greatly "if we




assume that  the process has  the same covariance structure in every




direction. Such  a stochastic process is called homogenous  and isotropic,




i.e.,  its probability structure does not change with a change  in  spatial



position or a rotation about  a spatial position. Isotropy indicates some




uniformity in every direction.  For  example, at midday  in a  rural

-------

Fi 9 ure 3.1.1  P o i nts a long  a  grad i e nt  where the
               co^ariance might depend'on  distance and direotii
                                                                             U)

-------
                                                                       14




neighborhood -the sky  might be  uniformly the  same shade o-F blue at  any




point just above the  horizon. If we consider an urban  neighborhood  close




to an industrial factory, this  uniform  blue shade might turn




considerably darker when we  look  towards the factory's smokestacks. This




lack of uniformity  is  called  anisotropy.








Isotropy is of great  use with  small data  sets where it is difficult to




model and estimate  covariances or variograms that  depend on both the




distance and direction between  two points in space. Much  more data can




be combined if we assume that the covariance between  two points depends



only on the distance  between the points.  The assumption of istropy is




not required to estimate the process with kriging; it serves  to  simplify




the equations and  calculations. In fact, in many applications such an




assumption would not  be realistic. Figure  3.2.1 shows two  directions of




measurement of  a flow of some  pollutant  down a  stream. Also  displayed




are typical sample variograms  for the  downstream direction (D) and the



across stream direction (A).  Notice that the  measurements across the
                                t


flow of the pollutant  exhibit a much greater variability at short




distances than the  measurements  down  the stream. The  downstream



measurements are much  more  nearly alike  and therefore have much less



variability within the pollutant flow. An assumption of  isotropy here




would not be reasonable.  The variability  in both  directions would have



to be taken into account.








The form of the  covariance function  for  a homogeneous, isotropic process



is quite  restrictive.  Ripley<19Sl) has pointed  out that  in the plane



isotropic  correlations are bounded below  by  -.403. Further bounds relate




isotropic  covariances to  Bessel functions, see Matern(196O). Unless

-------
                              VAR1QGRAHS
                                    ACROSS  STREAH
                                       DQHNSTREAH
                                       DISTANCE
FIGURE 3.2.1

-------
                                                                       16
explicitly stated, we will assume  that  the process we  are modeling is




both homogenous and isotropic.








       3.2.2 INTRINSIC HYPOTHESIS








Geostatisticians sometimes  encounter processes  with an increasingly




large capacity for spatial variation, i.e.  points  at great distance from




one another have an increasingly larger variability. When dealing with




these  processes with an unbounded  variance, geostatisticians  often  make




use of another property called the intrinsic property.  A stochastic




process is  said to be  intrinsic if the mean is constant and the




increment or difference Z(Xj+h)—Z(x1>  has a finite  variance which does




not depend  on  x-. This is equivalent to the increment being a  stationary




stochastic  process of  order two. This  same idea   is often  encountered in




time series analysis, where a process  may exhibit nonstationary




behavior, but its first order difference is stationary.  In a sense we



assume that the nonstationary behavior that produces the  unbounded




variances can  be  subtracted out  of the process. Thus even if the




covariance  function was  infinite,  the variogram,  being the variance of a



difference,  would  be  finite.








3.3 PROPERTIES OF THE COVARIANCE FUNCTION








For a  stationary  stochastic process with mean  p, the covariance function




C(h), depending  on a vector  h, has the  following properties, (see




Journel and Huijbreghts(1978)>:








1)  C(h)  is positive definite,  that  is  for any linear combination

-------
                                                                       17






y=a1*Z(x1>+ • • '-»-a *Z(x  ) the -function C(h) is such that the  variance of y




is always greater than or equal  to  zero.




2) C(0)=VAR(Z(x)> is nonnegative.




3) C(h)=C(— h)  the covariance function is an  even function.




4) I C(h)|  < C(O)  Schwarz's inequality.









Many  other  properties relating the covariance function to spectral




representations of stochastic processes can be found in Ripley(1981> and




Koopmans(1976).








The variogram is related to the  covariance function when the  latter




exists by:




           •Y<  h )  = C(O) - C(h)




Furthermore,




1) 'Y(0)=0,  although the limit 'Y(h) as h approaches zero is not necessarily




zero.




2)
The variogram may exist even  in the  case of a  process with infinite




variance (C(0)>,  and its calculation does not require knowledge  of the




mean. For these reasons it is often  favored by geostatisticians  for




those  processes with  "unlimited  capacity for variability," see  Journel




and Huijbregts(1978).








3.4 NUGGET  EFFECT









Dne would expect that as the  distance between  two  points in space x,




and x0 decreases to zero, the variability of the increment

-------
                                                                       18





ZOO—Ztx^,) would also decrease to zero.  This  continuity  o-f  the




variogram might well be expected in a region with  smooth uniform




measurements.  But the variogram  need not be continuous at  zero, and  this




fact has practical advantages to a geologist. This property is termed




the "nugget effect."  Its origins  can be  due to measurement errors  or  to




what Journel and  Huijbregts(197B) call micro—variabilities. Such




discontinuities are reasonable if, for example, a geologist's core




sample at a  given depth has hit a "nugget" of mineral,  figure 3.3.1.




Suppose  that this sample  core has removed a concentrated nugget of




mineral,  leaving behind a  region  perhaps devoid of any  other measurable




quantities. No  matter how close  another sample is taken to  the first,




quite different measurements would result. We would then measure




significant nonzero variability of the increments at arbitarily  short




distances.  Of course the  variability  between  the original core  sample




and itself  is zero,  but the variability  between two arbitarily close




points may be  quite  positive. This discontinuity  of the variogram at the



origin  helps  to rodel both continuous and discrete changes in spatial




processes.








3.5 VARIOGRAM  MODELS








Geostatisticians have  developed  a catalog of forms for isotropic



covariances and variograms.  There are  eight basic variogram  models




usually considered, see Journel  and Huijbreghts(1978):








(Here we will use  h=| h|  since we  are only considering isotropic models.)








1)  The  spherical or Matheron variogram  is given by:

-------
                 FIGURE 3.3.1
             \
THE NUGGET EFFECT
                                   Location of core sample
Underground
^Av/AVV^xVv/AlV^

               Nugget  removed
                                                                    V
                                               ^VVX
                                               \:
                                                             My
                                 x Surrounding region
                                   with only residue

-------
                                                                         20
               = C [<3h/2a)-h3/<2a3>]    -for  h<=a



                    = C                        for h>a
 The value a  is  termed the  range,  a distance beyond which two points  in



 space are no longer correlated. The  value C is termed the sill  and is



 equal to the  variance o-f the process C(0).







 2)  The exponential model is given  by: 'Y(h)=C-(l—exp(—h/a»







 3)  A somewhat inappropriate name  is  given to the Gaussian model, with


                         2   2
 the form: (Y(h)=C-(l-exp(-h /a )). These last two  models reach their sills



 asymptotically.







 Variogram models  without a sill, useful  in modeling  data with



 increasingly  larger spatial variability with increasing distance,  are:







 4)  the linear  model: 'Y
-------
                                                                       21




3) generalized covariance model: a specialized  linear combination o-F  odd




powers  o-F h,  used in filtering of trends  in kriging. (See section 7.0)
Figures 3.5.1  and 3.5.2 show  the relationships  between some  of the




variogram  models. Figure 3.5.1  shows examples of variograms  with  sills,




and Figure 3.5.2 shows those without sills. Notice the different rates




at which the  variograms  approach the  sill. The more rapid the  approach,




the smaller is the covariance between distance points, and  the spatial




dependencies  remain  quite  local. The rate  of approach and the  sill value




are often  related to estimate geometrically the parameters of the model.




 The  exponential  model reaches 95/C of its sill at a value of  3a,  so  the




scale parameter,  a,  is taken  to  be  one-third of this distance. For the




gaussian model, the  scale  parameter,  a,  is taken  to be 1/V3 of the




distance of 95% of the sill. For the spherical model the  range




corresponds to the intersection of a tangent  line at  the origin and  the




line  of the sill. This value is 2/3 of  the  range,  a, so that a  is taken




to be 3/2  of this distance. These purely geometric properties  are used




extensively by geostatisticians to fit variogram  models,  see Journel and




Huijbregts(1978).









3.6 BOUNDS ON VARIOGRAMS









Variograms that  increase too rapidly are often indicative of a process




with a nonconstant mean. For example,  consider a one—dimensional process




of the form:




        Z(x) = |l(x) +  e )=O.




The variogram for such a  homogeneous process would  be given by:

-------
               FIGURE  3.5.1

        URRIQGRflM MODELS UITH  SILLS
                            EXPONENTIAL
D
2        3
  H (DISTflNCEl

-------
                 FIGURE 3.5.2

       LINERR RND  PDUER FUNCTION  UflRIDGRHMS

              H	1	H-
CM
o
     0
234
 H  (DISTflNCEl
                                                                               to

-------
                                                                       24
  2-Y(h) = E( Z  - Z(x) >2


         = E(  e )2 + E( p(x+h> - p )2
The  e term is  simply the  variogram o-F the residual errors. The  p term is



the  bias due to a drift in  the mean. If, for example, the mean is



linear, i.e. ^(x)=x, rather  than  a constant, then this bias term becomes


 2
h , and the variogram  would grow quadratically. A  quadratic mean would



produce a  variogram that grows as the fourth power  of h,  etc., see



Starks and Fang(1982) for further discussion of the effect of



nonconstant means,  as well  as Neuman and Jacobson(1984)  and  Aboufirassi



and  Marino(1983).






Matheron<1973>  showed for spatial processes  the fact that  the covariance



is positive definite insures that for a vector  h,




                 lim  -YtfD/lhl2 =  O

               I h| ->oo




That is,  -Y
-------
                                                                       25




when one  is  below  its  mean the other is above its  mean.  This  is  termed




the hole—e-f-fect. The bounds  mentioned earlier  for isotropic  covariances




limit the  size  of negative covariances and these hole effects,  and




therefore  considering  only positive covariances is  not as restrictive as




it sounds. Variogram models  with hole effects  are typically




exponentially damped sines and cosines.








The theoretical bounds on isotropic covariances discussed in section 3.4



are often  taken as rules for distinguishing  hole—effects  from  random



variability in estimated variograms. This variability in estimated




variograms needs  further study. It is clear  that a more  sound




statistical technique is needed to identify hole—effects.

-------
                                                                       26
4.1  ESTIMATION OF THE  VARIOGRAM




Geostatisticians use the  term structural analysis to denote the

estimation,  fit, and study of variogram models. Empirical variograms are

calculated from a given set  of data and used  to fit the  various models

presented earlier. The empirical variogram is calculated as:




          —        --         2
               CZ(x.-t-h)  - Z(x-)l
         nh i=l    1        x
where  n.  is  the  number of pairs of points separated by the distance h.




Figure 4.1 shows a sample  variogram calculated from the simulated

istropic process discussed in the  appendix. A cubic spline has been  fit

to the data, merely  to connect smoothly the  points  for easier viewing.

Notice that  the  variability increases as the  distance between points

increases. This corresponds to  a decreasing correlation with increasing

distance. Notice  the approach to what appears to  be a flat sill value  of

about  25, around a distance of about 4 units. Notice  also  the random

variability as  the values fluctuate and do not increase in a strict

monotonic fashion. Whether a strict monotonic variogram should be fit in

these  situations, or one with a hole  effect (as  with  the splines)

deserves more study. Most practioneers would dismiss hole effects in

this example.




Some studies of  the statistical behavior of the empirical variogram have

been undertaken  by Borgman and Davis(1978,1981).  They have  shown that,

for processes  with a finite variance, *Y(h)  is  asymptotically normally

-------
CD

CO
in
CM
o
CM
FIGURE 4.1  VARIOGRAM CALCULATED FROM SIMULATED DATA


       H	1	H	1	1	I-
O
     0
                                H (DISTRNCE)

-------
                                                                       28

distributed.  They have also produced, for  one-dimensional  normal



processes, tables of  the exact sampling distribution of  the variogram by



numerical inversion  of the characteristic function.








4.2 ROBUST ESTIMATION








Small sample estimation can lead to  estimates that are  quite erratic.



Estimators that  minimize the effects of single influential  data  points



might prove  useful in such problems. Cressie  and Hawkins(198O)



investigated several  of  these,  so—called, robust estimators of the



variogram. Their  approach was  to consider variogram estimation, not as  a



problem of variance or  scale  parameter  estimation, but  rather  as a


                                                                 o
problem of estimating the mean of the quantities W.=CZ(x-)—Z(x-+h)>



where  Z(x.) are normally distributed.  Power transformations of Y.  to



symmetry  and studies of cumulants suggested that


               1/2
Y^l Z-Z(xi+h)|    is approximately normal with a skewness of   -OS and a



kurtosis  excess  of  -.52. Several estimators  of  the point  of symmetry of



the Y. distribution were then considered, including  the mean, median,



trimmed means  and M—estimators  of Huber,  Tukey, Hampel, and Andrews (for



studies of robust estimation  see Hogg(1974), Huber<1972)). The mean



Y of the  Y.  is approximately normally distributed, by the central



limit theorem,  and through series expansions  Cressie and Hawkins found



that for  n=n. ,
E[Y4/2'Y
-------
                                                                       29
Z =.6Z  ,+u   where the u  terms  have various distributions with
 n    rt  i  f\             r?
progressively heavier  tails. Cressie and Hawkins interpreted  these


realizations as traverses across a spatial process  and considered the


estimation of 'Yd) only. Their  findings revealed that  the M-estimators  of


the mean of Y.  are best, but considering its good performance  and ease


of calculation,  suggested that the variogram be calculated using:
      = Y A.457 + .494n * + .O45n 2)
where   Y = ^  £| Z(xi+h)-Z(xi>| 1/2   with  "=nn






Figure  4.2 shows  the Cressie and Hawkins variogram  for the simulated



process discussed in the appendix. Notice that  for the one distance  that



they investigated (h=l)  their estimation of 'Y(l) is slightly  larger than


the empirical  variogram, but for all  other distances (h>l) it is



uniformly lower.






In one— dimension Sharp(1982) has examined estimation of variograms with



the maximum entropy method from time  series  analysis.
The previous  techniques have brought  much of modern statistical



estimation theory to  the study of the variogram, but unfortunately most



techniques used  in practice  are either grossly subjective or blindly



numerical.






Journel and Huijbregts(1978)  present  several case studies  in which

-------
CO
in
Cxi
(XI
CI5
                  FIGURE 4.2


    EMPIRICnLtTDP] flND CRE5SIE RND  HRUKINS  UflRIDGRflMS

                     -f

                     2
3       4
 H  miSTRNCE]
	I	



    6
                                                                                     U)
                                                                                     o

-------
       FIGURE 4.3   GEOMETRICALLY FITTED VARIOGRAMS
CO
_
CM
Cxi
D        1
                                                                     7
                                                                                              to
                                                                                              r-J
                                 H  [DISTRNCE1

-------
                                                                       32




simple geometric properties of variograms are used to  estimate its




functional form.  For example,  subjective, graphically  chosen values  for




the sill  and the range are used in the variogram forms listed earlier.




Figure 4.3 shows the results  of fitting the  spherical, exponential,  and




gaussian  variogram  models to the simulated data presented earlier. The




sill for  all of the  model  was  chosen graphically to be 25.25.  For the




spherical model the  range parameter,  a, was  calculated,  following




Journel and Huijbregts(1978>,  as three—halves of the distance (h) value




obtained  from the line tangent to  the variogram at the  origin.  This




graphically fit variogram  fits the  first two  values quite well at the




expense  of the  remaining  data. For  points  from 2 to 4  units  apart, this



spherical variogram  suggests  more variability  and  therefore less




correlation between  these points. The  exponential  scale  parameter was




taken as  one-third  of the range(taken  to  be 4 units). The  gaussian scale




parameter was taken as 1//3 of the 4 unit range.








The gaussian  fit seems to pass more  closely  to the data values beyond a




distance of 2 units, while suggesting  more highly correlateddess




variable)  values  under 2  units apart.  The exponential fits  well  only




beyond the range. Since local behavior is important for  most  estimation



methods  by kriging,  probably  the spherical or gaussian models would be




chosen. The expected behavior at  the  origin  of the process being studied




would decide between the  two, with  higher  expected correlation  for




extremely close values going in favor  of  the gaussian model.








Geostatisticians have defended  these  graphical practices with the



evidence of much empirical work.  They  argue that estimation of  the




variogram is not their primary goal, but only a tool for estimation by

-------
                                                                      33




kriging, see  David(1975). They  are also in the unique position of being




able to compare directly their predictions with the true measurement. If




an estimate  of  the mean grade of a  mineral  is obtained for  a particular




point they can  then mine that region and directly  measure the  mean




grade. This method  of direct checking of the estimation has  been a




tremendous benefit to the  development of the  practice of  kriging.




Unfortunately, these same  checking procedures are impossible to obtain




in estimating, say,  acid rain deposits.








4.4 NUMERICAL ESTIMATION PROCEDURES









In the above graphical procedures the choice of suitable  sills  and




ranges was surely in the eye of the beholder. The biases  of different




researchers  would produce  different  variograms and result in different




kriging estimates.  To eliminate these biases, many  regression techniques




have been used to  fit the  variogram. Devary and Doctor(1981) used




non—linear least squares to  fit  the  gaussian model to hydrologic field




data. Since variogram estimates  at different points are likely to be




correlated, the using of techniques that assume uncbrrelated data  is




questionable. If one had knowledge  of the correlation between variogram




estimates  at different distances, these  could be incorporated into  a




generalized least squares  estimation  procedure. Unfortunately,  these




correlations are not known. Figure 4.4 shows the results  of  fitting




variogram  models  to the simulated data presented  earlier using nonlinear




least squares. Notice that each  numerical technique seeks its own sill




level in  an attempt to obtain the best fit.









Some researchers have tried to  improve  on  these fitting procedures.

-------
      FIGURE 4.4   NUMERICALLY FITTED VARIOGRAMS



CD   j	L	1	1	]	h
in
CD
CM
                                          D
     0
2        3
5
6
-H


 7
                                                                                                U)
                                  H  miSTRCEl

-------
                                                                      35





Aboufirassi and Marino(1983) determined  the  range of a spherical model




graphically, but estimated  the sill using cross—validation.  They began




with a sill  value  of one, then removed  and  reestimated(using kriging)




each data point from the spatial  data.  The  resulting estimation variance




can be shown  to be 1/C times the "true" estimation variance when the




variogram with the  "true" sill value  of  C is used.  For  each data point,




the difference between the observed and estimated data values was




divided  by individual estimation standard deviations to  produce what




were called reduced errors. The sample  variance  of these reduced errors



was taken as  the value of the sill. Sample  variability  of the variance




estimate was  not considered.  They further used geometric  properties  to




fit a gaussian model.








Other techniques  used in the  kriging of nonstationary processes can be




used to  estimate generalized  covariances and thus eliminate the need for




direct variogram calculation. We shall see these  techniques in  section




7.0








4.5 SMALL DATA SET VARIOGRAM ESTIMATION








In small  data  sets  these techniques  are difficult to apply. In  our




landfill example,  not only do we have a  small amount of data, but the




data arrangement is extremely unequally spaced.  For a given distance h




there may be  at most  one  (n. =1) pair of points separated by h. In larger




data sets we  could increase the number of  terms (n.) by grouping




together terms in neighborhoods of various  distances.  The average of



these terms would then be  assigned to  the  average distance of the point




pairs. In large data sets  this averaging procedure has  been investigated

-------
  in
  CM
  CM
it:
I.—V
   n
FIGURE  4.5.1

	i	
  INDIVIDUAL VARIOGRAM TERMS

.__	i	i	i	
      r          ~i    • ~    • r

      D
                n
               n     o
                                                             FOR  pH
                   D
                   10
?0
                                                                          n
                 3D
                                            1Q
BO
                                                                                               U)
                                                                                               cr»
                                     H

-------
                                                                       37


by Myers,  et.al.(1982).





To illustrate this technique,  Figure 4.5.1 shows the  values of W. =

              o
•tZ(x.) — Z(x-+. )>"*"  for all 28  distances from our original 8 pH


measurements from  the landfill  example. Notice the large  variability in


these  terms. Figure 4.5.2 dispays the results  of  averaging disjoint


collections of  about 6 of the W terms.  A gaussian variogram  has been


fitted to the resulting 6 data  points using  nonlinear  least squares.


With such  a small data set  there is a drastic reduction  in the  useable


data to  estimate the variogram. Furthermore the  resulting variance, as


given by the sill, is more than a factor of  two greater than the  simple


variance of the  pH  measurements.





This is one technique that  can  be used to obtain an estimated variogram


from such  a  small set  of data.  Some other approaches that have proved


useful to  some  degree are  suggested below.





1) A  scatterplot  smoothing of the W  terms with respect to' h  could be


applied. This has been  done with the robust  smoother, LOWESS, which


stands for LOcally  WEighted Scatterplot  Smoother. This smoothing


technique was developed  by  Cleveland(1978). It  is essentially an


iterative regression prediction weighted most  heavily  by  neighboring


points and least heavily, by points with large  residuals. The general


results are a smoothed estimation of the variogram  from the (h,W)


scatterplot that shows  a linear increase for small  h,  followed by  a


relatively  flat range for middle—sized h, ending in a slight downward


turn for large h. The degree  to which this smoothing could estimate the


true underlying variogram needs further investigation. A  simulation of

-------
  CD
        FIGURE 4.5.2


GRUSSIRN UflRIQGRflM NDNLIN  REG FIT  TD PH

      H	1	1	1	H——h
  GO
3EI
cc
CE
  CD
                   1Q
            IB
20    25    30
    DISTflNCE
35
45    50
                                                                                            U)

                                                                                            00

-------
                                                                       39





this estimation procedure  with  small samples  and a known parent




variogram is needed.








2) Other robust  regression techniques could also be applied to the  (h,W)




scatterplot. A discussion o-f many  iteratively re—weighted  least square




techniques that  can  yield  a  variety o-f -fitting criteria is presented in




Hosteller and Tukey(1979).








3) A resampling plan  such as the jackknife or the  bootstrap could be




used to  obtain  estimates o-f the variability  of  any o-f these  estimation




schemes. The resulting variance could  be used to place confidence




intervals on the variograms  and in turn introduce the estimation error




into the kriging  estimates.

-------
                                                                        40
5.0 ESTIMATION BY  KRIBINB









5.1  SIMPLE KRIBING









Kriging is the name given by  Matheron(1965) to  the calculation of best




linear unbiased estimators  of various functions of  spatial processes. It




is the time series prediction theory of Wiener<1949) and




Kolmogorov<1941) applied to  spatial data, and  as such is estimation  in




the presence  of correlated errors. The  basic statistical  technique  is




widely used by  statisticians  as generalized  least squares. The




application of this technique has been  studied extensively as a  time




series prediction problem by  Srenander and Rosenblatt(1957),




Koopmans(1974),  and Box and Jenkins(1976), to  name  only  three.








Consider  a spatial process of the form









    Z(x) = p(x) + e




2) ^l(x)=unknown constant




3) [l(x)=unknown polynomial in space.

-------
                                                                          41
For  observed  values Z(x~),...,Z(x ) we  wish  to estimate(interpolate)



the  value of the  process at  an unobserved point x by  a linear



combination  o-f the observed values, i.e.
          o,

  Z(x>  =
The  coefficients A-  are chosen  so that  Z(x> is:



1)  unbiased,            E[Z(x>] = Z(x>,        and


                                 2
2) mininum variance,   )  -
where  

-------
                                                                        42




where  K is the covariance  matrix  Her. -3,  x='  and
This system has a  solution  provided that K  is positive  definite. The



minimal variance is given by:
        = Var = 0, consider the following time

-------
                                                                        43

series prediction problem. Let Z(x)  = e(x) + ,5e(x—1)  be a

one—dimensional first order  moving average  time  series.  The e(x> terms

are taken to be independent, normally distributed error  terms with a

mean  of  zero and a variance of  one. The covariance  of this process is

given by:



                                5/4    for  h=0

  Cov(Zx,Zx+h) =              1/2    for  h=l

                                 0      for  h>2
Consider a  sample Z,,...,Z-  of size n=4 from this process.  The

covariance  matrix  K for this sample is given by:
               I  5/4   1/2    0    O   I
               I  1/2   5/4   1/2   01
               I   O    1/2   5/4  1/2  |
               I   0     0    1/2  5/4  |
Suppose that we wish to  predict  Zg. The  covariance vector

(o-j ,,...,'=<-.047, .117, -.246,  .498)'. The  estimate

of Zg then becomes  Zg =  -.O^721 +  .117Z2 - .246Z3 + .498Z4. This

estimate has a mean of zero, since  each Z^  has a mean of  zero.

-------
                                                                       44
This is kriging  and it  is  said  that we have  just "kriged" an esimate of





Z5-
Note the  relationship between Box and Jenkins time series analysis




prediction of Z^ and kriging,  Box  and Jenkins (1976). They assume an




infinite past of available  data and estimate Z—  with  weights (.5, -.25,




.125, -.0625, ... ,(-.5)j+1,...)'. Our kriging  weights of (.498, -.246,




.117, -.047)' are close and  differ  due to the fact that  we are using




only four data points to predict  Zg.








5.3 LOCAL vs.  GLOBAL ESTIMATION








The above approach assumes the  use of all  of the  data, and  since  K  is  an



nxn matrix,  large  samples can severely  limit the practical use of this




technique. Several approaches have been taken to limit the  size of the




matrices involved and thus minimize the  calculation. Journel and



Huijbregts(1978) and many others  suggest using local  estimation by




considering only data points  close to the estimation point x. If  the



covariance  between nearby points decreases rapidly this technique can



limit the  size  of  the kriging  system of equations. For  isotropic




processes the matrix K is completely determined by the relative geometry




of this local neighborhood. If this geometry remains  constant,  say by



moving over a  regularly spaced grid,  then the A constants will not




change for  estimating similarly positioned points. For example, if the




dots in the figure below represent the only data points  used to estimate




the  points represented by x's, only  one K matrix needs to  be calculated




for one x value. Estimates of the others  can be obtained by  rotation.

-------
                                                                       45
                      x   x
                      X   X
As long as  we are restricted to a regular  grid this simplification is

great, but it also requires a large amount of regularly spaced data.

Sparse  regions are not conducive to this procedure. Small data sets can

easily make use of all of the data in  the covariance matrix.  In our

landfill example with 8 data points, we will need to solve a system of 8

linear equations,  if we assume the mean is zero.  More  elaborate forms

for the mean  of the spatial  process will  add more equations.


For geostatisticians,  the large and regular  data  requirements for

kriging  are  net 'Lhat restrictive when you consider that a mining

engineer must make a  decision  to census  the whole spatial region.

Compared to this  eventual census, large sample sizes seem to be

necessary and probably cost effective.


The local neighborhood approach  is somewhat less useful for

interpolating  environmental variables such as acid rain or groundwater

pollution. Measurements of these variables  often  have  large  regions of

irregular spacing.  For  these applications  the global technique, using

all of the  data, which  can span the sparse areas, is needed.


5.4  KRIGING  WITH H EQUAL TO A CONSTANT


Now let us  consider the estimation of  Z(x>=n when n is a

-------
                                                                        46
nonzero  constant.  For our estimator Z(x)=EA-Z(x-> to  be unbiased we  need



the additional restriction that £A-=!.  This  requires that we  minimize:
  E[Z(x)-2(x)]2  + 2v[
where  v  is  a Lagrange multiplier. The resulting system  is
    K      e I  I  A I    I   T.  I
                     	     Art



    e'     O  I  I  v  I    I    II
where  e = (!,...,!>' of dimension Ixn. The minimal  variance is given by:
              ZCx:> - !k'K ^x)  - v
Geostatisticians prefer  the variogram to  describe spatial dependencies,



since the variance o-F the process may not  exist. The  kriging system can



be written in terms of the variogram, but for reasons  of numerical



stability the covariance  form of the kriging system  is  best.  Under the



intrinsic hypothesis of stationary increments this system can be  written



as:
       'ij3    •   I   I  x I  ^  I  ^ I


        e'    O    I  I  u I     I0|

-------
                                                                       47
where *Y. . is the semi—variogram  evaluated for points x. and x., that

is •y,Z



5.5 EXAMPLE  OF KRIBING WITH A CONSTANT MEAN



As a simple  example  of this technique,  let us  again consider a

one—dimensional process and calculate the kriging  system.
Let  Z,, =  e,. +  .5e	1 + 15
      *\     A      yv~"* X
This is a simple, first order moving average process  with  a  mean value o-f

15 and error  terms € with a mean o-f  zero and variance o-f one. The

covariance for  this  process is given by:
                                5/4      for  h=0

    Cov    =          1/2      for  h=l

                                 O       for  h>2
Consider a  sample Z,,...,Z4 of  size n=4 from this process. The

covariance  matrix K  for  this  sample is given by:
                 5/4  1/2    0     0  I
                 1/2  5/4   1/2   0  I
                  0    1/2   5/4  1/2 I
                  O     O    1/2  5/4 |

-------
                                                                          4'8
Suppose that  we wish to predict  Z,-. The covariance  vector

t(7lx'*"»  *or x=4 is *0,O,0,l/2>'.  The kriging system then

becomes:
5/4
1/2
0
0
1
1/2
5/4
1/2
0
1
O
1/2
5/4
1/2
1
0
0
1/2
5/4
1
1
1
1
1
0
                                   I  X,  I     10
                                   I  x-  I     10
                                   1  x_  |  =   |   0
                                   I  x.  |     11/2
                                   I  v   I     11
with a  solution '=(.164,  .244, -.119,  .710)'

Notice  that E^ = 1,  so that our  estimator Z5=.164Z1 + -244Z2 -
.119Z-T  + .710Z4   is an unbiased  estimator o-f  Z~. Note that the

coefficients  are quite different from the p(x)=O case.  Since  the mean of

each  Z.  is  equal to 15,  the constraint  £,\-=l insures that the mean  of
.'*.
Z(5) is  also equal to  15.

-------
                                                                        49
6.1  UNIVERSAL KRIGING








Now  let  us assume  that |i(x) takes  the form of a  spatial  polynomial  with




unknown  coefficients. Kriging with this form of a  mean has  been called




Universal  Kriging.








In universal kriging of a planar process the mean,  |i(x>  at a vector  point




x^Xj^x—)' is assumed to be  represented by  a linear  combination of the



form
            ,  f <

           =  * *
where  the fg(x) are known, linearly independent functions  and the a.'s




are unknown. Usually  the f .(x) functions are  taken to be  monomials  of




order  k. For example, for  k=l, Jj=3 and  f.,(x)=l, f7(x)=x1,  f^.(x)=x0.




These  are the p. oper variables for the construction of  a plane  of the




form a,. •+•  a0x1 + a-^x,.,.  For quadratic  surfaces with k=2,  we  have J0=6
       X     *- 1    *-f ji.  .                                            A.


                 "2   "?
and f•(x)=l,x1,x-,  xt, x^,  xtx0,  for  k=3,  J^=1O,  for  k=4,  J.=15,
     X      X  jiL   x   j&.   X  ^           i^               *T



etc., where J.  = ^  i. If the  estimation  is done  globally  this




functional form for [i(x)  must apply over the whole spatial range. On  the




other  hand, differing polynomial coefficients can  be  estimated for




subregions if  the  kriging is  done locally.









To estimate values of a spatial  process  with a  polynomial  trend as  the




mean, classical statistical techniques yield  the  solution. In a global




setting, it seems  appropriate to estimate the polynomial trend,  subtract




it from the data,  and then apply kriging for a process with a mean of

-------
                                                                         56



zero, to the residuals.  This  is  indeed the optimal solution  provided


that  the  trend is also  estimated with generalized least  squares


accounting  for the known covariance structure. For  example, consider our

                             A.
process Z(x) = (Jl(x) +  £. If [l(x> is an estimate of the  trend from

                                                 vX
generalized  least  squares, we form W(x) = Z(x)  —  |j(x)  and apply our


kriging estimation scheme  for a process with a zero  mean as  seen

            -••s.
earlier. If  W(x) represents  the kriging prediction  for W(x), then the

                                       .A.      .A.     ^
resulting estimate of Z(x) is given by Z(x) = W(x) + |J(x).



                                           XV.       v^.
The  variance of this estimate is  not Var(W(x»+Var]   subject to the  unbiased

condition given above. This  results in the following system of equations

involving Lagrange  multipliers v. :
                                   for i=1'~"n

-------
                                                                           51



and



 £x-f« - IA^.  -
This  technique can  also be applied  locally, in which case different

-------
                                                                        52
forms of  the given polynomial  trend are fit to different regions. One

restriction of this technique is that adequate amounts  of data must

exist in each of the  local regions. This often restricts  this  local

application  to regularly spaced sampling grids.



6.2 EXAMPLE  OF UNIVERSAL KRIGING



As an example of  universal  kriging  in the plane, let us consider  a

regular grid of points that we will index by (i,j). Define  a

two— dimensional process  by:
           a. •+• a^i •*• a--j
where again  the  e  terms are independent, normally distributed error  terms

with a mean  of zero and a variance  of one.  Notice that the mean  of  this

Z process is given  by  a plane with unknown coefficients a-.  The

covariance structure in this  process is generated by the common  terms

that define each Z(i,j).  These  terms arise from  the error terms

associated with positions immediately above,  below, and  to  the left and

right of the (i,j) point, see Figure 6.2.



For this process  the covariance, which depends only  on  the distance

between the  two  points, is given by:
                             17/16       h=O
                              1/4        h=l
               C( h ) =      1/32       h=v'2
                              1/64       h=2
                              0         h>3

-------
                   euj+i)
    €(I-1J)  •	  €(IJ)  	 6U+1J)
                   6UJ-1)
FIGURE 6.5.2  Structure of Error Terms in example
            of Universal Kriging
                                                                   Ul

-------
                                                                       54
Consider the  following array  of data:



            (0,0) *

                         (!,-!> *         (2,-l) *

                           *         (2,-2)  x



The points with the  symbol <*)  are  points  at which we will assume  that

data exists. The point with the symbol (x) will  represent a point that

we wish to predict from  the other  four. For convenience of reference we

will label the  points as  follows:
                         2       3

                         4      (5)



The covariance matrix of these four  data values is given  by:
            I   17/16     1/32        0        0
            !    1/32     17/16       1/4 •    1/4    |
            I      O       1/4       17/16     1/32  I
            I      O       1/4        1/32   17/16  I
By calculating the distances  between the  fifth point and  the four data

points, we find the covariance  vector of the  fifth point with the  four

data  points  is   KJx)' = ( 0 ,  1/32 ,  1/4  , 1/4 X



The unbiasedness condition imposes restrictions on  tha coefficients of

-------
                                                                        55
the four  data points that are used to estimate the fi-fth. In particular




to estimate  the intercept of the planar mean we need the restriction




that Xj+X2"l"^3"l"^4 =  1- To estimate the slope in the (i) or  first coordinate



direction we need the restriction that  Ox«+l \-+2\^+lx-  = 2. This is merely




a x linear combination of the first coordinates of the data  points  set




equal to  the first  coordinate of the point to  predict. Similarly, for




the second coordinates we have  the restriction that




0\1-lx2-lx3-2x4 = -2.






With these restriction the system that we must solve is  given by:
I17/
16
1/32
O
0
1
0
0





1/32
17/16
1/4
1/4
1
1
-1
0
1/4
17/16
1/32
1
2
-1
0
1/4
1/32
17/16
1
1
-2
1
1
1
1
0
O
0
O
1
2
1
0
O
0
0 II Xl
-l II x2
-l II x3
-211 X4
0 J| Vj
0 | 1 v0
0 II v"
1 0 1
1 1/321
1 1/4 |
1 1/4 |
1 1 1
1 2 1
1 -2 1
The  added restrictions are included with Lagrange  multipliers
The  x  coefficients that  solve this  system are  (x1,x2»A3»X4)  =



(— .305,— .084f.694,.694).  If we  arrange these coefficients in the pattern




of the original data we  obtain:




                  -.305




                         -.084    .694




                           .694








Notice that the  largest  coefficients  are given to the two  data values




immediately above and  the left of the point to be estimated. This  is  not

-------
                                                                      56





surprising when you  consider  the manner in which the process is




constructed. What is surprising is that the next  largest  coe-f-ficient in




absolute  value  is given by  the point  that is  farthest away  from  the




point to  be estimated. The  low  weight applied to  the  middle  point is




because  much of the information that it provides is already well




represented  by  the points with the largest weights. The farthest point




brings information  that is not used in the construction of the  others.








6.3 CIRCULAR ESTIMATION WITH  UNIVERSAL KRIBIN6








An important assumption behind  kriging is that the covariance matrix of




the process  is  assumed known exactly. If  it is estimated  the optimal




properties of the estimator are questioned. Furthermore,  even  the




estimation of the covariance  is not straightforward.  We first must  know




the covariance  of the  residuals e(x> before we can estimate  the trend



accurately. On the other  hand,  we  need  to know the trend before  we can




estimate  the covariance of  the residuals.  The circular nature trend and




residual covariance estimation  is a problem that  has produced several




approximate solutions.








In applying universal kriging,  Journel  and Huijbregts(197B)  suggested




that the  sample covariance  of the Z(x) values be  used in  place  of the




true residual covariance  matrix. If the  trend  is  locally constant, this



provides  numerical estimates, but  again optimality is questioned.  They




also suggested  and  Aboufirassi  and Marino(19S3) demonstrated that if the




trend is  mainly  in  one direction, the perpendicular direction may  be



well  behaved enough  to allow  calculation of a sample  residual



covariance.

-------
                                                                       57
Devary  and Doctor(1981) calculated the residual covariance matrix from


the residuals of a  standard least squares  estimation  of  the trend.  The


difficulty  here  is that least squares assumes uncorrelated  errors,  which


is quite inconsistent with the existence of a  nontrivial covariance


matrix of residuals.




Meuman  and Jacobson<1984> suggested an iterative approach to  the


simultaneous estimation of  the  trend and the residual covariance.  They


first 'assumed the order of the drift is k=l, and  then  estimated the


drift  by standard least squares.  They then calculated the variogram of


the residuals.  If this  variogram increased too rapidly  or exhibits


anisotropic behavior, i.e.  different  covariance structure  in different


directions,  they would increase the order of  the drift  and re-estimate


until  the variogram of the  residuals  had  the  "desired  properties." The


resulting covariance matrix was then  used to estimate  the trend using


generalized least squares. This results in new residuals, with their  own


covariance  matrix, which were used  to re—estimate the  trend. This

                                                  xs
iteration was repeated until the  estimated  drift  H
-------
                                                                        58




remove nonstationary behavior  and estimation of the  covariance -from



resulting standard  forms.  They began  by  defining generalized  increments.



For a  one-dimensional process  the first  order difference Z(x+h)— Z(x)  is



a zero order increment that  will filter out a trend that is constant.



Notice also  that  for data values  x,,...,x  the  sum EB-Z(x-) where



E8-=O  will eliminate a constant in  the data. Much of the literature of



generalized covariances  confuses  notation  between the  kriging weights x



and these generalized increment weights  B  by  using the symbol x for



both.  We have  chosen to eliminate the confusion here. For a  point



x-=
-------
                                                                        59
The most important result -for  the application  of  these increments is

        Xv                                                 /N

that if Z  is  a kriging estimate of Z(x) then Z(x>—Z(x>, the



estimation or  kriging error, is exactly a generalized increment. Written



as  -Z(x) + Ex-Z(x-)  we  see that ^=-1 and 8-=,\-  for all  the remaining



coefficients.







Generalized increments  turn  out to have covariance functions of  a



standard form. In Matheron(1973) intrinsic functions  of  order k were



defined as functions for which  k    order increments are weakly



stationary. Furthermore, it  is  possible  to  define  a  function K(h> such



that the variance of the increment is a quadratic form in K(h), that is:
This has the  same form  if K  was an ordinary covariance function for the



Z(x) process,  rather  than its increments. It  is  called  a  generalised



covariance  function. Matheron<1973> showed that any isotropic



generalized covariance function  defines  a class of functions which are



all equivalent up to an  addition of  an arbitrary even—powered polynomial



of degree  less than or  equal to 2k. For example,  for  k=l, K(h)=-h and


         2
K(h)=—h+h  are equivalent. The following  table lists the  essential odd



powers  for generalized  covariance functions of several  low orders.






Order (k)                             K(h)



   0                         C6 + aQh,      with aQ<0



   1                  C&  + aQh +  axh3,       with a^a^O



   2               CS  + aQh +  atn3  + a2h5,       with aQ,a2
-------
                                                                        60
and in R     a^  >  -




and in R     a1  >




For all orders  C8=C if h=O, and O otherwise.
The constraints insure that the  variance expressed in  terms of K(h) is




nonnegative.








Suppose that we choose a generalized covariance from the  forms given




above,  then  delete each value  Z(x.) in turn  and calculate(krige)




estimates  of Z(x.)  from the remaining points. The resulting  estimation



                         (r)
errors  (indexed by  r),  Z(8  >=Z(x->—Z (x.)  are collected.  The variance




of the  kriging errors  is given  by:
   Var= Z I
The generalized covariance function K(h> is of  the form:
  K(h)  = CS •• ^ -„.

             p=0 P
So that  the Var(Z(B(r>» becomes:
where
for p=O,l,...,k
Delfiner(1976) noted  that the variance is linear in the  parameters C  and

-------
                                                                       61




a , so that regressing  Var(Z(8(r>»=E[Z(B(r>)]2 upon T£ and  T£ +1, for




p=O,l,...,k  will produce estimates  of  C and the a 's. Delfiner




suggested performing  a  weighted  regression with weights inversely




proportional to the squares of the T's. Devary and Rice<1982) found the




regression to  be  quite  robust with respect to these  weights and suggest




performing standard regressions  on all of  the generalized  covariances of




order k. For example,  for k=l they would use  regressions with K(h)=CS,




C&+aQh, C5+aQh+a..tV~', C6+a.h , aQh,  and C5+ajh^'. Any given regression




model results  in C and  a  coefficients that are  reused to  estimate new




residuals and  Tr values, which are  then  used  to re—estimate the C and




a  coefficients. The iterations are continued  until convergence.  Devary




and Rice  found the convergence to  be quite erratic,  sometimes converging




to inadmissible values.
Starting this iteration requires a determination of k, the order of the




trend. Devary and  Rice(19S2)  suggested starting with a generalized




covariance function, K(h)=-h,  which is valid for  all values of  k. The




best  choice  for k  depends on the behavior of the  residuals obtained  by




deleting and re—estimating each data point in turn. For various orders




of k,  they examined mainly the absolute residuals  and minimum mean




squared residuals  to  determine k. For the former,  absolute residuals  of




all  orders considered(typically only  k=O,l,2) were ranked by  magnitude




and the value of k possessing the smallest average  rank  was chosen.  Once



k was  chosen, the  iterative  regression proceeded.








It may  well happen that several sets of generalized  covariance




parameters are  obtained from the various models  of  order k  tested. To




choose the best set Delfiner(1976) recommended  comparing  the

-------
                                                                       62
estimation(kriging)  variance, E[Z(& r )]"1"  to the quadratic form in
K(h)  that  de-Fines  the kriging variance. If the correct K(h)  is chosen,
                                                                -s
the  ratio of these values should be close to  one. No  distribution

results were quoted for this ratio,  only rules of thumb i.e.,  the ratio

should lie between .75 and 1.25 for a parameter set to be selected. It

seems  odd that a ratio is given bounds  that are arithmetically symmetric

about  1 rather  than the more reasonable bounds of 3/4 and 4/3.




There  have been some  objections  and reluctance to use this technique

because the resulting  covariance function is difficult  to interpret. It

applies to the differenced process and not to the original data.  Most

practioneers prefer  the structural analysis  that requires the

straightforward estimation and  use of a variogram or  a  covariance

function.  Much  can be learned about  the  variability in  the data from

these  tools.




The  generalised covariance technique with small data sets is  probably

not  very  useful. The differencing that must  be performed and the

resulting  regressions have been seen to behave quite  erratically even in

large data sets. The use with  very small samples would certainly not

alleviate  these problems.

-------
                                                                         63
8.0 A ROBUST TREND TECHNIQUE





As noted  earlier in the context  of  variogram estimation,  small data  sets


and their variable behavior suggest the use o-f robust  procedures of


analysis.  Cressie<1984) suggested a  robust technique for estimating the


trend. Using techniques from Tukey's<1977) exploratory data  analysis,


Cressie estimated  the trend using  what is called  median polish. In this


technique medians  of  the  values  of a two—dimensional process  on  a


regular grid are calculated across  one direction, say the rows. These


row medians  are then subtracted from  each  value in the row. Then medians


of these  residuals are calculated  down the columns and  then


subtracted  from each value in the  column. This  process is then repeated

                                                 *•
until  there is no further change. Cressie suggested using this technique


to estimate  the trend at position    general idea  of using a robust method of estimating  the


trend and  then  standard kriging to  estimate  values of the process  would

-------
                                                                       64


be quite useful. What is unknown is the  form of the resulting estimation

variance when  robust trend estimation is used. More work  in  this area is

needed.
9.0 KRIGING AND SPLINES
It has been shown that,  for a suitable choice of the generalized

covariance function, the estimates of  the values of a random process

obtained by kriging are equivalent to  a deterministic interpolation of

those  values  by a cubic spline, see  Wahba and Kimelhart(1970),

Matheron<1981), Dubrule(1983), and Watson(1984). The spline method

examines a class  of functions that  interpolates  the data and minimize

the integral of their squared second derivative. This is thought  of as a

measure  of  smoothness. The  second  derivative measures the  rate of change

of the slope,  arJ the faster the  slope changes the  rougher  the function.

Squaring and  integrating produces a  total positive  measure  of

"roughness" that is to be minimized.  Watson(1984) presented a simple

example to show the relationship between kriging and a  special function

derived from  the  study  of differential equations of  splines,  called a

Green's function.  The Green's function, obtained as the  solution  to an
                                               »
interpolating  spline differential equation,  and the  covariance function,

used in kriging, are closely  connected. It turns out  that the spline

Green's function is simply an uncentered covariance  function of some

random process. The converse though is not  true. There  may  well be

random processes with covariances which are  not Green's functions. This

makes  kriging  somewhat more general than interpolating  splines.

-------
                                                                       65
In R  the generalized covariance that produces kriging  estimates that are


                                              3      "*
equivalent to  interpolating splines is K(h)=| h| . In R ,



K(h)=h^log| hi and  in  R , K(h)=| hi, see Matheron(1981).  Again, since these are



generalized covariance  functions they  define an entire class equivalent


                                              o
up to  the addition  of  an even polynomial. In  R   Davis and Grivet(1984)



used:



K+ [h2/(2R)log
-------
                                                                         6-6
10.0  KRIGING  THE LANDFILL  pH  DATA




The empirical variogram that  we  will use in kriging the pH data  was


calculate in  section 4.5. In that example the  averaging o-f disjoint


distance ranges  overestimated the variability in  the original data. We


must expect  a large variance of the kriging estimates. The covariances


between  two  points are calculated using the  gaussian  variogram fitted  in


figure 4.5.2. The covariance matrix depends only on the distance between


the sampled  data points, and  the covariance vector  depends only on the


distances between the point  to be estimated  and  the given data


positions. These covariances  are arranged in  a matrix equation  as


illustrated  in section 6.2. Assuming a  constant mean, this matrix

                                                         j-x
equation is solved  for the x  coefficients  and estimates Z(x) are

              j*>
calculated by Z(x)=E\-Z(x-> where Z(x-) are the pH measurements at


each of  the  sample points. Figure 1O.1  shows the  contours of the kriging


estimates obtair«=d  using this variogram (represented as a covariance


function). The contours follow the lines of equal  pH. As in  time  series


analysis we will  expect a prediction that lies far from the data to be


equal to this mean  with a  variance equal to  the  estimated  variance of


the process. The  closer we come to a  cluster of  data the more  the


estimate is influenced by the data, and the smaller  the estimation


variance. Recall  that kriging  is an exact interpolator, so that  the data


values are reproduced exactly. Notice  that the contour line of  pH=7


passes directly through the  location of well  M1O. The  large pH value of


11.4 causes  a rather  rapid falling of the coutours down to  the more


reasonable range of 7. This produces  what could  be  regarded as a  small


"puddle"  of highly alkaline  water near  well M7A. As mentioned in  the

-------
                       FIGURE 10.1
                 Contours of PH obtained by Krlglng
7.
                                 '-   '  \\\

                            "'.   ''•    'i\\
                               '•-  \'-S.,Ss
                                  »   »    •-.. ^y
                                 Q •   •.  .
                              * .   .  • .
                                              * .. .,
                     0 •

                      6
  0       0

,8  6,7    .-6:6
                                                 « ».

                                                     •I N I II
." ?

  6.5
                                  •6,8
                                               I I • • • •

-------
           FIGURE  10.2
Contours  of the  Krlglng Variances of PH
                      M,««'**t*""HM««i»MMi«im,Mtll  .MM"'
                                         „-"'••.   '      \
                                      . - -"  .. '• - „  • v  \
                       »• •
                   '  '
              ,  .5



                0
            7,4-  \
                                        1  '   11'. 4 '  ;  '
                                               i   ,  .  .

                                               i   i  «  •
                                                i  i  •  •
-,&.6,7      6,6.  /
•     *             •
                                            o  •      • •  i
                                            v   f   I  •   I
                                       .11 ii it ••
                      I

                        '
s  .'•.»/••
 \ •  '-.**..
                          C flf
                          O.v"  /  •
                          • '  ..  •
                                . - •  ..-• >
                          ...••••  M---"  /
                             „.»...."-'"  ^/^
                           t • •  ••
              A  'L_
                                                                          00

-------
                                                                       69




introduction to these  data, this value may well be  in error but kriging




is an interpolation technique and will predict the data  as given.








Figure 10.2 shows the  contours of equal kriging variance. The variance




at each  well is zero, and it increases as  the  distance  away from the




wells increase.  Far from the data cluster, in the corners of the  map,




the variance is largest. It must be emphasized that this is  the variance




of the estimation assuming  that the variogram or covariance used is




exact. It does not account  for the  variation in the covariance



estimation. This additional variance would  decrease  somewhat the




confidence in the estimates.








In kriging  the pH  data from the Dade County landfill we are  subject to a




very  small data set. This is not at all uncommon  in many  pollution




monitoring situations.  Data  collection from groundwater  is  quite




expensive, and economic considerations often overcome statistical




objections to sir^ll samples. We have tried to present  several ideas for




and approaches  to dealing  with small data  sets. These approaches have




not been  investigated  extensively. Much more work is required to




determine  their usefulness, variability, and reliability.

-------
                                                                       70
APPENDIX









SIMULATION OF  ISOTROPIC PROCESSES









It is of interest to simulate realizations of isotropic processes to




gain insight  into  their  structure and the restrictions  imposed by the




assumptions  of isotropy.  Matheron(1973) has developed what he calls the




"turning bands" technique of simulation. An example of this technique is




illustrated in  Figures  A.I through  A.4. Figure A.I  is  a two—dimensional




lattice throughout which we  will define our isotropic process  by the




following procedure:  Each point  in  this  portion  of the plane,  not just




the lattice points, is  projected onto a realization of a one-dimensional




stochastic process with a covariance of C.(h), (Figure A.2). All




points lying  between the  lattice points are projected onto the  same




value. Hence points in an entire band of  the  plane are given equal




contributions.  The entire lattice is  then  rotated  through  some  random




angle uniformly distributed  around a circle, (Figure  A.3). The  points in




this rotated lattice are  then projected onto another realization of  the




stochastic process with the same  covariance  C^h). The  results of the




accumulation of these  rotations is shown in Figure  A.4. Here all points




in the plane are  assigned simulated  values corresponding  to the sum of




the contributions from  each rotation. The banded  nature of these random




"turns" gives the procedure  its name.









In d-dimensional Euclidean space the technique  involves producing a




stationary process Z^  with  covariance C.,  on the real line, then  for a




point X=. A contribution  to the

-------
                  FIGURE A.I




ISOTROPIC PROCESS SIMULATION BY MATHERON'S TURNING BANDS

-------
            FIGURE A.2
BAND PROJECTION ON A ONE-DIMENSIONAL REALIZATION
   ,8  -,3  -2   ,5  1,1   ,2  -3  -,8   M  -1
        HITH COVARIANCE  C( H )
                                                                        NJ

-------
FIGURE A.3
                       ROTATION OF THE GRID
                       AND PROJECTION ON ANOTHER
                              REALIZATION
                                  WITH THE SAME  COVARIANCE
                                                             U)

-------
                                   FIGURE A.4
            SPATIAL
      ACCUMULATION OF THE
         REALIZATIONS
POINTS III
THIS REGION HAVE
VALUES OF ,8+,5
             ACCUMULATIONS
           CONTINUE  FOR MANY
             ROTATIONS
                     ,8  -,3  -2
-1

-------
                                                                          75
process  for  a point in d—space is given by the  value of the first

coordinate projected  onto the one—dimensional realization.  The  whole

realization is then given an independent uniformly distributed rotation

about  the origin in d—space, see  Ripley(1981).  An accumulation of  these

realizations  produces a normal isotropic process.  Matheron(1973)  showed

that, as the  number of rotations approaches  infinity, the resulting

isotropic covariance  in d-space is given by C(h)=E(C1(hV)), where  E is

the  expectation operator, Cj is the  covariance for the  Z, process on the

real line, and V  is the first  coordinate of a point on  the surface  of

the  unit sphere in  d—space chosen randomly with a uniform distribution.

A line originating  within  the lattice and radiating outward would fall

close to one  of  the rotations. Points along this line would have  the

covariance structure imposed  by  the one—dimensional  process used in  the

generation. No matter what direction the  line points, the  covariance is

the  same along this line  and depends only on the  distance between  the

points.



Ripley(1981,p.l7) has suggested  a modification  of this procedure  for

simple simulation  of isotropic  processes with covariance function C on a

lattice in the plane.  Here eight independent realizations of a

stationary stochastic process with covariance C/8 are generated on the

real line. Then for any point  
-------
                                                                       76


about the  origin.
Figure A.5 show a  realization of an isotropic process on a 3Ox30  lattice


in the plane generated  by rotations and projections suggested by


Ripley(1981)  onto realizations  of  a  normal,  second order moving  average


process  of  the form:
         Z.  = a. + .5a. _| •+• ,75a. _
where  the  a.  values are normally distributed with a mean  of zero and a

variance of one. Notice the prominent  line of peaks  running  parallel to


the x0 axis for small values of  x..  Closer examination also reveals

diagonal peaks and valleys (note middle left  to upper center line of


peaks). These  diagonal  lines 'can  be  seen even  more dramatically if the

process  is scaled downward. Figure  A.6  shows the result of  scaling the


process  by one tenth.  Here we also have placed lines pointing  to the

diagonals. The diagonal lines are a  result of the eight rotations used


to construct  the  simulation.




Some of  this  behavior  might be expected as  the projections  spread the


extreme  values of the one— dimensional realizations  throughout  various


diagonal lines in the lattice.  The difficulty  with this method of

simulation  comes from  the  small  number of rotations used  and their  equal


spacing.  Eight equally  spaced  rotations of ir/4  produce  diagonals that
                            i
are too  prominent.




A better procedure follows the technique that  Matheron(1973) originally

-------
FIGURE A.s      ISOTROPIC PROCESS SIMULATION BY FIXED ROTATION
                                       TURNING BANDS

-------
FIGURE A.6
ISOTROPIC PROCESS SIMULATION BY FIXED ROTATION
                       TURNING BANDS
                                                                                    oo

-------
                                                                        79





suggested. Instead of equally spaced rotations we should select several




randomly distributed rotations in the plane. Figure A.7 shows  a




realization with 16 randomly  selected rotations. A. line of peaks is not




prominent on the fixed diagonals given  by Ripley's method.








The moving average process used for the one—dimensional realizations




possesses  a  varied covariance structure depending on the parameters




chosen. Journal and Huijbregts(1978) show how  these parameters  can  be




chosen in one—dimension to produce many of the standard variogram  models



in higher dimensions  using  the turning  bands  technique.

-------
FIGURE A. 7
    ISOTROPIC  PROCESS  SIMULATION  BY   RANDOM   ROTATION

                                  TURNING   BANDS

                                                A
                                                                                   00
                                                                                   o
          X
1

-------
                                                                  81
Aboufirassi, M. and Marino,  M. (1983),  "Kriging  of  Water  Levels
in  the  Souss Aquifer,  Morocco,"  Mathematicai_Gegiggy,   15,
No.4, 537-551.

Borgman, L.E. and  Davis, B.M.(1982),  "A Note  on  the Asymptotic
Distribution of the  Sample Variogram,"  MathematicaI_GegIogy,
14, No.2, 189-193.

Box,  G.E.P.  and  Jenkins,  Q. (1976),  Time	Series	Analysis:.
Fgrecasti.ng_and_Cgn trgjl, Hoi den—Day, New York.

Cressie, N.  and Hawkins, D.M.(198O),  "Robust  Estimation  of the
Variogram: I," Mathemat ical._Geal.ogy, 12, No. 2,  115-125.

Cressie, N.(1984),  "Kriging   Nonstationary Data,"  presented  at
the Amer. Stat. Assoc. meeting Philadelphia, August, 1984.

David,  M.(1976),   "The   Practice    of   Kriging,"   Advanced
Gegstatistics_in_the	Mining	Industry,  D.  Reidel   Publishing
Co., Dordrecht, Holland.

Davis,  B.M.  and  Borgman,  L.E.(1979),   "Some   Exact  Sampling
Distributions   for    Variogram   Estimators,"     Mathematical.
Geology, 11, No.6, 643-653.

Davis,  M.W.  and   Grivet,    C.(1984),  "Kriging  in  a  Global
Neighborhood," Mathematical._GegIggy, 16, No.3, 249-265.

Delfiner,  P.(1975),  "Linear   Estimation   of    Non  Stationary
Spatial  Phenomena,"   Advanced	Geostatistics	i.n	the	Mining
Industry., Reidel, Boston.

Devary,  J.L. and  Doctor, P.G.(1981),  "Geostatistical  Modeling
of  Pore  Velocity,"  Battelle—Pacific  Northwest    Laboratory,
Report #PNL-3789.

Devary,  J.L.  and  Rice,  W.A.(1982),  "Geostatistics  Software
User's Manual  for  the  Geosciences   Research  and   Engineering
11/7O  Computer System,"  Battelle-Pacific Northwest  Laboratory
Report.

Dubrule,  0.(1983),  "Two  Methods  with   Different   Objectives:
Splines  and   Kriging,"   Mathematical.	6eoiQ9y»   15»  No.2,
245-257.

Efron,   B.(1979),"Bootstrap    Methods:  Another   look at  the
jackknife," Annals of Statistics, vol.7, no.1.

Grenander,  U.  and Rosenblatt, M.(1957),  Statistical.	Analysis
of_Statignary_Time_Series, J.  Wiley and Sons,  New York.

Hogg, R.V.(1974), "Adaptive Robust Procedures: A  Partial  Review

-------
                                                                  82
and  some Suggestions for  Future  Applications and Theory,"  J._
    .._St at^_ Assgc .. , 69, 9O9-923.
Huber, P. J. (1972),  "Robust  Statistics-Review," Annal,s_gf __ Math...
Stat..., 43,  1041-1067.

Journel ,  A. and Huijbrechts,  C. ,(1978),  MiQi.ng_6eost at i_st i cs ,
Academic Press, London.

Jowett,  G.H. ,(1952),  "The  Accuracy of Systematic  Sampling from
Conveyor Belts," ABB iied_St atistics , *» 5O-59.

Jowett,  G.H. ,(1955),   "Multiple  Regression  between  Variables
Measured at points  of  a  Plane   Sheet,"  Ag_Biied_Stat i_st i.cs , 4,
80-89.

Jowett, G.H. ,(1957),  "Sampling Properties  of  Local  Statistics
in stationary Stochastic Series,"  Bigmetrika, 42, 160-169.

Kimeldorf,  G.  and  Wahba,   G. (1970),   "A Correspondence between
Bayesian Estimation of   Stochastic  Processes and  Smoothing  by
Splines," Annal.s_gf _Math..._St at... , 41, 495-5O2.

Kolmogorov,  A. N. ,(1951),    "Interpolation  und  Extrapolation,"
lyiiet iQ_de_ilac ademie_des_sciences_de __ Ui_S ..S ..R .. » Ser . Math . 5 ,
3-14.

Koopmans,   B. (1976),    SBectral. __ Anal.v.si.5 __ gf_ ___ lime __ Seri.es,
Academic Press, New York.

Krige,  D.G. (1951),   "A   Statistical   Approach  to  Some   Mine
Valuations  and  Allied Problems at Witwatersrand, "  unpublished
Master's Thesis, University of Witwatersrand, South Africa.

Matern,  B. (I960),    Spatial. __ Variation,   Almaenia  Foerlaget,
Stockholm.

Matheron,   G. (1965),   Les __ __ Variabl.es ___ ReaiQQal.ise.es __ et __ Lsur
Estimation, Masson  et  Cie,  Paris,  France.

Matheron , G. (1973),   "The Intrinsic Random Functions  and  their
Applications," Ad y an c es_in _ ABBiiecJ _Pr ob ab iiit y , 5 , 439-468 .

Matheron,   G. (1981),   "Splines   and   Kriging:.  Their   Formal
Equivalence,"  Syracuse   University Geological  Contribution #8,
edited by D.F. Merriam.

Mosteller,  F.   and   Tukey,  J.W. (1977),  Data ___ Analysis __ and
Regression, Add i son-Wesley, Reading, MA.

Myers, D.E. ,  Begovich,   C.L. ,  Butz ,   T.R.,  Kane,  V.E. (1982),
"Variogram  Models   for   Regional Groundwater  Geochemical Data,"
Mat hematic ai_Geg!ggy. ,  14, No. 6, 629-644.

Neuman,     S.H.   and    Jacobson,    E. A. (1984),   "Analysis   of

-------
                                                                  83
Nonintrinsic   Spatial   Variability  by  Residual  Kriging  with
Application  to  Regional   Broundwater   Levels,"  Mat.hemati.cai
GegJLggv., 16, No. 5,  499-521.

Ripley, B. (1981), Sgat i.al _St atist i.cs , J.   Wiley  and Sons, New
York.

Sharp,  W.E.(19S2),  "Estimation of Semi variograms by the Maximum
Entropy Method," Mat hemati.ca!_Gegigg¥ , 14, No. 5, 457-474.

Starks,  T.H. and Fang,  J.H. (1982), "The Effect of Drift  on the
Experimental Semivariogram, "  Mat hemat i.cal._Gegl_ggY ,  14, No. 4,
309-319.

Technos,   Inc. (1984), "Detection  and  Mapping  of  Contaminated
Soils  and  Ground   Water   Using  Geophysics,"  report  to  U.S.
Environmental Protection Agency.

Tukey, J.W. (1977), Exgigratgry __ Data_ Analysis , Add i son-Wesley,
Reading, Massachusetts.

Von   Neumann,   J.,  Kent,    R.H. ,   Belliason,   H.R. ,   Hart,
B. I., (1941), "The Mean   Square  Successive  Difference," Annais
of _Mat hi_S t atj. , 12,  1 53 .

Watson,  6. S. (1971),  "Trend-Surface   Analysis,"  Mathematical.
       ., 3, No. 3, 215-226.
Watson,  G.S. (1984),  "Smoothing and Interpolation by Kriging and
with Splines," £ jthematicai_Gegiggy ,  16, No. 6, 601-615.

Weiner,  N. (1949),  The ___ Extraggl.ati.gnj. ___ i.D.teregl.atign^ ___ and
Smggthing_gf _Statignar^_Ti.me_Series,  J.  Wiley, New York.

-------