United States
Environmental Protection
Agency
Office of Policy, Planning
and Evaluation
Washington, DC 20460
                           EPA-230-08-88-035
Statistical Policy Branch
ASA/EPA Conferences  on
Interpretation of
Environmental Data
   Sampling and Site Selection
In Environmental Studies
May 14-15, 1987


-------
                        DISCLAIMER
This document has not undergone final review within EPA and should
     not be used to infer EPA approval of the views expressed.

-------
                                     PREFACE
    This volume is a compendium of the papers and commentaries that were presented at
the third of a series of conferences on interpretation of environmental data conducted by
the American Statistical Association and the U.S. Environmental Protection Agency's
Statistical Policy Branch of the Office of Standards and Regulations/Office of Policy,
Planning, and Evaluation.
    The purpose of these conferences is to provide a forum in which professionals from
the academic, private, and public sectors exchange ideas on statistical problems that
confront EPA in its charge to protect the public and the environment through regulation of
toxic exposures. They provide a unique opportunity for Agency statisticians and scientists
to interact with their counterparts in the private sector.
    The conference itself and these proceedings are primarily the result of the efforts of
the authors and discussants.  The discussants not only provided their input to the
proceedings but also reviewed the papers for the purpose of suggesting changes to the
authors.  The coordination of the conference and of the publication of the proceedings was
carried out by Mary Esther Barnes and Lee L. Decker of the ASA staff. The ASA
Committee on Statistics and the Environment was instrumental in developing this series of
conferences.
    The views presented in this conference are those of individual writers and should not
be construed as reflecting the official position  of any agency or organization.
    Following the first conference, "Current Assessment of Combined Toxicant Effects,"
in May 1986, and the second conference, "Statistical Issues in Combining Environmental
Studies," in October 1986, the third conference, "Sampling and Site  Selection in
Environmental Studies," was held in May 1987.  One additional conference, "Compliance
Sampling," was held in October 1987. A proceedings volume will also be published from
this conference.

                                Walter Liggett, Editor
                                National Institute of Standards and Technology
                                          iii

-------
                                  INTRODUCTION
    The eight papers and accompanying discussions in these proceedings are about drawing
conclusions in environmental studies.  These papers provide valuable guidance in the
planning of future environmental studies.
    The papers address many aspects of environmental studies. The studies discussed
involve air, water, ground water, and soil. These studies are aimed at specific goals as
diverse as the effect of a regulatory intervention, the design of a remediation effort, the
spatial distribution of a hazardous material, the validity of an environmental model, and
the impact of a power plant.  Some studies also emphasize the planning of the field
work and of the chemical analyses in the laboratory.  The studies employ techniques from
various statistical areas such as probability sampling, response surface analysis, optimal
design of experiments, time series analysis, spatial prediction, power transformations, and
the analysis of variance. In the planning of an environmental study when almost all
options are still open, most of these aspects are potentially relevant.
    These proceedings are intended for statisticians  involved in the planning of
environmental studies.  Statistical planning is based on anticipation of the statistical
analysis to be performed so that the necessary data can be collected.  These proceedings
should help the statistician anticipate the analysis to be performed.  In addition, the
papers discuss implications for planning new studies.  No general prescriptions for planning
are offered; none may be possible.
    The emphases in these papers are quite different.  No two authors have chosen the
same aspect of environmental studies to  examine.  This diversity among authors who have
all invested considerable time and effort in environmental studies suggests a major
challenge. The challenge is to consider carefully each study aspect in the planning
process. Meeting this challenge will require a high degree of professionalism from the
statistician involved in an environmental study.

                                Walter Liggett, Editor
                                National Institute  of Standards and Technology
                                             iv

-------
                               INDEX OF AUTHORS
Bailey, R. Clifton	  73
Cressie, Noel A. C.	  25
Englund, Evan J.	  31
Folsom, Ralph E.	  41
Hudak, G.	   1
Jernigan, Robert W.	  54
Johnson, W. Barnes	  23
Liu, Lon-Mu	   1
Livingston, Richard A.	  55
Peterson, Bruce	  70
Splitstone, Douglas E.	  15
Stewart-Oaten, Allan	  57
Thrall, Anthony D.	  52
Tiao, George C.	   1
Warren, John	  40

-------
                 TABLE  OF CONTENTS
 Preface                                                                    iii
Introduction. WALTER S. LIGGETT, National Institute of Standards and
Technology                                                                iv

Index of Authors                                                            v

      I. THE STATISTICAL BASIS: RANDOMIZATION AND PROCESS CONTROL
A Statistical Assessment of the Effect of the Arizona Car Inspection/
Maintenance Program on Ambient CO Air Quality in Phoenix, Arizona.
LON-MU LIU, G. HUDAK, GEORGE C. TIAO, University of Chicago                1

Sampling Design: Some Very Practical Considerations. DOUGLAS E.
SPLITSTONE, IT Corporation                                                15

Discussion.  W. BARNES JOHNSON, U. S. Environmental Protection
Agency                                                                   23

             II. INFERENCE ON CONTINUOUS SPATIAL DISTRIBUTIONS
Spatial Prediction and Site Selection. NOEL A. C. CRESSIE, Iowa
State University                                                           25

Spatial Autocorrelation: Implications for Sampling and Estimation.
EVAN J. ENGLUND, U. S. Environmental Protection Agency                      31

Discussion.  JOHN WARREN, U. S. Environmental Protection Agency               40

               III. DESIGNS BASED ON AUXILIARY INFORMATION
Sampling and Modeling Pollutant Plumes: Methods Combining Direct
Measurements and Remote Sensing Data.  RALPH E. FOLSOM, Research
Triangle Institute                                                           41

"Validation" of Air Pollution Dispersion Models.  ANTHONY D.
THRALL, Electric Power Research Institute                                    52

Modeling Pollutant Plumes. ROBERT W. JERNIGAN, The American
University and Statistical Policy Branch, Environmental
Protection Agency                                                         54

Estimating the Spatial Uncertainty of Inferred Rates of Dry
Acidic Deposition. RICHARD A. LIVINGSTON, University of Maryland              55

                   IV.  STATISTICAL COMPARISON OF SITES
Assessing Effects on Fluctuating Populations: Tests and Diagnostics.
ALLAN STEWART-OATEN, University of California-Santa Barbara                 57

Comparisons with Background Environment: Strategies for Design.
BRUCE PETERSON, CH2M Hill                                              70

Discussion.  R. CLIFTON BAILEY, U. S. Environmental Protection Agency           73

Appendix A: Program                                                       75

Appendix B: Conference Participants                                          77
                                     vii

-------
          A STATISTICAL ASSESSMENT OF THE EFFECT OF THE ARIZONA CAR INSPECTION/MAINTENANCE PROGRAM
                            ON AMBIENT CO AIR QUALITY IN PHOENIX, ARIZONA
                                              Lon-Mu Liu
                                               G. Hudak
                                               G. C. Tiao
       University of Chicago, Graduate School of Business, 1101 E. 58th Street, Chicago, IL 60637
1. INTRODUCTION AND SUMMARY OF FINDINGS
   This paper  presents a  statistical analysis
of ambient  carbon  monoxide concentrations over
the   period   1971   to  1982   at   three  air
monitoring sites in the state of Arizona.  All
three sites (Phoenix Central, Sunnyslope, and
Phoenix South) are located in the Phoenix area
where a vehicle inspection and maintenance
(I/M or VEI) program has been in effect since
January 1977.  The principal objectives of
this study are to assess the trend in the CO
concentrations which can be associated with
the Federal Motor Vehicle Emission Control
Program and to determine whether or not the I/M
program in the Phoenix area has had a positive
impact on the concentration level.
   A  summary  of  our  principal  findings  is
given  in  this  section.   Section  2  provides a
description  of   the   nature  of   the  carbon
monoxide,  traffic  and  meteorological data used
in the analysis.   In  Section 3  a preliminary
trend  analysis  based  on the CO  readings alone
is  given.     In  Section  4   diurnal  models
relating  CO   to  traffic   and  meteorological
factors are constructed.   Such models serve to
identify    the   major   exogenous   variables
affecting  CO.    In  Section  5,  time  series
intervention models for  monthly  means  of  CO
readings  are   given.      Traffic   volume  and
relative humidity  (a  proxy  for  mixing height)
are    used   as    exogenous   variables,   and
appropriate  functional  forms  to  model  the
effects of  Federal emission standards  and  the
I/M  program   are   constructed.     Finally  in
Section 6,  a  summary  of  the  main  results in
trend  analysis  together  with  an  assessment of
the impact of the I/M program is presented.
   Our principal findings  are as follows:
(i)    At  all  three sites one  can  observe  a
reduction    in   ambient    CO   concentration
levels.   The  decrease at Phoenix  Central is
the    largest.        Considering    the    CO
concentrations  alone,  this  decrease  is about
3.6%   per   year   over   the  period  1973-1982.
Further,  the  reduction at Phoenix  Central is
higher  in  the  winter months,  5.7%  per year,
than in the summer months,   2.3% per year.
(ii)   Based on models  for  the  monthly means of
CO  at Phoenix Central  which  adjust for  the
effects of  traffic changes  and meteorological
variations,   the  winter   trend  reduction  is
estimated at  about  7.1%  per  year  while  the
summer reduction is about  2.3% per year.
(iii)  Yearly percentage changes in CO (based
on  adjusted   monthly  readings)   at  Phoenix
Central are compared with  two  sets of emission
factors derived from  MOBILE3  analysis.   One
set of factors  includes   the  expected effects
of the I/M program and the other does not.   It
is found  that, over the period  1973-1982,  the
year  to year changes  in  CO concentrations in
the winter  months are in  good  agreement with
changes in  the set of  emission  factors which
includes the effects of the I/M program.
Provided   that   the  emission   factors   are
accurate, there is then some evidence from the
observed  ambient  CO  concentration levels  to
support  the hypothesis that the I/M program has
had a positive impact on ambient CO levels.
(iv)   Analyses  of diurnal  models of  CO over
the period  1977-1982  produced  trend estimates
largely  consistent  with  estimates  based  on
monthly averages.
2.  Data Base
2.1 The Arizona Inspection and Maintenance
    Program
   The  United States  Clean  Air  Act Amendments
of  1977 require that  certain  states implement
vehicle  inspection  and  maintenance  programs
(I/M  programs)   in   certain  of  their  major
cities to reduce hydrocarbon (HC) and carbon
monoxide  (CO) emissions  from  gasoline powered
vehicles.  Arizona Revised Statutes §36-1772,
which    established    the   Vehicle   Emissions
Inspection    (VEI)   Program,   requires   that
gasoline   powered  vehicles   pass  an  annual
inspection   to   ensure   that   their  exhaust
emissions  meet  standards  established  by  the
Department  of  Health  Services  (DHS).   This
program was  initiated  in 1976 on a trial  basis
in  the  Phoenix  and  Tucson  areas,  and repairs
became   mandatory   in  January   1977.      The
inspection requirement applies generally to
gasoline powered vehicles which  are less than
14  years  old  and  located within designated
portions of  Pima  and Maricopa  Counties  which
do  not  meet  the  carbon  monoxide standards  of
the Federal  Clean Air Act.   According to  the
VEI program  vehicles  are   tested  annually  to
ensure   that  carbon monoxide  and  hydrocarbons
in   their  exhaust  emissions  meet  standards
established by DHS.   Motorists whose  vehicles
fail  to meet these standards must  repair their
vehicles and submit  to a retest.
    Data on the annual number of inspections
and failure rates were obtained from the Arizona
DHS.  Generally speaking, the monthly failure
rates  between  1977  and  1983  range between  15
and 25  percent,  and  they  are  higher in  1979
and 1980 than in other years.
 2.2 Carbon Monoxide Data (CO)
    The data consist of hourly CO
concentrations (ppm) recorded at three Phoenix
sites.  The Phoenix CO measurement locations
are: Phoenix Central (at 1845 E. Roosevelt in
downtown), Sunnyslope (at 8531 N. 6th Street),
and Phoenix South (at 4732 S. Central).
    The hourly data on CO concentrations at
these three stations vary in length: Phoenix
Central (71/1 - 82/12), Sunnyslope (74/10 -
82/12), and Phoenix South (74/10 - 82/12).
Missing data occur at all three sites.
Phoenix Central has one month (January 1979)
in which data are completely missing; the
other two sites have several months completely
missing.

-------
2.3 Traffic  (TR)
   Many  studies  have  shown  that ambient  CO
concentrations  are  approximately proportional
to traffic  (Tiao,  Box and Hamming (1975), Tiao
and Hillmer  (1978),  Ledolter and Tiao (1979b),
Tiao, Ledolter and Hudak  (1982)).  It is,
therefore, necessary to incorporate changes in
traffic  into the  trend analysis.  Ideally, one
would  want   to  use   traffic  data  recorded
throughout  the area affecting  CO measurement
sites  over  the  entire  period  under  study.
Unfortunately,  such  detailed  data  are  not
available for this study.
   Estimates of relative traffic volume per
day in the vicinity of Phoenix Central
and Phoenix South between 1970 and 1983 were
provided by the Bureau of Air Quality Control
(BAQC), Arizona Department of Health
Services.  These figures are listed in Table
2.1.  Adjustment fractions for different months
of a year, different days of a week, and
different hours of a day were also provided by
the Bureau of Air Quality Control and are
listed in Table 2.2.
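
   As a rough illustration of how such figures combine, the sketch below
scales an annual relative volume by month, day-of-week, and hour-of-day
fractions to approximate the relative traffic for a particular hour.  The
numbers are made up for illustration and are not the BAQC values.

    # Sketch: relative traffic for one hour from an annual volume and
    # BAQC-style adjustment fractions (illustrative numbers only).
    annual_volume = 212.0      # relative traffic volume per day for the year
    month_adj     = 1.02       # adjustment for the month
    day_adj       = 1.00       # adjustment for the day of the week
    hour_fraction = 0.069      # fraction of the daily volume falling in the hour

    hourly_traffic = annual_volume * month_adj * day_adj * hour_fraction
    print(round(hourly_traffic, 2))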
2.4 Meteorological Data
   Apart  from traffic,  variations  in  ambient
CO  concentrations   are  to   a  large  extent
affected by meteorological conditions:
(a) wind speed and  wind direction  affect the
transport and diffusion of CO emissions (with
low wind speed resulting  in high  CO levels);
(b)  inversion  (mixing   heights)  affects  the
volume  of air  available for  dilution   of  CO
emissions;
(c) temperature,  solar intensity and relative
humidity  are   related  to   the  duration  and
intensity  of  temperature  inversions  and  the
degree of vertical mixing;
(d)    meteorological   variables    (such   as
temperature)  influence the  efficiency  factors
of car  engines (cold  starts  leading to  higher
CO  levels).      Thus,   one   should   consider
incorporating  these  variables  as  exogenous
factors in a trend analysis of CO.
   Meteorological data were  obtained  from the
National   Climatic   Center   (NCC)   for   the
monitoring  station  of   the  -National  Weather
Service  in  Phoenix   (near  Phoenix  Central).
Data on wind speed (WS, knots), wind direction
(WD, tens of degrees), temperature (TP, °F),
relative humidity (RH, percent) and
precipitation frequency (PREC, 0-60) were
obtained for the time period 1971/1 - 1982/12.
Unfortunately, mixing height data were not
available.  A closely related variable whose
data were available is delta temperature (ΔT).
The ΔT variable is the difference in
temperature readings recorded at different
heights.  Since ΔT is a measure of atmospheric
stability, it may be used as a proxy for
mixing heights.  Hourly ΔT measurements were
made by the BAQC at 6th Street and Butler,
where

    ΔT = temperature (°F) at 30 feet - temperature (°F) at 8 feet.

The hourly ΔT data are available between
77/5 - 82/12, but with many missing days and
months.
3. Preliminary Trend Analysis of CO
   The primary  objective of  this study  is to
assess statistical  evidence  of the  impact of
the  Arizona  I/M program  on  the  ambient  CO
concentration  levels.    Our  approach to  this
problem  is  as  follows.   We first present in
this  section   a  preliminary  trend  analysis
based on the CO  data alone.   The effects of
traffic   changes   and   the   influences   of
meteorological  variables are  then  considered
in  the  next  two  sections  (Sections  4  and 5)
where  various  models  relating  CO at  Phoenix
Central   to   these   exogenous  factors   are
constructed. Finally,  in Section 6,  we compare
the   model-based   trend  estimates   with  the
expected  reduction   in CO  emissions  based on
EPA MOBILE3 analysis applied to the Phoenix
Central  location  with  and  without  the  I/M
program.
3.1 Preliminary CO Trend Analysis for
    Phoenix Central
      Figure 3.1 shows monthly means of  CO at
Phoenix Central.  It is  seen that there was an
apparent  down  trend  in CO over  the  period
1973-1982.   (We have  been  informed by  the DHS
that  CO  data  for  the  period  1971/1  - 1972/3
received by us are incorrect and hence they
have  not been  used in  our  analysis.)    As  a
first  approximation,  if we  assumed a  linear
time  trend  operating  from  1973-1982,  then the
estimated  CO  reduction  in  the  yearly  means
would be about 3.6% per year.   Further study
of   the  data   shows   that   the   percentage
reduction is higher in the  winter months  (Oct.
-  Feb.)   than  in the  summer  months   (April  -
August).   Specifically,  over  the period 1973-
1982, the decrease   is about 2.3% per  year in
the  summer  months  and  5.7% per  year  in the
winter months.
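
   A minimal sketch of the kind of calculation behind such a figure:
regress the logarithms of the yearly (or monthly) means on time, and read
the slope as an approximate fractional change per year.  The series below
is synthetic; only the method is being illustrated.

    import numpy as np

    rng = np.random.default_rng(0)
    years = np.arange(1973, 1983)
    # Synthetic yearly mean CO levels declining a few percent per year.
    co_means = 10.0 * np.exp(-0.036 * (years - 1973) + rng.normal(0, 0.02, years.size))

    # Slope of log(mean) on year approximates the proportional change per year.
    slope, intercept = np.polyfit(years - 1973, np.log(co_means), 1)
    print(f"estimated trend: {100 * slope:.1f}% per year")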
3.2 Preliminary CO Trend Analysis for
    Sunnyslope and Phoenix  South
   For   the  Sunnyslope  location,  Figure  3.2
shows that  over the period  1974-1982 there are
many  months for which data were missing.   If
we exclude  the  four years containing months of
missing    data  (i.e.,  1974,  1979,  1980 and
1982),  we  see that  the  reduction  in  CO is
smaller   than   that  at  the  Phoenix  Central
location.     In   particular,   the   estimated
reduction  in the yearly means  of CO  here is
about 2.6% per year.
   Similarly,  for   the Phoenix South  site, we
see  from Figure  3.3  that   there  are  gaps of
missing  data.   Excluding the three years  1974,
1979  and 1980,  the  estimated reduction in the
yearly  means of  CO  is about  2.4%  per year,
which  is   again  smaller  than  that  at  the
Phoenix  Central site.
3.3 General Remarks
   The   preliminary  trend   calculations  were
based on the CO  readings  alone.   Recall  from
Table  2.1  that the  traffic  volume  increased
steadily over   the  years   1970-1982  in  the
Phoenix  area.   Thus,  the  estimated reduction
in  CO   emissions  would be higher  than  the
figures  given  above when the traffic increases
are  factored  into   the  analysis.    Also,  the
preceding  analysis   did  not  take into account
the  influence  of  meteorological variables.  We
now turn to discuss various models relating CO

-------
 to   exogenous    traffic   and   meteorological
 factors.   In view of the fact  that  the  Phoenix
 Central  site has the longest and most complete
 data  record on  CO and  that the concentration
 level  is also much  higher  there than those  of
 the   two   other  sites,   we   shall   confine
 ourselves  to the  Phoenix Central  data  in our
 modeling study.
 4. Diurnal  Models  of CO
    The  focus of  this  section  is  on modeling
 the  diurnal  behavior of  CO  which  serves  to
 identify  the main  factors  affecting  CO, and
 thus  motivates   the  trend  models discussed  in
 Sections 5  and 6.
    As mentioned  in Section 2, detailed  traffic
 information,  such  as hourly  traffic   counts,
 was not  available  for the area  influencing the
 CO measurement sites.  In what  follows, we use
 the  estimated   relative traffic volume  over
 1971-1982  pertaining  to  Phoenix Central and
 the adjustment fractions for different months,
 days of a week, and hours of a day provided by BAQC (see
 Tables 2.1  and 2.2).
    We have found that analyses using
 logarithmic transformation of the variables
 seem to be more meaningful than those using
 the original variables.  The following
 notation will be employed:

   Carbon Monoxide         LCO = ln(CO + 0.25)
   Traffic                 LTR = ln(TR)
   Inverse of Wind Speed   LIW = ln(1/(WS + 0.25))
   Temperature             LTP = ln(TP)
   Relative Humidity       LRH = ln(RH)
   Delta Temperature       LΔT = ln(ΔT + c),
                           c = 2.0 for winter seasons
                           c = 4.0 for summer seasons
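
   A small sketch of constructing these transformed series, assuming the
hourly observations sit in a pandas DataFrame with columns CO, TR, WS, TP,
RH and DT (for ΔT); the column names are assumptions made for illustration,
not the original file layout.

    import numpy as np
    import pandas as pd

    def add_log_variables(df: pd.DataFrame, winter: bool) -> pd.DataFrame:
        """Append the log-transformed variables used in the diurnal models."""
        c = 2.0 if winter else 4.0            # offset for delta temperature by season
        out = df.copy()
        out["LCO"] = np.log(df["CO"] + 0.25)
        out["LTR"] = np.log(df["TR"])
        out["LIW"] = np.log(1.0 / (df["WS"] + 0.25))
        out["LTP"] = np.log(df["TP"])
        out["LRH"] = np.log(df["RH"])
        out["LDT"] = np.log(df["DT"] + c)
        return out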
 4.1 Formulation of the Models
    In general, we can write a model relating
 CO to the input traffic and meteorological
 variables as

        CO = g(TR) f(met.)

 where g(TR) is a function of the input traffic
 and f(met.) is a function of meteorological
 variables.  The function g(TR) will depend,
 among other things, on the units of
 measurement of CO and TR.  As an approximation
 it seems reasonable to suppose

        g(TR) = k TR^β1                                      (4.1)

 where k is a constant measuring the CO
 emissions and β1 measures the percentage
 change in CO due to a one percent change in TR.
    Upon studying the diurnal diagrams of CO vs
 various meteorological variables, we have
 found that the major meteorological factors
 influencing the diurnal behavior of CO are ΔT
 and the wind speed (WS), or its inverse
 IW = WS^(-1).  We may also approximate the
 function f(met.) as

        f(met.) = c (ΔT)^β2 (IW)^β3                          (4.2)

 so that we have a model of the multiplicative
 form

        CO = ω TR^β1 (ΔT)^β2 (IW)^β3                         (4.3)

 where ω = ck.  For trend assessment, we can
 make the constant k depend on time; in
 particular, we may write

        k = k0 e^(k1 T)

 where k0 is a constant, T measures the time
 unit (year) under consideration, and k1 is the
 percentage change in CO emissions per unit time.
    We are thus led to a model of the
 multiplicative form

        CO = ω TR^β1 (ΔT)^β2 (IW)^β3 u

 where u is the error term, and upon taking
 logarithms,

        LCO = β0 + β1 LTR + β2 LΔT + β3 LIW + e              (4.4)

 where β0 = ln(ω) and e = ln(u).  For diurnal
 data, let t stand for the t-th hour of the
 day, t = 1,...,24.  Because of the dynamic
 nature of the traffic and meteorological
 effects, we would expect that LCO_t could be
 related to some linear aggregates of the past
 values of these exogenous variables and also
 that the error term e_t would be serially
 correlated.  As an approximation, we consider
 the diurnal model

        LCO_t = β0 + β1 L̄TR_{t-1} + β2 L̄ΔT_{t-1} + β3 L̄IW_{t-1} + e_t,
        e_t = φ e_{t-1} + a_t                                (4.5)

 where the a_t's are white noise with zero
 means and common variance σ_a², and

        L̄TR_{t-1} = (1-φ)[LTR_{t-1} + φ LTR_{t-2} + φ² LTR_{t-3} + ...]

 and similarly for L̄ΔT_{t-1} and L̄IW_{t-1}.
    For parameter estimation, we can write (4.5)
 in the alternative form

        LCO_t = φ LCO_{t-1} + β0* + β1* LTR_{t-1} + β2* LΔT_{t-1}
                + β3* LIW_{t-1} + a_t                        (4.6)

 where βj* = βj (1-φ), j = 0,...,3.  This is in
 the form of a linear regression model and
 hence the parameters βj* and φ can be
 estimated by standard least squares routines.
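
   Written in the form (4.6), the model can be estimated as an ordinary
regression of LCO_t on its own lagged value and the lagged exogenous
variables.  A minimal least-squares sketch, using a DataFrame prepared as
in the earlier sketch (column names assumed; this is not the SCA routine
used by the authors):

    import numpy as np

    def fit_diurnal_model(df):
        """Least-squares fit of LCO_t on LCO_{t-1}, LTR_{t-1}, LDT_{t-1}, LIW_{t-1}."""
        y = df["LCO"].values[1:]
        X = np.column_stack([
            np.ones(len(y)),            # beta0*
            df["LCO"].values[:-1],      # phi (lagged dependent variable)
            df["LTR"].values[:-1],      # beta1*
            df["LDT"].values[:-1],      # beta2*
            df["LIW"].values[:-1],      # beta3*
        ])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        return coef                     # [beta0*, phi, beta1*, beta2*, beta3*]

The underlying coefficients are then recovered as βj = βj*/(1 - φ).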
    To distinguish the behavior of CO between
 the summer and the winter, we have estimated
 the model (4.6) separately for each of these
 two seasons, employing all available data over
 the years.  For each season, we let LCO_t^(i),
 LTR_t^(i), LΔT_t^(i), LIW_t^(i) represent the
 average readings of these variables at the
 t-th hour for the i-th year, and let β0*^(i)
 be a separate constant for each of the years
 considered.  Thus

        LCO_t^(i) = φ LCO_{t-1}^(i) + β0*^(i) + β1* LTR_{t-1}^(i)
                    + β2* LΔT_{t-1}^(i) + β3* LIW_{t-1}^(i) + a_t        (4.7)

 Here the traffic variable is the diurnal
 traffic pattern listed in Table 2.2, divided by
 the annual traffic volume in that year and then
 multiplied by the annual traffic volume in 1972
 for normalization.  For the summer diurnal
 model, data from 1977 to 1982 are complete.
 Note that in model (4.7) the term β0*^(i) may
 vary from year to year, but the coefficients
 β1*, β2*, and β3* are held constant over the
 years.
    For the convenience of trend assessment, we
 may impose a simple linear time function on
 the yearly constants β0*^(i), such as

        β0*^(i) = α0* + k1* i                                            (4.8)

 so that the model in (4.7) can be written as

        LCO_t^(i) = φ LCO_{t-1}^(i) + α0* + k1* i + β1* LTR_{t-1}^(i)
                    + β2* LΔT_{t-1}^(i) + β3* LIW_{t-1}^(i) + a_t        (4.9)

 Note that from (4.3), here i plays the role of
 T and k1* = k1 (1-φ).
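
   The difference between (4.7) and (4.9) is only in how the intercept is
handled: one dummy column per year versus a constant plus a linear year
term.  A small sketch of those two intercept blocks (the year-index layout
is an assumption for illustration):

    import numpy as np

    def yearly_intercept_columns(year_index, linear_trend=False):
        """Intercept regressors: one dummy per year (4.7) or constant + trend (4.9)."""
        year_index = np.asarray(year_index)
        if linear_trend:                       # model (4.9): alpha0* + k1* * i
            return np.column_stack([np.ones(year_index.size), year_index])
        levels = np.unique(year_index)         # model (4.7): separate beta0*(i) per year
        return np.column_stack([(year_index == i).astype(float) for i in levels])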
-------
        Y_t = β0 + β1 LRH_t + Σ_{i=1}^{9} α_{1i} IDS_t^(i)
              + Σ_{j=1}^{9} α_{2j} IDW_t^(j) + N_t                       (5.5)

        Y_t = β0 + β1 LRH_t + θ1 TS_t + θ2 TW_t + N_t,   t = 1, 2, ...   (5.6)

 where

  Y_t:       LCO_t, the logarithm of (CO_t + 0.25), or
             LCO_t*, the logarithm of ((CO_t + 0.25)/TR^(i)) × 208.275,
             i = 1972,...,1982;

  WN_t:      winter months (October-February) indicator, i.e.,
             WN_t = 1 if it is a winter month
                  = 0 otherwise;

  IDS_t^(i): summer months (April-August) indicator for the i-th year, i.e.,
             IDS_t^(i) = 1 if it is a summer month in the i-th year
                       = 0 otherwise,
             i = 1 for 1974, ..., i = 9 for 1982;

  IDW_t^(j): winter month indicator for the j-th year, i.e.,
             IDW_t^(j) = 1 if it is a winter month in the j-th year
                       = 0 otherwise,
             j = 1 for 1973/74, ..., j = 9 for 1981/82;

  TS_t:      summer trend,
             TS_t = 1 for all summer months in 1973
             TS_t = 2 for all summer months in 1974
               ...
             TS_t = 10 for all summer months in 1982
             TS_t = 0 otherwise;

  TW_t:      winter trend,
             TW_t = 1 for all winter months in 1972/1973
             TW_t = 2 for all winter months in 1973/1974
               ...
             TW_t = 10 for all winter months in 1981/1982
             TW_t = 0 otherwise;

 and N_t denotes a serially correlated noise term.
    Of the above two models, Model (5.5) provides
 more detailed information on the change of CO
 concentrations from year to year.  Specifically,
 α_{1i} measures the effect (percentage change)
 of CO in the i-th summer compared with that for
 the base year 1973, and α_{2j} the effect
 (percentage change) of CO in the j-th winter
 compared with that of the 1972/73 winter.
 Model (5.6), instead, assumes linear time
 trends, and θ1 and θ2 represent the percentage
 reductions of CO concentrations per year in the
 summer and winter respectively.
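
   A sketch of building the indicator and trend regressors defined above
for a monthly series, assuming a pandas DatetimeIndex; the function and
column names are illustrative only.

    import numpy as np
    import pandas as pd

    def trend_regressors(index: pd.DatetimeIndex) -> pd.DataFrame:
        """WN, TS and TW regressors as defined in Section 5.1.

        Winter = October-February (a season is labelled by the year it begins);
        summer = April-August."""
        m = np.asarray(index.month)
        y = np.asarray(index.year)
        winter = (m >= 10) | (m <= 2)
        summer = (m >= 4) & (m <= 8)
        season_year = np.where(m >= 10, y, y - 1)     # Oct-Dec start a new winter season
        tw = np.where(winter, season_year - 1971, 0)  # 1 for winter 1972/73, ..., 10 for 1981/82
        ts = np.where(summer, y - 1972, 0)            # 1 for summer 1973, ..., 10 for 1982
        return pd.DataFrame({"WN": winter.astype(int), "TW": tw, "TS": ts}, index=index)

For example, trend_regressors(pd.date_range("1972-10-01", "1982-09-01",
freq="MS")) reproduces the ranges described above; the yearly indicators
IDS and IDW follow the same pattern.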
5.2 Model Estimation
   The parameters in models (5.5) and (5.6)
are estimated from monthly averages of CO, CO*
and RH using the SCA statistical system (Liu
et al., 1983).  The estimation results for
Phoenix Central are listed in Table 5.1 for
LCO_t and Table 5.2 for LCO_t*.
5.3 Trend Analysis for Phoenix Central
   For LCO_t (without normalization by the
traffic volumes), Figure 5.1(a) gives a plot
of the estimated effects α_{2j} (compared with
the base year, winter of 1972/73) for the
winters of 1973/74 to 1981/82, and Figure
5.1(b) presents a plot of α_{1i} (compared with
the base year, summer of 1973) for the summers
of 1974 to 1982.  Similarly, Figures 5.2(a)
and (b) give the corresponding plots for LCO_t*
(after traffic normalization).  From these
figures and the estimates in Tables 5.1 and
5.2, we make the following observations:
(i) Both in the winters and in the summers,
the down trend in CO is more pronounced after
traffic normalization.  This is, of course, to
be expected since the traffic volume was
steadily increasing throughout the period
1973-1982 considered.
(ii) By comparing Figure 5.1(a) with 5.2(a),
and Figure 5.1(b) with 5.2(b), we see that the
relative changes in the yearly effects are in
fact much closer between LCO_t and LCO_t*
from 1976 onward compared with the changes
before 1976.  This is also reasonable since
the annual percentage increase in traffic
volume was much larger before 1976 - see Table
2.1.  Thus, normalization by traffic volume
has its major impact on the trend movement
during the early part of the data period
considered.
(iii) For the normalized LCO_t* data, model
(5.6) yields an estimated down trend of 7.1%
per year in the winter season over the entire
period 1972/73 to 1981/82.  The magnitude of
the percentage reduction in the summer season
is much smaller, 2.3% per year.  Also, Figures
5.2(a) and (b) show that the summer effects
exhibit considerably more variability than the
winter effects do from year to year.  In
Section 6, we shall compare these estimated
effects with the expected reduction in
emissions from EPA MOBILE3 analysis.
6. Summary of Trend Analysis  and Assessment of
the  Impact of   the  I/M Program  at  Phoenix
Central
   In  Sections  3-5,  we  have performed  several
types    of   trend   analyses   on   the   CO
concentration  levels.  Our main focus has been
on  CO  at   Phoenix  Central where   the readings
are substantially higher than those at the
other two sites, Sunnyslope and Phoenix South,
and also where the data span is the longest
and most complete of these three locations.
The main results at Phoenix Central can be
partially summarized as follows:
 (i) Over the time period April 1972 to
December 1981 studied, there is clear evidence
of a down trend in the CO concentrations.
 (ii)  The  down  trend is more  pronounced and
smoother  in   the   winter   months   (October-
February),     while     considerably    higher
variability exists  in  the  trend  movement in

-------
 the  summer  months  (April-August).   Note  that
 CO  concentration   levels  are   substantially
 higher  in  the winter.
 (iii)  For  the  winter  months  over  the period
 1972/1973  to 1981/1982:
    (a)  preliminary  analysis  of  monthly
 averages   of  CO   readings   alone  yields  an
 estimated  trend reduction of 5.6% per year
    (b)   after  allowing   for   traffic    and
 meteorological changes, our analysis of the
 monthly CO averages shows a reduction of
 about 7.1% per year.
 (iv) Over the four winters 1977/78, 1979/80,
 1980/81 and 1981/82, our diurnal model
 relating CO to traffic and meteorological
 factors yields an estimated reduction of 15%
 per year.
 6.1  Assessment   of  the  Impact  of   the  I/M
 Program
   Empirical  assessment  of  the  impact of the
Arizona  I/M  program on CO concentration levels
is   made  difficult  by  (i)  the  confounding
between   the   expected   effect   of  Federal
emission standards  and that of the I/M program
and (ii) the lack of a comparable "control"
site without the I/M program.  One approach to
this problem would be to construct a time
series model of the monthly averages of CO
with a linear time function characterizing the
general down trend in the data and an
additional term representing a possible change
in level or slope at the inception of the I/M
program.  For example, we may contemplate the
models

        CO_t = α0 + α1 t + α2 S_t^(k) + N_t                      (6.1)
   or
        CO_t = α0 + α1 t + α2 (t - k) S_t^(k) + N_t              (6.2)

where t = k signifies the inception of the I/M
program,

        S_t^(k) = 1  if t ≥ k
                = 0  if t < k,

and N_t is the noise term, which could be
autocorrelated.  In (6.1) α2 measures the
level change and in (6.2) α2 measures the
slope change associated with the I/M program.
One may then estimate the α's from the data
and test the significance of α2.  This
approach has been adopted by McCleary and
Nienstedt (1983) in their study of the Arizona
CO data, and they have concluded that the I/M
program has had no impact because the estimated
α2 from models of the form (6.1) or (6.2) is
not statistically significant.  Their results
can be largely seen from the plots of the
estimated yearly effects of LCO_t* in Figure
5.2.  Consider, for example, the winter
situation.  Clearly there is little evidence
from these yearly effects to support either a
change of level or slope beginning in 1977/78.
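
   A minimal sketch of the level-change model (6.1), fitted by ordinary
least squares on a synthetic series purely for illustration; a full
analysis would also model the autocorrelated noise N_t, and the
slope-change form (6.2) only replaces the step regressor with
(t - k) S_t^(k).

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 120, 60                       # 120 months, intervention at month 60 (illustrative)
    t = np.arange(n)
    step = (t >= k).astype(float)        # S_t^(k): 0 before the intervention, 1 after
    y = 3.0 - 0.004 * t - 0.05 * step + rng.normal(0, 0.05, n)   # synthetic series

    X = np.column_stack([np.ones(n), t, step])
    (a0, a1, a2), *_ = np.linalg.lstsq(X, y, rcond=None)
    print(f"estimated level change a2 = {a2:.3f}")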
   In the absence of any information about the
magnitude  of  expected effects of either  the
Federal   Emission  standard  and/or   the  I/M
program,  and  assuming that the   linear  time
functions in (6.1) and (6.2) are suitable
approximations  to the  shapes   of  the  expected
impacts   of   these   two   types   of   control
measures,   the   above   approach  would   be
 useful.    On  the  other hand,  when  accurate
 information   about   the   magnitude  of   the
 expected    effects    is   available,    other
 formulations of the problem are possible.
    Table 6.1 lists the CO emission factors
 from  1972 to  1982 with and  without the  I/M
 program   for   the  central  area   of  Phoenix
 supplied  to  us  by the  DHS.    These  factors
 were derived from EPA MOBILE3 analysis with
 adjustment for vehicle tampering/misfueling
 and varying speeds over the years.  The yearly
 factors  correspond to estimates for January 1
 of  the  years.    Assuming  that these  factors
 accurately represent   the  expected  effects  of
 the   Federal  emission   standards  and   the
 expected  impact   of the  I/M  program,  we  can
 then employ  the  CO readings to assess  to what
 extent trends  in  the  emission  factors with and
 without  I/M  are  compatible with evidence  from
 the data.   For   this  analysis, we shall  use
 monthly   averages  of   the  CO   readings   as
 discussed in Section 5.
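
   One simple way to make such a comparison concrete, sketched here only as
an illustration and not as the formal analysis developed below, is to
regress the estimated yearly effects on the corresponding logarithms of the
emission factors and ask whether a slope of one is consistent with the data:

    import numpy as np

    def unit_slope_check(yearly_effects, log_emission_factors):
        """Slope (and its s.e.) of yearly effects on log emission factors.

        A slope near 1 says the year-to-year changes in the effects track the
        changes implied by the emission factors."""
        x = np.asarray(log_emission_factors, dtype=float)
        y = np.asarray(yearly_effects, dtype=float)
        x = x - x.mean()
        y = y - y.mean()
        slope = np.sum(x * y) / np.sum(x * x)
        resid = y - slope * x
        se = np.sqrt(np.sum(resid ** 2) / (len(x) - 2) / np.sum(x * x))
        return slope, se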
    Figure 6.1(a) shows plots of the logarithms
 of the emission factors over time, where a dot
 "•" corresponds to the situation without I/M
 and a cross "x" to the situation with I/M
 effects included.  The points in this figure
 then represent the percentage changes in
 emissions from year to year with and without
 the effects of the I/M program.  Superimposed
 in the same figure by the circles "o" are the
 yearly effects of LCO_t* for the winter months
 estimated from model (5.5) and shown earlier
 in Figure 5.2(a).  It is clear that the trends
 in the estimated yearly effects are in
 reasonable agreement with the emission factors
 with the I/M program.
    In a similar way, Figure 6.1(b) shows
 logarithms of the emission factors together
 with the yearly effects of LCO_t* for the
 summer seasons shown earlier in Figure
 5.2(b).  In this case, the estimated yearly
 effects  are in better agreement with  emission
 factors  without the I/M  program.   However,  it
 should again  be noted  that CO readings in  the
 winter months  are substantially   higher  and
 that  the fluctuations  in  the  summer  yearly
 effects are much  larger.
    To formalize the analysis, we let f_I(t)
 be a function whose values are given by the
 logarithms of the emission factors with the
 I/M program (i.e., the dots from 1972 to 1977
 and the crosses afterwards), and f_O(t) the
 corresponding function for the emission factors
 without the I/M program.  With

        δ_S(t) = 1  if t is in the summer months
               = 0  otherwise

        δ_W(t) = 1  if t is in the winter months
               = 0  otherwise,

 the model (6.3) aims at assessing to what
 extent θ_SI = 1 and θ_WI = 1 are supported by
 the data (i.e., the "with I/M" hypothesis),
 whilst (6.4) is intended for θ_SO = 1 and
 θ_WO = 1 (the "without I/M" hypothesis).
 Estimation of (6.3) and (6.4) then yields
 estimates of θ_SI and θ_WI, and of θ_SO and
 θ_WO, respectively.
-------
                              Table 2.1
          Relative Traffic Volume Per Day in Phoenix Area

        Year     Phoenix Central     Phoenix South
        70          131.000             17.733
        71          151.500             18.467
        72          168.400             19.200
        73*         181.900             19.933
        74          193.000             20.667
        75*         202.200             21.400
        76          208.300             22.133
        77*         212.000             22.867
        78          214.800             23.600
        79*         216.400             24.333
        80          217.900             25.066
        81          218.700             25.799
        82*         219.300             26.533
        83          219.700             27.266

        (*): interpolated data
                              Table 2.2
             Adjustment Fractions for Months, Days and Hours

   Month adjustments:
     Jan. 1.02   Feb. 1.04   Mar. 1.03   Apr. 1.02   May   .97   June  .96
     July  .94   Aug.  .96   Sep.  .98   Oct. 1.00   Nov. 1.02   Dec. 1.05

   Day-of-week adjustments:
     Mon.  .96   Tues.  .96   Weds.  .98   Thurs. 1.00   Fri. 1.09

   Hour-of-weekday adjustments (fraction of the daily volume by hour):
     Hr.  1  .011     Hr.  7  .035     Hr. 13  .060     Hr. 19  .056
     Hr.  2  .007     Hr.  8  .069     Hr. 14  .059     Hr. 20  .045
     Hr.  3  .004     Hr.  9  .057     Hr. 15  .064     Hr. 21  .037
     Hr.  4  .004     Hr. 10  .051     Hr. 16  .072     Hr. 22  .034
     Hr.  5  .005     Hr. 11  .054     Hr. 17  .081     Hr. 23  .028
     Hr.  6  .012     Hr. 12  .059     Hr. 18  .076     Hr. 24  .020

-------
                                Figure 3.1
                 Plot of Monthly CO Means at Phoenix Central

                                Figure 3.2
                    Plot of Monthly CO Means at Sunnyslope

                                Figure 3.3
                  Plot of Monthly CO Means at Phoenix South

-------
                              Table 4.1
       Estimation Results for the Diurnal Models (4.7) and (4.9)

 (a) Winter Seasons (1977/78 - 1981/82)

   Model (4.7)                                Model (4.9)
   β0*(1) (1977/78)   2.260 (.369)            α0*           2.143 (.357)
   β0*(2) (1979/80)   2.161 (.344)            k1*           -.088 (.025)
   β0*(3) (1980/81)   1.958 (.318)            β1*            .435 (.057)
   β0*(4) (1981/82)   1.867 (.275)            β2*            .258 (.028)
   β1*                 .408 (.058)            β3*            .401 (.040)
   β2*                 .266 (.028)            φ              .553 (.138)
   β3*                 .407 (.040)            σa             .196
   φ                   .641 (.157)
   σa                  .194

 (b) Summer Seasons (1977 - 1982)

   Model (4.7)                                Model (4.9)
   β0*(1) (1977)       .284 (.389)            α0*            .030 (.375)
   β0*(2) (1978)       .123 (.374)            k1*           -.002 (.024)
   β0*(3) (1979)       .292 (.373)            β1*            .530 (.064)
   β0*(4) (1980)       .197 (.343)            β2*            .102 (.034)
   β0*(5) (1981)       .138 (.297)            β3*            .357 (.050)
   β0*(6) (1982)       .183 (.288)            φ              .001 (.161)
   β1*                 .467 (.069)            σa             .252
   β2*                 .117 (.035)
   β3*                 .386 (.051)
   φ                   .086 (.148)
   σa                  .250
                              Figure 4.1(a)
   Observed (A) vs. predicted (B) hourly CO averages at Phoenix Central
   (winter season); CO plotted against hour of the day.

                              Figure 4.1(b)
   Observed (A) vs. predicted (B) hourly CO averages at Phoenix Central
   (summer season).
-------
                              Table 5.1
      Estimation Results for the Time Series Models (5.5) and (5.6)
                       Y_t = LCO_t (10/72 - 9/82)

   Model (5.5)                                Model (5.6)
   β0                2.997 (.367)             β0            2.850 (.358)
   β1                 .447 (.159)             β1             .468 (.139)
   Summer 74  α_11   -.033 (.123)             θ1 (summer)   -.005 (.011)
   Summer 75  α_12   -.000 (.123)             θ2 (winter)   -.048 (.013)
   Summer 76  α_13   -.364 (.123)             γ             -.580 (.098)
   Summer 77  α_14    .045 (.123)             ω              .705 (.090)
   Summer 78  α_15   -.095 (.123)
   Summer 79  α_16    .143 (.123)
   Summer 80  α_17    .006 (.123)
   Summer 81  α_18   -.177 (.123)
   Summer 82  α_19   -.120 (.123)
   W 73/74    α_21   -.206 (.152)
   W 74/75    α_22   -.218 (.147)
   W 75/76    α_23   -.113 (.150)
   W 76/77    α_24   -.287 (.150)
   W 77/78    α_25   -.282 (.147)
   W 78/79    α_26   -.354 (.146)
   W 79/80    α_27   -.382 (.148)
   W 80/81    α_28   -.577 (.150)
   W 81/82    α_29   -.418 (.148)
   γ                 -.611 (.098)
   ω                  .721 (.083)

                              Table 5.2
      Estimation Results for the Time Series Models (5.5) and (5.6)
                       Y_t = LCO_t* (10/72 - 9/82)

   Model (5.5)                                Model (5.6)
   β0                3.140 (.366)             β0            2.927 (.360)
   β1                 .508 (.159)             β1             .525 (.140)
   Summer 74  α_11   -.092 (.123)             θ1 (summer)   -.023 (.011)
   Summer 75  α_12   -.106 (.123)             θ2 (winter)   -.071 (.013)
   Summer 76  α_13   -.499 (.123)             γ             -.574 (.099)
   Summer 77  α_14   -.109 (.123)             ω              .655 (.090)
   Summer 78  α_15   -.261 (.123)
   Summer 79  α_16   -.031 (.123)
   Summer 80  α_17   -.175 (.123)
   Summer 81  α_18   -.361 (.123)
   Summer 82  α_19   -.307 (.123)
   W 73/74    α_21   -.277 (.152)
   W 74/75    α_22   -.342 (.146)
   W 75/76    α_23   -.278 (.149)
   W 76/77    α_24   -.477 (.150)
   W 77/78    α_25   -.487 (.147)
   W 78/79    α_26   -.570 (.145)
   W 79/80    α_27   -.606 (.148)
   W 80/81    α_28   -.807 (.150)
   W 81/82    α_29   -.650 (.147)
   γ                 -.614 (.098)
   ω                  .676 (.083)
                              Figure 5.1(a)
   Plot of estimated yearly effects of LCO_t from model (5.5) (winter season).

                              Figure 5.1(b)
   Plot of estimated yearly effects of LCO_t from model (5.5) (summer season).

                              Figure 5.2(a)
   Plot of estimated yearly effects of LCO_t* from model (5.5) (winter season).

                              Figure 5.2(b)
   Plot of estimated yearly effects of LCO_t* from model (5.5) (summer season).

   (In each figure the points are the estimated effects, shown with ±1 s.e. bars.)
-------
                              Table 6.1
   CO Emission Factors (g/mile) Derived From MOBILE3 Analysis
      (Phoenix Central, 7-10 am and 7-10 pm in Oct.-Feb.)

        Year &
        month        No I/M      With I/M
        72/1          88.08
        73/1          89.06
        74/1          91.25
        75/1          91.43
        76/1          90.01
        77/1          88.96
        78/1          87.35        81.83
        79/1          85.51        72.32
        80/1          81.75        63.31
        81/1          78.23        59.89
        82/1          73.94        56.34
        83/1          69.56        52.78

-------
                              Figure 6.1(a)
   Plot of CO emission factors with yearly effects in the winter season
   (logarithm of the emission factors plotted against year, 73/74 - 81/82).

          •  Emission factors without I/M (based on MOBILE3)
          x  Emission factors with I/M (based on MOBILE3)
          o  Winter yearly effect based on Model (5.5)

-------
                              Figure 6.1(b)
   Plot of CO emission factors with yearly effects in the summer season.
-------
                               SAMPLING DESIGN:  SOME VERY PRACTICAL CONSIDERATIONS
                                                 D. E. Splitstone
                             IT Corporation, 2790 Mosside Blvd., Monroeville, PA  15146
INTRODUCTION
   When Walter Liggett extended the
invitation to participate in this conference,
I glibly responded in the affirmative and in
a cavalier manner conceived of a topic title
that, I perceived, would not get me in too
much trouble.  I will confess that the possibility
of following George Tiao gave me some
pause.  However, I am about to provide some
words of what I hope will be wisdom regarding
very practical considerations in sampling
design for the gathering of information
during studies of the environment.
   Because my current position of employment
requires a concentration on the problems
posed by hazardous materials and wastes, this
paper will identify a few of the more
intriguing statistical issues associated with
hazardous waste investigations.  As my
approach to these issues is conditioned on my
subscription to Donald Marquardt's wider view
of the statistical profession, permit me to
quote his definition of that perspective in
order to provide a frame of reference.
   The wider view is that statisticians are
systems guides for tough, fuzzy issues
wherever the collection and interpretation of
data should be involved.  The wider view
expects statisticians to provide the
following:

   o  Full service guidance and
      participation in problem diagnosis
      and solution

   o  Services for problems in any area of
      application

   o  Services at any level of detail or
      abstraction.

From this perspective, individual statistical
methods are viewed as tools or component
elements in an overall systems approach
(Marquardt, 1987).
   Philosophically, the design of any
sampling scheme is easy.  Paraphrasing
William Lurie's classic article (Lurie, 1958)
one only needs to answer the following
questions:

   o  What does one know about the site
      under investigation?

   o  What does one wish to determine
      regarding the site?

   o  And, how certain does one wish to be
      in the conclusions?

In addition, one may add the admonition of
Oscar Kempthorne (ca. 1966) that an objective
of any experimental design is to make the
analysis of the resulting data easy.
   The practicality of meeting all of
these criteria, however, is quite another
matter.
EXAMPLE 1 - THE CONTAMINATED BUILDING
   Consider as an illustrative example the
situation presented by a building thought to
be contaminated by polychlorinated biphenyls
(PCBs), (Ubinger, 1987).  The design of a
sampling program for such a situation is no
trivial task.  It certainly must start with a
clear statement of the objectives which may
be multifaceted.  Consider the following
possibilities:

   o  Determination of the existence of
      contamination (a preliminary site
      investigation)

   o  An assessment of the risk of
      exposure

   o  Tracking the spread of contamination
      and mapping its extent

   o  Identification of sources of
      contamination

   o  Post cleanup confirmatory sampling,
      and

   o  Certification that cleanup has been
      achieved (closure sampling).

There is no single sampling strategy that
will apply to all types of problems and all
of the above objectives.  Clearly, the
objective of any sampling strategy is to
provide the most cost-effective approach to
meeting the study objectives while
maintaining a high degree of confidence,
precision and accuracy regarding the study's
conclusions.
   Once the objectives of the investigation
are clearly defined, the approach to the
development of a sampling strategy must
consider the nature of the source of
contamination.  The two major causes of PCB
contamination are electrical transformer
fires and historical usage of PCB-containing
lubricants.  Each results in very different
patterns of potential contamination which may
affect the design of data collection.
Typical patterns are as follows:

   Fires/explosions

   o  Fluid loss - Pattern radiating from
      the source; largely confined to
      horizontal surfaces at or below the
      source

   o  Smoke borne - Spread through
      ventilation system, decreasing
      contamination as distance from the
      source increases; horizontal and
      vertical surfaces at and above the
      level of the source
                                                  15

-------
   o  Tracking by human activity - Floors
      and high-contact surfaces; may see
      buildup in high traffic areas

   Historical Usage

   o  Oil creep - Generally on horizontal
      surfaces in proximity to the source;
      may be localized buildup in less
      accessible areas

   o  Reclamation and storage rooms and
      facilities

   o  Tracking - floor and other high-
      contact areas

   o  Other - permeation of floor into
      underlying soil; handling areas -
      loading docks, trash areas.

Let us assume for the moment that the initial
objective of the investigation is to define
the extent of contamination in anticipation
of building cleanup or remediation.
Traditionally, the sampling scheme employed
in such situations has been to engage in
directed or judgemental sampling.  That is,
sample those areas which appear through
evidence of oily stain to be contaminated,
and occasionally sample areas which do not
visually appear to be contaminated.  The
statistical effect of directed sampling is
certainly to increase the precision of the
estimated magnitude of contamination in those
areas which visually appear contaminated, but
is of little assistance in defining with
confidence the spatial extent of
contamination.  Nor does directed sampling
provide sufficient information on the lack  of
contamination, or the cleanliness of areas
which appear to be uncontaminated so that
unnecessary remediation can be avoided with
confidence (Ubinger, 1987).
   Though I will return to the issue of
making decisions regarding cleanliness later,
let me simply state that a decision to
require remediation can be made based on only
one sample.  The decision not to remediate,
however, requires much more evidence.  Thus,
if an objective is to minimize unnecessary
remediation, equal or perhaps greater
attention in sampling design should be given
to apparently "clean" areas as was accorded
apparently contaminated areas.
   Ubinger (1987) has compared the results  of
statistically designed sampling and directed
sampling and found the two methods to be
equally effective in detecting areas of high
contamination with an obvious advantage to
the statistically designed sampling program
of providing unbiased results.  In addition,
the statistically designed program was more
effective in supporting assessment of the
spatial extent of contamination and in
identifying areas which are uncontaminated.
   It appears obvious that there are
advantages to statistically designed sampling
plans.  But what type of statistical design
does one employ to do an effective job?  The
answer to that question can only be that it
depends.  It depends on the specific
objectives of the investigation and what is
known about the site and potential source of
contamination.  One generally starts by
laying a grid system over the area of
interest.  The grid spacing is determined in
large measure from decisions regarding the
areal size and geometry of the minimum
contaminated area necessary to detect and the
degree of risk deemed acceptable of not being
able to detect that area (Zirschky and
Gilbert, 1984).  These decisions regarding
target size and acceptable risk are not
inherently statistical decisions.  They are,
appropriately, the policy decisions of those
who must balance cost and risk.  There may
not be a single decision regarding target
size and the risk of missing that target
applicable to the whole of the area or
building.  For instance, if the objective is
to assess the health risk to building
occupants due to PCB exposure, areas with
high human contact potential are of greater
importance than areas of low contact
potential.  Thus, it may be perfectly
justifiable to engage in less sampling in low
contact areas than in high contact areas.
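
   One way to make the link between grid spacing, target size, and the risk
of missing the target concrete is a small Monte Carlo check of how often a
square grid of a given spacing would fail to intersect a circular hot spot
of a given radius.  This is a simplified sketch under those assumptions,
not the Zirschky and Gilbert procedure itself:

    import numpy as np

    def miss_probability(grid_spacing, hotspot_radius, trials=100_000, seed=1):
        """Probability that no node of a square grid falls inside a circular hot spot.

        The hot-spot centre is placed uniformly at random within one grid cell;
        by symmetry this gives the overall miss probability."""
        rng = np.random.default_rng(seed)
        centres = rng.uniform(0.0, grid_spacing, size=(trials, 2))
        # Distance from each centre to the nearest of the four surrounding grid nodes.
        nearest = np.minimum(centres, grid_spacing - centres)
        dist = np.sqrt(nearest[:, 0] ** 2 + nearest[:, 1] ** 2)
        return float(np.mean(dist > hotspot_radius))

    print(miss_probability(10.0, 4.0))   # e.g. a 10-ft grid and a 4-ft-radius hot spot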
   It would certainly appear that
stratification is an important component in
the sampling design.  Intelligent choice of
strata can only be accomplished through
thorough knowledge of the physical plant of
the building under investigation, the cause of
the alleged contamination, and traffic and
work patterns within the facility.  The
latter is less important if the objective of
the study is solely to map the spatial extent
of contamination; however, if health risk is
to be an important component in the decision
making process, a probability quota sampling
design may be employed.  Only if sampling
locations are chosen with probability
proportional to frequency of contact can an
overall estimate of exposure be formed
without application of external weights.
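
   The point about selection probabilities can be seen in a small
simulation: when locations are drawn with probability proportional to
contact frequency, the plain mean of the sampled concentrations estimates
the contact-weighted average exposure without any further weighting.  The
numbers below are invented for illustration.

    import numpy as np

    rng = np.random.default_rng(2)
    contact = np.array([50.0, 30.0, 15.0, 5.0])   # relative contact frequency by area
    conc = np.array([2.0, 10.0, 25.0, 80.0])      # PCB concentration by area (illustrative)

    p = contact / contact.sum()                   # selection probability proportional to contact
    sample = rng.choice(len(conc), size=200, p=p)

    print("contact-weighted exposure:", np.sum(p * conc))
    print("unweighted mean of PPS sample:", conc[sample].mean())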
   In summary, the design of sampling schemes
for the allegedly contaminated building are
among the most complex to formulate.  In
order to define the problem statistically, a
number of definitions are required:

   o  What is to be measured; what is a
      suitable limit of detection?

   o  What are the criteria for cleanup
      decisions - are these criteria to be
      based on risk assessment?

   o  What are the statistical quantities
      that define a decision rule for when
      to remove material (the specific
      value may be a result of a site-
       specific risk assessment), and the
      quantification of the risk of
      failing to remediate?

   o  What field sampling plan is to be
      used to obtain representative PCB
      concentration data?
                                                  16

-------
   o  What are the action guidelines?

   o  If the decision rule says "clean,"
      must the whole room be cleaned?

   o  If only hot spots are found, might
      they alone be cleaned?

   o  Is there a logical minimum to the
      size of a "remediation unit?"

Obviously, most of the answers to these
questions are site specific and, therefore,
defy general codification.  The answers also
require the direct interaction of the
statistician with other members of the study
team, the interchange of information among
the expert in building ventilation, the
industrial engineer, the expert in building
remediation, and the risk assessment
specialist.

EXAMPLE 2 - THE IMPOUNDMENT
   Compared to the problems associated with
the sampling of building interiors, the
sampling design for a waste impoundment might
appear trivial.  One does not have to reflect
on the sampling of walls, ceilings, air
handling systems and how to deal with
lighting fixtures and machinery.  However,
first impressions are often wrong, and though
certainly easier to conceptualize, the
sampling design for a waste impoundment is
not trivial.  Returning to the central theme
of this paper, one must start by asking
questions.

   o  Just what does one wish to find out
      about the impoundment and its
      contents?

   o  What types of waste might the
      impoundment contain and at what
      levels of concentration are they
      expected to be found?

   o  How did the waste get there?  By
      pipe, from a process reasonably
      continuous in time or by truck from
      some batch process?

   o  Where are the discharge points to
      the impoundment?

Answers to questions such as these provide
the clues to potential spatial stratification
for which the sampling design must account.
   Generally, whether the waste within the
impoundment is spatially homogeneous or
heterogeneous is of concern regardless of
what will be done with it later.  This
applies largely to sludges, as any liquid
layer will likely be well mixed at the moment
of treatment and disposal, except perhaps in
very large impoundments.  Considering the
sludge alone, prior to the design of a
sampling strategy, the potential for
stabilization must be considered.  If no
stabilization is to occur, then the sampling
design must focus on the chemical
characterization of the sludge in situ
subject to the conditions of closure or
removal.  If the sludge is to be removed, the
process of removal may well suggest natural
sampling units, e.g., a volume associated
with a backhoe or clam shell bucket or
perhaps a truck load.  The delineation of
subareas containing hazardous or nonhazardous
material may be useful in reducing the cost
of treatment and/or disposal.  If
stabilization is to occur, the sludge may be
homogenized vertically by a barge.
   Popular guidance currently provided for
the construction of sampling plans for the
characterization of potentially hazardous
waste (U.S. EPA, SW-846) contains a
philosophy which I believe is misleading and
statistically naive.  Consider the following
statement from SW-846.

   The regulations pertaining to the
   management of hazardous wastes contain
   three references regarding the sampling
   of solid wastes for analytical
   properties.  The first reference, which
   occurs throughout the regulations,
   requires that representative samples of
   waste be collected and defines
   representative samples as exhibiting
   average properties of the whole waste.

This statement presumes two things which are
almost never true: (1) that the waste is
homogeneous, and (2) that environmental
protection is served by considering what
happens on the average.  It has been my
experience that hazardous waste is almost
always heterogeneous in space and/or time
except under very special circumstances.  It
certainly is appropriate to require that
samples be representative of the portion of
the waste from which they were collected, but
to define "...representative samples as
exhibiting average properties of the whole
waste" expresses a view of the world that
simply is not true.  Homogeneity is something
which must be determined.  Appropriate
situation specific sampling designs can
effectively be used for this purpose.  How
effective and efficient the sampling design
is for characterizing a hazardous waste
population depends in large measure upon the
amount of prior knowledge available about the
specific situation and a statement as to the
objective uses of the data collected.
   The fact that what happens on the average
rarely serves the purpose of environmental
protection seems obvious to me, and I will
later attempt to demonstrate this very
point.  I submit that it is not what happens
on the average but the consequences of the
extremes that are environmentally
interesting.  As will be illustrated later,
when discussing the example given in SW-846,
it gives one little comfort to know that in
expectation the average of nine samples of
sludge will be less than the specified
regulatory threshold 80 percent of the time
when a full 20 percent of the individual
samples will exceed that threshold.
   The second paragraph of Section 1.1.1
indicates that the issue addressed by
representative samples is that of sampling
accuracy.  I submit that this is not the  case
under the definition of representative
samples stated.  Further, this may not be the
case for the subpopulation of waste from
which the sample was taken unless various
analytical methods and interlaboratory
comparisons are employed.

   The second reference, which pertains
   just to petitions to exclude wastes
   from being listed as hazardous wastes,
   specifies that enough samples (but in
   no case less than four samples) be
   collected over a period of time
   sufficient to represent the variability
   of the wastes.

The variability of the waste is certainly
important to assess; however, rarely is the
variability exhibited among samples taken
during even a well designed investigation due
solely to random variations in the waste.
Much of the variability observed can be
attributed to factors in space and/or time.
Ignoring such factors can have significant
consequences, not only in terms of the degree
of environmental protection, but also in
terms of unwarranted expense and the
sensibility of statistical tests employed to
evaluate the waste.  Sampling designs which
permit identification of these factors allow
for the  possibility of increased
environmental protection at lower cost.
Unfortunately, the  identification of
appropriate subpopulations may require the
application of statistical techniques in
light of specific knowledge of the situation
in a manner which cannot be codified.  In any
event, the number and placement of samples in
space and/or  time  for hazardous waste
investigations has many more aspects than one
of sampling precision once one conceives of
the problem more realistically than that of
characterizing average  properties.
   Consider the  following illustration of the
consequences  of basing  the decision  as to
whether  a waste  is  hazardous or not hazardous
on only  average  properties.  The  figure below
   [Figure: distribution of single samples and of
   sample means, barium sludge example from SW-846
   (concentration versus cumulative percent)]
is generated from data presented as the
Hypothetical Example given on Page 14 of
SW-846 and used to illustrate sample size
selection under different strategies.  In
construction of the figure, I have assumed
that the normal distribution is an adequate
model and that the true mean (μ) is equal to
93.56, and the true standard deviation is
equal to 7.75.  These values are the
estimates of their respective parameters at
the moment that a decision is made that
barium is not present in the sludge at a
hazardous level (see Equation 6 on Page 17).
   The salient points to be made using this
figure are simply stated in the following:
(1) Though a conclusion was reached that
barium was not present at a hazardous level
in the sludge based on inferences regarding
the mean of nine samples, a full 20 percent
of the sludge would be expected to be above
the hazardous level.  And, (2)  it would
appear that if the true mean concentration  is
below the regulatory threshold, a decision as
to whether a constituent is or is not present
at a hazardous level is solely a function of the
number of samples taken using the logic
presented.
   In point 1 above, I have used the  term
expected in the statistical sense.   If one
were to demand a greater degree of confidence
in estimating the maximum percentage  above
the regulatory threshold, then  a tolerance
limit rather than a confidence  limit  should
be used.  The result would be closer  to
30 percent of the sludge in the hazardous
concentration range with 95 percent
confidence.  Guttman, 1970, differentiates
these approaches as "Beta content" versus
"Beta expectation" tolerance regions.
   If one  focuses on deciding whether or  not
the  sludge contents of an  impoundment are
hazardous, given a definition of "hazardous,"
a specification of risk  to be tolerated  in
terms of the  fraction of sludge permitted  to
exceed the defined threshold, and a  specified
desired degree of confidence  to be  associated
with the decision, then determination of  the
number of  samples required would  appear
straightforward.  However, many would attempt
such a determination by  using techniques
which assume  that the  statistical
distribution  of chemical concentrations  is
normal.  Though upon occasion,  one may be
willing  to  assume  that variations  introduced
by measurement  and  sampling may arise from  a
normal distribution,  in  general the
assumption of a normal distribution  of
hazardous waste measurements over space
and/or time cannot be supported.  Morse, et
al. (1986) suggest that a nonparametric
approach to the problem be taken.  In such an
approach, the maximum proportion of the
 population exceeding  the ultimate  or
 penultimate observation  with specified  degree
of confidence is  a  function  of  the number of
 samples  taken.   For instance, one is
 95 percent confident  that  at most  five
 percent  of all samples will  have  a
 concentration greater  than the  maximum
 concentration of a set of 59 random samples.
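   The 59-sample figure follows from a simple
order-statistic argument, sketched below.  The
chance that the maximum of n independent samples
falls below the 95th percentile of the population
is 0.95 raised to the power n; choosing n so that
this chance is no more than 5 percent gives the
stated confidence.

      import math

      def samples_for_tolerance(coverage=0.95, confidence=0.95):
          """Smallest n such that, with the stated confidence, at most a
          fraction (1 - coverage) of the population exceeds the maximum
          of n random samples: requires coverage**n <= 1 - confidence."""
          return math.ceil(math.log(1.0 - confidence) / math.log(coverage))

      print(samples_for_tolerance(0.95, 0.95))   # 59, as cited above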

   The use of order statistics and the rank
orders of hazardous waste chemical
concentration data are quite useful in
hazardous waste investigations.  However,
that is a topic for an in-depth presentation
of data analysis.  Such data analysis
considerations do indeed have an effect on
the sampling design.
   Returning to the problem of making a
hazardous/nonhazardous declaration regarding
the sludge in an impoundment, a first step
might be to consider it as a target detection
problem.  The size and geometry of the target
are determined by one's knowledge of how the
waste entered the pond.  Entry in the form of
a slurry through a pipe at one end of the
impoundment might lead one to consider a
rather large elliptical target, for
instance.  A grid may then be superimposed on
the pond surface, which will give a high
probability of detecting the predefined
target if samples are collected at, say, grid
nodes.  Quite likely, the cost of analyzing
all of the collected samples will be quite
high.  However, if the concentration decision
level is sufficiently large, sample
compositing may be effectively introduced to
keep the costs within reason.  Of course, the
assumption here is that the analytical cost
is large with respect to the sampling cost.
This assumption is not always true.
   Consider for example an impoundment of
nominal dimensions of 200 by 150 feet.
Superimposing a square sampling grid with
25-foot grid spacing gives one a probability
of 80 percent of detecting a circular area of
contamination of 25 feet diameter with 48
samples.   If one is concerned about
contamination with barium, and the
concentration level for a declaration of a
hazard is 100 ppm, then the possibility of
compositing samples in sets of four will
reduce the number of analyses to 12 without
sacrifice to environmental protection,
provided that the analytical method detection
limit is less than 25 ppm.  One must
consider, however, the potential losses
associated with an increase in Type 1 error
should the sum of the concentrations of the
four individual samples, now unknown, exceed
100 ppm when each sample concentration is in
fact less than 100 ppm.  This increased risk
of unnecessary remediation may be reduced if
sample holding times are such that aliquots
of the individual samples can be preserved
for later analysis.   If such a two-stage
analytical program is not possible, then some
tough risk balancing decisions must be made
prior to the design of the sampling program.
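   The compositing logic can be stated as a
simple decision rule, sketched below in Python.
A composite of four equal aliquots measures
(approximately) the mean of its members, so
flagging any composite at or above 100/4 = 25 ppm
ensures that no individual exceedance of 100 ppm
can be hidden; the second hypothetical case shows
the added Type 1 risk discussed above, in which a
composite is flagged although every member is
below 100 ppm.

      ACTION_LEVEL = 100.0     # ppm, the barium decision level used above
      COMPOSITE_SIZE = 4

      def composite_flagged(member_ppm):
          """Flag a composite whenever an individual exceedance of the
          action level could be hidden within it."""
          composite_value = sum(member_ppm) / len(member_ppm)
          return composite_value >= ACTION_LEVEL / COMPOSITE_SIZE

      # One member above 100 ppm: always flagged, so protection is kept.
      print(composite_flagged([120.0, 5.0, 3.0, 2.0]))     # True
      # Every member below 100 ppm, yet the composite is still flagged:
      # the increased Type 1 risk, resolved only by re-analyzing aliquots.
      print(composite_flagged([60.0, 40.0, 30.0, 20.0]))   # True
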
   As a final comment regarding the waste
impoundment, if stabilization is to take
place, a two-stage sampling program may be
the most cost effective.  The purpose of the
first stage of sampling may be solely to
determine the gross trends in the
physical/chemical properties of the sludge.
Interest is primarily in determining those
factors which may affect stabilization.   At
this stage,  only a few samples are required
to investigate gross trends,  say only nine
 for the  impoundment described above.
 Multiple regression techniques may be used to
 investigate the existence of spatial trends
 important to cost-effective application of
 stabilizing treatments.  Once stabilization
 has been accomplished, the objective of
 additional sampling would be to determine the
 requirements of sludge disposal or
 impoundment closure.  As the act of
stabilization almost invariably involves
mixing, the properties of the stabilized
sludge may be quite different from those of
the unstabilized material.

 EXAMPLE 3 - THE LANDFILL AND THE INDUSTRIAL
 OR RESIDENTIAL SITE
   Much of the above discussion applies to
the investigation of landfills or the
potentially contaminated industrial site.  A
very important difference between this type
of site and the waste impoundment is the
greater potential for lack of spatial
homogeneity.  Obviously, in the instance of a
modern, well-run and documented landfill, the
sampling plan should be designed to account
for the known stratification of
material disposition.  However, consider the
situation of the "Superfund" site.  Little
information is generally available as to the
depositional location of potentially
hazardous material.  Certainly a continuum
exists in the degree of information regarding
the initial deposition, potential transport
and nature of the waste, ranging from well
kept documentation, to reliance on memory, to
nothing at all.  The same is true of the
industrial site where contamination may have
occurred at some past time through not only
intentional disposal but also accidental
spills or process leaks.  One must also be cognizant
of construction activities and their
potential for transportation of contaminated
soil.  The more of this information that is
available, the better the sampling plan.
   As is the case with the waste impoundment,
it seems reasonable to begin the design of a
sampling plan by considering the problem as a
target detection problem.  Once the possible
mechanisms for hazardous material deposition
have been identified, the size of the target
to be detected and the degree of risk to be
tolerated in not detecting the target must be
determined.  The latter is purely a policy
decision, but the former may be assessed in
more technical terms.  Suppose it is known
that organic solvents were disposed of in 55-
gallon drums.  Given some information
regarding the soil permeability, the
transport coefficient of the solvent and the
approximate length of time a drum has been in
the ground, it is possible to estimate the
spatial extent of a potential plume arising
from a leaking drum.  However, the task is
more complex in the case of dioxin which is
generally not mobile in soil.  In this case,
very small isolated deposits of dioxin may
exist.  This is particularly true on the
industrial chemical plant site where
deposition may have been a result of
explosion, process leakage or plant expansion
activities.
   If the analytical procedure to be employed
is inexpensive, as with a walk-over radiation
survey using a geiger counter or a portable
organic vapor analyzer, one may simply lay
out the required grid network for the desired
target size and risk of not detecting the
target.  Such situations are rare, however.
Most frequently the statistician's job is to
effect a compromise between the competing
objectives of determining the potential of
hazard and minimizing the cost of doing so.
   As indicated in the above discussion of
the contaminated building, it takes only one
sample to indicate the existence of
contamination.  It takes several more samples,
however, to determine the extent of
contamination.  Often contamination is known to
exist at a site through a prior sampling
event.  Possibly the location of a
contaminated area is even known quite
precisely.  The objective of future study,
and hence the sampling plan, is to determine
the extent of contamination and provide a
rational basis for the making of remedial
decisions.  If one is willing to make the
assumption of the stationarity of spatial
similarity of contaminant concentrations,
then the techniques used in geostatistics
offer some exciting possibilities.  This is
particularly true of nonparametric
geostatistics (Flatman, 1984; Isaaks, 1984;
Journel, 1984-a, 1984-b, 1987).  In such
instances one may be willing to employ a
sampling grid of rather coarse spacing,
augmenting it at an occasional grid node by
sampling along a geologic "fence."
   There is no substitute for knowledge.  To
be effective in meeting the objectives of any
investigation, the sampling scheme must be
designed with all the available knowledge
regarding the site, the suspected
contaminant, its potential for transport, its
means of initial deposition and potential
analytical methods required for its
detection.  This requires a close  interaction
among all members of the team of
investigators.  Only with site-specific
knowledge can an adequate sampling plan be
developed.
   Perhaps the most difficult objective to
accomplish is to declare with a specified
degree of certainty, that a site  is free of
contamination.  One is faced with  this
objective after remediation of a contaminated
site  and under the provisions of the State of
New Jersey Environmental Cleanup
Responsibility Act (ECRA).  ECRA  is involved
in all major real estate transfers in
New Jersey, and similar legislation is
pending  in several other states.
   Exner, Gilbert and Kinnison (1984) have
taken  a novel  approach to deciding when
remediation is complete for dioxin
contaminated  sites near Times Beach,
Missouri.  Their proposed  strategy divides
the known area of contamination  into "cleanup
units" of a size conducive  to the  use of
appropriate soil  removal  equipment (e.g.,
large earth moving equipment).  A practical
unit  size  is  20 by 250 feet.  Adjacent clean-
up units are established to ring the unit in
which remediation is initiated.   A grid is
established for each unit by the intersection
of two lines parallel to the long axis of the
unit, spaced ten feet apart starting five
feet from one end of the unit.  A fixed
geometric sampling pattern is established
within each grid square.  Samples from the same
position in each square are then
composited.  A number of aliquots, n, are
then selected randomly from each composite
for analysis.  If there are m composite
samples, then there are N = nm data for each
clean-up unit.
   The N data are then used to calculate the
arithmetic mean and standard deviation of the
m composite means.  These statistics are then
used to compute an upper confidence limit for
the true mean concentration for the clean-up
unit.  The degree of confidence is consonant
with the risk of not remediating a
contaminated unit when the environmental risk
is based on the mean unit concentration.  If
this upper confidence limit statistic is
above a decision criterion (an acceptable true
mean concentration), then another two inches
of soil is removed from the entire unit and
the test repeated.  Otherwise, the unit is
deemed to be clean and remediation stops.
Because of the manner of statement of the
null hypothesis, this procedure controls the
risk of not remediating a contaminated
unit.  The risk of unnecessarily remediating
a clean unit remains uncontrolled however.
The financial loss of this uncontrolled risk
is perhaps quite small in that the cost of
removal of an additional two  inches of  soil
from a clean-up unit is small once
mobilization has occurred.  This  is
particularly true if there are sufficient
clean-up units to keep a field team occupied
while awaiting the analytical results  from  a
prior remediated unit test.
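   A minimal sketch of this type of test appears
below.  It computes a one-sided upper confidence
limit on the true mean concentration of a clean-
up unit from the composite means and compares it
to the cleanup criterion; the criterion,
confidence level, and data are hypothetical, and
the sketch is only in the spirit of the procedure
just described, not a reproduction of it.

      from statistics import mean, stdev
      from scipy.stats import t   # Student's t quantile for the one-sided limit

      def unit_is_clean(composite_means, criterion, confidence=0.95):
          """Upper confidence limit on the true mean concentration of a
          clean-up unit; the unit passes only if the limit falls below
          the criterion, which controls the risk of leaving a
          contaminated unit unremediated."""
          n = len(composite_means)
          ucl = (mean(composite_means)
                 + t.ppf(confidence, n - 1) * stdev(composite_means) / n ** 0.5)
          return ucl < criterion

      # Hypothetical dioxin composite means (ppb) and a 1 ppb criterion.
      print(unit_is_clean([0.4, 0.7, 0.3, 0.6, 0.5], 1.0))
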
   If the  site under consideration has  not
previously been declared contaminated,  then
the situation is much more complex.   The
complexity derives not  from  the  lack  of
sufficient statistical techniques with  which
to design  sampling plans, but from  the  need
to balance possible risk with the cost  of
achieving  a  specified  level  of risk
control.   In  part, I believe  that this  is a
cultural problem.  We  are conditioned to
finding  something.   If we do  not  find
contamination, we  feel disappointed.   As  a
society, we must come  to grips with defining
the  levels of risk we  are willing to  take and
be willing  to bear the cost  of  achieving
those levels.  Organizations that find
themselves subject to legislation such as
ECRA must balance the risk of not finding
contamination when, in fact, it exists
against the cost of the potential liability
in the future.  The statistician's professional role
is confined  to  the definition of risk
alternatives  and  providing  sampling designs
consonant  with  the  policy  decisions made in
response  to  those definitions.

 EXAMPLE 4 - GROUND WATER MONITORING
    Rarely has a specification of statistical
 procedures created as much controversy as in
 the case of the monitoring of ground water
 around hazardous waste facilities.  Since the
 promulgation of the regulations now found in
40  CFR 265.92, there has been repeated
 criticism of the application of Student's "t"
 test in any form to the determination of
 ground water contamination by a hazardous
 waste facility.  Under federal regulations
 (40 CFR 265 and 264), the owners and/or
 operators of hazardous waste facilities are
 required to monitor the ground water around
 the hazardous waste facility and to detect
 the existence, if any, of ground water
contamination.  While this appears a
 perfectly reasonable objective, the
 procedures prescribed for doing so are, in
most cases, at best naive and at worst wrong
 (see Liggett, 1985 and Splitstone, 1986).
Here is a clear case where the promulgation
of  a statistical hypothesis testing procedure
 has relegated the importance of statistical
design to a secondary position rather than
 the other way around.
    The objective of monitoring ground water
near a hazardous waste facility is clear.  It
 is  to determine whether the facility is
contaminating the ground water.  A sampling
 program to make such a determination should
be  designed so that any contribution from the
 particular facility can be distinguished from
other sources.  These other sources, which
may be both anthropogenic and natural,
provide what is loosely referred to as the
background.  The design of a sampling program
to distinguish the contribution of the site
of interest from the background is complicated
by  temporal and spatial variations in the
background.
    Temporal variations in ground water
quality may in part be explained by recharge
rates which are weather related.  Spatial
variations among wells may be due to a
variety of reasons.  Some of this variability
may be reduced by ensuring that the wells
monitoring the aquifer beneath a particular
waste disposal site are located relatively
close together and drilled to the same
depth.  Nevertheless, spatial variations may
occur even if there is no contribution from
the  site being monitored.   These variations
might be expected to be a result of
horizontal gradients across the site,
unexpected confluence of different aquifers,
or more likely, the natural geochemical
conditions of the area.
    If one is fortunate enough to be able to
begin monitoring prior to the existence of
the facility, life is quite easy.  It is then
possible to investigate the existence of
potential temporal and spatial  trends in
ground water quality without  fear of any
confounding from the facility.   Such a
circumstance,  however, is rare  indeed.   More
likely,  monitoring begins around an existing
facility.   In  either case,  it would appear
advisable to design a study to  investigate
the characteristics inherent  to ground  water
quality in the environs of the site or
proposed site prior to designing a long-term
monitoring program.
   The design of the initial investigation,
whether it includes the siting of new wells or
simply the analysis of existing data, begins
by asking what is known or suspected about
the site.  Input from the hydrogeologist and
geochemist is a must.  Knowledge regarding
the probable direction of ground water flow,
possible seasonal or diurnal fluctuations and
the potential chemical changes in ground
water quality as it moves through the local
rock and soil is essential to the intelligent
interpretation of the data.  The result of a
well-designed initial investigation should be
a "model" describing natural or "background"
variations in ground water quality.  Such a
model is of necessity site specific.  The
design of a long-term ground water quality
monitoring program should have as its
objective the detection of deviations from
this background model.  Techniques
appropriate to testing hypotheses regarding
deviations from this background model must be
employed and should be permitted by federal
regulations.
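   One simple stand-in for such a test, assuming
an adequate background data set and a normal
model, is to compare each new downgradient result
to an upper prediction limit computed from the
background data, as sketched below in Python.
The constituent, values, and confidence level are
hypothetical, and a genuinely site-specific
background model could be considerably more
elaborate (seasonal terms, spatial trends, and so
forth); this is not the procedure prescribed by
the regulations discussed above.

      import numpy as np
      from scipy.stats import t

      def upper_prediction_limit(background, confidence=0.99):
          """Upper prediction limit for a single new measurement under a
          normal model fitted to background (e.g., upgradient) data."""
          b = np.asarray(background, dtype=float)
          n = b.size
          return (b.mean()
                  + t.ppf(confidence, n - 1) * b.std(ddof=1) * np.sqrt(1.0 + 1.0 / n))

      # Hypothetical upgradient specific-conductance values (umho/cm).
      background = [410.0, 395.0, 430.0, 405.0, 420.0, 415.0, 400.0, 425.0]
      print(upper_prediction_limit(background))   # flag new results above this
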

CONCLUDING REMARKS
   The design of schemes to collect data for
the purpose of providing meaningful
information regarding potential hazardous
waste sites is of necessity site specific.
The design must be based on a clear statement
of the information gathering objective and
can only be efficient if careful
consideration is given to the accumulated
prior knowledge regarding the site.
   Attempts that have been made to specify
the statistical techniques to be employed in
environmental decision making often ignore
the presence of site-specific conditions
which may have a dramatic effect on the
outcome of the prescribed mathematical
calculations.  Realizing that it is not
possible to promulgate regulations applicable
to every site, a clear regulatory statement
of the objective to be achieved coupled with
a specification of the degree of certainty  of
achieving that objective would appear to be a
reasonable regulatory goal.  The details of
data collection and interpretation are
appropriately left to the statistician in
concert with the knowledgeable scientists and
engineers.
   The design of a sampling strategy for any
site investigation is more a logical thought
process than the application of statistical
tools.  The purpose of environmental
regulation might well be served by carefully
outlining the steps in this thought process
and schooling those charged with the
protection of the environment in their
application.  The roles of statistics and the
statistician in this process have been
outlined nicely in the published notes from a
short course on nuclear site decontamination
and decommissioning (Barnes, 1981).

REFERENCES
   Barnes, H. G., 1981, "Statistics and the
Statistician in Nuclear Site Decontamination
and Decommissioning - Lecture Notes for a 4-
Day Short Course," U.S. DOE Contract DE-AC06-
76RLO 1830, Pacific Northwest Laboratory,
Richland, Washington.
   Exner, J. N., R. O. Gilbert and R. R.
Kinnison, 1984, "A Sampling Strategy for
Cleanup of Dioxin in Soil," Appendix I, Quail
Run Site Hazard Mitigation Plan,
Environmental Emergency  Services Company,
Chesterfield, Missouri.
   Flatman, G., 1984, "Using Geostatistics in
Assessing Lead Contamination Near Smelters,"
Environmental Sampling for Hazardous Waste
Sites, ed. Schweitzer, ACS, Washington, D.C.
   Guttman,  I., 1970, "Statistical Tolerance
Regions:  Classical and  Bayesian," Griffin's
Statistical  Monographs and Courses, No. 26,
Hafner Publishing Co., Darien, Connecticut.
   Isaaks, E. H., 1984, Risk Qualified Mappings
for Hazardous Waste:  A Case Study in Non-
parametric Geostatistics, M.Sc. Thesis, Branner
Earth Sciences Library, Stanford Univ.
   Journel, A. G., 1984-a, "The Place of Non-
parametric Geostatistics," Geostatistics for
Natural Resources Characterization, ed. Verly
et al., Reidel, Dordrecht, pp. 307-35.
   Journel, A. G., 1984-b, "Indicator Approach
to Toxic Chemical Sites," EPA, EMSL-Las
Vegas, Report of Project No. CR-811235-02-0.
   Journel,  A. G., 1987, Short Course,
"Advanced Geostatistics  For Mining and the
Environment," sponsored  by Stanford
University,  and U.S. EPA Las Vegas,
January 12-16, Las Vegas,  Nevada.
   Kempthorne, O., ca. 1966, Course Notes
"Design of Experiments," Department of
Statistics,  Iowa State University, Ames,
Iowa.
   Liggett, W., 1985, "Statistical Aspects of
Designs for Studying Contamination,"
Quality Assurance for Environmental
Measurements, ASTM STP 867.
   Lurie, W., 1958, "The Impertinent
Questioner:  The Scientist's Guide to the
Statistician's Mind," American Scientist,
March, pp. 57-61.
   Marquardt, Donald W., 1987, "The
Importance of Statisticians," Journal of the
American Statistical Association, Vol. 82,
No. 397, pp. 1-7.
   Morse, M., W. Sproat and J. Warren, 1986,
"Sampling and Analysis for Delisting
Petitions/Statistical Basis for Sampling,"
Second Annual Symposium on Solid Waste
Testing and Quality Assurance, U.S. EPA
Conference, July 15-18.
   Splitstone, D. E., 1986, "A Statistician's
View of Ground Water Monitoring," Proceedings
of the National Conference on Hazardous
Wastes and Hazardous Materials, Hazardous
Materials Control Research Institute, pp. 8-
12.
   Ubinger, E. B., 1987, "Statistically Valid
Sampling Strategies for PCB Contamination,"
to be presented at the Electric Power
Research Institute Seminar on PCB
Contamination, Kansas City, Missouri.
   U.S. Environmental Protection Agency,
Interim Status Standards for Owners and
Operators of Hazardous Waste Facilities, 40
CFR 265.92.
   U.S. Environmental Protection Agency,
Regulations for Owners and Operators of
Permitted Hazardous Waste Facilities, 40 CFR
264.97.
   U.S. Environmental Protection Agency,
1982, Test Methods for Evaluating Solid
Wastes, SW-846, Second Edition, July.
   Zirschky, J. and R. O. Gilbert, 1984,
"Detecting Hot Spots at Hazardous Waste
Sites," Chemical Engineering, July 9.

                                        DISCUSSION
                                    W.  Barnes Johnson
        U.S. Environmental Protection Agency, 401 M St.  SW,  Washington,  DC  20460
    I have been asked to provide a discus-
 sion from the perspective of an
 Environmental Protection Agency (EPA)
 statistician.  In communicating  the  EPA
 perspective it is first important to note
 that statistics are used  in  a  variety of
 EPA's program  activities.   A  convenient
 taxonomy for  categorizing those  activi-
ties begins with the most broad and wide
ranging and continues to the most detailed:
    RESEARCH AND DEVELOPMENT
    POLICY DECISIONS
    REGULATORY DEVELOPMENT
    GUIDANCE DEVELOPMENT
    PERMITTING
    COMPLIANCE.
    Research and development  activities
 are  the most  broad and  wide  ranging  in
 terms of their  possible  application  and
 impact on EPA's programs. Some of
 Dr.  Tiao's  work has been  done  to support
 EPA  activities at  the broad  end of  the
 continuum.   Policy decisions,  regulatory
 actions, and guidance development can be
 quite broad,  but can  also  be  exacting
 and  specific  in  terms  of   application;
 therefore they  reside  in the  middle  of
 the  continuum of  areas where  statistics
 might be used  at  EPA.   At  the  narrow
 end  of the  continuum the concerns  focus
 on site specific sampling, analysis,
 and  decision making problems related
 to permitting and compliance.
 Dr.   Splitstone's experiences and
 certain aspects of  Dr.  Tiao's work have
 been directed  toward   the problems that
 are  faced at the narrow end of the contin-
uum.  The spectrum offered by these two
presentations seems to cover the broad
and difficult responsibilities that EPA
 faces.  There is a  responsibility impos-
 ed on EPA to be general so that a nation-
 al scope is provided; but EPA  also  has a
 responsibility to provide technical  di-
 rection that  can  be  useful at  specific
 locations with unique circumstances.
    One thing that  becomes clear  is that
 at the EPA statistics are not  a  product.
 This is not so true,  for example,  at  the
 Census Bureau  or  the  Bureau   of  Labor
 Statistics; at these Agencies  statistics
 are  truly the product of interest.
 Instead, EPA's products are  primarily
 regulatory  decisions.   These decisions
 are  based  upon  environmental,  engineer-
 ing, economic, political, financial,
 risk, and health analyses.   This differ-
 ence reflects  itself  in the   way  that
 statistics  and therefore statisticians
 are  used at  EPA.   One might even expand
 this to say  that there  is   a  difference
 in the  way problems  are characterized,
 approached, and solved.   It  is important
 to remember that,  when statistical work
is called upon to provide perspective,
insight, or  response  relative to  an EPA
interest or  initiative,  the  statistical
analysis is  only  a tool  that  is  used in
the ultimate  decision.   For  example,  in
the case of  Dr.  Splitstone's  PCB contam-
ination problem, EPA may have been
interested in  knowing  if  the  building
was a health  threat.   Knowing that there
is a  high  confidence that   no  circular
hot spots of  a  reasonably  small  size are
present with  PCB's  above  an action level
may or may not  be  sufficient information
for the decision maker.  The  implications
regarding exposure likelihood and the
resulting risk  potential  may  need  to be
analyzed.  Similarly, in the case of
Dr. Tiao's work,  EPA  might be interested
in deciding  whether  vehicle  inspection
and maintenance programs effectively
reduce ozone.   However,   the  study  may
not be useful for making a national
policy change  given  the  messy and  con-
founded information  from  a  few  sampling
stations located  in Oregon and  Arizona.
The primary  conclusion  is  that  statist-
ical methods are tools that support
EPA's program  functions  and,  as  such,
they must  be designed  to  allow  for  or
assist in formulating defensible
decisions.
   Although  statistical methods are a
tool that  can  be  called   upon  when  an
investigator or  policy analyst  desires,
they are  not  always  considered  early,
during the  design  phase   of   a  project.
Instead, "case  study"  or  ad hoc  "data
base" approaches are  often used  as meth-
ods for solving complex environmental pro-
blems.  Consider  the  following  hypothe-
tical examples  that  explain  why  these
data acquisition  methods  might  be used.
An investigator  needs  to  evaluate  the
nature, status,  or extent  of a  national
environmental problem.  The  investigator
decides that  data  must be  obtained that
in some  sense  "represent" the  national
situation.   One approach  is to pick in a
purposive manner geographic and/or tempor-
al locations  which  seem   to  reflect  or
parallel the national situation.   In some
cases, locations  or  situations  may  be
chosen to provide  results  that  are both
extreme and  average,  and  in  this  way
bound the  problem.   This  is the  "case
study" approach.
   Another approach,  which certainly has
been facilitated  by  information  proces-
sing technology, is to attempt to collect
every piece  of  data  that  has ever been
accumulated  by  any source  and  store the
data in  a  uniform format  on  a computer.
The idea  is  that  the availability of the
information  in an accessible  form ensures
a meaningful  "representation" of the sit-
uation.  This is the "data  base" approach.
                                           23

-------
   There  is  no question that these meth-
ods have  utility  and merit given certain
conditions or  investigative purposes;
however,  these  approaches can  also dis-
tort subsequent analysis  and evaluations.
Also, the  conclusions  that  can  be made
from analyses  performed  on  information
collected in  such a  manner  are limited.
Dr. Tiao,  in  one of  his examples,  was
provided data  from two  stations  (one in
Corvallis, Oregon,  and one  in  Portland,
Oregon).  Someone  selected  these  sites,
undoubtedly out  of  a  network  of  sites,
and commissioned  Dr.  Tiao  to  conduct  a
detailed and,  I must say, excellent time
series evaluation.   How  was it  decided
that these  two sites  were  chosen?  Cer-
tainly, it  would  have   been  preferable
for a  statistician  to be involved  with
the experimental  design  of  the  sample
station monitoring  network.   Many issues
such as between-location variability,
frequency and  duration of sampling,  and
timing of  instrument  calibrations  could
have been  designed  into  the   study  and
Dr. Tiao  clearly  pointed  these  issues
out.  Instead, interests  in acquiring
"representative" information  by develop-
ing a "data  base"  and performing  a "case
study" have made the investigation,
although interesting and  informative,
less than it  could have  been given some
design prior  to  implementation  of  the
sampling.  Kruskal and Mosteller (1979a,
1979b, and 1979c) offer interesting back-
ground and discussion  regarding the
meanings of  "representative"  sampling as
used in a variety of  technical  and non-
technical literature sources.
   Dr. Splitstone is in the enviable
position of  being able  to  influence the
statistical design,  sampling, and  analy-
sis for the purpose of answering specific
questions.  Dr. Splitstone's examples are
site specific  and  in  response to  EPA's
permitting and  compliance programs  des-
cribed earlier.   Dr.  Splitstone attempts
to make no extrapolation beyond the
original scope  of the  sampling program.
Through a well conceived statistical
design and analysis,  he  is  able  to gen-
erate data which properly "represent"
and answer the specific, statistical
questions at hand.
   I would like  to return to an  earlier
idea which  couches  statistical  analysis
as a decision  maker's  tool.   I  also want
to continue  with  the  theme  that a  well
conceived statistical design and sampling
program offer the best method for
generating data which provide a "represen-
tative" look at an environmental problem.
The EPA manager realizes that
decisions are  made  in a  risky environ-
ment; there is a chance that wrong
decisions will  be  made.   Statistically
designed sampling programs allow  quanti-
fication of  the  likelihood  of  a  wrong
decision.  However, ad hoc, poorly
developed "data  base"  and  "case  study"
approaches that do not  coincide  with the
objectives under  scrutiny,  do not  allow
quantification of the  error or  risk and
certainly provide a  biased  "representa-
tion" of  the  problem.   As  such,  these
data acquisition approaches at worst pro-
hibit and  at best constrain  the use  of
statistical analysis  as a decision maker's
tool for evaluating risk.
   There seem to  be  two  general  observa-
tions from  these presentations.   First,
involvement early in the design  process
as in Dr.  Splitstone's  situation  is re-
flected in  results that  satisfy  the ob-
jectives of  the  study  and  allow  decision
making in  a risky environment.   Second,
involvement with  the data  analyses  after
data collection  has   been  designed  and
executed will result  in analyses  and
conclusions that  must  be well  qualified
and reflective of the  limitations of the
original sampling  program.   Dr.  Tiao's
clear presentations  of  the  strengths and
weaknesses of  each  example he  presented
certainly support this observation.

REFERENCES

Kruskal, W. and F. Mosteller, 1979a.
Representative Sampling I:
Non-Scientific Literature.
International Statistical Review.
47:  13-24

Kruskal, W. and F. Mosteller, 1979b.
Representative Sampling II:
Scientific Literature, Excluding
Statistics.  International Statisti-
cal Review.  47:  111-127

Kruskal, W. and F. Mosteller, 1979c.
Representative Sampling III:
Statistical Literature.  International
Statistical Review.  47:  245-265

                                  SPATIAL PREDICTION AND SITE SELECTION
                                              Noel  Cressie
                    Department  of Statistics, Iowa  State University,  Ames,  IA   50011
Abstract
     This article considers various  stochastic
models that could be used in the prediction of
pollutants at locations where no data are
available, based on data taken  from  a spatial
network of monitoring sites.  The design problem
of selecting those sites is also discussed.

Introduction
     The "default" stochastic model  for  a batch
of measurements on a phenomenon of interest, is
that they are Independent and identically
distributed (i.i.d.).  But the notion that data
close together in time or space are  likely to be
correlated (i.e., cannot be modeled  as
Independent) is a natural one, and will  receive
particular attention in this article.

     Pure temporal models, or time series models
as they have come to be known, are usually based
on identically distributed observations  that are
dependent and occur at equally spaced time
points.  The ARMA models, popularized by Box and
Jenkins (1970), are important examples.  Spatial
models of the type used in this article have to
be more flexible since there is not  the  analogous
ordering in space and it is not reasonable to
assume that spatial locations of data occur
regularly.

     Relatively recently (the last 10 to 15
years), meteorological space-time data sets have
been collected for the purpose of studying the
effects of atmospheric pollution and, in
particular, acid rain (see e.g. Peters and
Bonelli, 1982).  Daily data collection over a
number of years at various locations throughout
the Northeast USA yields a massive data  set.  But
most of it is temporal so that spatially speaking
the data are still rather sparse.  Nevertheless
spatial prediction is just as important  as
temporal prediction, since people living in those
cities and rural districts without monitoring
stations have the same right to know whether
their water or their air is polluted.  Thus a map
of e.g., wet deposition (acid rain), is  needed
based on samples collected from a spatial network
of monitoring sites.

     One could take either a global or a local
approach to spatial prediction.  Global  methods
such as trend surface analysis attempt to model
the spatial variation of the phenomenon  in terms
of large structural components (e.g., polynomial
trend).  However the error process usually
contains small-scale spatially correlated
variation.  Local methods such as kriging attempt
to predict unknown values from weighted  averages
of nearby values.

     The spatial dependence can also be  exploited
in the design phase, to establish an "efficient"
network of sites.  Or if a given network needs a
site added or deleted, where should this occur?
I shall address these questions using a
stochastic spatial model, but it should not be
forgotten that there are important considerations
to take into account before statistical
optimization is performed.  Among these are:

• Determine what to measure and how to measure
  it.

• Be aware of such preliminary variables as local
  weather conditions, historical data, and take
  "baseline" data.

• Will measurements be replicated or not?

• Determine the spatial support (level of
  aggregation) of the measurements.

     Then finally, remember the following
aphorism of data collection:  The necessary is
too expensive and the affordable is inadequate.

2.  Spatial Prediction

     Suppose that measurements, both actual and
potential, are denoted

     \{ z(s):  s \in D \},

where s is a spatial location vector in R^2 or R^3
(two- or three-dimensional Euclidean space).  The
index set D gives the extent of the region of
interest.  Data

     z(s_1), \ldots, z(s_n)                 (2.1)

are observed at known sites

     s_1, \ldots, s_n \in D.                (2.2)

     Henceforth I shall assume that
\{ z(s):  s \in D \} is a realization of a stochastic
process

     \{ Z(s):  s \in D \}.                  (2.3)
But is this a realistic model when it is known
that (apart from measurement error) z(·) is
deterministic?  What is the source of the
randomness in (2.3)?  To justify the stochastic
models used in this article, I shall argue as a
Bayesian would.  The random variation comes from
the uncertainty caused by not observing z(·) at
all locations in D.  In fact with no data the
"prior" might be that the physical process
behaves as a second-order stationary stochastic
process would.  This is a standard tactic in
science; when there are infinitely many possible
surfaces z(·), deal with the possibilities
statistically.  Now once data (2.1) are observed,
update the prior and work with the posterior
distributions

     \{ Z(s) \mid z(s_1), \ldots, z(s_n):  s \in D \}.
     From the data (2.1) it is required to
predict (possibly noiseless versions of)

     Z(s_0), or (1/|B|) \int_B Z(s) ds, where |B| = \int_B ds.

     I shall present both the trend surface model
and the random field model, and then compare the
two.

2.1  Trend surface model
     Suppose Z(·) can be written as

     Z(s) = \mu_1(s) + \epsilon_1(s);  s \in D,          (2.4)

where \epsilon_1(·) is a zero mean white-noise process;
i.e.,

     E(\epsilon_1(s)) = 0;  s \in D,

     cov(\epsilon_1(s), \epsilon_1(u)) = \sigma_1^2 if s = u, and 0 if s \ne u;
                                     s, u \in D.         (2.5)

Furthermore, suppose the trend surface \mu_1(·) can
be written as,

     \mu_1(s) = \beta_1 f_1(s) + \ldots + \beta_p f_p(s);  s \in D,  (2.6)

where f_1(s), \ldots, f_p(s) are known functions of
spatial location s.  Notate the unknown
coefficients as

     \beta = (\beta_1, \ldots, \beta_p)',                (2.7)

the data as,

     Z = (Z(s_1), \ldots, Z(s_n))',                      (2.8)

the errors as

     \epsilon = (\epsilon(s_1), \ldots, \epsilon(s_n))', (2.9)

and the trend-surface components as,

     F = \{ f_1(s_i) \ldots f_p(s_i) \},                 (2.10)

where the expression in braces represents the
i-th row of the (n x p) matrix F.
     Then,

     Z = F\beta + \epsilon,                              (2.11)

where E(\epsilon) = 0, and var(\epsilon) = I\sigma_1^2.  This is a
linear model with unknown parameters (\beta, \sigma_1^2).
The primary goal is not to estimate these
parameters but rather to predict Z(s_0);  s_0 \in D.
It is easy to see that the best (i.e., minimum
mean-squared error) linear unbiased predictor is

     \hat{Z}(s_0) = f_0' \hat{\beta},                    (2.12)

where

     f_0 = (f_1(s_0), \ldots, f_p(s_0))', and
     \hat{\beta} = (F'F)^{-1} F'Z,                       (2.13)

the best linear unbiased estimator of \beta.  This
predictor has mean squared error,

     mse(\hat{Z}(s_0)) = E(Z(s_0) - \hat{Z}(s_0))^2.     (2.14)
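     A small numerical sketch of (2.4)-(2.14), in Python, follows.
It assumes a planar trend surface f(s) = (1, x, y) and entirely
hypothetical data; \hat{\beta} is the ordinary least squares estimate,
the prediction is f_0' \hat{\beta}, and the last line is a plug-in
estimate of the prediction mean squared error implied by this linear
model.

      import numpy as np

      # Hypothetical data: coordinates (x, y) and measurements Z at n sites.
      coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                         [1.0, 1.0], [0.5, 0.5]])
      z = np.array([1.0, 1.6, 2.1, 2.9, 1.9])

      # Planar trend surface: f(s) = (1, x, y), so F is an n x 3 matrix.
      F = np.column_stack([np.ones(len(coords)), coords])
      beta_hat, *_ = np.linalg.lstsq(F, z, rcond=None)   # least squares estimate

      f0 = np.array([1.0, 0.25, 0.75])      # f_0 for the point s_0 = (0.25, 0.75)
      z0_hat = f0 @ beta_hat                # predictor, as in (2.12)

      resid = z - F @ beta_hat
      sigma2_hat = resid @ resid / (len(z) - F.shape[1])  # estimate of sigma_1^2
      # plug-in prediction mean squared error under the model (2.4)-(2.11)
      mse_hat = sigma2_hat * (1.0 + f0 @ np.linalg.inv(F.T @ F) @ f0)
      print(z0_hat, mse_hat)
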
2.2  Random field model

     Recall that the trend surface model
specifies that two data values, no matter how
close they are to each other, are statistically
independent.  Most scientists working in geology,
soil science, crop science, forestry,
environmental science, etc., find this assumption
contradicts their underlying notions of the
mechanisms that resulted in the data they are
studying.  Put simply, in many scientific studies
where data have a spatial label, it is felt that
observations closer together are more dependent
than those further apart.  This is modeled as

     Z(s) = \mu_2(s) + \delta_2(s);  s \in D,            (2.15)

where \mu_2(·) is the large-scale, deterministic,
mean structure of the process, and \delta_2(·) is the
small-scale, stochastic structure that models the
spatial dependence referred to above.  That is,

     E(\delta_2(s)) = 0;  s \in D,

     cov(\delta_2(s), \delta_2(u)) = C(s, u);  s, u \in D.   (2.16)
The representation (2.15) is precisely the sort
of approach time series analysts take; after
detrending, they model the small-scale variation
with stochastic processes that capture the
temporal dependence (e.g., autoregressive,
integrated, moving average processes).

     In order to estimate the quantity and
quality of the spatial dependence, some minimal
assumptions of stationarity have to be made.  A
process Y(·) which satisfies

     E(Y(s)) = \mu;  s \in D,                            (2.17)

     var(Y(s) - Y(u)) = 2\gamma(s - u);  s, u \in D,     (2.18)

is said to be intrinsically stationary (over
D).  The quantity 2\gamma(·) is a function of the
vector difference s - u, and is called the
variogram (\gamma(·) is called the semivariogram).

     Intrinsically stationary processes are more
general than second-order stationary processes,
which assume

     cov(Y(s), Y(u)) = C(s - u);  s, u \in D,

where C(·) is a function of the vector difference
and is usually called the autocovariance
function.  It is the preferred tool of time
series analysts, but for prediction purposes it
is not as flexible as the variogram; see Cressie
(1988) for more details and a summary of the
properties of the variogram.

     The geostatistical method (Matheron, 1963)
of spatial prediction is very simple.  Estimate
the variogram with a nonparametric estimator
2\hat{\gamma}(h), fit a model 2\gamma(h; \theta) whose form is known
apart from a few parameters \theta (i.e., fit
2\gamma(h; \theta) to 2\hat{\gamma}(h)), and finally use the fitted
variogram in the best linear unbiased predictor
(i.e., the kriging predictor) as if it were
known.  However it is more the exception than the
rule that Z(·) is intrinsically stationary;
Cressie (1986, 1987a) investigates how to proceed
when respectively \mu_2(·) is not constant and
\delta_2(·) does not possess a variogram.

     Let us assume for the moment that in (2.15)
\mu_2(s) \equiv \mu_2, and \delta_2(·) is a zero mean
intrinsically stationary stochastic process.
Matheron (1963) has proposed the following
variogram estimator
                          £i) - Z(sj)),   (2.19)

                           - s  - h} and  |u(h)|
                                                        T is a symmetric (n +  l)x(n +  1) matrix.   The
                                                        minimized mean-squared error is given by,
where N(h) =
is the number of distinct ordered pairs in the
set N(h).  If the data are believed to be
approximately Gaussian but with haphazard
"contaminated" observations present, Cressie and
Hawkins (1980) advocated the robust estimator,
2Y(h) =
                    N(h)
                         |z<£i> -z<2j>r
                      {0.457 + 0.494/|u(h)|}.
To either of these is fit a parametric model
2Y(h;6).  Cressie (1985) proposes a weighted and
a generalized least squares approach, while
Kltanidis (1983) proposes a restricted maximum
likelihood (REML) approach (i.e., estimation of
6 is obtained from generalized Increments of the
data).  Both methods have their strengths.

     Assume that the estimation and fitting of
the variogram has been carried out successfully,
and that the variogram can be considered as
known.  Suppose it is desired to predict Z(s.)) at
some unsampled spatial location s. using a linear
function of the data:
     .         n
     Z(en ) -  I  X,Z(sJ.                 (2.20)
                                                                                                    (2.22)
Matheron (1963) calls  this  spatial prediction
method, (ordinary)  kriging,  after D.  G.  Krige (a
South African mining engineer  who in  the 1950s
became aware that predicted  grade distributions
were not a good indicator of actual grade
distributions).  In fact the optimal  linear
predictor (2.20) can be found  earlier in the
literature in works of Kolmogorov (1941) and
Wiener (1949).  The kriging  predictor given by
(2.21) assumes the  process  Z(')  contains no
measurement error;  Cressie  (1987b) has details on
how to predict noiseless versions of  Z(').

     Several important design  considerations
become apparent from (2.21)  and  (2.20),  viz. the
optimal predictor and  the minimized mean-squared
error depend on the relative locations (2.2) of
the data, the semivariogram  values
T(') and the sample size.   These considerations
will be picked up again in  Section 3.

2.3  Comparison of trend surface model and random field model

     Agterberg (1984) describes a small experiment to compare prediction based on a quadratic trend surface, prediction based on ordinary kriging, and prediction based on a quadratic trend surface plus kriging the residuals.  It is not a priori true that the latter prediction method would perform the best, since overfitting may occur.  Agterberg took a contour map of the terrain at the top of the Arbuckle formation in central Kansas, within a 30 mile by 30 mile area.  Then he randomly chose spatial locations from that area to provide the data.  Finally, spatial locations for prediction were randomly chosen, and relative mean-squared errors (based on the squared prediction errors (\hat{z} - z(s_0))^2 and expressed as percentages) were computed.  From the limited experiment, trend surface prediction (26.2%) was slightly worse than ordinary kriging (23.3%), and both were dominated by the combination of the two (19.1%).

     Laslett et al. (1987) performed  a very
interesting spatial experiment to compare the
performance of several two-dimensional spatial
prediction methods.  They note that rival methods
of prediction are based on different  models, so
that a proper comparison of  methods cannot be
made analytically or on artificially generated
data.  They then propose to  make  the  comparison
on real data sets, viz. soil pH.   Among the
eleven predictors (which included quadratic trend
surface and kriging) used, kriging and Laplacian
smoothing splines performed  the best.  In their
study, the presence of dependent  small-scale
variation was strongly suggested,  justifying the
use of geostatistical methods for  prediction.

-------
2.4  Nonstationary mean structure

     In Section 2.3, there was mention of a combination (trend-surface-plus-kriging) prediction method.  In fact Matheron (1969) has called this universal kriging.  Recall the random field model (2.15).  Let the large-scale variation be given by

    \mu(s) = \beta_1 f_1(s) + ... + \beta_q f_q(s).

If the covariance structure C(s, u) in (2.16) is known, it is a simple matter to estimate β = (β_1, ..., β_q)' by generalized least squares, and predict Z(s_0).
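A minimal sketch of the generalized-least-squares step just described, under an assumed covariance function; the regression functions, the exponential covariance model, and the simulated data below are illustrative assumptions, not taken from the paper.

    import numpy as np

    # Assumed ingredients: data locations, regression functions f_j, and a
    # known covariance function C(s, u); all numerical choices are illustrative.
    s = np.linspace(0.0, 10.0, 12)
    rng = np.random.default_rng(0)
    F = np.column_stack([np.ones_like(s), s])          # f_1(s) = 1, f_2(s) = s
    C = lambda a, b: np.exp(-np.abs(a - b) / 3.0)      # assumed covariance C(s, u)
    Sigma = C(s[:, None], s[None, :])
    z = F @ np.array([1.0, 0.5]) + rng.multivariate_normal(np.zeros(len(s)), Sigma)

    # Generalized least squares: beta_hat = (F' Sigma^{-1} F)^{-1} F' Sigma^{-1} z
    Si = np.linalg.inv(Sigma)
    beta_hat = np.linalg.solve(F.T @ Si @ F, F.T @ Si @ z)
    print(beta_hat)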
                                                                    (3.6)

     Therefore, by starting on the far right of (3.6) and optimizing successively over each factor, a design could be found by adding one point at a time; the (i+1)-th site is placed in an optimal location having already fixed the i previous sites.  Is this iterated design optimal?

     The answer to the above question is, "no".

-------
A very simple counterexample is given by Cressie (1978), for Z(·) a process of independent increments on the real line.  However, if one defines the design measure to be the limiting (n → ∞) density of design points over the design space, then does the asymptotic iterated design measure equal the asymptotic optimal design measure?  This is an open problem, but the answer is probably, "yes"; see Wu and Wynn (1978).

     Although the iterated design may produce design points

    V ≡ {s_1, ..., s_n}                                             (3.7)

that are quite different from optimal design points when n is finite, it still offers the possibility of providing a suitable yardstick for other, more computer-intensive, algorithms.  One such is the annealing algorithm, adapted by Sacks and Schiller (1987) for the spatial design problem.  Suppose n is fixed; the problem is to find the best V.  Again modify the notation of (3.4) and write S_n^2 as S_n^2(V) to show its dependence on the design points.

     The annealing algorithm assumes a starting value V^{(0)}.  At the j-th stage, the current value is

    V^{(j)} = {s_1^{(j)}, ..., s_n^{(j)}}.                                          (3.8)
The algorithm proceeds as follows:
  •  Choose V^{(j+1)} in some random or systematic manner (e.g., single point replacement).
  •  Replace V^{(j)} by V^{(j+1)} with probability π_j, where

    π_j = 1,                                                        if S_n^2(V^{(j+1)}) - S_n^2(V^{(j)}) < 0;
    π_j = exp{ -(log j)(S_n^2(V^{(j+1)}) - S_n^2(V^{(j)}))/γ },      otherwise.

Notice that the algorithm always demands a change provided there continues to be an improvement in the objective function S_n^2.  For suitable choice of γ in π_j, it will always avoid getting stuck at local minima by allowing a positive (ever-decreasing) probability of changing even when no improvement in the objective function is achieved.  Geman and Geman (1984) is a good source for the properties of this stochastic relaxation technique.
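The following sketch (Python/NumPy) mimics the single-point-replacement annealing search just described.  The objective function S^2, the candidate grid, and the cooling constant γ are placeholders of my own, and the acceptance rule follows the form of π_j as reconstructed above; Sacks and Schiller (1987) should be consulted for the exact algorithm.

    import numpy as np

    rng = np.random.default_rng(1)
    candidates = [(x, y) for x in range(10) for y in range(10)]   # candidate sites (assumed)

    def S2(design):
        """Placeholder objective: mean squared distance from candidates to the
        nearest design point (a stand-in for the average prediction variance)."""
        pts = np.array(design)
        cand = np.array(candidates)
        d2 = ((cand[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2)
        return d2.min(axis=1).mean()

    def anneal(n_sites=5, n_iter=2000, gamma=1.0):
        V = [candidates[i] for i in rng.choice(len(candidates), n_sites, replace=False)]
        for j in range(1, n_iter + 1):
            Vnew = V.copy()
            Vnew[rng.integers(n_sites)] = candidates[rng.integers(len(candidates))]
            delta = S2(Vnew) - S2(V)
            # Accept if improved; otherwise accept with probability exp(-log(j)*delta/gamma),
            # which decreases as the iteration count grows.
            if delta < 0 or rng.random() < np.exp(-np.log(j + 1) * delta / gamma):
                V = Vnew
        return V, S2(V)

    print(anneal())

A greedy one-point-at-a-time design of the kind discussed above could be computed the same way and used as the suggested yardstick for stopping.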

     But the question remains:  at what stage should the algorithm be stopped?  I suggest that the iterated optimal S_n^2 obtained from (3.6) could be used as a yardstick below which the quantity S_n^2(V^{(j)}) must fall.  Other two-at-a-time, etc., iterated designs that have rather straightforward computational solutions could also be used as yardsticks.

     Finally, the question of how many sites should be included in the network needs to be addressed.  As always, there are conflicting forces at work here; too many sites and the monitoring program will bankrupt itself, too few sites and the information is not adequate enough to allow definitive conclusions.  One way to approach this is to take a predictor and calculate its average mean-squared error (3.4) under a special case of the model (3.1) where the data Z(s_1), ..., Z(s_n) are independent, and under the model (3.1) where the data Z(s_1), ..., Z(s_n) are dependent through the error random field δ(·).  Call the former S_{n,e}^2 and the latter S_{n,δ}^2.  Then the effective number of observations, n_eff, is obtained by solving

    S_{n_eff, δ}^2 = S_{n, e}^2.                                    (3.9)

Thus the network designer could think about having "n_eff" independent observations and make decisions about precision and cost as if the data were independent.  When an "n_eff" has been decided, it can be converted back to the proper "currency" using (3.9).

     A simple example would be to use the predictor \bar{Z} to construct (3.9).  Suppose there is an s_0 of particular interest.  Then the above considerations lead to

    n_eff = n \Big\{ \sum_{i=1}^{n} \sum_{j=1}^{n} \mathrm{cov}(Z(s_i), Z(s_j))/n \; - \; 2 \sum_{i=1}^{n} \mathrm{cov}(Z(s_i), Z(s_0)) \Big\}.

      It would  be dangerous  to  demand too much
 from  this rather heuristic  approach,  since  "neff"
very much depends on the  criterion chosen in
 (3.9).  In other words, an  "neff"  constructed
according to one criterion  but  used for  a
different purpose in another criterion,  may  yield
nonsensical results.
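To illustrate the spirit of (3.9), the sketch below computes the average mean-squared error of the sample-mean predictor of Z(s_0) under an assumed exponential covariance and equates it to the corresponding error for independent observations.  The covariance model, the unit variance, and the extrapolative choice of s_0 are my own assumptions, chosen only to make the heuristic concrete.

    import numpy as np

    # Assumed setup: n data locations on a transect, one target location s0 outside
    # the data range, an assumed exponential covariance, and unit variance.
    s = np.linspace(0.0, 9.0, 10)
    s0 = 20.0
    cov = lambda a, b: np.exp(-np.abs(a - b) / 2.0)     # assumed cov(Z(a), Z(b))
    n = len(s)

    # MSE of the sample mean as a predictor of Z(s0) under the dependence model.
    C = cov(s[:, None], s[None, :])
    c0 = cov(s, s0)
    mse_dep = C.sum() / n**2 - 2.0 * c0.sum() / n + 1.0   # var(Z(s0)) = 1 assumed

    # Under independence with m observations the same quantity is 1/m + 1,
    # so the effective number of observations solving (3.9) here is:
    n_eff = 1.0 / (mse_dep - 1.0)
    print(n, round(mse_dep, 3), round(n_eff, 2))
    # Note: if s0 sits well inside the data, mse_dep can fall below 1 and the
    # heuristic breaks down -- exactly the danger noted in the text above.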
Acknowledgement

     This research was supported by the National Science Foundation under grant number DMS-8703083.
References

Agterberg, F. P. (1984).  Trend surface analysis, in Spatial Statistics and Models, eds. G. L. Gaile and C. J. Willmott.  Reidel, Dordrecht, 147-171.

Box, G. E. P., and Jenkins, G. M. (1970).  Time Series Analysis, Forecasting and Control.  Holden-Day, San Francisco.

Cressie, N. (1978).  Estimation of the integral of a stochastic process.  Bulletin of the Australian Mathematical Society, 18, 83-93.

Cressie, N. (1985).  Fitting variogram models by weighted least squares.  Journal of the International Association for Mathematical Geology, 17, 563-586.

-------
Cressie, N. (1986).  Kriging nonstationary data.  Journal of the American Statistical Association, 81, 625-634.

Cressie, N. (1987a).  A nonparametric view of generalized covariances for kriging.  Mathematical Geology, 19, 425-449.

Cressie, N. (1987b).  Spatial prediction and ordinary kriging.  Proceedings of the Mathematical Geologists of the United States Meeting (Redwood City, CA, April 1987), forthcoming.

Cressie, N. (1988).  Variogram, in Encyclopedia of Statistical Sciences, eds. S. Kotz and N. L. Johnson.  Wiley, New York, forthcoming.

Cressie, N., and Hawkins, D. M. (1980).  Robust estimation of the variogram: I.  Journal of the International Association for Mathematical Geology, 12, 115-125.

Cressie, N., and Read, T. R. C. (1986).  Spatial data analysis of regional counts.  Statistical Laboratory Preprint No. 86-46, Iowa State University, Ames.

Geman, S., and Geman, D. (1984).  Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images.  IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721-741.

Kitanidis, P. K. (1983).  Statistical estimation of polynomial generalized covariance functions and hydrological applications.  Water Resources Research, 19, 909-921.

Kolmogorov, A. N. (1941).  Interpolation and extrapolation of stationary random sequences.  Izvestiia Akademii Nauk SSSR, Seriia Matematicheskaia, 5, no. 3.

Laslett, G. M., McBratney, A. B., Pahl, P. J., and Hutchinson, M. F. (1987).  Comparison of several spatial prediction methods for soil pH.  Journal of Soil Science, 38, 325-341.

Matheron, G. (1963).  Principles of geostatistics.  Economic Geology, 58, 1246-1266.

Matheron, G. (1969).  Le Krigeage Universel.  Cahiers du Centre de Morphologie Mathematique, no. 1, Fontainebleau, France.

Peters, N. E., and Bonelli, J. E. (1982).  Chemical Composition of Bulk Precipitation in the North-Central and Northeastern United States, December 1980 through February 1981.  U.S. Geological Survey Circular 874.

Sacks, J., and Schiller, S. (1987).  Spatial designs, in Proceedings of the Fourth Purdue Symposium on Decision Theory and Related Topics, eds. S. S. Gupta and J. O. Berger.  Springer, New York, forthcoming.

Tukey, J. W. (1977).  Exploratory Data Analysis.  Addison-Wesley, Reading.

Wiener, N. (1949).  Extrapolation, Interpolation and Smoothing of Stationary Time Series.  MIT Press, Cambridge.

Wu, C.-F. J., and Wynn, H. P. (1978).  The convergence of general step-length algorithms for regular optimum design criteria.  Annals of Statistics, 6, 1273-1285.

-------
                    SPATIAL AUTOCORRELATION:  IMPLICATIONS FOR SAMPLING AND ESTIMATION
                                             Evan  J.  Englund
              U.S. Environmental Protection Agency, P.O.  Box  93478,  Las Vegas,  NV 89193-3478
ABSTRACT
   Spatial autocorrelation is a normal result of
the physical and chemical processes which operate
in the environment.  All measurable environmental
parameters will exhibit spatial autocorrelation
at some scale.  This not only causes technical
problems in sampling and estimation, but leads to
more fundamental problems in communication.  Terms
such as "chemical concentration", "representative
sample", and "frequency distribution" which are
commonly used and well understood in laboratory
situations, can become essentially meaningless
when applied to environmental measurements with-
out an explicit statement of the spatial scale
(support) being considered.
   A simulated spatially autocorrelated distribu-
tion is used to illustrate the changes in concen-
tration, frequency distribution, and sample qual-
ity associated with changes in support.  Vario-
grams computed from the simulated data illustrate
the relationship between spatial variability and
standard QA/QC.  Practical suggestions are made
for sampling and estimation of spatially autocor-
related sites.

INTRODUCTION

   The purpose of environmental sampling programs,
like other samplings, is to use the information
obtained from the sample to make inferences about
the larger population from which the sample is
drawn.  For example, an industrial waste lagoon
may be sampled to determine whether its mean con-
centration exceeds an allowable maximum value for
a particular chemical.  In other cases, sampling
may be done to determine whether the level of pol-
lution has increased or decreased from previous
levels at a site, or to identify the locations of
polluted areas.  A typical sequence of events in
an environmental investigation might include the
design of a sampling plan, collection of samples,
laboratory chemical analysis, interpretation of
data, and finally, a decision based on the inter-
pretation.
   Decisions made by various federal, state, and
local agencies may require remedial, preventative,
or punitive measures which may have substantial
consequences in terms of human health as well as
economics.  It is obviously important that envi-
ronmental investigations be conducted in such a
manner as to ensure that decisions  are based on
the best possible information.  It  should be
emphasized here that the information on which
decisions are made is not the sample data itself,
but the interpretation of the data; that is, the
estimates of the characteristics of the larger
population which are made from the  data.
   Unfortunately, the interpretation of field
data is made difficult by the nature of the
environment itself, which is not always amenable
to the methods of sampling and data analysis
designed for use under more controlled conditions
such as laboratory experiments.  Although the
same sampling and data analysis terminology is
used in both the laboratory and the field, the
precise meanings of many terms are  significantly
different.  The resulting ambiguities can lead to
problems in both the design of sampling programs,
and the interpretation of the results.  This
paper will explore the causes of these problems,
attempt to clarify the terminology, and offer
some practical suggestions for sampling design
and data interpretation.

SPATIAL AUTOCORRELATION

   Spatial autocorrelation is the basic cause of
the problems to be discussed.  All measurable
environmental parameters exhibit spatial (and
temporal) autocorrelation at some scale.  This
means that over some range of distances, measure-
ments will tend to be more similar to measurements
taken nearby than to measurements taken farther
away.  For practical purposes, stating that a
phenomenon is spatially autocorrelated is equiva-
lent to stating that it is not uniformly distrib-
uted, but autocorrelation is more easily quanti-
fied in statistical terms.  The term spatial
correlation is often used interchangeably with
spatial autocorrelation, but the former implies
that the measured values are correlated with
their locations, as temperature, for example, is
correlated with distance from the equator.
   The physical and chemical processes that con-
trol the fate and transport of chemicals in the
environment do not operate at random, although
most events include what may be considered random
processes, in the sense that they are too complex
to be predicted in detail.  Random and determin-
istic processes may operate simultaneously at
several scales to produce the phenomenon being
measured.  The measurement of precipitation (with
or without acid), provides a good illustration.
Regional weather patterns produce the conditions
necessary for rainfall over a large area.  Within
the area, individual clouds or clusters of clouds
form, apparently at random locations, due to
local fluctuations in temperature and humidity.
If one collected pairs of rain-gauge measurements
taken at various spacings, one would expect to
observe strongly correlated readings at separa-
tions much smaller than the size of the average
cloud, weaker correlation at regional-scale
separations, and essentially zero correlation for
separations larger than the size of the region.
Although contiguous rain gauges would be expected
to show the greatest correlation, it would not be
perfect, due to the fact that the two gauges may
not receive identical amounts of rain (spatial
variability), and even if they did, the readings
may not be identical (measurement error).
   After a major rainfall, for example, we might
be asked to estimate the average precipitation
over a watershed from a given set of rain gauge
readings, in order to estimate the watershed's
contribution to a flood crest.  Alternately, we
might want to know how many gauges we would need
in order to predict flooding with a specified
degree of accuracy.  The methodology commonly
known as geostatistics (Matheron, 1963; Journel
and Huijbregts, 1978) was developed in the mining
industry to deal with questions of local ore
grade estimation, and has since been shown to be

-------
generally applicable to most situations where
spatial autocorrelation is present.
   A basic assumption in geostatistical analysis
is that the spatial autocorrelation exhibited in
a set of measurements can be represented by an
underlying autocorrelation function which is
valid over the region of interest.  For many
environmental phenomena, this assumption can be
intuitively related to the controlling processes.
   The variogram is one commonly used method for
quantifying spatial autocorrelation (Fig. 1).
Experimental variograms are computed from sample
data by examining all possible data pairs, group-
ing them by distance classes, and computing a
variance for each distance class using the stand-
ard paired-sample formula.  A theoretical model is
then fitted to the experimental data.  A variogram
contains exactly the same information as a plot
of correlation coefficients for the same distance
classes (one can be transformed into the other by
inverting the plot and rescaling the y-axis).
Fig. 1.  Typical variogram plot and fitted model.
Variances are computed from paired sample differ-
ences for pairs in successive distance classes
and plotted against distance.  The fitted model
exhibits commonly observed features: a random
component or "nugget" at the y-axis intercept,
and an increase in variance with distance up to
a maximum "range" of autocorrelation.
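A brief sketch of the computation just described: all data pairs are grouped by separation distance and a semivariance (half the mean squared difference) is computed for each distance class.  The coordinates, values, and bin width below are invented for illustration and are not data from this paper.

    import numpy as np

    rng = np.random.default_rng(2)
    xy = rng.uniform(0, 100, size=(50, 2))       # assumed sample coordinates
    z = rng.normal(size=50)                      # assumed measured values

    def experimental_variogram(xy, z, bin_width=10.0, n_bins=8):
        """Group all data pairs by separation distance and compute the
        semivariance (half the mean squared difference) for each class."""
        i, j = np.triu_indices(len(z), k=1)
        d = np.linalg.norm(xy[i] - xy[j], axis=1)
        sq = (z[i] - z[j]) ** 2
        bins = np.arange(n_bins + 1) * bin_width
        which = np.digitize(d, bins) - 1
        gamma = [0.5 * sq[which == b].mean() for b in range(n_bins) if np.any(which == b)]
        centers = [bins[b] + bin_width / 2 for b in range(n_bins) if np.any(which == b)]
        return np.array(centers), np.array(gamma)

    h, g = experimental_variogram(xy, z)
    print(np.column_stack([h, g]))

A variogram model (nugget, rising portion, and range) of the kind shown in Fig. 1 would then be fitted to these binned values.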
SAMPLE SUPPORT

   When spatial autocorrelation is present, the
physical dimensions of the sample become important
considerations.  The term 'support', defined as
the "size, shape, and orientation of the physical
sample taken at a sample point" (Starks ,1986), is
used to avoid confusion with the statistical size
(number of observations) of a sample.  The support
of a soil core, for example, would be its diam-
eter and length.  In the case of a rain gauge
measurement, the support would include the diam-
eter of the orifice, and the time of accumulation.
   When the sample support changes, the statis-
tical properties of the sample set such as the
variance of samples and the sampling variance
also change.  These changes make sample support a
critical element in the design of a sampling
program.  The concept of support applies equally
to the stage of data interpretation.  The support
on which decisions will be made is rarely the
same as that of the samples.  The choice of
decision support can significantly affect the
outcome of an analysis, and should be considered
before a sampling program is undertaken.  The
idea of support and its ramifications will be
developed further in the following sections.

CHEMICAL CONCENTRATION

   The term 'chemical concentration' is meaning-
less in the absence of a specified support.
Atoms and molecules represent the smallest scale
at which elements and compounds can be said to
exist.  If samples are taken at this scale, the
true concentration of any substance within the
sample will be a discrete binary phenomenon - the
concentration will be either 100% or 0%.  Any
larger sample is made up of a mixture of discrete
components, and its true concentration will be
the sum of the weight or volume of the analyte
divided by the total weight or volume of the
sample; i.e., the average over the sample sup-
port.  Usually the entire volume of a sample is
not measured directly, but the measured concen-
tration of a subsample is used to estimate the
mean concentration of the original sample.
Great care is taken during the preparation of a
sample to ensure uniform mixing, and if necessary,
to reduce the particle size of the material so
that any subsample used for analysis will have a
true mean concentration very close to the true
mean of the entire sample.  If the mixing is
effective, neither the size of a subsample nor
its location should have a significant effect on
the outcome of the analysis...for practical
purposes, one subsample is as good as any other.
   In the field, where the area being investi-
gated cannot be uniformly mixed, the situation is
quite different.  Layers, crystals, clumps, or
other high concentrations of a substance  often
occur such that if a given sample had been taken
at a slightly different location (sometimes only
a fraction of the sample size away), a signif-
icantly different true sample concentration
would be found.  The classic example of this was
observed in placer gold deposits, where the
presence or absence of a single, minuscule gold
nugget in a sample would make the difference
between an assay indicating high-grade ore, and
one indicating waste.  This led to the interest-
ing term 'nugget effect' often being applied to
the y-axis discontinuity in variogram models.
   Like changes in location, changes in support
also result in changes in concentration.  The
true value of a point sample is 0 or 100%; the
true value of any larger volume centered on the
point is the mean concentration over the volume.
If the dimensions of a sample are increased or
decreased; if the shape of the sample is changed,
say from a sphere to a cube; or if the orienta-
tion of a non-spherical shape is changed; the
sample will contain a different set of molecules,
and probably a different true concentration.  For
any point in space which represents a potential
sample 'location', an infinite number of possible
sample supports exist centered on the point, and
an infinite number of possible true concentra-
tions which can be said to be the concentration
at that location.  Obviously, we must conclude
that any reported measurement of chemical concen-
tration in the environment is essentially meaning-

-------
less unless the support is also reported.  Like-
wise, a statement such as "remedial action will
be taken if the concentration of cadmium in soil
exceeds 500 ppm" is also meaningless unless the
support is specified.

A SIMULATED EXAMPLE

   An example based on a simple computer simula-
tion serves to illustrate the support problem.  A
blank computer screen is 'polluted' with a
'realistic-looking' pattern of pixels (Fig. 2).
The algorithm which was used to generate the pat-
tern first selected 25 points at random in the
central part of the screen.  Each of these points
was used as the center of a cluster of up to 2000
points scattered around the center at random
angles and approximately normally distributed
distances (sum of three uniformly distributed
random values).  Blank pixels were initialized
with a value of zero, and incremented by one each
time they were hit by the point generator.  A
color terminal can be used for a more effective
display than Fig. 2, because pixel values greater
than one can be represented by various color
codes.  The details of the algorithm are not cru-
cial.  Any algorithm which conditions the outcome
of a random process on a prior outcome of the
same or another random process, can be used to
generate spatially autocorrelated patterns.
Variations on the drunkard's walk (e.g., two
steps forward, three steps back) are effective.
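A compact sketch of a cluster-seeded simulation of the kind described above; the grid size, cluster spread, and random seed are assumptions standing in for the paper's unspecified settings.

    import numpy as np

    rng = np.random.default_rng(3)
    nx, ny = 320, 200                      # assumed "screen" size in pixels
    grid = np.zeros((ny, nx), dtype=int)

    # 25 cluster centers chosen at random in the central part of the screen.
    centers = np.column_stack([rng.uniform(0.25 * nx, 0.75 * nx, 25),
                               rng.uniform(0.25 * ny, 0.75 * ny, 25)])

    for cx, cy in centers:
        n_pts = rng.integers(1, 2001)                    # up to 2000 points per cluster
        angle = rng.uniform(0, 2 * np.pi, n_pts)
        # approximately normal distances: sum of three uniformly distributed values
        dist = rng.uniform(0, 20, (n_pts, 3)).sum(axis=1)
        px = np.clip((cx + dist * np.cos(angle)).astype(int), 0, nx - 1)
        py = np.clip((cy + dist * np.sin(angle)).astype(int), 0, ny - 1)
        np.add.at(grid, (py, px), 1)       # increment each pixel hit by the point generator

    print(grid.sum(), grid.max())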
  Fig. 2.  A simulated, spatially autocorrelated
  distribution of "pollutant".
   When the pattern has been generated, we have
an area subdivided into pixel-sized units, for
each of which we know the exact pollution concen-
tration.  The pixel scale of support is considered
to be smaller than any support size we would be
interested in, but since we have exact knowledge
at that scale, we can now combine pixels into any
larger support areas we choose, and compute the
exact average pollution concentrations over these
areas.
   Figs. 3-7 illustrate the results of this kind
of change in support.  In Fig. 3, the screen area
was divided into support blocks of 5x5 pixels,
and the mean pixel value was computed for each
block.  The block means were grouped into class
intervals and represented as  shaded  patterns  on
the map.  The histogram scaled by  area, and uni-
variate statistics, are also  shown for the set of
non-zero value blocks.  Figs. 4-7  repeat this
process at supports of 10x10, 20x20,  40x40, and
80x80 pixels, respectively.
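A minimal sketch of this change-of-support calculation: the pixel grid is aggregated into B x B blocks and the exact block means are computed at several supports.  The stand-in grid below is generated arbitrarily (any pixel-level field, such as the simulation sketched earlier, could be substituted); the shrinking variance of the block means with increasing support mirrors the progression in Figs. 3-7.

    import numpy as np

    def block_means(grid, B):
        """Average a (ny, nx) pixel grid over non-overlapping B x B support blocks."""
        ny, nx = grid.shape
        ny_b, nx_b = ny // B, nx // B
        trimmed = grid[:ny_b * B, :nx_b * B]
        return trimmed.reshape(ny_b, B, nx_b, B).mean(axis=(1, 3))

    # Example: summary statistics of the non-zero block means at several supports.
    grid = np.random.default_rng(4).poisson(0.4, size=(200, 320))   # stand-in pixel grid
    for B in (5, 10, 20, 40, 80):
        m = block_means(grid, B)
        nz = m[m > 0]
        print(B, nz.size, round(float(nz.mean()), 3), round(float(nz.var()), 3))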
        Fig. 3.  Map and histogram of true "concentrations" of simulated pollutant averaged over
        5x5 pixel blocks.  Darker shades on map represent higher values:  Blank (<0.4 units per
        pixel); Light shade (0.4 to 0.8); Medium shade (0.8 to 1.2); Dark shade (>1.2).

-------
Fig. 4.  Map and histogram of true "concentrations" of simulated pollutant averaged over
10x10 pixel blocks.  Shades represent the same values as in Fig. 3.


Fig. 5.  Map and histogram of true "concentrations" of simulated pollutant averaged over
20x20 pixel blocks.  Shades represent the same values as in Fig. 3.

Fig. 6.  Map and histogram of true "concentrations" of simulated pollutant averaged over
40x40 pixel blocks.  Shades represent the same values as in Fig. 3.

-------
Fig. 7.  Map and histogram of true "concentrations" of simulated pollutant averaged over
80x80 pixel blocks.  Shades represent the same values as in Fig. 3.
-------
REPRESENTATIVE SAMPLES

   'Representative'  is one of the most misused
terms in environmental sampling (Splitstone,
1987).  In this case, the effects of spatial
autocorrelation in the field combine with incon-
sistency in the use of the term 'sample'  itself,
resulting in a great deal of confusion.  A sta-
tistical sample of n observations is sometimes
said to be representative if the mean and/or
other parameters of its frequency distribution
are 'similar' to the population from which it was
drawn.  In a somewhat more abstract sense, the
distribution might be called representative if it
was properly drawn (that is, if each member of
the population had an equal chance of being in-
cluded in the sample) , even though it is  not
similar to the population.
   An environmental sample such as a soil core,
collected in the field and sent to a lab  for
analysis, is not a 'sample' in the commonly used
statistical sense.  It might be more appropri-
ately called a member of a population which has
been included in a sample of that population.  A
set of n such environmental samples, or observa-
tions, makes up a statistical sample which can be
representative of the population of possible
observations of the same support within the sam-
pled domain.  It can not, however, be directly
representative of any other support population,
except for the population mean, which is  constant
within the domain regardless of support.   In the
context of ordinary random statistics, it is
therefore difficult to ascribe any representative-
ness to an individual environmental sample other
than being a member of a representative set at
that support.
   In the case of environmental field investiga-
tions, however, the whole point of a sampling
plan is often to obtain samples (observations)
that are representative in a different sense,
namely, representative of their local areas.
This spatial kind of representativeness is also
hard to define.  Intuitively, what we are really
after is a measurement at a location that is
close to the local mean concentration, but as we
have seen, the local mean is a function of the
support.  Thus, until we define the support which
we want our  observation to represent, we can't
say anything about how good the observation is.
   Given a set of n observations within a domain,
a geometric  support neighborhood can be defined
around each  observation as the set  of all points
closer to it than to any other observation (i.e.,
a Voronoi polygon).  Voronoi polygons are an approx-
imation of the neighborhoods with which the
observations are most correlated, and give an
idea of the spatial resolution of the set of
observations.

APPLICATIONS

   In most of  the above discussion, we have dealt
with the uncertainties and ambiguities involved
in  taking and  measuring samples  of  the physical
environment, and  the inherent  limitations of re-
lating measured values back to some 'real' char-
acteristic of  the environment.   In  spite of these
problems, most  of which have not been  adequately
dealt with theoretically,  the  world marches on.
Site investigators  are still faced  with  the neces-
sity of collecting samples in the field, and
using them to make interpretations and decisions.
The remainder of this paper will focus on prac-
tical approaches to sampling and data analysis
which will help reduce ambiguity, and improve
data quality and usefulness.  We will continue to
use the simulation as an illustrative example.

SAMPLING DESIGNS

   One of the common problems facing a site
investigator is the layout of a sampling pattern.
Assume for the moment that you have a predefined
domain that must be sampled and that the number
of samples you can take has been fixed by budget
constraints.  Where should you take the samples?
Most available guidance documents such as SW-846
recommend random sampling as the general solution.
However, it has been shown (Olea, 1984; Yfantis
et al., 1987) that in the presence of spatial
autocorrelation, sampling on a systematic grid
will produce a more efficient sampling.  If
spatial autocorrelation is not present, the
regular grid will be no better or worse than a
random sample.  Because the regular grid is a
periodic sampling in space, it is obviously con-
traindicated when the presence of a spatial
periodicity in the phenomenon is suspected at a
scale near that of the proposed grid.  Fortun-
ately, this situation is not common, and the
regular grid, because of its simplicity and
effectiveness, can be used in most spatial sam-
pling programs.
   The efficiency of the regular grid is a result
of the minimization of spatially clustered data
which are duplicating each other's information.
As a practical matter, small departures from
regularity do not have major effects, so that
field crews can make offsets from the grid to
avoid obstacles without affecting the results of
a study.  This is particularly true when the
autocorrelation function exhibits a  large random
component (nugget effect) relative to the auto-
correlated component.  The larger the relative
nugget effect, the greater the tolerance area
around the regular grid nodes.
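A small sketch of laying out a square sampling grid over a rectangular domain, with optional random offsets of the kind field crews might use around obstacles; the domain size, spacing, and jitter are arbitrary assumptions, not values from the paper.

    import numpy as np

    def grid_sample(xmin, xmax, ymin, ymax, spacing, jitter=0.0, seed=0):
        """Regular square grid of sample points, optionally jittered around each node."""
        rng = np.random.default_rng(seed)
        xs = np.arange(xmin + spacing / 2, xmax, spacing)
        ys = np.arange(ymin + spacing / 2, ymax, spacing)
        gx, gy = np.meshgrid(xs, ys)
        pts = np.column_stack([gx.ravel(), gy.ravel()])
        if jitter > 0:
            pts = pts + rng.uniform(-jitter, jitter, pts.shape)
        return pts

    # Example: a 600 m x 600 m domain sampled on a 100 m grid with +/- 10 m offsets allowed.
    print(grid_sample(0, 600, 0, 600, spacing=100, jitter=10).shape)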

SAMPLE SUPPORT AND QA/QC

   To  illustrate  the potential impact of sample
support on the quality of an investigation, we
will sample  our simulation  at  two different  sup-
ports, and use the results  for interpolation of
concentrations.   The sample  locations are on the
regular grid  shown in Fig.  8.  The  first sampling
 is done  at  the  single pixel  support.  Each sample
is assigned  the true value  of  the pixel  at  that
location; we are  assuming  no sampling or analyt-
 ical error.   The  second  sampling is  done on  the
 same grid  at a  2x2 pixel  support.   In this  case
 the  true mean value  of  the  4 pixels  in  the  sample
 is  used.

-------
Fig. 8.  Location of samples taken from the simu-
lated distribution.  True values of single pixels
and 2x2 pixel blocks were taken at each point.

   Variograms computed from the two sample sets
are shown in Fig. 9.  Note the dramatic decreases
in variance at  all distances with the larger
support.  When  considering quality control, we
take particular note of  the projected y-axis
intercept or nugget, which has dropped from a
value of 0.5 to less than 0.1.  The nugget value
provides an estimate of  the total variance asso-
ciated with taking adjacent, or co-located sam-
ples, and includes small-scale spatial variability
as well as sampling, preparation, and analytical
errors.  In the present example, we introduced no
measurement errors, so all of the nugget value is
the result of spatial variability, and the change
in support has  a major effect.  If, at the other
extreme, the nugget were entirely due to sampling
and analytical  errors, it would not be reduced at
all by changing support.
Fig. 9.  Variograms computed from simulated sam-
ples on single pixel (upper curve) and 2x2 pixel
(lower curve) support.  Points are experimental
values; solid lines are subjectively fitted
models.
   The approach one takes  to improving data qual-
ity, therefore, should ideally be dictated by a
variance components analysis of the variogram
nugget value.  The largest variance component
should get the most attention.  If spatial var-
iance dominates, increase  the support by taking
larger samples or perhaps  by compositing small
clusters of samples.  If measurement error dom-
inates, look for a more accurate method.  When
either spatial or measurement variance is very
dominant and cannot practically be reduced
further, it may be possible to achieve a signif-
icant cost saving without  seriously affecting
overall data quality by going to cheaper methods
which moderately increase  the lower variance
component.  For example, if the maximum feasible
support still results in a spatial variance com-
ponent ten times greater than the measurement
variance, the measurement  variance could be
allowed to increase by a factor of five with a
resultant increase in the  total standard error of
only 17%.
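The 17% figure follows from adding the variance components; a one-line check using the ratios quoted above (a 10:1 spatial-to-measurement ratio and a factor-of-five increase):

    import numpy as np

    spatial, measurement = 10.0, 1.0                     # variance components (ratio 10:1)
    old_se = np.sqrt(spatial + measurement)
    new_se = np.sqrt(spatial + 5 * measurement)          # measurement variance allowed to x5
    print(round(100 * (new_se / old_se - 1), 1))         # -> 16.8, i.e., about 17%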

   The question of quality assurance goes beyond
quality control of sample data, to include the
adequacy of the program as a whole to achieve the
desired objectives.  In the case of a site inves-
tigation, we are particularly interested in the
quality of the spatial estimate made from the
sample data, and we must therefore look at sample
quantity and location, and the estimation or
interpolation procedures, as well as sample data
quality.  All of these factors can contribute to
the quality of the end result, and as always, the
most cost effective use of resources is to work
at reducing the largest error component.
   With our simulated example, we can compare the
effects of the sample quality on the overall
quality of the data interpretation by interpolat-
ing concentrations over the screen area from each
of the two sample sets.  Using the variogram
models from Fig. 9, kriged estimates were com-
puted for 10x10 pixel blocks.  The results are
shown in Fig. 10, with the blocks shaded accord-
ing to the same classification used in Fig. 2.
Comparison with the true values shown in Fig. 3
illustrates the overall superiority of the 2x2
sample support in defining the general pattern of
block concentrations.
   Plots of the true vs. estimated concentrations
for the two interpolations are shown in Fig. 11,
and histograms of the estimation errors in Fig.
12.  Note that even though the 2x2 pixel samples
provide better estimates, there is still rela-
tively high error which may lead to a large
number of false positives and false negatives at
most concentration action levels.  Further reduc-
tion of estimation error would require more
samples, larger support, or both.

-------
Fig. 10.  Maps  of  kriged estimates for  10x10 pixel blocks using single  pixel (left) and
2x2 (right) samples.   Compare with map  of  true  block values in Fig.  4.
 Fig. 11.  Plots of estimated vs. true block values and associated regression statistics
 for the two kriged estimates in Fig. 10.  The  left  plot is from the single  pixel case,
 the right, from the  2x2  pixel case.
  Fig. 12.  Histograms  of  estimation errors (estimated  - true) for the single  pixel case
  (left), and the 2x2 pixel  case (right).

-------
ISOPLETH MAPS
   Isopleth, or contour, maps are a commonly used
and effective  method for displaying the spatial
distribution of chemical concentration.  The use
of isopleths, however, re-introduces some of the
ambiguity discussed above regarding support.  If
we examine the kriged maps in Fig. 10, we can
easily visualize the process of generating a con-
tour line by smoothing the stepwise boundary
between two shading patterns.  This is essen-
tially the same result we would get if we assigned
the estimated block average values to the block
midpoints and ran the resulting grid through a
typical contouring algorithm.  It is also compar-
able to the limit we would approach (very expen-
sively) with kriging as the grid dimension ap-
proaches zero.  The problem arises because a
contour represents the intersections of a plane
parallel to the x-y axis and a continuous surface.
The existence of one contour line implies the
existence of all possible contour lines, and
demonstrates that we have in effect estimated
individual concentration values at every point.
But we showed earlier that point concentrations
are binary and discontinuous, and that meaningful
statements about concentration require specifying
a support.  Which support?
   Intuitively, we might like to think of the
isopleth surface as representing the result of a
moving average based on a specified support win-
dow.  While this sounds good, it doesn't work in
practice.  When we specified a 10x10 block grid,
it was easy to compute the true block means and
compare them to the estimated block means.
However, when we try to compute a true moving
average isopleth based on a 10x10 support window,
we find peaks, holes, and irregularities in the
isopleth lines occurring at much smaller scales
than the 10x10 support.  Such a situation is
self-contradictory, and it is not obvious how to
define a true isopleth, compare it with an esti-
mated one, and determine the goodness of the
estimate.
   In spite of the difficulty of defining a 'real'
isopleth, a case can be made for using isopleths
drawn from kriged block estimates as remediation
boundaries.  Given that kriging is in some sense
an optimal estimator, and the goal is to remediate
all of the area above an action level, the best
approach should be to krige every point in the
sampled domain, and remediate all of the points
estimated to be above the action level.  Contour-
ing kriged block values provides a good approxi-
mation which is computationally feasible.  The
point to remember when doing this is that the
nice, smooth contours may be primarily a regres-
sion effect, and that the underlying reality may
in fact be very erratic.  If we were to clean up
an area inside an isopleth boundary, and then
take check samples immediately around the bound-
ary, we would expect to see a large number
(approximately 30 to 50%, depending on the dis-
tribution) of samples exceed the action level,
even if the boundary is well estimated.  A more
practical approach would be to do the check
sampling along the boundary before the clean-up,
and use the additional data to better define the
boundary.
CONCLUSIONS

   Chemical compounds in the environment can be
spatially autocorrelated at scales ranging from
the molecular to global.  Autocorrelation is prac-
tically significant when it occurs at scales
relevant to the problem, that is, between the
dimensions of a sample and the dimensions of the
domain being investigated.  Under such conditions
the support, or physical size, shape, and orienta-
tion of a sample, becomes a critical factor in
the quality of an investigation.  The use of
QA/QC data for a variance components analysis of
the nugget component of a variogram model can be
very useful in selecting the most cost-effective
approach to additional sampling.
   Chemical concentration in the inhomogeneous
environment must refer to a specific support to
be meaningful.  Specifying the support associated
with an action level for enforcement or remedia-
tion removes a source of ambiguity and potential
misunderstanding.
   Spatial estimates of concentration such as
kriged blocks, and the contour maps derived from
them, are smoothed representations of reality,
and represent a support more like the sampling
grid size than the dimensions of the sample.
Site investigators should understand, and be able
to explain to the public, why many of the values
at the sample support, taken outside an isopleth
boundary, are expected to exceed the isopleth
value.

NOTICE

   The information in this document has been
funded wholly by the U.S. Environmental
Protection Agency.  It has been subjected to
Agency review and approved for publication.

REFERENCES

Journel, A. G., and Huijbregts, Ch. J., 1978.
   Mining Geostatistics.  Academic Press, London.
   600 pp.
Matheron, G., 1963.  Principles of Geostatistics.
   Economic Geology, Vol. 58, pp 1246-1266.
Olea, R. A., 1984.  Systematic Sampling of
   Spatial Functions.  Kansas Geological Survey,
   Lawrence, Kansas. 57 pp.
Splitstone, D. E., 1987.  Sampling Design:
   Some Very Practical Considerations.  (This
   volume).
Starks, T. H.,  1986.  Determination of Support in
   Soil Sampling.   Mathematical Geology, Vol. 18,
   No. 6, pp. 529-537.
Yfantis, E. A., Flatman, G. T., and Behar, J. V.,
   1987.  Efficiency of Kriging Estimation for
   Square, Triangular, and Hexagonal Grids.
   Mathematical Geology, Vol. 19, No. 3, pp.
   183-205.

-------
                                       Discussion
                                       John Warren
                           US Environmental Protection Agency
      Spatial prediction,  as illustrated
 by these two papers,  holds great promise
 in extracting large amounts of informa-
 tion from situations where extensive
 data collection is not feasible.

      Evan Englund's paper on using
 kriging in conjunction with the concept
 of sample support is most interesting
 and his discussion illustrates clearly
 the problem of defining "sample" with
 respect to the physical dimensions of
 a sample site.  The use of "sample
 support" to avoid confusion with the
 common statistical understanding of
 "sample" is to be commended and his
 definition:   "The support on which
 decisions will be made is rarely the
 same as that of the samples.   The
 choice of decision support can sig-
 nificantly affect the outcome of an
 analysis,  and should  be considered
 before a sampling program is under-
 taken." should be included in every
 Data Quality Objectives (DQO)  Program.

      The illustration of  "sample
 support" by  simulation is particularly
 useful in demonstrating how the defi-
 nition of contamination is a  function
 of the particular support chosen.   This
 simulated example should  be made part of
 the Agency's DQO training program as it
 clearly demonstrates  to managers and
 decision-makers the importance of allo-
 cating resources proportional  to the
 ratio between decision-area and sample
 support area.

      Evan's  discussion on the  choice of
 most efficient or most practicable  sam-
 pling design for sampling in a spatial
 domain is very useful  for there does not
 seem to  be a general  solution  applicable
 to  all  sites.   Each site  must  have  a
 unique sampling plan  and  although the
 Agency's guidance document, SW-846,
 endorses simple  random sampling, the
 document does  recommend a  tailored-to-
 the-site approach.  The revised edition
 of SW-846  (4th edition with an expected
 publication  date of mid-1990) will pro-
vide guidance on most  sampling  schemes
 likely to be encountered  in spatial
 statistics,  and also gives guidance  on
 the construction of optimal grid sizes
 for the effective use  of kriging.

     Noel Cressie's paper  is a nice
 summarization of spatial statistics and
 introduces some of the scientific nota-
tions essential for discussion of
kriging.  His comparison of trend sur-
 face model with the random field model
 is very useful and clearly illustrates
the differences between the two models.
     The strength of the paper lies in
the excellent discussion of spatial
design of networks, Section 3.  The
problem of adding a monitoring site to
an existing network cannot be solved
easily and only by making key assump-
tions on the intent and use of the
monitoring data can any solution be
attempted.  The concept of "average"
prediction error (equation 3.4) has
much merit as many of the Agency's
monitoring networks are intended to
measure mean increases/decreases in
pollution levels.  It seems clear that
this notion of optimal network design
can be applied to Agency networks where
non-statistical considerations in
determining actual sites are relatively
unimportant.

     There is much that needs to be done
in order to effectively bring spatial
statistics to the practical level at
EPA, and these two papers are an impor-
tant start.  There is definitely an
interest in spatial statistics as evi-
denced by the number of people present
at the spatial statistics sessions at
the annual EPA Conference on
Statistics; these two papers should
further stimulate this interest.

-------
  SAMPLING AND MODELING POLLUTANT  PLUMES:  METHODS COMBINING DIRECT MEASUREMENTS AND REMOTE SENSING DATA
                                             Ralph E. Folsom
              Research  Triangle  Institute, P.O. Box 12194, Research Triangle Park, NC  27709
 1.0  Introduction
   The goal of this paper is to describe
response or trend surface methods for
characterizing groundwater contamination plumes.
The work focuses on statistical problems
associated with mapping pollutant plumes in the
vicinity of hazardous waste (superfund) sites.
The importance of conducting low cost screening
surveys to obtain indirect measurements of
contamination is emphasized.  Two such measures
of subsurface electrical conductivity are
described.  To illustrate the utility of such
low cost screening measurements,
electromagnetics or EM is employed as a
covariate in spatial regression models fit to
Chloride and Tetrahydrofuran concentrations
recorded at a Canadian landfill.
   We also explore use of indirect measurement
surveys for improving the efficiency of well-
site sampling plans.  A stochastic regression
model provides a flexible theoretical framework
for deriving such results.  In order to minimize
the impact of prediction biases that result from
model misspecification, probability sampling and
estimation methods are advocated.
   In the concluding section, an extended
stochastic coefficient model that provides for
spatial correlation among the well-site specific
intercepts, β_{0i}, is considered.  This adaptation
establishes a class of spatial regression models
encompassing the various linear prediction
methods that are collectively referred to as
kriging.
2.0  Response Surface Methods
   To set the stage for describing trend surface
fitting procedures, a data collection design for
a typical hazardous waste site will be
formulated.  First consider a two dimensional
target region, R, symbolizing a specified
surface area around a hazardous waste site.
Within this region R we let U denote a finite
set of potential well sites for sampling ground
water.  These potential well sites can be
envisioned as a dense grid of N points or
coordinate pairs w_k = (x_k, y_k) blanketing the
target region R.  By restricting the set of
potential well sites to a finite subset of the
points in R, we can draw upon the extensively
developed methodology of probability sampling.
This methodology provides a wide array of
randomized site selection plans that can be
employed to specify a sample, s, of n well sites.                      (1)

-------
where β is the associated column vector of (p+1)
unknown polynomial regression coefficients.  The
feature that will distinguish our response
surface models from the spatial regression
models employed in kriging is the lack of
spatial correlation between neighboring z_k
values.
   Since spatial correlation is induced by
microstructure in the concentration surface that
is not adequately accounted for by the
polynomial mean function in equation (1), having
a good indirect measurement C_k to include in X_k
will substantially reduce the spatial
correlation between residuals u_k = (z_k - X_k β) and
u_l.  The results developed subsequently in this
paper are derived from models where the
covariance is zero between residuals u_k and u_l
that are from different well sites.  In the
final section extensions of our models are
proposed that allow for spatial correlation.
These models lead to stochastic coefficient
generalizations of 'Universal Kriging' methods.
Due to the importance of having a good covariate
if simple response surface methods are to be
successful, we will devote space in the
following section to elaborate on the properties
of two such indirect measurements,
Electromagnetics and Resistivity.
2.2  Electromagnetics and Resistivity
   Electromagnetics devices measure the
electrical conductivity of subsurface soil,
rock, and groundwater.  The specific conductance
of the ground water and contaminating pore
fluids usually dominates the measurement.  The
EM measurement responds to significant increases
in chloride, sodium, sulfate, potassium,
calcium, and magnesium.  Variation in these
species reflects the contaminant variation at
parts per million (ppm) and parts per billion
(ppb) levels.  EM measurements are made by
induction with no ground contact required.  The
depth of the conic shaped subsurface volume that
is assessed by a single EM reading depends on
the diameter of the coil that induces the
electrical field.  The average depths of the
conic volumes scanned by EM devices are roughly
1.5, 6, 15, 30 and 60 meters.  The two deepest
reading devices are vehicle mounted whereas the
three shallow scanning instruments are hand
held.  These EM devices are ideally suited to
generating conductivity profiles along lateral
transects targeted at one of the six available
depths.
   A second widely used screening device
measures subsurface electrical resistivity, or
the inverse of conductivity.  The Resistivity
measuring instrument injects an electrical
current into the ground at a pair of electrodes.
Like EM, Resistivity measures an average
response for a subsurface volume ranging from
the surface to the effective depth.  The nominal
response depth for a particular Resistivity
sounding is approximately the electrode spacing.
Since the pair of electrodes can in principle be
separated by any predesignated distance, the
range of possible reading depths is unlimited.
Noting that Resistivity measurements require
that a pair of electrodes be placed in contact
with the ground, it is clear that such
measurements are slower to obtain than EM
readings.  For this reason, applications of
Resistivity are generally restricted to depth
sounding at a limited number of sites.  Due to
its speed of application for lateral coverage,
the EM device is more suitable for the blanket
coverage required of our low cost screening
survey.  In the following section, a data base
providing linked EM survey measurements and lab
recorded contaminant concentrations is explored.
This data base was developed to monitor the
magnitude and extent of groundwater
contamination around the Gloucester, Ontario
landfill.
2.3  The Gloucester Landfill Data
   The Gloucester landfill is located six miles
south of Ottawa, Ontario.  From 1956 to 1973,
the landfill was the site of domestic waste
disposal.  From 1969 to 1980, a special site
within the landfill was used to dispose of
hazardous university, hospital, and government
wastes.  By 1985, approximately 60 multiple
level and 200 standpipe monitoring wells had
been installed.  Extensive EM surveys were
conducted obtaining indirect conductivity
measurements at the six and fifty meter depths.
To demonstrate the utility of the EM
measurement, homogeneous polynomial models were
fitted to 1983 data from 44 wells and to 1985
data from 21 wells.
   A degree-p homogeneous polynomial in the x and y spatial coordinates includes all regressors of the form

    x^r y^c

such that the polynomial exponents r and c take values zero through p and r + c <= p. Restricting attention to such polynomials is advisable since the associated regression equations produce the same predicted values independent of an arbitrary rigid rotation of the x and y coordinate axes. Since the positioning of x and y coordinate axes over the target region of a hazardous waste site is generally arbitrary, models whose predictions

    ẑ(x_k, y_k)

change with the positioning of the coordinate axes should be avoided.
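As a concrete illustration of the regressor set just described, the following minimal sketch (Python with numpy; the function name and example coordinates are ours, not part of the paper) builds the design matrix of a degree-2 homogeneous polynomial in x and y. The depth and covariate terms used in the actual fits would be appended as additional columns in the same way.

    import numpy as np

    def homogeneous_design(x, y, degree=2):
        """Regressors of a degree-p homogeneous polynomial in (x, y):
        all terms x**r * y**c with r + c <= degree, including the intercept."""
        cols, names = [], []
        for total in range(degree + 1):
            for r in range(total + 1):
                c = total - r
                cols.append(x**r * y**c)
                names.append(f"x^{r} y^{c}")
        return np.column_stack(cols), names

    # Hypothetical well coordinates (illustration only, not the Gloucester data).
    rng = np.random.default_rng(0)
    x, y = rng.uniform(0, 3100, size=(2, 44))
    X, names = homogeneous_design(x, y, degree=2)
    print(names)   # ['x^0 y^0', 'x^0 y^1', 'x^1 y^0', 'x^0 y^2', 'x^1 y^1', 'x^2 y^0']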
   When one is forced by a sparsity of observations at a uniform depth to simultaneously fit concentrations from varying depths, then polynomials that involve the recorded depth d_k are required. This has been the case with the Gloucester data. Since the position of the depth axis is nonarbitrary, models need not be fully homogeneous or invariant under rigid rotations in three dimensions. The covariate measurements c_k are also linked to the potential well sites w_k in an unambiguous fashion that is not altered by repositioning the x and y coordinates.
   Table 1 displays a fully homogeneous degree-2 polynomial fit to the natural log of 1983 chloride concentrations recorded at 185 well-site by depth combinations. The average log concentration for the 185 measurements is 2.63, which yields 13.87 mg/L on the untransformed scale. With four dimensions among the predictors, the homogeneous degree-2 polynomial has an intercept and fourteen regressors. The multiple R² for this model was R² = 0.63.
   The pure error mean square is based on the presence of replicated water samples and associated readings for selected well by depth combinations. The test for lack-of-fit is highly significant. While it is possible that a higher order polynomial would have achieved a nonsignificant lack of fit, we did not attempt any higher order models. Our objective in fitting such models was to demonstrate the utility of the EM covariate (COND50) for a number of different pollutants. To expedite this demonstration, we limited our analysis to the same degree-2 polynomial for all of the pollutant outcomes. The 50 meter EM variate is clearly an important predictor of the 1983 chloride concentrations. Figures 1 through 3 are included to illustrate a common problem with polynomial regression predictors. Figure 1 presents two so-called bubble plots of the chloride concentrations observed at 28 wells. The square figures represent 600 meter by 600 meter square regions overlaying the hazardous waste site. The variable size dots denote the size category of the maximum chloride concentration observed at the indicated location and over the designated depth range. Note from figure 1 that there are only two well locations at x-axis values in excess of 3,000 meters. We have also observed that the 50 meter (EM) conductivity values beyond 3,000 meters on the x axis are all relatively small. In spite of these relatively small values of the EM covariate beyond 3,000 meters, figure 2 shows that the polynomial equation from table 1 predicts a substantial chloride peak near the center of the 3,100 meter boundary. Figure 3 is a three-dimensional spider-web plot of the predicted values for maximum chloride concentration in the 15 to 25 meter depth range.
   This series of figures illustrates the risk of extending one's mapping region beyond the area where water samples have been analyzed for contaminant concentrations. Another way of addressing this problem is to ensure that wells have been located on or near the map's boundaries and at sufficient density within the region to guarantee a reasonable approximation to the surface's major features. In the following sections, we develop some alternative well location strategies based on different stochastic models for the pollution process.
   Before turning to the well site sampling problem, it is instructive to demonstrate that in addition to correlating well with highly ionic contaminants like chloride, the EM electrical conductance measure can also predict the dispersion of organic contaminants. Table 2 summarizes the fit of our degree-2 model to a set of 41 log concentrations of 1985 Tetrahydrofuran (THF). These concentration measurements were taken at 19 wells and were associated with depths ranging from four to six meters. Even within this narrow depth range, it is clear that the associated variable D is an important predictor of THF concentrations. The six meter EM (COND6) variable is also a strong predictor of THF. Note that in this case the test for lack of fit is nonsignificant. With only nine degrees of freedom for estimating the within site sampling and measurement error variance (PURE ERROR), this nonsignificant result may well be a reflection of poor test power.
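The lack-of-fit comparisons cited above can be recomputed directly from the sums of squares reported in Tables 1 and 2. The following minimal sketch (Python with scipy; the function name is ours) forms the usual F ratio of the lack-of-fit mean square to the pure-error mean square.

    from scipy import stats

    def lack_of_fit_test(ss_lof, df_lof, ss_pe, df_pe):
        """Lack-of-fit F test: compare the lack-of-fit mean square against the
        pure-error mean square estimated from replicated water samples."""
        ms_lof, ms_pe = ss_lof / df_lof, ss_pe / df_pe
        f = ms_lof / ms_pe
        p = stats.f.sf(f, df_lof, df_pe)
        return f, p

    # Values from Table 1 (1983 log-chloride fit):
    print(lack_of_fit_test(141.12041, 137, 9.02196057, 33))   # F ~ 3.77, p < 0.0001
    # Values from Table 2 (1985 log-THF fit):
    print(lack_of_fit_test(4.64649597, 17, 1.96751001, 9))    # F ~ 1.25, p ~ 0.38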
3.0 Model Based Purposive Sampling
   To define statistical optimality criteria for alternative well samples, we need to specify the variance and covariance function for our stochastic concentration measurements z_k. Recall that the mean function has the general linear form

    E(z_k) = X_k β

where X_k denotes a (p+1) element row vector containing a leading 1 and p polynomial functions of the spatial coordinates (x_k, y_k) and the covariate c_k. In our original formulation of this problem, the subscripts-k were linked to potential well sites w_k corresponding to points (x_k, y_k) blanketing the region R that was to be mapped. Recall that the dense grid of N potential well sites where covariate measurements c_k had been obtained was denoted by Q.
   Implicit in this design formulation is the notion that all wells will be constructed to sample at the same set of depths prespecified on the basis of geophysical and hydrogeological considerations. To account for the possibility of having multiple depth targeted covariates like our six meter and fifty meter EM measurements, we can allow X_k to include the full vector of covariates, say c_k, so that the mean function has the same linear form for each depth level-d; that is,

    E(z_kd) = X_k β_d

for each of the prespecified depths-d.
   Given a common linear form for the mean function at each depth-d and the further assumption that the z_kd are independent with common variance σ², one observes that there is a single sample-s of n potential well sites chosen from Q that is simultaneously optimal for mapping pollutant concentrations z at all depths-d. To specify an optimum sample-s, we require an optimality criterion. Under our response surface model, the best linear unbiased estimator for a future value of z_kd is

    ẑ_skd = X_k b_sd

where b_sd is the ordinary least squares (OLS) estimator for the vector of depth specific polynomial regression coefficients β_d. The variance of this best linear predictor is

    Var(ẑ_skd) = σ² X_k W_s^{-1} X_k^T                              (2)

where W_s^{-1} is the inverse of the left-hand-side matrix in the normal equations for estimating b_sd, and the superscript T denotes matrix transposition. In response surface parlance, W_s is the incidence matrix of the design-s. If X_s denotes the n by (p+1) matrix whose rows correspond to the n vectors X_k associated with the sample well sites-s, then the sample incidence matrix is

    W_s = X_s^T X_s.

If we further let the measurements z_kd denote a
set of future values for the pollutant concentrations at depth-d, then the squared euclidean distance between the vector z_Q(d) of N future values and our vector ẑ_Q(sd) of best linear predictors can be depicted as follows:

    L_Q(sd) = Σ_{k=1}^{N} (z_kd - ẑ_skd)²                            (3)

We will define the optimum sample-s as that particular sample of size n that minimizes the expected squared distance between the predicted surface ẑ_Q(sd) and a future independent realization of the contamination process z_Q(d).
   With L_Q(sd) in equation (3) representing our loss function, the associated expected loss or risk function is

    R_Q(sd) = Σ_{k=1}^{N} E(z_kd - ẑ_skd)².                          (4)

This risk function is proportional to the average mean-squared prediction error where the averaging extends over all potential well sites w_k belonging to Q; that is, over all N points where the z_d surface is to be mapped. The independent residuals assumption of our Response Surface model leads to the risk function

    R_Q(sd) = Nσ² + Σ_{k=1}^{N} v_sk                                 (5)

where v_sk denotes the prediction variance in equation (2). Now, one observes that the sample-s that minimizes (5) must also minimize

    Σ_{k=1}^{N} X_k W_s^{-1} X_k^T = tr(W_Q W_s^{-1})                (6)

where W_Q is the universe level incidence matrix based on all N potential well sites and tr( ) is the trace operator. An equivalent criterion that is proportional to (6) replaces W_Q and W_s by the associated mean square and cross product matrices W̄_Q ≡ (W_Q ÷ N) and W̄_s ≡ (W_s ÷ n). In terms of these matrices, the criterion in equation (6) reduces to

    Σ_{k=1}^{N} v_sk ∝ (N ÷ n) tr(W̄_Q W̄_s^{-1}).                    (7)
To simplify this criterion further, one can perform an orthonormal transformation on the X_k vectors so that at the universe level the transformed mean incidence matrix is the identity matrix. Subsequent to this transformation, the optimality criterion simplifies to

    Σ_{k=1}^{N} v_sk ∝ (N ÷ n) tr(W̄_s^{-1}).                        (8)

If we assume further that the potential well sites w_k in Q were positioned at the nodes of a regular square grid, then the original x_k and y_k well coordinates are orthogonal to each other. In this case the transformed coordinates are simply the standardized versions, say x̃_k and ỹ_k, formed by subtracting their respective universe-Q level means and dividing the resulting deviations by the associated root mean squared deviations.
   To illustrate the nature of samples that minimize our risk function, consider a simple degree-1 polynomial function. A response surface model of degree-1 and a mapping universe Q of points (x_k, y_k) positioned on a regular square grid yields the following orthonormalized vector of linear predictors

    X̃_k = (1, x̃_k, ỹ_k, c̃_k)

where the transformed x̃_k and ỹ_k coordinates are the standardized versions described above and c̃_k denotes the orthonormal version of the covariate c_k. The transformed covariate has the form

    c̃_k = (c_k - ĉ_k) ÷ a_Q                                         (9)

where ĉ_k is the Q level ordinary least squares predictor for c_k based on the first three elements of X̃_k. The divisor a_Q is the square root of the average squared residual (c_k - ĉ_k)².
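A minimal sketch of this transformation is given below (Python with numpy; the function and variable names are ours). It standardizes the grid coordinates and replaces the covariate by its rescaled residual from the degree-1 spatial fit; on a regular, centered grid the resulting universe level mean incidence matrix is approximately the identity.

    import numpy as np

    def orthonormalized_predictors(x, y, c):
        """Build X~_k = (1, x~, y~, c~) for the degree-1 example: x~ and y~ are
        standardized to mean 0 and RMS 1; c~ is the covariate residual from its
        OLS fit on (1, x~, y~), rescaled so its average square is 1."""
        def standardize(v):
            d = v - v.mean()
            return d / np.sqrt(np.mean(d**2))
        xt, yt = standardize(x), standardize(y)
        A = np.column_stack([np.ones_like(x), xt, yt])
        c_hat = A @ np.linalg.lstsq(A, c, rcond=None)[0]   # universe-level OLS predictor of c
        resid = c - c_hat
        ct = resid / np.sqrt(np.mean(resid**2))
        return np.column_stack([np.ones_like(x), xt, yt, ct])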

   Returning to the optimality criterion in equation (8), one observes that for samples-s that are orthogonal in the sense that W̄_s is diagonal, the risk function is proportional to

    tr(W̄_s^{-1}) = 1 + a_x^{-2} + a_y^{-2} + a_c^{-2}               (10)

where the a² quantities in equation (10) are the sample averages of the squared variates x̃_k², ỹ_k², and c̃_k². For such orthogonal samples, it is clear that our risk function is minimized by choosing points w_k with the largest values of x̃_k², ỹ_k², and c̃_k². Such points lie on or near the spatial boundary of Q where large positive or negative residuals c̃_k are observed. Note that these are points where the degree-1 spatial polynomial predicts the covariate poorly.
   Of course, sample orthogonality is not a prerequisite for minimizing tr(W̄_s^{-1}). Focusing on the subset of orthogonal samples was merely an expedient used to obtain a simple characterization of designs that minimize tr(W̄_s^{-1}). Various iterative computer algorithms have been devised to find near optimal samples based on minimizing criteria like equations (7) and (8). Mitchell (1974) presents one such algorithm.
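To make the search concrete, here is a minimal greedy exchange sketch in the spirit of such algorithms (Python with numpy). It is not Mitchell's (1974) procedure; the function name, the random start, and the stopping rule are ours, and a real application would guard against singular starting designs more carefully.

    import numpy as np

    def exchange_design(X, n, sweeps=20):
        """Greedy exchange: choose n of the N candidate rows of X (the
        universe-level predictor vectors) to reduce tr(W_Q W_s^{-1}), with
        W_Q = X'X over all candidates and W_s = X_s'X_s over the sample.
        Assumes n >= number of columns and a nonsingular starting design."""
        N, p = X.shape
        WQ = X.T @ X
        s = list(np.random.default_rng(1).choice(N, n, replace=False))

        def crit(idx):
            Ws = X[idx].T @ X[idx]
            return np.trace(WQ @ np.linalg.inv(Ws))

        best = crit(s)
        for _ in range(sweeps):
            improved = False
            for pos in range(n):
                for cand in range(N):
                    if cand in s:
                        continue
                    trial = s.copy()
                    trial[pos] = cand
                    val = crit(trial)
                    if val < best:
                        s, best, improved = trial, val, True
            if not improved:
                break
        return s, best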
   It is instructive to note that minimizing equation (6) amounts to minimizing the average variance of the predicted values ẑ_skd over Q assuming that the form of the mean function is known. Robust response surface methods have also been developed that seek to minimize the average mean squared prediction error of equation (4) assuming that the true mean function for the process is a higher degree
polynomial than that being used to produce the ẑ predicted values. In this case, the dominant term in the risk function of equation (4) is the sum over Q of the squared prediction biases

    B_skd = E(ẑ_skd) - E(z_kd).

    β̂_sk = γ_s          for all k ∉ s                               (15)

Note that for the sample well sites (k ∈ s) the β̂_sk estimator has a second term, say η̂_sk, that estimates the vector of random effects η_k.
   The associated set of BLU predictors for the z surface have the form

    ẑ_sk = X_k β̂_sk          for all k ∈ Q.                          (16)
The stochastic coefficient model results in a lack-of-fit adjustment to the ẑ predictors for wells w_k that belong to the sample-s. When the measurement variance contribution in V_k², namely σ² ÷ m_k, is negligible relative to the lack-of-fit contribution (X_k A X_k^T), then ẑ_sk ≈ z_k for the sample well sites. When the measurement variance is not negligible, then ẑ_sk for k ∈ s has the form of a James-Stein shrinkage estimator; that is, for k ∈ s

    ẑ_sk = ω_k z_k + (1 - ω_k) X_k γ_s                               (17)

where the weight ω_k on z_k is the fraction of the variance V_k² that is accounted for by the lack-of-fit contribution, ω_k = (X_k A X_k^T) ÷ V_k².
   While the set of predictors in equation (16) are best in the sense of minimum mean squared prediction error over Q, the associated surface is discontinuous at the sample well sites. To produce an appealing graphic representation of the surface, a smoother prediction function may be desirable. For this purpose one can derive a common vector of regression coefficients to substitute for the site specific β̂_sk vectors, say β_s*, such that the set of smooth linear predictors

    ẑ*_sk = X_k β_s*

minimize the risk function in equation (4) with the ẑ*_sk predictors replacing ẑ_sk. In this instance, one minimizes the risk function R_Q(β_s*) with respect to the elements of β_s* given the sample-s. Note also that the expectation defining R_Q(β_s*) is in terms of our stochastic coefficient model.
   Pfeffermann (1984) shows that with A and σ² known, the β_s* that minimizes the average mean squared prediction error has the form:

    β_s* = [ Σ_{k=1}^{N} X_k^T X_k ÷ N ]^{-1} [ Σ_{k=1}^{N} X_k^T X_k β̂_sk ÷ N ]        (18)

where the β̂_sk are as defined in equation (15). Folsom (1985) shows that when the number of replicate water samples m_k per site is a constant m̄, β_s* can be recast in terms of the generalized least squares estimator γ_s of equation (14) and the ordinary least squares (OLS) estimator b_s. Letting R_s ≡ (X_s^T z_s) ÷ n denote the vector of mean cross-products in the right-hand sides of the OLS estimation equations, we have

    β_s* = γ_s + f W̄_Q^{-1} W̄_s (b_s - γ_s)                          (19)

where f = n ÷ N is the sampling fraction. Recall that W̄_s and W̄_Q denote the mean incidence matrices for the sample-s and the universe-Q respectively.
   Equation (19) shows that when one has wells at all potential sites and s = Q, then the minimum risk common β is the universe level OLS estimator

    β_Q = W̄_Q^{-1} R_Q                                               (20)

with R_Q ≡ (X_Q^T z_Q) ÷ N denoting the population analog of R_s. Alternatively, as the sampling fraction f = (n ÷ N) approaches zero the minimum risk common β tends to the generalized least squares (GLS) estimator γ_s. It is also interesting to note that when the sample incidence matrix W̄_s matches the universe level matrix W̄_Q then the best common β simplifies to

    β_s* = γ_s + f(b_s - γ_s).

The fact that our best common coefficient vector β_s* goes to β_Q as the sampling rate f goes to 1 suggests that an appropriate definition for the finite population β parameter is β_Q. In the following section, we propose an inverse selection probability weighted analog of the OLS estimator b_s, namely b_π, that is asymptotically design unbiased for β_Q. In addition to this desirable large sample property, we propose a sample design with inclusion probabilities π_i proportional to the residual variances V_i² dictated by a simplified form of our stochastic coefficient model. When this simplified model holds, our inverse π weighted estimator b_π is equivalent to the best small sample estimator γ_s.
4.2 Efficient Probability Sampling
   Having settled on β_Q as our finite population parameter, an ADU estimator for β_Q is

    b_π = (X_s^T Π_s^{-1} X_s)^{-1} X_s^T Π_s^{-1} z_s               (21)

where Π_s is the (n×n) diagonal matrix with diagonal elements π_si corresponding to the inclusion probabilities for the well sites w_i that belong to the sample-s. In order for b_π to be close to the optimum estimator β_s*, which approaches γ_s when the sampling rate f is small, we require a sample design with the well site selection probabilities π_si proportional to our model residual variance; that is,

    π_si = θ V_i² = θ[(X_i A X_i^T) + (σ² ÷ m̄)]                      (22)

where θ is a proportionality constant and m̄ is the common number of water samples proposed for each sample well site.
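A hedged sketch of these two ingredients follows (Python with numpy; all names are ours). The first function computes the inverse-probability weighted coefficients of equation (21); the second draws a toy with-replacement PPS sample with selection probabilities proportional to supplied variances V_i², standing in for the without-replacement design contemplated in equation (22).

    import numpy as np

    def pi_weighted_ols(X_s, z_s, pi_s):
        """Inverse-probability (pi) weighted least squares, as in equation (21):
        weights 1/pi_i make the estimator asymptotically design unbiased for the
        universe-level OLS coefficients."""
        w = 1.0 / pi_s
        Wp = (X_s * w[:, None]).T @ X_s          # pi-weighted incidence matrix
        Rp = (X_s * w[:, None]).T @ z_s          # pi-weighted cross products
        return np.linalg.solve(Wp, Rp)

    def draw_pps_sample(V2, n, rng=None):
        """Toy selection step with probabilities proportional to V_i^2.
        A with-replacement PPS draw is used here purely as a stand-in for a
        without-replacement design."""
        rng = np.random.default_rng(0) if rng is None else rng
        p = V2 / V2.sum()
        idx = rng.choice(len(V2), size=n, replace=True, p=p)
        return idx, n * p[idx]                   # expected draws n*p_i used as weights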
   The set of selection probabilities specified in equation (22) are clearly functions of the unknown model parameters A and σ².
yields new predictor vectors X̃_i with the property that the associated universe level mean incidence matrix W̄_Q reduces to I_{p+1}. In addition to its potential for rendering the stochastic coefficients independent and equivariant, this orthonormal transformation leads to an interesting formal justification for the inclusion probabilities in equation (23). For our simplified model, we can show that equation (23) provides the vector of selection probabilities π_Q ≡ (π_1, ..., π_N) that minimize the risk function

    R_Q(π) = E_s E_ζ Σ_{k=1}^{N} (z_k - ẑ_sk)²                       (24)

where E_s denotes expectation over repeated probability samples-s and E_ζ expectation over the stochastic coefficient model. The values of π_i that minimize (24) are

    π_i ∝ L_i V_i.

Recall that L_i² = X̃_i X̃_i^T and V_i² is the model variance of z_i. In the simplified model where V_i² is proportional to L_i², we note that the optimum π_i are also proportional to L_i². Therefore, under the simplified model, the well site selection probabilities that minimize the risk function of equation (24) are also the probabilities that render our ADU estimator b_π equivalent to the model-best small sample estimator γ_s.
   The probability sampling scheme outlined above has intuitive appeal in light of the model based purposive sampling results of section 3.0. Given that our well site selection probabilities are proportional to the squared length of the associated vectors X̃_i of orthonormalized predictors, we observe that the longest X̃_i vectors are associated with sites on or near the boundary of the sampling region where atypically large values of the transformed covariate are observed. Recall that the transformed covariate for site w_i is proportional to (c_i - ĉ_i), the difference between the observed covariate and its (x,y) coordinate polynomial predictor. It is clear that large (c_i - ĉ_i) residuals will occur where abrupt spikes or holes are observed in the covariate surface. Near the sampling region boundary it will be the spikes that generate large residuals and correspondingly large site selection probabilities.
   The model based purposive sample designs of section 3.0 focused exclusively on these boundary region sites where atypically large covariate values were observed. Our randomized sampling design assigns the highest selection probabilities to those points while insuring that all potential sites have a nonzero chance of selection. As indicated in section 3.0, this defining property of probability samples ensures that the inverse π weighted sample incidence matrix W̄_π equals W̄_Q on average over repeated samples and comes to closely approximate W̄_Q as the sample size becomes sufficiently large. Therefore, in the context of response surface prediction, we see that probability samples can be both efficient from the standpoint of controlling the variance of prediction errors and robust in terms of controlling the average squared prediction bias.
4.3 Design Consistent Estimation
   In this section, we consider a design consistent adaptation of the best common β as defined by equation (19). Recall that our orthogonal transformation of the predictor vectors reduces the universe level mean incidence matrix to the identity matrix. With this transformation, the best common β reduces to

    β_s* = [f R_s - (f W̄_s - I_{p+1}) γ_s].                          (25)

Examining equation (25), we observed that the two leading terms would yield an ADU estimator for β_Q = R_Q if the inverse π-weighted estimators R_π and W̄_π were substituted for f R_s and f W̄_s. Specifically, we define the design unbiased estimators

    R_π ≡ Σ_{i∈s} X_i^T z_i ÷ (N π_i)

and

    W̄_π ≡ Σ_{i∈s} X_i^T X_i ÷ (N π_i).

Making the indicated substitutions, we obtain the following ADU estimator for β_Q:

    β̂_s = [R_π - (W̄_π - I_{p+1}) γ_s].                               (26)

The estimator in (26) is clearly design unbiased in large samples since W̄_π → W̄_Q = I_{p+1} and R_π → R_Q = β_Q. The form of β̂_s in equation (26) reveals how knowledge of the population level mean incidence matrix W̄_Q = I_{p+1} is employed to adjust the unbiased estimator R_π for the observed sampling deviation (W̄_π - W̄_Q). In this form, β̂_s has the appearance of a vector valued regression estimator with coefficients drawn from the generalized least squares (GLS) estimator γ_s.
   To form the GLS estimator γ_s employed in (26), we require a sample estimator for the stochastic coefficient covariance matrix A and the measurement variance component σ². We are inclined to reduce A to a diagonal matrix and apply restricted maximum likelihood (REML) methods to obtain the sample estimators A_s and σ̂_s². In terms of these statistics, we estimate the variance of the site-i mean as

    V̂_i² = X_i A_s X_i^T + (σ̂_s² ÷ m_i).

Our GLS estimator γ_s is obtained from the associated diagonal covariance matrix. An alternative form for β̂_s emphasizes this GLS component. Note that for a general set of untransformed predictors, the two leading terms of equation (19) lead to the ADU adaptation

    β̂_s = γ_s + W̄_Q^{-1} W̄_π (b_π - γ_s).

In this form, the second term performs a bias correction when the mean function is not
adequately approximated by the specified (p+1) variable linear predictor. In this case, the correction renders β̂_s asymptotically design unbiased for β_Q, the universe level OLS coefficients.
   Recall that the original best common β estimator β_s* of equation (19) also converged to β_Q as the sample-s grew to encompass the entire population of potential sites. When our simplified variance model holds approximately with A = δ² I_{p+1} and (σ² ÷ m̄) negligible, then the inverse π-weighted estimator b_π ≡ W̄_π^{-1} R_π will closely approximate the model based GLS estimator γ_s. In this situation, the degree of correction will be slight and β̂_s will closely approximate γ_s.
   In addition to combining the efficiency of the GLS coefficient vector γ_s and the robustness of the π-weighted analog b_π, the regression estimator β̂_s makes explicit use of our knowledge of the population matrix W̄_Q. Effective use of this added information should improve response surface prediction. The following section relates an interesting connection between our simplified stochastic coefficient model and a robust covariance matrix estimator for b_π.
4.4 Robust Covariance Matrix Estimation
   In a recent paper, C.F.J. Wu (1986) recommends a weighted jackknife covariance matrix estimator that is strictly unbiased when the model assumptions hold. Wu's version of the delete-one jackknife has the added robustness property of asymptotic unbiasedness when the variance model does not hold. For this robustness property to hold requires only that the residuals be uncorrelated. Recall that the variance model motivating our π-weighted vector of regression coefficients has

    V_i² = δ² L_i².

The associated site selection probability is

    π_i = n L_i² ÷ [N(p+1)].

As applied to our π-weighted coefficient vector b_π, Wu's jackknife covariance matrix estimator cov_Wu(b_π) has the form

    cov_Wu(b_π) = [(p+1)² ÷ n] W̄_π^{-1} { Σ_{i∈s} X_i^T X_i r_i² ÷ [L_i² (p+1) ξ_i] } W̄_π^{-1}      (28)

where the r_i quantities depict the observed mean residuals

    r_i = z̄_i - X_i b_π.

The ξ_i quantity in the divisor of equation (28) is a bias correction factor applied to the squared residuals. Recall that with the X̃_i orthonormalized, the universe level mean incidence matrix W̄_Q = I_{p+1}.
   Wu's results imply that when V_i² = δ² L_i², then

    ṽ_i² = r_i² ÷ ξ_i                                                (30)

is a model unbiased estimator for V_i² and (28) is strictly model unbiased. On the other hand, when V_i² ≠ δ² L_i² the estimator in (28) is still asymptotically unbiased as long as the unobserved residuals

    u_i = z̄_i - X_i β_Q

are uncorrelated. This result follows since ṽ_i² is asymptotically unbiased.
   It is interesting to note that when the π-weighted incidence matrix W̄_π matches the universe level analog W̄_Q = I_{p+1}, then ξ_i = 1 for all sites-i and the bias correction factor applied to the residuals in equation (30) reduces to the familiar degrees-of-freedom adjustment. It is also instructive to note that in the case of pps with replacement sampling, the estimator with ξ_i = 1 reduces to Fuller's (1974) version of the Taylor Series or linearization approximation for the probability sampling covariance matrix. The ξ_i = 1 version is also analogous to Hinkley's (1977) weighted jackknife.
   Folsom's (1974) version of the linearization estimator employed the correction factor [n ÷ (n-1)]. Wu also shows that the standard bootstrap covariance estimator for b_π can be approximated by equation (28) with the bias correction factor totally removed. While all of these variations are asymptotically equivalent, Wu's simulation results suggest that the bias correction of equation (28) can be important for small samples.
   The connection between these robust covariance matrix estimators and our stochastic coefficient model follows from the definition of the site specific coefficient estimators

    b_πi = X_i^T (z̄_i - X_i b_π) ÷ V_i0²                             (31)

where V_i0² denotes the simplified model variance for site-i. As defined in equation (31), one observes that the b_πi coefficients sum to zero across sites. Given this definition of b_πi, we can recast Wu's covariance matrix estimator in terms of the associated sample covariance matrix

    S_b = Σ_{i∈s} b_πi b_πi^T ÷ n.

Employing the matrix S_b, the robust covariance matrix estimator becomes

    cov_Wu(b_π) = (p+1)² (W̄_π^{-1} S_b W̄_π^{-1}) ÷ n.                (32)

Analogous covariance matrix estimators can be specified for our GLS coefficient vector γ_s and the design consistent adaptation β̂_s = γ_s + W̄_Q^{-1} W̄_π (b_π - γ_s). We believe that these model bias corrected versions of the jackknife/linearization estimators warrant further evaluation.
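For readers who wish to experiment with resampling-based covariance estimates, the following minimal sketch (Python with numpy; all names are ours) computes an ordinary delete-one jackknife covariance matrix for the π-weighted coefficients. It is a plain unweighted jackknife offered only as a simple stand-in for Wu's weighted version in equation (28).

    import numpy as np

    def jackknife_cov(X_s, z_s, pi_s):
        """Delete-one jackknife covariance for the pi-weighted coefficient
        vector b_pi (standard jackknife formula, not Wu's weighted variant)."""
        n = len(z_s)
        def b(mask):
            w = 1.0 / pi_s[mask]
            Xw = X_s[mask] * w[:, None]
            return np.linalg.solve(Xw.T @ X_s[mask], Xw.T @ z_s[mask])
        reps = []
        for i in range(n):
            mask = np.ones(n, dtype=bool)
            mask[i] = False
            reps.append(b(mask))
        reps = np.array(reps)
        dev = reps - reps.mean(axis=0)
        return ((n - 1) / n) * dev.T @ dev    # standard delete-one jackknife covariance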
5.0 Summary Comments

   In summary, we have made a case for the utility of remote sensing data, particularly electromagnetics (EM), as a predictor of groundwater contamination levels. Stochastic coefficient models were employed to motivate interesting probability sampling alternatives to model based designs. We believe that the good large sample properties of probability sampling methods merit further examination in the moderate sample size arena. In this context, we favor coefficient estimators that are efficient when the model holds and asymptotically design unbiased when the model fails. An interesting jackknife covariance matrix estimator incorporating a model bias correction was also introduced.
   In the area of further research, we are currently exploring an extended class of stochastic coefficient models that permit the regression intercepts to exhibit spatial autocorrelation. Restricted maximum likelihood (REML) methods are being employed to estimate the covariance function parameters. Our goal is to develop efficient and robust statistical procedures that combine the best features of 'response surface' methods for mean function prediction and 'Kriging' procedures for fitting spatial covariance functions. REML estimation is attractive in this context since it combines model based efficiency in the Gaussian environment with a known robustness property in the non-Gaussian case. For linear covariance function parameters, REML is equivalent to iterated MINQUE (minimum norm quadratic unbiased estimation).
6.0 References

Folsom, R. E. (1974). National Assessment Approach to Sampling Error Estimation. Sampling Error Monograph. RTI final report prepared for the Education Commission of the States' National Assessment project.

Folsom, R. E., C. A. Clayton, F. J. Potter, and R. M. Lucas (1985). The Use of Geophysical Techniques in the Statistical Estimation of Magnitude and Extent for Subsurface Contamination at Hazardous Waste Sites. RTI final report prepared for EPA's Statistical Policy Branch.

Fuller, W. A. (1975). Regression Analysis for Sample Surveys. Sankhya C 37, 117-132.

Hinkley, D. V. (1977). "Jackknifing in Unbalanced Situations." Technometrics, 19, 285-292.

Mitchell, T. J. (1974). "An Algorithm for the Construction of 'D-Optimal' Experimental Designs." Technometrics, 16, 203-210.

Pfeffermann, D. (1984). "On Extensions of the Gauss-Markov Theorem to the Case of Stochastic Regression Coefficients." Journal of the Royal Statistical Society, 46, No. 1, 139-148.

Wu, C. F. J. (1986). "Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis." The Annals of Statistics, 14, 1261-1343.
Table 1: Degree-2 Polynomial Fit to 1983 Log-Chloride

Regression      DF   Type I SS      R-Square   F-Ratio   Prob
Linear           4   87.58111894    0.2159     24.79     0.0001
Quadratic        4   90.57004964    0.2232     25.64     0.0001
Crossproduct     6   77.41551896    0.1908     14.61     0.0001
Total Regress   14   255.56669      0.6299     20.67     0.0001

Residual        DF   SS             Mean Square   F-Ratio   Prob
Lack of Fit    137   141.12041      1.03007597    3.768     0.0001
Pure Error      33   9.02196057     0.27339274
Total Error    170   150.14237      0.88319040

Parameter       DF   Estimate         Std Dev        T-Ratio   Prob
Intercept        1   -15.85168215     4.41599408     -3.59     0.0004
X                1   0.02120588       0.009779667    2.18      0.0307
Y                1   -0.14238897      0.01579998     -9.01     0.0001
D                1   0.29552937       0.30227869     0.98      0.3296
Cond50           1   1.34988504       0.39302999     3.43      0.0007
X*X              1   -0.000043146     0.000012153    -3.55     0.0005
Y*X              1   0.000215274      0.000032573    6.61      0.0001
Y*Y              1   -0.000275732     0.000025697    -10.73    0.0001
D*X              1   0.001216132      0.000334634    3.63      0.0004
D*Y              1   0.001437620      0.000537719    2.67      0.0082
D*D              1   -0.009610226     0.007018137    -1.37     0.1727
Cond50*X         1   0.000273421      0.000723282    0.38      0.7059
Cond50*Y         1   0.003379166      0.000789312    4.28      0.0001
Cond50*D         1   0.02522552       0.01375624     1.83      0.0684
Cond50*Cond50    1   -0.06448431      0.01542040     -4.18     0.0001


Table 2. Degree-2 Polynomial Fit to 1985 Log-Tetrahydrofuran

REGRESSION      DF   TYPE I SS      R-SQUARE   F-RATIO   PROB
LINEAR           4   10.13806033    0.3761     9.96      0.0001
QUADRATIC        4   2.27126830     0.0842     2.23      0.0931
CROSSPRODUCT     6   7.93565507     0.2944     5.20      0.0012
TOTAL REGRESS   14   20.34498370    0.7547     5.71      0.0001

RESIDUAL        DF   SS             MEAN SQUARE   F-RATIO   PROB
LACK OF FIT     17   4.64649597     0.27332329    1.250     0.3781
PURE ERROR       9   1.96751001     0.21861222
TOTAL ERROR     26   6.61400598     0.25438485

PARAMETER       DF   ESTIMATE         STD DEV        T-RATIO   PROB
INTERCEPT        1   -58.56598111     27.55493712    -2.13     0.0432
X                1   0.10786243       0.06752790     1.60      0.1223
Y                1   0.21557419       0.15441818     1.40      0.1745
D                1   -8.21549669      2.26361317     -3.63     0.0012
COND6            1   18.69031289      6.05457900     3.09      0.0048
X*X              1   -0.000029699     0.000179072    -0.17     0.8696
Y*X              1   0.001158774      0.001347590    0.86      0.3977
Y*Y              1   -0.000902490     0.001604787    -0.56     0.5787
D*X              1   0.02107810       0.005249011    4.02      0.0005
D*Y              1   -0.006061689     0.01139379     -0.53     0.5992
D*D              1   0.35457561       0.13182192     2.69      0.0123
COND6*X          1   -0.04542798      0.01283282     -3.54     0.0015
COND6*Y          1   -0.03727344      0.01779248     -2.09     0.0461
COND6*D          1   0.23862751       0.13736266     1.74      0.0942
COND6*COND6      1   -0.61796587      0.23241860     -2.66     0.0132
Figure 1. Distribution of 1983 Chloride Concentration by Depth of Well
          (bubble plots; legend - Chloride (mg/L): 5-12, 12-24, 24-36, 36-57)

Figure 2. Predicted Values of 1983 Chloride for Deep Depths
          (legend - CL (mg/L): <20, 20-40, 40-60, 60-80, 80-100, >100)

Figure 3. Predicted Values of 1983 Chloride (mg/L), Maximum Concentration at 15-25 Meters
                   "VALIDATION" OF AIR POLLUTION DISPERSION MODELS
                                  Anthony D.  Thrall
    Electric Power Research Institute,  3412 Hillview Avenue,  Palo Alto,  CA 94303
ABSTRACT
   Computer simulations of the
dispersion of pollutants through the air
are used extensively to guide regulatory
decisions concerning air quality.  There
are a variety of air dispersion models,
some of which are classified by the U.S.
EPA as "preferred" for certain types of
applications, others of which are
classified as "alternatives," and the
remainder of which must prove themselves
"valid" for regulatory application
according to a specified evaluation
protocol.  The criteria, evaluation
procedures, and test results are
reviewed for the case of Gaussian plume
models of sulfur dioxide emissions from
isolated point-sources, such as coal-
fired power plants located in rural
areas.
REFERENCES
   American Meteorological Society,
1983.  Uncertainty in Air Quality
Modelling.  Report of a Workshop at
Woods Hole, MA, September 1982. AMS,
Boston, MA.
   Bowne, N.E., Londergan, R.J., Murray,
D.R. and Borenstein, H.S., 1983.
Overview, Results, and Conclusions for
the EPRI Plume Model Validation and
Development Project:  Plains Site.  EA-
3074, the Electric Power Research
Institute, Palo Alto, CA.
   Bowne, N.E., Londergan, R.J. and
Murray, D.R., 1985.  Summary of Results
and Conclusions for the EPRI Plume Model
Validation and Development Project:
Moderately Complex Terrain Site.  EA-
3755, the Electric Power Research
Institute, Palo Alto, CA.
   Bowne, N.E., 1987.  Observations and
Evaluations of Plume Models.  TRC
Environmental Consultants, Inc., East
Hartford, CT.
   Burton, C.J., Stoeckenius, T.E. and
Nordin, J.P., 1982.  Variability/
Uncertainty in Sulfur Emissions:  A
Summary of Current Knowledge on the
Effects on Ambient Standard Attainment
Demonstrations of Adopting Some Simple
Models of Sulfur Variability.
Proceedings of the Woods Hole Workshop on
Modeling Uncertainty.
   Carson, D.J.,  1986.  A Report on  the
Symposium on Uncertainty  in Modelling
Atmospheric Dispersion.  Atmospheric
Environment 20, 1047-1049.
   Dupuis, L.R. and Lipfert, F.W.,
1986.  Estimating  the  Cost of
Uncertainty in Air Quality Modeling.
EA-4707, the Electric  Power Research
Institute, Palo Alto,  CA.
   Environmental  Protection Agency,
1982a.  Air Quality Criteria for
Particulate Matter and Sulfur Oxides,
Volume I.  EPA600/8-82-029a, U.S.
Environmental Protection Agency,
Research Triangle Park, NC 27722.
   Environmental Protection Agency,
1982b.  Air Quality Criteria for
Particulate Matter and Sulfur Oxides,
Volume II.  EPA-600/8-82-029b, U.S.
Environmental Protection Agency,
Research Triangle Park, NC 27722.
   Environmental Protection Agency,
1986.  Guideline on Air Quality Models
(Revised).  EPA-450/2-78-027R, U.S.
Environmental Protection Agency.
   Fox, D.G., 1981.  Judging Air Quality
Model Performance.  Bulletin of the
American Meteorological Society 62, 599-
609.
   Hanna, S.R., 1986.  Spectra of
Concentration Fluctuations:  The Two
Time Scales of a Meandering Plume.
Atmospheric Environment 20, 1131-1137.
   Hayes, S.R. and Moore, G.E., 1986.
Air Quality Model Performance:  A
Comparative Analysis of 14 Model
Evaluation Studies.  Atmospheric
Environment 10, 1897-1911.
   Hillyer, M.J. and Burton, C.S.,
1980.  The ExEx Methods:  Incorporating
Variability in Sulfur Dioxide Emissions
Into Power Plant Impact Assessment.
Prepared for the U.S. Environmental
Protection Agency, Research Triangle
Park, NC.
   Hilst, G.R., 1978.  Plume Model
Validation.  EA-917-SY, the Electric
Power Research Institute, Palo Alto, CA.
   Moore, G.E. and Liu, M.K., 1987.
Time Series and Spectral Analyses of
Concentration and Meteorological Data
from the Kincaid and Bull Run Power
Plants.  sySAPP-87/014,  Prepared for
the Electric Power Research Institute,
Palo Alto, CA.
   Murray, D.R. and Bowne, N.E., 1988.
Urban Power Plant Plume Studies.  EA-
5468, the Electric Power Research
Institute, Palo Alto, CA.
   Smith, M.E., 1984.  Review of the
Attributes and Performance of Ten Rural
Diffusion Models.  Bulletin of the
American Meteorological Society 65.
   Taylor, J.A., Jakeman, A.J. and
Simpson, R.W., 1986.  Modeling
Distributions of Air Pollutant
Concentrations - I. Identification of
Statistical Models.  Atmospheric
Environment 20, 1781-1789.
   Thrall, A.D., Stoeckenius, T.E. and
Burton, C.S., 1985.  A Method for
Calculating Dispersion Modeling
Uncertainty Applied to the Regulation of
an Emission Source.  Prepared for the
U.S. Environmental Protection Agency,
Research Triangle Park, NC.
   Turner, D.B. and Irwin, J.S., 1982.
Extreme Value Statistics Related to
Performance of a Standard Air Quality
Simulation Model Using Data at Seven
Power Plant Sites.  Atmospheric
Environment 16, 8, 1907-1914.
   Venkatram, A., 1979.  A Note on the
Measurement and Modeling of Pollutant
Concentrations Associated with Point
Sources.  Boundary-Layer Meteorology 17.
   Wilson, D.J., Robins, A.G. and
Fackrell, J.E., 1985.  Intermittency and
Conditionally-Averaged Concentration
Fluctuation Statistics in Plumes.
Atmospheric Environment 19, 1053-1064.
                                         MODELING POLLUTANT  PLUMES
                                           Robert W.  Jernigan
                                       The  American University and
                                 Statistical   Policy  Branch,   EPA.  Washington,  DC
     The previous papers and presentations by Anthony D. Thrall and Ralph E. Folsom show two different approaches to the modeling of pollutant plumes. Thrall is mainly concerned with the modeling of isolated air pollution point sources, such as coal-fired power plants. As such, the methodology that he considers involves models of the dynamic behavior of plume development and dispersion. These models are used to evaluate the effect of various control methods that are considered to meet the standards of the Clean Air Act Amendments of 1970 and 1977. There is little to comment on in this paper. I have seen the details of some of Thrall's related work where he has been involved in developing statistical summaries of model performance and in the calibration problem of matching model calculations to observed monitoring data. Using a measure of the probability of misclassification derived from simulated data, he has developed quantitative procedures for the validation of such models.
     Folsom gives us more to examine, so my comments will center on his paper. Folsom has been developing methodologies for the modeling of groundwater pollution plumes. His modeling efforts rely on a certain stability or stationarity in the pollutant plume. His procedures are used to aid in the design of well-site sampling plans. The use of indirect measurements of contamination obtained from electromagnetics and resistivity devices is the crucial element in his mapping of plumes through response surface methods. The use of such a covariate for the contaminant provides a useful and relatively inexpensive view of the pollutant plume.
     Folsom uses a class of low order polynomial spatial regression models along with this covariate to predict pollutant concentrations. He does not, however, incorporate any measure of spatial autocorrelation in this estimation. Unfortunately, inclusion of a covariate does not necessarily explain all of the autocorrelation present. In fact, one would see substantial reduction in spatial autocorrelation only if the relationship between the measurement and its covariate was not affected by spatial properties. For example, it might happen that a measurement and its covariate are more highly correlated in areas of high rather than low pollution. An indication of such behavior can be seen in the scatterplots of EM and resistivity measurements in Folsom et al. (1985). There, a few high-side outliers often behave quite differently from the bulk of the data.
     Of course, the technique of co-kriging could incorporate both the information on the covariate and the spatial autocorrelation to model the contaminant plume. But it is easy to incorporate spatial autocorrelations and the essence of this kriging methodology into his response surface techniques without resorting to specialized kriging software, as sketched below. What is needed is a preliminary estimate of the spatial autocorrelations. These will be used in a generalized least squares algorithm to estimate the contaminant. This technique will yield predictions of measurements of the contaminant plume whose spatial autocorrelations can be calculated. These improved estimates of the spatial autocorrelations can then be used with generalized least squares to re-estimate the pollutant concentrations. As such an iterative technique continues, it incorporates spatial effects directly, and R² values in excess of the 63% obtained by Folsom could be possible.
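A minimal sketch of that iteration follows (Python with numpy). The exponential correlation function, the single range parameter, and the crude correlogram update are assumptions made purely for illustration; they are not part of the discussion above.

    import numpy as np

    def iterated_gls(X, z, coords, n_iter=5, range_init=500.0):
        """Alternate between a GLS fit of the response-surface coefficients and
        re-estimation of a spatial correlation parameter from the residuals."""
        d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
        rng_par = range_init
        beta = None
        for _ in range(n_iter):
            R = np.exp(-d / rng_par)                                  # working correlation
            Ri = np.linalg.inv(R)
            beta = np.linalg.solve(X.T @ Ri @ X, X.T @ Ri @ z)        # GLS coefficients
            resid = z - X @ beta
            # crude update of the range parameter from short-distance residual pairs
            near = (d > 0) & (d < np.median(d))
            i_idx, j_idx = np.where(near)
            rho = np.corrcoef(resid[i_idx], resid[j_idx])[0, 1]
            rho = min(max(rho, 1e-3), 0.99)
            rng_par = -np.mean(d[near]) / np.log(rho)
            # in a real application, stop when beta and rng_par stabilize
        return beta, rng_par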
     Folsom's use of a random coefficient regression model is one approach to account for the within site sampling variability as well as the laboratory analysis variability. Kriging with a nugget effect to smooth the response surface is another way to model these sources of variability.
     In all such modeling problems we often produce a map of point estimates. We must be careful not to interpret this map as truth. Each point on our map may be above or below the true pollutant response surface. Our estimation error may be quite small and still the map of the true surface can appear different from that of our estimated surface, especially in the presence of spatial autocorrelation.
     I applaud Folsom and his colleagues for considering extensions to his models to incorporate spatial autocorrelation. Perhaps by combining the existing techniques for mapping groundwater contaminants, more efficient and reliable procedures can be developed to model pollutant plumes.
                        ESTIMATING THE SPATIAL UNCERTAINTY OF INFERRED RATES OF DRY
                                             ACIDIC DEPOSITION
                                           Richard A. Livingston
                              Geology Dept., U. of Maryland, College Park, MD 20742
    The acid deposition monitoring program conducted by NAPAP concerns monitoring both wet and dry deposition.  Wet deposition or acid rain monitoring has been in operation since 1982.  The dry deposition monitoring program is now getting underway.  One of the major issues with monitoring dry deposition concerns the extrapolation of the dry deposition rate measured at a monitoring site to the entire region of interest.
    The problem is further complicated by the fact that the measurement technique involved gives only an indirect estimate of the actual flux to the ground.  Instead, an approach known as the inferential method is used.  This consists of measuring concentrations of acidic gases and particles in the atmosphere.  From the concentrations, the rate of deposition is inferred using an algorithm that incorporates several site-specific or time-dependent factors.  For example, the type of ground cover, which varies spatially, is a factor in the algorithm.  An example of a time-dependent factor would be windspeed.  These factors are measured at the actual monitoring site, but can obviously assume different values as one moves away from the site.
    The inferential method employs a simple formula for calculating dry deposition rates:

                     F = Vd C                                        (1)

where F is the dry deposition flux; C is the concentration of the acidic species in the atmosphere, and Vd is a factor known as the deposition velocity which is calculated from the algorithm described above.  The deposition velocity can range from 0.01 to 1 cm/s while the concentration of a gas such as sulfur dioxide is in the range of 1 to 50 µg/m³.
    The difficulty is that both terms on the right-hand side of Equation 1 can vary spatially.  Consequently, the dry deposition flux, F, can also vary spatially.  However, the problem can be posed in such a way that effectively decouples the variability in C from the variability in Vd.
    Differentiating equation 1 with respect to some spatial coordinate, x, gives:

                     dF/dx = Vd (dC/dx) + C (dVd/dx)                 (2)

It is generally the case that at least in rural areas, far from major point sources or urban areas, the gradient of the atmospheric pollutant concentration is very shallow on a scale of kilometers.  On the other hand, for reasons discussed below, the deposition velocity can change by an order of magnitude over a distance of a meter or less.
    Thus, we can make the assumption that:

                     dC/dx ≈ 0                                       (3)

and hence:

                     dF/dx ≈ C (dVd/dx).                             (4)

    In other words, on a local scale, the spatial variability of the dry deposition is determined primarily by the variations in the deposition velocity since the air pollution concentration can be regarded as constant.
     While this  reduces the complexity of the
 problem,  it also creates some  awkwardness since
 it  implies that the major variability is not
 with the variable that is being monitored.   Thus
 it  is not possible  to estimate the spatial un-
 certainty of dry deposition  simply from the data
 on  atmospheric  concentrations.  Instead one must
 estimate the variability of  the deposition velo-
 city,  which entails acquiring  additional infor-
 mation.
     It is now necessary to consider in more de-
 tail the factors affecting the deposition velo-
 city.   As noted above,  type  of ground cover is
 an  important factor.   The deposition  velocity
 over bare rock  will be different than that over
 vegetation.   In fact,  there  are further sub-
 divisions under each category.  For example,  a
 bare limestone  will react more readily with
 acidic gases than bare granite.
    With vegetative  ground cover,  there are ad-
 ditional  considerations.  The leaves of different
 species of plants have different types of sur-
 face chemistry  and  pores,  or stomata,  for uptake
 of  acidic gases.  Thus,  the  deposition velocity
 on  a pine tree  will differ from that  on an oak
 tree.   Aside from the surface  chemistry,  the
 physical  structure  of the plant is also impor-
 tant in determining the deposition velocity.
The  spatial  pattern of  the leaves  and  branches
affects the  wind  velocities  and hence  the depo-
sition velocities in the  forest canopy.   For  the
same reason, grasslands have a different  deposi-
tion velocity than  forested  areas.
     Not only is the type of ground cover  impor-
tant, but  also  the  spatial relationships  of one
type of ground  cover to another.   An abrupt
change  in  the height of ground cover such as  oc-
curs going from a grassy  field to  a wooded area
will create  additional turbulence  in air  flow
patterns with a resulting higher deposition
velocity.  Thus, a uniform groundcover will have
a deposition velocity different from that of  an
area where patches of one type of  ground  cover
are  intermingled with another.
   Also affecting the wind field and thus the
deposition velocity is the topography.  A hilly
area will  have a much greater variance  in wind-
speeds and directions than a flat  area.  The
orientation  to prevailing winds and to solar
radiation  can influence deposition velocities as
well because deposition velocities are  influ-
enced by surface temperature and wetness.
   Thus the problem of characterizing the spa-
tial variability of the dry deposition comes
down to being able to estimate the spatial var-
iability of all these factors that determine the
deposition velocity.  To do this,  it is neces-
 sary to assemble data and to use statistical
 methods from widely diverse disciplines.
    The  mapping of vegetation types  is  a  long  es-
 tablished part of plant  ecological  studies.   A
 variety of data bases already  exist at Federal,
 state or local levels on land  use and  on the
 types of vegetation,  at  least  for selected
 areas.  One example is the Geoecology data base
 established at Oak Ridge National Laboratory.
 It  remains to  be seen whether  these existing
 data bases have fine enough spatial resolution
 to  be applied  here.   There is  also  the question
 of  whether the sampling  procedures  were  consis-
tent enough among the different data bases to
 permit  pooling of the data.  The possibility
 also exists of creating  new data bases through
 interpretation of aerial photography.
    Ecologists  have developed a number  of statis-
 tical techniques for dealing with spatial pat-
 terns of plant species.  These  include  indices of
 species diversity,  species abundance and patchi-
ness(3).  As yet, these measures have not been
 applied to the specific  problem  of  dry deposi-
 tion.
    Similarly,  there is an abundance of topo-
 graphical maps on different scales.  However,
 these generally do not exist in  digitized form.
 Geomorphologists have developed  their  own set of
 statistical  measures  to  characterize the  physi-
 cal  landscape.   Some  relevant  indices  of  drain-
 age  basin morphometry include  drainage density,
relative relief and ruggedness number(4).
    In addition to making use of  these  existing
 statistical  tools,  it will probably be necessary
 to develop new methods.   For example,  fractal
 analysis  may prove very  useful for  character-
 izing certain  aspects of the landscape.   For
instance, a forest canopy has many of the attri-
 butes of  a fractal surface.
    Integrating all this  information will  no
 doubt require  the use of computers.  Specialized
 methods  for dealing with spatial  data  have been
 developed under the  general heading of Geograph-
 ic  Information Systems.   The rapidly growing
 field of  pattern recognition may also  provide
 some assistance.
   Ultimately,  once  all  the uncertainties in  the
 individual factors have  been estimated,  it will
 then be necessary to  make some summary state-
 ments about the overall  spatial  uncertainty of
the inferred dry deposition rates.  In essence,
 the  process  outlined  here consists  of  taking  the
 individual uncertainties in the  underlying fac-
 tors and  propagating  them through the  algorithm
 that calculates dry  deposition.   It should be
 noted that this process  yields only an estimate
of the precision of the inferred dry deposition.
The bias is impossible to determine absolutely
since there is no independent measurement tech-
 nique that could be  used as a  reference method.
 However,  it may be possible to put  some bounds
 on the  bias through the  use of mass balance
 estimates over a controlled watershed  or
 over a  specific area of  ground cover.
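    As a rough illustration of this propagation step, the sketch
below carries assumed deposition-velocity uncertainties for a few
hypothetical land-cover classes through the flux calculation by
Monte Carlo.  The classes, means, and spreads are invented for the
example; they are not values from the monitoring network.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical inputs (illustrative units): concentration with its
    # sampling uncertainty, and deposition velocities by land-cover
    # class with assumed standard deviations.
    C_mean, C_sd = 10.0, 0.5
    vd = {"forest":    (0.60, 0.30),   # (mean, sd) -- assumed values
          "grass":     (0.20, 0.08),
          "bare rock": (0.05, 0.03)}
    area_frac = {"forest": 0.5, "grass": 0.4, "bare rock": 0.1}

    n = 10000
    C = rng.normal(C_mean, C_sd, n)
    # Area-weighted deposition velocity, sampling each class independently.
    vd_draws = sum(area_frac[k] * rng.normal(m, s, n)
                   for k, (m, s) in vd.items())
    flux = vd_draws * C          # inferred dry deposition flux

    print("flux mean %.2f" % flux.mean())
    print("flux sd   %.2f  (precision only; bias is not addressed)" % flux.std())

The spread of the simulated fluxes estimates precision only, which is
the limitation noted above: the bias cannot be recovered this way.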
   Although it is not possible to state an ab-
solute value for the bias and hence the true
rate of dry deposition, the data from the
dry deposition network will still be useful for
policy-making.  That is because the critical
policy questions being asked tend to be of a
relative nature, such as:  How much greater is
the dry deposition rate over affected areas
compared to unpolluted background regions?  Or:
Is dry deposition getting greater or lesser over
time?  These questions thus often involve dif-
ferences in time or space rather than absolute
values.  If the bias is constant, it will be
removed in calculating the differences.
   At this point another statistical issue must
be dealt with.  Given the variability in the
factors underlying the deposition velocity, it
is likely that the estimated precision will also
vary spatially.  The characterization of varia-
tion over a region is a classic problem in geo-
statistics, and a number of methods have been
developed(5).  The best-known method is
kriging.
    However, simple kriging is not useful for
this purpose since it provides no insight into
the actual uncertainty at any distance from the
actual measurement points.  More advanced
techniques such as indicator or probability
kriging are now coming into use.  These methods
can generate maps that show areas where some
minimum level of detection is being exceeded(6).
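    As a concrete illustration of the indicator idea, the sketch
below simply converts point estimates into 0/1 exceedance indicators
for a chosen threshold; kriging those indicators (not shown here) is
what produces the exceedance-probability maps.  The points and the
threshold are hypothetical.

    import numpy as np

    # Hypothetical deposition estimates at monitoring points (x, y, value).
    pts = np.array([[0.0, 0.0, 4.1],
                    [1.0, 0.0, 7.9],
                    [0.0, 1.0, 5.5],
                    [1.0, 1.0, 9.2],
                    [0.5, 0.5, 6.3]])

    threshold = 6.0                                      # assumed action level
    indicator = (pts[:, 2] > threshold).astype(float)    # 1 = exceeds

    # The fraction of 1's is a crude global exceedance probability;
    # indicator kriging would interpolate the 0/1 values spatially.
    print("indicators:", indicator)
    print("overall exceedance fraction: %.2f" % indicator.mean())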
    In conclusion, the task of estimating the
spatial uncertainty of dry deposition depends
heavily on the use of auxiliary information.
This information comes from a variety of disci-
plines, each with its own statistical tools.
Integrating the information from these diverse
sources and presenting it in a meaningful way
for policymakers will be a major challenge for
the statistical community.

References
1. National Acid Precipitation Assessment
     Program (1987): Annual Report 1986. US
     Government Printing Office, Washington DC,
     pp. 75-88
2. Hicks, B., Baldocchi, D., Hosker, R., Hutchison, B.,
     Matt, D., McMillen, R., and Satterfield, L. (1985):
     "On the Use of Monitored Air Concentrations to
     Infer Dry Deposition", NOAA Technical Manual
     ERL ARL-141, Air Resources Laboratory,
     Silver Spring, MD, 65 pgs.
3. Pielou, E. (1975): Ecological Diversity.
     Wiley-Interscience, New York, 165 pgs.
4. Ritter, D. (1978): Process Geomorphology. Wm. C.
     Brown Pubs., Dubuque, Iowa, pp. 154-176.
5. Journel, A. and Huijbregts, Ch. (1978): Mining
     Geostatistics. Academic Press, London,
     pp. 303-444.
6. Journel, A. (1984): Indicator Approach to Toxic
     Chemical Sites. US EPA Environmental Moni-
     toring Systems Laboratory, Las Vegas, NV,
     30 pgs.
                ASSESSING EFFECTS ON FLUCTUATING POPULATIONS: TESTS AND DIAGNOSTICS
                                             Allan Stewart-Oaten
                          Biology Dept., University of California, Santa Barbara, Ca., 93106
    I. Introduction
    A standard question in environmental impact
assessment is whether some human alteration has
caused a decrease in some environmental variable.
    There  are  some   obvious  difficulties.  An
apparent  change   over  time  could  be  due  to
something else  occurring at  about  the same time
as  the  alteration,  or to  a general  trend over
time. A lower  value  of the variable  near  the
alteration  than far away could  be  due to  other
differences between the locations.
    One way to deal with these problems is to
have  observations  both  Before  and  After  the
alteration, at  sites both near to it (Impact)  and
far away (Control). (These capitalized words will
refer to these fixed  periods  and  sites  hence-
forth.)  The plan is to compare the difference
between Impact  and  Control  sites Before with  the
difference After,  and  to argue  that a change in
this difference cannot be the result of either  a
widespread time change or of  a difference between
the sites, so is likely to indicate an effect of
the alteration.
    There are some obvious difficulties with this
scheme too.  One is that a change in the
difference could be due to some time-related
change other than the alteration, but occurring
at about the same time and affecting one site but
not the other.  Another is that the difference may
fail to change because the alteration affects
both sites about  equally. Thus  the Control  needs
to  be close  enough  to  the Impact  site  to  be
affected similarly by  natural  changes,  but  far
enough  away   to   be   little  affected  by   the
alteration.  This important task is not the
subject of this paper, and we will assume that it
has been  accomplished  -  while  recognizing that
this  assumption  needs to  be  examined  and  can
never be guaranteed.
    A more technical problem is that many
environmental variables  fluctuate naturally over
time.  This  applies   to   physical   or   chemical
variables,   but   perhaps   especially   to   the
densities of  biological  populations,  which will
be  our  focus.  Thus   any  observed  difference
between sites can be due to any of three causes  -
sampling   error,    natural   fluctuations,    and
intrinsic differences between the sites - of
which we wish  to  distinguish  the third. This
cannot be done  with only one Before  sampling time
(or  only  one  After,  unless we  make  the   risky
assumption  that  variability due to  the   other
causes is the same After as Before), so we assume
there are many sampling times in each period.
    It is also difficult to do if the two sites
are sampled at different times.  One or other site
is likely to have been sampled at times of
generally greater abundance, so that one must
either include covariates (frequently having
little idea which ones matter or how to model
them) or  pretend  that the  sampling  times were
chosen and assigned randomly and  then the records
of them were lost.  We  will  therefore assume that
the  two  sites  are sampled  simultaneously,  or
nearly so.
    We will  thus have observations X^i,.., for 1 -
1  (Before)  or  2  (After),  j  -  1,  27  ....  n*
(sampling times In period 1), k - I (Impact) or 2
(Control), and m » 1, ..., r^ (replicates).
    In this paper, we will assume r_ijk = 1.  The
replicates give information on sampling variance
but not on fluctuation over time.  If the r_ijk's
are equal and the sampling errors are iid Normal,
with known variance, the replicates can be re-
placed by their average (a sufficient statistic)
without loss of information.  The same is true if
the sampling variance is unknown, but the time
fluctuations (or their differences between sites)
are also iid Normal.  Neither of these conditions
will usually hold,  so the replicates might show
us a better "summary" than the average, and might
contribute information  useful  for model  building
or  for   estimating  variances   in   which  both
sampling error and time fluctuations play a part.
However, this  approach  seems likely  to  be quite
complex, and may not  be worth  the trouble if the
time fluctuations are large.
    Environmental impact assessment is often cast
as a problem in statistical testing: the task is
described as one of detecting a change or a
violation of some threshold, and one of the most
frequent techniques invoked is Analysis of Vari-
ance, which is primarily a testing (rather than
an  estimation)   procedure.   Section   2   below
discusses the  testing problem,  particularly the
assumptions for a standard two-sample test on the
Impact-Control   differences,  and  ways to  adjust
the data to satisfy them, if necessary. Some data
will be presented to illustrate the effectiveness
of these ways,  and some of their shortcomings, in
Section  3.  Section  4   briefly  considers  three
topics:   gaps  in the  present testing  scheme and
directions  in   which  it  needs  to  be  extended;
weaknesses of testing as  a goal,  and an approach
to the  more  appropriate goal  of  estimation; and
some implications  for  the design  of assessment
programs.
    2.  Testing
    A natural model for the observations, X_ijk,
is
      X_ijk = m_i + a_ij + b_k + c_ik + e_ijk     (1.1)
where m_i is the overall mean for the period, a_ij
is a time effect, b_k is a site effect, c_ik is c
if i = 2 and k = 1 but is zero otherwise, and gives
the change in the site effect between periods,
presumably due to the alteration, and the e_ijk's
are iid errors.  If we believed this model, and
wanted to estimate c, we would use two-sample
test statistics derived from the differences,
D_ij = X_ij1 - X_ij2.
    This procedure remains valid under weaker
assumptions than those in (1.1).  These
assumptions cannot be guaranteed but can be
checked for plausibility against both biological
theory and the observations themselves.  They are:
    (1)  The D_1j's all have the same mean, M_1.
That is, model (1.1) does not need a time-x-site
interaction term, d_ijk, so M_1 = b_1 - b_2.  If there is
no effect of the alteration, the D_2j's also all
have this mean.  If the alteration does have an
effect, the model says that the D_2j's all have
mean M_2 = b_1 - b_2 + c, but the test would not be
invalidated if c varied over time, though it may
well be inefficient.
    (2)  The errors, E_ij = e_ij1 - e_ij2, are in-
dependent.
    (3)  The E_1j's are identically distributed; in
particular, they have equal variances.  The E_2j's
also are identically distributed.
    (4)  The distribution of the E_1j's is the same
as that of the E_2j's.
    (5)  The E_ij's are Normal.
    These assumptions are not all essential, or
equally important.
    The first seems the most important.  If it
does not hold, at least approximately, then
neither "the" Before nor "the" After mean
difference, whose comparison is the aim of the
test, is well-defined.
    In   one   sense,   this   assumption   can   be
guaranteed.  The errors, e_ijk, will be the result
of   both    sampling   error   and  the    "random"
fluctuations over  time of the actual populations.
The distinction between "fixed" and "random" time
effects seems  necessarily  arbitrary.  Individual
births  and deaths,  and  probably movements,  of
individuals  or  schools,  seem  best regarded  as
random. But  the  rates  of  these changes  may  be
altered by storms, upwellings or other physical
or  chemical  fluctuations. These  may be  to  some
extent  predictable   and  can  sometimes  be long-
lasting, which  suggests  fixed effects,  but  they
are irregular and hard to predict, so can be
argued  to  be random. Finally, all  these  sources
of  change  may depend  on  seasons,  which  seem
predictable enough  to be  called fixed  - except
that their effects on the organisms seem to vary
in timing and amount from year to year.
    It  may  thus  be  possible  to  argue  that  time
effects should all be classified as "random", in
which case they and time-x-site interactions can
all be regarded as part of the error.  Assumption
(1) is then formally justified.  But this is only
a delaying  tactic:  any test  results  can then be
attributed  to  the  effects  of  the  particular
sampling  times  chosen, and  the  problem must be
dealt   with   again.   It   also   reduces   the
plausibility  of  assumption   (2),  since  major
events  like   seasons  or  long-lasting   weather
conditions  (e.g.  El  Nino)  will  affect  several
sampling times.
    Another approach might  be  to try to  list the
time-related  effects to  be called  "fixed", and
explicitly allow for them in the model.  But we
usually do not know which effects to include or
how to model them.  The response of many species
to even the most obvious candidate, seasons, is
not consistent, especially in timing, and the
shape of the response can rarely be modelled by a
simple function like a sine wave.
    These problems vanish if large or long-
lasting time-related effects are approximately
the same at the two sites, so they cancel in the
differences: that is, if these time-related
effects are approximately  additive with  respect
to  site.
    The idea is that there is a "region",
containing Impact and Control, over which major
time-related changes, such as seasons or major
weather conditions, are imposed approximately
uniformly.  The problem is that these uniform
changes are imposed on  locations  that differ, so
they  may  not  have  uniform  effects.  To  these
effects  are  added  the  fixed  effects  of  the
locations  themselves,   and  local  fluctuations:
local time-related changes such as minor weather
effects  or   water  movements.   Together  these
comprise the external  forces influencing the path
of the  population  density over time.  The actual
population is  determined by  "random" events like
individual deaths, births or migratory movements,
but  these  stochastic   events   occur  at  rates
governed by  the external   forces  together  with
"internal" forces resulting from  the size of the
population itself, and of other interacting
populations,  in  the recent past.
    Treating the uniform, major changes as given,
the  path  traced out  by  the  actual   population
density  over  time  at a  given  site can be seen  as
a stochastic process whose mean function is
determined  by  the  uniform  changes,   and  whose
deviations  from this   function  are  due  to  the
local or internal forces, or to the chance events
whose  probabilities  are determined  by all these
forces.  The  assumption that  the  Impact-Control
differences have equal  means  over time  requires
that the mean  function  at Impact differ from  that
at Control  by a constant amount:  the  differences
between  the   two   sites  should   not  lead   to
differences in the effects that the major time-
related  changes  have at these sites.
    It is the mean functions that need to be
additive in this sense, not the actual
populations themselves, which incorporate the
deviations as well.  For example, additivity
cannot be tested by taking replicate samples at
each time and place, and comparing the variation
among replicates with the variation due to
"interaction" between the two sites and the n_i
sampling times.  This tests non-additivity in the
actual population values (which we expect because
of the deviations) but not in the mean functions.
Even if we knew the actual populations at each
time, we would still need to look for non-
additivity in a 2-way table (times and sites)
with one observation per cell.  The addition of
sampling error does not change this basic
situation: the only difference is that we must
use estimates of the actual population instead of
known values.
    With only  one observation per  cell, we cannot
test  for all  possible  interactions, so we choose
the most likely. This may depend on  the situation
and  the  species,   but   in  most  cases we would
expect  non-additivity  to  show up  as a  relation
between  the difference  between the two sites and
the  overall abundance  of the species. An obvious
measure  of overall  abundance is  the average  of
the  two  sites, so  we  are  led   to  look  for  a
monotone relation  between  the difference and the
average. Two  simple test statistics for this are
Spearman's  rank correlation  coefficient  and the
slope   of   the  regression  of  the   differences
against  the   averages   -  which   corresponds  to
Tukey's  (1949) non-additivity test for this situ-
ation.
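    A sketch of these two checks on hypothetical transformed Before
values (made up for illustration; any resemblance to the data tables
below is coincidental) could run as follows:

    import numpy as np
    from scipy import stats

    # Hypothetical (transformed) Before abundances at Impact and Control.
    yI = np.array([2.3, 0.9, 3.8, 1.4, 5.1, 2.0, 0.6, 4.4])
    yC = np.array([1.8, 1.1, 2.9, 1.3, 3.6, 1.7, 0.9, 3.1])

    diff = yI - yC             # Impact-Control differences
    avg  = (yI + yC) / 2.0     # overall abundance measure

    # Spearman rank correlation between difference and average.
    rho, p_spear = stats.spearmanr(diff, avg)

    # Slope of the regression of differences on averages (Tukey-style check).
    slope, intercept, r, p_slope, se = stats.linregress(avg, diff)

    print("Spearman rho = %.2f (p = %.2f)" % (rho, p_spear))
    print("slope = %.2f (p = %.2f)" % (slope, p_slope))
    # Small p-values would point to non-additivity for this transformation.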
    As these tests imply, we would expect time-
related  changes to have larger effects at dense
sites  than at sparse ones,  for  many species.
Many biologists would expect  these changes to be
 multiplicative,  and would suggest using  the  logs
 of the sample abundances. Though  appealing,  this
suggestion may be the result of misplaced
intuition (Stewart-Oaten 1986).  Additive changes
in rates will lead to multiplicative changes in
density for exponentially increasing or declining
populations, but there is no obvious guarantee
that the changes in rates are additive, and most
biological populations are less likely to be
changing exponentially than to be at a rough
equilibrium, fluctuating in response to changes
in carrying capacity.
    Rather than pre-judge this issue, a range of
transformations, including logs, can be
considered.  Probably the best-known and most
convenient family is the one used by Box and Cox
(1964),
       Y(X) = {(X+c)^b - 1}/b      (1.2)
which can be thought of as the shift-plus-power
transformation, (X+c)^b when b ≠ 0 and log(X+c)
when b = 0.  In this paper, b is restricted to lie
between -1 and 1.  Larger values of b imply that a
given time-related input will change the un-
transformed abundance more at the sparser site
than at the denser.  Smaller values imply that the
amount of habitat (volume of water or area of
land) required to support a single animal will
change more at the site where it is smaller.  Both
seem unlikely.
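    A direct implementation of (1.2), handling the b = 0 limit, might
look like the following; the data values are hypothetical.

    import numpy as np

    def shift_power(x, b, c):
        """Shift-plus-power transform Y(X) = ((X+c)**b - 1)/b, with the
        log(X+c) limit when b = 0; monotone increasing in X."""
        x = np.asarray(x, dtype=float)
        if b == 0.0:
            return np.log(x + c)
        return ((x + c) ** b - 1.0) / b

    x = np.array([0.0, 0.11, 0.83, 2.17, 15.14])   # hypothetical abundances
    for b, c in [(1.0, 0.0), (0.5, 0.2), (0.0, 0.2), (-1.0, 1.0)]:
        print(b, c, np.round(shift_power(x, b, c), 3))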
    Assuming there are values of b and c for
which Y is additive, homoscedastic and Normal,
Box and Cox proposed estimating them by maximum
likelihood or Bayesian methods.  There are some
difficulties with this approach.  A technical one
is that, at least for some values of b, the
likelihood is not a well-behaved function of c
(Berry 1987).  A conceptual one, first raised by
Tukey's discussion of Box and Cox (1964), is that
the method may emphasize Normality,
homoscedasticity and additivity in that order
(especially when there is only one observation
per cell), when the order of importance is
generally agreed to be the reverse.  It is often
asserted that these properties tend to go
together (e.g. Atkinson 1985, p. 81), but this
claim seems to rest on a small set of non-
randomly chosen examples.
    In  this  paper,   we  propose  first  assessing
 additivity by the Tukey and Spearman  tests,  and
choosing  a  transformation  from  among those  for
which these tests (and accompanying plots) do not
indicate significant non-additivity in the Before
period.  (Non-additivity After might be an effect
of the alteration.  However, we may want to
consider it if the After observations are
different overall: for example, if the Control
area has clearly changed, implying the whole
region has, for some reason other than the
alteration.  Our transformation choice method is
 very like fitting  a regression, and  we  face  the
 same   risks   when   we   extend   the   estimated
 relationship well beyond the observed range.)
    The null hypothesis for these tests is really
stronger than additivity: the differences and
averages will also be correlated if the Impact
and Control means are identical but the variances
differ.  However, for population counts, this is
likely to lead us astray only if the variation in
the averages, due to uniform time-related
effects, is small compared to any difference
between  Impact  and  Control  in variation  due  to
local  time-related  effects  and  sampling  error.
For most species, this seems far from the case.
    In   practice,  we  are  likely  to  find  that
either  many  transformations  seem  approximately
additive or none do.  The latter case could arise
if Impact is denser than Control under some
conditions  (e.g. when overall  abundance is high)
and sparser under others. For such a species, the
alteration  may  also  be  beneficial  under  some
conditions but not others,  and it may be best to
carry out  separate  analyses for  the  two sets of
conditions. The possibility of such a time-x-site
interaction that cannot be transformed away forms
part of the motivation  for  methods like those of
Section 4.
    If   many  transformations  are  approximately
additive,  we  must   choose  on  other  grounds.
Equality  of variances  -  the main  aspect  of our
assumption (3) - is widely agreed to be the next
most important goal, and there are reasons for
thinking this to be the case here.
    Often this assumption is not crucial.  If we
have additivity and independence, we have
observations Y_ij with distributions F_ij, having
means m_i.  It is natural to compare this with the
homogeneous situation where each Y_ij has
distribution G_i = n_i^(-1) Σ_j F_ij.  Stigler
(1975) shows that any linear function of the order
statistics has the same mean and at least as small
a variance in the heterogeneous situation as in
the homogeneous one.  These results include most
location estimates and some scale estimates.  They
also hold for the standard variance estimate.
Thus a t-like test, with a measure of discrepancy
in the numerator and an estimate of its standard
deviation in the denominator, should have
approximately the same mean but a smaller
variance than in the homogeneous case.  Type 1
error probabilities will then be decreased and
Type 2 error probabilities will be increased for
alternatives near the null but decreased for
distant alternatives.
    This  comparison   with   the  homogeneous  case
ignores two points.  One is that another
transformation, for which the corresponding F_ij's
are more alike, may be more efficient - may have
more power to detect effects of the alteration.
This seems impossible to decide in general.  If
there is an effect, then Impact and Control are
different either Before or After or both, so at
least one of the transformations is non-additive
in at least one period.  The comparison would also
depend on which  tests were  used,  and on efforts
to  allow  for  heterogeneity  by  weighting  the
observations (Box and  Hill 1974).
    Perhaps the main problem is that the
alteration may have different effects at
different times.  For example, an alteration that
warms part of a body of water might be producing
a temperature that the animals prefer in Winter
but dislike in Summer.  If Impact and Control are
approximately equally abundant in the Before
period, differences and averages will appear
unrelated for almost any transformation.  But it
still may be that the size of a time-related
effect depends on the size of the local
population it is operating on.  If populations are
smaller in Winter, when the alteration has a
positive (or large) effect, than in Summer, when
the effect is negative (or small), the results of
tests and  estimates  may  depend  strongly on  the
transformation choice.
    This is not a simple issue.  In a case like
this, the results will also depend on which
season is better represented in the sampling
times chosen.  Also, it is not clear how "the"
effect should be assessed even if we know the
pattern:  does the positive effect in Winter, when
the local population's survival or genetic
structure may be threatened, outweigh the
negative effect in Summer, when the population's
interactions with its environment are at their
peak?
    A workable, though imperfect, solution may be
to carry  out  three analyses,  one  for  all  the
data, another  for Summer  (or "abundant" periods
as determined  by, say,  the  Control  value),  and
the  third  for  Winter.  For  each  analysis,  the
choice of  transformation  (among those that  appear
to  be  additive)  could  be determined  partly by
whether  absolute  deviations  from  the  (supposedly)
common mean tend to  increase, decrease or  stay
the  same  as overall  abundance (measured by the
average  of Impact and Control) increases.
    Different variances  for  the  Before and After
differences  (assumption  4) do not seem to  affect
the validity of t-tests much (Posten 1982), if
the sample sizes are nearly equal.  If they are
not, the Welch-Satterthwaite modification seems
almost always to give a conservative test.  Non-
Normality also seems to have little effect on the
validity of the t-test (e.g. Posten 1978).  The t-
test is not efficient for many non-Normal distri-
butions, but this problem seems best approached
by using more robust tests along with it.
    The results  on which  these comments  are based
are  for  symmetric situations:  either the  errors
have  symmetric distributions, or the Before and
After  errors  have  the  same distribution  (and
sample  size).  One-sided  one-sample  t-tests in-
volving   skewed   distributions  may  have  actual
levels  far from  the nominal ones  (Posten  1979),
though this seems to occur only for small samples
or  extreme skewness.  This suggests,  though,  that
two-sample tests might  go  astray  if the  Before
and  After difference distributions  are  skewed and
not  the  same  shape.  In such a  case,   also, the
various  robust tests we  might use in  concert  with
the  t-test - e.g. the  Wilcoxon-Mann-Whitney test,
or the t-like  test  based on  the 25% trimmed mean,
which will be used here - will be testing
different  things:  the   true  Before  and  After
means,   the true trimmed  means,  and  the  true
 probability that a random Before difference  will
 exceed  a   random After  difference may not  all
 agree  on whether Before differences  are "larger"
 than After differences.
    Tests   and   plots    for   symmetry   and   for
 comparing the shapes of the distributions  of the
 Before and After differences will thus be useful,
 partly   for  deciding    whether  a   particular
 transformation   is  "acceptable"   but   more   for
indicating what to do if tests for different
 acceptable  transformations   disagree.  There are
 many tests for  symmetry, most  of which  are too
 prone   to    reject    symmetry    for   symmetric
 distributions  with  long tails.  Gupta's  (1967)
test does not seem to do this (Antille, Kersting
 and  Zucchini   1982).  A  very  crude  measure  of
 agreement between the shapes of the distributions
 of the  Before and  After differences (independent
of differences of location  or  scale)  is obtained
by  standardizing each  observation  (subtracting
the mean and  dividing  by the standard deviation)
and calculating the standardized Smirnov
statistic, {n_1 n_2/(n_1+n_2)}^(1/2) D, where D is the
maximum  difference  between the cdf's  (Lehmann
1975, p. 37).  If this  measure  is  under about 1,
the  shapes  seem  compatible (Lehmann,  Table F).
(Because  the  standard  deviation may  be  a poor
measure of  scale for some  distributions,  I have
removed the top  and  bottom 10% of each  distribu-
tion before calculating this scale measurement in
Section 3's examples.  A criterion smaller than  1
might be desirable after all this matching.)
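    A sketch of this shape comparison, on hypothetical Before and
After differences, with the 10% tail-trimming of the scale estimate
described above, might look like this:

    import numpy as np
    from scipy import stats

    def trimmed_sd(x, frac=0.10):
        """Standard deviation after dropping the top and bottom 10% of
        values, used only as a more resistant scale estimate."""
        x = np.sort(np.asarray(x, dtype=float))
        k = int(len(x) * frac)
        core = x[k:len(x) - k] if k > 0 else x
        return core.std(ddof=1)

    def shape_statistic(before, after):
        # Standardize each sample (location by its mean, scale by trimmed sd).
        zb = (before - before.mean()) / trimmed_sd(before)
        za = (after  - after.mean())  / trimmed_sd(after)
        D = stats.ks_2samp(zb, za).statistic          # max cdf difference
        n1, n2 = len(zb), len(za)
        return np.sqrt(n1 * n2 / (n1 + n2)) * D       # standardized Smirnov

    before = np.array([-0.4, 0.1, 0.3, -0.2, 0.6, -0.1, 0.2, 0.0])
    after  = np.array([-0.9, 0.2, 1.4, -0.3, 0.5, 0.1, 2.2])
    print("shape statistic: %.2f (near or below 1 suggests similar shapes)"
          % shape_statistic(before, after))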
    Assumption (2) remains.  The differences form
a  time  series,  and  would be  expected   to  be
serially  correlated  if the sampling  times were
very  close  together.  On the  other hand,  if we
have  an  additive transformation, the major  time-
related  effects  may  cancel  when  we  take the
Impact-Control differences,  and the minor,  local
effects  remaining may not  be  long-lasting.   (A
more detailed argument of this kind appears in
Stewart-Oaten,  Murdoch  and Parker  1986  and  in
Stewart-Oaten 1986.)  There  are  many ways to  test
for  serial  correlation. Two simple  ones are the
von  Neumann  ratio  (von  Neumann  1941; Anderson
1970, Table 6.3),  the predecessor of  the Durbin-
Watson  test,  and  the  rank  version  of  this  test
(Bartels 1982).
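    A minimal sketch of the von Neumann ratio on hypothetical
differences is given below; its rank version (Bartels) simply applies
the same ratio to the ranks (the normalizing constants differ slightly
between published versions).

    import numpy as np
    from scipy import stats

    def von_neumann_ratio(d):
        """Mean square successive difference divided by the variance;
        values well below 2 suggest positive serial correlation."""
        d = np.asarray(d, dtype=float)
        num = np.sum(np.diff(d) ** 2) / (len(d) - 1)
        den = np.sum((d - d.mean()) ** 2) / len(d)
        return num / den

    d = np.array([0.5, 0.7, 0.6, 0.9, 0.8, 1.1, 1.0, 1.3, 1.2, 1.4])
    print("von Neumann ratio:      %.2f" % von_neumann_ratio(d))
    print("rank (Bartels) version: %.2f" % von_neumann_ratio(stats.rankdata(d)))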
    If   these   tests   indicate   that   serial
correlation  has  not  been  removed,  we  have   a
difficult   problem:   a   time   series    whose
observations  are  not  (usually)  equally  spaced
and,  whether  they  are or  not,  is  likely  to  be
non-stationary    because   the   magnitude   and
frequency of  disturbances, and  the  vulnerability
of the population,  vary over  time.  The  serious
modelling needed is beyond the scope of this
paper.  In an  example  below (Species  3),   I  have
simply  adjusted the observations to  approximate
independence in a very ad hoc manner.  (For cases
where the series  is amenable  to ARIMA modelling
or to ARMA modelling with a fixed seasonal compo-
nent, see Box and Tiao 1965 and 1975, and Reinsel
and Tiao 1987.)
    Thus a scheme for testing for an effect of
the alteration is:  transform the raw X_ijk's to
additive Y_ijk's as indicated by the Tukey and
Spearman tests and check for a relation between
the averages (Y_ij.) and the absolute deviations
of the differences (Y_ij1 - Y_ij2) from their median.
In the examples below, this relation was examined
by computing the p-values for the slope of the
regression of the absolute deviations against the
averages and the Spearman test for correlation
between them.  Check the differences for serial
correlation, and allow for it if present.  Carry
out a two-sample test to see if the differences
have become smaller following the alteration.
Check whether the test result is likely to be
affected by skewness, using Gupta's test and a
measure of the discrepancy between the shapes of
the differences of the transformed Before and
After values.  Also check whether the result may
be due to changes other than the alteration; one
check is to regress the Before differences
against time.
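    The absolute-deviation check in this scheme parallels the
difference-versus-average sketch given earlier; here the absolute
deviations of the differences from their median are regressed against
the averages.  The numbers below are again hypothetical.

    import numpy as np
    from scipy import stats

    # Hypothetical transformed Before values at Impact and Control.
    yI = np.array([1.2, 0.4, 2.1, 0.9, 3.0, 1.6, 0.2, 2.5])
    yC = np.array([1.0, 0.6, 1.7, 0.8, 2.4, 1.3, 0.5, 1.9])

    diff = yI - yC
    avg  = (yI + yC) / 2.0
    abs_dev = np.abs(diff - np.median(diff))  # deviation from median difference

    slope, _, _, p_slope, _ = stats.linregress(avg, abs_dev)
    rho, p_spear = stats.spearmanr(avg, abs_dev)
    print("vslope p = %.2f, vspear p = %.2f" % (p_slope, p_spear))
    # A clear relation here would suggest the variance of the differences
    # changes with overall abundance under this transformation.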
    Because the t-test lacks robustness, it makes
 sense to  use robust tests  as  well.  Similarly,  it
 makes   sense  to  use   a  range  of   apparently
 acceptable   transformations.   If   the   results
 disagree,  we  need further  thought as  to  which
 ones most nearly describe the  changes  of concern.
 If  the  results agree,  we need to think  about  the
 size of the effect and  mechanisms.  This will  be
 easier   1f   the  transformed  variables  are   In
 interpretable   units:  b-1   (change  in   absolute
 numbers),   b-0  (percentage   change)   and  b— 1
 (change  in the  amount of space needed  per animal)
 seem best.

    3.  Examples
    The following examples all use real  data from
 an  ongoing   assessment  project.  However,  I have
rescaled all numbers so that the maximum in each
data set is 95 and have interchanged "Impact" and
"Control" in some cases.
    Three kinds of table  appear. Data  tables list
 Impact  ("I") and Control values by sampling time
 (read  down   the columns  first).  Times  were  not
listed unless the independence checks indicated a
 serial  correlation problem or  there were  separate
 analyses for different  "seasons".  They are  given
in units of a year, beginning with January 1 of
 the first year the species was sampled.
    Preliminary  checks   give   the   power   and
 constant of  the transformation,  the p-values  for
 the  Tukey   and Spearman  tests  (differences  vs.
 averages),   the slope  and Spearman  tests  for a
 relationship between the  average and the  absolute
 deviation  of  the  difference  from   the  median
 difference,   the  von  Neumann  test   for  serial
correlation and its rank version, Gupta's test
 for symmetry, and the test for a trend with  time.
 "Shape"  Is  the comparison of the shapes  of  the
distributions   of    the   Before    and    After
differences. All tests are two-sided except  those
for  serial   correlation.   Negative   "p-values"
indicate a negative slope or correlation of the
tested    variable   against   the  averages.   All
transformations are monotone increasing: when b
is negative, the transformation is -(X+c)^b.
    Four tests for an effect of the alteration
were used:  the standard t-test, the Wilcoxon-
Mann-Whitney test, and the t-like tests based on
the 25% trimmed mean (Yuen and Dixon 1973) and on
the biweight estimate (Kafadar 1982; this paper
is missing the factors [9 MAD]^2 and [6 s_bi]^2 in
the definitions of s_bi and S_T respectively; I
have used 4 instead of 6 in her equ. (2), see
Kafadar 1983).
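    The 25% trimmed-mean test is the easiest of these robust tests to
sketch.  The following follows the usual Yuen construction (trimmed
means with winsorized variances and Welch-type degrees of freedom); it
is an illustration on made-up numbers, not the exact code used for the
tables below.

    import numpy as np
    from scipy import stats

    def yuen_trimmed_t(x, y, trim=0.25):
        """Two-sample test on trimmed means: trimmed means in the
        numerator, winsorized variances in the standard error."""
        def pieces(a):
            a = np.sort(np.asarray(a, dtype=float))
            n = len(a)
            g = int(np.floor(trim * n))        # number trimmed from each tail
            h = n - 2 * g                      # effective sample size
            tmean = a[g:n - g].mean()
            w = a.copy()
            w[:g], w[n - g:] = a[g], a[n - g - 1]   # winsorize the tails
            swin = w.var(ddof=1)               # winsorized variance
            d = (n - 1) * swin / (h * (h - 1))
            return tmean, d, h
        m1, d1, h1 = pieces(x)
        m2, d2, h2 = pieces(y)
        t = (m1 - m2) / np.sqrt(d1 + d2)
        df = (d1 + d2) ** 2 / (d1 ** 2 / (h1 - 1) + d2 ** 2 / (h2 - 1))
        p = 2 * stats.t.sf(abs(t), df)
        return t, p

    x = np.array([0.3, -0.1, 0.5, 0.2, 0.8, 0.1, 0.4, 0.6])
    y = np.array([-0.2, -0.6, 0.0, -0.4, -1.1, -0.3, 0.1])
    print("Yuen trimmed t = %.2f, p = %.3f" % yuen_trimmed_t(x, y))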
    When a non-positive power was used, and the
constant was zero, all sampling times with a zero
at either Impact or Control were omitted from the
analysis.  Sampling times with zeros at both sites
were omitted from all analyses, though they are
listed with the data.
    In all plots, filled squares indicate Before
data and empty squares After data.
    The data and analyses presented here do not
support any conclusions relevant to the actual
project.  First, as already stated, Impact and
Control have sometimes been reversed.  Second, the
analyses are all purely numerical.  I have made no
use of anything known about the species involved,
or even given their names.  In practice these
analyses should be used in concert with
information about the physiology and population
dynamics of the species concerned.  Third, the
examples here have been selected from data on
hundreds of species, to illustrate statistical
problems: they are not typical.

                                                     TABLE  1: Data  for Species 1.
I
0.83
2.17
0.11
0.37
3.13
0.10
0.08
0.73
15.14
4.68
0.00
C
0.68
0.87
0.39
0.19
5.22
0.14
0.03
0.44
1.31
2.46
0.00
                                                           BEFORE
                                                           I     C
                                                         0.25  0.52
                                                                 80
 1.09
 0.98
14.83
 1.42
 0.07
 0.16
 0.49
 0.00
 5.56
                                                               2.39
                                                               10.00
                                                               0.62
                                                               0.15
                                                               0.07
                                                               0.21
                                                               0.02
                                                               3.79
  I     C
28.62 34.75
        .60
        .16
        .09
1.35
0.15
0.41
1.46
1.19
1.55
3.19
2.
0.
0.
1.31
0.71
1.18
5.27
 4.77  1.15
                                                        41.57  12.43

I
1.84
0.32
0.54
0.02
0.08
0.81
0.13
0.74

C
0.87
0.64
2.14
0.16
0.47
8.45
0.34
1.20
AFTER
I C
0.34 0.02
0.00 0.00
1.34 22.29
0.11 0.37
3.54 11.18
64.91 10.59
0.79 1.92
0.01 0.01

I C
2.45 4.57
5.41 5.69
75.26 35.81
95.00 27.95
1.18 0.04
0.02 0.05
5.38 3.82
.
    Species 1.  The data are in Table 1.  The
Before samples are spread over 5 years, and the
After over 3.  Table 2 shows that serial
correlation is not a serious problem.  However, no
          TABLE 2: Preliminary checks for Species 1, all times.

 power  const  Tukey  Spear  vslope  vspear  von N  rank vN  shape  Gupta  trend
 -1.00   0.00   0.00   0.28    0.00    0.00   0.88     0.89   0.91   0.32   0.82
 -1.00   0.50   0.73   0.85    0.15    0.13   0.89     0.85   0.40   0.48   0.69
 -1.00   1.00   0.58   0.81    0.82    0.93   0.71     0.85   0.71   0.42   0.89
 -1.00   2.00   0.42   0.46    0.20    0.19   0.47     0.72   0.53   0.45   0.95
 -0.50   0.20   0.63   0.89    0.14    0.14   0.88     0.90   0.31   0.41   0.73
 -0.50   0.40   0.51   0.79    0.96    0.94   0.75     0.90   0.58   0.17   0.84
 -0.50   0.50   0.47   0.75    0.72    0.76   0.69     0.85   0.71   0.23   0.89
 -0.50   1.00   0.34   0.35    0.19    0.15   0.50     0.76   0.58   0.21   0.98
  0.00   0.00   0.58   0.88    0.75    0.61   0.74     0.86   0.68   0.20   0.83
  0.00   0.05   0.26   0.48    0.60    0.75   0.70     0.86   0.76   0.15   0.90
  0.00   0.10   0.24   0.43    0.27    0.26   0.64     0.85   0.71   0.13   0.91
  0.00   0.20   0.20   0.28    0.09    0.05   0.56     0.82   0.71   0.27   0.96
  0.00   0.40   0.16   0.24    0.03    0.01   0.48     0.72   0.71   0.31   0.98
  0.00   1.00   0.10   0.14    0.00    0.00   0.41     0.63   0.53   0.47   0.92
  1.00   0.00   0.01   0.12    0.00    0.00   0.80     0.64   0.89   0.23   0.87
           power  const
                        TABLE 4: Preliminary checks for Species 1, abundant times.
             00
             00
             00
             00
          -0.50
          •0.50
          •0.50
          -0.50
           0.00
            .00
            .00
            .00
            .00
            .00
            .00
-1.
-1.
-1.
-1.
 0.
 0.
 0.
 0.
 0.
 1.
  00
  50
  00
  00
  20
        0.40
0.50
1.00
0.00
0.05
0.10
0.20
0.40
  00
  00
Tukey
 0.63
 0.77
 0.88
 0.95
 0.98
 0.97
 0.94
   .84
   .64
   .63
   .62
   .60
 0.56
 0.48
 0.16
Spear
0.30
0.49
0.60
1.00
0.73
0.76
0.92
0.94
0.50
0.50
0.45
0.39
0.39
0.33
0.45
vslope
0.03
0.09
0.17
0.34
0.24
0.30
0.33
0.49
0.92
0.95
0.98
0.96
0.86
0.63
0.01
                                                vspear
                   .08
                   .09
                 0.15
                 0.62
                 0.20
                 0.28
                 0.47
                 0.65
                 0.95
                 0.95
                 0.91
                 0.76
                 0.50
                 0.39
                 0.00
positive power  gives  samples that  appear either
additive or homoscedastic.  (I omitted from the
table many  of  the transformations that  failed
these tests.  The  results   for  positive  powers
given here  are  typical of the others.)  Once we
restrict  to   the  transformations   that   seem
acceptable by  the preliminary  checks,  the  two-
sample tests all indicate a significant change
(Table 3).

          TABLE 3: Species 1 tests, all times.

 power  const  t-test   Wilc  trim-t  biwt-t
 -1.00   0.00   0.400  0.012   0.187   0.005
 -1.00   0.50   0.037  0.003   0.012   0.001
 -1.00   1.00   0.015  0.005   0.011   0.007
 -1.00   2.00   0.011  0.015   0.024   0.029
 -0.50   0.20   0.041  0.004   0.014   0.003
 -0.50   0.40   0.022  0.008   0.010   0.010
 -0.50   0.50   0.019  0.009   0.013   0.014
 -0.50   1.00   0.017  0.023   0.034   0.041
  0.00   0.00   0.061  0.019   0.031   0.037
  0.00   0.05   0.055  0.028   0.043   0.059
  0.00   0.10   0.051  0.027   0.054   0.071
  0.00   0.20   0.051  0.033   0.065   0.092
  0.00   0.40   0.058  0.050   0.070   0.124
  0.00   1.00   0.083  0.052   0.094   0.197
  1.00   0.00   0.806  0.065   0.215   0.174
    While this  seems  convincing,  we  may  want to
look  deeper because  the  largest  Impact-Control
differences actually occur in the After period.
While  these   are  at  times   of  high  overall
abundance,  we   might  wonder  whether  effects  at
"abundant"  times  are  different  from those  at
"sparse"  times.   Accordingly,   I   divided  the
sampling times into those for which the Control
                von N
                0.49
                0.58
                0.63
                0.68
                0.66
                0.67
                0.68
                0.70
                0.74
                0.74
                0.74
                0.75
                0.75
                0.76
                0.85
               rank  vN
                0.73
                0.69
                0.71
                0.66
                0.70
                0.70
                0.66
                0.66
                0.73
                0.73
                0.73
                0.73
                0.73
                0.75
                0.83
                        shape
                         1.01
                         0.62
                         0.54
                         0.54
                         0.54
                         0.54
                         0.54
                         0.35
                         0.43
                         0.35
                         0.35
                         0.35
                         0.35
                         0.35
                         0.70
       Gupta
        0.50
        0.39
        0.44
        0.44
        0.50
          .50
          .50
          .44
          .50
          .50
          .50
          .50
          .50
          .50
 0.
 0.
 0.
 0.
 0.
 0.
 0.
 0.
 0.
 0.33
       trend
        0.58
          .58
          .57
          .56
          .56
          .56
          .56
          .55
          .56
        0.56
        0.56
        0.56
        0.56
        0.57
        0.87
was above its median of .867, and those below.
(Table 1's values are rounded.  The second
sampling time in the Before period was
"abundant", while the first After time was
"sparse".)
    Table 4 shows that the transformations that
appear acceptable overall do so for abundant
times too, with similar test results (Table 5).
(The trim-t values might be affected by the small
samples.)  Table 6, however, shows few transforma-
tions as acceptable on all counts for the sparse
times: some promising ones give mildly skewed
Before distributions.  Test p-values are larger
than those conventionally required for
"significance" (Table 7), though they also
suggest a reduction due to the alteration.  The t-
test here is affected by the long upper tail of
the After differences.

                                                  TABLE 5: Species 1 tests, abundant times.
                        TABLE 6: Preliminary checks for Species 1, sparse times.
power const
-1.00
-1.00
-1.00
-1.00
-0.50
-0.50
-0.50
-0.50
0.00
0.00
0.00
0.00
0.00
0.00
1.00
0.00
0.50
1.00
2.00
0.20
0.40
0.50
1.00
0.00
0.05
0.10
0.20
0.40
1.00
0.00
t-test
0.005
0.005
0.006
0.009
0.008
0.010
0.011
0.015
0.047
0.049
0.051
0.056
0.064
0.089
0.789
Wile trlm-t
0.004
0.006
0.007
0.016
0.014
0.016
0.016
0.030
0.052
0.058
0.058
0.058
0.064
0.064
0.216
0.113
0.129
0.124
0.096
0.136
0.124
0.119
0.104
0.135
0.136
0.137
0.139
0.154
0.192
0.451
biwt-t
0.016
0.018
0.021
0.028
0.027
0.030
0.032
0.041
0.083
0.086
0.089
0.095
0.107
0.138
0.047
           power
            .00
            .00
            .00
            .00
            .50
            .50
          -0.50
          -0.50
           0.00
           0.00
             00
             00
           0.00
           0.00
           1.00
        const
        0.00
        0.50
        1.00
        2.00
        0.20
        0.40
         .50
         .00
         .00
         .05
         .10
         .20
        0.40
        1.00
        0.00
0.
1.
0.
0.
0.
0.
Tukey
 0.01
 0.28
 0.11
 0.03
 0.35
 0.19
 0.15
 0.05
 0.79
 0.26
 0.21
 0.14
 0.07
 0.02
 0.00
Spear
 0.64
 0.31
 0.07
 0.05
 0.44
 0.15
 0.13
 0.05
 0.74
 0.42
 0.21
 0.13
 0.06
 0.03
                      vslope  vspear
                0.02
 .00
0.39
0.79
0.62
0.24
0.52
0.63
0.84
0.09
0.18
0.35
0.59
0.99
0.46
0.06
 .00
0.32
0.93
0.49
0.15
 .70
 .82
 .56
 .04
 .13
 .48
0.83
0.71
0.36
0.09
0.
0.
0.
0.
0.
0.
von N
0.85
0.87
0.84
0.79
0.87
0.86
0.85
0.81
0.88
0.87
0.86
0.85
0.82
0.77
0.63
rank vN
0.94
0.92
0.80
0.77
0.92
0.86
0.79
0.77
0.85
0.87
0.85
0.79
0.82
0.74
0.66
                       shape
                        0.43
                         .60
  60
  44
  64
  60
  60
  56
  71
  60
0.73
0.60
0.60
0.56
                                                                          0.56
Gupta
 0.29
 0.10
 0.23
 0.43
 0.05
 0.14
 0.20
 0.31
 0.09
  .03
  .04
  .12
  .31
 0.43
 0.35
0.
0.
0.
0.
trend
 0.97
  .29
  .28
  .29
  .31
  .28
  .28
  .29
  .31
 0.35
 0.31
 0.29
 0.28
 0.30
 0.34
       TABLE 7: Species 1 tests, sparse times.

 power  const  t-test   Wilc  trim-t  biwt-t
 -1.00   0.00   0.441  0.198   0.491   0.495
 -1.00   0.50   0.308  0.130   0.170   0.168
 -1.00   1.00   0.323  0.167   0.181   0.172
 -1.00   2.00   0.351  0.167   0.206   0.200
 -0.50   0.20   0.314  0.119   0.175   0.183
 -0.50   0.40   0.317  0.154   0.174   0.174
 -0.50   0.50   0.321  0.167   0.178   0.175
 -0.50   1.00   0.341  0.167   0.201   0.192
  0.00   0.00   0.309  0.107   0.208   0.210
  0.00   0.05   0.332  0.130   0.187   0.202
  0.00   0.10   0.328  0.141   0.179   0.190
  0.00   0.20   0.330  0.141   0.186   0.187
  0.00   0.40   0.341  0.167   0.203   0.193
  0.00   1.00   0.367  0.181   0.218   0.217
  1.00   0.00   0.438  0.154   0.277   0.231
    Perhaps the best transformation overall is
ln(X+.2), partly because it is easier to
interpret: there is not much difference among the
three or four best.  Fig. 1 shows Before and After
symmetry plots for this transformation, using all
sampling times.  This plots Y_(Ui) - M against
M - Y_(Li), where Y_(Ui) and Y_(Li) are, respectively,
the ith values above and below the median, and M
is a "middle" value, which I took as the point
nearest to the 10% trimmed mean in the interval
between the (n/2-1) and (n/2+2) (or the (n-1)/2
and (n+3)/2 if n is odd) order statistics.  (The
median is more commonly used, but its position
within the interval formed by the inmost three or
four order statistics can affect the appearance
of the plot severely.)  Except for one Before
outlier, the two distributions seem similar in
shape and not severely asymmetric.  Thus different
location estimates should lead to similar tests
and estimates of change.

       Fig. 1: Symmetry plot for Species 1.
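    A sketch of how the coordinates of such a symmetry plot can be
computed is given below; the "middle" value is simplified here to the
plain 10% trimmed mean rather than the interval-restricted version
described above, and the data are hypothetical.

    import numpy as np
    from scipy import stats

    def symmetry_plot_coords(y):
        """Pair the i-th value above the middle with the i-th value
        below it; points near the 45-degree line indicate an
        approximately symmetric distribution."""
        y = np.sort(np.asarray(y, dtype=float))
        M = stats.trim_mean(y, 0.10)     # simplified "middle" value
        k = len(y) // 2
        lower = y[:k]                    # i-th values below the middle
        upper = y[::-1][:k]              # i-th values above the middle
        return M - lower, upper - M

    y = np.array([-1.6, -0.3, 0.2, -0.9, 1.4, 0.1, -0.5, 2.7])
    below, above = symmetry_plot_coords(y)
    for b, a in zip(below, above):
        print("%.2f  %.2f" % (b, a))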
    Species 2.  Table 8 gives the data, and Table
9 shows preliminary checks on some of the
transformations.  (The four After times with zeros
are omitted when the power is non-positive and
the constant is zero.)  Although some
transformations seem satisfactory, there are
standard transformations that do not: X, X^(1/2) and
ln(X+1) all seem unsuitable.
    Results   of   the  two-sample  t-test  for   an
effect of the  alteration vary  from  about  .06 to
.74; other tests are  less  erratic,  but still  far
from     constant     (Table    10).     However,
transformations  satisfying  the various criteria
TABLE 8: Data



time
0
0
0
0
1








.641
.644
.721
.723
.189
.208
.227
.257
.284
.309
.342
.361
.380
399


1
1
418
437
459
478
0
0
0
0
9
16
10
18
4
1
8

I
.532
.970
.019
.018
.020
.596
.046
.061
.960
.048
.294
86.859
6
13
2
1
2
2
.629
044
.322
520
460
224
for Species 2.


0
0
0
0
33
27
95
8
12
1
70
21
19
4
2
2
0
0
BEFORE
C
.048
.120
.262
.013
.368
.421
.000
.031
.129
.491
.660
.477
.542
.241
.435
.724
.643
729

1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
2
time
.495
.514
.533
.552
.571
.590
.609
.628
.648
.667
.686
.724
.735
.743
.762
.781
.515
.879

8
1
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
I
.746
.561
.613
.544
.679
.931
.968
.509
.084
.023
.032
.109
.005
.172
.002
.009
.265
.111

6
1
1
0
1
0
0
0
0
0
0
0
0
C
.097
.103
.248
.242
.665
.510
.461
.025
.919
.034
.016
.300
.003
0.111
0
0
0
0
.264
.116
.370
.121
AFTER
time
4.
4.
4.
5.
5.
5.
5.
5.
5.
5.
6.
6.
6,
548
644
953
197
347
410
464
642
757
970
058
307
364
0.
0.
2.
12.
1.
1.
0.
0.
0.
2.
15.
12.
2,
I
948
030
167
083
597
351
087
O'OO
000
176
739
633
129

2
0
0.
3.
2.
8.
0.
0.
0.
0.
5.
14.
2,
C
578
029
197
712
109
123
876
004
023
879
630
313
291
time
6
6
6
6
6
6
7
7
7
7
7
7
7,
.425
.482
.545
.597
.652
699
134
175
249
329
553
671
734
13
5
I
661
112
0.236
0
0
0.
5.
0.
2.
5.
0.
0.
0,
004
048
007
573
164
453
654
081
000
915

16
1
5
0
0
0
1
0.
11.
23.
1.
0.
0
C
266
733
257
303
000
047
687
154
486
911
260
002
903
 are more consistent. The  (power,  constant) pairs
 (-1,  1),  (-.75,  .5 or 1),  (-.5,  .2 or .5), (-.25,
 .05 to .2)  and (0, 0 to .05)  do  so here; their t-
 test  p-values vary from .07  to  .15,  but Wilcoxon
ranges from .12 to .17 and biweight from .14 to
 .2 (except  for  (0,0), which  omits  four  After
 observations).     The   trim-t     results    are
 consistently  higher.  This  may be due  to the inner
 half  of the After  data  being  more "stretched out"
than would be expected from a Normal
distribution.  All tests seem similarly affected
 by the After  distribution's skewness.  Fig. 2 is a
 symmetry plot  for  ln(X+.01).

        Fig. 2: Symmetry plot for Species 2.



TABLE 9: Preliminary checks for Species 2. all times.
power const Tukey Spear vslope vspear
-1.00 0.50 0.81 0.85 -0.08 -0.11
-1.00 1.00 -0.84 0.97 -0.34 -0.81
-1.00 3.00 -0.24 -0.38 0.01 0.00
-0.75 0.20 0.62 0.94 -0.03 -0.03
-0.75 0.50 -0.99 0.97 -0.19 -0.54
-0.75 1.00 -0.60 -0.71 -0.95 0.32
-0.75 3.00 -0.13 -0.34 0.00 0.00
-0.50 0.10 0.61 -0.90 -0.03 -0.06
-0.50 0.20 0.88 -0.98 -0.11 -0.37
-0.50 0.50 -0.66 -0.79 -0.83 0.53
-0.50 1.00 -0.32 -0.36 0.09 0.01
-0.25 0.05 0.79 -0.92 -0.09 -0.43
-0.25 0.10 -0.95 -0.92 -0.31 -0.80
-0.25 0.20 -0.64 -0.82 0.97 0.51
-0.25 0.50 -0.28 -0.35 0.03 0.01
0.00 0.00 0.93 -0.77 -0.39 -0.88
0.00 0.01 -0.78 -0.76 0.96 0.53
0.00 0.02 -0.64 -0.71 0.59 0.34
0.00 0.05 -0.44 -0.66 0.14 0.08
0.00 1.00 -0.06 -0.43 0.00 0.00
0.50 0.00 -0.05 -0.47 0.00 0.00
1.00 0.00 -0.06 -0.40 0.00 0.00
von N rank vN
0.63 0.71
0.71 0.84
0.84 0.85
0.54 0.65
0.66 0.80
0.74 0.91
0.90 0.87
0.51 0.70
0.60 0.74
0.71 0.87
0.81 0.88
0.54 0.68
0.62 0.68
0.71 0.79
0.83 0.89
0.55 0.79
0.68 0.78
0.73 0.78
0.80 0.81
0.97 0.92
0.98 0.92
0.97 0.84
    Overall abundances vary greatly with time,
suggesting a look at "abundant" and "sparse"
periods.  In this case, these seem to correspond
quite well to "seasons": a "Winter" from 0 to .43
seems to contain most of the large values and few
of the small.  This gives 11 observations in
Winter for both Before and After, and 25 Before
and 15 After in Summer.
    Table 11 gives preliminary checks for Winter.
Most transformations with negative powers are
acceptable by most of our standards.  However,
most lead to markedly asymmetric distributions.
The positive powers give transformations for
which the absolute deviations seem to increase
with overall abundance.  If this means that
effects of the alteration when abundance is high
are more influential than those when it is low,
this may not be undesirable from some points of
view.  As it happens, the choice here is not
important: all transformations give essentially
the same result (Table 12), no significant
change.



















TABLE 11: Preliminary checks for Species 2, Winter.
power  const  Tukey  Spear vslope vspear  von N rank vN  shape Gupta trend
-1.00   1.00   0.70   0.83   0.77  -0.87   0.84   0.66    0.71  0.01  0.34
-1.00   3.00  -0.77  -0.59   0.24   0.45   0.95   0.96    0.47  0.36  0.29
-1.00  10.00  -0.52  -0.35   0.20   0.29   0.97   0.92    0.71  0.43  0.25
-0.75   1.00  -0.99  -0.59   0.43   0.61   0.91   0.96    0.71  0.03  0.30
-0.75   3.00  -0.66  -0.28   0.16   0.45   0.96   0.91    0.47  0.43  0.27
-0.75  10.00  -0.51  -0.40   0.14   0.14   0.97   0.95    0.47  0.36  0.24
-0.50   0.50  -0.88  -0.40   0.27   0.43   0.93   0.91    0.47  0.18  0.27
-0.50   3.00  -0.58  -0.35   0.14   0.24   0.97   0.92    0.47  0.50  0.25
-0.50  10.00  -0.50  -0.38   0.09   0.06   0.96   0.94    0.47  0.50  0.24
-0.25   0.00  -0.79  -0.48   0.14   0.28   0.94   0.92    0.47  0.23  0.25
-0.25   0.01  -0.79  -0.48   0.13   0.28   0.94   0.92    0.47  0.23  0.25
-0.25   0.05  -0.77  -0.48   0.12   0.28   0.95   0.92    0.47  0.23  0.25
-0.25   0.20  -0.73  -0.48   0.10   0.22   0.95   0.92    0.47  0.36  0.25
 0.00   0.00  -0.61  -0.38   0.06   0.05   0.96   0.94    0.47  0.50  0.24
 0.00   0.01  -0.61  -0.38   0.06   0.05   0.96   0.94    0.47  0.50  0.24
 0.00   0.05  -0.61  -0.38   0.06   0.05   0.96   0.94    0.47  0.50  0.24
 0.50   0.00  -0.52  -0.38   0.01   0.09   0.95   0.94    0.47  0.36  0.24
 1.00   0.00  -0.52  -0.32   0.00   0.05   0.91   0.90    0.47  0.50  0.26

TABLE 10: Species 2 tests, all times.
power  const  t-test   Wilc  trim-t biwt-t
-1.00   0.50   0.060   0.154  0.258  0.171
-1.00   1.00   0.076   0.134  0.275  0.141
-1.00   3.00   0.181   0.194  0.262  0.262
-0.75   0.20   0.065   0.125  0.210  0.192
-0.75   0.50   0.069   0.171  0.303  0.140
-0.75   1.00   0.098   0.119  0.297  0.170
-0.75   3.00   0.236   0.292  0.263  0.282
-0.50   0.10   0.075   0.128  0.201  0.158
-0.50   0.20   0.072   0.175  0.291  0.150
-0.50   0.50   0.092   0.134  0.288  0.163
-0.50   1.00   0.142   0.157  0.280  0.211
-0.25   0.05   0.089   0.154  0.222  0.151
-0.25   0.10   0.089   0.171  0.291  0.163
-0.25   0.20   0.102   0.157  0.308  0.173
-0.25   0.50   0.151   0.168  0.290  0.204
 0.00   0.00   0.099   0.132  0.174  0.106
 0.00   0.01   0.130   0.154  0.208  0.165
 0.00   0.02   0.135   0.150  0.254  0.183
 0.00   0.05   0.147   0.168  0.284  0.195
 0.00   1.00   0.366   0.283  0.257  0.219
 0.50   0.00   0.565   0.259  0.240  0.194
 1.00   0.00   0.736   0.?73  0.203  0.???
                           64

-------
TABLE 12: Species 2 tests, Winter.
power  const  t-test   Wilc  trim-t biwt-t
-1.00   1.00   0.534   0.693  0.671  0.645
-1.00   3.00   0.605   0.669  0.678  0.664
-1.00  10.00   0.727   0.738  0.667  0.722
-0.75   1.00   0.583   0.669  0.676  0.654
-0.75   3.00   0.652   0.738  0.672  0.684
-0.75  10.00   0.749   0.669  0.669  0.734
-0.50   0.50   0.621   0.716  0.672  0.673
-0.50   3.00   0.697   0.780  0.667  0.703
-0.50  10.00   0.767   0.693  0.675  0.745
-0.25   0.00   0.672   0.716  0.680  0.701
-0.25   0.01   0.672   0.716  0.679  0.701
-0.25   0.05   0.671   0.738  0.676  0.698
-0.25   0.20   0.673   0.738  0.670  0.696
 0.00   0.00   0.725   0.760  0.668  0.720
 0.00   0.01   0.725   0.760  0.668  0.720
 0.00   0.05   0.726   0.760  0.667  0.719
 0.50   0.00   0.779   0.780  0.671  0.760
 1.00   0.00   0.77?   0.799  0.658  0.771
    Table 13 gives the preliminary checks for Summer, and Table 14 the
results.  The acceptable transformations overlap considerably with
those for all times together.  Also, they give consistent results,
except for the trimmed t: a Before-After change that is significant at
a level around .05 - .08.  The cause of the variable trimmed t results
appears to be that the size of the added constant determines whether
the inner half of the After data will be relatively near the median
(as for a long-tailed distribution), giving low p-values, or stretched
out as much as the outer half (like a uniform distribution, although
somewhat skewed).
    The transformation -(X+.2)^-.25 seems best overall (I am biased
against having deviation size decrease with averages) though we may
prefer ln(X+.01) as more interpretable, despite the increase in
variation with the averages.  Both seem to show no particular pattern
in the plot of differences against season (which roughly corresponds
to decreasing overall abundance), as shown for ln(X+.01) in Fig. 3.
They give similar results: conventional non-significance overall,
though near enough to borderline to warrant a closer look; clear
non-significance in Winter, with even a hint of an increase at the
Impact area, although the Before data are limited to a 13-week period;
and a significant decrease (or one near enough to be taken seriously,
by most standards) in Summer.
TABLE 14: Species 2 tests, Summer.
power  const  t-test   Wilc  trim-t biwt-t
-1.00   0.10   0.073   0.055  0.083  0.067
-1.00   0.50   0.057   0.069  0.176  0.079
-0.75   0.10   0.064   0.058  0.100  0.082
-0.75   0.50   0.062   0.065  0.185  0.085
-0.50   0.05   0.068   0.055  0.081  0.075
-0.50   0.20   0.058   0.073  0.165  0.083
-0.25   0.00   0.054   0.039  0.086  0.031
-0.25   0.01   0.077   0.052  0.038  0.043
-0.25   0.05   0.062   0.062  0.118  0.085
-0.25   0.20   0.065   0.069  0.183  0.091
 0.00   0.00   0.047   0.049  0.082  0.043
 0.00   0.01   0.065   0.052  0.115  0.075
 0.00   0.02   0.065   0.049  0.131  0.085
 0.00   0.05   0.067   0.069  0.160  0.093
 0.00   0.10   0.071   0.073  0.179  0.098
 0.50   0.00   0.102   0.065  0.150  0.133
 1.00   0.00   0.192   0.115  0.211  0.321

        Fig. 3: Species 2 Differences vs. Season.
        [Scatter plot not reproduced; the horizontal (season) axis
        runs from 0.00 to 1.00.]
TABLE 13: Preliminary checks for Species 2, Summer.
power  const  Tukey  Spear vslope vspear  von N rank vN  shape Gupta trend
-1.00   0.10   0.38   0.63  -0.22  -0.11   0.41   0.66    0.69  0.31  0.35
-1.00   0.50   0.41   0.28   0.72   0.30   0.63   0.64    0.93  0.36  0.20
-0.75   0.10   0.38   0.40  -0.34  -0.60   0.45   0.52    0.50  0.45  0.30
-0.75   0.50   0.41   0.22   0.48   0.12   0.66   0.67    0.93  0.29  0.20
-0.50   0.05   0.35   0.47  -0.28  -0.41   0.42   0.54    0.47  0.39  0.32
-0.50   0.20   0.39   0.24   0.97   0.50   0.59   0.53    0.72  0.41  0.22
-0.25   0.00   0.19   0.86  -0.02  -0.01   0.22   0.69    0.44  0.26  0.44
-0.25   0.01   0.28   0.60  -0.17  -0.21   0.34   0.61    0.90  0.12  0.36
-0.25   0.05   0.35   0.31  -0.52  -0.93   0.50   0.48    0.50  0.27  0.27
-0.25   0.20   0.38   0.23   0.55   0.18   0.63   0.62    0.80  0.34  0.22
 0.00   0.00   0.25   0.52  -0.24  -0.43   0.37   0.50    0.44  0.11  0.32
 0.00   0.01   0.30   0.32  -0.52  -0.89   0.47   0.46    0.38  0.18  0.29
 0.00   0.02   0.31   0.30  -0.73   0.91   0.51   0.46    0.50  0.31  0.27
 0.00   0.05   0.34   0.18   0.83   0.28   0.57   0.45    0.50  0.43  0.24
 0.00   0.10   0.35   0.19   0.47   0.13   0.62   0.54    0.66  0.32  0.23
 0.50   0.00   0.17   0.19   0.02   0.00   0.58   0.66    0.66  0.16  0.28
 1.00   0.00   0.00   0.18   0.00   0.00   0.30   0.54    0.88  0.08  0.49
                                                    65

-------
TABLE 15: The Species 3 data.
[Before- and After-period sampling times with Impact (I) and Control
(C) densities; the individual values are garbled in this extraction
and are not reproduced.]
    The Species 3 data are in Table 15 and examples from the first run
of the preliminary checks are in Table 16.  These examples, and more
detailed tables, strongly suggest a problem with serial correlation.
(The rank version of von Neumann's test seems to have greater power,
as Bartels (1982) claimed it would for non-Normal distributions.)  In
addition, no transformation seems additive.  This appears to be due to
the six high density sampling times in the Before period, when Impact
densities were, with one mild exception, greater than Control
densities, while this ordering is reversed for most other sampling
times.  Fig. 4 shows the plot of the differences against the averages
when the square root transformation is used.

        FIG. 4: Differences vs. Averages for √X.
        [Scatter plot not reproduced in this extraction.]
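    [The rank version of von Neumann's ratio mentioned above replaces
the observations by their ranks; values near 2 are consistent with
independence, and small values point to positive serial correlation.
The sketch below is an illustration, not the author's code; the normal
approximation used for the p-value (mean 2, variance about 4/n) is an
assumption of the sketch and would be rough for short series.]

```python
import numpy as np
from scipy import stats

def rank_von_neumann(x):
    """Bartels' rank version of von Neumann's ratio.
    Returns the statistic and a one-sided p-value against positive
    serial correlation, using the large-sample normal approximation."""
    r = stats.rankdata(x)
    n = len(r)
    rvn = np.sum(np.diff(r) ** 2) / np.sum((r - r.mean()) ** 2)
    z = (rvn - 2.0) / np.sqrt(4.0 / n)
    return rvn, stats.norm.cdf(z)   # small RVN => positive correlation

# e.g. rank_von_neumann(d_before), where d_before is a hypothetical series
# of Before-period Impact-minus-Control differences in time order.
```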
    Many of the Before sampling times are quite close together (most
are a week apart), so serial correlation is not surprising and some
adjustment seems needed.  I have not attempted any serious modelling
here.  I have assumed that the centered differences,
d_ij = D_ij - E(D_ij) (D_ij being the Impact-Control difference at the
j-th sampling time of period i), obey d_ij = r_ij d_i(j-1) + e_ij,
where the e_ij are independent errors and r_ij decreases linearly with
the timegap, t_ij - t_i(j-1).  I chose the estimated line by an ad hoc
argument: I assumed that r_ij = 0 if the timegap is greater than .15
(this is about eight weeks, and contains the period of high density
Before observations), and that the sample autocorrelation estimated
r_ij for a timegap of about .019, the median (and mode) of the Before
timegaps.  This gives r_ij = 1.15r(1 - T/.15), where T is the timegap
and r is that sample autocorrelation.  If r_ij > 0, I replaced D_ij by
(D_ij - r_ij D_i(j-1)) divided by (1 - r_ij), so the mean would be
unchanged.
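    [A minimal sketch of this ad hoc adjustment, under the assumptions
stated above; the function and argument names are invented for the
illustration and the value of the sample autocorrelation is an input
the analyst would supply.]

```python
import numpy as np

def adjust_for_serial_correlation(d, t, r_med, gap0=0.15, gap_med=0.019):
    """Apply the adjustment described in the text: r_ij falls linearly
    from the sample autocorrelation at the median timegap (r_med) to 0
    at a timegap of gap0, i.e. r_hat = r_med/(1 - gap_med/gap0)*(1 - gap/gap0);
    1/(1 - .019/.15) is about 1.15.  Each difference with r_hat > 0 is
    replaced by (d_j - r_hat*d_{j-1})/(1 - r_hat), leaving the mean
    unchanged."""
    d = np.asarray(d, dtype=float)
    t = np.asarray(t, dtype=float)
    out = d.copy()
    scale = r_med / (1.0 - gap_med / gap0)        # about 1.15 * r_med
    for j in range(1, len(d)):
        gap = t[j] - t[j - 1]
        r_hat = scale * (1.0 - gap / gap0) if gap < gap0 else 0.0
        if r_hat > 0:
            out[j] = (d[j] - r_hat * d[j - 1]) / (1.0 - r_hat)
    return out

# e.g. adjust_for_serial_correlation(d_before, t_before, r_med=0.4), with
# d_before, t_before and r_med hypothetical stand-ins for the paper's values.
```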
    Table 17 shows the preliminary checks for these adjusted samples.
The serial correlation problem seems now to be minor, but additivity
remains elusive.  A plot like Fig. 4, but using the adjusted data,
looks very like Fig. 4.  (Trends in the absolute deviations, portrayed
in "vslope" and "vspear", may be due to trends in the mean.)  It is
possible that, without the alteration, Impact is more popular than
Control in some conditions and less popular in others.  I therefore
broke each data set into two, defining "conditions" by time-of-year
(Summer between .33 and .9) rather than by "abundance" (e.g.
Control>10), though the answers are similar.
    Tables 18 and 19 give preliminary checks on the adjusted data for
Summer and Winter respectively.
TABLE 16: Preliminary checks for Species 3, all times.
power  const  Tukey  Spear vslope vspear  von N rank vN  shape Gupta trend
-1.00   0.00   0.00   0.02  -0.00  -0.00   0.74   0.17    1.57  0.03  0.59
-1.00   1.00   0.33   0.09  -0.13  -0.10   0.36   0.07    0.86  0.44  0.53
-1.00   3.00   0.25   0.25  -0.97   0.42   0.28   0.07    0.79  0.37  0.43
-1.00   5.00   0.16   0.28   0.43   0.04   0.22   0.07    0.79  0.46  0.40
-0.50   0.00   0.00   0.03  -0.00  -0.00   0.90   0.15    0.70  0.08  0.61
-0.50   1.00   0.19   0.13  -0.73   0.63   0.27   0.10    0.79  0.45  0.44
-0.50   3.00   0.07   0.21   0.17   0.03   0.17   0.05    0.72  0.44  0.36
-0.50   5.00   0.02   0.17   0.02   0.00   0.11   0.04    0.65  0.50  0.34
 0.00   0.00   0.03   0.02  -0.05  -0.27   0.63   0.23    0.88  0.41  0.43
 0.00   0.10   0.05   0.04  -0.95  -0.97   0.28   0.18    0.72  0.48  0.41
 0.00   0.50   0.02   0.09   0.11   0.15   0.12   0.07    0.72  0.37  0.34
 0.00   1.00   0.01   0.14   0.01   0.02   0.09   0.04    0.65  0.28  0.31
 0.50   0.00   0.00   0.09   0.00   0.00   0.01   0.07    0.77  0.22  0.26
 1.00   0.00   0.00   0.24   0.00   0.00   0.00   0.04    2.04  0.49  0.29
                                                   66

-------
TABLE 17: Preliminary checks for Species 3, adjusted for correlation, all times.
power  const  Tukey  Spear vslope vspear  von N rank vN  shape Gupta trend
-1.00   3.00   0.30   0.30  -0.86   0.39   0.53   0.27    0.09  0.51  0.53
-0.50   1.00   0.23   0.19  -0.61   0.87   0.54   0.39    0.09  0.49  0.53
-0.50   3.00   0.12   0.39   0.29   0.08   0.54   0.49    0.15  0.45  0.49
 0.00   0.00   0.03   0.02  -0.05  -0.27   0.63   0.23   -0.06  0.41  0.43
 0.00   0.10   0.06   0.09  -0.81   0.99   0.54   0.40    0.09  0.37  0.49
 0.00   0.50   0.04   0.17   0.29   0.22   0.57   0.69    0.18  0.42  0.48
 0.00   1.00   0.03   0.22   0.07   0.06   0.56   0.68    0.22  0.41  0.47
 0.50   0.00   0.00   0.14   0.00   0.07   0.50   0.91    0.38  0.40  0.48
 0.50   0.50   0.00   0.26   0.00   0.02   0.46   0.89    0.39  0.34  0.49
 1.00   0.00   0.00   0.28   0.00   0.01   0.44   0.89    0.45  0.47  0.50
TABLE 18: Preliminary checks for Species 3, adjusted for correlation, Summer.
power  const  Tukey  Spear vslope vspear  von N rank vN  shape Gupta trend
-1.00   3.00   0.91  -0.67   0.77   0.69   0.51   0.20    0.04  0.23  0.72
-0.50   1.00   0.82  -0.98  -0.88  -0.93   0.50   0.28    0.03  0.15  0.75
-0.50   3.00  -0.86  -0.46   0.44   0.29   0.51   0.23    0.06  0.13  0.75
 0.00   0.00   0.31   0.25  -0.01  -0.09   0.83   0.47   -0.18  0.36  0.71
 0.00   0.10   0.71   0.44  -0.36  -0.60   0.56   0.35   -0.04  0.34  0.72
 0.00   0.50   0.99  -0.78   0.82   0.86   0.51   0.31    0.04  0.16  0.77
 0.00   1.00  -0.87  -0.59   0.51   0.35   0.51   0.25    0.06  0.18  0.76
 0.50   0.00  -0.64  -0.80   0.32   0.33   0.51   0.35    0.05  0.12  0.79
 0.50   0.50  -0.45  -0.36   0.12   0.18   0.51   0.38    0.09  0.16  0.81
 1.00   0.00  -0.04  -0.39   0.00   0.01   0.48   0.47    0.13  0.05  0.90
TABLE 19: Preliminary checks for Species 3, adjusted for correlation, Winter.
power  const  Tukey  Spear vslope vspear  von N rank vN  shape Gupta trend
-1.00   3.00   0.97  -0.90  -0.01  -0.10   0.56   0.32   -0.11  0.50  0.37
-0.50   1.00   0.87    .00  -0.00  -0.05   0.57   0.41   -0.12  0.50  0.43
-0.50   3.00   0.84    .00  -0.00  -0.05   0.56   0.41   -0.11  0.50  0.45
 0.00   0.00   0.62    .00  -0.05  -0.09   0.59   0.41   -0.14  0.50  0.62
 0.00   0.10   0.62    .00  -0.05  -0.09   0.59   0.41   -0.14  0.50  0.62
 0.00   0.50   0.61    .00  -0.06  -0.09   0.59   0.41   -0.14  0.50  0.63
 0.00   1.00   0.60    .00  -0.07  -0.09   0.59   0.41   -0.13  0.50  0.64
 0.50   0.00   0.23   0.37  -0.64  -0.95   0.59   0.73   -0.11  0.50  0.95
 0.50   0.50   0.23   0.37  -0.65  -0.95   0.59   0.73   -0.11  0.50  0.94
 1.00   0.00   0.04   0.20   0.47   0.70   0.52   0.32   -0.02  0.50  0.54
Even though there are only six Before observations in Winter, many
transformations are ruled out by trends in the sizes of the
deviations.  There are 32 Before observations in Summer, but there is
a much broader range of acceptable transformations: Impact and Control
differ little in this period, and it may also be that, for very small
observations, transformations involving a relatively large added
constant behave rather alike.
    If there is a "season-x-site" interaction, it may not be the case
that the same transformation is appropriate for both seasons.
However, the square root transformation seems to be satisfactory in
both seasons here.  More reassuringly, the tests for an effect of the
alteration give very similar results for acceptable (and many
unacceptable) transformations for all times (Table 20), Summer alone
(Table 21) and Winter alone (Table 22).
TABLE 20: Species 3 tests (cf Table 17).
power const t-test Wilc trim-t biwt-t
-1.00 3.00 0.040 0.011 0.015 0.008
-0.50 1.00 0.036 0.010 0.010 0.004
-0.50 3.00 0.018 0.008 0.007 0.011
0.00 0.00 0.041 0.016 0.012 0.033
0.00 0.10 0.017 0.007 0.004 0.010
0.00 0.50 0.014 0.005 0.007 0.007
0.00 1.00 0.012 0.006 0.006 0.010
0.50 0.00 0.016 0.017 0.030 0.050
0.50 0.50 0.018 0.020 0.027 0.057
1.00 0.00 0.048 0.033 0.089 0.077
TABLE 21: Species 3 tests (cf Table 18).
power const t-test Wilc trim-t biwt-t
-1.00 3.00 0.036 0.014 0.017 0.009
-0.50 1.00 0.032 0.011 0.021 0.006
-0.50 3.00 0.028 0.016 0.025 0.011
0.00 0.00 0.057 0.043 0.021 0.060
0.00 0.10 0.033 0.029 0.018 0.029
0.00 0.50 0.020 0.009 0.014 0.010
0.00 1.00 0.021 0.011 0.017 0.011
0.50 0.00 0.019 0.011 0.008 0.034
0.50 0.50 0.025 0.017 0.023 0.019
1.00 0.00 0.072 0.018 0.113 0.027
TABLE 22: Species 3 tests (cf Table 19).
power const t-test Wilc trim-t biwt-t
-1.00 3.00 0.035 0.034 0.119 0.053
-0.50 1.00 0.022 0.034 0.083 0.031
-0.50 3.00 0.016 0.019 0.063 0.020
0.00 0.00 0.008 0.010 0.037 0.008
0.00 0.10 0.008 0.010 0.036 0.008
0.00 0.50 0.008 0.007 0.035 0.008
0.00 1.00 0.008 0.007 0.033 0.007
0.50 0.00 0.006 0.005 0.033 0.005
0.50 0.50 0.006 0.005 0.034 0.005
1.00 0.00 0.01? 0.004 0.073 0.00?
    Thus this species seems to have been reduced by the alteration.
However, the evidence is stronger for Summer, when the two sites seem
to have been quite similar (Before) and there are many observations,
than for Winter, when the relation between the sites seems to have
been different but there are only six Before observations, all within
a seven-week period.
                                         67

-------
    4. Discussion.
    There are three obvious directions for further work: better
testing, better things than testing, and lessons to be applied in the
design of assessment programs.
    With respect to the first, there is a laundry list of desirable
extensions.  A proper plan for dealing with correlated data is high on
this list.  However, it seems unlikely to be accomplished without a
model involving more than half a dozen parameters, and requiring
larger and longer (in time) datasets than are often customary.
    Multivariate procedures are another item.  It will have been
noticed that Species 2 and 3 were both dense at about the same times.
If they are similar biologically (or suspected to be so), we may get
more reliable results, and perhaps some idea about the mechanisms of
any effect, by analyzing them together.  Unfortunately, knowledge of
many populations is such that it is easy to "suspect" links between
large numbers of species, and difficult to rate the credibility of
these suspicions.  Data-based methods for linking different species
would be helpful.
    Mention of many species raises the problem of multiple testing.
In many assessment problems there may be dozens or even hundreds of
species to consider: all are sampled the same way, so the "extra" data
are essentially free.  Even if no species is affected, the probability
that at least one will be "significant" may be much greater than the
nominal level.  The standard answer, which amounts largely to using
very small test levels, is unsatisfactory: Normal and t-approximations
can't be trusted at these levels, and the variances and sample sizes
are such that even severely affected species are unlikely to register
at them.
    Can we do better than significance testing?  That many resource
managers and biologists seem to want, and be content with, a test of
significance appears sometimes to be a remarkable public relations
achievement.  The meaning of a p-value is not easy to grasp, even in
simple cases where the model is known.  It may not be understood by
many of those who ask for it.  There are plenty of logical problems,
as the Bayesians have pointed out (e.g. Edwards, Lindman and Savage
1963, Lindley 1972).
    Arguments can be made for impact assessment testing.  An obvious
one is that environmental protection laws often imply tests rather
than estimates: alterations are to make no change in the "balanced
indigenous" community.  Another is that estimation may require larger
sample sizes.  It may be more economical to test first and then decide
whether further lab and field studies are justified.  Bayesians have
pointed out that the p-values from one-sided tests may be
interpretable Bayesianly, if the prior is diffuse.  (It surely should
be in much of this work, and when environmentalists and industrialists
each have sharp, but sharply different, priors, the p-value seems
useful as an objective compromise.)  Also, difficult as testing is,
estimation may be much harder, both to do and to explain.
    But this last objection comes near to violating Tukey's principle,
"better an approximate answer to the right question, which is often
vague, than an exact answer to the wrong question, which can often be
made precise."  The right question is surely "Do the effects matter?"
This is vague because it involves a value judgment, but statisticians
can help individuals to decide for themselves by answering the nearby
question "How big are the effects?"
    This question is easy to answer, at least approximately, from the
testing work only if Control is believed to be unchanged and the
transformation used in the tests implies the units we want.  A change
in X gives us the absolute change; in ln(X) the percentage change; in
X^-1 the change in space needed per animal.  Added constants might be
justified on the grounds that E(X+c)^p ≈ (E(X)+c)^p approximately,
using Jensen's inequality.  But powers other than -1, 0 and 1 seem
hard to interpret, although they may well be justified in terms of
"damping".  Also, if the units we want do not correspond to the
transformation we used, there is no simple back-conversion to help:
the estimated size of the change will involve the actual sample
abundances, which will usually depend on the choice of sampling times.
    An alternative is to attack estimation directly.  We are
immediately faced with the problem of choosing units, the possibility
that the effect may vary (e.g. with time-of-year), and the difficulty
of presenting estimates (with confidence intervals, etc.) in a form
quickly grasped by non-scientists.  But these questions suggest a
useful approach.  This is to use the Control (or several Controls) as
predictors, and to present the assessment in the form of a plot
showing, for each Control value, the Impact value expected in the
Before period and the Impact value expected in the After period.  (One
can also plot the observed After value at Impact against the Before
value that would be predicted from the observed After values at
Control.)  Thus the Control stands for environmental conditions which
might determine the direction and size of the change at Impact.  This
approach was successfully used by Mathur, Robbins and Purdy (1980),
but I have not seen it elsewhere.
    This approach requires the calculation of regressions, of Impact
against Control(s), for Before and After.  The problems arising in
testing arise here too, often more severely.  Corresponding to
transformation choice is the choice of a model.  This latter choice is
wider, though: there are natural models that allow Control to be more
abundant than Impact in some conditions and less abundant in others.
Also there are new possibilities, like isotonic regression (Barlow et
al 1972) or projection pursuit regression (Friedman and Stuetzle
1981), that allow us to evade a particular choice.  These bring more
problems, some of them severe, such as estimating variance (especially
if it varies) and detecting and allowing for serial correlation.
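    [The sketch below (not Stewart-Oaten's or Mathur et al.'s code)
illustrates the display just described: fit separate Impact-on-Control
regressions for the Before and After periods and compare the predicted
Impact values over a grid of Control values.  Ordinary straight-line
least squares and the array names are illustrative assumptions.]

```python
import numpy as np

def before_after_prediction_curves(ctl_b, imp_b, ctl_a, imp_a, n_grid=50):
    """For a grid of Control values, return the Impact value predicted by
    a straight-line fit to the Before period and by a straight-line fit
    to the After period."""
    ctl_b, imp_b = np.asarray(ctl_b, float), np.asarray(imp_b, float)
    ctl_a, imp_a = np.asarray(ctl_a, float), np.asarray(imp_a, float)
    grid = np.linspace(min(ctl_b.min(), ctl_a.min()),
                       max(ctl_b.max(), ctl_a.max()), n_grid)
    slope_b, icpt_b = np.polyfit(ctl_b, imp_b, 1)   # Before-period fit
    slope_a, icpt_a = np.polyfit(ctl_a, imp_a, 1)   # After-period fit
    return grid, icpt_b + slope_b * grid, icpt_a + slope_a * grid

# Plotting grid against the two predicted curves gives the Before/After
# display described in the text; ctl_b, imp_b, ctl_a, imp_a are
# hypothetical abundance arrays.
```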
    Finally, two design considerations should be mentioned.  First, it
may be inadequate to decide on a sample size only by the usual
calculation, n = 2s²(z_α + z_β)²/d², where α is the test level, d is a
difference it is desired to detect with probability 1-β, s² is a
variance estimate, and z_α and z_β are the upper α and β points of the
N(0,1) distribution.  (The units of d and s² are those of a
transformation applied to some
                                                   68

-------
preliminary data used to obtain s².)  While such calculations need to
be done, we need to anticipate uncertainty about transformations, the
need to split the sample into two or more parts, serial correlation,
and the possibility of year-to-year variation.
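    [A direct transcription of the quoted formula, as a sketch; the
function name, the example numbers and the one-sided convention are
assumptions of the illustration.]

```python
from scipy import stats

def two_sample_n(s2, d, alpha=0.05, power=0.90):
    """Per-group sample size from the usual formula quoted above,
    n = 2*s2*(z_alpha + z_beta)**2 / d**2, for a one-sided test at level
    alpha (use alpha/2 for two-sided) detecting a difference d with the
    given power (beta = 1 - power).  s2 and d are on the transformed
    scale."""
    z_alpha = stats.norm.ppf(1 - alpha)
    z_beta = stats.norm.ppf(power)
    return 2 * s2 * (z_alpha + z_beta) ** 2 / d ** 2

# e.g. two_sample_n(s2=0.5, d=0.5) -> about 34 observations per period,
# before any allowance for serial correlation or transformation uncertainty.
```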
    Secondly, the estimation approach makes explicit what was only
implicit in testing.  This is that the "control" is not a control in
the sense used in experiments.  In the testing problem, we think of it
as "tracking" Impact, if only we could find the right transformation.
But the estimation approach described above makes it clear that it is
a predictor.  Thus it is not essential that Control be "similar" to
Impact, only that it predict it well Before the alteration.  Thus
several Controls, serving complementary purposes, might be useful, as
might be observations at "Control" on species other than the target
one.

    Acknowledgement: I thank the Marine Review Committee for
permission to use the data in Section 3.  MRC does not necessarily
agree with any of the opinions in this article.

                    REFERENCES

Anderson, T. W. 1970. The Statistical Analysis of Time Series.
    Wiley, New York.
Antille, A., G. Kersting and W. Zucchini. 1982. Testing Symmetry.
    Journal of the American Statistical Association, 77: 639-646.
Atkinson, A. C. 1985. Plots, Transformations and Regression.
    Oxford University Press, New York.
Barlow, R. E., D. J. Bartholomew, J. M. Bremner and H. D. Brunk.
    1972. Statistical Inference Under Order Restrictions. Wiley,
    New York.
Bartels, R. 1982. The Rank Version of von Neumann's Ratio Test for
    Randomness. Journal of the American Statistical Association,
    77: 40-46.
Berry, D. A. 1987. Logarithmic Transformations in ANOVA.
    Biometrics, 43: 439-456.
Box, G. E. P., and D. R. Cox. 1964. An Analysis of Transformations.
    Journal of the Royal Statistical Society, Series B, 26: 211-252.
Box, G. E. P., and W. J. Hill. 1974. Correcting Inhomogeneity of
    Variances with Power Transformation Weighting. Technometrics,
    16: 385-389.
Box, G. E. P., and G. C. Tiao. 1965. A Change in the Level of a
    Non-Stationary Time Series. Biometrika, 52: 181-192.
Box, G. E. P., and G. C. Tiao. 1975. Intervention Analysis with
    Applications to Economic and Environmental Problems. Journal of
    the American Statistical Association, 70: 70-79.
Edwards, W., H. Lindman and L. J. Savage. 1963. Bayesian Statistical
    Inference for Psychological Research. Psychological Review,
    70: 193-242.
Friedman, J. and W. Stuetzle. 1981. Projection Pursuit Regression.
    Journal of the American Statistical Association, 76: 817-823.
Gupta, M. K. 1967. An Asymptotically Nonparametric Test of Symmetry.
    Annals of Mathematical Statistics, 38: 849-866.
Kafadar, K. 1982. Using Biweight Estimates in the Two-Sample Problem.
    Part 1: Symmetric Populations. Communications in Statistics -
    Theory and Methods, 11: 1883-1901.
Kafadar, K. 1983. The Efficiency of the Biweight as a Robust Estimator
    of Location. Journal of Research of the National Bureau of
    Standards, 88: 105-116.
Lehmann, E. L. 1975. Nonparametrics: Statistical Methods Based on
    Ranks. Holden-Day.
Lindley, D. V. 1972. Bayesian Statistics: A Review. SIAM,
    Philadelphia.
Mathur, D., T. W. Robbins and E. J. Purdy. 1980. Assessment of
    Thermal Discharges on Zooplankton in Conowingo Pond,
    Pennsylvania. Canadian Journal of Fisheries and Aquatic
    Sciences, 37: 937-944.
Posten, H. O. 1978. The Robustness of the Two-Sample t-test Over the
    Pearson System. Journal of Statistical Computation and
    Simulation, 6: 295-311.
Posten, H. O. 1979. The Robustness of the One-Sample t-test Over the
    Pearson System. Journal of Statistical Computation and
    Simulation, 9: 133-149.
Posten, H. O. 1982. Two-Sample Wilcoxon Power Over the Pearson System
    and Comparison with the t-test. Journal of Statistical
    Computation and Simulation, 16: 1-18.
Reinsel, G. C., and G. C. Tiao. 1987. Impact of Chlorofluoromethanes
    on Stratospheric Ozone. Journal of the American Statistical
    Association, 82: 20-30.
Stewart-Oaten, A., W. W. Murdoch and K. R. Parker. 1986. Environmental
    Impact Assessment: Pseudo-Replication in Time? Ecology,
    67: 929-940.
Stewart-Oaten, A. 1986. Assessing Local Impacts: Progress and Some
    Problems. Oceans '86 Conference Record, Vol. 3, pp. 964-973.
    Available from: IEEE Service Center, 445 Hoes Lane, Piscataway,
    N.J. 08854; or from: Marine Technology Society, 2000 Florida
    Avenue, N.W., Suite 500, Washington, D.C.
Stigler, S. M. 1976. The Effect of Sample Heterogeneity on Linear
    Functions of Order Statistics, with Applications to Robust
    Estimation. Journal of the American Statistical Association,
    71: 956-960.
Tukey, J. W. 1949. One Degree of Freedom for Non-Additivity.
    Biometrics, 5: 232-242.
von Neumann, J. 1941. Distribution of the Ratio of the Mean Square
    Successive Difference to the Variance. Annals of Mathematical
    Statistics, 12: 367-395.
Yuen, K. K. and W. J. Dixon. 1973. The Approximate Behavior and
    Performance of the Two-Sample Trimmed t. Biometrika, 60: 369-374.
                                                   69

-------
             COMPARISONS WITH BACKGROUND ENVIRONMENT:  STRATEGIES FOR DESIGN
                                     Bruce Peterson
                   CH2M HILL, 777 108th Avenue NE, Bellevue, WA  98004
              INTRODUCTION

   Current federal and state environmen-
tal regulations have greatly increased
the number of environmental studies
being performed each year in the United
States.  These studies range from inves-
tigations of Superfund sites to the
establishment of monitoring systems for
RCRA-permitted facilities.  A common
component of all of these studies is the
comparison of the sample values of en-
vironmental parameters with background
samples or regulatory thresholds.
   Design of environmental sampling
strategies in a potentially litigious
regulatory context presents many new
challenges for the statistician.  Among
these are the traditional challenges of
adequate problem definition, newer chal-
lenges of maintaining data quality when
multiple organizations handle the sam-
ples, and the challenges in the rapidly
developing field of chemometrics (the
study of the statistical properties of
chemical measurements).
   Studies involving statistical com-
parisons have an initial phase during
which the objectives of the study and
the contrasts of interest are refined
and agreed upon.  Regulatorily mandated
environmental studies, particularly of
Superfund sites, often involve multiple
actors in defining the study objectives.
Each of these actors—the regulatory
agencies, the potentially responsible
parties, and the public—have poten-
tially disparate agendas.  Each agenda
can be translated into statistical de-
sign goals that may be incompatible with
those derived from other agendas.  Fre-
quently the actors are statistically
unsophisticated:  the definition of
study objectives therefore becomes an
involved process of both education and
political maneuvering.
   Even with an agreed-upon set of study
objectives, the regulatory context of a
study results in a high probability of
litigation.  This places an obligation,
on the study team to carefully document
and track sampling activities and the
movement of samples through the analy-
tical process.  Well-constructed data
bases are a necessity.
   The data base should include provi-
sions for recording progress of samples
through the analysis, field changes to
the sample plan, variations in the ana-
lytical protocol, pointers to the origi-
nal documents, and the values of the
environmental parameters.
   The data base will ultimately provide
the information necessary for statisti-
cal analysis and comparison of the
environmental parameters.  The chal-
lenges of the analysis will include the
mundane:  obtaining a "final" data set,
and the complex:  performing comparisons
with multivariate and incomplete data.
In addition the data base must maintain
information on the quality of the chemi-
cal analysis for incorporation into the
estimate of the overall quality of
analysis.
   The chemometric measures of quality
stored in a data base enable multiple
uses of the data.  These measures are
available from the analytical labora-
tories, allowing data of differing qual-
ity to be used in analyses where appro-
priate.  As an analytical lab is capable
of generating a large number of quality
measures, these are best transferred
electronically.
          STRATEGIES FOR DESIGN:
        THE LOVE CANAL PILOT STUDY

   Many of these challenges have been en-
countered with the Love Canal Superfund
site.  Construction on the Love Canal was
started in the 1890s but never completed.
The canal was used as a chemical waste
dump from 1942 to 1953 and was then
filled and covered.  The Love Canal area
developed as an urban neighborhood until
significant contamination of the environ-
ment was recognized in 1978.  Extensive
remediation to isolate and contain the
wastes in the canal was performed in 1979
and additional remediation to clean the
surrounding area is continuing.  The Hab-
itability Study, due for completion in
May 1988, is designed to provide informa-
tion on the habitability of the emergency
declaration area (EDA) surrounding the
canal.
   The technical goals of the study were
developed by a Technical Review Committee
(TRC) established by the U.S. EPA to
oversee and coordinate the habitability
study.  TRC members represent the U.S.
EPA, the N.Y. State Department of Health,
the U.S. Department of Health and Human
Services/Centers for Disease Control, and
the N.Y. State Department of Environmen-
tal Conservation.  The TRC in turn formed
a scientific advisory panel to formulate
and recommend habitability criteria.
   The scientific panel recommended that
a habitability study be formulated as a
statistical comparison between each EDA
neighborhood and similar areas not in-
fluenced by a hazardous waste site in the
Niagara Falls/Buffalo area.  The compari-
son was to be based on the chemical con-
centrations of 8 indicator chemicals
(LCICs), chosen for mobility and speci-
ficity to the canal.  The comparison was
to provide a 90 percent probability of
detecting an "order of magnitude" differ-
ence in median values with 95 percent
confidence against the null hypothesis
that no difference in median values
                                          70

-------
existed.  The sample design was devel-
oped after internal discussion and a
peer review.  The full criteria can be
found in Love Canal Emergency Declara-
tion Area;  Proposed Habitability Cri-
teria, NYSDOH and DHHS/CDC 1986.
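   [A rough illustration, not the design calculation actually used for
the habitability study: treating log10 concentrations as approximately
normal, an order-of-magnitude difference in medians is a shift of 1 on
the log10 scale, and the usual two-sample normal approximation gives a
per-area sample size.  The log10-scale standard deviation is an assumed
input of the sketch, which is exactly what the pilot study was meant to
estimate.]

```python
from scipy import stats

def samples_per_area(sigma_log10, delta_log10=1.0, alpha=0.05, power=0.90):
    """Per-area sample size for detecting a delta_log10 shift in median
    (1.0 = an order of magnitude) with the stated power, using a
    one-sided level-alpha two-sample normal approximation on the log10
    scale."""
    z_a = stats.norm.ppf(1 - alpha)
    z_b = stats.norm.ppf(power)
    return 2 * (sigma_log10 ** 2) * (z_a + z_b) ** 2 / delta_log10 ** 2

# e.g. samples_per_area(sigma_log10=1.0) -> about 17 samples per area.
```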
   This degree of specificity is unusual
in Superfund-driven environmental
studies.  However, the extensive discus-
sions that preceded agreement on these
criteria served to develop a common ex-
pectation as to what the habitability
study should provide.
   The habitability study, like most en-
vironmental studies, required estimates
of LCIC concentration variability to
design the sampling and analysis plan.
In addition, since previous studies had
found the indicator chemicals to have a
high percentage of undetectable concen-
trations, greater sensitivity of the
chemical analysis was needed.
   A pilot study was designed and imple-
mented to test the field operations re-
quired to collect the samples, test a
new analysis method for estimating LCIC
concentrations, and identify and quan-
tify sources of variability in the con-
centration estimates.
   The Love Canal pilot study illustra-
ted difficulties common to many large
environmental studies, one example being
the tracking of samples from field col-
lection through analysis.  Although a
data base was established to track sam-
ples, the number of changes that oc-
curred in the field required consid-
erable effort to trace during later
stages of the study.  Environmental
studies should establish sample tracking
and documentation as a priority task.
   Laboratory variability, not generally
quantified in many environmental
studies, was found to be significant
during the pilot.  Laboratory operations
were closely tracked as part of the
pilot, yielding quantitative measures of
variability not commonly available
through EPA's contract laboratory pro-
gram.  This close tracking was continued
for the full study, as the detailed in-
formation provides insight about the
quality of the data.
   The pilot provided information on the
sources of variability and bias in the
laboratory estimates of concentration.
Interlaboratory variability was found to
be of similar magnitude as intralabora-
tory variability.  Biases occurred among
laboratories from several causes ranging
from contamination levels in different
lots or brands of reagents to differ-
ences in exact analytical protocols
among laboratories.
   An analytical method specific for
LCICs was developed to achieve suffi-
cient sensitivity to increase the number
of samples with detectable concentra-
tions.  Criticism of earlier studies
about the sensitivity of the analytical
methods led to an effort to estimate the
"detection limits" for the analytical
method.
   The analytical method proposed for
the habitability study uses a gas
chromatograph/mass spectrometer (GC/MS)
instrument similar to that used for most
sensitive organic analyses of environ-
mental samples.  The method differs from
the usual broad scan for a range of com-
pounds.  In this method the instrument
was programmed to search only for LCICs
by a method known as single ion moni-
toring (SIM).
   The GC/MS instrument is sensitive,
providing quantitative estimates of very-
low-level concentrations (parts per bil-
lion range), and accurate, providing a
high level of confidence as to the iden-
tity of the compound.  The proposed
method increased the sensitivity of the
instrument by operating it in the SIM
mode.  This concentrated the instru-
ments' scan time on the LCICs.
   Estimating the "detection limit" for
the GC/MS SIM proved difficult as the
chemometric literature has not dealt ex-
tensively with detection limits for
multivariate instruments.  A sample spe-
cific detection limit cannot be esti-
mated for the GC/MS SIM method.
However, a method detection limit may be
estimated.  The method detection limit
is the concentration at which samples
with a known concentration of a compound
would be differentiated from a blank
95 percent of the time.
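   A common replicate-spike estimator (a sketch, not necessarily the
estimator developed for the habitability study) treats the response as
univariate: analyze n low-level spiked samples and set MDL equal to a
t-quantile times their standard deviation.  EPA's classical procedure
uses a 99 percent criterion; the 95 percent criterion described in the
text corresponds to the default below.  The data in the example are
hypothetical.

```python
import numpy as np
from scipy import stats

def method_detection_limit(spike_results, confidence=0.95):
    """Replicate-spike MDL estimate: t(confidence, n-1) * s, where s is
    the standard deviation of n replicate analyses of a low-level spiked
    sample.  This sidesteps the multivariate GC/MS identification
    problem discussed above and is given only as an illustration."""
    x = np.asarray(spike_results, dtype=float)
    return stats.t.ppf(confidence, df=len(x) - 1) * x.std(ddof=1)

# e.g. method_detection_limit([2.1, 1.8, 2.4, 2.0, 1.7, 2.2, 1.9])  # ppb
```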
   An estimator for the method detection
limit is difficult to develop as identi-
fication and quantification are con-
founded in the GC/MS instrument.  Iden-
tification and quantification is a
multivariate process where a compound is
not quantified unless it is first iden-
tified.  Generally the ability to iden-
tify a compound is lost at concentrations
greater than those at which quantifica-
tion could occur.  Identification often
fails because of chemical interferences
in the chromatogram at low concentra-
tions.  These interferences obscure the
ion area peaks associated with the target
compound.
   Standard estimators of detection
limits (Long and Winefordner 1983, Currie
1968, Hubaux and Vos 1968) are defined in
terms of instruments with univariate re-
sponses.  Since the GC/MS is a multivari-
ate instrument, these methods are not
directly applicable.  New estimators are
required for detection (or identifica-
tion) limits for the GC/MS.  These are
being developed in the full habitability
study.
          SUMMARY AND CONCLUSION

   Environmental studies are increasingly
being mandated by regulatory concern.  In
this potentially litigious context, ex-
ceptional efforts are required to produce
                                          71

-------
sound statistical designs.  The objec-
tives of each study must be clearly
defined and conflicting objectives iden-
tified and mediated before the study can
be successful.
   A good data base management system is
a prerequisite for a successful environ-
mental study.  This system should in-
clude the personnel, software, and hard-
ware to track not only the parameter
estimates but sample locations, se-
quences, quality assurance parameters,
and pointers to original documents.
With environmental studies typically
involving a number of organizations,
inadequacies of data management will
cause difficulties in data defensibility
and interpretation.
   Finally, a good understanding of the
analytical process that produces the
parameter estimates is required.  Ex-
perience with the analytical techniques
used at Love Canal has shown that basic
concepts such as "Detection Limit" are
not as clearly defined as may be neces-
sary.  Statisticians must expect to work
closely with chemists and other experts
to evaluate the impact of chemical con-
centration estimators on study
objectives.
                REFERENCES

Currie, L.  Limits for Qualitative Detec-
tion and Qualitative Determination.  Ana-
lytical Chemistry. Vol. 40, No. 3.  1968.
Pp. 586-593.

Hubaux, A., and G. Vos.  Decision and De-
tection Limits for Linear Calibration
Curves.  Analytical Chemistry. Vol. 42,
No. 8.  1968.  Pp. 849-855.

Long, G. L., and J. D. Winefordner,
Limit of Detection:  A Closer Look at the
IUPAC Definition.  Analytical Chemistry,
Vol. 55, No. 7.  1983.  Pp. 712-724.

New York State Department of Health and
U.S. Department of Health and Human
Services/Centers for Disease Control.
Love Canal Emergency Declaration Area;
Proposed Habitability Criteria.  1986.
                                          72

-------
                                          STATISTICAL COMPARISONS OF STTES:

                                                        DISCUSSION
                                                       R. Clifton Bailey
                                  Environmental Protection Agency, WH586,401 M. Street, SW
                                                    Washington, DC 20460
 The common problem in both papers is the choice of
 comparison areas or sites. The approaches differ in the use of sites
 similar in time versus similar in location.  These choices are made
 for practical considerations. In both cases, traditional statistical
 techniques are stretched to fit the situation.

 Both papers emphasize statistical methods  adapted from traditional
 work  for designed  experiments,  and emphasize  making the
 comparison sites similar. There are, of course, always problems
 with  finding similar sites. Whatever the choice, in controversial
 situations, there is always an advocate to find fault with the choice.
 The problem is especially troublesome since traditional safeguards
 such as randomization are not available.

 Since it is difficult to find an ideal site for comparison, we might
 examine the fundamental question: Do we want just one area for
 comparison? If more than one area is used for comparison, then
 how do we make use of the information from multiple sites?

 If more than one comparison site is used, then we could establish
 relationships between factors thought to be important, and use the
 results for  adjustment. Often this ploy is used in observational
 studies when the  conditions  do not permit the usual design
 interventions such as the random assignment of treatment and
 control sites.

 To establish relationships requires a range of values for variables of
 potential concern.  This means  that we seek a range of site
 conditions which can be used  to establish the  adjustment
 relationships rather than trying to find identical sites. Since identical
 sites  don't exist, we  need enough  information  to construct  a
 hypothetical site analytically  though computed adjustments.
 Consequently more than one control site must be used  unless there
 is an obvious control or reference that cannot be questioned.

 In effect you compute the ideal control site rather than find an
 almost ideal physical site. The control sites serve to calibrate your
 model so that you can adjust for other factors.
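 [One minimal way to make the "computed control" idea concrete, under
 assumed names and a linear form that are not from the paper: regress
 the measured response on site covariates across the comparison sites,
 then predict the response an ideal comparison site with the target
 site's covariates would have shown.]

```python
import numpy as np

def predicted_control_value(covariates, responses, target_covariates):
    """Fit a linear relationship between comparison-site covariates and
    responses, then predict the response at the target site's covariate
    values (the 'computed' control)."""
    X = np.column_stack([np.ones(len(responses)),
                         np.asarray(covariates, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, np.asarray(responses, dtype=float),
                               rcond=None)
    x0 = np.concatenate(([1.0], np.atleast_1d(target_covariates)))
    return x0 @ beta

# e.g. predicted_control_value(covariates=[[5.1], [6.0], [4.2]],
#                              responses=[12.0, 14.5, 10.1],
#                              target_covariates=[5.5])
```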

 In these situations, where traditional statistical designs don't apply,
 there  will always be factors which will differ between sites and
 which someone will point to as an explanation for any difference
 found by traditional statistical tests.

 If we  include a reasonable set of concomitant information to make
 our adjustments, the analysis must be more complex.

Traditional statistical designs usually place a great deal of emphasis
on  making the analysis  simple. When it is not possible to use
traditional  statistical design to make the analysis easy, then  we
 must  consider more  complex  analyses  to circumvent design
 deficiencies.

 Design deficiencies may be circumvented by taking a global
 approach in which we use information from several studies jointly
 for common elements when the information in individual studies is
 inadequate.

 Success with the global  approach requires a modeling effort
 based on a subject matter understanding far beyond the models of
 traditional statistical methods.

 Similar ideas were expressed by W. G. Cochran (1965), The Planning
 of Observational Studies of Human Populations, Journal of the
 Royal Statistical Society, Series A, Vol. 128, Pt. 2, pp. 234-265.
 This is a good source of ideas for dealing with  studies that are
 outside the realm of traditional statistical designs. For example,
 Cochran says:

        Planning of observational studies is found in subject-matter
         departments, not in statistics departments.

        Effective planning of observational studies calls for
        considerable mastery of the subject-matter field.

        In setting up comparisons, the investigator makes do with
        comparisons that are far from ideal.

        The time and resources for analysis of results are often
        underestimated.

        To handle disturbing variables, list the
        variables for which matching or adjustment is
        1) required,
        2) desirable,
        3) disregarded.

        For discussions of association and causation, he cites work
        on path analysis.

In the next section, I give a brief view of path analysis and suggest
environmental statisticians build on the vast experience with this
method of analysis.
 Environmental Path Analysis- Blazing New Trails with Path
Analysis

Why path analysis? Path analysis was developed to solve complex
problems in genetics. Path analysis helps in formulation of complex
environmental problems and provides a conceptual framework for
dealing with complex problems.

In path analysis one  graphically depicts causal pathways   to
represent statistical relationships among variables.

A simple example might depict a linear relationship between x and
y as follows:

[Path diagram not reproduced in this extraction.]

The arrow shows the direction of the relationship and U represents the
unexplained sources of variation in Y.
                                                            73

-------
The path diagram for multiple regression might be drawn as shown
below:

[Path diagram not reproduced in this extraction.]

Path analysis extends single and multiple regression equations to
networks of variables involving more than one equation.  Some
writers omit U, the unexplained variation, and the correlation
arrows between x's from path diagrams.
According to Ching Chun Li (1975) in  Path Analysis—a primer.
The Boxwood Press, Pacific Grove, CA p. 166
       "When properly analyzed, each path diagram will yield a
       set of consistent  conclusions for that particular causal
       system."

For each environmental problem similar to those described by Allan
Stewart-Oaten,

        η1  might denote a site of concern or impact,

        η2  a control site,

        ξ1  a problem that affects the site of concern, η1, but not
                the control site η2.
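[For a recursive model such as this one, path coefficients reduce to
least-squares coefficients computed on standardized variables.  The
sketch below is an illustration under that assumption; the variable
names are invented for the example and are not from either paper.]

```python
import numpy as np

def standardized_path_coefficients(y, *xs):
    """Path coefficients of one recursive equation, obtained (as in
    Wright's method for recursive path models) as least-squares
    coefficients after standardizing every variable to mean 0, variance 1."""
    z = lambda v: (v - v.mean()) / v.std(ddof=0)
    X = np.column_stack([z(np.asarray(x, float)) for x in xs])
    coef, *_ = np.linalg.lstsq(X, z(np.asarray(y, float)), rcond=None)
    return coef

# Hypothetical two-equation model for the diagrams below:
#   impact  <- problem, common_factor
#   control <- common_factor
# p_impact  = standardized_path_coefficients(impact, problem, common_factor)
# p_control = standardized_path_coefficients(control, common_factor)
```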
                       In conclusion, path analysis provides a way to very quickly build
                       complex models involving many variables.  Some authors have
                       extended path analysis to nonlinear models.  Nonlinear path models
                       allow the investigator a broad range of modeling possibilities and a
                       framework for organizing their work.  References are provided to
                       assist the interested reader with the basic concepts of path analysis.
                      Disclaimer:
                      The opinions and assertions expressed in this paper are the private
                      conclusions of the author and should not be construed as official or
                      reflecting the views of the U. S. Environmental Protection Agency.

                      References:

                      Sewall Wright Evolution and the Genetics of Populations. Volume
                             1 Genetic And Biometric Foundations:
                             Chapter 13. Path Analysis: Theory, pp. 299-324.
                             Chapter 14. Interpretation by Path Analysis, pp. 325-372.

                      S. Wright (1921) Correlation and causation. Jour. Agric. Res. 20:
                             557-85.

                      J. W. Tukey (1954). Causation, regression and path analysis. In
                             Statistics and Mathematics in Biology, eds. O.
                             Kempthome, T. A. Bancroft, J. W. Gowen, and J. L.
                             Lush, pp.35-66. Ames: Iowa State University Press.

                      M. E. Turner and C. E. Stevens (1959). The regression analysis of
                             causal paths. Biometrics IS: 236-258.

                      Ching Chun Li (1975) Path Analysis—a primer. The Boxwood
                             Press, Pacific Grove, CA

                      Studies in Econometric Method by Cowles Commission Research,
                             Edited by Wm. C. Hood and Tjalling C. Koopmans
                             (1953):
                             Chapter n. Identification Problems in
                             Economic Model Construction.
                             By Tjalling C. Koopmans,  pp.  27-48

                             Chapter III. Causal Ordering and Identifiability.
                             By Herbert A. Simon, pp.49-74
[Path diagram not reproduced: the Problem (ξ1) points to the
consequence at the Impact site but not to the consequence at the
Control site.]

A more realistic model includes the common effect of other variables,
for example ξ2, on both sites:

[Path diagram not reproduced: the Problem (ξ1) points to the
consequence at the Impact site, while Common Environmental Factor(s)
(ξ2) point to the consequences at both the Impact and Control sites.]
                                                            74

-------
                           APPENDIX A: Program
            ASA/EPA CONFERENCE ON SAMPLING AND SITE SELECTION
                          IN ENVIRONMENTAL STUDIES
                                 May 14-15, 1987
                                 Thursday, May 14

8:50 a.m.      INTRODUCTION
              Walter S. Liggett, National Bureau of Standards
              Dorothy Wellington, U.S. Environmental Protection Agency
I.  THE STATISTICAL BASIS: RANDOMIZATION AND PROCESS CONTROL
9:00 a.m.      Some Statistical Issues in the Collection and Analysis of Environmental
              Data
              George C. Tiao, University of Chicago
10:00 a.m.     BREAK
10:10 a.m.     Sampling Design: Some Very Practical  Considerations
              Douglas E. Splitstone, IT Corporation
11:10 a.m.     DISCUSSION
              Maurice E.B. Owens, Science Applications International Corporation
              W. Barnes Johnson, U.S. Environmental  Protection Agency
12:10 p.m.     LUNCH

II.  INFERENCE ON CONTINUOUS SPATIAL DISTRIBUTIONS
1:30 p.m.      Spatial Prediction
              Noel A.C. Cressie, Iowa State University
2:30 p.m.      BREAK
2:40 p.m.      Spatial Autocorrelation:  Implication for Sampling and Estimation
              Evan Englund, U.S. Environmental Protection Agency, EMSL-Las Vegas
3:40 p.m.      DISCUSSION
              John Warren, U.S. Environmental Protection Agency
4:00 p.m.      RECEPTION
                                     75

-------
                                   Friday, May 15
III.  DESIGNS BASED ON AUXILIARY INFORMATION

9:00 a.m.      Sampling and Modeling Superfund Site Pollutant Plumes: Methods
              Combining Direct Measurements and Remote Sensing Data
              Ralph E. Folsom, Research Triangle Institute

10:00 a.m.     BREAK

10:10 a.m.     "Validation" of Air Pollution Dispersion Models
              Anthony D. Thrall, Electric Power Research Institute

11:10 a.m.     DISCUSSION
              Robert W. Jernigan, The American University
              Richard A. Livingston, U.S. Environmental Protection Agency and
              University of Maryland

12:10 noon     LUNCH
IV.  STATISTICAL COMPARISON OF SITES

1:30 p.m.      Assessing Effects on Fluctuating Populations: Tests, Diagnostics, and
              Decisions
              Allan Stewart-Oaten, University of California-Santa Barbara

2:30 p.m.      BREAK

2:40 p.m.      Comparison with Background Environment: Strategies for Design
              Bruce Peterson, CH2M Hill

3:40 p.m.      DISCUSSION
              R. Clifton Bailey, U.S.  Environmental Protection Agency

4:10 p.m.      CLOSING REMARKS
              Walter S. Liggett, National Bureau of Standards
                 This   Conference  is  the  third  in a  series  of  research
             conferences  organized by  the American Statistical  Association,
             supported by a cooperative agreement between ASA and the Office
             of Standards and Regulations, under the Assistant Administrator
             for Policy Planning and Evaluation, U.S.  Environmental Protection
             Agency.
                          Conference Chair and Organizer:
                   Walter S. Liggett, National Bureau of Standards
                                      76

-------
                       APPENDIX B: Conference Participants

            ASA/EPA CONFERENCE ON SAMPLING AND SITE SELECTION
                          IN ENVIRONMENTAL STUDIES

                                May 14-15, 1987

                               Holiday Inn Capital
                                Washington, D.C.
 Joe Abe
 U.S. EPA
 401 M Street, S.W. (WH-565E)
 Washington, DC  20460

 Ruth H. Allen
 U.S. EPA
 Research and Development
 401 M Street, S.W.
 Washington, DC  20460

 Glenn F. Atkinson
 Environment Canada
 Fifth Floor, Place Vincent Massey
 Ottawa, Canada  K1A 0H3

 R. Clifton Bailey
 U.S. EPA
 401 M Street, S.W. (WH-586)
 Washington, DC  20460

 James C. Baker
 U.S. EPA, Region 8
 999 18th Street, Suite 501
 Denver, CO 80202-2405

 Ted O. Berner
 Battelle Columbus Division
 2030 M Street, N.W., Suite 700
 Washington, DC  20036

 Gordon D. Booth
 USDA Forest Service
 797 East 5050 South
 Ogden, UT  84403

 Jill J. Braden
 Westat, Inc.
 1650 Research Boulevard
 Rockville, MD  20850

 Dennis L. Brandon
 USAE Waterways Experiment Station
 P.O. Box 631
Vicksburg, MS  39180

 Christine M. Bunck
U.S. Department  of Interior
 Patuxent Wildlife Research Center
8102 Triple Crown Road
Bowie, MD  20715
                                    77
 Thomas J. Bzik
 Air Products and Chemicals, Inc.
 P.O. Box 538
 Allentown, PA  18105

 Jean Chesson
 Bertram Price Associates, Inc.
 2475 Virginia Avenue, #213
 Washington, DC 20037

 Robert P. Clickner
 Westat, Inc.
 1650 Research Boulevard
 Rockville, MD 20850

 Margaret G. Conomos
 U.S. EPA
 Exposure Evaluation Div. (TS-798)
 401 M Street, S.W.
 Washington, DC 20460

 Gerald F. Cotton
 NOAA/U.S. Department of Commerce
 Air Resources Laboratory, R/E/AR
 8066 13th Street
 Silver Spring, MD  20910

 James Craig
 U.S. EPA (WH-563)
 401 M Street, S.W. Room 2817 WSM
 Washington, DC 20460

 Noel A.C. Cressie
 Iowa State University
 Department of Statistics
 Ames, IA 50011

 James M. Daley
 U.S. EPA
 Office of Standards & Regulations
 Washington, DC 20460

 Stephen K. Dietz
 Westat, Inc.
 1650 Research Boulevard
 Rockville, MD  20850

 Don Edwards
 Department of Statistics
University of South Carolina
 Columbia, SC 29208

-------
Evan Englund
U.S. EPA, EMSL-Las Vegas
P.O. Box 15027
Las Vegas, NV 89114

Barrett P. Eynon
SRI International
333 Ravenswood Avenue
Menlo Park, CA 94025

Paul Flyer
Westat, Inc.
1650 Research Boulevard
Rockville, MD 20850

Ralph E. Folsom
Research Triangle Institute
P.O. Box 12194
Research Triangle Park, NC 27709

Ruth E. Foster
U.S. EPA
401 M Street, S.W. (PM-223)
Washington, DC 20460

Robert Fusaro
Columbia University
P.O. Box 166, Route 9D
Garrison, NY 10524

Paul H. Geissler
U.S. Department of Interior
Patuxent Wildlife Research Center
12504 Windover Turn
Bowie, MD  20715

Michael Ginevan
Environ Corporation
1000 Potomac Street, N.W.
Washington, DC  20007

Leigh Harrington
ERIM
1501 Wilson Boulevard, #1105
Arlington, VA 22209

Cynthia L. Hertzler
EG&G Idaho
P.O. Box 1625
Idaho Falls, ID 83415

Todd Higgins
U.S. Army Corps of Engineers
P.O. Box 631
Waterways Experiment Station
Vicksburg, MS 39180

W. Ted Hinds
U.S. EPA
401 M Street, S.W. (RD-680)
Washington, DC  20460
Matthew Hnatov
U.S. EPA
401 M Street, S.W.
Washington, DC 20460

Thomas T. Holloway
U.S. EPA
Region 7 Laboratory
25 Funston Road
Kansas City, KS 66115

Robert W. Jernigan
The American University
Math Dept., Clark Hall, Rm. 215
4400 Massachusetts Avenue, N.W.
Washington, DC 20016

W. Barnes Johnson
U.S. EPA
401 M Street, S.W. (PM-223)
Washington, DC 20460

Henry Kahn
U.S. EPA
401 M Street, S.W.
Washington, DC 20460

Louise C. Kern
Texaco, Inc.
P.O. Box 1404
Houston, TX  77251-1404

Debra S. Knopman
U.S. Geological Survey
MS 410, National Center
Reston, VA 22092

Charles T. Kufs
Roy F. Weston,  Inc.
5-2 Weston Way
West Chester, PA 19380

Herbert Lacayo
U.S. EPA
401 M  Street, S.W., (PM-223)
Washington, DC 20460

Emanuel Landau
American Public Health Assn
1015  15th Street, N.W.
Washington, DC 20005

Barbara Leczynski
Battelle Columbus Division
2030 M Street, N.W., Suite 700
Washington, DC 20036

Walter S.  Liggett
National Bureau of Standards
Administration  Building, A337
Gaithersburg, MD 20899
                                     78

-------
 Richard A. Livingston
 University of Maryland
 Geology Department
 College Park, MD  20742

 Gordon R. Luster
 Woodward-Clyde Consultants
 100 Pringle Avenue, Suite 300
 Walnut Creek, CA  94596-3564

 David Marker
 Westat, Inc.
 1650 Research Boulevard
 Rockville, MD  20850

 Raymond E. McIntyre
 Fairfax County Air Pollution
  Control
 10777 Main Street, Suite 100A
 Fairfax, VA 22030

 Daniel I. Michael
 Research  Triangle Institute
 1717 Massachusetts Avenue, N.W.
 Suite 102
 Washington, DC 20036

 John Michael
 Westat, Inc.
 11725 Happy Choice Lane
 Gaithersburg, MD  20878

 Steven P.  Millard
 University of Washington
 Biostat. Consulting Unit, SB-77
 Seattle, WA 98105

 Patricia A. Mundy
 U.S. EPA
 RD-680, 401 M Street S.W.
 Washington, DC 20460

 Patricia Murray
 U.S. EPA  (WH-563)
 401 M Street, S.W., Room SE264 WSM
 Washington, DC 20460

 Avis D. Newell
 Northrop Services
 Environmental Research Lab
 200 SW 35th Street
 Corvallis,  OR 97333

Maurice E.B. Owens
Science Applications Int'l Corp.
8400 West Park Drive
McLean, VA 22102

Tim B. Parkin
U.S. Department of Agriculture
ARS, Building 007, Room 229, BARC-W
Beltsville, MD  20705
 Bruce Peterson
 CH2M Hill
 P.O. Box 91500
 Bellevue, WA  98009

 Kurt H. Riiters
 Forest Response Program
 299 SW 35th Street
 Corvallis, OR  97333

 Douglas S. Robson
 Cornell University
 Biometrics Unit, Warren Hall
 Ithaca, NY 14853

 John W. Rogers
 Westat, Inc.
 1650 Research Boulevard
 Rockville, MD 20850

 Charles A. Rohde
 Johns Hopkins University
 Department of Biostatistics
 615 N. Wolfe Street
 Baltimore, MD  21205-3179

 Marie L. Saint-Louis
 EG&G Idaho, Inc.
 P.O. Box 1625
 Idaho Falls, ID 83415

 Paul D. Sampson
 University of Washington
 Department of Statistics, GN-22
 Seattle, WA 98195

 Paul A. Schiffelbein
 DuPont Company
 Engineering Dept., Louviers 51N13
 P.O. Box 6090
 Newark, DE 19714-6090

 Bradley D. Schultz
 U.S. EPA (TS-798)
 Design Development Branch
 401 M Street, S.W., Room E309
 Washington, DC 20460

 Luther A. Smith
 North Carolina State University
 Atmospheric Impacts Research Program
 2501 Anne Carol Court
 Raleigh, NC  27603

 Carol J. Spease
U.S. Bureau of Labor Statistics
 12405 Braxfield Court, #14
 Rockville,  MD  20852
                                      79

-------
Douglas E. Splitstone
IT Corporation
10 Duff Road
Pittsburgh, PA  15235

Stephen V. Stehman
Cornell University
337 Warren Hall
Ithaca, NY  14853

Allan Stewart-Oaten
University of California
Department of Biology
Santa Barbara, CA  93106

Therese A. Stukel
Dartmouth Medical School
HB7927
Hanover, NH  03756

Clayton L. Stunkard
U.S. EPA
Statistical Policy Branch
401 M Street, S.W.
Washington, DC 20460

Charles Taillie
Pennsylvania State University
CSEES
316 Pond Lab
University Park, PA 16802

Cassie Thompson
Westat, Inc.
1650 Research Boulevard
Rockville, MD  20850

Anthony D. Thrall
Electric Power Research Institute
3412 Hillview Avenue
Palo Alto, CA  94303

George C. Tiao
University of Chicago
Graduate School of Business
1101 E. 58th Street
Chicago, IL  60637

Peter Tong
U.S. EPA
401 M Street, S.W. (WH-548A)
Washington, DC 20460
Y.K. Tung
University of Wyoming
P.O. Box 3322 University Station
Laramie, WY  82071

Alta Turner
EBASCO Services
160 Chubb Avenue
Lyndhurst, NJ  07071

Paul Wakim
American Petroleum Institute
1220 L Street, N.W.
Washington, DC  20005

Elbert Walker
New Mexico State University
Mathematics Department
Las Cruces, NM  88003

John Warren
U.S. EPA
401 M Street, S.W.  (PM-223)
Washington, DC  20460

Herbert I. Weisberg
Consulting Statisticians, Inc.
20 William Street
Wellesley, MA 02181

Dorothy Wellington
U.S. EPA
401 M Street, S.W.  (PM-223)
Washington, DC  20460

Louis Wijnberg
PEI Associates Inc.
310 Blue Ridge Road
Carrboro, NC  27510

Jack W. Zolyniak
Martin Marietta  Energy Systems
P.O. Box P, K-1004D, MS-278
Oak Ridge, TN 37831

American Statistical Association

Fred C. Leone
Executive Director

Mary Esther Barnes
Conference Coordinator
                                        80

-------