United States
Environmental Protection
Agency
Office of Policy, Planning
and Evaluation
Washington, DC 20460
EPA-23O08-8S035
Statistical Policy Branch
ASA/EPA Conferences on
Interpretation of
Environmental Data
Sampling and Site Selection
In Environmental Studies
May 14-15, 1987
-------
DISCLAIMER
This document has not undergone final review within EPA and should
not be used to infer EPA approval of the views expressed.
-------
PREFACE
This volume is a compendium of the papers and commentaries that were presented at
the third of a series of conferences on interpretation of environmental data conducted by
the American Statistical Association and the U.S. Environmental Protection Agency's
Statistical Policy Branch of the Office of Standards and Regulations/Office of Policy,
Planning, and Evaluation.
The purpose of these conferences is to provide a forum in which professionals from
the academic, private, and public sectors exchange ideas on statistical problems that
confront EPA in its charge to protect the public and the environment through regulation of
toxic exposures. They provide a unique opportunity for Agency statisticians and scientists
to interact with their counterparts in the private sector.
The conference itself and these proceedings are primarily the result of the efforts of
the authors and discussants. The discussants not only provided their input to the
proceedings but also reviewed the papers for the purpose of suggesting changes to the
authors. The coordination of the conference and of the publication of the proceedings was
carried out by Mary Esther Barnes and Lee L. Decker of the ASA staff. The ASA
Committee on Statistics and the Environment was instrumental in developing this series of
conferences.
The views presented in this conference are those of individual writers and should not
be construed as reflecting the official position of any agency or organization.
Following the first conference, "Current Assessment of Combined Toxicant Effects,"
in May 1986, and the second conference, "Statistical Issues in Combining Environmental
Studies," in October 1986, the third conference, "Sampling and Site Selection in
Environmental Studies," was held in May 1987. One additional conference, "Compliance
Sampling," was held in October 1987. A proceedings volume will also be published from
this conference.
Walter Liggett, Editor
National Institute of Standards and Technology
-------
INTRODUCTION
The eight papers and accompanying discussions in these proceedings are about drawing
conclusions in environmental studies. These papers provide valuable guidance in the
planning of future environmental studies.
The papers address many aspects of environmental studies. The studies discussed
involve air, water, ground water, and soil. These studies are aimed at specific goals as
diverse as the effect of a regulatory intervention, the design of a remediation effort, the
spatial distribution of a hazardous material, the validity of an environmental model, and
the impact of a power plant. Some studies emphasize in addition the planning of the field
work and of the chemical analyses in the laboratory. The studies employ techniques from
various statistical areas such as probability sampling, response surface analysis, optimal
design of experiments, time series analysis, spatial prediction, power transformations, and
the analysis of variance. In the planning of an environmental study when almost all
options are still open, most of these aspects are potentially relevant.
These proceedings are intended for statisticians involved in the planning of
environmental studies. Statistical planning is based on anticipation of the statistical
analysis to be performed so that the necessary data can be collected. These proceedings
should help the statistician anticipate the analysis to be performed. In addition, the
papers discuss implications for planning new studies. No general prescriptions for planning
are offered; none may be possible.
The emphases in these papers are quite different. No two authors have chosen the
same aspect of environmental studies to examine. This diversity among authors who have
all invested considerable time and effort in environmental studies suggests a major
challenge. The challenge is to consider carefully each study aspect in the planning
process. Meeting this challenge will require a high degree of professionalism from the
statistician involved in an environmental study.
Walter Liggett, Editor
National Institute of Standards and Technology
-------
INDEX OF AUTHORS
Bailey, R. Clifton 73
Cressie, Noel A.C 25
Englund, Evan J 31
Folsom, Ralph E 41
Hudak, G 1
Jernigan, Robert W 54
Johnson, W. Barnes 23
Liu, Lon-Mu 1
Livingston, Richard A 55
Peterson, Bruce 70
Splitstone, Douglas E 15
Stewart-Oaten, Allan 57
Thrall, Anthony D 52
Tiao, George C 1
Warren, John 40
-------
TABLE OF CONTENTS
Preface iii
Introduction. WALTER S. LIGGETT, National Institute of Standards and
Technology iv
Index of Authors v
I. THE STATISTICAL BASIS: RANDOMIZATION AND PROCESS CONTROL
A Statistical Assessment of the Effect of the Arizona Car Inspection/
Maintenance Program on Ambient CO Air Quality in Phoenix, Arizona.
LON-MU LIU, G. HUDAK, GEORGE C. TIAO, University of Chicago 1
Sampling Design: Some Very Practical Considerations. DOUGLAS E.
SPLITSTONE, IT Corporation 15
Discussion. W. BARNES JOHNSON, U. S. Environmental Protection
Agency 23
II. INFERENCE ON CONTINUOUS SPATIAL DISTRIBUTIONS
Spatial Prediction and Site Selection. NOEL A. C. CRESSIE, Iowa
State University 25
Spatial Autocorrelation: Implications for Sampling and Estimation.
EVAN J. ENGLUND, U. S. Environmental Protection Agency 31
Discussion. JOHN WARREN, U. S. Environmental Protection Agency 40
III. DESIGNS BASED ON AUXILIARY INFORMATION
Sampling and Modeling Pollutant Plumes: Methods Combining Direct
Measurements and Remote Sensing Data. RALPH E. FOLSOM, Research
Triangle Institute 41
"Validation" of Air Pollution Dispersion Models. ANTHONY D.
THRALL, Electric Power Research Institute 52
Modeling Pollutant Plumes. ROBERT W. JERNIGAN, The American
University and Statistical Policy Branch, Environmental
Protection Agency 54
Estimating the Spatial Uncertainty of Inferred Rates of Dry
Acidic Deposition. RICHARD A. LIVINGSTON, University of Maryland 55
IV. STATISTICAL COMPARISON OF SITES
Assessing Effects on Fluctuating Populations: Tests and Diagnostics.
ALLAN STEWART-OATEN, University of California-Santa Barbara 57
Comparisons with Background Environment: Strategies for Design.
BRUCE PETERSON, CH2M Hill 70
Discussion. R. CLIFTON BAILEY, U. S. Environmental Protection Agency 73
Appendix A: Program 75
Appendix B: Conference Participants 77
-------
A STATISTICAL ASSESSMENT OF THE EFFECT OF THE ARIZONA CAR INSPECTION/MAINTENANCE PROGRAM
ON AMBIENT CO AIR QUALITY IN PHOENIX, ARIZONA
Lon-Mu Liu
G. Hudak
G. C. Tiao
University of Chicago, Graduate School of Business, 1101 E. 58th Street, Chicago, IL 60637
1. INTRODUCTION AND SUMMARY OF FINDINGS
This paper presents a statistical analysis
of ambient carbon monoxide concentrations over
the period 1971 to 1982 at three air
monitoring sites in the state of Arizona. All
three sites (Phoenix Central, Sunnyslope, and
Phoenix South) are located in the Phoenix area,
where a vehicle inspection and maintenance
(I/M or VEI) program has been in effect since
January 1977. The principal objectives of
this study are to assess the trend in the CO
concentrations which can be associated with
the Federal Motor Vehicle Emission Control
Program and to determine whether or not the I/M
program in the Phoenix area has had a positive
impact on the concentration level.
A summary of our principal findings is
given in this section. Section 2 provides a
description of the nature of the carbon
monoxide, traffic and meteorological data used
in the analysis. In Section 3 a preliminary
trend analysis based on the CO readings alone
is given. In Section 4 diurnal models
relating CO to traffic and meteorological
factors are constructed. Such models serve to
identify the major exogenous variables
affecting CO. In Section 5, time series
intervention models for monthly means of CO
readings are given. Traffic volume and
relative humidity (a proxy for mixing height)
are used as exogenous variables, and
appropriate functional forms to model the
effects of Federal emission standards and the
I/M program are constructed. Finally in
Section 6, a summary of the main results in
trend analysis together with an assessment of
the impact of the I/M program is presented.
Our principal findings are as follows:
(i) At all three sites one can observe a
reduction in ambient CO concentration
levels. The decrease at Phoenix Central is
the largest. Considering the CO
concentrations alone, this decrease is about
3.6% per year over the period 1973-1982.
Further, the reduction at Phoenix Central is
higher in the winter months, 5.7% per year,
than in the summer months, 2.3% per year.
(ii) Based on models for the monthly means of
CO at Phoenix Central which adjust for the
effects of traffic changes and meteorological
variations, the winter trend reduction is
estimated at about 7.1% per year while the
summer reduction is about 2.3% per year.
(iii) Yearly percentage changes in CO (based
on adjusted monthly readings) at Phoenix
Central are compared with two sets of emission
factors derived from MOBILE3 analysis. One
set of factors includes the expected effects
of the I/M program and the other does not. It
is found that, over the period 1973-1982, the
year to year changes in CO concentrations in
the winter months are in good agreement with
changes in the set of emission factors which
includes the effects of the I/M program.
Provided that the emission factors are
accurate, there is then some evidence from the
observed ambient CO concentration levels to
support the hypothesis that the I/M program has
had a positive impact on ambient CO levels.
(iv) Analyses of diurnal models of CO over
the period 1977-1982 produced trend estimates
largely consistent with estimates based on
monthly averages.
2. Data Base
2.1 The Arizona Inspection and Maintenance
Program
The United States Clean Air Act Amendments
of 1977 require that certain states implement
vehicle inspection and maintenance programs
(I/M programs) in certain of their major
cities to reduce hydrocarbon (HC) and carbon
monoxide (CO) emissions from gasoline powered
vehicles. Arizona Revised Statutes §36-1772,
which established the Vehicle Emissions
Inspection (VEI) Program, requires that
gasoline powered vehicles pass an annual
inspection to ensure that their exhaust
emissions meet standards established by the
Department of Health Services (DHS). This
program was initiated in 1976 on a trial basis
in the Phoenix and Tucson areas, and repairs
became mandatory in January 1977. The
inspection requirement applies generally to
gasoline powered vehicles which are less than
14 years old and located within designated
portions of Pima and Maricopa Counties which
do not meet the carbon monoxide standards of
the Federal Clean Air Act. Under the VEI
program, vehicles are tested annually to
ensure that carbon monoxide and hydrocarbons
in their exhaust emissions meet standards
established by DHS. Motorists whose vehicles
fail to meet these standards must repair their
vehicles and submit to a retest.
Data on the annual number of inspections
and failure rates were obtained from the Arizona
DHS. Generally speaking, the monthly failure
rates between 1977 and 1983 range between 15
and 25 percent, and they are higher in 1979
and 1980 than in other years.
2.2 Carbon Monoxide Data (CO)
The data consist of hourly CO
concentrations (ppm) recorded at three Phoenix
sites. The Phoenix CO measurement locations
are: Phoenix Central (at 1845 E. Roosevelt in
downtown), Sunnyslope (at 8531 N. 6th Street),
and Phoenix South (at 4732 S. Central).
The hourly data on CO concentrations at
these three stations vary in length: Phoenix
Central (71/1 - 82/12), Sunnyslope (74/10 -
82/12), and Phoenix South (74/10 - 82/12).
Missing data occur at all three sites.
Phoenix Central has one month (January 1979)
in which data are completely missing; the
other two sites have several months completely
missing.
-------
2.3 Traffic (TR)
Many studies have shown that ambient CO
concentrations are approximately proportional
to traffic (Tiao, Box and Hamming (1975), Tiao
and Hillmer (1978), Ledolter and Tiao (1979b),
Tiao, Ledolter and Hudak (1982)). It is,
therefore, necessary to incorporate changes in
traffic into the trend analysis. Ideally, one
would want to use traffic data recorded
throughout the area affecting CO measurement
sites over the entire period under study.
Unfortunately, such detailed data are not
available for this study.
Estimates of relative traffic volume per
day in the vicinity of Phoenix Central and
Phoenix South between 1970 and 1983 were
provided by the Bureau of Air Quality Control
(BAQC), Arizona Department of Health
Services. These figures are listed in Table
2.1. Adjustment fractions for different months
of a year, different days of a week, and
different hours of a day were also provided by
the Bureau of Air Quality Control and are
listed in Table 2.2.
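As a rough illustration of how the figures in Tables 2.1 and 2.2 can be used together, the sketch below scales an annual relative volume by a month adjustment, a day-of-week adjustment, and an hourly share. The multiplicative combination rule is an assumption made here for illustration only; the paper does not spell out how the fractions are to be combined.

```python
# Hedged sketch: approximate relative traffic for a given month, weekday, and
# hour by scaling an annual relative volume (Table 2.1) with the adjustment
# fractions of Table 2.2.  The multiplicative rule is an assumption for
# illustration; it is not stated explicitly in the paper.
annual_volume = 208.3        # relative traffic per day, Phoenix Central, 1976 (Table 2.1)
month_adjustment = 1.00      # e.g., October (Table 2.2)
day_adjustment = 1.00        # e.g., Thursday (Table 2.2)
hour_fraction = 0.069        # e.g., hour 8 share of weekday traffic (Table 2.2)

adjusted_daily = annual_volume * month_adjustment * day_adjustment
hourly_traffic = adjusted_daily * hour_fraction
print(f"adjusted daily volume: {adjusted_daily:.1f}")
print(f"hour-8 relative volume: {hourly_traffic:.2f}")
```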
2.4 Meteorological Data
Apart from traffic, variations in ambient
CO concentrations are to a large extent
affected by meteorological conditions:
(a) wind speed and wind direction affect the
transport and diffusion of CO emissions (with
low wind speed resulting in high CO levels);
(b) inversion (mixing heights) affects the
volume of air available for dilution of CO
emissions;
(c) temperature, solar intensity and relative
humidity are related to the duration and
intensity of temperature inversions and the
degree of vertical mixing;
(d) meteorological variables (such as
temperature) influence the efficiency factors
of car engines (cold starts leading to higher
CO levels). Thus, one should consider
incorporating these variables as exogenous
factors in a trend analysis of CO.
Meteorological data were obtained from the
National Climatic Center (NCC) for the
monitoring station of the National Weather
Service in Phoenix (near Phoenix Central).
Data on wind speed (WS, knots), wind direction
(WD, tens of degrees), temperature (TP, °F),
relative humidity (RH, percent) and
precipitation frequency (PREC, 0-60) were
obtained for the time period 1971/1 -
1982/12. Unfortunately, mixing height data
were not available. A closely related
variable whose data were available is delta
temperature (ΔT). The ΔT variable is the
difference in temperature readings recorded at
different heights. Since ΔT is a measure
of atmospheric stability, it may be used as a
proxy for mixing heights. Hourly ΔT
measurements were made by the BAQC at 6th
Street and Butler, where

    ΔT = temperature (°F) at 30 feet - temperature (°F) at 8 feet.

The hourly ΔT data are available between
77/5 - 82/12 but with many missing days and
months.
3. Preliminary Trend Analysis of CO
The primary objective of this study is to
assess statistical evidence of the impact of
the Arizona I/M program on the ambient CO
concentration levels. Our approach to this
problem is as follows. We first present in
this section a preliminary trend analysis
based on the CO data alone. The effects of
traffic changes and the influences of
meteorological variables are then considered
in the next two sections (Sections 4 and 5)
where various models relating CO at Phoenix
Central to these exogenous factors are
constructed. Finally, in Section 6, we compare
the model-based trend estimates with the
expected reduction in CO emissions based on
EPA MOBILE3 analysis applied to the Phoenix
Central location with and without the I/M
program.
3.1 Preliminary CO Trend Analysis for
Phoenix Central
Figure 3.1 shows monthly means of CO at
Phoenix Central. It is seen that there was an
apparent down trend in CO over the period
1973-1982. (We have been informed by the DHS
that CO data for the period 1971/1 - 1972/3
received by us are incorrect and hence they
have not been used in our analysis.) As a
first approximation, if we assumed a linear
time trend operating from 1973-1982, then the
estimated CO reduction in the yearly means
would be about 3.6% per year. Further study
of the data shows that the percentage
reduction is higher in the winter months (Oct.
- Feb.) than in the summer months (April -
August). Specifically, over the period 1973-
1982, the decrease is about 2.3% per year in
the summer months and 5.7% per year in the
winter months.
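The percent-per-year figures quoted here can be reproduced, in outline, by regressing the logarithm of mean CO on time: for small slopes, 100 times the slope of log CO on year is approximately the percentage change per year. The sketch below is a minimal illustration on made-up yearly means, not the Phoenix Central record.

```python
import numpy as np

# Hedged sketch: estimate a percent-per-year trend by regressing log yearly
# mean CO on year.  The yearly means below are illustrative values only.
years = np.arange(1973, 1983)
co_means = np.array([8.9, 8.6, 8.4, 8.1, 7.9, 7.5, 7.3, 7.0, 6.8, 6.5])  # ppm (hypothetical)

# Linear trend in log(CO): log(CO) = a + b*year + error
X = np.column_stack([np.ones(len(years)), years.astype(float)])
slope = np.linalg.lstsq(X, np.log(co_means), rcond=None)[0][1]

# For a small slope b, exp(b) - 1 is the proportional change per year.
print(f"estimated trend: {100 * (np.exp(slope) - 1):.1f}% per year")
```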
3.2 Preliminary CO Trend Analysis for
Sunnyslope and Phoenix South
For the Sunnyslope location, Figure 3.2
shows that over the period 1974-1982 there are
many months for which data were missing. If
we exclude the four years containing months of
missing data (i.e., 1974, 1979, 1980 and
1982), we see that the reduction in CO is
smaller than that at the Phoenix Central
location. In particular, the estimated
reduction in the yearly means of CO here is
about 2.6% per year.
Similarly, for the Phoenix South site, we
see from Figure 3.3 that there are gaps of
missing data. Excluding the three years 1974,
1979 and 1980, the estimated reduction in the
yearly means of CO is about 2.4% per year,
which is again smaller than that at the
Phoenix Central site.
3.3 General Remarks
The preliminary trend calculations were
based on the CO readings alone. Recall from
Table 2.1 that the traffic volume increased
steadily over the years 1970-1982 in the
Phoenix area. Thus, the estimated reduction
in CO emissions would be higher than the
figures given above when the traffic increases
are factored into the analysis. Also, the
preceding analysis did not take into account
the influence of meteorological variables. We
now turn to discuss various models relating CO
to exogenous traffic and meteorological
factors. In view of the fact that the Phoenix
Central site has the longest and most complete
data record on CO and that the concentration
level is also much higher there than at the
other two sites, we shall confine
ourselves to the Phoenix Central data in our
modeling study.
4. Diurnal Models of CO
The focus of this section is on modeling
the diurnal behavior of CO which serves to
identify the main factors affecting CO, and
thus motivates the trend models discussed in
Sections 5 and 6.
As mentioned in Section 2, detailed traffic
information, such as hourly traffic counts,
was not available for the area influencing the
CO measurement sites. In what follows, we use
the estimated relative traffic volume over
1971-1982 pertaining to Phoenix Central and
the adjustment fractions for different months,
days of the week, and hours of the day provided by BAQC (see
Tables 2.1 and 2.2).
We have found that analyses using
logarithmic transformation of the variables
seem to be more meaningful than those using
the original variables. The following
notations will be employed:
Carbon Monoxide:        LCO = ln(CO + 0.25)
Traffic:                LTR = ln(TR)
Inverse of Wind Speed:  LIW = ln(1/(WS + 0.25))
Temperature:            LTP = ln(TP)
Relative Humidity:      LRH = ln(RH)
Delta Temperature:      LΔT = ln(ΔT + c), where c = 2.0 for winter seasons
                        and c = 4.0 for summer seasons
4.1 Formulation of the Models
In general, we can write a model relating
CO to the input traffic and meteorological
variables as

    CO = g(TR) f(met.)

where g(TR) is a function of the input traffic
and f(met.) is a function of meteorological
variables. The function g(TR) will depend,
among other things, on the units of
measurements of CO and TR. As an
approximation it seems reasonable to suppose

    g(TR) = k TR^β1

where k is a constant measuring
the CO emissions and β1 measures the
percentage change in CO due to a one percent
change in TR.
Upon studying the diurnal diagrams of CO vs.
various meteorological variables, we have
found that the major meteorological factors
influencing the diurnal behavior of CO are ΔT
and the wind speed (WS) or its inverse
IW = WS^(-1). We may also approximate the
function f(met.) as

    f(met.) = c (ΔT)^β2 (IW)^β3

so that we have a model of the multiplicative
form

    CO = ω TR^β1 (ΔT)^β2 (IW)^β3                        (4.3)

where ω = ck. For trend
assessment, we can make the constant k to be
dependent on time, and in particular, we may
write

    k = k0 e^(-k1 T)

where k0 is a constant, T
measures the time unit (year) under
consideration and k1 the percentage change
in CO emissions per unit time.
We are thus led to a model of the
multiplicative form

    CO = ω TR^β1 (ΔT)^β2 (IW)^β3 u

where u is the error term,
and upon taking logarithms,

    LCO = β0 + β1 LTR + β2 LΔT + β3 LIW + ε              (4.4)

where β0 = ln(ω) and ε = ln(u). For diurnal
data, let t stand for the t-th hour of the
day, t = 1,...,24. Because of the dynamic
nature of the traffic and meteorological
effects, we would expect that LCO could be
related to some linear aggregates of the past
values of these exogenous variables and also
that the error term ε_t would be serially
correlated. As an approximation, we consider
the diurnal model

    LCO_t = β0 + β1 LTR*_{t-1} + β2 LΔT*_{t-1} + β3 LIW*_{t-1} + ε_t,
    ε_t = φ ε_{t-1} + a_t                                 (4.5)

where the a_t's are
white noise with zero means and common
variance σ², and

    LTR*_{t-1} = (1-φ)[LTR_{t-1} + φ LTR_{t-2} + φ² LTR_{t-3} + ...]

and similarly for LΔT*_{t-1} and LIW*_{t-1}.
For parameter estimation, we can write (4.5)
in the alternative form

    LCO_t = φ LCO_{t-1} + β0* + β1* LTR_{t-1} + β2* LΔT*_{t-1} + β3* LIW_{t-1} + a_t    (4.6)

where βj* = βj(1-φ), j = 0,..., 3. This is in the
form of a linear regression model and hence
the parameters βj*'s and φ can be estimated
from standard least squares routines.
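A minimal sketch, under the form of model (4.6) reconstructed above, of how the parameters can be estimated by ordinary least squares: the lagged LCO and the lagged log inputs are the regressors, and the original coefficients are recovered from βj* = βj(1-φ). The simulated series and the coefficient values are illustrative stand-ins, not the Phoenix Central hourly averages.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 24 * 60                     # number of simulated hourly observations (illustrative)

# Random stand-ins for the log inputs of Section 4 (not the Phoenix data).
ltr = rng.normal(0.0, 0.5, T)   # LTR: log traffic
ldt = rng.normal(0.0, 0.3, T)   # LdT: log delta temperature
liw = rng.normal(0.0, 0.4, T)   # LIW: log inverse wind speed
phi, b0, b1, b2, b3 = 0.5, 2.0, 0.4, 0.25, 0.4   # assumed true values

# Generate LCO from the form of (4.6), where the starred coefficients equal the
# original coefficients multiplied by (1 - phi).
lco = np.zeros(T)
for t in range(1, T):
    lco[t] = (phi * lco[t - 1]
              + (1 - phi) * (b0 + b1 * ltr[t - 1] + b2 * ldt[t - 1] + b3 * liw[t - 1])
              + rng.normal(0.0, 0.2))

# Least-squares estimation of (phi, b0*, b1*, b2*, b3*), as described in the text.
y = lco[1:]
X = np.column_stack([lco[:-1], np.ones(T - 1), ltr[:-1], ldt[:-1], liw[:-1]])
est = np.linalg.lstsq(X, y, rcond=None)[0]
phi_hat = est[0]
betas = est[1:] / (1 - phi_hat)          # recover b0..b3 from the starred coefficients
print("phi_hat:", round(phi_hat, 3), "  b0..b3:", np.round(betas, 3))
```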
To distinguish the behavior of CO between
the summer and the winter, we have estimated
the model (4.6) separately for each of these
two seasons employing all available data over
the years. For each season, we let LCO^(i),
LTR^(i), LΔT^(i), LIW^(i) represent the average
readings of these variables at the t-th hour
for the i-th year, and let β0*^(i) be a
separate constant for each of the years
considered. This gives the model

    LCO_t^(i) = φ LCO_{t-1}^(i) + β0*^(i) + β1* LTR_{t-1}^(i) + β2* LΔT*_{t-1}^(i) + β3* LIW_{t-1}^(i) + a_t    (4.7)

where LTR^(i) is the diurnal
traffic pattern listed in Table 2.2 divided by
the annual traffic volume in that year, and
then multiplied by the annual traffic volume
in 1972 for normalization. For the summer
diurnal model, data from 1977 to 1982 are
complete. Note that in model (4.7), the term
β0*^(i) may vary from year to year, but the
coefficients β1*, β2*, and β3* are held
constant over the years.
For the convenience of trend assessment, we
may impose a simple linear time function on
the yearly constants β0*^(i), such as

    β0*^(i) = α0* + α1* T_i                               (4.8)

where T_i indexes the year. Thus the model in (4.7) can be written as

    LCO_t^(i) = α0* + α1* T_i + φ LCO_{t-1}^(i) + β1* LTR_{t-1}^(i) + β2* LΔT*_{t-1}^(i) + β3* LIW_{t-1}^(i) + a_t    (4.9)

Note that, from (4.3), the slope α1* corresponds to
the yearly percentage change k1 in CO emissions,
scaled by (1-φ).
-------
We consider two time series models for the
monthly data: model (5.5), in which the CO level
in each summer and each winter is represented by
a separate yearly effect, and model (5.6), in
which the yearly effects are replaced by linear
summer and winter time trends. In both models
t = 1, 2, ..., and the variables are defined as
follows:

Y_t: LCO_t, the logarithm of (CO_t + 0.25), or
     LCO*_t, the logarithm of ((CO_t + 0.25)/TR^(i)) × 208.275,
     i = 1972,...,1982

WN_t: winter months (October-February) indicator, i.e.,
     WN_t = 1 if it is a winter month
          = 0 otherwise

IDS_t^(i): summer months (April-August) indicator for the i-th year, i.e.,
     IDS_t^(i) = 1 if it is a summer month in the i-th year
               = 0 otherwise
     i = 1 for 1974, ..., i = 9 for 1982

IDW_t^(j): winter month indicator for the j-th year, i.e.,
     IDW_t^(j) = 1 if it is a winter month in the j-th year
               = 0 otherwise
     j = 1 for 1973/74, ..., j = 9 for 1981/82

TS_t: summer trend
     TS_t = 1 for all summer months in 1973
     TS_t = 2 for all summer months in 1974
     ...
     TS_t = 10 for all summer months in 1982
     TS_t = 0 otherwise

TW_t: winter trend
     TW_t = 1 for all winter months in 1972/1973
     TW_t = 2 for all winter months in 1973/1974
     ...
     TW_t = 10 for all winter months in 1981/1982
     TW_t = 0 otherwise
Of the above two models, Model (5.5)
provides more detailed information on the
change of CO concentrations from year to
year. Specifically, α_1i measures the effect
(percentage change) of CO in the i-th summer
compared with that for the base year 1973, and
α_2j the effect (percentage change) of CO in
the j-th winter compared with that of the
1972/73 winter. In contrast, Model (5.6) assumes
linear time trends, and θ_1 and θ_2 represent
the percentage reductions of CO concentrations
per year in the summer and winter
respectively.
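As an illustration of how the indicator and trend variables just defined can be constructed for the monthly span October 1972 through September 1982, the sketch below builds WN_t, the yearly summer and winter indicators, and the summer and winter trend variables from the calendar alone. This is only one way to code the regressors; no CO data are involved.

```python
import numpy as np

# Hedged sketch: build the regressors of Section 5.1 for 10/72 - 9/82.
months = [(y, m) for y in range(1972, 1983) for m in range(1, 13)]
months = [ym for ym in months if (1972, 10) <= ym <= (1982, 9)]    # 120 months

def winter_start(y, m):
    """Starting year of the winter (Oct-Feb) containing month (y, m), else None."""
    if m >= 10:
        return y
    if m <= 2:
        return y - 1
    return None

# WN_t: winter indicator (October-February).
WN = np.array([1.0 if (m >= 10 or m <= 2) else 0.0 for y, m in months])
# IDS_t^(i): summer (April-August) indicator for year 1973+i, i = 1,...,9.
IDS = {i: np.array([1.0 if (4 <= m <= 8 and y == 1973 + i) else 0.0 for y, m in months])
       for i in range(1, 10)}
# IDW_t^(j): winter indicator for the winter starting in 1972+j, j = 1,...,9.
IDW = {j: np.array([1.0 if winter_start(y, m) == 1972 + j else 0.0 for y, m in months])
       for j in range(1, 10)}
# TS_t: 1 for summer 1973, ..., 10 for summer 1982, 0 otherwise.
TS = np.array([(y - 1972) if 4 <= m <= 8 else 0 for y, m in months])
# TW_t: 1 for winter 1972/73, ..., 10 for winter 1981/82, 0 otherwise.
TW = np.array([(winter_start(y, m) - 1971) if winter_start(y, m) else 0 for y, m in months])

print(len(months), int(WN.sum()), TS.max(), TW.max())
```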
5.2 Model Estimation
The parameters in models (5.5) and (5.6)
are fitted to monthly averages of CO, CO*
and RH using the SCA statistical system (Liu
et al., 1983). The estimation results for
Phoenix Central are listed in Table 5.1 for
LCO_t and Table 5.2 for LCO*_t.
5.3 Trend Analysis for Phoenix Central
For LCO_t (without normalization by the
traffic volumes), Figure 5.1(a) gives a plot
of the estimated effects α_2j (compared with
the base year, winter of 1972/73) for the
winters of 1973/74 to 1981/82, and Figure
5.1(b) presents a plot of α_1i (compared with
the base year, summer of 1973) for the summers
of 1974 to 1982. Similarly, Figures 5.2(a)
and (b) give the corresponding plots of LCO*_t
(after traffic normalization). From these
figures and the estimates in Tables 5.1 and
5.2, we make the following observations:
(i) Both in the winters and in the summers,
the down trend in CO is more pronounced after
traffic normalization. This is, of course, to
be expected since the traffic volume was
steadily increasing throughout the period
1973-1982 considered.
(ii) By comparing Figures 5.1(a) with 5.2(a),
and Figures 5.1(b) with 5.2(b), we see that the
relative changes in the yearly effects are in
fact much closer between LCO_t and LCO*_t
from 1976 onward compared with the changes
before 1976. This is also reasonable since
the annual percentage increase in traffic
volume was much larger before 1976 - see Table
2.1. Thus, normalization by traffic volume
has its major impact on the trend movement
during the early, part of the data period
considered.
(iii) For the normalized LCO*_t data, model
(5.6) yields an estimated down trend of 7.1%
per year in the winter season over the entire
period 1972/73 to 1981/82. The magnitude of
the percentage reduction in the summer season
is much smaller, 2.3% per year. Also, Figures
5.2(a) and (b) show that the summer effects
exhibit considerably more variability than the
winter effects do from year to year. In
Section 6, we shall compare these estimated
effects with the expected reduction in
emissions from EPA MOBILE3 analysis.
6. Summary of Trend Analysis and Assessment of
the Impact of the I/M Program at Phoenix
Central
In Sections 3-5, we have performed several
types of trend analyses on the CO
concentration levels. Our main focus has been
on CO at Phoenix Central, where the readings
are substantially higher than those at the
other two sites, Sunnyslope and Phoenix South,
and also where the data span is the longest
and most complete of these three locations.
The main results at Phoenix Central can be
partially summarized as follows:
(i) Over the time period April 1972 to
December 1981 studied, there is clear evidence
of a down trend in the CO concentrations.
(ii) The down trend is more pronounced and
smoother in the winter months (October-
February), while considerably higher
variability exists in the trend movement in
the summer months (April-August). Note that
CO concentration levels are substantially
higher in the winter.
(iii) For the winter months over the period
1972/1973 to 1981/1982:
(a) preliminary analysis of monthly
averages of CO readings alone yields an
estimated trend reduction of 5.6% per year
(b) after allowing for traffic and
meteorological changes, our analysis of the
monthly CO averages shows a reduction of
about 7.1% per year.
(iv) Over the four winters 1977/78, 1979/80,
1980/81 and 1981/82, our diurnal model
relating CO to traffic and meteorological
factors yields an estimated reduction of 15%
per year.
6.1 Assessment of the Impact of the I/M
Program
Empirical assessment of the impact of the
Arizona I/M program on CO concentration levels
is made difficult by (i) the confounding
between the expected effect of Federal
emission standards and that of the I/M program
and (ii) the lack of a comparable "control"
site without the I/M program. One approach to
this problem would be to construct a time
series model of the monthly averages of CO
with a linear time function characterizing the
general down trend in the data and an
additional term representing possible change
in level or slope at the inception of the I/M
program. For example, we may contemplate the
models

    CO_t = a_0 + a_1 t + a_2 ξ_t^(k) + N_t                    (6.1)

or

    CO_t = a_0 + a_1 t + a_2 (t - k) ξ_t^(k) + N_t             (6.2)

where

    ξ_t^(k) = 1 if t ≥ k, and 0 otherwise,

k signifies the inception of the I/M program, and
N_t is the noise term which could be auto-
correlated. In (6.1), a_2 measures the
level change and in (6.2) a_2 measures the
slope change associated with the I/M program.
One may then estimate the a's from the data
and test the significance of a_2. This
approach has been adopted by McCleary and
Nienstedt (1983) in their study of the Arizona
CO data, and they have concluded that the I/M
program has had no impact because the estimated
a_2 from models of the form (6.1) or (6.2) is
not statistically significant. Their results
can be largely seen from the plots of the
estimated yearly effects of LCO*_t in Figure
5.2. Consider, for example, the winter
situation. Clearly there is little evidence
from these yearly effects to support either a
change of level or slope beginning in 1977/78.
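A minimal sketch of the level-change and slope-change terms in models of the form (6.1) and (6.2): both are fitted here by ordinary least squares to a simulated monthly series, ignoring for simplicity the autocorrelation that N_t may have. All numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hedged sketch: intervention regressions of the form (6.1) (level change) and
# (6.2) (slope change) on a simulated series.  The noise is treated as white
# here, although the text allows N_t to be autocorrelated.
n, k = 120, 60                          # 120 months, intervention at month 60 (illustrative)
t = np.arange(n, dtype=float)
step = (t >= k).astype(float)           # xi_t^(k): 1 from the inception onward, else 0
ramp = (t - k) * step                   # (t - k) * xi_t^(k): post-inception change of slope

# Simulated CO series: a general down trend plus a small extra slope change after k.
co = 10.0 - 0.02 * t - 0.01 * ramp + rng.normal(0.0, 0.4, n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

a_level = ols(np.column_stack([np.ones(n), t, step]), co)   # (6.1): a0, a1, a2 = level change
a_slope = ols(np.column_stack([np.ones(n), t, ramp]), co)   # (6.2): a0, a1, a2 = slope change
print("level-change a2:", round(a_level[2], 3))
print("slope-change a2:", round(a_slope[2], 3))
```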
In the absence of any information about the
magnitude of expected effects of either the
Federal Emission standard and/or the I/M
program, and assuming that the linear time
functions in (6.1) and (6.2) are suitable
approximations to the shapes of the expected
impacts of these two types of control
measures, the above approach would be
useful. On the other hand, when accurate
information about the magnitude of the
expected effects is available, other
formulations of the problem are possible.
Table 6.1 lists the CO emission factors
from 1972 to 1982 with and without the I/M
program for the central area of Phoenix
supplied to us by the DHS. These factors
were derived from EPA MOBILE3 analysis with
adjustment for vehicle tampering/misfueling
and varying speed over years. The yearly
factors correspond to estimates for January 1
of the years. Assuming that these factors
accurately represent the expected effects of
the Federal emission standards and the
expected impact of the I/M program, we can
then employ the CO readings to assess to what
extent trends in the emission factors with and
without I/M are compatible with evidence from
the data. For this analysis, we shall use
monthly averages of the CO readings as
discussed in Section 5.
Figure 6.1(a) shows plots of the logarithms
of the emission factors over time, where a dot
"•" corresponds to the situation without I/M
and a cross "x" to the situation with I/M
effects included. The points in this figure
then represent the percentage changes in
emissions from year to year with and without
the effects of the I/M program. Superimposed
in the same figure by the circles "o" are the
yearly effects of LCO*_t for the winter months
estimated from model (5.5) and shown earlier
in Figure 5.2(a). It is clear that the trends
in the estimated yearly effects are in
reasonable agreement with emission factors
with the I/M program.
In a similar way, Figure 6.1(b) shows
logarithms of the emission factors together
with the yearly effects of LCO*_t for the
summer seasons shown earlier in Figure
5.2(b). In this case, the estimated yearly
effects are in better agreement with emission
factors without the I/M program. However, it
should again be noted that CO readings in the
winter months are substantially higher and
that the fluctuations in the summer yearly
effects are much larger.
To formalize the analysis, we let f_I(t)
be a function whose values are given by the
logarithms of the emission factors with the
I/M program (i.e., dots from 1972 to 1977 and
crosses afterwards) and f_O(t) be the
corresponding function given by the logarithms
of the emission factors without the I/M
program. Model (6.3) relates the CO data to
f_I(t) through summer and winter coefficients
θ_S1 and θ_W1, and model (6.4) relates the CO
data to f_O(t) through coefficients θ_S0 and
θ_W0, where

    δ_S(t) = 1 if t is in the summer months, and 0 otherwise
    δ_W(t) = 1 if t is in the winter months, and 0 otherwise.

Thus, the model (6.3) aims at assessing to
what extent θ_S1 = 1 and θ_W1 = 1 are
supported by the data (i.e., the "with I/M"
hypothesis), whilst (6.4) is intended for
θ_S0 = 1 and θ_W0 = 1
(the "without I/M" hypothesis). Estimation of
(6.3) and (6.4) gives estimates of θ_S1 and
θ_W1, and of θ_S0 and θ_W0, respectively.
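One generic way to carry out this kind of assessment, sketched below, is to regress the estimated yearly effects on the corresponding (centered) log emission factors and examine how far the fitted coefficient is from one. This is only an illustration of the idea; the exact specification of (6.3) and (6.4) may differ, and every number in the sketch is hypothetical.

```python
import numpy as np

# Hedged sketch: regress hypothetical winter yearly effects on centered log
# emission factors and compare the fitted slope with 1.  The numbers are
# made up and the model form is a stand-in for (6.3)/(6.4).
log_factors = np.log(np.array([89.0, 87.0, 84.0, 80.0, 76.0, 72.0]))   # hypothetical factors
effects = np.array([-0.28, -0.35, -0.38, -0.45, -0.52, -0.58])         # hypothetical yearly effects

x = log_factors - log_factors.mean()       # center: only year-to-year changes matter
y = effects - effects.mean()
X = np.column_stack([np.ones_like(x), x])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
slope = coef[1]

resid = y - X @ coef
se_slope = np.sqrt((resid @ resid) / (len(x) - 2) / (x @ x))
print(f"slope = {slope:.2f} (s.e. {se_slope:.2f}); values near 1 favor that set of factors")
```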
-------
Table 2.1
Relative Traffic Volume Per Day in Phoenix Area

Year    Phoenix Central    Phoenix South
70          131.000            17.733
71          151.500            18.467
72          168.400            19.200
73*         181.900            19.933
74          193.000            20.667
75*         202.200            21.400
76          208.300            22.133
77*         212.000            22.867
78          214.800            23.600
79*         216.400            24.333
80          217.900            25.066
81          218.700            25.799
82*         219.300            26.533
83          219.700            27.266

(*): interpolated data
Table 2.2
Adjustment Fractions for Months, Days and Hours

Month   Adjustment        Month   Adjustment
Jan.      1.02            July       .94
Feb.      1.04            Aug.       .96
Mar.      1.03            Sep.       .98
Apr.      1.02            Oct.      1.00
May        .97            Nov.      1.02
June       .96            Dec.      1.05

Day of Week   Adjustment
Mon.            .96
Tues.           .96
Weds.           .98
Thurs.         1.00
Fri.           1.09

Hour of Weekday Adjustments
Hr.   Adjustment    Hr.   Adjustment
 1       .011       13       .060
 2       .007       14       .059
 3       .004       15       .064
 4       .004       16       .072
 5       .005       17       .081
 6       .012       18       .076
 7       .035       19       .056
 8       .069       20       .045
 9       .057       21       .037
10       .051       22       .034
11       .054       23       .028
12       .059       24       .020
-------
Figure 3.1  Plot of Monthly CO Means at Phoenix Central

Figure 3.2  Plot of Monthly CO Means at Sunnyslope

Figure 3.3  Plot of Monthly CO Means at Phoenix South
-------
Table 4.1
Estimation Results for the Diurnal Models (4.7) and (4.9)

(a) Winter Seasons (1977/78 - 1981/82)

Model (4.7)                            Model (4.9)
β_0*^(1) (1977/78)   2.260 (.369)      α_0*    2.K3 (.357)
β_0*^(2) (1979/80)   2.161 (.344)      α_1*    -.088 (.025)
β_0*^(3) (1980/81)   1.958 (.318)      β_1*     .435 (.057)
β_0*^(4) (1981/82)   1.867 (.275)      β_2*     .258 (.028)
β_1*                  .408 (.058)      β_3*     .401 (.040)
β_2*                  .266 (.028)      φ        .553 (.138)
β_3*                  .407 (.040)      σ        .I9(
φ                     .(41 (.157)
σ                     .194

(b) Summer Seasons (1977 - 1982)

Model (4.7)                            Model (4.9)
β_0*^(1) (1977)       .284 (.389)      α_0*     .030 (.375)
β_0*^(2) (1978)       .123 (.374)      α_1*    -.002 (.024)
β_0*^(3) (1979)       .292 (.373)      β_1*     .530 (.0(4)
β_0*^(4) (1980)       .197 (.343)      β_2*     .102 (.034)
β_0*^(5) (1981)       .138 (.297)      β_3*     .357 (.050)
β_0*^(6) (1982)       .183 (.288)      φ        .001 (.1)1)
β_1*                  .467 (.0(9)      σ        .252
β_2*                  .117 (.035)
β_3*                  .386 (.051)
φ                     .086 (.148)
σ                     .250

Figure 4.1(a)  Observed (A) vs predicted (B) hourly CO averages at Phoenix Central (winter season)

Figure 4.1(b)  Observed (A) vs predicted (B) hourly CO averages at Phoenix Central (summer season)
-------
Table 5.1
Estimation Results for the Time Series Models (5.5) and (5.6)
Y_t = LCO_t (10/72 - 9/82)

Model (5.5)
β_0              2.997 (.367)
β_1               .447 (.159)
S 74    α_11     -.033 (.124)
S 75    α_12     -.000 (.123)
S 76    α_13     -.364 (.123)
S 77    α_14      .045 (.123)
S 78    α_15     -.095 (.123)
S 79    α_16      .143 (.123)
S 80    α_17      .006 (.123)
S 81    α_18     -.177 (.123)
S 82    α_19     -.120 (.123)
W 73/74 α_21     -.206 (.152)
W 74/75 α_22     -.218 (.147)
W 75/76 α_23     -.113 (.150)
W 76/77 α_24     -.287 (.150)
W 77/78 α_25     -.282 (.147)
W 78/79 α_26     -.354 (.146)
W 79/80 α_27     -.382 (.148)
W 80/81 α_28     -.577 (.150)
W 81/82 α_29     -.418 (.148)
                 -.611 (.098)
                  .721 (.083)

Model (5.6)
β_0              2.850 (.358)
β_1               .468 (.139)
θ_1 (summer)     -.005 (.011)
θ_2 (winter)     -.048 (.013)
                 -.580 (.098)
                  .705 (.090)

Table 5.2
Estimation Results for the Time Series Models (5.5) and (5.6)
Y_t = LCO*_t (10/72 - 9/82)

Model (5.5)
β_0              3.1*0 (.366)
β_1               .508 (.159)
S 74    α_11     -.092 (.123)
S 75    α_12     -.106 (.123)
S 76    α_13     -.499 (.123)
S 77    α_14     -.109 (.123)
S 78    α_15     -.261 (.123)
S 79    α_16     -.031 (.123)
S 80    α_17     -.175 (.123)
S 81    α_18     -.361 (.123)
S 82    α_19     -.307 (.123)
W 73/74 α_21     -.277 (.152)
W 74/75 α_22     -.3*2 (.146)
W 75/76 α_23     -.278 (.149)
W 76/77 α_24     -.477 (.150)
W 77/78 α_25     -.487 (.147)
W 78/79 α_26     -.570 (.145)
W 79/80 α_27     -.606 (.148)
W 80/81 α_28     -.807 (.150)
W 81/82 α_29     -.650 (.147)
                 -.614 (.098)
                  .676 (.083)

Model (5.6)
β_0              2.927 (.360)
β_1               .525 (.140)
θ_1 (summer)     -.023 (.011)
θ_2 (winter)     -.071 (.013)
                 -.574 (.099)
                  .655 (.090)
Figure 5.1(a)  Plot of estimated yearly effects of LCO_t from model (5.5) (winter season)

Figure 5.2(a)  Plot of estimated yearly effects of LCO*_t from model (5.5) (winter season)

Figure 5.1(b)  Plot of estimated yearly effects of LCO_t from model (5.5) (summer season)

Figure 5.2(b)  Plot of estimated yearly effects of LCO*_t from model (5.5) (summer season)

(• estimated effect, with standard error bars)
-------
Table 6.1
CO Emission Factors (g/mile) Derived From MOBILE3 Analysis
(Phoenix Central, 7-10 am and 7-10 pm in Oct.-Feb.)

Year/month    No I/M    With I/M
72/1           88.08
73/1           89.06
74/1           91.25
75/1           91.43
76/1           90.01
77/1           88.96
78/1           87.35      81.83
79/1           85.51      72.32
80/1           81.75      63.31
81/1           78.23      59.89
82/1           73.94      56.34
83/1           69.56      52.78
-------
Figure 6.1(a)
Plot of CO emission factors with yearly effects in the winter season

•  Emission factors without I/M (based on MOBILE3)
x  Emission factors with I/M (based on MOBILE3)
o  Winter yearly effect based on Model 5.5
-------
Figure 6.1(b)
Plot of CO emission factors with yearly effects in the summer season
SAMPLING DESIGN: SOME VERY PRACTICAL CONSIDERATIONS
D. E. Splitstone
IT Corporation, 2790 Mosside Blvd., Monroeville, PA 15146
INTRODUCTION
When Walter Liggett extended the
invitation to participate in this conference,
I glibly responded in the affirmative and in
a cavalier manner conceived of a topic title
that, I perceived, would not get me in too
much trouble. I will confess the possibility
of following George Tiao gave me some
pause. However, I am about to provide some
words of what I hope will be wisdom regarding
very practical considerations in sampling
design for the gathering of information
during studies of the environment.
Because my current position of employment
requires a concentration on the problems
posed by hazardous materials and wastes, this
paper will identify a few of the more
intriguing statistical issues associated with
hazardous waste investigations. As my
approach to these issues is conditioned on my
subscription to Donald Marquardt's wider view
of the statistical profession, permit me to
quote his definition of that perspective in
order to provide a frame of reference.
The wider view is that statisticians are
systems guides for tough, fuzzy issues
wherever the collection and interpretation of
data should be involved. The wider view
expects statisticians to provide the
following:
o Full service guidance and
participation in problem diagnosis
and solution
o Services for problems in any area of
application
o Services at any level of detail or
abstraction.
From this perspective, individual statistical
methods are viewed as tools or component
elements in an overall systems'approach
(Marquardt, 1987).
Philosophically, the design of any
sampling scheme is easy. Paraphrasing
William Lurie's classic article (Lurie, 1958)
one only needs to answer the following
questions:
o What does one know about the site
under investigation?
o What does one wish to determine
regarding the site?
o And, how certain does one wish to be
in the conclusions?
In addition, one may add the admonition of
Oscar Kempthorne (ca. 1966) that an objective
of any experimental design is to make the
analysis of the resulting data easy.
The practicality of meeting all of
these criteria, however, is quite another
matter.
EXAMPLE 1 - THE CONTAMINATED BUILDING
Consider as an illustrative example the
situation presented by a building thought to
be contaminated by polychlorinated biphenyls
(PCBs), (Ubinger, 1987). The design of a
sampling program for such a situation is no
trivial task. It certainly must start with a
clear statement of the objectives which may
be multifaceted. Consider the following
possibilities:
o Determination of the existence of
contamination (a preliminary site
investigation)
o An assessment of the risk of
exposure
o Tracking the spread of contamination
and mapping its extent
o Identification of sources of
contamination
o Post cleanup confirmatory sampling,
and
o Certification that cleanup has been
achieved (closure sampling).
There is no single sampling strategy that
will apply to all types of problems and all
of the above objectives. Clearly, the
objective of any sampling strategy is to
provide the most cost-effective approach to
meeting the study objectives while
maintaining a high degree of confidence,
precision and accuracy regarding the study's
conclusions.
Once the objectives of the investigation
are clearly defined, the approach to the
development of a sampling strategy must
consider the nature of the source of
contamination. The two major causes of PCB
contamination are electrical transformer
fires and historical usage of PCB-containing
lubricants. Each results in very different
patterns of potential contamination which may
affect the design of data collection.
Typical patterns are as follows:
Fires/explosions
o Fluid loss - Pattern radiating from
the source; largely confined to
horizontal surfaces at or below the
source
o Smoke borne - Spread through
ventilation system, decreasing
contamination as distance from the
source increases; horizontal and
vertical surfaces at and above the
level of the source
-------
o Tracking by human activity - Floors
and high-contact surfaces; may see
buildup in high traffic areas
Historical Usage
o Oil creep - Generally on horizontal
surfaces in proximity to the source;
may be localized buildup in less
accessible areas
o Reclamation and storage rooms and
facilities
o Tracking - floor and other high-
contact areas
o Other - permeation of floor into
underlying soil; handling areas -
loading docks, trash areas.
Let us assume for the moment that the initial
objective of the investigation is to define
the extent of contamination in anticipation
of building cleanup or remediation.
Traditionally, the sampling scheme employed
in such situations has been to engage in
directed or judgemental sampling. That is,
sample those areas which appear through
evidence of oily stain to be contaminated,
and occasionally sample areas which do not
visually appear to be contaminated. The
statistical effect of directed sampling is
certainly to increase the precision of the
estimated magnitude of contamination in those
areas which visually appear contaminated, but
is of little assistance in defining with
confidence the spatial extent of
contamination. Nor does directed sampling
provide sufficient information on the lack of
contamination, or the cleanliness of areas
which appear to be uncontaminated so that
unnecessary remediation can be avoided with
confidence (Ubinger, 1987).
Though I will return to the issue of
making decisions regarding cleanliness later,
let me simply state that a decision to
require remediation can be made based on only
one sample. The decision not to remediate,
however, requires much more evidence. Thus,
if an objective is to minimize unnecessary
remediation, equal or perhaps greater
attention in sampling design should be given
to apparently "clean" areas as was accorded
apparently contaminated areas.
Ubinger (1987) has compared the results of
statistically designed sampling and directed
sampling and found the two methods to be
equally effective in detecting areas of high
contamination with an obvious advantage to
the statistically designed sampling program
of providing unbiased results. In addition,
the statistically designed program was more
effective in supporting assessment of the
spatial extent of contamination and in
identifying areas which are uncontaminated.
It appears obvious that there are
advantages to statistically designed sampling
plans. But what type of statistical design
does one employ to do an effective job? The
answer to that question can only be that it
depends. It depends on the specific
objectives of the investigation and what is
known about the site and potential source of
contamination. One generally starts by
laying a grid system over the area of
interest. The grid spacing is determined in
large measure from decisions regarding the
areal size and geometry of the minimum
contaminated area necessary to detect and the
degree of risk deemed acceptable of not being
able to detect that area (Zirschky and
Gilbert, 1984). These decisions regarding
target size and acceptable risk are not
inherently statistical decisions. They are,
appropriately, the policy decisions of those
who must balance cost and risk. There may
not be a single decision regarding target
size and the risk of missing that target
applicable to the whole of the area or
building. For instance, if the objective is
to assess the health risk to building
occupants due to PCB exposure, areas with
high human contact potential are of greater
importance than areas of low contact
potential. Thus, it may be perfectly
justifiable to engage in less sampling in low
contact areas than in high contact areas.
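A small Monte Carlo sketch of the kind of grid-spacing calculation referred to above: for a square grid with a given spacing, estimate the probability that at least one grid node falls inside a circular target of a given diameter located at random. This is a generic illustration, not the Zirschky and Gilbert procedure itself; the function and its arguments are constructs of this sketch.

```python
import numpy as np

def hit_probability(grid_spacing, target_diameter, trials=100_000, seed=0):
    """Probability that a square sampling grid has at least one node inside a
    circular hot spot whose center is located at random (generic Monte Carlo)."""
    rng = np.random.default_rng(seed)
    r = target_diameter / 2.0
    # By symmetry, only the target center's position within a single grid cell matters.
    cx = rng.uniform(0.0, grid_spacing, trials)
    cy = rng.uniform(0.0, grid_spacing, trials)
    # Distance from the target center to the nearest grid node (a cell corner).
    dx = np.minimum(cx, grid_spacing - cx)
    dy = np.minimum(cy, grid_spacing - cy)
    return float(np.mean(np.hypot(dx, dy) <= r))

# A 25-foot grid and a 25-foot-diameter circular target give roughly an 80 percent
# chance of a hit, consistent with the impoundment example discussed later.
print(f"hit probability: {hit_probability(25.0, 25.0):.2f}")
```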
It would certainly appear that
stratification is an important component in
the sampling design. Intelligent choice of
strata can only be accomplished through
thorough knowledge of the physical plant of
the building under construction, the cause of
the alleged contamination, and traffic and
work patterns within the facility. The
latter is less important if the objective of
the study is solely to map the spatial extent
of contamination; however, if health risk is
to be an important component in the decision
making process, a probability quota sampling
design may be employed. Only if sampling
locations are chosen with probability
proportional to frequency of contact can an
overall estimate of exposure be formed
without application of external weights.
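A brief sketch of sampling with probability proportional to contact frequency, as suggested above: if locations are drawn with probability proportional to an assumed contact rate, the plain mean of the sampled concentrations estimates contact-weighted exposure without external weights. The location names, contact rates, and concentrations below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hedged sketch: probability-proportional-to-contact selection (with replacement).
# All values are made-up illustrations.
locations = np.array(["doorway", "bench top", "floor aisle", "ceiling", "wall"])
contact = np.array([0.30, 0.25, 0.35, 0.02, 0.08])      # assumed relative contact frequency
conc = np.array([12.0, 30.0, 18.0, 2.0, 5.0])           # assumed PCB surface concentrations

p = contact / contact.sum()
chosen = rng.choice(len(locations), size=20, replace=True, p=p)     # 20 PPS draws

# With selection probability proportional to contact, the unweighted mean of the
# sampled concentrations estimates the contact-weighted exposure.
pps_mean = conc[chosen].mean()
weighted_truth = float(np.sum(p * conc))
print(f"PPS sample mean: {pps_mean:.1f}   contact-weighted exposure: {weighted_truth:.1f}")
```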
In summary, the design of sampling schemes
for the allegedly contaminated building are
among the most complex to formulate. In
order to define the problem statistically, a
number of definitions are required:
o What is to be measured; what is a
suitable limit of detection?
o What are the criteria for cleanup
decisions - are these criteria to be
based on risk assessment?
o What are the statistical quantities
that define a decision rule for when
to remove material (the specific
value may be a result of a site-
specific risk assessment), and the
quantification of the risk of
failing to remediate?
o What field sampling plan is to be
used to obtain representative PCB
concentration data?
-------
o What are the action guidelines?
o If the decision rule says "clean,"
must the whole room be cleaned?
o If only hot spots are found, might
they alone be cleaned?
o Is there a logical minimum to the
size of a "remediation unit?"
Obviously, most of the answers to these
questions are site specific and, therefore,
defy general codification. The answers also
require the direct interaction of the
statistician with other members of the study
team, the interchange of information among
the expert in building ventilation, the
industrial engineer, the expert in building
remediation, and the risk assessment
specialist.
EXAMPLE 2 - THE IMPOUNDMENT
Compared to the problems associated with
the sampling of building interiors, the
sampling design for a waste impoundment might
appear trivial. One does not have to reflect
on the sampling of walls, ceilings, air
handling systems and how to deal with
lighting fixtures and machinery. However,
first impressions are often wrong, and though
certainly easier to conceptualize, the
sampling design for a waste impoundment is
not trivial. Returning to the central theme
of this paper, one must start by asking
questions.
o Just what does one wish to find out
about the impoundment and its
contents?
o What types of waste might the
impoundment contain and at what
levels of concentration are they
expected to be found?
o How did the waste get there? By
pipe, from a process reasonably
continuous in time or by truck from
some batch process?
o Where are the discharge points to
the impoundment?
Answers to questions such as these provide
the clues to potential spatial stratification
for which the sampling design must account.
Generally, whether the waste within the
impoundment is spatially homogeneous or
heterogeneous is of concern regardless of
what will be done with it later. This
applies largely to sludges, as any liquid
layer will likely be well mixed at the moment
of treatment and disposal, except perhaps in
very large impoundments. Considering the
sludge alone, prior to the design of a
sampling strategy, the potential
stabilization must be considered. If no
stabilization is to occur, then the sampling
design must focus on the chemical
characterization of the sludge in situ
subject to the conditions of closure or
removal. If the sludge is to be removed, the
process of removal may well suggest natural
sampling units, e.g., a volume associated
with a backhoe or clam shell bucket or
perhaps a truck load. The delineation of
subareas containing hazardous or nonhazardous
material may be useful in reducing the cost
of treatment and/or disposal. If
stabilization is to occur, the sludge may be
homogenized vertically by a barge.
Popular guidance currently provided for
the construction of sampling plans for the
characterization of potentially hazardous
waste (U.S. EPA, SW-846), contains a
philosophy which I believe is misleading and
statistically naive. Consider the following
statement from SW-846.
The regulations pertaining to the
management of hazardous wastes contain
three references regarding the sampling
of solid wastes for analytical
properties. The first reference, which
occurs throughout the regulations,
requires that representative samples of
waste be collected and defines
representative samples as exhibiting
average properties of the whole waste.
This statement presumes two things which are
almost never true: (1) that the waste is
homogeneous, and (2) that environmental
protection is served by considering what
happens on the average. It has been my
experience that hazardous waste is almost
always heterogeneous in space and/or time
except under very special circumstances. It
certainly is appropriate to require that
samples be representative of the portion of
the waste from which they were collected, but
to define "...representative samples as
exhibiting average properties of the whole
waste" expresses a view of the world that
simply is not true. Homogeneity is something
which must be determined. Appropriate
situation specific sampling designs can
effectively be used for this purpose. How
effective and efficient the sampling design
is for characterizing a hazardous waste
population depends in large measure upon the
amount of prior knowledge available about the
specific situation and a statement as to the
objective uses of the data collected.
The fact that what happens on the average
rarely serves the purpose of environmental
protection seems obvious to me, and I will
later attempt to demonstrate this very
point. I submit that it is not what happens
on the average but the consequences of the
extremes that are environmentally
interesting. As will be illustrated later,
when discussing the example given in SW-846,
it gives one little comfort to know that in
expectation the average of nine samples of
sludge will be less than the specified
regulatory threshold 80 percent of the time
when a full 20 percent of the individual
samples will exceed that threshold.
-------
The second paragraph of Section 1.1.1
indicates that the issue addressed by
representative samples is that of sampling
accuracy. I submit that this is not the case
under the definition of representative
samples stated. Further, this may not be the
case for the subpopulation of waste from
which the sample was taken unless use of
various analytical methods and
interlaboratory comparisons are employed.
The second reference, which pertains
just to petitions to exclude wastes
from being listed as hazardous wastes,
specifies that enough samples (but in
no case less than four samples) be
collected over a period of time
sufficient to represent the variability
of the wastes.
The variability of the waste is certainly
important to assess; however, rarely is the
variability exhibited among samples taken
during even a well designed investigation due
solely to random variations in the waste.
Much of the variability observed can be
attributed to factors in space and/or time.
Ignoring such factors can have significant
consequences, not only in terms of the degree
of environmental protection, but also in
terms of unwarranted expense and the
sensibility of statistical tests employed to
evaluate the waste. Sampling designs which
permit identification of these factors allow
for the possibility of increased
environmental protection at lower cost.
Unfortunately, the identification of
appropriate subpopulations may require the
application of statistical techniques in
light of specific knowledge of the situation
in a manner which cannot be codified. In any
event, the number and placement of samples in
space and/or time for hazardous waste
investigations has many more aspects than one
of sampling precision once one conceives of
the problem more realistically than that of
characterizing average properties.
Consider the following illustration of the
consequences of basing the decision as to
whether a waste is hazardous or not hazardous
on only average properties. The figure below
(Distribution of Single Samples and Sample Means,
Barium Sludge Example, SW-846; cumulative percent scale)
is generated from data presented as the
Hypothetical Example given on Page 14 of
SW-846 and used to illustrate sample size
selection under different strategies. In
construction of the figure, I have assumed
that the normal distribution is an adequate
model and that the true mean (μ) is equal to
93.56, and the true standard deviation is
equal to 7.75. These values are the
estimates of their respective parameters at
the moment that a decision is made that
barium is not present in the sludge at a
hazardous level (see Equation 6 on Page 17).
The salient points to be made using this
figure are simply stated in the following:
(1) Though a conclusion was reached that
barium was not present at a hazardous level
in the sludge based on inferences regarding
the mean of nine samples, a full 20 percent
of the sludge would be expected to be above
the hazardous level. And, (2) it would
appear that if the true mean concentration is
below the regulatory threshold, a decision as
to whether a constituent is or is not present at
a hazardous level is solely a function of the
number of samples taken using the logic
presented.
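The 20 percent figure in point (1) follows directly from the assumed normal model: with a mean of 93.56, a standard deviation of 7.75, and a regulatory threshold of 100, the expected fraction of individual results above the threshold is the upper-tail normal probability, roughly 0.20. A quick check of that arithmetic:

```python
from scipy.stats import norm

# Fraction of individual sludge samples expected above the regulatory threshold
# under the normal model assumed in the text (mean 93.56, s.d. 7.75, limit 100).
mean, sd, threshold = 93.56, 7.75, 100.0
frac_above = norm.sf(threshold, loc=mean, scale=sd)    # upper-tail probability
print(f"expected fraction above {threshold:.0f}: {frac_above:.1%}")    # about 20%
```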
In point 1 above, I have used the term
expected in the statistical sense. If one
were to demand a greater degree of confidence
in estimating the maximum percentage above
the regulatory threshold, then a tolerance
limit rather than a confidence limit should
be used. The result would be closer to
30 percent of the sludge in the hazardous
concentration range with 95 percent
confidence. Guttman, 1970, differentiates
these approaches as "Beta content" versus
"Beta expectation" tolerance regions.
If one focuses on deciding whether or not
the sludge contents of an impoundment are
hazardous, given a definition of "hazardous,"
a specification of risk to be tolerated in
terms of the fraction of sludge permitted to
exceed the defined threshold, and a specified
desired degree of confidence to be associated
with the decision, then determination of the
number of samples required would appear
straightforward. However, many would attempt
such a determination by using techniques
which assume that the statistical
distribution of chemical concentrations is
normal. Though upon occasion, one may be
willing to assume that variations introduced
by measurement and sampling may arise from a
normal distribution, in general the
assumption of a normal distribution of
hazardous waste measurements over space
and/or time cannot be supported. Torse et
al. (1986) suggest that a nonparametric
approach to the problem be taken. In such an
approach, the maximum proportion of the
population exceeding the ultimate or
penultimate observation with specified degree
of confidence is a function of the number of
samples taken. For instance, one is
95 percent confident that at most five
percent of all samples will have a
concentration greater than the maximum
concentration of a set of 59 random samples.
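The 59-sample figure follows from the standard nonparametric relation for the sample maximum: to be 100(1 - alpha) percent confident that at most a fraction p of the population exceeds the largest of n random samples, n must satisfy (1 - p)^n <= alpha. A short check of that relation:

```python
import math

def n_required(p_exceed, confidence):
    """Smallest n with (1 - p_exceed)**n <= 1 - confidence: the number of random
    samples whose maximum bounds the upper 100*p_exceed percent of the population
    with the stated confidence (standard nonparametric result)."""
    alpha = 1.0 - confidence
    return math.ceil(math.log(alpha) / math.log(1.0 - p_exceed))

# 95 percent confidence that at most 5 percent of all samples exceed the observed maximum.
print(n_required(0.05, 0.95))   # prints 59, matching the figure quoted in the text
```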
-------
The use of order statistics and the rank
orders of hazardous waste chemical
concentration data are quite useful in
hazardous waste investigations. However,
that is a topic for an in-depth presentation
of data analysis. Such data analysis
considerations do indeed have an effect on
the sampling design.
Returning to the problem of making a
hazardous/nonhazardous declaration regarding
the sludge in an impoundment, a first step
might be to consider it as a target detection
problem. The size and geometry of the target
is determined by one's knowledge of how the
waste entered the pond. Entry in the form of
a slurry through a pipe at one end of the
impoundment might lead one to consider a
rather large elliptical target, for
instance. A grid may then be superimposed on
the pond surface, which will give a high
probability of detecting the predefined
target if samples are collected at, say, grid
nodes. Quite likely, the cost of analyzing
all of the collected samples will be quite
high. However, if the concentration decision
level is sufficiently large, sample
compositing may be effectively introduced to
keep the costs within reason. Of course, the
assumption here is that the analytical cost
is large with respect to the sampling cost.
This assumption is not always true.
Consider for example an impoundment of
nominal dimensions of 200 by 150 feet.
Superimposing a square sampling grid with
25-foot grid spacing gives one a probability
of 80 percent of detecting a circular area of
contamination of 25 feet diameter with 48
samples. If one is concerned about
contamination with barium, and the
concentration level for a declaration of a
hazard is 100 ppm, then the possibility of
compositing samples in sets of four will
reduce the number of analyses to 12 without
sacrificing environmental protection,
provided that the analytical method detection
limit is less than 25 ppm. One must
consider, however, the potential losses
associated with an increase in Type I error
should the sum of the concentrations of the
four individual samples, now unknown, exceed
100 ppm when each sample concentration is in
fact less than 100 ppm. This increased risk
of unnecessary remediation may be reduced if
sample holding times are such that aliquots
of the individual samples can be preserved
for later analysis. If such a two-stage
analytical program is not possible, then some
tough risk balancing decisions must be made
prior to the design of the sampling program.
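A minimal sketch of the arithmetic in this example follows. The detection-probability formula (the area ratio pi*R**2/G**2 for a circular target whose diameter equals the grid spacing) is my own approximation, not a method taken from this paper; it happens to reproduce the 80 percent figure quoted above.

    import numpy as np

    grid_spacing = 25.0                     # ft, square sampling grid
    target_diameter = 25.0                  # ft, circular contaminated area
    p_detect = np.pi * (target_diameter / 2.0) ** 2 / grid_spacing ** 2
    print(f"Chance a grid node falls inside the target: {p_detect:.0%}")

    action_level = 100.0                    # ppm barium
    k = 4                                   # samples pooled per composite
    n_samples = 48                          # samples from the 200 x 150 ft grid
    composite_trigger = action_level / k    # 25 ppm
    print(f"{n_samples // k} composite analyses; a composite at or above "
          f"{composite_trigger:.0f} ppm could conceal a single sample at the "
          "action level, so the detection limit must be below that value")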
As a final comment regarding the waste
impoundment, if stabilization is to take
place, a two-stage sampling program may be
the most cost effective. The purpose of the
first stage of sampling may be solely to
determine the gross trends in the
physical/chemical properties of the sludge.
Interest is primarily in determining those
factors which may affect stabilization. At
this stage, only a few samples are required
to investigate gross trends, say only nine
for the impoundment described above.
Multiple regression techniques may be used to
investigate the existence of spatial trends
important to cost-effective application of
stabilizing treatments. Once stabilization
has been accomplished, the objective of
additional sampling would be to determine the
requirements of sludge disposal or
impoundment closure. As the act of
stabilization almost invariably involves
mixing, the properties of the stabilized
sludge may be quite different from those of the
unstabilized material.
EXAMPLE 3 - THE LANDFILL AND THE INDUSTRIAL
OR RESIDENTIAL SITE
Much of the above discussion applies to
the investigation of landfills or the
potentially contaminated industrial site. A
very important difference between this type
of site and the waste impoundment is the
greater potential for lack of spatial
homogeneity. Obviously, in the instance of a
modern, well-run and documented landfill, the
sampling plan should be designed to
account for the known stratification of
material disposition. However, consider the
situation of the "Superfund" site. Little
information is generally available as to the
depositional location of potentially
hazardous material. Certainly a continuum of
the degree of information exists regarding
the initial deposition, potential transport
and nature of the waste, ranging from well-kept
documentation, to reliance on memory, to
nothing at all. The same is true of the
industrial site where contamination may have
occurred at some past time through not only
intentional disposal but also accidental spills
or process leaks. One must also be cognizant
of construction activities and their
potential for transportation of contaminated
soil. The more of this information that is
available, the better the sampling plan.
As is the case with the waste impoundment,
it seems reasonable to begin the design of a
sampling plan by considering the problem as a
target detection problem. Once the possible
mechanisms for hazardous material deposition
have been identified, the size of the target
to be detected and the degree of risk to be
tolerated in not detecting the target must be
determined. The latter is purely a policy
decision, but the former may be assessed in
more technical terms. Suppose it is known
that organic solvents were disposed of in 55-
gallon drums. Given some information
regarding the soil permeability, the
transport coefficient of the solvent and the
approximate length of time a drum has been in
the ground, it is possible to estimate the
spatial extent of a potential plume arising
from a leaking drum. However, the task is
more complex in the case of dioxin which is
generally not mobile in soil. In this case,
very small isolated deposits of dioxin may
exist. This is particularly true on the
industrial chemical plant site where
deposition may have been a result of
explosion, process leakage or plant expansion
activities.
If the analytical procedure to be employed
is inexpensive, as with a walk-over radiation
survey using a Geiger counter or a portable
organic vapor analyzer, one may simply lay
out the required grid network for the desired
target size and risk of not detecting the
target. Such situations are rare, however.
Most frequently, the statistician's job is to
effect a compromise between the competing
objectives of determining the potential of
hazard and minimizing the cost of doing so.
As indicated in the above discussion of
the contaminated building, it takes only one
sample to indicate the existence of
contamination. It takes several more samples,
however, to determine the extent of
contamination. Often contamination is known to
exist at a site through a prior sampling
event. Possibly the location of a
contaminated area is even known quite
precisely. The objective of future study,
and hence the sampling plan, is to determine
the extent of contamination and provide a
rational basis for the making of remedial
decisions. If one is willing to make the
assumption of the stationarity of spatial
similarity of contaminant concentrations,
then the techniques used in geostatistics
offer some exciting possibilities. This is
particularly true of nonparametric
geostatistics (Flatman, 1984; Isaaks, 1984;
Journel, 1984-a, 1984-b, 1987). In such
instances one may be willing to employ a
sampling grid of rather coarse spacing,
augmenting it at an occasional grid node by
sampling along a geologic "fence."
There is no substitute for knowledge. To
be effective in meeting the objectives of any
investigation, the sampling scheme must be
designed with all the available knowledge
regarding the site, the suspected
contaminant, its potential for transport, its
means of initial deposition and potential
analytical methods required for its
detection. This requires a close interaction
among all members of the team of
investigators. Only with site-specific
knowledge can an adequate sampling plan be
developed.
Perhaps the most difficult objective to
accomplish is to declare, with a specified
degree of certainty, that a site is free of
contamination. One is faced with this
objective after remediation of a contaminated
site and under the provisions of the State of
New Jersey Environmental Cleanup
Responsibility Act (ECRA). ECRA is involved
in all major real estate transfers in
New Jersey, and similar legislation is
pending in several other states.
Exner, Gilbert and Kinnison (1984) have
taken a novel approach to deciding when
remediation is complete for dioxin
contaminated sites near Times Beach,
Missouri. Their proposed strategy divides
the known area of contamination into "cleanup
units" of a size conducive to the use of
appropriate soil removal equipment (e.g.,
large earth moving equipment). A practical
unit size is 20 by 250 feet. Adjacent clean-
up units are established to ring the unit in
which remediation is initiated. A grid is
established for each unit by the intersection
of two lines parallel to the long axis of the
unit, spaced ten feet apart starting five
feet from one end of the unit. A fixed
geometric sampling pattern is established
within each grid square. Samples from the same
position in each square are then
composited. A number of aliquots, n, are
then selected randomly from each composite
for analysis. If there are m composite
samples, then there are N = nm data for each
clean-up unit.
The N data are then used to calculate the
arithmetic mean and standard deviation of the
composite means. These statistics are then
used to compute an upper confidence limit for
the true mean concentration for the clean-up
unit. The degree of confidence is consonant
with the risk of not remediating a
contaminated unit when the environmental risk
is based on the mean unit concentration. If
this upper confidence limit statistic is
above a decision criterion (an acceptable true
mean concentration), then another two inches
of soil is removed from the entire unit and
the test repeated. Otherwise, the unit is
deemed to be clean and remediation stops.
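A minimal sketch of this kind of clean-up unit test follows, assuming a standard one-sided t-based upper confidence limit on the unit mean; the data, the criterion, and the helper name are hypothetical, and the exact formula of Exner, Gilbert and Kinnison is not reproduced here.

    import numpy as np
    from scipy import stats

    def remove_another_lift(composite_means, criterion, alpha=0.05):
        """Return True if the upper confidence limit on the true unit mean
        exceeds the acceptable concentration, i.e., keep remediating."""
        x = np.asarray(composite_means, dtype=float)
        n = x.size
        ucl = x.mean() + stats.t.ppf(1 - alpha, n - 1) * x.std(ddof=1) / np.sqrt(n)
        return ucl > criterion

    # Hypothetical dioxin results (ppb) for one clean-up unit:
    print(remove_another_lift([0.7, 1.4, 0.9, 1.1, 0.8], criterion=1.0))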
Because of the way the null hypothesis is
stated, this procedure controls the
risk of not remediating a contaminated
unit. The risk of unnecessarily remediating
a clean unit, however, remains uncontrolled.
The financial loss of this uncontrolled risk
is perhaps quite small in that the cost of
removal of an additional two inches of soil
from a clean-up unit is small once
mobilization has occurred. This is
particularly true if there are sufficient
clean-up units to keep a field team occupied
while awaiting the analytical results from a
prior remediated unit test.
If the site under consideration has not
previously been declared contaminated, then
the situation is much more complex. The
complexity derives not from the lack of
sufficient statistical techniques with which
to design sampling plans, but from the need
to balance possible risk with the cost of
achieving a specified level of risk
control. In part, I believe that this is a
cultural problem. We are conditioned to
finding something. If we do not find
contamination, we feel disappointed. As a
society, we must come to grips with defining
the levels of risk we are willing to take and
be willing to bear the cost of achieving
those levels. Organizations that find
themselves subject to legislation such as
ECRA must balance the risk of not finding
contamination when, in fact, it exists
against the cost of potential future
liability. The statistician's professional role
is confined to the definition of risk
alternatives and providing sampling designs
consonant with the policy decisions made in
response to those definitions.
EXAMPLE 4 - GROUND WATER MONITORING
Rarely has a specification of statistical
procedures created as much controversy as in
the case of the monitoring of ground water
around hazardous waste facilities. Since the
promulgation of the regulations now found in
40 CFR 265.92, there has been repeated
criticism of the application of Student's "t"
test in any form to the determination of
ground water contamination by a hazardous
waste facility. Under federal regulations
(40 CFR 265 and 264), the owners and/or
operators of hazardous waste facilities are
required to monitor the ground water around
the hazardous waste facility and to detect
the existence, if any, of ground water
contamination. While this appears a
perfectly reasonable objective, the
procedures prescribed for doing so are, in
most cases, at best naive and at worst wrong
(see Liggett, 1985 and Splitstone, 1986).
Here is a clear case where the promulgation
of a statistical hypothesis testing procedure
has relegated the importance of statistical
design to a secondary position rather than
the other way around.
The objective of monitoring ground water
near a hazardous waste facility is clear. It
is to determine whether the facility is
contaminating the ground water. A sampling
program to make such a determination should
be designed so that any contribution from the
particular facility can be distinguished from
other sources. These other sources, which
may be both anthropogenic and natural,
provide what is loosely referred to as the
background. The design of a sampling program
to distinguish the contribution of the site of
interest from the background is complicated
by temporal and spatial variations in the
background.
Temporal variations in ground water
quality may in part be explained by recharge
rates which are weather related. Spatial
variations among wells may be due to a
variety of reasons. Some of this variability
may be reduced by ensuring that the wells
monitoring the aquifer beneath a particular
waste disposal site are located relatively
close together and drilled to the same
depth. Nevertheless, spatial variations may
occur even if there is no contribution from
the site being monitored. These variations
might be expected to be a result of
horizontal gradients across the site,
unexpected confluence of different aquifers,
or more likely, the natural geochemical
conditions of the area.
If one is fortunate enough to be able to
begin monitoring prior to the existence of
the facility, life is quite easy. It is then
possible to investigate the existence of
potential temporal and spatial trends in
ground water quality without fear of any
confounding from the facility. Such a
circumstance, however, is rare indeed. More
likely, monitoring begins around an existing
facility. In either case, it would appear
advisable to design a study to investigate
the characteristics inherent to ground water
quality in the environs of the site or
proposed site prior to designing a long-term
monitoring program.
The design of the initial investigation,
whether it includes the siting of new wells or
simply the analysis of existing data, begins
by asking what is known or suspected about
the site. Input from the hydrogeologist and
geochemist is a must. Knowledge regarding
the probable direction of ground water flow,
possible seasonal or diurnal fluctuations and
the potential chemical changes in ground
water quality as it moves through the local
rock and soil is essential to the intelligent
interpretation of the data. The result of a
well-designed initial investigation should be
a "model" describing natural or "background"
variations in ground water quality. Such a
model is of necessity site specific. The
design of a long-term ground water quality
monitoring program should have as its
objective the detection of deviations from
this background model. Techniques
appropriate to testing hypotheses regarding
deviations from this background model must be
employed and should be permitted by federal
regulations.
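As an illustration only, the following sketch shows one way a "deviation from background" test might look: fit a simple seasonal model to upgradient (background) well data and ask whether a new downgradient result exceeds a one-sided prediction limit. The model form, the data, and the significance level are all assumptions, not a prescribed regulatory procedure.

    import numpy as np
    from scipy import stats

    def exceeds_background(t_bg, y_bg, t_new, y_new, alpha=0.01):
        def design(t):
            t = np.atleast_1d(np.asarray(t, dtype=float))
            return np.column_stack([np.ones_like(t),
                                    np.sin(2 * np.pi * t / 365.0),
                                    np.cos(2 * np.pi * t / 365.0)])
        X = design(t_bg)
        beta, *_ = np.linalg.lstsq(X, y_bg, rcond=None)
        resid = y_bg - X @ beta
        dof = len(y_bg) - X.shape[1]
        s2 = resid @ resid / dof
        x0 = design(t_new)[0]
        se_pred = np.sqrt(s2 * (1.0 + x0 @ np.linalg.inv(X.T @ X) @ x0))
        return y_new > x0 @ beta + stats.t.ppf(1 - alpha, dof) * se_pred

    # Hypothetical use: 24 monthly background results, one new measurement.
    t_bg = np.arange(24) * 30.0
    y_bg = 10 + 2 * np.sin(2 * np.pi * t_bg / 365) + \
           np.random.default_rng(4).normal(0, 1, 24)
    print(exceeds_background(t_bg, y_bg, t_new=750.0, y_new=16.0))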
CONCLUDING REMARKS
The design of schemes to collect data for
the purpose of providing meaningful
information regarding potential hazardous
waste sites is of necessity site specific.
The design must be based on a clear statement
of the information gathering objective and
can only be efficient if careful
consideration is given to the accumulated
prior knowledge regarding the site.
Attempts that have been made to specify
the statistical techniques to be employed in
environmental decision making often ignore
the presence of site-specific conditions
which may have a dramatic effect on the
outcome of the prescribed mathematical
calculations. Realizing that it is not
possible to promulgate regulations applicable
to every site, a clear regulatory statement
of the objective to be achieved coupled with
a specification of the degree of certainty of
achieving that objective would appear to be a
reasonable regulatory goal. The details of
data collection and interpretation are
appropriately left to the statistician in
concert with the knowledgeable scientists and
engineers.
The design of a sampling strategy for any
site investigation is more a logical thought
process than the application of statistical
tools. The purpose of environmental
regulation might well be served by carefully
outlining the steps in this thought process
and schooling those charged with the
protection of the environment in their
application. The role of statistics and the
statistician in this process has been
outlined nicely in the published notes from a
short course on nuclear site decontamination
and decommissioning (Barnes, 1981).
REFERENCES
Barnes, H. G., 1981, "Statistics and the Statistician in Nuclear Site
Decontamination and Decommissioning - Lecture Notes for a 4-Day Short
Course," U.S. DOE Contract DE-AC06-76RLO 1830, Pacific Northwest
Laboratory, Richland, Washington.

Exner, J. N., R. O. Gilbert and R. R. Kinnison, 1984, "A Sampling Strategy
for Cleanup of Dioxin in Soil," Appendix I, Quail Run Site Hazard
Mitigation Plan, Environmental Emergency Services Company, Chesterfield,
Missouri.

Flatman, G., 1984, "Using Geostatistics in Assessing Lead Contamination
Near Smelters," Environmental Sampling for Hazardous Waste Sites, ed.
Schweitzer, ACS, Washington, D.C.

Guttman, I., 1970, "Statistical Tolerance Regions: Classical and
Bayesian," Griffin's Statistical Monographs and Courses, No. 26, Hafner
Publishing Co., Darien, Connecticut.

Isaaks, E. H., 1984, Risk Qualified Mappings for Hazardous Waste: A Case
Study in Non-parametric Geostatistics, M.Sc. Thesis, Branner Earth
Sciences Library, Stanford University.

Journel, A. G., 1984-a, "The Place of Non-parametric Geostatistics,"
Geostatistics for Natural Resources Characterization, ed. Verly et al.,
Reidel, Dordrecht, pp. 307-35.

Journel, A. G., 1984-b, "Indicator Approach to Toxic Chemical Sites," EPA,
EMSL-Las Vegas, Report of Project No. CR-811235-02-0.

Journel, A. G., 1987, Short Course, "Advanced Geostatistics for Mining and
the Environment," sponsored by Stanford University and U.S. EPA Las Vegas,
January 12-16, Las Vegas, Nevada.

Kempthorne, O., ca. 1966, Course Notes, "Design of Experiments,"
Department of Statistics, Iowa State University, Ames, Iowa.

Liggett, W., 1985, "Statistical Aspects of Designs for Studying
Contamination," Quality Assurance for Environmental Measurement, ASTM
STP 867.

Lurie, W., 1958, "The Impertinent Questioner: The Scientist's Guide to the
Statistician's Mind," American Scientist, March, pp. 57-61.

Marquardt, Donald W., 1987, "The Importance of Statisticians," Journal of
the American Statistical Association, Vol. 82, No. 397, pp. 1-7.

Morse, M., W. Sproat and J. Warren, 1986, "Sampling and Analysis for
Delisting Petitions/Statistical Basis for Sampling," Second Annual
Symposium on Solid Waste Testing and Quality Assurance, U.S. EPA
Conference, July 15-18.

Splitstone, D. E., 1986, "A Statistician's View of Ground Water
Monitoring," Proceedings of the National Conference on Hazardous Wastes
and Hazardous Materials, Hazardous Materials Control Research Institute,
pp. 8-12.

Ubinger, E. B., 1987, "Statistically Valid Sampling Strategies for PCB
Contamination," to be presented at the Electric Power Research Institute
Seminar on PCB Contamination, Kansas City, Missouri.

U.S. Environmental Protection Agency, Interim Status Standards for Owners
and Operators of Hazardous Waste Facilities, 40 CFR 265.92.

U.S. Environmental Protection Agency, Regulations for Owners and Operators
of Permitted Hazardous Waste Facilities, 40 CFR 264.97.

U.S. Environmental Protection Agency, 1982, Test Methods for Evaluating
Solid Wastes, SW-846, Second Edition, July.

Zirschky, J. and R. O. Gilbert, 1984, "Detecting Hot Spots at Hazardous
Waste Sites," Chemical Engineering, July 9.
DISCUSSION
W. Barnes Johnson
U.S. Environmental Protection Agency, 401 M St. SW, Washington, DC 20460
I have been asked to provide a discus-
sion from the perspective of an
Environmental Protection Agency (EPA)
statistician. In communicating the EPA
perspective it is first important to note
that statistics are used in a variety of
EPA's program activities. A convenient
taxonomy for categorizing those activi-
ties begins with the most broad and wide
ranging and continues to the most detail-
ed:
RESEARCH AND DEVELOPMENT
POLICY DECISIONS
REGULATORY DEVELOPMENT
GUIDANCE DEVELOPMENT
PERMITTING
COMPLIANCE.
Research and development activities
are the most broad and wide ranging in
terms of their possible application and
impact on EPA's programs. Some of
Dr. Tiao's work has been done to support
EPA activities at the broad end of the
continuum. Policy decisions, regulatory
actions, and guidance development can be
quite broad, but can also be exacting
and specific in terms of application;
therefore they reside in the middle of
the continuum of areas where statistics
might be used at EPA. At the narrow
end of the continuum the concerns focus
on site specific sampling, analysis,
and decision making problems related
to permitting and compliance.
Dr. Splitstone's experiences and
certain aspects of Dr. Tiao's work have
been directed toward the problems that
are faced at the narrow end of the contin-
uum. The spectrum offered by these two
presentations seem to cover the broad
and difficult responsibilities that EPA
faces. There is a responsibility impos-
ed on EPA to be general so that a nation-
al scope is provided; but EPA also has a
responsibility to provide technical di-
rection that can be useful at specific
locations with unique circumstances.
One thing that becomes clear is that
at the EPA statistics are not a product.
This is not so true, for example, at the
Census Bureau or the Bureau of Labor
Statistics; at these Agencies statistics
are truly the product of interest.
Instead, EPA's products are primarily
regulatory decisions. These decisions
are based upon environmental, engineer-
ing, economic, political, financial,
risk, and health analyses. This differ-
ence reflects itself in the way that
statistics and therefore statisticians
are used at EPA. One might even expand
this to say that there is a difference
in the way problems are characterized,
approached, and solved. It is important
to remember that, when statistical work
is called upon to provide perspective,
insight, or response relative to an EPA
interest or initiative, the statistical
analysis is only a tool that is used in
the ultimate decision. For example, in
the case of Dr. Splitstone's PCB contam-
ination problem, EPA may have been
interested in knowing if the building
was a health threat. Knowing that there
is a high confidence that no circular
hot spots of a reasonably small size are
present with PCB's above an action level
may or may not be sufficient information
for the decision maker. The implications
regarding exposure likelihood and the
resulting risk potential may need to be
analyzed. Similarly, in the case of
Dr. Tiao's work, EPA might be interested
in deciding whether vehicle inspection
and maintenance programs effectively
reduce ozone. However, the study may
not be useful for making a national
policy change given the messy and con-
founded information from a few sampling
stations located in Oregon and Arizona.
The primary conclusion is that statist-
ical methods are tools that support
EPA's program functions and, as such,
they must be designed to allow for or
assist in formulating defensible
decisions.
Although statistical methods are a
tool that can be called upon when an
investigator or policy analyst desires,
they are not always considered early,
during the design phase of a project.
Instead, "case study" or ad hoc "data
base" approaches are often used as meth-
ods for solving complex environmental pro-
blems. Consider the following hypothe-
tical examples that explain why these
data acquisition methods might be used.
An investigator needs to evaluate the
nature, status, or extent of a national
environmental problem. The investigator
decides that data must be obtained that
in some sense "represent" the national
situation. One approach is to pick in a
purposive manner geographic and/or tempor-
al locations which seem to reflect or
parallel the national situation. In some
cases, locations or situations may be
chosen to provide results that are both
extreme and average, and in this way
bound the problem. This is the "case
study" approach.
Another approach, which certainly has
been facilitated by information proces-
sing technology, is to attempt to collect
every piece of data that has ever been
accumulated by any source and store the
data in a uniform format on a computer.
The idea is that the availability of the
information in an accessible form ensures
a meaningful "representation" of the sit-
uation. This is the "data base" approach.
There is no question that these meth-
ods have utility and merit given certain
conditions or investigative purposes;
however, these approaches can also dis-
tort subsequent analysis and evaluations.
Also, the conclusions that can be made
from analyses performed on information
collected in such a manner are limited.
Dr. Tiao, in one of his examples, was
provided data from two stations (one in
Corvallis, Oregon, and one in Portland,
Oregon). Someone selected these sites,
undoubtedly out of a network of sites,
and commissioned Dr. Tiao to conduct a
detailed and, I must say, excellent time
series evaluation. How was it decided
that these two sites were chosen? Cer-
tainly, it would have been preferable
for a statistician to be involved with
the experimental design of the sample
station monitoring network. Many issues
such as between-location variability,
frequency and duration of sampling, and
timing of instrument calibrations could
have been designed into the study and
Dr. Tiao clearly pointed these issues
out. Instead, interests in acquiring
"representative" information by develop-
ing a "data base" and performing a "case
study" have made the investigation,
although interesting and informative,
less than it could have been given some
design prior to implementation of the
sampling. Kruskal and Mosteller (1979a,
1979b, and 1979c) offer interesting back-
ground and discussion regarding the
meanings of "representative" sampling as
used in a variety of technical and non-
technical literature sources.
Dr. Splitstone is in the enviable
position of being able to influence the
statistical design, sampling, and analy-
sis for the purpose of answering specific
questions. Dr. Splitstone's examples are
site specific and in response to EPA's
permitting and compliance programs des-
cribed earlier. Dr. Splitstone attempts
to make no extrapolation beyond the
original scope of the sampling program.
Through a well conceived statistical
design and analysis, he is able to gen-
erate data which properly "represent"
and answer the specific, statistical
questions at hand.
I would like to return to an earlier
idea which couches statistical analysis
as a decision maker's tool. I also want
to continue with the theme that a well
conceived statistical design and sampling
program offer the best method for
generating data which provide a "represen-
tative" look at an environmental problem
problem. The EPA manager realizes that
decisions are made in a risky environ-
ment; there is a chance that wrong
decisions will be made. Statistically
designed sampling programs allow quanti-
fication of the likelihood of a wrong
decision. However, ad hoc, poorly
developed "data base" and "case study"
approaches that do not coincide with the
objectives under scrutiny, do not allow
quantification of the error or risk and
certainly provide a biased "representa-
tion" of the problem. As such, these
data acquisition approaches at worst pro-
hibit and at best constrain the use of
statistical analysis as a decision maker's
tool for evaluating risk.
There seem to be two general observa-
tions from these presentations. First,
involvement early in the design process
as in Dr. Splitstone's situation is re-
flected in results that satisfy the ob-
jectives of the study and allow decision
making in a risky environment. Second,
involvement with the data analyses after
data collection has been designed and
executed will result in analyses and
conclusions that must be well qualified
and reflective of the limitations of the
original sampling program. Dr. Tiao's
clear presentations of the strengths and
weaknesses of each example he presented
certainly support this observation.
REFERENCES
Kruskal, W. and F. Mosteller, 1979a.
Representative Sampling I:
Non-Scientific Literature.
International Statistical Review,
47: 13-24.
Kruskal, W. and F. Mosteller, 1979b.
Representative Sampling II:
Scientific Literature, Excluding
Statistics. International Statisti-
cal Review, 47: 111-127.
Kruskal, W. and F. Mosteller, 1979c.
Representative Sampling III:
Statistical Literature. International
Statistical Review, 47: 245-265.
SPATIAL PREDICTION AND SITE SELECTION
Noel Cressie
Department of Statistics, Iowa State University, Ames, IA 50011
Abstract
This article considers various stochastic
models that could be used in the prediction of
pollutants at locations where no data are
available, based on data taken from a spatial
network of monitoring sites. The design problem
of selecting those sites is also discussed.
Introduction
The "default" stochastic model for a batch
of measurements on a phenomenon of interest, is
that they are Independent and identically
distributed (i.i.d.). But the notion that data
close together in time or space are likely to be
correlated (i.e., cannot be modeled as
Independent) is a natural one, and will receive
particular attention in this article.
Pure temporal models, or time series models
as they have come to be known, are usually based
on identically distributed observations that are
dependent and occur at equally spaced time
points. The ARMA models, popularized by Box and
Jenkins (1970), are important examples. Spatial
models of the type used in this article have to
be more flexible since there is not the analogous
ordering in space and it is not reasonable to
assume that spatial locations of data occur
regularly.
Relatively recently (the last 10 to 15
years), meteorological space-time data sets have
been collected for the purpose of studying the
effects of atmospheric pollution and, in
particular, acid rain (see e.g. Peters and
Bonelli, 1982). Daily data collection over a
number of years at various locations throughout
the Northeast USA yields a massive data set. But
most of it is temporal so that spatially speaking
the data are still rather sparse. Nevertheless
spatial prediction is just as important as
temporal prediction, since people living in those
cities and rural districts without monitoring
stations have the same right to know whether
their water or their air is polluted. Thus a map
of, e.g., wet deposition (acid rain) is needed
based on samples collected from a spatial network
of monitoring sites.
One could take either a global or a local
approach to spatial prediction. Global methods
such as trend surface analysis attempt to model
the spatial variation of the phenomenon in terms
of large structural components (e.g., polynomial
trend). However the error process usually
contains small-scale spatially correlated
variation. Local methods such as kriging attempt
to predict unknown values from weighted averages
of nearby values.
The spatial dependence can also be exploited
in the design phase, to establish an "efficient"
network of sites. Or if a given network needs a
site added or deleted, where should this occur?
I shall address these questions using a
stochastic spatial model, but it should not be
forgotten that there are important considerations
to take into account before statistical
optimization is performed. Among these are:
• Determine what to measure and how to measure
it.
• Be aware of such preliminary variables as local
weather conditions and historical data, and take
"baseline" data.
• Will measurements be replicated or not?
• Determine the spatial support (level of
aggregation) of the measurements.
Then finally, remember the following
aphorism of data collection: The necessary is
too expensive and the affordable is inadequate.
2. Spatial Prediction
Suppose that measurements, both actual and
potential, are denoted

    { z(s) : s ∈ D },

where s is a spatial location vector in R^2 or R^3
(two- or three-dimensional Euclidean space). The
index set D gives the extent of the region of
interest. Data

    z(s_1), ..., z(s_n)                                  (2.1)

are observed at known sites

    s_1, ..., s_n.                                       (2.2)
Henceforth I shall assume that
{ z(s) : s ∈ D } is a realization of a stochastic
process

    { Z(s) : s ∈ D }.                                    (2.3)
But is this a realistic model when it is known
that (apart from measurement error) z(·) is
deterministic? What is the source of the
randomness in (2.3)? To justify the stochastic
models used in this article, I shall argue as a
Bayesian would. The random variation comes from
the uncertainty caused by not observing z(·) at
all locations in D. In fact with no data the
"prior" might be that the physical process
behaves as a second-order stationary stochastic
process would. This is a standard tactic in
science; when there are infinitely many possible
surfaces z(·), deal with the possibilities
statistically. Now once data (2.1) are observed,
update the prior and work with the posterior
distributions of

    { Z(s) : s ∈ D }.
From the data (2.1) it is required to
predict (possibly noiseless versions of)
Z(s_0) or (1/|B|) ∫_B Z(s) ds, where |B| = ∫_B ds.
I shall present both the trend surface model
and the random field model, and then compare the
two.
2.1 Trend surface model
Suppose Z(·) can be written as

    Z(s) = μ_1(s) + ε_1(s);  s ∈ D,                      (2.4)

where ε_1(·) is a zero mean white-noise process;
i.e.,

    E(ε_1(s)) = 0;  s ∈ D
    cov(ε_1(s), ε_1(u)) = σ_1^2 if s = u, and 0 if s ≠ u;  s, u ∈ D.   (2.5)

Furthermore, suppose the trend surface μ_1(·) can
be written as,

    μ_1(s) = β_1 f_1(s) + ... + β_p f_p(s);  s ∈ D,      (2.6)

where f_1(s), ..., f_p(s) are known functions of
spatial location s. Notate the unknown
coefficients as

    β = (β_1, ..., β_p)',                                (2.7)

the data as,

    Z = (Z(s_1), ..., Z(s_n))',                          (2.8)

the errors as

    ε = (ε(s_1), ..., ε(s_n))',                          (2.9)

and the trend-surface components as,

    F = {f_1(s_i) ... f_p(s_i)},                         (2.10)

where the expression in braces represents the
i-th row of the (n × p) matrix F.
Then,

    Z = Fβ + ε,                                          (2.11)

where E(ε) = 0, and var(ε) = Iσ_1^2. This is a
linear model with unknown parameters (β, σ_1^2).
The primary goal is not to estimate these
parameters but rather to predict Z(s_0); s_0 ∈ D.
It is easy to see that the best (i.e., minimum
mean-squared error) linear unbiased predictor is

    Ẑ(s_0) = f(s_0)' β̂,                                  (2.12)

where f(s_0) = (f_1(s_0), ..., f_p(s_0))', and

    β̂ = (F'F)^(-1) F'Z,                                  (2.13)

the best linear unbiased estimator of β. This
predictor has mean squared error,

    mse(Ẑ(s_0)) = E(Ẑ(s_0) - Z(s_0))^2
                = σ_1^2 {1 + f(s_0)'(F'F)^(-1) f(s_0)}.  (2.14)
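A minimal numerical sketch of trend-surface prediction under this model follows; the planar trend f(s) = (1, x, y), the simulated data, and the prediction location are hypothetical.

    import numpy as np

    rng = np.random.default_rng(0)
    sites = rng.uniform(0, 10, size=(20, 2))              # s_1, ..., s_n
    F = np.column_stack([np.ones(len(sites)), sites])     # rows are f(s_i)'
    beta_true = np.array([2.0, 0.5, -0.3])
    Z = F @ beta_true + rng.normal(0, 1.0, len(sites))    # linear model (2.11)

    beta_hat, *_ = np.linalg.lstsq(F, Z, rcond=None)      # estimator as in (2.13)
    f0 = np.array([1.0, 5.0, 5.0])                        # f(s_0) for s_0 = (5, 5)
    z0_hat = f0 @ beta_hat                                # predictor as in (2.12)

    s2_hat = np.sum((Z - F @ beta_hat) ** 2) / (len(Z) - F.shape[1])
    mse = s2_hat * (1 + f0 @ np.linalg.inv(F.T @ F) @ f0) # cf. (2.14)
    print(z0_hat, mse)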
2.2 Random field model
Recall that the trend surface model
specifies that two data values, no matter how
close they are to each other, are statistically
independent. Most scientists working in geology,
soil science, crop science, forestry, environmen-
tal science, etc., find this assumption contra-
dicts their underlying notions of the mechanisms
that resulted in the data they are studying. Put
simply, in many scientific studies where data
have a spatial label, it is felt that observa-
tions closer together are more dependent than
those further apart. This is modeled as
    Z(s) = μ_2(s) + δ_2(s);  s ∈ D,                      (2.15)

where μ_2(·) is the large-scale, deterministic,
mean structure of the process, and δ_2(·) is the
small-scale, stochastic structure that models the
spatial dependence referred to above. That is,

    E(δ_2(s)) = 0;  s ∈ D
    cov(δ_2(s), δ_2(u)) ≡ C(s, u);  s, u ∈ D.            (2.16)
The representation (2.15) is precisely the sort
of approach time series analysts take; after
detrending, they model the small-scale variation
with stochastic processes that capture the
temporal dependence (e.g., autoregressive,
integrated, moving average processes).
In order to estimate the quantity and
quality of the spatial dependence, some minimal
assumptions of stationarity have to be made. A
process Y(') which satisfies
E(Y(s)) - V ; se
var(Y(a) - Y(u)) -
(2.17)
- u) ; S.UCD, (2.18)
is said to be intrinsically stationary (over
D). The quantity 2Y(') is a function of the
vector difference s - u_, and is called the
variogram (Y(') is called the semivariogram) .
Intrinsically stationary processes are more
general than second-order stationary processes,
which assume
cov(Y(s),
C(g.-
where C(') is a function of the vector difference
and is usually called the au toco variance
function. It is the preferred tool of time
series analysts, but for prediction purposes it
is not as flexible as the variogram; see Cressie
(1988) for more details and a summary of the
properties of the variogram.
The geostatistical method (Matheron, 1963)
of spatial prediction is very simple. Estimate
the variogram with a nonparametric estimator
2γ̂(h), fit a model 2γ(h; θ) whose form is known
apart from a few parameters θ (i.e., fit
2γ(h; θ) to 2γ̂(h)), and finally use the fitted
variogram in the best linear unbiased predictor
(i.e., the kriging predictor) as if it were
known. However, it is more the exception than the
rule that Z(·) is intrinsically stationary;
Cressie (1986, 1987a) investigates how to proceed
when, respectively, μ_2(·) is not constant and
δ_2(·) does not possess a variogram.
Let us assume for the moment that in (2.15)
μ_2(s) ≡ μ, and δ_2(·) is a zero mean intrins-
ically stationary stochastic process. Matheron
(1963) has proposed the following variogram
estimator

    2γ̂(h) = (1/|N(h)|) Σ_{N(h)} (Z(s_i) - Z(s_j))^2,     (2.19)

where N(h) = {(s_i, s_j): s_i - s_j = h} and |N(h)|
is the number of distinct ordered pairs in the
set N(h). If the data are believed to be
approximately Gaussian but with haphazard
"contaminated" observations present, Cressie and
Hawkins (1980) advocated the robust estimator,

    2γ̄(h) = { (1/|N(h)|) Σ_{N(h)} |Z(s_i) - Z(s_j)|^(1/2) }^4
            / {0.457 + 0.494/|N(h)|}.
To either of these is fit a parametric model
2γ(h; θ). Cressie (1985) proposes a weighted and
a generalized least squares approach, while
Kitanidis (1983) proposes a restricted maximum
likelihood (REML) approach (i.e., estimation of
θ is obtained from generalized increments of the
data). Both methods have their strengths.
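The two variogram estimators can be sketched for data on a one-dimensional transect as follows; the series is simulated, and exact integer lags are used so that no distance-class grouping is needed.

    import numpy as np

    rng = np.random.default_rng(1)
    z = np.cumsum(rng.normal(size=200))        # a spatially correlated series

    for h in range(1, 11):
        d = z[h:] - z[:-h]                     # Z(s_i) - Z(s_j) for pairs at lag h
        classical = np.mean(d ** 2)            # 2*gamma-hat(h), as in (2.19)
        robust = np.mean(np.abs(d) ** 0.5) ** 4 / (0.457 + 0.494 / d.size)
        print(h, round(classical, 2), round(robust, 2))   # Cressie-Hawkins form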
Assume that the estimation and fitting of
the variogram has been carried out successfully,
and that the variogram can be considered as
known. Suppose it is desired to predict Z(s_0) at
some unsampled spatial location s_0 using a linear
function of the data:

    Ẑ(s_0) = Σ_{i=1}^{n} λ_i Z(s_i),                     (2.20)

which is unbiased (i.e., E(Ẑ(s_0)) = E(Z(s_0)))
and minimizes the mean-squared error
E(Z(s_0) - Ẑ(s_0))^2. Then the optimal λ's satisfy
an (n+1)-dimensional linear equation that depends
on γ(s_i - s_j); i, j = 0, 1, ..., n:

    Γ λ = γ,                                             (2.21)

where λ = (λ_1, λ_2, ..., λ_n, m)',
γ = (γ(s_1 - s_0), ..., γ(s_n - s_0), 1)', and Γ is a
symmetric (n + 1) × (n + 1) matrix whose (i, j)-th
element is γ(s_i - s_j) for i, j = 1, ..., n, with
1's in the (n+1)-th row and column except for a 0
in the (n+1, n+1) position. The minimized
mean-squared error is given by,

    σ_k^2(s_0) = λ' γ.                                   (2.22)

Matheron (1963) calls this spatial prediction
method (ordinary) kriging, after D. G. Krige (a
South African mining engineer who in the 1950s
became aware that predicted grade distributions
were not a good indicator of actual grade
distributions). In fact the optimal linear
predictor (2.20) can be found earlier in the
literature in works of Kolmogorov (1941) and
Wiener (1949). The kriging predictor given by
(2.21) assumes the process Z(·) contains no
measurement error; Cressie (1987b) has details on
how to predict noiseless versions of Z(·).
Several important design considerations
become apparent from (2.21) and (2.20), viz. the
optimal predictor and the minimized mean-squared
error depend on the relative locations (2.2) of
the data, the semivariogram values
γ(·) and the sample size. These considerations
will be picked up again in Section 3.
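The ordinary kriging computation in (2.20)-(2.22) can be sketched as follows; the spherical semivariogram, the four data points, and the prediction location are hypothetical.

    import numpy as np

    def semivariogram(h, nugget=0.1, sill=1.0, a=5.0):    # assumed spherical model
        h = np.asarray(h, dtype=float)
        g = np.where(h < a,
                     nugget + (sill - nugget) * (1.5 * h / a - 0.5 * (h / a) ** 3),
                     sill)
        return np.where(h == 0, 0.0, g)

    sites = np.array([[0, 0], [3, 1], [1, 4], [4, 4]], dtype=float)
    z = np.array([1.2, 0.7, 1.9, 1.4])
    s0 = np.array([2.0, 2.0])

    n = len(sites)
    D = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
    Gamma = np.ones((n + 1, n + 1))
    Gamma[:n, :n] = semivariogram(D)
    Gamma[n, n] = 0.0
    g0 = np.append(semivariogram(np.linalg.norm(sites - s0, axis=1)), 1.0)

    lam = np.linalg.solve(Gamma, g0)   # weights lambda_1..lambda_n and multiplier m
    print(lam[:n] @ z)                 # kriging predictor, as in (2.20)
    print(lam @ g0)                    # minimized mean-squared error, as in (2.22)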
2.3 Comparison of trend surface model and random
field model
Agterberg (1984) describes a small
experiment to compare prediction based on a
quadratic trend surface, prediction based on
ordinary kriging, and prediction based on a
quadratic trend surface plus kriging the
residuals. It is not a priori true that the
latter prediction method would perform the best
since overfitting may occur. Agterberg took a
contour map of the terrain at the top of the
Arbuckle formation in central Kansas, within a 30
mile by 30 mile area. Then he randomly chose
spatial locations from that area to provide the
data. Finally spatial locations for prediction
were randomly chosen and relative mean-squared
errors, expressed as percentages, were computed.
From the limited experiment, trend surface
prediction (26.2%) was slightly worse than
ordinary kriging (23.3%), and both were dominated
by the combination of the two (19.1%).
Laslett et al. (1987) performed a very
interesting spatial experiment to compare the
performance of several two-dimensional spatial
prediction methods. They note that rival methods
of prediction are based on different models, so
that a proper comparison of methods cannot be
made analytically or on artificially generated
data. They then propose to make the comparison
on real data sets, viz. soil pH. Among the
eleven predictors (which included quadratic trend
surface and kriging) used, kriging and Laplacian
smoothing splines performed the best. In their
study, the presence of dependent small-scale
variation was strongly suggested, justifying the
use of geostatistical methods for prediction.
2.4 Nonstationary mean structure
In Section 2.3, there was mention of a
combination (trend-surface-plus-kriging)
prediction method. In fact Matheron (1969) has
called this universal kriging. Recall the random
field model (2.15). Let the large-scale
variation be given by
    μ_2(s) = β_1 f_1(s) + ... + β_q f_q(s);  s ∈ D.

If the covariance structure C(s, u) in (2.16) is
known, it is a simple matter to estimate
β = (β_1, ..., β_q)' by generalized least squares,
and predict Z(s_0).
(3.6)
Therefore, by starting on the far right of
(3.6) and optimizing successively over each
factor, a design could be found by adding one
point at a time; the (i+1)-th site is placed in
an optimal location having already fixed the i
previous sites. Is this iterated design optimal?
The answer to the above question is, "no".
A very simple counterexample is given by Cressie
(1978), for Z(·) a process of independent
increments on the real line. However, if one
defines the design measure to be the limiting
(n → ∞) density of design points over the design
space, then does the asymptotic iterated design
measure equal the asymptotic optimal design
measure? This is an open problem, but the answer
is probably, "yes"; see Wu and Wynn (1978).
Although the iterated design may produce
design points
    V ≡ { s_1, ..., s_n }                                (3.7)
that are quite different from optimal design
points when n is finite, it still offers the
possibility of providing a suitable yardstick for
other, more computer-intensive, algorithms. One
such is the annealing algorithm, adapted by Sacks
and Schiller (1987) for the spatial design
problem. Suppose n is fixed; the problem is to
find the best V. Again modify the notation of
(3.4) and write S_n^2 as S_n^2(V) to show its
dependence on the design points.
The annealing algorithm assumes a starting
value V^(0). At the j-th stage, the current value
is

    V^(j) = { s_1^(j), ..., s_n^(j) }.                   (3.8)

The algorithm proceeds as follows:
• Choose V^(j+1) in some random or systematic
manner (e.g., single point replacement).
• Replace V^(j) by V^(j+1) with probability

    1, if S_n^2(V^(j+1)) - S_n^2(V^(j)) < 0;
    π_j, otherwise, where
    π_j = exp{ -log j (S_n^2(V^(j+1)) - S_n^2(V^(j))) / γ }.

Notice that the algorithm always demands a change
provided there continues to be an improvement in
the objective function S_n^2. For suitable choice
of γ in π_j, it will always avoid getting stuck at
local minima by allowing a positive (ever-
decreasing) probability of changing even when no
improvement in the objective function is
achieved. Geman and Geman (1984) is a good
source for the properties of this stochastic
relaxation technique.
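A minimal sketch of such an annealing search follows, using the average kriging variance over a grid of prediction points as a stand-in for the criterion S_n^2; the exponential variogram, the region, the value of γ, and the cooling choice are all assumptions.

    import numpy as np

    rng = np.random.default_rng(2)
    gamma_fn = lambda h: 1.0 - np.exp(-np.asarray(h, dtype=float) / 3.0)

    def criterion(V, grid):
        """Average ordinary-kriging variance at the grid points for design V."""
        n = len(V)
        G = np.ones((n + 1, n + 1))
        G[:n, :n] = gamma_fn(np.linalg.norm(V[:, None] - V[None, :], axis=-1))
        G[n, n] = 0.0
        total = 0.0
        for s0 in grid:
            g0 = np.append(gamma_fn(np.linalg.norm(V - s0, axis=1)), 1.0)
            total += np.linalg.solve(G, g0) @ g0
        return total / len(grid)

    grid = np.array([[x, y] for x in range(11) for y in range(11)], dtype=float)
    V = rng.uniform(0, 10, (6, 2))              # starting design, n = 6 sites
    cur = criterion(V, grid)
    for j in range(1, 200):
        cand = V.copy()
        cand[rng.integers(len(V))] = rng.uniform(0, 10, 2)  # single-point replacement
        c = criterion(cand, grid)
        if c < cur or rng.random() < np.exp(-np.log(j + 1) * (c - cur) / 0.1):
            V, cur = cand, c    # always accept improvements; sometimes accept worse
    print(round(cur, 4))
    print(V.round(2))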
But the question remains. At what stage
should the algorithm be stopped? I suggest that
the iterated optimal S_n^2 obtained from (3.6) could
be used as a yardstick below which the quantity
S_n^2(V^(j)) must fall. Other two at-a-time, etc.
iterated designs that have rather straightforward
computational solutions, could also be used as
yardsticks.
Finally, the question of how many sites
should be included in the network, needs to be
addressed. As always, there are conflicting
forces at work here; too many sites and the
monitoring program will bankrupt itself, too few
sites and the information is not adequate
to allow definitive conclusions. One way to
approach this is to take a predictor and
calculate its average mean-squared error (3.4)
under a special case of the model (3.1) where the
data Z(s_1), ..., Z(s_n) are independent, and under
the model (3.1) where the data Z(s_1), ..., Z(s_n)
are dependent through the error random field
δ(·). Call the former S_{n,ε}^2 and the latter
S_{n,δ}^2. Then the effective number of
observations neff is obtained by solving

    S_{neff,δ}^2 = S_{n,ε}^2.                            (3.9)
Thus the network designer could think about
having "neff" independent observations and make
decisions about precision and cost as if the data
were independent. When an "neff" has been
decided, it can be converted back to the proper
"currency" using (3.9).
A simple example would be to use the
predictor Z̄ to construct (3.9). Suppose there is
an s_0 of particular interest. Then the above
considerations lead to an expression for neff in
terms of the covariances cov(Z(s_i), Z(s_j)) among
the data and the covariances cov(Z(s_i), Z(s_0))
between the data and the prediction point.
It would be dangerous to demand too much
from this rather heuristic approach, since "neff"
very much depends on the criterion chosen in
(3.9). In other words, an "neff" constructed
according to one criterion but used for a
different purpose in another criterion, may yield
nonsensical results.
Acknowledgement
This research was supported by the National
Science Foundation under grant number DMS-
8703083.
References
Agterberg, F. P. (1984). Trend surface analysis, in Spatial Statistics and
Models, eds. G. L. Gaile and C. J. Willmott. Reidel, Dordrecht, 147-171.

Box, G. E. P., and Jenkins, G. M. (1970). Time Series Analysis:
Forecasting and Control. Holden-Day, San Francisco.

Cressie, N. (1978). Estimation of the integral of a stochastic process.
Bulletin of the Australian Mathematical Society, 18, 83-93.

Cressie, N. (1985). Fitting variogram models by weighted least squares.
Journal of the International Association for Mathematical Geology, 17,
563-586.

Cressie, N. (1986). Kriging nonstationary data. Journal of the American
Statistical Association, 81, 625-634.

Cressie, N. (1987a). A nonparametric view of generalized covariances for
kriging. Mathematical Geology, 19, 425-449.

Cressie, N. (1987b). Spatial prediction and ordinary kriging. Proceedings
of the Mathematical Geologists of the United States Meeting (Redwood
City, CA, April 1987), forthcoming.

Cressie, N. (1988). Variogram, in Encyclopedia of Statistical Sciences,
eds. S. Kotz and N. L. Johnson. Wiley, New York, forthcoming.

Cressie, N., and Hawkins, D. M. (1980). Robust estimation of the
variogram: I. Journal of the International Association for Mathematical
Geology, 12, 115-125.

Cressie, N., and Read, T. R. C. (1986). Spatial data analysis of regional
counts. Statistical Laboratory Preprint No. 86-46, Iowa State University,
Ames.

Geman, S., and Geman, D. (1984). Stochastic relaxation, Gibbs
distributions and the Bayesian restoration of images. IEEE Transactions
on Pattern Analysis and Machine Intelligence, 6, 721-741.

Kitanidis, P. K. (1983). Statistical estimation of polynomial generalized
covariance functions and hydrological applications. Water Resources
Research, 19, 909-921.

Kolmogorov, A. N. (1941). Interpolation and extrapolation of stationary
random sequences. Izvestiia Akademii Nauk SSSR, Seriia Matematicheskaia,
5, no. 3.

Laslett, G. M., McBratney, A. B., Pahl, P. J., and Hutchinson, M. F.
(1987). Comparison of several spatial prediction methods for soil pH.
Journal of Soil Science, 38, 325-341.

Matheron, G. (1963). Principles of geostatistics. Economic Geology, 58,
1246-1266.

Matheron, G. (1969). Le Krigeage Universel. Cahiers du Centre de
Morphologie Mathematique, no. 1, Fontainebleau, France.

Peters, N. E., and Bonelli, J. E. (1982). Chemical Composition of Bulk
Precipitation in the North-Central and Northeastern United States,
December 1980 through February 1981. U.S. Geological Survey Circular 874.

Sacks, J., and Schiller, S. (1987). Spatial designs, in Proceedings of the
Fourth Purdue Symposium on Decision Theory and Related Topics, eds. S. S.
Gupta and J. O. Berger. Springer, New York, forthcoming.

Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley, Reading.

Wiener, N. (1949). Extrapolation, Interpolation and Smoothing of
Stationary Time Series. MIT Press, Cambridge.

Wu, C-F. J., and Wynn, H. P. (1978). The convergence of general
step-length algorithms for regular optimum design criteria. Annals of
Statistics, 6, 1273-1285.
SPATIAL AUTOCORRELATION: IMPLICATIONS FOR SAMPLING AND ESTIMATION
Evan J. Englund
U.S. Environmental Protection Agency, P.O. Box 93478, Las Vegas, NV 89193-3478
ABSTRACT
Spatial autocorrelation is a normal result of
the physical and chemical processes which operate
in the environment. All measurable environmental
parameters will exhibit spatial autocorrelation
at some scale. This not only causes technical
problems in sampling and estimation, but leads to
more fundamental problems in communication. Terms
such as "chemical concentration", "representative
sample", and "frequency distribution" which are
commonly used and well understood in laboratory
situations, can become essentially meaningless
when applied to environmental measurements with-
out an explicit statement of the spatial scale
(support) being considered.
A simulated spatially autocorrelated distribu-
tion is used to illustrate the changes in concen-
tration, frequency distribution, and sample qual-
ity associated with changes in support. Vario-
grams computed from the simulated data illustrate
the relationship between spatial variability and
standard QA/QC. Practical suggestions are made
for sampling and estimation of spatially autocor-
related sites.
INTRODUCTION
The purpose of environmental sampling programs,
like other samplings, is to use the information
obtained from the sample to make inferences about
the larger population from which the sample is
drawn. For example, an industrial waste lagoon
may be sampled to determine whether its mean con-
centration exceeds an allowable maximum value for
a particular chemical. In other cases, sampling
may be done to determine whether the level of pol-
lution has increased or decreased from previous
levels at a site, or to identify the locations of
polluted areas. A typical sequence of events in
an environmental investigation might include the
design of a sampling plan, collection of samples,
laboratory chemical analysis, interpretation of
data, and finally, a decision based on the inter-
pretation.
Decisions made by various federal, state, and
local agencies may require remedial, preventative,
or punitive measures which may have substantial
consequences in terms of human health as well as
economics. It is obviously important that envi-
ronmental investigations be conducted in such a
manner as to ensure that decisions are based on
the best possible information. It should be
emphasized here that the information on which
decisions are made is not the sample data itself,
but the interpretation of the data; that is, the
estimates of the characteristics of the larger
population which are made from the data.
Unfortunately, the interpretation of field
data is made difficult by the nature of the
environment itself, which is not always amenable
to the methods of sampling and data analysis
designed for use under more controlled conditions
such as laboratory experiments. Although the
same sampling and data analysis terminology is
used in both the laboratory and the field, the
precise meanings of many terms are significantly
different. The resulting ambiguities can lead to
problems in both the design of sampling programs,
and the interpretation of the results. This
paper will explore the causes of these problems,
attempt to clarify the terminology, and offer
some practical suggestions for sampling design
and data interpretation.
SPATIAL AUTOCORRELATION
Spatial autocorrelation is the basic cause of
the problems to be discussed. All measurable
environmental parameters exhibit spatial (and
temporal) autocorrelation at some scale. This
means that over some range of distances, measure-
ments will tend to be more similar to measurements
taken nearby than to measurements taken farther
away. For practical purposes, stating that a
phenomenon is spatially autocorrelated is equiva-
lent to stating that it is not uniformly distrib-
uted, but autocorrelation is more easily quanti-
fied in statistical terms. The term spatial
correlation is often used interchangeably with
spatial autocorrelation, but the former implies
that the measured values are correlated with
their locations, as temperature, for example, is
correlated with distance from the equator.
The physical and chemical processes that con-
trol the fate and transport of chemicals in the
environment do not operate at random, although
most events include what may be considered random
processes, in the sense that they are too complex
to be predicted in detail. Random and determin-
istic processes may operate simultaneously at
several scales to produce the phenomenon being
measured. The measurement of precipitation (with
or without acid) provides a good illustration.
Regional weather patterns produce the conditions
necessary for rainfall over a large area. Within
the area, individual clouds or clusters of clouds
form, apparently at random locations, due to
local fluctuations in temperature and humidity.
If one collected pairs of rain-gauge measurements
taken at various spacings, one would expect to
observe strongly correlated readings at separa-
tions much smaller than the size of the average
cloud, weaker correlation at regional-scale
separations, and essentially zero correlation for
separations larger than the size of the region.
Although contiguous rain gauges would be expected
to show the greatest correlation, it would not be
perfect, due to the fact that the two gauges may
not receive identical amounts of rain (spatial
variability), and even if they did, the readings
may not be identical (measurement error).
After a major rainfall, for example, we might
be asked to estimate the average precipitation
over a watershed from a given set of rain gauge
readings, in order to estimate the watershed's
contribution to a flood crest. Alternately, we
might want to know how many gauges we would need
in order to predict flooding with a specified
degree of accuracy. The methodology commonly
known as geostatistics (Matheron, 1963; Journel
and Huijbregts, 1978) was developed in the mining
industry to deal with questions of local ore
grade estimation, and has since been shown to be
generally applicable to most situations where
spatial autocorrelation is present.
A basic assumption in geostatistical analysis
is that the spatial autocorrelation exhibited in
a set of measurements can be represented by an
underlying autocorrelation function which is
valid over the region of interest. For many
environmental phenomena, this assumption can be
intuitively related to the controlling processes.
The variogram is one commonly used method for
quantifying spatial autocorrelation (Fig.l).
Experimental variograms are computed from sample
data by examining all possible data pairs, group-
ing them by distance classes, and computing a
variance for each distance class using the stand-
ard paired-sample formula. A theoretical model is
then fitted to the experimental data. A variogram
contains exactly the same information as a plot
of correlation coefficients for the same distance
classes (one can be transformed into the other by
inverting the plot and rescaling the y-axis).
Fig. 1. Typical variogram plot and fitted model.
Variances are computed from paired sample differ-
ences for pairs in successive distance classes
and plotted against distance. The fitted model
exhibits commonly observed features: a random
component or "nugget" at the y-axis intercept,
and an increase in variance with distance up to
a maximum "range" of autocorrelation.
SAMPLE SUPPORT
When spatial autocorrelation is present, the
physical dimensions of the sample become important
considerations. The term 'support', defined as
the "size, shape, and orientation of the physical
sample taken at a sample point" (Starks, 1986), is
used to avoid confusion with the statistical size
(number of observations) of a sample. The support
of a soil core, for example, would be its diam-
eter and length. In the case of a rain guage
measurement, the support would Include the diam-
eter of the orifice, and the time of accumulation.
When the sample support changes, the statistical properties of the sample set, such as the variance of samples and the sampling variance, also change. These changes make sample support a critical element in the design of a sampling program. The concept of support applies equally to the stage of data interpretation. The support on which decisions will be made is rarely the same as that of the samples. The choice of decision support can significantly affect the outcome of an analysis, and should be considered before a sampling program is undertaken. The idea of support and its ramifications will be developed further in the following sections.
CHEMICAL CONCENTRATION
The term 'chemical concentration' is meaningless in the absence of a specified support. Atoms and molecules represent the smallest scale at which elements and compounds can be said to exist. If samples are taken at this scale, the true concentration of any substance within the sample will be a discrete binary phenomenon: the concentration will be either 100% or 0%. Any larger sample is made up of a mixture of discrete components, and its true concentration will be the weight or volume of the analyte divided by the total weight or volume of the sample; i.e., the average over the sample support. Usually the entire volume of a sample is not measured directly; instead, the measured concentration of a subsample is used to estimate the mean concentration of the original sample. Great care is taken during the preparation of a sample to ensure uniform mixing and, if necessary, to reduce the particle size of the material so that any subsample used for analysis will have a true mean concentration very close to the true mean of the entire sample. If the mixing is effective, neither the size of a subsample nor its location should have a significant effect on the outcome of the analysis; for practical purposes, one subsample is as good as any other.
In the field, where the area being investigated cannot be uniformly mixed, the situation is quite different. Layers, crystals, clumps, or other high concentrations of a substance often occur such that if a given sample had been taken at a slightly different location (sometimes only a fraction of the sample size away), a significantly different true sample concentration would have been found. The classic example of this was observed in placer gold deposits, where the presence or absence of a single, minuscule gold nugget in a sample would make the difference between an assay indicating high-grade ore and one indicating waste. This led to the interesting term 'nugget effect' often being applied to the y-axis discontinuity in variogram models.
Like changes in location, changes in support also result in changes in concentration. The true value of a point sample is 0 or 100%; the true value of any larger volume centered on the point is the mean concentration over the volume. If the dimensions of a sample are increased or decreased, if the shape of the sample is changed, say from a sphere to a cube, or if the orientation of a non-spherical shape is changed, the sample will contain a different set of molecules, and probably a different true concentration. For any point in space which represents a potential sample 'location', an infinite number of possible sample supports exist centered on the point, and thus an infinite number of possible true concentrations which can be said to be the concentration at that location. Obviously, we must conclude that any reported measurement of chemical concentration in the environment is essentially meaningless
unless the support is also reported. Likewise, a statement such as "remedial action will be taken if the concentration of cadmium in soil exceeds 500 ppm" is also meaningless unless the support is specified.
A SIMULATED EXAMPLE
An example based on a simple computer simulation serves to illustrate the support problem. A blank computer screen is 'polluted' with a 'realistic-looking' pattern of pixels (Fig. 2). The algorithm which was used to generate the pattern first selected 25 points at random in the central part of the screen. Each of these points was used as the center of a cluster of up to 2000 points scattered around the center at random angles and approximately normally distributed distances (sum of three uniformly distributed random values). Blank pixels were initialized with a value of zero and incremented by one each time they were hit by the point generator. A color terminal can be used for a more effective display than Fig. 2, because pixel values greater than one can be represented by various color codes. The details of the algorithm are not crucial. Any algorithm which conditions the outcome of a random process on a prior outcome of the same or another random process can be used to generate spatially autocorrelated patterns. Variations on the drunkard's walk (e.g., two steps forward, three steps back) are effective.
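As an illustration only (not the original simulation program), the cluster algorithm described above can be sketched in a few lines of Python; the grid size and cluster spread used below are arbitrary choices.

import numpy as np

rng = np.random.default_rng(0)
nx, ny = 320, 200                      # 'screen' dimensions (arbitrary)
grid = np.zeros((ny, nx), dtype=int)   # pixel counts start at zero

centers_x = rng.uniform(0.25 * nx, 0.75 * nx, size=25)
centers_y = rng.uniform(0.25 * ny, 0.75 * ny, size=25)

for cx, cy in zip(centers_x, centers_y):
    n_pts = rng.integers(1, 2001)                    # up to 2000 points
    angles = rng.uniform(0.0, 2.0 * np.pi, n_pts)    # random directions
    # sum of three uniforms gives an approximately normal distance
    dists = rng.uniform(0.0, 15.0, (n_pts, 3)).sum(axis=1)
    ix = np.clip((cx + dists * np.cos(angles)).astype(int), 0, nx - 1)
    iy = np.clip((cy + dists * np.sin(angles)).astype(int), 0, ny - 1)
    np.add.at(grid, (iy, ix), 1)       # increment each pixel that is hit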
Fig. 2. A simulated, spatially autocorrelated
distribution of "pollutant".
When the pattern has been generated, we have an area subdivided into pixel-sized units, for each of which we know the exact pollution concentration. The pixel scale of support is considered to be smaller than any support size we would be interested in, but since we have exact knowledge at that scale, we can now combine pixels into any larger support areas we choose, and compute the exact average pollution concentrations over these areas.
Figs. 3-7 illustrate the results of this kind
of change in support. In Fig. 3, the screen area
was divided into support blocks of 5x5 pixels,
and the mean pixel value was computed for each
block. The block means were grouped into class
intervals and represented as shaded patterns on
the map. The histogram scaled by area, and uni-
variate statistics, are also shown for the set of
non-zero value blocks. Figs. 4-7 repeat this
process at supports of 10x10, 20x20, 40x40, and
80x80 pixels, respectively.
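Changing support in this simulated setting is simply block averaging. A minimal sketch, again illustrative only and continuing the hypothetical `grid` array from the earlier fragment:

def block_means(grid, b):
    """Average the pixel grid over non-overlapping b-by-b support blocks."""
    ny, nx = grid.shape
    ny2, nx2 = (ny // b) * b, (nx // b) * b          # trim partial blocks
    g = grid[:ny2, :nx2].reshape(ny2 // b, b, nx2 // b, b)
    return g.mean(axis=(1, 3))

for b in (5, 10, 20, 40, 80):
    means = block_means(grid, b)
    nonzero = means[means > 0]
    print(b, nonzero.mean(), nonzero.var())   # the variance shrinks as support grows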
Fig. 3. Map and histogram of true "concentrations" of simulated pollutant averaged over 5x5 pixel blocks. Darker shades on map represent higher values: Blank (<0.4 units per pixel); Light shade (0.4 to 0.8); Medium shade (0.8 to 1.2); Dark shade (>1.2).
Fig. 4. Map and histogram of true "concentrations" of simulated pollutant averaged over 10x10 pixel blocks. Shades represent the same values as in Fig. 3.
Fig. 5. Map and histogram of true "concentrations" of simulated pollutant averaged over
20x20 pixel blocks. Shades represent the same values as in Fig. 3.
Fig. 6. Map and histogram of true "concentrations" of simulated pollutant averaged over
40x40 pixel blocks. Shades represent the same values as in Fig. 3.
Fig. 7. Map and histogram of true "concentrations" of simulated pollutant averaged over 80x80 pixel blocks. Shades represent the same values as in Fig. 3.
REPRESENTATIVE SAMPLES
'Representative' is one of the most misused terms in environmental sampling (Splitstone, 1987). In this case, the effects of spatial autocorrelation in the field combine with inconsistency in the use of the term 'sample' itself, resulting in a great deal of confusion. A statistical sample of n observations is sometimes said to be representative if the mean and/or other parameters of its frequency distribution are 'similar' to those of the population from which it was drawn. In a somewhat more abstract sense, the sample might be called representative if it was properly drawn (that is, if each member of the population had an equal chance of being included in the sample), even though it is not similar to the population.
An environmental sample such as a soil core, collected in the field and sent to a lab for analysis, is not a 'sample' in the commonly used statistical sense. It might be more appropriately called a member of a population which has been included in a sample of that population. A set of n such environmental samples, or observations, makes up a statistical sample which can be representative of the population of possible observations of the same support within the sampled domain. It cannot, however, be directly representative of any other support population, except for the population mean, which is constant within the domain regardless of support. In the context of ordinary random statistics, it is therefore difficult to ascribe any representativeness to an individual environmental sample other than being a member of a representative set at that support.
In the case of environmental field investigations, however, the whole point of a sampling plan is often to obtain samples (observations) that are representative in a different sense, namely, representative of their local areas. This spatial kind of representativeness is also hard to define. Intuitively, what we are really after is a measurement at a location that is close to the local mean concentration, but as we have seen, the local mean is a function of the support. Thus, until we define the support which we want our observation to represent, we can't say anything about how good the observation is.
Given a set of n observations within a domain, a geometric support neighborhood can be defined around each observation as the set of all points closer to it than to any other observation (i.e., a Voronoi polygon). Voronoi polygons are an approximation of the neighborhoods with which the observations are most correlated, and give an idea of the spatial resolution of the set of observations.
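For readers who want to compute these neighborhoods, a minimal sketch using the Voronoi routine in SciPy follows; it is an illustration, not part of the original paper, and the example coordinates are made up.

import numpy as np
from scipy.spatial import Voronoi

# hypothetical observation locations (x, y)
obs = np.array([[1.0, 1.0], [4.0, 1.5], [2.5, 4.0], [5.0, 4.5], [0.5, 3.0]])

vor = Voronoi(obs)
# the polygon for observation i is vor.regions[vor.point_region[i]];
# vertex index -1 marks an unbounded region along the edge of the data
for i in range(len(obs)):
    region = vor.regions[vor.point_region[i]]
    bounded = -1 not in region
    print(i, "bounded" if bounded else "unbounded", region)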
APPLICATIONS
In most of the above discussion, we have dealt with the uncertainties and ambiguities involved in taking and measuring samples of the physical environment, and the inherent limitations of relating measured values back to some 'real' characteristic of the environment. In spite of these problems, most of which have not been adequately dealt with theoretically, the world marches on. Site investigators are still faced with the necessity of collecting samples in the field and using them to make interpretations and decisions. The remainder of this paper will focus on practical approaches to sampling and data analysis which will help reduce ambiguity and improve data quality and usefulness. We will continue to use the simulation as an illustrative example.
SAMPLING DESIGNS
One of the common problems facing a site investigator is the layout of a sampling pattern. Assume for the moment that you have a predefined domain that must be sampled and that the number of samples you can take has been fixed by budget constraints. Where should you take the samples? Most available guidance documents, such as SW-846, recommend random sampling as the general solution. However, it has been shown (Olea, 1984; Yfantis et al., 1987) that in the presence of spatial autocorrelation, sampling on a systematic grid will produce a more efficient sample. If spatial autocorrelation is not present, the regular grid will be no better or worse than a random sample. Because the regular grid is a periodic sampling in space, it is obviously contraindicated when the presence of a spatial periodicity in the phenomenon is suspected at a scale near that of the proposed grid. Fortunately, this situation is not common, and the regular grid, because of its simplicity and effectiveness, can be used in most spatial sampling programs.
The efficiency of the regular grid is a result of the minimization of spatially clustered data which duplicate each other's information. As a practical matter, small departures from regularity do not have major effects, so field crews can make offsets from the grid to avoid obstacles without affecting the results of a study. This is particularly true when the autocorrelation function exhibits a large random component (nugget effect) relative to the autocorrelated component. The larger the relative nugget effect, the greater the tolerance area around the regular grid nodes.
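A regular grid with small field offsets is easy to generate; the following minimal Python sketch is illustrative only, and the spacing and tolerance values are arbitrary.

import numpy as np

rng = np.random.default_rng(1)

def grid_sample(x_extent, y_extent, spacing, tolerance=0.0):
    """Square-grid sample locations with optional random offsets."""
    xs = np.arange(spacing / 2.0, x_extent, spacing)
    ys = np.arange(spacing / 2.0, y_extent, spacing)
    gx, gy = np.meshgrid(xs, ys)
    pts = np.column_stack([gx.ravel(), gy.ravel()])
    # small offsets (e.g., to step around obstacles) within the tolerance area
    pts += rng.uniform(-tolerance, tolerance, pts.shape)
    return pts

locations = grid_sample(320.0, 200.0, spacing=25.0, tolerance=3.0)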
SAMPLE SUPPORT AND QA/QC
To illustrate the potential impact of sample support on the quality of an investigation, we will sample our simulation at two different supports and use the results for interpolation of concentrations. The sample locations are on the regular grid shown in Fig. 8. The first sampling is done at the single-pixel support. Each sample is assigned the true value of the pixel at that location; we are assuming no sampling or analytical error. The second sampling is done on the same grid at a 2x2 pixel support. In this case the true mean value of the 4 pixels in the sample is used.
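Continuing the hypothetical simulation fragments above, the two samplings amount to reading single pixels versus 2x2 block means at the grid nodes; a minimal sketch (the `grid` and `nodes` arrays are assumptions carried over from the earlier sketches):

import numpy as np

def sample_at_support(grid, nodes, b):
    """True mean of the b-by-b block of pixels anchored at each grid node
    (b = 1 gives the single-pixel support)."""
    vals = []
    for x, y in nodes.astype(int):
        vals.append(grid[y:y + b, x:x + b].mean())
    return np.array(vals)

# z1 = sample_at_support(grid, nodes, 1)   # single-pixel support
# z4 = sample_at_support(grid, nodes, 2)   # 2x2 pixel support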
Fig. 8. Location of samples taken from the simu-
lated distribution. True values of single pixels
and 2x2 pixel blocks were taken at each point.
Variograms computed from the two sample sets are shown in Fig. 9. Note the dramatic decrease in variance at all distances with the larger support. When considering quality control, we take particular note of the projected y-axis intercept or nugget, which has dropped from a value of 0.5 to less than 0.1. The nugget value provides an estimate of the total variance associated with taking adjacent, or co-located, samples, and includes small-scale spatial variability as well as sampling, preparation, and analytical errors. In the present example, we introduced no measurement errors, so all of the nugget value is the result of spatial variability, and the change in support has a major effect. If, at the other extreme, the nugget were entirely due to sampling and analytical errors, it would not be reduced at all by changing support.
Fig. 9. Variograms computed from simulated sam-
ples on single pixel (upper curve) and 2x2 pixel
(lower curve) support. Points are experimental
values; solid lines are subjectively fitted
models.
The approach one takes to improving data quality, therefore, should ideally be dictated by a variance components analysis of the variogram nugget value. The largest variance component should get the most attention. If spatial variance dominates, increase the support by taking larger samples or perhaps by compositing small clusters of samples. If measurement error dominates, look for a more accurate method. When either spatial or measurement variance is very dominant and cannot practically be reduced further, it may be possible to achieve a significant cost saving without seriously affecting overall data quality by going to cheaper methods which moderately increase the lower variance component. For example, if the maximum feasible support still results in a spatial variance component ten times greater than the measurement variance, the measurement variance could be allowed to increase by a factor of five with a resultant increase in the total standard error of only 17%.
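As an arithmetic check on that 17% figure (a sketch, assuming the spatial and measurement components simply add in variance, with the spatial component written as sigma_s^2 = 10 sigma_m^2):

\sqrt{\frac{\sigma_s^2 + 5\,\sigma_m^2}{\sigma_s^2 + \sigma_m^2}} = \sqrt{\frac{10 + 5}{10 + 1}} \approx 1.17

that is, roughly a 17% increase in the total standard error.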
The question of quality assurance goes beyond quality control of sample data to include the adequacy of the program as a whole to achieve the desired objectives. In the case of a site investigation, we are particularly interested in the quality of the spatial estimate made from the sample data, and we must therefore look at sample quantity and location, and the estimation or interpolation procedures, as well as sample data quality. All of these factors can contribute to the quality of the end result, and as always, the most cost-effective use of resources is to work at reducing the largest error component.
With our simulated example, we can compare the effects of sample quality on the overall quality of the data interpretation by interpolating concentrations over the screen area from each of the two sample sets. Using the variogram models from Fig. 9, kriged estimates were computed for 10x10 pixel blocks. The results are shown in Fig. 10, with the blocks shaded according to the same classification used in Fig. 3. Comparison with the true values shown in Fig. 4 illustrates the overall superiority of the 2x2 sample support in defining the general pattern of block concentrations.
Plots of the true vs. estimated concentrations for the two interpolations are shown in Fig. 11, and histograms of the estimation errors in Fig. 12. Note that even though the 2x2 pixel samples provide better estimates, there is still relatively high error, which may lead to a large number of false positives and false negatives at most concentration action levels. Further reduction of estimation error would require more samples, larger support, or both.
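Ordinary kriging of a block center from a set of samples can be sketched compactly. The following Python fragment is illustrative only (it is not the program used for Fig. 10), kriges a single point rather than a true block average, and the spherical-model parameters shown are placeholders, not the fitted values from Fig. 9.

import numpy as np

def spherical(h, nugget, sill, rng_):
    """Spherical semivariogram model."""
    h = np.asarray(h, dtype=float)
    g = nugget + (sill - nugget) * (1.5 * h / rng_ - 0.5 * (h / rng_) ** 3)
    g = np.where(h >= rng_, sill, g)
    return np.where(h == 0.0, 0.0, g)

def ordinary_krige(coords, values, target, model):
    """Solve the ordinary kriging system for one target location."""
    n = len(values)
    d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1))
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = model(d)
    A[n, n] = 0.0                               # Lagrange-multiplier row/column
    b = np.ones(n + 1)
    b[:n] = model(np.sqrt(((coords - target) ** 2).sum(-1)))
    w = np.linalg.solve(A, b)
    return float(w[:n] @ values)                # kriged estimate

model = lambda h: spherical(h, nugget=0.1, sill=0.5, rng_=60.0)  # placeholder fit
# estimate = ordinary_krige(sample_xy, z4, np.array([55.0, 45.0]), model)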
Fig. 10. Maps of kriged estimates for 10x10 pixel blocks using single pixel (left) and 2x2 (right) samples. Compare with map of true block values in Fig. 4.
Fig. 11. Plots of estimated vs. true block values and associated regression statistics for the two kriged estimates in Fig. 10. The left plot is from the single pixel case, the right from the 2x2 pixel case.
Fig. 12. Histograms of estimation errors (estimated - true) for the single pixel case
(left), and the 2x2 pixel case (right).
ISOPLETH MAPS
Isopleth, or contour, maps are a commonly used and effective method for displaying the spatial distribution of chemical concentration. The use of isopleths, however, re-introduces some of the ambiguity discussed above regarding support. If we examine the kriged maps in Fig. 10, we can easily visualize the process of generating a contour line by smoothing the stepwise boundary between two shading patterns. This is essentially the same result we would get if we assigned the estimated block average values to the block midpoints and ran the resulting grid through a typical contouring algorithm. It is also comparable to the limit we would approach (very expensively) with kriging as the grid dimension approaches zero. The problem arises because a contour represents the intersection of a plane parallel to the x-y axis and a continuous surface. The existence of one contour line implies the existence of all possible contour lines, and demonstrates that we have in effect estimated individual concentration values at every point. But we showed earlier that point concentrations are binary and discontinuous, and that meaningful statements about concentration require specifying a support. Which support?
Intuitively, we might like to think of the isopleth surface as representing the result of a moving average based on a specified support window. While this sounds good, it doesn't work in practice. When we specified a 10x10 block grid, it was easy to compute the true block means and compare them to the estimated block means. However, when we try to compute a true moving-average isopleth based on a 10x10 support window, we find peaks, holes, and irregularities in the isopleth lines occurring at much smaller scales than the 10x10 support. Such a situation is self-contradictory, and it is not obvious how to define a true isopleth, compare it with an estimated one, and determine the goodness of the estimate.
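The moving-average surface itself is simple to compute. The Python sketch below is illustrative only; a random stand-in grid (not spatially autocorrelated) is used just to make the fragment runnable, and contouring the smoothed surface is what produces the isopleths discussed above.

import numpy as np
from scipy.ndimage import uniform_filter
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
grid = rng.poisson(0.3, size=(200, 320)).astype(float)   # stand-in for the simulation

# moving average over a 10x10 support window
smooth = uniform_filter(grid, size=10, mode="nearest")

# contouring the moving-average surface draws the isopleths
plt.contour(smooth, levels=[0.4, 0.8, 1.2])
plt.show()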
In spite of the difficulty of defining a 'real' isopleth, a case can be made for using isopleths drawn from kriged block estimates as remediation boundaries. Given that kriging is in some sense an optimal estimator, and the goal is to remediate all of the area above an action level, the best approach should be to krige every point in the sampled domain and remediate all of the points estimated to be above the action level. Contouring kriged block values provides a good approximation which is computationally feasible. The point to remember when doing this is that the nice, smooth contours may be primarily a regression effect, and that the underlying reality may in fact be very erratic. If we were to clean up an area inside an isopleth boundary and then take check samples immediately around the boundary, we would expect to see a large number (approximately 30 to 50%, depending on the distribution) of samples exceed the action level, even if the boundary is well estimated. A more practical approach would be to do the check sampling along the boundary before the clean-up, and use the additional data to better define the boundary.
CONCLUSIONS
Chemical compounds in the environment can be spatially autocorrelated at scales ranging from the molecular to the global. Autocorrelation is practically significant when it occurs at scales relevant to the problem, that is, between the dimensions of a sample and the dimensions of the domain being investigated. Under such conditions the support, or physical size, shape, and orientation of a sample, becomes a critical factor in the quality of an investigation. The use of QA/QC data for a variance components analysis of the nugget component of a variogram model can be very useful in selecting the most cost-effective approach to additional sampling.
Chemical concentration in the inhomogeneous environment must refer to a specific support to be meaningful. Specifying the support associated with an action level for enforcement or remediation removes a source of ambiguity and potential misunderstanding.
Spatial estimates of concentration such as kriged blocks, and the contour maps derived from them, are smoothed representations of reality, and represent a support more like the sampling grid size than the dimensions of the sample. Site investigators should understand, and be able to explain to the public, why many of the values at the sample support, taken outside an isopleth boundary, are expected to exceed the isopleth value.
NOTICE
The information in this document has been funded wholly by the U.S. Environmental Protection Agency. It has been subjected to Agency review and approved for publication.
REFERENCES
Journel, A. G., and Huijbregts, Ch. J., 1978. Mining Geostatistics. Academic Press, London. 600 pp.
Matheron, G., 1963. Principles of Geostatistics. Economic Geology, Vol. 58, pp. 1246-1266.
Olea, R. A., 1984. Systematic Sampling of Spatial Functions. Kansas Geological Survey, Lawrence, Kansas. 57 pp.
Splitstone, D. E., 1987. Sampling Design: Some Very Practical Considerations. (This volume).
Starks, T. H., 1986. Determination of Support in Soil Sampling. Mathematical Geology, Vol. 18, No. 6, pp. 529-537.
Yfantis, E. A., Flatman, G. T., and Behar, J. V., 1987. Efficiency of Kriging Estimation for Square, Triangular, and Hexagonal Grids. Mathematical Geology, Vol. 19, No. 3, pp. 183-205.
Discussion
John Warren
US Environmental Protection Agency
Spatial prediction, as illustrated
by these two papers, holds great promise
in extracting large amounts of informa-
tion from situations where extensive
data collection is not feasible.
Evan Englund's paper on using kriging in conjunction with the concept of sample support is most interesting, and his discussion illustrates clearly the problem of defining "sample" with respect to the physical dimensions of a sample site. The use of "sample support" to avoid confusion with the common statistical understanding of "sample" is to be commended, and his statement, "The support on which decisions will be made is rarely the same as that of the samples. The choice of decision support can significantly affect the outcome of an analysis, and should be considered before a sampling program is undertaken," should be included in every Data Quality Objectives (DQO) Program.
The illustration of "sample
support" by simulation is particularly
useful in demonstrating how the defi-
nition of contamination is a function
of the particular support chosen. This
simulated example should be made part of
the Agency's DQO training program as it
clearly demonstrates to managers and
decision-makers the importance of allo-
cating resources proportional to the
ratio between decision-area and sample
support area.
Evan's discussion on the choice of the most efficient or most practicable sampling design for sampling in a spatial domain is very useful, for there does not seem to be a general solution applicable to all sites. Each site must have a unique sampling plan, and although the Agency's guidance document, SW-846, endorses simple random sampling, the document does recommend a tailored-to-the-site approach. The revised edition of SW-846 (4th edition, with an expected publication date of mid-1990) will provide guidance on most sampling schemes likely to be encountered in spatial statistics, and also gives guidance on the construction of optimal grid sizes for the effective use of kriging.
Noel Cressie's paper is a nice summarization of spatial statistics and introduces some of the notation essential for a discussion of kriging. His comparison of the trend surface model with the random field model is very useful and clearly illustrates the differences between the two models.
The strength of the paper lies in the excellent discussion of spatial design of networks, Section 3. The problem of adding a monitoring site to an existing network cannot be solved easily, and only by making key assumptions on the intent and use of the monitoring data can any solution be attempted. The concept of "average" prediction error (equation 3.4) has much merit, as many of the Agency's monitoring networks are intended to measure mean increases/decreases in pollution levels. It seems clear that this notion of optimal network design can be applied to Agency networks where non-statistical considerations in determining actual sites are relatively unimportant.
There is much that needs to be done
in order to effectively bring spatial
statistics to the practical level at
EPA, and these two papers are an impor-
tant start. There is definitely an
interest in spatial statistics as evi-
denced by the number of people present
at the spatial statistics sessions at
the annual EPA Conference on
Statistics; these two papers should
further stimulate this interest.
SAMPLING AND MODELING POLLUTANT PLUMES: METHODS COMBINING DIRECT MEASUREMENTS AND REMOTE SENSING DATA
Ralph E. Folsom
Research Triangle Institute, P.O. Box 12194, Research Triangle Park, NC 27709
1.0 Introduction
The goal of this paper is to describe response or trend surface methods for characterizing groundwater contamination plumes. The work focuses on statistical problems associated with mapping pollutant plumes in the vicinity of hazardous waste (Superfund) sites. The importance of conducting low cost screening surveys to obtain indirect measurements of contamination is emphasized. Two such measures of subsurface electrical conductivity are described. To illustrate the utility of such low cost screening measurements, electromagnetics (EM) is employed as a covariate in spatial regression models fit to chloride and tetrahydrofuran concentrations recorded at a Canadian landfill.
We also explore the use of indirect measurement surveys for improving the efficiency of well-site sampling plans. A stochastic regression model provides a flexible theoretical framework for deriving such results. In order to minimize the impact of prediction biases that result from model misspecification, probability sampling and estimation methods are advocated.
In the concluding section, an extended stochastic coefficient model that provides for spatial correlation among the well-site specific intercepts, β_0k, is considered. This adaptation establishes a class of spatial regression models encompassing the various linear prediction methods that are collectively referred to as kriging.
2.0 Response Surface Methods
To set the stage for describing trend surface
fitting procedures, a data collection design for
a typical hazardous waste site will be
formulated. First consider a two dimensional
target region, R, symbolizing a specified
surface area around a hazardous waste site.
Within this region R we let 0 denote a finite
set of potential well sites for sampling ground
water. These potential well sites can be
envisioned as a dense grid of N points or
coordinate pairs w^ » (x^.y^) blanketing the
target region R. By restricting the set of
potential well sites to a finite subset of the
points 1n R, we can draw upon the extensively
developed methodology of probability sampling.
This methodology provides a wide array of
randomized site selection plans that can be
employed to specify a sample, s, of n (1)
41
-------
and β is the associated column vector of (p+1) unknown polynomial regression coefficients. The feature that will distinguish our response surface models from the spatial regression models employed in kriging is the lack of spatial correlation between neighboring z_k values.
Since spatial correlation is induced by microstructure in the concentration surface that is not adequately accounted for by the polynomial mean function in equation (1), having a good indirect measurement c_k to include in X_k will substantially reduce the spatial correlation between residuals u_k = (z_k − X_k β) and u_l. The results developed subsequently in this paper are derived from models where the covariance is zero between residuals u_k and u_l from different well sites. In the final section, extensions of our models are proposed that allow for spatial correlation. These models lead to stochastic coefficient generalizations of 'universal kriging' methods. Because of the importance of having a good covariate if simple response surface methods are to be successful, we will devote space in the following section to elaborating on the properties of two such indirect measurements, electromagnetics and resistivity.
2.2 Electromagnetics and Resistivity
Electromagnetic (EM) devices measure the electrical conductivity of subsurface soil, rock, and groundwater. The specific conductance of the ground water and contaminating pore fluids usually dominates the measurement. The EM measurement responds to significant increases in chloride, sodium, sulfate, potassium, calcium, and magnesium. Variation in these species reflects the contaminant variation at parts per million (ppm) and parts per billion (ppb) levels. EM measurements are made by induction, with no ground contact required. The depth of the conic-shaped subsurface volume that is assessed by a single EM reading depends on the diameter of the coil that induces the electrical field. The average depths of the conic volumes scanned by EM devices are roughly 1.5, 6, 15, 30, and 60 meters. The two deepest reading devices are vehicle mounted, whereas the three shallow scanning instruments are hand held. These EM devices are ideally suited to generating conductivity profiles along lateral transects targeted at one of the six available depths.
A second widely used screening device measures subsurface electrical resistivity, the inverse of conductivity. The resistivity measuring instrument injects an electrical current into the ground at a pair of electrodes. Like EM, resistivity measures an average response for a subsurface volume ranging from the surface to the effective depth. The nominal response depth for a particular resistivity sounding is approximately the electrode spacing. Since the pair of electrodes can in principle be separated by any predesignated distance, the range of possible reading depths is unlimited. Noting that resistivity measurements require that a pair of electrodes be placed in contact with the ground, it is clear that such measurements are slower to obtain than EM readings. For this reason, applications of resistivity are generally restricted to depth sounding at a limited number of sites. Due to its speed of application for lateral coverage, the EM device is more suitable for the blanket coverage required of our low cost screening survey. In the following section, a data base providing linked EM survey measurements and lab recorded contaminant concentrations is explored. This data base was developed to monitor the magnitude and extent of groundwater contamination around the Gloucester, Ontario landfill.
2.3 The Gloucester Landfill Data
The Gloucester landfill is located six miles south of Ottawa, Ontario. From 1956 to 1973, the landfill was the site of domestic waste disposal. From 1969 to 1980, a special site within the landfill was used to dispose of hazardous university, hospital, and government wastes. By 1985, approximately 60 multiple-level and 200 standpipe monitoring wells had been installed. Extensive EM surveys were conducted, obtaining indirect conductivity measurements at the six and fifty meter depths. To demonstrate the utility of the EM measurement, homogeneous polynomial models were fitted to 1983 data from 44 wells and to 1985 data from 21 wells.
A degree-p homogeneous polynomial in the x and y spatial coordinates includes all regressors of the form

x^r y^c

such that the polynomial exponents r and c take values zero through p and r + c ≤ p. Restricting attention to such polynomials is advisable, since the associated regression equations produce the same predicted values independent of an arbitrary rigid rotation of the x and y coordinate axes. Since the positioning of the x and y coordinate axes over the target region of a hazardous waste site is generally arbitrary, models whose predicted values change with the positioning of the coordinate axes should be avoided.
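Building such a regressor matrix and fitting it by ordinary least squares takes only a few lines of Python. The sketch below is illustrative, not the authors' code, and it shows a simplified version in which the conductivity covariate is simply appended as one more column.

import numpy as np

def homogeneous_poly(x, y, p):
    """Columns x**r * y**c for all exponents with r + c <= p (degree-p
    homogeneous polynomial, including the constant term r = c = 0)."""
    cols = [x ** r * y ** c for r in range(p + 1)
            for c in range(p + 1) if r + c <= p]
    return np.column_stack(cols)

def fit_response_surface(x, y, cov, z, p=2):
    """OLS fit of z on the degree-p spatial polynomial plus a covariate."""
    X = np.column_stack([homogeneous_poly(x, y, p), cov])
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return X, beta

# x, y: well coordinates; cov: EM conductivity covariate; z: log concentration
# X, beta = fit_response_surface(x, y, cond50, np.log(chloride), p=2)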
When one is forced by a sparsity of observations at a uniform depth to simultaneously fit concentrations from varying depths, polynomials that involve the recorded depth d_k are required. This has been the case with the Gloucester data. Since the position of the depth axis is nonarbitrary, models need not be fully homogeneous or invariant under rigid rotations in three dimensions. The covariate measurements c_k are also linked to the potential well sites w_k in an unambiguous fashion that is not altered by repositioning the x and y coordinates.
Table 1 displays a fully homogeneous degree-2 polynomial fit to the natural log of 1983 chloride concentrations recorded at 185 well-site-by-depth combinations. The average log concentration for the 185 measurements is 2.63, which corresponds to 13.87 mg/L on the untransformed scale. With four dimensions among the predictors, the homogeneous degree-2 polynomial has an intercept and fourteen regressors. The multiple R² for this model was R² = 0.63.
The pure error mean square is based on the presence of replicated water samples and associated readings for selected well-by-depth combinations. The test for lack of fit is highly significant. While it is possible that a higher order polynomial would have achieved a nonsignificant lack of fit, we did not attempt any higher order models. Our objective in fitting such models was to demonstrate the utility of the EM covariate (COND50) for a number of different pollutants. To expedite this demonstration, we limited our analysis to the same degree-2 polynomial for all of the pollutant outcomes. The 50 meter EM variate is clearly an important predictor of the 1983 chloride concentrations. Figures 1 through 3 are included to illustrate a common problem with polynomial regression predictors. Figure 1 presents two so-called bubble plots of the chloride concentrations observed at 28 wells. The square figures represent 600 meter by 600 meter regions overlaying the hazardous waste site. The variable-size dots denote the size category of the maximum chloride concentration observed at the indicated location and over the designated depth range. Note from Figure 1 that there are only two well locations at x-axis values in excess of 3,000 meters. We have also observed that the 50 meter (EM) conductivity values beyond 3,000 meters on the x axis are all relatively small. In spite of these relatively small values of the EM covariate beyond 3,000 meters, Figure 2 shows that the polynomial equation from Table 1 predicts a substantial chloride peak near the center of the 3,100 meter boundary. Figure 3 is a three-dimensional spider-web plot of the predicted values for maximum chloride concentration in the 15 to 25 meter depth range.
This series of figures illustrates the risk of extending one's mapping region beyond the area where water samples have been analyzed for contaminant concentrations. Another way of addressing this problem is to ensure that wells have been located on or near the map's boundaries, and at sufficient density within the region to guarantee a reasonable approximation to the surface's major features. In the following sections, we develop some alternative well location strategies based on different stochastic models for the pollution process.
Before turning to the well-site sampling problem, it is instructive to demonstrate that, in addition to correlating well with highly ionic contaminants like chloride, the EM electrical conductance measure can also predict the dispersion of organic contaminants. Table 2 summarizes the fit of our degree-2 model to a set of 41 log concentrations of 1985 tetrahydrofuran (THF). These concentration measurements were taken at 19 wells and were associated with depths ranging from four to six meters. Even within this narrow depth range, it is clear that the associated depth variable D is an important predictor of THF concentrations. The six meter EM (COND6) variable is also a strong predictor of THF. Note that in this case the test for lack of fit is nonsignificant. With only nine degrees of freedom for estimating the within-site sampling and measurement error variance (PURE ERROR), this nonsignificant result may well be a reflection of poor test power.
3.0 Model Based Purposive Sampling
To define statistical optimality criteria for alternative well samples, we need to specify the variance and covariance function for our stochastic concentration measurements z_k. Recall that the mean function has the general linear form

E(z_k) = X_k β

where X_k denotes a (p+1)-element row vector containing a leading 1 and p polynomial functions of the spatial coordinates (x_k, y_k) and the covariate c_k. In our original formulation of this problem, the subscripts k were linked to potential well sites w_k corresponding to points (x_k, y_k) blanketing the region R that was to be mapped. Recall that the dense grid of N potential well sites where covariate measurements c_k had been obtained was denoted by Q.
Implicit in this design formulation is the notion that all wells will be constructed to sample at the same set of depths, prespecified on the basis of geophysical and hydrogeological considerations. To account for the possibility of having multiple depth-targeted covariates, like our six meter and fifty meter EM measurements, we can allow X_k to include the full vector of covariates, say c_k, so that the mean function has the same linear form for each depth level d; that is,

E(z_kd) = X_k β_d

for each of the prespecified depths d.
Given a common linear form for the mean function at each depth d and the further assumption that the z_kd are independent with common variance σ_d², one observes that there is a single sample s of n potential well sites chosen from Q that is simultaneously optimal for mapping pollutant concentrations z at all depths d.
require an optlmallty criterion. Under our
response surface model, the best linear unbiased
estimator for a future value of
1s
skd
xkbsd
where b d Is the ordinary least squares (OLS)
estimator for the vector of depth specific
polynomial regression coefficients p^. The
variance of this best linear predictor Is
Var(zskd)
'1
(2)
where W~* 1s the Inverse of the left-hand-sides
matrix In the normal equations for estimating
b , and the superscript T denotes matrix
transposition. In response surface parlance,
W 1s the Incidence matrix of the deslgn-s.
If Xs denotes the n by (p+l)matr1x whose rows
correspond to the n vectors X|< associated with
the sample well s1tes-s, then the sample
Incidence matrix Is
Vf = (xjx ).
S 55
If we further let the measurements z_kd denote a set of future values for the pollutant concentrations at depth d, then the squared Euclidean distance between the vector z_Q(d) of N future values and our vector ẑ_Q(sd) of best linear predictors can be depicted as follows:

L_Q(sd) = Σ_{k∈Q} (z_kd − ẑ_skd)²                                        (3)

We will define the optimum sample s as that particular sample of size n that minimizes the expected squared distance between the predicted surface ẑ_Q(sd) and a future independent realization of the contamination process z_Q(d). With L_Q(sd) in equation (3) representing our loss function, the associated expected loss or risk function is

R_Q(sd) = Σ_{k∈Q} E(z_kd − ẑ_skd)².                                      (4)
This risk function is proportional to the average mean squared prediction error, where the averaging extends over all potential well sites w_k belonging to Q; that is, over all N points where the z_d surface is to be mapped. The independent residuals assumption of our response surface model leads to the risk function

R_Q(sd) = N σ_d² + σ_d² Σ_{k=1}^N v_sk                                   (5)

where v_sk = X_k W_s^-1 X_k^T. Now, one observes that the sample s that minimizes (5) must also minimize

Σ_{k=1}^N v_sk = tr(W_Q W_s^-1)                                          (6)

where W_Q is the universe level incidence matrix based on all N potential well sites and tr( ) is the trace operator. An equivalent criterion that is proportional to (6) replaces W_Q and W_s by the associated mean square and cross product matrices Ω_Q = (W_Q ÷ N) and Ω_s = (W_s ÷ n). In terms of these matrices, the criterion in equation (6) reduces to

Σ_{k=1}^N v_sk = (N ÷ n) tr(Ω_Q Ω_s^-1).                                 (7)

To simplify this criterion further, one can perform an orthonormal transformation on the X_k vectors so that at the universe level the transformed mean incidence matrix is the identity matrix. Subsequent to this transformation, the optimality criterion simplifies to

Σ_{k=1}^N v_sk = (N ÷ n) tr(Ω_s^-1).                                     (8)

If we assume further that the potential well sites w_k in Q were positioned at the nodes of a regular square grid, then the original x_k and y_k well coordinates are orthogonal to each other. In this case the transformed coordinates are simply the standardized versions, say x̃_k and ỹ_k, formed by subtracting their respective universe (Q) level means and dividing the resulting deviations by the associated root mean squared deviations.
To illustrate the nature of samples that minimize our risk function, consider a simple degree-1 polynomial mean function. A response surface model of degree 1 and a mapping universe Q of points (x_k, y_k) positioned on a regular square grid yields the following orthonormalized vector of linear predictors

X̃_k = (1, x̃_k, ỹ_k, c̃_k)

where the transformed x̃_k and ỹ_k coordinates are the standardized versions described above and c̃_k denotes the orthonormal version of the covariate c_k. The transformed covariate has the form

c̃_k = (c_k − ĉ_k) ÷ a_Q

where ĉ_k is the Q level ordinary least squares predictor for c_k based on the first three elements of X̃_k. The divisor a_Q is the square root of the average squared residual (c_k − ĉ_k)².
Returning to the optimality criterion in equation (8), one observes that for samples s that are orthogonal, in the sense that Ω_s is diagonal, the risk function is proportional to

tr(Ω_s^-1) = 1 + a_x^-2 + a_y^-2 + a_c^-2                                (10)

where the a² quantities in equation (10) are the sample averages of the squared variates x̃_k², ỹ_k², and c̃_k². For such orthogonal samples, it is clear that our risk function is minimized by choosing points w_k with the largest values of x̃_k², ỹ_k², and c̃_k². Such points lie on or near the spatial boundary of Q where large positive or negative residuals (c_k − ĉ_k) are observed. Note that these are points where the degree-1 spatial polynomial predicts the covariate poorly.
Of course, sample orthogonality is not a prerequisite for minimizing tr(Ω_s^-1). Focusing on the subset of orthogonal samples was merely an expedient used to obtain a simple characterization of designs that minimize tr(Ω_s^-1). Various iterative computer algorithms have been devised to find near-optimal samples based on minimizing criteria like equations (7) and (8). Mitchell (1974) presents one such algorithm.
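A greedy exchange search of this kind can be sketched as follows; the Python fragment below is illustrative only (it is not Mitchell's DETMAX algorithm), and it minimizes the trace criterion of equation (6) by repeatedly swapping a sampled site for an unsampled one whenever the swap lowers the criterion.

import numpy as np

def exchange_design(X, n, n_passes=10, seed=0):
    """X: (N, p+1) matrix of candidate-site predictor vectors.
    Returns indices of an n-site design with small tr(W_Q W_s^-1)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    W_Q = X.T @ X
    s = list(rng.choice(N, size=n, replace=False))

    def crit(idx):
        W_s = X[idx].T @ X[idx]
        return np.trace(W_Q @ np.linalg.inv(W_s))

    best = crit(s)
    for _ in range(n_passes):
        improved = False
        for i in range(n):                       # try swapping each design point
            for j in set(range(N)) - set(s):
                trial = s.copy()
                trial[i] = j
                c = crit(trial)
                if c < best - 1e-12:
                    s, best, improved = trial, c, True
        if not improved:
            break
    return np.array(s), best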
It is instructive to note that minimizing equation (6) amounts to minimizing the average variance of the predicted values ẑ_skd over Q, assuming that the form of the mean function is known. Robust response surface methods have also been developed that seek to minimize the average mean squared prediction error of equation (4) assuming that the true mean function for the process is a higher degree
polynomial than that being used to produce the ẑ predicted values. In this case, the dominant term in the risk function of equation (4) is the sum over Q of the squared prediction biases B_skd.

β̃_sk = γ_s for all k ∉ s                                                (15)

Note that for the sample well sites (k ∈ s) the β̃_sk estimator has a second term, say η̂_sk, that estimates the vector of random effects η_k. The associated set of BLU predictors for the z surface has the form

ẑ_sk = X_k β̃_sk                                                         (16)

for k ∈ Q. The stochastic coefficient model results in a lack-of-fit adjustment to the ẑ predictors for wells w_k that belong to the sample s. When the measurement variance contribution in V_k², namely (σ² ÷ m_k), is negligible relative to the lack-of-fit contribution (X_k Λ X_k^T), then ẑ_sk ≈ z̄_k for the sample well sites. When the measurement variance is not negligible, then ẑ_sk for k ∈ s has the form of a James-Stein shrinkage estimator; that is, for k ∈ s,

ẑ_sk = ω_k z̄_k + (1 − ω_k) X_k γ_s
where the weight ω_k on z̄_k is the fraction of the variance V_k² that is accounted for by the lack-of-fit contribution X_k Λ X_k^T.
While the set of predictors in equation (16) is best in the sense of minimum mean squared prediction error over Q, the associated surface is discontinuous at the sample well sites. To produce an appealing graphic representation of the surface, a smoother prediction function may be desirable. For this purpose one can derive a common vector of regression coefficients to substitute for the site-specific β̃_sk vectors, say β_s*, such that the set of smooth linear predictors

ẑ*_sk = X_k β_s*

minimizes the risk function in equation (4) with the ẑ*_sk predictors replacing ẑ_skd. In this instance, one minimizes the risk function R_Q(β_s*) with respect to the elements of β_s* given the sample s. Note also that the expectation defining R_Q(β_s*) is in terms of our stochastic coefficient model.
Pfefferman (1984) shows that with Λ and σ² known, the β_s* that minimizes the average mean squared prediction error has the form

β_s* = [ Σ_{k=1}^N X_k^T X_k ]^-1 [ Σ_{k=1}^N X_k^T X_k β̃_sk ]           (18)

where the β̃_sk are as defined in equation (15). Folsom (1985) shows that when the number of replicate water samples m_k per site is a constant m̄, β_s* can be recast in terms of the generalized least squares estimator γ_s of equation (14) and the ordinary least squares (OLS) estimator b_s. Letting R̄_s = (X_s^T z̄_s) ÷ n denote the vector of mean cross-products in the right-hand sides of the OLS estimation equations, we have

β_s* = γ_s + f Ω_Q^-1 Ω_s (b_s − γ_s)                                    (19)

where f = n ÷ N is the sampling fraction. Recall that Ω_s and Ω_Q denote the mean incidence matrices for the sample s and the universe Q, respectively.
Equation (19) shows that when one has wells at all potential sites, so that s = Q, the minimum-risk common β is the universe level OLS estimator

β_Q = Ω_Q^-1 R̄_Q                                                        (20)

with R̄_Q = (X_Q^T z̄_Q) ÷ N denoting the population analog of R̄_s. Alternatively, as the sampling fraction f = (n ÷ N) approaches zero, the minimum-risk common β tends to the generalized least squares (GLS) estimator γ_s. It is also interesting to note that when the sample incidence matrix Ω_s matches the universe level matrix Ω_Q, then the best common β simplifies to

β_s* = γ_s + f (b_s − γ_s).

The fact that our best common coefficient vector β_s* goes to β_Q as the sampling rate f goes to 1 suggests that an appropriate definition for the finite population β parameter is β_Q. In the following section, we propose an inverse selection probability weighted analog of the OLS estimator b_s, namely b_π, that is asymptotically design unbiased for β_Q. In addition to this desirable large-sample property, we propose a sample design with inclusion probabilities π_k proportional to the residual variances V_k² dictated by a simplified form of our stochastic coefficient model. When this simplified model holds, our inverse-π-weighted estimator b_π is equivalent to the best small-sample estimator γ_s.
4.2 Efficient Probability Sampling
Having settled on β_Q as our finite population parameter, an ADU estimator for β_Q is

b_π = (X_s^T Π_s^-1 X_s)^-1 X_s^T Π_s^-1 z̄_s                             (21)

where Π_s is the (n×n) diagonal matrix with diagonal elements π_sk corresponding to the inclusion probabilities for the well sites w_k that belong to the sample s. In order for b_π to be close to the optimum estimator β_s*, which approaches γ_s when the sampling rate f is small, we require a sample design with the well-site selection probabilities π_sk proportional to our model residual variance; that is,

π_sk = θ V_k² = θ [ (X_k Λ X_k^T) + (σ² ÷ m̄) ]                           (22)

where θ is a proportionality constant and m̄ is the common number of water samples proposed for each sample well site.
The set of selection probabilities specified in equation (22) are clearly functions of the unknown model parameters Λ and
-------
yields new predictor vectors X̃_k with the property that the associated universe level mean incidence matrix Ω_Q reduces to I_(p+1). In addition to its potential for rendering the β_k stochastic coefficients independent and equivariant, this orthonormal transformation leads to an interesting formal justification for the inclusion probabilities in equation (23). For our simplified model, we can show that equation (23) provides the vector of selection probabilities π_Q = (π_1, ..., π_N) that minimizes the risk function

R_Q(π) = Σ_{k=1}^N E_s E_ζ (z_k − X_k b_π)²                              (24)

where E_s denotes expectation over repeated probability samples s and E_ζ expectation over the stochastic coefficient model. The values of π_k that minimize (24) are

π_k = n L_k V_k ÷ ( Σ_{j=1}^N L_j V_j ).                                 (25)

Recall that L_k² = X̃_k X̃_k^T and V_k² is the model variance of z̄_k. In the simplified model where V_k² is proportional to L_k², we note that the optimum π_k are also proportional to L_k². Therefore, under the simplified model, the well-site selection probabilities that minimize the risk function of equation (24) are also the probabilities that render our ADU estimator b_π equivalent to the model-best small-sample estimator γ_s.
The probability sampling scheme outlined above has intuitive appeal in light of the model-based purposive sampling results of Section 3.0. Given that our well-site selection probabilities are proportional to the squared length of the associated vectors X̃_k of orthonormalized predictors, we observe that the longest X̃_k vectors are associated with sites on or near the boundary of the sampling region where atypically large values of the transformed covariate are observed. Recall that the transformed covariate for site w_k is proportional to (c_k − ĉ_k), the difference between the observed covariate and its (x, y) coordinate polynomial predictor. It is clear that large (c_k − ĉ_k) residuals will occur where abrupt spikes or holes are observed in the covariate surface. Near the sampling region boundary it will be the spikes that generate large residuals and correspondingly large site selection probabilities.
The model-based purposive sample designs of Section 3.0 focused exclusively on these boundary region sites where atypically large covariate values were observed. Our randomized sampling design assigns the highest selection probabilities to those points while ensuring that all potential sites have a nonzero chance of selection. As indicated in Section 3.0, this defining property of probability samples ensures that the inverse-π-weighted sample incidence matrix Ω_π equals Ω_Q on average over repeated samples, and comes to closely approximate Ω_Q as the sample size becomes sufficiently large. Therefore, in the context of response surface prediction, we see that probability samples can be both efficient from the standpoint of controlling the variance of prediction errors and robust in terms of controlling the average squared prediction bias.
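A sketch of the site selection step, under the simplified model in which the inclusion probabilities are taken proportional to the squared lengths of the orthonormalized predictor vectors, follows in Python. It is an illustration only, not the authors' procedure; systematic probability-proportional-to-size selection is used here merely as one convenient way to realize unequal inclusion probabilities.

import numpy as np

def pps_systematic(size_measure, n, seed=0):
    """Systematic probability-proportional-to-size selection of n units."""
    rng = np.random.default_rng(seed)
    p = np.asarray(size_measure, dtype=float)
    cum = np.cumsum(n * p / p.sum())        # cumulative expected sample counts
    start = rng.uniform(0.0, 1.0)
    hits = start + np.arange(n)             # n equally spaced selection points
    return np.searchsorted(cum, hits)       # indices of the selected units

# X_tilde: (N, p+1) matrix of orthonormalized predictor vectors (assumed given)
# L2 = (X_tilde ** 2).sum(axis=1)           # squared lengths L_k^2
# sample_idx = pps_systematic(L2, n=30)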
4.3 Design Consistent Estimation
In this section, we consider a design consistent adaptation of the best common β as defined by equation (19). Recall that our orthonormal transformation of the predictor vectors reduces the universe level mean incidence matrix to the identity matrix. With this transformation, the best common β reduces to

β_s* = [ f R̄_s − (f Ω_s − I_(p+1)) γ_s ].

Examining this expression, we observe that the two leading terms would yield an ADU estimator for β_Q = R̄_Q if the inverse-π-weighted estimators R̄_π and Ω_π were substituted for R̄_s and Ω_s. Specifically, we define the design unbiased estimators

R̄_π = N^-1 Σ_{k∈s} π_k^-1 X_k^T z̄_k

and

Ω_π = N^-1 Σ_{k∈s} π_k^-1 X_k^T X_k.

Making the indicated substitutions, we obtain the following ADU estimator for β_Q:

β̂_s = [ R̄_π − (Ω_π − I_(p+1)) γ_s ].                                    (26)

The estimator in (26) is clearly design unbiased in large samples, since Ω_π → Ω_Q = I_(p+1) and R̄_π → R̄_Q = β_Q. The form of β̂_s in equation (26) reveals how knowledge of the population level mean incidence matrix Ω_Q = I_(p+1) is employed to adjust the unbiased estimator R̄_π for the observed sampling deviation (Ω_π − Ω_Q). In this form, β̂_s has the appearance of a vector-valued regression estimator with coefficients drawn from the generalized least squares (GLS) estimator γ_s.
To form the GLS estimator γ_s employed in (26), we require a sample estimator for the stochastic coefficient covariance matrix Λ and the measurement variance component σ². We are inclined to reduce Λ to a diagonal matrix and apply restricted maximum likelihood (REML) methods to obtain the sample estimators Λ̂_s and σ̂². In terms of these statistics, we estimate the variance of the site-k mean z̄_k as

V̂_k² = (X_k Λ̂_s X_k^T) + (σ̂² ÷ m_k).

Our GLS estimator γ_s is obtained from the associated diagonal covariance matrix. An alternative form for β̂_s emphasizes this GLS component. Note that for a general set of untransformed predictors, the two leading terms of equation (19) lead to the ADU adaptation

β̂_s = γ_s + Ω_Q^-1 (R̄_π − Ω_π γ_s).

In this form, the second term performs a bias correction when the mean function is not
adequately approximated by the specified (p+1)-variable linear predictor. In this case, the correction renders β̂_s asymptotically design unbiased for β_Q, the universe level OLS coefficients.
Recall that the original best common β estimator β_s* of equation (19) also converged to β_Q as the sample s grew to encompass the entire population of potential sites. When our simplified variance model holds approximately, with Λ = δ²I_(p+1) and (σ² ÷ m̄) negligible, the inverse-π-weighted estimator b_π = Ω_π^-1 R̄_π will closely approximate the model-based GLS estimator γ_s. In this situation, the degree of correction will be slight and β̂_s will closely approximate γ_s.
In addition to combining the efficiency of the GLS coefficient vector γ_s and the robustness of the π-weighted analog b_π, the regression estimator β̂_s makes explicit use of our knowledge of the population matrix Ω_Q. Effective use of this added information should improve response surface prediction. The following section relates an interesting connection between our simplified stochastic coefficient model and a robust covariance matrix estimator for b_π.
4.4 Robust Covarlance Matrix Estimation
In a recent paper, C.F.J. Wu (1986)
recommends a weighted jackknife covariance
matrix estimator that is strictly unbiased when
the model assumptions hold. Wu's version of the
delete-one jackknife has the added robustness
property of asymptotic unbiasedness when the
variance model does not hold. For this
robustness property to hold requires only that
the residuals be uncorrelated. Recall that the
variance model motivating our π-weighted vector
of regression coefficients has
site variances Vi², with associated site
selection probabilities πi. As applied to our
π-weighted coefficient vector bπ, Wu's jackknife
covariance matrix estimator covWu(bπ) has the form

    covWu(bπ) = [(p+1)²/n] Wπ⁻¹ {Σ(i∈s) ri ri′ / [(p+1)ξi]} Wπ⁻¹    (28)

where the ri quantities depict the observed mean
residuals and the ξi quantities in the divisor of
equation (28) are bias correction factors.
Recall that with the predictors orthonormalized,
the universe level mean incidence matrix is
WQ = I(p+1). Wu's results imply that when
Vi² = δ²Li², the estimator in (28) is strictly
model unbiased. On the other hand, when
Vi² ≠ δ²Li², the estimator in (28) is still
asymptotically unbiased as long as the
unobserved residuals are uncorrelated. This
result follows since V̂i² is asymptotically
unbiased.
It is interesting to note that when the π-
weighted incidence matrix Wπ matches the
universe level analog WQ = I(p+1), then ξi = 1
for all sites i and the bias correction factor
applied to the residuals in equation (30)
reduces to the familiar degrees-of-freedom
adjustment. It is also instructive to note that
in the case of pps with replacement sampling,
the estimator with ξi = 1 reduces to Fuller's
(1974) version of the Taylor series or
linearization approximation for the probability
sampling covariance matrix. The ξi = 1 version
is also analogous to Hinkley's (1977) weighted
jackknife.
Folsom's (1974) version of the linearization
estimator employed the correction factor
[n/(n-1)]. Wu also shows that the standard
bootstrap covariance estimator for bπ can be
approximated by equation (28) with the bias
correction factor totally removed. While all of
these variations are asymptotically equivalent,
Wu's simulation results suggest that the bias
correction of equation (28) can be important for
small samples.
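As a concrete illustration of the delete-one idea behind these
estimators, the following sketch computes a jackknife covariance matrix
for a weighted least squares coefficient vector. It is only the generic
delete-one recipe under assumed inputs (hypothetical arrays X of
site-level predictors, y of site mean responses, and w of inverse
selection probabilities); it does not reproduce Wu's exact weighting or
the bias correction factors ξi of equation (28).

    import numpy as np

    def pi_weighted_coefficients(X, y, w):
        """Weighted least squares coefficients b = (X'WX)^-1 X'Wy."""
        XtW = X.T * w
        return np.linalg.solve(XtW @ X, XtW @ y)

    def delete_one_jackknife_cov(X, y, w):
        """Generic delete-one jackknife covariance for the coefficients.

        Wu (1986) weights the leave-one-out deviations to obtain exact
        model unbiasedness; this sketch uses the plain (n-1)/n scaling.
        """
        n = len(y)
        b_full = pi_weighted_coefficients(X, y, w)
        devs = []
        for i in range(n):
            keep = np.arange(n) != i
            b_i = pi_weighted_coefficients(X[keep], y[keep], w[keep])
            devs.append(b_i - b_full)
        devs = np.array(devs)
        # (n-1)/n times the sum of outer products of the deviations
        return (n - 1) / n * devs.T @ devs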
The connection between these robust
covariance matrix estimators and our stochastic
coefficient model follows from the definition of
the site specific coefficient estimators bπi. As
defined in equation (31), one observes that the
bπi coefficients sum to zero across sites. Given
this definition of bπi, we can recast Wu's
covariance matrix estimator in terms of the
associated sample covariance matrix Sb.
Employing the matrix Sb, the robust covariance
matrix estimator becomes

    covWu(bπ) = [(p+1)²/n] Wπ⁻¹ Sb Wπ⁻¹ .    (32)
Analogous covariance matrix estimators can be
specified for our GLS coefficient vector γ̂s and
the design consistent adaptation β̂s of equation
(26). We believe that these model bias corrected
versions of the jackknife/linearization
estimators warrant further evaluation.
5.0 Summary Comments
In summary, we have made a case for the
utility of remote sensing data, particularly
electromagnetics (EM), as a predictor of
groundwater contamination levels. Stochastic
coefficient models were employed to motivate
interesting probability sampling alternatives to
model based designs. We believe that the good
large sample properties of probability sampling
methods merit further examination in the
moderate sample size arena. In this context, we
favor coefficient estimators that are efficient
when the model holds and asymptotically design
unbiased when the model fails. An interesting
jackknife covariance matrix estimator
incorporating a model bias correction was also
introduced.
In the area of further research, we are
currently exploring an extended class of
stochastic coefficient models that permit the
regression intercepts to exhibit spatial
autocorrelation. Restricted maximum likelihood
(REML) methods are being employed to estimate
the covariance function parameters. Our goal is
to develop efficient and robust statistical
procedures that combine the best features of
"response surface" methods for mean function
prediction and "kriging" procedures for fitting
spatial covariance functions. REML estimation
is attractive in this context since it combines
model based efficiency in the Gaussian
environment with a known robustness property in
the non-Gaussian case. For linear covariance
function parameters, REML is equivalent to
iterated MINQUE (minimum norm quadratic
unbiased estimation).
6.0 References
Folsom, R. E. (1974). National Assessment
Approach to Sampling Error Estimation.
Sampling Error Monograph. RTI final report
prepared for the Education Commission of the
States' National Assessment project.
Folsom, R. E., C. A. Clayton, F. J. Potter, and
R. M. Lucas (1985). The Use of Geophysical
Techniques in the Statistical Estimation of
Magnitude and Extent of Subsurface
Contamination at Hazardous Waste Sites. RTI
final report prepared for EPA's Statistical
Policy Branch.
Fuller, W. A. (1975). Regression Analysis for
Sample Surveys. Sankhya C, 37, 117-132.
Hinkley, D. V. (1977). "Jackknifing in
Unbalanced Situations." Technometrics, 19,
285-292.
Mitchell, T. J. (1974). "An Algorithm for the
Construction of 'D-Optimal' Experimental
Designs." Technometrics, 16, 203-210.
Pfeffermann, D. (1984). "On Extensions of the
Gauss-Markov Theorem to the Case of
Stochastic Regression Coefficients."
Journal of the Royal Statistical Society,
46, No. 1, 139-148.
Wu, C. F. J. (1986). "Jackknife, Bootstrap and
Other Resampling Methods in Regression
Analysis." The Annals of Statistics, 14,
1261-1343.
Table 1: Degree-2 Polynomial Fit to 1983 Log-Chloride

Regression      DF   Type I SS      R-Square   F-Ratio   Prob
Linear           4   87.58111894      0.2159     24.79   0.0001
Quadratic        4   90.57004964      0.2232     25.64   0.0001
Crossproduct     6   77.41551896      0.1908     14.61   0.0001
Total Regress   14   255.56669        0.6299     20.67   0.0001

Residual        DF   SS             Mean Square   F-Ratio   Prob
Lack of Fit    137   141.12041       1.03007597     3.768   0.0001
Pure Error      33   9.02196057      0.27339274
Total Error    170   150.14237       0.88319040

Parameter        DF   Estimate         Std Dev         T-Ratio   Prob
Intercept         1   -15.85168215     4.41599408       -3.59    0.0004
X                 1   0.02120588       0.009779667       2.18    0.0307
Y                 1   -0.14238897      0.01579998       -9.01    0.0001
D                 1   0.29552937       0.30227869        0.98    0.3296
Cond50            1   1.34988504       0.39302999        3.43    0.0007
X*X               1   -0.000043146     0.000012153      -3.55    0.0005
Y*X               1   0.000215274      0.000032573       6.61    0.0001
Y*Y               1   -0.000275732     0.000025697     -10.73    0.0001
D*X               1   0.001216132      0.000334634       3.63    0.0004
D*Y               1   0.001437620      0.000537719       2.67    0.0082
D*D               1   -0.009610226     0.007018137      -1.37    0.1727
Cond50*X          1   0.000273421      0.000723282       0.38    0.7059
Cond50*Y          1   0.003379166      0.000789312       4.28    0.0001
Cond50*D          1   0.02522552       0.01375624        1.83    0.0684
Cond50*Cond50     1   -0.06448431      0.01542040       -4.18    0.0001

Table 2. Degree-2 Polynomial Fit to 1985 Log-Tetrahydrofuran

Regression      DF   Type I SS      R-Square   F-Ratio   Prob
Linear           4   10.13806033      0.3761      9.96   0.0001
Quadratic        4   2.27126830       0.0842      2.23   0.0931
Crossproduct     6   7.93565507       0.2944      5.20   0.0012
Total Regress   14   20.34498370      0.7547      5.71   0.0001

Residual        DF   SS             Mean Square   F-Ratio   Prob
Lack of Fit     17   4.64649597      0.27332329     1.250   0.3781
Pure Error       9   1.96751001      0.21861222
Total Error     26   6.61400598      0.25438485

Parameter        DF   Estimate         Std Dev         T-Ratio   Prob
Intercept         1   -58.56598111     27.55493712      -2.13    0.0432
X                 1   0.10786243       0.06752790        1.60    0.1223
Y                 1   0.21557419       0.15441818        1.40    0.1745
D                 1   -8.21549669      2.26361317       -3.63    0.0012
Cond6             1   18.69031289      6.05457900        3.09    0.0048
X*X               1   -0.000029699     0.000179072      -0.17    0.8696
Y*X               1   0.001158774      0.001347590       0.86    0.3977
Y*Y               1   -0.000902490     0.001604787      -0.56    0.5787
D*X               1   0.02107810       0.005249011       4.02    0.0005
D*Y               1   -0.006061689     0.01139379       -0.53    0.5992
D*D               1   0.35457561       0.13182192        2.69    0.0123
Cond6*X           1   -0.04542798      0.01283282       -3.54    0.0015
Cond6*Y           1   -0.03727344      0.01779248       -2.09    0.0461
Cond6*D           1   0.23862751       0.13736266        1.74    0.0942
Cond6*Cond6       1   -0.61796587      0.23241860       -2.66    0.0132
Figure 1. Distribution of 1983 Chloride Concentration by Depth of Well
(Chloride (mg/L): 5-12, 12-24, 24-36, 36-57)

Figure 2. Predicted Values of 1983 Chloride for Deep Depths
(CL (mg/L): <20, 20-40, 40-60, 60-80, 80-100, >100)

Figure 3. Predicted Values of 1983 Chloride (mg/L),
Maximum Concentration at 15-25 Meters
-------
"VALIDATION" OF AIR POLLUTION DISPERSION MODELS
Anthony D. Thrall
Electric Power Research Institute, 3412 Hillview Avenue, Palo Alto, CA 94303
ABSTRACT
Computer simulations of the
dispersion of pollutants through the air
are used extensively to guide regulatory
decisions concerning air quality. There
are a variety of air dispersion models,
some of which are classified by the U.S.
EPA as "preferred" for certain types of
applications, others of which are
classified as "alternatives," and the
remainder of which must prove themselves
"valid" for regulatory application
according to a specified evaluation
protocol. The criteria, evaluation
procedures, and test results are
reviewed for the case of Gaussian plume
models of sulfur dioxide emissions from
isolated point-sources, such as coal-
fired power plants located in rural
areas.
REFERENCES
American Meteorological Society,
1983. Uncertainty in Air Quality
Modelling. Report of a Workshop at
Woods Hole, MA, September 1982. AMS,
Boston, MA.
Bowne, N.E., Londergan, R.J., Murray,
D.R. and Borenstein, H.S., 1983.
Overview, Results, and Conclusions for
the EPRI Plume Model Validation and
Development Project: Plains Site. EA-
3074, the Electric Power Research
Institute, Palo Alto, CA.
Bowne, N.E., Londergan, R.J. and
Murray, D.R., 1985. Summary of Results
and Conclusions for the EPRI Plume Model
Validation and Development Project:
Moderately Complex Terrain Site. EA-
3755, the Electric Power Research
Institute, Palo Alto, CA.
Bowne, N.E., 1987. Observations and
Evaluations of Plume Models. TRC
Environmental Consultants, Inc., East
Hartford, CT.
Burton, C.J., Stoeckenius, T.E. and
Nordin, J.P., 1982. Variability/
Uncertainty in Sulfur Emissions: A
Summary of Current Knowledge on the
Effects on Ambient Standard Attainment
Demonstrations of Adopting Some Simple
Models of Sulfur Variability.
Proceeding of the Woods Hole Workshop on
Modeling Uncertainty.
Carson, D.J., 1986. A Report on the
Symposium on Uncertainty in Modelling
Atmospheric Dispersion. Atmospheric
Environment 20, 1047-1049.
Dupuis, L.R. and Lipfert, F.W.,
1986. Estimating the Cost of
Uncertainty in Air Quality Modeling.
EA-4707, the Electric Power Research
Institute, Palo Alto, CA.
Environmental Protection Agency,
1982a. Air Quality Criteria for
Particulate Matter and Sulfur Oxides,
Volume I. EPA600/8-82-029a, U.S.
Environmental Protection Agency,
Research Triangle Park, NC 27722.
Environmental Protection Agency,
1982b. Air Quality Criteria for
Particulate Matter and Sulfur Oxides,
Volume II. EPA-600/8-82-029b, U.S.
Environmental Protection Agency,
Research Triangle Park, NC 27722.
Environmental Protection Agency,
1986. Guideline on Air Quality Models
(Revised). EPA-450/2-78-027R, U.S.
Environmental Protection Agency.
Fox, D.G., 1981. Judging Air Quality
Model Performance. Bulletin of the
American Meteorological Society 62, 599-
609.
Hanna, S.R., 1986. Spectra of
Concentration Fluctuations: The Two
Time Scales of a Meandering Plume.
Atmospheric Environment 20, 1131-1137.
Hayes, S.R. and Moore, G.E., 1986.
Air Quality Model Performance: A
Comparative Analysis of 14 Model
Evaluation Studies. Atmospheric
Environment 10, 1897-1911.
Hillyer, M.J. and Burton, C.S.,
1980. The ExEx Methods: Incorporating
Variability in Sulfur Dioxide Emissions
Into Power Plant Impact Assessment.
Prepared for the U.S. Environmental
Protection Agency, Research Triangle
Park, NC.
Hilst, G.R., 1978. Plume Model
Validation. EA-917-SY, the Electric
Power Research Institute, Palo Alto, CA.
Moore, G.E. and Liu, M.K., 1987.
Time Series and Spectral Analyses of
Concentration and Meteorological Data
from the Kincaid and Bull Run Power
Plants. sySAPP-87/014, Prepared for
the Electric Power Research Institute,
Palo Alto, CA.
Murray, D.R. and Bowne, N.E., 1988.
Urban Power Plant Plume Studies. EA-
5468, the Electric Power Research
Institute, Palo Alto, CA.
Smith, M.E., 1984. Review of the
Attributes and Performance of Ten Rural
Diffusion Models. Bulletin of the
American Meteorological Society 65.
Taylor, J.A., Jakeman, A.J. and
Simpson, R.W., 1986. Modeling
Distributions of Air Pollutant
Concentrations - I. Identification of
Statistical Models. Atmospheric
Environment 20, 1781-1789.
Thrall, A.D., Stoeckenius, T.E. and
Burton, C.S., 1985. A Method for
Calculating Dispersion Modeling
Uncertainty Applied to the Regulation of
an Emission Source. Prepared for the
U.S. Environmental Protection Agency,
Research Triangle Park, NC.
Turner, D.B. and Irwin, J.S., 1982.
Extreme Value Statistics Related to
Performance of a Standard Air Quality
Simulation Model Using Data at Seven
Power Plant Sites. Atmospheric
Environment 16, 1907-1914.
Venkatram, A., 1979. A Note on the
Measurement and Modeling of Pollutant
Concentrations Associated with Point
Sources. Boundary-Layer Meteorology 17.
Wilson, D.J., Robins, A.G. and
Fackrell, J.E., 1985. Intermittency and
Conditionally-Averaged Concentration
Fluctuation Statistics in Plumes.
Atmospheric Environment 19, 1053-1064.
-------
MODELING POLLUTANT PLUMES
Robert W. Jernigan
The American University and
Statistical Policy Branch, EPA, Washington, DC
The previous papers and presentations by
Anthony D. Thrall and Ralph E. Folsom show two
different approaches to the modeling of
pollutant plumes. Thrall is mainly concerned
with the modeling of isolated air pollution
point sources, such as coal fired power plants.
As such the methodology that he considers
involves models of the dynamic behavior of plume
development and dispersion. These models are
used to evaluate the effect of various control
methods that are considered to meet the
standards of the Clean Air Act Amendments of
1970 and 1977. There is little to comment on in
this paper. I have seen the details of some of
Thrall's related work where he has been involved
in developing statistical summaries of model
performance and in the calibration problem of
matching model calculations to observed
monitoring data. Using a measure of the
probability of misclassification derived from
simulated data, he has developed quantitative
procedures for the validation of such models.
Folsom gives us more to examine, so my
comments will center on his paper. Folsom has
been developing methodologies for the modeling
of groundwater pollution plumes. His modeling
efforts rely on a certain stability or
stationarity in the pollutant plume. His
procedures are used to aid in the design of
well-site sampling plans. The use of indirect
measurements of contamination obtained from
electromagnetics and resistivity devices is the
crucial element in his mapping of plumes through
response surface methods. The use of such a
covariate for the contaminant provides a useful
and relatively inexpensive view of the pollutant
plume.
Folsom uses a class of low order polynomial
spatial regression models along with this
covariate to predict pollutant concentrations.
He does not, however, incorporate any measure of
spatial autocorrelation in this estimation.
Unfortunately, inclusion of a covariate does not
necessarily explain all of the autocorrelation
present. In fact one would see substantial
reduction in spatial autocorrelation only if the
relationship between the measurement and its
covariate was not affected by spatial
properties. For example it might happen that a
measurement and its covariate are more highly
correlated in areas of high rather than low
pollution. An indication of such a behavior can
be seen in the scatterplots of EM and
resistivity measurements in Folsom et al. (1985).
There, a few high side outliers often behave
quite differently from the bulk of the data.
Of course, the technique of co-kriging
could incorporate both the information on the
covariate and the spatial autocorrelation to
model the contaminant plume. But it is easy to
incorporate spatial autocorrelations and the
essence of this kriging methodology into his
response surface techniques without resorting to
specialized kriging software. What is needed is
a preliminary estimate of the spatial
autocorrelations. These will be used in a
generalized least squares algorithm to estimate
the contaminant. This technique will yield
predictions of measurements of the contaminant
plume whose spatial autocorrelations can be
calculated. These improved estimates of the
spatial autocorrelations can then be used with
generalized least squares to re-estimate the
pollutant concentrations. As such an iterative
technique continues it incorporates spatial
effects directly, and R² values in excess of the
63% obtained by Folsom could be possible.
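A minimal sketch of the iteration described above, assuming hypothetical
inputs X (design matrix), y (measurements) and coords (site coordinates),
and assuming an exponential residual correlation model whose range is
re-fitted crudely from binned residual products at each pass; it is meant
only to show the alternating structure, not to substitute for co-kriging.

    import numpy as np

    def fit_range(resid, d, bins=(0, 50, 100, 200, 400)):
        """Crude moment estimate of an exponential correlation range from
        residual products grouped into distance bins (distance units are
        whatever the coordinates use)."""
        s2 = resid.var()
        prods = np.outer(resid, resid)
        estimates = []
        for lo, hi in zip(bins[:-1], bins[1:]):
            mask = (d > lo) & (d <= hi)
            if mask.any():
                rho = prods[mask].mean() / s2
                if 0 < rho < 1:
                    mid = 0.5 * (lo + hi)
                    estimates.append(-mid / np.log(rho))
        return np.median(estimates) if estimates else np.inf

    def iterated_gls(X, y, coords, n_iter=4):
        """Alternate between GLS coefficients and a re-estimated residual
        spatial correlation matrix."""
        d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1))
        beta = np.linalg.lstsq(X, y, rcond=None)[0]      # OLS start
        for _ in range(n_iter):
            resid = y - X @ beta
            a = fit_range(resid, d)                      # update correlation
            R = np.exp(-d / a) if np.isfinite(a) else np.eye(len(y))
            Rinv = np.linalg.inv(R + 1e-6 * np.eye(len(y)))
            beta = np.linalg.solve(X.T @ Rinv @ X, X.T @ Rinv @ y)
        return beta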
Folsom's use of a random coefficient
regression model is one approach to account for
the within site sampling variability as well as
the laboratory analysis variability. Kriging with
a nugget effect to smooth the response surface
is another way to model these sources of
variability.
In all such modeling problems we often
produce a map of point estimates. We must be
careful not to interpret this map as truth.
Each point on our map may be above or below the
true pollutant response surface. Our estimation
error may be quite small and still the map of
the true surface can appear different from that
of our estimated surface, especially in the
presence of spatial autocorrelation.
I applaud Folsom and his colleagues for
considering extensions to his models to
incorporate spatial autocorrelation. Perhaps by
combining the existing techniques of mapping
groundwater contaminants more efficient and
reliable procedures can be developed to model
pollutant plumes.
-------
ESTIMATING THE SPATIAL UNCERTAINTY OF INFERRED RATES OF DRY
ACIDIC DEPOSITION
Richard A. Livingston
Geology Dept., U. of Maryland, College Park, MD 20742
The acid deposition monitoring program
conducted by NAPAP concerns monitoring both wet
and dry deposition. Wet deposition or acid rain
monitoring has been in operation since 1982. The
dry deposition monitoring program is now getting
underway'". One of the major issues with moni-
toring dry deposition concerns the extrapolation
of the dry deposition rate measured at a moni-
toring site to the entire region of interest.
The problem is further complicated by the
fact that the measurement technique involved
gives only an indirect estimate of the actual
flux to the ground. Instead, an approach known
as the inferential method is used. This con-
sists of measuring concentrations of acidic
gases and particles in the atmosphere. From the
concentrations, the rate of deposition is in-
ferred using an algorithm that incorporates
several site-specific or time-dependent fac-
tors [2]. For example, the type of ground cover,
which varies spatially, is a factor in the
algorithm. An example of a time-dependent
factor would be windspeed. These factors are
measured at the actual monitoring site, but can
obviously assume different values as one moves
away from the site.
The inferential method employs a simple
formula for calculating dry deposition rates:
    F = Vd C                                   (1)
where F is the dry deposition flux; C is the
concentration of the acidic species in the
atmosphere, and Vd is a factor known as the
deposition velocity which is calculated from the
algorithm described above. The deposition
velocity can range from 0.01 to 1 cm/s while the
concentration of a gas such as sulfur dioxide is
in the range of 1 to 50 µg/m³.
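A small numerical sketch of equation (1), with the unit conversion made
explicit; the example values are hypothetical but lie in the ranges
quoted above.

    def dry_deposition_flux(vd_cm_per_s, conc_ug_per_m3):
        """Inferential dry deposition flux F = Vd * C (equation 1).

        Vd is given in cm/s and C in micrograms per cubic metre; the
        result is in micrograms per square metre per second.
        """
        return (vd_cm_per_s / 100.0) * conc_ug_per_m3

    # e.g. Vd = 0.5 cm/s and C = 20 ug/m3 give F = 0.1 ug m-2 s-1
    print(dry_deposition_flux(0.5, 20.0))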
The difficulty is that both terms on the
righthand side of Equation 1 can vary spatially.
Consequently, the dry deposition flux, F, can
also vary spatially. However, the problem can
be posed in such a way that effectively de-
couples the variability in C from the vari-
ability in Vd.
Differentiating equation 1 with respect to
some spatial coordinate, x, gives:

    dF/dx = Vd (dC/dx) + C (dVd/dx)            (2)

It is generally the case that at least in rural
areas, far from major point sources or urban
areas, the gradient of the atmospheric pollutant
concentration is very shallow, on the order of
10 µg/m³ over a scale of kilometers. On the other
hand, for reasons discussed below, the deposi-
tion velocity can change by an order of magni-
tude over a distance of a meter or less.
Thus, we can make the assumption that:

    dC/dx ≈ 0                                  (3)

and hence:

    dF/dx ≈ C (dVd/dx)                         (4)
In other words, on a local scale, the spatial
variability of the dry deposition is determined
primarily by the variations in the deposition
velocity since the air pollution concentration
can be regarded as constant.
While this reduces the complexity of the
problem, it also creates some awkwardness since
it implies that the major variability is not
with the variable that is being monitored. Thus
it is not possible to estimate the spatial un-
certainty of dry deposition simply from the data
on atmospheric concentrations. Instead one must
estimate the variability of the deposition velo-
city, which entails acquiring additional infor-
mation.
It is now necessary to consider in more de-
tail the factors affecting the deposition velo-
city. As noted above, type of ground cover is
an important factor. The deposition velocity
over bare rock will be different than that over
vegetation. In fact, there are further sub-
divisions under each category. For example, a
bare limestone will react more readily with
acidic gases than bare granite.
With vegetative ground cover, there are ad-
ditional considerations. The leaves of different
species of plants have different types of sur-
face chemistry and pores, or stomata, for uptake
of acidic gases. Thus, the deposition velocity
on a pine tree will differ from that on an oak
tree. Aside from the surface chemistry, the
physical structure of the plant is also impor-
tant in determining the deposition velocity.
The spatial pattern of the leaves and branches
affects the wind velocities and hence the depo-
sition velocities in the forest canopy. For the
same reason, grasslands have a different deposi-
tion velocity than forested areas.
Not only is the type of ground cover impor-
tant, but also the spatial relationships of one
type of ground cover to another. An abrupt
change in the height of ground cover such as oc-
curs going from a grassy field to a wooded area
will create additional turbulence in air flow
patterns with a resulting higher deposition
velocity. Thus, a uniform groundcover will have
a deposition velocity different from that of an
area where patches of one type of ground cover
are intermingled with another.
Also affecting the wind field and thus the
deposition velocity is the topography. A hilly
area will have a much greater variance in wind-
speeds and directions than a flat area. The
orientation to prevailing winds and to solar
radiation can influence deposition velocities as
well because deposition velocities are influ-
enced by surface temperature and wetness.
Thus the problem of characterizing the spa-
tial variability of the dry deposition comes
down to being able to estimate the spatial var-
iability of all these factors that determine the
deposition velocity. To do this, it is neces-
sary to assemble data and to use statistical
methods from widely diverse disciplines.
The mapping of vegetation types is a long es-
tablished part of plant ecological studies. A
variety of data bases already exist at Federal,
state or local levels on land use and on the
types of vegetation, at least for selected
areas. One example is the Geoecology data base
established at Oak Ridge National Laboratory.
It remains to be seen whether these existing
data bases have fine enough spatial resolution
to be applied here. There is also the question
of whether the sampling procedures were consis-
tent enough among the different data bases to
permit pooling of the data. The possibility
also exists of creating new data bases through
interpretation of aerial photography.
Ecologists have developed a number of statis-
tical techniques for dealing with spatial pat-
terns of plant species. These include indices of
species diversity, species abundance and patchi-
ness [3]. As yet, these measures have not been
applied to the specific problem of dry deposi-
tion.
Similarly, there is an abundance of topo-
graphical maps on different scales. However,
these generally do not exist in digitized form.
Geomorphologists have developed their own set of
statistical measures to characterize the physi-
cal landscape. Some relevant indices of drain-
age basin morphometry include drainage density,
relative relief and ruggedness number [4].
In addition to making use of these existing
statistical tools, it will probably be necessary
to develop new methods. For example, fractal
analysis may prove very useful for character-
izing certain aspects of the landscape. For
instance, a forest canopy has many of the attri-
butes of a fractal surface.
Integrating all this information will no
doubt require the use of computers. Specialized
methods for dealing with spatial data have been
developed under the general heading of Geograph-
ic Information Systems. The rapidly growing
field of pattern recognition may also provide
some assistance.
Ultimately, once all the uncertainties in the
individual factors have been estimated, it will
then be necessary to make some summary state-
ments about the overall spatial uncertainty of
the inferred dry deposition rates. In essence,
the process outlined here consists of taking the
individual uncertainties in the underlying fac-
tors and propagating them through the algorithm
that calculates dry deposition. It should be
noted that this process yields only an estimate
of the precision of the inferred dry deposition.
The bias is impossible to determine absolutely
since there is no independent measurement tech-
nique that could be used as a reference method.
However, it may be possible to put some bounds
on the bias through the use of mass balance
estimates over a controlled watershed or
over a specific area of ground cover.
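One way to carry out the propagation described here is by Monte Carlo
simulation, as in the following sketch; the lognormal distributions,
medians and spreads assumed for Vd and C are purely illustrative and are
not taken from the monitoring program.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # Hypothetical lognormal uncertainties for the deposition velocity
    # (cm/s) and the SO2 concentration (ug/m3).
    vd = rng.lognormal(mean=np.log(0.5), sigma=0.5, size=n)
    conc = rng.lognormal(mean=np.log(20.0), sigma=0.3, size=n)

    flux = (vd / 100.0) * conc      # ug m-2 s-1, as in equation (1)

    print("median flux:", np.median(flux))
    print("90% interval:", np.percentile(flux, [5, 95]))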
Although it is not possible to state an ab-
solute value for the bias and hence the true
rate of dry deposition, the data from the
dry deposition network will still be useful for
policy-making. That is because the critical
policy questions being asked tend to be of a
relative nature, such as: How much greater is
the dry deposition rate over affected areas
compared to unpolluted background regions? Or:
Is dry deposition getting greater or lesser over
time? These questions thus often involve dif-
ferences in time or space rather than absolute
values. If the bias is constant, it will be
removed in calculating the differences.
At this point another statistical issue must
be dealt with. Given the variability in the
factors underlying the deposition velocity, it
is likely that the estimated precision will also
vary spatially. The characterization of varia-
tion over a region is a classic problem in geo-
statistics, and a number of methods have been
developed [5]. The most well known method is
kriging.
However, simple kriging is not useful for
this purpose since it provides no insight on the
actual uncertainty at any distance from the
actual measurement points. More advanced
techniques such as indicator or probability
kriging are now coming into use. These methods
can generate maps that show areas where some
minimum level of detection is being exceeded [6].
In conclusion, the task of estimating the
spatial uncertainty of dry deposition depends
heavily on the use of auxiliary information.
This information comes from a variety of disci-
plines, each with its own statistical tools.
Integrating the information from these diverse
sources and presenting it in a meaningful way
for policymakers will be a major challenge for
the statistical community.
References
1. National Acid Precipitation Assessment
Program (1987): Annual Report 1986. US
Government Printing Office, Washington DC,
pp. 75-88
2. Hicks, B., Baldocchi, D., Hosker, R., Hutchi-
son, B., Matt, D., McMillen, R., and Satter-
field, L. (1985): "On the Use of Monitored Air
Concentrations to Infer Dry Deposition",
NOAA Technical Manual ERL ARL-141. Air
Resources Laboratory, Silver Spring,
MD, 65 pgs.
3. Pielou, E. (1975): Ecological Diversity.
Wiley-Interscience, New York, 165 pgs.
4. Ritter, D. (1978): Process Geomorphology. Wm. C.
Brown Pubs., Dubuque, Iowa, pp. 154-176.
5. Journel, A. and Huijbregts, Ch. (1978): Mining
Geostatistics. Academic Press, London,
pp. 303-444.
6. Journel, A. (1984): Indicator Approach to Toxic
Chemical Sites. US EPA Environmental Moni-
toring Systems Laboratory, Las Vegas, NV,
30 pgs.
-------
ASSESSING EFFECTS ON FLUCTUATING POPULATIONS: TESTS AND DIAGNOSTICS
Allan Stewart-Oaten
Biology Dept., University of California, Santa Barbara, Ca., 93106
1. Introduction
A standard question in environmental impact
assessment is whether some human alteration has
caused a decrease in some environmental variable.
There are some obvious difficulties. An
apparent change over time could be due to
something else occurring at about the same time
as the alteration, or to a general trend over
time. A lower value of the variable near the
alteration than far away could be due to other
differences between the locations.
One way to deal with these problems is to
have observations both Before and After the
alteration, at sites both near to it (Impact) and
far away (Control). (These capitalized words will
refer to these fixed periods and sites hence-
forth.) The plan is to compare the difference
between Impact and Control sites Before with the
difference After, and to argue that a change in
this difference cannot be the result of either a
widespread time change or of a difference between
the sites, so is likely to indicate an effect of
the alteration.
There are some obvious difficulties with this
scheme too. One is that a change in the
difference could be due to some time-related
change other than the alteration, but occurring
at about the same time and affecting one site but
not the other. Another is that the difference may
fail to change because the alteration affects
both sites about equally. Thus the Control needs
to be close enough to the Impact site to be
affected similarly by natural changes, but far
enough away to be little affected by the
alteration. This important task is not the
subject of this paper, and we will assume that it
has been accomplished - while recognizing that
this assumption needs to be examined and can
never be guaranteed.
A more technical problem is that many
environmental variables fluctuate naturally over
time. This applies to physical or chemical
variables, but perhaps especially to the
densities of biological populations, which will
be our focus. Thus any observed difference
between sites can be due to any of three causes -
sampling error, natural fluctuations, and
intrinsic differences between the sites - of
which we wish to distinguish the third. This
cannot be done with only one Before sampling time
(or only one After, unless we make the risky
assumption that variability due to the other
causes is the same After as Before), so we assume
there are many sampling times in each period.
It is also difficult to do if the two sites
are sampled at different times. One or other site
is likely to have been sampled at times of
generally greater abundance, so that one must
either include covariates (frequently having
little idea which ones matter or how to model
them) or pretend that the sampling times were
chosen and assigned randomly and then the records
of them were lost. We will therefore assume that
the two sites are sampled simultaneously, or
nearly so.
We will thus have observations Xijkm, for i =
1 (Before) or 2 (After), j = 1, 2, ..., ni
(sampling times in period i), k = 1 (Impact) or 2
(Control), and m = 1, ..., rijk (replicates).
In this paper, we will assume rijk = 1. The
replicates give information on sampling variance
but not on fluctuation over time. If the rijk's
are equal and the sampling errors are iid Normal,
with known variance, the replicates can be re-
placed by their average (a sufficient statistic)
without loss of information. The same is true if
the sampling variance is unknown, but the time
fluctuations (or their differences between sites)
are also iid Normal. Neither of these conditions
will usually hold, so the replicates might show
us a better "summary" than the average, and might
contribute information useful for model building
or for estimating variances in which both
sampling error and time fluctuations play a part.
However, this approach seems likely to be quite
complex, and may not be worth the trouble if the
time fluctuations are large.
Environmental impact assessment is often cast
as a problem in statistical testing: the task is
described as one of detecting a change or a
violation of some threshold, and one of the most
frequent techniques invoked is Analysis of Vari-
ance, which is primarily a testing (rather than
an estimation) procedure. Section 2 below
discusses the testing problem, particularly the
assumptions for a standard two-sample test on the
Impact-Control differences, and ways to adjust
the data to satisfy them, if necessary. Some data
will be presented to illustrate the effectiveness
of these ways, and some of their shortcomings, in
Section 3. Section 4 briefly considers three
topics: gaps in the present testing scheme and
directions in which it needs to be extended;
weaknesses of testing as a goal, and an approach
to the more appropriate goal of estimation; and
some implications for the design of assessment
programs.
2. Testing
A natural model for the observations, Xijk,
is

    Xijk = mi + aij + bk + cik + eijk          (1.1)

where mi is the overall mean for the period, aij
is a time effect, bk is a site effect, cik is c
if i = 2 and k = 1 but is zero otherwise, and gives
the change in the site effect between periods,
presumably due to the alteration, and the eijk's
are iid errors. If we believed this model, and
wanted to estimate c, we would use two-sample
test statistics derived from the differences,
Dij = Xij1 - Xij2.
This procedure remains valid under weaker
assumptions than those in (1.1). These
assumptions cannot be guaranteed but can be
checked for plausibility against both biological
theory and the observations themselves. They are:
(1) The D1j's all have the same mean, M1.
That is, model (1.1) does not need a time-x-site
interaction term, dijk, so M1 = b1 - b2. If there is
no effect of the alteration, the D2j's also all
have this mean. If the alteration does have an
effect, the model says that the D2j's all have
mean M2 = b1 - b2 + c, but the test would not be
invalidated if c varied over time, though it may
well be inefficient.
(2) The errors, Eij = eij1 - eij2, are in-
dependent.
(3) The E1j's are identically distributed; in
particular, they have equal variances. The E2j's
also are identically distributed.
(4) The distribution of the E1j's is the same
as that of the E2j's.
(5) The Eij's are Normal.
These assumptions are not all essential, or
equally important.
The first seems the most important. If it
does not hold, at least approximately, then
neither "the" Before nor "the" After mean
difference, whose comparison is the aim of the
test, is well-defined.
In one sense, this assumption can be
guaranteed. The errors, eijk, will be the result
of both sampling error and the "random"
fluctuations over time of the actual populations.
The distinction between "fixed" and "random" time
effects seems necessarily arbitrary. Individual
births and deaths, and probably movements, of
individuals or schools, seem best regarded as
random. But the rates of these changes may be
altered by storms, upwellings or other physical
or chemical fluctuations. These may be to some
extent predictable and can sometimes be long-
lasting, which suggests fixed effects, but they
are irregular and hard to predict, so can be
argued to be random. Finally, all these sources
of change may depend on seasons, which seem
predictable enough to be called fixed - except
that their effects on the organisms seem to vary
in timing and amount from year to year.
It may thus be possible to argue that time
effects should all be classified as "random", In
which case they and time-x-site Interactions can
all be regarded as part of the error. Assumption
(1) is then formally Justified. But this Is only
a delaying tactic: any test results can then be
attributed to the effects of the particular
sampling times chosen, and the problem must be
dealt with again. It also reduces the
plausibility of assumption (2), since major
events like seasons or long-lasting weather
conditions (e.g. El Nino) will affect several
sampling times.
Another approach might be to try to list the
time-related effects to be called "fixed", and
explicitly allow for them in the model. But we
usually do not know which effects to include or
how to model them. The response of many species
to even the most obvious candidate, seasons, is
not consistent, especially in timing, and the
shape of the response can rarely be modelled by a
simple function like a sine wave.
These problems vanish if large or long-
lasting time-related effects are approximately
the same at the two sites, so they cancel in the
differences: that is, if these time-related
effects are approximately additive with respect
to site.
The Idea 1s that there Is a "region",
containing Impact and Control, over which major
time-related changes, such as seasons or major
weather conditions, are imposed approximately
uniformly. The problem is that these uniform
changes are imposed on locations that differ, so
they may not have uniform effects. To these
effects are added the fixed effects of the
locations themselves, and local fluctuations:
local time-related changes such as minor weather
effects or water movements. Together these
comprise the external forces influencing the path
of the population density over time. The actual
population is determined by "random" events like
individual deaths, births or migratory movements,
but these stochastic events occur at rates
governed by the external forces together with
"internal" forces resulting from the size of the
population itself, and of other interacting
populations, in the recent past.
Treating the uniform, major changes as given,
the path traced out by the actual population
density over time at a given site can be seen as
a stochastic process whose mean function is
determined by the uniform changes, and whose
deviations from this function are due to the
local or internal forces, or to the chance events
whose probabilities are determined by all these
forces. The assumption that the Impact-Control
differences have equal means over time requires
that the mean function at Impact differ from that
at Control by a constant amount: the differences
between the two sites should not lead to
differences In the effects that the major time-
related changes have at these sites.
It is the mean functions that need to be
additive in this sense, not the actual
populations themselves, which incorporate the
deviations as well. For example, additivity
cannot be tested by taking replicate samples at
each time and place, and comparing the variation
among replicates with the variation due to
"interaction" between the two sites and the n1
sampling times. This tests non-additivity in the
actual population values (which we expect because
of the deviations) but not in the mean functions.
Even if we knew the actual populations at each
time, we would still need to look for non-
additivity in a 2-way table (times and sites)
with one observation per cell. The addition of
sampling error does not change this basic
situation: the only difference is that we must
use estimates of the actual population instead of
known values.
With only one observation per cell, we cannot
test for all possible interactions, so we choose
the most likely. This may depend on the situation
and the species, but in most cases we would
expect non-additivity to show up as a relation
between the difference between the two sites and
the overall abundance of the species. An obvious
measure of overall abundance is the average of
the two sites, so we are led to look for a
monotone relation between the difference and the
average. Two simple test statistics for this are
Spearman's rank correlation coefficient and the
slope of the regression of the differences
against the averages - which corresponds to
Tukey's (1949) non-additivity test for this situ-
ation.
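A minimal sketch of these two checks, assuming hypothetical arrays
impact and control of simultaneous Before observations; the Tukey-style
check is represented here simply by the p-value for the slope of the
differences regressed on the averages.

    import numpy as np
    from scipy import stats

    def additivity_checks(impact, control):
        """Spearman correlation and regression-slope checks of the
        differences against the averages (a one-degree-of-freedom
        non-additivity check on the two-way layout)."""
        diff = np.asarray(impact) - np.asarray(control)
        avg = 0.5 * (np.asarray(impact) + np.asarray(control))
        rho, p_spearman = stats.spearmanr(avg, diff)
        slope, intercept, r, p_slope, se = stats.linregress(avg, diff)
        return {"spearman_p": p_spearman, "slope_p": p_slope}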
As these tests imply, we would expect time-
related changes to have larger effects at dense
sites than at sparse ones, for many species.
Many biologists would expect these changes to be
multiplicative, and would suggest using the logs
of the sample abundances. Though appealing, this
suggestion may be the result of misplaced
intuition (Stewart-Oaten 1986). Additive changes
in rates will lead to multiplicative changes in
density for exponentially increasing or declining
populations, but there is no obvious guarantee
that the changes in rates are additive, and most
biological populations are less likely to be
changing exponentially than to be at a rough
equilibrium, fluctuating in response to changes
in carrying capacity.
Rather than pre-judge this issue, a range of
transformations, including logs, can be
considered. Probably the best-known and most
convenient family is the one used by Box and Cox
(1964),

    Y(X) = {(X+c)^b - 1}/b                     (1.2)

which can be thought of as the shift-plus-power
transformation, essentially (X+c)^b when b ≠ 0 and
log(X+c) when b = 0. In this paper, b is restricted
to lie between -1 and 1. Larger values of b imply
that a given time-related input will change the un-
transformed abundance more at the sparser site
than at the denser. Smaller values imply that the
amount of habitat (volume of water or area of
land) required to support a single animal will
change more at the site where it is smaller. Both
seem unlikely.
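A sketch of the transformation family (1.2), written so that the b = 0
case returns log(X + c); the function name and the vectorized treatment
are illustrative choices, not part of the original analysis.

    import numpy as np

    def shift_power(x, b, c=0.0):
        """Box and Cox (1964) family Y(X) = {(X + c)**b - 1}/b, with the
        log(X + c) limit at b = 0; monotone increasing in X for any b."""
        x = np.asarray(x, dtype=float) + c
        if b == 0:
            return np.log(x)
        return (x ** b - 1.0) / b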
Assuming there are values of b and c for
which Y is additive, homoscedastic and Normal,
Box and Cox proposed estimating them by maximum
likelihood or Bayesian methods. There are some
difficulties with this approach. A technical one
is that, at least for some values of b, the
likelihood is not a well-behaved function of c
(Berry 1987). A conceptual one, first raised by
Tukey's discussion of Box and Cox (1964), is that
the method may emphasize Normality,
homoscedasticity and additivity in that order
(especially when there is only one observation
per cell), when the order of importance is
generally agreed to be the reverse. It is often
asserted that these properties tend to go
together (e.g. Atkinson 1985, p. 81), but this
claim seems to rest on a small set of non-
randomly chosen examples.
In this paper, we propose first assessing
additivity by the Tukey and Spearman tests, and
choosing a transformation from among those for
which these tests (and accompanying plots) do not
indicate significant non-additivity in the Before
period. (Non-additivity After might be an effect
of the alteration. However we may want to
consider it if the After observations are
different overall: for example, if the Control
area has clearly changed, implying the whole
region has, for some reason other than the
alteration. Our transformation choice method is
very like fitting a regression, and we face the
same risks when we extend the estimated
relationship well beyond the observed range.)
The null hypothesis for these tests is really
stronger than additivity: the differences and
averages will also be correlated if the Impact
and Control means are identical but the variances
differ. However, for population counts, this is
likely to lead us astray only if the variation in
the averages, due to uniform time-related
effects, is small compared to any difference
between Impact and Control in variation due to
local time-related effects and sampling error.
For most species, this seems far from the case.
In practice, we are likely to find that
either many transformations seem approximately
additive or none do. The latter case could arise
if Impact is denser than Control under some
conditions (e.g. when overall abundance is high)
and sparser under others. For such a species, the
alteration may also be beneficial under some
conditions but not others, and it may be best to
carry out separate analyses for the two sets of
conditions. The possibility of such a time-x-site
interaction that cannot be transformed away forms
part of the motivation for methods like those of
Section 4.
If many transformations are approximately
additive, we must choose on other grounds.
Equality of variances - the main aspect of our
assumption (3) - is widely agreed to be the next
most important goal, and there are reasons for
thinking this to be the case here.
Often this assumption is not crucial. If we
have additivity and independence, we have
observations Yij with distributions Fij, having
means mi. It is natural to compare this with the
homogeneous situation where Yij has distribution
Gi = ni⁻¹ Σj Fij. Stigler (1975) shows that any
linear function of the order statistics has the
same mean and at least as small a variance in the
heterogeneous situation as in the homogeneous
one. These results include most location
estimates and some scale estimates. They also
hold for the standard variance estimate. Thus a
t-like test, with a measure of discrepancy in the
numerator and an estimate of its standard
deviation in the denominator, should have
approximately the same mean but a smaller
variance than in the homogeneous case. Type 1
error probabilities will then be decreased and
Type 2 error probabilities will be increased for
alternatives near the null but decreased for
distant alternatives.
This comparison with the homogeneous case
ignores two points. One is that another
transformation, for which the corresponding Fij's
are more alike, may be more efficient - may have
more power to detect effects of the alteration.
This seems impossible to decide in general. If
there is an effect, then Impact and Control are
different either Before or After or both, so at
least one of the transformations is non-additive
in at least one period. The comparison would also
depend on which tests were used, and on efforts
to allow for heterogeneity by weighting the
observations (Box and Hill 1974).
Perhaps the main problem is that the
alteration may have different effects at
different times. For example, an alteration that
warms part of a body of water might be producing
a temperature that the animals prefer in Winter
but dislike in Summer. If Impact and Control are
approximately equally abundant in the Before
period, differences and averages will appear
unrelated for almost any transformation. But it
still may be that the size of a time-related
effect depends on the size of the local
population it is operating on. If populations are
smaller in Winter, when the alteration has a
positive (or large) effect, than in Summer, when
the effect is negative (or small), the results of
tests and estimates may depend strongly on the
transformation choice.
This is not a simple issue. In a case like
this, the results will also depend on which
season is better represented in the sampling
times chosen. Also, it is not clear how "the"
effect should be assessed even if we know the
pattern: does the positive effect in Winter, when
the local population's survival or genetic
structure may be threatened, outweigh the
negative effect in Summer, when the population's
interactions with its environment are at their
peak?
A workable, though imperfect, solution may be
to carry out three analyses, one for all the
data, another for Summer (or "abundant" periods
as determined by, say, the Control value), and
the third for Winter. For each analysis, the
choice of transformation (among those that appear
to be additive) could be determined partly by
whether absolute deviations from the (supposedly)
common mean tend to increase, decrease or stay
the same as overall abundance (measured by the
average of Impact and Control) increases.
Different variances for the Before and After
differences (assumption 4) do not seem to affect
the validity of t-tests much (Posten 1982), if
the sample sizes are nearly equal. If they are
not, the Welch-Satterthwaite modification seems
almost always to give a conservative test. Non-
Normality also seems to have little effect on the
validity of the t-test (e.g. Posten 1978). The t-
test is not efficient for many non-Normal distri-
butions, but this problem seems best approached
by using more robust tests along with it.
The results on which these comments are based
are for symmetric situations: either the errors
have symmetric distributions, or the Before and
After errors have the same distribution (and
sample size). One-sided one-sample t-tests in-
volving skewed distributions may have actual
levels far from the nominal ones (Posten 1979),
though this seems to occur only for small samples
or extreme skewness. This suggests, though, that
two-sample tests might go astray if the Before
and After difference distributions are skewed and
not the same shape. In such a case, also, the
various robust tests we might use in concert with
the t-test - e.g. the Wilcoxon-Mann-Whitney test,
or the t-like test based on the 25% trimmed mean,
which will be used here - will be testing
different things: the true Before and After
means, the true trimmed means, and the true
probability that a random Before difference will
exceed a random After difference may not all
agree on whether Before differences are "larger"
than After differences.
Tests and plots for symmetry and for
comparing the shapes of the distributions of the
Before and After differences will thus be useful,
partly for deciding whether a particular
transformation is "acceptable" but more for
indicating what to do if tests for different
acceptable transformations disagree. There are
many tests for symmetry, most of which are too
prone to reject symmetry for symmetric
distributions with long tails. Gupta's (1967)
test does not seem to do this (Antille, Kersting
and Zucchini 1982). A very crude measure of
agreement between the shapes of the distributions
of the Before and After differences (independent
of differences of location or scale) is obtained
by standardizing each observation (subtracting
the mean and dividing by the standard deviation)
and calculating the standardized Smirnov
statistic, {n1n2/(n1+n2)}^(1/2) D, where D is the
maximum difference between the cdf's (Lehmann
1975, p. 37). If this measure is under about 1,
the shapes seem compatible (Lehmann, Table F).
(Because the standard deviation may be a poor
measure of scale for some distributions, I have
removed the top and bottom 10% of each distribu-
tion before calculating this scale measurement in
Section 3's examples. A criterion smaller than 1
might be desirable after all this matching.)
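The following sketch computes this standardized Smirnov measure for two
hypothetical difference samples; it standardizes with the full samples,
so it omits the 10% trimming of the scale estimate described above.

    import numpy as np

    def standardized_smirnov(before, after):
        """Standardize each sample (subtract its mean, divide by its
        standard deviation) and return {n1*n2/(n1+n2)}**0.5 times the
        maximum difference between the two empirical cdfs."""
        before = np.asarray(before, dtype=float)
        after = np.asarray(after, dtype=float)
        z1 = (before - before.mean()) / before.std(ddof=1)
        z2 = (after - after.mean()) / after.std(ddof=1)
        grid = np.sort(np.concatenate([z1, z2]))
        cdf1 = np.searchsorted(np.sort(z1), grid, side="right") / len(z1)
        cdf2 = np.searchsorted(np.sort(z2), grid, side="right") / len(z2)
        d = np.abs(cdf1 - cdf2).max()
        n1, n2 = len(z1), len(z2)
        return np.sqrt(n1 * n2 / (n1 + n2)) * d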
Assumption (2) remains. The differences form
a time series, and would be expected to be
serially correlated if the sampling times were
very close together. On the other hand, if we
have an additive transformation, the major time-
related effects may cancel when we take the
Impact-Control differences, and the minor, local
effects remaining may not be long-lasting. (A
more detailed argument of this kind appears in
Stewart-Oaten, Murdoch and Parker 1986 and in
Stewart-Oaten 1986.) There are many ways to test
for serial correlation. Two simple ones are the
von Neumann ratio (von Neumann 1941; Anderson
1970, Table 6.3), the predecessor of the Durbin-
Watson test, and the rank version of this test
(Bartels 1982).
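A sketch of both checks, assuming a hypothetical array of Impact-Control
differences ordered in time; note that published critical values (e.g.
Anderson 1970, Table 6.3; Bartels 1982) depend on the exact
normalization used, so the scaling here is illustrative.

    import numpy as np
    from scipy import stats

    def von_neumann_ratio(x):
        """Mean square successive difference divided by the variance;
        values near 2 are consistent with no serial correlation, small
        values suggest positive autocorrelation."""
        x = np.asarray(x, dtype=float)
        num = np.sum(np.diff(x) ** 2)
        den = np.sum((x - x.mean()) ** 2)
        return (num / (len(x) - 1.0)) / (den / len(x))

    def rank_von_neumann_ratio(x):
        """Bartels (1982) rank version: the same ratio on the ranks."""
        return von_neumann_ratio(stats.rankdata(x))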
If these tests indicate that serial
correlation has not been removed, we have a
difficult problem: a time series whose
observations are not (usually) equally spaced
and, whether they are or not, is likely to be
non-stationary because the magnitude and
frequency of disturbances, and the vulnerability
of the population, vary over time. The serious
modelling needed is beyond the scope of this
paper. In an example below (Species 3), I have
simply adjusted the observations to approximate
independence in a very ad hoc manner. (For cases
where the series is amenable to ARIMA modelling
or to ARMA modelling with a fixed seasonal compo-
nent, see Box and Tiao 1965 and 1975, and Reinsel
and Tiao 1987.)
Thus a scheme for testing for an effect of
the alteration is: transform the raw Xijk's to
additive Yijk's as indicated by the Tukey and
Spearman tests and check for a relation between
the averages (Ȳij.) and the absolute deviations
of the differences (Yij1 - Yij2) from their median.
In the examples below, this relation was examined
by computing the p-values for the slope of the
regression of the absolute deviations against the
averages and the Spearman test for correlation
between them. Check the differences for serial
correlation, and allow for it if present. Carry
out a two-sample test to see if the differences
have become smaller following the alteration.
Check whether the test result is likely to be
affected by skewness, using Gupta's test and a
measure of the discrepancy between the shapes of
the differences of the transformed Before and
After values. Also check whether the result may
be due to changes other than the alteration; one
check is to regress the Before differences
against time.
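A sketch of the two-sample testing step on hypothetical Before and After
difference arrays; the trimmed test uses SciPy's Yuen-type option
(available in recent versions), and the biweight-based test described
below is not included.

    import numpy as np
    from scipy import stats

    def two_sample_tests(d_before, d_after, trim=0.25):
        """Standard t-test, Wilcoxon-Mann-Whitney test and a trimmed-mean
        (Yuen-type) t-test applied to the Before and After
        Impact-Control differences."""
        t_p = stats.ttest_ind(d_before, d_after).pvalue
        w_p = stats.mannwhitneyu(d_before, d_after,
                                 alternative="two-sided").pvalue
        yuen_p = stats.ttest_ind(d_before, d_after, trim=trim).pvalue
        return {"t": t_p, "wilcoxon_mw": w_p, "trimmed_t": yuen_p}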
Because the t-test lacks robustness, it makes
sense to use robust tests as well. Similarly, it
makes sense to use a range of apparently
acceptable transformations. If the results
disagree, we need further thought as to which
ones most nearly describe the changes of concern.
If the results agree, we need to think about the
size of the effect and mechanisms. This will be
easier if the transformed variables are in
interpretable units: b = 1 (change in absolute
numbers), b = 0 (percentage change) and b = -1
(change in the amount of space needed per animal)
seem best.
3. Examples
The following examples all use real data from
an ongoing assessment project. However, I have
rescaled all numbers so that the maximum in each
data set is 95 and have interchanged "Impact" and
"Control" in some cases.
Three kinds of table appear. Data tables list
Impact ("I") and Control ("C") values by sampling time
(read down the columns first). Times were not
listed unless the independence checks indicated a
serial correlation problem or there were separate
analyses for different "seasons". They are given
in units of a year, beginning with January 1 of
the first year the species was sampled.
Preliminary checks give the power and
constant of the transformation, the p-values for
the Tukey and Spearman tests (differences vs.
averages), the slope and Spearman tests for a
relationship between the average and the absolute
deviation of the difference from the median
difference, the von Neumann test for serial
correlation and its rank version, Gupta's test
for symmetry, and the test for a trend with time.
"Shape" is the comparison of the shapes of the
distributions of the Before and After
differences. All tests are two-sided except those
for serial correlation. Negative "p-values"
indicate a negative slope or correlation of the
tested variable against the averages. All
transformations are monotone increasing: when b
is negative, the transformation is -(X+c)^b.
Four tests for an effect of the alteration
were used: the standard t-test, the Wilcoxon-
Mann-Whitney test, and the t-like tests based on
the 25% trimmed mean (Yuen and Dixon 1973) and on
the biweight estimate (Kafadar 1982; this paper
is missing the factors [9 MAD]^2 and [6 sbi]^2 in
the definitions of sbi and ST respectively; I
have used 4 instead of 6 in her equation (2), see
Kafadar 1983).
When a non-positive power was used, and the
constant was zero, all sampling times with a zero
at either Impact or Control were omitted from the
analysis. Sampling times with zeros at both sites
were omitted from all analyses, though they are
listed with the data.
In all plots, filled squares indicate Before
data and empty squares After data.
The data and analyses presented here do not
support any conclusions relevant to the actual
project. First, as already stated, Impact and
Control have sometimes been reversed. Second, the
analyses are all purely numerical. I have made no
use of anything known about the species involved,
or even given their names. In practice these
analyses should be used in concert with
information about the physiology and population
dynamics of the species concerned. Third, the
examples here have been selected from data on
hundreds of species, to illustrate statistical
problems: they are not typical.
TABLE 1: Data for Species 1. [Impact ("I") and Control ("C") values by sampling time, Before and After; the columns of the original table could not be recovered from the scan.]
Species 1. The data are in Table 1. The Before samples are spread over 5 years, and the After over 3. Table 2 shows that serial correlation is not a serious problem.
TABLE 2: Preliminary checks for Species 1, all times.

power  const  Tukey  Spear  vslope  vspear  von N  rank vN  shape  Gupta  trend
-1.00   0.00   0.00   0.28    0.00    0.00   0.88     0.89   0.91   0.32   0.82
-1.00   0.50   0.73   0.85    0.15    0.13   0.89     0.85   0.40   0.48   0.69
-1.00   1.00   0.58   0.81    0.82    0.93   0.71     0.85   0.71   0.42   0.89
-1.00   2.00   0.42   0.46    0.20    0.19   0.47     0.72   0.53   0.45   0.95
-0.50   0.20   0.63   0.89    0.14    0.14   0.88     0.90   0.31   0.41   0.73
-0.50   0.40   0.51   0.79    0.96    0.94   0.75     0.90   0.58   0.17   0.84
-0.50   0.50   0.47   0.75    0.72    0.76   0.69     0.85   0.71   0.23   0.89
-0.50   1.00   0.34   0.35    0.19    0.15   0.50     0.76   0.58   0.21   0.98
 0.00   0.00   0.58   0.88    0.75    0.61   0.74     0.86   0.68   0.20   0.83
 0.00   0.05   0.26   0.48    0.60    0.75   0.70     0.86   0.76   0.15   0.90
 0.00   0.10   0.24   0.43    0.27    0.26   0.64     0.85   0.71   0.13   0.91
 0.00   0.20   0.20   0.28    0.09    0.05   0.56     0.82   0.71   0.27   0.96
 0.00   0.40   0.16   0.24    0.03    0.01   0.48     0.72   0.71   0.31   0.98
 0.00   1.00   0.10   0.14    0.00    0.00   0.41     0.63   0.53   0.47   0.92
 1.00   0.00   0.01   0.12    0.00    0.00   0.80     0.64   0.89   0.23   0.87
TABLE 4: Preliminary checks for Species 1, abundant times.

power  const  Tukey  Spear  vslope  vspear  von N  rank vN  shape  Gupta  trend
-1.00   0.00   0.63   0.30    0.03    0.08   0.49     0.73   1.01   0.50   0.58
-1.00   0.50   0.77   0.49    0.09    0.09   0.58     0.69   0.62   0.39   0.58
-1.00   1.00   0.88   0.60    0.17    0.15   0.63     0.71   0.54   0.44   0.57
-1.00   2.00   0.95   1.00    0.34    0.62   0.68     0.66   0.54   0.44   0.56
-0.50   0.20   0.98   0.73    0.24    0.20   0.66     0.70   0.54   0.50   0.56
-0.50   0.40   0.97   0.76    0.30    0.28   0.67     0.70   0.54   0.50   0.56
-0.50   0.50   0.94   0.92    0.33    0.47   0.68     0.66   0.54   0.50   0.56
-0.50   1.00   0.84   0.94    0.49    0.65   0.70     0.66   0.35   0.44   0.55
 0.00   0.00   0.64   0.50    0.92    0.95   0.74     0.73   0.43   0.50   0.56
 0.00   0.05   0.63   0.50    0.95    0.95   0.74     0.73   0.35   0.50   0.56
 0.00   0.10   0.62   0.45    0.98    0.91   0.74     0.73   0.35   0.50   0.56
 0.00   0.20   0.60   0.39    0.96    0.76   0.75     0.73   0.35   0.50   0.56
 0.00   0.40   0.56   0.39    0.86    0.50   0.75     0.73   0.35   0.50   0.56
 0.00   1.00   0.48   0.33    0.63    0.39   0.76     0.75   0.35   0.50   0.57
 1.00   0.00   0.16   0.45    0.01    0.00   0.85     0.83   0.70   0.33   0.87
However, no positive power gives samples that appear either additive or homoscedastic. (I omitted from the table many of the transformations that failed these tests. The results for positive powers given here are typical of the others.) Once we restrict to the transformations that seem acceptable by the preliminary checks, the two-sample tests all indicate a significant change (Table 3).
TABLE 3: Species 1 tests, all times.

power  const  t-test   Wilc  trim-t  biwt-t
-1.00   0.00   0.400  0.012   0.187   0.005
-1.00   0.50   0.037  0.003   0.012   0.001
-1.00   1.00   0.015  0.005   0.011   0.007
-1.00   2.00   0.011  0.015   0.024   0.029
-0.50   0.20   0.041  0.004   0.014   0.003
-0.50   0.40   0.022  0.008   0.010   0.010
-0.50   0.50   0.019  0.009   0.013   0.014
-0.50   1.00   0.017  0.023   0.034   0.041
 0.00   0.00   0.061  0.019   0.031   0.037
 0.00   0.05   0.055  0.028   0.043   0.059
 0.00   0.10   0.051  0.027   0.054   0.071
 0.00   0.20   0.051  0.033   0.065   0.092
 0.00   0.40   0.058  0.050   0.070   0.124
 0.00   1.00   0.083  0.052   0.094   0.197
 1.00   0.00   0.806  0.065   0.215   0.174
While this seems convincing, we may want to look deeper because the largest Impact-Control differences actually occur in the After period. While these are at times of high overall abundance, we might wonder whether effects at "abundant" times are different from those at "sparse" times. Accordingly, I divided the sampling times into those for which the Control
was above its median of 0.867, and those below. (Table 1's values are rounded. The second sampling time in the Before period was "abundant", while the first After time was "sparse".)
Table 4 shows that the transformations that appear acceptable overall do so for abundant times too, with similar test results (Table 5). (The trim-t values might be affected by the small samples.) Table 6, however, shows few transformations as acceptable on all counts for the sparse times: some promising ones give mildly skewed Before distributions. Test p-values are larger than those conventionally required for "significance" (Table 7), though they also suggest a reduction due to the alteration. The t-test here is affected by the long upper tail of the After differences.
TABLE 5: Species 1 tests, abundant times.

power  const  t-test   Wilc  trim-t  biwt-t
-1.00   0.00   0.005  0.004   0.113   0.016
-1.00   0.50   0.005  0.006   0.129   0.018
-1.00   1.00   0.006  0.007   0.124   0.021
-1.00   2.00   0.009  0.016   0.096   0.028
-0.50   0.20   0.008  0.014   0.136   0.027
-0.50   0.40   0.010  0.016   0.124   0.030
-0.50   0.50   0.011  0.016   0.119   0.032
-0.50   1.00   0.015  0.030   0.104   0.041
 0.00   0.00   0.047  0.052   0.135   0.083
 0.00   0.05   0.049  0.058   0.136   0.086
 0.00   0.10   0.051  0.058   0.137   0.089
 0.00   0.20   0.056  0.058   0.139   0.095
 0.00   0.40   0.064  0.064   0.154   0.107
 0.00   1.00   0.089  0.064   0.192   0.138
 1.00   0.00   0.789  0.216   0.451   0.047
TABLE 6: Preliminary checks for Species 1, sparse times.

power  const  Tukey  Spear  vslope  vspear  von N  rank vN  shape  Gupta  trend
-1.00   0.00   0.01   0.64    0.02    0.06   0.85     0.94   0.43   0.29   0.97
-1.00   0.50   0.28   0.31    0.00    0.00   0.87     0.92   0.60   0.10   0.29
-1.00   1.00   0.11   0.07    0.39    0.32   0.84     0.80   0.60   0.23   0.28
-1.00   2.00   0.03   0.05    0.79    0.93   0.79     0.77   0.44   0.43   0.29
-0.50   0.20   0.35   0.44    0.62    0.49   0.87     0.92   0.64   0.05   0.31
-0.50   0.40   0.19   0.15    0.24    0.15   0.86     0.86   0.60   0.14   0.28
-0.50   0.50   0.15   0.13    0.52    0.70   0.85     0.79   0.60   0.20   0.28
-0.50   1.00   0.05   0.05    0.63    0.82   0.81     0.77   0.56   0.31   0.29
 0.00   0.00   0.79   0.74    0.84    0.56   0.88     0.85   0.71   0.09   0.31
 0.00   0.05   0.26   0.42    0.09    0.04   0.87     0.87   0.60   0.03   0.35
 0.00   0.10   0.21   0.21    0.18    0.13   0.86     0.85   0.73   0.04   0.31
 0.00   0.20   0.14   0.13    0.35    0.48   0.85     0.79   0.60   0.12   0.29
 0.00   0.40   0.07   0.06    0.59    0.83   0.82     0.82   0.60   0.31   0.28
 0.00   1.00   0.02   0.03    0.99    0.71   0.77     0.74   0.56   0.43   0.30
 1.00   0.00   0.00   0.06    0.46    0.36   0.63     0.66   0.56   0.35   0.34
TABLE 7: Species 1 tests, sparse times.

power  const  t-test   Wilc  trim-t  biwt-t
-1.00   0.00   0.441  0.198   0.491   0.495
-1.00   0.50   0.308  0.130   0.170   0.168
-1.00   1.00   0.323  0.167   0.181   0.172
-1.00   2.00   0.351  0.167   0.206   0.200
-0.50   0.20   0.314  0.119   0.175   0.183
-0.50   0.40   0.317  0.154   0.174   0.174
-0.50   0.50   0.321  0.167   0.178   0.175
-0.50   1.00   0.341  0.167   0.201   0.192
 0.00   0.00   0.309  0.107   0.208   0.210
 0.00   0.05   0.332  0.130   0.187   0.202
 0.00   0.10   0.328  0.141   0.179   0.190
 0.00   0.20   0.330  0.141   0.186   0.187
 0.00   0.40   0.341  0.167   0.203   0.193
 0.00   1.00   0.367  0.181   0.218   0.217
 1.00   0.00   0.438  0.154   0.277   0.231
Perhaps the best transformation overall is ln(X+.2), partly because it is easier to interpret: there is not much difference among the three or four best. Fig. 1 shows Before and After symmetry plots for this transformation, using all sampling times. This plots Y(U_i) - M against M - Y(L_i), where Y(U_i) and Y(L_i) are, respectively, the ith values above and below the median, and M is a "middle" value, which I took as the point nearest to the 10% trimmed mean in the interval between the (n/2-1) and (n/2+2) (or the (n-1)/2 and (n+3)/2 if n is odd) order statistics. (The median is more commonly used, but its position within the interval formed by the inmost three or four order statistics can affect the appearance of the plot severely.) Except for one Before outlier, the two distributions seem similar in shape and not severely asymmetric. Thus different location estimates should lead to similar tests and estimates of change.
FIG. 1: Symmetry plot for Species 1.
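A minimal sketch of such a symmetry plot (simplified in that the "middle" value is taken as the sample median rather than the trimmed-mean variant used above) is:

    import numpy as np
    import matplotlib.pyplot as plt

    def symmetry_plot(y, ax=None, **kwargs):
        """Plot Y(U_i) - M against M - Y(L_i) for a single sample."""
        y = np.sort(np.asarray(y, dtype=float))
        n, m = len(y), np.median(y)
        k = n // 2
        upper = y[-k:][::-1] - m      # i-th value above the middle
        lower = m - y[:k]             # i-th value below the middle
        ax = ax or plt.gca()
        ax.plot(lower, upper, "s", **kwargs)
        lim = max(lower.max(), upper.max())
        ax.plot([0, lim], [0, lim], "k--", lw=0.5)   # 45-degree symmetry reference
        ax.set_xlabel("M - Y(L_i)")
        ax.set_ylabel("Y(U_i) - M")
        return ax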
Species 2. Table 8 gives the data, and Table 9 shows preliminary checks on some of the transformations. (The four After times with zeros are omitted when the power is non-positive and the constant is zero.) Although some transformations seem satisfactory, there are standard transformations that do not: X, the square root of X, and ln(X+1) all seem unsuitable.
Results of the two-sample t-test for an effect of the alteration vary from about .06 to .74; other tests are less erratic, but still far from constant (Table 10). However, transformations satisfying the various criteria
TABLE 8: Data for Species 2. [Impact ("I") and Control ("C") values by sampling time, Before and After; the columns of the original table could not be recovered from the scan.]
are more consistent. The (power, constant) pairs (-1, 1), (-.75, .5 or 1), (-.5, .2 or .5), (-.25, .05 to .2) and (0, 0 to .05) do so here; their t-test p-values vary from .07 to .15, but Wilcoxon ranges from .12 to .17 and biweight from .14 to .2 (except for (0,0), which omits four After observations). The trim-t results are consistently higher. This may be due to the inner half of the After data being more "stretched out" than would be expected from a Normal distribution. All tests seem similarly affected by the After distribution's skewness. Fig. 2 is a symmetry plot for ln(X+.01).

FIG. 2: Symmetry plot for Species 2.
TABLE 9: Preliminary checks for Species 2, all times.

power  const  Tukey  Spear  vslope  vspear  von N  rank vN  shape  Gupta  trend
-1.00   0.50   0.81   0.85   -0.08   -0.11   0.63     0.71   1.32   0.30   0.19
-1.00   1.00  -0.84   0.97   -0.34   -0.81   0.71     0.84   1.04   0.31   0.27
-1.00   3.00  -0.24  -0.38    0.01    0.00   0.84     0.85   0.96   0.39   0.70
-0.75   0.20   0.62   0.94   -0.03   -0.03   0.54     0.65   1.08   0.34   0.20
-0.75   0.50  -0.99   0.97   -0.19   -0.54   0.66     0.80   1.00   0.38   0.23
-0.75   1.00  -0.60  -0.71   -0.95    0.32   0.74     0.91   0.96   0.32   0.37
-0.75   3.00  -0.13  -0.34    0.00    0.00   0.90     0.87   0.68   0.47   0.89
-0.50   0.10   0.61  -0.90   -0.03   -0.06   0.51     0.70   0.80   0.31   0.24
-0.50   0.20   0.88  -0.98   -0.11   -0.37   0.60     0.74   0.84   0.37   0.24
-0.50   0.50  -0.66  -0.79   -0.83    0.53   0.71     0.87   0.89   0.39   0.34
-0.50   1.00  -0.32  -0.36    0.09    0.01   0.81     0.88   0.96   0.36   0.56
-0.25   0.05   0.79  -0.92   -0.09   -0.43   0.54     0.68   0.56   0.43   0.31
-0.25   0.10  -0.95  -0.92   -0.31   -0.80   0.62     0.68   0.68   0.43   0.32
-0.25   0.20  -0.64  -0.82    0.97    0.51   0.71     0.79   0.77   0.43   0.37
-0.25   0.50  -0.28  -0.35    0.03    0.01   0.83     0.89   0.96   0.39   0.59
 0.00   0.00   0.93  -0.77   -0.39   -0.88   0.55     0.79   0.41   0.45   0.46
 0.00   0.01  -0.78  -0.76    0.96    0.53   0.68     0.78   0.44   0.40   0.49
 0.00   0.02  -0.64  -0.71    0.59    0.34   0.73     0.78   0.50   0.38   0.51
 0.00   0.05  -0.44  -0.66    0.14    0.08   0.80     0.81   0.56   0.41   0.56
 0.00   1.00  -0.06  -0.43    0.00    0.00   0.97     0.92   0.64   0.43   0.85
 0.50   0.00  -0.05  -0.47    0.00    0.00   0.98     0.92   0.49   0.41   0.68
 1.00   0.00  -0.06  -0.40    0.00    0.00   0.97     0.84   1.28   0.32   0.58
Overall abundances vary greatly with time, suggesting a look at "abundant" and "sparse" periods. In this case, these seem to correspond quite well to "seasons": a "Winter" from 0 to .43 seems to contain most of the large values and few of the small. This gives 11 observations in Winter for both Before and After, and 25 Before and 15 After in Summer.
Table 11 gives preliminary checks for Winter. Most transformations with negative powers are acceptable by most of our standards. However, most lead to markedly asymmetric distributions. The positive powers give transformations for which the absolute deviations seem to increase with overall abundance. If this means that effects of the alteration when abundance is high are more influential than those when it is low, this may not be undesirable from some points of view. As it happens, the choice here is not important: all transformations give essentially the same result (Table 12), no significant change.
TABLE 11: Preliminary checks for Species 2, Winter.

power  const  Tukey  Spear  vslope  vspear  von N  rank vN  shape  Gupta  trend
-1.00   1.00   0.70   0.83    0.77   -0.87   0.84     0.66   0.71   0.01   0.34
-1.00   3.00  -0.77  -0.59    0.24    0.45   0.95     0.96   0.47   0.36   0.29
-1.00  10.00  -0.52  -0.35    0.20    0.29   0.97     0.92   0.71   0.43   0.25
-0.75   1.00  -0.99  -0.59    0.43    0.61   0.91     0.96   0.71   0.03   0.30
-0.75   3.00  -0.66  -0.28    0.16    0.45   0.96     0.91   0.47   0.43   0.27
-0.75  10.00  -0.51  -0.40    0.14    0.14   0.97     0.95   0.47   0.36   0.24
-0.50   0.50  -0.88  -0.40    0.27    0.43   0.93     0.91   0.47   0.18   0.27
-0.50   3.00  -0.58  -0.35    0.14    0.24   0.97     0.92   0.47   0.50   0.25
-0.50  10.00  -0.50  -0.38    0.09    0.06   0.96     0.94   0.47   0.50   0.24
-0.25   0.00  -0.79  -0.48    0.14    0.28   0.94     0.92   0.47   0.23   0.25
-0.25   0.01  -0.79  -0.48    0.13    0.28   0.94     0.92   0.47   0.23   0.25
-0.25   0.05  -0.77  -0.48    0.12    0.28   0.95     0.92   0.47   0.23   0.25
-0.25   0.20  -0.73  -0.48    0.10    0.22   0.95     0.92   0.47   0.36   0.25
 0.00   0.00  -0.61  -0.38    0.06    0.05   0.96     0.94   0.47   0.50   0.24
 0.00   0.01  -0.61  -0.38    0.06    0.05   0.96     0.94   0.47   0.50   0.24
 0.00   0.05  -0.61  -0.38    0.06    0.05   0.96     0.94   0.47   0.50   0.24
 0.50   0.00  -0.52  -0.38    0.01    0.09   0.95     0.94   0.47   0.36   0.24
 1.00   0.00  -0.52  -0.32    0.00    0.05   0.91     0.90   0.47   0.50   0.26
TABLE 10: Species 2 tests, all times.

power  const  t-test   Wilc  trim-t  biwt-t
-1.00   0.50   0.060  0.154   0.258   0.171
-1.00   1.00   0.076  0.134   0.275   0.141
-1.00   3.00   0.181  0.194   0.262   0.262
-0.75   0.20   0.065  0.125   0.210   0.192
-0.75   0.50   0.069  0.171   0.303   0.140
-0.75   1.00   0.098  0.119   0.297   0.170
-0.75   3.00   0.236  0.292   0.263   0.282
-0.50   0.10   0.075  0.128   0.201   0.158
-0.50   0.20   0.072  0.175   0.291   0.150
-0.50   0.50   0.092  0.134   0.288   0.163
-0.50   1.00   0.142  0.157   0.280   0.211
-0.25   0.05   0.089  0.154   0.222   0.151
-0.25   0.10   0.089  0.171   0.291   0.163
-0.25   0.20   0.102  0.157   0.308   0.173
-0.25   0.50   0.151  0.168   0.290   0.204
 0.00   0.00   0.099  0.132   0.174   0.106
 0.00   0.01   0.130  0.154   0.208   0.165
 0.00   0.02   0.135  0.150   0.254   0.183
 0.00   0.05   0.147  0.168   0.284   0.195
 0.00   1.00   0.366  0.283   0.257   0.219
 0.50   0.00   0.565  0.259   0.240   0.194
 1.00   0.00   0.736  0.273   0.203   0.185
TABLE 12: Species 2 tests, Winter.

power  const  t-test   Wilc  trim-t  biwt-t
-1.00   1.00   0.534  0.693   0.671   0.645
-1.00   3.00   0.605  0.669   0.678   0.664
-1.00  10.00   0.727  0.738   0.667   0.722
-0.75   1.00   0.583  0.669   0.676   0.654
-0.75   3.00   0.652  0.738   0.672   0.684
-0.75  10.00   0.749  0.669   0.669   0.734
-0.50   0.50   0.621  0.716   0.672   0.673
-0.50   3.00   0.697  0.780   0.667   0.703
-0.50  10.00   0.767  0.693   0.675   0.745
-0.25   0.00   0.672  0.716   0.680   0.701
-0.25   0.01   0.672  0.716   0.679   0.701
-0.25   0.05   0.671  0.738   0.676   0.698
-0.25   0.20   0.673  0.738   0.670   0.696
 0.00   0.00   0.725  0.760   0.668   0.720
 0.00   0.01   0.725  0.760   0.668   0.720
 0.00   0.05   0.726  0.760   0.667   0.719
 0.50   0.00   0.779  0.780   0.671   0.760
 1.00   0.00   0.772  0.799   0.658   0.771
Table 13 gives the preliminary checks for Summer, and Table 14 the results. The acceptable transformations overlap considerably with those for all times together. Also, they give consistent results, except for the trimmed t: a Before-After change that is significant at a level around .05-.08. The cause of the variable trimmed-t results appears to be that the size of the added constant determines whether the inner half of the After data will be relatively near the median (as for a long-tailed distribution), giving low p-values, or stretched out as much as the outer half (like a uniform distribution, although somewhat skewed).
The transformation -(X+.2)^-.25 seems best overall (I am biased against having deviation size decrease with averages), though we may prefer ln(X+.01) as more interpretable, despite the increase in variation with the averages. Both seem to show no particular pattern in the plot of differences against season (which roughly corresponds to decreasing overall abundance), as shown for ln(X+.01) in Fig. 3. They give similar results: conventional non-significance overall, though near enough to borderline to warrant a closer look; clear non-significance in Winter, with even a hint of an increase at the Impact
TABLE 14: Species 2 tests, Summer.

power  const  t-test   Wilc  trim-t  biwt-t
-1.00   0.10   0.073  0.055   0.083   0.067
-1.00   0.50   0.057  0.069   0.176   0.079
-0.75   0.10   0.064  0.058   0.100   0.082
-0.75   0.50   0.062  0.065   0.185   0.085
-0.50   0.05   0.068  0.055   0.081   0.075
-0.50   0.20   0.058  0.073   0.165   0.083
-0.25   0.00   0.054  0.039   0.086   0.031
-0.25   0.01   0.077  0.052   0.038   0.043
-0.25   0.05   0.062  0.062   0.118   0.085
-0.25   0.20   0.065  0.069   0.183   0.091
 0.00   0.00   0.047  0.049   0.082   0.043
 0.00   0.01   0.065  0.052   0.115   0.075
 0.00   0.02   0.065  0.049   0.131   0.085
 0.00   0.05   0.067  0.069   0.160   0.093
 0.00   0.10   0.071  0.073   0.179   0.098
 0.50   0.00   0.102  0.065   0.150   0.133
 1.00   0.00   0.192  0.115   0.211   0.321
FIG. 3: Species 2 differences vs. season.
area, although the Before data are limited to a
13-week period; and a significant decrease (or
one near enough to be taken seriously, by most
standards) in Summer.
TABLE 13: Preliminary checks for Species 2, Summer.

power  const  Tukey  Spear  vslope  vspear  von N  rank vN  shape  Gupta  trend
-1.00   0.10   0.38   0.63   -0.22   -0.11   0.41     0.66   0.69   0.31   0.35
-1.00   0.50   0.41   0.28    0.72    0.30   0.63     0.64   0.93   0.36   0.20
-0.75   0.10   0.38   0.40   -0.34   -0.60   0.45     0.52   0.50   0.45   0.30
-0.75   0.50   0.41   0.22    0.48    0.12   0.66     0.67   0.93   0.29   0.20
-0.50   0.05   0.35   0.47   -0.28   -0.41   0.42     0.54   0.47   0.39   0.32
-0.50   0.20   0.39   0.24    0.97    0.50   0.59     0.53   0.72   0.41   0.22
-0.25   0.00   0.19   0.86   -0.02   -0.01   0.22     0.69   0.44   0.26   0.44
-0.25   0.01   0.28   0.60   -0.17   -0.21   0.34     0.61   0.90   0.12   0.36
-0.25   0.05   0.35   0.31   -0.52   -0.93   0.50     0.48   0.50   0.27   0.27
-0.25   0.20   0.38   0.23    0.55    0.18   0.63     0.62   0.80   0.34   0.22
 0.00   0.00   0.25   0.52   -0.24   -0.43   0.37     0.50   0.44   0.11   0.32
 0.00   0.01   0.30   0.32   -0.52   -0.89   0.47     0.46   0.38   0.18   0.29
 0.00   0.02   0.31   0.30   -0.73    0.91   0.51     0.46   0.50   0.31   0.27
 0.00   0.05   0.34   0.18    0.83    0.28   0.57     0.45   0.50   0.43   0.24
 0.00   0.10   0.35   0.19    0.47    0.13   0.62     0.54   0.66   0.32   0.23
 0.50   0.00   0.17   0.19    0.02    0.00   0.58     0.66   0.66   0.16   0.28
 1.00   0.00   0.00   0.18    0.00    0.00   0.30     0.54   0.88   0.08   0.49
TABLE 15: Data for Species 3. [Impact ("I") and Control ("C") values by sampling time, Before and After; the columns of the original table could not be recovered from the scan.]
FIG. 4: Differences vs. averages for the square root of X.
Species 3. The data are in Table 15, and examples from the first run of the preliminary checks are in Table 16. These examples, and more detailed tables, strongly suggest a problem with serial correlation. (The rank version of von Neumann's test seems to have greater power, as Bartels (1982) claimed it would for non-Normal distributions.) In addition, no transformation seems additive. This appears to be due to the six high-density sampling times in the Before period, when Impact densities were, with one mild exception, greater than Control densities, while this ordering is reversed for most other sampling times. Fig. 4 shows the plot of the differences against the averages when the square root transformation is used.
Many of the Before sampling times are quite close together (most are a week apart), so serial correlation is not surprising and some adjustment seems needed. I have not attempted any serious modelling here. I have assumed that the centered differences, d_ij = D_ij - E(D_ij) (where D_ij = Y_Iij - Y_Cij), obey d_ij = r_ij d_i(j-1) + e_ij, where the e_ij are independent errors and r_ij decreases linearly with the timegap, t_ij - t_i(j-1). I chose the estimated line by an ad hoc argument: I assumed that r_ij = 0 if the timegap is greater than .15 (this is about eight weeks, and contains the period of high-density Before observations), and that the sample autocorrelation estimated r_ij for a timegap of about .019, the median (and mode) of the Before timegaps. This gives r^ = 1.15 r (1 - T/.15), where T is the timegap. If r^_ij > 0, I replaced D_ij by (D_ij - r^_ij D_i(j-1)) divided by (1 - r^_ij) so the mean would be unchanged.
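A minimal sketch of this ad hoc adjustment (hypothetical variable names; the original computation may have differed in detail) is:

    import numpy as np

    def adjust_for_serial_correlation(times, d, r_hat, gap_hat=0.019, gap0=0.15):
        """Remove an assumed AR(1)-like dependence whose coefficient declines
        linearly to zero for time gaps larger than gap0.

        r_hat   : sample lag-1 autocorrelation of the centered differences,
                  taken to estimate r at a time gap of gap_hat
        """
        a = r_hat / (1.0 - gap_hat / gap0)   # slope of r(T) = a * (1 - T/gap0), about 1.15 * r_hat
        adj = np.array(d, dtype=float)
        for j in range(1, len(d)):
            gap = times[j] - times[j - 1]
            r = a * (1.0 - gap / gap0) if gap < gap0 else 0.0
            if r > 0:
                # (D_j - r D_{j-1}) / (1 - r) leaves the mean unchanged
                adj[j] = (d[j] - r * d[j - 1]) / (1.0 - r)
        return adj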
Table 17 shows the preliminary checks for these adjusted samples. The serial correlation problem seems now to be minor, but additivity remains elusive. A plot like Fig. 4, but using the adjusted data, looks very like Fig. 4. (Trends in the absolute deviations, portrayed in "vslope" and "vspear", may be due to trends in the mean.) It is possible that, without the alteration, Impact is more popular than Control in some conditions and less popular in others. I therefore broke each data set into two, defining "conditions" by time-of-year (Summer between .33 and .9) rather than by "abundance" (e.g. Control > 10), though the answers are similar.
Tables 18 and 19 give preliminary checks on the adjusted data for Summer and Winter respectively. Even though there are only six
TABLE 16: Preliminary checks for Species 3, all times.

power  const  Tukey  Spear  vslope  vspear  von N  rank vN  shape  Gupta  trend
-1.00   0.00   0.00   0.02   -0.00   -0.00   0.74     0.17   1.57   0.03   0.59
-1.00   1.00   0.33   0.09   -0.13   -0.10   0.36     0.07   0.86   0.44   0.53
-1.00   3.00   0.25   0.25   -0.97    0.42   0.28     0.07   0.79   0.37   0.43
-1.00   5.00   0.16   0.28    0.43    0.04   0.22     0.07   0.79   0.46   0.40
-0.50   0.00   0.00   0.03   -0.00   -0.00   0.90     0.15   0.70   0.08   0.61
-0.50   1.00   0.19   0.13   -0.73    0.63   0.27     0.10   0.79   0.45   0.44
-0.50   3.00   0.07   0.21    0.17    0.03   0.17     0.05   0.72   0.44   0.36
-0.50   5.00   0.02   0.17    0.02    0.00   0.11     0.04   0.65   0.50   0.34
 0.00   0.00   0.03   0.02   -0.05   -0.27   0.63     0.23   0.88   0.41   0.43
 0.00   0.10   0.05   0.04   -0.95   -0.97   0.28     0.18   0.72   0.48   0.41
 0.00   0.50   0.02   0.09    0.11    0.15   0.12     0.07   0.72   0.37   0.34
 0.00   1.00   0.01   0.14    0.01    0.02   0.09     0.04   0.65   0.28   0.31
 0.50   0.00   0.00   0.09    0.00    0.00   0.01     0.07   0.77   0.22   0.26
 1.00   0.00   0.00   0.24    0.00    0.00   0.00     0.04   2.04   0.49   0.29
TABLE 17: Preliminary checks for Species 3, adjusted for correlation, all times.

power  const  Tukey  Spear  vslope  vspear  von N  rank vN  shape  Gupta  trend
-1.00   3.00   0.30   0.30   -0.86    0.39   0.53     0.27   0.09   0.51   0.53
-0.50   1.00   0.23   0.19   -0.61    0.87   0.54     0.39   0.09   0.49   0.53
-0.50   3.00   0.12   0.39    0.29    0.08   0.54     0.49   0.15   0.45   0.49
 0.00   0.00   0.03   0.02   -0.05   -0.27   0.63     0.23  -0.06   0.41   0.43
 0.00   0.10   0.06   0.09   -0.81    0.99   0.54     0.40   0.09   0.37   0.49
 0.00   0.50   0.04   0.17    0.29    0.22   0.57     0.69   0.18   0.42   0.48
 0.00   1.00   0.03   0.22    0.07    0.06   0.56     0.68   0.22   0.41   0.47
 0.50   0.00   0.00   0.14    0.00    0.07   0.50     0.91   0.38   0.40   0.48
 0.50   0.50   0.00   0.26    0.00    0.02   0.46     0.89   0.39   0.34   0.49
 1.00   0.00   0.00   0.28    0.00    0.01   0.44     0.89   0.45   0.47   0.50
TABLE 18: Preliminary checks for Species 3, adjusted for correlation, Summer.

power  const  Tukey  Spear  vslope  vspear  von N  rank vN  shape  Gupta  trend
-1.00   3.00   0.91  -0.67    0.77    0.69   0.51     0.20   0.04   0.23   0.72
-0.50   1.00   0.82  -0.98   -0.88   -0.93   0.50     0.28   0.03   0.15   0.75
-0.50   3.00  -0.86  -0.46    0.44    0.29   0.51     0.23   0.06   0.13   0.75
 0.00   0.00   0.31   0.25   -0.01   -0.09   0.83     0.47  -0.18   0.36   0.71
 0.00   0.10   0.71   0.44   -0.36   -0.60   0.56     0.35  -0.04   0.34   0.72
 0.00   0.50   0.99  -0.78    0.82    0.86   0.51     0.31   0.04   0.16   0.77
 0.00   1.00  -0.87  -0.59    0.51    0.35   0.51     0.25   0.06   0.18   0.76
 0.50   0.00  -0.64  -0.80    0.32    0.33   0.51     0.35   0.05   0.12   0.79
 0.50   0.50  -0.45  -0.36    0.12    0.18   0.51     0.38   0.09   0.16   0.81
 1.00   0.00  -0.04  -0.39    0.00    0.01   0.48     0.47   0.13   0.05   0.90
TABLE 19: Preliminary checks for Species 3, adjusted for correlation, Winter.

power  const  Tukey  Spear  vslope  vspear  von N  rank vN  shape  Gupta  trend
-1.00   3.00   0.97  -0.90   -0.01   -0.10   0.56     0.32  -0.11   0.50   0.37
-0.50   1.00   0.87   1.00   -0.00   -0.05   0.57     0.41  -0.12   0.50   0.43
-0.50   3.00   0.84   1.00   -0.00   -0.05   0.56     0.41  -0.11   0.50   0.45
 0.00   0.00   0.62   1.00   -0.05   -0.09   0.59     0.41  -0.14   0.50   0.62
 0.00   0.10   0.62   1.00   -0.05   -0.09   0.59     0.41  -0.14   0.50   0.62
 0.00   0.50   0.61   1.00   -0.06   -0.09   0.59     0.41  -0.14   0.50   0.63
 0.00   1.00   0.60   1.00   -0.07   -0.09   0.59     0.41  -0.13   0.50   0.64
 0.50   0.00   0.23   0.37   -0.64   -0.95   0.59     0.73  -0.11   0.50   0.95
 0.50   0.50   0.23   0.37   -0.65   -0.95   0.59     0.73  -0.11   0.50   0.94
 1.00   0.00   0.04   0.20    0.47    0.70   0.52     0.32  -0.02   0.50   0.54
Before observations in Winter, many transformations are ruled out by trends in the sizes of the deviations. There are 32 Before observations in Summer, but there is a much broader range of acceptable transformations: Impact and Control differ little in this period, and it may also be that, for very small observations, transformations involving a relatively large added constant behave rather alike.
If there is a "season-x-site" interaction, it may not be the case that the same transformation is appropriate for both seasons. However, the square root transformation seems to be satisfactory in both seasons here. More reassuringly, the tests for an effect of the alteration give very similar results for acceptable (and many unacceptable) transformations for all times (Table 20), Summer alone (Table 21) and Winter alone (Table 22).
TABLE 20: Species 3 tests (cf. Table 17).
power const t-test Wile trim-t blwt-t
-1.00 3.00 0.040 0.011 0.015 0.008
-0.50 1.00 0.036 0.010 0.010 0.004
-0.50 3.00 0.018 0.008 0.007 0.011
0.00 0.00 0.041 0.016 0.012 0.033
0.00 0.10 0.017 0.007 0.004 0.010
0.00 0.50 0.014 0.005 0.007 0.007
0.00 1.00 0.012 0.006 0.006 0.010
0.50 0.00 0.016 0.017 0.030 0.050
0.50 0.50 0.018 0.020 0.027 0.057
1.00 0.00 0.048 0.033 0.089 0.077
TABLE 21: Species 3 tests (cf. Table 18).
power const t-test Wile trim-t blwt-t
-1.00 3.00 0.036 0.014 0.017 0.009
-0.50 1.00 0.032 0.011 0.021 0.006
-0.50 3.00 0.028 0.016 0.025 0.011
0.00 0.00 0.057 0.043 0.021 0.060
0.00 0.10 0.033 0.029 0.018 0.029
0.00 0.50 0.020 0.009 0.014 0.010
0.00 1.00 0.021 0.011 0.017 0.011
0.50 0.00 0.019 0.011 0.008 0.034
0.50 0.50 0.025 0.017 0.023 0.019
1.00 0.00 0.072 0.018 0.113 0.027
TABLE 22: Species 3 tests (cf. Table 19).
power const t-test Wile trim-t biwt-t
-1.00 3.00 0.035 0.034 0.119 0.053
-0.50 1.00 0.022 0.034 0.083 0.031
-0.50 3.00 0.016 0.019 0.063 0.020
0.00 0.00 0.008 0.010 0.037 0.008
0.00 0.10 0.008 0.010 0.036 0.008
0.00 0.50 0.008 0.007 0.035 0.008
0.00 1.00 0.008 0.007 0.033 0.007
0.50 0.00 0.006 0.005 0.033 0.005
0.50 0.50 0.006 0.005 0.034 0.005
1.00 0.00 0.01? 0.004 0.073 O.OOQ
Thus this species seems to have been reduced by the alteration. However, the evidence is stronger for Summer, when the two sites seem to have been quite similar (Before) and there are many observations, than for Winter, when the relation between the sites seems to have been different but there are only six Before observations, all within a seven-week period.
4. Discussion.
There are three obvious directions for further work: better testing, better things than testing, and lessons to be applied in the design of assessment programs.
With respect to the first, there is a laundry list of desirable extensions. A proper plan for dealing with correlated data is high on this list. However, it seems unlikely to be accomplished without a model involving more than half a dozen parameters, and requiring larger and longer (in time) datasets than are often customary.
Multivariate procedures are another item. It will have been noticed that Species 2 and 3 were both dense at about the same times. If they are similar biologically (or suspected to be so), we may get more reliable results, and perhaps some idea about the mechanisms of any effect, by analyzing them together. Unfortunately, knowledge of many populations is such that it is easy to "suspect" links between large numbers of species, and difficult to rate the credibility of these suspicions. Data-based methods for linking different species would be helpful.
Mention of many species raises the problem of multiple testing. In many assessment problems there may be dozens or even hundreds of species to consider: all are sampled the same way, so the "extra" data are essentially free. Even if no species is affected, the probability that at least one will be "significant" may be much greater than the nominal level. The standard answer, which amounts largely to using very small test levels, is unsatisfactory: Normal and t-approximations can't be trusted at these levels, and the variances and sample sizes are such that even severely affected species are unlikely to register at them.
Can we do better than significance testing? That many resource managers and biologists seem to want, and be content with, a test of significance appears sometimes to be a remarkable public relations achievement. The meaning of a p-value is not easy to grasp, even in simple cases where the model is known. It may not be understood by many of those who ask for it. There are plenty of logical problems, as the Bayesians have pointed out (e.g. Edwards, Lindman and Savage 1963, Lindley 1972).
Arguments can be made for impact assessment testing. An obvious one is that environmental protection laws often imply tests rather than estimates: alterations are to make no change in the "balanced indigenous" community. Another is that estimation may require larger sample sizes. It may be more economical to test first and then decide whether further lab and field studies are justified. Bayesians have pointed out that the p-values from one-sided tests may be interpretable Bayesianly, if the prior is diffuse. (It surely should be in much of this work, and when environmentalists and industrialists each have sharp, but sharply different, priors, the p-value seems useful as an objective compromise.) Also, difficult as testing is, estimation may be much harder, both to do and to explain.
But this last objection comes near to violating Tukey's principle, "better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can often be made precise." The right question is surely "Do the effects matter?" This is vague because it involves a value judgment, but statisticians can help individuals to decide for themselves by answering the nearby question "How big are the effects?"
This question is easy to answer, at least approximately, from the testing work only if Control is believed to be unchanged and the transformation used in the tests implies the units we want. A change in X gives us the absolute change; in ln(X), the percentage change; in X^-1, the change in space needed per animal. Added constants might be justified on the grounds that E(X+c)^b is approximately (EX+c)^b, using Jensen's inequality. But powers other than -1, 0 and 1 seem hard to interpret, although they may well be justified in terms of "damping". Also, if the units we want do not correspond to the transformation we used, there is no simple back-conversion to help: the estimated size of the change will involve the actual sample abundances, which will usually depend on the choice of sampling times.
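For example (hypothetical numbers), if Control is taken as unchanged and the Before-to-After shift in the mean of ln(X) at Impact is estimated as \Delta, the implied multiplicative change is roughly e^{\Delta}:

    \Delta = \overline{\ln X}_{\mathrm{After}} - \overline{\ln X}_{\mathrm{Before}}
    \quad\Rightarrow\quad
    \frac{X_{\mathrm{After}}}{X_{\mathrm{Before}}} \approx e^{\Delta},
    \qquad e^{-0.36} \approx 0.70,

so a shift of -0.36 on the log scale corresponds to roughly a 30 percent reduction.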
An alternative is to attack estimation directly. We are immediately faced with the problem of choosing units, the possibility that the effect may vary (e.g. with time-of-year), and the difficulty of presenting estimates (with confidence intervals, etc.) in a form quickly grasped by non-scientists. But these questions suggest a useful approach. This is to use the Control (or several Controls) as predictors, and to present the assessment in the form of a plot showing, for each Control value, the Impact value expected in the Before period and the Impact value expected in the After period. (One can also plot the observed After value at Impact against the Before value that would be predicted from the observed After values at Control.) Thus the Control stands for environmental conditions which might determine the direction and size of the change at Impact. This approach was successfully used by Mathur, Robbins and Purdy (1980), but I have not seen it elsewhere.
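A minimal sketch of such a presentation (hypothetical data layout; ordinary least squares is used only for illustration, and any of the richer models discussed below could be substituted) is:

    import numpy as np
    import matplotlib.pyplot as plt

    def before_after_prediction_plot(control_b, impact_b, control_a, impact_a):
        """Regress Impact on Control separately for Before and After and plot
        the expected Impact value over the observed range of Control."""
        grid = np.linspace(min(control_b.min(), control_a.min()),
                           max(control_b.max(), control_a.max()), 100)
        fit_before = np.polyfit(control_b, impact_b, 1)
        fit_after = np.polyfit(control_a, impact_a, 1)
        plt.plot(grid, np.polyval(fit_before, grid), "-", label="Expected Impact, Before")
        plt.plot(grid, np.polyval(fit_after, grid), "--", label="Expected Impact, After")
        plt.scatter(control_b, impact_b, marker="s", label="Before data")
        plt.scatter(control_a, impact_a, marker="s", facecolors="none", label="After data")
        plt.xlabel("Control")
        plt.ylabel("Impact")
        plt.legend()
        return fit_before, fit_after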
This approach requires the calculation of regressions, of Impact against Control(s), for Before and After. The problems arising in testing arise here too, often more severely. Corresponding to transformation choice is the choice of a model. This latter choice is wider, though: there are natural models that allow Control to be more abundant than Impact in some conditions and less abundant in others. Also there are new possibilities, like isotonic regression (Barlow et al. 1972) or projection pursuit regression (Friedman and Stuetzle 1981), that allow us to evade a particular choice. These bring more problems, some of them severe, such as estimating variance (especially if it varies) and detecting and allowing for serial correlation.
Finally, two design considerations should be mentioned. First, it may be inadequate to decide on a sample size only by the usual calculation, n = 2s^2(z_a + z_b)^2 / d^2, where a is the test level, d is a difference it is desired to detect with probability 1 - b, s^2 is a variance estimate, and z_a and z_b are the upper a and b points of the N(0,1) distribution. (The units of d and s^2 are those of a transformation applied to some preliminary data used to obtain s^2.) While such calculations need to be done, we need to anticipate uncertainty about transformations, the need to split the sample into two or more parts, serial correlation, and the possibility of year-to-year variation.
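A minimal sketch of the usual calculation quoted above (per group, one-sided test; the numbers in the example are hypothetical) is:

    from scipy.stats import norm

    def sample_size(s, d, alpha=0.05, power=0.80):
        """n = 2 s^2 (z_alpha + z_beta)^2 / d^2, rounded up."""
        z_a = norm.ppf(1 - alpha)
        z_b = norm.ppf(power)
        n = 2 * s**2 * (z_a + z_b) ** 2 / d**2
        return int(n) + 1

    # e.g. sample_size(s=1.0, d=0.8) gives about 20 per group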
Secondly, the estimation approach makes explicit what was only implicit in testing. This is that the "control" is not a control in the sense used in experiments. In the testing problem, we think of it as "tracking" Impact, if only we could find the right transformation. But the estimation approach described above makes it clear that it is a predictor. Thus it is not essential that Control be "similar" to Impact, only that it predict it well Before the alteration. Thus several Controls, serving complementary purposes, might be useful, as might be observations at "Control" on species other than the target one.
Acknowledgement: I thank the Marine Review Committee for permission to use the data in Section 3. MRC does not necessarily agree with any of the opinions in this article.
REFERENCES
Anderson, T. W. 1970. The Statistical Analysis of Time Series. Wiley, New York.
Antille, A., G. Kersting and W. Zucchini. 1982. Testing Symmetry. Journal of the American Statistical Association, 77: 639-646.
Atkinson, A. C. 1985. Plots, Transformations and Regression. Oxford University Press, New York.
Barlow, R. E., D. J. Bartholomew, J. M. Bremner and H. D. Brunk. 1972. Statistical Inference Under Order Restrictions. Wiley, New York.
Bartels, R. 1982. The Rank Version of von Neumann's Ratio Test for Randomness. Journal of the American Statistical Association, 77: 40-46.
Berry, D. A. Logarithmic Transformations in ANOVA. Biometrics, 43: 439-456.
Box, G. E. P., and D. R. Cox. 1964. An Analysis of Transformations. Journal of the Royal Statistical Society, Series B, 26: 211-252.
Box, G. E. P., and W. J. Hill. 1974. Correcting Inhomogeneity of Variances with Power Transformation Weighting. Technometrics, 16: 385-389.
Box, G. E. P., and G. C. Tiao. 1965. A Change in the Level of a Non-Stationary Time Series. Biometrika, 52: 181-192.
Box, G. E. P., and G. C. Tiao. 1975. Intervention Analysis with Applications to Economic and Environmental Problems. Journal of the American Statistical Association, 70: 70-79.
Edwards, W., H. Lindman and L. J. Savage. 1963. Bayesian Statistical Inference for Psychological Research. Psychological Review, 70: 193-242.
Friedman, J. and W. Stuetzle. 1981. Projection Pursuit Regression. Journal of the American Statistical Association, 76: 817-823.
Gupta, M. K. 1967. An Asymptotically Non-Parametric Test of Symmetry. Annals of Mathematical Statistics, 38: 849-866.
Kafadar, K. 1982. Using Biweight Estimates in the Two-Sample Problem. Part 1: Symmetric Populations. Communications in Statistics - Theory and Methods, 11: 1883-1901.
Kafadar, K. 1983. The Efficiency of the Biweight as a Robust Estimator of Location. Journal of Research of the National Bureau of Standards, 88: 105-116.
Lehmann, E. L. 1975. Nonparametrics: Statistical Methods Based on Ranks. Holden-Day.
Lindley, D. V. 1972. Bayesian Statistics: A Review. SIAM, Philadelphia.
Mathur, D., T. W. Robbins and E. J. Purdy. 1980. Assessment of Thermal Discharges on Zooplankton in Conowingo Pond, Pennsylvania. Canadian Journal of Fisheries and Aquatic Sciences, 37: 937-944.
Posten, H. O. 1978. The Robustness of the Two-Sample t-test Over the Pearson System. Journal of Statistical Computation and Simulation, 6: 295-311.
Posten, H. O. 1979. The Robustness of the One-Sample t-test Over the Pearson System. Journal of Statistical Computation and Simulation, 9: 133-149.
Posten, H. O. 1982. Two-Sample Wilcoxon Power Over the Pearson System and Comparison with the t-test. Journal of Statistical Computation and Simulation, 16: 1-18.
Reinsel, G. C., and G. C. Tiao. 1987. Impact of Chlorofluoromethanes on Stratospheric Ozone. Journal of the American Statistical Association, 82: 20-30.
Stewart-Oaten, A., W. W. Murdoch and K. R. Parker. 1986. Environmental Impact Assessment: "Pseudoreplication" in Time? Ecology, 67(4): 929-940.
Stewart-Oaten, A. 1986. Assessing Local Impacts: Progress and Some Problems. Oceans '86 Conference Record, Vol. 3, pp. 964-973. Available from: IEEE Service Center, 445 Hoes Lane, Piscataway, N.J. 08854; or from: Marine Technology Society, 2000 Florida Avenue, N.W., Suite 500, Washington, D.C.
Stigler, S. M. 1976. The Effect of Sample Heterogeneity on Linear Functions of Order Statistics, with Applications to Robust Estimation. Journal of the American Statistical Association, 71: 956-960.
Tukey, J. W. 1949. One Degree of Freedom for Non-Additivity. Biometrics, 5: 232-242.
von Neumann, J. 1941. Distribution of the Ratio of the Mean Square Successive Difference to the Variance. Annals of Mathematical Statistics, 12: 367-395.
Yuen, K. K. and W. J. Dixon. 1973. The Approximate Behavior and Performance of the Two-Sample Trimmed t. Biometrika, 60: 369-374.
COMPARISONS WITH BACKGROUND ENVIRONMENT: STRATEGIES FOR DESIGN
Bruce Peterson
CH2M HILL, 777 108th Avenue NE, Bellevue, WA 98004
INTRODUCTION
Current federal and state environmen-
tal regulations have greatly increased
the number of environmental studies
being performed each year in the United
States. These studies range from inves-
tigations of Superfund sites to the
establishment of monitoring systems for
RCRA-permitted facilities. A common
component of all of these studies is the
comparison of the sample values of en-
vironmental parameters with background
samples or regulatory thresholds.
Design of environmental sampling
strategies in a potentially litigious
regulatory context presents many new
challenges for the statistician. Among
these are the traditional challenges of
adequate problem definition, newer chal-
lenges of maintaining data quality when
multiple organizations handle the sam-
ples, and the challenges in the rapidly
developing field of chemometrics (the
study of the statistical properties of
chemical measurements).
Studies involving statistical com-
parisons have an initial phase during
which the objectives of the study and
the contrasts of interest are refined
and agreed upon. Regulatorily mandated
environmental studies, particularly of
Superfund sites, often involve multiple
actors in defining the study objectives.
Each of these actors—the regulatory
agencies, the potentially responsible
parties, and the public—has poten-
tially disparate agendas. Each agenda
can be translated into statistical de-
sign goals that may be incompatible with
those derived from other agendas. Fre-
quently the actors are statistically
unsophisticated: the definition of
study objectives therefore becomes an
involved process of both education and
political maneuvering.
Even with an agreed-upon set of study
objectives, the regulatory context of a
study results in a high probability of
litigation. This places an obligation
on the study team to carefully document
and track sampling activities and the
movement of samples through the analy-
tical process. Well-constructed data
bases are a necessity.
The data base should include provi-
sions for recording progress of samples
through the analysis, field changes to
the sample plan, variations in the ana-
lytical protocol, pointers to the origi-
nal documents, and the values of the
environmental parameters.
The data base will ultimately provide
the information necessary for statisti-
cal analysis and comparison of the
environmental parameters. The chal-
lenges of the analysis will include the
mundane: obtaining a "final" data set,
and the complex: performing comparisons
with multivariate and incomplete data.
In addition the data base must maintain
information on the quality of the chemi-
cal analysis for incorporation into the
estimate of the overall quality of
analysis.
The chemometric measures of quality
stored in a data base enable multiple
uses of the data. These measures are
available from the analytical labora-
tories, allowing data of differing qual-
ity to be used in analyses where appro-
priate. As an analytical lab is capable
of generating a large number of quality
measures, these are best transferred
electronically.
STRATEGIES FOR DESIGN;
THE LOVE CANAL PILOT STUDY
Many of these challenges have been en-
countered with the Love Canal Superfund
site. Construction on the Love Canal was
started in the 1890s but never completed.
The canal was used as a chemical waste
dump from 1942 to 1953 and was then
filled and covered. The Love Canal area
developed as an urban neighborhood until
significant contamination of the environ-
ment was recognized in 1978. Extensive
remediation to isolate and contain the
wastes in the canal was performed in 1979
and additional remediation to clean the
surrounding area is continuing. The Hab-
itability Study, due for completion in
May 1988, is designed to provide informa-
tion on the habitability of the emergency
declaration area (EDA) surrounding the
canal.
The technical goals of the study were
developed by a Technical Review Committee
(TRC) established by the U.S. EPA to
oversee and coordinate the habitability
study. TRC members represent the U.S.
EPA, the N.Y. State Department of Health,
the U.S. Department of Health and Human
Services/Centers for Disease Control, and
the N.Y. State Department of Environmen-
tal Conservation. The TRC in turn formed
a scientific advisory panel to formulate
and recommend habitability criteria.
The scientific panel recommended that
a habitability study be formulated as a
statistical comparison between each EDA
neighborhood and similar areas not in-
fluenced by a hazardous waste site in the
Niagara Falls/Buffalo area. The compari-
son was to be based on the chemical con-
centrations of 8 indicator chemicals
(LCICs), chosen for mobility and speci-
ficity to the canal. The comparison was
to provide a 90 percent probability of
detecting an "order of magnitude" differ-
ence in median values with 95 percent
confidence against the null hypothesis
that no difference in median values
existed. The sample design was devel-
oped after internal discussion and a
peer review. The full criteria can be
found in Love Canal Emergency Declara-
tion Area; Proposed Habitability Cri-
teria, NYSDOH and DHHS/CDC 1986.
This degree of specificity is unusual
in Superfund-driven environmental
studies. However, the extensive discus-
sions that preceded agreement on these
criteria served to develop a common ex-
pectation as to what the habitability
study should provide.
The habitability study, like most en-
vironmental studies, required estimates
of LCIC concentration variability to
design the sampling and analysis plan.
In addition, since previous studies had
found the indicator chemicals to have a
high percentage of undetectable concen-
trations, greater sensitivity of the
chemical analysis was needed.
A pilot study was designed and imple-
mented to test the field operations re-
quired to collect the samples, test a
new analysis method for estimating LCIC
concentrations, and identify and quan-
tify sources of variability in the con-
centration estimates.
The Love Canal pilot study illustra-
ted difficulties common to many large
environmental studies, one example being
the tracking of samples from field col-
lection through analysis. Although a
data base was established to track sam-
ples, the number of changes that oc-
curred in the field required consid-
erable effort to trace during later
stages of the study. Environmental
studies should establish sample tracking
and documentation as a priority task.
Laboratory variability, not generally
quantified in many environmental
studies, was found to be significant
during the pilot. Laboratory operations
were closely tracked as part of the
pilot, yielding quantitative measures of
variability not commonly available
through EPA's contract laboratory pro-
gram. This close tracking was continued
for the full study, as the detailed in-
formation provides insight about the
quality of the data.
The pilot provided information on the
sources of variability and bias in the
laboratory estimates of concentration.
Interlaboratory variability was found to
be of similar magnitude as intralabora-
tory variability. Biases occurred among
laboratories from several causes ranging
from contamination levels in different
lots or brands of reagents to differ-
ences in exact analytical protocols
among laboratories.
An analytical method specific for
LCICs was developed to achieve suffi-
cient sensitivity to increase the number
of samples with detectable concentra-
tions. Criticism of earlier studies
about the sensitivity of the analytical
methods led to an effort to estimate the
"detection limits" for the analytical
method.
The analytical method proposed for
the habitability study uses a gas
chromatograph/mass spectrometer (GC/MS)
instrument similar to that used for most
sensitive organic analyses of environ-
mental samples. The method differs from
the usual broad scan for a range of com-
pounds. In this method the instrument
was programmed to search only for LCICs
by a method known as single ion moni-
toring (SIM).
The GC/MS instrument is sensitive,
providing quantitative estimates of very-
low-level concentrations (parts per bil-
lion range), and accurate, providing a
high level of confidence as to the iden-
tity of the compound. The proposed
method increased the sensitivity of the
instrument by operating it in the SIM
mode. This concentrated the instru-
ment's scan time on the LCICs.
Estimating the "detection limit" for
the GC/MS SIM proved difficult as the
chemometric literature has not dealt ex-
tensively with detection limits for
multivariate instruments. A sample spe-
cific detection limit cannot be esti-
mated for the GC/MS SIM method.
However, a method detection limit may be
estimated. The method detection limit
is the concentration at which samples
with a known concentration of a compound
would be differentiated from a blank
95 percent of the time.
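For the univariate case, one rough sketch of such an estimate (in the spirit of Currie 1968 and the usual replicate-spike procedure; it is not the multivariate GC/MS SIM estimator that the habitability study required) is:

    import numpy as np
    from scipy.stats import t

    def method_detection_limit(replicate_concentrations, alpha=0.01):
        """Detection limit from replicate analyses of a low-level spiked sample:
        a t-quantile times the replicate standard deviation."""
        x = np.asarray(replicate_concentrations, dtype=float)
        s = x.std(ddof=1)
        return t.ppf(1 - alpha, df=len(x) - 1) * s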
An estimator for the method detection
limit is difficult to develop as identi-
fication and quantification are con-
founded in the GC/MS instrument. Iden-
tification and quantification constitute a
multivariate process where a compound is
not quantified unless it is first iden-
tified. Generally the ability to iden-
tify a compound is lost at concentrations
greater than those at which quantifica-
tion could occur. Identification often
fails because of chemical interferences
in the chromatogram at low concentra-
tions. These interferences obscure the
ion area peaks associated with the target
compound.
Standard estimators of detection
limits (Long and Winefordner 1983, Currie
1968, Hubaux and Vos 1968) are defined in
terms of instruments with univariate re-
sponses. Since the GC/MS is a multivari-
ate instrument, these methods are not
directly applicable. New estimators are
required for detection (or identifica-
tion) limits for the GC/MS. These are
being developed in the full habitability
study.
SUMMARY AND CONCLUSION
Environmental studies are increasingly
being mandated by regulatory concern. In
this potentially litigious context, ex-
ceptional efforts are required to produce
sound statistical designs. The objec-
tives of each study must be clearly
defined and conflicting objectives iden-
tified and mediated before the study can
be successful.
A good data base management system is
a prerequisite for a successful environ-
mental study. This system should in-
clude the personnel, software, and hard-
ware to track not only the parameter
estimates but sample locations, se-
quences, quality assurance parameters,
and pointers to original documents.
With environmental studies typically
involving a number of organizations,
inadequacies of data management will
cause difficulties in data defensibility
and interpretation.
Finally, a good understanding of the
analytical process that produces the
parameter estimates is required. Ex-
perience with the analytical techniques
used at Love Canal has shown that basic
concepts such as "Detection Limit" are
not as clearly defined as may be neces-
sary. Statisticians must expect to work
closely with chemists and other experts
to evaluate the impact of chemical con-
centration estimators on study
objectives.
REFERENCES
Currie, L. Limits for Qualitative Detec-
tion and Qualitative Determination. Ana-
lytical Chemistry. Vol. 40, No. 3. 1968.
Pp. 586-593.
Hubaux, A., and G. Vos. Decision and De-
tection Limits for Linear Calibration
Curves. Analytical Chemistry. Vol. 42,
No. 8. 1968. Pp. 849-855.
Long, G. L., and J. D. Winefordner,
Limit of Detection: A Closer Look at the
IUPAC Definition. Analytical Chemistry,
Vol. 55, No. 7. 1983. Pp. 712-724.
New York State Department of Health and
U.S. Department of Health and Human
Services/Centers for Disease Control.
Love Canal Emergency Declaration Area;
Proposed Habitability Criteria. 1986.
STATISTICAL COMPARISONS OF SITES:
DISCUSSION
R. Clifton Bailey
Environmental Protection Agency, WH586,401 M. Street, SW
Washington, DC 20460
The problem common to both papers is the choice of comparison areas or sites. The approaches differ in the use of sites
similar in time versus similar in location. These choices are made
for practical considerations. In both cases, traditional statistical
techniques are stretched to fit the situation.
Both papers emphasize statistical methods adapted from traditional
work for designed experiments, and emphasize making the
comparison sites similar. There are, of course, always problems
with finding similar sites. Whatever the choice, in controversial
situations, there is always an advocate to find fault with the choice.
The problem is especially troublesome since traditional safeguards
such as randomization are not available.
Since it is difficult to find an ideal site for comparison, we might
examine the fundamental question: Do we want just one area for
comparison? If more than one area is used for comparison, then
how do we make use of the information from multiple sites?
If more than one comparison site is used, then we could establish
relationships between factors thought to be important, and use the
results for adjustment. Often this ploy is used in observational
studies when the conditions do not permit the usual design
interventions such as the random assignment of treatment and
control sites.
To establish relationships requires a range of values for variables of
potential concern. This means that we seek a range of site
conditions which can be used to establish the adjustment
relationships rather than trying to find identical sites. Since identical sites don't exist, we need enough information to construct a hypothetical site analytically through computed adjustments.
Consequently more than one control site must be used unless there
is an obvious control or reference that cannot be questioned.
In effect you compute the ideal control site rather than find an
almost ideal physical site. The control sites serve to calibrate your
model so that you can adjust for other factors.
In these situations, where traditional statistical designs don't apply,
there will always be factors which will differ between sites and
which someone will point to as an explanation for any difference
found by traditional statistical tests.
If we include a reasonable set of concomitant information to make
our adjustments, the analysis must be more complex.
Traditional statistical designs usually place a great deal of emphasis
on making the analysis simple. When it is not possible to use
traditional statistical design to make the analysis easy, then we
must consider more complex analyses to circumvent design
deficiencies.
Design deficiencies may be circumvented by taking a global
approach in which we use information from several studies jointly
for common elements when the information in individual studies is
inadequate.
Success with the global approach requires a modeling effort
based on a subject matter understanding far beyond the models of
traditional statistical methods.
Similar ideas were expressed by W. G. Cochran (1965), "The Planning of Observational Studies of Human Populations," Journal of the Royal Statistical Society, Series A, Vol. 128, Pt. 2, pp. 234-265.
This is a good source of ideas for dealing with studies that are
outside the realm of traditional statistical designs. For example,
Cochran says:
Planning of observational studies is found in subject-matter departments, not in statistics departments.
Effective planning of observational studies calls for considerable mastery of the subject-matter field.
In setting up comparisons, the investigator makes do with
comparisons that are far from ideal.
The time and resources for analysis of results are often
underestimated.
To handle disturbing variables, list the variables for which matching or adjustment is
1) required,
2) desirable,
3) disregarded.
For discussions of association and causation, he cites work
on path analysis.
In the next section, I give a brief view of path analysis and suggest
environmental statisticians build on the vast experience with this
method of analysis.
Environmental Path Analysis- Blazing New Trails with Path
Analysis
Why path analysis? Path analysis was developed to solve complex
problems in genetics. Path analysis helps in formulation of complex
environmental problems and provides a conceptual framework for
dealing with complex problems.
In path analysis one graphically depicts causal pathways to
represent statistical relationships among variables.
A simple example might depict a linear relationship between x and y as follows:

    x ----> Y <---- U

The arrow shows the direction of the relationship, and U represents the unexplained sources of variation in Y.
The path diagram for multiple regression might be drawn as shown
below:
Path analysis extends single and multiple regression equations to networks of variables involving more than one equation. Some writers omit U, the unexplained variation, and the correlation arrows between x's from path diagrams.
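As an illustration only (hypothetical data, not part of the original discussion), path coefficients for diagrams like these can be computed as standardized regression coefficients:

    import numpy as np

    def path_coefficients(X, y):
        """Standardized OLS coefficients of y on the columns of X, plus the
        unexplained share of variance (the 'U' in the diagram)."""
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
        ys = (y - y.mean()) / y.std(ddof=1)
        coef, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        r2 = np.corrcoef(Xs @ coef, ys)[0, 1] ** 2
        return coef, 1.0 - r2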
According to Ching Chun Li (1975, Path Analysis—a primer, The
Boxwood Press, Pacific Grove, CA, p. 166):
"When properly analyzed, each path diagram will yield a
set of consistent conclusions for that particular causal
system."
For each environmental problem similar to those described by Allan
Stewart-Oaten, let
η1 denote a site of concern or impact,
η2 a control site, and
ξ1 a problem that affects the site of concern, η1, but not the
control site, η2
(see the two path diagrams reproduced after the references).
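A simple linear version of this causal system can be simulated to show why the control site is informative: differencing the impact and control sites removes a common environmental factor (the ξ2 of the more realistic diagram shown after the references) while retaining the effect of the problem ξ1. The sketch below is illustrative only; all coefficients, including the assumption of equal loadings on ξ2 at both sites, are hypothetical.

    # Sketch (illustrative, assumptions mine): a linear version of the causal
    # system above. The problem xi1 affects only the impact site eta1, while a
    # common environmental factor xi2 affects both sites with (assumed) equal
    # loadings, so the site difference removes xi2 but keeps the xi1 effect.
    import numpy as np

    rng = np.random.default_rng(3)
    n = 1000
    xi1 = rng.normal(size=n)                 # problem affecting the impact site only
    xi2 = rng.normal(size=n)                 # common environmental factor
    eta1 = 0.8 * xi1 + 0.5 * xi2 + rng.normal(scale=0.3, size=n)   # impact site
    eta2 = 0.5 * xi2 + rng.normal(scale=0.3, size=n)               # control site

    diff = eta1 - eta2
    # The difference is essentially uncorrelated with the common factor xi2
    # but retains the effect of the problem xi1.
    print("corr(diff, xi2) =", round(float(np.corrcoef(diff, xi2)[0, 1]), 3))
    print("corr(diff, xi1) =", round(float(np.corrcoef(diff, xi1)[0, 1]), 3))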
In conclusion, path analysis provides a way to build complex models
involving many variables very quickly. Some authors have
extended path analysis to nonlinear models. Nonlinear path models
allow the investigator a broad range of modeling possibilities and a
framework for organizing the work. References are provided to
assist the interested reader with the basic concepts of path analysis.
Disclaimer:
The opinions and assertions expressed in this paper are the private
conclusions of the author and should not be construed as official or
reflecting the views of the U. S. Environmental Protection Agency.
References:
Sewall Wright. Evolution and the Genetics of Populations, Volume 1:
Genetic and Biometric Foundations:
Chapter 13. Path Analysis: Theory, pp. 299-324.
Chapter 14. Interpretation by Path Analysis, pp. 325-372.
S. Wright (1921). Correlation and causation. Jour. Agric. Res. 20:
557-585.
J. W. Tukey (1954). Causation, regression and path analysis. In
Statistics and Mathematics in Biology, eds. O.
Kempthorne, T. A. Bancroft, J. W. Gowen, and J. L.
Lush, pp. 35-66. Ames: Iowa State University Press.
M. E. Turner and C. E. Stevens (1959). The regression analysis of
causal paths. Biometrics 15: 236-258.
Ching Chun Li (1975). Path Analysis—a primer. The Boxwood
Press, Pacific Grove, CA.
Studies in Econometric Method, Cowles Commission Research,
eds. Wm. C. Hood and Tjalling C. Koopmans (1953):
Chapter II. Identification Problems in
Economic Model Construction,
by Tjalling C. Koopmans, pp. 27-48.
Chapter III. Causal Ordering and Identifiability,
by Herbert A. Simon, pp. 49-74.
[Path diagram: the Problem (ξ1) leads to the Consequence at the Impact Site but not to the Consequence at the Control Site.]
A more realistic model includes the common effect of other variables,
for example ξ2, on both sites:
[Path diagram: the Problem (ξ1) leads only to the Consequence at the Impact Site, while the Common Environmental Factor(s) (ξ2) lead to the Consequences at both the Impact Site and the Control Site.]
APPENDIX A: Program
ASA/EPA CONFERENCE ON SAMPLING AND SITE SELECTION
IN ENVIRONMENTAL STUDIES
May 14-15, 1987
Thursday, May 14
8:50 a.m. INTRODUCTION
Walter S. Liggett, National Bureau of Standards
Dorothy Wellington, U.S. Environmental Protection Agency
I. THE STATISTICAL BASIS: RANDOMIZATION AND PROCESS CONTROL
9:00 a.m. Some Statistical Issues in the Collection and Analysis of Environmental
Data
George C. Tiao, University of Chicago
10:00 a.m. BREAK
10:10 a.m. Sampling Design: Some Very Practical Considerations
Douglas E. Splitstone, IT Corporation
11:10 a.m. DISCUSSION
Maurice E.B. Owens, Science Applications International Corporation
W. Barnes Johnson, U.S. Environmental Protection Agency
12:10 p.m. LUNCH
II. INFERENCE ON CONTINUOUS SPATIAL DISTRIBUTIONS
1:30 p.m. Spatial Prediction
Noel A.C. Cressie, Iowa State University
2:30 p.m. BREAK
2:40 p.m. Spatial Autocorrelation: Implication for Sampling and Estimation
Evan Englund, U.S. Environmental Protection Agency, EMSL-Las Vegas
3:40 p.m. DISCUSSION
John Warren, U.S. Environmental Protection Agency
4:00 p.m. RECEPTION
Friday, May 15
III. DESIGNS BASED ON AUXILIARY INFORMATION
9:00 a.m. Sampling and Modeling Superfund Site Pollutant Plumes: Methods
Combining Direct Measurements and Remote Sensing Data
Ralph E. Folsom, Research Triangle Institute
10:00 a.m. BREAK
10:10 a.m. "Validation" of Air Pollution Dispersion Models
Anthony D. Thrall, Electric Power Research Institute
11:10 a.m. DISCUSSION
Robert W. Jernigan, The American University
Richard A. Livingston, U.S. Environmental Protection Agency and
University of Maryland
12:10 noon LUNCH
IV. STATISTICAL COMPARISON OF SITES
1:30 p.m. Assessing Effects on Fluctuating Populations: Tests, Diagnostics, and
Decisions
Allan Stewart-Oaten, University of California-Santa Barbara
2:30 p.m. BREAK
2:40 p.m. Comparison with Background Environment: Strategies for Design
Bruce Peterson, CH2M Hill
3:40 p.m. DISCUSSION
R. Clifton Bailey, U.S. Environmental Protection Agency
4:10 p.m. CLOSING REMARKS
Walter S. Liggett, National Bureau of Standards
This Conference is the third in a series of research
conferences organized by the American Statistical Association,
supported by a cooperative agreement between ASA and the Office
of Standards and Regulations, under the Assistant Administrator
for Policy Planning and Evaluation, U.S. Environmental Protection
Agency.
Conference Chair and Organizer:
Walter S. Liggett, National Bureau of Standards
APPENDIX B: Conference Participants
ASA/EPA CONFERENCE ON SAMPLING AND SITE SELECTION
IN ENVIRONMENTAL STUDIES
May 14-15, 1987
Holiday Inn Capital
Washington, D.C.
Joe Abe
U.S. EPA
401 M Street, S.W. (WH-565E)
Washington, DC 20460
Ruth H. Allen
U.S. EPA
Research and Development
401 M Street, S.W.
Washington, DC 20460
Glenn F. Atkinson
Environment Canada
Fifth Floor, Place Vincent Massey
Ottawa, Canada K1A 0H3
R. Clifton Bailey
U.S. EPA
401 M Street, S.W. (WH-586)
Washington, DC 20460
James C. Baker
U.S. EPA, Region 8
999 18th Street, Suite 501
Denver, CO 80202-2405
Ted O. Berner
Battelle Columbus Division
2030 M Street, N.W., Suite 700
Washington, DC 20036
Gordon D. Booth
USDA Forest Service
797 East 5050 South
Ogden, UT 84403
Jill J. Braden
Westat, Inc.
1650 Research Boulevard
Rockville, MD 20850
Dennis L. Brandon
USAE Waterways Experiment Station
P.O. Box 631
Vicksburg, MS 39180
Christine M. Bunck
U.S. Department of Interior
Patuxent Wildlife Research Center
8102 Triple Crown Road
Bowie, MD 20715
Thomas J. Bzik
Air Products and Chemicals, Inc.
P.O. Box 538
Allentown, PA 18105
Jean Chesson
Bertram Price Associates, Inc.
2475 Virginia Avenue, #213
Washington, DC 20037
Robert P. Clickner
Westat, Inc.
1650 Research Boulevard
Rockville, MD 20850
Margaret G. Conomos
U.S. EPA
Exposure Evaluation Div. (TS-798)
401 M Street, S.W.
Washington, DC 20460
Gerald F. Cotton
NOAA/U.S. Department of Commerce
Air Resources Laboratory, R/E/AR
8066 13th Street
Silver Spring, MD 20910
James Craig
U.S. EPA (WH-563)
401 M Street, S.W. Room 2817 WSM
Washington, DC 20460
Noel A.C. Cressie
Iowa State University
Department of Statistics
Ames, IA 50011
James M. Daley
U.S. EPA
Office of Standards & Regulations
Washington, DC 20460
Stephen K. Dietz
Westat, Inc.
1650 Research Boulevard
Rockville, MD 20850
Don Edwards
Department of Statistics
University of South Carolina
Columbia, SC 29208
Evan Englund
U.S. EPA, EMSL-Las Vegas
P.O. Box 15027
Las Vegas, NV 89114
Barrett P. Eynon
SRI International
333 Ravenswood Avenue
Menlo Park, CA 94025
Paul Flyer
Westat, Inc.
1650 Research Boulevard
Rockville, MD 20850
Ralph E. Folsom
Research Triangle Institute
P.O. Box 12194
Research Triangle Park, NC 27709
Ruth E. Foster
U.S. EPA
401 M Street, S.W. (PM-223)
Washington, DC 20460
Robert Fusaro
Columbia University
P.O. Box 166, Route 9D
Garrison, NY 10524
Paul H. Geissler
U.S. Department of Interior
Patuxent Wildlife Research Center
12504 Windover Turn
Bowie, MD 20715
Michael Ginevan
Environ Corporation
1000 Potomac Street, N.W.
Washington, DC 20007
Leigh Harrington
ERIM
1501 Wilson Boulevard, #1105
Arlington, VA 22209
Cynthia L. Hertzler
EG&G Idaho
P.O. Box 1625
Idaho Falls, ID 83415
Todd Higgins
U.S. Army Corps of Engineers
P.O. Box 631
Waterways Experiment Station
Vicksburg, MS 39180
W. Ted Hinds
U.S. EPA
401 M Street, S.W. (RD-680)
Washington, DC 20460
Matthew Hnatov
U.S. EPA
401 M Street, S.W.
Washington, DC 20460
Thomas T. Holloway
U.S. EPA
Region 7 Laboratory
25 Funston Road
Kansas City, KS 66115
Robert W. Jernigan
The American University
Math Dept., Clark Hall, Rm. 215
4400 Massachusetts Avenue, N.W.
Washington, DC 20016
W. Barnes Johnson
U.S. EPA
401 M Street, S.W. (PM-223)
Washington, DC 20460
Henry Kahn
U.S. EPA
401 M Street, S.W.
Washington, DC 20460
Louise C. Kern
Texaco, Inc.
P.O. Box 1404
Houston, TX 77251-1404
Debra S. Knopman
U.S. Geological Survey
MS 410, National Center
Reston, VA 22092
Charles T. Kufs
Roy F. Weston, Inc.
5-2 Weston Way
West Chester, PA 19380
Herbert Lacayo
U.S. EPA
401 M Street, S.W., (PM-223)
Washington, DC 20460
Emanuel Landau
American Public Health Assn
1015 15th Street, N.W.
Washington, DC 20005
Barbara Leczynski
Battelle Columbus Division
2030 M Street, N.W., Suite 700
Washington, DC 20036
Walter S. Liggett
National Bureau of Standards
Administration Building, A337
Gaithersburg, MD 20899
Richard A. Livingston
University of Maryland
Geology Department
College Park, MD 20742
Gordon R. Luster
Woodward-Clyde Consultants
100 Pringle Avenue, Suite 300
Walnut Creek, CA 94596-3564
David Marker
Westat, Inc.
1650 Research Boulevard
Rockville, MD 20850
Raymond E. McIntyre
Fairfax County Air Pollution
Control
10777 Main Street, Suite 100A
Fairfax, VA 22030
Daniel I. Michael
Research Triangle Institute
1717 Massachusetts Avenue, N.W.
Suite 102
Washington, DC 20036
John Michael
Westat, Inc.
11725 Happy Choice Lane
Gaithersburg, MD 20878
Steven P. Millard
University of Washington
Biostat. Consulting Unit, SB-77
Seattle, WA 98105
Patricia A. Mundy
U.S. EPA
RD-680, 401 M Street S.W.
Washington, DC 20460
Patricia Murray
U.S. EPA (WH-563)
401 M Street, S.W., Room SE264 WSM
Washington, DC 20460
Avis D. Newell
Northrop Services
Environmental Research Lab
200 SW 35th Street
Corvallis, OR 97333
Maurice E.B. Owens
Science Applications Int'l Corp.
8400 West Park Drive
McLean, VA 22102
Tim B. Parkin
U.S. Department of Agriculture
ARS, Building 007, Room 229, BARC-W
Beltsville, MD 20705
Bruce Peterson
CH2M Hill
P.O. Box 91500
Bellevue, WA 98009
Kurt H. Riiters
Forest Response Program
299 SW 35th Street
Corvallis, OR 97333
Douglas S. Robson
Cornell University
Biometrics Unit, Warren Hall
Ithaca, NY 14853
John W. Rogers
Westat, Inc.
1650 Research Boulevard
Rockville, MD 20850
Charles A. Rohde
Johns Hopkins University
Department of Biostatistics
615 N. Wolfe Street
Baltimore, MD 21205-3179
Marie L. Saint-Louis
EG&G Idaho, Inc.
P.O. Box 1625
Idaho Falls, ID 83415
Paul D. Sampson
University of Washington
Department of Statistics, GN-22
Seattle, WA 98195
Paul A. Schiffelbein
DuPont Company
Engineering Dept., Louviers 51N13
P.O. Box 6090
Newark, DE 19714-6090
Bradley D. Schultz
U.S. EPA (TS-798)
Design Development Branch
401 M Street, S.W., Room E309
Washington, DC 20460
Luther A. Smith
North Carolina State University
Atmospheric Impacts Research Program
2501 Anne Carol Court
Raleigh, NC 27603
Carol J. Spease
U.S. Bureau of Labor Statistics
12405 Braxfield Court, #14
Rockville, MD 20852
Douglas E. Splitstone
IT Corporation
10 Duff Road
Pittsburgh, PA 15235
Stephen V. Stehman
Cornell University
337 Warren Hall
Ithaca, NY 14853
Allan Stewart-Oaten
University of California
Department of Biology
Santa Barbara, CA 93106
Therese A. Stukel
Dartmouth Medical School
HB7927
Hanover, NH 03756
Clayton L. Stunkard
U.S. EPA
Statistical Policy Branch
401 M Street, S.W.
Washington, DC 20460
Charles Taillie
Pennsylvania State University
CSEES
316 Pond Lab
University Park, PA 16802
Cassie Thompson
Westat, Inc.
1650 Research Boulevard
Rockville, MD 20850
Anthony D. Thrall
Electric Power Research Institute
3412 Hillview Avenue
Palo Alto, CA 94303
George C. Tiao
University of Chicago
Graduate School of Business
1101 E. 58th Street
Chicago, IL 60637
Peter Tong
U.S. EPA
401 M Street, S.W. (WH-548A)
Washington, DC 20460
Y.K. Tung
University of Wyoming
P.O. Box 3322 University Station
Laramie, WY 82071
Alta Turner
EBASCO Services
160 Chubb Avenue
Lyndhurst, NJ 07071
Paul Wakim
American Petroleum Institute
1220 L Street, N.W.
Washington, DC 20005
Elbert Walker
New Mexico State University
Mathematics Department
Las Cruces, NM 88003
John Warren
U.S. EPA
401 M Street, S.W. (PM-223)
Washington, DC 20460
Herbert I. Weisberg
Consulting Statisticians, Inc.
20 William Street
Wellesley, MA 02181
Dorothy Wellington
U.S. EPA
401 M Street, S.W. (PM-223)
Washington, DC 20460
Louis Wijnberg
PEI Associates Inc.
310 Blue Ridge Road
Carrboro, NC 27510
Jack W. Zolyniak
Martin Marietta Energy Systems
P.O. Box P, K-1004D, MS-278
Oak Ridge, TN 37831
American Statistical Association
Fred C. Leone
Executive Director
Mary Esther Barnes
Conference Coordinator