Future Direction for Statistics at EPA
Twelfth Annual EPA Conference on Environmental Statistics
Richmond, Virginia, April 1-3, 1997
Chapter 40
Geostatistical Sampling Designs
for Hazardous Waste Sites
George T. Flatman and Angelo A. Yfantis
This chapter discusses field sampling design for environmental sites and hazardous waste
sites with respect to random variable sampling theory, Gy's sampling theory, and geostatis-
tical (kriging) sampling theory. The literature often presents these sampling methods as an
adversarial "either/or" philosophy; this chapter emphasizes when each should be used with
a cooperative "both/and" philosophy. The intrasample variances, biases, or correlations
must be taken care of by the use of Gy's sampling theory for both independent random vari-
able sampling and analysis and correlated random variable sampling and analysis. The
deciding factors in the choice of sampling design and analysis are not just intersample vari-
ances, biases, or correlations but also the discreteness of the waste under investigation,
remediation as a unit, and the relative cost of samples versus the cost of remediation.
ENVIRONMENTAL SAMPLING is a multidisciplinary science. It requires chemists,
media experts, risk assessors, and even statisticians. The sampling design is
an integral part of the experimental design and data analysis, and most
importantly, the data analysis cannot recover more information than the samples
contain. Thus the statistician needs to be on the project from its inception. Optimal
environmental sampling requires consideration of at least three branches of statis-
tics. Classical random variable statistics (1) are needed in quality assurance (QA)
and in the analysis of data that are reasonably independent (little or no process, spa-
tial, or chronological correlation). Gy's theory of sampling (2) is needed for the def-
inition of correctness for the "field sample" [determination of amount (mass or vol-
ume) sampled] and any samples taken in heterogeneous media (almost all
environmental samples). Geostatistics, and its most used form, kriging (3), is
needed for field sites with a spatial structure. The choice of sampling designs—
when to use classical random design or kriging's regular grid design—is a difficult
decision. Even statisticians differ on such a question. This chapter discusses the sta-
tistical rules that enter into the decision. The decision depends on specifics of the
site and remediation plan as well as statistical aspects. For example, Gy's theory
must be used to take a correct sample for either random variable statistics (sampling
or analysis) or geostatistics (sampling or analysis).
When I discussed the role of statistics in sampling design with a manager of a
chemical laboratory, the manager confided in me that his statistician's recommen-
dations were always illogical and irrational and contradicted common sense. We did
not have time to discuss specifics, but I suspect the advice he received was also poor
statistically because it confused the use of random variable statistics with the use of
spatial statistics. If the correct branch of statistics has been chosen, statistical
requirements can be explained from statistical theory in a logical and reasonable
manner that does not defy common sense. It is important in a multidisciplinary
project for all to be comfortable with the soundness of the decisions. Statisticians
should be asked to explain the statistical requirements they recommend until all feel
comfortable with the design.
Random Variable Statistics
A random variable has both magnitude and probability. It may come from a sym-
metric distribution such as normal or uniform, or from a skewed distribution such
as lognormal or Poisson. Chemical environmental data sets are often assumed log-
normal, and radioactive data sets are often assumed Poisson. Because both distribu-
tions are positively skewed, the estimate of the mean based on few samples has a
higher probability of being underestimated than the mean of a normal distribution
or any symmetric distribution with a strong central tendency. Random errors as
monitored by QA are often assumed normal. The branch of statistics that deals with
random variables gives us the statistical inferences that have tools for QA. Random
variables provide measures of central tendency (such as mean, median, and mode),
dispersion (such as range and standard deviation), and statistical inference (such as
confidence intervals, prediction intervals, and tolerance intervals).
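To make the skewness point concrete, the following is a minimal simulation sketch (ours, not the authors'); the lognormal parameters and sample size are illustrative assumptions only. It estimates how often the mean of a small lognormal sample falls below the true mean.

```python
import numpy as np

# Illustrative only: a lognormal "contaminant" distribution with true mean
# exp(mu + sigma^2 / 2). With few samples, the sample mean falls below the
# true mean more often than not, because the skewed right tail is rarely hit.
rng = np.random.default_rng(0)
mu, sigma, n, trials = 0.0, 1.5, 8, 100_000
true_mean = np.exp(mu + sigma**2 / 2)
sample_means = rng.lognormal(mu, sigma, size=(trials, n)).mean(axis=1)
print("P(sample mean < true mean) =", np.mean(sample_means < true_mean))
# Typically well above 0.5 for small n, unlike a symmetric distribution.
```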
The mean and standard deviation are the statistics usually sought by a sam-
pling campaign; they are sufficient statistics (i.e., they completely define the distribution)
for the normal distribution. By the central limit theorem, the distribution of the sample
mean approaches normality as the number of samples, n, becomes large. This property
justifies the use of confidence intervals for the mean if, and only if, n is large enough
(n > 16 for a symmetric distribution, and n > 50 for a skewed distribution). However,
if a confidence interval is computed from many fewer than 50 samples of a typical
(skewed) environmental distribution, its limits are not to be trusted. Either knowledge
of the distribution or transformation to normality is required for statistical inference
about the variable, its distribution, or future samples. A listing of means and standard
deviations or intervals, without investigating the distribution, is misleading and has the
potential of inviting wrong decisions because the readers will assume normality.
Nonparametric intervals and tests are available, but they lack power. For example,
the critical values for one-sided intervals for probabilities (1 - a) of 0.95 and 0.99
using the Tchebycheff inequality are 4.472 (square root of 20) and 10.000 instead of
the standard normal distribution values of 1.64 and 2.33. Most regulators will
cringe at 4 or 10 in a compliance hypothesis test. Another consideration is that ran-
dom variable sampling design requires rigorous definitions of the population and
sampling unit, so that the design can give each sampling unit an equal probability of
being chosen. This requirement will be discussed further.
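As a quick check of the factors quoted above, this sketch (ours, not the authors') compares the standard normal critical values with the distribution-free Chebyshev-type factor 1/sqrt(alpha) that yields the 4.472 and 10.000 cited in the text.

```python
from math import sqrt
from statistics import NormalDist

for alpha in (0.05, 0.01):
    z = NormalDist().inv_cdf(1 - alpha)   # normal-theory one-sided critical value
    k = 1.0 / sqrt(alpha)                 # distribution-free Chebyshev-type factor
    print(f"1 - alpha = {1 - alpha:.2f}: normal z = {z:.2f}, Chebyshev k = {k:.3f}")
# Prints z = 1.64 and 2.33 versus k = 4.472 and 10.000, the values quoted above.
```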
Population Defined
In environmental samples, population is not as obvious or as well-defined a term as
it is in statistical textbooks (e.g., all the cards in a deck, or the two sides of a coin).
In site evaluation, the most obvious population is the waste site as a whole, but the
usual site has more than one population of interest. It may have population(s) of
plume(s) and background population(s). The population of interest is the popula-
tion(s) of the plume(s). Waste plumes seldom honor property boundaries or travel
in politically defined shapes such as city blocks. Thus the populations of interest are
the plume(s) and the background, not a mixture of these. To average all the samples
from the site would give an estimate of a mean from a mixture of populations, a
"fruit salad" of plume(s) and background(s). If the location and extent of the plume
or background are not known, but a map of mean contours (isopleths or isarithmic
lines) is wanted for multiple remediations, then this situation would require geosta-
tistical sampling and analysis. If the waste to be evaluated is well-defined and con-
fined, such as liquid waste stored in 55-gallon drums or a waste pile on a tarp that
will be disposed of as a unit, then the population of interest is the drum or pile and
therefore classical statistics (a mean value) will be adequate for the decision.
Sampling Unit Defined
For textbook statistics, a sampling unit is a draw of a card or a flip of a coin, but for
an environmental sampling the unit is complicated by natural variation (e.g., media
heterogeneity or pollutant characteristics) and sampling tool variation and biases
(4). In laboratory QA the unit may be the contents of every ith vial in the queue of
the analyzing instrument. At an environmental site, the "sampling unit" is ambigu-
ously used to refer to both the sample and the sample support. The sample is much
smaller in volume or mass than the sample support, but if it is representative, it has
approximately the same concentrations of the pollutant or the same values of some
measured characteristic. The sample, simple or composite, is a small critical mass
that is taken from the sample support for measurement. The sample support is the
larger volume or mass of in situ media that is to be represented by the measure of
the sample. The sample support is often the same volume as the remediation unit.
These two units are determined by the goal of the sampling campaign or the reme-
diation option(s), but they must meet the requirements of Gy's theory of sampling
and geostatistics, which are each discussed in subsequent sections of this chapter.
The extractable mass or volume (field sample) cannot be dictated by the size of the
sampling tool or the size of the official container. It should be determined by the
heterogeneity of the media in accord with Gy's theory. Differing amounts of media
of interest, because they are ambiguously called "sample", should be identified by
size and use. The analysis sample (i.e., aliquot or split), used in its entirety by the
chemist for analysis, has a mass less than that of the preparation sample, which has
a mass less than or equal to that of the field sample. Each change of scale or reduc-
tion of sample mass must pass Gy's requirements (see the subsection Analytical
Error). The name of the sample is unimportant, but the change of mass is impor-
tant. Any change in volume (mass) must be checked using a nomogram made up for
the current site. Extraction(s) for the field sample from the in situ sample support
(i.e., sampling unit) must satisfy both Gy's theory requirements and geostatistical
requirements.
Dealing with Correlation in Practice
In theory, the difference between an independent random variable and a random
variable correlated in time or space is clear, but this difference is not so clear in prac-
tice. In practice, most environmental samples are correlated in either time or space,
and possibly in both time and space, yet a random sampling or analysis is done.
Even the analyses of the samples in the queue of a mass spectrometer (MS) are cor-
related somewhat in time, but this correlation is weak enough and the QA samples
are spaced far enough apart that the correlation can be ignored. Correlation in space
or time can be taken into account by slightly more complicated formulas in random
variable statistics; Gilbert (5) gives relevant sediment and groundwater examples of
how correlated sample units require more samples to be taken (larger n) than if the
observations were independent. The critical criterion for using a spatial sampling
and data analysis is the management decision or need to see a contour (isopleth)
map of the pollutant location as well as concentration (these are kriging results) in
place of a list or histogram of chemical analyses with a confidence interval about an
estimate of some mean (random variable output).
Pierre Gy's Sampling Theory
Pierre Gy is a mining engineer and Francis Pitard is a chemist. Both men have had
brilliant careers in process and mining quality control. Pitard has written a two-
volume work (2) that captures and communicates their experiences in the sampling
of heterogeneous media. These volumes are valuable for environmental sampling of
soils or sediments. Pitard organizes the taking of "correct" samples with correct sam-
pling tools, according to seven "errors". The emphasis on correct samples and tools
is analogous to the emphasis from the U.S. Environmental Protection Agency (EPA)
on representative samples. Because of the potential of one, some, or all seven of
these errors to erode the correctness or the representativeness of an environmental
sample, this chapter will refer to them as "variances" to stress their additivity for a
component-of-variance model. "Variance" emphasizes the intrinsic nature of these
errors or biases in heterogeneous material sampling, in contrast to the negative con-
notations of these terms in the vernacular ("error" as a careless mistake; "bias" as an
intentional dishonesty). Variance, error, and bias are technical terms that describe dif-
fering problems with different solutions. An "error variance" is often thought of as
symmetric with a mean (expectation) of zero and as reducible by taking more sam-
ples; a "bias variance" is one-sided (e.g., always too high or too low) and is reducible
not by taking more samples but only through a correct sampling design. The sym-
metry or one-sidedness must be carefully thought out and often field-tested for all
potential variance in any sampling design and QA plan.
This theory sounds like any QA plan talking about errors, but it refers to a dif-
ferent type of error and needs to be discussed in its own part of the QA plan. Specif-
ically, it deals with intrasample error (errors within the sample) rather than inter-
sample error (errors between samples). The various components of variance of this
sampling theory sound trivially obvious when pointed out, but they are easily over-
looked in the stress of formulating a QA or sampling plan. Leaving them out can be
disastrous for QA and data quality objectives. Even though these sources of varia-
tion sometimes are obvious and trivial, they must be taken into account in every
environmental sampling plan.
The Fundamental Error
This component of variance is a natural property of heterogeneous material. It is not
an error in the sense of an avoidable mistake; however, if the sample planner does
not take it into consideration it will generate unnecessary (avoidable) variance in the
laboratory analyses. The variance is caused by the range of particle sizes in the
medium and the fact that often only certain sized particles contain the pollutant of
interest. This situation is illustrated in Figure 1; the shaded or lined particles are
assumed to contain or carry the pollutant, and the other particles are the heteroge-
neous medium. Thus the chemical analysis depends on two values: the number of
solid particles (percentage composition), and their concentration. This dependence
adds another variance term or component of variance (percentage composition) to
the analytic variance. The magnitude of this error is small in a fine or homogeneous
soil or sediment but becomes larger as the medium becomes more heterogeneous in
particle size and particle affinity for the pollutant of interest. This fundamental com-
ponent of variance can be reduced by increasing the mass of the sample or by reduc-
ing the particle size of the sample material by appropriate digestion.
To maintain the original level of accuracy, the sample material must always be
reduced in maximum particle size before being reduced in mass or volume (split or
aliquot). The mass of a sample required for a given relative variance [relative stan-
dard deviation (RSD) squared] can be read from Pitard's nomograms as a function of
various physical properties, the most important one being maximum particle size of
the medium (6).
Figure 1. Heterogeneous material: fundamental error. (Reproduced with permission from reference 2, Vol. 1. Copyright 1989 CRC Press.)
This relationship will be directly applicable to waste monitoring if
the pollutants of interest are heavy metals, but the application to volatile chemicals
or semivolatile chemicals remains to be developed. The EPA has a very readable doc-
ument on this subject that presents an example nomogram for soil properties (7).
The extension of Gy's theory to volatile chemicals and semivolatile chemicals is a
very important but as yet undeveloped part of environmental sampling.
Grouping and Segregation Error
There is potential for this variance in any heterogeneous media. The grouping and
segregation error develops through movement of samples through processing, han-
dling, shipping, or mixing. The heterogeneity may be in density or size (also adhe-
sion, cohesion, magnetism, affinity for moisture, and angle of repose of crystalline
structure) so that the particles come together by groups during any movement or
vibration. Figure 2 illustrates this type of error for the pile at the end of a conveyer
belt. If the black particles contain the pollutant of interest, then a sample from the
right side of the pile will be biased high and a sample from the left side will be
biased low. In taking a sample of a waste stream or pile, the potential variance can be
minimized by sampling along the gradient of grouping and segregation. For soil,
gravel, or sediment being carried on a conveyor belt, the gradient of grouping and
segregation would be across the belt orthogonal to the direction of motion, and thus
a correct sample would be a rectangular (not a trapezoidal) section oriented across
the belt. Sampling a pile, a truck, or a railroad car of waste in a correct manner is
very difficult because of this component of variance.
Figure 2. Grouping and segregation error. (Reproduced with permission from reference 2, Vol. 1. Copyright 1989 CRC Press.)
The correct time to sample is
before the pile is built or the truck or railroad car is loaded. In sample preparation,
Pitard suggests that the pouring of the well-mixed material from the V-blender, espe-
cially if the particulate material is allowed free fall of any distance, can undo (defeat)
the blending (8). Aliquoting increases this error. The general rule is that as aliquot size
decreases, the variance increases. Theoretically, as the size of the aliquot approaches
the size of the grains of the sample, this error grows larger without bounds. The corol-
lary to this theorem is the fact that the chemist, aliquoting to get the relatively small
amount of material (analytical sample) actually required for the analysis, can turn the
analytic equipment into a random number generator if the sample material has not
been ground to the required fineness and aliquoted correctly.
Spatial and Periodic Errors
These error sources could be periodic and/or spatial structures on the scale of the
extracted sample or the sample support (the in situ area or volume represented by
the sample). If they were of a larger scale they would be studied by a time series
analysis or a geostatistical analysis; at this scale, however, they are not of interest,
because the decision statistic is the mean of the unit and not the means of the subunits. In the preceding
discussion of classical statistics, the 55-gallon drum was assigned to a classical sta-
tistical analysis instead of a geostatistical analysis, even though there may have been
a structure in concentration in the vertical dimension of the drum. No one wants a
contour map of the concentration of pollutant inside of a drum because the drum
will be remediated (disposed of) as a unit. However, this gradient cannot be ignored;
instead it must be representatively sampled by sampling each layer proportional to
its volume. This sampling is accomplished by the choice of sampling tools. To min-
imize the microspatial variance, a "composite liquid waste sampler" (COLIWASA)
must be used. The name of the sampling tool tells an important principle. Com-
positing is an important tool in random variable statistics to save chemical analysis
costs, but in spatial statistics it is used to ensure that the sample is representative of
the in situ sample support. Subsample compositing is physically doing the same
thing that statistical averaging does to the numerical values of replicate samples,
except compositing loses the information about the variance or standard deviation,
with the benefit of saving the cost of (n - 1) chemical analyses. These are two quite
different and important uses of compositing.
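The equivalence noted here, that compositing n subsamples behaves like numerically averaging n analyses while discarding the between-subsample variance, can be sketched numerically (our illustration; the subsample mean and standard deviation are arbitrary assumptions).

```python
import numpy as np

rng = np.random.default_rng(1)
n_sub, n_units, sigma = 4, 10_000, 20.0   # 4 subsamples per unit; sd in ppm (assumed)
subsamples = rng.normal(100.0, sigma, size=(n_units, n_sub))

composite = subsamples.mean(axis=1)       # physical compositing ~ numerical averaging
print("sd of composite results:", composite.std())   # ~ sigma / sqrt(n_sub)
print("sigma / sqrt(n):        ", sigma / np.sqrt(n_sub))
# One chemical analysis per composite replaces n_sub analyses, but the
# within-unit variance (sigma) can no longer be estimated from the composites.
```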
Increment Delimitation and Extraction Errors
These two variances arise from the interaction of a sampling tool with the hetero-
geneity of the media sampled. The circles in Figures 3 and 4 can represent the cut-
ting edge of a plugging or coring device descending on the media to take a soil or
sediment plug or core. In Figure 3, taking the shaded area of the larger particles
would be the correct sample, but if the larger particles are hard compared to the
softer interstitial material, the tool will not cut through the harder particles to give
the desired correct sample. Rather, the large hard particles will be pushed out of the
sample if their centers of gravity lie outside the corer, as illustrated by the white
particles in Figure 4.
Figure 3. Increment delimitation error. (Reproduced with permission from reference 2, Vol. 2. Copyright 1989 CRC Press.)
Figure 4. Increment extraction error. (Reproduced with permission from reference 2, Vol. 2. Copyright 1989 CRC Press.)
If their centers of gravity fall within the corer, as illustrated by the
shaded particles in Figure 4, then the particles in their entirety will be included in
the sample. Either case is incorrect, but the two cases tend to average out. It is
important to distinguish these two concepts: (1) the delimitation error is the varia-
tion caused by the inability to cut through all the heterogeneous media and take the
part included in the circle of the coring or plugging device, and (2) the extraction
error is the variation caused by taking or pushing out of the way the whole hard par-
ticle as a function of whether its center of gravity falls in or out of the circle of the
corer or plugger. If the cylinder could be cut out exactly by a laser and then taken
out intact by levitation as in science fiction, these two errors could be avoided.
Today's solution to these problems is to have a corer or plugger that is at least two or
three times the diameter of the largest particle size.
Analytical Error
The EPA and the American Chemical Society have published many excellent papers,
proceedings, and books on this interdisciplinary subject. Therefore, to avoid dupli-
cation, we wish to speak only to the chemist's method of abstracting a much smaller
sample (analytical sample) from the prepared sample. This step, because of the
smallness of the mass of the analytical sample compared to the mass of the sample
from which it comes, is the sample most apt to incur an unacceptable magnitude of
Gy's fundamental error. If the analytical sample is taken by sticking a spatula ran-
domly into the top of the material in the bottle and taking out the desired amount,
such a sample is a grab sample and not an aliquot or split; the chemical analysis is
very apt to give a value that is incorrect for Gy's theory and unrepresentative for reg-
ulatory use.
For an example of the grinding and splitting or aliquoting needed to acquire a
correct and representative analytical sample, the critical path (A→B→C→D→E)
should be traced through Figure 5, a nomogram adapted from references 2 (Vol. 1)
and 7. (Grinding cannot be done, however, for volatile and semivolatile pollutants or
to the media for a leach test.) First the nomogram must be made for the specific site
(e.g., particle sizes and particle characteristics). The horizontal or x-axis is the sam-
ple weight in grams, and the vertical or y-axis is the RSD of the fundamental error;
both axes are in log scale. In the center of the nomogram is a family of linear graphs
that introduces the third variable, maximum particle size. Each particle size has its
own line, and each line represents one and only one particle size.
The two ways to reduce the y-axis intercept, the RSD of the fundamental error,
are: (1) to take a line with smaller particle size from the family of graphs, or (2) to
take a larger weight of sample on the x-axis. First, in the family of linear graphs, the
top line of the family represents the largest particle size, namely 75 mm, and inter-
cepts the largest RSD on the y-axis. The next lower line is 25.4 mm, and so on down
to the line with the lowest RSD, which is for a particle size of 0.2 mm. The 0.2-mm
line is probably representative of QA internal standards, in contrast with Super-
fund's definition of soil as <2 mm and the definition from the Resource Conserva-
tion and Recovery Act (RCRA) of soil as <9 mm. These disparities in sizes might
explain some of the bench chemist's problems with increasing variance or RSD (e.g.,
square root of relative variance) as samples come from QA internal standards, Superfund
samples, and RCRA samples.
Figure 5. Maximum particle size: preparation error. (Adapted from reference 2, Vol. 1, and reference 7.)
Second, each graph has a negative slope, which
shows that as the mass of sample on the horizontal axis for a given particle size
increases, the relative variance intercepted on the y-axis decreases. The horizontal
line labeled 15% RSD represents the target accuracy or maximum acceptable RSD. If
the maximum particle size of the material of interest is measured empirically to be
75 mm, and the pollutant of interest is one that can be pulverized or ground without
loss, such as a heavy metal (Pb), then from the intersection of the horizontal line of
maximum RSD = 15% and the downward sloping line of particle size 75 mm, the
necessary minimum sample weight can be read on the horizontal or x-axis as 100 g.
Thus the field technician or scientist must take a sample or composite subsamples
so that a field sample of 100 g or more is obtained.
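A rough programmatic counterpart to reading the nomogram is the commonly cited simplified form of Gy's fundamental-error relation, s^2 ≈ C d^3 / M, where s is the relative standard deviation, d the maximum particle size (cm), M the sample mass (g), and C a site-specific sampling constant. The sketch below is ours and the constant is purely illustrative; in the full theory C itself varies with particle size (e.g., the liberation factor), so the site-specific nomogram of Figure 5, not this one-term formula, governs the 10-g and 1-g masses quoted in the next paragraph.

```python
def minimum_sample_mass(d_cm: float, target_rsd: float, C: float) -> float:
    """Minimum sample mass (g) from the simplified relation s^2 = C * d^3 / M.

    d_cm       -- maximum particle size in centimeters
    target_rsd -- acceptable relative standard deviation (0.15 for 15%)
    C          -- sampling constant (g/cm^3); site-specific, here illustrative
    """
    return C * d_cm ** 3 / target_rsd ** 2

# Constant chosen only so that 75-mm material needs about the 100 g quoted
# in the text at 15% RSD; a real value must come from the site nomogram.
C_illustrative = 0.0053
print(minimum_sample_mass(7.5, 0.15, C_illustrative))   # ~100 g
```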
If the chemist is going to take an aliquot of 1 g for the analysis (analytical sam-
ple), then the preparation procedure must follow a path such as A→B→C→D→E
in Figure 5. To maintain the accuracy of the 100 g of field sample whose maximum
particle size is 75 mm, the digestion process must first grind and then split. Grind-
ing reduces particle size and splitting reduces the mass of the sample. Grinding is
going down on the nomogram from A to B representing pulverizing from a maxi-
mum particle size of 75 mm to a particle size of 25.4 mm, and aliquoting or splitting
is moving to the left along the 25.4-mm line on the nomogram from B to C, repre-
senting aliquoting or splitting the sample of 100 g to a sample of 10 g. The new crit-
ical mass due to the particle reduction or the location of C on the new smaller parti-
cle line is the last integer weight tick line that intersects the new particle line just
below the 15% relative error line. The amount of information in the 100 g of mate-
rial of maximum particle size 25.4 mm at B appears to have an order of magnitude
(axes in log scale) decrease in RSD. This apparent decrease is not true, because vari-
ance of an extracted sample is not reduced by grinding, and information is not created
by digestion, but it does mean that now we can split the sample mass down to the
new critical mass (10 g) on the current line (25.4-mm line) and still have the origi-
nal RSD of the 100-g sample, namely 15%. A nomogram path has no lower RSD
than its highest point (in this example, point A). Again, more digestion moves the
sample from C on the 25.4-mm line to D on the 6.35-mm line. No information is
created by grinding, but now the information in 100 g of 75-mm particles, namely
15% RSD, can be carried by a new critical mass of only 1 g as splitting or aliquoting
moves us along the 6.35-mm line to E.
The process makes sense if the would-be user remembers that grinding reduces
the critical mass needed to carry the same RSD and that the aliquoting or splitting
removes only the unneeded mass. One might well ask, "Why the broken path?
Wouldn't it be easier to grind all the way in one step and then split?" Yes, it would
be simpler, but it would require the grinding of a larger mass of sample; the stepwise
path minimizes the mass of material digested. In the interest of minimizing grinding
or preparation, Pitard suggests sieving the material so the part less than the new
maximum particle size falls through, and then grinding only the part that did not fall
through, remembering to recombine the two.
This process sounds a little complicated because it is complicated, but with
particle size analyses of the media of interest and with statistically guided prepara-
tion (pulverizing and splitting or aliquoting), a correct and representative analytical
sample can be prepared for the chemical analysis.
Spatial Variable Statistics
The old adage that a chain is as strong as its weakest link implies that the prudent
blacksmith will strengthen the weakest link and try to make all links equally strong.
The application to environmental sampling is that error variances are a chain: the
analytical variance, the sampling and handling variance, and the field variance are
links. The goal of quality improvement is to make the sum of the variances as small
as possible, and the cost-effective way to minimize this sum is to spend more
resources on the variance link that is improved most cheaply. Because of diminish-
ing returns in variance reduction, the optimal variance to reduce is often the biggest
one. The field sampling variance is often the appropriate link or variance to reduce.
Variance reduction is most obviously accomplished by taking more samples, but if
sampling or analytic costs are high, increasing samples may be too expensive. In
many cases, the field sampling variance is economically reduced by going from a
random to a spatial variable sampling design.
The term geostatistics was coined by Matheron (9) to describe the study of
regionalized or spatially correlated variables. In the past 20 years, the geostatistical
literature has grown enormously, and many significant developments in theory and
methodology have been presented. The practice of geostatistics has also spread from
its original applications in the mining industry to such fields as soil science, forestry,
meteorology, and environmental science.
The geostatistical methods described in this chapter, namely semivariograms
and ordinary kriging, represent two of the approaches available to us, and we
selected them primarily to illustrate geostatistical concepts and their implications for
sampling programs. A discussion of the pros and cons of alternate approaches, such
as generalized covariance and universal kriging, is beyond the scope of this chapter.
More extensive treatments of the subject can be found in references 3 and 10.
Random or Spatial Variables
Most field sampling plans are based on random variable statistics and assume that
the sample observations are independent and identically distributed (IID). However,
field samples are usually spatially correlated. Correlation is a statistical measurement
of the intuitive physical fact that samples taken close together are more similar in
value than samples taken farther apart. Neglecting this correlation can make the sta-
tistics, tests, and sampling procedures that assume independence (IID) inappropri-
ate (11, 12); using this correlation makes the statistics, tests, and sampling proce-
dures of spatial statistics more appropriate and powerful. A truly random variable is
completely described by its probability distribution. Samples are used to estimate
this distribution and to estimate statistical descriptors such as mean, median, and
standard deviation. In addition, spatial variables must be described by a measure of
the correlation between each value and the values at nearby locations. Samples can
be used to estimate the spatial correlation function and are frequently used to esti-
mate localized mean values for remediation units or exposure units.
Localized mean estimates are often displayed in the form of isopleths or con-
tour maps. A practical rule for the investigator is that if a contour map is a desired or
even a plausible end product of a proposed study, geostatistical methods should be
considered.
The implications for the design of a sampling program can be significant.
Although random sampling is appropriate for random variables, Olea (13) demon-
strated that the most effective sampling pattern for local estimation of spatial vari-
ables is the regular grid. Yfantis (14) evaluated triangular, square, and hexagonal
grids. Also, geostatistical studies commonly use a multiphase approach, and the first
sampling phase is oriented primarily toward estimating the spatial correlation (15).
Semivariograms for Quantifying Spatial Correlation
One way in which spatial correlation can be measured and displayed is by a semi-
variogram, or graph of the type shown in Figure 6. The dots are the empirical semi-
variogram representing experimental values computed from sample data; the fitted
curve is a theoretical semivariogram or an estimation of a spatial correlation function
assumed to be characteristic of the sampled area.
Figure 6. A typical semivariogram. (Reproduced from reference 16. Copyright 1988 American Chemical Society.)
The horizontal axis, called the lag
axis, is the distance between points in linear units such as meters or kilometers; the
vertical axis, called the gamma axis, is the variance of differences in pollution units
squared, such as parts per million squared. The experimental points are computed
by averaging data grouped into distance class intervals. Variance is a function of lag.
The rising nature of the points and curve follows the principle of sampling that
states the variance or difference between observations increases as the distance
between their locations increases.
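As an illustration of how the experimental points in a graph such as Figure 6 are computed, the following is a minimal sketch of the classical (Matheron) estimator, not the authors' code; the coordinates and concentrations are placeholders.

```python
import numpy as np

def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Classical estimator: gamma(h) = half the mean squared difference of
    sample pairs, with pairs grouped into distance classes of width lag_width."""
    coords, values = np.asarray(coords, float), np.asarray(values, float)
    n = len(values)
    gamma = np.zeros(n_lags)
    counts = np.zeros(n_lags, dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            h = np.linalg.norm(coords[i] - coords[j])
            k = int(h // lag_width)
            if k < n_lags:
                gamma[k] += 0.5 * (values[i] - values[j]) ** 2
                counts[k] += 1
    lags = (np.arange(n_lags) + 0.5) * lag_width
    with np.errstate(invalid="ignore"):
        return lags, np.where(counts > 0, gamma / counts, np.nan)

# Placeholder data: random locations (m) and lognormal concentrations (ppm).
rng = np.random.default_rng(2)
xy = rng.uniform(0, 1000, size=(50, 2))
z = rng.lognormal(3.0, 0.5, size=50)
lags, gam = empirical_semivariogram(xy, z, lag_width=100.0, n_lags=8)
print(np.round(gam, 1))
```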
Sill and Range of Correlation. Figure 6 is typical of many semivariograms of
chemical concentrations in the environment; the rise in variance has an upper
bound known as the sill. When the variance reaches the sill, sample locations are far
enough apart to make the samples independent. The distance on the lag axis at
which the semivariogram's curve reaches the sill is the range of correlation. This dis-
tance is important to the sampling plan, the estimation of pollution over the area
under investigation, and the interpolation error. The range of correlation explains a
practical relationship between spatial variables and random variables: random vari-
ables are field samples that are farther apart than the range of correlation, and spatial
variables are field samples closer together than the range of correlation. This range of
correlation is important for choosing the correct analysis; if a classical random vari-
able statistic is wanted, such as the mean or variance, then one type of sampling
design that would ensure spatial independence of the samples would be any sys-
tematic random design requiring that all samples are at least the range of correlation
apart (17). If a contour map of pollution isopleths or interpolation variance is
wanted, then as the sampling locations get closer together, the local interpolation
error decreases. Depending on the information wanted and the spacing of the sam-
ple locations, either random or spatial variance statistical analysis can be used on
field samples.
Variance Model. In Figure 6, on the vertical axis of the fitted model the variance
has two components, C0 and C1. The C1 component of the variance is the measure
of structural variation and has the characteristic of increasing variance between sam-
ple observations as the distance between sample locations increases. The C0 com-
ponent of the variance combines random variance factors, such as sampling and
analytical error, along with any unmeasured spatial variance that may exist at dis-
tances smaller than the sampling interval; C0 is constant for all lags. The relationship
of C0 to the need for compositing samples and the relationship of C1 to the distance
between sample locations will be discussed in a later section.
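One common way to express a fitted curve such as the one in Figure 6 is a spherical model with nugget C0, structural component C1, and range a. The sketch below is illustrative only and is not necessarily the model the authors used; the parameter values echo the smelter example given later in the chapter.

```python
import numpy as np

def spherical_semivariogram(h, c0, c1, a):
    """gamma(h) for a spherical model: nugget c0, partial sill c1, range a.
    gamma(0) = 0 by convention; the sill c0 + c1 is reached at h = a."""
    h = np.asarray(h, dtype=float)
    g = np.where(
        h <= a,
        c0 + c1 * (1.5 * h / a - 0.5 * (h / a) ** 3),
        c0 + c1,
    )
    return np.where(h == 0.0, 0.0, g)

# Example: 40% nugget, sill of 1.0 (relative variance), range 3200 ft.
print(spherical_semivariogram([0, 800, 1600, 3200, 5000], 0.4, 0.6, 3200.0))
```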
Anisotropy and Directional Semivariograms. The variance structure, as
measured by the semivariogram, is often different in the range of correlation in dif-
ferent directions. This condition is called anisotropy and must be measured by direc-
tional semivariograms. Directional semivariograms are computed experimentally by
grouping sample pairs into directional classes, or windows, as well as into distance
classes. The directional ranges of correlation can change the geometry of the sam-
pling grid and the orientation of the grid. Often, not enough preliminary data are
available to compute directional semivariograms, and thus the sampling design
must work with only an omnidirectional range of correlation. However, an omni-
directional range of correlation and a sampling design from it honor the variance-
covariance structure more than conventional random variable methods that con-
sider only a scalar variance.
Kriging for Surface Estimation
Kriging is a linear-weighted average interpolation technique used in geostatistics to
estimate unknown points or blocks from surrounding sample data. By assuming
that the spatial correlation function inferred from the experimental semivariogram
is representative of the points to be estimated as well as those sampled, the inter-
polation error (kriging error or kriging standard deviation) associated with any esti-
mate that is a linear-weighted average of sample values can be computed. The krig-
ing algorithm computes the set of sample weights that minimize the interpolation
error.
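For readers who want to see the mechanics, here is a minimal ordinary punctual kriging sketch (ours, not the authors') built on a spherical semivariogram like the one sketched earlier; the sample locations, values, and model parameters are made up. It solves the standard kriging system for the weights and returns the estimate and kriging variance.

```python
import numpy as np

def spherical(h, c0=0.1, c1=0.9, a=300.0):
    """Illustrative spherical semivariogram (nugget c0, partial sill c1, range a)."""
    h = np.asarray(h, float)
    g = np.where(h <= a, c0 + c1 * (1.5 * h / a - 0.5 * (h / a) ** 3), c0 + c1)
    return np.where(h == 0.0, 0.0, g)

def ordinary_kriging(coords, values, x0, gamma):
    """Ordinary punctual kriging: solve for the weights that minimize the
    interpolation error, then return (estimate, kriging variance)."""
    coords = np.asarray(coords, float)
    values = np.asarray(values, float)
    n = len(values)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(d)
    A[n, n] = 0.0                                   # Lagrange multiplier row/column
    b = np.ones(n + 1)
    b[:n] = gamma(np.linalg.norm(coords - np.asarray(x0, float), axis=1))
    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]
    return float(w @ values), float(w @ b[:n] + mu)

# Made-up samples (ft, ppm) purely to exercise the algebra.
xy = [(0, 0), (100, 0), (0, 100), (120, 130)]
z = [50.0, 80.0, 60.0, 90.0]
print(ordinary_kriging(xy, z, (60, 60), spherical))
```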
Kriging software usually offers both punctual and block output options. Punc-
tual kriging treats the input values as located at points and output estimates as values
at points. Block kriging estimates the output for an area or volume (called a block) by
averaging multiple points estimated over that area or volume. This difference is
determined in the sampling and becomes important in the data analysis (see the
subsection Sample Support and Estimation Blocks).
Kriging has a number of characteristics of a desirable estimation method: sam-
ple weights can be adjusted for anisotropy; samples in correlated clusters can be
down-weighted; the degree of smoothing increases as the random component (C0)
of the semivariogram model increases; and, when the semivariogram model is com-
pletely random (C1 = 0), the kriging estimator becomes the sample mean, as in
independent random sample statistics.
Spatial Outliers
Spatial outliers can be found by examining a geographical plot of the data; they may
fit into a random variable histogram of all the data very well. In other words, a spa-
tial outlier is a sample value that does not agree in magnitude with the values of its
neighboring samples, especially the samples within a range of correlation. For exam-
ple, a high (polluted) value in a low (background) neighborhood might be a spatial
outlier but not a random variable outlier because the high value agrees with other
polluted values. Once these outliers are identified, their location descriptions
should be looked up in the sampling diary. If they are obviously from different
sources that do not have the same correlation structure, they should be excluded
from the semivariogram evaluation. The question of whether to include a spatial
outlier in the final local estimate of concentration must be answered on a case-by-
case basis. This matter involves the investigator's judgment, just as in the case of
random variables.
The following discussion exemplifies an analysis of spatial outliers. Investigat-
ing the data from a city-wide sampling campaign for Pb, exploratory data analysis
showed an empirical semivariogram with a range of correlation of at least 6 miles
and two hot spots that were one order of magnitude higher in concentration than
the rest of the data. The data set was printed out on a geographical plot that showed
the two hot spots to be in sharp contradiction to their individual local neighbor-
hoods, that is, every neighboring point and every point one neighbor out was at
least one order of magnitude lower in concentration. The geographical map that
identified the freeway system and the data showed that both points seemed very
close to the freeways. In checking the sample log book this conclusion was con-
firmed; one of the aberrant samples was taken under a freeway overpass and the
other at a freeway on-ramp. Freeway Pb is said to have a range of about 500 feet.
Thus, because the two points represented a different source of Pb and had a much
shorter range, they were excluded from the semivariogram computations. However,
what was to be done with them in the kriging and mapping? If they were included
in the kriging, they would spread their high values over circular areas of 6 miles in
radius. This representation would be grossly untrue because the outliers are known
to have a different source and a shorter range of correlation. The mapping would
show a large area needing remediation that, in fact, did not need remediation. Nev-
ertheless, the values had been found, and users of the map (risk assessors) needed
to know of the hot spots. The compromise was to krige and contour the Pb concen-
trations of the other samples onto a kriging map and then just print the magnitude
of such outliers at their respective locations on the map.
Spatial Soil Sampling
The growing number and complexity of toxic chemicals and hazardous waste sites
call for a new statistical technique for monitoring with more efficient sampling
designs and more precise data analysis. Geostatistics is a promising tool for these
needs. This section traces the logic sequence of geostatistical analysis and then
draws together the implications of geostatistical sampling design for soil pollution
monitoring. Geostatistical sampling design has at least two phases: (1) the survey or
the preliminary sampling to find the extent of the plume and to estimate a semivar-
iogram, and (2) the census to take as many samples as needed to estimate the sur-
face within the desired accuracy as calculated from the semivariogram model.
Sample Support and Estimation Blocks
The basic assumption of geostatistical sampling is to define and assign area or vol-
ume to all inputs and outputs. In monitoring for environmental protection, the spa-
tial quantities to be defined and assigned are the sampling unit (area or volume), the
remediation unit, and the exposure unit. Geostatisticians call the sampling unit the
sample support. The sampling unit or support is ambiguous: it is used to refer to
both the amount of medium extracted for the sample and the in situ area or volume
represented by the sample. The context usually identifies whether the extracted sup-
port or the in situ support is meant. The remediation unit is determined by the
method of remediation, and the exposure unit is determined by the risk assessor.
For example, an appropriate remediation block might be a volume 250 ft long, 16 ft
wide, and 0.5 ft thick, because this amount was the minimum volume to move eco-
nomically. The shape is dictated by the up-and-back pass of a bulldozer with an 8-ft
blade that scrapes up one truckload of contaminated soil. Sample unit, remediation
unit, and exposure unit need to be defined (18) and then incorporated by a geostat-
istician into the sampling plan.
The critical mass of a correct sample should be calculated as previously
explained (see the subsection Analytical Error). The spatial variance of the sampling
unit should be measured by taking "too many" equally spaced samples in several
units in an exploratory sampling trip to the site. If the sampling unit has a large spa-
tial variance (a large spatial variance can be encountered in a small area), then the
field sampling design will have to use composite samples. In spatial compositing the
geometry as well as the mass of the subsamples (samples to be composited) is
important. The general rule is that subsamples should be equally spaced on the
sampling unit. For example, if four subsamples can be afforded, then one should be
taken from each quarter of the in situ support. Each subsample for the compositing
should be a correct sample (see the subsection Analytical Error). All samples for all
analyses, even eye-balling, should have the same representativeness, which for com-
posite samples means the same number of subsamples. The composite field sample,
just like any other average, has its variance divided by the number of subsamples.
Homoscedasticity (equality of variances) is a requirement for every data analysis, even
when eye-balling the data. If the quantity to be estimated (e.g., remediation or expo-
sure unit) equals the sampling unit, punctual kriging analysis may be used because
there is no change of scale or support. If the desired area or volume of estimation is
larger than the sampling unit, block kriging will have to be used.
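As a small illustration of the "equally spaced subsamples" rule above (our sketch; the square support and the choice of four subsamples are assumptions), the quarter-center layout for a square sampling unit can be generated as follows.

```python
def quarter_centers(x_min, y_min, side):
    """Centers of the four quarters of a square sampling unit: one equally
    spaced subsample per quarter, for a four-subsample composite."""
    q = side / 4.0
    return [
        (x_min + q, y_min + q),
        (x_min + 3 * q, y_min + q),
        (x_min + q, y_min + 3 * q),
        (x_min + 3 * q, y_min + 3 * q),
    ]

# Example: a 20 m x 20 m residential-yard support with origin at (0, 0).
print(quarter_centers(0.0, 0.0, 20.0))
```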
Survey or Semivariogram Sampling
In a multiphase sampling program using spatial statistics, the primary goal in the
initial exploratory sampling is the collection of enough data to compute an empiri-
cal semivariogram and to determine the extent of the plume. These goals may con-
flict if limited resources are available. Widely spaced samples are needed to define
extent, and closely spaced samples are often needed for semivariogram analysis.
Approaches to this problem include regular grids (i.e., radial, square, or rectangu-
lar), transects, and combinations.
Burgess et al. (19) suggested transect sampling for variogram input, and this
idea led to very good variograms in agricultural applications. However, in pollution
monitoring, transects alone have given very noisy variograms. This result is probably
due to intrinsic noise in pollution data, which is often highly skewed and contains
high coefficients of variation. A combination exploratory grid, consisting of a grid of
square sampling units having extended transects in the directions of the major axis
and minor axis of the estimated plume (20), is illustrated in Figure 7. Prior informa-
tion may be used to select the best grid orientation. For example, if the plume to be
investigated was made by aerial deposition from an identifiable source, then wind
roses can be examined for wind direction and magnitude, and topographic maps
can be examined for natural barriers. Only the relatively regular grid concept is
important in Figure 7; the orientation is site-specific.
If the extent of the plume must be found, and funds are limited, then the tran-
sect samples should be variably spaced closer together at the grid center and farther
apart at the grid extremes. The purpose of this sampling is to capture the correlation
structure of the plume. Inhabited areas have a high occurrence of disturbed sam-
pling sites and local pollution from secondary sources, which are only stochastic
noise to the semivariogram's calculation. Therefore, this noise should be avoided by
this sampling. For example, aerially deposited smelter Pb should not be mixed with
auto Pb by taking samples along the freeways. The samples from the semivariogram
sampling can be pooled with the secondary mapping samples if they have the same
support.
However, the semivariogram sampling often is the sampling that tests for the
need for more compositing. If the support is changed between the samplings and
we wish to pool the samples for analysis, then the change in support must be cor-
rected before pooling. The sampling team must be aware of the need to keep all
samples on the same support. When compositing, the same number and mass of
subsamples and the same spacing or geometry must be maintained.
Figure 7. An exploratory grid design. (Reproduced from reference 16. Copyright 1988 American Chemical Society.)
When the sam-
pled locations must move from the regular grid to avoid cultural improvements or
natural barriers, then the spatial analysis program is corrected for this movement by
the true coordinates of the new sample locations; however, no easy method is avail-
able for the program to correct for change of support. If the microvariation could be
sampled and the support established before the semivariogram sampling, then a
complex statistical problem could be avoided in the pooling of samples for the spa-
tial analysis.
Some samples should be taken close together (in the scale of the sampling
unit) to determine the need for composite samples. This sampling can be combined
with field duplicates for quality analysis and control. Gy's fundamental error and
compositing become more important as coring volumes decrease. These microvaria-
tion samples should be taken at a distance of a few multiples of the core's diameter
apart. The distance between sample locations or grid unit's length needs to be esti-
mated from the sample unit of interest (e.g., residential yard, city block, or square
mile section) and the desired output unit (e.g., remediation unit, that is, the mini-
mum volume of surface soil to be removed). The optimum exploratory sampling
distance is a proper fraction of these measurements, but it is often determined by
money available for sampling.
Census or Sampling for Map Making
In spatial statistics, the goal of secondary sampling is to uniformly cover the area in
question with a density of samples sufficient to contour the plume with an accept-
able error of interpolation. This sample coverage is accomplished by using the direc-
tional semivariograms to determine the orientation, shape, and size of the grid cell.
Independent random variable statistics, in which the number of samples is com-
puted, differs from spatial statistics, in which orientation, shape, and size of the grid
are calculated and the number of samples is determined from the number of grid
cells needed to cover the area.
If the directional semivariograms have a marked difference in their respective
ranges of correlation, then the optimum cell geometry is not a square but a rectan-
gle with the longer side in the direction of the longer range of correlation, and the
ratio of the sides should be the ratio of the ranges of correlation. Thus the grid cell
sides are of equal correlation or kriging (interpolation) variances rather than equal
distance. This characteristic will save a lot of samples while retaining the same accu-
racy in both directions.
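To make the rectangle rule concrete, this sketch (ours; the ranges, spacing fraction, and site dimensions are placeholders) sizes a rectangular cell so that its sides are in the ratio of the directional ranges of correlation and counts the resulting grid nodes.

```python
import math

def rectangular_grid(range_major_ft, range_minor_ft, fraction, site_ft):
    """Grid cell proportional to the directional ranges of correlation.

    fraction -- chosen fraction of each range used as grid spacing
    site_ft  -- (length along major axis, length along minor axis)
    Returns ((dx, dy), approximate number of grid nodes)."""
    dx = fraction * range_major_ft      # spacing along the long-range direction
    dy = fraction * range_minor_ft      # spacing along the short-range direction
    nx = math.floor(site_ft[0] / dx) + 1
    ny = math.floor(site_ft[1] / dy) + 1
    return (dx, dy), nx * ny

# Placeholder example: ranges of 3200 ft and 1600 ft, spacing at 1/3 of each
# range, over a 2-mile by 1-mile area.
print(rectangular_grid(3200.0, 1600.0, 1.0 / 3.0, (10560.0, 5280.0)))
```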
Boundary. For secondary sampling, the extent of the sampling grid must first be
chosen. The sampling grid must extend beyond the suspected plume or area in
question. The area in question must be bounded by sampling locations to avoid
extrapolation in the kriging estimation algorithm for contouring. Extrapolation,
which is estimating a value from data on only one side of the location of the point to
be estimated, is likely to lead to unrealistically high or low values. If an action level
has been set and a part of the plume has been adequately proven to be above or
below the action level, then that part of the plume need not be resampled. The sam-
pling may be guided more by population areas or critical receptors than by the
actual plume. The goals of the sampling must be written, and the areas of interest,
action levels, and action areas (sampling unit, remediation unit, and exposure unit)
must be defined before the optimum grid design can be made.
Compositing Samples Reduces Nugget. The next step in secondary sampling
is choosing the sample support (21). If a residential yard is the sampling unit, then
the ideal sampling process would be to take the entire yard, blend it to homogene-
ity, and remove the appropriate number of aliquots or splits to meet the volume
needed by the laboratory for analysis. However, because few residents would donate
their whole yard to science, and laboratory mixing equipment such as V-blenders or
ball mills cannot homogenize so large a volume, this sampling unit must be repre-
sented by a few symmetrically laid out subsamples composited together. The num-
ber of subsamples is a compromise between the size of the microvariance and the
amount of time and money allowed for the digestion of the subsamples. The sub-
samples are laid out symmetrically because a structural or spatial correlation may
exist.
The mixing of the subsamples to achieve homogeneity is essential for com-
positing. If the medium is water, then the task is relatively easy; for soils or sedi-
ments, the task is difficult. Aliquots or splits should be taken after the mixing to
make the final sample more representative. If a large nugget (e.g., C0 > 0.3 relative
variance) persists after Gy's critical mass calculations and compositing within the
support, then the relative sizes of the field sampling and the laboratory analysis
errors must be identified. The analysis of some pollutants has an analytical error that
overwhelms the field sampling error and accounts for approximately all the semivar-
iogram nugget.
The minimum volume at each step and especially the aliquot used by the
chemical analyst in the lab must exceed the critical mass referred to in Gy's theory
(see the section Pierre Gy's Sampling Theory).
Grid Unit Length or Distance Between Sample Locations. The range of
correlation, the nugget (C0), and the sampling budget determine the grid unit
length, or the distance between sample locations. This length determination was dis-
cussed in mathematical detail by Yfantis et al. (14). Figure 8 shows the graphs of
interpolation variance as a function of the ratio of grid spacing to range of correlation
for a family of semivariograms. The model variograms each have relative C1 and C0
so that their sum equals 100%. The variograms differ only in the fraction of the sill
(C0 + C1) represented by the nugget component (C0) and the structure (C1). If the
semivariogram has a big nugget, like the top graph with C1 = 10% and C0 = 90%, then
diminishing returns (the curve has less rapid vertical drop and becomes more hori-
zontal) start and increase if the sample distance is less than two-thirds of the range
of correlation. For a very low nugget, such as the lowest graph (C1 = 100% and C0 = 0%),
diminishing returns do not start and increase until the sampling distance is less than
one-half of the range of correlation.
Figure 8. Diminishing information for additional samples. (Reproduced from reference 16. Copyright 1988 American Chemical Society.) [Interpolation variance versus the ratio of distance between samples to range of correlation, for semivariogram models ranging from C1 = 10, C0 = 90 to C1 = 100, C0 = 0.]
The general rule is that for smaller
nuggets (C0), the distance between sampling points on the sampling grid gets
smaller. The grid should be laid out with no vertices unsampled. If this design
exceeds budget, then the whole grid size should be adjusted, not just certain ver-
tices left unsampled as in systematic random sampling.
Some real-world examples can clarify how the magnitude of the nugget (C0)
and the range of correlation determine the optimum cell size or distance between
samples. One Pb smelter had a nugget of about 40% and a range of correlation of
3200 ft. In Figure 8, the family of diminishing return curves and the graph (for C0
= 40%) indicates by observation and judgment that the point of diminishing
returns is between one-third (33.3%) and one-fourth (25%) of the range of correla-
tion, or 29% for the sake of argument. The sampling distance should not be less
than 29% x 3200 ft, or 928 ft. Expressed as a function of money, the sampling dis-
tance should be the shortest affordable distance in keeping with the toxicity of the
pollutant, but not less than 928 ft between samples. In contrast, a second Pb
smelter had a semivariogram with a nugget of zero (0) and a range of correlation of
2400 ft. In Figure 8, the curve of diminishing returns for C0 = 0 indicates by obser-
vation and judgment that the point of diminishing returns is between one-fourth
(25%) and one-fifth (20%) of the range of correlation, or 22.5% for the sake of argu-
ment. In this case, the sampling distance should not be less than 22.5% x 2400 ft,
or 540 ft. Expressed as a function of money, the sampling distance should be the
shortest affordable distance in keeping with the toxicity of the pollutant, but not less
than 540 ft. If the funding is adequate and die pollutants are of extreme toxicity,
-------
800 PRINCIPLES OF ENVIRONMENTAL SAMPLING
then the distance indicated by the point of diminishing returns should be used in
minimum interpolation variance. If there is less money and the pollutant is less
toxic, then a longer distance should be used for the grid cell's side. The directional
semivariograrns should orient the sampling grid.
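The two smelter recommendations reduce to one multiplication; a minimal sketch, using the judgment fractions quoted above:

```python
# The smelter examples as arithmetic (the diminishing-returns fractions are the
# judgment values read from Figure 8 in the text above).
def minimum_spacing(range_ft, diminishing_returns_fraction):
    return diminishing_returns_fraction * range_ft

print(minimum_spacing(3200, 0.29))    # first Pb smelter, C0 = 40%: 928 ft
print(minimum_spacing(2400, 0.225))   # second Pb smelter, C0 = 0%: 540 ft
```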
The Pb smelters mentioned previously worked well with an east-west and north-south grid because the plume was formed by 80 years of aerial deposition. A third set of data, dioxin along a highway, gave a readable semivariogram in a direction 13 degrees from east-west. This discovery took much searching because we started with the default directions (0, 45, 90, and 135 degrees) of the semivariogram software; these default semivariograms showed no structure [pure nugget semivariograms (C0 = 100%)]. After we discovered the semivariogram at 13 degrees, the reason became obvious: the road that was the transport path of the pollutant ran at that angle, and so should any sampling grid.
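A directional experimental semivariogram of the kind used in that search can be sketched as follows; this is a generic implementation under assumptions of my own (lag classes, angular tolerance), not the software used in the study.

```python
# Sketch of an experimental directional semivariogram; sweep the azimuth
# (e.g., 0, 45, 90, 135 degrees, then finer steps) to find the direction with structure.
import numpy as np

def directional_semivariogram(xy, z, azimuth_deg, lag, n_lags, angle_tol_deg=22.5):
    """xy: (n, 2) array of coordinates; z: (n,) array of concentrations."""
    az = np.deg2rad(azimuth_deg)
    u = np.array([np.cos(az), np.sin(az)])        # unit vector of the search direction
    gam = np.zeros(n_lags)
    npairs = np.zeros(n_lags)
    n = len(z)
    for i in range(n):
        for j in range(i + 1, n):
            d = xy[j] - xy[i]
            h = np.hypot(d[0], d[1])
            if h == 0.0:
                continue
            # keep only pairs whose separation vector lies near the chosen azimuth
            ang = np.degrees(np.arccos(np.clip(abs(d @ u) / h, 0.0, 1.0)))
            if ang > angle_tol_deg:
                continue
            k = int(h // lag)
            if k < n_lags:
                gam[k] += 0.5 * (z[i] - z[j]) ** 2
                npairs[k] += 1
    semivariance = np.divide(gam, npairs, out=np.full(n_lags, np.nan), where=npairs > 0)
    return semivariance, npairs

# Hypothetical call for the dioxin example:
# gam, counts = directional_semivariogram(xy, z, azimuth_deg=13, lag=100.0, n_lags=10)
```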
In the field, some vertices cannot be sampled because of man-made improvements or natural barriers; these vertices must be sampled as closely as possible, and the actual coordinates should be used in the spatial analysis program.
Grid Orientation and Shape Versus Anisotropy. If the ranges of correlation are extremely different on the directional semivariograms, then the correlation structure is anisotropic. Optimum sampling patterns reflect this anisotropy. For example, the sides of a rectangular grid would be in the same ratio as the ranges of correlation for the corresponding directional semivariograms. This ratio was explained in detail by David (22), and a sampling design for logarithmic anisotropy was derived by Barnes (23). Anisotropy is a frequent occurrence, but often the semivariogram sampling gathers too few samples to measure it. Thus, more samples may be used cost-effectively in the semivariogram sampling in order to save samples in the larger census (or mapping) sampling by identifying and taking advantage of anisotropy.
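One way to read the "same ratio" rule in practice, under assumptions of my own (one sample per rectangular cell and a fixed sample budget over the site area), is the small calculation below; the numbers are hypothetical.

```python
# Illustrative calculation: rectangular cell sides proportional to the directional
# ranges, sized so the sample budget covers the site area (one sample per cell).
import math

def anisotropic_cell(range_major_ft, range_minor_ft, site_area_ft2, n_samples):
    cell_area = site_area_ft2 / n_samples          # one sample per cell
    ratio = range_major_ft / range_minor_ft        # anisotropy ratio
    side_minor = math.sqrt(cell_area / ratio)
    side_major = ratio * side_minor
    return side_major, side_minor

# Hypothetical site: 2:1 anisotropy, 4,000,000 ft^2, budget of 100 samples.
print(anisotropic_cell(2000, 1000, 4_000_000, 100))
```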
Use of the triangular grid as opposed to the rectangular grid has been discussed (13, 14). If the nugget is large (C0 >> C1), little is gained by the triangular grid. Also, the triangular grid makes taking advantage of anisotropy more difficult. If a triangular grid is chosen, a theodolite, which is a surveying instrument, is not needed in the field; instead, every other row of samples must be offset by one-half of a grid length. In practice, this action is easier than it sounds and almost as easy as the traditional square grid.
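Laying out such an offset grid is only a loop; the sketch below follows the half-grid-length offset described above (a strictly equilateral triangular grid would additionally shrink the row spacing by a factor of about 0.87, which the text does not require).

```python
# Sketch of an offset ("triangular") grid: every other row is shifted by half a
# grid length, which in practice needs no special surveying instrument.
def offset_grid(n_rows, n_cols, spacing):
    points = []
    for r in range(n_rows):
        shift = spacing / 2 if r % 2 else 0.0       # offset alternate rows
        for c in range(n_cols):
            points.append((c * spacing + shift, r * spacing))
    return points

print(offset_grid(3, 4, 100.0)[:6])
```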
Beyond Anisotropy
Numerous additional geostatistical considerations affect environmental sampling.
These considerations include spatial drift or trend, multivariate analysis, mixed or
overlapping populations, concentration-dependent variances, and specification of
confidence limits. Geostatistical techniques have been developed over the years to
deal with these various problems, but an adequate discussion is beyond the scope of this chapter.
-------
Acknowledgments
The EPA, through its Office of Research and Development, funded and performed
the research described here.
References
1. Gilbert, R. O. Statistical Methods for Environmental Pollution Monitoring; Van Nostrand Reinhold: New York, 1987.
2. Pitard, F. F. Pierre Gy's Sampling Theory and Sampling Practice; CRC Press: Boca Raton, FL, 1989; Vols. 1 and 2.
3. Isaaks, E. H.; Srivastava, R. M. An Introduction to Applied Geostatistics; Oxford University Press: New York, 1990; pp 1-592.
4. Pitard, F. F. In Pierre Gy's Sampling Theory and Sampling Practice; CRC Press: Boca Raton, FL, 1989; Vol. 2, p 36.
5. Gilbert, R. O. In Statistical Methods for Environmental Pollution Monitoring; Van Nostrand Reinhold: New York, 1987; pp 35-42.
6. Pitard, F. F. In Pierre Gy's Sampling Theory and Sampling Practice; CRC Press: Boca Raton, FL, 1989; Vol. 1, pp 169-183.
7. Preparation of Soil Sampling Protocols: Sampling Techniques and Strategies; Center for Environmental Research Information: Cincinnati, OH, 1992; pp A1-A16; EPA/600/R-92/128.
8. Pitard, F. F. In Pierre Gy's Sampling Theory and Sampling Practice; CRC Press: Boca Raton, FL, 1989; Vol. 1, p 190.
9. Matheron, G. Econ. Geol. 1963, 58, 1246-1266.
10. Journel, A. G. Geostatistics for the Environmental Sciences; Stanford University: Stanford, CA, 1986.
11. Palmer, M. W. Vegetatio (Dordrecht, Netherlands) 1988, 75, 91-102.
12. Cliff, A. D.; Ord, J. K. Spatial Processes: Models and Applications; Pion: London, 1981; pp 1-266.
13. Olea, R. A. Math. Geol. 1984, 16(4), 369-392.
14. Yfantis, E. A.; Flatman, G. T.; Behar, J. V. Math. Geol. 1987, 19(3), 183-205.
15. Flatman, G. T.; Yfantis, E. A. Environ. Monit. Assess. 1984, 4, 335-349.
16. Flatman, G. T.; Englund, E. J.; Yfantis, A. A. In Principles of Environmental Sampling; Keith, L. H., Ed.; ACS Professional Reference Book; American Chemical Society: Washington, DC, 1988; pp 73-84.
17. Borgman, L. E.; Quimby, W. E. In Principles of Environmental Sampling; Keith, L. H., Ed.; ACS Professional Reference Book; American Chemical Society: Washington, DC, 1988; pp 25-43.
18. Neptune, D.; Brandy, E. E.; Messner, M. J.; Michael, D. I. Hazard. Mater. Control 1990, May/June, 19-25.
19. Burgess, T. M.; Webster, R.; McBratney, A. B. J. Soil Sci. 1981, 32, 643-659.
20. Starks, T. H.; Brown, K. W.; Fisher, N. J. In Quality Control in Remedial Site Investigation; American Society for Testing and Materials: Philadelphia, PA, 1986; Vol. 5, pp 57-66; ASTM STP 925.
21. Starks, T. H. Math. Geol. 1986, 18(6), 529-537.
22. David, M. Geostatistical Ore Reserve Estimation; Elsevier Scientific: Amsterdam, 1977.
23. Barnes, M. G. Statistical Design and Analysis in the Cleanup of Environmental Radionuclide and Other Spatial Phenomena; TRAN STAT (Statistics for Environmental Studies) No. 13; Battelle Memorial Institute: Richland, WA, 1980; pp 1-21.
Reprinted from ACS Professional Book
Principles of Environmental Sampling
Lawrence H. Keith, Editor
Published 1996 by the American Chemical Society
-------
Welcome to the 12th Annual EPA Conference on Statistics
After a year's hiatus, it is a pleasure to welcome you to the 1997 EPA Conference on Statistics. What a difference a year can make! Last year, we postponed the conference and held a series of "local" training sessions. While these were a great success, they were local and accessible only to those folks in the Washington or Raleigh/Durham areas. I heard from many people that while we may have successfully circumvented the travel issue, the opportunity was not available to everyone.
So here we are all together again in Richmond for a conference that will include all the elements that we have heard you want. There will be formal sessions, plenary sessions, workshops, training, round tables, rectangular tables, panel discussions, poster sessions, and outstanding speakers. The theme this year is "Future Directions for Statistics at EPA". And what could be a better time to discuss future directions than right after the Administrator's announcement of the creation of the new EPA Center for Environmental Information and Statistics. In fact, we have changed the schedule around in a last-minute upheaval to arrange for Agency officials responsible for the CEIS to be here to present their vision and respond to your comments and suggestions.
I also want to encourage you to avail yourself of the informal opportunities here to discuss common questions and concerns with fellow statisticians. It's no secret that some of the best information is garnered in the hallways, over dinner, or while waiting for the elevator. We are always anxious to make it even better. I want to thank the planning and arrangements committees for their efforts to organize this conference and Margaret Conomos for her assistance in coordinating transportation. It always looks deceptively easy, and we owe it to the hard work of these people that it is easy for the rest of us. Special thanks are in order for Marcia Gardner of SRA Technologies, Inc., who handled all the details so well.
Barry D. Nussbaum
Conference Chairman
Conference Planning Committee
John Fox
Henry Kahn
Elizabeth Margosches
John Warren
EPA 230-R-97-003
Arrangements Committee
Joan Bundy
Pat Wilkinson
-------
AGENDA
-------
TWELFTH ANNUAL EPA CONFERENCE ON ENVIRONMENTAL STATISTICS
Richmond, VA - April 1-3, 1997
AGENDA
Tuesday, April 1, 1997
REGISTRATION - Conference Area Foyer 9:30 - 10:30 am
OPENING SESSION - Grand Ballroom Section B 10:30 am - 12:00 pm
Welcome & Introductions - Barry Nussbaum, Chair, Conference Planning Committee
Keynote Speaker: N. Phillip Ross, Director, Center for Environmental Statistics
"Center for Environmental Information and Statistics (CEIS)"
Followed by an Interactive Discussion with the CEIS Development Staff
Lunch Break 12:00 - 1:15 pm
TRAINING SESSION I-A - Georgian/Elizabethan Rooms 1:15 - 4:45 pm
EnvironmentalStats for S-PLUS: Software for Environmental Statistics
Presenters: Steven Millard, PSI, and Nagaraj Neerchal, University of Maryland Baltimore Campus
SESSION I - Raleigh/Drake Rooms 1:15 - 2:45 pm
Cancer Statistics, Epidemiology and Genetics. Chair: Ruth Allen, NCI
"Atlas of Cancer Mortality in the United States, 1970-92" Presenter: Susan Devesa, NCI
"Evaluating Disease Cluster Alarms" Presenter: Martin Kulldorff, NCI
Break - Conference Area Foyer 2:45 - 3:00 pm
TRAINING SESSION I-A - (continued) 3:00 - 4:45 pm
SESSION II - Raleigh/Drake Rooms 3:00 - 4:45 pm
Representativeness in Statistics and Quality Assurance. Chair: John Warren, ORD
Presenters: John Warren, ORD, and Malcolm J. Bertoni, RTI
ROUNDTABLE DISCUSSIONS 4:45 - 6:00 pm
GROUP A - Georgian/Elizabethan Rooms
Statistics & Health. Facilitators: Ruth Allen, NCI, and Elizabeth Margosches, OPPTS
GROUP B - Raleigh/Drake Rooms
Quality Assurance. Facilitator: John Warren, ORD
GROUP C - Grand Ballroom, Section B
Statistical Research. Facilitators: Barry Nussbaum, OPPE, and Larry Cox, ORD
GROUP D - Milliard Room
Risk & Uncertainty. Facilitator: Barnes Johnson, OSWER
-------
Wednesday, April 2, 1997
SESSION III - Georgian/Elizabethan Rooms 8:45 - 10:15 am
How Severe Is It? Chair: Elizabeth Margosches, OPPTS
"Toxic Severity for a Useful and Understandable Benchmark Dose" Presenter: Linda Teuschler, ORD
"Severity Analysis Using Ridits" Presenter: Mary Marion, OPPTS
SESSION IV - Raleigh/Drake Rooms 8:45 - 10:15 am
Exposure Assessment. Chair: John Fox, OW
"Interpreting Data from a National Survey of Protozoan Pathogens in Drinking-Water Sources"
Presenter: John Fox, OW
"Relationships Between Dioxins in Soil, Air, Ash, and Emissions from a Municipal Solid Waste Incinerator Emitting Large Amounts of Dioxins"
Presenter: Matthew Lorber, ORD
"Statistical Modeling of Dioxin Concentration Data from Sediment Cores"
Presenter: Paul Pinsky, ORD
Break - Conference Area Foyer 10:15 - 10:30 am
PANEL DISCUSSION - Grand Ballroom, Section B 10:30 am - 12:00 pm
EPA Cooperative Agreements. Chair: Barry Nussbaum, OPPE
Participants: Larry Cox, ORD; Peter Guttorp, University of Washington; G. P. Patil, Pennsylvania State University
TRAINING SESSION I-B - Georgian/Elizabethan Rooms 1:15 - 4:45 pm
EnvironmentalStats for S-PLUS: Software for Environmental Statistics
Presenters: Steven Millard, PSI, and Nagaraj Neerchal, University of Maryland Baltimore Campus
TRAINING SESSION 2 - Raleigh/Drake Rooms 1:15 - 3:30 pm
Spatial Statistics Sampling. Chair: George Flatman, EPA, Las Vegas
"Spatial Sample Design" Presenter: Evan Englund, EPA, Las Vegas
"Skewed Frequency Distributions" Presenter: George Flatman, EPA, Las Vegas
Break - Conference Area Foyer 2:15 - 2:30 pm
TRAINING SESSION I-B (continued) 2:30 - 4:45 pm
TRAINING SESSION 2 - (continued) 2:30 - 3:30 pm
SESSION V - Raleigh/Drake Rooms 3:30 - 4:45 pm
Applications of Statistical Calibration Techniques in Analyzing Environmental Data
Chair: Bimal Sinha, University of Maryland Baltimore Campus (UMBC)
"Confidence Regions and Tests in a Calibration Problem"
Presenter: Thomas Mathew, UMBC
MINI SESSION A - Milliard Room 4:45 - 5:30 pm
Water Quality & Fishy Statistics. Chair/Presenter: Henry Kahn, OW
"Recent Developments in the Estimation of US Fish Consumption"
MINI SESSION B - Georgian/Elizabethan Rooms 4:45 - 5:45 pm
Statistics and the Internet. Chair/Presenter: Chapman Gleason, OPPE
"Using the Web and Other Networking Technologies in Supporting SAS for the Enterprise"
-------
AGENDA - page 3
Wednesday, April 2, 1997 (continued)
RECEPTION & POSTER PRESENTATIONS - Capitol Room 5:30 - 6:45 pm
Pesticide Residue Monitoring Data
Presenter: Edward Brandt, EAB, OPP, OPPTS
A Master Sampling Frame for the Collection of Non-Agricultural Pesticide Usage Data
Presenter: Alan R. Goozner, EAB, OPP, OPPTS
The National Air Quality and Emissions Trends Report, 1995
Presenter: David Mintz, OAR
Thursday, April 3, 1997
SESSION VI - Grand Ballroom, Section B 8:45 - 10:15 am
Statistics of Measurement in Analytical Chemistry
Chair: Henry Kahn, OW
"A Two Component Model for Error in Analytical Chemistry and Issues of Detection and Quantification"
Presenter: David M. Rocke, Director, Center for Statistics in Science and Technology, University of California, Davis
"Estimation of Precision of Low Concentration Chemical Analytical Measurements and Establishment of Detection and Quantification Limits"
Presenters: Henry Kahn, OW, Kathleen Stralka and Raphael Kuznetsovski, SAIC
Break - Conference Area Foyer 10:15 - 10:45 am
CLOSING SESSION - Grand Ballroom, Section B 10:45 am - 12:30 pm
Featured Speaker: Daniel B. Carr, School of Information Technology and Engineering, George Mason University
"Statistical Graphics for Environmental Applications: Developments and Challenges"
Bus to EPA Headquarters leaves at 1:30 pm
-------
ATTENDEE LIST
-------
Twelfth Annual EPA Conference
on Environmental Statistics
List of Attendees
Ruth Allen
National Cancer Institute
Division of Cancer Epidemiology
and Genetics
6130 Executive Boulevard, MSC 7395
EPN Room 535
Bethesda, MD 20852-7395
(301)496-1609
Fax: (301)402-4279
Allenr@epndcc.nci.nih.gov
Robin Anderson
OAR/ORIA
U.S. EPA (6603J)
401 M Street, SW
Washington, DC 20460
(202) 233-9385
Fax: (202)233-9650
Anderson.Robin@epamail.epa.gov
David Annett
Support Contrator for NCI (SEER Program)
IMS, Inc.
12501 Prosperity Drive, Suite 200
Silver Spring, MD 20904
(301)680-9770
Fax: (301)680-8304
David_Annett@nih.gov
Lara Autry
OAR/OAQPS/EMAD
U.S. EPA (MD-19)
Research Triangle Park, NC 27711
(919)541-5544
Fax: (919)541-1039
Autry.Lara@epamail.epa.gov
Jeff Beaubur, Ph.D.
OPPTS/HERD
U.S. EPA (7403)
401 M Street, SW
Washington, DC 20460
(202) 260-2263
Fax: (202)260-1279
Malcolm Bertoni
Research Triangle Institute
401 M Street, NW, Suite 740
Washington, DC 20460
(202) 728-2067
Fax: (202)728-2095
mjb@rti.org
Ed Brandt
OPPTS/OPP
U.S. EPA (7503W)
Office of Pesticides
401 M Street, SW
Washington, DC 20460
(703) 308-8050
Fax: (703)308-8151
Brandt.Edward@epamail.epa.gov
Lori Brunsman
OPPTS/OPP/HED
U.S. EPA (7509C)
401 M Street, SW
Washington, DC 20460
(703) 308-2902
Fax: (703)305-5147
Brunsman.Lori@epamail.epa.gov
Judy Calem
OW/OGWDW
U.S. EPA (4607)
401 M Street, SW
Washington, DC 20460
(202) 260-8638
Fax: (202)260-3762
Calem.Judy@epamail.epa.gov
Daniel Carr
George Mason University
School of Information Technology
and Engineering
Fairfax, VA 22030-4444
(703)993-1671
Fax: (703)993-1521
-------
Steven Chang
OSWER/OERR
U.S. EPA (5204G)
401 M Street, SW
Washington, DC 20460
(703)603-9017
Fax: (703)603-9104
Chang.Steven@epamail.epa.gov
Darlene Cockfield
OPPE/OSPED/EID
U.S. EPA (2163)
401 M Street, SW
Washington, DC 20460
Fax: (202)260-4903
Margaret Conomos
OPPE
U.S. EPA (2164)
401 M Street, SW
Washington, DC 20460
(202) 260-3958
Fax: (202)260-4968
Conomos.Margaret@epamail.epa.gov
Lawrence Cox
ORD/NERL
U.S. EPA (MD-75)
Research Triangle Park, NC 27711
(919)541-2648
Fax: (919)541-7588
Cox.Larry@epamail.epa.gov
John Creason
ORD/NHEERL
U.S. EPA Room 215 ERC (MD-55)
Research Triangle Park, NC 27711
(919)541-2598
Fax: (919)541-5394
Creason.John@epamail.epa.gov
David Crosby
American University
Department of Mathematics
and Statistics
4400 Massachusetts Avenue, NW
Washington, DC 20016
(202)885-3135
Fax: (202)885-3155
Crosby@nzms.wwb.noaa.gov
Thomas Curran
OAR/OAQPS
U.S. EPA (MD-12)
Research Triangle Park, NC 27711
(919)541-5694
Fax: (919)541-4028
Curran.Thomas@epamail.epa.gov
Susan Devesa
National Cancer Institute
EPN, Room 415
Bethesda, MD 20892
(301)496-8104
Fax: (301)402-0081
Devesas@epndce.nci.nih.gov
Donald Doerfler
ORD/ERC
U.S. EPA (MD-55)
Research Triangle Park, NC 27711
(919)541-7741
Doerfler.Donald@epamail.epa.gov
Evan Englund
ORD/NERL-CRD (CAP)
U.S. EPA
P.O. Box 93478
Las Vegas, NV 89193-3478
(702) 798-2248
Fax: (702)798-2107
Englund.Evan@epamail.epa.gov
George Flatman
ORD/NERL-CRD
U.S. EPA
P.O. Box 93478
Las Vegas, NV 89193-3478
(702) 798-2528
Fax: (702)798-2208
Flatman.George@epamail.epa.gov
John Fox
OW
U.S. EPA (MC-4303)
401 M Street, SW
Washington, DC 20460
(202) 260-9889
Fax: (202)260-7185
Fox.John@epamail.epa.gov
-------
Mary Frankenberry
OPPTS/OPP/EFED
U.S. EPA (7507C)
401 M Street, SW
Washington, DC 20460
(703) 305-5694
Fax: (703)305-6309
Frankenberry.Mary@epamail.epa.gov
Chapman Gleason
OPPE
U.S. EPA (2163)
401 M Street, SW
Washington, DC 20460
Gleason.Chapman@epamail.epa.gov
Alan Goozner
OPPTS/OPP/BEAD
U.S. EPA (7503W)
401 M Street, SW
Washington, DC 20460
(703)308-8147
Fax: (703)308-8151
Goozner.Alan@epamail.epa.gov
Peter Guttorp
University of Washington
National Research Center for Statistics
and the Environment
Box 351720
Seattle, WA 98195-1720
(206)616-9262
Fax: (206)616-9443
Peter@stat.washington.edu
Karen Hogan
OPPTS/OPPT
U.S. EPA (7403)
401 M Street, SW
Washington, DC 20460
(202) 260-3895
Fax: (202)260-1279
Hogan.Karen@epamail.epa.gov
David Holland
ORD/NHEERL
U.S. EPA (MD-56)
ERC Annex
Research Triangle Park, NC 27711
(919)541-3126
Fax: (919)541-1486
Holland.David@epamail.epa.gov
William F. Hunt, Jr.
OAR/OAQPS/EMAD
U.S. EPA (MD-14)
Research Triangle Park, NC 27709
(919)541-5536
Fax: (919)541-2357
Hunt.Bill@epamail.epa.gov
Helen Jacobs
OW
U.S. EPA (4303)
401 M Street, SW
Washington, DC 20460
(202)260-5412
Fax: (202)260-7185
Jacobs.Helen@epamail.epa.gov
Barnes Johnson
OSWER/OSW
U.S. EPA (5307W)
401 M Street, SW
Washington, DC 20460
(703) 308-8855
Fax: (703)308-0511
Johnson.Barnes@epamail.epa.gov
Henry Kahn
OW/EAD
U.S. EPA (MC-4303)
401 M Street, SW
Washington, DC 20460
(202) 260-5408
Fax: (202)260-7185
Kahn.Henry@epamail.epa.gov
-------
Douglas Kendall
U.S. EPA Region VIII
NEIC/OECA, Building 53, Box 25227
Denver Federal Center
Denver, CO 80225
(303)236-5132x281
Fax: (303)236-5116
Kendall.Douglas@epamail.epa.gov
Mel Kollander
Temple University
Institute for Survey Research
2300 M Street, NW, Suite 800
Washington, DC 20037
(202) 973-2820
Fax: (202)973-2821
Melk@gwis2.circ.gwu.edu
Martin Kulldorff
National Cancer Institute
Biometry Branch, DCPC
EPN 344, 6130 Executive Boulevard
Bethesda, MD 20892
(301)496-7519
Fax: (301)402-0816
MartinK@helix.nih.gov
Raphael Kuznetsovski
SAIC/Reston Facility Directory
11251 Roger Bacon Drive
Reston, VA 20190
(703)318-4553
Fax: (703)709-1040
Rkuznetsovski@lan813.ehsg.saic.com
James R. Lee
American University
School of International Service
Washington, DC 20016
(202)885-1691
Fax:(202)885-2494
Jlee@American.edu
Matthew Lorber
ORD/NCEA
U.S. EPA (8623)
401 M Street, SW
Washington, DC 20460
(202) 260-3924
Fax: (202) 260-6370
Lorber.Matthew@epamail.epa.gov
Arthur Lubin
U.S. EPA Region V
Office of Strategic Environmental Analysis
77 West Jackson Boulevard
Chicago, IL 60604-3507
(312)886-6226
Fax: (312)353-0374
Elizabeth Margosches
OPPTS/OPPT
U.S. EPA (7403)
401 M Street, SW
Washington, DC 20460
(202)260-1511
Fax: (202)260-1279
Margosches.Elizabeth@epamail.epa.gov
Mary Marion
OPPTS/OPP/HED
U.S. EPA
401 M Street, SW
Washington, DC 20460
(703) 308-2854
Marion.Mary@epamail.epa.gov
Thomas Mathew
University of Maryland
Department of Mathematics and Statistics
1000 Hilltop Circle
Baltimore, MD 21250
(410)455-2418
Fax: (410)455-1066
Mathew@umbc2.umbc.edu
Steven P. Millard
Probability, Statistics and Information (PSI)
7723 44th Avenue, NE
Seattle, WA 98115-5117
(206) 528-4877
Fax: (206)528-4802
Smillard@nwlink.com
David Mintz
OAR/OAQPS
U.S. EPA (MD-14)
Research Triangle Park, NC 27711
(919)541-5224
Fax: (919)541-1903
Mintz.David@epamail.epa.gov
-------
Nagaraj Neerchal
University of Maryland Baltimore Campus
Department of Mathematics and Statistics
1000 Hilltop Circle
Baltimore, MD 21250
(410)455-2637
Fax: (410)455-1066
Nagaraj@math.umbc.edu
Barry Nussbaum
OPPE/CES
U.S. EPA (2163)
401 M Street, SW
Washington, DC 20460
(202)260-1493
Fax: (202)460-4968
Nussbaum.Barry@epamail.epa.gov
Brenda Odom
ORD/QAD
U.S. EPA (8724)
401 M Street, SW
Washington, DC 20460
(202)260-8194
Fax: (202)401-7002
Odom.Brenda@epamail.epa.gov
G. Patil
The Pennsylvania State University
Center for Statistical Ecology and
Environmental Statistics
421 Thomas Building
University Park, PA 16802
(814)865-9442
Fax: (814)863-7114
Gpp@stat.psu.edu
Hugh Pettigrew
OPPTS/OPP
U.S. EPA (MC-7509C)
401 M Street, SW
Washington, DC 20460
(703)305-5699
Fax: (703)305-5147
Pettigrew.Hugh@epamail.epa.gov
Andrea Pfahles-Hutchens
OPPTS/OPPT
U.S. EPA (7403)
401 M Street, SW
Washington, DC 20460
(202) 260-0288
Fax: (202)260-1279
Paul Pinsky
ORD/NCEA
U.S. EPA (8623)
401 M Street, SW
Washington, DC 20460
(202)260-1079
Fax: (202)260-3803
Pinsky.Paul@epamail.epa.gov
Esperanza Renard
ORD/NCERQA/QAD
U.S. EPA (MS-104)
2890 Woodbridge Avenue
Edison, NJ 08837
(908)321-4355
Fax: (908)321-6640
Renard.Esperanza@epamail.epa.gov
David Rocke
University of California, Davis
Graduate School of Management
Davis, CA 95616
(916)752-7368
Fax: (916)752-2924
dmrocke@ucdavis.edu
Randall Romig, Ph.D.
U.S. EPA Region VI
(6MD-HX)
1445 Ross Avenue
Dallas, TX 75202-2733
(214)665-8346
Fax: (214)665-8072
Romig.Randall@epamail.epa.gov
N. Phillip Ross
OPPE/OSPED/CES
U.S. EPA Room 3101 M(2163)
401 M Street, SW
Washington, DC 20460
(202) 260-5244
Fax: (202)260-8550
Ross.Nphillip@epamail.epa.gov
-------
Robert Runyon
U.S. EPA, Region II
2890 Woodbridge Avenue
Edison, NJ 08837
(908)321-6645
Fax: (908)906-6824
Runyon.Robert@epamail.epa.gov
Judy Schmid
ORD/NHEERL
U.S. EPA (MD-55)
ERG
Research Triangle Park, NC 27711
(919)541-0486
Fax: (919)541-5394
Schmid.Judy@epamail.epa.gov
Mark Schmidt
OAR/OAQPS/EMAD/AQTAG
U.S. EPA (MD-14)
AQTAG, EMAD
Research Triangle Park, NC 27711
(919)541-2416
Schmidt.Mark@epamail.epa.gov
R. Woodrow Setzer
ORD/NHEERL
U.S. EPA (MD-55)
Research Triangle Park, NC 27711
(919)541-0128
Fax: (919)541-5394
Setzer.Woodrow@epamail.epa.gov
Ronald Shafer
OPPE
U.S. EPA (2163)
401 M Street, SW
Washington, DC 20460
(202) 260-6766
Fax: (202)260-4968
Shafer.Ronald@epamail.epa.gov
Bimal Sinha
OPPE/CES
U.S. EPA (2163)
401 M Street, SW
Washington, DC 20460
(202) 260-2680
Maria Smith
OW
U.S. EPA (4303)
401 M Street, SW
Washington, DC 20460
(202) 260-8639
Fax: (202)260-7185
Smith.Marla@epamail.epa.gov
William P. Smith
OPPE/CES
U.S. EPA Room 3201(2163)
401 M Street, SW
Washington, DC 20460
(202) 260-2697
Fax: (202)260-4968
Smith.Will@epamail.epa.gov
Kathleen Stralka
SAIC
11251 Roger Bacon Drive
Reston, VA 20190
(703)318-4583
Kathleen.A.Stralka@cpmx.saic.com
Linda Teuschler
ORD/NCEA-CIN
U.S. EPA (MS-190)
26 W. Martin Luther King Drive
Cincinnati, OH 45268
(513)569-7573
Fax: (513)569-7916
Teuschler.Linda@epamai 1 .epa.gov
John Warren
ORD/NCERQA/QAD (8724)
U.S. EPA (8724)
401 M Street, SW
Washington, DC 20460
(202) 260-9464
Fax: (202)401-7992
Warren.John@epamail.epa.gov
Charles White
OW/OST/EAD
U.S. EPA (MC-4303)
401 M Street, SW
Washington, DC 20460
(202)260-5411
Fax: (202)260-7185
White.Chuck@epamail.epa.gov
-------
Conference Support Staff
Patricia Crocker
SRA Technologies, Inc.
8110 Gatehouse Road, Suite 600W
Falls Church, VA 22042
(703) 205-8500
Fax: (703)205-6260
Marcia Gardner
SRA Technologies, Inc.
8110 Gatehouse Road, Suite 600W
Falls Church, VA 22042
(703) 205-8500
Fax: (703)205-6260
Marcia.Gardner@sratech.com
Maryce Jacobs
SRA Technologies, Inc.
8110 Gatehouse Road, Suite 600W
Falls Church, VA 22042
(703) 205-8500
Fax: (703)205-6260
Hale Vandemer
SRA Technologies, Inc.
8110 Gatehouse Road, Suite 600W
Falls Church, VA 22042
(703)205-8500
Fax: (703)205-6260
-------
ABSTRACTS
-------
Index of Presentations Listed Alphabetically by Presenter(s)
Presenter(s) Page No.
Ed Brandt: Pesticide Residue Monitoring Data 17
Daniel B. Carr: Statistical Graphics for Environmental Applications: Developments
and Challenges 22
Susan Devesa: Atlas of Cancer Mortality in the United States, 1970-92 2
Evan Englund: Spatial Sample Design 11
George Flatman: Skewed Frequency Distributions 12
John F. Fox: Interpreting Data from a National Survey of Protozoan Pathogens in
Drinking-water Sources 7
Chapman Gleason: Using the Web and Other Networking Technologies in Supporting
SAS for the Enterprise 15
Alan R. Goozner: A Master Sampling Frame for the Collection of Non-Agricultural
Pesticide Usage Data 18
Henry Kahn: Recent Developments in the Estimation of U.S. Fish Consumption 14
Henry Kahn: Estimation of Precision of Low Concentration Chemical Analytical
Measurements and Establishment of Detection and Quantification Limits 20
Martin Kulldorff: Evaluating Disease Cluster Alarms 3
Matthew Lorber and Paul Pinsky: Relationships Between Dioxins in Soil, Air, Ash, and
Emissions from a Municipal Solid Waste Incinerator Emitting Large Amounts of Dioxins . 8
Mary Marion: Severity Analysis Using Ridits 6
Thomas Mathew: Confidence Regions and Tests in a Calibration Problem 13
Steven Millard and Nagaraj Neerchal: EnvironmentalStats for S-PLUS: Software for
Environmental Statistics 1
David Mintz: The National Air Quality and Emissions Trends Report, 1995 19
Barry Nussbaum: EPA Cooperative Agreements 10
Paul Pinsky: Statistical Modeling of Dioxin Concentration Data from Sediment Cores 9
David M. Rocke: A Two Component Model for Error in Analytical Chemistry and Issues
of Detection and Quantification 21
Linda Teuschler: Toxic Severity for a Useful and Understandable Benchmark Dose 5
John Warren: Representativeness in Statistics and Quality Assurance 4
Note: Complete abstracts for each conference presentation appear on the pages that follow.
These include the name and type of session, and the date and time of presentation (in the
upper right hand corner of the page), as well as the title of the presentation, the name and
affiliation of each author, and the name and affiliation of each presenter.
-------
TRAINING SESSION 1-A & B: EnvironmentalStats for S-PLUS:
Software for Environmental Statistics
(1-A) Tuesday, April 1, and (1-B) Wednesday, April 2, 1:15 - 4:45 pm
Title: EnvironmentalStats for S-PLUS: Software for Environmental Statistics
Author: Steven P. Millard, Ph.D., Probability Statistics & Information (PSI)
Presenters: Steven Millard, PSI, and Nagaraj Neerchal, Department of Mathematics and
Statistics, University of Maryland Baltimore Campus
Abstract
S-PLUS is a premier statistics and graphics software package that is rapidly being adopted by
practitioners in fields ranging from pharmaceuticals to finance. ENVIRONMENTAL STATS for S-
PLUS is a new S-PLUS module designed specifically for environmental statistics. Developed over the
past three years, it covers all the major statistical methods found in the environmental monitoring
literature and includes an extensively detailed hypertext help system to guide you through the
background and application of each method. This training course will cover basic ideas in sampling
design and statistical methods for environmental monitoring and risk assessment, including methods of
random sampling, probability distributions, hypothesis tests and confidence intervals, prediction and
tolerance intervals, and methods for dealing with Type I left-censored ("below-detection-limit") data.
Concepts will be illustrated with data sets taken from current regulatory guidance documents.
-------
SESSION I - Cancer Statistics, Epidemiology and Genetics
Tuesday, April 1, 1:15 - 2:45 pm
Title: Atlas of Cancer Mortality in the United States, 1970-92
Authors: Susan S. Devesa, Ph.D., Dan J. Grauman, M.A., William J. Blot, Ph.D.*, Robert
N. Hoover, M.D., and Joseph F. Fraumeni, Jr., M.D., Epidemiology and
Biostatistics Program, Division of Cancer Epidemiology and Genetics, National
Cancer Institute, Bethesda, MD 20892
""Currently with the International Epidemiology Institute, Ltd., Rockville, MD 20850
Presenter: Susan Devesa, NCI
Abstract
The study of geographic variation in cancer rates may provide clues to the role of environmental or
lifestyle factors that may affect cancer risk. The maps themselves cannot provide information about the
causes of cancer or its clustering, but they can raise hypotheses about potential causative influences.
Earlier atlases showed substantial geographic variations in cancer mortality rates among whites and
nonwhites in the United States and stimulated subsequent studies which identified relevant exposures
and risk factors. For some cancers, mortality rates have not changed greatly over time, whereas substantial increases or decreases have been observed for other cancers. This atlas updates the maps
through 1992, presenting for the first time, data specifically for blacks. During the 23-year study period
1970-92, more than 8.5 million whites and 1.0 million blacks died due to cancer. The national annual
age-adjusted mortality rate per 100,000 person-years for all cancers combined ranged from 135 among
white females to 292 among black males. A total of 40 cancers (including all forms combined) were
considered. Some examples of maps from the new atlas will be presented. The patterns of cancer in the
United States, some of which have changed over time, may provide additional leads for the evaluation of
the determinants of cancer among American men and women.
-------
SESSION I: Cancer Statistics, Epidemiology and Genetics
Tuesday, April 1, 1:15 - 2:45 pm
Title: Evaluating Disease Cluster Alarms
Author/Presenter: Martin Kulldorff, Epidemiology and Biostatistics Program, National
Cancer Institute
Abstract
During the last few decades, there have been a considerable number of geographical disease cluster
alarms in different parts of the United States. Many are given considerable media attention, and, for
natural reasons, there is a considerable amount of worry in the local communities affected. As regards
the cause of the clusters, the environment is often a prime suspect.
Before moving into a full-scale epidemiological and environmental investigation, though, it makes sense
to find out whether the observed number of cases actually represents a statistically significant excess or
not. We cannot simply compare the disease rate inside and outside of the cluster area, since we then
have a problem of pre-selection bias. In this talk we will review and illustrate a couple of mutually
complementary methods that can be used to work around that bias, one of which is the spatial scan
statistic. A number of applications will be given.
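The core of the spatial scan statistic mentioned above is a likelihood ratio computed for each candidate zone. The sketch below shows the commonly cited Poisson form for a single zone (the maximization over many candidate circles and the Monte Carlo significance testing that the method requires are omitted), with invented case counts.

```python
# Hedged sketch of the Poisson likelihood-ratio core of the spatial scan statistic
# for one candidate cluster zone; significance would come from maximizing over
# zones and comparing against Monte Carlo replications.
import math

def poisson_scan_llr(cases_in, expected_in, cases_total, expected_total):
    """Log likelihood ratio for one candidate zone under the Poisson model."""
    c, e, C, E = cases_in, expected_in, cases_total, expected_total
    if c == 0 or e <= 0 or c / e <= (C - c) / (E - e):
        return 0.0                                  # no excess inside the zone
    inside = c * math.log(c / e)
    outside = 0.0 if C == c else (C - c) * math.log((C - c) / (E - e))
    return inside + outside

# Invented alarm: 30 observed cases where 18 were expected, out of 500 cases
# and 500 expected in the whole study region.
print(round(poisson_scan_llr(30, 18, 500, 500), 2))
```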
-------
SESSION II - Representativeness in Statistics and Quality Assurance
Tuesday, April 1, 3:00 -4:45 pm
Title: Representativeness in Statistics and Quality Assurance
Author: John Warren, Quality Assurance Division, Office of Research and Development
(ORD)
Presenters: John Warren, ORD, and Malcolm J. Bertoni, Research Triangle Institute (RTI)
Abstract
The concept of "representativeness" is quite clear to a statistician, especially in the context of survey
sampling with respect to a well-defined frame. The concept is considerably less clear when the context
is environmental sampling because the homogeneity of sampled media and physical environment from
which the sample is drawn must be considered.
The session will explore the differing concepts of "representativeness" as used (and possibly abused) by
the environmental community, include a discussion of Gy's theory of sampling as a possible solution,
and finally engage the attendees in a free and frank discussion of further aspects of the concept.
-------
SESSION III - How Severe Is It?
Wednesday, April 2, 8:45 - 10:15 am
Title: Toxic Severity for a Useful and Understandable Benchmark Dose
Authors: Linda Teuschler and Richard C. Hertzberg, Ecological Exposure Research
Division, Office of Research and Development, Cincinnati, OH
Presenter: Linda Teuschler, ORD
Abstract
Regression on ordered categories of toxic severity is recommended in order to address two criticisms of
EPA's risk assessment procedures for noncancer effects. The first criticism is that presenting risk only
as probability does not consider the impact of the event. Second, the goal of the benchmark dose is
vaguely defined, in part because it focuses on one effect from one study. By including all reported
effects into the regression procedure and tracking the toxic severity, one ends up with a benchmark dose
that closely follows the definition of the Reference Dose. In addition, by keeping distinct the effects of
different severity, categorical regression allows for a definition of a benchmark dose that satisfies both a
low specified risk of minor effects and an even lower specified risk of major effects.
-------
SESSION III - How Severe Is It?
Wednesday, April 2, 8:45 - 10:15 am
Title: Severity Analysis Using Ridits
Author/Presenter: Mary Marion, Health Effects Division, Office of Prevention, Pesticides,
and Toxic Substances
Abstract
The United States Environmental Protection Agency, Office of Prevention, Pesticides, and Toxic
Substances, Office of Pesticide Programs has been given the task of reviewing chemical registrant data
and analyses, some of which use the statistical technique of ridits. The technique of ridit analysis used in severity analysis was studied for its feasibility for use at the Agency.
The two toxicological data sets chosen were from one study evaluating the severity of glomerulonephropathy in male rat kidneys over dose increments of the chemical being reviewed and another evaluating mononuclear cell leukemia, also in male rats.
The mathematical theory behind this technique will be presented. This is a continuation of a paper
presented in 1995 at the poster session of the SUGI 21 Conference held in Chicago, Illinois.
-------
SESSION IV - Exposure Assessment
Wednesday, April 2, 8:45 - 10:15 am
Title: Interpreting Data from a National Survey of Protozoan Pathogens in
Drinking-water Sources
Author/Presenter: John F. Fox, Engineering and Analysis Division, Office of Water
Abstract
In 1997-98, EPA and participating water treatment systems will conduct a nationwide sampling program
to assess protozoa (Giardia and Cryptosporidium) in drinking-water sources (untreated, raw water) and,
at a smaller number of systems, in the treated drinking water. Several hundred participating treatment
plants will each submit one sample per month for 12-18 months. The chief objective of the protozoan
sampling program is to characterize the nationwide distribution of protozoan concentrations in source
water, with the treatment plant as the unit of sampling, in particular the distribution of plant mean,
median, and 90th percentile concentrations. A related problem is to characterize the variability and
distribution over time of concentrations at one plant. This presentation will discuss opportunities and
challenges in developing appropriate point and interval estimates from these data to achieve national-
level characterizations of protozoan concentrations in raw and treated water. About one year remains
before interim analysis of data. We welcome suggestions regarding data analysis and interpretation!
-------
SESSION IV - Exposure Assessment
Wednesday, April 2, 8:45 -10:15 am
Title: Relationships Between Dioxins in Soil, Air, Ash, and Emissions from a Municipal
Solid Waste Incinerator Emitting Large Amounts of Dioxins
Author: Matthew Lorber, National Center for Environmental Assessment (NCEA), Office
of Research and Development (ORD)
Presenters: Matthew Lorber and Paul Pinsky, NCEA, ORD
Abstract
Environmental measurements including air concentrations and soil concentrations of dioxins were taken
in the vicinity of a municipal solid waste incinerator emitting large amounts of dioxins. Also available
were two separate stack tests measuring concentrations and amounts of dioxins being emitted, and
concentrations in combuster ash. An "incinerator signature," defined as the profile of the 17 toxic dioxin
and furan congeners where each is described in proportion to total dioxins, was found in the ash and in
subsets of the other two matrices. The profiles in all media were also examined using principal
component analysis to determine what features best distinguished the profiles in each medium. This study
also investigated the relationship of dioxin soil concentration as a function of distance from the
incinerator, and determined an urban background soil concentration, further from the incinerator, as
compared to elevated soil concentrations near the incinerator. A background urban air concentration was
determined and compared to measurements of elevated air concentrations, which also had the signature
profile.
-------
SESSION IV - Exposure Assessment
Wednesday, April 2, 8:45 - 10:15 am
Title: Statistical Modeling of Dioxin Concentration Data from Sediment Cores
Authors: Paul F. Pinsky and David Cleverly, National Center for Environmental
Assessment, Office of Research and Development (ORD)
Presenter: Paul Pinsky, ORD
Abstract
Evidence from several sources suggests that emissions of dioxins into the environment began to stabilize
in the 60's or 70's and have been declining since the 70's or 80's. One of the most important of these
sources is the historical record from sediment cores in U.S. lakes. Recently, a joint EPA and DOE study
measured levels of dioxins and coplanar PCB's in the sediment core of 11 U.S. lakes. Samples from
different sediment layers were dated, effectively transforming the data from each lake from a spatial
series to a time series. The resulting data base consists of a large number of time series (11 lakes times
30 concentrations of related chemicals) with each time series being relatively short (5 to 11 time points).
In this session, we will describe a modeling strategy for these data and interpret the modeling results
with the aim of summarizing overall trends as well as identifying any trends specific to certain lakes or
chemicals.
-------
PANEL DISCUSSION
Wednesday, April 2, 10:30 am - Noon
Title: EPA Cooperative Agreements
Author/Chair: Barry Nussbaum, Office of Policy, Planning and Evaluation
Participants: Larry Cox, Office of Research and Development, Peter Guttorp, University of
Washington, and G.P. Patil, Penn State University
Abstract
This panel discussion will feature investigators from two of the major cooperative agreements on
environmental statistics. The panel will discuss the use of cooperative agreements such as these to
encourage statistical research on theoretical and applied environmental topics. There will be general
comments by EPA on how to get tasks funded and work initiated. Then professors from two of the
universities with such agreements will discuss their side of the equation: how they operate under the
agreement and what they do. Included will be the vision for future work and applications. The panel
will also have time for a hopefully lively question and answer period.
10
-------
TRAINING SESSION 2 - Spatial Statistics Sampling
Wednesday, April 2, 1:15 - 3:30 pm
Title: Spatial Sample Design
Author/Presenter: Evan Englund, National Exposure Research Laboratory, Office of
Research and Development, Las Vegas
Abstract
Spatial samples, in addition to having number, referred to by classical statisticians as sample size, also
have sample support or sample volume or mass. QUAMS, thanks to Dean Neptune, represents this
concept by sample unit, remediation unit, and exposure unit. The support, since it cannot be analyzed
chemically in total, must be represented by a composite sample in which the subsamples survey the in
situ sample unit. The definitions and methods of obtaining spatial representativeness will be presented
verbally (many "real world" examples and few equations). The relationships of support size and change
of support to spatial variance and regularization of semivariograms for correct variography will be
explained. The methodological "rules of thumb" for spatial sample design will be enumerated, clarified,
and organized.
11
-------
TRAINING SESSION 2 - Spatial Statistics Sampling
Wednesday, April 2, 1:15 - 3:30 pm
Title: Skewed Frequency Distributions
Author/Presenter: George Flatman, National Exposure Research Laboratory, Office of
Research and Development, Las Vegas
Abstract
The frequency distribution of both random variables and spatial variables has the ubiquitous problem of
skewness for data interpreters and decision makers. Presenting the mean of a skewed distribution is
disinformation to all data interpreters or managers (RPM or OSC) if they assume normality. The
appropriate model for skewed frequency distributions may be a mixture (plume mixed with background)
model rather than one lognormal model. When does a simplifying model become an oversimplification?
The mixture model does a better job at explaining most waste sites. Methods of separation, such as QQ-
plots and robust methods, will be discussed. The various methods of evaluating a lognormal mean will
be evaluated and illustrated by real world data and by virtual (simulated) data. The number of questions
will exceed the number of answers.
12
-------
SESSION V - Applications of Statistical Calibration
Techniques in Analyzing Environmental Data
Wednesday, April 2,3:30 - 4:45 pm
Title: Confidence Regions and Tests in a Calibration Problem
Author/Presenter: Thomas Mathew, Department of Mathematics and Statistics, University of
Maryland Baltimore County
Abstract
Consider a univariate normally distributed response variable related to a univariate explanatory variable
through the usual linear regression model. Suppose independent observations are available on the
response variable corresponding to known values of the explanatory variable. Now consider another
observation on the response variable, corresponding to an unknown value of the explanatory variable.
The problem of calibration or inverse regression deals with statistical inference on this unknown
parameter. The data on the response variable, corresponding to known values of the explanatory variable
is referred to as calibration data. We will address the problem of constructing confidence regions and
hypotheses tests for the unknown value of the explanatory variable. Two types of problems will be
studied: the calibration data is used to construct confidence regions and to test for a single unknown
value of the explanatory variable, or for a sequence of unknown values of the explanatory variable. The
computational aspects and the practical implementation of our procedures will be illustrated in detail by
applying them to some chemical and environmental data.
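One standard way to construct such a region, shown here only as a textbook-style sketch and not necessarily the speaker's procedure, is to invert the prediction t-test over a grid of candidate values of the unknown explanatory variable; the calibration data below are invented.

```python
# Sketch of a calibration (inverse regression) confidence region obtained by
# inverting the prediction t-test over a grid of candidate x values.
import numpy as np
from scipy import stats

def calibration_region(x, y, y0, alpha=0.05, grid=None):
    n = len(x)
    xbar = x.mean()
    Sxx = ((x - xbar) ** 2).sum()
    b = ((x - xbar) * (y - y.mean())).sum() / Sxx      # slope
    a = y.mean() - b * xbar                            # intercept
    s2 = ((y - a - b * x) ** 2).sum() / (n - 2)        # residual variance
    tcrit = stats.t.ppf(1 - alpha / 2, n - 2)
    if grid is None:
        grid = np.linspace(x.min() - 1, x.max() + 1, 2001)
    se = np.sqrt(s2 * (1 + 1 / n + (grid - xbar) ** 2 / Sxx))
    t = (y0 - (a + b * grid)) / se
    accepted = grid[np.abs(t) <= tcrit]
    if accepted.size == 0:
        return (y0 - a) / b, float("nan"), float("nan")
    return (y0 - a) / b, accepted.min(), accepted.max()

# Invented calibration data (known standards) and one new response y0.
x = np.array([0.0, 1.0, 2.0, 4.0, 8.0])
y = np.array([0.1, 1.1, 2.0, 4.2, 7.9])
print(calibration_region(x, y, y0=3.0))
```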
13
-------
MINI SESSION A - Water Quality and Fishy Statistics
Wednesday, April 2,4:45 - 5:30 pm
Title: Recent Developments in the Estimation of U.S. Fish Consumption
Authors: Henry D. Kahn and Helen Jacobs, Environmental Analysis Division, Office of Water,
Kathleen Stralka, Science Applications International Corporation
Presenter: Henry Kahn, OW
Abstract
Estimates of U.S. per capita fish consumption play a key role in a number of Environmental Protection
Agency program decisions. In particular, exposure estimates used in determining water quality criteria
and related standards are based, in part, on estimates of the amount of fish consumed and contamination
levels in the fish. This presentation will report on estimates of fish consumption based on recent work
with the USDA's combined 1989, 1990, and 1991 Continuing Survey of Food Intake by Individuals
(CSFII). These estimates reflect adjustments based on USDA's Recipe file which provides the amount of
fish in combination foods and changes in the habitat designations (freshwater/estuarine and marine) for
certain species of fish.
14
-------
MINI SESSION B - Statistics and the Internet
Wednesday, April 2,4:45 - 5:45 pm
Title: Using the Web and Other Networking Technologies in Supporting SAS for the
Enterprise
Authors: Chapman Gleason, Center for Environmental Statistics, Office of Policy, Planning
and Evaluation, and John Shirey, Enterprise Technology Services Division, Office
of Administration and Resource Management
Presenter: Chapman Gleason, CES, OPPE
Abstract
The Environmental Protection Agency (EPA) has just begun an Enterprise Computing Offer (ECO) with
SAS Institute. The EPA SAS ECO provides 21 SAS products (base, AF, Assist, ETS, Connect, FSP,
Graph, Share, Tutor, Stat, IML, Insight, Lab, Access for Oracle, Access for ODBC, CPE, CIS, QC,
Toolkit) on several desktop operating systems (Windows, Windows 95, Windows NT, MacOS, SunOS,
Digital Unix, OSF1, HP/UX, DG/UX) in EPA. This product mix will allow SAS users to design and
develop client/server SAS applications and provide EPA scientists and policy analysts with better
desktop scientific, data management, and statistical software. This session describes EPA's
implementation strategy to support SAS across a heterogeneous LAN/WAN computing environment
consisting of more than 300 Novell servers and LANs running IPX protocol, Windows PCs on Novell
LANs running TCP/IP and IPX protocols, Unix workstations and servers (running TCP/IP protocol), and
an IBM mainframe housed at the National Computer Center located in Research Triangle Park, North
Carolina. All the computers are accessible via SAS from the Desktop using TCP/IP protocol. The
session will include discussion of how EPA:
1) Prepared custom installation instructions for SAS on EPA's Novell LANs which run Networked
MS Windows.
2) PKZIPped the SAS Windows Installation CD-ROM and set up an FTP server for SAS to distribute
SAS to users on Novell's LANs.
3) Designed and implemented a Lotus Notes Mail-In Data Base and billing strategy to keep track of
the user population.
4) Implemented a SAS Listserver, called EPASAS-L, to allow users to share SAS technical
problems and solutions.
5) Designed an Internal SAS Web using a Lotus Notes InterNotes server and Data Base which
replicates and publishes to the Web each hour. This Lotus Notes Data Base is replicated to each
EPA Region allowing SAS users at remote sites to document their implementation of SAS
products, SAS applications, and SAS code and share it with other EPA SAS users.
6) Implemented a mail user-ID for the SAS Notes DB, so that users without Notes Clients can mail
a document (including Graphics) to a user-ID called epasasWeb@epamail.epa.gov, and the
document will automatically be published to the EPA SAS Web.
7) Implemented the SAS and Lotus Notes Interface allowing SAS programs to write to the
SAS/Web via SAS clients on remote Systems.
15
-------
(continued)
MINI SESSION B - Statistics and the Internet
Wednesday, April 2, 4:45 - 5:45 pm
Abstract (continued)
One of the benefits of client/server computing and the popularization of Internet protocols has been the
rapid development of the World Wide Web (WWW). However, HTML development has languished
because of the single file names required in HTML "home pages." One product that overcomes
that barrier, and that EPA has used to implement its SAS Web, is the Lotus Notes InterNotes Server. An
InterNotes Server is a Notes server that runs under Windows NT Advanced Server and has the HTTP
demon running as an NT service. The InterNotes Server takes a Notes Data Base and converts the Notes
Documents into HTML documents and publishes the Notes Views as HTML links to the Notes
Documents. EPA has used this capability to spare SAS users and developers the learning curve of
HTML, which is both tedious and time consuming. InterNotes also allows a "macro" level of
integration for keeping track of the hundreds of HTML file names that are prevalent on Unix systems.
16
-------
POSTER SESSION
Wednesday, April 2,5:30 - 6:30 pm
Title: Pesticide Residue Monitoring Data
Author/Presenter: Ed Brandt, Economic Analysis Branch, Office of Pesticide Programs
Abstract
The Government Performance and Results Act requires all government agencies to connect the process
of planning, budgeting, and accountability. This paper addresses the issues concerning pesticide residue
monitoring data. Using National residue data from 1992 to 1995, the paper analyzes the consistency
between different residue monitoring programs, identifies gaps in the development of national estimates
of dietary exposure, and suggests approaches to better sampling strategies in the future to improve
overall dietary exposure estimates.
17
-------
POSTER SESSION
Wednesday, April 2, 5:30 - 6:30 pm
Title: A Master Sampling Frame for the Collection of Non-Agricultural
Pesticide Usage Data
Author/Presenter: Alan R. Goozner, Economic Analysis Branch, Biological and Economic
Analysis Division, Office of Pesticides Programs
Abstract
The EPA recently conducted the 1993 Certified Commercial Pesticide Applicator Survey. The survey
was conducted at considerable cost. Much of the time involved was the construction of a sampling
frame. As a follow-on to this experience, several questions arose: Could a master sampling frame be
constructed that would allow quicker, more efficient replication of a similar survey? Would it allow
surveying more specialized aspects of the applicator population? EPA statisticians are encouraged to
offer their insights and opinions as to the feasibility of the idea.
Should the EPA offer seed money to have this frame constructed in the Private sector? Would private
sector research companies use such a frame? Would they pay for samples drawn from such a frame?
Would the frame facilitate more research into the aspects of non-agricultural pesticide usage that would
otherwise not be done? At a minimum, should the EPA more fully investigate the feasibility of frame
construction and usability?
18
-------
POSTER SESSION
Wednesday, April 2, 5:30 - 6:30 pm
Title: The National Air Quality and Emissions Trends Report, 1995
Author/Presenter: David Mintz, Office of Air Quality Planning and Standards, Office of Air
and Radiation
Abstract
This twenty-third annual report documenting air pollution trends in the United States was released by
Administrator Carol Browner at a major press conference on December 17, 1996. The report provides
information on those pollutants for which National Ambient Air Quality Standards have been
established. These pollutants are carbon monoxide (CO), lead (Pb), nitrogen dioxide (NO2), ozone (O3),
particulate matter whose aerodynamic size is less than or equal to 10 microns (PM-10), and sulfur
dioxide (SO2).
While the report focuses on national trends in air quality concentrations and emissions for these criteria
pollutants, it also features information on related topics. These include visibility, air toxics,
nonattainment areas, urban area trends, reformulated gasoline, and Photochemical Assessment
Monitoring Stations (PAMS).
19
-------
SESSION VI - Statistics of Measurement in Analytical Chemistry
Thursday, April 3, 8:45 - 10:15 am
Title: Estimation of Precision of Low Concentration Chemical Analytical
Measurements and Establishment of Detection and Quantification Limits
Authors: Henry D. Kahn, Chief, Statistical Analysis Section, Office of Water, and Kathleen
Stralka, Statistician, Science Applications International Corporation (SAIC)
Presenter: Henry Kahn, EPA, and Kathleen Stralka, SAIC
Abstract
Estimates of precision of low concentration chemical analytical measurements are critical to establishing
detection and quantification levels. This presentation will consider estimates of precision based on the
EPA procedure for determining a "Method Detection Limit" and the Rocke-Lorenzato model. The
methods will be illustrated using some inductively coupled plasma - mass spectroscopy data and
applications to establishing detection and quantification levels will be discussed.
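For orientation, the EPA Method Detection Limit calculation referred to above is commonly described (40 CFR Part 136, Appendix B) as the one-sided 99th-percentile Student t value times the standard deviation of replicate low-level spiked measurements; a minimal sketch with invented replicates follows.

```python
# Sketch of the commonly described EPA Method Detection Limit calculation:
# MDL = t(n-1, 0.99) * standard deviation of replicate low-level spikes.
import numpy as np
from scipy import stats

def method_detection_limit(replicates):
    r = np.asarray(replicates, float)
    return stats.t.ppf(0.99, len(r) - 1) * r.std(ddof=1)

# Seven invented replicate measurements of a low-level spike (ug/L).
print(round(method_detection_limit([1.9, 2.3, 2.1, 2.4, 1.8, 2.2, 2.0]), 3))
```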
20
-------
SESSION VI - Statistics of Measurement in Analytical Chemistry
Thursday, April 3, 8:45 - 10:15 am
Title: A Two Component Model for Error in Analytical Chemistry and Issues of
Detection and Quantification
Author/Presenter: David M. Rocke, Director, Center for Statistics in Science and
Technology, University of California - Davis
Abstract
A new model for measurement error in analytical chemistry will be presented. A commonly used model
that assumes the standard deviation of analytical error increases proportionally with the concentration of
the analyte cannot be used for very low concentrations. For measurements of near zero amounts, the
standard deviation is often assumed to be constant, which does not apply to larger quantities. Neither
model applies across the full range of concentrations of an analyte. The new model contains two error
components, one additive and one multiplicative, and exhibits sensible behavior at both low and high
concentration levels. The use of the model with maximum likelihood estimation and application to some
gas chromatography/mass-spectrometry and atomic absorption spectroscopy data will be described.
Implications for detection and quantification will be discussed.
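The two-component idea can be made concrete with a small simulation. The sketch below uses the commonly cited Rocke-Lorenzato form y = alpha + beta*mu*exp(eta) + epsilon, with all parameter values invented, and shows the measurement standard deviation leveling off near zero concentration and growing roughly in proportion at high concentration.

```python
# Simulation sketch of a two-component measurement-error model in the commonly
# cited form y = alpha + beta*mu*exp(eta) + eps, eta ~ N(0, s_eta^2), eps ~ N(0, s_eps^2).
# All parameter values here are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

def simulate_measurement(mu, alpha=0.02, beta=1.0, s_eta=0.08, s_eps=0.05, n=5000):
    eta = rng.normal(0.0, s_eta, n)        # multiplicative (proportional) error
    eps = rng.normal(0.0, s_eps, n)        # additive (near-zero) error
    return alpha + beta * mu * np.exp(eta) + eps

for mu in (0.0, 0.1, 1.0, 10.0, 100.0):
    sd = simulate_measurement(mu).std(ddof=1)
    print(f"true concentration {mu:8.2f}   measurement SD {sd:8.3f}")
```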
21
-------
FEATURED SPEAKER
Thursday, April 3, 10:45 am -12:30 pm
Title: Statistical Graphics for Environmental Applications: Developments and
Challenges
Author/Presenter: Daniel B. Carr, School of Information Technology and Engineering,
George Mason University
Abstract
Development of statistical graphics for environmental applications is a many faceted challenge. In the
first part of this session, recently developed templates for communicating environmental summaries to
broad audiences will be presented. The templates address issues such as converting tables to plots,
linking statistical summaries and maps, and representing metadata to provide an appropriate basis for
interpretation. Also, a JAVA implementation that enables user manipulation in a low-resolution
dynamic web environment will be described. The second part of the session will focus on graphics
challenge areas. For example, one challenge area involves working with massive data sets. An example
uses the gridding of breeding bird prevalence data to a continental U.S. EMAP grid to raise issues about
global gridding of satellite imagery. The second challenge area concerns visualizing statistical and
ecological models and their impact on a specific analysis. Recent developments in environmental
graphics provide important new capabilities, but some deep challenges remain.
22
-------
EVALUATION
-------
MISCELLANEOUS
-------
Downtown Richmond, Virginia
[Map of downtown Richmond with a numbered legend of attractions and visitor information, including the Metropolitan Richmond Convention and Visitors Bureau, museums, monuments, churches, the Virginia State Capitol, the Science Museum of Virginia, the 6th Street Marketplace, the Farmer's Market, Hollywood Cemetery, and Main Street Station; the legend text is largely illegible in this copy.]
Metro Richmond Visitors Center
Agecroft Hall
Amtrak Station
Arthur Asha, Jr. Athletic Canter
Borksdale Theatre at Hanover Tavern
Lewis Ginter Botanical Garden
Carytown
Chesterfield Towne Center
Cloverteaf Mall
Greyhound Station
Hanover Courthouse
Henrico Courthouse Complex
Henricus Park
Historic Chesterfield County Museum
Magnolia Grange
McGu ire Veterans Hospital
Meadow Farm Museum/Crump Park
Paramount's Kings Dominion
Virginia E. Randolph Museum
Regency Square
Richmond Braves, The Diamond
Richmond International Raceway
Lore Robins Gallery
Scotchtown
The Showpiece
State Fairgrounds on Strawberry Hill
Sw.ft Creek Mill Playhouse
Three Lakes Nature Center & Aquarium
Tuckahoe Plantation
Virginia Aviation Museum
Virginia Center Commons
Virginia House
Westhampton/The Shops
at Libbie & Grove
Willow Lawn, The Shops at
Wlrton House Museum
-------
A Guide to Restaurants in Metro Richmond
[Brochure: restaurant listings for downtown Richmond, including historic Church Hill, Shockoe Slip, and Shockoe Bottom, with addresses, phone numbers, cuisines, hours, and price ranges.]
-------
In Search of...
Environmental Statistician
The United Nations has an opening for an environmental statistician. Salary is $108,000.
For information contact the UN web site: http://www.un.org
Press "general information" and then
Enter "UN employment"
United Nations contact person is Patricia Nicolos, (212) 963-5783.
This information was provided by EPA contact, Kathleen Hogan, (202) 260-9349.
-------
United States
Environmental Protection
Agency EPA
Policy, Planning, and Evaluation (2163)
THE TWELFTH ANNUAL
EPA CONFERENCE ON
ENVIRONMENTAL STATISTICS
Richmond, VA April 1-3, 1997
IDEAS ARE NEEDED
TO FILL THIS SPACE
Wanted: Conference Logo
Theme: Statistics for the Future
Reward: Contact Barry Nussbaum
-------
EPA TWELFTH ANNUAL CONFERENCE ON
ENVIRONMENTAL STATISTICS
SPECIAL Preview EDITION
STATISTICS FOR THE
FUTURE April 1-3, 1997
RICHMOND, VA Site of the Twelfth Annual
EPA Conference on Environmental Statistics.
"Thought the EPA Conference was supposed to
come to town last year," mused a Richmond
resident. Well, we didn't make it then, but
we're back and looking for a big turnout at
this year's conference. Personnel from EPA
and other Federal and state agencies will
gather south of the Mason-Dixon Line for a
two-and-a-half-day conference. The conference,
with its theme focusing on relevant applications
of statistics in government programs and how to
enhance statistical support, will feature
hands-on training sessions and
opportunities to learn about new statistical
techniques and software. There will be
sessions on health statistics, detection
limits, water quality, and the use of
statistics in Quality Assurance.
The conference's real, underlying
benefit to you is the opportunity to
exchange with others involved in similar
programs, with related problems, and on a
one-to-one level. Informal sessions, such
as the Poster/Technology Session and
Roundtable Discussions, provide an
atmosphere for sharing information, solving
problems, and building a network.
There is plenty of opportunity to get
involved. Check out the "Caiifor Your
PARTICIPATION" aA in this Special Preview
Edition. There is no limit to how much
involvement and fun you can have. And, from
winter weather predictions, you'll want to
cut loose and enjoy springtime in the old
South at the EPA statistics conference.
THE CONFERENCE IS BACK. Y'ALL COME.
SPECIAL FEATURES
****************
No Registration Fee
Transportation Provided from EPA
Headquarters
****************
Costs within Government Per Diem
Fulfills Qualifications as Training
WOW, what a year it has been. I'm sure
I'm not alone in saying that I've never seen
a set of furloughs and travel restrictions
that affected us as severely as last year.
But a funny thing happened on the way to
no forum. You may recall that despite all
our money saving techniques, we had to
forgo our annual conference on statistics.
In order to capitalize on the plans already in
progress by some of the professorial types
who were developing tutorial sessions, we
decided to hold these training sessions in
Washington and RTP. This avoided travel
costs and travel restrictions for the
attendees from our two major locations.
We didn't intend to shut out regional and
laboratory folks at other locations, but we
had to do the best we could under unusual
circumstances. So what happened? We
didn't just salvage some sessions, we
actually learned that there was a real
demand for this training, and a good bit of
response came from people who normally
didn't attend the annual conference.
Imagine my surprise to hear "new"
participants asking why they weren't on
the list. They had heard about the
conference from a colleague down the hall.
So we are applying what we learned.
FIRST, I have personally arranged that the
government will not stop this year.
SECOND, we are still employing our cost
reduction methods to make the travel more
palatable to attendees. THIRD, and most
importantly, we are combining the
conference with enriched training in
Richmond on April 1-3, 1997 (no fooling!).
FOURTH, we are adding separate training
sessions in the late spring. We think we
may have hit on the best of both worlds
with this scheme. But it really depends on
your participation to make it a real
success. So, jump on the bandwagon
and participate! Write a paper, present a
poster, serve on a panel, and be active. I
look forward to seeing you in Richmond.
One last dilemma: if we had to postpone
last year's conference, is this the 12th
annual conference on statistics, the 13th
annual conference on statistics, or the
12th almost annual conference on
statistics? And was Grover Cleveland
really the 22nd and the 24th President all
by himself? If you can help me with any of
this, please call, write, fax, e-mail, etc.
Thanks. BARRY NUSSBAUM
Emphasis on Training
Response to the series of statistical
training programs offered last spring in
DC and RTP was tremendous. Courses
in Regression Diagnostics, Information
Visualization, and SAS Applications
attracted a large and varied audience.
Positive feedback on the training
programs has led to a greater emphasis
on training opportunities at this year's
conference as well as training courses to
be offered in the late spring and/or early
summer of next year.
The Conference offers a variety of
training features, such as:
•• Abstracts of all Papers Presented at the
Conference
•• Training Programs Designed Specifically
for EPA Statistical Needs
•• Information from Current Publications in
Environmental Statistics and Information
Science
•• Informal Discussions with Other
Statisticians to Focus on Specific Problems
and Probable Solutions
Train for the Future In Statistics
CALL FOR YOUR
PARTICIPATION
(YOU are the conference)
UNCLE SAM and your EPA co-workers can
benefit from your experience... be a
participant in this year's conference. We
invite you to:
• Make a Presentation
• Chair a Session
• Present a Poster
• Moderate a Roundtable
Discussion
• Become a Member of the
Conference Planning Committee
OR... you may have another idea!
Whatever you would like to do, name it,
and contact BARRY NUSSBAUM NOW!
by phone at (202) 260-1493 or by fax at
(202) 260-4968 or by e-mail at
Nussbaum.Barry@epamail.epa.gov
-------
WHY ATTEND
• Learn latest developments in environmental
statistics
• Share what YOU are doing
• Meet other colleagues
• Present a poster; make a presentation
• See demonstrations of the latest statistical
programs
• Get answers to statistical problems
• Build team spirit
• Receive training in new software, statistical
methods, computers
• Build a network of statistical and information
specialists for the FUTURE
WHO WILL BE THERE
• EPA statisticians and survey specialists
• EPA developers and users of environmental
information and statistics
• EPA policy and decision makers
• State and local government environmental
information developers and users
• University experts and students
• Special Guest Speakers
• YOU
REGISTRATION
FOR THE TWELFTH ANNUAL EPA CONFERENCE ON
ENVIRONMENTAL STATISTICS
RICHMOND, VA APRIL 1-3, 1997
Complete registration packets will be mailed on
JANUARY 31, 1997
Is your mailing information correct?
Did we miss someone? Do you want to add a
colleague to the list?
Contact MARCIA GARDNER
SRA TECHNOLOGIES, INC.
Phone (703) 205-8547, fax (703) 205-6260 or
E-mail: MARCIA.GARDNER@sratech.com
-------
EPA
United States
Environmental Protection Agency
(2163)
401 M Street SW.
Washington. DC 20460
Official Business
Penalty for Private Use $300
Margaret G. Conomos
(2163)
-------
Question 1. How large does a group have to be to show health effects from arsenic exposure
between 10 and 50 µg/l?
The 1960s Taiwan epidemiological study examined people exposed to arsenic in drinking
water beginning in 1900. Wells ranged from 0.01 to 1.82 ppm (10-1,820 ppb or µg/l).
Doctors physically examined 40,421 people out of 103,154 in 37 villages.
728 cases of skin cancer, 153 histologically confirmed.
72% had hyperkeratosis and 90% had hyperpigmentation.
The control group of 7,500 had an age distribution similar to the study population.
Arsenic ranged from non-detect to 0.017 mg/l (17 ppb or µg/l). No skin cancer,
hyperkeratosis, or hyperpigmentation was found in the control population. The expected number of
skin cancer cases, using the skin cancer rate for Singapore Chinese from 1968-1977, is a
little less than 3. Using this as the expected prevalence, the probability of observing no
cancer cases is 0.07.
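As a rough check on that figure, the count of cases in the control group can be treated as approximately Poisson with mean equal to the expected count; a minimal sketch in Python, assuming an expected count of about 2.7 ("a little less than 3"), which is consistent with the quoted probability:

    from math import exp

    expected_cases = 2.7           # expected skin cancer cases in the control group
    p_zero = exp(-expected_cases)  # Poisson probability of observing zero cases
    print(round(p_zero, 2))        # ~0.07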
EPA's drinking water criterion is 50 µg/l or 50 ppb. The Taiwan study identified a
NOAEL (no observed adverse effects level) of 0.8 µg/kg/day, and a corresponding
concentration in drinking water of
Question 2: How many infants should be in each concentration range for a study of sulfates?
The Centers for Disease Control (CDC) is proposing to study 1,000 babies exposed to
sulfate in their drinking water and compare them against 250 babies not exposed to
sulfate. They haven't identified the babies nor the exposure concentration ranges yet.
Sulfates cause a laxative effect above 1,000 mg/l, and EPA's proposed drinking water
criterion is 500 mg/l, a level at which sulfates aren't expected to be a problem.
CDC's sample size calculations for the planned study are attached.
In 1995 CDC studied 276 infants, and found 39 cases of diarrhea, with a median of 264
mg/l, and a range of 0-1,271 mg/l. Non-cases had a median of 260 mg/l, and a range of 0
to 2,787 mg/l. However, as seen in the attached graph, there were very few infants being
exposed to 500 mg/l or higher.
Question 3: Are 100 participants, divided into 0, 500, 800, and 1,200 mg/l groups (40 per group), enough
to establish a dose that causes diarrhea?
In 1994, 4 volunteers drank water with 0, 400, 600, 800, 1,000, and 1,200 mg/l sulfate at
48 hours. In a follow-up study six people drank 1,200 mg/l sulfate for six days and didn't
report diarrhea.
From Irene Dooley 202/260-9531 f-\misc\epi-stafpwr
-------
Sample Size Calculations¹

Confidence   Power   Unexposed:Exposed   Disease in Exposed   Risk Ratio   Sample Size
                                                                           Unexposed   Exposed   Total
95%          80%     1:4                 13%                  1.5          345         1,381     1,726
95%          80%     1:4                 13%                  1.6          250         1,001     1,251
95%          80%     1:5                 13%                  1.5          332         1,662     1,994
95%          80%     1:5                 13%                  1.6          241         1,205     1,446
95%          80%     1:6                 13%                  1.5          324         1,943     2,267
95%          80%     1:6                 13%                  1.6          235         1,409     1,644

¹ Using Epi Info (Version 5.01b DOS) sample size calculations for unmatched cohort and cross-sectional studies (Exposed and
Nonexposed).
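For readers who want to see how numbers of this general kind arise, below is a minimal sketch of a standard Fleiss-style two-proportion sample size calculation with unequal allocation. The table's 13% disease frequency is treated here as the baseline (unexposed) risk, which is how such calculations are commonly parameterized; that is an assumption, no continuity correction is applied, and Epi Info's exact algorithm may differ in detail, so the output is in the same ballpark as the table rather than a reproduction of it.

    from math import sqrt

    def cohort_sample_size(p0, risk_ratio, ratio, z_alpha=1.959964, z_beta=0.841621):
        """Approximate group sizes for an unmatched cohort study.

        p0         : disease frequency in the unexposed (baseline) group
        risk_ratio : exposed risk / unexposed risk to be detected
        ratio      : exposed subjects per unexposed subject (e.g., 4 for 1:4)
        z_alpha    : normal quantile for 95% two-sided confidence
        z_beta     : normal quantile for 80% power
        """
        p1 = risk_ratio * p0                      # disease frequency in exposed
        p_bar = (ratio * p1 + p0) / (ratio + 1)   # pooled disease frequency
        num = (z_alpha * sqrt((1 + 1 / ratio) * p_bar * (1 - p_bar))
               + z_beta * sqrt(p1 * (1 - p1) / ratio + p0 * (1 - p0))) ** 2
        n_unexposed = num / (p1 - p0) ** 2
        return round(n_unexposed), round(ratio * n_unexposed)

    print(cohort_sample_size(p0=0.13, risk_ratio=1.5, ratio=4))
    # about 326 unexposed and 1,305 exposed under these assumptions,
    # compared with 345 and 1,381 in the first row of the table above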
[Figure 3. Frequency distribution of sulfate concentration for all water samples submitted, June-October 1995 (N=172). Mean = 363 mg/L; median = 264 mg/L; range = 0 to 1,327 mg/L (one water sample at 2,787 mg/L is omitted). X-axis: sulfate level (mg/L); Y-axis: frequency.]
13
-------
SEER*Stat
The SEER*Stat system is a statistical package for the analysis of SEER and other cancer
databases. SEER*Stat provides a graphical user interface for the production of the following
statistics and statistical tests.
• Frequencies
• Percentages
• Crude (non-adjusted) rates with standard errors and confidence intervals
• Age-adjusted rates with standard errors and confidence intervals
• Trends over time as percent changes, from crude or age-adjusted rates
• Trends over time as estimated annual percent changes, from crude or age-adjusted rates, with
confidence intervals
• Comparison of estimated annual percent changes with zero
• Comparison of two estimated annual percent changes with one another
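As background on the trend statistics listed above, the estimated annual percent change is conventionally obtained by fitting a straight line to the natural logarithm of the yearly rates; a minimal sketch of that convention follows (the years and rates are made up for illustration and are not SEER data):

    import numpy as np

    # Hypothetical age-adjusted rates per 100,000, for illustration only
    years = np.array([1989, 1990, 1991, 1992, 1993])
    rates = np.array([41.2, 42.0, 43.1, 43.9, 45.0])

    # Fit log(rate) = a + b * year by least squares; EAPC = 100 * (exp(b) - 1)
    slope, intercept = np.polyfit(years, np.log(rates), 1)
    eapc = 100 * (np.exp(slope) - 1)
    print(round(eapc, 2))  # implied percent change per year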
SEER Web Site
Home Page URL: http://www-seer.ims.nci.nih.gov/
The SEER web site contains a variety of information about the SEER program.
Topic areas include:
• News
• About SEER
• Publications
• Online Systems
• Online Data
• Scientific Systems
• Registries
• Other Links
Online Systems
Cancer Query System (CANQUES) on the Web
CANQUES on the Web is an interactive system with a Java interface that allows the user to access
a variety of pre-calculated cancer statistics. There are currently in excess of 7.8 million pre-
calculated statistics available. CANQUES performs no calculations and contains statistics that
were created by the SEER Program for their routine reporting and the Cancer Statistics Review,
1973-1993. You must have a Java-enabled browser to use the system, and the most recent release of
that browser is recommended.
Type of statistics include:
SEER Incidence Rates
SEER Incidence Trends
U.S. Mortality Trends
SEER Median Age at Diagnosis
U.S. Mortality Median Age at Death
NHL and Kaposi's Sarcoma in San Francisco
SEER Relative Survival
-------
Online Data
SEER Incidence Data - The February 1996 submission of the SEER Incidence database is available in
public use text format as self-extracting DOS executables. This data is for the nine standard registries and it
covers diagnosis years 1973-1993. (Password encrypted, requires completion of Public Use Data
Agreement to extract data. Public Use Data Agreement is available via internet.)
Population Data for the SEER Registries - The populations for the nine standard SEER registries, to be used
in conjunction with the above data, are available as self-extracting DOS executables. This data is stored in
text format and contains populations for 1973-1993 by individual registry and also by the counties defining
each registry.
United States Population Data - County level populations for each state in the U.S. are available as self-
extracting DOS executables. Each state file contains county populations by year, 1973-1993. A file
containing total United States populations is also available. All files are stored in text format.
Scientific Systems
Portable Survival System
The analysis of patient survival plays an integral part in determining many aspects of cancer
prevention, control and treatment and is an important part in the interpretation of cancer statistics.
Since survival statistics play such an important role in the analysis of cancer data, the NCI
previously developed a system which generated survival statistics for researchers. This system is
the NCI's Mainframe Survival System which has been in use for over 25 years. A researcher must
have access to and a working knowledge of the NIH IBM mainframe system. This places a
limitation on the accessibility of the system. Also, repetitive mainframe usage costs are an issue
where a single analysis may cost hundreds of dollars depending on the requested parameters.
Information Management Services, Inc., in consultation with the Cancer Statistics Branch of the
National Cancer Institute, has developed a new, expanded and portable version of the Mainframe
Survival System called the Portable Survival System (PSS). The PSS is a Microsoft Windows-
based application which provides more access and greater ease in generating survival statistics
than its mainframe counterpart. The PSS retains all the features of the Mainframe Survival System
with several additional features. The PSS can be installed on most PCs with access to a CD-ROM
drive.
The NCI and IMS are currently in the process of integrating the PSS with the SEER*Stat system to
provide a single application for calculating a wide variety of cancer-related statistics.
The PSS is available on CD-ROM and may be ordered by mailing or faxing a completed Public
Use Data Agreement form (available from the SEER Web site) to the NCI.
-------
Applying Gy's Theory of Sampling
to Problems of Representativeness in
Environmental Field Investigations
Malcolm J. Bertoni
Center for Environmental Measurements and Quality Assurance
Research Triangle Institute
Some Questions to be Answered
• Why consider Gy's Theory of Sampling?
• What are the main concepts of Gy's
Theory?
• How does Gy's Theory help address
representativeness?
• What are some limitations and questions
when applying Gy's theory to
environmental field investigations?
• How can I use this information to improve
my lot?
-------
Why consider Gy's Theory of
Sampling?
Provides a theoretical and practical link
between statistical sampling concepts and
physical sample collection protocols
Helps clarify the relationships between
sampling units, sample support, and the
scale of inference
Provides a more sound scientific basis for
making measurements/observations of
sampling units
What's the origin of Gy's Theory?
• Pierre Gy, a French mining engineer,
developed the theory in the late 1950s
through 1970s
• Addresses the estimation of mineral content
in ore
• Combines concepts from statistics, physics,
geology
• Has been applied to environmental
sampling by Pitard, Ramsey, others
-------
Main Concepts from Gy's Theory
• Types of sampling lots
• Types of heterogeneity
• Classification of errors
• Principles of correct sampling
• Methods for reducing errors
Types of Sampling Lots
• Zero-dimensional
• One-dimensional
• Two-dimensional
• Three-dimensional
[Slide graphic: schematic particle arrangements illustrating each type of lot]
-------
Types of Heterogeneity
• Short-Range (random fluctuations)
— Constitution heterogeneity
• How many constituents are in the material?
— Distribution heterogeneity
• How are the constituents distributed?
• Long-Range
• Non-random trends, patterns
• Periodic
• Cyclic changes
Constitution Heterogeneity
[Slide graphic: particle mixtures illustrating more versus less constitution heterogeneity]
-------
Distribution Heterogeneity
[Slide graphic: particle arrangements illustrating less versus more distribution heterogeneity]
How does Gy measure heterogeneity?
Based on analysis of particles or fragments;
extends to groups of particles or fragments
Interested in the fraction of material having
a particular property of interest ("critical
content"), expressed as a percent of mass
Heterogeneity defined in relation to the
critical analyte
-------
Heterogeneity of a Particle

h_i = \frac{a_i - a_L}{a_L} \cdot \frac{M_i}{\bar{M}}, \qquad \bar{M} = \frac{M_L}{N_F}

(each particle's deviation from the lot concentration, normalized with respect to the average mass per particle and the average concentration of critical analyte)

where:
a_i = concentration of particle i
a_L = average concentration of lot
M_i = mass of particle i
M_L = mass of entire lot
N_F = number of particles in entire lot
Heterogeneity of a Group

h_n = \frac{a_n - a_L}{a_L} \cdot \frac{N_G\, M_n}{M_L}

where:
a_n = concentration of a group of particles
M_n = mass of a group of particles
N_G = number of groups of particles
-------
Definition of Constitution Heterogeneity (CH)

CH_L = s^2(h_i) = \frac{1}{N_F} \sum_{i=1}^{N_F} h_i^2

Definition of Distribution Heterogeneity (DH)

DH_L = s^2(h_n) = \frac{1}{N_G} \sum_{n=1}^{N_G} h_n^2
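A minimal numerical sketch of the constitution heterogeneity definition above; the fragment concentrations and masses are hypothetical, chosen only to show the arithmetic, and the distribution heterogeneity would repeat the same calculation on groups of fragments:

    import numpy as np

    # Hypothetical lot of fragments: analyte concentration and mass of each fragment
    a = np.array([0.02, 0.05, 0.00, 0.10, 0.03])   # fragment concentrations
    m = np.array([1.2, 0.8, 1.0, 0.5, 1.5])        # fragment masses (g)

    M_L = m.sum()               # mass of the entire lot
    N_F = len(a)                # number of fragments in the lot
    a_L = (a * m).sum() / M_L   # mass-weighted average concentration of the lot

    # Heterogeneity carried by each fragment
    h = (a - a_L) / a_L * (N_F * m / M_L)

    # Constitution heterogeneity of the lot
    CH_L = (h ** 2).sum() / N_F
    print(round(CH_L, 3))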
-------
DH is defined in terms of
possible groupings...
hence DH is affected by group size
-------
Constant Factor of Constitution
Heterogeneity (IHL)
CHL is theoretical; it's difficult to estimate,
partly due to large NF term.
Multiplying by the average mass per
fragment, [ML / NF], eliminates the need to
estimate NF:
IH_L = CH_L \cdot [M_L / N_F]
Constant Factor of Constitution
Heterogeneity
IHL is more practical; can be estimated from
observable qualities and measures:
IH_L = C\, d^3
where C = the sampling constant, calculated
from several material parameters such as
liberation, shape, and mineralogical factors;
d = particle diameter.
-------
Why such concern over
Heterogeneity?
Another measure of variability in a
population
Key to understanding and controlling errors
in environmental measurements
Foundation for understanding and applying
correct sampling principles
Gy's Classification of Errors
[Slide diagram: the total error combines the sampling error and the analytical error into the overall estimation error; the sampling error branches into short-range selection error (fundamental error plus grouping and segregation error), long-range fluctuation error, periodic fluctuation error, increment delimitation error, increment extraction error, and preparation error.]
-------
Types of Errors
• Short-range selection error (CE1)
- Fundamental error (FE)
- Grouping and segregation error (GE)
• Long-range fluctuation error (CE2)
• Periodic fluctuation error (CE3)
• Delimitation error (DE)
• Increment extraction error (EE)
• Preparation error (PE)
Fundamental Error (FE)
• Caused by constitution heterogeneity
• Can be estimated a priori by studying
properties of critical analyte and matrix to
be sampled
• Main drivers are:
- qualities of heterogeneity
- particle size
- mass of the sample
-------
Fundamental Error (FE)
FE^2 = \left( \frac{1}{M_S} - \frac{1}{M_L} \right) IH_L

where M_S = mass of the sample, assuming M_S \ll M_L.

Consequently:

FE^2 \approx \frac{C\, d^3}{M_S}
Grouping and Segregation Error
(GSE)
• Grouping error introduced when fragments
are not selected one at a time (always!)
• Segregation error introduced when
fragments are not randomly distributed
(Distribution heterogeneity)
• Reduce GSE by:
- generating a sample by taking many increments
- homogenizing the material when possible
- selecting random locations for increment
extraction
-------
Gy's Theory helps statisticians:
• Choose a sample mass (support) to satisfy a
FE design constraint for a given particle
size and sampling constant (see the sketch below)
• Reduce FE and/or sample mass through
grinding to reduce particle size
• Reduce GSE by specifying, for example,
that "10 to 30 increments shall be taken to
form a sample"
Delimitation Error
• Introduced when incorrect shape and
orientation for sample increment is selected
- design fault
- equipment selection/specification fault
• Correct shapes:
• zero dimension — unit
• one dimension — slice
• two dimensions — cylinder
• three dimensions — sphere or cube
-------
Examples of 1D Delimitations
[Slide graphic: top view of a stream showing correct (full-width slice across the flow) versus incorrect increment delimitation]
Examples of 2D Delimitations
[Slide graphic: cross-section view of the material sampled with a tube sampler versus an auger]
-------
Extraction Error
Introduced when material is imperfectly
extracted in relation to the correct
delimitation
- implementation fault
- equipment selection/specification fault
Can result in systematic or random error
Many environmental sampling tools
introduce both delimitation and extraction
errors
Extraction Error Example
[Slide graphic: cross-section views of sampling spoons — rounded bottom; flat bottom with no side walls; flat bottom with side walls — each taking a slice of material scooped from an elongated pile]
-------
Some limitations and questions
Gy's Theory based on particle
characteristics; environmental sampling
often involves media that don't translate
well to this model
Not clear how some chemical
contamination applies (e.g., sticky stuff that
adheres to particles?)
Is the average characteristic of the sample
always what the investigator wants to
know?
How can I use this to improve my lot?
• Design the right measurement protocols
(correct delimitation)
• Study the matrix you're sampling
• Increase the mass of the sample
• Take more increments for each sample
• Reduce particle size through grinding
(if OK for the material/contaminant)
• Specify correct subsampling protocols
-------
Scientific Extrapolation
[Slide diagram: the cycle of representativeness, linking the target population to the samples and measurements and back through inference; a companion panel shows the cycle being short-circuited when scientific/personal judgment is used to extrapolate directly to the target population.]
[1] From: Alan Goozner at DCOPP7 12/27/96 8:55AM (5369 bytes: 85 In)
To: melko@juno.com at IN
Subject: Master Sampling Frame for Non-Agricultural Pesticide Research
Forwarded
From: Alan Goozner at DCOPP7 12/19/96 7:57AM (5121 bytes: 85 In)
To: chlorine-news@igc.apc.org at IN
cc: PEPI LACAYO at X400, BARRY NUSSBAUM at X400, MATTHEW LEOPARD at X400,
Alan Goozner, Rob Esworthy, Edward Brandt
Subject: Master Sampling Frame for Non-Agricultural Pesticide Research
Message Contents
The EPA and the USDA historically have divided their
responsibilities for collection of pesticide usage
data where the USDA conducts surveys of farmers for
agricultural pesticide usage and the EPA conducts
specialized surveys of non-agricultural pesticide usage.
In the past, the EPA conducted the National Home and Garden
Pesticide Usage Survey and more recently the Certified
Commercial Pesticide Usage Survey. These two surveys were
National in scope and cost the Government over a million
dollars each to complete.
The EPA is not very well suited for the collection of data.
The Office of Pesticide Programs does not have a
professional data collection staff and needs to contract out
this activity whenever a study is conducted. This requires
competing in the private sector for a statistical
contractor, the clearance of an information collection
request through OMB and preparation of a report that must
clear many hurdles before being released to the public.
And, by the time the report reaches print, the data can be
as much as 2-3 years old.
Needless to say, the private sector can do a much better,
more efficient and more timely job in collecting data on
pesticide usage.
In support of this need, the EPA may be in a position to
facilitate the collection of more and better pesticide usage
data for non-agricultural sites. The idea is to construct a
master sampling frame for non-agricultural pesticide usage
sample surveys.
If a frame can be constructed and maintained by the EPA, the
private sector can request samples from this list to conduct
specialized surveys of interest with the intent to share any
data with the EPA. The exact consistency of the frame is
yet to be determined but it may be composed of two major
components of the applicator population: A) Certified
Applicators and B) Homeowners.
Experience in conducting the Certified Commercial Pesticide
Applicator Survey at the EPA shows that state lists are
out of date. Many applicators on state lists have not
renewed their license or are no longer actively applying
pesticides. If these lists can be cleaned up and screened
for certain characteristics that the industry may need to
zero in on for future data collection efforts, a highly
efficient sampling frame can be constructed. For example,
if a National list of pesticide applicators can be
-------
constructed with certain known demographic characteristics
and pesticide usage characteristics by types of application
work and chemicals used, stratified random samples can
zero in to target specific areas of interest for research.
The cost of constructing such a master sampling frame would
be prohibitive for any one private organization
contemplating a National data collection effort. But, the
statistics developed would be more accurate and reliable
from a statistical standpoint.
The question is: Is this a good idea?
If such a sampling frame was constructed, would your
organization use it to collect more/better data on pesticide
usage? If used, would it result in a savings in your market
research budget? Would it enable better and safer
introduction of pesticide products? Would producing more
reliable data support the goal of overall pesticide exposure
reduction?
Your reply and further discussion are encouraged. If there is
enough industry support, I am willing to propose this to EPA
management in the Pesticides Office as a project. You may
want to communicate what specific non-agricultural pesticide
usage data collection efforts are underway or being
contemplated that may lend themselves to using such a master
sampling frame. Would use of such a sampling frame result
in reduced costs for your organization? How much of a
savings would this be on an annual basis?
You may reply directly to:
Goozner.Alan@epamail.epa.gov
Alan R. Goozner, Statistician
USEPA, OPPTS/OPP/BEAD/EAB
-------
Estimating Dietary Exposure to Pesticide Residues
Table of contents
1. Author: Ed Brandt, Economist
2. Abstract
3. Statement of problem and approach
a. Increased need for measures of aggregate exposure
b. Limitations of existing residue monitoring programs
c. Government Performance and Results Act of 1993 requires quantitative measures
to define goals and objectives
4. Suggested measurements for the goal of safe food related to Pesticides
a. Current measures have examined impacts as an indicator of outcomes since so
many factors in addition to pesticide exposures influence national health statistics.
b. The following table provides a schematic of the types of measures proposed for
each effect level.
c. Defining a measure of average annual dietary exposure
d. Basis for estimating average residue per sample
5. Findings of Statistical analyses
a. Descriptive Summaries of residues by pesticide and crop
b. There is general agreement in the priority ranking between PDP and FDA data for
both chemicals and crops, i.e., same chemicals and crops rank the highest with
respect to residue exposure.
c. Chemicals not included in either PDP or FDA account for 70% of agricultural
pesticide active ingredient use, but much of the poundage is represented by
herbicides and fumigants which would normally not be found.
d. Correlation between PDP and FDA average residue per sample
e. Correlation among crops within PDP and within FDA (correlation matrix is
in appendices)
6. Suggestions to improve existing programs to estimate national dietary exposure
a. Decrease sample sizes for pesticide residues that can be predicted from
historical data of residues and pesticide use
b. Base sample sizes to reduce existing weighted estimation errors. Weight
estimation error range by risk (amount/toxicity/endpoint of concern).
7. Future work
8. Appendices
-------
Title: Estimating Dietary Exposure to Pesticide Residues
2. Author Ed Brandt, Economist
Economic Analysis Branch
Office of Pesticide Programs 7503 W
April 2, 1997 : EPA Statisticians Conference Poster session
3. Abstract: Several new laws have increased the need to estimate aggregate dietary
exposures. The Food Protection and Quality Act (FQPA) requires the examination of
aggregate exposures for pesticides likely to have additive effects (common modes of
action). The Government Performance and Results Act (GPRA) requires all government
agencies to reformat the budgeting process to connect measures of program outputs to
eventual environmental outcomes. Methodology and results to date are reported
concerning the consistency between two major residue monitoring programs, critical data
gaps and approaches to future data collection.
4. Statement of problem and approach
a. Increased need for measures of aggregate exposure
i. The importance of a consistent set of residue estimates across pesticides
has grown with the passage of FQPA. Previously, decision making for a
pesticide focused on whether the residues for the individual pesticide are
acceptable.
ii. The need for a national data base on residue data was recommended by the
National Academy of Sciences but funding for development has not yet
been received.
b. Limitations of existing residue monitoring programs
Two major residue monitoring programs are run by USDA and FDA. The
USDA's Pesticide Data Program (PDP) was implemented in May 1991 to
provide data on pesticide residues in food to support exposure analyses
conducted by EPA in the registration of pesticides.
(1) Principal goal is to measure food safety for vulnerable populations
(2) 1992 to 1995 for selected crops and pesticides -(15 crops and 65
pesticides by 1995) for high consumption to infants and children
-------
and potentially riskier pesticides based on existing tox/exposure
data
(3) capture residues most related to actual consumption, i.e., oranges
include pulp only; the skin is excluded
(4) probability based sample selection at the latest point of distribution.
ii. FDA residue monitoring includes a Surveillance and a Compliance
program. Surveillance data not specifically targeted toward known
problems of misuse so it tends to be more representative than the
Compliance data program which does target producers with past problems.
iii. The Surveillance program has limitations when used to estimate dietary
exposure.
(1) Primary role is prevention of illegal residues (over tolerance or no
tolerance). The watchdog role limits the flexibility to optimize
sampling for estimating dietary exposure alone
(2) Program limited by need to seize shipment in 24 hours if found to
be violative. Limits ability to measure residues downstream in the
distribution system (post harvest applications) since grower
identity is lost.
(3) Monitoring programs designed primarily for enforcement (to ensure
the absence of illegal residues) results in small sample sizes on
important commodities of high dietary consumption.
(4) Some chemicals not picked up by multi residue methods are
omitted altogether because of the incremental costs of inclusion.
c. Government Performance and Results Act of 1993 requires quantitative measures
to define goals and objectives
i. Programs must develop plans which connect program outputs to
objectives.
5. Suggested measurements for the goal of safe food related to Pesticides
a. Current measures have examined impacts as an indicator of outcomes since so
many factors in addition to pesticide exposures influence national health statistics.
b. The following table provides a schematic of the types of measures proposed for
each effect level.
-------
Effect level   Items to measure                                       Measures
Outcomes       cancer(s), neurotoxic effects, endocrine disruption,   national health statistics
               other toxic effects
Impacts        dietary exposure - residues on food                    residue levels; percent detects
Outputs        new registrations; review of existing registrations    pesticide use; number and type
Defining a measure of average annual dietary exposure
i. Limit analysis to variability of an annual national average that is appropriate
for lifetime assessments. Not appropriate for an acute or subchronic
analysis.
ii. Expected exposure for a residue of chemical x on crop y is a function of
the probability of detection multiplied by the probability of the residue level
given detection:

    expected exposure = avg. residue per sample * dietary consumption        (1)
    avg. residue per sample = Prob(any detectable residue of chemical x on crop y) * expected residue given a detect

These two variables can be combined into a single distribution of the
expected residue per sample. Thus, given 1,000 samples of chemical x on
crop y, there is an expected residue per sample and a probability
distribution of the sample mean.
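To make equation (1) concrete, here is a minimal Python sketch that turns a detection probability, an expected residue given a detect, and a consumption figure into an expected exposure. All input values and names are hypothetical and chosen only for illustration; they are not values from the PDP or FDA programs.

    # Illustrative sketch of equation (1); every input below is a made-up assumption.
    prob_detect = 0.12                 # Prob(any detectable residue of chemical x on crop y)
    mean_residue_given_detect = 0.35   # ppm, expected residue when a detect occurs
    consumption_g_per_day = 150.0      # grams of crop y consumed per person per day

    # avg. residue per sample = P(detect) * E[residue | detect]
    avg_residue_per_sample = prob_detect * mean_residue_given_detect   # ppm

    # expected exposure = avg. residue per sample * dietary consumption
    # ppm is mg of residue per kg of food, so convert grams of food to kg.
    expected_exposure_mg_per_day = avg_residue_per_sample * (consumption_g_per_day / 1000.0)

    print(f"avg residue per sample: {avg_residue_per_sample:.4f} ppm")
    print(f"expected exposure: {expected_exposure_mg_per_day:.5f} mg/day")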
d. Basis for estimating average residue per sample
i. The mean and variance of the sampling distribution could be determined
from the probability of detection (a binomial distribution on detects)
combined with a lognormal distribution of residues (the lognormal fits residue
data best, consistent with the constant degradation function modeled by a
lognormal).
ii. One would expect that percent detect would correlate with percent of crop
treated, but this is not the case. Other factors, such as time of application,
pesticide formulation with stickers and adherents, degradation rates,
weather, etc., are thought to be important too. More work is needed on the
factors that most affect the probability of detection.
iii. There are several alternative ways to calculate the estimation error of
the true residue amount per sample.
(1) From the probability of detection and the residue distribution given
detection. A problem with this approach is that these two variables
are not independent: percent detect is significantly correlated with the
residue level and is not correlated with the percent of use.
(2) From the variance of the average residue per sample over time,
estimating a standard error from the sample size and average residue level
for each year; the estimated mean is weighted by sample size for each
year using a weighted variance estimate (a short sketch of this weighting
follows this list).
(3) Calculating percentiles or, in the case of only four years of data
analyzed, the range of the average residue per sample.
6. Findings of Statistical analyses
a. Descriptive summaries of residues by pesticide and crop
i. Methods 2 and 3 have been calculated, but only method 3 is used to
construct a table of ranges.
(1) It is easier to understand, does not require assumptions about
homogeneity of variances and distributional form, and is closest to
existing methods for estimating upper ranges of residue.
(2) Tables are provided in the appendix which summarize the
estimation of average residue per sample per crop.
ii. Analysis of variance indicates that, compared to the variance in residue
levels among chemicals and crops, there is not a significant difference
between years for the same chemical and crop. This makes pooling data
across years more appealing.
-------
iii. Average residue is further adjusted by a scalar, the average intake per year
for infants and children and again for women of childbearing age. Since
these tables are rather lengthy, information is summarized again by
aggregating on either chemical or crop. Variance estimates at an aggregate
level have not yet been attempted.
b. There is general agreement in the priority ranking between PDP and FDA data for
both chemicals and crops, i.e., the same chemicals and crops rank the highest with
respect to residue exposure.
i. Differences do exist because of food preparation and sampling as well as
timing of sampling; for example, FDA residues for citrus are higher than
for PDP because FDA includes the skin. PDP residues are significantly
higher for pesticides that are applied during long-term storage (root crops,
for example).
ii. Post harvest treatments account for exposure far in excess of the pounds
applied relative to other crops. The majority of post harvest applications
are used to treat fungal diseases on tree fruits and vegetables. Insecticides
are used post harvest for grain storage. Growth regulators are applied to
stored root crops (potatoes) to prevent sprouting.
c. Chemicals not included in either PDP or FDA account for 70% of agricultural
pesticide active ingredient use, but much of the poundage is represented by
herbicides and fumigants which would normally not be found.
i. The quantity of pesticide use, in lbs of active ingredient, has little relation to
dietary exposure.
ii. Fungicides and insecticides account for most of the residues, yet herbicides
have the highest use. Harvest aids and growth regulators also account for
high residue levels, but the number of pesticides in this category is small.
d. Correlation between PDP and FDA average residue per sample
i. The number of observations (or cases) is defined as pesticides which have
PDP and FDA residue data for the same crop and a sample size exceeding 100.
The 100-sample limit is the general rule of thumb used in residue
chemistry.
ii. The intercept is set to zero to estimate the ratio of PDP to FDA. This avoids
the loss of one degree of freedom for the intercept estimate and more
directly measures the ratio, or multiple, of residues between the two programs.
iii. Key factors affecting the estimated ratio of PDP to FDA residue:
(a) portion of product sampled (edible vs. total)
(b) time of sample collection, including late post-harvest
applications that occur later in the retail distribution chain
(c) pesticide action, disposition of residues, and systemic
activity which results in plant uptake of the pesticide.
Estimated Ratio of Residues between PDP and FDA
Fungicides Only

FUNGICIDES      Estimated ratio   Signif.   R Square
                (PDP/FDA)         level
APPLES                1.65          0.01       82%
BANANA                0.04          0.02      100%
CELERY                0.07          0.25       57%
CARROT               17.32          0.27       53%
GREEN BEANS           0.15          0.02       95%
GRAPES                0.32          0.16       43%
LETTUCE               0.14          0.14       95%
ORANGES               0.04          0.01      100%
PEACHES               2.70          0.03       71%
POTATOES              2.36          0.00      100%

Cases (obs): 6, 2, 3, 3, 3, 5, 2, 2
Factors affecting multiple: 2 post-harvest pesticides (extreme points); FDA includes peel, PDP does not; extreme points, little correlation; extreme points, little correlation; FDA includes skin, PDP pulp only; 5 post-harvest pesticide use; 3 post-harvest pesticide use
Insecticides Only

INSECTICIDES    Estimated ratio   Signif.   R Square   Cases   Possible explanations
                (PDP/FDA)         level                (obs)
APPLES                4.45          0.00       64%      19     to be determined
BROCCOLI              0.28          0.02       69%       6     to be determined
CELERY                1.83          0.03       75%       5     to be determined
CARROT                0.53          0.06       55%       6     to be determined
GREEN BEANS           3.12          0.01       65%       8     to be determined
GRAPEFRUIT            0.02          0.17       68%       3     portion of fruit sampled
GRAPES                2.20          0.04       45%       9     to be determined
-------
INSECTICIDES    Estimated ratio   Signif.   R Square   Cases   Possible explanations
                (PDP/FDA)         level                (obs)
LETTUCE               0.77          0.00       97%       2     to be determined
ORANGES               0.03          0.00       95%      10     portion of fruit sampled
PEACHES               1.23          0.00       89%      14     to be determined
POTATOES              1.87          0.02       54%       8     to be determined
SPINACH               6.43          0.00      100%       9     to be determined
WHEAT                 0.48          0.00       94%       5     to be determined
High residue outliers
Fungicides

Crop          Both PDP and FDA
Apples        Thiabendazole, Diphenylamine
Banana        Thiabendazole
Celery        Chlorothalonil
Grapes        Captan
Green beans   Chlorothalonil
Lettuce       Iprodione
Oranges       Thiabendazole
Potatoes      Thiabendazole
Peaches       Iprodione and Dicloran
Carrots       Iprodione

PDP only: Dicloran; Iprodione and Vinclozolin
FDA only: Captan; Pentachlorophenol, PCB
Insecticides
Crop
Apples
Both PDP and FDA
Propargite
PDP only
FDA Only
Azinphos methyl
and carbaryl
-------
Crop
Oranges
wheat
spinach
Potatoes
Peaches
Lettuce
Grapes
Grapefruit
Green beans
Carrots
Celery
Broccoli
Both PDP and FDA
Carbaryl
malathion.and
chlorpyrifos
permethrin
DDT
Carbaryl
Phosmet and
Parathion
Permethrin
Ethion
Acephate and
permethrin
Permethrin
PDP only
Azinphos methyl
Dimethoate,
omethoate
Acephate
Diazinon
FDA Only
methidathion and
chlorpyrifos
Carbofuran
Parathion
Dicofol
Endosulfan
DDT
Methamidophos
e. Correlation among crops within PDP and within FDA (correlation matrix is
in appendices)
i. Multivariate clustering remains to be done, but based on a visual
examination of the correlation matrix, the following crops have high
correlations and appear to cluster:
(1) apples, grapefruit, oranges, bananas, broccoli
(2) peaches, carrots, grapes
(3) lettuce, spinach
(4) potatoes, oranges
(5) Crops that do not correlate with any other crop:
(a) celery
(b) wheat
(c) sweet corn
(d) processed peas
ii. Crops within FDA, based on 20 crops examined
(1) Crops that appear to cluster:
(a) tomatoes, apples, string beans, peas, cantaloupe,
sweet peppers, hot peppers, carrots
(b) apple, pear, grapes, potato, orange, cantaloupe
(c) peach, cherry
(2) Crops that do not cluster:
(a) catfish
(b) wheat
(c) strawberries
7. Suggestions to improve existing programs to estimate national dietary exposure
a. Decrease sample sizes for pesticide residues that can be predicted from
historical data on residues and pesticide use.
b. Base sample sizes on reducing existing weighted estimation errors; weight the
estimation error range by risk (amount/toxicity/endpoint of concern).
8. Future work
a. Developing "synthetic estimates" for pesticides/crop combinations with limited or
no data
i. Model residue measurements as influenced by portion of the food sampled,
time of sampling, decay rate of pesticide and metabolites, when applied,
systemic pesticides which are taken up by the plant, and extent and changes
in pesticide use
ii. Identify cases for which estimates cannot be made or are statistically weak
-------
iii. Evaluate the robustness of aggregate measures to identify significant
changes or trends in the level of pesticide residues for a given set of
chronic effects, i.e., cancer, neurotoxic, etc.
b. Additional sources to include
i. Total Diet Study
ii. USDA's monitoring of meat, milk, and eggs
iii. state monitoring
c. Estimating sampling variance - individually and in aggregate for common
mechanisms
d. Clustering and other multivariate techniques to identify plausible
interrelationships of huge data sets
e. Developing relationships between pesticide use parameters, crop, and
pesticide chemical/physical properties to improve regulation of pesticides
9. Appendices
a. Crops listed in order of estimated dietary pesticide consumption of
children and women of childbearing age - FDA and PDP
b. Pesticides listed in order of estimated dietary pesticide consumption of
children and women of childbearing age - FDA and PDP
c. Agricultural pesticides not included in PDP or FDA from 1992 to 1995
-------
Mathematical Geology
Volume 26, Number 3, April 1994
Contents
ARTICLES
Spectral Simulation of Multivariable Stationary Random Functions Using
Covariance Fourier Transforms 277
E. Pardo-Iguzquiza and M. Chica-Olmo
The Integral of the Semivariogram: A Powerful Method for Adjusting
the Semivariogram in Geostatistics 301
Frederick Delay and Ghislain de Marsily
Posterior Identification of Histograms Conditional to Local Data 323
Andre G. Journel and Wenlong Xu
Estimation of Background Levels of Contaminants 361
Anita Singh, Ashok K. Singh, and George Flatman
Comparative Performance of Indicator Algorithms for Modeling
Conditional Probability Distribution Functions 389
P. Goovaerts
BOOK REVIEW
Principles of Mathematical Geology by A. B. Vistelius 413
Reviewed by C. John Mann
LETTERS TO THE EDITOR
Comments on "Cumulative Semivariogram Models of Regionalized
Variables" and "Standard Cumulative Semivariograms of
Stationary Stochastic Processes and Regional Correlation"
by Zekai Şen 415
Donald E. Myers
Reply to Comments by Donald E. Myers 417
Zekai Şen
-------
Mathematical Geology, Vol. 26, No. 3, 1994
Estimation of Background Levels of Contaminants
Anita Singh,2 Ashok K. Singh,3 and George Flatman4
Samples from hazardous waste site investigations frequently come from two or more statistical
populations. Assessment of "background" levels of contaminants can be a significant problem. This
problem is being investigated in the U.S. Environmental Protection Agency's Environmental Mon-
itoring Systems Laboratory in Las Vegas. This paper describes a statistical approach for assessing
background levels from a dataset. The elevated values that may be associated with a plume or
contaminated area of the site are separated from lower values that are assumed to represent
background levels. It would be desirable to separate the two populations either spatially by kriging
the data or chronologically by a time series analysis, provided an adequate number of samples
were properly collected in space and/or time. Unfortunately, quite often the data are too few in
number or too improperly designed to support either spatial or time series analysis. Regulations
typically call for nothing more than the mean and standard deviation of the background distribution.
This paper provides a robust probabilistic approach for gaining this information from poorly col-
lected data that are not suitable for the above-mentioned alternative approaches. We assume that the
site has some areas unaffected by the industrial activities, and that a subset of the given sample is
from this clean part of the site. We can think of this multivariate data set as coming from two or
more populations: the background population and the contaminated populations (with varying
degrees of contamination). Using robust M-estimators, we develop a procedure to accomplish this separation.
KEY WORDS: robust M-estimators, influence function, background estimation, robust confidence
limits, separation of mixed sample.
INTRODUCTION
The United Slates Environmental Protection Agency (U S. EPA) encounters the
statistical problem of mixed samples from two or more populations in Resource
Conservation and Recovery Act (RCRA) and Superfund Amendments and
Received 27 June 1993; accepted November 1993.
2 Lockheed Environmental Systems and Technologies Company, 980 Kelly Johnson Drive, Las
Vegas, Nevada 89119.
3 Department of Mathematics, University of Nevada, Las Vegas, Nevada.
4 United States Environmental Protection Agency, Las Vegas, Nevada.
-------
362 Singh, Singh, and Flatman
Reauthorization Act (SARA) Evaluation and Remediation. This problem is being
considered at U.S. EPA's Environmental Monitoring and Systems Laboratory
at Las Vegas (EMSL-LV). This paper presents a solution from a probability
distribution-based method. A sample of concentration values of contaminants
from a Superfund site can be thought of as a mixed sample of background
concentration values plus the concentration values from a plume or plumes. At
first glance, a statistical analyst could think that the mixed sample from a Su-
perfund site could be separated spatially by a kriging analysis. However, these
statistical techniques need data obtained using appropriate statistical designs.
Unfortunately, regulatory life is not simple. Often only too few samples or
improperly spaced data for spatial or time series analysis are available, and the
required regulatory information is only the mean and standard deviation of the
distribution(s). This paper provides a robust probabilistic approach for gaining
this information from data that are inadequate for the above-mentioned alternative
approaches.
The occurrence of mixture samples from two or more normal (lognormal)
populations has been well recognized in several applied areas of interest such
as biology, geology, medicine, reliability, and environmental science Several
classical partitioning methods are available in statistical literature. Sinclair (1976)
used normal probability plots for graphical partitioning of mixture samples in
mineral exploration studies. Holgersson and Jorner (1978) gave a good review
of various methods including graphical, maximum likelihood (MLE), nonlinear
least squares, and the method of moments. Fowlkes (1979) performed extensive
simulations to compare several graphical methods including the usual histogram
method, the normal probability Q-Q plot, and the empirical cumulative distri-
bution function. The ability of these classical and graphical methods to identify
mixtures in samples is doubtful, especially if discordant observations are also
present in these samples. Moreover, the detection of these mixtures becomes
extremely difficult in the presence of overlap among the component populations.
Campbell (1984) used robust methods to study the effect of anomalies on mixture
models. Recently Fleischhauer and Korte (1990) used the point of inflection of
the normal probability plot to obtain an estimate of threshold background level
contamination.
The graphical display, unarguably, is one of the most powerful diagnostic
tools in the hands of a researcher. However, a subjective estimate of the point
of inflection obtained by looking at these graphs is questionable, especially when
more than two component populations are present. The overlap among the com-
ponent populations generally masks the point of inflection. Moreover, the anom-
alous observations (if any) and the presence of several (unknown) component
populations can distort the Q-Q plot to such an extent that the resulting inflection
point estimates may not be reliable. If one wants to use the Q-Q plots as a
partitioning method, a stepwise procedure is desirable. The proposed
-------
Background Levels of Contaminants 363
procedure requires construction of a Q-Q plot at each step. Populations with
higher concentration levels will be identified first. Each step identifies a sample
from a different population. In this article, we propose robust procedures to
partition a given mixture sample into its component populations. Data-appraised
robust confidence limits for the individual observations placed on the same
Q-Q plot produce a more precise estimate of the cutoff point between two
adjacent populations. This reduces the subjectivity involved in choosing the
inflection point from the graph. Several simulated as well as real-life examples
have been discussed to illustrate these procedures. The mathematical formulation
is given in the second section, the third section has all the examples, and finally,
there is a summary of our conclusions and recommendations.
MATHEMATICAL FORMULATION
The density function f_M(x) of a mixture population with (g + 1) unknown
component populations is given by

    f_M(x) = Σ_{i=0}^{g} p_i f_i(x; μ_i, σ_i)                                  (1)

where g ≥ 1, and f_i(x; μ_i, σ_i) is the density function of the ith population Π_i,
assumed to be normally (or lognormally) distributed with unknown mean and
standard deviation (SD) μ_i and σ_i respectively, and p_i is the unknown mixture
proportion for Π_i; i = 0, 1, 2, . . . , g, with Σ p_i = 1. Throughout the rest of
the article, it has been assumed that the researcher has performed a suitable data
transformation to achieve normality or near-normality (e.g., log-transformation
for positively skewed data) before proceeding with the following algorithm.
Given a sample x_1, x_2, . . . , x_n of size n from this mixture model, the objective
is to resolve it into its component populations, i.e., find n_i ≥ 0 such that n_i
observations belong to Π_i with Σ_{i=0}^{g} n_i + n_E = n. Here n_E ≥ 0 is the number
of extreme unusual observations which stand alone and do not belong to any of
the given (g + 1) populations. The subsample of size n_i then can be used to
estimate the parameters of population Π_i and its proportion p_i; i = 0, 1, . . . ,
g. The normal probability Q-Q plot is generally used to get an idea about g,
the number of populations present. However, inevitable overlap among the
component populations and/or the presence of anomalous observations generally
distort the Q-Q plot significantly, resulting in masking of some of the component
populations, especially those populations which have lower concentration levels.
Traditionally, theoretical quantiles from a standard normal distribution are plot-
ted along the x-axis in a typical Q-Q plot. However, in this article, we use the
theoretical quantiles from N(x̄, s) for the classical Q-Q plot and the theoretical
quantiles from N(x̄*, s*) for the robust Q-Q plot, where x̄ is the sample mean,
-------
364 Singh, Singh, and Flatman
s the sample standard deviation, and x̄* and s* (defined later in this paper) rep-
resent their robust versions, respectively.
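As a concrete illustration of the mixture density in equation (1), the following minimal Python sketch evaluates f_M(x) for a two-component normal mixture, using the 90/10 mixture of N(10, 3) and N(27, 8) from Example 1 later in this paper; the evaluation points are arbitrary.

    import math

    def normal_pdf(x, mu, sigma):
        """Density of N(mu, sigma) at x."""
        z = (x - mu) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

    def mixture_pdf(x, proportions, means, sds):
        """Equation (1): f_M(x) = sum_i p_i * f_i(x; mu_i, sigma_i)."""
        return sum(p * normal_pdf(x, m, s)
                   for p, m, s in zip(proportions, means, sds))

    # Two-component mixture from Example 1: 90% background, 10% contaminated.
    p, mu, sd = [0.9, 0.1], [10.0, 27.0], [3.0, 8.0]
    for x in (5.0, 10.0, 20.0, 30.0):
        print(f"f_M({x:5.1f}) = {mixture_pdf(x, p, mu, sd):.5f}")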
The initial step in the process is to identify the n_E ≥ 0 highly contaminated
observations, which stand alone by themselves on a normal probability plot.
These observations may require individual treatment and/or further investigation
and should not be included in the subsequent partitioning of the underlying
mixture sample. Due to masking effects, the exclusion of these observations
from subsequent analysis may be required to identify intermediate populations.
This does not mean at all that these observations have been thrown away. The
new Q-Q plot will be drawn using the remaining n − n_E observations. This
Q-Q plot will reveal if any representative samples from populations with higher
concentrations, namely Π_g, Π_{g-1}, etc., are present. Robust confidence limits for
the individual observation x, drawn on these Q-Q plots provide an objective
(rather than subjective) estimate of the cut-off point between two adjacent pop-
ulations. The process is repeated until all of the observations have been classified
into the various component populations. Each time a population is identified, a
new Q-Q plot with the new robust limits is drawn using only the unclassified
observations. This process provides a good estimate of the number of remaining
populations that need to be identified. At each step, these robust limits corre-
spond to the most dominant population present at that step. If there is such a
population present, then this population may be identified first, using these
robust limits as the estimates of its cutoff points from the adjacent populations.
The separation between two populations is probably most difficult in the pres-
ence of overlap. The overlapping populations (if any) should be identified in
the very end. All these ideas have been explained by means of several examples
presented in the following section.
Here, Π_0 represents the background population and Π_i; i = 1, 2, . . . , g
represents contaminated parts of the site with varying degrees of contamination
levels in ascending order of magnitude, with Π_g representing the population with
highest contamination levels. A recently proposed redescending PROP (Singh,
1994) influence function used here to identify the discordant observations is
given by

    ψ(d_i) = d_α exp(−(d_i − d_α)),    if d_i > d_α                            (2)

where d_α is the (α)100% critical value of the distribution of d_i² = (x_i − x̄)²/
s², which is distributed as (n − 1)² β(1/2, (n − 2)/2)/n, where n here rep-
resents the number of observations used in the computation of x̄ and s.
It should be noticed that the number of observations used will be updated
each time the process is repeated. For the initial iteration all of the n observations
will be used, next n − n_E will be used, and then the remaining n − n_E minus the
-------
Background Levels of Contaminants 36S
observations classified into Π_g will be used, and so on. Each observation is
assigned some weight according to its extremeness in either of the two tails of
the distribution. These weights provide a very effective way of obtaining esti-
mates of the degrees of freedom needed to compute the individual robust con-
fidence limits at each step. The resulting M-estimators for a given sample are:
    x̄* = Σ_i w_i x_i / Σ_i w_i    and    s*² = Σ_i w_i² (x_i − x̄*)² / (Σ_i w_i² − 1)      (3)

with ν = Σ_i w_i² − 1. The robustified distances d_i*² = (x_i − x̄*)²/s*² follow a
ν² β(1/2, (ν − 1)/2)/(ν + 1) distribution. The two-sided robust limits for the
individual observation x_i are given by the following probability statement:

    P(LTL ≤ x_i ≤ UTL) = 1 − α,    i = 1, 2, . . . , n                          (4)

where LTL = x̄* − s*d*_α and UTL = x̄* + s*d*_α, x̄* and s* are given by
(3), and d*_α² is the (α)100% critical value from the distribution of d*². The
one-sided (1 − α)100% robust limit for individual x_i can be obtained similarly.
The index i runs over the number of observations used in a typical step. Once
the n_E extreme observations have been identified and removed from the data
set, a new Q-Q plot using the rest of the n − n_E observations is drawn. It should
be emphasized that the limits used here are for the individual observations x_i
and not for the population mean μ, as is sometimes done in practice. For ex-
ample, in the context of background level estimation, individual observations
are being compared (and not the population mean μ) to these threshold limits.
Therefore, these limits should be computed using the appropriate interval. A
brief description of the various intervals and limits is given in Singh and Nocerino (1993).
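The weighting and limit calculations in equations (2)-(4) can be sketched in Python. The sketch below is a single-pass simplification that follows the formulas as reconstructed here: it reuses the same critical value d_alpha for the robust limits rather than recomputing one from the distribution of the robustified distances, and it does not iterate, so it should be read as an illustration of the idea rather than as the SCOUT implementation.

    import numpy as np
    from scipy import stats

    def prop_weights_and_limits(x, alpha=0.05):
        """One-pass sketch of PROP-style weighting and robust limits for
        individual observations (simplified from the published procedure)."""
        x = np.asarray(x, dtype=float)
        n = len(x)

        # Classical estimates and scaled distances d_i = |x_i - xbar| / s.
        xbar, s = x.mean(), x.std(ddof=1)
        d = np.abs(x - xbar) / s

        # d_alpha^2 = upper-alpha quantile of (n-1)^2 * Beta(1/2, (n-2)/2) / n,
        # the distribution quoted for d_i^2 in the text.
        d_alpha = np.sqrt((n - 1) ** 2 / n *
                          stats.beta.ppf(1.0 - alpha, 0.5, (n - 2) / 2.0))

        # PROP influence: full weight up to d_alpha, exponential descent beyond.
        w = np.where(d <= d_alpha, 1.0,
                     d_alpha * np.exp(-(d - d_alpha)) / d)

        # Robust (weighted) location and scale, in the spirit of equation (3).
        x_star = np.sum(w * x) / np.sum(w)
        nu = np.sum(w ** 2) - 1.0
        s_star = np.sqrt(np.sum(w ** 2 * (x - x_star) ** 2) / nu)

        # Robust limits for an individual observation, as in equation (4).
        return x_star, s_star, x_star - s_star * d_alpha, x_star + s_star * d_alpha

Applied to a mixed sample, the returned upper limit plays the role of the cutoff above which observations become candidates for a higher-concentration population.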
The robust limits given by (4), when drawn on the same probability plot,
provide a good initial estimate of the cutoff point between the adjacent popu-
lations. An estimate of the cutoff point c_g between populations Π_g and Π_{g-1}
will be obtained first from this Q-Q plot. All of the unclassified observations
x > c_g (not including the n_E extreme observations) will be used to obtain the
robust interval I_g = (LTL_g, UTL_g) for the gth population Π_g. All of the unclas-
sified observations belonging to this interval will be declared as coming from
Π_g. Next, all the observations x > LTL_g will be deleted from the subsequent
partitioning and a new Q-Q plot with the new robust limits will be obtained
using the remaining observations. An estimate of c_{g-1}, the cutoff point between
populations Π_{g-2} and Π_{g-1}, will be obtained from this plot. All unclassified
observations x > c_{g-1} will be used to obtain the robust boundaries given by
-------
Singh, Singh, and Flatman
I_{g-1} = (LTL_{g-1}, UTL_{g-1}) for the (g − 1)th population Π_{g-1}. All observations
belonging to I_{g-1} will be declared as coming from Π_{g-1}. In case of any overlap
between Π_{g-1} and Π_g, i.e., when LTL_g < UTL_{g-1}, observations in the range
(LTL_g, UTL_{g-1}) can be assigned to either of the two populations Π_{g-1} or Π_g.
However, the PROP influence function (2) used in the derivation of the robust
limits given by (4) minimizes the overlap between the estimates for the two
adjacent populations by down-weighting the extreme observations appropriately
in either of the two tails of the distribution of the underlying populations. More-
over, when the two adjacent populations have disjoint boundaries, the obser-
vations (if any) belonging to the unclaimed region (UTL_{g-1}, LTL_g) should be
assigned to their nearest neighbor.
This process will be repeated as many times as required until all of the
observations have been classified into their respective populations. At the final
step, the threshold values for the background population Π_0 will be estimated.
The remaining unclassified observations will be used to estimate UTL_0, which
is given by the one-sided probability statement:

    P(x_i < UTL_0) = 1 − α

where UTL_0 can be obtained using (4) by replacing α with 2α.
Observations smaller than UTL_0 will be declared as coming from Π_0. As
before, if there is overlap between Π_0 and Π_1, i.e., LTL_1 < UTL_0, then obser-
vations in the overlapping range (LTL_1, UTL_0) can be assigned to either of the
two populations Π_0 or Π_1. Once the boundaries for the various component
populations have been established, the complete classification procedure can
now be described in various steps as follows:
1. First of all, identify all of the extreme observations, n_E ≥ 0. These will
not be used in any of the subsequent partitioning of the underlying sample.
2. Next define a_i = no. of observations in the overlapping region
(LTL_i, UTL_{i-1}) between populations Π_{i-1} and Π_i, with a_{i,i-1} ≥ 0 of these in Π_{i-1}
and a_{i,i} ≥ 0 of these in Π_i, i = 1, 2, . . . , g; and b_i = no. of observations in the
unclaimed region (UTL_{i-1}, LTL_i) between populations Π_{i-1} and Π_i, with b_{i,i}
≥ 0 of these in Π_i and b_{i,i-1} ≥ 0 of these in Π_{i-1}, i = 1, 2, . . . , g.
3. Identify all of the non-overlapping observations belonging to each robust interval (LTL_i, UTL_i).
6. Once the number (g + 1) of populations present, and the respective
-------
Background Levels of Contaminants
367
subsample sizes n_i, i = 0, 1, . . . , g, have been estimated, the (g + 1) population
proportions are estimated using the following formula:

    p̂_i = n_i / (n − n_E),    i = 0, 1, . . . , g

7. Finally, using these n_i observations, the robust estimates of the param-
eters of population Π_i, i = 0, 1, . . . , g, will be obtained using (3).
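Pulling the classification steps together, here is a compact Python sketch of the top-down loop: fit robust limits to the data that remain unclassified, treat everything above the upper limit as candidates for a higher population, fit robust limits to those candidates, peel them off, and repeat until only the background is left. It reuses the hypothetical prop_weights_and_limits() helper sketched earlier and omits the handling of overlapping and unclaimed regions, so it is only an outline of the flow, not the published procedure.

    import numpy as np

    def partition_mixture(x, alpha=0.05, max_pops=10):
        """Illustrative top-down partitioning loop (greatly simplified).
        Returns (populations, background): a list of arrays, highest first,
        and the leftover observations treated as the background sample."""
        unclassified = np.asarray(x, dtype=float)
        populations = []
        for _ in range(max_pops):
            # Robust limits for the dominant population among what is left;
            # its upper limit plays the role of the cutoff c toward higher values.
            _, _, _, cutoff = prop_weights_and_limits(unclassified, alpha)
            high = unclassified[unclassified > cutoff]
            if len(high) < 3:              # nothing substantial above: stop
                break
            # Lower robust limit of the high group defines the new population.
            _, _, ltl_high, _ = prop_weights_and_limits(high, alpha)
            in_pop = unclassified >= ltl_high
            populations.append(unclassified[in_pop])
            unclassified = unclassified[~in_pop]
        return populations, unclassified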
In order to illustrate the proposed statistical procedure, we now present
some simulated as well as real examples.
EXAMPLES AND DISCUSSION
The procedure described here has been applied to two simulated datasets
as well as a real dataset from the Sacramento Army Depot Superfund Site from
Region 9 EPA. There were six primary contaminants at the Sacramento Army
Depot Superfund Site: cadmium (Cd), chromium (Cr), copper (Cu), lead
(Pb), nickel (Ni), and zinc (Zn). A total of 45 samples were analyzed for the
above contaminants, six from uncontaminated regions of the site, which will be
referred to as the site-specific background sample, and 39 from contaminated
regions of the site. Moreover, the procedure outlined here has been used on a
simulated data set representing a sample from a mixture of two lognormal pop-
ulations. Three simulated data sets and the Sacramento Army Depot Superfund
Site data set are given in the Appendix. In the following, all letters with * as a
superscript represent robust estimates; otherwise, they are the classical maximum
likelihood estimates (MLEs). All the computations have been done using the
statistical software package SCOUT developed by the Lockheed Environmental
Systems & Technologies Company (LESAT) for the U.S. EPA.
Example 1. A mixture sample of size 100 was generated from two rea-
sonably separated normal populations, with 90% (p_0 = 0.9) of the observations coming
from a normal population Π_0 with mean 10 and SD 3, N(10, 3), and 10% (p_1
= 0.1) coming from Π_1 ~ N(27, 8). Observations for the first
sample ranged from 2.485 to 18.598, whereas observations for the second sam-
ple ranged from 9.489 to 43.998, indicating some overlap between the two
populations. This is data set no. 1, given in the Appendix. The normal
probability Q-Q plots for the whole data set with the classical and the robust
limits placed on them are given in Figs. 1a and b, respectively. From both
graphs it is obvious that there are two populations present. The upper robust
limit 15.99 for the dominating population Π_0 provides an estimate of the cutoff
point c_1 between the two populations (Fig. 1b). Next, using all observations >
c_1, the 95% robust one-sided lower boundary for the population Π_1 with higher
concentrations is given by LTL_1 = 17.5 (Fig. 1c). Therefore, all of the obser-
vations greater than LTL_1 are classified as coming from Π_1. Using the remaining
-------
368
Singh, Singh, and Flatman
Fig. 1a. Mixture of N(10, 3) and N(27, 8) - classical Q-Q plot.
Fig. 1b. Mixture of N(10, 3) and N(27, 8) - robust Q-Q plot.
-------
Background Levels of Contaminants
369
Fig. 1c. Mixture of N(10, 3) and N(27, 8) - robust chart - N(27, 8) contaminated sample.
Fig. 1d. Mixture of N(10, 3) and N(27, 8) - robust Q-Q plot - unclassified sample.
-------
370
Singh, Singh, and Flatman
Fig. 1e. Mixture of N(10, 3) and N(27, 8) - robust chart - background N(10, 3) sample.
unclassified observations (smaller than 17.5), the Q-Q plot with the robust limits
placed on it is shown in Fig. 1d. From this figure, it is obvious that there is
only one population left. The 95% one-sided robust upper boundary UTL_0 =
13.84 for Π_0 is given in Fig. 1e. All observations less than 13.84 are classified
as coming from Π_0. Observations in the range (UTL_0, LTL_1) will be assigned
to their nearest neighbor. Thus the observation 16.107 (the only observation in
this range with b_1 = 1) will be assigned to Π_1. Two observations from Π_0,
namely 16.107 and 18.598, are misclassified into Π_1, and one observation, 9.489,
of Π_1 has been misclassified into Π_0. All of the relevant estimates of the pop-
ulation parameters after the final classification are summarized in Table I.
Example 2. In this simulated example, we consider a three-population
mixture model with ten observations from an N(20, 4) population, 100 from an
N(0, 1) population, and 30 from an N(5, 1). Moreover, in order to show the
extent of distortion of the Q-Q plot by the presence of extreme observations,
two extreme observations from an N(100, 10) are also included in this mixed
sample. This is data set no. 2 in the Appendix. The classical and the robust
Q-Q plots using all of the 142 observations are given in Figs. 2a and b, re-
spectively. Both graphs identify the two extreme observations. Moreover, both
graphs give indications of the presence of a sample from a population with
higher concentrations (observations no. 1-10). However, due to the large vari-
-------
Background Levels of Contaminants
371
-------
372
Singh, Singh, and Flatman
Fig. 2c. Mixture of N(0, 1), N(5, 1), N(20, 4) - classical Q-Q plot, two extremes removed.
Fig. 2d. Mixture of N(0, 1), N(5, 1), N(20, 4) - robust Q-Q plot, two extremes removed.
-------
Background Levels of Contaminants
373
Fig. 2e. Mixture of N(0, 1), N(5, 1), N(20, 4) - robust chart - contaminated sample N(20, 4).
Fig. 2f. Mixture of N(0, 1), N(5, 1), N(20, 4) - robust Q-Q plot - remaining unclassified observations.
374
Singh, Singh, and Flatman
Fig. 2g. Mixture of N(0, 1), N(5, 1), N(20, 4) - robust chart - intermediate sample N(5, 1).
Fig. 2h. Mixture of N(0, 1), N(5, 1), N(20, 4) - robust Q-Q plot - unclassified data.
-------
Background Levels of Contaminants
375
Fig. 2i. Mixture of N(0, 1), N(5, 1), N(20, 4) - robust chart - background N(0, 1).
Table I

Popn.   95% Limits       a_i, b_i      n_i    p_i     x̄*      s*      x̄       s
Π_0     UTL_0 = 13.84    b_{1,0} = 5    89    .89     9.91    2.36    9.78    2.59
Π_1     LTL_1 = 17.50    b_{1,1} = 1    11    .11    30.71    8.40   30.71    8.40
ation in the data set, the intermediate population is masked in Fig. 2a, whereas
Fig. 2b gives a clear indication of the presence of at least three populations.
Figures 2c and d represent the same graphs after removal of the n_E = 2
extreme observations. From Fig. 2c, one can wrongly conclude that there are
two populations present, with observation no. 118 = 6.67 as the inflection point.
However, this is not the case here, as is obvious from Fig. 2d. Using observation
no. 10 as the cutoff c_2 = 16.41 between populations Π_1 and Π_2, the classical
as well as the robust (same) lower boundary for population Π_2 is given in Fig.
2e. All observations greater than LTL_2 = 13.85 will be classified into Π_2. A
new Q-Q plot using the remaining unclassified observations is given in Fig. 2f,
which leads to c_1 = 2.34 as the cutoff point between populations Π_1 and Π_0. It
-------
376 Singh, Singh, and Flatman
should be noticed that the robust procedure used here has produced the same
cutoff point of 2.34 between populations Π_0 and Π_1, as can be seen from Figs.
2b, d, and f. Using all unclassified observations > 2.34, the two-sided 95%
robust boundary for population Π_1 is (3.16, 6.26), as given in Fig. 2g. Next all
observations less than 3.16 have been used to draw the robust Q-Q plot given
in Fig. 2h. From this graph it is obvious that there is only one population, Π_0,
left at this stage. Using these observations, the 95% robust upper boundary for
the background population is given by UTL_0 = 1.485. All observations less
than this threshold will be classified into the background population Π_0. Once
the boundaries have been set, observations in the overlapping and the unclaimed
regions have been classified according to the rules described above. All of the
relevant statistics using the final classification are summarized in Table II.
Example 3. In this example, we consider the data set from a Superfund
site with six samples known to come from the background population (obser-
vations 33-38). As mentioned earlier, the site was sampled for six contaminants,
but the results for cadmium concentrations alone are included in this article.
The data for the 45 collected samples (background samples included) are given
in data set no. 3 in the Appendix.
The average site-specific background level of a contaminant plays an im-
portant role in remediation decisions. As such, the estimation of the average
site-specific background of a contaminant is an important problem. We now
show the results obtained by using the proposed procedure on the cadmium con-
centrations. The classical as well as the robust Q-Q plots for cadmium are given
in Figs. 3a and b, respectively. From these figures, it is obvious that observation
nos. 15, 9, 22, and 21 represent extremely contaminated samples and should
be treated individually. From Fig. 3b, there is a clear indication of the presence
of at least three populations. Figure 3c represents the robust Q-Q plot after
removal of these n_E = 4 extremes, which also indicates the presence of at least
three populations. Using c_3 = 260.27 (observation no. 19 after the removal of
extremes) as the cutoff point between populations Π_2 and Π_3, all observations
greater than c_3 will be used to estimate the parameters of Π_3, the population
with high concentrations. Figure 3d indicates that these observations are from
Table II

Popn.     95% Limits                      a_i, b_i   n_i    p_i     x̄*      s*      x̄       s
Π_0       UTL_0 = 1.48                    b = 5       98    .70    -0.03    0.89   -0.07    0.97
Π_1       LTL_1 = 3.16, UTL_1 = 6.26      b = 3       32    .23     4.71    0.81    4.59    1.06
Π_2       LTL_2 = 13.85                   b = 1       10    .07    21.45    4.86   21.45    4.86
Extremes  -                               b = 0        2     -       -       -       -       -
-------
Background Levels of Contaminants
377
Fig. 3a. Cadmium concentrations from a Superfund site - classical Q-Q plot.
Fig. 3b. Cadmium concentrations from a Superfund site - robust Q-Q plot.
-------
378
Singh, Singh, and Flatman
Fig. 3c. Cd conc. from a Superfund site - robust Q-Q plot, extremes removed.
Fig. 3d. Cd conc. from a Superfund site - robust Q-Q plot - high conc.
-------
Background Levels of Contaminants
379
Fig. 3e. Cd conc. from a Superfund site - robust chart - high conc.
Fig. 3f. Cd conc. from a Superfund site - robust Q-Q plot - high conc. removed.
-------
380
Singh, Singh, and Flatman
Fig. 3g. Cd conc. from a Superfund site - robust Q-Q chart - intermediate conc.
Fig. 3h. Cd conc. from a Superfund site - robust Q-Q plot - highest 2 conc. removed.
-------
Background Levels of Contaminants
381
Fig. 3i. Cd conc. from a Superfund site - robust chart - intermediate conc.
Fig. 3j. Cd conc. from a Superfund site - robust chart - background conc.
-------
382
Singh, Singh, and Flatman
a single population. The one-sided lower 95% boundary for this population is
LTL_3 = 205.91, as given in Fig. 3e.
A new robust Q-Q plot using only the unclassified observations is given
in Fig. 3f. There is a clear indication of the presence of three more populations.
The robust boundary (109.166, 131.623) given in Fig. 3g for the intermediate
population Π_2 is obtained using the top 12 observations of Fig. 3f, with c_2 =
111.60 as the cutoff point. Observation nos. 2, 4, and 5 (with b_2 = 3) of Fig.
3g belong to the unclaimed region (UTL_1, LTL_2) and will be assigned to appro-
Fig. 3k. Cd conc. from a Superfund site - robust chart - known background conc.
Table III

Popn.     95% Limits                        a_i, b_i   n_i    p_i      x̄*        s
Π_0       UTL_0 = 12.31                     -            9    .22     11.02     3.85
Π_1       LTL_1 = 21.56, UTL_1 = 41.32      b = 1       10    .24     31.45     7.19
Π_2       LTL_2 = 109.2, UTL_2 = 131.6      b = 2       11    .27    120.39    15.11
Π_3       LTL_3 = 205.9                     b = 1       11    .27     401.4   150.82
Extremes  -                                 -            4     -        -         -
-------
Background Levels of Contaminants 383
priate populations using the nearest neighbor technique (see Table III). Next a
new robust Q-Q plot using only the remaining unclassified observations is given
in Fig. 3h, giving a clear indication of the presence of two populations with the
cutoff point c_1 = 22.05 (observation no. 12 in Fig. 3h). The 95% robust bound-
ary (21.576, 41.319) for population Π_1, using the top ten observations of
Fig. 3h, is given in Fig. 3i, with one observation belonging to the unclaimed
region (UTL_1, LTL_2), with b = 1. Finally, using the last nine observations,
the 95% upper threshold value for the background level contamination is UTL_0
= 12.308, as can be seen in Fig. 3j. However, in this case, six samples from
the background were also available. The robust 95% upper boundary using these
six background samples is given in Fig. 3k. The values in Figs. 3j and k are in
close agreement, establishing the correctness and validity of the procedure de-
scribed in this article. All relevant statistics after the final classification have
been summarized in Table III.
Example 4. In this example, we consider a simulated data set (given in
the Appendix) which consists of a mixture sample from two lognormal popu-
lations with some overlap. A sample of size 20 is obtained from a log N(0, 1)
population and a sample of ten is generated from a log N(4, 2). We use this
example to show the effectiveness of the proposed robust procedure in decom-
posing the mixture into component populations. The classical Q-Q plots of the
untransformed and the log-transformed data are given in Figs. 4a and b, re-
spectively. From Fig. 4a, it can be concluded that the sample is from a single
positively skewed population with observation no. 22 as an extreme observation.
This may lead the user to take the log-transformation. From Fig. 4b, one can
conclude that the mixture sample comes from a lognormal distribution with
observation no. 22 being slightly discordant. The corresponding robust Q-Q plots
before and after the log-transformation are given by Figs. 4c and d, respectively.
Figure 4c suggests that more than one population is present. Figure 4d
clearly separates the two underlying lognormal populations with cutoff point c_1
= 1.61. All of the relevant statistics are summarized in Table IV.
CONCLUSIONS AND RECOMMENDATIONS
The proposed robust procedure works quite effectively in classifying a
mixture sample into its component populations. In all of the examples discussed
here, the procedure classified the observations correctly into their
respective populations. When the data represent a mixture from lognormal pop-
ulations, the procedure based upon the classical MLE estimates may identify
some of these observations as anomalous. However, the robust procedure de-
scribed here gives an indication that there is more than one population present
(e.g., see Fig. 4c). This, in turn, forces the user to verify the distributional
assumptions. It is assumed that the user has some familiarity with symmetric
-------
Fig. 4a. Mixture of log N(0, 1) and log N(4, 2) - classical Q-Q plot (untransformed).
Fig. 4b. Mixture of log N(0, 1) and log N(4, 2) - classical Q-Q plot (transformed).
-------
Fig. 4c. Mixture of log N(0, 1) and log N(4, 2) - robust Q-Q plot (untransformed).
Fig. 4d. Mixture of log N(0, 1) and log N(4, 2) - robust Q-Q plot (transformed).
-------
Singh, Singh, and Flatman
Table IV

Popn.   95% Limits       n_i    p_i     x̄*
Π_0     UTL_0 = 1.143    18     .6     0.242
Π_1     LTL_1 = 1.419    12     .4     1.503
and skewed distributions. It is the user's responsibility to achieve near-normality
(or at least symmetry) for each of the component populations before using the
procedure described here. The robust procedure described here works quite
effectively in decomposing a mixture sample into its component lognormal pop-
ulations as well (see Fig. 4d). The stepwise procedure described here combines
the natural separation between the component populations. The sample from the
Sacramento Army Depot Superfund Site included a known site-specific back-
ground sample. This, however, is not the case for many Superfund sites. The
proposed statistical procedure will be a very useful tool for the estimation of site-
specific background for such Superfund sites.
ACKNOWLEDGMENTS
The U.S. Environmental Protection Agency (EPA), through its Office of
Research and Development (ORD), partially funded and collaborated in the
research described here. It has been subjected to the Agency's peer review and
has been approved as an EPA publication. The U.S. Government has a non-
exclusive, royalty-free license in and to any copyright covering this article. The
authors wish to thank Ken Brown of U.S. EPA/EMSL-Las Vegas for providing
the Superfund site data and for helpful suggestions during the preparation of
this paper.
APPENDIX
Dataset 1:
Normal mixture generated from populations N(10, 3) and N(27, 8); 90
observations are from N(10, 3) and 10 are from N(27, 8): 2.49, 11.15, 10.47,
10 62. 12 65. 13 52. 11 02. 13 40. 9.50. 6 93 II 54. 6 83. 10 68. 10 38
8 16. 1057.602.649. 1098.625. 11 45. 12 31. 795. 13 89.987. 10 10.
1050. 11 95. 10 16. 11 09. 7 35. II 01. 1026. 12 06. 16 11. 1203. 12 62
1029. 14 63. 11 65, 13 13. 7 93. X 18. 11 11.7 95 8 15. 14 20. 7 99. |l 31.
9 63. 8 82. 8 42. -7 32. 18 59. 7 97. 6 43. 13 19 * 59 7 40. 12 71 8 .W
13 34. 8 34. 5 71. 8 14. 8 29. I 1 99. 11 23. 5 2h 9 04 7 12. 14 85 I I OS
-------
Background Levels of Contaminants 3g7
10 11, 11.01,9.57, 11 01, 12.25,7.93,4.48,9.13,6.58, 13.89,6.70, 1204.
7.69, 10.84, 9.13, 6.84, 10.33, 33.38, 23.49, 30.01, 37.23, 37.66, 31 27,
34.94, 9.48, 31.08,43.99.
Dataset 2:
Normal mixture with ten observations from N(20, 4), 100 from N(0, 1),
30 from N(5, 1), and two extreme observations from N(100, 10): 18.12,
16.60. 27.60, 23.27, 29.80. 18.24, 24.40, 23.04, 16.98, 16.41. 1 77. 2 38,
-022, -0.35, -0.40, 1.00, -0.01, -0.16, 1.44, -1.03, -1.84, 0.94.
-031, -103, 1.19, -0.14, -1.42, -0.89, -0.23,0.18, -096, -0.17.
0 06, 1.62, -0.03, -0.25, 0.30, 2.48, -0.02, 1.23. 0.10, 1.13, -0.69, 0 72,
-0.86.0 11, 1 16.075,027, -1.40,0.29, -0.52,2.47, 1.01, 1 89. -058.
020. -0.66, -105, -0.10. 1.44,0.72,0.33, 1.06.048, -069. -048.
-I 13. -067, 0.12, -0 15, -0.10, -2.54, 0.25, -2.04, 055. -1.32.
-009. 051, 0.06, 1 54. 081. -1.65, -0.39, -0.01,0.41. -051. -060.
1 24. -1.48. 0.51, 0 13. 0.93, -2.17, 0.63, -0.39. -1.37, 1 17. -1 29.
-0 10, 0.30. 084. -Oil. 1.66, -0.66, -0.50, -087. -1 59, -0.69.
-201,4 16,397,4 18,3 71,4.55,3.45,5.62,6.67,425,4.76,524,578,
5 23. 6.20, 1 18, 5 62, 4.51, 5.35, 4.34, 4.77, 6.07, 4 24. 4 26. 3 77. 5 16.
4 07. 5 46. 3.80. 5 50. 4.84. 123.76, 117.61.
Dataset 3:
Cadmium concentrations from the Sacramento Army Depot Superfund Site:
2620.2755,44501,3077.486.31,513.79, 11281, 159.30. 1300.668.
33 72. 35 01, 10.99, 22 05. 83094. 125.07, 40.84. 345 52, 384 80. 183 04.
2300, 1500, 260.27, 32.09. 166.16, 31.68, 12.39, 61453. 639.52, 11624.
11943. 111.60, 1029, 1.68, 3.34, 10.47, 11.74, 1032, 12230, 28303.
265.08. 12549, 131.06,47.90, 119.34.
Dataset 4:
Mixture of 20 observations from lognormal N(0, 1) and 10 from lognormal
N(4, 2): 0.5300, 2.7538, 3.2237, 0.2871, 1.2915, 1.5795, 2.0817, 1.0633,
07486.08284. 13252. 16477, 12311,26518.07258.52913. 19187.
103898. 05373. 14311. 332.1949, 988.3606, 193491. 93424. 92353
88 1362. 56 3981. 115 9378. 27.8464, 34.4647
REFERENCES
Campbell, N. A., 1984, Mixture Models and Atypical Values: Math. Geol., v. 16, p. 465-477.
Fleischhauer, H., and Korte, N., 1990, Formation of Cleanup Standards for Trace Elements with
Probability Plots: Environmental Management, v. 14, n. 1, Springer-Verlag, New York,
p. 95-105.
Fowlkes, E. B., 1979, Some Methods for Studying the Mixture of Two Normal (Lognormal) Dis-
tributions: J. Am. Stat. Assoc., v. 74, n. 367, p. 561-575.
Holgersson, M., and Jorner, U., 1978, Decomposition of a Mixture into Normal Components: A
Review: J. Bio-Med. Comput., v. 9, p. 367-392.
Sinclair, A. J., 1976, Applications of Probability Graphs in Mineral Exploration: Assoc. of Explo-
ration Geochemists, Rexdale, Ontario, p. 95.
Singh, A., 1994, Omnibus Robust Procedures for Assessment of Multivariate Normality and De-
tection of Multivariate Outliers, in G. P. Patil and C. R. Rao, eds., Multivariate Environmental
Statistics: North-Holland, Elsevier Science Publishers, p. 445-488.
Singh, A., and Nocerino, J., 1994, Robust QA/QC for Environmental Applications, in The Pro-
ceedings of the Ninth International Conference on Systems Engineering: University of Nevada,
Las Vegas, p. 370-374.
-------
Representativeness in Statistics
and Quality Assurance
John Warren
Quality Assurance Division
Office of Research & Development
-------
Representativeness Influences:
Data aggregation:
o Merging data sets having similar Quality
Assurance protocols collected using probabilistic
sampling frames
o Merging data sets having a probability basis with
similar data with a non-probabilistic basis
-------
Representativeness Influences:
Hypothesis testing:
o Comparing data sets with different extraction
methods and different sample matrices
o Comparing data sets having both within and
between differences in the setting of the minimum
detection levels and data editing
-------
Factors Influencing Representativeness
Sample Selection Technique:
o Probabilistic:
- Systematic with SRS
- Composite with SRS
- Adaptive with any other
o Non-probabilistic:
- Judgmental
- "Found data"
-------
Factors Influencing Representativeness
Sample Analysis Methodology:
o Intra/Inter laboratory differences
o Method equivalence problems
o Heterogeneous sample matrices
o Variation in Quality Control
- Calibration frequencies
- Detection levels
- Laboratory protocols
- Extraction efficiencies
-------
Statisticians Are Little Help
A Dictionary of Statistical Terms
F.H.C. Marriott, 1990 International Statistical Institute
Representative Sample:
In the widest sense, a sample which is
representative of the population. Some confusion arises according to
whether 'representativeness' is regarded as meaning 'selected by
some process which gives all samples an equal chance of appearing
to represent the population'; or, alternatively, whether it means
'typical in respect of certain characteristics, however chosen'. On
the whole, it seems best to confine the word 'representative' to
samples that turn out to be so, however chosen, rather than apply it
to those with the objective of being representative.
-------
Kruskal and Mosteller : 1979
Three papers in International Statistical Review
"Representative Sampling" commonly applied to:
1. as a "seal of approval"
2. to denote "absence of selective forces"
3. as a "miniature of the population"
4. as being a "typical or ideal case"
5. to denote "coverage of a population"
6. as a "vague term to be made more precise"
7. as a "specific sampling method"
8. as "permitting good estimation"
9. as "good enough for a particular purpose"
-------
"Seal of approval"
No explanation provided of what process was used
to go from target population to sampled population
Use of "representative" is to convince the reader to
have faith in the reported results and therefore the
truthfulness of the conclusions
-------
"Absence of selective forces"
Used to imply that the sampling method used
deliberately excluded selective forces that might
over-represent some sub-population
Highly vulnerable to personal bias in elimination
methodology:
-------
"Miniature of the population"
Implies that every nuance of the population is
reflected in the sample i.e. identical frequency
distributions for sample and population.
In practice, it is obvious this cannot be achieved
-------
"Typical or ideal case"
Inevitably only a single specimen from the
population has been selected
Tremendous possibility of bias but the implication is
that an "ideal specimen" has been selected without
true definition of whether this implies "average",
"worst case", or "best case"
-------
"Coverage of the population"
The implication is that the sample selected has a
wide range across the population. At least one
example from each class or potential partition
(stratum) has been collected but the appropriate
weighting factors not made available.
-------
"Vague term to be made more precise"
The word "representativeness" is used as a promise
of things to come from a more detailed (not
specified) technical consideration of the problem.
The use of the term is intended to give permission to
discuss a problem without getting sidetracked by
technical details
-------
"Specific sampling method"
This is the use of "representative sampling" when
really the true kind of sampling has been deemed by
the author to be too complex for the audience's
comprehension. The intent of the author:
understanding by the majority, over-riding the true
comprehension of the minority (often statisticians)
-------
"Permitting good estimation"
The connotation that because some sample can be
labeled "representative" it will therefore allow for
satisfactory estimation without the necessity of
defining what this actually implies.
-------
"Good enough for a particular purpose"
This is the use of a sample to illustrate a particular
theory or hypothesis. It is a variation on the concept
of using a sample size 1 in that a counter-example
(non-random sample) can be enough to prove a case.
-------
Representativeness as an Indicator
Data Quality Indicators: PARCC
Precision
Accuracy (really Bias)
Representativeness
Comparability
Completeness
-------
PARCC: Representativeness
o Qualitative measure
o Open to individual interpretation
o Depends on media homogeneity
o Difficult to ensure
o Often demands many samples
o Needs expert opinion
-------
PARCC: Comparability
o Qualitative measure
o Expresses a degree of confidence
o Requires same variables of interest
o Needs units convertible to a standard
o Requires similar analytical procedures
o Needs compatible rules for data editing
o Requires similar sampling frames
o Needs meaningful temporal limits
o Requires expert opinion
-------
PARCC: Completeness
o Quantitative
o Influence depends on sample design
o If unbiased - loss of power
o If biased - loss of validity
o Needs expert opinion
-------
PARCC are Interrelated
Representativeness
Completeness <---> Comparability
-------
Regulatory use of Representativeness
Essentially never defined
Water (40 CFR 403): "...samples should be representative of
daily conditions"
Air (40 CFR 51): "...selected on the basis of spatial and
climatological (temporal) representativeness"
TSCA (40 CFR 763): "...at locations representative of the air entering
the abatement site"
RCRA (40 CFR 260): "...a sample of a universe or whole which can be
expected to exhibit the average properties of the
universe or whole"
-------
Potentially Promising Areas
o Composite statistics & area of support
o Combining environmental information
o Applying Gy's theory of sampling
-------
Composite Statistics & Area of Support
o Interpretation of "support"
e.g. Linkage of long-term exposure risk
(10^4 sq meters) with remediation technology
(10^3 sq meters) with sampled area
(10^2 sq meters) with physical sample
(10^1 sq meters) with sample analysis
(10^-1 sq meters) with ...
e.g. geophysical/geostatistical (kriging)
Englund & Flatman: Spatial Statistics Sampling
-------
Composite Statistics & Area of Support
o Literature and information on composite sampling
+ Statistical Methods for Environmental
Pollution Monitoring (R.O. Gilbert)
+ Handbook of Statistics vol 12, Chapter 4
(G. Lovison, S.D. Gore, & G.P. Patil)
+ Environmental and Ecological Statistics
(Special Edition, G.P. Patil, editor)
+ Guidance on Sampling (QA/G-5S)
(Under development by QAD)
-------
Combining Environmental Information
o Literature and information on data combining:
+ Encountered Data, ... and Weighted Distributions
(G.P. Patil) 1991, Environmetrics 2, 377-423
+ Using Found Data to Augment a Probability Sample
(J.M. Overton, T.C. Young & W.S. Overton)
1993, Envir. Mon. & Assess. 26, 65-83
+ Combining Environmental Information I & II
(L.H. Cox & W.W. Piegorsch)
1996, Environmetrics 7, 299-324
+ Guidance on Sampling (QA/G-5S)
(Under development by QAD)
-------
Encountered Data, Statistical Ecology, Environmental
Statistics, and Weighted Distribution Methods
o Weighted distributions used to account for observer bias due
to being unable to actually observe an event or sample value
o If an observation X has a probability θx of going unobserved,
then the observed pdf is the true pdf weighted by 1 - θx
o Regard the problem as one of modelling when samples are
drawn without a proper frame
o The paper contains some theoretical properties of weight
functions together with some applications
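As a toy illustration of the weighted-distribution idea under the reading above (θx as the chance that a value of size x goes unrecorded), the Python sketch below reweights a true pdf by 1 - θ(x) and renormalizes. The pdf, the weight function, and the grid are all assumptions made up for the example.

    import numpy as np

    # "True" pdf: exponential with mean 1 (a hypothetical measurement distribution).
    x = np.linspace(0.0, 8.0, 801)
    dx = x[1] - x[0]
    true_pdf = np.exp(-x)

    # Hypothetical observer bias: small values are more likely to be missed,
    # theta(x) = P(a value of size x goes unobserved) = exp(-2x).
    theta = np.exp(-2.0 * x)

    # Observed (weighted) pdf: proportional to (1 - theta(x)) * f(x), renormalized.
    weighted = (1.0 - theta) * true_pdf
    observed_pdf = weighted / (weighted.sum() * dx)

    true_mean = (x * true_pdf).sum() / true_pdf.sum()
    obs_mean = (x * observed_pdf).sum() * dx
    print(f"mean of true distribution:     {true_mean:.3f}")
    print(f"mean of observed distribution: {obs_mean:.3f}  (shifted upward)")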
-------
Using Found Data to Augment a Probability Sample
o If the variable of interest is in both found and probability
based samples, then use a pseudo-random sample approach
and combine the data in the manner of a stratified sample
o If not, use a stratified calibration approach - form a predictor
equation for found data by regressing the variable of interest on
the known frame attributes. Then, for the probability based
sample, use the prediction equation and the frame attributes
to predict new variables of interest (a sketch follows below)
o Extensive example on streams from the National Surface
Water Survey
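A minimal sketch of the stratified calibration step described on this slide, assuming hypothetical frame attributes and a simple linear predictor: regress the variable of interest on frame attributes in the found data, then apply the fitted equation to the frame attributes of the probability-based sample.

    import numpy as np

    # Hypothetical found data: frame attributes (e.g., watershed area, elevation)
    # and a measured variable of interest (e.g., a water-quality index).
    found_attrs = np.array([[12.0, 300.0],
                            [45.0, 120.0],
                            [ 8.0, 550.0],
                            [60.0,  90.0],
                            [25.0, 400.0]])
    found_y = np.array([4.1, 6.8, 3.2, 7.5, 5.0])

    # Fit y ~ b0 + b1*attr1 + b2*attr2 by least squares on the found data.
    X = np.column_stack([np.ones(len(found_attrs)), found_attrs])
    coef, *_ = np.linalg.lstsq(X, found_y, rcond=None)

    # Probability-based sample: frame attributes known, variable not measured.
    prob_attrs = np.array([[30.0, 200.0],
                           [10.0, 480.0]])
    Xp = np.column_stack([np.ones(len(prob_attrs)), prob_attrs])
    print("predicted variable of interest:", np.round(Xp @ coef, 2))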
-------
Figure 2: Linked Micromaps and Statistics (statistics by state: 1995 average unemployment ratio; employment to population ratio)
-------
Combining Environmental Information I & II
o Two consecutive papers, the first being an overview with
potential areas for research, the second considering various
applications to epidemiology and toxicology
o The overview includes kriging, non-detect problems, and
application to truncated spatial data
o Overview also includes the mathematical aspects of
combining p-values (the works of R. A. Fisher and
T. Mathew, B. Sinha, & L. Zhou)
o Examples include passive smoking and dose-response
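For reference, Fisher's method for combining independent p-values, one of the approaches alluded to above, is simple enough to sketch in a few lines of Python; the p-values fed in below are made-up inputs.

    import math
    from scipy import stats

    def fisher_combined_p(p_values):
        """Fisher's method: -2 * sum(ln p_i) is chi-square with 2k degrees of
        freedom under the joint null, for k independent p-values."""
        statistic = -2.0 * sum(math.log(p) for p in p_values)
        return stats.chi2.sf(statistic, df=2 * len(p_values))

    # Hypothetical p-values from independent studies.
    print(fisher_combined_p([0.08, 0.12, 0.30, 0.04]))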
------- |