Future Direction for Statistics at EPA
Twelfth Annual EPA Conference on Environmental Statistics
Richmond, Virginia, April 1-3, 1997
Chapter 40
Geostatistical Sampling Designs
for Hazardous Waste Sites
George T. Flatman and Angelo A. Yfantis
This chapter discusses field sampling design for environmental sites and hazardous waste
sites with respect to random variable sampling theory, Gy's sampling theory, and geostatis-
tical (kriging) sampling theory. The literature often presents these sampling methods as an
adversarial "either/or" philosophy; this chapter emphasizes when each should be used with
a cooperative "both/and" philosophy. The intrasample variances, biases, or correlations
must be taken care of by the use of Gy's sampling theory for both independent random vari-
able sampling and analysis and correlated random variable sampling and analysis. The
deciding factors in the choice of sampling design and analysis are not just intersample vari-
ances, biases, or correlations but also the discreteness of the waste under investigation,
remediation as a unit, and the relative cost of samples versus the cost of remediation.
ENVIRONMENTAL SAMPLING is a multidisciplinary science. It requires chemists,
media experts, risk assessors, and even statisticians. The sampling design is
an integral part of the experimental design and data analysis, and most
importantly, the data analysis cannot recover more information than the samples
contain. Thus the statistician needs to be on the project from its inception. Optimal
environmental sampling requires consideration of at least three branches of statis-
tics. Classical random variable statistics (1) are needed in quality assurance (QA)
and in the analysis of data that are reasonably independent (little or no process, spa-
tial, or chronological correlation). Gy's theory of sampling (2) is needed for the def-
inition of correctness for the "field sample" [determination of amount (mass or vol-
ume) sampled] and any samples taken in heterogeneous media (almost all
environmental samples). Geostatistics, and its most used form, kriging (3), is
needed for field sites with a spatial structure. The choice of sampling designs—
when to use classical random design or kriging's regular grid design—is a difficult
decision. Even statisticians differ on such a question. This chapter discusses the sta-
tistical rules that enter into the decision. The decision depends on specifics of the
site and remediation plan as well as statistical aspects. For example, Gy's theory
must be used to take a correct sample for either random variable statistics (sampling
or analysis) or geostatistics (sampling or analysis).
When I discussed the role of statistics in sampling design with a manager of a
chemical laboratory, the manager confided in me that his statistician's recommen-
dations were always illogical and irrational and contradicted common sense. We did
not have time to discuss specifics, but I suspect the advice he received was also poor
statistically because it confused the use of random variable statistics with the use of
spatial statistics. If the correct branch of statistics has been chosen, statistical
requirements can be explained from statistical theory in a logical and reasonable
manner that does not defy common sense. It is important in a multidisciplinary
project for all to be comfortable with the soundness of the decisions. Statisticians
should be asked to explain the statistical requirements they recommend until all feel
comfortable with the design.
Random Variable Statistics
A random variable has both magnitude and probability. It may come from a sym-
metric distribution such as normal or uniform, or from a skewed distribution such
as lognormal or Poisson. Chemical environmental data sets are often assumed log-
normal, and radioactive data sets are often assumed Poisson. Because both distribu-
tions are positively skewed, the estimate of the mean based on few samples has a
higher probability of being underestimated than the mean of a normal distribution
or any symmetric distribution with a strong central tendency. Random errors as
monitored by QA are often assumed normal. The branch of statistics that deals with
random variables gives us the statistical inferences that have tools for QA. Random
variables provide measures of central tendency (such as mean, median, and mode),
dispersion (such as range and standard deviation), and statistical inference (such as
confidence intervals, prediction intervals, and tolerance intervals).
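To make the skewness point concrete, the following is a minimal simulation sketch (ours, not the authors'); the lognormal parameters and sample size are illustrative assumptions only. It estimates how often the mean of a small lognormal sample falls below the true mean.

```python
import numpy as np

# Illustrative only: a lognormal "contaminant" distribution with true mean
# exp(mu + sigma^2 / 2). With few samples, the sample mean falls below the
# true mean more often than not, because the skewed right tail is rarely hit.
rng = np.random.default_rng(0)
mu, sigma, n, trials = 0.0, 1.5, 8, 100_000
true_mean = np.exp(mu + sigma**2 / 2)
sample_means = rng.lognormal(mu, sigma, size=(trials, n)).mean(axis=1)
print("P(sample mean < true mean) =", np.mean(sample_means < true_mean))
# Typically well above 0.5 for small n, unlike a symmetric distribution.
```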
The mean and standard deviation are the statistics usually sought by a sam-
pling campaign; they are sufficient statistics (i.e., they completely define the distribution)
for the normal distribution. By the central limit theorem, the distribution of the sample
mean approaches normality as the number of samples, n, becomes large. This property
justifies the use of confidence intervals for the mean if, and only if, n is large enough
(n > 16 for a symmetric distribution, and n > 50 for a skewed distribution). However,
if a confidence interval is computed from many fewer than 50 samples of a typical
(skewed) environmental distribution, its limits are not to be trusted. Either knowledge
of the distribution or transformation to normality is required for statistical inference
about the variable, its distribution, or future samples. A listing of means and standard
deviations or intervals, without investigating the distribution, is misleading and has the
potential of inviting wrong decisions because the readers will assume normality.
Nonparametric intervals and tests are available, but they lack power. For example,
the critical values for one-sided intervals for probabilities (1 - a) of 0.95 and 0.99
using the Tchebycheff inequality are 4.472 (square root of 20) and 10.000 instead of
the standard normal distribution values of 1.64 and 2.33. Most regulators will
cringe at 4 or 10 in a compliance hypothesis test. Another consideration is that ran-
dom variable sampling design requires rigorous definitions of the population and
sampling unit, so that the design can give each sampling unit an equal probability of
being chosen. This requirement will be discussed further.
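As a quick check of the factors quoted above, this sketch (ours, not the authors') compares the standard normal critical values with the distribution-free Chebyshev-type factor 1/sqrt(alpha) that yields the 4.472 and 10.000 cited in the text.

```python
from math import sqrt
from statistics import NormalDist

for alpha in (0.05, 0.01):
    z = NormalDist().inv_cdf(1 - alpha)   # normal-theory one-sided critical value
    k = 1.0 / sqrt(alpha)                 # distribution-free Chebyshev-type factor
    print(f"1 - alpha = {1 - alpha:.2f}: normal z = {z:.2f}, Chebyshev k = {k:.3f}")
# Prints z = 1.64 and 2.33 versus k = 4.472 and 10.000, the values quoted above.
```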
Population Defined
In environmental samples, population is not as obvious or as well-defined a term as
it is in statistical textbooks (e.g., all the cards in a deck, or the two sides of a coin).
In site evaluation, the most obvious population is the waste site as a whole, but the
usual site has more than one population of interest. It may have population(s) of
plume(s) and background population(s). The population of interest is the popula-
tion(s) of the plume(s). Waste plumes seldom honor property boundaries or travel
in politically defined shapes such as city blocks. Thus the populations of interest are
the plume(s) and the background, not a mixture of these. To average all the samples
from the site would give an estimate of a mean from a mixture of populations, a
"fruit salad" of plume(s) and background(s). If the location and extent of the plume
or background are not known, but a map of mean contours (isopleths or isarithmic
lines) is wanted for multiple remediations, then this situation would require geosta-
tistical sampling and analysis. If the waste to be evaluated is well-defined and con-
fined, such as liquid waste stored in 55-gallon drums or a waste pile on a tarp that
will be disposed of as a unit, then the population of interest is the drum or pile and
therefore classical statistics (a mean value) will be adequate for the decision.
Sampling Unit Defined
For textbook statistics, a sampling unit is a draw of a card or a flip of a coin, but for
an environmental sampling the unit is complicated by natural variation (e.g., media
heterogeneity or pollutant characteristics) and sampling tool variation and biases
(4). In laboratory QA the unit may be the contents of every ith vial in the queue of
the analyzing instrument. At an environmental site, the "sampling unit" is ambigu-
ously used to refer to both the sample and the sample support. The sample is much
smaller in volume or mass than the sample support, but if it is representative, it has
approximately the same concentrations of the pollutant or the same values of some
measured characteristic. The sample, simple or composite, is a small critical mass
that is taken from the sample support for measurement. The sample support is the
larger volume or mass of in situ media that is to be represented by the measure of
the sample. The sample support is often the same volume as the remediation unit.
These two units are determined by the goal of the sampling campaign or the reme-
diation option(s), but they must meet the requirements of Gy's theory of sampling
and geostatistics, which are each discussed in subsequent sections of this chapter.
The extractable mass or volume (field sample) cannot be dictated by the size of the
sampling tool or the size of the official container. It should be determined by the
heterogeneity of the media in accord with Gy's theory. Differing amounts of media
of interest, because they are ambiguously called "sample", should be identified by
size and use. The analysis sample (i.e., aliquot or split), used in its entirety by the
chemist for analysis, has a mass less than that of the preparation sample, which has
a mass less than or equal to that of the field sample. Each change of scale or reduc-
tion of sample mass must pass Gy's requirements (see the subsection Analytical
Error). The name of the sample is unimportant, but the change of mass is impor-
tant. Any change in volume (mass) must be checked using a nomogram made up for
the current site. Extraction(s) for the field sample from the in situ sample support
(i.e., sampling unit) must satisfy both Gy's theory requirements and geostatistical
requirements.
Dealing with Correlation in Practice
In theory, the difference between an independent random variable and a random
variable correlated in time or space is clear, but this difference is not so clear in prac-
tice. In practice, most environmental samples are correlated in either time or space,
and possibly in both time and space, yet a random sampling or analysis is done.
Even the analyses of the samples in the queue of a mass spectrometer (MS) are cor-
related somewhat in time, but this correlation is weak enough and the QA samples
are spaced far enough apart that the correlation can be ignored. Correlation in space
or time can be taken into account by slightly more complicated formulas in random
variable statistics; Gilbert (5) gives relevant sediment and groundwater examples of
how correlated sample units require more samples to be taken (larger n) than if the
observations were independent. The critical criterion for using a spatial sampling
and data analysis is the management decision or need to see a contour (isopleth)
map of the pollutant location as well as concentration (these are kriging results) in
place of a list or histogram of chemical analyses with a confidence interval about an
estimate of some mean (random variable output).
Pierre Gy's Sampling Theory
Pierre Gy is a mining engineer and Francis Pitard is a chemist. Both men have had
brilliant careers in process and mining quality control. Pitard has written a two-
volume work (2) that captures and communicates their experiences in the sampling
of heterogeneous media. These volumes are valuable for environmental sampling of
soils or sediments. Pitard organizes the taking of "correct" samples with correct sam-
pling tools, according to seven "errors". The emphasis on correct samples and tools
is analogous to the emphasis from the U.S. Environmental Protection Agency (EPA)
on representative samples. Because of the potential of one, some, or all seven of
these errors to erode the correctness or the representativeness of an environmental
sample, this chapter will refer to them as "variances" to stress their additivity for a
component-of-variance model. "Variance" emphasizes the intrinsic nature of these
errors or biases in heterogeneous material sampling, in contrast to the negative con-
notations of these terms in the vernacular ("error" as a careless mistake; "bias" as an
intentional dishonesty). Variance, error, and bias are technical terms that describe dif-
fering problems with different solutions. An "error variance" is often thought of as
symmetric with a mean (expectation) of zero and as reducible by taking more sam-
ples; a "bias variance" is one-sided (e.g., always too high or too low) and is reducible
not by taking more samples but only through a correct sampling design. The sym-
metry or one-sidedness must be carefully thought out and often field-tested for all
potential variance in any sampling design and QA plan.
This theory sounds like any QA plan talking about errors, but it refers to a dif-
ferent type of error and needs to be discussed in its own part of the QA plan. Specif-
ically, it deals with intrasample error (errors within the sample) rather than inter-
sample error (errors between samples). The various components of variance of this
sampling theory sound trivially obvious when pointed out, but they are easily over-
looked in the stress of formulating a QA or sampling plan. Leaving them out can be
disastrous for QA and data quality objectives. Even though these sources of varia-
tion sometimes are obvious and trivial, they must be taken into account in every
environmental sampling plan.
The Fundamental Error
This component of variance is a natural property of heterogeneous material. It is not
an error in the sense of an avoidable mistake; however, if the sample planner does
not take it into consideration it will generate unnecessary (avoidable) variance in the
laboratory analyses. The variance is caused by the range of particle sizes in the
medium and the fact that often only certain sized particles contain the pollutant of
interest. This situation is illustrated in Figure 1; the shaded or lined particles are
assumed to contain or carry the pollutant, and the other particles are the heteroge-
neous medium. Thus the chemical analysis depends on two values: the number of
solid particles (percentage composition), and their concentration. This dependence
adds another variance term or component of variance (percentage composition) to
the analytic variance. The magnitude of this error is small in a fine or homogeneous
soil or sediment but becomes larger as the medium becomes more heterogeneous in
particle size and particle affinity for the pollutant of interest. This fundamental com-
ponent of variance can be reduced by increasing the mass of the sample or by reduc-
ing the particle size of the sample material by appropriate digestion.
To maintain the original level of accuracy, the sample material must always be
reduced in maximum particle size before being reduced in mass or volume (split or
aliquot). The mass of a sample required for a given relative variance [relative stan-
dard deviation (RSD) squared] can be read from Pitard's nomograms as a function of
various physical properties, the most important one being maximum particle size of
the medium (6).
Figure 1. Heterogeneous material: fundamental error. (Reproduced with permission from reference 2, Vol. 1. Copyright 1989 CRC Press.)
This relationship will be directly applicable to waste monitoring if
the pollutants of interest are heavy metals, but the application to volatile chemicals
or semivolatile chemicals remains to be developed. The EPA has a very readable doc-
ument on this subject that presents an example nomogram for soil properties (7).
The extension of Gy's theory to volatile chemicals and semivolatile chemicals is a
very important but as yet undeveloped part of environmental sampling.
Grouping and Segregation Error
There is potential for this variance in any heterogeneous media. The grouping and
segregation error develops through movement of samples through processing, han-
dling, shipping, or mixing. The heterogeneity may be in density or size (also adhe-
sion, cohesion, magnetism, affinity for moisture, and angle of repose of crystalline
structure) so that the particles come together by groups during any movement or
vibration. Figure 2 illustrates this type of error for the pile at the end of a conveyer
belt. If the black particles contain the pollutant of interest, then a sample from the
right side of the pile will be biased high and a sample from the left side will be
biased low. In taking a sample of a waste stream or pile, the potential variance can be
minimized by sampling along the gradient of grouping and segregation. For soil,
gravel, or sediment being carried on a conveyor belt, the gradient of grouping and
segregation would be across the belt orthogonal to the direction of motion, and thus
a correct sample would be a rectangular (not a trapezoidal) section oriented across
the belt. Sampling a pile, a truck, or a railroad car of waste in a correct manner is
very difficult because of this component of variance.
Figure 2. Grouping and segregation error. (Reproduced with permission from reference 2, Vol. 1. Copyright 1989 CRC Press.)
The correct time to sample is
before the pile is built or the truck or railroad car is loaded. In sample preparation,
Pitard suggests that the pouring of the well-mixed material from the V-blender, espe-
cially if the particulate material is allowed free fall of any distance, can undo (defeat)
the blending (8). Aliquoting increases this error. The general rule is that as aliquot size
decreases, the variance increases. Theoretically, as the size of the aliquot approaches
the size of the grains of the sample, this error grows larger without bounds. The corol-
lary to this theorem is the fact that the chemist, aliquoting to get the relatively small
amount of material (analytical sample) actually required for the analysis, can turn the
analytic equipment into a random number generator if the sample material has not
been ground to the required fineness and aliquoted correctly.
Spatial and Periodic Errors
These error sources could be periodic and/or spatial structures on the scale of the
extracted sample or the sample support (the in situ area or volume represented by
the sample). If they were of a larger scale they would be studied by a time series
analysis or a geostatistical analysis; at this scale, however, they are not of interest,
because the decision statistic is the mean of the unit and not the means of the subunits. In the preceding
discussion of classical statistics, the 55-gallon drum was assigned to a classical sta-
tistical analysis instead of a geostatistical analysis, even though there may have been
a structure in concentration in the vertical dimension of the drum. No one wants a
contour map of the concentration of pollutant inside of a drum because the drum
will be remediated (disposed of) as a unit. However, this gradient cannot be ignored;
instead it must be representatively sampled by sampling each layer proportional to
its volume. This sampling is accomplished by the choice of sampling tools. To min-
imize the microspatial variance, a "composite liquid waste sampler" (COLIWASA)
must be used. The name of the sampling tool tells an important principle. Com-
positing is an important tool in random variable statistics to save chemical analysis
costs, but in spatial statistics it is used to ensure that the sample is representative of
the in situ sample support. Subsample compositing is physically doing the same
thing that statistical averaging does to the numerical values of replicate samples,
except compositing loses the information about the variance or standard deviation,
with the benefit of saving the cost of (n - 1) chemical analyses. These are two quite
different and important uses of compositing.
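The equivalence noted here, that compositing n subsamples behaves like numerically averaging n analyses while discarding the between-subsample variance, can be sketched numerically (our illustration; the subsample mean and standard deviation are arbitrary assumptions).

```python
import numpy as np

rng = np.random.default_rng(1)
n_sub, n_units, sigma = 4, 10_000, 20.0   # 4 subsamples per unit; sd in ppm (assumed)
subsamples = rng.normal(100.0, sigma, size=(n_units, n_sub))

composite = subsamples.mean(axis=1)       # physical compositing ~ numerical averaging
print("sd of composite results:", composite.std())   # ~ sigma / sqrt(n_sub)
print("sigma / sqrt(n):        ", sigma / np.sqrt(n_sub))
# One chemical analysis per composite replaces n_sub analyses, but the
# within-unit variance (sigma) can no longer be estimated from the composites.
```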
Increment Delimitation and Extraction Errors
These two variances arise from the interaction of a sampling tool with the hetero-
geneity of the media sampled. The circles in Figures 3 and 4 can represent the cut-
ting edge of a plugging or coring device descending on the media to take a soil or
sediment plug or core. In Figure 3, taking the shaded area of the larger particles
would be the correct sample, but if the larger particles are hard compared to the
softer interstitial material, the tool will not cut through the harder particles to give
the desired correct sample. Rather, the large hard particles will be pushed out of the
sample if their centers of gravity lie outside the corer, as illustrated by the white
particles in Figure 4.
Figure 3. Increment delimitation error. (Reproduced with permission from reference 2, Vol. 2. Copyright 1989 CRC Press.)
Figure 4. Increment extraction error. (Reproduced with permission from reference 2, Vol. 2. Copyright 1989 CRC Press.)
If their centers of gravity fall within the corer, as illustrated by the
shaded particles in Figure 4, then the particles in their entirety will be included in
the sample. Either case is incorrect, but the two cases tend to average out. It is
important to distinguish these two concepts: (1) the delimitation error is the varia-
tion caused by the inability to cut through all the heterogeneous media and take the
part included in the circle of the coring or plugging device, and (2) the extraction
error is the variation caused by taking or pushing out of the way the whole hard par-
ticle as a function of whether its center of gravity falls in or out of the circle of the
corer or plugger. If the cylinder could be cut out exactly by a laser and then taken
out intact by levitation as in science fiction, these two errors could be avoided.
Today's solution to these problems is to have a corer or plugger that is at least two or
three times the diameter of the largest particle size.
Analytical Error
The EPA and the American Chemical Society have published many excellent papers,
proceedings, and books on this interdisciplinary subject. Therefore, to avoid dupli-
cation, we wish to speak only to the chemist's method of abstracting a much smaller
sample (analytical sample) from the prepared sample. This step, because of the
smallness of the mass of the analytical sample compared to the mass of the sample
from which it comes, is the sample most apt to incur an unacceptable magnitude of
Gy's fundamental error. If the analytical sample is taken by sticking a spatula ran-
domly into the top of the material in the bottle and taking out the desired amount,
such a sample is a grab sample and not an aliquot or split; the chemical analysis is
very apt to give a value that is incorrect for Gy's theory and unrepresentative for reg-
ulatory use.
For an example of the grinding and splitting or aliquoting needed to acquire a
correct and representative analytical sample, the critical path (A→B→C→D→E)
should be traced through Figure 5, a nomogram adapted from references 2 (Vol. 1)
and 7. (Grinding cannot be done, however, for volatile and semivolatile pollutants or
to the media for a leach test.) First the nomogram must be made for the specific site
(e.g., particle sizes and particle characteristics). The horizontal or x-axis is the sam-
ple weight in grams, and the vertical or y-axis is the RSD of the fundamental error;
both axes are in log scale. In the center of the nomogram is a family of linear graphs
that introduces the third variable, maximum particle size. Each particle size has its
own line, and each line represents one and only one particle size.
The two ways to reduce the y-axis intercept, the RSD of the fundamental error,
are: (1) to take a line with smaller particle size from the family of graphs, or (2) to
take a larger weight of sample on the x-axis. First, in the family of linear graphs, the
top line of the family represents the largest particle size, namely 75 mm, and inter-
cepts the largest RSD on the y-axis. The next lower line is 25.4 mm, and so on down
to the line with the lowest RSD, which is for a particle size of 0.2 mm. The 0.2-mm
line is probably representative of QA internal standards, in contrast with Super-
fund's definition of soil as <2 mm and the definition from the Resource Conserva-
tion and Recovery Act (RCRA) of soil as <9 mm. These disparities in sizes might
explain some of the bench chemist's problems with increasing variance or RSD (e.g.,
square root of relative variance) as samples come from QA internal standards, Superfund
samples, and RCRA samples.
Figure 5. Maximum particle size: preparation error. (Adapted from reference 2, Vol. 1, and reference 7.)
Second, each graph has a negative slope, which
shows that as the mass of sample on the horizontal axis for a given particle size
increases, the relative variance intercepted on the y-axis decreases. The horizontal
line labeled 15% RSD represents the target accuracy or maximum acceptable RSD. If
the maximum particle size of the material of interest is measured empirically to be
75 mm, and the pollutant of interest is one that can be pulverized or ground without
loss, such as a heavy metal (Pb), then from the intersection of the horizontal line of
maximum RSD = 15% and the downward sloping line of particle size 75 mm, the
necessary minimum sample weight can be read on the horizontal or x-axis as 100 g.
Thus the field technician or scientist must take a sample or composite subsamples
so that a field sample of 100 g or more is obtained.
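A rough programmatic counterpart to reading the nomogram is the commonly cited simplified form of Gy's fundamental-error relation, s^2 ≈ C d^3 / M, where s is the relative standard deviation, d the maximum particle size (cm), M the sample mass (g), and C a site-specific sampling constant. The sketch below is ours and the constant is purely illustrative; in the full theory C itself varies with particle size (e.g., the liberation factor), so the site-specific nomogram of Figure 5, not this one-term formula, governs the 10-g and 1-g masses quoted in the next paragraph.

```python
def minimum_sample_mass(d_cm: float, target_rsd: float, C: float) -> float:
    """Minimum sample mass (g) from the simplified relation s^2 = C * d^3 / M.

    d_cm       -- maximum particle size in centimeters
    target_rsd -- acceptable relative standard deviation (0.15 for 15%)
    C          -- sampling constant (g/cm^3); site-specific, here illustrative
    """
    return C * d_cm ** 3 / target_rsd ** 2

# Constant chosen only so that 75-mm material needs about the 100 g quoted
# in the text at 15% RSD; a real value must come from the site nomogram.
C_illustrative = 0.0053
print(minimum_sample_mass(7.5, 0.15, C_illustrative))   # ~100 g
```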
If the chemist is going to take an aliquot of 1 g for the analysis (analytical sam-
ple), then the preparation procedure must follow a path such as A→B→C→D→E
in Figure 5. To maintain the accuracy of the 100 g of field sample whose maximum
particle size is 75 mm, the digestion process must first grind and then split. Grind-
ing reduces particle size and splitting reduces the mass of the sample. Grinding is
going down on the nomogram from A to B representing pulverizing from a maxi-
mum particle size of 75 mm to a particle size of 25.4 mm, and aliquoting or splitting
is moving to the left along the 25.4-mm line on the nomogram from B to C, repre-
senting aliquoting or splitting the sample of 100 g to a sample of 10 g. The new crit-
ical mass due to the particle reduction or the location of C on the new smaller parti-
cle line is the last integer weight tick line that intersects the new particle line just
below the 15% relative error line. The amount of information in the 100 g of mate-
rial of maximum particle size 25.4 mm at B appears to have an order of magnitude
(axes in log scale) decrease in RSD. This apparent decrease is not true, because vari-
ance of an extracted sample is not reduced by grinding, and information is not created
by digestion, but it does mean that now we can split the sample mass down to the
new critical mass (10 g) on the current line (25.4-mm line) and still have the origi-
nal RSD of the 100-g sample, namely 15%. A nomogram path has no lower RSD
than its highest point (in this example, point A). Again, more digestion moves the
sample from C on the 25.4-mm line to D on the 6.35-mm line. No information is
created by grinding, but now the information in 100 g of 75-mm particles, namely
15% RSD, can be carried by a new critical mass of only 1 g as splitting or aliquoting
moves us along the 6.35-mm line to E.
The process makes sense if the would-be user remembers that grinding reduces
the critical mass needed to carry the same RSD and that the aliquoting or splitting
removes only the unneeded mass. One might well ask, "Why the broken path?
Wouldn't it be easier to grind all the way in one step and then split?" Yes, it would
be simpler, but it would require the grinding of a larger mass of sample; the stepwise
path minimizes the mass of material digested. In the interest of minimizing grinding
or preparation, Pitard suggests sieving the material so the part less than the new
maximum particle size falls through, and then grinding only the part that did not fall
through, remembering to recombine the two.
This process sounds a little complicated because it is complicated, but with
particle size analyses of the media of interest and with statistically guided prepara-
tion (pulverizing and splitting or aliquoting), a correct and representative analytical
sample can be prepared for the chemical analysis.
Spatial Variable Statistics
The old adage that a chain is as strong as its weakest link implies that the prudent
blacksmith will strengthen the weakest link and try to make all links equally strong.
The application to environmental sampling is that error variances are a chain: the
analytical variance, the sampling and handling variance, and the field variance are
links. The goal of quality improvement is to make the sum of the variances as small
as possible, and the cost-effective way to minimize this sum is to spend more
resources on the variance link that is improved most cheaply. Because of diminish-
ing returns in variance reduction, the optimal variance to reduce is often the biggest
one. The field sampling variance is often the appropriate link or variance to reduce.
Variance reduction is most obviously accomplished by taking more samples, but if
sampling or analytic costs are high, increasing samples may be too expensive. In
many cases, the field sampling variance is economically reduced by going from a
random to a spatial variable sampling design.
The term geostatistics was coined by Matheron (9) to describe the study of
regionalized or spatially correlated variables. In the past 20 years, the geostatistical
literature has grown enormously, and many significant developments in theory and
methodology have been presented. The practice of geostatistics has also spread from
its original applications in the mining industry to such fields as soil science, forestry,
meteorology, and environmental science.
The geostatistical methods described in this chapter, namely semivariograms
and ordinary kriging, represent two of the approaches available to us, and we
selected them primarily to illustrate geostatistical concepts and their implications for
sampling programs. A discussion of the pros and cons of alternate approaches, such
as generalized covariance and universal kriging, is beyond the scope of this chapter.
More extensive treatments of the subject can be found in references 3 and 10.
Random or Spatial Variables
Most field sampling plans are based on random variable statistics and assume that
the sample observations are independent and identically distributed (IID). However,
field samples are usually spatially correlated. Correlation is a statistical measurement
of the intuitive physical fact that samples taken close together are more similar in
value than samples taken farther apart. Neglecting this correlation can make the sta-
tistics, tests, and sampling procedures that assume independence (IID) inappropri-
ate (11, 12); using this correlation makes the statistics, tests, and sampling proce-
dures of spatial statistics more appropriate and powerful. A truly random variable is
completely described by its probability distribution. Samples are used to estimate
this distribution and to estimate statistical descriptors such as mean, median, and
standard deviation. In addition, spatial variables must be described by a measure of
the correlation between each value and the values at nearby locations. Samples can
be used to estimate the spatial correlation function and are frequently used to esti-
mate localized mean values for remediation units or exposure units.
Localized mean estimates are often displayed in the form of isopleths or con-
tour maps. A practical rule for the investigator is that if a contour map is a desired or
even a plausible end product of a proposed study, geostatistical methods should be
considered.
The implications for the design of a sampling program can be significant.
Although random sampling is appropriate for random variables, Olea (13) demon-
strated that the most effective sampling pattern for local estimation of spatial vari-
ables is the regular grid. Yfantis (14) evaluated triangular, square, and hexagonal
grids. Also, geostatistical studies commonly use a multiphase approach, and the first
sampling phase is oriented primarily toward estimating the spatial correlation (15).
Semivariograms for Quantifying Spatial Correlation
One way in which spatial correlation can be measured and displayed is by a semi-
variogram, or graph of the type shown in Figure 6. The dots are the empirical semi-
variogram representing experimental values computed from sample data; the fitted
curve is a theoretical semivariogram or an estimation of a spatial correlation function
assumed to be characteristic of the sampled area.
Figure 6. A typical semivariogram. (Reproduced from reference 16. Copyright 1988 American Chemical Society.)
The horizontal axis, called the lag
axis, is the distance between points in linear units such as meters or kilometers; the
vertical axis, called the gamma axis, is the variance of differences in pollution units
squared, such as parts per million squared. The experimental points are computed
by averaging data grouped into distance class intervals. Variance is a function of lag.
The rising nature of the points and curve follows the principle of sampling that
states the variance or difference between observations increases as the distance
between their locations increases.
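As an illustration of how the experimental points in a graph such as Figure 6 are computed, the following is a minimal sketch of the classical (Matheron) estimator, not the authors' code; the coordinates and concentrations are placeholders.

```python
import numpy as np

def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Classical estimator: gamma(h) = half the mean squared difference of
    sample pairs, with pairs grouped into distance classes of width lag_width."""
    coords, values = np.asarray(coords, float), np.asarray(values, float)
    n = len(values)
    gamma = np.zeros(n_lags)
    counts = np.zeros(n_lags, dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            h = np.linalg.norm(coords[i] - coords[j])
            k = int(h // lag_width)
            if k < n_lags:
                gamma[k] += 0.5 * (values[i] - values[j]) ** 2
                counts[k] += 1
    lags = (np.arange(n_lags) + 0.5) * lag_width
    with np.errstate(invalid="ignore"):
        return lags, np.where(counts > 0, gamma / counts, np.nan)

# Placeholder data: random locations (m) and lognormal concentrations (ppm).
rng = np.random.default_rng(2)
xy = rng.uniform(0, 1000, size=(50, 2))
z = rng.lognormal(3.0, 0.5, size=50)
lags, gam = empirical_semivariogram(xy, z, lag_width=100.0, n_lags=8)
print(np.round(gam, 1))
```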
Sill and Range of Correlation. Figure 6 is typical of many semivariograms of
chemical concentrations in the environment; the rise in variance has an upper
bound known as the sill. When the variance reaches the sill, sample locations are far
enough apart to make the samples independent. The distance on the lag axis at
which the semivariogram's curve reaches the sill is the range of correlation. This dis-
tance is important to the sampling plan, the estimation of pollution over the area
under investigation, and the interpolation error. The range of correlation explains a
practical relationship between spatial variables and random variables: random vari-
ables are field samples that are farther apart than the range of correlation, and spatial
variables are field samples closer together than the range of correlation. This range of
correlation is important for choosing the correct analysis; if a classical random vari-
able statistic is wanted, such as the mean or variance, then one type of sampling
design that would ensure spatial independence of the samples would be any sys-
tematic random design requiring that all samples are at least the range of correlation
apart (17). If a contour map of pollution isopleths or interpolation variance is
wanted, then as the sampling locations get closer together, the local interpolation
error decreases. Depending on the information wanted and the spacing of the sam-
ple locations, either random or spatial variance statistical analysis can be used on
field samples.
Variance Model. In Figure 6, on the vertical axis of the fitted model the variance
has two components, C0 and C1. The C1 component of the variance is the measure
of structural variation and has the characteristic of increasing variance between sam-
ple observations as the distance between sample locations increases. The C0 com-
ponent of the variance combines random variance factors, such as sampling and
analytical error, along with any unmeasured spatial variance that may exist at dis-
tances smaller than the sampling interval; C0 is constant for all lags. The relationship
of C0 to the need for compositing samples and the relationship of C1 to the distance
between sample locations will be discussed in a later section.
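One common way to express a fitted curve such as the one in Figure 6 is a spherical model with nugget C0, structural component C1, and range a. The sketch below is illustrative only and is not necessarily the model the authors used; the parameter values echo the smelter example given later in the chapter.

```python
import numpy as np

def spherical_semivariogram(h, c0, c1, a):
    """gamma(h) for a spherical model: nugget c0, partial sill c1, range a.
    gamma(0) = 0 by convention; the sill c0 + c1 is reached at h = a."""
    h = np.asarray(h, dtype=float)
    g = np.where(
        h <= a,
        c0 + c1 * (1.5 * h / a - 0.5 * (h / a) ** 3),
        c0 + c1,
    )
    return np.where(h == 0.0, 0.0, g)

# Example: 40% nugget, sill of 1.0 (relative variance), range 3200 ft.
print(spherical_semivariogram([0, 800, 1600, 3200, 5000], 0.4, 0.6, 3200.0))
```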
Anisotropy and Directional Semivariograms. The variance structure, as
measured by the semivariogram, is often different in the range of correlation in dif-
ferent directions. This condition is called anisotropy and must be measured by direc-
tional semivariograms. Directional semivariograms are computed experimentally by
grouping sample pairs into directional classes, or windows, as well as into distance
classes. The directional ranges of correlation can change the geometry of the sam-
pling grid and the orientation of the grid. Often, not enough preliminary data are
available to compute directional semivariograms, and thus the sampling design
must work with only an omnidirectional range of correlation. However, an omni-
directional range of correlation and a sampling design from it honor the variance-
covariance structure more than conventional random variable methods that con-
sider only a scalar variance.
Kriging for Surface Estimation
Kriging is a linear-weighted average interpolation technique used in geostatistics to
estimate unknown points or blocks from surrounding sample data. By assuming
that the spatial correlation function inferred from the experimental semivariogram
is representative of the points to be estimated as well as those sampled, the inter-
polation error (kriging error or kriging standard deviation) associated with any esti-
mate that is a linear-weighted average of sample values can be computed. The krig-
ing algorithm computes the set of sample weights that minimize the interpolation
error.
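For readers who want to see the mechanics, here is a minimal ordinary punctual kriging sketch (ours, not the authors') built on a spherical semivariogram like the one sketched earlier; the sample locations, values, and model parameters are made up. It solves the standard kriging system for the weights and returns the estimate and kriging variance.

```python
import numpy as np

def spherical(h, c0=0.1, c1=0.9, a=300.0):
    """Illustrative spherical semivariogram (nugget c0, partial sill c1, range a)."""
    h = np.asarray(h, float)
    g = np.where(h <= a, c0 + c1 * (1.5 * h / a - 0.5 * (h / a) ** 3), c0 + c1)
    return np.where(h == 0.0, 0.0, g)

def ordinary_kriging(coords, values, x0, gamma):
    """Ordinary punctual kriging: solve for the weights that minimize the
    interpolation error, then return (estimate, kriging variance)."""
    coords = np.asarray(coords, float)
    values = np.asarray(values, float)
    n = len(values)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(d)
    A[n, n] = 0.0                                   # Lagrange multiplier row/column
    b = np.ones(n + 1)
    b[:n] = gamma(np.linalg.norm(coords - np.asarray(x0, float), axis=1))
    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]
    return float(w @ values), float(w @ b[:n] + mu)

# Made-up samples (ft, ppm) purely to exercise the algebra.
xy = [(0, 0), (100, 0), (0, 100), (120, 130)]
z = [50.0, 80.0, 60.0, 90.0]
print(ordinary_kriging(xy, z, (60, 60), spherical))
```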
Kriging software usually offers both punctual and block output options. Punc-
tual kriging treats the input values as located at points and output estimates as values
at points. Block kriging estimates the output for an area or volume (called a block) by
averaging multiple points estimated over that area or volume. This difference is
determined in the sampling and becomes important in the data analysis (see the
subsection Sample Support and Estimation Blocks).
Kriging has a number of characteristics of a desirable estimation method: sam-
ple weights can be adjusted for anisotropy; samples in correlated clusters can be
down-weighted; the degree of smoothing increases as the random component (C0)
of the semivariogram model increases; and, when the semivariogram model is com-
pletely random (C1 = 0), the kriging estimator becomes the sample mean, as in
independent random sample statistics.
Spatial Outliers
Spatial outliers can be found by examining a geographical plot of the data; they may
fit into a random variable histogram of all the data very well. In other words, a spa-
tial outlier is a sample value that does not agree in magnitude with the values of its
neighboring samples, especially the samples within a range of correlation. For exam-
ple, a high (polluted) value in a low (background) neighborhood might be a spatial
outlier but not a random variable outlier because the high value agrees with other
polluted values. Once these outliers are identified, their location descriptions
should be looked up in the sampling diary. If they are obviously from different
sources that do not have the same correlation structure, they should be excluded
from the semivariogram evaluation. The question of whether to include a spatial
outlier in the final local estimate of concentration must be answered on a case-by-
case basis. This matter involves the investigator's judgment, just as in the case of
random variables.
The following discussion exemplifies an analysis of spatial outliers. Investigat-
ing the data from a city-wide sampling campaign for Pb, exploratory data analysis
showed an empirical semivariogram with a range of correlation of at least 6 miles
and two hot spots that were one order of magnitude higher in concentration than
the rest of the data. The data set was printed out on a geographical plot that showed
the two hot spots to be in sharp contradiction to their individual local neighbor-
hoods, that is, every neighboring point and every point one neighbor out was at
least one order of magnitude lower in concentration. The geographical map that
identified the freeway system and the data showed that both points seemed very
close to the freeways. In checking the sample log book this conclusion was con-
firmed; one of the aberrant samples was taken under a freeway overpass and the
other at a freeway on-ramp. Freeway Pb is said to have a range of about 500 feet.
Thus, because the two points represented a different source of Pb and had a much
shorter range, they were excluded from the semivariogram computations. However,
what was to be done with them in the kriging and mapping? If they were included
in the kriging, they would spread their high values over circular areas of 6 miles in
radius. This representation would be grossly untrue because the outliers are known
to have a different source and a shorter range of correlation. The mapping would
show a large area needing remediation that, in fact, did not need remediation. Nev-
ertheless, the values had been found, and users of the map (risk assessors) needed
to know of the hot spots. The compromise was to krige and contour the Pb concen-
trations of the other samples onto a kriging map and then just print the magnitude
of such outliers at their respective locations on the map.
Spatial Soil Sampling
The growing number and complexity of toxic chemicals and hazardous waste sites
call for a new statistical technique for monitoring with more efficient sampling
designs and more precise data analysis. Geostatistics is a promising tool for these
needs. This section traces the logic sequence of geostatistical analysis and then
draws together the implications of geostatistical sampling design for soil pollution
monitoring. Geostatistical sampling design has at least two phases: (1) the survey or
the preliminary sampling to find the extent of the plume and to estimate a semivar-
iogram, and (2) the census to take as many samples as needed to estimate the sur-
face within the desired accuracy as calculated from the semivariogram model.
Sample Support and Estimation Blocks
The basic assumption of geostatistical sampling is to define and assign area or vol-
ume to all inputs and outputs. In monitoring for environmental protection, the spa-
tial quantities to be defined and assigned are the sampling unit (area or volume), the
remediation unit, and the exposure unit. Geostatisticians call the sampling unit the
sample support. The sampling unit or support is ambiguous: it is used to refer to
both the amount of medium extracted for the sample and the in situ area or volume
represented by the sample. The context usually identifies whether the extracted sup-
port or the in situ support is meant. The remediation unit is determined by the
method of remediation, and the exposure unit is determined by the risk assessor.
For example, an appropriate remediation block might be a volume 250 ft long, 16 ft
wide, and 0.5 ft thick, because this amount was the minimum volume to move eco-
nomically. The shape is dictated by the up-and-back pass of a bulldozer with an 8-ft
blade that scrapes up one truckload of contaminated soil. Sample unit, remediation
unit, and exposure unit need to be defined (18) and then incorporated by a geostat-
istician into the sampling plan.
The critical mass of a correct sample should be calculated as previously
explained (see the subsection Analytical Error). The spatial variance of the sampling
unit should be measured by taking "too many" equally spaced samples in several
units in an exploratory sampling trip to the site. If the sampling unit has a large spa-
tial variance (a large spatial variance can be encountered in a small area), then the
field sampling design will have to use composite samples. In spatial compositing the
geometry as well as the mass of the subsamples (samples to be composited) is
important. The general rule is that subsamples should be equally spaced on the
sampling unit. For example, if four subsamples can be afforded, then one should be
taken from each quarter of the in situ support. Each subsample for the compositing
should be a correct sample (see the subsection Analytical Error). All samples for all
analyses, even eye-balling, should have the same representativeness, which for com-
posite samples means the same number of subsamples. The composite field sample,
just like any other average, has its variance divided by the number of subsamples.
Homoscedasticity (equality of variances) is a requirement for every data analysis, even
when eye-balling the data. If the quantity to be estimated (e.g., remediation or expo-
sure unit) equals the sampling unit, punctual kriging analysis may be used because
there is no change of scale or support. If the desired area or volume of estimation is
larger than the sampling unit, block kriging will have to be used.
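As a small illustration of the "equally spaced subsamples" rule above (our sketch; the square support and the choice of four subsamples are assumptions), the quarter-center layout for a square sampling unit can be generated as follows.

```python
def quarter_centers(x_min, y_min, side):
    """Centers of the four quarters of a square sampling unit: one equally
    spaced subsample per quarter, for a four-subsample composite."""
    q = side / 4.0
    return [
        (x_min + q, y_min + q),
        (x_min + 3 * q, y_min + q),
        (x_min + q, y_min + 3 * q),
        (x_min + 3 * q, y_min + 3 * q),
    ]

# Example: a 20 m x 20 m residential-yard support with origin at (0, 0).
print(quarter_centers(0.0, 0.0, 20.0))
```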
Survey or Semivariogram Sampling
In a multiphase sampling program using spatial statistics, the primary goal in the
initial exploratory sampling is the collection of enough data to compute an empiri-
cal semivariogram and to determine the extent of the plume. These goals may con-
flict if limited resources are available. Widely spaced samples are needed to define
extent, and closely spaced samples are often needed for semivariogram analysis.
Approaches to this problem include regular grids (i.e., radial, square, or rectangu-
lar), transects, and combinations.
Burgess et al. (19) suggested transect sampling for variogram input, and this
idea led to very good variograms in agricultural applications. However, in pollution
monitoring, transects alone have given very noisy variograms. This result is probably
due to intrinsic noise in pollution data, which is often highly skewed and contains
high coefficients of variation. A combination exploratory grid, consisting of a grid of
square sampling units having extended transects in the directions of the major axis
and minor axis of the estimated plume (20), is illustrated in Figure 7. Prior informa-
tion may be used to select the best grid orientation. For example, if the plume to be
investigated was made by aerial deposition from an identifiable source, then wind
roses can be examined for wind direction and magnitude, and topographic maps
can be examined for natural barriers. Only the relatively regular grid concept is
important in Figure 7; the orientation is site-specific.
If the extent of the plume must be found, and funds are limited, then the tran-
sect samples should be variably spaced closer together at the grid center and farther
apart at the grid extremes. The purpose of this sampling is to capture the correlation
structure of the plume. Inhabited areas have a high occurrence of disturbed sam-
pling sites and local pollution from secondary sources, which are only stochastic
noise to the semivariogram's calculation. Therefore, this noise should be avoided by
this sampling. For example, aerially deposited smelter Pb should not be mixed with
auto Pb by taking samples along the freeways. The samples from the semivariogram
sampling can be pooled with the secondary mapping samples if they have the same
support.
However, the semivariogram sampling often is the sampling that tests for the
need for more compositing. If the support is changed between the samplings and
we wish to pool the samples for analysis, then the change in support must be cor-
rected before pooling. The sampling team must be aware of the need to keep all
samples on the same support. When compositing, the same number and mass of
subsamples and the same spacing or geometry must be maintained.
Figure 7. An exploratory grid design. (Reproduced from reference 16. Copyright 1988 American Chemical Society.)
When the sam-
pled locations must move from the regular grid to avoid cultural improvements or
natural barriers, then the spatial analysis program is corrected for this movement by
the true coordinates of the new sample locations; however, no easy method is avail-
able for the program to correct for change of support. If the microvariation could be
sampled and the support established before the semivariogram sampling, then a
complex statistical problem could be avoided in the pooling of samples for the spa-
tial analysis.
Some samples should be taken close together (in the scale of the sampling
unit) to determine the need for composite samples. This sampling can be combined
with field duplicates for quality analysis and control. Gy's fundamental error and
compositing become more important as coring volumes decrease. These microvaria-
tion samples should be taken at a distance of a few multiples of the core's diameter
apart. The distance between sample locations or grid unit's length needs to be esti-
mated from the sample unit of interest (e.g., residential yard, city block, or square
mile section) and the desired output unit (e.g., remediation unit, that is, the mini-
mum volume of surface soil to be removed). The optimum exploratory sampling
distance is a proper fraction of these measurements, but it is often determined by
money available for sampling.
Census or Sampling for Map Making
In spatial statistics, the goal of secondary sampling is to uniformly cover the area in
question with a density of samples sufficient to contour the plume with an accept-
able error of interpolation. This sample coverage is accomplished by using the direc-
tional semivariograms to determine the orientation, shape, and size of the grid cell.
Independent random variable statistics, in which the number of samples is com-
puted, differs from spatial statistics, in which orientation, shape, and size of the grid
are calculated and the number of samples is determined from the number of grid
cells needed to cover the area.
If the directional semivariograms have a marked difference in their respective
ranges of correlation, then the optimum cell geometry is not a square but a rectan-
gle with the longer side in the direction of the longer range of correlation, and the
ratio of the sides should be the ratio of the ranges of correlation. Thus the grid cell
sides are of equal correlation or kriging (interpolation) variances rather than equal
distance. This characteristic will save a lot of samples while retaining the same accu-
racy in both directions.
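To make the rectangle rule concrete, this sketch (ours; the ranges, spacing fraction, and site dimensions are placeholders) sizes a rectangular cell so that its sides are in the ratio of the directional ranges of correlation and counts the resulting grid nodes.

```python
import math

def rectangular_grid(range_major_ft, range_minor_ft, fraction, site_ft):
    """Grid cell proportional to the directional ranges of correlation.

    fraction -- chosen fraction of each range used as grid spacing
    site_ft  -- (length along major axis, length along minor axis)
    Returns ((dx, dy), approximate number of grid nodes)."""
    dx = fraction * range_major_ft      # spacing along the long-range direction
    dy = fraction * range_minor_ft      # spacing along the short-range direction
    nx = math.floor(site_ft[0] / dx) + 1
    ny = math.floor(site_ft[1] / dy) + 1
    return (dx, dy), nx * ny

# Placeholder example: ranges of 3200 ft and 1600 ft, spacing at 1/3 of each
# range, over a 2-mile by 1-mile area.
print(rectangular_grid(3200.0, 1600.0, 1.0 / 3.0, (10560.0, 5280.0)))
```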
Boundary. For secondary sampling, the extent of the sampling grid must first be
chosen. The sampling grid must extend beyond the suspected plume or area in
question. The area in question must be bounded by sampling locations to avoid
extrapolation in the kriging estimation algorithm for contouring. Extrapolation,
which is estimating a value from data on only one side of the location of the point to
be estimated, is likely to lead to unrealistically high or low values. If an action level
has been set and a part of the plume has been adequately proven to be above or
below the action level, then that part of the plume need not be resampled. The sam-
pling may be guided more by population areas or critical receptors than by the
actual plume. The goals of the sampling must be written, and the areas of interest,
action levels, and action areas (sampling unit, remediation unit, and exposure unit)
must be defined before the optimum grid design can be made.
Compositing Samples Reduces Nugget. The next step in secondary sampling
is choosing the sample support (21). If a residential yard is the sampling unit, then
the ideal sampling process would be to take the entire yard, blend it to homogene-
ity, and remove the appropriate number of aliquots or splits to meet the volume
needed by the laboratory for analysis. However, because few residents would donate
their whole yard to science, and laboratory mixing equipment such as V-blenders or
ball mills cannot homogenize so large a volume, this sampling unit must be repre-
sented by a few symmetrically laid out subsamples composited together. The num-
ber of subsamples is a compromise between the size of the microvariance and the
amount of time and money allowed for the digestion of the subsamples. The sub-
samples are laid out symmetrically because a structural or spatial correlation may
exist.
The mixing of the subsamples to achieve homogeneity is essential for com-
positing. If the medium is water, then the task is relatively easy; for soils or sedi-
ments, the task is difficult. Aliquots or splits should be taken after the mixing to
make the final sample more representative. If a large nugget (e.g., C0 > 0.3 relative
variance) persists after Gy's critical mass calculations and compositing within the
support, then the relative sizes of the field sampling and the laboratory analysis
errors must be identified. The analysis of some pollutants has an analytical error that
overwhelms the field sampling error and accounts for approximately all the semivar-
iogram nugget.
The minimum volume at each step and especially the aliquot used by the
chemical analyst in the lab must exceed the critical mass referred to in Gy's theory
(see the section Pierre Gy's Sampling Theory).
Grid Unit Length or Distance Between Sample Locations. The range of
correlation, the nugget (C0), and the sampling budget determine the grid unit
length, or the distance between sample locations. This length determination was dis-
cussed in mathematical detail by Yfantis et al. (14). Figure 8 shows the graphs of
interpolation variance as a function of the ratio of grid spacing to range of correlation
for a family of semivariograms. The model variograms each have relative C1 and C0
so that their sum equals 100%. The variograms differ only in the fraction of the sill
(C0 + C1) represented by the nugget component (C0) and the structure (C1). If the
semivariogram has a big nugget, like the top graph with C1 = 10% and C0 = 90%, then
diminishing returns (the curve has less rapid vertical drop and becomes more hori-
zontal) start and increase if the sample distance is less than two-thirds of the range
of correlation. For a very low nugget, such as the lowest graph (C1 = 100% and C0 = 0%),
diminishing returns do not start and increase until the sampling distance is less than
one-half of the range of correlation.
Figure 8. Diminishing information for additional samples. (Reproduced from reference 16. Copyright 1988 American Chemical Society.) [Interpolation variance versus the ratio of distance between samples to range of correlation, for semivariogram models ranging from C1 = 10, C0 = 90 to C1 = 100, C0 = 0.]
The general rule is that for smaller
nuggets (C0), the distance between sampling points on the sampling grid gets
smaller. The grid should be laid out with no vertices unsampled. If this design
exceeds budget, then the whole grid size should be adjusted, not just certain ver-
tices left unsampled as in systematic random sampling.
Some real-world examples can clarify how the magnitude of the nugget (C0)
and the range of correlation determine the optimum cell size or distance between
samples. One Pb smelter had a nugget of about 40% and a range of correlation of
3200 ft. In Figure 8, the family of diminishing return curves and the graph (for C0
= 40%) indicates by observation and judgment that the point of diminishing
returns is between one-third (33.3%) and one-fourth (25%) of the range of correla-
tion, or 29% for the sake of argument. The sampling distance should not be less
than 29% x 3200 ft, or 928 ft. Expressed as a function of money, the sampling dis-
tance should be the shortest affordable distance in keeping with the toxicity of the
pollutant, but not less than 928 ft between samples. In contrast, a second Pb
smelter had a semivariogram with a nugget of zero (0) and a range of correlation of
2400 ft. In Figure 8, the curve of diminishing returns for C0 = 0 indicates by obser-
vation and judgment that the point of diminishing returns is between one-fourth
(25%) and one-fifth (20%) of the range of correlation, or 22.5% for the sake of argu-
ment. In this case, the sampling distance should not be less than 22.5% x 2400 ft,
or 540 ft. Expressed as a function of money, the sampling distance should be the
shortest affordable distance in keeping with the toxicity of the pollutant, but not less
than 540 ft. If the funding is adequate and die pollutants are of extreme toxicity,
-------
800 PRINCIPLES OF ENVIRONMENTAL SAMPLING
then the distance indicated by the point of diminishing returns should be used in
minimum interpolation variance. If there is less money and the pollutant is less
toxic, then a longer distance should be used for the grid cell's side. The directional
semivariograrns should orient the sampling grid.
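The two smelter recommendations reduce to one multiplication; a minimal sketch, using the judgment fractions quoted above:

```python
# The smelter examples as arithmetic (the diminishing-returns fractions are the
# judgment values read from Figure 8 in the text above).
def minimum_spacing(range_ft, diminishing_returns_fraction):
    return diminishing_returns_fraction * range_ft

print(minimum_spacing(3200, 0.29))    # first Pb smelter, C0 = 40%: 928 ft
print(minimum_spacing(2400, 0.225))   # second Pb smelter, C0 = 0%: 540 ft
```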
The Pb smelters mentioned previously worked well with an east-west and north-south grid because the plume was formed by 80 years of aerial deposition. A third set of data, dioxin along a highway, gave a readable semivariogram in a direction 13 degrees from east-west. This discovery took much searching because we started with the default directions (0, 45, 90, and 135 degrees) of the semivariogram software; these default semivariograms showed no structure [pure nugget semivariograms (C0 = 100%)]. After we discovered the semivariogram at 13 degrees, the reason became obvious: the road that was the transport path of the pollutant ran at that angle, and so should any sampling grid.
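A directional experimental semivariogram of the kind used in that search can be sketched as follows; this is a generic implementation under assumptions of my own (lag classes, angular tolerance), not the software used in the study.

```python
# Sketch of an experimental directional semivariogram; sweep the azimuth
# (e.g., 0, 45, 90, 135 degrees, then finer steps) to find the direction with structure.
import numpy as np

def directional_semivariogram(xy, z, azimuth_deg, lag, n_lags, angle_tol_deg=22.5):
    """xy: (n, 2) array of coordinates; z: (n,) array of concentrations."""
    az = np.deg2rad(azimuth_deg)
    u = np.array([np.cos(az), np.sin(az)])        # unit vector of the search direction
    gam = np.zeros(n_lags)
    npairs = np.zeros(n_lags)
    n = len(z)
    for i in range(n):
        for j in range(i + 1, n):
            d = xy[j] - xy[i]
            h = np.hypot(d[0], d[1])
            if h == 0.0:
                continue
            # keep only pairs whose separation vector lies near the chosen azimuth
            ang = np.degrees(np.arccos(np.clip(abs(d @ u) / h, 0.0, 1.0)))
            if ang > angle_tol_deg:
                continue
            k = int(h // lag)
            if k < n_lags:
                gam[k] += 0.5 * (z[i] - z[j]) ** 2
                npairs[k] += 1
    semivariance = np.divide(gam, npairs, out=np.full(n_lags, np.nan), where=npairs > 0)
    return semivariance, npairs

# Hypothetical call for the dioxin example:
# gam, counts = directional_semivariogram(xy, z, azimuth_deg=13, lag=100.0, n_lags=10)
```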
In the field, some vertices cannot be sampled because of man-made improvements or natural barriers; these vertices must be sampled as closely as possible, and the actual coordinates should be used in the spatial analysis program.
Grid Orientation and Shape Versus Anisotropy. If the ranges of correlation are extremely different on the directional semivariograms, then the correlation structure is anisotropic. Optimum sampling patterns reflect this anisotropy. For example, the sides of a rectangular grid would be in the same ratio as the ranges of correlation for the corresponding directional semivariograms. This ratio was explained in detail by David (22), and a sampling design for logarithmic anisotropy was derived by Barnes (23). Anisotropy is a frequent occurrence, but often the semivariogram sampling gathers too few samples to measure it. Thus, more samples may be used cost-effectively in the semivariogram sampling in order to save samples in the larger census (or mapping) sampling by identifying and taking advantage of anisotropy.
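One way to read the "same ratio" rule in practice, under assumptions of my own (one sample per rectangular cell and a fixed sample budget over the site area), is the small calculation below; the numbers are hypothetical.

```python
# Illustrative calculation: rectangular cell sides proportional to the directional
# ranges, sized so the sample budget covers the site area (one sample per cell).
import math

def anisotropic_cell(range_major_ft, range_minor_ft, site_area_ft2, n_samples):
    cell_area = site_area_ft2 / n_samples          # one sample per cell
    ratio = range_major_ft / range_minor_ft        # anisotropy ratio
    side_minor = math.sqrt(cell_area / ratio)
    side_major = ratio * side_minor
    return side_major, side_minor

# Hypothetical site: 2:1 anisotropy, 4,000,000 ft^2, budget of 100 samples.
print(anisotropic_cell(2000, 1000, 4_000_000, 100))
```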
Use of the triangular grid as opposed to the rectangular grid has been discussed (13, 14). If the nugget is large (C0 >> C1), little is gained by the triangular grid. Also, the triangular grid makes taking advantage of anisotropy more difficult. If a triangular grid is chosen, a theodolite, which is a surveying instrument, is not needed in the field; instead, every other row of samples must be offset by one-half of a grid length. In practice, this action is easier than it sounds and almost as easy as the traditional square grid.
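Laying out such an offset grid is only a loop; the sketch below follows the half-grid-length offset described above (a strictly equilateral triangular grid would additionally shrink the row spacing by a factor of about 0.87, which the text does not require).

```python
# Sketch of an offset ("triangular") grid: every other row is shifted by half a
# grid length, which in practice needs no special surveying instrument.
def offset_grid(n_rows, n_cols, spacing):
    points = []
    for r in range(n_rows):
        shift = spacing / 2 if r % 2 else 0.0       # offset alternate rows
        for c in range(n_cols):
            points.append((c * spacing + shift, r * spacing))
    return points

print(offset_grid(3, 4, 100.0)[:6])
```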
Beyond Anisotropy
Numerous additional geostatistical considerations affect environmental sampling.
These considerations include spatial drift or trend, multivariate analysis, mixed or
overlapping populations, concentration-dependent variances, and specification of
confidence limits. Geostatistical techniques have been developed over the years to
deal with these various problems, but an adequate discussion is beyond the scope of this chapter.
-------
Acknowledgments
The EPA, through its Office of Research and Development, funded and performed
the research described here.
References
1. Gilbert, R. O. Statistical Methods for Environmental Pollution Monitoring; Van Nostrand Reinhold: New York, 1987.
2. Pitard, F. F. Pierre Gy's Sampling Theory and Sampling Practice; CRC Press: Boca Raton, FL, 1989; Vols. 1 and 2.
3. Isaaks, E. H.; Srivastava, R. M. An Introduction to Applied Geostatistics; Oxford University Press: New York, 1990; pp 1-592.
4. Pitard, F. F. In Pierre Gy's Sampling Theory and Sampling Practice; CRC Press: Boca Raton, FL, 1989; Vol. 2, p 36.
5. Gilbert, R. O. In Statistical Methods for Environmental Pollution Monitoring; Van Nostrand Reinhold: New York, 1987; pp 35-42.
6. Pitard, F. F. In Pierre Gy's Sampling Theory and Sampling Practice; CRC Press: Boca Raton, FL, 1989; Vol. 1, pp 169-183.
7. Preparation of Soil Sampling Protocols: Sampling Techniques and Strategies; Center for Environmental Research Information: Cincinnati, OH, 1992; pp A1-A16; EPA/600/R-92/128.
8. Pitard, F. F. In Pierre Gy's Sampling Theory and Sampling Practice; CRC Press: Boca Raton, FL, 1989; Vol. 1, p 190.
9. Matheron, G. Econ. Geol. 1963, 58, 1246-1266.
10. Journel, A. G. Geostatistics for the Environmental Sciences; Stanford University: Stanford, CA, 1986.
11. Palmer, M. W. Vegetatio (Dordrecht, Netherlands) 1988, 75, 91-102.
12. Cliff, A. D.; Ord, J. K. Spatial Processes: Models and Applications; Pion: London, 1981; pp 1-266.
13. Olea, R. A. Math. Geol. 1984, 16(4), 369-392.
14. Yfantis, E. A.; Flatman, G. T.; Behar, J. V. Math. Geol. 1987, 19(3), 183-205.
15. Flatman, G. T.; Yfantis, E. A. Environ. Monit. Assess. 1984, 4, 335-349.
16. Flatman, G. T.; Englund, E. J.; Yfantis, A. A. In Principles of Environmental Sampling; Keith, L. H., Ed.; ACS Professional Reference Book; American Chemical Society: Washington, DC, 1988; pp 73-84.
17. Borgman, L. E.; Quimby, W. E. In Principles of Environmental Sampling; Keith, L. H., Ed.; ACS Professional Reference Book; American Chemical Society: Washington, DC, 1988; pp 25-43.
18. Neptune, D.; Brandy, E. E.; Messner, M. J.; Michael, D. I. Hazard. Mater. Control 1990, May/June, 19-25.
19. Burgess, T. M.; Webster, R.; McBratney, A. B. J. Soil Sci. 1981, 32, 643-659.
20. Starks, T. H.; Brown, K. W.; Fisher, N. J. In Quality Control in Remedial Site Investigation; American Society for Testing and Materials: Philadelphia, PA, 1986; Vol. 5, pp 57-66; ASTM STP 925.
21. Starks, T. H. Math. Geol. 1986, 18(6), 529-537.
22. David, M. Geostatistical Ore Reserve Estimation; Elsevier Scientific: Amsterdam, 1977.
23. Barnes, M. G. Statistical Design and Analysis in the Cleanup of Environmental Radionuclide and Other Spatial Phenomena; TRAN STAT (Statistics for Environmental Studies) No. 13; Battelle Memorial Institute: Richland, WA, 1980; pp 1-21.
Reprinted from ACS Professional Book
Principles of Environmental Sampling
Lawrence H. Keith, Editor
Published 1996 by the American Chemical Society
-------
Welcome to the 12th Annual EPA Conference on Statistics
After a year's hiatus, it is a pleasure to welcome you to the 1997 EPA Conference on Statistics. What a difference a year can make! Last year, we postponed the conference and held a series of "local" training sessions. While these were a great success, they were local and accessible only to those folks in the Washington or Raleigh/Durham areas. I heard from many people that while we may have successfully circumvented the travel issue, the opportunity was not available to everyone.
So here we are all together again in Richmond for a conference that will include all the elements that we have heard you want. There will be formal sessions, plenary sessions, workshops, training, round tables, rectangular tables, panel discussions, poster sessions, and outstanding speakers. The theme this year is "Future Directions for Statistics at EPA". And what could be a better time to discuss future directions than right after the Administrator's announcement of the creation of the new EPA Center for Environmental Information and Statistics. In fact, we have changed the schedule around in a last-minute upheaval to arrange for Agency officials responsible for the CEIS to be here to present their vision and respond to your comments and suggestions.
I also want to encourage you to avail yourself of the informal opportunities here to discuss common questions and concerns with fellow statisticians. It's no secret that some of the best information is garnered in the hallways, over dinner, or while waiting for the elevator. We are always anxious to make it even better. I want to thank the planning and arrangements committees for their efforts to organize this conference and Margaret Conomos for her assistance in coordinating transportation. It always looks deceptively easy, and we owe it to the hard work of these people that it is easy for the rest of us. Special thanks are in order for Marcia Gardner of SRA Technologies, Inc., who handled all the details so well.
Barry D. Nussbaum
Conference Chairman
Conference Planning Committee
John Fox
Henry Kahn
Elizabeth Margosches
John Warren
EPA 230-R-97-003
Arrangements Committee
Joan Bundy
Pat Wilkinson
-------
AGENDA
-------
TWELFTH ANNUAL EPA CONFERENCE ON ENVIRONMENTAL STATISTICS
Richmond, VA - April 1-3, 1997
AGENDA
Tuesday, April 1, 1997
REGISTRATION - Conference Area Foyer 9:30 - 10:30 am
OPENING SESSION - Grand Ballroom Section B 10:30 am - 12:00 pm
Welcome & Introductions - Barry Nussbaum, Chair, Conference Planning Committee
Keynote Speaker: N. Phillip Ross, Director, Center for Environmental Statistics
"Center for Environmental Information and Statistics (CEIS)"
Followed by an Interactive Discussion with the CEIS Development Staff
Lunch Break 12:00 - 1:15 pm
TRAINING SESSION I-A - Georgian/Elizabethan Rooms 1:15 - 4:45 pm
EnvironmentalStats for S-PLUS: Software for Environmental Statistics
Presenters: Steven Millard, PSI, and Nagaraj Neerchal, University of Maryland Baltimore Campus
SESSION I - Raleigh/Drake Rooms 1:15 - 2:45 pm
Cancer Statistics, Epidemiology and Genetics. Chair: Ruth Allen, NCI
"Atlas of Cancer Mortality in the United States, 1970-92" Presenter: Susan Devesa, NCI
"Evaluating Disease Cluster Alarms" Presenter: Martin Kulldorff, NCI
Break - Conference Area Foyer 2:45 - 3:00 pm
TRAINING SESSION I-A - (continued) 3:00 - 4:45 pm
SESSION II - Raleigh/Drake Rooms 3:00 - 4:45 pm
Representativeness in Statistics and Quality Assurance. Chair: John Warren, ORD
Presenters: John Warren, ORD, and Malcolm J. Bertoni, RTI
ROUNDTABLE DISCUSSIONS 4:45 - 6:00 pm
GROUP A - Georgian/Elizabethan Rooms
Statistics & Health. Facilitators: Ruth Allen, NCI, and Elizabeth Margosches, OPPTS
GROUP B - Raleigh/Drake Rooms
Quality Assurance. Facilitator: John Warren, ORD
GROUP C - Grand Ballroom, Section B
Statistical Research. Facilitators: Barry Nussbaum, OPPE, and Larry Cox, ORD
GROUP D - Milliard Room
Risk & Uncertainty. Facilitator: Barnes Johnson, OSWER
-------
Wednesday, April 2, 1997
SESSION III - Georgian/Elizabethan Rooms 8:45 - 10:15 am
How Severe Is It? Chair: Elizabeth Margosches, OPPTS
"Toxic Severity for a Useful and Understandable Benchmark Dose" Presenter: Linda Teuschler, ORD
"Severity Analysis Using Ridits" Presenter: Mary Marion, OPPTS
SESSION IV - Raleigh/Drake Rooms 8:45 - 10:15 am
Exposure Assessment. Chair: John Fox, OW
"Interpreting Data from a National Survey of Protozoan Pathogens in Drinking-Water Sources"
Presenter: John Fox, OW
"Relationships Between Dioxins in Soil, Air, Ash, and Emissions from a Municipal Solid Waste Incinerator Emitting Large Amounts of Dioxins"
Presenter: Matthew Lorber, ORD
"Statistical Modeling of Dioxin Concentration Data from Sediment Cores"
Presenter: Paul Pinsky, ORD
Break - Conference Area Foyer 10:15 - 10:30 am
PANEL DISCUSSION - Grand Ballroom, Section B 10:30 am - 12:00 pm
EPA Cooperative Agreements. Chair: Barry Nussbaum, OPPE
Participants: Larry Cox, ORD; Peter Guttorp, University of Washington; G. P. Patil, Pennsylvania State University
TRAINING SESSION I-B - Georgian/Elizabethan Rooms 1:15 - 4:45 pm
EnvironmentalStats for S-PLUS: Software for Environmental Statistics
Presenters: Steven Millard, PSI, and Nagaraj Neerchal, University of Maryland Baltimore Campus
TRAINING SESSION 2 - Raleigh/Drake Rooms 1:15 - 3:30 pm
Spatial Statistics Sampling. Chair: George Flatman, EPA, Las Vegas
"Spatial Sample Design" Presenter: Evan Englund, EPA, Las Vegas
"Skewed Frequency Distributions" Presenter: George Flatman, EPA, Las Vegas
Break - Conference Area Foyer 2:15 - 2:30 pm
TRAINING SESSION I-B (continued) 2:30 - 4:45 pm
TRAINING SESSION 2 - (continued) 2:30 - 3:30 pm
SESSION V - Raleigh/Drake Rooms 3:30 - 4:45 pm
Applications of Statistical Calibration Techniques in Analyzing Environmental Data
Chair: Bimal Sinha, University of Maryland Baltimore Campus (UMBC)
"Confidence Regions and Tests in a Calibration Problem"
Presenter: Thomas Mathew, UMBC
MINI SESSION A - Milliard Room 4:45 - 5:30 pm
Water Quality & Fishy Statistics. Chair/Presenter: Henry Kahn, OW
"Recent Developments in the Estimation of US Fish Consumption"
MINI SESSION B - Georgian/Elizabethan Rooms 4:45 - 5:45 pm
Statistics and the Internet. Chair/Presenter: Chapman Gleason, OPPE
"Using the Web and Other Networking Technologies in Supporting SAS for the Enterprise"
-------
AGENDA - page 3
Wednesday, April 2, 1997 (continued)
RECEPTION & POSTER PRESENTATIONS - Capitol Room 5:30 - 6:45 pm
Pesticide Residue Monitoring Data
Presenter: Edward Brandt, EAB, OPP, OPPTS
A Master Sampling Frame for the Collection of Non-Agricultural Pesticide Usage Data
Presenter: Alan R. Goozner, EAB, OPP, OPPTS
The National Air Quality and Emissions Trends Report, 1995
Presenter: David Mintz, OAR
Thursday, April 3, 1997
SESSION VI - Grand Ballroom, Section B 8:45 - 10:15 am
Statistics of Measurement in Analytical Chemistry
Chair: Henry Kahn, OW
"A Two Component Model for Error in Analytical Chemistry and Issues of Detection and Quantification"
Presenter: David M. Rocke, Director, Center for Statistics in Science and Technology, University of California, Davis
"Estimation of Precision of Low Concentration Chemical Analytical Measurements and Establishment of Detection and Quantification Limits"
Presenters: Henry Kahn, OW, Kathleen Stralka and Raphael Kuznetsovski, SAIC
Break - Conference Area Foyer 10:15 - 10:45 am
CLOSING SESSION - Grand Ballroom, Section B 10:45 am - 12:30 pm
Featured Speaker: Daniel B. Carr, School of Information Technology and Engineering, George Mason University
"Statistical Graphics for Environmental Applications: Developments and Challenges"
Bus to EPA Headquarters leaves at 1:30 pm
-------
ATTENDEE LIST
-------
Twelfth Annual EPA Conference
on Environmental Statistics
List of Attendees
Ruth Allen
National Cancer Institute
Division of Cancer Epidemiology
and Genetics
6130 Executive Boulevard, MSC 7395
EPN Room 535
Bethesda, MD 20852-7395
(301)496-1609
Fax: (301)402-4279
Allenr@epndcc.nci.nih.gov
Robin Anderson
OAR/ORIA
U.S. EPA (6603J)
401 M Street, SW
Washington, DC 20460
(202) 233-9385
Fax: (202)233-9650
Anderson.Robin@epamail.epa.gov
David Annett
Support Contrator for NCI (SEER Program)
IMS, Inc.
12501 Prosperity Drive, Suite 200
Silver Spring, MD 20904
(301)680-9770
Fax: (301)680-8304
David_Annett@nih.gov
Lara Autry
OAR/OAQPS/EMAD
U.S. EPA (MD-19)
Research Triangle Park, NC 27711
(919)541-5544
Fax: (919)541-1039
Autry.Lara@epamail.epa.gov
Jeff Beaubur, Ph.D.
OPPTS/HERD
U.S. EPA (7403)
401 M Street, SW
Washington, DC 20460
(202) 260-2263
Fax: (202)260-1279
Malcolm Bertoni
Research Triangle Institute
401 M Street, NW, Suite 740
Washington, DC 20460
(202) 728-2067
Fax: (202)728-2095
mjb@rti.org
Ed Brandt
OPPTS/OPP
U.S. EPA (7503W)
Office of Pesticides
401 M Street, SW
Washington, DC 20460
(703) 308-8050
Fax: (703)308-8151
Brandt.Edward@epamail.epa.gov
Lori Brunsman
OPPTS/OPP/HED
U.S. EPA (7509C)
401 M Street, SW
Washington, DC 20460
(703) 308-2902
Fax: (703)305-5147
Brunsman.Lori@epamail.epa.gov
Judy Calem
OW/OGWDW
U.S. EPA (4607)
401 M Street, SW
Washington, DC 20460
(202) 260-8638
Fax: (202)260-3762
Calem.Judy@epamail.epa.gov
Daniel Carr
George Mason University
School of Information Technology
and Engineering
Fairfax, VA 22030-4444
(703)993-1671
Fax: (703)993-1521
-------
Steven Chang
OSWER/OERR
U.S. EPA (5204G)
401 M Street, SW
Washington, DC 20460
(703)603-9017
Fax: (703)603-9104
Chang.Steven@epamail.epa.gov
Darlene Cockfield
OPPE/OSPED/EID
U.S. EPA (2163)
401 M Street, SW
Washington, DC 20460
Fax: (202)260-4903
Margaret Conomos
OPPE
U.S. EPA (2164)
401 M Street, SW
Washington, DC 20460
(202) 260-3958
Fax: (202)260-4968
Conomos.Margaret@epamail.epa.gov
Lawrence Cox
ORD/NERL
U.S. EPA (MD-75)
Research Triangle Park, NC 27711
(919)541-2648
Fax: (919)541-7588
Cox.Larry@epamail.epa.gov
John Creason
ORD/NHEERL
U.S. EPA Room 215 ERC (MD-55)
Research Triangle Park, NC 27711
(919)541-2598
Fax: (919)541-5394
Creason.John@epamail.epa.gov
David Crosby
American University
Department of Mathematics
and Statistics
4400 Massachusetts Avenue, NW
Washington, DC 20016
(202)885-3135
Fax: (202)885-3155
Crosby@nzms.wwb.noaa.gov
Thomas Curran
OAR/OAQPS
U.S. EPA (MD-12)
Research Triangle Park, NC 27711
(919)541-5694
Fax: (919)541-4028
Curran.Thomas@epamail.epa.gov
Susan Devesa
National Cancer Institute
EPN, Room 415
Bethesda, MD 20892
(301)496-8104
Fax: (301)402-0081
Devesas@epndce.nci.nih.gov
Donald Doerfler
ORD/ERC
U.S. EPA (MD-55)
Research Triangle Park, NC 27711
(919)541-7741
Doerfler.Donald@epamail.epa.gov
Evan Englund
ORD/NERL-CRD (CAP)
U.S. EPA
P.O. Box 93478
Las Vegas, NV 89193-3478
(702) 798-2248
Fax: (702)798-2107
Englund.Evan@epamail.epa.gov
George Flatman
ORD/NERL-CRD
U.S. EPA
P.O. Box 93478
Las Vegas, NV 89193-3478
(702) 798-2528
Fax: (702)798-2208
Flatman.George@epamail.epa.gov
John Fox
OW
U.S. EPA (MC-4303)
401 M Street, SW
Washington, DC 20460
(202) 260-9889
Fax: (202)260-7185
Fox.John@epamail.epa.gov
-------
Mary Frankenberry
OPPTS/OPP/EFED
U.S. EPA (7507C)
401 M Street, SW
Washington, DC 20460
(703) 305-5694
Fax: (703)305-6309
Frankenberry.Mary@epamail.epa.gov
Chapman Gleason
OPPE
U.S. EPA (2163)
401 M Street, SW
Washington, DC 20460
Gleason.Chapman@epamail.epa.gov
Alan Goozner
OPPTS/OPP/BEAD
U.S. EPA (7503W)
401 M Street, SW
Washington, DC 20460
(703)308-8147
Fax: (703)308-8151
Goozner.Alan@epamail.epa.gov
Peter Guttorp
University of Washington
National Research Center for Statistics
and the Environment
Box 351720
Seattle, WA 98195-1720
(206)616-9262
Fax: (206)616-9443
Peter@stat.washington.edu
Karen Hogan
OPPTS/OPPT
U.S. EPA (7403)
401 M Street, SW
Washington, DC 20460
(202) 260-3895
Fax: (202)260-1279
Hogan.Karen@epamail.epa.gov
David Holland
ORD/NHEERL
U.S. EPA (MD-56)
ERC Annex
Research Triangle Park, NC 27711
(919)541-3126
Fax: (919)541-1486
Holland.David@epamail.epa.gov
William F. Hunt, Jr.
OAR/OAQPS/EMAD
U.S. EPA (MD-14)
Research Triangle Park, NC 27709
(919)541-5536
Fax: (919)541-2357
Hunt.Bill@epamail.epa.gov
Helen Jacobs
OW
U.S. EPA (4303)
401 M Street, SW
Washington, DC 20460
(202)260-5412
Fax: (202)260-7185
Jacobs.Helen@epamail.epa.gov
Barnes Johnson
OSWER/OSW
U.S. EPA (5307W)
401 M Street, SW
Washington, DC 20460
(703) 308-8855
Fax: (703)308-0511
Johnson.Barnes@epamail.epa.gov
Henry Kahn
OW/EAD
U.S. EPA (MC-4303)
401 M Street, SW
Washington, DC 20460
(202) 260-5408
Fax: (202)260-7185
Kahn.Henry@epamail.epa.gov
-------
Douglas Kendall
U.S. EPA Region VIII
NEIC/OECA, Building 53, Box 25227
Denver Federal Center
Denver, CO 80225
(303)236-5132x281
Fax: (303)236-5116
Kendall.Douglas@epamail.epa.gov
Mel Kollander
Temple University
Institute for Survey Research
2300 M Street, NW, Suite 800
Washington, DC 20037
(202) 973-2820
Fax: (202)973-2821
Melk@gwis2.circ.gwu.edu
Martin Kulldorff
National Cancer Institute
Biometry Branch, DCPC
EPN 344, 6130 Executive Boulevard
Bethesda, MD 20892
(301)496-7519
Fax: (301)402-0816
MartinK@helix.nih.gov
Raphael Kuznetsovski
SAIC/Reston Facility Directory
11251 Roger Bacon Drive
Reston, VA 20190
(703)318-4553
Fax: (703)709-1040
Rkuznetsovski@lan813.ehsg.saic.com
James R. Lee
American University
School of International Service
Washington, DC 20016
(202)885-1691
Fax:(202)885-2494
Jlee@American.edu
Matthew Lorber
ORD/NCEA
U.S. EPA (8623)
401 M Street, SW
Washington, DC 20460
(202) 260-3924
Fax: (202) 260-6370
Lorber.Matthew@epamail.epa.gov
Arthur Lubin
U.S. EPA Region V
Office of Strategic Environmental Analysis
77 West Jackson Boulevard
Chicago, IL 60604-3507
(312)886-6226
Fax: (312)353-0374
Elizabeth Margosches
OPPTS/OPPT
U.S. EPA (7403)
401 M Street, SW
Washington, DC 20460
(202)260-1511
Fax: (202)260-1279
Margosches.Elizabeth@epamail.epa.gov
Mary Marion
OPPTS/OPP/HED
U.S. EPA
401 M Street, SW
Washington, DC 20460
(703) 308-2854
Marion.Mary@epamail.epa.gov
Thomas Mathew
University of Maryland
Department of Mathematics and Statistics
1000 Hilltop Circle
Baltimore, MD 21250
(410)455-2418
Fax: (410)455-1066
Mathew@umbc2.umbc.edu
Steven P. Millard
Probability, Statistics and Information (PSI)
7723 44th Avenue, NE
Seattle, WA 98115-5117
(206) 528-4877
Fax: (206)528-4802
Smillard@nwlink.com
David Mintz
OAR/OAQPS
U.S. EPA (MD-14)
Research Triangle Park, NC 27711
(919)541-5224
Fax: (919)541-1903
Mintz.David@epamail.epa.gov
-------
Nagaraj Neerchal
University of Maryland Baltimore Campus
Department of Mathematics and Statistics
1000 Hilltop Circle
Baltimore, MD 21250
(410)455-2637
Fax: (410)455-1066
Nagaraj@math.umbc.edu
Barry Nussbaum
OPPE/CES
U.S. EPA (2163)
401 M Street, SW
Washington, DC 20460
(202)260-1493
Fax: (202)460-4968
Nussbaum.Barry@epamail.epa.gov
Brenda Odom
ORD/QAD
U.S. EPA (8724)
401 M Street, SW
Washington, DC 20460
(202)260-8194
Fax: (202)401-7002
Odom.Brenda@epamail.epa.gov
G. Patil
The Pennsylvania State University
Center for Statistical Ecology and
Environmental Statistics
421 Thomas Building
University Park, PA 16802
(814)865-9442
Fax: (814)863-7114
Gpp@stat.psu.edu
Hugh Pettigrew
OPPTS/OPP
U.S. EPA (MC-7509C)
401 M Street, SW
Washington, DC 20460
(703)305-5699
Fax: (703)305-5147
Pettigrew.Hugh@epamail.epa.gov
Andrea Pfahles-Hutchens
OPPTS/OPPT
U.S. EPA (7403)
401 M Street, SW
Washington, DC 20460
(202) 260-0288
Fax: (202)260-1279
Paul Pinsky
ORD/NCEA
U.S. EPA (8623)
401 M Street, SW
Washington, DC 20460
(202)260-1079
Fax: (202)260-3803
Pinsky.Paul@epamail.epa.gov
Esperanza Renard
ORD/NCERQA/QAD
U.S. EPA (MS-104)
2890 Woodbridge Avenue
Edison, NJ 08837
(908)321-4355
Fax: (908)321-6640
Renard.Esperanza@epamail.epa.gov
David Rocke
University of California, Davis
Graduate School of Management
Davis, CA 95616
(916)752-7368
Fax: (916)752-2924
dmrocke@ucdavis.edu
Randall Romig, Ph.D.
U.S. EPA Region VI
(6MD-HX)
1445 Ross Avenue
Dallas, TX 75202-2733
(214)665-8346
Fax: (214)665-8072
Romig.Randall@epamail.epa.gov
N. Phillip Ross
OPPE/OSPED/CES
U.S. EPA Room 3101 M(2163)
401 M Street, SW
Washington, DC 20460
(202) 260-5244
Fax: (202)260-8550
Ross.Nphillip@epamail.epa.gov
-------
Robert Runyon
U.S. EPA, Region II
2890 Woodbridge Avenue
Edison, NJ 08837
(908)321-6645
Fax: (908)906-6824
Runyon.Robert@epamail.epa.gov
Judy Schmid
ORD/NHEERL
U.S. EPA (MD-55)
ERG
Research Triangle Park, NC 27711
(919)541-0486
Fax: (919)541-5394
Schmid.Judy@epamail.epa.gov
Mark Schmidt
OAR/OAQPS/EMAD/AQTAG
U.S. EPA (MD-14)
AQTAG, EMAD
Research Triangle Park, NC 27711
(919)541-2416
Schmidt.Mark@epamail.epa.gov
R. Woodrow Setzer
ORD/NHEERL
U.S. EPA (MD-55)
Research Triangle Park, NC 27711
(919)541-0128
Fax: (919)541-5394
Setzer.Woodrow@epamail.epa.gov
Ronald Shafer
OPPE
U.S. EPA (2163)
401 M Street, SW
Washington, DC 20460
(202) 260-6766
Fax: (202)260-4968
Shafer.Ronald@epamail.epa.gov
Bimal Sinha
OPPE/CES
U.S. EPA (2163)
401 M Street, SW
Washington, DC 20460
(202) 260-2680
Maria Smith
OW
U.S. EPA (4303)
401 M Street, SW
Washington, DC 20460
(202) 260-8639
Fax: (202)260-7185
Smith.Marla@epamail.epa.gov
William P. Smith
OPPE/CES
U.S. EPA Room 3201(2163)
401 M Street, SW
Washington, DC 20460
(202) 260-2697
Fax: (202)260-4968
Smith.Will@epamail.epa.gov
Kathleen Stralka
SAIC
11251 Roger Bacon Drive
Reston, VA 20190
(703)318-4583
Kathleen.A.Stralka@cpmx.saic.com
Linda Teuschler
ORD/NCEA-CIN
U.S. EPA (MS-190)
26 W. Martin Luther King Drive
Cincinnati, OH 45268
(513)569-7573
Fax: (513)569-7916
Teuschler.Linda@epamai 1 .epa.gov
John Warren
ORD/NCERQA/QAD (8724)
U.S. EPA (8724)
401 M Street, SW
Washington, DC 20460
(202) 260-9464
Fax: (202)401-7992
Warren.John@epamail.epa.gov
Charles White
OW/OST/EAD
U.S. EPA (MC-4303)
401 M Street, SW
Washington, DC 20460
(202)260-5411
Fax: (202)260-7185
White.Chuck@epamail.epa.gov
-------
Conference Support Staff
Patricia Crocker
SRA Technologies, Inc.
8110 Gatehouse Road, Suite 600W
Falls Church, VA 22042
(703) 205-8500
Fax: (703)205-6260
Marcia Gardner
SRA Technologies, Inc.
8110 Gatehouse Road, Suite 600W
Falls Church, VA 22042
(703) 205-8500
Fax: (703)205-6260
Marcia.Gardner@sratech.com
Maryce Jacobs
SRA Technologies, Inc.
8110 Gatehouse Road, Suite 600W
Falls Church, VA 22042
(703) 205-8500
Fax: (703)205-6260
Hale Vandemer
SRA Technologies, Inc.
8110 Gatehouse Road, Suite 600W
Falls Church, VA 22042
(703)205-8500
Fax: (703)205-6260
-------
ABSTRACTS
-------
Index of Presentations Listed Alphabetically by Presenter(s)
Presenter(s) Page No.
Ed Brandt: Pesticide Residue Monitoring Data 17
Daniel B. Carr: Statistical Graphics for Environmental Applications: Developments
and Challenges 22
Susan Devesa: Atlas of Cancer Mortality in the United States, 1970-92 2
Evan Englund: Spatial Sample Design 11
George Flatman: Skewed Frequency Distributions 12
John F. Fox: Interpreting Data from a National Survey of Protozoan Pathogens in
Drinking-water Sources 7
Chapman Gleason: Using the Web and Other Networking Technologies in Supporting
SAS for the Enterprise 15
Alan R. Goozner: A Master Sampling Frame for the Collection of Non-Agricultural
Pesticide Usage Data 18
Henry Kahn: Recent Developments in the Estimation of U.S. Fish Consumption 14
Henry Kahn: Estimation of Precision of Low Concentration Chemical Analytical
Measurements and Establishment of Detection and Quantification Limits 20
Martin Kulldorff: Evaluating Disease Cluster Alarms 3
Matthew Lorber and Paul Pinsky: Relationships Between Dioxins in Soil, Air, Ash, and
Emissions from a Municipal Solid Waste Incinerator Emitting Large Amounts of Dioxins . 8
Mary Marion: Severity Analysis Using Ridits 6
Thomas Mathew: Confidence Regions and Tests in a Calibration Problem 13
Steven Millard and Nagaraj Neerchal: EnvironmentalStats for S-PLUS: Software for
Environmental Statistics 1
David Mintz: The National Air Quality and Emissions Trends Report, 1995 19
Barry Nussbaum: EPA Cooperative Agreements 10
Paul Pinsky: Statistical Modeling of Dioxin Concentration Data from Sediment Cores 9
David M. Rocke: A Two Component Model for Error in Analytical Chemistry and Issues
of Detection and Quantification 21
Linda Teuschler: Toxic Severity for a Useful and Understandable Benchmark Dose 5
John Warren: Representativeness in Statistics and Quality Assurance 4
Note: Complete abstracts for each conference presentation appear on the pages that follow.
These include the name and type of session, and the date and time of presentation (in the
upper right hand corner of the page), as well as the title of the presentation, the name and
affiliation of each author, and the name and affiliation of each presenter.
-------
TRAINING SESSION 1-A & B: EnvironmentalStats for S-PLUS:
Software for Environmental Statistics
(1-A) Tuesday, April 1, and (1-B) Wednesday, April 2, 1:15 - 4:45 pm
Title: EnvironmentalStats for S-PLUS: Software for Environmental Statistics
Author: Steven P. Millard, Ph.D., Probability Statistics & Information (PSI)
Presenters: Steven Millard, PSI, and Nagaraj Neerchal, Department of Mathematics and
Statistics, University of Maryland Baltimore Campus
Abstract
S-PLUS is a premier statistics and graphics software package that is rapidly being adopted by
practitioners in fields ranging from pharmaceuticals to finance. ENVIRONMENTAL STATS for S-
PLUS is a new S-PLUS module designed specifically for environmental statistics. Developed over the
past three years, it covers all the major statistical methods found in the environmental monitoring
literature and includes an extensively detailed hypertext help system to guide you through the
background and application of each method. This training course will cover basic ideas in sampling
design and statistical methods for environmental monitoring and risk assessment, including methods of
random sampling, probability distributions, hypothesis tests and confidence intervals, prediction and
tolerance intervals, and methods for dealing with Type I left-censored ("below-detection-limit") data.
Concepts will be illustrated with data sets taken from current regulatory guidance documents.
-------
SESSION I - Cancer Statistics, Epidemiology and Genetics
Tuesday, April 1, 1:15 - 2:45 pm
Title: Atlas of Cancer Mortality in the United States, 1970-92
Authors: Susan S. Devesa, Ph.D., Dan J. Grauman, M.A., William J. Blot, Ph.D.*, Robert
N. Hoover, M.D., and Joseph F. Fraumeni, Jr., M.D., Epidemiology and
Biostatistics Program, Division of Cancer Epidemiology and Genetics, National
Cancer Institute, Bethesda, MD 20892
""Currently with the International Epidemiology Institute, Ltd., Rockville, MD 20850
Presenter: Susan Devesa, NCI
Abstract
The study of geographic variation in cancer rates may provide clues to the role of environmental or
lifestyle factors that may affect cancer risk. The maps themselves cannot provide information about the
causes of cancer or its clustering, but they can raise hypotheses about potential causative influences.
Earlier atlases showed substantial geographic variations in cancer mortality rates among whites and
nonwhites in the United States and stimulated subsequent studies which identified relevant exposures
and risk factors. For some cancers, mortality rates have not changed greatly over time, whereas substantial increases or decreases have been observed for other cancers. This atlas updates the maps
through 1992, presenting for the first time, data specifically for blacks. During the 23-year study period
1970-92, more than 8.5 million whites and 1.0 million blacks died due to cancer. The national annual
age-adjusted mortality rate per 100,000 person-years for all cancers combined ranged from 135 among
white females to 292 among black males. A total of 40 cancers (including all forms combined) were
considered. Some examples of maps from the new atlas will be presented. The patterns of cancer in the
United States, some of which have changed over time, may provide additional leads for the evaluation of
the determinants of cancer among American men and women.
-------
SESSION I: Cancer Statistics, Epidemiology and Genetics
Tuesday, April 1, 1:15 - 2:45 pm
Title: Evaluating Disease Cluster Alarms
Author/Presenter: Martin Kulldorff, Epidemiology and Biostatistics Program, National
Cancer Institute
Abstract
During the last few decades, there have been a considerable number of geographical disease cluster
alarms in different parts of the United States. Many are given considerable media attention, and, for
natural reasons, there is a considerable amount of worry in the local communities affected. As regards
the cause of the clusters, the environment is often a prime suspect.
Before moving into a full-scale epidemiological and environmental investigation, though, it makes sense
to find out whether the observed number of cases actually represents a statistically significant excess or
not. We cannot simply compare the disease rate inside and outside of the cluster area, since we then
have a problem of pre-selection bias. In this talk we will review and illustrate a couple of mutually
complementary methods that can be used to work around that bias, one of which is the spatial scan
statistic. A number of applications will be given.
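The core of the spatial scan statistic mentioned above is a likelihood ratio computed for each candidate zone. The sketch below shows the commonly cited Poisson form for a single zone (the maximization over many candidate circles and the Monte Carlo significance testing that the method requires are omitted), with invented case counts.

```python
# Hedged sketch of the Poisson likelihood-ratio core of the spatial scan statistic
# for one candidate cluster zone; significance would come from maximizing over
# zones and comparing against Monte Carlo replications.
import math

def poisson_scan_llr(cases_in, expected_in, cases_total, expected_total):
    """Log likelihood ratio for one candidate zone under the Poisson model."""
    c, e, C, E = cases_in, expected_in, cases_total, expected_total
    if c == 0 or e <= 0 or c / e <= (C - c) / (E - e):
        return 0.0                                  # no excess inside the zone
    inside = c * math.log(c / e)
    outside = 0.0 if C == c else (C - c) * math.log((C - c) / (E - e))
    return inside + outside

# Invented alarm: 30 observed cases where 18 were expected, out of 500 cases
# and 500 expected in the whole study region.
print(round(poisson_scan_llr(30, 18, 500, 500), 2))
```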
-------
SESSION II - Representativeness in Statistics and Quality Assurance
Tuesday, April 1, 3:00 -4:45 pm
Title: Representativeness in Statistics and Quality Assurance
Author: John Warren, Quality Assurance Division, Office of Research and Development
(ORD)
Presenters: John Warren, ORD, and Malcolm J. Bertoni, Research Triangle Institute (RTI)
Abstract
The concept of "representativeness" is quite clear to a statistician, especially in the context of survey
sampling with respect to a well-defined frame. The concept is considerably less clear when the context
is environmental sampling because the homogeneity of sampled media and physical environment from
which the sample is drawn must be considered.
The session will explore the differing concepts of "representativeness" as used (and possibly abused) by
the environmental community, include a discussion of Gy's theory of sampling as a possible solution,
and finally engage the attendees in a free and frank discussion of further aspects of the concept.
-------
SESSION III - How Severe Is It?
Wednesday, April 2, 8:45 - 10:15 am
Title: Toxic Severity for a Useful and Understandable Benchmark Dose
Authors: Linda Teuschler and Richard C. Hertzberg, Ecological Exposure Research
Division, Office of Research and Development, Cincinnati, OH
Presenter: Linda Teuschler, ORD
Abstract
Regression on ordered categories of toxic severity is recommended in order to address two criticisms of
EPA's risk assessment procedures for noncancer effects. The first criticism is that presenting risk only
as probability does not consider the impact of the event. Second, the goal of the benchmark dose is
vaguely defined, in part because it focuses on one effect from one study. By including all reported
effects into the regression procedure and tracking the toxic severity, one ends up with a benchmark dose
that closely follows the definition of the Reference Dose. In addition, by keeping distinct the effects of
different severity, categorical regression allows for a definition of a benchmark dose that satisfies both a
low specified risk of minor effects and an even lower specified risk of major effects.
-------
SESSION III - How Severe Is It?
Wednesday, April 2, 8:45 - 10:15 am
Title: Severity Analysis Using Ridits
Author/Presenter: Mary Marion, Health Effects Division, Office of Prevention, Pesticides,
and Toxic Substances
Abstract
The United States Environmental Protection Agency, Office of Prevention, Pesticides, and Toxic
Substances, Office of Pesticide Programs has been given the task of reviewing chemical registrant data
and analyses, some of which use the statistical technique of ridits. The technique of ridit analysis used in severity analysis was studied for its feasibility for use at the Agency.
The two toxicological data sets chosen were from one study evaluating the severity of glomerulonephropathy in male rat kidneys over dose increments of the chemical being reviewed and another evaluating mononuclear cell leukemia, also in male rats.
The mathematical theory behind this technique will be presented. This is a continuation of a paper
presented in 1995 at the poster session of the SUGI 21 Conference held in Chicago, Illinois.
-------
SESSION IV - Exposure Assessment
Wednesday, April 2, 8:45 - 10:15 am
Title: Interpreting Data from a National Survey of Protozoan Pathogens in
Drinking-water Sources
Author/Presenter: John F. Fox, Engineering and Analysis Division, Office of Water
Abstract
In 1997-98, EPA and participating water treatment systems will conduct a nationwide sampling program
to assess protozoa (Giardia and Cryptosporidium) in drinking-water sources (untreated, raw water) and,
at a smaller number of systems, in the treated drinking water. Several hundred participating treatment
plants will each submit one sample per month for 12-18 months. The chief objective of the protozoan
sampling program is to characterize the nationwide distribution of protozoan concentrations in source
water, with the treatment plant as the unit of sampling, in particular the distribution of plant mean,
median, and 90th percentile concentrations. A related problem is to characterize the variability and
distribution over time of concentrations at one plant. This presentation will discuss opportunities and
challenges in developing appropriate point and interval estimates from these data to achieve national-
level characterizations of protozoan concentrations in raw and treated water. About one year remains
before interim analysis of data. We welcome suggestions regarding data analysis and interpretation!
-------
SESSION IV - Exposure Assessment
Wednesday, April 2, 8:45 -10:15 am
Title: Relationships Between Dioxins in Soil, Air, Ash, and Emissions from a Municipal
Solid Waste Incinerator Emitting Large Amounts of Dioxins
Author: Matthew Lorber, National Center for Environmental Assessment (NCEA), Office
of Research and Development (ORD)
Presenters: Matthew Lorber and Paul Pinsky, NCEA, ORD
Abstract
Environmental measurements including air concentrations and soil concentrations of dioxins were taken
in the vicinity of a municipal solid waste incinerator emitting large amounts of dioxins. Also available
were two separate stack tests measuring concentrations and amounts of dioxins being emitted, and
concentrations in combuster ash. An "incinerator signature," defined as the profile of the 17 toxic dioxin
and furan congeners where each is described in proportion to total dioxins, was found in the ash and in
subsets of the other two matrices. The profiles in all media were also examined using principal
component analysis to determine what features best distinguished the profiles in each medium. This study
also investigated the relationship of dioxin soil concentration as a function of distance from the
incinerator, and determined an urban background soil concentration, further from the incinerator, as
compared to elevated soil concentrations near the incinerator. A background urban air concentration was
determined and compared to measurements of elevated air concentrations, which also had the signature
profile.
-------
SESSION IV - Exposure Assessment
Wednesday, April 2, 8:45 - 10:15 am
Title: Statistical Modeling of Dioxin Concentration Data from Sediment Cores
Authors: Paul F. Pinsky and David Cleverly, National Center for Environmental
Assessment, Office of Research and Development (ORD)
Presenter: Paul Pinsky, ORD
Abstract
Evidence from several sources suggests that emissions of dioxins into the environment began to stabilize
in the 60's or 70's and have been declining since the 70's or 80's. One of the most important of these
sources is the historical record from sediment cores in U.S. lakes. Recently, a joint EPA and DOE study
measured levels of dioxins and coplanar PCB's in the sediment core of 11 U.S. lakes. Samples from
different sediment layers were dated, effectively transforming the data from each lake from a spatial
series to a time series. The resulting data base consists of a large number of time series (11 lakes times
30 concentrations of related chemicals) with each time series being relatively short (5 to 11 time points).
In this session, we will describe a modeling strategy for these data and interpret the modeling results
with the aim of summarizing overall trends as well as identifying any trends specific to certain lakes or
chemicals.
-------
PANEL DISCUSSION
Wednesday, April 2, 10:30 am - Noon
Title: EPA Cooperative Agreements
Author/Chair: Barry Nussbaum, Office of Policy, Planning and Evaluation
Participants: Larry Cox, Office of Research and Development, Peter Guttorp, University of
Washington, and G.P. Patil, Penn State University
Abstract
This panel discussion will feature investigators from two of the major cooperative agreements on
environmental statistics. The panel will discuss the use of cooperative agreements such as these to
encourage statistical research on theoretical and applied environmental topics. There will be general
comments by EPA on how to get tasks funded and work initiated. Then professors from two of the
universities with such agreements will discuss their side of the equation: how they operate under the
agreement and what they do. Included will be the vision for future work and applications. The panel
will also have time for a hopefully lively question and answer period.
10
-------
TRAINING SESSION 2 - Spatial Statistics Sampling
Wednesday, April 2, 1:15 - 3:30 pm
Title: Spatial Sample Design
Author/Presenter: Evan Englund, National Exposure Research Laboratory, Office of
Research and Development, Las Vegas
Abstract
Spatial samples, in addition to having number, referred to by classical statisticians as sample size, also
have sample support or sample volume or mass. QUAMS, thanks to Dean Neptune, represents this
concept by sample unit, remediation unit, and exposure unit. The support, since it cannot be analyzed
chemically in total, must be represented by a composite sample in which the subsamples survey the in
situ sample unit. The definitions and methods of obtaining spatial representativeness will be presented
verbally (many "real world" examples and few equations). The relationships of support size and change
of support to spatial variance and regularization of semivariograms for correct variography will be
explained. The methodological "rules of thumb" for spatial sample design will be enumerated, clarified,
and organized.
11
-------
TRAINING SESSION 2 - Spatial Statistics Sampling
Wednesday, April 2, 1:15 - 3:30 pm
Title: Skewed Frequency Distributions
Author/Presenter: George Flatman, National Exposure Research Laboratory, Office of
Research and Development, Las Vegas
Abstract
The frequency distribution of both random variables and spatial variables has the ubiquitous problem of
skewness for data interpreters and decision makers. Presenting the mean of a skewed distribution is
disinformation to all data interpreters or managers (RPM or OSC) if they assume normality. The
appropriate model for skewed frequency distributions may be a mixture (plume mixed with background)
model rather than one lognormal model. When does a simplifying model become an oversimplification?
The mixture model does a better job at explaining most waste sites. Methods of separation, such as QQ-
plots and robust methods, will be discussed. The various methods of evaluating a lognormal mean will
be evaluated and illustrated by real world data and by virtual (simulated) data. The number of questions
will exceed the number of answers.
12
-------
SESSION V - Applications of Statistical Calibration
Techniques in Analyzing Environmental Data
Wednesday, April 2,3:30 - 4:45 pm
Title: Confidence Regions and Tests in a Calibration Problem
Author/Presenter: Thomas Mathew, Department of Mathematics and Statistics, University of
Maryland Baltimore County
Abstract
Consider a univariate normally distributed response variable related to a univariate explanatory variable
through the usual linear regression model. Suppose independent observations are available on the
response variable corresponding to known values of the explanatory variable. Now consider another
observation on the response variable, corresponding to an unknown value of the explanatory variable.
The problem of calibration or inverse regression deals with statistical inference on this unknown
parameter. The data on the response variable, corresponding to known values of the explanatory variable
is referred to as calibration data. We will address the problem of constructing confidence regions and
hypotheses tests for the unknown value of the explanatory variable. Two types of problems will be
studied: the calibration data is used to construct confidence regions and to test for a single unknown
value of the explanatory variable, or for a sequence of unknown values of the explanatory variable. The
computational aspects and the practical implementation of our procedures will be illustrated in detail by
applying them to some chemical and environmental data.
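One standard way to construct such a region, shown here only as a textbook-style sketch and not necessarily the speaker's procedure, is to invert the prediction t-test over a grid of candidate values of the unknown explanatory variable; the calibration data below are invented.

```python
# Sketch of a calibration (inverse regression) confidence region obtained by
# inverting the prediction t-test over a grid of candidate x values.
import numpy as np
from scipy import stats

def calibration_region(x, y, y0, alpha=0.05, grid=None):
    n = len(x)
    xbar = x.mean()
    Sxx = ((x - xbar) ** 2).sum()
    b = ((x - xbar) * (y - y.mean())).sum() / Sxx      # slope
    a = y.mean() - b * xbar                            # intercept
    s2 = ((y - a - b * x) ** 2).sum() / (n - 2)        # residual variance
    tcrit = stats.t.ppf(1 - alpha / 2, n - 2)
    if grid is None:
        grid = np.linspace(x.min() - 1, x.max() + 1, 2001)
    se = np.sqrt(s2 * (1 + 1 / n + (grid - xbar) ** 2 / Sxx))
    t = (y0 - (a + b * grid)) / se
    accepted = grid[np.abs(t) <= tcrit]
    if accepted.size == 0:
        return (y0 - a) / b, float("nan"), float("nan")
    return (y0 - a) / b, accepted.min(), accepted.max()

# Invented calibration data (known standards) and one new response y0.
x = np.array([0.0, 1.0, 2.0, 4.0, 8.0])
y = np.array([0.1, 1.1, 2.0, 4.2, 7.9])
print(calibration_region(x, y, y0=3.0))
```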
13
-------
MINI SESSION A - Water Quality and Fishy Statistics
Wednesday, April 2,4:45 - 5:30 pm
Title: Recent Developments in the Estimation of U.S. Fish Consumption
Authors: Henry D. Kahn and Helen Jacobs, Environmental Analysis Division, Office of Water,
Kathleen Stralka, Science Applications International Corporation
Presenter: Henry Kahn, OW
Abstract
Estimates of U.S. per capita fish consumption play a key role in a number of Environmental Protection
Agency program decisions. In particular, exposure estimates used in determining water quality criteria
and related standards are based, in part, on estimates of the amount of fish consumed and contamination
levels in the fish. This presentation will report on estimates of fish consumption based on recent work
with the USDA's combined 1989, 1990, and 1991 Continuing Survey of Food Intake by Individuals
(CSFII). These estimates reflect adjustments based on USDA's Recipe file which provides the amount of
fish in combination foods and changes in the habitat designations (freshwater/estuarine and marine) for
certain species of fish.
14
-------
MINI SESSION B - Statistics and the Internet
Wednesday, April 2,4:45 - 5:45 pm
Title: Using the Web and Other Networking Technologies in Supporting SAS for the
Enterprise
Authors: Chapman Gleason, Center for Environmental Statistics, Office of Policy, Planning
and Evaluation, and John Shirey, Enterprise Technology Services Division, Office
of Administration and Resource Management
Presenter: Chapman Gleason, CES, OPPE
Abstract
The Environmental Protection Agency (EPA) has just begun an Enterprise Computing Offer (ECO) with
SAS Institute. The EPA SAS ECO provides 21 SAS products (base, AF, Assist, ETS, Connect, FSP,
Graph, Share, Tutor, Stat, IML, Insight, Lab, Access for Oracle, Access for ODBC, CPE, CIS, QC,
Toolkit) on several desktop operating systems (Windows, Windows 95, Windows NT, MacOS, SunOS,
Digital Unix, OSF1, HP/UX, DG/UX) in EPA. This product mix will allow SAS users to design and
develop client/server SAS applications and provide EPA scientists and policy analysts with better
desktop scientific, data management, and statistical software. This session describes EPA's
implementation strategy to support SAS across a heterogeneous LAN/WAN computing environment
consisting of more than 300 Novell servers and LANs running IPX protocol, Windows PCs on Novell
LANs running TCP/IP and IPX protocols, Unix workstations and servers (running TCP/IP protocol), and
an IBM mainframe housed at the National Computer Center located in Research Triangle Park, North
Carolina. All the computers are accessible via SAS from the Desktop using TCP/IP protocol. The
session will include discussion of how EPA:
1) Prepared custom installation instructions for SAS on EPA's Novell LANs which run Networked
MS Windows.
2) PKZIPped the SAS Windows Installation CD-ROM and set up an FTP server for SAS to distribute
SAS to users on Novell's LANs.
3) Designed and implemented a Lotus Notes Mail-In Data Base and billing strategy to keep track of
the user population.
4) Implemented a SAS Listserver, called EPASAS-L, to allow users to share SAS technical
problems and solutions.
5) Designed an Internal SAS Web using a Lotus Notes InterNotes server and Data Base which
replicates and publishes to the Web each hour. This Lotus Notes Data Base is replicated to each
EPA Region allowing SAS users at remote sites to document their implementation of SAS
products, SAS applications, and SAS code and share it with other EPA SAS users.
6) Implemented a mail user-ID for the SAS Notes DB, so that users without Notes Clients can mail
a document (including Graphics) to a user-ID called epasasWeb@epamail.epa.gov, and the
document will automatically be published to the EPA SAS Web.
7) Implemented the SAS and Lotus Notes Interface allowing SAS programs to write to the
SAS/Web via SAS clients on remote Systems.
15
-------
(continued)
MINI SESSION B - Statistics and the Internet
Wednesday, April 2, 4:45 - 5:45 pm
Abstract (continued)
One of the benefits of client/server computing and the popularization of Internet protocols has been the
rapid development of the World Wide Web (WWW). However, HTML development has languished
because of the single file names required in HTML "home pages." One product that overcomes
that barrier, and that EPA has used to implement its SAS Web, is the Lotus Notes InterNotes Server. An
InterNotes Server is a Notes server that runs under Windows NT Advanced Server and has the HTTP
demon running as an NT service. The InterNotes Server takes a Notes Data Base and converts the Notes
Documents into HTML documents and publishes the Notes Views as HTML links to the Notes
Documents. EPA has used this capability to spare SAS users and developers the learning curve of
HTML, which is both tedious and time consuming. InterNotes also allows a "macro" level of
integration for keeping track of the hundreds of HTML file names that are prevalent on Unix systems.
16
-------
POSTER SESSION
Wednesday, April 2,5:30 - 6:30 pm
Title: Pesticide Residue Monitoring Data
Author/Presenter: Ed Brandt, Economic Analysis Branch, Office of Pesticide Programs
Abstract
The Government Performance and Results Act requires all government agencies to connect the process
of planning, budgeting, and accountability. This paper addresses the issues concerning pesticide residue
monitoring data. Using National residue data from 1992 to 1995, the paper analyzes the consistency
between different residue monitoring programs, identifies gaps in the development of national estimates
of dietary exposure, and suggests approaches to better sampling strategies in the future to improve
overall dietary exposure estimates.
17
-------
POSTER SESSION
Wednesday, April 2, 5:30 - 6:30 pm
Title: A Master Sampling Frame for the Collection of Non-Agricultural
Pesticide Usage Data
Author/Presenter: Alan R. Goozner, Economic Analysis Branch, Biological and Economic
Analysis Division, Office of Pesticides Programs
Abstract
The EPA recently conducted the 1993 Certified Commercial Pesticide Applicator Survey. The survey
was conducted at considerable cost. Much of the time involved was the construction of a sampling
frame. As a follow-on to this experience, several questions arose: Could a master sampling frame be
constructed that would allow quicker, more efficient replication of a similar survey? Would it allow
surveying more specialized aspects of the applicator population? EPA statisticians are encouraged to
offer their insights and opinions as to the feasibility of the idea.
Should the EPA offer seed money to have this frame constructed in the Private sector? Would private
sector research companies use such a frame? Would they pay for samples drawn from such a frame?
Would the frame facilitate more research into the aspects of non-agricultural pesticide usage that would
otherwise not be done? At a minimum, should the EPA more fully investigate the feasibility of frame
construction and usability?
18
-------
POSTER SESSION
Wednesday, April 2, 5:30 - 6:30 pm
Title: The National Air Quality and Emissions Trends Report, 1995
Author/Presenter: David Mintz, Office of Air Quality Planning and Standards, Office of Air
and Radiation
Abstract
This twenty-third annual report documenting air pollution trends in the United States was released by
Administrator Carol Browner at a major press conference on December 17, 1996. The report provides
information on those pollutants for which National Ambient Air Quality Standards have been
established. These pollutants are carbon monoxide (CO), lead (Pb), nitrogen dioxide (NO2), ozone (O3),
particulate matter whose aerodynamic size is less than or equal to 10 microns (PM-10), and sulfur
dioxide (SO2).
While the report focuses on national trends in air quality concentrations and emissions for these criteria
pollutants, it also features information on related topics. These include visibility, air toxics,
nonattainment areas, urban area trends, reformulated gasoline, and Photochemical Assessment
Monitoring Stations (PAMS).
19
-------
SESSION VI - Statistics of Measurement in Analytical Chemistry
Thursday, April 3, 8:45 - 10:15 am
Title: Estimation of Precision of Low Concentration Chemical Analytical
Measurements and Establishment of Detection and Quantification Limits
Authors: Henry D. Kahn, Chief, Statistical Analysis Section, Office of Water, and Kathleen
Stralka, Statistician, Science Applications International Corporation (SAIC)
Presenter: Henry Kahn, EPA, and Kathleen Stralka, SAIC
Abstract
Estimates of precision of low concentration chemical analytical measurements are critical to establishing
detection and quantification levels. This presentation will consider estimates of precision based on the
EPA procedure for determining a "Method Detection Limit" and the Rocke-Lorenzato model. The
methods will be illustrated using some inductively coupled plasma - mass spectroscopy data and
applications to establishing detection and quantification levels will be discussed.
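For orientation, the EPA Method Detection Limit calculation referred to above is commonly described (40 CFR Part 136, Appendix B) as the one-sided 99th-percentile Student t value times the standard deviation of replicate low-level spiked measurements; a minimal sketch with invented replicates follows.

```python
# Sketch of the commonly described EPA Method Detection Limit calculation:
# MDL = t(n-1, 0.99) * standard deviation of replicate low-level spikes.
import numpy as np
from scipy import stats

def method_detection_limit(replicates):
    r = np.asarray(replicates, float)
    return stats.t.ppf(0.99, len(r) - 1) * r.std(ddof=1)

# Seven invented replicate measurements of a low-level spike (ug/L).
print(round(method_detection_limit([1.9, 2.3, 2.1, 2.4, 1.8, 2.2, 2.0]), 3))
```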
20
-------
SESSION VI - Statistics of Measurement in Analytical Chemistry
Thursday, April 3, 8:45 - 10:15 am
Title: A Two Component Model for Error in Analytical Chemistry and Issues of
Detection and Quantification
Author/Presenter: David M. Rocke, Director, Center for Statistics in Science and
Technology, University of California - Davis
Abstract
A new model for measurement error in analytical chemistry will be presented. A commonly used model
that assumes the standard deviation of analytical error increases proportionally with the concentration of
the analyte cannot be used for very low concentrations. For measurements of near zero amounts, the
standard deviation is often assumed to be constant, which does not apply to larger quantities. Neither
model applies across the full range of concentrations of an analyte. The new model contains two error
components, one additive and one multiplicative, and exhibits sensible behavior at both low and high
concentration levels. The use of the model with maximum likelihood estimation and application to some
gas chromatography/mass-spectrometry and atomic absorption spectroscopy data will be described.
Implications for detection and quantification will be discussed.
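The two-component idea can be made concrete with a small simulation. The sketch below uses the commonly cited Rocke-Lorenzato form y = alpha + beta*mu*exp(eta) + epsilon, with all parameter values invented, and shows the measurement standard deviation leveling off near zero concentration and growing roughly in proportion at high concentration.

```python
# Simulation sketch of a two-component measurement-error model in the commonly
# cited form y = alpha + beta*mu*exp(eta) + eps, eta ~ N(0, s_eta^2), eps ~ N(0, s_eps^2).
# All parameter values here are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

def simulate_measurement(mu, alpha=0.02, beta=1.0, s_eta=0.08, s_eps=0.05, n=5000):
    eta = rng.normal(0.0, s_eta, n)        # multiplicative (proportional) error
    eps = rng.normal(0.0, s_eps, n)        # additive (near-zero) error
    return alpha + beta * mu * np.exp(eta) + eps

for mu in (0.0, 0.1, 1.0, 10.0, 100.0):
    sd = simulate_measurement(mu).std(ddof=1)
    print(f"true concentration {mu:8.2f}   measurement SD {sd:8.3f}")
```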
21
-------
FEATURED SPEAKER
Thursday, April 3, 10:45 am -12:30 pm
Title: Statistical Graphics for Environmental Applications: Developments and
Challenges
Author/Presenter: Daniel B. Carr, School of Information Technology and Engineering,
George Mason University
Abstract
Development of statistical graphics for environmental applications is a many faceted challenge. In the
first part of this session, recently developed templates for communicating environmental summaries to
broad audiences will be presented. The templates address issues such as converting tables to plots,
linking statistical summaries and maps, and representing metadata to provide an appropriate basis for
interpretation. Also, a JAVA implementation that enables user manipulation in a low-resolution
dynamic web environment will be described. The second part of the session will focus on graphics
challenge areas. For example, one challenge area involves working with massive data sets. An example
uses the gridding of breeding bird prevalence data to a continental U.S. EMAP grid to raise issues about
global gridding of satellite imagery. The second challenge area concerns visualizing statistical and
ecological models and their impact on a specific analysis. Recent developments in environmental
graphics provide important new capabilities, but some deep challenges remain.
22
-------
EVALUATION
-------
MISCELLANEOUS
-------
Downtown Richmond, Virginia
[Map of downtown Richmond with a numbered legend of attractions and visitor information, including the Metropolitan Richmond Convention and Visitors Bureau, museums, monuments, churches, the Virginia State Capitol, the Science Museum of Virginia, the 6th Street Marketplace, the Farmer's Market, Hollywood Cemetery, and Main Street Station; the legend text is largely illegible in this copy.]
Metro Richmond Visitors Center
Agecroft Hall
Amtrak Station
Arthur Asha, Jr. Athletic Canter
Borksdale Theatre at Hanover Tavern
Lewis Ginter Botanical Garden
Carytown
Chesterfield Towne Center
Cloverteaf Mall
Greyhound Station
Hanover Courthouse
Henrico Courthouse Complex
Henricus Park
Historic Chesterfield County Museum
Magnolia Grange
McGu ire Veterans Hospital
Meadow Farm Museum/Crump Park
Paramount's Kings Dominion
Virginia E. Randolph Museum
Regency Square
Richmond Braves, The Diamond
Richmond International Raceway
Lore Robins Gallery
Scotchtown
The Showpiece
State Fairgrounds on Strawberry Hill
Sw.ft Creek Mill Playhouse
Three Lakes Nature Center & Aquarium
Tuckahoe Plantation
Virginia Aviation Museum
Virginia Center Commons
Virginia House
Westhampton/The Shops
at Libbie & Grove
Willow Lawn, The Shops at
Wlrton House Museum
-------
A Guide to Restaurants in Metro Richmond
[Brochure: restaurant listings for downtown Richmond, including historic Church Hill, Shockoe Slip, and Shockoe Bottom, with addresses, phone numbers, cuisines, hours, and price ranges.]
-------
In Search of...
Environmental Statistician
The United Nations has an opening for an environmental statistician. Salary is $108,000.
For information contact the UN web site: http://www.un.org
Press "general information" and then
Enter "UN employment"
United Nations contact person is Patricia Nicolos, (212) 963-5783.
This information was provided by EPA contact, Kathleen Hogan, (202) 260-9349.
-------
United States
Environmental Protection
Agency EPA
Policy, Planning, and Evaluation (2163)
THE TWELFTH ANNUAL
EPA CONFERENCE ON
ENVIRONMENTAL STATISTICS
Richmond, VA April 1-3, 1997
IDEAS ARE NEEDED
TO FILL THIS SPACE
Wanted: Conference Logo
Theme: Statistics for the Future
Reward: Contact Barry Nussbaum
-------
EPA TWELFTH ANNUAL CONFERENCE ON
ENVIRONMENTAL STATISTICS
SPECIAL Preview EDITION
STATISTICS FOR THE
FUTURE April 1-3, 1997
RICHMOND, VA Site of the Twelfth Annual
EPA Conference on Environmental Statistics.
"Thought the EPA Conference was supposed to
come to town last year," mused a Richmond
resident. Well, we didn't make it then, but
we're back and looking for a big turnout at
this year's conference. Personnel from EPA
and other Federal and state agencies will
gather south of the Mason-Dixon Line for a
two-and-a-half-day conference. The conference,
with its theme focusing on relevant applications
of statistics in government programs and how to
enhance statistical support, will feature
hands-on training sessions and
opportunities to learn about new statistical
techniques and software. There will be
sessions on health statistics, detection
limits, water quality, and the use of
statistics in Quality Assurance.
The conference's real, underlying
benefit to you is the opportunity to
exchange with others involved in similar
programs, with related problems, and on a
one-to-one level. Informal sessions, such
as the Poster/Technology Session and
Roundtable Discussions, provide an
atmosphere for sharing information, solving
problems, and building a network.
There is plenty of opportunity to get
involved. Check out the "Caiifor Your
PARTICIPATION" aA in this Special Preview
Edition. There is no limit to how much
involvement and fun you can have. And, from
winter weather predictions, you'll want to
cut loose and enjoy springtime in the old
South at the EPA statistics conference.
THE CONFERENCE IS BACK. Y'ALL COME.
SPECIAL FEATURES
****************
No Registration Fee
Transportation Provided from EPA
Headquarters
****************
Costs within Government Per Diem
Fulfills Qualifications as Training
WOW, what a year it has been. I'm sure
I'm not alone in saying that I've never seen
a set of furloughs and travel restrictions
that affected us as severely as last year.
But a funny thing happened on the way to
no forum. You may recall that despite all
our money saving techniques, we had to
forgo our annual conference on statistics.
In order to capitalize on the plans already in
progress by some of the professorial types
who were developing tutorial sessions, we
decided to hold these training sessions in
Washington and RTP. This avoided travel
costs and travel restrictions for the
attendees from our two major locations.
We didn't intend to shut out regional and
laboratory folks at other locations, but we
had to do the best we could under unusual
circumstances. So what happened? We
didn't just salvage some sessions, we
actually learned that there was a real
demand for this training, and a good bit of
response came from people who normally
didn't attend the annual conference.
Imagine my surprise to hear "new"
participants asking why they weren't on
the list. They had heard about the
conference from a colleague down the hall.
So we are applying what we learned.
FIRST, I have personally arranged that the
government will not stop this year.
SECOND, we are still employing our cost
reduction methods to make the travel more
palatable to attendees. THIRD, and most
importantly, we are combining the
conference with enriched training in
Richmond on April 1-3, 1997 (no fooling!).
FOURTH, we are adding separate training
sessions in the late spring. We think we
may have hit on the best of both worlds
with this scheme. But it really depends on
your participation to make it a real
success. So, jump on the bandwagon
and participate! Write a paper, present a
poster, serve on a panel, and be active. I
look forward to seeing you in Richmond.
One last dilemma: if we had to postpone
last year's conference, is this the 12th
annual conference on statistics, the 13th
annual conference on statistics, or the
12th almost annual conference on
statistics? And was Grover Cleveland
really the 22nd and the 24th President all
by himself? If you can help me with any of
this, please call, write, fax, e-mail, etc.
Thanks. BARRY NUSSBAUM
Emphasis on Training
Response to the series of statistical
training programs offered last spring in
DC and RTP was tremendous. Courses
in Regression Diagnostics, Information
Visualization, and SAS Applications
attracted a large and varied audience.
Positive feedback on the training
programs has led to a greater emphasis
on training opportunities at this year's
conference as well as training courses to
be offered in the late spring and/or early
summer of next year.
The Conference offers a variety of
training features, such as:
•• Abstracts of all Papers Presented at the
Conference
•• Training Programs Designed Specifically
for EPA Statistical Needs
•• Information from Current Publications in
Environmental Statistics and Information
Science
•• Informal Discussions with Other
Statisticians to Focus on Specific Problems
and Probable Solutions
Train for the Future In Statistics
CALL FOR YOUR
PARTICIPATION
(YOU are the conference)
UNCLE SAM and your EPA co-workers can
benefit from your experience... be a
participant in this year's conference. We
invite you to:
• Make a Presentation
• Chair a Session
• Present a Poster
• Moderate a Roundtable
Discussion
• Become a Member of the
Conference Planning Committee
OR... you may have another idea!
Whatever you would like to do, name it,
and contact BARRY NUSSBAUM NOW!
by phone at (202) 260-1493 or by fax at
(202) 260-4968 or by e-mail at
Nussbaum.Barry@epamail.epa.gov
-------
WHY ATTEND
• Learn latest developments in environmental
statistics
• Share what YOU are doing
• Meet other colleagues
• Present a poster; make a presentation
• See demonstrations of the latest statistical
programs
• Get answers to statistical problems
• Build team spirit
• Receive training in new software, statistical
methods, computers
• Build a network of statistical and information
specialists for the FUTURE
WHO WILL BE THERE
• EPA statisticians and survey specialists
• EPA developers and users of environmental
information and statistics
• EPA policy and decision makers
• State and local government environmental
information developers and users
• University experts and students
• Special Guest Speakers
• YOU
REGISTRATION
FOR THE TWELFTH ANNUAL EPA CONFERENCE ON
ENVIRONMENTAL STATISTICS
RICHMOND, VA APRIL 1-3, 1997
Complete registration packets will be mailed on
JANUARY 31, 1997
Is your mailing information correct?
Did we miss someone? Do you want to add a
colleague to the list?
Contact MARCIA GARDNER
SRA TECHNOLOGIES, INC.
Phone (703) 205-8547, fax (703) 205-6260 or
E-mail: MARCIA.GARDNER@sratech.com
-------
EPA
United States
Environmental Protection Agency
(2163)
401 M Street SW.
Washington. DC 20460
Official Business
Penalty for Private Use $300
Margaret G. Conomos
(2163)
-------
Question 1. How large does a group have to be to show health effects from arsenic exposure
between 10 and 50 µg/l?
The 1960s Taiwan epidemiological study examined people exposed to arsenic in drinking
water beginning in 1900. Wells ranged from 0.01 to 1.82 ppm (10-1,820 ppb or µg/l).
Doctors physically examined 40,421 people out of 103,154 in 37 villages.
728 cases of skin cancer, 153 histologically confirmed.
72% had hyperkeratosis and 90% had hyperpigmentation.
The control group of 7,500 had an age distribution similar to the study population.
Arsenic ranged from non-detect to 0.017 mg/l (17 ppb or µg/l). No skin cancer,
hyperkeratosis, or hyperpigmentation was found in the control population. The expected number of
skin cancer cases, using the skin cancer rate for Singapore Chinese from 1968-1977, is a
little less than 3. Using this as the expected prevalence, the probability of observing no
cancer cases is 0.07.
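As a rough check on that figure, the count of cases in the control group can be treated as approximately Poisson with mean equal to the expected count; a minimal sketch in Python, assuming an expected count of about 2.7 ("a little less than 3"), which is consistent with the quoted probability:

    from math import exp

    expected_cases = 2.7           # expected skin cancer cases in the control group
    p_zero = exp(-expected_cases)  # Poisson probability of observing zero cases
    print(round(p_zero, 2))        # ~0.07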
EPA's drinking water criterion is 50 µg/l or 50 ppb. The Taiwan study identified a
NOAEL (no observed adverse effects level) of 0.8 µg/kg/day, and a corresponding
concentration in drinking water of
Question 2: How many infants should be in each concentration range for a study of sulfates?
The Centers for Disease Control (CDC) is proposing to study 1,000 babies exposed to
sulfate in their drinking water and compare them against 250 babies not exposed to
sulfate. They haven't identified the babies nor the exposure concentration ranges yet.
Sulfates cause a laxative effect above 1,000 mg/l, and EPA's proposed drinking water
criterion is 500 mg/l, a level at which sulfates aren't expected to be a problem.
CDC's sample size calculations for the planned study are attached.
In 1995 CDC studied 276 infants, and found 39 cases of diarrhea, with a median of 264
mg/l, and a range of 0-1,271 mg/l. Non-cases had a median of 260 mg/l, and a range of 0
to 2,787 mg/l. However, as seen in the attached graph, there were very few infants being
exposed to 500 mg/l or higher.
Question 3: Are 100 participants, divided into 0, 500, 800, and 1,200 mg/l groups (40 per group), enough
to establish a dose that causes diarrhea?
In 1994, 4 volunteers drank water with 0, 400, 600, 800, 1,000, and 1,200 mg/l sulfate at
48 hours. In a follow-up study six people drank 1,200 mg/l sulfate for six days and didn't
report diarrhea.
From Irene Dooley 202/260-9531 f-\misc\epi-stafpwr
-------
Sample Size Calculations¹

Confidence   Power   Unexposed:Exposed   Disease in Exposed   Risk Ratio   Sample Size
                                                                           Unexposed   Exposed   Total
95%          80%     1:4                 13%                  1.5          345         1,381     1,726
95%          80%     1:4                 13%                  1.6          250         1,001     1,251
95%          80%     1:5                 13%                  1.5          332         1,662     1,994
95%          80%     1:5                 13%                  1.6          241         1,205     1,446
95%          80%     1:6                 13%                  1.5          324         1,943     2,267
95%          80%     1:6                 13%                  1.6          235         1,409     1,644

¹ Using Epi Info (Version 5.01b DOS) sample size calculations for unmatched cohort and cross-sectional studies (Exposed and
Nonexposed).
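For readers who want to see how numbers of this general kind arise, below is a minimal sketch of a standard Fleiss-style two-proportion sample size calculation with unequal allocation. The table's 13% disease frequency is treated here as the baseline (unexposed) risk, which is how such calculations are commonly parameterized; that is an assumption, no continuity correction is applied, and Epi Info's exact algorithm may differ in detail, so the output is in the same ballpark as the table rather than a reproduction of it.

    from math import sqrt

    def cohort_sample_size(p0, risk_ratio, ratio, z_alpha=1.959964, z_beta=0.841621):
        """Approximate group sizes for an unmatched cohort study.

        p0         : disease frequency in the unexposed (baseline) group
        risk_ratio : exposed risk / unexposed risk to be detected
        ratio      : exposed subjects per unexposed subject (e.g., 4 for 1:4)
        z_alpha    : normal quantile for 95% two-sided confidence
        z_beta     : normal quantile for 80% power
        """
        p1 = risk_ratio * p0                      # disease frequency in exposed
        p_bar = (ratio * p1 + p0) / (ratio + 1)   # pooled disease frequency
        num = (z_alpha * sqrt((1 + 1 / ratio) * p_bar * (1 - p_bar))
               + z_beta * sqrt(p1 * (1 - p1) / ratio + p0 * (1 - p0))) ** 2
        n_unexposed = num / (p1 - p0) ** 2
        return round(n_unexposed), round(ratio * n_unexposed)

    print(cohort_sample_size(p0=0.13, risk_ratio=1.5, ratio=4))
    # about 326 unexposed and 1,305 exposed under these assumptions,
    # compared with 345 and 1,381 in the first row of the table above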
[Figure 3. Frequency distribution of sulfate concentration for all water samples submitted, June-October 1995 (N=172). Mean = 363 mg/L; median = 264 mg/L; range = 0 to 1,327 mg/L (one water sample at 2,787 mg/L is omitted). X-axis: sulfate level (mg/L); Y-axis: frequency.]
13
-------
SEER*Stat
The SEER*Stat system is a statistical package for the analysis of SEER and other cancer
databases. SEER*Stat provides a graphical user interface for the production of the following
statistics and statistical tests.
• Frequencies
• Percentages
• Crude (non-adjusted) rates with standard errors and confidence intervals
• Age-adjusted rates with standard errors and confidence intervals
• Trends over time as percent changes, from crude or age-adjusted rates
• Trends over time as estimated annual percent changes, from crude or age-adjusted rates, with
confidence intervals
• Comparison of estimated annual percent changes with zero
• Comparison of two estimated annual percent changes with one another
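As background on the trend statistics listed above, the estimated annual percent change is conventionally obtained by fitting a straight line to the natural logarithm of the yearly rates; a minimal sketch of that convention follows (the years and rates are made up for illustration and are not SEER data):

    import numpy as np

    # Hypothetical age-adjusted rates per 100,000, for illustration only
    years = np.array([1989, 1990, 1991, 1992, 1993])
    rates = np.array([41.2, 42.0, 43.1, 43.9, 45.0])

    # Fit log(rate) = a + b * year by least squares; EAPC = 100 * (exp(b) - 1)
    slope, intercept = np.polyfit(years, np.log(rates), 1)
    eapc = 100 * (np.exp(slope) - 1)
    print(round(eapc, 2))  # implied percent change per year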
SEER Web Site
Home Page URL: http://www-seer.ims.nci.nih.gov/
The SEER web site contains a variety of information about the SEER program.
Topic areas include:
• News
• About SEER
• Publications
• Online Systems
• Online Data
• Scientific Systems
• Registries
• Other Links
Online Systems
Cancer Query System (CANQUES) on the Web
CANQUES on the Web is an interactive system with a Java interface that allows the user to access
a variety of pre-calculated cancer statistics. There are currently in excess of 7.8 million pre-
calculated statistics available. CANQUES performs no calculations and contains statistics that
were created by the SEER Program for their routine reporting and the Cancer Statistics Review,
1973-1993. You must have a Java-enabled browser to use the system, and the most recent release of
that browser is recommended.
Type of statistics include:
SEER Incidence Rates
SEER Incidence Trends
U.S. Mortality Trends
SEER Median Age at Diagnosis
U.S. Mortality Median Age at Death
NHL and Kaposi's Sarcoma in San Francisco
SEER Relative Survival
-------
Online Data
SEER Incidence Data - The February 1996 submission of the SEER Incidence database is available in
public use text format as self-extracting DOS executables. This data is for the nine standard registries and it
covers diagnosis years 1973-1993. (Password encrypted, requires completion of Public Use Data
Agreement to extract data. Public Use Data Agreement is available via internet.)
Population Data for the SEER Registries - The populations for the nine standard SEER registries, to be used
in conjunction with the above data, are available as self-extracting DOS executables. This data is stored in
text format and contains populations for 1973-1993 by individual registry and also by the counties defining
each registry.
United States Population Data - County level populations for each state in the U.S. are available as self-
extracting DOS executables. Each state file contains county populations by year, 1973-1993. A file
containing total United States populations is also available. All files are stored in text format.
Scientific Systems
Portable Survival System
The analysis of patient survival plays an integral part in determining many aspects of cancer
prevention, control and treatment and is an important part in the interpretation of cancer statistics.
Since survival statistics play such an important role in the analysis of cancer data, the NCI
previously developed a system which generated survival statistics for researchers. This system is
the NCI's Mainframe Survival System which has been in use for over 25 years. A researcher must
have access to and a working knowledge of the NIH IBM mainframe system. This places a
limitation on the accessibility of the system. Also, repetitive mainframe usage costs are an issue
where a single analysis may cost hundreds of dollars depending on the requested parameters.
Information Management Services, Inc., in consultation with the Cancer Statistics Branch of the
National Cancer Institute, has developed a new, expanded and portable version of the Mainframe
Survival System called the Portable Survival System (PSS). The PSS is a Microsoft Windows-
based application which provides more access and greater ease in generating survival statistics
than its mainframe counterpart. The PSS retains all the features of the Mainframe Survival System
with several additional features. The PSS can be installed on most PCs with access to a CD-ROM
drive.
The NCI and IMS are currently in the process of integrating the PSS with the SEER*Stat system to
provide a single application for calculating a wide variety of cancer-related statistics.
The PSS is available on CD-ROM and may be ordered by mailing or faxing a completed Public
Use Data Agreement form (available from the SEER Web site) to the NCI.
-------
Applying Gy's Theory of Sampling
to Problems of Representativeness in
Environmental Field Investigations
Malcolm J. Bertoni
Center for Environmental Measurements and Quality Assurance
Research Triangle Institute
Some Questions to be Answered
• Why consider Gy's Theory of Sampling?
• What are the main concepts of Gy's
Theory?
• How does Gy's Theory help address
representativeness?
• What are some limitations and questions
when applying Gy's theory to
environmental field investigations?
• How can I use this information to improve
my lot?
-------
Why consider Gy's Theory of
Sampling?
Provides a theoretical and practical link
between statistical sampling concepts and
physical sample collection protocols
Helps clarify the relationships between
sampling units, sample support, and the
scale of inference
Provides a more sound scientific basis for
making measurements/observations of
sampling units
What's the origin of Gy's Theory?
• Pierre Gy, a French mining engineer,
developed the theory in the late 1950s
through 1970s
• Addresses the estimation of mineral content
in ore
• Combines concepts from statistics, physics,
geology
• Has been applied to environmental
sampling by Pitard, Ramsey, others
-------
Main Concepts from Gy's Theory
• Types of sampling lots
• Types of heterogeneity
• Classification of errors
• Principles of correct sampling
• Methods for reducing errors
Types of Sampling Lots
• Zero-dimensional
• One-dimensional
• Two-dimensional
• Three-dimensional
[Slide graphic: schematic particle arrangements illustrating each type of lot]
-------
Types of Heterogeneity
• Short-Range (random fluctuations)
— Constitution heterogeneity
• How many constituents are in the material?
— Distribution heterogeneity
• How are the constituents distributed?
• Long-Range
• Non-random trends, patterns
• Periodic
• Cyclic changes
Constitution Heterogeneity
[Slide graphic: particle mixtures illustrating more versus less constitution heterogeneity]
-------
Distribution Heterogeneity
[Slide graphic: particle arrangements illustrating less versus more distribution heterogeneity]
How does Gy measure heterogeneity?
Based on analysis of particles or fragments;
extends to groups of particles or fragments
Interested in the fraction of material having
a particular property of interest ("critical
content"), expressed as a percent of mass
Heterogeneity defined in relation to the
critical analyte
-------
Heterogeneity of a Particle

h_i = \frac{a_i - a_L}{a_L} \cdot \frac{M_i}{\bar{M}}, \qquad \bar{M} = \frac{M_L}{N_F}

(each particle's deviation from the lot concentration, normalized with respect to the average mass per particle and the average concentration of critical analyte)

where:
a_i = concentration of particle i
a_L = average concentration of lot
M_i = mass of particle i
M_L = mass of entire lot
N_F = number of particles in entire lot
Heterogeneity of a Group

h_n = \frac{a_n - a_L}{a_L} \cdot \frac{N_G\, M_n}{M_L}

where:
a_n = concentration of a group of particles
M_n = mass of a group of particles
N_G = number of groups of particles
-------
Definition of Constitution Heterogeneity (CH)

CH_L = s^2(h_i) = \frac{1}{N_F} \sum_{i=1}^{N_F} h_i^2

Definition of Distribution Heterogeneity (DH)

DH_L = s^2(h_n) = \frac{1}{N_G} \sum_{n=1}^{N_G} h_n^2
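A minimal numerical sketch of the constitution heterogeneity definition above; the fragment concentrations and masses are hypothetical, chosen only to show the arithmetic, and the distribution heterogeneity would repeat the same calculation on groups of fragments:

    import numpy as np

    # Hypothetical lot of fragments: analyte concentration and mass of each fragment
    a = np.array([0.02, 0.05, 0.00, 0.10, 0.03])   # fragment concentrations
    m = np.array([1.2, 0.8, 1.0, 0.5, 1.5])        # fragment masses (g)

    M_L = m.sum()               # mass of the entire lot
    N_F = len(a)                # number of fragments in the lot
    a_L = (a * m).sum() / M_L   # mass-weighted average concentration of the lot

    # Heterogeneity carried by each fragment
    h = (a - a_L) / a_L * (N_F * m / M_L)

    # Constitution heterogeneity of the lot
    CH_L = (h ** 2).sum() / N_F
    print(round(CH_L, 3))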
-------
DH is defined in terms of
possible groupings...
hence DH is affected by group size
-------
Constant Factor of Constitution
Heterogeneity (IHL)
CHL is theoretical; it's difficult to estimate,
partly due to large NF term.
Multiplying by the average mass per
fragment, [ML / NF], eliminates the need to
estimate NF:
IH_L = CH_L \cdot [M_L / N_F]
Constant Factor of Constitution
Heterogeneity
IHL is more practical; can be estimated from
observable qualities and measures:
IH_L = C\, d^3
where C = the sampling constant, calculated
from several material parameters such as
liberation, shape, and mineralogical factors;
d = particle diameter.
-------
Why such concern over
Heterogeneity?
Another measure of variability in a
population
Key to understanding and controlling errors
in environmental measurements
Foundation for understanding and applying
correct sampling principles
Gy's Classification of Errors
[Slide diagram: the total error combines the sampling error and the analytical error into the overall estimation error; the sampling error branches into short-range selection error (fundamental error plus grouping and segregation error), long-range fluctuation error, periodic fluctuation error, increment delimitation error, increment extraction error, and preparation error.]
-------
Types of Errors
• Short-range selection error (CE1)
- Fundamental error (FE)
- Grouping and segregation error (GE)
• Long-range fluctuation error (CE2)
• Periodic fluctuation error (CE3)
• Delimitation error (DE)
• Increment extraction error (EE)
• Preparation error (PE)
Fundamental Error (FE)
• Caused by constitution heterogeneity
• Can be estimated a priori by studying
properties of critical analyte and matrix to
be sampled
• Main drivers are:
- qualities of heterogeneity
- particle size
- mass of the sample
-------
Fundamental Error (FE)
FE^2 = \left( \frac{1}{M_S} - \frac{1}{M_L} \right) IH_L

where M_S = mass of the sample, assuming M_S \ll M_L.

Consequently:

FE^2 \approx \frac{C\, d^3}{M_S}
Grouping and Segregation Error
(GSE)
• Grouping error introduced when fragments
are not selected one at a time (always!)
• Segregation error introduced when
fragments are not randomly distributed
(Distribution heterogeneity)
• Reduce GSE by:
- generating a sample by taking many increments
- homogenizing the material when possible
- selecting random locations for increment
extraction
-------
Gy's Theory helps statisticians:
• Choose a sample mass (support) to satisfy a
FE design constraint for a given particle
size and sampling constant (see the sketch below)
• Reduce FE and/or sample mass through
grinding to reduce particle size
• Reduce GSE by specifying, for example,
that "10 to 30 increments shall be taken to
form a sample"
Delimitation Error
• Introduced when incorrect shape and
orientation for sample increment is selected
- design fault
- equipment selection/specification fault
• Correct shapes:
• zero dimension — unit
• one dimension — slice
• two dimensions — cylinder
• three dimensions — sphere or cube
-------
Examples of 1D Delimitations
[Slide graphic: top view of a stream showing correct (full-width slice across the flow) versus incorrect increment delimitation]
Examples of 2D Delimitations
[Slide graphic: cross-section view of the material sampled with a tube sampler versus an auger]
-------
Extraction Error
Introduced when material is imperfectly
extracted in relation to the correct
delimitation
- implementation fault
- equipment selection/specification fault
Can result in systematic or random error
Many environmental sampling tools
introduce both delimitation and extraction
errors
Extraction Error Example
[Slide graphic: cross-section views of sampling spoons — rounded bottom; flat bottom with no side walls; flat bottom with side walls — each taking a slice of material scooped from an elongated pile]
-------
Some limitations and questions
Gy's Theory based on particle
characteristics; environmental sampling
often involves media that don't translate
well to this model
Not clear how some chemical
contamination applies (e.g., sticky stuff that
adheres to particles?)
Is the average characteristic of the sample
always what the investigator wants to
know?
How can I use this to improve my lot?
• Design the right measurement protocols
(correct delimitation)
• Study the matrix you're sampling
• Increase the mass of the sample
• Take more increments for each sample
• Reduce particle size through grinding
(if OK for the material/contaminant)
• Specify correct subsampling protocols
-------
Scientific Extrapolation
[Slide diagram: the cycle of representativeness, linking the target population to the samples and measurements and back through inference; a companion panel shows the cycle being short-circuited when scientific/personal judgment is used to extrapolate directly to the target population.]
[1] From: Alan Goozner at DCOPP7 12/27/96 8:55AM (5369 bytes: 85 In)
To: melko@juno.com at IN
Subject: Master Sampling Frame for Non-Agricultural Pesticide Research
Forwarded
From: Alan Goozner at DCOPP7 12/19/96 7:57AM (5121 bytes: 85 In)
To: chlorine-news@igc.apc.org at IN
cc: PEPI LACAYO at X400, BARRY NUSSBAUM at X400, MATTHEW LEOPARD at X400,
Alan Goozner, Rob Esworthy, Edward Brandt
Subject: Master Sampling Frame for Non-Agricultural Pesticide Research
Message Contents
The EPA and the USDA historically have divided their
responsibilities for collection of pesticide usage
data where the USDA conducts surveys of farmers for
agricultural pesticide usage and the EPA conducts
specialized surveys of non-agricultural pesticide usage.
In the past, the EPA conducted the National Home and Garden
Pesticide Usage Survey and more recently the Certified
Commercial Pesticide Usage Survey. These two surveys were
National in scope and cost the Government over a million
dollars each to complete.
The EPA is not very well suited for the collection of data.
The Office of Pesticide Programs does not have a
professional data collection staff and needs to contract out
this activity whenever a study is conducted. This requires
competing in the private sector for a statistical
contractor, the clearance of an information collection
request through OMB and preparation of a report that must
clear many hurdles before being released to the public.
And, by the time the report reaches print, the data can be
as much as 2-3 years old.
Needless to say, the private sector can do a much better,
more efficient and more timely job in collecting data on
pesticide usage.
In support of this need, the EPA may be in a position to
facilitate the collection of more and better pesticide usage
data for non-agricultural sites. The idea is to construct a
master sampling frame for non-agricultural pesticide usage
sample surveys.
If a frame can be constructed and maintained by the EPA, the
private sector can request samples from this list to conduct
specialized surveys of interest with the intent to share any
data with the EPA. The exact consistency of the frame is
yet to be determined but it may be composed of two major
components of the applicator population: A) Certified
Applicators and B) Homeowners.
Experience in conducting the Certified Commercial Pesticide
Applicator Survey at the EPA shows that state lists are
out of date. Many applicators on state lists have not
renewed their license or are no longer actively applying
pesticides. If these lists can be cleaned up and screened
for certain characteristics that the industry may need to
zero in on for future data collection efforts, a highly
efficient sampling frame can be constructed. For example,
if a National list of pesticide applicators can be
-------
constructed with certain known demographic characteristics
and pesticide usage characteristics by types of application
work and chemicals used, stratified random samples can
zero in to target specific areas of interest for research.
The cost of constructing such a master sampling frame would
be prohibitive for any one private organization
contemplating a National data collection effort. But, the
statistics developed would be more accurate and reliable
from a statistical standpoint.
The question is: Is this a good idea?
If such a sampling frame was constructed, would your
organization use it to collect more/better data on pesticide
usage? If used, would it result in a savings in your market
research budget? Would it enable better and safer
introduction of pesticide products? Would producing more
reliable data support the goal of overall pesticide exposure
reduction?
Your reply and further discussion are encouraged. If there is
enough industry support, I am willing to propose this to EPA
management in the Pesticides Office as a project. You may
want to communicate what specific non-agricultural pesticide
usage data collection efforts are underway or being
contemplated that may lend themselves to using such a master
sampling frame. Would use of such a sampling frame result
in reduced costs for your organization? How much of a
savings would this be on an annual basis?
You may reply directly to:
Goozner.Alan@epamail.epa.gov
Alan R. Goozner, Statistician
USEPA, OPPTS/OPP/BEAD/EAB
-------
Estimating Dietary Exposure to Pesticide Residues
Table of contents
1. Author: Ed Brandt, Economist
2. Abstract
3. Statement of problem and approach
a. Increased need for measures of aggregate exposure
b. Limitations of existing residue monitoring programs
c. Government Performance and Results Act of 1993 requires quantitative measures
to define goals and objectives
4. Suggested measurements for the goal of safe food related to Pesticides
a. Current measures have examined impacts as an indicator of outcomes since so
many factors in addition to pesticide exposures influence national health statistics.
b. The following table provides a schematic of the types of measures proposed for
each effect level.
c. Defining a measure of average annual dietary exposure
d. Basis for estimating average residue per sample
5. Findings of Statistical analyses
a. Descriptive Summaries of residues by pesticide and crop
b. There is general agreement in the priority ranking between PDP and FDA data for
both chemicals and crops, i.e., same chemicals and crops rank the highest with
respect to residue exposure.
c. Chemicals not included in either PDP or FDA account for 70% of agricultural
pesticide active ingredient use, but much of the poundage is represented by
herbicides and fumigants which would normally not be found.
d. Correlation between PDP and FDA average residue per sample
e. Correlation among crops within PDP and within FDA (correlation matrix is
in appendices)
6. Suggestions to improve existing programs to estimate national dietary exposure
a. Decrease sample sizes for pesticide residues that can be predicted from
historical data of residues and pesticide use
b. Base sample sizes to reduce existing weighted estimation errors. Weight
estimation error range by risk (amount/toxicity/endpoint of concern).
7. Future work
8. Appendices
-------
Title: Estimating Dietary Exposure to Pesticide Residues
2. Author Ed Brandt, Economist
Economic Analysis Branch
Office of Pesticide Programs 7503 W
April 2, 1997 : EPA Statisticians Conference Poster session
3. Abstract: Several new laws have increased the need to estimate aggregate dietary
exposures. The Food Protection and Quality Act (FQPA) requires the examination of
aggregate exposures for pesticides likely to have additive effects (common modes of
action). The Government Performance and Results Act (GPRA) requires all government
agencies to reformat the budgeting process to connect measures of program outputs to
eventual environmental outcomes. Methodology and results to date are reported
concerning the consistency between two major residue monitoring programs, critical data
gaps and approaches to future data collection.
4. Statement of problem and approach
a. Increased need for measures of aggregate exposure
i. The importance of a consistent set of residue estimates across pesticides
has grown with the passage of FQPA. Previously, decision making for a
pesticide focused on whether the residues for the individual pesticide are
acceptable.
ii. The need for a national data base on residue data was recommended by the
National Academy of Sciences but funding for development has not yet
been received.
b. Limitations of existing residue monitoring programs
Two major residue monitoring programs are run by USDA and FDA. The
USDA's Pesticide Data Program (PDP) was implemented in May 1991 to
provide data on pesticide residues in food to support exposure analyses
conducted by EPA in the registration of pesticides.
(1) Principal goal is to measure food safety for vulnerable populations
(2) 1992 to 1995 for selected crops and pesticides -(15 crops and 65
pesticides by 1995) for high consumption to infants and children
-------
and potentially riskier pesticides based on existing tox/exposure
data
(3) capture residues most related to actual consumption, i.e., oranges
include pulp only; the skin is excluded
(4) probability based sample selection at the latest point of distribution.
ii. FDA residue monitoring includes a Surveillance and a Compliance
program. Surveillance data not specifically targeted toward known
problems of misuse so it tends to be more representative than the
Compliance data program which does target producers with past problems.
iii. The Surveillance program has limitations when used to estimate dietary
exposure.
(1) Primary role is prevention of illegal residues (over tolerance or no
tolerance). The watchdog role limits the flexibility to optimize
sampling for estimating dietary exposure alone
(2) Program limited by need to seize shipment in 24 hours if found to
be violative. Limits ability to measure residues downstream in the
distribution system (post harvest applications) since grower
identity is lost.
(3) Monitoring programs designed primarily for enforcement (to ensure
the absence of illegal residues) results in small sample sizes on
important commodities of high dietary consumption.
(4) Some chemicals not picked up by multi residue methods are
omitted altogether because of the incremental costs of inclusion.
c. Government Performance and Results Act of 1993 requires quantitative measures
to define goals and objectives
i. Programs must develop plans which connect program outputs to
objectives.
5. Suggested measurements for the goal of safe food related to Pesticides
a. Current measures have examined impacts as an indicator of outcomes since so
many factors in addition to pesticide exposures influence national health statistics.
b. The following table provides a schematic of the types of measures proposed for
each effect level.
-------
Effect level   Items to measure                                       Measures
Outcomes       cancer(s), neurotoxic effects, endocrine disruption,   national health statistics
               other toxic effects
Impacts        dietary exposure - residues on food                    residue levels; percent detects
Outputs        new registrations; review of existing registrations    pesticide use; number and type
Defining a measure of average annual dietary exposure
i. Limit analysis to variability of an annual national average that is appropriate
for lifetime assessments. Not appropriate for an acute or subchronic
analysis.
ii. Expected exposure for a residue of chemical x on crop y is a function of
the probability of detection multiplied by the probability of the residue level
given detection:

    expected exposure = avg. residue per sample * dietary consumption        (1)
    avg. residue per sample = Prob(any detectable residue of chemical x on crop y) * expected residue given a detect

These two variables can be combined into a single distribution of the
expected residue per sample. Thus, given 1,000 samples of chemical x on
crop y, there is an expected residue per sample and a probability
distribution of the sample mean.
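To make equation (1) concrete, here is a minimal Python sketch that turns a detection probability, an expected residue given a detect, and a consumption figure into an expected exposure. All input values and names are hypothetical and chosen only for illustration; they are not values from the PDP or FDA programs.

    # Illustrative sketch of equation (1); every input below is a made-up assumption.
    prob_detect = 0.12                 # Prob(any detectable residue of chemical x on crop y)
    mean_residue_given_detect = 0.35   # ppm, expected residue when a detect occurs
    consumption_g_per_day = 150.0      # grams of crop y consumed per person per day

    # avg. residue per sample = P(detect) * E[residue | detect]
    avg_residue_per_sample = prob_detect * mean_residue_given_detect   # ppm

    # expected exposure = avg. residue per sample * dietary consumption
    # ppm is mg of residue per kg of food, so convert grams of food to kg.
    expected_exposure_mg_per_day = avg_residue_per_sample * (consumption_g_per_day / 1000.0)

    print(f"avg residue per sample: {avg_residue_per_sample:.4f} ppm")
    print(f"expected exposure: {expected_exposure_mg_per_day:.5f} mg/day")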
d. Basis for estimating average residue per sample
i. The mean and variance of the sampling distribution could be determined
from the probability of detection (a binomial distribution on detects)
combined with a lognormal distribution of residues (the lognormal fits residue
data best, consistent with the constant degradation function modeled by a
lognormal).
ii. One would expect that percent detect would correlate with percent of crop
treated, but this is not the case. Other factors, such as time of application,
pesticide formulation with stickers and adherents, degradation rates,
weather, etc., are thought to be important too. More work is needed on the
factors that most affect the probability of detection.
iii. There are several alternative ways to calculate the estimation error of
the true residue amount per sample.
(1) From the probability of detection and the residue distribution given
detection. A problem with this approach is that these two variables
are not independent: percent detect is significantly correlated with the
residue level and is not correlated with the percent of use.
(2) From the variance of the average residue per sample over time,
estimating a standard error from the sample size and average residue level
for each year; the estimated mean is weighted by sample size for each
year using a weighted variance estimate (a short sketch of this weighting
follows this list).
(3) Calculating percentiles or, in the case of only four years of data
analyzed, the range of the average residue per sample.
6. Findings of Statistical analyses
a. Descriptive summaries of residues by pesticide and crop
i. Methods 2 and 3 have been calculated, but only method 3 is used to
construct a table of ranges.
(1) It is easier to understand, does not require assumptions about
homogeneity of variances and distributional form, and is closest to
existing methods for estimating upper ranges of residue.
(2) Tables are provided in the appendix which summarize the
estimation of average residue per sample per crop.
ii. Analysis of variance indicates that, compared to the variance in residue
levels among chemicals and crops, there is not a significant difference
between years for the same chemical and crop. This makes pooling data
across years more appealing.
-------
iii. Average residue is further adjusted by a scalar, the average intake per year
for infants and children and again for women of childbearing age. Since
these tables are rather lengthy, information is summarized again by
aggregating on either chemical or crop. Variance estimates at an aggregate
level have not yet been attempted.
b. There is general agreement in the priority ranking between PDP and FDA data for
both chemicals and crops, i.e., the same chemicals and crops rank the highest with
respect to residue exposure.
i. Differences do exist because of food preparation and sampling as well as
timing of sampling; for example, FDA residues for citrus are higher than
for PDP because FDA includes the skin. PDP residues are significantly
higher for pesticides that are applied during long-term storage (root crops,
for example).
ii. Post harvest treatments account for exposure far in excess of the pounds
applied relative to other crops. The majority of post harvest applications
are used to treat fungal diseases on tree fruits and vegetables. Insecticides
are used post harvest for grain storage. Growth regulators are applied to
stored root crops (potatoes) to prevent sprouting.
c. Chemicals not included in either PDP or FDA account for 70% of agricultural
pesticide active ingredient use, but much of the poundage is represented by
herbicides and fumigants which would normally not be found.
i. The quantity of pesticide use, in lbs of active ingredient, has little relation to
dietary exposure.
ii. Fungicides and insecticides account for most of the residues, yet herbicides
have the highest use. Harvest aids and growth regulators also account for
high residue levels, but the number of pesticides in this category is small.
d. Correlation between PDP and FDA average residue per sample
i. The number of observations (or cases) is defined as pesticides which have
PDP and FDA residue data for the same crop and a sample size exceeding 100.
The 100-sample limit is the general rule of thumb used in residue
chemistry.
ii. The intercept is set to zero to estimate the ratio of PDP to FDA. This avoids
the loss of one degree of freedom for the intercept estimate and more
directly measures the ratio, or multiple, of residues between the two programs.
iii. Key factors affecting the estimated ratio of PDP to FDA residue:
(a) portion of product sampled (edible vs. total)
(b) time of sample collection, including late post-harvest
applications that occur later in the retail distribution chain
(c) pesticide action, disposition of residues, and systemic
activity which results in plant uptake of the pesticide.
Estimated Ratio of Residues between PDP and FDA
Fungicides Only

FUNGICIDES      Estimated ratio   Signif.   R Square
                (PDP/FDA)         level
APPLES                1.65          0.01       82%
BANANA                0.04          0.02      100%
CELERY                0.07          0.25       57%
CARROT               17.32          0.27       53%
GREEN BEANS           0.15          0.02       95%
GRAPES                0.32          0.16       43%
LETTUCE               0.14          0.14       95%
ORANGES               0.04          0.01      100%
PEACHES               2.70          0.03       71%
POTATOES              2.36          0.00      100%

Cases (obs): 6, 2, 3, 3, 3, 5, 2, 2
Factors affecting multiple: 2 post-harvest pesticides (extreme points); FDA includes peel, PDP does not; extreme points, little correlation; extreme points, little correlation; FDA includes skin, PDP pulp only; 5 post-harvest pesticide use; 3 post-harvest pesticide use
Insecticides Only

INSECTICIDES    Estimated ratio   Signif.   R Square   Cases   Possible explanations
                (PDP/FDA)         level                (obs)
APPLES                4.45          0.00       64%      19     to be determined
BROCCOLI              0.28          0.02       69%       6     to be determined
CELERY                1.83          0.03       75%       5     to be determined
CARROT                0.53          0.06       55%       6     to be determined
GREEN BEANS           3.12          0.01       65%       8     to be determined
GRAPEFRUIT            0.02          0.17       68%       3     portion of fruit sampled
GRAPES                2.20          0.04       45%       9     to be determined
-------
INSECTICIDES    Estimated ratio   Signif.   R Square   Cases   Possible explanations
                (PDP/FDA)         level                (obs)
LETTUCE               0.77          0.00       97%       2     to be determined
ORANGES               0.03          0.00       95%      10     portion of fruit sampled
PEACHES               1.23          0.00       89%      14     to be determined
POTATOES              1.87          0.02       54%       8     to be determined
SPINACH               6.43          0.00      100%       9     to be determined
WHEAT                 0.48          0.00       94%       5     to be determined
High residue outliers
Fungicides

Crop          Both PDP and FDA
Apples        Thiabendazole, Diphenylamine
Banana        Thiabendazole
Celery        Chlorothalonil
Grapes        Captan
Green beans   Chlorothalonil
Lettuce       Iprodione
Oranges       Thiabendazole
Potatoes      Thiabendazole
Peaches       Iprodione and Dicloran
Carrots       Iprodione

PDP only: Dicloran; Iprodione and Vinclozolin
FDA only: Captan; Pentachlorophenol, PCB
Insecticides
Crop
Apples
Both PDP and FDA
Propargite
PDP only
FDA Only
Azinphos methyl
and carbaryl
-------
Crop
Oranges
wheat
spinach
Potatoes
Peaches
Lettuce
Grapes
Grapefruit
Green beans
Carrots
Celery
Broccoli
Both PDP and FDA
Carbaryl
malathion.and
chlorpyrifos
permethrin
DDT
Carbaryl
Phosmet and
Parathion
Permethrin
Ethion
Acephate and
permethrin
Permethrin
PDP only
Azinphos methyl
Dimethoate,
omethoate
Acephate
Diazinon
FDA Only
methidathion and
chlorpyrifos
Carbofuran
Parathion
Dicofol
Endosulfan
DDT
Methamidophos
e. Correlation among crops within PDP and within FDA (correlation matrix is
in appendices)
i. Multivariate clustering remains to be done, but based on a visual
examination of the correlation matrix, the following crops have high
correlations and appear to cluster:
(1) apples, grapefruit, oranges, bananas, broccoli
(2) peaches, carrots, grapes
(3) lettuce, spinach
(4) potatoes, oranges
(5) Crops that do not correlate with any other crop:
(a) celery
(b) wheat
(c) sweet corn
(d) processed peas
ii. Crops within FDA, based on 20 crops examined
(1) Crops that appear to cluster:
(a) tomatoes, apples, string beans, peas, cantaloupe,
sweet peppers, hot peppers, carrots
(b) apple, pear, grapes, potato, orange, cantaloupe
(c) peach, cherry
(2) Crops that do not cluster:
(a) catfish
(b) wheat
(c) strawberries
7. Suggestions to improve existing programs to estimate national dietary exposure
a. Decrease sample sizes for pesticide residues that can be predicted from
historical data on residues and pesticide use.
b. Base sample sizes on reducing existing weighted estimation errors; weight the
estimation error range by risk (amount/toxicity/endpoint of concern).
8. Future work
a. Developing "synthetic estimates" for pesticides/crop combinations with limited or
no data
i. Model residue measurements as influenced by portion of the food sampled,
time of sampling, decay rate of pesticide and metabolites, when applied,
systemic pesticides which are taken up by the plant, and extent and changes
in pesticide use
ii. Identify cases for which estimates cannot be made or are statistically weak
-------
iii. Evaluate the robustness of aggregate measures to identify significant
changes or trends in the level of pesticide residues for a given set of
chronic effects, i.e., cancer, neurotoxic, etc.
b. Additional sources to include
i. Total Diet Study
ii. USDA's monitoring of meat, milk, and eggs
iii. state monitoring
c. Estimating sampling variance - individually and in aggregate for common
mechanisms
d. Clustering and other multivariate techniques to identify plausible
interrelationships of huge data sets
e. Developing relationships between pesticide use parameters, crop, and
pesticide chemical/physical properties to improve regulation of pesticides
9. Appendices
a. Crops listed in order of estimated dietary pesticide consumption of
children and women of childbearing age - FDA and PDP
b. Pesticides listed in order of estimated dietary pesticide consumption of
children and women of childbearing age - FDA and PDP
c. Agricultural pesticides not included in PDP or FDA from 1992 to 1995
-------
Mathematical Geology
Volume 26, Number 3, April 1994
Contents
ARTICLES
Spectral Simulation of Multivariable Stationary Random Functions Using
Covariance Fourier Transforms 277
E. Pardo-Iguzquiza and M. Chica-Olmo
The Integral of the Semivariogram: A Powerful Method for Adjusting
the Semivariogram in Geostatistics 301
Frederick Delay and Ghislain de Marsily
Posterior Identification of Histograms Conditional to Local Data 323
Andre G. Journel and Wenlong Xu
Estimation of Background Levels of Contaminants 361
Anita Singh, Ashok K. Singh, and George Flatman
Comparative Performance of Indicator Algorithms for Modeling
Conditional Probability Distribution Functions 389
P. Goovaerts
BOOK REVIEW
Principles of Mathematical Geology by A. B. Vistelius 413
Reviewed by C. John Mann
LETTERS TO THE EDITOR
Comments on "Cumulative Semivariogram Models of Regionalized
Variables" and "Standard Cumulative Semivariograms of
Stationary Stochastic Processes and Regional Correlation"
by Zekai Şen 415
Donald E. Myers
Reply to Comments by Donald E. Myers 417
Zekai Şen
-------
Mathematical Geology, Vol. 26, No. 3, 1994
Estimation of Background Levels of Contaminants
Anita Singh,2 Ashok K. Singh,3 and George Flatman4
Samples from hazardous waste site investigations frequently come from two or more statistical
populations. Assessment of "background" levels of contaminants can be a significant problem. This
problem is being investigated in the U.S. Environmental Protection Agency's Environmental Mon-
itoring Systems Laboratory in Las Vegas. This paper describes a statistical approach for assessing
background levels from a dataset. The elevated values that may be associated with a plume or
contaminated area of the site are separated from lower values that are assumed to represent
background levels. It would be desirable to separate the two populations either spatially by kriging
the data or chronologically by a time series analysis, provided an adequate number of samples
were properly collected in space and/or time. Unfortunately, quite often the data are too few in
number or too improperly designed to support either spatial or time series analysis. Regulations
typically call for nothing more than the mean and standard deviation of the background distribution.
This paper provides a robust probabilistic approach for gaining this information from poorly col-
lected data that are not suitable for the above-mentioned alternative approaches. We assume that the
site has some areas unaffected by the industrial activities, and that a subset of the given sample is
from this clean part of the site. We can think of this multivariate data set as coming from two or
more populations: the background population and the contaminated populations (with varying
degrees of contamination). Using robust M-estimators, we develop a procedure to accomplish this separation.
KEY WORDS: robust M-estimators, influence function, background estimation, robust confidence
limits, separation of mixed sample.
INTRODUCTION
The United Slates Environmental Protection Agency (U S. EPA) encounters the
statistical problem of mixed samples from two or more populations in Resource
Conservation and Recovery Act (RCRA) and Superfund Amendments and
Received 27 June 1993; accepted November 1993.
2 Lockheed Environmental Systems and Technologies Company, 980 Kelly Johnson Drive, Las
Vegas, Nevada 89119.
3 Department of Mathematics, University of Nevada, Las Vegas, Nevada.
4 United States Environmental Protection Agency, Las Vegas, Nevada.
-------
362 Singh, Singh, and Flatman
Reauthorization Act (SARA) Evaluation and Remediation. This problem is being
considered at U.S. EPA's Environmental Monitoring and Systems Laboratory
at Las Vegas (EMSL-LV). This paper presents a solution from a probability
distribution-based method. A sample of concentration values of contaminants
from a Superfund site can be thought of as a mixed sample of background
concentration values plus the concentration values from a plume or plumes. At
first glance, a statistical analyst could think that the mixed sample from a Su-
perfund site could be separated spatially by a kriging analysis. However, these
statistical techniques need data obtained using appropriate statistical designs.
Unfortunately, regulatory life is not simple. Often only too few samples or
improperly spaced data for spatial or time series analysis are available, and the
required regulatory information is only the mean and standard deviation of the
distribution(s). This paper provides a robust probabilistic approach for gaining
this information from data that are inadequate for the above-mentioned alternative
approaches.
The occurrence of mixture samples from two or more normal (lognormal)
populations has been well recognized in several applied areas of interest such
as biology, geology, medicine, reliability, and environmental science Several
classical partitioning methods are available in statistical literature. Sinclair (1976)
used normal probability plots for graphical partitioning of mixture samples in
mineral exploration studies. Holgersson and Jorner (1978) gave a good review
of various methods including graphical, maximum likelihood (MLE), nonlinear
least squares, and the method of moments. Fowlkes (1979) performed extensive
simulations to compare several graphical methods including the usual histogram
method, the normal probability Q-Q plot, and the empirical cumulative distri-
bution function. The ability of these classical and graphical methods to identify
mixtures in samples is doubtful, especially if discordant observations are also
present in these samples. Moreover, the detection of these mixtures becomes
extremely difficult in the presence of overlap among the component populations.
Campbell (1984) used robust methods to study the effect of anomalies on mixture
models. Recently Fleischhauer and Korte (1990) used the point of inflection of
the normal probability plot to obtain an estimate of threshold background level
contamination.
The graphical display, unarguably, is one of the most powerful diagnostic
tools in the hands of a researcher. However, a subjective estimate of the point
of inflection obtained by looking at these graphs is questionable, especially when
more than two component populations are present. The overlap among the com-
ponent populations generally masks the point of inflection. Moreover, the anom-
alous observations (if any) and the presence of several (unknown) component
populations can distort the Q-Q plot to such an extent that the resulting inflection
point estimates may not be reliable. If one wants to use the Q-Q plots as a
partitioning method, a stepwise procedure is desirable. The proposed
-------
Background Levels of Contaminants 363
procedure requires construction of a Q-Q plot at each step. Populations with
higher concentration levels will be identified first. Each step identifies a sample
from a different population. In this article, we propose robust procedures to
partition a given mixture sample into its component populations. Data-appraised
robust confidence limits for the individual observations placed on the same
Q-Q plot produce a more precise estimate of the cutoff point between two
adjacent populations. This reduces the subjectivity involved in choosing the
inflection point from the graph. Several simulated as well as real-life examples
have been discussed to illustrate these procedures. The mathematical formulation
is given in the second section, the third section has all the examples, and finally,
there is a summary of our conclusions and recommendations.
MATHEMATICAL FORMULATION
The density function f_M(x) of a mixture population with (g + 1) unknown
component populations is given by

    f_M(x) = Σ_{i=0}^{g} p_i f_i(x; μ_i, σ_i)                                  (1)

where g ≥ 1, and f_i(x; μ_i, σ_i) is the density function of the ith population Π_i,
assumed to be normally (or lognormally) distributed with unknown mean and
standard deviation (SD) μ_i and σ_i respectively, and p_i is the unknown mixture
proportion for Π_i; i = 0, 1, 2, . . . , g, with Σ p_i = 1. Throughout the rest of
the article, it has been assumed that the researcher has performed a suitable data
transformation to achieve normality or near-normality (e.g., log-transformation
for positively skewed data) before proceeding with the following algorithm.
Given a sample x_1, x_2, . . . , x_n of size n from this mixture model, the objective
is to resolve it into its component populations, i.e., find n_i ≥ 0 such that n_i
observations belong to Π_i with Σ_{i=0}^{g} n_i + n_E = n. Here n_E ≥ 0 is the number
of extreme unusual observations which stand alone and do not belong to any of
the given (g + 1) populations. The subsample of size n_i then can be used to
estimate the parameters of population Π_i and its proportion p_i; i = 0, 1, . . . ,
g. The normal probability Q-Q plot is generally used to get an idea about g,
the number of populations present. However, inevitable overlap among the
component populations and/or the presence of anomalous observations generally
distort the Q-Q plot significantly, resulting in masking of some of the component
populations, especially those populations which have lower concentration levels.
Traditionally, theoretical quantiles from a standard normal distribution are plot-
ted along the x-axis in a typical Q-Q plot. However, in this article, we use the
theoretical quantiles from N(x̄, s) for the classical Q-Q plot and the theoretical
quantiles from N(x̄*, s*) for the robust Q-Q plot, where x̄ is the sample mean,
-------
364 Singh, Singh, and Flatman
s the sample standard deviation, and x̄* and s* (defined later in this paper) rep-
resent their robust versions, respectively.
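As a concrete illustration of the mixture density in equation (1), the following minimal Python sketch evaluates f_M(x) for a two-component normal mixture, using the 90/10 mixture of N(10, 3) and N(27, 8) from Example 1 later in this paper; the evaluation points are arbitrary.

    import math

    def normal_pdf(x, mu, sigma):
        """Density of N(mu, sigma) at x."""
        z = (x - mu) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

    def mixture_pdf(x, proportions, means, sds):
        """Equation (1): f_M(x) = sum_i p_i * f_i(x; mu_i, sigma_i)."""
        return sum(p * normal_pdf(x, m, s)
                   for p, m, s in zip(proportions, means, sds))

    # Two-component mixture from Example 1: 90% background, 10% contaminated.
    p, mu, sd = [0.9, 0.1], [10.0, 27.0], [3.0, 8.0]
    for x in (5.0, 10.0, 20.0, 30.0):
        print(f"f_M({x:5.1f}) = {mixture_pdf(x, p, mu, sd):.5f}")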
The initial step in the process is to identify the n_E ≥ 0 highly contaminated
observations, which stand alone by themselves on a normal probability plot.
These observations may require individual treatment and/or further investigation
and should not be included in the subsequent partitioning of the underlying
mixture sample. Due to masking effects, the exclusion of these observations
from subsequent analysis may be required to identify intermediate populations.
This does not mean at all that these observations have been thrown away. The
new Q-Q plot will be drawn using the remaining n − n_E observations. This
Q-Q plot will reveal if any representative samples from populations with higher
concentrations, namely Π_g, Π_{g-1}, etc., are present. Robust confidence limits for
the individual observation x, drawn on these Q-Q plots provide an objective
(rather than subjective) estimate of the cut-off point between two adjacent pop-
ulations. The process is repeated until all of the observations have been classified
into the various component populations. Each time a population is identified, a
new Q-Q plot with the new robust limits is drawn using only the unclassified
observations. This process provides a good estimate of the number of remaining
populations that need to be identified. At each step, these robust limits corre-
spond to the most dominant population present at that step. If there is such a
population present, then this population may be identified first, using these
robust limits as the estimates of its cutoff points from the adjacent populations.
The separation between two populations is probably most difficult in the pres-
ence of overlap. The overlapping populations (if any) should be identified in
the very end. All these ideas have been explained by means of several examples
presented in the following section.
Here, Π_0 represents the background population and Π_i; i = 1, 2, . . . , g
represents contaminated parts of the site with varying degrees of contamination
levels in ascending order of magnitude, with Π_g representing the population with
highest contamination levels. A recently proposed redescending PROP (Singh,
1994) influence function used here to identify the discordant observations is
given by

    ψ(d_i) = d_α exp(−(d_i − d_α)),    if d_i > d_α                            (2)

where d_α is the (α)100% critical value of the distribution of d_i² = (x_i − x̄)²/
s², which is distributed as (n − 1)² β(1/2, (n − 2)/2)/n, where n here rep-
resents the number of observations used in the computation of x̄ and s.
It should be noticed that the number of observations used will be updated
each time the process is repeated. For the initial iteration all of the n observations
will be used, next n − n_E will be used, and then the remaining n − n_E minus the
-------
Background Levels of Contaminants 36S
observations classified into Π_g will be used, and so on. Each observation is
assigned some weight according to its extremeness in either of the two tails of
the distribution. These weights provide a very effective way of obtaining esti-
mates of the degrees of freedom needed to compute the individual robust con-
fidence limits at each step. The resulting M-estimators for a given sample are:
    x̄* = Σ_i w_i x_i / Σ_i w_i    and    s*² = Σ_i w_i² (x_i − x̄*)² / (Σ_i w_i² − 1)      (3)

with ν = Σ_i w_i² − 1. The robustified distances d_i*² = (x_i − x̄*)²/s*² follow a
ν² β(1/2, (ν − 1)/2)/(ν + 1) distribution. The two-sided robust limits for the
individual observation x_i are given by the following probability statement:

    P(LTL ≤ x_i ≤ UTL) = 1 − α,    i = 1, 2, . . . , n                          (4)

where LTL = x̄* − s*d*_α and UTL = x̄* + s*d*_α, x̄* and s* are given by
(3), and d*_α² is the (α)100% critical value from the distribution of d*². The
one-sided (1 − α)100% robust limit for individual x_i can be obtained similarly.
The index i runs over the number of observations used in a typical step. Once
the n_E extreme observations have been identified and removed from the data
set, a new Q-Q plot using the rest of the n − n_E observations is drawn. It should
be emphasized that the limits used here are for the individual observations x_i
and not for the population mean μ, as is sometimes done in practice. For ex-
ample, in the context of background level estimation, individual observations
are being compared (and not the population mean μ) to these threshold limits.
Therefore, these limits should be computed using the appropriate interval. A
brief description of the various intervals and limits is given in Singh and Nocerino (1993).
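The weighting and limit calculations in equations (2)-(4) can be sketched in Python. The sketch below is a single-pass simplification that follows the formulas as reconstructed here: it reuses the same critical value d_alpha for the robust limits rather than recomputing one from the distribution of the robustified distances, and it does not iterate, so it should be read as an illustration of the idea rather than as the SCOUT implementation.

    import numpy as np
    from scipy import stats

    def prop_weights_and_limits(x, alpha=0.05):
        """One-pass sketch of PROP-style weighting and robust limits for
        individual observations (simplified from the published procedure)."""
        x = np.asarray(x, dtype=float)
        n = len(x)

        # Classical estimates and scaled distances d_i = |x_i - xbar| / s.
        xbar, s = x.mean(), x.std(ddof=1)
        d = np.abs(x - xbar) / s

        # d_alpha^2 = upper-alpha quantile of (n-1)^2 * Beta(1/2, (n-2)/2) / n,
        # the distribution quoted for d_i^2 in the text.
        d_alpha = np.sqrt((n - 1) ** 2 / n *
                          stats.beta.ppf(1.0 - alpha, 0.5, (n - 2) / 2.0))

        # PROP influence: full weight up to d_alpha, exponential descent beyond.
        w = np.where(d <= d_alpha, 1.0,
                     d_alpha * np.exp(-(d - d_alpha)) / d)

        # Robust (weighted) location and scale, in the spirit of equation (3).
        x_star = np.sum(w * x) / np.sum(w)
        nu = np.sum(w ** 2) - 1.0
        s_star = np.sqrt(np.sum(w ** 2 * (x - x_star) ** 2) / nu)

        # Robust limits for an individual observation, as in equation (4).
        return x_star, s_star, x_star - s_star * d_alpha, x_star + s_star * d_alpha

Applied to a mixed sample, the returned upper limit plays the role of the cutoff above which observations become candidates for a higher-concentration population.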
The robust limits given by (4), when drawn on the same probability plot,
provide a good initial estimate of the cutoff point between the adjacent popu-
lations. An estimate of the cutoff point c_g between populations Π_g and Π_{g-1}
will be obtained first from this Q-Q plot. All of the unclassified observations
x > c_g (not including the n_E extreme observations) will be used to obtain the
robust interval I_g = (LTL_g, UTL_g) for the gth population Π_g. All of the unclas-
sified observations belonging to this interval will be declared as coming from
Π_g. Next, all the observations x > LTL_g will be deleted from the subsequent
partitioning and a new Q-Q plot with the new robust limits will be obtained
using the remaining observations. An estimate of c_{g-1}, the cutoff point between
populations Π_{g-2} and Π_{g-1}, will be obtained from this plot. All unclassified
observations x > c_{g-1} will be used to obtain the robust boundaries given by
-------
Singh, Singh, and Flatman
I_{g-1} = (LTL_{g-1}, UTL_{g-1}) for the (g − 1)th population Π_{g-1}. All observations
belonging to I_{g-1} will be declared as coming from Π_{g-1}. In case of any overlap
between Π_{g-1} and Π_g, i.e., when LTL_g < UTL_{g-1}, observations in the range
(LTL_g, UTL_{g-1}) can be assigned to either of the two populations Π_{g-1} or Π_g.
However, the PROP influence function (2) used in the derivation of the robust
limits given by (4) minimizes the overlap between the estimates for the two
adjacent populations by down-weighting the extreme observations appropriately
in either of the two tails of the distribution of the underlying populations. More-
over, when the two adjacent populations have disjoint boundaries, the obser-
vations (if any) belonging to the unclaimed region (UTL_{g-1}, LTL_g) should be
assigned to their nearest neighbor.
This process will be repeated as many times as required until all of the
observations have been classified into their respective populations. At the final
step, the threshold values for the background population Π_0 will be estimated.
The remaining unclassified observations will be used to estimate UTL_0, which
is given by the one-sided probability statement:

    P(x_i < UTL_0) = 1 − α

where UTL_0 can be obtained using (4) by replacing α with 2α.
Observations smaller than UTL_0 will be declared as coming from Π_0. As
before, if there is overlap between Π_0 and Π_1, i.e., LTL_1 < UTL_0, then obser-
vations in the overlapping range (LTL_1, UTL_0) can be assigned to either of the
two populations Π_0 or Π_1. Once the boundaries for the various component
populations have been established, the complete classification procedure can
now be described in various steps as follows:
1. First of all, identify all of the extreme observations, n_E ≥ 0. These will
not be used in any of the subsequent partitioning of the underlying sample.
2. Next define a_i = no. of observations in the overlapping region
(LTL_i, UTL_{i-1}) between populations Π_{i-1} and Π_i, with a_{i,i-1} ≥ 0 of these in Π_{i-1}
and a_{i,i} ≥ 0 of these in Π_i, i = 1, 2, . . . , g; and b_i = no. of observations in the
unclaimed region (UTL_{i-1}, LTL_i) between populations Π_{i-1} and Π_i, with b_{i,i}
≥ 0 of these in Π_i and b_{i,i-1} ≥ 0 of these in Π_{i-1}, i = 1, 2, . . . , g.
3. Identify all of the non-overlapping observations belonging to each robust interval (LTL_i, UTL_i).
6. Once the number (g + 1) of populations present, and the respective
-------
Background Levels of Contaminants
367
subsample sizes n_i, i = 0, 1, . . . , g, have been estimated, the (g + 1) population
proportions are estimated using the following formula:

    p̂_i = n_i / (n − n_E),    i = 0, 1, . . . , g

7. Finally, using these n_i observations, the robust estimates of the param-
eters of population Π_i, i = 0, 1, . . . , g, will be obtained using (3).
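Pulling the classification steps together, here is a compact Python sketch of the top-down loop: fit robust limits to the data that remain unclassified, treat everything above the upper limit as candidates for a higher population, fit robust limits to those candidates, peel them off, and repeat until only the background is left. It reuses the hypothetical prop_weights_and_limits() helper sketched earlier and omits the handling of overlapping and unclaimed regions, so it is only an outline of the flow, not the published procedure.

    import numpy as np

    def partition_mixture(x, alpha=0.05, max_pops=10):
        """Illustrative top-down partitioning loop (greatly simplified).
        Returns (populations, background): a list of arrays, highest first,
        and the leftover observations treated as the background sample."""
        unclassified = np.asarray(x, dtype=float)
        populations = []
        for _ in range(max_pops):
            # Robust limits for the dominant population among what is left;
            # its upper limit plays the role of the cutoff c toward higher values.
            _, _, _, cutoff = prop_weights_and_limits(unclassified, alpha)
            high = unclassified[unclassified > cutoff]
            if len(high) < 3:              # nothing substantial above: stop
                break
            # Lower robust limit of the high group defines the new population.
            _, _, ltl_high, _ = prop_weights_and_limits(high, alpha)
            in_pop = unclassified >= ltl_high
            populations.append(unclassified[in_pop])
            unclassified = unclassified[~in_pop]
        return populations, unclassified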
In order to illustrate the proposed statistical procedure, we now present
some simulated as well as real examples.
EXAMPLES AND DISCUSSION
The procedure described here has been applied to two simulated datasets
as well as a real dataset from the Sacramento Army Depot Superfund Site from
Region 9 EPA. There were six primary contaminants at the Sacramento Army
Depot Superfund Site: cadmium (Cd), chromium (Cr), copper (Cu), lead
(Pb), nickel (Ni), and zinc (Zn). A total of 45 samples were analyzed for the
above contaminants, six from uncontaminated regions of the site, which will be
referred to as the site-specific background sample, and 39 from contaminated
regions of the site. Moreover, the procedure outlined here has been used on a
simulated data set representing a sample from a mixture of two lognormal pop-
ulations. Three simulated data sets and the Sacramento Army Depot Superfund
Site data set are given in the Appendix. In the following, all letters with * as a
superscript represent robust estimates; otherwise, they are the classical maximum
likelihood estimates (MLEs). All the computations have been done using the
statistical software package SCOUT developed by the Lockheed Environmental
Systems & Technologies Company (LESAT) for the U.S. EPA.
Example 1. A mixture sample of size 100 was generated from two rea-
sonably separated normal populations, with 90% (p_0 = 0.9) of the observations coming
from a normal population Π_0 with mean 10 and SD 3, N(10, 3), and 10% (p_1
= 0.1) coming from Π_1 ~ N(27, 8). Observations for the first
sample ranged from 2.485 to 18.598, whereas observations for the second sam-
ple ranged from 9.489 to 43.998, indicating some overlap between the two
populations. This is data set no. 1, given in the Appendix. The normal
probability Q-Q plots for the whole data set with the classical and the robust
limits placed on them are given in Figs. 1a and b, respectively. From both
graphs it is obvious that there are two populations present. The upper robust
limit 15.99 for the dominating population Π_0 provides an estimate of the cutoff
point c_1 between the two populations (Fig. 1b). Next, using all observations >
c_1, the 95% robust one-sided lower boundary for the population Π_1 with higher
concentrations is given by LTL_1 = 17.5 (Fig. 1c). Therefore, all of the obser-
vations greater than LTL_1 are classified as coming from Π_1. Using the remaining
-------
368
Singh, Singh, and Flatman
Fig. 1a. Mixture of N(10, 3) and N(27, 8) - classical Q-Q plot.
Fig. 1b. Mixture of N(10, 3) and N(27, 8) - robust Q-Q plot.
-------
Background Levels of Contaminants
369
Fig. 1c. Mixture of N(10, 3) and N(27, 8) - robust chart - N(27, 8) contaminated sample.
Fig. 1d. Mixture of N(10, 3) and N(27, 8) - robust Q-Q plot - unclassified sample.
-------
370
Singh, Singh, and Flatman
Fig. 1e. Mixture of N(10, 3) and N(27, 8) - robust chart - background N(10, 3) sample.
unclassified observations (smaller than 17.5), the Q-Q plot with the robust limits
placed on it is shown in Fig. 1d. From this figure, it is obvious that there is
only one population left. The 95% one-sided robust upper boundary UTL_0 =
13.84 for Π_0 is given in Fig. 1e. All observations less than 13.84 are classified
as coming from Π_0. Observations in the range (UTL_0, LTL_1) will be assigned
to their nearest neighbor. Thus the observation 16.107 (the only observation in
this range with b_1 = 1) will be assigned to Π_1. Two observations from Π_0,
namely 16.107 and 18.598, are misclassified into Π_1, and one observation, 9.489,
of Π_1 has been misclassified into Π_0. All of the relevant estimates of the pop-
ulation parameters after the final classification are summarized in Table I.
Example 2. In this simulated example, we consider a three-population
mixture model with ten observations from an N(20, 4) population, 100 from an
N(0, 1) population, and 30 from an N(5, 1). Moreover, in order to show the
extent of distortion of the Q-Q plot by the presence of extreme observations,
two extreme observations from an N(100, 10) are also included in this mixed
sample. This is data set no. 2 in the Appendix. The classical and the robust
Q-Q plots using all of the 142 observations are given in Figs. 2a and b, re-
spectively. Both graphs identify the two extreme observations. Moreover, both
graphs give indications of the presence of a sample from a population with
higher concentrations (observations no. 1-10). However, due to the large vari-
-------
Background Levels of Contaminants
371
-------
372
Singh, Singh, and Flatman
Fig. 2c. Mixture of N(0, 1), N(5, 1), N(20, 4) - classical Q-Q plot, two extremes removed.
Fig. 2d. Mixture of N(0, 1), N(5, 1), N(20, 4) - robust Q-Q plot, two extremes removed.
-------
Background Levels of Contaminants
373
Fig. 2e. Mixture of N(0, 1), N(5, 1), N(20, 4) - robust chart - contaminated sample N(20, 4).
Fig. 2f. Mixture of N(0, 1), N(5, 1), N(20, 4) - robust Q-Q plot - remaining unclassified observations.
374
Singh, Singh, and Flatman
Fig. 2g. Mixture of N(0, 1), N(5, 1), N(20, 4) - robust chart - intermediate sample N(5, 1).
Fig. 2h. Mixture of N(0, 1), N(5, 1), N(20, 4) - robust Q-Q plot - unclassified data.
-------
Background Levels of Contaminants
375
Fig. 2i. Mixture of N(0, 1), N(5, 1), N(20, 4) - robust chart - background N(0, 1).
Table I

Popn.   95% Limits       a_i, b_i      n_i    p_i     x̄*      s*      x̄       s
Π_0     UTL_0 = 13.84    b_{1,0} = 5    89    .89     9.91    2.36    9.78    2.59
Π_1     LTL_1 = 17.50    b_{1,1} = 1    11    .11    30.71    8.40   30.71    8.40
ation in the data set, the intermediate population is masked in Fig. 2a, whereas
Fig. 2b gives a clear indication of the presence of at least three populations.
Figures 2c and d represent the same graphs after removal of the n_E = 2
extreme observations. From Fig. 2c, one can wrongly conclude that there are
two populations present, with observation no. 118 = 6.67 as the inflection point.
However, this is not the case here, as is obvious from Fig. 2d. Using observation
no. 10 as the cutoff c_2 = 16.41 between populations Π_1 and Π_2, the classical
as well as the robust (same) lower boundary for population Π_2 is given in Fig.
2e. All observations greater than LTL_2 = 13.85 will be classified into Π_2. A
new Q-Q plot using the remaining unclassified observations is given in Fig. 2f,
which leads to c_1 = 2.34 as the cutoff point between populations Π_1 and Π_0. It
-------
376 Singh, Singh, and Flatman
should be noticed that the robust procedure used here has produced the same
cutoff point of 2.34 between populations Π_0 and Π_1, as can be seen from Figs.
2b, d, and f. Using all unclassified observations > 2.34, the two-sided 95%
robust boundary for population Π_1 is (3.16, 6.26), as given in Fig. 2g. Next all
observations less than 3.16 have been used to draw the robust Q-Q plot given
in Fig. 2h. From this graph it is obvious that there is only one population, Π_0,
left at this stage. Using these observations, the 95% robust upper boundary for
the background population is given by UTL_0 = 1.485. All observations less
than this threshold will be classified into the background population Π_0. Once
the boundaries have been set, observations in the overlapping and the unclaimed
regions have been classified according to the rules described above. All of the
relevant statistics using the final classification are summarized in Table II.
Example 3. In this example, we consider the data set from a Superfund
site with six samples known to come from the background population (obser-
vations 33-38). As mentioned earlier, the site was sampled for six contaminants,
but the results for cadmium concentrations alone are included in this article.
The data for the 45 collected samples (background samples included) are given
in data set no. 3 in the Appendix.
The average site-specific background level of a contaminant plays an im-
portant role in remediation decisions. As such, the estimation of the average
site-specific background of a contaminant is an important problem. We now
show the results obtained by using the proposed procedure on the cadmium con-
centrations. The classical as well as the robust Q-Q plots for cadmium are given
in Figs. 3a and b, respectively. From these figures, it is obvious that observation
nos. 15, 9, 22, and 21 represent extremely contaminated samples and should
be treated individually. From Fig. 3b, there is a clear indication of the presence
of at least three populations. Figure 3c represents the robust Q-Q plot after
removal of these n_E = 4 extremes, which also indicates the presence of at least
three populations. Using c_3 = 260.27 (observation no. 19 after the removal of
extremes) as the cutoff point between populations Π_2 and Π_3, all observations
greater than c_3 will be used to estimate the parameters of Π_3, the population
with high concentrations. Figure 3d indicates that these observations are from
Table II

Popn.     95% Limits                      a_i, b_i   n_i    p_i     x̄*      s*      x̄       s
Π_0       UTL_0 = 1.48                    b = 5       98    .70    -0.03    0.89   -0.07    0.97
Π_1       LTL_1 = 3.16, UTL_1 = 6.26      b = 3       32    .23     4.71    0.81    4.59    1.06
Π_2       LTL_2 = 13.85                   b = 1       10    .07    21.45    4.86   21.45    4.86
Extremes  -                               b = 0        2     -       -       -       -       -
-------
Background Levels of Contaminants
377
Fig. 3a. Cadmium concentrations from a Superfund site - classical Q-Q plot.
Fig. 3b. Cadmium concentrations from a Superfund site - robust Q-Q plot.
-------
378
Singh, Singh, and Flatman
Fig. 3c. Cd conc. from a Superfund site - robust Q-Q plot, extremes removed.
Fig. 3d. Cd conc. from a Superfund site - robust Q-Q plot - high conc.
-------
Background Levels of Contaminants
379
Fig. 3e. Cd conc. from a Superfund site - robust chart - high conc.
Fig. 3f. Cd conc. from a Superfund site - robust Q-Q plot - high conc. removed.
-------
380
Singh, Singh, and Flatman
Fig. 3g. Cd conc. from a Superfund site - robust Q-Q chart - intermediate conc.
Fig. 3h. Cd conc. from a Superfund site - robust Q-Q plot - highest 2 conc. removed.
-------
Background Levels of Contaminants
381
Fig. 3i. Cd conc. from a Superfund site - robust chart - intermediate conc.
Fig. 3j. Cd conc. from a Superfund site - robust chart - background conc.
-------
382
Singh, Singh, and Flatman
a single population. The one-sided lower 95% boundary for this population is
LTL_3 = 205.91, as given in Fig. 3e.
A new robust Q-Q plot using only the unclassified observations is given
in Fig. 3f. There is a clear indication of the presence of three more populations.
The robust boundary (109.166, 131.623) given in Fig. 3g for the intermediate
population Π_2 is obtained using the top 12 observations of Fig. 3f, with c_2 =
111.60 as the cutoff point. Observation nos. 2, 4, and 5 (with b_2 = 3) of Fig.
3g belong to the unclaimed region (UTL_1, LTL_2) and will be assigned to appro-
Fig. 3k. Cd conc. from a Superfund site - robust chart - known background conc.
Table III

Popn.     95% Limits                        a_i, b_i   n_i    p_i      x̄*        s
Π_0       UTL_0 = 12.31                     -            9    .22     11.02     3.85
Π_1       LTL_1 = 21.56, UTL_1 = 41.32      b = 1       10    .24     31.45     7.19
Π_2       LTL_2 = 109.2, UTL_2 = 131.6      b = 2       11    .27    120.39    15.11
Π_3       LTL_3 = 205.9                     b = 1       11    .27     401.4   150.82
Extremes  -                                 -            4     -        -         -
-------
Background Levels of Contaminants 383
priate populations using the nearest neighbor technique (see Table III). Next a
new robust Q-Q plot using only the remaining unclassified observations is given
in Fig. 3h, giving a clear indication of the presence of two populations with the
cutoff point c_1 = 22.05 (observation no. 12 in Fig. 3h). The 95% robust bound-
ary (21.576, 41.319) for population Π_1, using the top ten observations of
Fig. 3h, is given in Fig. 3i, with one observation belonging to the unclaimed
region (UTL_1, LTL_2), with b = 1. Finally, using the last nine observations,
the 95% upper threshold value for the background level contamination is UTL_0
= 12.308, as can be seen in Fig. 3j. However, in this case, six samples from
the background were also available. The robust 95% upper boundary using these
six background samples is given in Fig. 3k. The values in Figs. 3j and k are in
close agreement, establishing the correctness and validity of the procedure de-
scribed in this article. All relevant statistics after the final classification have
been summarized in Table III.
Example 4. In this example, we consider a simulated data set (given in
the Appendix) which consists of a mixture sample from two lognormal popu-
lations with some overlap. A sample of size 20 is obtained from a log N(0, 1)
population and a sample of ten is generated from a log N(4, 2). We use this
example to show the effectiveness of the proposed robust procedure in decom-
posing the mixture into component populations. The classical Q-Q plots of the
untransformed and the log-transformed data are given in Figs. 4a and b, re-
spectively. From Fig. 4a, it can be concluded that the sample is from a single
positively skewed population with observation no. 22 as an extreme observation.
This may lead the user to take the log-transformation. From Fig. 4b, one can
conclude that the mixture sample comes from a lognormal distribution with
observation no. 22 being slightly discordant. The corresponding robust Q-Q plots
before and after the log-transformation are given by Figs. 4c and d, respectively.
Figure 4c suggests that more than one population is present. Figure 4d
clearly separates the two underlying lognormal populations with cutoff point c_1
= 1.61. All of the relevant statistics are summarized in Table IV.
CONCLUSIONS AND RECOMMENDATIONS
The proposed robust procedure works quite effectively in classifying a
mixture sample into its component populations. In all of the examples discussed
here, the procedure classified the observations correctly into their
respective populations. When the data represent a mixture from lognormal pop-
ulations, the procedure based upon the classical MLE estimates may identify
some of these observations as anomalous. However, the robust procedure de-
scribed here gives an indication that there is more than one population present
(e.g., see Fig. 4c). This, in turn, forces the user to verify the distributional
assumptions. It is assumed that the user has some familiarity with symmetric
-------
Fig. 4a. Mixture of log N(0, 1) and log N(4, 2) - classical Q-Q plot (untransformed).
Fig. 4b. Mixture of log N(0, 1) and log N(4, 2) - classical Q-Q plot (transformed).
-------
Fig. 4c. Mixture of log N(0, 1) and log N(4, 2) - robust Q-Q plot (untransformed).
Fig. 4d. Mixture of log N(0, 1) and log N(4, 2) - robust Q-Q plot (transformed).
-------
Singh, Singh, and Flatman
Table IV

Popn.   95% Limits       n_i    p_i     x̄*
Π_0     UTL_0 = 1.143    18     .6     0.242
Π_1     LTL_1 = 1.419    12     .4     1.503
and skewed distributions. It is the user's responsibility to achieve near-normality
(or at least symmetry) for each of the component populations before using the
procedure described here. The robust procedure described here works quite
effectively in decomposing a mixture sample into its component lognormal pop-
ulations as well (see Fig. 4d). The stepwise procedure described here combines
the natural separation between the component populations. The sample from the
Sacramento Army Depot Superfund Site included a known site-specific back-
ground sample. This, however, is not the case for many Superfund sites. The
proposed statistical procedure will be a very useful tool for the estimation of site-
specific background for such Superfund sites.
ACKNOWLEDGMENTS
The U.S. Environmental Protection Agency (EPA), through its Office of
Research and Development (ORD), partially funded and collaborated in the
research described here. It has been subjected to the Agency's peer review and
has been approved as an EPA publication. The U.S. Government has a non-
exclusive, royalty-free license in and to any copyright covering this article. The
authors wish to thank Ken Brown of U.S. EPA/EMSL-Las Vegas for providing
the Superfund site data and for helpful suggestions during the preparation of
this paper.
APPENDIX
Dataset 1:
Normal mixture generated from populations N(10, 3) and N(27, 8); 90
observations are from N(10, 3) and 10 are from N(27, 8): 2.49, 11.15, 10.47,
10 62. 12 65. 13 52. 11 02. 13 40. 9.50. 6 93 II 54. 6 83. 10 68. 10 38
8 16. 1057.602.649. 1098.625. 11 45. 12 31. 795. 13 89.987. 10 10.
1050. 11 95. 10 16. 11 09. 7 35. II 01. 1026. 12 06. 16 11. 1203. 12 62
1029. 14 63. 11 65, 13 13. 7 93. X 18. 11 11.7 95 8 15. 14 20. 7 99. |l 31.
9 63. 8 82. 8 42. -7 32. 18 59. 7 97. 6 43. 13 19 * 59 7 40. 12 71 8 .W
13 34. 8 34. 5 71. 8 14. 8 29. I 1 99. 11 23. 5 2h 9 04 7 12. 14 85 I I OS
-------
Background Levels of Contaminants 3g7
10 11, 11.01,9.57, 11 01, 12.25,7.93,4.48,9.13,6.58, 13.89,6.70, 1204.
7.69, 10.84, 9.13, 6.84, 10.33, 33.38, 23.49, 30.01, 37.23, 37.66, 31 27,
34.94, 9.48, 31.08,43.99.
Dataset 2:
Normal mixture with ten observations from N(20, 4), 100 from N(0, 1),
30 from N(5, 1), and two extreme observations from N(100, 10): 18.12,
16.60. 27.60, 23.27, 29.80. 18.24, 24.40, 23.04, 16.98, 16.41. 1 77. 2 38,
-022, -0.35, -0.40, 1.00, -0.01, -0.16, 1.44, -1.03, -1.84, 0.94.
-031, -103, 1.19, -0.14, -1.42, -0.89, -0.23,0.18, -096, -0.17.
0 06, 1.62, -0.03, -0.25, 0.30, 2.48, -0.02, 1.23. 0.10, 1.13, -0.69, 0 72,
-0.86.0 11, 1 16.075,027, -1.40,0.29, -0.52,2.47, 1.01, 1 89. -058.
020. -0.66, -105, -0.10. 1.44,0.72,0.33, 1.06.048, -069. -048.
-I 13. -067, 0.12, -0 15, -0.10, -2.54, 0.25, -2.04, 055. -1.32.
-009. 051, 0.06, 1 54. 081. -1.65, -0.39, -0.01,0.41. -051. -060.
1 24. -1.48. 0.51, 0 13. 0.93, -2.17, 0.63, -0.39. -1.37, 1 17. -1 29.
-0 10, 0.30. 084. -Oil. 1.66, -0.66, -0.50, -087. -1 59, -0.69.
-201,4 16,397,4 18,3 71,4.55,3.45,5.62,6.67,425,4.76,524,578,
5 23. 6.20, 1 18, 5 62, 4.51, 5.35, 4.34, 4.77, 6.07, 4 24. 4 26. 3 77. 5 16.
4 07. 5 46. 3.80. 5 50. 4.84. 123.76, 117.61.
Dataset 3:
Cadmium concentrations from the Sacramento Army Depot Superfund Site:
2620.2755,44501,3077.486.31,513.79, 11281, 159.30. 1300.668.
33 72. 35 01, 10.99, 22 05. 83094. 125.07, 40.84. 345 52, 384 80. 183 04.
2300, 1500, 260.27, 32.09. 166.16, 31.68, 12.39, 61453. 639.52, 11624.
11943. 111.60, 1029, 1.68, 3.34, 10.47, 11.74, 1032, 12230, 28303.
265.08. 12549, 131.06,47.90, 119.34.
Dataset 4:
Mixture of 20 observations from lognormal N(0, 1) and 10 from lognormal
N(4, 2): 0.5300, 2.7538, 3.2237, 0.2871, 1.2915, 1.5795, 2.0817, 1.0633,
07486.08284. 13252. 16477, 12311,26518.07258.52913. 19187.
103898. 05373. 14311. 332.1949, 988.3606, 193491. 93424. 92353
88 1362. 56 3981. 115 9378. 27.8464, 34.4647
REFERENCES
Campbell, N. A., 1984, Mixture Models and Atypical Values: Math. Geol., v. 16, p. 465-477.
Fleischhauer, H., and Korte, N., 1990, Formation of Cleanup Standards for Trace Elements with
Probability Plots: Environmental Management, v. 14, n. 1, Springer-Verlag, New York,
p. 95-105.
Fowlkes, E. B., 1979, Some Methods for Studying the Mixture of Two Normal (Lognormal) Dis-
tributions: J. Am. Stat. Assoc., v. 74, n. 367, p. 561-575.
Holgersson, M., and Jorner, U., 1978, Decomposition of a Mixture into Normal Components: A
Review: J. Bio-Med. Comput., v. 9, p. 367-392.
Sinclair, A. J., 1976, Applications of Probability Graphs in Mineral Exploration: Assoc. of Explo-
ration Geochemists, Rexdale, Ontario, p. 95.
Singh, A., 1994, Omnibus Robust Procedures for Assessment of Multivariate Normality and De-
tection of Multivariate Outliers, in G. P. Patil and C. R. Rao, eds., Multivariate Environmental
Statistics: North-Holland, Elsevier Science Publishers, p. 445-488.
Singh, A., and Nocerino, J., 1994, Robust QA/QC for Environmental Applications, in The Pro-
ceedings of the Ninth International Conference on Systems Engineering: University of Nevada,
Las Vegas, p. 370-374.
-------
Representativeness in Statistics
and Quality Assurance
John Warren
Quality Assurance Division
Office of Research & Development
-------
Representativeness Influences:
Data aggregation:
o Merging data sets having similar Quality
Assurance protocols collected using probabilistic
sampling frames
o Merging data sets having a probability basis with
similar data with a non-probabilistic basis
-------
Representativeness Influences:
Hypothesis testing:
o Comparing data sets with different extraction
methods and different sample matrices
o Comparing data sets having both within and
between differences in the setting of the minimum
detection levels and data editing
-------
Factors Influencing Representativeness
Sample Selection Technique:
o Probabilistic:
- Systematic with SRS
- Composite with SRS
- Adaptive with any other
o Non-probabilistic:
- Judgmental
- "Found data"
-------
Factors Influencing Representativeness
Sample Analysis Methodology:
o Intra/Inter laboratory differences
o Method equivalence problems
o Heterogeneous sample matrices
o Variation in Quality Control
- Calibration frequencies
- Detection levels
- Laboratory protocols
- Extraction efficiencies
-------
Statisticians Are Little Help
A Dictionary of Statistical Terms
F.H.C. Marriott, 1990 International Statistical Institute
Representative Sample:
In the widest sense, a sample which is
representative of the population. Some confusion arises according to
whether 'representativeness' is regarded as meaning 'selected by
some process which gives all samples an equal chance of appearing
to represent the population'; or, alternatively, whether it means
'typical in respect of certain characteristics, however chosen'. On
the whole, it seems best to confine the word 'representative' to
samples that turn out to be so, however chosen, rather than apply it
to those with the objective of being representative.
-------
Kruskal and Mosteller : 1979
Three papers in International Statistical Review
"Representative Sampling" commonly applied to:
1. as a "seal of approval"
2. to denote "absence of selective forces"
3. as a "miniature of the population"
4. as being a "typical or ideal case"
5. to denote "coverage of a population"
6. as a "vague term to be made more precise"
7. as a "specific sampling method"
8. as "permitting good estimation"
9. as "good enough for a particular purpose"
-------
"Seal of approval"
No explanation provided of what process was used
to go from target population to sampled population
Use of "representative" is to convince the reader to
have faith in the reported results and therefore the
truthfulness of the conclusions
-------
"Absence of selective forces"
Used to imply that the sampling method used
deliberately excluded selective forces that might
over-represent some sub-population
Highly vulnerable to personal bias in elimination
methodology:
-------
"Miniature of the population"
Implies that every nuance of the population is
reflected in the sample i.e. identical frequency
distributions for sample and population.
In practice, it is obvious this cannot be achieved
-------
"Typical or ideal case"
Inevitably only a single specimen from the
population has been selected
Tremendous possibility of bias but the implication is
that an "ideal specimen" has been selected without
true definition of whether this implies "average",
"worst case", or "best case"
-------
"Coverage of the population"
The implication is that the sample selected has a
wide range across the population. At least one
example from each class or potential partition
(stratum) has been collected but the appropriate
weighting factors not made available.
-------
"Vague term to be made more precise"
The word "representativeness" is used as a promise
of things to come from a more detailed (not
specified) technical consideration of the problem.
The use of the term is intended to give permission to
discuss a problem without getting sidetracked by
technical details
-------
"Specific sampling method"
This is the use of "representative sampling" when
really the true kind of sampling has been deemed by
the author to be too complex for the audience's
comprehension. The intent of the author:
understanding by the majority, over-riding the true
comprehension of the minority (often statisticians)
-------
"Permitting good estimation"
The connotation that because some sample can be
labeled "representative" it will therefore allow for
satisfactory estimation without the necessity of
defining what this actually implies.
-------
"Good enough for a particular purpose"
This is the use of a sample to illustrate a particular
theory or hypothesis. It is a variation on the concept
of using a sample size 1 in that a counter-example
(non-random sample) can be enough to prove a case.
-------
Representativeness as an Indicator
Data Quality Indicators: PARCC
Precision
Accuracy (really Bias)
Representativeness
Comparability
Completeness
-------
PARCC: Representativeness
o Qualitative measure
o Open to individual interpretation
o Depends on media homogeneity
o Difficult to ensure
o Often demands many samples
o Needs expert opinion
-------
PARCC: Comparability
o Qualitative measure
o Expresses a degree of confidence
o Requires same variables of interest
o Needs units convertible to a standard
o Requires similar analytical procedures
o Needs compatible rules for data editing
o Requires similar sampling frames
o Needs meaningful temporal limits
o Requires expert opinion
-------
PARCC: Completeness
o Quantitative
o Influence depends on sample design
o If unbiased - loss of power
o If biased - loss of validity
o Needs expert opinion
-------
PARCC are Interrelated
Representativeness
Completeness <---> Comparability
-------
Regulatory use of Representativeness
Essentially never defined
Water (40 CFR 403): "...samples should be representative of
daily conditions"
Air (40 CFR 51): "...selected on the basis of spatial and
climatological (temporal) representativeness"
TSCA (40 CFR 763): "...at locations representative of the air entering
the abatement site"
RCRA (40 CFR 260): "...a sample of a universe or whole which can be
expected to exhibit the average properties of the
universe or whole"
-------
Potentially Promising Areas
o Composite statistics & area of support
o Combining environmental information
o Applying Gy's theory of sampling
-------
Composite Statistics & Area of Support
o Interpretation of "support"
e.g. Linkage of long-term exposure risk
(10^4 sq meters) with remediation technology
(10^3 sq meters) with sampled area
(10^2 sq meters) with physical sample
(10^1 sq meters) with sample analysis
(10^-1 sq meters) with ...
e.g. geophysical/geostatistical (kriging)
Englund & Flatman: Spatial Statistics Sampling
-------
Composite Statistics & Area of Support
o Literature and information on composite sampling
+ Statistical Methods for Environmental
Pollution Monitoring (R.O. Gilbert)
+ Handbook of Statistics vol 12, Chapter 4
(G. Lovison, S.D. Gore, & G.P. Patil)
+ Environmental and Ecological Statistics
(Special Edition, G.P. Patil, editor)
+ Guidance on Sampling (QA/G-5S)
(Under development by QAD)
-------
Combining Environmental Information
o Literature and information on data combining:
+ Encountered Data, ... and Weighted Distributions
(G.P. Patil) 1991, Environmetrics 2, 377-423
+ Using Found Data to Augment a Probability Sample
(J.M. Overton, T.C. Young & W.S. Overton)
1993, Envir. Mon. & Assess. 26, 65-83
+ Combining Environmental Information I & II
(L.H. Cox & W.W. Piegorsch)
1996, Environmetrics 7, 299-324
+ Guidance on Sampling (QA/G-5S)
(Under development by QAD)
-------
Encountered Data, Statistical Ecology, Environmental
Statistics, and Weighted Distribution Methods
o Weighted distributions used to account for observer bias due
to being unable to actually observe an event or sample value
o If an observation X has a probability θx of going unobserved,
then the observed pdf is the true pdf weighted by 1 - θx
o Regard the problem as one of modelling when samples are
drawn without a proper frame
o The paper contains some theoretical properties of weight
functions together with some applications
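As a toy illustration of the weighted-distribution idea under the reading above (θx as the chance that a value of size x goes unrecorded), the Python sketch below reweights a true pdf by 1 - θ(x) and renormalizes. The pdf, the weight function, and the grid are all assumptions made up for the example.

    import numpy as np

    # "True" pdf: exponential with mean 1 (a hypothetical measurement distribution).
    x = np.linspace(0.0, 8.0, 801)
    dx = x[1] - x[0]
    true_pdf = np.exp(-x)

    # Hypothetical observer bias: small values are more likely to be missed,
    # theta(x) = P(a value of size x goes unobserved) = exp(-2x).
    theta = np.exp(-2.0 * x)

    # Observed (weighted) pdf: proportional to (1 - theta(x)) * f(x), renormalized.
    weighted = (1.0 - theta) * true_pdf
    observed_pdf = weighted / (weighted.sum() * dx)

    true_mean = (x * true_pdf).sum() / true_pdf.sum()
    obs_mean = (x * observed_pdf).sum() * dx
    print(f"mean of true distribution:     {true_mean:.3f}")
    print(f"mean of observed distribution: {obs_mean:.3f}  (shifted upward)")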
-------
Using Found Data to Augment a Probability Sample
o If the variable of interest is in both found and probability
based samples, then use a pseudo-random sample approach
and combine the data in the manner of a stratified sample
o If not, use a stratified calibration approach - form a predictor
equation for found data by regressing the variable of interest on
the known frame attributes. Then, for the probability based
sample, use the prediction equation and the frame attributes
to predict new variables of interest (a sketch follows below)
o Extensive example on streams from the National Surface
Water Survey
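A minimal sketch of the stratified calibration step described on this slide, assuming hypothetical frame attributes and a simple linear predictor: regress the variable of interest on frame attributes in the found data, then apply the fitted equation to the frame attributes of the probability-based sample.

    import numpy as np

    # Hypothetical found data: frame attributes (e.g., watershed area, elevation)
    # and a measured variable of interest (e.g., a water-quality index).
    found_attrs = np.array([[12.0, 300.0],
                            [45.0, 120.0],
                            [ 8.0, 550.0],
                            [60.0,  90.0],
                            [25.0, 400.0]])
    found_y = np.array([4.1, 6.8, 3.2, 7.5, 5.0])

    # Fit y ~ b0 + b1*attr1 + b2*attr2 by least squares on the found data.
    X = np.column_stack([np.ones(len(found_attrs)), found_attrs])
    coef, *_ = np.linalg.lstsq(X, found_y, rcond=None)

    # Probability-based sample: frame attributes known, variable not measured.
    prob_attrs = np.array([[30.0, 200.0],
                           [10.0, 480.0]])
    Xp = np.column_stack([np.ones(len(prob_attrs)), prob_attrs])
    print("predicted variable of interest:", np.round(Xp @ coef, 2))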
-------
Figure 2: Linked Micromaps and Statistics (statistics by state: 1995 average unemployment ratio; employment to population ratio)
-------
Combining Environmental Information I & II
o Two consecutive papers, the first being an overview with
potential areas for research, the second considering various
applications to epidemiology and toxicology
o The overview includes kriging, non-detect problems, and
application to truncated spatial data
o Overview also includes the mathematical aspects of
combining p-values (the works of R. A. Fisher and
T. Mathew, B. Sinha, & L. Zhou)
o Examples include passive smoking and dose-response
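For reference, Fisher's method for combining independent p-values, one of the approaches alluded to above, is simple enough to sketch in a few lines of Python; the p-values fed in below are made-up inputs.

    import math
    from scipy import stats

    def fisher_combined_p(p_values):
        """Fisher's method: -2 * sum(ln p_i) is chi-square with 2k degrees of
        freedom under the joint null, for k independent p-values."""
        statistic = -2.0 * sum(math.log(p) for p in p_values)
        return stats.chi2.sf(statistic, df=2 * len(p_values))

    # Hypothetical p-values from independent studies.
    print(fisher_combined_p([0.08, 0.12, 0.30, 0.04]))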
------- |