United States Environmental Protection Agency
Policy, Planning and Evaluation (2163)
EPA 230-R-95-005
August 1995

EPA Observational Economy Series
Volume 1: Composite Sampling

Contents

Foreword
Acknowledgments
1. Introduction
2. What is Composite Sampling?
   2.1. Method
   2.2. Limitations of Composite Sampling
3. Applications
   3.1. Soil Sampling
        3.1.1. PCB Contamination
        3.1.2. PAH Contamination
   3.2. Ground Water Monitoring
   3.3. Indoor Air Monitoring
   3.4. Biomonitoring
        3.4.1. Bioaccumulation in Human Adipose Tissue
        3.4.2. Assessing Contamination in Fish
        3.4.3. Assessing Contaminants in Mollusks
        3.4.4. Measuring Average Fat Content in Bulk Milk
4. Summary
References

Foreword

The high costs of laboratory analytical procedures frequently strain environmental and public health budgets. Whether soil, water or biological tissue is being analyzed, the cost of testing for chemical and pathogenic contaminants can be quite prohibitive. Composite sampling can substantially reduce analytical costs because the number of required analyses is reduced by compositing several samples into one and analyzing the composited sample. By appropriate selection of the composite sample size and retesting of select individual samples, composite sampling may reveal the same information as would otherwise require many more analyses. Many of the limitations of composite sampling have been overcome by recent research, opening up more widespread potential for using composite sampling to reduce the costs of environmental and public health assessments while maintaining, and often increasing, the precision of sample-based inference.

Acknowledgments

The EPA Observational Economy Series is a result of the research conducted under a cooperative agreement between the U.S. Environmental Protection Agency and the Pennsylvania State University Center for Statistical Ecology and Environmental Statistics, Professor G. P. Patil, Director.

The EPA grant CR-821531010, entitled "Research and Outreach on Observational Economy, Environmental Sampling and Statistical Decision Making in Statistical Ecology and Environmental Statistics," consists of ten separate projects in progress at the Penn State Center: 1) Composite Sampling and Designs; 2) Ranked Set Sampling and Designs; 3) Environmental Site Characterization and Evaluation; 4) Encounter Sampling; 5) Spatio-temporal Data Analysis; 6) Biodiversity Analysis and Monitoring; 7) Adaptive Sampling Designs; 8) Statistics in Environmental Policy and Regulation for Compliance and Enforcement; 9) Statistical Ecology and Ecological Risk Assessment; and 10) Environmental Statistics Knowledge Transfer, Outreach and Training.

The series is published by the Statistical Analysis and Computing Branch of the Environmental Statistics and Information Division in the EPA Office of Policy, Planning and Evaluation. This volume in the series is largely based on the work of M. T. Boswell, S. D. Gore, G. D. Johnson, G. P. Patil, and C. Taillie at the Penn State Center in cooperation with John Fritzvold, Herbert Lacayo, Robert O'Brien, Brenda Odom, Barry Nussbaum, and John Warren as project officers at U.S. EPA.

Questions or comments on this publication should be directed to Dr. N. Phillip Ross, Director, Environmental Statistics and Information Division (Mail Code 2163), United States Environmental Protection Agency, 401 M Street SW, Washington, DC 20460; Ph. (202) 260-2680.
1. Introduction

While decision making in general involves opinion based on prior experience, scientifically based decision making requires careful collection, measurement and interpretation of data from physical observations. Examples of such decisions are: "Has a hazardous waste site been sufficiently cleaned?" or "Are pollutants accumulating in certain foods as well as in human or wildlife tissues?"

Scientifically based decision making should minimize the risk of being wrong. Since decisions require information, which is in turn extracted from data, this risk decreases as the data become more representative of the population being studied. In order for a data set to properly represent a population, it must cover the ranges of space and time within which the population lies, as well as have sufficient resolution within these ranges. It soon becomes obvious that collection and review of representative data can be prohibitively expensive if a large sample size (number of measurements, recordings or counts) is required, especially when analytical costs are very high, as with monitoring environmental and biological media for chemical or pathogenic contaminants.

Conventional statistical techniques allow for the reduction of either cost or uncertainty; however, the reduction of one of these factors comes at the expense of an increase in the other. Composite sampling offers to hold either cost or uncertainty at a specified level while decreasing the other.

Compositing simply refers to physically mixing individual samples to form a composite sample, as visualized in Figure 1. Just one analysis is performed on the composite, which is used to represent each of the original individual samples.

Figure 1: Forming composite samples from individual field samples.

Compositing is common practice for simply increasing the representativeness of a measurement, such as when measuring the fat content of a particular entree that is composited across several restaurants included in a national survey (Burros, 1994). For this reason, compositing can always reduce costs for estimating a total or an average value. However, analysis of composite samples can be cleverly extended to classify the original individual sample units that comprised a composite. For example, one may need to identify the presence or absence of a pathogen like HIV in blood samples, or one may need to identify all soil cores whose contaminant concentration exceeds an action level at a hazardous waste site.

When analytical costs dominate over sampling costs, the savings potential is obviously high; however, the immediate question is "How do we compensate for information that is lost due to compositing?" More specifically, if we are testing whether or not a substance is present or exists at a concentration above some threshold, we do not want to dilute individual "contaminated" samples with clean samples so that the analysis does not detect any contamination. Furthermore, if our measurements are of a variable such as a chemical concentration, we may need to know the actual values of those individual samples with the highest concentrations. For example, "hot spots" need to be identified at hazardous waste sites.
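To make the dilution concern concrete, the following short simulation (Python; an illustration added here, not part of the original report, with entirely hypothetical concentrations and action level) forms one composite of size k = 4, as in Figure 1, and shows that a well-mixed composite measures approximately the mean of its constituents, so a single "hot" sample is diluted by roughly a factor of k.

    # Hypothetical illustration: a single contaminated sample is diluted by
    # roughly the composite size k when samples are physically mixed.
    import random

    random.seed(1)
    k = 4                                              # individual samples per composite
    clean = [random.uniform(0.0, 1.0) for _ in range(k - 1)]   # "clean" values (ppm)
    hot = 20.0                                         # one contaminated sample (ppm)
    action_level = 10.0                                # hypothetical action level (ppm)

    composite = sum(clean + [hot]) / k                 # perfect mixing: composite = mean
    print("individual values:", [round(v, 2) for v in clean + [hot]])
    print("composite value  :", round(composite, 2))
    print("hot sample exceeds action level:", hot > action_level)        # True
    print("composite exceeds action level :", composite > action_level)  # False here

With these made-up numbers the contaminated sample exceeds the action level while the composite does not; Section 2.2 returns to this problem and compares the composite result to the action level divided by k rather than to the action level itself.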
Through judicious choice of a strategy for retesting some of the original individual samples based on composite sample measurements, many limitations of composite sampling can be overcome. Furthermore, other innovative applications of composite sampling are emerging, such as combining it with ranked set sampling, another approach for achieving observational economy that is discussed in Volume 2 of this series.

2. What is Composite Sampling?

2.1. Method

First, let us clarify that a "sample" in this document refers to a physical object to be measured, whether an individual or a composite, and not a collection of observations in the statistical sense. Individual sample units are what are obtained in the field, such as soil cores or fish fillets, or from subjects, such as blood samples. Meanwhile, a composite sample may be a physical mix of individual sample units or a batch of unblended individual sample units that are tested as a group. Most compositing for environmental assessment and monitoring consists of physically mixing individual units to make a composite sample that is as homogeneous as possible.

With classical sampling, no distinction is made between the process of sampling (i.e., selection or inclusion) and that of observation or measurement. We assume, with classical sampling, that any unit selected for inclusion in a statistical sample is measured and hence its value becomes known. In composite sampling, however, there is a clear distinction between the sampling and measurement stages. Compositing takes place between these two stages, and therefore achieves two otherwise conflicting goals: a large number of samples can be selected to satisfy sample size requirements, while the number of analytical measurements is kept affordable.

If a variable of concern is a measurement that is continuous in nature, such as a chemical concentration, the mean (arithmetic average) of composite samples provides an unbiased estimate of the true but unknown "population" mean. Also, if measurement error is known, the population variance on the scale of the individual samples can be estimated by a simple weighting of the measured composite sample variance.

With selective retesting of individual sample units, based on initial composite sample results, we can classify all of the individual sample units according to the presence or absence of a trait, or exceedance (vs. compliance) of a numerical standard. We can subsequently estimate the prevalence of a trait or the proportion of noncompliance. Basically, if a composite measurement does not reveal the trait in question or is in compliance, then all individual samples comprising that composite are classified as "negative". When a composite tests positive, retesting is performed on the individual samples or subsamples (aliquots) in order to locate the source of "contamination".

Retesting, as visualized in a general sense in Figure 2, may simply be exhaustive retesting of all individuals comprising a composite or may entail more specialized protocols. Generally, as the retesting protocol becomes more sophisticated, the expected number of analyses decreases.

Figure 2: Composite sampling with retesting. (Aliquots of individual samples form a composite; if the composite tests negative, all of its individual samples are classified as negative; otherwise selected individuals are retested.)
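As a minimal sketch of the classification logic in Figure 2 (Python; added for illustration, with hypothetical values, a hypothetical 10 ppm criterion, and an error-free measure() function standing in for the laboratory analysis):

    # Classify every individual sample using one test per composite plus
    # exhaustive retesting of any composite that is not clearly negative.
    def classify(individuals, k, threshold, measure):
        """Return (flags, number_of_tests); flags[i] is True if sample i exceeds threshold."""
        flags, tests = [], 0
        for start in range(0, len(individuals), k):
            group = individuals[start:start + k]
            composite = measure(sum(group) / len(group))   # composite ~ mean of its parts
            tests += 1
            # Compare to threshold/k to guard against dilution (non-negative values assumed).
            if composite <= threshold / len(group):
                flags.extend([False] * len(group))         # whole group classified negative
            else:
                for value in group:                        # retest every constituent
                    flags.append(measure(value) > threshold)
                    tests += 1
        return flags, tests

    # Hypothetical soil values (ppm), composites of size 4, 10 ppm criterion,
    # and an identity "measurement" with no analytical error.
    values = [0.2, 1.1, 0.4, 0.9, 14.0, 0.3, 0.7, 0.5, 0.1, 0.6, 0.8, 0.2]
    flags, n_tests = classify(values, k=4, threshold=10.0, measure=lambda v: v)
    print(flags)     # only the 14.0 ppm sample is flagged
    print(n_tests)   # 3 composite tests + 4 retests = 7, instead of 12 individual tests

More specialized protocols retest a positive composite in stages, for example by splitting it into subgroups, and so require even fewer analyses on average.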
Therefore, one must consider any increased logistical costs along with the expected decrease in analytical cost when evaluating the overall cost of a compositing/retesting protocol.

Owing to recent research (Patil, Gore and Sinha, 1994), the individual sample with the highest value, along with those individual samples comprising an upper percentile, can be identified with minimal retesting. This ability is extremely important when "hot spots" need to be identified, as with soil monitoring at a hazardous waste site.

Whether we are dealing with data from binary (presence/absence) measurements or data from measurements on a continuum, composite sampling can result in classifying each individual sample without having to separately analyze each one. While composite sampling may not be feasible when the prevalence of contamination is high, the analytical costs can be drastically reduced as the number of contaminated samples decreases.

2.2. Limitations of Composite Sampling

Both physical and logistical constraints exist that may restrict the application of composite sampling. The limitations which more commonly arise are discussed here, along with some simple recommendations for how compositing still may help.

Physical:

If the integrity of the individual sample values changes because of compositing, then composite sampling may not be the desired approach. For example, volatile chemicals can evaporate upon mixing of samples (Cline and Severin, 1989), or interaction can occur among sample constituents. In the first case, compositing of individual sample extracts may be a reasonable alternative to mixing individual samples as they are collected.

Another limitation is imposed by potential dilution, where an individual sample with a high value is combined with low values, resulting in a composite sample that falsely tests negative. When classifying samples according to exceedance or compliance with some standard value, c, the problem of dilution is overcome by comparing the composite sample result to c divided by the composite sample size, k (that is, to c/k). Furthermore, when an analytical detection limit, d, is known, the maximum composite sample size is established according to the inequality k < c/d. One may lower this upper bound on the composite sample size to reduce the effects of measurement error. As can be seen here, when reporting limits (Rajagopal, 1990) or action levels (Williams, 1990) of some hazardous chemical concentrations are legally required to be near the detection limit, the possibility of composite sampling may be eliminated.
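The two safeguards just described are easy to apply in practice; the following lines (Python, with hypothetical values of c and d) simply restate the arithmetic:

    # Dilution safeguards with hypothetical numbers: judge a composite of
    # size k against c/k rather than c, and keep k below c/d so that a
    # single sample at the action level cannot be diluted below detection.
    c = 10.0   # action level for an individual sample (ppm, hypothetical)
    d = 0.5    # analytical detection limit (ppm, hypothetical)

    k_max = int(c / d)           # upper bound on composite size from the c/d rule (20 here)
    k = 8                        # a smaller size might be chosen to allow for measurement error
    composite_criterion = c / k  # retest constituents when a composite exceeds this value

    print("maximum composite size:", k_max)
    print("composite-scale criterion (ppm):", composite_criterion)

As the text notes, when the action level itself must sit near the detection limit, this bound collapses toward k = 1 and compositing offers no advantage.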
Sample homogeneity is another consideration. A homogeneous sample is one where the variable of interest, such as a chemical concentration, is evenly distributed throughout the sample. In contrast, a heterogeneous sample can yield substantially different values for the variable of interest, depending on what part of the sample is actually analyzed. If the whole sample unit is analyzed, then heterogeneity is not a problem; however, most laboratory analyses are performed on a small subsample of the original sample unit. For example, one gram of soil may be taken from a one kilogram soil core for actual extraction and analysis. If a subsample is to represent a larger sample unit, then the larger unit must be fairly homogeneous with respect to the variable of interest. Therefore, an individual sample unit should be homogenized as much as possible prior to obtaining an aliquot for inclusion in a composite. Furthermore, formation of a composite must include homogenization if the composite is going to be represented by measurement on a smaller subsample.

Often, measurements on multiple attributes are desired. However, if retesting is performed in order to classify individual samples, it is unclear how to optimize the retesting relative to the different attributes (Schaeffer et al., 1982). For example, should chemicals be tested independently, or does there exist dependence in the multivariate information that can be used to improve cost efficiency? Classifying for multiple attributes remains an open problem in composite sampling.

Logistical:

When retesting of certain individual samples may be required based on composite sample results, subsamples (aliquots) of the original individual samples must be preserved and stored until all testing is done. This may lead to extra expense that must be considered in the overall cost comparison between compositing and other strategies. For most environmental and public health studies, the analytical savings from compositing will far outweigh the extra cost of sample preservation and storage.

Another consideration is that events outside the control of the scientists may dictate the feasibility of composite sampling. For example, people whose wells are being tested may demand that their wells be treated as equitably as the wells of their neighbors. Measuring some well samples individually and some well samples solely as part of a composite may give an appearance of inequitability and result in a political decree to measure each well individually (Rajagopal, 1990).

Circumstances that may presently disqualify composite sampling from being applied may change with advances in technology. Long turn-around times for laboratory results and large labor costs may currently eliminate optimal retesting designs from consideration. However, retesting designs in the future may be automated and guided by an expert system (Rajagopal, 1990). Also, advances in statistical methodology may further extend the utility of composite sampling.

For other reviews of composite sampling, see Rohde (1976, 1979), Elder (1977), Elder, Thompson, and Myers (1980), Boswell and Patil (1987) and Garner, Stapanian, and Williams (1988). For an overview, see Patil, Gore, and Taillie (1994).

3. Applications

Composite sampling has its roots in what is known as group testing. An early application of group testing was to estimate the prevalence of plant virus transmission by insects (Watson, 1936). In this application, insect vectors were allowed to feed upon host plants, allowing the disease transmission rate to be estimated from the number of plants that subsequently became diseased.

Apparently, the next important application of group testing occurred during World War II, when U.S. servicemen were tested for syphilis by detecting the presence or absence of a specific antigen of the syphilis-causing bacterium in samples of their blood (Dorfman, 1943). Initial analyses were done on composite samples formed from aliquots of blood drawn from the subjects. A composite sample testing negative indicated that all individuals contributing to the composite were negative, while a composite testing positive prompted exhaustive retesting of the original aliquots comprising that composite. If blood aliquots of k individuals are composited, the number of required tests to classify these k individuals will be either 1 or k + 1. For a given prevalence of the trait, the expected number of tests can be calculated for a composite of size k.
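If individuals are positive independently with probability p, a composite of k aliquots tests negative with probability (1 - p)^k, so the expected number of tests per individual under this protocol is 1/k + 1 - (1 - p)^k. The short calculation below (Python; an illustration, not part of the original report) tabulates this quantity and the composite size that minimizes it:

    # Expected tests per individual under Dorfman-style compositing with
    # exhaustive retesting of positive composites, assuming independent
    # positives with prevalence p.
    def tests_per_individual(p, k):
        return 1.0 / k + 1.0 - (1.0 - p) ** k

    for p in (0.001, 0.01, 0.05, 0.10):
        k_best = min(range(2, 51), key=lambda k: tests_per_individual(p, k))
        cost = tests_per_individual(p, k_best)
        print(f"prevalence {p:5.3f}: best k = {k_best:2d}, "
              f"{cost:.2f} tests per individual "
              f"({(1.0 - cost) * 100:.0f}% fewer than testing everyone)")

At a prevalence of 1%, for example, composites of roughly ten aliquots classify everyone with about one fifth of the analyses; as the prevalence grows, the savings shrink and eventually disappear, which is the pattern seen in the applications below.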
This application has gone on to become a classic example of how statistical cleverness can assist researchers in attaining what we call observational economy (Rao, 1989).

In light of recent developments, composite sampling is increasingly becoming an acceptable practice for sampling soils, biota, and bulk materials when the goal is estimation of some population value under restrictions of a desired standard error and/or limits on the cost of sample measurement.

In response to an informal survey of various professionals, several favorable applications of composite sampling were reported. They include:

• Establishing and verifying attainment of remedial cleanup standards in soils using sample compositing and bootstrapping techniques

• Use of compositing to obtain adequate support in geostatistical sampling

• Optimal compositing strategies for screening material for deleterious agents

• A soil sample design utilizing techniques of compositing, binary search, and confidence limits on proportions

• Composite sampling for analyzing foliage and other biological materials

While many diverse applications exist for composite sampling, some examples that are particularly relevant to environmental and public health studies are detailed in the remainder of this chapter.

3.1. Soil Sampling

3.1.1. Characterization of Soil PCB Contamination at Gas Pipeline Compressor Stations

As part of a recent settlement between the Pennsylvania Department of Environmental Resources and the Texas Eastern Pipeline Company, PCB-contaminated soils had to be characterized and remediated at 19 sites. Because waste sources included indiscriminate dumping, disposal in trash pits, air emissions and even application as weed killer along fence lines, the resulting spatial distribution of contaminated soil was very heterogeneous, with hot spot locations unknown. Therefore, reliable characterization of these sites required a very large number of soil samples, around 12,000 to be more precise. With each sample analyzed for total PCBs, the cost of site characterization alone was around $33 million. To really appreciate the magnitude of the problem, one must realize that this discussion pertains only to the Pennsylvania settlement; the problem extends along the whole pipeline from the Gulf Coast to New England.

Results of a retrospective study (Gore, Patil, and Taillie, 1992; Patil, Gore and Sinha, 1994), using the actual site characterization data, revealed that composite sampling methods potentially could have substantially reduced the analytical costs. Three aspects of the data were evaluated: (i) estimation of the mean and variance of total PCB concentration as well as total PCB mass, (ii) classification of each individual (uncomposited) sample as above or below a specified critical level, and (iii) quantification of those individual samples with the highest PCB levels.

Results showed that unbiased estimates of the mean and variance could be obtained with one fourth the number of analyses (90 instead of 360). A small loss of precision resulting from compositing seemed quite acceptable in light of the large reduction in analytical cost.
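The estimation step itself is simple. The sketch below (Python, using simulated values rather than the actual characterization data) illustrates the idea for randomly formed composites of size k = 4 with negligible measurement error: the average of the composite measurements estimates the population mean, and k times the variance of the composite measurements gives a rough estimate of the individual-scale variance.

    # Simulated illustration (not the actual PCB data): 360 individual values
    # reduced to 90 composite measurements, as in the retrospective study.
    import random, statistics

    random.seed(2)
    k = 4
    individuals = [random.lognormvariate(0.0, 1.0) for _ in range(360)]  # simulated ppm values
    random.shuffle(individuals)
    composites = [statistics.mean(individuals[i:i + k])                  # perfect mixing assumed
                  for i in range(0, len(individuals), k)]

    print("mean     (true, composite-based):",
          round(statistics.mean(individuals), 3), round(statistics.mean(composites), 3))
    print("variance (true, composite-based):",
          round(statistics.variance(individuals), 3), round(k * statistics.variance(composites), 3))

When measurement error is not negligible, one common adjustment is to subtract its known variance from the composite variance before rescaling; this is in the spirit of the "simple weighting" mentioned in Section 2.1.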
Compositing can actually increase precision if composites are purposely formed to increase heterogeneity within composites; in this case, however, composites were formed from spatially proximate field samples in order to minimize heterogeneity within composites. This approach was taken because it provides for the most efficient retesting for classifying individual samples, which, as with most hazardous waste sites, was the primary objective.

A site was acceptably clean if 90% of the measured samples were below 10 parts per million (ppm), with no values exceeding 25 ppm. With characterization data from the worst of the nineteen sites, compositing could have reduced the analytical cost of classifying individual samples according to the 10 ppm criterion by 9%, relative to exhaustive testing. Starting from this nearly worst case scenario, the cost savings increase as we move to cleaner sites and should be dramatic when analyzing post-remediation verification data. For example, another site along the pipeline that is cleaner, although still contaminated, could have had all individual samples classified according to the 10 ppm criterion for 50% less than the analytical cost associated with exhaustive testing (see Gore, Boswell, Patil, and Taillie, 1992).

Finally, if concerned simply with knowing which individual sample has the highest concentration, we could have discovered this by exhaustively retesting just two composite samples. In other words, with only eight measurements in addition to the 90 composite measurements, we could have identified the "hottest" spot. Furthermore, 12 additional measurements could have revealed the locations with the four highest concentrations (see Patil, Gore and Sinha, 1994).

Keep in mind that the percentages cited here result from a retrospective study where expected composite values were estimated by arithmetically averaging individual values. Since this approach assumes no measurement error (but some is expected due to incomplete homogenization of samples), these percentages are best interpreted as potential savings.
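The report does not spell out the search algorithm behind these figures, but one simple version (a sketch only, assuming non-negative concentrations and composites that measure the mean of their constituents; not necessarily the exact protocol of Patil, Gore and Sinha, 1994) retests composites in decreasing order of their measured values and stops once no untested composite could still contain a value larger than the largest individual value already found:

    # Sketch of a "hottest sample" search from composite measurements.
    # A composite of k non-negative values that measures a can hide an
    # individual value no larger than k * a, which gives the stopping rule.
    def find_maximum(groups, measure):
        k = len(groups[0])
        comp = [measure(sum(g) / len(g)) for g in groups]      # one analysis per composite
        order = sorted(range(len(groups)), key=lambda i: comp[i], reverse=True)
        best, retested = float("-inf"), 0
        for i in order:
            if k * comp[i] <= best:                            # nothing in group i can beat best
                break
            best = max(best, max(measure(x) for x in groups[i]))  # exhaustive retest of group i
            retested += 1
        return best, retested

    # Hypothetical usage with error-free measurements and composites of size 4:
    groups = [[1, 2, 3, 2], [9, 1, 1, 1], [4, 4, 5, 4], [2, 2, 2, 2]]
    print(find_maximum(groups, measure=lambda v: v))           # (9, 2): two composites retested

The two exhaustively retested composites quoted above for the PCB data are consistent with this kind of stopping rule, although the published procedure may differ in its details.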
3.1.2. Characterization of Soil PAH Contamination at a Superfund Site

In another study involving remediation of contaminated soil (Messner et al., 1990), the investigators wanted to determine which half-acre plots at a Superfund site should be remediated. The contaminant was total polyaromatic hydrocarbons (PAHs), and the cleanup objective was to remediate any plots that posed greater than a 10⁻⁴ risk based on direct ingestion as the most likely route of exposure.

These investigators concluded that the most cost-effective sampling design was to take two composite samples from each half-acre plot, with each of the two composites consisting of ten individual samples. Even when considering the influence of small "hot spots," the proposed composite sampling design assured a high probability of making the correct decision. Since the estimated cost per analysis for this study was $800, the savings due to compositing are phenomenal.

3.2. Ground Water Monitoring

As the distribution of a constituent in a given medium becomes more homogeneous, measurement error decreases, making composite sampling more feasible. For this reason, composite sampling has great economic potential for analyzing dissolved solutes, whether the solvent is water or some other liquid. In fact, a study of composite sampling of wastewater (Schaeffer, Kerster and Janardan, 1980) showed that variability of analytical results due to compositing was an insignificant source of total variability.

Rajagopal and Williams (1989) critically evaluated the economy of compositing ground water samples when screening a large monitoring network in order to identify contaminated wells. With a binary retesting scheme, compositing resulted in decreased analytical effort and subsequent cost when no more than about 12.5% of the wells were contaminated. Of course, the savings increased as the number of contaminated wells in the network decreased. When more than one out of eight wells were contaminated, the number of analyses increased over the amount required by initial exhaustive testing, with the worst case scenario resulting in 50% additional analyses. If, however, curtailed retesting was performed instead of straight binary retesting, the maximum exceedance would be 31% more analyses than required by initial exhaustive testing. This number of additional analyses becomes even smaller as the distribution of contaminated wells becomes contagious (or clumped); the rate of 31% additional analyses is therefore an absolute worst case. As seen here, if the number of contaminated wells is expected to be generally low (e.g., less than 12%), compositing can be economically attractive.

3.3. Indoor Air Monitoring for Allergens

Quantification of specific allergens in dust from human dwellings provides important information for determining allergen exposure. The fact that indoor allergens are not equally distributed in the dust of human dwellings makes it difficult to estimate allergen exposure with a high degree of certainty. A composite sample may provide a more reliable estimate of indoor allergen exposure and minimize error associated with unequal distribution of allergens on discrete objects. Composite samples of household dust may thus provide useful information while minimizing the sample collection effort and analytical test costs.

In a recent study (Lintner et al., 1992), dust samples from three specific objects and composite samples from the same three objects were collected from the living rooms and bedrooms of 15 homes by a single technician. Discrete and composite samples were collected from the floor, furniture (upholstery/bed) and window coverings in both the living room and a bedroom of each home. Discrete samples were collected by vacuuming the specific objects for 10 minutes. Composite samples were collected in a defined sequence by vacuuming the three objects for 5 minutes each. In this way, the composites were formed at the time of sample collection by allowing the vacuum cleaner to do the physical mixing of the dust from several objects.

Results of this study seem to indicate that the actual measurement of a composite sample will be approximately the average of the values that would be obtained from separate measurements on discrete samples. However, if an object has a significantly higher allergen content than other objects, the composite sample measurement tends to be higher than the average of the discrete sample measurements. In order to use composite sampling effectively, only items which are likely sources of allergen should be used to form a composite sample.

3.4. Biomonitoring

3.4.1. Measuring Bioaccumulation in Human Adipose Tissue
The National Human Adipose Tissue Survey (NHATS) is an annual survey to collect and analyze a sample of adipose tissue specimens from autopsied cadavers and surgical patients (Orban, Lordo and Schwemberger, 1990). The primary objectives of NHATS include:

• To identify chemicals that are present in the adipose tissue of individuals in the U.S. population;

• To estimate the average concentrations, with confidence intervals, of selected chemicals in adipose tissue of individuals in the U.S. population and in various demographic subpopulations; and

• To determine if geographic region, age, race and sex affect the average concentrations of selected chemicals detected in the U.S. population.

Every year approximately 800-1200 adipose tissue specimens are collected using a multistage sampling plan. First, the 48 conterminous states are stratified into four geographic areas, which form four strata. Next, a sample of metropolitan statistical areas (MSAs) is selected from every stratum with probabilities proportional to MSA populations. Finally, several cooperators (hospital pathologists or medical examiners) are chosen from every selected MSA and asked to supply a specified quota of tissue specimens. The quota specifies the number of specimens needed in each of the following categories:

• Age groups: 0-14 years, 15-44 years, and 45+ years;

• Race: Caucasian and non-Caucasian; and

• Sex: Male and female.

The sampling plans are designed to give unbiased and efficient estimates of the average concentrations of selected chemicals in the entire population and in various subpopulations defined by the demographic variables described above. Concentrations are characterized by the average or median chemical concentrations, while prevalence is the proportion of individuals with chemical concentrations exceeding specified criterion levels.

Instead of analyzing 800-1200 individual specimens, only about 50 composite samples are analyzed. This not only reduces analytical cost, but also provides enough tissue mass to use high resolution gas chromatography/mass spectrometry, which allows a wider list of target chemicals to be tested for.

3.4.2. Assessing Contamination in Fish

When monitoring human tissue to assess the bioaccumulation of contaminants, compositing was forced on the study in order to achieve sufficient mass of material for analysis. With other organisms this is not typically a limitation, because the whole organism can be sacrificed. Nevertheless, as researchers have shown (Paasivirta and Paukku, 1989), compositing is still preferable because it is much more cost-effective.

Concerned with the concentrations of a host of organochlorine compounds in herring from the eastern Gulf of Finland, researchers recognized how expensive such monitoring could become. They therefore evaluated the effectiveness of composite sampling and concluded that costs could be reduced by about 54% using optimized composite sampling instead of analyzing individual fish. They also showed that average chemical concentrations could be estimated from composite samples with the same accuracy as from a larger number of individual samples, and that optimum composite sample sizes could be easily calculated if laboratory variance can be predicted.
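That last calculation is not reproduced in the report, but a standard way such an optimum arises (sketched below with entirely hypothetical variances and costs, not the figures of Paasivirta and Paukku, 1989) is to balance between-fish variability against laboratory variance. With m composites of k fish each, the variance of the estimated mean is roughly (sigma_b^2 / k + sigma_a^2) / m, and minimizing total cost for a fixed precision gives k close to sqrt((sigma_b^2 / sigma_a^2) * (cost per analysis / cost per fish)).

    # Sketch: choosing a composite size when laboratory variance can be
    # predicted.  Every number below is a hypothetical placeholder.
    import math

    sigma_b2 = 4.0        # between-fish variance of the contaminant concentration
    sigma_a2 = 0.25       # analytical (laboratory) variance per measurement
    cost_analysis = 800.0
    cost_fish = 20.0

    # Cost-optimal composite size, rounded to a whole number of fish.
    k = max(1, round(math.sqrt((sigma_b2 / sigma_a2) * (cost_analysis / cost_fish))))

    # Composites needed to reach a target standard error for the mean.
    target_se = 0.2
    m = math.ceil((sigma_b2 / k + sigma_a2) / target_se ** 2)

    print(f"composite size k = {k}, composites m = {m}, analyses = {m}, fish = {m * k}")
    print(f"total cost = {m * (cost_analysis + k * cost_fish):,.0f}")

Larger composites pay off when fish-to-fish variation and analytical costs dominate laboratory error and collection costs; smaller composites suffice when the laboratory itself is the main source of variability.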
3.4.3. Assessing Contaminants in Mollusks

As part of the National Oceanic and Atmospheric Administration's "Mussel Watch" program, 177 coastal sites were sampled from 1986 to 1988 (NOAA, 1989). While mussels were collected along the West Coast and northern East Coast, oysters were taken along the southern East Coast, the Gulf Coast and three sites in Hawaii.

Using the soft tissue of these mollusks, composite samples were made by homogenizing either 30 mussels or 20 oysters. Six composites were then used for chemical analysis, three for organics and three for trace elements.

Compositing served two purposes here: to provide sufficient media (tissue) for analysis and to increase the information in each measurement. The statistics of interest were means and variances; therefore retesting of individual mollusks or groups thereof was not necessary, and the desired information was obtained with minimal analyses.

3.4.4. Measuring Average Fat Content in Bulk Milk

The economic value of composite sampling is apparently well known in the dairy industry, where milk must be routinely analyzed. For example, the fat content of milk is determined on composite samples which are formed from samples taken from all deliveries during a specified period of time.

Since composite samples are known to provide an unbiased estimate of the population mean, dairy scientists are mainly concerned with the precision of a composite sample estimator compared to that of an individual sample estimator. Williams and Peterson (1978) developed a framework for assessing the precision of sampling schemes by estimating different sources of variation associated with the sampling process. They identified four components: variance due to real differences between collections from a supplier within a compositing period (biological variance), variance among samples taken from the same collection (sample variance), variance among measurements on the same sample (testing variance) and the variance associated with forming a composite sample (compositing variance).

Based on a study of sixty-one herd milk supplies in three different creamery locations, Connolly and O'Connor (1981) found that the biological components of variability were about 10 times as large as the sampling or compositing components, indicating that the true biological variability is not masked by the composite sampling process.

4. Summary

Compared to exhaustively testing all individual sample units, testing composite samples has the potential to greatly increase one's observational economy when conducting environmental and public health monitoring.

When the objective is to estimate the population mean or total, compositing will always reduce analytical cost; however, a sufficient number of composite samples must still be obtained for estimating the variance.

When the objective is to classify each individual sample, with subsequent estimation of the prevalence of a binary trait or the proportion of noncompliant measurements, testing composite samples with selective retesting becomes cost-effective when the prevalence or proportion is low. Examples of where composite sampling can be very cost-effective for classification include (i) estimating the prevalence of a rare disease and (ii) verifying whether a hazardous waste site has been sufficiently remediated.

References

BOSWELL, M. T., AND PATIL, G. P. (1987). A perspective of composite sampling. Commun. Statist.-Theory Meth., 16, 3069-3093.

BURROS, M. (1994). A study faults Mexican restaurants. The New York Times, July 19, 1994, p. A16.

CLINE, S. M., AND SEVERIN, B. F. (1989). Volatile organic losses from a composite water sampler. Water Res., 23(4), 407-412.

CONNOLLY, J., AND O'CONNOR, F.
(1981). Comparison of random and composite sampling methods for the estimation of fat content of bulk milk supplies. Ir. J. Agr. Res., 20, 35-51.

DORFMAN, R. (1943). The detection of defective members of large populations. Ann. Math. Stat., 14, 436-440.

EDLAND, S. D., AND VAN BELLE, G. (1994). Decreased sampling costs and improved accuracy with composite sampling. In Environmental Statistics, Assessment and Forecasting, C. R. Cothern and N. P. Ross, eds. Lewis Publishers, Boca Raton, pp. 29-55.

ELDER, R. S. (1977). Properties of composite sampling procedures. Ph.D. Dissertation. Virginia Polytechnic Institute and State University, Blacksburg, VA.

ELDER, R. S., THOMPSON, W. O., AND MYERS, R. H. (1980). Properties of composite sampling procedures. Technometrics, 22(2), 179-186.

GARNER, F. C., STAPANIAN, M. A., AND WILLIAMS, L. R. (1988). Composite sampling for environmental monitoring. In Principles of Environmental Sampling, L. H. Keith, ed. American Chemical Society, pp. 363-374.

GORE, S. D., BOSWELL, M. T., PATIL, G. P., AND TAILLIE, C. (1992). Studies on the applications of composite sample techniques in hazardous waste site characterization and evaluation: I. Onsite surface soil sampling for PCB at the Uniontown Site. Technical Report Number 92-0101, Center for Statistical Ecology and Environmental Statistics, Pennsylvania State University, University Park, PA 16802.

GORE, S. D., AND PATIL, G. P. (1994). Identifying extremely large values using composite sample data. With discussions by J. Warren, H. D. Kahn, and K. Campbell. Environmental and Ecological Statistics, 1(3), 227-245.

GORE, S. D., PATIL, G. P., AND TAILLIE, C. (1992). Studies on the applications of composite sample techniques in hazardous waste site characterization and evaluation: II. Onsite surface soil sampling for PCB at the Armagh Site. Technical Report Number 92-0305, Center for Statistical Ecology and Environmental Statistics, Pennsylvania State University, University Park, PA 16802.

LINTNER, T. J., MAKI, C. L., BRAME, K. A., AND BOSWELL, M. T. (1992). Sampling dust from human dwellings to estimate the prevalence of indoor allergens. Technical Report Number 92-0805, Center for Statistical Ecology and Environmental Statistics, Pennsylvania State University, University Park, PA 16802.

MACK, G. A., AND ROBINSON, P. E. (1985). Use of composited samples to increase the precision and probability of detection of toxic chemicals. In Environmental Applications of Chemometrics, J. J. Breen and P. E. Robinson, eds. American Chemical Society, Washington, DC, pp. 174-183.

MESSNER, M. J., CLAYTON, C. A., MICHAEL, D. L., NEPTUNE, M. D., AND BRANTLY, E. P. (1990). Retrospective design solutions for a remedial investigation. Supplement to Quantitative Decision Making in Superfund: A Data Quality Objectives Case Study. Hazardous Materials Control, Volume 3, Number 3.

NATIONAL OCEANIC AND ATMOSPHERIC ADMINISTRATION. (1989). A Summary of Data on Tissue Contamination from the First Three Years (1986-1988) of the Mussel Watch Project. NOAA Technical Memorandum, NOS OMA 49.

ORBAN, J. E., LORDO, R., AND SCHWEMBERGER, J. (1990). Statistical methods for analyzing composite sample data applied to EPA's human monitoring program. MS.

PAASIVIRTA, J., AND PAUKKU, R. (1989). Use of composited samples to optimize the monitoring of environmental toxins. Chemosphere, 19, 1551-1562.

PATIL, G. P., GORE, S. D., AND SINHA, A. K. (1994). Environmental chemistry, statistical modeling, and observational economy.
In Environmental Statistics, Assessment and Forecasting, C. R. Cothern and N. P. Ross, eds. Lewis Publishers, Boca Raton, pp. 57-97.

PATIL, G. P., GORE, S. D., AND TAILLIE, C. (1994). Design and analysis with composite samples: A novel method to accomplish observational economy in environmental studies. Technical Report Number 94-0410, Center for Statistical Ecology and Environmental Statistics, Pennsylvania State University, University Park, PA 16802.

RAJAGOPAL, R. (1990). Personal communication.

RAJAGOPAL, R., AND WILLIAMS, L. R. (1989). Economics of sample compositing as a screening tool in ground water quality monitoring. Ground Water Monitoring Review, 9(1), 186-192.

RAO, C. R. (1989). Statistics and Truth: Putting Chance to Work. International Co-operative Publishing House, Fairland, MD, pp. 118-119.

ROHDE, C. A. (1976). Composite sampling. Biometrics, 32, 273-282.

ROHDE, C. A. (1979). Batch, bulk and composite sampling. In Sampling Biological Populations, R. M. Cormack, G. P. Patil, and D. S. Robson, eds. International Co-operative Publishing House, Fairland, MD, pp. 365-377.

SCHAEFFER, D., KERSTER, H. W., AND JANARDAN, K. G. (1982). Monitoring toxics by group testing. Environ. Mgmt., 6(6), 467-469.

SCHAEFFER, D. J., KERSTER, H. W., AND JANARDAN, K. G. (1980). Grab versus composite sampling: A primer for the manager and engineer. Environ. Mgmt., 4(6), 469-481.

WATSON, M. A. (1936). Factors affecting the amount of infection obtained by aphis transmission of the virus Hy. III. Philos. Trans. Roy. Soc. London, Ser. B, 226, 457-489.

WILLIAMS, C. J., AND PETERSON, R. G. (1978). Variation in estimates of milk fat, protein and lactose content associated with various bulk milk sampling programs. J. Dairy Science, 61, 1093.