United States Environmental Protection Agency
Policy, Planning and Evaluation (2163)
EPA 230-R-95-005
August 1995

EPA Observational Economy Series
Volume 1: Composite Sampling

Contents

Foreword
Acknowledgments
1. Introduction
2. What is Composite Sampling?
   2.1. Method
   2.2. Limitations of Composite Sampling
3. Applications
   3.1. Soil Sampling
        3.1.1. PCB Contamination
        3.1.2. PAH Contamination
   3.2. Ground Water Monitoring
   3.3. Indoor Air Monitoring
   3.4. Biomonitoring
        3.4.1. Bioaccumulation in Human Adipose Tissue
        3.4.2. Assessing Contamination in Fish
        3.4.3. Assessing Contaminants in Mollusks
        3.4.4. Measuring Average Fat Content in Bulk Milk
4. Summary
References

Foreword

The high costs of laboratory analytical procedures frequently strain environmental and public health budgets. Whether soil, water or biological tissue is being analyzed, the cost of testing for chemical and pathogenic contaminants can be quite prohibitive. Composite sampling can substantially reduce analytical costs because the number of required analyses is reduced by compositing several samples into one and analyzing the composited sample. By appropriate selection of the composite sample size and retesting of select individual samples, composite sampling may reveal the same information as would otherwise require many more analyses. Many of the limitations of composite sampling have been overcome by recent research, opening up more widespread potential for using composite sampling to reduce the costs of environmental and public health assessments while maintaining, and often increasing, the precision of sample-based inference.

Acknowledgments

The EPA Observational Economy Series is a result of the research conducted under a cooperative agreement between the U.S. Environmental Protection Agency and the Pennsylvania State University Center for Statistical Ecology and Environmental Statistics, Professor G. P. Patil, Director.

The EPA grant CR-821531010, entitled "Research and Outreach on Observational Economy, Environmental Sampling and Statistical Decision Making in Statistical Ecology and Environmental Statistics," consists of ten separate projects in progress at the Penn State Center: 1) Composite Sampling and Designs; 2) Ranked Set Sampling and Designs; 3) Environmental Site Characterization and Evaluation; 4) Encounter Sampling; 5) Spatio-temporal Data Analysis; 6) Biodiversity Analysis and Monitoring; 7) Adaptive Sampling Designs; 8) Statistics in Environmental Policy and Regulation for Compliance and Enforcement; 9) Statistical Ecology and Ecological Risk Assessment; and 10) Environmental Statistics Knowledge Transfer, Outreach and Training.

The series is published by the Statistical Analysis and Computing Branch of the Environmental Statistics and Information Division in the EPA Office of Policy, Planning and Evaluation. This volume in the series is largely based on the work of M. T. Boswell, S. D. Gore, G. D. Johnson, G. P. Patil, and C. Taillie at the Penn State Center in cooperation with John Fritzvold, Herbert Lacayo, Robert O'Brien, Brenda Odom, Barry Nussbaum, and John Warren as project officers at U.S. EPA.

Questions or comments on this publication should be directed to Dr. N. Phillip Ross, Director, Environmental Statistics and Information Division (Mail Code 2163), United States Environmental Protection Agency, 401 M Street SW, Washington, DC 20460; Ph. (202) 260-2680.
1. Introduction

While decision making in general involves opinion based on prior experience, scientifically based decision making requires careful collection, measurement and interpretation of data from physical observations. Examples of such decisions are: "Has a hazardous waste site been sufficiently cleaned?" or "Are pollutants accumulating in certain foods as well as in human or wildlife tissues?"

Scientifically based decision making should minimize the risk of being wrong. Since decisions require information, which is in turn extracted from data, this risk decreases as the data become more representative of the population being studied. In order for a data set to properly represent a population, it must cover the ranges of space and time within which the population lies, as well as have sufficient resolution within these ranges. It soon becomes obvious that collection and review of representative data can be prohibitively expensive if a large sample size (number of measurements, recordings or counts) is required, especially when analytical costs are very high, as with monitoring environmental and biological media for chemical or pathogenic contaminants.

Conventional statistical techniques allow for the reduction of either cost or uncertainty; however, the reduction of one of these factors comes at the expense of an increase in the other. Composite sampling offers to hold either cost or uncertainty at a specified level while decreasing the other.

Compositing simply refers to physically mixing individual samples to form a composite sample, as visualized in Figure 1. Just one analysis is performed on the composite, which is used to represent each of the original individual samples.

Figure 1: Forming composite samples from individual field samples.

Compositing is common practice for simply increasing the representativeness of a measurement, such as when measuring the fat content of a particular entree that is composited across several restaurants included in a national survey (Burros, 1994). For this reason, compositing can always reduce costs for estimating a total or an average value. However, analysis of composite samples can be cleverly extended to classify the original individual sample units that comprised a composite. For example, one may need to identify the presence or absence of a pathogen like HIV in blood samples, or one may need to identify all soil cores whose contaminant concentration exceeds an action level at a hazardous waste site.

When analytical costs dominate over sampling costs, the savings potential is obviously high; however, the immediate question is "How do we compensate for information that is lost due to compositing?" More specifically, if we are testing whether or not a substance is present or exists at a concentration above some threshold, we do not want to dilute individual "contaminated" samples with clean samples so that the analysis does not detect any contamination. Furthermore, if our measurements are of a variable such as a chemical concentration, we may need to know the actual values of those individual samples with the highest concentrations. For example, "hot spots" need to be identified at hazardous waste sites.
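To make the dilution concern concrete, the following short simulation (Python; an illustration added here, not part of the original report, with entirely hypothetical concentrations and action level) forms one composite of size k = 4, as in Figure 1, and shows that a well-mixed composite measures approximately the mean of its constituents, so a single "hot" sample is diluted by roughly a factor of k.

    # Hypothetical illustration: a single contaminated sample is diluted by
    # roughly the composite size k when samples are physically mixed.
    import random

    random.seed(1)
    k = 4                                              # individual samples per composite
    clean = [random.uniform(0.0, 1.0) for _ in range(k - 1)]   # "clean" values (ppm)
    hot = 20.0                                         # one contaminated sample (ppm)
    action_level = 10.0                                # hypothetical action level (ppm)

    composite = sum(clean + [hot]) / k                 # perfect mixing: composite = mean
    print("individual values:", [round(v, 2) for v in clean + [hot]])
    print("composite value  :", round(composite, 2))
    print("hot sample exceeds action level:", hot > action_level)        # True
    print("composite exceeds action level :", composite > action_level)  # False here

With these made-up numbers the contaminated sample exceeds the action level while the composite does not; Section 2.2 returns to this problem and compares the composite result to the action level divided by k rather than to the action level itself.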
Through judicious choice of a strategy for retesting some of the original individual samples based on composite sample measurements, many limitations of composite sampling can be overcome. Furthermore, other innovative applications of composite sampling are emerging, such as combining it with ranked set sampling, another approach for achieving observational economy that is discussed in Volume 2 of this series.

2. What is Composite Sampling?

2.1. Method

First, let us clarify that a "sample" in this document refers to a physical object to be measured, whether an individual or a composite, and not a collection of observations in the statistical sense. Individual sample units are what are obtained in the field, such as soil cores or fish fillets, or from subjects, such as blood samples. Meanwhile, a composite sample may be a physical mix of individual sample units or a batch of unblended individual sample units that are tested as a group. Most compositing for environmental assessment and monitoring consists of physically mixing individual units to make a composite sample that is as homogeneous as possible.

With classical sampling, no distinction is made between the process of sampling (i.e., selection or inclusion) and that of observation or measurement. We assume, with classical sampling, that any unit selected for inclusion in a statistical sample is measured and hence its value becomes known. In composite sampling, however, there is a clear distinction between the sampling and measurement stages. Compositing takes place between these two stages, and therefore achieves two otherwise conflicting goals: a large number of samples can be selected to satisfy sample size requirements, while the number of analytical measurements is kept affordable.

If a variable of concern is a measurement that is continuous in nature, such as a chemical concentration, the mean (arithmetic average) of composite samples provides an unbiased estimate of the true but unknown "population" mean. Also, if measurement error is known, the population variance on the scale of the individual samples can be estimated by a simple weighting of the measured composite sample variance.

With selective retesting of individual sample units, based on initial composite sample results, we can classify all of the individual sample units according to the presence or absence of a trait, or exceedance (vs. compliance) of a numerical standard. We can subsequently estimate the prevalence of a trait or the proportion of noncompliance. Basically, if a composite measurement does not reveal the trait in question or is in compliance, then all individual samples comprising that composite are classified as "negative". When a composite tests positive, retesting is performed on the individual samples or subsamples (aliquots) in order to locate the source of "contamination".

Retesting, as visualized in a general sense in Figure 2, may simply be exhaustive retesting of all individuals comprising a composite or may entail more specialized protocols. Generally, as the retesting protocol becomes more sophisticated, the expected number of analyses decreases.

Figure 2: Composite sampling with retesting. (Aliquots of individual samples form a composite; if the composite tests negative, all of its individual samples are classified as negative; otherwise selected individuals are retested.)
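As a minimal sketch of the classification logic in Figure 2 (Python; added for illustration, with hypothetical values, a hypothetical 10 ppm criterion, and an error-free measure() function standing in for the laboratory analysis):

    # Classify every individual sample using one test per composite plus
    # exhaustive retesting of any composite that is not clearly negative.
    def classify(individuals, k, threshold, measure):
        """Return (flags, number_of_tests); flags[i] is True if sample i exceeds threshold."""
        flags, tests = [], 0
        for start in range(0, len(individuals), k):
            group = individuals[start:start + k]
            composite = measure(sum(group) / len(group))   # composite ~ mean of its parts
            tests += 1
            # Compare to threshold/k to guard against dilution (non-negative values assumed).
            if composite <= threshold / len(group):
                flags.extend([False] * len(group))         # whole group classified negative
            else:
                for value in group:                        # retest every constituent
                    flags.append(measure(value) > threshold)
                    tests += 1
        return flags, tests

    # Hypothetical soil values (ppm), composites of size 4, 10 ppm criterion,
    # and an identity "measurement" with no analytical error.
    values = [0.2, 1.1, 0.4, 0.9, 14.0, 0.3, 0.7, 0.5, 0.1, 0.6, 0.8, 0.2]
    flags, n_tests = classify(values, k=4, threshold=10.0, measure=lambda v: v)
    print(flags)     # only the 14.0 ppm sample is flagged
    print(n_tests)   # 3 composite tests + 4 retests = 7, instead of 12 individual tests

More specialized protocols retest a positive composite in stages, for example by splitting it into subgroups, and so require even fewer analyses on average.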
Therefore, one must consider any increased logistical costs along with the expected decrease in analytical cost when evaluating the overall cost of a compositing/retesting protocol.

Owing to recent research (Patil, Gore and Sinha, 1994), the individual sample with the highest value, along with those individual samples comprising an upper percentile, can be identified with minimal retesting. This ability is extremely important when "hot spots" need to be identified, as with soil monitoring at a hazardous waste site.

Whether we are dealing with data from binary (presence/absence) measurements or data from measurements on a continuum, composite sampling can result in classifying each individual sample without having to separately analyze each one. While composite sampling may not be feasible when the prevalence of contamination is high, the analytical costs can be drastically reduced as the number of contaminated samples decreases.

2.2. Limitations of Composite Sampling

Both physical and logistical constraints exist that may restrict the application of composite sampling. The limitations which more commonly arise are discussed here, along with some simple recommendations for how compositing still may help.

Physical:

If the integrity of the individual sample values changes because of compositing, then composite sampling may not be the desired approach. For example, volatile chemicals can evaporate upon mixing of samples (Cline and Severin, 1989), or interaction can occur among sample constituents. In the first case, compositing of individual sample extracts may be a reasonable alternative to mixing individual samples as they are collected.

Another limitation is imposed by potential dilution, where an individual sample with a high value is combined with low values, resulting in a composite sample that falsely tests negative. When classifying samples according to exceedance or compliance with some standard value, c, the problem of dilution is overcome by comparing the composite sample result to c divided by the composite sample size, k (that is, to c/k). Furthermore, when an analytical detection limit, d, is known, the maximum composite sample size is established according to the inequality k < c/d. One may lower this upper bound on the composite sample size to reduce the effects of measurement error. As can be seen here, when reporting limits (Rajagopal, 1990) or action levels (Williams, 1990) of some hazardous chemical concentrations are legally required to be near the detection limit, the possibility of composite sampling may be eliminated.
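The two safeguards just described are easy to apply in practice; the following lines (Python, with hypothetical values of c and d) simply restate the arithmetic:

    # Dilution safeguards with hypothetical numbers: judge a composite of
    # size k against c/k rather than c, and keep k below c/d so that a
    # single sample at the action level cannot be diluted below detection.
    c = 10.0   # action level for an individual sample (ppm, hypothetical)
    d = 0.5    # analytical detection limit (ppm, hypothetical)

    k_max = int(c / d)           # upper bound on composite size from the c/d rule (20 here)
    k = 8                        # a smaller size might be chosen to allow for measurement error
    composite_criterion = c / k  # retest constituents when a composite exceeds this value

    print("maximum composite size:", k_max)
    print("composite-scale criterion (ppm):", composite_criterion)

As the text notes, when the action level itself must sit near the detection limit, this bound collapses toward k = 1 and compositing offers no advantage.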
Sample homogeneity is another consideration. A homogeneous sample is one where the variable of interest, such as a chemical concentration, is evenly distributed throughout the sample. In contrast, a heterogeneous sample can yield substantially different values for the variable of interest, depending on what part of the sample is actually analyzed. If the whole sample unit is analyzed, then heterogeneity is not a problem; however, most laboratory analyses are performed on a small subsample of the original sample unit. For example, one gram of soil may be taken from a one kilogram soil core for actual extraction and analysis. If a subsample is to represent a larger sample unit, then the larger unit must be fairly homogeneous with respect to the variable of interest. Therefore, an individual sample unit should be homogenized as much as possible prior to obtaining an aliquot for inclusion in a composite. Furthermore, formation of a composite must include homogenization if the composite is going to be represented by measurement on a smaller subsample.

Often, measurements on multiple attributes are desired. However, if retesting is performed in order to classify individual samples, it is unclear how to optimize the retesting relative to the different attributes (Schaeffer et al., 1982). For example, should chemicals be tested independently, or does there exist dependence in the multivariate information that can be used to improve cost efficiency? Classifying for multiple attributes remains an open problem in composite sampling.

Logistical:

When retesting of certain individual samples may be required based on composite sample results, subsamples (aliquots) of the original individual samples must be preserved and stored until all testing is done. This may lead to extra expense that must be considered in the overall cost comparison between compositing and other strategies. For most environmental and public health studies, the analytical savings from compositing will far outweigh the extra cost of sample preservation and storage.

Another consideration is that events outside the control of the scientists may dictate the feasibility of composite sampling. For example, people whose wells are being tested may demand that their wells be treated as equitably as the wells of their neighbors. Measuring some well samples individually and some well samples solely as part of a composite may give an appearance of inequitability and result in a political decree to measure each well individually (Rajagopal, 1990).

Circumstances that may presently disqualify composite sampling from being applied may change with advances in technology. Long turn-around times for laboratory results and large labor costs may currently eliminate optimal retesting designs from consideration. However, retesting designs in the future may be automated and guided by an expert system (Rajagopal, 1990). Also, advances in statistical methodology may further extend the utility of composite sampling.

For other reviews of composite sampling, see Rohde (1976, 1979), Elder (1977), Elder, Thompson, and Myers (1980), Boswell and Patil (1987) and Garner, Stapanian, and Williams (1988). For an overview, see Patil, Gore, and Taillie (1994).

3. Applications

Composite sampling has its roots in what is known as group testing. An early application of group testing was to estimate the prevalence of plant virus transmission by insects (Watson, 1936). In this application, insect vectors were allowed to feed upon host plants, allowing the disease transmission rate to be estimated from the number of plants that subsequently became diseased.

Apparently, the next important application of group testing occurred during World War II, when U.S. servicemen were tested for syphilis by detecting the presence or absence of a specific antigen of the syphilis-causing bacterium in samples of their blood (Dorfman, 1943). Initial analyses were done on composite samples formed from aliquots of blood drawn from the subjects. A composite sample testing negative indicated that all individuals contributing to the composite were negative, while a composite testing positive prompted exhaustive retesting of the original aliquots comprising that composite. If blood aliquots of k individuals are composited, the number of required tests to classify these k individuals will be either 1 or k + 1. For a given prevalence of the trait, the expected number of tests can be calculated for a composite of size k.
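If individuals are positive independently with probability p, a composite of k aliquots tests negative with probability (1 - p)^k, so the expected number of tests per individual under this protocol is 1/k + 1 - (1 - p)^k. The short calculation below (Python; an illustration, not part of the original report) tabulates this quantity and the composite size that minimizes it:

    # Expected tests per individual under Dorfman-style compositing with
    # exhaustive retesting of positive composites, assuming independent
    # positives with prevalence p.
    def tests_per_individual(p, k):
        return 1.0 / k + 1.0 - (1.0 - p) ** k

    for p in (0.001, 0.01, 0.05, 0.10):
        k_best = min(range(2, 51), key=lambda k: tests_per_individual(p, k))
        cost = tests_per_individual(p, k_best)
        print(f"prevalence {p:5.3f}: best k = {k_best:2d}, "
              f"{cost:.2f} tests per individual "
              f"({(1.0 - cost) * 100:.0f}% fewer than testing everyone)")

At a prevalence of 1%, for example, composites of roughly ten aliquots classify everyone with about one fifth of the analyses; as the prevalence grows, the savings shrink and eventually disappear, which is the pattern seen in the applications below.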
This application has gone on to become a classic example of how statistical cleverness can assist researchers in attaining what we call observational economy (Rao, 1989).

In light of recent developments, composite sampling is increasingly becoming an acceptable practice for sampling soils, biota, and bulk materials when the goal is estimation of some population value under restrictions of a desired standard error and/or limits on the cost of sample measurement.

In response to an informal survey of various professionals, several favorable applications of composite sampling were reported. They include:

• Establishing and verifying attainment of remedial cleanup standards in soils using sample compositing and bootstrapping techniques

• Use of compositing to obtain adequate support in geostatistical sampling

• Optimal compositing strategies for screening material for deleterious agents

• A soil sample design utilizing techniques of compositing, binary search, and confidence limits on proportions

• Composite sampling for analyzing foliage and other biological materials

While many diverse applications exist for composite sampling, some examples that are particularly relevant to environmental and public health studies are detailed in the remainder of this chapter.

3.1. Soil Sampling

3.1.1. Characterization of Soil PCB Contamination at Gas Pipeline Compressor Stations

As part of a recent settlement between the Pennsylvania Department of Environmental Resources and the Texas Eastern Pipeline Company, PCB-contaminated soils had to be characterized and remediated at 19 sites. Because waste sources included indiscriminate dumping, disposal in trash pits, air emissions and even application as weed killer along fence lines, the resulting spatial distribution of contaminated soil was very heterogeneous, with hot spot locations unknown. Therefore, reliable characterization of these sites required a very large number of soil samples, around 12,000 to be more precise. With each sample analyzed for total PCBs, the cost of site characterization alone was around $33 million. To really appreciate the magnitude of the problem, one must realize that this discussion pertains only to the Pennsylvania settlement; the problem extends along the whole pipeline from the Gulf Coast to New England.

Results of a retrospective study (Gore, Patil, and Taillie, 1992; Patil, Gore and Sinha, 1994), using the actual site characterization data, revealed that composite sampling methods potentially could have substantially reduced the analytical costs. Three aspects of the data were evaluated: (i) estimation of the mean and variance of total PCB concentration as well as total PCB mass, (ii) classification of each individual (uncomposited) sample as above or below a specified critical level, and (iii) quantification of those individual samples with the highest PCB levels.

Results showed that unbiased estimates of the mean and variance could be obtained with one fourth the number of analyses (90 instead of 360). A small loss of precision resulting from compositing seemed quite acceptable in light of the large reduction in analytical cost.
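The estimation step itself is simple. The sketch below (Python, using simulated values rather than the actual characterization data) illustrates the idea for randomly formed composites of size k = 4 with negligible measurement error: the average of the composite measurements estimates the population mean, and k times the variance of the composite measurements gives a rough estimate of the individual-scale variance.

    # Simulated illustration (not the actual PCB data): 360 individual values
    # reduced to 90 composite measurements, as in the retrospective study.
    import random, statistics

    random.seed(2)
    k = 4
    individuals = [random.lognormvariate(0.0, 1.0) for _ in range(360)]  # simulated ppm values
    random.shuffle(individuals)
    composites = [statistics.mean(individuals[i:i + k])                  # perfect mixing assumed
                  for i in range(0, len(individuals), k)]

    print("mean     (true, composite-based):",
          round(statistics.mean(individuals), 3), round(statistics.mean(composites), 3))
    print("variance (true, composite-based):",
          round(statistics.variance(individuals), 3), round(k * statistics.variance(composites), 3))

When measurement error is not negligible, one common adjustment is to subtract its known variance from the composite variance before rescaling; this is in the spirit of the "simple weighting" mentioned in Section 2.1.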
Compositing can actually increase precision if composites are purposely formed to increase heterogeneity within composites; in this case, however, composites were formed from spatially proximate field samples in order to minimize heterogeneity within composites. This approach was taken because it provides for the most efficient retesting for classifying individual samples, which, as with most hazardous waste sites, was the primary objective.

A site was acceptably clean if 90% of the measured samples were below 10 parts per million (ppm), with no values exceeding 25 ppm. With characterization data from the worst of the nineteen sites, compositing could have reduced the analytical cost of classifying individual samples according to the 10 ppm criterion by 9%, relative to exhaustive testing. Starting from this nearly worst case scenario, the cost savings increase as we move to cleaner sites and should be dramatic when analyzing post-remediation verification data. For example, another site along the pipeline that is cleaner, although still contaminated, could have had all individual samples classified according to the 10 ppm criterion for 50% less than the analytical cost associated with exhaustive testing (see Gore, Boswell, Patil, and Taillie, 1992).

Finally, if concerned simply with knowing which individual sample has the highest concentration, we could have discovered this by exhaustively retesting just two composite samples. In other words, with only eight measurements in addition to the 90 composite measurements, we could have identified the "hottest" spot. Furthermore, 12 additional measurements could have revealed the locations with the four highest concentrations (see Patil, Gore and Sinha, 1994).

Keep in mind that the percentages cited here result from a retrospective study where expected composite values were estimated by arithmetically averaging individual values. Since this approach assumes no measurement error (but some is expected due to incomplete homogenization of samples), these percentages are best interpreted as potential savings.
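The report does not spell out the search algorithm behind these figures, but one simple version (a sketch only, assuming non-negative concentrations and composites that measure the mean of their constituents; not necessarily the exact protocol of Patil, Gore and Sinha, 1994) retests composites in decreasing order of their measured values and stops once no untested composite could still contain a value larger than the largest individual value already found:

    # Sketch of a "hottest sample" search from composite measurements.
    # A composite of k non-negative values that measures a can hide an
    # individual value no larger than k * a, which gives the stopping rule.
    def find_maximum(groups, measure):
        k = len(groups[0])
        comp = [measure(sum(g) / len(g)) for g in groups]      # one analysis per composite
        order = sorted(range(len(groups)), key=lambda i: comp[i], reverse=True)
        best, retested = float("-inf"), 0
        for i in order:
            if k * comp[i] <= best:                            # nothing in group i can beat best
                break
            best = max(best, max(measure(x) for x in groups[i]))  # exhaustive retest of group i
            retested += 1
        return best, retested

    # Hypothetical usage with error-free measurements and composites of size 4:
    groups = [[1, 2, 3, 2], [9, 1, 1, 1], [4, 4, 5, 4], [2, 2, 2, 2]]
    print(find_maximum(groups, measure=lambda v: v))           # (9, 2): two composites retested

The two exhaustively retested composites quoted above for the PCB data are consistent with this kind of stopping rule, although the published procedure may differ in its details.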
3.1.2. Characterization of Soil PAH Contamination at a Superfund Site

In another study involving remediation of contaminated soil (Messner et al., 1990), the investigators wanted to determine which half-acre plots at a Superfund site should be remediated. The contaminant was total polyaromatic hydrocarbons (PAHs), and the cleanup objective was to remediate any plots that posed greater than a 10⁻⁴ risk based on direct ingestion as the most likely route of exposure.

These investigators concluded that the most cost-effective sampling design was to take two composite samples from each half-acre plot, with each of the two composites consisting of ten individual samples. Even when considering the influence of small "hot spots," the proposed composite sampling design assured a high probability of making the correct decision. Since the estimated cost per analysis for this study was $800, the savings due to compositing are phenomenal.

3.2. Ground Water Monitoring

As the distribution of a constituent in a given medium becomes more homogeneous, measurement error decreases, making composite sampling more feasible. For this reason, composite sampling has great economic potential for analyzing dissolved solutes, whether the solvent is water or some other liquid. In fact, a study of composite sampling of wastewater (Schaeffer, Kerster and Janardan, 1980) showed that variability of analytical results due to compositing was an insignificant source of total variability.

Rajagopal and Williams (1989) critically evaluated the economy of compositing ground water samples when screening a large monitoring network in order to identify contaminated wells. With a binary retesting scheme, compositing resulted in decreased analytical effort and subsequent cost when no more than about 12.5% of the wells were contaminated. Of course, the savings increased as the number of contaminated wells in the network decreased. When more than one out of eight wells were contaminated, the number of analyses increased over the amount required by initial exhaustive testing, with the worst case scenario resulting in 50% additional analyses. If, however, curtailed retesting was performed instead of straight binary retesting, the maximum exceedance would be 31% more analyses than required by initial exhaustive testing. This number of additional analyses becomes even smaller as the distribution of contaminated wells becomes contagious (or clumped); the rate of 31% additional analyses is therefore an absolute worst case. As seen here, if the number of contaminated wells is expected to be generally low (e.g., less than 12%), compositing can be economically attractive.

3.3. Indoor Air Monitoring for Allergens

Quantification of specific allergens in dust from human dwellings provides important information for determining allergen exposure. The fact that indoor allergens are not equally distributed in the dust of human dwellings makes it difficult to estimate allergen exposure with a high degree of certainty. A composite sample may provide a more reliable estimate of indoor allergen exposure and minimize error associated with unequal distribution of allergens on discrete objects. Composite samples of household dust may thus provide useful information while minimizing the sample collection effort and analytical test costs.

In a recent study (Lintner et al., 1992), dust samples from three specific objects and composite samples from the same three objects were collected from the living rooms and bedrooms of 15 homes by a single technician. Discrete and composite samples were collected from the floor, furniture (upholstery/bed) and window coverings in both the living room and a bedroom of each home. Discrete samples were collected by vacuuming the specific objects for 10 minutes. Composite samples were collected in a defined sequence by vacuuming the three objects for 5 minutes each. In this way, the composites were formed at the time of sample collection by allowing the vacuum cleaner to do the physical mixing of the dust from several objects.

Results of this study seem to indicate that the actual measurement of a composite sample will be approximately the average of the values that would be obtained from separate measurements on discrete samples. However, if an object has a significantly higher allergen content than other objects, the composite sample measurement tends to be higher than the average of the discrete sample measurements. In order to use composite sampling effectively, only items which are likely sources of allergen should be used to form a composite sample.

3.4. Biomonitoring

3.4.1. Measuring Bioaccumulation in Human Adipose Tissue
The National Human Adipose Tissue Survey (NHATS) is an annual survey to collect and analyze a sample of adipose tissue specimens from autopsied cadavers and surgical patients (Orban, Lordo and Schwemberger, 1990). The primary objectives of NHATS include:

• To identify chemicals that are present in the adipose tissue of individuals in the U.S. population;

• To estimate the average concentrations, with confidence intervals, of selected chemicals in adipose tissue of individuals in the U.S. population and in various demographic subpopulations; and

• To determine if geographic region, age, race and sex affect the average concentrations of selected chemicals detected in the U.S. population.

Every year approximately 800-1200 adipose tissue specimens are collected using a multistage sampling plan. First, the 48 conterminous states are stratified into four geographic areas, which form four strata. Next, a sample of metropolitan statistical areas (MSAs) is selected from every stratum with probabilities proportional to MSA populations. Finally, several cooperators (hospital pathologists or medical examiners) are chosen from every selected MSA and asked to supply a specified quota of tissue specimens. The quota specifies the number of specimens needed in each of the following categories:

• Age groups: 0-14 years, 15-44 years, and 45+ years;

• Race: Caucasian and non-Caucasian; and

• Sex: Male and female.

The sampling plans are designed to give unbiased and efficient estimates of the average concentrations of selected chemicals in the entire population and in various subpopulations defined by the demographic variables described above. Concentrations are characterized by the average or median chemical concentrations, while prevalence is the proportion of individuals with chemical concentrations exceeding specified criterion levels.

Instead of analyzing 800-1200 individual specimens, only about 50 composite samples are analyzed. This not only reduces analytical cost, but also provides enough tissue mass to use high resolution gas chromatography/mass spectrometry, which allows a wider list of target chemicals to be tested for.

3.4.2. Assessing Contamination in Fish

When monitoring human tissue to assess the bioaccumulation of contaminants, compositing was forced on the study in order to achieve sufficient mass of material for analysis. With other organisms this is not typically a limitation, because the whole organism can be sacrificed. Nevertheless, as researchers have shown (Paasivirta and Paukku, 1989), compositing is still preferable because it is much more cost-effective.

Concerned with the concentrations of a host of organochlorine compounds in herring from the eastern Gulf of Finland, researchers recognized how expensive such monitoring could become. They therefore evaluated the effectiveness of composite sampling and concluded that costs could be reduced by about 54% using optimized composite sampling instead of analyzing individual fish. They also showed that average chemical concentrations could be estimated from composite samples with the same accuracy as from a larger number of individual samples, and that optimum composite sample sizes could be easily calculated if laboratory variance can be predicted.
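That last calculation is not reproduced in the report, but a standard way such an optimum arises (sketched below with entirely hypothetical variances and costs, not the figures of Paasivirta and Paukku, 1989) is to balance between-fish variability against laboratory variance. With m composites of k fish each, the variance of the estimated mean is roughly (sigma_b^2 / k + sigma_a^2) / m, and minimizing total cost for a fixed precision gives k close to sqrt((sigma_b^2 / sigma_a^2) * (cost per analysis / cost per fish)).

    # Sketch: choosing a composite size when laboratory variance can be
    # predicted.  Every number below is a hypothetical placeholder.
    import math

    sigma_b2 = 4.0        # between-fish variance of the contaminant concentration
    sigma_a2 = 0.25       # analytical (laboratory) variance per measurement
    cost_analysis = 800.0
    cost_fish = 20.0

    # Cost-optimal composite size, rounded to a whole number of fish.
    k = max(1, round(math.sqrt((sigma_b2 / sigma_a2) * (cost_analysis / cost_fish))))

    # Composites needed to reach a target standard error for the mean.
    target_se = 0.2
    m = math.ceil((sigma_b2 / k + sigma_a2) / target_se ** 2)

    print(f"composite size k = {k}, composites m = {m}, analyses = {m}, fish = {m * k}")
    print(f"total cost = {m * (cost_analysis + k * cost_fish):,.0f}")

Larger composites pay off when fish-to-fish variation and analytical costs dominate laboratory error and collection costs; smaller composites suffice when the laboratory itself is the main source of variability.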
3.4.3. Assessing Contaminants in Mollusks

As part of the National Oceanic and Atmospheric Administration's "Mussel Watch" program, 177 coastal sites were sampled from 1986 to 1988 (NOAA, 1989). While mussels were collected along the West Coast and northern East Coast, oysters were taken along the southern East Coast, the Gulf Coast and three sites in Hawaii.

Using the soft tissue of these mollusks, composite samples were made by homogenizing either 30 mussels or 20 oysters. Six composites were then used for chemical analysis, three for organics and three for trace elements.

Compositing served two purposes here: to provide sufficient media (tissue) for analysis and to increase the information in each measurement. The statistics of interest were means and variances; therefore retesting of individual mollusks or groups thereof was not necessary, and the desired information was obtained with minimal analyses.

3.4.4. Measuring Average Fat Content in Bulk Milk

The economic value of composite sampling is apparently well known in the dairy industry, where milk must be routinely analyzed. For example, the fat content of milk is determined on composite samples which are formed from samples taken from all deliveries during a specified period of time.

Since composite samples are known to provide an unbiased estimate of the population mean, dairy scientists are mainly concerned with the precision of a composite sample estimator compared to that of an individual sample estimator. Williams and Peterson (1978) developed a framework for assessing the precision of sampling schemes by estimating different sources of variation associated with the sampling process. They identified four components: variance due to real differences between collections from a supplier within a compositing period (biological variance), variance among samples taken from the same collection (sample variance), variance among measurements on the same sample (testing variance) and the variance associated with forming a composite sample (compositing variance).

Based on a study of sixty-one herd milk supplies in three different creamery locations, Connolly and O'Connor (1981) found that the biological components of variability were about 10 times as large as the sampling or compositing components, indicating that the true biological variability is not masked by the composite sampling process.

4. Summary

Compared to exhaustively testing all individual sample units, testing composite samples has the potential to greatly increase one's observational economy when conducting environmental and public health monitoring.

When the objective is to estimate the population mean or total, compositing will always reduce analytical cost; however, a sufficient number of composite samples must still be obtained for estimating the variance.

When the objective is to classify each individual sample, with subsequent estimation of the prevalence of a binary trait or the proportion of noncompliant measurements, testing composite samples with selective retesting becomes cost-effective when the prevalence or proportion is low. Examples of where composite sampling can be very cost-effective for classification include (i) estimating the prevalence of a rare disease and (ii) verifying whether a hazardous waste site has been sufficiently remediated.

References

BOSWELL, M. T., AND PATIL, G. P. (1987). A perspective of composite sampling. Commun. Statist.-Theory Meth., 16, 3069-3093.

BURROS, M. (1994). A study faults Mexican restaurants. The New York Times, July 19, 1994, p. A16.

CLINE, S. M., AND SEVERIN, B. F. (1989). Volatile organic losses from a composite water sampler. Water Res., 23(4), 407-412.

CONNOLLY, J., AND O'CONNOR, F.
(1981). Comparison of random and composite sampling methods for the estimation of fat content of bulk milk supplies. Ir. J. Agr. Res., 20, 35-51.

DORFMAN, R. (1943). The detection of defective members of large populations. Ann. Math. Stat., 14, 436-440.

EDLAND, S. D., AND VAN BELLE, G. (1994). Decreased sampling costs and improved accuracy with composite sampling. In Environmental Statistics, Assessment and Forecasting, C. R. Cothern and N. P. Ross, eds. Lewis Publishers, Boca Raton, pp. 29-55.

ELDER, R. S. (1977). Properties of composite sampling procedures. Ph.D. Dissertation. Virginia Polytechnic Institute and State University, Blacksburg, VA.

ELDER, R. S., THOMPSON, W. O., AND MYERS, R. H. (1980). Properties of composite sampling procedures. Technometrics, 22(2), 179-186.

GARNER, F. C., STAPANIAN, M. A., AND WILLIAMS, L. R. (1988). Composite sampling for environmental monitoring. In Principles of Environmental Sampling, L. H. Keith, ed. American Chemical Society, pp. 363-374.

GORE, S. D., BOSWELL, M. T., PATIL, G. P., AND TAILLIE, C. (1992). Studies on the applications of composite sample techniques in hazardous waste site characterization and evaluation: I. Onsite surface soil sampling for PCB at the Uniontown Site. Technical Report Number 92-0101, Center for Statistical Ecology and Environmental Statistics, Pennsylvania State University, University Park, PA 16802.

GORE, S. D., AND PATIL, G. P. (1994). Identifying extremely large values using composite sample data. With discussions by J. Warren, H. D. Kahn, and K. Campbell. Environmental and Ecological Statistics, 1(3), 227-245.

GORE, S. D., PATIL, G. P., AND TAILLIE, C. (1992). Studies on the applications of composite sample techniques in hazardous waste site characterization and evaluation: II. Onsite surface soil sampling for PCB at the Armagh Site. Technical Report Number 92-0305, Center for Statistical Ecology and Environmental Statistics, Pennsylvania State University, University Park, PA 16802.

LINTNER, T. J., MAKI, C. L., BRAME, K. A., AND BOSWELL, M. T. (1992). Sampling dust from human dwellings to estimate the prevalence of indoor allergens. Technical Report Number 92-0805, Center for Statistical Ecology and Environmental Statistics, Pennsylvania State University, University Park, PA 16802.

MACK, G. A., AND ROBINSON, P. E. (1985). Use of composited samples to increase the precision and probability of detection of toxic chemicals. In Environmental Applications of Chemometrics, J. J. Breen and P. E. Robinson, eds. American Chemical Society, Washington, DC, pp. 174-183.

MESSNER, M. J., CLAYTON, C. A., MICHAEL, D. L., NEPTUNE, M. D., AND BRANTLY, E. P. (1990). Retrospective design solutions for a remedial investigation. Supplement to Quantitative Decision Making in Superfund: A Data Quality Objectives Case Study. Hazardous Materials Control, Volume 3, Number 3.

NATIONAL OCEANIC AND ATMOSPHERIC ADMINISTRATION. (1989). A Summary of Data on Tissue Contamination from the First Three Years (1986-1988) of the Mussel Watch Project. NOAA Technical Memorandum, NOS OMA 49.

ORBAN, J. E., LORDO, R., AND SCHWEMBERGER, J. (1990). Statistical methods for analyzing composite sample data applied to EPA's human monitoring program. MS.

PAASIVIRTA, J., AND PAUKKU, R. (1989). Use of composited samples to optimize the monitoring of environmental toxins. Chemosphere, 19, 1551-1562.

PATIL, G. P., GORE, S. D., AND SINHA, A. K. (1994). Environmental chemistry, statistical modeling, and observational economy.
In Environmental Statistics, Assessment and Forecasting, C. R. Cothern and N. P. Ross, eds. Lewis Publishers, Boca Raton, pp. 57-97.

PATIL, G. P., GORE, S. D., AND TAILLIE, C. (1994). Design and analysis with composite samples: A novel method to accomplish observational economy in environmental studies. Technical Report Number 94-0410, Center for Statistical Ecology and Environmental Statistics, Pennsylvania State University, University Park, PA 16802.

RAJAGOPAL, R. (1990). Personal communication.

RAJAGOPAL, R., AND WILLIAMS, L. R. (1989). Economics of sample compositing as a screening tool in ground water quality monitoring. Ground Water Monitoring Review, 9(1), 186-192.

RAO, C. R. (1989). Statistics and Truth: Putting Chance to Work. International Co-operative Publishing House, Fairland, MD, pp. 118-119.

ROHDE, C. A. (1976). Composite sampling. Biometrics, 32, 273-282.

ROHDE, C. A. (1979). Batch, bulk and composite sampling. In Sampling Biological Populations, R. M. Cormack, G. P. Patil, and D. S. Robson, eds. International Co-operative Publishing House, Fairland, MD, pp. 365-377.

SCHAEFFER, D., KERSTER, H. W., AND JANARDAN, K. G. (1982). Monitoring toxics by group testing. Environ. Mgmt., 6(6), 467-469.

SCHAEFFER, D. J., KERSTER, H. W., AND JANARDAN, K. G. (1980). Grab versus composite sampling: A primer for the manager and engineer. Environ. Mgmt., 4(6), 469-481.

WATSON, M. A. (1936). Factors affecting the amount of infection obtained by aphis transmission of the virus Hy. III. Philos. Trans. Roy. Soc. London, Ser. B, 226, 457-489.

WILLIAMS, C. J., AND PETERSON, R. G. (1978). Variation in estimates of milk fat, protein and lactose content associated with various bulk milk sampling programs. J. Dairy Science, 61, 1093.