United States Environmental Protection Agency
Policy, Planning, And Evaluation (2163)
EPA-230-R-95-006
August 1995

EPA Observational Economy Series
Volume 2: Ranked Set Sampling

Contents

Foreword
Acknowledgments
1. Introduction
2. What is Ranked Set Sampling?
2.1. Description
2.1.1. Ranking Criteria
2.1.2. Robustness of the Procedure
2.2. Variations of the Basic Protocol
2.2.1. Unequal Allocation of Sample Units
2.2.2. Combining with Line Intercept Sampling
3. Applications
3.1. Forage Yields
3.2. Seedling Counts
3.3. Shrub Phytomass in Forest Stands
3.4. Herbage Mass
3.5. PCB Contamination Levels
3.6. Improved Compositing of Samples
3.7. Additional Applications
4. Summary
References

Foreword

Ranked set sampling is a novel method for achieving observational economy when performing environmental monitoring and assessment. Compared to simple random sampling, ranked set sampling yields a sample of observations that are more representative of the underlying population. Therefore, either greater confidence is gained for a fixed number of observations or, for a desired level of confidence, fewer observations (and hence lower cost) are needed.

The increased sampling efficiency is achieved by exploiting auxiliary information involving acquired field samples, a characteristic of double sampling procedures. With ranked set sampling, however, the auxiliary information does not have to be a quantitative concomitant variable. In fact, it can be purely judgmental; and thus, in the spirit of total quality management, it stimulates and utilizes a productive cross-disciplinary dialogue among those responsible for sampling and assessment.
Additionally, the ranked set sampling procedure is robust in the sense that it cannot perform worse than the usual simple random sampling.

This volume in the EPA Observational Economy Series introduces the concept and method of ranked set sampling for its timely inclusion in the toolbox of sampling procedures that aim to achieve observational economy, particularly when analytical costs dominate the monitoring scenario.

Acknowledgments

The EPA Observational Economy Series is a result of the research conducted under a cooperative agreement between the U.S. Environmental Protection Agency and the Pennsylvania State University Center for Statistical Ecology and Environmental Statistics, Professor G.P. Patil, Director.

The EPA Grant CR-821531010, entitled "Research and Outreach on Observational Economy, Environmental Sampling and Statistical Decision Making in Statistical Ecology and Environmental Statistics", consists of ten separate projects in progress at the Penn State Center: 1) Composite Sampling and Designs; 2) Ranked Set Sampling and Designs; 3) Environmental Site Characterization and Evaluation; 4) Encounter Sampling; 5) Spatio-temporal Data Analysis; 6) Biodiversity Analysis and Monitoring; 7) Adaptive Sampling Designs; 8) Statistics in Environmental Policy and Regulation for Compliance and Enforcement; 9) Statistical Ecology and Ecological Risk Assessment; and 10) Environmental Statistics Knowledge Transfer, Outreach and Training.

The series is published by the Statistical Analysis and Computing Branch of the Environmental Statistics and Information Division in the EPA Office of Policy, Planning and Evaluation. This volume in the series is largely based on the work of G. D. Johnson, A. Kaur, G. P. Patil, A. K. Sinha and C. Taillie at the Penn State Center in cooperation with John Fritzvold, Herbert Lacayo, Robert O'Brien, Brenda Odom, Barry Nussbaum, and John Warren as project officers at the U.S. EPA.
Questions or comments on this publication should be directed to Dr. N. Phillip Ross, Director, Environmental Statistics and Information Division (Mail Code 2163), United States Environmental Protection Agency, 401 M Street SW, Washington, DC 20460; Ph. (202) 260-2680.

1. Introduction

Environmental monitoring and assessment mostly requires observational data, as opposed to data obtained from controlled experiments. This is true whether we are assessing the extent of soil contamination at a one-acre site or some measure of forest resources over the Pacific Northwest region of the United States. Obtaining such data requires identification of sample units to represent the population of concern, followed by selection of particular units to quantify the characteristic(s) of interest. Sample units are basically the smallest units of measurement such as plots, soil cores, individuals, etc., while typical characteristics of interest include biomass, chemical concentrations or "head counts".

Typically the most expensive part of this process is laboratory analysis, while identification of potential sample units is a comparatively simple matter. We can therefore achieve great observational economy if we are able to identify a large number of sample units to represent the population of interest, yet only have to quantify a carefully selected subsample.

This potential for observational economy was recognized for estimating mean pasture and forage yields in the early 1950s, when McIntyre (1952) proposed a method that was later named ranked set sampling (RSS) by Halls and Dell (1966) and is currently under active investigation in various quarters.

As a simple introduction to the concept of RSS, consider the following example. Let's say we wish to estimate the mean height of students at a university from a random sample of three students.
Furthermore, in order to acknowledge the inherent uncertainty, we need to present this estimate as a confidence interval within which we expect the true population mean to lie with desired confidence.

Now the simplest way to obtain our sample is to randomly select three students from the university's population, then measure their heights. While the arithmetic average of the three heights is an unbiased point estimate of the population mean, the associated confidence interval can be very large, reflecting the high degree of uncertainty in estimating the mean of a large population with only three measurements. This is because we have no control over which individuals of the population enter the sample. For example, we may happen to grab two very short people and one very tall, or we may grab three very tall people. The only way to overcome such a problem with a simple random sample (SRS) is to increase the sample size.

On the other hand, we may obtain a ranked set sample. To do this, we may randomly invite three students to breakfast and visually rank them with respect to height. We then select the student we believe is shortest and actually measure his or her height. Repeating this process with lunch, we then select the middle ranked person, and, in the same way, select the tallest ranked person at dinner. The resulting measurements of student heights constitute a ranked set sample. As with the SRS measurements, the arithmetic average of the RSS measurements provides an unbiased point estimate of the population mean; however, the associated confidence interval can potentially be much smaller than that obtained with SRS measurements, thus reflecting decreased uncertainty. This encouraging feature results because measurements obtained through RSS are likely to be more regularly spaced than those obtained through SRS and therefore are more representative of the population.
Amazingly, the RSS procedure induces stratification of the whole population at the sample level; in effect, we are randomly sampling from the subpopulations of predominantly short, medium and tall students without having to construct the subpopulation strata. Each subpopulation has its own distribution, as visualized in Figure 1, where we see how the parent population gets effectively partitioned into subpopulations.

Figure 1: Frequency distributions of heights of different ranks superimposed on population frequency distribution of all heights (a schematic diagram).

McIntyre's proposal does not appear to have been applied for over a decade, after which forest and range researchers continued to discover the effectiveness of RSS (see Halls and Dell, 1966; Evans, 1967; Martin et al., 1980; Jewiss, 1981; and Cobby et al., 1985). Theoretical investigations by Dell and Clutter (1972) showed that, regardless of ranking errors, the RSS estimator of a population mean is unbiased and at least as precise as the SRS estimator with the same number of quantifications. David and Levine (1972) investigated the case where ranking is done by a numerical covariate. Furthermore, RSS also provides more precise estimators of the variance (Stokes, 1980a), the cumulative distribution function (Stokes and Sager, 1988) and at times the Pearson correlation coefficient (Stokes, 1980b). For an annotated bibliography with a historical perspective, see Kaur, Patil, Sinha, and Taillie (1995).

2. What is Ranked Set Sampling?

2.1. Description

As mentioned in Chapter 1, to create ranked sets we must partition the selected first phase sample into sets of equal size. In order to plan an RSS design, we must therefore choose a set size which is typically small, around 3 or 4, to minimize ranking error. Let's arbitrarily call this set size m, where m is the number of sample units allocated into each set. Now proceed as follows.
• step 1: Randomly select m² sample units from the population.
• step 2: Allocate the m² selected units as randomly as possible into m sets, each of size m.
• step 3: Without yet knowing any values for the variable of interest, rank the units within each set based on a perception of relative values for this variable. This may be based on personal judgment or done with measurements of a covariate that is correlated with the variable of interest.
• step 4: Choose a sample for actual analysis by including the smallest ranked unit in the first set, then the second smallest ranked unit in the second set, continuing in this fashion until the largest ranked unit is selected in the last set.
• step 5: Repeat steps 1 through 4 for r cycles until the desired sample size, n = mr, is obtained for analysis.

As an illustration, consider the set size m = 3 with r = 4 cycles. This situation is illustrated in Figure 2, where each row denotes a judgment-ordered sample within a cycle, and the units selected for quantitative analysis are circled. Note that 36 units have been randomly selected in 4 cycles; however, only 12 units are actually analyzed to obtain the ranked set sample of measurements.

Figure 2: A ranked set sample design with set size m = 3 and the number of sampling cycles r = 4. Although 36 sample units have been selected from the population, only the 12 circled units are actually included in the final sample for quantitative analysis.

Obtaining a sample in this manner results in maintaining the unbiasedness of simple random sampling; however, by incorporating "outside" information about the sample units, we are able to contribute a structure to the sample that increases its representativeness of the true underlying population.

If we quantified the same number of sample units, mr = 12, by a simple random sample, we would have no control over which units enter the sample.
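Steps 1 through 5 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not a procedure taken from the cited studies: the function name `ranked_set_sample`, the simulated population of student heights, and the noisy "eyeball" score used for ranking are all hypothetical, and each cycle is assumed to draw its m² units afresh from the whole population.

```python
import random
from statistics import mean, pvariance

def ranked_set_sample(population, m, r, rank_key, rng):
    """Select m*r units for quantification following steps 1-5."""
    selected = []
    for _ in range(r):                                        # step 5: r cycles
        units = rng.sample(population, m * m)                 # step 1: m^2 units, in random order
        sets = [units[i * m:(i + 1) * m] for i in range(m)]   # step 2: m sets of size m
        for i, unit_set in enumerate(sets):
            ranked = sorted(unit_set, key=rank_key)           # step 3: rank without measuring
            selected.append(ranked[i])                        # step 4: i-th smallest from i-th set
    return selected

rng = random.Random(1)

# Hypothetical population: true heights (cm) plus a noisy visual
# impression that is used only for ranking, never for estimation.
population = []
for _ in range(1000):
    height = rng.gauss(170.0, 10.0)
    population.append({"height": height, "eyeball": height + rng.gauss(0.0, 3.0)})

m, r = 3, 4
sample = ranked_set_sample(population, m, r, lambda u: u["eyeball"], rng)
print(len(sample), "units quantified out of", m * m * r, "identified")

# Repeat both designs many times to compare the spread of the sample means.
rss_means, srs_means = [], []
for _ in range(2000):
    rss = ranked_set_sample(population, m, r, lambda u: u["eyeball"], rng)
    srs = rng.sample(population, m * r)
    rss_means.append(mean(u["height"] for u in rss))
    srs_means.append(mean(u["height"] for u in srs))

print("variance of SRS mean:", round(pvariance(srs_means), 2))
print("variance of RSS mean:", round(pvariance(rss_means), 2))
```

Under these assumptions the RSS means should vary noticeably less from one repetition to the next than the SRS means, even though both designs quantify the same 12 units per repetition and the ranking is imperfect.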
Perhaps all 12 units would come from the lower end of the range, or perhaps most would be clustered at the low end while one or two units would come from the middle or upper range. With simple random sampling, the only way to increase the prospect of covering the full range of possible values is to increase the sample size. With ranked set sampling, however, we increase the representativeness with a fixed number of sample units, thus saving considerably on quantification costs.

With the ranked set sample thus obtained, it can be shown that unbiased estimators of several important population parameters can be calculated, including the mean and, in the case of more than one sampling cycle, the variance.

2.1.1. Ranking Criteria

A real key to success lies with step 3 in the above procedure: ranking. This may be based on visual inspection or other expert opinion about the sample units. For example, a field-seasoned range scientist or woods person may readily be able to rank three or four quadrats of grass with respect to overall volume or mass. Meanwhile, a hazardous waste site inspector may be able to reliably rank areas of soil with respect to concentrations of a toxic contaminant, based on features like surface staining, discoloration or the appearance of stressed vegetation.

On the other hand, if another characteristic is available that is highly correlated with the characteristic of interest but costs much less to obtain, then we may rank by the values of such a "covariate". For example, reflectance intensity of near-infrared electromagnetic radiation, as recorded in a remotely sensed digital image, is directly proportional to vegetation concentration on the ground. Another example might be to measure total organic halides (TOX) in soil in order to rank soil sampling units with respect to the concentration of volatile organic solvents. As an indicator variable, TOX is much less expensive to measure than specific organic compounds.

2.1.2. Robustness of the Procedure

Several questions may now arise, such as:

1. What if the distribution of sample measurements is skewed, symmetric, or essentially unknown?
2. What if the sample units are not randomly allocated into sets?
3. How does error in ranking affect results?

First of all, while independent (random) and identically distributed sample measurements obtained through perfect ranking may lead to optimum performance of ranked set sampling, no matter how much these desirable characteristics are deviated from, the sampling efficiency will never be worse than with simple random sampling using the same number of quantifications. In fact, when efficiency is expressed as the relative precision (RP) such that

$$\mathrm{RP} = \frac{\text{variance of sample average with simple random sampling}}{\text{variance of sample average with ranked set sampling}},$$

it can be shown that the bounds of this relative precision are

$$1 \le \mathrm{RP} \le \frac{m+1}{2},$$

where m is the set size. Since RP cannot be less than one, the RSS protocol cannot be worse than the simple random sampling protocol.

2.2. Variations of the Basic Protocol

2.2.1. Unequal Allocation of Sample Units

The performance of RSS decreases as the underlying distribution of the characteristic of interest becomes increasingly skewed. McIntyre (1952) originally suggested that this problem may be overcome by allocating sample units into ranks in proportion to the standard deviation of each rank. This is the same approach as used in stratified random sampling, known as Neyman allocation, and would indeed be optimal if we had reliable prior estimates of the rank standard deviations.

An example of unequal allocation is displayed in Figure 3. Here we have the same set size, m = 3, and sample size, n = 12, as in the earlier example of equal allocation; however, the number of sampling cycles is adjusted so as to yield the desired unequal allocation of samples.
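The arithmetic behind allocating quantifications in proportion to the rank standard deviations can be sketched as follows. This is a minimal illustration rather than a prescribed procedure: the function name `neyman_allocation` and the standard deviations 1, 2 and 3 are hypothetical, and the largest-remainder rounding used to arrive at whole numbers of units is one reasonable choice among several.

```python
def neyman_allocation(rank_sds, n):
    """Split n quantifications across ranks in proportion to the rank standard deviations."""
    total = sum(rank_sds)
    exact = [n * sd / total for sd in rank_sds]      # ideal (fractional) allocation
    counts = [int(x) for x in exact]                 # round down first
    # Hand the units still unassigned to the ranks with the largest fractional parts.
    by_remainder = sorted(range(len(exact)),
                          key=lambda i: exact[i] - counts[i],
                          reverse=True)
    for i in by_remainder[: n - sum(counts)]:
        counts[i] += 1
    return counts

# Hypothetical rank standard deviations for set size m = 3 and n = 12.
allocation = neyman_allocation([1.0, 2.0, 3.0], 12)
print(allocation)  # [2, 4, 6]
```

With these assumed values the lowest rank receives 2 quantifications, the middle rank 4 and the highest rank 6, instead of the 4 each that equal allocation would give. In practice one would also want to check that no rank ends up with zero units.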
Unequal allocation can actually increase the performance of RSS above and beyond that achievable with standard equal allocation; however, if not properly applied, the performance of RSS can be worse than the performance of simple random sampling. Actually, the bounds on relative precision with unequal allocation become

$$0 \le \mathrm{RP} \le m,$$

indicating that, with appropriate unequal allocation, the relative precision may even increase to a level of m, and not just (m + 1)/2 as in the case with equal allocation.

Although an optimal RSS design would allocate samples into ranks in direct proportion to the rank standard deviations, we rarely know the standard deviations beforehand. We do know, however, that the distributions of many environmental and ecological variables are skewed towards the right, meaning that while most values are clustered around a median, a few much larger values are usually present. This skewness can actually be exploited to increase the precision beyond that obtained with ranked set sampling under equal allocation because standard deviations usually tend to increase with increasing rank values for right-skewed distributions. With some idea of the degree of skewness, Kaur, Patil, and Taillie (1994) have devised a rule-of-thumb for allocating sample units into ranks that performs closely to the optimal Neyman allocation. Therefore, distributions of many environmental and ecological variables may actually lend themselves well to being estimated with very high precision relative to that obtainable through simple random sampling.

Figure 3: Ranked set sampling with unequal allocation: circles indicate sample units chosen for quantification.

2.2.2. Combining with Line Intercept Sampling

A common field sampling method for ecological assessments is to include sample units that one encounters along a line (transect) that is randomly selected within a two-dimensional area of interest.
Units are typically members of a plant or animal species. Often the sample units identified are too numerous for every one to be selected for quantification, especially if measurements are destructive, such as with cutting vegetation for weighing. If the initially identified sample units are treated as a larger first phase sample, n' = m²r, then the RSS protocol can be applied to select a smaller subsample, n = mr, for actual quantification.

For example, consider a single sampling cycle when the set size, m, equals three for estimating the biomass of shrubs in a given area. A line transect for such a situation may be visualized as in Figure 4. Such an RSS-based line intercept sample has been found to produce more precise, and still unbiased, estimators of the population mean, size, total and cover, compared to the SRS-based line intercept sample (see Muttlak and McDonald, 1992).

Figure 4: Aerial view of a line transect intercepting shrubs. For set size m = 3, nine shrubs are partitioned into 3 sets of 3. Using apparent shrub size for ranking with respect to biomass, the shrubs taken for analyses include the smallest ranked in the first set, the second smallest ranked in the second set and the largest ranked in the third set.

3. Applications

3.1. Forage Yields

Although McIntyre's original proposal of estimating pasture yields by "unbiased selective sampling using ranked sets" was made in 1952, no applications were apparently reported until fourteen years later, when Halls and Dell (1966) applied McIntyre's method, coining the term "ranked set sampling", to estimate the weights of browse and herbage in a pine-hardwood forest of east Texas. These authors found RSS to be considerably more efficient than SRS.

Sets of three closely grouped quadrats were formed on a 300-acre tract.
At select locations, metal frames of 3.1 square feet were placed at three randomly selected points within a circle of 13-foot radius, as seen in Figure 5. Quadrats were then ranked as lowest, intermediate and highest according to the perceived weight of browse and, separately, of herbage. Then, after clipping and drying, the separate weights of browse and herbage were determined for each quadrat. This was repeated for 126 sets for estimating browse and 124 sets for estimating herbage.

In order to simulate the SRS estimator for the mean weight of browse, one quadrat was randomly selected from each set without considering its rank. Since actual values were known for each quadrat, the RSS estimator was obtained by randomly choosing the ranks to be quantified for each set, resulting in 37 lowest ranks, 46 intermediate ranks and 43 highest ranks.

Halls and Dell also examined McIntyre's suggestion that unequal allocation might further improve the efficiency of estimation. Since the standard deviations for the order statistics were 7, 13 and 27.7 for the low, intermediate and high yield, respectively (ratio of 1:2:4), they selected 14 quadrats in the low group, 40 in the intermediate group and 72 in the high group. Note that perfect ranking was obtained for both RSS protocols because the actual values were already known for each quadrat.

Results of these three sampling protocols are reported in Table 1. As expected under perfect ranking, precision due to RSS with approximately equal allocation increased, more than doubling for browse estimates.
Furthermore, when allocation was proportional to the order statistic standard deviation, the precision increased still further, thus supporting McIntyre's contention.

Figure 5: Within each circle, quadrats are randomly placed, followed by ranking and analysis of one appropriate quadrat (not to scale).

Table 1: Summary statistics for browse and herbage estimates

                                            Browse              Herbage
                                                   Variance            Variance
Sampling protocol                           Mean   of mean      Mean   of mean
Unranked: random                            14.9   4.55         7.3    1.00
Perfect ranking: near equal allocation      13.2   2.18         7.0    0.73
Perfect ranking: proportional allocation    12.9   1.91         7.2    0.58

(Source: Halls and Dell, 1966)

Another very valuable aspect of this study was that two observers independently ranked the quadrats, one a professional range man and the other a woods worker. There was practically no difference in the ranking results between the two observers.

3.2. Seedling Counts

The effectiveness of RSS for improving the sampling precision of seedling counts was studied by Evans (1967) in an area in central Louisiana that was seeded to Longleaf Pine (Pinus palustris Mill.). The target area was divided into 24 blocks, and each block was then subdivided into 25 one-milacre plots. All 600 plots were initially measured to characterize the population, which is summarized in Table 2a. The population mean and standard deviation were calculated to be 1.675 and 1.36, respectively.

For the RSS protocol, three plots were randomly selected from each of the 24 blocks (sets), resulting in 72 identified plots. The three plots within each set were then visually ranked. One cycle consisted of selecting the lowest ranked plot from the first set, the second lowest from the second set and the highest ranked plot from the third set. Repeating the cycle eight times yielded 24 selected plots in the ranked set sample (m = 3, r = 8). This whole procedure was repeated twice so that three separate field trials were performed, as summarized in Table 2b.
Evans also computed the means and standard deviations of each rank using all 72 identified plots for each of the three field trials. These results are reproduced in Table 2c for comparison to the RSS results in Table 2b.

In order to compare RSS to SRS, Evans resampled the 24 blocks (sets) 80 times to obtain two empirical distributions of the means, one based on the RSS estimator and the other based on the SRS estimator, which is actually a stratified random sample estimator. The results of this "bootstrapping" exercise are reproduced in Table 2d, where we see a significant reduction in the variance due to RSS.

3.3. Shrub Phytomass in Forest Stands

The performance of RSS for estimating shrub phytomass (all vegetation between one and five meters high) was evaluated by Martin et al. (1980) at a forested site in Virginia. They investigated four major vegetation types along a decreasing moisture gradient: mixed hardwood, mixed oak, mixed oak and pine, and mixed pine. For each vegetation type, a 20m by 20m area was subjectively located and then divided into 16 plots of equal size (5m by 5m).

For the RSS procedure, four sets of four plots were randomly selected from the 16 plots in each vegetation type. The plots in each set were then ranked by visual inspection, followed by quantifying the smallest ranked plot from the first set, the second smallest ranked plot from the second set and so on in the usual manner for RSS. This was repeated for each of the four vegetation types.

For the SRS procedure, four out of the 16 plots in each vegetation type were randomly selected without replacement, followed by quantification of each selected plot. Again, this is actually a stratified random sample since each vegetation type is a separate stratum.

Shrub phytomass was also determined for all 64 plots to obtain a grand mean and variance for comparison.
Their results are reproduced in Table 3, where we see a substantial increase in precision of the mean estimator associated with RSS.

Table 2: Data from Longleaf Pine Seedling Counts

(a) The frequency distribution of seedling counts in the 600 milacre plots.

Seedling count   0     1     2     3    4    5    6   7   8   9
Frequency        110   201   157   75   33   17   3   3   0   1

(b) Means and variances of three ranked set sample trials (mr = 24).

Trial   Mean   Variance
1       1.49   0.043
2       1.62   0.056
3       1.71   0.024

(c) Means and standard deviations of all seedlings for all ranks of three field trials of ranked set sampling.

        Means                            Standard deviations
Trial   L       M       H       Mean     L       M       H
1       0.750   1.500   2.625   1.625    0.532   0.750   1.173
2       0.917   1.625   2.833   1.792    0.881   1.013   1.880
3       0.750   1.708   3.125   1.861    0.520   0.955   0.927

(d) Test of significance of ranked-set versus random sampling.

Method of     Number of      Degrees of            Sum of
sampling      applications   freedom      Mean     squares   Variance   F
Random        80             79           1.709    7.572     .0958      3.91**
Ranked-set    80             79           1.647    1.939     .0245

**Significant at the .01 level of probability

(Source: Evans, 1967)

Table 3: RSS and SRS results for 16 measured plots across all vegetation types.

                                             Sampling method
                                             All 64 plots   SRS    RSS
Mean phytomass (kg/ha)                       2536           1976   2356
Variance of the mean (× 10^6)                0.15           4.34   2.73
Coefficient of variation of the mean (%)     15             108    70

(Source: Martin et al., 1980)

3.4. Herbage Mass

In order to compare RSS to SRS for estimating herbage mass in pure grass swards and both herbage mass and clover content in mixed grass-clover swards, Cobby et al. (1985) conducted four experiments at Hurley (UK). Besides comparison of RSS to SRS, their objective was to assess the effects of the following factors on RSS: (i) imperfect ranking within sets, (ii) greater variation between sets than within sets, and (iii) asymmetric distribution of the quantified values.
The first two experiments were conducted by randomly selecting 15 locations, followed by randomly selecting three quadrats at each location and having several observers rank the quadrats within each set. For the last two experiments, 45 quadrats were drawn at random from the entire target area. This allowed an assessment of the effects of both spatial variation and ranking errors within sets.

Their results are reproduced in Table 4, where the RP values of the worst and best observers are compared to the RP under perfect ranking, and the between and within set variances are presented for assessing spatial variation. These authors determined the main adverse factor to be within set clustering, and they recommend spacing quadrats within sets as far apart as possible when local spatial autocorrelation exists. With this in mind, they recommend RSS over SRS for sampling grass and grass-clover swards.

Table 4: Relative precisions (RP) ± s.e. of the worst and the best observers, and under perfect ranking, and the between and the within set variances while estimating herbage mass (grass and mixture) and clover contents.

Experiments Relative Precisions (RP) Variances Worst Best Perfect Between Within 1 (Grass) 2 (Mixture) 3 (Grass) 4 (Mixture) 2 (Clover) 4 (Clover) 1.11 ± 0.09 1.11 ± 0.09 - - - 1.36 ± 0.14 1.15 ± 0.12 1.36 ± 0.19 1.23 ± 0.14 1.27 ± 0.10 - - - 1.51 ± 0.15 1.34 ± 0.15 1.62 ± 0.18 1.31 ± 0.17 1.40 ± 0.16 1.66 ± 0.17 1.55 ± 0.16 1.44 ± 0.16 1.72 ± 0.20 0.24 0.07 0.00 0.11 16.3 16.2 0.31 0.09 1.58 0.66 34.4 71.6

(Source: Cobby et al., 1985)

3.5. PCB Contamination Levels

Before being led to believe that RSS is only for vegetation studies, let us consider estimating PCB concentrations in soil. Patil, Sinha, and Taillie (1994) used measurements of this contaminant, collected at a Pennsylvania site along the gas pipeline of the Texas Eastern Company. Table 5 provides the summary statistics of PCB values in two sampling grids (A and C) within this site.
Since the distribution of these data was highly skewed, they examined the effects of unequal as well as equal allocation of samples. More specifically, they examined the following schemes: (a) equal allocation of samples using all possible choices of sample units of each set size, (b) equal allocation of samples for a particular sample, and (c) unequal allocation of samples.

Considering set sizes 2, 3, and 4, the relative savings (RS) were computed as

$$\mathrm{RS} = \frac{\operatorname{var}(\mathrm{SRS}) - \operatorname{var}(\mathrm{RSS})}{\operatorname{var}(\mathrm{SRS})} \times 100\%,$$

taking into consideration all possible choices of sample units for each set size for both the grids under the equal allocation scheme. The results are given in Table 6, where it is evident that RS increases with set size and that the magnitude of RS is higher for grid C than for grid A. Note that the data for grid C are much less skewed than those for grid A, as seen in Table 5.

Table 5: Descriptive statistics of PCB values in grids A and C.

                             Grid
Characteristics              A        C
Number of observations       184      68
Mean                         200.9    600.2
Standard deviation           902.9    1583
Coefficient of variation     4.49     2.64
Coefficient of skewness      9.27     4.48
Coefficient of kurtosis      99.69    20.88

Table 6: Relative savings (RS) considering all possible combinations of each set size under perfect ranking situation with equal allocation.

                 Grid
Set size (m)     A RS    C RS
2                4       9
3                7       16
4                10      22

For comparing the performance of the RSS protocol relative to that of SRS with unequal allocation of samples, these authors considered two different proportional allocations for each set size in order to decide the sample size for each rank. This has been done to show the impact of proportional allocation on the magnitude of relative savings accrued due to RSS over SRS. The results are given in Table 7, where the magnitudes of relative savings are seen to be quite substantial for each set size for both the grids.

Table 7: Values of the sample mean, X(m)u, relative precision, and relative savings under the perfect ranking protocol with unequal allocation of samples.

           Grid A                                          Grid C
Set size   Proportion of samples                           Proportion of samples
(m)        (exact no.)            X(m)u   RP      RS       (exact no.)            X(m)u   RP      RS
2          1:10 (8,84)            205.9   1.724   42       1:10 (3,31)            535.2   2.041   51
2          1:15 (6,86)            203.1   1.818   45       1:15 (2,32)            520.4   2.174   54
3          1:4:20 (2,10,48)       203.6   2.174   54       1:1.7:1.5 (5,8,8)      560.1   1.471   32
3          1:4:25 (2,8,50)        201.1   2.326   57       1:2:7 (2,4,15)         615.2   1.923   48
4          1:3:5:16 (2,5,9,28)    247.1   1.695   41       1:2:3:4 (2,3,5,6)      576.6   2.083   52
4          1:3:9:27 (2,2,10,30)   226.1   1.316   24       1:1:3:5 (2,2,4,8)      802.4   1.449   31

While unequal allocation of samples into ranks can substantially increase RS when the underlying population follows a skewed distribution, this procedure does require some prior knowledge of the underlying distribution. For this purpose one may either take advantage of prior surveys of similar nature or conduct a pilot study. This same problem also arises in determining the optimum sample size under Neyman's allocation scheme for stratified random sampling. Recent work by Kaur, Patil and Taillie (1994) has addressed the issue of optimum allocation when some knowledge about the underlying distribution is available, and they have devised a rule-of-thumb for allocating sample units based on skewness.

Figure 6: Formation of three composites from three ranked set samples, each with a set size of three. Homogeneity is maximized (variability is minimized) within each composite by forming composites from equally ranked samples.

3.6. Improved Compositing of Samples

Consider a situation that calls for composite sampling, as discussed in Volume 1 of this series.
If our primary objective is classification of the individual samples used to form the composites and/or identification of those individual samples that constitute an upper percentile with respect to the characteristic of interest, then we will need to retest certain individual samples. Since the purpose of composite sampling is to minimize the number of analytical tests required, we obviously want to minimize the extent of retesting individual samples. Maximizing the homogeneity within the composites will minimize the necessary number of retests. Therefore, it is desirable to form composites from individual sample units that are as much alike as possible.

As pointed out by Patil, Sinha, and Taillie (1994), one can increase the chances of obtaining maximum homogeneity within composites by forming composites from samples identified to be in the same rank, as conceptualized in Figure 6. The RSS protocol can thus be combined with composite sampling to achieve even greater observational economy than composite sampling alone.

3.7. Additional Applications

Yanagawa and Chen (1980) mention that the RSS technique is regularly employed at the Pastoral Research Laboratory, CSIRO, at Armidale, N.S.W., Australia. A plate with four holes is randomly thrown on a field and the pasture in each hole is ranked by eye, followed by selection of one hole for quantification of pasture. These authors also mention that RSS has been used to estimate rice crops in Okinawa, Japan. They attribute this information to H. Mizuno at the "Mathematical Method in Sampling" symposium held at Chiba University (Japan), 1974.

In addition to the reported applications of RSS, several other applications have been recommended:

(i) Evans (1967) pointed out that the method would prove time-saving in the determination of cell wall thickness of different species of wood.
In the same area of application, Dell (1969) has mentioned that the RSS procedure should be efficient for estimating averages for various properties of cells in a cross section of wood chips.

(ii) The RSS method of sampling could be useful in determining the average length of various kinds of bacterial cells. Also, it may be used to determine the average number of bacterial cells per unit volume. This is possible because it is convenient to order test tubes containing the cell suspension on the basis of concentration with the help of an optical instrument without knowing the exact number of the bacterial cells. Takahasi and Wakimoto (1968) have suggested these applications.

(iii) The technique may also be used to determine the average height of trees because it is easy to rank the heights of several nearby trees by visual inspection. This application has also been mentioned by Takahasi and Wakimoto (1968).

(iv) Stokes and Sager (1988) have suggested that the method of RSS could also be used to investigate a difficult-to-measure characteristic in human populations. They have, for example, referred to the Consumer Expenditure Survey. The results of this survey are used for the construction of the Consumer Price Index. But this survey requires detailed record keeping by the participating households as well as the services of professional interviewers. In this situation a pre-measurement ranking could be performed on the basis of a cheaper screening interview, and the technique of RSS may prove to be a timely innovation.

(v) With the availability of computerized Geographic Information Systems (GIS), ranking prospective sample locations across a landscape may be done rapidly prior to expensive field visits, thus allowing RSS to be applied to large-scale surveys to obtain a more precise estimate at reduced cost.
If prospective locations are selected at random from across a region and allocated to a set, then each location can be referenced to data layers in a GIS and, based on a derived ranking index, the members of the set can be ranked relative to one another. This merger of GIS and RSS has been recommended by Johnson and Myers (1993) and by Myers, Johnson and Patil (1994).

(vi) Following a catastrophic event such as flooding or fire, those in charge of management and planning of natural or cultural resources need rapid assessments of the spatial extent and magnitude of damage. The authors cited above in item (v) recommend the combination of RSS and Geographic Information Systems (GIS), which can result in rapid mobilization of available information to design a very efficient field sampling strategy.

4. Summary

Compared to simple random sampling, the ranked set sampling method has been proven theoretically and shown empirically to yield more precise estimators of the population mean. This is especially desirable when sample sizes are generally small, as with environmental data that are expensive or destructive to obtain. A browse through the references cited throughout this publication will also reveal that many other population features can be estimated with higher precision using ranked set sampling.

So long as ranking is not cost-prohibitive, as when exploiting available expert judgment or using a readily available ranking covariate, ranked set sampling can serve to achieve observational economy. Even if measurable effort is required to obtain values of a covariate, it may still be worthwhile if the resulting rankings are reasonably accurate. This is particularly so because there is considerable robustness in the ranked set sampling procedure.

A very attractive feature of ranked set sampling is that, unlike other double sampling procedures, it can use subjective expert opinion as the source of auxiliary information.
Such a feature also appeals to the philosophy of total quality management because it exploits the expertise of not only a professional statistician, but also of field personnel who usually know the most about the population being sampled.

References

COBBY, J. M., RIDOUT, M. S., BASSETT, P. J., AND LARGE, R. V. (1985). An investigation into the use of ranked set sampling on grass and grass-clover swards. Grass and Forage Science, 40, 257-263.

DAVID, H. A. AND LEVINE, D. N. (1972). Ranked set sampling in the presence of judgment error. Biometrics, 28, 553-555.

DELL, T. R. (1969). The theory and some applications of ranked set sampling. Ph.D. Thesis, Department of Statistics, University of Georgia, Athens, Georgia.

DELL, T. R. AND CLUTTER, J. L. (1972). Ranked set sampling theory with order statistics background. Biometrics, 28, 545-553.

EVANS, M. J. (1967). Application of ranked set sampling to regeneration surveys in areas direct-seeded to longleaf pine. Master's Thesis, School of Forestry and Wildlife Management, Louisiana State University, Baton Rouge.

HALLS, L. K. AND DELL, T. R. (1966). Trial of ranked set sampling for forage yields. Forest Science, 12, 22-26.

JEWISS, O. R. (1981). Shoot development and number. In Sward Measurement Handbook, J. Hodgson, et al., eds. Hurley: The British Grassland Society. pp. 93-114.

JOHNSON, G. D. AND MYERS, W. L. (1993). Potential of ranked-set sampling for disaster assessment. Presented at IUFRO S4.02 Conference on "Inventory and Management Techniques in the Context of Catastrophic Events," June 1993.

KAUR, A., PATIL, G. P., AND TAILLIE, C. (1994). Unequal allocation model for ranked set sampling with skew distributions. Technical Report 94-0930, Center for Statistical Ecology and Environmental Statistics, Department of Statistics, Pennsylvania State University, University Park, PA.

KAUR, A., PATIL, G. P., SINHA, A. K., AND TAILLIE, C. (1995). Ranked set sampling: An annotated bibliography. Environmental and Ecological Statistics, 2(1) (to appear).

MARTIN, W. L., SHARIK, T. L., ODERWALD, R. G., AND SMITH, D. W. (1980). Evaluation of ranked set sampling for estimating shrub phytomass in Appalachian oak forests. Publication Number FWS-4-80, School of Forestry and Wildlife Resources, Virginia Polytechnic Institute and State University, Blacksburg, Virginia.

McINTYRE, G. A. (1952). A method for unbiased selective sampling, using ranked sets. Australian Journal of Agricultural Research, 3, 385-390.

MYERS, W., JOHNSON, G. D., AND PATIL, G. P. (1994). Rapid mobilization of spatial/temporal information in the context of natural catastrophes. In 1994 Proceedings of the Section on Statistical Graphics. American Statistical Association, Alexandria, VA. pp. 25-31.

MUTTLAK, H. A. AND MCDONALD, L. L. (1992). Ranked set sampling and the line-intercept method: a more efficient procedure. Biom. J., 3, 329-346.

PATIL, G. P., SINHA, A. K., AND TAILLIE, C. (1994). Ranked set sampling. In Handbook of Statistics, Volume 12: Environmental Statistics. G. P. Patil and C. R. Rao, eds. North Holland/Elsevier Science Publishers.

STOKES, S. L. (1977). Ranked set sampling with concomitant variables. Communications in Statistics - Theory and Methods, A6, 1207-1211.

STOKES, S. L. (1980a). Estimation of variance using judgment ordered ranked set samples. Biometrics, 36, 35-42.

STOKES, S. L. (1980b). Inferences on the correlation coefficient in bivariate normal populations from ranked set samples. Journal of the American Statistical Association, 75, 989-995.

STOKES, S. L. AND SAGER, T. W. (1988). Characterization of a ranked set sample with application to estimating distribution functions. Journal of the American Statistical Association, 83, 374-381.

TAKAHASI, K. AND WAKIMOTO, K. (1968). On unbiased estimates of the population mean based on the sample stratified by means of ordering. Annals of the Institute of Statistical Mathematics, 20, 1-31.

YANAGAWA, T. AND CHEN, S. H. (1980). The MG procedure in ranked set sampling. J. Statist. Plann. Inference, 4, 33-34.