United States
Environmental Protection
Agency
Policy, Planning,
And Evaluation
(2163)
EPA-230-R-95-006
August 1995
EPA Observational
Economy Series
Volume 2: Ranked Set Sampling
-------
Contents
Foreword
Acknowledgments
1. Introduction
2. What is Ranked Set Sampling?
2.1. Description
2.1.1. Ranking Criteria
2.1.2. Robustness of the Procedure
2.2. Variations of the Basic Protocol
2.2.1. Unequal Allocation of Sample Units
2.2.2. Combining with Line Intercept Sampling
3. Applications
3.1. Forage Yields
3.2. Seedling Counts
3.3. Shrub Phytomass in Forest Stands
3.4. Herbage Mass
3.5. PCB Contamination Levels
3.6. Improved Compositing of Samples
3.7. Additional Applications
4. Summary
References
-------
Foreword
Ranked set sampling is a novel method for achieving observational economy
when performing environmental monitoring and assessment. Compared to
simple random sampling, ranked set sampling yields a sample of observations
that are more representative of the underlying population. Therefore, either
greater confidence is gained for a fixed number of observations or, for a desired
level of confidence, fewer observations (and thus lower cost) are needed.
The increased sampling efficiency is achieved by exploiting auxiliary infor-
mation involving acquired field samples, a characteristic of double sampling
procedures. With ranked set sampling, however, the auxiliary information
does not have to be a quantitative concomitant variable. In fact, it can be
purely judgmental; and thus, in the spirit of total quality management, it
stimulates and utilizes a productive cross-disciplinary dialogue among those
responsible for sampling and assessment. Additionally, the ranked set sam-
pling procedure is robust in the sense that it cannot perform worse than the
usual simple random sampling.
This volume in the EPA Observational Economy Series introduces the
concept and method of ranked set sampling for its timely inclusion in the
toolbox of sampling procedures that aim to achieve observational economy,
particularly when analytical costs dominate the monitoring scenario.
-------
Acknowledgments
The EPA Observational Economy Series is a result of the research conducted
under a cooperative agreement between the U.S. Environmental Protection
Agency and the Pennsylvania State University Center for Statistical Ecology
and Environmental Statistics, Professor G.P. Patil, Director.
The EPA Grant CR-821531010, entitled "Research and Outreach on Observational
Economy, Environmental Sampling and Statistical Decision Making in
Statistical Ecology and Environmental Statistics", consists of ten separate
projects in progress at the Penn State Center: 1) Composite Sampling
and Designs; 2) Ranked Set Sampling and Designs; 3) Environmental Site
Characterization and Evaluation; 4) Encounter Sampling; 5) Spatio-temporal
Data Analysis; 6) Biodiversity Analysis and Monitoring; 7) Adaptive Sampling
Designs; 8) Statistics in Environmental Policy and Regulation for Compliance
and Enforcement; 9) Statistical Ecology and Ecological Risk Assessment;
and 10) Environmental Statistics Knowledge Transfer, Outreach and
Training.
The series is published by the Statistical Analysis and Computing Branch
of the Environmental Statistics and Information Division in the EPA Office of
Policy, Planning and Evaluation. This volume in the series is largely based
on the work of G. D. Johnson, A. Kaur, G. P. Patil, A. K. Sinha and C.
Taillie at the Penn State Center in cooperation with John Fritzvold, Herbert
Lacayo, Robert O'Brien, Brenda Odom, Barry Nussbaum, and John Warren
as project officers at the U.S. EPA. Questions or comments on this publication
should be directed to Dr. N. Phillip Ross, Director, Environmental Statistics
and Information Division (Mail Code 2163), United States Environmental
Protection Agency, 401 M Street SW, Washington, DC 20460; Ph. (202)
260-2680.
-------
1. Introduction
Environmental monitoring and assessment mostly requires observational data,
as opposed to data obtained from controlled experiments. This is true
whether we are assessing the extent of soil contamination at a one-acre site
or some measure of forest resources over the Pacific Northwest region of the
United States. Obtaining such data requires identification of sample units to
represent the population of concern, followed by selection of particular units
to quantify the characteristic(s) of interest. Sample units are basically the
smallest units of measurement such as plots, soil cores, individuals, etc., while
typical characteristics of interest include biomass, chemical concentrations or
"head counts".
Typically the most expensive part of this process is laboratory analysis,
while identification of potential sample units is a comparatively simple mat-
ter. We can therefore achieve great observational economy if we are able
to identify a large number of sample units to represent the population of
interest, yet only have to quantify a carefully selected subsample.
This potential for observational economy was recognized for estimating
mean pasture and forage yields in the early 1950s, when McIntyre (1952)
proposed a method that was later named ranked set sampling (RSS) by Halls and
Dell (1966) and is currently under active investigation in various quarters.
As a simple introduction to the concept of RSS, consider the following
example:
Let's say we wish to estimate the mean height of students at a university
from a random sample of three students. Furthermore, in order to acknowl-
edge the inherent uncertainty, we need to present this estimate as a confidence
interval within which we expect the true population mean to lie with desired
confidence.
Now the simplest way to obtain our sample is to randomly select three
students from the university's population, then measure their heights. While
the arithmetic average of the three heights is an unbiased point estimate of
the population mean, the associated confidence interval can be very wide,
reflecting the high degree of uncertainty in estimating a large population
mean with only three measurements. This is because we have no control over
-------
which individuals of the population enter the sample. For example, we may
happen to grab two very short people and one very tall, or we may grab three
very tall people. The only way to overcome such a problem with a simple
random sample (SRS) is to increase the sample size.
On the other hand, we may obtain a ranked set sample. To do this,
we may randomly invite three students to breakfast and visually rank them
with respect to height. We then select the student we believe is shortest and
actually measure his or her height. Repeating this process at lunch, we
then select the middle ranked person, and likewise select the tallest ranked
person at dinner. The resulting measurements of student heights constitute
a ranked set sample. As with the SRS measurements, the arithmetic average
of the RSS measurements provides an unbiased point estimate of the popu-
lation mean; however, the associated confidence interval can potentially be
much smaller than that obtained with SRS measurements, thus reflecting de-
creased uncertainty. This encouraging feature results because measurements
obtained through RSS are likely to be more regularly spaced than those ob-
tained through SRS and therefore are more representative of the population.
Amazingly, the RSS procedure induces stratification of the whole population
at the sample level; in effect, we are randomly sampling from the subpopu-
lations of predominantly short, medium and tall students without having to
construct the subpopulation strata. Each subpopulation has its own distri-
bution, as visualized in Figure 1, where we see how the parent population
gets effectively partitioned into subpopulations.
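The breakfast, lunch and dinner scheme can be checked with a small simulation. The sketch below is illustrative only: it assumes heights are normally distributed with a mean of 170 cm and a standard deviation of 10 cm (both hypothetical values), and that visual ranking is perfect. The function names are not from the source.

```python
import random
import statistics

def srs_mean(rng, m=3):
    """Mean of m randomly chosen heights (simple random sample)."""
    return statistics.fmean(rng.gauss(170.0, 10.0) for _ in range(m))

def rss_mean(rng, m=3):
    """Mean of one ranked set sample: from the i-th group of m students,
    measure only the student ranked i-th (ranking assumed perfect)."""
    measured = []
    for rank in range(m):
        group = sorted(rng.gauss(170.0, 10.0) for _ in range(m))
        measured.append(group[rank])
    return statistics.fmean(measured)

def compare(reps=20000, seed=1):
    """Return (SRS average, SRS variance, RSS average, RSS variance)
    of the estimated mean over many repeated samples."""
    rng = random.Random(seed)
    srs = [srs_mean(rng) for _ in range(reps)]
    rss = [rss_mean(rng) for _ in range(reps)]
    return (statistics.fmean(srs), statistics.pvariance(srs),
            statistics.fmean(rss), statistics.pvariance(rss))

if __name__ == "__main__":
    srs_avg, srs_var, rss_avg, rss_var = compare()
    print(f"SRS: mean {srs_avg:.2f}, variance of mean {srs_var:.2f}")
    print(f"RSS: mean {rss_avg:.2f}, variance of mean {rss_var:.2f}")
```

Both averages should sit near the assumed population mean, while the variance of the RSS mean should come out clearly smaller, even though each method measures only three students.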
McIntyre's proposal does not appear to have been applied for over a
decade, after which forest and range researchers continued to discover the
effectiveness of RSS (see Halls and Dell, 1966; Evans, 1967; Martin, et al.
1980; Jewiss, 1981; and Cobby, et al. 1985). Theoretical investigations by
Dell and Clutter (1972) showed that, regardless of ranking errors, the RSS
estimator of a population mean is unbiased and at least as precise as the
SRS estimator with the same number of quantifications. David and Levine
(1972) investigated the case where ranking is done by a numerical covariate.
Furthermore, RSS also provides more precise estimators of the variance
(Stokes, 1980a), the cumulative distribution function (Stokes and Sager,
1988) and at times the Pearson correlation coefficient (Stokes, 1980b). For an
annotated bibliography with an historic perspective, see Kaur, Patil, Sinha,
and Taillie (1995).
-------
Figure 1: Frequency distributions of heights of different ranks superimposed
on the population frequency distribution of all heights (a schematic diagram).
-------
2. What is Ranked Set Sampling?
2.1. Description
As mentioned in Chapter 1, to create ranked sets we must partition the
selected first phase sample into sets of equal size. In order to plan an RSS
design, we must therefore choose a set size which is typically small, around 3
or 4, to minimize ranking error. Let's arbitrarily call this set size m, where
m is the number of sample units allocated into each set. Now proceed as
follows.
• step 1: Randomly select m² sample units from the population.
• step 2: Allocate the m² selected units as randomly as possible into m
sets, each of size m.
• step 3: Without yet knowing any values for the variable of interest,
rank the units within each set based on a perception of relative values
for this variable. This may be based on personal judgment or done
with measurements of a covariate that is correlated with the variable
of interest.
• step 4: Choose a sample for actual analysis by including the smallest
ranked unit in the first set, then the second smallest ranked unit in the
second set, continuing in this fashion until the largest ranked unit is
selected in the last set.
• step 5: Repeat steps 1 through 4 for r cycles until the desired sample
size, n = mr, is obtained for analysis.
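The five steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `ranked_set_sample` and `rank_key` are hypothetical names, `rank_key` stands in for judgment ranking or a cheap covariate, and units are drawn without replacement within a cycle but may recur across cycles.

```python
import random

def ranked_set_sample(population, m, r, rank_key, rng=None):
    """Return the n = m*r units chosen for quantification.

    population: sequence of sample units
    m: set size; r: number of cycles
    rank_key: hypothetical ranking function standing in for judgment or
              a cheap covariate (it must not need the costly measurement)
    """
    rng = rng or random.Random()
    selected = []
    for _ in range(r):                                        # step 5: r cycles
        units = rng.sample(population, m * m)                 # step 1: m^2 units
        sets = [units[i * m:(i + 1) * m] for i in range(m)]   # step 2: m sets of m
        for i, one_set in enumerate(sets):
            ranked = sorted(one_set, key=rank_key)            # step 3: rank in set
            selected.append(ranked[i])                        # step 4: i-th rank
    return selected

# Example with made-up unit labels 0..999, ranked by their own value
units = ranked_set_sample(list(range(1000)), m=3, r=4,
                          rank_key=lambda u: u, rng=random.Random(0))
print(len(units))
```

With m = 3 and r = 4 the function inspects 36 units but returns only 12 for analysis, matching the design described in the text.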
As an illustration, consider the set size m = 3 with r = 4 cycles. This sit-
uation is illustrated in Figure 2 where each row denotes a judgment-ordered
sample within a cycle, and the units selected for quantitative analysis are
-------
Figure 2: A ranked set sample design with set size m = 3 and the number
of sampling cycles r = 4 (rows are cycles; columns are ranks 1, 2, 3). Although
36 sample units have been selected from the population, only the 12 circled
units are actually included in the final sample for quantitative analysis.
circled. Note that 36 units have been randomly selected in 4 cycles; how-
ever, only 12 units are actually analyzed to obtain the ranked set sample of
measurements.
Obtaining a sample in this manner results in maintaining the unbiasedness
of simple random sampling; however, by incorporating "outside" information
about the sample units, we are able to contribute a structure to the sample
that increases its representativeness of the true underlying population.
If we quantified the same number of sample units, mr = 12, by a simple
random sample, we would have no control over which units enter the sample.
Perhaps all the 12 units would come from the lower end of the range, or per-
haps most would be clustered at the low end while one or two units would
come from the middle or upper range. With simple random sampling, the
only way to increase the prospect of covering the full range of possible val-
ues is to increase the sample size. With ranked set sampling, however, we
increase the representativeness with a fixed number of sample units, thus
saving considerably on quantification costs.
With the ranked set sample thus obtained, it can be shown that unbiased
estimators of several important population parameters can be calculated,
including the mean and, in case of more than one sampling cycle, the variance.
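As a hedged illustration of what such estimators look like for a balanced design, the sketch below computes the RSS estimate of the mean together with an estimate of the variance of that mean. The formulas are the standard balanced-RSS results and are an assumption here, since the source does not state them: the mean is the average of all mr measurements, and its variance is estimated by the sum of the per-rank sample variances divided by m²r. The function name and the data are hypothetical.

```python
import statistics

def rss_mean_and_variance(data):
    """data[i] holds the r measurements taken at rank i+1 (one per cycle).

    Returns (estimated mean, estimated variance of that mean).
    The variance estimate needs r >= 2 cycles.
    """
    m = len(data)
    r = len(data[0])
    if any(len(row) != r for row in data):
        raise ValueError("balanced design expected: r values per rank")
    if r < 2:
        raise ValueError("at least two cycles are needed for the variance")
    rank_means = [statistics.fmean(row) for row in data]
    rank_vars = [statistics.variance(row) for row in data]   # divisor r - 1
    mean = statistics.fmean(rank_means)
    var_of_mean = sum(rank_vars) / (m * m * r)
    return mean, var_of_mean

# Made-up measurements: m = 3 ranks, r = 3 cycles
example = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
print(rss_mean_and_variance(example))
```

Note that this sketch estimates the variance of the mean estimator, which is what a confidence interval needs; it is not an estimator of the population variance itself.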
-------
2.1.1. Ranking Criteria
A real key to success lies with step 3 in the above procedure: ranking. This
may be based on visual inspection or other expert opinion about the sample
units. For example, a field-seasoned range scientist or woods person may
readily be able to rank three or four quadrats of grass with respect to overall
volume or mass. Meanwhile a hazardous waste site inspector may be able to
reliably rank areas of soil with respect to concentrations of a toxic contami-
nant, based on features like surface staining, discoloration or the appearance
of stressed vegetation.
On the other hand, if another characteristic is available that is highly
correlated with the characteristic of interest but costs much less to obtain,
then we may rank by the values of such a "covariate". For example, the
reflectance intensity of near-infrared electromagnetic radiation, as recorded in
a remotely sensed digital image, is directly proportional to vegetation
concentration on the ground. Another example might be to measure total organic
halides (TOX) in soil in order to rank soil sampling units with respect to the
concentration of volatile organic solvents. As an indicator variable, TOX is
much less expensive to measure than specific organic compounds.
2.1.2. Robustness of the Procedure
Several questions may now arise, such as:
1. What if the distribution of sample measurements is skewed, symmetric,
or essentially unknown?
2. What if the sample units are not randomly allocated into sets?
3. How does error in ranking affect results?
First of all, while independent (random) and identically distributed sample
measurements obtained through perfect ranking may lead to optimum
performance of ranked set sampling, no matter how much these desirable
characteristics are deviated from, the sampling efficiency will never be worse
than with simple random sampling using the same number of quantifications.
In fact, when efficiency is expressed as the relative precision (RP) such that
$$\mathrm{RP} = \frac{\text{variance of sample average with simple random sampling}}{\text{variance of sample average with ranked set sampling}},$$
it can be shown that the bounds of this relative precision are
$$1 \le \mathrm{RP} \le \frac{m + 1}{2},$$
-------
where m is the set size. Since RP cannot be less than one, the RSS protocol
cannot be worse than the simple random sampling protocol.
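The upper bound can be made concrete with an exact calculation for one convenient case. The sketch below assumes a Uniform(0, 1) population, a single cycle, and perfect ranking, none of which is specified in the source; it uses the known variance of the i-th order statistic of m uniform values, i(m - i + 1) / ((m + 1)²(m + 2)). The function name `rp_uniform` is hypothetical.

```python
from fractions import Fraction

def rp_uniform(m):
    """Exact relative precision of RSS versus SRS for a Uniform(0, 1)
    population, one cycle, perfect ranking, set size m."""
    # Variance of the i-th order statistic of m uniforms, Beta(i, m - i + 1):
    # i (m - i + 1) / ((m + 1)^2 (m + 2)).
    rank_vars = [Fraction(i * (m - i + 1), (m + 1) ** 2 * (m + 2))
                 for i in range(1, m + 1)]
    var_rss_mean = sum(rank_vars) / m ** 2     # mean of m independent measurements
    var_srs_mean = Fraction(1, 12) / m         # sigma^2 / m with sigma^2 = 1/12
    return var_srs_mean / var_rss_mean

for m in (2, 3, 4):
    print(m, rp_uniform(m), Fraction(m + 1, 2))
```

Under these assumptions the computed RP equals (m + 1)/2 for each set size, so the uniform case attains the upper bound; other distributions and imperfect ranking give values between one and that bound.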
2.2. Variations of the Basic Protocol
2.2.1. Unequal Allocation of Sample Units
The performance of RSS decreases as the underlying distribution of the
characteristic of interest becomes increasingly skewed. McIntyre (1952) originally
suggested that this problem may be overcome by allocating sample units into
ranks in proportion to the standard deviation of each rank. This is the same
approach as used in stratified random sampling, known as Neyman alloca-
tion, and would indeed be optimal if we had reliable prior estimates of the
rank standard deviations. An example of unequal allocation is displayed in
Figure 3. Here we have the same set size, m = 3, and sample size, n = 12, as
in the earlier example of equal allocation; however, the number of sampling
cycles is adjusted so as to yield the desired unequal allocation of samples.
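The proportional rule described above can be sketched as a small allocation routine. This is an illustration under assumptions, not a procedure taken from the source: the function name is hypothetical, the rank standard deviations are taken as known, and the largest-remainder rounding used to reach whole numbers of units is one of several reasonable choices. The routine does not force every rank to receive at least one unit.

```python
def allocate_by_sd(rank_sds, n):
    """Split n quantifications across ranks in proportion to rank_sds
    (Neyman-type allocation), using largest-remainder rounding."""
    total = sum(rank_sds)
    exact = [n * sd / total for sd in rank_sds]
    counts = [int(x) for x in exact]            # round down first
    leftover = n - sum(counts)
    # hand the remaining units to the ranks with the largest fractional parts
    order = sorted(range(len(exact)), key=lambda i: exact[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts

# Hypothetical rank standard deviations in the ratio 1:2:4, n = 12
print(allocate_by_sd([1, 2, 4], 12))   # prints [2, 3, 7]
```

The counts always sum to n, so the total quantification budget is unchanged; only its distribution across ranks shifts toward the more variable ranks.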
Unequal allocation can actually increase the performance of RSS above
and beyond that achievable with standard equal allocation; however, if not
properly applied, the performance of RSS can be worse than the performance
of simple random sampling. Indeed, the bounds on relative precision with
unequal allocation become
$$0 \le \mathrm{RP} \le m,$$
indicating that, with appropriate unequal allocation, the relative precision
may even increase to a level of m, and not just (m + 1)/2 as in the case with
equal allocation.
Although an optimal RSS design would allocate samples into ranks in
direct proportion to the rank standard deviations, we rarely know the stan-
dard deviations beforehand. We do know, however, that the distributions of
many environmental and ecological variables are skewed towards the right,
meaning that while most values are clustered around a median, a few much
larger values are usually present. This skewness can actually be exploited to
increase the precision beyond that obtained with ranked set sampling under
equal allocation because standard deviations usually tend to increase with
increasing rank values for right-skewed distributions. With some idea of the
degree of skewness, Kaur, Patil, and Taillie (1994) have devised a rule of
thumb for allocating sample units into ranks that performs close to the
optimal Neyman allocation. Therefore, distributions of many environmental
and ecological variables may actually lend themselves well to being estimated
-------
Figure 3: Ranked set sampling with unequal allocation: circles indicate sample
units chosen for quantification.
-------
with very high precision relative to that obtainable through simple random
sampling.
2.2.2. Combining with Line Intercept Sampling
A common field sampling method for ecological assessments is to include
sample units that one encounters along a line (transect) that is randomly se-
lected within a two-dimensional area of interest. Units are typically members
of a plant or animal species.
Often the sample units identified are too numerous to select
every one for quantification, especially if measurements are destructive, such
as with cutting vegetation for weighing. If the initially identified sample
units are treated as a larger first phase sample, n' = m²r, then the RSS
protocol can be applied to select a smaller subsample, n = mr, for actual
quantification. For example, consider a single sampling cycle when the set
size m equals three for estimating the biomass of shrubs in a given area. A
line transect for such a situation may be visualized as in Figure 4.
Such an RSS-based line intercept sample has been found to produce more
precise, and still unbiased, estimators of the population mean, size, total and
cover, compared to the SRS-based line intercept sample (see Muttlak and
McDonald, 1992).
-------
Figure 4: Aerial view of a line transect intercepting shrubs (grouped as Set 1,
Set 2 and Set 3). For set size m = 3, nine shrubs are partitioned into 3 sets
of 3. Using apparent shrub size for ranking with respect to biomass, the shrubs
taken for analyses include the smallest ranked in the first set, the second
smallest ranked in the second set and the largest ranked in the third set.
-------
3. Applications
3.1. Forage Yields
Although McIntyre's original proposal of estimating pasture yields by "unbiased
selective sampling using ranked sets" was made in 1952, no applications
were apparently reported until fourteen years later, when Halls and Dell (1966)
applied McIntyre's method, coining it "ranked set sampling", for estimating
the weights of browse and herbage in a pine-hardwood forest of east Texas.
These authors discovered RSS to be considerably more efficient than SRS.
Sets of three closely grouped quadrats were formed on a 300-acre tract.
At select locations, metal frames of 3.1 square feet were placed at three ran-
domly selected points within a circle of 13 foot radius as seen in Figure 5.
Quadrats were then ranked as lowest, intermediate and highest according
to the perceived weight of browse and, separately, of herbage. Then, after
clipping and drying, the separate weights of browse and herbage were deter-
mined for each quadrat. This was repeated for 126 sets for estimating browse
and 124 sets for estimating herbage.
In order to simulate the SRS estimator for the mean weight of browse, one
quadrat was randomly selected from each set without considering its rank.
Since actual values were known for each quadrat, the RSS estimator was
obtained by randomly choosing the ranks to be quantified for each set,
resulting in 37 lowest ranks, 46 intermediate ranks and 43 highest ranks. Halls
and Dell also examined McIntyre's suggestion that unequal allocation might
further improve the efficiency of estimation. Since the standard deviations
for the order statistics were 7, 13 and 27.7 for the low, intermediate and high
yield, respectively (ratio of 1:2:4), they selected 14 quadrats in the low group,
40 in the intermediate group and 72 in the high group. Note that perfect
ranking was obtained for both RSS protocols because the actual values were
already known for each quadrat.
Results of these three sampling protocols are reported in Table 1. As ex-
pected under perfect ranking, precision due to RSS with approximately equal
allocation increased, more than doubling for browse estimates. Furthermore,
when allocation was proportional to the order statistic standard deviation,
-------
Figure 5: Within each circle, quadrats are randomly placed, followed by
ranking and analysis of one appropriate quadrat (not to scale).
Table 1: Summary statistics for browse and herbage estimates

                                   Browse              Herbage
                                       Variance            Variance
Sampling protocol              Mean    of mean     Mean    of mean
Unranked: random               14.9     4.55        7.3     1.00
Perfect ranking:
  near equal allocation        13.2     2.18        7.0     0.73
Perfect ranking:
  proportional allocation      12.9     1.91        7.2     0.58

(Source: Halls and Dell, 1966)
the precision increased still further, thus supporting McIntyre's contention.
Another very valuable aspect of this study was that two observers inde-
pendently ranked the quadrats, one a professional range man and the other
a woods worker. There was practically no difference in the ranking results
between the two observers.
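The gain in precision reported in Table 1 can be expressed as relative precision, the ratio of the variance of the mean under random sampling to that under ranked set sampling. The short sketch below only re-does that arithmetic on the published figures; the dictionary layout and function name are hypothetical.

```python
# Variance of the mean from Table 1 (Halls and Dell, 1966)
var_of_mean = {
    "browse":  {"random": 4.55, "equal": 2.18, "proportional": 1.91},
    "herbage": {"random": 1.00, "equal": 0.73, "proportional": 0.58},
}

def relative_precision(var_srs, var_rss):
    """RP = variance of SRS mean / variance of RSS mean."""
    return var_srs / var_rss

for crop, v in var_of_mean.items():
    for scheme in ("equal", "proportional"):
        rp = relative_precision(v["random"], v[scheme])
        print(f"{crop:8s} {scheme:13s} RP = {rp:.2f}")
```

With these figures RP comes to about 2.09 and 2.38 for browse and about 1.37 and 1.72 for herbage, for near equal and proportional allocation respectively. The browse values slightly exceed (m + 1)/2 = 2; the data come from a single empirical comparison in which the rank counts were only approximately equal, so the bound for exactly equal allocation need not hold exactly.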
3.2. Seedling Counts
The effectiveness of RSS for improving the sampling precision of seedling
counts was studied by Evans (1967) in an area in central Louisiana that was
seeded to longleaf pine (Pinus palustris Mill.). After dividing the target area
into 24 blocks, each block was then subdivided into 25 one-milacre plots. All
-------
600 plots were initially measured to characterize the population, which is
summarized in Table 2a. The population mean and standard deviation were
calculated to be 1.675 and 1.36, respectively.
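The population mean and standard deviation quoted above can be reproduced from the frequency distribution in Table 2a. The sketch below does only that arithmetic; the variable names are hypothetical, and the population form of the standard deviation (divisor N) is an assumption, although the sample form gives the same value to two decimals.

```python
import math

# Table 2a: seedling count -> number of milacre plots
frequency = {0: 110, 1: 201, 2: 157, 3: 75, 4: 33,
             5: 17, 6: 3, 7: 3, 8: 0, 9: 1}

n_plots = sum(frequency.values())
mean = sum(count * f for count, f in frequency.items()) / n_plots
sum_sq = sum(f * (count - mean) ** 2 for count, f in frequency.items())
sd = math.sqrt(sum_sq / n_plots)        # population form (divisor N)

print(n_plots, round(mean, 3), round(sd, 2))   # prints 600 1.675 1.36
```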
For the RSS protocol, three plots were randomly selected from each of the
24 blocks (sets), resulting in 72 identified plots. The three plots within each
set were then visually ranked. One cycle consisted of selecting the lowest
ranked plot from the first set, the second lowest from the second set and
the highest ranked plot from the third set. Repeating the cycle eight times
yielded 24 selected plots in the ranked set sample (m = 3, r = 8). This
whole procedure was repeated twice so that three separate field trials were
performed, as summarized in Table 2b. Evans also computed the means and
standard deviations of each rank using all 72 identified plots for each of the
three field trials. These results are reproduced in Table 2c for comparison to
the RSS results in Table 2b.
In order to compare RSS to SRS, Evans resampled the 24 blocks (sets) 80
times to obtain two empirical distributions of the means, one based on the
RSS estimator and the other based on the SRS estimator, which is actually
a stratified random sample estimator. The results of this "bootstrapping"
exercise are reproduced in Table 2d where we see a significant reduction in
the variance due to RSS.
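The variances and the F ratio in Table 2d follow directly from the sums of squares and degrees of freedom. A minimal check, using only the published figures (the function name is hypothetical):

```python
def variance_from_ss(sum_squares, df):
    """Variance of the resampled means: sum of squares over degrees of freedom."""
    return sum_squares / df

var_random = variance_from_ss(7.572, 79)   # random sampling row of Table 2d
var_ranked = variance_from_ss(1.939, 79)   # ranked-set row of Table 2d
f_ratio = var_random / var_ranked

print(f"{var_random:.4f} {var_ranked:.4f} F = {f_ratio:.2f}")
```

The result agrees with the tabulated variances (.0958 and .0245) and with F = 3.91, that is, roughly a fourfold reduction in the variance of the estimated mean.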
3.3. Shrub Phytomass in Forest Stands
The performance of RSS for estimating shrub phytomass (all vegetation be-
tween one and five meters high) was evaluated by Martin et al. (1980) at
a forested site in Virginia. They investigated four major vegetation types
along a decreasing moisture gradient: mixed hardwood, mixed oak, mixed
oak and pine, and mixed pine. For each vegetation type, a 20m by 20m area
was subjectively located which was further divided into 16 plots of equal size
(5m by 5m).
For the RSS procedure, four sets of four plots were randomly selected
from the 16 plots in each vegetation type. The plots in each set were then
ranked by visual inspection, followed by quantifying the smallest ranked plot
from the first set, the second smallest ranked plot from the second set and
so on in the usual manner for RSS. This was repeated for each of the four
vegetation types. For the SRS procedure, four out of the 16 plots in each
vegetation type were randomly selected without replacement, followed by
quantification of each selected plot. Again, this is actually a stratified random
sample since each vegetation type is a separate stratum. Shrub phytomass
was also determined for all 64 plots to obtain a grand mean and variance
for comparison. Their results are reproduced in Table 3 where we see a
-------
Table 2: Data from Longleaf Pine Seedling Counts

(a) The frequency distribution of seedling counts in the 600 milacre plots.

Seedling Count     0    1    2    3    4    5    6    7    8    9
Frequency        110  201  157   75   33   17    3    3    0    1

(b) Means and variances of three ranked set sample trials (mr = 24).

Trial    Mean    Variance
1        1.49    0.043
2        1.62    0.056
3        1.71    0.024

(c) Means and standard deviations of all seedlings for all ranks of three field
trials of ranked set sampling.

                Means                       Standard Deviations
Trial      L       M       H      Mean        L       M       H
1        0.750   1.500   2.625   1.625      0.532   0.750   1.173
2        0.917   1.625   2.833   1.792      0.881   1.013   1.880
3        0.750   1.708   3.125   1.861      0.520   0.955   0.927

(d) Test of significance of ranked-set versus random sampling

Method of     Number of       Degrees of              Sum of
sampling      applications    freedom       Mean      squares    Variance    F
Random        80              79            1.709     7.572      .0958       3.91**
Ranked-set    80              79            1.647     1.939      .0245

**Significant at the .01 level of probability

(Source: Evans, 1967)
-------
Table 3: RSS and SRS results for 16 measured plots across all vegetation
types.

Sampling        Mean Phytomass    Variance of the    Coefficient of Variation
Method          (kg/ha)           Mean (x 10^6)      of the Mean (%)
All 64 Plots    2536              0.15                15
SRS             1976              4.34               108
RSS             2356              2.73                70

(Source: Martin et al., 1980)
substantial increase in precision of the mean estimator associated with RSS.
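The coefficient of variation column in Table 3 is the standard error of the mean expressed as a percentage of the mean. The sketch below recomputes it from the rounded table entries; it assumes the variance column is in units of 10^6 (kg/ha)², and the names used are hypothetical. Because the inputs are rounded, the recomputed values match the published ones only approximately (about 105% rather than 108% for SRS).

```python
import math

# Table 3 (Martin et al., 1980): mean in kg/ha,
# variance of the mean in units of 10^6 (kg/ha)^2 (assumed)
rows = {"All 64 plots": (2536, 0.15), "SRS": (1976, 4.34), "RSS": (2356, 2.73)}

def cv_percent(mean, var_millions):
    """Coefficient of variation of the mean, in percent."""
    return 100.0 * math.sqrt(var_millions * 1e6) / mean

for name, (mean, var) in rows.items():
    print(f"{name:12s} CV = {cv_percent(mean, var):.0f}%")
print(f"RP (SRS/RSS) = {rows['SRS'][1] / rows['RSS'][1]:.2f}")
```

The variance ratio gives a relative precision of about 1.59 for RSS over SRS with the same 16 quantified plots.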
3.4. Herbage Mass
In order to compare RSS to SRS for estimating herbage mass in pure grass
swards and both herbage mass and clover content in mixed grass-clover
swards, Cobby et al. (1985) conducted four experiments at Hurley (UK).
Besides comparison of RSS to SRS, their objective was to assess the effects
of the following factors on RSS: (i) imperfect ranking within sets, (ii) greater
variation between sets than within sets, and (iii) asymmetric distribution of
the quantified values.
The first two experiments were conducted by randomly selecting 15 lo-
cations, followed by randomly selecting three quadrats at each location and
having several observers rank the quadrats within each set. For the last
two experiments, 45 quadrats were drawn at random from the entire target
area. This allowed an assessment of the effects of both spatial variation and
ranking errors within sets.
Their results are reproduced in Table 4, where RP of both the worst
and best observers are compared to the RP under perfect ranking, and the
between and within set variances are presented for assessing spatial variation.
These authors determined the main adverse factor to be within set clustering,
and they recommend spacing quadrats within sets as far apart as possible
when local spatial autocorrelation exists. With this in mind, they recommend
RSS over SRS for sampling grass and grass-clover swards.
-------
Table 4: Relative precisions (RP) ± s.e. of the worst and the best observers
and under perfect ranking, and the between and the within set variances,
while estimating herbage mass (grass and mixture) and clover contents.

                       Relative Precisions (RP)                 Variances
Experiment      Worst          Best           Perfect         Between   Within
1 (Grass)       1.11 ± 0.09    1.23 ± 0.14    1.31 ± 0.17       0.24      0.31
2 (Mixture)     1.11 ± 0.09    1.27 ± 0.10    1.40 ± 0.16       0.07      0.09
3 (Grass)       - - -          - - -          1.66 ± 0.17       0.00      1.58
4 (Mixture)     1.36 ± 0.14    1.51 ± 0.15    1.55 ± 0.16       0.11      0.66
2 (Clover)      1.15 ± 0.12    1.34 ± 0.15    1.44 ± 0.16      16.3      34.4
4 (Clover)      1.36 ± 0.19    1.62 ± 0.18    1.72 ± 0.20      16.2      71.6

(Source: Cobby et al., 1985)
3.5. PCB Contamination Levels
Lest the reader be led to believe that RSS is only for vegetation studies, let us
consider estimating PCB concentrations in soil. Patil, Sinha, and Taillie
(1994) used measurements of this contaminant, collected at a Pennsylvania
site along the gas pipeline of the Texas Eastern Company. Table 5 provides
the summary statistics of PCB values in two sampling grids (A and C) within
this site. Since the distribution of these data was highly skewed, they
examined the effects of unequal as well as equal allocation of samples. More
specifically, they examined the following schemes:
(a) Equal allocation of samples using all possible choices of sample units
of each set size,
(b) Equal allocation of samples for a particular sample, and
(c) Unequal allocation of samples.
Considering set sizes 2, 3, and 4, the relative savings (RS) were computed
as
$$\mathrm{RS} = \frac{\mathrm{var(SRS)} - \mathrm{var(RSS)}}{\mathrm{var(SRS)}} \times 100\%,$$
taking into consideration all possible choices of sample
units for each set size for both the grids under the equal allocation scheme.
The results are given in Table 6, where it is evident that RS increases with
set size but that the magnitude of RS is higher for grid C than for grid A.
Note that the data for grid C is much less skewed than grid A, as seen in
Table 5.
-------
Table 5: Descriptive statistics of PCB values in grids A and C.

                                   Grid
Characteristics                A          C
Number of Observations       184         68
Mean                         200.9      600.2
Standard Deviation           902.9     1583
Coefficient of Variation       4.49       2.64
Coefficient of skewness        9.27       4.48
Coefficient of kurtosis       99.69      20.88
Table 6: Relative savings (RS) considering all possible combinations of each
set size under perfect ranking situation with equal allocation.

                   Grid
Set Size       A        C
(m)            RS       RS
2               4        9
3               7       16
4              10       22
-------
Table 7: Values of the sample mean, X(m)u, relative precision (RP), and relative
savings (RS) under the perfect ranking protocol with unequal allocation of
samples.

                          Grid A                                  Grid C
Set Size  Proportion of samples                    Proportion of samples
m         (Exact No.)          X(m)u   RP     RS   (Exact No.)          X(m)u   RP     RS
2         1:10 (8,84)          205.9   1.724  42   1:10 (3,31)          535.2   2.041  51
2         1:15 (6,86)          203.1   1.818  45   1:15 (2,32)          520.4   2.174  54
3         1:4:20 (2,10,48)     203.6   2.174  54   1:1.7:1.5 (5,8,8)    560.1   1.471  32
3         1:4:25 (2,8,50)      201.1   2.326  57   1:2:7 (2,4,15)       615.2   1.923  48
4         1:3:5:16 (2,5,9,28)  247.1   1.695  41   1:2:3:4 (2,3,5,6)    576.6   2.083  52
4         1:3:9:27 (2,2,10,30) 226.1   1.316  24   1:1:3:5 (2,2,4,8)    802.4   1.449  31
For comparing the performance of the RSS protocol relative to that of SRS
with unequal allocation of samples, these authors considered two different
proportional allocations for each set-size in order to decide the sample size for
each rank. This has been done to show the impact of proportional allocation
on the magnitude of relative savings accrued due to RSS over SRS. The
results are given in Table 7, where the magnitudes of relative savings are
seen to be quite substantial for each set size for both the grids.
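Relative savings and relative precision are two views of the same variance ratio. Assuming RS is the percentage reduction in variance, RS = (var(SRS) - var(RSS)) / var(SRS), it follows that RS = 1 - 1/RP. The sketch below checks that relationship against a selection of (RP, RS) pairs read from Table 7; the function name is hypothetical, and the definition of RS is an inference from the table rather than a formula quoted from the source.

```python
def relative_savings(rp):
    """RS in percent, from RS = (var(SRS) - var(RSS)) / var(SRS) = 1 - 1/RP."""
    return 100.0 * (1.0 - 1.0 / rp)

# (RP, reported RS) pairs read from Table 7
table7 = [(1.724, 42), (1.818, 45), (2.174, 54), (2.326, 57),
          (1.695, 41), (1.316, 24), (2.041, 51), (1.471, 32),
          (1.923, 48), (2.083, 52), (1.449, 31)]

for rp, reported in table7:
    print(rp, round(relative_savings(rp)), reported)
```

Every recomputed value rounds to the reported RS, which supports reading RS as the percentage of the SRS variance (and hence of the quantification effort for a fixed precision) that is saved by RSS.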
While unequal allocation of samples into ranks can substantially increase
RS when the underlying population follows a skewed distribution, this pro-
cedure does require some prior knowledge of the underlying distribution. For
this purpose one may either take advantage of prior surveys of similar nature
or conduct a pilot study. This same problem also arises in determining the
optimum sample size under Neyman's allocation scheme for stratified ran-
dom sampling. Recent work by Kaur, Patil and Taillie (1994) has addressed
the issue of optimum allocation when some knowledge about the underlying
distribution is available, and they have devised a rule-of-thumb for allocating
sample units based on skewness.
-------
Figure 6: Formation of three composites (composite 1, composite 2, composite 3)
from three ranked set samples, each with a set size of three. Homogeneity is
maximized (variability is minimized) within each composite by forming
composites from equally ranked samples.
3.6. Improved Compositing of Samples
Consider a situation that calls for composite sampling, as discussed in Volume
1 of this series. If our primary objective is classification of the individual
samples used to form the composites and/or identification of those individual
samples that constitute an upper percentile with respect to the characteristic
of interest, then we will need to retest certain individual samples. Since the
purpose of composite sampling is to minimize the number of analytical tests
required, we obviously want to minimize the extent of retesting individual
samples. Maximizing the homogeneity within the composites will minimize
the necessary number of retests. Therefore, composites should be formed
from individual sample units that are as alike as possible.
As pointed out by Patil, Sinha, and Taillie (1994), one can increase the
chances of obtaining maximum homogeneity within composites by forming
composites from samples identified to be in the same rank as conceptualized
in Figure 6. The RSS protocol can thus be combined with composite sam-
pling to achieve even greater observational economy than composite sampling
alone.
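
The gain in homogeneity from rank-based compositing can be illustrated with a short simulation. The sketch below assumes a standard normal population, perfect ranking, and three ranked set samples with a set size of three, as in Figure 6. It compares composites formed from equally ranked samples with composites formed within each ranked set sample. The comparison scheme and the function names are assumptions made for illustration only.

```python
import random
import statistics


def rss_cycle(rng, m):
    """One ranked set sample: m sets of m units, keeping rank r+1 from set r."""
    return [sorted(rng.gauss(0.0, 1.0) for _ in range(m))[r] for r in range(m)]


def mean_within_variance(groups):
    """Average sample variance within the given composites."""
    return statistics.fmean([statistics.variance(g) for g in groups])


def compare(m=3, cycles=3, reps=2000, seed=7):
    """Return average within-composite variance for (by-rank, by-cycle) grouping."""
    rng = random.Random(seed)
    by_rank, by_cycle = [], []
    for _ in range(reps):
        samples = [rss_cycle(rng, m) for _ in range(cycles)]  # samples[c][r]
        # Composite r pools the rank-(r+1) unit from every cycle (Figure 6).
        rank_groups = [[samples[c][r] for c in range(cycles)] for r in range(m)]
        by_rank.append(mean_within_variance(rank_groups))
        # Alternative: each cycle's units form one composite.
        by_cycle.append(mean_within_variance(samples))
    return statistics.fmean(by_rank), statistics.fmean(by_cycle)


if __name__ == "__main__":
    rank_var, cycle_var = compare()
    print("by rank:", round(rank_var, 3), "by cycle:", round(cycle_var, 3))
```

Lower within-composite variance means a composite measurement is more informative about each of its constituents, so fewer individual samples should need retesting.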
-------
3.7. Additional Applications
Yanagawa and Chen (1980) mention that the RSS technique is regularly
employed at the Pastoral Research Laboratory, CSIRO at Armidale, N.S.W.,
Australia. A plate with four holes is randomly thrown on a field and the
pasture in each hole is ranked by eye, followed by selection of one hole for
quantification of pasture. These authors also mention that RSS has been used
to estimate rice crops in Okinawa, Japan. They attribute this information
to H. Mizuno at the "Mathematical Method in Sampling" symposium held
at Chiba University (Japan), 1974.
In addition to the reported applications of RSS, several other applications
have been recommended:
(i) Evans (1967) pointed out that the method would prove time-saving in
the determination of cell wall thickness of different species of wood.
In the same area of application, Dell (1969) has mentioned that the
RSS procedure should be efficient for estimating averages for various
properties of cells in a cross section of wood chips.
(ii) The RSS method of sampling could be useful in determining the aver-
age length of various kinds of bacterial cells. Also, it may be used to
determine the average number of bacterial cells per unit volume. This
is possible because it is convenient to order test tubes containing the
cell suspension on the basis of concentration with the help of an optical
instrument without knowing the exact number of the bacterial cells.
Takahasi and Wakimoto (1968) have suggested these applications.
(iii) The technique may also be used to determine the average height of trees
because it is easy to rank the heights of several nearby trees by a visual
inspection. This application has also been mentioned by Takahasi and
Wakimoto (1968).
(iv) Stokes and Sager (1988) have suggested that the method of RSS could
also be used to investigate a difficult-to-measure characteristic in hu-
man populations. They have, for example, referred to the Consumer
Expenditure Survey. The results of this survey are used for the con-
struction of the Consumer Price Index. But this survey requires de-
tailed record keeping by the participating households as well as the
services of professional interviewers. In this situation a pre-measurement
ranking could be performed on the basis of a cheaper screening inter-
view and the technique of RSS may prove to be a timely innovation.
(v) With the availability of computerized Geographic Information Systems
(GIS), ranking prospective sample locations across a landscape may be
done rapidly prior to expensive field visits, thus allowing RSS to be
applied to large scale surveys to obtain a more precise estimate at re-
duced cost. If prospective locations are selected at random from across
a region and allocated to a set, then each location can be referenced
to data layers in a GIS and, based on a derived ranking index, the
members of the set can be ranked relative to one another. This merger
of GIS and RSS has been recommended by Johnson and Myers (1993)
and by Myers, Johnson and Patil (1994).
(vi) Following a catastrophic event such as flooding or fire, those in charge
of management and planning of natural or cultural resources need rapid
assessments of the spatial extent and magnitude of damage. The authors
cited above in item (v) suggest that combining RSS with Geographic
Information Systems (GIS) can rapidly mobilize the available information
to design a very efficient field sampling strategy.
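
Several of the applications listed above rank the units in a set with a cheap auxiliary quantity, such as an optical reading, a screening interview, or a GIS-derived index, and not with the costly measurement itself. The sketch below illustrates that idea with synthetic data only. It assumes the ranking index and the field value are standard normal with correlation rho, uses no real GIS software or data layers, and all names are hypothetical.

```python
import math
import random
import statistics


def draw_location(rng, rho):
    """Return (ranking_index, field_value) for one randomly chosen location.

    Both are standard normal with correlation rho: a synthetic stand-in for
    a cheap derived index and the expensive field measurement.
    """
    index = rng.gauss(0.0, 1.0)
    noise = rng.gauss(0.0, 1.0)
    value = rho * index + math.sqrt(1.0 - rho * rho) * noise
    return index, value


def rss_mean_by_index(rng, set_size, cycles, rho):
    """Balanced RSS estimate of the mean, ranking each set on the index only."""
    measured = []
    for _ in range(cycles):
        for rank in range(set_size):
            candidates = [draw_location(rng, rho) for _ in range(set_size)]
            candidates.sort(key=lambda pair: pair[0])  # rank on the cheap index
            measured.append(candidates[rank][1])       # quantify one unit only
    return statistics.fmean(measured)


def relative_precision(set_size=4, cycles=5, rho=0.8, reps=4000, seed=3):
    """Estimate var(SRS mean) / var(RSS mean) for equal numbers of measurements."""
    rng = random.Random(seed)
    n = set_size * cycles
    rss = [rss_mean_by_index(rng, set_size, cycles, rho) for _ in range(reps)]
    srs = [statistics.fmean([rng.gauss(0.0, 1.0) for _ in range(n)])
           for _ in range(reps)]
    return statistics.variance(srs) / statistics.variance(rss)


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.8, 1.0):
        print("rho =", rho, "RP ~", round(relative_precision(rho=rho), 2))
```

With rho = 0 the index carries no information and the estimated RP should hover around one. As the index tracks the field value more closely, the gain over SRS grows.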
-------
4. Summary
Compared to simple random sampling, the ranked set sampling method has
been proven theoretically and shown empirically to yield more precise esti-
mators of the population mean. This is especially desirable when sample
sizes are generally small as with environmental data which are expensive
or destructive to obtain. A browse through the references cited throughout
this publication will also reveal that many other population features can be
estimated with higher precision using ranked set sampling.
So long as ranking is not cost-prohibitive, as when exploiting available
expert judgment or a readily available ranking covariate, ranked set
sampling can serve to achieve observational economy. Even if measurable
effort is required to obtain values of a covariate, it may still be
worthwhile if the resulting rankings are reasonably accurate. This is
particularly so because there is considerable robustness in the ranked set
sampling procedure.
A very attractive feature of ranked set sampling is that, unlike other
double sampling procedures, it can use subjective expert opinion as the source
of auxiliary information. Such a feature also appeals to the philosophy of
total quality management because it exploits the expertise of not only a
professional statistician, but also of field personnel who usually know the
most about the population being sampled.
-------
References
COBBY, J. M., RIDOUT, M. S., BASSETT, P. J., AND LARGE, R. V. (1985). An
investigation into the use of ranked set sampling on grass and grass-clover
swards. Grass and Forage Science, 40, 257-263.
DAVID, H. A. AND LEVINE, D. N. (1972). Ranked set sampling in the presence
of judgment error. Biometrics, 28, 553-555.
DELL, T. R. (1969). The theory and some applications of ranked set sampling.
Ph.D. Thesis, Department of Statistics, University of Georgia, Athens,
Georgia.
DELL, T. R. AND CLUTTER, J. L. (1972). Ranked set sampling theory with order
statistics background. Biometrics, 28, 545-553.
EVANS, M. J. (1967). Application of ranked set sampling to regeneration surveys
in areas direct-seeded to longleaf pine. Masters Thesis, School of Forestry
and Wildlife Management, Louisiana State University, Baton Rouge.
HALLS, L. K. AND DELL, T. R. (1966). Trial of ranked set sampling for forage
yields. Forest Science, 12, 22-26.
JEWISS, O. R. (1981). Shoot development and number. In Sward Measurement
Handbook, J. Hodgson, et al., eds. Hurley: The British Grassland Society.
pp. 93-114.
JOHNSON, G. D., AND MYERS, W. L. (1993). Potential of ranked-set sampling for
disaster assessment. Presented at IUFRO S4.02 Conference on "Inventory
and Management Techniques in the Context of Catastrophic Events," June
1993.
KAUR, A., PATIL, G. P., AND TAILLIE, C. (1994). Unequal allocation model
for ranked set sampling with skew distributions. Technical Report 94-0930,
Center for Statistical Ecology and Environmental Statistics, Department of
Statistics, Pennsylvania State University, University Park, PA.
KAUR, A., PATIL, G. P., SINHA, A. K., AND TAILLIE, C. (1995). Ranked set
sampling: An annotated bibliography. Environmental and Ecological Statis-
tics, 2(1) (to appear).
MARTIN, W. L., SHARIK, T. L., ODERWALD, R. G., AND SMITH, D. W. (1980).
Evaluation of ranked set sampling for estimating shrub phytomass in Ap-
palachian oak forests. Publication Number FWS-4-80, School of Forestry
and Wildlife Resources, Virginia Polytechnic Institute and State University,
Blacksburg, Virginia.
McINTYRE, G. A. (1952). A method for unbiased selective sampling, using ranked
sets. Australian Journal of Agricultural Research, 3, 385-390.
MYERS, W., JOHNSON, G. D., AND PATIL, G. P. (1994). Rapid mobilization
of spatial/temporal information in the context of natural catastrophes. In
1994 Proceedings of the Section on Statistical Graphics. American Statistical
Association. Alexandria, VA. pp. 25-31.
MUTTLAK, H.A. AND MCDONALD, L.L. (1992). Ranked set sampling and the
line-intercept method: a more efficient procedure. Biom. J., 3, 329-346.
PATIL, G. P., SINHA, A. K., AND TAILLIE, C. (1994). Ranked set sampling.
In Handbook of Statistics, Volume 12: Environmental Statistics. G. P. Patil
and C. R. Rao, eds. North Holland/Elsevier Science Publishers.
STOKES, S. L. (1977). Ranked set sampling with concomitant variables.
Communications in Statistics-Theory and Methods, A6, 1207-1211.
STOKES, S. L. (1980a). Estimation of variance using judgment ordered ranked
set samples. Biometrics, 36, 35-42.
STOKES, S. L. (1980b). Inferences on the correlation coefficient in bivariate normal
populations from ranked set samples. Journal of the American Statistical
Association, 75, 989-995.
STOKES, S. L. AND SAGER, T. W. (1988). Characterization of a ranked set
sample with application to estimating distribution functions. Journal of the
American Statistical Association, 83, 374-381.
TAKAHASI, K. AND WAKIMOTO, K. (1968). On unbiased estimates of the popu-
lation mean based on the sample stratified by means of ordering. Annals of
the Institute of Statistical Mathematics, 20, 1-31.
YANAGAWA, T. AND CHEN, S. H. (1980). The MG procedure in ranked set sam-
pling. J. Statist. Plann. Inference, 4, 33-34.