United States
Environmental Protection
Agency
Office of Environmental
Information
Washington, DC 20460
EPA/240/R-02/005
December 2002
Guidance on Choosing a
Sampling Design for Environmental
Data Collection
for Use in Developing a Quality
Assurance Project Plan
EPA QA/G-5S
-------
FOREWORD
This document, Guidance on Choosing a Sampling Design for Environmental Data
Collection (EPA QA/G-5S), will provide assistance in developing an effective QA Project Plan as
described in Guidance for QA Project Plans (EPA QA/G-5) (EPA 1998b). QA Project Plans are
one component of EPA's Quality System. This guidance is different from most guidance in that it is not
meant to be read in a linear or continuous fashion, but to be used as a resource or reference document.
This guidance is a "tool-box" of statistical designs that can be examined for possible use as the QA
Project Plan is being developed.
EPA works every day to produce quality information products. The information used in these
products is based on Agency processes to produce quality data, such as the quality system described
in this document. Therefore, implementation of the activities described in this document is consistent
with EPA's Information Quality Guidelines and promotes the dissemination of quality technical,
scientific, and policy information and decisions.
This document provides guidance to EPA program managers, analysts, and planning teams on
statistically based sampling schemes. It does not impose legally binding requirements and the methods
described may not apply to a particular situation based on the circumstances. The Agency retains the
discretion to adopt approaches on a case-by-case basis that may differ from the techniques described
in this guidance. EPA may periodically revise this guidance without public notice. It is the intent of the
Quality Staff to revise the document to include new techniques, corrections, and suggestions for
alternative techniques. Future versions of this document will include in-depth examples that illustrate the
strengths of each statistical design.
This document is one of the U.S. Environmental Protection Agency Quality System Series
documents. These documents describe the EPA policies and procedures for planning, implementing,
and assessing the effectiveness of a Quality System. Questions regarding this document or other
Quality System Series documents should be directed to the Quality Staff:
U.S. Environmental Protection Agency
Quality Staff (2811R)
1200 Pennsylvania Ave., NW
Washington, D.C. 20460
Phone: (202)564-6830
Fax: (202)565-2441
E-mail: quality@epa.gov
Copies of EPA Quality System Series documents may be obtained from the Quality Staff or by
downloading them from epa.gov/quality/index.html.
Final
EPA QA/G-5 S i December 2002
-------
TABLE OF CONTENTS
Page
1. INTRODUCTION 1
1.1 WHY IS SELECTING AN APPROPRIATE SAMPLING DESIGN
IMPORTANT? 1
1.2 WHAT TYPES OF QUESTIONS WILL THIS GUIDANCE ADDRESS? 2
1.3 WHO CAN BENEFIT FROM THIS DOCUMENT? 3
1.4 HOW DOES THIS DOCUMENT FIT INTO THE EPA QUALITY SYSTEM? 4
1.5 WHAT SOFTWARE SUPPLEMENTS THIS GUIDANCE? 5
1.6 WHAT ARE THE LIMITATIONS OR CAVEATS TO THIS DOCUMENT? 5
1.7 HOW IS THIS DOCUMENT ORGANIZED? 6
2. OVERVIEW OF SAMPLING DESIGNS 7
2.1 OVERVIEW 7
2.2 SAMPLING DESIGN CONCEPTS AND TERMS 8
2.3 PROBABILISTIC AND JUDGMENTAL SAMPLING DESIGNS 10
2.4 TYPES OF SAMPLING DESIGNS 11
2.4.1 Judgmental Sampling 12
2.4.2 Simple Random Sampling 12
2.4.3 Stratified Sampling 13
2.4.4 Systematic and Grid Sampling 13
2.4.5 Ranked Set Sampling 14
2.4.6 Adaptive Cluster Sampling 15
2.4.7 Composite Sampling 15
3. THE SAMPLING DESIGN PROCESS 17
3.1 OVERVIEW 17
3.2 INPUTS TO THE SAMPLING DESIGN PROCESS 17
3.3 STEPS IN THE SAMPLING DESIGN PROCESS 22
3.4 SELECTING A SAMPLING DESIGN 24
4. JUDGMENTAL SAMPLING 27
4.1 OVERVIEW 27
4.2 APPLICATION 27
4.3 BENEFITS 28
4.4 LIMITATIONS 28
4.5 IMPLEMENTATION 28
4.6 RELATIONSHIP TO OTHER SAMPLING DESIGNS 29
4.7 EXAMPLES OF SUCCESSFUL USE 30
4.8 EXAMPLES OF UNSUCCESSFUL USE 31
5. SIMPLE RANDOM SAMPLING 33
5.1 OVERVIEW 33
5.2 APPLICATION 33
5.3 BENEFITS 34
5.4 LIMITATIONS 34
5.5 IMPLEMENTATION 35
5.6 RELATIONSHIP TO OTHER SAMPLING DESIGNS 39
5.7 EXAMPLES 40
APPENDIX 5. SAMPLE SIZE TABLES 44
6. STRATIFIED SAMPLING 51
6.1 OVERVIEW 51
6.2 APPLICATION 51
6.3 BENEFITS 52
6.4 LIMITATIONS 53
6.5 IMPLEMENTATION 53
6.6 RELATIONSHIP TO OTHER SAMPLING DESIGNS 54
6.7 EXAMPLE 55
APPENDIX 6-A. FORMULAE FOR ESTIMATING SAMPLE SIZE 57
APPENDIX 6-B. DALENIUS-HODGES PROCEDURE 59
APPENDIX 6-C. CALCULATING THE MEAN AND STANDARD ERROR 60
7. SYSTEMATIC/GRID SAMPLING 63
7.1 OVERVIEW 63
7.2 APPLICATION 64
7.3 BENEFITS 67
7.4 LIMITATIONS 68
7.5 IMPLEMENTATION 69
7.6 RELATIONSHIP TO OTHER SAMPLING DESIGNS 71
7.7 EXAMPLES 72
8. RANKED SET SAMPLING 77
8.1 OVERVIEW 77
8.2 APPLICATION 80
8.3 BENEFITS 80
8.4 LIMITATIONS 82
8.5 IMPLEMENTATION 83
8.6 EXAMPLES 84
APPENDIX 8-A. USING RANKED SET SAMPLING 87
9. ADAPTIVE CLUSTER SAMPLING 103
9.1 OVERVIEW 103
9.2 APPLICATION 103
9.3 BENEFITS 104
9.4 LIMITATIONS 104
9.5 IMPLEMENTATION 106
9.6 RELATIONSHIP TO OTHER SAMPLING DESIGNS 108
9.7 EXAMPLE 109
APPENDIX 9-A. ESTIMATORS OF MEAN AND VARIANCE 111
10. COMPOSITE SAMPLING 119
10.1 OVERVIEW 119
10.2 COMPOSITE SAMPLING FOR ESTIMATING A MEAN 122
10.2.1 Overview 122
10.2.2 Application 124
10.2.3 Benefits 125
10.2.4 Limitations 125
10.2.5 Implementation 127
10.2.6 Relationship to Other Sampling Designs 130
10.2.7 Examples 133
10.3 COMPOSITE SAMPLING FOR ESTIMATING A POPULATION
PROPORTION 133
10.3.1 Overview 133
10.3.2 Application 134
10.3.3 Benefits 135
10.3.4 Limitations 135
10.3.5 Implementation 135
10.3.6 Relationship to Other Sampling Designs 137
10.3.7 Examples 137
APPENDIX 10-A. COST AND VARIANCE MODELS 138
APPENDIX 10-B. ESTIMATING A POPULATION PROPORTION 141
11. COMPOSITE SAMPLING FOR IDENTIFYING A TRAIT AND EXTREME
SAMPLING UNITS 143
11.1 COMPOSITE SAMPLING FOR IDENTIFYING A TRAIT 143
11.1.1 Overview 143
11.1.2 Application 144
11.1.3 Benefits 145
11.1.4 Limitations 145
11.1.5 Implementation 145
11.1.6 Relationship to Other Sampling Designs 149
11.1.7 Examples 151
11.2 COMPOSITE SAMPLING AND RETESTING FOR IDENTIFYING
EXTREME SAMPLING UNITS 151
11.2.1 Overview 151
11.2.2 Application 153
11.2.3 Benefits 153
11.2.4 Limitations 153
11.2.5 Implementation 153
11.2.6 Relationship to Other Sampling Designs 154
GLOSSARY OF TERMS 155
BIBLIOGRAPHY 161
-------
FIGURES
Page
1-1. Site Map for Old Lagoon 2
1-2. Life-cycle of Data in the EPA Quality System 4
2-1. Inferences Drawn from Judgmental versus Probabilistic Sampling Designs 11
2-2. Simple Random Sampling 12
2-3. Stratified Sampling 13
2-4. Systematic/Grid Sampling 14
2-5. Adaptive Cluster Sampling 15
2-6. Composite Sampling 15
3-1. The DQO Process 18
3-2. Factors in Selecting a Sampling Design 20
3-3. The Sampling Design Process 22
5-1. Example of a Map Showing Random Sampling Locations 37
5-2. A One-Dimensional Sample of Cross-Sections from a Waste Pile 38
5-3. A Two-Dimensional Sample of Cores from a Waste Pile 39
5-4. Illustration of a Quasi-Random Sample 39
6-1. Stratification of Area to Be Sampled 55
7-1. Systematic Designs for Sampling in Space 63
7-2. Choosing a Systematic Sample of n = 4 Units from a Finite Population of N = 15 Units 64
7-3. Locating a Square Grid Systematic Sample 70
7-4. Map of an Area to Be Sampled Using a Triangular Sampling Grid 72
8-1. Using Ranked Set Sampling to Select Three Locations 79
9-1. Population Grid with Initial and Follow-up Samples and Areas of Interest 107
9-2. Follow-up Sampling Pattern 108
9-3. Comparison of Initial Sample with Final Sample 108
9-4. Illustration of an Ideal Situation for Adaptive Cluster Sampling 109
10-1. Equal Volume, Equal Allocation Compositing 119
11-1. Illustration of Retesting Schemes for Classifying Units When 3 of 32
Units are Positive 152
-------
TABLES
Page
1-1. Potential Benefits for Users 3
2-1. Probability-based versus Judgmental Sampling Designs 10
2-2. Sampling Designs Presented in this Guidance 12
3-1. Choosing the Appropriate Sampling Design for Your Problem 24
5-1. Sample Size Needed for One-Sample t-test 44
5-2. Sample Size Needed for a One-Sample Test for a Population Proportion, P,
at a 5% Significance Level 45
5-3. Sample Size Needed for a One-Sample Test for a Population Proportion, P,
at a 10% Significance Level 46
5-4. Sample Size Needed for a Two-Sample t-Test 47
5-5. Sample Size Needed for a Two-Sample Test for Proportions
at a 5% Significance Level 48
5-6. Sample Size Needed for a Two-Sample Test for Proportions
at a 10% Significance Level 49
6-1. Summary Statistics for Simple and Stratified Random Samples 56
6-2. Number of Samples Needed to Produce Various Levels of Precision for the Mean 56
8-1. Comparing the Number of Samples for Laboratory Analysis Using Ranked Set Sampling 81
8-2. The Approximate Cost Ratio for Estimating the Mean 88
8-3. Approximate Cost Ratio for Estimating the Mean when On-site Measurements
Are Used to Rank Field Locations 89
8-4. Relative Precision (RP) of Balanced Ranked Set Sampling to Simple Random
Sampling for Lognormal Distributions 92
8-5. Optimal Values of t for Determining the Number of Samples for Laboratory
Analysis Needed for an Unbalanced Ranked Set Sampling Design 97
8-6. Correction Factors for Obtaining Relative Precision Values 98
9-1. Comparison of Designs 105
10-1. When to Use Composite Sampling — Four Fundamental Cases 121
10-2. Criteria for Judging Benefits of Composite Sampling 123
10-3. Optimal k Values for Estimating a Population Mean 129
10-4. Optimal k for Estimating p and Approximate Confidence Intervals for p 137
10-5. Components of Cost and Variance for Random Samples - With and Without
Composite Sampling 139
11-1. Identification of Composite Sampling and Retesting Schemes for Classifying
Units Having a Rare Trait 147
11-2. Optimal Number of Samples per Composite for Exhaustive Retesting 148
11-3. Optimal Number of Samples per Composite for Sequential Retesting 149
11-4. Optimal Values of k for Binary Split Retesting 150
-------
BOXES
Page
1-1. Questions that this Document Will Help to Address 2
10-1. Example of Benefits of Composite Sampling 126
10-2. Directions for Selecting Equal Allocation, Equal Volume Composite Samples for
Estimating a Mean 128
10-3. Example: Compositing for Estimating a Site Mean 131
10-4. Directions for Composite Sampling for Estimating the Proportion of a Population
with a Given Trait 136
11-1. Generic Algorithm for use with the Various Schemes 146
-------
CHAPTER 1
INTRODUCTION
This document provides guidance on how to create sampling designs to collect environmental
measurement data. This guidance describes several relevant basic and innovative sampling designs, and
describes the process for deciding which design is right for a particular application.
1.1 WHY IS SELECTING AN APPROPRIATE SAMPLING DESIGN IMPORTANT?
The sampling design is a fundamental part of data collection for scientifically based decision
making. A well-developed sampling design plays a critical role in ensuring that data are sufficient to
draw the conclusions needed.1 A sound, science-based decision is based on accurate information. To
generate accurate information about the level of contamination in the environment, you should
consider the following:
• the appropriateness and accuracy of the sample collection and handling method,
• the effect of measurement error,
• the quality and appropriateness of the laboratory analysis, and
• the representativeness of the data with respect to the objective of the study.
Of these issues, representativeness is addressed through the sampling design.
Representativeness may be considered as the measure of the degree to which data accurately and
precisely represent a characteristic of a population, parameter variations at a sampling point, a process
condition, or an environmental condition [American National Standards Institute/American Society for
Quality Control (ANSI/ASQC) 1994]. Developing a sampling design is a crucial step in collecting
appropriate and defensible data that accurately represent the problem being investigated.
For illustration, consider Figure 1-1, a site map for a dry lagoon formerly fed by a pipe.
Assuming that good field and laboratory practices are exercised and adequate quality control is
implemented, the analytical results of soil samples drawn from randomly located sites A, B, and C may
be representative if the objective is to address whether the pipe has released a particular contaminant.
However, these data are not representative if the objective is to estimate the average concentration
level of the entire old lagoon. For that estimation, random sampling locations should be generated from
the entire site of the old lagoon (for example, perhaps including samples at D, E, and F). If a sampling
design results in the collection of nonrepresentative data, even the highest quality laboratory analysis
cannot compensate for the lack of representative data. The selection of the appropriate sampling design is
necessary in order to have data that are representative of the problem being investigated.2
1 Note: Sampling design is not the only important component. The methods used in sample handling and extraction
are equally important to the quality of the data. The United States Environmental Protection Agency produces
extensive guidance on sampling methods and field sampling techniques for different regulations, regions, and
programs that are not addressed in this document. In addition, measurement error affects the ability to draw
conclusions from the data. Guidance on Data Quality Indicators (QA/G-5i) (EPA, 2001) contains information on
this issue.
[Figure 1-1. Site Map for Old Lagoon. Labels recoverable from the figure: Location of Pipe; Old Lagoon (now dry).]
This document provides technical guidance on specific sampling designs that can be used to improve
the quality of environmental data collected. Each chapter, grounded in statistical theory, explains the
benefits and drawbacks of a design and describes relevant examples of environmental measurement
applications. To choose a sampling design that adequately addresses the estimation or decision at hand,
it is important to understand what relevant factors should be considered and how these factors affect
the choice of an appropriate sampling design.
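The lagoon example can be sketched in code. The following is a minimal illustration, not part of the original guidance, of generating random sampling locations from the entire site rather than only near the pipe. It assumes, for simplicity, that the lagoon can be approximated by a rectangle; the dimensions, the function name random_locations, and the seed are all hypothetical.

```python
import random

def random_locations(x_min, x_max, y_min, y_max, n, seed=None):
    """Draw n locations uniformly at random from a rectangular study area."""
    rng = random.Random(seed)
    return [(rng.uniform(x_min, x_max), rng.uniform(y_min, y_max))
            for _ in range(n)]

# Hypothetical site: the old lagoon is approximated by a 100 m x 60 m rectangle.
lagoon_locations = random_locations(0.0, 100.0, 0.0, 60.0, n=6, seed=1)

for number, (x, y) in enumerate(lagoon_locations, start=1):
    print(f"location {number}: x = {x:.1f} m, y = {y:.1f} m")
```

For an irregularly shaped site, one common approach is to draw points from a bounding rectangle and discard those that fall outside the site boundary.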
1.2 WHAT TYPES OF QUESTIONS WILL THIS GUIDANCE ADDRESS?
Often it is difficult in practice to know how to answer questions regarding how many samples to
take and where they should be taken. The development of a sampling design will answer these
questions after considering relevant issues, such as variability. Box 1-1 outlines the questions that are
relevant to choosing a sampling design.
Box 1-1. Questions that this Document Will Help to Address
What aspects of the problem should be considered for creating a sampling design?
What are the types of designs that are commonly used in environmental sampling?
What are some innovative designs that may improve the quality of the data?
Which designs suit my problem?
How should I design my sampling to provide the right information for my problem given a
limited budget for sampling?
How do I determine how much data are needed to make a good decision?
2 Note: The problem of what constitutes "representativeness" is complex and further discussion may be
found in Guidance on Data Quality Indicators Peer Review Draft (QA/G-5i) (EPA, 2001).
1.3 WHO CAN BENEFIT FROM THIS DOCUMENT?
This document will be useful to anyone planning data collection from any type of environmental
media including soil, sediment, dust, surface water, groundwater, air, vegetation, and sampling in indoor
environments. The document contains information that will help those who are not extremely familiar
with statistical concepts as well as those who are more comfortable with statistics. To this end, varying
degrees of detail are provided on the various sampling designs, which should be used according to
ability. The potential benefits for different types of users are shown in Table 1-1. This document is
meant to apply to all environmental media; examples in this document provide information on
innovative designs not discussed in earlier EPA documents.
The guidance document is designed for users who are not necessarily well versed in statistics.
The document is written in plain language, and is designed to minimize technical jargon and provide
useful explanations for those who might not already be familiar with the concepts described. In some
chapters, more advanced material and more advanced references have been provided for statisticians;
these have been marked as "more advanced."
Table 1-1. Potential Benefits for Users

Potential User: Environmental Scientist or Environmental Engineer who is planning the sampling, or
Project Manager planning the investigation and reviewing the sampling plan
Benefit to the User:
• An understanding of various sampling designs and the conditions under which these designs are
  appropriate
• An understanding of how sampling design affects the quality of the data and the ability to draw
  conclusions from the data
• An understanding of the appropriate uses of professional judgment
• The information needed to choose designs that may increase the quality of the data at the same
  cost as compared to typical sampling approaches (for example, Ranked Set Sampling)

Potential User: Risk Assessor or Data Analyst who will be using the data
Benefit to the User:
• An understanding of the advantages and limitations of data collected using various sampling designs
• The ability to draw scientifically based conclusions from data based on different types of designs
• The ability to match assessment tools to the sampling design used

Potential User: Statistician assisting with the development and review of the sampling plan
Benefit to the User:
• Tables, figures, and text that will help communicate important information about choosing a
  sampling design to colleagues working on the design who are not well versed in statistics
• Advanced references to support more complex design development
1.4 HOW DOES THIS DOCUMENT FIT INTO THE EPA QUALITY SYSTEM?
Analysts should use systematic planning in order to collect data that will allow them to draw
scientifically based conclusions. There are many cases in which data have been collected, but when the
decision maker examines the data to draw conclusions, he or she finds that the data do not match the
needs of the decision. Such problems can be avoided by using a systematic planning process to design
the data collection. This process accounts for users' needs before the data are collected.
When data are being used in direct support of a decision, the Agency's recommended
systematic planning tool is the Data Quality Objectives (DQO) Process described in Guidance for the
Data Quality Objectives Process (QA/G-4) (EPA, 2000b). A sampling design is
chosen in Step 7 of the DQO Process based on the parameters specified in the other steps in the DQO
Process. In this guidance, the activities of DQO Step 7 are explained in Chapter 3 (i.e., the process of
choosing a sampling design), and a full discussion of the factors that should be considered in Step 7 of
the DQO Process is given in Section 3.2.
Figure 1-2 illustrates the life-cycle of environmental data in the EPA Quality System. The
process begins with systematic planning. Developing a sampling design is the last step in systematic
planning, and is explained briefly in Step 7 of Guidance for the Data Quality Objectives Process
(QA/G-4) (EPA, 2000b). This guidance document on sampling design is intended to expand greatly on
the general details provided in that guidance. Information from the other steps in the systematic
planning process is used as input to developing the sampling design. This process is described in
detail in Chapter 3 of this guidance.
[Figure 1-2. Life-cycle of Data in the EPA Quality System. Stage labels recoverable from the figure: Planning, Assessment.]
Data Quality Indicators (DQIs) are specific calculations that measure performance as
reflected in the DQOs and performance and acceptance criteria. DQIs include precision, accuracy,
representativeness, completeness, consistency, and sensitivity, and are discussed at length in Guidance
on Data Quality Indicators (QA/G-5i) (EPA, 2001). The choice of sampling design will have an
impact on the DQIs. These indicators are addressed specifically for each project in the details of the
Quality Assurance (QA) Project Plan.
The development of a sampling design is followed by the development of a QA Project Plan.
A process for developing a QA Project Plan is described in Guidance for Quality Assurance Project
Plans (QA/G-5) (EPA, 1998b).
After the QA Project Plan is developed and approved, data are collected during the
study/experimental phase according to the plan. Quality is further assured by the use of standard
operating procedures and audits (technical assessment). Finally, verification, validation, and quality
assessment of the data complete the quality system data collection process.
1.5 WHAT SOFTWARE SUPPLEMENTS THIS GUIDANCE?
Visual Sampling Plan (VSP) is a software tool that contains some of the sampling plans
discussed in this guidance. VSP supports the implementation of the DQO Process by visually
displaying different sampling plans, linking them to the DQO Process, and determining the optimal
sampling specifications to protect against potential decision errors. This easy-to-use program is highly
visual and is intended for use by non-statisticians. VSP may be obtained from
http://dqo.pnl.gov/vsp.
1.6 WHAT ARE THE LIMITATIONS OR CAVEATS TO THIS DOCUMENT?
The scope of this document is limited to environmental measurement data. It does not explicitly
address count data, survey (questionnaire) data, human exposure data, or experimental data collection,
although some of the concepts described here are applicable to these types of studies. This guidance
does not provide a complete catalogue of potential sampling designs used by EPA. These guidelines
do not supersede regulatory requirements for specific types of sampling design, nor regional, state, or
program guidance; rather, they are intended to supplement other guidance.
In addition, there are sampling designs that might be used in environmental data collection that
are not discussed in this document. For example, double sampling, sequential sampling, quota
sampling, and multi-stage sampling are all designs that are used for environmental data collection.
Information on these designs can be found in other resources on sampling designs.
1.7 HOW IS THIS DOCUMENT ORGANIZED?
This document is designed to be used as a reference rather than be read from beginning to end.
First-time users will probably want to skim Chapter 2 and read Chapter 3 before continuing to other
chapters. Chapter 2 defines important concepts and terms, and introduces the types of sampling
designs covered in this document, along with information on what specific types of situations call for
which designs. Chapter 3 describes the process of developing a sampling design and discusses how
input from a systematic planning process affects the choice of a sampling design.
The remaining chapters contain specific information about different sampling designs or
protocols. Each chapter is formatted in a similar style to allow the reader to easily find information. A
synopsis of the benefits and limitations of the design can be found in each chapter, so that readers can
evaluate each design in light of their specific situation. Each chapter also contains at least one example
and descriptions of applications of this design, where possible. Finally, each chapter has an appendix
containing formulae and additional technical information.
Some designs are often used in conjunction with other designs; descriptions and examples of
these types of studies are included. At the end of the document, a glossary defines key terms and a list
of references contains citations for all referenced material and other materials used in developing this
document.
The level of detail provided in the chapters varies based on the complexity of the design. For
simpler designs, the chapter provides relatively complete information regarding how and when to
implement this approach. For more complex designs, a general discussion is provided, along with
references that can provide more information for the interested reader. It is assumed that a statistician
would need to be involved in the development process for the more complex designs.
-------
CHAPTER 2
OVERVIEW OF SAMPLING DESIGNS
2.1 OVERVIEW
What does a sampling design consist of?
A complete sampling design indicates the number of samples and identifies the particular
samples (for example, the geographic positions where these samples will be collected or the time points
when samples will be collected). Along with this information, a complete sampling design will also
include an explanation and justification for the number and the positions/timings of the samples. For a
soil sample, the samples may be designated by longitude and latitude, or by measurements relative to an
existing structure. For air or water measurements, the samples would be designated by longitude and
latitude as well as by time. For example, for the measurement of particulates in air, a specified length of
time would be set, such as 24 hours, in addition to the geographical location. The sampling design
would note what time the air sample collection would begin (for example, 12:00 midnight on February
10, 2001), and when it would end (for example, 12:00 midnight on February 11, 2001). The
measurement protocol would then specify when the sampler would be retrieved and how the sample
would be analyzed.
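As a rough illustration of what a single entry in a complete sampling design might record, the sketch below stores a location, a start time, a duration, and a justification for one planned air sample. It is not part of the original guidance: the class name PlannedSample, its field names, the sample identifier, and the coordinates are hypothetical, and only the 24-hour window starting at midnight on February 10, 2001 comes from the example above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class PlannedSample:
    """One planned sample: where, when, for how long, and why."""
    sample_id: str
    latitude: float
    longitude: float
    start: datetime
    duration: timedelta
    justification: str

    @property
    def end(self) -> datetime:
        # The end of the collection period follows from start and duration.
        return self.start + self.duration

air_sample = PlannedSample(
    sample_id="AIR-001",                 # hypothetical identifier
    latitude=38.895, longitude=-77.03,   # illustrative coordinates only
    start=datetime(2001, 2, 10, 0, 0),   # 12:00 midnight on February 10, 2001
    duration=timedelta(hours=24),
    justification="24-hour particulate sample at a randomly selected location",
)

print(air_sample.start, "to", air_sample.end)
```

Recording the justification next to each planned sample keeps the explanation for the number and positions of samples together with the design itself.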
What is the purpose of a sampling design?
The goals of a sampling design can vary widely. Typical objectives of a sampling design for
environmental data collection are:
• To support a decision about whether contamination levels exceed a threshold of
unacceptable risk,
• To determine whether certain characteristics of two populations differ by some amount,
• To estimate the mean characteristics of a population or the proportion of a population
that has certain characteristics of interest,
• To identify the location of "hot spots" (areas having high levels of contamination) or
plume delineation,
• To characterize the nature and extent of contamination at a site, or
• To monitor trends in environmental conditions or indicators of health.
A well-planned sampling design is intended to ensure that resulting data are adequately
representative of the target population and defensible for their intended use. Throughout the sampling
design process, the efficient use of time, money, and human resources is a critical consideration. A
good design should meet the needs of the study with a minimum expenditure of resources. If resources
are limited or there are multiple objectives, tradeoffs may need to be made in the design. More
information on how to go about doing this is contained in Chapter 3 on the sampling design process.
2.2 SAMPLING DESIGN CONCEPTS AND TERMS
Defining the population is an important step in developing a sampling plan. The target
population is the set of all units that comprise the items of interest in a scientific study, that is, the
population about which the decision maker wants to be able to draw conclusions. The sampled
population is that part of the target population that is accessible and available for sampling. For
example, the target population may be defined as surface soil in a residential yard, and the sampled
population may be areas of soil in that yard not covered by structures or vegetation. Ideally, the
sampled population and the target population are the same. If they are not, then professional judgment
is used to verify that data drawn from the sampled population are appropriate for drawing conclusions
about the target population.
A sampling unit is a member of the population that may be selected for sampling, such as
individual trees, or a specific volume of air or water. It is important for study planners to be very
specific when defining a sampling unit's characteristics with respect to space and time. A sampling unit
should detail the specific components of a particular environmental medium, for example, 10 cubic meters
(m³) of air passing through a filter located in downtown Houston on July 15, 2000. Some
environmental studies have distinct sampling units such as trees, fish, or drums of waste material.
However, such distinct sampling units may not be available in environmental studies requiring samples of
soil, water, or other solid or liquid media. In this case, the sampling units are defined by the investigator
and need to be appropriate for selecting a representative sample of material from the medium of
interest. The physical definition of a sampling unit in terms of its "size, shape, and orientation" is
referred to as the sample support (Starks, 1986). The sampling frame is a list of all the possible
sampling units from which the sample can be selected. The sample is a collection of some of these
sampling units.
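The relationship among the target population, the sampling frame, and the sample can be sketched with a small example. This is an illustration under stated assumptions, not a procedure from the guidance: the 10 by 10 grid of cells, the block of covered cells, the sample size of 8, and the seed are hypothetical, and simple random selection is used only because it is the simplest way to pick units from a frame.

```python
import random

# Hypothetical residential yard divided into a 10 x 10 grid of 1 m x 1 m cells.
target_population = [(row, col) for row in range(10) for col in range(10)]

# Hypothetical cells covered by a structure, and therefore not accessible.
covered = {(row, col) for row in range(0, 4) for col in range(0, 5)}

# Sampling frame: the list of sampling units that can actually be selected.
sampling_frame = [cell for cell in target_population if cell not in covered]

# The sample: a collection of some of the units in the frame.
rng = random.Random(42)
sample = rng.sample(sampling_frame, k=8)

print(len(target_population), len(sampling_frame), len(sample))  # prints: 100 80 8
```

Because the 20 covered cells never enter the frame, conclusions about them rest on professional judgment rather than on the sample.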
Sample support represents that portion of the sampling unit, such as an area, volume, mass, or
other quantity, that is extracted in the field and subjected to the measurement protocol (see definition
below). It is a characteristic of a sample describing its relationship to the entity from which it was
taken. It represents an area, mass, or volume within the sampling unit. For example, if a sampling unit is a
single tree, the sample support could be a core from the base of the tree. Or, if a sampling unit is 10
grams of soil from a particular x-y coordinate, the sample support might be 1 gram of this soil after
homogenization. Smaller sample support usually results in greater sampling variation (i.e., greater
variability between sampling units) [see Section 21.5.3 of Pitard (1993)]. For example, soil cores with
a 2-inch diameter and 6-inch depth usually have greater variability in contaminant concentrations than
cores with a 2-inch diameter and 5-foot depth, much as composite samples have less variability than
individual specimens (see Chapter 10). Hence, the study objectives need to clearly define the sample
support in order for the results (for example, sample mean and variance) to be clearly interpretable.
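The effect of sample support on variability can be illustrated with a small simulation. The sketch below is an assumption-laden illustration rather than a method from the guidance: it treats a 5-foot core as ten 6-inch increments, draws each increment's concentration independently from a lognormal distribution with hypothetical parameters, and takes a core's concentration to be the mean of its increments. Real soil concentrations are usually spatially correlated, so the reduction in variance would typically be smaller than in this idealized case.

```python
import random
import statistics

def simulate_core(n_increments, rng):
    """Concentration of one core: the mean of its 6-inch depth increments."""
    # Assumption: increments are independent lognormal(0, 1) draws.
    increments = [rng.lognormvariate(0.0, 1.0) for _ in range(n_increments)]
    return statistics.fmean(increments)

def core_variance(n_increments, n_cores=2000, seed=0):
    """Sample variance of the concentration across many simulated cores."""
    rng = random.Random(seed)
    cores = [simulate_core(n_increments, rng) for _ in range(n_cores)]
    return statistics.variance(cores)

short_cores = core_variance(n_increments=1)    # 6-inch depth
long_cores = core_variance(n_increments=10)    # 5-foot depth

print(f"variance, 6-inch cores: {short_cores:.2f}")
print(f"variance, 5-foot cores: {long_cores:.2f}")
```

Under the independence assumption, averaging over ten increments reduces the variance by roughly a factor of ten, which is the same mechanism that makes composites less variable than individual specimens.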
Once a sampling unit is selected, a measurement protocol is applied; a measurement protocol
is a specific procedure for making observations or performing analyses to determine the characteristics
of interest for each sampling unit. The measurement protocol would include the procedures for
collecting a physical sample, handling and preparing the physical sample, applying an analytical method
(including the sample preparation steps) to obtain a result (that is, to obtain the data for the sample),
and a protocol for resampling if necessary. If compositing of the samples is employed (so that
measurements are made on the composites), then the measurement protocol would also include a
composite sampling protocol, which indicates how many composites are to be formed, how many
samples comprise each composite, and which samples are used to form each composite; the
compositing protocol would also prescribe the compositing procedures (for example, for
homogenization, for taking aliquots). The sampling design specifies the number, type, and location
(spatial and/or temporal) of sampling units to be selected for measurement.
A water sampling example illustrates how these terms relate to one another. Consider a study
designed to measure E. coli and enterococci levels in a specific swimming area of a lake. The target
population is the water flowing through this area (delineated by buoys) from May 1 until September 15.
The sampled population will be the water in the swimming area at 7 a.m. and 2 p.m. at approximately 6
inches below the surface. The sampling units chosen for the study consist of 1-liter volumes of water at
particular locations in the swimming area. In this case, the sample support is equal to the sampling unit,
1 liter of water. The measurement protocol calls for the use of a 2-liter beaker, held by a 6-inch
handle. The sampler needs a nonmotorized boat (for example, a rowboat) to collect the sample so as
to minimize the disturbance to the water. The sample is collected in the specified manner and poured
into a 2-liter sample jar, up to the 1-liter line. The rest of the water in the beaker is discarded back into
the lake. Each 1-liter container of water is taken to the lab for analysis within 6 hours and is analyzed
according to current state standards. The sampling design calls for obtaining a minimum of two samples
on each sampling day at 7 a.m. and 2 p.m., or up to three times a day when there are indications of
increased potential for contamination (for example, heavy rainfall). Sampling days are defined in the
study and may be every day, every other day, or whatever frequency is appropriate for the particular
problem at hand. The sampling design also specifies the exact locations where the samples should be
drawn, which in this case were chosen at random.
Another important concept for sampling design is the conceptual model. At the outset of data
collection activities, it is critical to develop an accurate conceptual model of the potential hazard. A
conceptual model describes the expected source of the contaminant and the size and breadth of the
area of concern, identifies the relevant environmental media and the relevant fate and transport
pathways, and defines the potential exposure pathways. The model should also identify potential
sources of variability in the data (for example, inherent variability among sampling units in the population
and variability associated with selecting and analyzing samples).
2.3 PROBABILISTIC AND JUDGMENTAL SAMPLING DESIGNS
There are two main categories of sampling designs: probability-based designs and judgmental
designs. Probability-based sampling designs apply sampling theory and involve random selection of
sampling units. An essential feature of a probability-based sample is that each member of the
population from which the sample was selected has a known probability of selection. When a
probability-based design is used, statistical inferences may be made about the sampled population from
the data obtained from the sampling units. That is, when using a probabilistic design, inferences can be
drawn about the sampled population, such as the concentration of fine particulate matter (PM2.5) in
ambient air in downtown Houston on a summer day, even though not every single "piece" of the
downtown air is sampled. Judgmental sampling designs involve the selection of sampling units on
the basis of expert knowledge or professional judgment.
Table 2-1 summarizes the main features of each main type of sampling design. Section 2.4.1
introduces judgmental sampling, and Chapter 4 contains more information on the benefits and limitations
of this design. Sections 2.4.2 through 2.4.7 introduce the six probabilistic sampling designs, and
Chapters 5 through 10 describe these in more detail. Reviewing these chapters will provide more
details about the appropriate use of these designs.
Table 2-1. Probability-based versus Judgmental Sampling Designs
Probability-based
Judgmental
[Figure 2-1, upper portion (flowchart text). Objective: Estimate the average concentration of the pesticide chlorpyrifos in the apples grown in this apple orchard. Target population: fruit to be consumed from this orchard. Practical constraints: apples may not be consumed for various reasons, but because this is not predictable, all fruit growing in this orchard is eligible for sampling. Sampled population: all fruit growing in the orchard that is to be processed for consumption. Judgmental sampling: determine where to take samples using personal opinion. Probability sampling: determine where to take samples statistically.]
In Figure 2-1, the judgmental and probabilistic sampling paths each end with data collection
and analysis. The difference is seen when moving up the diagram, which shows how conclusions can be
drawn about the sampled and target populations.
When using probabilistic sampling, the data analyst can draw quantitative conclusions about the
sampled population. That is, in estimating a parameter (for example, the mean), the analyst can
calculate a 95% confidence interval for the parameter of interest. If comparing this to a threshold, the
analyst can state whether the data indicate that the concentration exceeds or is below the threshold
with a certain level of confidence. Expert judgment is then used to draw conclusions about the target
population based on the statistical findings about the sampled population. Expert judgment can also be
used in other aspects of probabilistic sampling designs, such as defining strata in a stratified design.
Such uses of expert judgment will be discussed in more detail in relevant sampling design chapters.
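As an illustration of the confidence interval comparison described above, the following minimal sketch computes a two-sided 95% confidence interval for a mean and compares it to a threshold. The function names, the small table of Student's t critical values, and the concentration values are assumptions made for this example and are not part of this guidance. The sketch also assumes a simple random sample of approximately normally distributed measurements.

```python
import math
import statistics

# Two-sided 95% Student's t critical values for a few degrees of freedom
# (standard table values; this small lookup is an assumption of the sketch).
T_CRIT_95 = {4: 2.776, 9: 2.262, 19: 2.093, 29: 2.045}


def mean_confidence_interval(values):
    """Return (mean, lower, upper) for a two-sided 95% interval on the mean."""
    n = len(values)
    df = n - 1
    if df not in T_CRIT_95:
        raise ValueError(f"no tabulated t value for {df} degrees of freedom")
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)
    half_width = T_CRIT_95[df] * std_err
    return mean, mean - half_width, mean + half_width


def compare_to_threshold(values, threshold):
    """Classify the data relative to a threshold using the interval."""
    _, lower, upper = mean_confidence_interval(values)
    if lower > threshold:
        return "exceeds threshold"
    if upper < threshold:
        return "below threshold"
    return "inconclusive"


# Hypothetical concentrations from ten sampling units (illustrative only).
concentrations = [4.1, 3.8, 5.0, 4.4, 4.7, 3.9, 4.2, 4.6, 4.0, 4.3]
print(mean_confidence_interval(concentrations))
print(compare_to_threshold(concentrations, threshold=5.0))
```

A formal decision rule would normally use the one-sided test and error rates set during planning; the two-sided interval here is only a simple stand-in.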
When using judgmental sampling, statistical analysis cannot be used to draw conclusions
about the target population. Conclusions can only be drawn on the basis of professional judgment. The
usefulness of judgmental sampling will depend on the study objectives, the study size and scope, and the
degree of professional judgment available. When judgmental sampling is used, quantitative statements
about the level of confidence in an estimate (such as confidence intervals) cannot be made.
[Figure 2-1. Inferences Drawn from Judgmental versus Probabilistic Sampling Designs. Lower portion of the flowchart, for each path: select measurement protocol; collect sample units; measure units and generate data; inspect or analyze data.]
2.4 TYPES OF SAMPLING DESIGNS
This guidance describes six sampling designs and one sampling protocol (i.e., composite
sampling). Most of these designs are commonly used in environmental data collection. Some are
designs that are not as commonly used but have great potential for improving the quality of
environmental data. Table 2-2 identifies the sampling designs discussed in this document, and indicates
which chapter contains detailed information on each design. This section briefly describes each design,
providing some information about the type of applications for which each design is especially
appropriate and useful.
Table 2-2. Sampling Designs Presented in this Guidance
Sampling Design/Protocol    Chapter    Use
Judgmental                  4          Common
Simple Random               5          Common
Stratified                  6          Common
Systematic and Grid         7          Common
Ranked Set                  8          Innovative
Adaptive Cluster            9          Innovative
Composite                   10, 11     Common
2.4.1 Judgmental Sampling
In judgmental sampling, the selection of sampling units (i.e., the number and location and/or
timing of collecting samples) is based on knowledge of the feature or condition under investigation and
on professional judgment. Judgmental sampling is distinguished from probability-based sampling in that
inferences are based on professional judgment, not statistical scientific theory. Therefore, conclusions
about the target population are limited and depend entirely on the validity and accuracy of professional
judgment; probabilistic statements about parameters are not possible. As described in subsequent
chapters, expert judgment may also be used in conjunction with other sampling designs to produce
effective sampling for defensible decisions.
2.4.2 Simple Random Sampling
In simple random sampling, particular sampling units (for example, locations and/or times) are
selected using random numbers, and all possible selections of a given number of units are equally likely.
For example, a simple random sample of a set of drums can be taken by numbering all the drums and
randomly selecting numbers from that list or by sampling an area by using pairs of random coordinates.
This method is easy to understand, and the equations for determining sample size are relatively
straightforward. An example is shown in Figure 2-2. This figure illustrates a possible simple random
sample for a square area of soil. Simple random sampling is most useful when the population of interest
is relatively homogeneous; i.e., no major patterns of contamination or "hot spots" are expected.
[Figure 2-2. Simple Random Sampling]
The main advantages of this design are:
(1) It provides statistically unbiased estimates of the mean, proportions, and variability.
(2) It is easy to understand and easy to implement.
(3) Sample size calculations and data analysis are very straightforward.
In some cases, implementation of a simple random sample can be more difficult than some other
types of designs (for example, grid samples) because of the difficulty of precisely identifying random
geographic locations. Additionally, simple random sampling can be more costly than other plans if
difficulties in obtaining samples due to location cause an expenditure of extra effort.
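The two selection methods mentioned above (numbered drums and random coordinate pairs) can be sketched as follows. This is a minimal illustration only; the function names, the seed parameter, and the example dimensions are assumptions and do not come from this guidance.

```python
import random


def sample_drums(drum_ids, n, seed=None):
    """Select n drums without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(drum_ids), n)


def sample_coordinates(width, height, n, seed=None):
    """Select n random (x, y) locations in a width-by-height rectangular area."""
    rng = random.Random(seed)
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]


# Hypothetical inputs: 50 numbered drums and a 100 m by 100 m area of soil.
print(sample_drums(range(1, 51), n=5, seed=1))
print(sample_coordinates(100.0, 100.0, n=8, seed=2))
```

Recording the seed makes the selection reproducible, which can help when documenting how locations were chosen.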
2.4.3 Stratified Sampling
In stratified sampling, the target population is separated into nonoverlapping strata, or
subpopulations that are known or thought to be more homogeneous (relative to the environmental
medium or the contaminant), so that there tends to be less variation among sampling units in the same
stratum than among sampling units in different strata. Strata may be chosen on the basis of spatial or
temporal proximity of the units, or on the basis of preexisting information or professional judgment
about the site or process. Figure 2-3 depicts a site that was stratified on the basis of information about
how the contaminant is distributed, based on wind patterns and soil type, and on the basis of surface soil
texture. This design is useful for estimating a parameter when the target population is heterogeneous
and the area can be subdivided based on expected contamination levels. Advantages of this sampling
design are that it has potential for achieving greater precision in estimates of the mean and variance,
and that it allows computation of reliable estimates for population subgroups of special interest.
Greater precision can be obtained if the measurement of interest is strongly correlated with the
variable used to make the strata.
[Figure 2-3. Stratified Sampling (figure annotation: Radius = 500 m)]
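To make the idea of combining stratum results concrete, the sketch below uses the standard weighted estimator of a stratified mean, with each stratum weighted by its share of the population. The weights, measurements, and function names are hypothetical, and the variance formula ignores the finite population correction; none of these details are taken from this guidance.

```python
import statistics


def stratified_mean(strata):
    """strata: list of (weight, measurements); weights are population shares summing to 1."""
    total_weight = sum(weight for weight, _ in strata)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("stratum weights must sum to 1")
    return sum(weight * statistics.mean(values) for weight, values in strata)


def stratified_mean_variance(strata):
    """Estimated variance of the stratified mean (no finite population correction)."""
    return sum(
        weight ** 2 * statistics.variance(values) / len(values)
        for weight, values in strata
    )


# Hypothetical site: a small, highly contaminated stratum and a large, cleaner one.
site = [
    (0.2, [10.0, 12.0, 14.0]),  # 20% of the area
    (0.8, [1.0, 2.0, 3.0]),     # 80% of the area
]
print(stratified_mean(site))
print(stratified_mean_variance(site))
```

Because each stratum is internally more homogeneous than the site as a whole, the within-stratum variances that enter this calculation tend to be small, which is the source of the precision gain described above.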
2.4.4 Systematic and Grid Sampling
In systematic and grid sampling, samples are taken at regularly spaced intervals over space or
time. An initial location or time is chosen at random, and then the remaining sampling locations are
defined so that all locations are at regular intervals over an area (grid) or time (systematic). Examples
of systematic grids include square, rectangular, triangular, or radial grids [Section 16.6.2 of Myers
(1997)].
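The procedure described above (a random starting point followed by regularly spaced locations) can be sketched for a square grid as follows. The function name, parameters, and site dimensions are assumptions made for illustration; triangular and radial grids would need different spacing logic.

```python
import random


def square_grid_sample(width, height, spacing, seed=None):
    """Return (x, y) points on a square grid whose starting point is chosen at random."""
    rng = random.Random(seed)
    # Random start inside the first grid cell; random() returns a value in [0, 1).
    x0 = rng.random() * spacing
    y0 = rng.random() * spacing
    points = []
    row = 0
    while y0 + row * spacing < height:
        col = 0
        while x0 + col * spacing < width:
            points.append((x0 + col * spacing, y0 + row * spacing))
            col += 1
        row += 1
    return points


# Hypothetical 100 m by 100 m site sampled on a 25 m grid.
grid_points = square_grid_sample(width=100.0, height=100.0, spacing=25.0, seed=3)
print(len(grid_points))
```

Only the starting point is random; once it is fixed, every other location follows from the grid spacing.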
In random systematic sampling, an initial sampling location (or time) is chosen at random and
the remaining sampling sites are specified so that they are located according to a regular pattern
(Cressie, 1993), for example, at the points identified by the intersection of each line in one of the grids
shown in Figure 2-4. Systematic and grid sampling is used to search for hot spots and to infer means,
percentiles, or other parameters and is also useful for estimating spatial patterns or trends over time.
This design provides a practical and easy method for designating sample locations and ensures uniform
coverage of a site, unit, or process.
[Figure 2-4. Systematic/Grid Sampling. Panels: Systematic Grid Sampling - Square Grid; Systematic Grid Sampling - Triangular Grids]
2.4.5 Ranked Set Sampling
Ranked set sampling is an innovative design that can be highly useful and cost efficient in
obtaining better estimates of mean concentration levels in soil and other environmental media by
explicitly incorporating the professional judgment of a field investigator or a field screening measurement
method to pick specific sampling locations in the field. Ranked set sampling uses a two-phase sampling
design that identifies sets of field locations, utilizes inexpensive measurements to rank locations within
each set, and then selects one location from each set for sampling.
In ranked set sampling, m sets (each of size r) of field locations are identified using simple
random sampling. The locations are ranked independently within each set using professional judgment
or inexpensive, fast, or surrogate measurements. One sampling unit from each set is then selected
(based on the observed ranks) for subsequent measurement using a more accurate and reliable (hence,
more expensive) method for the contaminant of interest. Relative to simple random sampling, this
design results in more representative samples and so leads to more precise estimates of the population
parameters.
Ranked set sampling is useful when the cost of locating and ranking locations in the field is low
compared to laboratory measurements. It is also appropriate when an inexpensive auxiliary variable
(based on expert knowledge or measurement) is available to rank population units with respect to the
variable of interest. To use this design effectively, it is important that the ranking method and analytical
method are strongly correlated.
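The selection step can be sketched as below. The text above says only that one unit per set is selected based on the observed ranks; this sketch assumes a common balanced scheme in which the target rank cycles through 1 to r across the m sets. The function name, the screening function, and the example data are hypothetical.

```python
import random


def ranked_set_sample(candidates, screening_value, r, m, seed=None):
    """Pick one unit from each of m randomly drawn sets of size r.

    screening_value is an inexpensive ranking function (for example, a field
    screening reading). Set j contributes its unit with rank (j mod r),
    counting from the lowest screening value (assumed balanced scheme).
    """
    rng = random.Random(seed)
    drawn = rng.sample(list(candidates), r * m)  # simple random sample of locations
    selected = []
    for j in range(m):
        one_set = drawn[j * r:(j + 1) * r]
        ranked = sorted(one_set, key=screening_value)
        selected.append(ranked[j % r])
    return selected


# Hypothetical example: 100 candidate locations ranked by a cheap screening value.
screen = {loc: (loc * 37) % 101 for loc in range(100)}
chosen = ranked_set_sample(range(100), screen.get, r=3, m=6, seed=4)
print(chosen)
```

Only the selected units would then be sent for the more accurate and more expensive measurement.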
2.4.6 Adaptive Cluster Sampling
In adaptive cluster sampling, n samples are taken using simple random sampling, and additional
samples are taken at locations where measurements exceed some threshold value. Several additional
rounds of sampling and analysis may be needed. Adaptive cluster sampling tracks the selection
probabilities for later phases of sampling so that an unbiased estimate of the population mean can be
calculated despite oversampling of certain areas. An example application of adaptive cluster sampling
is delineating the borders of a plume of contamination.
Initial and final adaptive sampling designs are shown in Figure 2-5. Initial measurements are made
of randomly selected primary sampling units using simple random sampling (designated by squares in
Figure 2-5). Whenever a sampling unit is found to show a characteristic of interest (for example, a
contaminant concentration of concern or an ecological effect, as indicated by the shaded areas in the
figure), additional sampling units adjacent to the original unit are selected, and measurements are made.
[Figure 2-5. Adaptive Cluster Sampling. Panels: Population Grid with Shaded Areas of Interest and Initial Simple Random Sample; Final Adaptive Cluster Sampling Results. Legend: X = Sampling unit]
Adaptive sampling is useful for estimating or searching for rare characteristics in a population and is
appropriate for inexpensive, rapid measurements. It enables delineating the boundaries of hot spots,
while also using all data collected with appropriate weighting to give unbiased estimates of the
population mean.
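The selection logic of adaptive cluster sampling can be sketched on a grid of sampling units as follows. This illustration covers only how units are added; it does not compute the weighted, unbiased estimate of the mean. The grid values, the threshold, and the choice of four adjacent neighbors are assumptions for the example.

```python
import random

NEIGHBOR_OFFSETS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # assumed neighborhood


def adaptive_cluster_sample(grid, n_initial, threshold, seed=None):
    """Return (initial_units, all_sampled_units) for a grid of measured values."""
    rng = random.Random(seed)
    n_rows, n_cols = len(grid), len(grid[0])
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    initial = rng.sample(cells, n_initial)  # initial simple random sample
    sampled = set(initial)
    to_expand = [cell for cell in initial if grid[cell[0]][cell[1]] > threshold]
    while to_expand:
        r, c = to_expand.pop()
        for dr, dc in NEIGHBOR_OFFSETS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < n_rows and 0 <= nc < n_cols and (nr, nc) not in sampled:
                sampled.add((nr, nc))
                # A newly sampled unit that also exceeds the threshold is expanded too.
                if grid[nr][nc] > threshold:
                    to_expand.append((nr, nc))
    return initial, sampled


# Hypothetical 5-by-5 population with a small area of elevated values.
population = [
    [0, 0, 0, 0, 0],
    [0, 9, 8, 0, 0],
    [0, 0, 7, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
initial_units, final_units = adaptive_cluster_sample(population, 6, threshold=5, seed=5)
print(len(initial_units), len(final_units))
```

Sampling stops expanding once every unit on the edge of a cluster falls at or below the threshold.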
2.4.7 Composite Sampling
In composite sampling (illustrated in Figure 2-6), volumes of material from several of the selected
sampling units are physically combined and mixed in an effort to form a single homogeneous sample,
which is then analyzed. Compositing can be very cost effective because it reduces the number of
chemical analyses needed. It is most cost effective when analysis costs are large relative to sampling
costs; it demands, however, that there are no safety hazards or potential biases (for example, loss of
volatile organic components) associated with the compositing process.
[Figure 2-6. Composite Sampling]
Compositing is often used in conjunction with other sampling designs when the goal is to
estimate the population mean and when information on spatial or temporal variability is not needed. It
can also be used to estimate the prevalence of a rare trait. If individual aliquots from samples
comprising a composite can be retested on a new portion, retesting schemes can be combined with
composite sampling protocols to identify individual units that have a certain trait or to determine those
particular units with the highest contaminant levels.
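The effect of compositing on the number of analyses can be illustrated with a short sketch. It assumes idealized conditions that are not stated in this guidance: equal-volume aliquots, perfect homogenization, and no measurement error, so that each composite result equals the mean of its component units. The grouping of consecutive units and all values are hypothetical.

```python
import statistics


def form_composites(unit_values, units_per_composite):
    """Return the idealized result for each composite of consecutive units."""
    k = units_per_composite
    if len(unit_values) % k != 0:
        raise ValueError("number of units must be a multiple of units_per_composite")
    return [
        statistics.mean(unit_values[i:i + k])
        for i in range(0, len(unit_values), k)
    ]


# Hypothetical concentrations for 12 sampling units, composited in groups of 4.
units = [2.0, 4.0, 6.0, 8.0, 1.0, 3.0, 5.0, 7.0, 10.0, 12.0, 14.0, 16.0]
composites = form_composites(units, units_per_composite=4)

print(len(units), "individual analyses versus", len(composites), "composite analyses")
print(statistics.mean(composites), statistics.mean(units))
```

Under these assumptions the mean of the composite results matches the mean of the individual units, while information about unit-to-unit variability is lost.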
CHAPTER 3
THE SAMPLING DESIGN PROCESS
3.1 OVERVIEW
What are the objectives of the sampling design process?
The sampling design process should match the needs of the project with the resources
available. The needs generally consist of the study objectives and the tolerable limits on uncertainty.
The resources may include personnel, time, and availability of financial resources. The goal of the
process is to use all of the information available so that the data collected meet the needs of the
decision maker.
Who is typically involved in the sampling design process?
The sampling design process typically includes a multi-disciplinary group (such as a DQO
development team) that is involved in systematic planning at the beginning and at key review points.
This team should include the decision maker or end user of the data. More rigorous technical activities
will likely be performed by statisticians or by environmental scientists or engineers who have training
and experience in environmental statistics.
3.2 INPUTS TO THE SAMPLING DESIGN PROCESS
What outputs from the systematic planning process are incorporated into the sampling design
process?
It is EPA policy (EPA, 2000c) that all EPA organizations use a systematic planning process to
develop acceptance or performance criteria for the collection, evaluation, or use of environmental data.
Systematic planning identifies the expected outcome of the project, the technical goals, the cost and
schedule, and the acceptance criteria for the final result. The Data Quality Objectives (DQO) Process
is the Agency's recommended planning process when data are being used to select between two
opposing conditions, such as decision-making or determining compliance with a standard. The outputs
of this planning process (the data quality objectives themselves) define the performance criteria. The
DQO Process is a seven-step planning approach based on the scientific method that is used to prepare
for data collection activities such as environmental monitoring efforts and research. It provides the
criteria that a sampling design should satisfy, including where to collect samples, tolerable decision
error rates, and the number of samples to collect.
DQOs are qualitative and quantitative statements, developed in the first six steps of the
DQO Process (Figure 3-1), that define the purpose for the data collection effort, clarify the kind of data
needed, and specify the limits on decision errors needed for the study. These outputs are used in the
final DQO step to develop a sampling design that meets the performance criteria and other design
constraints. The DQO Process helps investigators ensure that the data collected are of the right type,
quantity, and quality needed to answer research questions or support environmental decisions, and
ensures that valuable resources are spent on collecting only those data necessary to support defensible
decisions.
The DQO Process is a systematic planning approach for data collection that is based on the
scientific method and uses a seven-step process. Although the DQO Process is typically described in
linear terms, it is really a flexible process that relies on iteration and modification as the planning team
works through each step, thus allowing early steps to be revised in light of information developed from
subsequent steps.
[Figure 3-1. The DQO Process. Flowchart of seven sequential steps: Step 1, State the Problem; Step 2, Identify the Decision; Step 3, Identify the Inputs to the Decision; Step 4, Define the Boundaries of the Study; Step 5, Develop a Decision Rule; Step 6, Specify Tolerable Limits on Decision Error; Step 7, Optimize the Design for Obtaining Data]
The Steps of the DQO Process
Step 1: State the Problem. This step defines the problem clearly, identifies the primary
decision maker and planning team members, and determines the available budget, personnel, and
schedule deadlines.
Step 2: Identify the Decision. The key activities are to develop an appropriate decision
statement: identify the principal study question, define alternative actions that could result from
resolving the principal study question, link the principal study question to possible actions, and
organize multiple decisions.
Step 3: Identify the Inputs to the Decision. These activities include identifying the
type and sources of information needed to resolve the decision statement, identifying information
needed to establish the action level, and confirming that suitable methods exist.
Step 4: Define the Boundaries of the Study. This step specifies the characteristics that
define the population of interest, defines the spatial and temporal boundaries, defines the scale of
decision making, and identifies any practical constraints on data collection.
Step 5: Develop a Decision Rule. This step develops a decision rule, a statement that
gives the decision maker a logical basis for choosing among alternative actions, by determining the
parameter of interest, action level, and scale of decision making, and by outlining alternative actions.
Step 6: Specify Tolerable Limits on Decision Errors. This step determines the decision
maker's tolerable limits on potential decision errors by identifying the decision errors and base-level
assumptions, specifying a range of possible parameter values where the consequences of decision
errors are relatively minor, and assigning probability values to the occurrence of potential decision
errors.
Step 7: Optimize the Design for Obtaining Data. This final step identifies a
resource-effective sampling design for generating data. This design is then expected to satisfy
the DQOs. Meeting or exceeding the DQOs is the goal in selecting a sampling design.
By using the DQO Process, the planning team clarifies study objectives, defines the appropriate
types of data, and specifies tolerable levels of potential decision errors that will be used to establish the
quality and quantity of data needed to support decisions. Through this process, the planning team can
examine trade-offs between the uncertainty of results and cost of sampling and analysis in order to
develop designs that are acceptable to all parties involved. These are all important inputs to the
sampling design process.
What information will be needed to implement the sampling design process?
The information needed includes outputs from the systematic planning process (for example, the
outputs from Steps 1 through 6 of the DQO Process) and specific information about factors of the
particular problem that could influence the choice of design. The categories of factors
that should be used in developing a sampling design are shown in Figure 3-2 and include:
Information About the Process or Area of Concern includes the conceptual model and any
additional information about the process or area (for example, any secondary data from the site that are
available, including results from any pilot studies).
Data Quality Information that is needed as input to the sampling design process is mainly from
the DQO Process and includes:
• The purpose of the data collection: hypothesis testing (evidence to reject or
support a finding that a specific parameter exceeds a threshold level, or evidence to
reject or support a finding that the specified parameters of two populations differ), estimating a
parameter with a level of confidence, or detecting hot spots (DQO Step 5).
• The target population and spatial/temporal boundaries of the study (DQO Step 4).
• Preliminary estimation of variance (DQO Step 4).
• The statistical parameter of interest, such as mean, median, percentile, trend, slope, or
percentage (DQO Step 5).
• Limits on decision errors and precision, in the form of false acceptance and false
rejection error rates and the definition of the gray region (overall precision
specifications) (DQO Step 6).
Constraints are principally sampling design and budget.
[Figure 3-2. Factors in Selecting a Sampling Design. Three groups of factors feed the choice of sampling design. (1) Information about the process or area of concern: conceptual model of the potential environmental hazard (size/breadth of area of concern, media of concern, distributions of contaminant, sources of variability, chemical/physical properties of contaminant) and additional information about the process or area. (2) Data quality information: purpose of data collection, spatial and temporal boundaries of study, preliminary estimates of variance, statistical parameter of interest, tolerance for potential decision errors, overall precision requirements (width of the gray region), and sample support. (3) Constraints: sampling/analysis, time/schedule, geographical, budget, and compositing constraints.]
For more details on the DQO Process see Guidance on the Data Quality Objectives Process
(QA/G-4) (EPA, 2000b).
It is important to carefully consider early in the design phase the sample support of the data to
be collected and the proposed method of conducting the chemical analysis. The sample support is the
physical size, shape, and orientation of material that is extracted from the sampling unit and subjected to
the measurement protocol. In other words, the sample support comprises the portion of the sampling
unit that is actually available to be measured or observed, and therefore to represent the sampling unit.
Consequently, the sample support should be chosen so that the measurement protocol captures the
desired characteristics of the sampling unit, given the inherent qualities of and variability within the
sampling unit, and is consistent with the objectives of the study. The specification of sample support
also should be coordinated with the actual physical specifications of the chosen analytical method(s) to
ensure that a sufficient quantity of material is available to support the needed analyses. Usually, the
analytical method needs a much smaller amount of material than that needed for the sample support to
represent the sampling unit. In that case, the measurement protocol will specify how the sample
support will be processed and subsampled to yield the amount of material needed for analysis.
Some examples will help clarify how sample support relates to sampling units and analytical
methods. Consider a study that is designed to estimate average arsenic contamination in surface soil at
a site. The project team may decide to divide the site into square sampling units that are 3 meters on
each side and 10 centimeters deep. Given their knowledge of variability experienced at other sites, the
project team may decide that the sample support needed to properly characterize a sampling unit is the
area and volume of soil that can be obtained by taking 9 soil cores, each 15 cm in diameter and 10 cm
deep. Consider another example in which a study is designed to estimate average mercury
contamination in fish. The project team may decide that the sampling unit is an individual fish, and the
sample support is the type and mass of fish tissue extracted from each fish, which they might specify in a
table. In both of the above examples, an analytical chemist would confirm that the sample support
would provide a sufficient amount of soil or fish tissue to conduct the analytical procedures needed to
characterize the concentrations of arsenic in soil or mercury in fish. Sometimes the sample support is an
integral part of the analytical result. For example, when sampling water for the occurrence of
microbiological contaminants such as Cryptosporidium, water is passed through filters and the filters
are then processed and examined to count the number of organisms. The volume of water filtered
constitutes the sample support and also is used directly in the calculation of the occurrence rate (i.e.,
number of organisms per volume of water). In all cases, the sample support is chosen to ensure that
the measurement protocol will reliably characterize the sampling unit in a way that is consistent with the
study objectives. The study objectives are defined during systematic planning, such as in DQO Steps 1
and 2. The definition of the sampling unit and selection of sample support will depend strongly on the
study boundaries defined in DQO Step 4, and on the performance criteria developed in DQO Step 6.
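The arithmetic behind the soil example above can be worked through in a short sketch, using the dimensions given in the text (a sampling unit 3 meters on each side and 10 cm deep, and a sample support of 9 cores, each 15 cm in diameter and 10 cm deep). The function names are assumptions, and the organism count and filtered volume in the occurrence rate example are hypothetical.

```python
import math


def core_volume_cm3(diameter_cm, depth_cm):
    """Volume of a cylindrical soil core in cubic centimeters."""
    return math.pi * (diameter_cm / 2) ** 2 * depth_cm


def occurrence_rate(organism_count, volume_filtered_liters):
    """Organisms per liter of water filtered."""
    return organism_count / volume_filtered_liters


# Sampling unit: 3 m x 3 m x 10 cm of surface soil, expressed in centimeters.
unit_volume_cm3 = 300 * 300 * 10

# Sample support: 9 cores, each 15 cm in diameter and 10 cm deep.
support_volume_cm3 = 9 * core_volume_cm3(diameter_cm=15, depth_cm=10)
support_fraction = support_volume_cm3 / unit_volume_cm3

print(round(support_volume_cm3), "cubic cm of soil in the sample support")
print(f"{support_fraction:.2%} of the sampling unit volume")

# Hypothetical water example: 12 organisms counted in 100 liters filtered.
print(occurrence_rate(12, 100.0), "organisms per liter")
```

The sample support in this example is a small fraction of the sampling unit, which is why the number and placement of cores matter for representing the unit.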
Possible constraints on choosing a sampling design fall into five categories: sampling/analysis
limitations, time/schedule restrictions, geographic barriers, budget amounts, and compositing
limitations. Sampling/analysis
constraints could include measurement instrument performance (for example, sensitivity and selectivity
requirements for field or laboratory technologies), regulatory requirements that specify analytic or
sample collection method, or weather constraints (for example, performance of field technologies at low
temperature, high humidity, or the ability to collect samples during certain seasons or types of weather).
Time/schedule constraints could include seasonal constraints such as the relationship of exposure to
season (for example, solvent volatility in warmer weather) and the availability of certified professionals.
Geographic constraints could include physical barriers that may preclude sampling (for example, rivers,
fences) and also any possible hindrance to the ability to accurately identify sample location. Budget
constraints should take into account the entire data collection process—from the collection of the
sample in the field, including transport and storage, to analysis of the samples and data entry and
validation. Compositing constraints could include the decision on representativeness of the physical
sample taken at a location or station, or the ability to physically mix samples both in the field and in the
laboratory.
In addition to these categories, sampling design development should also take into account
existing regulations and requirements (for example, state, municipal) if they apply. Finally, any possible
secondary uses of the data should be considered to the extent possible.
3.3 STEPS IN THE SAMPLING DESIGN PROCESS
Steps of the sampling design process are
represented in Figure 3-3 and described below.
Review the systematic planning outputs.
First, the sampling objectives need to be stated clearly.
Next, make sure the acceptance or performance criteria
are specified adequately (such as probability limits on
decision errors or estimation intervals). Then review the
constraints regarding schedule, funding, special
equipment and facilities, and human resources.
Develop general sampling design
alternatives. Decide whether the approach will involve
episodic sampling events (where a sampling design is
established and all data for that phase are collected
according to that design) or an adaptive strategy (where
a sampling protocol is established and sampling units are
selected in the field, in accordance with the protocol,
based on results from previous sampling for that phase).
Consider sampling designs that are compatible with the
sampling objectives. Evaluate advantages,
disadvantages, and trade-offs in the context of the
specific conditions of the study including the anticipated costs for possible alternative sampling
strategies.
Figure 3-3. The Sampling Design Process
(flowchart, in order: Review planning outputs; Develop general design alternatives; Formulate
mathematical expressions for performance and cost of each design; Determine sample size that
satisfies performance criteria and constraints; Choose the most resource-effective design;
Document the design in the QA Project Plan)
Formulate mathematical expressions for the performance and cost of each design
alternative. For each design, develop the necessary statistical model or mathematical formulae
needed to determine the performance of the design, in terms of the desired statistical power or width of
the confidence interval. This process usually involves developing a model of relevant components of
variance and estimating the total variance, plus key components as necessary. Also for each design,
develop a cost model that addresses fixed costs (such as mobilization and setup costs) and variable
costs (such as labor hours per sample and analytical costs per sample). Note that this step is not used
in judgmental sampling designs. Assistance from a statistician will be needed to develop these formulae
for more complex designs; formulae for the simpler designs are provided in the appendices to the
chapters in this guidance.
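As a minimal illustration of this step, the sketch below pairs a linear cost model (fixed plus variable costs) with one possible performance measure, the approximate half-width of a confidence interval for a mean under simple random sampling. The function names, the dollar figures, and the normal-approximation formula with a known standard deviation are assumptions made for illustration; they are not the formulae given in the appendices to this guidance.

```python
from statistics import NormalDist
import math

def total_cost(n, fixed_cost, cost_per_sample):
    """Linear cost model: fixed costs plus variable costs for n samples."""
    return fixed_cost + n * cost_per_sample

def ci_half_width(n, sigma, confidence=0.95):
    """Approximate half-width of a two-sided confidence interval for a mean,
    assuming simple random sampling and a known standard deviation sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)

# Hypothetical inputs: $5,000 mobilization, $250 per sample, sigma = 10 mg/kg.
for n in (10, 20, 40):
    print(n, total_cost(n, 5000, 250), round(ci_half_width(n, 10.0), 2))
```

Tabulating cost and performance side by side in this way makes the trade-off between the two explicit for each candidate design.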
Determine the sample size that satisfies the performance criteria and constraints.
Calculate the optimal sample size (and sample allocation, for stratified designs or other more complex
designs). This guidance document provides formulae for estimating sample sizes needed for the
different designs. Trade-offs may be needed between less precise, less expensive measurement
protocols (that allow for more sampling units to be selected and measured) and more precise, more
expensive measurement protocols (that provide better characterization of each sampling unit at the
expense of allowing fewer sampling units to be selected and measured). Care has to be taken to ensure
that the trade-offs made do not change the inferences from the initially planned design. For example,
the use of compositing designs needs to agree with the initial concepts of exposure or goal of the study.
If none of the designs is feasible (i.e., performance specifications cannot be satisfied within all
constraints), then consider the corrective actions listed below. Note that this step is
not used in judgmental sampling designs because performance criteria are not explicitly considered.
• Consider other, more sophisticated, sampling designs.
• Relax performance specifications (for example, increase the allowable probability of
committing a decision error) at the expense of increasing decision error risk.
• Relax one or more constraints (for example, increase the budget).
• Reevaluate the sampling objectives (for example, increase the scale of decision making,
reduce the number of sub-populations that need separate estimates, or consider
surrogate or indicator measurements).
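The trade-off between measurement protocols described above can be sketched numerically. The example below assumes that the total variance of a single result is the sum of a between-unit variance and a measurement variance, and that each unit is measured once; the budget, costs, and standard deviations are hypothetical values chosen only for illustration.

```python
import math

def affordable_n(budget, fixed_cost, cost_per_sample):
    """Largest number of samples that fits within the budget (0 if none)."""
    return max(0, math.floor((budget - fixed_cost) / cost_per_sample))

def std_error_of_mean(n, sigma_population, sigma_measurement):
    """Standard error of the mean when each unit is measured once:
    total variance = between-unit variance + measurement variance."""
    total_variance = sigma_population**2 + sigma_measurement**2
    return math.sqrt(total_variance / n)

budget, fixed = 20000, 5000            # hypothetical dollars
sigma_pop = 10.0                       # hypothetical between-unit standard deviation
protocols = {
    "field screening": {"cost": 100, "sigma_meas": 6.0},   # cheaper, less precise
    "laboratory":      {"cost": 500, "sigma_meas": 1.0},   # costlier, more precise
}

for name, p in protocols.items():
    n = affordable_n(budget, fixed, p["cost"])
    se = std_error_of_mean(n, sigma_pop, p["sigma_meas"])
    print(f"{name}: n = {n}, standard error = {se:.2f}")
```

With these made-up inputs the less expensive protocol yields the smaller standard error because it allows many more units to be sampled; different inputs can reverse that conclusion, which is why the comparison should be made with site-specific estimates.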
Choose the most resource-effective design. Consider the advantages, disadvantages, and
trade-offs between performance and cost among designs that satisfy performance specifications and
constraints. Consider practical issues, schedule and budget risks, health and safety risks to project
personnel and the community, and any other relevant issues of concern to those involved with the
project. Finally, obtain agreement within the planning team on the appropriate design.
Document the design in the QA Project Plan. Provide details on how the design should be
implemented, contingency plans if unexpected conditions or events arise in the field, and quality
assurance (QA) and quality control (QC) that will be performed to detect and correct problems and
ensure defensible results. Specify the key assumptions underlying the sampling design, particularly
those that should be verified during implementation and assessment. Details on how to write a QA
Project Plan can be found in Guidance for Quality Assurance Project Plans (QA/G-5) (EPA,
1998b).
3.4 SELECTING A SAMPLING DESIGN
Table 3-1 presents examples of problem types that one may encounter and suggests sampling
designs that are relevant for these problem types in particular situations.
Table 3-1. Choosing the Appropriate Sampling Design for Your Problem
If you are... | and you have... | consider using... | in order to...
performing a screening phase of an investigation of a relatively small-scale problem | a limited budget and/or a limited schedule | judgmental sampling | assess whether further investigation is warranted that should include a statistical probabilistic sampling design.
developing an understanding of when contamination is present | an adequate budget for the number of samples needed | systematic sampling | acquire coverage of the time periods of interest.
developing an understanding of where contamination is present | an adequate budget for the number of samples needed | grid sampling | acquire coverage of the area of concern with a given level of confidence that you would have detected a hot spot of a given size.
estimating a population mean | an adequate budget | systematic or grid sampling | also produce information on spatial or temporal patterns.
estimating a population mean | budget constraints and analytical costs that are high compared to sampling costs | composite sampling | produce an equally precise or a more precise estimate of the mean with fewer analyses and lower cost.
estimating a population mean | budget constraints and professional knowledge or inexpensive screening measurements to assess the relative amounts of the contaminant at specific field sample locations | ranked set sampling | reduce the number of analyses needed for a given level of precision.
estimating a population mean or proportion | spatial or temporal information on contaminant patterns | stratified sampling | increase the precision of the estimate with the same number of samples, or achieve the same precision with fewer samples and lower cost.
delineating the boundaries of an area of contamination | a field screening method | adaptive cluster sampling | simultaneously use all observations in estimating the mean.
estimating the prevalence of a rare trait | analytical costs that are high compared to sampling costs | random sampling and composite sampling | produce an equally precise (or a more precise) estimate of the prevalence with fewer analyses and lower cost.
attempting to identify population units that have a rare trait (for a finite population of units) | the ability to physically mix aliquots from the samples and then retest additional aliquots | composite sampling and retesting | classify all units at reduced cost by not analyzing every unit.
attempting to identify population unit(s) that have the highest contaminant levels (for a finite population of units) | the ability to physically mix aliquots from the samples and then retest additional aliquots | composite sampling and retesting | identify such units at reduced cost by not analyzing every unit.
CHAPTER 4
JUDGMENTAL SAMPLING
4.1 OVERVIEW
Judgmental sampling refers to the selection of sample locations based on professional judgment
alone, without any type of randomization. Judgmental sampling is useful when there is reliable historical
and physical knowledge about a relatively small feature or condition. As discussed in Quality
Assurance Guidance for Conducting Brownfields Site Assessments (EPA, 1998a), whether to
employ a judgmental or statistical (probability-based) sampling design is the main sampling design
decision. This design decision applies to many environmental investigations including Brownfields
investigations. An important distinction between the two types of designs is that statistical sampling
designs are usually needed when the level of confidence needs to be quantified, and judgmental
sampling designs are often needed to meet schedule and budgetary constraints.
Implementation of a judgmental sampling design should not be confused with the application of
professional judgment (or the use of professional knowledge of the study site or process). Professional
judgment should always be used to develop an efficient sampling design, whether that design is
judgmental or probability-based. In particular, when stratifying a population or site, exercising good
professional judgment is essential so that the sampling design established for each stratum is efficient
and meaningful.
4.2 APPLICATION
For soil contamination investigations, judgmental sampling is appropriate for situations in which
any of the following apply:
• Relatively small-scale features or conditions are under investigation.
• An extremely small number of samples will be selected for analysis/characterization.
• There is reliable historical and physical knowledge about the feature or condition under
investigation.
• The objective of the investigation is to screen an area(s) for the presence or absence of
contamination at levels of concern, such as risk-based screening levels (note that if such
contamination is found, follow-up sampling is likely to involve one or more statistical
designs).
• Schedule or emergency considerations preclude the possibility of implementing a
statistical design.
Judgmental sampling is sometimes appropriate when addressing site-specific groundwater
contamination issues. As further discussed in Quality Assurance Guidance for Conducting
Brownfields Site Assessments (EPA, 1998a), when data are needed to evaluate whether groundwater
beneath a Brownfields site is contaminated, a statistical sampling design may be impractical because of
the high cost of groundwater sample collection and knowledge of the connection between soil and
groundwater contamination.
4.3 BENEFITS
Because judgmental sampling designs often can be quickly implemented at a relatively low cost,
the primary benefits of judgmental sampling are to meet schedule and budgetary constraints that cannot
be met by implementing a statistical design. In many situations, when some or all of the conditions listed
in Section 4.2 exist, judgmental sampling offers an additional important benefit of providing an
appropriate level of effort for meeting investigation objectives without excessive consumption of project
resources.
4.4 LIMITATIONS
Judgmental sampling does not allow the level of confidence (uncertainty) of the investigation to
be accurately quantified. In addition, judgmental sampling limits the statistical inferences that can be
made to the units actually analyzed, and extrapolation from those units to the overall population from
which the units were collected is subject to unknown selection bias.
4.5 IMPLEMENTATION
By definition, judgmental sampling is implemented in a manner decided by the professional(s)
establishing the sampling design. Specialized academic and professional training is needed before a
professional is qualified to design a judgmental sampling program. The following paragraphs provide
only a few examples of the most common factors that professionals should consider when establishing
judgmental sampling designs.
As discussed in EPA's Soil Screening Guidance (EPA, 1996a), current investigative
techniques and statistical methods cannot accurately establish the mean concentration of subsurface
soils within a contaminated source without a costly and intensive sampling program that is well beyond
the level of effort generally appropriate for screening. The Soil Screening Guidance advises that, in
establishing a judgmental sampling design to investigate subsurface soil contamination, the professional
should locate two or three soil borings in the areas suspected of having the highest contaminant
concentrations. If the mean contaminant concentration calculated for any individual boring exceeds the
applicable numerical screening value, additional investigative phases should be conducted. The Soil
Screening Guidance provides several approaches for calculating a mean contaminant concentration for
each boring; these approaches vary with the sampling-interval design.
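The per-boring comparison described above can be sketched as follows. This is only an illustration: it uses a simple arithmetic mean of the depth-interval results for each boring, whereas the Soil Screening Guidance describes several approaches that depend on the sampling-interval design. The boring IDs, concentrations, and screening value are hypothetical.

```python
from statistics import mean

def borings_exceeding(borings, screening_value):
    """Return the IDs of borings whose mean concentration exceeds the screening value."""
    return [boring_id for boring_id, results in borings.items()
            if mean(results) > screening_value]

# Hypothetical depth-interval results (mg/kg) for three borings.
borings = {
    "B-1": [12.0, 30.0, 18.0],
    "B-2": [5.0, 7.0, 6.0],
    "B-3": [40.0, 55.0, 25.0],
}
flagged = borings_exceeding(borings, screening_value=25.0)
print(flagged)   # borings that would trigger additional investigative phases
```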
In establishing a judgmental sampling design to investigate a subsurface soil contamination
problem, the professional needs to consider many factors including the following:
• Soil properties that affect contaminant migration (for example, texture, layering,
moisture content);
• The physical and chemical nature of the contaminant under investigation (for example,
solubility, volatility, reactivity);
• The manner in which the contaminant is understood to have been released (for
example, surface spill, leachate generated through above ground or buried waste,
leaking underground tank or pipe);
• The timing and duration of the release; and
• The amount of contaminant understood to have been released.
As stated in Section 4.2, judgmental sampling is sometimes appropriate when addressing site-specific
groundwater contamination issues. The most common factors to consider in establishing a judgmental
sampling design to address a site-specific groundwater contamination issue include the following:
• The physical and chemical nature of the contaminant under investigation (for example,
solubility, volatility, reactivity, density [whether floating or sinking nonaqueous phase
liquid could be present]);
• The possible effects of contaminant migration through the unsaturated zone when and
where the contaminant entered the aquifer;
• The possible ways that contaminant migration through the unsaturated zone might have
changed the chemical nature of the contaminant before it entered the aquifer;
• The depths and thicknesses of aquifers beneath the site;
• The direction and rate of groundwater flow within each aquifer and variations in these
parameters;
• The aquifer properties that cause the contaminant to disperse within it, both laterally and
vertically; and
• The natural attenuation processes that may affect how the contaminant migrates in
groundwater.
4.6 RELATIONSHIP TO OTHER SAMPLING DESIGNS
Other sampling designs are used in conjunction with judgmental sampling in two common
situations. First, they may be used when the population or site is stratified, and judgmental sampling
takes place within one or more strata. This situation is typical of small-scale soil contamination
investigations when the suspected location of the contaminant release is known. When the suspect area
is identified as a stratum, then a judgmental sampling design is established for that stratum. Other strata
established for the site may be addressed through implementation of statistical sampling designs.
Judgment is, of course, used in establishing the boundaries and extent of each stratum.
Second, other sampling designs may be used when judgmental sampling indicates that the
screening criteria established for the area under investigation are exceeded, thereby warranting further
investigation. Depending on how much historical information is available and how much information has
been obtained from the judgmental-sampling phase, follow-up phases of investigation might involve any
of the statistical sampling designs described in this guidance document.
4.7 EXAMPLES OF SUCCESSFUL USE
4.7.1 Area Impacted by Contamination Can Be Visually Discerned
An active manufacturing facility is being sold, and the prospective purchaser is conducting an
investigation to characterize existing environmental conditions and potential associated liability. One
feature being assessed is an approximately 500-square-meter (m²) fenced area where drums of an
aqueous cupric-chloride waste are stored. When released, the waste stains the soil blue-green. Eight
irregularly shaped blue-green stains are identified ranging in size from about 10 square centimeters to a
square meter. The stains are thought to be a result of relatively small releases that occurred as waste
was poured into drums at the storage area from smaller containers filled at the facility's Satellite
Accumulation Areas. A judgmental sampling design is established whereby a single grab sample of soil
is collected from each of the observed stains and analyzed for copper concentration. If any single
copper result falls within one order of magnitude of the risk-based copper soil-screening level for
industrial land use, the seller has agreed to pay for a follow-up investigation that will involve a statistical
sampling program designed to better characterize the soil copper contamination and assess whether
remediation is warranted.
4.7.2 Potential Location of the Contaminant Release Is Known
An abandoned textile mill is being investigated as a Brownfields site, and one previous
employee was located who gave a reliable account of site features and activities. Based on this
interview, the site was stratified and several different sampling designs (some statistical and some
judgmental) were established. A judgmental sampling design is being used to investigate a 30-meter-long
drain pipe that carried a variety of wastes from one of the site factories to a leach field adjacent to
the building; a statistical grid-sampling design was established to investigate the leach field. The drain
pipe is accessible under a grating installed on the basement floor of the factory, and visual (external) and
video (internal) inspections of the pipe showed it to be in good condition with no observable
deterioration or cracks. However, several of the joints between the 3-meter-long pipe segments
appeared either loose or slightly separated. The judgmental sampling design established for this feature
involved marking the basement floor adjacent to each pipe joint, removing the pipe, and collecting a
single sample of the soil at each marked location for laboratory analysis. The analytical results then
would be compared to the risk-based screening levels established for the list of potential site
contaminants.
4.8 EXAMPLES OF UNSUCCESSFUL USE
4.8.1 Double Judgmental Sampling
Ginevan (2001) has a practical example:
"...a good question is 'what do I do if I am stuck with a "dirty
spots" sample?' The answer is that if there is a great deal of
money riding on the decision one should do the sampling over.
Note also that nothing is ever so bad that it cannot be made worse.
In one case we participated in, a dirty spots sample was taken first.
This was pointed out to the client, who then went out and took a
comparable number of samples from an area known to be clean.
At this point the formula given by Land's procedure for the upper
bound on the arithmetic mean of log-normal data was applied to
the combined data (which were strongly bimodal because of the
clean/dirty dichotomy). The resulting "upper bound" on the mean
exceeded the largest observation from the dirty spots sample!
Unhappily these data were beyond even the capability of the
bootstrap to salvage. The original sample had been taken to find
dirty spots and was thus not representative of the site. The end
result was a set of about 100 measurements which told us almost
nothing about the nature and extent of contamination at the site.
The client then instituted a statistically designed sampling plan."
4.8.2 Visual Judgmental Sampling
This example concerns a rural county enforcement officer tramping along a creek periodically
exclaiming, "Here is a contamination!" when encountering dark spots in the stream sediment.
Obviously, the samples collected were only representative of those "dark" areas of sediment declared
contaminated by the enforcement officer and resulted in a wide range of concentrations. Subsequent
investigation of the support of color-blind grab samples of sediment revealed that the variation within an
area the size of a desk top encompassed all concentrations from not detected to those measured
by the enforcement officer. The support of the sample collected by the enforcement officer was no
better than a single random grab sample.
These examples show how it is possible to be completely misled by reliance on what seems to
be a desirable characteristic upon which to base the inclusion of a sample unit into the overall sample.
The advantage gained by using a probabilistic sampling scheme is that such biases are avoided.
CHAPTER 5
SIMPLE RANDOM SAMPLING
5.1 OVERVIEW
Simple random sampling is the simplest and most fundamental probability-based sampling
design. Most of the commonly used statistical analysis methods assume either implicitly or explicitly that
the data were obtained using a simple random sampling design.
A simple random sample of size n is defined as a set of n sampling units selected from a
population (of objects or locations in space and/or time) so that all possible sets of n sampling units
have the same chance of being selected. For example, if there is a population of four elements
(A,B,C,D) and a sample of size n=3 elements is drawn, without replacement, there are four possible
outcomes:
(A,B,C), (A,B,D), (A,C,D), and (B,C,D).
Any sampling design that makes these outcomes equally likely is, by definition, a simple random
sampling design. A simple random sample of size n occurs when n units are independently selected at
random from the population of interest.
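The four-element example above can be checked with a short sketch. It lists every possible sample of size 3 and then draws one of them at random; the seed value is arbitrary and is used only so that the draw can be repeated.

```python
from itertools import combinations
import random

population = ["A", "B", "C", "D"]
n = 3

# Every possible sample of size n drawn without replacement.
possible_samples = list(combinations(population, n))
print(possible_samples)

# random.sample draws n distinct units so that every such set is equally likely.
rng = random.Random(2002)   # fixed seed (arbitrary) only to make the draw repeatable
drawn = tuple(sorted(rng.sample(population, n)))
print(drawn)
```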
The most important characteristic of simple random sampling is that it protects against the bias
(systematic deviation from the "truth") that can occur if units are selected subjectively. Because it is the
most fundamental sampling design, simple random sampling also is a benchmark against which the
efficiency and cost of other sampling designs often are compared. Moreover, when using an alternative
sampling design, the minimum sample size (number of sampling units) needed for that sampling design
often is estimated by first computing the sample size that would be needed with a simple random
sampling design. That sample size is then multiplied by an adjustment factor, called the survey design
effect, to produce the minimum sample size needed under the alternative sampling design [Section 4.1.1
of Cochran (1977)].
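The design-effect adjustment is a single multiplication, sketched below. The design-effect values are hypothetical; in practice they would come from prior studies or from the variance model for the alternative design.

```python
import math

def adjusted_sample_size(n_srs, design_effect):
    """Minimum sample size under an alternative design, rounded up to a whole unit."""
    return math.ceil(n_srs * design_effect)

print(adjusted_sample_size(40, 0.75))  # hypothetical design more efficient than simple random sampling
print(adjusted_sample_size(40, 1.5))   # hypothetical design less efficient than simple random sampling
```

A design effect below 1 indicates that the alternative design needs fewer units than simple random sampling for the same precision; a value above 1 indicates that it needs more.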
5.2 APPLICATION
Simple random sampling is appropriate when the population being sampled is relatively uniform
or homogeneous. In practice, simple random sampling usually is used in conjunction with other
sampling designs, as discussed in Section 5.6.
Simple random sampling often is appropriate for the last stage of sampling when the sampling
design has more than one stage of sampling (i.e., a sample of units is selected at the first stage and then
subunits are selected from each sample unit) [Chapter 6 of Gilbert (1987) and Chapters 12 and 13 of
Thompson (1992)]. Examples include the following:
• Selecting one or more leaves from each sample plant for characterization,
• Selecting one or more aliquots from each soil sample for chemical analysis, and
• Assigning split samples or aliquots to laboratories or analytical methods.
In a similar vein, simple random sampling usually is needed for assigning experimental units to
treatments, or experimental conditions, in experimental designs.
5.3 BENEFITS
The primary benefit of simple random sampling is that it protects against selection bias by
guaranteeing selection of a sample that is representative of the sampling frame, provided that the sample
size is not extremely small (for example, 20 observations or more). Moreover, the procedures needed
to select a simple random sample are relatively simple.
Other benefits of using simple random sampling include the following:
• Statistical analysis of the data is relatively straightforward because most common
statistical analysis procedures assume that the data were obtained using a simple
random sampling design.
• Explicit formulae, as well as tables and charts in reference books, are available for
estimating the minimum sample size needed to support many statistical analyses.
5.4 LIMITATIONS
Simple random sampling has two primary limitations:
• Because all possible samples are equally likely to be selected, by definition, the sample
points could, by random chance, not be uniformly dispersed in space and/or time. This
limitation is overcome somewhat as the sample size increases, but it remains a
consideration, even with a large number of samples.
• Simple random sampling designs ignore all prior information, or professional
knowledge, regarding the site or process being sampled, except for the expected
variability of the site or process measurements. Prior information almost always can be
used to develop a probability-based sampling design that is more efficient than simple
random sampling (i.e., needs fewer observations to achieve a given level of precision).
Because of these limitations, simple random sampling is seldom recommended for use in
practice except for relatively uniform populations. Stratified simple random sampling (Chapter 6) is
commonly used to overcome these limitations by defining geographic and/or temporal sampling strata.
Alternatively, one may use systematic sampling (Chapter 7) or quasi-random sampling (Section 5.5.2)
to overcome these same limitations. Nevertheless, simple random sampling is a fundamental building
block and benchmark for most other sampling designs.
5.5 IMPLEMENTATION
This section discusses how to determine the minimum sample size needed with simple random
sampling to (1) estimate a population mean or proportion with prespecified precision or (2) test a
hypothesis regarding a population mean or proportion with a prespecified significance level and power.
This section also addresses the process of selecting a simple random sample.
5.5.1 How do you estimate the sample size?
To determine the minimum sample size needed to estimate a population proportion (for
example, proportion of units with concentrations above a health-based threshold), first identify a
conservative preliminary estimate of the true population proportion. In the absence of prior information,
use 50% as the preliminary estimate as this results in the largest sample size and so is the most
conservative. The closer the preliminary estimate is to the actual value, the greater the savings in
resources.
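The effect of the preliminary estimate on sample size can be illustrated with the widely used normal-approximation formula n = z²p(1 − p)/d², where d is the desired margin of error. This formula is an assumption made for this sketch and may differ in detail from the formulae in Appendix 5 (for example, it omits any finite population correction); the function name and inputs are hypothetical.

```python
from statistics import NormalDist
import math

def n_for_proportion(p_prelim, margin, confidence=0.95):
    """Approximate minimum sample size to estimate a proportion to within
    +/- margin, using the normal approximation (no finite population correction)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * p_prelim * (1 - p_prelim) / margin**2)

for p in (0.1, 0.3, 0.5):
    print(p, n_for_proportion(p, margin=0.10))
```

Because p(1 − p) is largest at p = 0.5, a preliminary estimate of 50% gives the largest, and therefore most conservative, sample size under this approximation.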
To determine the minimum sample size needed to estimate a population mean (for example,
mean contaminant concentration), first identify a conservatively large preliminary estimate of the
population variance. The preliminary estimate should be large enough that the true population variance
is not likely to be larger than the preliminary estimate because the sample size will be too small if the
estimated variance is too small. Sources of a preliminary estimate of population variance include: a
pilot study of the same population, another study conducted with a similar population, or an estimate
based on a variance model combined with separate estimates for the individual variance components.
In the absence of prior information, estimate the standard deviation (square root of the variance) by
dividing the expected range of the population by six, i.e.
$$\hat{\sigma} = \frac{\text{Expected Maximum} - \text{Expected Minimum}}{6}$$
However, this is only a crude approximation and should be used only as a last resort.
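As a rough sketch of how the range-divided-by-six estimate feeds a sample size calculation, the example below combines it with the common normal-approximation formula n = (zσ/d)², where d is the desired margin of error for the mean. The formula and all numerical inputs are assumptions for illustration and are not taken from Appendix 5.

```python
from statistics import NormalDist
import math

def sigma_from_range(expected_min, expected_max):
    """Crude standard deviation estimate: expected range divided by six."""
    return (expected_max - expected_min) / 6

def n_for_mean(sigma, margin, confidence=0.95):
    """Approximate minimum sample size to estimate a mean to within +/- margin,
    assuming simple random sampling and the normal approximation."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)

sigma = sigma_from_range(0.0, 120.0)     # hypothetical concentrations, mg/kg
print(sigma, n_for_mean(sigma, margin=5.0))
```

Because the required sample size grows with the square of the standard deviation, an underestimated range leads directly to too few samples.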
Using these inputs, Appendix 5 provides general-purpose formulae for determining the
minimum sample size needed to achieve specified precision for estimates of population means and
proportions. Sample size formulae for achieving specified power for hypothesis tests are in Section 3
of Guidance for Data Quality Assessment (QA/G-9) (EPA, 2000a). Appendix 5 tabulates the
results from applying these formulae for determining the minimum sample size needed for hypothesis
tests. Examples of the use of these tables are provided in Section 5.7.2.
If the sample sizes calculated using the simple random sampling formulae are greater than the
study budget can support, then other sampling designs may reduce the number of sample specimens
and/or the number of measurements. For example, stratified random sampling (Chapter 6) and ranked
set sampling (Chapter 8) may result in smaller sample sizes if (inexpensive) data are available that are
positively correlated with the outcomes of interest. Moreover, if the objective of the study is estimation
of means, composite sampling (Chapter 10) may greatly reduce the number of analytical measurements.
Finally, if the variability between replicate measurements (for example, in the lab) is greater than the
natural variability between units (for example, using an imprecise method to analyze water samples from
a fairly homogenous body of water), using the mean of replicate measurements on each sample
specimen may reduce the number of sample specimens.
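The effect of replicate measurements can be sketched with a simple two-component variance model, assumed here for illustration: the variance of the overall mean is (σ_between² + σ_measure²/r)/n for n specimens each measured r times. The standard deviations and target standard error below are hypothetical.

```python
import math

def specimens_needed(sigma_between, sigma_measure, replicates, target_se):
    """Number of specimens needed so the standard error of the overall mean
    meets target_se when each specimen is measured `replicates` times
    and the replicate results are averaged."""
    per_specimen_variance = sigma_between**2 + sigma_measure**2 / replicates
    return math.ceil(per_specimen_variance / target_se**2)

# Hypothetical case: fairly homogeneous water body (sigma_between = 1)
# analyzed with an imprecise method (sigma_measure = 4).
needed = [specimens_needed(1.0, 4.0, r, target_se=1.0) for r in (1, 2, 4)]
print(needed)
```

Replication helps most when measurement variability dominates, as in this case; when between-unit variability dominates, additional replicates reduce the required number of specimens very little.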
5.5.2 How do you decide where to take samples?
Selecting a simple random sample is most straightforward when all the sampling units (for
example, barrels in a warehouse, trees at a study site) comprising the population of interest can be
listed. When selecting a simple random sample from a list of N distinct sampling units, use the following
procedure:
• Label the sampling units from 1 to N.
• Use a table of random numbers, or a computerized random number generator, to
randomly select n distinct integers from 1 to N.
The set of sampling units with these n labels comprises a simple random sample of size n. These n
sample units may be n points on the surface of a hazardous waste site, n points in time, etc. Here the
word "sample" is used in this statistical sense, related to a list of sampling units or potential sampling
locations. The actual aliquots of air, water, soil, etc., that are collected at the sample locations are
referred to as sample "specimens" to distinguish them from the statistical sample selected from the
universe of all possible sampling units (objects or locations in space and/or time).
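The list-sampling procedure above can be carried out with a computerized random number generator, as in the following sketch. The function name, population size, and seed are hypothetical; recording the seed in the project files is one way to make the selection reproducible.

```python
import random

def select_simple_random_sample(N, n, seed=None):
    """Return n distinct labels drawn at random from 1..N (sorted for field use)."""
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and N")
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, N + 1), n))

# Hypothetical example: 12 of 200 barrels in a warehouse.
labels = select_simple_random_sample(N=200, n=12, seed=5)
print(labels)
```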
When selecting a sample from a two-dimensional medium, such as surface soils or the
bottom of a lake or stream, the above one-dimensional list sampling approach can be used if an M by N
grid is used to partition the population into MN unique units and the sample is selected from the list of
MN units.
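A minimal sketch of the grid approach follows: the MN cells are treated as a single list, n labels are drawn from that list, and each label is converted back to a row and column. The grid dimensions, sample size, and seed are hypothetical.

```python
import random

def sample_grid_cells(M, N, n, seed=None):
    """Select n distinct cells from an M-by-N grid; return 1-based (row, column) pairs."""
    rng = random.Random(seed)
    labels = rng.sample(range(M * N), n)          # list labels 0 .. MN-1
    return sorted((label // N + 1, label % N + 1) for label in labels)

cells = sample_grid_cells(M=10, N=15, n=8, seed=1)
print(cells)
```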
However, it is often more practical and flexible to select points directly at random in
two-dimensional space if the desired sample support is not a rectangular area. If a rectangular coordinate
system (i.e., x and y coordinates, such as latitude and longitude) can be superimposed on the area of
interest, then a simple random sample of points is generated by randomly generating x- and
y-coordinates, as illustrated in Figure 5-1. Note that in an irregularly shaped sample area, randomly
generated points falling outside of the sample area are not used.
[Figure: map of randomly located sampling points; horizontal axis "X Coordinate" (0 to 175), vertical axis scaled 0 to 100]
Figure 5-1. Example of a Map Showing Random Sampling Locations
When these sampling procedures are implemented to generate simple random samples in two
dimensions, the randomly generated sampling points (i.e., x- and y- coordinates or direction) should be
rounded to the nearest unit that can be reliably identified in the field (for example, nearest 1 or 5
meters). A sample specimen with the support defined in the sampling plan should then be obtained as
near as possible to each of these approximate random sampling points using a procedure to avoid
subjective bias factors such as "difficulty in collecting a sample, the presence of vegetation, or the color
of the soil" (EPA, 2000b). The protocols should be defined so that it will always be possible to obtain
a sample from each randomly selected location. However, if it is physically impossible to obtain a
specimen from a randomly selected location, deleting that location from the sample is valid as long as
inferences are restricted to the accessible locations. The use of a subsidiary list of alternate (random)
locations to be substituted for inaccessible locations is recommended.
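The two-dimensional procedure described above (random coordinates, rounding to a unit that can be located in the field, discarding points outside an irregular area, and a subsidiary list of alternates) can be sketched as follows. This is an illustrative sketch only: the function `random_points`, the elliptical site boundary `in_site`, and the 5-meter rounding unit are assumptions made for the example and do not come from the guidance.

```python
import random

def random_points(n, x_range, y_range, inside, unit=1.0, seed=None,
                  max_tries=100_000):
    """Draw n random points in a bounding box, keeping only points inside the area."""
    rng = random.Random(seed)
    points = []
    for _ in range(max_tries):
        if len(points) == n:
            return points
        # Round each coordinate to the nearest unit identifiable in the field.
        x = round(rng.uniform(*x_range) / unit) * unit
        y = round(rng.uniform(*y_range) / unit) * unit
        # Points falling outside the sample area are not used.
        if inside(x, y):
            points.append((x, y))
    raise RuntimeError("sample area is too small relative to the bounding box")

def in_site(x, y):
    """Hypothetical irregular site: an ellipse inside a 175 x 100 bounding box."""
    return ((x - 87.5) / 87.5) ** 2 + ((y - 50.0) / 50.0) ** 2 <= 1.0

primary = random_points(10, (0, 175), (0, 100), in_site, unit=5, seed=1)
# Subsidiary list of alternates, drawn the same way, for inaccessible locations.
alternates = random_points(5, (0, 175), (0, 100), in_site, unit=5, seed=2)
```

Because coordinates are rounded, two draws can coincide; a planning team would need to decide whether such duplicates are redrawn.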
The above sampling methods can be extended fairly easily, at least conceptually, to sampling
three-dimensional wastes (for example, a waste pile or liquid wastes in a pond, lagoon, or drum).
One approach is to superimpose a three-dimensional coordinate system over the area to be sampled
(i.e., x, y, and z coordinates) and randomly generate x-, y-, and z-coordinates to identify randomly
selected points.
Although it is conceptually easy to generate random sampling points in three dimensions,
actually getting a sampling tool into a three-dimensional medium at these randomly selected locations
and extracting specimens with the correct sample support (size, shape, and orientation) can be difficult
or impossible. Consider, for example, solid waste in a pile. If the waste pile has the consistency of soil,
a technician may be able to take a core sample at the randomly selected location and extract a
subsample from the core at the correct depth that has the desired support (for example, 5 centimeter
diameter and 15 centimeters depth). However, if the pile contains large impermeable solids (for
example, rocks of larger diameter than the core), taking such a core sample may not be possible.
Alternatively, if the material is very fine, like ash, a technician may not be able to take a core sample
because the process of getting the core would fundamentally alter the nature of the pile being sampled
(for example, it would cause the pile to shift or collapse). In that case, one potential solution may be to
level the pile and take samples from the entire depth of the leveled pile at randomly selected points in
two dimensions.
Liquid wastes present similar problems for sampling in three dimensions. If the liquid waste has
the consistency of water, it may be possible to extract samples from randomly selected locations using a
probe and pump. However, some wastes (for example, a semiliquid sludge) are too thick to be
pumped yet not solid enough to extract competent cores. If a technician were sampling sludge from a
lagoon, it might be necessary to sample the entire vertical thickness of sludge at randomly selected
locations (in two dimensions) and then analyze a subsample(s) from the resulting composite sample.
Section 21.6.5 of Pitard (1993) states that one could theoretically obtain correct (representative)
samples from a waste pile by selecting either one- or two-dimensional samples representing the full
cross-section of the waste. A one-dimensional sample is one in which vertical cross-sections of a
prescribed thickness are selected, as depicted in Figure 5-2. A two-dimensional sample is one in which
cores from the top to the bottom of the waste pile are randomly extracted, as depicted in Figure 5-3.
Section 14.4.7 of Pitard (1993) states that attempting to extract such samples is an "exercise in futility"
because of the lack of appropriate sampling devices. Additional guidance regarding sampling devices
and techniques that can be used to sample from three-dimensional waste piles is provided in Section 8.3
of Myers (1997) and by the American Society for Testing and Materials (ASTM) D6232-00 (2000).
Figure 5-2. A One-Dimensional Sample of Cross-Sections from a Waste Pile
Figure 5-3. A Two-Dimensional Sample of Cores from a Waste Pile
An alternative sampling
method that provides random samples
that are more uniformly dispersed
than simple random samples is "quasi-
random sampling." Quasi-random
sampling refers to methods for
generating a quasi-random sequence
of numbers that are "in a precise
sense, 'maximally avoiding' of each other" [Section 7.7 of Press et al. (1992)]. Samples in two or more
dimensions are generated by pairing two or more of these quasi-random sequences. In two
dimensions, the result is a set of sample points that, for any given sample size, appear to be uniformly
scattered throughout the sampled area, as illustrated in Figure 5-4. Quasi-random sampling can be
used to avoid the potential for geographic clustering that exists with simple random sampling without
taking the risk of aligning the sample with an unknown pattern of contamination, a limitation of grid
sampling (as discussed in Chapter 7). The resulting data can be analyzed as if the sample were a simple
random sample, knowing that the sampling variance is likely to be slightly underestimated. Techniques
for generating quasi-random samples are mathematically complex; they are described in Section 7.7 of
Press et al. (1992). A simpler technique that
achieves similar results is "deep" stratification,
in which only one unit is selected at random
from each sampling stratum (see Chapter 6).
A variation would be to divide the population
into small units and take a random sample
from within each unit for a total of n units.
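Both ideas in this paragraph can be illustrated with a short sketch. The first function pairs two van der Corput sequences (a Halton sequence), which is one simple quasi-random construction; it is used here as an assumption for illustration and is not necessarily the generator described by Press et al. (1992). The second function illustrates "deep" stratification on a unit square, with exactly one random point per grid cell. All function names are hypothetical.

```python
import random

def van_der_corput(i, base):
    """Return the i-th term (i >= 1) of the van der Corput sequence in a given base."""
    value, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        value += digit / denom
    return value

def halton_points(n, bases=(2, 3)):
    """Pair two quasi-random sequences to get n points in the unit square."""
    return [tuple(van_der_corput(i, b) for b in bases) for i in range(1, n + 1)]

def deep_stratified_points(rows, cols, seed=None):
    """Select one random point from each cell of a rows x cols grid on the unit square."""
    rng = random.Random(seed)
    return [((c + rng.random()) / cols, (r + rng.random()) / rows)
            for r in range(rows) for c in range(cols)]

quasi = halton_points(20)
deep = deep_stratified_points(4, 5, seed=0)
```

Points on the unit square can be rescaled to site coordinates by multiplying by the width and height of the study area.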
Figure 5-4. Illustration of a Quasi-Random Sample
5.6 RELATIONSHIP TO OTHER SAMPLING DESIGNS
Simple random sampling often is used for selecting samples within sampling strata. When an
independent simple random sample is selected from each stratum, the sampling design is referred to as
stratified simple random sampling (see Chapter 6). Simple random sampling also is used as the first
step of the ranked set sampling process described in Chapter 7. It also can be used as the first step of
the adaptive cluster sampling process described in Chapter 9.
5.7 EXAMPLES
5.7.1 General Simple Random Sampling Example
Suppose that a company with a fleet of 5,000 late-model, mid-sized sedans decides that they
will overhaul their fleet to improve emissions if the mean (average) carbon monoxide (CO) emission
rate of the fleet (in grams per mile, g/m) is unusually high. Since the EPA standard for passenger cars is
no more than 3.4 g/m, and data from the manufacturers of their fleet's cars suggests that most cars in
the fleet will be between 1.0 and 3.0 g/m, they decide that an overhaul is needed if their mean CO
emission rate exceeds 2.5 g/m. Hence, to determine whether or not an overhaul is needed, they will
test the following hypothesis for means:
$H_0: \mu \le 2.5$ g/m versus $H_A: \mu > 2.5$ g/m
Suppose that all vehicles in the fleet are late-model, 6-cylinder cars that are expected to have similar
emission rates. Hence, for selecting a sample of vehicles to be tested from this relatively homogeneous
5,000-vehicle population, a simple random sampling design is appropriate.
In order to determine appropriate sample sizes using Appendix Table 5-1, a preliminary
estimate of the variability between measurements of CO emission rates is needed for their fleet.
Company researchers referred to old records to estimate the expected variability in the fleet's CO
emission rates. However, lacking any data regarding variances of CO emission rates, they choose to
use one-sixth of the expected range as an estimate of the standard deviation, as discussed in Section
5.5.1. They expected that the range probably would be from about 0.5 to 3.5, a range of 3.0 g/m, but
that it could potentially be as large as 4 g/m or more if some of their cars were not properly tuned.
Hence, sample sizes were determined for the following potential standard deviations:
Range (g/m)    $\hat{\sigma}$ = Range / 6
3              0.50
4              0.67
5              0.83
In their application of the DQO Process, the company officials determined that the maximum
acceptable error rates were as follows:
• False Rejection: $\alpha$ = Prob(false rejection when $\mu$ = 2.5 g/m) = 0.05
• False Acceptance: $\beta$ = Prob(false acceptance when $\mu$ = 2.75 g/m) = 0.05
Table 5-1 then was used to determine the minimum sample size needed by entering the table with the
following parameters:
• $\alpha$ = Significance level = 0.05 (i.e., 5%)
• Power = $1 - \beta$ = 0.95 (i.e., 95%)
• Effect size 1 = $100(|\mu_1 - \mu_0| / \hat{\sigma})$ = 100(|2.75 - 2.50| / 0.50) = 50%
• Effect size 2 = $100(|\mu_1 - \mu_0| / \hat{\sigma})$ = 100(|2.75 - 2.50| / 0.67) = 37%
• Effect size 3 = $100(|\mu_1 - \mu_0| / \hat{\sigma})$ = 100(|2.75 - 2.50| / 0.83) = 30%
Hence, the company managers used the first row of Table 5-1 to determine that a sample of 122, 69,
or 45 cars was needed, depending on whether the effect size was 30%, 40% (the tabulated column
nearest to 37%), or 50%, respectively.
Based on these results, they decided that a simple random sample of 100 cars should provide adequate
protection against both false rejection and false acceptance decision errors.
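The table look-up in this example can be cross-checked with a short calculation. The sketch below uses a normal-approximation sample size formula for a one-sided, one-sample t-test, $n = (z_{1-\alpha} + z_{1-\beta})^2 / d^2 + 0.5\,z_{1-\alpha}^2$, where $d$ is the effect size expressed as a fraction of the standard deviation. It is an assumption that this is the formula behind Table 5-1, although it reproduces the three values used in the example; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_one_sample_t(alpha, power, effect_size):
    """Approximate sample size for a one-sided, one-sample t-test.

    effect_size is |mu1 - mu0| / sigma, given as a fraction (0.5 means 50%).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # quantile for false rejection rate
    z_beta = NormalDist().inv_cdf(power)        # quantile for power = 1 - beta
    n = (z_alpha + z_beta) ** 2 / effect_size ** 2 + 0.5 * z_alpha ** 2
    return ceil(n)

# Effect sizes of 30%, 40%, and 50% at alpha = 0.05 and power = 0.95.
sizes = [n_one_sample_t(0.05, 0.95, d) for d in (0.30, 0.40, 0.50)]
```

Entering the exact effect size (for example, 0.25 / 0.67) rather than the nearest tabulated column gives a sample size specific to the assumed standard deviation.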
The researchers then assigned inventory control numbers to the cars in the fleet from 1 to 5,000
to facilitate the random sampling process. They used a random number generator to generate 100
random numbers between 1 and 5,000 (for example, using http://www.random.org). The cars with
these inventory control numbers were then selected as the simple random sample of cars to be tested
for CO emission rates.
In this case, the cost of sampling (measuring the emission rate) was relatively low and a large
sample presented no problems. If the cost had been prohibitive, a pilot study would have been
completed in order to give preliminary information on the variability. This would probably result in a
lower number of cars to test.
5.7.2 Examples Using Look-up Tables in Appendix 5
These examples are simply intended to demonstrate the use of the tables.
Tables 5-2 and 5-3: Suppose the company decides that they need to overhaul the fleet of
cars if more than 10% of the fleet have CO emission rates exceeding 3.0 g/m. To determine whether
or not the overhaul is needed, they need to test the hypothesis for proportions:
$H_0: P \le 10\%$ versus $H_A: P > 10\%$
In their application of the DQO Process, the company officials determine that the maximum acceptable
error rates are as follows:
• False Rejection: $\alpha$ = Prob(false rejection when P = 10%) = 0.05
• False Acceptance: $\beta$ = Prob(false acceptance when P = 15%) = 0.05
Table 5-2 then can be used to determine the minimum sample size needed by entering the table with the
following parameters:
• $\alpha$ = Significance level = 0.05 (i.e., 5%)
• Power = $1 - \beta$ = 0.95 (i.e., 95%)
• $P_0$ = 10%
• $|P_1 - P_0|$ = |15% - 10%| = 5%
Table 5-2 shows that a sample of 468 cars is necessary to achieve the error bounds specified for the
hypothesis test.
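The value read from Table 5-2 can be cross-checked numerically. The sketch below uses the normal-approximation formula $n = [z_{1-\alpha}\sqrt{P_0(1-P_0)} + z_{1-\beta}\sqrt{P_1(1-P_1)}]^2 / (P_1 - P_0)^2$ for a one-sided test of a proportion. It is an assumption that Table 5-2 was built from this formula, although the formula reproduces the tabulated value used in this example; the function name is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_one_sample_proportion(alpha, power, p0, p1):
    """Approximate sample size for a one-sided, one-sample test of a proportion."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    # Standard deviations are evaluated at the action level p0 and at the
    # gray-region boundary p1, respectively.
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))
    return ceil((numerator / (p1 - p0)) ** 2)

# Example from the text: P0 = 10%, P1 = 15%, alpha = 0.05, power = 0.95.
n_cars = n_one_sample_proportion(0.05, 0.95, 0.10, 0.15)
```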
Table 5-4: Suppose the company also has a fleet of 5,000 small pick-up trucks. The researchers
want to know if the mean CO emission rate for their fleet of pick-up trucks exceeds that for the fleet of
sedans. They then need to test the hypothesis for difference of two means:
$H_0: \mu_1 - \mu_2 \le 0$ versus $H_A: \mu_1 - \mu_2 > 0$,
where $\mu_1$ is the mean CO emission rate for the fleet of pick-up trucks and $\mu_2$ is the mean CO emission
rate for the fleet of sedans.
In their application of the DQO Process, they determine that the maximum acceptable error
rates are as follows:
• $\alpha$ = Prob(false rejection when $\delta = \mu_1 - \mu_2 = 0$) = 0.05
• $\beta$ = Prob(false acceptance when $\delta = \mu_1 - \mu_2 = 0.25$ g/m) = 0.05
Table 5-4 then can be used to determine the minimum sample size needed by entering the table with the
following parameters:
• $\alpha$ = Significance level = 0.05 (i.e., 5%)
• Power = $1 - \beta$ = 0.95 (i.e., 95%)
• Effect size = $100(|\delta_1 - \delta_0| / \hat{\sigma})$ = 100(|0.25 - 0.00| / 0.50) = 50%
Table 5-4 shows that a sample of 88 sedans and 88 pick-up trucks is necessary to achieve the error
bounds specified for the hypothesis test.
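The per-group sample size in this example can be cross-checked with the two-sample analogue of the normal-approximation formula, $n = 2(z_{1-\alpha} + z_{1-\beta})^2 / d^2 + 0.25\,z_{1-\alpha}^2$ per group, where $d$ is the effect size as a fraction of the standard deviation. It is an assumption that Table 5-4 was built from this formula, although it reproduces the tabulated value used here; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_two_sample_t(alpha, power, effect_size):
    """Approximate per-group sample size for a one-sided, two-sample t-test.

    effect_size is |delta1 - delta0| / sigma, given as a fraction.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2 + 0.25 * z_alpha ** 2
    return ceil(n)

# Example from the text: effect size 50%, alpha = 0.05, power = 0.95.
n_per_fleet = n_two_sample_t(0.05, 0.95, 0.50)
```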
Tables 5-5 and 5-6: Suppose the company decides that they want to determine whether the
proportion of pickup trucks in the fleet with CO emission rates greater than 3.0 g/m is greater than the
proportion for the fleet of sedans. They then need to test the hypothesis for difference of two
proportions:
$H_0: P_1 - P_2 \le 0\%$ versus $H_A: P_1 - P_2 > 0\%$
where $P_1$ is the proportion of pick-up trucks with emission rates exceeding 3.0 g/m and $P_2$ is the
proportion of sedans with emission rates exceeding 3.0 g/m.
In their application of the DQO Process, they determine that the maximum acceptable error
rates are as follows:
• False Rejection: $\alpha$ = Prob(false rejection when $P_1 - P_2 = 0$) = 0.05
• False Acceptance: $\beta$ = Prob(false acceptance when $P_1$ = 10% and $P_2$ = 5%) = 0.05
Table 5-5 then can be used to determine the sample size needed by entering the table with the following
parameters:
• $\alpha$ = Significance level = 0.05 (i.e., 5%)
• Power = $1 - \beta$ = 0.95 (i.e., 95%)
• $P_1$ = 10%
• $|P_1 - P_2|$ = |10% - 5%| = 5%
Table 5-5 indicates that a sample of 947 sedans and a sample of 947 pick-up trucks are necessary to
achieve the error bounds specified for the hypothesis test.
It should be noted, however, that when the estimated sample size (n) becomes relatively large
compared to the population size (N), a factor called the Finite Population Correction Factor, the ratio
n/N, must be taken into consideration. For more information, see Section 4.2 of Gilbert (1987),
Section 2.5 of Cochran (1963), and Appendix 5. In addition, these formulae assume the underlying
population to be normally distributed. If approximate normality does not hold, these sample sizes could
be too small.
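As a rough illustration of how a finite population can change the required sample size, the sketch below applies a common survey-sampling approximation, $n' = n_0 / (1 + n_0/N)$, where $n_0$ is the sample size computed without regard to population size. This adjustment is an assumption chosen for illustration; it is not necessarily the correction intended by the references cited above, which should be consulted for the applicable formula. The function name is hypothetical.

```python
from math import ceil

def fpc_adjusted_n(n0, N):
    """Approximate sample size after a finite population adjustment.

    n0 is the sample size computed as if the population were very large,
    and N is the actual population size.
    """
    return ceil(n0 / (1 + n0 / N))

# Illustration: 947 units from a population of 5,000 is a large sampling
# fraction, so the adjustment is noticeable; for 45 units it is negligible.
large_fraction = fpc_adjusted_n(947, 5000)
small_fraction = fpc_adjusted_n(45, 5000)
```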
APPENDIX 5
SAMPLE SIZE TABLES FOR SIMPLE RANDOM SAMPLING DESIGNS
This appendix provides the following tables to determine the minimum sample size needed to
achieve sufficient precision with simple random sampling designs:
• Table 5-1. Sample Size Needed for a One-Sample t-Test.
• Table 5-2. Sample Size Needed for a One-Sample Test for a Population Proportion,
P, at a 5% Significance Level.
• Table 5-3. Sample Size Needed for a One-Sample Test for a Population Proportion,
P, at a 10% Significance Level.
• Table 5-4. Sample Size Needed for a Two-Sample t-Test.
• Table 5-5. Sample Size Needed for a Two-Sample Test for Proportions at a 5%
Significance Level.
• Table 5-6. Sample Size Needed for a Two Sample Test for Proportions at a 10%
Significance Level.
The formulae that these sample size calculations are based upon are provided in Chapter 3 of
Guidance for Data Quality Assessment (QA/G-9) (EPA, 2000a) for the remaining tables, which
address sample size needed for hypothesis tests.
Table 5-1. Sample Size Needed for One-Sample t-test
Significance             Effect Size
Level          Power     10%      20%      30%      40%      50%
5%             95%       1,084    272      122      69       45
5%             90%       858      216      97       55       36
5%             80%       620      156      71       40       27
10%            95%       858      215      96       55       36
10%            90%       658      166      74       42       28
10%            80%       452      114      51       29       19
Case 1: $H_0: \mu \le C$ vs $H_A: \mu > C$; Case 2: $H_0: \mu \ge C$ vs $H_A: \mu < C$
Table 5-2. Sample Size Needed for a One-Sample Test for a Population
Proportion, P, at a 5% Significance Level
$P_0$                    $|P_1 - P_0|$
Case 1    Case 2    5%       10%      15%      20%
Significance level = 5%. Power = 95%
10%       90%       468      133      65       39
20%       80%       751      200      93       54
30%       70%       947      244      110      63
40%       60%       1056     266      118      65
50%       50%       1077     266      115      63
60%       40%       1012     244      103      54
70%       30%       860      200      80       39
80%       20%       621      133      46       NA
90%       10%       291      NA       NA       NA
Significance level = 5%. Power = 90%
10%       90%       362      102      49       30
20%       80%       589      156      72       42
30%       70%       746      191      87       49
40%       60%       834      210      93       52
50%       50%       853      211      92       50
60%       40%       804      195      83       44
70%       30%       686      161      66       33
80%       20%       498      109      40       NA
90%       10%       239      NA       NA       NA
Significance level = 5%. Power = 80%
10%       90%       253      69       33       20
20%       80%       419      109      50       29
30%       70%       534      136      62       35
40%       60%       600      151      67       38
50%       50%       617      153      67       37
60%       40%       583      142      61       33
70%       30%       501      119      50       26
80%       20%       368      83       32       NA
90%       10%       184      NA       NA       NA
Case 1: $H_0: P \le P_0$ vs $H_A: P > P_0$; Case 2: $H_0: P \ge P_0$ vs $H_A: P < P_0$; $P = P_1$ at the boundary
of the gray region determined in Step 6 of the DQO Process.
Table 5-3. Sample Size Needed for a One-Sample Test for a
Population Proportion, P, at a 10% Significance Level
$P_0$                    $|P_1 - P_0|$
Case 1    Case 2    5%       10%      15%      20%
Significance level = 10%. Power = 95%
10%       90%       378      109      54       33
20%       80%       601      161      75       44
30%       70%       753      195      88       50
40%       60%       837      211      93       52
50%       50%       852      210      91       49
60%       40%       798      191      80       42
70%       30%       676      156      62       30
80%       20%       484      102      34       NA
90%       10%       221      NA       NA       NA
Significance level = 10%. Power = 90%
10%       90%       284      81       40       24
20%       80%       456      121      57       33
30%       70%       575      148      67       38
40%       60%       641      161      72       40
50%       50%       654      161      70       38
60%       40%       615      148      63       33
70%       30%       522      121      49       24
80%       20%       377      81       28       NA
90%       10%       177      NA       NA       NA
Significance level = 10%. Power = 80%
10%       90%       188      53       25       15
20%       80%       308      81       38       22
30%       70%       392      100      45       26
40%       60%       439      110      49       28
50%       50%       449      111      49       27
60%       40%       424      103      44       24
70%       30%       363      86       36       18
80%       20%       265      59       22       NA
90%       10%       130      NA       NA       NA
Case 1: $H_0: P \le P_0$ vs $H_A: P > P_0$; Case 2: $H_0: P \ge P_0$ vs $H_A: P < P_0$; $P = P_1$ at the
boundary of the gray region determined in Step 6 of the DQO Process; NA = not applicable.
Table 5-4. Sample Size Needed for a Two-Sample t-Test
Significance             Effect Size
Level          Power     10%      20%      30%      40%      50%
5%             95%       2,166    542      242      136      88
5%             90%       1,714    429      191      108      70
5%             80%       1,238    310      139      78       51
10%            95%       1,714    429      191      108      69
10%            90%       1,315    329      147      83       53
10%            80%       902      226      101      57       37
Case 1: $H_0: \mu_1 - \mu_2 \le \delta_0$ vs $H_A: \mu_1 - \mu_2 > \delta_0$; Case 2: $H_0: \mu_1 - \mu_2 \ge \delta_0$ vs $H_A: \mu_1 - \mu_2 < \delta_0$. In either
case, $\delta_1 = (\mu_1 - \mu_2)$ at the boundary of the gray region determined in Step 6 of the DQO
Process, and the effect size is $100 \times |\delta_1 - \delta_0| / \hat{\sigma}$.
See Table 24.1 of Cohen (1988) for a more extensive tabulation.
Table 5-5. Sample Size Needed for a Two-Sample Test for
Proportions at a 5% Significance Level
$P_1$                    $|P_1 - P_2|$
Case 1    Case 2    5%       10%      15%      20%
Significance level = 5%. Power = 95%
10%       90%       947      276      139      87
20%       80%       1510     406      192      114
30%       70%       1900     493      226      130
40%       60%       2116     536      240      136
50%       50%       2160     536      236      130
60%       40%       2030     493      212      114
70%       30%       1727     406      168      87
80%       20%       1250     276      106      NA
90%       10%       601      NA       NA       NA
Significance level = 5%. Power = 90%
10%       90%       750      219      110      69
20%       80%       1195     322      152      90
30%       70%       1503     390      179      103
40%       60%       1675     424      190      108
50%       50%       1709     424      187      103
60%       40%       1606     390      167      90
70%       30%       1366     322      133      69
80%       20%       990      219      84       NA
90%       10%       476      NA       NA       NA
Significance level = 5%. Power = 80%
10%       90%       541      158      80       50
20%       80%       863      232      110      65
30%       70%       1086     282      129      75
40%       60%       1209     307      138      78
50%       50%       1234     307      135      75
60%       40%       1160     282      121      65
70%       30%       987      232      96       50
80%       20%       715      158      61       NA
90%       10%       344      NA       NA       NA
Case 1: $H_0: P_1 - P_2 \le 0$ vs $H_A: P_1 - P_2 > 0$; Case 2: $H_0: P_1 - P_2 \ge 0$ vs $H_A: P_1 - P_2 < 0$;
NA = Not applicable.
Table 5-6. Sample Size Needed for a Two-Sample Test for
Proportions at a 10% Significance Level
$P_1$                    $|P_1 - P_2|$
Case 1    Case 2    5%       10%      15%      20%
Significance level = 10%. Power = 95%
10%       90%       750      219      110      69
20%       80%       1195     322      152      90
30%       70%       1503     390      179      103
40%       60%       1675     424      190      108
50%       50%       1709     424      187      103
60%       40%       1606     390      167      90
70%       30%       1366     322      133      69
80%       20%       990      219      84       NA
90%       10%       476      NA       NA       NA
Significance level = 10%. Power = 90%
10%       90%       575      168      85       53
20%       80%       917      247      117      69
30%       70%       1153     299      137      79
40%       60%       1285     326      146      83
50%       50%       1311     326      143      79
60%       40%       1232     299      129      69
70%       30%       1048     247      102      53
80%       20%       759      168      64       NA
90%       10%       365      NA       NA       NA
Significance level = 10%. Power = 80%
10%       90%       395      115      58       37
20%       80%       629      170      80       48
30%       70%       792      206      94       55
40%       60%       882      224      100      57
50%       50%       900      224      98       55
60%       40%       846      206      88       48
70%       30%       720      170      70       37
80%       20%       521      115      44       NA
90%       10%       251      NA       NA       NA
Case 1: $H_0: P_1 - P_2 \le 0$ vs $H_A: P_1 - P_2 > 0$; Case 2: $H_0: P_1 - P_2 \ge 0$ vs $H_A: P_1 - P_2 < 0$;
NA = Not applicable.
CHAPTER 6
STRATIFIED SAMPLING
6.1 OVERVIEW
Stratified sampling is a sampling design in which prior information about the population is used
to determine groups (called strata) that are sampled independently. Each possible sampling unit or
population member belongs to exactly one stratum. There can be no sampling units that do not belong
to any of the strata and no sampling units that belong to more than one stratum. When the strata are
constructed to be relatively homogeneous with respect to the variable being estimated, a stratified
sampling design can produce estimates of overall population parameters (for example, mean,
proportion) with greater precision than estimates obtained from simple random sampling. Using
proportional allocation to determine the number of samples to be selected from each stratum will
produce estimates of population parameters with precision at least as good as, and possibly better than,
estimates obtained using simple random sampling (regardless of how the strata are defined). However,
if optimal allocation is used to assign samples to the strata, and the estimates of the variance within the
strata are not close to the actual values, the level of precision in the resulting estimates may be worse
than the level of precision for simple random sampling.
Stratified random sampling also is often used to produce estimates with prespecified precision
for important subpopulations. For example, one of the most common uses of stratification is to account
for spatial variability by defining geographic strata, especially when results need to be reported
separately for particular geographic areas or regions. Strata may also be defined temporally. Temporal
strata permit different samples to be selected for specified time periods and, hence, also permit
designing the sample to support separate estimates for different time periods (for example, seasons)
with prespecified precision. Hence, temporally stratified sampling designs support accurate monitoring
of trends.
6.2 APPLICATION
The method of defining the strata depends on the purpose of the stratification. One of the
principal reasons for using a stratified design is to ensure a more representative sample by distributing
the sample throughout the spatial and/or temporal dimensions of the population. For instance, a sample
drawn with a simple random sample may not be uniformly distributed in space and/or time because of
the randomness. Such a sample may not be as representative of the population as a sample obtained
by stratifying the study area and independently selecting a sample from each stratum.
Stratification may produce gains in precision in the estimates of population characteristics. If the
investigator has prior knowledge of the spatial distribution of the study area, the strata should be
defined so that the area within each stratum is as homogeneous as possible. In addition, the strata can
be defined using reliable data on another variable that is highly correlated with the variable to be
estimated. If the sample is allocated either proportionally or optimally to the strata, the resulting
estimates will have greater precision than if no stratification were used. The variable providing the
information used to establish the strata is referred to throughout this chapter as an "auxiliary variable."
Stratification is advisable if a population is subdivided into groups and certain information is
desired separately for each group. If estimates (for example, means, proportions, etc.) are desired for
particular groups or regions, each group or region would be assigned as a separate stratum.
Stratification also is useful if different parts of a population present different sampling issues that may
need to be addressed separately. Field conditions may call for different sampling procedures for different
groups of the population in order to be efficient. This approach is facilitated by stratified sampling
because, by definition, each stratum is sampled independently of the other strata, so each stratum can
use a different statistical sampling method. If unbiased estimators of the stratum mean and variance
exist for each stratum, then one also can produce unbiased estimates of the overall mean and variance.
6.3 BENEFITS
Stratification can be useful when the implementation of different sampling designs in each
stratum could reduce costs associated with the sample selection. The strata can be defined in order to
minimize costs associated with sampling at various sites. Study sites that are close in proximity to one
another can be assigned to one stratum to minimize the travel time for a team of field personnel to take
samples at these locations. Also, if the costs of collecting samples at a portion of a study site are much
greater than the rest of the study site, the most costly portion of the site can be assigned as a stratum to
minimize sample collection costs. Groups of the population with certain characteristics, which may or
may not be the same as the primary stratification variables, can be used as strata in order to ensure that
a sufficient number of sampling units appear in the sample for estimates or other analysis of the groups.
For example, the investigator may want to stratify the country by average yearly rainfall in order to
increase the precision of estimates and may also want to stratify by EPA region to obtain estimates for
each region. Stratification can also ensure that certain rare groups of the population that are of interest
for estimates or analysis, and that may not otherwise have sufficient sample sizes, have the sample sizes
necessary to perform the desired analyses.
When stratification is based on an auxiliary variable that is adequately correlated
with the variable of interest, stratification can produce estimates with increased precision
compared with simple random sampling or, equivalently, achieve the same precision with fewer
observations. For increased precision, the auxiliary variable used to define the strata should be highly
correlated with the outcomes being measured. The amount of increase in precision over simple random
sampling depends on the strength of the correlation between the auxiliary variable and the outcome
variable being measured. Consider a situation in which a prior study had found that the amount of clay
in the soil is correlated with the amount of a chemical that remains in the soil. In this case, the
investigator could use a map of the study area showing the amount of clay in the soil to define the strata
needed to estimate the concentration of the chemical. Strata can be defined in order to minimize costs
to attain a given level of precision or to maximize precision for a given cost. Example 6-1 shows how
the appropriate use of stratification in a planned sampling design can produce estimates with increased
precision or need fewer samples as compared to simple random sampling.
6.4 LIMITATIONS
Stratified sampling needs reliable prior knowledge of the population in order to effectively
define the strata and allocate the sample sizes. The gains in the precision, or the reductions in cost,
depend on the quality of the information used to set up the stratified sampling design. Any possible
increases in precision are particularly dependent on the strength of the correlation of the auxiliary
(stratification) variable with the variable being observed in the study. Precision may be reduced if
Neyman or optimal allocation is used and if the auxiliary variable used for the optimization calculations
does not accurately reflect the variability of observations for the study.
As with simple random sampling, with a stratified sampling plan the investigator may encounter
difficulties identifying and gaining access to the sampled locations in the field. Such limitations may
reduce the expected gains in precision anticipated by using a stratified sampling scheme.
6.5 IMPLEMENTATION
6.5.1 How do you decide what sample size to use with this design?
The strata should be determined before allocating the sample sizes, and the methods used to
define the strata depend on the reasons that stratification is desired. When the strata are to be defined
according to an auxiliary variable that is correlated with the variable to be estimated, the optimal
definition of the strata is to allocate the strata so that the population included in each stratum is as
homogeneous as possible with respect to the auxiliary variable.
Section 5A.6 of Cochran (1977) offers some guidelines on how to optimally assign strata when
the auxiliary variable is continuous (i.e., consists of measured values). If the investigator is interested in
estimating the overall mean for the population, Cochran suggests defining no more than six strata and
using a procedure attributed to Dalenius and Hodges (1959) to determine the optimal cutoff values for
each of the strata based on the distribution of the second variable for the population. The steps for
determining the Dalenius-Hodges strata are given in Appendix 6-B. Section 5A.7 of Cochran (1977)
also provides a discussion and an example of the Dalenius-Hodges procedure. The value of
using a pilot study to determine the strength of the correlation between the two variables should not be
underestimated.
Once the strata have been defined, a number of options can be used to allocate the sample
sizes to each stratum. Equal allocation can be used to assign the same number of samples to be
selected within each stratum. Proportional allocation can be used to allocate the samples to the strata
so that the proportion of the total sampling units allocated to a stratum is the same as the proportion of
sampling units in the population that are classified in that stratum. As mentioned in Section 6.1,
proportional allocation can ensure that the precision of the population estimates will be at least as good
as, if not better than, the precision without the use of stratification. Optimal allocation has two options:
• Optimize the precision for a fixed study cost.
• Optimize the cost of the study for a fixed level of precision.
If the investigator has a fixed budget in order to collect the samples, the samples could be allocated so
that the results would produce the highest precision for the variable to be estimated. If the investigator
needs a specific level of precision, the samples could be allocated so that the costs in obtaining the
designated level of precision are as low as possible. A special case of the optimal allocation in which
the cost of sampling each unit is the same across all strata is Neyman allocation. As previously stated,
the extent of the benefits of the stratified sampling design, especially when the optimal sample
allocations are used, depends on the quality of the data used to set up the sampling design and the
strength of the correlation between the auxiliary variable and the variable to be estimated. However,
because the optimal and Neyman sample allocations depend on auxiliary data, the increase (or possible
decrease) in precision of the estimates as compared to simple random sampling depends on the
accuracy of the variance values used in the sample allocation calculations. Disproportionate allocation
may not work well if good estimates of variances are not available. The formulae for the sample size
allocations can be found in Appendix 6-A.
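To make the difference between proportional and Neyman allocation concrete, the sketch below uses the standard textbook forms: proportional allocation assigns $n_h = n N_h / N$, and Neyman allocation assigns $n_h = n N_h \sigma_h / \sum_k N_k \sigma_k$, where $N_h$ is the number of sampling units in stratum $h$ and $\sigma_h$ is its assumed standard deviation. These forms are assumed here to agree with Appendix 6-A, which should be treated as the authority. The function names and the stratum values in the usage lines are hypothetical.

```python
def proportional_allocation(n, stratum_sizes):
    """Allocate n samples in proportion to the number of units in each stratum."""
    total = sum(stratum_sizes)
    return [n * N_h / total for N_h in stratum_sizes]

def neyman_allocation(n, stratum_sizes, stratum_sds):
    """Allocate n samples in proportion to N_h * sigma_h (equal cost per unit)."""
    weights = [N_h * s_h for N_h, s_h in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    return [n * w / total for w in weights]

# Hypothetical strata: sizes in sampling units and assumed standard deviations.
sizes = [400, 300, 300]
sds = [2.0, 1.0, 1.0]
prop = proportional_allocation(60, sizes)
neyman = neyman_allocation(60, sizes, sds)
```

The functions return fractional allocations; rounding to whole samples while keeping the total fixed is left to the planning team. In this illustration Neyman allocation shifts samples toward the more variable first stratum, which is why poor variance estimates can make it perform worse than proportional allocation.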
6.5.2 How do you decide where to take samples with this design?
Once the strata are established, any sampling design can be used to select the samples within
each stratum. Where to select these samples will depend on the choice of sampling design that is used
(Section 6.6).
6.6 RELATIONSHIP TO OTHER SAMPLING DESIGNS
As mentioned earlier, any sampling design can be used within each stratum. The choices
include, but are not limited to, simple random sampling, quasi-random sampling, grid sampling, and
even another level of stratified sampling.
Final
EPA QA/G-5S 54 December 2002
-------
6.7 EXAMPLE
An investigator wants to estimate the average concentration of arsenic in the surface soil around
the smoke stack at a hazardous waste incinerator facility to determine if the soil has been contaminated
above the naturally occurring concentrations of arsenic for the region. Samples are to be taken within
500 meters from the smoke stack. Information gathered from prior studies indicates that the
concentration of arsenic will be higher in the area along the prevailing wind direction and that the
variability of the concentration of arsenic in the soil will be higher for clayey soils compared to sandy
soils. Because the hazardous waste incinerator facility is located along the ocean coast, the prevailing
winds flow from the east. The precision for the estimate of the concentration of arsenic can be
increased by dividing the study area into strata according to the prevailing wind direction and the type
of soil (see Figure 6-1).
Budget restrictions will only allow 60
samples to be taken from the area around the smoke
stack. The study area was stratified according to
Figure 6-1, and the Neyman allocation (described in
Section 6.5.1) was used to determine the number of
samples to be randomly selected within each
stratum. The summary statistics for the stratified
samples are shown in Table 6-1. Suppose that a
simple random sample of 60 soil samples was also
taken from the study area for comparison of the
performance of the designs. Table 6-1 shows that
taking 60 samples by simple random sampling and
stratified random sampling produce similar estimates
for the mean concentration of arsenic, but the
standard error associated with the stratified random
sample is lower (i.e., the precision is higher) than that of the simple random sample. Table 6-2 shows
that the investigator would have only needed to take 40 soil samples using stratified random sampling in
order to get a precision similar to that obtained by analysis of 60 samples taken by simple random
sampling. This result is shown by comparing the standard errors and the 95% confidence intervals
shown for the various sample sizes under stratified random sampling and simple random sampling. If a
particular precision was desired for this study (for example, a standard error of 1.00 for estimating the
mean), the investigator could reduce the costs of obtaining an estimate of the average concentration of
arsenic by using a stratified sampling design as described above instead of a simple random sampling
design.
Figure 6-1. Stratification of Area to Be
Sampled
Table 6-1. Summary Statistics for Simple and Stratified Random Samples

Design / Stratum                       # samples     mean     standard error
Simple Random Sampling                     60        19.81         4.35
Stratified Random Sampling
  Down-wind/Clayey Soil                    43        46.16         9.99
  Down-wind/Sandy Soil                      5        12.66         4.63
  Perpendicular Wind/Clayey Soil           10         9.49         2.28
  Perpendicular Wind/Sandy Soil             2        10.20         3.12
  Overall                                  60        22.94         3.68
Table 6-2. Number of Samples Needed to Produce Various Levels
of Precision for the Mean

Design                          # samples    standard error    95% Confid. Interval
Simple Random Sampling              60            4.35               ±8.69
Stratified Random Sampling          60            3.68               ±7.36
                                    40            4.51               ±9.12
                                    20            6.41               ±10.57
                                    14            7.57               ±16.35
                                     9            9.06               ±20.50
                                     8            9.73               ±22.43
                                     7           10.59               ±25.04
APPENDIX 6-A
FORMULAE FOR ESTIMATING SAMPLE SIZE SPECIFICATIONS
FOR STRATIFIED SAMPLING DESIGNS
This appendix contains formulae for several commonly used estimates of sample size n.
$L$ = number of strata
$N_h$ = total number of units in stratum $h$
$N$ = total number of units in population, $N = \sum_{h=1}^{L} N_h$
$n_h$ = number of units sampled in stratum $h$
• To calculate the overall mean and the variance of the overall mean for stratified random
sampling:
$$\bar{x}_{st} = \sum_{h=1}^{L} W_h \bar{x}_h$$
$$\text{variance of } \bar{x}_{st} = \sum_{h=1}^{L} W_h^2 \left(1 - \frac{n_h}{N_h}\right) \frac{s_h^2}{n_h}$$
where $\bar{x}_h$ is the ordinary mean of stratum $h$, and $s_h^2$ is the ordinary estimated variance of
stratum $h$.
To calculate the sample size within the stratum:
$n$ = total number of units sampled, $n = \sum_{h=1}^{L} n_h$
$\sigma_h$ = prior known standard deviation in stratum $h$
$W_h$ = stratum weight, $W_h = N_h / N$
$C$ = total budget
$C_0$ = initial fixed costs
$C_h$ = cost per sample for stratum $h$
$V$ = fixed variance
equal allocation: $n_h = \dfrac{n}{L}$
proportional allocation: $n_h = n W_h$
Neyman allocation: $n_h = n \dfrac{W_h \sigma_h}{\sum_{k=1}^{L} W_k \sigma_k}$
Note that in practice, $\sigma_h$ is replaced by $s_h$.
optimal allocation for fixed cost: $n_h = (C - C_0) \dfrac{W_h \sigma_h / \sqrt{C_h}}{\sum_{k=1}^{L} W_k \sigma_k \sqrt{C_k}}$
Note that in practice, $\sigma_h$ is replaced by $s_h$.
optimal allocation for a fixed margin of error for each stratum:
n =
2 Vh-1
where d is the "margin of error" for each estimate within the strata
APPENDIX 6-B
DALENIUS-HODGES STRATIFICATION PROCEDURE
This procedure is used to determine the optimal cut-off points for stratification using a variable
(y) that is highly correlated with the variable of interest. Often this is a continuous variable expected to
be highly correlated with the primary outcome to be measured in the study.
1. Form an initial set of K intervals that cover the entire range of observed y values. Let
[$A_{i-1}$, $A_i$] denote the endpoints of the ith interval (i = 1, 2, 3, ..., K). Count the number of
observations, $N_i$, in each interval.
2. Calculate $D_i = A_i - A_{i-1}$ and $T_i = \sqrt{N_i D_i}$.
3. For each interval i, calculate $C_i = \sum_{j=1}^{i} T_j$. That is, add all the $T_j$ from the first interval
up to, and including, interval i. This makes a cumulative count.
4. Calculate Q = Total/L where Total $= \sum_{i=1}^{K} T_i$ and L is the desired number of strata.
5. For each interval i, calculate $C_i/Q$ and round it up to the next higher integer. This now
gives the stratum number to which the observations in interval i will be classified.
For example, suppose the correlated variable y ranges from 0 to 50, and suppose L=3 strata will be
created. The Dalenius-Hodges procedure can be used to define the strata:
Interval    D_i    N_i     T_i      C_i     C_i/Q (Q = 225.3/3 = 75.1)    Rounded value
0-5           5    254     35.6     35.6          0.47                         1
5-14          9    195     41.9     77.5          1.03                         2
14-20         6    160     31.0    108.5          1.44                         2
20-30        10    135     36.7    145.2          1.93                         2
30-35         5     90     21.2    166.4          2.22                         3
35-45        10    155     39.4    205.8          2.74                         3
45-50         5     76     19.5    225.3          3.00                         3
Total             1065    225.3
It follows that the first stratum contains y-values between 0 and 5, the second stratum contains y-values
between 5 and 30, and the last stratum contains y-values between 30 and 50.
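The five steps above can be sketched in Python as follows. This is an illustrative sketch only: the function name is hypothetical, and the value $T_i = \sqrt{N_i D_i}$ is inferred from the worked table (for example, $\sqrt{254 \times 5} \approx 35.6$). A small tolerance is subtracted before rounding up so that floating-point error cannot push the final interval past stratum L.

```python
import math

def dalenius_hodges(endpoints, counts, num_strata):
    """Assign each interval to a stratum using the cumulative sqrt(N_i * D_i) rule."""
    # Step 2: interval widths D_i and T_i = sqrt(N_i * D_i).
    widths = [b - a for a, b in zip(endpoints[:-1], endpoints[1:])]
    t = [math.sqrt(n * d) for n, d in zip(counts, widths)]
    # Step 4: Q = Total / L.
    q = sum(t) / num_strata
    strata = []
    cumulative = 0.0
    for t_i in t:
        # Step 3: running cumulative total C_i.
        cumulative += t_i
        # Step 5: round C_i / Q up to the next integer (with a small tolerance).
        stratum = math.ceil(cumulative / q - 1e-9)
        strata.append(min(max(stratum, 1), num_strata))
    return strata

# Interval endpoints and counts from the worked example.
endpoints = [0, 5, 14, 20, 30, 35, 45, 50]
counts = [254, 195, 160, 135, 90, 155, 76]
strata = dalenius_hodges(endpoints, counts, 3)
print(strata)  # [1, 2, 2, 2, 3, 3, 3]
```

The output matches the "Rounded value" column of the table, giving stratum boundaries at y = 5 and y = 30.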
APPENDIX 6-C
CALCULATING THE MEAN AND STANDARD ERROR
Since it would be very difficult to estimate the number of soil samples, Nh, which could be taken
in each stratum, assign a weight, Wh, to each stratum based on the percentage of the study area
covered by the stratum. For instance, if down-wind clayey soil covers 35% of the study area, then
Wh=0.35 for this stratum. Note that the sum of the weights for all strata should equal 1.
Step 1: Calculate the sample size, $n_h$, for each stratum with a total sample size of 60
(n=60) under Neyman allocation using the equation:
$$n_h = n \frac{W_h \sigma_h}{\sum_{k=1}^{L} W_k \sigma_k}$$
The assumed population standard deviations, $\sigma_h$, and weights, $W_h$, for each
stratum were assigned as follows:

Stratum                             Weight    Population Standard    Neyman Allocation
                                    (W_h)     Deviation, σ_h         Sample Size, n_h
Down-Wind / Clayey Soil              0.35           75                     43
Down-Wind / Sandy Soil               0.15           20                      5
Perpendicular Wind / Clayey Soil     0.30           20                     10
Perpendicular Wind / Sandy Soil      0.20            5                      2
Step 2: Calculate the mean, $\bar{x}_h$, and variance, $s_h^2$, of the samples within each stratum
using the standard formulae used for Simple Random Sampling. The results are
summarized in the following table:

Stratum                             Mean      Variance    Sample Size    Weight
                                    x̄_h       s_h^2       n_h            W_h
Down-Wind/Clayey Soil               46.16     4287.84        43          0.35
Down-Wind/Sandy Soil                12.66      107.08         5          0.15
Perpendicular Wind/Clayey Soil       9.49       51.88        10          0.30
Perpendicular Wind/Sandy Soil       10.20       19.52         2          0.20
Step 3: Calculate the mean, $\bar{x}_{st}$, under stratified sampling:
$$\bar{x}_{st} = \sum_{h=1}^{L} W_h \bar{x}_h = 22.94$$
When N is very large, as it is in this example, the equation for the variance
under stratified sampling reduces to:
$$\text{variance of } \bar{x}_{st} = \sum_{h=1}^{L} \frac{W_h^2 s_h^2}{n_h} = 13.55$$
Step 4: The standard error of the stratified sampling mean is the square root of the
variance:
$$\text{standard error of } \bar{x}_{st} = \sqrt{13.55} = 3.68$$
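Steps 3 and 4 can be checked with a short Python sketch. The variable and function names are illustrative; the inputs are the stratum means, variances, sample sizes, and weights from the Step 2 table, and the variance uses the large-N form (no finite population correction).

```python
import math

# Stratum summaries from the Step 2 table.
weights = [0.35, 0.15, 0.30, 0.20]
means = [46.16, 12.66, 9.49, 10.20]
variances = [4287.84, 107.08, 51.88, 19.52]
sizes = [43, 5, 10, 2]

def stratified_mean(weights, means):
    """Weighted sum of the stratum means."""
    return sum(w * m for w, m in zip(weights, means))

def stratified_variance(weights, variances, sizes):
    """Variance of the stratified mean when N is very large."""
    return sum(w ** 2 * s2 / n for w, s2, n in zip(weights, variances, sizes))

x_st = stratified_mean(weights, means)
var_st = stratified_variance(weights, variances, sizes)
se_st = math.sqrt(var_st)

print(round(x_st, 2), round(var_st, 2), round(se_st, 2))  # 22.94 13.55 3.68
```

The down-wind clayey stratum contributes about 12.2 of the 13.55 total variance, which is why Neyman allocation places most of the 60 samples there.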
CHAPTER 7
SYSTEMATIC/GRID SAMPLING
7.1 OVERVIEW
Systematic sampling, also called grid sampling or regular sampling, consists of collecting
samples at locations or over time in a specified pattern. For example, samples might be collected from
a square grid over a set geographical area or at equal intervals over time. Systematic designs are good
for uniform coverage, ease of use, and the intuitive notion that important features of the population being
sampled will not be missed. Also, samples taken at regular intervals, such as at every node of an area
defined by a grid, are useful when the goal is to estimate spatial or temporal correlations or to identify a
pattern.
Systematic sampling is used to ensure that the target population is fully and uniformly
represented in the set of n samples collected. To make systematic sampling a probability-based design,
the initial sampling location is chosen at random. Then the remaining (n-1) sampling locations are
chosen so all n are spaced according to some pattern.
There are two major applications for systematic sampling:
Spatial designs. Samples may be collected in one, two, or three dimensions if the population
characteristic of interest has a spatial component. Sampling along a line or transect is an example of
sampling in one dimension. Sampling every node on a grid laid over an area of interest is sampling in two
dimensions. If depth or volume is of interest, samples can be taken at regular grid intervals in
three dimensions, such as uniformly spacing samples from a pile of dirt both horizontally and vertically.
Several options for systematic two-dimensional sampling in space are shown in Figure 7-1 (Gilbert,
1987). In Figure 7-1a, sample location "A" is randomly assigned and all other sampling locations
are then known once the grid is laid down. Note how all the sampling points are an equal distance
from each other, which can cause problems if the contamination of interest occurs in some fixed
pattern. In Figure 7-1b, location "A" is also selected at random and the remaining locations ("B"
through "L") within their square cells are determined randomly within each grid cell. This design has
the advantages of randomness combined with good coverage (somewhat similar to the concept of
quasi-randomness as discussed at the end of Section 5.5.2).
Figure 7-1. Systematic Designs for Sampling in Space: (a) Central Aligned Square Grid;
(b) Unaligned Grid
Temporal (periodic) designs. When samples are selected to represent a target
population that changes over time, data collectors would use a one-dimensional sample
where every kth unit is selected or a sample is collected at specific points in time.
Figure 7-2 (Gilbert, 1987) shows an example of periodic sampling. In this figure, a
systematic sample of n = 4 units is desired from a finite population of N = 15 units,
representing 15 units of time. The 15 units are displayed as a circle for illustration, as if
the units were on a clock. The systematic interval between units was determined by
computing N/n = 15/4 = 3.75, which is rounded up to 4. Then a random number
between 1 and 15 was selected; namely 7. Hence, sampling starts at the 7th unit and
every 4th unit from that point is selected.
Figure 7-2 annotations: N = 15; desired n = 4; therefore N/n = 3.75, rounded up to k = 4; the random
starting location number between 1 and 15 is 7.
Figure 7-2. Choosing a Systematic Sample of n = 4
Units from a Finite Population of N = 15 Units
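The periodic selection illustrated in Figure 7-2 can be sketched in Python as shown below. The function name is hypothetical, and the wrap-around past unit N follows the circular arrangement described in the text; the start is normally drawn at random but is fixed at 7 here to reproduce the example.

```python
import math
import random

def circular_systematic_sample(population_size, sample_size, start=None, rng=random):
    """Select every kth unit on a circle of units numbered 1..N, beginning at a random start."""
    # Sampling interval k = N/n, rounded up.
    k = math.ceil(population_size / sample_size)
    if start is None:
        start = rng.randint(1, population_size)
    # Step around the circle, wrapping past unit N back to unit 1.
    return [((start - 1 + j * k) % population_size) + 1 for j in range(sample_size)]

units = circular_systematic_sample(15, 4, start=7)
print(units)  # [7, 11, 15, 4]
```

Omitting the `start` argument draws the starting unit at random, which is what makes the design probability-based.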
Grid designs can vary in their shape, orientation, and selection criteria for the initial grid node.
This flexibility, the intuitive appeal, and easily explained protocol for taking regular samples make
systematic sampling one of the more popular and defensible sampling designs.
7.2 APPLICATION
Systematic sampling is often used in environmental applications because it is practical and
convenient to implement in the field. It often provides better precision (i.e., smaller confidence intervals,
smaller standard errors of population estimates) and more complete coverage of the target population
than random sampling. Systematic sampling is appropriate if either of the following conditions pertains:
• There is no information about a population and the objective is to determine if there is a
pattern or correlation among units, or
• There is a suspected or known pattern or correlation among units at the site and the
objective is to estimate the shape of the pattern or the strength of the correlation.
Systematic sampling designs are used in three situations:
1. When making an inference about a population parameter, such as the mean, from
environmental measurements that are known to be heterogeneous. A systematic design
is only one of many sampling designs that may be used for making an inference about a
population parameter. However, if the concentrations over space or time in the target
population are correlated so that the data show definite spatial or temporal patterns,
then systematic sampling will often be more efficient (provide a more precise answer for
a given amount of sampling) than random sampling. Many automatic samplers use
systematic sampling due to the mechanical necessity of taking samples at fixed intervals.
2. When estimating a trend or identifying a spatial or temporal correlation. A systematic
design is well suited for this type of problem because a constant distance or time
interval between sampling locations or times allows for the efficient estimation of trends
and patterns over time or space, as well as the correlation structure needed for
modeling. Random sampling would typically need more samples to achieve the same
amount of information about the patterns and correlation.
3. When looking for a "hot spot" or making a statement about the maximum size object
that could be missed with a given sampling design. If a systematic square, rectangular
or triangular grid design is laid over a study site, then it is possible to determine the
probability that any size of an approximately elliptical region of elevated concentration
("hot spot") will be hit by a sampling point on the grid. One can also determine the
spacing between sampling locations needed to hit an elliptical target with specified
probability.
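The hot spot calculation in item 3 can be illustrated for a simplified case. The sketch below assumes a circular (not elliptical) hot spot of radius r and a square grid of spacing L placed at random; for r no larger than L/2 the hit probability is the area ratio πr²/L². The function names are hypothetical, and the Monte Carlo routine is included only as a cross-check. It is not the method used by the nomographs or software cited later in this chapter.

```python
import math
import random

def hit_probability_small_circle(radius, spacing):
    """Exact hit probability for a circular hot spot when radius <= spacing / 2."""
    if radius > spacing / 2:
        raise ValueError("formula only valid when radius <= spacing / 2")
    return math.pi * radius ** 2 / spacing ** 2

def spacing_for_probability(radius, probability):
    """Square-grid spacing that hits a circular hot spot with the given probability."""
    if not 0 < probability <= math.pi / 4:
        raise ValueError("formula only valid for 0 < probability <= pi/4")
    return radius * math.sqrt(math.pi / probability)

def simulate_hit_probability(radius, spacing, trials=100_000, seed=1):
    """Monte Carlo estimate: drop the hot spot centre uniformly in one grid cell."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.uniform(0, spacing)
        y = rng.uniform(0, spacing)
        # Distance from the centre to the nearest grid node (cell corner).
        dx = min(x, spacing - x)
        dy = min(y, spacing - y)
        if math.hypot(dx, dy) <= radius:
            hits += 1
    return hits / trials

exact = hit_probability_small_circle(10, 25)
estimate = simulate_hit_probability(10, 25)
print(round(exact, 3))  # 0.503
```

For elliptical targets, or for rectangular and triangular grids, the geometry is more involved and the published nomographs or dedicated software should be used instead.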
If distinct features exist at a site, such as an ecological cluster or a groundwater plume, then
collecting data on a regular grid is the most efficient approach to ensuring such features are actually
detected. However, if the scale of the pattern or feature of interest is smaller than the spacing between
sampling locations, then the systematic pattern of sampling is not an efficient design unless the spacing
between sampling locations is reduced or some other procedure such as composite sampling is
introduced into the design.
Systematic sampling would be inappropriate if a known pattern of contamination coincides with
the regularity of the grid design. Such a coincidence would result in an overestimation or
underestimation of a particular trait in the target population of interest. For example, suppose a line of
trees resulted in soil mounds with high contamination along the tree line and a grid line was aligned with
the tree line. Then, a decision about the average contamination over an entire area would be upwardly
biased by so many samples collected in the high concentration area along the tree line. If prior
information is available on the possible patterns of contamination, this information may be important in
selecting grid spacing, grid orientation, and whether or not systematic sampling designs have an
advantage over other designs.
What are some more advanced findings on systematic sampling?
Section 8.2 of Cochran (1977) states that systematic sampling can be considerably more
precise than simple random sampling or even stratified random sampling in some situations. He states:
"Systematic sampling is more precise than simple random sampling if the variance within the systematic
samples is larger than the population variance as a whole. Systematic sampling is precise when units
within the same sample are heterogeneous and is imprecise when they are homogeneous." Cochran
demonstrates that systematic sampling is capable of providing enhanced performance over other
designs depending on the properties of the target population. He provides results from a study of 13
different data sets from natural populations showing a consistent gain in precision using systematic
sampling.
Section 8.3 of Gilbert (1987) also discusses the relative performance of systematic sampling for
the following types of population structure:
• Populations in random order
• Populations with linear trends
• Populations with periodicities
• Populations with correlations between values in close proximity
Two observations can be made. First, for populations in random order, systematic sampling offers
convenience. An example of a random order population might be radioactive fallout from atmospheric
nuclear weapons tests that is uniformly distributed over large areas of land. Second, if the population
consists entirely of a linear trend, systematic sampling will, on the average, give a smaller variance of $\bar{x}$
(sampling error of the sample mean) than simple random sampling. However, stratified random
sampling will, on average, give a smaller variance of $\bar{x}$ than either systematic sampling or simple
random sampling.
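The ordering stated for a purely linear trend can be checked numerically. The sketch below is an illustration under simple assumptions: the population is y_i = i for i = 1, ..., nk, the stratified design draws one unit at random from each block of k consecutive units, and the function name is hypothetical. All three variances are computed exactly rather than by simulation.

```python
from statistics import mean, pvariance, variance

def variances_for_linear_trend(n, k):
    """Variance of the sample mean for y_i = i, i = 1..n*k, under three designs."""
    N = n * k
    y = list(range(1, N + 1))
    # Simple random sampling without replacement: (1 - n/N) * S^2 / n.
    v_srs = (1 - n / N) * variance(y) / n
    # Systematic sampling: the k possible samples (one per start) are equally likely.
    sys_means = [mean(y[start::k]) for start in range(k)]
    v_sys = pvariance(sys_means)
    # Stratified sampling: one unit drawn at random from each block of k units.
    strata = [y[h * k:(h + 1) * k] for h in range(n)]
    v_str = sum(pvariance(s) for s in strata) / n ** 2
    return v_srs, v_sys, v_str

v_srs, v_sys, v_str = variances_for_linear_trend(n=10, k=10)
print(v_srs, v_sys, v_str)
```

For n = 10 and k = 10 the stratified variance is the smallest and the simple random sampling variance is the largest, consistent with the second observation above.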
A comprehensive study by Yfantis, Flatman, and Behar (1987) discusses the level of efficiency
and accuracy of different grid types. They conclude that an equilateral triangular grid works slightly
better for the majority of the cases they studied. However, this study did not include the effects of a
second or additional phases of sampling. It is possible that when a multiple time period or phased
sampling design is planned, the specific type of first-phase sampling grid may be less important than
using geostatistical techniques (such as geostatistical simulations) to place second-phase samples in
locations that most reduce probabilities of estimation errors (EPA, 1996b).
7.3 BENEFITS
Systematic/grid sampling has the following benefits:
• Uniform, known, complete spatial/temporal coverage of the target population is
possible. A grid design provides the maximum spatial coverage of an area for a given
number of samples.
• The design and implementation of grids is relatively straightforward and has intuitive
appeal; field procedures can be written simply. Once an initial point is located, the
regular spacing allows field teams to easily locate the next sampling point, except for
unaligned or random samples within the grid structure.
• Multiple options are available for implementing a grid design. Often, sampling programs
are executed in phases. The initial phase uses broad-scale grids to look for any kind of
activity or hit. Once the general area or time frame of the activity of interest has been
identified, smaller-scale grids are used to refine the estimates. Alternatively, during a
single phase, the total area can be subdivided into areas based on the likelihood of
finding properties of interest and different grid spacings used in each sub-area. In
addition, one can overlay multiple grids, orient multiple grids in opposite directions,
intermix fine-mesh grids with large-mesh grids, and still maintain the constant spacing
desired for certain applications, such as estimating the correlation function (i.e.,
variogram). Standard formulae for estimating sample size and population parameters
are adjusted to account for these variations.
• Regularly spaced or regularly timed samples allow for spatial and temporal correlations
to be calculated, assuming the pattern of interest is larger than the spacing of the
sampled nodes. If correlation over space or time may be present and there are distinct
features or patterns in the population to be sampled, constant spacing of samples is
often a good option for estimating the features and making predictions of unsampled
areas.
• Grid designs can be implemented with little to no prior information about a site. The
only inputs needed are the total area to be covered and the number of samples (or
alternatively, the grid spacing) to be used. Grid sampling is often used for pilot studies,
scoping studies, and exploratory studies using the assumption that there are no patterns
or regularities in the distribution of the contaminant of interest.
Many studies have been performed using simulated data sets to compare the efficiency of
alternative sampling designs. All such studies conclude that the overall performance of the design is
influenced as much by particular features of the population to be sampled, and by the estimators used
for estimating population parameters of interest, as by the type of design chosen.
What are the results from some more advanced studies?
In a study on trace elements in contaminated soil to assess the impact of contaminated soil on
the environment and on agricultural activities, Wang and Qi (1998) found that given a certain sampling
density, systematic sampling had better estimation performance than either a stratified or a random
sampling design.
In a study on assessing the percentage cover of crop residue to estimate soil erosion, Li and
Chaplin (1995) found that systematic sampling was more precise than random sampling for both corn
and soybean residue in most cases. Crop residue is plant material left on the field surface after harvest.
Measuring the crop residue cover on the soil surface is essential in the management of soils to reduce
erosion. Li and Chaplin laid grid frames on top of a picture taken of fields with corn and soybean
residue. The image was then read into a computer program that randomly changed the position of the
grid on the picture. Light densities recorded the reading of coverage at each node. The grid design
compared favorably to a design where random locations were sampled for coverage readings, using the
same number of sampling points as used in the systematic sampling.
In another study, Li and Chaplin (1998) considered both one- and two-dimensional sampling
designs for estimating crop residue coverage. Although the line transect method is widely used, no
rigorous study exists on its precision. Li and Chaplin used a computer-generated virtual field surface
and applied various sampling designs. They found the square grid was more precise than the line
transect methods because of the smaller coefficient of variation over a wide range of sampling points
and residue cover.
7.4 LIMITATIONS
Systematic/grid sampling may not be as efficient as other designs if prior information is available
about the population. Such prior information could be used as a basis for stratification or identifying
areas of higher likelihood of finding population properties of interest.
If the population properties of interest are aligned with the grid, systematic/grid sampling raises
the possibility of an overestimation or underestimation (bias) of a population characteristic. Caution
should be used if there is a possibility of a cyclical pattern in the unit or process to be sampled that
might match the sampling frequency. For example, one would not want to take air samples every
Monday morning if a nearby plant always pressure-cleaned the duct work on Monday morning.
As mentioned earlier, a single systematic sample cannot be used to get a completely valid
estimate of the standard error of the mean, i.e., variance of the mean, without some assumptions about
the population. This could result in an inaccurate calculation for the confidence interval of the mean.
Several approximate methods have been proposed by Wolter (1984) and illustrated in Section 8.6 of
Gilbert (1987). One option is to take multiple sets of systematic samples, each with a randomly
determined starting point, and calculate an empirical estimate of the standard error of the mean. The
use of multiple sets of systematic samples has to be balanced against the cost or feasibility of using the
sampling designs incorporating compositing. Methods for estimating the variance of the mean
developed for simple random sampling plans can be used with confidence only when the population is
in random order.
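The option of taking multiple systematic samples, each with its own random start, can be sketched as follows. This is a simplified illustration: the function name and the synthetic data are hypothetical, the starts are drawn without repetition, and the standard error is computed from the spread of the replicate means without a finite population correction.

```python
import random
from statistics import mean, stdev

def replicated_systematic_estimate(values, interval, replicates, seed=None):
    """Mean and empirical standard error from several systematic samples."""
    rng = random.Random(seed)
    # Distinct random starting positions, one per systematic sample.
    starts = rng.sample(range(interval), replicates)
    # Each replicate takes every `interval`-th value from its start.
    replicate_means = [mean(values[s::interval]) for s in starts]
    estimate = mean(replicate_means)
    std_error = stdev(replicate_means) / len(replicate_means) ** 0.5
    return estimate, std_error

# Hypothetical ordered measurements with a mild trend plus noise.
data_rng = random.Random(0)
values = [10 + 0.05 * i + data_rng.gauss(0, 1) for i in range(120)]

estimate, std_error = replicated_systematic_estimate(values, interval=12, replicates=4, seed=3)
print(round(estimate, 2), round(std_error, 2))
```

Each replicate here uses 10 of the 120 units, so four replicates measure 40 units in total; the trade-off between more replicates and a denser single grid is the cost consideration noted in the text.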
7.5 IMPLEMENTATION
Systematic sampling designs are relatively straightforward to implement. You need to know
how many samples to take and where to take them.
7.5.1 How do you decide how many samples to take?
Many of the sample size formulae provided for simple random sampling (e.g., the sample size
formula for estimating a mean provided in Chapter 4) can be used for systematic sampling as long as
there are no strong cyclical patterns, periodicities, or significant spatial correlations between pairs of
sample locations not introduced as part of the grid or systematic process. For the hot spot problem,
there are nomographs provided in Section 10.1 of Gilbert (1987) and a computer program called
ELIPGRID PC (Davidson, 1995) for calculating the optimal grid spacing for a hot spot of prespecified
size and shape with a specified confidence of finding the hot spot. Li and Chaplin (1998) discuss how
to design grid sampling patterns with the least number of sampling points to achieve a specified
precision based on results.
7.5.2 How do you decide where to take samples?
There are many variations on patterns for regular spacing of systematic samples. Patterns
include squares, rectangles, triangles, circles, and hexagons. Basic geometry can be used to determine
internodal spacing. For example, for the two-dimensional sampling problem, EPA has detailed
guidance on how to locate samples using a systematic sampling design (EPA, 1989). Figure 7-3, taken
from that document, summarizes how to lay out a square grid. Once a sample size n and the area A to
be sampled have been specified, Equations 7-1 and 7-2 can be used to calculate the spacing between
adjacent sampling locations. For the square grid, the distance L between the vertical and horizontal
parallel lines is:
$$L = \sqrt{\frac{A}{n}} \qquad (7\text{-}1)$$
For the triangular grid, the distance L becomes:
$$L = \sqrt{\frac{A}{0.866\,n}} \qquad (7\text{-}2)$$
Figure 7-3. Locating a Square Grid Systematic Sample: (1) Select initial random point.
(2) Construct coordinate axis going through initial point. (3) Construct lines parallel to
vertical axis, separated by a distance of L. (4) Construct lines parallel to horizontal axis,
separated by a distance of L. (Panel axes: X from 0 to 175, Y from 0 to 100.)
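The spacing calculation and the square grid layout of Figure 7-3 can be sketched in Python. The sketch assumes the usual geometric relationships, L = sqrt(A/n) for a square grid and L = sqrt(A/(0.866 n)) for a triangular grid, and the function names are hypothetical. The grid is laid over a rectangular bounding box; nodes falling outside an irregular study area would still need to be discarded.

```python
import math
import random

def square_grid_spacing(area, n):
    """Spacing L so that n square cells of side L cover the area."""
    return math.sqrt(area / n)

def triangular_grid_spacing(area, n):
    """Spacing L for an equilateral triangular grid (each node covers 0.866 * L**2)."""
    return math.sqrt(area / (0.866 * n))

def square_grid_nodes(x_min, x_max, y_min, y_max, spacing, rng=random):
    """Nodes of a square grid that passes through a random start point in the box."""
    # Step 1 of Figure 7-3: select the initial random point.
    x0 = rng.uniform(x_min, x_max)
    y0 = rng.uniform(y_min, y_max)
    # Shift the start back to the first grid line at or above each lower bound.
    first_x = x0 - math.floor((x0 - x_min) / spacing) * spacing
    first_y = y0 - math.floor((y0 - y_min) / spacing) * spacing
    nodes = []
    x = first_x
    while x <= x_max:
        y = first_y
        while y <= y_max:
            nodes.append((x, y))
            y += spacing
        x += spacing
    return nodes

# Area and sample size from the example in Section 7.7.1.
square_L = square_grid_spacing(14025, 30)
triangular_L = triangular_grid_spacing(14025, 30)
nodes = square_grid_nodes(0, 200, 0, 100, square_L, rng=random.Random(5))
print(round(square_L, 1), round(triangular_L, 1))  # 21.6 23.2
```

Because the start point is random, the number of nodes falling inside the study area varies slightly from one layout to the next and will not always equal n exactly.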
For one-dimensional sampling, the procedure theoretically is even simpler, but the complexities
for the one-dimensional problem come in the application. For example, the line transect method is used
extensively by U.S. Department of Agriculture technicians as a quick means to estimate agricultural
conditions, such as plant coverage. To conduct a measurement in a certain area, a cord with 50 to 100
equally spaced beads is stretched diagonally across the crop rows. Using the same point on each
bead—for example, the leading edge—those beads are counted that have the plant characteristic of
interest under them when viewed directly from above. This count is divided by the total number of
beads on the cord to give an observation of the percent occurrence. An average of three to five
observations in the area is used to estimate field totals. The transect length, size of the cord, and
marker spacing are part of the protocol.
For more discussion of the diagonal line transect method, refer to the MidWest Plan Service
(MWPS, 1992). Also, see Li and Chaplin (1998) for more detailed information on implementing this
method.
7.6 RELATIONSHIP TO OTHER SAMPLING DESIGNS
Systematic sampling can be used in place of random sampling in many of the designs discussed
in this document. For example, sampling on a grid pattern can be conducted within each stratum of a
stratified sampling plan (Chapter 6). The key criteria for using a systematic design are that a random
starting location be identified for the selection of the initial unit and that the grid layout not coincide with a
characteristic of interest in the population.
For example, the Environmental Monitoring and Assessment Program uses a sampling strategy
that has multiple stages and involves aspects of stratified and systematic sampling. The first stage of the
design is a triangular grid covering the conterminous United States. The grid is randomly situated over
the U.S. land mass; the interpoint distance along the grid is approximately 27 kilometers, and the ratio
of area to number of grid points is approximately 635 square kilometers per grid point. The grid design
is good for measuring those ecological resources that do not change position over the time of the survey
and that need to be sampled repeatedly over time. The multistage design permits the design to be
tailored to the resources of interest and purposes of the reporting. During the first stage, data may be
collected at random sample grid points; on the basis of these data, informed choices can be made for
the definition, stratification, and so on of second and lower stage units. In preparation for the second
stage, a randomly placed hexagonal template is constructed over the region. The typical size of the
template is 16 hexagons per grid point (Cox et al., 1995).
The combination of systematic and random sampling was demonstrated in a study by Cailas et
al. (1995) in proposing a methodology for an accurate estimation of the total amount of materials
recycled. One objective of this comprehensive study of the recycling infrastructure in Illinois was to
make an accurate estimation of the amount of total material recycled. It was found that responses from
a small number of previously identified critical facilities were essential for an accurate estimation of the
total amount of material recycled. The combined design consisted of systematically sampling the critical
facilities and randomly sampling the remaining ones. This application yielded an accurate estimate with
less than 1% difference from the actual amount recycled. This was done with only 15% of the total
number of recycling facilities included in the critical facilities subpopulation.
7.7 EXAMPLES
7.7.1 Implementing Triangular Sampling
This example is taken from EPA (1989, 1992). Suppose 30 samples were to be taken from an
area of 14,025 m². This area is shown in gray in Figure 7-4.
[Figure 7-4: ... Sampling Grid. Horizontal axis: X Coordinate (meters), 0 to 200.]
The following steps are performed:
1. ... X_min = 0, Y_min = 0, X_max = 200, and Y_max = 100.
2. ... R1 and R2 ... 0.820 and 0.360.
3. ... X_min + R1 (X_max - X_min) ... Y_min + R2 (Y_max - Y_min) ...
4. ...
5. Use the formula for ...
• Using remotely sensed information (aerial photographs and/or spatially referenced
databases as found in a geographic information systems) to identify locations to be
studied.
• Using distance along a pipeline (longer distance implying lower levels of a contaminant)
to approximate the relative concentrations of a contaminant at various distances.
A simple ecological example will illustrate the ranked set sampling approach (based on Stokes
and Sager, 1988); a more detailed lead contamination example follows in Section 8.2. The
recommended step-by-step process for setting up an ranked set sampling design is presented in
Appendix 8-A. Suppose the average individual volume of the trees on a property needs to be
estimated. Begin by randomly selecting two trees and judge by eye which tree has the most volume.
Mark the smaller tree to be carefully measured for volume and ignore the other tree. Next, randomly
select another two trees. Mark the larger of these two trees and ignore the other tree. Then repeat
this procedure, alternatively marking the smaller of the first two trees, then the larger of the second two
trees. Repeat this procedure a total of 10 cycles for a total of 40 trees. Twenty of the trees will have
been marked and 20 ignored. Of the 20 marked trees, 10 are from a stratum of generally smaller trees
and 10 are from a stratum of generally larger trees. Determine the volume of each of the 20 marked
trees by careful measurement and use that measurement to estimate the average volume per tree on the
lot. In this illustration there were 10 cycles and 2 trees marked per cycle. In practice, the number of
trees marked per cycle (the "set size") and the number of cycles is determined using a systematic
planning process, as illustrated in Appendix 8-A.
Example of Using Ranked Set Sampling to Estimate The Mean Lead Concentration in Soil
Suppose a future residential area is suspected of having lead concentrations in surface soil that
exceed background concentrations. As part of the risk assessment process, the soil of the area will be
sampled to estimate the mean lead concentration. Prior studies have shown that x-ray fluorescence
(XRF) measurements of lead in soil obtained using a hand-held in-situ detector closely correlate with
laboratory measurements of lead in soil at the same locations. Furthermore, it was determined that the
cost of taking the XRF measurements in the field was very low compared to the cost of laboratory
measurements for lead. (Cost considerations are discussed in Appendix 8-A.) Hence, ranked set
sampling was selected for data collection instead of simple random sampling (see Appendix 8-A for
guidance on how to determine if ranked set sampling is preferred over simple random sampling).
Suppose the systematic planning process employed determined that n - 12 soil samples should
be collected and measured for lead in the laboratory in order to meet the acceptance and performance
criteria for this study (i.e., to have 95% confidence that the estimated mean computed using laboratory
lead measurements would be within 25% of the true mean). Also, in order to obtain information to
properly compute the variance of this estimated mean, the following replication process was used to
obtain the 12 samples. Specifically, m = 3 field samples (the "set size") were collected in each of r = 4
Final
EPAQA/G-5S 78 December 2002
-------
CHAPTERS
RANKED SET SAMPLING
8.1 OVERVIEW
This chapter describes and illustrates ranked set sampling, an innovative sampling design
originally developed by Mclntyre (1952). The unique feature of ranked set sampling is that it combines
simple random sampling with the field investigator's professional knowledge and judgment to pick
places to collect samples. Alternatively, on-site measurements can replace professional judgment when
appropriate. The use of ranked set sampling increases the chance that the collected samples will yield
representative measurements; that is, measurements that span the range of low, medium, and high
values in the population. This results in better estimates of the mean as well as improved performance
of many statistical procedures such as testing for compliance with a risk-based or background-based
(reference-based) standard. Moreover, ranked set sampling can be more cost-efficient than simple
random sampling because fewer samples need to be collected and measured.
The use of professional judgment in the process of selecting sampling locations is a powerful
incentive to use ranked set sampling. Professional judgment is typically applied by visually assessing
some characteristic or feature of various potential sampling locations in the field, where the
characteristic or feature is a good indicator of the relative amount of the variable or contaminant of
interest that is present. For example, the relative amounts of a pollutant in randomly selected sampling
spots may be assessed based on the degree of surface or subsurface soil staining, discoloration of soil,
or the amount of plant defoliation in each spot. Similarly, the yield of a plant species in randomly
selected potential 1 meter by 1 meter field plots may be visually assessed based on the density, height,
or coloration of vegetation in each plot. This assessment ranks the visually assessed locations from
smallest to largest with respect to the variable of interest; it is then used as described in this chapter to
determine which spots to actually sample.
In some situations, a more accurate assessment of the relative amounts of a pollutant present at
field locations can be provided by an inexpensive on-site measurement. Indeed, the sensitivity and
accuracy of in-situ detectors has increased greatly in recent years. Some examples include the
following:
• Using ultraviolet fluorescence in the field to measure (screen) for BTEX (benzene,
toluene, ethyl benzene, and xylene) and PAHs (polyaromatic hydrocarbons) in soil.
• Using X-ray fluorescence in the field to measure lead or other metals in soil.
• Using total organic halide (TOX) measurements of soil as a screening measurement for
volatile organic solvents.
Final
EPAQA/G-5S 77 December 2002
-------
Chapter 10 of Gilbert (1987). In most situations the triangular grid is more efficient at detecting hot
spots than the square or rectangular grid designs.
In summary, if nothing is known about the spatial characteristics of the target population, grid
sampling is efficient in finding patterns or locating rare events unless the patterns or events occur on a
much finer scale than the grid spacing. If there is a known pattern or spatial or temporal characteristic
of interest, grid sampling may have advantages over other sampling designs depending on what is
known of the target population and what questions are being addressed by sampling.
Final
EPAQA/G-5S 76 December 2002
-------
7.7.5 Geostatistical Applications
When there is spatial or temporal dependence, moving from one point to another nearby
location usually results in values that do not change dramatically. Samples close together will tend to
have more similar values than samples far apart. This is often the case in an environmental setting. The
method chosen to estimate an overall site mean, as well as the site variance, must properly account for
the pattern of spatial continuity. Any non-random or partially random sampling scheme (including a
systematic grid design) will tend to produce biased estimates if not adjusted for the degree of spatial
correlation. There exist techniques to minimize the biasing impact of spatial correlation while generating
reasonable estimates of the mean.
EPA has produced guidance for geostatistical soil sampling (EPA, 1996b). Sampling in
support of geostatistical analysis is an important topic and discussed in detail in this EPA document.
One important component of geostatistics is the variogram. The variogram is a plot of the variance of
paired sample measurements as a function of the distance between samples. Samples taken on a
regular grid are desirable for estimating the variogram. While all regular grids tend to work reasonably
well in geostatistical applications, there are differences in efficiency depending on the type of grid
pattern chosen. The most common grid types include square, triangular, and hexagonal patterns. Entz
and Chang (1991) evaluated 16 soil sampling schemes to determine their impact on directional sample
variograms and kriging. They concluded that for their case study, grid sampling needs more samples
than stratified random sampling and the stratified-grid design, but the accuracy of the kriged estimates
was comparable for all sampling designs. They also found that the variograms that were estimated from
sample data collected from stratified and grid designs led to the same conclusion about the spatial
variability of the soil bulk density (the subject of the study).
7.7.6 Hot Spot Problem Application
One application for using grid sampling that is widely encountered in environmental settings is in
the spatial context of searching for hot spots. The problem can be formulated several ways:
• What grid spacing is needed to hit a hot spot with specified confidence?
• For a given grid spacing, what is the probability of hitting a hot spot of a specified size?
• What is the probability a hot spot exists when no hot spots were found by sampling on
a grid?
For this application, sampling over a gridded area at the nodes is used to search for an
object(s) of interest or, alternatively, to be able to state that an object of a specified size cannot exist if a
grid node was not intersected For example, the sampling goal may be to find if at least one 55-gallon
drum is buried in an area. Optimal grid spacings for the hot spot problem have been worked out for a
range of relative object sizes and orientations. The hot spot problem is discussed extensively in
Final
EPA QA/G-5S 75 December 2002
-------
regional basis with known confidence. The Environmental Monitoring and Assessment Program's
sampling design is based on a systematic, triangular grid (also see discussion in Section 7.6). The grid
is used to select a sample in a manner analogous to the National Stream Survey. For example, for
sampling lakes, each lake is identified by its "center" and a grid node identifies a lake to be included in
the sample as the lake that has a center closest to the grid node. The probability of sampling a given
lake is proportional to the area of the polygon enclosing the region closer to that lake's center than to
any other lake's center. Larger lakes have a higher probability of being included in the sample
(Stehman and Overton, 1994).
When estimating abundance for various animals, samples are often taken along a transect at
regular intervals. This is a form of grid sampling. A pronghorn (antelope) abundance study evaluated
the efficiency of systematic sampling versus simple random sampling versus probability proportional to
size sampling (Kraft et al, 1995). The total number of pronghorn was already known; this was a
simulation study to evaluate alternative sampling plans. The sampling unit was a 0.8-km-side linear
transect variable in length according to size and shape of the study area. Six different study areas were
used. A plane flew along the transect and when a pronghorn was sighted, the pilot circled until the herd
could be counted. The goal was to estimate total abundance of pronghorn in an area. For the
systematic sampling, the sampling units (transects of different lengths) in an area were numbered; after
the first unit was randomly chosen, every pth unit following was selected. For this study, it was found
that stratification combined with accurate estimates of optimal stratum sample sizes increased precision,
reducing the mean coefficient of variation from 33 without stratification to 25 with stratification. Cost,
however, increased with stratification by 23%.
7.7.4 Groundwater Applications
For sampling groundwater in fixed wells over time, a systematic sample in time is usually
preferred over a simple random sample in time. There are several reasons for this preference:
extrapolating from the sample period to future periods is easier with a systematic sample than a simple
random sample; seasonal cycles can be easily identified and accounted for in the data analysis; a
systematic sample will be easier to administer because of the fixed schedule for sampling times; and
most groundwater samples have been traditionally collected using a systematic sample, making
comparisons to background more straightforward.
EPA guidance on groundwater sampling for evaluating attainment of cleanup standards
(EPA, 1992) suggests a variation of systematic sampling when periodic seasonal variations or other
repeated patterns are suspected. Several variations are described and recommended depending on the
sampling goal as biased estimates may result unless the systematic sample has a spacing small enough to
characterize both high and low concentrations. For example, the goals described include identifying or
characterizing the pattern of contamination in an aquifer, obtaining comparable period-to-period
samples, and making comparisons to background when there are large seasonal fluctuations in the data.
Final
EPAQA/G-5S 74 December 2002
-------
, 14,025
L=fcir23-23-23
6. A line parallel to the x-axis through the point (164,36) is drawn; points are marked off
23 meters apart from this line as shown in Figure 7-4.
7. The midpoint between the last two points along the line is found and a point is marked
at a distance (0.866 x 23) = 19.92 (i.e. 20) meters perpendicular to the line at that
midpoint. This point is the first sample location on the next line.
8. Points at distance L=23 meters apart are marked on this new line.
9. Steps 6 and 7 are repeated until the triangular grid is determined.
There are now exactly 30 locations marked off in a triangular pattern. In some instances, due to
irregular boundaries, it may not be possible to obtain the exact number of samples planned for.
7.7.2 Soil Contamination Applications
For applications where the goal of sampling is to evaluate the attainment of cleanup standards
for soil and solid media, EPA guidance (EPA, 1992) recommends collecting samples in the reference
areas and cleanup units on a random-start equilateral triangular grid except when the remedial-action
method may leave contamination in a pattern that could be missed by a triangular grid; in this case,
unaligned grid sampling is recommended. There are also many applications for grid sampling when the
goal is site characterization. Grid sampling insures all areas are represented in the sample and can
provide confidence that a site has been fully characterized.
7.7.3 Ecological and Environmental Survey Applications
The National Stream Survey and EPA's Environmental Monitoring and Assessment Program
are two large-scale environmental surveys that use variable probability, systematic sampling and a
special estimator called the Horvitz-Thompson estimator (Cochran, 1977) to estimate population
parameters of ecological interest. For the National Stream Survey, all streams represented as blue lines
on 1:250,000 topographic maps define the target population of streams. Sampling units were selected
using a square grid, with density of 1 grid node per 64 square miles, imposed on 1:250,000
topographic maps of a target area. A target stream reach was selected into the sample if a grid node
fell into the direct watershed of that reach. This protocol resulted in reaches being sampled with
probability proportional to direct watershed area. In the Environmental Monitoring and Assessment
Program, one objective is to estimate the current condition of the nation's ecological resources on a
Final
EPAQA/G-5S 73 December 2002
-------
$$L = \sqrt{\frac{A}{0.866\,n}} = \sqrt{\frac{14{,}025}{0.866 \times 30}} = 23.23 \approx 23$$
6. A line parallel to the x-axis through the point (164, 36) is drawn; points are marked off
23 meters apart along this line as shown in Figure 7-4.
7. The midpoint between the last two points along the line is found and a point is marked
at a distance (0.866 × 23) = 19.92 (i.e., 20) meters perpendicular to the line at that
midpoint. This point is the first sample location on the next line.
8. Points at distance L=23 meters apart are marked on this new line.
9. Steps 6 and 7 are repeated until the triangular grid is determined.
There are now exactly 30 locations marked off in a triangular pattern. In some instances, due to
irregular boundaries, it may not be possible to obtain the exact number of samples planned for.
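The construction above can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the cited EPA procedure: the function names are hypothetical, the spacing is computed as L = sqrt(A / (0.866 n)), and the nodes are generated over the bounding rectangle because the outline of the gray area in Figure 7-4 is not available here. Nodes that fall outside the actual site boundary would still need to be dropped.

```python
import math

def grid_spacing(area, n):
    """Spacing L of a triangular grid that places about n nodes in `area`."""
    return math.sqrt(area / (0.866 * n))

def random_start(xmin, ymin, xmax, ymax, r1, r2):
    """Random starting point from two uniform(0, 1) numbers r1 and r2."""
    return (xmin + r1 * (xmax - xmin), ymin + r2 * (ymax - ymin))

def triangular_grid(start, spacing, xmin, ymin, xmax, ymax):
    """Triangular-grid nodes inside the bounding rectangle, passing through `start`.

    Rows are parallel to the x-axis and 0.866 * spacing apart; every other
    row is shifted by half a spacing, which reproduces the midpoint rule
    of step 7.
    """
    x0, y0 = start
    row_height = 0.866 * spacing
    nodes = []
    j_lo = math.ceil((ymin - y0) / row_height)
    j_hi = math.floor((ymax - y0) / row_height)
    for j in range(j_lo, j_hi + 1):
        y = y0 + j * row_height
        x_row = x0 + (spacing / 2 if j % 2 else 0.0)  # odd rows are offset
        i_lo = math.ceil((xmin - x_row) / spacing)
        i_hi = math.floor((xmax - x_row) / spacing)
        for i in range(i_lo, i_hi + 1):
            nodes.append((x_row + i * spacing, y))
    return nodes

# Values from the example: A = 14,025 m2, n = 30, R1 = 0.820, R2 = 0.360.
spacing = round(grid_spacing(14025, 30))             # 23.23 is rounded to 23 m
start = random_start(0, 0, 200, 100, 0.820, 0.360)   # approximately (164, 36)
nodes = triangular_grid(start, spacing, 0, 0, 200, 100)
```

Because the rectangle is larger than the 14,025 m² site, `nodes` contains more than 30 points; only those inside the site boundary correspond to sample locations.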
7.7.2 Soil Contamination Applications
For applications where the goal of sampling is to evaluate the attainment of cleanup standards
for soil and solid media, EPA guidance (EPA, 1992) recommends collecting samples in the reference
areas and cleanup units on a random-start equilateral triangular grid except when the remedial-action
method may leave contamination in a pattern that could be missed by a triangular grid; in this case,
unaligned grid sampling is recommended. There are also many applications for grid sampling when the
goal is site characterization. Grid sampling ensures all areas are represented in the sample and can
provide confidence that a site has been fully characterized.
7.7.3 Ecological and Environmental Survey Applications
The National Stream Survey and EPA's Environmental Monitoring and Assessment Program
are two large-scale environmental surveys that use variable probability, systematic sampling and a
special estimator called the Horvitz-Thompson estimator (Cochran, 1977) to estimate population
parameters of ecological interest. For the National Stream Survey, all streams represented as blue lines
on 1:250,000 topographic maps define the target population of streams. Sampling units were selected
using a square grid, with density of 1 grid node per 64 square miles, imposed on 1:250,000
topographic maps of a target area. A target stream reach was selected into the sample if a grid node
fell into the direct watershed of that reach. This protocol resulted in reaches being sampled with
probability proportional to direct watershed area. In the Environmental Monitoring and Assessment
Program, one objective is to estimate the current condition of the nation's ecological resources on a
regional basis with known confidence. The Environmental Monitoring and Assessment Program's
sampling design is based on a systematic, triangular grid (also see discussion in Section 7.6). The grid
is used to select a sample in a manner analogous to the National Stream Survey. For example, for
sampling lakes, each lake is identified by its "center" and a grid node identifies a lake to be included in
the sample as the lake that has a center closest to the grid node. The probability of sampling a given
lake is proportional to the area of the polygon enclosing the region closer to that lake's center than to
any other lake's center. Larger lakes have a higher probability of being included in the sample
(Stehman and Overton, 1994).
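The text names the Horvitz-Thompson estimator without stating it. In its standard form, a population total is estimated by summing each sampled value divided by that unit's inclusion probability. The Python sketch below illustrates this; the function name and the lake data are hypothetical and are not taken from either survey.

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Estimate a population total as the sum of y_i / pi_i over sampled units."""
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have the same length")
    total = 0.0
    for y, pi in zip(values, inclusion_probs):
        if not 0.0 < pi <= 1.0:
            raise ValueError("inclusion probabilities must lie in (0, 1]")
        total += y / pi   # units that were unlikely to be sampled count for more
    return total

# Hypothetical sample of three lakes: measured values and the probability
# that each lake would be selected by the grid.
measured = [12.0, 40.0, 5.0]
probs = [0.10, 0.40, 0.05]
estimated_total = horvitz_thompson_total(measured, probs)
```

Weighting by 1/pi compensates for the fact that larger lakes, or reaches with larger direct watersheds, are more likely to be selected.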
When estimating abundance for various animals, samples are often taken along a transect at
regular intervals. This is a form of grid sampling. A pronghorn (antelope) abundance study evaluated
the efficiency of systematic sampling versus simple random sampling versus probability proportional to
size sampling (Kraft et al., 1995). The total number of pronghorn was already known; this was a
simulation study to evaluate alternative sampling plans. The sampling unit was a 0.8-km-side linear
transect variable in length according to size and shape of the study area. Six different study areas were
used. A plane flew along the transect and when a pronghorn was sighted, the pilot circled until the herd
could be counted. The goal was to estimate total abundance of pronghorn in an area. For the
systematic sampling, the sampling units (transects of different lengths) in an area were numbered; after
the first unit was randomly chosen, every pth unit following was selected. For this study, it was found
that stratification combined with accurate estimates of optimal stratum sample sizes increased precision,
reducing the mean coefficient of variation from 33 without stratification to 25 with stratification. Cost,
however, increased with stratification by 23%.
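The systematic selection described above (number the units, choose the first at random, then take every pth unit) might be sketched as follows. The function name and the 20 numbered transects are hypothetical, and the sketch assumes the random start is drawn from the first p units.

```python
import random

def systematic_sample(units, p, rng=None):
    """Select every p-th unit after a random start among the first p units."""
    if p < 1 or p > len(units):
        raise ValueError("p must be between 1 and the number of units")
    rng = rng or random.Random()
    start = rng.randrange(p)   # index of the randomly chosen first unit
    return units[start::p]

# Hypothetical study area with 20 numbered transects, sampling every 5th.
transects = list(range(1, 21))
selected = systematic_sample(transects, 5, random.Random(1))
```

Passing a seeded `random.Random` instance makes the selection reproducible, which is useful when a sampling plan has to be documented.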
7.7.4 Groundwater Applications
For sampling groundwater in fixed wells over time, a systematic sample in time is usually
preferred over a simple random sample in time. There are several reasons for this preference:
extrapolating from the sample period to future periods is easier with a systematic sample than a simple
random sample; seasonal cycles can be easily identified and accounted for in the data analysis; a
systematic sample will be easier to administer because of the fixed schedule for sampling times; and
most groundwater samples have been traditionally collected using a systematic sample, making
comparisons to background more straightforward.
EPA guidance on groundwater sampling for evaluating attainment of cleanup standards
(EPA, 1992) suggests a variation of systematic sampling when periodic seasonal variations or other
repeated patterns are suspected. Several variations are described and recommended depending on the
sampling goal, because biased estimates may result unless the systematic sample has a spacing small enough to
characterize both high and low concentrations. For example, the goals described include identifying or
characterizing the pattern of contamination in an aquifer, obtaining comparable period-to-period
samples, and making comparisons to background when there are large seasonal fluctuations in the data.
7.7.5 Geostatistical Applications
When there is spatial or temporal dependence, moving from one point to another nearby
location usually results in values that do not change dramatically. Samples close together will tend to
have more similar values than samples far apart. This is often the case in an environmental setting. The
method chosen to estimate an overall site mean, as well as the site variance, must properly account for
the pattern of spatial continuity. Any non-random or partially random sampling scheme (including a
systematic grid design) will tend to produce biased estimates if not adjusted for the degree of spatial
correlation. There exist techniques to minimize the biasing impact of spatial correlation while generating
reasonable estimates of the mean.
EPA has produced guidance for geostatistical soil sampling (EPA, 1996b). Sampling in
support of geostatistical analysis is an important topic and is discussed in detail in this EPA document.
One important component of geostatistics is the variogram. The variogram is a plot of the variance of
paired sample measurements as a function of the distance between samples. Samples taken on a
regular grid are desirable for estimating the variogram. While all regular grids tend to work reasonably
well in geostatistical applications, there are differences in efficiency depending on the type of grid
pattern chosen. The most common grid types include square, triangular, and hexagonal patterns. Entz
and Chang (1991) evaluated 16 soil sampling schemes to determine their impact on directional sample
variograms and kriging. They concluded that for their case study, grid sampling needs more samples
than stratified random sampling and the stratified-grid design, but the accuracy of the kriged estimates
was comparable for all sampling designs. They also found that the variograms that were estimated from
sample data collected from stratified and grid designs led to the same conclusion about the spatial
variability of the soil bulk density (the subject of the study).
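As an illustration of what a sample variogram summarizes, the Python sketch below computes an empirical semivariogram under its usual definition: for each distance class, half the mean squared difference between paired measurements. The function name, the binning by a fixed lag width, and the four-point transect are assumptions made for this example; they are not drawn from EPA (1996b) or from Entz and Chang (1991).

```python
import math
from collections import defaultdict

def empirical_semivariogram(points, values, lag_width):
    """Return (mean pair distance, semivariance, pair count) for each lag bin."""
    sq_diff = defaultdict(float)
    dist_sum = defaultdict(float)
    count = defaultdict(int)
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(points[i], points[j])   # separation distance
            k = int(h // lag_width)               # lag bin index
            sq_diff[k] += (values[i] - values[j]) ** 2
            dist_sum[k] += h
            count[k] += 1
    return [(dist_sum[k] / count[k], sq_diff[k] / (2 * count[k]), count[k])
            for k in sorted(count)]

# Hypothetical transect: four samples 1 m apart with a steady trend.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [0.0, 1.0, 2.0, 3.0]
variogram = empirical_semivariogram(points, values, lag_width=1.0)
```

On a regular grid many pairs share the same separation distance, so each lag bin is well populated, which is one reason regular grids are convenient for variogram estimation.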
7.7.6 Hot Spot Problem Application
One widely encountered environmental application of grid sampling is the search for hot spots.
The problem can be formulated in several ways:
• What grid spacing is needed to hit a hot spot with specified confidence?
• For a given grid spacing, what is the probability of hitting a hot spot of a specified size?
• What is the probability a hot spot exists when no hot spots were found by sampling on
a grid?
For this application, sampling at the nodes of a gridded area is used to search for an
object(s) of interest or, alternatively, to be able to state that an object of a specified size cannot exist if
no grid node intersected one. For example, the sampling goal may be to determine whether at least one 55-gallon
drum is buried in an area. Optimal grid spacings for the hot spot problem have been worked out for a
range of relative object sizes and orientations. The hot spot problem is discussed extensively in
Chapter 10 of Gilbert (1987). In most situations the triangular grid is more efficient at detecting hot
spots than the square or rectangular grid designs.
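The second formulation (the probability of hitting a hot spot of a given size for a given grid spacing) can be explored by simulation. The Python sketch below is a simplified illustration, not the method of Gilbert (1987): it assumes a circular hot spot, a square grid, and a hot spot center that is equally likely to be anywhere in a grid cell. The function name and the parameter values are hypothetical.

```python
import math
import random

def hit_probability_square_grid(radius, spacing, trials=100_000, seed=0):
    """Monte Carlo estimate of the chance that a square grid hits a circular hot spot."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # By symmetry it is enough to place the center within one grid cell.
        cx = rng.uniform(0, spacing)
        cy = rng.uniform(0, spacing)
        # Offsets to the nearest grid node along each axis.
        dx = min(cx, spacing - cx)
        dy = min(cy, spacing - cy)
        if math.hypot(dx, dy) <= radius:
            hits += 1
    return hits / trials

# Hypothetical case: hot spot radius 5 m, grid spacing 20 m.
estimate = hit_probability_square_grid(radius=5.0, spacing=20.0)
```

When the radius is no more than half the spacing, the estimate can be checked against the area ratio pi × radius² / spacing². Elliptical hot spots or triangular grids would need a different geometry than the one sketched here.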
In summary, if nothing is known about the spatial characteristics of the target population, grid
sampling is efficient in finding patterns or locating rare events unless the patterns or events occur on a
much finer scale than the grid spacing. If there is a known pattern or spatial or temporal characteristic
of interest, grid sampling may have advantages over other sampling designs depending on what is
known of the target population and what questions are being addressed by sampling.
CHAPTER 8
RANKED SET SAMPLING
8.1 OVERVIEW
This chapter describes and illustrates ranked set sampling, an innovative sampling design
originally developed by McIntyre (1952). The unique feature of ranked set sampling is that it combines
simple random sampling with the field investigator's professional knowledge and judgment to pick
places to collect samples. Alternatively, on-site measurements can replace professional judgment when
appropriate. The use of ranked set sampling increases the chance that the collected samples will yield
representative measurements; that is, measurements that span the range of low, medium, and high
values in the population. This results in better estimates of the mean as well as improved performance
of many statistical procedures such as testing for compliance with a risk-based or background-based
(reference-based) standard. Moreover, ranked set sampling can be more cost-efficient than simple
random sampling because fewer samples need to be collected and measured.
The use of professional judgment in the process of selecting sampling locations is a powerful
incentive to use ranked set sampling. Professional judgment is typically applied by visually assessing
some characteristic or feature of various potential sampling locations in the field, where the
characteristic or feature is a good indicator of the relative amount of the variable or contaminant of
interest that is present. For example, the relative amounts of a pollutant in randomly selected sampling
spots may be assessed based on the degree of surface or subsurface soil staining, discoloration of soil,
or the amount of plant defoliation in each spot. Similarly, the yield of a plant species in randomly
selected potential 1 meter by 1 meter field plots may be visually assessed based on the density, height,
or coloration of vegetation in each plot. This assessment ranks the visually assessed locations from
smallest to largest with respect to the variable of interest; it is then used as described in this chapter to
determine which spots to actually sample.
In some situations, a more accurate assessment of the relative amounts of a pollutant present at
field locations can be provided by an inexpensive on-site measurement. Indeed, the sensitivity and
accuracy of in-situ detectors have increased greatly in recent years. Some examples include the
following:
• Using ultraviolet fluorescence in the field to measure (screen) for BTEX (benzene,
toluene, ethyl benzene, and xylene) and PAHs (polyaromatic hydrocarbons) in soil.
• Using X-ray fluorescence in the field to measure lead or other metals in soil.
• Using total organic halide (TOX) measurements of soil as a screening measurement for
volatile organic solvents.
• Using remotely sensed information (aerial photographs and/or spatially referenced
databases as found in a geographic information system) to identify locations to be
studied.
• Using distance along a pipeline (longer distance implying lower levels of a contaminant)
to approximate the relative concentrations of a contaminant at various distances.
A simple ecological example will illustrate the ranked set sampling approach (based on Stokes
and Sager, 1988); a more detailed lead contamination example follows in Section 8.2. The
recommended step-by-step process for setting up a ranked set sampling design is presented in
Appendix 8-A. Suppose the average individual volume of the trees on a property needs to be
estimated. Begin by randomly selecting two trees and judging by eye which tree has more volume.
Mark the smaller tree to be carefully measured for volume and ignore the other tree. Next, randomly
select another two trees. Mark the larger of these two trees and ignore the other tree. Then repeat
this procedure, alternately marking the smaller of the first two trees, then the larger of the second two
trees. Repeat this procedure for a total of 10 cycles and 40 trees. Twenty of the trees will have
been marked and 20 ignored. Of the 20 marked trees, 10 are from a stratum of generally smaller trees
and 10 are from a stratum of generally larger trees. Determine the volume of each of the 20 marked
trees by careful measurement and use those measurements to estimate the average volume per tree on the
lot. In this illustration there were 10 cycles and 2 trees marked per cycle. In practice, the number of
trees marked per cycle (the "set size") and the number of cycles are determined using a systematic
planning process, as illustrated in Appendix 8-A.
Example of Using Ranked Set Sampling to Estimate The Mean Lead Concentration in Soil
Suppose a future residential area is suspected of having lead concentrations in surface soil that
exceed background concentrations. As part of the risk assessment process, the soil of the area will be
sampled to estimate the mean lead concentration. Prior studies have shown that x-ray fluorescence
(XRF) measurements of lead in soil obtained using a hand-held in-situ detector closely correlate with
laboratory measurements of lead in soil at the same locations. Furthermore, it was determined that the
cost of taking the XRF measurements in the field was very low compared to the cost of laboratory
measurements for lead. (Cost considerations are discussed in Appendix 8-A.) Hence, ranked set
sampling was selected for data collection instead of simple random sampling (see Appendix 8-A for
guidance on how to determine if ranked set sampling is preferred over simple random sampling).
Suppose the systematic planning process employed determined that n = 12 soil samples should
be collected and measured for lead in the laboratory in order to meet the acceptance and performance
criteria for this study (i.e., to have 95% confidence that the estimated mean computed using laboratory
lead measurements would be within 25% of the true mean). Also, in order to obtain information to
properly compute the variance of this estimated mean, the following replication process was used to
obtain the 12 samples. Specifically, m = 3 field samples (the "set size") were collected in each of r = 4
cycles to obtain the necessary n = m × r = 3 × 4 = 12 samples that will be measured for lead in the
laboratory. A method to determine m and r is provided in Appendix 8-A.
The ranked set sampling method for determining the three field locations to be sampled is as
follows:
[Figure 8-1. Using Ranked Set Sampling to Select Three Locations. Panels: Set 1, Set 2, Set 3.]
1. Use simple random sampling to randomly select m² = 3² = 9 locations on the property.
Randomly divide the nine locations into m sets of size m (3 sets of size 3). In Figure 8-1
the first set of three locations is denoted by "Set 1," the second set by "Set 2," and the third set
by "Set 3."
2. Consider the three locations in Set 1. Make an XRF measurement at each of those
three locations and label the locations 1, 2, and 3 to indicate the smallest, middle, and largest
XRF measurement, respectively. Collect the first soil sample at location label 1 in Set 1; this
location has the smallest XRF lead measurement in Set 1 (labeled 1* in Figure 8-1).
3. Consider the three locations in Set 2 and make an XRF measurement at each of those
locations. Collect the second soil sample at label 2 in Set 2; this location has the second highest
XRF measurement in Set 2 (labeled 2* in Figure 8-1).
4. Consider the three locations in Set 3 and make an XRF measurement at each of the three
locations in that set. Collect the third soil sample at label 3 in Set 3; this location has the highest
XRF measurement in Set 3 (labeled 3* in Figure 8-1).
Thus, nine in-situ XRF measurements are used to guide the selection of three soil samples that
will be measured for lead in the laboratory. Then, this procedure is repeated r = 4 times to obtain the
entire n = m × r = 3 × 4 = 12 soil samples needed. This replication process is needed to estimate the
variance of the estimated mean (see Appendix 8-A for the computational formula). In practice, if
professional judgment is used to rank the locations in each set, the set size (m = 3 in this example)
should be between 2 and 5. Larger values of m make it more difficult to accurately rank the locations
within each set. However, set sizes larger than five may be practical if field locations are ranked using
screening measurements. In general, larger set sizes when using screening measurements are desirable
because they result in more precise estimates of the mean.
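The selection steps above can be sketched in code. This is a minimal illustration rather than a prescribed procedure: the function name, the `screen` callable (standing in for an in-situ reading such as XRF), and the grid of candidate locations are all hypothetical.

```python
import random

def select_rss_locations(candidates, screen, m, r, rng=None):
    """Pick m*r locations for laboratory analysis by balanced ranked set sampling.

    candidates: list of candidate field locations
    screen:     hypothetical callable giving the screening value at a location
    m:          set size
    r:          number of cycles
    Returns a list of (rank, location) pairs.
    """
    rng = rng or random.Random()
    # Draw every location needed up front so that no location is reused.
    drawn = rng.sample(candidates, r * m * m)
    chosen = []
    for cycle in range(r):
        block = drawn[cycle * m * m:(cycle + 1) * m * m]
        # The draw is already in random order, so consecutive slices are random sets.
        sets = [block[i * m:(i + 1) * m] for i in range(m)]
        for i, group in enumerate(sets):
            ranked = sorted(group, key=screen)
            # In set i, keep the location with the (i+1)-th smallest screening value.
            chosen.append((i + 1, ranked[i]))
    return chosen
```

With m = 3 and r = 4 this uses 36 screening measurements and returns 12 locations, four for each rank, matching the balanced design of the lead example.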
Note that the above example is a balanced ranked set sampling design, that is, the same
number of field locations, r = 4, are sampled for each of the m = 3 ranks. That is, in the above
example, a sample is collected at each of four locations expected to have a relatively small value of the
variable of interest (lead), as well as at four locations expected to have a mid-value of lead and at four
locations expected to have a relatively large value of lead. Unbalanced ranked set sampling designs can
also be used, as discussed in Section 8.5.2 and Appendix 8-A.
8.2 UNDER WHAT CONDITIONS IS RANKED SET SAMPLING APPROPRIATE?
Ranked set sampling is appropriate when the following conditions hold:
• The cost of laboratory measurements is high relative to the cost of using screening
measurements or professional judgment in the field to determine the relative magnitudes
of contamination in randomly selected field plots.
• Professional judgment or on-site measurements can accurately determine the relative
magnitudes of contamination among randomly selected field locations.
• A more precise estimate of the mean or a more powerful test for compliance is needed
than can be achieved for a fixed budget if simple random sampling were used in place
of ranked set sampling.
A process whereby the costs and accuracy of ranking field locations are considered in setting up a
ranked set sampling design is provided in Appendix 8-A.
8.3 BENEFITS
A major benefit of ranked set sampling is that it will yield a more precise estimate of the mean
than if the same number of measurements is obtained using simple random sampling (McIntyre, 1952;
Gilbert, 1995; Johnson et al., 1996; Muttlak, 1996). Table 8-1 illustrates this for the normal
distribution with moderate coefficients of variation (CV). For example, suppose the distribution of the
variable of interest is normal with a true mean of 1 and a coefficient of variation (CV = standard
deviation divided by the mean) of 0.50. Furthermore, suppose our goal is to obtain enough laboratory
measurements to have 95% confidence that the estimated mean is within 25% of the true mean. Table
Table 8-1. Comparing the Number of Samples for Laboratory Analysis
Using Ranked Set Sampling*

                                                  Specific Precision of the Estimated
                                                  Mean with 95% Confidence
Coefficient of      Sampling Design
Variation (CV)**    (Set Size m)                     10%        15%        25%
0.50                Simple Random Sampling            97         43         16
                    Ranked Set Sampling, m = 2        66         30         12
                    Ranked Set Sampling, m = 3        51         24          9
                    Ranked Set Sampling, m = 5        35         20         10
0.707               Simple Random Sampling           193         86         31
                    Ranked Set Sampling, m = 2       132         60         22
                    Ranked Set Sampling, m = 3       102         45         18
                    Ranked Set Sampling, m = 5        70         35         15
1.0                 Simple Random Sampling           385        171         62
                    Ranked Set Sampling, m = 2       262        118         42
                    Ranked Set Sampling, m = 3       201         90         33
                    Ranked Set Sampling, m = 5       140         65         25

* Adapted from Table 1 in Mode et al. (1999). Table values derived assuming there are no
errors in ranking field locations.
** Coefficient of Variation = standard deviation divided by the mean.
8-1 indicates that simple random sampling will need 16 samples, but if ranked set sampling is used with
a "set size" of 2, then only 12 samples are needed, reducing sampling and laboratory costs by 25%. If
the cost of using professional judgment or on-site measurements is considerably less than the cost of
laboratory measurements, then there is a strong motivation to use ranked set sampling rather than
simple random sampling. Note in Table 8-1 that when high precision in the estimated mean is needed,
the number of samples needed is dramatically reduced as the set size increases. Ranked set sampling
also has several other benefits, as follows:
• The estimated mean of ranked set sampling data is a statistically unbiased estimator of
the true mean (as is that of a simple random sample).
• Ranked set sampling provides increased ability to detect differences in means or
medians of two populations (for example, site and background populations).
• Ranked set sampling can be used in other sampling designs such as stratified random
sampling and composite sampling.
• Ranked set sampling can be used to obtain more representative data for purposes other
than estimating a mean by covering more of the target population. Such purposes
include computing a confidence limit on the median of a population (Hettmansperger,
1995), testing for differences in the medians of two populations (Bohn and Wolfe,
1992, 1994), conducting simple tests to check for compliance with a fixed remediation
concentration limit (Hettmansperger, 1995; Koti and Babu, 1996; Barabesi, 1998),
estimating the slope and intercept of a straight line relationship (Muttlak, 1995),
estimating the ratio of two variables (Samawi and Muttlak, 1996), and estimating the
means of several populations in an experimental setting (Muttlak, 1996).
When the objective of sampling is to estimate the mean, consideration should be given to using
ranked set sampling rather than simple random sampling when the cost of ranking potential sampling
locations in the field is negligible or very low compared to the cost of laboratory measurements.
Guidance on setting up a ranked set sampling design taking cost considerations into account, including
ranking costs, is provided in Appendix 8-A.
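The simple random sampling entries in Table 8-1 are consistent with a normal-approximation formula, n = (z × CV / d)², rounded up, where d is the relative precision. The sketch below assumes that formula and z = 1.96; neither is stated in the text. For ranked set sampling it divides the unrounded size by a relative precision `rp` supplied by the user and rounds up to a whole number of cycles. The function names are illustrative only.

```python
import math

Z_95 = 1.96  # assumed two-sided 95% normal quantile

def srs_sample_size(cv, rel_precision, z=Z_95):
    """Samples needed for the estimated mean to be within rel_precision of the true mean."""
    return math.ceil((z * cv / rel_precision) ** 2)

def rss_sample_size(cv, rel_precision, m, rp, z=Z_95):
    """Ranked set sampling size for set size m and an assumed relative precision rp.

    The unrounded simple random sampling size is divided by rp, then rounded
    up to a multiple of m so that a whole number of cycles is collected.
    """
    n = (z * cv / rel_precision) ** 2 / rp
    cycles = math.ceil(n / m)
    return cycles * m

# CV = 0.50 and 25% precision, as in the example in the text.
n_srs = srs_sample_size(0.50, 0.25)
# For m = 2, normal data, and perfect ranking, rp = 1 / (1 - 1/pi) (an assumed input).
n_rss = rss_sample_size(0.50, 0.25, m=2, rp=1 / (1 - 1 / math.pi))
```

Under these assumptions `n_srs` is 16 and `n_rss` is 12, the two values quoted in the text for this case.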
8.4 LIMITATIONS
Before ranked set sampling is used, the costs of locating and ranking potential sampling
locations in the field should be determined to make sure that ranked set sampling is cost-effective.
Although ranked set sampling can yield a more precise estimate of the population mean, the costs may
be higher than if simple random sampling were used.
The precision of a mean that is computed using data obtained with ranked set sampling will be
reduced if errors are made in ranking field locations. That is, the precision of the computed mean is
maximized (i.e., the variance of the computed mean is minimized) when there are no errors in ranking
field locations. However, even when professional judgment or on-site methods cannot rank field
locations without error, ranked set sampling will perform at least as well as simple random sampling in
estimating the mean for the same number of measurements.
In ranked set sampling, the field locations being compared (ranked) are supposed to be
randomly located over the population. However, in practice, field locations within a set may be
purposely clustered in close proximity to decrease the effort of taking screening measurements or to
increase the accuracy of visually ranking the locations. In this case, the precision of the estimated mean
obtained using ranked set sampling data may be reduced. To reduce or eliminate this decrease in
precision, McIntyre (1952) suggests dividing the population into portions of equal size that have no
well-defined gradients and then selecting an equal number of samples within each portion.
If ranked set sampling data are used to test hypotheses, the data computations may differ from
the standard computations that would be performed if the data were obtained using simple random
sampling. For example, suppose the Wilcoxon Rank Sum test will be used to test for differences in the
medians of two populations and that the data are obtained using ranked set sampling. Then the data
computations for the Wilcoxon Rank Sum test described in Bohn and Wolfe (1992, 1994) should be
used rather than the standard computations [for example, see Section 18.2 of Gilbert (1987)] that
would be used if the data had been obtained using simple random sampling. If ranked set sampling
data will be used to conduct tests of hypotheses or to compute confidence intervals on means or other
statistical parameters, guidance from a statistician familiar with ranked set sampling should be sought.
Finally, Appendix 8-A shows that the on-site measurements (for example, the XRF
measurements in the above example) obtained for the ranking process are not used quantitatively when
computing the estimated mean or the variance of the estimated mean. Hence, ranked set sampling does
not make full use of the information content of the XRF measurements. One approach for making fuller
use of on-site measurements is to use the "Double Sampling" design described in Section 9.1 of Gilbert
(1987). In that design, the XRF measurements are used in combination with the lead measurements in
a linear regression equation to estimate the mean. However, the Double Sampling design requires the
XRF and lead measurements to be linearly related with a high correlation; ranked set sampling does
not.
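For comparison, the regression estimator commonly used with double sampling can be sketched as follows. Section 9.1 of Gilbert (1987) is not reproduced here, so this should be read as the usual textbook form (mean lab value adjusted by the fitted slope times the difference in mean screening values), not as that section's exact procedure. The function name and argument layout are hypothetical.

```python
def double_sampling_mean(paired, screening_only):
    """Linear-regression (double sampling) estimate of the mean laboratory value.

    paired:         list of (screening, lab) pairs at locations with both measurements
    screening_only: screening values at additional locations with no lab result
    """
    xs = [x for x, _ in paired]
    ys = [y for _, y in paired]
    n = len(paired)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Least-squares slope of the lab value on the screening value.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in paired)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # Mean screening value over all locations, paired and screening-only.
    x_all = xs + list(screening_only)
    x_bar_all = sum(x_all) / len(x_all)
    return y_bar + slope * (x_bar_all - x_bar)
```

Unlike ranked set sampling, this estimator uses the screening values themselves, which is why it depends on a strong linear relationship between the two measurements.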
8.5 IMPLEMENTATION
8.5.1 How Do You Decide the Number of Samples for Laboratory Analysis Needed to
Estimate the Mean?
Most methods in the statistical literature for determining the number of samples for estimating
the mean were developed assuming that sampling locations are identified using simple random sampling
rather than ranked set sampling. In general, ranked set sampling needs fewer samples than simple
random sampling because ranked set sampling yields more information per set of measurements. This
concept was illustrated in Table 8-1 for the normal distribution. Appendix 8-A provides a step-by-step
process for determining the ranked set sampling sample size for estimating a mean.
Methods for computing the ranked set sampling sample size (number of samples for laboratory
measurement) for other sampling objectives, such as testing hypotheses, are less well-developed and
not yet available in the statistical literature. However, since ranked set sampling increases the
performance of statistical procedures relative to what would be achieved if simple random sampling
were used, the "n" calculated for simple random sampling should be adjusted to allow for a multiple of
cycles (see the example in Appendix 8-A).
8.5.2 How Do You Decide Where in the Field to Collect Samples for Laboratory Analysis?
Locations at which samples for laboratory analysis will be collected are determined by the
ranking process using professional judgment or on-site measurements. The use of ranked set sampling
to determine the field locations is illustrated in Appendix 8-A for a balanced ranked set sampling
design. In a balanced ranked set sampling design, the same number of samples is collected for each
rank. For example, the simple ranked set sampling lead example given in Section 8.1 was a balanced
design because the design needs an equal number of locations expected to have relatively low, medium,
or high lead concentrations. A balanced ranked set sampling design should be used if the underlying
distribution of the population is symmetric.
In an unbalanced ranked set sampling design, different numbers of locations expected to have
relatively low, medium, or high concentrations are sampled. Environmental data are often asymmetric
and skewed to the right; that is, with a few measurements that are substantially larger than the others. If
the goal is to estimate the mean using ranked set sampling, McIntyre (1952) indicates the mean would
be more precisely estimated if more locations expected to have relatively high concentrations were
selected than locations expected to have relatively low or medium concentrations. This idea is
discussed further by Patil et al. (1994). To illustrate an unbalanced ranked set sampling design, one
could modify the lead example in Section 8.1 to collect a soil sample at twice as many locations
expected to have relatively high lead concentrations as at locations expected to have relatively low or
medium concentrations. When an unbalanced ranked set sampling design is used, the true mean of the
population is estimated by computing a weighted mean, as described in Appendix 8-A, rather than the
usual unweighted mean.
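The weighted mean for an unbalanced design can be illustrated with a short sketch. The Appendix 8-A formula is not reproduced in this section, so the code assumes the common form of the estimator: average the laboratory values within each rank, then average the m rank means with equal weight. The data in the usage note are invented for illustration.

```python
def rss_mean(values_by_rank):
    """Estimate the mean from ranked set sampling data.

    values_by_rank: dict mapping each rank (1..m) to the list of laboratory
                    measurements collected at locations assigned that rank.
    Each rank's average receives weight 1/m regardless of how many samples it holds.
    """
    m = len(values_by_rank)
    rank_means = [sum(v) / len(v) for v in values_by_rank.values()]
    return sum(rank_means) / m
```

For example, `rss_mean({1: [10, 12], 2: [20, 22], 3: [40, 44, 48, 52]})` averages the rank means 11, 21, and 46. The plain unweighted average of the same eight values would be pulled upward by the extra high-rank samples, which is why the weighting is needed.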
An appropriate unbalanced ranked set sampling design should increase the precision of the
estimated mean of an asymmetric distribution. However, an inappropriate unbalanced ranked set
sampling design for an asymmetric distribution can provide a less precise estimate of the mean than a
balanced ranked set sampling design or a simple random sampling design. Kaur et al. (1995)
established a method for developing an appropriate unbalanced ranked set sampling design for
asymmetric distributions that are skewed to the right. This method is provided in Appendix 8-A.
8.6 EXAMPLES
8.6.1 Estimating Mean Plutonium Concentrations in Soil
Gilbert (1995) illustrates the use of ranked set sampling to obtain samples for estimating the
mean plutonium (Pu) concentration in surface soil at some weapons testing areas on the Nevada Test
Site. Pu concentrations in soil samples are typically measured in the laboratory, and measurement is
quite expensive. However, at the weapons testing areas in Nevada, inexpensive field measurements of
Americium-241 (denoted by ²⁴¹Am) in surface soil can be obtained using an in-situ detector called the
FIDLER (Field Instrument for the Detection of Low Energy Radiation). Past studies had shown that in
areas of high soil Pu concentrations, there is a relatively high correlation (about 0.7) between a FIDLER
reading at a field location and a Pu measurement made on a 10-gram aliquot for a surface (0-5
centimeters) soil sample collected at that spot. Moreover, the cost of a Pu measurement in the
laboratory is at least 10 times greater than the cost of obtaining a FIDLER reading. Hence, using Table
8-3 in Appendix 8-A, it appears that using ranked set sampling instead of simple random sampling to
determine locations to collect soil samples for laboratory analysis should provide a more precisely
estimated mean. Gilbert (1995) illustrates how to compute the mean and its variance using data from a
balanced ranked set sampling design. It should be noted that, because the distribution of Pu
measurements at the study areas is typically skewed to the right, an unbalanced ranked set sampling
design might produce a more precise estimated mean than a balanced ranked set sampling design.
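A balanced design's mean and the variance of that mean can be computed from the replicated cycles. The sketch below is one simple possibility and may differ from the formula used by Gilbert (1995) or in Appendix 8-A: it treats the r cycle means as independent replicates, so the variance of the overall mean is the sample variance of the cycle means divided by r. The function name is hypothetical.

```python
from statistics import mean, variance

def balanced_rss_mean_and_variance(cycles):
    """Mean and estimated variance of the mean for a balanced design.

    cycles: list of r lists (r >= 2), each holding the m laboratory values
            collected in one cycle.
    """
    r = len(cycles)
    cycle_means = [mean(c) for c in cycles]
    estimate = mean(cycle_means)
    # Cycle means are treated as independent replicates of the same quantity.
    var_of_mean = variance(cycle_means) / r
    return estimate, var_of_mean
```

At least two cycles are needed, which is one reason the replication described in the lead example is part of the design.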
8.6.2 Estimating Mean Reid Vapor Pressure
Nussbaum and Sinha (1997) discuss a situation where ranked set sampling appears to have
great potential for cost savings. Air pollution in large cities is currently being reduced through the use of
reformulated gasoline. Reformulated gasoline was introduced because of regulations that limit the
volatility of gasoline, as commonly measured by the Reid Vapor Pressure (RVP). Typically, RVP is
measured on samples from gasoline stations obtained using simple random sampling. RVP can be
measured in the laboratory or at the pump itself. Although laboratory measurement costs are not
unduly expensive, it is expensive to ship samples to the laboratory. Hence, reducing the number of
samples analyzed in the laboratory could result in a large cost savings without sacrificing the
assessment of compliance with the volatility regulations.
One possible way to reduce the number of samples analyzed in the laboratory is to use ranked
set sampling. Measurements of RVP taken at the pump might be used to rank samples using the
ranked set sampling procedure to determine which samples should be taken to the laboratory for
measurement. Suppose that (1) the correlation between field RVP and laboratory RVP measurements
is sufficiently high so that the ranking was very accurate and that (2) it is several times more costly to
transport and measure samples in the laboratory than it is to rank samples at the pump. In this case, the
number of samples measured in the laboratory could be reduced by perhaps a factor of 2 or more
without reducing the ability to determine when the volatility regulations are being violated. Nussbaum
and Sinha (1997) present data that strongly suggest a very strong positive linear relationship between
pump and laboratory measurements of RVP. This information may be used to justify the use of field
RVP measurements to accurately rank the pump samples (see Table 8-2 in Appendix 8-A). Assuming
no ranking errors, Table 8-2 shows that if the ratio of laboratory transportation and measuring costs to
ranking costs (i.e., the cost of the field RVP measurement and ranking process) is greater than 6, then
ranked set sampling can be expected to yield as precise an estimate of the mean RVP as what would
be obtained using simple random sampling but at less cost.
8.6.3 Estimating Mean Pool Area in Streams
Mode et al. (1999) provided this example of a U.S. Department of Agriculture Forest Service
data collection effort on Pacific Northwest streams as part of a large scale monitoring project. There
was interest in assessing salmon production in streams. The size of salmon habitat, particularly pool
area in streams, has been linked to salmon production. Obtaining pool area by accurately and precisely
measuring length and width of stream pools is time consuming and labor intensive. However, visual
estimates of pool area can be obtained at much less cost. Mode et al. (1999) found that ranked set
sampling estimates of the mean pool area for 20 of 21 streams were more precise than estimates of the
pool area that would be obtained by physically measuring pool areas selected using simple random
sampling. They also found that for over 75% of the streams, it would be less costly to use ranked set
sampling than simple random sampling to obtain the same precision in the estimated mean pool area
when pool measuring costs were at least 11 times greater than the costs of visually assessing pool area.
APPENDIX 8-A
USING RANKED SET SAMPLING
INTRODUCTION
This appendix provides guidance on how to develop a balanced or unbalanced ranked set
sampling design and how to estimate the mean and the standard deviation of the mean based on the
data obtained. Developing a ranked set sampling design for the purpose of estimating the mean of the
population is a two-step process:
Step 1. Determine if ranked set sampling is cost effective compared to simple random
sampling. This step is accomplished by considering the costs and performance
of professional judgment and inexpensive on-site methods for ranking field
locations.
Step 2. If ranked set sampling is expected to be more cost effective than simple random
sampling, then determine the number of samples for laboratory analysis needed
to estimate the mean with the specified accuracy and confidence.
Details of how to implement Steps 1 and 2 are provided in this appendix along with the
methods for computing the mean and its standard deviation.
HOW DO YOU DECIDE IF RANKED SET SAMPLING IS MORE COST EFFECTIVE
THAN SIMPLE RANDOM SAMPLING FOR ESTIMATING THE MEAN?
This section provides guidance on how to determine if ranked set sampling will be more cost
effective than simple random sampling when the objective of sampling is to estimate the mean with a
specified precision. Ranked set sampling is more cost effective than simple random sampling for
estimating the mean if the cost of using professional judgment or on-site measurements to rank potential
sampling locations is negligible (Patil et al., 1994). This conclusion stems from the fact that fewer
samples for laboratory analysis are needed to estimate the mean with specified precision if ranked set
sampling is used than if simple random sampling is used. Hence, laboratory measurement costs will be
lower. However, ranking potential sampling locations in the field may be costly due to factors such as
spending more hours in the field, locating and training an expert to subjectively rank field locations, and
purchasing and using on-site field technologies. The basic question is whether the increased precision in
the mean that can be obtained using ranked set sampling will compensate for the extra work and cost of
ranking.
The effect of costs on the decision of whether to use ranked set sampling or simple random
sampling can be approximated using Table 8-2. This table shows the approximate cost ratio (cost of a
laboratory measurement divided by the cost of ranking a field location) that must be exceeded before
ranked set sampling will be more cost effective than simple random sampling to estimate the mean with
a desired level of precision. The cost ratio that must be exceeded depends on the set size, m (number
of locations sampled in each of the r ranked set sampling cycles), and on the distribution of the
population of laboratory measurements. Table 8-2 gives approximate cost ratios for normally distributed
measurements at different degrees of ranking error. Table 8-2 shows that for a given set size,
the cost ratios that apply when there is substantial ranking error are almost double the ratios when there
is no ranking error.
Table 8-2. The Approximate Cost Ratio* for Estimating the Mean

Data            Degree of         Set Size    Set Size    Set Size
Distribution    Ranking Error     m = 2       m = 3       m = 5
Normal          None              4           3.25        2.75
Normal          Moderate          5.5         5           4.5
Normal          Substantial       7.25        6.25        6.5

Constructed from Figure 3 in Mode et al. (1999).
*Cost of a laboratory measurement divided by the cost of ranking a field location.
Suppose that practical aspects of ranking in the field lead to using a relatively small set size of
m = 3 and that prior studies at the site of interest indicate that laboratory measurements for the
contaminant of interest are likely to be approximately normally distributed. Since the normal distribution
is symmetric, a balanced ranked set sampling design will be used (a balanced design is defined in
Section 8.5.2). If no errors are expected in ranking field locations, the ratio of laboratory measuring
costs (per sample) to ranking cost (per field location) must be greater than approximately 3.25 in order
for ranked set sampling to be more cost effective than simple random sampling; that is, for the total cost
of ranked set sampling to be less than the total cost of simple random sampling to estimate the mean
with a desired specified precision. If there is substantial ranking error and m = 3 is used, the cost ratio
must be greater than 6.25 for ranked set sampling to be more cost effective than simple random
sampling. However, if past studies indicate that the measurements are more likely to have a distribution
that is skewed to the right, the cost ratios will have to be higher before ranked set sampling is efficient.
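One way to see where thresholds of this size come from is a simple break-even calculation. The cost model below is an assumption made for illustration and is not stated in the text: each laboratory sample costs c_L, each ranked field location costs c_R, every ranked set sampling laboratory sample needs m ranked locations, and ranked set sampling needs about n_SRS / RP laboratory samples, where RP is the relative precision Var(SRS mean) / Var(RSS mean).

```latex
\begin{align*}
\mathrm{Cost}_{SRS} &= n_{SRS}\, c_L, \qquad
\mathrm{Cost}_{RSS} = n_{RSS}\,(c_L + m\, c_R), \qquad
n_{RSS} \approx \frac{n_{SRS}}{RP} \\
\mathrm{Cost}_{RSS} < \mathrm{Cost}_{SRS}
  &\iff \frac{c_L + m\, c_R}{RP} < c_L
  \iff \frac{c_L}{c_R} > \frac{m}{RP - 1} \qquad (RP > 1)
\end{align*}
```

With m = 3 and an assumed RP of about 1.9 for normally distributed data ranked without error, the bound is roughly 3.3, close to the value of 3.25 used in this example. Ranking errors lower RP and therefore raise the required cost ratio.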
Note that the cost ratios in Table 8-2 were developed assuming that a balanced ranked set
sampling design will be used. If the distribution of laboratory measurements is expected to be skewed
to the right, then an unbalanced ranked set sampling design will be more efficient than a balanced
ranked set sampling design.
The cost ratios in Table 8-2 can be used when field locations are ranked using either
professional judgment or on-site measurements. Table 8-3 provides cost ratios from Figure 4 of Mode
et al. (1999) for balanced ranked set sampling designs with set sizes m equal to 2, 4, 6, and 8 that are
applicable when there is quantitative information on the correlation between the on-site measurement at
a location and the measurement obtained in the laboratory for a sample collected at the field location. If
the on-site measurement is a good predictor of the corresponding laboratory measurement, then the
correlation between the two measurements will be close to 1 and no or very few ranking errors will
occur. A correlation of exactly 1 implies no ranking errors. If the screening measurement has
absolutely no ability to predict the value of the laboratory measurement, then the correlation will be
zero.
Table 8-3. Approximate Cost Ratio* for Estimating the Mean when On-site
Measurements** Are Used to Rank Field Locations

Correlation                    Set Size    Set Size    Set Size    Set Size
(Degree of Ranking Error)      m = 2       m = 4       m = 6       m = 8
1.0 (No ranking error)         5           3           2           2
0.9                            6           5           5           5
0.8                            7           8           8           9
0.7                            12          12          14          16

*Cost of a laboratory measurement divided by the cost of ranking a field location.
**Cost ratios are from Figure 4 of Mode et al. (1999) and were derived assuming the on-site measurements and the
measurements in the laboratory have a bi-variate normal distribution.
If the correlation between the screening and laboratory measurements is close to 1, then the
information gained by ranked set sampling via the ranking process increases appreciably compared to
simple random sampling. Hence, the cost ratio need not be so large for ranked set sampling to be
worth the extra effort and cost of ranking. For example if the correlation is 1, indicating no ranking
errors, then the cost ratio can be as small as 2 or 3 for set sizes of m = 4 or larger. But ranking errors
will occur if the correlation is 0.8 or smaller, and the additional information obtained using ranked set
sampling will be reduced compared to simple random sampling. Consequently, the cost ratio that must
be exceeded for ranked set sampling to be more cost effective than simple random sampling is relatively
high (8 or more).
Tables 8-2 and 8-3 permit summary statements like the following (adapted from Mode et al.,
1999): If the cost for a laboratory measurement is about six times that of a screening measurement or
professional judgment determination, and given that past data sets have been fairly normally distributed,
then ranked set sampling will be more cost effective than simple random sampling unless the chosen
ranking method will result in substantial ranking errors (Table 8-2) or is based on an on-site
measurement that is not very highly correlated with the laboratory measurement (Table 8-3).
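The summary statement above amounts to a table lookup followed by a comparison. The sketch below transcribes the thresholds from Tables 8-2 and 8-3 into dictionaries; the dictionary names, the keys, and the helper function are illustrative choices, not part of the guidance.

```python
# Cost ratios that must be exceeded, transcribed from Tables 8-2 and 8-3.
TABLE_8_2 = {  # degree of ranking error -> {set size m: cost ratio}
    "none":        {2: 4.0,  3: 3.25, 5: 2.75},
    "moderate":    {2: 5.5,  3: 5.0,  5: 4.5},
    "substantial": {2: 7.25, 3: 6.25, 5: 6.5},
}
TABLE_8_3 = {  # correlation -> {set size m: cost ratio}
    1.0: {2: 5,  4: 3,  6: 2,  8: 2},
    0.9: {2: 6,  4: 5,  6: 5,  8: 5},
    0.8: {2: 7,  4: 8,  6: 8,  8: 9},
    0.7: {2: 12, 4: 12, 6: 14, 8: 16},
}

def rss_is_cost_effective(lab_cost, ranking_cost, threshold):
    """True when the lab-to-ranking cost ratio exceeds the tabulated threshold."""
    return lab_cost / ranking_cost > threshold

# A laboratory measurement costing six times as much as ranking one location, m = 3.
verdicts = {
    error: rss_is_cost_effective(6.0, 1.0, ratios[3])
    for error, ratios in TABLE_8_2.items()
}
```

Here `verdicts` is true for no or moderate ranking error and false for substantial ranking error, which mirrors the statement adapted from Mode et al. (1999).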
It should be noted that the use of field measurements has advantages that can lower the cost of
the overall project, such as by reducing the number of return trips to the field through using a dynamic
work plan. Hence, on-site measurements can result in greater project cost savings than is apparent in a
simple comparison of per sample costs as is done above.
HOW DO YOU DETERMINE THE NUMBER OF SAMPLES FOR LABORATORY
ANALYSIS TO ESTIMATE THE MEAN WHEN RANKED SET SAMPLING IS USED?
This section begins by defining and discussing the relative precision of ranked set sampling to
simple random sampling. The relative precision is used in the process subsequently discussed for
approximating the number of samples ("sample size") for laboratory analysis needed for balanced and
unbalanced ranked set sampling designs.
What is the Relative Precision of Ranked Set Sampling to Simple Random Sampling?
For a sample size n, the relative precision of ranked set sampling to simple random sampling is
defined to be:
$$RP = \frac{\mathrm{Var}(\bar{x}_{SRS})}{\mathrm{Var}(\bar{x}_{RSS})} \qquad (8.1A)$$
where:
$\mathrm{Var}(\bar{x}_{SRS})$ = variance of the estimated mean of the laboratory measurements if simple
random sampling is used to select sampling locations, and
$\mathrm{Var}(\bar{x}_{RSS})$ = variance of the estimated mean of the laboratory measurements if
ranked set sampling is used to select the sampling locations.
Note from Equation (8.1A) that values of the relative precision greater than 1 imply that
$\mathrm{Var}(\bar{x}_{RSS})$ is less than $\mathrm{Var}(\bar{x}_{SRS})$, in which case ranked set sampling
should be considered for use instead of simple random sampling, assuming the applicable cost ratio in
Table 8-2 or 8-3 is exceeded.
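The relative precision can be approximated by simulation. The sketch below is a Monte Carlo illustration under stated assumptions: standard normal data, perfect ranking, a balanced design, and the same number of laboratory measurements (n = m × r) for both designs. The function name, default replication count, and seed are arbitrary choices.

```python
import random

def _variance(values):
    """Population variance of a list of numbers."""
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values) / len(values)

def simulate_relative_precision(m, r=1, reps=20000, seed=0):
    """Monte Carlo estimate of RP = Var(SRS mean) / Var(RSS mean)."""
    rng = random.Random(seed)
    n = m * r
    srs_means, rss_means = [], []
    for _ in range(reps):
        # Simple random sampling: n independent draws.
        srs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        srs_means.append(sum(srs) / n)
        # Ranked set sampling: for each rank, draw a fresh set of m values,
        # rank it without error, and keep only the value holding that rank.
        rss = []
        for _cycle in range(r):
            for rank in range(m):
                ranked = sorted(rng.gauss(0.0, 1.0) for _ in range(m))
                rss.append(ranked[rank])
        rss_means.append(sum(rss) / n)
    return _variance(srs_means) / _variance(rss_means)
```

Calling `simulate_relative_precision(3)` gives an estimate for the set size used in the lead example; a value above 1 indicates that ranked set sampling yields the smaller variance for the same number of laboratory measurements.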
It is known (Patil et al., 1994) that the relative precision of ranked set sampling to simple
random sampling is always equal to or greater than 1 when a balanced design is used, regardless of the
shape of the distribution of the laboratory measurement data. This means that $\mathrm{Var}(\bar{x}_{RSS})$ is always
expected to be less than or equal to $\mathrm{Var}(\bar{x}_{SRS})$, a rather remarkable result. To be more specific, if a balanced
ranked set sampling design is used, then:
l |