Statistical Design and Sample Selection for the Unregulated Contaminant Monitoring Regulation (1999)


           United States Environmental Protection Agency
           Office of Water (4607)
           EPA 815-R-01-004
           August 2001

-------
UCMR Statistical Design	August 2001

                                        Foreword

Under §1445(a)(2)(A) of the Safe Drinking Water Act (SDWA), as amended in 1996, the Environmental
Protection Agency (EPA) is required to establish criteria for a program to monitor for unregulated
contaminants and to publish a list of contaminants to be monitored. In response to this requirement, EPA
published the Revisions to the Unregulated Contaminant Monitoring Regulation (UCMR) for public water
systems (PWSs) on September 17, 1999 (64 FR 50556), and subsequently published supplemental rules,
including the Perchlorate and Acetochlor Rule (March 2, 2000 - 65 FR 11372) and the List 2 Rule
(January 11, 2001 - 66 FR 2273). EPA expects to publish other rules detailing updates and modifications
to the UCMR program, monitoring requirements, and analytical methods, as needed.

This document provides technical background information on the statistical process used to select the
nationally representative sample of small PWSs (systems serving 10,000 or fewer people) for the UCMR.
This document also explains the statistical selection process for large PWSs (systems serving more than
10,000 people) selected to monitor for the Screening Survey component of the UCMR. Note that this
document does not explain all UCMR program requirements in detail. Where more detailed and
comprehensive information is available through other EPA guidance documents, the reader will be referred
to these documents.

                                  Acknowledgments

This document was prepared in support of the Unregulated Contaminant Monitoring Regulation (UCMR)
for EPA's Office of Ground Water and Drinking Water.  Charles Job served as EPA's team leader for
development of the UCMR with James Taft as Targeting and Analysis Branch Chief. Rachel Sakata served
as Work Assignment Manager. The UCMR Work Group provided technical guidance throughout.
Andrew Schulman provided technical leadership for finalizing the selection process. Dennis Helsel of the
US Geological Survey and Christopher Frebis also provided valuable assistance. External expert reviewers
and many stakeholders provided valuable advice to improve the UCMR and this document. The Cadmus
Group, Inc., served as the prime contractor providing support for various components of this and other
UCMR support work. The major contributions of Kim Clemente, Maureen Devitt, Jonathan Koplos, and
Piyali Talukdar are gratefully acknowledged. George Hallberg served as Cadmus' Project Manager.

                                      Contents


Foreword

Acknowledgments

Contents

List of Tables and Figures

1.      Introduction
       1.1    Purpose and Background
       1.2    Overview of the UCMR Program

2.      Determining the UCMR Sampling Frame
       2.1    Background
       2.2    Needs Survey Inventory Background
       2.3    Needs Survey Sample Frame Improvements
             2.3.1  Community Water Systems
             2.3.2  Non-transient Non-community Water Systems
             2.3.3  Tribal Water Systems
             2.3.4  System Classifications
             2.3.5  Additional Sampling Frame Improvements

3.      Selecting the Statistical Population for Systems Serving 10,000 or Fewer People
       3.1    Determining the Population of Small PWS for Inclusion in the UCMR Sample
       3.2    Stratifying the Population
       3.3    Tribal Water Systems as an Individual Stratum
       3.4    Consistency of State Plans

4.      Selecting the Representative Sample for Systems Serving 10,000 or Fewer People
       4.1    Objectives of the Sample
             4.1.1  Accuracy and Precision
             4.1.2  Stratification
             4.1.3  Representativeness
             4.1.4  Summary
       4.2    How the Samples Were Allocated
             4.2.1  Allocation of Systems to States and Territories
             4.2.2  Calculation of Category Sampling Probabilities
       4.3    Statistical Implications
             4.3.1  Occurrence and Exposure Estimates
             4.3.2  Margins of Error

5.      Selecting Systems for the Initial Plan List and the Replacement List in Each State

6.      Selecting Systems for the State Plan

7.      Index System Monitoring

8.      Assessment Monitoring

9.      Screening Surveys

10.     Pre-Screen Testing

11.     References


                                     Appendices

Appendix A

       Statistical Theory and Optimal Choice of Probabilities
       for Probability-Weighted Estimation

Appendix B

       Expected and Total Number of Systems Selected for Assessment Monitoring

Appendix C

       Acronyms

Appendix D

       Definitions

                          List of Tables and Figures

                                      Tables


Number     Title

Table 1.   Systems Serving 10,000 or Fewer People

Table 2.   Distribution of Small Systems Required to Conduct Assessment
          Monitoring and Screening Survey in Each State/Tribe/Territory

Table 3.   Sample Allocation Proportional to Population Served: Expected
          Number of Systems Drawn From Each Category, and Resulting
          Margins of Error for Exposure Estimates

Table 4.   Sample Allocation for Assessment Monitoring: Expected Number
          of Systems Drawn from Each Category, and Resulting Margins of
          Error for Exposure Estimates

Table 5.   National Representative Sample Distributed by System Size Category
          and Water Source Type as Selected for the Initial SMPs

Table 6.   Comparison of 99% Normal and Wilson Score Confidence Intervals
          for Exposure Estimates in CWSs under Assessment Monitoring

Table 7.   Cumulative Probability of Selection by Stratum for Colorado

Table 8.   Distribution of Index Systems in the Representative Sample

Table 9.   Allocation of Systems for Screening Surveys by Size Category with
          the Associated Confidence Levels and Margins of Error


                                      Figures


Figure 1.  UCMR (1999) Implementation Timeline

Figure 2.  Number and Probability of Small Systems Chosen for Assessment
          Monitoring and Screening Surveys for the UCMR Years 2001-2003


1. Introduction

1.1 Purpose and Background

The requirement to monitor unregulated contaminants was established by the 1986 Amendments to the Safe
Drinking Water Act (SDWA). Public water systems (PWSs) were required to report the monitoring
results for up to 48 unregulated contaminants to the States or primacy agency under several regulations (40
CFR 141.40(e), (j), and (n)(11) - (12)). Systems with fewer than 150 service connections were exempt,
provided those systems made their facilities available for the States to monitor.

Under §1445(a)(2)(A) of the SDWA, as amended in 1996, the Environmental Protection Agency (EPA)
was required to establish criteria for a program to monitor for unregulated contaminants and to publish a
list of contaminants to be monitored. To fulfill the requirements of the SDWA, EPA published the Revisions
to the Unregulated Contaminant Monitoring Regulation (UCMR) for PWSs on September 17, 1999 (64
FR 50556). This regulation included programmatic changes to the UCMR and provided a list of
contaminants for which monitoring was required, or would be required in the future. The UCMR set up
a three-tiered monitoring approach for contaminants based on the availability of analytical methods and
insights on contaminant properties and fate and transport. In response to public comments, and as relevant
analytical methods were refined and developed, EPA published the Perchlorate and Acetochlor Rule on
March 2, 2000 (65 FR 11372), and the List 2 Rule on January 11, 2001 (66 FR 2273). As EPA
continues to refine and develop additional methods and/or identify minor clarifications or modifications
needed for the successful implementation of the UCMR, the Agency will provide additional guidance
documents or fact sheets and will promulgate additional rules, as necessary.

The UCMR program was developed in coordination with the Contaminant Candidate List (CCL) and the
National Drinking Water Contaminant Occurrence Database (NCOD). The UCMR and the CCL operate
on a 5-year cycle to assess the impact of new and emerging contaminants on drinking water. The new
UCMR program is a cornerstone of the sound science approach to future drinking water regulation. The
data collected through the UCMR program will be stored in the NCOD to facilitate analysis or review of
contaminant occurrence, and will be used to support the development of subsequent CCLs, and to support
the Administrator's determination of whether or not to regulate a contaminant in the interest of protecting
public health.

The SDWA provisions and EPA regulations described in this document contain legally binding
requirements. This document does not substitute for those provisions or regulations, nor is it a regulation
itself. It does not impose legally binding requirements on EPA, States, or the regulated community, and may
not apply to a particular situation based upon the circumstances. EPA and State decisionmakers retain the
discretion to adopt approaches on a case-by-case basis that differ from this guidance where appropriate.
Any decisions regarding a particular facility will be made based on the applicable statutes and regulations.
Therefore, interested parties are free to raise questions and objections about the appropriateness of the
application of this guidance to a particular situation, and EPA will consider whether or not the
recommendations or interpretations in the guidance are appropriate in that situation based on the law and
regulations. EPA may change this guidance in the future without notice or an opportunity for comment.
Mention of trade names or commercial products does not constitute endorsement or recommendation for
use.

The purpose of this document is to describe the statistical design and methods used to select the
representative sample of small PWSs (systems serving 10,000 or fewer people) that are required to
conduct Assessment Monitoring and Screening Surveys. This document also describes the process used
to select large PWSs (systems serving more than 10,000 people) for the Screening Survey component of
the UCMR. Portions of this document also describe how this process relates to individual State Monitoring
Plans (SMPs). Under the UCMR, the listed unregulated contaminants will be monitored between 2001 and
2005. All large PWSs are required to monitor for UCMR contaminants. Section 1445(a)(2) of SDWA
mandates that only a representative sample of small PWSs may be required to monitor under the UCMR.
The representative sample must be of adequate size and quality to obtain the necessary and valid
contaminant occurrence information upon which to base regulatory determinations while minimizing burden
to the water system.

The objective of the statistical approach for the UCMR is to estimate contaminant exposure and
occurrence in a nationally representative sample of small systems which will enable extrapolations of
exposure and occurrence nationwide. For contaminant exposure assessments (the fraction of population
that is exposed to a contaminant), the representative sample design was first weighted by population served
by PWSs. However, information on contaminant occurrence is also necessary. The context of occurrence
(for example, the size of a water system or its water source) is a factor when evaluating potential future
regulatory implementation. Therefore, the representative sampling design incorporates a stratified sampling
approach and allocates some samples among strata to enable evaluations of occurrence relative to system
size (based on population served), water source type (surface water or groundwater) and, to some degree,
geographic distribution. Although this statistical design is not strictly optimal for estimating either exposure
or occurrence, the design meets the data quality objective for overall exposure estimates (99% confidence
level with ±1% error tolerance, at 1% exposure), while providing more precise occurrence estimates for
categories of small systems.
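
As a rough illustration of the data quality objective stated above (99% confidence level, ±1% error tolerance, at 1% exposure), the following sketch computes the sample size that a simple random sample would need under the normal approximation for a proportion. This is an assumption made for illustration only: the UCMR design is stratified and weighted by population served, so the function names and the simple-random-sampling formula below are not the Agency's actual calculation.

```python
import math
from statistics import NormalDist


def z_value(confidence):
    """Two-sided standard normal critical value for a confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)


def required_sample_size(p, margin, confidence):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= margin (simple random sample)."""
    z = z_value(confidence)
    return math.ceil(z * z * p * (1.0 - p) / (margin * margin))


def margin_of_error(p, n, confidence):
    """Normal-approximation margin of error for a proportion p estimated from n units."""
    return z_value(confidence) * math.sqrt(p * (1.0 - p) / n)


# Data quality objective from the text: 99% confidence, +/-1% error, at 1% exposure.
n = required_sample_size(p=0.01, margin=0.01, confidence=0.99)
print(n, round(margin_of_error(0.01, n, 0.99), 4))
```

Under these simplifying assumptions the requirement works out to several hundred sampled units. A stratified, weighted design changes the effective sample size, so this figure should be read only as an order-of-magnitude check.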

1.2 Overview of the UCMR Program

The first component of the UCMR is Assessment Monitoring, which will be conducted by all of the
approximately 2,800 large community water systems (CWSs) and non-transient non-community water
systems (NTNCWSs) serving more than 10,000 persons (except those large systems that purchase all of
their water from another PWS), and by a statistically representative sample of 800 small CWSs and
NTNCWSs serving 10,000 or fewer persons (except those small systems that purchase all of their water
from another PWS). Assessment Monitoring will be conducted for the UCMR (1999) List 1 contaminants,
for which analytical methods have already been developed and refined.

The second component of the UCMR includes the Screening Surveys. Each of the two Screening Surveys
will be conducted at 120 large systems, and at 180 small systems randomly selected from the pool of
systems required to conduct Assessment Monitoring. Screening Survey monitoring will be conducted for
the List 2 contaminants for which analytical methods have been developed, but may need further refinement
before larger-scale monitoring is conducted.

The third component of the UCMR is Pre-Screen Testing, which may be conducted at a combined total
of up to 200 large and small systems. States will be asked to nominate systems that are particularly
vulnerable to the Pre-Screen Testing contaminants. Pre-Screen Testing may be conducted for some of the
UCMR (1999) List 3 contaminants for which analytical methods are in the initial stages of development.
EPA will provide further guidance on Pre-Screen Testing contaminants and analytical methods as
necessary.

EPA also selected 30 small systems to serve as Index Systems. These systems will conduct Assessment
Monitoring each year of the 5-year UCMR cycle to provide additional programmatic information and data
quality control. EPA contractors will collect data on temporal variations in contaminant occurrence, and
on the environmental and operating conditions of these 30 small systems. Detailed information from the
Index Systems, together with the monitoring data generated through general UCMR monitoring, will enable
EPA to develop future regulations that better reflect the environmental characteristics and operating
conditions of small PWSs.

General monitoring schedules are related to the type of monitoring (Assessment Monitoring, Screening
Survey, or Pre-Screen Testing) being conducted. Each participating system must conduct Assessment
Monitoring for the List 1 contaminants for a 12-month period in the first three years (2001 through 2003)
of the 5-year UCMR contaminant monitoring cycle (2001-2005), as per §141.40(a)(5). Randomly
selected large systems will sample for the UCMR List 2 contaminants in 2002 (for chemical contaminants)
and 2003 (for the microbiological contaminant, Aeromonas), while small systems will sample in 2001 and
2003, respectively. No time frame has yet been established for Pre-Screen Testing for the UCMR (1999)
List 3 contaminants.

Required monitoring locations are also related to the type of monitoring (see §141.40(a)(5)). Assessment
Monitoring samples must be collected at the entry point(s) to the distribution system unless otherwise
specified by the State or EPA. Samples for the first Screening Survey (for the List 2 chemicals) must
always be collected at the entry point(s) to the distribution system (source water samples are not
permitted). Samples for Aeromonas must be collected in the distribution system. Sampling locations must
include one midpoint in the distribution system where the disinfectant residual will be expected to be typical
for the system (midpoint, or MD, as defined in the Rule), and two other points: one of maximum retention
time and one where the disinfectant residual will have typically declined (point of maximum residence, or
MR, and location of lowest disinfectant residual or LD, respectively, as defined in the Rule).

Discussions with States and other stakeholders indicated the need to select a representative sample of
systems across all States to ensure both confidence in the UCMR results and a comprehensive spatial
distribution. To ensure that the sample is representative of the nation and to reduce the burden on small
systems, EPA statistically selected a nationally representative sample of systems serving 10,000 or fewer
people for the UCMR. States are participating in the UCMR through State Monitoring Plans (SMPs) as
established by Partnership Agreements (PAs) with EPA. Note, however, that a State was not required
to enter into a PA with EPA to participate in SMP development. Through the PAs and the SMPs, States
were given an opportunity to participate in the UCMR program, while sharing some of the responsibilities
with EPA. All steps involved with sample selection described throughout this document assume that a State
has entered into a PA with the appropriate EPA Regional Office, or has decided to review the SMP.

As described later in this document, a list of the statistically-selected systems was provided by EPA to the
States. The list consisted of a "primary list," an "alternate list," and a "supplemental alternate list" of
systems. These lists were provided to the States for their review and inclusion in their SMPs. States could
either: (1) respond by accepting the primary list as their representative plans, or (2) propose an alternative
plan by selecting other system(s) from the replacement list(s), in cases where EPA's initial plan identified
system(s) that no longer existed, because of merger or closure, or that switched to purchased water.

Figure 1 provides a summary of the UCMR three-tiered monitoring approach, and shows the
implementation timeline of UCMR activities.

Figure 1. UCMR (1999) Implementation Timeline

[Timeline chart spanning 2000 through 2005; the text of the chart follows.]

Large Systems (serving more than 10,000 people)

  List 1 Assessment Monitoring - All Large Systems must monitor for one year during this
  three-year period. Data must be reported electronically to EPA.

  List 2 Screening Survey (Chemicals) - 120 randomly selected large systems must monitor.

  List 2 Screening Survey (Aeromonas) - Second set of 120 randomly selected large systems
  must monitor.

Small Systems (serving 10,000 or fewer people)

  List 1 Assessment Monitoring - 800 Small Systems (statistically selected) must monitor for
  one year during this three-year period, as specified by the State and EPA. Approximately
  one-third monitor each year. EPA pays for the costs of testing.

  List 2 Screening Survey (Chemicals) - 180 randomly selected small systems must monitor;
  subset of systems doing List 1 monitoring during this year.

  List 2 Screening Survey (Aeromonas) - Second set of 180 randomly selected small systems
  must monitor; subset of systems doing List 1 monitoring during this year.

Index Systems

  30 Index Systems (selected from the 800 small systems) must monitor every year for List 1
  contaminants during this five-year period, with additional support from EPA.

All Systems Conducting UCMR Monitoring

  Systems notified of requirements by EPA/State.

  Perchlorate Laboratory Proficiency Testing.

  Reporting - All Large and Small Systems Monitoring for List 1 and List 2 Contaminants
  must report results to customers under the Consumer Confidence or Public Notification Rule.

2. Determining the UCMR Sampling Frame

2.1 Background

A critical first step in selecting a nationally representative sample of small PWSs for the UCMR is
the selection of a sample frame, i.e., an appropriate inventory list of PWSs from which to select the
sample. This is particularly true in a stratified sample such as designed for the UCMR. Stratified
sampling studies are often subject to strata migration problems, which are caused by the inaccurate
strata classification of systems in the design and sample selection phase and which can complicate
and jeopardize the results of the strata-based sampling. Although the Safe Drinking Water
Information System (SDWIS) provides the raw inventory list, or "total population," of PWSs from
which the statistical sample is drawn, SDWIS is not designed to be a sample frame. Many properties
of SDWIS, and, more importantly, some lingering problems of system classification in SDWIS, can
result in many inaccuracies for sample frame applications such as the sample selection procedures
necessary for the UCMR statistical sampling.

EPA utilized the inventory list provided by the 1999 Drinking Water Infrastructure Needs Survey
(Needs Survey) to select small systems for Assessment Monitoring and Screening Survey
monitoring, and to select large systems for Screening Survey monitoring. EPA then improved upon
the SDWIS inventory and created a more suitable inventory list for a sample frame. The sample
frame improvements and sample selection considerations used to improve the 1999 Needs Survey
inventory information for use as the UCMR sampling frame are described in the following sections
of this document.

2.2 Needs Survey Inventory Background

The Needs Survey is conducted every four years to assess infrastructure needs of the Nation's
drinking water systems. The Needs Survey data, along with other relevant information, are also used
to allocate Drinking Water State Revolving Fund (DWSRF) monies. The Needs Survey requires that
inventory information be as accurate as possible so that PWS needs are accurately estimated. A
process was established to develop a reliable and accurate database from which to draw the Needs
Survey samples. The Needs Survey inventory is based on inventory information on all PWSs
included in SDWIS as of March 1998. The steps used to ensure that inventory data (system status,
population served, number of service connections, source of water, contact name and address, etc.)
are correct are described in detail below.

2.3 Needs Survey Sample Frame Improvements

2.3.1 Community Water Systems

Inventory data are confirmed before being used by the Needs Survey. The Needs Survey uses the
confirmed data in specific size categories (large and medium CWSs serving greater than 50,000
people, and 3,301 to 50,000 people, respectively) to select systems that will complete questionnaires
describing current and future system infrastructure needs. Inventory data are also confirmed before
small CWSs serving 3,300 or fewer people and non-profit non-community water systems are
selected for site visits.

Problematic data were first identified and addressed based on the experience of the 1995 Needs
Survey. This step included reviewing and cleaning up odd, repeated values (such as repeated "99s"
for the population-served value). EPA then provided the confirmed inventory data to the States
(including the Virgin Islands and Puerto Rico) for review and asked the States to provide any
necessary changes.

EPA also worked with the States to identify the total "consecutive" population served (including the
population of retail buyers) by many prominent large systems, and to group systems into size and
type categories that more accurately reflect actual populations served by a particular water system.
For instance, the reported population served by the Metropolitan Water District of Southern
California (MWD) was adjusted to account for the fact that the system actually serves a much larger
population than the SDWIS inventory suggests. Based on the SDWIS inventory, the MWD is
categorized as a small system serving less than 3,300 people. Adjusting the population to account
for the approximately 16 million consumers actually served by the system, the system is then
reclassified as a large system, which accurately reflects how this system is regulated under the
SDWA. This example highlights the types of changes incorporated into the adjusted sample frame
through identification of the consecutive population served.
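
The adjustment can be pictured with a small sketch. The size thresholds follow the Needs Survey categories quoted earlier (medium: 3,301 to 50,000 people; large: more than 50,000). The function names and the population figures in the example are hypothetical and stand in for the actual inventory fields and values, which are not given in this document.

```python
def total_population_served(retail_population, purchaser_populations):
    """Retail population plus the population of systems that buy water from this one."""
    return retail_population + sum(purchaser_populations)


def needs_survey_size_category(population):
    """Size category using the Needs Survey thresholds quoted in the text."""
    if population > 50_000:
        return "large"
    if population >= 3_301:
        return "medium"
    return "small"


# Hypothetical wholesaler: few direct customers, many consecutive (purchasing) systems.
retail = 2_500                                   # illustrative value only
purchasers = [4_000_000, 6_500_000, 5_500_000]   # illustrative values only

before = needs_survey_size_category(retail)
after = needs_survey_size_category(total_population_served(retail, purchasers))
print(before, "->", after)
```

Classifying on the adjusted total rather than the listed retail population is what moves a wholesaler of this kind out of the small-system category.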

On-site inventory verifications were conducted for States where: (1) the 1995 inventory verification
discrepancy rate was greater than 1 percent; or (2) the number of CWSs in a State in SDWIS as of
March 1998 was at least 3 percent greater than in the sampling frame used for the 1995 Needs
Survey. On-site inventory verifications were also conducted for States that contributed at least
0.8 percent of total national need in the 1995 Needs Survey, and if EPA determined that the SDWIS
inventory may not accurately reflect a State's inventory. Inventory verifications were conducted in
Arizona, Arkansas, California, Colorado, New York, North Carolina, Ohio, Oklahoma, and
Tennessee. SDWIS-Fed inventory information for Virginia was replaced with SDWIS-State
inventory information, since SDWIS-Fed was known to be an inaccurate source of current inventory.

The process of State corrections includes a variety of inventory review procedures and data
verifications as described below:

• A stratified random sample of systems was used to select systems within each State that
would then be subject to inventory verification. The systems were stratified by service size
category and water source (surface or ground water), and a representative sample was
selected for each State to represent a 95 percent confidence level with a relative error of 10
percent. The sample was selected based on the expected proportion of systems with
discrepancies, from experience with data verifications conducted by EPA between 1991 and
1997. CWSs serving 25 to 1,000 people were expected to have a discrepancy rate of 7.5
percent, while CWSs serving 1,001 to 40,000 people were expected to have a discrepancy
rate of 5 percent.
• A two-staged cluster sampling approach was used to select systems in New York, since data
in this State are managed by numerous district offices. The first stage selected enough
offices to include systems of all strata in the sample. The second stage was a random sample
of systems within the district offices.
• Sanitary survey information, bacteriological results, or other chemical records in State files
and/or databases were reviewed on site to ensure that inventory data were accurate. If
inventory information was different between SDWIS and the State files and/or database, a
discrepancy was issued. Each State so identified was then given an opportunity to provide
monitoring results or other documentation of a system's characteristics, and, in some cases,
documentation of a system's actual existence. Systems that were inactive were removed
from the Needs Survey sampling frame, while other systems were re-categorized if
necessary. For instance, SDWIS may have a system categorized as a surface water system,
while State records indicate that the system purchases surface water. It is this type of mis-
categorization that is routinely corrected in the Needs Survey sample frame.
• Based on results of the inventory verification, the total inventory for each State was further
refined. The inventory verification results were extrapolated to all systems in each State to
estimate the number of active systems in each size and type category (stratum). The
determined proportion of inactive systems in the inventory verification sample was applied
to the number of total systems in the sample frame. Then systems were assigned to each
stratum based on the proportion of active systems that moved from one stratum to another.
For instance, if 5 percent of systems in the inventory verification sample were inactive, then
it was assumed that 5 percent of the total number of systems in the State were inactive. This
was then applied to the revised active frame, reflecting the final inventory of active systems
in a State.
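
The extrapolation step in the last bullet can be sketched as follows. This is a minimal illustration under stated assumptions, not the Needs Survey's actual procedure: it assumes the verification outcomes are tallied by the stratum in which each system was originally listed, and the stratum labels, counts, and the function name `refine_frame` are all hypothetical.

```python
from collections import defaultdict


def refine_frame(frame_counts, verified):
    """Extrapolate verification results to the full frame.

    frame_counts: {listed stratum: number of systems listed in the frame}
    verified: {listed stratum: {outcome: count}}, where an outcome is either
              the stratum confirmed by verification or the string "inactive".
    Returns {stratum: estimated number of active systems}.
    """
    estimated = defaultdict(float)
    for listed, n_frame in frame_counts.items():
        outcomes = verified[listed]
        n_checked = sum(outcomes.values())
        for outcome, count in outcomes.items():
            if outcome == "inactive":
                continue  # inactive systems drop out of the active frame
            # Share of checked systems found in this stratum, scaled to the frame.
            estimated[outcome] += n_frame * count / n_checked
    return dict(estimated)


# Hypothetical State with two strata; 5 percent of each verified sample was inactive.
frame_counts = {"GW, 25-500": 1000, "SW, 25-500": 200}
verified = {
    "GW, 25-500": {"GW, 25-500": 90, "SW, 25-500": 5, "inactive": 5},
    "SW, 25-500": {"SW, 25-500": 38, "inactive": 2},
}
refined = refine_frame(frame_counts, verified)
print(refined)
```

In this example 5 percent of each verified sample is inactive, so the estimated active frame is 95 percent of the listed total, and the systems found to be misclassified are credited to the stratum in which verification placed them.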

The Needs Survey sample frame was further refined during the course of the data collection period.
System status as of January 1, 1999 was used to determine inclusion and placement within the
sample frame. Additional systems were added to the sampling frame based on information provided
by the State of Virginia in the last quarter of 1999.

2.3.2 Non-transient Non-community Water Systems

Limited verification was conducted on the non-profit non-community water systems (NCWSs). A
random sample of 100 systems was selected from the total number of non-profit NCWSs across the
US to determine how many systems would be selected in each State for a site visit. This random
sample was used only to estimate the number of systems in each State where EPA would conduct
a site visit. The actual sample of non-profit NCWSs was then randomly selected from the counties
in each State where EPA already had plans to conduct site visits at small CWSs. The sample of
non-profit NCWSs was not selected based on strata, and only non-profit NCWSs were selected for review
for the Needs Survey (since they are the only NCWSs that are eligible to receive DWSRF monies).
The sample of non-profit NCWSs does not include transient non-community water systems. The
sample size of 100 systems provides a confidence level of 95 percent and a margin of error of 30
percent.

Thus, for UCMR use, inventory data for NTNCWSs have undergone the least confirmation and
correction. Each State participating in the UCMR through PAs with EPA reviewed the systems
selected by the UCMR process for the SMP.

2.3.3 Tribal Water Systems

The sample frame for Tribal systems and Alaska Native water systems was based on input from the
Indian Health Service (IHS) Sanitation Deficiency System (SDS), Tribes, and EPA regions. There are
approximately 940 systems nationwide that are owned and operated by Tribes and Alaska Natives.
Some of the Tribal systems are regulated by the States, and many State-owned and operated systems
serve a large population of American Indians. Since the Needs Survey considers all State-owned and
operated systems, EPA worked with the Tribes, Alaska Natives, the IHS, and the States to determine
how to classify each Tribal system. Each Tribe notified the appropriate EPA region if it believed
that a State-owned or operated system should be considered in the Tribal system sample frame,
rather than in the State sample frame. Inventory information for Tribal systems and Alaska Native
systems was taken from SDWIS, then corrected and updated by both the appropriate EPA Region
and the IHS. The corrected data were then compiled to form the sample frame for the Needs
Survey.

For the UCMR, the Alaska Native systems were grouped with the remainder of the Alaska CWSs
and NTNCWSs. These systems were not grouped with the Tribal water systems. State-owned
and operated water systems that serve a large proportion of American Indians were not treated as
Tribal water systems; they were treated as State systems in the UCMR sampling frame.

2.3.4 System Classifications

Another potentially problematic issue is the source water classification of systems used in SDWIS.
Compliance and monitoring requirements under the SDWA are more stringent for surface water
systems than for ground water systems based on historic occurrence data and vulnerability
considerations. Generally, surface water systems are more vulnerable to releases, spills, and other
potential sources of contamination than ground water systems. Also, water systems may depend on
a single water source type or may have a mix of sources. Therefore, to ensure that the level and
type of compliance and monitoring requirements are appropriate to the type(s) of source water used
by a water system, EPA created a hierarchy of system source water classifications. The hierarchy
(or sequence from lowest to highest regulatory regime) is: purchased ground water, ground water,
purchased ground water under the direct influence of surface water, ground water under the direct
influence of surface water, purchased surface water, and surface water. This hierarchy helps
establish the appropriate regulatory oversight relative to source water type to provide the highest
degree of human health protection possible. A water system with mixed sources is regulated
according to the water source type used that ranks highest in the regulatory hierarchy.
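The hierarchy rule can be sketched in a few lines of Python. This is only an illustration of the ranking described above, not EPA's or SDWIS's actual classification logic; the names `SOURCE_HIERARCHY`, `RANK`, and `classify_system` are hypothetical and were chosen for this example.

```python
# Source water classifications from lowest to highest regulatory regime,
# in the order given in the text.
SOURCE_HIERARCHY = [
    "purchased ground water",
    "ground water",
    "purchased ground water under the direct influence of surface water",
    "ground water under the direct influence of surface water",
    "purchased surface water",
    "surface water",
]

# Map each classification to its rank (higher rank = higher regulatory regime).
RANK = {name: rank for rank, name in enumerate(SOURCE_HIERARCHY)}


def classify_system(source_types):
    """Return the classification of the highest-ranked source a system uses."""
    if not source_types:
        raise ValueError("a system must have at least one source type")
    return max(source_types, key=lambda source: RANK[source])


# A system that uses ground water and also purchases surface water.
print(classify_system(["ground water", "purchased surface water"]))
```

Under this rule, a system that uses ground water and also purchases surface water is classified as a purchased surface water system, because purchased surface water ranks higher in the hierarchy.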

This hierarchical compliance and monitoring classification scheme is designed to implement the
most protective regulatory approach, but it may also pose a problem for the national representative
sample. If a system uses ground water and also purchases surface water, the system will be listed
in SDWIS as a purchased surface water system since purchased surface water ranks higher on the
hierarchy. However, the UCMR sample selection criteria exclude purchased water systems;
therefore, this system would not be selected for inclusion in the national representative sample. The
number of purchased water systems is small compared to the total number of PWSs, and this
exclusion of purchased water systems will not significantly affect the UCMR sample. States and
PWSs adjusted their monitoring schedules to accommodate the above problems relative to the source
of the actual entry points to the distribution system (EPTDS). Using the single SDWIS classification
for each system resulted in some inaccuracies which could not be avoided. These inaccuracies were
corrected where needed by the States or EPA when the SMPs were reviewed.

2.3.5 Additional Sampling Frame Improvements

The resulting Needs Survey inventory list may not exactly reflect the information in SDWIS in the
fall of 1999 or 2000 since it is likely that a few systems will have changed status in the intervening
time. To minimize the effect of system status changes, EPA verified the status, water source,
population served, and system type for small CWSs in 19 States and three territories where there
were three or fewer PWSs within each stratum (system size category by water source type). Eleven
systems in five States were determined to be inactive. Four systems in four different States purchase
their water, and were removed from the system list. One system changed from a CWS to a
NTNCWS, and seven systems in four States changed their source water type from surface water to
ground water. The population of 12 systems changed, and three systems were moved into a different
service size category. The status of these systems was verified before sample selection to ensure that
the sample of systems selected from these categories was truly representative of the number of
systems in existence.

Each State, as noted earlier, has already received an additional opportunity to correct the inventory
system data and the strata assignments when they reviewed and approved the systems selected for
their SMPs. Each State (and EPA itself, in the cases where a State did not wish to participate) had
an opportunity to improve the sample by removing systems from the sampling pool that were
inactive and replacing them with active systems from the alternate/replacement list(s) provided to
them. States or EPA were also permitted to remove systems from their SMPs for reasons other than
those listed above, as long as the reasons were clearly explained in consultation with EPA. All
changes were included in the final SMPs sent back to States or EPA so that systems could be notified
of their requirements under the UCMR. States and EPA Regions will continue to update UCMR
inventory information as changes occur.

3. Selecting the Statistical Population for Systems Serving 10,000 or Fewer People

3.1 Determining the Population of Small PWS for Inclusion in the UCMR Sample

The total population of small PWSs is comprised of CWSs, NTNCWSs and transient non-
community water systems (TNCWSs). Two categories of PWSs were excluded from the population
for selecting the sample. PWSs that purchase their entire water supply from another PWS are
generally exempt from the regulation, since monitoring at these systems could result in double
counting of systems using the same source. Additionally, TNCWSs were excluded from the UCMR,
since projecting contaminant exposure from monitoring results is difficult and inconclusive due to
the transient nature of the population that uses these sources of drinking water.

EPA estimates that there are approximately 66,808 non-purchased CWSs and NTNCWSs, based on
the 1999 Needs Survey inventory.1 Table 1 illustrates the total number of non-purchased CWSs and
NTNCWSs in each service size category (serving 25 to 500, 501 to 3,300, 3,301 to 10,000, 10,001
to 50,000, and greater than 50,000 people) by source water type (ground or surface water), from the
UCMR sampling frame.

3.2 Stratifying the Population

In developing the representative sample, EPA considered factors such as (1) geographic location, (2)
population served, and (3) water source. The sample was stratified by population served, allocating
samples proportionately to each State by system size, and then by water source type. NTNCWSs
were selected as a separate category since these systems may be a significant source of water
consumed by residents of a community.

Sources of water may not be evenly distributed across any given State. Cities transfer water across
watershed boundaries, or move water from one State to another. To account for the proportion of
the population served by a specific water source, EPA defined "geographic location" as the location
of the water source and stratified the sample further by source of water supply. For example, if 10
percent of the population in a State obtains its water from surface-water-supplied PWSs that serve
fewer than 500 individuals, then approximately 10 percent of the sampled systems in that State should
come from the PWSs in this size and source category. The distribution of systems across the State,
then, is accommodated by the population-weighted statistical sample selection. As explained further
in Section 4, the sample is not strictly population-weighted. The sample sizes for each State and each
stratum were optimized to ensure that UCMR sampling results have a high level of confidence and
a low margin of error. Therefore, the sample was stratified by system type (CWSs and NTNCWSs)
and by source water type within each small system size category (categories 1 through 3) in each
State.

1 As noted earlier, the inventory sampling frame is based on the 1999 Needs Survey. The original data
were taken from SDWIS in March 1998.

Table 1.  Systems Serving 10,000 or Fewer People

Total Population Served Nationally

Population Served            CWSs           CWSs        NTNCWSs        NTNCWSs
Size Category        Ground Water  Surface Water   Ground Water  Surface Water          Total
25 - 500                4,321,261        248,417      2,292,697         68,088      6,930,463
501 - 3,300            12,894,496      2,542,195      2,493,942        179,371     18,110,004
3,301 - 10,000         13,415,514      6,269,284        282,405         57,643     20,024,846
Subtotal               30,631,271      9,059,896      5,069,044        305,102     45,065,313
10,001 - 50,000        25,909,335     23,033,999        108,027              0     49,051,361
Over 50,000            30,478,607    139,106,597              0              0    169,585,204
Total                  87,019,213    171,200,492      5,177,071        305,102    263,701,878

Number of Non-purchased PWSs

Population Served            CWSs           CWSs        NTNCWSs        NTNCWSs
Size Category        Ground Water  Surface Water   Ground Water  Surface Water          Total
25 - 500                   28,149          1,403         16,566            416         46,534
501 - 3,300                 9,551          1,586          2,606            148         13,891
3,301 - 10,000              2,349          1,027             55             13          3,444
Subtotal                   40,049          4,016         19,227            577         63,869
10,001 - 50,000             1,217            993              7              0          2,217
Over 50,000                   240            482              0              0            722
Total                      41,506          5,491         19,234            577         66,808

The population and water system information used in this table is from the 1999 Needs Survey inventory database. The information in this table was used to
derive the sample distribution and statistical calculations found in other tables in this document.

3.3 Tribal Water Systems as an Individual Stratum

Small PWSs that are located on Tribal lands in each of the 10 EPA Regions were grouped into a
single category for the representative sample; this Tribal category is equivalent to a State for the
statistical selection process. Tribal systems had the same probability of being selected as other water
systems in the stratified random selection process that weights systems by water source and size class
by population served. Using this discrete stratum ensures that some Tribal systems were selected
as part of the national representative sample. The systems selected comprise the "SMP" for Tribal
water systems.

3.4 Consistency of State Plans

EPA selected the representative sample from the population of CWSs and NTNCWSs nationally,
then allocated the sample to individual States, weighted approximately for the proportion of the
population served by each service size category and water source type. Based on a stratified random
selection process applied to CWSs and NTNCWSs, the sample size was weighted by population
served (to enable exposure assessments from Assessment Monitoring results) and water source type
(to enable comparisons between surface or ground water) while allocated proportionately amongst
States (to ensure geographic coverage) within service size category (categories 1 through 3). EPA
also randomly selected two alternate/replacement systems for each PWS selected for the national
representative sample. EPA also selected a supplemental alternate/replacement list for cases where the
primary system and both alternates were determined to be inactive. All of these systems appear in
the initial SMPs sent to States.

States could include the EPA-selected systems on the initial plan list in their SMP. If, however,
the State review determined that a system on the initial plan list had closed or merged, the system
could be removed from the SMP List. To remove a system from the SMP List and replace it with
another system, the State was expected to notify EPA of the reasons for removal. Valid reasons for
removal included system closure, system merger, or a determination that a system operates
exclusively with purchased water. To identify a replacement system for the system removed, States
selected the first water system (from the appropriate category) from the existing replacement list for
the PWS removed. (See Section 5 for a more detailed discussion of initial plan and replacement list
selection procedures.) More detailed directions on modifying the initial SMP and using the lists of
alternate/replacement systems are included in the instructions of each SMP.

Once the list of systems was finalized, States informed the EPA Regional Office of the States' choice
of plans (including the details of any modified plans). The EPA Regional Office worked with the
State to develop an acceptable modified plan. This approach ensures a nationally consistent system
selection process and enables acceptable SMP development with minimal State burden.

If the EPA Regional Office did not receive the notice of a final SMP within 60 days, EPA assumed
that systems on the initial plan represented the final SMP. The plan also specifies the timing of the
monitoring.

4. Selecting the Representative Sample for Systems Serving 10,000 or Fewer People

4.1 Objectives of the Sample

The representative sample of small PWSs must allow EPA to collect high quality data about
contaminant occurrence. Such data must allow precise estimates of national occurrence (the fraction
of systems in which a contaminant occurs) and exposure (the fraction of people exposed to a
contaminant). The data must also provide enough information within smaller categories of systems
(e.g., small, medium, or large systems) to inform the development of possible regulatory alternatives.
The sample must also be representative of the population of small PWSs. Each of these data quality
objectives is described in more detail in the following sections.

4.1.1 Accuracy and Precision

The representative sample of small PWSs must be selected so that the data collected yield accurate
and precise estimates of national contaminant occurrence and exposure.

Accurate or unbiased estimates are correct on average over the long term, or over many samples.
For instance, if the sampling plan were to be carried out many times, the average of the occurrence
or exposure estimates derived from all of the samples would be close to the true occurrence or
exposure fraction of the population. The first data quality objective is that the sample estimates be
unbiased.

Precise estimates have small variability. All estimators are variable: even if an estimator is unbiased
over many samples, the estimate computed from any particular sample will be different from the true
population value. The second data quality objective is to limit the amount of this variability.

Precision may be measured in terms of a margin of error and its associated confidence level. For
estimates of exposure fractions, EPA will allow a margin of error of ± 1% with 99% confidence,
when the estimated exposure fraction is 1%. That is, if the estimated exposure fraction is 1%, EPA
must be able to state with 99% confidence that the true exposure fraction is between 0% and 2%.
The meaning of "99% confidence" is that if the sampling plan were to be repeated many times, the
true exposure fraction would fall within the margin of error around the estimate in 99% of all cases.

EPA specified these stringent statistical parameters to ensure high quality data and dependable
monitoring results. In general, many similar random surveys with continuous variables use a lower
level of confidence (95%) and/or a larger allowable error (plus or minus 5%). However, use of a
larger error is unacceptable for the UCMR. Examination and analysis of current occurrence data
show that many contaminants which are currently regulated, or being considered for regulation,
occur in 1% or less of systems on a national basis. However, for many contaminants, a 1%
occurrence nationally reflects a substantially larger occurrence regionally. Even a small percentage
of systems with detections of a contaminant can translate into exposure of a significant population.
By accepting a greater margin of error, and the resultant smaller sample size, such small national
occurrence might be missed entirely.

There are also other uncertainties and sources of variation in such a sample program. For example,
all contaminants have censored distributions (i.e., "less than the detection level" analytical results)
and there are many factors that affect variability and vulnerability of ground water systems. The
statistical sampling theory used to derive levels of accuracy and precision may not account for all
of these sources of variation. Hence, the high confidence level, low allowable error, and consequent
larger sample size should help ensure adequate data to meet the objectives of the UCMR program.

The data quality objective of a 1% margin of error with 99% confidence level holds for CWSs. EPA
is allowing a 2.5% margin of error with a 95% confidence level for NTNCWSs since these systems
serve fewer people than CWSs. Therefore, less information is required about NTNCWSs to compute
national exposure estimates. Although more information about contaminant exposure in NTNCWSs
would be desirable, with only 800 systems available for Assessment Monitoring, trade-offs are
required in placing sampling effort where it will yield the most information about exposure. Note
that previous EPA contaminant occurrence research has not identified any significant difference in
the quality of drinking water between CWSs and NTNCWSs (see EPA document A Review of
Contaminant Occurrence in Public Water Systems. EPA 816-R-99-006, November 1999).

The precision of an estimate is determined in part by the size of the sample used to derive it. Other
things being equal, a larger sample allows a more precise estimate. A rough idea of the sample size
needed to achieve the stated goals for margin of error and confidence may be obtained from the
formula:

$$ n = \frac{z^{2}\, p\,(1 - p)}{d^{2}} \qquad (1) $$
in which n is the sample size; p is the true or estimated exposure fraction, or 0.01 in our case; d is
the desired margin of error, or 0.01; and z is the critical value of the normal distribution at the desired
confidence level. For a 99% confidence level, a table of the normal distribution gives z = 2.58.
Inserting the given values of p, d, and z into equation (1) gives n = (2.58)^2(0.01)(0.99)/(0.01)^2 ≈ 659.
Therefore, approximately 659 CWSs are needed to achieve the UCMR's stated data quality
objectives. Similarly for NTNCWSs, a 2.5% margin of error with 95% confidence gives p = 0.01,
d = 0.025, z = 1.96, and therefore n = 61 NTNCWSs are required to meet UCMR data quality
objectives.
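The arithmetic behind equation (1) can be checked with a short Python sketch. The function name `required_sample_size` is hypothetical, and rounding up to a whole number of systems is an assumption made here for illustration; the text itself only states the approximate results.

```python
import math


def required_sample_size(p, d, z):
    """Equation (1): n = z^2 * p * (1 - p) / d^2, rounded up to a whole system.

    p: true or estimated exposure fraction
    d: desired margin of error
    z: critical value of the normal distribution at the desired confidence level
    """
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


# CWSs: 1% margin of error at 99% confidence (z = 2.58).
cws_n = required_sample_size(p=0.01, d=0.01, z=2.58)

# NTNCWSs: 2.5% margin of error at 95% confidence (z = 1.96).
ntncws_n = required_sample_size(p=0.01, d=0.025, z=1.96)

print(cws_n, ntncws_n, cws_n + ntncws_n)
```

With the values given in the text, this reproduces the 659 CWSs and 61 NTNCWSs quoted above, or about 720 systems in total.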

The underlying assumptions of the approximation used to derive equation (1) are: (1) that the
sample is a simple random sample from the population of systems; (2) that the sample is large
enough for a normal approximation to hold; and (3) that in each system the presence of a
contaminant can be determined with certainty. However, these assumptions are not strictly true
for the UCMR sample, so the estimate of 659 systems is only a rough guideline. Under the more
complicated stratified sampling plan described in Section 4.2, the 800 systems allocated to
Assessment Monitoring are more than enough to meet the objectives of accuracy and precision.

4.1.2 Stratification

EPA must be able to evaluate contaminant occurrence not only nationally, but within categories (or
"strata") of systems, including source water type (ground or surface), size (3 categories), and system
type (CWS or NTNCWS). Many statutes and regulations are implemented differently for systems
of different size, or for different source water categories. Combining the representative (small
system) sample with the results from all large systems provides increased power in the total sample,
but EPA must also be able to evaluate occurrence, and possible regulatory options, related to the
small systems themselves. The SDWA and many current rules focus on burden reduction for small
systems when feasible.

EPA has not placed a specific limit on the precision that can be achieved within each category of
water systems. In general, the level of precision that can be achieved within any category is less than
for all systems taken together, because fewer samples are taken within a single category. Therefore,
instead of requiring a set level of precision for each category, EPA has taken the approach of
minimizing the highest amount of variability of the estimates within any of the categories, while
maintaining the objectives of accuracy and precision of the overall estimates, as described above.
This approach is described further in Section 4.2.

4.1.3 Representativeness

A representative sample should reflect the population from which it is drawn. This implies some sort of fairness in selecting
systems and thereby a fairness in imposing the burden of required sampling. Some properties of a
fair and representative sample are: systems are selected at random; all systems have a chance of
being selected; the characteristics of the sample will be close on average to those of the population,
such as system sizes and types; and systems from all subgroups of interest (e.g., Minnesota, or Size
1 surface water NTNCWS) are present in the sample.

In a representative sample, every system should have a chance of being selected, but not all systems
will necessarily have the same chance. Whether that is true depends on what the sample is intended
to represent. To accurately estimate contaminant occurrence (percent of systems) in PWSs, the
sample should be selected based on systems, so it makes sense to assign an equal probability of
selection to each system. To represent exposure, the sample should be selected based on people
exposed to a contaminant. In this case it makes sense to assign sampling probabilities in proportion
to the population served, so that the systems that serve the most people are most likely to be selected.
Clearly these two types of representativeness conflict and cannot both be optimized in the UCMR
sample. Since EPA needs to represent both contaminant occurrence and exposure for the UCMR,
EPA devised a sampling plan to reflect both the number of systems and the population served by
those systems while maintaining balance between the two objectives.

Although occurrence is important, EPA is interested first in estimating contaminant exposure. If this
were the only criterion, then EPA would allocate systems to States in proportion to the population
served. Then systems that serve the most people would be sampled most often. This population-
weighted allocation can be shown to lead to the most accurate and precise estimates of overall or
national exposure. But a problem with this approach is that it assigns small numbers of systems, or
even zero systems, to the smallest States and territories. For example, Guam serves 0.015% of the
population served by PWSs in the U.S., so a population-weighted allocation would assign 0.015%
of 800 systems to Guam. That is 0.1 systems, or rounded off, zero systems assigned to Guam.
Similarly, American Samoa and the Mariana Islands would each receive zero systems, and Rhode
Island would receive one system. Such a sample would not be fully representative of the population
of CWSs in the U.S.

EPA believes that to be fully representative of the nation, a sample of water systems must include
at least 2 systems from each State and Territory in the U.S. Therefore, EPA has imposed the
additional constraint that its representative sample must contain at least 2 systems from each State
and Territory in the U.S. (The exception is Guam, which has only one PWS in the Needs Survey
inventory; so exactly one system was selected to sample in Guam.)

4.1.4 Summary

To summarize, a sample of small PWSs must provide data that meet the following data quality
objectives:

• provide national exposure estimates that are unbiased, and have a margin of error of ±
1% with 99% confidence for CWSs, or a margin of error of ± 2.5% with 95% confidence
for NTNCWSs, when the estimated exposure fraction is 1%;

• minimize the maximum variability of exposure estimates within categories of system size
and source water type; and

• sample at least 2 systems from each State and Territory.

The next section describes the sampling plan that EPA designed to satisfy these objectives.

4.2 How the Samples Were Allocated

EPA is using a representative sample of 800 small systems for Assessment Monitoring. The sample
size was selected based on various statistical and budgetary considerations. A sample of 800 systems is
more than the approximately 720 systems (659 CWSs and 61 NTNCWSs; see section 4.1.1, above)
needed to meet the first data quality objective, and allows at least two PWSs to be selected in each
State.

To meet the data quality objectives described above, the crucial step is to allocate the sampling effort
in the right amounts among strata (categories) of system size, source water type, and State or
Territory. With 3 size categories, 2 source water types, 2 system types, and 56 States and Territories,
there are 3 × 2 × 2 × 56 = 672 strata in which to allocate the 800 systems for Assessment Monitoring.

EPA used the following three-step procedure to allocate the 800 systems:

1. The systems were allocated among the 56 States and Territories. The allocation was
roughly in proportion to population, but with at least 2 systems allocated to each State
or Territory.

2. Within each State or Territory, a probability was selected for each of the 12 categories
of system size, source water type, and system type.

3. Within each State or Territory, a category was selected at random for each allocated
system, using the probabilities computed in step 2. Within the selected category, a PWS
was selected at random, with probability proportional to its population served among all
PWSs in the category.

In this way, each of the 800 systems was assigned first to a State, then to a category within that State,
then to a particular PWS within the category.
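Steps 2 and 3 of this procedure can be sketched in Python as follows. This is a simplified illustration rather than EPA's selection program: the function name `select_systems`, the category labels, the system identifiers, and the populations are all hypothetical, and draws are made with replacement, so the same PWS could be drawn more than once (the actual selection and replacement procedure is described in Section 5).

```python
import random


def select_systems(n_allocated, category_probs, systems_by_category, rng):
    """For each allocated system, draw a category, then draw a PWS within it.

    category_probs: category -> sampling probability (as computed in Step 2)
    systems_by_category: category -> list of {"id": ..., "population": ...}
    Categories with nonzero probability are assumed to contain at least one PWS.
    """
    categories = list(category_probs)
    weights = [category_probs[c] for c in categories]
    selected = []
    for _ in range(n_allocated):
        # Step 3a: pick a category at random using the Step 2 probabilities.
        category = rng.choices(categories, weights=weights, k=1)[0]
        # Step 3b: pick a PWS with probability proportional to population served.
        systems = systems_by_category[category]
        pws = rng.choices(
            [s["id"] for s in systems],
            weights=[s["population"] for s in systems],
            k=1,
        )[0]
        selected.append((category, pws))
    return selected


# Made-up inputs for a single State with two categories.
example_probs = {"size 1 ground water CWS": 0.7, "size 1 surface water CWS": 0.3}
example_systems = {
    "size 1 ground water CWS": [
        {"id": "PWS-A", "population": 120},
        {"id": "PWS-B", "population": 450},
    ],
    "size 1 surface water CWS": [{"id": "PWS-C", "population": 300}],
}
print(select_systems(4, example_probs, example_systems, random.Random(1999)))
```

Passing an explicit `random.Random` instance keeps the sketch reproducible; the seed value has no significance.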

The rest of this section describes how the State allocations and category probabilities were selected
in Steps 1 and 2 above, in order to achieve the UCMR's stated data quality objectives. The random
assignment of PWSs to categories in Step 3 is described in Section 5. The description in this section
is meant to convey the idea of the procedure and the assumptions used to derive it, but it is not a
complete technical description. A complete description and justification of the procedure is provided
in Appendix A.

4.2.1 Allocation of Systems to States and Territories

To obtain the most precise national exposure estimates, the optimal allocation of systems to each
State should be in proportion to the State's population served. For example, Table 2 below shows
that Texas has about 8.9% of the population served by small systems, so Texas should receive 8.9%
of the 800 systems, or 71.4 systems. This population-weighted allocation has two drawbacks. First,
the allocation is only theoretical, since each State receives a fractional number of systems. Second,
under this scheme some small States receive fewer than two systems.  For example, Rhode Island
would receive 1.1 systems, and American Samoa would receive 0.1 systems.  To  get around this
problem, the population-weighted allocation was modified as follows:
1. An initial allocation was computed for each State, in proportion to that State's population
served by small systems.

2. All allocations were rounded up to the next largest integer, and any allocations less than two
were increased to two. Each State was allocated at least two systems, but the total number
of allocated systems increased to more than 800 systems.

3. Systems were removed one at a time from various States, in such a way as to minimize the
increase in variance of an overall exposure estimate and keep all State allocations at or
greater than two, until the total allocation was reduced again to 800.

Table 2. Distribution of Small Systems Required to Conduct Assessment Monitoring and
Screening Survey in Each State/Tribe/Territory

Pn = Population Served by Small Systems (10,000 or fewer people) [1]
An = Number of Small Systems Conducting Assessment Monitoring [2]
Sn = Number of Small Systems Conducting Screening Surveys [3]

State/Tribes/
Territories                   Pn      An      Sn
Tribes [4]               394,267       7       2
Alabama                  826,868      15       4
Alaska                   207,650       4       3
American Samoa             6,278       2       2
Arizona                  654,139      12       3
Arkansas                 736,435      13       8
California             2,706,432      48      24
Colorado                 545,759      10       6
Connecticut              348,727       6       2
Delaware                 128,494       2       1
Florida                1,810,083      32      11
Georgia                1,254,642      22      12
Guam                       5,504       1       0
Hawaii                   159,339       3       2
Idaho                    436,697       8       2
Illinois               1,599,786      28       8
Indiana                1,108,704      20       8
Iowa                     940,771      16      10
Kansas                   675,059      12       6
Kentucky                 505,977       9       4
Louisiana              1,552,807      27      14
Maine                    323,762       6       3
Mariana Islands           12,769       2       1
Maryland                 463,283       8       2
Massachusetts            713,312      12       3
Michigan               1,372,119      24      13
Minnesota                912,075      16       8
Mississippi            1,687,841      30       9
Missouri               1,129,714      20       8
Montana                  343,389       6       3
Nebraska                 471,233       8       4
Nevada                   216,851       4       1
New Hampshire            343,257       6       2
New Jersey               934,202      16       6
New Mexico               449,245       8       6
New York               1,700,436      29      14
North Carolina         1,257,791      22      11
North Dakota             199,303       4       2
Ohio                   1,595,309      28       7
Oklahoma                 853,024      15       5
Oregon                   585,945      11       6
Pennsylvania           2,131,859      37      19
Puerto Rico              493,374       9       4
Rhode Island              56,834       2       0
South Carolina           644,915      11       7
South Dakota             219,176       4       2
Tennessee                823,726      14       9
Texas                  3,989,818      71      28
Utah                     385,852       7       4
Vermont                  238,493       4       3
Virgin Islands            92,555       2       1
Virginia                 917,521      16       7
Washington             1,013,103      17      10
Washington DC [6]              0       0       0
West Virginia            547,661      10       6
Wisconsin              1,193,154      21      12
Wyoming                  153,712       3       2
Total                 45,071,031     800     360

[1] The distribution of samples above is based on the population and water system information in the 1999 Needs Survey
database inventory.
[2] This column represents the total number of small systems allocated in individual States/Tribes from the national
representative sample of 800 systems.
[3] There are 360 small systems shown for two Screening Surveys (180 for Screening Survey 1 and 180 for Screening
Survey 2). Note that each Screening Survey group of 120 large systems will also be required to monitor. Therefore,
each Survey includes a total of 300 small and large systems (600 Screening Survey systems across both Surveys).
[4] The number of Tribal water systems includes Tribal systems in each of the 10 EPA Regions. Tribal systems were
aggregated as a State to ensure that Tribal systems were represented in the national representative sample of small
systems in the UCMR.
[5] U.S. Territories include American Samoa, Guam, Northern Mariana Islands, Puerto Rico, and the Virgin Islands.
Territories were aggregated as a State to ensure that Territories were represented in the national representative sample
of small systems in the UCMR.
[6] The Washington DC water supply is provided exclusively by large PWSs.

The resulting State allocations are shown in Table 2. The results are very close to what one would
get by simply rounding the population-weighted allocations to the nearest integers.
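
The State allocation procedure described in this section can be sketched in Python as shown below. This is a rough illustration only. The removal criterion is an assumption: the sketch takes the variance of the overall estimate to be proportional to the sum over States of w^2/n, where w is the State's share of the population served and n is its allocation, which presumes equal within-State variability. The actual criterion is given in Appendix A and may differ. The function name `allocate_systems` is hypothetical, the example total of 20 systems is made up, and the special cases of Guam and Washington DC are not modeled.

```python
import math


def allocate_systems(populations, total, minimum=2):
    """Allocate `total` systems to States, roughly in proportion to population."""
    if minimum < 1:
        raise ValueError("minimum must be at least 1")
    pop_total = sum(populations.values())
    weights = {state: pop / pop_total for state, pop in populations.items()}

    # Proportional allocation, rounded up, with a floor of `minimum` per State.
    alloc = {s: max(minimum, math.ceil(total * w)) for s, w in weights.items()}

    def variance_increase(state):
        # Assumed criterion: increase in w^2 / n when n drops by one.
        n = alloc[state]
        return weights[state] ** 2 * (1 / (n - 1) - 1 / n)

    # Remove one system at a time, never going below the minimum.
    while sum(alloc.values()) > total:
        candidates = [s for s in alloc if alloc[s] > minimum]
        if not candidates:
            raise ValueError("total is too small to give every State the minimum")
        alloc[min(candidates, key=variance_increase)] -= 1
    return alloc


# Populations served by small systems for five States, taken from Table 2;
# the total of 20 systems is a made-up figure for this small example.
example_populations = {
    "Texas": 3_989_818,
    "California": 2_706_432,
    "Wyoming": 153_712,
    "Rhode Island": 56_834,
    "American Samoa": 6_278,
}
example_alloc = allocate_systems(example_populations, total=20)
print(example_alloc)
```

In this small example the three least populous entries stay at the two-system minimum, and the systems removed in the last step come from the larger States.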

Given the small individual State sample size, no statistically valid conclusions may be drawn at the
State level. However, EPA still considers it important that all States are represented and have the
opportunity to participate in the UCMR. Some contaminants, such as some pesticides, may only be
used intensively in specific regions of the country. It is possible that with the relatively small
number of systems in the representative sample, monitoring may miss contaminants with such
targeted regional use patterns. However, including systems in every State in approximate proportion
to the population served should ensure that contaminants with regional use patterns, to the extent that
they potentially contaminate water supplies, are proportionately represented by the national sampling
design.

4.2.2 Calculation of Category Sampling Probabilities

Once systems are allocated to States, they must be allocated to categories of system size, system
type, and source water type, within each State. This allocation is computed not in terms of fixed
sample sizes, but by choosing the probability of drawing each system from each category. In this
way, systems from even the smallest categories have some chance of being sampled.

To see how the sampling probabilities were chosen, consider first a simple allocation in which the
probability of drawing from each category is proportional to the population served in that category.
As described above, this allocation gives the most precise overall national exposure estimates.
Table 3 shows the results of such an allocation, in terms of the expected number of systems sampled
from each category, and the resulting margins of error. For CWSs, an overall exposure estimate of
1% would have a margin of error of ± 0.97%, or a confidence interval of 0.03% to 1.97%, with 99%
confidence. This is slightly better than the first data quality objective described in Section 4.1. On
the other hand, the margins of error in the size-by-source-water-type categories are as high as ± 12%,
so that not much information is gathered about some categories. Similarly, for NTNCWSs, the margin
of error for an overall exposure estimate is ± 2% at 95% confidence, well within the first data quality
objective; but within smaller categories the margin of error is as high as ± 19%.
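
The overall margins of error quoted above can be checked with the usual normal-theory formula for a
proportion, z * sqrt(p(1-p)/n). The short Python sketch below assumes simple random sampling with
the total expected sample sizes (705 CWSs, 95 NTNCWSs); the function name is illustrative. The
category-level values in Table 3 come from the formulas in Appendix A and are only approximated by
this simple formula.

```python
from math import sqrt
from statistics import NormalDist

def normal_margin(p, n, confidence):
    """Normal-theory margin of error for an estimated proportion p from n systems."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    return z * sqrt(p * (1 - p) / n)

cws_margin = normal_margin(0.01, 705, 0.99)     # about 0.0097, i.e. +/- 0.97%
ntncws_margin = normal_margin(0.01, 95, 0.95)   # about 0.020, i.e. +/- 2.0%
```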

Table 3. Sample Allocation Proportional to Population Served: Expected
         Number of Systems Drawn From Each Category, and Resulting
         Margins of Error for Exposure Estimates

                                Ground Water-        Surface Water-
                                Supplied Systems     Supplied Systems     Total
System Type   Size Category     n [1]   error [2]    n [1]   error [2]    n [1]   error [2]
CWSs          500 and Under        78   ±3.0             5   ±12.1           83   ±2.9
              501 to 3,300        228   ±1.7            46   ±3.8           274   ±1.6
              3,301 to 10,000     237   ±1.7           111   ±2.5           348   ±1.4
              Total               543   ±1.1           162   ±2.0           705   ±0.97
NTNCWSs       500 and Under        40   ±3.1             1   ±17.4           42   ±3.0
              501 to 3,300         44   ±3.0             3   ±10.8           47   ±2.9
              3,301 to 10,000       5   ±8.8             1   ±19.0            6   ±8.0
              Total                89   ±2.1             6   ±8.3            95   ±2.0

Rows and columns do not add up to totals due to rounding.
[1] n = expected number of samples drawn.
[2] error = expected normal-theory margin of error, in percent, when the estimated exposure fraction is
    1%, at 99% confidence for CWSs and 95% confidence for NTNCWSs.
Note: The population-weighted distribution of samples is based on population and water system
information from the 1999 Needs Survey database inventory.

Table 3 shows that there is room for improvement in the proportional allocation. By shifting some
sampling effort into the categories with smaller allocations, more information can be collected about
those categories. This would reduce the widest margins of error. For example, for CWSs, systems
could be shifted into the smallest surface water stratum. The cost would be to gather less information
about the other categories, and also increase the error in the overall estimate. But if the systems are
reallocated, the widest margin of error can be minimized (the UCMR's second data quality
objective), while keeping the overall margin of error at or below 1% (the first data quality objective).

Appendix A describes the procedure for reallocating PWSs as described above, in order to meet the
first two data quality objectives. Starting from the proportional allocation shown in Table 3,
sampling probabilities are reduced in the categories with the narrowest margins of error, and
increased in the categories with the widest margins of error. As the sampling plan moves farther
away from the proportional allocation in Table 3, the overall margin of error increases. The
procedure stops when further reallocation would cause the overall margin of error to exceed 1% for
CWSs, or 2.5% for NTNCWSs.
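
The reallocation idea can be illustrated with a simple greedy sketch in Python. This is not the
Appendix A procedure: it assumes simple random sampling within each stratum, takes the population
shares of the six CWS strata to be proportional to the expected counts in Table 3, and moves one
expected system at a time from the stratum with the narrowest margin of error to the stratum with
the widest, stopping before the overall margin would exceed the cap. All function names are
illustrative.

```python
from math import sqrt
from statistics import NormalDist

P = 0.01                           # assumed exposure fraction, as in Tables 3 and 4
Z = NormalDist().inv_cdf(0.995)    # two-sided 99% critical value (CWS confidence level)

def stratum_margin(n):
    """Normal-theory margin of error within one stratum of expected size n."""
    return Z * sqrt(P * (1 - P) / n)

def overall_margin(shares, n):
    """ASSUMPTION: overall estimate is a population-share-weighted stratum mean."""
    return Z * sqrt(P * (1 - P) * sum(w * w / k for w, k in zip(shares, n)))

def reallocate(shares, n, cap):
    n = list(n)
    while True:
        widest = min(range(len(n)), key=lambda h: n[h])      # smallest n, widest margin
        narrowest = max(range(len(n)), key=lambda h: n[h])   # largest n, narrowest margin
        if n[narrowest] - n[widest] < 2:
            break                                            # nothing left to gain
        trial = list(n)
        trial[narrowest] -= 1
        trial[widest] += 1
        if overall_margin(shares, trial) > cap:
            break                                            # cap would be exceeded
        n = trial
    return n

# Expected CWS counts from Table 3: ground water (3 sizes), then surface water (3 sizes).
start = [78, 228, 237, 5, 46, 111]
shares = [k / sum(start) for k in start]   # ASSUMPTION: shares proportional to Table 3
final = reallocate(shares, start, cap=0.01)
```

Because of the simplifying assumptions, the resulting counts will not match Table 4; the sketch only
shows how the widest stratum margin shrinks while the overall margin is held at or below 1%.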

Using this procedure, sampling probabilities for Assessment Monitoring were derived for the
categories of system size, system type, and source water type, to satisfy the first two data quality
objectives described in Section 4.1. The third data quality objective, sampling at least 2 systems per
State or Territory, was already satisfied by allocating systems to States in Section 4.2.1. The
resulting sampling probabilities are provided in Appendix B. Table 4 shows a summary of the
results.

Table 4. Sample Allocation for Assessment Monitoring: Expected Number of
         Systems Drawn from Each Category, and Resulting Margins of
         Error for Exposure Estimates

                                Ground Water-        Surface Water-
                                Supplied Systems     Supplied Systems     Total
System Type   Size Category     n [1]   error [2]    n [1]   error [2]    n [1]   error [2]
CWSs          500 and Under        72   ±3.1            47   ±4.1           119   ±2.9
              501 to 3,300        218   ±1.8            41   ±4.1           259   ±1.6
              3,301 to 10,000     225   ±1.7           102   ±2.6           327   ±1.4
              Total               515   ±1.1           190   ±2.1           705   ±1.00
NTNCWSs       500 and Under        31   ±3.9            10   ±9.2            41   ±3.8
              501 to 3,300         31   ±3.8             9   ±9.2            41   ±3.6
              3,301 to 10,000       5   ±9.2             8   ±9.2            13   ±7.8
              Total                68   ±2.6            28   ±6.0            95   ±2.50

Rows and columns do not add up to totals due to rounding.
[1] n = expected number of samples drawn.
[2] error = expected normal-theory margin of error, in percent, when the estimated exposure fraction is
    1%, at 99% confidence for CWSs and 95% confidence for NTNCWSs.

Compared to Table 3, systems were shifted to the smallest surface water stratum for CWSs,
and to the various surface water strata for NTNCWSs. As a result, the maximum margin of error
in any of the categories decreased from 12.1% to 4.1% for CWSs, and from 19.0% to 9.2% for
NTNCWSs. Although still somewhat high, these errors represent the best that can be achieved with
a sample of 800 small systems, while maintaining a good overall exposure estimate. The resulting
sample allocation reflects the difficulty of obtaining precise information in all categories of systems
from a limited sample. At the same time, the margins of error for overall exposure estimates
increased to exactly 1% for CWSs and 2.5% for NTNCWSs, meeting the first data quality objective.
Margins of error in some categories also increased slightly, by up to 0.8%.

The methodology used to derive the sampling probabilities requires some simplifying assumptions.
As a result, the margins of error in Table 4 are only approximately correct. The methodology and
its limitations are described in detail in Appendix A. An important simplifying assumption is that
once a system is selected for sampling, the presence or absence of a contaminant can be determined
with certainty. Of course this assumption is not true; if a contaminant is not detected in a system in
a finite number of samples, it may never be present there, or it may only have been absent or
undetectable when the samples were taken. Because the derivation of the sampling probabilities
ignores this source of uncertainty, the margins of error tend to be underestimated. Occurrence
estimates may also turn out to be negatively biased, since contaminants that are present will not
always be detected in a finite number of samples. To account for this additional uncertainty would
require data or assumptions about the frequency and spatial and temporal variability of occurrence,
as well as the spatial distribution of samples. Such information was not available for the design of
the sampling plan. Once sampling takes place and some occurrence data are available, corrected
confidence intervals and occurrence estimates may be computed.

Due to the small sample size of the NTNCWSs in the ground water and surface water categories
within each size category (Categories 1 through 3), statistical conclusions about NTNCWSs must
be drawn with caution. Conclusions about NTNCWSs cannot be based on source water type since
the margin of error would be too great. Note that since the actual allocation of systems to each
service size category was randomized, the number of systems selected to monitor for each service
size category differed from the expected sample allocation once the random number generator
was used, as described in more detail in Section 5. Table 5 shows the composition of the actual
national representative sample of 800 systems as selected for the initial SMPs. Once each State
reviewed its initial SMP, the sampling distribution was expected to change. For instance, if a State
had only one ground water system serving 25-500 people which was selected to monitor for the
UCMR and this system was inactive (and had no replacement systems), this system was likely
replaced by a system in the State within another service size and/or water source category
(replacement systems are discussed further in Section 5). The number of systems that monitor for
the UCMR within each stratum will be included in future EPA documents that describe sampling
results.

Table 5. National Representative Sample Distributed by System Size Category and
         Water Source Type as Selected for the Initial SMPs

                                                                      Subtotal of All
                                 Number of         Number of          Systems by Water
Size Category                    CWSs              NTNCWSs            Source Type
(by population served)           Ground  Surface   Ground  Surface    Ground  Surface    Total
                                 Water   Water     Water   Water      Water   Water
Category 1   500 and Under           76       51       36        8       112       59      171
Category 2   501 to 3,300           208       38       30        7       238       45      283
Category 3   3,301 to 10,000        230      106        4        6       234      112      346
Total                               514      195       70       21       584      216      800

   4.3    Statistical Implications of the Sampling Plan

Once system selection, sampling, chemical analysis, and reporting are complete, EPA will estimate
occurrence and exposure of the 12 List 1 contaminants (see Section 8, Assessment Monitoring), and
their associated margins of error. These estimates will take into account the nature of the sampling
plan, in particular the different probabilities  of sampling from systems in different strata.  In this
section the occurrence and exposure estimates and two different kinds of confidence intervals are
described, which take the sampling plan into account.  This section provides only a summary; a
complete description is provided in Appendix A.

       4.3.1  Occurrence and Exposure Estimates

When some systems are more likely to be sampled than others, an unbiased estimate of occurrence
or exposure has to take the sampling probabilities into account, by giving less weight to those
systems that are more likely to be sampled. An estimator that does this is:

       $$\sum_{i=1}^{800} \frac{W_i \, c_i \, y_i}{p_i} \qquad (2)$$

where

       Σ    stands for "summation";
       i    indexes the sampled systems, 1,...,800;
       y_i  = 1 if the contaminant occurs in system i, 0 otherwise;
       W_i  is the weight given to system i
            = population served by system i, for exposure estimates; or 1, for occurrence estimates;
       p_i  is the probability of choosing system i;
       c_i  is a constant, computed in Appendix A.

For example, for an exposure estimate, a sampled system receives more weight in equation (2) if it
serves more people (greater W_i), and less weight if it is more likely to be chosen under the sampling
plan (larger p_i in the denominator). Because of the weighting in equation (2), some systems could be
made more likely to be sampled in order to meet the UCMR's data quality criteria, as described in
Section 4.2, without incurring any bias in the exposure or occurrence estimates. (The constant c_i in
equation (2) performs a similar function to n^{-1} in an ordinary arithmetic mean, correcting for the total
number of observations in the sample. Details and a more precise definition of c_i are provided in
Appendix A.)
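
A minimal Python sketch of an estimator of this form follows. The constant c_i is defined in
Appendix A and is not reproduced here; as a stand-in, the sketch assumes the same constant for every
system, c = 1/(n x total weight in the sampling frame), which corresponds to a standard estimator for
sampling with replacement with unequal probabilities. The data and the function name are
hypothetical.

```python
from math import isclose

def weighted_fraction(y, w, p, frame_weight):
    """Sum of w_i * c * y_i / p_i over the sampled systems.

    y: 1/0 detection indicators; w: weights (population served, or 1);
    p: per-draw selection probabilities; frame_weight: total weight in the frame.
    ASSUMPTION: c = 1 / (n * frame_weight) stands in for the Appendix A constant.
    """
    n = len(y)
    c = 1.0 / (n * frame_weight)
    return sum(w_i * c * y_i / p_i for y_i, w_i, p_i in zip(y, w, p))

# Hypothetical frame of four systems, each drawn once, with selection
# probability proportional to population served.
populations = [100, 200, 300, 400]
frame_total = sum(populations)
probabilities = [pop / frame_total for pop in populations]
detections = [1, 0, 0, 1]
exposure = weighted_fraction(detections, populations, probabilities, frame_total)
```

When the selection probabilities are exactly proportional to the weights, as here, the weights cancel
and the estimate reduces to the simple fraction of sampled systems with a detection (2 of 4).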

There is an overlap between the populations served by CWSs and NTNCWSs. In the absence of
information about the number of people obtaining their drinking water from CWSs or NTNCWSs
and their degree of exposure, there is no way to combine exposure estimates from these two classes
of systems in the right proportions to reflect people's total exposure. For this reason, exposure
estimates will be computed separately for CWSs and NTNCWSs, and will not be combined into a
single overall exposure estimate.

4.3.2 Margins of Error

The error ranges in Table 4 were computed using the statistical formulas shown in Appendix A,
using the sampling probabilities and a normal approximation to the estimation error. The normal
approximation is valid when the expected number of detections is large enough. The expected
number of detections is n*p, where n is the number of systems selected and p is the fraction of
systems in which the contaminant occurs. In order for the normal approximation to hold, Casella
and Berger (1990) recommend n*p>5, while Parzen (1960) recommends n*p>10. For CWSs in
Table 4, where n = 705 and p = 0.01, n*p = 7.05. By this measure, the normal approximation may
not be valid. Moreover, Table 4 shows a clear problem with the normal approximation: the error
bounds are so wide that they include negative occurrence fractions within the margin of error. For
example, among very small ground water CWSs in Table 4, when the observed fraction of systems
with a contaminant is 1 percent, a 99% confidence interval for the true fraction is 1% ± 3.1%, or
[-2.1%, 4.1%]. This interval allows the possibility of a negative fraction of occurrence, which
cannot logically occur. The interval may be truncated to [0%, 4.1%], but the need to truncate suggests
that the normal approximation does not lead to an accurate confidence interval.

The normal-based confidence interval is only one of several possible confidence intervals for an
estimated proportion. Newcombe (1998) compares seven such intervals, including two varieties of
the normal interval. Of these, the Wilson score interval without continuity correction (Wilson, 1927)
has good statistical properties (e.g., the stated confidence level is approximately correct for a wide
range of n and p), is simple to compute, and unlike the normal interval, always gives confidence
limits between 0 and 1. Given an estimated occurrence fraction p from a sample of size n, the
Wilson score interval for p is computed as shown in equation (A-13) of Appendix A.

Table 6 compares the normal and Wilson confidence intervals for CWSs, still assuming an estimated
occurrence fraction of p = 0.01, and using the expected sample sizes summarized in Table 4. A
simple interpretation of these intervals is that the normal interval equals p, the estimated fraction,
plus or minus some amount, while the Wilson interval is approximately p times or divided by some
amount. For example, for very small ground water CWSs, the Wilson interval is [0.1%, 11.2%], or
about 1% times or divided by 11. So according to the Wilson interval, the true occurrence fraction lies
somewhere between 0.1% and 11.2%, with 99% confidence. By comparison, the normal interval for this
example is -2.1% to 4.1%. Although the Wilson interval in this example is wider than the normal
interval, it is more believable in part because it does not include negative occurrence values.
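
The contrast between the two intervals can be reproduced approximately with the Python sketch
below. It uses the textbook Wilson score interval without continuity correction and treats the
expected count as a simple random sample size; equation (A-13) in Appendix A is not reproduced
here, so the printed values in Table 6 will differ somewhat from what this sketch returns. Function
names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def normal_interval(p, n, confidence=0.99):
    """Normal-theory interval: p plus or minus z * sqrt(p(1-p)/n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half

def wilson_interval(p, n, confidence=0.99):
    """Wilson score interval without continuity correction."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# Very small ground water CWSs: estimated fraction 1%, about 72 expected systems.
n_low, n_high = normal_interval(0.01, 72)
w_low, w_high = wilson_interval(0.01, 72)
```

Under these assumptions the normal interval has a negative lower limit, while the Wilson interval
stays strictly between 0 and 1 and is asymmetric about the estimate.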

The normal-based error ranges in Tables 3 and 4 are useful as a rough guide to the expected
precision of an estimated occurrence fraction. Moreover, the normal approximation yields the
simple formula in equation (1) for estimating the sample size needed to achieve a given precision
with given confidence. However, when computing confidence intervals for the estimated proportion,
the Wilson score interval is preferred, both because of its good statistical properties and because it
avoids the possibility of including negative occurrence values.

Table 6. Comparison of 99% Normal and Wilson Score Confidence Intervals for
         Exposure Estimates in CWSs under Assessment Monitoring

                      Ground Water-       Surface Water-
Size Category         Supplied Systems    Supplied Systems    All

Normal Confidence Intervals
500 and under         [-2.1, 4.1]         [-3.1, 5.1]         [-1.9, 3.9]
501 to 3,300          [-0.8, 2.8]         [-3.1, 5.1]         [-0.6, 2.6]
3,301 to 10,000       [-0.7, 2.7]         [-1.6, 3.6]         [-0.4, 2.4]
All                   [-0.1, 2.1]         [-1.1, 3.1]         [0.0, 2.0]

Wilson Score Confidence Intervals
500 and under         [0.1, 11.2]         [0.1, 17.9]         [0.1, 10.3]
501 to 3,300          [0.2, 4.8]          [0.1, 18.2]         [0.2, 4.3]
3,301 to 10,000       [0.2, 4.7]          [0.1, 8.4]          [0.3, 3.7]
All                   [0.3, 2.9]          [0.2, 6.2]          [0.4, 2.6]

Confidence intervals are in percent, when the estimated exposure fraction is 1%.

5. Selecting Systems for the Initial Plan List and the Replacement List in Each State

EPA selected the PWSs for the national representative sample through a two-stage random
selection process. Once the number of Assessment Monitoring systems was determined for each State,
EPA selected the individual strata from which the systems would be drawn. The individual
systems within each stratum were then selected. The sampling process is described in detail below.

EPA first calculated the probability of selecting each stratum for both  CWSs and NTNCWSs
together (i.e., so that the cumulative probability of selecting any stratum from an individual State
equals one).  The method of computing the sampling probabilities was described in Section 4.2. A
random number generator was then used to allocate the systems to strata of system type, system size,
and source water type. For instance, for a State that is allotted 10 PWSs for the UCMR sample, the
random number generator was run 10 times, and each result was then compared to the cumulative
probability of selection to designate the strata from which the individual PWSs were selected.

Table 7. Cumulative Probability of Selection by Stratum for Colorado

                                                                  Expected     Actual
System     System        Water                     Cumulative     Number of    Number of
Type       Size          Source     Probability    Probability    Systems      Systems
CWS        25-500        GW         0.089138       0.089138       1            1
CWS        25-500        SW         0.188764       0.277903       2            1
CWS        501-3300      GW         0.166154       0.444056       2            0
CWS        501-3300      SW         0.138157       0.582213       1            3
CWS        3301-10000    GW         0.129963       0.712176       1            2
CWS        3301-10000    SW         0.21383        0.926007       2            2
NTNCWS     25-500        GW         0.002338       0.928344       0            0
NTNCWS     25-500        SW         0.008918       0.937263       0            0
NTNCWS     501-3300      GW         0.001276       0.938538       0            0
NTNCWS     501-3300      SW         0.001898       0.940436       0            1
NTNCWS     3301-10000    GW         0              0.940436       0            0
NTNCWS     3301-10000    SW         0.059564       1.000000       1            0

In the example shown in Table 7 for the State of Colorado, the random number generator was run
ten times, returning the following set of numbers: 0.793739, 0.474497, 0.245539, 0.647118,
0.558134, 0.647613, 0.416625, 0.889291, 0.94107, and 0.457243. Based on comparing these
randomly generated numbers to the cumulative probability column in Table 7, one PWS was a
surface water based CWS serving 25-500 people, one was a ground water based CWS serving
501-3,300 people, three were surface water based CWSs serving 501-3,300 people, two were ground
water based CWSs serving 3,301-10,000 people, two were surface water based CWSs serving
3,301-10,000 people, and one was a surface water based NTNCWS serving 501-3,300 people. When a
random number did not exactly match a cumulative probability, the random number was
"rounded" up to the next category. The number of systems to be selected from each stratum was
calculated with replacement. This means that each stratum in the State could be selected more than
one time.
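
The comparison of random numbers against the cumulative probability column can be sketched in
Python as follows. The cumulative probabilities and the ten random numbers are those printed for
Colorado; the use of a binary search (`bisect_left`) to find the first stratum whose cumulative
probability is at least the random number is an implementation choice for this sketch, not a
description of the software EPA used.

```python
from bisect import bisect_left
from collections import Counter

STRATA = [
    ("CWS", "25-500", "GW"), ("CWS", "25-500", "SW"),
    ("CWS", "501-3300", "GW"), ("CWS", "501-3300", "SW"),
    ("CWS", "3301-10000", "GW"), ("CWS", "3301-10000", "SW"),
    ("NTNCWS", "25-500", "GW"), ("NTNCWS", "25-500", "SW"),
    ("NTNCWS", "501-3300", "GW"), ("NTNCWS", "501-3300", "SW"),
    ("NTNCWS", "3301-10000", "GW"), ("NTNCWS", "3301-10000", "SW"),
]
CUMULATIVE = [
    0.089138, 0.277903, 0.444056, 0.582213, 0.712176, 0.926007,
    0.928344, 0.937263, 0.938538, 0.940436, 0.940436, 1.000000,
]
DRAWS = [
    0.793739, 0.474497, 0.245539, 0.647118, 0.558134,
    0.647613, 0.416625, 0.889291, 0.94107, 0.457243,
]

def pick_stratum(r):
    """Index of the first stratum whose cumulative probability is >= r."""
    return bisect_left(CUMULATIVE, r)

# Strata are drawn with replacement, so the same stratum may appear several times.
counts = Counter(STRATA[pick_stratum(r)] for r in DRAWS)
```

Run on the values as printed, this sketch places nine of the ten draws in the strata named in the
text. The draw 0.94107 lies just above the printed cumulative value 0.940436, so in this sketch it
maps to the last stratum rather than the one named in the text.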

Table 7 also shows the expected number of systems to be allocated to each stratum within the State.
The "Expected Number of Systems" column shows the sample distribution if the sample were
selected solely based on the proportion of the population served by each stratum. The actual
sampling distribution based on using the random number generator is shown in the next column.
Appendix B shows the cumulative probabilities of selection, as well as the expected and actual
number of systems that were allocated to each State for the Assessment Monitoring portion of the
UCMR.

Once systems were allocated to each stratum, systems were then chosen at random from the strata.
To ensure that the systems selected from each stratum were representative of population served, the
probability of selection for each system was taken to be proportional to its population served within
that stratum. For example, a system that served 5% of all people in the stratum of Category 1 ground
water systems in Nebraska would have a 5% chance of being drawn when a system was selected
from that stratum. For systems not to be selected twice, each time a system is selected it is removed
from consideration in its stratum, and the system selection probabilities are re-computed. This is
called sampling without replacement. (Although the systems are sampled without replacement, the
stratum sampling probabilities in Section 4.2 were derived under the assumption of sampling with
replacement. This assumption makes the calculation tractable, at the cost of introducing some error.
The error is small enough to be ignored; see the discussion in Section A.5 in Appendix A.)
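
As a sketch of the within-stratum step, the Python function below draws systems with probability
proportional to population served, removes each selected system, and lets the remaining
probabilities be re-computed implicitly on the next draw. The system identifiers and populations
are hypothetical, and the seeded generator is used only to make the example repeatable.

```python
import random

def draw_without_replacement(populations, k, rng):
    """Draw k distinct systems, each draw proportional to population served."""
    if k > len(populations):
        raise ValueError("cannot draw more systems than the stratum contains")
    remaining = dict(populations)
    chosen = []
    for _ in range(k):
        names = list(remaining)
        weights = [remaining[name] for name in names]
        pick = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]   # selected system is removed from consideration
    return chosen

# Hypothetical stratum: system id -> population served.
stratum = {"sys-01": 50, "sys-02": 120, "sys-03": 300, "sys-04": 430, "sys-05": 100}
selected = draw_without_replacement(stratum, k=3, rng=random.Random(1999))
```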

The Initial SMP included detailed tables with the total number of systems allocated to a State
through this two-stage random selection process. The alternate/replacement list is a list of
additional systems that were randomly selected to replace primary systems in the SMP if necessary.
Primary systems were replaced when the primary system was found to be inactive (i.e., the system
closed, merged, or now purchases all of its water from another system). Primary systems could
also have been replaced for other reasons, in which case the State should already have submitted
to EPA, with its modified SMP, a request stating the reasons the systems were removed.

EPA randomly selected two alternate/replacement systems for each primary system for every State.
A supplemental replacement list was also generated for each State. The replacement lists were
generated individually to select one replacement for each system selected to monitor for the UCMR.
Once the first replacement list was generated, the second set of replacement systems was selected.
This ensured that each system selected to monitor for the UCMR had two replacement systems
selected from the appropriate stratum. These systems were selected in the same manner as the initial
system list. The third, or general replacement list consisted of a randomly selected number of PWSs
from the remaining PWSs in the State, regardless of system size category, source water type, and
system type. For example, if a ground water based CWS serving 501-3,300 people was inactive, and
if both replacement systems selected for that system were inactive, any system remaining in the
sample frame list may be randomly selected to monitor for the UCMR regardless of system type,
size, and source water type.

6. Selecting Systems for the State Plan

Each State, Tribe, and Territory had 60 days to review the initial plan list. The State/Tribe either:
(1) accepted the selections as its SMP and notified the Regional Administrator of its acceptance;
(2) proposed changes to the initial plan list and selected alternates from the replacement list,
including the reasons for the changes, informing the Regional Administrator of the proposed
changes; or (3) took no action within 60 days, which allowed the Regional Administrator to specify
the portion of the representative sample applicable to the State as its SMP. In the second case, the
Regional Administrator had 60 days to work with the State to develop a suitable plan, if problems
were encountered. The Reference Guide for the Unregulated Contaminant Monitoring Regulation
(EPA 815-R-01-023) provides a more detailed discussion of the SMP process.

Any system(s) removed from the Initial SMP list must be replaced by the system(s) assigned as
replacement systems. If the State determines that both replacement systems are no longer active, the
first active system on the supplemental replacement list becomes the replacement system.

Each State/Tribe/EPA Region reviewed their SMP to determine that the systems selected have the
appropriate operational status. The State/Tribe/EPA Region then submitted its representative sample
listing to the EPA Regional Office, with all changes from the initial list marked and the reasons for
any changes noted.

States/Tribes may also sample additional systems. However, any additional PWSs sampled will not
be combined with those of the representative sample for the purpose of computing national estimates
of exposure and occurrence. EPA cannot pay for the testing of these additional systems. These
additional systems, though providing useful information, will bias the national set of systems if
included with those selected using the stated national criteria. However, if the States provide the
results of such monitoring, EPA will receive the data through SDWIS for input to the NCOD.

7. Index System Monitoring

EPA identified 30 CWSs in the lower 48 States from the representative sample of small systems to
be "Index" systems. Five systems were selected from each size and source water stratum. The data
collected from the Index Systems will be used partly for added quality control and to better
characterize monitoring results and operating characteristics of small systems. These systems will
be monitored every year during the five-year UCMR-listing cycle. This will provide detailed
information regarding temporal variation during the course of UCMR monitoring, as well as possible
effects related to operational changes. EPA will pay for this monitoring, and will provide for
sampling equipment, labor for sample collection, shipment of samples, testing and analysis.
Additional water quality and operational data from these systems may be collected at the same time,
with minimal burden to the systems. The Index Systems were selected so that they are located within
watersheds that have been studied extensively under the United States Geological Survey's (USGS)
National Water Quality Assessment Program. This allows both the EPA and the USGS to share
information on source water and finished water quality in watersheds across the US. Table 8 shows
the number of systems chosen in each size category as Index Systems from the representative
sample.

Table 8. Distribution of Index Systems in the Representative Sample

                             Number of            Number of
Size Category                Non-Index Systems    Index Systems
Ground Water
  500 and Under                     71                   5
  501 to 3,300                     203                   5
  3,301 to 10,000                  225                   5
Surface Water
  500 and Under                     46                   5
  501 to 3,300                      33                   5
  3,301 to 10,000                  101                   5
Number of Systems in the
Representative Sample              679                  30

Note: The distribution of samples indicated above is based on the 1999 Needs Survey database
inventory.

8. Assessment Monitoring

The first component of the UCMR is Assessment Monitoring, which will be conducted by all of the
large CWSs and NTNCWSs (except those large systems that purchase all of their water from another
PWS), and by a statistically representative sample of 800 small CWSs and NTNCWSs (except those
small systems that purchase all of their water from another PWS). Assessment Monitoring will be
conducted for the 12 UCMR (1999) List 1 contaminants (listed in §141.40(a)(3) Table 1, UCMR
(1999) List 1). One-third of the representative sample (267 systems) will monitor in each of the
three Assessment Monitoring years (2001 to 2003). This sampling distribution is designed to
facilitate laboratory scheduling and other logistical considerations. The small systems were
assigned to a sampling year by random selection, with a 33 percent probability that each system
would be selected in each of the three years. The year for the first system was randomly selected,
and the years for the remaining systems were then assigned in chronological order. For instance, the
random number generator selected the year 2002 as the sampling year for the first system. The second
system was then assigned the year 2003, and the third system was assigned 2001, and so on until each
system in the sample had an assigned monitoring year.
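
The year assignment described above amounts to a random starting year followed by a repeating
chronological cycle. A minimal Python sketch, with an illustrative function name, is:

```python
from collections import Counter

def assign_years(n_systems, first_year, years=(2001, 2002, 2003)):
    """Cycle through the monitoring years, starting from a randomly chosen first year."""
    start = years.index(first_year)
    return [years[(start + i) % len(years)] for i in range(n_systems)]

# The example in the text: the first system was assigned 2002.
first_five = assign_years(5, 2002)
per_year = Counter(assign_years(800, 2002))
```

With 800 systems and a first year of 2002, the cycle assigns 267 systems to 2002, 267 to 2003, and
266 to 2001.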

After the sampling year was selected, the sampling months were randomly selected for the systems,
with four samples per year for surface water (and GUDI) systems and two samples per year for
ground water systems. The first month was selected randomly as described above for the sampling
year, then subsequent months were assigned consecutively. One sampling period must be during the
most vulnerable period (May 1 through July 31), as designated in the regulation. Specification of
the monitoring year and month not only facilitates scheduling of laboratory resources, but also
ensures that sampling covers vulnerable periods and all seasons to assess some aspect of temporal
occurrence patterns. To provide States with flexibility in determining vulnerable periods, EPA
allowed the State to modify the vulnerable period for some or all systems in their SMP. The State
should have notified EPA of the reasons for the change. EPA specified the sampling date as the 15th
of each month, plus or minus two weeks. Systems may sample at any time during the month, as long
as all subsequent samples are taken on the same day. The second ground water sample may be taken
within 5 to 7 months of the initial vulnerable period sample. While Index Systems sample during
all five years of the UCMR cycle, each Index System was also assigned an "official" sample year.
Only the data from the official sample year will be used in the national summary of results from
Assessment Monitoring, for consistency with the sample design.

The UCMR does not specify any particular year for Assessment Monitoring for the large PWSs, but
does specify that they must conduct their monitoring within the first three years (2001-2003) of the
UCMR cycle. EPA expects that large system UCMR monitoring for unregulated contaminants will
coincide, whenever possible, with required monitoring for regulated contaminants. Since monitoring
schedules for regulated chemicals depend on system size and detection history, compliance schedules
vary significantly. EPA recognizes that although it will be desirable to collect UCMR samples
concurrently with compliance samples for regulated chemical contaminants, sometimes it may be
difficult to coordinate the two sampling events. Large systems are required to bear the costs of
sampling, testing and reporting the results, and coincident monitoring may help reduce the burden.

9. Screening Surveys

The second component of the UCMR includes the Screening Surveys. Each Screening Survey will
be conducted at a combined total of approximately 300 PWSs randomly selected from the pool of
systems required to conduct Assessment Monitoring. Screening Survey monitoring will be
conducted for the UCMR (1999) List 2 contaminants for which analytical methods have been
developed, but may need to be further refined before large-scale Assessment Monitoring is
conducted. There are 15 unregulated contaminants on the UCMR (1999) List 2. Fourteen of these
contaminants are chemical contaminants, and one is a microbiological contaminant. These
contaminants are listed in §141.40(a)(3) Table 1, UCMR (1999) List 2.

The Screening Surveys are being conducted to assess contaminant occurrence in PWSs, and not to
assess exposure by population (as is the purpose of Assessment Monitoring). EPA
estimates that there will be two different groups of systems involved in the Screening Surveys. Each
group will be comprised of 300 large and small CWSs and NTNCWSs. Small systems will conduct
the first Screening Survey in the year 2001, while large systems will conduct the Screening Survey
in 2002 for the contaminants identified in the List 2 rule. EPA expects that Aeromonas will be
monitored in 2003, since the analytical method will not be completed before the first Screening
Survey. Sampling schedules have been established, in part, to enable Screening Survey samples to
be collected coincident with the Assessment Monitoring samples whenever possible to minimize the
burden to small systems. Large systems are responsible for coordinating their Screening Survey
sample selection with Assessment Monitoring.

EPA is examining general thresholds to evaluate Screening Survey results, relative to the margin of
error in the sample. For example, if a contaminant occurs over a certain threshold (i.e., in a
percentage of systems/population served), the contaminant may then be placed on the Assessment
Monitoring list and monitored in the next round of the UCMR by all large systems and a
representative sample of small systems. If the contaminant occurrence is below this threshold, it is
possible that no further testing will be required. Factors such as health effects levels will also need
to be considered; hence, thresholds may vary by contaminant.

Systems were selected from all the size and water source categories. However, selection was not
proportionately weighted by population served, or by the proportion of systems in each size category.
If the sample were weighted by population served, a disproportionate number of large systems would
be included in the Screening Surveys. If the sample were weighted by the number of systems in each
size category, a disproportionate number of small systems would be represented. Therefore, each
size category was given equal importance, with 60 systems selected from each size category and the
selected systems distributed evenly between surface water and ground water systems wherever
possible (i.e., 30 ground water and 30 surface water systems were targeted to be selected to monitor
for each Screening Survey). Note, however, that there were not enough very small (serving 500 or
fewer people) surface water systems in the sample to select a full 30 systems in this category for
each Screening Survey year. Only 20 very small surface water systems were selected to monitor for
UCMR (1999) List 2 contaminants in 2001. The remaining 10 systems were instead selected from the
very small ground water category, so that a total of 40 very small ground water systems will monitor
for the Screening Survey in 2001. This results in 180 small systems and 120 large systems in each of
the Screening Surveys (i.e., a total of 360 small systems and 240 large systems in the two Screening
Surveys). To make national occurrence or exposure estimates, the resultant data will need to be
weighted in relation to these sample distributions.

Table 9 shows the allocation of systems to each size category for each Screening Survey, together
with the associated margins of error at the 99 and 95 percent confidence levels, which indicate
the precision of estimates from the sample of 300 systems. Even though a total of 600 systems is
involved, there will be, as noted, two Screening Surveys performed by two mutually
exclusive groups of systems, analyzing water samples for two different sets of contaminants.

Table 9. Allocation of Systems for Screening Surveys by Size Category
with the Associated Confidence Levels and Margins of Error

                          Ground Water-              Surface Water-
                          Supplied Systems           Supplied Systems           All
Size Category             n (1)  99% (2)  95% (2)    n (1)  99% (2)  95% (2)    n (1)  99% (2)  95% (2)
500 and Under                40     ±4.1     ±3.1       20     ±5.7     ±4.4       60     ±3.3     ±2.5
501 to 3,300                 45     ±3.8     ±2.9       15     ±6.6     ±5.0       60     ±3.3     ±2.5
3,301 to 10,000              30     ±4.7     ±3.6       30     ±4.7     ±3.6       60     ±3.3     ±2.5
Subtotal Small Systems      115     ±2.4     ±1.8       65     ±3.2     ±2.4      180     ±1.9     ±1.5
10,001 to 50,000             30     ±4.7     ±3.6       30     ±4.7     ±3.6       60     ±3.3     ±2.5
50,001 and over              30     ±4.7     ±3.6       30     ±4.7     ±3.6       60     ±3.3     ±2.5
Subtotal Large Systems       60     ±3.3     ±2.5       60     ±3.3     ±2.5      120     ±2.3     ±1.8
All                         175     ±1.9     ±1.5      125     ±2.3     ±1.7      300     ±1.5     ±1.1

     1  Values in the columns with the heading "n" indicate the number of PWSs allocated to a specific
        system size category.
     2  These column headings indicate the confidence level used for evaluation.  The values preceded by
        "±" listed in these columns are the normal-theory margins of error, in percent, associated with
        the given confidence level (either 99% or 95%).  Error calculations in the table assume an
        estimated occurrence fraction of p = 0.01.
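
The margins of error in Table 9 are consistent with the usual normal-theory formula,
z * sqrt(p(1 - p)/n). The following Python sketch is illustrative only: it assumes simple random
sampling within each cell, the occurrence fraction p = 0.01 stated in the table footnote, and
standard normal quantiles of 1.960 and 2.576. The function name is hypothetical.

```python
import math

# Standard normal quantiles for two-sided 95% and 99% intervals (assumed values).
Z = {95: 1.960, 99: 2.576}

def margin_of_error_pct(n, confidence, p=0.01):
    """Normal-theory margin of error, in percent, for a sample of n systems."""
    return 100.0 * Z[confidence] * math.sqrt(p * (1.0 - p) / n)

# Sample sizes taken from the "All" columns of Table 9.
for label, n in [("one size category", 60), ("small systems", 180),
                 ("large systems", 120), ("all systems", 300)]:
    print(f"{label:18s} n={n:3d}  "
          f"99%: ±{margin_of_error_pct(n, 99):.1f}  "
          f"95%: ±{margin_of_error_pct(n, 95):.1f}")
```

Because the margin shrinks only with the square root of n, cells with 15 or 20 systems have
margins several times wider than the full sample of 300.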


Results from the Screening Surveys are likely only suitable for aggregate national estimates given
the 99 percent confidence level and ± 1.5% margin of error.  Only aggregated national estimates are
appropriate because the error margin may be too large in small subcategories (e.g., surface or ground
water systems in a given size category) to be conclusive, particularly in cases where no detections
occur. For example, in very  small surface water systems,  if a contaminant does not occur in the
screening survey, there is a 95% chance that the national occurrence fraction of that contaminant is
less than 4.4%.  Note also that since the total number of systems allocated to each size category is
equal (60 systems per category), the monitoring results will have to be weighted by the proportion
of the population served within each service size category.  Monitoring results will have to be
carefully analyzed to correctly assess the possible implications of such results.

To implement the Screening Surveys, EPA selected 180 small PWSs from the set of 267 systems
(i.e., one-third of the 800  systems in the national representative  sample), scheduled to conduct
Assessment Monitoring in 2001  (for the  first group) and  again in 2003 (for the second group).
Although a Screening Survey is not currently scheduled for 2002, systems were still selected for this
year (in case it was necessary to conduct a Screening Survey in 2002). The probability of a system
being selected for Assessment Monitoring (A) in any given year is 267/800, or $P_n(A) = 33\%$.

Given that a system was first selected for Assessment Monitoring (A) in any given year n, the
probability of that system also being selected for a Screening Survey (S) is:

\[ P_n(S \mid A) = \frac{180}{267} = 0.67 \tag{3} \]

In any one Screening Survey year, there was a 22% probability that a system would be selected for
both the Screening Survey and Assessment Monitoring (67% chance of being selected for the Screening
Survey, multiplied by a 33% chance of being selected for Assessment Monitoring in that year).
However, if the first Screening Survey is conducted in the year 2002, the systems selected to
conduct Assessment Monitoring in the year 2001 have no chance of being selected for a Screening
Survey. Across the two Screening Surveys, there is a 45% chance for a small system to be selected
for both Assessment Monitoring and a Screening Survey in the same year. Therefore, the probability
of a system being selected only for Assessment Monitoring is estimated as 55%. Figure 2 depicts the
number of systems and the probability of a system being selected for Assessment Monitoring and a
Screening Survey.
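
The selection probabilities for small systems follow from simple arithmetic on the counts given
in the text. The Python sketch below is illustrative only; the variable names are hypothetical,
and it assumes exactly two Screening Surveys, each drawing 180 systems from a yearly group of 267.

```python
# Counts taken from the text: 800 systems in the national sample, 267 scheduled
# for Assessment Monitoring in a Screening Survey year, 180 picked per survey.
TOTAL_SYSTEMS = 800
PER_YEAR = 267
PER_SURVEY = 180
NUM_SURVEYS = 2  # assumption: two Screening Surveys over the three-year cycle

p_assessment = PER_YEAR / TOTAL_SYSTEMS              # P_n(A)
p_survey_given_assessment = PER_SURVEY / PER_YEAR    # P_n(S | A)
p_both_one_year = p_assessment * p_survey_given_assessment
p_both_overall = NUM_SURVEYS * p_both_one_year
p_assessment_only = 1.0 - p_both_overall

print(f"P(A) in a given year:          {p_assessment:.3f}")
print(f"P(S | A):                      {p_survey_given_assessment:.3f}")
print(f"P(A and S) in one survey year: {p_both_one_year:.3f}")
print(f"P(A and S) over both surveys:  {p_both_overall:.3f}")
print(f"P(A only):                     {p_assessment_only:.3f}")
```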

Similarly, for the large CWSs and NTNCWSs, the probability of a system being required to
participate in a Screening Survey (S) is:
\[ P_{\text{large}}(S) = \frac{240}{2774} = 0.0865 \tag{4} \]

Therefore, there is approximately a 9% probability that a large system will be chosen for a Screening
Survey.

Again, based on the proportion of population served by small CWSs and NTNCWSs in each State,
the number of systems selected for the two groups of Screening Surveys ($S_{ni}$) in each
State/Tribe $n$ is calculated as:

\[ S_{ni} = \frac{P_{ni}}{NP_i} \times Z_i \tag{5} \]

where $P_{ni}$ is the population served by small systems in State/Tribe $n$ in category $i$, $NP_i$
is the total national population served in system category $i$, and $Z_i$ is the total number of
systems required to conduct the survey in that category $i$.
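
As an illustration of the proportional allocation above, the following Python sketch divides a
category's systems among States according to population served. The State names and populations
are hypothetical, the national total is taken to be the sum of the listed populations, and the
sketch returns unrounded values because the text does not say how fractional allocations were
rounded to whole systems.

```python
def allocate_by_population(state_pop, total_systems):
    """Share total_systems (Z_i) among States in proportion to population served.

    state_pop maps each State/Tribe n to its population P_ni; the national
    total NP_i is assumed to be the sum of the entries.
    Returns the unrounded allocations P_ni / NP_i * Z_i.
    """
    national_pop = sum(state_pop.values())
    return {state: pop / national_pop * total_systems
            for state, pop in state_pop.items()}

# Hypothetical populations served by small systems in one size category.
example_pop = {"State A": 500_000, "State B": 300_000, "State C": 200_000}
allocation = allocate_by_population(example_pop, total_systems=60)
for state, share in allocation.items():
    print(f"{state}: {share:.1f} systems")
```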

Figure 2 illustrates the allocation of systems conducting Assessment Monitoring and two Screening
Surveys in each State/Tribe based on the population served by the systems.

10. Pre-Screen Testing

The third monitoring component of the UCMR is Pre-Screen Testing. EPA established this third
tier of the UCMR monitoring with its stakeholders for contaminants of concern for which analytical
methods are in the early stages of development and/or whose methods are currently too expensive
for wide-scale monitoring. Pre-Screen Testing may also address contaminants that have recently
emerged or been identified as a concern, such as through the Governors' petition process. The
purpose of this monitoring component will be to determine whether the methods in early
development will provide adequate analytical results in conditions under which the contaminants are
most likely to occur.  There are nine contaminants on the UCMR (1999) List 3, including seven
microbiological contaminants and two radiological contaminants. The complete list may be found
in §141.40(a)(3) Table 1, UCMR (1999) List 3.

EPA will ask each State to identify a list of between 5 and 25 systems that might be most vulnerable
to the UCMR (1999) List 3 Pre-Screen Testing contaminants.  EPA will define a process to select
up to  200 large and small PWSs from the list of systems nominated by States.  The Pre-Screen
Testing will use analytical results from a small  sample to evaluate and improve methods, and to
conduct an initial assessment of occurrence. Given the small number of Pre-Screen Testing systems,
the  monitoring results cannot be used to estimate national occurrence of UCMR (1999) List 3
contaminants in a statistically rigorous manner.  EPA will provide further guidance on Pre-Screen
Testing contaminants after the List 3 Rule is promulgated.

Figure 2.   Number and Probability of Small Systems Chosen for Assessment Monitoring and
           Screening Surveys for the UCMR Years 2001-2003

[Figure: flow diagram. The 800 representative systems (100%) are divided among the UCMR monitoring
years 2001, 2002, and 2003, with 266 or 267 systems (33% of 800 systems) conducting Assessment
Monitoring in any given year. In each of the two Screening Survey years, 180 systems (67% of 267,
or 22.5% of 800 systems) conduct Assessment Monitoring plus a Screening Survey, and 87 systems
(33% of 267, or 11% of 800 systems) conduct Assessment Monitoring only. Overall probabilities over
three years: Assessment Monitoring plus Screening Survey, 45%; Assessment Monitoring only, 55%.]

11. References

       Casella, G. and Berger, R. (1990), Statistical Inference, Pacific Grove, CA: Wadsworth.

       Cochran, W. G. (1977), Sampling Techniques (3rd ed.), New York: Wiley.

       The MathWorks, Inc. (1996), Using MATLAB, Natick, MA.

       The MathWorks, Inc. (1999), Optimization Toolbox User's Guide, Natick, MA.

       Newcombe, R.G. (1998), "Two-sided confidence intervals for the single proportion:
       comparison of seven methods," Statistics in Medicine, 17: 857-872.

       Parzen, E. (1960), Modern Probability Theory and Its Applications, New York: Wiley.

       Wilson, E.B. (1927), "Probable inference, the law of succession, and statistical
       inference," Journal of the American Statistical Association, 22: 209-212.

Appendix A

Statistical Theory and Optimal Choice of Probabilities
for Probability-Weighted Estimation
This appendix presents some statistical theory for the methods of estimation, confidence intervals,
and selection of sampling probabilities that were used to derive the sampling plan for Assessment
Monitoring, as described in Section 4. The theory is presented here in order to show that, subject
to the approximations described in Section A.5, the sampling plan will provide occurrence and
exposure estimates that meet the UCMR's data quality objectives of accuracy and precision, as
described in Section 4.

The discussion below requires that the reader be familiar with basic statistical theory, for example
as in Casella and Berger (1990). It extends some of the sampling theory of Cochran (1977), but
does not require that the reader be familiar with that book.

A.1 Probability-Weighted Estimation

Suppose we have a population of $N$ systems of interest. For example, we could consider the
population of $N = 63{,}869$ small PWSs in the United States and Territories. Fix a single
contaminant of interest. For each system $i = 1, \ldots, N$, let $y_i = 1$ if the contaminant occurs
at any time in system $i$, or 0 otherwise. We want to estimate the weighted mean

\[ \mu = \sum_{i=1}^{N} W_i y_i \tag{A-1} \]

where the $W_i$ are given weights. For example, if $W_i$ is the number of people served by system
$i$, then $\mu$ is the number of people exposed to the contaminant. If $S$ is a subset of systems and
$W_i = 1\{i \in S\}/\#\{S\}$, where $1\{A\}$ equals 1 if the event $A$ is true or 0 if $A$ is false,
and $\#\{S\}$ is the number of systems in $S$, then $\mu$ is the fraction of systems in $S$ that have
some occurrence of the contaminant.

In order to estimate $\mu$, consider the following sampling scheme. We draw $D$ independent samples.
The $d$-th sample consists of $n_d$ i.i.d. system numbers $I_{d1}, \ldots, I_{dn_d}$ drawn with
replacement from the distribution $P(I_{d1} = i) = p_{di}$, where $p_{d1}, \ldots, p_{dN}$ are given
probabilities, $\sum_{i=1}^{N} p_{di} = 1$. The systems numbered $I_{d1}, \ldots, I_{dn_d}$ are then
sampled, and $y_{I_{d1}}, \ldots, y_{I_{dn_d}}$ are obtained. The total number of systems sampled is
$n = \sum_{d=1}^{D} n_d$.

To compute an unbiased estimate of $\mu$ using this sampling scheme, let $q_{di} = 1\{p_{di} > 0\}$,
$c_i = \left(\sum_{d=1}^{D} n_d q_{di}\right)^{-1}$, and

\[ \hat{\mu} = \sum_{d=1}^{D} \sum_{j=1}^{n_d} \left(\frac{W y c}{p_d}\right)_{I_{dj}} \tag{A-2} \]

Here $\left(\frac{W y c}{p_d}\right)_i$ is a simplified notation for $\frac{W_i y_i c_i}{p_{di}}$.
This notation simplifies formulas and is used repeatedly below.

We call $\hat{\mu}$ the probability-weighted estimator of $\mu$, given the sampling probabilities
$p_{di}$. $\hat{\mu}$ includes two bias corrections: $c_i$ acts like $n^{-1}$ in a sample mean,
correcting for the total number of possible observations on a system; and $p_{di}^{-1}$ gives
greater weight to observations that are expected to be drawn less often within a sample. In order
for $c_i$ to be defined, we require that $\sum_{d=1}^{D} n_d q_{di} > 0$ for each $i$. This is
equivalent to assuming that each system has positive probability of being sampled at least once.

Cochran (1977, Section 9A.3) considers $\hat{\mu}$ in the case $D = 1$. He calls $\hat{\mu}$ the
"probability proportional to $p$" estimator. A special case is probability proportional to size
(pps), in which $p_{di} \propto W_i$. In this case systems are sampled in proportion to their
weight in $\mu$, and when $D = 1$ the estimator is just the sample mean,
$\hat{\mu} = n^{-1} \sum_{j=1}^{n} y_{I_j}$. The pps estimator is easily shown to be the minimum
variance unbiased estimator of $\mu$ when there are no constraints on the $p$'s.

The following Theorem gives properties of $\hat{\mu}$. See also Cochran (1977, Section 9A.3) for
the case $D = 1$.

Theorem 1. Let $\mu_d = \sum_{i=1}^{N} (W y c q_d)_i$ and
$\hat{\mu}_d = n_d^{-1} \sum_{j=1}^{n_d} \left(\frac{W y c}{p_d}\right)_{I_{dj}}$.

(a)  $\hat{\mu}$ is an unbiased estimate of $\mu$.

(b)  $\mathrm{Var}(\hat{\mu}) = \sum_{d=1}^{D} n_d \sum_{i=1}^{N} p_{di} \left( \left(\frac{W y c}{p_d}\right)_i - \mu_d \right)^2$.

(c)  An unbiased estimate of $\mathrm{Var}(\hat{\mu})$ is
     $\hat{V}(\hat{\mu}) = \sum_{d=1}^{D} \frac{n_d}{n_d - 1} \sum_{j=1}^{n_d} \left( \left(\frac{W y c}{p_d}\right)_{I_{dj}} - \hat{\mu}_d \right)^2$.

Note. When $p_{di} = 0$, we define $\hat{\mu}$ in (A-2) to replace $p_{di}^{-1}$ by an arbitrary
number. This leaves the estimator unchanged, since the affected systems are sampled with
probability zero. But in Theorem 1(b) and below it allows us to write $p_{di}/p_{di} = 0$ when
$p_{di} = 0$.
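
To make the estimator and Theorem 1 concrete, the following Python sketch computes the
probability-weighted estimate and its variance estimate for a toy population, then checks
unbiasedness by enumerating every possible outcome. It is an illustration under stated
assumptions, not the code used for the UCMR: the weights, occurrence indicators, probabilities,
and function names are all hypothetical.

```python
from itertools import product

def contributions(W, y, p, n):
    """Return term[d][i] = W_i * y_i * c_i / p_di for every system with p_di > 0."""
    D, N = len(p), len(W)
    # c_i = 1 / sum_d n_d * 1{p_di > 0}; each system must be drawable in some sample.
    c = [1.0 / sum(n[d] for d in range(D) if p[d][i] > 0) for i in range(N)]
    return [{i: W[i] * y[i] * c[i] / p[d][i] for i in range(N) if p[d][i] > 0}
            for d in range(D)]

def pw_estimate(samples, term):
    """Probability-weighted estimate; samples[d] lists the indices drawn in sample d."""
    return sum(term[d][i] for d, drawn in enumerate(samples) for i in drawn)

def pw_variance_estimate(samples, term):
    """Variance estimate of Theorem 1(c); needs at least two draws per sample."""
    total = 0.0
    for d, drawn in enumerate(samples):
        x = [term[d][i] for i in drawn]
        mean_d = sum(x) / len(x)
        total += len(x) / (len(x) - 1) * sum((v - mean_d) ** 2 for v in x)
    return total

# Hypothetical toy population: 4 systems, 2 samples of 2 draws each.
W = [100.0, 250.0, 400.0, 50.0]   # e.g. people served
y = [1, 0, 1, 1]                  # contaminant occurrence indicators
p = [[0.5, 0.5, 0.0, 0.0],        # sample 1 can draw systems 0 and 1
     [0.0, 0.2, 0.3, 0.5]]        # sample 2 can draw systems 1, 2 and 3
n = [2, 2]
term = contributions(W, y, p, n)
mu = sum(w * v for w, v in zip(W, y))

# Enumerate every possible outcome to obtain exact expectations.
mean_est = second_moment = mean_var_est = 0.0
for s0 in product(list(term[0]), repeat=n[0]):
    for s1 in product(list(term[1]), repeat=n[1]):
        prob = 1.0
        for i in s0:
            prob *= p[0][i]
        for i in s1:
            prob *= p[1][i]
        est = pw_estimate([s0, s1], term)
        mean_est += prob * est
        second_moment += prob * est ** 2
        mean_var_est += prob * pw_variance_estimate([s0, s1], term)

true_var = second_moment - mean_est ** 2
print("mu:", mu, " E[estimate]:", mean_est)
print("Var(estimate):", true_var, " E[variance estimate]:", mean_var_est)
```

System 1 can be drawn in both samples, so its correction factor is 1/4 rather than 1/2; this is
the role of the $c_i$ term.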

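Proof of Theorem 1.  First prove some facts about $\mu_d$ and $\hat{\mu}_d$:

\[ \mu = \sum_{i} (W y c)_i \sum_{d} n_d q_{di} = \sum_{d} n_d \sum_{i} (W y c q_d)_i = \sum_{d} n_d \mu_d, \tag{A-3} \]

\[ \hat{\mu} = \sum_{d} n_d \hat{\mu}_d, \tag{A-4} \]

\[ E(\hat{\mu}_d) = E\left(\frac{W y c}{p_d}\right)_{I_{d1}} = \sum_{i} p_{di} \left(\frac{W y c}{p_d}\right)_i = \sum_{i} (W y c q_d)_i = \mu_d, \tag{A-5} \]

\[ \mathrm{Var}(\hat{\mu}_d) = n_d^{-1} \mathrm{Var}\left( \left(\frac{W y c}{p_d}\right)_{I_{d1}} \right) = n_d^{-1} \sum_{i} p_{di} \left( \left(\frac{W y c}{p_d}\right)_i - \mu_d \right)^2. \tag{A-6} \]

(a) $E\hat{\mu} = \sum_d n_d E\hat{\mu}_d = \sum_d n_d \mu_d = \mu$, by (A-4), (A-5), and (A-3).

(b) $\mathrm{Var}(\hat{\mu}) = \sum_d n_d^2 \mathrm{Var}(\hat{\mu}_d)$ by (A-4); apply (A-6).

(c) Let $v_d = \sum_{j=1}^{n_d} \left( \left(\frac{W y c}{p_d}\right)_{I_{dj}} - \hat{\mu}_d \right)^2$.
    Standard results give
    $E v_d = (n_d - 1) \mathrm{Var}\left( \left(\frac{W y c}{p_d}\right)_{I_{d1}} \right) = n_d (n_d - 1) \mathrm{Var}(\hat{\mu}_d)$,
    so $E\hat{V}(\hat{\mu}) = \sum_d \frac{n_d}{n_d - 1} E v_d = \sum_d n_d^2 \mathrm{Var}(\hat{\mu}_d) = \mathrm{Var}(\hat{\mu})$ by (b).
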
A.2 Stratified Sampling

Suppose now that the $N$ systems are divided into $T$ strata. The $s$-th stratum contains $N_s$
systems, so $N = \sum_{s=1}^{T} N_s$. For the purposes of the UCMR, a stratum is a combination of
State or Territory (1 of 56), system size (1, 2, or 3), source water type (GW or SW), and system
type (CWS or NTNCWS). Thus there are $T = 56 \times 3 \times 2 \times 2 = 672$ strata. We could
also consider smaller sets of strata, for example just the 6 size-by-source-water-type strata.

Instead of a single system number $i$, each system is now identified by a stratum number $s$ and a
system number $h$ within stratum $s$. This is just a relabeling of the systems, so the development
of Section A.1 still holds, with $i$ and $I$ replaced everywhere by $(s, h)$ and $(S, H)$. The
estimand is $\mu = \sum_{s=1}^{T} \sum_{h=1}^{N_s} W_{sh} y_{sh}$, and in the $d$-th sample we draw
i.i.d. stratum and system number pairs $(S_{d1}, H_{d1}), \ldots, (S_{dn_d}, H_{dn_d})$ using
$P(S_{d1} = s, H_{d1} = h) = p_{dsh}$.

An important special case is when the mean weights $W_{sh}$ and sampling probabilities $p_{dsh}$
are the same for all systems $h$ within a stratum. That is, assume

Assumption 1.  $W_{sh} = W_s$ and $p_{dsh} = p_{ds}$ for all $d$, $s$, and $h$.

In this case let $r_{ds}$ be the probability of drawing the next system from stratum $s$: then
$r_{ds} = P(S_{d1} = s) = \sum_{h} p_{dsh} = N_s p_{ds}$, so $\hat{\mu}$ becomes

\[ \hat{\mu} = \sum_{d=1}^{D} \sum_{j=1}^{n_d} \frac{U_{S_{dj}} \, y_{S_{dj} H_{dj}} \, c_{S_{dj}}}{r_{d S_{dj}}} \tag{A-7} \]

where the new mean weights are $U_s = W_s N_s$. The $U_s$ have a "per stratum" interpretation
instead of "per system." For example, if before $W_{sh}$ was the number of people served by system
$h$ in stratum $s$, now $U_s$ is the number of people served by all of stratum $s$.

From here forward we consider only the special case of Assumption 1. This has the advantage that
instead of simultaneously drawing $S$ and $H$ using $p_{dsh}$, we can now think of first drawing a
stratum number $S$ with probabilities $r_{ds}$, then drawing a system number within $S$ from a
uniform distribution on $\{1, \ldots, N_S\}$. Since the system numbers are uniformly distributed
within each stratum, we do not have to compute their probabilities and will concentrate on the
strata.
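
The two-stage draw described above can be sketched in a few lines of Python. This is only an
illustration of the idea for a single sample: the stratum probabilities, stratum sizes, and
function name are hypothetical, and the draw is with replacement, as the theory assumes.

```python
import random

def draw_systems(r, stratum_sizes, n_draws, rng):
    """Draw n_draws (stratum, system) pairs with replacement.

    r[s] is the probability of drawing stratum s in this sample;
    stratum_sizes[s] is the number of systems N_s in stratum s.
    Systems are numbered 1..N_s within each stratum.
    """
    # Stage 1: pick a stratum for each draw according to r.
    strata = rng.choices(range(len(r)), weights=r, k=n_draws)
    # Stage 2: pick a system uniformly within the chosen stratum.
    return [(s, rng.randint(1, stratum_sizes[s])) for s in strata]

# Hypothetical sample: three strata with unequal draw probabilities.
rng = random.Random(0)
sizes = [50, 20, 5]
pairs = draw_systems(r=[0.6, 0.3, 0.1], stratum_sizes=sizes, n_draws=10, rng=rng)
print(pairs)
```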

A.3 Optimal Choice of Probabilities

In this section we describe a procedure for choosing the sampling probabilities $r_{ds}$ in order
to minimize the variance of certain mean estimates, subject to upper bounds on the variance of
other mean estimates. Some simplifying assumptions are required in order to solve the problem. We
first formulate the optimization as a nonlinear programming problem, and then describe some details
of the implementation.

A.3.1 Problem Formulation

Let $\mu_1, \ldots, \mu_{E+F}$ be means of interest, each determined by a given set of weights:
$\mu_i = \sum_{s,h} W_{ish} y_{sh}$. Suppose that a set of samples will be drawn as described in
Section A.1, and each $\mu_i$ will then be estimated by $\hat{\mu}_i$ as in (A-7). The problem in
this section is to find sampling probabilities $r_{ds}$ that minimize the maximum variance of
$\hat{\mu}_1, \ldots, \hat{\mu}_E$, subject to given upper bounds on the variances of
$\hat{\mu}_{E+1}, \ldots, \hat{\mu}_{E+F}$. If $R = [r_{ds}]_{d=1,\,s=1}^{D,\,T}$ is the matrix of
sampling probabilities, then we want $R$ that solves

\[
\begin{array}{ll}
\min_{R} & \max \{ \mathrm{Var}(\hat{\mu}_1), \ldots, \mathrm{Var}(\hat{\mu}_E) \} \\
\text{s.t.} & \mathrm{Var}(\hat{\mu}_{E+i}) \le u_i, \quad i = 1, \ldots, F \\
 & R\mathbf{1} = \mathbf{1} \\
 & R \ge 0
\end{array}
\tag{A-8}
\]

The variance bounds $u_i$ could be chosen, for example, to give normal-theory confidence intervals
of no more than a specified width at a specified confidence.

The first simplifying assumption is that mean weights and sampling probabilities are constant
within each stratum. This is Assumption 1 of the previous section. Assumption 1 poses no problem
for estimating occurrence, since then we are just counting systems and every system in a stratum
can receive the same weight. On the other hand for estimating exposure, Assumption 1 is restrictive
since each system should be weighted by its number of customers, which varies within a stratum.
Under Assumption 1 we are forced instead to use, say, the mean number of customers per system
in each stratum. However, if the strata are based sufficiently on size so that most of the variation in
system size is between and not within strata, then the restriction due to Assumption 1 will be mild.
Moreover, if information were available about the size distribution within each stratum, a different
assumption could be substituted to use that information.  Similarly,  other assumptions could be
substituted about the proportions of the

Under Assumption 1, the argument of Section A.2 can be applied to Theorem 1(b) to give

\[ \mathrm{Var}(\hat{\mu}_i) = \sum_{d} n_d \sum_{s} \left(\frac{U_i^2 c^2 q_d \bar{y}}{r_d}\right)_s - \sum_{d} n_d \left( \sum_{s} (U_i c q_d \bar{y})_s \right)^2 \tag{A-9} \]

where $\bar{y}_s = N_s^{-1} \sum_{h=1}^{N_s} y_{sh}$. The only unknowns in (A-9) are $\bar{y}_s$
and $r_{ds}$ (since for the purposes of optimization, $q_{ds} := 1\{r_{ds} > 0\}$ are assumed to be
given as part of the problem). In order to leave only $r_{ds}$ unknown, we make the next
simplifying assumption:

Assumption 2. $\bar{y}_s = p$ for all $s$ and some $p \in [0, 1]$.

The user has to specify a value of $p$. The results of the optimization will be valid only for this
value. For example, the user could choose to optimize the sampling plan for contaminants which
occur on average in $p = 1\%$ of systems, as in Section 4.

Assumption 2 says that the fraction of occurrence is identical in each stratum.  This is obviously
unrealistic, but is a reasonable default in the absence of other information.  Again, any other
assumption that simplifies or summarizes the effect of the $\bar{y}$'s could also be used. For
example, the $\bar{y}$'s could be assumed to depend in a simple way on mean system size in each
stratum.

Using Assumption 2, we can rewrite (A-9) as

\[ \mathrm{Var}(\hat{\mu}_i) = p \sum_{d,s} \frac{a_{ids}}{r_{ds}} - p^2 b_i \tag{A-10} \]

where $a_{ids} = n_d (U_i^2 c^2 q_d)_s$ and $b_i = \sum_d n_d \left( \sum_s (U_i c q_d)_s \right)^2$.
We can now rewrite (A-8) in matrix form. For each $i$, let $a_i$ be a $1 \times DT$ row vector of
the $a_{ids}$: $a_i = [a_{i11}, \ldots, a_{i1T}, \ldots, a_{iD1}, \ldots, a_{iDT}]$. In the same
way, arrange the $r_{ds}$ and $r_{ds}^{-1}$ into $DT \times 1$ column vectors $r$ and $r^{-1}$,
respectively. Let $A_1 = [a_1' \ldots a_E']'$, $A_2 = [a_{E+1}' \ldots a_{E+F}']'$,
$b_1 = [b_1, \ldots, b_E]'$, $b_2 = [b_{E+1}, \ldots, b_{E+F}]'$, and $u = [u_1, \ldots, u_F]'$.
Then (A-8) can be written as

\[
\begin{array}{ll}
\min_{r} & \max \, (p A_1 r^{-1} - p^2 b_1) \\
\text{s.t.} & p A_2 r^{-1} \le p^2 b_2 + u \\
 & S r = \mathbf{1} \\
 & r \ge 0
\end{array}
\tag{A-11}
\]

where $S$ is a matrix of 1's and 0's such that $Sr$ is the same as $R\mathbf{1}$ (in fact
$S = I_D \otimes \mathbf{1}_{1 \times T}$, where $\otimes$ is the Kronecker or tensor product).
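
The variance expression that the optimization works with can be evaluated directly for any
candidate set of sampling probabilities. The Python sketch below computes it for one target mean
under Assumptions 1 and 2. It does not perform the minimax optimization itself, and the function
name and example inputs are hypothetical.

```python
def variance_a10(p, n, U, q, r):
    """Variance of a probability-weighted estimate under Assumptions 1 and 2.

    p: assumed occurrence fraction; n[d]: number of draws in sample d;
    U[s]: stratum weight U_s; q[d][s]: 1 if stratum s can be drawn in sample d;
    r[d][s]: probability of drawing stratum s in sample d.
    Every stratum must be drawable in at least one sample.
    """
    D, T = len(n), len(U)
    # c_s = 1 / sum_d n_d q_ds
    c = [1.0 / sum(n[d] * q[d][s] for d in range(D)) for s in range(T)]
    # sum over d, s of a_ds / r_ds, with a_ds = n_d * (U_s c_s)^2 * q_ds
    first = sum(n[d] * (U[s] * c[s]) ** 2 / r[d][s]
                for d in range(D) for s in range(T) if q[d][s])
    # b = sum_d n_d * (sum_s U_s c_s q_ds)^2
    b = sum(n[d] * sum(U[s] * c[s] * q[d][s] for s in range(T)) ** 2
            for d in range(D))
    return p * first - p ** 2 * b

# Hypothetical check: one sample of 300 draws from two strata whose weights are
# the strata's shares of all systems, so the target mean is an occurrence fraction.
U = [0.7, 0.3]
var_pps = variance_a10(p=0.01, n=[300], U=U, q=[[1, 1]], r=[[0.7, 0.3]])
var_equal = variance_a10(p=0.01, n=[300], U=U, q=[[1, 1]], r=[[0.5, 0.5]])
print("r proportional to U:", var_pps)
print("equal r:            ", var_equal)
```

With a single sample and probabilities proportional to the weights, the result reduces to the
familiar p(1 - p)/n, and moving away from that allocation increases the variance in this example.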


Problem (A-11) is a nonlinear program, with both a nonlinear objective function and nonlinear
constraints.  It has to be solved numerically. Fortunately, the nonlinearity in (A-11) is mild:
each minimax objective and constraint is a linear function of either $r$ or $r^{-1}$. Solution with
nonlinear programming software is therefore more or less routine. The next subsection describes
some details of the implementation.

A.3.2  Implementation

We programmed the optimization described above in Matlab (The MathWorks, Inc., 1996), a matrix
computation language. For the optimization step we used the fminimax function in Matlab's
Optimization Toolbox (The MathWorks, Inc., 1999).

The optimization requires that the user provide the following input:

    • The number of systems and number of people served  in each of the 672  sampling  strata.
     For the UCMR this information came from the 1999 Drinking Water Infrastructure Needs
     Survey, as described in Section 2.

    • A skeletal sampling plan:

        - D, the number of independent samples;
        - $n_1, \ldots, n_D$, the numbers of systems in each sample;
        - $q_{ds}$, the indicators of positive sampling probabilities. If $q_{ds} = 1$, then the optimization
          searches for the optimal positive probability r

   2.  Specify a sampling plan with D = 56 samples, one for each State and Territory. The num-
      ber of systems in each sample (= State) is determined in Step 1.  Each sample allows pos-
      itive sampling probability in each of the 12 system-type-by-source-water-type-by-system-
      size strata within its State or Territory, and zero sampling probability in all other States and
      Territories.

This plan allows 56x12 = 672 positive sampling probabilities. The resulting optimal probabilities
are tabulated in  Appendix B.

In order to help the optimization converge, we simplified the problem further by partitioning it into
two subproblems, one for CWSs and one for NTNCWSs. The minimax objectives and variance
constraints already break down in this way, since each of the target means (e.g. Size 1 GW CWSs)
puts positive weight either only on CWSs, or only on NTNCWSs. In order to partition the prob-
ability constraints, we imposed a further constraint, which is  our last simplifying assumption for
the optimization:

Assumption 3.  The sums of the CWS and NTNCWS sampling probabilities in each sample (= State)
are proportional to the respective populations served.

For example in Texas, CWSs serve 93.2% of the population, so by  design each system allocated to
Texas has a 93.2% chance of being drawn as a CWS, as Appendix B  confirms.

Using Assumption 3,  the optimization may be decomposed from a single problem in 672 un-
knowns, into separate CWS and NTNCWS optimizations each in 336 unknowns. The resulting
subproblems are solved in about 3 hours each on a 400 MHz Pentium-II computer. Note how-
ever that even if Assumption 3 were not required, solving the two subproblems is not equivalent to
solving the original problem, because we now perform two separate minimax optimizations, which
give different optimal values.  As a result the widest confidence interval for CWSs in Table 5 is
narrower than for NTNCWSs.  We consider this difference to be an advantage: it reflects a decision
that CWSs, which serve more people, should comprise a larger proportion of the systems sampled
for the UCMR.

A.4   Confidence Intervals

A normal-theory $100(1 - \alpha)\%$ confidence interval for $\mu$ may be computed in the usual way
as $\hat{\mu} \pm z_{1-\alpha/2} \sqrt{\hat{V}(\hat{\mu})}$, where $\hat{V}(\hat{\mu})$ is the
variance estimate defined in Theorem 1.

Under simple random sampling, the Wilson score confidence interval without continuity correction
(Newcombe, 1998; Wilson, 1927) for a proportion $p$ is derived by using the normal interval to find
the range of parameter values for which the parameter estimate is plausible:

\[ \hat{p} \in p \pm z \sqrt{p(1 - p)/n} \iff p \in \frac{2n\hat{p} + z^2 \pm z \sqrt{z^2 + 4n\hat{p}(1 - \hat{p})}}{2(n + z^2)} \tag{A-12} \]

so (A-12) defines the Wilson score confidence interval for $p$. Under Assumptions 1 and 2, the same
derivation can be applied to a probability-weighted estimator with stratified sampling.  Writing
(A-10) as $\mathrm{Var}(\hat{p}) = V_1 p - V_2 p^2$, we have

\[ \hat{p} \in p \pm z \sqrt{V_1 p - V_2 p^2} \iff p \in \frac{2\hat{p} + V_1 z^2 \pm z \sqrt{V_1^2 z^2 + 4(V_1 \hat{p} - V_2 \hat{p}^2)}}{2(1 + V_2 z^2)} \tag{A-13} \]

so (A-13) may be considered an approximate Wilson score confidence interval for $p$. The interval
is only approximate because it requires Assumptions 1 and 2.
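
Both interval formulas are straightforward to evaluate. The Python sketch below implements the
simple-random-sampling Wilson interval and its probability-weighted analogue as written above.
The function names, the default z = 1.96, and the example inputs are assumptions made for
illustration.

```python
import math

def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a proportion under simple random sampling."""
    centre = 2 * n * p_hat + z ** 2
    half = z * math.sqrt(z ** 2 + 4 * n * p_hat * (1 - p_hat))
    denom = 2 * (n + z ** 2)
    return (centre - half) / denom, (centre + half) / denom

def weighted_wilson_interval(p_hat, v1, v2, z=1.96):
    """Approximate Wilson interval when Var(p_hat) = v1 * p - v2 * p**2."""
    centre = 2 * p_hat + v1 * z ** 2
    half = z * math.sqrt(v1 ** 2 * z ** 2 + 4 * (v1 * p_hat - v2 * p_hat ** 2))
    denom = 2 * (1 + v2 * z ** 2)
    return (centre - half) / denom, (centre + half) / denom

# Hypothetical case: no detections among 20 sampled systems.
srs = wilson_interval(0.0, 20)
# With v1 = v2 = 1/n the weighted formula reduces to the simple one.
weighted = weighted_wilson_interval(0.0, 1 / 20, 1 / 20)
print(srs)
print(weighted)
```

Unlike the symmetric normal interval, the Wilson interval stays inside [0, 1] and has non-zero
width even when no detections are observed.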

A.5   Problems

The method described in this Appendix has the following weaknesses.

1.  The theory assumes that systems are sampled with replacement, while the sampling plan for
   Assessment Monitoring (Sections 4 and 8 above) uses sampling without replacement. When the
   sampling fraction n/N is small, the sampling probabilities under sampling without replacement
   do not change much from one sample to the next, so the difference between the two methods
   is small. In Section 4, the sampling fraction is 800/63,869, or about 1.25%, which is probably
   small enough to ignore. By comparison, in simple random sampling, a sampling fraction of this
   size reduces standard errors by about 0.6%, and is commonly ignored.

2.  The optimization forces the probability of drawing a CWS  or NTNCWS to be proportional
   to the population served by CWSs and NTNCWSs in each  State  (Assumption  3).  So about
   88% of the systems selected for Assessment Monitoring will be CWSs, since CWSs serve 88%
   of the total population served by small systems.  Although this constraint was imposed for
   computational reasons as described in Section A.3.2, it agrees with the principle of sampling in
   proportion to the population served.

3.  The optimization assumes that the mean weights and sampling probabilities are constant within
   each stratum (Assumption 1). As discussed in Section A.3, this assumption is mildly restrictive
   for exposure estimates, where weights and probabilities should increase with system size, which
   varies mostly between strata but also somewhat within strata.

4.  The optimization also assumes that the occurrence fraction p is the same in each stratum (As-
   sumption 2). As argued in Section A.3, this assumption is unrealistic but is a reasonable default
   in the absence of information about how occurrence depends on stratum properties.  Even if
   such information were available, it would probably be different for each contaminant and so
   again Assumption 2 is a reasonable default.

5.  The optimization depends on a user-specified value of p. If a new value of p is hypothesized,
   the optimization must be rerun. Moreover a single sampling plan has to be used for many con-
   taminants, which will occur with different frequencies; so the sampling plan will be suboptimal
   for most contaminants. The sensitivity of the optimal sampling plan to the assumed value of p
   has not been tested.

6.  Theorem 1 ignores sampling error in the y_i's. It assumes that once the system number is chosen,
   we can go to the system and observe y without error. This is a classical assumption in sampling,
   but it does not hold in this case. The response y_i equals 1 if the contaminant of interest occurs
   at detectable levels at any time in system i, or 0 otherwise. But of course y_i cannot be observed
   without error: a finite number of samples is drawn from each system, and instead of y_i we
   observe ŷ_i, which equals 1 if the contaminant is observed in our few samples, or 0 otherwise.
   The approximation of y_i by ŷ_i introduces bias and additional variance. In particular, ŷ_i will
   be negatively biased for y_i, since we will often miss a contaminant which is only occasionally
   present.
   In order to study the effect of ŷ_i on the present theory, one needs either data or assumptions
   about the frequency of occurrence of the contaminant of interest above the level of interest; the
   number of samples taken from each system; the number of sampling locations within systems;
   and temporal and spatial variability of contaminant occurrence across sampling locations. Such
   data will be available once the sampling program is complete, and more accurate and
   conservative confidence intervals can be computed at that time.
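
The negative bias described in item 6 can be illustrated with a minimal sketch. It rests on an assumption that is not part of the source analysis: in a system where the contaminant truly occurs, each of m samples detects it independently with the same probability q. The names q, m, and the functions below are hypothetical and serve only this illustration.

```python
def detection_probability(q: float, m: int) -> float:
    """P(observed indicator = 1 | true indicator = 1) under the assumed model.

    q: assumed per-sample detection probability (hypothetical parameter)
    m: assumed number of independent samples per system (hypothetical parameter)
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    if m < 0:
        raise ValueError("m must be non-negative")
    # The contaminant is missed only if all m samples miss it.
    return 1.0 - (1.0 - q) ** m


def expected_observed_fraction(p: float, q: float, m: int) -> float:
    """Expected fraction of systems with an observed detection.

    p is the true occurrence fraction. A system without the contaminant
    never shows a detection in this simplified model (no false positives).
    """
    return p * detection_probability(q, m)


if __name__ == "__main__":
    p, q, m = 0.01, 0.25, 4
    print(detection_probability(q, m))          # chance a contaminated system is flagged
    print(expected_observed_fraction(p, q, m))  # never larger than p
```

Under these assumptions the observed fraction can never exceed the true fraction p, and the shortfall shrinks as the number of samples per system grows. With q = 0.25 and m = 4, roughly 68 percent of contaminated systems would be flagged.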
                                Appendix B
       Expected and Total Number of Systems Selected for Assessment Monitoring
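
The table in this appendix lists, for each state or territory, a Probability and a Cumulative Probability for each of the twelve system type, size, and source categories. In the Alabama rows, the Cumulative Probability column matches the running sum of the Probability column to within rounding in the sixth decimal place. The short Python sketch below reproduces that check; the function names are hypothetical and the values are copied from the Alabama rows of the table.

```python
from itertools import accumulate

# Probability column for Alabama, in table order (CWS then NTNCWS,
# each by size category and GW/SW source).
ALABAMA_PROBABILITIES = [
    0.012354, 0.003231, 0.274297, 0.023398, 0.538540, 0.111429,
    0.001197, 0.009440, 0.006425, 0.019688, 0.000000, 0.000000,
]

# Cumulative Probability column for the same rows, as printed.
ALABAMA_CUMULATIVE = [
    0.012354, 0.015585, 0.289882, 0.313280, 0.851820, 0.963250,
    0.964446, 0.973886, 0.980312, 1.000000, 1.000000, 1.000000,
]


def cumulative_probabilities(probabilities):
    """Running sum of the category probabilities within one state."""
    return list(accumulate(probabilities))


def max_abs_difference(a, b):
    """Largest absolute gap between two equally long sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    computed = cumulative_probabilities(ALABAMA_PROBABILITIES)
    print(max_abs_difference(computed, ALABAMA_CUMULATIVE))
```

The small remaining gap comes from the printed probabilities being rounded to six decimal places.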
Appendix B. Probability of Selection, with Expected and Initial SMP
Systems Selected for Assessment Monitoring
Total      State/Territory   System  System      Source  Probability  Cumulative   Expected #  Actual # of
Number of                    Type    Size        Type                 Probability  of Systems  Systems
Systems
4          Alaska            CWS     25-500      GW      0.103141     0.103141     0           1
           Alaska            CWS     25-500      SW      0.527185     0.630325     1           2
           Alaska            CWS     501-3300    GW      0.112833     0.743158     1           1
           Alaska            CWS     501-3300    SW      0.061396     0.804554     0           0
           Alaska            CWS     3301-10000  GW      0.068704     0.873258     1           0
           Alaska            CWS     3301-10000  SW      0.126742     1.000000     1           0
           Alaska            NTNCWS  25-500      GW      0.000000     1.000000     0           0
           Alaska            NTNCWS  25-500      SW      0.000000     1.000000     0           0
           Alaska            NTNCWS  501-3300    GW      0.000000     1.000000     0           0
           Alaska            NTNCWS  501-3300    SW      0.000000     1.000000     0           0
           Alaska            NTNCWS  3301-10000  GW      0.000000     1.000000     0           0
           Alaska            NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
14         Alabama           CWS     25-500      GW      0.012354     0.012354     0           0
           Alabama           CWS     25-500      SW      0.003231     0.015585     0           1
           Alabama           CWS     501-3300    GW      0.274297     0.289882     4           2
           Alabama           CWS     501-3300    SW      0.023398     0.313280     0           1
           Alabama           CWS     3301-10000  GW      0.538540     0.851820     8           10
           Alabama           CWS     3301-10000  SW      0.111429     0.963250     2           1
           Alabama           NTNCWS  25-500      GW      0.001197     0.964446     0           0
           Alabama           NTNCWS  25-500      SW      0.009440     0.973886     0           0
           Alabama           NTNCWS  501-3300    GW      0.006425     0.980312     0           0
           Alabama           NTNCWS  501-3300    SW      0.019688     1.000000     0           0
           Alabama           NTNCWS  3301-10000  GW      0.000000     1.000000     0           0
           Alabama           NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
13         Arkansas          CWS     25-500      GW      0.048335     0.048335     1           1
           Arkansas          CWS     25-500      SW      0.037493     0.085827     0           1
           Arkansas          CWS     501-3300    GW      0.281970     0.367797     4           3
           Arkansas          CWS     501-3300    SW      0.065464     0.433261     1           0
           Arkansas          CWS     3301-10000  GW      0.380083     0.813343     5           5
           Arkansas          CWS     3301-10000  SW      0.165064     0.978408     2           3
           Arkansas          NTNCWS  25-500      GW      0.001920     0.980327     0           0
           Arkansas          NTNCWS  25-500      SW      0.012133     0.992460     0           0
           Arkansas          NTNCWS  501-3300    GW      0.002293     0.994752     0           0
           Arkansas          NTNCWS  501-3300    SW      0.005248     1.000000     0           0
           Arkansas          NTNCWS  3301-10000  GW      0.000000     1.000000     0           0
           Arkansas          NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
2          American Samoa    CWS     25-500      GW      0.278403     0.278403     1           1
Total      State/Territory   System  System      Source  Probability  Cumulative   Expected #  Actual # of
Number of                    Type    Size        Type                 Probability  of Systems  Systems
Systems
           American Samoa    CWS     25-500      SW      0.636275     0.914677     1           1
           American Samoa    CWS     501-3300    GW      0.000000     0.914677     0           0
           American Samoa    CWS     501-3300    SW      0.085323     1.000000     0           0
           American Samoa    CWS     3301-10000  GW      0.000000     1.000000     0           0
           American Samoa    CWS     3301-10000  SW      0.000000     1.000000     0           0
           American Samoa    NTNCWS  25-500      GW      0.000000     1.000000     0           0
           American Samoa    NTNCWS  25-500      SW      0.000000     1.000000     0           0
           American Samoa    NTNCWS  501-3300    GW      0.000000     1.000000     0           0
           American Samoa    NTNCWS  501-3300    SW      0.000000     1.000000     0           0
           American Samoa    NTNCWS  3301-10000  GW      0.000000     1.000000     0           0
           American Samoa    NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
12         Arizona           CWS     25-500      GW      0.108010     0.108010     1           2
           Arizona           CWS     25-500      SW      0.058390     0.166401     0           0
           Arizona           CWS     501-3300    GW      0.284651     0.451052     3           2
           Arizona           CWS     501-3300    SW      0.023234     0.474286     0           1
           Arizona           CWS     3301-10000  GW      0.366545     0.840831     4           7
           Arizona           CWS     3301-10000  SW      0.016569     0.857400     0           0
           Arizona           NTNCWS  25-500      GW      0.021645     0.879045     0           0
           Arizona           NTNCWS  25-500      SW      0.019254     0.898299     0           0
           Arizona           NTNCWS  501-3300    GW      0.077477     0.975776     1           0
           Arizona           NTNCWS  501-3300    SW      0.005127     0.980903     0           0
           Arizona           NTNCWS  3301-10000  GW      0.019097     1.000000     0           0
           Arizona           NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
48         California        CWS     25-500      GW      0.091801     0.091801     4           6
           California        CWS     25-500      SW      0.145704     0.237505     0           10
           California        CWS     501-3300    GW      0.170920     0.408425     8           7
           California        CWS     501-3300    SW      0.058824     0.467249     3           2
           California        CWS     3301-10000  GW      0.280015     0.747264     13          10
           California        CWS     3301-10000  SW      0.129795     0.877059     6           8
           California        NTNCWS  25-500      GW      0.022186     0.899245     1           1
Total      State/Territory   System  System      Source  Probability  Cumulative   Expected #  Actual # of
Number of                    Type    Size        Type                 Probability  of Systems  Systems
Systems
           California        NTNCWS  25-500      SW      0.020841     0.920085     1           0
           California        NTNCWS  501-3300    GW      0.022424     0.942510     1           2
           California        NTNCWS  501-3300    SW      0.011769     0.954278     1           1
           California        NTNCWS  3301-10000  GW      0.012763     0.967041     1           0
           California        NTNCWS  3301-10000  SW      0.032959     1.000000     2           1
10         Colorado          CWS     25-500      GW      0.089138     0.089138     0           0
           Colorado          CWS     25-500      SW      0.188764     0.277903     0           1
           Colorado          CWS     501-3300    GW      0.166154     0.444056     2           1
           Colorado          CWS     501-3300    SW      0.138157     0.582213     1           3
           Colorado          CWS     3301-10000  GW      0.129963     0.712176     1           2
           Colorado          CWS     3301-10000  SW      0.213830     0.926007     2           2
           Colorado          NTNCWS  25-500      GW      0.002338     0.928344     0           0
           Colorado          NTNCWS  25-500      SW      0.008918     0.937263     0           0
           Colorado          NTNCWS  501-3300    GW      0.001276     0.938538     0           0
           Colorado          NTNCWS  501-3300    SW      0.001898     0.940436     0           0
           Colorado          NTNCWS  3301-10000  GW      0.000000     0.940436     0           0
           Colorado          NTNCWS  3301-10000  SW      0.059564     1.000000     1           1
6          Connecticut       CWS     25-500      GW      0.195981     0.195981     0           1
           Connecticut       CWS     25-500      SW      0.006058     0.202039     0           0
           Connecticut       CWS     501-3300    GW      0.166513     0.368552     1           0
           Connecticut       CWS     501-3300    SW      0.037219     0.405771     0           1
           Connecticut       CWS     3301-10000  GW      0.101240     0.507011     1           0
           Connecticut       CWS     3301-10000  SW      0.176568     0.683578     1           2
           Connecticut       NTNCWS  25-500      GW      0.153652     0.837230     1           0
           Connecticut       NTNCWS  25-500      SW      0.000000     0.837230     0           0
           Connecticut       NTNCWS  501-3300    GW      0.162770     1.000000     1           2
           Connecticut       NTNCWS  501-3300    SW      0.000000     1.000000     0           0
           Connecticut       NTNCWS  3301-10000  GW      0.000000     1.000000     0           0
           Connecticut       NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
2          Delaware          CWS     25-500      GW      0.177268     0.177268     0           1
           Delaware          CWS     25-500      SW      0.000000     0.177268     0           0
           Delaware          CWS     501-3300    GW      0.381717     0.558986     1           0
           Delaware          CWS     501-3300    SW      0.000000     0.558986     0           0
           Delaware          CWS     3301-10000  GW      0.239315     0.798300     1           1
           Delaware          CWS     3301-10000  SW      0.000000     0.798300     0           0
           Delaware          NTNCWS  25-500      GW      0.060765     0.859066     0           0
           Delaware          NTNCWS  25-500      SW      0.000000     0.859066     0           0
Total      State/Territory   System  System      Source  Probability  Cumulative   Expected #  Actual # of
Number of                    Type    Size        Type                 Probability  of Systems  Systems
Systems
           Delaware          NTNCWS  501-3300    GW      0.078519     0.937585     0           0
           Delaware          NTNCWS  501-3300    SW      0.062415     1.000000     0           0
           Delaware          NTNCWS  3301-10000  GW      0.000000     1.000000     0           0
           Delaware          NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
32         Florida           CWS     25-500      GW      0.107618     0.107618     0           3
           Florida           CWS     25-500      SW      0.000000     0.107618     0           0
           Florida           CWS     501-3300    GW      0.350597     0.458215     11          8
           Florida           CWS     501-3300    SW      0.001696     0.459910     0           0
           Florida           CWS     3301-10000  GW      0.394205     0.854115     13          17
           Florida           CWS     3301-10000  SW      0.016435     0.870550     1           0
           Florida           NTNCWS  25-500      GW      0.058549     0.929099     2           2
           Florida           NTNCWS  25-500      SW      0.000000     0.929099     0           0
           Florida           NTNCWS  501-3300    GW      0.060849     0.989949     2           1
           Florida           NTNCWS  501-3300    SW      0.000000     0.989949     0           0
           Florida           NTNCWS  3301-10000  GW      0.010051     1.000000     0           1
           Florida           NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
22         Georgia           CWS     25-500      GW      0.123473     0.123473     0           3
           Georgia           CWS     25-500      SW      0.075460     0.198932     0           5
           Georgia           CWS     501-3300    GW      0.216882     0.415814     5           3
           Georgia           CWS     501-3300    SW      0.076029     0.491843     2           1
           Georgia           CWS     3301-10000  GW      0.205910     0.697753     5           4
           Georgia           CWS     3301-10000  SW      0.233809     0.931562     5           4
           Georgia           NTNCWS  25-500      GW      0.014761     0.946323     0           1
           Georgia           NTNCWS  25-500      SW      0.016129     0.962453     0           1
           Georgia           NTNCWS  501-3300    GW      0.017752     0.980205     0           0
           Georgia           NTNCWS  501-3300    SW      0.017194     0.997399     0           0
           Georgia           NTNCWS  3301-10000  GW      0.002601     1.000000     0           0
           Georgia           NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
1          Guam              CWS     25-500      GW      0.000000     0.000000     0           0
           Guam              CWS     25-500      SW      0.000000     0.000000     0           0
           Guam              CWS     501-3300    GW      0.000000     0.000000     0           0
           Guam              CWS     501-3300    SW      0.000000     0.000000     0           0
           Guam              CWS     3301-10000  GW      0.000000     0.000000     0           0
           Guam              CWS     3301-10000  SW      1.000000     1.000000     1           1
           Guam              NTNCWS  25-500      GW      0.000000     1.000000     0           0
           Guam              NTNCWS  25-500      SW      0.000000     1.000000     0           0
           Guam              NTNCWS  501-3300    GW      0.000000     1.000000     0           0
Total      State/Territory   System  System      Source  Probability  Cumulative   Expected #  Actual # of
Number of                    Type    Size        Type                 Probability  of Systems  Systems
Systems
           Guam              NTNCWS  501-3300    SW      0.000000     1.000000     0           0
           Guam              NTNCWS  3301-10000  GW      0.000000     1.000000     0           0
           Guam              NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
3          Hawaii            CWS     25-500      GW      0.044055     0.044055     0           0
           Hawaii            CWS     25-500      SW      0.013192     0.057247     0           0
           Hawaii            CWS     501-3300    GW      0.348328     0.405574     1           1
           Hawaii            CWS     501-3300    SW      0.018984     0.424558     0           0
           Hawaii            CWS     3301-10000  GW      0.442170     0.866728     1           2
           Hawaii            CWS     3301-10000  SW      0.086899     0.953627     1           0
           Hawaii            NTNCWS  25-500      GW      0.029826     0.983453     0           0
           Hawaii            NTNCWS  25-500      SW      0.002279     0.985732     0           0
           Hawaii            NTNCWS  501-3300    GW      0.014268     1.000000     0           0
           Hawaii            NTNCWS  501-3300    SW      0.000000     1.000000     0           0
           Hawaii            NTNCWS  3301-10000  GW      0.000000     1.000000     0           0
           Hawaii            NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
16         Iowa              CWS     25-500      GW      0.108789     0.108789     0           2
           Iowa              CWS     25-500      SW      0.009389     0.118178     0           1
           Iowa              CWS     501-3300    GW      0.421654     0.539832     7           9
           Iowa              CWS     501-3300    SW      0.022059     0.561891     1           3
           Iowa              CWS     3301-10000  GW      0.338653     0.900544     5           1
           Iowa              CWS     3301-10000  SW      0.061586     0.962131     1           0
           Iowa              NTNCWS  25-500      GW      0.016414     0.978545     0           0
           Iowa              NTNCWS  25-500      SW      0.003658     0.982203     0           0
           Iowa              NTNCWS  501-3300    GW      0.017797     1.000000     0           0
           Iowa              NTNCWS  501-3300    SW      0.000000     1.000000     0           0
           Iowa              NTNCWS  3301-10000  GW      0.000000     1.000000     0           0
           Iowa              NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
8          Idaho             CWS     25-500      GW      0.124816     0.124816     0           1
           Idaho             CWS     25-500      SW      0.199402     0.324217     0           1
           Idaho             CWS     501-3300    GW      0.181250     0.505467     1           0
           Idaho             CWS     501-3300    SW      0.050444     0.555911     1           1
           Idaho             CWS     3301-10000  GW      0.245922     0.801833     2           5
           Idaho             CWS     3301-10000  SW      0.036241     0.838074     0           0
           Idaho             NTNCWS  25-500      GW      0.044504     0.882578     1           0
           Idaho             NTNCWS  25-500      SW      0.037115     0.919693     0           0
           Idaho             NTNCWS  501-3300    GW      0.042032     0.961725     0           0
           Idaho             NTNCWS  501-3300    SW      0.008380     0.970105     0           0
Total      State/Territory   System  System      Source  Probability  Cumulative   Expected #  Actual # of
Number of                    Type    Size        Type                 Probability  of Systems  Systems
Systems
           Idaho             NTNCWS  3301-10000  GW      0.029895     1.000000     0           0
           Idaho             NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
28         Illinois          CWS     25-500      GW      0.072244     0.072244     0           0
           Illinois          CWS     25-500      SW      0.010088     0.082331     0           0
           Illinois          CWS     501-3300    GW      0.320482     0.402813     9           10
           Illinois          CWS     501-3300    SW      0.036115     0.438927     1           1
           Illinois          CWS     3301-10000  GW      0.360524     0.799451     10          15
           Illinois          CWS     3301-10000  SW      0.105342     0.904793     3           1
           Illinois          NTNCWS  25-500      GW      0.020272     0.925065     1           0
           Illinois          NTNCWS  25-500      SW      0.010566     0.935631     0           0
           Illinois          NTNCWS  501-3300    GW      0.020635     0.956266     1           1
           Illinois          NTNCWS  501-3300    SW      0.005094     0.961359     0           0
           Illinois          NTNCWS  3301-10000  GW      0.006049     0.967408     0           0
           Illinois          NTNCWS  3301-10000  SW      0.032592     1.000000     1           0
20         Indiana           CWS     25-500      GW      0.046374     0.046374     0           1
           Indiana           CWS     25-500      SW      0.001670     0.048044     0           0
           Indiana           CWS     501-3300    GW      0.292420     0.340464     6           2
           Indiana           CWS     501-3300    SW      0.015509     0.355973     0           0
           Indiana           CWS     3301-10000  GW      0.424877     0.780850     8           14
           Indiana           CWS     3301-10000  SW      0.076221     0.857071     2           1
           Indiana           NTNCWS  25-500      GW      0.063461     0.920532     2           1
           Indiana           NTNCWS  25-500      SW      0.021198     0.941730     0           0
           Indiana           NTNCWS  501-3300    GW      0.054268     0.995998     1           1
           Indiana           NTNCWS  501-3300    SW      0.004002     1.000000     0           0
           Indiana           NTNCWS  3301-10000  GW      0.000000     1.000000     0           0
           Indiana           NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
12         Kansas            CWS     25-500      GW      0.079950     0.079950     0           2
           Kansas            CWS     25-500      SW      0.060355     0.140304     0           0
           Kansas            CWS     501-3300    GW      0.370682     0.510986     4           4
           Kansas            CWS     501-3300    SW      0.113083     0.624069     2           1
           Kansas            CWS     3301-10000  GW      0.174180     0.798249     2           4
           Kansas            CWS     3301-10000  SW      0.169657     0.967906     2           1
           Kansas            NTNCWS  25-500      GW      0.007580     0.975485     0           0
           Kansas            NTNCWS  25-500      SW      0.009514     0.984999     0           0
           Kansas            NTNCWS  501-3300    GW      0.015001     1.000000     0           0
           Kansas            NTNCWS  501-3300    SW      0.000000     1.000000     0           0
           Kansas            NTNCWS  3301-10000  GW      0.000000     1.000000     0           0
Total      State/Territory   System  System      Source  Probability  Cumulative   Expected #  Actual # of
Number of                    Type    Size        Type                 Probability  of Systems  Systems
Systems
           Kansas            NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
9          Kentucky          CWS     25-500      GW      0.018156     0.018156     0           0
           Kentucky          CWS     25-500      SW      0.057626     0.075782     0           1
           Kentucky          CWS     501-3300    GW      0.088308     0.164090     1           0
           Kentucky          CWS     501-3300    SW      0.124368     0.288458     1           2
           Kentucky          CWS     3301-10000  GW      0.139996     0.428453     1           2
           Kentucky          CWS     3301-10000  SW      0.509989     0.938442     5           4
           Kentucky          NTNCWS  25-500      GW      0.008458     0.946900     0           0
           Kentucky          NTNCWS  25-500      SW      0.027535     0.974435     0           0
           Kentucky          NTNCWS  501-3300    GW      0.004291     0.978726     0           0
           Kentucky          NTNCWS  501-3300    SW      0.021274     1.000000     0           0
           Kentucky          NTNCWS  3301-10000  GW      0.000000     1.000000     0           0
           Kentucky          NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
27         Louisiana         CWS     25-500      GW      0.070882     0.070882     0           3
           Louisiana         CWS     25-500      SW      0.013489     0.084371     0           1
           Louisiana         CWS     501-3300    GW      0.360128     0.444499     10          12
           Louisiana         CWS     501-3300    SW      0.018921     0.463420     1           0
           Louisiana         CWS     3301-10000  GW      0.414728     0.878148     11          6
           Louisiana         CWS     3301-10000  SW      0.058840     0.936988     2           1
           Louisiana         NTNCWS  25-500      GW      0.005874     0.942862     0           0
           Louisiana         NTNCWS  25-500      SW      0.013111     0.955973     0           0
           Louisiana         NTNCWS  501-3300    GW      0.013590     0.969562     0           0
           Louisiana         NTNCWS  501-3300    SW      0.005822     0.975385     0           0
           Louisiana         NTNCWS  3301-10000  GW      0.000000     0.975385     0           0
           Louisiana         NTNCWS  3301-10000  SW      0.024615     1.000000     1           4
12         Massachusetts     CWS     25-500      GW      0.030041     0.030041     0           0
           Massachusetts     CWS     25-500      SW      0.006699     0.036740     0           0
           Massachusetts     CWS     501-3300    GW      0.144887     0.181627     2           2
           Massachusetts     CWS     501-3300    SW      0.024759     0.206386     0           1
           Massachusetts     CWS     3301-10000  GW      0.603425     0.809810     7           7
           Massachusetts     CWS     3301-10000  SW      0.100924     0.910734     1           1
           Massachusetts     NTNCWS  25-500      GW      0.037313     0.948048     1           0
           Massachusetts     NTNCWS  25-500      SW      0.000000     0.948048     0           0
           Massachusetts     NTNCWS  501-3300    GW      0.048693     0.996741     1           1
           Massachusetts     NTNCWS  501-3300    SW      0.000000     0.996741     0           0
           Massachusetts     NTNCWS  3301-10000  GW      0.003259     1.000000     0           0
           Massachusetts     NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
Total      State/Territory   System  System      Source  Probability  Cumulative   Expected #  Actual # of
Number of                    Type    Size        Type                 Probability  of Systems  Systems
Systems
8          Maryland          CWS     25-500      GW      0.098108     0.098108     0           1
           Maryland          CWS     25-500      SW      0.062991     0.161098     0           0
           Maryland          CWS     501-3300    GW      0.233629     0.394727     2           3
           Maryland          CWS     501-3300    SW      0.035363     0.430090     0           0
           Maryland          CWS     3301-10000  GW      0.163664     0.593754     1           0
           Maryland          CWS     3301-10000  SW      0.096850     0.690604     1           1
           Maryland          NTNCWS  25-500      GW      0.113191     0.803794     1           1
           Maryland          NTNCWS  25-500      SW      0.008687     0.812481     0           0
           Maryland          NTNCWS  501-3300    GW      0.174751     0.987232     1           0
           Maryland          NTNCWS  501-3300    SW      0.000000     0.987232     0           0
           Maryland          NTNCWS  3301-10000  GW      0.012768     1.000000     0           2
           Maryland          NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
6          Maine             CWS     25-500      GW      0.080051     0.080051     0           0
           Maine             CWS     25-500      SW      0.080540     0.160591     1           1
           Maine             CWS     501-3300    GW      0.210100     0.370691     0           1
           Maine             CWS     501-3300    SW      0.133374     0.504065     1           0
           Maine             CWS     3301-10000  GW      0.125473     0.629538     1           0
           Maine             CWS     3301-10000  SW      0.149065     0.778603     1           1
           Maine             NTNCWS  25-500      GW      0.122935     0.901537     1           3
           Maine             NTNCWS  25-500      SW      0.058997     0.960535     0           0
           Maine             NTNCWS  501-3300    GW      0.027121     0.987656     0           0
           Maine             NTNCWS  501-3300    SW      0.012344     1.000000     0           0
           Maine             NTNCWS  3301-10000  GW      0.000000     1.000000     0           0
           Maine             NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
24         Michigan          CWS     25-500      GW      0.088378     0.088378     0           2
           Michigan          CWS     25-500      SW      0.009233     0.097611     0           0
           Michigan          CWS     501-3300    GW      0.286533     0.384144     7           10
           Michigan          CWS     501-3300    SW      0.023687     0.407831     1           1
           Michigan          CWS     3301-10000  GW      0.256976     0.664808     6           5
           Michigan          CWS     3301-10000  SW      0.063746     0.728554     2           2
           Michigan          NTNCWS  25-500      GW      0.159639     0.888193     4           3
           Michigan          NTNCWS  25-500      SW      0.000000     0.888193     0           0
           Michigan          NTNCWS  501-3300    GW      0.096246     0.984439     2           1
           Michigan          NTNCWS  501-3300    SW      0.000000     0.984439     0           0
           Michigan          NTNCWS  3301-10000  GW      0.015561     1.000000     0           0
           Michigan          NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
16         Minnesota         CWS     25-500      GW      0.097146     0.097146     0           2
Total      State/Territory   System  System      Source  Probability  Cumulative   Expected #  Actual # of
Number of                    Type    Size        Type                 Probability  of Systems  Systems
Systems
           Minnesota         CWS     25-500      SW      0.002208     0.099353     0           0
           Minnesota         CWS     501-3300    GW      0.457793     0.557147     7           5
           Minnesota         CWS     501-3300    SW      0.010180     0.567327     0           0
           Minnesota         CWS     3301-10000  GW      0.324204     0.891531     5           7
           Minnesota         CWS     3301-10000  SW      0.058047     0.949578     1           0
           Minnesota         NTNCWS  25-500      GW      0.033979     0.983557     1           2
           Minnesota         NTNCWS  25-500      SW      0.001830     0.985387     0           0
           Minnesota         NTNCWS  501-3300    GW      0.005726     0.991113     0           0
           Minnesota         NTNCWS  501-3300    SW      0.008887     1.000000     0           0
           Minnesota         NTNCWS  3301-10000  GW      0.000000     1.000000     0           0
           Minnesota         NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
20         Missouri          CWS     25-500      GW      0.089997     0.089997     0           3
           Missouri          CWS     25-500      SW      0.047255     0.137252     0           0
           Missouri          CWS     501-3300    GW      0.307287     0.444539     6           7
           Missouri          CWS     501-3300    SW      0.046065     0.490604     1           0
           Missouri          CWS     3301-10000  GW      0.331344     0.821948     7           5
           Missouri          CWS     3301-10000  SW      0.109887     0.931835     2           3
           Missouri          NTNCWS  25-500      GW      0.012704     0.944539     0           2
           Missouri          NTNCWS  25-500      SW      0.021993     0.966532     0           0
           Missouri          NTNCWS  501-3300    GW      0.030450     0.996982     1           0
           Missouri          NTNCWS  501-3300    SW      0.000000     0.996982     0           0
           Missouri          NTNCWS  3301-10000  GW      0.003018     1.000000     0           0
           Missouri          NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
2          Marianna Islands  CWS     25-500      GW      0.424375     0.424375     0           1
           Marianna Islands  CWS     25-500      SW      0.000000     0.424375     0           0
           Marianna Islands  CWS     501-3300    GW      0.575625     1.000000     1           1
           Marianna Islands  CWS     501-3300    SW      0.000000     1.000000     0           0
           Marianna Islands  CWS     3301-10000  GW      0.000000     1.000000     0           0
           Marianna Islands  CWS     3301-10000  SW      0.000000     1.000000     0           0
           Marianna Islands  NTNCWS  25-500      GW      0.000000     1.000000     0           0
           Marianna Islands  NTNCWS  25-500      SW      0.000000     1.000000     0           0
Total      State/Territory   System  System      Source  Probability  Cumulative   Expected #  Actual # of
Number of                    Type    Size        Type                 Probability  of Systems  Systems
Systems
           Marianna Islands  NTNCWS  501-3300    GW      0.000000     1.000000     0           0
           Marianna Islands  NTNCWS  501-3300    SW      0.000000     1.000000     0           0
           Marianna Islands  NTNCWS  3301-10000  GW      0.000000     1.000000     0           0
           Marianna Islands  NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
30         Mississippi       CWS     25-500      GW      0.054848     0.054848     0           2
           Mississippi       CWS     25-500      SW      0.000395     0.055243     0           0
           Mississippi       CWS     501-3300    GW      0.536246     0.591489     16          20
           Mississippi       CWS     501-3300    SW      0.000000     0.591489     0           0
           Mississippi       CWS     3301-10000  GW      0.355742     0.947231     11          6
           Mississippi       CWS     3301-10000  SW      0.000000     0.947231     0           0
           Mississippi       NTNCWS  25-500      GW      0.010244     0.957475     0           0
           Mississippi       NTNCWS  25-500      SW      0.000000     0.957475     0           0
           Mississippi       NTNCWS  501-3300    GW      0.025558     0.983032     1           2
           Mississippi       NTNCWS  501-3300    SW      0.000000     0.983032     0           0
           Mississippi       NTNCWS  3301-10000  GW      0.016968     1.000000     0           0
           Mississippi       NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
6          Montana           CWS     25-500      GW      0.176760     0.176760     0           1
           Montana           CWS     25-500      SW      0.109912     0.286672     0           1
           Montana           CWS     501-3300    GW      0.225216     0.511888     1           1
           Montana           CWS     501-3300    SW      0.081293     0.593181     0           0
           Montana           CWS     3301-10000  GW      0.147776     0.740957     1           1
           Montana           CWS     3301-10000  SW      0.146475     0.887433     1           1
           Montana           NTNCWS  25-500      GW      0.059815     0.947248     1           0
           Montana           NTNCWS  25-500      SW      0.024406     0.971654     0           0
           Montana           NTNCWS  501-3300    GW      0.028346     1.000000     0           1
           Montana           NTNCWS  501-3300    SW      0.000000     1.000000     0           0
           Montana           NTNCWS  3301-10000  GW      0.000000     1.000000     0           0
           Montana           NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
22         North Carolina    CWS     25-500      GW      0.156723     0.156723     4           3
           North Carolina    CWS     25-500      SW      0.045023     0.201746     0           1
           North Carolina    CWS     501-3300    GW      0.195084     0.396830     4           2
           North Carolina    CWS     501-3300    SW      0.052233     0.449063     1           2
           North Carolina    CWS     3301-10000  GW      0.227239     0.676301     5           6
           North Carolina    CWS     3301-10000  SW      0.163587     0.839889     4           6
           North Carolina    NTNCWS  25-500      GW      0.061479     0.901368     1           0
Total      State/Territory   System  System      Source  Probability  Cumulative   Expected #  Actual # of
Number of                    Type    Size        Type                 Probability  of Systems  Systems
Systems
           North Carolina    NTNCWS  25-500      SW      0.009916     0.911284     0           1
           North Carolina    NTNCWS  501-3300    GW      0.066654     0.977938     1           0
           North Carolina    NTNCWS  501-3300    SW      0.020225     0.998163     1           1
           North Carolina    NTNCWS  3301-10000  GW      0.001837     1.000000     0           0
           North Carolina    NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
4          North Dakota      CWS     25-500      GW      0.107831     0.107831     1           0
           North Dakota      CWS     25-500      SW      0.014230     0.122061     0           1
           North Dakota      CWS     501-3300    GW      0.511024     0.633085     2           3
           North Dakota      CWS     501-3300    SW      0.049591     0.682676     0           0
           North Dakota      CWS     3301-10000  GW      0.220790     0.903466     1           0
           North Dakota      CWS     3301-10000  SW      0.079568     0.983034     0           0
           North Dakota      NTNCWS  25-500      GW      0.001980     0.985014     0           0
           North Dakota      NTNCWS  25-500      SW      0.014986     1.000000     0           0
           North Dakota      NTNCWS  501-3300    GW      0.000000     1.000000     0           0
           North Dakota      NTNCWS  501-3300    SW      0.000000     1.000000     0           0
           North Dakota      NTNCWS  3301-10000  GW      0.000000     1.000000     0           0
           North Dakota      NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
8          Nebraska          CWS     25-500      GW      0.152090     0.152090     0           1
           Nebraska          CWS     25-500      SW      0.000000     0.152090     0           0
           Nebraska          CWS     501-3300    GW      0.426111     0.578202     3           3
           Nebraska          CWS     501-3300    SW      0.014720     0.592922     0           0
           Nebraska          CWS     3301-10000  GW      0.307693     0.900615     3           3
           Nebraska          CWS     3301-10000  SW      0.044192     0.944807     1           0
           Nebraska          NTNCWS  25-500      GW      0.024216     0.969023     0           1
           Nebraska          NTNCWS  25-500      SW      0.000000     0.969023     0           0
           Nebraska          NTNCWS  501-3300    GW      0.016591     0.985614     0           0
           Nebraska          NTNCWS  501-3300    SW      0.000000     0.985614     0           0
           Nebraska          NTNCWS  3301-10000  GW      0.014386     1.000000     0           0
           Nebraska          NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
6          New Hampshire     CWS     25-500      GW      0.179653     0.179653     0           1
           New Hampshire     CWS     25-500      SW      0.059689     0.239342     0           0
           New Hampshire     CWS     501-3300    GW      0.211377     0.450719     1           1
           New Hampshire     CWS     501-3300    SW      0.082279     0.532998     0           2
           New Hampshire     CWS     3301-10000  GW      0.100470     0.633468     1           1
Total      State/Territory   System  System      Source  Probability  Cumulative   Expected #  Actual # of
Number of                    Type    Size        Type                 Probability  of Systems  Systems
Systems
           New Hampshire     CWS     3301-10000  SW      0.134734     0.768202     1           0
           New Hampshire     NTNCWS  25-500      GW      0.129265     0.897467     1           0
           New Hampshire     NTNCWS  25-500      SW      0.000000     0.897467     0           0
           New Hampshire     NTNCWS  501-3300    GW      0.102533     1.000000     1           1
           New Hampshire     NTNCWS  501-3300    SW      0.000000     1.000000     0           0
           New Hampshire     NTNCWS  3301-10000  GW      0.000000     1.000000     0           0
           New Hampshire     NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
16         New Jersey        CWS     25-500      GW      0.041427     0.041427     0           1
           New Jersey        CWS     25-500      SW      0.000000     0.041427     0           0
           New Jersey        CWS     501-3300    GW      0.180334     0.221761     3           4
           New Jersey        CWS     501-3300    SW      0.006606     0.228367     0           0
           New Jersey        CWS     3301-10000  GW      0.435798     0.664165     7           7
           New Jersey        CWS     3301-10000  SW      0.043850     0.708015     1           2
           New Jersey        NTNCWS  25-500      GW      0.105685     0.813700     2           2
           New Jersey        NTNCWS  25-500      SW      0.001010     0.814710     0           0
           New Jersey        NTNCWS  501-3300    GW      0.140774     0.955484     2           0
           New Jersey        NTNCWS  501-3300    SW      0.018795     0.974279     0           0
           New Jersey        NTNCWS  3301-10000  GW      0.025721     1.000000     0           0
           New Jersey        NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
8          New Mexico        CWS     25-500      GW      0.147242     0.147242     0           1
           New Mexico        CWS     25-500      SW      0.069756     0.216998     0           2
           New Mexico        CWS     501-3300    GW      0.268916     0.485914     2           3
           New Mexico        CWS     501-3300    SW      0.044185     0.530099     1           0
           New Mexico        CWS     3301-10000  GW      0.310844     0.840944     2           0
           New Mexico        CWS     3301-10000  SW      0.077038     0.917981     1           0
           New Mexico        NTNCWS  25-500      GW      0.020732     0.938713     0           0
           New Mexico        NTNCWS  25-500      SW      0.024464     0.963178     0           0
           New Mexico        NTNCWS  501-3300    GW      0.031415     0.994593     0           2
           New Mexico        NTNCWS  501-3300    SW      0.005407     1.000000     0           0
           New Mexico        NTNCWS  3301-10000  GW      0.000000     1.000000     0           0
           New Mexico        NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
4          Nevada            CWS     25-500      GW      0.096624     0.096624     1           0
           Nevada            CWS     25-500      SW      0.061023     0.157648     0           0
Total      State/Territory   System  System      Source  Probability  Cumulative   Expected #  Actual # of
Number of                    Type    Size        Type                 Probability  of Systems  Systems
Systems
           Nevada            CWS     501-3300    GW      0.316982     0.474630     1           3
           Nevada            CWS     501-3300    SW      0.025688     0.500317     0           0
           Nevada            CWS     3301-10000  GW      0.301061     0.801378     1           0
           Nevada            CWS     3301-10000  SW      0.064865     0.866243     1           0
           Nevada            NTNCWS  25-500      GW      0.030812     0.897055     0           0
           Nevada            NTNCWS  25-500      SW      0.028647     0.925702     0           1
           Nevada            NTNCWS  501-3300    GW      0.032687     0.958389     0           0
           Nevada            NTNCWS  501-3300    SW      0.011499     0.969888     0           0
           Nevada            NTNCWS  3301-10000  GW      0.030112     1.000000     0           0
           Nevada            NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
29         New York          CWS     25-500      GW      0.105270     0.105270     0           1
           New York          CWS     25-500      SW      0.119826     0.225096     0           1
           New York          CWS     501-3300    GW      0.212439     0.437535     6           8
           New York          CWS     501-3300    SW      0.099139     0.536674     3           1
           New York          CWS     3301-10000  GW      0.128412     0.665086     4           4
           New York          CWS     3301-10000  SW      0.182485     0.847570     5           6
           New York          NTNCWS  25-500      GW      0.025672     0.873242     1           4
           New York          NTNCWS  25-500      SW      0.012511     0.885754     1           0
           New York          NTNCWS  501-3300    GW      0.039152     0.924905     1           2
           New York          NTNCWS  501-3300    SW      0.013069     0.937975     0           0
           New York          NTNCWS  3301-10000  GW      0.007687     0.945661     0           0
           New York          NTNCWS  3301-10000  SW      0.054339     1.000000     2           2
28         Ohio              CWS     25-500      GW      0.060154     0.060154     0           1
           Ohio              CWS     25-500      SW      0.021562     0.081715     0           0
           Ohio              CWS     501-3300    GW      0.247273     0.328989     0           6
           Ohio              CWS     501-3300    SW      0.045966     0.374955     1           0
           Ohio              CWS     3301-10000  GW      0.308339     0.683294     9           12
           Ohio              CWS     3301-10000  SW      0.141144     0.824438     4           4
           Ohio              NTNCWS  25-500      GW      0.071715     0.896153     2           2
           Ohio              NTNCWS  25-500      SW      0.004899     0.901052     0           0
           Ohio              NTNCWS  501-3300    GW      0.056505     0.957557     2           1
           Ohio              NTNCWS  501-3300    SW      0.035977     0.993534     1           0
           Ohio              NTNCWS  3301-10000  GW      0.006466     1.000000     0           2
           Ohio              NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
15         Oklahoma          CWS     25-500      GW      0.053578     0.053578     0           3
           Oklahoma          CWS     25-500      SW      0.129226     0.182804     0           0
           Oklahoma          CWS     501-3300    GW      0.235046     0.417850     3           1
Total      State/Territory   System  System      Source  Probability  Cumulative   Expected #  Actual # of
Number of                    Type    Size        Type                 Probability  of Systems  Systems
Systems
           Oklahoma          CWS     501-3300    SW      0.135236     0.553087     2           2
           Oklahoma          CWS     3301-10000  GW      0.110855     0.663942     2           3
           Oklahoma          CWS     3301-10000  SW      0.310569     0.974511     5           6
           Oklahoma          NTNCWS  25-500      GW      0.004712     0.979223     0           0
           Oklahoma          NTNCWS  25-500      SW      0.016061     0.995284     0           0
           Oklahoma          NTNCWS  501-3300    GW      0.000876     0.996160     0           0
           Oklahoma          NTNCWS  501-3300    SW      0.000000     0.996160     0           0
           Oklahoma          NTNCWS  3301-10000  GW      0.003840     1.000000     0           0
           Oklahoma          NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
11         Oregon            CWS     25-500      GW      0.105427     0.105427     0           0
           Oregon            CWS     25-500      SW      0.124548     0.229975     0           1
           Oregon            CWS     501-3300    GW      0.189055     0.419030     2           3
           Oregon            CWS     501-3300    SW      0.126690     0.545720     2           1
           Oregon            CWS     3301-10000  GW      0.136664     0.682384     2           1
           Oregon            CWS     3301-10000  SW      0.194845     0.877229     2           3
           Oregon            NTNCWS  25-500      GW      0.057707     0.934936     1           2
           Oregon            NTNCWS  25-500      SW      0.012444     0.947380     0           0
           Oregon            NTNCWS  501-3300    GW      0.025242     0.972622     0           0
           Oregon            NTNCWS  501-3300    SW      0.027378     1.000000     0           0
           Oregon            NTNCWS  3301-10000  GW      0.000000     1.000000     0           0
           Oregon            NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
37         Pennsylvania      CWS     25-500      GW      0.090515     0.090515     0           4
           Pennsylvania      CWS     25-500      SW      0.064085     0.154600     1           3
           Pennsylvania      CWS     501-3300    GW      0.200824     0.355424     8           4
           Pennsylvania      CWS     501-3300    SW      0.064166     0.419591     0           3
           Pennsylvania      CWS     3301-10000  GW      0.183790     0.603380     7           4
           Pennsylvania      CWS     3301-10000  SW      0.172501     0.775881     6           8
           Pennsylvania      NTNCWS  25-500      GW      0.082633     0.858514     3           5
           Pennsylvania      NTNCWS  25-500      SW      0.023604     0.882118     1           1
           Pennsylvania      NTNCWS  501-3300    GW      0.080900     0.963018     3           4
           Pennsylvania      NTNCWS  501-3300    SW      0.029903     0.992921     1           1
           Pennsylvania      NTNCWS  3301-10000  GW      0.007079     1.000000     0           0
           Pennsylvania      NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
9          Puerto Rico       CWS     25-500      GW      0.030068     0.030068     0           0
           Puerto Rico       CWS     25-500      SW      0.367728     0.397797     0           2
           Puerto Rico       CWS     501-3300    GW      0.096501     0.494298     1           0
           Puerto Rico       CWS     501-3300    SW      0.126482     0.620780     1           1
Total      State/Territory   System  System      Source  Probability  Cumulative   Expected #  Actual # of
Number of                    Type    Size        Type                 Probability  of Systems  Systems
Systems
           Puerto Rico       CWS     3301-10000  GW      0.163618     0.784398     2           3
           Puerto Rico       CWS     3301-10000  SW      0.140299     0.924697     1           2
           Puerto Rico       NTNCWS  25-500      GW      0.002643     0.927340     0           0
           Puerto Rico       NTNCWS  25-500      SW      0.013930     0.941269     0           0
           Puerto Rico       NTNCWS  501-3300    GW      0.034820     0.976089     1           1
           Puerto Rico       NTNCWS  501-3300    SW      0.000000     0.976089     0           0
           Puerto Rico       NTNCWS  3301-10000  GW      0.023911     1.000000     0           0
           Puerto Rico       NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
2          Rhode Island      CWS     25-500      GW      0.102141     0.102141     0           0
           Rhode Island      CWS     25-500      SW      0.000000     0.102141     0           0
           Rhode Island      CWS     501-3300    GW      0.119044     0.221185     0           1
           Rhode Island      CWS     501-3300    SW      0.075209     0.296394     0           0
           Rhode Island      CWS     3301-10000  GW      0.166570     0.462964     1           1
           Rhode Island      CWS     3301-10000  SW      0.151757     0.614721     1           0
           Rhode Island      NTNCWS  25-500      GW      0.139146     0.753868     0           0
           Rhode Island      NTNCWS  25-500      SW      0.000000     0.753868     0           0
           Rhode Island      NTNCWS  501-3300    GW      0.144328     0.898196     0           0
           Rhode Island      NTNCWS  501-3300    SW      0.000000     0.898196     0           0
           Rhode Island      NTNCWS  3301-10000  GW      0.101804     1.000000     0           0
           Rhode Island      NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
11         South Carolina    CWS     25-500      GW      0.074769     0.074769     0           0
           South Carolina    CWS     25-500      SW      0.000000     0.074769     0           0
           South Carolina    CWS     501-3300    GW      0.219103     0.293873     2           3
           South Carolina    CWS     501-3300    SW      0.054199     0.348072     1           0
           South Carolina    CWS     3301-10000  GW      0.272259     0.620330     3           0
           South Carolina    CWS     3301-10000  SW      0.277753     0.898083     3           6
           South Carolina    NTNCWS  25-500      GW      0.012816     0.910900     0           0
           South Carolina    NTNCWS  25-500      SW      0.005474     0.916373     0           1
           South Carolina    NTNCWS  501-3300    GW      0.011296     0.927669     0           0
           South Carolina    NTNCWS  501-3300    SW      0.015615     0.943284     0           1
           South Carolina    NTNCWS  3301-10000  GW      0.000000     0.943284     0           0
           South Carolina    NTNCWS  3301-10000  SW      0.056716     1.000000     1           0
4          South Dakota      CWS     25-500      GW      0.123446     0.123446     1           0
           South Dakota      CWS     25-500      SW      0.114364     0.237810     1           1
           South Dakota      CWS     501-3300    GW      0.336164     0.573974     0           2
           South Dakota      CWS     501-3300    SW      0.065505     0.639479     0           0
           South Dakota      CWS     3301-10000  GW      0.271831     0.911310     1           1
Appendix B. Probability of Selection, with Expected and Initial SMP Systems Selected for Assessment Monitoring

Total Number  State/Territory System  System Size Source  Probability  Cumulative   Expected #  Actual #
of Systems                    Type                Type                 Probability  of Systems  of Systems
              South Dakota    CWS     3301-10000  SW      0.067799     0.979109     0           0
              South Dakota    NTNCWS  25-500      GW      0.003200     0.982309     0           0
              South Dakota    NTNCWS  25-500      SW      0.016996     0.999305     0           0
              South Dakota    NTNCWS  501-3300    GW      0.000695     1.000000     0           0
              South Dakota    NTNCWS  501-3300    SW      0.000000     1.000000     0           0
              South Dakota    NTNCWS  3301-10000  GW      0.000000     1.000000     0           0
              South Dakota    NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
14            Tennessee       CWS     25-500      GW      0.010921     0.010921     0           0
              Tennessee       CWS     25-500      SW      0.039598     0.050519     0           2
              Tennessee       CWS     501-3300    GW      0.132222     0.182741     2           2
              Tennessee       CWS     501-3300    SW      0.091572     0.274314     2           0
              Tennessee       CWS     3301-10000  GW      0.172997     0.447311     2           0
              Tennessee       CWS     3301-10000  SW      0.527512     0.974823     7           10
              Tennessee       NTNCWS  25-500      GW      0.001157     0.975979     0           0
              Tennessee       NTNCWS  25-500      SW      0.004805     0.980785     0           0
              Tennessee       NTNCWS  501-3300    GW      0.000531     0.981316     0           0
              Tennessee       NTNCWS  501-3300    SW      0.018684     1.000000     0           0
              Tennessee       NTNCWS  3301-10000  GW      0.000000     1.000000     0           0
              Tennessee       NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
7             Tribes          CWS     25-500      GW      0.202971     0.202971     2           2
              Tribes          CWS     25-500      SW      0.147933     0.350904     0           2
              Tribes          CWS     501-3300    GW      0.338683     0.689587     2           1
              Tribes          CWS     501-3300    SW      0.046713     0.736299     0           0
              Tribes          CWS     3301-10000  GW      0.170082     0.906381     1           0
              Tribes          CWS     3301-10000  SW      0.055832     0.962213     1           1
              Tribes          NTNCWS  25-500      GW      0.017954     0.980167     0           0
              Tribes          NTNCWS  25-500      SW      0.000000     0.980167     0           0
              Tribes          NTNCWS  501-3300    GW      0.019833     1.000000     0           1
              Tribes          NTNCWS  501-3300    SW      0.000000     1.000000     0           0
              Tribes          NTNCWS  3301-10000  GW      0.000000     1.000000     0           0
              Tribes          NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
71            Texas           CWS     25-500      GW      0.080530     0.080530     0           8
              Texas           CWS     25-500      SW      0.032271     0.112801     0           3
              Texas           CWS     501-3300    GW      0.336133     0.448935     24          19
              Texas           CWS     501-3300    SW      0.034461     0.483395     2           0
              Texas           CWS     3301-10000  GW      0.346726     0.830121     25          29
              Texas           CWS     3301-10000  SW      0.102709     0.932830     7           4
Appendix B. Probability of Selection, with Expected and Initial SMP Systems Selected for Assessment Monitoring

Total Number  State/Territory System  System Size Source  Probability  Cumulative   Expected #  Actual #
of Systems                    Type                Type                 Probability  of Systems  of Systems
              Texas           NTNCWS  25-500      GW      0.009594     0.942424     1           1
              Texas           NTNCWS  25-500      SW      0.011363     0.953787     1           1
              Texas           NTNCWS  501-3300    GW      0.012236     0.966023     1           2
              Texas           NTNCWS  501-3300    SW      0.013273     0.979297     1           1
              Texas           NTNCWS  3301-10000  GW      0.004174     0.983471     0           0
              Texas           NTNCWS  3301-10000  SW      0.016529     1.000000     1           3
7             Utah            CWS     25-500      GW      0.097464     0.097464     0           0
              Utah            CWS     25-500      SW      0.089084     0.186548     0           1
              Utah            CWS     501-3300    GW      0.292421     0.478969     2           2
              Utah            CWS     501-3300    SW      0.042900     0.521869     0           0
              Utah            CWS     3301-10000  GW      0.322360     0.844229     2           2
              Utah            CWS     3301-10000  SW      0.100653     0.944882     1           2
              Utah            NTNCWS  25-500      GW      0.011787     0.956669     0           0
              Utah            NTNCWS  25-500      SW      0.011496     0.968165     0           0
              Utah            NTNCWS  501-3300    GW      0.011412     0.979577     0           0
              Utah            NTNCWS  501-3300    SW      0.000000     0.979577     0           0
              Utah            NTNCWS  3301-10000  GW      0.020423     1.000000     0           0
              Utah            NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
16            Virginia        CWS     25-500      GW      0.110551     0.110551     0           4
              Virginia        CWS     25-500      SW      0.098994     0.209545     0           1
              Virginia        CWS     501-3300    GW      0.126866     0.336411     2           5
              Virginia        CWS     501-3300    SW      0.084950     0.421361     1           0
              Virginia        CWS     3301-10000  GW      0.056914     0.478275     1           0
              Virginia        CWS     3301-10000  SW      0.209323     0.687598     3           2
              Virginia        NTNCWS  25-500      GW      0.088632     0.776230     2           1
              Virginia        NTNCWS  25-500      SW      0.018261     0.794490     0           0
              Virginia        NTNCWS  501-3300    GW      0.139405     0.933895     2           0
              Virginia        NTNCWS  501-3300    SW      0.052778     0.986673     1           3
              Virginia        NTNCWS  3301-10000  GW      0.013327     1.000000     0           0
              Virginia        NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
2             Virgin Islands  CWS     25-500      GW      0.000000     0.000000     0           0
              Virgin Islands  CWS     25-500      SW      0.203449     0.203449     0           1
              Virgin Islands  CWS     501-3300    GW      0.000000     0.203449     0           0
              Virgin Islands  CWS     501-3300    SW      0.009623     0.213072     0           0
              Virgin Islands  CWS     3301-10000  GW      0.000000     0.213072     0           0
              Virgin Islands  CWS     3301-10000  SW      0.007027     0.220099     0           0
              Virgin Islands  NTNCWS  25-500      GW      0.000000     0.220099     0           0
Appendix B. Probability of Selection, with Expected and Initial SMP Systems Selected for Assessment Monitoring

Total Number  State/Territory System  System Size Source  Probability  Cumulative   Expected #  Actual #
of Systems                    Type                Type                 Probability  of Systems  of Systems
              Virgin Islands  NTNCWS  25-500      SW      0.298099     0.518198     1           1
              Virgin Islands  NTNCWS  501-3300    GW      0.000000     0.518198     0           0
              Virgin Islands  NTNCWS  501-3300    SW      0.189741     0.707939     0           0
              Virgin Islands  NTNCWS  3301-10000  GW      0.000000     0.707939     0           0
              Virgin Islands  NTNCWS  3301-10000  SW      0.292061     1.000000     1           0
4             Vermont         CWS     25-500      GW      0.151019     0.151019     0           1
              Vermont         CWS     25-500      SW      0.168637     0.319656     0           0
              Vermont         CWS     501-3300    GW      0.228719     0.548375     1           2
              Vermont         CWS     501-3300    SW      0.150632     0.699007     0           0
              Vermont         CWS     3301-10000  GW      0.144298     0.843305     0           0
              Vermont         CWS     3301-10000  SW      0.156398     0.999702     1           1
              Vermont         NTNCWS  25-500      GW      0.000298     1.000000     0           0
              Vermont         NTNCWS  25-500      SW      0.000000     1.000000     0           0
              Vermont         NTNCWS  501-3300    GW      0.000000     1.000000     0           0
              Vermont         NTNCWS  501-3300    SW      0.000000     1.000000     0           0
              Vermont         NTNCWS  3301-10000  GW      0.000000     1.000000     0           0
              Vermont         NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
17            Washington      CWS     25-500      GW      0.199296     0.199296     4           4
              Washington      CWS     25-500      SW      0.063516     0.262812     0           1
              Washington      CWS     501-3300    GW      0.307767     0.570579     5           4
              Washington      CWS     501-3300    SW      0.048778     0.619357     1           1
              Washington      CWS     3301-10000  GW      0.252023     0.871380     4           4
              Washington      CWS     3301-10000  SW      0.047768     0.919148     1           0
              Washington      NTNCWS  25-500      GW      0.015331     0.934479     0           0
              Washington      NTNCWS  25-500      SW      0.018754     0.953233     0           0
              Washington      NTNCWS  501-3300    GW      0.014947     0.968180     0           1
              Washington      NTNCWS  501-3300    SW      0.031820     1.000000     1           2
              Washington      NTNCWS  3301-10000  GW      0.000000     1.000000     0           0
              Washington      NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
21            Wisconsin       CWS     25-500      GW      0.080289     0.080289     0           0
              Wisconsin       CWS     25-500      SW      0.000000     0.080289     0           0
              Wisconsin       CWS     501-3300    GW      0.325307     0.405597     7           8
              Wisconsin       CWS     501-3300    SW      0.001226     0.406822     0           0
              Wisconsin       CWS     3301-10000  GW      0.413393     0.820215     9           11
              Wisconsin       CWS     3301-10000  SW      0.007789     0.828004     0           0
              Wisconsin       NTNCWS  25-500      GW      0.105799     0.933804     2           2
              Wisconsin       NTNCWS  25-500      SW      0.000000     0.933804     0           0
Appendix B. Probability of Selection, with Expected and Initial SMP Systems Selected for Assessment Monitoring

Total Number  State/Territory System  System Size Source  Probability  Cumulative   Expected #  Actual #
of Systems                    Type                Type                 Probability  of Systems  of Systems
              Wisconsin       NTNCWS  501-3300    GW      0.066196     1.000000     1           0
              Wisconsin       NTNCWS  501-3300    SW      0.000000     1.000000     0           0
              Wisconsin       NTNCWS  3301-10000  GW      0.000000     1.000000     0           0
              Wisconsin       NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
10            West Virginia   CWS     25-500      GW      0.049667     0.049667     0           0
              West Virginia   CWS     25-500      SW      0.061836     0.111503     0           0
              West Virginia   CWS     501-3300    GW      0.154994     0.266497     2           0
              West Virginia   CWS     501-3300    SW      0.232457     0.498954     2           5
              West Virginia   CWS     3301-10000  GW      0.083186     0.582140     1           0
              West Virginia   CWS     3301-10000  SW      0.328074     0.910214     3           4
              West Virginia   NTNCWS  25-500      GW      0.011075     0.921289     0           0
              West Virginia   NTNCWS  25-500      SW      0.010705     0.931994     0           0
              West Virginia   NTNCWS  501-3300    GW      0.003917     0.935911     0           0
              West Virginia   NTNCWS  501-3300    SW      0.009707     0.945618     0           1
              West Virginia   NTNCWS  3301-10000  GW      0.000000     0.945618     0           0
              West Virginia   NTNCWS  3301-10000  SW      0.054382     1.000000     1           0
3             Wyoming         CWS     25-500      GW      0.125408     0.125408     0           0
              Wyoming         CWS     25-500      SW      0.187899     0.313306     0           1
              Wyoming         CWS     501-3300    GW      0.145051     0.458358     1           1
              Wyoming         CWS     501-3300    SW      0.105977     0.564335     0           0
              Wyoming         CWS     3301-10000  GW      0.087057     0.651392     0           0
              Wyoming         CWS     3301-10000  SW      0.228394     0.879786     1           0
              Wyoming         NTNCWS  25-500      GW      0.008824     0.888609     0           0
              Wyoming         NTNCWS  25-500      SW      0.091474     0.980084     0           1
              Wyoming         NTNCWS  501-3300    GW      0.003474     0.983557     0           0
              Wyoming         NTNCWS  501-3300    SW      0.016443     1.000000     0           0
              Wyoming         NTNCWS  3301-10000  GW      0.000000     1.000000     0           0
              Wyoming         NTNCWS  3301-10000  SW      0.000000     1.000000     0           0
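
Within each State, the Cumulative Probability column in Appendix B appears to be the running sum of the Probability column. The short Python sketch below checks that relationship using the Wyoming rows listed in this appendix. The function names are hypothetical, and the lookup helper shows only one common way such a column can be used (mapping a uniform random draw to a stratum); it is an illustrative assumption, not a description of EPA's documented selection procedure.

```python
from bisect import bisect_left
from itertools import accumulate

# Wyoming rows from Appendix B, in table order:
# CWS then NTNCWS; 25-500, 501-3300, 3301-10000; GW then SW.
wyoming_prob = [0.125408, 0.187899, 0.145051, 0.105977, 0.087057, 0.228394,
                0.008824, 0.091474, 0.003474, 0.016443, 0.000000, 0.000000]
wyoming_cum = [0.125408, 0.313306, 0.458358, 0.564335, 0.651392, 0.879786,
               0.888609, 0.980084, 0.983557, 1.000000, 1.000000, 1.000000]


def running_sum(probabilities):
    """Return the cumulative sums of a list of stratum probabilities."""
    return list(accumulate(probabilities))


def stratum_for_draw(cumulative, u):
    """Hypothetical helper: index of the first row whose cumulative
    probability is at least u, for a draw u with 0 < u <= 1."""
    return bisect_left(cumulative, u)


computed = running_sum(wyoming_prob)
# Largest difference between the recomputed and the listed cumulative values;
# small differences are expected because the table is rounded to six decimals.
max_gap = max(abs(c - listed) for c, listed in zip(computed, wyoming_cum))
```

Because the lookup returns the first row that reaches the draw, strata with a probability of zero are never returned for a draw in the stated range.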
                                  Appendix C

                                     Acronyms
CCL         - Contaminant Candidate List
CFR         - Code of Federal Regulations
CWS         - community water system

EPA         - Environmental Protection Agency
EPTDS       - Entry Point to the Distribution System

GW          - ground water
GUDI        - ground water under the direct influence (of surface water)

NAWQA       - National Water Quality Assessment Program
NCOD        - National Drinking Water Contaminant Occurrence Database
NTNCWS      - non-transient non-community water system

PA          - Partnership agreement
PWS         - Public Water System

SDWA        - Safe Drinking Water Act
SDWIS       - Safe Drinking Water Information System
SDWIS FED   - the Federal Safe Drinking Water Information System
SMP         - State monitoring plan
SW          - surface water

TNCWS       - transient non-community water system

UCMR        - Unregulated Contaminant Monitoring Regulation/Rule
USEPA       - United States Environmental Protection Agency


                                    Appendix D

                                       Definitions

Assessment Monitoring means sampling, testing, and reporting of listed contaminants that have
available analytical methods and for which preliminary data indicate their possible occurrence in
drinking water. Assessment Monitoring will be conducted for the UCMR (1999) List 1
contaminants.

Index Systems means a limited number of small CWSs and NTNCWSs, selected from the
Assessment Monitoring systems in State Plans, that will be required to provide more detailed and
frequent monitoring for the UCMR (1999) List 1 contaminants (§141.40(a)(6)). The Index Systems
will be selected to geographically coincide with watersheds and areas studied under the United States
Geological Survey's National Water Quality Assessment program. In addition to the reporting
information required for Assessment Monitoring, the Index Systems must also report information
on system operating conditions (such as water source, pumping rates, and environmental setting)
(§141.40(a)(6)). These systems must monitor each year of the 5-year UCMR cycle, with EPA paying
for all reasonable monitoring costs (§141.40(a)(4)(i)(A)). This more detailed and frequent
monitoring will provide important information with which EPA can more fully evaluate the
conditions under which small systems operate.

Listed contaminant means a contaminant identified as an analyte in Table 1, §141.40(a)(3) of the
Unregulated Contaminant Monitoring Regulation (UCMR). To distinguish the current 1999 UCMR
listed contaminants from potential future UCMR listed contaminants, all references to UCMR
contaminant lists will identify the appropriate year in parentheses immediately following the acronym
UCMR and before the referenced list. For example, the contaminants included in the UCMR (1999)
List include the component lists identified as UCMR (1999) List 1, UCMR (1999) List 2 and UCMR
(1999) List 3 contaminants.

Listing cycle means the 5-year period for which each revised UCMR list is effective and during
which no more than 30 unregulated contaminants from the list may be required to be monitored.
EPA is mandated to develop and promulgate a new UCMR List every 5 years.

Monitored systems means all community water systems serving more than 10,000 people, and the
national representative sample of community and non-transient non-community water systems
serving 10,000 or fewer people that are selected to be part of a State Plan for the UCMR. (Note that
for this round of Assessment Monitoring, systems that purchase their primary source of water are
not included in the monitoring.)

Monitoring (as distinct from Assessment Monitoring) means all aspects of determining the quality
of drinking water relative to the listed contaminants. These aspects include drinking water sampling
and testing, and the reviewing, reporting, and submission to EPA of analytical results.

Most vulnerable systems (or Systems most vulnerable) means a subset of 5 to not more than 25
systems of all monitored systems in a State that are determined by that State in consultation with the
EPA Regional Office to be most likely to have the listed contaminants occur in their drinking waters,
considering the characteristics of the listed contaminants, precipitation, system operation, and
environmental conditions (soils, geology and land use).

Pre-Screen Testing means sampling, testing, and reporting of the listed contaminants that may have
newly emerged as drinking water concerns and, in most cases, for which methods are in an early
stage of development. Pre-Screen Testing will be conducted by a limited number of systems (up to
200). States will nominate up to 25 of the most vulnerable systems per State for Pre-Screen Testing.
The actual Pre-Screen Testing systems will be selected from the list of nominated systems through
the use of a random number generator. Pre-Screen Testing will be performed to determine whether
a listed contaminant occurs in sufficient frequency in the most vulnerable systems or sampling
locations to warrant its being included in future Assessment Monitoring or Screening Surveys. Pre-
Screen Testing will be conducted for the UCMR (1999) List 3 contaminants.
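
The selection step described above (up to 25 nominated systems per State, up to 200 systems chosen with a random number generator) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the definition does not say whether the draw is made from one national pool or State by State, so the sketch assumes a single pooled draw without replacement. The function name, the State labels, and the system IDs are all hypothetical.

```python
import random


def select_prescreen_systems(nominations, max_per_state=25, max_selected=200, seed=None):
    """Draw Pre-Screen Testing systems from State nominations.

    nominations maps a State name to its list of nominated system IDs.
    Pooling all nominations into one draw is an assumption of this sketch.
    """
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    pool = []
    for state, systems in nominations.items():
        if len(systems) > max_per_state:
            raise ValueError(f"{state} nominated more than {max_per_state} systems")
        pool.extend((state, system_id) for system_id in systems)
    # Sample without replacement; never request more systems than were nominated.
    return rng.sample(pool, min(max_selected, len(pool)))


# Hypothetical nominations: 10 States with 25 made-up system IDs each.
example_nominations = {
    f"State{i:02d}": [f"PWS{i:02d}-{j:02d}" for j in range(25)] for i in range(10)
}
selected = select_prescreen_systems(example_nominations, seed=1999)
```

With 250 nominated systems in the example pool, the draw returns 200 distinct (State, system ID) pairs.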

Random Sampling is a statistical sampling method by which each member of the population has an
equal probability (an equal random chance) of being selected as part of a sample (the sample being
a small subset of the population which represents the population as a whole).

Representative Sample (or National Representative Sample) means a small subset of all community
and non-transient non-community water systems serving 10,000 or fewer people which EPA selects
using a random number generator. The systems in the representative sample are selected using a
stratified random sampling process that ensures that this small subset of systems will proportionally
reflect (is "representative" of) the actual number of size- and water type-categories of all small
systems nationally. In finalizing State Plans, a State may substitute a system from the replacement
list for a system selected as part of the original representative sample, if a system on the
representative sample list in the State Plan is closed, has merged, or purchases water from another
system.

Sampling means the act of collecting water from the appropriate location in a public water system
(from the applicable point from an intake or well to the end of a distribution line, or in some limited
cases, a residential tap) following proper methods for the particular contaminant or group of
contaminants.

Sampling Point means a unique location where samples are to be collected.

Screening Survey means sampling, testing, and reporting of the List 2 contaminants. These
contaminants have analytical methods which have been recently developed, and have uncertain
potential for occurrence in drinking water. Under the final List 2 Rule (66 FR 2273), two Screening
Surveys will be conducted by a subset of approximately 180 small systems from the 800 small
systems conducting Assessment Monitoring. Screening Survey one will be conducted by small
systems during 2001 for the List 2 chemical contaminants. Screening Survey two will be conducted
by small systems during 2003 for the List 2 microbiological contaminant, Aeromonas.

State means each of the fifty States, the District of Columbia, U.S. Territories, and Tribal lands. For
the national representative sample, Guam, the Commonwealth of Puerto Rico, the Northern Mariana
Islands, the Virgin Islands, American Samoa, and the Trust Territories of the Pacific Islands are each
treated as an individual State. All Tribal water systems in the U.S. which have status as a State under
Section 1451 of the Safe Drinking Water Act for this program will be considered collectively as one
State for the purposes of selecting a representative sample of small systems.

State Monitoring Plan (or State Plan) means a State's portion of the national representative sample
of CWSs and NTNCWSs serving 10,000 or fewer people which must monitor for unregulated
contaminants (Assessment Monitoring, Screening Survey(s) and Index Systems) and all large
systems (systems serving greater than 10,000 people) which are required to monitor for Screening
Survey contaminants. A State Plan may be developed by a State's acceptance of EPA's
representative sample for that State, or by a State's selection of systems from a replacement list for
systems specified in the first list that are closed, are merged, or purchase water from another system.
A State Plan also includes the process by which the State will inform each public water system of
its selection for the plan and of its responsibilities to monitor. A State Plan will also include the
systems required to conduct Pre-Screen Testing, selected from the State's designation of vulnerable
systems. The State Plan may be part of the Partnership Agreement (PA) between the State and EPA.

Stratified Random Sampling is a procedure to draw a random sample from a population that has been
divided into subpopulations or strata, with each stratum comprised of a population subset sharing
common characteristics. Random samples are selected from each stratum proportional to that
stratum's proportion of the entire population. The aggregate random sample (compiled from all the
strata samples) provides a random sample of the entire population that reflects the proportional
distribution of characteristics of the population. In the context of the UCMR, the population served
by public water systems was stratified by size (with size categories of 500 or fewer people served,
501 to 3,300 people served, and 3,301 to 10,000 people served) and by water source type supplying
the water system (ground water or surface water). This stratification was done to ensure that systems
randomly selected as nationally representative sample systems would proportionally reflect the actual
number of size and water type categories nationally.
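
The general idea of proportional stratified random sampling can be sketched in a few lines of Python. This is a simplified illustration, not the UCMR allocation itself. It assumes each stratum's share is measured by its count of systems, whereas the definition above refers to the population served, and it rounds each stratum's allocation independently, so the total drawn can differ slightly from the requested sample size. The function name and all system IDs are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(systems, sample_size, seed=None):
    """Draw a proportional stratified random sample.

    systems is a list of (system_id, size_category, source_type) tuples.
    Returns a dict mapping each (size_category, source_type) stratum to
    the list of system IDs drawn from it.
    """
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    strata = defaultdict(list)
    for system_id, size_category, source_type in systems:
        strata[(size_category, source_type)].append(system_id)

    total = len(systems)
    sample = {}
    for stratum in sorted(strata):
        members = strata[stratum]
        # Allocate in proportion to the stratum's share of all systems.
        n = min(round(sample_size * len(members) / total), len(members))
        sample[stratum] = rng.sample(members, n)
    return sample


# Hypothetical population of 1,000 systems in three strata.
population = (
    [(f"A{i}", "25-500", "GW") for i in range(600)]
    + [(f"B{i}", "501-3300", "GW") for i in range(300)]
    + [(f"C{i}", "3301-10000", "SW") for i in range(100)]
)
drawn = stratified_sample(population, sample_size=100, seed=1999)
```

In this example the three strata hold 60, 30, and 10 percent of the systems, so a sample of 100 draws 60, 30, and 10 systems respectively.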

Testing means, for the purposes of the UCMR and distinct from Pre-Screen Testing, the submission
and/or shipment of samples following appropriate preservation practices to protect the integrity of
the sample; the chemical, radiological, physical and/or microbiological analysis of samples; and the
reporting of the sample's analytical results for evaluation. Testing is a subset of activities defined
as monitoring.

Unregulated contaminants means chemical, microbiological, radiological and other substances that
occur in drinking water or sources of drinking water that are not currently regulated under the federal
drinking water program. EPA has not issued standards for these substances in drinking water (i.e.,
maximum contaminant levels or treatment technology requirements). EPA is required by Congress
to establish a program to monitor for selected unregulated contaminants in public water systems to
determine whether they should be considered for future regulation to protect public health. The
selected contaminants are listed in §141.40(a)(3), Table 1, the UCMR List.

Vulnerable time (or vulnerable period) means the time (or, in some cases, the 3-month quarter) of
the year determined as the most likely to have the listed group of contaminants present at their
highest concentrations or densities in drinking water. The vulnerable determination, in the case of
the UCMR, is made by the EPA or by the State (under arrangement with the EPA) for a system,
subset of systems, or all systems in a State. The vulnerable determination is based on characteristics
of the contaminants, precipitation, system operations, and environmental conditions such as soil
types, geology, and land use. This determination does not indicate or imply that the listed
contaminants will be identified in the drinking water with certainty, but only that sampling
conducted during the vulnerable period presumably has the highest likelihood of identifying those
contaminants in higher concentrations relative to other sampling times of the year, if and when the
contaminants occur.