September 2, 1997
EPA-SAB-DWC-LTR-97-010
Honorable Carol M. Browner
Administrator
U.S. Environmental Protection Agency
401 M Street, S.W.
Washington, D.C. 20460
Subject: Science Advisory Board Review of the Agency Report: An Evaluation of the
Statistical Performance of a Method for Monitoring Protozoan Cysts in U.S. Source
Waters (June 26, 1996)
Dear Ms. Browner:
The Drinking Water Committee (DWC or the Committee) of the U.S. Environmental
Protection Agency (EPA) Science Advisory Board met on July 17, 1996 and again on June 7,
1997 in Washington, DC to review a report describing the statistical performance of the Agency's
protozoan oocyst monitoring methods. Agency staff recognized that the protozoan analysis
methodology that had been formally adopted for the Information Collection Rule (ICR) was crude
and had very low and highly variable recoveries of added oocysts. Agency staff considered the
statistical analysis necessary to determine whether the Agency could take advantage of the
large monitoring program agreed to under the negotiated rulemaking process.
The Agency charge asked that the Drinking Water Committee evaluate the report and
address the following concerns:
Charge 1: Evaluate the factual and conceptual soundness of the approach and methods
used, and the soundness of the results and conclusions of the report.
Charge 2: Evaluate the viability of the assumptions and conditions tested in the report.
Charge 3: Evaluate the suitability of the report as a basis for making a decision on the
use of protozoan monitoring data for a national impact assessment.
Charge 4: Evaluate whether the degree of accuracy and precision of the protozoan
method is acceptable for an impact analysis.
This letter report provides the Drinking Water Committee's comments, conclusions, and
recommendations in response to the Agency's charge.
1. Overview and Summary
The Agency's draft report is based on a series of computer simulation studies. The
statistical properties of data from the EPA performance evaluation (PE) studies, and EPA field
spiking (FS) studies, along with extensive Giardia and Cryptosporidium sampling studies by the
American Water Works Service Company, were used to simulate the outcome of the Information
Collection Rule sampling protocol. This outcome was then used to examine the significance of
improvements in method recovery and to further simulate a national regulatory impact analysis
(RIA).
The Agency report concluded that:
a) The Information Collection Rule should provide a reliable estimate of the distribution of
oocyst densities for the nation as a whole, provided that the ICR data achieve a minimum
recovery rate of 8%. Some relevant scenarios were also programmed by three different
groups using the same data and assumptions, and three different programming
languages. Since all three gave the same statistical properties for the potential ICR data,
the Agency stated that this provided further support for the simulations.
b) The error in national cost estimates, introduced by unit treatment cost assumptions
and decision tree assumptions, would be sufficient to mask the effect of errors from ICR
monitoring. This conclusion was based on an analysis by Cromwell (1996) simulating the
impact of a regulation that would specify site-specific treatment based on site-specific
protozoan measurements.
c) The Information Collection Rule's protozoan data will be adequate for a national
regulatory impact analysis.
The Committee reviewed the materials presented and concluded that the investigators
used sound approaches and methods, and that the methods conform to conventional standards
of modern statistical practice. As a consequence, the Committee feels that the statistical method
can be used with some confidence for the limited objectives within Charges one and two.
However, the Committee is concerned about extending the conclusions under Charges three and four to the
implementation of the Enhanced Surface Water Treatment Rule. Essentially, limitations in the
robustness, representativeness, and overall quality of the data restrict their utility in statistical
modeling, and the Committee has comments on the methods employed and on the conclusions
drawn from the analysis. The heart of the issue is the protozoan assay technique itself. As a
result, the Committee found it necessary to comment constructively on the method itself
as well as on the simulation effort.
The Committee's principal comments are summarized as follows:
a) The data used were collected at a limited number of sites that may not accurately
reflect the national occurrence of cysts and oocysts in surface water. Since the
simulations relied heavily on these data, if the data were not accurate, the reliability of the
simulation outcomes may be in question.
b) Before health benefits can be effectively addressed, analytical methods must be
developed to address viability/infectivity of the organisms present as well as recovery.
c) A Regulatory Impact Analysis considers the cost of compliance and the health benefits
associated with compliance. This work demonstrates how the output of the Information
Collection Rule can be used for the first part but does not directly address the second.
d) With regard to the cost of compliance, if improved oocyst monitoring methods do not
become available for use in implementation of the Enhanced Surface Water Treatment
Rule, then the classification system used to simulate the use of the Information Collection
Rule to project site specific decisions will no longer be suitable.
e) When simulated sampling resulted in a "not detected" (ND), the EPA re-populated the
database with values at one fourth of the detection limit. This might result in an
unnecessary bias in the outcome. When available data are censored, maximum
likelihood or rank order statistical methods should be used.
f) The Agency's choice of a minimum recovery value of 8% is based on interpolating within
the simulation data. In actuality, actions taken to improve recovery will improve precision
also. The lower the recovery rate, the higher the percentage of the data below the
detection limit. These simulations may not represent the most likely outcomes.
g) The Agency has not made an adequate scientific case for selecting the second
highest of 18 measurements as the basis for regulation. This approach does not take full
advantage of the Information Collection Rule data that will be available for characterizing
the distribution of cyst densities.
h) A national recovery adjustment for the protozoan assay is suitable for a Regulatory
Impact Assessment, but not for implementation of the Enhanced Surface Water
Treatment Rule at the local system level since it would result in treatment being required
in many places where it is not necessary and not being applied in others where it should
be. This problem is best addressed by improvement in the methods and elimination of
recovery adjustments altogether.
i) The probability distribution function (PDF) of the sample volumes in the six
performance evaluation data sets is used to represent the PDF of the Information
Collection Rule as a whole. It is likely that the outcome of the ICR will have much greater
variance than these sample volumes, which were drawn from a limited number of sites,
and analyzed by an elite group of laboratories. Much can be gained by revising the
assay protocol so that each sample result is based on a large and consistent effective
volume.
j) Another Committee concern is the lack of any expression of the variability of the
estimator, permitting a measurement of uncertainty. The Information Collection Rule
method should report protozoan density accompanied by a realistic measure of
uncertainty. The procedure should include formulas for calculating the standard error and
the upper 95% confidence limit.
k) The simulations show that the proposed method for analyzing the Information
Collection Rule protozoan density data possesses a positive bias. It should be possible
to revise the statistical analysis method so it is less biased. Explanatory variables (e.g.,
turbidity, temperature, bacterial densities, etc.) can be used to strengthen the analysis.
To use the ICR data to devise such bias adjustment procedures, it will be necessary to
collect data on these explanatory variables.
l) The Committee recommends that the EPA prepare a manuscript describing this work
and submit it for publication in a peer-reviewed scientific journal. Publication would
improve the accessibility of the Agency's documentation of this effort.
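Item j) asks that each protozoan density be reported with a standard error and an upper 95% confidence limit. One possible formula is sketched below under stated assumptions, not as an Agency procedure: the count of oocysts in the examined portion of the pellet is treated as Poisson, the recovery rate is treated as known (its own uncertainty is ignored), and the upper limit is the exact one-sided Poisson limit found by bisection. The function names and the two example systems are hypothetical; they are chosen to share the same point estimate so that the effect of effective volume and recovery on the upper limit, discussed under Charges 3 & 4 in Section 2, is visible.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for a Poisson random variable with mean lam."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def poisson_upper_limit(count: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper confidence limit for a Poisson mean, found by bisection."""
    alpha = 1.0 - confidence
    lo, hi = 0.0, count + 10.0 * math.sqrt(count + 1.0) + 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(count, mid) > alpha:
            lo = mid  # P(X <= count) is still above alpha, so the limit lies higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


def density_summary(count: int, effective_volume_L: float, recovery: float):
    """Recovery-adjusted density (oocysts/100 L), Poisson standard error, upper 95% limit.

    Assumption: recovery is known exactly and simply rescales the effective volume.
    """
    adjusted_volume_100L = effective_volume_L * recovery / 100.0
    estimate = count / adjusted_volume_100L
    std_error = math.sqrt(count) / adjusted_volume_100L
    upper_95 = poisson_upper_limit(count) / adjusted_volume_100L
    return estimate, std_error, upper_95


# Two hypothetical systems with the same point estimate (10 oocysts/100 L).
system_1 = density_summary(count=16, effective_volume_L=400.0, recovery=0.40)
system_2 = density_summary(count=1, effective_volume_L=50.0, recovery=0.20)
print("system 1: estimate %.1f, SE %.1f, upper 95%% limit %.1f" % system_1)
print("system 2: estimate %.1f, SE %.1f, upper 95%% limit %.1f" % system_2)
```

The plug-in standard error (square root of the count divided by the adjusted volume) is zero whenever no oocysts are counted, which is one reason an upper confidence limit is the more informative summary for non-detects.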
2. Specific Comments and Recommendations
This report assumes that the reader is familiar with the subject draft report (Fox, June 26,
1996) and the following supporting documents: 1) EPA memorandum on "Recovery rates for raw
water and ICR protozoan method performance" (Fox, April 26, 1996); 2) EPA memorandum on
"Simulation of protozoan method performance based on field spike recovery rates" (Fox, April 19,
1996); and 3) a contractor's memorandum on "Estimation of ESWTR National costs using
simulated ICR monitoring data" (Cromwell, February 12, 1996 and an amendment dated April 22,
1996). It also assumes the reader is familiar with the analytical method for protozoan oocysts
and the terminology and acronyms associated with this field.
The Committee reviewed the draft report and all supporting documents provided by the
Agency, and received oral presentations by EPA Office of Water (OW) personnel. This report is
based on those documents and discussions and is organized into three sections corresponding
to the charge (charge elements 3 and 4 have been combined).
In general, the Committee agrees with the report and the Agency's interpretation of it,
though the Committee has suggested some limits on the interpretation of the results. This body
of work should be viewed as an initial step with much follow-up needed. As a result, the
Committee has made some comments and suggestions for the future.
a) Charge 1: Evaluate the factual and conceptual soundness of the approach
and methods used, and the soundness of the results and conclusions of the
report.
The Committee concludes that the investigators used sound approaches and methods,
and the methods used, in general, conform to conventional standards of modern statistical
practice. While the approach was appropriate for determining if the Information Collection Rule
data would be useful in conducting a regulatory impact analysis, it is not appropriate for actually
conducting the analysis once the ICR data are available. The Committee does have some
modifications to suggest for the methods and the conclusions drawn from the analysis.
In principle one could derive the statistical properties of potential Information Collection
Rule data using closed form analytical methods; however, the investigators found that
mathematical analysis was too difficult or required excessive simplifying assumptions; thus,
computer simulations seemed to be the only feasible approach. It is common to use computer
simulation methodology in work of this kind. Often the simulations provide a check that a
proposed mathematical statistical methodology is valid, or compare the characteristics of
alternative experimental designs. The Committee believes that the computer simulation
experiments for the ICR protozoan monitoring program could have been done differently. Even
so, it seems likely that if the simulations were conducted under the same assumptions the
simulation outcomes would have been much the same as those presented in the report.
The Committee has not performed any independent simulations or analyses of the
simulation data. Some of the scenarios were programmed by three different groups using three
different programming languages and resulted in the same statistical properties for the potential
Information Collection Rule data. This guards against programming errors, but it does not
necessarily validate the model. The Committee does think that some improvements should be
made; these are detailed in the following paragraphs.
In EPA's simulation protocol, non-detects (NDs) were replaced with a value of one fourth
the detection limit. This decision, combined with the decision to select the second largest of 18
measurements as a 'non-parametric' way of determining the ninetieth percentile value, results in
a strong positive bias for sites with low oocyst levels, their ninetieth percentile value being
controlled not by the simulated oocyst densities themselves but by simulation of the sample
volume. When censored data are to be used in this way, maximum likelihood or rank order
statistical methods can be used to determine the parameters of the probability distribution
function (Helsel & Cohn, 1988, Water Resources Research, 24:1997-2004) and the censored
portion of the database can be repopulated to better determine the likely outcome. EPA should
also consider how this matter will be addressed in the implementation of the Enhanced Surface
Water Treatment Rule. Defending a requirement for treatment at sites where no oocysts are
detected will be difficult.
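The volume-controlled bias described above can be reproduced in a few lines. The sketch below is not EPA's simulation code; it assumes a site whose true density is zero, 18 samples, Poisson counting, effective volumes drawn uniformly between 2 and 50 L (an arbitrary assumed range), and a detection limit of one oocyst per effective volume. Because every sample is then a non-detect, every reported value is one fourth of that sample's detection limit, so the "ninetieth percentile" is fixed entirely by the second-smallest simulated volume. All names are hypothetical.

```python
import math
import random


def poisson_draw(mean: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def simulate_site(true_density_per_100L, n_samples=18, rng=None):
    """Return reported densities (oocysts/100 L) and the assumed effective volumes (L)."""
    if rng is None:
        rng = random.Random()
    reported, volumes = [], []
    for _ in range(n_samples):
        volume_L = rng.uniform(2.0, 50.0)          # assumed effective volume
        count = poisson_draw(true_density_per_100L * volume_L / 100.0, rng)
        detection_limit = 100.0 / volume_L         # one oocyst per effective volume, per 100 L
        if count == 0:
            reported.append(0.25 * detection_limit)  # non-detect replaced by DL/4
        else:
            reported.append(count * 100.0 / volume_L)
        volumes.append(volume_L)
    return reported, volumes


def second_highest(values):
    """The 'non-parametric' ninetieth-percentile estimate: second largest of the set."""
    return sorted(values)[-2]


rng = random.Random(1)
values, volumes = simulate_site(0.0, rng=rng)
estimate = second_highest(values)
print(f"estimated 90th percentile: {estimate:.2f} oocysts/100 L (true value 0)")
print(f"25 / second-smallest volume: {25.0 / sorted(volumes)[1]:.2f}")
```

Raising the assumed lower bound on volume shrinks the estimate, even though the simulated site itself never changes.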
EPA concluded that its simulation demonstrates the suitability of the Information
Collection Rule data to conduct a regulatory impact analysis, but the simulation addresses only
the cost issues of the impact analysis required and neglects health effects. A regulatory impact
analysis considers, among other things, both the costs of compliance and the health benefits
received because of compliance. This work demonstrates how the output of the Information
Collection Rule might be used for the first part but it does not directly address the second. The
Committee identified two difficulties in conducting a meaningful regulatory impact analysis:
i) The Cromwell memo shows that, assuming the decisions to treat will be based on
site-specific measurements of oocyst concentration, the cost of implementation
can be estimated with reasonable accuracy using the Information Collection Rule
data. On the other hand, current protozoan methods are not suitable for gathering
the data for those site-specific decisions. If improved oocyst monitoring methods
are not made available for use in implementation of the Enhanced Surface Water
Treatment Rule, then another approach to classification of systems will be
required and the regulatory impact analysis will have to be repeated using this
new classification system.
ii) Before health benefits can be effectively addressed, analytical methods must be
developed which address viability/infectivity of the organisms present.
EPA concluded that the Information Collection Rule data will support reliable estimates of
national occurrence if the average recovery rate is at least 8%. The Committee does not take
issue with this judgement. However, it notes that the choice of a recovery value of 8% was not
the solution to some well-defined optimization equation. Instead it was based on interpolating
within the simulation data. After looking at all the simulation results, the investigators appeared
to arrive at 8% as an informed judgment. The Committee understands that the simulation of
different recoveries was accomplished by increasing or decreasing all the recoveries sampled
from the performance evaluation study database by the same multiplier. Different multipliers
would result in different mean recoveries for a given simulation. This would have the effect of
changing the central tendency of the distribution without changing its relative spread (the
coefficient of variation). In actuality, actions taken to improve recovery are likely to improve
precision also (e.g., eliminating the labs with lower recovery or developing better methods).
Much can be gained by revising the assay
protocol so that each sample result is based on an effective volume of at least Vo, where Vo is
specified at some appropriate large value. To accomplish the specified Vo, it may be necessary
to use more than one subsample of a pellet in the lab. The cost of conducting the assay will be
increased if the protocol is changed in this way (specifying a smallest allowable effective
volume). Simulation studies could show the benefits of such a change.
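Two points in the paragraph above can be checked numerically. Multiplying every sampled recovery by the same constant shifts the mean but leaves the relative spread (coefficient of variation) unchanged, and, under an assumed Poisson counting model, a larger minimum effective volume Vo sharply reduces both the chance of a non-detect and the relative standard error. The recovery values, the density of 1 oocyst/100 L, the 8% recovery, and the candidate volumes below are hypothetical.

```python
import math
import statistics


def scale(recoveries, multiplier):
    """Raise or lower every recovery by the same multiplier, as in the simulations."""
    return [r * multiplier for r in recoveries]


def coefficient_of_variation(values):
    return statistics.stdev(values) / statistics.mean(values)


def expected_count(density_per_100L, effective_volume_L, recovery):
    return density_per_100L * effective_volume_L / 100.0 * recovery


def prob_non_detect(density_per_100L, effective_volume_L, recovery):
    """Poisson probability that no oocysts are counted."""
    return math.exp(-expected_count(density_per_100L, effective_volume_L, recovery))


def relative_standard_error(density_per_100L, effective_volume_L, recovery):
    """Approximate relative SE of a Poisson count: 1 / sqrt(expected count)."""
    return 1.0 / math.sqrt(expected_count(density_per_100L, effective_volume_L, recovery))


recoveries = [0.02, 0.05, 0.09, 0.14, 0.30]   # hypothetical sampled recoveries
for m in (1.0, 2.0):
    r = scale(recoveries, m)
    print(f"multiplier {m}: mean {statistics.mean(r):.3f}, CV {coefficient_of_variation(r):.3f}")

for volume in (10.0, 100.0, 1000.0):           # candidate values of Vo (L)
    print(f"Vo = {volume:6.0f} L: P(non-detect) {prob_non_detect(1.0, volume, 0.08):.3f}, "
          f"relative SE {relative_standard_error(1.0, volume, 0.08):.2f}")
```

With the assumed 8% recovery and 1 oocyst/100 L, even a 1000 L effective volume gives an expected count below one, which illustrates why specifying a large Vo matters.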
The Agency noted, in its report and in the cover letter transmitting the report to the
Committee, that "The basis of this distinction is an elementary, but fundamental, statistical
principle: averages of imprecise estimates, if based on many values, are more precise than are
the individual estimates." Even so, the Agency must maintain its awareness that precision and
accuracy are different attributes of sampling and measurement. The accuracy of an estimate will
improve with averaging only if there are no systematic biases present. The Committee identified
a few potential sources of bias that could affect estimates of protozoan concentrations in water.
If these or other biases are present, the Agency may be deriving an inaccurate estimate with a
great deal of precision.
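The distinction between precision and accuracy can be illustrated with a small simulation. The sketch assumes a hypothetical true density of 5 oocysts/100 L and individual estimates that are lognormally scattered around only 60% of that value, standing in for an uncorrected systematic bias such as a recovery shortfall. Averaging more estimates narrows the spread of the average, but the average settles near 3, not 5.

```python
import math
import random
import statistics

TRUE_DENSITY = 5.0   # hypothetical, oocysts/100 L
BIAS_FACTOR = 0.6    # assumed systematic bias: estimates average 60% of the true value
LOG_SD = 1.0         # assumed spread of individual estimates on the log scale


def one_estimate(rng: random.Random) -> float:
    # lognormvariate(0, s) has mean exp(s**2 / 2); dividing by it gives mean BIAS_FACTOR * TRUE_DENSITY
    return TRUE_DENSITY * BIAS_FACTOR * rng.lognormvariate(0.0, LOG_SD) / math.exp(LOG_SD ** 2 / 2)


def summarize(n_per_average: int, n_averages: int = 200, seed: int = 0):
    """Mean and standard deviation of many averages, each built from n_per_average estimates."""
    rng = random.Random(seed)
    averages = [statistics.fmean(one_estimate(rng) for _ in range(n_per_average))
                for _ in range(n_averages)]
    return statistics.fmean(averages), statistics.stdev(averages)


for n in (1, 10, 1000):
    center, spread = summarize(n)
    print(f"n = {n:4d}: average of averages {center:.2f}, spread {spread:.2f} "
          f"(true value {TRUE_DENSITY})")
```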
b) Charge 2: Evaluate the viability of the assumptions and conditions tested in
the report.
Most of the assumptions in the analysis are reasonable considering the time at which
they were made. The main issue is whether the simulated scenarios cover the spectrum of
possibilities for the national survey. The data used were collected at a limited number of sites
that may not accurately reflect the national occurrence of cysts and oocysts in surface water.
Since the simulations relied heavily on these data, if the data were not accurate, the reliability of
the simulation outcomes may be in question. The Agency must be alert to this possibility when
the Information Collection Rule data become available. Parallel to this will be a need to
recognize that data obtained from the Information Collection Rule may be less useful than
implied by the current analysis.
The following are additional specific comments:
i) The probability distribution function of the sample volumes in the six performance
evaluation data sets is used to represent the probability distribution function of the
Information Collection Rule as a whole. EPA had no other data on the probability
distribution function of sample volumes, but it is likely that the outcome of the
Information Collection Rule will have much greater variance than these sample
volumes, which were drawn from a limited number of sites and analyzed by an
elite group of laboratories. This decision probably gives the simulation data
greater precision than the real Information Collection Rule data will actually
have. Moreover, it seems that real-world sample volumes are
generally determined by water quality and operator judgment, with the poorer
water qualities generally having smaller sample volumes.
ii) The simulations show that the proposed method for analyzing the Information
Collection Rule protozoan density data possesses a positive bias. The
investigators state that it should be possible to revise the statistical analysis
method so it is less biased. It is conceivable that potential explanatory variables
(e.g., turbidity, temperature, bacterial densities, etc.) can be used to strengthen
the analysis. To use the ICR data to devise such bias adjustment procedures, it
will be necessary to collect data on these explanatory variables. However, the
appropriate explanatory variables may not have been adequately identified and
characterized. These variables can affect different parts of the procedure, such
as the volume of water that can be sampled, the volume of the final suspended
pellet, and the number of oocysts not identified.
iii) The best method for correcting for a small recovery rate is an important open
problem. Because the Information Collection Rule will produce more recovery
data, it may provide insights into better adjustment procedures. If the assay is to
be used as the basis for regulating a drinking water system the adjustment must
be applicable to that specific system. Adjustment by the national mean recovery
rate may be acceptable for purposes of discovering the national distribution of
oocyst densities, but it is not very satisfactory for regulatory purposes.
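Item iii) can be illustrated with expected values alone. In the sketch below, the national mean recovery of 10%, the treatment trigger of 5 oocysts/100 L, and the three sites are all hypothetical; each site's measured density is its true density multiplied by its own recovery, and counting error is deliberately left out so that any misclassification comes from the recovery adjustment alone. Dividing by the national mean recovery misclassifies both the low-recovery site that needs treatment and the high-recovery site that does not, while a site-specific adjustment classifies all three correctly.

```python
from dataclasses import dataclass

NATIONAL_MEAN_RECOVERY = 0.10   # assumed national average recovery
TREATMENT_TRIGGER = 5.0         # hypothetical action level, oocysts/100 L


@dataclass
class Site:
    name: str
    true_density: float    # oocysts/100 L
    site_recovery: float   # fraction of oocysts recovered at this site


def classify(site: Site) -> dict:
    measured = site.true_density * site.site_recovery   # expected unadjusted result
    national = measured / NATIONAL_MEAN_RECOVERY
    site_specific = measured / site.site_recovery
    return {
        "site": site.name,
        "needs_treatment": site.true_density > TREATMENT_TRIGGER,
        "national_adjustment": national > TREATMENT_TRIGGER,
        "site_adjustment": site_specific > TREATMENT_TRIGGER,
    }


sites = [
    Site("A (low recovery)", true_density=10.0, site_recovery=0.03),
    Site("B (high recovery)", true_density=10.0, site_recovery=0.30),
    Site("C (high recovery)", true_density=2.0, site_recovery=0.30),
]
results = [classify(s) for s in sites]
for r in results:
    print(r)
```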
c) Charges 3 & 4: Evaluate the suitability of the report as a basis for making a
decision on the use of protozoan monitoring data for a national impact
assessment and evaluate whether the degree of accuracy and precision of
the protozoan method is acceptable for an impact analysis.
These reports are of good quality, but they should be treated only as a beginning. The
Agency's thinking will require further development for effective decision making. The following
are additional specific comments:
i) These documents have chosen to base the performance standard on the second
highest value measured in eighteen samples, as an estimate of the ninetieth
percentile. This approach does not take full advantage of the information that is
available to determine the distribution of oocyst densities at the site and because
it uses such a small data sample, it is highly variable in its outcome. A better
method would be to use maximum likelihood or rank order statistics to determine
the parameters of the distribution and estimate performance standards from these
(Helsel and Cohn, 1988; Newman and Dixon, 1990).
ii) The documents implied that the performance standard would be based on the
estimated parameter. A better choice would be to base the performance standard
on a specific confidence limit for that parameter (e.g., the upper 95% or 99%
confidence limit). However, the selection of the specific limit is a value judgement
that must ultimately be made by the Agency. The confidence limit approach
guarantees a specified small chance that a system truly afflicted with too many
protozoa will incorrectly pass the compliance criterion. It also provides incentive
for good sampling and assay techniques. Imagine two drinking water systems
that collected monitoring data and calculated the same estimated value for the
oocyst density parameter. Suppose that the first system achieved large effective
volumes and large recovery rates, but the second system did not. Then the upper
confidence limit for the first system will be less than that for the second. If the
performance standard were stated for the upper confidence limit, the first system
could be in compliance, but the second system would not (that is, the system
providing unreliable results will more often be required to perform enhanced
treatment). In summary, the RIA should be conducted for a compliance criterion
based on the upper 95% confidence limit. To do that, a confidence interval
procedure would have to be developed.
iii) Because the method used in the simulated regulatory impact assessment
involved a national recovery adjustment, this method would not be appropriate for
implementation at the local system level as it would result in treatment being
required in many places where it is not necessary and not being applied in others
where it should be. This problem is best addressed by improvement of methods
and elimination of recovery adjustments altogether.
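The rank order alternative recommended in item i) can be sketched as a simple regression on order statistics for a lognormal distribution, in the spirit of Helsel and Cohn (1988). This simplified version assumes a single detection limit lying below every detected value (real ICR data would have sample-specific limits, which call for the multiply censored methods in that reference), uses Weibull plotting positions rank/(n + 1), and estimates the ninetieth percentile from the fitted parameters rather than from the second highest value. The function names and the 18 values are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def ros_lognormal(detects, n_censored):
    """Fit log(x) = mu + sigma * z to the detected values by least squares.

    Non-detects take the lowest ranks; detect ranks run from n_censored + 1 to n,
    and z is the standard normal quantile of the plotting position rank / (n + 1).
    """
    n = len(detects) + n_censored
    ys = [math.log(x) for x in sorted(detects)]
    zs = [STD_NORMAL.inv_cdf((n_censored + i + 1) / (n + 1)) for i in range(len(ys))]
    z_bar, y_bar = sum(zs) / len(zs), sum(ys) / len(ys)
    sigma = (sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys))
             / sum((z - z_bar) ** 2 for z in zs))
    mu = y_bar - sigma * z_bar
    return mu, sigma


def percentile_from_fit(mu, sigma, p=0.90):
    """Percentile of the fitted lognormal distribution."""
    return math.exp(mu + sigma * STD_NORMAL.inv_cdf(p))


# Hypothetical site: 18 samples, 8 non-detects, 10 detects (oocysts/100 L).
detects = [0.5, 0.7, 1.0, 1.2, 1.8, 2.5, 3.0, 4.1, 6.0, 9.5]
mu, sigma = ros_lognormal(detects, n_censored=8)
print(f"fitted mu {mu:.2f}, sigma {sigma:.2f}")
print(f"ROS 90th percentile: {percentile_from_fit(mu, sigma):.2f}")
print(f"second highest of 18: {sorted(detects)[-2]:.2f}")
```

Unlike the second highest value, the fitted percentile draws on all 18 observations, including the fact that eight fell below the detection limit.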
The Committee notes that Agency efforts, such as that described in the report reviewed
herein, do not often get widespread distribution. Their accessibility is thus restricted, resulting in
limited scientific deliberation on, and feedback to the Agency about, issues that are important to
decision making. For this reason, the Committee recommends that the Agency prepare a
manuscript describing this work and submit it for publication in a peer-reviewed scientific journal
that uses statisticians as reviewers (e.g., Environmental Science and Technology, Water
Research, Environmetrics). The theoretical work described above would enhance the
manuscript.
We appreciate having been given the opportunity to address this issue, and look forward
to receiving a response to our comments from the Office of Water.
Sincerely,
/signed/
Dr. Genevieve Matanoski, Chair
Science Advisory Board
/signed/
Dr. Richard J. Bull, Chair
Drinking Water Committee
LIST OF ABBREVIATIONS
DWC Drinking Water Committee
EPA U.S. Environmental Protection Agency
ESWTR Enhanced Surface Water Treatment Rule
FS Field Spiking Studies
ICR Information Collection Rule
ND Non-detects or non-detected
OW Office of Water
PDF Probability Distribution Function
PE Performance Evaluation Studies
RIA Regulatory Impact Analysis
SAB EPA Science Advisory Board
REFERENCES CITED
Cromwell, J. 1996. "Estimation of ESWTR national costs using simulated ICR monitoring
data". Contractor's report. 24 pp.
Cromwell, J. 1996. "Revised analysis ESWTR national costs using revised simulations".
Contractor's report. 3 pp.
Fox, J. 1996. An Evaluation of the Statistical Performance of a Method for Monitoring
Protozoan Cysts in U.S. Source Waters. US Environmental Protection Agency/Office of
Water/Office of Science and Technology. Attachment to a memorandum from John Fox
to Dr. Donald G. Barnes, Science Advisory Board; June 26, 1996.
Fox, J. 1996. "Recovery rates for raw water and ICR protozoan method performance". US
Environmental Protection Agency/Office of Water/Office of Science and Technology.
Memorandum dated April 26, 1996. 8 pp.
Fox, J. 1996. "Simulation of protozoan method performance based on field spike recovery
rates". US Environmental Protection Agency/Office of Water/Office of Science and
Technology. Memorandum dated April 19, 1996. 15 pp.
Hamilton, M. A. 1996. Statistical Review of the Draft Report: An Evaluation
of the Statistical Performance of a Method for Monitoring Protozoan Cysts in U.S. Source
Waters (and supporting documents). Report dated July 26, 1996.
Helsel, D. and T. Cohn. 1988. Estimation of descriptive statistics for multiply censored water
quality data. Water Resources Research 24:1997-2004.
LeChevallier, M. 1995. Summary of an American Water Service Company study in the
eastern and central United States (not a title). Jour. of the Amer. Water Works Assoc.
September 1995.
Newman, M. and D. Dixon. 1990. Uncensor: A program to estimate the means and standard
deviations for data sets with below detection observations. April 1990. American
Environmental Laboratory. P. 26.
U.S. ENVIRONMENTAL PROTECTION AGENCY
SCIENCE ADVISORY BOARD
DRINKING WATER COMMITTEE
June 9-10, 1997
Science Advisory Board Review of the Agency Report: An Evaluation of the Statistical
Performance of a Method for Monitoring Protozoan Cysts in U.S. Source Waters - PANEL
CHAIR
DR. RICHARD BULL, Battelle Pacific Northwest Laboratories, Molecular Biosciences, P.O. Box
999-P7-56, Richland, WA 99352
PAST CHAIR
DR. VERNE A. RAY, Medical Research Laboratory, Pfizer Inc., Groton, Connecticut 06340
MEMBERS
DR. JUDY A. BEAN, Professor, Department of Epidemiology & Public Health, University of Miami,
School of Medicine, 1801 N.W. 9th Avenue, Miami, Florida 33136
DR. LENORE S. CLESCERI, Rensselaer Polytechnic Institute, Materials Research Center 236,
Troy, New York 12181
DR. ANNA FAN-CHEUK, California Environmental Protection Agency, 2151 Berkeley Way - Annex
11, Berkeley, California 94704
DR. LEE D. (L.D.) MCMULLEN, General Manager, Des Moines Waterworks, 2201 Valley Drive,
Des Moines, IA 50321
DR. CHARLES O'MELIA, Professor and Chairman, Department of Geography and Environmental
Engineering, The Johns Hopkins University, Baltimore, MD 21218
DR. EDO D. PELLIZZARI, Research Triangle Institute, PO Box 12194, 3040 Cornwallis Rd,
Research Triangle Park, NC 27709
DR. RHODES TRUSSELL, Montgomery Watson, 300 North Lake Avenue, Suite 1200,
Pasadena, CA 91101
DR. MARYLYNN V. YATES, Department of Soil and Environmental Sciences, Room 2208
Geology, University of California, Riverside, CA 92521
DESIGNATED FEDERAL OFFICER
MR. THOMAS MILLER, Designated Federal Official, US EPA/Science Advisory Board, 401 M
Street, S.W. (1400), Washington, D.C. 20460 (202) 260-5886 FAX: (202) 260-7118
STAFF SECRETARY
MS. MARY WINSTON, Staff Secretary, Science Advisory Board, 401 M Street, S.W. (1400),
Washington, DC 20460 (202) 260-8414 FAX: (202) 260-7118
NOTICE
This report has been written as a part of the activities of the Science Advisory Board, a public
advisory group providing extramural scientific information and advice to the Administrator and
other officials of the Environmental Protection Agency. The Board is structured to provide
balanced, expert assessment of scientific matters relating to problems facing the Agency. This
report has not been reviewed for approval by the Agency and, therefore, the contents of this
report do not necessarily represent the views and policies of the Environmental Protection
Agency, nor of other agencies in the Executive Branch of the Federal Government, nor does
mention of trade names or commercial products constitute a recommendation for use.
DISTRIBUTION LIST
Administrator
Deputy Administrator
Assistant Administrators
Deputy Assistant Administrators for Research and Development
Deputy Assistant Administrator for Water
EPA Regional Administrators
EPA Laboratory and Center Directors
EPA Headquarters Library
EPA Regional Libraries
EPA Laboratory Libraries
Director, Office of Science and Technology, Office of Water
Director, Office of Ground Water and Drinking Water, Office of Water
Drinking Water Committee Members