September 2, 1997
EPA-SAB-DWC-LTR-97-010

Honorable Carol M. Browner
Administrator
U.S. Environmental Protection Agency
401 M Street, S.W.
Washington, D.C. 20460

       Subject: Science Advisory Board Review of the Agency Report: An Evaluation of the
       Statistical Performance of a Method for Monitoring Protozoan Cysts in U.S. Source
       Waters (June 26, 1996)

Dear Ms. Browner:

       The Drinking Water Committee (DWC or the Committee) of the U.S. Environmental
Protection Agency (EPA) Science Advisory Board met on July 17, 1996 and again on June 7,
1997 in Washington, DC to review a report describing the statistical performance of the Agency's
protozoan oocyst monitoring methods. Agency staff recognized that the protozoan analysis
methodology that had been formally adopted for the Information Collection Rule (ICR) was crude
and had very low and highly variable recoveries of added oocysts. The statistical methodology
was considered by Agency staff to be necessary to determine whether the Agency can take
advantage of the  large monitoring program agreed to under the negotiated rulemaking process.

       The Agency charge asked that the Drinking Water Committee evaluate the report and
address the following concerns:

       Charge 1: Evaluate the factual and conceptual soundness of the  approach and methods
       used, and the soundness of the results and conclusions of the report.

       Charge 2: Evaluate the viability of the assumptions and conditions tested in the report.

       Charge 3: Evaluate the suitability of the report as a basis for making a decision on the
       use of protozoan monitoring data for a national impact assessment.

       Charge 4: Evaluate whether the degree of accuracy and precision of the protozoan
       method is acceptable for an impact analysis.

       This letter report provides the Drinking Water Committee's comments, conclusions, and
recommendations in response to the Agency's charge.

1.  Overview and  Summary

       The Agency's draft report is based on a series of computer simulation studies. The
statistical properties of data from the EPA performance evaluation (PE) studies, and EPA field
spiking (FS) studies, along with extensive Giardia and Cryptosporidium sampling studies by the
American Water Works Service Company, were used to simulate the outcome of the Information
Collection Rule sampling protocol.  This outcome was then used to examine the significance of
improvements in method recovery and to further simulate a national regulatory impact analysis
(RIA).

       The Agency report concluded that:

       a) The Information Collection Rule should provide a reliable estimate of the distribution of
       oocyst densities for the nation as a whole, provided that the ICR data achieve a minimum
       recovery rate of 8%.  Some relevant scenarios were also programmed by three different
       groups using the same data and assumptions, and three different programming
       languages. Since all three gave the same statistical properties for the potential ICR data,
       the Agency stated that this provided further support for the simulations.

       b) The error in national cost estimates, introduced by unit treatment cost assumptions
       and decision tree assumptions, would be sufficient to mask the effect of errors from ICR
       monitoring.  This was based on an analysis by Cromwell (cited below), simulating the
       impact of a regulation that would specify site-specific treatment based on site-specific
       protozoan measurements.

       c) The Information Collection Rule's protozoan data will be adequate for a national
       regulatory impact analysis.

       The Committee reviewed the materials presented and concluded that the investigators
used sound approaches and methods, and that the methods conform to conventional standards
of modern statistical practice. As a consequence, the Committee feels that the statistical method
can be used with some confidence for the limited objectives within Charges 1 and 2.
However, the Committee is concerned about extending the conclusions under Charges 3 and 4 to the
implementation of the Enhanced Surface Water Treatment Rule.  Essentially, limitations in
the robustness, representativeness, and overall quality of the data restrict their utility in statistical
modeling, and the Committee has comments on the methods employed and on the conclusions
drawn from the analysis. The heart of the issue is the protozoan assay technique itself. As a
result, the Committee found it necessary to comment constructively on the method itself
as well as on the simulation effort.

       The Committee's principal comments are summarized below:

a) The data used were collected at a limited number of sites that may not accurately
reflect the national occurrence of cysts and oocysts in surface water. Since the
simulations relied heavily on these data, if the data were not accurate, the reliability of the
simulation outcomes may be in question.

b) Before health benefits can be effectively addressed, analytical methods must be
developed to address viability/infectivity of the organisms present as well as recovery.

c) A Regulatory Impact Analysis considers the cost of compliance and the health benefits
associated with compliance. This work demonstrates how the output of the Information
Collection Rule can be used for the first part but does not directly address the second.

d) With regard to the cost of compliance, if improved oocyst monitoring methods do not
become available for use in implementation of the Enhanced Surface Water Treatment
Rule, then the  classification system used to simulate the use of the Information Collection
Rule to project site specific decisions will no longer be suitable.

e) When simulated sampling resulted in a "not detected" (ND),  the EPA re-populated the
database with values at one fourth of the detection limit. This might result in an
unnecessary bias in the outcome. When available data are censored,  maximum
likelihood or rank order statistical methods should be used.

f) The Agency choice of a minimum recovery value of 8% is based on  interpolating within
the simulation data.  In actuality, actions taken to improve recovery will improve precision
also. The lower the recovery rates are, the higher is the percentage of the data below the
detection limit. These simulations may not represent the most likely outcomes.

g) The Agency has not made an adequate scientific case for selecting the second
highest of 18 measurements as the basis for regulation. This approach does not take full
advantage of the Information Collection Rule data that will be available for characterizing
the distribution of cyst densities.

h) A national recovery adjustment for the protozoan assay is suitable for a Regulatory
Impact Assessment, but not for implementation of the Enhanced Surface Water
Treatment Rule at the local system level since it would result in treatment being required
in many places where it is not necessary and not being applied in others where it should
be.  This problem is best addressed by improvement in the methods and elimination of
recovery adjustments altogether.

i) The probability distribution function (PDF) of the sample volumes in the six
performance evaluation data sets is used to represent the PDF of the Information
Collection Rule as a whole.  It is likely that the outcome of the ICR will have much greater
variance than these sample volumes, which were drawn from a limited number of sites
and analyzed by an elite group of laboratories.  Much can be gained by revising the
assay protocol so that each sample result is based on a large and consistent effective
volume.

       j) Another Committee concern is the lack of any expression of the variability of the
       estimator that would permit uncertainty to be measured. The Information Collection Rule
       method should report protozoan density accompanied by a realistic measure of
       uncertainty.  The procedure should include formulas for calculating the standard error and
       the upper 95% confidence limit.

       k) The simulations show that the proposed method for analyzing the Information
       Collection Rule protozoan density data possesses a positive bias.  It should be possible
       to revise the statistical analysis method so it is less biased.  Explanatory variables (e.g.,
       turbidity, temperature, bacterial densities, etc.) can be used to strengthen the analysis.
       To use the ICR data to devise such  bias adjustment procedures, it will be necessary to
       collect data on these explanatory variables.

       l) The Committee recommends that the EPA prepare a manuscript describing this work
       and submit it for publication in a peer-reviewed scientific journal.  Publication would
       improve the accessibility of the Agency's documentation of this effort.

2.  Specific Comments and Recommendations

       This report assumes that the reader is familiar with the subject draft report (Fox, June 26,
1996) and the following supporting documents:  1) EPA  memorandum on "Recovery rates for raw
water and ICR protozoan method performance" (Fox, April 26, 1996); 2) EPA memorandum on
"Simulation  of protozoan method performance based on field spike  recovery rates" (Fox, April 19,
1996); and 3) a contractor's memorandum on "Estimation of ESWTR National costs using
simulated ICR monitoring data" (Cromwell, February 12, 1996 and an amendment dated April 22,
1996). It also assumes the reader is familiar with the analytical method for protozoan oocysts
and the terminology and acronyms associated with this field.

       The Committee reviewed the draft report and all  supporting  documents provided by  the
Agency, and received oral presentations by EPA Office of Water (OW) personnel. This report is
based on those documents and discussions and is organized into three sections corresponding
to the charge (charge elements 3 and 4 have been combined).

       In general, the Committee  agrees with the report and the Agency's interpretation of it,
though the Committee has suggested some limits on the interpretation of the results.  This body
of work should be viewed as an initial step with much follow-up needed. As a result, the
Committee has made some comments and suggestions for the future.

       a)     Charge 1: Evaluate the factual and conceptual soundness of the approach
             and methods used, and the soundness of the results and conclusions of the
             report.

       The Committee concludes that the investigators used sound approaches and methods,
and the methods used, in general, conform to conventional standards of modern statistical
practice.  While the approach was appropriate for determining if the Information Collection Rule
data would be useful in conducting a regulatory impact analysis, it is not appropriate for actually
conducting the analysis once the ICR data are available. The Committee does have some
modifications to suggest for the methods and the conclusions drawn from the analysis.

       In principle one could derive the statistical properties of potential Information Collection
Rule data using closed form analytical methods; however, the investigators found that
mathematical analysis was too difficult or required excessive simplifying assumptions; thus,
computer simulations seemed to be the only feasible approach. It is common to use computer
simulation methodology in work of this kind. Often the simulations provide a check that a
proposed mathematical statistical methodology is valid, or compare the characteristics of
alternative experimental designs. The Committee believes that the computer simulation
experiments for the ICR protozoan monitoring program could have been done differently. Even
so, it seems likely that if the simulations were conducted under the same assumptions the
simulation outcomes would have been much the same as those presented in the report.

       The Committee has not performed any independent simulations or analyses of the
simulation data. Some of the scenarios were programmed by three different groups using three
different programming languages and resulted in the same statistical properties for the potential
Information Collection Rule data.  This guards against errors in programming, but it does not
necessarily validate the model.  The Committee does think that some improvements should be
made.  These are detailed in the following section.

       In EPA's simulation protocol, non-detects (NDs) were replaced  with  a value of one fourth
the detection limit. This decision, combined with the decision to select the second largest of 18
measurements as a 'non-parametric' way of determining the ninetieth percentile value, results in
a strong positive bias for sites with low oocyst levels, their ninetieth  percentile value being
controlled not by the simulated oocyst densities themselves but by simulation of the sample
volume. When censored data are to be used in this way, maximum likelihood or rank order
statistical methods can be used to determine the parameters of the  probability distribution
function (Helsel & Cohn, 1988, Water Resources Research, 24:1997-2004)  and the censored
portion of the database can be repopulated to better determine the likely outcome.  EPA should
also consider how this matter will be addressed in the implementation of the Enhanced Surface
Water Treatment Rule. Defending a requirement for treatment at sites where  no oocysts are
detected will be difficult.
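
       The bias described above can be illustrated with a small simulation. The sketch below
is not the Agency's simulation. It assumes a single site with a constant true density,
Poisson-distributed counts, effective volumes drawn uniformly between 1 and 10 liters, and a
detection limit of one organism in the volume examined. These values and the function names
are hypothetical choices made only for illustration.

```python
import math
import random


def poisson(mean, rng):
    """Draw a Poisson count by Knuth's method (adequate for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def second_highest_of_18(true_density, rng, n_samples=18):
    """Simulate 18 samples at one site and return the second-highest reported value."""
    reported = []
    for _ in range(n_samples):
        volume = rng.uniform(1.0, 10.0)        # liters examined (assumed range)
        count = poisson(true_density * volume, rng)
        detection_limit = 1.0 / volume          # one organism in the volume examined
        if count == 0:
            # Non-detect: substitute one fourth of the detection limit.
            reported.append(0.25 * detection_limit)
        else:
            reported.append(count / volume)
    return sorted(reported)[-2]


def mean_ratio(true_density, trials=2000, seed=1):
    """Average second-highest value divided by the true density."""
    rng = random.Random(seed)
    total = sum(second_highest_of_18(true_density, rng) for _ in range(trials))
    return (total / trials) / true_density


for density in (0.001, 0.01, 0.1, 1.0):        # organisms per liter (hypothetical)
    print(density, round(mean_ratio(density), 1))
```

Under these assumptions the ratio is far larger at the low-density site, where nearly every
sample is a non-detect and the second-highest value is set by the smallest volumes examined
rather than by the density itself.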

       EPA concluded that its simulation demonstrates the suitability of the Information
Collection Rule data to conduct a regulatory impact analysis, but the simulation addresses only
the cost issues of the impact analysis required and neglects health effects.  A regulatory impact
analysis considers, among other things, both the costs of compliance and the health benefits
received because of compliance.  This work demonstrates how the output of the Information
Collection Rule  might be used for the first part but it does not directly address the second. The
Committee identified two difficulties in conducting a meaningful regulatory impact analysis:

       i)     The Cromwell memo shows that, assuming the decisions to treat will be based on
              site-specific measurements of oocyst concentration, the cost of implementation
              can be estimated with reasonable accuracy using the Information Collection Rule
              data. On the other hand, current protozoan methods are not suitable for gathering
              the data for those site-specific decisions.  If improved oocyst monitoring methods
              are not made available for use in implementation of the Enhanced Surface Water
              Treatment  Rule, then another approach to classification of systems will be
              required and the regulatory impact analysis will have to be repeated using this
              new classification system.

       ii)      Before health benefits can be effectively addressed, analytical methods must  be
              developed  which address viability/infectivity of the organisms present.

       EPA concluded that the Information Collection Rule data will support reliable estimates of
national occurrence if the  average recovery rate is at least 8%.  The Committee does not take
issue with this judgement.  However, it notes that the choice of a recovery value of 8% was not
the solution to some well-defined optimization equation.  Instead it was based on interpolating
within the simulation data.  After looking at all the simulation results, the investigators appeared
to arrive at 8% as an informed judgment. The Committee understands that the simulation of
different recoveries was accomplished by increasing or decreasing all the recoveries sampled
from the performance evaluation study database by the same multiplier.  Different multipliers
would result in different mean recoveries for a given simulation. This would have the effect of
changing the central tendency of the distribution without changing its relative variability.  In actuality,
actions taken to improve recovery are likely to improve precision also (e.g., eliminating the labs
with lower recovery or developing better methods). Much can be gained by revising the assay
protocol so that each sample result is based on an effective volume of at least Vo, where Vo is
specified at some appropriate large value. To accomplish the specified Vo, it may be necessary
to use more than one subsample of a pellet in the lab.  The cost of conducting the assay will be
increased if the protocol is changed in this way (specifying a smallest allowable effective
volume). Simulation studies could show the benefits of such a change.
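
       As a rough illustration of why recovery and a minimum effective volume both matter, the
sketch below computes the chance that a sample yields a non-detect. It assumes that the
number of organisms counted follows a Poisson distribution with mean equal to density times
effective volume times recovery. That model and the numbers used are assumptions made for
illustration and are not taken from the Agency's report.

```python
import math


def prob_non_detect(density, effective_volume, recovery):
    """Probability of counting zero organisms under the assumed Poisson model."""
    expected_count = density * effective_volume * recovery
    return math.exp(-expected_count)


density = 1.0  # organisms per liter (hypothetical)

# Effect of recovery at a fixed effective volume of 10 liters.
for recovery in (0.04, 0.08, 0.16, 0.32):
    print("recovery", recovery, round(prob_non_detect(density, 10.0, recovery), 3))

# Effect of the effective volume at a fixed recovery of 8%.
for volume in (5.0, 10.0, 20.0, 40.0):
    print("volume", volume, round(prob_non_detect(density, volume, 0.08), 3))
```

In this model, doubling the effective volume reduces the non-detect fraction exactly as much as
doubling the recovery, which is one way to see the value of specifying a smallest allowable
effective volume.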

       The Agency noted in its report and in the cover letter transmitting the report to the
Committee, that "The basis of this distinction is an elementary, but fundamental, statistical
principle: averages of imprecise estimates, if based on many values, are more precise than are
the individual estimates."  Even so, the Agency must maintain its awareness that precision and
accuracy are different attributes of sampling and measurement. The accuracy of an estimate will
improve with averaging only if there are no systematic  biases present. The Committee identified
a few potential sources of bias that could affect estimates of protozoan concentrations in water.
If these or other biases are present, the Agency may be deriving an inaccurate estimate with a great
deal of precision.
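
       The distinction between precision and accuracy can be shown with a minimal sketch. It
assumes a hypothetical measurement that reports, on average, half of the true value (a
systematic bias) plus random normal noise. The numbers and names are illustrative only.

```python
import random
import statistics


def averaged_estimates(true_value, bias_factor, noise_sd,
                       n_per_average, n_averages, seed=0):
    """Return many averages, each built from n_per_average biased, noisy readings."""
    rng = random.Random(seed)
    averages = []
    for _ in range(n_averages):
        readings = [true_value * bias_factor + rng.gauss(0.0, noise_sd)
                    for _ in range(n_per_average)]
        averages.append(statistics.mean(readings))
    return averages


true_value = 10.0
few = averaged_estimates(true_value, 0.5, 4.0, n_per_average=5, n_averages=500)
many = averaged_estimates(true_value, 0.5, 4.0, n_per_average=500, n_averages=500)

# Averaging more readings shrinks the spread of the averages (better precision) ...
print("spread of averages:", statistics.stdev(few), statistics.stdev(many))
# ... but the averages still center on the biased value, not on the true value.
print("center of averages:", statistics.mean(few), statistics.mean(many))
```

The averages of 500 readings are tightly clustered, yet they cluster around half of the true
value. Averaging removes random error and leaves the systematic bias untouched.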

       b)     Charge 2:  Evaluate the viability of the assumptions and conditions tested in
              the report.

       Most of the assumptions in the analysis are reasonable considering the time at which
they were made. The  main issue is whether the simulated scenarios cover the spectrum of
possibilities for the national survey. The data used were collected at a limited number of sites
that may not accurately reflect the national occurrence of cysts and oocysts in surface water.
Since the simulations relied heavily on these data, if the data were not accurate, the reliability of
the simulation outcomes may be in question. The Agency must be alert to this possibility when
the Information Collection Rule data become available.  Parallel to this will be a need to
recognize that data obtained from the Information Collection Rule may be less useful than
implied by the current analysis.

       The following are additional specific comments:

       i)     The probability distribution function of the sample volumes in the six performance
             evaluation data sets is used to represent the probability distribution function of the
             Information Collection Rule as a whole.  EPA had no other data on the probability
             distribution function of sample volumes, but  it is likely that the outcome of the
             Information Collection Rule will have much greater variance than these sample
             volumes, which were drawn from a limited number of sites and analyzed by an
             elite group of laboratories.  The effect of this decision probably makes the
             simulation data have greater precision than the real Information Collection Rule
             data may actually have.  Moreover it seems  that real-world sample volumes are
             generally determined by water quality and operator judgment, with the poorer
             water qualities generally having smaller sample volumes.

       ii)     The simulations show that the proposed method for analyzing the Information
             Collection Rule protozoan density data possesses a positive  bias.  The
             investigators state that it should be possible to revise the statistical analysis
             method so it is less biased. It is conceivable that potential explanatory variables
             (e.g., turbidity, temperature, bacterial densities, etc.) can be used to strengthen
             the analysis. To use the ICR data to devise such bias adjustment procedures, it
             will be necessary to collect data on these explanatory variables.  However, the
             appropriate explanatory variables may not have been adequately identified and
             characterized.  These variables can affect different parts of the procedure, such
             as the volume of water that can be sampled, the volume of the final suspended
             pellet, and the number of oocysts not identified.

       iii)     The best method for correcting for a small recovery rate is an important open
             problem.  Because the Information Collection Rule will produce more recovery
             data, it may provide insights into better adjustment procedures. If the assay is to
             be used  as the basis for regulating a drinking water system the adjustment must
             be applicable to that specific system.  Adjustment by the national mean recovery
             rate may be acceptable for purposes of discovering the national distribution of
             oocyst densities, but it is not very satisfactory for regulatory purposes.
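
       The difference between national and site-level use of a recovery adjustment, raised in
comment iii), can be sketched as follows. The example assumes hypothetical sites whose true
densities are lognormal and whose recovery rates vary uniformly between 2% and 30%,
independently of density. The distributions, the threshold, and the function name are
assumptions for illustration and do not come from the ICR or the Agency's report.

```python
import random


def compare_adjustments(n_sites=5000, threshold=1.0, seed=2):
    """Adjust every site by the national mean recovery and tally the consequences."""
    rng = random.Random(seed)
    true_density = [rng.lognormvariate(-1.0, 1.0) for _ in range(n_sites)]
    site_recovery = [rng.uniform(0.02, 0.30) for _ in range(n_sites)]
    measured = [d * r for d, r in zip(true_density, site_recovery)]

    # One national adjustment factor applied to every site.
    national_recovery = sum(site_recovery) / n_sites
    adjusted = [m / national_recovery for m in measured]

    # Sites classified above the threshold although their true density is not.
    wrongly_flagged = sum(a > threshold >= d for a, d in zip(adjusted, true_density))
    # Sites classified at or below the threshold although their true density exceeds it.
    wrongly_cleared = sum(d > threshold >= a for a, d in zip(adjusted, true_density))

    return {
        "national_mean_true": sum(true_density) / n_sites,
        "national_mean_adjusted": sum(adjusted) / n_sites,
        "wrongly_flagged": wrongly_flagged,
        "wrongly_cleared": wrongly_cleared,
    }


result = compare_adjustments()
for name, value in result.items():
    print(name, value)
```

Under these assumptions the adjusted national mean stays close to the true national mean,
while individual sites are misclassified in both directions because each site's own recovery
differs from the national figure.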

       c)     Charges 3 & 4: Evaluate the suitability of the report as a basis for making a
             decision on the use of protozoan monitoring data for a national impact
             assessment and evaluate whether the degree of accuracy and precision of
             the protozoan method is acceptable for an impact analysis.

       These reports are of good quality, but they should be treated only as a beginning. The
Agency's thinking will require further development for effective decision making. The following
are additional specific comments:

       i)     These documents have chosen to base the performance standard on the second
             highest value measured in eighteen samples, as an estimate of the ninetieth
             percentile.  This approach does not take full advantage of the information that is
             available to determine the distribution of oocyst densities at the site and because
             it uses such a small data sample, it is highly variable in its outcome. A better
             method would be to use maximum likelihood or rank order statistics to determine
             the parameters of the distribution and estimate performance standards from these
             (Helsel and Cohn, 1988; Newman and Dixon, 1990).

       ii)     The documents implied that the performance standard would be based on the
             estimated parameter.  A better choice would be to base the performance standard
             on a specific confidence limit for that parameter (e.g., the upper 95% or 99%
             confidence limit).  However, the selection of the specific limit is a value judgement
             that must ultimately  be made by the Agency. The confidence limit approach
             guarantees a specified small chance that a system truly afflicted with too many
             protozoa will incorrectly pass the compliance criterion.  It also provides incentive
             for good sampling and assay techniques. Imagine two drinking water systems
             that collected monitoring data and calculated the same estimated value for the
             oocyst density parameter. Suppose that the first system achieved large effective
             volumes and large recovery rates, but the second system did not.  Then the upper
             confidence limit for the first system will be less than that for the second. If the
             performance standard were stated for the upper confidence limit, the first system
             could be  in compliance, but  the second system would not (that is, the system
             providing unreliable  results will more often be required to perform enhanced
             treatment).  In summary, the RIA should be conducted for a compliance criterion
             based on the upper  95% confidence  limit.  To do that, a confidence interval
             procedure would have to be developed.

       iii)    Because the method used in the simulated regulatory impact assessment
             involved a national recovery adjustment, this method would  not be appropriate for
             implementation at the  local system level as it would result in treatment being
             required in many places where it is not necessary and not being applied in others
             where it should be. This problem is best addressed  by improvement of methods
             and elimination of recovery adjustments altogether.
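
       The two-system comparison in comment ii) can be made concrete with a minimal sketch
of one possible confidence limit procedure. It assumes that the organisms counted follow a
Poisson distribution and that the recovery rate is known exactly. Both are simplifying
assumptions, and the counts, volumes, and function names are hypothetical. A procedure
suitable for the RIA would also have to account for uncertainty in recovery.

```python
import math


def poisson_cdf(k, mean):
    """P(X <= k) for a Poisson variable with the given mean."""
    term = math.exp(-mean)
    total = term
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return total


def upper_limit_count(k, confidence=0.95):
    """Exact one-sided upper limit on the Poisson mean after observing k organisms."""
    target = 1.0 - confidence
    low, high = 0.0, 1.0
    while poisson_cdf(k, high) > target:    # expand until the limit is bracketed
        high *= 2.0
    for _ in range(100):                    # bisection on the mean
        mid = (low + high) / 2.0
        if poisson_cdf(k, mid) > target:
            low = mid
        else:
            high = mid
    return high


def upper_limit_density(count, effective_volume, recovery, confidence=0.95):
    """Upper confidence limit on density, treating recovery as known."""
    return upper_limit_count(count, confidence) / (effective_volume * recovery)


# Two hypothetical systems with the same point estimate of 0.2 organisms per liter.
first = {"count": 10, "effective_volume": 100.0, "recovery": 0.5}
second = {"count": 1, "effective_volume": 25.0, "recovery": 0.2}

for system in (first, second):
    estimate = system["count"] / (system["effective_volume"] * system["recovery"])
    print(estimate, upper_limit_density(**system))
```

Both systems report the same estimated density, but the system with the larger effective
volume and recovery obtains the lower upper confidence limit, which is the incentive for
good sampling and assay technique described above.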

       The Committee notes that Agency efforts, such as that described in the report reviewed
herein, do not often get widespread distribution.  Their accessibility is thus restricted, resulting in
limited scientific deliberation on, and feedback to the Agency about, issues that are important to
decision making. For this reason, the Committee recommends that the Agency prepare a
manuscript describing this work and submit it for publication in a peer-reviewed scientific journal
that uses statisticians as reviewers (e.g., Environmental Science and Technology, Water
Research, Environmetrics). The theoretical work described above would enhance the
manuscript.

       We appreciate having been given the opportunity to address this issue, and look forward
to receiving a response to our comments from the Office of Water.

                                        Sincerely,
                                         /signed/
                                 Dr. Genevieve Matanoski, Chair
                                 Science Advisory Board
                                          /signed/
                                 Dr. Richard J. Bull, Chair
                                 Drinking Water Committee

-------
                              LIST OF ABBREVIATIONS

DWC                Drinking Water Committee
EPA                U.S. Environmental Protection Agency
ESWTR             Enhanced Surface Water Treatment Rule
FS                  Field Spiking Studies
ICR                 Information Collection Rule
ND                  Non-detects or non-detected
OW                 Office of Water
PDF                Probability Distribution Function
PE                  Performance Evaluation Studies
RIA                 Regulatory Impact Analysis
SAB                EPA Science Advisory Board

-------
                                 REFERENCES CITED

Cromwell, J. 1996. "Estimation of ESWTR national costs using simulated ICR monitoring
       data". Contractor's report.  24 pp.

Cromwell, J. 1996. "Revised analysis ESWTR national costs using revised simulations".
       Contractor's report.  3 pp.

Fox, J. 1996. An Evaluation of the Statistical Performance of a Method for Monitoring
       Protozoan Cysts in U.S. Source Waters. US Environmental Protection Agency/Office of
       Water/Office of Science and Technology. Attachment to a memorandum from John Fox
       to Dr. Donald G.  Barnes, Science Advisory Board; June 26, 1996.

Fox, J. 1996.  "Recovery rates for raw water and ICR protozoan method performance". US
       Environmental Protection Agency/Office of Water/Office of Science and Technology.
       Memorandum dated April 26, 1996. 8 pp.

Fox, J. 1996.  "Simulation of protozoan method performance based on field spike recovery
       rates". US Environmental Protection Agency/Office of Water/Office of Science and
       Technology.  Memorandum dated April 19, 1996.  15 pp.

Helsel, D. and T. Cohn.  1988.  Estimation of descriptive statistics for multiply censored water
       quality data. Water Resources Research 24:1997-2004.

Hamilton, Martin A.; Ph.D. 1996.  Statistical Review of the Draft Report: An Evaluation
       of the Statistical Performance of a Method for Monitoring Protozoan Cysts in U.S. Source
       Waters (and supporting documents). Report dated July 26, 1996.

LeChevallier, M.  1995.  Summary of an American Water Works Service Company study in the
       eastern and central United States (not a title). Jour. of the Amer. Water Works Assoc.
       September 1995.

Newman, M. and D. Dixon.  1990. Uncensor: A program to estimate the means and standard
       deviations for data sets with below detection observations. April 1990. American
       Environmental Laboratory.  p. 26.

-------
                   U.S. ENVIRONMENTAL PROTECTION AGENCY
                           SCIENCE ADVISORY BOARD
                          DRINKING WATER COMMITTEE
                                 June 9-10, 1997

  Science Advisory Board Review of the Agency Report: An Evaluation of the Statistical
 Performance of a Method for Monitoring Protozoan Cysts in U.S. Source Waters -PANEL

CHAIR
DR. RICHARD BULL, Battelle Pacific Northwest Laboratories, Molecular Biosciences, P.O. Box
999-P7-56, Richland, WA 99352

PAST CHAIR

DR. VERNE A. RAY, Medical Research Laboratory, Pfizer Inc., Groton, Connecticut  06340

MEMBERS

DR. JUDY A. BEAN, Professor, Department of Epidemiology & Public Health, University of Miami,
School of Medicine, 1801 N.W. 9th Avenue, Miami, Florida 33136

DR. LENORE S. CLESCERI, Rensselaer Polytechnic Institute, Materials Research Center 236,
Troy, New York 12181

DR. ANNA FAN-CHEUK, California Environmental Protection Agency, 2151 Berkeley Way - Annex
11, Berkeley, California 94704

DR. LEE D.  (L.D.) MCMULLEN, General Manager, Des Moines Waterworks, 2201 Valley Drive,
Des Moines, IA 50321

DR. CHARLES O'MELIA, Professor and Chairman, Department of Geography and Environmental
Engineering, The Johns Hopkins University, Baltimore, MD 21218

DR. EDO D. PELLIZZARI, Research Triangle Institute, PO Box 12194, 3040 Cornwallis Rd,
Research Triangle Park, NC 27709

DR. RHODES TRUSSELL, Montgomery Watson, 300 North Lake Avenue, Suite 1200,
Pasadena, CA 91101

DR. MARYLYNN V. YATES, Department of Soil and Environmental Sciences, Room 2208
Geology, University of California, Riverside, CA 92521

DESIGNATED FEDERAL OFFICER
MR. THOMAS MILLER, Designated Federal Official, US EPA/Science Advisory Board, 401 M
Street, S.W. (1400), Washington, D.C. 20460 (202) 260-5886 FAX: (202) 260-7118

STAFF SECRETARY
MS. MARY WINSTON, Staff Secretary, Science Advisory Board, 401 M Street, S.W. (1400),
Washington, DC 20460 (202) 260-8414 FAX: (202) 260-7118

-------
                                        NOTICE
This report has been written as a part of the activities of the Science Advisory Board, a public
advisory group providing extramural scientific information and advice to the Administrator and
other officials of the Environmental Protection Agency.  The Board is structured to provide
balanced,  expert assessment of scientific matters relating to problems facing the Agency.  This
report has not been reviewed for approval by the Agency and, therefore, the contents of this
report do not necessarily represent the views and policies of the Environmental Protection
Agency, nor of other agencies in the Executive Branch of the Federal Government, nor does
mention of trade names or commercial products constitute a recommendation for use.

-------
                                  DISTRIBUTION LIST

Administrator
Deputy Administrator
Assistant Administrators
Deputy Assistant Administrators for Research and Development
Deputy Assistant Administrator for Water
EPA Regional Administrators
EPA Laboratory and Center Directors
EPA Headquarters Library
EPA Regional Libraries
EPA Laboratory Libraries
Director, Office of Science and Technology, Office of Water
Director, Office of Ground Water and Drinking Water, Office of Water
Drinking Water Committee Members

-------