September 2, 1997

EPA-SAB-DWC-LTR-97-010

Honorable Carol M. Browner
Administrator
U.S. Environmental Protection Agency
401 M Street, S.W.
Washington, D.C. 20460

Subject: Science Advisory Board Review of the Agency Report: An Evaluation of the Statistical Performance of a Method for Monitoring Protozoan Cysts in U.S. Source Waters (June 26, 1996)

Dear Ms. Browner:

The Drinking Water Committee (DWC or the Committee) of the U.S. Environmental Protection Agency (EPA) Science Advisory Board met on July 17, 1996, and again on June 7, 1997, in Washington, DC, to review a report describing the statistical performance of the Agency's protozoan oocyst monitoring methods. Agency staff recognized that the protozoan analysis methodology formally adopted for the Information Collection Rule (ICR) was crude, with very low and highly variable recoveries of added oocysts. Agency staff considered the statistical methodology necessary to determine whether the Agency can take advantage of the large monitoring program agreed to under the negotiated rulemaking process.

The Agency charge asked the Drinking Water Committee to evaluate the report and address the following concerns:

Charge 1: Evaluate the factual and conceptual soundness of the approach and methods used, and the soundness of the results and conclusions of the report.
Charge 2: Evaluate the viability of the assumptions and conditions tested in the report.
Charge 3: Evaluate the suitability of the report as a basis for making a decision on the use of protozoan monitoring data for a national impact assessment.
Charge 4: Evaluate whether the degree of accuracy and precision of the protozoan method is acceptable for an impact analysis.

This letter report provides the Drinking Water Committee's comments, conclusions, and recommendations in response to the Agency's charge.

1. Overview and Summary

The Agency's draft report is based on a series of computer simulation studies.
The statistical properties of data from the EPA performance evaluation (PE) studies and EPA field spiking (FS) studies, along with extensive Giardia and Cryptosporidium sampling studies by the American Water Works Service Company, were used to simulate the outcome of the Information Collection Rule sampling protocol. This outcome was then used to examine the significance of improvements in method recovery and to further simulate a national regulatory impact analysis (RIA). The Agency report concluded that:

a) The Information Collection Rule should provide a reliable estimate of the distribution of oocyst densities for the nation as a whole, provided that the ICR data achieve a minimum recovery rate of 8%. Some relevant scenarios were also programmed by three different groups using the same data and assumptions and three different programming languages. Because all three gave the same statistical properties for the potential ICR data, the Agency stated that this provided further support for the simulations.

b) The error in national cost estimates introduced by unit treatment cost assumptions and decision tree assumptions would be large enough to mask the effect of errors from ICR monitoring. This conclusion was based on an analysis by Cromwell (1996), which simulated the impact of a regulation specifying site-specific treatment based on site-specific protozoan measurements.

c) The Information Collection Rule's protozoan data will be adequate for a national regulatory impact analysis.

The Committee reviewed the materials presented and concluded that the investigators used sound approaches and methods, and that the methods conform to conventional standards of modern statistical practice. As a consequence, the Committee believes that the statistical method can be used with some confidence for the limited objectives within Charges 1 and 2.
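The kind of Monte Carlo experiment described above can be sketched in a few lines. This is not the Agency's simulation: purely for illustration, it assumes a lognormal national distribution of site mean densities, lognormal effective sample volumes, Beta-distributed recoveries with a mean of 12%, Poisson oocyst counts, 18 samples per site, and adjustment of every result by the known national mean recovery. All parameter values and function names are hypothetical. The sketch shows the idea behind conclusion a): individual site estimates are very imprecise, yet their national average can still track the true national mean when the recovery adjustment is unbiased.

```python
import math
import random


def poisson(rng, lam):
    """Poisson draw by Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_national(n_sites=2000, n_samples=18, seed=1):
    """Return (true national mean, estimated national mean,
    mean absolute relative error of site estimates, fraction of all-ND sites)."""
    rng = random.Random(seed)
    a, b = 1.2, 8.8                   # hypothetical Beta recovery parameters
    assumed_recovery = a / (a + b)    # 0.12: national mean used for adjustment
    true_means, est_means = [], []
    for _ in range(n_sites):
        # hypothetical site mean density (oocysts/L), held constant over the year
        density = rng.lognormvariate(math.log(0.1), 1.0)
        total = 0.0
        for _ in range(n_samples):
            volume = rng.lognormvariate(math.log(10.0), 0.5)  # effective litres
            recovery = rng.betavariate(a, b)
            count = poisson(rng, density * volume * recovery)
            total += count / (volume * assumed_recovery)      # adjusted result
        true_means.append(density)
        est_means.append(total / n_samples)
    national_true = sum(true_means) / n_sites
    national_est = sum(est_means) / n_sites
    site_err = sum(abs(e - t) / t for e, t in zip(est_means, true_means)) / n_sites
    frac_zero = sum(1 for e in est_means if e == 0.0) / n_sites
    return national_true, national_est, site_err, frac_zero


if __name__ == "__main__":
    nt, ne, err, zero = simulate_national()
    print(f"national true {nt:.4f}, estimated {ne:.4f}")
    print(f"mean site relative error {err:.2f}; all-ND sites {zero:.1%}")
```

Under these assumptions the national relative error is small while the average site-level error is large. If the assumed mean recovery were wrong, the national average would be off by the same factor however many sites were sampled, which is the accuracy-versus-precision caveat the Committee raises under Charge 1.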
However, the Committee is concerned about extending the conclusions under Charges 3 and 4 to the implementation of the Enhanced Surface Water Treatment Rule (ESWTR). Essentially, limitations in the robustness, representativeness, and overall quality of the data restrict their utility in statistical modeling, and the Committee has comments on the methods employed and on the conclusions drawn from the analysis. The heart of the issue is the protozoan assay technique itself. As a result, the Committee found it necessary to comment constructively on the method itself as well as on the simulation effort. The Committee's principal comments are summarized as follows:

a) The data used were collected at a limited number of sites that may not accurately reflect the national occurrence of cysts and oocysts in surface water. Because the simulations relied heavily on these data, the reliability of the simulation outcomes may be in question if the data are not accurate.

b) Before health benefits can be effectively addressed, analytical methods must be developed to address the viability/infectivity of the organisms present as well as recovery.

c) A regulatory impact analysis considers the cost of compliance and the health benefits associated with compliance. This work demonstrates how the output of the Information Collection Rule can be used for the first part but does not directly address the second.

d) With regard to the cost of compliance, if improved oocyst monitoring methods do not become available for use in implementing the Enhanced Surface Water Treatment Rule, then the classification system used to simulate the use of the Information Collection Rule to project site-specific decisions will no longer be suitable.

e) When simulated sampling resulted in a "not detected" (ND), EPA repopulated the database with values at one fourth of the detection limit. This might introduce an unnecessary bias in the outcome.
When available data are censored, maximum likelihood or rank order statistical methods should be used.

f) The Agency's choice of a minimum recovery value of 8% is based on interpolating within the simulation data. In practice, actions taken to improve recovery will also improve precision. The lower the recovery rates, the higher the percentage of data below the detection limit. These simulations may therefore not represent the most likely outcomes.

g) The Agency has not made an adequate scientific case for selecting the second highest of 18 measurements as the basis for regulation. This approach does not take full advantage of the Information Collection Rule data that will be available for characterizing the distribution of cyst densities.

h) A national recovery adjustment for the protozoan assay is suitable for a regulatory impact analysis, but not for implementation of the Enhanced Surface Water Treatment Rule at the local system level, because it would result in treatment being required in many places where it is not necessary and not being applied in others where it should be. This problem is best addressed by improving the methods and eliminating recovery adjustments altogether.

i) The probability distribution function (PDF) of the sample volumes in the six performance evaluation data sets is used to represent the PDF of sample volumes for the Information Collection Rule as a whole. The ICR sample volumes are likely to have much greater variance than these, which were drawn from a limited number of sites and analyzed by an elite group of laboratories. Much can be gained by revising the assay protocol so that each sample result is based on a large and consistent effective volume.

j) Another Committee concern is the lack of any expression of the variability of the estimator that would permit a measure of uncertainty. The Information Collection Rule method should report protozoan density accompanied by a realistic measure of uncertainty.
The procedure should include formulas for calculating the standard error and the upper 95% confidence limit.

k) The simulations show that the proposed method for analyzing the Information Collection Rule protozoan density data has a positive bias. It should be possible to revise the statistical analysis method so that it is less biased. Explanatory variables (e.g., turbidity, temperature, bacterial densities) can be used to strengthen the analysis. To use the ICR data to devise such bias adjustment procedures, it will be necessary to collect data on these explanatory variables.

l) The Committee recommends that EPA prepare a manuscript describing this work and submit it for publication in a peer-reviewed scientific journal. Publication would improve the accessibility of the Agency's documentation of this effort.

2. Specific Comments and Recommendations

This report assumes that the reader is familiar with the subject draft report (Fox, June 26, 1996) and the following supporting documents: 1) EPA memorandum on "Recovery rates for raw water and ICR protozoan method performance" (Fox, April 26, 1996); 2) EPA memorandum on "Simulation of protozoan method performance based on field spike recovery rates" (Fox, April 19, 1996); and 3) a contractor's memorandum on "Estimation of ESWTR national costs using simulated ICR monitoring data" (Cromwell, February 12, 1996, and an amendment dated April 22, 1996). It also assumes the reader is familiar with the analytical method for protozoan oocysts and the terminology and acronyms associated with this field. The Committee reviewed the draft report and all supporting documents provided by the Agency, and received oral presentations by EPA Office of Water (OW) personnel. This report is based on those documents and discussions and is organized into three sections corresponding to the charge (charge elements 3 and 4 have been combined).
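Items e) and g) of the summary recommend maximum likelihood methods for censored data in place of substituting one fourth of the detection limit and taking the second highest of 18 values. The sketch below is in the spirit of Helsel and Cohn (1988) but is not taken from that paper or from the Agency's documents: it fits a lognormal distribution to left-censored densities by maximizing the censored-data likelihood, then reads the ninetieth percentile off the fitted distribution. The lognormal form, the function names, and the simple compass search used as the optimizer are assumptions chosen to keep the example self-contained; a tested statistical library would be preferable in practice.

```python
import math
import random
from statistics import NormalDist


def _log_norm_cdf(z):
    """log of the standard normal CDF, floored to avoid log(0) far in the tail."""
    return math.log(max(0.5 * math.erfc(-z / math.sqrt(2.0)), 1e-300))


def censored_lognormal_mle(obs):
    """Fit ln(density) ~ Normal(mu, sigma) to left-censored observations.

    obs is a list of (value, detected) pairs: for a detect, value is the
    measured density; for a non-detect, value is that sample's own detection
    limit. Returns the (mu, sigma) maximizing the censored likelihood.
    """
    det = [math.log(v) for v, d in obs if d]
    nd = [math.log(v) for v, d in obs if not d]
    if not det:
        raise ValueError("at least one detected value is required")

    def loglik(mu, log_sigma):
        s = math.exp(log_sigma)
        # detects: normal log-density of ln(value), constant terms dropped
        ll = sum(-0.5 * ((x - mu) / s) ** 2 - log_sigma for x in det)
        # non-detects: log P(ln(density) < ln(detection limit))
        ll += sum(_log_norm_cdf((x - mu) / s) for x in nd)
        return ll

    # Compass search: try +/- step on each parameter, halve the step when stuck.
    point = [sum(det) / len(det), 0.0]  # start: mean of detected logs, sigma = 1
    best = loglik(*point)
    step = 0.5
    while step > 1e-6:
        moved = False
        for i in (0, 1):
            for delta in (step, -step):
                trial = list(point)
                trial[i] += delta
                value = loglik(*trial)
                if value > best:
                    point, best, moved = trial, value, True
        if not moved:
            step /= 2.0
    return point[0], math.exp(point[1])


def percentile_from_fit(mu, sigma, p=0.90):
    """p-th percentile of the fitted lognormal density distribution."""
    return math.exp(mu + NormalDist().inv_cdf(p) * sigma)


def substitution_log_mean(obs):
    """Mean of ln(value) after replacing each non-detect by 1/4 of its limit."""
    logs = [math.log(v if d else v / 4.0) for v, d in obs]
    return sum(logs) / len(logs)


if __name__ == "__main__":
    rng = random.Random(7)
    obs = []
    for _ in range(3000):
        x = rng.lognormvariate(0.0, 1.0)    # hypothetical true density
        dl = rng.lognormvariate(0.0, 0.3)   # sample-specific detection limit
        obs.append((x, True) if x >= dl else (dl, False))
    mu, sigma = censored_lognormal_mle(obs)
    print(f"MLE mu={mu:.3f} sigma={sigma:.3f} p90={percentile_from_fit(mu, sigma):.3f}")
    print(f"substitution mean of logs={substitution_log_mean(obs):.3f} (true 0)")
```

In this simulated example, roughly half the values are censored; the maximum likelihood estimate of the log-scale mean lands near the true value of zero, while the one-fourth substitution pulls it noticeably away. The fit needs at least one detected value: a site with no detects carries no information about the upper tail of its distribution.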
In general, the Committee agrees with the report and the Agency's interpretation of it, though it suggests some limits on the interpretation of the results. This body of work should be viewed as an initial step requiring much follow-up. Accordingly, the Committee offers some comments and suggestions for the future.

a) Charge 1: Evaluate the factual and conceptual soundness of the approach and methods used, and the soundness of the results and conclusions of the report.

The Committee concludes that the investigators used sound approaches and methods, and the methods used, in general, conform to conventional standards of modern statistical practice. While the approach was appropriate for determining whether the Information Collection Rule data would be useful in conducting a regulatory impact analysis, it is not appropriate for actually conducting the analysis once the ICR data are available. The Committee does suggest some modifications to the methods and to the conclusions drawn from the analysis.

In principle, one could derive the statistical properties of potential Information Collection Rule data using closed-form analytical methods; however, the investigators found that mathematical analysis was too difficult or required excessive simplifying assumptions, so computer simulation seemed to be the only feasible approach. Computer simulation is commonly used in work of this kind. Often the simulations provide a check that a proposed mathematical statistical methodology is valid, or compare the characteristics of alternative experimental designs. The Committee believes that the computer simulation experiments for the ICR protozoan monitoring program could have been done differently. Even so, it seems likely that if the simulations were conducted under the same assumptions, the outcomes would have been much the same as those presented in the report.
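As an example of the simulation checks mentioned above, the sketch below isolates the bias described in summary item e): non-detects recorded as one fourth of the detection limit, followed by taking the second highest of 18 results as the ninetieth percentile. The setup is hypothetical, not the Agency's protocol: recovery is taken as 100%, the detection limit is one oocyst per volume examined, volumes are lognormal around 10 L, and site densities are lognormal. Volumes come from a separate random stream so that a low-density site and a site with no oocysts at all see identical volumes.

```python
import math
import random
from statistics import NormalDist


def poisson(rng, lam):
    """Poisson draw by Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def mean_second_highest(median_density, sigma=1.0, n=18, reps=2000, seed=3):
    """Average over `reps` simulated years of the second highest of `n`
    recorded densities, with non-detects recorded as DL/4.

    Hypothetical assumptions: 100% recovery, DL = 1 oocyst / volume examined,
    lognormal volumes (median 10 L) and lognormal densities (oocysts/L).
    """
    rng_vol = random.Random(seed)       # volume stream, identical across scenarios
    rng_obs = random.Random(seed + 1)   # density and count stream
    total = 0.0
    for _ in range(reps):
        recorded = []
        for _ in range(n):
            volume = rng_vol.lognormvariate(math.log(10.0), 0.7)
            density = (rng_obs.lognormvariate(math.log(median_density), sigma)
                       if median_density > 0 else 0.0)
            count = poisson(rng_obs, density * volume)
            detection_limit = 1.0 / volume
            recorded.append(count / volume if count else detection_limit / 4.0)
        recorded.sort()
        total += recorded[-2]            # "second highest of 18"
    return total / reps


def true_percentile(median_density, sigma=1.0, p=0.90):
    """True p-th percentile of the simulated lognormal density distribution."""
    return median_density * math.exp(NormalDist().inv_cdf(p) * sigma)


if __name__ == "__main__":
    median = 0.002   # hypothetical low-density site, oocysts/L
    print(f"true 90th percentile:       {true_percentile(median):.4f}")
    print(f"2nd highest, low density:   {mean_second_highest(median):.4f}")
    print(f"2nd highest, zero density:  {mean_second_highest(0.0):.4f}")
```

In this setup the "ninetieth percentile" reported for a site with no oocysts at all exceeds the true ninetieth percentile of the low-density site, because the statistic is set mainly by the smallest sample volumes rather than by the densities.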
The Committee has not performed any independent simulations or analyses of the simulation data. Some of the scenarios were programmed by three different groups using three different programming languages, and all produced the same statistical properties for the potential Information Collection Rule data. This guards against programming errors, but it does not necessarily validate the model. The Committee does think that some improvements should be made; these are detailed below.

In EPA's simulation protocol, non-detects (NDs) were replaced with a value of one fourth the detection limit. This decision, combined with the decision to select the second largest of 18 measurements as a 'non-parametric' estimate of the ninetieth percentile value, results in a strong positive bias for sites with low oocyst levels: their ninetieth percentile value is controlled not by the simulated oocyst densities themselves but by the simulated sample volumes. When censored data are to be used in this way, maximum likelihood or rank order statistical methods can be used to determine the parameters of the probability distribution function (Helsel & Cohn, 1988, Water Resources Research, 24:1997-2004), and the censored portion of the database can be repopulated to better determine the likely outcome. EPA should also consider how this matter will be addressed in the implementation of the Enhanced Surface Water Treatment Rule. Defending a requirement for treatment at sites where no oocysts are detected will be difficult.

EPA concluded that its simulation demonstrates the suitability of the Information Collection Rule data for conducting a regulatory impact analysis, but the simulation addresses only the cost issues of the required impact analysis and neglects health effects. A regulatory impact analysis considers, among other things, both the costs of compliance and the health benefits received because of compliance.
This work demonstrates how the output of the Information Collection Rule might be used for the first part, but it does not directly address the second. The Committee identified two difficulties in conducting a meaningful regulatory impact analysis:

i) The Cromwell memo shows that, assuming the decisions to treat will be based on site-specific measurements of oocyst concentration, the cost of implementation can be estimated with reasonable accuracy using the Information Collection Rule data. On the other hand, current protozoan methods are not suitable for gathering the data for those site-specific decisions. If improved oocyst monitoring methods are not made available for use in implementing the Enhanced Surface Water Treatment Rule, another approach to classifying systems will be required, and the regulatory impact analysis will have to be repeated using the new classification system.

ii) Before health benefits can be effectively addressed, analytical methods must be developed that address the viability/infectivity of the organisms present.

EPA concluded that the Information Collection Rule data will support reliable estimates of national occurrence if the average recovery rate is at least 8%. The Committee does not take issue with this judgment. However, it notes that the choice of a recovery value of 8% was not the solution to a well-defined optimization problem. Instead, it was based on interpolating within the simulation data; after looking at all the simulation results, the investigators appear to have arrived at 8% as an informed judgment. The Committee understands that different recoveries were simulated by increasing or decreasing all the recoveries sampled from the performance evaluation study database by the same multiplier. Different multipliers would result in different mean recoveries for a given simulation.
This would have the effect of changing the central tendency of the recovery distribution without changing its relative spread (coefficient of variation). In practice, actions taken to improve recovery are likely to improve precision also (e.g., eliminating the laboratories with lower recovery or developing better methods). Much can be gained by revising the assay protocol so that each sample result is based on an effective volume of at least V0, where V0 is specified at some appropriately large value. To achieve the specified V0, it may be necessary to analyze more than one subsample of a pellet in the laboratory. Changing the protocol in this way (specifying a smallest allowable effective volume) will increase the cost of conducting the assay. Simulation studies could show the benefits of such a change.

The Agency noted in its report, and in the cover letter transmitting the report to the Committee, that "The basis of this distinction is an elementary, but fundamental, statistical principle: averages of imprecise estimates, if based on many values, are more precise than are the individual estimates." Even so, the Agency must remain aware that precision and accuracy are different attributes of sampling and measurement. The accuracy of an estimate will improve with averaging only if no systematic biases are present. The Committee identified a few potential sources of bias that could affect estimates of protozoan concentrations in water. If these or other biases are present, the Agency may derive an inaccurate estimate with a great deal of precision.

b) Charge 2: Evaluate the viability of the assumptions and conditions tested in the report.

Most of the assumptions in the analysis are reasonable considering the time at which they were made. The main issue is whether the simulated scenarios cover the spectrum of possibilities for the national survey.
The data used were collected at a limited number of sites that may not accurately reflect the national occurrence of cysts and oocysts in surface water. Because the simulations relied heavily on these data, the reliability of the simulation outcomes may be in question if the data are not accurate. The Agency must be alert to this possibility when the Information Collection Rule data become available. In parallel, the Agency will need to recognize that data obtained from the Information Collection Rule may be less useful than implied by the current analysis. The following are additional specific comments:

i) The probability distribution function of the sample volumes in the six performance evaluation data sets is used to represent the probability distribution function of sample volumes for the Information Collection Rule as a whole. EPA had no other data on the distribution of sample volumes, but the Information Collection Rule sample volumes are likely to have much greater variance than these, which were drawn from a limited number of sites and analyzed by an elite group of laboratories. As a result, the simulated data probably have greater precision than the real Information Collection Rule data will have. Moreover, real-world sample volumes appear to be determined largely by water quality and operator judgment, with poorer-quality waters generally yielding smaller sample volumes.

ii) The simulations show that the proposed method for analyzing the Information Collection Rule protozoan density data has a positive bias. The investigators state that it should be possible to revise the statistical analysis method so that it is less biased. Potential explanatory variables (e.g., turbidity, temperature, bacterial densities) could conceivably be used to strengthen the analysis.
To use the ICR data to devise such bias adjustment procedures, it will be necessary to collect data on these explanatory variables. However, the appropriate explanatory variables may not have been adequately identified and characterized. These variables can affect different parts of the procedure, such as the volume of water that can be sampled, the volume of the final suspended pellet, and the number of oocysts not identified.

iii) The best method for correcting for a low recovery rate is an important open problem. Because the Information Collection Rule will produce more recovery data, it may provide insights into better adjustment procedures. If the assay is to be used as the basis for regulating a drinking water system, the adjustment must be applicable to that specific system. Adjustment by the national mean recovery rate may be acceptable for estimating the national distribution of oocyst densities, but it is not very satisfactory for regulatory purposes.

c) Charges 3 & 4: Evaluate the suitability of the report as a basis for making a decision on the use of protozoan monitoring data for a national impact assessment, and evaluate whether the degree of accuracy and precision of the protozoan method is acceptable for an impact analysis.

These reports are of good quality, but they should be treated only as a beginning. The Agency's thinking will require further development for effective decision making. The following are additional specific comments:

i) These documents base the performance standard on the second highest value measured in eighteen samples, as an estimate of the ninetieth percentile. This approach does not take full advantage of the information available to determine the distribution of oocyst densities at the site, and because it uses such a small data sample, its outcome is highly variable.
A better method would be to use maximum likelihood or rank order statistics to determine the parameters of the distribution and to estimate performance standards from these (Helsel and Cohn, 1988; Newman and Dixon, 1990).

ii) The documents implied that the performance standard would be based on the estimated parameter. A better choice would be to base the performance standard on a specific confidence limit for that parameter (e.g., the upper 95% or 99% confidence limit). However, the selection of the specific limit is a value judgment that must ultimately be made by the Agency. The confidence limit approach guarantees a specified small chance that a system truly afflicted with too many protozoa will incorrectly pass the compliance criterion. It also provides an incentive for good sampling and assay techniques. Imagine two drinking water systems that collected monitoring data and calculated the same estimated value for the oocyst density parameter. Suppose that the first system achieved large effective volumes and high recovery rates, but the second system did not. Then the upper confidence limit for the first system will be lower than that for the second. If the performance standard were stated in terms of the upper confidence limit, the first system could be in compliance while the second system would not (that is, a system providing unreliable results will more often be required to perform enhanced treatment). In summary, the RIA should be conducted for a compliance criterion based on the upper 95% confidence limit. To do that, a confidence interval procedure would have to be developed.

iii) Because the method used in the simulated regulatory impact analysis involved a national recovery adjustment, it would not be appropriate for implementation at the local system level, as it would result in treatment being required in many places where it is not necessary and not being applied in others where it should be.
This problem is best addressed by improving methods and eliminating recovery adjustments altogether.

The Committee notes that Agency efforts such as the one described in the report reviewed herein do not often receive wide distribution. Their accessibility is thus restricted, which limits scientific deliberation on, and feedback to the Agency about, issues that are important to decision making. For this reason, the Committee recommends that the Agency prepare a manuscript describing this work and submit it for publication in a peer-reviewed scientific journal that uses statisticians as reviewers (e.g., Environmental Science and Technology, Water Research, Environmetrics). The theoretical work described above would enhance the manuscript.

We appreciate having been given the opportunity to address this issue, and look forward to receiving a response to our comments from the Office of Water.

Sincerely,

/signed/
Dr. Genevieve Matanoski, Chair
Science Advisory Board

/signed/
Dr. Richard J. Bull, Chair
Drinking Water Committee

LIST OF ABBREVIATIONS

DWC Drinking Water Committee
EPA U.S. Environmental Protection Agency
ESWTR Enhanced Surface Water Treatment Rule
FS Field Spiking Studies
ICR Information Collection Rule
ND Non-detects or non-detected
OW Office of Water
PDF Probability Distribution Function
PE Performance Evaluation Studies
RIA Regulatory Impact Analysis
SAB EPA Science Advisory Board

REFERENCES CITED

Cromwell, J. 1996. "Estimation of ESWTR national costs using simulated ICR monitoring data." Contractor's report. 24 pp.
Cromwell, J. 1996. "Revised analysis ESWTR national costs using revised simulations." Contractor's report. 3 pp.
Fox, J. 1996. An Evaluation of the Statistical Performance of a Method for Monitoring Protozoan Cysts in U.S. Source Waters. US Environmental Protection Agency/Office of Water/Office of Science and Technology. Attachment to a memorandum from John Fox to Dr. Donald G. Barnes, Science Advisory Board; June 26, 1996.
Fox, J. 1996. "Recovery rates for raw water and ICR protozoan method performance." US Environmental Protection Agency/Office of Water/Office of Science and Technology. Memorandum dated April 26, 1996. 8 pp.
Fox, J. 1996. "Simulation of protozoan method performance based on field spike recovery rates." US Environmental Protection Agency/Office of Water/Office of Science and Technology. Memorandum dated April 19, 1996. 15 pp.
Hamilton, Martin A. 1996. Statistical Review of the Draft Report: An Evaluation of the Statistical Performance of a Method for Monitoring Protozoan Cysts in U.S. Source Waters (and supporting documents). Report dated July 26, 1996.
Helsel, D. and T. Cohn. 1988. Estimation of descriptive statistics for multiply censored water quality data. Water Resources Research 24:1997-2004.
LeChevallier, M. 1995. Summary of an American Water Works Service Company study in the eastern and central United States (not a title). Jour. of the Amer. Water Works Assoc. September 1995.
Newman, M. and D. Dixon. 1990. Uncensor: A program to estimate the means and standard deviations for data sets with below detection observations. April 1990. American Environmental Laboratory. p. 26.

U.S. ENVIRONMENTAL PROTECTION AGENCY
SCIENCE ADVISORY BOARD
DRINKING WATER COMMITTEE
June 9-10, 1997
Science Advisory Board Review of the Agency Report: An Evaluation of the Statistical Performance of a Method for Monitoring Protozoan Cysts in U.S. Source Waters

PANEL CHAIR
DR. RICHARD BULL, Battelle Pacific Northwest Laboratories, Molecular Biosciences, P.O. Box 999-P7-56, Richland, WA 99352

PAST CHAIR
DR. VERNE A. RAY, Medical Research Laboratory, Pfizer Inc., Groton, Connecticut 06340

MEMBERS
DR. JUDY A. BEAN, Professor, Department of Epidemiology & Public Health, University of Miami, School of Medicine, 1801 N.W. 9th Avenue, Miami, Florida 33136
DR. LENORE S. CLESCERI, Rensselaer Polytechnic Institute, Materials Research Center 236, Troy, New York 12181
DR. ANNA FAN-CHEUK, California Environmental Protection Agency, 2151 Berkeley Way - Annex 11, Berkeley, California 94704
DR. LEE D. (L.D.) MCMULLEN, General Manager, Des Moines Waterworks, 2201 Valley Drive, Des Moines, IA 50321
DR. CHARLES O'MELIA, Professor and Chairman, Department of Geography and Environmental Engineering, The Johns Hopkins University, Baltimore, MD 21218
DR. EDO D. PELLIZZARI, Research Triangle Institute, PO Box 12194, 3040 Cornwallis Rd, Research Triangle Park, NC 27709
DR. RHODES TRUSSELL, Montgomery Watson, 300 North Lake Avenue, Suite 1200, Pasadena, CA 91101
DR. MARYLYNN V. YATES, Department of Soil and Environmental Sciences, Room 2208 Geology, University of California, Riverside, CA 92521

DESIGNATED FEDERAL OFFICER
MR. THOMAS MILLER, Designated Federal Official, US EPA/Science Advisory Board, 401 M Street, S.W. (1400), Washington, D.C. 20460, (202) 260-5886, FAX: (202) 260-7118

STAFF SECRETARY
MS. MARY WINSTON, Staff Secretary, Science Advisory Board, 401 M Street, S.W. (1400), Washington, DC 20460, (202) 260-8414, FAX: (202) 260-7118

NOTICE

This report has been written as a part of the activities of the Science Advisory Board, a public advisory group providing extramural scientific information and advice to the Administrator and other officials of the Environmental Protection Agency. The Board is structured to provide balanced, expert assessment of scientific matters relating to problems facing the Agency. This report has not been reviewed for approval by the Agency and, therefore, the contents of this report do not necessarily represent the views and policies of the Environmental Protection Agency, nor of other agencies in the Executive Branch of the Federal Government, nor does mention of trade names or commercial products constitute a recommendation for use.
DISTRIBUTION LIST

Administrator
Deputy Administrator
Assistant Administrators
Deputy Assistant Administrators for Research and Development
Deputy Assistant Administrator for Water
EPA Regional Administrators
EPA Laboratory and Center Directors
EPA Headquarters Library
EPA Regional Libraries
EPA Laboratory Libraries
Director: Office of Science and Technology, Office of Water
Director: Office of Groundwater and Drinking Water, Office of Water
Drinking Water Committee Members