United States Environmental Protection Agency
Office of Water (4606M)
EPA 816-R-07-010
March 2008
www.epa.gov/safewater

2006 Drinking Water Data Reliability Analysis and Action Plan for State-Reported Public Water System Data in the EPA Safe Drinking Water Information System/Federal Version (SDWIS/FED)

Executive Summary

Safeguarding our nation's drinking water by developing effective and appropriate policy decisions and conducting program oversight depends on data of known and documented quality. The Safe Drinking Water Information System/Federal Version (SDWIS/FED) is the Environmental Protection Agency's (EPA) principal database for the national drinking water program. It contains data on public water systems (PWSs) provided by the states to EPA. It is used primarily for management of state and EPA programs and for informing the public about the compliance status of their drinking water systems and, indirectly, the safety of their drinking water. EPA uses the information in SDWIS/FED in various analyses to support programmatic decisions and to identify trends, problems, and opportunities for improvement in the states' rule implementation as well as in program oversight. Consequently, the utility of SDWIS/FED information for these purposes depends heavily on the quality of the data it contains.

EPA routinely evaluates state programs by conducting Data Verification (DV) audits, which evaluate state compliance decisions and reporting to SDWIS/FED. EPA prepares triennial summary evaluations based on the DVs. This document presents the results of EPA's third triennial review of data quality in SDWIS/FED and includes an evaluation of the data collected from 2002 through 2004. For the 38 states evaluated, we found that:

• Ninety-four percent of the health-based violation data in SDWIS/FED were accurate.
• Approximately 81% of the maximum contaminant level (MCL) and Surface Water Treatment Rule treatment technique (SWTR TT) violations were reported to SDWIS/FED.
• Including lead and copper treatment technique (LCR TT) violations, about 62% of the health-based violations (MCL and treatment technique violations) were reported to SDWIS/FED; only 8% of LCR TT violations were reported.
• Only approximately 30% of the monitoring and reporting (M/R) violations were reported to SDWIS/FED.
• The primary reason for non-reporting was compliance determination errors rather than data flow errors.
• Further, in 2004, 60% of the health-based violations¹ and approximately 30% of the monitoring and reporting violations were reported to SDWIS/FED on time.

Background

SDWIS/FED contains data about PWS facilities, violations (e.g., exceptions and exceedances) of the Federal drinking water regulations adopted by the states, and enforcement actions taken by the states. The regulations include health-based drinking water quality standards and performance of treatment techniques and/or process requirements.

¹ The health-based violations in this reference do not include lead and copper treatment technique violations because they have open-ended compliance period end dates.
The focus of this report is on two types of violations: (1) health-based violations (i.e., exceedance of a maximum contaminant level or non-performance of a treatment technique or process), and (2) monitoring and reporting violations (i.e., a water system did not monitor, did not report monitoring results, or was late in reporting results to the state).

States manage their own processes and databases differently to document public water system capabilities and their program management decisions concerning violations (or noncompliance), and to record corrective actions undertaken. State data indicate that violations occur infrequently at most public water systems (PWSs). Violation data that states report to EPA (SDWIS/FED) reflect only those major and minor noncompliance results that may lead to adverse public health outcomes. Violations represent a small fraction of all the determinations states make, which demonstrates the safety of the nation's water supply.

The first triennial review of data quality evaluated data for the period 1996-1998. That assessment, which resulted in a detailed data analysis report in 2000, produced an action plan under which states and EPA worked together to improve data quality. The plan resulted in actions that included training state staff, streamlining reporting to SDWIS/FED, making SDWIS/FED error reporting and correction more user-friendly, improving DVs, following up with Regions after DVs, and encouraging states to notify water systems of sampling schedules annually.

Similarly, the second triennial review of data quality analyzed the data from the period 1999-2001, and its findings were presented in the 2003 report. The action plan recommended in the 2003 report included:

• Development of state-specific compliance determination and quality improvement plans necessary to remedy the major problem areas,
• Conducting and improving data quality analyses and reporting the results,
• Implementation of the OGWDW information strategic plan and SDWIS/FED modernization, and
• Development of an automated monitoring requirement and sampling schedule tracking system by the states, and evaluation of the timeliness of violation reporting and potential violation non-reporting.

This Review

Between 2002 and 2004, EPA conducted DV audits in 38 states and reviewed data on drinking water system information, violations, and enforcement actions. See Table ES-1 for the list of DV states. EPA evaluated 2,658 PWSs, of which 43% were Community Water Systems (CWSs). See Table ES-2 for the distribution of systems by system type and the size of population served. The violations addressed by the DVs are shown in Appendix B. The period of review by rule was generally the two most recently scheduled monitoring periods for each water system and applicable rule. For the Total Coliform Rule (TCR) and the Surface Water Treatment Rule (SWTR), the most recent four quarters were evaluated.
Table ES-1: States Subject to Data Verifications from 2002-2004

Region 1: CT, MA, RI, VT
Region 2: NJ, VI
Region 3: MD, PA, VA, WV
Region 4: AL, FL, KY, MS, NC ('02), NC ('04), SC, TN
Region 5: IL, MI, MN, OH
Region 6: AR, NM, OK, TX
Region 7: IA, MO
Region 8: CO, SD, UT, WY
Region 9: AZ, CA, R9 Tribes
Region 10: AK, ID, WA

Table ES-2: Number of Systems Included in Data Verifications by System Type and Size

System Size                    CWS    NTNCWS   TNCWS    Total
Very Small (500 or fewer)       572      637     696    1,905
Small (501-3,300)               277      123      36      436
Medium (3,301-10,000)           119        9       6      134
Large (10,001-100,000)          135        4       0      139
Very Large (>100,000)            44        0       0       44
Total                         1,147      773     738    2,658

Summary of Results

For the MCL/SWTR TT violations, 81% of the data were reported to SDWIS/FED; Figure ES-3 summarizes the data quality estimates by violation type. Of the non-reported violations, 74% were due to compliance determination (CD) errors, in which the states did not issue a violation when a violation had occurred. Twenty-six percent of the non-reported violations were due to data flow (DF) errors. Figure ES-4 summarizes the percentage of non-reported violations attributable to each error type, by violation type. Approximately 94% of the data in SDWIS/FED were accurate. The overall data quality (DQ) of the MCL/SWTR TT violations was 77%. This means that 77% of the noncompliance determinations on MCL/SWTR TT were correctly reported in SDWIS/FED.

Figure ES-3: Data Quality Estimates by Violation Type
[Bar chart showing, for MCL/SWTR TT violations, health-based violations, and M/R violations respectively: completeness of 81.33%, 61.69%, and 29.02%; accuracy of 94.12%, 94.30%, and 88.35%; and overall DQ of 77.21%, 59.18%, and 27.08%.]

Figure ES-4: Percentages of Error Contribution to Non-Reporting of Violations
[Bar chart showing the split of non-reported violations between error types: MCL/SWTR TT violations, 73.84% CD error and 26.16% DF error; health-based violations, 84.45% CD and 15.55% DF; M/R violations, 92.03% CD and 7.97% DF. CD: compliance determination; DF: data flow.]

For the health-based violations, including LCR TT violations, 62% of the data were reported to SDWIS/FED. Of the non-reported violations, 84% were due to CD errors. Approximately 94% of the health-based violation data in SDWIS/FED were accurate. The overall data quality of the health-based violations was 59%, i.e., approximately 59% of the noncompliance determinations on health-based standards were correctly reported in SDWIS/FED.

Only 29% of the monitoring and reporting violations were reported to SDWIS/FED. Ninety-two percent of the non-reported violations were due to CD errors. Approximately 89% of the monitoring and reporting violation data in SDWIS/FED were accurate. The overall data quality of the M/R violations was 27%, i.e., 27% of the noncompliance determinations on M/R were correctly reported to SDWIS/FED.

Data Reliability Improvement Action Plan

Appendix A is a joint plan of EPA and the Association of State Drinking Water Administrators to achieve a goal of 90 percent complete and accurate data for health-based violations, as well as to improve the quality of monitoring and reporting violations and inventory data. Progress toward accomplishment of this goal will be measured annually and assessed in 2009.
Acknowledgements

The following people contributed to this analysis and the preparation of this report:

Project Lead (Drinking Water Protection Division, Office of Ground Water and Drinking Water):
Chuck Job, Branch Chief, Infrastructure Branch
Leslie Cronkhite, Associate Branch Chief, Infrastructure Branch

Principal Author: Jade Freeman, Ph.D., Statistician, Infrastructure Branch

Contributing Author: Lee Kyle, IT Specialist, Infrastructure Branch

Peer review of the statistical methodology in this report was provided by:
Anthony Fristachi, Exposure Analyst, National Center for Environmental Assessment, U.S. EPA Office of Research and Development
Tony R. Olsen, Ph.D., Environmental Statistician, Western Ecology Division, Health and Environmental Effects Research Laboratory, U.S. EPA Office of Research and Development
Arthur H. Grube, Ph.D., Senior Economist, U.S. EPA Office of Pesticide Programs
A. Richard Bolstein, Ph.D., Chairman (retired), Department of Applied & Engineering Statistics, George Mason University
John Gaughan, Ph.D., Associate Professor, Epidemiology & Biostatistics, Temple University School of Medicine
Matthias Schonlau, Ph.D., Head Statistician, Statistical Consulting Service, The RAND Corporation

Collaboration on the Data Reliability Improvement Plan: Association of State Drinking Water Administrators

Table of Contents

1. Introduction
  1.1 Previous Activities
  1.2 Regulatory Context
  1.3 Changes in 2006 Analytical Method
2. Overview of Data Verification
3. Statistical Sample Design of Data Verifications and Analytical Methods
  3.1 Selection of States
  3.2 Selection of Systems within States
    3.2.1 Sample Frame
    3.2.2 Sample Design of Data Verification
    3.2.3 Sampling Procedure and Data Collection Activities
  3.3 Analytical Method: Weighting and Estimation
4. Results from the Analysis of Data Verifications
  4.1 Analysis of Inventory Data
  4.2 Analysis of Violation Data
    4.2.1 Results from 2002-2004 Data Verifications
    4.2.2 Results from 1999-2001 Data Verifications
    4.2.3 Data Quality Estimates from 1999-2001 and 2002-2004
  4.3 Analysis of Enforcement Data
5. Analysis of Timeliness of Violation Reporting in SDWIS/FED
6. Conclusion
7. Data Reliability Improvement Action Plan
8. Future Analysis of Data Reliability
Appendix A: 2006 Data Reliability Improvement Action Plan
Appendix B: Violations Addressed by Data Verification (DV)
Appendix C: Definition of Public Notification (PN) Tier

2006 Drinking Water Data Quality Assessment and Action Plan

1. Introduction

The Safe Drinking Water Information System/Federal Version (SDWIS/FED) is the Environmental Protection Agency's (EPA) principal database for the national drinking water program. Its two major uses are (1) to help manage state and EPA programs and (2) to inform the public about the compliance status of public water systems (PWSs) and, indirectly, the safety of drinking water. The Federal government uses SDWIS/FED data for program management for 90 contaminants (as of 2005) regulated in drinking water at approximately 158,000 PWSs in 56 state and territorial programs and on Indian lands. Data received by EPA from states in SDWIS/FED include a limited set of water system descriptive information (e.g., system type, population served, number of service connections, and water source type), data on PWSs' violations of regulatory standards and process requirements, and information on state enforcement actions.
These data, which EPA uses to assess compliance with the Safe Drinking Water Act (SDWA) and its implementing regulations, represent the only data states are currently required to report to EPA relative to drinking water safety. SDWIS/FED data can be accessed from the EPA web site at www.epa.gov/safewater.

The utility of SDWIS/FED data for program management and public communication is highly dependent on the quality of the data housed by the system. To assess this quality, EPA routinely conducts data verification (DV) audits in states and develops a summary evaluation every three years, called the Drinking Water Data Quality Assessment. DV auditors evaluate compliance data in state databases and hard copy files, monitoring plans, and other compliance information submitted by PWSs. The auditors also examine sanitary surveys, correspondence between the state and the water system, compliance determination decisions, and enforcement actions taken by the state. Based on this information, the auditors confirm whether all required information was submitted to and evaluated correctly by the state and whether the required reporting elements were submitted to SDWIS/FED.

This report includes (1) a description of the methodology used; (2) analyses of the data from the 2002 to 2004 Data Verifications, the most recent triennial evaluation period; and (3) an analysis of the timeliness of reporting in SDWIS/FED. The report also describes a plan to address continued improvement in drinking water compliance data reported by states. This report is not intended to evaluate states' performance; rather, it is a tool to identify the gap between the states' violation data and SDWIS/FED and to provide a benchmark for the collaborative efforts between the states and EPA to bridge that gap and improve the data quality in SDWIS/FED.

1.1 Previous Activities

In 1998, EPA launched a major effort to assess the quality of the drinking water data contained within SDWIS/FED in response to concerns regarding incorrect violations in the database. EPA enlisted the help of its stakeholders in designing the review, analyzing the results for data collected between 1996 and 1998, and recommending actions to improve drinking water data quality. The first Data Reliability Analysis of SDWIS/FED was published in October 2000.

Findings of the first Data Reliability Analysis, which indicated that data quality needed improvement, were later updated by the second triennial assessment in 2003 (which included data collected between 1999 and 2001). Together, these assessments included comprehensive recommendations for EPA and state primacy agencies on quality improvements. The reports identified near-term actions that had already been taken or were still needed to improve data quality more immediately. To implement the recommendations, the states and EPA have conducted numerous activities and projects to improve data quality. Activities undertaken have included a) providing training for states; b) streamlining reporting to SDWIS/FED; c) making SDWIS/FED error reporting and correction more user-friendly; d) improving data verifications; e) following up with Regions on findings after data verifications; and f) encouraging states to annually notify water systems of sampling schedules.
The Office of Ground Water and Drinking Water's (OGWDW) response to the data reliability issues identified in the 2003 report included a commitment to conduct analyses that would provide periodic data quality estimates (DQEs) and provide input into the program activities and priorities necessary to improve the quality and reliability of the data. Part of that commitment was to publish the results of these analyses every three years.

1.2 Regulatory Context

States make a large number of determinations regarding public water systems' compliance with drinking water regulations, and violations of these regulations are a small fraction of these determinations, a result that indicates the general safety of the nation's drinking water supply. For example, an analysis of nitrate maximum contaminant level compliance data for Oklahoma from 2004 showed that only 3% of determinations resulted in violations.

The data considered for evaluating quality, particularly accuracy and completeness, consist of the violations of health-based standards and monitoring and reporting requirements. These data are important for two reasons: (1) state and EPA program management relies on them to identify priorities, and (2) states and EPA use them to inform the public about the safety of its drinking water. For federal program reporting purposes under the Government Performance and Results Act (GPRA), violation data have become a major focus. EPA's 2006-2011 strategic plan specifies a clean and safe water goal of "90% of the population served by community water systems (CWS) meeting all health-based standards and treatments by 2011." A CWS that meets all health-based standards and treatments does not have a violation of the federal regulations for maximum contaminant levels (MCLs) or treatment techniques. Because of the importance of and emphasis on violation data, this data quality evaluation methodology addresses whether states correctly identify and report the violations that should have been reported to EPA according to state primacy agreements pursuant to Federal regulations.

1.3 Changes in 2006 Analytical Method

In this analysis of 2002 to 2004 DV data, EPA used a different method for evaluating data quality, as described below.

• In the previous report, the DQEs were calculated without considering the sample design of the DVs, i.e., the selection process by which systems are included in the sample. In this assessment, the DQEs are calculated using unbiased, sample design-based statistical estimation. The sample design and the estimation method for calculating sample statistics are described in detail in Section 3.

• The completeness measure of violation data quality in the 2003 report represented the proportion of accurate data in SDWIS/FED out of all violation data that should have been reported to SDWIS/FED. In this 2006 analysis, EPA redefined completeness of SDWIS/FED based on any violation data reported to SDWIS/FED, regardless of accuracy.

• Because of the changes in the estimation method described above and the non-random selection of states for DV audits, the results from this analysis will not be compared to those from the 2000 or 2003 assessments.

The statistical methodology for the analysis of DV data and the results are described in Sections 3 and 4. The additional analysis of the timeliness of reporting in SDWIS/FED is presented in Section 5.
2. Overview of Data Verification

EPA's OGWDW routinely conducts DV audits, which evaluate the management of state drinking water programs. During the DVs, EPA examines state compliance decisions, data on system compliance and violations in the state files, and the data required to be reported to SDWIS/FED. EPA reviews data submitted by PWSs, state files and databases, and SDWIS/FED, and compiles the results on the discrepancies among the data. States have several opportunities to respond to findings while DV personnel are on site and to provide additional clarifying information if available. States also review the DV draft report before the final report is produced, and their comments are incorporated into the report. EPA responds to every state comment to explain in detail whether or not the state's additional information changed the finding.

Until 2004, states were selected for DVs considering a number of factors; for example, states that had not been audited for a long period of time were selected. Also, in order to minimize the burden on EPA Regions and states, OGWDW tried to maintain an even distribution of DV states across the Regions.² Further, resource constraints have affected the selection of certain states, since it is more costly to conduct DVs in some states than in others.

Between 2002 and 2004, EPA conducted DV audits in 38 states and reviewed data on drinking water system information, violations, and enforcement actions (Table 2-1). State files for a total of 2,658 PWSs were evaluated, of which 43% were community water systems (Table 2-2). The regulations addressed by the DVs and the compliance period reviewed for each regulation are shown in Table 2-3.

² EPA is divided into 10 regional offices, each of which is responsible for several states and territories.

Table 2-1: States Subject to Data Verifications from 2002-2004

Region 1: CT, MA, RI, VT
Region 2: NJ, VI
Region 3: MD, PA, VA, WV
Region 4: AL, FL, KY, MS, NC ('02), NC ('04), SC, TN
Region 5: IL, MI, MN, OH
Region 6: AR, NM, OK, TX
Region 7: IA, MO
Region 8: CO, SD, UT, WY
Region 9: AZ, CA, R9 Tribes
Region 10: AK, ID, WA

Table 2-2: Number of Systems Included in Data Verifications by Type and Size³

System Size                    CWS    NTNCWS   TNCWS    Total
Very Small (500 or fewer)       572      637     696    1,905
Small (501-3,300)               277      123      36      436
Medium (3,301-10,000)           119        9       6      134
Large (10,001-100,000)          135        4       0      139
Very Large (>100,000)            44        0       0       44
Total                         1,147      773     738    2,658

³ Community water systems (CWSs) have at least 15 service connections or serve 25 or more of the same population year-round. Nontransient noncommunity water systems (NTNCWSs) regularly serve at least 25 of the same persons over 6 months per year. Transient noncommunity water systems (TNCWSs) provide water where people remain for periods of less than 6 months.

Table 2-3: Period of Compliance for Rules Reviewed During 2002-2004 Data Verifications⁴

Rule: Compliance Period Reviewed
Inventory: Most recent
Consumer Confidence Report (CCR): Most recent 12-month period available in SDWIS/FED
Total Coliform Rule (TCR), Surface Water Treatment Rule (SWTR), Total Trihalomethanes (TTHMs): Most recent 12-month period available in SDWIS/FED
Nitrates: Most recent two calendar years
Phase II/V excluding nitrates: 1999-2001
Lead and Copper Rule (LCR), Interim Radionuclides Regulation: Most recent two samples
Enforcement: Time period related to violation
Public Notification: Time period related to violation

The review evaluated recent monitoring history to confirm that systems monitored according to the required frequency.
For many rules, the review evaluated one year of information (Surface Water Treatment Rule, Total Trihalomethanes, Total Coliform Rule, and Consumer Confidence Report). The two most recent monitoring periods or review cycles were reviewed for some rules (interim radionuclides, Lead and Copper Rule, sanitary surveys). In other instances, the review covered a defined period, such as the most recent 3-year monitoring period of the Standardized Monitoring Framework outlined in the Phase II/V Rule.⁵

⁴ CWSs were reviewed for inventory and each of the rules listed in this table. NTNCWSs are not subject to CCR, TTHM monitoring, or the interim radionuclide regulation. TNCWSs are not subject to the requirements for CCR, SWTR, TTHM, the Phase II/V Rule, or the interim radionuclide regulation.

⁵ The Standardized Monitoring Framework synchronizes the monitoring schedules for the Phase II/V regulation for chemicals and the interim radionuclides rule across defined 3-year monitoring periods and 9-year monitoring cycles.

3. Statistical Sample Design of Data Verifications and Analytical Methods

3.1 Selection of States

As mentioned in Section 2, states are selected for DVs by considering the date of their last verification, resource constraints, and the burden on EPA Regions and states. This selection procedure is a non-probability sampling method. Because of the subjective nature of the selection process, non-probability samples add uncertainty when the sample is used to represent the population as a whole. The accuracy and precision of statements about the population can only be determined by subjective judgment. The selection procedure does not provide rules or methods for inferring sample results to the population, and such inferences are not valid because of bias in the selection process. When non-probability sampling is used, the results pertain only to the sample itself and should not be used to make quantitative statements about any population, including the population from which the sample was selected.

Since the DV states were selected by a non-probability sampling method, the results from the analysis pertain only to the DV states audited between 2002 and 2004. Therefore, it is not appropriate to make quantitative statements or inferences about the entire nation from the selected states, or comparisons with sampled state data quality results from previous years.

3.2 Selection of Systems within States

The DVs involve the evaluation of the states' compliance decisions and the agreement between the data in the state files and SDWIS/FED. Since neither time nor resources allow a complete census of consistency between SDWIS/FED and state records, EPA uses a statistically random sample of systems drawn from the total number of systems in the state. EPA uses the results from the probability sample of systems within each state to estimate DV compliance results for each state. The probability sample is designed to provide estimates with acceptable precision while minimizing the burden on Regions and states imposed by visits from auditors. EPA plans to further reduce the burden on Regions and states through the use of electronic data comparison.

3.2.1 Sample Frame

A sample frame is a list of all members of a population (in this case, the PWSs) from which a random sample of members will be drawn. In other words, the sample frame identifies the population elements from which the sample is chosen. The population elements listed on the frame are called the sampling units.
Often these are groups or clusters of units rather than individual units. For each state, EPA developed a sample frame (i.e., a list of the current inventory of PWSs in the state) using SDWIS/FED, from which a random sample of PWSs was selected according to the sample design.

3.2.2 Sample Design of Data Verification

The unit of analysis is the recorded action taken by systems, not the systems themselves. The sample design for DVs is a stratified random cluster sample. In stratified sampling, the population is divided into non-overlapping subpopulations called strata, and a random sample is taken from each stratum. Stratification increases the precision of the estimates when the population is divided into subpopulations with similar characteristics within each stratum. In cluster sampling, groups, or "clusters," of units in the population are formed, and a random sample of the clusters is selected. In other words, within a particular stratum, rather than selecting individual units, clusters of units are selected.

In the analysis of DV data, systems are grouped into three strata according to system type (CWS, TNCWS, and NTNCWS) within each state. In the first stage of the sampling process, systems are randomly selected within each stratum. In the second stage, each action taken by the system is recorded. In other words, the system represents a cluster of actions. A few examples of these actions are:

• System inventory information that must be reported to SDWIS/FED,
• Violations of federal regulations (states also may report violations of state regulations), and
• Enforcement actions taken when violations occur.

3.2.3 Sampling Procedure and Data Collection Activities

Once the current state inventory is retrieved from SDWIS/FED, the number of systems is counted by size category (see Table 2-2 for size categories). The sample size for each system type within a state is calculated based on the acceptable precision level for the estimates: in most states, a margin of error of plus or minus five percent with a confidence level of 90 or 95 percent.⁶ As discussed in Section 3.2.2, the sample design is a stratified random cluster sample. The required sample size is given by

    n'_h = n_h × deff,

where n_h is the size of the sample (number of systems) required for stratum h (a specific state and system type) if a simple random sample is drawn, and deff is the design effect of the clustering, assumed to be greater than 1.0. n_h is given by

    n_h = [N_h × Z_α² × P_h(1 − P_h)] / [N_h × M_h × B_h² + Z_α² × P_h(1 − P_h)],

where
n_h = number of systems required for the sample in stratum h,
N_h = total number of systems in the state in stratum h,
M_h = average number of actions in each system in stratum h,
B_h = acceptable precision level (margin of error) for stratum h,
Z_α = the abscissa of the normal curve that corresponds to the confidence level, and
P_h = proportion of discrepancy in violation data between DV results and SDWIS/FED in stratum h (estimated from the previous assessment).

The design effect deff depends on the proportion of actions and decisions reviewed in the DV that are consistent with the data in SDWIS/FED. This proportion is unknown before the DV; therefore, the design effect is unknown. Lacking estimates of the design effect, the DV draws a simple random sample within each stratum.⁷ Because it excludes the design effect, this sample may not be large enough to meet the precision targets.

The sample size is calculated in an Excel spreadsheet. Samples are drawn from the frame according to random numbers generated in an Excel spreadsheet produced by EPA. Using the Excel random number generator, a random sample of systems is developed for each stratum. Then, the DV auditors collect data from the state files for each sampled system on PWS inventory, violations, and enforcement.

⁶ For the three DVs conducted during the last quarter of 2004 (TX, VA, and IL), the confidence level for CWSs was 95 percent and the margin of error was plus or minus seven percent. For NTNCWSs and TNCWSs, the confidence level was 90 percent and the margin of error was plus or minus seven percent.

⁷ Future DVs can estimate deff using data from previous DVs and can incorporate the design effect into the sample size calculation.
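To make the procedure concrete, the short Python sketch below implements the sample-size calculation as reconstructed above, with the design-effect adjustment applied on top. All input values (N_h, M_h, P_h, the confidence level, and the margin of error) are hypothetical, chosen only for illustration; they are not taken from any actual DV.

    import math

    def stratum_sample_size(N_h, M_h, P_h, B_h=0.05, z_alpha=1.96, deff=1.0):
        """Systems to sample in one stratum (one state and system type).

        N_h: total systems in the stratum; M_h: average reviewed actions per
        system; P_h: anticipated discrepancy proportion; B_h: margin of error;
        z_alpha: normal quantile for the confidence level; deff: design effect
        of the clustering (greater than 1.0 when known).
        """
        # Simple-random-sample size n_h with a finite population correction.
        top = N_h * z_alpha**2 * P_h * (1.0 - P_h)
        bottom = N_h * M_h * B_h**2 + z_alpha**2 * P_h * (1.0 - P_h)
        n_h = top / bottom
        # n'_h = n_h x deff: inflate for clustering of actions within systems.
        return math.ceil(n_h * deff)

    # Hypothetical stratum: 400 CWSs, about 6 reviewed actions per system,
    # 20% anticipated discrepancy, +/-5% margin of error at 95% confidence.
    print(stratum_sample_size(N_h=400, M_h=6, P_h=0.2))  # -> 38 systems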
3.3 Analytical Method: Weighting and Estimation

In this analysis, sample weights are applied to the data to adjust for the unequal probability of selection of systems, i.e., the differences in the likelihood of some systems appearing in the sample. Weights, based on the probability of selection, allow unbiased representation of the population from an unequal probability sample.

In the 2002-2004 DV data analysis, EPA estimated proportions related to consistency and accuracy among state files, the state database, and SDWIS/FED for inventory information, violation data, and enforcement actions. A few examples of such proportions are the proportion of inventory data that are consistent between SDWIS/FED and the state file, the proportion of violation data that are reported to SDWIS/FED, and the proportion of enforcement data that are consistent between SDWIS/FED and the state file. In this report, these proportions are presented as percentages after being multiplied by 100. The proportion P is estimated by

    P̂ = [Σ_{h=1..H} Σ_{a=1..n_h} Σ_{β=1..m_ha} W_h × I_haβ] / [Σ_{h=1..H} Σ_{a=1..n_h} m_ha × W_h],

where
W_h = N_h / n_h is the sample weight,
N_h = total number of clusters (systems) in stratum (system type) h, h = 1, ..., H,
n_h = number of sampled clusters in stratum h,
m_ha = number of data elements (reviewed actions) from cluster a in stratum h, and
I_haβ = 0 or 1, an indicator for the βth data element from system a in stratum h corresponding to a specific characteristic.

A simple illustration of the calculation procedure is presented here. Suppose there are three strata (H = 3), namely CWS, NTNCWS, and TNCWS, in State A. Also, suppose that the total numbers of systems in the three strata (system types in State A) are 6, 9, and 15 (N₁ = 6, N₂ = 9, N₃ = 15), respectively. Further, the number of sampled systems is 3 for each stratum (n₁ = n₂ = n₃ = 3), and three violations are reviewed for accuracy from each sampled system (m_ha = 3 for h = 1, 2, 3 and a = 1, 2, 3). Let I_haβ be 1 if the violation was accurately reported to SDWIS/FED or 0 if the violation was incorrectly reported. Suppose the compiled data are as shown in Table 3-1. The proportion of violations accurately reported is then estimated by the ratio of the sum of W_h I_haβ to the sum of m_ha W_h, which, in this case, is 55/90 = 0.6111, or 61.11%.
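A minimal Python sketch of this weighted calculation, using the hypothetical State A data of Table 3-1, reproduces the 55/90 ratio:

    # Hypothetical State A data from Table 3-1. For each stratum: the total
    # number of systems N_h and, for each sampled system, the indicators
    # I_ha_beta (1 = violation correctly reported to SDWIS/FED, 0 = not).
    strata = {
        "CWS":    {"N_h": 6,  "systems": [[1, 0, 0], [1, 0, 1], [1, 1, 0]]},
        "NTNCWS": {"N_h": 9,  "systems": [[1, 0, 1], [1, 1, 0], [0, 0, 1]]},
        "TNCWS":  {"N_h": 15, "systems": [[0, 1, 1], [1, 0, 1], [1, 0, 1]]},
    }

    numerator = 0.0    # sum of W_h * I_ha_beta over all reviewed violations
    denominator = 0.0  # sum of m_ha * W_h over all sampled systems

    for stratum in strata.values():
        n_h = len(stratum["systems"])     # sampled systems in stratum h
        W_h = stratum["N_h"] / n_h        # sample weight W_h = N_h / n_h
        for indicators in stratum["systems"]:
            numerator += W_h * sum(indicators)    # W_h times sum of I_ha_beta
            denominator += W_h * len(indicators)  # W_h times m_ha

    P_hat = numerator / denominator
    print(f"{numerator:.0f}/{denominator:.0f} = {P_hat:.4f}")  # 55/90 = 0.6111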
Sampling errors are also estimated for the proportion estimates. Sampling errors are measures of the extent to which the values estimated from the sample (proportions, in this analysis) differ from the values that would be obtained from the entire population. Since there are inherent differences among the members of any population, and data are not collected for the whole population, the exact values of these differences for a particular sample are unknown. To estimate the sampling errors, the Taylor series expansion method is applied. The Taylor series expansion method is widely used to obtain robust variance estimators for complex survey data with stratified cluster sampling and unequal probabilities of selection. The Taylor series obtains an approximation to a non-linear function. Applied to the proportion estimate, it gives the variance estimate

    Var(P̂) = Σ_{h=1..H} Var_h(P̂), with Var_h(P̂) = [n_h(1 − n_h/N_h) / (n_h − 1)] × Σ_{a=1..n_h} (e_ha − ē_h)²,

where

    e_ha = [W_h Σ_{β=1..m_ha} (I_haβ − P̂)] / [Σ_{h=1..H} Σ_{a=1..n_h} m_ha W_h]  and  ē_h = (1/n_h) Σ_{a=1..n_h} e_ha.

With the sampling error, the margin of error based on a 95 percent confidence interval is calculated as t_{df,0.025} × √Var(P̂), where t_{df,0.025} is the percentile of the t distribution with df degrees of freedom, df being the number of clusters minus the number of strata.

Table 3-1: Example of Proportion Estimation Procedure

Stratum h = 1, State A CWS: N_h = 6, n_h = 3, W_h = 2
  System a = 1: m_ha = 3, m_ha × W_h = 6, I_ha1..3 = 1, 0, 0; W_h × I_ha1..3 = 2, 0, 0
  System a = 2: m_ha = 3, m_ha × W_h = 6, I_ha1..3 = 1, 0, 1; W_h × I_ha1..3 = 2, 0, 2
  System a = 3: m_ha = 3, m_ha × W_h = 6, I_ha1..3 = 1, 1, 0; W_h × I_ha1..3 = 2, 2, 0
Stratum h = 2, State A NTNCWS: N_h = 9, n_h = 3, W_h = 3
  System a = 1: m_ha = 3, m_ha × W_h = 9, I_ha1..3 = 1, 0, 1; W_h × I_ha1..3 = 3, 0, 3
  System a = 2: m_ha = 3, m_ha × W_h = 9, I_ha1..3 = 1, 1, 0; W_h × I_ha1..3 = 3, 3, 0
  System a = 3: m_ha = 3, m_ha × W_h = 9, I_ha1..3 = 0, 0, 1; W_h × I_ha1..3 = 0, 0, 3
Stratum h = 3, State A TNCWS: N_h = 15, n_h = 3, W_h = 5
  System a = 1: m_ha = 3, m_ha × W_h = 15, I_ha1..3 = 0, 1, 1; W_h × I_ha1..3 = 0, 5, 5
  System a = 2: m_ha = 3, m_ha × W_h = 15, I_ha1..3 = 1, 0, 1; W_h × I_ha1..3 = 5, 0, 5
  System a = 3: m_ha = 3, m_ha × W_h = 15, I_ha1..3 = 1, 0, 1; W_h × I_ha1..3 = 5, 0, 5
Totals: Σ m_ha × W_h = 90; Σ W_h × I_haβ = 55
(I_haβ = 1 if the violation was correctly reported to SDWIS/FED; 0 if not.)
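The variance calculation can be sketched in the same way. The code below follows the linearized form given above; because that formula is reconstructed from a garbled original, treat this as an illustration of the standard Taylor-series estimator for a stratified cluster sample rather than as the report's exact implementation. The t quantile for df = 9 clusters minus 3 strata = 6 is hard-coded to keep the sketch dependency-free.

    import math
    from statistics import mean

    # Hypothetical Table 3-1 data: stratum -> (N_h, per-system indicators).
    strata = {
        "CWS":    (6,  [[1, 0, 0], [1, 0, 1], [1, 1, 0]]),
        "NTNCWS": (9,  [[1, 0, 1], [1, 1, 0], [0, 0, 1]]),
        "TNCWS":  (15, [[0, 1, 1], [1, 0, 1], [1, 0, 1]]),
    }

    # Point estimate P_hat = (sum of W_h * I) / (sum of W_h * m_ha) = 55/90.
    Y_hat = M_hat = 0.0
    for N_h, systems in strata.values():
        W_h = N_h / len(systems)
        for ind in systems:
            Y_hat += W_h * sum(ind)
            M_hat += W_h * len(ind)
    P_hat = Y_hat / M_hat

    # Taylor-linearized variance: one residual e_ha per sampled cluster.
    var = 0.0
    n_clusters = 0
    for N_h, systems in strata.values():
        n_h = len(systems)
        W_h = N_h / n_h
        e = [W_h * sum(i - P_hat for i in ind) / M_hat for ind in systems]
        e_bar = mean(e)
        fpc = 1.0 - n_h / N_h                  # finite population correction
        var += (n_h * fpc / (n_h - 1)) * sum((e_ha - e_bar) ** 2 for e_ha in e)
        n_clusters += n_h

    df = n_clusters - len(strata)              # 9 clusters - 3 strata = 6
    t_quantile = 2.447                         # t_{6, 0.025}, hard-coded
    margin = t_quantile * math.sqrt(var)
    print(f"P_hat = {P_hat:.4f}, 95% margin of error = {margin:.4f}")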
In Section 4, various types of proportions of consistent, reported, and accurate data in SDWIS/FED are calculated. These proportion estimates represent the data quality measures of inventory, violation, and enforcement data in SDWIS/FED based on the DVs.

4. Results from the Analysis of Data Verifications

This section presents various proportion estimates for inventory, violation, and enforcement data. Margins of error are also calculated for each point estimate. The margin of error is based on a 95% confidence interval, which includes the true proportion with 95% confidence. All calculations were performed using SAS®.

4.1 Analysis of Inventory Data

States are required to report eight inventory data elements to SDWIS/FED for grant eligibility. These elements are 1) public water system identification number (PWS ID); 2) system status; 3) water system type; 4) primary source water type; 5) population served; 6) number of service connections; 7) administrative contact address; and 8) water system name. The records for population or service connections are considered consistent when there is less than a 10% difference between the two records.

Because the inventory data are analyzed at the system level, the estimation approach can be based on stratified random sampling. The proportion of systems for which the inventory elements were reported to SDWIS/FED without discrepancies, and its sampling error, are then estimated as in Section 3.3 at the cluster (system) level only (i.e., with β = 1).

Inventory data quality for each data element is displayed in Table 4-1. The overall data quality of the eight inventory (water system identification) parameters assessed was 87%. In other words, 87% of systems from DV states between 2002 and 2004 had consistent data for all eight inventory data elements between their state files and the SDWIS/FED database; conversely, 13% of systems had at least one data element reported with a discrepancy. The highest discrepancy rate was for the administrative contact address element.

Table 4-1: Percent of PWSs That Reported Grant Eligibility Inventory Data to SDWIS/FED without Discrepancy

Reported data without discrepancy, by individual data element:
  PWS ID: 99.83% (+/-0.20%)
  System status (active or inactive): 97.26% (+/-1.15%)
  Water system type: 98.21% (+/-0.72%)
  Primary source type: 99.31% (+/-0.35%)
  Population served: 97.11% (+/-0.71%)
  Number of service connections: 96.22% (+/-0.93%)
  Administrative contact address: 95.97% (+/-1.32%)
  PWS name: 99.87% (+/-0.09%)
Reported all inventory data elements without discrepancy: 87.4% (+/-1.94%)

4.2 Analysis of Violation Data

Federal regulations specify the outcomes that states must report to EPA as noncompliance (violations) with (a) health-based drinking water quality maximum contaminant levels (MCLs) and related requirements for their attainment; (b) specified monitoring and reporting (M/R) requirements necessary to determine whether sampling, testing, and treatment process checking occurred as stipulated in Federal regulations; and (c) health-based treatment techniques (TT) and associated water system management processes for contaminants for which it is not technologically or economically feasible to set an MCL.

Violation data are evaluated by comparing the following: 1) EPA's evaluation of the state's compliance decision on the violations; 2) the assigned violations in the state files; and 3) the violations reported to SDWIS/FED. All findings from these comparisons can be grouped into one of the categories shown in Table 4-2. The total number of violations identified during the 2002-2004 DVs is summarized below:

• Out of 198 TCR MCL violations, 163 were reported to SDWIS/FED.
• Out of 48 other (non-TCR) MCL violations, 21 were reported to SDWIS/FED.
• Out of 41 SWTR TT violations, 35 were reported to SDWIS/FED.
• Out of 176 LCR TT violations, 5 were reported to SDWIS/FED.
• Out of 5,069 M/R violations, 1,589 were reported to SDWIS/FED.

The following measures of data quality of violation data in SDWIS/FED are evaluated:

• Completeness of SDWIS/FED describes how many of the violations that are required to be reported are reported to SDWIS/FED, expressed as a percentage. This quantity is estimated based on the violations found by EPA and reported to SDWIS/FED (EPA=Yes and SDWIS/FED=Yes; 1 and 4 from Table 4-2) out of all violations found by EPA (EPA=Yes; 1, 2, 3, and 4 from Table 4-2).

• Non-reporting rate in SDWIS/FED describes how many of the violations that are required to be reported are not reported to SDWIS/FED, expressed as a percentage. This percentage is the complement of the Completeness estimate, i.e., 100% minus Completeness.

• Compliance Determination (CD) error rate in the non-reported violations describes how many of the non-reported violations are the result of errors in states' compliance determinations (i.e., a violation was not reported because the state did not identify it as a violation), expressed as a percentage. This quantity is estimated based on the violations found by EPA but not reported to SDWIS/FED, where the assigned violation in the state file does not agree with EPA (EPA=Yes, SDWIS/FED=No, and EPA≠State File; 3 from Table 4-2), out of all violations found by EPA and not reported to SDWIS/FED (EPA=Yes and SDWIS/FED=No; 2 and 3 from Table 4-2).
• Data Flow (DF) error rate in the non-reported violations describes how many of the non-reported violations are the result of reporting problems from the state to SDWIS/FED, expressed as a percentage. This quantity is estimated based on the violations found by EPA but not reported to SDWIS/FED, where the assigned violation in the state file was confirmed by EPA (EPA=Yes, State File=Yes, and SDWIS/FED=No; 2a and 2b from Table 4-2), out of all violations found by EPA and not reported to SDWIS/FED (EPA=Yes and SDWIS/FED=No; 2 and 3 from Table 4-2).

• Accuracy of the data in SDWIS/FED describes how much of the violation data in SDWIS/FED are correct, expressed as a percentage. This quantity is estimated based on the violations found by EPA that agree with those reported to SDWIS/FED (EPA=SDWIS/FED; 1a, 1d, and 4a from Table 4-2) out of all violations reported to SDWIS/FED (SDWIS/FED=Yes; 1, 4, 5, and 6 from Table 4-2).

• Compliance Determination (CD) error rate in SDWIS/FED describes how much of the violation data in SDWIS/FED are incorrect violation types as a result of errors in the state's compliance determination, expressed as a percentage. This quantity is estimated based on the violations found by EPA that disagree with those reported to SDWIS/FED but are missing from the state file (State File=No and EPA≠SDWIS/FED from Table 4-2), or the violations found by EPA that disagree with those found by the state, which were then reported to SDWIS/FED as found by the state (EPA≠State File=SDWIS/FED; 1c and 4b⁸ from Table 4-2), out of all violations reported to SDWIS/FED (SDWIS/FED=Yes; 1, 4, 5, and 6 from Table 4-2).

• Data Flow (DF) error rate in SDWIS/FED describes how many of the reported violations are incorrect violation types due to reporting problems from the state to SDWIS/FED, expressed as a percentage. This quantity is estimated based on the violations found in the state files and confirmed by EPA but disagreeing with those reported to SDWIS/FED (EPA=State File≠SDWIS/FED from Table 4-2), or the violations found by EPA that disagree with those found by the state, which were then reported to SDWIS/FED (1b, 1e, and 4b⁹ from Table 4-2), out of all violations reported to SDWIS/FED (SDWIS/FED=Yes; 1, 4, 5, and 6 from Table 4-2).

• False Positive rate of the violation data in SDWIS/FED describes how much of the reported violation data in SDWIS/FED are, in fact, false violations, expressed as a percentage. This quantity is estimated based on the violations not confirmed by EPA but reported to SDWIS/FED (EPA=No and SDWIS/FED=Yes; 5 and 6 from Table 4-2) out of all violations reported to SDWIS/FED (SDWIS/FED=Yes; 1, 4, 5, and 6 from Table 4-2).

• Overall Data Quality Estimate in SDWIS/FED measures how many noncompliance determinations are correctly reported in SDWIS/FED among all noncompliance determinations (that are either violations or false-positive violations). This quantity is estimated based on the violations confirmed by EPA and correctly reported to SDWIS/FED (EPA=SDWIS/FED; 1a, 1d, and 4a from Table 4-2) out of all violations found by EPA, in the state files, or in SDWIS/FED (EPA=Yes or State File=Yes or SDWIS/FED=Yes; 1-6 from Table 4-2). When the false positive rate is 0%, this measure is the product of Completeness and Accuracy.

⁸ If DV auditors determined it to be a CD error.
⁹ If DV auditors determined it to be a DF error.

Since the DV states were not randomly selected, the states were treated as a fixed stratification variable for this analysis. During the DVs, some sampled systems had no violations and did not require any reporting to SDWIS/FED. Thus, the actual number of systems used for the calculations was less than the number of systems sampled for the DVs. Furthermore, sub-domain analysis by rule or system type resulted in single-cluster strata and/or a single observation in some clusters. A single-cluster stratum does not contribute to the calculation of variance estimates, which may underestimate the sampling errors. Therefore, the strata were combined within each EPA Region, except for the overall data quality estimations, where the strata were combined within each DV state.

Table 4-2: Violation Data Comparison Categorization

Violation codes used in the examples: 3100-21 = acute TCR MCL violation; 3100-22 = monthly TCR MCL violation; 3100-23 = routine major TCR monitoring violation.

1. Found by DV auditors: Yes | Found in state file: Yes | Reported to SDWIS/FED: Yes
  1a. DV Auditors = State File = SDWIS. Example: a TCR violation 3100-21 was found in the state file, confirmed by DV auditors, and correctly reported to SDWIS/FED. (No discrepancy in SDWIS/FED.)
  1b. DV Auditors = State File ≠ SDWIS. Example: a TCR violation 3100-21 was found in the state file and confirmed by DV auditors; the violation was incorrectly reported to SDWIS/FED as 3100-22. (Data flow error.)
  1c. DV Auditors ≠ State File = SDWIS. Example: a TCR violation 3100-22 was found in the state file and reported to SDWIS/FED as 3100-22 when the violation should have been 3100-21. (Compliance determination error.)
  1d. DV Auditors = SDWIS ≠ State File. Example: a TCR violation 3100-21 was reported to SDWIS/FED and confirmed by DV auditors, but the state issued 3100-22 in the file. (No discrepancy in SDWIS/FED.)
  1e. DV Auditors ≠ State File ≠ SDWIS. Example: a TCR violation 3100-22 was found in the state file when it should have been 3100-21 according to DV auditors, while the violation was incorrectly reported to SDWIS/FED as 3100-23. (Compliance determination error by the state and data flow error between the state file and SDWIS/FED.)

2. Found by DV auditors: Yes | Found in state file: Yes | Reported to SDWIS/FED: No
  2a. DV Auditors = State File. Example: a TCR violation 3100-21 was found in the state file and confirmed by DV auditors, but not reported to SDWIS/FED. (Non-reporting; data flow error.)
  2b. DV Auditors ≠ State File. Example: a TCR violation 3100-22 was issued in the state file when it should have been 3100-21, and the violation was not reported to SDWIS/FED. (Non-reporting; compliance determination error by the state; data flow error between the state file and SDWIS/FED.)

3. Found by DV auditors: Yes | Found in state file: No | Reported to SDWIS/FED: No
  Example: there should have been a TCR violation 3100-21, but the state did not issue a violation and did not report it to SDWIS/FED. (Non-reporting; compliance determination error.)

4. Found by DV auditors: Yes | Found in state file: No | Reported to SDWIS/FED: Yes
  4a. DV Auditors = SDWIS/FED. Example: there should have been a TCR violation 3100-21 issued in the state file, but the notice of violation (NOV) was not found in the state file, even though the violation was correctly reported to SDWIS/FED. (No discrepancy in SDWIS/FED.)
  4b. DV Auditors ≠ SDWIS/FED. Example: there should have been a TCR violation 3100-21 issued in the state file, but the NOV was not found in the state file, while the violation was incorrectly reported to SDWIS/FED as 3100-22. (Compliance determination error by the state and/or data flow error between the state file and SDWIS/FED.)

5. Found by DV auditors: No | Found in state file: Yes | Reported to SDWIS/FED: Yes
  5a. State File = SDWIS/FED. Example: a TCR violation 3100-21 was issued in the state file and reported to SDWIS/FED, but it should not have been a violation. (False positive in SDWIS/FED.)
  5b. State File ≠ SDWIS/FED. Example: a TCR violation 3100-21 was found in the state file, but the DV auditors concluded that there should not have been a violation in the first place; in addition, the state reported a different TCR violation type (3100-22) to SDWIS/FED. (False positive in SDWIS/FED.)

6. Found by DV auditors: No | Found in state file: No | Reported to SDWIS/FED: Yes
  Example: a TCR violation 3100 was reported to SDWIS/FED, but DV auditors concluded that there should not have been a violation in the first place, and no evidence of a violation was found in the state files because the state rescinded the violation but has not removed it from SDWIS/FED. (False positive in SDWIS/FED.)
4.2.1 Results from 2002-2004 Data Verifications

The proportion estimates and sampling errors for the violation DQEs by violation type are presented in Table 4-3. Eighty-one percent of the MCL and SWTR TT violations were reported to SDWIS/FED. Seventy-four percent of the non-reported violations were due to compliance determination errors and 26% were due to data flow errors. The reported violations in SDWIS/FED were 94% accurate. Overall, the DQE of the violation data was 77%; that is, 77% of the noncompliance determinations on MCL/SWTR TT standards were correctly reported in SDWIS/FED.

Considering all health-based violations (MCL and TT violations, which include lead and copper TT), 62 percent of the violations were reported to SDWIS/FED, meaning that 38% of the violations were not reported. Eighty-four (84) percent of the non-reported violations were due to compliance determination errors and 16% were due to data flow errors. The reported violations in SDWIS/FED were 94% accurate. Overall, the DQE of the health-based violation data was 59%; that is, 59% of the noncompliance determinations on all health-based standards were correctly reported in SDWIS/FED.

The quality of the health-based violations data was much lower than that of the MCL/SWTR TT data because of the quality of data associated with the Lead and Copper Rule. The data quality of the LCR TT violations was the lowest, at 7.6%. For example, we found that out of 176 LCR TT violations, only 5 were reported to SDWIS/FED. The non-reporting was also mainly because of compliance determination errors. Specifically, 161 of the 171 non-reported violations were not recognized as violations when the violations had occurred.

Twenty-nine percent of the M/R violations were reported to SDWIS/FED and 71% of the violations were not reported. Ninety-two percent of the non-reported violations were due to compliance determination errors and 8% were due to data flow errors. The reported M/R violations in SDWIS/FED were 88% accurate. Overall, the DQE of the M/R violation data was 27%, i.e., 27% of the noncompliance determinations on M/R were correctly reported in SDWIS/FED.

Table 4-3: Data Quality Estimates (DQE) by Violation Type

TCR MCL: completeness 83.29% (+/-9.66%); non-reporting 16.71% (+/-9.66%); CD error on non-reported data 83.40% (+/-15.07%); DF error on non-reported data 16.60% (+/-15.07%); accuracy 96.65% (+/-2.62%); CD error in SDWIS/FED 0%; DF error in SDWIS/FED 0%; false positives 2.26%; overall DQ 80.95%.
Other MCL: completeness 48.94% (+/-27.05%); non-reporting 51.06% (+/-27.05%); CD error on non-reported data 56.89% (+/-43.12%); DF error on non-reported data 43.11% (+/-43.12%); accuracy 79.22% (+/-20.56%); CD error in SDWIS/FED 5.73%; DF error in SDWIS/FED 0%; false positives 15.05% (+/-17.96%); overall DQ 42.00% (+/-26.67%).
Total MCL: completeness 78.42% (+/-9.39%); non-reporting 21.58% (+/-9.39%); CD error on non-reported data 73.84% (+/-16.98%); DF error on non-reported data 26.16% (+/-16.98%); accuracy 94.91% (+/-3.00%); CD error in SDWIS/FED 0.57%; DF error in SDWIS/FED 0%; false positives 4.52% (+/-2.90%); overall DQ 75.16%.
SWTR TT: completeness 94.89% (+/-8.03%); non-reporting 5.11% (+/-8.03%); CD error on non-reported data 73.87% (+/-43.38%); DF error on non-reported data 26.87% (+/-43.38%); accuracy 91.07%; CD error in SDWIS/FED 3.98% (+/-7.26%); DF error in SDWIS/FED 0%; false positives 4.95%; overall DQ 86.63%.
MCL/SWTR TT: completeness 81.33%; non-reporting 18.67%; CD error on non-reported data 73.84% (+/-16.30%); DF error on non-reported data 26.16% (+/-16.30%); accuracy 94.12%; CD error in SDWIS/FED 1.27% (+/-1.44%); DF error in SDWIS/FED 0%; false positives 4.60% (+/-2.90%); overall DQ 77.21% (+/-8.99%).
LCR TT: completeness 7.6% (+/-7.52%); non-reporting 92.40% (+/-7.52%); CD error on non-reported data 91.76% (+/-11.80%); DF error on non-reported data 8.24% (+/-11.80%); accuracy 100%; CD error in SDWIS/FED 0%; DF error in SDWIS/FED 0%; false positives 0%; overall DQ 7.6% (+/-7.52%).
Health-Based Violations: completeness 61.69%; non-reporting 38.31%; CD error on non-reported data 84.45% (+/-10.35%); DF error on non-reported data 15.55% (+/-10.35%); accuracy 94.30%; CD error in SDWIS/FED 1.24%; DF error in SDWIS/FED 0%; false positives 2.79%; overall DQ 59.18%.
M/R: completeness 29.02%; non-reporting 70.98%; CD error on non-reported data 92.03% (+/-1.75%); DF error on non-reported data 7.97% (+/-1.75%); accuracy 88.35%; CD error in SDWIS/FED 3.18% (+/-1.54%); DF error in SDWIS/FED 0.99%; false positives 7.48%; overall DQ 27.08%.
CD = compliance determination; DF = data flow.
Note: TCR MCL + Other MCL = Total MCL; Total MCL + SWTR TT = MCL/SWTR TT; MCL/SWTR TT + LCR TT = Health-Based Violations. M/R = monitoring and reporting violations.

In general, the majority of non-reported data were due to compliance determination errors, i.e., the states did not issue violations when violations had occurred. The violations had not been recognized or recorded by states as violations and, consequently, were not reported to SDWIS/FED. We need to further examine the cause of such compliance determination errors; they may be due to late reporting or rule interpretation discrepancies. Eliminating these errors would significantly increase the completeness of the data in SDWIS/FED. For example, 84% of the non-reported health-based violations were due to compliance determination errors. If these errors did not occur, the completeness of health-based violations in SDWIS/FED would be 94% (62% + 38% × 84%). Similarly, the completeness of M/R violations would also be 94% (29% + 71% × 92%).

The violation data are further evaluated by system type in Tables 4-4a through 4-4c. The DQEs of MCL/SWTR TT violations were not significantly different among the system types. Likewise, the DQEs of health-based violations were not significantly different between CWSs and NTNCWSs. (The DQE of health-based violations for TNCWSs was not calculated since LCR TT data were not collected for TNCWSs.)
Table 4-4a: MCL/SWTR TT Violations Data Quality Estimates (DQE) by Public Water System Type

CWS: completeness 78.87% (+/-10.59%); non-reporting 21.13% (+/-10.59%); CD error on non-reported data 61.24% (+/-25.44%); DF error on non-reported data 38.76% (+/-25.44%); accuracy 93.15% (+/-4.35%); CD error in SDWIS/FED 1.06% (+/-1.47%); DF error in SDWIS/FED 0%; false positives 5.80% (+/-4.13%); overall DQ 74.37% (+/-10.73%).
NTNCWS: completeness 83.07% (+/-12.32%); non-reporting 16.93% (+/-12.32%); CD error on non-reported data 49.15% (+/-35.34%); DF error on non-reported data 50.85% (+/-35.34%); accuracy 83.42% (+/-13.55%); CD error in SDWIS/FED 6.62% (+/-10.75%); DF error in SDWIS/FED 0%; false positives 9.96% (+/-11.71%); overall DQ 70.49% (+/-13.61%).
TNCWS: completeness 83.25% (+/-15.53%); non-reporting 16.75% (+/-15.53%); CD error on non-reported data 96.21% (+/-6.14%); DF error on non-reported data 3.79% (+/-6.14%); accuracy 97.37% (+/-3.68%); CD error in SDWIS/FED 0.29% (+/-0.60%); DF error in SDWIS/FED 0%; false positives 2.34% (+/-3.60%); overall DQ 81.38% (+/-15.51%).
CD = compliance determination; DF = data flow.

Table 4-4b: Health-Based Violations Data Quality Estimates (DQE) by Public Water System Type

CWS: completeness 53.39% (+/-12.17%); non-reporting 46.61% (+/-12.17%); CD error on non-reported data 78.45% (+/-16.17%); DF error on non-reported data 21.55% (+/-16.17%); accuracy 93.47% (+/-4.15%); CD error in SDWIS/FED 1.01% (+/-1.40%); DF error in SDWIS/FED 0%; false positives 5.52% (+/-3.94%); overall DQ 51.22% (+/-11.75%).
NTNCWS: completeness 40.86% (+/-15.19%); non-reporting 59.14% (+/-15.19%); CD error on non-reported data 93.57% (+/-10.15%); DF error on non-reported data 6.57% (+/-10.15%); accuracy 84.95% (+/-12.58%); CD error in SDWIS/FED 6.01% (+/-9.86%); DF error in SDWIS/FED 0%; false positives 9.04% (+/-10.76%); overall DQ 36.67% (+/-13.02%).
CD = compliance determination; DF = data flow.

Table 4-4c: M/R Violations Data Quality Estimates (DQE) by Public Water System Type

CWS: completeness 20.26% (+/-3.42%); non-reporting 79.74% (+/-3.42%); CD error on non-reported data 91.62% (+/-2.36%); DF error on non-reported data 8.38% (+/-2.36%); accuracy 82.10% (+/-4.96%); CD error in SDWIS/FED 5.25% (+/-2.84%); DF error in SDWIS/FED 0.73% (+/-0.74%); false positives 11.92% (+/-4.42%); overall DQ 18.38% (+/-3.12%).
NTNCWS: completeness 22.65% (+/-5.26%); non-reporting 77.35% (+/-5.26%); CD error on non-reported data 87.05% (+/-5.07%); DF error on non-reported data 12.95% (+/-5.07%); accuracy 81.39% (+/-7.93%); CD error in SDWIS/FED 4.10% (+/-2.7%); DF error in SDWIS/FED 1.45% (+/-2.15%); false positives 13.05% (+/-7.88%); overall DQ 20.51% (+/-4.77%).
TNCWS: completeness 45.89% (+/-5.87%); non-reporting 54.11% (+/-5.87%); CD error on non-reported data 96.87% (+/-2.31%); DF error on non-reported data 3.13% (+/-2.31%); accuracy 94.81% (+/-2.71%); CD error in SDWIS/FED 1.83% (+/-1.99%); DF error in SDWIS/FED 0.60% (+/-0.59%); false positives 2.75% (+/-1.77%); overall DQ 44.17% (+/-5.79%).
CD = compliance determination; DF = data flow.

EPA has public notification (PN) requirements to ensure that the public is notified of violations in a timely manner. The PN requirements define three tiers of notification based on the public health significance of the violation, with tier 1 being the most significant (see Appendix C for the definition of PN tiers). The DQEs calculated by PN tier group are presented in Table 4-5. Two-thirds of PN tier 1 violations were reported to SDWIS/FED. There were no significant differences in DQEs between PN tier 1 and PN tier 2. The DQEs for PN tier 3, which mostly consisted of M/R violations, were significantly lower than those for PN tiers 1 and 2. Less than two-thirds of PN tier 2 violations were reported to SDWIS/FED, and only 30% of PN tier 3 violations were reported. In all PN tier groups, the data in SDWIS/FED were highly accurate. The overall data quality does not reflect false-positive violations in SDWIS/FED since they cannot be categorized into a PN tier.
Table 4-5: Data Quality (DQ) by PN Tier

                                     PN Tier 1            PN Tier 2            PN Tier 3
% Completeness of SDWIS/FED          66.97% (+/-22.37%)   62.40% (+/-10.50%)   30.58% (+/-4.15%)
% Non-reporting in SDWIS/FED         33.03% (+/-22.37%)   37.60% (+/-10.50%)   69.42% (+/-4.15%)
% CD error on non-reported data      32.77% (+/-23.93%)   87.21% (+/-9.04%)    91.44% (+/-1.87%)
% DF error on non-reported data      67.23% (+/-23.93%)   12.79% (+/-9.04%)    8.56% (+/-1.87%)
% Accuracy of data in SDWIS/FED      100%                 98.65% (+/-1.52%)    95.46% (+/-1.82%)
% CD error with data in SDWIS/FED    0%                   1.35% (+/-1.52%)     3.46% (+/-1.68%)
% DF error with data in SDWIS/FED    0%                   0%                   1.08% (+/-0.58%)
Overall data quality                 66.97% (+/-22.37%)   61.56% (+/-10.59%)   29.19% (+/-4.13%)

*CD = compliance determination; DF = data flow.

4.2.2 Results from 1999-2001 Data Verifications

This section presents DQEs from the 1999-2001 data verification audits, recalculated using the current statistical methodology described in Section 3.3. The states subject to DV audits during 1999-2001 are shown in Table 4-6. The DV results from Region 2 were not included in the calculation, since the state DV reports for those states were not finalized during the period of this analysis. The DQEs are presented in Tables 4-7a and b. Because these estimates were computed from a different set of DV states, for a different data quality assessment time frame, and with a different statistical sample design, it is not scientifically valid to make a national inference by directly comparing these results with those in Table 4-3. However, the DQEs from the states that had repeated DV audits during both assessment periods are calculated and compared in the following section.

Table 4-6: States Subject to Data Verifications from 1999-2001

Region   States                        Region   States
1        MA, ME, NH                    6        AR, LA, NM, TX
2        NY, PR                        7        KS, MO, NE
3        VA, PA, DE                    8        MT, ND, UT
4        FL, GA, KY, MS, NC, SC, TN    9        HI, NV
5        IL, IN, OH, WI                10       AK, ID, OR

Table 4-7b shows that 69% of the MCL and SWTR TT violations were reported to SDWIS/FED. Seventy-nine percent of the non-reported violations were due to compliance determination errors and 21% were due to data flow errors. The violations reported in SDWIS/FED were 91% accurate. Overall, the DQE of the violation data was 64%; that is, 64% of the noncompliance determinations on MCL/SWTR TT standards were correctly reported in SDWIS/FED.

Table 4-7a: 1999-2001 Data Quality Estimates (DQE) for MCL and SWTR TT

                                     TCR MCL              OTHER MCL            TOTAL MCL
% Completeness of SDWIS/FED          76.71%               63.33% (+/-26.99%)   74.81%
% Non-reporting on SDWIS/FED         23.29%               36.67% (+/-26.99%)   25.19%
% CD error on non-reported data      70.69%               68.55%               70.25%
% DF error on non-reported data      29.31%               31.45%               29.75%
% Accuracy of data in SDWIS/FED      91.71% (+/-2.62%)    63.99% (+/-46.70%)   88.54%
% CD error with data in SDWIS/FED    1.61% (+/-2.96%)     36.01% (+/-46.70%)   5.54% (+/-7.82%)
% DF error with data in SDWIS/FED    0.64% (+/-1.26%)     0%                   0.56% (+/-7.82%)
% False positive data in SDWIS/FED   6.05% (+/-5.24%)     0%                   5.36%
Overall data quality                 71.35%               40.52% (+/-28.53%)   67.14%

*CD = compliance determination; DF = data flow.
Table 4-7b: 1999-2001 Data Quality Estimates (DQE) for MCL/SWTR TT and MR

                                     SWTR TT              MCL/SWTR TT          MR
% Completeness of SDWIS/FED          54.54% (+/-11.79%)   69.39% (+/-9.59%)    34.86% (+/-4.59%)
% Non-reporting on SDWIS/FED         45.46% (+/-11.75%)   30.61% (+/-9.59%)    65.14% (+/-4.59%)
% CD error on non-reported data      92.43% (+/-13.88%)   79.05%               92.26% (+/-2.64%)
% DF error on non-reported data      7.57% (+/-13.88%)    20.95%               7.74% (+/-2.64%)
% Accuracy of data in SDWIS/FED      100%                 90.84%               91.85% (+/-2.89%)
% CD error with data in SDWIS/FED    0%                   4.43%                1.08% (+/-0.93%)
% DF error with data in SDWIS/FED    0%                   0.45% (+/-0.9%)      0.20% (+/-0.25%)
% False positive data in SDWIS/FED   0%                   4.28% (+/-3.8%)      6.87% (+/-2.58%)
Overall data quality                 54.54% (+/-11.75%)   63.88% (+/-9.12%)    33.51% (+/-4.5%)

*CD = compliance determination; DF = data flow.

Thirty-five percent of the M/R violations were reported to SDWIS/FED and 65% were not reported. Ninety-two percent of the non-reported violations were due to compliance determination errors and 8% were due to data flow errors. The reported M/R violations in SDWIS/FED were 92% accurate. Overall, the DQE of the M/R violation data was 33%; i.e., 33% of the noncompliance determinations on M/R were correctly reported in SDWIS/FED.

4.2.3 Data Quality Estimates from 1999-2001 and 2002-2004

In order to evaluate the progress of data quality improvement, the DQEs from the states where DV audits were conducted during both the 1999-2001 and the 2002-2004 data quality assessment periods were calculated for comparison. The states with repeated DV audits for both assessment periods can be identified from Table 2-1 and Table 4-6 and are listed in Table 4-8. Since the LCR was not reviewed during the 1999-2001 DVs, LCR data were excluded from the 2002-2004 DV results for this evaluation.

The DQEs from these 18 states are presented in Tables 4-9a and b, which include point estimates as well as the lower and upper bounds of 95% confidence intervals. For a difference (increase or decrease) between two DQEs to be considered statistically significant, the two confidence intervals, defined by their lower and upper bounds as the end points of the intervals, must not overlap.

Sixty-seven percent of MCL/SWTR TT violations, with a 95% confidence interval of (55%, 79%), were reported to SDWIS/FED during 1999-2001. Similarly, 80% of MCL/SWTR TT violations, with a 95% confidence interval of (68%, 92%), were reported to SDWIS/FED during 2002-2004. Since the confidence intervals overlap, there was no statistically significant increase in the reporting of violations for these 18 states from 1999-2001 to 2002-2004. The overall data quality of MCL/SWTR TT violations was 64%, with a 95% confidence interval of (52%, 75%), during 1999-2001 and 75%, with a 95% confidence interval of (64%, 87%), during 2002-2004. Based on the confidence intervals, there was no statistically significant increase in the overall data quality of MCL/SWTR TT violations for these 18 states from 1999-2001 to 2002-2004.

On the other hand, approximately 60% of SWTR TT violations, with a 95% confidence interval of (44%, 76%), were reported to SDWIS/FED during 1999-2001. During 2002-2004, 93% of SWTR TT violations, with a 95% confidence interval of (81%, 100%), were reported to SDWIS/FED. Since these confidence intervals do not overlap, there was a statistically significant increase in the reporting of SWTR TT violations for these 18 states from 1999-2001 to 2002-2004. However, the accuracy of SWTR TT data decreased significantly, from 100% to 78%. The overall data quality of SWTR TT violations was 60%, with a 95% confidence interval of (44%, 73%), during 1999-2001 and 74%, with a 95% confidence interval of (54%, 94%), during 2002-2004. Therefore, there was no statistically significant increase in the overall data quality of SWTR TT violations for these 18 states from 1999-2001 to 2002-2004.
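A minimal sketch, in Python, of the non-overlap criterion applied above: two DQEs are called significantly different only if their 95% confidence intervals do not overlap. The interval values below are the rounded bounds quoted in the text.

    def intervals_overlap(a, b):
        """True if closed intervals a = (lo, hi) and b = (lo, hi) overlap."""
        return a[0] <= b[1] and b[0] <= a[1]

    # SWTR TT completeness, 1999-2001 vs 2002-2004: the intervals do not
    # overlap, so the increase is statistically significant.
    print(intervals_overlap((44, 76), (81, 100)))  # False

    # MCL/SWTR TT completeness: the intervals overlap, so no significant change.
    print(intervals_overlap((55, 79), (68, 92)))   # True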
In general, the confidence intervals from the two periods overlap for all DQEs except the SWTR TT completeness DQE. Therefore, with that one exception, there were no statistically significant increases or decreases in the DQEs for these states from the 1999-2001 to the 2002-2004 assessment.

Table 4-8: States Subject to Data Verifications during 1999-2001 and 2002-2004

Region   States                        Region   States
1        MA                            6        AR, NM, TX
2        --                            7        KS, NE
3        VA, PA                        8        UT
4        FL, KY, MS, NC, SC, TN        9        --
5        IL, OH                        10       AK, ID

Table 4-9a: Data Quality Estimates (DQE) for MCL from MA, VA, PA, FL, KY, MS, NC, SC, TN, IL, OH, AR, NM, TX, MO, UT, AK, ID During 1999-2001 and 2002-2004
(All values in percent; point estimates are shown with 95% confidence bounds as point (lower, upper).)

TCR MCL                              1999-2001               2002-2004
% Completeness of SDWIS/FED          70.13 (56.22, 84.03)    82.17 (67.46, 96.88)
% Non-reporting on SDWIS/FED         29.87 (15.97, 43.78)    17.83 (3.12, 32.54)
% CD error on non-reported data      74.90 (51.93, 97.87)    83.54 (61.95, 100)
% DF error on non-reported data      25.10 (2.13, 48.07)     16.46 (0, 38.05)
% Accuracy of data in SDWIS/FED      90.49 (81.28, 99.71)    96.01 (91.88, 100)
% CD error with data in SDWIS/FED    3.04 (0, 8.99)          0 (0, 0)
% DF error with data in SDWIS/FED    0 (0, 0)                0 (0, 0)
% False positive data in SDWIS/FED   6.47 (0, 13.54)         3.99 (0, 8.12)
Overall data quality                 64.71 (50.90, 78.52)    79.46 (65.09, 93.82)

OTHER MCL                            1999-2001               2002-2004
% Completeness of SDWIS/FED          74.57 (48.25, 100)      60.74 (30.84, 90.64)
% Non-reporting on SDWIS/FED         25.43 (0, 51.75)        39.26 (9.36, 69.16)
% CD error on non-reported data      58.02 (12.32, 100)      21.14 (0, 50.34)
% DF error on non-reported data      41.98 (0, 87.68)        78.86 (49.66, 100)
% Accuracy of data in SDWIS/FED      100 (100, 100)          82.84 (59.65, 100)
% CD error with data in SDWIS/FED    0 (0, 0)                0 (0, 0)
% DF error with data in SDWIS/FED    0 (0, 0)                0 (0, 0)
% False positive data in SDWIS/FED   0 (0, 0)                17.16 (0, 40.35)
Overall data quality                 74.57 (48.25, 100)      53.95 (24.49, 83.42)

TOTAL MCL                            1999-2001               2002-2004
% Completeness of SDWIS/FED          70.63 (57.87, 83.39)    79.17 (65.95, 92.39)
% Non-reporting on SDWIS/FED         29.37 (16.61, 42.13)    20.83 (7.61, 34.05)
% CD error on non-reported data      73.25 (51.66, 94.85)    67.09 (38.77, 95.41)
% DF error on non-reported data      26.75 (5.16, 48.34)     32.91 (4.59, 61.23)
% Accuracy of data in SDWIS/FED      91.56 (83.40, 99.72)    94.40 (89.88, 98.92)
% CD error with data in SDWIS/FED    2.70 (0, 7.98)          0 (0, 0)
% DF error with data in SDWIS/FED    0 (0, 0)                0 (0, 0)
% False positive data in SDWIS/FED   5.74 (0, 12.00)         5.60 (1.08, 10.12)
Overall data quality                 65.78 (53.02, 78.54)    75.62 (62.79, 88.46)

*CD = compliance determination; DF = data flow.
Table 4-9b: Data Quality Estimates (DQE) for SWTR TT, MCL/SWTR TT, and MR from MA, VA, PA, FL, KY, MS, NC, SC, TN, IL, OH, AR, NM, TX, MO, UT, AK, ID During 1999-2001 and 2002-2004
(All values in percent; point estimates are shown with 95% confidence bounds as point (lower, upper).)

SWTR TT                              1999-2001               2002-2004
% Completeness of SDWIS/FED          59.95 (43.68, 76.21)    93.30 (80.83, 100)
% Non-reporting on SDWIS/FED         40.05 (23.79, 56.32)    6.70 (0, 19.16)
% CD error on non-reported data      93.23 (76.38, 100)      89.09 (62.11, 100)
% DF error on non-reported data      6.75 (0, 23.62)         10.91 (0, 37.89)
% Accuracy of data in SDWIS/FED      100 (100, 100)          78.78 (60.28, 97.28)
% CD error with data in SDWIS/FED    0 (0, 0)                17.01 (0, 39.27)
% DF error with data in SDWIS/FED    0 (0, 0)                0 (0, 0)
% False positive data in SDWIS/FED   0 (0, 0)                4.21 (0, 10.00)
Overall data quality                 59.95 (43.68, 76.21)    73.71 (53.90, 93.52)

MCL/SWTR TT                          1999-2001               2002-2004
% Completeness of SDWIS/FED          66.73 (54.74, 78.72)    80.36 (68.10, 92.62)
% Non-reporting on SDWIS/FED         33.27 (21.28, 45.26)    19.64 (7.38, 31.90)
% CD error on non-reported data      82.04 (63.88, 100)      67.72 (40.40, 95.05)
% DF error on non-reported data      17.96 (0, 36.12)        32.28 (4.95, 59.60)
% Accuracy of data in SDWIS/FED      94.22 (88.26, 100)      92.90 (88.24, 97.55)
% CD error with data in SDWIS/FED    1.85 (0, 5.53)          1.63 (0, 4.11)
% DF error with data in SDWIS/FED    0 (0, 0)                0 (0, 0)
% False positive data in SDWIS/FED   3.93 (0, 8.44)          5.57 (0.35, 9.58)
Overall data quality                 63.71 (52.70, 74.71)    75.46 (63.60, 87.33)

MR                                   1999-2001               2002-2004
% Completeness of SDWIS/FED          37.74 (31.69, 43.79)    28.06 (23.82, 32.31)
% Non-reporting on SDWIS/FED         62.26 (56.21, 68.31)    71.94 (67.69, 76.18)
% CD error on non-reported data      91.63 (88.91, 94.34)    93.59 (91.60, 95.57)
% DF error on non-reported data      8.37 (5.66, 11.09)      6.41 (4.43, 8.40)
% Accuracy of data in SDWIS/FED      92.01 (88.90, 95.11)    92.20 (89.33, 95.08)
% CD error with data in SDWIS/FED    1.44 (0.13, 2.75)       2.01 (0.37, 3.66)
% DF error with data in SDWIS/FED    0.28 (0, 0.63)          0.64 (0.05, 1.23)
% False positive data in SDWIS/FED   6.27 (3.57, 8.98)       5.15 (2.89, 7.40)
Overall data quality                 36.14 (30.19, 42.08)    26.87 (22.71, 31.03)

*CD = compliance determination; DF = data flow.

4.3 Analysis of Enforcement Data

Federal regulations specify the conditions under which enforcement actions are to be taken against a PWS that is in violation of the Federal-State drinking water program, to ensure public health protection. States must report a subset of these actions to EPA. EPA itself reports these data for situations where EPA is the enforcement authority because the state has decided not to obtain approval to implement the federal program (e.g., Wyoming, the District of Columbia, and Indian lands). Enforcement data reported to SDWIS were compared to those found in the state files during the DV. The proportions of enforcement data in the state files that were in agreement with those reported to SDWIS/FED (1a, 1c, and 5a from Table 4-2) were estimated as described in Section 3.3 and are presented in Table 4-10. The overall DQE for enforcement data was 86%.

Table 4-10: Proportion Estimates of Enforcement Data in State Files Reported to SDWIS/FED Without Discrepancy

PWS Type     Proportion Estimate
CWS          73.14% (+/-9.65%)
NTNCWS       76.25% (+/-6.88%)
TNCWS        94.92% (+/-2.72%)
Overall      85.97% (+/-3.62%)

5. Analysis of Timeliness of Violation Reporting in SDWIS/FED

This section presents the results of an analysis of the data in SDWIS/FED. The analysis evaluates the timeliness of violations based on the compliance period end date, which provides a benchmark for comparison between fiscal years. Violations are due to be reported by the end of the quarter following awareness or the compliance period end date. Timeliness is calculated as the ratio of the number of violations reported on time to the baseline number of violations that should be reported, i.e.,

    Timeliness = (Number of Violations Reported on Time) / (Number of Violations Reported for the Baseline)

where "on time" means by the end of the quarter following the compliance period end date, and the baseline is measured at a point in time in the future (in this case, between 4 and 7 quarters after the violations are due to be reported). Basically, timeliness is the proportion of violations that were eventually reported to SDWIS/FED on time.

To compute the timeliness, the violation data were extracted from archived SDWIS/FED databases for each of five fiscal years (2000-2004).
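Conceptually, the extraction and computation described in this section reduce to the short sketch below, assuming a pandas DataFrame of violation records. The column names ("fiscal_year", "violation_type", "on_time", "in_baseline") are hypothetical, not actual SDWIS/FED field names; the grouping step the sketch performs is described in the next paragraph.

    import pandas as pd

    def timeliness_by_type(df: pd.DataFrame) -> pd.Series:
        """Share of baseline violations reported on time, per fiscal year
        and violation type (cf. Table 5-2)."""
        grouped = df.groupby(["fiscal_year", "violation_type"])
        return grouped["on_time"].sum() / grouped["in_baseline"].sum()

    # Toy example:
    toy = pd.DataFrame({
        "fiscal_year":    [2004, 2004, 2004],
        "violation_type": ["M/R", "M/R", "TCR MCL"],
        "on_time":        [1, 0, 1],  # reported by end of the following quarter
        "in_baseline":    [1, 1, 1],  # eventually reported by the archive date
    })
    print(timeliness_by_type(toy))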
The violations were then grouped by PWS ID, fiscal year, quarter, violation code, contaminant code, and basic PWS attributes, and the on-time and baseline violations were summed. Table 5-1 shows the database extracts used for the analysis. The database does not include LCR or other violations with open-ended compliance period end dates; for these violations, the compliance period end date remains open until the system returns to compliance.

Table 5-1: SDWIS/FED Database Analyzed for Timeliness

FY2000 on time (violations with end dates between):      Archive date
  00Q1: 10/1/99 - 12/31/99                               4/00
  00Q2: 1/1/00 - 3/31/00                                 7/00
  00Q3: 4/1/00 - 6/30/00                                 10/00
  00Q4: 7/1/00 - 9/30/00                                 1/01
FY2000 baseline: 01Q4 tables, archived 1/02

FY2001 on time (violations with end dates between):      Archive date
  01Q1: 10/1/00 - 12/31/00                               4/01
  01Q2: 1/1/01 - 3/31/01                                 7/01
  01Q3: 4/1/01 - 6/30/01                                 10/01
  01Q4: 7/1/01 - 9/30/01                                 1/02
FY2001 baseline: 02Q4 tables, archived 1/03

FY2002 on time (violations with end dates between):      Archive date
  02Q1: 10/1/01 - 12/31/01                               4/02
  02Q2: 1/1/02 - 3/31/02                                 7/02
  02Q3: 4/1/02 - 6/30/02                                 10/02
  02Q4: 7/1/02 - 9/30/02                                 1/03
FY2002 baseline: 03Q4 tables, archived 1/04

FY2003 on time (violations with end dates between):      Archive date
  03Q1: 10/1/02 - 12/31/02                               4/03
  03Q2: 1/1/03 - 3/31/03                                 7/03
  03Q3: 4/1/03 - 6/30/03                                 10/03
  03Q4: 7/1/03 - 9/30/03                                 1/04
FY2003 baseline: 04Q4 tables, archived 1/05

FY2004 on time (violations with end dates between):      Archive date
  04Q1: 10/1/03 - 12/31/03                               4/04
  04Q2: 1/1/04 - 3/31/04                                 7/04
  04Q3: 4/1/04 - 6/30/04                                 10/04
  04Q4: 7/1/04 - 9/30/04                                 1/05
FY2004 baseline: 05Q4 tables, archived 1/06

Table 5-2: Violation Reporting Timeliness to SDWIS/FED by Violation Type

Fiscal Year                            2000      2001      2002      2003      2004
Number of Violations Reported on Time
  TCR MCL                             7,738     8,114     7,977     7,902     7,421
  Other MCL                             727       652       771     1,106     1,273
  SWTR TT                               932       918     1,045       774       540
  Health-Based Violations (13)        9,397     9,684     9,793     9,831     9,308
  M/R                                49,782    50,868    55,425    61,967    32,742
Number of Violations Reported for Baseline
  TCR MCL                            11,445    10,963    10,795    10,821    10,510
  Other MCL                           1,344     1,315     1,844     2,573     3,716
  SWTR TT                             1,574     1,627     1,585     1,252       932
  Health-Based Violations (13)       14,636    13,905    14,369    14,996    15,513
  M/R                                93,231   111,397   121,819   106,664   104,427
Percent Timeliness
  TCR MCL                               68%       74%       74%       73%       71%
  Other MCL                             54%       50%       42%       43%       34%
  SWTR TT                               59%       56%       66%       62%       58%
  Health-Based Violations (13)          65%       70%       68%       66%       60%
  M/R                                   53%       46%       45%       58%       31%

Table 5-2 shows the computed timeliness of the reported violations in SDWIS/FED. Late reporting can affect the reliability of SDWIS/FED in informing the public and stakeholders about the quality of their drinking water. Further, it hinders our efforts to assess public health risk and to address violations with enforcement actions in a timely manner. In 2004, 60% of the health-based violations were reported on time, while only 31% of the M/R violations were reported on time. Note that this is a 27-percentage-point decline in timeliness for the M/R violations from 2003. Additional information (in the form of pivot tables), providing further detail on the timeliness with which violations are reported across several additional attributes, is available from EPA upon request. Additional findings based on this information are the following:

• Timeliness of reported health-based violations was similar across water system types. Timeliness of monitoring violations was highest for TNCWSs at 58% and lowest for NTNCWSs at 33%.
• Timeliness was similar across quarters.
• Timeliness generally decreased as system size decreased.
• It was difficult to evaluate the timeliness of reported violations for new rules, because many of the violations under these rules have open-ended compliance period end dates.

(13) These health-based violations do not include Lead and Copper Treatment Technology (LCR TT) violations because they have open-ended compliance period end dates.

6. Conclusion

For the 38 states evaluated from 2002 to 2004, the violations reported in SDWIS/FED were largely accurate, at roughly 90%. Approximately 81% of the MCL and SWTR TT violations were reported to SDWIS/FED. Sixty-two percent of the health-based violations (including LCR TT violations) and 29% of the monitoring and reporting violations were reported. Non-reporting was mostly attributable to states not issuing violations when violations had occurred. In other words, the violations were not recognized or recorded by the states as violations and, consequently, were not reported to SDWIS/FED. Eighty-four percent of non-reported health-based violations and 92% of non-reported M/R violations were due to compliance determination errors.

EPA considers non-reported violations to be a serious problem that could have public health implications at many levels. Analyses based on such incomplete data in SDWIS/FED compromise our ability to determine if and when to take action against non-compliant systems, to oversee and evaluate the effectiveness of state and federal programs and regulations, to alleviate burden on states, and to determine whether new regulations are needed to further protect public health. Further, our ability to respond to public inquiries and to prepare national reports on the quality of drinking water in a thorough and complete manner will be severely limited.

Some of the discrepancies between the number of violations that should have appeared in SDWIS/FED and those found by the DV auditors could reflect differences in rule interpretation, given the flexibility provided to states in implementing rules under state primacy agreements. State implementation of rules must be at least as stringent as the Federal regulations but can differ in substantial respects within a reasonable scope of the regulation. It is critical that EPA and the states continue to work together toward reducing non-reporting, reporting errors, and late reporting of violations.

Additional findings included that the DQEs of health-based violations were not significantly different between CWSs and NTNCWSs, and that the DQEs of M/R violations for TNCWSs were significantly higher than those for CWSs and NTNCWSs.

Further, the DQEs from the 18 states where DV audits were conducted during both the 1999-2001 and the 2002-2004 data quality assessment periods were calculated for comparison. For those states, 67% of MCL/SWTR TT violations, with a 95% confidence interval of (55%, 79%), were reported to SDWIS/FED during 1999-2001. Similarly, 80% of MCL/SWTR TT violations, with a 95% confidence interval of (68%, 92%), were reported to SDWIS/FED during 2002-2004. Since the confidence intervals overlap, there was no statistically significant increase in the reporting of violations for these 18 states from 1999-2001 to 2002-2004. The overall data quality of MCL/SWTR TT violations was 64%, with a 95% confidence interval of (52%, 75%), during 1999-2001 and 75%, with a 95% confidence interval of (64%, 87%), during 2002-2004.
Based on the confidence intervals, there was no statistically significant increase in the overall data quality of MCL/SWTR TT violations for these 18 states from 1999-2001 to 2002-2004. Finally, 60% of the health-based violations and approximately 30% of the M/R violations were reported on time to SDWIS/FED in 2004.

7. Data Reliability Improvement Action Plan

Based on this analysis and on the results of previous efforts, EPA, working with its state co-regulators through the Association of State Drinking Water Administrators (ASDWA), has developed a Data Reliability Improvement Action Plan ("the plan") designed to achieve a data quality goal of 90 percent complete and accurate data for health-based violation reporting. The plan covers the years 2007 through 2009 and also addresses improving data quality for monitoring and reporting violations and inventory (water system facility) data. Principally, the plan focuses on actions that EPA and states can take to address compliance determination issues and thereby improve violation data quality. Progress toward the data quality goal will be measured annually and assessed in 2009. The plan appears in Appendix A.

8. Future Analysis of Data Reliability

Several factors will change both the process and the results of the data verifications and the data quality calculations for drinking water data. In the near term, beginning in 2005, the selection of states for DVs will be based on probability sampling. Specifically, states will be selected for the 2005-2007 data verifications using a probability sampling method, with every state being selected within a 4-year time frame. This will allow data quality to be assessed nationally on a rolling multi-year basis.

In the longer term (2008 and beyond), EPA is evaluating the feasibility of electronic data verification (EDV), which would collect and evaluate compliance sample results for regulated contaminants electronically for all CWSs. EPA believes that the most cost-effective and complete process for evaluating data quality in the long term may be the EDV process. In each state, the data could be evaluated once every one or two years through the compliance determination processes recorded in the SDWIS/STATE software. SDWIS/STATE is already designed and developed for states to manage their drinking water programs. The advantages of this approach are that the software already exists and that all compliance determinations are available for evaluation. The current DV process relies on a sample of systems, and because of the inherently small number of large CWSs, large CWSs are not well represented in the samples. EDV would allow use of all systems instead of a sample drawn for a DV. Additionally, drinking water administrators in decentralized states could have hands-on data in one location instead of going to regional drinking water offices. All states using SDWIS/STATE would have the capability to calculate data quality in near real time and take action on issues as they arise. Furthermore, EDV would allow states and EPA to reduce and reallocate the time and resources spent on manual data reviews while providing a more complete picture of program implementation and leading to the identification of opportunities for program improvement.
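To make the EDV concept concrete, the following is a hypothetical sketch, ours rather than EPA's design, of the core comparison an electronic data verification could perform: flagging violations determined in a state system that never appear in SDWIS/FED (the non-reporting component of the DQEs in Section 4). All identifiers are invented for illustration.

    # Hypothetical EDV-style completeness check: compare violations determined
    # in a state system against violations reported to SDWIS/FED.
    def non_reported(state_violations: set, fed_violations: set) -> set:
        """Violations present in the state's records but absent from SDWIS/FED.
        Each violation is keyed, e.g., by (pws_id, violation_code, period_end)."""
        return state_violations - fed_violations

    state = {("XX0000001", 21, "2004-03-31"), ("XX0000002", 3, "2004-06-30")}
    fed = {("XX0000001", 21, "2004-03-31")}
    print(non_reported(state, fed))  # {('XX0000002', 3, '2004-06-30')}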
Appendix A: 2006 Data Reliability Improvement Action Plan

Introduction

The past two Data Reliability Improvement Action Plans have drawn attention to actions that can be taken to improve data quality and the usability of SDWIS/FED data. While those plans focused significantly on information system improvements and general activities that should improve data quality, this 2006 plan builds on current findings from more recent data, and on capabilities not previously developed, to concentrate on specific factors that could result in real-time data quality improvements.

The philosophy of past data reliability improvement action plans was largely built on the concept that we must improve the software of the information system, SDWIS/FED. This has largely been done, with the last remaining step to be completed in 2007 with SDWIS/STATE Web Release 2. This release fully web-enables SDWIS/STATE, reducing the resources states need to implement the software and reducing the complexity of data entry with fewer data entry screens and more drop-down lists.

This 2007-2009 Data Reliability Improvement Action Plan primarily focuses on the actions of those responsible for determining which data will be entered and how that will occur. The largest challenge is ensuring that all data reflecting determinations of violations are entered into SDWIS/FED, the federal database. As indicated in this report, EPA found that 77 percent of all data on MCL/SWTR TT violations in SDWIS/FED were complete and accurate. This is not satisfactory: it has been the focus of media attention concerning the reliability of the data used to make decisions about the nation's most important public health program for safeguarding its citizens' water supply, and it needs to be improved. To make a larger step forward over the next three years (2007-2009), EPA and ASDWA in October 2006 set a data quality goal of 90 percent (completeness and accuracy) for future compliance reporting of health-based violations in the federal database, SDWIS/FED. This plan is principally focused on achieving that goal. Based on past analyses of state-specific results, eleven states have already achieved this level of data quality for health-based violations, indicating that the goal is achievable.

The plan also addresses improving the data quality of monitoring and reporting violations and inventory data; that is, improving the quality of all data used, and supporting the state and national drinking water programs with the highest quality data. The plan is presented as a series of issues and plan elements with assigned responsibilities and timeframes.

Issues

(1) Modify Data Verification Selection Processes: EPA continues to conduct triennial data quality analyses and to follow up on data verifications by working with states to address identified differences and discrepancies from federal regulations. In the 2005-2007 timeframe, EPA implemented probability-based (random) selection of states for data verification to enhance the representativeness of the data at the national level. The data quality results for these data will provide an indication of the extent of achievement of the 90 percent data quality goal set by EPA and ASDWA. Consistent with the August 2005 recommendation of the special ASDWA-EPA Data Quality Subcommittee, the quality of results in the national database, SDWIS/FED, will be displayed by rule and significance (i.e., public notification tier).
(2) Consider All Compliance Determinations in State Data Quality: In evaluating SDWIS/FED data quality, EPA considers only the data in the national database, not the data in state databases reflecting all compliance determinations resulting from the states' position as the primary enforcement authority for the federal program. EPA will develop an "electronic data verification" (eDV) tool to enable states to track any discrepancies of their compliance determinations relative to federal regulations, correct these discrepancies prior to data quality calculations, and allow calculation of data quality relative to all compliance determinations. EPA should augment its SDWIS/STATE software to allow states to obtain management reports on any discrepancies in state compliance determinations in near-real time, allowing for the possibility of improving health-based response and data quality.

(3) Use Electronic Scheduling and Lab Reporting: Using automated monitoring requirement/schedule generators and incorporating electronic reporting from laboratories to states would improve the quality of data that states receive from water systems. Anecdotal information suggests that when states issue automated monitoring schedules to water systems, on-time monitoring and reporting by those systems improves. This step increases the probability that all data will be used by the state in determining compliance with public health drinking water standards and that appropriate determinations are made. Additionally, when states receive monitoring data electronically, data entry errors are reduced. This second step helps ensure that the correct data are used in the decision process for determining compliance. Water system or laboratory submission of data to states must comply with the Cross-Media Electronic Reporting Rule (CROMERR), which will need to be considered in any effort to facilitate electronic reporting from laboratories to states.

(4) Consider Data Management Early in Rule Development: Data management concerns should be considered during every phase of the rule development process, beginning with the initial rule concept. If this does not occur, rules with complex reporting requirements may emerge, overwhelming the capability of states to implement them and shifting valuable resources from taking action on real health needs to reporting. Electronic reporting can simplify data handling, but it does not always mean a simpler process for protecting health, and it should not be used as a "crutch" for creating complex rules instead of focusing on simpler, direct key health management objectives for drinking water supply protection. Streamlined approaches to data management in states' business processes must be considered in rule development.

(5) Improve State Capability in Compliance Determination: Data reliability, as reported in the Triennial Data Reliability Report, appears to have marginally improved, even though the improvement is not statistically significant. State compliance determinations play an integral role in determining the reliability of the data on violations reported to the national database, SDWIS/FED. Incorrect compliance determinations, when they do occur, are due in part to the complexity and number of drinking water rules. The need for training to facilitate correct determinations is critical, especially with the changing nature of state staff available to implement the drinking water regulations.
Incorrect compliance determinations are a serious matter, as they may affect public health.

(6) Complete SDWIS Modernization: EPA should continue implementation of the OGWDW Information Strategic Plan to modernize and web-enable SDWIS/STATE to take advantage of newer technologies and system platforms. This action will save state resources by allowing data entry from anywhere in the state that is web-accessible and will reduce data entry time with fewer screens and more drop-down lists. State deployment of SDWIS/STATE Web Releases 1 and 2 will take time because of differing schedules and variation in available resources among states. For states using SDWIS/STATE, full use of all SDWIS/STATE modules and regular update of inventory data will facilitate improved data quality.

(7) Evaluate Low Timeliness of Violation Reporting: Violation reporting timeliness is low and not improving. Because the states have been taking steps to improve data quality, and because the calculation of data quality considers results that may be 3 to 5 years old in some cases, estimates of reporting timeliness may not be current. EPA should use the reported results from the first year of using the modernized data flow to re-evaluate timeliness for each rule, as recommended by the Data Sharing Committee.

(8) Update Out-of-date and Missing Inventory Data: Key features of inventory data useful in examining compliance and in determining regulatory needs are not routinely updated and reported. For example, consecutive systems and treatment objectives for recent rules are inventory data that are not reported for each system to which they apply. As a result, EPA cannot conduct analyses of national capability to treat certain contaminants. In contrast, inventory data for grant eligibility are routinely reported to ensure adequate data for receiving grants.

2006 Drinking Water Data Reliability Improvement Action Plan

Element (1): Modify Data Verification Selection Processes
Description: EPA will calculate data quality with data from the 2005-2007 data verifications from the random selection of states and display the results by rule and public notification tier.
Activity (a): EPA will calculate data quality with data from 2005-2007 from the random selection of states and display by rule and public notification tier.
  Responsibility & Actions / Completion:
  (1) EPA - Calculate national estimate of data quality for health-based violations and separately for monitoring and reporting violations and inventory data for 2005-2007. [December 2008]
  (2) EPA - Calculate state estimates of data quality for all health-based compliance determinations and separately for all monitoring and reporting compliance determinations for 2005-2007. [December 2008]
  (3) EPA - Report data quality by rule and public notification tier for 2005-2007 using data verification results. [December 2008]

Element (2): Evaluate All Compliance Determinations
Description: Develop a tool to allow states to identify compliance determination discrepancies from federal regulations more easily.
Activity (a): EPA will develop an "electronic data verification" (eDV) tool to enable states to track any discrepancies of their compliance determinations relative to federal regulations, correct these discrepancies prior to data quality calculations, and allow calculation of data quality relative to all compliance determinations.
  (1) EPA & States - Complete pilot test of eDV tool. [December 2008]
Activity (b): States will agree to provide contaminant occurrence and monitoring schedule data to EPA to allow the Agency to conduct electronic data verification for all rules across all water systems in a state, retrospectively, on an annual basis but not less frequently than every three years, to allow regular assessment of data quality and to identify opportunities for state program improvement.
  (1) EPA & States - EPA requests, and states provide, contaminant occurrence and schedule data for all water systems from at least nine states for testing the eDV tool. [July 2007]
  (2) EPA & States - Complete data sharing agreements for contaminant occurrence and monitoring schedule data. [December 2008]
  (3) EPA & States - EPA will receive state contaminant occurrence and schedule data for all water systems from all states through completion of a data sharing agreement. [Annually beginning 2009]
Element (3): Use Automated Scheduling and Electronic Lab Reporting
Description: States and EPA will take steps to more fully utilize automated technology to improve reporting of water system data to states.
Activity (a): States will utilize automated scheduling of water system monitoring to the extent possible and report on progress in on-time monitoring and reporting by water systems at the ASDWA-EPA Data Management Users Conference.
  (1) States - Report progress on state automated scheduling of system monitoring. [Annually: May 2007, May 2009, May 2010]
Activity (b): EPA will develop an electronic tool to allow laboratories testing drinking water samples to report to states (a "lab-to-state" reporting tool), rather than submitting paper reports on monitoring results.
  (1) EPA - Develop "lab-to-state" reporting tool. [March 2007] Status: Done
Activity (c): The EPA Office of Ground Water and Drinking Water will work with the EPA Office of Environmental Information (OEI) to incorporate CROMERR requirements in the "lab-to-state" reporting tool and work toward OEI approval of the tool.
  (1) EPA - Review and approval of CROMERR compliance of "lab-to-state" reporting tool. [August 2007] Status: Done
Activity (d): States not using the EPA-developed "lab-to-state" electronic reporting tool will identify and use a similar tool.
  (1) States - Replace paper lab reports for compliance monitoring with automated lab reporting. [Ongoing through December 2009]

Element (4): Consider Data Management Early in Rule Development
Description: Implement a process to address data management in rule development.
Activity (a): EPA information systems staff will participate in early rule development through preparation of issue papers on data management for each future rule and will share these papers for comment with states through the ASDWA-EPA Data Management Steering Committee.
  (1) EPA & States - Information systems staff participate in rule development. [Ongoing] Status: Ongoing; completed issue paper on TCR/Distribution System reporting for DSMC input
Activity (b): States will identify staff and participate in discussions of future rules to ensure that business processes are considered.
  (1) States - Staff identified for participation in rules to consider state business processes. [Ongoing] Status: Ongoing
Activity (c): ASDWA and EPA will work toward agreement on a mutual generic timeline for considering data management in rule development.
  (1) ASDWA & EPA - Reach agreement on generic timeline for including data management in rule development. [December 2007]

Element (5): Improve State Capability in Compliance Determination
Description: EPA Regions, to ensure that data reliability improvement (including implementation of EPA Order 5360.1.A2) is included in annual agreements with states, will work with states to identify the specific reasons for discrepancies in compliance determinations and to identify training needs among states to facilitate the capability to make correct determinations.
Activity (a): EPA Headquarters will develop an electronic data verification tool to allow EPA Regions to compare the results of all state compliance determinations to the violation data reported to EPA in SDWIS/FED.
  (1) EPA HQ & States - Complete testing of eDV tool. [September 2008]
  (2) EPA Regions & States - Use eDV tool to check compliance determinations and take appropriate action. [Ongoing beginning in 2009; quarterly check and take action]
Activity (b): EPA Regions will ensure that data reliability improvement steps are included in all agreements and work plans with states and will identify specific reasons for discrepancies, including non-reporting, of state determinations from federal regulations.
  (1) EPA Regions & States - Incorporate data reliability improvement steps in state-EPA agreements and state work plans. [Annually]
Activity (c): States will identify compliance determination training needs to EPA Regions.
  (1) States - Identify compliance determination training needs. [Annually]
Activity (d): EPA Headquarters will develop and provide capability for training on compliance determination for states.
  (1) EPA HQ - Completed/revision underway for compliance determination training. [Ongoing]

Element (6): Complete SDWIS Modernization
Description: Complete modernization, web-enablement, and deployment of SDWIS/STATE Web Release 2; facilitate fuller use of SDWIS/STATE among states choosing to use it; and regularly update inventory data to improve data quality.
Activity (a): Development of fully web-enabled SDWIS/STATE and facilitation of fuller use of the software for state program management.
  (1) EPA HQ - Develop SDWIS/STATE Web Release 2. [October 2007]
  (2) EPA Regions - Promote full state use of SDWIS/STATE software through state agreements. [Annually]
Activity (b): Deployment of web-enabled SDWIS/STATE with planned fuller use of modules by states using SDWIS/STATE and update of inventory data.
  (1) States - Deploy SDWIS/STATE Web Release 2. [Beginning October 2007]
  (2) EPA Regions & States using SDWIS/STATE - Agree to steps toward fuller use of SDWIS/STATE in agreements and work plans. [Annually (or as appropriate)]
  (3) EPA HQ & States - Conduct workshop on SDWIS/STATE Web Release 2. [Summer 2008]

Element (7): Evaluate Low Timeliness of Violation Reporting
Description: Evaluate timeliness by rule with data reported to the modernized SDWIS/FED for 2006.
Activity (a): Evaluate timeliness by rule with data reported to the modernized SDWIS/FED for 2006.
  (1) Data Sharing Committee - Perform timeliness analysis in 2008 once all violation data are reported and processed; make recommendation to DMSC. [2007]

Element (8): Update Out-of-date and Missing Inventory Data
Description: Evaluate regulatory requirements to determine the appropriate inventory reporting relating the applicability of rules to systems, set a priority on the data needed, and work with states to update the inventory data routinely reported to EPA.
Activity (a): Evaluate regulatory requirements to determine the appropriate inventory reporting relating the applicability of rules to systems, set a priority on the data needed, and work with states to update the inventory data routinely reported to EPA.
  (1) Data Sharing Committee - Evaluate inventory reporting and propose a priority on data to be updated. [2008]

Appendix B: Violations Addressed by Data Verification (DV)
Violation Code   Violation Name                               Violation Type
1                MCL, Single Sample                           MCL
2                MCL, Average                                 MCL
3                Monitoring, Regular                          MR
4                Monitoring, Check/Repeat/Confirmation        MR
5                Notification, State                          Other
6                Notification, Public                         Other
7                Treatment Techniques                         Other
8                Variance/Exemption/Other Compliance          Other
9                Record Keeping                               Other
10               Operations Report                            Other
11               Non-Acute MRDL                               MRDL
12               Treatment Technique No Certified Operator    TT
13               Acute MRDL                                   MRDL
21               MCL, Acute (TCR)                             MCL
22               MCL, Monthly (TCR)                           MCL
23               Monitoring, Routine Major (TCR)              MR
24               Monitoring, Routine Minor (TCR)              MR
25               Monitoring, Repeat Major (TCR)               MR
26               Monitoring, Repeat Minor (TCR)               MR
27               Monitoring and Reporting Stage 1             MR
28               Sanitary Survey (TCR)                        Other
29               M&R Filter Profile/CPE Failure               MR
31               Monitoring, Routine/Repeat (SWTR-Unfilt)     MR
36               Monitoring, Routine/Repeat (SWTR-Filter)     MR

Applicable rules and contaminant codes (CCodes) for the violations above include the DBP rules (0400, 0999, 1006/08/09/11, 2456, 2920, 2950), pre-2002 TTHM (2941/42/43/44, 2950), FBR (0500), IESWTR (0300), SWTR (0200), TCR and sanitary surveys (3100), and the chemical rules: VOCs (e.g., 2378/80, 2955/64/68/69/76/77/79/80/81/82/83/84/85/87/89/90/91/92/96), SOCs (e.g., 2005/10/15/20/31-37/39-44*/46/47*/50/51/63/65/67, 2105/10, 2274/98, 2306/26/83/88/90/92/94/96/98, 2400, 2931/46/59), IOCs (e.g., 1005/10/15/20/24/25/35/36*/45/74/75/85/94), nitrate/nitrite (1038, 1040, 1041), and radionuclides (4000/06/10, 4100/01/02/74).
* Codes required for monitoring only.

Violation Code   Violation Name                               Violation Type
37               Treatment Technique State Prior Approval     TT
38               M&R Filter Turbidity Reporting               MR
39               M&R (FBRR)                                   MR
40               Treatment Technique (FBRR)                   TT
41               Treatment Technique (SWTR)                   TT
42               Failure to Filter (SWTR)                     TT
43               Treatment Technique Exceeds Turb 1 NTU       TT
44               Treatment Technique Exceeds Turb 0.3 NTU     TT
46               Treatment Technique Precursor Removal        TT
47               Treatment Technique Uncovered Reservoir      TT
51               Initial Tap Sampling for Pb and Cu           MR
52               Follow-up and Routine Tap Sampling           MR
53               Initial Water Quality Parameter WQP M&R      MR
54               Follow-up & Routine E.P. WQP M&R (deleted)   MR
55               Follow-up & Routine Tap WQP M&R (deleted)    MR
56               Initial, Follow-up, or Routine SOWT M&R      MR
57               OCCT Study Recommendation                    TT
58               OCCT Installation/Demonstration              TT
59               WQP Entry Point Noncompliance                TT
60               WQP Entry Point Noncompliance (deleted)      TT
61               SOWT Recommendation (deleted)                TT
62               SOWT Installation (deleted)                  TT
63               MPL Noncompliance                            TT
64               Lead Service Line Replacement (LSLR)         TT
65               Public Education                             TT
71               CCR Complete Failure to Report               Other
72               CCR Inadequate Reporting                     Other
75               PN Violation for NPDWR Violation             Other
76               Other Non-NPDWR Potential Health Risks       Other

Applicable rules and CCodes for the violations above include IESWTR (0300), FBR (0500), SWTR (0200), the DBP rules (0400, 2920), LCR (5000; also 1022 and 1030 for tap sampling), CCR (7000), and PN (7500).

Appendix C: Definition of Public Notification (PN) Tiers

Tier 1: Violations and Other Situations Requiring Notice Within 24 Hours

1. Violation of the MCL for total coliform, when fecal coliform or E. coli are present in the water distribution system, or failure to test for fecal coliform or E. coli when any repeat sample tests positive for coliform

2. Violation of the MCL for nitrate, nitrite, or total nitrate and nitrite, or failure to take a confirmation sample within 24 hours of the system's receipt of the first sample showing an exceedance of the nitrate or nitrite MCL

3. Exceedance of the nitrate MCL (10 mg/l) by non-community water systems, where permitted to exceed the MCL (up to 20 mg/l) by the primacy agency

4. Violation of the MRDL for chlorine dioxide, when one or more of the samples taken in the distribution system on the day after exceeding the MRDL at the entrance of the distribution system also exceed the MRDL, or when the required samples are not taken in the distribution system

5.
Violation of the turbidity MCL of 5 NTU, where the primacy agency determines after consultation that a Tier 1 notice is required, or where consultation does not occur within 24 hours after the system learns of the violation

6. Violation of the treatment technique requirement resulting from a single exceedance of the maximum allowable turbidity limit, where the primacy agency determines after consultation that a Tier 1 notice is required, or where consultation does not take place within 24 hours after the system learns of the violation

7. Occurrence of a waterborne disease outbreak, as defined in 40 CFR 141.2, or other waterborne emergency

8. Other violations or situations with significant potential to have serious adverse effects on human health as a result of short-term exposure, as determined by the primacy agency either in its regulations or on a case-by-case basis

Tier 2: Violations Requiring Notice Within 30 Days

1. All violations of the MCL, MRDL, and treatment technique requirements, except where a Tier 1 notice is required

2. Violations of the monitoring requirements where the primacy agency determines that a Tier 2 public notice is required, taking into account potential health impacts and persistence of the violation

3. Failure to comply with the terms and conditions of any variance or exemption in place

Tier 3: Violations and Other Situations Requiring Notice Within 1 Year

1. Monitoring violations, except where a Tier 1 notice is required or the primacy agency determines that the violation requires a Tier 2 notice

2. Failure to comply with an established testing procedure, except where a Tier 1 notice is required or the primacy agency determines that the violation requires a Tier 2 notice

3. Operation under a variance granted under §1415 or an exemption granted under §1416 of the Safe Drinking Water Act

4. Availability of unregulated contaminant monitoring results

5. Exceedance of the secondary maximum contaminant level for fluoride