United States Environmental Protection Agency
Office of Water (4606)
EPA 816-R-00-020
October 2000
www.epa.gov/safewater

Data Reliability Analysis of the EPA Safe Drinking Water Information System / Federal Version (SDWIS/FED)

Acknowledgements

The following people contributed to this project and the preparation of this report:

Jan Auerbach: Project lead
Lee Kyle: Data quality quantification; principal author of this report
Fran Haertel: Data quality characterization; state-specific data quality reports

Members of the Data Reliability Stakeholders Workgroup

EPA Headquarters:
Jan Auerbach, Chair; Chief, Information Management Branch (IMB)
Fran Haertel, Environmental Protection Specialist, IMB
Lee Kyle, Statistician, IMB
Ken Harmon, Office of Enforcement and Compliance Assistance (OECA)

EPA Regions:
Chris Ryan, Region I
Mark Rasso, Region II SDWIS/FED coordinator
Tom Poleck, Region V SDWIS/FED coordinator
Andy Waite, Region VI SDWIS/STATE coordinator
Aundrey Wilkins, Region VIII; Jack Rychecky, Region VIII Branch Chief

States:
Florida: Kenna Study, Drinking Water Manager
Iowa: Dennis Alt, Drinking Water Manager
Utah: Kevin Brown, Drinking Water Administrator
Washington: Peggy Johnson, Drinking Water Manager

Association of State Drinking Water Administrators (ASDWA): Vanessa Leiby, Executive Director; Bob Blanco

Industry:
American Metropolitan Water Administrators: David Denig-Chakroff
American Water Works Association (AWWA): Teryl Pajor, Dan Schechter
National Association of Water Companies: John Hroncich

Other:
State Lab: Steve Jennis
Natural Resources Defense Council (NRDC): Eric Olson, Adriana Quintaro

Contents

EXECUTIVE SUMMARY
PART I: MANAGEMENT SUMMARY
1 Introduction
  1.1 Background
  1.2 Perspective/context
2 Summary of findings
3 Corrective actions
  3.1 Early actions taken
  3.2 Actions taken resulting from September 1999 Stakeholder Workgroup recommendations
  3.3 Future actions planned
  3.4 Implementation process
PART II: DETAILED FINDINGS
4 National estimates of the quality of SDWIS/FED data
  4.1 Data quality defined
  4.2 Methodology
  4.3 Perspective/context
  4.4 Data verifications
  4.5 Industry surveys
  4.6 Frozen database comparison—Timeliness estimates
  4.7 Comparison of SDWIS/FED to Envirofacts
5 Additional data quality analyses
  5.1 States' reporting of violations data
  5.2 Comparison of states' reporting of Annual Compliance Report (ACR) data to SDWIS/FED data
  5.3 Error reports analysis—data transfer errors
  5.4 State structures analysis
  5.5 State summaries of SDWIS/FED data quality, and recommended improvements
Appendix A—Stakeholders Working Group recommendations

EXECUTIVE SUMMARY

In 1998, EPA launched a major effort to assess the quality of its drinking water data (the data used to assess compliance with the Safe Drinking Water Act) and found that the data need to be improved. This report is the culmination of that effort. It provides specific findings and estimates of the quality of the data that are in, and should be in, the EPA Safe Drinking Water Information System (SDWIS/FED). These results in no way should be interpreted as a reflection of drinking water quality, which overall remains high.

SDWIS/FED is EPA's drinking water database. It contains drinking water data for approximately 170,000 public water systems serving over 250 million people.
Each water system has inventory data describing the water system, data on any violations it has incurred, and the resulting enforcement actions taken by states and/or EPA to ensure drinking water protection. EPA found that the data quality for a selected subset of the required inventory data elements is high, that the data quality of violations data is low, and that enforcement actions data are of moderate quality. SDWIS/FED data quality findings were similar across water system types and size categories.

The violations listed in SDWIS/FED are accurate, but they are incomplete. A number of states have never reported certain types of violations. While industry found a few cases of over-reporting in the past (which have been corrected), EPA found very little over-reporting of violations in its analyses.

EPA and states have taken or scheduled a number of corrective actions, which are described in the Management Summary. These actions include (but are not limited to) more and improved training on rule implementation and data entry, additional and revised data audits, and improved data error interpretation. These corrective actions should improve the quality of the data reported nationally, as well as improve the public's understanding of the overall high quality of drinking water supplied by most systems in the United States.

PART I: MANAGEMENT SUMMARY

1 Introduction

In 1998, EPA launched a major effort to assess the quality of its drinking water data (the data used to assess compliance with the Safe Drinking Water Act) and found that the data need to be improved. This report provides estimates of the quality of the data that are in, and should be in, the EPA Safe Drinking Water Information System (SDWIS/FED). These results in no way should be interpreted as a reflection of drinking water quality, which overall remains high.

SDWIS/FED is EPA's drinking water database. It contains drinking water data for approximately 170,000 public water systems serving over 250 million people. Each water system has inventory data describing the water system, data on any violations it has incurred, and the resulting enforcement actions taken by states and/or EPA. EPA found that the data quality for a subset of 8 required inventory data elements is high, that the data quality of violations data is low (principally because they are incomplete), and that enforcement actions data are of moderate quality.

1.1 Background

In the summer of 1998, some drinking water utility trade associations advised their members to check the EPA "Envirofacts" website, which contains violations and enforcement actions information on individual water systems. This was to prepare them for possible inquiries from their customers. Some larger utilities found gross errors in the reporting of violations against their water systems, specifically in cases of "over-reporting": violations listed in SDWIS/FED that never occurred. Several of the utilities met with the incoming Assistant Administrator for the Office of Water, J. Charles Fox, to voice their concerns over the poor quality of the data that were available to the public.

Also that summer, EPA was preparing the first Annual Compliance Report (ACR), as required by the 1996 Amendments to the Safe Drinking Water Act. The Amendments required states to prepare state reports and EPA to compile them into a national report.
When EPA compared the data in the state reports to the data these states submitted to SDWIS/FED, it found more than a 30% overall difference in data which should have been virtually identical.

These two concerns led the Assistant Administrator to issue a letter to the states on September 3, 1998, calling for a major initiative to quantify and characterize the quality of the data in SDWIS/FED. EPA began this initiative by holding three national public meetings on SDWIS/FED data quality in November and December 1998. EPA then formed a data reliability stakeholders workgroup comprised of people from EPA headquarters and regional offices, state drinking water programs, water utilities, industry associations, laboratories, and an environmental non-profit organization. The workgroup considered the comments from the public meetings and helped EPA develop a Data Quality Action Plan.

The Data Quality Action Plan, dated December 31, 1998, consisted of 4 major components:

1. Establish a SDWIS/FED data quality goal: "SDWIS/FED will contain 100% complete, accurate, timely, and consistent data which portray the data submitted by public water systems and primacy agencies, consistent with the Safe Drinking Water Act (SDWA) requirements. This goal will be advanced through interim milestones, which can be set once the current level of SDWIS/FED data quality is determined."

2. Improve the way SDWIS/FED data are presented in the EPA Envirofacts website. Several water utilities and other stakeholders raised concerns about what water system compliance information was available on the EPA Envirofacts website, and how it was displayed.

3. Take interim actions to improve SDWIS/FED data quality (the status of these actions is discussed in Section 3.1).

4. Quantify and qualify the quality of SDWIS/FED data.

EPA used several analyses to quantify and characterize data quality. The overall SDWIS/FED data quality estimates for inventory, violations, and enforcement actions data are based primarily on the findings from the data verifications analysis, with input from an analysis comparing Annual Compliance Report (ACR) data to data in SDWIS/FED. The data verifications analysis included 29 data verification audits conducted in 27 states between 1996 and 1998. A total of 1,857 systems were audited (see Section 4.4 for details).

Analyses used to quantify and characterize SDWIS/FED data quality:

• Data verifications—reviews of data in state files that provided numerical estimates of overall SDWIS/FED data quality
• Industry surveys—water systems' reviews of SDWIS/FED data that provided numerical estimates of the accuracy of data in SDWIS/FED
• Frozen database comparison—to develop numerical estimates of the timeliness with which violations are reported
• Comparison of SDWIS/FED data to Envirofacts data—to check for data transmission errors from one data set to the other
• Comparison of states' reporting of 1997 Annual Compliance Report (ACR) data to SDWIS/FED data—provided ratios of under-/over-reporting and reasons for discrepancies
• Errors analysis—to evaluate errors incurred in transferring data to SDWIS/FED

Initial SDWIS/FED data quality estimates were shared with states and EPA regions in summer 1999. Any errors found were checked and corrected. Some of the states' concerns are discussed in Section 4.4.1.3 of this report. Since then, the other analyses have been completed, including the industry surveys. The overall findings are presented in this report.
The outcome of the SDWIS/FED data quality analysis is a benchmark of SDWIS/FED data quality and a better understanding of where greater attention needs to be focused to improve it. The quantitative portion of this analysis also provides a water system perspective (the percent of water systems having violations or enforcement actions), where available, for additional context. The qualitative portion ties together numerical and non-numerical information from a number of analyses in an attempt to further characterize where the problems are occurring, and why.

1.2 Perspective/context

These results in no way should be interpreted as a reflection of drinking water quality, which overall remains high. Nor does this analysis question the accuracy of the data submitted by laboratories or water systems to states (inaccurate lab results, fraud, data falsification, etc.). The thousands of compliance decisions that are made correctly by state drinking water programs are not enumerated. Only the violations and enforcement actions appear, because SDWIS/FED is an exceptions database (in other words, states do not provide sample data on regulated contaminants; they only report to SDWIS/FED when an "exception," such as a violation or enforcement action, has occurred). As will be shown, only a small percentage of systems have any health-based violations. Many states have taken corrective steps to improve their SDWIS/FED data quality since these data were gathered.

2 Summary of findings

Summary estimates of SDWIS/FED data quality are presented in Text Box 1. Detailed estimates are contained in Part II of this report.

Inventory data

Parameters checked (8 inventory parameters): system status (active or inactive), water system type, primary source of water, population served, # service connections, address, name, water system ID.

• The overall quality of the 8 core SDWIS/FED inventory data parameters is high. That is, only 4% of the inventory parameters checked had any discrepancies (discrepancies are differences in data, missing data, or errors). The two parameters that change most frequently—population served and number of service connections—had the highest discrepancy rates.
• SDWIS/FED data quality estimates are very similar across water system types. These results are corroborated by the industry surveys.

Violations data

• The overall quality of SDWIS/FED violations data is moderately high (estimated at 68%) for the Total Coliform Rule standard (an acute health-effects measure). However, it is very low for other health-based standards, including Chemicals, Radionuclides, and the Surface Water Treatment Rule, and for monitoring/reporting requirements.
• Most of the discrepancies are because of unrecorded and unreported violations. This accounts for 56% of all MCL discrepancies, 83% of SWTR TT discrepancies, and 94% of all monitoring/reporting discrepancies. Data flow discrepancies (data in state databases but not SDWIS/FED) account for the remainder.
• The data that are reported in SDWIS/FED are highly accurate overall, in part because edit checks reject data which are transferred incorrectly.
• Data quality estimates are similar across water system types; this is corroborated by the industry surveys.
• Very little indication of over-reporting of violations was found (less than 0.7% of violation discrepancies).
• A number of states have never reported certain types of violations.
• Many states are not meeting the 90-day deadline for reporting violations. Only 68% of violations were reported on time.
• Violations reported using the Traditional method (selected data replacement or correction) appear to be more timely than those reported using the Total Replace method (replacing the entire data set each time changes are made).

Enforcement actions data

• SDWIS/FED enforcement actions data were found to be 87% complete and 83% accurate. Results were similar across water system types.

Other findings

• No discrepancies were found between data in SDWIS/FED and Envirofacts.
• "Data entry problems" was the most frequently cited reason for discrepancies between ACR data reported by states and SDWIS/FED data. "Resource limitations" was the next most common reason.
• Using the Traditional data entry method, 20% of inventory data and 32% of violations and enforcement actions data are being rejected. It was not possible to perform a similar calculation for the Total Replace method.
• Only 25% of all states were successful in resubmitting data on their first attempt. Of those not successful on the first attempt, 82% of the error types were data entry errors. Seven percent or less represent SDWIS/FED software limitations and problems.
• Characteristics of state programs that result in high quality SDWIS/FED data include routine, meaningful communication at all levels; annual PWS notification of monitoring schedules; and automated monitoring compliance determination.

Text Box 1: SDWIS/FED Data Quality Summary Statistics

Inventory data

The SDWIS/FED data quality of the 8 inventory parameters checked is estimated to be 96%.

  Number of data points     16,006   (approximately 2,000 systems reviewed times 8 parameters checked)
  Discrepancies: number        646   (the number of instances where the DV audit team concluded that the parameter in SDWIS/FED was incorrect)
  Discrepancies: percent      4.0%
  SDWIS/FED data quality       96%   (data quality = % of data without discrepancies (errors))

SDWIS/FED inventory data quality by parameter: system status (active or inactive)—97%, water system type—97%, primary source of water—98%, population served—91%, # service connections—92%, address—95%, name—98%, water system ID—100%.

Violations data

The SDWIS/FED data quality of violations data ranges from 7% for Surface Water Treatment Rule Treatment Technique (SWTR TT) violations to 68% for Total Coliform Rule Maximum Contaminant Level (TCR MCL) violations. Violations data listed in SDWIS/FED are accurate, but incomplete. In addition, 68% of violations are reported on time.

                             TCR MCL   Other MCL   SWTR TT   Total M/R
  % systems w/ violations      6.1%      <4.3%       9.6%       <78%
  Number of violations          162         59          94      5,091
  Discrepancies: number          52         50          87      4,613
  Discrepancies: percent        32%        85%         93%        91%
  % Completeness                68%        19%         11%        10%
  % Accuracy                    99%        79%         67%        95%
  SDWIS/FED data quality        68%        15%          7%         9%

Notes:
• The "% systems w/ violations" line is not part of the data quality calculations, but lends perspective: 78% of the 1,857 systems reviewed had at least 1 violation of any type during the 1-3 year period of review for contaminants and rules.
• "Number of violations" is the number of violations that the DVs determined should have been reported to SDWIS/FED, whether or not they were.
• "Discrepancies" are the errors cited in the DVs: 93% were for violations not designated by states as violations; the remaining 7% occurred between state databases and SDWIS/FED.
• Completeness: % of violations that should be in SDWIS/FED that made it in.
• Accuracy: % of violations in SDWIS/FED that are correct.
• SDWIS/FED data quality = % of data without discrepancies (errors).

Enforcement actions data

The SDWIS/FED data quality of formal enforcement actions is estimated to be 72%.
All formal enforcement actions, which are issued by the state and/or EPA in response to violations, are required to be reported to SDWIS/FED.

  % systems with enforcement actions      20%
  Number of enforcement actions         1,032
  Discrepancies: number                   287
  Discrepancies: percent                  28%
  % Completeness                          87%
  % Accuracy                              83%
  SDWIS/FED data quality                  72%

Notes:
• Discrepancies are the errors cited in the DVs. The DV audits only measure the difference between state databases and SDWIS/FED; auditors did not assume there should be an enforcement action unless the state actually took one.
• Completeness: % of enforcement actions in state files that made it into SDWIS/FED.
• Accuracy: % of enforcement actions in SDWIS/FED that are correct.
• SDWIS/FED data quality = % of data without discrepancies (errors).

3 Corrective actions

Text Box 2 defines the 4 elements of SDWIS/FED data quality and correlates their improvement to the corrective actions discussed below.

3.1 Early actions taken

Rather than waiting for the results of the analyses which would quantify and qualify the quality of the data in SDWIS/FED, the data reliability stakeholders workgroup recommended in December 1998, and EPA subsequently completed, several interim actions to improve SDWIS/FED data quality.

EPA HQ:

• Improved the way SDWIS/FED data are presented in the EPA Envirofacts website. In response to concerns about the quality of older SDWIS/FED data, only violations and enforcement actions incurred since 1993 are now displayed in Envirofacts. Beginning in 2003, ten years' worth of data will be displayed. EPA also improved the way SDWIS/FED data in Envirofacts are displayed. Major changes included combining violations and enforcement actions so that they are displayed in the same table (previously, users had to match violations and enforcement actions by looking at two different tables and matching the violation identification numbers), showing health-based violations separately from monitoring and other violations, and adding links to utilities' Consumer Confidence Reports (CCRs) on-line using EPA's new CCR catalog. Better descriptions of what violations and enforcement actions are, as well as additional links to state pages and contaminant fact sheets, were also added.
• Prioritized and corrected deficiencies already identified in the data entry process
• Accelerated the development and implementation of SDWIS/STATE
• Provided additional error check routines in SDWIS/FED
• Improved existing data entry tools such as the data entry troubleshooter's guide
• Accelerated efforts to develop new tools to simplify data retrieval, and accelerated efforts to improve existing reporting tools
• Developed an interim mechanism to enable utilities to confirm their data before they are officially accepted in SDWIS/FED

EPA Regions took additional steps to ensure that quarterly submissions are reviewed and errors are checked prior to the quarterly freeze in SDWIS/FED.

EPA and States drafted quality assurance manuals to help states and regions operate the drinking water program and report drinking water information.

3.2 Actions taken resulting from September 1999 Stakeholder Workgroup recommendations

The Stakeholder Workgroup reviewed the preliminary findings of the analyses used to quantify and characterize SDWIS/FED data quality in September 1999. Many of the actions taken or scheduled that are listed below resulted from the workgroup's prioritized recommendations, which are listed in Appendix A.
3.2.1 EPA HQ actions taken

• Training: EPA staff have designed implementation and data reporting training courses for the Lead and Copper Rule Minor Revisions (LCRMR) and the Public Notification Rule (PN). Several courses have been conducted for states and regions. EPA has established a contractual arrangement for states and regions to obtain one-on-one, on-site data management assistance. EPA has expanded its offering of generic data entry and troubleshooting (i.e., correcting errors) training courses.
• The SDWIS/FED Edit/Update Summary Report has been completely redesigned to fully account for and document the processing results of each data submission file.

3.3 Future actions planned

3.3.1 EPA HQ actions

• Provide additional training by: developing a schedule for implementation and reporting training courses for the Chemicals/Radionuclides rules, the Surface Water Treatment Rule, and the Total Coliform Rule, and developing training courses and materials for each new rule. The training will include implementation, compliance determination, and reporting requirements.
• Improve the data verification audits by: revising the Data Verification Protocol to incorporate workgroup recommendations, completing a version of the Data Verification Protocol for states to use in conducting a self-audit, and completing 11 data verification audits in FY2000 (more if funds allow). If 17 audits were conducted per year, the data quality in each state could be assessed every 3 years (audits cost roughly $25,000 each).
• Complete a version of the error report which managers can use to help them improve data entry.
• Target attention to some states and regions, based on the results of individual state analyses and ongoing data verification audits. EPA will conduct meetings to address issues, target technical assistance, and develop plans of action with such states and regions.
• Continue to calculate SDWIS/FED data quality, including: national estimates of SDWIS/FED data quality at least every 3 years, or more frequently if data from a sufficient number of data verifications analyses are available; the ACR vs. SDWIS/FED analysis, national estimates of the timeliness of violations data reporting, and the number of states reporting violations by contaminant/rule and water system type, annually; and error rates by error code, quarterly.

3.3.2 EPA Regional actions

• Conduct the errors analyses quarterly to determine which error conditions are occurring most frequently.

3.3.3 State actions

States may take the following actions to improve data quality, but specific actions in each state will be contingent on its particular situation.
• Notify utilities annually of compliance monitoring schedules
• Implement and participate, through the Association of State Drinking Water Administrators (ASDWA), in peer reviews among states
• Conduct self-audits using the revised data verifications protocol
• Share software, tracking systems, and compliance determination modules among states that support rule implementation
• Evaluate current information management systems and consider adopting SDWIS/STATE
• Participate in EPA-provided training for rule implementation, reporting requirements, and data entry
• Develop and implement a quality assurance program

Potential categories for SDWIS/FED data quality goals:
• Overall inventory
• Overall enforcement actions
• Violations: TCR MCL, Other MCL, SWTR TT, LCR TT, M/R

3.3.4 Joint EPA-State actions

• Work together to establish goals for improving SDWIS/FED data quality at the national level, as assessed through data verifications results.
• Continue early involvement of states and regions in rulemaking with a focus on (1) streamlining reporting requirements and (2) simplifying rules to ease interpretation and implementation, including reporting requirements.

3.4 Implementation process

The ASDWA/EPA Data Management Steering Committee (DMSC), in conjunction with the Data Sharing/Data Quality Committee (DSC), will continue to focus on data quality improvement issues identified in this report, and will propose future corrective actions and strategies for EPA and States. Individual state-specific recommendations will be communicated to the states and EPA Regions through State Summary reports. Joint discussions will be conducted and an implementation schedule developed. Follow-up activities will be conducted through the normal mid-year and end-of-year program evaluation process. Generic state corrective actions will be pursued through the State/EPA annual Workplan process.

Formal implementation could begin as early as FY 2001. Many states have already begun state-specific corrective actions, as has EPA. Once finalized, appropriate standard operating procedures will be developed and incorporated in the EPA PWSS Data Management Quality Assurance Manual.

Collectively, the steps already taken by EPA and States, and those planned, are expected to significantly improve the quality of data in SDWIS/FED. These steps should also improve public understanding of the high quality of drinking water supplied to consumers by most water systems in the United States.

Text Box 2: Improving the 4 Elements of SDWIS/FED Data Quality

There are 4 major elements of data quality:
1. Completeness—what percent of data that should be in SDWIS/FED is there?
2. Accuracy—how accurate are the data in SDWIS/FED?
3. Timeliness—what percent of violations data are being reported within a quarter after their compliance period end dates?
4. Consistency—are the regulations being interpreted consistently?
Actions taken or planned should improve these elements of SDWIS/FED data quality (Completeness, Accuracy, Timeliness, and Consistency) as follows:

Early actions taken by EPA HQ:
• Improved the way data are presented in the EPA Envirofacts website
• Corrected deficiencies in the data entry process
• Accelerated the development and implementation of SDWIS/STATE
• Provided additional error check routines in SDWIS/FED
• Improved existing data entry tools such as the data entry troubleshooter's guide
• Accelerated efforts to develop new tools to simplify data retrieval
• Developed an interim mechanism to enable utilities to confirm their data before they are officially accepted in SDWIS/FED

EPA HQ actions taken resulting from September 1999 Stakeholder Workgroup recommendations:
• Designed implementation and data reporting training classes for the LCRMR and PN Rules
• Established an arrangement for states and regions to obtain one-on-one, on-site data management assistance
• Expanded the offering of generic data entry and troubleshooting training courses
• Redesigned the SDWIS/FED Edit/Update Summary Report

EPA HQ actions planned:
• Provide additional rule-specific training for existing and upcoming rules, including implementation, compliance determination, and reporting requirements
• Improve the data verification audits to enable states to conduct self-audits; perform additional audits
• Complete a version of the error report which managers can use to help them improve data entry
• Target poorer-performing states and regions; conduct meetings to discuss issues, target technical assistance, and develop plans of action with such states and regions
• Continue to quantify SDWIS/FED data quality (benchmark data quality)

EPA Regional actions planned:
• Conduct the errors analysis quarterly to determine which data entry error conditions are occurring most frequently

State actions planned (including reducing M/R violations):
• Notify utilities annually of compliance monitoring schedules
• Implement and participate through ASDWA in peer reviews among states
• Conduct self-audits using the revised data verification protocol
• Share software, tracking systems, and compliance determination modules among states that support rule implementation
• Evaluate current information systems and consider adopting SDWIS/STATE
• Participate in EPA-provided training for rule implementation, reporting requirements, and data entry
• Develop and implement a quality assurance program

Joint EPA-State actions planned:
• Work together to establish goals for improving SDWIS/FED data quality, for specific categories of data, at the national level
• Continue early involvement of states and regions in rulemaking with a focus on (1) streamlining reporting requirements and (2) simplifying rules to ease interpretation and implementation, including reporting requirements

PART II: DETAILED FINDINGS

4 National estimates of the quality of SDWIS/FED data

Part II provides details of the analyses conducted and the estimates of SDWIS/FED data quality. This section provides a definition of data quality, describes the analytical methodology used, and provides detailed estimates of the quality of inventory, violations, and enforcement actions data in SDWIS/FED.

4.1 Data quality defined

Two questions need to be answered in order to estimate the quality of SDWIS/FED data:
1. What should be in SDWIS/FED (and is missing)?
2. How accurate is what is in SDWIS/FED?

There are four major elements of data quality.
The first two are essentially a variation on the two questions above:
• Completeness—what percent of data that should be in SDWIS/FED is there?
• Accuracy—how accurate are the data in SDWIS/FED?

There are two additional elements of data quality:
• Timeliness—what percent of violations data are being reported within a quarter after their compliance period end dates? Timeliness is a component of Completeness.
• Consistency—are the regulations being interpreted consistently?

4.2 Methodology

4.2.1 How EPA quantified data quality

This quantification is based on discrepancy rates for inventory, violations, and enforcement actions data. Discrepancy rates are defined as the percent of data that should be in SDWIS/FED that have errors, are missing, or that do not match between state databases and SDWIS/FED. Overall data quality (for inventory, violations, and enforcement actions data) is defined as the percent of data with no discrepancies. If, for example, 20% of the data have discrepancies, the SDWIS/FED data quality is 80%.

For violations and enforcement actions data, overall data quality can also be defined as the product of Completeness and Accuracy. Because inventory data are not exceptions-based data, they are not quantified the same way; instead, they are quantified as a single number.

Accuracy is conditional on Completeness: it measures the accuracy of the data, given the data are complete. For example, if there are 100 violations that should be in SDWIS/FED, and 60 make it in (Completeness), and, of those, 48 are accurate (Accuracy), then:

  Overall quality = 60/100 x 48/60 = 48%

Timeliness is a component of Completeness and is included in the Completeness calculations; it was quantified separately in the frozen database analysis. Consistency is not quantified in this analysis, but is implicit, to some degree, in the data verifications.

4.2.2 Which estimates for which data

For each data type below, the parameters for which EPA calculated data quality estimates are listed, along with the analyses used to generate those estimates.

Inventory—core data elements: 1. status (i.e., water system is active or inactive), 2. type of public water system (e.g., community, transient), 3. primary source of water, 4. population served, 5. number of service connections, 6. address, 7. name, 8. PWS ID
  Overall SDWIS/FED data quality: data verifications analysis; industry surveys

Violations—all violations
  Overall SDWIS/FED data quality: data verifications analysis
  Completeness: data verifications analysis
  Accuracy: data verifications analysis (with input from the Annual Compliance Report vs. SDWIS/FED analysis); industry surveys
  Timeliness: frozen database comparison

Enforcement actions—all required to be reported to SDWIS/FED
  Overall SDWIS/FED data quality: data verifications analysis
  Completeness: data verifications analysis
  Accuracy: data verifications analysis; industry surveys
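The two definitions in Section 4.2.1 are equivalent and easy to restate in code. The sketch below is ours, for illustration only (the function names are not EPA's); it reproduces the worked example above.

```python
def overall_quality(should_be_in: int, made_it_in: int, accurate: int) -> float:
    """Overall data quality = Completeness x Accuracy.

    Completeness = made_it_in / should_be_in
    Accuracy     = accurate / made_it_in (conditional on the data being in)
    """
    completeness = made_it_in / should_be_in
    accuracy = accurate / made_it_in
    return completeness * accuracy

def quality_from_discrepancies(data_points: int, discrepancies: int) -> float:
    """Equivalent view: quality = % of data points without discrepancies."""
    return 1 - discrepancies / data_points

# The worked example from Section 4.2.1: 100 violations should be in
# SDWIS/FED, 60 make it in, and 48 of those 60 are accurate.
assert abs(overall_quality(100, 60, 48) - 0.48) < 1e-9  # 60/100 * 48/60 = 48%
# The same 48% falls out of the discrepancy view: 40 missing + 12 inaccurate.
assert abs(quality_from_discrepancies(100, 52) - 0.48) < 1e-9
```

The second assertion shows why the two views agree: every missing violation and every inaccurate entry each count as one discrepancy against the data that should be in SDWIS/FED.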
4.3 Perspective/context

These results in no way should be interpreted as a reflection of drinking water quality, which overall remains high. Nor does this analysis question the accuracy of the data submitted by laboratories or water systems to states (inaccurate lab results, fraud, data falsification, etc.). The thousands of compliance decisions that are made correctly by state drinking water administrators are not enumerated. Only the violations and enforcement actions appear, because SDWIS/FED is an exceptions database (in other words, states do not provide sample data on regulated contaminants; they only report to SDWIS/FED when an "exception," such as a violation or enforcement action, has occurred). As will be shown, only a small percentage of systems have any health-based violations. Many states have taken corrective steps to improve their SDWIS/FED data quality since these data were gathered.

4.4 Data verifications

4.4.1 Background

The data verifications analysis is the only analysis that assesses the first key component of data quality for violations and enforcement actions data: Completeness, or the percentage of these data which should be in SDWIS/FED that are. The data verifications analysis also yields overall SDWIS/FED data quality estimates for inventory data (as do the industry surveys).

The purpose of data verification audits is to determine whether a state is in compliance with that state's primacy agreement (since late 1996, auditors have considered guidance from Regions in addition to Federal regulations). Recommendations contained in the audit are intended to assist states in correcting deficiencies in their programs and to improve SDWIS/FED data quality.

An independent contractor has been performing data verifications since 1991. The contractor selects a (semi-)random sample of each type of water system in the state. During an audit, auditors primarily look at state files and database(s). The results are intended to be representative of the quality of drinking water data throughout the state with at least an 80% confidence level and a 7.5% margin of error. States have the opportunity to review the draft report and provide appropriate documentation required to adjust or revise the final report. Most states have accepted the final results of their data verification audits.

Prior to this analysis, data verification reports tabulated the number of systems having discrepancies. For this analysis, EPA tasked the contractor to re-tabulate the data on a data point basis—as a true SDWIS/FED data audit. That is, the contractor compared data that should have been reported to SDWIS/FED to those that actually were reported, and cited reasons for each discrepancy. All data verifications are now tabulated in this way.

For this analysis, EPA selected all data verifications done between 1996 and 1998. This included 29 data verification audits from 27 states. A total of 1,857 systems were audited. Some of the data verifications focused only on specific rules/contaminants. Results from the portion of the audit associated with the Lead and Copper Rule (LCR) are not included in this analysis due to questions of regulatory interpretation, which have not yet been resolved.
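For context on the sampling figures above, here is a minimal sketch (our own; this report does not describe the audit protocol's actual sampling design) of the standard sample-size formula behind a stated confidence level and margin of error, assuming simple random sampling of systems and a worst-case proportion of 0.5.

```python
import math

def sample_size(z: float = 1.2816, margin: float = 0.075, p: float = 0.5,
                population: int | None = None) -> int:
    """Systems to sample to estimate a proportion within +/- margin.

    z = 1.2816 is the two-sided normal critical value for 80% confidence;
    p = 0.5 is the most conservative (largest-sample) assumption.
    """
    n = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite population correction for a small state inventory.
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)

print(sample_size())                # 73 systems under simple random sampling
print(sample_size(population=600))  # 66 for a hypothetical 600-system state
```

Under these assumptions, a few dozen systems per state suffice, which is consistent with 1,857 systems audited across 29 audits.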
4.4.1.1 States included in this analysis, by EPA Region

  Region I:    CT, MA, ME, NH, RI*, VT
  Region II:   VI
  Region III:  DE, MD, PA*, WV
  Region IV:   AL, FL, GA, NC
  Region V:    MI, MN
  Region VI:   LA, NM, OK, TX
  Region VII:  IA, NE
  Region VIII: SD, WY
  Region IX:   AZ
  Region X:    WA

  * 2 audits were performed

4.4.1.2 Period of review for states reviewed during 1996-1998

  Total Coliform Rule (TCR)       Most recent four quarters in SDWIS/FED
  Nitrates                        Most recent three calendar years
  Nitrites                        1993-1995
  IOCs                            1993-1995; back to 1990 if grandfathered
  VOCs                            1993-1996; back to 1988 if grandfathered
  SOCs                            1993-1995; back to 1990 if grandfathered
  Radionuclides                   Most recent two samples
  Total Trihalomethanes           Most recent four quarters available in SDWIS/FED
  Surface Water Treatment Rule    Most recent four quarters available in SDWIS/FED
  Enforcement                     Time period applicable to the related violation

4.4.1.3 Summary of some states' concerns about using data verifications results to quantify SDWIS/FED data quality

After EPA calculated SDWIS/FED data quality based on the data verifications, it shared the draft results with the states. Many states accepted the findings, and the methods used to derive them.

• One of the states' most widespread concerns was that the public would misconstrue the quality estimates as an indication of how well states are running their drinking water programs, or as a measure of their drinking water quality. They felt a more accurate picture of state data quality would consider all the decisions a state is required to make, not just violation decisions. For example, a state may determine that a utility monitored correctly in eight out of ten instances, but fail to issue one of the two violations which should have been issued. States noted that data quality in this case was really 90% (nine correct decisions out of ten: eight appropriate monitoring determinations, plus one of the two failure-to-monitor violations issued). In this analysis, only violation opportunities are considered. Since there were two violation opportunities and one of them was missed, this analysis would calculate SDWIS/FED data quality in this instance as 50%.

• A number of states pointed out that one improper determination could turn into multiple deficiencies. For example, if a system, due to being mis-categorized as a smaller system, collects 1 coliform sample per month instead of the 2 required, the data verifications will list a dozen violation discrepancies for the year. However, EPA notes that this is at least partially balanced by the fact that 1 missing sample is counted as 1 M/R discrepancy, even though some sample bottles are to be used for several contaminants and a missed sample could therefore represent up to 30 M/R violations (for a missing Synthetic Organic Chemicals (SOC) sample).

• A few data verifications were targeted to states having known data quality concerns. Some state reports are therefore better characterized as "worst case" scenarios.

• Some Federal requirements had just become effective in the time frame covered by the audits, and many states were still in the process of adopting state rules and developing state data systems. Some of the data discrepancies are a function of normal and expected "start-up" problems. Some states felt that a snapshot taken today is likely to show a much better picture than one taken 3 years ago, because many states have made data quality improvements since, and resulting from, their audits. Data verifications conducted in 1999 and later review more recent compliance periods; their results will be compared to the results in this analysis to measure the improvements suggested here.
• A few states contest their initial data verification audits. Some states believe that the data verification review team overlooked existing data (particularly monitoring results) and incorrectly determined that a violation had occurred when it had not. A number of states have pointed out errors in the data verifications findings; these have since been investigated and corrected, and are reflected in this report.

Despite these concerns, EPA believes the findings are representative of SDWIS/FED data quality at the national level. Even slight biases (some of which tend to cancel each other out) do not significantly change the overall findings.

4.4.2 Confidence in findings

This is not a scientific survey, and therefore statistical confidence intervals are not included for most of the point estimates. However, EPA is confident that the findings represent the quality of SDWIS/FED data at the national level. First, the data verification audits are designed to be representative of the quality of drinking water data throughout the state with at least an 80% confidence level and a 7.5% margin of error. In addition, the audits have undergone scrutiny: in the summer of 1999, states and regions had an opportunity to review the findings of their audits, and any errors found were corrected. Second, EPA considers the summation of the 29 audits in 27 states to be representative of the quality of drinking water data at the national level. This was ascertained after EPA modeled the individual state findings mathematically using Bayesian statistics; the resulting probability curve was found to have a normal distribution. Third, EPA looked at data quality from many perspectives, and has compared estimates with the results of other analyses wherever possible. As will be discussed, findings from other analyses corroborated the data verifications findings.

4.4.3 State Annual Compliance Report (ACR) vs. SDWIS/FED

The data verifications analysis has a category for violations discrepancies between state databases and SDWIS/FED. However, it does not indicate what portion of these discrepancies represent under-reporting (data which are in state databases but not SDWIS/FED) and what portion represent over-reporting (data which are in SDWIS/FED but not state databases). It is necessary to make this distinction in order to yield estimates of Completeness and Accuracy. To accomplish this, EPA compared calendar year 1997 ACR data reported using state databases to 1997 data in SDWIS/FED. EPA calculated ratios of the magnitude of under-reporting to over-reporting for Chemical, Total Coliform Rule (TCR), and Surface Water Treatment Rule (SWTR) health-based violations and monitoring/reporting violations. These ratios were input to the data verifications analysis to enable EPA to calculate estimates of Completeness and Accuracy for violations data.

4.4.4 Inventory data

4.4.4.1 Estimates by parameter, and overall

Four percent of the 8 required inventory data points checked had discrepancies, or errors. In other words, the overall SDWIS/FED inventory data quality is estimated to be 96%, as shown below.

  Parameter                  Systems reviewed   Discrepancies   % Discrepancies   Data quality
  Status (active/inactive)        2,032               58             2.9%             97%
  Water system type               2,014               61             3.0%             97%
  Primary source of water         1,997               39             2.0%             98%
  Population served               1,996              184             9.2%             91%
  # Service connections           1,996              161             8.1%             92%
  Address                         1,979               99             5.0%             95%
  Name                            1,996               41             2.1%             98%
  PWS ID                          1,996                3             0.2%            100%
  Overall                        16,006              646             4.0%             96%

Each water system has 1 chance for a discrepancy for each parameter reviewed. The "Overall" row uses the sum of water systems reviewed for each parameter, which represents the total opportunities for a discrepancy.

The population served and # service connections parameters had the most discrepancies. A discrepancy in either of these categories is counted as such only if the difference is greater than 10%. Under several drinking water rules, the number of samples required to be taken is based on the population served, and therefore its accuracy is important.

4.4.4.2 Reasons for discrepancies

About one-half of the discrepancies were due to inconsistencies between data in state files and the state database(s); another one-third were due to inconsistencies between data in state database(s) and SDWIS/FED; most of the remaining one-sixth were due to late reporting, or no data found in state files.
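To make the aggregation concrete, the short sketch below (illustrative only; the counts are those from the table in Section 4.4.4.1) pools the per-parameter results into the overall 96% estimate, and shows the 10% threshold rule for the two count-type parameters. Treating the state file value as the baseline for the 10% test is our assumption.

```python
# (systems reviewed, discrepancies) per parameter, from Section 4.4.4.1
inventory = {
    "status": (2032, 58), "system type": (2014, 61),
    "primary source": (1997, 39), "population served": (1996, 184),
    "service connections": (1996, 161), "address": (1979, 99),
    "name": (1996, 41), "PWS ID": (1996, 3),
}

points = sum(n for n, _ in inventory.values())   # 16,006 opportunities
errors = sum(d for _, d in inventory.values())   # 646 discrepancies
print(f"Overall inventory quality: {1 - errors / points:.1%}")  # 96.0%

def count_discrepant(state_value: float, sdwis_value: float) -> bool:
    """Population served / # service connections count as a discrepancy
    only when the two values differ by more than 10% (baseline assumed
    to be the state file value)."""
    return abs(state_value - sdwis_value) > 0.10 * state_value
```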
The "Overall quality" column uses the sum of water systems reviewed for each parameter, which represents the total opportunities for a discrepancy. The population served and # service connections parameters had the most discrepancies. A discrepancy in either of these categories is counted as such if the difference is greater than 10%. Under several drinking water rules, the number of samples required to be taken is based on the population served and therefore its accuracy is important. 4.4.4.2 Reasons for discrepancies About one-half of the discrepancies were due to file inconsistencies between data in state files and the state database(s); another one-third were due to inconsistencies between data in state database(s) and SDWIS/FED; most of the remaining one-sixth were due to late reporting, or no data found in state files. 4.4.4.3 Estimates by system type and size Results from the data verifications analysis were very similar across system types, as shown below. None of the quality estimates for the 8 parameters listed above differed by more than 4%. 17 ------- CWS 97% NTNCWS 96% TNCWS 95% Unfortunately, the results of the data verifications analysis cannot be categorized by system size. The only way to get any approximation using this data is to look at system types as a proxy for system size. The information below lists the average population served by system type (from the 98Q4 frozen database, which was frozen in January 1999). CWS 4,645 NTNCWS 308 TNCWS 175 The average system size for NTNCWSs and TNCWSs is in the Very Small size category (25-500 population served), and for CWS the Medium size category (3,301-10,000). If these results can serve as a proxy for system size, then it appears that data quality may be similar across size categories. The industry surveys, discussed later, provide a direct measure of SDWIS/FED inventory data quality by system size so this report addresses this issue in Section 4.5.3.3. 4.4.5 Violations data 4.4.5.1 Estimates by violation type Listed below are SDWIS/FED data quality estimates for violations data. The first line of the table shows the percent of systems (by violation type) having any violations. Less than 10.4% of all systems audited in the data verifications had any Maximum Contaminant Level (MCL) violation, and less than 10% of the surface water systems audited had Surface Water Treatment Rule (SWTR) Treatment Technique (TT) violations. The estimate that slightly less than 78% of systems had M/R violations is based on the finding that 78% of all systems audited had at least one violation of any type, and M/R violations account for 94% of all violations. The 78% estimate also includes the small number of systems which only had LCR violations (earlier versions of the analysis included estimates for LCR, and it was not possible to subsequently remove LCR from this statistic). These percentages of systems having violations lend a systems perspective. They are not part of the calculations of SDWIS/FED data quality, which is based on a data point perspective. The remainder of the table reflects a data point perspective. 18 ------- % systems w/ violations Number of Violations Discrepancies Number Percent % Completeness % Accuracy SDWIS/FED data quality TCR Total Other MCL MCL Total SWTR MCL TT 6.1% <4.3% <10.4% 162 52 32% 68% 99% 68% 59 50 85% 19% 79% 15% 221 102 46% 55% 97% 54% 9.6% 94 87 93% 11% 67% 7% Total M/R <78% 5,090 4,613 91% 10% 95% 9% Legend: • TCR: Total Coliform Rule, applicable to all water systems. 
Coliforms pose an acute health risk • MCL: Maximum Contaminant Level violation • TT: Treatment Technique violation (MCLs and TTs are health-based violations) • M/R: Monitoring/Reporting violation TCR MCL data will serve as an example to describe this table: • 6.1% of systems reviewed incurred or should have incurred a total of 162 MCL violations • Of the 162 violations, there were 52 discrepancies, or errors. The discrepancy rate is 32%, and the corresponding SDWIS/FED data quality estimate is 68% (100%-32%). • Completeness and Accuracy—68% of the violations that should be reported in SDWIS/FED made it in, and of the violations in SDWIS/FED, 99% are accurate. According to these estimates, roughly 2/3 (68%) of all TCR MCL violations were reported completely and accurately. The SDWIS/FED data quality is 15% for Other MCLs, 7% for SWTR TTs, and 9% for M/R violations. Overall, the data that do make it into SDWIS/FED are accurate. In fact, 99% of the TCR MCL violations, 79% of Other MCL violations, and 95% of M/R violations listed in SDWIS/FED are accurate. However, only, 2/3 of SWTR TT violations listed in SDWIS/FED are estimated to be accurate. In other words, there may be some over- reporting of SWTR TTs in SDWIS/FED. The weak link in data quality is the large number of violations that never make it to SDWIS/FED (as estimated by Completeness). Only 1 out of every 9 SWTR TT violations that should be in SDWIS/FED make it in (11% Completeness), and only 1 out of every 10 M/R violations make it in. 4.4.5.2 Reasons for discrepancies The data verifications include several categories, or reasons, for violations discrepancies. Reason Not In state database No data found in state files Insufficient samples Different implementation policies Other In state database(s) but not SDWIS/FED In SDWIS/FED but not state database(s) Total TCR Other MCL MCL 0 0 31 4 16 1 52 0 0 22 0 26 2 50 Total SWTR MCL TT 0 0 53 4 42 3 102 0 0 72 0 11 4 87 M/R 3,492 205 417 56 252 25 4,613 M/R violations discrepancies account for the majority (94%) of all the discrepancies, with the largest category being "no data found in state files." This category applies to 19 ------- M/R violations only. If, for example, required sample results could not be found in any state files, a discrepancy would be cited if the state did not issue a M/R violation. This could also occur if water systems were told they could reduce the monitoring frequency for some requirements, but no record of a waiver having been issued was found. The "Different implementation policies" category means that the state did not determine compliance in accordance with their state primacy agreement. Since late 1996, auditors have also been factoring in any additional guidance provided by EPA Regional offices. Thus, as long as a state acts in accordance with its own EPA-approved regulations, or formal interpretive guidance issued by the region, no discrepancy is issued. The last category listed, "In SDWIS/FED but not state database(s)," represents over- reporting. AH the other categories represent under-reporting. Overall, 99.3% of all violation discrepancies found in the data verifications analysis are estimated to be from under-reporting. Only 32 out of the 4,802 violation discrepancies found (<0.7%) are estimated to be from over-reporting. These estimates are based in part on ratios of under- to over-reporting identified from the ACR analysis. 
Most violations discrepancies are related to compliance determination at the state level; these are violations which never made it into state databases. The remaining discrepancies (i.e., those in the two rows above the Total row in the reasons table) are related to data flow between state files and SDWIS/FED. However, because monitoring/reporting discrepancies comprise 94% of the total number of discrepancies, a single overall number would not represent the differences among violation types. A more precise picture is portrayed when the discrepancy categories are analyzed by violation type:

  Breakdown of discrepancies   TCR MCL   Other MCL   Total MCL   SWTR TT    M/R
  Compliance determination        67%        44%         56%        83%     94%
  Dataflow                        33%        56%         44%        17%      6%

As shown in the table above, only one-third of all TCR MCL violation discrepancies occur between state files and SDWIS/FED (17/52). Over one-half of Other MCL discrepancies (28/50), 44% of all MCL discrepancies (45/102), one-sixth of SWTR TT discrepancies (15/87), and only 6% of all monitoring/reporting violation discrepancies (277/4,613) occur between state files and SDWIS/FED. Other analyses will look at some reasons for these data flow discrepancies, including the frozen database analysis, which looks at Timeliness (were some violations merely entered late?), and the errors analysis (were some violations rejected at data entry?).

4.4.5.3 Estimates by rule/contaminant

The data verifications also listed violations data by rule/contaminant. There were not sufficient data points to calculate quality estimates for some of the chemical MCLs, nor to calculate estimates of Completeness and Accuracy. Again, a systems perspective, listing the percentage of systems having any violations, precedes the SDWIS/FED data quality estimates.
By doing so, the accuracy of our estimates increased significantly, since more data points yield better estimates. TCR MCL SWTRTT M/R cws NTNCWS TNCWS Overall 69% 67% 68% 9% 11% 0% 9% 7% 14% 68% 7% 9% Again, there are not enough data points to calculate estimates for Completeness and Accuracy, nor are there sufficient data to estimate quality of Other MCLs by system type. Unfortunately, the results of the data verifications analysis cannot be sorted by system size. The industry surveys can be categorized in this way, as will be discussed later. 21 ------- 4.4.6 Enforcement actions data 4.4.6.1 Estimates by system type, and overall Estimates for formal SDW1S/FED enforcement actions data quality are preceded by a systems perspective. CWS NTNCWS TNCWS Total # Systems Reviewed ' # Systems with Enforcement Actions % Systems with Enforcement Actions # Enforcement Actions # Discrepancies—under-reporting # Discrepancies—over-reporting # Discrepancies—incorrect reporting Total discrepancies % Discrepancies % Completeness % Accuracy SDWIS/FED data quality 696 163 23% 505 55 37 29 121 24% 89% 85% 76% 548 122 22% 305 53 17 22 92 30% 83% 85% 70% 562 75 13% 222 29 24 21 74 33% 87% 77% 67% 1,806 360 20% 1,032 137 78 12 287 28% 87% 83% 72% Data in the "Total" column will serve as an example to describe this table: • System perspective—of the 1,806 systems reviewed in the audits, 360, or 20%, had enforcement actions. • Of the 1,032 enforcement actions listed for these 360 systems, there were 287 discrepancies. The discrepancy rate is 287/1,302 or 28%. • Overall, 87% of the data that should be in SDWIS/FED make it in (Completeness), and 83% of the enforcement actions in SDWIS/FED are accurate. The calculation for Completeness is based on the number of discrepancies that represent under-reporting. Here the data verifications are clear as to which actions were not reported to SDWIS/FED. The calculation for Accuracy is based on Over-reporting (missing from state files) as well as for Incorrect reporting (which occur if the dates listed in SDWIS/FED are off by more than a month). The quality estimates are similar across system types: all were within 4% of the combined average. 4.5 Industry surveys 4.5.1 Background Both the National Rural Water Association (NRWA), in conjunction with the Association of Drinking Water Administrators (ASDWA), and the American Water Works Association (AWWA) volunteered to survey their water systems. The objective of this effort was to get data quality estimates from water systems directly. Indeed, this is the only analysis that goes upstream of state records. From this analysis EPA derived overall SDWIS/FED data quality estimates for inventory data, and Accuracy estimates for violations and enforcement actions data. Operators were not 22 ------- asked to assess the Completeness of violations and enforcement actions data, but only the Accuracy of those listed in SDWIS/FED. Another objective of this effort was to provide states with feedback from this effort to help them investigate and correct potential errors that may exist. Any corrections they make will be reflected in SDWIS/FED in the next quarterly update after the state's corrections are submitted to SDWIS/FED. 4.5.2 Survey design Water systems surveyed received a printout of their inventory, violations, and enforcement action data from SDWIS/FED. Water system operators were asked to indicate whether each data point was correct, incorrect, or to indicate "DK" if they did not know. 
Each data point marked "DK" was removed from the survey analysis so as not to artificially lower the discrepancy rates. AWWA sent surveys to all water systems serving more than 10,000 people that incurred at least one violation between FY1993 and FY1997. Of the 2,222 surveys sent, 684 were completed and returned, resulting in a 31% response rate (25% is a typical response rate for mailed surveys). NRWA/ASDWA surveyed active, current systems serving fewer than 10,000 people that incurred at least one violation between FY1993 and FY1997. A random sample of 40 CWSs and 5 NTNCWSs were selected for each state. Of 2,549 surveys sent, 439 were completed and returned from 23 states. The response rate was 17% overall, and 39% from the 23 states that participated. As discussed below, in both surveys, water systems that did not respond had a higher average number of violations than those that did. The effect of this self-selection bias on the results of this analysis is unclear. Two systems were removed from the AWWA survey. A system in New Jersey disputed all of their 751 violations. This may be a case of over-reporting, but the inclusion of this single system in the survey would have resulted in overall discrepancy rates four times higher. A system in PA with 718 violations was removed because it was not clear how to categorize their violations. On their survey sheets, the water system indicated "DK." In a letter they sent with their completed survey they did not dispute any of them, and in fact explained how several of them occurred. In a telephone interview, they disputed all of them. An EPA contractor conducted telephone interviews with 7 water system operators to evaluate how they filled out the survey, and to investigate potentially "extreme" responses—water systems which disputed either all or none of their violations data. The contractor found the presence of response bias in the violations and enforcement actions responses: a number of water systems contacted said they left violations and enforcement actions data points blank, rather than indicating "DK," if they were unsure whether the data points were correct or not. The magnitude of this bias is unclear. 23 ------- 4.5.3 Inventory data Operators did a thorough job in evaluating their inventory data. Since each water system had 7 required data points to evaluate, the results from each water system are counted equally. This is in contrast to violations and enforcement actions data, where a few systems having hundreds of violations, for example, significantly increase the average violations discrepancy rates. The result shows a fairly high degree of confidence in these inventory estimates. 4.5.3.1 Estimates by required parameter, and overall Status: Actjve/lnact. Water system Primary Population *Se™ce Address '*" SOUrce rnnnnrhnnc Type connections Name Overall AVWVA survey NRWA survey Data verifications 100% 99% 97% 100% 98% 97% 90% 97% 98% 85% 84% 91% 86% 85% 92% 87% 87% 95% 97% 97% 98% 92% 93% 96% Public water system identification number (PWS ID) was not assessed in the surveys. A discrepancy in the Population served and # service connections is counted as such if the difference is greater than 10%, as was done in the data verifications analysis. As shown above, the overall SDWIS/FED data quality estimates are very close to but slightly lower than the estimates from the data verifications analysis. 
As shown above, the overall SDWIS/FED data quality estimates from the surveys are very close to, but slightly lower than, the estimates from the data verifications analysis. The surveys estimated slightly lower SDWIS/FED data quality for primary source (AWWA survey only), population served, service connections, and address. The surveys also asked water system operators to evaluate some optional data parameters. This information was requested to estimate the quality of currently optional data that would become required as of January 2000.

4.5.3.2 Estimates by additional parameters

               Primary   Phone   Owner      County 1   County 2   Principal   Principal
               contact           category                          city        county
AWWA survey    93%       65%     96%        97%        100%       56%         95%
NRWA survey    94%       70%     91%        91%        100%       76%         98%

4.5.3.3 Estimates by system type and size, for required data

By system type:
                     CWS     NTNCWS
AWWA survey          92%
NRWA survey          92%     97%
Data verifications   97%     96%

By size category (NRWA survey for Very Small through Medium; AWWA survey for Large and Very Large):
                Very Small   Small   Medium   Large   Very Large
                93%          92%     91%      92%     91%

These estimates by system type are slightly lower than those from the data verifications for CWSs, and very close for NTNCWSs. As discussed above, EPA was not able to calculate SDWIS/FED data quality estimates by system size category in the data verifications analysis. Fortunately, EPA was able to do so in the industry surveys. As shown above, the results across size categories are very close and show high data quality for the required inventory data selected for this analysis.

4.5.4 Violations data

As described above, the surveys yielded estimates of the Accuracy of the data in SDWIS/FED; they did not assess the Completeness of the data (the % of data that should be in SDWIS/FED that made it in). These Accuracy estimates are of uncertain value because some water system operators may have left data points blank when they were not sure whether a violation was correct (instead of indicating that they did not know). This dilutes the discrepancy rates to an unknown degree. In addition, there may be some non-response bias: water systems included in the survey that did not respond averaged 40% more violations in the AWWA survey and 58% more violations in the NRWA survey than those that did. The effect of this bias is unclear. Ninety-six percent (96%) of systems in the AWWA survey, and 91% in the NRWA survey, did not dispute any of their violations. Overall, these Accuracy estimates are very close to those from the data verifications analysis.

4.5.4.1 Accuracy estimates by violation type

                     Total MCL   SWTR TT   Total M/R
NRWA survey          96%         91%       97%
AWWA survey          99%         99%       96%
Data verifications   97%         67%       95%

The Accuracy estimates for Total MCLs and Total M/Rs are very similar to those from the data verifications analysis. However, the surveys estimated a higher Accuracy for SWTR TTs than did the data verifications.

4.5.4.2 Accuracy estimates by rule/contaminant

MCLs and TTs:
               TCR     IOCs   Nitrate   Nitrite   SOCs   VOCs   TTHMs   Rads   SWTR    LCR
NRWA survey    97.0%   n/a    89.9%     *         *      100%   *       100%   90.6%   100%
AWWA survey    99.4%   100%   100%      *         100%   100%   97.0%   100%   99.4%   96.8%

M/Rs:
               TCR     IOCs    Nitrate   Nitrite   SOCs    VOCs    TTHMs   Rads    SWTR    LCR
NRWA survey    93.6%   94.2%   100%      *         100%    91.7%   96.8%   76.7%   94.0%   69.3%
AWWA survey    100%    100%    99.7%     100%      88.1%   99.2%   100%    98.4%   100%    100%

* insufficient data

Overall, the surveys estimate very high Accuracy for these contaminants/rules. In other words, water system operators disputed very few of the violations listed in SDWIS/FED.
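A minimal sketch (Python, with invented counts) of how survey responses translate into an Accuracy estimate, reflecting the removal of "DK" responses described earlier:

# Sketch of the survey Accuracy arithmetic. "DK" responses are removed
# from the analysis, as described above; the counts here are invented,
# not actual survey results.
correct, incorrect, dont_know = 950, 30, 20

evaluated = correct + incorrect        # "DK" data points are excluded entirely
accuracy = correct / evaluated         # 950/980
print(f"{accuracy:.1%}")               # 96.9%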
Unfortunately, EPA was not able to calculate comparable Accuracy estimates for violations data from the data verifications analysis; there were insufficient data points available at the rule/contaminant level. Therefore, a direct comparison of the results listed above to the data verifications analysis is not possible.

4.5.4.3 Accuracy estimates by system type and size

               System type   Accuracy (all violations)
AWWA survey    CWS           95%
NRWA survey    CWS           97%
NRWA survey    NTNCWS        95%

               Size category   Accuracy (all violations)
NRWA survey    Very Small      97%
NRWA survey    Small           96%
NRWA survey    Medium          95%
AWWA survey    Large           96%
AWWA survey    Very Large      90%

Overall Accuracy estimates are very close across system types. By system size they are very close as well, with the exception that Very Large systems are 5-7 percentage points lower.

4.5.5 Enforcement actions data

Again, these Accuracy estimates are of uncertain value because some water system operators may have left data points blank instead of indicating "DK" on their surveys. This dilutes the discrepancy rates to an unknown degree.

4.5.5.1 Accuracy estimates by system type and size

                     CWS     NTNCWS
AWWA survey          98%
NRWA survey          99.6%   99%
Data verifications   89%     83%

               Size category   Accuracy
NRWA survey    Very Small      99%
NRWA survey    Small           99.8%
NRWA survey    Medium          99%
AWWA survey    Large           99%
AWWA survey    Very Large      98%

The Accuracy estimates are higher than those estimated in the data verifications analysis. In addition, the survey findings indicate that Accuracy is very similar across system types and sizes.

4.6 Frozen database comparison—Timeliness estimates

A violation is due to be reported to SDWIS/FED within 90 days after its compliance period end date. This analysis quantifies how long it has taken for FY1997 violations to be reported. SDWIS/FED databases have been "frozen" quarterly since 1997. These frozen databases enable EPA to look at what data were in SDWIS/FED during set time periods. This analysis compares fiscal year 1997 data reported in each of the seven quarterly databases frozen since January 1998. The following estimates are based on violations reported to SDWIS/FED by July 1999; the analysis assumes that all FY1997 violations which were going to be reported actually were reported by July 1999 (7 quarters after all violations were due). Data for North Carolina are not included in these estimates; the state's reporting of violations data was highly erratic, which skewed the results.

There were 137,978 violations in FY1997 with end dates at or before September 30, 1997, which were due to be reported by December 31, 1997. Similarly, there were an additional 36,937 violations with end dates between October 1 and December 31, 1997, which were due to be reported by March 31, 1998. Therefore, a total of 174,915 FY1997 violations were due to be reported no later than March 31, 1998. There were also 12,849 violations having end dates later than December 31, 1997; they are not included in this analysis. Most had significantly later end dates and would not have been due to be reported before July 1999.
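The timeliness percentages in the table that follows are simple ratios of these counts; a minimal sketch (Python, using the counts above and the first two frozen-database totals from the table below):

# Sketch of the timeliness arithmetic behind the table below.
due_by_dec_1997 = 137_978                  # end dates on or before 9/30/97
due_by_mar_1998 = 137_978 + 36_937         # 174,915 FY1997 violations due in all

reported_97q4 = 94_484                     # in the database frozen Jan '98
reported_98q1 = 118_318                    # in the database frozen Apr '98

print(f"{reported_97q4 / due_by_dec_1997:.0%}")   # 68% reported on time
print(f"{reported_98q1 / due_by_mar_1998:.0%}")   # 68% of all due violations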
Timeliness                        97Q4      98Q1      98Q2      98Q3      98Q4      99Q1      99Q2
Database frozen                   Jan '98   Apr '98   Jul '98   Oct '98   Jan '99   Apr '99   Jul '99
# violations reported             94,484    118,318   153,988   158,752   170,793   170,647   174,915
# that should have been reported  137,978   174,915   174,915   174,915   174,915   174,915   174,915
% reported                        68%       68%       88%       91%       98%       98%       100%

[Figure: # violations reported vs. # violations due to be reported, by frozen database, 97Q4 through 99Q2]

At the national level, this analysis indicates that 68% of the FY1997 violations that should have been in SDWIS/FED by December 31, 1997 made it in on time, and that 68% of all the violations that should have been reported by March 31, 1998 actually were reported by then. Late reporting is a component of Completeness, and as can be seen above, it is a significant problem. It was not possible to factor Timeliness estimates into the SDWIS/FED data quality estimates, since the period of review for most contaminants/rules in the data verifications audits was primarily 1993-1998, most of which occurred before late 1997, when EPA began to "freeze" SDWIS/FED databases.

EPA was able to categorize Timeliness by the two methods of data entry to SDWIS/FED, using data from the errors analysis to determine which state used which method:
• Traditional method, wherein only new, modified, or deleted information is transmitted.
• Total Replace method, wherein the state sends a complete data set every quarter that totally over-writes all data previously submitted.

Violations appear to be reported in a more timely manner under the Traditional method than under the Total Replace method:

% reported by each frozen database   97Q4      98Q1      98Q2      98Q3      98Q4      99Q1      99Q2
                                     Jan '98   Apr '98   Jul '98   Oct '98   Jan '99   Apr '99   Jul '99
Total Replace method                 63%       60%       75%       75%       92%       93%       100%
Traditional method                   71%       71%       94%       98%       100%      100%      100%

As can be seen, many states have been adding and modifying FY1997 violations data several quarters after they were due. Through 1996, there does not appear to have been nearly as much volatility in the data. This may be the result of attention focused on correcting discrepancies between state Annual Compliance Reports and SDWIS/FED, which were first identified in the 1996 and 1997 reports. An almost identical trend is occurring with FY1998 data, as illustrated below:

[Figure: # violations listed in each database frozen since Jan '98 for FY1997 data, and since Jan '99 for FY1998 data. North Carolina data not included (its FY1997 reporting was highly erratic and would have skewed the results).]

4.7 Comparison of SDWIS/FED to Envirofacts

The public sees SDWIS/FED data as displayed in Envirofacts, EPA's multimedia website. One aspect of this analysis was to compare data in SDWIS/FED to Envirofacts to ensure that no errors are introduced in the transfer of data from SDWIS/FED to Envirofacts. All data from 250 water systems selected at random were compared in the two databases to identify any data transfer errors. No errors were found.

5 Additional data quality analyses

5.1 States' reporting of violations data

As part of a further analysis of the under-reporting identified initially in the data verifications analysis, EPA looked at the Annual Compliance Report (ACR) comparison to SDWIS/FED.
The ACR vs. SDWIS/FED analysis indicated that several states did not report any violations at all in CY1997 for certain contaminants/rules. Some of the non-reporting was attributable to late reporting: some states that did not report in 1997 had reported in other years. To factor out late reporting, and to get a more comprehensive picture of non-reporting of certain violations by state, EPA queried the SDWIS/FED database frozen in October 1999 and listed all violations reported by each state between FY1993 and FY1998. It found that over a dozen states have never reported chemical rule violations for any NTNCWSs or TNCWSs, and half have never reported Radiological rule violations for CWSs. Clearly, some of the non-reporting is attributable to states simply not having any violations to report. However, in light of the magnitude of under-reporting estimated in the data verifications analysis, and given the percentages of systems estimated to have violations, by rule, many of these "blanks" represent a problem. These "blanks" are being evaluated in the state-by-state summaries of SDWIS/FED data quality.

The two tables below only include situations where the state has systems subject to a rule. One state (Alaska) has no NTNCWSs, and the SWTR has no impact in states without any surface water systems in a system type category: one state/territory has no surface water CWSs, 7 have no surface water NTNCWSs, and 13 have no surface water TNCWSs.

5.1.1 Number of the 52 states/territories that have never reported any violations, by rule, between FY1993 and FY1998

Below is the number of states/territories that have never reported a violation of each type in this six-year period.

          TCR           Chemicals     RADs                      LCR                   SWTR
          MCL   M/R     MCL   M/R     MCL   M/R                 TT    M/R             TT    M/R
CWS       0     1       5     8       25    23                  21    1               4     18
NTNCWS    0     0       13    14      Rule applies to CWS only  25    4               16    27
TNCWS     1     1       12    14      Rule applies to CWS only  Does not apply        11    20

This can also be shown in percentages, which facilitates a comparison with the next table.

          TCR           Chemicals     RADs                      LCR                   SWTR
          MCL   M/R     MCL   M/R     MCL   M/R                 TT    M/R             TT    M/R
CWS       0%    2%      10%   15%     48%   44%                 40%   2%              8%    35%
NTNCWS    0%    0%      25%   27%     Rule applies to CWS only  49%   8%              36%   60%
TNCWS     2%    2%      23%   27%     Rule applies to CWS only  Does not apply        26%   48%

5.1.2 Percent non-reporting of violations, by type, between FY1996 and FY1998

It is informative to look at the percentage of non-reporting, that is, to count the percentage of "blanks" in each year. The table below lists the percent of non-reporting that occurred between FY1996 and FY1998, by contaminant/rule. There are 156 opportunities to report violations in each box below (52 states x 3 years), less the number of states not counted, as described above.

          TCR           Chemicals     RADs                      LCR                   SWTR
          MCL   M/R     MCL   M/R     MCL   M/R                 TT    M/R             TT    M/R
CWS       0%    3%      22%   28%     62%   56%                 55%   28%             17%   63%
NTNCWS    2%    5%      46%   38%     Rule applies to CWS only  70%   39%             55%   81%
TNCWS     5%    6%      40%   36%     Rule applies to CWS only  Does not apply        53%   72%

This shows that almost all states have reported both TCR MCL and M/R violations in each year. The other rules have had significantly less reporting. For each rule and violation type, the most reporting has been done for CWSs; NTNCWSs and TNCWSs fared about the same as each other, but were reported for less frequently than CWSs. Comparing this table to the one above it shows that the percentage of "blanks" is in some cases significantly higher than when merely considering states that have never reported. In other words, the states that have reported for specific rules/contaminants have not done so in each year. For example, 10% of states have never reported a Chem MCL violation for CWSs (which accounts for 10 percentage points of the "blanks"), but there are 22% "blanks" for CWS Chem MCLs.
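A minimal sketch (Python) of the "blanks" bookkeeping, using the CWS Chem MCL figures just cited and assuming no states are excluded from that cell:

# Sketch of the "blanks" bookkeeping for Chem MCL reporting by CWSs.
# Assumption: no states are excluded for this cell, so there are
# 52 states x 3 years of reporting opportunities.
states, years = 52, 3
opportunities = states * years               # 156 state-years

never_reporters = 5                          # states that never reported a Chem MCL
floor_blanks = never_reporters * years       # 15 state-years, ~10% of 156

blank_rate = 0.22                            # from the FY1996-FY1998 table above
total_blanks = round(blank_rate * opportunities)    # ~34 state-years
intermittent_blanks = total_blanks - floor_blanks   # ~19 from states that report only some years

print(f"{floor_blanks / opportunities:.0%}", intermittent_blanks)   # 10% 19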
5.1.3 Percent non-reporting of violations, by year

The level of non-reporting in each year has been fairly steady, although it increased in 1998. EPA calculated the statistics below by dividing the total number of "blanks" each year by the total number of opportunities to report violations.

1998   1997   1996   1995   1994   1993
64%    59%    58%    58%    53%    58%

Again, some of these blanks represent states that simply had no violations in a category during a year. The state-by-state summaries of SDWIS/FED data quality take a closer look at this issue.

5.2 Comparison of states' reporting of Annual Compliance Report (ACR) data to SDWIS/FED data

5.2.1 Background

This analysis highlighted differences between data in state databases and files and data in SDWIS/FED. These differences were analyzed numerically using the 1997 ACR data. States were also asked to identify reasons for discrepancies between what they reported for the 1996 and 1997 ACRs and what is in SDWIS/FED.

5.2.2 Under- and over-reporting between state databases and SDWIS/FED

In this exercise, EPA calculated ratios of under- to over-reporting. These results were also used in the data verifications analysis. The data verifications list discrepancies between state databases and SDWIS/FED, but do not divide them into over-reporting, under-reporting, and incorrect reporting (in the case of incorrect reporting, the violation exists in both databases but does not match). In order to calculate estimates for Completeness and Accuracy, EPA had to ascribe discrepancies to either over-reporting or under-reporting (it is not possible to get numerical estimates of incorrect reporting). This also enables EPA to compare accuracy estimates from the data verifications analysis to the industry surveys.

The ACR vs. SDWIS/FED analysis used 1997 ACR data. The ratios of under-reporting to over-reporting, by rule and overall, are shown below:

        TCR             Chem              SWTR            LCR             Total                   Overall
        MCL    M/R      MCL    M/R        TT     M/R      TT     M/R      MCL    TT     M/R      Total
        21.0   3.0      10.5   136.5      3.3    37.3     14.6   19.0     16.3   4.5    16.2     15.5

Significantly more under-reporting than over-reporting of violations was found. For example, among the 1997 ACR violations compared between state databases and SDWIS/FED, the magnitude of overall under-reporting was more than 15 times as great as the magnitude of over-reporting.

Here is how these estimates were calculated. First, EPA excluded states that reported using SDWIS/FED, since the point is to compare what is in state databases to what is in SDWIS/FED. EPA also excluded Chemical M/R violations for one state that listed 21,807 violations in its state database and only 98 in SDWIS/FED; these numbers were an anomaly, and they skewed the overall results. Next, instances of over-reporting and under-reporting were summed separately, taking the differences between the totals for violations in state databases and in SDWIS/FED. Finally, the difference (number of discrepancies) for under-reporting was divided by the difference for over-reporting.
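A minimal sketch (Python, with invented per-state counts) of the ratio calculation just described:

# Sketch of the under-/over-reporting ratio. Each pair is (count in the
# state database, count in SDWIS/FED); the values are invented.
pairs = [(12, 5), (30, 30), (8, 11), (40, 22)]

under = sum(s - f for s, f in pairs if s > f)   # missing from SDWIS/FED: 7 + 18 = 25
over  = sum(f - s for s, f in pairs if f > s)   # extra in SDWIS/FED: 3

print(under / over)   # ratio of under- to over-reporting: ~8.3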
5.2.3 Minimum discrepancy rates between state databases and SDWIS/FED

Along with the ratios calculated above, it is informative to look at the discrepancy rates between 1997 ACR data in state files and SDWIS/FED. These estimates are listed below:

        TCR             Chem            SWTR            LCR             Total                   Overall
        MCL    M/R      MCL    M/R      TT     M/R      TT     M/R      MCL    TT     M/R      Total
        15%    31%      40%    41%      20%    38%      86%    68%      18%    26%    39%      37%

These discrepancy rate estimates are minimum estimates: to generate them, EPA had to assume that all violations that could match between state databases and SDWIS/FED do match. For example, if there are 6 violations in a state's database and 10 in SDWIS/FED, it is assumed that these 6 match, resulting in 4 instances of over-reporting. The discrepancy rates generated from this analysis are also understated because the discrepancy rate uses the maximum value in the denominator, so that discrepancy rates cannot exceed 100%.

Another way of looking at these results is to see how well the data match between state databases and SDWIS/FED. For example, TCR MCL data have an estimated minimum discrepancy rate of 15%; this means that a maximum of 85% of the data match. Maximum correlation estimates are shown below:

        TCR             Chem            SWTR            LCR             Total                   Overall
        MCL    M/R      MCL    M/R      TT     M/R      TT     M/R      MCL    TT     M/R      Total
        85%    69%      60%    59%      80%    62%      14%    32%      82%    74%    61%      63%

Overall, roughly 2/3 of the data in state databases and SDWIS/FED match. LCR TTs had the lowest correlation estimate, 14%.

Here is how these estimates were calculated. Again, EPA used 1997 data, included only states that reported using their own databases, and excluded Chem M/R violations from the state with the huge under-reporting anomaly. Each pair of data (for example, the number of TCR MCL violations in a state database and the corresponding number in SDWIS/FED) was separated into its maximum and minimum value, and the totals were put into the following equation:

Minimum discrepancy rate = (sum of maximum # violations - sum of minimum # violations) / (sum of maximum # violations)

EPA divided by the sum of the maximum # violations to keep the discrepancy rates below 100%.
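A minimal sketch (Python, with invented pairs) of the equation above:

# Sketch of the minimum-discrepancy-rate equation. Each pair is (count in
# the state database, count in SDWIS/FED); the values are invented.
pairs = [(6, 10), (20, 15), (8, 8)]

max_sum = sum(max(s, f) for s, f in pairs)   # 10 + 20 + 8 = 38
min_sum = sum(min(s, f) for s, f in pairs)   # 6 + 15 + 8 = 29

min_discrepancy_rate = (max_sum - min_sum) / max_sum   # 9/38, ~24%
max_match_rate = 1 - min_discrepancy_rate              # ~76% of the data match
print(f"{min_discrepancy_rate:.0%} {max_match_rate:.0%}")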
5.2.4 Main reasons cited by states for these discrepancies

Overall, the category of "data entry problems" was the most common reason given for discrepancies. This category includes incomplete PWS inventories, and data submission, transfer file format, and coding problems. The category of "resource limitations" was the next most common reason. This includes the inability of a state system to upload data to SDWIS/FED, lack of staff and/or programmers, and the lack of an automated tracking system for a particular rule. The number of states citing each reason, by violation type, is summarized below:

[Table: number of states citing each reason (data entry; resource limitations; regulation interpretation issues; ACR guidance interpretation issues; late reporting; automated system generation; reason not provided) for under- and over-reporting of MCL, M/R, and TT violations]

For under-reporting, M/R violations had the most discrepancies; the most frequently cited reason was data entry. This was followed by TT violations, with the most frequently cited reason being resource limitations, followed by data entry. MCL violations had the fewest discrepancies; the most frequently cited reason was data entry.

Ranking the most frequently cited reasons by rule (Chems, TCR, SWTR, and LCR; MCL/TT and M/R violations): data entry ranked #1 and resource limitations ranked #2 for six of the eight rule and violation type combinations, while ACR guidance and regulation implementation each ranked #1 for one of the remaining combinations.

5.3 Error reports analysis—data transfer errors

5.3.1 Background

This analysis reviewed two quarters of error production reports (received during the period August 1 through December 31, 1998) to look at the magnitude of, and reasons for, data transfer errors between state databases and SDWIS/FED. Eight hundred forty-one (841) files were reviewed; three hundred two (302) files were analyzed in detail to determine the error rejection rate and the error correction rate. At the state level, the information obtained will be used to provide recommendations for corrective actions, training needs identification, and quality assurance procedures. Because the method of update, the approach to error correction, and the level of effort states expend on correcting errors vary from quarter to quarter, extrapolating the errors analysis information to a national rejection rate for each type of submission and/or data type proved inconclusive. Additional meta-data will need to be collected in the future if more detail is desired on rejection rates.

5.3.2 Common types of errors

Of the over 800 possible error conditions programmed into the SDWIS/FED edit criteria, only 230 occurred in the 841 files analyzed. The most common reasons are listed below:

27%  Invalid values: typos, non-permitted values, etc.
14%  Cross edits: data rejected because a comparison between two or more attributes yielded incompatible values.
8%   Non-existent data: attempts to modify or delete data or records which do not exist in the database.
8%   Processing rule: comparison between two or more attributes showed invalid combinations.
8%   Missing registration requirements: attempts to post a new water system without all required elements present.
7%   Content: missing values and/or missing combinations of data.
7%   SDWIS/FED bugs and software limitations.
6%   Duplicate data: data already exist in the input file or in the database.

Eighty-two percent (82%) of the error "types" relate to data entry errors (e.g., failure to follow data entry instructions, keypunch errors, missing or incomplete data, or invalid values). SDWIS/FED bugs and software limitations represent 7% or less of the errors. The remaining 11% included informational messages, old FRDS conversion errors, and errors that could be either a SDWIS/FED bug or a state data entry error, depending on the data submitted.

5.3.3 Main reasons cited for non-reporting during the analysis period

State resource limitations were given as the primary reason for Lead & Copper sample data not being reported during the analysis period. Three states were unable to submit action files due to major system software reprogramming or data clean-up activities. Those failing to submit any inventory data during 1998 cited major system software conversion activities or state resource limitations as the reason.

5.3.4 Rejection rates of files

Rejection rates for inventory and actions data were calculated for files submitted using the Traditional method. It was not possible to calculate comparable rejection rates for the Total Replace method because SDWIS/FED cannot identify which data in the file are being submitted for the first time. The following equation was used:

Error rate (Traditional method) = (# lines in error file) / (# lines in input file)

In Traditional updates, 20% of inventory data and 32% of violation and enforcement actions data are being rejected.
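A minimal sketch (Python, with invented line counts) of this error-rate equation:

# Sketch of the Traditional-method rejection rate. Line counts invented.
def rejection_rate(error_file_lines: int, input_file_lines: int) -> float:
    return error_file_lines / input_file_lines

print(rejection_rate(200, 1_000))   # 0.20, i.e. the 20% inventory figure above
print(rejection_rate(320, 1_000))   # 0.32, i.e. the 32% actions figure above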
5.3.5 States' success in submitting correction files

Most states attempted corrective actions. When data are rejected from SDWIS/FED, states (or EPA regions acting on a state's behalf) are sent error reports indicating what data were rejected and the reason(s) for the rejection. It appears that error files having a large number and/or variety of errors were not being corrected on the first attempt. Some errors did not require correction, such as duplicate records being submitted, or intentional manipulation by the state or EPA region in order to achieve a specific result; it was not possible to accurately determine the volume of such errors. Only a quarter of all states were completely successful in resubmitting rejected data on their first attempt, and three-fourths of the states had at least a quarter of their second attempt rejected. Reasons for errors remaining uncorrected include: states did not understand how to correct the original error, they chose to correct only some errors, or, as mentioned above, some errors did not require correction.

5.4 State structures analysis

Analysis of the ASDWA Management and Data Flow of States survey failed to produce any clear-cut reasons for particular state drinking water programs having better data quality, defined here as consistency between state records and SDWIS/FED. To perform the analysis, each state's ranking was determined by dividing the total number of violation discrepancies between state records and SDWIS/FED by the total number of violations, yielding a percentage. Staff then looked for a correlation between the way a state is organized and its data quality, as measured by this discrepancy rate. The analysis showed that both the highest and lowest ranking states had similar responses to the survey questions.

Because the analysis of the ASDWA data provided no clear-cut answers, EPA asked the Cadmus Group, which has conducted data verifications in the past, to select several states that it believes maintain model programs and to summarize the organizational structure of those states. These states were selected regardless of any violation or discrepancy numbers present in data verification (DV) reports. The number of organizational levels does not appear to have as great an impact on data quality as does the quality of communications. Another key factor was an adequate number of trained, qualified personnel.
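A minimal sketch (Python, with invented states and counts) of the ranking step described above:

# Sketch of the state-ranking step: each state's discrepancy rate is its
# total violation discrepancies divided by its total violations. The
# state names and counts are invented.
states = {
    "State A": (120, 800),    # (discrepancies, violations)
    "State B": (45, 900),
    "State C": (300, 1_000),
}

ranked = sorted(states, key=lambda s: states[s][0] / states[s][1])
for state in ranked:
    d, v = states[state]
    print(f"{state}: {d / v:.0%}")   # lowest discrepancy rate ranks first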
The program and management structural components believed to be critical to promoting high data quality are presented below:

• Communication: routine, meaningful, and timely communication at all levels
• Annual PWS notification of monitoring schedules and requirements
• Automated compliance determination for monitoring requirements
• Violation notification with required corrective action instructions
• Standard operating procedures and related periodic training, including: data entry, forms completion, conducting sanitary surveys, and compliance determination
• Efficient and timely access to water system data for all staff
• Electronic access to laboratory sample data
• Existence and use of a quality assurance program which resolves and prevents errors
• Standardized data submission format (electronic or forms) from PWSs and labs
• Streamlined handling of documents and analytical results through compliance determination and the recording of violations and follow-up actions

5.5 State summaries of SDWIS/FED data quality, and recommended improvements

The last component of this project is the EPA analysis of SDWIS/FED data quality on a state-by-state basis. The resulting state summary reports will provide specific, prioritized recommendations to help states improve their data quality. The individual summaries will be provided to states separately during the spring of 2000. The summaries will address the following state-specific findings:

• ACR vs. SDWIS/FED analysis findings, which highlight areas of zero reporting and non-reporting, and violation type discrepancies greater than 10%.
• Numeric and non-numeric findings from data verifications conducted between 1996 and 1998. Data verifications conducted during 1999 are used to clarify or support findings from other analysis areas. Strengths and areas of weakness which impact data quality are highlighted.
• The number of violations reported by each state in each fiscal year between 1993 and 1998, from the frozen database analysis. Violations are categorized by contaminant/rule and by water system type.
• A discussion of state management structures, with recommendations for improvement, including a listing of key components that promote good SDWIS/FED data quality.
• Significant findings from EPA Mid-Year and/or End-of-Year Program Reviews relating to data management and SDWIS/FED data quality.
• An analysis of SDWIS/FED error reports from data submitted during August 1, 1998 through December 31, 1998.

Appendix A—Stakeholders Working Group recommendations

Recommendations were identified and evaluated during the three major phases of the Data Reliability Action Plan. The first phase involved the 3 public stakeholder meetings. The second phase involved the individual analyses that were conducted, the results of which are included in this report. The third phase was the Stakeholder Work Group review of the preliminary findings of the data verifications, error report, ACR, and timeliness analyses at the September 1999 meeting, where additional recommendations were suggested. All recommendations were discussed and voted on at the meeting. The following table presents the results of that vote.

# votes   Recommendation

19   Increase training
     • Provide on-site assistance to resolve state-specific data entry problems.
     • Provide additional compliance determination training, and data entry training for new and existing rules.
     • Establish a multi-regional cadre of trainers (funded through either a central contract and/or with the states paying for travel).

17   Improve the data verifications audits
     • Include specific, prioritized, implementable recommendations.
     • Include the # of systems with discrepancies.
     • Conduct DVs for each state every 2-3 years, which will help promote and track follow-up to previous DV recommendations.
     • Issue DV procedures so states can perform self-audits.
     • Review data at the water system level to correlate data in state files.
     • Add a timeliness review.
     • Make follow-up of DVs part of regional quarterly/annual reviews.
     • Tighten follow-up procedures: have the EPA regional office check back with states within 6 months.

15   Streamline reporting and rule complexity

15   Make error reports more user-friendly. It is currently very difficult for managers to use them to identify specific problems.

11   Encourage states to notify utilities annually of compliance monitoring schedules

10   EPA should focus follow-up on poorer state/regional performers
     • Focus on states not reporting specific rules; this should trigger a focused DV audit.

7    Require electronic reporting of monitoring regulations in the future

7    Require states to issue notices to utilities for each violation

6    Require labs to report sample results directly to states electronically

6    Improve front end retrieval of SDWIS/FED data

6    EPA HQ should provide contract funds for data management technical assistance

5    Provide new resources for data management

5    Enable utilities to review their data before it is sent to SDWIS/FED
     • Encourage state web access.
     • Ask trade associations to communicate the need for states to have additional resources to enable web access.

5    Establish a multi-state cadre of state peer reviewers
     • States provide travel funds.
     • Voluntary basis.

4    Focus national program guidance on M/R discrepancies
     • Help mitigate funds drawn to other media.

3    Develop automated compliance determination mechanisms in SDWIS/STATE

3    Centralize Oracle DBA support (this recommendation applies to all states, not only those using SDWIS/STATE)

3    Establish contract funds to help states enter data on an as-needed basis

2    Provide better guidance, including data flow diagrams, when new rules are issued

1    Have EPA over-file for states which choose not to report

1    Complete the edit summary report to identify generic errors

0    Standardize data transfer mechanisms