United States
Environmental Protection
Agency
Data Management and Quality
Assurance/Quality Control Process for the
Fourth Six-Year Review Information
Collection Rule Dataset
Office of Water (4607M)
EPA-815-R-24-017
February 2024
Disclaimer
This document is not a regulation. It is not legally enforceable and does not confer legal rights or
impose legal obligations on any party, including EPA, States, or the regulated community.
While EPA has made every effort to ensure the accuracy of any references to statutory or
regulatory requirements, the obligations of the interested stakeholders are determined by statutes,
regulations, or other legally binding requirements, not this document. In the event of a conflict
between the information in this document and any statute or regulation, this document would not
be controlling.
Data Management and QA/QC Process
for the SYR 4 ICR Dataset
February 2024
Executive Summary
The 1996 Amendments to the Safe Drinking Water Act (SDWA) require that the Environmental
Protection Agency (EPA) "shall, not less often than every 6 years, review and revise, as
appropriate, each national primary drinking water regulation." The National Primary Drinking
Water Regulations (NPDWRs) are often referred to as the national drinking water contaminant
regulations or drinking water standards. The purpose of the review, called the Six-Year Review
(SYR), is to evaluate current information for regulated contaminants to determine if there is new
information on health effects, treatment technologies, analytical methods, occurrence and
exposure, implementation, and/or other factors that provides a health or technical basis to
support a regulatory revision that will improve or strengthen public health protection. To support
the SYR process, EPA generally issues an Information Collection Request (ICR) to the states
and other primacy agencies to collect recent data that public water systems
(PWSs) have submitted per requirements of NPDWRs. The data are voluntarily submitted and
typically consist of the compliance monitoring records and the records related to treatment
technique requirements, usually covering a period of about six years for every cycle. For more
information on the SYR 4 ICR, see EPA's website: https://www.epa.gov/dwsixyearreview/six-year-review-4-drinking-water-standards-information-collection-request.
This report describes how the compliance monitoring data and treatment technique information
for EPA's fourth Six-Year Review (SYR 4) of NPDWRs were obtained, evaluated, and
formatted, where necessary, to enable national contaminant occurrence estimates. In addition,
this document describes the data requested and received, data quality issues, and data
management efforts to make the data consistent and usable for subsequent analyses.
EPA conducted data management and quality assurance (QA) evaluations on the data received
for contaminants evaluated for the SYR 4 to establish a national compliance monitoring and
treatment technique dataset consisting of data from 59 states/primacy agencies (46 states plus
territories, Washington, D.C., and tribes). The compliance monitoring data and treatment
technique information for these 59 states/primacy agencies comprise more than 71 million
analytical records from approximately 140,000 PWSs, which serve more than 301 million people
nationally.1 The ICR dataset for the fourth Six-Year Review (SYR 4 ICR dataset) is the largest
and most comprehensive compliance monitoring data and treatment technique information
dataset ever compiled and analyzed under EPA's drinking water program.
Information regarding the acquisition, storage, and management of the SYR 4 ICR data is
presented in Sections 2 through 4 of this report. Detailed descriptions of the QA evaluations and
data preparation for analyses are presented in Section 5 and Section 6, respectively. Additional
technical information related to the SYR 4 ICR dataset is presented in the appendices to this
report.
1 These statistics reflect the portion of the overall dataset representing compliance monitoring samples collected for
requested regulated contaminants. The initial dataset, including data not specifically requested by EPA but
submitted voluntarily by some states, comprised over 83 million records from approximately 142,000 PWSs.
For the national contaminant occurrence assessments for the Chemical Phase Rules and
Radionuclides Rule conducted in support of EPA's fourth Six-Year Review of NPDWRs, refer
to the USEPA (2024a) report entitled Analysis of Regulated Contaminant Occurrence Data from
Public Water Systems in Support of the Fourth Six-Year Review of National Primary Drinking
Water Regulations: Chemical Phase Rules and Radionuclides Rules. For more detailed
information on the microbial contaminants' occurrence analysis, refer to the USEPA (2024b) report
entitled Six-Year Review 4 Technical Support Document for Microbial Contaminant Regulations.
The final SYR 4 ICR datasets are posted online at: https://www.epa.gov/dwsixyearreview.
Table of Contents
1 Introduction 1
2 Data Acquisition 2-1
3 Data Storage 3-1
4 Data Management 4-1
4.1 Review of SYR 4 Dataset Content 4-1
4.2 Restructuring Non-SDWIS State Data 4-2
4.3 Establishing Consistent Data Fields for Analytical Results (SDWIS and Non-SDWIS
States) 4-3
5 Data Quality Assurance and Quality Control 5-1
5.1 Completeness and Representativeness of the Six-Year Review ICR Dataset 5-1
5.2 Quality Assurance Measures Applied to All Contaminants 5-1
5.2.1 Non-Public Water Systems 5-3
5.2.2 Systems with Missing Inventory Data 5-3
5.2.3 Sample Results Collected Outside of the Date Range 5-4
5.2.4 Non-Compliance 5-4
5.2.5 Uniform System Inventory Information 5-4
5.3 Quality Assurance Measures Applied to Chemicals and Radionuclides 5-4
5.3.1 Non-Routine 5-7
5.3.2 Duplicate Records 5-7
5.3.3 Units of Measure 5-7
5.3.4 Potential Outliers 5-7
5.3.5 Transient Water Systems 5-10
5.3.6 Non-Community Water Systems (Radionuclides Only) 5-11
5.3.7 Source Water Type Adjustment 5-11
5.3.8 Consecutive Water Systems 5-11
5.3.9 Samples from Source/Raw Water 5-11
5.3.10 Mismatched Nitrate and Nitrite Data 5-12
5.4 Quality Assurance Measures Applied to DBPs and Related Parameters 5-12
5.4.1 Non-Routine Samples 5-13
5.4.2 Duplicate Records 5-14
5.4.3 Units of Measure 5-14
5.4.4 Potential Outliers 5-14
5.4.5 Locational Flag 5-15
5.5 Quality Assurance Measures Applied to Microbial Contaminants 5-16
5.5.1 Non-Routine Samples 5-17
5.5.2 Pairing Disinfectant Residual and Coliform Results for non-SDWIS States 5-17
5.5.3 Updates to Absence and Presence Codes 5-18
6 Data Preparation for Chemical Phase and Radionuclides Rules' Analyses 6-1
6.1 Non-Detection Record Replacement 6-1
6.2 Adjustments of Population Served by Public Water Systems 6-2
7 Public Access to SYR 4 ICR Data 7-1
8 References 8-1
9 List of Appendices 9-1
Appendix A: Data Request Letter that EPA Sent on June 3, 2020 to Each Primacy Agency
to Request Voluntary Submission of Compliance Monitoring Data and Treatment
Technique Information for Regulated Chemical, Radiological, and Microbiological
Contaminants 1
Appendix B: Crosswalk of Data Elements Requested for SYR 4 ICR and the SDWIS Data
Element Names 1
Appendix C: Data Dictionary for the SYR 4 ICR Database 1
Appendix D: Occurrence data for the Aircraft Drinking Water Rule (ADWR) 1
Appendix E: User Guide to Downloading and Using Six-Year Review 4 and Related Data
from EPA's Website 1
Section 1: Background Information on SYR 4 Data Records 2
Section 2: SYR 4 Data Records Posted for Phase Chemicals, Lead, Copper and
Radionuclides 6
Section 3: SYR 4 Data Records Posted for Disinfection Byproducts 10
Section 4: SYR 4 Data Records Posted for Disinfection Byproduct Related Parameters 11
Section 5: SYR 4 Data Records Posted for Microbial Contaminants, Microbial Related
Parameters, and Disinfectant Residuals 13
Section 6: SYR 4 Data Records Posted for Aircraft Drinking Water Rule (ADWR) 15
Section 7: Additional Data Collected under SYR 4 ICR 18
Section 8: Treatment Data 19
Section 9: SYR 4 Data Considerations 22
Section 10: Instructions on Importing SYR 4 Datasets 23
References 29
Appendices
APPENDIX A Data Request Letter that EPA Sent on June 3, 2020 to Each Primacy
Agency to Request Voluntary Submission of Compliance Monitoring Data
and Treatment Technique Information for Regulated Chemical,
Radiological, and Microbiological Contaminants
APPENDIX B Crosswalk of Data Elements Requested for SYR 4 ICR
and the SDWIS Data Element Names
APPENDIX C Data Dictionary for the SYR 4 ICR Database
APPENDIX D Occurrence Data for the Aircraft Drinking Water Rule (ADWR)
APPENDIX E User Guide to Downloading and Using SYR 4 and Related Data from
EPA's Website
Exhibits
Exhibit 1: List of Contaminants/Parameters Identified in SYR 4 ICR for which Data Were
Requested from States 2-1
Exhibit 2: Data Elements Requested by EPA for the Fourth Six-Year Review 2-3
Exhibit 3: Summary of States that Provided Compliance Monitoring Data and Treatment
Technique Information for SYR 4 2-6
Exhibit 4: Description of Tables Included in SYR 4 ICR Database 3-1
Exhibit 5: Mann-Whitney U Test for MCL Violation Rates in States Included in SYR 4 versus
States Not Included 5-3
Exhibit 6: Comparison of the Total Number of Systems and Population Served in SDWIS/Fed
and the SYR 4 ICR Dataset, By State 5-4
Exhibit 7: Comparison of the Total Number of Systems and Population Served in SDWIS/Fed
and the SYR 4 ICR Dataset, By Source Water Type and System Type 5-7
Exhibit 8: Contaminant Group Monitoring Requirements 5-1
Exhibit 9: Flow Chart of QA Measures Applied to All SYR 4 Contaminants 5-3
Exhibit 10. Flow Chart of Additional QA Measures Specific to Chemicals, Radionuclides, and
Lead and Copper 5-5
Exhibit 11: Summary of the Count of Sample Analytical Results Removed via the QA Measures
Applied to Chemical Phase, Radionuclides and Lead and Copper Rules' Contaminants 5-6
Exhibit 12: List of Contaminant MCL and MDL Values 5-8
Exhibit 13. Flow Chart of Additional QA Measures Specific to DBPs and DBP-Related
Parameters 5-12
Exhibit 14: Summary of the Count of Analytical Sample Results Removed via the QA Measures
Applied to DBP Rule Contaminants1 5-13
Exhibit 15: List of DBP MCL Values 5-15
Exhibit 16. Flow Chart of Additional QA Measures Specific to Microbial Contaminants 5-16
Exhibit 17: Summary of the Count of Analytical Sample Results Removed via the QA Measures
Applied to Microbial Rule Contaminants1 5-16
Exhibit 18. Process to Establish Contaminant National Modal MRLs 6-2
Exhibit 19: Illustration of the Adjusted Total Population Served by Wholesale Systems 6-3
Exhibit 20: Illustration of the Allotment of Consecutive System Populations to Wholesale
Systems 6-4
Abbreviations and Acronyms
ADWR Aircraft Drinking Water Rule
CAS Chemical Abstracts Service
CHEM ID Four-Digit SDWIS Contaminant Code
CO Confirmation
CWS Community Water System
DBCP 1,2-Dibromo-3-chloropropane
DBP Disinfection Byproduct
DBPR Disinfection Byproduct Rule
D/DBPR Disinfectants and Disinfection Byproducts Rule
DEHA Di(2-ethylhexyl) adipate
DEHP Di(2-ethylhexyl) phthalate
EC Escherichia coli (E. coli)
EDB Ethylene dibromide
eDWR Electronic Drinking Water Report
EPA Environmental Protection Agency (United States)
FBRR Filter Backwash Recycling Rule
FC Fecal Coliform
GAC Granular Activated Carbon
GW Ground Water
GWP Purchased Ground Water
GWR Ground Water Rule
GWUDI (or GU) Ground Water Under Direct Influence (of Surface Water)
GUP Purchased Ground Water Under Direct Influence of Surface Water
HAA Haloacetic Acids
HPC Heterotrophic Plate Count
IESWTR Interim Enhanced Surface Water Treatment Rule
ICR Information Collection Request
IOC Inorganic Contaminant
LCR Lead and Copper Rule
LT1ESWTR Long-Term 1 Enhanced Surface Water Treatment Rule
LT2ESWTR Long-Term 2 Enhanced Surface Water Treatment Rule
MCL Maximum Contaminant Level
MDBP Microbial and Disinfection Byproducts
MDL Method Detection Limit
MFL Million Fibers per Liter
mg/L Milligrams per Liter
mrem/yr Millirem per year
MR Maximum Residence
MRDL Maximum Residual Disinfectant Level
MRL Minimum Reporting Level
NPDWR National Primary Drinking Water Regulation
NTNCWS Non-Transient Non-Community Water System
PCBs Polychlorinated Biphenyls
pCi/L Picocuries per Liter
PWS Public Water System
PWSID Public Water System Identification Number
QA Quality Assurance
QC Quality Control
RP Repeat
RT Routine
RTCR Revised Total Coliform Rule
SDWA Safe Drinking Water Act
SDWIS/Fed Safe Drinking Water Information System / Federal Version
SDWIS/State Safe Drinking Water Information System / State Version
SOC Synthetic Organic Contaminant
SW Surface Water
SWP Purchased Surface Water
SWTR Surface Water Treatment Rule
SYR 4 Fourth Six-Year Review
TC Total Coliform
TCR Total Coliform Rule
TG Triggered
TNCWS Transient Non-Community Water System
TOC Total Organic Carbon
TTHM Total Trihalomethanes
USEPA United States Environmental Protection Agency
µg/L Micrograms per Liter
VOC Volatile Organic Contaminant
1 Introduction
This document describes how the compliance monitoring data and treatment technique
information for the fourth Six-Year Review (SYR 4) were obtained, evaluated, and formatted,
where necessary, to enable national contaminant occurrence estimates in support of the
Environmental Protection Agency's (EPA) SYR 4 of National Primary Drinking Water
Regulations (NPDWRs). In addition, this document describes the data requested and received,
data quality issues, and modifications made to the data to make them consistent and usable for
subsequent analyses. The analyses themselves are described in other reports, which are referenced
later in this section.
The 1996 Amendments to the Safe Drinking Water Act (SDWA) require that the EPA "shall, not
less often than every 6 years, review and revise, as appropriate, each national primary drinking
water regulation," (Section 1412(b)(9)). The NPDWRs are often referred to as the national
drinking water contaminant regulations or drinking water standards. The purpose of the Six-Year
Review is to evaluate current information for regulated contaminants to determine if there is new
information on health effects, treatment technologies, analytical methods, occurrence, exposure,
implementation, and/or other factors that provides a health or technical basis to support a
regulatory revision that will improve or strengthen public health protection.
National contaminant occurrence assessments were conducted in support of EPA's SYR 4, using
data from the National Compliance Monitoring Information Collection Request (ICR) dataset for the
fourth Six-Year Review (SYR 4 ICR dataset). These compliance monitoring data and treatment
technique information were provided to EPA by States2 via the ICR process. The report Analysis
of Regulated Contaminant Occurrence Data from Public Water Systems in Support of the Fourth
Six-Year Review of National Primary Drinking Water Regulations: Chemical Phase Rules and
Radionuclides Rules (USEPA, 2024a) provides complete details on the national contaminant
occurrence assessments of the contaminants regulated by the Phase I, II, IIB, and V Rules, the
Arsenic Rule, and the Radionuclides Rule conducted in support of EPA's SYR 4. Included in
that report are detailed descriptions of the national contaminant compliance monitoring and
treatment technique dataset compiled and the statistical analytical methods employed to generate
national estimates of regulated contaminant occurrence in public drinking water systems.
Compliance monitoring data for rules concerning microbial contaminants, disinfectants, and
disinfection byproducts were also collected under SYR 4. For more detailed information on the
microbial contaminants' occurrence analysis, refer to Six-Year Review 4 Technical Support
Document for Microbial Contaminant Regulations (USEPA, 2024b). Occurrence analyses of
disinfectants, disinfection byproducts, and certain microbial contaminants were not included in
SYR 4 because these NPDWRs were identified as candidates for revision under Six-Year
Review 3. However, the occurrence information collected under SYR 4 will be used to inform
potential revisions to MDBP rules.
2 In the remainder of this document, the terms "State" or "States" refer to primacy agencies in states of the United States, the
District of Columbia, the Commonwealth of Puerto Rico, the Virgin Islands, Guam, American Samoa, the Commonwealth of the
Northern Mariana Islands, the Trust Territory of the Pacific Islands, or an eligible Indian tribe.
The SYR 4 ICR data were received from the States in a variety of formats and data structures.
The submitted data required restructuring to a uniform format to conduct the national
contaminant occurrence analyses. EPA conducted a rigorous quality control evaluation of the
data submitted by States, then assembled these data into a database. This document provides a
description of the processes EPA used to assure overall data quality while developing the
occurrence dataset for SYR 4 contaminant occurrence evaluations.
Specifically, this document describes the compliance monitoring data and treatment technique
information requested and received and provides an overview of the data management and
quality assurance/quality control (QA/QC) efforts used to prepare the data to analyze
contaminant occurrence. Additional QA/QC processes specific to the microbial analyses are
described in USEPA (2024b).
2 Data Acquisition
Compliance monitoring data and treatment technique information are critical to the Six-Year
Review occurrence assessments. Without an understanding of where, and at what levels, regulated
contaminants occur in public drinking water, EPA cannot assess the risk to public health or
whether potential revisions are likely to maintain or improve public health protection. In addition,
other compliance data can help in evaluating the effectiveness of current regulations.
The Federal Safe Drinking Water Information System database (SDWIS/Fed) contains
information about public water systems (PWSs) and their violations of EPA's drinking water
regulations. However, SDWIS/Fed neither receives nor stores compliance monitoring data, which
include non-detections as well as detections. To estimate national occurrence of regulated
contaminants in PWSs, it was necessary to compile results from all compliance monitoring
samples, including both detections and non-detections. These data are collected by States but
are not required to be submitted to SDWIS/Fed. Therefore, to obtain
the compliance monitoring data and treatment technique information used in support of national
occurrence assessments for SYR 4, EPA conducted a voluntary data call-in from the States,
through the ICR process. For more information on the process undertaken to request the
voluntary submission of compliance monitoring data and treatment technique information from
States, see the SYR 4 ICR (84 FR 58381, USEPA, 2019).
Similar to prior rounds of the Six-Year Review, EPA contacted each State via letter requesting
the voluntary submission of their compliance monitoring data for regulated chemical,
radiological, microbial, and disinfection byproduct (DBP) contaminants and treatment technique
information for all NPDWRs and related parameters that were collected between January 2012
and December 2019. See Appendix A for the compliance monitoring data and treatment
technique information request letter.
EPA requested only information stored electronically (i.e., no paper records) that represented
routine compliance monitoring data and treatment technique information. Exhibit 1 shows the
regulated contaminants for which EPA requested data, and Exhibit 2 shows the requested data
elements (e.g., columns, fields) for each sample result. See Appendix B: Crosswalk of Data
Elements Requested for SYR 4 ICR and the SDWIS Data Element Names for a crosswalk table
between the data elements requested and the actual data element names as they appear in
SDWIS. In some cases, EPA did not receive any data for the elements and/or analytes requested.
Exhibit 1: List of Contaminants/Parameters Identified in SYR 4 ICR for which Data
Were Requested from States
Chemical Contaminants (Phase I, II, IIB, and V Rules; Arsenic Rule; Lead and Copper Rule)
Acrylamide
1,1-Dichloroethylene
Methoxychlor
Alachlor
cis-1,2-Dichloroethylene
Monochlorobenzene (Chlorobenzene)
Antimony
trans-1,2-Dichloroethylene
Nitrate (as N)
Arsenic
Dichloromethane (Methylene chloride)
Nitrite (as N)
Asbestos
1,2-Dichloropropane
Oxamyl (Vydate)
Atrazine
Di(2-ethylhexyl) adipate (DEHA)
Pentachlorophenol
Barium
Di(2-ethylhexyl) phthalate (DEHP)
Picloram
Benzene
Dinoseb
Polychlorinated biphenyls (PCBs)
Benzo[a]pyrene
Diquat
Selenium
Beryllium
Endothall
Simazine
Cadmium
Endrin
Styrene
Carbofuran
Epichlorohydrin
2,3,7,8-TCDD (Dioxin)
Carbon tetrachloride
Ethylbenzene
Tetrachloroethylene
Chlordane
Ethylene dibromide (EDB)
Thallium
Chromium (total)
Fluoride
Toluene
Copper
Glyphosate
Toxaphene
Cyanide
Heptachlor
2,4,5-TP (Silvex)
2,4-D
Heptachlor epoxide
1,2,4-Trichlorobenzene
Dalapon
Hexachlorobenzene
1,1,1-Trichloroethane
1,2-Dibromo-3-chloropropane (DBCP)
Hexachlorocyclopentadiene
1,1,2-Trichloroethane
1,2-Dichlorobenzene (o-Dichlorobenzene)
Lead
Trichloroethylene
1,4-Dichlorobenzene (p-Dichlorobenzene)
Lindane
Vinyl chloride
1,2-Dichloroethane (Ethylene dichloride)
Mercury (inorganic)
Xylenes (total)
Radiological Contaminants
Combined Radium-226/228; and Radium-226 & Radium-228 (if available)
Gross beta
Tritium
Iodine-131
Uranium
Gross alpha
Strontium-90
Total Coliform Rule (TCR) and Revised Total Coliform Rule (RTCR)
Total coliforms
Fecal coliforms
Escherichia coli (E. coli)
Disinfectants and Disinfection Byproducts Rules (D/DBPRs)
Total Trihalomethanes (TTHMs):
Chloroform
Bromodichloromethane
Dibromochloromethane
Bromoform
Haloacetic Acids 5 (HAA5):
Monochloroacetic acid
Dichloroacetic acid
Trichloroacetic acid
Bromoacetic acid
Dibromoacetic acid
Bromate
Chlorite
Chlorine*
Chloramines*
Chlorine dioxide
Ground Water Rule (GWR)
Escherichia coli (E. coli)
Enterococci
Coliphage
Surface Water Treatment Rules (SWTRs)
Chlorine**
Cryptosporidium***
Heterotrophic Plate Count (HPC)
Chloramines**
Filter Backwash Recycling Rule (FBRR)
No specific occurrence data collected.
Source: Attachment A to the letter EPA sent to each State to request voluntary submission of its compliance monitoring data and
treatment technique information for regulated chemical, radiological, and microbiological contaminants. See Appendix A for the data
request letter.
* As a maximum residual disinfectant level (MRDL). Chlorine and chloramines are reported as free chlorine and total chlorine,
respectively.
** As a minimum disinfectant residual level. Chlorine and chloramines are reported as free chlorine and total chlorine, respectively.
*** The monitoring data from Round 2 under the Long Term 2 Enhanced Surface Water Treatment Rule (LT2ESWTR) are being
reviewed and will be available along with the SYR 4 results.
Exhibit 2: Data Elements Requested by EPA for the Fourth Six-Year Review1
Data Category
Description
System-Specific Information
Public Water System
Identification Number
(PWSID)
The code used to identify each PWS. The code begins with the standard 2-character
postal state abbreviation or Region code; the remaining 7 numbers are unique to each
PWS in the State.
System Name
Name of the PWS.
Federal Public Water
System Type Code
A code to identify whether a system is:
Community Water System;
Non-transient Non-community Water System; or
Transient Non-community Water System.
Population Served
Highest average daily number of people served by a PWS, when in operation.
Federal Source Water
Type
Type of water at the source. Source water type can be:
Ground water; or
Surface water; or
Ground water under the direct influence of surface water (GWUDI)
(Note: Some States may not distinguish GWUDI from surface water sources. In those
States, a GWUDI source should be reported as a surface water source type.)
Treatment Information
Water System Facility
System facility data, including treatment plant identification number, treatment plant
information, treatment unit process/objectives, facility flow, treatment train (train or flow
of water through treatment units within the treatment plant).
Filtration Type
Information relating to system filtration, including filtration status, types of filtration (e.g.,
unfiltered, conventional filtration, and other permitted values).
Treatment Technique
Information
Information pertaining to treatment processes. Types of treatment technique
information including disinfectants used and their doses for primary and secondary
disinfection, coagulant/coagulant aid type and dose, disinfectant concentration,
disinfection profile/benchmark data, log of viral inactivation/removal, contact time,
contact value, pH, temperature.
Filter Backwash
Information
Information about filter backwash that is returned to the treatment plant influent (e.g.,
information on recycle/schematic status, alternative return location, corrective action
requirements, and recycle flows and frequency).
Sample-Specific Information
Sampling Point
Identification Code
A sampling point identifier established by the State, unique within each applicable
facility, for each applicable sampling location (e.g., entry point to the distribution
system). This information enables occurrence assessments that address intra-system
variability.
Sample Identification
Number
Identifier assigned by State or the laboratory that uniquely identifies a sample.
Sample Collection
Date
Date the sample is collected, including month, day, and year.
Sample Type
Indicates why the sample is being collected (e.g., compliance, routine, repeat,
confirmation, additional routine samples, duplicate, special, special duplicate).
Sample Analysis Type
Code
Code for type of water sample collected.
Raw (Untreated) water sample
Finished (Treated) water sample
For lead and copper only:
Source
Tap
For TCR repeat samples only, an indicator of the sampling location relative to the sample point
where the positive sample was originally collected:
Upstream
Downstream
Original
Contaminant
Contaminant name, 4-digit SDWIS contaminant identification number, or Chemical
Abstracts Service (CAS) Registry Number for which the sample is being analyzed.
Sample Analytical
Result - Sign
The sign indicates whether the sample analytical result was:
(<) "less than" means the contaminant was not detected or was detected at a
level "less than" the minimum reporting level (MRL).
(=) "equal to" means the contaminant was detected at a level "equal to" the
value reported in "Sample Analytical Result - Value."
(+) "positive result" (For RTCR data, the sign is to be included only for positive E. coli
results.)
Sample Analytical
Result - Value
Actual numeric (decimal) value of the analysis for the chemical results, or the MRL if
the analytical result is less than the contaminant's MRL.
(For the TCR and RTCR, TC and E. coli will indicate presence/absence, and positive
E. coli will have numeric results.)
Sample Analytical
Result - Unit of
Measure
Unit of measurement for the analytical results reported (usually expressed in either
µg/L or mg/L for chemicals; or pCi/L or mrem/yr for radiological contaminants).
(Not required for TCR and RTCR data)
Sample Analytical
Method Number
EPA identification number of the analytical method used to analyze the sample for a
given contaminant.
Minimum Reporting
Level (MRL)- Value
MRL refers to the lowest concentration of an analyte that may be reported.
(Not required for TCR and RTCR data)
MRL - Unit of Measure
Unit of measure to express the concentration value of a contaminant's MRL.
(Not required for TCR and RTCR data)
Source Water
Monitoring Information
Total organic carbon (TOC), including percent TOC removal, TOC removal summary,
pH, alkalinity, monitoring data entered as individual results or included in DBP (or
monthly operating report) summary records, alternative compliance criteria, results
from round 2 monitoring under LT2ESWTR (including Cryptosporidium, E. coli,
turbidity, or State-approved alternate indicators).
Sample Summary
Reports
Sample summaries for DBPRs, SWTRs, RTCR, GWR corrective actions, and the Lead
and Copper Rule (LCR) associated with analytical result records. Values used for
compliance determination [e.g., turbidity (combined effluent/individual effluent),
disinfectant residual levels in treatment plant and distribution system, treatment
technique information, HPC, etc.]
Source: Attachment A to the letter EPA sent to each State to request voluntary submission of compliance
monitoring data and treatment technique information for regulated chemical, radiological, and microbiological
contaminants. See Appendix A for the data request letter.
1 These are the data elements requested in the SYR 4 ICR. The "Data Category" and "Description" columns were
intentionally descriptive rather than prescriptive. This allowed the States that do not use SDWIS/State the flexibility
to provide as much information as possible. EPA accepted all data "as is" without prescribing structure or format.
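Exhibit 2 defines how a result record should be read: a PWSID with a 2-character prefix and 7 digits, a sign of "<", "=", or "+", and a value whose meaning depends on that sign. As an illustration only, the Python sketch below shows one way a single record could be checked and interpreted under those conventions: a loose PWSID format check, detection versus non-detection from the sign, and conversion of µg/L results to mg/L. The `SampleResult` class, the function names, and the accepted unit spellings are hypothetical; they are not part of the SYR 4 ICR database or of the QA measures described in Section 5.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Loose format check from the Exhibit 2 description: a 2-character State or Region
# code followed by 7 digits. It does not confirm that the prefix is a real State/Region.
PWSID_PATTERN = re.compile(r"[A-Z0-9]{2}[0-9]{7}")

# Factors that express a mass concentration in mg/L. The unit spellings accepted
# here are assumptions; submitted data may spell units differently.
TO_MG_PER_L = {"mg/L": 1.0, "ug/L": 0.001, "µg/L": 0.001}


@dataclass
class SampleResult:
    pwsid: str
    contaminant: str
    sign: str               # "<", "=", or "+" (see "Sample Analytical Result - Sign")
    value: Optional[float]  # for "<", this is the MRL rather than a measured level
    unit: Optional[str]


def is_valid_pwsid(pwsid: str) -> bool:
    """True if the identifier matches the 2-character + 7-digit layout."""
    return PWSID_PATTERN.fullmatch(pwsid.strip().upper()) is not None


def is_detection(result: SampleResult) -> bool:
    """'<' is a non-detection (or below the MRL); '=' and '+' are detections."""
    if result.sign not in ("<", "=", "+"):
        raise ValueError(f"unexpected sign: {result.sign!r}")
    return result.sign != "<"


def value_in_mg_per_l(result: SampleResult) -> Optional[float]:
    """Convert a chemical result to mg/L; return None for other units such as pCi/L."""
    if result.value is None or result.unit not in TO_MG_PER_L:
        return None
    return result.value * TO_MG_PER_L[result.unit]


arsenic = SampleResult("XX0000001", "Arsenic", "=", 4.0, "ug/L")
print(is_valid_pwsid(arsenic.pwsid), is_detection(arsenic), value_in_mg_per_l(arsenic))
```

Radiological units such as pCi/L are deliberately left unconverted here, since they measure activity rather than mass concentration.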
About 78 percent of the 50 U.S. states currently store and manage at least portions of their
compliance monitoring data and/or treatment technique information in the Safe Drinking Water
Information System/State Version (SDWIS/State). EPA developed SDWIS/State in collaboration
with primacy agencies to manage drinking water information and provide a common structure
for the development of reusable components and shared applications. The SDWIS/State structure
has the flexibility to support the most complex primacy program implementation while
maintaining a common core of data elements required for reporting to SDWIS/Fed. In an attempt
to make the SYR 4 data submittal process as easy for States as possible, EPA developed a
SDWIS/State Extraction Tool (also referred to as "extraction tool" throughout this document),
which enabled States to run a customized query to pull the requested data from a SDWIS/State
database maintained by those States. All of the States using SDWIS/State that submitted data to
EPA for SYR 4 used the extraction tool to extract and compile the EPA-requested compliance
monitoring and treatment technique data.
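This document does not describe the extraction tool's actual queries. As a rough, hypothetical sketch of the kind of customized query it ran, the example below uses Python's built-in `sqlite3` module against an invented, much-simplified `sample_results` table (not the SDWIS/State schema). It selects routine ("RT") results for a list of requested contaminant codes collected between January 2012 and December 2019. The table, its column names, the restriction to routine samples, and the placeholder contaminant codes "0001" and "0002" are illustrative assumptions.

```python
import sqlite3

# Invented, heavily simplified table for illustration; the real SDWIS/State schema differs.
SCHEMA = """
CREATE TABLE sample_results (
    pwsid TEXT,
    contaminant_code TEXT,
    sample_type TEXT,
    collection_date TEXT,
    sign TEXT,
    value REAL,
    unit TEXT
)
"""


def extract_requested(conn, contaminant_codes, start="2012-01-01", end="2019-12-31"):
    """Return routine ('RT') results for the given contaminant codes collected in [start, end].

    Dates are stored as ISO 8601 text ('YYYY-MM-DD'), so BETWEEN compares them correctly.
    """
    if not contaminant_codes:
        return []
    placeholders = ",".join("?" for _ in contaminant_codes)
    sql = (
        "SELECT pwsid, contaminant_code, collection_date, sign, value, unit "
        "FROM sample_results "
        f"WHERE contaminant_code IN ({placeholders}) "
        "AND sample_type = 'RT' "
        "AND collection_date BETWEEN ? AND ? "
        "ORDER BY pwsid, collection_date"
    )
    return conn.execute(sql, [*contaminant_codes, start, end]).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO sample_results VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("XX0000001", "0001", "RT", "2016-03-02", "=", 3.1, "ug/L"),  # kept
        ("XX0000001", "0001", "RT", "2021-01-15", "<", 1.0, "ug/L"),  # outside window
        ("XX0000001", "0001", "RP", "2016-03-05", "=", 3.4, "ug/L"),  # repeat sample
        ("XX0000002", "0002", "RT", "2014-07-09", "=", 0.5, "mg/L"),  # not requested
    ],
)
rows = extract_requested(conn, ["0001"])
print(rows)
```

Passing the contaminant codes and dates as bound parameters, rather than formatting them into the SQL text, keeps the query reusable across different request lists.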
SDWIS/State supports the Electronic Drinking Water Report (eDWR) XML Schema used by
laboratories throughout the nation to electronically report sample analytical results as structured
data to SDWIS/State (for more information, see the full eDWR description and schema details at
https://exchangenetwork.net/data-exchange/electronic-drinking-water-reports/). As a result,
States receive tabular data from laboratories that are batch-processed into SDWIS/State rather than
manually entered. Consequently, States have a substantial amount of structured data available in
SDWIS/State. In all, for SYR 4, 46 states and 13 other jurisdictions provided compliance
monitoring data and treatment technique information that included parametric records. The seven
States that did not provide data were Georgia, Michigan, Mississippi, New Mexico, Guam,
Puerto Rico, and U.S. Virgin Islands.
Exhibit 3 lists the States that submitted SYR 4 data and indicates whether they used the
extraction tool. Thirty-five states, Washington, D.C., and six regional tribal entities used the
extraction tool to transmit all or some of their chemical and microbial data; therefore, those
datasets were all submitted in a similar format. The 17 States not using SDWIS/State submitted
their compliance monitoring data and treatment technique information "as is," resulting in a
variety of formats, including dBase, Excel, XML, Access, and comma-delimited text. Apart from
California, Colorado, and Florida, whose data were downloaded from their publicly available
websites, all States submitted their data online via EPA's Central Data Exchange.
Exhibit 3: Summary of States that Provided Compliance Monitoring Data and Treatment Technique
Information for SYR 4

States/Tribes that DID use the SDWIS/State Extraction Tool:
Alabama; Alaska; Arizona; Arkansas; Connecticut; Delaware; Hawaii; Idaho; Illinois; Indiana; Iowa;
Kansas; Kentucky; Louisiana; Maine; Maryland; Missouri; Montana; Nebraska; Nevada; New Jersey;
New York; North Carolina; North Dakota; Ohio; Oklahoma; Oregon; Region 4 tribes; Region 5 tribes;
Region 6 tribes; Region 7 tribes; Region 8 tribes; Region 10 tribes; Rhode Island; South Carolina;
Texas; Utah; Vermont; Virginia; Washington, D.C.; West Virginia; Wyoming

States/Tribes that DID NOT use the SDWIS/State Extraction Tool:
American Samoa; California¹; Colorado¹; Commonwealth of the Northern Mariana Islands; Florida¹;
Massachusetts; Minnesota; Navajo Nation; New Hampshire; Pennsylvania; Region 1 tribes;
Region 2 tribes; Region 9 tribes; South Dakota; Tennessee; Washington; Wisconsin

States/Tribes that DID NOT submit any SYR 4 data:
Georgia; Guam; Michigan; Mississippi; New Mexico; Puerto Rico; U.S. Virgin Islands

1 CA, CO, and FL compliance monitoring data and treatment technique information were extracted from
publicly available websites.
3 Data Storage
EPA designed the SYR 4 ICR database, modeled on SDWIS/State, to house the data that States
sent in response to the SYR 4 ICR data request. The SYR 4 ICR database is an Oracle relational
database consisting of tables, relationships, import scripts, and other objects that support
populating the database tables. Because duplicate record identifiers were likely in the source
tables (e.g., the same IDs used by different States), most tables in the SYR 4 database contain a
unique record identifier (i.e., a primary key). The unique record identifiers ensured that all
relevant records were imported and that duplicate record identifiers present in the source data did
not cause relevant records to be excluded. A relational structure is well suited to storing large
volumes of data because each table stores one kind of information (e.g., water system inventory or
sample analytical results) and is linked to related tables through keys. The SYR 4 database was
designed to ensure that information was not duplicated between tables and to maintain the logical
relationships inherent to the data.
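A minimal sketch of the surrogate-key idea follows, using Python's built-in sqlite3 module rather than Oracle. The table and column names (sample_results, result_key, state_code, source_record_id) and the PWSIDs are hypothetical and are not the SYR 4 ICR schema; the sketch only shows that two States supplying the same source record ID can both be imported when the primary key is assigned at import time.

```python
import sqlite3


def build_demo_database():
    """Create an in-memory table keyed by a surrogate ID (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE sample_results (
            result_key       INTEGER PRIMARY KEY,  -- surrogate key assigned on import
            state_code       TEXT NOT NULL,        -- submitting State
            source_record_id TEXT NOT NULL,        -- ID as supplied; may repeat across States
            pwsid            TEXT NOT NULL,
            UNIQUE (state_code, source_record_id)  -- unique only within one State
        )
        """
    )
    # Two States happen to use the same source record ID "1001".
    rows = [("AL", "1001", "AL0000001"), ("AK", "1001", "AK0000001")]
    conn.executemany(
        "INSERT INTO sample_results (state_code, source_record_id, pwsid) VALUES (?, ?, ?)",
        rows,
    )
    return conn


conn = build_demo_database()
query = "SELECT result_key, state_code, source_record_id FROM sample_results ORDER BY result_key"
for row in conn.execute(query):
    print(row)  # (1, 'AL', '1001') then (2, 'AK', '1001')
```

If source_record_id alone were declared the primary key, the second INSERT would raise sqlite3.IntegrityError and that record would be lost, which is the failure the unique record identifiers are meant to prevent.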
Exhibit 4 describes the tables included in the SYR 4 ICR database. The database
includes 17 primary tables and 2 transaction tables. The primary tables include SDWIS data
elements, codes, and the compliance monitoring data and treatment technique information. EPA
created the two transaction tables to manage the QA/QC review effort. The QA/QC review
documentation codes are called transactions in the database, and the tables that store them are
listed in Exhibit 4 with the word "transaction" in the table name.
For a list of all of the data elements included in each table, as well as available codes for each
data element, refer to Appendix C: Data Dictionary for the SYR 4 Database.
Exhibit 4: Description of Tables Included in SYR 4 ICR Database

Table Name | Brief Description | Description of Contents of Table
T6YWS | Water system (Ws) table | Inventory information: PWSID, source water type, system type, population, etc.
T6YWSF | Water system facility (Wsf) table | Facility identification information: facility ID, facility type, etc.
T6YSPT | Sample point (Spt) table | Sample point identification information: sample point type, source type, etc.
T6YANALYTE | Analyte table | Analyte identification information: contaminant name, 4-digit chemical IDs, etc.
T6YSAR | Sample analytical result (Sar) table | Monitoring records: sample date, sample type code, analyte, concentration, reporting level, method, etc.
T6YDBPSUM | Disinfection byproduct (DBP) summaries table | Summaries used to enter sampling requirements and collection information in support of the SWTR/IESWTR and DBP rules.
T6YFANL | Facility analyte levels table | Information from primacy agencies where they specify and maintain monitoring and reporting (M&R) and level compliance values for an analyte at a water system facility.
T6YSAMPSUM | Lead and Copper Rule and Total Coliform Rule sample summaries table | Quantity of each type of sample (e.g., total samples collected or number of repeat samples) and the results (e.g., total positive samples, total negative samples) of the sample analysis summaries for an analyte.
T6YCMCLV | Compliance Monitoring Compliance Level Violations | Information on calculated compliance values.
T6YCORACT | Corrective actions table | Information on corrective actions.
T6YMCL_MDL | Maximum contaminant level and minimum detection level table | Values and units of the maximum contaminant level, four times the maximum contaminant level, the minimum detection level, and one-tenth the minimum detection level.
T6YWSFPLT | Treatment plant water system facilities table | Information on treatment plant facilities.
T6YTREATPROCESS | Treatments associated with treatment plants table | Information on treatment processes and objectives.
T6YWSF_FLOWS | Water system facility flows table | Information on the relationships or connections between the different facilities of a water system.
T6YWSFIND | Water system facility indicators table | Indicators recorded for a water system facility.
T6YWSIND | Water system indicators table | Indicators recorded for a water system.
T6YWSPURCH | Water system buyers and sellers | Information on the purchase of water between water systems.
T6YSAR_TRANSACTION | Transaction table for sample analytical results | Flagged monitoring records: reason the record was flagged, action taken on the flagged record, response from the State (when available), and any other relevant notes/remarks. Some records have multiple entries in the transaction table if they were flagged for more than one reason.
T6YWS_TRANSACTION | Transaction table for water systems | Flagged water systems: reason the record was flagged, action taken on the flagged record, response from the State (when available), and any other relevant notes/remarks. Some records have multiple entries in the transaction table if they were flagged for more than one reason.
4 Data Management
This section provides descriptions of the data management tasks that were implemented to
prepare the SYR 4 datasets for QA/QC review. The SDWIS/State Extraction Tool transferred the
SDWIS/State data to Microsoft Access. Data from States that did not use the extraction tool were
restructured into a similar format. The two subsets (the extract States and the non-extract
States, referred to for the remainder of this document as the "SDWIS States" and the "non-SDWIS
States," respectively) were managed separately so that each could be arranged into the same
format. After reformatting and transforming data from the non-SDWIS States, all data were
combined into the final SYR 4 ICR dataset.
A status documentation file was maintained that included information for each State.
Specifically, the status documentation described the date received, file type, whether the
extraction tool was used, and the date range of the data. The status documentation also described
any State-specific notes, issues, or concerns. Upon receipt of each state dataset, EPA created
State-specific directories. Original datasets were saved and maintained exactly as received and
stored in an EPA database. Any subsequent changes to a State's dataset were made to a copy of
the original dataset, and all changes were documented.
4.1 Review of SYR 4 Dataset Content
As in prior rounds of the Six-Year Review, the first assessment of the submitted SYR 4
datasets sought to verify that all of the necessary data elements were included in each state
dataset. This review included a comparison of the data elements requested in the state letter,
specifically those necessary for the SYR 4 analyses, to the entire list of data elements included in
each State's dataset. Although data dictionaries were not necessary for the review of data from
the SDWIS States, these files (and any other available supporting information provided by the
States) were useful when interpreting the data submitted by the non-SDWIS States. Supporting
information included descriptions of the sampling efforts provided in emails from the State,
additional information on acronym definitions, etc.
Data dictionaries and supporting information were reviewed for definitions of the various data
elements, row and column headings, codes, and acronyms. If fields were missing or not
recognizable, EPA contacted the State via email for clarification. EPA created a flagged record
report for each State to summarize questions regarding potential data quality concerns, data
completeness, statewide waivers, and any other unique aspects of their dataset. In addition, many
of the non-SDWIS States submitted datasets with more data elements than requested. In those
cases, EPA determined which data elements corresponded to the SYR 4 ICR.
EPA also confirmed that all of the requested contaminants from the SYR 4 ICR were included in
each State's dataset. As a first step for the non-SDWIS States, EPA reviewed the CHEMIDs (i.e.,
four-digit SDWIS codes) and/or contaminant names within each State's dataset. Many States
included only CHEMIDs or contaminant names. A few other States only included CAS numbers
or State-specific codes. EPA populated missing information using a variety of sources including
a list of SDWIS codes from the SDWIS/Fed database as well as the ChemIDplus website (if only
CAS numbers were provided). Nine of the non-SDWIS States submitted at least some data for a
contaminant or contaminants for which a four-digit SDWIS code could not be determined. In other
cases, a State appeared to use an incorrect four-digit SDWIS code for a particular contaminant.
EPA included issues regarding missing contaminants or undetermined CHEMIDs in the flagged
record reports that were sent to each State to ask for clarification.
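The CHEMID check described above can be sketched as a lookup that fills a missing four-digit code from the contaminant name or CAS number and flags a submitted code that disagrees with the lookup. The function name, status labels, and lookup structure are hypothetical; the arsenic entries are illustrative and should be confirmed against the SDWIS/Fed code list before any real use.

```python
def resolve_chemid(chemid, name, cas, name_to_chemid, cas_to_chemid):
    """Return (chemid, status) for one submitted record.

    status is "ok", "filled" (populated from a lookup), "conflict" (the
    submitted code disagrees with the lookup), or "unresolved".
    """
    key_name = (name or "").strip().lower()
    key_cas = (cas or "").strip()
    expected = name_to_chemid.get(key_name) or cas_to_chemid.get(key_cas)
    if chemid:
        if expected and expected != chemid:
            return chemid, "conflict"  # possibly incorrect code: ask the State
        return chemid, "ok"
    if expected:
        return expected, "filled"  # populated from the name or CAS lookup
    return None, "unresolved"  # goes into the State's flagged record report


# Illustrative lookup entries only; confirm against the SDWIS/Fed code list.
NAME_TO_CHEMID = {"arsenic": "1005"}
CAS_TO_CHEMID = {"7440-38-2": "1005"}

print(resolve_chemid(None, "Arsenic", None, NAME_TO_CHEMID, CAS_TO_CHEMID))      # ('1005', 'filled')
print(resolve_chemid(None, None, "7440-38-2", NAME_TO_CHEMID, CAS_TO_CHEMID))    # ('1005', 'filled')
print(resolve_chemid("9999", "Arsenic", None, NAME_TO_CHEMID, CAS_TO_CHEMID))    # ('9999', 'conflict')
```

Returning a status rather than silently overwriting a submitted code keeps the decision with the State, consistent with the flagged record reports.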
Sample collection dates were reviewed for consistency with the SYR 4 ICR timeframe (2012-
2019). If sample collection dates were suspicious or incorrect, EPA tried to use other data
elements to infer the correct date (e.g., analyzed date). If the correct date could not be
determined, EPA included a question for the State in its flagged record report.
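A simplified version of the date screen might look like the following. The fallback rule (use an in-window analysis date when the collection date is missing or out of range) is an assumption made for illustration; the text says only that EPA tried other data elements such as the analyzed date, and a real review would also have to distinguish legitimately out-of-scope samples from date errors.

```python
from datetime import date

# SYR 4 ICR timeframe stated in the text.
WINDOW_START = date(2012, 1, 1)
WINDOW_END = date(2019, 12, 31)


def in_window(d):
    """True when d is a date inside the 2012-2019 timeframe."""
    return d is not None and WINDOW_START <= d <= WINDOW_END


def resolve_sample_date(collected, analyzed):
    """Return (date, status) for one record under the simplified rule above."""
    if in_window(collected):
        return collected, "ok"
    if in_window(analyzed):
        # Assumption: an in-window analysis date stands in for a bad collection date.
        return analyzed, "inferred_from_analysis_date"
    return None, "ask_state"  # question goes into the flagged record report


print(resolve_sample_date(date(1915, 6, 1), date(2015, 6, 3)))
```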
4.2 Restructuring Non-SDWIS State Data
Datasets received from the non-SDWIS States were restructured through a series of Microsoft
Access queries into a format similar to the structure of the data from the SDWIS States to allow
for the construction of a unified database for the SYR 4 national contaminant occurrence
analyses. As a first step in this process, EPA identified the data structure of each non-SDWIS
State dataset to plan the best method for conversion to the final database structure.
Several States submitted their data as a single flat file. However, the SYR 4 ICR database was
designed as a relational database, so the structure of each flat file had to be modified (i.e., mapped)
into the structure of the relational database. The various data elements were mapped from the
single flat file table into three separate inventory tables for water systems, facilities, and sample
points (T6YWS, T6YWSF, and T6YSPT, respectively). As an example, a flat file from a State
may have contained columns for PWSID, population served, and system type for every sample
analytical result. However, in the final SYR 4 ICR database, the sample analytical result table
(T6YSAR) stores the sample analysis results with a water system ID to link it to a single record
in the water system table (T6YWS) with the corresponding inventory information. In this case, a
unique list of water systems and their system-level information was created from the flat file and
imported into T6YWS. The same procedure was followed with the sample point and facility
information. In some cases, a State provided sample point information but not facility
information. Within the SYR 4 ICR database, both the sample point and facility tables had to be
fully populated. In these cases, facility IDs were set equal to sample point IDs.
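The flat-file mapping can be sketched in Python as follows. Column names such as pwsid, sample_point_id, and facility_id, and the XX-prefixed PWSIDs, are hypothetical. The sketch keeps the first population value it sees for each system, which is a simplification (the handling of conflicting population values for the SYR 4 ICR dataset is described in Section 5.1).

```python
def normalize_flat_file(rows):
    """Split flat-file rows into inventory tables plus a results table.

    Each row is a dict; the keys used here are hypothetical column names.
    """
    systems, facilities, sample_points, results = {}, {}, {}, []
    for row in rows:
        pwsid = row["pwsid"]
        # One water system record per PWSID (first row seen wins in this sketch).
        systems.setdefault(pwsid, {"pwsid": pwsid,
                                   "population": row.get("population"),
                                   "system_type": row.get("system_type")})
        spt_id = row["sample_point_id"]
        # No facility supplied: reuse the sample point ID as the facility ID.
        fac_id = row.get("facility_id") or spt_id
        facilities.setdefault((pwsid, fac_id), {"pwsid": pwsid, "facility_id": fac_id})
        sample_points.setdefault((pwsid, spt_id), {"pwsid": pwsid, "facility_id": fac_id,
                                                   "sample_point_id": spt_id})
        # Results keep only the keys needed to link back to the inventory tables.
        results.append({"pwsid": pwsid, "sample_point_id": spt_id,
                        "analyte": row["analyte"], "value": row["value"]})
    return systems, facilities, sample_points, results


flat = [
    {"pwsid": "XX0000001", "population": 500, "system_type": "C",
     "sample_point_id": "EP1", "analyte": "arsenic", "value": 2.0},
    {"pwsid": "XX0000001", "population": 500, "system_type": "C",
     "sample_point_id": "EP1", "analyte": "nitrate", "value": 1200.0},
    {"pwsid": "XX0000002", "population": 80, "system_type": "NTNC",
     "facility_id": "WELL2", "sample_point_id": "EP2", "analyte": "arsenic", "value": 0.0},
]
systems, facilities, sample_points, results = normalize_flat_file(flat)
print(len(systems), len(facilities), len(sample_points), len(results))  # 2 2 2 3
```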
For each non-SDWIS State, EPA compiled a list of all tables and data elements, including
permitted values and a description of each element. An example of a permitted value is a
recognized system type code such as "C" (community) or "NTNC" (non-transient non-
community). From this framework, the submitted values were matched to the corresponding
values within SDWIS/Fed for the federally reportable data elements. The remaining data
elements and permitted values were mapped to the corresponding SDWIS/State values where
possible. For example, the source water type column in a non-SDWIS State's dataset could be
called "PSource"; in this instance, EPA created a crosswalk table³ indicating that "PSource"
should be mapped to the SDWIS/Fed field "D_FED_PRIM_SRC_CD". Generally, the States
that did not use the extraction tool provided enough information in data dictionaries or other
documentation for EPA to accurately organize the data in the SDWIS/Fed format.
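A crosswalk can be represented as a simple mapping from a State's column names to SDWIS field names. In this sketch only the PSource entry follows the example above; the SysType and InspectorName columns are hypothetical.

```python
# Crosswalk from a State's column names to SDWIS field names.
# "PSource" follows the example in the text; "SysType" is hypothetical.
CROSSWALK = {
    "PSource": "D_FED_PRIM_SRC_CD",
    "SysType": "D_PWS_FED_TYPE_CD",
}


def apply_crosswalk(row, crosswalk):
    """Rename mapped columns; return (mapped_row, unmapped_column_names)."""
    mapped, unmapped = {}, []
    for column, value in row.items():
        target = crosswalk.get(column)
        if target is None:
            unmapped.append(column)  # not requested for SYR 4, or needs review
        else:
            mapped[target] = value
    return mapped, unmapped


mapped, unmapped = apply_crosswalk(
    {"PSource": "S", "SysType": "CWS", "InspectorName": "n/a"}, CROSSWALK
)
print(mapped)    # {'D_FED_PRIM_SRC_CD': 'S', 'D_PWS_FED_TYPE_CD': 'CWS'}
print(unmapped)  # ['InspectorName']
```

Keeping the crosswalk as data rather than code makes it easy to save alongside each State's transformation queries, which supports the reproducibility goal described in this section.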
3 A "crosswalk table" shows equivalent data elements in more than one database schema (e.g., a non-SDWIS/State
dataset format to the SDWIS/State dataset format). It maps the elements in one database to the equivalent elements
in another database.
Prior to populating the SYR 4 ICR database, EPA standardized the data reported by each
non-SDWIS State to reflect the appropriate SDWIS codes. For example, in the source water type
field (i.e., "D_FED_PRIM_SRC_CD"), all instances of "surface water" or "S" were changed to
"SW." In the system type field (i.e., "D_PWS_FED_TYPE_CD"), all instances of "CWS" or
"community" were changed to "C" for community water systems. All PWSIDs had to be put in
the federal format of the two-character postal State abbreviation or region code followed by a
seven-digit number, unique to each PWS.
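The standardization rules above might be sketched as follows. The value maps contain only the examples given in the text; the regular expression encodes the stated PWSID format, assuming a region code is two digits; and to_federal_pwsid is a hypothetical helper that assumes a State's local identifier is a number of at most seven digits.

```python
import re

# Value maps built from the examples in the text (keys compared case-insensitively).
SOURCE_TYPE_MAP = {"surface water": "SW", "s": "SW"}
SYSTEM_TYPE_MAP = {"cws": "C", "community": "C"}

# Two-letter postal abbreviation or two-digit region code, then seven digits.
PWSID_PATTERN = re.compile(r"(?:[A-Z]{2}|\d{2})\d{7}")


def standardize(value, mapping):
    """Map a submitted value to its SDWIS code; leave unknown values unchanged."""
    cleaned = value.strip()
    return mapping.get(cleaned.lower(), cleaned)


def to_federal_pwsid(prefix, local_number):
    """Hypothetical helper: zero-pad a State's local number to seven digits."""
    digits = str(local_number).strip()
    if not digits.isdigit() or len(digits) > 7:
        raise ValueError(f"cannot build a PWSID from {local_number!r}")
    return prefix.strip().upper() + digits.zfill(7)


def is_federal_pwsid(pwsid):
    return PWSID_PATTERN.fullmatch(pwsid) is not None


print(standardize("Surface Water", SOURCE_TYPE_MAP))  # SW
print(standardize("community", SYSTEM_TYPE_MAP))      # C
print(to_federal_pwsid("al", 12345))                  # AL0012345
print(is_federal_pwsid("AL12345"))                    # False
```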
After the various State-specific formatting and transformations were completed, EPA imported
all non-SDWIS datasets into Access and ultimately merged them with the SDWIS/State datasets in
Oracle, the database storing all SYR 4 data. In some cases, EPA imported only the data elements
identified as essential to the occurrence analysis. Upon completion, EPA compared all
transformed state datasets to the original datasets to ensure all data were accurately converted.
Furthermore, EPA saved a record of the procedures used to map the state datasets to the SYR 4
ICR database. All queries were created and saved in Access to document the transformation,
ensuring that this process is reproducible.
4.3 Establishing Consistent Data Fields for Analytical Results (SDWIS and Non-SDWIS States)
EPA structured the sample analytical result sign, sample analytical result value, and sample
analytical result unit of measure into a consistent format to prepare the data for occurrence
analysis. EPA conducted this step prior to reviewing the data for potential outliers. Many of the
state datasets included analytical result signs (e.g., "<" for non-detections, "=" for detections),
detection limits, and analytical results data in multiple fields. EPA added a "DETECT" field to
the SYR 4 ICR dataset to record whether each result was a detection and to facilitate analysis.
Wherever the analytical result was greater than zero and the result sign indicated a detection,
DETECT was set equal to 1, representing a detection. When the analytical result was equal to zero
and/or the result sign indicated a non-detection, DETECT was set equal to 0 (i.e., a non-detect).
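The DETECT rule can be expressed as a small function. This is a sketch of the rule as stated, with one added assumption: cases the text does not cover (a negative value, a missing value with a "=" sign, or an unrecognized sign) return None so they can be reviewed rather than silently classified.

```python
def detect_flag(sign, value):
    """Return 1 (detection), 0 (non-detect), or None (case not covered; review).

    A result > 0 with a detection sign ("=") is a detection; a zero result
    and/or a non-detection sign ("<") is a non-detect.
    """
    if sign == "<" or value == 0:
        return 0
    if sign == "=" and value is not None and value > 0:
        return 1
    return None  # e.g., negative value, missing value, or unrecognized sign


# Expected flags for these cases: 1, 0, 0, None, None
for sign, value in [("=", 3.2), ("<", 0.5), ("=", 0.0), ("=", -1.0), ("?", 2.0)]:
    print(sign, value, detect_flag(sign, value))
```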
EPA received data with various units of measure. It was important that all data for each
individual contaminant be expressed in a single unit to facilitate analysis. Chemical monitoring
data were received in both milligrams per liter (mg/L) and micrograms per liter (µg/L). For this
analysis, EPA converted all data for inorganic contaminants (IOCs), synthetic organic
contaminants (SOCs), volatile organic contaminants (VOCs), uranium, trihalomethanes (THMs),
and haloacetic acids (HAAs) to µg/L. Data for alpha particles, beta particles,⁴ and combined
radium-226/228 were analyzed in picocuries per liter (pCi/L). Except for asbestos and
radionuclides, all thresholds and concentrations in this report are expressed in µg/L. As described
in Section 5.3.3, all records with missing or unusual units in the SYR 4 ICR dataset were sent
back to States for input as part of the flagged records reports mentioned earlier.
4 Although the MCL for beta particles is in the unit of measure of millirem per year (i.e., 4 mrem/yr), the primary
unit of analytical measure is picocuries per liter (pCi/L). This unit of measure relates to screening thresholds of 15
pCi/L and 50 pCi/L that are defined in the 2000 Radionuclides Rule. More than 99 percent of all compliance
monitoring data for beta particles submitted by the States to EPA were in units of pCi/L.
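A minimal sketch of the unit harmonization for the chemical groups named above follows, assuming units arrive as free-text strings such as "mg/L", "MG/L", or "µg/L". Radionuclide units (pCi/L) are deliberately left out, and anything unrecognized returns None, mirroring the practice of sending missing or unusual units back to the State.

```python
# Multipliers to micrograms per liter; keys are normalized unit strings.
TO_UG_PER_L = {"mg/l": 1000.0, "ug/l": 1.0}


def normalize_unit(unit):
    """Lower-case the unit and spell the micro sign as 'u' (µg/L becomes ug/l)."""
    return unit.strip().lower().replace("\u00b5", "u").replace("\u03bc", "u")


def to_ug_per_l(value, unit):
    """Convert a concentration to µg/L, or return None for a missing or unusual unit."""
    if unit is None:
        return None
    factor = TO_UG_PER_L.get(normalize_unit(unit))
    if factor is None:
        return None  # would be sent back to the State as a flagged record
    return value * factor


print(to_ug_per_l(0.010, "mg/L"))  # 10.0
print(to_ug_per_l(3.0, "ppm"))     # None
```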
5 Data Quality Assurance and Quality Control
After EPA converted the state datasets into a consistent format, a significant effort was
undertaken to ensure the quality of the data submitted. Data quality, completeness, and
representativeness were key considerations for the dataset. Given the size, scope, and variety of
formats of the datasets received from the States, EPA conducted an extensive QA/QC evaluation
on the data to be included in the SYR 4 ICR dataset. This QA/QC evaluation involved the
assessment of data ranging in quality across the different contaminants and different States.
This chapter includes a summary description of the QA/QC measures that were conducted on the
state datasets prior to analysis. As noted in this chapter, not all of the QA/QC measures described
were applied to every State's dataset.
5.1 Completeness and Representativeness of the Six-Year Review ICR Dataset
The final SYR 4 ICR dataset consists of compliance monitoring data and treatment technique
information received from 59 of 66 States. It represents a large sample of PWSs across the
United States and the largest compliance monitoring dataset ever compiled and analyzed under
EPA's drinking water program. The 59 States that provided data for the SYR 4 ICR dataset
comprise 88 percent of all PWSs and 92 percent of the total population served by PWSs
nationally. The SYR 4 ICR dataset is geographically representative of PWSs nationwide.
The absence of data from seven States in the final SYR 4 ICR dataset could potentially bias the
dataset's representation of the national occurrence of contaminants. However, the seven States,
representing 12 percent of PWSs and 8 percent of the population served by PWSs nationally, are
expected to have a relatively small influence when compared to the PWSs and populations
represented by the States that did submit data. The seven States that did not provide compliance
monitoring data or treatment technique information are Georgia, Michigan, Mississippi, New
Mexico, Puerto Rico, Guam, and the U.S. Virgin Islands. Although Georgia and Mississippi, two
sizeable States in the southeastern United States, did not provide data, all other southeastern
States did provide data, allowing for substantial regional coverage, especially from a population-
based perspective. All other regions of the conterminous United States had at most one State not
included in the dataset. The SYR 4 ICR dataset, with 59 of the 66 States represented, is therefore
considered reasonably complete and nationally representative as the basis of the contaminant
occurrence estimates for this Six-Year Review. However, to further address the issue of potential
bias, EPA assessed the contaminants regulated by the Chemical Phase and Radionuclides Rules
by comparing occurrence in the States that contributed data to the SYR 4 ICR dataset to those that
did not.
Because a complete compliance monitoring dataset for every PWS was not available to EPA, it
was not possible to characterize national occurrence with complete certainty or to confirm that the
SYR 4 ICR dataset is representative of the States that did not voluntarily contribute data.
Therefore, an indicator of occurrence was developed using data available from the SDWIS/Fed
database, which does not have complete compliance monitoring data but does include violation
data from all 66 States. EPA compiled SDWIS/Fed records of MCL violations for the Chemical
Phase and Radionuclides Rules only, used here as an indicator of contaminant occurrence, by
State for the same years as the SYR 4 ICR dataset (2012-2019).5 The MCL violation records
were used to determine whether the violation rate in the 7 missing States was significantly different
from the violation rate in the 59 States in the dataset, or whether the violation rate in the 59 States could
be considered representative (i.e., drawn from the same statistical population). EPA conducted
this assessment for select chemical and radiological analytes evaluated under SYR 4.
The MCL violation rate for each contaminant (i.e., the percentage of systems with at least one
MCL violation) was calculated for the 59 States in the dataset and separately for the 7 States not
in the SYR 4 ICR dataset. For each contaminant, a Mann-Whitney U test, also known as a
Wilcoxon rank-sum test, was used to determine whether the population of MCL violation rates
by State significantly differs between the two groups (59 States versus 7 States). The non-
parametric Mann-Whitney test was chosen, as opposed to a parametric t-test, because the small
sample sizes (Exhibit 5) do not support an assumption that the data fit a normal distribution. The
resulting p-values from the Mann-Whitney U test were first compared to an alpha (α) level of
0.05, a common threshold of significance, then to 0.1, a less-stringent threshold considered to
account for small sample sizes. If the p-value resulting from the Mann-Whitney U test was less
than 0.1, EPA rejected the null hypothesis that the two populations of MCL violation rates were
equal and accepted the alternative hypothesis that they were unequal. Exhibit 5 summarizes the
results of the Mann-Whitney U test analysis.
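The test can be illustrated with an exact (permutation) Mann-Whitney U test written in plain Python. This is a sketch, not EPA's implementation: it assumes midranks for ties and a two-sided p-value equal to twice the smaller tail probability (capped at 1), which is one common convention. Because it enumerates every assignment of observations to the second group, it is practical only when that group is small, as for the States not in the SYR 4 ICR dataset. The input values below are synthetic.

```python
from itertools import combinations


def doubled_midranks(values):
    """Return 2 x midrank for each value so tied ranks stay integers."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Grow the block of values tied with values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        doubled = (start + 1) + (end + 1)  # first + last 1-based position in the block
        for k in range(start, end + 1):
            ranks[order[k]] = doubled
        start = end + 1
    return ranks


def mann_whitney_exact(group_a, group_b):
    """Exact two-sided Mann-Whitney U (permutation) test.

    Returns (U for group_b, p_value). Every way of drawing len(group_b)
    observations from the pooled data is enumerated.
    """
    pooled = list(group_a) + list(group_b)
    ranks2 = doubled_midranks(pooled)
    n_b = len(group_b)
    offset = n_b * (n_b + 1)  # 2 x n_b(n_b + 1) / 2
    u2_obs = sum(ranks2[len(group_a):]) - offset  # 2 x observed U
    lower = upper = total = 0
    for combo in combinations(ranks2, n_b):
        u2 = sum(combo) - offset
        total += 1
        lower += u2 <= u2_obs
        upper += u2 >= u2_obs
    p_value = min(1.0, 2 * min(lower, upper) / total)
    return u2_obs / 2, p_value


# Synthetic State-level violation rates (percent); not the Exhibit 5 data.
in_dataset = [0.01, 0.05, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
not_in_dataset = [0.08]
u, p = mann_whitney_exact(in_dataset, not_in_dataset)
print(u, round(p, 3))  # 2.0 0.545
reject_null = p < 0.1  # decision threshold described in the text
```

Where SciPy is available, scipy.stats.mannwhitneyu(x, y, alternative="two-sided", method="exact") is a library alternative for data without ties.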
Of the 69 chemical and radiological contaminants evaluated, only 10 contaminants had at least
one MCL violation listed in the SDWIS/Fed database for the 2012-2019 period for both groups
(i.e., 59 States that submitted data to the SYR 4 ICR dataset versus the 7 States that did not). As
States are only required to submit MCL violations to SDWIS/Fed but are not otherwise required
to submit compliance monitoring data, only States with at least one violation in SDWIS/Fed for
the specified contaminant were used in this analysis. Therefore, Mann-Whitney U tests were
conducted on only these 10 contaminants (Exhibit 5). The resulting p-values were greater than
0.1 for 9 of the 10 contaminants: arsenic, combined radium, uranium, fluoride, gross-alpha
(excluding radon and uranium), nitrate, nitrite, selenium, and thallium. Thus, for these nine
contaminants, EPA failed to reject the null hypothesis that the two populations of MCL violation
rates are equal. For one
contaminant (chromium), only one State in each group had an MCL violation, and so the Mann-
Whitney U test could not be applied effectively.
5 While the SDWIS/Fed database does not store complete compliance monitoring parametric records, the database
does maintain the most current and complete national and state records of contaminant MCL violations. Annual
MCL compliance data were extracted from SDWIS/Fed by EPA in November 2021.
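The per-State rate that feeds the test, and the restriction to States with at least one violation, might be computed as below. The (state, pwsid) record layout, the AA/BB/CC State codes, and the choice of denominator (a count of systems per State) are assumptions for illustration; the text defines the rate only as the percentage of systems with at least one MCL violation.

```python
from collections import defaultdict
from statistics import median


def state_violation_rates(violations, systems_per_state):
    """Percent of systems in each State with at least one MCL violation.

    violations: iterable of (state, pwsid) records for a single contaminant.
    systems_per_state: assumed denominator, a count of systems per State.
    Only States with at least one violation appear in the result.
    """
    violating = defaultdict(set)
    for state, pwsid in violations:
        violating[state].add(pwsid)  # a system counts once however many violations it has
    return {state: 100.0 * len(pwsids) / systems_per_state[state]
            for state, pwsids in violating.items()}


records = [("AA", "AA0000001"), ("AA", "AA0000001"), ("AA", "AA0000002"),
           ("BB", "BB0000007")]
rates = state_violation_rates(records, {"AA": 100, "BB": 25, "CC": 40})
print(rates)                   # {'AA': 2.0, 'BB': 4.0}
print(median(rates.values()))  # 3.0
```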
Exhibit 5: Mann-Whitney U Test for MCL Violation Rates in States Included in SYR 4 versus States
Not Included

Contaminant Name | Number of States with MCL Violations: States in SYR 4 ICR | Number of States with MCL Violations: States NOT in SYR 4 ICR | Median of State-Level Violation Rates (percent): States in SYR 4 ICR | Median of State-Level Violation Rates (percent): States NOT in SYR 4 ICR | p-Value
Uranium | 26 | 2 | 6.68 | 32.91 | 0.259
Thallium | 7 | 2 | 0.30 | 0.11 | 0.333
Radium-226/228 (combined) | 35 | 4 | 5.98 | 4.01 | 0.460
Selenium | 7 | 1 | 2.21 | 6.79 | 0.500
Arsenic | 43 | 4 | 8.00 | 4.61 | 0.519
Nitrite | 10 | 1 | 0.22 | 0.08 | 0.545
Fluoride | 23 | 3 | 0.82 | 0.23 | 0.648
Nitrate | 35 | 2 | 4.74 | 12.11 | 0.721
Alpha/photon emitters | 29 | 3 | 1.79 | 4.53 | 0.903
Chromium | 1 | 1 | 0.68 | 0.08 | n/a¹
1 The Mann-Whitney test is not appropriate for this small sample size.
To further evaluate the completeness of each State's dataset, EPA used the SDWIS/Fed database
as a reference and compared the number of PWSs by State in the SYR 4 ICR dataset to the
number of systems by State in the SDWIS/Fed database (frozen fourth quarter 2019). Only the
SDWIS/Fed database records from the 59 States that are also in the SYR 4 ICR dataset were
included. Although the system inventories represented in the two data sources are similar, they
are not equivalent. The main difference is that the SYR 4 ICR dataset counts reflect the total
number of active water systems with compliance monitoring data during any of the eight years
represented in the dataset (2012-2019), while the SDWIS/Fed 2019 fourth quarter data freeze
counts reflect the total number of active water systems in a single year (2019). Since systems
open, close, and consolidate over time, the number of systems in each State will understandably
be somewhat different between the two data sources. Population changes in system service areas
over time could also contribute to differences in population served numbers for systems between
the two data sources. Exhibit 6 presents this comparison between the SDWIS/Fed and SYR 4
ICR datasets. If a system had more than one specified population served value in the submitted
data, the most frequently occurring population served value was included in the SYR 4 ICR
dataset.
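Selecting the most frequently occurring population served value is a mode calculation. The sketch below uses collections.Counter; how ties were resolved is not stated in the text, so breaking ties in favor of the first value encountered is an assumption.

```python
from collections import Counter


def most_frequent_population(values):
    """Most common non-missing population served value for one system.

    Ties go to the value encountered first (an assumption; the text does
    not say how ties were broken).
    """
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


print(most_frequent_population([1200, 1250, 1200, None, 1250, 1200]))  # 1200
```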
Exhibit 6 compares the number of systems and population served by these systems in the
December 2019 SDWIS/Fed freeze and the SYR 4 ICR dataset by State. The counts of systems
and population served presented for the SYR 4 ICR dataset include only systems that provided
data for the requested regulated contaminants, including chemicals, radionuclides, microbes, and
DBPs, prior to QA/QC review. The comparison between the counts of systems in the two data
sources indicates a 9 percent difference between the number of systems listed in the December
2019 SDWIS/Fed freeze and the number of systems in the SYR 4 ICR dataset. In
Exhibit 6, positive values for percent difference indicate that more systems are reported in the
SYR 4 ICR dataset, while negative values indicate that more systems are reported in the 2019
SDWIS/Fed freeze. Comparing the number of systems for each State, the absolute percentage
difference between SDWIS/Fed and the SYR 4 ICR dataset ranges from 0 percent (e.g., Region 1
tribes, Region 2 tribes, Region 4 tribes, Navajo Nation, Washington, D.C., Kentucky, and
Hawaii) to 24 percent (Oklahoma) in the number of systems. Based on the population
served by systems, the absolute percentage difference between the total population served by
systems listed in SDWIS/Fed and that listed in the SYR 4 ICR dataset is less than 1 percent.
Comparing population served values for individual States, the absolute percentage difference
between SDWIS/Fed and the SYR 4 ICR dataset ranges from 0 percent (e.g., Region 2 tribes,
Region 4 tribes, and Washington, D.C.) to 30 percent (Utah).
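Footnote 3 of Exhibit 6 defines the percent difference with the SYR 4 ICR value as the divisor, so positive values mean more systems (or more population served) in the SYR 4 ICR dataset. Writing the formula out and applying it to a few rows of Exhibit 6 reproduces the rounded percentages shown there; the function name is illustrative.

```python
def percent_difference(sdwis_fed_count, syr4_count):
    """(SYR 4 - SDWIS/Fed) / SYR 4 x 100, as described in footnote 3 of Exhibit 6."""
    return 100.0 * (syr4_count - sdwis_fed_count) / syr4_count


# Rows taken from Exhibit 6: (label, 2019 SDWIS/Fed freeze, SYR 4 ICR dataset).
checks = [("Oklahoma systems", 1386, 1822),         # 24%
          ("South Carolina systems", 1410, 1169),   # -21%
          ("Utah population", 3327756, 4721824),    # 30%
          ("Total systems", 129873, 142190)]        # 9%
for label, fed, syr4 in checks:
    print(f"{label}: {percent_difference(fed, syr4):.0f}%")
```

Because the divisor is the SYR 4 ICR value rather than the average of the two sources, the measure is not symmetric; swapping the sources would change the magnitudes slightly.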
Exhibit 6: Comparison of the Total Number of Systems and Population Served in
SDWIS/Fed and the SYR 4 ICR Dataset, By State
State
1 2
Total Number of Systems
Population Served
2019
SDWIS/Fed
Freeze
SYR 4 ICR
Dataset
Percent
3
Difference
2019
SDWIS/Fed
Freeze
SYR 4 ICR
Dataset
Percent
3
Difference
Alabama
579
592
2%
5,782,465
5,935,212
3%
Alaska
1,378
1,370
-1%
849,984
851,634
0.2%
American Samoa
111
100
-11%
59,379
58,476
-2%
Arizona
1,526
1,528
0.1%
6,739,728
6,777,613
1%
Arkansas
1,051
1,042
-1%
2,909,279
2,932,762
1%
California
7,498
8,394
11%
40,916,430
41,647,398
2%
Commonwealth of the
Northern Mariana
Islands
70
69
-1%
76,157
74,076
-3%
Connecticut
2,432
2,485
2%
2,877,830
2,882,881
0.2%
Colorado
2,048
2,500
18%
6,745,814
6,397,009
-5%
Delaware
482
521
7%
980,130
1,014,200
3%
Florida
5,241
5,962
12%
20,862,887
20,860,764
0.0%
Hawaii
136
136
0%
1,525,474
1,521,687
-0.2%
Idaho
2,007
1,976
-2%
1,495,882
1,516,508
1%
Illinois
5,353
6,181
13%
12,502,127
12,608,341
1%
Indiana
4,036
4,692
14%
5,512,342
5,658,801
3%
Iowa
1,817
1,982
8%
2,949,070
2,976,894
1%
Kansas
982
979
-0.3%
2,835,829
2,875,770
1%
Kentucky
433
433
0%
4,508,752
4,502,282
-0.1%
Louisiana
1,317
1,486
11%
5,074,387
5,320,364
5%
Maine
1,910
2,209
14%
931,352
968,213
4%
Data Management and QA/QC Process
for the SYR 4 ICR Dataset
5-4
February 2024
-------
State
1 2
Total Number of Systems
Population Served
2019
SDWIS/Fed
Freeze
SYR 4 ICR
Dataset
Percent
3
Difference
2019
SDWIS/Fed
Freeze
SYR 4 ICR
Dataset
Percent
3
Difference
Maryland
3,302
3,337
1%
5,867,239
5,861,767
-0.1%
Massachusetts
1,727
1,759
2%
9,811,383
9,788,373
-0.2%
Minnesota
6,703
6,628
-1%
5,037,593
5,027,228
-0.2%
Missouri
2,761
3,045
9%
5,622,969
5,660,127
1%
Montana
2,196
2,176
-1%
1,067,458
1,063,777
-0.3%
Navajo Nation
171
171
0%
176,792
176,750
0.0%
Nebraska
1,339
1,494
10%
1,660,734
1,681,763
1%
Nevada
601
594
-1%
2,891,787
2,899,400
0.3%
New Hampshire
2,513
2,747
9%
1,218,513
1,256,653
3%
New Jersey
3,625
4,180
13%
9,607,693
9,718,394
1%
New York
8,401
9,454
11%
21,265,451
18,006,468
-18%
North Carolina
5,366
5,946
10%
8,975,117
9,047,042
1%
North Dakota
400
502
20%
709,109
718,937
1%
Ohio
4,418
5,241
16%
10,916,586
11,149,543
2%
Oklahoma
1,386
1,822
24%
3,721,779
3,785,103
2%
Oregon
2,496
2,720
8%
3,748,090
3,784,217
1%
Pennsylvania
8,167
9,968
18%
12,670,902
12,931,009
2%
Region 1 tribes
5
5
0%
75,826
75,845
0.0%
Region 2 tribes
9
9
0%
12,565
12,565
0%
Region 4 tribes
30
30
0%
27,571
27,571
0%
Region 5 tribes
106
123
14%
136,541
149,532
9%
Region 6 tribes
87
92
5%
187,255
194,809
4%
Region 7 tribes
14
15
7%
15,926
15,506
-3%
Region 8 tribes
148
147
-1%
140,568
141,174
0.4%
Region 9 tribes
309
302
-2%
530,167
528,365
-0.3%
Region 10 tribes
134
139
4%
132,798
143,367
7%
Rhode Island
483
479
-1%
1,134,075
1,134,759
0.1%
South Carolina
1,410
1,169
-21%
4,081,703
4,078,161
-0.1%
South Dakota
651
749
13%
839,311
849,252
1%
Tennessee
783
921
15%
7,219,007
7,269,841
1%
Texas
7,040
6,955
-1%
28,945,548
29,290,499
1%
Utah
1,046
1,055
1%
3,327,756
4,721,824
30%
Vermont
1,403
1,539
9%
614,390
628,868
2%
Virginia
2,813
3,218
13%
7,510,864
7,835,414
4%
Washington
4,457
4,386
-2%
8,029,486
8,184,593
2%
Washington, D.C.
6
6
0%
665,602
665,602
0%
West Virginia
857
831
-3%
1,597,832
1,599,584
0%
Wisconsin
11,325
12,835
12%
5,040,624
5,109,898
1%
Wyoming
778
764
-2%
589,509
588,998
-0.1%
Total
129,873
142,190
9%
301,959,417
303,183,463
0.4%
1 The majority of the water systems with data in the SYR 4 ICR dataset are transient non-community water systems (see Exhibit 7).
Because only the nitrate/nitrite regulations require compliance monitoring by these transient systems (see Exhibit 8), data from the
transient systems were included only for the nitrate and nitrite occurrence analyses and were excluded from all occurrence analyses
for IOCs, SOCs, VOCs, and radiological contaminants.
2 The data shown did not undergo QA procedures.
3 The "percent difference" was calculated by subtracting the 2019 SDWIS/Fed Freeze total number of systems (or population served
by systems) from the SYR 4 ICR dataset total number of systems (or population served by systems). That difference was then
divided by the total number of systems (or population served by systems) from the SYR 4 ICR dataset. The percent difference is
less than zero if the SYR 4 ICR dataset indicated a smaller number of systems (or population served by systems).
Exhibit 7 compares the number of systems and the population served by these systems in the
December 2019 SDWIS/Fed freeze and in the SYR 4 ICR dataset, stratified by source water type
and system type. Across all 59 States, the SYR 4 ICR dataset reports 9 percent more systems and
0.4 percent more population served than SDWIS/Fed. For community water systems (CWSs), the
difference is 3 percent based on the number of systems and 1 percent based on the population
served. For non-transient non-community water systems (NTNCWSs), the difference is 8 percent
based on the number of systems and 3 percent based on the population served. For transient
non-community water systems (TNCWSs), the SYR 4 ICR dataset includes 10 percent more systems
but a 9 percent smaller population served (a percent difference of -9 percent). Overall, these
comparisons indicate that the SYR 4 ICR dataset is suitable for use as the basis of national
contaminant occurrence estimates. As stated earlier in this report, the 59 States that provided
data for the SYR 4 ICR dataset comprise 88 percent of all PWSs and 92 percent of the total
population served by PWSs, representing a nationwide distribution of PWSs.
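Footnote 3 of the State-by-State table above defines the percent difference as the SYR 4 ICR value minus the 2019 SDWIS/Fed value, divided by the SYR 4 ICR value. The Python sketch below applies that formula to the system-type totals in Exhibit 7. It is illustrative only: the function and dictionary names are hypothetical, and the numbers are copied directly from the exhibit.

```python
# Sketch of the footnote 3 "percent difference":
# (SYR 4 ICR value - 2019 SDWIS/Fed value) / SYR 4 ICR value, in percent.


def percent_difference(sdwis_2019: float, syr4_icr: float) -> float:
    """Negative when the SYR 4 ICR dataset reports the smaller value."""
    if syr4_icr == 0:
        raise ValueError("the SYR 4 ICR value is the denominator and must be nonzero")
    return (syr4_icr - sdwis_2019) / syr4_icr * 100.0


# Totals copied from Exhibit 7 as (2019 SDWIS/Fed, SYR 4 ICR) pairs.
EXHIBIT_7_TOTALS = {
    "CWS": {"systems": (44_447, 45_792), "population": (284_806_898, 286_736_301)},
    "NTNCWS": {"systems": (15_668, 16_978), "population": (5_999_855, 6_182_326)},
    "TNCWS": {"systems": (69_758, 77_579), "population": (11_152_664, 10_209_528)},
    "All": {"systems": (129_873, 142_190), "population": (301_959_417, 303_183_463)},
}

if __name__ == "__main__":
    for system_type, totals in EXHIBIT_7_TOTALS.items():
        systems_pct = percent_difference(*totals["systems"])
        population_pct = percent_difference(*totals["population"])
        print(f"{system_type}: systems {systems_pct:+.1f}%, population {population_pct:+.1f}%")
```

Rounded to whole percentages, the results are as follows (systems and population served, respectively): 3 and 1 percent for CWSs, 8 and 3 percent for NTNCWSs, and 10 and -9 percent for TNCWSs. The negative TNCWS population value reflects the smaller population served that the SYR 4 ICR dataset reports for transient systems.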
Exhibit 7: Comparison of the Total Number of Systems and Population Served in SDWIS/Fed and the SYR 4 ICR
Dataset, By Source Water Type and System Type
(SDWIS/Fed = 2019 SDWIS/Fed Freeze; SYR 4 = SYR 4 ICR Dataset)
Source Water Type | SDWIS/Fed CWS | SDWIS/Fed NTNCWS | SDWIS/Fed TNCWS | SDWIS/Fed Total | SYR 4 CWS | SYR 4 NTNCWS | SYR 4 TNCWS | SYR 4 Unknown¹ | SYR 4 Total
Number of Systems
Ground Water (GW) | 33,613 | 14,905 | 67,564 | 116,082 | 35,528 | 16,181 | 75,027 | 745 | 127,481
Surface Water (SW) | 10,807 | 755 | 2,172 | 13,734 | 10,145 | 701 | 2,240 | 135 | 13,221
Unknown | 27 | 8 | 22 | 57 | 119 | 96 | 312 | 961 | 1,488
Total | 44,447 | 15,668 | 69,758 | 129,873 | 45,792 | 16,978 | 77,579 | 1,841 | 142,190
Population Served
Ground Water (GW) | 81,806,757 | 4,631,058 | 8,663,270 | 95,101,085 | 107,516,099 | 4,954,238 | 9,600,777 | 49,520 | 122,120,634
Surface Water (SW) | 202,988,465 | 1,363,942 | 2,486,544 | 206,838,951 | 179,187,202 | 1,211,353 | 533,646 | 4,474 | 180,936,675
Unknown | 11,676 | 4,855 | 2,850 | 19,381 | 33,000 | 16,735 | 75,105 | 1,314 | 126,154
Total | 284,806,898 | 5,999,855 | 11,152,664 | 301,959,417 | 286,736,301 | 6,182,326 | 10,209,528 | 55,308 | 303,183,463
1 Systems with unknown system type (i.e., system type not reported by the State) were included in the fourth Six-Year Review analyses.
5.2 Quality Assurance Measures Applied to All Contaminants
Before analyzing contaminant occurrence, EPA performed a rigorous QA/QC evaluation of the
data from each State. When necessary, EPA contacted States, sent detailed flagged record
reports, and asked specific questions about each State's dataset. Question topics included descriptions of
non-intuitive data element names, definitions of field headings, or non-standard codes that were
not described in any documentation files from the State. EPA also confirmed that all of the
requested contaminants were included in each State's dataset. When a State was missing data for
any of the contaminants, EPA asked the State to identify the reason for the omission, such as a
statewide waiver of the requirement to monitor for the contaminant(s). The information provided
by each State was recorded.
Exhibit 8 lists the contaminant groups that each system type is required to monitor. All data that
passed the QA/QC process from these systems were included in the SYR 4 occurrence analyses.
Data from systems that were not required to sample for a given contaminant (e.g., SOC data
from transient systems, radionuclide data from non-community systems) were excluded from the
SYR 4 analyses.
Exhibit 8: Contaminant Group Monitoring Requirements
Contaminant Group | System Types Required to Sample (sample data included in analyses) | System Types Not Required to Sample (sample data excluded from analyses)
Inorganic Contaminants (IOCs) | All non-purchased community water systems and non-transient non-community water systems are required to sample for IOCs. | All purchased systems and transient non-community water systems are not required to sample for IOCs.
Lead and Copper | All (non-purchased and purchased) community water systems and non-transient non-community water systems are required to sample for lead and copper. | Transient non-community water systems are not required to sample for lead and copper.
Nitrate and Nitrite | Non-purchased community water systems, non-transient non-community water systems, and transient non-community water systems are all required to sample for nitrate and nitrite. | All purchased systems are not required to sample for nitrate and nitrite.
Synthetic Organic Contaminants (SOCs) | All non-purchased community water systems and non-transient non-community water systems are required to sample for SOCs. | All purchased systems and transient non-community water systems are not required to sample for SOCs.
Volatile Organic Contaminants (VOCs) | All non-purchased community water systems and non-transient non-community water systems are required to sample for VOCs. | All purchased systems and transient non-community water systems are not required to sample for VOCs.
Radiological Contaminants | All non-purchased community water systems are required to sample for the radionuclides. | All purchased systems, non-purchased non-transient non-community water systems, and non-purchased transient non-community water systems are not required to sample for radionuclides.
Disinfection Byproducts and Disinfectant Residuals | Stage 1 and Stage 2 DBP Rules: All community water systems and non-transient non-community water systems that add a disinfectant other than ultraviolet (UV) light or deliver disinfected water, and transient non-community water systems that add chlorine dioxide. | Community water systems and non-transient non-community water systems that do not add a disinfectant other than UV light, as well as transient non-community water systems that add a disinfectant other than chlorine dioxide.
Microbial Contaminants and Disinfectant Residuals | Groundwater Rule (GWR): The GWR applies to all public water systems that use ground water, including consecutive systems, except that it does not apply to PWSs that combine all of their ground water with surface water or with ground water under the direct influence of surface water prior to treatment. Surface Water Treatment Rules (SWTRs): The SWTRs apply to all public water systems that use surface water or ground water under direct influence of surface water. Revised Total Coliform Rule (RTCR): The RTCR applies to all public water systems. | None.
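The chemical and radiological rows of Exhibit 8 reduce to a lookup: contaminant group, system type and purchased status together determine whether a system's data are kept. The Python sketch below is a hypothetical encoding for illustration. The group labels and the `data_included` function are not part of the SYR 4 ICR database. The DBP and microbial rows are deliberately left out because they depend on treatment and source conditions that these attributes do not capture.

```python
from enum import Enum


class SystemType(Enum):
    CWS = "community"
    NTNCWS = "non-transient non-community"
    TNCWS = "transient non-community"


# Hypothetical encoding of the chemical and radiological rows of Exhibit 8.
# Each value lists the (system type, purchased?) combinations required to sample,
# i.e., the combinations whose sample data stay in the occurrence analyses.
_NON_PURCHASED_CWS_NTNCWS = {(SystemType.CWS, False), (SystemType.NTNCWS, False)}

REQUIRED_TO_SAMPLE = {
    "IOC": _NON_PURCHASED_CWS_NTNCWS,
    "SOC": _NON_PURCHASED_CWS_NTNCWS,
    "VOC": _NON_PURCHASED_CWS_NTNCWS,
    "LEAD_COPPER": {
        (SystemType.CWS, False), (SystemType.CWS, True),
        (SystemType.NTNCWS, False), (SystemType.NTNCWS, True),
    },
    "NITRATE_NITRITE": _NON_PURCHASED_CWS_NTNCWS | {(SystemType.TNCWS, False)},
    "RADIOLOGICAL": {(SystemType.CWS, False)},
}


def data_included(group: str, system_type: SystemType, purchased: bool) -> bool:
    """True if this system type must sample for the group, so its results are kept."""
    return (system_type, purchased) in REQUIRED_TO_SAMPLE[group]
```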
EPA created several automated data QA checks within the SYR 4 ICR dataset. These QA checks
identified (i.e., flagged) records of potential data quality concern. EPA sent out a detailed
flagged record report to each State describing the identified records. These reports included the
counts of flagged records by category, as well as specific questions for each category. In
addition, an attachment identified the specific records that were flagged. EPA requested that each
State provide the appropriate disposition (e.g., delete, make corrections) of these flagged records.
EPA documented all changes made to the compliance monitoring data and suggested to the
States that they make corrections in their data system as well, if appropriate. To resolve data
quality issues that required significant corrections, such as identifying outliers or identifying and
changing incorrect units, consultations with state data management staff were conducted or
attempted before data corrections were completed.
Sections 5.2 through 5.5 provide a description of the various QA measures applied to the SYR 4
dataset to identify records of potential data quality concern. For all flagged records, State
input was the first consideration in deciding whether to include a record in, or exclude it from,
the analysis. When States did not provide a response or action, EPA used best professional
judgement to decide whether to include or exclude the data in question. When EPA decided to
exclude records from the occurrence analyses, it added a code to the transaction table in the
database. This code could be changed if EPA were to revise its decision about excluding
particular records from the occurrence analyses.
Sections 5.2.1 through 5.2.5 describe the QA measures that were applied to the entire
database (i.e., all regulated contaminant monitoring data). Exhibit 9 provides a visual
representation of the overall flow of the QA/QC process for QA measures applied to all SYR 4
contaminants. Additional QA/QC measures applied to specified groups of contaminants are
included in Section 5.3 (chemicals and radionuclides), Section 5.4 (DBPs and related
parameters), and Section 5.5 (microbes and residuals). Additional QA/QC measures were also
taken to identify and exclude fluoride samples from fluoridated water systems prior to the
occurrence analysis. See "Review of Fluoride Occurrence for the Fourth Six-Year Review"
(USEPA, 2024c) for more information on additional QA/QC measures for fluoride data.
Exhibit 9: Flow Chart of QA Measures Applied to All SYR 4 Contaminants
Step 1: Is the record from a non-public water system? Yes: exclude from analysis. No: go to Step 2.
Step 2: Is the record from a system with missing inventory information (e.g., source water type and population served information)? Yes: exclude from analysis. No: go to Step 3.
Step 3: Is the record from outside of the SYR 4 date range (2012-2019)? Yes: exclude from analysis. No: go to Step 4.
Step 4: Is the record marked as being "not for compliance"? Yes: exclude from analysis. No: move on to the next phase of QA review.
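The four questions in Exhibit 9 are asked in a fixed order, and the first "yes" answer removes the record. The Python sketch below shows that screening logic. The `Record` fields and the return strings are hypothetical stand-ins, not the SYR 4 ICR database schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


# Hypothetical record layout for illustration; not the SYR 4 ICR database schema.
@dataclass
class Record:
    is_public_water_system: bool
    source_water_type: Optional[str]   # None if still missing after SDWIS/Fed gap filling
    population_served: Optional[int]   # None if still missing after SDWIS/Fed gap filling
    sample_date: date
    for_compliance: bool


SYR4_START = date(2012, 1, 1)
SYR4_END = date(2019, 12, 31)


def screen(record: Record) -> str:
    """Ask the Exhibit 9 questions in order; the first 'yes' excludes the record."""
    if not record.is_public_water_system:
        return "exclude: non-public water system"
    if record.source_water_type is None or record.population_served is None:
        return "exclude: missing inventory information"
    if not SYR4_START <= record.sample_date <= SYR4_END:
        return "exclude: outside SYR 4 date range"
    if not record.for_compliance:
        return "exclude: not for compliance"
    return "next phase"
```

Section 5.2.4 describes three States that made no compliance designation. For their records, `for_compliance` would simply be set to `True`, which mirrors EPA's assumption for those States.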
5.2.1 Non-Public Water Systems
Some States require water systems that do not meet the criteria to be classified as a PWS to
submit sample results that are "routine" or "for compliance." The State's information system
usually identifies these water systems as "non-public" or uses another method to differentiate
them from PWSs. All records from non-public water systems were excluded from the occurrence
analysis. The records that were included in the occurrence analysis were from systems that
are PWSs by definition or that identify as PWSs (e.g., wholesale systems).
5.2.2 Systems with Missing Inventory Data
For some of the non-SDWIS States, there were systems for which the inventory information was
missing (e.g., no source water type, no population served). When inventory data were incomplete
or missing, the missing data were populated from the December 2019 (fourth quarter) SDWIS/Fed
freeze. All cases where SDWIS/Fed data were used to populate inventory data fields in
the State's dataset were documented. The inventory information for a given system may differ
over time, so the SDWIS/Fed data may not fully match the actual inventory information at the
time of sampling. All records from systems whose inventory data were still missing after filling
gaps with SDWIS/Fed were excluded from the occurrence analysis.
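The gap-filling rule above has three parts: copy missing inventory fields from the SDWIS/Fed freeze, log every fill, and exclude systems that remain incomplete. The Python sketch below illustrates it. The dictionaries keyed by PWSID, the field names and the PWSIDs in the example are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

INVENTORY_FIELDS = ("source_water_type", "population_served")


def backfill_inventory(
    state_inventory: Dict[str, dict],
    sdwis_fed_2019q4: Dict[str, dict],
) -> Tuple[Dict[str, dict], List[Tuple[str, str, object]], List[str]]:
    """Fill missing inventory fields from the SDWIS/Fed freeze.

    Returns the filled inventory, a log of every value copied from SDWIS/Fed,
    and the PWSIDs that are still incomplete (their records would be excluded).
    """
    filled: Dict[str, dict] = {}
    fill_log: List[Tuple[str, str, object]] = []
    still_missing: List[str] = []
    for pwsid, inventory in state_inventory.items():
        inventory = dict(inventory)  # work on a copy; leave the State's data untouched
        fallback = sdwis_fed_2019q4.get(pwsid, {})
        for field in INVENTORY_FIELDS:
            if inventory.get(field) is None and fallback.get(field) is not None:
                inventory[field] = fallback[field]
                fill_log.append((pwsid, field, fallback[field]))
        if any(inventory.get(field) is None for field in INVENTORY_FIELDS):
            still_missing.append(pwsid)
        filled[pwsid] = inventory
    return filled, fill_log, still_missing
```

A State-reported value is never overwritten. SDWIS/Fed data only fill fields that are empty, which matches the caveat that the freeze may not reflect inventory at the time of sampling.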
5.2.3 Sample Results Collected Outside of the Date Range
The SYR 4 ICR requested compliance monitoring data and treatment technique information from
January 1, 2012 through December 31, 2019. The extraction tool only pulled sample results from
this time period. However, some non-SDWIS States submitted sample results from outside of
this date range; all sample results collected outside of the date range were excluded from the
occurrence analysis.
5.2.4 Non-Compliance
In some cases, water systems may submit sample results that are not used to determine
compliance with NPDWRs. States that use information systems with automated compliance
determination functions often use an indicator, such as the "compliance purpose indicator code" or
something similar, to differentiate these sample results. While the extraction tool only pulled
compliance sample results, some non-compliance sample results were present in data from the
non-SDWIS States. There were a few non-SDWIS States for which EPA asked for more details
on how to accurately identify the sample results that were for compliance. Three non-SDWIS
States (California, Colorado, and Minnesota) did not make a designation as to whether their data
were for compliance. For all occurrence analyses, EPA assumed that all data from these three
States were for compliance. All sample results flagged as "not for compliance" were excluded
from the occurrence analysis.
5.2.5 Uniform System Inventory Information
For analysis, each system must have a single source water type and population-served
designation so that it falls in a unique source water type/population size stratum. Systems
using both ground water and surface water, as well as systems using ground water under the direct
influence of surface water, were classified as surface water systems in the occurrence
analyses. This approach may underestimate the number of ground water systems and overestimate
the number of surface water systems. Systems with more than one reported value of population
served were assigned the population-served value that occurred most frequently across the years
of data collected.
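The two Section 5.2.5 rules can be sketched in Python as follows. The sketch assumes SDWIS-style source codes: `GW`, `SW`, and `GU` for ground water under the direct influence of surface water. The function names are hypothetical. The source does not say how ties in population served were broken, so this sketch simply takes the value encountered first, which is how `collections.Counter.most_common` orders equal counts.

```python
from collections import Counter
from typing import Iterable, List

# Assumed SDWIS-style codes: SW = surface water, GU = ground water under the
# direct influence of surface water. Either one makes the system "SW" here.
SURFACE_CODES = {"SW", "GU"}


def assign_source_water_type(source_codes: Iterable[str]) -> str:
    """Collapse a system's sources to one type: any SW or GU use -> 'SW', else 'GW'."""
    codes = set(source_codes)
    if codes & SURFACE_CODES:
        return "SW"
    if "GW" in codes:
        return "GW"
    return "Unknown"


def assign_population_served(values_by_year: List[int]) -> int:
    """Most frequent population-served value; ties go to the value seen first."""
    return Counter(values_by_year).most_common(1)[0][0]
```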
5.3 Quality Assurance Measures Applied to Chemicals and Radionuclides
In addition to the QA measures described in Section 5.2, there were several other QA measures
applied only to the chemical contaminants and radionuclides. Those QA measures are described
in Sections 5.3.1 through 5.3.10. Additional QA measures are shown in Exhibit 10.
Exhibit 10. Flow Chart of Additional QA Measures Specific to Chemicals,
Radionuclides, and Lead and Copper
Exhibit 11 documents the specific counts of records included and excluded in each QA step.
After applying the various QA measures to nearly 26 million SYR 4 ICR records for the
Chemical Phase, Radionuclides, and Lead and Copper Rules' contaminants, 95 percent of the
records remained in the final dataset. Most of the records were removed in either Step 9 (removal
of records from transient water systems for contaminants for which transient water systems are
not required to sample) or Step 11 (removal of records from consecutive water systems, which are
not required to sample for the Chemical Phase or Radionuclides Rules' contaminants).
Exhibit 11: Summary of the Count of Sample Analytical Results Removed via the
QA Measures Applied to Chemical Phase, Radionuclides and Lead and Copper
Rules' Contaminants
QA Step | Count of Records Included | Count of Records Excluded
Original number of analytical sample results¹ | 25,756,988 | -
Step 1: Removal of analytical sample results from non-public water systems | 25,752,276 | 4,712
Step 2: Removal of data from systems with missing source water type and/or population served information | 25,712,838 | 39,438
Step 3: Removal of data with a sample collection date outside the SYR 4 date range of 2012-2019 | 25,637,677 | 75,161
Step 4: Removal of data marked as being "not for compliance" | 25,567,220 | 70,457
Step 5: Removal of records marked with a sample type code other than routine or confirmation | 25,455,914 | 111,306
Step 6: Removal of records marked as potential duplicates, along with a State response saying that one set of the duplicate results should be excluded | 25,448,501 | 7,413
Step 7: Removal of data with detected concentrations with a non-standard or blank unit of measure for the contaminant | 25,448,171 | 330
Step 8: Removal of detected concentrations identified as potential high or low outliers | 25,435,824 | 12,347
Step 9: Removal of records from transient water systems for contaminants for which transients are not required to sample | 25,086,334 | 349,490
Step 10: Removal of records from non-transient non-community water systems for radionuclides | 25,070,331 | 16,003
Step 11: Removal of records from consecutive water systems | 24,625,831 | 444,500
Step 12: Removal of raw water records where less than half the facility's records are raw | 24,611,906 | 13,925
Step 13: Other flags (e.g., State responded that nitrate/nitrite records had been incorrectly entered, State included rows of data with no concentration value or detect/non-detect identifier) | 24,596,843 | 15,063
Final number of records | 24,596,843 | -
Percent Included | 95% | -
1 The following 72 analytes are represented in the counts above: lead, copper, arsenic, barium, cadmium, chromium, cyanide,
fluoride, mercury, nitrate-nitrite, nitrate, nitrite, selenium, antimony (total), beryllium (total), thallium (total), asbestos, endrin,
bhc-gamma, methoxychlor, toxaphene, dalapon, diquat, endothall, glyphosate, di(2-ethylhexyl) adipate, oxamyl, simazine,
di(2-ethylhexyl) phthalate, picloram, dinoseb, hexachlorocyclopentadiene, carbofuran, atrazine, alachlor (lasso), 2,3,7,8-tcdd,
heptachlor, heptachlor epoxide, 2,4-d, 2,4,5-tp, hexachlorobenzene, benzo(a)pyrene, pentachlorophenol, 1,2,4-trichlorobenzene,
cis-1,2-dichloroethylene, total polychlorinated biphenyls (PCBs), 1,2-dibromo-3-chloropropane, ethylene dibromide, xylenes (total),
chlordane, dichloromethane, o-dichlorobenzene, p-dichlorobenzene, vinyl chloride, 1,1-dichloroethylene, trans-1,2-dichloroethylene,
1,2-dichloroethane, 1,1,1-trichloroethane, carbon tetrachloride, 1,2-dichloropropane, trichloroethylene, 1,1,2-trichloroethane,
tetrachloroethylene, chlorobenzene, benzene, toluene, ethylbenzene, styrene, gross alpha (excluding radon and uranium), combined
uranium, combined radium (-226 and -228), and gross beta particle activity.
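Each step in Exhibit 11 removes records from the set left by the previous step. The "Included" count after any step should therefore equal the original count minus the cumulative exclusions. The short Python check below recomputes the running counts from the exhibit's own numbers. It is an arithmetic check only and is not part of EPA's QA procedure.

```python
ORIGINAL_RECORDS = 25_756_988

# Records excluded at each QA step, copied from Exhibit 11.
EXCLUDED_BY_STEP = {
    1: 4_712, 2: 39_438, 3: 75_161, 4: 70_457, 5: 111_306, 6: 7_413, 7: 330,
    8: 12_347, 9: 349_490, 10: 16_003, 11: 444_500, 12: 13_925, 13: 15_063,
}


def remaining_after_each_step(original: int, excluded_by_step: dict) -> dict:
    """Running count of included records after each step, applied in numeric order."""
    remaining = original
    included = {}
    for step in sorted(excluded_by_step):
        remaining -= excluded_by_step[step]
        included[step] = remaining
    return included


included = remaining_after_each_step(ORIGINAL_RECORDS, EXCLUDED_BY_STEP)
final_count = included[13]
percent_included = 100 * final_count / ORIGINAL_RECORDS
total_excluded = ORIGINAL_RECORDS - final_count
share_steps_9_and_11 = (EXCLUDED_BY_STEP[9] + EXCLUDED_BY_STEP[11]) / total_excluded
print(f"final: {final_count:,} records, {percent_included:.1f}% included")
print(f"steps 9 and 11 account for {share_steps_9_and_11:.0%} of all exclusions")
```

The running counts reproduce every "Included" value in Exhibit 11. The final share is about 95.5 percent, which rounds to the 95 percent shown in the table. Steps 9 and 11 together account for roughly two-thirds of all excluded records.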
5.3.1 Non-Routine
Some States have regulations that are more stringent than the NPDWRs and require water
systems to submit more sample results than federally required. States also may require
laboratories to report all sample results from water systems including results from contaminants
that are not regulated. Usually, non-routine sample results that are specifically listed as "special
request" in the database are also identified as being "non-compliance" samples. Most other types
of non-routine sample results, such as confirmation, repeat, or maximum residence time sample
results are "for compliance." While the extraction tool excluded sample results that were "not for
compliance," some "special" sample results that were marked as being "for compliance" were
included in the data extracted from SDWIS States. In addition, "non-routine/not for compliance"
results were present in data from the non-SDWIS States. All results that were marked as routine
(RT) or confirmation (CO) were included in the occurrence analyses for the Chemical Phase
Rules (i.e., contaminants evaluated in USEPA (2024a)); all other sample results for those
contaminants were considered "non-routine" and were excluded from the occurrence analyses.
5.3.2 Duplicate Records
Potential duplicate sample analytical results for chemical contaminants and radionuclides were
identified as all detection records with the same PWSID, sample point ID, analyte, sample
collection date, and concentration. All records identified as potential duplicates were retained in
the occurrence analysis unless the State responded to indicate that records were indeed duplicates
and should be excluded.
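The duplicate definition above is a grouping of detection records on five fields. The Python sketch below uses hypothetical field names and does two things. It flags candidate groups, and it drops the extra copies only for groups the State confirmed. This matches the rule that unconfirmed potential duplicates were retained.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Key = Tuple[str, str, str, str, float]  # PWSID, sample point ID, analyte, date, concentration


def find_potential_duplicates(records: List[dict]) -> Dict[Key, List[int]]:
    """Group detection records that share all five identifying fields."""
    groups: Dict[Key, List[int]] = defaultdict(list)
    for index, rec in enumerate(records):
        if not rec["detected"]:
            continue  # only detection records are screened for duplicates
        key = (rec["pwsid"], rec["sample_point_id"], rec["analyte"],
               rec["sample_date"], rec["concentration"])
        groups[key].append(index)
    return {key: idxs for key, idxs in groups.items() if len(idxs) > 1}


def drop_state_confirmed(records: List[dict], duplicate_groups: Dict[Key, List[int]],
                         confirmed: Set[Key]) -> List[dict]:
    """Keep everything unless the State confirmed the group; then keep one copy."""
    drop = set()
    for key, idxs in duplicate_groups.items():
        if key in confirmed:
            drop.update(idxs[1:])
    return [rec for index, rec in enumerate(records) if index not in drop]
```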
5.3.3 Units of Measure
EPA identified all detection records for the Chemical Phase and Radionuclides Rules'
contaminants where the units of measure reported were not one of the standard units used for the
particular contaminant (i.e., not mg/L, µg/L, MFL (million fibers per liter), or pCi/L). For
example, a benzene record with a unit of measure listed as NTU would be flagged since NTU is
the unit of measure specifically for turbidity. EPA excluded all records in non-standard units
from the occurrence analyses unless there was strong evidence of the correct standard unit (e.g.,
state response indicating the correct unit of measure, obvious data entry error, concentration is
within the range of standard units and all other records from the State are reported in the standard
units).
5.3.4 Potential Outliers
To identify potential high outliers, EPA flagged all detected concentrations for the Chemical
Phase and Radionuclides Rules' contaminants that were greater than 4 times the contaminant's
MCL and all detected concentrations that were greater than 10 times the contaminant's MCL. All
detected concentrations greater than 10 times the MCL were also included in the set of detected
concentrations that were greater than 4 times the MCL. To identify potential low outliers, EPA
flagged all detected concentrations that were less than one-tenth the minimum MDL. Exhibit 12
provides a list of all relevant MCL and MDL values for these contaminants.
EPA included questions to each State about these potential high and low outliers in its
flagged record report. Any changes suggested by the States were implemented for these records.
For example, some States wrote back to say there were "no errors" in their high detected
concentrations or that they had "no reason or evidence to show these data to be invalid." Other
States explained that "all of the high results were due to using mg/L when they should have been
µg/L." For the States that did not respond, all detected concentrations greater than 100 times the
contaminant's MCL were excluded from the analysis, as were all detected concentrations less
than one-hundredth of the contaminant's minimum MDL. All other potential high outliers (less than
or equal to 100 times the MCL) and potential low outliers (greater than or equal to one-hundredth
of the minimum MDL) were included in the analysis. The values of 100 times the MCL and
one-hundredth of the minimum MDL were chosen as conservative high-end and low-end cut-offs,
respectively. For example, a benzene detected concentration of 1,600 µg/L (320 times the 5 µg/L
MCL) was excluded as a likely data entry error. Likewise, a thallium record with a detected
concentration of 0.00254 µg/L (below one-hundredth of the 0.3 µg/L minimum MDL) was excluded.
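The screening flags and the fallback cut-offs described above can be written as two small functions. This Python sketch is illustrative, and its names are hypothetical. Concentrations, MCLs and MDLs must be in the same unit, and most values in Exhibit 12 are in µg/L. The low-outlier test is skipped when no MDL is listed, as for the radionuclides.

```python
from typing import List, Optional


def outlier_flags(conc: float, mcl: float, min_mdl: Optional[float]) -> List[str]:
    """Screening flags: > 4x MCL, > 10x MCL, and < one-tenth of the minimum MDL."""
    flags = []
    if conc > 4 * mcl:
        flags.append("gt_4x_mcl")
    if conc > 10 * mcl:
        flags.append("gt_10x_mcl")
    if min_mdl is not None and conc < min_mdl / 10:
        flags.append("lt_0.1x_min_mdl")
    return flags


def kept_without_state_response(conc: float, mcl: float, min_mdl: Optional[float]) -> bool:
    """Fallback when a State did not respond: drop > 100x MCL or < 0.01x minimum MDL."""
    if conc > 100 * mcl:
        return False
    if min_mdl is not None and conc < min_mdl / 100:
        return False
    return True
```

Both examples from the text are excluded under the fallback rule. Benzene at 1,600 µg/L exceeds 100 times its 5 µg/L MCL. Thallium at 0.00254 µg/L falls below one-hundredth of its 0.3 µg/L minimum MDL. A value flagged only at 4 times the MCL is still kept.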
Exhibit 12: List of Contaminant MCL and MDL Values
Contaminant | Maximum Contaminant Level (MCL) Value | MCL Unit of Measure | Method Detection Limit (MDL) Value | MDL Unit of Measure
Inorganic Contaminants
Antimony | 6 | µg/L | 0.4 | µg/L
Arsenic | 10 | µg/L | 0.5 | µg/L
Asbestos | 7 | MFL | - | MFL
Barium | 2,000 | µg/L | 0.8 | µg/L
Beryllium | 4 | µg/L | 0.2 | µg/L
Cadmium | 5 | µg/L | 0.05 | µg/L
Chromium (Total) | 100 | µg/L | 0.08 | µg/L
Copper | AL¹ = 1,300 | µg/L | 0.5 | µg/L
Cyanide | 200 | µg/L | 5 | µg/L
Fluoride | 4,000 | µg/L | 0.01 | µg/L
Lead | AL¹ = 15 | µg/L | 0.6 | µg/L
Mercury (Inorganic) | 2 | µg/L | 0.2 | µg/L
Nitrate (as N) | 10,000 | µg/L | 0.002 | µg/L
Nitrite (as N) | 1,000 | µg/L | 0.004 | µg/L
Selenium | 50 | µg/L | 0.6 | µg/L
Thallium | 2 | µg/L | 0.3 | µg/L
Synthetic Organic Contaminants
Alachlor | 2 | µg/L | 0.009 | µg/L
Atrazine | 3 | µg/L | 0.003 | µg/L
Benzo(a)pyrene | 0.2 | µg/L | 0.016 | µg/L
Carbofuran | 40 | µg/L | 0.52 | µg/L
Chlordane | 2 | µg/L | 0.001 | µg/L
Dalapon | 200 | µg/L | 0.054 | µg/L
Di(2-ethylhexyl)adipate (DEHA) | 400 | µg/L | 0.09 | µg/L
Di(2-ethylhexyl)phthalate (DEHP) | 6 | µg/L | 0.46 | µg/L
1,2-Dibromo-3-chloropropane (DBCP) | 0.2 | µg/L | 0.009 | µg/L
2,4-Dichlorophenoxyacetic acid | 70 | µg/L | 0.055 | µg/L
Dinoseb | 7 | µg/L | 0.166 | µg/L
Diquat | 20 | µg/L | 0.72 | µg/L
Endothall | 100 | µg/L | 0.7 | µg/L
Endrin | 2 | µg/L | 0.002 | µg/L
Ethylene Dibromide (EDB) | 0.05 | µg/L | 0.008 | µg/L
Glyphosate | 700 | µg/L | 6 | µg/L
Heptachlor | 0.4 | µg/L | 0.0015 | µg/L
Heptachlor Epoxide | 0.2 | µg/L | 0.001 | µg/L
Hexachlorobenzene | 1 | µg/L | 0.001 | µg/L
Hexachlorocyclopentadiene | 50 | µg/L | 0.004 | µg/L
Lindane (gamma-Hexachlorocyclohexane) | 0.2 | µg/L | 0.003 | µg/L
Methoxychlor | 40 | µg/L | 0.003 | µg/L
Oxamyl (Vydate) | 200 | µg/L | 0.86 | µg/L
Pentachlorophenol | 1 | µg/L | 0.014 | µg/L
Picloram | 500 | µg/L | 0.05 | µg/L
Polychlorinated biphenyls (PCBs) | 0.5 | µg/L | 0.039 | µg/L
Simazine | 4 | µg/L | 0.008 | µg/L
Toxaphene | 3 | µg/L | 0.13 | µg/L
2,3,7,8-TCDD (Dioxin) | 0.00003 | µg/L | 0.0000044 | µg/L
2,4,5-Trichlorophenoxypropionic Acid (Silvex) | 50 | µg/L | 0.033 | µg/L
Volatile Organic Contaminants
Benzene | 5 | µg/L | 0.1 | µg/L
Carbon Tetrachloride | 5 | µg/L | 0.002 | µg/L
1,2-Dichlorobenzene | 600 | µg/L | 0.02 | µg/L
1,4-Dichlorobenzene | 75 | µg/L | 0.01 | µg/L
1,2-Dichloroethane | 5 | µg/L | 0.02 | µg/L
1,1-Dichloroethylene | 7 | µg/L | 0.05 | µg/L
cis-1,2-Dichloroethylene | 70 | µg/L | 0.02 | µg/L
trans-1,2-Dichloroethylene | 100 | µg/L | 0.03 | µg/L
Dichloromethane | 5 | µg/L | 0.02 | µg/L
1,2-Dichloropropane | 5 | µg/L | 0.01 | µg/L
Ethylbenzene | 700 | µg/L | 0.01 | µg/L
Monochlorobenzene | 100 | µg/L | 0.01 | µg/L
Styrene | 100 | µg/L | 0.01 | µg/L
Tetrachloroethylene | 5 | µg/L | 0.002 | µg/L
Toluene | 1,000 | µg/L | 0.01 | µg/L
1,2,4-Trichlorobenzene | 70 | µg/L | 0.02 | µg/L
1,1,1-Trichloroethane | 200 | µg/L | 0.005 | µg/L
1,1,2-Trichloroethane | 5 | µg/L | 0.01 | µg/L
Trichloroethylene | 5 | µg/L | 0.002 | µg/L
Vinyl Chloride | 2 | µg/L | 0.01 | µg/L
Xylenes (Total) | 10,000 | µg/L | 0.01 | µg/L
Radiological Contaminants
Alpha Particles | 15 | pCi/L | - | -
Beta Particles² | 50 | pCi/L | - | -
Combined Radium-226 & -228 | 5 | pCi/L | - | -
Uranium | 30 | µg/L | - | -
1 AL = Action Level
2 The analyses presented here are based on compliance monitoring data represented in units of pCi/L and are conducted relative to
the screening threshold of 50 pCi/L.
5.3.5 Transient Water Systems
Transient non-community water systems (TNCWSs) serve an average of at least 25 people per day
for at least 60 days per year but do not regularly serve the same 25 people for at least six months
per year. With regard to the Chemical Phase and Radionuclides Rules, transient water systems are
only required to submit nitrate, nitrite, or total nitrate/nitrite sample results collected from entry
points. Unless a State responded that the system in question had been a CWS or NTNCWS at the
time of sampling (and thus the records should be included), all data from transient water systems
were excluded from the occurrence analyses presented in USEPA (2024a), except for nitrate,
nitrite, or total nitrate/nitrite results, which TNCWSs are required to monitor.
5.3.6 Non-Community Water Systems (Radionuclides Only)
Transient non-community water systems and non-transient non-community water systems are
not required to submit radiological sample results. All data from non-community water systems
were excluded from the occurrence analyses for the radionuclides.
5.3.7 Source Water Type Adjustment
As explained in Section 5.2.5, each system is defined with a single source water type and
population-served category. For the Chemical Phase and Radionuclides Rules analyses, an
adjustment to the source water type was necessary for a select group of systems whose water
came from a mix of consecutive connections and their own sources. Specifically, these were
systems that do not have their own surface intake or other SW facilities but do purchase some
SW, in addition to using their own GW wells. In these cases, because the system does include
some purchased surface water (SWP) sources, the federal source water type is listed as SWP in
SDWIS/Fed and in the States' compliance monitoring data. This is the case even if the system
only purchases a small portion of its water and the rest of its water comes from GW wells. To
capture the legitimate (and required) compliance monitoring data from purchased systems (e.g.,
SWP, GWP) with their own GW wells, EPA reclassified the source water type of these systems
prior to occurrence and preliminary exposure analyses. To identify purchased systems with their
own GW wells, EPA reviewed all non-emergency, active facilities within a system. When active
facilities with GW wells were identified, the system's source water type code was updated to
"GW" in the SYR 4 ICR database. When all active, non-emergency facilities were classified as
purchased sources according to the SDWIS/Fed database (fourth quarter 2019 freeze), the system
was designated as a consecutive system (see Section 5.3.8).
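The reclassification above depends only on two things: the system's federal source water type and its active, non-emergency facilities. The Python sketch below illustrates it. The purchased-source codes (`SWP`, `GWP`, `GUP`) are assumed SDWIS-style codes. The `Facility` fields are illustrative and are not actual database columns.

```python
from dataclasses import dataclass
from typing import List

# Assumed SDWIS-style purchased source codes (purchased SW, GW, and GW under the
# direct influence of SW).
PURCHASED_CODES = {"SWP", "GWP", "GUP"}


@dataclass
class Facility:
    source_code: str   # e.g., "GW" for a system-owned well, "SWP" for a purchased connection
    active: bool
    emergency: bool


def reclassify_source(federal_source_type: str, facilities: List[Facility]) -> str:
    """Return 'GW', 'consecutive', or the unchanged federal source water type."""
    if federal_source_type not in PURCHASED_CODES:
        return federal_source_type  # only purchased-type systems are reviewed
    in_use = [f for f in facilities if f.active and not f.emergency]
    if any(f.source_code == "GW" for f in in_use):
        return "GW"  # purchases some water but also has its own active GW wells
    if in_use and all(f.source_code in PURCHASED_CODES for f in in_use):
        return "consecutive"  # buys all of its water; see Section 5.3.8
    return federal_source_type
```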
5.3.8 Consecutive Water Systems
Consecutive water systems purchase 100 percent of their water from another water system(s).
These systems do not have sources that require entry point monitoring for the Chemical Phase or
Radionuclides Rules except for lead and copper. Analytical records from consecutive systems
were excluded from the occurrence analyses for chemicals and radionuclides presented in
USEPA (2024a) because this monitoring was not required for compliance. Population-served
values and occurrence estimates in USEPA (2024a) were generated using the adjusted total
populations served. Section 5.3.7 describes the process of identifying consecutive systems, and
Section 6.2 discusses the adjustments of the population served to account for consecutive
systems.
5.3.9 Samples from Source/Raw Water
EPA investigated source water samples (i.e., raw water samples) in some cases. In some States,
systems are allowed to monitor raw water before treatment, rather than finished drinking water.
If a contaminant is detected in a raw water sample at or above a level specified by the State, the
system is required to collect a follow-up sample at the entry point to the distribution system,
unless the water is not treated. EPA reviewed the raw (i.e., untreated, unfinished) samples related
to the contaminants regulated under the Chemical Phase and Radionuclides Rules. EPA reviewed
data at the facility level (e.g., GW well, treatment plant) and excluded raw water records from
the analysis if raw water records comprised less than 50 percent of the overall number of records
for the facility. EPA assumed that non-compliance source water samples had been incidentally
included in the ICR reporting when they comprised less than half of the monitoring records for a
given facility. When source water samples represented more than 50 percent of a facility's
samples, EPA assumed that source water samples were intended for compliance.
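The facility-level raw water rule can be sketched as below. The record keys `facility_id` and `is_raw` are hypothetical. The text does not say how a facility with exactly 50 percent raw water records was handled; this sketch keeps those records, which is an assumption.

```python
from collections import defaultdict


def filter_raw_water(records: list) -> list:
    """Drop raw water records from facilities where they are a minority of records.

    Each record is assumed to be a dict with hypothetical keys 'facility_id' and 'is_raw'.
    """
    totals = defaultdict(int)
    raw_counts = defaultdict(int)
    for rec in records:
        totals[rec["facility_id"]] += 1
        if rec["is_raw"]:
            raw_counts[rec["facility_id"]] += 1

    kept = []
    for rec in records:
        fid = rec["facility_id"]
        raw_share = raw_counts[fid] / totals[fid]
        # Raw records below a 50 percent share are treated as incidental, non-compliance samples.
        if rec["is_raw"] and raw_share < 0.5:
            continue
        kept.append(rec)
    return kept
```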
5.3.10 Mismatched Nitrate and Nitrite Data
In some cases, data appeared to be mismatched for nitrate and nitrite. EPA reviewed data for
instances where a nitrate and a nitrite result were reported as having an identical analytical result
in the same water system on the same date and took corrective actions such as removing such
data from the analysis or determining that the intent had been to report a single total nitrate plus
nitrite result. EPA also evaluated cases where it was likely that nitrate and nitrite results were
reversed and corrected them based on State responses, when available.
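The screening step for identical nitrate and nitrite results can be sketched as follows. It only flags candidate system-dates for review; deciding whether to remove the data or treat it as a single total nitrate plus nitrite result remained a case-by-case decision. The analyte labels and record keys are hypothetical.

```python
from collections import defaultdict


def flag_identical_nitrate_nitrite(records: list) -> set:
    """Return (pwsid, sample_date) pairs where a nitrate and a nitrite result are identical.

    Records are dicts with hypothetical keys 'pwsid', 'sample_date',
    'analyte' ('NITRATE' or 'NITRITE'), and 'result'.
    """
    by_key = defaultdict(lambda: defaultdict(set))
    for rec in records:
        if rec["analyte"] in ("NITRATE", "NITRITE"):
            by_key[(rec["pwsid"], rec["sample_date"])][rec["analyte"]].add(rec["result"])

    flagged = set()
    for key, results in by_key.items():
        # A shared value for the same system and date suggests a mismatch to review.
        if results["NITRATE"] & results["NITRITE"]:
            flagged.add(key)
    return flagged
```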
5.4 Quality Assurance Measures Applied to DBPs and Related Parameters
In addition to the QA measures described in Section 5.2 that were applied to all contaminants,
several additional contaminant-specific QA measures were applied to DBP data. For this reason,
the QA measures applied to DBP data differ from those applied to chemical, radionuclide, and
microbial contaminant data. The QA measures applied to DBPs and DBP-related parameters are
described in this section. Exhibit 13 presents a flow chart of these additional QA measures for
DBPs and DBP-related parameters.
Exhibit 13. Flow Chart of Additional QA Measures Specific to DBPs and DBP-
Related Parameters
Exhibit 14 documents the specific counts of DBP records included and excluded in each QA
step. After applying the various QA measures to nearly 12 million SYR 4 ICR records for DBPs
and DBP-related parameters, 96 percent of the records from 58 States remained in the final
dataset. Exhibit 14 includes records for the following DBP contaminants: total trihalomethanes
(TTHM), bromoform, chloroform, dibromochloromethane, bromodichloromethane, five
haloacetic acids (HAA5), dibromoacetic acid, dichloroacetic acid, monobromoacetic acid,
monochloroacetic acid, trichloroacetic acid, bromate, and chlorite, as well as the DBP-related
parameters pH, alkalinity, and total organic carbon (TOC).
Exhibit 14: Summary of the Count of Analytical Sample Results Removed via the QA Measures Applied to DBP Rule Contaminants¹
QA Step | Records Included | Records Excluded
Original number of analytical sample results | 11,755,299 | -
Step 1: Removal of analytical sample results from non-public water systems | 11,754,859 | 440
Step 2: Removal of data from systems with missing source water type and/or population served information | 11,748,860 | 5,999
Step 3: Removal of data with a sample collection date outside of the Six-Year Review 4 date range of 2012-2019 | 11,717,184 | 31,676
Step 4: Removal of data marked as being "not for compliance" | 11,700,871 | 16,313
Step 5: Removal of DBP data with a sample type code other than "RT" (routine), "CO" (confirmation), "DS" (distribution system), or "MR" (maximum residence) | 11,671,157 | 29,714
Step 6: Removal of records marked as potential duplicates, along with a State response saying that one set of the duplicate results should be excluded | 11,652,715 | 18,442
Step 7: Removal of DBP data with detected concentrations with a non-standard/blank unit of measure for the contaminant | 11,651,996 | 719
Step 8: Removal of detected concentrations greater than 100 × MCL or less than 1/100 × MDL for the contaminant; for TOC, removal of detections > 100 × MCL | 11,651,791 | 205
Step 9: Removal of DBP records sampled outside of the distribution system or entry point to the distribution system | 11,229,596 | 422,195
Step 10: Removal of records with no data/results | 11,229,589 | 7
Step 11: Removal of records with irregular system type codes (specific to the State of PA, where unknown system type codes were included) | 11,228,599 | 990
Final number of records | 11,228,599 | -
Percent included | 96% | -
¹ This table includes records for the following contaminants: TTHM, bromoform, chloroform, dibromochloromethane, bromodichloromethane, HAA5, dibromoacetic acid, dichloroacetic acid, monobromoacetic acid, monochloroacetic acid, trichloroacetic acid, bromate, chlorite, pH, alkalinity, and total organic carbon (TOC).
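Because each QA step removes records from the output of the previous step, the counts in Exhibit 14 form a funnel: each step's "included" count should equal the prior step's count minus that step's exclusions. The sketch below checks that arithmetic and computes the retained share; the function name is illustrative only.

```python
def check_qa_funnel(original: int, steps: list) -> float:
    """Check that each step's included count equals the prior count minus its exclusions.

    `steps` is a list of (included, excluded) pairs in QA-step order.
    Returns the percent of the original records that remain after the last step.
    """
    remaining = original
    for number, (included, excluded) in enumerate(steps, start=1):
        if remaining - excluded != included:
            raise ValueError(f"Step {number}: {remaining} - {excluded} != {included}")
        remaining = included
    return 100 * remaining / original


# (included, excluded) counts for Steps 1 through 11 of Exhibit 14.
DBP_STEPS = [
    (11_754_859, 440),
    (11_748_860, 5_999),
    (11_717_184, 31_676),
    (11_700_871, 16_313),
    (11_671_157, 29_714),
    (11_652_715, 18_442),
    (11_651_996, 719),
    (11_651_791, 205),
    (11_229_596, 422_195),
    (11_229_589, 7),
    (11_228_599, 990),
]

if __name__ == "__main__":
    percent = check_qa_funnel(11_755_299, DBP_STEPS)
    print(f"{percent:.1f} percent of DBP records retained")
```

Every step in Exhibit 14 passes this check, and the retained share is about 95.5 percent, which the exhibit reports rounded to 96 percent. The same check can be applied to the microbial counts in Exhibit 17.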
5.4.1 Non-Routine Samples
Some States have regulations that are more stringent than the NPDWRs and require water
systems to submit more sample results than federally required. States also may require
laboratories to report all sample results from water systems including results from contaminants
that are not regulated. Usually, non-routine sample results that are specifically listed as "special
request" in the database are also identified as being "non-compliance" samples. Most other types
of non-routine sample results, such as confirmation, repeat, or maximum residence time sample
results are considered "for compliance." While the extraction tool excluded sample results that
were "not for compliance," some "special" sample results that were marked as being "for
compliance" were included in the data extracted from SDWIS States. In addition, "non-
routine/not for compliance" results were present in data from the non-SDWIS States. All DBP
results that were marked as routine (RT), confirmation (CO), or maximum residence (MR) were
included in the DBP dataset.
5.4.2 Duplicate Records
In the SYR 4 analysis of DBPs and DBP-related parameters data, potential duplicates were
identified as all detection records with the same PWSID, sample point ID, analyte, sample
collection date, and concentration. All records identified as potential duplicates were retained in
the occurrence dataset unless the State confirmed that the records were duplicates and should be
excluded from the occurrence analyses.
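The duplicate rule can be sketched as two steps: flag records that share all five key fields, then, only for keys a State confirmed, keep a single copy (consistent with Step 6 of Exhibit 14, where one set of the duplicate results was excluded). The key names are hypothetical.

```python
from collections import Counter

# Fields that define a potential duplicate (hypothetical key names).
DUPLICATE_KEY = ("pwsid", "sample_point_id", "analyte", "sample_date", "concentration")


def record_key(rec: dict) -> tuple:
    return tuple(rec[field] for field in DUPLICATE_KEY)


def flag_potential_duplicates(detections: list) -> list:
    """Return a parallel list of booleans marking records that share a key with another record."""
    counts = Counter(record_key(rec) for rec in detections)
    return [counts[record_key(rec)] > 1 for rec in detections]


def drop_confirmed_duplicates(detections: list, confirmed: set) -> list:
    """Keep one record for each key the State confirmed as duplicated; keep all other records."""
    seen = set()
    kept = []
    for rec in detections:
        key = record_key(rec)
        if key in confirmed:
            if key in seen:
                continue  # extra copy of a State-confirmed duplicate
            seen.add(key)
        kept.append(rec)
    return kept
```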
5.4.3 Units of Measure
EPA identified all detection records for the DBPs, TOC, and alkalinity where the reported units
of measure were not one of the standard units used for the particular contaminant (i.e., not
mg/L or µg/L). For example, a chloroform record with a unit of measure listed as NTU would be
flagged. All records in non-standard units were excluded from the occurrence dataset unless
there was strong evidence of the correct standard unit (e.g., state response indicating the correct
unit of measure, obvious data entry error, concentration is within the range of standard units and
all other records from the State are reported in the standard units).
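The automated part of this check, flagging detections whose unit is blank or not mg/L or µg/L, can be sketched as below. The record keys are hypothetical, and the follow-up judgment about whether a flagged record has a recoverable standard unit is not modeled.

```python
STANDARD_UNITS = {"MG/L", "UG/L"}


def normalize_unit(unit) -> str:
    """Uppercase a unit string and map the micro sign (or Greek mu) to 'U'."""
    if unit is None:
        return ""
    # Replace micro characters before uppercasing: 'µ'.upper() is the Greek capital mu.
    return unit.strip().replace("\u00b5", "u").replace("\u03bc", "u").upper()


def flag_nonstandard_units(records: list) -> list:
    """Return detection records whose unit is blank or is not mg/L or µg/L."""
    return [
        rec for rec in records
        if rec["detected"] and normalize_unit(rec["unit"]) not in STANDARD_UNITS
    ]
```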
5.4.4 Potential Outliers
To identify potential high outliers, EPA flagged all detected concentrations for the DBP Rule
contaminants that were greater than 4 times the contaminant's MCL and all detected
concentrations that were greater than 10 times the contaminant's MCL. Because any
concentration greater than 10 times the MCL is also greater than 4 times the MCL, every record
in the higher tier was also part of the greater-than-4-times-MCL set, and EPA followed up with
the State about all flagged records. Exhibit 15 provides a list of all relevant MCL values. For
total organic carbon, which is not listed in Exhibit 15, all results greater than 100 mg/L were
excluded from the data file.
EPA included questions to the State about each of these potential high and low outliers in the
State's flagged record report. Any changes suggested by the States were implemented for these
records. For example, some States wrote back to say there were "no errors" in their high detect
concentrations or that they had "no reason or evidence to show these data to be invalid." Other
States explained that "all of the high results were due to using mg/L when they should have been
µg/L." For the States that did not respond, all detected DBP concentrations greater than 100
times the contaminant's MCL were excluded from the analyses. No low-end cut-off was applied
for the DBP data. All other potential outliers less than or equal to 100 times the contaminant's
MCL were included in the occurrence analysis. The value of 100 times the MCL was chosen as a
conservative high-end cut-off. For example, a TTHM detected concentration of 10,000 µg/L
(125 times the 80 µg/L MCL) was excluded because it was assumed to be a data entry error.
Exhibit 15: List of DBP MCL Values
Contaminant | Maximum Contaminant Level (MCL) (µg/L)
Chloroform | 80¹
Bromoform | 80¹
Bromodichloromethane | 80¹
Dibromochloromethane | 80¹
Total Trihalomethanes (TTHM) | 80
Monochloroacetic Acid | 60²
Dichloroacetic Acid | 60²
Trichloroacetic Acid | 60²
Bromoacetic Acid | 60²
Dibromoacetic Acid | 60²
Haloacetic acids 5 (HAA5) | 60
Bromate | 10
Chlorite | 1,000
¹ The MCL for total trihalomethanes is 80 µg/L, but the individual trihalomethane results were also compared against that MCL to identify potential outliers.
² The MCL for the sum of five haloacetic acids is 60 µg/L, but the individual haloacetic acid results were also compared against that MCL to identify potential outliers.
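A minimal sketch of the two-tier flagging and the 100-times-MCL cut-off, using the MCL values in Exhibit 15, follows. The analyte labels are hypothetical. The sketch also assumes, since this section does not state it explicitly, that data from States that responded are kept as corrected, while the cut-off applies to States that did not respond.

```python
# MCLs from Exhibit 15, in µg/L. Individual THMs and HAAs use the group MCL.
MCL_UG_L = {
    "TTHM": 80, "CHLOROFORM": 80, "BROMOFORM": 80,
    "BROMODICHLOROMETHANE": 80, "DIBROMOCHLOROMETHANE": 80,
    "HAA5": 60, "MONOCHLOROACETIC ACID": 60, "DICHLOROACETIC ACID": 60,
    "TRICHLOROACETIC ACID": 60, "BROMOACETIC ACID": 60, "DIBROMOACETIC ACID": 60,
    "BROMATE": 10, "CHLORITE": 1000,
}
TOC_CUTOFF_MG_L = 100


def outlier_tier(analyte: str, conc_ug_l: float):
    """Return the follow-up tier for a detected concentration, or None if it is not flagged."""
    mcl = MCL_UG_L[analyte]
    if conc_ug_l > 10 * mcl:
        return ">10x MCL"  # these records are also part of the >4x MCL set
    if conc_ug_l > 4 * mcl:
        return ">4x MCL"
    return None


def keep_detection(analyte: str, conc: float, state_responded: bool) -> bool:
    """Decide whether a detection stays in the occurrence dataset.

    `conc` is in mg/L for TOC and in µg/L for all other analytes.
    Assumes State-directed corrections were already applied to responding States' data.
    """
    if analyte == "TOC":
        return conc <= TOC_CUTOFF_MG_L
    if state_responded:
        return True
    # Non-responding States: drop detections above 100 times the MCL; no low-end cut-off.
    return conc <= 100 * MCL_UG_L[analyte]
```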
5.4.5 Locational Flag
While DBPs could theoretically occur anywhere in a given water system, EPA
is primarily focused on their occurrence in the distribution system. As such, EPA excluded any
DBP records with a location sampling point type that was not obviously a part of the distribution
system or entry point to the distribution system, such as sampling results from raw or source
waters. Specifically, the following location sampling point types were not flagged for exclusion:
DS (distribution system), EP (entry point), FC (first customer), FN (finished), LD (lowest
disinfectant residual), MD (midpoint of distribution system), or MR (maximum residence time).
For records whose sampling point location type was either null or labeled as a generic "Water
System Facility Point," an additional filter was added to make sure any records with a water
system facility type that was likely associated with the distribution system were not excluded.
Specifically, the following facility type codes were not flagged for exclusion when the sampling
point type code was listed as WS (water system facility point) or null: CC (consecutive
connection), DS (distribution system), TM (transmission main), or TP (treatment plant).
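The two-level location screen can be sketched as below, using the sampling point and facility type codes listed above. Treating an empty string the same as a null point type is an assumption, and the function names are hypothetical.

```python
# Sampling point type codes kept as distribution system or entry point locations.
KEEP_POINT_TYPES = {"DS", "EP", "FC", "FN", "LD", "MD", "MR"}
# Facility type codes kept when the sampling point type is WS or null.
KEEP_FACILITY_TYPES = {"CC", "DS", "TM", "TP"}


def _clean(code) -> str:
    return (code or "").strip().upper()


def keep_dbp_location(point_type, facility_type) -> bool:
    """Return True if a DBP record's sampling location is not flagged for exclusion."""
    point = _clean(point_type)
    if point in KEEP_POINT_TYPES:
        return True
    if point in ("", "WS"):
        # Generic or missing point type: fall back to the water system facility type.
        return _clean(facility_type) in KEEP_FACILITY_TYPES
    return False
```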
5.5 Quality Assurance Measures Applied to Microbial Contaminants
In addition to the QA measures described above in Section 5.2, several additional QA measures
were applied only to microbial contaminants. Those QA measures are
described in this section. Exhibit 16 is a flow chart of the additional QA measures.
Exhibit 16. Flow Chart of Additional QA Measures Specific to Microbial
Contaminants
Exhibit 17 documents the specific counts of microbial records included and excluded in each QA
step. After applying the various QA measures to more than 28 million SYR 4 ICR microbial
records, 99 percent of the records from 57 States remained in the final dataset that was used for
conducting occurrence analyses.
Exhibit 17: Summary of the Count of Analytical Sample Results Removed via the QA Measures Applied to Microbial Rule Contaminants¹
QA Step | Records Included | Records Excluded
Original number of analytical sample results | 28,329,039 | -
Step 1: Removal of analytical sample results from non-public water systems | 28,315,533 | 13,506
Step 2: Removal of data from systems with missing source water type and/or population served information | 28,236,298 | 79,235
Step 3: Removal of data with a sample collection date outside of the Six-Year Review 4 date range of 2012-2019 | 28,114,841 | 121,457
Step 4: Removal of data marked as being "not for compliance" | 27,985,027 | 129,814
Step 5: Removal of microbial data with a sample type code other than "RT" (routine), "RP" (repeat), or "TG" (triggered) | 27,981,035 | 3,992
Step 6: Removal of records with no data/results | 27,964,042 | 16,993
Step 7: Removal of records with irregular system type codes (specific to the State of PA, where unknown system type codes were included) | 27,962,474 | 1,568
Final number of records | 27,962,474 | -
Percent included | 99% | -
¹ The following analytes are included in the counts above: Total coliform, Fecal coliform, E. coli, Cryptosporidium, Giardia lamblia, Enterococci, and coliphage.
5.5.1 Non-Routine Samples
Some States have regulations that are more stringent than the NPDWRs and require water
systems to submit more sample results than federally required. States also may require
laboratories to report all sample results from water systems including results from contaminants
that are not regulated. Usually, non-routine sample results that are specifically listed as "special
request" in the database are also identified as being "non-compliance" samples. Most other types
of non-routine sample results, such as confirmation, repeat, or maximum residence time sample
results, are "for compliance." While the extraction tool excluded sample results that were "not for
compliance," some "special" sample results that were marked as being "for compliance" were
included in the data extracted from SDWIS States. In addition, "non-routine / not for
compliance" results were present in data from the non-SDWIS States. EPA flagged these data
and sent inquiries about them to the States. All results that were marked as routine (RT), repeat
(RP), or triggered (TG) were included in the occurrence analyses for the microbial contaminants.
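The compliance and sample type screens for both rule groups (Steps 4 and 5 of Exhibits 14 and 17) can be sketched with a single lookup table. Note that Step 5 of Exhibit 14 also retains "DS" (distribution system) for DBPs, in addition to the RT, CO, and MR codes named in Section 5.4.1; the sketch follows the exhibit. The group labels and function name are hypothetical.

```python
# Sample type codes retained at Step 5 of Exhibit 14 (DBPs) and Exhibit 17 (microbial).
ALLOWED_SAMPLE_TYPES = {
    "DBP": {"RT", "CO", "DS", "MR"},
    "MICROBIAL": {"RT", "RP", "TG"},
}


def passes_sample_type_screen(rule_group: str, sample_type_code, for_compliance: bool) -> bool:
    """Apply the "not for compliance" screen, then the sample type code screen."""
    if not for_compliance:
        return False
    code = (sample_type_code or "").strip().upper()
    return code in ALLOWED_SAMPLE_TYPES[rule_group]
```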
5.5.2 Pairing Disinfectant Residual and Coliform Results for non-SDWIS States
Per the requirements under the Surface Water Treatment Rule (SWTR), surface water systems
need to monitor disinfectant residuals at the same locations and times as routine total coliform
(TC) samples under the Total Coliform Rule (TCR) and Revised TCR (RTCR). Thus, the TC data
submitted by States generally also contain paired disinfectant residual monitoring records.
However, some non-SDWIS States submit disinfectant residual concentration data as
independent records not paired with TC samples. These data were submitted under different
analyte codes: chlorine (0999), total chlorine (1000), chloramine (1006), chlorine dioxide (1008),
residual chlorine (1012), and free residual chlorine (1013), depending on the State. To enable
evaluation of disinfectant residual concentrations versus TC positivity rates, EPA paired the
residual chlorine data with the associated TC result based on the sample collection date, sample
point ID, and lab assigned ID. Specifically, EPA conducted this pairing for Wisconsin and
Pennsylvania, two non-SDWIS States that submitted disinfectant residual concentration data as
independent records. Pennsylvania and Wisconsin were the only non-SDWIS States that had the
information needed to conduct this pairing. For Pennsylvania, 83,785 TC records (10
percent) were paired with free chlorine residuals (1013) and 54,395 TC records (6 percent) were
paired with total chlorine residuals (1000). For Wisconsin, 327,230 TC records (47 percent) were
paired with free chlorine residuals (1013). In an effort to pair more results, EPA applied a
secondary approach to the remaining unpaired records that omitted the lab assigned ID as a necessary
join field. This secondary pairing enabled an additional 96,701 TC records in Pennsylvania and
335 TC records in Wisconsin to be paired with free chlorine residuals (1013). An
additional 32,824 TC records in Pennsylvania were paired with total chlorine residuals (1000).
This resulted in a total of 267,705 TC records paired in Pennsylvania (31 percent) and 327,565
TC records paired in Wisconsin (47 percent). EPA did not have enough information to conduct
pairing using the remaining analyte codes or to determine whether the reported concentrations
represent free or total chlorine. Nonetheless, EPA is making those unpaired disinfectant residual
records available in the public release of the SYR 4 dataset (see Appendix E).
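The two-pass pairing can be sketched as follows: the first pass joins on sample collection date, sample point ID, and lab assigned ID; the second pass retries only the still-unpaired TC records on date and sample point ID. The key names are hypothetical. The sketch also assumes one-to-one matching (each residual record is used at most once) and that the caller passes residuals for a single analyte code (e.g., free chlorine, 1013) at a time; the text does not specify either detail.

```python
from collections import defaultdict


def pair_residuals(tc_records: list, residual_records: list) -> dict:
    """Return a mapping of TC record index to paired residual record index."""
    strict_keys = ("sample_date", "sample_point_id", "lab_id")
    loose_keys = ("sample_date", "sample_point_id")
    used = set()   # residual indices already paired
    pairs = {}     # TC index -> residual index

    def run_pass(keys):
        # Index the residuals that are still available by the join key.
        pool = defaultdict(list)
        for j, res in enumerate(residual_records):
            if j not in used:
                pool[tuple(res[k] for k in keys)].append(j)
        for i, tc in enumerate(tc_records):
            if i in pairs:
                continue  # already paired in an earlier pass
            candidates = pool.get(tuple(tc[k] for k in keys))
            if candidates:
                j = candidates.pop(0)
                used.add(j)
                pairs[i] = j

    run_pass(strict_keys)  # primary join, including the lab assigned ID
    run_pass(loose_keys)   # secondary join without the lab assigned ID
    return pairs
```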
5.5.3 Updates to Absence and Presence Codes
Under the SYR 4 ICR, some microbial records (TC, E. coli (EC), and fecal coliform) were submitted
without a presence indicator code (i.e., a code indicating whether the result was absent (A) or
present (P)) but with a value in the measured concentration field (specifically, the
CONCENTRATION_MSR field). EPA updated nearly 4 million microbial records that had a null
presence/absence code and a concentration of zero by setting the presence/absence code to "A".
In addition, EPA updated nearly 60,000 microbial records with a null PRESENCE_IND_CODE
to "P" when the concentration was greater than zero, indicating the presence of the microbe.
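The presence/absence update amounts to a small rule applied to the presence indicator code and the measured concentration fields. A minimal sketch follows; leaving negative or missing concentrations unchanged is an assumption, since the text covers only zero and positive values.

```python
def fill_presence_code(presence_code, concentration):
    """Infer a missing presence/absence code from the measured concentration.

    Returns "A" for a null code with a zero concentration, "P" for a null code with a
    positive concentration, and the original code otherwise.
    """
    if presence_code:  # an existing code is left unchanged
        return presence_code
    if concentration is None:
        return presence_code
    if concentration == 0:
        return "A"
    if concentration > 0:
        return "P"
    return presence_code  # e.g., a negative value: left unchanged
```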
6 Data Preparation for Chemical Phase and Radionuclides Rules' Analyses
6.1 Non-Detection Record Replacement
Within the SYR 4 ICR dataset, each sample analytical result specifies a value and a sign to
indicate whether that result is a detection (i.e., greater than or equal to the MRL) or a non-
detection. Sample records reported as non-detections were less uniform and less complete than
sample records for analytical detections. For some of the States that did report MRL data, this
information was recorded in the analytical result field, along with a "<" sign in a corresponding
field to identify the record as a non-detection. Other States simply included a zero or negative
result in the analytical result field to signify a non-detection. For some of the occurrence
analyses, EPA calculated system mean concentrations using a "simple substitution" approach
that substitutes MRL values for reported analytical non-detections. Non-zero numeric MRL
values were therefore needed to replace all analytical results that were reported as zero,
"non-detection," "ND," or similar.
EPA established a convention of replacing any missing MRL data for non-detection results with
the modal MRL value for the State in which the system was located. The State-specific modal
MRLs were derived directly from the SYR 4 ICR dataset. In some cases, though, MRL data for a
specific contaminant were missing for an entire State. In these cases,
the missing values were replaced with the national modal MRL, derived as the mode of all the
State-specific modal MRL values for that contaminant. If State-specific modal MRL values were
greater than the national modal MRL or less than the minimum MDL for the contaminant, a
process was developed to identify and replace such values with more reasonable MRL values.
Exhibit 18 provides a description of the three-step process.
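The substitution convention can be sketched for a single contaminant as below: detections keep their result, non-detections use their own reported MRL when present, then the State modal MRL, then the national modal MRL (the mode of the State modes). The record keys are hypothetical, breaking ties by the smallest value is an assumption, and the Exhibit 18 screening that replaces out-of-range State modes is not modeled.

```python
from collections import defaultdict
from statistics import multimode


def modal_mrl(values):
    """Mode of the positive MRL values; ties go to the smallest value (an assumption)."""
    positives = [v for v in values if v is not None and v > 0]
    if not positives:
        return None
    return min(multimode(positives))


def substitute_non_detects(records: list) -> list:
    """Return a numeric value per record using simple substitution for one contaminant.

    Each record is a dict with hypothetical keys 'state', 'detected', 'result', and 'mrl'.
    """
    mrls_by_state = defaultdict(list)
    for rec in records:
        mrls_by_state[rec["state"]].append(rec["mrl"])
    state_mode = {state: modal_mrl(vals) for state, vals in mrls_by_state.items()}
    # National modal MRL: the mode of the State-specific modal MRLs.
    national_mode = modal_mrl(list(state_mode.values()))

    values = []
    for rec in records:
        if rec["detected"]:
            values.append(rec["result"])
        elif rec["mrl"] is not None and rec["mrl"] > 0:
            values.append(rec["mrl"])  # reported MRL replaces the non-detection
        elif state_mode[rec["state"]] is not None:
            values.append(state_mode[rec["state"]])  # State modal MRL
        else:
            values.append(national_mode)  # no usable MRLs for this State
    return values
```

A system mean under simple substitution would then be the average of these values over the system's records.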
Exhibit 18. Process to Establish Contaminant National Modal MRLs
Step 1: Establish a National Modal MRL Value for Each Contaminant
6.2 Adjustments of Population Served by Public Water Systems
Consecutive water systems purchase all of their water from other systems (i.e., seller or
wholesale systems). Compliance monitoring requirements are different for consecutive water
systems compared to other systems because their water has already been treated and monitored
by the wholesale water system. For the occurrence analyses of the Chemical Phase and
Radionuclides Rules' contaminants presented in USEPA (2024a), EPA excluded data from
consecutive systems, as those systems are not required to sample for those contaminants.6
However, EPA did adjust the population values of the wholesale systems to include the
population of consecutive systems that buy their water. The population served directly by these
wholesale systems is the retail population, and the population served indirectly through the
purchased systems is the wholesale population. The sum of the retail and wholesale populations
is the adjusted total population. Adjusting for the total population served ensured that the entire
relevant population was included in the exposure estimates.
6 Note that consecutive water systems do their own sampling for lead and copper, as well as the microbial
contaminants and DBPs; thus, the data from these systems were not excluded from the lead, copper, microbial, or
DBP occurrence datasets (see USEPA, 2024a and 2024b).
Exhibit 19 illustrates a simple example of these adjustments. In the diagram, Systems B, C, and
D (consecutive systems) buy 100 percent of their water from System A (wholesale system).
System A is required to monitor for contaminant X; however, Systems B, C, and D are not
required to monitor. If contaminant X were detected and population values were not adjusted, the
exposure estimates would not account for the populations served by Systems B, C, and D, even
though these populations could be exposed to contaminant X. To correct for this, EPA uses the
adjusted total population served (i.e., retail plus wholesale populations) for System A for all
population-served estimates, which is equal to 24,600 people.
Exhibit 19: Illustration of the Adjusted Total Population Served by Wholesale
Systems
[Diagram: Wholesale System A (retail population: 10,000), which has a detection of contaminant X, supplies consecutive System B (population: 5,400), System C (population: 8,000), and System D (population: 1,200).]
Total population served by wholesale System A exposed to the detection of contaminant X
= retail population + wholesale population
= 10,000 + (5,400 + 8,000 + 1,200)
= 24,600
For some systems, a slightly more complicated adjustment to the wholesalers' total population
served values was required. Many consecutive water systems buy water from more than one
wholesale system. Because of this, their entire population should not be attributed to a single
wholesale system, and EPA must instead distribute the population across the wholesale systems.
The actual relative quantities of water purchased from the different wholesalers are not available;
therefore, in the cases of multiple wholesalers, the population served by the consecutive system
was assumed to be uniformly distributed across the wholesalers.
Exhibit 20 illustrates the complete population adjustment for System A, including the uniform
distribution of the consecutive systems' populations served. In the diagram, for example, System
B, a system serving a population of 5,400, purchases its water from three different wholesale
systems: Systems A, E, and F. To account for the population served by System B in the
population exposure estimates, one third of System B's population (5,400 ÷ 3 = 1,800) is
allotted to each of Systems A, E, and F.
Exhibit 20: Illustration of the Allotment of Consecutive System Populations to
Wholesale Systems
Adjusted population served by wholesale System A exposed to the detection of contaminant X
= retail population + wholesale population
= 10,000 + (5,400/3 + 8,000 + 1,200/3)
= 10,000 + (1,800 + 8,000 + 400)
= 20,200
To make adjustments across the SYR 4 ICR dataset, EPA compiled a list of all wholesale and
consecutive systems. This list of buyer-wholesaler relationships was drawn from SDWIS/Fed (fourth
quarter 2019). EPA then created a crosswalk linking the consecutive systems to the wholesale
systems from which they purchased their water. Finally, EPA distributed the population served
by each consecutive system evenly across the relevant wholesale system populations, according
to the calculations described. As a result, the contaminant occurrence measures are associated
with the adjusted total population (i.e., retail plus wholesale) served by these wholesale systems
included in the Six-Year Review dataset.
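The allotment described above can be sketched as a single pass over the buyer-wholesaler crosswalk. The input shapes are hypothetical. In the usage check, System D's two other wholesalers ("G" and "H") are invented for illustration, because Exhibit 20 shows only that System D's population is split three ways.

```python
from collections import defaultdict


def adjusted_total_population(retail: dict, consecutive: dict) -> dict:
    """Add each consecutive system's population to its wholesalers in equal shares.

    retail: wholesale system ID -> retail population served directly.
    consecutive: consecutive system ID -> (population, list of wholesale system IDs).
    """
    adjusted = defaultdict(float)
    for system_id, population in retail.items():
        adjusted[system_id] += population
    for population, sellers in consecutive.values():
        unique_sellers = set(sellers)
        if not unique_sellers:
            continue  # no wholesaler on record; nothing to allot
        share = population / len(unique_sellers)  # uniform split across wholesalers
        for seller_id in unique_sellers:
            adjusted[seller_id] += share
    return dict(adjusted)
```

With System B split across A, E, and F, System C buying only from A, and System D split three ways, System A's adjusted total is 20,200, matching Exhibit 20; with every consecutive system buying only from A, it is 24,600, matching Exhibit 19.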
7 Public Access to SYR 4 ICR Data
Through extensive data management efforts and QA evaluations, including consultations with
state data management staff, EPA established a compliance monitoring and treatment technique
dataset (SYR 4 ICR dataset) that consists of data from 59 States (46 states of the United States,
Washington, D.C., American Samoa, Navajo Nation, Commonwealth of the Northern Mariana
Islands, and other tribes). The initial SYR 4 ICR dataset included more than 83 million analytical
records from approximately 142,000 PWSs that serve approximately 303 million people
nationally.7 More than 73 million analytical contaminant records underwent QA/QC review for
inclusion in the SYR 4 ICR dataset to support the SYR 4 analyses in USEPA (2024a-d). After the
QA/QC review of these analytical records was completed and the small percentage of records that
did not meet quality standards was omitted from analyses, the final SYR 4 ICR dataset comprises
almost 71 million analytical records from approximately 140,000 PWSs that serve approximately
301 million people nationally.8
EPA maintains the final SYR 4 ICR compliance monitoring data and treatment technique
information online at https://www.epa.gov/dwsixyearreview. The public can download the final
SYR 4 ICR data (i.e., all records that passed the QA/QC review) that were used in support of the
evaluation of regulated contaminant levels in drinking water. Appendix E includes a user guide
to obtaining and using the SYR 4 ICR compliance monitoring, treatment technique, and related
data from EPA's website.
7 This count of 142,000 PWSs represents all water systems with any SYR 4 data, including data for information not
specifically requested.
8 This count of 140,000 PWSs serving 301 million people represents water systems that provided data for requested
contaminants that passed QA/QC review.
8 References
United States Environmental Protection Agency (USEPA). 2016. Six-Year Review 3 Technical
Support Document for Disinfectants/Disinfection Byproducts Rules. EPA-810-R-16-012.
December 2016.
USEPA. 2019. Information Collection Request Submitted to OMB for Review and Approval;
Comment Request; Contaminant Occurrence Data in Support of the EPA's Fourth Six-Year
Review of National Primary Drinking Water Regulations: October 31, 2019, Volume 84,
Number 211, Page 58381-58382.
USEPA. 2024a. Analysis of Regulated Contaminant Occurrence Data from Public Water
Systems in Support of the Fourth Six-Year Review of National Primary Drinking Water
Regulations: Chemical Phase Rules and Radionuclides Rules. EPA-815-R-24-014. February
2024.
USEPA. 2024b. Six-Year Review 4 Technical Support Document for Microbial Contaminant
Regulations. EPA-815-R-24-022. February 2024.
USEPA. 2024c. Review of Fluoride Occurrence for the Fourth Six-Year Review. EPA-815-R-24-021.
February 2024.
USEPA. 2024d. Analytical Feasibility Support Document for the Fourth Six-Year Review of
National Primary Drinking Water Regulation. EPA-815-R-24-015. February 2024.
Data Management and Quality
Assurance/Quality Control Process for the
Fourth Six-Year Review Information
Collection Rule Dataset: Appendices
9 List of Appendices
APPENDIX A: Data Request Letter that EPA sent on June 3, 2020 to Each Primacy Agency to Request Voluntary Submission of Compliance Monitoring Data and Treatment Technique Information for Regulated Chemical, Radiological, and Microbiological Contaminants
APPENDIX B: Crosswalk of Data Elements Requested for SYR 4 ICR and the SDWIS Data Element Names
APPENDIX C: Data Dictionary for the SYR 4 ICR Database
APPENDIX D: Occurrence Data for the Aircraft Drinking Water Rule (ADWR)
APPENDIX E: User Guide to Downloading SYR 4 Data from EPA's Website
Appendix A: Data Request Letter that EPA Sent on June 3, 2020 to
Each Primacy Agency to Request Voluntary Submission of
Compliance Monitoring Data and Treatment Technique
Information for Regulated Chemical, Radiological, and
Microbiological Contaminants
UNITED STATES ENVIRONMENTAL PROTECTION AGENCY
WASHINGTON, D.C. 20460
OFFICE OF WATER
State Drinking Water Administrators
Association of State Drinking Water Administrators
1401 Wilson Blvd. #1225
Arlington, VA 22209
Dear State Drinking Water Administrator,
The 1996 Safe Drinking Water Act Amendments require the U.S. Environmental
Protection Agency (EPA) to review and revise, if appropriate, existing National Primary
Drinking Water Regulations (NPDWRs) at least every six years (i.e., the Six-Year Review). The
Agency is currently preparing for the fourth round of the Six-Year Review (Six-Year Review 4).
As was done for the third Six-Year Review, the EPA is contacting each primacy agency
(hereinafter referred to as "state") and requesting voluntary submission of its compliance
monitoring data and treatment technique information for regulated chemical, radiological, and
microbiological contaminants. We are requesting compliance monitoring data collected between
January 2012 and December 2019. The Office of Management and Budget (OMB) has approved
the information collection request for the EPA's fourth Six-Year Review under the provisions of
the Paperwork Reduction Act, 44 U.S.C. 3501 et seq., and has assigned OMB control number
2040-0298.
These data are an important component in supporting the EPA's Six-Year Review of
NPDWRs. We are encouraging each state to submit its contaminant monitoring and treatment
technique information because these data will contribute directly to the EPA's understanding of
national contaminant occurrence, treatment technique information, the population exposed to
regulated contaminants, and exposure reductions associated with the current regulations. The
EPA is requesting your voluntary submission by September 30, 2020.
The EPA is requesting only data that are currently stored electronically (no paper
records), including both detection and non-detection results for compliance monitoring and
treatment technique information. Exhibit 1 of the attachment provides a list of the regulated
contaminants for which the EPA is requesting data. Exhibit 2 presents critical data elements
needed for each sample result. To make your voluntary reporting as easy as possible, your state
can transmit its compliance monitoring data set to the EPA using the same process your state
currently uses to submit your SDWIS data quarterly. The attachment also answers questions
about how the data will be transferred, managed, and used and provides some background
information about why we are requesting these data.
In our previous Six-Year Review data collections, we have worked closely with state data
managers to answer questions and facilitate data transfer. Soon after June 30, 2020, we will begin
contacting data managers and coordinating directly with them by phone and/or email.
Thank you for your consideration of this request. Many of you voluntarily submitted your
data for the Six-Year Review 3. We appreciated your participation and hope you will do so
again. If you have any questions about this request or the intended uses of the data, please
contact Lili Wang, Associate Chief, Standards and Risk Reduction Branch, at wang.lili@epa.gov
or Nicole Tucker, Six-Year Review 4 Team Lead, at tucker.nicole@epa.gov.
Sincerely,
Jennifer L. McLain, Director
Office of Ground Water and Drinking Water
Enclosure: Attachment
cc: Regional Water Division Directors
Regional Drinking Water Branch Chiefs
Tribal Direct Implementation Contacts
ATTACHMENT
I. Details Regarding EPA's Request for Contaminant Monitoring Data
A. What regulated contaminants are included in this request?
EPA is requesting compliance monitoring information for chemical, radiological, and
microbiological contaminants, as was requested under past Six-Year Reviews. Exhibit 1, below,
lists the specific contaminants for which EPA is requesting monitoring data. EPA will work with
you to make the data transfer as easy as possible. Voluntary submission of your regulated
drinking water contaminant monitoring and treatment technique data is the most critical step in
this national occurrence assessment for the Six-Year Review 4.
B. What specific data are being requested and what timeframe should the data cover?
EPA is requesting the voluntary submission of compliance monitoring data for regulated
chemical, radiological, and microbiological contaminants (Exhibit 1) collected between January
2012 and December 2019. This request only includes those data that you have stored in
electronic format. The requested data include routine compliance monitoring samples (including
repeat and confirmation samples) and treatment technique data. Please include all results for both
analytical detections and non-detections.
Exhibit 2 lists the data elements that are likely to be captured as part of your facility and
treatment data, and likely to be in your compliance monitoring database. We encourage you to
send us your data even if you feel that your data set is incomplete.
Exhibit 1: Occurrence Data Requested
Chemical Contaminants (Phase I, II, IIB, and V Rules; Arsenic Rule; Lead and Copper Rule)
Acrylamide
1,1-Dichloroethylene
Methoxychlor
Alachlor
cis-1,2-Dichloroethylene
Monochlorobenzene (Chlorobenzene)
Antimony
trans-1,2-Dichloroethylene
Nitrate (as N)
Arsenic
Dichloromethane (Methylene chloride)
Nitrite (as N)
Asbestos
1,2-Dichloropropane
Oxamyl (Vydate)
Atrazine
Di(2-ethylhexyl) adipate (DEHA)
Pentachlorophenol
Barium
Di(2-ethylhexyl) phthalate (DEHP)
Picloram
Benzene
Dinoseb
Polychlorinated biphenyls (PCBs)
Benzo[a]pyrene
Diquat
Selenium
Beryllium
Endothall
Simazine
Cadmium
Endrin
Styrene
Carbofuran
Epichlorohydrin
2,3,7,8-TCDD (Dioxin)
Carbon tetrachloride
Ethylbenzene
Tetrachloroethylene
Chlordane
Ethylene dibromide (EDB)
Thallium
Chromium (total)
Fluoride
Toluene
Copper
Glyphosate
Toxaphene
Cyanide
Heptachlor
2,4,5-TP (Silvex)
2,4-D
Heptachlor epoxide
1,2,4-Trichlorobenzene
Dalapon
Hexachlorobenzene
1,1,1-Trichloroethane
1,2-Dibromo-3-chloropropane (DBCP)
Hexachlorocyclopentadiene
1,1,2-Trichloroethane
1,2-Dichlorobenzene (o-Dichlorobenzene)
Lead
Trichloroethylene
1,4-Dichlorobenzene (p-Dichlorobenzene)
Lindane
Vinyl chloride
1,2-Dichloroethane (Ethylene dichloride)
Mercury (inorganic)
Xylenes (total)
Radiological Contaminants
Combined Radium-226/228; and Radium-226 & Radium-228 (if available)
Gross beta
Tritium
Iodine-131
Uranium
Gross alpha
Strontium-90
Total Coliform Rule (TCR) and Revised Total Coliform Rule (RTCR)
Total coliforms
Fecal coliforms
Escherichia coli (E. coli)
Disinfectants and Disinfection Byproducts Rules (DBPRs)
Total Trihalomethanes (TTHMs):
Chloroform
Bromodichloromethane
Dibromochloromethane
Bromoform
Haloacetic Acids (HAA5):
Monochloroacetic acid
Dichloroacetic acid
Trichloroacetic acid
Bromoacetic acid
Dibromoacetic acid
Bromate
Chlorite
Chlorine
Chloramines
Chlorine dioxide
Ground Water Rule (GWR)
Escherichia coli (E. coli)
Enterococci
Coliphage
Surface Water Treatment Rules (SWTRs)
Chlorine
Cryptosporidium
Heterotrophic Plate Count (HPC)
Chloramines
Giardia lamblia
Filter Backwash Recycling Rule (FBRR)
No specific occurrence data collected.
Exhibit 2: Requested Data Categories
Data Category
Description
System-Specific Information
Public Water System Identification Number (PWSID)
The code used to identify each PWS. The code begins with the standard 2-character
postal state abbreviation or Region code; the remaining 7 digits are unique to
each PWS in the state.
System Name
Name of the PWS.
Federal Public Water System Type Code
A code to identify whether a system is:
Community Water System;
Non-transient Non-community Water System; or
Transient Non-community Water System.
Population Served
Highest average daily number of people served by a PWS, when in operation.
Federal Source Water Type
Type of water at the source. Source water type can be:
Ground water; or
Surface water; or
Ground water under the direct influence of surface water (GWUDI) (Note: Some
States may not distinguish GWUDI from surface water sources. In those States, a
GWUDI source should be reported as a surface water source type.)
Treatment Information
Water System Facility
System facility data, including: treatment plant identification number, treatment
plant information, treatment unit process/objectives, facility flow, treatment train
(train or flow of water through treatment units within the treatment plant).
Filtration Type
Information relating to system filtration, including: filtration status, types of
filtration (e.g., unfiltered, conventional filtration, and other permitted values).
Treatment Technique Information
Information pertaining to treatment processes. Treatment technique information includes:
disinfectants used and their doses for primary and secondary disinfection,
coagulant/coagulant aid type and dose, disinfectant concentration, disinfection
profile/benchmark data, log of viral inactivation/removal, contact time, contact value,
pH, and temperature.
Filter Backwash Information
Information about filter backwash that is returned to the treatment plant influent
(e.g., information on: recycle/schematic status, alternative return location,
corrective action requirements, and recycle flows and frequency).
Sample-Specific Information
Sampling Point Identification Code
A sampling point identifier established by the state, unique within each applicable
facility, for each applicable sampling location (e.g., entry point to the distribution
system). This information enables occurrence assessments that address intra-system
variability.
Sample Identification Number
Identifier assigned by the state or the laboratory that uniquely identifies a sample.
Sample Collection Date
Date the sample is collected, including month, day, and year.
Sample Type
Indicates why the sample is being collected (e.g., compliance, routine, repeat,
confirmation, additional routine samples, duplicate, special, special duplicate, etc.).
Sample Analysis Type Code
Code for type of water sample collected:
Raw (Untreated) water sample
Finished (Treated) water sample
For lead and copper only:
Source
Tap
For TCR repeats only, indicator of sampling location relative to the sample point
where the positive sample was originally collected:
Upstream
Downstream
Original
Contaminant
Contaminant name, 4-digit SDWIS contaminant identification number, or
Chemical Abstracts Service (CAS) Registry Number for which the sample is being
analyzed.
Sample Analytical Result - Sign
The sign indicates whether the sample analytical result was:
(<) "less than" means the contaminant was not detected or was detected at a level
"less than" the minimum reporting level (MRL).
(=) "equal to" means the contaminant was detected at a level "equal to" the value
reported in "Sample Analytical Result - Value."
(+) "positive result" (for RTCR data, include this sign only for positive E. coli results).
Sample Analytical Result - Value
Actual numeric (decimal) value of the analysis for the chemical results, or the MRL
if the analytical result is less than the contaminant's MRL.
(For the TCR and RTCR, total coliform (TC) and E. coli results will indicate
presence/absence, and positive E. coli will have numeric results.)
Sample Analytical Result - Unit of Measure
Unit of measurement for the analytical results reported (usually expressed in either
µg/L or mg/L for chemicals; or pCi/L or mrem/yr for radiological contaminants).
(Not required for TCR and RTCR data)
Sample Analytical Method Number
EPA identification number of the analytical method used to analyze the sample for
a given contaminant.
Minimum Reporting Level (MRL) - Value
MRL refers to the lowest concentration of an analyte that may be reported.
(Not required for TCR and RTCR data)
MRL - Unit of Measure
Unit of measure to express the concentration value of a contaminant's MRL.
(Not required for TCR and RTCR data)
Source Water Monitoring Information
Total organic carbon (TOC), including percent TOC removal, TOC removal
summary, pH, alkalinity, monitoring data entered as individual results or included
in DBP (or monthly operating report) summary records, alternative compliance
criteria, results from round 2 monitoring under LT2 ESWTR (including
Cryptosporidium, E. coli, turbidity, or state-approved alternate indicators).
Sample Summary Reports
Sample summaries for DBPRs, SWTRs, GWR corrective actions, and the Lead and
Copper Rule (LCR) associated with analytical result records. Values used for
compliance determination [e.g., turbidity (combined effluent/individual effluent),
disinfectant residual levels in treatment plant and distribution system, treatment
technique information, HPC, etc.]
1. For systems that are no longer required to individually monitor for nitrite, results should be reported for total
nitrate plus nitrite (expressed as N) as SDWIS Analyte Code 1038 in lieu of individual results for nitrite and nitrate.
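The Exhibit 2 definitions for the PWSID and for the sign/value pair imply a few record-level checks a state could run before transmitting data. The Python sketch below is illustrative only: the function name `check_record` and its messages are hypothetical, and it assumes a PWSID is a 2-character prefix (a postal state abbreviation, or a Region code assumed here to be 2 digits) followed by 7 digits.

```python
import math
import re
from typing import List, Optional

# Assumption: prefix is a postal abbreviation (e.g., "MD") or a 2-digit
# Region code (e.g., "05"), followed by 7 digits unique within the state.
PWSID_PATTERN = re.compile(r"(?:[A-Z]{2}|\d{2})\d{7}")

# Sign values defined for "Sample Analytical Result - Sign" in Exhibit 2.
VALID_SIGNS = {"<", "=", "+"}


def check_record(pwsid: str, sign: str, value: Optional[float],
                 mrl: Optional[float]) -> List[str]:
    """Return the problems found in one sample result (empty list if none)."""
    problems: List[str] = []
    if not PWSID_PATTERN.fullmatch(pwsid):
        problems.append(f"malformed PWSID {pwsid!r}")
    if sign not in VALID_SIGNS:
        problems.append(f"unknown result sign {sign!r}")
    elif sign == "<":
        # Non-detection: per Exhibit 2, the value field carries the MRL.
        if value is None or mrl is None:
            problems.append("non-detect missing value or MRL")
        elif not math.isclose(value, mrl):
            problems.append("non-detect value differs from MRL")
    elif sign == "=" and value is None:
        # A detection at a measured level needs a numeric value.
        problems.append("detection missing numeric value")
    return problems


print(check_record("MD0123456", "<", 0.5, 0.5))  # -> []
print(check_record("MD12345", "=", None, None))
```

The `+` sign is accepted without a value check because TCR and RTCR results may be presence/absence only, and MRL fields are not required for those rules.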
C. How do I prepare my data for submission to EPA?
We want to make this process as easy as possible for states that are volunteering to submit
monitoring and treatment technique data. EPA developed and refined a SDWIS/State extraction
tool, which runs a customized query to pull data for those using SDWIS/State. We believe this
would be the most efficient (i.e., easiest) method of data extraction for those states using some or
all of SDWIS/State. Currently, some states store and manage their data in more than one
database. If it is easier for you to provide the electronic data for all contaminants that are stored
in your data system, EPA can help you with a global extraction of the data. Please send inquiries
to SixYearData@cadmusgroup.com. All data will be transmitted to EPA using the same process
your state currently uses to submit your SDWIS data (see section D, below, for details).
1. Extracting data that are stored in SDWIS/State:
SDWIS/State Extract Tool: EPA has developed the SDWIS/State Extract Tool to extract the
relevant data (specified in Exhibit 2) from a SDWIS/State database. The tool consists of three
parts: PWS Inventory and Treatment, Analytical Results, and Calculated Compliance Values. The
first two parts were used in the Six-Year Review 3. States that use SDWIS/State for data storage
and management and are interested in using the SDWIS/State extract tool can email
SixYearData@cadmusgroup.com for instructions to download the extraction tool. EPA believes
the extract tool would be the easiest mode of extraction for data that are stored in SDWIS/State.
For the data transfer step, please see section D, below.
Note: If you have not migrated all drinking water monitoring data for the applicable period
(January 2012 through December 2019) to SDWIS/State, a separate data submission to include
all data back to January 2012 is requested, so that the data included in the Agency's Six-Year
Review analysis are as complete and comparable as possible.
Automated Data Quality Assurance (QA) with SDWIS/State Extraction Tool: EPA has built
several automated data QA checks into this extraction tool. For example, the extraction tool
checks for duplicate data and for analytical results that are more than 10 times the maximum contaminant level (MCL). Before the data
are extracted from SDWIS/State, the extraction tool runs these queries and returns a "flagged
item report" for any data that meet these and other criteria that may indicate anomalies in your
data (e.g., incorrect units of measurement, or data entry error). If there are entries in your
"flagged item report," we strongly encourage you to review and resolve as many of these flags as
possible before re-running and submitting your data. Doing this will help ensure your submitted
data are of the highest quality possible. In addition, we will run these and other QA checks once
we receive your data; so, by addressing flags before submitting your data, you will reduce the
number of questions that need to be resolved once your data are submitted.
2. Format for Non-SDWIS/State data:
Virtually any electronic file format is acceptable. It would be ideal for states to submit their data
sets in one of the following file formats: dBase™ (.dbf); Microsoft Access (.accdb); comma- or
tab-delimited files (such as .csv or .txt); or Microsoft Excel (.xls). However, you can submit the
requested data "as is," by simply sending the compliance monitoring and treatment technique
records in whatever structure or condition in which they are currently stored and submitting that
copy of the electronic data to EPA. If it is easier for you to provide your entire electronic data
set, EPA will extract the needed data. If you have further questions about this data submission,
you can contact SixYearData@cadmusgroup.com.
3. Documentation:
EPA requests that your submission also include, at a minimum, a brief description of the basic
format and structure of each data set, and definitions of all data elements, column/row headings,
codes, acronyms, etc., used in each data set. (Note: EPA does not need this information if you are
using SDWIS/State, because EPA already has it.) This "data dictionary" information will
reduce the amount of time needed for questions and clarification later. EPA's primary goal is to
obtain the most complete national occurrence and treatment technique data possible, and the
Agency will work with the states to reconcile data questions where needed. If your data set is
incomplete, or there are known anomalies, such as those that may have been identified by the
SDWIS/State extract tool, it would be helpful if an explanation of these issues were included
with your transmittal.
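The automated QA checks described above for the SDWIS/State Extraction Tool (duplicate data and results more than 10 times the MCL) can be illustrated with a short Python sketch. It is not the tool's actual query logic, which this attachment does not describe: the record layout, the function name `flagged_item_report`, and the three-contaminant MCL table are hypothetical, and all values are assumed to be in mg/L.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical record layout: (PWSID, sample ID, contaminant, result in mg/L).
Record = Tuple[str, str, str, float]

# Illustrative subset of MCLs in mg/L; a real check would cover every
# contaminant in Exhibit 1 and convert units before comparing.
MCL_MG_PER_L: Dict[str, float] = {
    "Arsenic": 0.010,
    "Benzene": 0.005,
    "Nitrate (as N)": 10.0,
}


def flagged_item_report(records: List[Record], factor: float = 10.0) -> List[str]:
    """List duplicate records and results above `factor` times the MCL."""
    flags: List[str] = []
    counts = Counter(records)  # identical tuples are counted together
    for record, n in counts.items():
        if n > 1:
            flags.append(f"duplicate x{n}: {record}")
    for record in counts:  # check each distinct record once
        contaminant, value = record[2], record[3]
        mcl = MCL_MG_PER_L.get(contaminant)
        if mcl is not None and value > factor * mcl:
            flags.append(f">{factor:g}x MCL: {record}")
    return flags


records = [
    ("MD0123456", "S1", "Arsenic", 0.002),
    ("MD0123456", "S1", "Arsenic", 0.002),  # exact duplicate row
    ("MD0123456", "S2", "Arsenic", 2.0),     # possibly ug/L entered as mg/L
]
for line in flagged_item_report(records):
    print(line)
```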
D. How do I send my data to EPA?
Regardless of whether data are stored in SDWIS/State, states can submit data using the same
process your state currently uses to submit your SDWIS data. (Note some states using
SDWIS/State may store some of the requested data outside of SDWIS/State and they should also
follow these instructions.) Zip your files extracted from SDWIS/State or from some other
location and name them SIXYEAR_REVIEW_XX.ZIP where XX is the Primacy Agency
identifier. For example, Maryland would submit a file named SIXYEAR_REVIEW_MD.ZIP. The extraction
tool zips the files it extracts from SDWIS/State and saves them together under this naming
convention. For more information on how to submit the data, please see the instructions file
accompanying the extraction tool.
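For files extracted from a system other than SDWIS/State, the same naming convention can be reproduced by hand. A minimal Python sketch using only the standard library; the function name `package_submission` is hypothetical, it assumes the Primacy Agency identifier is two characters (as in the Maryland example), and it does not replace the instructions file that accompanies the extraction tool.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Iterable


def package_submission(agency_id: str, files: Iterable[Path], out_dir: Path) -> Path:
    """Zip files into SIXYEAR_REVIEW_XX.ZIP, where XX is the agency identifier."""
    code = agency_id.strip().upper()
    if len(code) != 2:  # assumption: identifiers are 2 characters, e.g., "MD"
        raise ValueError(f"expected a 2-character identifier, got {agency_id!r}")
    archive = out_dir / f"SIXYEAR_REVIEW_{code}.ZIP"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            zf.write(path, arcname=path.name)  # store without directory prefix
    return archive


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "analytical_results.csv"
        data.write_text("PWSID,CONTAMINANT,VALUE\nMD0123456,Arsenic,0.002\n")
        print(package_submission("md", [data], Path(tmp)).name)  # SIXYEAR_REVIEW_MD.ZIP
```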
E. When do these data need to be submitted?
To help EPA meet its Six-Year Review 4 statutory timeframe and to allow ample time for data
compilation, analysis and documentation of results, EPA requests that the data be submitted by
September 30, 2020.
II. Background Information Regarding EPA's Occurrence Data Request
A. Why is EPA requesting this data?
The 1996 Safe Drinking Water Act (SDWA) Amendments require EPA to review and revise, if
appropriate, existing National Primary Drinking Water Regulations (NPDWRs) at least every six
years (i.e., the Six-Year Review). EPA is requesting monitoring and treatment technique data for
NPDWRs to support the fourth Six-Year Review. Without an understanding of where and at
what levels regulated drinking water contaminants are occurring in public drinking water, EPA
cannot assess any potential need to revise the regulations.
In addition, the 1996 SDWA Amendments require the Agency to maintain a national drinking
water contaminant occurrence database (i.e., the National Contaminant Occurrence Database or
NCOD) using occurrence data for both regulated and unregulated contaminants. Through this
data collection, EPA will be fulfilling various requirements set forth by Congress in the 1996
SDWA Amendments.
B. How will these data be used?
EPA's OGWDW will use the data to estimate the occurrence of regulated contaminants in public
drinking water systems and to evaluate the number of people exposed and exposure reductions.
Combined with results of other technical analyses (such as assessments of contaminant health
effects), the results of the occurrence and exposure analyses will be used to help determine
whether potential revisions to the current drinking water regulations are likely to maintain or
provide for greater protection of public health for people served by public water systems. These
data will help EPA to make well-informed regulatory decisions.
Once the Agency publishes the review results for the Six-Year Review 4, these data will be made
publicly available. The procedures used to analyze these data will reflect those established and
refined in prior Six-Year Reviews. Copies of EPA's Six-Year Review occurrence findings and
methodology reports can be obtained at:
http://water.epa.gov/lawsregs/rulesregs/regulatingcontaminants/sixyearreview/index.cfm. These
documents contain the first, second, and third Six-Year Review occurrence findings and provide
direct examples of the types of occurrence analyses that will be conducted using the compliance
monitoring data you submit.
C. Why is it important to submit these data?
Regulatory decisions and the public health protection resulting from these decisions are
improved by both the quality and quantity of the data. Each state that submits data can be
directly represented in any national occurrence estimates we develop. The Six-Year Review 4
data will be used in the review of existing regulations to determine whether current NPDWRs
remain appropriate or if revisions should be considered. All data will undergo a comprehensive
quality assurance/quality control (QA/QC) process required for the Six-Year Review 4
occurrence analyses. Copies of the resulting final, QA/QC-reviewed contaminant data sets will
be posted on the EPA Six-Year Review website.
D. What will happen once the data are submitted?
EPA will conduct uniform QA/QC assessments on each data set. Contaminant-specific analytical
values will be assessed as part of the QA/QC review. For example, assessment of all analytical
values for a specific contaminant will help identify possible unit errors or the presence of
outliers. The data will also be checked for duplicate data entries (as defined by multiple rows of
identical data elements) with duplicates excluded from the analysis, as needed. Identified errors
that do not have straightforward solutions will be addressed through consultations with the
appropriate data management staff.
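One way to picture the contaminant-specific screen for unit errors is to compare each detected value with the median for that contaminant: a value roughly 1,000 times off may indicate that µg/L and mg/L were confused. This Python sketch is illustrative only; the 1,000-fold ratio and the function name `flag_possible_unit_errors` are assumptions, not EPA's documented criteria.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple


def flag_possible_unit_errors(
    results: List[Tuple[str, float]], ratio: float = 1000.0
) -> List[Tuple[str, float]]:
    """Flag values at least `ratio` times above or below their contaminant's median."""
    by_contaminant: Dict[str, List[float]] = defaultdict(list)
    for contaminant, value in results:
        if value > 0:  # ignore zero or negative values in this simple screen
            by_contaminant[contaminant].append(value)
    flagged: List[Tuple[str, float]] = []
    for contaminant, values in by_contaminant.items():
        center = median(values)
        for value in values:
            if value >= ratio * center or value <= center / ratio:
                flagged.append((contaminant, value))
    return flagged


arsenic = [("Arsenic", v) for v in (0.003, 0.004, 0.004, 0.005, 5.0)]
print(flag_possible_unit_errors(arsenic))  # -> [('Arsenic', 5.0)]
```

A flagged value would still need confirmation with the state, consistent with the consultation step above; a genuinely high result should not be discarded on this basis alone.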
Based on EPA's experience with monitoring information provided by states for the prior Six-
Year Reviews, the Agency will likely need to contact some states to address questions regarding
the data format and content (e.g., outlier values, or missing or undefined data elements). EPA
will document the QA/QC process and all edits or changes made to the submitted monitoring
data.
After the data have undergone QA/QC editing and formatting, the datasets will be aggregated
into national contaminant occurrence datasets for each contaminant. The national aggregate
datasets will be used to generate statistical estimations of national occurrence. When the analyses
are completed and reported, the data will be placed in the NCOD and in the docket to support
any Six-Year Review 4 decisions.
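The aggregation step can be pictured as merging each state's QA/QC-reviewed results into one dataset per contaminant. The Python sketch below shows only that regrouping plus a simple count of systems sampled and systems with at least one detection; it is not the statistical estimation method EPA uses, and the `Result` fields and function name are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Result:
    """One QA/QC-reviewed sample result (hypothetical minimal fields)."""
    state: str  # kept for traceability; not used in the counts
    pwsid: str
    contaminant: str
    detected: bool


def aggregate_by_contaminant(
    submissions: Iterable[List[Result]],
) -> Dict[str, Dict[str, int]]:
    """Merge state submissions and count distinct systems per contaminant."""
    sampled: Dict[str, Set[str]] = defaultdict(set)
    detected: Dict[str, Set[str]] = defaultdict(set)
    for state_results in submissions:
        for r in state_results:
            sampled[r.contaminant].add(r.pwsid)
            if r.detected:
                detected[r.contaminant].add(r.pwsid)
    return {
        contaminant: {
            "systems_sampled": len(systems),
            "systems_with_detection": len(detected[contaminant]),
        }
        for contaminant, systems in sampled.items()
    }


md = [
    Result("MD", "MD0000001", "Arsenic", True),
    Result("MD", "MD0000001", "Arsenic", False),
    Result("MD", "MD0000002", "Arsenic", False),
]
va = [
    Result("VA", "VA0000001", "Arsenic", True),
    Result("VA", "VA0000001", "Benzene", False),
]
print(aggregate_by_contaminant([md, va]))
```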
Treatment information will also be compiled and assessed to support the Six-Year Review 4
decisions. However, the format of this information may not lend itself to analogous quantitative
analysis and national summaries. Assessment of this information will be conducted and may be
summarized in a more qualitative manner. Water system facility characteristics, filtration type,
treatment technique information, and filter backwash information may be used to further inform
the results of the occurrence data assessment.
Appendix B: Crosswalk of Data Elements Requested for SYR 4 ICR
and the SDWIS Data Element Names
Exhibit B.1 provides a crosswalk of the data elements requested in the SYR 4 ICR letter to the
States compared with the actual data elements as they appear in the SDWIS/State databases. These
were the data elements extracted via the SDWIS/State Extraction Tool.
Exhibit B.1: Crosswalk Table of Data Elements in SYR 4 ICR Request and SDWIS
Data Category
SDWIS Mapping ([Table Name],[Data Element])
System-Specific Information
Public Water System Identification Number (PWSID)
TINWSYS.NUMBERO
System Name
TINWSYS.NAME
Federal Public Water System Type Code
TINWSYS.D_PWS_FED_TYPE_CD
Population Served
TINWSYS.D_POPULATION_CNT
Federal Source Water Type
TINWSYS.D_FED_PRIM_SRC_CD
Treatment Information
Water System Facility
T6YWSF; [TINWSF_IS_NUMBER] and [TINWSF_ST_CODE]
Filtration Type
TINWSYS.D_SWGUDI_INT_CD; TINTRPLT.FILTER_TYPE
Treatment Technique Information
TINTROBJ.NAME; TINTRPRO.NAME; TINTRPLT.DBM VIR INACT LOG?;
TINTRPLT.DBM VIR INACT DT?; TINTRPLT.DBM VIR INACT STAT?;
TINTRPLT.DBM VIR INACT PCT?; TSAOSAM.NAME;
TSOSAM.VALUE_NUMBER; TSOSAM.UOM_CODE
Filter Backwash Information
TINTRPLT.FBR SCHEMATIC STAT; TINTRPLT.FBR SCHEMA RCV DAT;
TINTRPLT.FBR SCHEMA RVW DAT; TINTRPLT.FBR ALTR RTN RQS;
TINTRPLT.FBR ALTR RTN DT; TINTRPLT.FBR CORCTV ACT RQS;
TINTRPLT.FBR CORCTV ACT DT
Sample-Specific Information
Sampling Point Identification Code
TSASMPPT.IDENTIFICATION_CD
Sample Identification Number
TSASAMPL.ST_ASGN_IDENT_NUM
Sample Collection Date
TSASAMPL.COLLECTION_END_DATE
Sample Type
TSASAMPL.TYPE_CODE
Sample Analysis Type Code
TSASAMPL.REPEAT_LOC_TYP_CD
Contaminant
TSAANLYT.CAS_REGISTRY_NUM (TSAANLYT.CODE)
Sample Analytical Result - Sign
TSASAR.LESS_THAN_IND (TSAANLYT.LESS_THAN_CODE)
Sample Analytical Result - Value
TSASAR.CONCENTRATION_MSR
Sample Analytical Result - Unit of Measure
TSASAR.UOM_CODE
Sample Analytical Method Number
TSASMN.CODE
Minimum Reporting Level (MRL) - Value
TMNALRA.MEASURE (TSASAR.DETECTN_LIMIT_NUM,
TSASAR.DETECTN_LIM_UOM_CD)
MRL - Unit of Measure
TMNALRA.UOM_CODE (TSASAR.UOM_CODE)
Source Water Monitoring Information
TMNFANL.*
(TMNMPAVG.PRC_ACH_RMVL_RA_NO; TMNMPAVG.PRC_ACH_RMVL_RA_TX)
Sample Summary Reports
TSASMPSM.* (TSAMDBPS.)
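A crosswalk like Exhibit B.1 can be held as a simple lookup and used to assemble per-table column lists for extraction. The Python sketch below copies a few sample-specific rows from the exhibit; the function names are hypothetical, and it deliberately omits the joins between tables, which the exhibit does not specify.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A few sample-specific rows from Exhibit B.1: data category -> (table, column).
CROSSWALK: Dict[str, Tuple[str, str]] = {
    "Sample Identification Number": ("TSASAMPL", "ST_ASGN_IDENT_NUM"),
    "Sample Collection Date": ("TSASAMPL", "COLLECTION_END_DATE"),
    "Sample Type": ("TSASAMPL", "TYPE_CODE"),
    "Sample Analytical Result - Value": ("TSASAR", "CONCENTRATION_MSR"),
    "Sample Analytical Result - Unit of Measure": ("TSASAR", "UOM_CODE"),
}


def columns_by_table(crosswalk: Dict[str, Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group the mapped columns under their SDWIS table, keeping exhibit order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for table, column in crosswalk.values():
        grouped[table].append(column)
    return dict(grouped)


def select_statements(crosswalk: Dict[str, Tuple[str, str]]) -> List[str]:
    """Build one single-table SELECT per table (joins intentionally omitted)."""
    return [
        f"SELECT {', '.join(columns)} FROM {table}"
        for table, columns in columns_by_table(crosswalk).items()
    ]


for statement in select_statements(CROSSWALK):
    print(statement)
```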
Appendix C: Data Dictionary for the SYR 4 ICR Database
This appendix contains 19 exhibits describing the tables in the SYR 4 relational database,
their data elements, and all permitted values in those tables. The data dictionary for
ADWR compliance data is in Appendix E, Section 6.
Exhibit C.1: Description of T6YWS (water system table)
Field Name
Data
Type
Description
T6YWS_ID
Number
Unique identifier for each water system record.
TINWSYS_IS_NUMBER
Number
Identifier for each water system that is unique when combined with
TINWSYS_ST_CODE.
TINWSYS_ST_CODE
Text
Two-digit code that identifies the State that submitted data for the system.
NUMBERO
Text
Public water system identification number (PWSID)
WS_NAME
Text
Water system name
D_POPULATION_COUNT
Number
Retail population served by the water system.
D_FED_PRIM_SRC_CD
Text
Updated primary water source for the water system. (Updated for systems
that were listed as purchased but are not truly 100% purchased.)
GU = Ground water Under Direct Influence of Surface Water
GUP = Purchased Ground Water Under Direct Influence of Surface Water
GW = Ground Water
GWP = Purchased Ground Water
SW = Surface Water
SWP = Purchased Surface Water
D_PWS_FED_TYPE_CD
Text
Water system type according to federal requirements.
C = Community water system
NC = Non-community water system
NTNC = Non-transient non-community water system
NP = Non-public water system (This field has been corrected as a part
of the QA/QC process)
WS_ACTIVITY_STATUS_CD
Text
Activity status of the water system.
A = Active (i.e., water system that is producing water on a regular basis
(obtaining, treating, pumping, storing, or distributing)); I = Inactive
WS_ACTIVITY_DATE
Date
For SDWIS States, the ACTIVITY_DATE is the date of the
ACTIVITY_STATUS_CD. For non-SDWIS States, it is the date that the
water system was deactivated (if applicable).
STATE_CODE
Text
Two-letter code that identifies the U.S. state in which the system is located.
This differs from TINWSYS_ST_CODE for tribal systems.
WHOLESALE_POPULATION
Number
Wholesale population served (for seller systems only)
TOTAL_POPULATION
Number
Total retail plus wholesale population served (for seller systems only)
ADJUSTED_TOTAL_POPULATION
Number
Adjusted total population served (retail plus adjusted wholesale population
served, so as not to double-count buyer systems that purchase from multiple
seller systems). For non-seller systems, this value is equal to
D_POPULATION_COUNT.
ORIGINAL_D_FED_PRIM_SRC_CD
Text
Original primary water source for the water system.
GU = Ground water Under Direct Influence of Surface Water
GUP = Purchased Ground Water Under Direct Influence of Surface Water
GW = Ground Water
GWP = Purchased Ground Water
SW = Surface Water
SWP = Purchased Surface Water
Exhibit C.2: Description of T6YWSF (water system facility table)
Field Name
Data Type
Description
T6YWSF_ID
Number
Unique identifier for each water system facility record.
T6YWS_ID
Number
Identifier matching each record to T6YWS
TINWSF_IS_NUMBER
Number
Identifier for each water system facility that is unique when combined with
TINWSF_ST_CODE.
TINWSF_ST_CODE
Text
Two-digit code that identifies the State that submitted data for the facility.
TINWSYS_IS_NUMBER
Number
Identifier for each water system that is unique when combined with
TINWSYS_ST_CODE.
TINWSYS_ST_CODE
Text
Two-digit code that identifies the State that submitted data for the system.
WSF_ACTIVITY_STATUS_CD
Text
Activity status of the water system facility. A = Active; I = Inactive
WSF_ACTIVITY_DATE
Date/Time
For SDWIS States, the ACTIVITY_DATE is the date of the
ACTIVITY_STATUS_CD. For non-SDWIS States, it is the date that the water
system facility was deactivated (if applicable).
ST_ASGN_IDENT_CD
Text
A State-assigned value which identifies the water system facility.
WSF_NAME
Text
Name of the water system facility.
WSF_TYPE_CODE
Text
Type of the water system facility (permitted values).
CC = Consecutive Connection; CH = Common Headers; CS = Cistern; CW =
Clear Well; DS = Distribution System/Zone; IG = Infiltration Gallery; IN = Intake;
NN = Non-piped, non-purchased; NP = Non-piped; OT = Other; PC = Pressure
Control; PF = Pump Facility; RC = Roof Catchment; RS = Reservoir; SI =
Surface Impoundment; SP = Spring; SS = Sampling Station; ST = Storage; TM
= Transmission Main (Manifold); TP = Treatment Plant; WH = Well Head; WL =
Well
FILTRATION_STATUS
Text
Indicates whether a non-emergency surface water source or a non-emergency
ground water under the influence of surface water source is required to install
filtration by a certain date or is successfully avoiding filtration.
FILTRATION_STAT_DT
Date/Time
Date the Filtration Status was determined.
Exhibit C.3: Description of T6YSPT (sample point table)
Field Name
Data Type
Description
T6YSPT_ID
Number
Unique identifier for each sample point record.
T6YWSF_ID
Number
Identifier that relates each record to the unique record in the T6YWSF table.
T6YWS_ID
Number
Identifier that relates each record to the unique record in the T6YWS table.
TINWSF_IS_NUMBER
Number
Identifier for each water system facility that is unique when combined with
TINWSF_ST_CODE.
TINWSF_ST_CODE
Text
Two-digit code that identifies the State that submitted data for the facility.
TSASMPPT_IS_NUMBER
Number
Identifier for each sample point that is unique when combined with
TSASMPPT_ST_CODE.
TSASMPPT_ST_CODE
Text
Two-digit code that identifies the State that submitted data for the sample point.
TSASMPPT_TYPE_CODE
Text
Location type of a sampling point (permitted values).
DN = Within 5 service connections Downstream; DS = Distribution System; EP =
Entry point; NF = Near the first service connection; OR = Original location; SR =
Source sampling point; UP = Within 5 service connections Upstream
SOURCE_TYPE_CODE
Text
The type of water source, based on whether treatment has taken place.
FN = Finished, treated; RW = Raw, untreated; x = unknown
IDENTIFICATION_CD
Text
Unique code for identifying a water system facility's sample point. This value must be
unique within the Water System Facility.
DESCRIPTION_TEXT
Text
Description of the sample point location.
LD_CP_TIER_LEV_TXT
Text
Indicates if the sample point is a Lead and Copper
Tier 1, 2, or 3 site.
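The `_ID` fields in Exhibits C.1 through C.3 link the tables: each facility record points to its water system through T6YWS_ID, and each sample point points to both its facility and its system. The Python sketch below recreates a few of these columns in an in-memory SQLite database to show those relationships and the rule that TINWSYS_IS_NUMBER is unique only in combination with TINWSYS_ST_CODE. The column subset, SQLite types, and sample rows are illustrative assumptions, not the actual database schema.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE T6YWS (
    T6YWS_ID INTEGER PRIMARY KEY,
    TINWSYS_IS_NUMBER INTEGER NOT NULL,
    TINWSYS_ST_CODE TEXT NOT NULL,
    WS_NAME TEXT,
    UNIQUE (TINWSYS_IS_NUMBER, TINWSYS_ST_CODE)  -- unique only as a pair
);
CREATE TABLE T6YWSF (
    T6YWSF_ID INTEGER PRIMARY KEY,
    T6YWS_ID INTEGER NOT NULL REFERENCES T6YWS (T6YWS_ID),
    WSF_NAME TEXT
);
CREATE TABLE T6YSPT (
    T6YSPT_ID INTEGER PRIMARY KEY,
    T6YWSF_ID INTEGER NOT NULL REFERENCES T6YWSF (T6YWSF_ID),
    T6YWS_ID INTEGER NOT NULL REFERENCES T6YWS (T6YWS_ID),
    DESCRIPTION_TEXT TEXT
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the three tables and load one hypothetical system."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO T6YWS VALUES (1, 101, 'MD', 'Example Town')")
    conn.execute("INSERT INTO T6YWSF VALUES (10, 1, 'Plant A')")
    conn.execute("INSERT INTO T6YSPT VALUES (100, 10, 1, 'Entry point')")
    return conn


def sample_points_with_system(conn: sqlite3.Connection) -> List[Tuple[str, str, str]]:
    """Join each sample point to its facility and water system."""
    query = """
        SELECT w.WS_NAME, f.WSF_NAME, p.DESCRIPTION_TEXT
        FROM T6YSPT AS p
        JOIN T6YWSF AS f ON p.T6YWSF_ID = f.T6YWSF_ID
        JOIN T6YWS AS w ON f.T6YWS_ID = w.T6YWS_ID
    """
    return conn.execute(query).fetchall()


conn = build_demo_db()
print(sample_points_with_system(conn))  # -> [('Example Town', 'Plant A', 'Entry point')]
```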
Exhibit C.4: Description of T6YANALYTE (analyte table)
Field Name
Data Type
Description
T6YANALYTE_ID
Number
Unique identifier for each analyte record.
TSAANLYT_IS_NUMBER
Number
Identifier for each analyte that is unique when combined with TSAANLYT_ST_CODE.
TSAANLYT_ST_CODE
Text
This value is "HQ" for all SDWIS/Fed contaminants. If the value is not "HQ,"
the analyte code is specific to the primacy agency.
ANALYTE_CODE
Text
4-digit EPA Analyte code
ANALYTE_NAME
Text
Analyte name
ALTERNATE_NAME
Text
Synonym for analyte name
FIRSTIMPORTSTATE
Text
First State from which the analyte was added (if a non-requested contaminant
from a non-SDWIS State).
Exhibit C.5: Description of T6YSAR (sample analytical result table)
Field Name
Data Type
Description
T6YSAR_ID
Number
Unique identifier for each sample analytical result record.
T6YWS_ID
Number
Identifier that relates each record to the unique record in the T6YWS table.
T6YWSF_ID
Number
Identifier that relates each record to the unique record in the T6YWSF table.
T6YSPT_ID
Number
Identifier that relates each record to the unique record in the T6YSPT table.
T6YANALYTE_ID
Number
Identifier that relates each record to the unique record in the T6YANALYTE table.
TSASAR_IS_NUMBER
Number
Identifier for each sample analytical result that is unique when combined with
TSASAR_ST_CODE.
TSASAR_ST_CODE
Text
Two-digit code that identifies the State that submitted data.
TSASAMPL_IS_NUMBER
Number
Identifier for each sample that must be combined with TSASAMPL_ST_CODE when
used. These values may not be unique.
TSASAMPL_ST_CODE
Text
Two-digit code that identifies the State that submitted data.
TSASMN_IS_NUMBER
Number
Identifier for each standard method number that must be combined with
TSASMN_ST_CODE when used. These values may not be unique.
TSASMN_ST_CODE
Text
Two-digit code that identifies the State that submitted data.
TSASAMPLOIS_NUMBER
Number
Identifier for each sample that must be combined with TSASAMPLOST_CODE when
used. These values may not be unique. This relates a confirmation or repeat sample to
the originating routine sample.
TSASAMPLOST_CODE
Text
Two-digit code that identifies the State that submitted data.
LAB_ASGND_ID_NUM
Text
An identifier used for reconciliation with the State data system or sample identification
number assigned by the laboratory.
COLLLECTION_END_DT
Date/Time
Sample Collection Date.
COMPL_PURP_IND_CD
Text
Indicates whether or not the sample result is used for
compliance determination.
Y = "yes" (use for compliance determination)
N = "no" (taken for reasons other than compliance determination such as lab
performance, etc.)
TSASAMPL_TYPE_CODE
Text
Sample Type Code (permitted values):
BB = Batch Blank; CN = Continuous; CO = Confirmation; DU = Duplicate; FB = Field
Blank; GR = Grab; MR = Maximum Residence Time; MS = Matrix spike; PE =
Performance Evaluation; RI = Replacement for Invalid; RL = Replacement; RP =
Repeat; RT = Routine; SB = Shipping Blank; SL (or ST) = Split; SP = Special; TE =
Technical Evaluation; TG = Triggered
REPEAT_LOC_TYP_CD
Text
The location of the repeat/check/confirmation sample with respect to the location of the
original routine sample.
LESS_THAN_IND
Text
Indication of whether the result is "less than" the Lab Reporting Limit or "less
than" the Regulatory Minimum Reporting Limit.
"Y" = "yes" result is less than (i.e., a non-detection)
"N" = "no" result is not less than (i.e., a detection)
LESS_THAN_CODE
Text
When valued, indicates that the analytical result (concentration) was below the
Regulatory Minimum Reporting Level or below the Laboratory Reporting Level.
DL = Detection Limit;
MDL = The lab reported the analytical result was less than the Method Detection
Limit;
MRL = The lab reported the analytical result was less than the Minimum Reporting
Level.
DETECTN_LIMIT_NUM
Number
Limit established by the laboratory below which scientifically reliable results cannot be
achieved.
DETECTN_LIM_UOM_CD
Text
Unit of measure associated with the detection limit.
REPORTED_MSR
Text
Value (in text form) that represents the result obtained from a sample analysis. This
field maintains the level of precision of the result (i.e., maintains the correct number of
trailing zeroes in the analysis result).
CONCENTRATION_MSR
Number
A numeric value that represents the result obtained from a sample analysis.
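REPORTED_MSR is kept as text because converting a result to a number drops trailing zeros, which carry the reported precision. The following is a minimal sketch, using Python's standard decimal module, of recovering that precision from the text value. The function name and the example value are hypothetical.

```python
from decimal import Decimal


def precision_of(reported_msr: str) -> int:
    """Digits after the decimal point in a REPORTED_MSR string, trailing zeros included."""
    exponent = Decimal(reported_msr).as_tuple().exponent
    return max(0, -exponent)


reported = "0.0100"               # hypothetical REPORTED_MSR (text)
concentration = float(reported)   # what a numeric CONCENTRATION_MSR would hold
print(concentration)              # trailing zeros are lost in the numeric form
print(precision_of(reported))     # precision is still recoverable from the text form
```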
UOM_CODE
Text
Unit of measure.
PRESENCE_IND_CODE
Text
Indicates whether results of an analysis were positive (P-Presence) or negative (A-
Absence). Indication of presence or absence creates an analytical result for a
microbial analyte.
COUNT_QTY
Number
The number of organisms counted or estimated in a microbiological sample. Usually
expressed as "# of colonies per 100 milliliter sample."
COUNT_TYPE
Text
Type of microbiological unit that is being counted per specified count unit. Count type
varies with the microbiological organism where count has been recorded.
COUNT_UOM_CODE
Text
The units of measure associated with the microbial analytical result count.
FF_CHLOR_RES_MSR
Number
Amount of free chlorine residual disinfectant found in the water after disinfection has
been applied.
FLDTOT_CHL_RES_MSR
Number
Amount of total chlorine residual disinfectant found in the water after disinfection has
been applied.
FIELD_TEMP_MSR
Number
Temperature of the water being sampled at the time and place of sample collection.
TEMP_MEAS_TYPE_CD
Text
Enables selection of "C" for centigrade or "F" for Fahrenheit degrees.
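Because FIELD_TEMP_MSR may be reported in either unit, comparing temperatures across records requires normalizing on TEMP_MEAS_TYPE_CD first. The following is a minimal sketch, assuming the code is "C" or "F" (case-insensitive) and that a missing temperature should stay missing. The function name is hypothetical.

```python
from typing import Optional


def field_temp_celsius(field_temp_msr: Optional[float],
                       temp_meas_type_cd: Optional[str]) -> Optional[float]:
    """Return FIELD_TEMP_MSR in degrees Celsius based on TEMP_MEAS_TYPE_CD."""
    if field_temp_msr is None:
        return None
    code = (temp_meas_type_cd or "").strip().upper()
    if code == "C":
        return field_temp_msr
    if code == "F":
        # Standard Fahrenheit-to-Celsius conversion.
        return (field_temp_msr - 32.0) * 5.0 / 9.0
    raise ValueError(f"unrecognized TEMP_MEAS_TYPE_CD: {temp_meas_type_cd!r}")
```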
FIELD_TURBID_MSR
Number
Turbidity of the water being sampled at the time and place of sample collection in
Nephelometric Turbidity Units (NTU).
FIELD_PH_MEASURE
Number
pH of the water being sampled at the time and place of sample collection (pH units).
FIELD_FLOW_RATE
Number
Flow of the water being sampled at the time and place of sample collection.
METHOD_CODE
Text
Method used to analyze the sample.
METHOD_NAME
Text
Name of method used to analyze the sample.
DETECT
Number
DETECT = 1 for all detections. Detections were identified as records with
[CONCENTRATION_MSR] > 0 and [LESS_THAN_IND] either not equal to "Y" or null.
DETECT = 0 for all non-detections. Non-detections were identified as records with
[CONCENTRATION_MSR] = 0 and/or [LESS_THAN_IND] = "Y".
VALUE
Number
For all non-detections (i.e., [DETECT] = 0), [VALUE] was left blank.
For all detections (i.e., [DETECT] = 1), [VALUE] = [CONCENTRATION_MSR].
UNITS
Text
Unit of measure associated with [VALUE]
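The DETECT and VALUE rules above can be expressed as two small functions. The following is a minimal sketch, assuming one record's CONCENTRATION_MSR and LESS_THAN_IND are passed in directly. Neither rule says what to do when CONCENTRATION_MSR is null (or negative) and LESS_THAN_IND is not "Y", so this sketch returns None for those cases rather than guessing. The function names are hypothetical.

```python
from typing import Optional


def derive_detect(concentration_msr: Optional[float],
                  less_than_ind: Optional[str]) -> Optional[int]:
    """DETECT per the stated rules; None when neither rule covers the record."""
    if less_than_ind == "Y":
        return 0                    # "less than" result: non-detection
    if concentration_msr is None:
        return None                 # not covered by either rule
    if concentration_msr == 0:
        return 0                    # zero concentration: non-detection
    if concentration_msr > 0:
        return 1                    # > 0 and LESS_THAN_IND is not "Y" or is null
    return None                     # negative values are not covered


def derive_value(concentration_msr: Optional[float],
                 detect: Optional[int]) -> Optional[float]:
    """VALUE = CONCENTRATION_MSR for detections; left blank (None) otherwise."""
    return concentration_msr if detect == 1 else None
```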
TSASMPPT_IS_NUMBER
Number
Identifier for each sample point that is unique when combined with
TSASMPPT_ST_CODE.
TSASMPPT_ST_CODE
Text
Two-digit code that identifies the State that submitted data for the sample point.
ASSAY_UOM_CODE
Text
Unit of measure for microbiological analytical result
Exhibit C.6: Description of T6YDBPSUM (DBP summary table)
Field Name
Data Type
Description
T6YDBPSUM_ID
Number
Unique identifier for each DBP summary record.
T6YWS_ID
Number
Identifier that relates each record to the unique record in the T6YWS table.
T6YSPT_ID
Number
Identifier that relates each record to the unique record in the T6YSPT table.
T6YFANL_ID
Number
Identifier that relates each record to the unique record in the T6YFANL table.
TSAMDBPS_IS_NUMBER
Number
Identifier for each MDBP summary that must be combined with TSAMDBPS_ST_CODE
when used.
TSAMDBPS_ST_CODE
Text
Two-digit code that identifies the State that submitted the MDBP summary.
SOURCE_TYPE_CODE
Text
The type of water source, based on whether treatment has taken place.
IDENTIFICATION_CD
Text
The unique code for identifying a water system facility sample point. This value must be
unique within the Water System Facility.
DESCRIPTION_TEXT
Text
A description of the monitoring requirement.
LD_CP_TIER_LEV_TXT
Text
"Tiers" for sampling sites by water systems, established by the lead and copper rules:
Tier 1: Single family residences that contain copper pipe and lead solder installed
after 1982 and/or served by a lead service line
Tier 2: Same as above but multi-family buildings
Tier 3: Single family residence with copper pipe and lead solder installed before 1983
TYPE_CODE_CV
Text
Type of Microbial Disinfection Byproduct Summary.
REPORTED_DATE
Date/Time
Date that the MDBP Summary is reported to regulating agency.
SAMPLES_REQUIRED
Number
Number of samples required for specified analyte and water system facility.
SAMPLES_COLLECTED
Number
Number of samples collected for specified analyte and water system facility.
MR_COMPLIANCE_IND
Text
Indicates status of M&R compliance for specified analyte and water system facility.
LVL_COMPLIANCE_IND
Text
Indicates status of level compliance for the specified analyte and water system facility.
SMPLS_BYND_MEA_LVL
Number
The total number of outlier samples (i.e., samples that exceed the Max, Min, or 95P
Measure Level), stored as a number.
PRCNT_BYND_MEA_LVL
Number
The percentage of outlier samples (i.e., samples that exceed the Max, Min, or 95P
Measure Level), stored as a number.
PRCNT_BYND_MEA_TXT
Text
The percentage of outlier samples (i.e., samples that exceed the Max, Min, or 95P
Measure Level), stored as text.
HIGHEST_MSR
Number
The highest measure during the specified monitoring period.
HIGHEST_MSR_TXT
Text
The highest measure during the specified monitoring period stored as text to preserve
the trailing zeros (which indicate the precision of the measure).
CP_PRD_BEGIN_DT
Date/Time
Compliance Period Begin Date
CP_PRD_END_DT
Date/Time
Compliance Period End Date
TINWSYS_IS_NUMBER
Number
Identifier for each water system that is unique when combined with
TINWSYS_ST_CODE.
TINWSYS_ST_CODE
Text
Two-digit code that identifies the State that submitted data for the system.
TINWSF_IS_NUMBER
Number
Identifier for each water system facility that is unique when combined with
TINWSF_ST_CODE.
TINWSF_ST_CODE
Text
Two-digit code that identifies the State that submitted data for the facility.
T6YWSF_ID
Number
Unique identifier for each water system facility record.
TSASMPPT_TYPE_CODE
Text
Location type of a sampling point.
TSASMPPT_IS_NUMBER
Number
Identifier for each sample point that is unique when combined with
TSASMPPT_ST_CODE.
TSASMPPT_ST_CODE
Text
Two-digit code that identifies the State that submitted data for the sample point.
Exhibit C.7: Description of T6YFANL (facility analyte levels table)
Field Name
Data Type
Description
T6YFANL_ID
Number
Unique identifier for each facility analyte level record.
T6YANALYTE_ID
Number
Identifier that relates each record to the unique record in the T6YANALYTE table.
TMNFANL_IS_NUMBER
Number
Identifier for each facility analyte level that must be combined with
TINWSYS_ST_CODE when used.
TINWSYS_IS_NUMBER
Number
Identifier for each water system that must be combined with TINWSYS_ST_CODE
when used.
TINWSYS_ST_CODE
Text
Two-digit code that identifies the State that submitted data for the system.
TINWSF_IS_NUMBER
Number
Identifier for each water system facility that must be combined with
TINWSF_ST_CODE when used.
TINWSF_ST_CODE
Text
Two-digit code that identifies the State that submitted data for the facility.
EFFECTIVE_BEG_DAT
Date/Time
The first date a facility analyte level was made effective.
EFFECTIVE_END_DAT
Date/Time
The last date a facility analyte level was effective.
REPORTED_MSR
Text
A numeric value that represents the result obtained from a single analysis, or the
average result obtained from multiple analyses.
FANL_UOM_CODE
Text
A code or abbreviation for a unit of measure.
NUM_DAYS_PER_MONTH
Number
The number of days per month during the annual operation period for which this water
system facility is normally in operation and/or must monitor for the analyte specified in
this FANL. A value of 31 signifies every day of the month.
SAMPLE_RQT_PER_DAY
Number
The number of samples that must be collected during a 24-hour period (midnight to
midnight) for the specified analyte at this water system facility. A value of 24
signifies continuous monitoring.
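NUM_DAYS_PER_MONTH and SAMPLE_RQT_PER_DAY both use a sentinel value (31 and 24, respectively) in addition to ordinary counts. The following is a minimal sketch, assuming both fields are populated integers, of decoding those sentinels so that downstream code does not treat them as literal counts. The function name and output keys are hypothetical.

```python
def interpret_fanl_schedule(num_days_per_month: int, sample_rqt_per_day: int) -> dict:
    """Decode the sentinel values used in NUM_DAYS_PER_MONTH and SAMPLE_RQT_PER_DAY."""
    continuous = sample_rqt_per_day == 24      # 24 signifies continuous monitoring
    return {
        "every_day": num_days_per_month == 31,  # 31 signifies every day of the month
        "days_per_month": num_days_per_month,
        "continuous": continuous,
        # A discrete per-day sample count is only meaningful when not continuous.
        "samples_per_day": None if continuous else sample_rqt_per_day,
    }
```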
IND_FILT_MNTRG_FLG
Text
Individual Filter Monitoring Required Flag (Yes or No).
SUM_TYPE_CODE_CV
Text
Type of Microbial Disinfection Byproduct Summary.
MDBP_SUM_CHK_FLG
Text
Indicates whether MDBP Summaries will be used in checking for compliance at the
Facility Analyte Level.
CONTROL_LVL_MSR
Number
The measure of facility analyte control level captured as a number.
FANL_ANALYTE_CODE
Text
4-digit EPA Analyte code
FANL_ANALYTE_NAME
Text
Analyte name
T6YWS_ID
Number
Identifier that relates each record to the unique record in the T6YWS table.
T6YWSF_ID
Number
Unique identifier for each water system facility record in the T6YWSF table.
Exhibit C.8: Description of T6YSAMPSUM (sample summaries table)
Field Name
Data Type
Description
T6YSAMPSUM_ID
Number
Unique identifier for each sample summary record.
T6YANALYTE_ID
Number
Identifier that relates each record to the unique record in the T6YANALYTE table.
TSASSR_IS_NUMBER
Number
Identifier for each sample summary result that must be combined with
TSASSR_ST_CODE when used.
TSASSR_ST_CODE
Text
Two-digit code that identifies the State that submitted the sample summary result.
TSASMPSM_IS_NUMBER
Number
Identifier for each sample summary that must be combined with
TSASMPSM_ST_CODE when used.
TSASMPSM_ST_CODE
Text
Two-digit code that identifies the State that submitted the sample summary result.
TINWSYS_IS_NUMBER
Number
Identifier for each water system that must be combined with TINWSYS_ST_CODE
when used.
TINWSYS_ST_CODE
Text
Two-digit code that identifies the State that submitted data for the system.
TINWSF_IS_NUMBER
Number
Identifier for each water system facility that must be combined with
TINWSF_ST_CODE when used.
TINWSF_ST_CODE
Text
Two-digit code that identifies the State that submitted data for the facility.
COLLECTION_STRT_DT
Date/Time
The earliest date the samples represented in the sample summary were collected.
COLLECTION_END_DT
Date/Time
The latest date the samples represented in the sample summary were collected.
COMPL_PURP_IND_CD
Text
Indicates whether or not the sample summary was used for compliance determination.
SAMP_SUM_TYPE_CODE
Text
Analyte Codes CU90 and PB90:
90 - 90th percentile value (lead and copper only)
95 - 95th Percentile value (lead and copper only)
AL - Number of samples greater than the action level (lead and copper only)
Analyte Code 3100:
RT - routine samples with negative results from the distribution system.
COUNT_QTY
Number
Number of analytical results represented in the sample summary record
SAMP_SUM_MEASURE
Number
The calculated value of the results represented in the sample summary
defined by the sample summary's TYPE_CODE.
SAMP_SUM_UOM_CODE
Text
The unit of measure (UOM) that is associated with the value reported for the sample
summary measure.
TSAANLYT_IS_NUMBER
Number
Identifier for each analyte that is unique when combined with TSAANLYT_ST_CODE.
TSAANLYT_ST_CODE
Text
This value is "HQ" for all SDWIS/Fed contaminants. If the value is not "HQ," the analyte
code is specific to the primacy agency.
ANALYTE_CODE
Text
4-digit EPA Analyte code
ANALYTE_NAME
Text
Analyte name
T6YWS_ID
Number
Identifier that relates each record to the unique record in the T6YWS table.
T6YWSF_ID
Number
Identifier that relates each record to the unique record in the T6YWSF table.
Exhibit C.9: Description of T6YCMCLV (Compliance monitoring and compliance
level violations table)
Field Name
Data Type
Description
T6YANALYTE_ID
Number
Identifier that relates each record to the unique record in the T6YANALYTE table.
T6YWS_ID
Number
Identifier that relates each record to the unique record in the T6YWS table.
T6YWSF_ID
Text
Unique identifier for each water system facility record.
T6YSPT_ID
Text
Unique identifier for each sample point record.
CP_PRD_BEGIN_DT
Date
Compliance Period Begin Date.
CP_PRD_END_DT
Date
Compliance Period End Date.
AVG_TYPE_CODE
Text
The type of average represented by the MCL Value.
TSAANLYT_IS_NUMBER
Number
Identifier for each analyte that is unique when combined with
TSAANLYT_ST_CODE.
TSAANLYT_ST_CODE
Text
This value is "HQ" for all SDWIS/Fed contaminants. If the value is not "HQ," the
analyte code is specific to the primacy agency.
CALCULATEDVALUE
Number
The value for a given analyte, sampling location, and period of time that is
compared against an MCL to determine compliance.
UOM_CODE
Text
The measurement units used to express the measure or value.
NUMB_RESULTS_USED
Number
The number of results used in the calculation of a given Monitoring Period
Average.
PRC_ACH_RMVL_RA_NO
Number
Precursor achieved removal ratio number used in calculating the MCL compliance value.
AVG_DUR_TYPE_CD
Text
The type of monitoring period, i.e., monthly, quarterly, annually.
AVG_NBR_MON_PRD
Number
The number of monitoring periods covered by the average.
BIN_NUMBER
Text
The BIN assignment for the period of time covered by the average.
TINWSF_IS_NUMBER
Number
Identifier for each water system facility that is unique when combined with
TINWSF_ST_CODE.
TINWSF_ST_CODE
Text
Two-digit code that identifies the State that submitted data for the facility.
TSASMPPT_IS_NUMBER
Number
Identifier for each sample point that is unique when combined with
TSASMPPT_ST_CODE.
TSASMPPT_ST_CODE
Text
Two-digit code that identifies the State that submitted data for the sample point.
MP_TYPE_CODE
Text
The code for the type of monitoring period, i.e., monthly, quarterly, annually.
T6YCMCLV_ID
Number
Unique identifier for each calculated compliance value.
Exhibit C.10: Description of T6YCORACT (Corrective Actions)
Field Name
Data Type
Description
T6YCORACT_ID
Number
Unique identifier for each corrective action.
T6YWS_ID
Number
Identifier that relates each record to the unique record in the T6YWS table.
TINWSYS_IS_NUMBER
Number
Identifier for each water system that is unique when combined with
TINWSYS_ST_CODE.
TINWSYS_ST_CODE
Text
Two-digit code that identifies the State that submitted data for the system.
DATE_ISSUE_IDENTIFIED
Text
Date the corrective action was identified.
SCHEDULE_TYPE
Text
Type of schedule for the corrective action.
SCHEDULE_DESCRIPTION
Text
Schedule for the corrective action.
CORACT_CATEGORY_CODE
Text
Category code for the corrective action.
CORACT_NAME
Text
Name of the corrective action.
DUE_DATE
Date
Due date for the required corrective action.
ACHIEVED_DATE
Date
The date that the water system achieved the corrective action required.
TENSCHD_IS_NUMBER
Number
Identifier for each corrective action compliance schedule that must be combined
with TENSCHD_ST_CODE when used.
TENSCHD_ST_CODE
Text
Two-digit code that identifies the State of the corrective action compliance
schedule.
Exhibit C.11: Description of T6YMCL_MDL (Maximum contaminant level and
minimum detection level table)
Field Name
Data Type
Description
T6YMCL_MDL_ID
Number
Unique identifier for each MCL or MDL
ANALYTE_CODE
Text
4-digit EPA Analyte code
CHEMGRP
Text
Chemical Group
DB_MCL
Number
Maximum Contaminant Level
DB_MCL_UNIT
Text
Maximum Contaminant Level Unit of Measure
DB_4XMCL
Number
Four times the Maximum Contaminant Level
MDL
Number
Method Detection Limit
MDL_10TH
Number
One-tenth the Method Detection Limit
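DB_4XMCL and MDL_10TH are derived directly from DB_MCL and MDL. The following is a minimal sketch, assuming a missing MCL or MDL should produce a missing threshold, of recomputing those two columns as a consistency check. It keeps each threshold in the unit of its source column (DB_MCL_UNIT for the MCL; the MDL unit is not listed in this table). The function name is hypothetical.

```python
from typing import Optional


def derived_thresholds(db_mcl: Optional[float], mdl: Optional[float]) -> dict:
    """Recompute DB_4XMCL (4 x MCL) and MDL_10TH (MDL / 10) from their source columns."""
    return {
        "DB_4XMCL": None if db_mcl is None else 4 * db_mcl,
        "MDL_10TH": None if mdl is None else mdl / 10,
    }
```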
Exhibit C.12: Description of T6YWSFPLT (Treatment plant water system facilities
table)
Field Name
Data Type
Description
T6YWSFPLT_ID
Number
Unique identifier for each treatment plant water system facility record.
T6YWSF_ID
Number
Identifier that relates each record to the unique record in the T6YWSF table.
ST_ASGN_IDENT_CD
Text
A State-assigned value which identifies the treatment plant water system facility.
WSF_TYPE_CODE
Text
The value extracted from SDWIS/State will be "TP" (treatment plant).
FILTER_TYPE
Text
Filter type (permitted values): Unfiltered (UF), Conventional Filtration (CF), Direct
Filtration (DF), Diatomaceous Earth (DE), Other (OT), and other permitted values that
the System Administrator may add.
FILTER_DESCRIPTION
Text
A description of the filter.
DISINFECT_CONCENTN
Text
Disinfectant Concentration in mg/L
CONTACT_TIME_STAT
Text
Contact Time Status (Permitted values):
RQD - Required; NRQD - Not Required; REQT - Requested; RECV -
Received; URVW - Under Review; RVWD - Reviewed; APVD - Approved;
DTMD - Determined; DENY - Denied; RESB - Resubmitted
CT_TIME_DETERM_DAT
Date/Time
Date the Contact Time was determined
CONTACT_TIME
Text
Contact time in minutes: the number of minutes the water was in contact with the
disinfectant to be properly disinfected. The range of values is 0001 to 2400.
CT_VALUE
Text
CT value (disinfectant concentration multiplied by contact time) in mg-min/L.
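A CT value is the product of disinfectant concentration (mg/L) and contact time (minutes). The following is a minimal sketch, assuming DISINFECT_CONCENTN and CONTACT_TIME hold plain numeric text (e.g., "1.5" and "0030"), of computing that product from the text fields while enforcing the stated 0001 to 2400 range. It does not reproduce any regulatory CT methodology (for example, how the residual or contact time is measured), and the function name is hypothetical.

```python
def ct_value(disinfect_concentn: str, contact_time: str) -> float:
    """Return concentration (mg/L) x contact time (min), in mg-min/L."""
    minutes = int(contact_time)          # zero-padded text such as "0030" -> 30
    if not 1 <= minutes <= 2400:
        raise ValueError(f"CONTACT_TIME out of range 0001-2400: {contact_time!r}")
    concentration = float(disinfect_concentn)
    return concentration * minutes
```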
DBM_GIA_INACT_LOG
Number
The disinfection profile benchmark for Giardia inactivation in Logs.
DBM_GIA_INACT_STAT
Text
The status of the disinfection profile benchmark for
Giardia inactivation. See CONTACT_TIME_STAT for
permitted values and description
DBM_GIA_INACT_DT
Date/Time
The date the disinfection profile benchmark for Giardia inactivation was determined.
DBM_GIA_INACT_PCT
Number
The disinfection profile benchmark for Giardia inactivation percent.
DBM_VIR_INACT_LOG
Number
The disinfection profile benchmark for virus inactivation in Logs.
DBM_VIR_INACT_STAT
Text
The status of the disinfection profile benchmark for Virus inactivation. See
CONTACT_TIME_STAT for permitted values and description
DBM_VIRUS_INACT_DT
Date/Time
The date the disinfection virus benchmark was determined.
DBM_VIR_INACT_PCT
Number
The disinfection profile benchmark for virus inactivation percent.
BIN_STATUS
Text
The status of the BIN determination for the Long Term 2 Surface Water Treatment
Rule. See CONTACT_TIME_STAT for permitted values and description.
BIN_LT2
Number
The BIN number for the Long Term 2 Surface Water Treatment Rule.
BIN_DETERM_DT
Date/Time
The date the BIN number was determined for the Long Term 2 Surface Water
Treatment Rule.
FBR_SCHEMATIC_STAT
Text
Under the Filter Backwash Rule, a water system is required to submit a schematic
of this treatment plant to the primacy agency for review to demonstrate the
percentage of filter backwash that is returned to the treatment plant influent. See
CONTACT_TIME_STAT for permitted values and description.
FBR_SCHEMA_RCV_DAT
Date/Time
Date primacy agency received treatment plant schematic to demonstrate the
percentage of filter backwash that is returned to the treatment plant influent.
FBR_SCHEMA_RVW_DAT
Date/Time
Date primacy agency completes review of treatment plant schematic and
determines the percentage of filter backwash that is returned to the treatment plant
influent.
FBR_ALTR_RTN_RQS
Text
The status of the water system's request for an alternate location for return of the
filter backwash.
FBR_ALTR_RTN_DT
Date/Time
The date that the water system requested an alternate location for return of the
filter backwash.
FBR_CORCTV_ACT_RQS
Text
The status of corrective action by the water system as required by the primacy
agency after review of the schematic of the filter backwash flow in the treatment
plant.
FBR_CORCTV_ACT_DT
Date/Time
The date that the water system achieved the corrective action required for the filter
backwash.
WSF_NAME
Text
Name of the water system facility.
FBR_COMMENTS
Text
A memo field into which a user may enter comments about the Filter Backwash
Recycling Rule.
DSNF_BMRK_REASON
Text
Text description associated with the Disinfection Benchmark Reason
CONTACT_TIM_REASON
Text
Text description associated with the Contact Time
Exhibit C.13: Description of T6YTREATPROCESS (Treatments associated with
treatment plants table)
Field Name
Data Type
Description
T6YTREATPROCESS_ID
Number
Unique identifier for each treatment record.
T6YWSF_ID
Number
Identifier that relates each record to the unique record in the T6YWSF table.
TINTROBJ_CODE
Text
A coded value that categorizes the treatment objective.
TINTROBJ_NAME
Text
The name of the treatment objective.
TINTRPRO_CODE
Text
A coded value that categorizes the treatment process.
TINTRPRO_NAME
Text
The name of the treatment process.
TINWSF_IS_NUMBER
Number
Identifier for each water system facility that is unique when combined with
TINWSF_ST_CODE.
TINWSF_ST_CODE
Text
Two-digit code that identifies the State that submitted data for the facility.
Exhibit C.14: Description of T6YWSFFLOWS (Water system facility flows table)
Field Name
Data Type
Description
T6YWSFFLOWS_ID
Number
Unique identifier for each water system facility flow record.
T6YWSF_ID
Number
Identifier that relates each record to the unique record in the T6YWSF table.
TINWSFF_IS_NUMBER
Number
Identifier for each water system facility flow entry that is unique when combined
with T6YWSF_ID.
TINWSFF_ST_CODE
Text
Two-digit code that identifies the State that submitted data for the facility flow
entry.
TRAIN_ID
Text
This attribute identifies the water system facilities that are part of the same flow.
SEQUENCE_ID
Text
This attribute identifies the order of the water system facilities in a specific flow.
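TRAIN_ID groups facilities that belong to the same flow, and SEQUENCE_ID orders them within it. The following is a minimal sketch, assuming rows are dictionaries and SEQUENCE_ID (stored as text) always holds whole numbers, of reconstructing each flow as an ordered list of T6YWSF_ID values. Sorting numerically avoids the text ordering "1" < "10" < "2". The function name and sample rows are hypothetical.

```python
from collections import defaultdict


def order_facilities_by_train(flow_rows):
    """Group flow rows by TRAIN_ID and order each group by numeric SEQUENCE_ID."""
    trains = defaultdict(list)
    for row in flow_rows:
        trains[row["TRAIN_ID"]].append(row)
    return {
        train_id: [r["T6YWSF_ID"] for r in sorted(members, key=lambda r: int(r["SEQUENCE_ID"]))]
        for train_id, members in trains.items()
    }


rows = [
    {"TRAIN_ID": "A", "SEQUENCE_ID": "10", "T6YWSF_ID": 3},
    {"TRAIN_ID": "A", "SEQUENCE_ID": "2", "T6YWSF_ID": 2},
    {"TRAIN_ID": "A", "SEQUENCE_ID": "1", "T6YWSF_ID": 1},
    {"TRAIN_ID": "B", "SEQUENCE_ID": "1", "T6YWSF_ID": 9},
]
flows = order_facilities_by_train(rows)
```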
PROCESS_WATER_TYPE
Text
A system administrator-controlled code of the type of water flowing between the
facilities.
WATER_QTY_MSR
Number
A value that represents the number of gallons of water purchased.
WATER_QTY_MSR_UNIT
Text
A coded value which specifies the unit of measurement for the quantity of water
purchased.
CONNECTION_TYPE_CD
Text
Categorizes the type of connection between the water system facilities.
CONNECTION_DATE
Date/Time
The date of the connection of the water system facility to another water system
facility.
DISCONNECTION_DATE
Date/Time
The date of the disconnection of the water system facility from another water
system facility.
TINWSF_IS_NUMBER
Number
Identifier for each water system facility that is unique when combined with
TINWSF_ST_CODE.
TINWSF_ST_CODE
Text
Two-digit code that identifies the State that submitted data for the facility.
TINWSFOIS_NUMBER
Number
Identifier for each supplying water system facility that is unique when combined
with TINWSFOST_CODE.
TINWSFOST_CODE
Text
Two-digit code that identifies the State that submitted data for the facility.
T6YWSF0ID
Number
Unique identifier for each supplying water system facility.
Exhibit C.15: Description of T6YWSFIND (Water system facility indicators table)
Field Name
Data Type
Description
T6YWSFIND_ID
Number
Unique identifier for each water system facility indicator record.
T6YWSF_ID
Number
Identifier that relates each record to the unique record in the T6YWSF table.
TINWSFIN_IS_NUMBER
Number
Identifier for each water system facility indicator that is unique when combined with
T6YWSF_ID.
WSF_IND_NAME
Text
The water system facility indicator name.
WSF_IND_DESC
Text
The description of the water system facility indicator name.
WSF_IND_VALUE_CD
Text
The value of the indicator established by the primacy agency.
WSF_IND_DATE
Date/Time
The date associated with the indicator.
TINWSF_IS_NUMBER
Number
Identifier for each water system facility that is unique when combined with
TINWSF_ST_CODE.
TINWSF_ST_CODE
Text
Two-digit code that identifies the State that submitted data for the facility.
Exhibit C.16: Description of T6YWSIND (Water system indicators table)
Field Name
Data Type
Description
T6YWSIND_ID
Number
Unique identifier for each water system indicator record.
T6YWS_ID
Number
Identifier that relates each record to the unique record in the T6YWS table.
TINWSIN_IS_NUMBER
Number
Identifier for each water system indicator that is unique when combined with
T6YWSF_ID.
TINWSYS_IS_NUMBER
Number
Identifier for each water system that is unique when combined with
TINWSYS_ST_CODE.
TINWSYS_ST_CODE
Text
Two-digit code that identifies the State that submitted data for the system.
WS_IND_NAME
Text
The water system indicator name.
WS_IND_DESC
Text
The description of the water system indicator name.
WS_IND_VALUE_CD
Text
The value of the indicator established by the primacy agency.
WS_IND_DATE
Date/Time
The date associated with the indicator.
Exhibit C.17: Description of T6YWSPURCH (Water system buyers and sellers)
Field Name
Data Type
Description
T6YWSPURCH_ID
Number
Unique identifier for each water system buyer and seller record.
T6YWS_ID
Number
Identifier that relates each record to the unique record in the T6YWS table.
TINWSYSOIS_NUMBER
Number
Identifier for each supplying water system that is unique when combined with
TINWSYSOST_CODE.
TINWSYSOST_CODE
Text
Two-digit code that identifies the State that submitted data for the supplying water
system.
TINWPURC_IS_NUMBER
Number
Identifier for each water system purchase record that must be combined with
TINWSYSOST_CODE when used.
TINWSF_IS_NUMBER
Number
Identifier for each water system facility that must be combined with
TINWSF_ST_CODE when used.
TINWSF_ST_CODE
Text
Two-digit code that identifies the State that submitted data for the facility.
TINWSFOIS_NUMBER
Number
Identifier for each supplying water system facility record that must be combined with
TINWSFOST_CODE when used.
TINWSFOST_CODE
Text
Two-digit code that identifies the State that submitted data for the supplying facility.
T6YWS0ID
Number
Unique identifier for each supplying water system.
TINWSYS_IS_NUMBER
Number
Identifier for each water system that is unique when combined with
TINWSYS_ST_CODE.
TINWSYS_ST_CODE
Text
Two-digit code that identifies the State that submitted data for the system.
T6YWSF_ID
Number
Unique identifier for each water system facility record.
T6YWSF0ID
Number
Unique identifier for each supplying water system facility.
Exhibit C.18: Description of T6YSAR_TRANSACTION (Sample analytical result
transaction table)
Field Name
Data Type
Description
T6Y_TRANSACTION_ID
Number
Unique identifier for each transaction. (Note: Some records will be listed more than once
if they were flagged for more than one reason such as being greater than 4*MCL and
greater than 10*MCL.)
T6YSAR_ID
Number
Unique identifier for each sample analytical result (enables linking to T6YSAR).
TSASAR_IS_NUMBER
Number
Identifier for each sample analytical result that is unique when combined with
TSASAR_ST_CODE.
TSASAR_ST_CODE
Text
Two-digit code that identifies the State that submitted data.
QA_FLAG_ID
Number
A coded value (1 through 11) that identifies the reason that the record was flagged.
Values have the following descriptions:
1 = flagged as a potential duplicate;
2 = flagged as a transient sample for an analyte for which transient systems are not
required to sample;
3 = flagged as a non-compliance sample;
4 = flagged as a non-routine sample;
5 = flagged as 4 times greater than the MCL;
6 = flagged as 10 times greater than the MCL;
7 = flagged as less than the MDL;
8 = flagged as less than 1/10th of the MDL;
9 = flagged for having abnormal units;
10 = DBP samples flagged as taken outside the distribution system/entry point; and
11 = Utah nitrate or nitrite records flagged as being assigned an inaccurate analyte code.
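Flags 5 through 8 are threshold comparisons against the MCL and MDL, and one result can trip several of them. This is why a record may appear more than once in this table. The following is a minimal sketch, assuming the concentration, MCL, and MDL are already in the same units and that the comparisons are strict inequalities, of returning every applicable flag code for one result. Whether the actual QA process applied these checks only to detections is not stated here. The function name is hypothetical.

```python
from typing import List, Optional


def magnitude_flags(concentration: float,
                    mcl: Optional[float],
                    mdl: Optional[float]) -> List[int]:
    """Return the QA_FLAG_ID codes 5-8 that apply to one concentration."""
    flags = []
    if mcl is not None:
        if concentration > 4 * mcl:
            flags.append(5)   # greater than 4 times the MCL
        if concentration > 10 * mcl:
            flags.append(6)   # greater than 10 times the MCL
    if mdl is not None:
        if concentration < mdl:
            flags.append(7)   # less than the MDL
        if concentration < mdl / 10:
            flags.append(8)   # less than 1/10th of the MDL
    return flags
```

Each returned code would correspond to its own transaction row, so a result above 10 times the MCL yields two rows (flags 5 and 6).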
ACTION_ID
Number
A coded value (1 through 3) that identifies the action taken on the flagged record.
Values have the following descriptions: 1 = no change; 2 = one of the record's fields
was changed; 3 = record excluded (or a duplicate).
ANALYZE
Text
Field contains "yes" or "no," identifying whether or not the record will be included in
the occurrence analysis.
REMARK
Text
Text describing the QA issues, as well as other notes related to the record.
STATERESPONSE
Text
Verbatim response from the State on the flagged record (when available).
ACTIONDETAIL
Text
Additional detail on the record's "action" such as why the record was excluded or
changed.
CREATEDATE
Date/Time
Date the transaction was entered into the database.
LASTMODIFIEDDATE
Date/Time
Date the transaction record was last modified.
ACTION_ID_CLEAN
Number
A coded value (1 through 4) that identifies the action taken on the flagged record.
Values have the following descriptions: 1 = no change; 2 = one
of the record's fields was changed; 3 = record excluded; 4 = duplicate
record (which may or may not be excluded as one copy of the duplicate is
retained).
NEW_COLUMN
Text
Field indicating which column in "T6YSAR" should be modified by the transaction record.
NEW_VALUE_DATE
Date
New value to replace the existing value in "T6YSAR" that should be modified by the
transaction record. Only stores values if they are in Date format.
NEW_VALUE_TEXT
Text
New value to replace the existing value in "T6YSAR" that should be modified by the
transaction record. Only stores values if they are in Text format.
NEW_VALUE_NUMERIC
Number
New value to replace the existing value in "T6YSAR" that should be modified by the
transaction record. Only stores values if they are in Number format.
COLUMN_TYPE
Number
A coded value (1 through 3) that identifies the column that stores the value that will
replace the existing value in "T6YSAR" that should be modified by the transaction record.
1 = NEW_VALUE_DATE, 2 = NEW_VALUE_TEXT, 3 = NEW_VALUE_NUMERIC.
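NEW_COLUMN names the T6YSAR field to change, and COLUMN_TYPE says which of the three NEW_VALUE_* fields holds the replacement. The following is a minimal sketch, assuming records are dictionaries and that only transactions with ACTION_ID = 2 ("one of the record's fields was changed") carry an edit, of applying one transaction to a copy of its T6YSAR record. The function and constant names are hypothetical.

```python
# Maps COLUMN_TYPE to the transaction field that holds the replacement value.
VALUE_COLUMN_BY_TYPE = {1: "NEW_VALUE_DATE", 2: "NEW_VALUE_TEXT", 3: "NEW_VALUE_NUMERIC"}


def apply_transaction(sar_record: dict, txn: dict) -> dict:
    """Return a copy of a T6YSAR record with one transaction's new value applied."""
    if txn["T6YSAR_ID"] != sar_record["T6YSAR_ID"]:
        raise ValueError("transaction does not belong to this T6YSAR record")
    updated = dict(sar_record)              # never modify the source record in place
    if txn.get("ACTION_ID") != 2:           # assumption: only "field changed" actions edit
        return updated
    source_field = VALUE_COLUMN_BY_TYPE[txn["COLUMN_TYPE"]]
    updated[txn["NEW_COLUMN"]] = txn[source_field]
    return updated
```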
NUMBER0
Text
Public water system identification number (PWSID) derived from T6YSAR.
COLLECTION_END_DT
Date
The latest date the samples represented in the sample summary were collected, derived
from T6YSAR.
CONCENTRATION_MSR
Number
A numeric value that represents the result obtained from a sample analysis, derived from
T6YSAR.
LAB_ASGND_ID_NUM
Text
An identifier used for reconciliation with the State data system, or the sample
identification number assigned by the laboratory, derived from T6YSAR.
ANALYTE_CODE
Text
4-digit EPA Analyte code
QA_TRANSACT_I D
Number
Unique identifier for QA of each transaction.
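The COLUMN_TYPE code determines which of the three NEW_VALUE_* fields carries the replacement value for the column named in NEW_COLUMN. The following is a minimal Python sketch of how one transaction record might be applied to a T6YSAR row. It assumes rows are plain dictionaries keyed by the field names above and that only transactions with ACTION_ID_CLEAN = 2 ("one of the record's fields was changed") modify the row; the helper name `apply_transaction` is hypothetical and is not part of the SYR 4 database.

```python
from typing import Any, Dict

# COLUMN_TYPE codes from the data dictionary:
# 1 = NEW_VALUE_DATE, 2 = NEW_VALUE_TEXT, 3 = NEW_VALUE_NUMERIC.
VALUE_FIELD_BY_COLUMN_TYPE = {
    1: "NEW_VALUE_DATE",
    2: "NEW_VALUE_TEXT",
    3: "NEW_VALUE_NUMERIC",
}

# Assumption: only ACTION_ID_CLEAN = 2 carries a replacement value;
# other actions (no change, excluded, duplicate) leave the row as-is.
FIELD_CHANGED = 2


def apply_transaction(sar_row: Dict[str, Any], txn: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a T6YSAR row with one transaction applied (hypothetical helper)."""
    updated = dict(sar_row)  # never mutate the source row
    if txn["ACTION_ID_CLEAN"] != FIELD_CHANGED:
        return updated
    value_field = VALUE_FIELD_BY_COLUMN_TYPE[txn["COLUMN_TYPE"]]
    updated[txn["NEW_COLUMN"]] = txn[value_field]
    return updated


# Illustrative values only.
row = {"CONCENTRATION_MSR": 5.0, "ANALYTE_CODE": "1040"}
txn = {
    "ACTION_ID_CLEAN": 2,
    "COLUMN_TYPE": 3,
    "NEW_COLUMN": "CONCENTRATION_MSR",
    "NEW_VALUE_DATE": None,
    "NEW_VALUE_TEXT": None,
    "NEW_VALUE_NUMERIC": 0.5,
}
fixed = apply_transaction(row, txn)
print(fixed["CONCENTRATION_MSR"])  # 0.5
```

Returning a copy rather than editing the row in place keeps the original T6YSAR values available for comparison, which mirrors the way the transaction table records changes separately from the source data.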
Exhibit C.19: Description of T6YWS_TRANSACTION (Water system transaction table)
Field Name
Data Type
Description
T6YWS_TRANSACTION_ID
Number
Unique identifier for each transaction. (Note: Some records will be listed more than once
if they were flagged for more than one reason such as being greater than 4*MCL and
greater than 10*MCL.)
T6YWS_ID
Number
Unique identifier for each sample analytical result (enables linking to T6YSAR).
TINWSYS_IS_NUMBER
Number
Identifier for each sample analytical result that is unique when combined with
TSASAR_ST_CODE.
TINWSYS_ST_CODE
Text
Two-digit code that identifies the State that submitted data for the system.
QA_FLAG_ID
Number
A coded value (1 through 11) that identifies the reason that the record was
flagged. Values have the following descriptions:
1 = flagged as a potential duplicate;
2 = flagged as a transient sample for an analyte for which transient systems are not
required to sample;
3 = flagged as a non-compliance sample;
4 = flagged as a non-routine sample;
5 = flagged as 4 times greater than the MCL;
6 = flagged as 10 times greater than the MCL;
7 = flagged as less than the MDL;
8 = flagged as less than 1/10th of the MDL;
9 = flagged for having abnormal units;
10 = DBP samples flagged as taken outside the distribution system/entry point; and
11 = Utah nitrate or nitrite records flagged as being assigned an inaccurate analyte.
ACTION_ID
Number
A coded value (1 through 3) that identifies the action taken on the flagged
record. Values have the following descriptions: 1 = no change; 2 = one
of the record's fields was changed; 3 = record excluded (or a duplicate).
ANALYZE
Text
Field contains "yes" or "no," identifying whether or not the record will be included in
the occurrence analysis.
REMARK
Text
Text describing the QA issues, as well as other notes related to the record.
CREATEDATE
Date/Time
Date the transaction was entered into the database.
LASTMODIFIEDDATE
Date/Time
Date the transaction record was last modified.
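Because one analytical result can appear in T6YWS_TRANSACTION more than once (for example, once for exceeding 4*MCL and again for exceeding 10*MCL), an analyst will usually want to group transactions by result before using them. The following minimal Python sketch assumes the table has been loaded as a list of dictionaries keyed by the field names above. The rule that any "no" in ANALYZE removes the result from analysis is an assumption for illustration, not a documented EPA rule, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set

# QA_FLAG_ID codes from Exhibit C.19 (abbreviated labels).
QA_FLAG_LABELS = {
    1: "potential duplicate",
    2: "transient sample, analyte not required of transient systems",
    3: "non-compliance sample",
    4: "non-routine sample",
    5: "> 4 x MCL",
    6: "> 10 x MCL",
    7: "< MDL",
    8: "< 1/10th of MDL",
    9: "abnormal units",
    10: "DBP sample outside distribution system/entry point",
    11: "Utah nitrate/nitrite assigned an inaccurate analyte",
}


def flags_by_result(transactions: Iterable[Mapping]) -> Dict[int, Set[int]]:
    """Collect every QA_FLAG_ID raised against each analytical result (T6YWS_ID)."""
    flags: Dict[int, Set[int]] = defaultdict(set)
    for txn in transactions:
        flags[txn["T6YWS_ID"]].add(txn["QA_FLAG_ID"])
    return dict(flags)


def results_to_drop(transactions: Iterable[Mapping]) -> Set[int]:
    """Assumption: a result is left out if ANY of its transactions has ANALYZE = "no"."""
    return {
        t["T6YWS_ID"]
        for t in transactions
        if str(t["ANALYZE"]).strip().lower() == "no"
    }


# Illustrative transactions only.
txns = [
    {"T6YWS_ID": 101, "QA_FLAG_ID": 5, "ANALYZE": "yes"},
    {"T6YWS_ID": 101, "QA_FLAG_ID": 6, "ANALYZE": "no"},
    {"T6YWS_ID": 102, "QA_FLAG_ID": 7, "ANALYZE": "yes"},
]
for result_id, codes in sorted(flags_by_result(txns).items()):
    print(result_id, [QA_FLAG_LABELS[c] for c in sorted(codes)])
```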
Appendix D: Occurrence data for the Aircraft Drinking Water Rule (ADWR)
In May 2021, EPA downloaded compliance monitoring data from its Aircraft Reporting and
Compliance System (ARCS) for evaluation under SYR 4. ARCS is a centralized web-based data
collection and management system that provides accountability and regulatory oversight and is
used to facilitate the reporting of aircraft public water system (PWS) data. These data are also
publicly available on the ADWR Compliance Reports website:
https://www.epa.gov/dwreginfo/adwr-compliance-reports. Air carriers subject to the ADWR
must report to EPA and conduct, as appropriate, the following actions in ARCS, unless an
alternative reporting method has been approved (https://www.epa.gov/dwreginfo/aircraft-drinking-water-rule):
A complete inventory of aircraft PWS fleet;
PWS activity details, such as whether the aircraft is currently in an active or inactive status;
The date the Operations and Maintenance plan was developed;
The date the Coliform Sampling plan was developed;
The date the aircraft PWS Sampling plan(s) was incorporated into the aircraft water
system Operations and Maintenance plan;
The date the Operations and Maintenance plan(s) was incorporated into FAA-accepted
air carrier Operation and Maintenance program;
The frequency for routine disinfection and flushing, and the corresponding routine total
coliform sampling frequency; and
The date for routine disinfection and flushing, routine coliform sampling dates and
results, and corrective actions (when applicable).
A total of 212,937 records⁹ of aircraft PWS compliance monitoring data for total coliform
(TC) and E. coli (EC) samples were available in ARCS from February 2011 through May 2021,
including results reported for more than 70 different makes/models of aircraft. These results were
used to characterize the positivity rates of TC and EC in aircraft PWSs on an annual basis, as
well as for all the years for which data were available (2011-2021) and for the subset of years 2012
through 2019. The evaluation of data for years 2012 through 2019 was performed to allow for a
comparison with similar data for stationary PWSs as described in Section 5.5. In addition, this
approach removes potentially confounding considerations associated with evaluating data for
calendar year 2020, when a large number of aircraft PWSs were inactive due to COVID-19, as
well as for years 2011 and 2021, for which the ARCS data evaluated at this time represent only
partial years.
9 The number of records presented here is greater than the number of rows of data downloaded from ARCS (70,979
at the time of download in support of the SYR 4 analysis) because it counts all samples within each row of data (i.e.,
Sample 1, Sample 2, and Sample 3). Sample 3 reflects the option to collect a third sample, which is not required
under ADWR and is rarely used; the Sample 3 fields typically contain no data.
Aircraft inventory data, including manufacturer, model, and disinfection and flushing frequency,
were linked to the monitoring results by public water system identification number (PWSID).
Aircraft PWS were categorized as small, medium, or large based on the seat capacity (small =
<130 seats; medium = >130 - 250 seats; large = >250 seats). Note that these categories were
developed specifically for this analysis, based on the dataset, and do not represent regulatory
categories; ADWR does not categorize aircraft PWSs by size. In addition, the first three
digits of the model number were used to summarize the make/model of each aircraft. For
example, inventory data showing model numbers for Boeing as 737800, 737-823, and 7377BD
all were captured in this analysis as 737.
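The size classes and the model-number truncation described above can be expressed as two small functions. This is a minimal Python sketch, not EPA's code. As written, the seat ranges leave exactly 130 seats unassigned (small = <130, medium = >130-250); the sketch assumes 130 seats falls in the small class, and the function names are hypothetical.

```python
def size_category(seats: int) -> str:
    """Analysis-only size class for an aircraft PWS (not a regulatory category).

    Assumption: exactly 130 seats is treated as "small", because the stated
    ranges (<130 and >130-250) do not cover that value.
    """
    if seats <= 130:
        return "small"
    if seats <= 250:
        return "medium"
    return "large"


def model_family(model_number: str) -> str:
    """Summarize a model number by its first three characters (e.g., 737-823 -> 737)."""
    return model_number.strip()[:3]


for model in ("737800", "737-823", "7377BD"):
    print(model, "->", model_family(model))
```

Truncating to the first three characters works for Boeing-style numeric model numbers like those in the example; other manufacturers' model strings may need a different rule.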
A number of quality assurance (QA) steps were applied to the ADWR dataset to identify the TC
and EC records suitable for analysis. Data were excluded via the following QA steps:
Records where [Location] was were excluded (72,406 records)
Remaining records where [Total Coliform] was or "from" were excluded (4 records).
Remaining records where the [Sample Taken On] date was incorrectly entered were
excluded. These dates were as follows: "12/08/0014 00:00", "09/26/0201 03:52",
"09/13/0019 03:59", "09/09/0201 03:35", "07/22/0204 05:17", "07/16/0018 01:35",
"06/21/0018 01:40", and "02/02/0017 16:10" (16 records).
Remaining records where the [Total Coliform] result was entered as "absent" but [E. coli]
was positive were excluded (9 records).
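The record counts in this appendix appear to reconcile exactly if every downloaded row contributes three sample slots (Sample 1 to Sample 3, per footnote 9) and the four QA exclusions are then subtracted. The minimal Python sketch below checks that arithmetic and adds a hypothetical date-plausibility test for mistyped [Sample Taken On] values. The exclusion labels are placeholders, because the exact [Location] and [Total Coliform] values are not legible in the text, and the 2011-2021 window and the "MM/DD/YYYY HH:MM" format are assumptions.

```python
from datetime import datetime

ROWS_DOWNLOADED = 70_979   # rows downloaded from ARCS (footnote 9)
SAMPLE_SLOTS_PER_ROW = 3   # Sample 1, Sample 2, Sample 3

# Records removed by each QA step, in the order listed above.
EXCLUSIONS = {
    "[Location] criterion": 72_406,
    "[Total Coliform] criterion": 4,
    "invalid [Sample Taken On] date": 16,
    "TC absent but E. coli positive": 9,
}

total_records = ROWS_DOWNLOADED * SAMPLE_SLOTS_PER_ROW
remaining = total_records - sum(EXCLUSIONS.values())
print(f"{total_records:,} records -> {remaining:,} TC results after QA")


def plausible_sample_date(text: str, first_year: int = 2011, last_year: int = 2021) -> bool:
    """Hypothetical check that flags mistyped dates such as '12/08/0014 00:00'."""
    try:
        taken = datetime.strptime(text.strip(), "%m/%d/%Y %H:%M")
    except ValueError:
        return False  # unparseable values are treated as implausible
    return first_year <= taken.year <= last_year
```

Under these assumptions the script prints `212,937 records -> 140,502 TC results after QA`, matching the 140,502 TC results reported below.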
The ADWR analyses were stratified in a variety of ways to summarize results, including the
number of TC samples and public water systems by aircraft size, manufacturer, model, air
carrier, sample type, and more. It is important to note that all EC positivity rates were calculated
twice, under two different sets of assumptions:
1. An EC sample was included in the analysis only if the EC result was listed as "Present"
or "Absent."
2. An EC sample was included if the EC result was listed as "Present" or "Absent" (i.e., the
same as the first set of assumptions) but with an added consideration of assuming that an
EC sample was "Absent" if the associated TC result was reported as "Absent" and there
was no EC result provided. These results are labeled in the file as "E. coli (Alternative
Approach)."
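The two sets of assumptions above differ only in how a sample with a missing EC result is treated. A minimal Python sketch of both approaches follows. It assumes each sample has been reduced to a (TC result, EC result) pair with results normalized to "Present"/"Absent" and a missing EC result stored as None or an empty string; the function names are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def ec_status(tc: Optional[str], ec: Optional[str], alternative: bool) -> Optional[str]:
    """Return the EC outcome a sample contributes, or None if it is not counted."""
    if ec in ("Present", "Absent"):
        return ec  # both approaches count explicit EC results
    if alternative and tc == "Absent" and not ec:
        return "Absent"  # alternative approach: infer EC absent when TC is absent
    return None


def ec_positivity(samples: Iterable[Tuple[Optional[str], Optional[str]]],
                  alternative: bool = False) -> Tuple[int, int]:
    """Return (EC results used, EC positives) over (TC, EC) result pairs."""
    used = positives = 0
    for tc, ec in samples:
        status = ec_status(tc, ec, alternative)
        if status is None:
            continue
        used += 1
        positives += status == "Present"
    return used, positives


# Illustrative samples only.
samples = [
    ("Present", "Present"),
    ("Present", "Absent"),
    ("Absent", None),
    ("Absent", ""),
    ("Present", None),  # TC-positive with no EC result: excluded under both approaches
]
print(ec_positivity(samples))                    # (2, 1)
print(ec_positivity(samples, alternative=True))  # (4, 1)
```

Because the alternative approach only adds EC-absent samples, it can enlarge the denominator but never the number of positives, which is why the same 241 (or 201) positives yield a lower rate.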
After the QA steps were applied, there were 140,502 TC results used in this evaluation, provided
by 8,093 PWSs and covering the full range of years for which ARCS data were collected (i.e.,
February 2011 - May 2021). Of those results, 7,250 results (5.2 percent) were positive for TC.
Under the first approach for calculating EC positivity rates listed above, there were 92,994 EC
results provided by 7,091 PWSs (i.e., 66 percent of the number of TC results and 88 percent of
the aircraft PWSs), with a total of 241 results (0.26 percent) positive for EC. Under the second
approach for calculating EC positivity rates listed above, there were 140,485 EC results provided
by 8,093 aircraft PWSs, with 241 results (0.17 percent) positive for EC.
Considering only the 8-year period from 2012-2019, there were 118,070 TC results used in this
evaluation, provided by 7,816 PWSs. Of those results, 6,448 results (5.5 percent) were positive
for TC. Under the first approach for calculating EC positivity rates listed above, there were
78,114 EC results provided by 6,776 PWSs (i.e., 66 percent of the number of TC results and 87
percent of the PWSs), with a total of 201 results (0.26 percent) positive for EC. Under the second
approach for calculating EC positivity rates, there were 118,056 EC results provided by 7,816
PWSs, with 201 results (0.17 percent) positive for EC.
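The rounded percentages quoted in the two preceding paragraphs can be recomputed directly from the reported counts. This Python sketch uses only numbers stated in the text; the dictionary keys and the `summarize` helper are illustrative names.

```python
PERIODS = {
    "Feb 2011 - May 2021": dict(tc=140_502, tc_pos=7_250, pws=8_093,
                                ec1=92_994, ec1_pws=7_091, ec2=140_485, ec_pos=241),
    "2012-2019": dict(tc=118_070, tc_pos=6_448, pws=7_816,
                      ec1=78_114, ec1_pws=6_776, ec2=118_056, ec_pos=201),
}


def summarize(p: dict) -> dict:
    """Recompute the rounded percentages quoted in the text from the raw counts."""
    return {
        "TC positive %": round(100 * p["tc_pos"] / p["tc"], 1),
        "EC results as % of TC results": round(100 * p["ec1"] / p["tc"]),
        "EC-reporting PWSs %": round(100 * p["ec1_pws"] / p["pws"]),
        "EC positive % (approach 1)": round(100 * p["ec_pos"] / p["ec1"], 2),
        "EC positive % (approach 2)": round(100 * p["ec_pos"] / p["ec2"], 2),
    }


for name, counts in PERIODS.items():
    print(name, summarize(counts))
```

Each recomputed value matches the corresponding figure in the text (for example, 7,250 / 140,502 is about 5.2 percent).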
Data users will notice a difference between the number of FAA Corporate Names in the inventory
file and in the samples file. This is because the inventory lists the FAA Corporate Names for the
last year of data collection only, whereas the samples file covers all years of data collection.
Some of the additional air carriers listed in the sampling file have since merged or gone out of
business. For more on ADWR analyses, see Six-Year Review 4 Technical Support Document for
Microbial Contaminant Regulations (USEPA, 2024b).
Appendix E: User Guide to Downloading and Using Six-Year Review 4 and Related Data from EPA's Website
This appendix includes a user guide for downloading and using the Six-Year Review 4 (SYR 4)
and related data from EPA's website. This document is also posted online with the data. In
addition, instructions on importing the SYR 4 datasets are included in this Appendix (see Section
10). The data dictionary for all datasets is also included in Appendix C above.
Several of the contaminant occurrence datasets that are posted online were not analyzed as part
of the SYR 4 effort. These contaminants were not subject to detailed review in SYR 4 due to
recent, ongoing, or pending regulatory action (e.g., lead, copper, DBPs). These datasets passed
the same QA procedures as those analyzed in SYR 4.
The data files are posted online in several zip files. Each zip file includes text files for multiple
contaminants/parameters. The number of records and contaminants/parameters included in each
file varies. Users may wish to compare the number of records they downloaded for each
contaminant of interest with the record counts provided in this user guide's exhibits to ensure that
all of the records were correctly downloaded and imported. Note that these record counts reflect
the data after the QA/QC process. For a list of data elements included in the data posted online,
refer to Exhibit E.1.
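One way to run the record-count check is to tally rows per ANALYTE_CODE in each downloaded text file and compare the tallies with the "Total Number of Sample Records" column of the exhibits (for example, 491,411 records for 1,1,1-Trichloroethane, analyte 2981). This is a minimal Python sketch that assumes each file has a header row containing ANALYTE_CODE and is tab-delimited; both are assumptions, so confirm the actual layout against the files and the import instructions in Section 10. The function names and sample rows are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_by_analyte(handle: Iterable[str], delimiter: str = "\t") -> Counter:
    """Count records per ANALYTE_CODE in one downloaded SYR 4 text file."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return Counter(row["ANALYTE_CODE"].strip() for row in reader)


def count_mismatches(observed: Counter, expected: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Return {analyte code: (observed, expected)} wherever the counts disagree."""
    return {
        code: (observed.get(code, 0), n)
        for code, n in expected.items()
        if observed.get(code, 0) != n
    }


# Tiny in-memory stand-in for a downloaded file (hypothetical rows).
sample = io.StringIO(
    "ANALYTE_CODE\tPWSID\n2981\tXX0000001\n2981\tXX0000002\n2985\tXX0000001\n"
)
observed = count_by_analyte(sample)
print(count_mismatches(observed, {"2981": 2, "2985": 2}))
```

For a real file, pass a handle opened with `open(path, newline="")` in place of the in-memory sample, and use the exhibit counts as the expected values.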
The remainder of this document is organized as follows:
Section 1: Background Information on Six-Year Review 4 Data Records
Section 2: SYR 4 Data Records Posted for Phase Chemicals, Lead, Copper and Radionuclides
Section 3: SYR 4 Data Records Posted for Disinfection Byproducts
Section 4: SYR 4 Data Records Posted for Disinfection Byproducts Related Parameters
Section 5: SYR 4 Data Records Posted for Microbial Contaminants, Microbial Related Parameters, and Disinfectant Residuals
Section 6: SYR 4 Data Records Posted for the Aircraft Drinking Water Rule (ADWR)
Section 7: Additional Data Collected under SYR 4 ICR
Section 8: SYR 4 Data Records Posted for Treatment
Section 9: SYR 4 Data Considerations
Section 10: Instructions on Importing SYR 4 Datasets
10A: Downloading Data Files
10B: Importing Data into Microsoft Excel
10C: Importing Data into R
10D: Importing Data into Microsoft Access
Section 1: Background Information on SYR 4 Data Records
To support the national contaminant occurrence and exposure assessments performed under the
fourth Six-Year Review process (SYR 4), EPA collected compliance monitoring data and
treatment technique information from public water systems (PWSs) for regulated drinking water
contaminants. This analysis allows EPA to characterize the frequency of occurrence, the levels
found, and the geographic distribution of contaminants and related data to help the agency
determine if there may be a meaningful opportunity to improve public health protection. EPA
conducted a voluntary data request from states, primacy agencies, territories, and tribes (referred
to as "States" throughout the remainder of this Appendix) to obtain compliance monitoring data
and treatment technique information necessary to analyze national contaminant occurrence in
support of SYR 4. This data request was conducted through the Information Collection Request
(ICR) process. EPA requested States to submit their Safe Drinking Water Act (SDWA)
compliance monitoring data and treatment technique information collected between January
2012 and December 2019. For more information on the process undertaken to request the
voluntary submission of compliance monitoring data and treatment technique information by the
States, see the fourth Six-Year Review ICR (84 FR 58381, USEPA, 2019).
Through extensive data management efforts, quality assurance evaluations, and communications
and consultations with States' data management staff, EPA established a single contaminant
occurrence dataset that consists of compliance monitoring data and treatment technique
information from 59 out of 66 jurisdictions (46 states plus Washington, D.C., American Samoa,
Navajo Nation, Commonwealth of the Northern Mariana Islands, and other tribes). This dataset
is referred to as the National Compliance Monitoring ICR dataset for the fourth Six-Year Review
(SYR 4 ICR dataset). The 59 States that provided data for the SYR 4 ICR dataset comprise 88
percent of all PWSs and 92 percent of the total population served by PWSs nationally, and are
geographically representative of PWSs nationwide. The SYR 4 ICR dataset was used to estimate
a variety of occurrence measures to characterize the national occurrence of regulated
contaminants in public water systems to support the Six-Year Review process.
EPA received compliance monitoring data and treatment technique information from both
SDWIS/State and non-SDWIS/State users. For States that use SDWIS/State, EPA developed a
tool, available upon request from States, to extract the requested data identified in the SYR 4
ICR from a SDWIS/State database. In all, 46 states and 13 other jurisdictions provided
compliance monitoring data that included parametric records. Thirty-five states, Washington,
D.C., and six regional tribal entities used the extraction tool to extract all or some of their data.
The 17 States not using SDWIS/State submitted their compliance monitoring data and treatment
technique information "as is," resulting in a variety of formats, including dBase, Excel, XML, Access,
and comma-delimited files. With the exception of two States whose data were downloaded from their
publicly available websites (California and Florida), all States submitted their data online via
EPA's Central Data Exchange. All data were converted to a common format with consistent units
of measurement. For more details about the collection and formatting of SYR 4
ICR data, see the main chapters of this document.
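Conforming results to consistent units typically involves mapping the reported unit strings to a single target unit. This document does not state which target units were used, so the minimal Python sketch below assumes mg/L for mass concentrations and leaves radioactivity units such as pCi/L unconverted. The conversion factors are standard (1 µg/L = 0.001 mg/L; 1 ng/L = 0.000001 mg/L), and the function name is hypothetical.

```python
from typing import Optional

# Multipliers that convert common mass-concentration units to mg/L.
TO_MG_PER_L = {
    "mg/l": 1.0,
    "ug/l": 1.0 / 1_000,   # micrograms per liter, ASCII spelling
    "µg/l": 1.0 / 1_000,   # micrograms per liter, micro-sign spelling
    "ng/l": 1.0 / 1_000_000,
}


def to_mg_per_l(value: float, unit: str) -> Optional[float]:
    """Convert a mass concentration to mg/L; return None for other units (e.g., pCi/L)."""
    factor = TO_MG_PER_L.get(unit.strip().lower())
    return None if factor is None else value * factor


print(to_mg_per_l(10, "ug/L"))
print(to_mg_per_l(3, "pCi/L"))  # None: radionuclide activity is not a mass concentration
```

Returning None for unrecognized units, rather than guessing, leaves those records for manual review.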
EPA conducted a quality assurance/quality control evaluation of the data submitted by States and
assembled the data into the SYR 4 ICR database, which includes more than 83 million records
from approximately 142,000 public water systems, serving approximately 303 million people
nationally. The dataset includes the results of all compliance monitoring data (i.e., all sample
analytical detections and non-detections) from January 2012 to December 2019 for regulated
chemical phase contaminants, radionuclides, disinfectants and disinfection byproducts
(D/DBPs), DBP precursors, microbial contaminants/indicators, disinfectant residuals, and other
related data including treatment information. As noted in the main chapters, only the data that
passed the QA/QC process are posted online.
Exhibit E.1, Six-Year Review 4 Data Field Names and Definitions, contains a list of the data
elements, column names, and a brief description of the data for each data element included in the
SYR 4 ICR data text files.
Exhibit E.1: Six-Year Review 4 Data Field Names and Definitions
Data Element
Column Name
Description
Contaminant Identification Code
ANALYTE_CODE
4-digit Safe Drinking Water Information System (SDWIS) contaminant identification number for which the sample is being analyzed.
Contaminant Name
ANALYTE_NAME
Common name of the contaminant for which the sample is being analyzed.
Primacy Code
PRIMACY_CODE
2-digit code identifying the primacy agency (i.e., State) for the water system.
State Code
STATE_CODE
2-digit code identifying the U.S. state or territory in which the water system is located.
Public Water System Identification Number (PWSID)
PWSID
The code used to identify each PWS. The code begins with the standard 2-character postal state abbreviation or region code; the remaining 7 digits are unique to each PWS in the State.
System Name
SYSTEM_NAME
Name of the PWS.
Federal Public Water System Type Code
SYSTEM_TYPE
A code to identify whether a system is:
Community Water System (C);
Non-Transient Non-Community Water System (NTNC); or
Transient Non-Community Water System (NC).
Retail Population Served
RETAIL_POPULATION_SERVED
Retail population served by a system.
Adjusted Total Population Served
ADJUSTED_TOTAL_POPULATION_SERVED
Adjusted total population served (retail plus adjusted wholesale population served, adjusted so as not to double-count buyer systems that purchase from multiple seller systems).
Source Water Type
SOURCE_WATER_TYPE
Type of water at the source. Source water type can be:
Ground Water (GW);
Surface Water (SW);
Purchased Surface Water (SWP);
Purchased Ground Water (GWP);
Ground Water Under Direct Influence of Surface Water (GU); or
Purchased Ground Water Under Direct Influence of Surface Water (GUP).
Facility Identification Code
WATER_FACILITY_ID
A unique identifier for each water system facility.
Water Facility Type
WATER_FACILITY_TYPE
Type of water system facility:
CC = Consecutive Connection;
CH = Common Headers;
CW = Clear Well;
DS = Distribution System;
IG = Infiltration Gallery;
IN = Intake;
OT = Other;
PC = Pressure Control;
PF = Pumping Facility;
RS = Reservoir;
SI = Surface Impoundment;
SP = Spring;
SS = Sampling Station;
ST = Storage;
TM = Transmission Main (Manifold);
TP = Treatment Plant;
WH = Well Head;
WL = Well; or
XX = Unknown.
Sampling Point Identification Code
SAMPLING_POINT_ID
A unique identifier for each sampling point location.
Sampling Point Type
SAMPLING_POINT_TYPE
Location type of a sampling point:
DS = Distribution System;
EP = Entry Point;
FC = First Customer;
FN = Finished Water Source;
LD = Lowest Disinfectant Residual;
MD = Midpoint in the Distribution System;
MR = Point of Maximum Residence;
PC = Process Control;
RW = Raw Water Source;
SR = Source Water Point;
UP = Unit Process; or
WS = Water System Facility Point.
Source Type Code
SOURCE_TYPE_CODE
Type of water source, based on whether treatment has taken place. Source type can be:
Finished (FN);
Raw (RW); or
Unknown (null or X).
Sample Type Code
SAMPLE_TYPE_CODE
Type of sample:
CO = Confirmation;
MR = Maximum Residence Time;
RP = Repeat;
RT = Routine;
ST = Split;
MS = Matrix Spike;
TG = Triggered; or
FB = Field Blank.
Laboratory Assigned Identification Number
LABORATORY_ASSIGNED_ID
Unique laboratory identification number, used to link total coliform-positive (TC+) samples with their E. coli / fecal coliform samples.
Six-Year ID
SIX_YEAR_ID
Unique identifier for each analytical result.
Sample Identification Number
SAMPLE_ID
Identifier assigned by the State or the laboratory that uniquely identifies a sample.
Sample Collection Date
SAMPLE_COLLECTION_DATE
Date the sample was collected, including month, day, and year.
Detection Limit Value
DETECTION_LIMIT_VALUE
Limit below which the specific lab indicated it could not reliably measure results for a contaminant with the methods and procedures used by the lab.
Detection Limit Unit
DETECTION_LIMIT_UNIT
Units of the detection limit value.
Detection Limit Code
DETECTION_LIMIT_CODE
Indicates the type of detection limit reported in the Detection Limit Value column (e.g., the Minimum Reporting Level, Laboratory Reporting Level, etc.).
Sample Analytical Result - Sign
DETECT
The sign indicates whether the sample analytical result was:
(0) "less than" means the contaminant was not detected or was
detected at a level "less than" the MRL.
(1) "equal to" means the contaminant was detected at a level "equal to" the value reported in "Sample Analytical Result - Value."
Sample Analytical Result - Value
VALUE
For detections, this field is equal to the actual numeric (decimal) value of the analysis for the chemical result; for non-detections, this field is blank.
Sample Analytical Result - Unit of Measure
UNIT
Unit of measurement for the analytical results reported (usually expressed in either µg/L or mg/L for chemicals, or pCi/L for radionuclides).
Presence Indicator Code
PRESENCE_INDICATOR_CODE
Indication of whether results of an analysis were positive or negative for TC, EC, and FC:
P = Presence;
A = Absence.
Residual Field Free Chlorine
RESIDUAL_FIELD_FREE_CHLORINE_MG_L
Amount of free chlorine residual (in mg/L) found in the water after disinfectant has been applied. These concentrations were measured in the field at the same time and location as coliform samples (TC-EC-FC samples).
Residual Field Total Chlorine
RESIDUAL_FIELD_TOTAL_CHLORINE_MG_L
Amount of total chlorine residual (in mg/L) found in the water after disinfectant has been applied. These concentrations were measured in the field at the same time and location as coliform samples (TC-EC-FC samples).
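Two fields in Exhibit E.1 lend themselves to simple checks after import: the PWSID format (a 2-character state abbreviation or region code followed by 7 digits) and the DETECT/VALUE pair (0 = less than the reporting level with a blank VALUE; 1 = detected at VALUE). The following minimal Python sketch assumes the fields are read as strings and that region codes are alphanumeric; the regular expression and function names are illustrative, not part of the posted data.

```python
import re
from typing import Iterable, Optional, Tuple

# 2-character state abbreviation or region code, then 7 digits (per Exhibit E.1).
PWSID_PATTERN = re.compile(r"^[A-Z0-9]{2}\d{7}$")


def is_valid_pwsid(pwsid: str) -> bool:
    """Check the PWSID layout described in Exhibit E.1."""
    return bool(PWSID_PATTERN.match(pwsid.strip().upper()))


def result_value(detect: str, value: str) -> Optional[float]:
    """Interpret DETECT/VALUE: "0" = non-detect (VALUE blank), "1" = detected at VALUE."""
    code = detect.strip()
    if code == "0":
        return None
    if code == "1":
        return float(value)
    raise ValueError(f"unexpected DETECT code: {detect!r}")


def detection_frequency(rows: Iterable[Tuple[str, str]]) -> float:
    """Share of (DETECT, VALUE) rows that are detections."""
    values = [result_value(d, v) for d, v in rows]
    return sum(v is not None for v in values) / len(values) if values else 0.0


# Illustrative rows only.
print(is_valid_pwsid("XX1234567"))
print(detection_frequency([("1", "0.5"), ("0", ""), ("1", "2")]))
```

Raising an error on an unexpected DETECT code, rather than silently treating it as a non-detect, makes data problems visible early.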
Section 2: SYR 4 Data Records Posted for Phase Chemicals, Lead, Copper and Radionuclides
Exhibit E.2 provides a count of States, the total number of sample records, and the number of systems for each
phase chemical, lead, copper, and radionuclide collected for SYR 4. Contaminant occurrence
data are grouped into zip files, which are indicated in the final column of Exhibit E.2.
Exhibit E.2: Number of Six-Year Review 4 Data Records for Phase Chemicals, Lead, Copper, and Radionuclides and Zip Filename(s)
Contaminant
Analyte ID
Number of States
Total Number of Sample Records
Total Number of Systems
Zip Filename
Phase Chemicals
1,1,1-Trichloroethane
2981
58
491,411
52,207
SYR4_PhaseChem_1 (1,1,1-Trichloroethane to Atrazine).zip
1,1,2-Trichloroethane
2985
58
482,294
52,200
SYR4_PhaseChem_1 (1,1,1-Trichloroethane to Atrazine).zip
1,1-Dichloroethylene
2977
58
508,764
52,206
SYR4_PhaseChem_1 (1,1,1-Trichloroethane to Atrazine).zip
1,2,4-Trichlorobenzene
2378
58
480,039
52,201
SYR4_PhaseChem_1 (1,1,1-Trichloroethane to Atrazine).zip
1,2-Dibromo-3-chloropropane
2931
57
244,298
37,153
SYR4_PhaseChem_1 (1,1,1-Trichloroethane to Atrazine).zip
1,2-Dichloroethane
2980
58
493,514
52,209
SYR4_PhaseChem_1 (1,1,1-Trichloroethane to Atrazine).zip
1,2-Dichloropropane
2983
58
481,065
52,197
SYR4_PhaseChem_1 (1,1,1-Trichloroethane to Atrazine).zip
2,3,7,8-TCDD
2063
42
38,934
6,222
SYR4_PhaseChem_1 (1,1,1-Trichloroethane to Atrazine).zip
2,4,5-TP
2110
58
187,025
40,954
SYR4_PhaseChem_1 (1,1,1-Trichloroethane to Atrazine).zip
2,4-D
2105
58
191,658
41,519
SYR4_PhaseChem_1 (1,1,1-Trichloroethane to Atrazine).zip
Alachlor
2051
58
215,965
42,822
SYR4_PhaseChem_1 (1,1,1-Trichloroethane to Atrazine).zip
Antimony, Total
1074
59
230,942
51,063
SYR4_PhaseChem_1 (1,1,1-Trichloroethane to Atrazine).zip
Arsenic
1005
59
452,852
52,505
SYR4_PhaseChem_1 (1,1,1-Trichloroethane to Atrazine).zip
Asbestos
1094
48
24,124
13,772
SYR4_PhaseChem_1 (1,1,1-Trichloroethane to Atrazine).zip
Atrazine
2050
58
225,827
43,763
SYR4_PhaseChem_1 (1,1,1-Trichloroethane to Atrazine).zip
Barium
1010
59
232,216
52,488
SYR4_PhaseChem_2 (Barium to Cyanide).zip
Benzene
2990
58
487,631
52,207
SYR4_PhaseChem_2 (Barium to Cyanide).zip
Benzo(A)pyrene
2306
58
190,003
35,877
SYR4_PhaseChem_2 (Barium to Cyanide).zip
Beryllium, Total
1075
59
229,630
50,225
SYR4_PhaseChem_2 (Barium to Cyanide).zip
BHC-Gamma
2010
58
195,775
38,843
SYR4_PhaseChem_2 (Barium to Cyanide).zip
Cadmium
1015
59
230,098
50,989
SYR4_PhaseChem_2 (Barium to Cyanide).zip
Carbofuran
2046
58
176,608
37,375
SYR4_PhaseChem_2 (Barium to Cyanide).zip
Carbon Tetrachloride
2982
58
510,599
52,205
SYR4_PhaseChem_2 (Barium to Cyanide).zip
Chlordane
2959
58
189,512
38,310
SYR4_PhaseChem_2 (Barium to Cyanide).zip
Chlorobenzene
2989
58
479,909
52,184
SYR4_PhaseChem_2 (Barium to Cyanide).zip
Chromium
1020
59
238,413
51,357
SYR4_PhaseChem_2 (Barium to Cyanide).zip
cis-1,2-Dichloroethylene
2380
58
495,228
52,210
SYR4_PhaseChem_2 (Barium to Cyanide).zip
Cyanide
1024
57
163,373
38,760
SYR4_PhaseChem_2 (Barium to Cyanide).zip
Dalapon
2031
58
232,471
40,062
SYR4_PhaseChem_3 (Dalapon to Hexachlorocyclopentadiene).zip
Di(2-Ethylhexyl) Adipate
2035
58
192,447
36,369
SYR4_PhaseChem_3 (Dalapon to Hexachlorocyclopentadiene).zip
Di(2-Ethylhexyl) Phthalate
2039
58
202,419
36,486
SYR4_PhaseChem_3 (Dalapon to Hexachlorocyclopentadiene).zip
Dichloromethane
2964
58
487,166
52,222
SYR4_PhaseChem_3 (Dalapon to Hexachlorocyclopentadiene).zip
Dinoseb
2041
58
186,403
40,854
SYR4_PhaseChem_3 (Dalapon to Hexachlorocyclopentadiene).zip
Diquat
2032
54
110,637
22,215
SYR4_PhaseChem_3 (Dalapon to Hexachlorocyclopentadiene).zip
Endothall
2033
51
98,015
18,624
SYR4_PhaseChem_3 (Dalapon to Hexachlorocyclopentadiene).zip
Endrin
2005
58
192,869
38,483
SYR4_PhaseChem_3 (Dalapon to Hexachlorocyclopentadiene).zip
Ethylbenzene
2992
58
487,555
52,200
SYR4_PhaseChem_3 (Dalapon to Hexachlorocyclopentadiene).zip
Ethylene Dibromide
2946
57
243,161
38,371
SYR4_PhaseChem_3 (Dalapon to Hexachlorocyclopentadiene).zip
Fluoride¹
1025
59
435,466
52,202
SYR4_PhaseChem_3 (Dalapon to Hexachlorocyclopentadiene).zip
Glyphosate
2034
55
105,084
21,744
SYR4_PhaseChem_3 (Dalapon to Hexachlorocyclopentadiene).zip
Heptachlor
2065
58
193,927
38,640
SYR4_PhaseChem_3 (Dalapon to Hexachlorocyclopentadiene).zip
Heptachlor Epoxide
2067
58
193,623
38,638
SYR4_PhaseChem_3 (Dalapon to Hexachlorocyclopentadiene).zip
Hexachlorobenzene
2274
58
195,150
38,311
SYR4_PhaseChem_3 (Dalapon to Hexachlorocyclopentadiene).zip
Hexachlorocyclopentadiene
2042
58
196,236
38,471
SYR4_PhaseChem_3 (Dalapon to Hexachlorocyclopentadiene).zip
Mercury
1035
59
226,418
50,990
SYR4_PhaseChem_4 (Hybrid Nitrate to Nitrate).zip
Methoxychlor
2015
58
196,131
38,834
SYR4_PhaseChem_4 (Hybrid Nitrate to Nitrate).zip
Nitrate
1040
59
1,404,609
105,202
SYR4_PhaseChem_4 (Hybrid Nitrate to Nitrate).zip
Nitrate (Hybrid)²
1040/1038
59
1,635,300
127,904
SYR4_PhaseChem_4 (Hybrid Nitrate to Nitrate).zip
Nitrite
1041
59
512,234
73,442
SYR4_PhaseChem_5 (Nitrate-Nitrite to Total Polychlorinated Biphenyls (PCB)
Nitrate-Nitrite
1038
51
561,314
76,530
SYR4_PhaseChem_5 (Nitrate-Nitrite to Total Polychlorinated Biphenyls (PCB)
o-Dichlorobenzene
2968
58
480,075
52,200
SYR4_PhaseChem_5 (Nitrate-Nitrite to Total Polychlorinated Biphenyls (PCB)
Oxamyl
2036
58
175,728
37,235
SYR4_PhaseChem_5 (Nitrate-Nitrite to Total Polychlorinated Biphenyls (PCB)
p-Dichlorobenzene
2969
58
480,247
52,203
SYR4_PhaseChem_5 (Nitrate-Nitrite to Total Polychlorinated Biphenyls (PCB)
Pentachlorophenol
2326
58
201,636
41,094
SYR4_PhaseChem_5 (Nitrate-Nitrite to Total Polychlorinated Biphenyls (PCB)
Picloram
2040
58
188,833
41,375
SYR4_PhaseChem_5 (Nitrate-Nitrite to Total Polychlorinated Biphenyls (PCB)
Selenium
1045
59
232,598
51,317
SYR4_PhaseChem_5 (Nitrate-Nitrite to Total Polychlorinated Biphenyls (PCB)
Simazine
2037
58
220,013
43,211
SYR4_PhaseChem_5 (Nitrate-Nitrite to Total Polychlorinated Biphenyls (PCB)
Styrene
2996
58
479,601
52,187
SYR4_PhaseChem_5 (Nitrate-Nitrite to Total Polychlorinated Biphenyls (PCB)
Tetrachloroethylene
2987
58
544,460
52,210
SYR4_PhaseChem_5 (Nitrate-Nitrite to Total Polychlorinated Biphenyls (PCB)
Thallium, Total
1085
59
229,685
51,007
SYR4_PhaseChem_5 (Nitrate-Nitrite to Total Polychlorinated Biphenyls (PCB)
Toluene
2991
58
488,192
52,348
SYR4_PhaseChem_5 (Nitrate-Nitrite to Total Polychlorinated Biphenyls (PCB)
Total Polychlorinated Biphenyls (PCB)
2383
49
116,454
23,262
SYR4_PhaseChem_5 (Nitrate-Nitrite to Total Polychlorinated Biphenyls (PCB)
Toxaphene
2020
58
183,765
37,419
SYR4_PhaseChem_6 (Toxaphene to Xylenes, total).zip
trans-1,2-Dichloroethylene
2979
58
488,716
52,194
SYR4_PhaseChem_6 (Toxaphene to Xylenes, total).zip
Trichloroethylene
2984
58
540,777
52,222
SYR4_PhaseChem_6 (Toxaphene to Xylenes, total).zip
Vinyl Chloride
2976
58
482,672
52,021
SYR4_PhaseChem_6 (Toxaphene to Xylenes, total).zip
Xylenes, Total
2955
56
412,436
46,720
SYR4_PhaseChem_6 (Toxaphene to Xylenes, total).zip
Lead and Copper
Lead
1030
54
1,552,995
53,058
SYR4_PhaseChem_4 (Hybrid Nitrate to Nitrate).zip
Copper
1022
55
1,579,728
54,224
SYR4_PhaseChem_2 (Barium to Cyanide).zip
Radionuclides
Gross Alpha, Excl. Radon & U
4000
55
64,413
16,925
SYR4_Rads.zip
Gross Beta Particle Activity
4100
50
48,520
11,261
SYR4_Rads.zip
Combined Radium (-226 & -228)
4010
53
86,594
21,972
SYR4_Rads.zip
Combined Uranium
4006
55
97,663
18,491
SYR4_Rads.zip
1 Includes records that passed the QA/QC procedures described in this document. See USEPA (2024c) for additional
information on procedures conducted for the occurrence analysis.
2 Includes all sampling results for nitrate and sampling results for total nitrate plus nitrite for systems for which there
were no SYR 4 nitrate (only) data.
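Footnote 2 describes how the hybrid nitrate dataset is assembled: every nitrate (1040) result is kept, and nitrate-nitrite (1038) results are added only for systems that have no nitrate-only data. This construction is consistent with the hybrid dataset covering more systems (127,904) than nitrate alone (105,202). The minimal Python sketch below assumes records are dictionaries with PWSID and ANALYTE_CODE fields; it illustrates the footnote's rule and is not EPA's actual processing code.

```python
from typing import Dict, List

NITRATE = "1040"
NITRATE_NITRITE = "1038"


def hybrid_nitrate(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """All nitrate results, plus nitrate-nitrite results for systems with no nitrate-only data."""
    has_nitrate = {r["PWSID"] for r in records if r["ANALYTE_CODE"] == NITRATE}
    return [
        r for r in records
        if r["ANALYTE_CODE"] == NITRATE
        or (r["ANALYTE_CODE"] == NITRATE_NITRITE and r["PWSID"] not in has_nitrate)
    ]


# Illustrative records only: system A has both analytes, B has only
# nitrate-nitrite, and C has only nitrite (1041), which is never included.
records = [
    {"PWSID": "A", "ANALYTE_CODE": "1040"},
    {"PWSID": "A", "ANALYTE_CODE": "1038"},
    {"PWSID": "B", "ANALYTE_CODE": "1038"},
    {"PWSID": "C", "ANALYTE_CODE": "1041"},
]
hybrid = hybrid_nitrate(records)
print([(r["PWSID"], r["ANALYTE_CODE"]) for r in hybrid])
```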
Section 3: SYR 4 Data Records Posted for Disinfection Byproducts
Exhibit E.3 provides a count of States, the total number of sample records, and the number of systems for each
regulated disinfection byproduct collected for SYR 4, along with the names of the zip files in which the data
files can be located. These data records were not analyzed under SYR 4 because of ongoing
consideration of potential revisions to the Stage 1 and Stage 2 DBP Rules.
Exhibit E.3: Number of Six-Year Review 4 Data Records for Disinfection Byproducts and Zip Filename(s)
Contaminant
Analyte ID
Number of States
Total Number of Sample Records
Total Number of Systems
Zip Filename
Disinfection Byproducts - Full Datasets
Total Trihalomethanes (TTHM)
2950
57
1,089,557
46,297
SYR4_THMs.zip
Dibromochloromethane
2944
46
981,059
47,172
SYR4_THMs.zip
Bromoform
2942
46
976,412
47,129
SYR4_THMs.zip
Chloroform
2941
46
981,289
47,403
SYR4_THMs.zip
Bromodichloromethane
2943
46
977,561
47,196
SYR4_THMs.zip
Haloacetic Acids (HAA5)
2456
57
1,005,235
43,577
SYR4_HAAs.zip
Dibromoacetic Acid
2454
44
720,986
36,121
SYR4_HAAs.zip
Dichloroacetic Acid
2451
44
721,017
36,134
SYR4_HAAs.zip
Monochloroacetic Acid
2450
44
720,474
36,113
SYR4_HAAs.zip
Trichloroacetic Acid
2452
44
720,706
36,125
SYR4_HAAs.zip
Monobromoacetic Acid
2453
44
720,595
36,095
SYR4_HAAs.zip
Bromate
1011
38
23,298
444
S YR4_B ro mate_C h I o rite.
zip
Chlorite
1009
33
87,995
514
S YR4_B ro mate_C h I o rite.
zip
Note: A larger share of the records is speciated for TTHM than for HAA5 (more than 90% vs. more than 70%). Two more States provided speciated THM results than provided speciated HAA results. About 11,000 systems had speciated THM data but no speciated HAA data, whereas only about 200 systems had speciated HAA data but no speciated THM data. In addition, more PWSs provided speciated THM data than provided TTHM data: about 8,000 systems have data for the speciated THMs but not for TTHM, whereas only about 7,000 systems have data for TTHM but not for the speciated THMs.
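The system counts in the note above are set differences over the PWSIDs that report each analyte group. A minimal sketch, assuming each group has already been reduced to a Python set of PWSIDs; the function name and the IDs below are hypothetical placeholders, not SYR 4 records:

```python
def coverage_gaps(group_a: set[str], group_b: set[str]) -> dict[str, int]:
    """Count systems that report only group A, only group B, or both."""
    return {
        "a_only": len(group_a - group_b),  # e.g., speciated THMs but no TTHM
        "b_only": len(group_b - group_a),  # e.g., TTHM but no speciated THMs
        "both": len(group_a & group_b),
    }


# Hypothetical PWSIDs for illustration only.
speciated_thm_systems = {"XX0000001", "XX0000002", "XX0000003"}
tthm_systems = {"XX0000002", "XX0000003", "XX0000004", "XX0000005"}

gaps = coverage_gaps(speciated_thm_systems, tthm_systems)
print(gaps)  # {'a_only': 1, 'b_only': 2, 'both': 2}

# The difference between the two system totals always equals the
# difference between the two "only" counts.
assert len(speciated_thm_systems) - len(tthm_systems) == gaps["a_only"] - gaps["b_only"]
```

That identity is why roughly 8,000 "speciated THMs only" systems versus roughly 7,000 "TTHM only" systems goes together with the speciated THM system counts in Exhibit E.3 slightly exceeding the TTHM system count.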
Section 4: SYR 4 Data Records Posted for Disinfection Byproduct Related Parameters
The DBP-related data include total organic carbon (TOC), alkalinity, pH, dissolved organic carbon (DOC), specific UV absorbance (SUVA), and UV absorbance. Full datasets for TOC and alkalinity (i.e., text files including all individual sample analytical results for TOC and alkalinity) are included. In addition to these full datasets, EPA created a paired TOC-alkalinity dataset that pairs, for each treatment plant (listed in Exhibit E.2 as a water system facility), the average monthly raw water concentrations of TOC and alkalinity with the corresponding average monthly finished water concentration of TOC. EPA created the paired dataset to evaluate percent TOC removal using the SYR 4 data, joining the average monthly TOC concentration with the average monthly alkalinity concentration for individual water system facilities where possible. This paired dataset is directly related to the treatment technique requirements for TOC removal under the Stage 1 DBPR. EPA produced these datasets to support ongoing consideration of potential revisions to the Stage 1 DBP Rule (85 FR 61680; USEPA, 2020) but did not analyze these data records under the SYR 4 effort. Historical efforts to evaluate paired TOC-alkalinity data are described in the Six-Year Review 3 Technical Support Document for Disinfectants/Disinfection Byproducts Rules (USEPA, 2016).
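The monthly averaging and join behind the paired TOC-alkalinity records can be sketched as follows. This is a minimal illustration, not EPA's procedure: the record layout, the "raw"/"finished" and "TOC"/"ALK" labels, and the rule of keeping only facility-months that have all three averages are assumptions. Percent TOC removal is computed as 100 × (raw TOC − finished TOC) / raw TOC.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical record layout:
# (facility_id, year, month, water_type, parameter, value_mg_per_l)
# where water_type is "raw" or "finished" and parameter is "TOC" or "ALK".


def monthly_means(records):
    """Average all results sharing a facility, year, month, water type, and parameter."""
    groups = defaultdict(list)
    for facility, year, month, water_type, parameter, value in records:
        groups[(facility, year, month, water_type, parameter)].append(value)
    return {key: mean(values) for key, values in groups.items()}


def pair_toc_alkalinity(records):
    """Join raw TOC, raw alkalinity, and finished TOC monthly averages per facility."""
    means = monthly_means(records)
    facility_months = sorted({key[:3] for key in means})
    paired = []
    for facility, year, month in facility_months:
        raw_toc = means.get((facility, year, month, "raw", "TOC"))
        raw_alk = means.get((facility, year, month, "raw", "ALK"))
        fin_toc = means.get((facility, year, month, "finished", "TOC"))
        # Assumption: keep only facility-months where all three averages exist.
        if raw_toc is None or raw_alk is None or fin_toc is None:
            continue
        removal = 100 * (raw_toc - fin_toc) / raw_toc if raw_toc > 0 else None
        paired.append({
            "facility": facility, "year": year, "month": month,
            "avg_raw_toc": raw_toc, "avg_raw_alk": raw_alk,
            "avg_finished_toc": fin_toc, "toc_removal_pct": removal,
        })
    return paired


example = [
    ("F1", 2015, 6, "raw", "TOC", 4.0), ("F1", 2015, 6, "raw", "TOC", 6.0),
    ("F1", 2015, 6, "raw", "ALK", 50.0), ("F1", 2015, 6, "raw", "ALK", 70.0),
    ("F1", 2015, 6, "finished", "TOC", 2.0), ("F1", 2015, 6, "finished", "TOC", 3.0),
    ("F2", 2015, 6, "raw", "TOC", 3.0),  # no alkalinity or finished TOC: not paired
]
print(pair_toc_alkalinity(example)[0]["toc_removal_pct"])  # 50.0
```

Averaging before joining means each facility-month contributes one paired row regardless of how many individual samples were reported that month.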
Exhibit E.4 provides the number of States, total number of sample records, and number of systems for TOC (raw and finished), alkalinity, pH, DOC, SUVA, and UV absorbance. Systems with raw TOC samples and systems with finished TOC samples are counted separately, so a system with samples in both categories is counted twice. The "full" TOC dataset contains only the raw/finished water designations provided in the original State data (see SOURCE_TYPE_CODE). For the "paired" TOC-alkalinity dataset, however, EPA applied the following logic to assign a raw or finished water designation to records that lacked one. Raw samples are samples taken at source water sampling points: records are marked as raw if SOURCE_TYPE_CODE equals "RW", or if SOURCE_TYPE_CODE is NULL and the water system facility type code equals "IG", "IN", "RS", "SP", "WL", or "CC". Records are marked as finished if SOURCE_TYPE_CODE equals "FN", or if SOURCE_TYPE_CODE is NULL and the water system facility type code equals "CW", "DS", "PF", "ST", "TM", or "TP". Exhibit E.5 lists the data elements and column names in the "paired" TOC-alkalinity dataset, with a brief description of each. For the data elements in the "full" TOC, alkalinity, and pH datasets, refer to Exhibit E.1.
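The raw/finished assignment rule above translates directly into a lookup function. This sketch mirrors the codes listed in the text; the function name and the use of None to represent a NULL code are illustrative assumptions, and records that match neither rule are left unassigned:

```python
from typing import Optional

# Water system facility type codes listed in the text for each designation.
RAW_FACILITY_TYPES = {"IG", "IN", "RS", "SP", "WL", "CC"}
FINISHED_FACILITY_TYPES = {"CW", "DS", "PF", "ST", "TM", "TP"}


def assign_water_designation(source_type_code: Optional[str],
                             facility_type_code: Optional[str]) -> Optional[str]:
    """Return "raw", "finished", or None (unassigned) for a TOC or alkalinity record."""
    # A State-reported source type code takes precedence.
    if source_type_code == "RW":
        return "raw"
    if source_type_code == "FN":
        return "finished"
    # Fall back to the facility type only when the source type code is NULL.
    if source_type_code is None:
        if facility_type_code in RAW_FACILITY_TYPES:
            return "raw"
        if facility_type_code in FINISHED_FACILITY_TYPES:
            return "finished"
    return None


print(assign_water_designation(None, "WL"))  # raw
print(assign_water_designation("FN", "WL"))  # finished
print(assign_water_designation(None, "XX"))  # None
```

Because the facility type is consulted only when the source type code is NULL, a non-NULL code other than "RW" or "FN" leaves the record unassigned even at a raw-type facility.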
Exhibit E.4: Number of Six-Year Review 4 Data Records for TOC, Alkalinity, pH, DOC, SUVA, and UV-absorbance and Zip Filename(s)
Contaminant | Analyte ID | Number of States | Total Number of Sample Records | Total Number of Systems | Zip Filename
Disinfection Byproduct Related Parameters - Full Datasets
Total Organic Carbon (TOC) | 2920 | 49 | 440,197 | 3,156 | SYR4_DBP_Related Parameters.zip
Raw TOC | 2920 | 42 | 188,358 | 2,494 | SYR4_DBP_Related Parameters.zip
Finished TOC | 2920 | 38 | 155,558 | 1,999 | SYR4_DBP_Related Parameters.zip
Alkalinity | 1927 | 51 | 429,397 | 18,140 | SYR4_DBP_Related Parameters.zip
pH | 1925 | 52 | 632,821 | 28,660 | SYR4_DBP_Related Parameters.zip
SUVA | 2923 | 2 | 8,026 | 59 | SYR4_DBP_Related Parameters.zip
UV-254 | 2922 | 3 | 6,061 | 60 | SYR4_DBP_Related Parameters.zip
DOC | 2919 | 3 | 5,908 | 76 | SYR4_DBP_Related Parameters.zip
Disinfection Byproduct Related Parameters - Paired Dataset
Paired TOC-alkalinity record | N/A | 33 | 92,666 | 1,192 | SYR4_DBP_Related Parameters.zip
Exhibit E.5: Paired TOC-Alkalinity Dataset Field Names and Definitions
Data Element | Column Name | Description
Public Water System Identification Number (PWSID) | NUMBER0 | The code used to identify each PWS. The code begins with the standard 2-character postal State abbreviation or region code; the remaining 7 numbers are unique to each PWS in the State.
Sample Collection Date (Month) | Month | Month (1 through 12).
Sample Collection Date (Year) | Year | Year (2012 through 2019).
Retail Population Served | Population Served | Retail population served by the water system.
Federal Public Water System Type Code | System Type | Water system type according to federal requirements: C = Community water system; NTNC = Non-transient non-community water system.
Source Water Type | Source Water Type | Primary water source for the water system: GU = Ground Water Under Direct Influence of Surface Water; GW = Ground Water; GWP = Purchased Ground Water; SW = Surface Water; SWP = Purchased Surface Water.
Facility Identification Code | Water Facility ID | Unique identifier for each water system facility.
State Facility Identification Code | State Facility ID | Identifier for each water system facility that is unique within a particular State.
State Assigned Identification Code | State Assigned ID | A State-assigned value that identifies the water system facility.
Raw water TOC average concentration | Avg Of Raw TOC (mg/L) | Monthly average total organic carbon (TOC) concentration in raw water, in mg/L.
Raw water alkalinity average concentration | Avg Of Raw Alkalinity (mg/L) | Monthly average alkalinity concentration in raw water, in mg/L.
Finished water TOC average concentration | Avg Of Finished TOC (mg/L) | Monthly average total organic carbon (TOC) concentration in finished water, in mg/L.
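The PWSID description in Exhibit E.5 (repeated in Exhibits E.9 and E.11) implies a simple structure: a two-character State or region prefix followed by seven characters. A minimal parsing sketch, assuming the prefix is either two letters or two digits and the remainder is seven digits; this pattern is an assumption, so values that do not match raise an error rather than being silently dropped. The example IDs are hypothetical:

```python
import re

# Assumed pattern: two-letter postal code or two-digit region code, then seven digits.
PWSID_PATTERN = re.compile(r"([A-Z]{2}|\d{2})(\d{7})")


def split_pwsid(pwsid: str) -> tuple[str, str]:
    """Split a PWSID into its (prefix, system number) parts."""
    match = PWSID_PATTERN.fullmatch(pwsid.strip().upper())
    if match is None:
        raise ValueError(f"unexpected PWSID format: {pwsid!r}")
    return match.group(1), match.group(2)


print(split_pwsid("TX1234567"))  # ('TX', '1234567')
print(split_pwsid("081234567"))  # ('08', '1234567')
```

Grouping records by the returned prefix gives a quick check of which States or regions contributed to a dataset.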
Section 5: SYR 4 Data Records Posted for Microbial Contaminants, Microbial Related Parameters, and Disinfectant Residuals
Data for three microbial contaminants, total coliforms (TC), Escherichia coli (EC), and fecal coliforms (FC), were collected for 2012 through 2019 for SYR 4. Because of the large volume of data, the TC dataset is divided into separate files for each year. Unlike the TC records, the EC and FC records are not divided by year and are contained in one file; the EC dataset is a single large file intended for use in Microsoft Access or R. Under the Surface Water Treatment Rule, systems are required to monitor for disinfectant residuals at the same times and locations as they monitor for TC under the TCR/RTCR. Most States submitted system data that included free and total residual chlorine results paired with TC records. However, some States provided the residual monitoring data in separate data files or did not submit that information under the SYR 4 ICR.
Exhibit E.6 provides the number of States, total number of sample records, and number of systems for TC, EC, FC, and disinfectant residuals. Exhibit E.6 also shows that some States submitted chlorine residual monitoring results separately under different analyte codes (e.g., Chlorine (Analyte ID 0999), Residual Chlorine (Analyte ID 1012), and Free Residual Chlorine (Analyte ID 1013)). To maximize the number of paired total coliform and chlorine residual records, EPA took additional steps to add records from States that reported residual data separately (see Section 5.5.2 of the main text for details on pairing and the analytes used). The "full" data files in Exhibit E.6 contain these paired records as well as records for systems that reported microbial indicator presence or absence but no associated disinfectant residual information.
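Section 5.5.2 of the main text documents the actual pairing, and its join keys are not repeated here. As a hypothetical sketch only, assuming a TC record and a separately reported residual are matched on PWSID, sample date, and sample point (assumed keys and field names, not the ICR schema), a left join that keeps unpaired TC records, as the "full" files do, might look like this:

```python
from collections import defaultdict

# Separately reported residual analytes named in the text.
RESIDUAL_ANALYTE_IDS = {"0999", "1012", "1013"}


def attach_residuals(tc_records, residual_records):
    """Left-join residual results onto TC records by (pwsid, sample_date, sample_point)."""
    index = defaultdict(list)
    for r in residual_records:
        if r["analyte_id"] in RESIDUAL_ANALYTE_IDS:
            index[(r["pwsid"], r["sample_date"], r["sample_point"])].append(r["value_mg_l"])
    joined = []
    for tc in tc_records:
        key = (tc["pwsid"], tc["sample_date"], tc["sample_point"])
        # Unmatched TC records are kept with an empty residual list.
        joined.append({**tc, "residuals_mg_l": list(index.get(key, []))})
    return joined


# Hypothetical records for illustration only.
tc = [
    {"pwsid": "XX0000001", "sample_date": "2016-05-02", "sample_point": "DS-1", "result": "A"},
    {"pwsid": "XX0000001", "sample_date": "2016-05-09", "sample_point": "DS-1", "result": "A"},
]
residuals = [
    {"pwsid": "XX0000001", "sample_date": "2016-05-02", "sample_point": "DS-1",
     "analyte_id": "1013", "value_mg_l": 0.8},
]
full = attach_residuals(tc, residuals)
paired = [rec for rec in full if rec["residuals_mg_l"]]
print(len(full), len(paired))  # 2 1
```

Filtering the joined list to records with at least one residual corresponds to the idea of a "paired" subset; the real matching rules may differ.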
To assist users, EPA produced "paired" TC, EC, and FC data files (Exhibit E.6), which contain only those records from the "full" versions of the datasets that include paired residual information. EPA did not analyze the "paired" data files under SYR 4 because of ongoing consideration of potential revisions to the Surface Water Treatment Rules.
Note that the TC, EC, and FC datasets contain monitoring records collected under the TCR/RTCR for systems of all source water types. The HPC, Giardia, disinfectant residual, and paired TC/EC/FC disinfectant residual files contain monitoring records collected under the SWTRs. See Exhibit E.1 for a description of field names.
Exhibit E.6: Number of Six-Year Review 4 Data Records for Microbial Contaminants, Microbial Related Parameters, and Disinfectant Residuals and Zip Filename(s)
Contaminant | Analyte ID | Number of States/Entities with Data | Total Number of Sample Records | Total Number of Systems | Zip Filename
Microbial Contaminants and Disinfectant Residual - Full Datasets
Total Coliform (2012) | 3100 | 54 | 2,349,687 | 102,423 | SYR4_TC.zip
Total Coliform (2013) | 3100 | 54 | 2,398,740 | 102,713 | SYR4_TC.zip
Total Coliform (2014) | 3100 | 56 | 2,521,212 | 105,515 | SYR4_TC.zip
Total Coliform (2015) | 3100 | 56 | 2,513,937 | 104,532 | SYR4_TC.zip
Total Coliform (2016) | 3100 | 57 | 2,656,932 | 113,099 | SYR4_TC.zip
Total Coliform (2017) | 3100 | 57 | 2,780,743 | 114,328 | SYR4_TC.zip
Total Coliform (2018) | 3100 | 57 | 2,849,385 | 114,954 | SYR4_TC.zip
Total Coliform (2019) | 3100 | 57 | 2,675,476 | 111,385 | SYR4_TC.zip
E. coli (EC) | 3014 | 57 | 7,175,363 | 93,728 | SYR4_EC_FC_HPC_Giardia.zip
E. coli (EC) In Raw Water¹ | 3014 | 43 | 65,805 | 19,515 | SYR4_EC_FC_HPC_Giardia.zip
E. coli (EC) In Distribution Systems² | 3014 | 49 | 6,346,973 | 90,607 | SYR4_EC_FC_HPC_Giardia.zip
E. coli (EC) In Unknown Sampling Location³ | 3014 | 54 | 762,585 | 24,486 | SYR4_EC_FC_HPC_Giardia.zip
Fecal Coliform (FC) | 3013 | 40 | 16,818 | 1,835 | SYR4_EC_FC_HPC_Giardia.zip
Coliphage | 3028 | 2 | 3 | 3 | SYR4_EC_FC_HPC_Giardia.zip
Enterococci | 3002 | 3 | 8 | 4 | SYR4_EC_FC_HPC_Giardia.zip
Cryptosporidium | 3015 | 29 | 19,542 | 740 | SYR4_EC_FC_HPC_Giardia.zip
Heterotrophic Bacteria (HPC) | 3001 | 16 | 135,081 | 595 | SYR4_EC_FC_HPC_Giardia.zip
Giardia Lamblia | 3008 | 15 | 4,628 | 229 | SYR4_EC_FC_HPC_Giardia.zip
Legionella |  | 0 | 0 | 0 | N/A
Chlorine⁴ | 0999 | 19 | 6,100,133 | 4,438 | SYR4_Disinfectant_Residuals.zip
Total Chlorine⁴ | 1000 | 1 | 125,788 | 741 | SYR4_Disinfectant_Residuals.zip
Chloramine⁴ | 1006 | 9 | 78,664 | 198 | SYR4_Disinfectant_Residuals.zip
Residual Chlorine⁴ | 1012 | 4 | 179,599 | 572 | SYR4_Disinfectant_Residuals.zip
Free Residual Chlorine⁴ | 1013 | 3 | 2,000,997 | 4,044 | SYR4_Disinfectant_Residuals.zip
Chlorine Dioxide | 1008 | 9 | 12,752 | 28 | SYR4_Disinfectant_Residuals.zip
Microbes and Associated Disinfectant Residuals - Paired Datasets⁵
E. coli (EC) with Associated Disinfectant Residuals | 3014 | 49 | 3,079,032 | 28,091 | SYR4_Paired Microbes_DR.zip
Fecal Coliform (FC) with Associated Disinfectant Residuals | 3013 | 24 | 5,966 | 534 | SYR4_Paired Microbes_DR.zip
Total Coliform (TC) paired with Associated Disinfectant Residuals (2012) | 3100 | 43 | 1,165,209 | 30,950 | SYR4_Paired Microbes_DR.zip
Total Coliform (TC) paired with Associated Disinfectant Residuals (2013) | 3100 | 44 | 1,173,926 | 31,132 | SYR4_Paired Microbes_DR.zip
Total Coliform (TC) paired with Associated Disinfectant Residuals (2014) | 3100 | 46 | 1,218,722 | 31,865 | SYR4_Paired Microbes_DR.zip
Total Coliform (TC) paired with Associated Disinfectant Residuals (2015) | 3100 | 47 | 1,241,995 | 31,880 | SYR4_Paired Microbes_DR.zip
Total Coliform (TC) paired with Associated Disinfectant Residuals (2016) | 3100 | 48 | 1,274,211 | 34,654 | SYR4_Paired Microbes_DR.zip
Total Coliform (TC) paired with Associated Disinfectant Residuals (2017) | 3100 | 50 | 1,331,868 | 37,217 | SYR4_Paired Microbes_DR.zip
Total Coliform (TC) paired with Associated Disinfectant Residuals (2018) | 3100 | 50 | 1,480,354 | 41,053 | SYR4_Paired Microbes_DR.zip
Total Coliform (TC) paired with Associated Disinfectant Residuals (2019) | 3100 | 50 | 1,498,050 | 38,029 | SYR4_Paired Microbes_DR.zip
1 Includes results with a sample type code of "TG" (i.e., triggered monitoring). Note that these record counts are
subsets of the E. coli records included in the E. coli data set.
2 Includes results not marked as triggered that had a sample point type of "DS", "FC", "FN", "LD", "MD", or "MR", or that had a water facility type of "CC", "DS", "TP", or "TM" and a sample point type of "WS" or null. Note that these record counts are subsets of the E. coli records included in the E. coli data set.
3 Includes remaining E. coli results not identified as coming from raw water or the distribution system. Note that these
record counts are subsets of the E. coli records included in the E. coli data set.
4 Reported independently of the coliform sample results.
5 Refer to Section 5.5.2 for more details on the paired disinfectant residual and total coliform records.
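Footnotes 1 through 3 define an ordered rule for assigning each E. coli result to one sampling location category. A minimal sketch of that rule; the parameter names are hypothetical, and None stands in for a null code:

```python
from typing import Optional

DISTRIBUTION_POINT_TYPES = {"DS", "FC", "FN", "LD", "MD", "MR"}
DISTRIBUTION_FACILITY_TYPES = {"CC", "DS", "TP", "TM"}


def ec_location_category(sample_type: Optional[str],
                         sample_point_type: Optional[str],
                         facility_type: Optional[str]) -> str:
    """Classify an E. coli result as raw water, distribution system, or unknown."""
    # Footnote 1: triggered monitoring ("TG") results are counted as raw water.
    if sample_type == "TG":
        return "raw"
    # Footnote 2: non-triggered results from distribution-type sample points...
    if sample_point_type in DISTRIBUTION_POINT_TYPES:
        return "distribution"
    # ...or from listed facility types with a "WS" or null sample point type.
    if facility_type in DISTRIBUTION_FACILITY_TYPES and sample_point_type in {"WS", None}:
        return "distribution"
    # Footnote 3: all remaining results.
    return "unknown"


print(ec_location_category("TG", "DS", "TP"))  # raw
print(ec_location_category("RT", None, "TP"))  # distribution
print(ec_location_category("RT", "WS", "WL"))  # unknown
```

Because every result falls into exactly one category, the three subset counts in Exhibit E.6 add up to the E. coli total: 65,805 + 6,346,973 + 762,585 = 7,175,363.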
Section 6: SYR 4 Data Records Posted for Aircraft Drinking Water Rule (ADWR)
EPA downloaded compliance data from the Agency's Aircraft Reporting and Compliance System (ARCS) for the period from February 2011 to May 2021. This dataset includes aircraft compliance monitoring data for TC and EC for aircraft drinking water systems (Exhibit E.7). The Aircraft PWS Inventory file includes records for 8,627 unique aircraft drinking water systems. Details on the QA/QC procedures for these data can be found in Appendix D of this document. Note that the numbers of sample records presented below and included in the posted data reflect counts before the QA/QC procedures were applied for the SYR 4 analyses presented in USEPA (2024b). After the QA/QC steps described in Appendix D are applied, 140,502 total coliform records and 92,994 E. coli records remain.
Exhibit E.7: Number of Aircraft Drinking Water Rule (ADWR) Data Records and Zip Filename
Contaminant | Total Number of Sample Records | Total Number of Systems | Zip Filename
Aircraft PWS Sample by Air Carrier and Results
Total Coliform | 212,937 | 8,094 | SYR4_ADWR Compliance Data.zip
E. coli² | 93,011 | 7,091 | SYR4_ADWR Compliance Data.zip
1 The number of records presented here is greater than the number of rows of data downloaded from ARCS (70,979 at the time of download in support of the SYR 4 analysis) because it counts all samples within each row of data (i.e., Sample 1, Sample 2, and Sample 3). The Sample 3 fields allow a third sample to be recorded; a third sample is not required under the ADWR and is rarely collected, so the Sample 3 fields typically contain no data.
2 The count of E. coli records and systems is based on all E. coli samples listed as either "present" or "absent." It does not include samples listed as "not speciated" or "not analyzed."
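Footnotes 1 and 2 apply two different counting rules to the same wide ARCS rows. A sketch, assuming each downloaded row is a dictionary keyed by column names modeled on Exhibit E.8 (the exact key spellings used here are assumptions). Consistent with footnote 1, every Sample 1 to 3 slot counts toward total coliform; note that 3 × 70,979 = 212,937, the Total Coliform record count in Exhibit E.7. Per footnote 2, only "Present" or "Absent" results count toward E. coli:

```python
SAMPLE_SLOTS = ("Sample1", "Sample2", "Sample3")
VALID_EC_RESULTS = {"Present", "Absent"}


def explode_row(row):
    """Turn one wide ARCS row into one record per sample slot."""
    for slot in SAMPLE_SLOTS:
        yield {
            "pwsid": row["PWS ID"],
            "slot": slot,
            "total_coliform": row.get(f"{slot} Total Coliform"),
            "e_coli": row.get(f"{slot} E. coli"),
        }


def count_adwr_records(rows):
    """Return (TC records, EC records, EC systems) using the footnote rules."""
    samples = [s for row in rows for s in explode_row(row)]
    ec_samples = [s for s in samples if s["e_coli"] in VALID_EC_RESULTS]
    return len(samples), len(ec_samples), len({s["pwsid"] for s in ec_samples})


# Hypothetical rows for illustration only.
rows = [
    {"PWS ID": "XX0000001", "Sample1 Total Coliform": "Present", "Sample1 E. coli": "Absent",
     "Sample2 Total Coliform": "Absent"},
    {"PWS ID": "XX0000002", "Sample1 Total Coliform": "Present",
     "Sample1 E. coli": "Did not speciate"},
]
print(count_adwr_records(rows))  # (6, 1, 1)
```

Reshaping to one record per slot first keeps both counting rules as simple filters over the same list.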
Exhibit E.8, Data Dictionary Aircraft Drinking Water Rule (ADWR) Dataset, lists the data elements and column names included in the ADWR data text files, with a brief description of each data element.
Exhibit E.8: Data Dictionary Aircraft Drinking Water Rule (ADWR) Dataset
Data Element | Column Name | Description
PWS Inventory
Official FAA Corporate Name | FAA Corporate Name | The name of the air carrier or operator as registered with the FAA.
FAA Designator | FAA Designator | The four-character designator assigned to the air carrier by the FAA.
PWS ID | PWS ID | The aircraft public water system identification number (PWSID) used by EPA to uniquely identify the aircraft public water system (PWS).
FAA Aircraft Registry No | FAA Registry No. | The number for the aircraft that is registered with the Federal Aviation Administration (FAA), commonly referred to as the N-number or tail number.
Aircraft Activity Status Code | Status | The activity status of the aircraft, selected from a drop-down list. Permissible values are [Active] or [Inactive].
Routine Disinfection and Flushing Frequency | D&FFrequency | The frequency of routine disinfection and flushing scheduled for this aircraft.
Routine Sample Frequency | SamplingFrequency | The frequency of routine coliform sampling scheduled for this aircraft.
Aircraft Manufacturer | Manufacturer | The manufacturer of the aircraft.
Aircraft Model | Model | The manufacturer's model of the aircraft.
Seating Capacity | Seat Capacity | The number of passenger seats configured for the aircraft, with a maximum value of 999.
Samples by Air Carrier
Official FAA Corporate Name | FAA Corporate Name | The name of the air carrier or operator as registered with the FAA.
FAA Aircraft Registry No | FAA Registry No. | The number for the aircraft that is registered with the Federal Aviation Administration (FAA), commonly referred to as the N-number or tail number.
PWSID | PWS ID | The aircraft public water system identification number (PWSID) used by EPA to uniquely identify the aircraft public water system (PWS).
Routine Sample Frequency | Routine Sample Frequency | The frequency of routine coliform sampling scheduled for this aircraft.
Sample Type | Sample Type | Indicates the type of individual sample: routine, repeat, follow-up, or special.
Date and Time Collected | Sample Taken On | The date and time the sample was collected. When the galley and lavatory samples are collected on the same day, the date and time the first sample was collected is used. The required format is MM/DD/YYYY with time reported on a 24-hour clock as HH:MI (e.g., 12/01/2014 15:00).
Date and Time Results Received | Samples Results On | The date and time the sample analysis results were received from the laboratory (e.g., phone message, USPS delivery date, office date and time stamp, e-mail receipt date and time). The required format is MM/DD/YYYY with time reported on a 24-hour clock as HH:MI (e.g., 12/01/2014 15:00).
Sample Collection Location (Sample 1) | Sample1 Location | The location on the aircraft from which the first sample was collected. The options are [galley] or [lavatory].
Total Coliform Result (Sample 1) | Sample1 Total Coliform | The reported lab result that indicates the presence or absence of total coliform in the first sample analyzed. The drop-down list options are [Present] or [Absent].
E. coli Result (Sample 1) | Sample1 E.Coli | The lab analytical result that indicates the presence or absence of E. coli in the first sample analyzed. The drop-down list options are [Present], [Absent], or [Did not speciate]. "Did not speciate" is used when the lab did not analyze a TC+ sample (i.e., a "present" sample result) for E. coli. Note: Certified labs are required to analyze all TC+ samples for E. coli, but it is the carrier's responsibility to make sure the lab completed the speciation.
Sample Collection Location (Sample 2) | Sample2 Location | The location on the aircraft from which the second sample was collected. The options are [galley] or [lavatory].
Total Coliform Result (Sample 2) | Sample2 Total Coliform | The reported lab result that indicates the presence or absence of total coliform in the second sample analyzed. The drop-down list options are [Present] or [Absent].
E. coli Result (Sample 2) | Sample2 E. coli | The lab analytical result that indicates the presence or absence of E. coli in the second sample analyzed. The drop-down list options are [Present], [Absent], or [Did not speciate]. "Did not speciate" is used when the lab did not analyze a TC+ sample (i.e., a "present" sample result) for E. coli. Note: Certified labs are required to analyze all TC+ samples for E. coli, but it is the carrier's responsibility to make sure the lab completed the speciation.
Sample Collection Location (Sample 3) | Sample3 Location | The location on the aircraft from which the third sample was collected. The options are [galley] or [lavatory].
Total Coliform Result (Sample 3) | Sample3 Total Coliform | The reported lab result that indicates the presence or absence of total coliform in the third sample analyzed. The drop-down list options are [Present] or [Absent].
E. coli Result (Sample 3) | Sample3 E. coli | The lab analytical result that indicates the presence or absence of E. coli in the third sample analyzed. The drop-down list options are [Present], [Absent], or [Did not speciate]. "Did not speciate" is used when the lab did not analyze a TC+ sample (i.e., a "present" sample result) for E. coli. Note: Certified labs are required to analyze all TC+ samples for E. coli, but it is the carrier's responsibility to make sure the lab completed the speciation.
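The Sample Taken On and Samples Results On fields share the MM/DD/YYYY HH:MI format, which corresponds to the Python `datetime.strptime` format string "%m/%d/%Y %H:%M". A small sketch that parses both fields and computes laboratory turnaround time; the function names are illustrative, not part of the ADWR dataset:

```python
from datetime import datetime

ADWR_TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M"  # MM/DD/YYYY on a 24-hour clock


def parse_adwr_timestamp(text: str) -> datetime:
    """Parse an ADWR date-time string such as "12/01/2014 15:00"."""
    return datetime.strptime(text.strip(), ADWR_TIMESTAMP_FORMAT)


def turnaround_hours(taken_on: str, results_on: str) -> float:
    """Hours between sample collection and receipt of laboratory results."""
    delta = parse_adwr_timestamp(results_on) - parse_adwr_timestamp(taken_on)
    return delta.total_seconds() / 3600


print(parse_adwr_timestamp("12/01/2014 15:00"))  # 2014-12-01 15:00:00
print(turnaround_hours("12/01/2014 15:00", "12/02/2014 09:30"))  # 18.5
```

A negative turnaround would indicate a results date earlier than the collection date, which is one simple consistency check on these two fields.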
Section 7: Additional Data Collected under SYR 4 ICR
The SYR 4 ICR also collected additional data related to certain microbial rules, including calculated compliance values and corrective action information. Note that these data did not undergo the same quality assurance evaluations as the rest of the data.
Calculated Compliance Values
Exhibit E.9 summarizes the data elements in the calculated compliance values related to Cryptosporidium binning information from the SYR 4 ICR database. Exhibit E.10 summarizes the systems and States that provided SYR 4 Cryptosporidium binning data.
Exhibit E.9: Data Dictionary of Cryptosporidium Binning Information Included as Part of the Calculated Compliance Values Table (Filename: SYR4_CryptoBinning)
Data Element | Column Name | Description
Contaminant Identification Code | ANALYTE_CODE | 4-digit Safe Drinking Water Information System (SDWIS) contaminant identification number for which the sample is being analyzed.
Contaminant Name | ANALYTE_NAME | Common name of the contaminant for which the sample is being analyzed.
Public Water System Identification Number (PWSID) | PWSID | The code used to identify each PWS. The code begins with the standard 2-character postal State abbreviation or region code; the remaining 7 numbers are unique to each PWS in the State.
Facility Identification Code | WATER_FACILITY_ID | A unique identifier for each water system facility.
Compliance Period Begin Date | CP_PRD_BEGIN_DT | Compliance period begin date.
Compliance Period End Date | CP_PRD_END_DT | Compliance period end date.
Bin Number | BIN_NUMBER | The bin assignment for the period of time covered by the average.
Exhibit E.10: Six-Year Review 4 Data Summary for Calculated Compliance Values Related to Cryptosporidium Binning Information
Number of States | Total Number of Sample Records | Total Number of Systems | Zip Filename
23 | 27,812 | 486 | SYR4_CryptoBinning.zip
Corrective Actions
Exhibit E.11 summarizes the data elements in the corrective actions table within the SYR 4 ICR database. Exhibit E.12 summarizes the corrective action data collected as part of SYR 4. Note, however, that EPA did not evaluate the specific types of corrective actions (e.g., those related to sanitary surveys) as part of SYR 4.
Exhibit E.11: Corrective Actions Data Dictionary (Filename: SYR4_CorrectiveActions)
Data Element | Column Name | Description
Corrective Action ID | CORACT_ID | Unique identifier for each corrective action.
Public Water System Identification Number (PWSID) | PWSID | The code used to identify each PWS. The code begins with the standard 2-character postal State abbreviation or region code; the remaining 7 numbers are unique to each PWS in the State.
State Code | STATE_CODE | State in which the system is located, using the State's two-letter abbreviation.
Date Issue Identified | DATE_ISSUE_IDENTIFIED | Date the corrective action was identified.
Schedule Type | SCHEDULE_TYPE | Type of schedule for the corrective action.
Schedule Description | SCHEDULE_DESCRIPTION | Schedule for the corrective action.
Corrective Action Category Code | CORACT_CAT_CODE | Category code for the corrective action.
Corrective Action Name | CORACT_NAME | Name of the corrective action.
Due Date | DUE_DATE | Due date for the required corrective action.
Achieved Date | ACHIEVED_DATE | The date that the water system achieved the required corrective action.
Exhibit E.12: Six-Year Review 4 Data Summary for Corrective Actions
Number of States | Total Number of Sample Records | Total Number of Systems | Zip Filename
41 | 69,821 | 15,984 | SYR4_Corrective_Actions.zip
Section 8: Treatment Data
Exhibits E.13 and E.14 provide a comprehensive summary of the data elements included in the treatment information within the SYR 4 ICR database. EPA has posted these data online; however, it is important to note that the treatment information did not undergo the same quality assurance evaluations as the SYR 4 occurrence data. Exhibit E.13 identifies the data elements used in the treatment information tables and describes each one; however, most of these data elements are not populated in the SYR 4 ICR dataset. Exhibit E.14 depicts the relationships among the tables in the SYR 4 ICR treatment database. In the diagram, bolded field names are primary keys: fields that uniquely identify each record in a table, with a unique value for each row of data. Italicized field names are foreign keys, which link (i.e., connect) two or more related tables. Lines connecting the tables illustrate the relationships between key fields in different tables.
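The primary key and foreign key relationships described above can be illustrated with a small in-memory SQLite schema. This is a hypothetical sketch: the table names come from Exhibit E.13, but the column names, types, and sample rows are assumptions, and the T6YWSF table is reduced to two columns. With foreign key enforcement enabled, each plant or treatment process record must point to an existing facility record:

```python
import sqlite3

SCHEMA = """
CREATE TABLE T6YWSF (
    WATER_FACILITY_ID INTEGER PRIMARY KEY,       -- primary key
    WATER_FACILITY_TYPE TEXT
);
CREATE TABLE T6YWSFPLT (
    TREATMENT_PLANT_ID INTEGER PRIMARY KEY,
    WATER_FACILITY_ID INTEGER NOT NULL
        REFERENCES T6YWSF (WATER_FACILITY_ID),   -- foreign key
    FILTER_TYPE TEXT
);
CREATE TABLE T6YTREATPROCESS (
    TREATMENT_PROCESS_ID INTEGER PRIMARY KEY,
    WATER_FACILITY_ID INTEGER NOT NULL
        REFERENCES T6YWSF (WATER_FACILITY_ID),   -- foreign key
    TREATMENT_PROCESS_NAME TEXT
);
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO T6YWSF VALUES (1, 'TP')")
    conn.execute("INSERT INTO T6YWSFPLT VALUES (10, 1, 'CF')")
    conn.execute("INSERT INTO T6YTREATPROCESS VALUES (100, 1, 'Chlorination')")
    conn.commit()
    return conn


conn = build_demo_db()
# Join the plant and treatment process tables through their shared facility key.
rows = conn.execute("""
    SELECT p.TREATMENT_PLANT_ID, p.FILTER_TYPE, t.TREATMENT_PROCESS_NAME
    FROM T6YWSFPLT AS p
    JOIN T6YTREATPROCESS AS t ON t.WATER_FACILITY_ID = p.WATER_FACILITY_ID
""").fetchall()
print(rows)  # [(10, 'CF', 'Chlorination')]
```

Inserting a treatment process that references a nonexistent facility ID raises `sqlite3.IntegrityError`, which is the behavior the connecting lines in the diagram imply.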
Exhibit E.13: Treatment Data Dictionary (Filename: SYR4_Treatment)
Data Element | Description
Water system facility plant table (T6YWSFPLT)
Treatment Plant ID | Unique identifier for each treatment plant water system facility record.
Water Facility ID | Identifier that relates each record to the unique record in the T6YWSF table.
State Assigned ID Code | A State-assigned value that identifies the treatment plant water system facility.
Water Facility Type | The value extracted from SDWIS/State will be "TP" (treatment plant). The values from non-SDWIS States include "TM" (transmission manifold) and "ST" (storage).
Filter Type | Unfiltered (UF), Conventional Filtration (CF), Direct Filtration (DF), Diatomaceous Earth (DE), Other (OT), and other permitted values that the System Administrator may add.
Description of Filter | A description of the filter.
Disinfectant Concentration (mg/L) | Disinfectant concentration in mg/L.
Contact Time Status | Contact time status. Permitted values are: RQD - Required; NRQD - Not Required; REQT - Requested; RECV - Received; URVW - Under Review; RVWD - Reviewed; APVD - Approved; DTMD - Determined; DENY - Denied; RESB - Resubmitted.
Contact Time Determination Date | Date the contact time was determined.
Contact Time | Contact time in minutes: the number of minutes the water is in contact with the disinfectant in order to be properly disinfected. The range of values is 0001 to 2400.
CT Value | CT value in mg × min/L.
Disinfection Benchmark for Giardia Inactivation in Logs | The disinfection profile benchmark for Giardia inactivation in logs.
Status of Disinfection Benchmark for Giardia Inactivation | The status of the disinfection profile benchmark for Giardia inactivation. See CONTACT_TIME_STAT for permitted values and descriptions.
Date of Disinfection Benchmark for Giardia | The date the disinfection Giardia benchmark was determined.
Disinfection Benchmark for Giardia Inactivation Percent | The disinfection profile benchmark for Giardia inactivation percent.
Disinfection Benchmark for Virus Inactivation in Logs | The disinfection profile benchmark for virus inactivation in logs.
Status of Disinfection Benchmark for Virus Inactivation | The status of the disinfection profile benchmark for virus inactivation. See CONTACT_TIME_STAT for permitted values and descriptions.
Date of Disinfection Benchmark for Virus | The date the disinfection virus benchmark was determined.
Disinfection Benchmark for Virus Inactivation Percent | The disinfection profile benchmark for virus inactivation percent.
FBR Schematic Status | Under the Filter Backwash Recycling Rule, a water system is required to submit a schematic of this treatment plant to the primacy agency for review to demonstrate the percentage of filter backwash that is returned to the treatment plant influent. See CONTACT_TIME_STAT for permitted values and descriptions.
Date FBR Schematic Received | Date the primacy agency received the treatment plant schematic demonstrating the percentage of filter backwash that is returned to the treatment plant influent.
Date FBR Schematic Reviewed | Date the primacy agency completed its review of the treatment plant schematic and determined the percentage of filter backwash that is returned to the treatment plant influent.
Status of Alternate Return Location for FBR | The status of the water system's request for an alternate location for return of the filter backwash.
Date of Alternate Return Location for FBR | The date that the water system requested an alternate location for return of the filter backwash.
Status of FBR Corrective Action | The status of corrective action by the water system as required by the primacy agency after review of the schematic of the filter backwash flow in the treatment plant.
FBR Corrective Action Date
The date that the water system achieved the corrective action required
for the filter backwash.
User ID Initials
The User ID of the person who created this record.
FBR Comments
A memo field into which a user may enter comments about the Filter
Backwash Recycling Rule.
Disinfection Benchmark Reason
Text description associated with the Disinfection Benchmark Reason.
Contact Time Reason
Text description associated with the Contact Time.
Treatment process table (T6YTREATPROCESS)
Treatment Process ID
Unigue identifier for each treatment record.
Water Facility ID
Identifier that relates each record to the unigue record in the T6YWSF
table.
Treatment Objective Code
A coded value that categorizes the treatment objective.
Treatment Objective Name
The name of the treatment objective.
Treatment Process Code
A coded value that categorizes the treatment process.
Treatment Process Name
The name of the treatment process.
Water system flows table (T6YWSFFLOWS)
Water System Facility Flow ID
Unigue identifier for each water system facility flow record.
Water Facility ID
Identifier that relates each record to the unigue record in the T6YWSF
table.
Facility Flow ID Number
Identifier for each water system facility flow entry that is unigue when
combined with T6YWSFT6YWSF ID.
Facility Train ID
This attribute identifies the water system facilities that are part of the
same flow.
Sequence ID
This attribute identifies the order of the water system facilities in a
specific flow.
Process Water Type
A system-administrator-controlled code for the type of water flowing
between the facilities.
Water Quantity Measure
A value that represents the number of gallons of water purchased.
Water Quantity Measure Unit
A coded value that specifies the unit of measurement for the quantity
of water purchased.
Connection Type
Categorizes the type of connection between the water system facilities.
Connection Date
The date of the connection of the water system facility to another water
system facility.
Disconnection Date
The date of the disconnection of the water system facility from another
water system facility.
Supplying Facility ID
Identifier for each supplying water system facility that is unique when
combined with TINWSFOST CODE.
Supplying Facility State Code
Two-digit code that identifies the State that submitted data for the facility.
Exhibit E.14: Treatment Data Diagram
[Diagram showing the water system facility flow fields and the Water System Facility Plant Table (T6YWSFPLT) with their data elements.]
Section 9: SYR 4 Data Considerations
The SYR 4 ICR data have undergone appropriate quality assurance evaluation, and enough States
provided compliance monitoring data and treatment technique information for the data to be
representative for national-scale analyses. EPA used the data in analytical activities informing
decisions for SYR 4, and the data include sufficient information for users to reproduce the SYR 4
analyses.
The final SYR 4 ICR dataset also has a few limitations that should be acknowledged. Completeness
may differ among contaminants within the dataset. In some cases, the number of records per State
for a given contaminant ranged from fewer than one hundred to more than a million. States might
not have submitted data for certain contaminants if their PWSs had monitoring waivers for those
contaminants. States may grant waivers that reduce PWS monitoring frequencies, and it is possible
that systems collected no samples during the SYR 4 period of review. Other States may have
submitted data for these contaminants under the ICR, but in a format that was not compatible with
the SYR 4 ICR dataset. Furthermore, data from four States and three additional tribes or
territories are missing entirely from the analysis.
A thorough QA/QC process was undertaken to evaluate the SYR 4 ICR data used in the analyses.
However, data entry errors may still exist in the final SYR 4 ICR dataset. The QA/QC review
focused only on the data elements essential for the SYR 4 analyses. For a complete discussion of
the SYR 4 ICR dataset, including a description of the quality assurance/quality control review,
refer to the main text of this document and USEPA (2024a). For more detailed information on the
microbial contaminant occurrence analysis, refer to USEPA (2024b).
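Record counts for the same contaminant can differ by orders of magnitude from State to State. A quick way to gauge completeness is to tally the records by State in each contaminant file. The Python sketch below shows one way to do this. It is an illustration under stated assumptions, not part of EPA's QA/QC process. The column name STATE_CODE is a hypothetical default, so check the header row of your file, because field names may appear with spaces or underscores.

```python
import csv
from collections import Counter
from typing import Iterable


def records_per_state(lines: Iterable[str], state_field: str = "STATE_CODE") -> Counter:
    """Count records per State in one tab-delimited SYR 4 ICR file.

    `lines` is any iterable of text lines, for example an open file. The first
    line must hold the field names. STATE_CODE is a hypothetical column name.
    """
    # Tab-delimited with no text qualifier: QUOTE_NONE treats quote characters literally.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row[state_field] for row in reader)
```

For example, open the file with `open(path, newline="", encoding="cp1252", errors="replace")` and print `records_per_state(f).most_common()`. The cp1252 encoding is an assumption based on the "1252: Western European (Windows)" file origin suggested for Excel below.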
Section 10: Instructions on Importing SYR 4 Datasets
The SYR 4 ICR data files are tab-delimited text files with no text qualifier, and field names are
included in the first row of each file. The complete SYR 4 ICR dataset is too large to import into
Excel, as are certain individual files, including the individual-year TC and EC files, the free
chlorine and total chlorine files, and the paired TC/EC/FC and residual disinfectant datasets. The
data are available for download by parameter and should be imported into a data management system
that supports large datasets for analysis.
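SQLite is one example of a data management system that can handle files too large for Excel. The following Python sketch streams a tab-delimited SYR 4 file into an SQLite table using only the standard library. It is not an EPA-provided tool. The sketch assumes that every row has the same number of fields as the header row and that no field name contains a double quote. The table name is whatever you choose.

```python
import csv
import sqlite3
from typing import Iterable


def load_tab_file(conn: sqlite3.Connection, table: str, lines: Iterable[str],
                  batch_size: int = 50_000) -> int:
    """Stream a tab-delimited SYR 4 ICR file into a new SQLite table.

    Every column is stored as TEXT so identifiers such as PWSID keep leading
    zeros; convert types later in SQL. Returns the number of data rows loaded.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)  # field names are in the first row
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join(["?"] * len(header))
    insert = f'INSERT INTO "{table}" VALUES ({placeholders})'

    total = 0
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) >= batch_size:  # insert in batches to limit memory use
            conn.executemany(insert, batch)
            total += len(batch)
            batch.clear()
    if batch:
        conn.executemany(insert, batch)
        total += len(batch)
    conn.commit()
    return total
```

Storing every column as text mirrors the "Text fields" advice for Excel later in this section. A row whose field count differs from the header makes `executemany` raise an error rather than silently shifting columns.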
10A: Downloading Data Files (Note that instructions may vary depending on the version and
software used to import data.)
1. Begin by reviewing the SYR 4 ICR Dataset Summary (Exhibit E.2), and in particular note
the table of Data Field Names and Definitions (Exhibit E.1).
2. Access the SYR 4 ICR data by going to the Six-Year Review homepage. Click on the
link for "Six-Year Review 4."
3. Click on the desired zip file and select "Save As" to save the file to your computer.
4. Navigate to the location on your computer where you saved the zip file, and extract the
zip file contents by clicking "Open with" and using WinZip or a similar file compression
utility.
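You can also script the extraction in Step 4. The following is a minimal Python sketch, assuming the zip file has already been saved locally. The paths and the function name are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_text_files(zip_path, dest_dir) -> List[Path]:
    """Extract a saved SYR 4 ICR zip file and return the paths of the .txt files it held."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        # Only extract archives from a trusted source, such as the EPA download page.
        zf.extractall(dest)
        names = zf.namelist()
    return sorted(dest / name for name in names if name.lower().endswith(".txt"))
```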
10B: Importing Data into Microsoft Excel
Using Microsoft Excel 2013 or a newer version is recommended because of the size of the
datasets. Note that the following microbial and disinfection byproduct data files are too
large to import into Microsoft Excel: TTHM, HAA, free residual chlorine, total chlorine,
all TC files, EC, and all paired microbial and disinfectant residual files.
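Whether a file fits in Excel depends on the worksheet row limit. Excel 2007 and later versions allow at most 1,048,576 rows per worksheet, including the header row. The following Python sketch checks a file against that limit before you attempt an import. It reads no more than one line past the limit and assumes one record per line.

```python
from itertools import islice
from typing import Iterable

# Maximum rows per worksheet in Excel 2007 and later (header row included).
EXCEL_MAX_ROWS = 1_048_576


def fits_in_excel(lines: Iterable[str]) -> bool:
    """Return True if the file's rows (header plus data) fit on one worksheet."""
    # Count at most EXCEL_MAX_ROWS + 1 lines so very large files are not read in full.
    n = sum(1 for _ in islice(lines, EXCEL_MAX_ROWS + 1))
    return n <= EXCEL_MAX_ROWS
```

For example, `with open(path, encoding="cp1252", errors="replace") as f: print(fits_in_excel(f))`.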
5. Open a blank workbook in Microsoft Excel.
6. In the workbook, select the Data tab at the top of the window.
7. At the far left of the Data tab, go to the Get External Data section and select From
Text.
8. When prompted to select a text file, locate the text file you extracted in Step 4, select it,
and click "Import."
9. A preview of the file converted to a table will appear. At the top, verify that File Origin
displays "10000: Western European (Mac)" or "1252: Western European (Windows),"
depending on your computer's operating system. Select "Tab" as the Delimiter and "Based
on first 200 rows" as the Data Type Detection, and then click "Load To...".
10. In the next window, under "Select how you want to view the data in your workbook,"
choose "Table." Select "Existing worksheet" as the location for the data, and verify that the
table's origin cell displays as "=$A$1." Click OK.
11. A "Queries & Connections" window will appear on the right of the screen as Excel
generates the new table. This step may take several minutes.
12. Save the Excel spreadsheet file once the table generation is complete.
10C: Importing Data into R
1. Open a blank R script.
2. Import the text file with the read.delim() function, using the following format (quote = ""
tells R that the file has no text qualifier):
a. [analyte name] <- read.delim(file = [filepath], header = TRUE, quote = "")
Example: bromoform <- read.delim(file = "C:/Users/[username]/Desktop/SYR4-Microbes/SUMMARY MDBPSBROMOFORM.txt", header = TRUE, quote = "")
3. Check the resulting data frame (for example, with str()) to confirm that the columns were
read correctly.
4. NOTE: Columns that should contain dates will be imported as character type. To convert
them, include the line df$DATE <- as.Date(df$DATE, format = "%d-%b-%y") in the R code,
replacing df with the name of the data frame and DATE with the name of the column
containing date information.
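You can parse the same day-month-year pattern outside R with Python's `datetime.strptime`. This sketch assumes the date strings look like "15-JAN-19" (an illustrative value) and use English month abbreviations. Confirm the pattern against your file before converting a whole column.

```python
from datetime import date, datetime
from typing import Optional


def parse_syr4_date(text: str) -> Optional[date]:
    """Parse a DD-Mon-YY string (R format "%d-%b-%y"); blank strings return None.

    Month-name matching is case-insensitive, and two-digit years 00-68 map to
    2000-2068 while 69-99 map to 1969-1999, the same POSIX rule R uses.
    """
    text = text.strip()
    if not text:
        return None
    return datetime.strptime(text, "%d-%b-%y").date()
```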
10D: Importing Data into Microsoft Access
1. Open a blank database in Microsoft Access.
2. In the database, select the External Data tab at the top of the window.
3. At the far left of the tab, open the New Data Source drop-down menu and select From
File > Text File.
4. When prompted to select a text file, locate the text file you extracted in Step 4 of 10A and
choose one of the following options: "Import the source data into a new table in the current
database" or "Link to the data source by creating a linked table." Either method works, but
linking the file keeps the database smaller. Click OK.
[Screenshot: Get External Data - Text File dialog, with options to import the source data into a new table in the current database, append a copy of the records to an existing table, or link to the data source by creating a linked table.]
5. The Link (or Import) Text Wizard will appear. The default settings should have Delimited
selected as the data format. Click Next >.
6. The next screen should have "Tab" selected as the delimiter by default. Select the check
box next to "First Row Contains Field Names," and then click "Advanced...".
[Screenshot: Link Text Wizard delimiter screen, with Tab selected as the delimiter, "First Row Contains Field Names" checked, and a preview of the ANALYTE CODE, ANALYTE NAME, STATE CODE, PWSID, and SYSTEM NAME columns.]
7. The Link (or Import) Specification window will appear. In the Dates, Times, and
Numbers section, set the Date Order value to "DMY."
[Screenshot: Link Specification window for SUMMARY_FECAL_COLIFORM, showing the Delimited file format, the tab field delimiter, the Date Order list in the Dates, Times, and Numbers section, and the field names with their data types.]
8. On the screen that follows, keep the default settings and click Next >.
[Screenshot: Link Text Wizard field options screen, showing the Field Name, Data Type, and Indexed settings for each field.]
If you are importing instead of linking, a window about setting a primary key will appear. The
default selection is "Let Access add primary key." Select "No primary key" and click
Next >.
[Screenshot: Import Text Wizard primary key screen, with "No primary key" selected.]
9. A final screen will appear. The table name field is auto-populated with the name of the
linked file. Enter a meaningful name for the linked or imported table, and click Finish.
[Screenshot: final Link Text Wizard screen, showing the Linked Table Name field.]
Part Two: Filtering and Formatting Data in Excel
10. To search efficiently, select cell A1, select the Data tab at the top of the window, and click
Filter. A small dropdown arrow will now appear in each column header.
11. Filter the data:
a. To look for a specific public water system, click the dropdown arrow for "PWSID" or
"System Name." In the search field, type the name and select it from the displayed list.
b. To search for a different public water system, click the dropdown arrow and select "Clear
Filter from PWSID" or "Clear Filter from System Name."
c. To filter the data by contaminant, click the dropdown arrow for "Analyte Name."
12. Multiple filters can be applied at once, for example, to look at an individual water
system's data for a specific contaminant of interest.
13. Click Filter on the Data tab again to turn off filtering, and the entire dataset will be displayed again.
14. Note that all columns are imported with the default General format. To aid data analysis,
change each column's format manually after the import is complete: on the Home tab in
Excel, select the column and choose the format from the drop-down menu. Suggested
formats are:
Text fields
Analyte Name
State Code
PWSID
System Name
System Type
Source Water Type
Water Facility Type
Sampling Point Type
Source Type Code
Sample Type Code
Laboratory Assigned ID
Sample Collection Date
Detection Limit Unit
Detection Limit Code
Value Unit
Presence Indicator Code
Numeric fields
Analyte ID
Retail Population Served
Adjusted Total Population Served
Water Facility ID
Sampling Point ID
Six-Year ID
Sample ID
Detection Limit Value
Detect
Value
Residual Field Free Chlorine mg/L
Residual Field Total Chlorine mg/L
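You can reproduce the filtering in Steps 10 through 13 and the column formatting in Step 14 outside Excel. This Python sketch is an illustration under stated assumptions.

- `filter_rows` keeps the rows whose columns exactly match every given value, like stacking Excel filters. Unlike Excel's filter search, the match is case-sensitive.
- `apply_formats` converts only the columns you list as numeric and leaves the rest as text, so identifiers such as PWSID keep their leading zeros.

The column names used here are hypothetical. Pass the header names from your file that correspond to the Numeric fields listed above.

```python
import csv
from typing import Dict, Iterable, Iterator, Optional, Union


def filter_rows(lines: Iterable[str], **criteria: str) -> Iterator[Dict[str, str]]:
    """Yield rows of a tab-delimited SYR 4 file whose fields equal every criterion."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if all(row.get(col) == value for col, value in criteria.items()):
            yield row


def to_number(text: str) -> Optional[float]:
    """Convert a numeric field to float; blank or non-numeric text becomes None."""
    text = text.strip()
    if not text:
        return None
    try:
        return float(text)
    except ValueError:
        return None


def apply_formats(row: Dict[str, str],
                  numeric_fields: Iterable[str]) -> Dict[str, Union[str, float, None]]:
    """Convert the listed numeric columns; keep every other column as text."""
    numeric = set(numeric_fields)
    return {name: to_number(value) if name in numeric else value
            for name, value in row.items()}
```

For example, `[apply_formats(r, ["RETAIL_POPULATION_SERVED", "VALUE"]) for r in filter_rows(f, PWSID="IA2038038", ANALYTE_NAME="CHLORITE")]` returns one water system's chlorite records with the population and value columns converted to numbers.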
References
United States Environmental Protection Agency (USEPA). 2016. Six-Year Review 3 Technical
Support Document for Disinfectants/Disinfection Byproducts Rules. EPA-810-R-16-012.
December 2016.
USEPA. 2019. Information Collection Request Submitted to OMB for Review and Approval;
Comment Request; Contaminant Occurrence Data in Support of the EPA's Fourth Six-Year
Review of National Primary Drinking Water Regulations. October 31, 2019, Volume 84,
Number 211, Pages 58381-58382.