EPA
United States
Environmental Protection
Agency
The Data Management and Quality
Assurance/Quality Control Process for EPA's
Fourth Six-Year Review's Microbial and
Disinfection Byproduct Preliminary Datasets
-------
Office of Water (4607M)
EPA-810-R-22-001
August 2022
-------
Disclaimer
This document describes the Microbial and Disinfection Byproducts (MDBP) compliance
monitoring data and treatment technique information that was collected for EPA's fourth Six-
Year Review (SYR4). The purpose of the Six-Year Review (SYR) is to evaluate current
information for regulated contaminants to determine if there is new information to support a
regulatory revision that will improve or strengthen public health protection. The SYR4's MDBP
data files are being preliminarily released ahead of the publication of SYR4 results for the
purpose of MDBP rule revisions analyses. For more information on the Potential Revisions of
the MDBP Rules, see EPA's webpage:
https://www.epa.gov/dwsixyearreview/potential-revisions-microbial-and-disinfection-byproducts-rules.
The data files released in July 2022 are believed to
be fully accurate. Should errors or other data quality issues be identified between July 2022 and
the date for the final release of SYR4, EPA may elect to update the MDBP data files - i.e., at any
time up until the completion of SYR4.
Data Management QA/QC Process
for the SYR4 MDBP Preliminary Datasets
iii
August 2022
-------
Executive Summary
The 1996 Amendments to the Safe Drinking Water Act (SDWA) require that the Environmental
Protection Agency (EPA) "shall, at least once every six years, review and revise, as appropriate,
each National Primary Drinking Water Regulation (NPDWR)." The NPDWRs are often referred
to as the national drinking water contaminant regulations or drinking water standards. The
purpose of the review, called the Six-Year Review (SYR), is to evaluate current information for
regulated contaminants to determine if there is new information on health effects, treatment
technologies, analytical methods, occurrence and exposure, implementation and/or other factors
that provides a health or technical basis to support a regulatory revision that will improve or
strengthen public health protection. To support each Six-Year Review process (including the
fourth Six-Year Review, SYR4), EPA issues an Information Collection Request (ICR) to the
States and primacy agencies to collect the recent data that public water systems
(PWSs) have submitted per requirements of NPDWRs. The data are voluntarily submitted and
typically consist of the compliance monitoring records and the records related to treatment
technique requirements, usually covering a period of about six years for every cycle. For more
information on the SYR4 ICR, see EPA's website:
https://www.epa.gov/dwsixyearreview/six-year-review-4-drinking-water-standards-information-collection-request.
As a result of EPA's third Six-Year Review (SYR3) of NPDWRs, which was published in 2017
(https://www.epa.gov/dwsixyearreview/six-year-review-3-drinking-water-standards), EPA
identified eight contaminants covered by the Microbial and Disinfection Byproducts (MDBP)
rules as candidates for revision. The eight contaminants include: Chlorite, Cryptosporidium,
Haloacetic acids, Heterotrophic bacteria, Giardia lamblia, Legionella, Total Trihalomethanes,
and viruses. The eight contaminants are included in the following MDBP rules: Stage 1 and
Stage 2 Disinfectants and Disinfection Byproducts Rules, Surface Water Treatment Rules,
Interim Enhanced Surface Water Treatment Rule, and Long-Term 1 Enhanced Surface Water
Treatment Rule. As a follow-on to SYR3, EPA is conducting analyses to further evaluate the
eight NPDWRs for potential regulatory revisions under the potential MDBP Rule Revisions
effort (https://www.epa.gov/dwsixyearreview/potential-revisions-microbial-and-disinfection-byproducts-rules).
To help support the ongoing considerations of the potential MDBP Rule
Revisions and related analyses, EPA is posting the SYR4 ICR data files pertaining to MDBP
rules prior to the publication of SYR4 results. The SYR4 ICR data records not pertaining to
MDBP rules will be available along with the SYR4 results, expected in 2023.
Since the data recording, managing practices and resultant data records can vary among
individual states and primacy agencies, upon receipt of the data files for SYR, EPA conducts a
Quality Assurance/Quality Control (QA/QC) Process to normalize the data records for analyses
at a national level (including characterization of national occurrence baselines of regulated
contaminants). This document describes the QA/QC process for the posted MDBP data files
contained in the SYR4 ICR dataset for the potential MDBP Rule Revisions. This document
describes the overall QA/QC process that was applied to all SYR4 ICR data as well as the
QA/QC process applied specifically to the MDBP data files.
The document also contains a User Guide for downloading and importing the MDBP data from
the EPA website
(https://www.epa.gov/dwsixyearreview/microbial-and-disinfection-byproduct-data-files-2012-2019-epas-fourth-six-year).
-------
Contents
Disclaimer
Executive Summary
List of Exhibits
Appendices
Chapter 1 Introduction
Chapter 2 Data Acquisition
Chapter 3 Data Management
    3.1 Review of SYR4 Dataset Content
    3.2 Restructuring Non-SDWIS State Data
    3.3 Establishing Consistent Data Fields for Analytical Results (SDWIS and Non-SDWIS States)
Chapter 4 Data Quality Assurance and Quality Control
    4.1 Quality Assurance Measures Applied to All Contaminants
        4.1.1 Non-Public Water Systems
        4.1.2 Systems with Missing Inventory Data
        4.1.3 Sample Results Collected Outside of the Date Range
        4.1.4 Non-Compliance
        4.1.5 Uniform System Inventory Information
Chapter 5 Quality Assurance Measures Applied to Disinfection Byproducts and Disinfection Byproduct Related Parameters
    5.1 Non-Routine Samples
    5.2 Duplicate Records
    5.3 Units of Measure
    5.4 Potential Outliers
    5.5 Locational Flag
Chapter 6 Quality Assurance Measures Applied to Microbial Contaminants
    6.1 Non-Routine Samples
    6.2 Pairing Disinfectant Residual and Coliform Results for non-SDWIS states
    6.3 Updates to Absence and Presence Codes
Chapter 7 References
-------
List of Exhibits
Exhibit 1: List of Microbial and Disinfection Byproducts Contaminants/Parameters Identified in SYR4 ICR for which Data Were Requested from States
Exhibit 2: Data Elements Requested by EPA for the Fourth Six-Year Review
Exhibit 3: Summary of States and Other Entities that Provided Compliance Monitoring Data and Treatment Technique Information for SYR4
Exhibit 4: Contaminant Group Monitoring Requirements
Exhibit 5: Flow Chart of QA Measures Applied to All SYR4 Contaminants
Exhibit 6: Flow Chart of Additional QA Measures Specific to DBPs and DBP Related Parameters
Exhibit 7: Summary of the Count of Analytical Sample Results Removed via the QA Measures Applied to DBP Rule Contaminants
Exhibit 8: List of DBP MCL Values
Exhibit 9: Summary of the Count of Records Removed via the QA Measures Applied to Microbial Rule Contaminants
Exhibit 10: Summary of the Count of Analytical Samples Results Removed via the QA Measures Applied to Microbial Rule Contaminants
Appendices
Data request letter EPA sent contacting each primacy agency to request
voluntary submission of its compliance monitoring data and treatment
technique information for regulated chemical, radiological, and
microbiological contaminants.
User Guide to Downloading Six-Year Review 4's Microbial and
Disinfection Byproducts Information Collection Request data files from
EPA's Website
Six-Year Review 4's Microbial and Disinfection Byproduct Data Records
by State
-------
Acronyms
CAS Chemical Abstracts Service
CO Confirmation
CWS Community Water System
DBP Disinfection Byproduct
DBPR Disinfection Byproduct Rule
D/DBPR Disinfectants and Disinfection Byproducts Rule
EC Escherichia coli (E. coli)
eDWR Electronic Drinking Water Report
EPA Environmental Protection Agency (United States)
FBRR Filter Backwash Recycling Rule
FC Fecal Coliforms
GW Ground Water
GWR Ground Water Rule
GWUDI Ground Water Under Direct Influence (of Surface Water)
HAA Haloacetic Acids
HPC Heterotrophic Plate Count
IESWTR Interim Enhanced Surface Water Treatment Rule
ICR Information Collection Request
LT1ESWTR Long-Term 1 Enhanced Surface Water Treatment Rule
LT2ESWTR Long-Term 2 Enhanced Surface Water Treatment Rule
MCL Maximum Contaminant Level
MDBP Microbial and Disinfection Byproducts
MDL Method Detection Limit
mg/L Milligrams per Liter
MOR Monthly Operating Report
MR Maximum Residence
MRDL Maximum Residual Disinfectant Level
MRL Minimum Reporting Level
MS Microsoft
NCOD National Contaminant Occurrence Database
ND Non-detect or Non-detection
NPDWR National Primary Drinking Water Regulation
NTNCWS Non-Transient Non-Community Water System
OMB Office of Management and Budget
PWS Public Water System
PWSID Public Water System Identification Number
QA Quality Assurance
QC Quality Control
RP Repeat
RT Routine
RTCR Revised Total Coliform Rule
SDWA Safe Drinking Water Act
SDWIS/Fed Safe Drinking Water Information System / Federal Version
SDWIS/State Safe Drinking Water Information System/State Version
SW Surface Water
SWP Purchased Surface Water
SWTR Surface Water Treatment Rule
SYR Six-Year Review
SYR3 Third Six-Year Review
SYR4 Fourth Six-Year Review
TC Total Coliform
TCR Total Coliform Rule
TG Triggered
TNCWS Transient Non-Community Water System
TOC Total Organic Carbon
TTHM Total Trihalomethanes
USEPA United States Environmental Protection Agency
µg/L Micrograms per Liter
-------
Chapter 1 Introduction
This document describes the Quality Assurance/Quality Control (QA/QC) process applied to the
Microbial and Disinfection Byproduct (MDBP) data that were collected as part of the fourth
Six-Year Review (Six-Year Review 4 or SYR4) of National Primary Drinking Water
Regulations (NPDWRs). The purpose of the Six-Year Review (SYR) is to evaluate current
information for regulated contaminants to determine if there is new information to support a
regulatory revision that will improve or strengthen public health protection. This document
describes how these data were requested, obtained, received, evaluated and formatted (when
necessary). This document also describes data quality issues and the modifications made to the
data to make them consistent throughout and usable for analyses. The SYR4 MDBP data files are
being released separately from the SYR4 publication for the purpose of MDBP rulemaking revisions
analyses.
The SYR4 compliance monitoring data and treatment technique information were provided to
EPA voluntarily by primacy agencies via the SYR4 Information Collection Request (ICR)
process. EPA received data from 59 primacy agencies (46 states plus territories, Washington,
D.C., and Tribes).
The SYR4 ICR data were received from primacy agencies in a variety of formats and data
structures and required restructuring to a uniform format for the purpose of conducting
contaminant occurrence analyses.
This document describes the MDBP compliance monitoring data and treatment technique
information requested and received for SYR4, and provides an overview of the data
management and QA/QC efforts used to prepare the MDBP datasets.
-------
Chapter 2 Data Acquisition
To obtain national compliance monitoring data and treatment technique information used in
support of SYR4, EPA conducted a data call-in from the states, through the National Compliance
Monitoring Information Collection Request (ICR) Dataset for the fourth Six-Year Review (or
"SYR4 ICR dataset"). For more information on the process undertaken to request the voluntary
submission of compliance monitoring data and treatment technique information from primacy
agencies, see the fourth Six-Year Review ICR (84 FR 58381, USEPA, 2019).
EPA contacted each primacy agency via a letter requesting the voluntary submission of their
compliance monitoring data and treatment technique information for all NPDWRs and related
parameters that were collected between January 2012 and December 2019.
EPA requested only information stored electronically (no paper records) that represented
routine compliance monitoring data and treatment technique information. Exhibit 1 shows the
regulated contaminants for the Stage 1 and Stage 2 Disinfectants and Disinfection Byproducts
Rules (D/DBPRs) and Surface Water Treatment Rules (SWTRs) for which EPA requested
data, and Exhibit 2 shows the requested data elements (e.g., columns or fields) for each sample
result. Note that in some cases, at both the state and system level, EPA did not receive any data
for the data elements and/or analytes requested.
Exhibit 1: List of Microbial and Disinfection Byproducts
Contaminants/Parameters Identified in SYR4 ICR for which Data Were Requested
from States

Disinfectants and Disinfection Byproducts Rules (D/DBPRs)
  Total Trihalomethanes (TTHMs):   Haloacetic Acids 5 (HAA5):   Bromate
  Chloroform                       Monochloroacetic acid        Chlorite*
  Bromodichloromethane             Dichloroacetic acid          Chlorine*
  Dibromochloromethane             Trichloroacetic acid         Chloramines*
  Bromoform                        Bromoacetic acid             Chlorine dioxide
                                   Dibromoacetic acid

Total Coliform Rule (TCR) and Revised Total Coliform Rule (RTCR)
  Total coliforms
  Fecal coliforms
  Escherichia coli (E. coli)

Surface Water Treatment Rules (SWTRs)
  Chlorine**       Cryptosporidium***   Heterotrophic Plate Count (HPC)
  Chloramines**    Giardia lamblia

Filter Backwash Recycling Rule (FBRR)
  No specific occurrence data collected.

* As a maximum residual disinfectant level (MRDL). Chlorine and chloramines are reported as free chlorine and total chlorine,
respectively.
** As a minimum disinfectant residual level. Chlorine and chloramines are reported as free chlorine and total chlorine, respectively.
*** The monitoring data from Round 2 under the Long-Term 2 Enhanced Surface Water Treatment Rule (LT2) are being reviewed and
will be available along with the SYR4 results.
-------
Exhibit 2: Data Elements Requested by EPA for the Fourth Six-Year Review¹

Data Category: Description

System-Specific Information

Public Water System Identification Number (PWSID): The code used to identify each PWS. The code
begins with the standard 2-character postal state abbreviation or Region code; the remaining 7
numbers are unique to each PWS in the state.

System Name: Name of the PWS.

Federal Public Water System Type Code: A code to identify whether a system is:
    Community Water System;
    Non-transient Non-community Water System; or
    Transient Non-community Water System.

Population Served: Highest average daily number of people served by a PWS, when in operation.

Federal Source Water Type: Type of water at the source. Source water type can be:
    Ground water; or
    Surface water; or
    Ground water under the direct influence of surface water (GWUDI) (Note: Some States may not
    distinguish GWUDI from surface water sources. In those States, a GWUDI source should be
    reported as a surface water source type.)

Treatment Information

Water System Facility: System facility data, including: treatment plant identification number,
treatment plant information, treatment unit process/objectives, facility flow, treatment train (train
or flow of water through treatment units within the treatment plant).

Filtration Type: Information relating to system filtration, including: filtration status, types of
filtration (e.g., unfiltered, conventional filtration, and other permitted values).

Treatment Technique Information: Information pertaining to treatment processes. Types of treatment
technique information including: disinfectants used and their doses for primary and secondary
disinfection, coagulant/coagulant aid type and dose, disinfectant concentration, disinfection
profile/benchmark data, log of viral inactivation/removal, contact time, contact value, pH,
temperature.

Filter Backwash Information: Information about filter backwash that is returned to the treatment
plant influent (e.g., information on: recycle/schematic status, alternative return location,
corrective action requirements, and recycle flows and frequency).

Sample-Specific Information

Sampling Point Identification Code: A sampling point identifier established by the state, unique
within each applicable facility, for each applicable sampling location (e.g., entry point to the
distribution system). This information enables occurrence assessments that address intra-system
variability.

Sample Identification Number: Identifier assigned by state or the laboratory that uniquely
identifies a sample.

Sample Collection Date: Date the sample is collected, including month, day, and year.

Sample Type: Indicates why the sample is being collected (e.g., compliance, routine, repeat,
confirmation, additional routine samples, duplicate, special, special duplicate, etc.).

Sample Analysis Type Code: Code for type of water sample collected.
    Raw (Untreated) water sample
    Finished (Treated) water sample
For TCR Repeats only; indicator of sampling location relative to sample point where positive
sample was originally collected:
    Upstream
    Downstream
    Original

Contaminant: Contaminant name, 4-digit SDWIS contaminant identification number, or Chemical
Abstracts Service (CAS) Registry Number for which the sample is being analyzed.

Sample Analytical Result - Sign: The sign indicates whether the sample analytical result was:
    (<) "less than" means the contaminant was not detected or was detected at a level "less than"
    the minimum reporting level (MRL).
    (=) "equal to" means the contaminant was detected at a level "equal to" the value reported in
    "Sample Analytical Result - Value."
    (+) "positive result" (For RTCR data, only positive E. coli result sign to be included.)

Sample Analytical Result - Value: Actual numeric (decimal) value of the analysis for the chemical
results, or the MRL if the analytical result is less than the contaminant's MRL. (For the TCR and
RTCR, TC and E. coli will indicate presence/absence, and positive E. coli will have numeric
results.)

Sample Analytical Result - Unit of Measure: Unit of measurement for the analytical results reported
(usually expressed in either µg/L or mg/L for chemicals; or pCi/L or mrem/yr for radiological
contaminants). (Not required for TCR and RTCR data)

Sample Analytical Method Number: EPA identification number of the analytical method used to analyze
the sample for a given contaminant.

Source Water Monitoring Information: Total organic carbon (TOC), including percent TOC removal, TOC
removal summary, pH, alkalinity, monitoring data entered as individual results or included in DBP
(or monthly operating report) summary records, alternative compliance criteria, results from round
2 monitoring under LT2 ESWTR (including Cryptosporidium, E. coli, turbidity, or state-approved
alternate indicators).

Sample Summary Reports: Sample summaries for DBPRs, SWTRs, RTCR, GWR corrective actions, and the
Lead and Copper Rule (LCR) associated with analytical result records. Values used for compliance
determination [e.g., turbidity (combined effluent/individual effluent), disinfectant residual levels
in treatment plant and distribution system, treatment technique information, HPC, etc.]

¹ These are the data elements requested in the SYR4 ICR. Note that the "Data Category" and "Description" columns were
intentionally descriptive rather than prescriptive. This allowed the states that do not use SDWIS/State flexibility to provide as much
information as possible. EPA accepted all data "as is" without prescribing structure or format.

About 78 percent of all states currently store and manage at least portions of their compliance
monitoring data and/or treatment technique information in the Safe Drinking Water Information
System/State Version (SDWIS/State). EPA developed SDWIS/State in collaboration with state
primacy agencies to manage drinking water information and provide a common structure for the
development of reusable components and shared applications. The SDWIS/State structure is
flexible enough to support the most complex primacy agency program implementation while
maintaining a common core of data elements required for reporting to SDWIS/Fed. To make the
SYR4 data submittal process as easy for states as possible, EPA developed a
SDWIS/State Extract Tool (also referred to as "extraction tool" throughout this document),
which enabled states to run a customized query to pull the requested data from the SDWIS/State
database that each state maintains. All of the primacy agencies using SDWIS/State that submitted data
to EPA for SYR4 used the extraction tool to extract and compile the EPA-requested compliance
monitoring and treatment technique data.
SDWIS/State supports the eDWR (Electronic Drinking Water Report) XML Schema used by
laboratories throughout the nation to electronically report sample analytical results as structured
data to SDWIS/State. As a result, primacy agencies receive high-quality data from laboratories
that are batch-processed into SDWIS/State rather than manually entered. Consequently, states
have a substantial amount of high-quality structured data available in SDWIS/State. In all, for
SYR4, 46 states and 13 other primacy agencies provided compliance monitoring data and
treatment technique information that included parametric records. The seven states/primacy
agencies that did not provide any SYR4 data were Georgia, Michigan, Mississippi, New Mexico,
Guam, Puerto Rico, and U.S. Virgin Islands.
Exhibit 3 lists the states that did submit SYR4 data and indicates whether or not they used the
extraction tool. Thirty-five states, Washington, D.C., and six regional tribal entities used the
extraction tool to extract all or some of their data; therefore, those datasets were all submitted in
a similar format. The 17 states/entities not using SDWIS/State submitted their compliance
monitoring data and treatment technique information "as is," resulting in a variety of formats,
including dBase, MS Excel, XML, MS Access, and comma-delimited. With the exception of two
states whose data were downloaded from their publicly available websites (California and
Florida), all states submitted their data over the Internet via EPA's Central Data Exchange.
-------
Exhibit 3: Summary of States and Other Entities that Provided Compliance
Monitoring Data and Treatment Technique Information for SYR4

States/Tribes that DID use the SDWIS/State Extract Tool:
Alabama, Alaska, Arizona, Arkansas, Connecticut, Delaware, Hawaii, Idaho, Illinois, Indiana,
Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Missouri, Montana, Nebraska, Nevada,
New Jersey, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Region 4 tribes,
Region 5 tribes, Region 6 tribes, Region 7 tribes, Region 8 tribes, Region 10 tribes,
Rhode Island, South Carolina, Texas, Utah, Vermont, Virginia, Washington, D.C., West Virginia,
Wyoming

States/Tribes that DID NOT use the SDWIS/State Extract Tool:
American Samoa, California¹, Colorado, Commonwealth of the Northern Mariana Islands, Florida¹,
Minnesota, Navajo Nation, New Hampshire, Pennsylvania, Region 1 tribes, Region 2 tribes,
Region 9 tribes, South Dakota, Tennessee, Washington, Wisconsin, Massachusetts

States/Tribes that DID NOT submit any SYR4 data:
Georgia, Guam, Michigan, Mississippi, New Mexico, Puerto Rico, U.S. Virgin Islands

¹ CA and FL compliance monitoring and treatment technique information was extracted from a publicly available website.
-------
Chapter 3 Data Management
This section describes the data management tasks that were used to prepare the
SYR4 datasets for QA/QC review. The SDWIS/State Extract Tool pulled the SDWIS/State data
into Microsoft Access. Data from states that did not use the SDWIS/State Extract Tool were
restructured into a format similar to the tool's output. The two groups of
datasets, the extract states and the non-extract states (referred to for the remainder of this
document as the "SDWIS states" and the "non-SDWIS states," respectively), were managed
separately until all datasets were in the same format.
A status documentation file was maintained that included information for each state.
Specifically, the status documentation described the state datasets received as well as the date
received, file type, whether the extraction tool was used and the date range of the data. The status
documentation also recorded any state-specific notes, issues or concerns. Upon receipt of each
state dataset, EPA created state-specific directories for each raw dataset. Original datasets were
saved and maintained exactly as received and stored in an EPA database. Any subsequent changes
to a state's dataset were made to a copy of the original dataset and all changes were documented.
3.1 Review of SYR4 Dataset Content
Similar to prior rounds of the Six-Year Review, the first assessment of the submitted SYR4
datasets sought to verify that all of the necessary data elements were included in each state
dataset. This review included a comparison of the data elements requested in the state letter,
specifically those necessary for the SYR4 analyses, to the entire list of data elements included in
each state's dataset. Although data dictionaries were not necessary for the review of data from
the SDWIS states, these files (and any other available supporting information provided by the
states) were useful in interpreting the data submitted by the non-SDWIS states. Supporting
information included descriptions of the sampling efforts provided in emails from the state and
additional information such as acronym definitions.
Data dictionaries and supporting information were reviewed for definitions of the various data
elements, row and column headings, codes, and acronyms. If fields were missing or not
recognizable, EPA included a question to the state in its "flagged record report" email.
"Flagged record reports" were detailed reports sent via email to each state that identified records
of potential data quality concern. These reports also included questions on data completeness,
statewide waivers, and any other unique factors within the state's dataset. In addition, many of the
non-SDWIS states submitted datasets with more data elements than necessary. In those cases,
EPA determined which data elements were and were not specific to the SYR4 data request.
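
The comparison of requested data elements against the fields in a state's dataset can be illustrated with a short sketch. This is not EPA's actual procedure (the source describes a manual review supported by data dictionaries); the element names in REQUESTED_ELEMENTS and the function compare_elements are hypothetical and chosen only for illustration.

```python
# Hypothetical subset of requested data elements; the names are assumptions,
# not the actual field names used in the SYR4 ICR or SDWIS/State.
REQUESTED_ELEMENTS = {
    "PWSID", "SYSTEM_NAME", "SAMPLE_ID", "SAMPLE_DATE",
    "CONTAMINANT", "RESULT_SIGN", "RESULT_VALUE", "RESULT_UNIT",
}


def compare_elements(state_columns, requested=REQUESTED_ELEMENTS):
    """Return (missing, extra) element names for one state's dataset.

    missing: requested elements not found among the state's columns
    extra:   state columns that were not part of the request
    """
    # Normalize case and whitespace so "pwsid " matches "PWSID".
    normalized = {col.strip().upper() for col in state_columns}
    missing = sorted(requested - normalized)
    extra = sorted(normalized - requested)
    return missing, extra


if __name__ == "__main__":
    missing, extra = compare_elements(["pwsid", "Sample_Date", "lab_name"])
    print("Missing elements to ask the state about:", missing)
    print("Elements not specific to the request:", extra)
```

In a workflow like the one described, the "missing" list would feed the questions sent to the state, and the "extra" list would be reviewed to decide which fields to set aside.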
EPA also confirmed that all of the requested contaminants from the SYR4 ICR were included in
each state dataset. As a first step for the non-SDWIS states, EPA reviewed the CHEMIDs (i.e.,
four-digit SDWIS codes) and/or contaminant names within each state's dataset. Many states
included only CHEMIDs or contaminant names. A few other states only included CAS numbers
or state-specific codes. EPA populated missing information using a variety of sources including a
list of SDWIS codes from the SDWIS/Fed database as well as the ChemIDplus website (if only
CAS numbers were included). Nine of the non-SDWIS states submitted at least some data for a
contaminant or contaminants for which a four-digit SDWIS code could not be determined. In other
cases, the state appeared to be using an incorrect four-digit SDWIS code for a particular
contaminant. EPA compiled a list of questions for states related to issues such as missing
contaminants or undetermined CHEMIDs to be included in the "flagged record reports." States
were asked, for example, whether there was a statewide waiver for missing contaminants, whether
certain contaminant data were stored in a separate database, or whether there had been a
typographical error with a particular CHEMID.
Sample collection dates were reviewed to ensure that there were not any inconsistent dates
reported (e.g., data from the year 1900). If there were suspicious/incorrect sample collection
dates included, EPA tried to use other data elements to provide insight on the correct date (e.g.,
"analyzed date"). If the correct date could not be determined, EPA included a question for the
state in its "flagged record report" and either states followed up with EPA or EPA followed up
with states.
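
A minimal sketch of this date review is shown below. It assumes that a "suspicious" date is one falling outside the requested January 2012 to December 2019 window and that an analyzed date inside the window is a usable hint; both assumptions, and the function name flag_collection_date, are illustrative and are not taken from EPA's procedure.

```python
from datetime import date

# Requested collection window stated in Chapter 2 (January 2012 to December 2019).
REQUEST_START = date(2012, 1, 1)
REQUEST_END = date(2019, 12, 31)


def flag_collection_date(collected, analyzed=None):
    """Classify a sample collection date (hypothetical helper).

    Returns:
      "ok"                - collection date falls inside the requested window
      "use_analyzed_date" - collection date is suspicious, but the analyzed
                            date falls inside the window and may give insight
      "ask_state"         - neither date helps; include in a flagged record report
    """
    if collected is not None and REQUEST_START <= collected <= REQUEST_END:
        return "ok"
    if analyzed is not None and REQUEST_START <= analyzed <= REQUEST_END:
        return "use_analyzed_date"
    return "ask_state"


if __name__ == "__main__":
    print(flag_collection_date(date(2015, 6, 1)))
    print(flag_collection_date(date(1900, 1, 1), date(2015, 6, 3)))
    print(flag_collection_date(date(1900, 1, 1)))
```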
3.2 Restructuring Non-SDWIS State Data
Datasets received from the non-SDWIS states were restructured through a series of Microsoft
(MS) Access queries into a format similar to the data structure of the data from the SDWIS states
to allow for the construction of a unified database for the SYR4 national contaminant occurrence
analyses. As a first step in this process, EPA identified the data structure of each non-SDWIS
state dataset to plan the best method for conversion to the final database structure.
Prior to populating the SYR4 ICR database, EPA standardized the data reported by each
non-SDWIS state to reflect the appropriate SDWIS codes. For example, in the source water type field
(i.e., "D_FED_PRIM_SRC_CD"), all instances of "surface water" or "S" were changed to
"SW." In the system type field (i.e., "D_PWS_FED_TYPE_CD"), all instances of "CWS" or
"community" were changed to "C" for community water systems. All PWSIDs had to be put in
the federal format of the two-character postal state abbreviation or Region code followed by a
seven-digit number, unique to each PWS in the state.
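
The code standardization described above can be sketched as a pair of lookup tables plus a format check. EPA performed this work with MS Access queries; the Python below is only an illustration. The mappings contain just the examples given in the text, the regular expression is an assumed reading of "two-character abbreviation or Region code followed by seven digits," and the sample PWSID values are made up.

```python
import re

# Only the example mappings named in the text; a real table would be larger.
SOURCE_TYPE_MAP = {"SURFACE WATER": "SW", "S": "SW"}
SYSTEM_TYPE_MAP = {"CWS": "C", "COMMUNITY": "C"}

# Assumed pattern: two letters or digits (state or Region code), then seven digits.
PWSID_PATTERN = re.compile(r"^[A-Z0-9]{2}\d{7}$")


def standardize_code(value, mapping):
    """Map a state-specific value to its SDWIS code; pass unknown values through."""
    key = value.strip().upper()
    return mapping.get(key, key)


def is_federal_pwsid(pwsid):
    """Check whether a PWSID matches the assumed federal nine-character format."""
    return bool(PWSID_PATTERN.match(pwsid.strip().upper()))


if __name__ == "__main__":
    print(standardize_code("surface water", SOURCE_TYPE_MAP))
    print(standardize_code("community", SYSTEM_TYPE_MAP))
    print(is_federal_pwsid("OH1234567"))   # made-up identifier
    print(is_federal_pwsid("OH-12345"))    # wrong length and punctuation
```

Passing unrecognized values through unchanged, rather than guessing, keeps them visible for follow-up with the state.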
After the various state-specific formatting and transformations were completed, EPA imported
all non-SDWIS datasets into Access to ultimately merge with the SDWIS/State data sets in
Oracle, a database storing all SYR4 data. In some cases, EPA imported only the data elements
identified as essential to the occurrence analysis. Upon completion, EPA compared all
transformed state datasets to the original datasets to ensure all data were accurately converted.
Furthermore, EPA saved a record of the procedures used to map the state datasets to the SYR4
ICR database. All queries were created and saved in Access to document the transformation,
ensuring that this process is reproducible.
3.3 Establishing Consistent Data Fields for Analytical Results (SDWIS and Non-SDWIS
States)
EPA structured the sample analytical result sign, sample analytical result value, and sample
analytical result unit of measure into a consistent format to prepare the data for occurrence
analysis. EPA conducted this step prior to reviewing the data for potential outliers. Many of the
state datasets included analytical results signs (e.g., "<" for non-detections or "=" for
detections), detection limits and analytical results data in multiple fields. EPA added a
"DETECT" field to the SYR4 ICR dataset to identify the results sign and to more easily conduct
analyses. Wherever the analytical result was greater than zero and the result sign indicated a
detection, then DETECT was set equal to 1, representing a detection. When the analytical result
was equal to zero and/or the result sign indicated a non-detection, then DETECT was set equal to
0 (i.e., a non-detect).
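A minimal sketch of the DETECT rule as stated above follows. The function name and the use of "<" and "=" as the only sign values are assumptions for illustration; the text does not say how records that match neither rule were handled, so the sketch returns None for them.

```python
def derive_detect(sign, value):
    """Return 1 for a detection, 0 for a non-detect, None if undetermined.

    sign  -- result sign, "<" (non-detection) or "=" (detection)
    value -- numeric analytical result, or None when missing
    """
    # Non-detect: a zero result and/or a non-detection sign.
    if value == 0 or sign == "<":
        return 0
    # Detection: a positive result with a detection sign.
    if value is not None and value > 0 and sign == "=":
        return 1
    # Assumption: anything else is left undetermined for manual review.
    return None
```

For example, `derive_detect("=", 3.2)` gives 1, while `derive_detect("<", 0.5)` gives 0 because the sign takes precedence over the positive value.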
EPA received data with various units of measure. It was important that all data for each
individual contaminant be expressed in a single unit to facilitate analysis. For this analysis, EPA
converted all data for trihalomethanes (THMs) and haloacetic acids (HAAs) to µg/L. All records
with missing or unusual units in the SYR4 ICR dataset were sent back to states for input as part
of flagged reports mentioned earlier.
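As an illustration of the unit handling described above, the following sketch converts mg/L and µg/L results to µg/L and flags anything else. The table of accepted units and the function name are assumptions; the source does not list which unit spellings appeared in the state data.

```python
# Conversion factors to micrograms per liter. Assumption: only these two
# units are treated as standard for THMs and HAAs in this sketch.
TO_UG_PER_L = {"ug/l": 1.0, "mg/l": 1000.0}


def to_micrograms_per_liter(value, unit):
    """Return (converted_value, flagged).

    flagged is True when the unit is missing or not recognized, in which
    case the value is not converted and the record needs state input.
    """
    key = (unit or "").strip().lower()
    # Treat the micro sign and the Greek letter mu the same as "u".
    key = key.replace("\u00b5", "u").replace("\u03bc", "u")
    factor = TO_UG_PER_L.get(key)
    if factor is None:
        return None, True
    return value * factor, False
```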
Chapter 4 Data Quality Assurance and Quality Control
After EPA converted the state datasets into a consistent format, a significant effort was
undertaken to ensure the quality of the data submitted. Data quality, completeness, and
representativeness were key considerations for the dataset. Given the size, scope, and variety of
formats of the datasets received from the states, EPA conducted extensive data management and
QA/QC evaluation on the data to be included in the SYR4 ICR dataset. This QA/QC evaluation
involved the assessment of data ranging in quality across the different contaminants and different
states. This chapter includes a summary description of the QA/QC measures that were conducted
on the state datasets for all SYR4 data, including the MDBP data.
4.1 Quality Assurance Measures Applied to All Contaminants
Before analyzing contaminant occurrence, EPA performed a rigorous QA/QC evaluation of the
data from each state (for both SDWIS and non-SDWIS state users). When necessary, EPA sent
emails to states, asking specific questions about their datasets. Question topics included descriptions
of non-intuitive data element names, definitions of field headings, or non-standard codes that
were not described in any documentation files from the state. EPA also confirmed that all of the
requested contaminants were included in each state dataset. When a state was missing data for
any of the contaminants, EPA asked the state to identify the reason for the omission, such as a
state-wide waiver of the requirement to monitor for the contaminant(s). Information provided by
states was documented and kept as a record.
Exhibit 4 lists the system types that are required to sample for the MDBP contaminants. All data
that passed the QA/QC process from these systems were included in the SYR4 datasets. Data
from systems that were not required to sample for a given contaminant were excluded from the
SYR4 datasets.
Exhibit 4: Contaminant Group Monitoring Requirements

| Contaminant Group | System Types Required to Sample (sample data included in analyses) | System Types Not Required to Sample (sample data excluded from analyses) |
|---|---|---|
| Disinfection Byproducts and disinfectant residuals | Stage 1 and Stage 2 DBP Rules: All community water systems and non-transient noncommunity water systems that add a disinfectant other than ultraviolet (UV) light or deliver disinfected water, and transient non-community water systems that add chlorine dioxide. | Community water systems and non-transient noncommunity water systems that do not add a disinfectant other than UV light, as well as transient non-community water systems that add a disinfectant other than chlorine dioxide. |
| Microbial Contaminants and disinfectant residuals | Groundwater Rule (GWR): The GWR applies to all public water systems that use ground water, including consecutive systems, except that it does not apply to PWSs that combine all of their ground water with surface water or with ground water under the direct influence of surface water prior to treatment. Surface Water Treatment Rules (SWTRs): The SWTRs apply to all public water systems that use surface water or ground water under direct influence of surface water. Revised Total Coliform Rule (RTCR): The RTCR applies to all public water systems. | None. |

EPA created several automated data QA checks within the SYR4 ICR dataset. These QA checks
identified (or "flagged") records of potential data quality concerns. EPA sent out a detailed
report to each state describing their flagged records called a "flagged records report." These
reports included the counts of flagged records by category, as well as specific questions related to
each of these categories. In addition, an attachment identified the specific records that were
flagged. EPA requested that each state provide the appropriate disposition (delete, make
corrections, etc.) of these flagged records. EPA documented all changes made to the compliance
monitoring data and suggested to the states that they make corrections in their data system as
well, if appropriate. To resolve data quality issues that required significant corrections to the raw
data, such as identifying outliers or identifying and changing incorrect units, consultations with
state data management staff were conducted or attempted before data corrections were
completed.
The following sections provide a description of the various QA measures applied to the entire
SYR4 dataset that were used to identify records of potential data quality concern. For all flagged
records, input from states was always considered as the initial criterion in deciding on the
appropriate action or decision to include or exclude the record from analysis. When states did not
provide a response or action, EPA used best professional judgement on whether to include or
exclude the data in question. When a determination was made to exclude records from the
occurrence analyses, a code was added to the "transaction table" in the database to indicate that
the record should not be included in the analyses. This code could be changed if EPA were to
revise their decision about excluding/including particular records for occurrence analyses.
Sections 4.1.1 through 4.1.5 describe the QA measures that were applied to the entire database
(i.e., were relevant to all regulated contaminant monitoring data in the SYR4 ICR dataset).
Exhibit 5 provides a visual for the overall flow of the QA/QC process for QA measures applied
to all SYR4 contaminants. Additional QA/QC measures applied to specified groups of
contaminants are included in Chapter 5 (DBPs and DBP related parameters) and Chapter 6
(microbial contaminants).
Exhibit 5: Flow Chart of QA Measures Applied to All SYR4 Contaminants
[Flow chart. Each record passes through four checks in sequence. A "yes" at any check excludes
the record from analysis; a "no" moves the record to the next check.
1. Is the record from a non-public water system?
2. Is the record from a system with missing inventory info (e.g., source water type and
population served information)?
3. Is the record from outside of the SYR4 date range (2012-2019)?
4. Is the record marked as being "not for compliance"?
Records that pass all four checks move on to the next phase of QA review.]
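The sequence of checks in Exhibit 5 can be sketched as a single function that returns the first reason a record would be excluded. The record layout (a dictionary with the field names shown) is hypothetical and is not the structure of the SYR4 ICR database; the date bounds come from the 2012-2019 range stated in the exhibit.

```python
from datetime import date

SYR4_START = date(2012, 1, 1)
SYR4_END = date(2019, 12, 31)


def exclusion_reason(record):
    """Return the first general QA check the record fails, or None.

    `record` is a dict with hypothetical keys: is_public_water_system,
    source_water_type, population_served, sample_date, for_compliance.
    """
    if not record["is_public_water_system"]:
        return "non-public water system"
    if record["source_water_type"] is None or record["population_served"] is None:
        return "missing inventory information"
    if not (SYR4_START <= record["sample_date"] <= SYR4_END):
        return "outside SYR4 date range"
    if not record["for_compliance"]:
        return "not for compliance"
    # Passed all four checks: move on to the next phase of QA review.
    return None
```

Returning a reason instead of a boolean mirrors the idea of recording an exclusion code rather than deleting the record, so the decision can be revisited later.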
4.1.1 Non-Public Water Systems
Some primacy agencies require water systems that do not meet the criteria to be classified as
public water systems to submit sample results that are "routine" or "for compliance." The
primacy agency's information system usually identifies these water systems as "non-public" or
uses another method to differentiate them from public water systems. All records from non-
public water systems were excluded. The records that were included were from systems that
classify as PWSs by definition, or systems that identify as a PWS, e.g., wholesale systems.
4.1.2 Systems with Missing Inventory Data
For some of the non-SDWIS states, there were systems for which the inventory information was
missing (e.g., no source water type or no population served). When inventory data were
incomplete or missing, EPA populated the missing fields with SDWIS/Fed data
from the fourth quarter of 2019. All cases where SDWIS/Fed data were used to
populate inventory data fields in the state's dataset were documented. Note that inventory
information may differ for a given system over time so the SDWIS (2019) fourth quarter data
may not fully match the actual inventory information at the time of sampling. All records from
systems whose inventory data were still missing after filling gaps with SDWIS/Fed were
excluded from the datasets.
4.1.3 Sample Results Collected Outside of the Date Range
The SYR4 ICR requested compliance monitoring data and treatment technique information from
January 1, 2012 through December 31, 2019. The extraction tool only pulled sample results from
this time period. However, some non-SDWIS states submitted sample results from outside of this
date range; all sample results collected outside of the date range were excluded from the datasets.
4.1.4 Non-Compliance
In some cases, water systems may submit sample results that are not used to determine
compliance with NPDWRs. States that use information systems with automated compliance
determination functions often use indicators to differentiate these sample results such as the
"compliance purpose indicator code" or something similar. While the extraction tool only pulled
compliance sample results, some non-compliance sample results were present in data from the
non-SDWIS states. There were a few non-SDWIS states for which EPA asked for more details
on how to accurately identify the sample results that were "for compliance." Three non-SDWIS
states (California, Colorado and Minnesota) did not make a designation as to whether their data
were for compliance. For all occurrence datasets, EPA assumed that all data from these three
states were for compliance and included them in the datasets. All sample results flagged as "not for
compliance" were excluded from the dataset.
4.1.5 Uniform System Inventory Information
For analyses, each system must have a single source water type and population-served
designation so that each system falls into a single source water type/population size stratum. Systems
using both ground water and surface water, and systems using ground water under direct
influence of surface water, were treated as surface water systems in the datasets. Note that
the number of systems that use different sources, disconnected from one another, is unknown;
this method of designating source water type may therefore underestimate the number of ground water
systems and overestimate the number of surface water systems. Systems with more than one
specified value of population served were assigned the population served value that occurred
most frequently across the years of data collected.
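The two assignment rules above can be sketched as follows. The source codes "SW", "GU" and "GW" (surface water, ground water under direct influence of surface water, and ground water) and both function names are assumptions for illustration. The text does not say how ties between equally frequent population values were broken; taking the larger value is a placeholder choice here.

```python
from collections import Counter

# Assumption: codes used to mark surface water and ground water under the
# direct influence of surface water in the inventory data.
SURFACE_LIKE = {"SW", "GU"}


def assign_source_water_type(source_codes):
    """Collapse all source types reported for a system to "SW" or "GW"."""
    codes = set(source_codes)
    if not codes:
        raise ValueError("system has no source water type on record")
    return "SW" if codes & SURFACE_LIKE else "GW"


def assign_population_served(values):
    """Return the population value reported most often for a system."""
    counts = Counter(values)
    highest = max(counts.values())
    # Tie-break (assumption, not from the source): take the larger value.
    return max(v for v, n in counts.items() if n == highest)
```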
Chapter 5 Quality Assurance Measures Applied to Disinfection
Byproducts and Disinfection Byproduct Related Parameters
In addition to the QA measures described in Chapter 4 that were applied to all contaminants,
there were several additional contaminant-specific QA measures applied to particular
contaminant data. In this way QA measures applied to DBP data will differ from those QA
measures applied to microbial contaminant data. The QA measures applied to DBPs and DBP
related parameters are described in this chapter.
Exhibit 6 presents a flow chart of these additional QA measures for DBPs and DBP related
parameters.
Exhibit 6: Flow Chart of Additional QA Measures Specific to DBPs and DBP
Related Parameters
After applying the various QA measures to nearly 12 million SYR4 ICR records for the DBPs
and DBP related parameters, 96 percent of the records from 58 states and primacy agencies
remained in the final dataset. Exhibit 7 documents the specific counts of DBP records included
and excluded in each QA step. Exhibit 7 includes records for the following DBP contaminants:
TTHM, bromoform, chloroform, dibromochloromethane, bromodichloromethane, HAA5,
dibromoacetic acid, dichloroacetic acid, bromoacetic acid, monochloroacetic acid, trichloroacetic
acid, bromate, chlorite and DBP Related Parameters: pH, alkalinity, and total organic carbon
(TOC).
Exhibit 7: Summary of the Count of Analytical Sample Results Removed via the
QA Measures Applied to DBP Rule Contaminants¹

| QA Step | Count of Records Included | Count of Records Excluded |
|---|---|---|
| Original number of analytical sample results | 11,755,299 | |
| Step 1: Removal of analytical sample results from non-public water systems. | 11,754,859 | 440 |
| Step 2: Removal of data from systems with missing source water type and/or population served information. | 11,748,860 | 5,999 |
| Step 3: Removal of data with a sample collection date outside of the Six-Year 4 date range of 2012 - 2019. | 11,717,184 | 31,676 |
| Step 4: Removal of data marked as being "not for compliance." | 11,700,871 | 16,313 |
| Step 5: Removal of DBP data with sample type code other than "RT" (routine), "CO" (confirmation), "DS" (distribution system), or "MR" (max. residence). | 11,671,157 | 29,714 |
| Step 6: Removal of records marked as potential duplicates, along with a state response saying that one set of the duplicate results should be excluded. | 11,652,715 | 18,442 |
| Step 7: Removal of DBP data with detected concentrations with non-standard/blank unit of measure for the contaminant. | 11,651,996 | 719 |
| Step 8: Removal of detected concentrations greater than 100*MCL or less than 1/100*MDL for the contaminant. For TOC, removal of detections >100xMCL. | 11,651,791 | 205 |
| Step 9: Removal of DBP records sampled outside of the distribution system or entry point to the distribution system. | 11,229,596 | 422,195 |
| Step 10: Removal of records with no data/results | 11,229,589 | 7 |
| Step 11: Removal of records with irregular system type codes (specific to State of PA where unknown system type codes were included) | 11,228,599 | 990 |
| Final number of records | 11,228,599 | |
| Percent Included | 96% | |

¹ This table includes records for the following contaminants: TTHM, bromoform, chloroform, dibromochloromethane,
bromodichloromethane, HAA5, dibromoacetic acid, dichloroacetic acid, bromoacetic acid, monochloroacetic acid, trichloroacetic
acid, bromate, chlorite, pH, alkalinity, and total organic carbon.
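The running totals in Exhibit 7 can be checked with a short calculation. The counts below are copied from the exhibit; the variable and function names are illustrative only.

```python
# Counts taken from Exhibit 7 (DBP rule contaminants).
ORIGINAL = 11_755_299
EXCLUDED_BY_STEP = [440, 5_999, 31_676, 16_313, 29_714, 18_442,
                    719, 205, 422_195, 7, 990]


def running_included(original, excluded):
    """Return the count of records remaining after each QA step."""
    remaining = original
    totals = []
    for count in excluded:
        remaining -= count
        totals.append(remaining)
    return totals


included = running_included(ORIGINAL, EXCLUDED_BY_STEP)
final_count = included[-1]
percent_included = round(100 * final_count / ORIGINAL)
```

With these inputs the final count is 11,228,599 and the rounded share retained is 96 percent, matching the last two rows of the exhibit.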
5.1 Non-Routine Samples
Some primacy agencies have regulations that are more stringent than the NPDWRs and require
water systems to submit more sample results than federally required. Primacy agencies also may
require laboratories to report all sample results from water systems including results from
contaminants that are not regulated. Usually, non-routine sample results that are specifically
listed as "special request" in the database are also identified as being "non-compliance" samples.
Most other types of non-routine sample results, such as confirmation, repeat or maximum
residence time sample results are considered as "for compliance." While the extraction tool
excluded sample results that were "not for compliance," some "special" sample results that were
marked as being "for compliance" were included in the data extracted from SDWIS states. In
addition, "non-routine / not for compliance" results were present in data from the non-SDWIS
states. All DBP results that were marked as routine ("RT"), confirmation ("CO"), or maximum
residence ("MR") were included in the DBP dataset.
5.2 Duplicate Records
In the analysis of DBPs and DBP related parameters data, potential duplicates were identified as
all detection records with the same PWSID, Sample Point ID, analyte, sample collection date,
and concentration. All records identified as potential duplicates were retained in the occurrence
dataset unless the state responded to indicate that records were indeed duplicates and should be
excluded from the dataset.
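The duplicate rule above amounts to grouping detection records on five fields and flagging any group with more than one member. A minimal sketch follows; the dictionary keys are hypothetical stand-ins for the database fields, and flagged records are only candidates, since the text says they were kept unless the state confirmed them as duplicates.

```python
from collections import defaultdict

# Hypothetical field names for the five matching criteria in the text.
KEY_FIELDS = ("pwsid", "sample_point_id", "analyte", "sample_date", "concentration")


def flag_potential_duplicates(records):
    """Return sorted indexes of detection records that share all key fields."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        if record["detect"] == 1:  # only detections are compared
            key = tuple(record[field] for field in KEY_FIELDS)
            groups[key].append(index)
    return sorted(i for members in groups.values() if len(members) > 1 for i in members)
```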
5.3 Units of Measure
EPA identified all detection records for the DBPs, TOC, and alkalinity where the units of
measure reported were not one of the standard units used for the particular contaminant (i.e., not
equal to "mg/L" or "µg/L"). For example, a chloroform record with a unit of measure listed as
"NTU" would be flagged. All records in non-standard units were excluded from the occurrence
dataset unless there was strong evidence of the correct standard unit to use (e.g., state response
indicating the correct unit of measure, obvious data entry error, concentration is within the range
of standard units and all other records from the state are reported in the standard units).
5.4 Potential Outliers
To identify potential high outliers, EPA flagged all detected concentrations for the DBP rule
contaminants that were greater than four times the contaminant's MCL and all detected
concentrations that were greater than ten times the contaminant's MCL. Any concentration
flagged as greater than ten times the MCL was also captured by the greater-than-four-times
flag, and EPA followed up with the state on these records. To identify potential low outliers, EPA
flagged all detected concentrations that were less than one-tenth the minimum MDL. Exhibit 8
provides a list of all relevant MCL values. Note that for total organic carbon (TOC) (not listed in
Exhibit 8) all results greater than 100 mg/L were excluded from TOC data file.
EPA included questions to the state on each of these potential high and low outliers in their
"flagged record report." Any changes suggested by the states were implemented for these
records. For example, some states wrote back to say there were "no errors" in their high detect
concentrations or that they had "no reason or evidence to show these data to be invalid." Other
states stated that "all of the high results were due to using mg/L when they should have been
µg/L." For the states that did not respond, all detected DBP concentrations greater than 100 times
the contaminant's MCL were excluded from the dataset. No low-end cut-off was applied for the
DBP data. All other potential outliers less than or equal to 100 times the contaminant's MCL
were included in the datasets. The value of 100 times the MCL was chosen as a conservative
high-end cut-off. For example, a TTHM detected concentration of 10,000 µg/L (125 times the
80 µg/L MCL) was excluded as it was assumed to be a data entry error.
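The high-outlier screening described in this section can be sketched as below, using a few of the MCL values listed in Exhibit 8. The function names and the dictionary are illustrative; the exclusion function models only the case where the state did not respond, since state responses otherwise determined the outcome.

```python
# MCL values in micrograms per liter, taken from Exhibit 8 (subset).
MCL_UG_PER_L = {"TTHM": 80.0, "HAA5": 60.0, "bromate": 10.0, "chlorite": 1000.0}


def high_outlier_flags(analyte, concentration):
    """Return the review flags raised for a detected concentration."""
    ratio = concentration / MCL_UG_PER_L[analyte]
    return {
        "gt_4x_mcl": ratio > 4,     # included in the flagged record report
        "gt_10x_mcl": ratio > 10,   # subset of the 4x flag
        "gt_100x_mcl": ratio > 100,
    }


def exclude_without_state_response(analyte, concentration):
    """True when a record with no state response exceeds 100 times the MCL."""
    return concentration > 100 * MCL_UG_PER_L[analyte]
```

For the TTHM example in the text, 10,000 µg/L exceeds 100 x 80 = 8,000 µg/L, so the record would be excluded when no state response was received.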
Exhibit 8: List of DBP MCL Values

| Contaminant | Maximum Contaminant Level (MCL) Value | Unit of Measure |
|---|---|---|
| Chloroform | 80¹ | µg/L |
| Bromoform | 80¹ | µg/L |
| Bromodichloromethane | 80¹ | µg/L |
| Dibromochloromethane | 80¹ | µg/L |
| Total Trihalomethanes (TTHM)¹ | 80 | µg/L |
| Monochloroacetic Acid | 60² | µg/L |
| Dichloroacetic Acid | 60² | µg/L |
| Trichloroacetic Acid | 60² | µg/L |
| Bromoacetic Acid | 60² | µg/L |
| Dibromoacetic Acid | 60² | µg/L |
| Haloacetic acids 5 (HAA5) | 60 | µg/L |
| Bromate | 10 | µg/L |
| Chlorite | 1,000 | µg/L |

¹ The MCL for total trihalomethanes is 80 µg/L but the individual trihalomethane results were also compared against that MCL to
identify potential outliers.
² The MCL for the sum of five haloacetic acids is 60 µg/L but the individual haloacetic acid results were also compared against that
MCL to identify potential outliers.
5.5 Locational Flag
While the occurrence of DBPs could theoretically occur anywhere in a given water system, EPA
is primarily focused on the occurrence in the distribution system. As such, EPA excluded any
DBP records with a location sampling point type that was not obviously a part of the distribution
system or entry point to the distribution system, such as sampling results from raw or source
waters. Specifically, the following location sampling point types were not flagged for exclusion:
"DS" (distribution system), "EP" (entry point), "FC" (first customer), "FN" (finished), "LD"
(lowest disinfectant residual), "MD" (midpoint of distribution system), or "MR" (maximum
residence time). For records whose sampling point location type was either null or labeled as a
generic "Water System Facility Point," an additional filter was added to make sure any records
with a water system facility type that was likely associated with the distribution system were not
excluded. Specifically, the following facility type codes were not flagged for exclusion when the
sampling point type code was listed as "WS" (water system facility point) or null: "CC"
(consecutive connection), "DS" (distribution system), "TM" (transmission main), or "TP"
(treatment plant).
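The two-level location filter above can be written as a small predicate. The code sets are copied from the text; the function name is illustrative, and treating every other sampling point type as excluded is this sketch's reading of "not obviously a part of the distribution system."

```python
# Sampling point types that were not flagged for exclusion.
DISTRIBUTION_POINT_TYPES = {"DS", "EP", "FC", "FN", "LD", "MD", "MR"}
# Facility types kept when the sampling point type is "WS" or null.
DISTRIBUTION_FACILITY_TYPES = {"CC", "DS", "TM", "TP"}


def keep_dbp_location(sampling_point_type, facility_type):
    """True when a DBP record's location is retained in the dataset."""
    if sampling_point_type in DISTRIBUTION_POINT_TYPES:
        return True
    # Null or generic "water system facility point": fall back to the
    # facility type to decide.
    if sampling_point_type is None or sampling_point_type == "WS":
        return facility_type in DISTRIBUTION_FACILITY_TYPES
    return False
```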
Chapter 6 Quality Assurance Measures Applied to Microbial
Contaminants
In addition to the QA measures described in Chapter 4, there were a handful of additional QA
measures applied to only the microbial contaminants. Those QA measures are described in this
chapter. Exhibit 9 is a flow chart of the additional QA measures applied to the microbial
contaminants.
Exhibit 9: Summary of the Count of Records Removed via the QA Measures
Applied to Microbial Rule Contaminants
Exhibit 10 documents the specific counts of microbial records included and excluded in each QA
step. After applying the various QA measures to more than 28 million SYR4 ICR microbial
records, 99 percent of the records from 57 states and primacy agencies remained in the final
dataset for use in analyses.
Exhibit 10: Summary of the Count of Analytical Sample Results Removed via the
QA Measures Applied to Microbial Rule Contaminants¹

| QA Step | Count of Records Included | Count of Records Excluded |
|---|---|---|
| Original number of analytical sample results | 28,329,039 | |
| Step 1: Removal of analytical sample results from non-public water systems. | 28,315,533 | 13,506 |
| Step 2: Removal of data from systems with missing source water type and/or population served information. | 28,236,298 | 79,235 |
| Step 3: Removal of data with a sample collection date outside of the Six-Year 4 date range of 2012 - 2019. | 28,114,841 | 121,457 |
| Step 4: Removal of data marked as being "not for compliance." | 27,985,027 | 129,814 |
| Step 5: Removal of microbial data with sample type code other than "RT" (routine), "RP" (repeat), or "TG" (triggered). | 27,981,035 | 3,992 |
| Step 6: Removal of records with no data/results | 27,964,042 | 16,993 |
| Step 7: Removal of records with irregular system type codes (specific to State of PA where unknown system type codes were included) | 27,962,474 | 1,568 |
| Final number of records | 27,962,474 | |
| Percent Included | 99% | |

¹ The following analytes are included in the counts above: Total coliform, Fecal coliform, E. coli, Cryptosporidium, Giardia lamblia,
Enterococci, and coliphage.
6.1 Non-Routine Samples
Some primacy agencies have regulations that are more stringent than the NPDWRs and require
water systems to submit more sample results than federally required. Primacy agencies also may
require laboratories to report all sample results from water systems including results from
contaminants that are not regulated. Usually, non-routine sample results that are specifically
listed as "special request" in the database are also identified as being "non-compliance" samples.
Most other types of non-routine sample results, such as confirmation, repeat or maximum
residence time sample results are "for compliance." While the extraction tool excluded sample
results that were "not for compliance," some "special" sample results that were marked as being
"for compliance" were included in the data extracted from SDWIS states. In addition,
"non-routine / not for compliance" results were present in data from the non-SDWIS states. EPA
flagged these data and asked the states about them. All results that were marked as routine ("RT"), repeat
("RP"), or triggered ("TG") were included in the occurrence datasets for the microbial
contaminants.
6.2 Pairing Disinfectant Residual and Coliform Results for non-SDWIS states
Per requirements under the SWTR, surface water systems need to monitor disinfectant
residuals at the same locations and times as routine total coliform (TC) samples under the TCR/RTCR. Thus, the TC/EC
datasets generally also contain paired disinfectant residual monitoring records. However, two
non-SDWIS states, Wisconsin and Pennsylvania, submitted disinfectant residual concentration
data as independent records not paired with total coliform (TC) samples. To enable evaluation of
disinfectant residual concentrations versus TC positivity rates, EPA paired the residual chlorine
data with the associated TC result. EPA paired the two sets of results based on the sample
collection date, sample point ID, and lab assigned ID. Using a combination of two approaches,
roughly 31 percent of Wisconsin and Pennsylvania's TC records were paired with free chlorine
residuals, while around 5 percent were paired with total chlorine residuals. The primary
approach, which joined on all three fields, paired more than 410,000 TC records with free
chlorine residuals and more than 54,000 TC records with total chlorine residuals. In an effort
to pair more results, EPA applied a secondary approach to the remaining unpaired records, which
omitted the lab assigned ID as a necessary "join" field. This secondary approach paired an
additional 97,000 TC records with free chlorine residuals and nearly 33,000 TC records
with total chlorine residuals.
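The two-pass pairing described above can be sketched with plain dictionaries. The field names are hypothetical, and the sketch simplifies in two ways that the source does not address: it keeps only the first residual found for each key, and it allows a residual record to be matched to more than one TC record.

```python
def pair_residuals(tc_records, residual_records):
    """Pair TC records with residual records in two passes.

    Pass 1 joins on sample date, sample point ID, and lab assigned ID.
    Pass 2 applies to TC records left unpaired and drops the lab ID.
    Returns a list of (tc_record, residual_record, pass_number) tuples.
    """
    by_full_key = {}
    by_short_key = {}
    for res in residual_records:
        short = (res["sample_date"], res["sample_point_id"])
        # setdefault keeps the first residual seen for each key.
        by_full_key.setdefault(short + (res["lab_id"],), res)
        by_short_key.setdefault(short, res)

    pairs = []
    for tc in tc_records:
        short = (tc["sample_date"], tc["sample_point_id"])
        match = by_full_key.get(short + (tc["lab_id"],))
        pass_number = 1
        if match is None:
            match = by_short_key.get(short)
            pass_number = 2
        if match is not None:
            pairs.append((tc, match, pass_number))
    return pairs
```

Recording which pass produced each pair makes it possible to report primary and secondary matches separately, as the counts in the text do.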
6.3 Updates to Absence and Presence Codes
Under the SYR4 ICR, some microbial records (total coliform, E. coli, and fecal coliform) were
submitted without a presence indicator code (i.e., indicating whether the result was absent ("A")
or present ("P")) but with a value in the measured concentration field (specifically, the
CONCENTRATION_MSR field). EPA updated nearly 4 million microbial records with a null
presence absence code and a concentration of zero to set the presence absence code equal to "A".
In addition, EPA updated nearly 60,000 microbial records with a PRESENCE_IND_CODE of
null to "P" when the concentration was greater than zero, indicating the presence of the microbe.
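The update rule for missing presence codes can be sketched as a single function. The function name is illustrative, and leaving the code null when the concentration is also missing is an assumption, since the text covers only records that had a measured concentration.

```python
def fill_presence_code(presence_code, concentration):
    """Return the presence/absence code after applying the update rule.

    Existing codes are never overwritten. A null code becomes "A" when
    the concentration is zero and "P" when it is greater than zero.
    """
    if presence_code is not None:
        return presence_code
    if concentration is None:
        return None  # assumption: nothing to infer from
    if concentration == 0:
        return "A"
    if concentration > 0:
        return "P"
    return None  # negative values are left for manual review
```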
Chapter 7 References
United States Environmental Protection Agency (USEPA). 2016. Six-Year Review 3 Technical
Support Document for Disinfectants/Disinfection Byproducts Rules.
USEPA. 2019. Information Collection Request Submitted to OMB for Review and Approval;
Comment Request; Contaminant Occurrence Data in Support of the EPA's Fourth Six-Year
Review of National Primary Drinking Water Regulations: October 31, 2019, Volume 84,
Number 211, Page 58381-58382.
The Data Management and Quality
Assurance/Quality Control Process for
EPA's Fourth Six-Year Review's
Microbial and Disinfection Byproduct
Datasets: Appendices
-------
Appendix A: Data request letter EPA sent June 3, 2020 contacting
each Primacy Agency to request voluntary submission of its
compliance monitoring data and treatment technique information
for regulated chemical, radiological, and microbiological
contaminants
UNITED STATES ENVIRONMENTAL PROTECTION AGENCY
WASHINGTON, D.C. 20460
OFFICE OF WATER
State Drinking Water Administrators
Association of State Drinking Water Administrators
1401 Wilson Blvd #1225
Arlington, VA 22209
Dear State Drinking Water Administrator,
The 1996 Safe Drinking Water Act Amendments require the U.S. Environmental
Protection Agency (EPA) to review and revise, if appropriate, existing National Primary
Drinking Water Regulations (NPDWRs) at least every six years (i.e., the Six-Year Review). The
Agency is currently preparing for the fourth round of the Six-Year Review (Six-Year Review 4).
As was done for the third Six-Year Review, the EPA is contacting each primacy agency
(hereinafter referred to as "state") and requesting voluntary submission of its compliance
monitoring data and treatment technique information for regulated chemical, radiological, and
microbiological contaminants. We are requesting compliance monitoring data collected between
January 2012 and December 2019. The Office of Management and Budget (OMB) has approved
the information collection request for the EPA's fourth Six-Year Review under the provisions of
the Paperwork Reduction Act, 44 U.S.C. 3501 et seq., and has assigned OMB control number
2040-0298.
These data are an important component in supporting the EPA's Six-Year Review of
NPDWRs. We are encouraging each state to submit its contaminant monitoring and treatment
technique information because these data will contribute directly to the EPA's understanding of
national contaminant occurrence, treatment technique information, the population exposed to
regulated contaminants, and exposure reductions associated with the current regulations. The
EPA is requesting your voluntary submission by September 30, 2020.
The EPA is requesting only data that are currently stored electronically (no paper
records), including both detection and non-detection results for compliance monitoring and
treatment technique information. Exhibit 1 of the attachment provides a list of the regulated
contaminants for which the EPA is requesting data. Exhibit 2 presents critical data elements
needed for each sample result. To make your voluntary reporting as easy as possible, your state
can transmit its compliance monitoring data set to the EPA using the same process your state
currently uses to submit your SDWIS data quarterly. The attachment also answers questions
about how the data will be transferred, managed, and used and provides some background
information about why we are requesting these data.
In our previous Six-Year Review data collections, we have worked closely with state data
managers to answer questions and facilitate data transfer. Soon after June 30, 2020 we will begin
contacting data managers and coordinating directly with them by phone and/or email.
Thank you for your consideration of this request. Many of you voluntarily submitted your
data for the Six-Year Review 3. We appreciated your participation and hope you will do so
again. If you have any questions about this request or the intended uses of the data, please
contact Lili Wang, Associate Chief, Standards and Risk Reduction Branch, at wang.lili@epa.gov
or Nicole Tucker, Six-Year Review 4 Team Lead, at tucker.nicole@epa.gov.
Sincerely,
Jennifer L. McLain, Director
Office of Ground Water and Drinking Water
Enclosure: Attachment
cc: Regional Water Division Directors
Regional Drinking Water Branch Chiefs
Tribal Direct Implementation Contacts
ATTACHMENT
I. Details Regarding EPA's Request for Contaminant Monitoring Data
A. What regulated contaminants are included in this request?
EPA is requesting compliance monitoring information for chemical, radiological, and
microbiological contaminants, as was requested under past Six-Year Reviews. Exhibit 1, below,
lists the specific contaminants for which EPA is requesting monitoring data. EPA will work with
you to make the data transfer as easy as possible. Voluntary submission of your regulated
drinking water contaminant monitoring and treatment technique data is the most critical step in
this national occurrence assessment for the Six-Year Review 4.
B. What specific data are being requested and what timeframe should the data cover?
EPA is requesting the voluntary submission of compliance monitoring data for regulated
chemical, radiological, and microbiological contaminants (Exhibit 1) collected between January
2012 and December 2019. This request only includes those data that you have stored in
electronic format. The requested data include routine compliance monitoring samples (including
repeat and confirmation samples) and treatment technique data. Please include all results for both
analytical detections and non-detections.
Exhibit 2 lists the data elements that are likely to be captured as part of your facility and
treatment data, and likely to be in your compliance monitoring database. We encourage you to
send us your data even if you feel that your data set is incomplete.
Exhibit 1: Occurrence Data Requested
Chemical Contaminants (Phase I, II, IIB, and V Rules; Arsenic Rule; Lead and Copper Rule)
Acrylamide | 1,1-Dichloroethylene | Methoxychlor
Alachlor | cis-1,2-Dichloroethylene | Monochlorobenzene (Chlorobenzene)
Antimony | trans-1,2-Dichloroethylene | Nitrate (as N)
Arsenic | Dichloromethane (Methylene chloride) | Nitrite (as N)
Asbestos | 1,2-Dichloropropane | Oxamyl (Vydate)
Atrazine | Di(2-ethylhexyl) adipate (DEHA) | Pentachlorophenol
Barium | Di(2-ethylhexyl) phthalate (DEHP) | Picloram
Benzene | Dinoseb | Polychlorinated biphenyls (PCBs)
Benzo[a]pyrene | Diquat | Selenium
Beryllium | Endothall | Simazine
Cadmium | Endrin | Styrene
Carbofuran | Epichlorohydrin | 2,3,7,8-TCDD (Dioxin)
Carbon tetrachloride | Ethylbenzene | Tetrachloroethylene
Chlordane | Ethylene dibromide (EDB) | Thallium
Chromium (total) | Fluoride | Toluene
Copper | Glyphosate | Toxaphene
Cyanide | Heptachlor | 2,4,5-TP (Silvex)
2,4-D | Heptachlor epoxide | 1,2,4-Trichlorobenzene
Dalapon | Hexachlorobenzene | 1,1,1-Trichloroethane
1,2-Dibromo-3-chloropropane (DBCP) | Hexachlorocyclopentadiene | 1,1,2-Trichloroethane
1,2-Dichlorobenzene (o-Dichlorobenzene) | Lead | Trichloroethylene
1,4-Dichlorobenzene (p-Dichlorobenzene) | Lindane | Vinyl chloride
1,2-Dichloroethane (Ethylene dichloride) | Mercury (inorganic) | Xylenes (total)
Radiological Contaminants
Combined Radium-226/228; and Radium-226 & Radium-228 (if available)
Gross beta
Tritium
Iodine-131
Uranium
Gross alpha
Strontium-90
Total Coliform Rule (TCR) and Revised Total Coliform Rule (RTCR)
Total coliforms
Fecal coliforms
Escherichia coli (E. coli)
Disinfectants and Disinfection Byproducts Rules (DBPRs)
Total Trihalomethanes (TTHMs):
Chloroform
Bromodichloromethane
Dibromochloromethane
Bromoform
Haloacetic Acids (HAA5):
Monochloroacetic acid
Dichloroacetic acid
Trichloroacetic acid
Bromoacetic acid
Dibromoacetic acid
Bromate
Chlorite
Chlorine
Chloramines
Chlorine dioxide
Ground Water Rule (GWR)
Escherichia coli (E. coli)
Enterococci
Coliphage
Surface Water Treatment Rules (SWTRs)
Chlorine
Cryptosporidium
Heterotrophic Plate Count (HPC)
Chloramines
Giardia lamblia
Filter Backwash Recycling Rule (FBRR)
No specific occurrence data collected.
Exhibit 2: Requested Data Categories
Data Category | Description
System-Specific Information
Public Water System Identification Number (PWSID) | The code used to identify each PWS. The code begins with the standard 2-character postal state abbreviation or Region code; the remaining 7 numbers are unique to each PWS in the state.
System Name | Name of the PWS.
Federal Public Water System Type Code | A code to identify whether a system is: Community Water System; Non-transient Non-community Water System; or Transient Non-community Water System.
Population Served | Highest average daily number of people served by a PWS, when in operation.
Federal Source Water Type | Type of water at the source. Source water type can be: Ground water; or Surface water; or Ground water under the direct influence of surface water (GWUDI). (Note: Some States may not distinguish GWUDI from surface water sources. In those States, a GWUDI source should be reported as a surface water source type.)
Treatment Information
Water System Facility | System facility data, including: treatment plant identification number, treatment plant information, treatment unit process/objectives, facility flow, treatment train (train or flow of water through treatment units within the treatment plant).
Filtration Type | Information relating to system filtration, including: filtration status, types of filtration (e.g., unfiltered, conventional filtration, and other permitted values).
Treatment Technique Information | Information pertaining to treatment processes. Types of treatment technique information including: disinfectants used and their doses for primary and secondary disinfection, coagulant/coagulant aid type and dose, disinfectant concentration, disinfection profile/benchmark data, log of viral inactivation/removal, contact time, contact value, pH, temperature.
Filter Backwash Information | Information about filter backwash that is returned to the treatment plant influent (e.g., information on: recycle/schematic status, alternative return location, corrective action requirements, and recycle flows and frequency).
Sample-Specific Information
Sampling Point Identification Code | A sampling point identifier established by the state, unique within each applicable facility, for each applicable sampling location (e.g., entry point to the distribution system). This information enables occurrence assessments that address intra-system variability.
Sample Identification Number | Identifier assigned by state or the laboratory that uniquely identifies a sample.
Sample Collection Date | Date the sample is collected, including month, day, and year.
Sample Type | Indicates why the sample is being collected (e.g., compliance, routine, repeat, confirmation, additional routine samples, duplicate, special, special duplicate, etc.).
Sample Analysis Type Code | Code for type of water sample collected: Raw (Untreated) water sample; Finished (Treated) water sample. For lead and copper only: Source; Tap. For TCR Repeats only, indicator of sampling location relative to sample point where positive sample was originally collected: Upstream; Downstream; Original.
Contaminant | Contaminant name, 4-digit SDWIS contaminant identification number, or Chemical Abstracts Service (CAS) Registry Number for which the sample is being analyzed.
Sample Analytical Result - Sign | The sign indicates whether the sample analytical result was: (<) "less than" means the contaminant was not detected or was detected at a level "less than" the minimum reporting level (MRL). (=) "equal to" means the contaminant was detected at a level "equal to" the value reported in "Sample Analytical Result - Value." (+) "positive result" (For RTCR data, only positive E. coli result sign to be included.)
Sample Analytical Result - Value | Actual numeric (decimal) value of the analysis for the chemical results, or the MRL if the analytical result is less than the contaminant's MRL. (For the TCR and RTCR, TC and E. coli will indicate presence/absence, and positive E. coli will have numeric results.)
Sample Analytical Result - Unit of Measure | Unit of measurement for the analytical results reported (usually expressed in either µg/L or mg/L for chemicals; or pCi/L or mrem/yr for radiological contaminants). (Not required for TCR and RTCR data)
Sample Analytical Method Number | EPA identification number of the analytical method used to analyze the sample for a given contaminant.
Minimum Reporting Level (MRL) - Value | MRL refers to the lowest concentration of an analyte that may be reported. (Not required for TCR and RTCR data)
MRL - Unit of Measure | Unit of measure to express the concentration value of a contaminant's MRL. (Not required for TCR and RTCR data)
Source Water Monitoring Information | Total organic carbon (TOC), including percent TOC removal, TOC removal summary, pH, alkalinity, monitoring data entered as individual results or included in DBP (or monthly operating report) summary records, alternative compliance criteria, results from round 2 monitoring under LT2 ESWTR (including Cryptosporidium, E. coli, turbidity, or state-approved alternate indicators).
Sample Summary Reports | Sample summaries for DBPRs, SWTRs, GWR corrective actions, and the Lead and Copper Rule (LCR) associated with analytical result records. Values used for compliance determination [e.g., turbidity (combined effluent/individual effluent), disinfectant residual levels in treatment plant and distribution system, treatment technique information, HPC, etc.]
1. For systems that are no longer required to individually monitor for nitrite, results should be reported for total
nitrate plus nitrite (expressed as N) as SDWIS Analyte Code 1038 in lieu of individual results for nitrite and nitrate.
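As an illustration of how two of the Exhibit 2 elements might be interpreted in practice, the following Python sketch checks the PWSID layout described above (2 characters followed by 7 digits) and reads the result sign to decide whether a record represents a detection. The class name, field names, and the exact character pattern are assumptions made for this example; they are not taken from SDWIS/State or from any EPA tool.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: the 2-character prefix is a postal abbreviation or a numeric
# Region code, and the remaining 7 characters are digits.
PWSID_PATTERN = re.compile(r"^[A-Z0-9]{2}[0-9]{7}$")


def is_valid_pwsid(pwsid: str) -> bool:
    """Return True if the identifier matches the assumed 2 + 7 layout."""
    return bool(PWSID_PATTERN.match(pwsid.strip().upper()))


@dataclass
class SampleResult:
    """Hypothetical container for a few of the Exhibit 2 data elements."""
    pwsid: str
    contaminant: str
    sign: str               # "<", "=", or "+"
    value: Optional[float]  # the MRL when sign is "<"
    unit: Optional[str]


def is_detection(result: SampleResult) -> bool:
    """Interpret the result sign: "<" is a non-detect, "=" and "+" are detects."""
    if result.sign == "<":
        return False
    if result.sign in ("=", "+"):
        return True
    raise ValueError(f"Unrecognized result sign: {result.sign!r}")
```

Note that for a "<" record the value field holds the MRL rather than a measured concentration, so it should not be treated as an observed level.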
C. How do I prepare my data for submission to EPA?
We want to make this process as easy as possible for states that are volunteering to submit
monitoring and treatment technique data. EPA developed and refined a SDWIS/State extraction
tool, which runs a customized query to pull data for those using SDWIS/State. We believe this
would be the most efficient (i.e., easiest) method of data extraction for those states using some or
all of SDWIS/State. Currently, some states store and manage their data in more than one
database. If it is easier for you to provide the electronic data for all contaminants that are stored
in your data system, EPA can help you with a global extraction of the data. Please send inquiries
to SixYearData@cadmusgroup.com. All data will be transmitted to EPA using the same process
your state currently uses to submit your SDWIS data (see section D, below, for details).
Extracting data that are stored in SDWIS/State:
SDWIS/State Extract Tool: EPA has developed the SDWIS/State Extract Tool to extract the
relevant data (specified in Exhibit 2) from a SDWIS/State database. The tool consists of three
parts: PWS Inventory and Treatment, Analytical Results and Calculated Compliance Values. The
first two parts were used in the Six-Year Review 3. States that use SDWIS/State for data storage
and management and are interested in using the SDWIS/State extract tool can email
SixYearData@cadmusgroup.com for instructions to download the extraction tool. EPA believes
the extract tool would be the easiest mode of extraction for data that are stored in SDWIS/State.
For the data transfer step, please see section D, below.
Note: If you have not migrated all drinking water monitoring data for the applicable period
(January 2012 through December 2019) to SDWIS/State, a separate data submission to include
all data back to January 2012 is requested, so that the data included in the Agency's Six-Year
Review analysis is as complete and comparable as possible.
Automated Data Quality Assurance (QA) with SDWIS/State Extraction Tool: EPA has built
in several automated data QA checks with this extraction tool. For example, the extraction tool
will check for duplicate data, and analytical results that are >10 times the MCL. Before the data
are extracted from SDWIS/State, the extraction tool runs these queries and returns a "flagged
item report" for any data that meet these and other criteria that may indicate anomalies in your
data (e.g., incorrect units of measurement, or data entry error). If there are entries in your
"flagged item report," we strongly encourage you to review and resolve as many of these flags as
possible before re-running and submitting your data. Doing this will help ensure your submitted
data are of the highest quality possible. In addition, we will run these and other QA checks once
we receive your data; so, by addressing flags before submitting your data, you will reduce the
number of questions that need to be resolved once your data are submitted.
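The two example checks named above (duplicate data and results more than 10 times the MCL) can be sketched in Python as follows. This is only an illustration of the idea, not the extraction tool's actual logic: the record layout, the function name, and the flag wording are hypothetical, and the MCL values in the dictionary are supplied here for the example.

```python
# Example MCLs in mg/L, supplied for illustration only.
EXAMPLE_MCL_MG_L = {"BROMATE": 0.010, "CHLORITE": 1.0}


def build_flagged_item_report(records, mcls, factor=10.0):
    """Return (record_index, reason) pairs for records that look anomalous.

    Each record is a dict with at least "contaminant" and "value" keys.
    A record is a duplicate when every one of its fields matches an
    earlier record.
    """
    flags = []
    seen = set()
    for index, record in enumerate(records):
        key = tuple(sorted(record.items()))
        if key in seen:
            flags.append((index, "duplicate"))
        seen.add(key)

        mcl = mcls.get(record["contaminant"])
        value = record.get("value")
        if mcl is not None and value is not None and value > factor * mcl:
            flags.append((index, "exceeds 10x MCL"))
    return flags
```

A value flagged as exceeding 10 times the MCL is not necessarily wrong; as the text notes, it may point to an incorrect unit of measurement or a data entry error that is worth reviewing before submission.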
Format for Non-SDWIS/State data:
Virtually any electronic file format is acceptable. Ideally, states would submit their data
sets in one of the following file formats: dBase (.dbf); Microsoft Access (.accdb); comma- or
tab-delimited files (such as .csv or .txt); or Microsoft Excel (.xls). However, you can submit the
requested data "as is," by simply sending the compliance monitoring and treatment technique
records in whatever structure or condition in which they are currently stored and submitting that
copy of the electronic data to EPA. If it is easier for you to provide your entire electronic data
set, EPA will extract the needed data. If you have further questions about this data submission,
you can contact SixYearData@cadmusgroup.com.
Documentation:
EPA requests that your submission also include, at a minimum, a brief description of the basic
format and structure of each data set, and definitions of all data elements, column/row headings,
codes, acronyms, etc., used in each data set. (Note: EPA does not need this information if you are
using SDWIS/State. EPA already has this information.) This "data dictionary" information will
reduce the amount of time needed for questions and clarification later. EPA's primary goal is to
obtain the most complete national occurrence and treatment technique data possible, and the
Agency will work with the states to reconcile data questions where needed. If your data set is
incomplete, or there are known anomalies, such as those that may have been identified by the
SDWIS/State extract tool, it would be helpful if an explanation of these issues were included
with your transmittal.
D. How do I send my data to EPA?
Regardless of whether your data are stored in SDWIS/State, you can submit them using the same
process your state currently uses to submit your SDWIS data. (Note: some states using
SDWIS/State may store some of the requested data outside of SDWIS/State; they should also
follow these instructions.) Zip the files extracted from SDWIS/State or from another
location and name the archive SIXYEAR_REVIEW_XX.ZIP, where XX is the Primacy Agency
identifier. For example, Maryland would submit a file named SIXYEAR_REVIEW_MD.ZIP. The
extraction tool zips the files it extracts from SDWIS/State and saves them together using this
naming convention. For more information on how to submit the data, please see the instructions file
accompanying the extraction tool.
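For data stored outside SDWIS/State, the naming convention can be applied with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and it assumes the Primacy Agency identifier is a two-character code, as in the Maryland example.

```python
import zipfile
from pathlib import Path


def archive_name(primacy_agency_id: str) -> str:
    """Build the archive name SIXYEAR_REVIEW_XX.ZIP for a two-character identifier."""
    agency = primacy_agency_id.strip().upper()
    if len(agency) != 2 or not agency.isalnum():
        raise ValueError(f"Expected a two-character identifier, got {primacy_agency_id!r}")
    return f"SIXYEAR_REVIEW_{agency}.ZIP"


def zip_submission(files, primacy_agency_id, out_dir="."):
    """Zip the given files into one archive named per the convention."""
    target = Path(out_dir) / archive_name(primacy_agency_id)
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file_path in files:
            # Store each file under its base name, without directory components.
            archive.write(file_path, arcname=Path(file_path).name)
    return target
```

For example, `zip_submission(["results.csv", "data_dictionary.txt"], "MD")` would write SIXYEAR_REVIEW_MD.ZIP to the current directory; the two input file names are placeholders.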
E. When do these data need to be submitted?
To help EPA meet its Six-Year Review 4 statutory timeframe and to allow ample time for data
compilation, analysis and documentation of results, EPA requests that the data be submitted by
September 30, 2020.
II. Background Information Regarding EPA's Occurrence Data Request
A. Why is EPA requesting this data?
The 1996 Safe Drinking Water Act (SDWA) Amendments require EPA to review and revise, if
appropriate, existing National Primary Drinking Water Regulations (NPDWRs) at least every six
years (i.e., the Six-Year Review). EPA is requesting monitoring and treatment technique data for
NPDWRs to support the fourth Six-Year Review. Without an understanding of where and at
what levels regulated drinking water contaminants are occurring in public drinking water, EPA
cannot assess any potential need to revise the regulations.
In addition, the 1996 SDWA Amendments require the Agency to maintain a national drinking
water contaminant occurrence database (i.e., the National Contaminant Occurrence Database or
NCOD) using occurrence data for both regulated and unregulated contaminants. Through this
data collection, EPA will be fulfilling various requirements set forth by Congress in the 1996
SDWA Amendments.
B. How will these data be used?
EPA's OGWDW will use the data to estimate the occurrence of regulated contaminants in public
drinking water systems and to evaluate the number of people exposed and exposure reductions.
Combined with results of other technical analyses (such as assessments of contaminant health
effects), the results of the occurrence and exposure analyses will be used to help determine
whether potential revisions to the current drinking water regulations are likely to maintain or
provide for greater protection of public health for people served by public water systems. This
data will help EPA to make well-informed regulatory decisions.
Once the Agency publishes the review results for the Six-Year Review 4, these data will be made
publicly available. The procedures used to analyze these data will reflect those established and
refined in prior Six-Year Reviews. Copies of EPA's Six-Year Review occurrence findings and
methodology reports can be obtained at:
http://water.epa.gov/lawsregs/rulesregs/regulatingcontaminants/sixyearreview/index.cfm. These
documents contain the first, second, and third Six-Year Review occurrence findings and provide
direct examples of the types of occurrence analyses that will be conducted using the compliance
monitoring data you submit.
C. Why is it important to submit these data?
Regulatory decisions and the public health protection resulting from these decisions are
improved by both the quality and quantity of the data. Each state that submits data can be
directly represented in any national occurrence estimates we develop. The Six-Year Review 4
data will be used in the review of existing regulations to determine whether current NPDWRs
remain appropriate or if revisions should be considered. All data will undergo a comprehensive
quality assurance/quality control (QA/QC) process required for the Six-Year Review 4
occurrence analyses. A copy of the resulting final, QA/QC reviewed contaminant data sets will
be posted on the EPA Six-Year Review website.
D. What will happen once the data are submitted?
EPA will conduct uniform QA/QC assessments on each data set. Contaminant-specific analytical
values will be assessed as part of the QA/QC review. For example, assessment of all analytical
values for a specific contaminant will help identify possible unit errors or the presence of
outliers. The data will also be checked for duplicate data entries (as defined by multiple rows of
identical data elements) with duplicates excluded from the analysis, as needed. Identified errors
that do not have straight-forward solutions will be addressed through consultations with the
appropriate data management staff.
Based on EPA's experience with monitoring information provided by states for the prior Six-
Year Reviews, the Agency will likely need to contact some states to address questions regarding
the data format and content (e.g., outlier values, or missing or undefined data elements). EPA
will document the QA/QC process and all edits or changes made to the submitted monitoring
data.
After the data have undergone QA/QC editing and formatting, the data sets will be aggregated
into national contaminant occurrence data sets for each contaminant. The national aggregate data
sets will be used to generate statistical estimations of national occurrence. When the analyses are
completed and reported, the data will be placed in the NCOD and in the docket to support any
Six-Year Review 4 decisions.
Treatment information will also be compiled and assessed to support the Six-Year Review 4
decisions. However, the format of this information may not lend itself to analogous quantitative
analysis and national summaries. Assessment of this information will be conducted and may be
summarized in a more qualitative manner. Water system facility characteristics, filtration type,
treatment technique information, and filter backwash information may be used to further inform
the results of the occurrence data assessment.
-------
Appendix B: User Guide to Downloading and Using Six-Year
Review 4's Microbial and Disinfection Byproducts Information
Collection Request data files from EPA's Website
This appendix includes a user guide for downloading and using the SYR4 MDBP data from
EPA's website: https://www.epa.gov/dwsixyearreview/microbial-and-disinfection-byproduct-data-files-2012-2019-epas-fourth-six-year.
In addition, instructions on importing the SYR4 MDBP datasets and a data dictionary for the
MDBP datasets are included in this Appendix (see Sections 5 and 6, respectively).
Some datasets are described as "full" or "reduced." A full dataset contains all the QA-ed
data for a contaminant. A reduced dataset is a subset of the QA-ed data created by
combining data from two or more contaminants to fit a particular purpose, e.g., pairing
microbial contaminant data with the associated disinfectant residual and eliminating non-paired
records.
The data files are posted online in several zip files. Each zip file includes text files for multiple
contaminants/parameters. The number of records and contaminants/parameters included in each
file varies. The user may want to compare their counts of records downloaded for each
contaminant of interest to the table of records provided in this user guide's exhibits to ensure that
all of the records were correctly downloaded and imported. Note that these record counts reflect
the data after the QA/QC process. For a list of data elements included in the data posted online,
refer to Section 6 of this Appendix - Data Dictionary for Six-Year 4 ICR MDBP Database.
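The record-count comparison suggested above can be sketched in Python as follows. The delimiter and the analyte column name used here are assumptions for illustration, as are the function names; check the downloaded files and the data dictionary in Section 6 for the actual layout. The expected counts would come from the record-count exhibits in this user guide (for example, 23,298 records for bromate).

```python
import csv


def count_records_by_analyte(path, delimiter="\t", analyte_column="ANALYTE_NAME",
                             encoding="utf-8"):
    """Count data rows per analyte in a delimited text file with a header row.

    The delimiter and column name are assumptions; adjust them to match
    the downloaded file.
    """
    counts = {}
    with open(path, newline="", encoding=encoding) as handle:
        for row in csv.DictReader(handle, delimiter=delimiter):
            name = row[analyte_column]
            counts[name] = counts.get(name, 0) + 1
    return counts


def compare_counts(actual, expected):
    """Return {analyte: (actual, expected)} for every analyte whose counts differ."""
    return {
        name: (actual.get(name, 0), expected_count)
        for name, expected_count in expected.items()
        if actual.get(name, 0) != expected_count
    }
```

An empty result from `compare_counts` indicates that every listed analyte was imported with the expected number of records.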
The remainder of this document is organized as follows:
Section 1: Background Information on Six-Year Review 4 Data
Section 2: Disinfection Byproducts
2A. Description of the Data Files for Disinfection Byproducts
2B. Data Files Posted for Disinfection Byproducts
2C. Disinfection Byproducts Data Records
Section 3: Disinfection Byproducts Related Parameters
3A. Description of Data Files for Disinfection Byproducts Related Parameters
3B. Data Files Posted for Disinfection Byproduct Related Parameters
3C. Disinfection Byproduct Related Parameters Data Records
Section 4: Microbial Contaminants, Microbial Related Parameters, and Associated
Disinfectant Residuals
4A. Description of Data Files for Microbial Contaminants, Microbial Related
Parameters, and Associated Disinfectant Residuals Data
4B. Data Files Posted for Microbial Contaminants, Microbial Related Parameters,
and Associated Disinfectant Residuals
4C. Microbial Contaminants, Microbial Related Parameters, and Associated
Disinfectant Residuals Data Records
Section 5: Instructions on Importing Microbial and Disinfection Byproduct Datasets
5A. Downloading Data Files
5B. Importing Data into Microsoft Excel
5C. Importing Data into R
5D. Importing Data into Microsoft Access
Section 6: Data Dictionary for the Six-Year Review 4 Information Collection Request
Microbial Disinfection Byproduct Datasets
Section 1. Background Information on Six-Year Review 4 Data
To support the national contaminant occurrence and exposure assessments performed under the
fourth Six-Year Review process (SYR4), EPA collected compliance monitoring data and
treatment technique information from public water systems (PWSs) for regulated drinking water
contaminants. EPA conducted a voluntary data request from state and other primacy agencies to
obtain compliance monitoring data and treatment technique information necessary to analyze
national contaminant occurrence in support of SYR4. This data request was conducted through
the Information Collection Request (ICR) process. EPA requested primacy agencies submit their
Safe Drinking Water Act (SDWA) compliance monitoring data and treatment technique
information collected between January 2012 and December 2019. For the MDBP data in
particular, EPA collected the data recorded in the individual states' databases related to these
National Primary Drinking Water Regulations: Stage 1 and Stage 2 Disinfectants and
Disinfection Byproducts Rules, Surface Water Treatment Rules, Interim Enhanced Surface
Water Treatment Rule, and Long-Term 1 Enhanced Surface Water Treatment Rule. For more
information on the process undertaken to request the voluntary submission of compliance
monitoring data and treatment technique information by the states, see the fourth Six-Year
Review ICR (84 FR 58381, USEPA, 2019).
EPA received compliance monitoring data and treatment technique information from both
SDWIS/State and non-SDWIS/State users. For states that use SDWIS/State, EPA developed a
tool, available to primacy agencies upon request, to extract the requested data identified in the
SYR4 ICR from a SDWIS/State database. In all, 46 states and 13 other primacy agencies
provided compliance monitoring data that included parametric records. Thirty-five states,
Washington, D.C., and six regional tribal entities used the extraction tool to extract all or some of
their data. The 17 states/entities not using SDWIS/State submitted their compliance monitoring
data and treatment technique information "as is," resulting in a variety of formats, including
dBase, MS Excel, XML, MS Access, and comma-delimited text. With the exception of two states
whose data were downloaded from their publicly available websites (California and Florida), all
states submitted their data over the Internet via EPA's Central Data Exchange. All data were
converted to a common format with consistent units of measurement. For more details about
the collection and formatting of SYR4 MDBP data, see the main chapters of this document.
EPA conducted a quality assurance and quality control evaluation of the data submitted by primacy
agencies and assembled these data into a database. As noted in the main chapters, only the
data that passed the QA/QC process are posted online.
Section 2: Disinfection Byproducts
2A. Description of the Data Files for Disinfection Byproducts
The SYR4 disinfection byproducts (DBPs) datasets include text files for the regulated
disinfection byproducts, such as total trihalomethanes (TTHM) and the sum of five haloacetic
acids (HAA5), along with the individual speciated DBPs within each of these groups.
2B. Data Files Posted for Disinfection Byproducts
The following SYR4 ICR data text files are located in their designated zip file at
https://www.epa.gov/dwsixyearreview/microbial-and-disinfection-byproduct-data-files-2012-2019-epas-fourth-six-year
under Disinfection Byproducts:
SYR4_THMs.zip file contains individual files for:
Total Trihalomethanes (TTHM)
Bromodichloromethane
Bromoform
Chloroform
Dibromochloromethane
SYR4_HAAs.zip file contains individual files for:
Haloacetic Acids (HAA5)
Bromoacetic acid
Dibromoacetic acid
Dichloroacetic acid
Monochloroacetic acid
Trichloroacetic acid
SYR4_Bromate_Chlorite.zip contains individual files for:
Bromate
Chlorite
2C. Disinfection Byproducts Data Records
Exhibit 1 provides a count of states, total number of sample records and systems for each
disinfection byproduct whose data is posted online.
Note that more speciated data were provided for THMs than for HAAs. Two more states
provided speciated THM results than provided speciated HAA results. About 11,000 systems
provided speciated THM data but not speciated HAA data, and about 200 systems have
speciated HAA data but no speciated THM data. In addition, the number of PWSs that provided
speciated THM data was higher than the number of PWSs that provided TTHM data. There are
approximately 8,000 systems that have data for the speciated THMs but not TTHM, whereas
there are only about 7,000 systems with data for TTHM but not the speciated THMs.
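The comparison described above is a set difference on system identifiers, and the approximate figures can be checked against the TTHM system count in Exhibit 1 (46,297). The Python sketch below is illustrative; the function name is hypothetical, and the 8,000 and 7,000 inputs are the rounded values quoted in the text, so the result is only approximate.

```python
def exclusive_members(systems_a, systems_b):
    """Return (only in A, only in B) for two collections of PWSIDs."""
    a, b = set(systems_a), set(systems_b)
    return a - b, b - a


# Rough consistency check using the rounded figures quoted in the text.
TTHM_SYSTEMS = 46_297    # systems with TTHM data (Exhibit 1)
SPECIATED_ONLY = 8_000   # approx. systems with speciated THMs but no TTHM
TTHM_ONLY = 7_000        # approx. systems with TTHM but no speciated THMs

# Systems with both = TTHM systems minus the TTHM-only systems;
# adding the speciated-only systems gives the implied speciated total.
implied_speciated_systems = TTHM_SYSTEMS - TTHM_ONLY + SPECIATED_ONLY
print(implied_speciated_systems)  # 47297
```

The implied total of about 47,300 systems is in line with the system counts reported for the individual THM species in Exhibit 1 (47,129 to 47,403).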
Exhibit 1: Number of Disinfection Byproduct Data Records and Zip filename(s)
Contaminant | Analyte ID | Number of States/Entities with Data | Total Number of Sample Records | Total Number of Systems | Zip Filename
Disinfection Byproducts-Full Datasets
TOTAL TRIHALOMETHANES (TTHM) | 2950 | 57 | 1,089,557 | 46,297 | SYR4_THMs.zip
DIBROMOCHLOROMETHANE | 2944 | 46 | 981,059 | 47,172 | SYR4_THMs.zip
BROMOFORM | 2942 | 46 | 976,412 | 47,129 | SYR4_THMs.zip
CHLOROFORM | 2941 | 46 | 981,289 | 47,403 | SYR4_THMs.zip
BROMODICHLOROMETHANE | 2943 | 46 | 977,561 | 47,196 | SYR4_THMs.zip
HALOACETIC ACIDS (HAA5) | 2456 | 57 | 1,005,235 | 43,577 | SYR4_HAAs.zip
DIBROMOACETIC ACID | 2454 | 44 | 720,986 | 36,121 | SYR4_HAAs.zip
DICHLOROACETIC ACID | 2451 | 44 | 721,017 | 36,134 | SYR4_HAAs.zip
MONOCHLOROACETIC ACID | 2450 | 44 | 720,474 | 36,113 | SYR4_HAAs.zip
TRICHLOROACETIC ACID | 2452 | 44 | 720,706 | 36,125 | SYR4_HAAs.zip
BROMOACETIC ACID | 2453 | 44 | 720,595 | 36,095 | SYR4_HAAs.zip
BROMATE | 1011 | 38 | 23,298 | 444 | SYR4_Bromate_Chlorite.zip
CHLORITE | 1009 | 33 | 87,995 | 514 | SYR4_Bromate_Chlorite.zip
Section 3: Disinfection Byproduct Related Parameters
3A. Description of Data Files Posted for Disinfection Byproduct Related Parameters
The posted DBP related parameters data include data files for total organic carbon (TOC), total
alkalinity, paired TOC-alkalinity, pH, DOC, SUVA, and UV-absorbance.
Full datasets are provided for TOC, Alkalinity, pH, DOC, SUVA, and UV-absorbance.
A reduced dataset, Paired TOC-alkalinity, was created that included, for each treatment plant
(listed as a water system facility in Exhibit 2), the average monthly concentrations of TOC and
alkalinity in source (raw) water paired with the corresponding average finished water
concentration of TOC. The "paired" TOC-alkalinity dataset was created to evaluate the percent
removal of TOC using the SYR4 data and joined the average monthly TOC concentration with
the average monthly alkalinity concentration for individual water system facilities when possible.
This paired dataset is directly related to the treatment technique requirements for TOC removal
under the Stage 1 DBPR. Historical efforts to evaluate the paired TOC-alkalinity data were
described in the "Six-Year Review 3 Technical Support Document for Disinfectants/Disinfection
Byproducts Rules" (USEPA, 2016).
Exhibit 2 contains the list of data elements, column names, and a brief description of the data for
each data element included in the "paired" TOC-alkalinity dataset. For a list of data elements
included in the "full" TOC, alkalinity, and pH datasets, refer to Section 6, Data Dictionary for the
SYR4 ICR Database.
3B. Data Files Posted for Disinfection Byproduct Related Parameters
The following SYR4 ICR data text files are located in their designated zip file at
https://www.epa.gov/dwsixyearreview/microbial-and-disinfection-byproduct-data-files-2012-2019-epas-fourth-six-year
under Disinfection Byproducts Related Parameters:
SYR4_DBP_Related_Parameters.zip contains individual files for:
DOC
pH
SUVA
Total Alkalinity
Total Organic Carbon (TOC) (raw and finished TOC)
Paired TOC and Alkalinity
UV absorbance
Exhibit 2: "Paired TOC-Alkalinity" Dataset Field Names and Definitions
Data Element | Column Name | Description
Public Water System Identification Number (PWSID) | NUMBERO | The code used to identify each PWS. The code begins with the standard 2-character postal state abbreviation or region code; the remaining 7 numbers are unique to each PWS in the state.
Sample Collection Date (Month) | Month | Month (1 through 12).
Sample Collection Date (Year) | Year | Year (2012 through 2019).
Retail Population-served | Population Served | Retail population served by the water system.
Federal Public Water System Type Code | System Type | Water system type according to federal requirements. C = Community water system; NTNC = Non-transient non-community water system.
Source Water Type | Source Water Type | Primary water source for the water system. GU = Ground water Under Direct Influence of Surface Water; GW = Ground Water; GWP = Purchased Ground Water; SW = Surface Water; SWP = Purchased Surface Water.
Facility Identification Code | Water Facility ID | Unique identifier for each water system facility.
State Facility Identification Code | State Facility ID | Identifier for each water system facility that is unique within a particular state.
State Assigned Identification Code | State Assigned ID | A state-assigned value which identifies the water system facility.
Raw water TOC average concentration | Avg Of Raw TOC (mg/L) | Monthly average (in mg/L) total organic carbon (TOC) concentration in raw water.
Raw water alkalinity average concentration | Avg Of Raw Alkalinity (mg/L) | Monthly average (in mg/L) alkalinity concentration in raw water.
Finished water TOC average concentration | Avg Of Finished TOC (mg/L) | Monthly average (in mg/L) total organic carbon (TOC) concentration in finished water.
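The PWSID structure described in Exhibit 2 (a 2-character state abbreviation or region code followed by 7 numbers) can be checked before records are grouped by system. The Python sketch below is illustrative only. It assumes the 2-character prefix may contain letters or digits, since region codes are numeric; the function and pattern names are hypothetical.

```python
import re

# Two-character prefix (letters or digits) followed by exactly seven digits.
PWSID_PATTERN = re.compile(r"^([A-Z0-9]{2})(\d{7})$")


def split_pwsid(pwsid):
    """Split a PWSID into (prefix, system number); return None if malformed."""
    match = PWSID_PATTERN.match(pwsid.strip().upper())
    if match is None:
        return None
    return match.group(1), match.group(2)


print(split_pwsid("IA2038038"))  # ('IA', '2038038')
print(split_pwsid("IA20380"))    # None
```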
3C. Disinfection Byproduct Related Parameters Data Records
Exhibit 3 provides a count of states, total number of sample records and systems for Total
Organic Carbon (TOC) (raw and finished), Alkalinity, Paired TOC-Alkalinity, pH, DOC, SUVA, and
UV-absorbance.
Systems are counted separately for raw and finished TOC samples, so systems with
samples in both categories are counted twice. Raw samples are identified as samples taken at
source water sampling points. Records were marked as raw if SOURCE_TYPE_CODE = 'RW',
or if SOURCE_TYPE_CODE was NULL but the water system facility type code was 'IG', 'IN',
'RS', 'SP', 'WL', or 'CC'. Records were marked as finished if SOURCE_TYPE_CODE = 'FN',
or if SOURCE_TYPE_CODE was NULL but the water system facility type code was 'CW', 'DS', 'PF',
'ST', 'TM', or 'TP'.
Note that within the "Full" TOC text file, raw/finished designations are not assigned. However,
within the Paired TOC-alkalinity reduced dataset, raw and finished designations are
assigned.
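The raw/finished marking rule above can be written as a small function. The Python sketch below is one reading of that rule, not EPA's actual processing code. It assumes a NULL source type code arrives as None or an empty string after import; the function and constant names are hypothetical.

```python
RAW_FACILITY_TYPES = {"IG", "IN", "RS", "SP", "WL", "CC"}
FINISHED_FACILITY_TYPES = {"CW", "DS", "PF", "ST", "TM", "TP"}


def classify_toc_record(source_type_code, water_facility_type):
    """Return 'raw', 'finished', or None when the rule assigns no designation."""
    code = (source_type_code or "").strip().upper()
    facility = (water_facility_type or "").strip().upper()
    if code == "RW":
        return "raw"
    if code == "FN":
        return "finished"
    if code == "":
        # Source type code is NULL: fall back to the facility type code.
        if facility in RAW_FACILITY_TYPES:
            return "raw"
        if facility in FINISHED_FACILITY_TYPES:
            return "finished"
    return None


print(classify_toc_record("RW", "TP"))  # raw
print(classify_toc_record(None, "TP"))  # finished
print(classify_toc_record("X", "TP"))   # None
```

An explicit source type code takes precedence over the facility type, so a record coded 'RW' at a treatment plant is still marked raw.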
Exhibit 3: Number of TOC, Alkalinity, pH, DOC, SUVA, and UV-absorbance Data
Records and Zip Filename(s)
Contaminant | Analyte ID | Number of States/Entities with Data | Total Number of Sample Records | Total Number of Systems | Zip Filename
Disinfection Byproduct Related Parameters - Full Datasets
TOTAL ORGANIC CARBON (TOC) | 2920 | 49 | 440,197 | 3,156 | SYR4_DBP_Related Parameters.zip
RAW TOC | 2920 | 42 | 188,358 | 2,494 | SYR4_DBP_Related Parameters.zip
FINISHED TOC | 2920 | 38 | 155,558 | 1,999 | SYR4_DBP_Related Parameters.zip
ALKALINITY | 1927 | 51 | 429,397 | 18,140 | SYR4_DBP_Related Parameters.zip
PH | 1925 | 52 | 632,821 | 28,660 | SYR4_DBP_Related Parameters.zip
SUVA | 2923 | 2 | 8,026 | 59 | SYR4_DBP_Related Parameters.zip
UV-absorbance | 2922 | 3 | 6,061 | 60 | SYR4_DBP_Related Parameters.zip
DOC | 2919 | 3 | 5,908 | 76 | SYR4_DBP_Related Parameters.zip
Disinfection Byproduct Related Parameters - Reduced Dataset
Paired TOC-alkalinity record [1] | N/A | 33 | 92,666 | 1,192 | SYR4_DBP_Related Parameters.zip
[1] The "paired" TOC-alkalinity dataset includes average monthly concentrations of TOC and alkalinity in source (raw) water
paired with the corresponding average finished water concentrations of TOC.
Section 4: Microbial Contaminants, Microbial Related Parameters, and
Associated Disinfectant Residuals
4A. Description of Data Files for Microbial Contaminants, Microbial Related Parameters,
and Associated Disinfectant Residuals Data
Data for three microbial contaminants (total coliforms (TC), Escherichia coli (EC), and fecal
coliform (FC)) were collected from 2012 to 2019 for SYR4. The total coliform datasets are
separated into individual files by year of data collection due to the large volume of data
collected on TC.
Reduced datasets were created to pair microbial data (TC, EC, FC) with the associated disinfectant
residual for disinfecting systems. Disinfectant residual results are shown as free residual chlorine
and total chlorine in these reduced datasets. These disinfectant residual data were collected on
the same date and at the same location as the microbial parameters. Additional data for disinfectant
residual include datasets for chlorine and chloramine; those data were not reported as being collected on
the same date and location as the microbial parameters.
Note that the TC/EC/FC data files contain the monitoring records under the Total Coliform
Rule/Revised Total Coliform Rule for systems with all source water types. The HPC,
disinfectants, disinfectant residuals, and paired microbes/disinfectant residuals files contain the
monitoring records under the SWTRs for surface water systems.
4B. Data Files Posted for Microbial Contaminants, Microbial Related Parameters, and
Associated Disinfectant Residuals
The following SYR4 ICR data text files are located in their designated zip file at
https://www.epa.gov/dwsixyearreview/microbial-and-disinfection-byproduct-data-files-2012-2019-epas-fourth-six-year
under Microbial Contaminants, Microbial Related Parameters,
Associated Disinfectant Residuals:
SYR4_TC.zip contains individual files for:
Total Coliform_2012
Total Coliform_2013
Total Coliform_2014
Total Coliform_2015
Total Coliform_2016
Total Coliform_2017
Total Coliform_2018
Total Coliform_2019
SYR4_EC_FC_HPC_Giardia.zip contains individual files for:
Escherichia coli (EC)
Fecal coliform (FC)
Giardia Lamblia
Heterotrophic Plate Count (HPC)
SYR4_Disinfectant Residuals.zip contains individual files for:
Chloramines
Chlorine
Chlorine dioxide
Free Residual Chlorine
Residual Chlorine
Total Chlorine
SYR4_Paired Microbes_DR (Disinfectant Residuals).zip contains individual files for:
Paired EC DR
Paired FC DR
Paired TC DR 2012
Paired TC DR 2013
Paired TC DR 2014
Paired TC DR 2015
Paired TC DR 2016
Paired TC DR 2017
Paired TC DR 2018
Paired TC DR 2019
4C. Microbial Contaminants, Microbial Related Parameters, and Associated Disinfectant
Residuals Data Records
Exhibit 4 is a list of data elements included in the TC, EC, FC and Reduced Dataset for Analysis
of Disinfecting Systems with Disinfectant Residual records.
Exhibit 4: Field Names and Descriptions for Paired Microbial Contaminants and
Associated Disinfectant Residuals Datasets
Data Element | Column Name | Description
Presence Indicator Code | PRESENCE_INDICATOR_CODE | Indication of whether results of an analysis were positive or negative for TC, EC and FC. P = Presence; A = Absence.
Residual Field Free Chlorine | RESIDUAL_FIELD_FREE_CHLORINE_MG_L | Amount of free chlorine residual (in mg/L) found in the water after disinfectant has been applied. These concentrations were measured in the field at the same time and location as coliform samples (TC-EC-FC samples).
Residual Field Total Chlorine | RESIDUAL_FIELD_TOTAL_CHLORINE_MG_L | Amount of total chlorine residual (in mg/L) found in the water after disinfectant has been applied. These concentrations were measured in the field at the same time and location as coliform samples (TC-EC-FC samples).
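One way the paired fields in Exhibit 4 might be used is to compare coliform presence against the field-measured residual. The Python sketch below is illustrative only. The 0.2 mg/L cut-off, the function name, and the sample rows are assumptions chosen for demonstration, not values taken from this document. Rows are dictionaries keyed by the Exhibit 4 column names, with values held as the strings a text import would yield.

```python
def presence_rate_by_residual(rows, threshold_mg_l=0.2):
    """Fraction of 'P' results below and at/above a free chlorine threshold."""
    counts = {"below": [0, 0], "at_or_above": [0, 0]}  # [positives, total]
    for row in rows:
        raw = (row.get("RESIDUAL_FIELD_FREE_CHLORINE_MG_L") or "").strip()
        if raw == "":
            continue  # no paired residual reported for this sample
        try:
            residual = float(raw)
        except ValueError:
            continue  # skip values that are not numeric
        group = "below" if residual < threshold_mg_l else "at_or_above"
        counts[group][1] += 1
        if row.get("PRESENCE_INDICATOR_CODE") == "P":
            counts[group][0] += 1
    return {
        group: (positives / total if total else None)
        for group, (positives, total) in counts.items()
    }


# Invented sample rows for demonstration.
sample_rows = [
    {"PRESENCE_INDICATOR_CODE": "P", "RESIDUAL_FIELD_FREE_CHLORINE_MG_L": "0.05"},
    {"PRESENCE_INDICATOR_CODE": "A", "RESIDUAL_FIELD_FREE_CHLORINE_MG_L": "0.10"},
    {"PRESENCE_INDICATOR_CODE": "A", "RESIDUAL_FIELD_FREE_CHLORINE_MG_L": "0.80"},
    {"PRESENCE_INDICATOR_CODE": "A", "RESIDUAL_FIELD_FREE_CHLORINE_MG_L": "1.20"},
    {"PRESENCE_INDICATOR_CODE": "P", "RESIDUAL_FIELD_FREE_CHLORINE_MG_L": ""},
]
print(presence_rate_by_residual(sample_rows))
```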
Exhibit 5 provides a count of states, total number of sample records and systems for TC, EC, FC,
and their associated free and total chlorine residual concentrations for both the full and reduced
datasets.
Exhibit 5: Number of Microbial Contaminants, Microbial Related Parameters, and
Associated Disinfectant Residuals Data Records and Zip Filename(s)
Contaminant | Analyte ID | Number of States/Entities with Data | Total Number of Sample Records | Total Number of Systems | Zip Filename
Microbes and Disinfectants - Full Datasets
TOTAL COLIFORM (2012) | 3100 | 54 | 2,349,687 | 102,423 | SYR4_TC.zip
TOTAL COLIFORM (2013) | 3100 | 54 | 2,398,740 | 102,713 | SYR4_TC.zip
TOTAL COLIFORM (2014) | 3100 | 56 | 2,521,212 | 105,515 | SYR4_TC.zip
TOTAL COLIFORM (2015) | 3100 | 56 | 2,513,937 | 104,532 | SYR4_TC.zip
TOTAL COLIFORM (2016) | 3100 | 57 | 2,656,932 | 113,099 | SYR4_TC.zip
TOTAL COLIFORM (2017) | 3100 | 57 | 2,780,743 | 114,328 | SYR4_TC.zip
TOTAL COLIFORM (2018) | 3100 | 57 | 2,849,385 | 114,954 | SYR4_TC.zip
TOTAL COLIFORM (2019) | 3100 | 57 | 2,675,476 | 111,385 | SYR4_TC.zip
E. COLI (EC) | 3014 | 57 | 7,175,363 | 93,728 | SYR4_EC_FC_HPC_Giardia.zip
FECAL COLIFORM (FC) | 3013 | 40 | 16,818 | 1,835 | SYR4_EC_FC_HPC_Giardia.zip
HETEROTROPHIC BACTERIA (HPC) | 3001 | 16 | 135,081 | 595 | SYR4_EC_FC_HPC_Giardia.zip
GIARDIA LAMBLIA | 3008 | 15 | 4,628 | 229 | SYR4_EC_FC_HPC_Giardia.zip
LEGIONELLA | | 0 | 0 | 0 | N/A
CHLORINE [1] | 0999 | 19 | 6,100,133 | 4,438 | SYR4_Disinfectant Residuals.zip
TOTAL CHLORINE | 1000 | 1 | 125,788 | 741 | SYR4_Disinfectant Residuals.zip
CHLORAMINE [1] | 1006 | 9 | 78,664 | 198 | SYR4_Disinfectant Residuals.zip
RESIDUAL CHLORINE | 1012 | 4 | 179,599 | 572 | SYR4_Disinfectant Residuals.zip
FREE RESIDUAL CHLORINE [1] | 1013 | 3 | 2,000,997 | 4,044 | SYR4_Disinfectant Residuals.zip
CHLORINE DIOXIDE | 1008 | 9 | 12,752 | 28 | SYR4_Disinfectant Residuals.zip
Microbes and Associated Disinfectant Residuals - Reduced Dataset
E. coli (EC) with Associated Disinfectant Residuals | 3014 | 49 | 3,079,032 | 28,091 | SYR4_Paired Microbes_DR.zip
Fecal Coliform (FC) with Associated Disinfectant Residuals | 3013 | 24 | 5,966 | 534 | SYR4_Paired Microbes_DR.zip
Total Coliform (TC) paired with Associated Disinfectant Residuals (2012) | 3100 | 43 | 1,165,209 | 30,950 | SYR4_Paired Microbes_DR.zip
Total Coliform (TC) paired with Associated Disinfectant Residuals (2013) | 3100 | 44 | 1,173,926 | 31,132 | SYR4_Paired Microbes_DR.zip
Total Coliform (TC) paired with Associated Disinfectant Residuals (2014) | 3100 | 46 | 1,218,722 | 31,865 | SYR4_Paired Microbes_DR.zip
Total Coliform (TC) paired with Associated Disinfectant Residuals (2015) | 3100 | 47 | 1,241,995 | 31,880 | SYR4_Paired Microbes_DR.zip
Total Coliform (TC) paired with Associated Disinfectant Residuals (2016) | 3100 | 48 | 1,274,211 | 34,654 | SYR4_Paired Microbes_DR.zip
Total Coliform (TC) paired with Associated Disinfectant Residuals (2017) | 3100 | 50 | 1,331,868 | 37,217 | SYR4_Paired Microbes_DR.zip
Total Coliform (TC) paired with Associated Disinfectant Residuals (2018) | 3100 | 50 | 1,480,354 | 41,053 | SYR4_Paired Microbes_DR.zip
Total Coliform (TC) paired with Associated Disinfectant Residuals (2019) | 3100 | 50 | 1,498,050 | 38,029 | SYR4_Paired Microbes_DR.zip
[1] Reported independently of the coliform sample results.
Section 5: Instructions on Importing Microbial and Disinfection
Byproduct Datasets
These text files are tab delimited and have no text qualifier. Field names are included in the first
row of each file. The data are available for download for each parameter and should be imported
into a data management system that supports large datasets for analysis.
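For readers who script the import, the file layout described above (tab delimited, no text qualifier, field names in the first row) maps directly onto Python's standard csv module. The sketch below is a minimal illustration rather than an EPA-provided tool; the function name and the in-memory sample are hypothetical.

```python
import csv
import io


def iter_syr4_rows(handle):
    """Yield one dict per record from a tab-delimited file with a header row."""
    # QUOTE_NONE treats quotation marks as ordinary characters, matching
    # a file that has no text qualifier.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    yield from reader


# Small in-memory stand-in for a downloaded text file.
sample = io.StringIO(
    "ANALYTE_CODE\tANALYTE_NAME\tPWSID\n"
    "1009\tCHLORITE\tIA2038038\n"
)
rows = list(iter_syr4_rows(sample))
print(rows[0]["ANALYTE_NAME"])  # CHLORITE
```

With a real file, the handle would come from something like open(path, newline="", encoding="cp1252"). Both the path and the cp1252 encoding are assumptions to verify against the downloaded file. Because the function yields rows one at a time, large files do not need to fit in memory.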
5A: Downloading Data Files (Note that instructions may vary depending on the version and
software used to import data.)
1. Begin by reviewing the Data Field Names and Definitions (Section 6- Data Dictionary for
the SYR4 ICR Database).
2. Access the SYR4 MDBP data files by going to
https://www.epa.gov/dwsixyearreview/six-year-review-4-microbial-and-disinfection-byproduct-data-
3. Click on the desired zip file and select "Save As" to save the file to your computer.
4. Navigate to the location on your computer where you saved the zip file and extract the
zip file contents by clicking "Open with" and using WinZip or similar file compression
software.
5B: Importing Data into Microsoft Excel
Using Microsoft Excel 2013 or a newer version is recommended due to the size of the dataset(s).
Note that the following MDBP data files are too large to import into Microsoft Excel: TTHM, HAA,
Free Residual Chlorine, Total Chlorine, all TC files, EC, and all Paired Microbes and
Disinfectant Residual files.
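Before attempting an Excel import of a file not listed above, its line count can be compared with the worksheet row limit of 1,048,576 rows in Excel 2007 and later. The Python sketch below is a rough pre-check under that single assumption; the function name is hypothetical, and other factors such as available memory can also prevent a file from loading.

```python
import io

EXCEL_MAX_ROWS = 1_048_576  # worksheet row limit in Excel 2007 and later


def fits_in_excel(handle):
    """Return True if every line, header included, fits on one worksheet."""
    line_count = sum(1 for _ in handle)
    return line_count <= EXCEL_MAX_ROWS


small_file = io.StringIO("PWSID\tVALUE\nIA2038038\t0.25\n")
print(fits_in_excel(small_file))  # True
```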
5. Open a blank workbook in Microsoft Excel.
6. In the workbook, select Data among the tabs at the top of the page.
7. On the far left, top of the screen, go to the Get External Data section and select From
Text.
8. You will be prompted to select a text file. Locate the text files you extracted in Step 4,
and click "Import" on the text file of interest.
9. A preview of the file text converted to a table will appear. At the top, verify that File
Origin (depending on your computer's operating system) displays "10000: Western
European (Mac)" or "1252: Western European (Windows)." Select "Tab" as the
Delimiter and "Based on first 200 rows" as the Data Type Detection. Click Load To...
10. In the next window, choose "Table" under Select how you want to view the data in your
workbook. Select "Existing worksheet" for where to put the data and verify the table's
origin cell displays as "=$A$1." Click OK.
11. A "Queries & Connections" window will appear on the right of the screen as Excel
generates the new table. This step may take several minutes.
12. Save the Excel spreadsheet file once the table generation is complete.
5C: Importing Data into R
1. Open a blank R script.
2. Using the function read.delim(), import the text file using the following format:
a. [analyte name] <- read.delim(file = [filepath], header = TRUE)
Example: bromoform <- read.delim(file = "C:/Users/[username]/Desktop/SYR4-Microbes/SUMMARYMDBPS BROMOFORM.txt", header = TRUE)
3. Check the data frame that is generated to ensure correct formatting.
4. NOTE: data columns that should be in date format will be imported as character type. To
fix, include the following line in the R code, replacing df with the name of the data frame and
DATE with the name of the column containing date information:
df$DATE <- as.Date(df$DATE, format = "%d-%b-%y")
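The same date conversion can be sketched in Python for readers not working in R. The format string "%d-%b-%y" is taken from the R instruction above; the function name and the sample value are hypothetical. Month abbreviations are matched using the active locale, so the sketch assumes an English locale.

```python
from datetime import datetime


def parse_sample_date(text):
    """Convert a day-month-year string such as '15-Mar-17' to a date."""
    return datetime.strptime(text.strip(), "%d-%b-%y").date()


print(parse_sample_date("15-Mar-17"))  # 2017-03-15
```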
5D: Importing Data into Microsoft Access
1. Open a blank database in Microsoft Access.
2. In the database, select External Data among the tabs at the top of the page.
3. On the far left, top of the screen, go to the New Data Source dropdown and select From
File > Text File.
4. You will be prompted to select a text file. Locate the text files you extracted in Step 4 of
Section 5A, then choose one of the following options: "Import the source data into a new table in
the current database" or "Link to the data source by creating a linked table". You can choose either
method, but note that linking the file will maintain a smaller database size. Click OK.
[Screenshot: "Get External Data - Text File" dialog, showing the file name box and the options to import into a new table, append to an existing table, or link to the data source.]
5. The Link (or Import) Text Wizard will appear. The default settings will be displayed and
should have Delimited selected as the data format. Select Next>.
[Screenshot: Link Text Wizard with "Delimited" selected as the data format and a preview of sample data from the file.]
6. Default settings will display next and should have "Tab" selected as the delimiter. Select
the checkmark box next to "First Row Contains Field Names." Next, click
"Advanced...".
[Screenshot: Link Text Wizard delimiter screen with "Tab" selected, "First Row Contains Field Names" checked, and a preview of the parsed columns.]
7. The Link (or Import) Specification window will appear. In the Dates, Times, and
Numbers section, set the Date Order value to "DMY."
[Screenshot: Link Specification window, showing the File Format, Field Delimiter, and "Dates, Times, and Numbers" settings, including the Date Order dropdown, and the Field Information list.]
8. On the screen that follows, keep the default settings shown below and click Next>.
[Screenshot: wizard screen for specifying field options (Field Name, Data Type, Indexed), with a preview of the parsed columns.]
If you are importing instead of linking, a window will pop up related to setting a primary
key. The default is set to "Let Access add primary key". Check "No primary key" and
click Next>.
[Screenshot: Import Text Wizard primary key screen with "No primary key" selected and a preview of the imported fields.]
9. A final screen will appear. Enter a meaningful name for the linked/imported table. This
field will be auto-populated with the name of the linked file. Click Finish.
[Screenshot: final Link Text Wizard screen with the Linked Table Name box.]
Part Two: Filtering and Formatting Data in Excel
10. To efficiently search, have cell A1 selected, choose "Data" among the tabs on the top of
the page and click on "Filter." Each header title for each column now will have a small
dropdown arrow displayed.
11. Filtering the data:
a. If you want to look for a specific public water system, click the
dropdown arrow for "PWSID" or "System Name." Within the search field, type the name
and select from the displayed list.
b. If you want to search for a different public water
system, click the dropdown arrow and "Clear Filter from PWSID" or "Clear Filter from
System Name."
c. If you want to filter the data by contaminant, select "Analyte Name."
12. Multiple filters can be applied, for example to look for an individual water
system's data for a specific contaminant of interest.
13. De-select Filter in the top menu bar and the entire database will again be displayed.
14. Note that all columns are imported with the default General formatting. Column formats
must be changed individually and manually in Excel after the download is complete to aid in
data analysis. On the Home tab in Excel, highlight the column and select the format
from the drop-down menu. Suggested formats are:
Text fields:
  Analyte Name
  State Code
  PWSID
  System Name
  System Type
  Source Water Type
  Water Facility Type
  Sampling Point Type
  Source Type Code
  Sample Type Code
  Laboratory Assigned ID
  Sample Collection Date
  Detection Limit Unit
  Detection Limit Code
  Value Unit
  Presence Indicator Code
Numeric fields:
  Analyte ID
  Retail Population Served
  Adjusted Total Population Served
  Water Facility ID
  Sampling Point ID
  Six Year ID
  Sample ID
  Detection Limit Value
  Detect
  Value
  Residual Field Free Chlorine mg/L
  Residual Field Total Chlorine mg/L
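The filtering and formatting steps above have a scripted counterpart. The Python sketch below filters rows by PWSID and analyte name and converts a subset of the suggested numeric fields from text to numbers. It is an illustration, not part of the SYR4 instructions: the function names, the choice of fields, and the sample rows are hypothetical, and the column names are those given in the Section 6 data dictionary.

```python
# Subset of the numeric fields suggested above, by data dictionary column name.
NUMERIC_FIELDS = {"RETAIL_POPULATION_SERVED", "VALUE"}


def to_number(text):
    """Convert imported text to a float; blank cells become None."""
    text = (text or "").strip()
    if text == "":
        return None
    return float(text)


def filter_rows(rows, pwsid=None, analyte_name=None):
    """Keep rows matching the given PWSID and/or analyte name, with typed values."""
    selected = []
    for row in rows:
        if pwsid is not None and row["PWSID"] != pwsid:
            continue
        if analyte_name is not None and row["ANALYTE_NAME"].upper() != analyte_name.upper():
            continue
        typed = dict(row)
        for field in NUMERIC_FIELDS:
            if field in typed:
                typed[field] = to_number(typed[field])
        selected.append(typed)
    return selected


# Invented sample rows for demonstration.
sample_rows = [
    {"PWSID": "IA2038038", "ANALYTE_NAME": "CHLORITE", "VALUE": "0.25"},
    {"PWSID": "RI1592010", "ANALYTE_NAME": "CHLORITE", "VALUE": ""},
]
result = filter_rows(sample_rows, pwsid="IA2038038", analyte_name="chlorite")
print(result[0]["VALUE"])  # 0.25
```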
Section 6: Data Dictionary for the SYR4 ICR Database
Exhibit 6 below contains a list of the data elements, column names and a brief description of the
data for each data element included in each of the SYR4 ICR data text files.
Exhibit 6: Six-Year 4 Data Field Names and Definitions
Column Name | Data Element | Description
ANALYTE_CODE | Contaminant Identification Code | 4-digit Safe Drinking Water Information System (SDWIS) contaminant identification number for which the sample is being analyzed.
ANALYTE_NAME | Contaminant Name | Common name of contaminant for which the sample is being analyzed.
STATE_CODE | State Code | 2-digit state code. Note that the state code "IM" refers to non-community water system data from the State of Illinois.
PWSID | Public Water System Identification Number (PWSID) | The code used to identify each PWS. The code begins with the standard 2-character postal state abbreviation or region code; the remaining 7 numbers are unique to each PWS in the state.
SYSTEM_NAME | System Name | Name of the PWS.
SYSTEM_TYPE | Federal Public Water System Type Code | A code to identify whether a system is: Community Water System (C); Non-Transient Non-Community Water System (NTNC); or Transient Non-Community Water System (NC).
RETAIL_POPULATION_SERVED | Retail Population served | Retail population served by a system.
ADJUSTED_TOTAL_POPULATION_SERVED [1] | Adjusted Total Population-served | Total population served by a system, adjusted to reduce double-counting of population served by purchasing water systems.
SOURCE_WATER_TYPE | Source Water Type | Type of water at the source. Source water type can be: Ground water (GW); Surface water (SW); Purchased Surface Water (SWP); Purchased Ground Water (GWP); Ground Water Under Direct Influence of Surface Water (GU); or Purchased Ground Water Under Direct Influence of Surface Water (GUP).
WATER_FACILITY_ID | Facility Identification Code | A unique identifier for each water system facility.
WATER_FACILITY_TYPE | Water Facility Type | Type of water system facility: CC = Consecutive Connection; CH = Common Headers; CW = Clear Well; DS = Distribution System; IG = Infiltration Gallery; IN = Intake; OT = Other; PC = Pressure Control; PF = Pumping Facility; RS = Reservoir; SI = Surface Impoundment; SP = Spring; SS = Sampling Station; ST = Storage; TM = Transmission Main (Manifold); TP = Treatment Plant; WH = Well Head; WL = Well; or XX = unknown.
SAMPLING_POINT_ID | Sampling Point Identification Code | A unique identifier for each sampling point location.
SAMPLING_POINT_TYPE | Sampling Point Type | Location type of a sampling point: DS = Distribution System; EP = Entry point; FC = First Customer; FN = Finished Water Source; LD = Lowest Disinfectant Residual; MD = Midpoint in the Distribution System; MR = Point of Maximum Residence; PC = Process Control; RW = Raw Water Source; SR = Source Water Point; UP = Unit Process; or WS = Water System Facility Point.
SOURCE_TYPE_CODE | Source Type Code | Type of water source, based on whether treatment has taken place. Source type can be: Finished (FN); Raw (RW); or Unknown (null or X).
SAMPLE_TYPE_CODE | Sample Type Code | Type of sample: CO = Confirmation; MR = Maximum Residence Time; RP = Repeat; RT = Routine; ST = Split; MS = Matrix spike; TG = Triggered; or FB = Field Blank.
LABORATORY_ASSIGNED_ID | Laboratory Assigned Identification Number | Unique lab identification, used to link up the total coliform positive (TC+) and E. coli/fecal coliform samples.
SIX_YEAR_ID | Six Year ID | Unique identifier for each analytical result.
SAMPLE_ID | Sample Identification Number | Identifier assigned by state or the laboratory that uniquely identifies a sample.
SAMPLE_COLLECTION_DATE | Sample Collection Date | Date the sample was collected, including month, day, and year.
DETECTION_LIMIT_VALUE | Detection Limit Value | Limit below which the specific lab indicated they could not reliably measure results for a contaminant with the methods and procedures used by the lab.
DETECTION_LIMIT_UNIT | Detection Limit Unit | Units of the detection limit value.
DETECTION_LIMIT_CODE | Detection Limit Code | Indicates the type of Detection Limit reported in the Detection Limit Value column (e.g., the Minimum Reporting Level, Laboratory Reporting Level, etc.).
DETECT | Sample Analytical Result - Sign | The sign indicates whether the sample analytical result was: (0) "less than" means the contaminant was not detected or was detected at a level "less than" the MRL. (1) "equal to" means the contaminant was detected at a level "equal to" the value reported in "Sample Analytical Result - Value."
VALUE | Sample Analytical Result - Value | For detections, this field is equal to the actual numeric (decimal) value of the analysis for the chemical result; for non-detections, this field is blank.
UNIT | Sample Analytical Result - Unit of Measure | Unit of measurement for the analytical results reported (usually expressed in either µg/L or mg/L for chemicals; or pCi/L for radionuclides).
PRESENCE_INDICATOR_CODE | Presence Indicator Code | Indication of whether results of an analysis were positive or negative for TC, EC and FC. P = Presence; A = Absence.
RESIDUAL_FIELD_FREE_CHLORINE_MG_L | Residual Field Free Chlorine | Amount of free chlorine residual (in mg/L) found in the water after disinfectant has been applied. These concentrations were measured in the field at the same time and location as coliform samples (TC-EC-FC samples).
RESIDUAL_FIELD_TOTAL_CHLORINE_MG_L | Residual Field Total Chlorine | Amount of total chlorine residual (in mg/L) found in the water after disinfectant has been applied. These concentrations were measured in the field at the same time and location as coliform samples (TC-EC-FC samples).
[1] Information for total population was not received. This value was generated for wholesale systems using buyer-seller
relationships and calculating the adjusted total population served.
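Because VALUE is blank for non-detections, analyses need to read the DETECT and VALUE fields together. The Python sketch below shows one way to tally detections from rows keyed by the data dictionary column names. It is an illustration only: the function name and the sample rows are hypothetical, and rows with any DETECT flag other than 0 or 1 are simply left out of the tally.

```python
def summarize_detections(rows):
    """Count detections and non-detections using the DETECT and VALUE fields."""
    detected = []
    non_detects = 0
    for row in rows:
        flag = str(row.get("DETECT", "")).strip()
        if flag == "1":
            detected.append(float(row["VALUE"]))  # VALUE is populated for detections
        elif flag == "0":
            non_detects += 1  # VALUE is blank for non-detections
        # Any other flag is left out of the tally.
    total = len(detected) + non_detects
    return {
        "records": total,
        "non_detects": non_detects,
        "detection_rate": (len(detected) / total) if total else None,
        "max_detected": max(detected) if detected else None,
    }


# Invented sample rows for demonstration.
sample_rows = [
    {"DETECT": "1", "VALUE": "0.012"},
    {"DETECT": "0", "VALUE": ""},
    {"DETECT": "1", "VALUE": "0.030"},
    {"DETECT": "0", "VALUE": ""},
]
print(summarize_detections(sample_rows))
```

How non-detections should be treated in any further statistics (for example, substitution at the detection limit) is an analytical choice this sketch does not make.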
Appendix C: Six-Year Review 4 Microbial and Disinfection
Byproduct Data Records by State
Appendix C contains exhibits with the number of Six-Year 4 Microbial and Disinfection
Byproducts (MDBP) data records by category by state. The following is a list of the exhibits:
Exhibit C-1: Number of Microbial Contaminants (Total Coliform, E. coli, Fecal Coliform, Giardia
lamblia) Data Records by State
Exhibit C-2: Number of Total Trihalomethanes (TTHM) Data Records by State
Exhibit C-3: Number of Haloacetic acids (HAAs) Data Records by State
Exhibit C-4: Number of Chlorite and Bromate Data Records by State
Exhibit C-5: Number of Disinfection Byproduct Related Parameters Data Records by State
Exhibit C-1: Number of Microbial Contaminants (Total Coliform, E. coli, Fecal
Coliform, Giardia lamblia) Data Records by State
State | Total Coliform | E. coli | Fecal Coliform | Giardia lamblia
Alaska | 103,898 | 65,414 | 2,823 | 0
Alabama | 284,580 | 90,650 | 6 | 60
Arkansas | 394,314 | 6,089 | 0 | 0
American Samoa | 13,186 | 13,184 | 0 | 0
Arizona | 219,468 | 42,862 | 26 | 0
California | 0 | 0 | 0 | 0
Colorado | 352,349 | 204,889 | 24 | 0
Connecticut | 382,725 | 219,854 | 14 | 23
Washington, D.C. | 13,693 | 9,648 | 0 | 0
Delaware | 70,366 | 13,042 | 3 | 0
Florida | 2,342,672 | 350 | 21 | 0
Hawaii | 16,035 | 13,593 | 13 | 0
Iowa | 425,813 | 207,287 | 3 | 0
Idaho | 193,935 | 14,451 | 3 | 7
Illinois | 1,526,019 | 651,044 | 235 | 0
Indiana | 398,481 | 13,702 | 0 | 0
Kansas | 279,741 | 208,962 | 11 | 926
Kentucky | 427,911 | 1,949 | 0 | 0
Louisiana | 179,619 | 147,417 | 14 | 0
Massachusetts | 0 | 0 | 0 | 0
Maryland | 60,832 | 34,081 | 1,092 | 0
Maine | 145,575 | 77,758 | 2 | 56
Minnesota | 225,927 | 15,141 | 12 | 398
Missouri | 601,095 | 282,873 | 1 | 0
Northern Mariana Islands | 13,364 | 12,020 | 0 | 0
Montana | 260,675 | 216,652 | 4,942 | 0
North Carolina | 926,048 | 628,350 | 4 | 0
North Dakota | 95,674 | 936 | 1 | 0
Nebraska | 218,891 | 153,908 | 0 | 0
New Hampshire | 155,791 | 156,191 | 0 | 0
New Jersey | 935,126 | 22,684 | 64 | 785
Navajo Nation | 7,447 | 6,789 | 0 | 0
Nevada | 81,129 | 13,499 | 0 | 256
New York | 541,960 | 88,232 | 438 | 153
Ohio | 1,022,164 | 112,768 | 112 | 0
Oklahoma | 398,661 | 236,786 | 0 | 0
Oregon | 477,951 | 16,078 | 1 | 0
Pennsylvania | 854,438 | 246,817 | 730 | 0
Rhode Island | 61,041 | 44,878 | 1,792 | 1
South Carolina | 9,563 | 7,510 | 2 | 0
South Dakota | 117,852 | 66,507 | 0 | 0
Tennessee | 91,984 | 84 | 1,449 | 0
Texas | 2,637,545 | 1,359,122 | 1,132 | 0
Utah | 297,343 | 92,252 | 10 | 4
Virginia | 703,226 | 343,357 | 150 | 8
Vermont | 126,345 | 106,484 | 1 | 192
Washington | 949,429 | 224,822 | 191 | 0
Wisconsin | 693,211 | 545,150 | 0 | 0
West Virginia | 187,869 | 4,082 | 11 | 1,689
Wyoming | 108,011 | 87,686 | 1,409 | 0
Region 1 - Tribes | 2,722 | 2,708 | 0 | 0
Region 2 - Tribes | 912 | 84 | 0 | 0
Region 4 - Tribes | 3,591 | 57 | 3 | 70
Region 5 - Tribes | 19,648 | 145 | 1 | 0
Region 6 - Tribes | 21,655 | 10,140 | 47 | 0
Region 7 - Tribes | 2,468 | 2,237 | 0 | 0
Region 8 - Tribes | 21,291 | 13,740 | 24 | 0
Region 9 - Tribes | 21,764 | 17,844 | 0 | 0
Region 10 - Tribes | 21,089 | 524 | 1 | 0
Exhibit C-2: Number of Total Trihalomethanes (TTHM) Data Records by State

| State | TTHM | Chloroform | Bromoform | Bromodichloromethane | Dibromochloromethane |
|---|---:|---:|---:|---:|---:|
| Alaska | 4,546 | 4,557 | 4,548 | 4,559 | 4,558 |
| Alabama | 41,159 | 5,361 | 5,392 | 5,371 | 5,377 |
| Arkansas | 21,380 | 25,444 | 25,446 | 25,446 | 28,031 |
| American Samoa | 161 | 0 | 0 | 0 | 0 |
| Arizona | 15,050 | 545 | 548 | 549 | 552 |
| California | 143,888 | 149,737 | 150,126 | 150,620 | 150,591 |
| Colorado | 24,986 | 11,810 | 11,812 | 11,816 | 11,815 |
| Connecticut | 11,128 | 20,566 | 20,567 | 20,567 | 20,565 |
| Washington, D.C. | 240 | 7 | 7 | 7 | 7 |
| Delaware | 3,031 | 5,389 | 5,386 | 5,389 | 5,389 |
| Florida | 48,865 | 0 | 0 | 0 | 0 |
| Hawaii | 2,732 | 2,649 | 2,674 | 2,671 | 2,670 |
| Iowa | 14,736 | 14,708 | 14,708 | 14,708 | 14,708 |
| Idaho | 4,822 | 366 | 357 | 358 | 364 |
| Illinois | 43,203 | 42,414 | 42,430 | 42,444 | 42,439 |
| Indiana | 19,042 | 2,867 | 2,871 | 2,870 | 2,871 |
| Kansas | 15,283 | 13,449 | 13,439 | 13,449 | 13,449 |
| Kentucky | 26,111 | 0 | 0 | 0 | 0 |
| Louisiana | 35,015 | 35,257 | 35,267 | 35,257 | 35,261 |
| Massachusetts | 22,494 | 15,614 | 15,540 | 15,621 | 15,586 |
| Maryland | 12,715 | 9,838 | 9,679 | 9,782 | 9,699 |
| Maine | 4,588 | 4,031 | 4,020 | 4,024 | 4,020 |
| Minnesota | 0 | 17,244 | 16,988 | 17,159 | 17,098 |
| Missouri | 20,303 | 26,742 | 26,743 | 26,743 | 26,743 |
| Northern Mariana Islands | 245 | 0 | 0 | 0 | 0 |
| Montana | 7,503 | 7,764 | 7,765 | 7,763 | 7,763 |
| North Carolina | 44,268 | 35,821 | 35,862 | 35,797 | 35,784 |
| North Dakota | 4,170 | 4,163 | 4,164 | 4,164 | 4,164 |
| Nebraska | 7,256 | 7,260 | 7,260 | 7,260 | 7,260 |
| New Hampshire | 6,394 | 9,776 | 9,766 | 9,774 | 9,774 |
| New Jersey | 32,013 | 46,887 | 47,020 | 46,925 | 47,230 |
| Navajo Nation | 1,369 | 0 | 0 | 0 | 0 |
| Nevada | 6,176 | 6,853 | 6,852 | 6,850 | 6,853 |
| New York | 48,574 | 44,873 | 44,696 | 44,777 | 44,732 |
| Ohio | 42,844 | 46,461 | 46,298 | 46,219 | 46,333 |
| Oklahoma | 30,421 | 30,611 | 30,614 | 30,615 | 30,616 |
| Oregon | 13,218 | 0 | 0 | 0 | 0 |
| Pennsylvania | 49,995 | 48,859 | 46,398 | 47,023 | 47,054 |
| Rhode Island | 3,175 | 1,477 | 1,477 | 1,475 | 1,477 |
| South Carolina | 19,816 | 19,818 | 19,818 | 19,817 | 19,815 |
| South Dakota | 4,095 | 0 | 0 | 0 | 0 |
| Tennessee | 22,006 | 0 | 0 | 0 | 0 |
| Texas | 113,625 | 154,480 | 154,480 | 154,479 | 154,480 |
| Utah | 9,277 | 7,852 | 7,840 | 7,826 | 7,796 |
| Virginia | 27,661 | 28,375 | 27,769 | 28,335 | 28,175 |
| Vermont | 4,173 | 6,810 | 6,811 | 6,811 | 6,812 |
| Washington | 21,349 | 29,032 | 28,368 | 26,916 | 27,996 |
| Wisconsin | 9,976 | 16,404 | 15,506 | 16,223 | 16,046 |
| West Virginia | 13,049 | 13,028 | 13,026 | 13,022 | 13,022 |
| Wyoming | 4,803 | 3,293 | 3,287 | 3,293 | 3,293 |
| Region 1 - Tribes | 259 | 0 | 0 | 0 | 0 |
| Region 2 - Tribes | 62 | 0 | 0 | 0 | 0 |
| Region 4 - Tribes | 0 | 0 | 0 | 0 | 0 |
| Region 5 - Tribes | 543 | 0 | 0 | 0 | 0 |
| Region 6 - Tribes | 828 | 828 | 828 | 827 | 828 |
| Region 7 - Tribes | 137 | 71 | 72 | 72 | 73 |
| Region 8 - Tribes | 1,573 | 661 | 660 | 661 | 662 |
| Region 9 - Tribes | 2,243 | 0 | 0 | 0 | 0 |
| Region 10 - Tribes | 983 | 1,237 | 1,227 | 1,227 | 1,228 |

Exhibit C-3: Number of Haloacetic Acids (HAAs) Data Records by State

| State | HAA5 | Monochloroacetic Acid | Dichloroacetic Acid | Trichloroacetic Acid | Monobromoacetic Acid | Dibromoacetic Acid |
|---|---:|---:|---:|---:|---:|---:|
| Alaska | 4,222 | 4,205 | 4,207 | 4,202 | 4,197 | 4,205 |
| Alabama | 41,186 | 0 | 0 | 0 | 0 | 0 |
| Arkansas | 21,435 | 21,445 | 21,442 | 21,439 | 21,442 | 21,442 |
| American Samoa | 158 | 0 | 0 | 0 | 0 | 0 |
| Arizona | 14,956 | 518 | 517 | 517 | 517 | 526 |
| California | 86,262 | 83,511 | 84,239 | 84,067 | 83,471 | 84,002 |
| Colorado | 23,814 | 9,290 | 9,290 | 9,413 | 9,290 | 9,290 |
| Connecticut | 10,777 | 8,925 | 8,925 | 8,925 | 8,924 | 8,905 |
| Washington, D.C. | 241 | 4 | 4 | 3 | 4 | 4 |
| Delaware | 2,981 | 3,014 | 3,016 | 3,016 | 3,013 | 3,014 |
| Florida | 48,591 | 0 | 0 | 0 | 0 | 0 |
| Hawaii | 2,223 | 2,144 | 2,161 | 2,160 | 2,162 | 2,164 |
| Iowa | 14,730 | 14,704 | 14,703 | 14,703 | 14,704 | 14,704 |
| Idaho | 4,039 | 164 | 165 | 164 | 164 | 167 |
| Illinois | 43,147 | 42,393 | 42,393 | 42,360 | 42,393 | 42,392 |
| Indiana | 19,024 | 0 | 0 | 0 | 0 | 0 |
| Kansas | 15,225 | 13,410 | 13,416 | 13,413 | 13,416 | 13,413 |
| Kentucky | 26,113 | 0 | 0 | 0 | 0 | 0 |
| Louisiana | 34,991 | 35,004 | 34,999 | 34,993 | 35,011 | 35,001 |
| Massachusetts | 21,448 | 15,525 | 15,558 | 15,545 | 15,495 | 15,485 |
| Maryland | 12,645 | 6,196 | 6,138 | 6,163 | 6,149 | 6,166 |
| Maine | 4,097 | 2,497 | 2,499 | 2,499 | 2,497 | 2,496 |
| Minnesota | 0 | 11,390 | 11,501 | 11,469 | 11,385 | 11,415 |
| Missouri | 20,221 | 19,896 | 19,896 | 19,896 | 19,896 | 19,896 |
| Northern Mariana Islands | 209 | 0 | 0 | 0 | 0 | 0 |
| Montana | 3,824 | 3,809 | 3,807 | 3,805 | 3,809 | 3,810 |
| North Carolina | 44,217 | 35,794 | 35,720 | 35,721 | 35,802 | 35,783 |
| North Dakota | 4,161 | 4,155 | 4,155 | 4,155 | 4,155 | 4,155 |
| Nebraska | 2,903 | 2,903 | 2,903 | 2,903 | 2,903 | 2,903 |
| New Hampshire | 3,576 | 3,501 | 3,498 | 3,497 | 3,501 | 3,501 |
| New Jersey | 31,995 | 32,005 | 32,004 | 32,003 | 32,003 | 32,005 |
| Navajo Nation | 1,360 | 0 | 0 | 0 | 0 | 0 |
| Nevada | 5,265 | 5,238 | 5,238 | 5,232 | 5,235 | 5,237 |
| New York | 42,009 | 37,146 | 37,179 | 37,168 | 37,151 | 37,173 |
| Ohio | 42,508 | 42,510 | 42,510 | 42,483 | 42,529 | 42,462 |
| Oklahoma | 30,320 | 27,331 | 27,331 | 27,327 | 27,332 | 27,334 |
| Oregon | 13,221 | 0 | 0 | 0 | 0 | 0 |
| Pennsylvania | 50,166 | 15,471 | 15,487 | 15,483 | 15,481 | 15,479 |
| Rhode Island | 3,117 | 1,442 | 1,442 | 1,442 | 1,442 | 1,442 |
| South Carolina | 19,820 | 19,819 | 19,819 | 19,819 | 19,819 | 19,816 |
| South Dakota | 4,087 | 0 | 0 | 0 | 0 | 0 |
| Tennessee | 21,996 | 0 | 0 | 0 | 0 | 0 |
| Texas | 113,097 | 113,098 | 113,098 | 113,098 | 113,098 | 113,098 |
| Utah | 9,290 | 7,120 | 7,118 | 7,118 | 7,124 | 7,126 |
| Virginia | 27,387 | 21,656 | 21,724 | 21,732 | 21,677 | 21,646 |
| Vermont | 4,055 | 4,055 | 4,055 | 4,055 | 4,055 | 4,055 |
| Washington | 21,330 | 21,879 | 21,549 | 21,410 | 22,039 | 21,964 |
| Wisconsin | 9,848 | 9,850 | 9,848 | 9,847 | 9,849 | 9,849 |
| West Virginia | 13,021 | 12,990 | 12,992 | 12,990 | 12,995 | 12,989 |
| Wyoming | 3,755 | 2,246 | 2,248 | 2,247 | 2,246 | 2,249 |
| Region 1 - Tribes | 260 | 0 | 0 | 0 | 0 | 0 |
| Region 2 - Tribes | 55 | 0 | 0 | 0 | 0 | 0 |
| Region 4 - Tribes | 0 | 0 | 0 | 0 | 0 | 0 |
| Region 5 - Tribes | 476 | 0 | 0 | 0 | 0 | 0 |
| Region 6 - Tribes | 827 | 783 | 783 | 784 | 783 | 783 |
| Region 7 - Tribes | 127 | 47 | 49 | 49 | 47 | 49 |
| Region 8 - Tribes | 1,307 | 397 | 397 | 397 | 397 | 397 |
| Region 9 - Tribes | 2,146 | 0 | 0 | 0 | 0 | 0 |
| Region 10 - Tribes | 974 | 994 | 994 | 994 | 993 | 994 |

-------
Exhibit C-4: Number of Chlorite and Bromate Data Records by State

| State | Chlorite | Bromate |
|---|---:|---:|
| Alaska | 0 | 203 |
| Alabama | 5,396 | 0 |
| Arkansas | 1,862 | 192 |
| American Samoa | 0 | 0 |
| Arizona | 2,418 | 601 |
| California | 1,520 | 6,065 |
| Colorado | 2,823 | 739 |
| Connecticut | 393 | 152 |
| Washington, D.C. | 0 | 0 |
| Delaware | 0 | 73 |
| Florida | 0 | 0 |
| Hawaii | 0 | 0 |
| Iowa | 2,128 | 94 |
| Idaho | 13 | 49 |
| Illinois | 1,897 | 222 |
| Indiana | 0 | 267 |
| Kansas | 4,933 | 651 |
| Kentucky | 1,786 | 0 |
| Louisiana | 0 | 0 |
| Massachusetts | 2,414 | 1,050 |
| Maryland | 31 | 0 |
| Maine | 350 | 214 |
| Minnesota | 66 | 189 |
| Missouri | 5,034 | 225 |
| Northern Mariana Islands | 0 | 0 |
| Montana | 5 | 779 |
| North Carolina | 920 | 540 |
| North Dakota | 0 | 201 |
| Nebraska | 195 | 30 |
| New Hampshire | 0 | 0 |
| New Jersey | 1,233 | 721 |
| Navajo Nation | 0 | 0 |
| Nevada | 2,031 | 886 |
| New York | 348 | 88 |
| Ohio | 1,391 | 364 |
| Oklahoma | 3,864 | 672 |
| Oregon | 3 | 235 |
| Pennsylvania | 15,344 | 306 |
| Rhode Island | 867 | 0 |
| South Carolina | 0 | 1 |
| South Dakota | 0 | 0 |
| Tennessee | 0 | 0 |
| Texas | 26,960 | 4,289 |
| Utah | 2 | 314 |
| Virginia | 1,406 | 1,430 |
| Vermont | 0 | 0 |
| Washington | 0 | 2 |
| Wisconsin | 0 | 1,079 |
| West Virginia | 84 | 93 |
| Wyoming | 0 | 133 |
| Region 1 - Tribes | 0 | 0 |
| Region 2 - Tribes | 0 | 0 |
| Region 4 - Tribes | 0 | 0 |
| Region 5 - Tribes | 0 | 0 |
| Region 6 - Tribes | 0 | 96 |
| Region 7 - Tribes | 0 | 0 |
| Region 8 - Tribes | 47 | 0 |
| Region 9 - Tribes | 231 | 24 |
| Region 10 - Tribes | 0 | 29 |

Exhibit C-5: Number of Disinfection Byproduct Related Parameters Data Records by State

| State | Alkalinity | pH | All Total Organic Carbon (TOC) | Raw Water TOC | Finished Water TOC | Free Chlorine Data¹ | Total Chlorine Data¹ | Free Chlorine Data² | Total Chlorine Data² |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Alaska | 1,533 | 191 | 2,915 | 524 | 169 | 55,417 | 498 | 176 | 0 |
| Alabama | 18,574 | 3,279 | 17,239 | 8,489 | 8,596 | 182,333 | 2,687 | 3,274 | 1,179 |
| Arkansas | 0 | 0 | 0 | 0 | 0 | 2 | 371,859 | 0 | 0 |
| American Samoa | 2 | 2 | 0 | 0 | 0 | 7,491 | 23 | 0 | 0 |
| Arizona | 3,540 | 776 | 4,221 | 2,114 | 2,107 | 7 | 5 | 0 | 0 |
| California | 0 | 125,308 | 32,893 | 18,884 | 13,637 | 0 | 0 | 0 | 0 |
| Colorado | 8,960 | 1,000 | 16,549 | 0 | 0 | 321,103 | 28,287 | 1,437 | 22 |
| Connecticut | 16,188 | 147,504 | 8,074 | 4,033 | 3,784 | 338,697 | 52,495 | 277 | 10 |
| Washington, D.C. | 1 | 41 | 17 | 6 | 0 | 8,688 | 13,666 | 64 | 159 |
| Delaware | 6,363 | 6,684 | 336 | 127 | 209 | 51,299 | 11,139 | 4,207 | 807 |
| Florida | 73 | 6,919 | 1 | 0 | 0 | 1,036,993 | 0 | 0 | 0 |
| Hawaii | 60 | 2 | 3 | 1 | 2 | 14,454 | 213 | 0 | 0 |
| Iowa | 2,587 | 216 | 6,555 | 2,838 | 0 | 315,592 | 366,036 | 11 | 11 |
| Idaho | 1,436 | 476 | 2,001 | 181 | 180 | 86,182 | 333 | 0 | 0 |
| Illinois | 12,766 | 3,111 | 17,715 | 8,857 | 8,858 | 828,654 | 521,300 | 0 | 0 |
| Indiana | 4,416 | 937 | 6,945 | 3,579 | 3,366 | 133,771 | 124,094 | 0 | 0 |
| Kansas | 10,654 | 3,114 | 15,085 | 7,479 | 7,510 | 143,389 | 129,953 | 0 | 0 |
| Kentucky | 15,521 | 3,331 | 27,990 | 13,997 | 13,993 | 351,946 | 133,281 | 0 | 0 |
| Louisiana | 5,594 | 5,854 | 1 | 1 | 0 | 126,762 | 67,051 | 51,477 | 28,978 |
| Massachusetts | 10 | 4,785 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Maryland | 3,479 | 1,646 | 4,908 | 2,412 | 0 | 1,569 | 1,112 | 0 | 0 |
| Maine | 6,958 | 4,533 | 1,744 | 977 | 767 | 17,767 | 41,444 | 0 | 0 |
| Minnesota | 4,948 | 22,506 | 3,320 | 0 | 0 | 0 | 0 | 0 | 0 |
| Missouri | 11,408 | 11,710 | 17,671 | 8,749 | 8,922 | 293,054 | 420,150 | 0 | 0 |
| Northern Mariana Islands | 0 | 261 | 0 | 0 | 0 | 6,196 | 1 | 0 | 0 |
| Montana | 4,109 | 724 | 6,456 | 2,561 | 2,774 | 59,138 | 21,696 | 5 | 47 |
| North Carolina | 30,586 | 39,246 | 26,460 | 12,990 | 13,470 | 631,245 | 315,687 | 0 | 0 |
| North Dakota | 1,554 | 472 | 2,176 | 1,083 | 1,093 | 0 | 0 | 0 | 0 |
| Nebraska | 0 | 7 | 989 | 0 | 289 | 27,968 | 46,685 | 733 | 141 |
| New Hampshire | 1,774 | 3,454 | 1,133 | 0 | 0 | 0 | 0 | 0 | 0 |
| New Jersey | 44,603 | 94,278 | 12,340 | 6,185 | 5,814 | 172,293 | 48,946 | 0 | 0 |
| Navajo Nation | 221 | 239 | 23 | 0 | 0 | 6,348 | 105 | 0 | 0 |
| Nevada | 4,995 | 7,692 | 1,027 | 539 | 488 | 41,407 | 226 | 0 | 0 |
| New York | 2,674 | 5,523 | 8,890 | 4,716 | 2,789 | 125,909 | 2,035 | 1,675 | 0 |
| Ohio | 1,769 | 2,402 | 156 | 1 | 14 | 761,430 | 802,905 | 0 | 0 |
| Oklahoma | 22,713 | 2,282 | 33,140 | 454 | 105 | 169,182 | 184,063 | 27,762 | 23,977 |
| Oregon | 3,145 | 0 | 7,699 | 4,600 | 3,097 | 346,652 | 5 | 0 | 0 |
| Pennsylvania | 60,459 | 86,188 | 33,174 | 17,903 | 0 | 180,486 | 87,219 | 0 | 0 |
| Rhode Island | 1,492 | 588 | 1,513 | 755 | 712 | 25,221 | 25,629 | 0 | 5 |
| South Carolina | 5,477 | 2,116 | 10,714 | 5,287 | 5,354 | 0 | 64 | 0 | 0 |
| South Dakota | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Tennessee | 259 | 1,438 | 0 | 0 | 0 | 89,236 | 0 | 0 | 0 |
| Texas | 63,232 | 11,262 | 55,684 | 27,835 | 27,849 | 0 | 0 | 588,072 | 820,698 |
| Utah | 3,123 | 662 | 4,961 | 2,415 | 2,508 | 120,687 | 1,858 | 12,594 | 0 |
| Virginia | 20,257 | 10,176 | 20,652 | 10,312 | 10,340 | 596,794 | 6,035 | 8,278 | 0 |
| Vermont | 308 | 281 | 184 | 88 | 96 | 60,180 | 25,395 | 3,058 | 2,352 |
| Washington | 681 | 203 | 29 | 3 | 26 | 0 | 0 | 0 | 0 |
| Wisconsin | 6,786 | 6,187 | 3,442 | 0 | 0 | 327,565 | 0 | 0 | 0 |
| West Virginia | 10,692 | 2,507 | 17,588 | 4,807 | 4,246 | 13,231 | 176,456 | 0 | 132 |
| Wyoming | 1,634 | 111 | 3,071 | 1,584 | 1,452 | 70,446 | 12,085 | 14 | 4 |
| Region 1 - Tribes | 21 | 18 | 0 | 0 | 0 | 2,571 | 43 | 0 | 0 |
| Region 2 - Tribes | 0 | 0 | 0 | 0 | 0 | 575 | 24 | 0 | 0 |
| Region 4 - Tribes | 0 | 0 | 0 | 0 | 0 | 3,176 | 0 | 0 | 0 |
| Region 5 - Tribes | 0 | 0 | 0 | 0 | 0 | 16,392 | 0 | 0 | 0 |
| Region 6 - Tribes | 110 | 0 | 224 | 114 | 99 | 16,865 | 3,831 | 409 | 195 |
| Region 7 - Tribes | 83 | 4 | 176 | 0 | 0 | 1,480 | 906 | 0 | 0 |
| Region 8 - Tribes | 865 | 55 | 1,483 | 738 | 739 | 13,841 | 3,639 | 43 | 49 |
| Region 9 - Tribes | 400 | 361 | 379 | 0 | 0 | 16,145 | 41 | 0 | 0 |
| Region 10 - Tribes | 304 | 159 | 251 | 140 | 104 | 9,787 | 10,664 | 0 | 0 |

¹ Free and Total Chlorine data associated with Total Coliform
² Free and Total Chlorine data associated with DBPs