EPA
United States
Environmental Protection
Agency
Six Key Steps
for Developing and Using
Predictive Tools at Your Beach
U.S. Environmental Protection Agency
Office of Water
March 8, 2016
820-R-16-001
Foreword
This non-technical guide was developed by the U.S. Environmental Protection
Agency (EPA) to provide local government officials, beach managers, health
department personnel, and others with basic information on how to develop predictive
tools in the context of an overall beach monitoring and notification program. Five
case studies are presented toward the end of this document as examples of how
predictive tools have been developed and used at actual beaches. Readers seeking
more in-depth design and implementation information are encouraged to review the
sources used to develop this document as well as various online resources provided
by EPA and other agencies.
Front cover photos, starting upper left and moving clockwise:
• Little boy enjoying the waves, ©istockphoto.com.
• Water quality notification sign, USEPA.
• qPCR analysis, City of Racine Health Department.
• Lake Superior, Michigan, Upper Peninsula, ©istockphoto.com.
• Miami Beach, ©istockphoto.com.
Case study images are courtesy of the Chicago Parks District, Charles River Watershed
Association, Milwaukee Department of City Development, Milwaukee County Parks,
University of Wisconsin Zilber School of Public Health, and City of Racine Health Department.
Six Key Steps for Developing and Using Predictive Tools at Your Beach
Contents
Foreword
Acronym List
Introduction
The Time-Lag Problem
Predictive Tools
Developing a Predictive Model
Step 1: Evaluate the Appropriateness of a FIB Predictive Tool
Introduction to Step 1
Is There a Need for a Predictive Tool?
Are Beach Characteristics Compatible with Predictive Tools?
Are There Sufficient Historical Data to Develop and Test a Predictive Tool?
Are There Funding and Other Resources Available to Develop, Operate, Maintain, and Update a Predictive Tool?
Personnel and Technical Experts
Data Collection
Monitoring Equipment and Supplies
Modeling and Statistical Software
Model Evaluation Over Time
Step 2: Identify Variables and Collect Data
Introduction to Step 2
Key Attributes of Variable Data Sets
FIB Density
Independent Variables
Variables Relating to Bacteria Movement through the Drainage Area
Variables Relating to Bacteria Movement through the Receiving Water
Variables Relating to the Fate of Bacteria in the Swimming Area
Variables Relating to Activities and Conditions at the Beach
Step 3: Perform Exploratory Data Analysis
Introduction to Step 3
Virtual Beach Software
Data Management
Characterize the FIB and Independent Variable Data Sets
Box Plots
Outliers
Comparing Data Distributions among Variable Subsets
Examine the Relationship between FIB and Independent Variables
Scatterplots
Variable Transformation
Creation of New Variables
Correlation among Independent Variables
Analysis of Variance for Categorical Variables
Step 4: Develop and Test a Predictive Model
Introduction to Step 4
Data Sets
Reducing Errors
Virtual Beach
Models that Do Not Meet Performance Goals
Exceedance Probability Threshold
Step 5: Integrate the Predictive Tool into a Beach Monitoring and Notification Program
Introduction to Step 5
Frequency of Running the Model
Notification Protocols
Types of Beach Notifications
Beach Advisories
Beach Closings
Preemptive Advisories
Permanent Advisories
Public Communication
Public Education
Public Outreach
Other Uses for Predictive Models
Step 6: Evaluate the Predictive Tool over Time
Introduction to Step 6
Changes to the Fate and Transport of FIB
Changes to Data Sources
Changes to Your Beach Program
Bibliography
Case Studies
Figures
Figure 1. Using sampling and culture analysis to make a beach notification decision
Figure 2. Using predictive modeling to make a beach notification decision
Figure 3. Box plot attributes
Figure 4. Box plots of E. coli density sorted by wind direction
Figure 5. Comparison of E. coli density over a four-year period
Figure 6. Scatterplots of E. coli vs. rainfall without transformation (A) and with a log-transformation (B)
Figure 7. Plot of persistence model results of 2005 data (adapted from Francy and Darner 2006)
Figure 8. Plot of predictive model results of 2005 data (adapted from Francy and Darner 2006)
Figure 9. Plot of predictive model results of 2005 data expressed as exceedance probability threshold (adapted from Francy and Darner 2006)
Figure 10. Notification protocol for a beach program that uses sampling results and a predictive model to make notification decisions
Figure 11. Notification protocol for a beach that uses only model results to make notification decisions
Tables
Table 1. Beaufort Wind Scale
Table 2. Independent variables used in final statistical models from case studies
Acronym List
ANN Artificial Neural Network
ANOVA Analysis of Variance
CART Classification and Regression Tree
CESN Coastal Environmental Sensing Network
CFU Colony Forming Unit
CPD Chicago Parks District
CRWA Charles River Watershed Association
CSO Combined Sewer Overflow
DOY Day of the Year
EDA Exploratory Data Analysis
EMPACT Environmental Monitoring for Public Access and Community Tracking
EnDDaT Environmental Data Discovery and Transformation
EPA U.S. Environmental Protection Agency
FIB Fecal Indicator Bacteria
GBM Gradient Boosting Machine
GIS Geographic Information System
GLRI Great Lakes Restoration Initiative
MHD Milwaukee Health Department
MLR Multivariable Linear Regression
MPN Most Probable Number
NDBC National Data Buoy Center
NOAA National Oceanic and Atmospheric Administration
NWIS National Water Information System
NWS National Weather Service
OLS Ordinary Least Squares
PLS Partial Least Squares
QA Quality Assurance
QAPP Quality Assurance Project Plan
QC Quality Control
qPCR Quantitative Polymerase Chain Reaction
SCDHEC South Carolina Department of Health and Environmental Control
SSO Sanitary Sewer Overflow
TMDL Total Maximum Daily Load
USACE U.S. Army Corps of Engineers
USGS U.S. Geological Survey
UTC Coordinated Universal Time
UV Ultraviolet
UWM University of Wisconsin-Milwaukee
VB Virtual Beach
VB3 Virtual Beach Version 3
WDNR Wisconsin Department of Natural Resources
WWTP Wastewater Treatment Plant
Introduction
Even the most pristine waters contain a variety of microscopic
organisms. Most of them are harmless, but a small portion can cause
illness in humans, including gastroenteritis; eye, ear, and throat infections;
hepatitis; and giardiasis. Generally, disease-causing (pathogenic) organisms
encountered at swimming beaches originate from the feces of humans
and warm-blooded animals and are carried into recreational waters by
stormwater runoff.
Monitoring directly for pathogens in recreational waters is currently
impractical for a number of reasons, which include the difficulty in
identifying which pathogens are present, filtering large volumes of water
to isolate enough organisms to measure, and the high cost of analytical
methods. Fortunately, some types of nonpathogenic fecal bacteria are
transported along with disease-causing microbes. Known generically as
"fecal indicator bacteria" (FIB), they exist in far greater numbers than
pathogens and are easier to isolate and enumerate in the laboratory.
Consequently, FIB can serve as markers for the potential presence of
pathogens.
Currently, EPA recommends two types of FIB for use in beach monitoring
programs: enterococci and Escherichia coli (E. coli). Either type can be used
at freshwater beaches, and enterococci are recommended for marine water.
State beach programs issue a swimming advisory or close a beach (a beach
notification) when densities of these bacteria exceed a notification threshold
based on EPA's national criteria recommendations or on a site-specific water
quality standard.
Information on EPA's recommended water quality criteria is provided in the
National Beach Guidance and Required Performance Criteria for Grants (the
National Beach Guidance) at
http://www.epa.gov/sites/production/files/2014-07/documents/beach-guidance-final-2014.pdf.
Key Resources on Predictive Tools
• Predictive Tools for Beach Notification. Volume I, Review and Technical Protocol (USEPA 2010a)
• Predictive Modeling at Beaches. Volume II, Predictive Tools for Beach Notification (USEPA 2010b)
• Developing and Implementing Predictive Models for Estimating Recreational Water Quality at Great Lakes Beaches (Francy et al. 2013a)
• Virtual Beach 3.0.4: User's Guide (Cyterski et al. 2013)
• Accessing Online Data for Building and Evaluating Real-Time Models to Predict Beach Water Quality (Mednick 2009)
• Report of the Experts Scientific Workshop on Critical Research Needs for the Development of New or Revised Recreational Water Quality Criteria (USEPA 2007)
• Beach Water Quality Decision Support System (Rockwell et al. 2013)
Sampling beach water for FIB.
The Time-Lag Problem
At first glance, the process for determining when beach water is safe for
swimming seems fairly straightforward. If laboratory results indicate FIB
densities above the state water quality standard or other threshold value, a
beach notification is issued. If FIB densities are below the threshold value,
no action is taken (Figure 1).
Timeline: 8:00 a.m., collect FIB sample; 10:00 a.m., deliver sample to lab for culture analysis; 7:00 a.m. the next day, receive sample results; 7:00 a.m. the next day, make beach notification decision.
Figure 1. Using sampling and culture analysis to make a beach notification decision.
Underlying this beach notification system is the assumption that FIB
densities do not change (i.e., they persist) between the time a water sample
is taken and the laboratory results are known (usually a span of 18-24 hours
for culture methods—the methods most often used). At some beaches,
this "persistence model" is valid, especially when natural or artificial
barriers restrict water movement at the beach. At many open water beaches,
however, studies have shown that FIB density can fluctuate significantly
over relatively short periods of time. This phenomenon sets up possible
undesirable scenarios, for example:
Beach water is sampled on Monday. Results obtained on Tuesday
indicate that FIB density was above the state standard, so the
beach manager issues an advisory. On Wednesday, results of
follow-up samples taken on Tuesday reveal that FIB density was
back to normal and the water was actually safe for swimming (i.e.,
Monday's FIB levels did not persist into Tuesday). Consequently,
Tuesday—a perfectly good beach day—was lost. Monday's
swimmers, on the other hand, were exposed to high levels of FIB
and potentially unhealthy levels of pathogens.
None of the consequences are good: (1) Monday's swimmers might have
swum in contaminated water, (2) beachgoers might have lost recreational
time on Tuesday, and (3) area businesses might have suffered economic losses
due to the lack of customers. The 18- to 24-hour time lag can be a problem.
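To see how the persistence assumption plays out at a particular beach, a minimal Python sketch can replay a season of daily results: each day's notification decision is taken from the previous day's culture result, and that decision is compared with the day's actual result. The 235 CFU/100 mL threshold and the sample values are illustrative assumptions, not data from this guide; substitute your own standard and monitoring record.

```python
"""Replay a persistence-model notification policy over daily FIB results.

Assumptions (illustrative only): one culture result per day, results are
available the following morning, and 235 CFU/100 mL is the threshold.
"""

THRESHOLD = 235.0  # hypothetical E. coli notification threshold, CFU/100 mL


def persistence_outcomes(daily_density, threshold=THRESHOLD):
    """Count outcomes when each day's decision uses the previous day's result.

    correct          - decision matched that day's actual status
    lost_days        - advisory posted, but water was actually below threshold
    missed_exposure  - no advisory, but water actually exceeded threshold
    """
    counts = {"correct": 0, "lost_days": 0, "missed_exposure": 0}
    # Day t's decision is based on day t-1's result (the persistence model).
    for yesterday, today in zip(daily_density, daily_density[1:]):
        posted = yesterday > threshold
        actual = today > threshold
        if posted == actual:
            counts["correct"] += 1
        elif posted:
            counts["lost_days"] += 1
        else:
            counts["missed_exposure"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical week of E. coli densities (CFU/100 mL), Monday first.
    week = [410.0, 60.0, 75.0, 300.0, 90.0, 40.0, 35.0]
    print(persistence_outcomes(week))
    # {'correct': 3, 'lost_days': 2, 'missed_exposure': 1}
```

In this made-up week, Monday's high result triggers an advisory on Tuesday even though Tuesday's water was below the threshold, which is the lost beach day described in the scenario above.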
Predictive Tools
The time-lag problem of culture analysis and the shortcomings of the
persistence model have led to the development of tools that predict whether
the applicable water quality standard has been or is likely to be exceeded so
that beach notifications can be issued in a more timely way. When integrated
properly into a beach notification program, these tools can provide an early
warning of potentially unsafe swimming conditions. This guide presents
an overview of how to develop a predictive tool for your beach program. It
focuses mainly on implementation activities and issues and not on technical
details.
In most instances, the "tool" is actually a mathematical equation or
"model" designed to produce one of two types of output: (1) a FIB density
prediction, or (2) a probability prediction that expresses the chances that an
applicable water quality standard or notification threshold will be exceeded
(e.g., "There is a 60 percent chance that the standard or threshold will be
exceeded."). Either output type can be used by beach managers to "trigger"
a beach notification. Throughout this document, when "bacteria density"
is mentioned as the model output, assume that it includes "exceedance
probability" as an alternative form of output, unless indicated otherwise.
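The two output types are related. One way to turn a density prediction into an exceedance probability, assuming the model predicts log10-transformed FIB density and its prediction errors are roughly normal, is to compute the chance that the true log density lies above the log of the threshold. The sketch below uses only the Python standard library; the residual standard error (0.4 log units) and the 235 CFU/100 mL threshold are hypothetical values, and tools such as Virtual Beach may compute exceedance probabilities differently.

```python
import math
from statistics import NormalDist


def exceedance_probability(predicted_log10, residual_se, threshold):
    """Probability that FIB density exceeds `threshold`.

    Assumes (hypothetically) that the model predicts log10 density and that
    prediction errors are normal with standard deviation `residual_se`
    in log10 units.
    """
    error_dist = NormalDist(mu=predicted_log10, sigma=residual_se)
    # P(log10 density > log10 threshold) = 1 - CDF evaluated at the threshold.
    return 1.0 - error_dist.cdf(math.log10(threshold))


if __name__ == "__main__":
    # Illustrative values: predicted 180 CFU/100 mL, SE of 0.4 log units.
    p = exceedance_probability(math.log10(180), 0.4, 235)
    print(f"Chance of exceeding the threshold: {p:.0%}")
```

A prediction exactly at the threshold gives a 50 percent chance; a larger residual standard error pulls every probability toward 50 percent, which reflects a less certain model.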
Figure 2 shows the timeline for a beach program using predictive modeling.
The time required to make the beach notification decision is significantly
shorter than the time required in the scenario shown in Figure 1.
Timeline: 8:00 a.m., collect model input variables (e.g., rainfall, turbidity); 8:30 a.m., run predictive model; 8:45 a.m., interpret results and compare to the water quality standard or other notification threshold; 9:00 a.m., make beach notification decision.
Figure 2. Using predictive modeling to make a beach notification decision.
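The Figure 2 workflow can be sketched as a short daily routine: gather the morning's input variables, run the model, compare the result with the notification threshold, and record a decision. This is a minimal Python sketch under stated assumptions; the coefficients, the choice of rainfall and turbidity as inputs, and the 235 CFU/100 mL threshold are hypothetical placeholders, not a fitted model.

```python
from dataclasses import dataclass

# Hypothetical multiple linear regression on log10 E. coli density.
# Real coefficients come from fitting your own historical data (steps 3 and 4).
INTERCEPT = 1.2
COEF_RAIN_24H_MM = 0.015    # per millimeter of rain in the previous 24 hours
COEF_TURBIDITY_NTU = 0.010  # per NTU of turbidity
THRESHOLD = 235.0           # assumed notification threshold, CFU/100 mL


@dataclass
class MorningInputs:
    rain_24h_mm: float
    turbidity_ntu: float


def predict_density(inputs: MorningInputs) -> float:
    """Run the model and back-transform from log10 to CFU/100 mL."""
    log10_density = (
        INTERCEPT
        + COEF_RAIN_24H_MM * inputs.rain_24h_mm
        + COEF_TURBIDITY_NTU * inputs.turbidity_ntu
    )
    return 10 ** log10_density


def notification_decision(inputs: MorningInputs, threshold: float = THRESHOLD) -> str:
    """Compare the prediction with the threshold and return a decision."""
    return "advisory" if predict_density(inputs) > threshold else "open"


if __name__ == "__main__":
    today = MorningInputs(rain_24h_mm=40.0, turbidity_ntu=35.0)
    print(round(predict_density(today)), notification_decision(today))
```

Keeping the model in one small function makes it easy to swap in refitted coefficients when the model is re-evaluated (step 6).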
In addition to improving the timeliness of beach notifications, predictive
models can supplement an existing monitoring program, helping to reduce
sampling and to identify notification days more accurately (e.g., if FIB
sampling occurs only once or twice a week because of resource constraints,
predictive models can provide information for timely notification on other
days).
Developing a Predictive Model
This document presents six basic steps that an interdisciplinary project team
(the Beach Team) might take to analyze, develop, implement, and evaluate
the success of a predictive model. Each step is discussed in a separate section.
• Step 1: Evaluate the Appropriateness of a Predictive Tool. This
section outlines factors that your Beach Team should consider before
proceeding with a modeling project. The team should assess the
degree of risk to the public from swimming at the beach, confirm that
essential historical FIB data exist that can be used to develop the model,
identify any beach conditions or attributes that are not compatible with
FIB modeling, and evaluate whether sufficient resources are available
locally to support model development, operation, and maintenance.
• Step 2: Identify Variables and Collect Data. This section introduces
independent variables influencing the movement of bacteria from their
sources, through the drainage system and receiving water, and into the
swimming area of a beach. It offers insights into which independent
variables might serve as the best candidates for modeling FIB at a
beach.
• Step 3: Perform Exploratory Data Analysis. Once a set of candidate
independent variables is selected, the variables must be statistically
evaluated to see how well they correlate with FIB densities. Results from
exploratory data analysis further refine the list of candidate variables.
• Step 4: Develop and Test the Predictive Tool. Models can range from
simple to complex. This section begins with a discussion of rainfall-
based models that need only one independent variable to develop and
run. The discussion continues with modeling using multiple variables
and concludes with techniques for testing the model.
• Step 5: Integrate the Predictive Tool into a Beach Monitoring and
Notification Program. Predictive tools are one component of an
overall beach program. To successfully integrate a model into a beach
monitoring program, your Beach Team should develop protocols for
collecting input data, running the model, and using model results.
• Step 6: Evaluate the Predictive Tool over Time. To ensure that model
output remains accurate and relevant over time as beach conditions
change, your Beach Team should evaluate the model's accuracy at least
annually.
This document concludes with a series of case studies that illustrate various
ways that predictive models have been developed and implemented. The
following case studies helped inform this guidance:
• The Grand Strand, South Carolina. The South Carolina Department
of Health and Environmental Control (SCDHEC) developed a
stormwater model to predict FIB densities at South Carolina state
beaches. This case study highlights the limitations of monitoring
equipment and the value of collaboration and technology.
• Charles River, Massachusetts. The Charles River Watershed Association
(CRWA) worked with Tufts University to develop a statistical model
to predict water quality in the Lower Charles River Basin. CRWA's
experience highlights the importance of model simplicity and the
availability of real-time data when resources are limited.
• Chicago, Illinois. The Chicago Parks District (CPD) developed a
predictive model in 2011 with the assistance of the U.S. Geological
Survey (USGS). CPD's experience emphasizes the need for
comprehensive knowledge of the beach environment as well as
adequate funding and technical resources to collect data and conduct
statistical analyses.
• Racine, Wisconsin. The City of Racine and the Wisconsin Department
of Natural Resources developed NOWCAST statistical models for
Racine's two beaches using EPA's Virtual Beach (VB) software.
Racine's experience illustrates the importance of a robust data set and
the advantages of reinforcing a model with other beach monitoring
components.
• South Shore Beach, Wisconsin. With assistance from the University
of Indiana and USGS, the Milwaukee Health Department (MHD)
developed a statistical model for three of its public beaches based on
24-hour rainfall data and previous 24-hour bacterial sampling data.
MHD's experience shows that a model can be a good fit for the local
public health department.
Step 1: Evaluate the Appropriateness of a FIB Predictive Tool
Introduction to Step 1
While a predictive tool might provide a significant benefit to some beach
programs, your Beach Team should first carefully consider and answer the
following questions to make sure that a predictive tool is right for your beach:
a. Is there a need for a predictive tool?
b. Are beach characteristics compatible with predictive tools?
c. Are there sufficient historical data to develop and test a predictive tool?
d. Are there funding and personnel experienced with model development
and maintenance available to develop, operate, maintain, and update a
predictive tool?
Is There a Need for a Predictive Tool?
One of the first things your Beach Team should evaluate is whether a
predictive tool is needed. Remember that the main purpose of a predictive
tool is to predict whether the applicable water quality standard has been or
is likely to be exceeded, either on a sampling day before culture results are
available (the time lag) or on nonsampling days. Using time series analyses,
EPA has reported that bacteria levels at the beach can change over relatively
short periods of time (USEPA 2010c). If, however, FIB density at your beach
is known to persist for 24 hours or more, predictions are less important, and
traditional water sampling and laboratory analysis alone might adequately
protect swimmer health.
Other situations when a predictive tool might not be
needed include (1) beaches being sampled daily using
rapid methods, and (2) beaches never, or hardly ever,
exceeding applicable recreational water quality standards.
Given several beaches to manage and limited budgets,
your Beach Team will likely rank its beaches according
to factors such as potential risk to human health
presented by pathogens and beach use. These rankings
(described further in Chapter 3 of EPA's 2014 National
Beach Guidance) can also help identify the beaches that
could benefit most from predictive models.
Are Beach Characteristics Compatible with Predictive Tools?
Beaches that make the best candidates for predictive tools are located in
environmental settings that are themselves predictable. A good candidate
beach operates under a fairly constant range of "normal" conditions that,
when processed through a predictive tool, should yield a good estimate of
FIB levels. The tool operates like an "if...then" statement. If a set of these
conditions occurs, then you get a specific FIB density. Importantly, most
predictive tools are developed using historical data which, in effect, describe
and define the norm in terms of both the conditions and the predicted value.
Once in operation, if the tool is presented with conditions outside the norm,
it might not yield accurate results. Therefore, the team might have to revisit
the conditions that predict FIB density.
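Because the tool's notion of "normal" comes from the historical data used to build it, a simple safeguard is to flag any day whose inputs fall outside the range observed in that data. The sketch below is a minimal illustration with hypothetical variable names and ranges; it only flags the inputs and leaves the response to a flagged prediction to your Beach Team.

```python
def out_of_range_inputs(training_rows, todays_inputs):
    """Return names of inputs that fall outside the historical (training) range.

    training_rows: list of dicts, one per historical observation.
    todays_inputs: dict using the same variable names.
    """
    flagged = []
    for name, value in todays_inputs.items():
        history = [row[name] for row in training_rows]
        if not min(history) <= value <= max(history):
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Hypothetical training data: rainfall (mm) and water temperature (deg C).
    training = [
        {"rain_24h_mm": 0.0, "water_temp_c": 18.0},
        {"rain_24h_mm": 12.5, "water_temp_c": 21.0},
        {"rain_24h_mm": 38.0, "water_temp_c": 24.5},
    ]
    today = {"rain_24h_mm": 95.0, "water_temp_c": 22.0}
    print(out_of_range_inputs(training, today))  # ['rain_24h_mm']
```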
Beaches that might not be good candidates for predictive tools are usually
those subject to a wide or frequently changing set of conditions and
disturbances that impact FIB density, making "normal" difficult to define
and characterize. These conditions might include frequent impacts by spills
or illicit discharges or periodic visits by large flocks of birds. Some open
ocean beaches are not good candidates for modeling simply because of the
sheer complexity of the various meteorological conditions, tidal patterns,
offshore currents, and other factors that occur.
Are There Sufficient Historical Data to Develop and Test a Predictive Tool?
Access to a sufficient amount of historical FIB density data and
corresponding data describing a variety of environmental conditions (i.e.,
independent variable data) is crucial for developing and testing predictive
models. EPA recommends having at least 50 observations, but 100 or more is
preferable (USEPA 2010b). Ideally, the observations should represent a range
of conditions experienced at the beach and include data collected in normal
seasons, drier-than-normal seasons, and wetter-than-normal seasons. This
is rarely the case, but the closer you can get to this ideal, the more robust
your model will be.
An important part of the model development process is testing the model.
Francy et al. (2013a) recommend that you collect data for at least three
seasons, then use two seasons' data as the training dataset and one season's
data as the testing dataset.
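The three-season approach can be sketched as follows: hold out one season for testing, train on the rest, and compare the training-set size with EPA's 50/100 observation guidance. The record layout (a `season` field on each observation) and the choice of the latest season as the test set are assumptions for illustration.

```python
MIN_OBS = 50         # EPA-recommended minimum observations for model development
PREFERRED_OBS = 100  # EPA-preferred number of observations


def split_by_season(observations, test_season):
    """Hold out one season for testing and train on the remaining seasons.

    Each observation is assumed (hypothetically) to be a dict with a 'season' key.
    """
    training = [obs for obs in observations if obs["season"] != test_season]
    testing = [obs for obs in observations if obs["season"] == test_season]
    return training, testing


def sample_size_note(n):
    """Compare a training-set size with EPA's 50/100 observation guidance."""
    if n < MIN_OBS:
        return "below minimum: collect more data"
    if n < PREFERRED_OBS:
        return "meets minimum"
    return "meets preferred size"


if __name__ == "__main__":
    # Hypothetical record: 40 samples in each of three seasons.
    obs = [
        {"season": year, "ecoli_cfu_100ml": 100.0}
        for year in (2013, 2014, 2015)
        for _ in range(40)
    ]
    train, test = split_by_season(obs, test_season=2015)
    print(len(train), len(test), sample_size_note(len(train)))  # 80 40 meets minimum
```

Holding out a whole season, rather than random days, tests the model against conditions it has not seen, which is closer to how it will be used in practice.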
Checklist of Beach and Program Characteristics Compatible with Modeling
• The beach operates under a constant range of "normal" conditions.
• Exceedances of beach notification threshold values occur occasionally but are not a chronic problem.
• FIB densities change over relatively short periods of time (time-lag problem).
• A sufficient amount of historical FIB and independent variable data exists.
• Funding for personnel and technical experts is available.
• Monitoring equipment is available.
• Computer equipment and software are available.
A more complete discussion of independent variable data along with tips
on how to collect them is provided in step 2. For preliminary purposes,
however, your Beach Team should investigate the FIB density, rainfall data,
and data on factors that affect water movement in and around the beach
(i.e., wind and wave direction and magnitude). Water quality data such as
turbidity and water temperature are also important factors at some beaches,
as are data on near-shore sources of fecal pollution (e.g., birds).
Sources of data include federal agencies (e.g., the National Weather Service
(NWS) and USGS) as well as various state and local agencies. A particularly
valuable resource is beach sanitary surveys, especially if they are conducted
on a daily basis (see http://www.epa.gov/beach-tech/beach-sanitary-surveys
for sanitary surveys developed by EPA). Sanitary
surveys provide site-specific data that correspond exactly
to the time a FIB sample is collected.
If a minimum of three seasons' worth of historical
data is not readily available, then your Beach Team
might need to collect more data before developing
the model. Step 2 provides more information on data
collection.
Are There Funding and Other Resources Available to Develop, Operate, Maintain, and Update a Predictive Tool?
The development of a predictive model is just the first
phase of an overall predictive modeling program.
Once the model is developed, there are a variety of
costs associated with operating and maintaining it.
The majority of local agencies responsible for beach
programs (usually city or county public health
departments) have limited staff time, technical expertise,
and funding available for projects. Consequently,
resources and costs must be carefully planned and
budgeted. Major costs to consider include:
• Personnel and technical experts.
• Data collection.
• Monitoring equipment and supplies.
• Modeling and statistical software.
• Model evaluation over time.
Personnel and Technical Experts
Your Beach Team needs to have the right combination of staff to develop
and implement a predictive tool. The most important staff will be the
following:
• Field staff—to conduct sampling and maintain equipment.
• Modeler/statistician—to analyze data and develop, validate, and refine
the model.
• Beach manager—to integrate the model into your beach program and
conduct public outreach.
It can be helpful to collaborate with others, such as universities, federal
agencies, and state or local governments (see Collaboration with Others
text box). They can be excellent resources, especially when the technical
knowledge of a statistician is required.
Data Collection
In addition to gathering historical data, the team will need to continue to
collect data from the same sources to run the model once it is implemented.
Data collection is discussed in more detail in step 2. If the data source changes
for any of the model's variables, or if significant alterations occur to the beach
and surrounding area, the model will need to be recalibrated (see step 6).
Collaboration with Others
Many partnerships have successfully developed modeling programs. For example, USGS has played a major role
in modeling efforts in the Great Lakes region. It offers extensive resources, expertise, and comprehensive
knowledge of watersheds (including beaches) and can provide in-depth statistical tools and statisticians to run
them. Local universities can be another highly valuable resource. Graduate students from ecology, biology, and
environmental science and engineering departments might be available to assist with water quality monitoring,
sampling, and even model development. A mutually beneficial partnership might develop: students have the
opportunity to apply their research to a real-world scenario, and the beach program gains low-cost sampling
and monitoring. In addition, universities often have their own monitoring equipment, laboratories, and even
statistical software that can be shared.
Some beach programs have models that began as part of graduate theses and dissertations. For example,
SCDHEC developed its model with the help of a graduate student at the University of South Carolina who
used it as part of a master's thesis. CRWA's predictive model was also developed as part of a master's
thesis by a student at Tufts University. These collaborative efforts proved to be highly advantageous,
providing a wealth of knowledge and expertise, as well as significant cost savings.
Monitoring Equipment and Supplies
Even when there is an abundance of data for your beach from external
sources, use of monitoring equipment such as data sondes, flow meters, and
rain gauges might provide you with more accurate data. The main drawback
of using this equipment is that it can be expensive to purchase and maintain,
especially when it must be placed in harsh environments and exposed to
weather, waves, sand, and vandalism. Sufficient funding resources as well as
staff to maintain and repair equipment are necessary. As described in the
case studies, both MHD and SCDHEC stopped using data sondes and rain
gauges because of their high maintenance costs, but were able to develop
successful models using other data sources.
Modeling and Statistical Software
Some models are simple enough to run in a basic Excel spreadsheet, with
no additional software required. Statistical software can be purchased,
but it might have licensing costs. EPA developed VB, a free model builder
software program (described in more detail in steps 3 and 4), that enables
beach managers and others to develop or update models using statistical
techniques. The software is user-friendly; however, preparing data for input
into the modeling software requires considerable time and expertise. Visit
http://www.epa.gov/exposure-assessment-models/virtual-beach-vb for more
information about VB.
For information on how to manage your data set for use in VB, see step 3. If
the answers to all four questions asked in the introduction to this step are
"yes," you are in a good position to move forward with the development of
a predictive model. If you determine that a predictive model is not needed
or that your beach is not a good candidate for a modeling project, consider
working with your public health officials to alter the current monitoring
program so that it focuses on times when conditions favor high FIB
densities. If your answer to question c is "no," you might need to collect
additional data to build a historical database for future model development.
If your answer to question d is "no," consider collaborating with others
interested in model development (see the "Collaboration with Others" text
box). If that option is not available, there are other ways to increase the
level of public health protection at beaches, including sanitary surveys and
preemptive advisories.
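The decision logic in the paragraph above can be restated as a small function that maps yes/no answers to questions a through d onto the suggested next steps. The function and parameter names are hypothetical, and the code is only a summary of the guidance, not a substitute for your Beach Team's judgment.

```python
def next_steps(need: bool, compatible: bool, enough_data: bool, resources: bool) -> list[str]:
    """Map yes/no answers to Step 1 questions a-d onto suggested next steps.

    need        -> question a: is there a need for a predictive tool?
    compatible  -> question b: are beach characteristics compatible?
    enough_data -> question c: are there sufficient historical data?
    resources   -> question d: are funding and personnel available?
    """
    if not (need and compatible):
        # A model is not needed, or the beach is a poor modeling candidate.
        return ["Work with public health officials to focus monitoring on "
                "times when conditions favor high FIB densities."]
    steps = []
    if not enough_data:
        steps.append("Collect additional data to build a historical database.")
    if not resources:
        steps.append("Explore collaborating with others; otherwise consider "
                     "sanitary surveys and preemptive advisories.")
    # All four answers are "yes" only when no remedial steps were added.
    return steps or ["Proceed with model development (step 2)."]


if __name__ == "__main__":
    for answers in [(True, True, True, True), (True, True, False, True), (False, True, True, True)]:
        print(answers, next_steps(*answers))
```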
Model Evaluation Over Time
Once your model has been developed, it must be maintained to keep it
running properly and performing as expected. This process is covered in
step 6.
Step 2: Identify Variables and Collect Data
Introduction to Step 2
After your Beach Team has determined that a predictive model is
appropriate for a beach, it can proceed to Step 2: identifying candidate
independent variables for use in model building and collecting a set of
high-quality historical data for those variables and FIB density. A list of
independent variables is provided in the Independent Variables section of this step.
To predict an exceedance of a water quality standard, your Beach Team
must first identify environmental conditions that likely affect the levels of
bacteria at the beach. In the context of predictive modeling, those conditions
are the "independent variables." In this step you are trying to identify and
collect data for the independent variables that exhibit the strongest statistical
relationship with the dependent variable, FIB density. It is important to
keep in mind, however, that a strong statistical association should not be
interpreted as reflecting actual causative mechanisms for an observed
elevation of FIB densities. The association is based only on the correlation
of past observations of independent variables with FIB density. When such
associations are evident, further scientific investigation can inform beach
managers of the nature of the association and improve their understanding
of future occurrences and how much weight to give them.
Key Attributes of Variable Data Sets
For model-building, the variable data sets should possess the following basic
characteristics.
An adequate amount of data to develop (training dataset) and validate
(testing dataset) the model. EPA recommends at least 50 observations
for model development, but 100 or more are preferred. There are several
ways of partitioning the available data into training and testing datasets.
One common way is to collect three seasons of data, then designate two
seasons as the training dataset and the third as the testing dataset. You
can construct your model with fewer data, but its accuracy, and therefore
its performance, might suffer.
High-quality data, including quality assurance documentation. Ideally,
a quality assurance plan exists that describes data collection methods,
protocols, and procedures. Given the particular variable, the plan might
include laboratory methods; field sampling protocols, including metadata
(e.g., sampling time and depth of sample); and data processing procedures.
-------
Easily collected or obtained data. Because predictive models are often
run daily, all input data must be obtained quickly. Automatic samplers
with data transmission capabilities and data that are easily downloaded
from government agencies' websites (e.g., NWS and USGS) represent
good data sources. The "ease of collection" will likely eliminate many
potentially good candidate variables from consideration. In some
cases, a more easily collected surrogate variable might convey similar
information; in other cases, your Beach Team will have to abandon
the variable and look elsewhere. In general, data collected locally are
preferred over data obtained from external sources. If data from external
sources are used, it is preferable if the collection methods are subject to
good QA/QC (e.g., USGS or NWS data).
Consistent procedures for collecting data for pre- and post-model
development. Independent variable data have two functions: (1) they
are used to develop the predictive model, and (2) they are used as input
variables to run the model. When you use historical data to build a
model, you assume that the methods used to collect and report those data
will remain in place for future model input data collection. Consistency
is key. You cannot mix and match data sources for the same variables
without re-validating the model.
Temporally relevant independent variable and FIB data. FIB sampling
at swimming beaches usually occurs early in the morning. Some
independent variable data are likely collected at the time of sampling.
Other data might relate to conditions that occurred prior to the sample
collection time, such as cumulative rainfall over the previous 12 hours.
As you determine your independent variables, you must keep them
temporally relevant to the sample time. If the sample was taken at 8:00 a.m.,
you need to ensure that your independent variables also reflect conditions
at 8:00 a.m. or, where knowledge of stream effects and runoff warrants it, at
an earlier time.
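Two of the data-set characteristics above, temporal relevance and a training/testing split, can be sketched in a few lines. This is a minimal illustration, not part of VB: the function names and the "season" field are hypothetical. The first function pairs an FIB sample with the most recent independent-variable reading taken at or before the sample time, so no reading from after the sample leaks into the model. The second holds out one season as the testing dataset and warns if fewer than the recommended 50 observations remain for training.

```python
from bisect import bisect_right
from datetime import datetime

# EPA recommends at least 50 observations for model development.
MIN_TRAINING_OBS = 50


def value_at_or_before(times, values, sample_time):
    """Return the latest reading taken at or before sample_time.

    `times` must be sorted in ascending order. Readings taken after the FIB
    sample are never returned, so the variable stays temporally relevant.
    """
    i = bisect_right(times, sample_time)
    if i == 0:
        return None  # no reading precedes the sample
    return values[i - 1]


def split_by_season(records, test_season):
    """Hold out one season as the testing set; train on the others.

    `records` is a list of dicts with a hypothetical "season" key.
    """
    training = [r for r in records if r["season"] != test_season]
    testing = [r for r in records if r["season"] == test_season]
    if len(training) < MIN_TRAINING_OBS:
        print(f"Warning: only {len(training)} training observations")
    return training, testing


# Example: an 8:00 a.m. sample is paired with the 7:00 a.m. reading,
# not the 9:00 a.m. one.
times = [datetime(2015, 7, 1, h) for h in (6, 7, 9)]
water_temp = [20.5, 21.0, 22.4]
paired = value_at_or_before(times, water_temp, datetime(2015, 7, 1, 8))  # 21.0
```

Holding out a whole season, rather than random days, keeps the testing data independent of the weather patterns the model was trained on.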
Quality Assurance and Quality Control
EPA's National Beach Guidance (2014) provides important information and
recommendations concerning primary data collection to ensure that all
observations, samples, and measurements are properly and consistently
collected and processed. Specifically, the Agency recommends that a quality
assurance project plan (QAPP) be developed to ensure that collected data are
complete, accurate, and suitable for the intended purpose. Essentially, the
QAPP serves as a blueprint for collection activities and quality assurance
(QA) and quality control (QC) procedures. Also included in the plan should
be detailed descriptions of standard operating procedures and staff training
requirements.
-------
Some Basic Bacteria Facts
• FIB are very small, immobile single-celled organisms. They have to be
physically transported from point to point by some mechanism. Usually this
mechanism is moving water.
• Life expectancy of individual cells outside their natural environment is
usually short, around 2-5 days. Many stressors can shorten it further.
• FIB can survive and even multiply for some time in sediments and algal
mats. They can be easily stirred up and resuspended in overlying waters.
FIB Density
It is important that FIB density measurements are taken at a consistent
location, depth, and time. Sample collection, handling procedures, and
analytical methods must be consistent as well. This section lists EPA
recommendations concerning FIB sampling and analysis. They are presented
not to encourage immediate changes to data collection procedures (which
would disrupt consistency), but as background to allow better interpretation
of FIB density data that have already been collected.
• Sample Location. Sample sites should be located where the greatest
recreational use occurs. Features that might directly affect the
movement of FIB to and from the beach, such as outfalls and jetties,
should also be taken into account.
• Sample Depth. Samples should generally be taken in approximately
knee- to waist-deep water unless that depth poses a safety risk to the
sampler (e.g., powerful waves). The sample should be drawn 0.5-1 foot
below the surface. Samples taken from shallower waters might not
accurately represent ambient FIB density due to the resuspension of
bacteria from sediments.
• Sample Time. Samples taken early in the morning are generally
considered the best for beach monitoring programs because that is
the time when FIB densities are usually the highest. The sampling
time should be consistent day to day because FIB density can change
fairly quickly in response to increasing sunlight intensity, temperature,
and other environmental conditions. EPA's National Beach Guidance
includes a detailed discussion on event-scale, diurnal, and tidal
variability (USEPA 2014).
• Sample Frequency. For the purpose of developing a predictive model,
the more samples the better. Most beach programs sample high-priority
beaches at least once a week during the swimming season. In general,
a model will be increasingly robust as more FIB data are collected and
matched with independent variable data. The Report of the Experts
Scientific Workshop on Critical Research Needs for the Development of
New or Revised Recreational Water Quality Criteria recommends collecting
data four or five times a week, covering a variety of sampling events, to
capture temporal variability; where resource constraints limit FIB sampling
to once or twice a week, predictive models can provide information for
timely notifications on the other days (USEPA 2007).
-------
• Sample Processing and Analysis. Your Beach Team should consult the
state for the proper procedures and QA/QC requirements, including
holding times, for collecting, handling, and analyzing water samples.
EPA has approved a number of analytical methods for culture analyses
of recreational waters (40 CFR part 136). In addition, EPA has validated
quantitative polymerase chain reaction (qPCR) methods for measuring
water quality at beaches (see
http://www.epa.gov/cwa-methods/other-clean-water-act-test-methods-microbiological).
Standard Methods for the Examination of Water and Wastewater (APHA 1998) is also a
source of valuable information.
FIB density data are usually reported as colony forming units (CFUs) or
most probable number (MPN). CFU is a measurement based on a direct
count of bacteria colonies grown on Petri plates and substrate media from
water samples passed through membrane filters. MPN tests involve multiple
tubes that are allowed to ferment over time. Probability formulas are applied
to the number of tubes that produce a positive reaction, and a FIB density
estimate is calculated. EPA has approved methods for both types of analyses,
and either is acceptable to use in modeling. The key, however, is consistency.
If the type of analysis has changed for your beach, construct (or reconstruct)
your model using only data generated using the current analytical method.
Sources of Bacteria
Human Sources
Some older cities have combined sewer systems that convey both sanitary sewer wastewater and
stormwater in one piping system. During periods of significant rainfall, the capacity of the combined sewer
might be exceeded. When this happens, the excess mixture of sanitary wastewater and stormwater is
discharged at combined sewer overflow (CSO) points, typically to rivers and streams. During dry weather
periods, human-derived bacteria usually cause a problem at beaches only if septic systems in the area fail
or wastewater pipes are compromised or illegally connected to storm drains.
Animal Sources
In urban and suburban landscapes, animal-derived bacteria and other pollutants tend to collect on
impervious surfaces. Sources typically include dogs and cats; waterfowl such as geese, gulls, and ducks; and
scavenger species such as raccoons, rats, and pigeons. At the beginning of a storm, the initial runoff
flow will sweep up most of the deposited fecal matter and quickly carry it into the drainage network. Known
as the "first flush" phenomenon, this flow typically has significantly higher concentrations of bacteria than
subsequent flows that occur as the storm lingers. In general, the amount of first flush pollutants available for
transport is a function of the number of dry days since the previous storm. Animal-derived bacteria can also
be transported from feedlots, barnyards, and other confined-animal facilities located in the drainage area.
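Because the first-flush load depends on the number of dry days since the previous storm, "antecedent dry days" is a common candidate variable. The sketch below counts consecutive dry days before a sample date from a daily rainfall record. It is a minimal illustration under stated assumptions: the 0.01-inch cutoff for a "dry" day and the function name are hypothetical, and a team should pick a cutoff that suits its rain gauge.

```python
from datetime import date, timedelta

# Hypothetical cutoff: a day with less than 0.01 inch of rain counts as dry.
DRY_DAY_CUTOFF_IN = 0.01


def antecedent_dry_days(daily_rain, sample_date, cutoff=DRY_DAY_CUTOFF_IN):
    """Count consecutive dry days immediately before sample_date.

    `daily_rain` maps datetime.date -> daily rainfall total (inches).
    Counting stops at the first wet day or the first day with no record,
    so gaps in the rainfall record shorten rather than lengthen the count.
    """
    count = 0
    day = sample_date - timedelta(days=1)
    while day in daily_rain and daily_rain[day] < cutoff:
        count += 1
        day -= timedelta(days=1)
    return count


rain = {date(2015, 7, 1): 0.5, date(2015, 7, 2): 0.0, date(2015, 7, 3): 0.0}
dry = antecedent_dry_days(rain, date(2015, 7, 4))  # 2 dry days (July 2 and 3)
```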
qPCR
Newer analytical technologies have accelerated the timeliness of laboratory
results. One such method is quantitative polymerase chain reaction (qPCR),
which quantifies a targeted genetic sequence for both viable and nonviable
forms of the indicator bacteria. Because the method does not require
culturing live bacteria, analysis can be completed in less time, within
2-4 hours of receipt of the sample by the laboratory. Although both qPCR and
bacteria culture methods report FIB density, they are derived using
significantly different methods. These results should not be combined when
building and operating a predictive model.
-------
Common Parameters Used in Models
• Parameters relating to sources of FIB at the beach
- Beach attendance
- Bather counts
- Dog counts
- Bird counts
• Parameters relating to movement of FIB through the drainage area
- Cumulative rainfall
- Antecedent dry days
- Stream discharge
- Stream stage
• Parameters relating to movement of FIB in receiving waters
- Current speed
- Current direction
- Current A- and O-components (created by VB)
- Wind speed
- Wind direction
- Wind A- and O-components (created by VB)
- Water level
- Barometric pressure
• Parameters relating to the fate of FIB at the beach
- Solar irradiance
- Air temperature
- Water temperature
- Cloud cover
- Dew point
- Day of year (ordered number)
- Turbidity
- Conductivity
- Wave height
- Wave direction
- Wave A- and O-components
- Chlorophyll
- Dissolved oxygen
Independent Variables
Independent variables are tied, directly or indirectly, to environmental
conditions. To aid in choosing the best candidate variables, your Beach Team
should become familiar with the likely sources of bacteria that affect the
beach, how they are transported to the beach area, and conditions that tend to
increase or decrease FIB density in the swimming area. A useful way to collect
this information is by using a sanitary survey (see Data Sources text box on
page 21 for more details on sanitary surveys). That information can serve as a
valuable starting point for selecting candidate independent variables.
Independent variables can be roughly categorized into one of four groups:
• Variables relating to bacteria movement through the drainage area.
• Variables relating to bacteria movement through the receiving water.
• Variables relating to the fate of bacteria in the swimming area.
• Variables relating to activities and conditions at the beach.
Other good sources of guidance on selecting variables include:
• Predictive Tools for Beach Notification. Volume I, Review and Technical
Protocol (USEPA 2010a).
• Predictive Modeling at Beaches. Volume II, Predictive Tools for Beach
Notification (USEPA 2010b).
• Procedures for Developing Models to Predict Exceedances of Recreational
Water Quality Standards at Coastal Beaches: U.S. Geological Survey
Techniques and Methods 6-B5 (Francy and Darner 2006).
Variables Relating to Bacteria Movement through the Drainage Area
The amount, intensity, and duration of a rain event determine the timing
and amount of runoff and the extent of water movement in the drainage
area. Since runoff functions as the primary transport mechanism for both
human- and animal-derived bacteria, rainfall is usually identified as a very
important independent variable for FIB predictive modeling.
Your Beach Team's analysis of the drainage network and the potential
sources of bacteria within the network should help identify the specific types
of rainfall statistics that might be considered for use in the predictive model.
The most common choice is cumulative rainfall over a specific time period
prior to the FIB sample time (e.g., 6-hour, 24-hour, 48-hour lag).
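Cumulative rainfall windows like those above can be built from an hourly rain-gauge record. The following is a minimal sketch, assuming each hourly timestamp marks the end of its one-hour accumulation period (check how your gauge or data provider stamps its records); the function names and the "rain_6h" style labels are hypothetical. Readings after the sample time are excluded so the variable stays temporally relevant.

```python
from datetime import datetime, timedelta


def cumulative_rainfall(hourly, sample_time, hours):
    """Total rainfall in the `hours` hours ending at sample_time.

    `hourly` is a list of (timestamp, rainfall) pairs; each timestamp is
    assumed to mark the END of its one-hour accumulation period. Readings
    after the sample time are excluded.
    """
    start = sample_time - timedelta(hours=hours)
    return sum(rain for t, rain in hourly if start < t <= sample_time)


def rainfall_windows(hourly, sample_time, windows=(6, 24, 48)):
    """One candidate variable per look-back window, e.g. 'rain_24h'."""
    return {f"rain_{h}h": cumulative_rainfall(hourly, sample_time, h)
            for h in windows}


# Example with rainfall in millimeters for an 8:00 a.m. sample.
sample_time = datetime(2015, 7, 2, 8)
hourly = [(datetime(2015, 7, 2, 7), 2.0), (datetime(2015, 7, 1, 20), 5.0)]
windows = rainfall_windows(hourly, sample_time)
# windows == {'rain_6h': 2.0, 'rain_24h': 7.0, 'rain_48h': 7.0}
```

Generating several windows at once lets EDA compare them side by side before choosing one for the model.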
-------
Further analysis might lead you to create an even better variable by assigning
more importance or "weight" to segments within a chosen time range. For
example, Francy and Darner (2006) created a weighted 3-day rainfall statistic
by assigning the most weight to the rainfall total occurring in the 24 hours
immediately prior to the sample time and progressively lesser weights to the
amounts occurring one and two days before sampling (see equation below).
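$$R_w = w_1 R_{Day1} + w_2 R_{Day2} + w_3 R_{Day3}, \qquad w_1 > w_2 > w_3$$
where:
$R_w$ = weighted cumulative rainfall variable
$R_{Day1}$ = 24-hour total rainfall, 0-hour lag
$R_{Day2}$ = 24-hour total rainfall, 24-hour lag
$R_{Day3}$ = 24-hour total rainfall, 48-hour lag
$w_1, w_2, w_3$ = weights that decrease with lag (values as given in Francy and Darner 2006)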
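The weighted 3-day rainfall statistic described above can be computed as a weighted sum of the three daily totals. This is a minimal sketch: the weights are passed in by the user, and the example weights (1.0, 0.5, 0.25) are purely illustrative and hypothetical, not the values Francy and Darner used. The function rejects weights that increase with lag, because the statistic is meant to emphasize the most recent day.

```python
def weighted_rainfall(daily_totals, weights):
    """Weighted cumulative rainfall R_w.

    `daily_totals`: (R_Day1, R_Day2, R_Day3), the 24-hour totals with 0-,
    24-, and 48-hour lags, most recent first.
    `weights`: one weight per day, largest first. Values are supplied by
    the user; this sketch does not assume any published weights.
    """
    if len(daily_totals) != len(weights):
        raise ValueError("need one weight per daily total")
    if any(a < b for a, b in zip(weights, weights[1:])):
        raise ValueError("weights should not increase for older days")
    return sum(r * w for r, w in zip(daily_totals, weights))


# Illustrative (hypothetical) weights only: 1.0*1.0 + 2.0*0.5 + 4.0*0.25
rw = weighted_rainfall((1.0, 2.0, 4.0), (1.0, 0.5, 0.25))  # 3.0
```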
Rainfall data can be collected locally using a rain gauge, or they can be
obtained from an external source such as the NWS. Some water and
wastewater utilities operate rain gauges near beaches and might be good
sources of data. Locally collected data might correlate better with actual
conditions at the beach site; however, operating and maintaining rain gauges
can be challenging. Data available on the Internet can be easy to download
and use, but might not adequately characterize local conditions.
Since drainage flow is a direct result of rainfall, if one or more streams in
the drainage network have monitoring gauges in place that provide daily
or hourly measurements of discharge or the height of the water surface
(i.e., stage or gauge height), those data might also prove to be valuable
as independent variables.
Variables Relating to Bacteria Movement through the Receiving Water
The endpoints of the drainage networks are typically mouths of streams or
drainage outfall structures that discharge into a lake, river, estuary, or
ocean (the "receiving waters"). When outfalls are not located directly on
the beach, bacteria contained in the discharge must be transported from the
outfall, through the receiving water, and to the beach to cause unhealthy
conditions for swimmers. In addition to the lateral movement of bacteria
from outfall to beach, bacteria residing in sand or sediments can move
vertically into the water column when the sand or sediment is stirred up.
Wind, waves, and water currents are usually the three most important
independent variables associated with the movement of bacteria through the
receiving water. They all can be characterized by direction and magnitude
measurements. In general, "continuous variables" (those with numeric
values) are preferred over "categorical variables" (those with labels as values),
but both forms have been successfully used in predictive models.
Routine sanitary surveys tend to collect either (1) very simple discrete
measurements (continuous variable) of wind speed and direction, current
speed and direction, and wave height, or (2) categorical descriptions of wind
and wave attributes. The Beaufort Wind Scale, developed in 1805 by Sir
Francis Beaufort, U.K. Royal Navy, is an example of a categorical approach
to measuring wind and waves (Table 1).
Table 1. Beaufort Wind Scale
Wind (knots) | Classification | On the Water
Less than 1 | Calm | Sea surface smooth and mirror-like
1-3 | Light Air | Scaly ripples, no foam crests
4-6 | Light Breeze | Small wavelets, crests glassy, no breaking
7-10 | Gentle Breeze | Large wavelets, crests begin to break, scattered whitecaps
11-16 | Moderate Breeze | Small waves 1-4 ft, becoming longer, numerous whitecaps
17-21 | Fresh Breeze | Moderate waves 4-8 ft taking longer form, many whitecaps, some spray
22-27 | Strong Breeze | Larger waves 8-13 ft, whitecaps common, more spray
28-33 | Near Gale | Sea heaps up, waves 13-19 ft, white foam streaks off breakers
34-40 | Gale | Moderately high (18-25 ft) waves of greater length, edges of crests begin to break into spindrift, foam blown in streaks
41-47 | Strong Gale | High waves (23-32 ft), sea begins to roll, dense streaks of foam, spray may reduce visibility
48-55 | Storm | Very high waves (29-41 ft) with overhanging crests, sea white with densely blown foam, heavy rolling, lowered visibility
56-63 | Violent Storm | Exceptionally high (37-52 ft) waves, foam patches cover sea, visibility more reduced
64+ | Hurricane | Air filled with foam, waves over 45 ft, sea completely white with driving spray, visibility greatly reduced
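The relationship between a continuous wind measurement and the categorical classes in Table 1 can be made explicit with a lookup on the lower bound of each speed range. This is a minimal sketch; the function name is hypothetical, and the rule that fractional speeds falling between the table's integer ranges (e.g., 3.5 knots) go to the lower class is an assumption. The force numbers 0-12 are the standard Beaufort numbers for the rows of Table 1.

```python
from bisect import bisect_right

# Lower bound (knots) of Beaufort forces 1-12, taken from Table 1.
_LOWER_BOUNDS = [1, 4, 7, 11, 17, 22, 28, 34, 41, 48, 56, 64]
_CLASSES = ["Calm", "Light Air", "Light Breeze", "Gentle Breeze",
            "Moderate Breeze", "Fresh Breeze", "Strong Breeze", "Near Gale",
            "Gale", "Strong Gale", "Storm", "Violent Storm", "Hurricane"]


def beaufort(knots):
    """Return (force number, classification) for a wind speed in knots.

    Fractional speeds between the table's integer ranges are assigned to
    the lower class; that rounding rule is an assumption.
    """
    force = bisect_right(_LOWER_BOUNDS, knots)
    return force, _CLASSES[force]


force, label = beaufort(12)  # (4, 'Moderate Breeze')
```

Converting in this direction loses information, which is one reason continuous variables are generally preferred when they are available.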
-------
Automated collection of wind, wave, and current data offers several
advantages over manual collection because (1) the data are more easily obtained,
(2) it eliminates the subjectivity associated with manual measurements, and
(3) assuming data are recorded continuously, it allows for the construction
of antecedent variables (e.g., average wind speed over the previous 24 hours).
The most convenient source of wind, wave, and current data is the National
Data Buoy Center. This agency is part of the NWS and maintains a network
of 90 buoys and 60 coastal stations that collect hourly data on wind speed
and direction and wave height. Some also collect data on currents.
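Continuously recorded buoy or station data make antecedent variables such as "average wind speed over the previous 24 hours" straightforward to compute. The sketch below is a minimal illustration; the function name and the 75% minimum-coverage default are hypothetical choices, not EPA guidance. It returns nothing when too many hourly readings are missing, rather than averaging a handful of values. A plain arithmetic mean suits speeds but not directions, which wrap around at 360 degrees.

```python
from datetime import datetime, timedelta
from statistics import fmean


def antecedent_mean(hourly, sample_time, hours=24, min_fraction=0.75):
    """Mean of hourly readings in the `hours` hours ending at sample_time.

    `hourly` maps timestamp -> reading (e.g. wind speed). Returns None when
    fewer than `min_fraction` of the expected readings are present; the 75%
    default is a hypothetical choice, not EPA guidance.
    Not suitable for wind or current DIRECTION, which wraps at 360 degrees.
    """
    start = sample_time - timedelta(hours=hours)
    window = [v for t, v in hourly.items() if start < t <= sample_time]
    if len(window) < min_fraction * hours:
        return None
    return fmean(window)


sample_time = datetime(2015, 7, 2, 8)
wind = {sample_time - timedelta(hours=k): 10.0 for k in range(24)}
avg_wind_24h = antecedent_mean(wind, sample_time)  # 10.0
```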
Tides also create currents that can affect FIB density in beach areas.
Incoming tides tend to keep FIB in residence at some beaches, while outgoing
tides can flush them away. Although very site-specific, the tidal cycle
might be an important independent variable at some ocean beaches.
Man-made structures such as jetties, groins, piers, breakwaters, and seawalls
can affect FIB movement through the receiving water at some beaches. Those
structures can enclose most or part of a beach, preventing water circulation
between the beach and open water. Several studies have reported higher
densities of FIB in those situations because the lack of flushing retains
bacteria.
Variables Relating to the Fate of Bacteria in the Swimming Area
Bacteria residing in the receiving water, including in the swimming area,
are subject to many conditions that can increase or decrease their presence
in the water column. One of the more important stressors of bacteria is
sunlight, specifically ultraviolet (UV) light. Exposure causes bacteria to
die off, which is why FIB densities are usually found to be greater in the
early morning before the sun rises higher in the sky.
-------
Waterfowl as a Pollution Source
Gulls and other waterfowl are often a source of fecal contamination at
beaches, particularly in the Great Lakes. Hansen et al. (2011) concluded that
waterfowl, including Canada geese, ring-billed gulls, and mallard ducks, were
the primary source of E. coli contamination at beaches near Duluth,
Minnesota, and Superior, Wisconsin. Chicago and Racine have also correlated
gull populations at their beaches with FIB densities in beach water samples
(Converse et al. 2012; Whitman and Nevers 2003; Hartmann et al. 2013).
Chicago has reduced the numbers of gulls at its beaches by managing their
nests.
Turbidity is a common measurement and often found to be an important
independent variable for predictive modeling. It is essentially the cloudiness
of the water as defined by a measurement of scattered light. Turbidity is
generally caused by a combination of suspended solids, colloidal matter,
and algae. Cloud cover also affects the amount of light penetrating the water
column and is sometimes used as an inverse surrogate for UV light. Staining
of the water by tannins also affects light penetration.
Light is not the only reason turbidity is valuable as an independent
variable. Perhaps even more importantly, stormwater
runoff carries with it a load of suspended solids, silt, and other material.
Consequently, outfall discharge during and following storms is usually more
turbid than the receiving water. Thus, turbid water moves in tandem with
outfall bacteria. Other parameters associated with stormwater runoff—such
as total suspended solids, salinity, and conductivity—can also serve as
independent variables.
Suspended solids can play a role in removing FIB from the water column
via sedimentation. Individual bacteria cells are very small (some are only a
micron in length) and easily remain suspended in water. But they can also
be adsorbed on sediment particles and, in doing so, increase their weight
and their chances of settling to the bottom. Once in the sediments, however,
they can remain viable and be resuspended in the water column by any
number of turbulent forces, including waves and even swimmer activity.
Variables Relating to Activities and Conditions at the Beach
The variables described in the previous subsection are related to sources of
FIB that (1) originate in the drainage area and are subsequently transported
-------
through the drainage network and receiving waters to the beach, and
(2) have settled from the water column to the sediments. At some beaches,
however, significant sources of bacteria found in or immediately adjacent
to beaches can cause high FIB densities within the swimming area.
For example, resident populations of gulls and Canada geese have been
identified as important contributors to bacteria loads at some beaches.
Table 2 includes a list of the independent variables included in the final
models for the five case studies. The variables were useful in making timely
beach notification decisions. Models must be developed on a beach-specific
basis using site-specific data, as shown by the variety of independent
variables used in the case studies as well as the number of variables used in
similar models (Francy et al. 2013b).
Table 2. Independent variables used in final statistical models from case studies.
Location | Beaches | Independent Variables Used in Final Model
Chicago Parks District | Montrose Beach, Oak Street Beach, Foster Beach, 63rd Street Beach, and Calumet Beach | 6-hour rainfall, 4-hour wave period, 6-hour solar radiation, 48-hour rainfall, 6-hour longshore wind, onshore wind, turbidity
Charles River Watershed Association | Lower Charles River Basin from Watertown to Boston Harbor | Rainfall volume, river flow, and wind
Milwaukee, Wisconsin | Bradford Beach, McKinley Beach, and South Shore Beach | 24-hour rainfall, previous 24-hour E. coli sampling, pH, conductivity, wave height, water temperature
Horry County, South Carolina | Grand Strand beaches | Cumulative rainfall, rainfall intensity, preceding dry days, weather (e.g., wind speed), tides and lunar phase data, current and salinity
Racine, Wisconsin | North Beach, Zoo Beach | Water temperature, air temperature, seagull counts, dog counts, wildlife counts, wave height and intensity, water clarity, sky conditions, color changes, odor, algae amount, algae type, bather load (in, out, and total), longshore current and components, wind direction and speed, stream discharge, pollution discharge, rainfall (24-, 48-, and 72-hour), day of year, season, lake levels, and previous day's E. coli values
Sand and Algal Mats
Sand in the wave-washed zone of a beach can be a potential source of fecal
contamination (Alm et al. 2003). Beach sand can support large densities of
FIB for prolonged periods, independent of lake, human, or animal input
(Whitman et al. 2014).
Other research has examined the presence of FIB in algal mats along beaches.
Whitman et al. (2003) found that Cladophora can provide a secondary habitat
for FIB that could potentially impact water quality in affected Great Lakes
swimming areas.
-------
Data Sources
National Oceanic and Atmospheric Administration (NOAA)/NWS Weather Station Data
• NWS airport weather data (e.g., rainfall, temperature, cloud cover, wind speed and
direction) are frequently available and easily downloaded.
• NOAA maintains a network of buoys, tidal stations, and satellite measurements that
provide data on tides, currents, wind, cloud cover, and other marine characteristics
(http://tidesandcurrents.noaa.gov).
• Additional water quality data are available from NWS (e.g., forecast maps, radar,
river/lake levels, rainfall, air quality, and past weather) (http://www.weather.gov).
U.S. Geological Survey (USGS)
• USGS provides continuous real-time water quality data, including streamflow,
water temperature, conductivity, pH, dissolved oxygen, turbidity, and runoff
(http://water.usgs.gov/data).
• USGS supports the National Water Information System (NWIS), which includes data
from more than 1.5 million sites, some in operation for more than 100 years
(http://waterdata.usgs.gov/nwis).
Sanitary Surveys
Sanitary surveys are an excellent source of information on site characteristics that
can support the development of predictive models. The surveys provide detailed
environmental data, including the following observational variables that could be
translated into predictive variables for a model:
• Number of swimmers/bathers.
• Boat traffic.
• Wildlife and domestic animals.
• Debris and litter.
• Presence of algae.
• Infrastructure (e.g., parking lots, storm drains, WWTPs).
EPA has developed beach sanitary survey tools—one each for marine and Great Lakes
beaches—to help beach managers evaluate all contributing beach and watershed
information, including water quality data, pollution source data, and land use data
(http://www.epa.gov/beach-tech/beach-sanitary-surveys).
-------
Step 3: Perform Exploratory Data Analysis
After selecting and collecting high-quality FIB data and independent
variable data sets, your Beach Team is ready to proceed to exploratory data
analysis (EDA).
Introduction to Step 3
The primary purpose of EDA is to explore the relationships between the
independent and FIB density variables and identify the best candidate
variables for model development. Another purpose is to assess two
fundamental assumptions of the statistical models described in this
guidance: (1) the data sets represent the normal range of conditions that are
expected in the future, and (2) the FIB density and independent variables
are linearly related. Your Beach Team should consider working with a
statistician who can provide statistical expertise during EDA.
The EDA work is valuable because it adds to your Beach Team's depth of
knowledge about relationships between FIB density and the various drainage
area, receiving water, and fate independent variables. This knowledge is
crucial for integrating predictive modeling into an overall beach program.
The purpose of this section is to provide an overview of the approach to
exploratory analysis. It does not attempt to provide a thorough discussion
of techniques or evaluations. Further information can be found at
http://www3.epa.gov/caddis/da_exploratory_0.html.
-------
Virtual Beach Software
Your Beach Team will need to use specialized computer software for many of
the data processing and EDA tasks described in this step, as well as for model
development and testing activities described in the next step. EPA's Virtual
Beach (VB) software package is specifically designed for constructing
site-specific FIB prediction models at freshwater and marine beaches. Created
for use by beach managers and researchers, VB includes a variety of EDA
techniques, including the basic ones described in this section.
Although many free and proprietary statistical packages that include EDA
programs are available, VB allows predictive beach modelers to seamlessly
integrate all the necessary components for preparing and analyzing data
and building and testing models. VB also includes an integrated mapping
component to determine the geographic orientation of the beach and helps
the user decompose wind/current speed and direction into alongshore and
onshore/offshore components. Your Beach Team should ensure it has staff
with appropriate skills, because VB does not replace the need to work with
someone knowledgeable in data management and analysis.
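The alongshore and onshore/offshore decomposition that VB performs can be sketched as simple trigonometry relative to the beach orientation. This is a minimal illustration under stated assumptions, not VB's implementation: the function name is hypothetical, direction is taken in the meteorological convention (the direction the wind blows from), the beach orientation is given as the compass bearing pointing from the shore out to open water, and the sign of the alongshore component is an arbitrary choice that may differ from VB's. Current direction is often reported as the direction the water flows toward, so it would need a 180-degree shift before using this function.

```python
import math


def alongshore_onshore(speed, direction_deg, beach_normal_deg):
    """Split a wind vector into alongshore (A) and onshore (O) components.

    speed: magnitude; direction_deg: compass direction the wind blows FROM.
    beach_normal_deg: compass bearing pointing from the beach to open water.
    Returns (A, O). O > 0 means flow toward the beach; the sign convention
    for A is an assumption and may differ from VB's.
    """
    theta = math.radians(direction_deg - beach_normal_deg)
    onshore = speed * math.cos(theta)
    alongshore = speed * math.sin(theta)
    return alongshore, onshore


# A beach facing east (open water at bearing 90 degrees):
a_east, o_east = alongshore_onshore(10.0, 90.0, 90.0)  # wind off the lake: onshore
a_west, o_west = alongshore_onshore(10.0, 270.0, 90.0)  # wind from land: offshore
```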
For more information about VB and its capabilities, including how to
download a free copy, visit
http://www.epa.gov/exposure-assessment-models/virtual-beach-vb. You can also
visit http://www.seagrant.wisc.edu/home/Default.aspx?tabid=646#Training for
predictive modeling workshop presentations, webinars on accessing online
data, and step-by-step modules on VB.
[Screenshot: Virtual Beach (VB) page on EPA's Exposure Assessment Models website]
Data Management
The management of data is an important
part of the model development process.
Before data can be uploaded to VB or other
modeling software, it must be manipulated
and formatted properly. This can be a fairly
complex and time-consuming process and
enlisting the help of data processing experts
is often necessary.
It is important to keep in mind that each
measurement in an independent variable
data set must pair with one, and only one,
FIB density measurement. Some beaches
collect multiple samples at about the same
time and record all of them in a database. In that case, you can take the
geometric mean and use that as your data point.
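A minimal Python sketch of that step, assuming the raw results have already been read into (event, density) pairs in which the event label (here a date string) identifies samples collected at about the same time. The function names are illustrative only and are not part of VB.

```python
import math
from collections import defaultdict


def geometric_mean(values):
    """Geometric mean of positive FIB densities (CFU/100 mL)."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs one or more positive densities")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def one_fib_value_per_event(samples):
    """Collapse (event, density) pairs so each sampling event has exactly one FIB value."""
    by_event = defaultdict(list)
    for event, density in samples:
        by_event[event].append(density)
    return {event: geometric_mean(vals) for event, vals in by_event.items()}


# Two samples on June 1 (10 and 1,000 CFU/100 mL) collapse to about 100 CFU/100 mL.
fib = one_fib_value_per_event([("2015-06-01", 10.0), ("2015-06-01", 1000.0),
                               ("2015-06-02", 40.0)])
```

A zero density (for example, from a below-detection-limit substitution) has no logarithm, so it would have to be resolved before this step.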
As with any data-driven analysis, variable data must be checked carefully
and identified errors or anomalies corrected before they are entered into
any analytical software, including VB. Some basic things to watch out for
include missing data, improperly recorded information, invalid data cells,
and other potential formatting problems. Data formatting and structure
must meet all of the input standards and requirements of the software. For
example, empty data cells are not permitted in VB. In such cases, you need
to either identify and replace these values or delete the observation from the
data set.
VB includes a component that assists users in the input data-check process.
It can go through a spreadsheet cell by cell looking for blanks as well as
non-numeric or user-specified values. If a bad cell or value is identified, the
user is presented with an opportunity to fix it.
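The same kind of check can also be scripted before the file is loaded. The sketch below is not VB's implementation; it assumes the data sheet has been read into a list of dictionaries (for example with Python's csv.DictReader), and the flagged placeholder strings and the skipped "Date" column are hypothetical defaults.

```python
def find_bad_cells(rows, flagged_values=("999", "-999"), skip_columns=("Date",)):
    """List cells that are blank, non-numeric, or equal to a user-specified bad value.

    rows: list of dicts mapping column name -> cell text (e.g., from csv.DictReader).
    Returns (row_index, column, value, reason) tuples for a person to review and fix.
    """
    problems = []
    for i, row in enumerate(rows):
        for column, raw in row.items():
            if column in skip_columns:
                continue  # e.g., date/time columns are checked separately
            text = "" if raw is None else str(raw).strip()
            if text == "":
                problems.append((i, column, raw, "blank"))
            elif text in flagged_values:
                problems.append((i, column, raw, "user-specified value"))
            else:
                try:
                    float(text)
                except ValueError:
                    problems.append((i, column, raw, "non-numeric"))
    return problems
```

Reporting problems rather than fixing them automatically keeps a person in the loop, as VB does.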
Other data checks can include:
Linking FIB observations with independent variable data. A key
challenge in developing the input data sheet for VB is selecting only those
data temporally linked to the FIB observations. The challenge is further
complicated if you are also creating antecedent variables from those data.
There are several methods for accomplishing this data manipulation task,
both within and outside of the VB-input file. Mednick (2009) describes
a system for joining various data tables into one master table using
Microsoft Access database software.
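If the tables are handled in a script rather than in Access, the join can look like the sketch below. It assumes daily rainfall totals keyed by calendar date and builds two antecedent variables (rainfall one and two days before sampling); the column names are hypothetical. Rows that cannot be fully linked are dropped rather than left with blank cells.

```python
from datetime import date, timedelta


def build_input_rows(fib_by_date, rain_by_date):
    """Pair each FIB observation with antecedent rainfall from the two previous days.

    fib_by_date:  {date: E. coli density, CFU/100 mL}
    rain_by_date: {date: daily rainfall total, inches}
    """
    rows = []
    for day in sorted(fib_by_date):
        lag1, lag2 = day - timedelta(days=1), day - timedelta(days=2)
        if lag1 in rain_by_date and lag2 in rain_by_date:
            rows.append({"date": day, "ecoli": fib_by_date[day],
                         "rain_lag1_in": rain_by_date[lag1],
                         "rain_lag2_in": rain_by_date[lag2]})
    return rows


fib = {date(2015, 6, 1): 10.0, date(2015, 6, 3): 120.0, date(2015, 6, 4): 40.0}
rain = {date(2015, 6, 1): 0.0, date(2015, 6, 2): 0.8, date(2015, 6, 3): 0.1}
rows = build_input_rows(fib, rain)  # June 1 is dropped: no rainfall for May 30-31
```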
Numerical conversion of categorical variables. VB requires that all
categorical variable labels be given a numerical designation. Ordinal
variables can be simply converted to a continuous-like numerical variable.
For example, turbidity values can be translated as Clear = 1, Slightly
Turbid = 2, Turbid = 3, and Opaque = 4. Of course, even though they
appear as numbers, they are still categorical values and, therefore, most
summary statistics (e.g., mean values) are not applicable. VB provides
an opportunity for the user to flag categorical variables to prevent the
creation of inappropriate summary statistics and variable transformations
(e.g., natural log or square root variable).
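A short sketch of the turbidity example, using the codes given above. Raising an error on an unrecognized label (rather than silently assigning a code) is a design choice that surfaces field-sheet typos; the function name is illustrative.

```python
import statistics

TURBIDITY_CODES = {"Clear": 1, "Slightly Turbid": 2, "Turbid": 3, "Opaque": 4}


def encode_turbidity(labels):
    """Translate field turbidity labels into the ordinal codes 1-4."""
    codes = []
    for label in labels:
        key = label.strip()
        if key not in TURBIDITY_CODES:
            raise ValueError(f"unrecognized turbidity label: {label!r}")
        codes.append(TURBIDITY_CODES[key])
    return codes


codes = encode_turbidity(["Clear", "Turbid", " Slightly Turbid", "Turbid"])
# For ordinal codes, summarize with the median or the counts, not the mean.
typical = statistics.median_low(codes)
```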
Data entry errors. Your Beach Team should put in place data
management QA oversight and QC procedures that cover the transfer
and manipulation of data, such as in the VB input data sheet. After data
have been transferred to the data sheet, review the data sets for errors and
anomalies. Excel 2013 and later versions include a Quick Analysis tool
that allows you to select data and instantly create statistics and charts that
will help identify problems.
NOAA and USGS have developed tools to help automate the process of
downloading data from online sources and compiling them into a single
data sheet.
• NOAA-PROCESSNOAA. Accesses, compiles, and processes wind speed
and direction (instantaneous and previous 24 hours) and rainfall totals
for 24-hour windows at lag times of 1, 2, and 3 days. It can also display
data graphically and weight rainfall variables. To access the tool, visit
http://pubs.usgs.gov/sir/2013/5166/pdf/sir2013-5166_appendix2.pdf.
• USGS-Environmental Data Discovery and Transformation (EnDDaT).
Accesses, compiles, and processes data from a variety of data sources,
including NWS, National Data Buoy Center (NDBC), and NWIS. EnDDaT
can be used to compile historical data in a single worksheet for model
development and to create real-time datasets for direct import to VB for
model operation. To access the tool, visit http://cida.usgs.gov/enddat/.
Placeholders for unmeasured values. Some data sets, especially those
downloaded from online data sources, use numeric placeholders to
indicate unmeasured data (e.g., 999). You need to identify and replace
these values or delete the affected observations. Empty data cells are not
permitted in many model building programs, including VB.
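A minimal sketch of the delete option, assuming the rows are dictionaries of cell text. The placeholder values shown (999, 9999, -9999) are hypothetical; check each data source's documentation for the codes it actually uses, and make sure a real measurement can never equal one of them.

```python
PLACEHOLDERS = {999.0, 9999.0, -9999.0}  # hypothetical codes for "not measured"


def drop_placeholder_rows(rows, columns, placeholders=PLACEHOLDERS):
    """Keep only observations whose listed columns all hold real measurements."""
    kept, dropped = [], 0
    for row in rows:
        if any(float(row[c]) in placeholders for c in columns):
            dropped += 1
        else:
            kept.append(row)
    return kept, dropped
```

Returning the number of dropped rows makes it easy to see whether placeholders are removing a large share of the record.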
Unit errors. Most numeric data are reported
with units of measurement. The units can vary and
must be converted to common units for model
development. The most common conversions
involve converting data from English to metric
units, or vice versa. For modeling purposes, the
unit chosen is not as important as consistently
using the same units. Unit information should be
included in the column title.
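The conversions themselves are simple arithmetic. The sketch below uses exact factors (1 inch = 25.4 mm; 1 mile per hour = 0.44704 m/s) and returns the new unit so it can be written into the column title; the unit labels are arbitrary strings chosen for illustration.

```python
MM_PER_INCH = 25.4      # exact by definition
MPS_PER_MPH = 0.44704   # exact by definition


def to_metric(values, unit):
    """Convert English-unit values to metric; returns (converted values, new unit label)."""
    if unit == "in":
        return [v * MM_PER_INCH for v in values], "mm"
    if unit == "mph":
        return [v * MPS_PER_MPH for v in values], "m/s"
    if unit == "degF":
        return [(v - 32.0) * 5.0 / 9.0 for v in values], "degC"
    raise ValueError(f"no conversion defined for {unit!r}")


rain_mm, unit = to_metric([0.5, 1.0], "in")
column_title = f"Rain_24h ({unit})"  # unit kept in the column title
```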
Date/time errors. Some data sets downloaded
from online data sources list date and time by the
numerical day of the year (DOY, 1-366) and/or
Coordinated Universal Time (UTC). Some data
sources use "Zulu Time" or "Greenwich Mean
Time." These data need to be converted into the
same time zone and the date/time format selected
for use in the input data sheet. A UTC conversion
tool can be found at http://www.noaanews.noaa.
gov/hurricanes/zulu-utc.html. A DOY conversion
tool can be found at http://www.ngs.noaa.gov/
GRD/GPS/DOC/dov/dov.html.
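The same conversions can be scripted with Python's datetime module. This sketch assumes readings are given as year, day of year, and a UTC clock time in HHMM form, and that the beach season falls in U.S. Central Daylight Time (UTC-5). That fixed offset is an assumption; records that cross a daylight saving change would need a named time zone (for example via zoneinfo) instead.

```python
from datetime import datetime, timedelta, timezone

LOCAL = timezone(timedelta(hours=-5), "CDT")  # assumed fixed local offset


def doy_to_date(year, doy):
    """Convert a day-of-year number (1-366) to a calendar date."""
    result = (datetime(year, 1, 1) + timedelta(days=doy - 1)).date()
    if doy < 1 or result.year != year:
        raise ValueError(f"day {doy} is not in {year}")
    return result


def utc_to_local(year, doy, hhmm, tz=LOCAL):
    """Convert a UTC ('Zulu') reading given as year, DOY, and HHMM to local time."""
    d = doy_to_date(year, doy)
    utc = datetime(d.year, d.month, d.day, hhmm // 100, hhmm % 100, tzinfo=timezone.utc)
    return utc.astimezone(tz)


# 0300 UTC on DOY 182 of 2015 (July 1) is 10:00 p.m. on June 30 local time.
local = utc_to_local(2015, 182, 300)
```

Note that the local date can differ from the UTC date, which matters when pairing readings with a sampling day.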
FIB data inconsistencies. A special case is often
noted with FIB density data. Because laboratories
have minimum density detection limits for FIB,
data sets will sometimes have category-type
entries mixed in with numerical entries (e.g.,
< 10 CFU/100 milliliters (mL)). In this case, your
Beach Team must decide how to handle the
"below detection limit" entries so the variable is
continuous. Typically, one of three options is chosen: (1) use the detection
limit value, (2) use one-half the detection limit value, or (3) use zero
as the value. Too many detection limit substitutions, however, might
compromise the integrity of the FIB density data set.
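One way to apply the three options consistently is sketched below. It assumes below-detection-limit entries are recorded as text such as "<10" (with no units in the cell), and it also reports the share of substituted values so the team can judge whether there are too many. Note that option (3) produces zeros, which cannot be log-transformed later.

```python
def parse_fib(entry, option="half"):
    """Convert a lab FIB entry (CFU/100 mL) to a number.

    Entries such as "<10" are below the detection limit and are replaced by the
    detection limit ("limit"), one-half of it ("half"), or zero ("zero").
    """
    text = str(entry).replace(" ", "")
    if not text.startswith("<"):
        return float(text)
    limit = float(text[1:])
    substitutes = {"limit": limit, "half": limit / 2.0, "zero": 0.0}
    if option not in substitutes:
        raise ValueError(f"unknown option {option!r}")
    return substitutes[option]


def censored_fraction(entries):
    """Share of entries reported as below the detection limit."""
    return sum(str(e).strip().startswith("<") for e in entries) / len(entries)
```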
Multiple stations or sites. Some beaches have one sample site; others
use multiple sites. For modeling purposes, however, you need just one
density measurement to represent the beach as a whole for a sampling
event. In the case of beaches with multiple sites, some sampling schemes
are designed to produce a composite sample composed of subsamples
taken from each of the stations at approximately the same time. In
that case, you would use the composite sample measurement as your
FIB observation. Other programs process multiple station samples
individually, resulting in multiple FIB data points for an event. A
common approach in that case is to calculate the geometric mean of the
samples and use that as your FIB observation. Occasionally, you might
come across duplicate samples taken from the same station for QA or
other purposes. In that case, using the average of the two samples would
be appropriate, or a more conservative approach would be to use the
highest value as your observation.
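The two-stage logic above can be sketched as follows: duplicates at a station are reduced first (the average, or the more conservative maximum), and the station values are then combined with a geometric mean. The function and station names are hypothetical. Composite samples need no such step because they already yield one value.

```python
import math
from collections import defaultdict


def beach_fib_value(samples, duplicate_rule="mean"):
    """One FIB value for a sampling event from (station, density) samples in CFU/100 mL."""
    by_station = defaultdict(list)
    for station, density in samples:
        by_station[station].append(density)
    if duplicate_rule == "mean":
        station_values = [sum(v) / len(v) for v in by_station.values()]
    elif duplicate_rule == "max":  # the more conservative choice
        station_values = [max(v) for v in by_station.values()]
    else:
        raise ValueError(f"unknown duplicate_rule {duplicate_rule!r}")
    logs = [math.log10(v) for v in station_values]
    return 10 ** (sum(logs) / len(logs))  # geometric mean across stations


event = [("North", 100.0), ("South", 10.0), ("South", 30.0)]  # South has a QA duplicate
```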
Characterize the FIB and Independent Variable
Data Sets
EDA usually begins with an examination of the distribution of each of the
data sets. If the "most ideal normal condition" is assumed to be the center
of the data distribution (signal), the spread of data from the center (noise)
should be examined and at least informal inferences made about the range
of environmental circumstances and conditions that produced the variation.
Box Plots
Box plots are an effective way to summarize data distributions. An example
of a box plot is presented in Figure 3. You can generate box plots in VB as
well as in other statistical software. Note that the box itself is plotted along
the Y-axis, and the bottom and top of the box represent the lower and upper
quartiles of the ordered data set (25th and 75th percentiles, respectively). The
median is calculated and displayed as a horizontal line inside the box. The
difference between the quartiles is called the "interquartile range." Vertical
lines (whiskers) extend from the quartile lines to represent data above and
below the interquartile range. Traditionally, the box plot's whiskers terminate
with a short horizontal line that represents the highest and lowest data
points of the distribution.
Figure 3. Box plot attributes: 25th percentile, median, 75th percentile,
interquartile range, smallest and largest non-outlier, and outlier.
By visually inspecting the box plots, your Beach Team can observe:
• Outliers—Extreme values in the data set that should be investigated.
• Median—The central tendency of the data.
• Spread—The variability in the data set in relationship to the median.
Smaller spreads are generally better for modeling than larger spreads.
The interquartile range is an indicator of spread of the middle half of
the data set.
• Symmetry and Skewness—The variability of the data set on either side
of the median. A symmetric data set shows the median in the middle
of the box. A skewed data set displays the median closer to one edge
of the box, indicating that the spread is greater for those data on the
other side of the median line. If the data are skewed with outliers, the
interquartile range is often a better measure of variability than the
standard deviation because it is not inflated by a few extreme values.
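The quantities read from a box plot can also be computed directly. This sketch uses Python's statistics.quantiles with the "inclusive" method; statistical packages use slightly different quartile rules, so small differences from VB or a spreadsheet are expected. The "median_position" value is a hypothetical convenience measure: 0.5 means the median sits in the middle of the box, and values well below 0.5 indicate that the longer spread lies above the median.

```python
import statistics


def box_summary(data):
    """Numbers behind a box plot: extremes, quartiles, median, and interquartile range."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "min": min(data), "q1": q1, "median": median, "q3": q3, "max": max(data),
        "iqr": iqr,
        # 0.5 = median centered in the box; < 0.5 = longer spread above the median
        "median_position": (median - q1) / iqr if iqr else 0.5,
    }


symmetric = box_summary([1, 2, 3, 4, 5])
skewed = box_summary([10, 20, 30, 100, 1000])  # median near the bottom of the box
```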
Your Beach Team might find that some data sets are difficult to plot and
characterize because the data range over several orders of magnitude. FIB
densities, in particular, often range from very low densities (< 10 CFU per
100 mL) to very high densities (> 10,000 CFU per 100 mL). Data ranges
such as these require that data be transformed to induce symmetry in the
distribution and to make it easier to graph, observe, and interpret results.
The logarithm is the favored method used for this purpose.
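A small illustration of how the log scale improves symmetry, using made-up E. coli densities that span four orders of magnitude. Symmetry is measured here with Pearson's median skewness, 3 × (mean − median) / standard deviation, which is 0 for a symmetric distribution; this particular measure is a choice made for the illustration. Zero values (for example, from a zero substitution for below-detection-limit results) have no logarithm and must be resolved first.

```python
import math
import statistics


def log10_densities(densities):
    """Base-10 logs of FIB densities; zero or negative values are rejected."""
    if any(d <= 0 for d in densities):
        raise ValueError("log transform needs positive densities")
    return [math.log10(d) for d in densities]


def median_skewness(values):
    """Pearson's median skewness, 3 * (mean - median) / standard deviation; 0 means symmetric."""
    return 3 * (statistics.mean(values) - statistics.median(values)) / statistics.stdev(values)


ecoli = [5, 12, 20, 35, 60, 110, 240, 900, 2400, 15000]  # hypothetical CFU/100 mL
raw_skew = median_skewness(ecoli)                   # about 1.15
log_skew = median_skewness(log10_densities(ecoli))  # about 0.61
```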
-------
Six Key Steps for Developing and Using Predictive Tools at Your Beach
Step 3: Perform Exploratory Data Analysis
29
As mentioned earlier in step 3, categorical variables have values that
function as labels rather than numbers. Therefore, they are not an ordered
data set and cannot be box-plotted in the same manner as continuous data.
FIB density data, however, can be box-plotted by variable categories. The
resulting plots will indicate how the different categories of the independent
variable individually influence FIB levels (Figure 4).
Figure 4. Box plots of E. coli density sorted by wind direction (North, East,
South, West), with the notification threshold value (235 CFU/100 mL) and
outliers marked.
Outliers
An "outlier" is a data point located outside of the overall pattern of a
distribution of other data points. Sometimes outliers are a result of a
faulty measurement or a data entry error. In other cases, the data might
be correctly measured, but the measurement or sampling occurred under
unusual circumstances or conditions. This could be especially significant
at beaches with infrequent but predictable exceedances, such as after a
heavy rain event. In still other cases, the outlier is a legitimate data point
and, while uncommon, might be considered within the normal range of
conditions. Because of this uncertainty, your Beach Team should always try
to identify the reason for or cause of an outlier.
Legitimate outliers can be displayed in the box plot as data points that
extend beyond a reformulated minimum or maximum line. Basically, the
quartiles are constructed as usual, but (invisible) "fences" are added at
the tails of the distribution. These fences mark the boundaries of what is and
is not an outlier. The fences are usually placed 1.5 times the interquartile
range below the lower quartile and above the upper quartile. Some analysts
like to further categorize outliers as either mild or extreme. To do this, the
analyst calculates an outer fence beyond the initial (now inner) fence, placed
3.0 times the interquartile range beyond the quartiles. Any data point that
lies between the inner and outer fences is designated as a mild outlier, and
any point beyond the outer fence is considered an extreme outlier.
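The fence rules translate directly into code. This sketch applies them to log10 E. coli values (hypothetical numbers), using the "inclusive" quartile method of Python's statistics.quantiles; other software may compute quartiles slightly differently and so place the fences at slightly different values.

```python
import statistics


def classify_outliers(data, inner_k=1.5, outer_k=3.0):
    """Label each value as 'not an outlier', 'mild', or 'extreme' using IQR fences."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    inner = (q1 - inner_k * iqr, q3 + inner_k * iqr)
    outer = (q1 - outer_k * iqr, q3 + outer_k * iqr)
    labels = []
    for x in data:
        if x < outer[0] or x > outer[1]:
            labels.append((x, "extreme"))
        elif x < inner[0] or x > inner[1]:
            labels.append((x, "mild"))
        else:
            labels.append((x, "not an outlier"))
    return labels


log_ecoli = [1.0, 1.2, 1.3, 1.5, 1.6, 1.8, 2.0, 2.1, 3.5, 4.5]
flags = classify_outliers(log_ecoli)  # 3.5 is a mild outlier, 4.5 an extreme one
```

A flagged value is a prompt to investigate its cause, not an automatic reason to delete it.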
Comparing Data Distributions among Variable Subsets
As mentioned in the introduction to this step, a fundamental assumption
of predictive models is that the data used to build the models represent
normal conditions that are expected to extend into the future. One way to
confirm this assumption, at least for the collected data, is by constructing
a time-series plot (Figure 5). If data levels seem to change in certain time
periods, your Beach Team might also want to prepare box plot presentations
for temporal subsets of the data set to better analyze year-to-year
variations and/or season-to-season data variations. By making side-by-side
comparisons of box plots, the team might note a significant shift of one
subset compared to the others.
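The numbers behind side-by-side yearly box plots can be tabulated with a few lines of code. This sketch assumes (year, E. coli density) records, works on the log10 scale, and needs at least two observations per year; the data shown are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def yearly_quartiles(records):
    """{year: (q1, median, q3)} of log10 FIB density, for side-by-side comparison."""
    by_year = defaultdict(list)
    for year, density in records:
        by_year[year].append(math.log10(density))
    return {year: tuple(statistics.quantiles(values, n=4, method="inclusive"))
            for year, values in sorted(by_year.items())}


records = [(2014, 10), (2014, 100), (2014, 1000),
           (2015, 100), (2015, 1000), (2015, 10000)]
summary = yearly_quartiles(records)  # the 2015 median is one log unit above 2014's
```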
Figure 5. Comparison of E. coli density over a four-year period (2012 to 2015),
with the notification threshold value (235 CFU/100 mL) and outliers marked.
If your Beach Team notes a significant shift of one data subset compared
to the others, it should investigate why this is occurring. In some cases,
this exercise could lead to the development of different predictive models
for spring and summer seasons or even the incorporation of a "time of
season" variable into the predictive model. In other cases, examining subset
distributions might indicate that one entire season's data are suspect
because of the use of different sample collection protocols or equipment, or
because an important change in environmental conditions occurred that
created a new normal for that time period.
Examine the Relationship between FIB and
Independent Variables
Once your Beach Team is familiar with the data sets, outliers are explained,
and bad data have been removed, the team can begin examining the
relationship between FIB concentrations and independent variables. The
main purpose of this exercise is to document linear correlations between
FIB density and independent variables—another key assumption of
statistical predictive models.
Scatterplots
The "scatterplot" is a graphical technique that portrays the relationship
between a dependent variable (FIB density) and a single independent
variable. A clustering of data points in a nonrandom pattern along an
imaginary line indicates that a linear relationship exists. The strength of
the linear association is measured by Pearson's correlation coefficient (r).
Its value can range from -1.0 to 1.0, where -1.0 is a perfect inverse
(negative) correlation, 0.0 is no linear correlation, and 1.0 is a perfect
positive correlation. The closer the absolute value is to 1.0, the stronger the
association between the two variables.
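Pearson's r can be computed from its definition: the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The sketch below does this in plain Python (the rainfall and log10 E. coli values are hypothetical); in Python 3.10 and later, statistics.correlation gives the same result.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant variable has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


rain_in = [0.0, 0.1, 0.3, 0.6, 1.0, 1.8]
log_ecoli = [1.2, 1.4, 1.7, 2.1, 2.6, 3.3]
r = pearson_r(rain_in, log_ecoli)  # close to +1: a strong positive association
```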
Your Beach Team should keep in mind
that, even though a scatterplot might
reveal a strong association between
dependent and independent variables, it
does not automatically mean that there is
a cause-and-effect mechanism at work. A
definitive connection of this nature must
be made through other means. The only
finding from the scatterplot analysis is the
correlation between the two data sets.
Photo credit: Ryan Hagerty/USFWS
Figure 6. Scatterplots of E. coli vs. rainfall in inches without transformation
(A) and with a log-transformation (B).
Variable Transformation
If the relationship is nonlinear, your Beach Team should
consider transforming the data to try to improve linearity.
FIB data, for example, are almost always transformed
to a logarithmic scale. Figure 6 illustrates how a log10
transformation of E. coli data improves linearity. VB
provides several transformation options, including base 10
logs, natural logs, square, and square root.
Creation of New Variables
Your Beach Team might want to explore manipulating or
combining variables to improve linearity or to enhance the
meaning of the variable. This might include:
• Creating a new composite variable by summing,
multiplying, or averaging data when multiple sites
are measuring the same variable (e.g., multiple FIB
sampling sites in the swimming area or multiple rain
gauges in the drainage area).
• Creating a new composite-weighted variable by
including additional weight to select components of the
same variable (e.g., creating a cumulative 3-day rainfall
total but manipulating the equation so that the more
recent 24-hour period receives a higher weight than the
preceding 24-hour period).
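A weighted 3-day rainfall variable can be computed as a weighted sum of the three 24-hour totals. The weights below (1.0, 0.5, 0.25) are hypothetical placeholders chosen only to show that recent rain can be made to count more; they are not values recommended by this guide, and candidate weightings should be compared during EDA.

```python
def weighted_rainfall(last_24h, from_24_to_48h, from_48_to_72h, weights=(1.0, 0.5, 0.25)):
    """Composite 3-day rainfall (inches) that gives the most recent 24 hours the most weight."""
    w1, w2, w3 = weights
    return w1 * last_24h + w2 * from_24_to_48h + w3 * from_48_to_72h


# Same 3-day total (1.0 inch), different timing:
recent = weighted_rainfall(1.0, 0.0, 0.0)   # 1.0
earlier = weighted_rainfall(0.0, 0.0, 1.0)  # 0.25
```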
VB allows you to create new variables using sum, maximum,
minimum, mean, or products; it also allows you to define
beach orientation and break down wind, current, and wave
direction and magnitude (speed or height) data into
alongshore and onshore/offshore components. These
types of data are often valuable independent variables in
situations in which a major outfall is located near the beach.
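Breaking a wind (or current) reading into onshore/offshore and alongshore components is basic trigonometry once the beach orientation is known. The conventions in this sketch are assumptions and may not match VB's: wind direction is the meteorological direction the wind blows from, in degrees clockwise from north, and beach orientation is the compass direction a person on the beach faces when looking out over the water.

```python
import math


def wind_components(speed, wind_from_deg, beach_faces_deg):
    """Split a wind vector into (onshore, alongshore) components in the units of speed.

    onshore > 0:    wind blowing from the water toward the beach (< 0 is offshore)
    alongshore > 0: wind coming from the right of a person facing the water
    """
    angle = math.radians(wind_from_deg - beach_faces_deg)
    return speed * math.cos(angle), speed * math.sin(angle)


# A beach facing east (90 degrees) with a 10 m/s wind from the southeast (135 degrees):
onshore, alongshore = wind_components(10.0, 135.0, 90.0)  # both about 7.07 m/s
```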
Correlation among Independent Variables
Sometimes combinations of independent variables do not work well together
in the context of a predictive model. This frequently occurs when two
independent variables correlate highly with each other. Therefore, your
Beach Team should examine relationships among independent variables
during EDA and identify any strong correlations. The correlations might be
important in step 4 of the model development phase. Where there is a strong
correlation, your Beach Team might consider picking one variable and
discarding the other, a decision made easier if data for one of the variables
are more convenient and/or less expensive to collect.
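Screening for strongly correlated pairs of independent variables can be automated. The 0.8 cutoff in this sketch is a hypothetical choice, not a threshold from this guide, and the variable names and values are made up.

```python
import itertools
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length, non-constant lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(variables, cutoff=0.8):
    """Return (name_a, name_b, r) for independent-variable pairs with |r| >= cutoff."""
    flagged = []
    for a, b in itertools.combinations(sorted(variables), 2):
        r = pearson_r(variables[a], variables[b])
        if abs(r) >= cutoff:
            flagged.append((a, b, round(r, 3)))
    return flagged


candidates = {"rain_24h_in": [0.0, 0.5, 1.0, 1.5],
              "rain_48h_in": [0.0, 1.0, 2.1, 2.9],
              "wave_ht_m": [0.3, 0.1, 0.4, 0.2]}
pairs = correlated_pairs(candidates)  # only the two rainfall totals are flagged
```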
Analysis of Variance for Categorical Variables
The relationship between an independent categorical variable and FIB
density cannot be represented in a scatterplot with r values calculated in
the same manner as for continuous data. As noted above, you can visually
detect categorical influences on density by using categorical box plots of
FIB density. Your Beach Team can use the analysis of variance (ANOVA)
statistical technique to determine whether mean FIB density differs
significantly among the categories.
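The one-way ANOVA F statistic compares the spread between category means with the spread within categories. The sketch below computes F and its degrees of freedom from log10 E. coli values grouped by wind direction (hypothetical data). Turning F into a p-value requires the F distribution, for example from a statistics table or scipy.stats.f_oneway; that step is not reproduced here.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for {category: [values]}."""
    values = [v for vals in groups.values() for v in vals]
    grand_mean = sum(values) / len(values)
    ss_between = ss_within = 0.0
    for vals in groups.values():
        mean = sum(vals) / len(vals)
        ss_between += len(vals) * (mean - grand_mean) ** 2
        ss_within += sum((v - mean) ** 2 for v in vals)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


log_ecoli_by_wind = {"North": [1.0, 1.2, 1.1],
                     "East": [2.0, 2.1, 1.9],
                     "South": [1.5, 1.4, 1.6]}
f_stat, df_b, df_w = one_way_anova(log_ecoli_by_wind)  # F is about 61 with 2 and 6 df
```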
Step 4: Develop and Test a Predictive Model
After your Beach Team has completed the EDA and selected a set of
independent variables that correlate with FIB density, it can proceed to
develop and test the predictive model.
Introduction to Step 4
Most predictive models in use today are based on linear regression, a
statistical method that assumes a linear—or straight-line—relationship
between variables. Linear regression can be used to predict a dependent
variable measurement (in this case, FIB density) using one or more
independent variable measurements.
A model that uses only one independent variable is generally described as
a "simple linear regression" model. A model using two or more variables is
called a "multivariable linear regression" (MLR) model. In either case, the
model itself is nothing more than an equation with the dependent variable
on one side of the equal sign and, on the other side, a constant plus each
independent variable multiplied by its coefficient. Conceptually, you plug in
the measured independent variable values and calculate a predicted FIB
density. You can then compare
the predicted FIB density to a state water quality standard or other threshold value
and make a decision concerning beach notification actions (e.g., to issue a
swimming advisory or close the beach).
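The "plug in and compare" step can be sketched as follows. It assumes a model fitted to log10 E. coli density, as is common for FIB, and uses the 235 CFU/100 mL notification threshold from the examples in this guide. The coefficient values and variable names are hypothetical and are not taken from any published beach model.

```python
THRESHOLD = 235.0  # E. coli notification threshold, CFU/100 mL


def predict_density(intercept, coefficients, measurements):
    """Evaluate log10(FIB) = intercept + sum(b_i * x_i) and return CFU/100 mL."""
    log10_pred = intercept + sum(b * measurements[name] for name, b in coefficients.items())
    return 10 ** log10_pred


def notification(predicted, threshold=THRESHOLD):
    """Advisory if the predicted density exceeds the threshold."""
    return "issue advisory" if predicted > threshold else "no action"


coefficients = {"wave_height_m": 0.8, "rain_weighted_in": 0.6}  # hypothetical b_i
today = {"wave_height_m": 0.5, "rain_weighted_in": 0.5}
pred = predict_density(1.6, coefficients, today)   # 10 ** 2.3, about 200 CFU/100 mL
decision = notification(pred)                      # "no action"
```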
Three key elements are necessary for producing an effective predictive model:
• Using high-quality data sets to develop and test candidate models.
• Reducing error and increasing predictive power of the model as much
as possible.
• Choosing an appropriate software package.
Data Sets
The importance of using high-quality dependent and independent variable
data sets for model development and testing cannot be overemphasized. A
sufficient amount of good empirical data is necessary for an effective and
reliable model. As mentioned in step 2, a rule of thumb is to collect at least
three years' worth of historical data that represent conditions that are likely
to occur in the future. Then, use two of those years' data to develop the
model (training data set) and one year's data to assess the model's predictive
accuracy (testing data set).
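Splitting by whole beach seasons, rather than picking rows at random, follows the rule of thumb above and keeps the testing data separate in time from the training data, much as the model will later be asked to predict a season it has not seen. A minimal sketch, assuming each record carries a "year" field (a hypothetical field name):

```python
def split_by_year(records, test_year):
    """Use test_year's records for testing and all other years for training."""
    training = [r for r in records if r["year"] != test_year]
    testing = [r for r in records if r["year"] == test_year]
    if not training or not testing:
        raise ValueError("both data sets must contain records")
    return training, testing


records = [{"year": y, "ecoli": e} for y, e in
           [(2012, 40), (2012, 300), (2013, 25), (2013, 500), (2014, 90), (2014, 260)]]
train, test = split_by_year(records, test_year=2014)  # develop on 2012-2013, test on 2014
```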
Reducing Errors
Recall that the correlation coefficient r was used in the context of EDA
scatterplots to measure the linear association of one independent variable
and FIB density. In the context of modeling, r also represents a measure of:
(1) the scatter (variability) of the data points from the regression line, and
(2) the power of the independent variable to correctly predict the value of the
dependent variable. Variability can often be reduced if more independent
variables are added to the mix. This makes sense thinking back to how
bacteria move from land-based sources, through the drainage network and
the receiving water, and to the beach. Rainfall, wind, currents, sunlight, and
other factors work in combination to influence both the journey and the fate
of bacteria cells. While the complexity of model development increases with
the addition of more independent variables, the result is usually increased
accuracy in predicting FIB density.
Virtual Beach
The material presented in this section focuses on VB's traditional MLR
method of model-building. The current version of VB software is version 3
(VB3), which was released in December 2014.
For more complete information about MLR as well as other modeling
methods available in VB3, consult Virtual Beach 3.0.4: User's Guide
(http://www.epa.gov/sites/production/files/2015-02/documents/vb3_manual_3.0.4.pdf)
(Cyterski et al. 2013).
In general, the model-building process in VB3 involves searching for the
combination of independent variables that produces the most accurate
FIB density predictions. Although the VB3 software processes for building
predictive models are automated, you must make important decisions
concerning model construction and testing, including choosing the method
used to build the model, number of variables to include in the model, and
evaluation criteria used to judge model fitness. Unless you or another
member of the team is familiar with VB3, you will probably need to consult a
person who has used it before to help you with these decisions. You can also
visit http://www.seagrant.wisc.edu/home/Default.aspx?tabid=646#Training
for predictive modeling workshop presentations, webinars on accessing
online data, and step-by-step modules on VB.
Model Building
VB3 offers two general methods for selecting variables for the model. One
is called the "genetic algorithm." It is a stepwise procedure that adds or
subtracts independent variables from the model based on their level of
statistical significance. The software retains the most significant variables
and discards the least significant.
A more comprehensive approach to model building is called "exhaustive
search." It involves measuring the goodness of fit for all possible
combinations of the chosen independent variables, beginning with models
with a single variable and working up to models with all the variables
incorporated. The best model for each number of variables, up to a defined
maximum, is then identified based on goodness-of-fit statistics (e.g., the best
2-variable model, the best 3-variable model).
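To make the idea concrete, the sketch below performs an exhaustive search with ordinary least squares, using R² as the goodness-of-fit statistic. It is not VB3's code: VB3 offers several evaluation criteria, and R² is used here only because, among models with the same number of variables, it ranks them the same way adjusted R² does. The small linear-system solver is included so the example runs without third-party libraries; the data are hypothetical.

```python
import itertools


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: variables may be perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y, columns):
    """Least-squares fit of y = b0 + sum(b_j * x_j) via the normal equations; returns R^2."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(p)]
    b = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2 for row, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def exhaustive_search(y, candidates, max_vars):
    """Best (R^2, variable names) for each model size from 1 to max_vars."""
    best = {}
    for k in range(1, max_vars + 1):
        for names in itertools.combinations(sorted(candidates), k):
            try:
                score = r_squared(y, [candidates[name] for name in names])
            except ValueError:
                continue  # skip perfectly collinear combinations
            if k not in best or score > best[k][0]:
                best[k] = (score, names)
    return best


candidates = {"a": [0, 1, 2, 3, 4, 5], "b": [1, 0, 1, 0, 1, 0], "c": [2, 3, 1, 5, 4, 0]}
y = [1 + 2 * a + 0.5 * b for a, b in zip(candidates["a"], candidates["b"])]
best = exhaustive_search(y, candidates, max_vars=2)  # best 2-variable model uses a and b
```

Because the number of combinations grows quickly with the number of candidate variables, a maximum model size keeps the search manageable.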
VB3 provides a variety of criteria for evaluating model fitness. In addition,
the software can recommend how many variables are optimal for the model
and determine if collinearity among independent variables is a problem.
Once model-building is completed in VB3, the software presents you with
the 10 best models based on your chosen evaluation criteria. You then
evaluate these candidates using one or more metrics described in detail in
the Virtual Beach 3.0.4: User's Guide (Cyterski et al. 2013). Based on the
results, you select a final model and begin the process of model validation.
Model Validation
The objective of model validation is to determine whether your final model is
good enough to use in your beach program. Keep in mind that your model's
output is used to help officials make timely beach management decisions,
including issuing a swimming advisory or closing the beach. These decisions
are not taken lightly because they affect public health and safety and a variety
of related community concerns pertaining to economic prosperity and public
perceptions about the safety of local recreational waters.
How you determine if your model is good enough to use in your program
is up to you. If you have been relying on previous-day sampling results
for making your beach notification decisions, you probably want your
predictive model to perform at least better than this "persistence model"
approach. You can define "how much better" by setting performance goals
and testing to see if your predictive model meets or exceeds those goals. If it
passes this test, you can consider the model validated and acceptable to use.
Discussed below is a four-step method for validating a model using a
performance goals approach:
1. Generate evaluation statistics for the persistence model using a testing
dataset. Common evaluation statistics are overall accuracy, specificity,
and sensitivity (described in more detail below). They are defined
and generated in this first step for the persistence model and then
generated again in the third step for the predictive model.
2. Set performance goals for your predictive model based on the
persistence model's evaluation statistics.
3. Generate evaluation statistics for the predictive model using the
testing dataset.
4. Compare the evaluation statistics of the two models and determine
the percentage point increase (or decrease) of the predictive model
compared to the persistence model.
This approach to model validation is illustrated using the work of
Francy and Darner (2006). They developed an MLR predictive model
for Huntington Beach, Ohio, a beach located on Lake Erie, using a
training dataset collected during the 2000-2004 beach seasons. The beach
notification threshold value is an E. coli density of 235 CFU/100 mL. The
explanatory variables incorporated into their model were wave height,
weighted rainfall in the previous 48 hours, and log10 turbidity. Data
collected in the 2005 beach season were used as the testing dataset.
Forecasting
Future directions that EPA considers likely for predictive tools for beach
notification include forecasting beach water quality conditions a day or
more ahead. Researchers are also attempting to develop models applicable
to more than one beach or to a region of shoreline.
Generate evaluation statistics for the persistence model
Figure 7 plots the persistence model results for Francy and Darner's testing
dataset; that is, observed E. coli densities (X-axis) vs. E. coli densities
measured the previous day (Y-axis). The quadrants displayed in the graph
are defined by the vertical and horizontal lines set at the beach notification
threshold value of 235 CFU/100 mL. The numbers in parentheses are the
number of plot points that appear in each quadrant.
Listed below are the distinguishing characteristics of each quadrant:
• Upper left quadrant. Data points that fall in this quadrant have
observed E. coli densities below the threshold value, but the model
predicts that they will exceed the threshold value. This is known as a
"false positive," or Type 1 error.
• Upper right quadrant. Data points that fall in this quadrant
have observed E. coli densities above the threshold value, and the
model correctly predicts that they will exceed the threshold value.
"Sensitivity" is the percentage of all the observed exceedance data
points that fall in this quadrant.
• Lower left quadrant. Data points that fall in this quadrant have
observed E. coli densities below the threshold value, and the model
correctly predicts that they will not exceed the threshold value.
"Specificity" is the percentage of all the observed non-exceedance data
points that fall in this quadrant.
• Lower right quadrant. Data points that fall in this quadrant have
observed E. coli densities above the threshold value, but the model
predicts that they will not exceed the threshold value. This is known as
a "false negative," or Type 2 error.
Figure 7. Plot of persistence model results of 2005 data (adapted from Francy
and Darner 2006). Observed E. coli in CFU/100 mL (X-axis) vs. E. coli measured
the previous day (Y-axis), with the notification threshold value (235 CFU/100 mL)
marked on both axes. Number of responses: false positive (4), correct
nonexceedance (31), correct exceedance (0), false negative (6).
Of the four quadrants, plot points that fall in the lower right quadrant
below the horizontal line (Type 2 errors) are the most troubling because
the persistence model is predicting the water is safe for swimming when, in
fact, the water is unsafe because FIB densities exceed the beach notification
threshold value.
The performance statistics for the persistence model are:
• (Overall) Accuracy = 75.6%
• Specificity = 88.6%
• Sensitivity = 0.0%
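The quadrant counts and the three statistics are straightforward to compute. The sketch below assumes "exceeds" means strictly greater than the 235 CFU/100 mL threshold, and it builds persistence-model predictions from the previous day's result, skipping days with no sample the day before. Applying evaluation_stats to the Figure 7 counts (4 false positives, 31 correct nonexceedances, 0 correct exceedances, 6 false negatives) reproduces the 75.6%, 88.6%, and 0.0% listed above.

```python
from datetime import date, timedelta


def persistence_pairs(daily):
    """(observed, predicted) pairs where the prediction is the previous day's result."""
    return [(density, daily[day - timedelta(days=1)])
            for day, density in sorted(daily.items())
            if day - timedelta(days=1) in daily]


def count_quadrants(pairs, threshold=235.0):
    """Tally the four quadrants for (observed, predicted) densities in CFU/100 mL."""
    counts = {"true_pos": 0, "false_pos": 0, "true_neg": 0, "false_neg": 0}
    for observed, predicted in pairs:
        obs_high, pred_high = observed > threshold, predicted > threshold
        if pred_high and obs_high:
            counts["true_pos"] += 1      # correct exceedance
        elif pred_high:
            counts["false_pos"] += 1     # Type 1 error
        elif obs_high:
            counts["false_neg"] += 1     # Type 2 error
        else:
            counts["true_neg"] += 1      # correct nonexceedance
    return counts


def evaluation_stats(c):
    """(accuracy, specificity, sensitivity) in percent; None when a denominator is zero."""
    def pct(num, den):
        return 100.0 * num / den if den else None
    total = sum(c.values())
    return (pct(c["true_pos"] + c["true_neg"], total),
            pct(c["true_neg"], c["true_neg"] + c["false_pos"]),
            pct(c["true_pos"], c["true_pos"] + c["false_neg"]))


figure7 = {"true_pos": 0, "false_pos": 4, "true_neg": 31, "false_neg": 6}
accuracy, specificity, sensitivity = evaluation_stats(figure7)

daily = {date(2005, 6, 1): 100.0, date(2005, 6, 2): 300.0, date(2005, 6, 3): 50.0}
persistence_counts = count_quadrants(persistence_pairs(daily))
```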
Set performance goals for the predictive model
There is no standard formula for setting performance goals; you must use
your judgment in context of the goals and objectives of your beach program.
Assuming you have been relying on the persistence model approach for
making notification decisions, you will want your predictive model to
perform better than the persistence model. Francy et al. (2013a) suggest a
goal of at least 5 percentage points better for accuracy, specificity, and/or
sensitivity.
As discussed above, the sensitivity statistic is especially important because it
characterizes Type 2 errors. Consequently, if you want to take a conservative
approach in protecting public health, you may want to set your sensitivity
performance goal as high as practicable.
For this Huntington Beach example, using the persistence model evaluation
statistics as a baseline, Francy and Darner chose the following performance
goals for model validation purposes:
• Accuracy goal ≥ 81%
• Specificity goal ≥ 94%
• Sensitivity goal ≥ 50%
Generate evaluation statistics for the predictive model and determine if
your performance goals are met
Once you have established performance goals, you can test your predictive
model to see if it meets those goals. Again using Francy and Darner's 2005
testing dataset, Figure 8 is a plot of observed E. coli densities vs.
E. coli densities predicted by the 2000-2004 model. The evaluation statistics
derived from this plot are:
• Accuracy = 88.0% (exceeds performance goal)
• Specificity = 95.2% (exceeds performance goal)
• Sensitivity = 50.0% (meets performance goal)
In this example, the Francy and Darner 2000-2004 model passes our
performance goal test and can be considered good enough to use in a beach
notification decision support system.
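The pass/fail test is a direct comparison of each statistic with its goal. This sketch assumes a goal counts as met when the statistic is at least the goal value, which matches how the text treats a sensitivity of exactly 50.0 percent; the names `GOALS` and `check_goals` are illustrative.

```python
# Performance goals from the Huntington Beach example, in percent.
GOALS = {"accuracy": 81.0, "specificity": 94.0, "sensitivity": 50.0}


def check_goals(stats, goals):
    """Return {statistic: True/False}, treating a goal as met when stat >= goal."""
    return {name: stats[name] >= goal for name, goal in goals.items()}


persistence = {"accuracy": 75.6, "specificity": 88.6, "sensitivity": 0.0}
mlr_model = {"accuracy": 88.0, "specificity": 95.2, "sensitivity": 50.0}

print(check_goals(persistence, GOALS))              # every goal missed
print(all(check_goals(mlr_model, GOALS).values()))  # True: every goal met
```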
[Figure: observed E. coli in CFU/100mL (horizontal axis, log scale) vs. E. coli predicted by the 2000-2004 model (vertical axis, log scale), divided into quadrants by the notification threshold value (235 CFU/100mL). Number of responses: False Positive (2), Correct Nonexceedance (40), Correct Exceedance (4), False Negative (4).]
Figure 8. Plot of predictive model results of 2005 data (adapted from Francy and Darner 2006).
Models that Do Not Meet Performance Goals
Throughout this guide, we have been optimistically moving forward
assuming that you are on the path toward creating a successful model.
Unfortunately, this is not always the case. If your model does not meet your
performance goals, there are some things you can do to try to improve
it. For example, you could revisit Step 2 and identify new independent
variables and try rebuilding your model, or segregate your dataset and
create sub-models that may individually offer better predictive capabilities
than one overall model. Another approach is to consider one or more of the
alternative predictive tools described in the text box titled Alternatives to
MLR Modeling.
Exceedance Probability Threshold
VB3 provides the ability to express a FIB density prediction in terms of a
probability that a defined notification threshold value will be exceeded.
Predictions in this form have some advantages over a FIB density output:
• They explicitly convey that there is uncertainty associated with the
model prediction.
• They give you the flexibility to select a specific exceedance
probability—rather than a density number—to function as the beach
notification threshold value.
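One common way to turn a density prediction into an exceedance probability is to treat the model's error on the log10 scale as normally distributed. This sketch uses that approach as an assumption; it is not a description of VB3's internal method. The standard error of 0.3 log10 units is a hypothetical value that would, in practice, come from the fitted model.

```python
import math


def exceedance_probability(predicted_log10, threshold_cfu=235.0, std_error=0.3):
    """Probability that the true E. coli density exceeds threshold_cfu.

    Assumes (hypothetically) that the true log10 density is normally
    distributed around the model prediction with standard deviation std_error.
    """
    z = (math.log10(threshold_cfu) - predicted_log10) / std_error
    # P(true > threshold) = 1 - Phi(z), with Phi the standard normal CDF.
    return 0.5 * math.erfc(z / math.sqrt(2))


# A prediction exactly at the threshold gives a 50 percent chance of exceedance.
print(round(exceedance_probability(math.log10(235.0)), 2))  # 0.5
# A prediction of 100 CFU/100mL gives a lower probability.
print(round(exceedance_probability(math.log10(100.0)), 3))
```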
If you choose exceedance probability as your model output, you must define
a specific probability percentage to function as a notification threshold
value. In general, try to select the lowest (most conservative) exceedance
probability threshold that produces the most correct responses and the
fewest false negative responses. Recall that false negatives (Type 2 errors)
are especially troubling because the model is predicting the water is safe for
swimming when, in fact, the water is unsafe.
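The selection rule can be sketched as a sweep over candidate thresholds. The ranking order used here (most correct responses, then fewest false negatives, then the lowest threshold) is one reasonable reading of the guidance. The data are hypothetical, not the Huntington Beach values.

```python
def score_threshold(pairs, threshold):
    """Count (correct responses, false negatives) for one probability threshold.

    pairs: (predicted exceedance probability, observed exceedance True/False)
    """
    correct = false_neg = 0
    for prob, exceeded in pairs:
        predicted = prob >= threshold  # notify at or above the threshold
        if predicted == exceeded:
            correct += 1
        elif exceeded:  # observed exceedance the model missed (Type 2 error)
            false_neg += 1
    return correct, false_neg


def pick_threshold(pairs, candidates):
    """Most correct responses first, then fewest false negatives, then lowest threshold."""
    def rank(t):
        correct, false_neg = score_threshold(pairs, t)
        return (-correct, false_neg, t)
    return min(candidates, key=rank)


# Hypothetical testing data: (probability, observed exceedance).
pairs = [(0.10, False), (0.20, False), (0.25, True), (0.35, False),
         (0.40, True), (0.60, True), (0.05, False), (0.30, True)]
print(pick_threshold(pairs, [0.2, 0.3, 0.4, 0.5]))  # 0.2
```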
Continuing with the Huntington Beach 2000-2004 model example, Francy
and Darner (2006) concluded that a threshold probability of 29 percent
would provide the best balance of correct responses and false negative
responses. Figure 9 is a plot of threshold exceedance prediction and
observed E. coli density using the 2005 testing data set. The quadrants in the
chart are defined by the state standard of 235 CFU/100mL (vertical line) and
the probability of exceedance threshold of 29 percent (horizontal line). The
performance statistics from this plot are:
• Accuracy = 82.0 percent
• Specificity = 88.1 percent
• Sensitivity = 50.0 percent
[Figure: observed E. coli in CFU/100mL (horizontal axis, log scale) vs. predicted probability of exceedance in percent (vertical axis), divided into quadrants by the notification threshold value (235 CFU/100mL, vertical line) and the 29-percent threshold (horizontal line). Number of responses: False Positive (5), Correct Nonexceedance (37), Correct Exceedance (4), False Negative (4).]
Figure 9. Plot of predictive model results of 2005 data expressed as exceedance probability threshold (adapted from Francy and Darner 2006).
Using this approach, you can establish a beach management protocol that
requires the issuance of a notification if the model predicts a probability of
exceedance of 29 percent or greater.
Credit: Chelsi Hornbaker/USFWS
Alternatives to MLR Modeling
MLR models are a popular predictive tool used by beach programs, but they are not useful or appropriate
for all beaches. If for some reason MLR modeling is not right for your beach, you can explore other
alternatives, including:
Rainfall Alerts
This predictive tool is based exclusively on the positive correlation that sometimes exists between rainfall
and FIB densities: As rainfall totals increase and contaminated runoff reaches the receiving water, there
is a predictable corresponding increase in FIB density at the beach. By factoring in a beach notification
threshold, you can predict exceedances of the threshold using a combination of storm duration and
cumulative precipitation data. Rainfall-based thresholds are derived by simple regression or a frequency of
exceedance analysis. They represent the oldest approach to predictive modeling and are actively used at
many beaches in the U.S.
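A frequency-of-exceedance analysis takes only a few lines. For each candidate rainfall cutoff, find how often the notification threshold was exceeded on historical days at least that wet, then choose the smallest cutoff whose exceedance frequency reaches a target. The 24-hour rainfall window, the 75 percent target, and the data below are hypothetical. A real analysis would also consider storm duration, as described above.

```python
def exceedance_frequency(records, cutoff):
    """Share of historical days with rainfall >= cutoff (inches) that exceeded the FIB threshold."""
    outcomes = [exceeded for rain, exceeded in records if rain >= cutoff]
    if not outcomes:
        return None  # no historical days this wet
    return sum(outcomes) / len(outcomes)


def choose_rainfall_alert(records, cutoffs, target):
    """Smallest rainfall cutoff whose historical exceedance frequency reaches target."""
    for cutoff in sorted(cutoffs):
        freq = exceedance_frequency(records, cutoff)
        if freq is not None and freq >= target:
            return cutoff
    return None


# Hypothetical history: (24-hour rainfall in inches, threshold exceeded?).
history = [(0.0, False), (0.1, False), (0.3, False), (0.5, True),
           (0.6, False), (1.0, True), (1.2, True), (2.0, True)]
print(choose_rainfall_alert(history, [0.25, 0.5, 0.75, 1.0], target=0.75))  # 0.5
```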
Partial Least Squares Models
Partial least squares (PLS) regression models can be used as an alternative to MLR models if there is a
large number of independent variables that are not well understood; have poor linear correlation with the
response variable; or have problems with multicollinearity among the independent variables. The primary
objective of PLS regression remains the same as MLR: a model that accurately predicts FIB concentration
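given a set of independent variables. What differs is how the variables are used: rather than regressing FIB
directly on the individual variables, PLS builds a few composite variables (components), each a weighted
combination of all the independent variables chosen to have maximum covariance with FIB, and regresses FIB
on those components. VB3 includes PLS regression as an optional modeling approach, and it is described in detail in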
Virtual Beach 3.0.4: User's Guide (Cyterski et al. 2013).
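To make the difference concrete, the sketch below fits a single-component PLS regression (PLS1) in plain Python. It is a teaching sketch, not VB3's implementation. Real PLS models usually extract several components and choose how many by cross-validation.

```python
import math


def pls1_one_component(X, y):
    """Fit a one-component PLS regression; return (predict function, coefficients)."""
    n, p = len(y), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]  # centered predictors
    yc = [v - y_mean for v in y]                                 # centered response

    # Weight vector: direction in predictor space with maximum covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(wj * wj for wj in w))
    if norm == 0:
        raise ValueError("predictors have no covariance with the response")
    w = [wj / norm for wj in w]

    # Component scores, then regress the response on that single component.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    coefs = [wj * q for wj in w]  # coefficients on the original predictors

    def predict(row):
        return y_mean + sum(c * (row[j] - x_means[j]) for j, c in enumerate(coefs))

    return predict, coefs


# Two perfectly collinear predictors: ordinary least squares cannot separate them,
# but a single PLS component still fits the response.
predict, coefs = pls1_one_component([[1, 2], [2, 4], [3, 6], [4, 8]], [1, 2, 3, 4])
print([round(c, 3) for c in coefs])  # [0.2, 0.4]
print(round(predict([5, 10]), 3))    # 5.0
```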
Decision Trees
In general, decision trees work best when FIB levels are primarily influenced by only a few factors. They
are basically a series of yes/no questions concerning conditions that influence FIB density. The "tree" is
typically portrayed visually as a flow chart with binary decision node "branches." The questions with the
highest importance generally appear at the top of the tree. By moving down the tree and answering the
set of ordered questions, you are ultimately led to a beach notification classification in the simplest form,
either "issue a notification" or "don't issue a notification." Decision trees range from simple to complex,
depending on the number of decision nodes and classification endpoints.
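A decision tree translates directly into nested yes/no questions. The variables and cutoffs below (24-hour rainfall, turbidity, and wave height) are hypothetical placeholders. In practice, both the questions and their order come from your own data.

```python
def tree_notification(rain_24h_in, turbidity_ntu, wave_height_ft):
    """Walk a small hypothetical decision tree to a notification decision."""
    # Most important question at the top of the tree.
    if rain_24h_in > 0.5:
        # Wet branch: in this hypothetical tree, turbidity decides.
        if turbidity_ntu > 20:
            return "issue a notification"
        return "don't issue a notification"
    # Dry branch: in this hypothetical tree, wave height decides.
    if wave_height_ft > 3:
        return "issue a notification"
    return "don't issue a notification"


print(tree_notification(rain_24h_in=1.0, turbidity_ntu=35, wave_height_ft=1))
# issue a notification
```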
Gradient Boosting Machine
The "gradient boosting machine" (GBM) is a computerized approach to constructing a large hierarchical
set of simple decision trees for making FIB predictions. Similar to PLS regression, it is an alternative to
MLR if there are a large number of independent variables that might not be well understood; have poor
linear correlation with the response variable; or have problems with multicollinearity among independent
variables. VB3 includes GBM as an optional modeling approach, and it is described in detail in Virtual Beach
3.0.4: User's Guide (Cyterski et al. 2013).
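The core idea of gradient boosting can be sketched with the simplest possible trees: one-split "stumps." For squared-error loss, each new stump is fitted to the residuals left by the stumps before it, and its contribution is shrunk by a learning rate. This is a teaching sketch under those assumptions, not the GBM implementation in VB3. The data are hypothetical.

```python
def fit_stump(X, residuals):
    """Best single split (feature, cutoff) for the residuals, by least squares."""
    best = None
    for f in range(len(X[0])):
        for cut in sorted({row[f] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[f] <= cut]
            right = [r for row, r in zip(X, residuals) if row[f] > cut]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, f, cut, lv, rv)
    if best is None:  # no usable split: fall back to the mean residual
        mean = sum(residuals) / len(residuals)
        return lambda row: mean
    _, f, cut, lv, rv = best
    return lambda row: lv if row[f] <= cut else rv


def fit_gbm(X, y, n_trees=100, learning_rate=0.1):
    """Gradient boosting with stumps for squared-error loss."""
    base = sum(y) / len(y)
    stumps, pred = [], [base] * len(y)
    for _ in range(n_trees):
        # For squared error, the negative gradient is simply the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(row) for pi, row in zip(pred, X)]
    return lambda row: base + learning_rate * sum(s(row) for s in stumps)


# Hypothetical data: one predictor and a log10 E. coli response.
X = [[i] for i in range(10)]
y = [1.0] * 5 + [3.0] * 5
model = fit_gbm(X, y)
print(round(model([2]), 2), round(model([8]), 2))  # 1.0 3.0
```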
Artificial Neural Network
An "artificial neural network" is software that attempts to mimic the working of the biological neural
network. Still in the research phase, it presents potentially another alternative for dealing with a large
amount of independent variables that might not be well understood; have poor linear correlation with the
response variable; or have problems with multicollinearity among independent variables. The technique
incorporates an algorithm that allows it to "learn" relationships between inputs and outputs.
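The "learning" an artificial neural network does can be illustrated with a very small network trained by backpropagation. The sketch below is a generic textbook example in plain Python, with hypothetical data and settings (four hidden units, learning rate 0.05). It is not a beach model and not any specific research implementation.

```python
import math
import random


def train_tiny_ann(X, y, hidden=4, lr=0.05, epochs=500, seed=0):
    """Train a one-hidden-layer network (tanh units, linear output) by
    stochastic gradient descent on squared error; return a predict function."""
    rng = random.Random(seed)
    p = len(X[0])
    w1 = [[rng.uniform(-1, 1) for _ in range(p)] for _ in range(hidden)]  # input -> hidden
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden -> output
    b2 = 0.0

    def forward(row):
        h = [math.tanh(sum(w * x for w, x in zip(w1[j], row)) + b1[j]) for j in range(hidden)]
        return h, sum(w2[j] * h[j] for j in range(hidden)) + b2

    for _ in range(epochs):
        for row, target in zip(X, y):
            h, out = forward(row)
            err = out - target  # derivative of 0.5 * err**2 with respect to the output
            for j in range(hidden):
                # Backpropagate through the tanh unit before updating w2[j].
                grad_h = err * w2[j] * (1 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                for k in range(p):
                    w1[j][k] -= lr * grad_h * row[k]
                b1[j] -= lr * grad_h
            b2 -= lr * err

    return lambda row: forward(row)[1]


def mse(model, X, y):
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


# Hypothetical scaled inputs (two explanatory variables) and response.
X = [[i / 10, (i % 3) / 3] for i in range(10)]
y = [math.sin(3 * a) + 0.5 * b for a, b in X]
untrained = train_tiny_ann(X, y, epochs=0)
trained = train_tiny_ann(X, y)
print(mse(untrained, X, y), mse(trained, X, y))
```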
Step 5: Integrate the Predictive Tool into a Beach Monitoring and Notification Program
Introduction to Step 5
Once your Beach Team has developed the predictive tool, they have to
integrate it into their beach monitoring and notification program. Model
outputs will typically be either estimated FIB levels or a probability that the
beach notification threshold will be exceeded. The method your Beach Team
selects to integrate the model will depend on several things, including
the model's accuracy and the availability of resources. Some questions to ask
as you consider an integration strategy include:
• How will you use model results to determine beach notifications?
• Will advisories be posted based solely on model results or on a
combination of models and sampling?
• What do you do if the model predicts an exceedance?
• What do you do if sampling results and model results conflict?
• Will you verify model results before posting advisories?
• Will you use a model to remove an advisory or reopen a beach?
• How often will the model be used during the beach season?
• Will you run the model on weekdays and weekends?
• What time of day will you run the model?
As you can see, you must consider many factors when deciding how best
to integrate predictive tools into your beach monitoring and notification
program. EPA recommends that you use a predictive tool to complement
traditional monitoring. A predictive tool cannot completely replace
sampling, but it might allow you to reduce the frequency of sampling. Data
from culture samples can be used as a basis for models that provide timely
results in a cost-effective manner. Predictive tools might also be useful in
developing or adapting routine monitoring programs to focus sampling
efforts when conditions (e.g., rain events) correlate with high FIB levels.
You might choose to issue a beach notification if the model predicts an
exceedance of a beach notification threshold, if sampling results are above
the threshold, or both. If you occasionally use model results in conjunction
with sampling results, consider what to do if the model predictions and
sampling results conflict.
Once you have issued a beach notification, you must decide the process for
removing an advisory. Will you rerun the model with more current data?
Will you collect additional samples? The National Beach Guidance (USEPA
2014) recommends lifting actions that were imposed based on the output
of a predictive model after an additional model run estimates that water
quality conditions have improved to within acceptable parameters.
Frequency of Running the Model
Your Beach Team must decide how often to run the model. Consider
resources available to collect data, run the model, and post results. Running
the model daily might be ideal, but is not always practical. You might want
to have the results available on the weekends when the most people are using
the beach; however, you might not have staff available to collect the data and
run the model. Many beach programs that use predictive models run them
on weekdays while some also run them on weekends.
Notification Protocols
As you consider all the factors that are important in determining beach
notifications, you will use them to develop a protocol for making beach
notification decisions. "Notification protocol" is a general term used
to describe a set of questions or decision points that a beach manager
routinely uses to determine whether to issue a notification or close a beach.
Notification protocols can be simple or complex, but should include all of
the decisions that your Beach Team needs to make after collecting samples
or running a predictive model. The protocol can include the decisions
needed after a pollution event (e.g., a CSO or SSO discharge) occurs or
hazardous conditions (e.g., strong rip currents, red tide) are discovered that
might affect whether the beach should be open, closed, or under an advisory. An
example of a notification protocol for a beach that uses sampling results and
a predictive model is shown in Figure 10.
[Figure: notification flowchart. 8:00 a.m. (the previous day): collect sample and send to lab for analysis of FIB. 8:00 a.m. (day of sample collection): collect input variables and run model. Decision points: Did a pollution event (CSO, harmful algal bloom) occur? Does previous day's sample result exceed beach notification threshold? Does model predict exceedance of beach notification threshold? Outcomes: "Issue advisory" (Yes) or "Beach is open."]
Figure 10. Notification protocol for a beach program that uses sampling results and a
predictive model to make notification decisions.
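The decision points in Figure 10 can be written as a short function. This sketch assumes that a "yes" at any decision point leads to an advisory, and it uses the 235 CFU/100mL value from the earlier example as the default threshold. The function name is illustrative.

```python
def notification_decision(pollution_event, previous_sample_cfu, model_predicts_exceedance,
                          threshold_cfu=235.0):
    """Combined sampling-and-model protocol: a "yes" at any decision point issues an advisory."""
    if pollution_event:  # e.g., CSO discharge or harmful algal bloom
        return "Issue advisory"
    if previous_sample_cfu > threshold_cfu:  # previous day's culture result
        return "Issue advisory"
    if model_predicts_exceedance:  # today's model run
        return "Issue advisory"
    return "Beach is open"


print(notification_decision(False, 120.0, True))  # Issue advisory
```

The model-only protocol in Figure 11 is the same function without the sample-result check.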
Some beach program managers might choose to use the predictive model
alone to make decisions on notification actions, without considering
sampling results when making those decisions. In that case sample results
might be used only to verify the model is making accurate predictions and
to recalibrate or update the model over time. An example of a notification
protocol for this approach is shown in Figure 11, which is much simpler than
the protocol shown in Figure 10.
[Figure: notification flowchart. 8:00 a.m. (day of sample collection): collect input variables and run model. Decision points: Does model predict exceedance of beach notification threshold? Did a pollution event (CSO, harmful algal bloom) occur? Outcomes: "Issue advisory" or "Beach is open."]
Figure 11. Notification protocol for a beach that uses only model results to make
notification decisions.
You should also explore whether you need different notification protocols
for different seasons or for different parts of the beach season (e.g., if there
is a dry part and a wet part to the beach season). In the case study for the
South Shore model, the MHD found that their Environmental Monitoring
for Public Access and Community Tracking (EMPACT) model was less
accurate as the beach season progressed, suggesting there was some level
of seasonality or unidentified influences to water quality between the
beginning and the end of the beach season.
Types of Beach Notifications
A beach advisory is the most common beach notification based on the use
of a predictive tool. However, the following types of notifications might be
appropriate at certain times.
Beach Advisories
When a model predicts the exceedance of a water quality standard, many
beach managers issue a beach advisory, which warns beachgoers that the
FIB density is above the water quality standard and swimming and wading
are not recommended.
Beach Closings
Modeling results might lead you to decide that water quality conditions are
poor enough to warrant closing the beach rather than issuing an advisory.
If you close your beach, you might choose to continue running your model
regularly to determine when FIB
levels are low enough to reopen,
thereby minimizing the number of
closure days.
Preemptive Advisories
The exploratory data analysis will
give you a good idea of what events
(such as heavy rainfall or CSOs) are
correlated with higher FIB levels at
your beach; as a result, you might
decide to issue preemptive advisories
or closures based on those events.
For example, if you know that a
1-inch rainfall generally causes an
exceedance of the notification threshold and the weather forecast is calling
for more than 1 inch of rain overnight, a preemptive advisory could be
issued based on what you already know about rain events and exceedances.
You would not need to run the model to issue a preemptive advisory or
closure.
Permanent Advisories
Some beach managers issue permanent advisories when a certain type of
event is highly correlated with elevated FIB levels. A predictive tool can help
determine whether a permanent advisory is necessary. An example of using
this type of advisory is when FIB levels often exceed water quality standards
after almost any amount of rainfall. In that case, you might choose to issue
a permanent advisory that swimming should be avoided for a certain period
after any rainfall has occurred.
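Both event-based rules reduce to simple checks. The 1-inch trigger comes from the preemptive advisory example above. The 48-hour window for the permanent advisory is a hypothetical value; each program would set its own period based on its data. The function names are illustrative.

```python
from datetime import datetime, timedelta


def preemptive_advisory(forecast_rain_in, trigger_in=1.0):
    """Issue an advisory ahead of time if forecast rain exceeds the trigger (inches)."""
    return forecast_rain_in > trigger_in


def within_rain_advisory(last_rain_end, now, window=timedelta(hours=48)):
    """Permanent advisory: avoid swimming for a set period after any rainfall."""
    return last_rain_end is not None and now - last_rain_end < window


rain_ended = datetime(2015, 7, 1, 6, 0)
print(preemptive_advisory(1.5))                                      # True
print(within_rain_advisory(rain_ended, datetime(2015, 7, 2, 6, 0)))  # True (24 hours later)
```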
Public Communication
Developing a predictive tool does not necessarily require public
involvement. Much of the process involves scientific and technical
expertise and centers around the staff and resources of state and local
agencies and public health departments. However, predictive modeling stems
from the need to protect public health, and much can be gained from
involving the public.
Public Education
Public education is an important part of the outreach process. Outreach
often involves teaching the public about beach health and safety—what an
advisory means, what health risks exist, and what precautions should be
taken. When you are using a predictive model, you need to also explain the
use of the model to the public. Some general questions and answers useful
for public education include:
What is a predictive model? Predictive models are a means of predicting
or forecasting water quality conditions in the absence of a current water
sample. Beach managers assess previous sampling data to determine
which factors affect water quality. The model uses these factors to
estimate water quality under current conditions.
Why use a model? Predictive models are most useful in increasing the
timeliness of beach notifications, conserving resources by reducing
sampling, and improving the accuracy of identifying notification days by
adding to the existing monitoring program.
How accurate are models? The accuracy of a model depends on the data
on which it is based and local conditions. A thorough understanding of
the beach environment and a strong data set can support accurate and
reliable models. Models should be routinely verified and validated by
sampling and laboratory analysis, and continuously updated based on
sampling results.
How does the model change postings and advisories? With the use of
a model, postings and advisories can be updated more frequently and
provide real-time estimates of water quality at beaches.
Does this mean samples are no longer collected and analyzed? Water
samples are still collected regularly and analyzed for FIB, both to
determine the water quality and to verify and update the model.
The Ohio Nowcast webpage
http://www.ohionowcast.info/index.asp is a great example
of an outreach website and
includes detailed information
for the public, such as:
• Where Nowcast is used: detailed maps.
• How Nowcast works.
• How Nowcast performs.
• Accuracy of Nowcast for
each beach.
• List of variables used to
make predictions.
• List of current advisories.
• FAQs.
How does this improve public health protection? Beach managers
are able to predict water quality and post advisories in a more timely
manner to prevent illnesses associated with recreating in waters with high
densities of bacteria or pathogens.
Public Outreach
Public outreach involves directly communicating with the public about
beach health and safety. You should consider whether notifications
and advisories are easily accessible and whether you are effectively
communicating key information. The National Beach Guidance (USEPA
2014) discusses a number of possible formats for conducting outreach. The
Chicago Park District has an especially good outreach program, which
includes a public education campaign and a Beach Ambassadors program
(see the case study for more information).
Other Uses for Predictive Models
A predictive model might provide other benefits to a beach program besides
being used for notifications. For example, the Michigan Department of
Natural Resources uses beach models as a tool to identify and remediate
sources of contamination to assist with Total Maximum Daily Load (TMDL)
development for beaches.
Step 6: Evaluate the Predictive Tool over Time
Introduction to Step 6
You should plan to evaluate your model periodically to verify that the
performance goals are being met. Many programs choose to assess model
accuracy at the end of the beach season. Any significant decreases in
performance might signal that environmental conditions that affect FIB
density have changed. For example, the season might have been unusually
wet or dry. In this case you might want to conduct more exploratory data
analysis (Step 3) and build a new model (Step 4) using the past season's data
as part of the historical database. Your "rebuilt" updated model may or may
not include the same explanatory variables. The overall goal is to keep the
model current with the environmental conditions that affect FIB density at
the beach.
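An end-of-season check can reuse the same statistics. The sketch below, with hypothetical names and data, tallies a season's observed densities against the model's yes/no predictions. It then recomputes accuracy, specificity, and sensitivity and lists any statistic that fell below its goal, which would be a signal to revisit Steps 3 and 4. The goal values repeat the Huntington Beach example. An undefined statistic (for example, sensitivity in a season with no observed exceedances) is reported as a shortfall so that it gets a closer look.

```python
import math


def season_review(records, goals, threshold_cfu=235.0):
    """records: (observed E. coli in CFU/100mL, model predicted exceedance True/False)."""
    tp = fn = fp = tn = 0
    for observed, predicted_exceed in records:
        observed_exceed = observed > threshold_cfu
        if observed_exceed and predicted_exceed:
            tp += 1  # correct exceedance
        elif observed_exceed:
            fn += 1  # false negative (Type 2 error)
        elif predicted_exceed:
            fp += 1  # false positive
        else:
            tn += 1  # correct non-exceedance
    stats = {
        "accuracy": 100 * (tp + tn) / len(records),
        "specificity": 100 * tn / (tn + fp) if tn + fp else math.nan,
        "sensitivity": 100 * tp / (tp + fn) if tp + fn else math.nan,
    }
    # NaN >= goal is False, so undefined statistics are flagged as well.
    shortfalls = [name for name, goal in goals.items() if not stats[name] >= goal]
    return stats, shortfalls


goals = {"accuracy": 81.0, "specificity": 94.0, "sensitivity": 50.0}
season = [(100, False), (50, False), (300, True), (400, False), (20, True), (500, True)]
stats, shortfalls = season_review(season, goals)
print(shortfalls)  # ['accuracy', 'specificity']
```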
Several of the case studies at the end of this guide describe situations that
required officials to adjust their model in response to changing conditions or
circumstances.
• South Carolina Department of Health and Environmental Control
updated their stormwater model by using radar data from NexRad
instead of the data they previously obtained from rain gauges.
• Milwaukee Health Department is updating their Nowcast model by
collaborating with a new partner for their local expertise and using
improved data at three beach sites instead of at the one beach where
the model was initially used. They hope to automate data integration,
translation, and loading to improve the efficiency of their model.
• The City of Racine plans to update their model every year to ensure it
is still predictive. They also will continue to evaluate whether they can
decrease monitoring frequency.
• Charles River Watershed Association has continued to enhance its
model over the past 15 years, continually looks for additional parameters
that may improve model predictions, and has a
future goal of real-time data collection for a real-time model.
Changes to the Fate and Transport of FIB
The predictive tools described in this guidance assume that the relationships
between FIB and the environmental conditions associated with the
explanatory variables remain constant over time. This is almost never the
case, however, because landscapes and human activities change over time
and may affect bacteria sources and their movement through the drainage
area. An annual sanitary survey of your beach would likely capture many
of these changes. Some of the factors that affect FIB movement include the
following:
• Land use alterations.
• Infrastructure changes (e.g., repairs to leaky sewer lines).
• Changes to bounding structures (e.g., jetties, breaker walls, piers).
• Changes in pollutant sources (e.g., increase or decrease in algal blooms
or presence of wildlife).
All of these factors can cause shifts in the underlying processes influencing
FIB densities at your beach.
Ozaukee County, Wisconsin
The Ozaukee County Public Health Department developed a model for a lake. In
2012, they experienced unusual weather conditions: no rain fell, and the lake
temperatures were very warm. The biology and ecology of the lake changed, and
the nearshore environment became the source of high FIB densities. Advisories
were issued for about one third of the 2012 beach season, and the model was
found to be only 60 percent accurate. A revised model would only be useful if
these conditions become a trend.
Changes to Data Sources
Step 2 included a discussion of some of the key attributes of the data
needed to build and operate the model to make same-day FIB predictions.
In general, independent variable data need to be collected in a manner
consistent with the historical data used to build the model. Additionally,
data collected locally are preferred over data obtained from external or
online sources, primarily because your model is site-specific. In reality,
however, your choice of data sources is often driven by the availability of
funding and resources. Using data readily available online is much less
expensive and resource intensive to obtain than deploying and maintaining
your own system of rain gauges, weather stations, water quality sondes, and
other equipment. For example, USGS is working with VB developers to make a
variety of explanatory data collected by Federal agencies easier for users
to access and process using the EnDDaT online system. As
described in the Stormwater Model (Horry County, South Carolina) case
study, the SCDHEC initially used rainfall data collected at local rain gauges
for their predictive models, but over time they switched to using NexRad
data, which eliminated the need for updates to and maintenance of the rain
gauges, while also improving timeliness and accuracy of the model. In other
cases, a beach program might have originally used data from NWS but
plans to install local rain gauges to get more accurate rainfall measurements
for their beach. The MHD initially collected data for its predictive model
using a sonde, but because of high maintenance costs, they chose to use
NWS rainfall data and the previous day's E. coli concentrations—along with
sanitary survey data, which provided additional insight on weather, rainfall,
algae content, litter, and wildlife.
If the data source changes, you will need to collect enough data to rebuild
your model (see the recommendations on amount of data in Step 1), because the
relationships between the new independent variable data and FIB will likely
differ from the relationships in the original model.
Changes to Your Beach Program
The needs of your beach program and the availability of resources can also
change over time. You will need to reevaluate your beach program and its
need for a predictive tool and assess whether you have the resources to meet
that need.
You also should evaluate your notification protocol over time to make sure
it is still appropriate for making the best decisions about beach notifications.
For example, if model results are highly accurate, a beach program that
initially used both sampling results and modeling results to make beach
notification decisions might decide to rely solely on modeling results for
their beach. In that case, they might limit sampling to the days on which
the model predicts an exceedance of the water quality standard or other
notification threshold.
Credit: Ryan Hagerty/USFWS
Bibliography
Alm, E.W., J. Burke, and A. Spain. 2003. Fecal indicator bacteria are
abundant in wet sand at freshwater beaches. Water Research 37:3978-3982.
APHA (American Public Health Association). 1998. Standard Methods for
the Examination of Water and Wastewater, 20th ed. American Public
Health Association, Washington, DC.
Biedrzycki, Paul, Disease Control and Environmental Health, City of
Milwaukee Health Department. 2012-2013. Personal communication.
Boehm, A.B., R.L. Whitman, M.B. Nevers, D. Hou, and S.B. Weisberg.
2007. Nowcasting recreational water quality. In Statistical Framework
for Recreational Water Quality Criteria and Monitoring, ed. L. Wymer.
Wiley-Interscience, Chichester, West Sussex, England.
Breitenbach, Cathy, Chicago Parks District. 2012. Personal communication.
Briggs, Shannon, Michigan Department of Environmental Quality. 2012.
Personal communication.
Brooks, W.R., M.N. Fienen, and S.R. Corsi. 2013. Partial least squares
for efficient models of fecal indicator bacteria on Great Lakes beaches.
Journal of Environmental Management 114:470-475.
Charles River Watershed Association. Charles River Water Quality
Notification Flagging Program.
http://www.crwa.org/field-science/water-quality-notification.
Chicago Park District. 2012. Chicago Park District Improves Beach
Monitoring for 2012 Season, http://www.chicagoparkdistrict.com/
chicago-park-district-improves-beach-monitoring-for-2012-season.
Cicero, K. The 10 Best Beaches for Families: 2011. Parents Magazine. June
2011. Accessed January 22, 2013. http://www.parents.com.
Clark, J., M. Hortobagyi, and K.B. Yancey. Just for Summer: 51 Great
American Beaches. USA Today. March 27, 2012. Accessed January 22,
2013. http://travel.usatoday.com.
Converse, R.R., J.L. Kinzelman, E.A. Sams, E. Hudgens, A.P. Dufour, H.
Ryu, J.W. Santo-Domingo, C.A. Kelty, O.C. Shanks, S.D. Siefring, R.A.
Haugland, and T.J. Wade. 2012. Dramatic Improvements in Beach Water
Quality Following Gull Removal. Environmental Science and Technology
46:10206-10213.
Cyterski, M., W. Brooks, M. Galvin, K. Wolfe, R. Carvin, T. Roddick, M.
Fienen, S. Corsi. 2013. Virtual Beach 3.0.4: User's Guide. National
Exposure Research Laboratory, U.S. Environmental Protection Agency,
Athens, GA and U.S. Geological Survey, Middleton, WI.
Eleria, A. and R.M. Vogel. 2005. Predicting fecal coliform bacteria levels in
the Charles River, Massachusetts, USA. Journal of the American Water
Resources Association. No. 03111. October 2005.
Francy, D. 2009. Use of predictive models and rapid methods to nowcast
bacteria levels at coastal beaches. Aquatic Ecosystem Health and
Management 12(2):177-182.
Francy, D.S., and R.A. Darner. 2006. Procedures for Developing Models to
Predict Exceedances of Recreational Water Quality Standards at Coastal
Beaches: U.S. Geological Survey Techniques and Methods 6-B5, 34 p.
Francy, D.S., A.M.G. Brady, R.B. Carvin, S.R. Corsi, L.M. Fuller, J.H.
Harrison, B.A. Hayhurst, J. Lant, M.B. Nevers, P.J. Terrio, and T.M.
Zimmerman. 2013a. Developing and Implementing Predictive Models for
Estimating Recreational Water Quality at Great Lakes Beaches. Scientific
Investigations Report 2013-5166. U.S. Geological Survey, Reston, VA.
Accessed March 2015. http://pubs.usgs.gov/sir/2013/5166/pdf/sir2013-5166.pdf.
Francy, D.S., E.A. Stelzer, J.W. Duris, A.M.G. Brady, and J.H. Harrison.
2013b. Predictive Models for Escherichia coli Concentrations at Inland
Lake Beaches and Relationship of Model Variables to Pathogen Detection.
USGS Staff-Published Research. Paper 706.
Fulton, Jeff. No date. Public Beaches in Chicago. USA Today.
http://traveltips.usatoday.com/public-beaches-chicago-53741.html.
Hansen, D.L., S. Ishii, M.J. Sadowsky, and R.E. Hicks. 2011. Waterfowl
abundance does not predict the dominant avian source. Journal of
Environmental Quality 40:1924-1931.
Hartmann, J.W., S.F. Beckerman, R.M. Engeman, and T.W. Seamans. 2013.
Report to the City of Chicago on Conflicts with Ring-billed Gulls and the
2012 Integrated Ring-billed Gull Damage Management Project. USDA
National Wildlife Research Center, Staff Publications. Paper 1145.
Helsel, D.R. and R.M. Hirsch. 2002. Statistical Methods in Water Resources.
Elsevier Publishing.
Hou, D., S.J.M. Rabinovici, and A.B. Boehm. 2006. Enterococci Predictions
from Partial Least Squares Regression Models in Conjunction with a
Single-Sample Standard Improve the Efficacy of Beach Management
Advisories. Environmental Science and Technology 40(6):1737-1743.
Kesteloot, K., A. Azizan, R. Whitman, and M. Nevers. 2012-2013. New
recreational water testing alternatives. Park Science 29(2).
Kinzelman, Julie, City of Racine. 2012-2013. Personal communication.
Kurdas, Stephan, City of Racine. 2012-2013. Personal communication.
Mas, D.M.L., and K. Baker. Fuss and O'Neill. BIT Guidance for Developing
Predictive Models for Ontario Beaches. Ontario Ministry of the
Environment. Toronto, Ontario Canada. February 2011.
Mednick, A.C. 2009. Accessing Online Data for Building and Evaluating Real-
Time Models to Predict Beach Water Quality. Publication PUB-SS-1063.
Wisconsin Department of Natural Resources, Madison, WI. Accessed
March 2015. http://dnr.wi.gov/files/PDF/pubs/ss/SS1063.pdf.
Mednick, Adam, Wisconsin Department of Natural Resources. 2012.
Personal communication.
NRDC (Natural Resources Defense Council). Testing the Waters: South
Carolina. http://www.nrdc.org/water/oceans/ttw/sc.asp.
Olyphant, G.A., and R.L. Whitman. 2004. Elements of a predictive model
for determining beach closures on a real time basis: The case of 63rd
Street Beach Chicago. Environmental Monitoring and Assessment
98(1-3):175-190.
Our 7 Top Midwest City Beaches. Midwest Living Magazine. July-August
2010. Accessed January 22, 2013. http://www.midwestliving.com.
Porter, Dwayne, University of South Carolina. 2012. Personal
communication.
Rockwell, D., K. Campbell, G. Lang, D. Schwab, G. Mann, and R.
Wagenmaker. 2013. Beach Water Quality Decision Support System.
Technical Memorandum GLERL-156. National Oceanic and
Atmospheric Administration, Ann Arbor, MI. Accessed March 2015.
http://www.glerl.noaa.gov/ftp/publications/tech_reports/glerl-156/
tm-156.pdf.
Rockwell, David, University of Michigan. 2012. Personal communication.
Schwab, D.J., and K.W. Bedford. 1994. The Great Lakes Forecasting System.
In Coastal and Estuarine Studies: Coastal Ocean Prediction, ed. C.N.K.
Mooers. American Geophysical Union, Washington, DC.
Seltman, H.J. 2013. Experimental Design and Analysis, Chapter 4:
Exploratory Data Analysis. June 10, 2013.
South Carolina Department of Health and Environmental Control. Beach
Monitoring Program.
http://www.scdhec.gov/HomeAndEnvironment/Pollution/
DHECPollutionMonitoringServices/BeachMonitoring/.
Southeast Coastal Ocean Observing Regional Association. Water Quality
Observations and Models Help Managers Make Decisions on Issuing
Swim Advisories. www.secoora.org.
Torrens, Sean, South Carolina Department of Health and Environmental
Control. 2012-2013. Personal communication.
USEPA (U.S. Environmental Protection Agency). 1999. Action Plan
for Beaches and Recreational Waters. EPA 600/R-98-079. U.S.
Environmental Protection Agency, Office of Research and Development
and Office of Water, Washington, DC.
USEPA (U.S. Environmental Protection Agency). 1999. Review of Potential
Modeling Tools and Approaches to Support the BEACH Program. EPA-
823-R-99-002. U.S. Environmental Protection Agency, Office of Science
and Technology, Washington, DC.
USEPA (U.S. Environmental Protection Agency). 2002. Time-Relevant Beach
and Recreational Water Quality Monitoring and Reporting. United States
Environmental Protection Agency, Office of Research and Development,
National Risk Management Research Laboratory. EPA/625/R-02/017.
October 2002. Cincinnati, Ohio.
http://www.scdhec.gov/HomeAndEnvironment/Water/SwimSafety/
USEPA (U.S. Environmental Protection Agency). 2007. Report of the Experts
Scientific Workshop on Critical Research Needs for the Development of
New or Revised Recreational Water Quality Criteria. EPA 823-R-07-
006. U.S. Environmental Protection Agency, Office of Water, Office of
Research and Development. Airlie Center, Warrenton, Virginia.
USEPA (U.S. Environmental Protection Agency). 2010a. Predictive Tools
for Beach Notification. Volume I: Review and Technical Protocol. EPA-
823-R-10-003. U.S. Environmental Protection Agency, Office of Water,
Washington, DC.
USEPA (U.S. Environmental Protection Agency). 2010b. Predictive
Modeling at Beaches. Volume II: Predictive Tools for Beach Notification.
EPA-600-R-10-176. U.S. Environmental Protection Agency, National
Exposure Research Laboratory, Athens, Georgia.
USEPA (U.S. Environmental Protection Agency). 2010c. Sampling and
Consideration of Variability (Temporal and Spatial) for Monitoring of
Recreational Waters. EPA-823-R-10-005. U.S. Environmental Protection
Agency, Office of Water, Washington, DC. Accessed March 2015.
http://www.epa.gov/sites/production/files/2015-11/documents/sampling-
consideration-recreational-waters.pdf.
USEPA (U.S. Environmental Protection Agency). 2012. Recreational Water
Quality Criteria. EPA 820-F-12-058. U.S. Environmental Protection
Agency, Office of Water, Washington, DC.
USEPA (U.S. Environmental Protection Agency). 2014. National Beach
Guidance and Required Performance Criteria for Grants. EPA-
823-B-14-001. U.S. Environmental Protection Agency, Office of Water,
Washington, DC.
Whitman, R.L. and M.B. Nevers. 2003. Foreshore Sand as a Source of
Escherichia coli in Nearshore Water of a Lake Michigan Beach. Applied
and Environmental Microbiology 69(9): 5555-5562.
Whitman, R.L., D.A. Shively, H. Pawlik, M.B. Nevers and M.N.
Byappanahalli. 2003. Occurrence of Escherichia coli and Enterococci in
Cladophora (Chlorophyta) in Nearshore Water and Beach Sand of Lake
Michigan. Applied and Environmental Microbiology 69(8):4714-4719.
Whitman, R.L., V.J. Harwood, T.A. Edge, M.B. Nevers, M. Byappanahalli,
K. Vijayavel, J. Brandao, M.J. Sadowsky, E.W. Alm, A. Crowe, D.
Ferguson, Z. Ge, E. Halliday, J. Kinzelman, G. Kleinheinz, K. Przybyla-
Kelly, C. Staley, Z. Staley, and H. Solo-Gabriele. 2014. Microbes in beach
sands: integrating environment, ecology and public health. Rev Environ
Sci Biotechnol 13:329-368.
Wood, Julie, Charles River Watershed Association. 2012-2013. Personal
communication.
Ziegler, Dan, Ozaukee County Public Health Department. 2012. Personal
communication.
Case Study
The South Shore Beach Model (Milwaukee, Wisconsin)
Introduction
South Shore Beach is in Milwaukee, Wisconsin's
South Shore Park on the western shore of Lake
Michigan. South Shore Beach is a public beach with
150 meters of sandy shoreline within the South Shore
Marina (owned and operated by the South Shore
Yacht Club). A 20-meter embankment separates the
sandy beach area from a cobble/pebble beach area
that has a high-sloping shore (South Shore Rocky
Area). The entire beach and marina area is partially
enclosed by a breakwall, approximately 300 meters
offshore, which limits wave action, water circulation,
and exchange with the outer harbor. The beach is a
few kilometers south of Milwaukee Harbor and the
Milwaukee Metropolitan Sewerage District Jones
Island Water Reclamation Facility. Three rivers (the
Milwaukee, Menomonee, and Kinnickinnic) reach
a confluence before discharging to Lake Michigan
inside the Milwaukee Harbor breakwall.
On hot summer weekend days, more than 1,000 people
visit Milwaukee's three public beaches combined:
Bradford Beach, McKinley Beach, and South Shore
Beach. South Shore Beach is home to a number of
waterfowl and shore birds given its proximity to a
public park and related greenspace. South Shore
Beach also experiences blooms of Cladophora, a
green alga native to Lake Michigan and nearshore
environments.
In 1998 the City of Milwaukee Health Department
(MHD) decided to develop a beach water quality
predictive model for purposes of (1) improving
water quality forecasting at the public beaches and
(2) improving water quality advisories and related
messaging to public beachgoers when
water quality is unsafe for public
swimming or contact because of
elevated bacteria levels. In 2005 MHD
implemented a different predictive
model, and variations of the model are
still in use today.
Water Quality
South Shore Beach has a history of
poor water quality due to elevated fecal
bacteria levels. Potential sources of
fecal bacteria contamination include
combined sewer overflows (CSOs);
urban/suburban and agricultural
runoff from the Milwaukee River
Basin; runoff from impervious surfaces,
including South Shore Park parking lots,
pedestrian sidewalks and roadways, and
marina infrastructure including docks,
slips, and boats; and domestic and wild
animal populations including Canada
geese, gulls, and other waterfowl
flocks. The beach is directly adjacent to
the South Shore Yacht Club and a small
paved parking area that drains into the
lake.
Milwaukee Harbor. (USACE)
The Natural Resources Defense Council has included
South Shore Beach several times on its list of the top
10 dirtiest beaches in the United States. One possible
contributor to the water quality problem is the
offshore breakwall (stone jetty), which was designed to
block wave action and protect the lakefront from
erosion. Unfortunately, it also limits the circulation
of fresh lake water into the shallow beach area.
Pollution that enters the relatively stagnant water
behind the breakwall through runoff near or around
the beach area is therefore not readily flushed out.
To reduce pollution entering the lake, Milwaukee
County installed a trench drain and rain garden along
a parking lot near the beach. These practices were
ineffective.
The county is considering relocating the beach 100
yards south—to the other side of the breakwall—as a
possible long-term solution to improving beach water
quality conditions during the summer season.
Model Development
MHD used two different models over time—the
EMPACT model and the Nowcast model. Both are
described here in separate subsections.
EMPACT Model
In 1998 MHD developed a statistical model for three
of its public beaches using funding awarded through
the U.S. Environmental Protection Agency (EPA)
Environmental Monitoring for Public Access and
Community Tracking (EMPACT) grant. The model is
based on 24-hour rainfall data and previous 24-hour
bacterial sampling data (E. coli, in MPN/100 mL),
the two variables found to be most predictive. The
University of Indiana and U.S. Geological Survey
(USGS) assisted MHD in developing the model.
E. coli.
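The case study names the EMPACT model's two inputs but not its functional form. The Python sketch below assumes a multiple linear regression on log10-transformed E. coli, a common choice because bacteria counts are strongly right-skewed. The function names, the log transform, and the least-squares fit are illustrative assumptions, not MHD's documented method.

```python
import numpy as np


def fit_empact_style_model(rain_24h_in, prev_ecoli, ecoli):
    """Fit log10(E. coli) = b0 + b1 * rain_24h + b2 * log10(previous E. coli)
    by ordinary least squares and return (b0, b1, b2).

    Hypothetical sketch: all E. coli values (MPN/100 mL) must be positive.
    """
    rain = np.asarray(rain_24h_in, dtype=float)
    log_prev = np.log10(np.asarray(prev_ecoli, dtype=float))
    log_y = np.log10(np.asarray(ecoli, dtype=float))
    # Design matrix columns: intercept, rainfall, log10 of the previous day's result
    X = np.column_stack([np.ones_like(rain), rain, log_prev])
    coef, *_ = np.linalg.lstsq(X, log_y, rcond=None)
    return tuple(coef)


def predict_ecoli(coef, rain_24h_in, prev_ecoli):
    """Predict E. coli (MPN/100 mL) by back-transforming the log10 estimate.
    The back-transform estimates a typical (median-like) value, not the mean."""
    b0, b1, b2 = coef
    rain = np.asarray(rain_24h_in, dtype=float)
    log_prev = np.log10(np.asarray(prev_ecoli, dtype=float))
    return 10.0 ** (b0 + b1 * rain + b2 * log_prev)
```

Coefficients fitted this way are site-specific. A model fitted to one beach's history should not be assumed to transfer to another, which is consistent with the finding that predictive variables differed between Milwaukee's beaches.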
Key factors when selecting which beach model
to further develop and refine were the amount of
funding and availability of technical support (both
data management and model development) that
could be leveraged to achieve improved predictive
water quality outcomes. The EMPACT program
significantly helped MHD take advantage of new
technologies to provide environmental risk-related
information to the public in a reliable and accurate
near real-time context.
When MHD began developing the model in 1998, staff
were initially excited about the opportunity to try new
technology for improving the accuracy of water
quality advisories; however, the project posed many
unanticipated technical and maintenance challenges.
To collect data for the model, USGS used a sonde.
A sonde is a water quality monitoring instrument
that can measure numerous parameters including
temperature, conductivity, salinity, dissolved oxygen,
pH, turbidity, and depth. The harsh lake environment
was unsuitable for long-term deployment of
instrumentation. Furthermore, MHD did not
have sufficient internal capability or resources to
adequately manage the many sampling, data
analysis, and routine equipment maintenance tasks.
In addition, budget and staff cuts made the model too
complex for a local public health agency with limited
environmental health funding to sustain.
South Shore Boat Park.
Eventually MHD exhausted all funding and related
external agency technical support, and stopped using
the EMPACT model as its primary predictive model
at the end of the 2004 beach season. The EMPACT
project provided valuable insight, however, into the
challenges of developing cost-effective and sustainable
predictive water quality models at the local level and
in the context of the Lake Michigan public beach
environment.
Nowcast Model
After the 2004 beach season, MHD decided that
a simpler Nowcast model would be more effective
and discontinued use of the EMPACT model. For
the Nowcast model, the development team selected
a single public beach in Milwaukee (South Shore
Beach), where the monitoring equipment could be
located near a secure power source, protected against
vandalism, and shielded from harsh environmental
conditions. South Shore Beach also traditionally
recorded the highest fecal indicator bacteria counts
and, therefore, represented the highest potential
health risk to the public during the typical beach
season (June-August).
It took MHD approximately 6 to 8 months to develop
the Nowcast model, which was ready for use by the
start of the 2005 beach season. MHD could develop
such a model more efficiently today because
better statistical and modeling software is now more
widely available and less costly to the end user.
Data and Variables
EMPACT Model
The initial variables MHD considered for the
EMPACT model included total rainfall for the
previous 24 hours, pH, conductivity, wave height,
water temperature, and Escherichia coli densities
from the previous 24-hour sampling period. MHD
deployed a sonde in the water near the beach to
collect real-time water quality data and derived daily
rainfall totals from National Weather Service (NWS)
data, which rely on geographically dispersed city
weather stations and gauges. In addition, sanitary
surveys (typically conducted annually by MHD)
were useful in identifying and describing the
site-specific attributes and pollution influences at
each Milwaukee public beach. MHD used regression
analysis to determine which independent variables
were most strongly associated with, or most
predictive of, elevated E. coli counts at each public
beach over the season.
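Before fitting a multiple regression, one simple way to screen candidate variables like those listed above is to rank each by the strength of its linear correlation with log10 E. coli. This is a hypothetical preliminary step, not the procedure MHD reports using. The function name, variable names, and input format are assumptions.

```python
import numpy as np


def rank_candidate_variables(candidates, ecoli):
    """Rank candidate predictors by the absolute Pearson correlation of each
    with log10(E. coli), strongest first.

    candidates: dict mapping a variable name to a sequence of daily values,
    aligned day-for-day with `ecoli` (MPN/100 mL, all positive).
    """
    log_y = np.log10(np.asarray(ecoli, dtype=float))
    scores = {}
    for name, values in candidates.items():
        x = np.asarray(values, dtype=float)
        scores[name] = float(np.corrcoef(x, log_y)[0, 1])
    # Sort on |r| so strong negative associations rank alongside positive ones
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

A strong correlation does not identify a pollution source, and correlated candidates can carry overlapping information. A screen like this therefore only narrows the list for the regression step.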
Predictive variables differed between beaches, but
rainfall data were used in determining water quality
advisories at all three. Total rainfall over the previous
24 hours was determined to be an important
predictive variable in all three beach models, primarily
because of contributions from (1) CSOs and diversions
associated with wastewater treatment plants, (2) sanitary
sewer cross-connections and infiltration, and (3)
stormwater runoff. MHD continued collecting selected
physical and chemical water quality data to integrate
into beach water quality modeling through 2004.
Beach design varies greatly and can determine both the
magnitude of a pollution event's impact and its duration
(that is, how much pollution enters and how long the
beach takes to recover naturally). At Milwaukee beaches
specifically, total rainfall was most strongly correlated
with bacterial contamination, and most predictive of
water quality exceedances, when it exceeded one-half
inch and occurred early in the beach season (June).
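The half-inch and early-season pattern can be checked against historical records by comparing exceedance rates across rainfall and season groups. The sketch below assumes records of (month, previous 24-hour rainfall in inches, E. coli in MPN/100 mL). As its exceedance threshold, it uses 235 MPN/100 mL, EPA's 1986 single-sample maximum for E. coli at designated bathing beaches. The record format, the threshold, and the function name are assumptions to adjust to local criteria.

```python
from collections import defaultdict


def exceedance_rates(records, rain_threshold_in=0.5, ecoli_threshold=235.0):
    """Share of sampling days whose E. coli result exceeds `ecoli_threshold`,
    grouped by (rainfall above `rain_threshold_in`?, sampled in June?).

    records: iterable of (month, rain_24h_in, ecoli_mpn_per_100ml) tuples.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [exceedance days, all days]
    for month, rain_in, ecoli in records:
        group = (rain_in > rain_threshold_in, month == 6)
        counts[group][1] += 1
        if ecoli > ecoli_threshold:
            counts[group][0] += 1
    return {group: exceed / days for group, (exceed, days) in counts.items()}
```

A markedly higher rate in the (True, True) group than in the others would reproduce the pattern described above for a given beach's data.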
Raw and summarized data were available daily or
by request through a public website. MHD collected
the data electronically via the sonde and, after review
and analysis, transmitted them to the website for use
by academic and research entities, the general
public, and other interested parties (e.g., media and
environmental groups).
Wisconsin Beach Health Advisory website.
Bradford Beach.
Nowcast Model
The Nowcast model that MHD developed after
the 2004 beach season depends primarily on
regional precipitation, using the previous
24-hour rainfall total. The MHD model development
team continues to explore and identify markers for
nonpoint sources of pollution including chemical
biomarkers in stormwater discharge (e.g., caffeine
and triclosan derivatives). Avian and waterfowl
populations, as well as algal impact, are noted
but have not been particularly predictive of beach
water quality in terms of contributions to microbial
contamination of public health significance.
Cladophora blooms, however, have increased in the
past decade at each public beach, causing primarily
nuisance and aesthetic concerns (e.g., objectionable
odor and water discoloration). USGS and the
Wisconsin Department of Natural Resources support
all data collection and statistical analyses needed to
develop and implement the Nowcast model. Most
recently, the MHD partnered with research faculty
at the newly formed Zilber School of Public Health
at the University of Wisconsin-Milwaukee (UWM)
to improve Nowcast modeling and identify other
indicators of water contamination predictive of or
directly associated with adverse public health impact.
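Because the Nowcast model depends on the previous 24-hour rainfall total, that input has to be computed the same way from gauge data each day. The sketch below assumes readings arrive as (timestamp, inches) pairs time-stamped at the end of each interval, so a reading stamped exactly at the start of the window belongs to the preceding period. The function name and data format are hypothetical, and retrieving NWS data is not shown.

```python
from datetime import datetime, timedelta


def rainfall_previous_24h(readings, as_of):
    """Total rainfall (inches) in the 24 hours ending at `as_of`.

    readings: iterable of (timestamp, inches) pairs, each time-stamped at the END
    of its interval, so the window is (as_of - 24 h, as_of].
    """
    start = as_of - timedelta(hours=24)
    return sum(inches for ts, inches in readings if start < ts <= as_of)


if __name__ == "__main__":
    hourly = [(datetime(2015, 6, 14, 20), 0.5), (datetime(2015, 6, 15, 8), 1.0)]
    # Only the 8 p.m. reading falls in the window ending 7 a.m. on June 15
    print(rainfall_previous_24h(hourly, as_of=datetime(2015, 6, 15, 7)))  # 0.5
```

Fixing the window convention in code keeps the model input consistent from day to day, which matters more than the particular cutoff chosen.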
Model Implementation
MHD exclusively used the EMPACT model for
beach water quality advisory decision making
from 1998 to 2004. However, the model predicted
exceedances with an accuracy of 60 to 70 percent at
best, and only at one beach; at the remaining two
beaches, accuracy often approached only 50 percent.
MHD also noted that the model's predictive accuracy
tended to wane at each beach as the summer
progressed, which suggests some level of seasonality
or unidentified influences on water quality between
the early and late season beach monitoring periods.
As a result, MHD's confidence in sustaining the model
diminished over time. MHD assessed the model's
effectiveness by examining its sensitivity and
specificity. The criterion for issuing advisories was
exceedance of EPA's single sample maximum or
geometric mean threshold for E. coli, expressed in
MPN/100 mL.
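Sensitivity is the share of observed exceedances that the model flagged. Specificity is the share of non-exceedance days that the model correctly left unflagged. Reporting them separately matters for two reasons. First, an overall accuracy near 50 percent is no better than a coin flip. Second, when exceedances are rare, a model that never predicts one can post a high accuracy while missing every exceedance. The helper below is a minimal sketch; its name and inputs (parallel sequences of True/False exceedance calls) are assumptions.

```python
def confusion_metrics(predicted, observed):
    """Compare predicted vs. observed exceedances (True = exceedance) and
    return sensitivity, specificity, and overall accuracy."""
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)          # exceedance correctly flagged
    tn = sum(1 for p, o in pairs if not p and not o)  # clean day correctly left open
    fp = sum(1 for p, o in pairs if p and not o)      # false alarm
    fn = sum(1 for p, o in pairs if not p and o)      # missed exceedance
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "accuracy": (tp + tn) / len(pairs) if pairs else nan,
    }
```

Missed exceedances (false negatives) expose swimmers to risk, while false alarms erode public trust. A beach program may therefore weigh the two error types differently when choosing a decision threshold.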
MHD uses the Nowcast model output for beach
advisories. Because model results continue to be less
than optimal in terms of predictive value, MHD
relies on long-term trending of data and overall
environmental conditions (i.e., water temperature,
multiple day bacterial sampling results, and heavy
rainfall) to refine the issuance of water quality
advisories. MHD posts advisories for 24-hour
intervals and uses the model and trending to
determine when the advisories can be lifted. MHD
would like to see more readily visible, meaningful,
and informative public signs posted on each of the
beaches including explicit illness risk and prevention
messaging. However, some key community
policymakers and associated stakeholders (beach
operators) are concerned that signs would interfere
with the beach ambiance, tourism, and patron
use. Current beach water quality advisory signage,
therefore, remains limited in size, posting, location,
and level of content.
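The case study does not describe how MHD combines multiple days of sampling results. One common way to summarize a run of bacteria counts is a geometric mean, which damps the influence of a single very high sample. The sketch assumes a window of the five most recent results and a threshold of 126 MPN/100 mL, the geometric mean value in EPA's 1986 E. coli criteria. The window length, the threshold, and the function name are assumptions.

```python
from statistics import geometric_mean


def recent_geomean_exceeds(results, n_days=5, threshold=126.0):
    """True if the geometric mean of the most recent `n_days` E. coli results
    (MPN/100 mL, oldest first, all positive) exceeds `threshold`."""
    recent = results[-n_days:]  # uses every result if fewer than n_days exist
    return geometric_mean(recent) > threshold
```

For example, results of 100, 100, 1000, 10, and 100 have a geometric mean of 100, so this check would not trigger despite the single 1000 result.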
Model Cost
The overall cost to develop the EMPACT model was
initially in the range of $50,000-$75,000. The most
costly aspects were siting, maintaining, and refining
the beach sonde because of the harsh Lake Michigan
environment and lack of MHD in-house capacity
and expertise in this regard. Overall, the model did
not prove to be cost effective due in large part to the
cost of maintaining the sonde. Annual maintenance
costs for the sonde ranged from $5,000-$10,000.
New equipment replacement and upgrades cost an
additional $20,000-$50,000 every 2 years.
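Spreading the replacement cost evenly over its 2-year cycle gives a rough annualized cost for keeping the sonde in service. This worked estimate assumes that the maintenance and replacement ranges quoted above both apply and that their low and high ends pair up. The case study reports only the ranges.

```latex
\begin{aligned}
C_{\text{annual}} &= C_{\text{maint}} + \frac{C_{\text{replace}}}{2} \\
C_{\text{annual}}^{\text{low}} &= \$5{,}000 + \frac{\$20{,}000}{2} = \$15{,}000 \\
C_{\text{annual}}^{\text{high}} &= \$10{,}000 + \frac{\$50{,}000}{2} = \$35{,}000
\end{aligned}
```

By this rough estimate, the sonde alone cost about $15,000 to $35,000 per year, which is consistent with MHD's conclusion that sonde maintenance was the main reason the model was not cost effective.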
Milwaukee's beach program currently has a budget of
around $50,000. MHD is no longer using the sonde
and has saved additional money by partnering with
the Zilber School of Public Health at UWM, whose
graduate students do some of the sampling and data
collection. They have even been able to increase the
sampling frequency to 5 times per week at each public
beach over the season. This represents a marked
improvement from 1-2 times per week since 2006.
Issues Encountered
For the EMPACT model, the sonde equipment was
placed in a very harsh environment. It required
weekly maintenance. Security and data feed issues
contributed to the challenges encountered. MHD
relied on external sources to provide the maintenance
and replaced equipment more often than originally
anticipated.
In addition to its problems with the sonde, MHD did
not have sufficient funding for refining and
sustaining the model. The only statistical software
they had in-house (Epi-Info) was primarily directed
toward use in tracking the spread of communicable
disease and outbreak management, which was not
useful for developing an environmental predictive
model for a beach water quality monitoring program.
MHD needed software that is readily available
and easy to use with basic comparison analysis
capabilities. Most public health agencies do not have
these resources in-house, and they do not have the
technical familiarity and capabilities to effectively use
the resources. This often creates a knowledge gap and
vulnerability with regard to environmental statistics
collection, analysis, trending, and interpretation.
The EMPACT model was not piloted or tested before
implementation. In hindsight, MHD should have
presented the model to the regional beach stakeholder
group for feedback and conducted beta testing. A
more thorough comparative analysis with other
available models and methodologies as part of model
implementation would also have been helpful. MHD
staff also lacked sufficient knowledge and expertise
to design, develop, implement, and evaluate a model
that could be cost effective and sustained.
Moving Forward
MHD has developed Nowcast models for each of the
three public beaches located in Milwaukee. MHD
developed the Bradford Beach Nowcast model in
partnership with the Zilber School of Public Health
at the UWM and is working with Dr. Todd Miller
and graduate students to conduct field sampling and
monitoring on a seasonal basis. The MHD/UWM
team collected water samples from Lake Michigan
at three beach sites (Bradford, McKinley, and South
Shore) from early June until late August 2015. UWM
and MHD assessed these water samples for E. coli
levels. UWM also investigated fecal coliform levels.
In addition to fecal indicator bacteria, Dr. Miller's
study is looking at chemical markers in wastewater,
specifically the identification of wastewater bacteria
involved in the degradation of triclocarban. This has
been shown to be very effective in predicting FIB
exceedances at beaches. Dr. Miller is also looking at
temporal fluctuations in E. coli sampling; morning
and afternoon results might be markedly different.
This collaboration has yielded benefits in both the
leveraging of available local expertise and improving
the understanding of beach water quality as related
to the protection of community health. The MHD/
UWM team also recorded environmental conditions,
including weather, rainfall, algae content, litter,
and wildlife, for each beach on every date that
they collected water samples. They will continue to
translate and load the data into a database for long-
term storage, analysis, and prediction forecasting.
Further work will automate data integration,
translation, and loading. The team will explore
development of a website and an appropriate secure
interface to provide access to elements from the
database and forecasting framework to researchers,
other government agencies, and members of the
public. The team continues to use the USGS EnDDaT
service, NWS data and sanitary surveys periodically
conducted by the MHD. They believe that these types
of readily available inputs will result in a more cost-
effective model for use by the MHD in determining
seasonal water quality advisories at each of the
public beaches. The team is no longer using the
sonde equipment, which has significantly reduced
maintenance costs. They are currently using rainfall
data and the previous day's E. coli concentrations.
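A single table that stores each sample alongside the field observations the team records makes it straightforward to assemble model inputs, such as the previous day's E. coli result. The SQLite schema and helpers below are a hypothetical sketch. The table name, column names, and timestamp format are assumptions, not the MHD/UWM team's actual database.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS beach_sample (
    sample_id   INTEGER PRIMARY KEY,
    beach       TEXT NOT NULL,  -- e.g., 'Bradford', 'McKinley', 'South Shore'
    sampled_at  TEXT NOT NULL,  -- ISO 8601 timestamp, keeps AM and PM samples apart
    ecoli_mpn   REAL,           -- E. coli, MPN/100 mL
    rain_24h_in REAL,           -- previous 24-hour rainfall, inches
    weather     TEXT,
    algae       TEXT,
    litter      TEXT,
    wildlife    TEXT
);
"""


def load_samples(conn, rows):
    """Create the table if needed and insert rows of
    (beach, sampled_at, ecoli_mpn, rain_24h_in, weather, algae, litter, wildlife)."""
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO beach_sample (beach, sampled_at, ecoli_mpn, rain_24h_in,"
        " weather, algae, litter, wildlife) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()


def latest_result_before(conn, beach, before_iso):
    """Most recent E. coli result at `beach` sampled before `before_iso`, or None.
    Relies on ISO 8601 strings in one consistent format sorting chronologically."""
    row = conn.execute(
        "SELECT ecoli_mpn FROM beach_sample WHERE beach = ? AND sampled_at < ?"
        " ORDER BY sampled_at DESC LIMIT 1",
        (beach, before_iso),
    ).fetchone()
    return row[0] if row else None
```

Storing full timestamps rather than dates keeps morning and afternoon samples distinct, which matters if sub-daily fluctuations in E. coli prove important.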
The focus of the new modeling efforts was expanded
to all three beaches in 2014, although significant
attention continues to be spent on water quality
conditions at Bradford Beach. Bradford Beach is very
popular and supports various recreational activities,
including national volleyball tournaments, and has
numerous beachfront attractions including a pavilion,
beachfront tiki bars, and recreational equipment.
Finally, the team hopes to refine the predictive model
and generate more hypotheses on the contribution
of various sources of intermittent pollution at each
public beach. For example, they have determined that
birds and algal blooms were not particularly relevant
factors at every beach and that chemical markers
in wastewater, along with sub-daily fluctuations
of E. coli concentrations, may be more important
in future predictive modeling initiatives. In 2016,
MHD is planning to pilot buoys equipped with
various water quality sensors at each beach
in partnership with Dr. Todd Miller. They
will evaluate the ability to more rapidly collect data
relevant to beach water quality conditions and refine
existing models to improve predictive accuracy.
Advice and Lessons Learned
In 1998 the EMPACT beach predictive model
developed and used by MHD was cutting-edge
because it attempted to identify key environmental
variables other than rainfall that would help predict
elevated E. coli levels at three public beaches in
Milwaukee. It also pioneered the collection and
analysis of selected real-time physical and chemical
characteristics of beach water quality for use by local
public health authorities in determining the need for
posting of beach water quality advisories. However,
the model did not readily improve predictive accuracy
as compared to simple use of previous 24-hour
rainfall measurements, nor was it cost effective. The
project did, however, provide valuable information
about the unique characteristics and attributes of
each beach site in Milwaukee and it allowed MHD
to consider continued exploration of more scientific
and evidence-based approaches important to the
successful development, testing, implementation, and
evaluation of future predictive models.
Overall, the struggles with the initial EMPACT model
had a major impact on MHD's beach monitoring
program. They do not regret going through the
process because of how much they learned. MHD
understands that citizens expect them to protect
public health; therefore, they need the tools to provide
the best available information and meet the needs of
each community. The model used must be a good fit
for the local public health department—in MHD's
case, this meant a simple, low-maintenance, user-
friendly model that allows them to share accurate
health information with the public. It is very
important to earn and keep the public's trust. False
positives and errors must be minimized.
Paul Biedrzycki of MHD also offers the following
advice to fellow beach managers:
1. Conduct a broad stakeholder planning and review
process.
2. Review evidence-based best practices from other
jurisdictions and research studies.
3. Build "buy-in" from local policymakers for
resource allocation (program funding).
4. Develop quality assurance and quality control
criteria.
5. Anticipate resources needed for sustainability.
6. Conduct independent evaluation and review.
7. Conduct thorough piloting/testing phase before
implementation.
In general, local public health departments have
increasingly limited resources to conduct either
extensive or comprehensive environmental health
assessments. It is anticipated that the public health
sector will continue to experience significant budget
cuts at the local, state, and federal levels in the near
future. While sustainability and green movements
have provided some moderate assistance in terms
of additional community resource availability,
governments are not growing and state agency budget
and revenue sharing with locals is being reduced.
Therefore, collaboration and information sharing
between entities is essential if recreational water
quality monitoring programs are to continue in the
future. Partnerships between states and within states,
as well as between a diverse group of stakeholders
(e.g., environmental groups, universities, community
organizations, and federal agencies), must be fostered
and encouraged.
References
USEPA (U.S. Environmental Protection Agency).
2010b. Predictive Modeling at Beaches. Volume
II: Predictive Tools for Beach Notification. EPA-
600-R-10-176. U.S. Environmental Protection
Agency, National Exposure Research Laboratory,
Athens, Georgia.
Biedrzycki, Paul. Disease Control and Environmental
Health, City of Milwaukee Health Department.
Personal communication. 2012.
Case Study
Charles River Watershed Association Flag Program
(Boston, Massachusetts)
Introduction
The Charles River, flowing about 80 miles from
Hopkinton, Massachusetts, to its terminus in Boston
Harbor, is one of the busiest recreational rivers in
the country. On a typical summer weekend, the river
will attract tens of thousands of people in a large
and often colorful array of vessels including canoes,
kayaks, dragonboats, sailboats, fishing boats, and
rowing shells. Unfortunately, given the urban nature
of development along the river (it runs through 23
cities and towns), a variety of sources of pollution,
including combined sewer overflows (CSOs), cause
water quality problems, especially in the Lower
Basin—the approximately 9-mile stretch from the
Watertown Dam to the New Charles River Dam.
In 1998, the Charles River Watershed Association
(CRWA) initiated a flag program, flying color-
coded flags to alert people about water quality
conditions in the Charles River Lower Basin. This
case study explores the efforts of the CRWA to build
the scientific foundation of the flag program by
developing a water quality model.
Water Quality
In 1995 the U.S. Environmental Protection Agency
(EPA) established the Clean Charles Initiative with
the purpose of restoring the Charles River and
making it fishable and swimmable. Much progress
has been made, thanks to the collaborative efforts
of EPA; other federal, state, and local government
agencies; nonprofit groups; private institutions; and
the public. However, more work remains. Stormwater
runoff and CSOs remain a special concern and, while
water quality is usually sufficient for boating and
other secondary contact water activities, swimming
and other activities involving continuous full-body
contact are not recommended because of bacterial
levels that exceed primary contact standards.
The Charles River.
Model Development
CRWA was founded in 1965 for the purpose of
spearheading projects aimed at cleaning up the
Charles River. Conditions improved over time,
allowing more people to safely use the river for
secondary recreation use; however, the river remained
impaired for bacteria, especially during wet weather.
Therefore, in 1998 CRWA, in a joint project with Tufts
University and funding from EPA, began developing
a statistical model that predicts the likelihood of a
violation of the state boating standard in the Lower
Charles River Basin. One of the project's goals was to
be able to forecast and publicize daily water quality
conditions. The Lower Charles River Basin does not
have a swimming beach, but it is the busiest section
of the river and secondary recreational activities
continue to expand. CRWA initially developed two
different statistical models, adopting the one with
the best performance. It took a few years to build up
a data set of indicator bacteria sample results large
enough to develop the model.
A former staff member developed the original model
as part of their master's thesis at Tufts University. To
select the model's variables, the project team conducted
a literature review of similar projects, with the major
limitation that data used had to be readily available on
a daily basis. The best predictive variables were rainfall
volume, river flow, and wind. The project team used
the ordinary least squares (OLS) method in Minitab®
to develop the regression model, and they used
Microsoft Excel to run the equation on a daily basis.
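The case study does not publish CRWA's regression equation or explain how an OLS fit becomes the "likelihood of a violation" used for the flags. One common approach, sketched below as an assumption rather than as CRWA's actual method, is to regress log10 E. coli on the predictors and treat the residuals as normally distributed, so the exceedance probability is the normal upper tail above log10 of the 630 cfu/100 mL state boating standard. The predictor set, the function names, and the synthetic data are all hypothetical.

```python
import math

import numpy as np

# State boating standard: 630 E. coli cfu/100 mL, expressed on a log10 scale.
LOG10_STANDARD = math.log10(630.0)


def fit_ols(X: np.ndarray, y: np.ndarray) -> tuple:
    """Fit y = b0 + X @ b by ordinary least squares.

    Returns (beta, sigma): coefficients with the intercept first, and the
    residual standard error of log10 E. coli around the fitted line.
    """
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = len(y) - A.shape[1]
    sigma = math.sqrt(float(resid @ resid) / dof)
    return beta, sigma


def exceedance_probability(x: np.ndarray, beta: np.ndarray, sigma: float) -> float:
    """P(log10 E. coli > log10 standard), assuming normal residuals."""
    mean = beta[0] + float(x @ beta[1:])
    z = (LOG10_STANDARD - mean) / sigma
    # Upper tail of the standard normal: 1 - Phi(z) = 0.5 * (1 - erf(z / sqrt(2)))
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))


# Synthetic, illustrative data only (NOT CRWA data). Hypothetical predictors:
# 48-hour rainfall (in), river flow (cfs), wind speed (mph).
rng = np.random.default_rng(0)
n = 60
rain_48h = rng.uniform(0.0, 2.0, n)
flow = rng.uniform(200.0, 800.0, n)
wind = rng.uniform(0.0, 15.0, n)
X = np.column_stack([rain_48h, flow, wind])
log_ecoli = 1.5 + 0.8 * rain_48h + 0.0005 * flow - 0.02 * wind + rng.normal(0.0, 0.2, n)

beta, sigma = fit_ols(X, log_ecoli)
p_wet = exceedance_probability(np.array([2.0, 500.0, 5.0]), beta, sigma)
p_dry = exceedance_probability(np.array([0.0, 300.0, 10.0]), beta, sigma)
print(f"wet day: {p_wet:.2f}, dry day: {p_dry:.2f}")
```

Because the assumed residual distribution is symmetric, a probability of 50 percent or more is equivalent to a predicted log10 E. coli at or above log10(630). The sketch also ignores uncertainty in the fitted coefficients; a full prediction interval would widen the distribution slightly.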
In 2009, an intern at CRWA who had recently received
a master's degree updated the model, under the
supervision of Julie Wood, to account for changes in
the availability of real-time data and for the switch
from fecal coliform to Escherichia coli as the primary
indicator bacterium in state water quality standards.
Model Implementation
In 1998 CRWA began flying color-coded flags to alert
people about water quality conditions in the Charles
River Lower Basin. Flown from July through October
at selected shore locations between Watertown and
Boston Harbor, the flags inform boaters about
E. coli bacteria levels and blue-green algae blooms.
Specifically:
• A blue flag indicates CRWA's forecast that the
likelihood of bacteria exceeding the boating
standard is less than 50 percent and that no
blue-green algae bloom is present.
• A yellow flag indicates that health risks are
possible, but the data are insufficient to predict
risks with certainty. Yellow flags are flown when signs of
a blue-green algae bloom are present but the actual
human health risk is unconfirmed or unknown.
• A red flag means that the probability of the river
exceeding boating standards is 50 percent or greater,
or that a health risk is present
because of a confirmed blue-green algae bloom.
Red flags are also flown for 48 hours following a
reported¹ CSO.
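The flag rules above can be expressed as a small decision function. This Python sketch assumes the model's exceedance probability has already been computed and encodes algae-bloom status with hypothetical labels ("none", "suspected", "confirmed"); the case study does not say how CRWA records these observations, and the yellow flag's other trigger (inconclusive data) is not modeled here.

```python
from enum import Enum
from typing import Optional


class Flag(Enum):
    BLUE = "blue"
    YELLOW = "yellow"
    RED = "red"


CSO_RED_HOURS = 48.0  # red flag stays up this long after a reported CSO


def choose_flag(p_exceed: float, bloom: str, hours_since_cso: Optional[float] = None) -> Flag:
    """Map model output and field observations to a flag color.

    p_exceed: model probability of exceeding the boating standard (0 to 1).
    bloom: "none", "suspected", or "confirmed" (hypothetical labels).
    hours_since_cso: hours since the last reported CSO, or None if none reported.
    """
    if not 0.0 <= p_exceed <= 1.0:
        raise ValueError("p_exceed must be between 0 and 1")
    if bloom not in ("none", "suspected", "confirmed"):
        raise ValueError(f"unknown bloom status: {bloom!r}")
    cso_recent = hours_since_cso is not None and hours_since_cso < CSO_RED_HOURS
    # Red conditions are checked first so they override any yellow condition.
    if p_exceed >= 0.5 or bloom == "confirmed" or cso_recent:
        return Flag.RED
    if bloom == "suspected":
        return Flag.YELLOW
    return Flag.BLUE


print(choose_flag(0.2, "none"), choose_flag(0.2, "none", hours_since_cso=24))
```

Checking the red conditions before the yellow one means a high bacteria forecast or a recent CSO is never masked by a milder algae observation.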
The decision on which color flag to fly is based on
the results of a mathematical model that uses rainfall
and other weather factors along with river conditions
to estimate the probability of the river exceeding the
state secondary contact recreation (boating) standard
of 630 E. coli colony forming units per 100 milliliters
of water (cfu/100 mL). In addition to the model,
CRWA collects weekly water samples to help verify
model predictions and to add to the database of water
quality information.
Over the past 15 years, CRWA has continued to
enhance the model; water sampling has confirmed
an accuracy rate of about 90 percent for predicting
water quality violations. The program provides daily
advisory information and allows river users to make
more informed decisions about recreating on the river
on that day. The program is not used for enforcement
actions, and the river is never closed to the public on
the basis of model results.
Red advisory flag indicating potential health risks.
Charles River Watershed Association Flag Program
website.
¹ Unfortunately, only 1 of the 11 active CSOs in the Charles River
Lower Basin provides real-time overflow notifications.
The model-generated advisory information continues
to be communicated to the general public through
the posting of color-coded flags and through email,
CRWA's website, Twitter, and a telephone hotline.
Eleven facilities fly the color-coded flags along the
river, providing a valuable public service. These
facilities include yacht clubs, boating centers, canoe
and kayak outfitters, and Harvard University's famed
Weld Boathouse.
Model Costs
Key costs for model development included labor
costs and sample analyses. Labor to collect and
compile online data was the most significant cost. In
some cases, older weather data had to be purchased.
Collecting and organizing free data into a usable
format, especially when it must be formatted to work
with a specific statistical software package, can be
time-consuming. Collecting and analyzing E. coli
samples also required staff time and lab costs of about
$30 per sample. CRWA collects four samples once
or twice each week to verify its model predictions.
Before the model was implemented, monitoring occurred
anywhere from twice a week to daily. Since
implementation, CRWA has been able to reduce
monitoring frequency to weekly when funding is
limited. CRWA believes that the cost of the model
is offset by the value of the daily water quality
notifications for public health and safety.
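As a rough illustration of the verification-sampling cost, the arithmetic below multiplies the figures stated above (four samples per event at about $30 each, once or twice a week) over a July-through-October season. The specific season dates are an assumption, and staff time is excluded.

```python
from datetime import date

COST_PER_SAMPLE = 30.0  # approximate lab cost per E. coli sample, dollars
SAMPLES_PER_EVENT = 4   # samples collected per verification event

# Assumed season boundaries (flags fly July through October; year is arbitrary).
SEASON_START = date(2013, 7, 1)
SEASON_END = date(2013, 10, 31)


def seasonal_lab_cost(events_per_week: int, start: date, end: date) -> float:
    """Lab cost of verification sampling over a season, excluding staff time."""
    weeks = ((end - start).days + 1) / 7  # inclusive day count, in weeks
    return COST_PER_SAMPLE * SAMPLES_PER_EVENT * events_per_week * weeks


for events in (1, 2):
    print(events, round(seasonal_lab_cost(events, SEASON_START, SEASON_END)))
```

Under these assumptions, lab fees alone come to roughly $2,100 per season for weekly sampling and about $4,200 for twice-weekly sampling.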
Issues Encountered
Challenges associated with model development
included the following:
• Choosing input variables that were easily available
daily. These include rainfall volume (previous 24,
48, 72, and 168 hours), wind speed, time since
specific rainfall volume (more than 0.01 inch;
more than 0.1 inch), flow, and solar radiation.
These data are available from the National Oceanic
and Atmospheric Administration or the U.S.
Geological Survey.
• Building a database of E. coli concentrations for
model calibration and verification.
• Meeting the needs of all users.
• Working with a limited budget.
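Input variables like those listed above are typically derived from an hourly record downloaded from an agency website. The Python sketch below builds the rainfall totals (previous 24, 48, 72, and 168 hours) and the "time since" variables for the 0.01-inch and 0.1-inch thresholds. It assumes hourly rainfall totals and interprets "time since" as hours since the latest hour above the threshold; CRWA's exact definitions are not given, and the feature names are hypothetical.

```python
from typing import Dict, Optional, Sequence

WINDOWS_HOURS = (24, 48, 72, 168)  # rainfall totals listed in the case study
THRESHOLDS_IN = (0.01, 0.1)        # "time since" thresholds, inches


def hours_since_rain_above(hourly_rain: Sequence[float], threshold: float) -> Optional[int]:
    """Hours since the most recent hour with rainfall above threshold.

    hourly_rain is ordered oldest to newest, so the last item is the latest hour.
    Returns 0 if the latest hour qualifies, or None if no hour in the record does.
    """
    for age, amount in enumerate(reversed(hourly_rain)):
        if amount > threshold:
            return age
    return None


def rainfall_features(hourly_rain: Sequence[float]) -> Dict[str, Optional[float]]:
    """Build rainfall predictors from an hourly series (hypothetical feature names)."""
    if len(hourly_rain) < max(WINDOWS_HOURS):
        raise ValueError("need at least 168 hours of rainfall history")
    features: Dict[str, Optional[float]] = {}
    for hours in WINDOWS_HOURS:
        features[f"rain_{hours}h"] = sum(hourly_rain[-hours:])
    for threshold in THRESHOLDS_IN:
        features[f"hours_since_gt_{threshold}in"] = hours_since_rain_above(hourly_rain, threshold)
    return features


# Example: one week of hourly data, with 0.5 in falling 67 hours ago
# and 0.05 in falling 7 hours ago.
rain = [0.0] * 168
rain[100] = 0.5
rain[160] = 0.05
features = rainfall_features(rain)
print(features)
```

Requiring a full week of history avoids silently underestimating the 168-hour total when the download is incomplete, which matters when, as in CRWA's case, the data depend on other organizations publishing on time.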
The biggest challenge that CRWA faced in the
development phase was the availability of data to test
predictive factors. The CRWA did not collect any real-
time data, so it could use only what was available on
the Web. Consequently, CRWA had to rely on other
organizations to continue to collect the data and
publish it in a timely manner.
Data availability continues to be a challenge in
the implementation phase. The model is run on weekday
mornings at 8 a.m. during the recreational season, when
data are available. CSO discharge data usually are not
collected; CSOs are not part of the statistical model,
but any discharge information that does exist is
incorporated into the notification protocol.
It is a time-consuming process to develop and
employ a model. CRWA runs its model Monday
through Friday from July through October. On Friday
afternoons, CRWA provides a weekend forecast using
model simulations based on weather predictions.
CRWA has discussed running the model on
weekends and has done so on occasion; however, this
is logistically challenging because most of the staff
work only Monday through Friday. Because the model is
run only once a day, around 8 a.m., it is of limited use
to the many river users (primarily scullers) who are on
the river in the early morning. Additionally, the model
is not updated during the day, even though water quality
conditions change continuously. Finally, because the
model is not run on weekends, weekend users receive
only the Friday forecast, which is based on predicted
rather than observed weather.
Weld Boathouse at Harvard University flying a blue flag indicating suitable boating conditions.
Moving Forward
In 2012 CRWA added two boathouse
locations where flags are flown (12 sites total) to
provide more complete coverage of the area.
CSOs are a major challenge to maintaining the river's
water quality. Under Boston's CSO control plan, some
CSOs may remain in the long term; some have added
primary treatment and notification, but several have
not. A goal and priority for CRWA is to continue to
reduce CSOs significantly and to notify the public
promptly when CSO discharges occur.
Recreation continues to expand in the watershed
and might include swimming in the future if water
quality improves. Real-time modeling is expected to
help document improving water quality and serve
as a notification tool for water-based activities in the
Charles River.
The CRWA is collaborating with the Coastal
Environmental Sensing Network (CESN) at the
University of Massachusetts in Boston. CESN
established a real-time weather station and wrote a
program that allows data from the weather station to
be continuously fed into the model, along with flow
data. The station went online in August 2012; the
group has verified data starting in September 2012.
So far, the group has eight overlapping sampling
points with weather station data for October and
three overlapping sampling points for September.
The group completed the analysis of overlapping data
during the 2013 season. Running the model using
data inputs from this new weather/water quality
station is working well. Compared with the current
system, model accuracy has improved with inputs from
this station because the model is updated
automatically every hour using the most recent data.
Although CRWA does not have additional resources
to put toward the real-time data collection, the group
would like to develop a real-time model; continuing
to collaborate with the university will make this
goal more realistic. CRWA also hopes to add other
parameters, such as turbidity, to the model. A real-time
model would notify the public of water quality conditions
more quickly, which matters because the Charles River
hosts a wide variety of recreational activities at
different times of day. For example, water quality
forecasts go out at 9 a.m. (based on NOAA updates
at 8 a.m.), but rowers are on the water by 5 a.m.,
well before any water quality notifications are
available. Real-time forecasting capabilities would
greatly improve the program.
Unfortunately, the long-term outlook for the project
depends on the resources CESN and CRWA have
available to continue to maintain the weather station
and the real-time data feed to the model.
Advice and Lessons Learned
In light of the experience and success of CRWA's
modeling efforts, Julie Wood of CRWA recommends
that beach managers "go for it" with regard to
developing their own models. The model does not
have to be complicated—a simple regression model
can be effective in many systems to broadly predict
possible risk. In addition, it is important to consider
the availability of your staff to run the model and
post notifications, since that affects how often the
model can be run. It can be especially challenging if
you want to run the model on the weekends. Overall,
resources are a major factor when developing and
implementing predictive models.
Based on their experience with the CESN station,
the CRWA team recommends that model developers
select a model that can be automated and run
continuously in real-time based on data readily
available on the Web. You will still need staff to
collect the samples to verify the forecasts, but you will
not need staff to run the model. You can run this type
of automated model every day of the week and early
in the morning, providing water quality predictions
based on the most current data. This would help meet
public expectations for real-time nowcasting at very
fine timescales.
References
Charles River Watershed Association. Charles River
Water Quality Notification Flagging Program.
http://www.crwa.org/field-science/water-quality-notification.
Eleria, A. and R.M. Vogel. 2005. Predicting fecal
coliform bacteria levels in the Charles River,
Massachusetts, USA. Journal of the American
Water Resources Association. No. 03111. October
2005.
Wood, Julie. Charles River Watershed Association.
Personal communication. 2012-2013.
Case Study
Chicago Park District Beach Modeling (Chicago, Illinois)
Introduction
Chicago's 26 miles of shoreline along Lake Michigan
provide residents and visitors with many water-based
recreational opportunities. Especially popular are
a series of 24 beaches owned and managed by the
Chicago Park District (CPD). Over 20 million people
visit these beaches each year between Memorial Day
weekend and Labor Day to swim and enjoy the sand,
sun, and scenery. CPD's mission with these beaches,
as with all their parks, is to provide a customer-
focused experience that prioritizes and responds to
the safety and needs of children and families.
To aid in providing a safe beach environment, CPD
developed a system of colored flags to communicate
swim status at the beaches. A green flag
means that weather conditions and water quality
are good and swimming is permitted. A yellow flag
indicates that swimming is permitted but beachgoers
are cautioned that weather conditions are
unpredictable and/or water quality does not meet
state swimming standards. A red flag indicates that
swimming is not permitted because weather
or water quality is causing unsafe or dangerous
conditions.
In general, the lifeguards stationed at each beach are
responsible for monitoring weather conditions and
changing swim status when necessary. However,
while beachgoers can usually relate to unsafe weather
conditions such as high waves and lightning, unsafe
water quality conditions are not nearly as obvious.
Currently, CPD's decision to change swim status
due to water quality is based on two complementary
approaches: (1) analysis of water
samples and (2) a computer model that
uses weather and hydrology data and
water conditions to predict real-time
water quality.
Map: Chicago Park District beaches along Lake Michigan, including 63rd Street Beach near Jackson Park.
Water Quality
Most water quality problems found
at CPD's beaches can be linked to
nonpoint sources of pollution origi-
nating in the small watersheds along
the shoreline. Runoff from roadways,
parklands, and other nearshore land
areas collects and drains to the lake
through a network of stormwater
outfalls. Chicago's human sewage is
not directed into Lake Michigan except
during extreme storm events, when the
locks that separate the Chicago River
system from Lake Michigan are opened
to minimize or prevent flooding.
CPD believes that the relatively
large resident gull and Canada goose
populations are one of the most
significant contributors to the pollution
load at the beaches. In response, the
District has initiated various programs
to discourage their presence, including prohibiting
feeding and using uniformed border collies to chase
birds off the beaches.
Uniformed border collie chasing birds off the beach.
Like most actively managed freshwater beaches,
CPD routinely collects water samples and has them
analyzed in a laboratory for E. coli. Samples are
collected at each beach every Monday through Friday
during the swimming season. Additional samples are
collected over the weekend if weekday results
show high levels of E. coli.
CPD's sampling program follows U.S. Environmental
Protection Agency (EPA) guidelines and protocol
for water collection and laboratory analysis for E.
coli concentration. The bacteria culture process
takes 18-24 hours to complete (Colilert method);
consequently, sample results are not available until
the day after they are taken. If E. coli levels are found
to be above the state's water quality standard of
235 CFU/100 mL, the water is considered unsafe for
swimming. CPD subsequently notifies the public of
the threat through their website and other outlets and
by posting an advisory at the beach and changing the
swim status flags.
Fortunately, when E. coli levels are found to be above
the 235 CFU/100 mL water quality standard, the next
day's sample results are usually below the standard.
This is partly because the large open shoreline
encourages water circulation between shore waters
and deeper offshore waters. Thus, bacteria that enter
most beach areas during and after storms are dispersed
and flushed away from near-shore areas fairly rapidly.
However, beaches that are sheltered in an embayment
or protected by piers or seawalls often do not circulate
their beach water as freely and sometimes experience
more persistent high bacteria levels, with swimming
advisories lasting multiple days.
The fact that high FIB levels at most Chicago beaches
only last a day underscores the problem of having at
least an 18-hour lag time between sample collection
and laboratory results. Beachgoers are unknowingly
swimming in water with high FIB levels the day the
water sample is collected, and are advised not to swim
the following day, when levels are usually safe based
on the analysis of that day's sample. This lag-time
problem caused CPD to explore the possibility of
developing a predictive mathematical model so that
beach management officials could make more timely
decisions concerning swim status and thus better
protect the health of the beach-going public.
Model Development
CPD began the predictive modeling project in 2011
with the assistance of the U.S. Geological Survey
(USGS) and a $243,000 Great Lakes Restoration
Initiative (GLRI) grant. Together the agencies decided
on a group of weather-related parameters that could
potentially be incorporated into the model. They
then developed and deployed buoys for in-water
measurements and pole-mounted weather stations
near the beaches to monitor atmospheric conditions.
Given resource limitations, CPD decided to initially
focus on a set of Chicago beaches that: (1) most
frequently exceeded the E. coli criteria and (2)
had the highest beach attendance. Eventually five
beaches were selected for the modeling exercise.
The list ranged from the largest beach (Montrose
Beach) to one of the city's most popular (Oak Street
Beach). The other three beaches were Foster Beach,
63rd Street Beach, and Calumet Beach. All the
beaches are primarily affected by nonpoint sources
of contamination and have a history of E. coli
exceedance rates between 8 and 15 percent (percent
of days when the mean of two samples exceeds 235
CFU/100 mL) over the last few years. Attendance
records for the beaches ranged from approximately
100,000 visitors to several million visitors per
swimming season.
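The exceedance rate used to select the beaches is defined above as the percent of days on which the mean of two samples exceeds 235 CFU/100 mL. The Python sketch below computes it from paired daily results; it assumes an arithmetic mean (the case study says only "mean"), and the sample values are illustrative.

```python
from statistics import mean
from typing import Sequence, Tuple

STANDARD_CFU = 235.0  # state E. coli standard cited in the case study, CFU/100 mL


def exceedance_rate(daily_pairs: Sequence[Tuple[float, float]]) -> float:
    """Percent of sampled days on which the mean of two samples exceeds the standard.

    Uses the arithmetic mean of the two results (an assumption; the case study
    says only "mean").
    """
    if not daily_pairs:
        raise ValueError("no sampling days supplied")
    exceed_days = sum(1 for a, b in daily_pairs if mean((a, b)) > STANDARD_CFU)
    return 100.0 * exceed_days / len(daily_pairs)


# Ten illustrative sampling days; only the third day's mean (275) exceeds 235.
season = [(40, 60), (120, 90), (300, 250), (80, 100), (200, 150),
          (10, 20), (230, 220), (60, 70), (100, 300), (35, 45)]
print(exceedance_rate(season))  # 10.0
```

Note that a single high sample does not make an exceedance day on its own: the ninth day includes a 300 result, but its mean of 200 stays below the standard.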
Foster Beach.
Model Development Key Components
Technical and Financial Resources
The USGS was instrumental in getting the project
off the ground. USGS staff helped select the monitoring
equipment and trained CPD staff to use and maintain
it. USGS also provided guidance on developing
the model and performed statistical analyses. The
Lake County Health Department, which already
has experience implementing a predictive model
program, also provided expertise during model
development. In addition, several presentations at
Great Lakes Beach Association conferences provided
CPD staff with options and a variety of potential
methods for developing the model.
Installing monitoring equipment at Chicago beaches.
The models were developed using multivariate
regression analysis. The USGS selected variables by
identifying the ones that fit best statistically. USGS
considered including gull counts, but found that this
information was difficult to use and implement in the
context of the model.
CPD used its own resources to deploy the monitoring
equipment, including scuba divers, electricians, and
related heavy equipment such as boats and a bucket
truck for installing the weather station on light poles.
If CPD had contracted the installation, costs would
have increased significantly.
Currently CPD provides funding for data collection
and equipment maintenance but continues to rely on
the USGS to perform statistical analyses. CPD could
possibly hire contractors to complete this work, but
few would have the necessary depth of understanding
of Lake Michigan ecology.
CPD spent approximately a year and a half developing
the first models and expects to change and improve
them with additional data in the future. CPD initially
anticipated the need for two years of data to have
working models developed because results depend
strongly on the weather. The Chicago area has very
different beach seasons from year to year; therefore, a
larger data set will help improve the model's accuracy.
Data Resources
When developing the model, CPD relied on daily
weather and water quality data, along with water
quality data collected as part of CPD's existing beach
monitoring program. CPD also considered data
collected during daily sanitary surveys for model
development purposes.
USGS explored whether other data sources, such as
those from the National Oceanic and Atmospheric
Administration (NOAA), might be useful. USGS did
not use NOAA or other external data because these
data sources did not work as well. For example,
NOAA data come from farther offshore, whereas
Chicago's beaches are man-made and have many
structures in place, so they require detailed on-site data.
Chicago Park District Beach Notification website.
Model Implementation
Public Involvement
CPD did not involve the public during the
initial phases of model development because the
information was too technical. However, CPD
conducted significant public outreach to inform
people about implementation efforts. All data were
made available to the public via the CPD website.
CPD also posted information about how the model
worked, how advisories work, what changes would
occur, and how this would improve public health.
There was a lot of media interest, which gave CPD the
opportunity for interviews with numerous newspapers and
news stations.
CPD did not receive much feedback from the public
even though the public could submit questions and
comments via website or hotline. CPD received
occasional feedback, however, when there were
unusual data or equipment malfunctions.
Example website display for Calumet Beach, showing swim status, the model forecast for today, and the most recent test result.
Model Output and Validation
The key variables CPD used for these models include
the following:
• Air temperature.
• 6-hour solar radiation.
• 4-hour wave period.
• Longshore (NNW) wind.
• 6-hour longshore (NW) wind.
• 6-hour rainfall.
• 48-hour rainfall.
• 4-hour log-wave period.
• Day of year.
• 4-hour onshore wind.
• 4-hour log turbidity.
• 4-hour log wave height.
Each model used a different combination of these
variables.
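Several of the variables above are wind components (longshore, onshore) averaged over 4 or 6 hours. The case study does not define them; a common approach, sketched here in Python as an assumption, is to project each hourly wind vector onto axes set by the shoreline and average the projections. The shoreline angle below is hypothetical, chosen so that the longshore axis points NNW to match the variable name; each beach would need its own angle.

```python
import math
from typing import Sequence

# Hypothetical shoreline geometry: an ENE wind (67.5 degrees) blows straight
# onshore, so the perpendicular longshore axis points toward NNW (337.5 degrees).
ONSHORE_FROM_DEG = 67.5
LONGSHORE_FROM_DEG = (ONSHORE_FROM_DEG - 90.0) % 360.0


def wind_component(speed: float, dir_from_deg: float, axis_from_deg: float) -> float:
    """Project a wind onto an axis.

    dir_from_deg uses the meteorological convention (direction the wind blows
    FROM, degrees clockwise from north). The result is positive when the wind
    comes from the axis direction and negative when it comes from the opposite one.
    """
    return speed * math.cos(math.radians(dir_from_deg - axis_from_deg))


def mean_component(speeds: Sequence[float], dirs_from_deg: Sequence[float],
                   axis_from_deg: float, hours: int) -> float:
    """Average an hourly wind component over the most recent `hours` readings."""
    if len(speeds) != len(dirs_from_deg):
        raise ValueError("speeds and directions must have the same length")
    if not 0 < hours <= len(speeds):
        raise ValueError("not enough hourly readings for the requested window")
    recent = list(zip(speeds, dirs_from_deg))[-hours:]
    return sum(wind_component(s, d, axis_from_deg) for s, d in recent) / hours


# Six hourly readings of a 10 mph NNW wind: fully longshore.
speeds = [10.0] * 6
dirs = [337.5] * 6
print(round(mean_component(speeds, dirs, LONGSHORE_FROM_DEG, 6), 6))  # 10.0
```

Averaging the projected components, rather than averaging wind directions, avoids the wraparound problem (the average of 350 and 10 degrees is north, not south).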
CPD conducted routine sampling throughout the
2012 beach season to collect data for validating the
model. They compared actual sampled results with
modeled results to ensure the model's accuracy. CPD
reports predicted values and will continue to refine
the models over the next several beach seasons.
Although model accuracy fluctuated between years,
CPD is confident that the advisories they issued
on the basis of modeled results were more accurate
than they would have been without the model. With
regard to confidence in model results, CPD remains
"cautiously optimistic."
CPD is assessing the effectiveness of the model by
evaluating whether more Type 1 and Type 2 errors
would have been generated relying only on traditional
water testing and waiting 24 hours for results.
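The comparison described above can be sketched by scoring two sets of advisory decisions against same-day lab results: one driven by the previous day's sample (the traditional approach) and one driven by the model. This Python sketch assumes the usual convention in which a Type 1 error is an advisory issued when the standard was actually met and a Type 2 error is a missed exceedance; the daily values and model predictions are illustrative, not CPD data.

```python
from typing import Dict, Sequence

STANDARD_CFU = 235.0  # E. coli CFU/100 mL


def count_errors(predicted: Sequence[float], observed: Sequence[float]) -> Dict[str, int]:
    """Tally advisory errors against same-day observed results.

    Assumed convention: Type 1 = advisory issued but the standard was met
    (false positive); Type 2 = no advisory but the standard was exceeded
    (false negative).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must be the same length")
    type1 = type2 = 0
    for pred, obs in zip(predicted, observed):
        advisory = pred > STANDARD_CFU
        exceeded = obs > STANDARD_CFU
        if advisory and not exceeded:
            type1 += 1
        elif exceeded and not advisory:
            type2 += 1
    return {"type1": type1, "type2": type2}


# Illustrative daily lab results (CFU/100 mL) for six days.
observed = [100, 400, 120, 90, 500, 150]
# Traditional approach: yesterday's lab result drives today's advisory.
persistence = count_errors(observed[:-1], observed[1:])
# Hypothetical same-day model predictions for days 2 through 6.
model = count_errors([300, 150, 80, 420, 260], observed[1:])
print(persistence, model)  # {'type1': 2, 'type2': 2} {'type1': 1, 'type2': 0}
```

The example shows why one-day exceedances are costly for the traditional approach: each short spike produces both a missed exceedance on the day it happens and an unnecessary advisory the day after.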
Currently, if the model predicts a bacteria level over
235 CFU/100 mL, CPD issues an advisory. CPD also
posts the most recent lab results from traditional
water testing at each beach. If the test results and
the model do not agree, CPD then uses the model to
determine the advisory status.
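The rule described above, in which the model alone sets the advisory status and the latest lab result is posted for reference, can be sketched in Python as follows. The status labels and field names are hypothetical, and weather-related flag changes made by lifeguards are outside the scope of this sketch.

```python
from dataclasses import dataclass
from typing import Optional

STANDARD_CFU = 235.0  # E. coli CFU/100 mL


@dataclass
class Posting:
    status: str                         # "advisory" or "green" (hypothetical labels)
    model_prediction: float             # today's modeled E. coli, CFU/100 mL
    latest_lab_result: Optional[float]  # most recent lab result, posted for reference
    lab_disagrees: bool                 # True when the lab result points the other way


def decide_posting(model_prediction: float, latest_lab_result: Optional[float] = None) -> Posting:
    """The model alone sets the status; the lab result is posted alongside it."""
    advisory = model_prediction > STANDARD_CFU
    lab_disagrees = (
        latest_lab_result is not None
        and (latest_lab_result > STANDARD_CFU) != advisory
    )
    return Posting(
        status="advisory" if advisory else "green",
        model_prediction=model_prediction,
        latest_lab_result=latest_lab_result,
        lab_disagrees=lab_disagrees,
    )


# Yesterday's sample was high, but the model predicts safe water today.
print(decide_posting(120.0, latest_lab_result=789.0))  # status='green', lab_disagrees=True
```

Keeping the disagreement flag, rather than letting the lab result override the model, preserves a record that can feed the Type 1 and Type 2 error evaluation.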
Implementation
CPD began using the model in 2012 to make manage-
ment decisions on notification actions. They monitor
all beaches every weekday. They also monitor on
weekend days following an exceedance on a Friday, or
if the model predicts an exceedance on the weekend.
CPD runs the models at 9:00 a.m. and issues advisories
by 9:30 a.m. If the model shows no exceedance, CPD
posts a green flag. The public can view both model
results and sampling values by visiting the beach,
viewing the website, or calling a hotline.
Model Costs
The $243,000 GLRI grant provided the bulk of the
financial resources for the project. CPD also set aside
$50,000 in their capital budget to help purchase the
equipment in the first year (2011), and $25,000 in
capital funds to increase the amount of equipment in
2012.
In addition, CPD spent about $120,000 in 2011 for
water sampling at all the beaches in Chicago. Most
of these costs would have been incurred without the
modeling project. The extra sampling for modeling
was about $15,000. The most costly aspects of the
modeling process included the purchase of equipment
and USGS support.
Equipment costs were approximately $70,000.
Monthly bills for cellular data were about $3,000—
this covers data transmitted by eight cellular modems.
Obtaining water quality data (FIB testing results)
did not cost extra because this work would have
been done regardless of development of the models.
However, for reference, the lab costs for water quality
sampling were about $100,000, and the personnel
costs for water sampling were approximately $20,000
annually.
The grant was funded in the fall of 2010 and
continued through 2013. A large portion of the funds
was used to purchase and install the equipment
and for USGS statistical analysis. Some grant funds
remain; these will be used to offset ongoing costs
(maintenance, statistical analyses, etc.). Currently,
CPD relies on internal funding, which could decrease
in the future.
When determining overall cost effectiveness of the
model, CPD concluded that they would save money
only if sampling is reduced. CPD does not currently
plan to reduce sampling; however, if the BEACH
Act funding is cut, this would affect sampling
significantly because fewer resources would be
available. For CPD, the bottom line is, "How do you
put a price on better information?"
Montrose Avenue dog beach.
Issues Encountered
CPD had their share of issues with field equipment,
including equipment getting damaged by rough
weather. They adjusted the anchoring scheme for
the buoys, which helped, and have eliminated some
buoys, although some equipment issues continue and
the buoys are expensive to maintain. Looking back,
CPD might have selected a different anchoring system
to ensure that the equipment remained in place.
Moving Forward
CPD intends to keep moving forward with their
models. CPD has already invested over $75,000 of
department funding into the modeling program,
which shows confidence in the model's effectiveness.
CPD has expanded to other beaches since 2011,
and for the 2015 season predictive models were
used at all 24 of the city's beaches. In addition, CPD
is devoting substantial resources to mitigation
practices. They are also working on developing
better information and methods to address non-
anthropogenic sources of bacteria such as shore birds.
During the initial year of data collection, CPD
increased sampling frequency to twice per day at
the modeled beaches. USGS and Michigan State
University have helped validate and update the
models annually as of 2015.
Some beaches with higher exceedance rates have been
difficult to model. CPD is prioritizing rapid methods
at these beaches. CPD also conducts public outreach
about beach water quality. They implemented a
new texting service that allows beachgoers to text
the name of their beach to a dedicated number and
receive an automatic response with the current
beach conditions. A public education campaign
encourages people not to litter or feed wildlife, since
waste from seagulls and geese has been shown to
be a major source of fecal bacteria in the water.
The campaign also includes signage on Chicago
public transit, posters at beaches, and a large mural
at one of Chicago's busiest beaches. A new Beach
Ambassadors program with direct public outreach
asks beachgoers not to litter or feed wildlife, and
expanded programming for CPD's summer day camp
program educates kids on what they can do to keep
the water clean.
Finally, CPD is working to reduce bacteria sources
directly. New grooming equipment removes debris
and exposes wet sand to sunlight, killing bacteria.
At beaches with a history of problems from seagull
waste, CPD is using dog handlers and trained border
collies to chase the gulls from the beach. This project
has significantly reduced the number of days on which
FIB levels exceeded water quality standards.
Advice and Lessons Learned
Sanitary survey data were tested in 2010, but it was
determined that more accurate and timely (buoy-based)
data were needed for the models. While
daily sanitary survey data are helpful for monitoring
operations such as garbage collection and beach
grooming and for keeping track of pollution sources,
survey parameters are not included in the models;
the models are all based on data collected by sensors.
The success of a model depends on a number of
factors. For CPD, the most important factor was
related to the presence of nonpoint versus point
sources of pollution. You need to have comprehensive
knowledge about the beach before you can
successfully develop a predictive model.
Cathy Breitenbach of CPD noted they are a large
jurisdiction with many resources. They were able
to do all equipment maintenance and monitoring
in-house and did not have to hire or rely on outside
support. Without this, they would not have been
as successful, especially considering that Chicago
is a big city with a large beach-going population to
protect. Other agencies that want to develop a model
must have access to funding and technical resources
necessary to collect data and conduct statistical
analyses. If their jurisdiction is small, however, they
can likely develop and implement a predictive model
at a lower cost.
References
Cathy Breitenbach, Chicago Parks District. Personal
interview. 2012.
Chicago Park District. 2012. Chicago Park District
Improves Beach Monitoring for 2012 Season.
http://www.chicagoparkdistrict.com/chicago-park-district-improves-beach-monitoring-for-2012-season.
Fulton, Jeff. No date. Public Beaches in Chicago.
USA Today. http://traveltips.usatoday.com/public-beaches-chicago-53741.html.
-------
Case Study
City of Racine Nowcast Model (Racine, Wisconsin)
Introduction
How does a small coastal Wisconsin city of about
79,000 citizens reel in a "Best Beach in the State" title?
One reason might be its cutting-edge approach to
staying on top of water quality. The City of Racine,
on Lake Michigan between Milwaukee and Chicago,
manages two popular swimming beaches, North
Beach and Zoo Beach. At 50 acres, North Beach is
the larger of the two. In 2012, USA Today named
it the best beach in Wisconsin, joining 50 other
beaches similarly selected from each of the states and
the District of Columbia. This honor can be added
to a long list of accolades for North Beach, which
includes a Top 10 Family Friendly Beach designation
by Parents magazine in 2011 and the Midwest Living
magazine's Top City Beaches list in 2010.
North Beach has medium- to fine-grained sand and
is groomed to remove trash and aerate the sand. The
swim area has a fairly shallow slope (2 to 5 percent)
and the beach has a 1 to 1.5 percent slope toward
the water. A harbor break wall increases swimming
safety by keeping waves in check. The beach face is
kept at a steep grade to prevent waves from spilling
over the berm crest. The city maintains restrooms,
a bathhouse, a concession stand, and an adjacent
playground. The city hires lifeguards to ensure public
safety. Weekend visitor numbers can exceed 11,000,
and average daily attendance during the swimming
season reaches up to 2,200 people.
[Map: Lake Michigan shoreline at Racine showing Douglas Park, Lake View Park, and North Beach Park; scale bar in miles]
Zoo Beach, adjacent and north
of North Beach, is smaller, less-
developed, and attracts fewer
beachgoers than North Beach. So
named because of the adjacent
Racine Zoo, it has fewer access
points and amenities. Lifeguards
are on duty only on weekends. The
swim area has a steep drop-off and
no break wall, so the wave action
is more intense. Because of these
contrasts, Zoo Beach offers visitors
a quality beach experience with
beautiful views of Lake Michigan
in a more peaceful, less-populated
setting.
Water Quality
Racine's beachgoers have not always
enjoyed the current levels of high
water quality at their beaches. For
example, in 2003 North Beach was
under a no-swimming advisory
for 34 days because of high fecal
indicator bacteria counts. On
several of these days the beach
was closed entirely. That same
year, Zoo Beach had notifications
issued on 29 days. Since the swimming season in
Wisconsin is approximately three months long, these
problems resulted in a loss of almost 40 percent of
Racine's potential beach days in 2003.
In response, the city began a campaign to deal with
the point and nonpoint sources of fecal pollution that
were contaminating its beaches. Sanitary surveys proved
to be important tools in helping city officials identify
pollution sources and plan mitigation projects such as
wetland construction, dune restoration, and improved
beach grooming practices. The results of these efforts
were outstanding, especially in terms of reducing
the number of beach advisories and closings. North
Beach was closed or under a swimming advisory on
only one day in 2010 and only three days in 2011.
Zoo Beach had four notifications in 2010 and
five in 2011. This increase in safe-swimming days
provides clear evidence of the power of active beach
management.
Model Development
With beach clean-up efforts underway, Racine
focused on the lag time problem associated with
the traditional culture-based method of beach
monitoring.
Racine explored two options for dealing with this
lag-time dilemma. One was testing a new method of
measuring Escherichia coli (E. coli) concentration—
quantitative polymerase chain reaction (qPCR).
Instead of growing and enumerating bacterial
colonies in cultures, qPCR yields more timely results
by identifying and quantifying genetic sequences
of bacteria. qPCR results can be obtained from a
laboratory on the same day the sample is taken, in
most cases within three hours of sample collection,
allowing more rapid determinations of beach water
quality for swimmers' safety.
Racine also explored using mathematical models
to predict beach water quality. An accurate model
would provide a basis for issuing preemptive
notifications in advance of water sampling, allowing city
officials to take an even more conservative approach
to swimmers' safety. Racine officials believe that
the daily use of models, supported by daily beach
survey data and verified by qPCR monitoring, will
be the cornerstone of their future beach monitoring
program.
Statistical models were developed for Racine's two
beaches using the U.S. Environmental Protection
Agency's (EPA's) Virtual Beach (VB) software
(v2.0-2.2). The Wisconsin Department of Natural
Resources (WDNR) Science Services assisted
throughout the model development process. WDNR
coordinates Wisconsin's beach monitoring program
and administers the BEACH Act grants for the state's
193 public beaches along 55 miles of Lake Superior
and Lake Michigan coastlines. Because WDNR staff
had expertise in model development for the state's
many public beaches, they were well-equipped to
offer guidance to the City of Racine. WDNR's support
proved invaluable as they pulled together various
data sources, including data from older and recently
developed models. Identified as Nowcast models,
the "real-time" predictive models developed for
the project use multiple linear regression and other
statistical procedures to evaluate the relationships
between measured FIB concentrations in the water
and certain meteorological factors and onshore and
near-shore conditions associated with water quality.
The output of the current models developed by the
city, in conjunction with the WDNR, expresses two
values: predicted E. coli concentrations and predicted
probability of exceedance.
Model Development Key Components
The key for developing a good model is selecting
the proper set of component variables and ensuring
that staff have the necessary skills. In the initial
development phase, in 2010, Racine examined
a diverse set of variables for potential use in the
model. Variables included water temperature, air
temperature, seagull counts, dog counts, wildlife
counts, wave height and intensity, water clarity, sky
conditions (i.e., cloud cover), water color changes,
odor, algae amount, algae type, bather load (in, out,
and total), long shore current direction and speed,
wind direction and speed, stream discharge, pollution
discharge, rainfall (24-, 48-, and 72-hour) and other
precipitation records, day of year, season, lake levels,
and the previous day's E. coli values. Initially, all
variables were included because the majority could
have been considered factors that influence local
water quality. The project team reduced the initial
number of variables by conducting correlation
analyses. The model was developed using the
variables that had the strongest associations.
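The correlation screening described above can be sketched in a few lines of Python. This is a minimal illustration, not Racine's or Virtual Beach's actual procedure: it assumes the response is log10-transformed E. coli, that every candidate variable is numeric with no missing values, and the variable names and values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column cannot explain any variation
    return cov / math.sqrt(var_x * var_y)


def screen_variables(candidates, log_ecoli, top_k=3):
    """Score each candidate against log10 E. coli and keep the top_k by |r|."""
    scores = {name: pearson_r(values, log_ecoli) for name, values in candidates.items()}
    ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
    return ranked[:top_k], scores


# Hypothetical observations from five sampling days.
log_ecoli = [1.0, 1.5, 2.0, 2.5, 3.0]
candidates = {
    "rain_24h_in": [0.0, 0.1, 0.2, 0.3, 0.4],
    "wave_height_ft": [1, 1, 2, 2, 3],
    "dog_count": [3, 1, 2, 1, 3],
}
keep, scores = screen_variables(candidates, log_ecoli, top_k=2)
print(keep)  # ['rain_24h_in', 'wave_height_ft']
```

Single-variable correlation only captures linear association, so a variable that matters only in combination with another (for example, wind direction together with wind speed) can be screened out by this step.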
Important data sources for the model development
included the U.S. Geological Survey's (USGS') real-
time data viewer, Racine Water and Wastewater
Utilities, the Great Lakes Observing System (GLCFS
Nowcast 2D), local weather station data, and National
Oceanic and Atmospheric Administration (NOAA)
buoy data. Staff also obtained data from routine
sanitary surveys housed on the Wisconsin Beach
Health website (hosted by the USGS). Exploratory
data analyses revealed that the sanitary survey
data were especially valuable. The presence of algae
and water clarity, for example, proved to be good
predictors of high FIB levels at some locations.
Racine's beaches proved to be good candidates
for modeling because they have large, consistent
databases of FIB concentrations and fairly predictable
pollution incidents associated with storms that
resulted in advisories. Because North Beach is
sampled at least five times per week, model developers
could more frequently compare model results with
actual FIB concentrations.
By 2011 the VB software (VB v2.1) was fully
developed and the city built an operational Nowcast
model for North Beach. Key variables selected for
the model included rainfall, wave height, long shore
current vectors, stream discharge, water clarity, and
sky conditions. Racine conducted a pilot test using
qPCR and a culture-based method for measuring
E. coli concentrations. This preparatory step was
important because it allowed the city to track model
predictions with laboratory results and validate the
model using real-time data.
The results were very encouraging. The model
predicted E. coli concentration with 91 percent
accuracy for culture-based results and with 98
percent accuracy for qPCR-based results.
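The case study does not say how the 91 and 98 percent accuracy figures were calculated. One common approach, assumed here, is to turn each model prediction and each lab result into an exceed/does-not-exceed call and count how often they agree. The sketch also tallies the two error types mentioned later in this case study, using the usual convention (an assumption) that a false positive is a Type I error and a false negative is a Type II error.

```python
def confusion_counts(predicted_exceed, observed_exceed):
    """Tally agreement between model calls and lab calls (True = exceedance)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for pred, obs in zip(predicted_exceed, observed_exceed):
        if pred and obs:
            counts["tp"] += 1  # correctly predicted exceedance
        elif pred:
            counts["fp"] += 1  # Type I: exceedance predicted, water met the standard
        elif obs:
            counts["fn"] += 1  # Type II: no exceedance predicted, water exceeded
        else:
            counts["tn"] += 1  # correctly predicted acceptable water
    return counts


def accuracy(counts):
    """Fraction of days on which the model call matched the lab call."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else float("nan")


# Hypothetical five-day comparison of model calls against lab results.
model_calls = [True, False, False, True, False]
lab_calls = [True, False, True, False, False]
c = confusion_counts(model_calls, lab_calls)
print(c, accuracy(c))  # {'tp': 1, 'fp': 1, 'tn': 2, 'fn': 1} 0.6
```

Accuracy alone can look high at a beach with few exceedances, because correctly predicted "safe" days dominate; reporting the false-negative count alongside it keeps missed exceedances visible.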
In 2012 Racine built new models (using VB v2.2) for
North and Zoo Beaches. The new models included
a greater proportion of Web-captured data than the
2011 models, which relied heavily on locally collected
beach sanitary survey data. By developing two different
types of models, Racine was able to determine
whether the number and types of field data could be
reduced or eliminated (as a cost savings measure). In
the new model, wave height was found to be the most
predictive variable at Zoo Beach. As developed, the
Zoo Beach model required significantly less locally
collected field data to run than the 2011 model and
results have been encouraging. However, the city
found that the 2011 North Beach model (which
included several beach sanitary survey parameters)
was more robust than the 2012 model construct.
Model Implementation
Before developing the Nowcast models, the City of
Racine used the persistence model (i.e., the previous
day's culture-based results) for issuing beach
notifications. In 2011 the city used the Nowcast
model in combination with the lab-based methods
to support management decisions. Even when the
model predicted exceedance of the E. coli water
quality standard, the city did not use the model
alone to make notification decisions. Instead,
the city developed a set of guidelines for making
notification decisions. For example, they issue a
preemptive advisory in advance of results from the
laboratory analyses if the probability of exceedance
is greater than 10 percent and the predicted E. coli
concentration exceeds 50 colony-forming units per
100 milliliters of water.
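Racine's example guideline requires both model outputs to cross a threshold before a preemptive advisory is issued. A minimal sketch, assuming the model reports the exceedance probability as a fraction between 0 and 1 and the concentration in CFU/100 mL; the function name is hypothetical, and in Racine's program the result is only one input alongside lab results and staff judgment.

```python
def preemptive_advisory(prob_exceedance, predicted_cfu_per_100ml,
                        prob_cutoff=0.10, conc_cutoff=50.0):
    """True when both example guideline conditions are met.

    prob_exceedance: model probability of exceeding the standard (0 to 1).
    predicted_cfu_per_100ml: model-predicted E. coli concentration.
    Both comparisons are strict, matching "greater than" and "exceeds".
    """
    return prob_exceedance > prob_cutoff and predicted_cfu_per_100ml > conc_cutoff


print(preemptive_advisory(0.15, 80))  # True: both conditions met
print(preemptive_advisory(0.15, 40))  # False: concentration not above 50
print(preemptive_advisory(0.10, 60))  # False: probability not above 10 percent
```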
Each beach monitoring component—sanitary survey,
Nowcast model, culture-based testing, and qPCR—is
designed and applied to complement and reinforce
the others to generate timely, accurate results and a
better understanding of the conditions and variables
that accelerate FIB growth in water to create unsafe
conditions for swimming. In June 2012 Racine,
Wisconsin, became the first municipality in the
nation to base notification decisions on qPCR results.
In conjunction with the qPCR assay, Racine also ran
the model at North and Zoo Beaches on each sampling
day (the city runs the model only on weekdays, unless an
advisory or closure extends its sampling and sanitary
survey data collection into the weekend).
The results of qPCR, along with sanitary survey
information, model estimations and staff judgment
are all considered when determining whether to issue
a beach advisory or closure.
Model Costs
The city did not incur additional costs for
data collection for model development and
implementation. The necessary data were already
being routinely collected, using equipment already
in place. The costs associated with the development
of the Nowcast model were mostly for labor. The staff
needed to have a basic understanding of statistics,
intuitive ability to manipulate data, and a working
knowledge of factors affecting local water quality.
The development team usually consisted of two
laboratory personnel, with support and guidance
from staff at the WDNR Science Services division.
In some cases, WDNR staff took a more direct
role by developing models in coordination with
laboratory personnel. Labor costs included the time
it took staff to retrieve, format, and assess the data
and build, train, and revise the operational version
of the initial model. Staff needed several months to
collect and format data to develop the initial model.
Model costs were minimal but required one person
to work through the modeling process. The newer VB
software reduced the model development time, but
data evaluation and model development still required
a week or more.
The daily cost to run the model is minimal—most
of the cost is in data collection and processing (i.e.,
the initial effort required to build the model, run
correlations, and perform statistics), which occurred
over several days. Importantly, EPA's continued
improvement of the VB software allows for more
rapid statistical model development and simplifies the
application of the model for the end user. Newer
versions of the software not only provide quantified
results but also add an exceedance probability, providing
another dimension to beach management decisions.
The time spent running the model is only a
fraction of the time spent on routine, culture-based
monitoring. Once all the routine sanitary survey
data are available, the model takes approximately five
minutes to run—significantly faster than laboratory
sample analyses, which require at least 2 hours and
up to 18 hours, depending on the method used.
Issues Encountered
Although the overall model prediction results have
been very accurate, the City of Racine encountered
a few issues. During the model development phase,
the city had trouble building a robust dataset because
of the large amount of data required. They resolved
this problem after they compared the electronic data
against the original hardcopies and found that missing
data and incorrectly entered data often caused issues
with empty cells or incorrect predictions.
Another issue encountered was with the estimation
of E. coli data. North Beach typically has very few
advisories; as a result, building a model to predict
those exceedances was difficult. For example,
since advisory dates were so few and far between,
those dates could have possibly been identified as
statistical outliers (i.e., sample results that were
numerically distant from the rest of the data) and
it was sometimes difficult to decide which data
should be culled. Once these decisions were made,
implementation was far less problematic. City of
Racine laboratory staff noted, "Where there are few
exceedances, we sometimes remove them as statistical
outliers, but we have to be careful doing so because if
we leave them out, we are essentially excluding event-
based data."
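One way to make the outlier question explicit, rather than deleting points automatically, is to flag candidates for review. The sketch below swaps in Tukey's interquartile-range fences on log10-transformed counts, a generic technique the case study does not name, and marks whether each flagged sample is itself an exceedance. The threshold of 235 CFU/100 mL in the usage line is illustrative only.

```python
import math
import statistics


def flag_outliers(values, exceed_threshold, k=1.5):
    """Flag (not remove) samples outside Tukey fences on the log10 scale.

    values: positive bacteria counts (log10 is undefined for zero).
    exceed_threshold: standard used to mark flagged points that are exceedances.
    """
    logs = [math.log10(v) for v in values]
    q1, _, q3 = statistics.quantiles(logs, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    flagged = []
    for i, (v, lv) in enumerate(zip(values, logs)):
        if lv < low or lv > high:
            flagged.append({"index": i, "value": v,
                            "is_exceedance": v > exceed_threshold})
    return flagged


# Hypothetical season of counts (CFU/100 mL) with one storm-day spike.
counts = [10, 12, 15, 20, 18, 14, 11, 16, 13, 2400]
for flag in flag_outliers(counts, exceed_threshold=235):
    print(flag)  # {'index': 9, 'value': 2400, 'is_exceedance': True}
```

A flagged exceedance is exactly the kind of event-based data point the Racine staff cautioned about, so the decision to keep or drop it stays with the analyst.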
The city sometimes had issues with data retrieval.
Occasionally, either online data were unavailable for
running the model, or data were unusable because
of a reporting error. In those cases, the city had
to find a comparable data source. For example, if
rainfall data were unavailable from the local airport,
the city used precipitation records from the local
wastewater treatment plant (a comparable distance
from the beach) to make an initial estimation.
Once precipitation records from the airport became
available, they re-ran the model using the amended
data from the original source.
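The fallback practice described above can be captured by a small wrapper that records which source supplied the input. This is a sketch under stated assumptions: the two fetcher functions are hypothetical stand-ins for whatever retrieval Racine actually used, and a value of None is assumed to mean "not available."

```python
from typing import Callable, Optional, Tuple


def get_rainfall(primary: Callable[[], Optional[float]],
                 backup: Callable[[], Optional[float]]) -> Tuple[float, str, bool]:
    """Return (rainfall, source, provisional).

    provisional is True when the backup source was used, signalling that the
    model should be re-run once the primary source reports.
    """
    value = primary()
    if value is not None:
        return value, "primary", False
    value = backup()
    if value is not None:
        return value, "backup", True
    raise LookupError("no rainfall data available from either source")


def airport_gauge():
    """Hypothetical primary feed that is currently down."""
    return None


def treatment_plant_gauge():
    """Hypothetical backup feed reporting 0.4 inches."""
    return 0.4


print(get_rainfall(airport_gauge, treatment_plant_gauge))  # (0.4, 'backup', True)
```

Carrying the provisional flag with each prediction makes it easy to find the runs that need repeating once the airport record arrives.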
Moving Forward
The City of Racine validates the model by comparing
model results to monitoring (culture and qPCR)
results. They consider the model to be successful
because of the low number of Type I and Type II
errors found after evaluating beach management
decisions at the end of the beach season. The city ran
the 2011 (VB v2.1) and 2012 (VB v2.2) models side-
by-side to compare the results and verified which
model was most appropriate for each beach. They will
continue to evaluate their model every year to ensure
that it is still predictive since major changes can occur
to beaches and the weather varies significantly from
year to year. Data collection methods and variables
used in the model have not changed. The city
compared the 2011 model to newer model iterations
and has found that incorporating additional years of
data has not made any significant improvements.
As of 2015, the city is using both the monitoring
and modeling results to make beach notification
decisions. Because the model results have shown high
accuracy, monitoring could be reduced in the future.
However, staff would still need to visit the beaches
regularly to complete the routine beach sanitary
survey form that includes data elements necessary to
run the model. In the future, the city plans to focus
more on cost-efficiency; Nowcast models will likely
play a large role in this endeavor. Eventually, model
results might be the primary means for making beach
notification decisions, restricting laboratory analyses
to only those days when exceedances are predicted.
Through the use of qPCR this can be accomplished
in near real time, striking a balance between
public health protection and maximum utility of
recreational beaches.
Advice and Lessons Learned
To assist others planning to develop a predictive
model, the City of Racine shared these lessons
learned:
• Partner with agencies or universities that have
software expertise and experience with predictive
models.
• Build your model using easily retrievable data
and collect data in a consistent manner and in
sufficient quantity. You can't compare an apple
to an orange (i.e., estimations of wave height
beachside might not be equivalent to data retrieved
at a NOAA buoy). It is often best to collect your
own data and not rely on someone else's.
• Have a robust data set—at least 2 seasons' worth of
data are preferable.
• Use sanitary surveys to identify pollution sources
as well as gaps in model performance. One
season may have a dominant variable that wasn't
previously accounted for. Sanitary survey data
should be consistently collected each day that
sampling occurs.
• Evaluate your dataset before building the model.
Sometimes modelers will expect an unreasonably
high R-squared value without much knowledge
of their data. As a result, modelers might spend
unnecessary time finding, acquiring, compiling,
formatting, and reviewing additional data,
which might not significantly improve model
performance. A flow chart of inputs and outputs
for FIB at your beach can help with this.
• Not all beaches will have a single driving force, but
those that have unique situations might require
evaluative criteria prior to model development to
improve chances of success. There can be a lot
of background noise from the frequency of non-
event related observations in predictive variables.
The City of Racine improved model performance
by implementing a rainfall threshold to reduce the
size of the dataset.
• Examine the interaction between variables, not
just variables as single elements. For example,
wind direction might not be predictive, but wind
direction plus speed might be (e.g., onshore winds
exceeding a velocity threshold).
• Determine how best to represent your data (i.e.,
quantitative, qualitative, categorical, or binary).
• Discuss the threshold for exceedance probabilities
during the implementation phase. Depending on
the model, the probability of exceedance result
might be less or more than expected, given the
model estimate.
• Have comparable backup data sources for your
model inputs. Be realistic about model outputs
and combine the results with experience. Does
the model output match what your experience tells
you? How should you expect these environmental
conditions to affect local water quality? In short,
your model needs to make sense.
• Validate the model periodically because ambient
conditions might change.
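Two of the lessons above, the rainfall threshold and the wind interaction term, are data-preparation steps that can be sketched as below. The case study does not say how Racine applied its rainfall threshold; keeping only days at or above a cutoff is an assumption, as are the 45-degree onshore window, the field name rain_24h_in, and the onshore bearing of 90 degrees used for a hypothetical east-facing beach.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two compass bearings, in degrees."""
    d = abs(a_deg - b_deg) % 360
    return 360 - d if d > 180 else d


def onshore_strong_wind(wind_from_deg, wind_speed, onshore_from_deg,
                        speed_cutoff, half_width_deg=45):
    """Binary interaction term: 1 if wind is roughly onshore AND above speed_cutoff."""
    onshore = angular_difference(wind_from_deg, onshore_from_deg) <= half_width_deg
    return int(onshore and wind_speed > speed_cutoff)


def apply_rain_threshold(records, min_rain_in):
    """Keep only days whose 24-hour rainfall meets min_rain_in (event subset)."""
    return [r for r in records if r["rain_24h_in"] >= min_rain_in]


# Hypothetical east-facing beach: onshore wind blows FROM about 90 degrees.
print(onshore_strong_wind(80, 15, onshore_from_deg=90, speed_cutoff=10))   # 1
print(onshore_strong_wind(270, 15, onshore_from_deg=90, speed_cutoff=10))  # 0
days = [{"rain_24h_in": 0.0}, {"rain_24h_in": 0.5}, {"rain_24h_in": 1.2}]
print(len(apply_rain_threshold(days, min_rain_in=0.5)))                    # 2
```

Encoding the interaction as a single binary column lets a regression pick up "onshore and strong" even when neither direction nor speed is predictive on its own.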
References
Cicero, K. The 10 Best Beaches for Families: 2011.
Parents Magazine. June 2011. Accessed January 22,
2013. http://www.parents.com.
Clark, J., Hortobagyi, M., and Yancey, K.B. Just for
Summer: 51 Great American Beaches. USA Today.
March 27, 2012. Accessed January 22, 2013.
http://travel.usatoday.com.
Kinzelman, Julie. City of Racine. Personal
communication.
Kurdas, Stephan. City of Racine. Personal
communication.
Our 7 Top Midwest City Beaches. Midwest Living
Magazine. July-August 2010. Accessed January 22,
2013. http://www.midwestliving.com.
-------
Case Study
The Stormwater and NexRad Rainfall Models (Horry County, South Carolina)
Introduction
Horry County, South Carolina has 180 miles of
coastline containing a series of beaches, its most
famous being Myrtle Beach, also known as the
"Grand Strand." Its beaches attract more than 13
million visitors each year.
Like all the public beaches in the state, Grand Strand
beaches are regularly monitored for fecal indicator
bacteria levels by the South Carolina Department of
Health and Environmental Control (SCDHEC), in
conjunction with local governments. The goal of the
monitoring program is to allow the public to make
informed decisions about their recreational activities
and any potential for swimming-associated health
effects.
Water Quality
The water quality of Grand Strand beaches is
typically very good. However, during and after heavy
rainstorms, Stormwater discharges occasionally
cause bacteria levels to rise above state water quality
standards, prompting SCDHEC to issue swimming
advisories. To minimize the impact of Stormwater
on these beaches, some Grand Strand communities
have extended their Stormwater outfall structures
further out into the ocean to discharge
runoff into deeper waters, away from
swimming areas.
In 2011 Myrtle Beach completed
a project at 4th Avenue North
that consolidated nine nearshore
Stormwater drainage pipes into one
large pipe which runs underneath the
seabed and empties into the Atlantic
Ocean more than 1,000 feet from
shore. Similar projects have been
conducted at 7th Avenue South in
North Myrtle Beach and at Deep Head
Swash in Myrtle Beach. These and
other infrastructure investments have
significantly reduced fecal indicator
bacteria levels at Grand Strand
beaches.
Model Development
Stormwater Model
In 2007 SCDHEC developed a model
as part of a staffer's master's thesis
project to predict fecal indicator
bacteria levels at South Carolina state
beaches. To be adopted and applied
by SCDHEC, the model needed to be
simple to operate and provide reliable
results. The model effort evolved to include a project
team consisting of the local health department,
SCDHEC and University of South Carolina (USC)
professors with geostatistical modeling, database
management, and geographic information system
(GIS) expertise. They chose to develop models using
information for the popular Grand Strand beaches.
These are Tier 1 beaches (the highest-priority beaches
because of high risk, high use, or both) and were
best suited for modeling because they receive direct
Stormwater input and have high numbers of bathers.
Tier 2 beaches typically had very few exceedances and
few bathers.
SCDHEC and the project team used various statistical
methods, a literature review, and professional
judgment to determine which variables to include.
Rainfall was found to be the primary predictive
variable.
The initial models developed were statistical models
with rainfall as the most important variable. A
Multiple Linear Regression (MLR) model and a
Classification and Regression Tree (CART) model
were developed and run separately for each sample
site. To improve prediction, SCDHEC developed
an ensemble forecast—a statistical approach using
results from multiple models—by combining results
from the MLR (predicting estimated fecal indicator
bacteria levels) and CART (estimating the range
of expected fecal indicator bacteria levels) for each
sample site (or section of beach). By combining these
two results, SCDHEC could approximate a third
possible fecal indicator bacteria level, called the
Ensemble prediction. Beach managers could use all
three model outputs to determine the advisory level
needed to protect public health in different areas.
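The text does not specify how SCDHEC combined the MLR estimate with the CART range to produce the Ensemble prediction. Purely as a hypothetical illustration of an ensemble of this kind, the sketch below averages the MLR estimate with the midpoint of the CART range in log10 space (bacteria counts are usually handled on a log scale); it is not SCDHEC's documented method, and the function name is invented.

```python
import math


def ensemble_estimate(mlr_estimate, cart_low, cart_high):
    """Hypothetical ensemble of an MLR point estimate and a CART range.

    All inputs are concentrations (e.g., CFU/100 mL) and must be positive.
    Combines them as a geometric mean: the MLR estimate averaged in log10
    space with the log10 midpoint of the CART range.
    """
    cart_mid_log = (math.log10(cart_low) + math.log10(cart_high)) / 2
    combined_log = (math.log10(mlr_estimate) + cart_mid_log) / 2
    return 10 ** combined_log


print(round(ensemble_estimate(100, 10, 1000), 1))  # 100.0: both models agree
print(round(ensemble_estimate(1000, 10, 100), 1))  # 177.8: pulled toward CART range
```

Averaging in log space keeps one very large estimate from dominating the combined value, which is one reason ensembles of this shape are common for skewed bacteria data.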
NexRad Rainfall Model
In 2011 SCDHEC began collaborating with USC and
the University of Maryland to develop an updated
version of the Stormwater model (i.e., the NexRad
rainfall model), one that would not require the use of
expensive rainfall equipment. The project entailed
enhancing a user application with new models and
developing an automated, database-driven tool
that would estimate bacteria levels and visualize
model results, allowing SCDHEC to better predict
and analyze bacteria-related public health threats.
The project was led by Dr. Dwayne Porter of USC
and built on previous efforts by incorporating new
models that provide rainfall estimates using radar-based
data. These radar data improved existing
tools by (1) allowing spatial estimates to be averaged
over a watershed area instead of applying point
estimates and (2) allowing for automated integration
of remotely sensed data, eliminating the need for
SCDHEC's costly rain gauge network.
The NexRad rainfall model essentially combined
the MLR, CART, and ensemble techniques into
one modeling user interface, and added a new
element—Next-Generation Radar (NexRad) data,
compiled from a network of high-resolution Doppler
weather radars operated by the National Oceanic and
Atmospheric Administration's (NOAA's) National
Weather Service (NWS). The goal of using NexRad
is to have as close to real-time data as possible. As of
2013, the NexRad data included only rainfall; however,
the development team planned to consider other
variables such as sunlight, temperature, salinity, and
the number of preceding dry days. The development
team is still determining the best data sources. As of
2015, USC still uses sanitary surveys although they
can be time-consuming.
The NexRad rainfall model also used GIS polygons
of individual watersheds, which were created by
overlaying piping diagrams of the Stormwater systems
provided by the area's individual municipalities. GIS
polygons are overlaid to create mini-watersheds to
determine how much rain falls on each beach site.
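Averaging radar rainfall over a watershed polygon, rather than reading a single gauge, is typically done as an area-weighted mean of the radar grid cells the polygon overlaps. The sketch below assumes the overlap area of each cell with the polygon has already been computed in a GIS; it is a generic illustration, not the USC implementation.

```python
def watershed_mean_rainfall(cells):
    """Area-weighted mean rainfall over one watershed polygon.

    cells: iterable of (rain_depth, overlap_area) pairs, where overlap_area is
    the part of a radar grid cell lying inside the polygon (any consistent unit).
    """
    weighted_sum = 0.0
    total_area = 0.0
    for rain_depth, overlap_area in cells:
        weighted_sum += rain_depth * overlap_area
        total_area += overlap_area
    if total_area == 0:
        raise ValueError("polygon does not overlap any radar cell")
    return weighted_sum / total_area


# Hypothetical polygon: 1 km2 of a cell reporting 0.2 in. and 3 km2 of a cell reporting 0.8 in.
print(watershed_mean_rainfall([(0.2, 1.0), (0.8, 3.0)]))  # about 0.65 inches
```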
SCDHEC tested the NexRad rainfall model in several
counties during the 2012 beach swimming season
(May 15 through October 15) and used model results
as one of several tools in deciding whether to issue
swimming advisories. Exceedances of water quality
standards are expressed as High, Medium, and Low
(using the MLR and CART model predictions), but
the model can also provide actual predicted FIB
levels.
Data
SCDHEC used historical water quality data to
develop and validate the Stormwater model in
2007. The data and variables considered included
cumulative rainfall, rainfall intensity, number of
preceding dry days, wind speed and direction, tides
and lunar phase data, water current, and salinity. The
water quality data were collected by the SCDHEC
beach monitoring program. Rainfall data were
collected by a system of gauges installed in several
locations. Wind speed and direction data were
obtained online from NOAA.
In 2011, USC began developing the NexRad rainfall
model based on the assimilation and integration of
multiple sources of data including field programs
(bacteria density, salinity, air and water temperature,
tide, weather); observing systems (rainfall, currents,
salinity, wind); and remote sensing models (salinity,
air and water temperature, rainfall, currents, wave
activity). SCDHEC provided the bacteria density
data. All other data for the NexRad rainfall model came
from a variety of sources, including the NWS, the
National Estuarine Research Reserve System, and
the Southeast Coastal Ocean Observing Regional
Association's Integrated Ocean Observing System
(IOOS).
Model Output and Validation
To validate the Stormwater model, USC compared
the predicted MLR calculations to actual sampled
values twice a month. In general, the Stormwater
model expressed predicted exceedances of the water
quality standard with above-average accuracy;
however, SCDHEC did not sample after rain events
at sites where acceptable water quality was predicted.
Therefore, an unknown number of false negatives
(missed exceedances) might have occurred.
In 2005 SCDHEC ended the data collection used to
validate the Stormwater model. Officials felt that the
post-2005 changes (i.e., offshore Stormwater outfall
pipe [discussed above] and new infiltration pits and
ultraviolet disinfection systems) drastically changed
the environment; therefore, the model was no longer
relevant since it was based on data collected before
these changes occurred.
The NexRad rainfall model worked better for some
beaches than others. USC performed receiver
operating characteristic (ROC) analyses to determine
the frequency of Type I and Type II errors. USC staff
assessed model effectiveness by cross-referencing
samples taken against predicted MLR calculations.
If the MLR model calculated fecal bacteria levels of
greater than 103 colony-forming units (CFU) per
100 milliliters (mL) of water, SCDHEC issued an
advisory. If the CART model calculated High, they
issued an advisory. If the MLR model calculated
a concentration of greater than 74 CFU/100 mL
and CART model calculated Medium at the same
site, they issued an advisory. USC validates the
NexRad rainfall model using VB's toolbox for model
development and validation which has made model
updates and validation fairly easy as long as data are
available. This tool allows the user to compare model
predictions against actual monitoring data.
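The three advisory triggers above translate directly into a decision function. This is a minimal sketch: it assumes the MLR output is in CFU/100 mL and the CART output is one of the strings "Low", "Medium", or "High"; the function name is hypothetical, and the thresholds are copied from the text.

```python
def scdhec_advisory(mlr_cfu_per_100ml, cart_level):
    """Apply the three advisory triggers described in the text.

    mlr_cfu_per_100ml: MLR-predicted concentration (CFU/100 mL).
    cart_level: CART category, assumed to be "Low", "Medium", or "High".
    """
    if mlr_cfu_per_100ml > 103:
        return True  # MLR alone exceeds 103 CFU/100 mL
    if cart_level == "High":
        return True  # CART alone calls High
    if mlr_cfu_per_100ml > 74 and cart_level == "Medium":
        return True  # moderate MLR estimate backed by a Medium CART call
    return False


for mlr, cart in [(110, "Low"), (50, "High"), (80, "Medium"), (80, "Low")]:
    print(mlr, cart, scdhec_advisory(mlr, cart))
```

Writing the rule out this way shows that the combined trigger only matters for MLR estimates between 74 and 103 CFU/100 mL, where neither model alone would call for an advisory.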
Model Implementation
When implementing the Stormwater model (during
the 2007-2009 beach seasons), SCDHEC discovered
that the effect of rainfall and other variables differed
by beach site. Consequently, the agency decided that
each beach site should be modeled independently
(i.e., using a different statistical model for each station
or section of beach) to provide the most accurate
information.
SCDHEC applied the Stormwater model to 10 beaches
in Horry and Georgetown counties. The model was
designed to pull rainfall data from rain gauges at
each beach and to take in weather and tidal
information as separate inputs. These data were
continuously added to the model, which was
constantly recalibrated, although a more intensive
recalibration was needed to adjust to the
infrastructure changes.
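Modeling each site independently means keeping a separate data set and separate coefficients for every station, and recalibrating a site's model whenever new observations arrive. The Python sketch below shows that bookkeeping with a deliberately simplified, hypothetical model: ordinary least squares of log10 enterococci on a single rainfall predictor. The actual Stormwater model used multiple inputs (rainfall, weather, and tidal information), and the class, method, and site names here are illustrative only.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with one predictor."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variation")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


class SiteModels:
    """Hypothetical store keeping a separate data set and fitted line per site."""

    def __init__(self):
        self.data = defaultdict(lambda: ([], []))  # site -> (rainfall list, log10 list)
        self.coef = {}                             # site -> (intercept, slope)

    def add_observation(self, site, rainfall_in, log10_entero):
        xs, ys = self.data[site]
        xs.append(rainfall_in)
        ys.append(log10_entero)
        # Recalibrate this site only; other sites keep their own coefficients.
        if len(set(xs)) >= 2:  # need two distinct rainfall values to fit a line
            self.coef[site] = fit_line(xs, ys)

    def predict(self, site, rainfall_in):
        a, b = self.coef[site]
        return a + b * rainfall_in


models = SiteModels()
for rain, log_ent in [(0.0, 1.0), (1.0, 2.0)]:
    models.add_observation("Site A", rain, log_ent)
for rain, log_ent in [(0.0, 1.0), (1.0, 1.5)]:
    models.add_observation("Site B", rain, log_ent)
print(models.predict("Site A", 2.0), models.predict("Site B", 2.0))  # 3.0 2.0

models.add_observation("Site A", 2.0, 3.5)  # new data refits Site A only
```

The same rainfall produces different predictions at the two sites, which is the point of fitting each site separately. In practice, a major infrastructure change would call for more than appending new points, since older observations may no longer reflect current conditions.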
When developing the NexRad rainfall model in 2011,
USC found that combining the separate site-specific
models (MLR, CART, and Ensemble) into one user
interface was fairly easy. As of 2015, USC makes the
daily model results available via email, a Web
interface, and a phone application. SCDHEC
publishes advisory information at www.howsthebeach.org.
In 2012, SCDHEC used the NexRad model
to implement the initial suite of preemptive beach
swimming advisory models as a tool to determine
when an advisory should be issued in Horry County,
South Carolina. Because of program management
changes, SCDHEC did not continue using the model
for advisory decisions in Horry County after 2012.
Issues Encountered
The Stormwater model was used during the 2007-
2009 beach seasons to make beach management
decisions. Little or no modeling was performed in
2010-2011. In 2010, SCDHEC changed the advisory
program and began placing permanent advisory
signs at beach sites that routinely exceeded the state
water quality standards (e.g., stormwater outfalls
and swashes). The remaining sites were not
modeled, either because of historically low
enterococci counts or because they never exceeded
water quality standards, even after a rainfall event
(other than a tropical storm). This, coupled with drier
seasons, meant that the Stormwater model was used
infrequently, if at all, at most locations. Across the
43 monitored sites, SCDHEC placed 29 permanent
signs reading, "Caution, swimming not advised, high
bacteria counts, refrain from fishing and wading, do
not put head below water, no swimming within 200
feet of sign." SCDHEC still monitored all 43 sites but
did not want to invest in new monitoring equipment
to support the modeling when many sites already had
permanent signs. In addition, the outdated equipment
used to measure rainfall began failing and was not
compatible with new computers. The entire system
was expensive to replace, with an estimated cost of
$20,000-$30,000.
Learning from SCDHEC's experiences, beach
managers should be aware of the limitations of
hardware and equipment. In South Carolina, the
tipping-bucket rain gauges required a significant
amount of maintenance and staff time to keep
running and clean. Replacement parts for all types
of equipment can be costly, and equipment can
become obsolete as newer technology replaces it.
Equipment can also be difficult to maintain with
limited staff and resources.
Model Costs
The initial cost to develop the Stormwater model
in 2007 was low—basically the cost of a graduate
student's time. Once the model was operational, costs
increased because a series of 11 rain gauges needed to
be installed and maintained. Unfortunately, SCDHEC
budget cuts reduced the resources and staff available
to perform maintenance. Using data collected during
the 2007-2009 beach seasons, SCDHEC was able to
target sites with frequent exceedances and scale back
monitoring and maintenance of the rain gauges at
other sites, lowering costs further. This continued
until permanent signs were put in place at beach sites
where the water quality standards were routinely
exceeded, and eventually SCDHEC stopped using the
rain gauges altogether.
The primary costs for the 2011 NexRad model are
development and continual model updates. There
were no costs associated with model implementation
because model data were obtained for free.
Moving Forward
The NexRad rainfall model eliminated the need for
updates and maintenance of the rain gauge network,
improved timeliness by providing robust decision
support well before biological sample cultures could
verify conditions, and improved accuracy by providing
reliable forecasts of beach hazards that would merit
closures while reducing false positives. These models
are some of the first marine Enterococcus models
and some of the first to use CART models. They
are transferable to other swimming beaches in the
southeastern United States that experience similar
weather and water circulation patterns and have
stormwater runoff as their most significant pollution
source. In the future, the scientists who developed
the model hope to increase buoy and radar coverage
to provide improved spatial resolution of data and to
assess the use of the model for predicting salinity and
currents.
USC's Dr. Dwayne Porter advises other beach
programs that "you do not want to shortchange the
modeling effort, but simpler is often better." Sean
Torrens with SCDHEC encourages beach managers to
collaborate with others, such as graduate students and
universities, and to research what others are doing to
avoid reinventing the wheel.
References
NRDC (Natural Resources Defense Council). Testing
the Waters: South Carolina.
http://www.nrdc.org/water/oceans/ttw/sc.asp.
Porter, Dwayne, University of South Carolina.
Personal communication.
South Carolina Department of Health and
Environmental Control. Beach Monitoring
Program.
http://www.scdhec.gov/HomeAndEnvironment/Water/SwimSafety/.
Southeast Coastal Ocean Observing Regional
Association. Water Quality Observations and
Models Help Managers Make Decisions on Issuing
Swim Advisories.
www.secoora.org.
Torrens, Sean, South Carolina Department of
Health and Environmental Control. Personal
communication.