Predictive Tools for Beach Notification Volume I: Review and Technical Protocol


Predictive Tools for Beach
        Notification
  Volume I: Review and
    Technical Protocol
         U.S. Environmental Protection Agency
            Office of Water
          Office of Science and Technology


            EPA-823-R-10-003


           November 22, 2010

Predictive Modeling at Beaches—Volume I                                   November 22, 2010
Executive Summary	 xi
1  Document Overview and Background	1
2  Tools for Beach Notification Decisions	3
  2.1   Statistical Models	3
  2.2   Rain Threshold Levels	4
  2.3   Notification Protocols	4
  2.4   Deterministic Models	6
3  Hydrologic Environments and Their Effects on Modeling for Beach Notifications	7
  3.1   Introduction	7
  3.2   Sources of Fecal Pollution in the Watershed	8
     3.2.1    Animal Sources	8
     3.2.2    Human Sewage	8
  3.3   Bacteria Movement to the Waterbody	8
  3.4   Bacteria Movement within the Waterbody	9
  3.5   Hydrodynamic Factors Affecting Pollution Movement	9
     3.5.1    Hydrodynamic Dispersal in Lakes and Other Lentic Environments	9
     3.5.2    Hydrodynamic Dispersal in Streams and Rivers	10
  3.6   Great Lakes Hydrologic Environment	10
  3.7   Inland Lakes	10
  3.8   Rivers	11
  3.9   Marine Waters	11
4  Current Applications of Predictive Tools	13
  4.1   Statistical Models	14
     4.1.1    SwimCast (Lake Michigan Beaches, Illinois)	16
     4.1.2    Nowcast (Lake Erie Beaches, Ohio)	16
     4.1.3    Nowcast (Port Washington, Wisconsin)	17
     4.1.4    Project S.A.F.E. (Indiana)	17
     4.1.5    RainFlow (Port Washington, Wisconsin)	18
     4.1.6    South Shore Beach Model (Milwaukee, Wisconsin)	18
     4.1.7    Charles River Flag Program, Massachusetts	18
     4.1.8    PhillyRiverCast (Philadelphia, Pennsylvania)	19
     4.1.9    BacteriALERT (Chattahoochee River, Georgia)	19
     4.1.10  USGS Model (Kansas)	19
     4.1.11  Stormwater Model (Horry County, South Carolina)	20
     4.1.12  Stormwater Model (Fairhaven Beach, New York)	20
     4.1.13  Stormwater Model (Sandy Point State Park, Maryland)	20
     4.1.14  Virtual Beach Manager Toolset (Various Locations)	21
    4.1.15  Receiver Operating Characteristic Curve Modeling in Boston Harbor Beaches
            (Boston, Massachusetts)	21
  4.2   Other Predictive Tools	22
    4.2.1   Rain Threshold Levels	22
    4.2.2   Notification Protocol	26
  4.3   Deterministic and Combination Models	28
5  Developing a Beach Notification Statistical Model	29
  5.1   Considerations for Developing a Statistical Model	29
  5.2   Selecting Variables for a Statistical Model	30
    5.2.1   Physical	30
    5.2.2   Chemical	30
    5.2.3   Meteorological and Hydrologic	30
    5.2.4   Other	32
  5.3   Collecting Data for a Statistical Model	33
    5.3.1   Widely Used Data and Data Sources	35
  5.4   Ensuring Data Quality	36
  5.5   Conducting Exploratory Data Analysis	38
  5.6   Developing Your Statistical Model	40
  5.7   Assessing and Refining Your Statistical Model	41
  5.8   Implementing Your Statistical Model	43
  5.9   Additional Resources	43
6  Developing Rain Threshold Levels and a Rain Notification Protocol	45
  6.1   Collecting Data for Rain Threshold Levels and Notification Protocols	45
  6.2   Developing a Rain Threshold Level	47
    6.2.1   Frequency of Exceedance Analysis	47
    6.2.2   Regression Modeling	47
  6.3   Developing a Notification Protocol	47
7  Common Challenges and Obstacles	49
8  Beyond Statistical Modeling	51
  8.1   Forecast Modeling	51
  8.2   Regional Modeling	52
    8.2.1   Great Lakes Finite Element Nested Models	52
  8.3   Hydrodynamic and Fate and Transport Modeling	52
    8.3.1   Hobie Beach	53
    8.3.2   Other	53
  8.4   New Use of Existing Data and Innovative Analysis	53
    8.4.1   Use of Hydrography (NHDPlus Network) and Land Use Data	53
  8.5   Neural Networks and Genetic Algorithms	54
9  References	57
Table 4-1. Overview of predictive tools in use	13
Table 4-2. Beaches assessed using predictive tools	14
Table 4-3. Beaches assessed using rainfall thresholds	23
Table 4-4. New York City wet-weather advisory information for 2009	25
Table 7-1. Issues concerning statistical predictive models and tools	49
Figure 3-1. Sources and fate of fecal pollution in watersheds	7
Figure 4-1. Flow chart used in Washington as part of its notification protocol	27
Figure 5-1. Points A, B, and C represent an outlier, a high-leverage point, and an influential
   point, respectively, in a regression context	38
Figure 5-2. An ideal residual plot for a multiple linear regression model showing no
   discernible pattern among the residuals	41
Figure 5-3. A plot of regression residuals that shows indications of heteroscedasticity and
   the need for a Box-Cox transformation	42

Abbreviations and Acronyms
AIC          Akaike's Information Criterion
ANN         artificial neural network
BEACH Act   Beaches Environmental Assessment and Coastal Health Act
BIC          Bayesian Information Criterion
CART        classification and regression tree
CFU          colony forming units
cm           centimeters
CSO          combined sewer overflow
DNA         deoxyribonucleic acid
DO           dissolved oxygen
E. coli        Escherichia coli
EFDC        Environmental Fluid Dynamics Code
EMPACT      Environmental Monitoring for Public Access and Community Tracking
EPA          U.S. Environmental Protection Agency
FIB           fecal indicator bacteria
GLERL       Great Lakes Environmental Research Laboratory
HSPF         Hydrological Simulation Program—Fortran
in            inches
mL           milliliters
NOAA        National Oceanic and Atmospheric Administration
NWS         National Weather Service
POTW        publicly owned treatment works
qPCR         quantitative polymerase chain reaction
R²            coefficient of determination
ROC          Receiver Operating Characteristic
RTI          Research Triangle Institute
S.A.F.E.       Swimming Advisory Forecast Estimate
SMTM        Simple Mixing and Transport Model
SSO          sanitary sewer overflow
SWMM       Storm Water Management Model
TPM          Tidal Prism Model
USGS         U.S. Geological Survey
UV           ultraviolet
WWTP        wastewater treatment plant

Executive Summary
The U.S. Environmental Protection Agency (EPA) is in the process of developing new or revised
recreational water quality criteria as required in the Beaches Environmental Assessment and
Coastal Health Act (BEACH Act) of 2000. As part of that process, EPA is evaluating a faster
method for analyzing fecal indicator bacteria (FIB) through a series of epidemiology studies. Such a
rapid method would provide analytical results in 4-6 hours, not including
transportation time to the laboratory or laboratory preparation time. Even with the rapid method,
results might not be available on the same morning samples are collected or even on the same
day. One means to supplement, not replace, analytical results and to make same-day public
health decisions is to use predictive tools such as statistical models, rainfall threshold levels, and
notification protocols.
This document is Volume I of a two-volume report. Volume I summarizes current uses of
predictive tools to provide model developers with the basic concepts for developing predictive
tools for same-day beach notifications at coastal marine, Great Lakes, and inland waters.

Volume II provides results of research conducted by EPA on developing statistical models at
research sites. It also presents Virtual Beach—a software package designed to build statistical
multivariable linear regression predictive models.
The types of predictive tools that can be used to make beach notification decisions fall into the
following categories: statistical regression models, rainfall-based notifications, decision trees or
notification protocols, deterministic models, and combinations of tools.

• A statistical model (also called a statistically based model or predictive model) is a
general term for any type of statistical modeling approach to predicting beach water
quality. Such a model relies on an observed statistical correlation between FIB and environmental
and water quality variables that are easier to measure than FIB. Typical variables include
meteorological conditions (solar radiation, air temperature, precipitation, wind speed and
direction, and dew point); water quality (turbidity, pH, conductivity/salinity, and
ultraviolet (UV)/visible spectra); and hydrodynamic conditions (flows of nearby
tributaries, magnitude and direction of water currents, wave height, and tidal stage).

• A rain threshold level is another predictive tool used in many locations as the basis for a
beach notification. Many beach managers have noticed a connection between the
concentration of FIB at a beach and the amount of rain received in nearby areas. That
relationship can be quantified as an amount or intensity of rainfall (a threshold level) that
is likely to cause exceedances of water quality standards at a beach, and the length of
time over which the standards will be exceeded.

• Beach managers can also develop a series of questions or a decision tree, considering
factors such as rainfall, to guide beach notifications. Such evaluations use water quality
sampling, rainfall data, and other environmental factors that could influence the FIB
levels (such as proximity to pollution sources, wind direction, visual observations, or
other information specific to the region or beach). In this document, that process is
referred to as developing a notification protocol.

Developing a basic understanding of the regional hydrology can be an important part of
developing and using a predictive tool. This report addresses the influence of different
hydrologic settings and how they affect the development of a predictive tool.

This document presents predictive tools that health departments and other responsible agencies
are using for predicting water quality conditions and making timely decisions on beach
notifications. An overview and a short description are presented for each predictive tool for
which information was available. The document presents details on the elements required for
developing a statistical model, according to a review of available literature. It discusses
techniques for refining models and advanced statistical methodologies. The document also
outlines general procedures for developing rainfall threshold levels and notification protocols.

A review of predictive tools for beach notifications reveals different challenges for each beach.
Developing a predictive tool requires a commitment of resources (data collection, computer
software, expertise), but there is no guarantee that a useful predictive tool will be produced. The
applicability and challenges of predictive tools are discussed.

Finally, the document discusses future directions that EPA considers likely for predictive tools
for beach notifications. A likely next step is reliably forecasting beach water quality a day or more
in advance, analogous to weather forecasting. Attempts are also being made to develop models
that apply to more than one beach or to a region of shoreline.

1 Document Overview and Background

Document Overview
This document is Volume I of a two-volume report. The goal of this document is to provide
beach managers and developers of predictive tools with a technical protocol for developing
predictive tools for same-day beach notifications at coastal marine, Great Lakes, and inland
waters. The technical protocol, or specific directions for developing predictive tools for beach
advisories, is in Chapters 5 and 6. The Background section below summarizes the BEACH Act,
beach monitoring, water quality criteria, analytical methods, and previously used beach
predictive tools. Chapter 2 describes basic types of predictive tools
used for making timely beach advisory decisions. Chapter 3 discusses the influence that different
hydrologic environments have on beaches and how they affect model development. Chapter 4
summarizes current uses of predictive tools in predicting water quality exceedances at beaches.
Chapter 5 provides technical protocol for developing statistical models, and Chapter 6 provides
technical protocol for developing rain threshold levels for beach advisory decisions. Chapter 7
reviews the applicability of predictive tools, and Chapter 8 discusses trends that EPA sees in
predictive tools for beach notifications.

Volume II of this report describes site-specific applications of statistical models to several beach
environments and analyses of results. Volume II also presents Virtual Beach, a software tool
designed to build statistical multivariable linear regression predictive models at beaches.

Background
Coastal, Great Lakes, and inland beaches are treasured natural resources that provide significant
value, including recreational benefits. Those benefits are challenged by regular input of
pollutants from point and nonpoint sources. Recreational activities are especially affected if fecal
matter, treated or untreated, from human or animal sources is present in the water. Ingesting
polluted water, or other exposure during recreation, can lead to illnesses such as
gastroenteritis, fever, hepatitis, and cryptosporidiosis, as well as infections of the skin, ears, and
respiratory system (USEPA 2002).

The BEACH Act was passed in October 2000 to reduce the human health risks associated with
water contact at coastal and Great Lakes beaches. The act requires EPA to coordinate and
provide grant funds to support water quality monitoring and beach notification programs in
states, territories, and eligible tribes with coastal or Great Lakes recreational waters. The
program goals consist of informing the public of water quality problems at beaches (through
notices that either provide advice about beach usage or close beaches), identifying sources of
pollution, investing in analytical methods development, and improving water quality at beaches.

The terms indicator bacteria, bacteria, and indicators all refer to FIB, whose presence in
recreational water signals the presence of fecal material and any pathogens it might contain.

FIB are associated with disease-causing pathogens and are detected through sample collection
and laboratory analysis. Culture-based analytical methods for quantifying FIB included in
EPA's 1986 recreational water quality criteria commonly take 24-48 hours to provide results.
Those culture methods have since been shortened to 18-48 hours (depending on the method and the
bacteria being detected). In the time between sampling and public notification of the sampling
results, swimmers can be exposed to pathogens through activity in the water. A closure
notification issued today is often based on yesterday's sample data. Conversely, elevated
indicator densities detected in today's sample might no longer be present when the analytical
results are received, resulting in unnecessary closure and an unjustified adverse economic effect.

EPA is in the process of revising the Recreational Ambient Water Quality Criteria as required in
the BEACH Act of 2000. As part of that process, the quantitative polymerase chain reaction
(qPCR), a more rapid analytical method for detecting and quantifying the presence of specific
deoxyribonucleic acid (DNA) sequences—in this case, DNA sequences contained in FIB—is
being evaluated. FIB and associated methods are being linked to health effects in beach users
through a series of epidemiology studies. The rapid method has been defined as one that
provides analytical results in 4-6 hours, not including transportation time to the laboratory or
laboratory preparation time. While laboratory methods are improving, results still might not be
available on the same day samples are collected.

In 1999, before the BEACH Act of 2000 was enacted and before EPA's grant-based monitoring and
notification program began at coastal and Great Lakes beaches, EPA published a report, Review
of Potential Modeling Tools and Approaches to Support the BEACH Program (USEPA 1999b).
That report concentrates on rain threshold levels and the potential use of deterministic mixing
zone, fate and transport, and hydrodynamic models and only briefly mentions statistically based
models. This report includes statistical models, which have greatly increased in use since 1999.

Even with the advent of rapid methods, real-time or even same-day water quality data collected
to inform the public of the risks of using a waterbody will not always be available. One means to
supplement analytical results is to use statistical models and other predictive tools (such as
rainfall threshold levels and notification protocols). Significant development and implementation
of statistically based models has occurred, especially in the Great Lakes (Lake Erie and Lake
Michigan) (Francy 2009; Nevers and Whitman 2005). At those beaches, the predictive tools have
proven reliable and cost-effective. EPA believes such predictive tools could be applicable in many
other settings as well, including marine and inland beaches. Statistical models establish
relationships between FIB densities (dependent variables) and various observations
that describe the environmental conditions at the beach (independent variables). The models use
recent and historical FIB densities and independent variables that include other water quality,
hydrodynamic, and meteorological data to predict current levels of FIB and to forecast near-
future levels of FIB or the likelihood of exceeding a water quality standard. Statistical models
and other predictive tools can be run as frequently as data are available for measured
independent variables and as long as models are shown to be producing reliable predictions that
protect public health.

Rainfall-based notifications and closures have been widely used at marine and freshwater
beaches for decades. At some beaches, rainfall threshold levels are set on the basis of an
analysis of historical data. At such beaches, it has been shown that after a certain amount of
rainfall, a beach is likely to have high FIB densities (USEPA 1999). Other similar notification
protocols could be developed in which a certain combination of conditions has been shown to
result in high levels of FIB. For purposes of this document, rainfall threshold levels and other
notification protocols are not considered as models, but as predictive tools.

2 Tools for Beach Notification Decisions

Statistically based models are being used to estimate water quality at many beaches in the United
States, especially at beaches on the Great Lakes (Francy 2009). Rainfall threshold levels and
other notification protocols are being used throughout the country. The primary reason for
developing a predictive tool for beach notifications is to improve the timeliness and accuracy of
notification decisions and public notification, compared with the current practice of waiting
18-48 hours for sample results before making a decision (Francy et al. 2006). Predictive tools might
also be useful in developing or adapting routine monitoring programs to focus efforts when
conditions favoring high FIB levels exist. The predictive tools examined in this report include
statistical models, rain threshold levels, notification protocols, and deterministic models.

2.1 STATISTICAL MODELS
A statistical model (also called a statistically based model or predictive model) is a general
term for any type of statistical modeling approach to predicting beach water quality. Linear
regression models assume a linear relationship between factors, or combinations of factors, and
FIB (Boehm et al. 2007; USEPA 2007; Nevers and Whitman 2005; Olyphant and Whitman
2004). The most highly developed statistical modeling approach is a multivariable linear regression
relating FIB to several independent variables. Typical, easy-to-measure
environmental and water-quality variables include the following: meteorological conditions
(solar radiation, air temperature, precipitation, wind speed and direction, dew point); water
quality (turbidity, pH, conductivity/salinity, UV/visible spectra); hydrodynamic conditions
(flows of nearby tributaries, magnitude and direction of water currents, wave height, tidal stage);
and other factors such as presence/number of birds or bathers. The most common model outputs
are estimated levels of FIB or probability of exceedance of the state water quality standard for
FIB. The process of developing a statistical model for beach advisories is explained in more
detail in Chapter 5.
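
To make the regression approach concrete, the sketch below fits a multivariable linear regression of log10 E. coli on two independent variables and reports both model outputs named above: an estimated FIB level and a probability of exceeding the standard. The data are invented for illustration, the choice of turbidity and 24-hour rainfall as predictors is an assumption, and 235 CFU/100 mL (EPA's 1986 single-sample maximum for E. coli at designated bathing beaches) stands in for a state standard. The exceedance probability assumes normally distributed residuals on the log scale and ignores uncertainty in the fitted coefficients, so it is a simplification rather than a full prediction interval.

```python
import math

import numpy as np

# Hypothetical training data (invented for illustration, not from any beach).
# Columns: turbidity (NTU), rainfall in the previous 24 hours (in).
X = np.array([
    [2.0, 0.00], [3.5, 0.10], [8.0, 0.60], [12.0, 1.20],
    [4.0, 0.05], [15.0, 1.50], [6.0, 0.30], [10.0, 0.90],
])
# Dependent variable: log10 of E. coli density (CFU/100 mL) on the same days.
y = np.array([1.3, 1.6, 2.2, 2.7, 1.5, 3.0, 1.9, 2.5])


def fit_mlr(X, y):
    """Ordinary least squares fit of y = b0 + b1*x1 + ... + bk*xk.

    Returns the coefficients [b0, b1, ..., bk] and the residual standard error.
    """
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = len(y) - A.shape[1]  # residual degrees of freedom
    s = math.sqrt(float(resid @ resid) / dof)
    return coef, s


def predict(coef, s, x_new, standard=235.0):
    """Return (predicted E. coli in CFU/100 mL, probability of exceeding standard).

    Assumes normally distributed residuals on the log10 scale and ignores
    uncertainty in the fitted coefficients.
    """
    log_pred = float(coef[0] + np.dot(coef[1:], x_new))
    z = (math.log10(standard) - log_pred) / s
    p_exceed = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z > z), standard normal
    return 10.0 ** log_pred, p_exceed


coef, s = fit_mlr(X, y)
level, p = predict(coef, s, [9.0, 0.8])
print(f"predicted E. coli: {level:.0f} CFU/100 mL; P(exceed 235) = {p:.2f}")
```

FIB densities are log-transformed here because they are typically right-skewed; a real model would also be checked with residual plots (Chapter 5) and validated on data not used to fit it.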

Statistical models are especially useful at some beaches and less useful at others. According to
Francy (2006), statistically based modeling can also effectively predict water quality in situations
where nonpoint or unidentified sources dominate, as well as in settings where discrete sources
have been identified (Nevers and Whitman 2005). If a beach rarely has high bacteria densities or,
conversely, almost always exceeds a bacterial water quality standard, it is unlikely that a
statistical predictive model would significantly improve practices for timely decision making and
notification. If a beach occasionally exceeds the water quality standard or if bacteria levels are
frequently near the water quality standard level, statistical models can help by providing a timely
prediction of whether FIB are likely to exceed the water quality standard according to parameters
that are easier and faster to measure than FIB densities.
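
The screening logic in this paragraph can be expressed as a short check on a beach's historical record. In the hypothetical sketch below, the 5 percent and 95 percent cutoffs for "rarely" and "almost always" are illustrative assumptions, not EPA guidance, and the sample values are invented.

```python
import numpy as np


def screen_beach(ecoli, standard=235.0, rare=0.05, frequent=0.95):
    """Rough screen of whether a beach's record suits statistical modeling.

    ecoli: historical E. coli densities (CFU/100 mL).
    rare / frequent: hypothetical cutoffs for "rarely" and "almost always"
    exceeding the standard.
    """
    ecoli = np.asarray(ecoli, dtype=float)
    rate = float(np.mean(ecoli > standard))  # fraction of samples exceeding
    if rate < rare:
        verdict = "rarely exceeds: a model may add little"
    elif rate > frequent:
        verdict = "almost always exceeds: a model may add little"
    else:
        verdict = "occasional exceedances: a model may help"
    return rate, verdict


print(screen_beach([40, 90, 300, 120, 610, 75, 210, 180, 55, 400]))
# (0.3, 'occasional exceedances: a model may help')
```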

Modeling tools are used to supplement, not replace, monitoring. Their primary purpose is to
provide predictions during the lag time between sampling and obtaining microbial indicator
results. Developing and using a statistical predictive model is a dynamic process based on data
collected via existing beach-monitoring programs. Statistical modeling employs a retrospective
correlation of measured water quality (FIB levels) with conditions observed at the time of sample
collection to produce an estimate of water quality that is time-relevant for recreational water

management and use by the public. Model developers can create Internet-based systems that
provide model predictions (similar to weather forecasts) to the public for the current period, not
for a day or two in the past once exposure has already occurred. However, models need to be
periodically validated and refined to improve predictions and better protect public health. More
information on that topic is provided in Volume II of this report.

2.2 RAIN THRESHOLD LEVELS
When significant rainfall occurs in a short period, runoff is generally produced, which can carry
harmful pollutants. Stormwater runoff and other surface water runoff (streams and rivers) are
widespread primary pathways by which FIB and pathogens reach beaches (Lipp et al. 2001;
Boehm et al. 2002; Schiff et al. 2003; Ackerman and Weisberg 2003). Runoff can contain animal
feces and other bacterial sources that were deposited on land between storm events (Ackerman
and Weisberg 2003). Runoff can also carry human sewage from leaks in the sewage transmission
infrastructure (Ackerman and Weisberg 2003). Stormwater volume and pollutant loads generated
depend on the characteristics of the drainage area, the condition of wastewater and stormwater
infrastructure, and the volume and intensity of rainfall. The process of developing a rain
threshold level, and other notification protocols, is explained in Chapter 6.

For some beaches, a defined intensity or duration of rainfall is frequently associated with
observations of poor water quality. With that information, many beach managers and public
health officials commonly issue a rain threshold notification after a rain event of a predefined
intensity or duration. Beachgoers are familiar with routine wet-weather closures in locations
where they are implemented.

The objective in setting a rain threshold level is to identify the amount of rainfall at which FIB
levels are likely to exceed the water quality standard. That is achieved if a statistical relationship
between rainfall events and FIB densities can be observed or if a level of rainfall and rainfall
conditions is consistently shown to be associated with increased FIB densities. The threshold can
then serve as a management tool for developing notification protocols or predicting water quality
standard exceedances requiring a beach notification. Several agencies have developed beach
operating rules by studying site-specific relationships between rainfall and water quality
monitoring data. Section 4.2.1 provides examples of such tools. Those types of tools are based on a
simple regression or a frequency of exceedance analysis of simultaneous observations of FIB
levels at representative monitoring stations near the beach and rainfall events at one or more
locations at the beach or in the upstream watershed.
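
A frequency of exceedance analysis of the kind described above can be sketched as follows: for each candidate rainfall amount, tabulate how often FIB exceeded the standard on days with at least that much rain, then choose a threshold. The paired rainfall and E. coli observations, the candidate thresholds, and the target exceedance fraction used to pick a threshold are all hypothetical; a real analysis would also weigh the number of samples behind each fraction and how long exceedances persist after rain.

```python
import numpy as np


def exceedance_by_threshold(rain_in, ecoli, candidates, standard=235.0):
    """Frequency of exceedance table.

    For each candidate rainfall threshold (in), returns
    (threshold, number of samples at or above it, fraction exceeding standard).
    """
    rain_in = np.asarray(rain_in, dtype=float)
    exceeded = np.asarray(ecoli, dtype=float) > standard
    table = []
    for t in candidates:
        wet = rain_in >= t
        n = int(wet.sum())
        frac = float(exceeded[wet].mean()) if n else float("nan")
        table.append((t, n, frac))
    return table


def pick_threshold(table, target):
    """Smallest candidate (candidates listed in ascending order) whose
    exceedance fraction meets the target; None if no candidate does."""
    for t, n, frac in table:
        if n and frac >= target:
            return t
    return None


# Hypothetical paired observations: prior 24-hour rainfall (in) and E. coli.
rain = [0.0, 0.1, 0.2, 0.4, 0.5, 0.8, 1.0, 1.3, 0.0, 0.6]
ecoli = [30, 60, 90, 150, 260, 410, 520, 900, 45, 310]

table = exceedance_by_threshold(rain, ecoli, [0.25, 0.5, 0.75, 1.0])
for row in table:
    print(row)
print("threshold:", pick_threshold(table, target=0.9))  # threshold: 0.5
```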

2.3 NOTIFICATION PROTOCOLS
Notification protocols are based on a set of decision criteria and questions that trigger
notifications in anticipation of poor water quality or other potentially hazardous conditions
(rough waves, strong rip currents, red tide). This document focuses only on the water quality
aspect of notification protocols. Notification protocol is a general term used to describe a protocol
or a set of questions or decision points a beach manager routinely uses to determine whether to
close a beach or issue a notification. The protocol can rely on sampling results, other
information, or beach characteristics either alone or in addition to sampling results. A decision
tree can be used as a type of notification protocol. Several states (see Section 4.2.2) use a series of
questions or decision trees to guide their decisions for beach notifications. Such evaluations are
designed to supplement bacteria data with characteristics of the beach that can influence
bacteria levels (e.g., proximity to pollution sources, stormwater runoff, and current or
wind direction).

Decision Trees and Binary Models
A decision tree classifies data from general to specific. It can be a simple tree that beach
managers use to assess recent sampling results together with other current conditions (such as rainfall
amount and information on sewage bypasses) to decide whether to issue an advisory notification
or to close a beach, and if so, for how long. If FIB are influenced by only one or two binary
factors, a decision tree can be a simple and accurate predictive tool.

If the underlying correlation in a statistical model is being driven by critical environmental
factors that change daily, the empirical relationship will always be somewhat unclear. The
strength of statistical regression models is also their weakness. Regression analysis requires
sufficient data to both establish the relationship between the predictive variables and observed
water quality and define the confidence that can be placed in model predictions. The weakness of
the prediction is that it is based on data, some of which cements the correlation, and some of
which interferes with it. It is difficult to identify a particular set of circumstances that applies on
a given day unless beach managers are fully aware of the relationship between FIB sources and
their beach, and apply a discriminator to the observed data that incorporates that understanding.
An example of that would be a situation in which a stream outlet (a major source of FIB) is
west of a beach with an east-west-oriented shoreline. A stiff breeze from either the west or east
creates choppy water that causes turbidity to increase. Historical data indicate that turbidity is a
fairly strong predictor of elevated FIB densities. When the wind is from the west, turbidity is
high and FIB are being transported from the stream outlet to the beach. When the wind is from
the east, turbidity is just as high, but the stream is no longer a source. Without wind information
and knowledge of the stream source, a model prediction derived from only increased turbidity
does not accurately inform the beach manager concerning the presence of pollution at the beach.

A decision tree modeling approach can handle such a situation as a set of decision points or
yes/no junctions: IF turbidity is elevated AND the wind is from the west, THEN indicator
densities will likely be elevated. Those decisions would likely be combined with other decision
points stemming from a statistical analysis of historical beach data. Using the same scenario
described above, a binary regression model would propose two different empirical relationships,
one used when wind comes from the west and one for wind from the east.
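
The wind-and-turbidity scenario can be sketched in code two ways: as the IF/AND/THEN decision rule and as a binary regression that switches between two equations depending on wind direction. The 10-NTU turbidity limit, the compass sector treated as "from the west," and both sets of regression coefficients are hypothetical placeholders; in a real model the coefficients would come from fitting each wind regime's data separately.

```python
def from_west(wind_from_deg: float) -> bool:
    """True when the wind blows from the western half of the compass.

    Uses the meteorological convention (direction the wind comes FROM,
    270 = due west); treating the whole 180-360 sector as "west" is an
    assumption for this east-west shoreline example.
    """
    d = wind_from_deg % 360.0
    return 180.0 < d < 360.0


def decision_rule(turbidity_ntu: float, wind_from_deg: float,
                  turbidity_limit: float = 10.0) -> bool:
    """IF turbidity is elevated AND wind is from the west, THEN FIB likely elevated."""
    return turbidity_ntu > turbidity_limit and from_west(wind_from_deg)


# Binary regression: one empirical equation per wind regime, giving
# log10 E. coli = b0 + b1 * turbidity. Coefficients are hypothetical
# placeholders for models fitted separately to each regime's data;
# any non-westerly wind uses the "east" equation in this sketch.
COEFFICIENTS = {"west": (1.2, 0.12), "east": (1.4, 0.01)}


def binary_model_log10_ecoli(turbidity_ntu: float, wind_from_deg: float) -> float:
    b0, b1 = COEFFICIENTS["west" if from_west(wind_from_deg) else "east"]
    return b0 + b1 * turbidity_ntu


print(decision_rule(25.0, 270.0), decision_rule(25.0, 90.0))  # True False
```

In the binary model, the same turbidity reading yields a much higher predicted density under a west wind than under an east wind, which is exactly the distinction a turbidity-only regression cannot make.
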

Another binary model might use different sets of independent variables for early- and late-season
observations, e.g., Ohio Nowcast (Francy and Darner 2007). Such an approach acknowledges
that some variables might be relevant in the early part of a season but less useful for prediction
later in the season. The model developer analyzes early- and late-season data separately, over a
number of years. If the data set had not been segregated, its predictive power would be reduced
by dilution because of changing circumstances. That approach is closely related to hierarchical
Bayesian modeling, which has become more popular in the past decade.
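
Segregating the data set by season amounts to fitting one model per subset. In the minimal sketch below, the day-of-year boundary between early and late season, the single predictor, and the data are assumptions chosen so that each season follows its own exact linear relationship.

```python
import numpy as np


def fit_seasonal_models(day_of_year, X, y, split_day=196):
    """Fit separate ordinary least squares models to early- and late-season samples.

    split_day=196 (about July 15 in a non-leap year) is a hypothetical
    season boundary. Returns {"early": coef, "late": coef}, where each coef
    is [intercept, slope_1, ..., slope_k].
    """
    day_of_year = np.asarray(day_of_year)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    models = {}
    for season, mask in (("early", day_of_year < split_day),
                         ("late", day_of_year >= split_day)):
        A = np.column_stack([np.ones(mask.sum()), X[mask]])
        coef, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        models[season] = coef
    return models


# Hypothetical samples: one predictor (e.g., turbidity) and log10 E. coli.
doy = [150, 160, 170, 180, 200, 210, 220, 230]
X = [[5], [10], [20], [30], [5], [10], [20], [30]]
y = [1.5, 2.0, 3.0, 4.0, 2.1, 2.2, 2.4, 2.6]

models = fit_seasonal_models(doy, X, y)
print({k: np.round(v, 3).tolist() for k, v in models.items()})
# {'early': [1.0, 0.1], 'late': [2.0, 0.02]}
```

A single pooled fit to the same eight samples would blend the two relationships, which is the dilution of predictive power described above.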

An example of using a single, readily measured parameter is the use of turbidity. A strong
correlation between turbidity and elevated FIB densities makes turbidity a strong predictive
variable in a regression model. However, that univariate simplicity would not be desired in the
situation where wind is the primary transport mechanism from the source to the beach, or in the
case where rainfall-driven turbidity is important but wind- and wave-induced turbidity is
irrelevant.

An interesting approach to decision tree modeling of bacterial indicator prediction is described in
Bae et al. (2010). The commercial classification and regression tree (CART) software they used
can perform traditional linear regression, but it can also classify data using a decision node
approach. Each node is based on a response threshold, with the strongest decision nodes
occurring higher in the decision tree and less significant thresholds/variables having a lower
place in the decision hierarchy. Each tree node is univariate in nature: a decision is made using
a single independent variable. That allows greater flexibility in defining the influence of
different variables. Contrast that with multiple linear regression, whose regression equation requires
the participation of all independent variables in the model.
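
The univariate node described above can be illustrated with a search over each independent variable for the single threshold that best separates exceedance days from other days. This is a simplified sketch, not the commercial software Bae et al. (2010) used: it scores splits by misclassification count (CART implementations commonly use Gini impurity or entropy), considers only "exceedance above the threshold," and builds a single node; a full tree would apply the same search recursively to each branch. Variable names and data are hypothetical.

```python
import numpy as np


def best_univariate_split(X, exceeded, names):
    """Return (errors, variable name, threshold) for the best single-variable split.

    Each candidate node predicts "exceedance" when one variable is above a
    threshold. Splits are scored by misclassification count for simplicity.
    """
    X = np.asarray(X, dtype=float)
    exceeded = np.asarray(exceeded, dtype=bool)
    best = None
    for j, name in enumerate(names):
        for t in np.unique(X[:, j]):  # observed values as candidate thresholds
            predicted = X[:, j] > t
            errors = int(np.sum(predicted != exceeded))
            if best is None or errors < best[0]:
                best = (errors, name, float(t))
    return best


# Hypothetical data: columns are turbidity (NTU) and wind speed (m/s).
X = [[2, 5], [4, 15], [6, 3], [12, 10], [15, 2], [20, 8]]
exceeded = [False, False, False, True, True, True]
print(best_univariate_split(X, exceeded, ["turbidity", "wind_speed"]))
# (0, 'turbidity', 6.0)
```

Turbidity separates these samples perfectly, so it would form the top node; wind speed, which cannot, would only appear lower in the tree, if at all.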

The goal of a classification tree approach is to minimize classification errors (i.e., false positives
and false negatives). The CART method is more commonly used to address the question of
whether a water quality standard will be exceeded than to produce a quantitative prediction of
FIB. A successful CART model was designed for several beaches in South Carolina (Johnson
2007).
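At each node, a classification tree searches for the single-variable threshold that best separates exceedances from non-exceedances. The sketch below does that for one variable by directly counting false positives and false negatives, matching the goal stated above. CART implementations typically rank candidate splits with an impurity measure such as the Gini index instead, so this is a simplified stand-in; the function name and data are hypothetical.

```python
def best_threshold(values, exceeded):
    """Find the threshold t minimizing false positives plus false negatives
    when an exceedance is predicted for value > t.

    Returns (threshold, false_positives, false_negatives).
    """
    # Include a threshold below the minimum so "always predict exceedance" is tested.
    candidates = [min(values) - 1] + sorted(set(values))
    best = None
    for t in candidates:
        false_pos = sum(1 for v, e in zip(values, exceeded) if v > t and not e)
        false_neg = sum(1 for v, e in zip(values, exceeded) if v <= t and e)
        # Strict "<" keeps the lowest threshold among ties.
        if best is None or false_pos + false_neg < best[1] + best[2]:
            best = (t, false_pos, false_neg)
    return best


# Hypothetical turbidity readings (NTU) and observed exceedances.
turbidity = [3, 6, 8, 14, 22, 25, 40]
exceeded = [False, False, True, False, True, True, True]
print(best_threshold(turbidity, exceeded))
```

Repeating this search over every candidate variable and keeping the best split gives the root node; applying it again within each branch grows the lower nodes.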

2.4 DETERMINISTIC MODELS
Deterministic models use mathematical representations of the processes that affect bacteria
densities to predict exceedances of water quality standards. They include a range of simple to
complex modeling techniques.

The 1999 EPA report Review of Potential Modeling Tools and Approaches to Support the
BEACH Program (USEPA 1999b), mentioned in Chapter 1, includes various types of
deterministic models such as mixing zone, fate and transport, and hydrodynamic models, as well
as simple predictive tools such as rainfall-curve-based closures. Specific deterministic models
discussed in the 1999 report include CORMIX, EFDC (Environmental Fluid Dynamics Code),
HSPF (Hydrological Simulation Program—Fortran), PLUMES, QUAL2E, Regional Bypass
Model, SMTM (Simple Mixing and Transport Model), STORM, SWMM (Storm Water
Management Model), and TPM (Tidal Prism Model). Those models were developed for general
purposes, but they were perceived to have potential use in support of implementing criteria for
beach notification and advisories. With the exception of the Regional Bypass Model, EPA is
unaware of any widespread use of those models for predicting water quality at beaches.
Most of the models used for timely beach notifications are statistically based.

EPA believes there is potential for applying deterministic models to support the Beach
monitoring and notification program. Using statistical models in combination with existing
deterministic models (stacking models) has shown potential to improve on the results produced
by statistical models alone. That and other potential applications of deterministic or process
models are described in detail in Chapter 8.

3 Hydrologic Environments and Their Effects on Modeling for Beach Notifications
3.1 INTRODUCTION
FIB are used to indicate the presence and extent of fecal pollution. FIB originate and thrive in the
intestinal tracts of warm-blooded animals. Identifying sources of fecal pollution, tracking their
movement through a watershed, and quantifying attenuation during transport within the aquatic
environment are very difficult tasks, especially in highly dynamic shoreline environments.
Having a good understanding of pollution sources and hydrologic setting can greatly improve
model prediction accuracy, especially if that knowledge can be used to direct data collection
efforts.

Fecal pollution sources can be roughly categorized into human sewage and animal sources
(Schueler 1999). Human sources include publicly owned treatment works (POTW) discharge,
runoff or seepage from septic systems, leaking wastewater infrastructure, and direct effects from
bather shedding at the beach (Figure 3-1).  The nature of the sources of FIB detected at a beach is
of primary concern to beach managers and is part of the information that should be documented
at beaches through the use of sanitary surveys (for more information on sanitary surveys, see
Section 5.3.1.3).
Figure 3-1. Sources and fate of fecal pollution in watersheds.

3.2 SOURCES OF FECAL POLLUTION IN THE WATERSHED

3.2.1 Animal Sources
Animal sources can be categorized as domestic animal sources and wildlife. Domestic animal
sources consist of agricultural animals and pets. Wildlife includes birds, rodents, upland game
animals, and marine animals, among others.

3.2.2 Human Sewage
The potential sources of human sewage vary depending on whether the watershed is sewered. In
sewered watersheds, sources can include treated sewage, combined sewer overflows (CSOs),
sanitary sewer overflows (SSOs), leaky wastewater infrastructure, illegal connections, and
dumping into storm drains. CSO and SSO discharges are generally associated with storm events
that exceed system capacity, resulting in untreated sewage mixed with stormwater (CSO) or
untreated sanitary sewage (SSO) flowing into natural waters. Power failures at pumping stations
and line blockages and breaks can also result in human sewage reaching receiving waters. Illegal
or improper connections to the sewer system and illegal dumping in storm drains are problems in
some communities. Knowledge of outfall locations and the quantity and quality of storm sewer
discharges is important for beach managers and should be obtained through a sanitary survey.

In non-sewered watersheds, human sewage is usually processed by septic systems or community
package plants. If any part of such a system fails, human waste can escape and migrate to
waterways. The design life of most septic systems is limited, usually 15 to 30 years, and the
quality of septic system maintenance varies widely. In watersheds with older systems and in
heavily developed shore areas near lakes, faulty septic systems can be an important source of
untreated human sewage.

Knowing what the sources of fecal pollution are and where they are located allows data
collection for predictive tool development to focus on, for example, streams that are considered
likely inputs of fecal matter.

3.3 BACTERIA MOVEMENT TO THE WATERBODY
Although the location and extent of pollution sources vary with the land use and ground cover in a
watershed, a common characteristic is that loading to the receiving waterbody is usually strongly
linked with the duration and intensity of storms. That is generally true for sewered areas and
developed areas with failing septic systems.

In less developed watersheds, bacteria in runoff take longer routes to receiving waters, for
example through detention and infiltration of stormwater in wetlands or in the soil. Such an
environment reduces the direct flow of fecal pollution to waterways. Some developed areas
include infiltration ponds, to which stormwater is routed and where it infiltrates slowly,
mimicking a natural system.

3.4 BACTERIA MOVEMENT WITHIN THE WATERBODY
Once in a waterbody, the likelihood that fecal pollution and associated FIB move into swimming
areas depends on many variables, including the distance from the point of entry to the swimming
area, effluent mixing and transport by currents, sediment settling and resuspension, and fate and
transport factors that affect bacterial survival. Those processes and factors differ greatly among
receiving waterbodies. Lotic (flowing water) environments such as rivers and streams present
conditions very different from lentic (still water) environments such as lakes. Oceans and
estuaries are influenced by tides (affecting bacteria movement), longshore currents, waves, and
salinity (affecting survival). Geography, climate, rainfall, drainage, and other conditions also play
important roles in determining the final occurrence of fecal pollution in a waterbody.

3.5 HYDRODYNAMIC FACTORS AFFECTING POLLUTION MOVEMENT

3.5.1 Hydrodynamic Dispersal in Lakes and Other Lentic Environments
When an effluent stream enters a standing body of water, the incoming water flows into the
density layer in the receiving waterbody that is most similar to its density. Density is governed
primarily by temperature and dissolved and suspended material.

Three types of inflow water movements can result, depending on density differences between the
inflowing water and the receiving water:

1. Overflow—inflow water density is less than the receiving water density

2. Underflow—inflow water density is greater than the receiving water density

3. Interflow—inflow water enters the receiving water at an intermediate depth
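The three cases can be expressed as a comparison of the inflow density with the densities of the surface and bottom layers. This minimal sketch assumes a simple two-layer picture and that densities are already known (for example, estimated from temperature and dissolved and suspended material); the function name and example values are hypothetical.

```python
def inflow_type(inflow_density, surface_density, bottom_density):
    """Classify how an inflow enters a (possibly stratified) lake.

    Densities are in kg/m^3. In a stratified lake the surface layer is
    lighter than the bottom layer (surface_density <= bottom_density).
    """
    if surface_density > bottom_density:
        raise ValueError("expected surface_density <= bottom_density")
    if inflow_density < surface_density:
        return "overflow"   # lighter than the surface layer: spreads over the top
    if inflow_density > bottom_density:
        return "underflow"  # denser than the bottom layer: sinks along the bed
    return "interflow"      # between the two: enters at an intermediate depth


# Hypothetical summer profile: warm surface ~998.2, cold bottom ~1000.0 kg/m^3.
print(inflow_type(997.8, 998.2, 1000.0))  # overflow
print(inflow_type(999.3, 998.2, 1000.0))  # interflow
```

An inflow that exactly matches a layer's density is labeled interflow here; that boundary choice is arbitrary in this sketch.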

The extent of turbulent mixing that occurs depends on the volume and velocity of the influx.
Once in the open water, the inflow velocity is reduced, and the mixing zone expands. The
reduction of flow velocity typically enhances deposition of suspended material.

Unlike lotic environments where water movement in the waterbody is generated primarily by
downstream flow, the directional movement of bacteria in lentic environments is generated
primarily by the transfer of wind energy to the water. Friction from wind blowing over the water
sets the surface into motion, producing traveling surface waves. In deep water, where wavelength
is much less than water depth, that motion is confined to the surface layers, with little effect on
the displacement of deep waters. In shallower water, when the wavelength becomes more than
20 times the water depth, the wave becomes a shallow-water wave, and the circular orbital motions
of the water are transformed into a to-and-fro sloshing that can extend to the bottom of the water
column. Morphometry of the water basin, stratification structure (density layers), and the area
exposed to wind all contribute to water turbulence, currents, and mixing and transport of bacteria
cells into and out of a swimming area.
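Whether a wave behaves as a deep- or shallow-water wave depends only on the ratio of water depth to wavelength. The sketch below uses the 20-times-depth shallow-water limit given above; the deep-water cutoff (depth greater than half the wavelength) is the conventional criterion from wave mechanics and is an assumption here, as is the function name.

```python
def wave_regime(depth_m, wavelength_m):
    """Classify a surface wave by the ratio of water depth to wavelength.

    Shallow-water limit (wavelength > 20 x depth) follows the text; the
    deep-water limit (depth > wavelength / 2) is an assumed convention.
    """
    ratio = depth_m / wavelength_m
    if ratio < 1 / 20:
        return "shallow"       # to-and-fro motion can reach the bottom
    if ratio > 1 / 2:
        return "deep"          # motion confined to the surface layers
    return "intermediate"


print(wave_regime(1.0, 30.0))   # shallow: wavelength is 30 times the depth
print(wave_regime(20.0, 10.0))  # deep
```

A long wave reaching a 1-meter-deep nearshore zone can therefore disturb bottom sediments even when the same wave in deeper water would not.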

Settling and resuspension of bacteria are important factors in lentic environments because of
wind-generated water turbulence. Unlike sediments in streams, which are transported downstream
when resuspended, nearshore lake sediments tend to stay nearshore except during powerful storms.
Because bacteria cells tend to survive longer in sediment than in open water (Sherer et al. 1992;
Burton et al. 1987; Thomann and Mueller 1987), resuspension can make sediment an important
source of bacteria in swimming areas.

3.5.2 Hydrodynamic Dispersal in Streams and Rivers
Streams and rivers are lotic environments. Bacteria-laden effluent entering such environments
moves and disperses in the direction of the flow. In idealized conditions of constant width, depth,
area, and velocity, the stream will have uniform flow. That condition rarely occurs in natural
channels, however (FISRWG 1998). Uniform flow is typically disrupted by meander bends,
changes in cross-section geometry, and channel features and obstructions such as fallen timber,
boulders, sand bars, riffles, and pools, which cause turbulence, mixing, and the convergence,
divergence, acceleration, or deceleration of flow. Those conditions, combined with flow volume
and velocity and the influence of survival factors, such as water clarity, affect the appearance of
bacteria in a lotic swimming area.

The nature of a waterbody strongly influences the ability of statistically based regression models
to predict FIB densities in waters adjacent to swimming beaches. The following paragraphs
discuss how amenable Great Lakes locations, inland lakes, rivers, and marine settings are to the
modeling process.

3.6 GREAT LAKES HYDROLOGIC ENVIRONMENT
The five Great Lakes represent a distinctive hydrologic environment in North America. They are
the largest freshwater bodies on the continent. The Great Lakes, with the exception of Lake Erie,
all have maximum depths greater than 200 meters. The Great Lakes experience very little tidal
effect but, nevertheless, do experience variations in water level due to wind and season. Like the
oceans, they can be greatly disturbed by storms. Unlike marine settings, the Great Lakes
represent a relatively confined set of environments and are separated from the global currents
that characterize the oceans. Discharges to the Great Lakes, therefore, are more likely to have a
local cumulative effect.

The hydrologic environment of the Great Lakes has been modeled extensively (Schwab and
Bedford 1994; Nevers and Whitman 2005) and, as detailed later in Chapter 4, has been the focus
of most of the successful statistical predictive modeling efforts of FIB at swimming beaches.
That comparative success stems in part from the fact that turbulent mixing, and thus FIB
variability, is more dynamic at marine beaches, which are strongly affected by tides, surge, and
wave action. In addition, the smaller number of variables associated with Great Lakes hydrologic
environments has facilitated the implementation of deterministic hydrodynamic and fate and
transport models, as described in Chapter 7.

3.7 INLAND LAKES
Inland lakes constitute waters that are amenable to statistical modeling and to using other
predictive tools such as rainfall threshold levels. Smaller inland lakes are less likely to be
receiving waters for POTWs but can become more degraded from overdevelopment and high
levels of recreational use. Desirable lake locations can be affected by septic systems and can be
very sensitive to runoff effects if little water exchange occurs through the lake system. Inland
lakes are also sensitive to nitrogen and phosphorus pollution, which can increase water column
turbidity and thereby reduce the inactivation of FIB and pathogens by sunlight (insolation).

3.8 RIVERS
Statistical models for predicting FIB densities or the likelihood of exceedances have been
successful in at least three rivers, as described in Chapter 4. Rivers lend themselves to statistical
modeling because they have a predictable flow direction and easily gauged water level and flow
velocity. In addition, the locations of permitted discharges are known, and travel times from
other features, such as tributaries, are readily determined. Accordingly, rivers tend to respond
predictably to varying conditions and yield good modeling results.
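Because travel times are readily determined, an upstream measurement can be lagged by the travel time before it is used as a model input. This sketch assumes a constant mean velocity over the reach, which real rivers only approximate; the function names, distances, velocities, and readings are hypothetical.

```python
def travel_time_hours(distance_km, velocity_m_per_s):
    """Time for water (and the bacteria it carries) to travel downstream."""
    return distance_km * 1000.0 / velocity_m_per_s / 3600.0


def lagged_reading(hourly_series, lag_hours):
    """Return the reading taken `lag_hours` before the most recent one.

    `hourly_series` is ordered oldest to newest.
    """
    if lag_hours >= len(hourly_series):
        raise ValueError("series too short for the requested lag")
    return hourly_series[-1 - lag_hours]


# Hypothetical: a tributary gauge 10 km upstream, mean velocity 0.5 m/s.
lag = round(travel_time_hours(10.0, 0.5))  # about 5.6 h, rounded to 6
tributary_turbidity = [12, 14, 15, 30, 55, 48, 40, 33, 25, 20]  # NTU, hourly
print(lag, lagged_reading(tributary_turbidity, lag))
```

The lagged reading, rather than the latest one, is what the upstream water now arriving at the swimming area carried when it passed the gauge.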

Because of the good understanding of fluvial processes and well-studied mixing and transport
characteristics, rivers can also be good systems for developing and using deterministic models.
Deterministic models in rivers have been used in National Pollutant Discharge Elimination
System permitting and for other in-stream water quality assessment purposes relating to Clean
Water Act programs. Some examples of those models are the EPA Hydrological Simulation
Program—FORTRAN (HSPF) (www.epa.gov/ceampubl/swater/hspf) and the EPA Stormwater
Management Model (SWMM) (www.epa.gov/ednnrmrl/models/swmm/index.htm). However,
those models are not used for beach notifications.

3.9 MARINE WATERS
Marine waters constitute the most challenging environment for statistical predictive models
because of the complex hydrodynamic nature of ocean settings and the resulting high number of
variables associated with them. The effects of large tidal ranges (> 9 feet) and resulting tidal
currents and changes in flow direction have, in the past, made marine settings more often the
focus of deterministic modeling efforts (Zhu 2009). Some state beach programs (e.g., South Carolina and
Maine) are implementing statistically based models. Within the range of ocean beaches, a wide
variety of site characteristics exists, some of which will be more amenable to use in statistical
models than others. Beaches in estuaries, harbors, and coastal embayments are less dynamic than
beaches on the open ocean and, thus, might be good candidates for statistical modeling. As with
freshwater settings, not all swimming beaches can benefit from the use of statistical predictive
models. For many settings, however, statistically based models will be effective, and those
settings will become apparent as beach managers experiment in coming years.
4 Current Applications of Predictive Tools

Various predictive tools (statistical models, rainfall thresholds, and notification protocols) are
being used as part of beach management programs across the United States. EPA reviewed the
tools that are in use or nearly complete, and this chapter includes a short description of each tool
for which information was available. Table 4-1 provides an overview of the predictive tools. EPA
has successfully used Virtual Beach at several freshwater and marine sites; those applications are
described in Volume II of this report.

Table 4-1. Overview of predictive tools in use
Location | Prediction method | Areas of application
California | Rain threshold level | Southern California counties
Connecticut | Rain threshold level | Cities of Greenwich, Norwalk, and Stamford
Delaware | Rain threshold level | Entire state
Florida | Rain threshold level | Individual cities
Hawaii | Rain threshold level | Entire state
Georgia | Statistical model | Chattahoochee River
Illinois | Statistical model | Great Lakes beaches
Indiana | Statistical model | Great Lakes beaches
Kansas | Statistical model | Selected rivers
Maine | Notification protocol | Entire state
Maryland | Statistical model and notification protocol | Sandy Point State Park (in development)
Massachusetts | Statistical model | Charles River
New Jersey | Rain threshold level | Six beaches
New York | Regional hydrologic model combined with local notification protocol | New York City
Ohio | Statistical model | Great Lakes beaches
Pennsylvania | Statistical model | Schuylkill River
Rhode Island | Notification protocol | Entire state
Scotland | Rain threshold level | Various beaches
South Carolina | Statistical model and notification protocol | Georgetown and Horry counties
Washington | Notification protocol | All counties
Wisconsin | Statistical model | Ozaukee County
Note: The information provided in this chapter is based on personal communications with state contacts and other responsible
agency personnel. In some instances, references to published works are available. Those references are listed in Chapter 9. The
topics discussed with beach contacts are summarized in Appendix A. EPA realizes that new models are continually being
developed. Table 4-1 is not an exhaustive list of all the tools being used or developed for beach management. EPA welcomes input
on additional predictive tools in use or being developed for beach programs.
4.1 STATISTICAL MODELS
Statistical models are created on the basis of observed relationships among variables such as
sunlight, temperature, turbidity, input flows, wind speed (and other variables that can affect
bacteria loading or survival rate) and indicator bacteria levels. Such models have been found to
be useful for making timely beach notification decisions. Model outputs can be estimated
densities of indicator bacteria or probability of exceeding a threshold level such as the state water
quality standard. That information is used in making a beach notification decision. The models
are described in more detail in Chapter 5. This chapter provides short descriptions of statistical
models in use.
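The two output types can come from the same log-scale regression. The sketch below converts a hypothetical prediction of log10(E. coli) into an estimated density and into a probability of exceeding 235 CFU/100 mL (the E. coli standard cited for the Ohio Nowcast in Section 4.1.2). Treating the residuals as normally distributed on the log scale is an assumption of this illustration; the models described in this chapter may instead use logistic regression or other methods, and the function names are hypothetical.

```python
import math


def estimated_density(predicted_log10):
    """Back-transform a log10-scale regression prediction to CFU/100 mL."""
    return 10 ** predicted_log10


def exceedance_probability(predicted_log10, standard_error, standard_cfu=235.0):
    """Probability that the true log10 density exceeds log10(standard).

    Assumes regression residuals on the log10 scale are normally distributed
    with the given standard error: P = 1 - Phi(z), computed with erfc.
    """
    z = (math.log10(standard_cfu) - predicted_log10) / standard_error
    return 0.5 * math.erfc(z / math.sqrt(2.0))


prediction = 2.1   # hypothetical model output, log10(CFU/100 mL)
se = 0.4           # hypothetical residual standard error
print(round(estimated_density(prediction)))
print(round(exceedance_probability(prediction, se), 3))
```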
As Table 4-1 shows, a number of statistical models have been successfully implemented at Great Lakes beaches.
Local and state agencies managing beaches on the Great Lakes have put an extraordinary effort
into protecting the health of recreational users by working with the U.S. Geological Survey
(USGS) and independently to pioneer the method of statistical modeling (and other predictive
tools) for timely beach notifications. Much of the work has been published in the scientific
literature by USGS and others. Table 4-2 summarizes the statistical models recently used or in
development.

Table 4-2. Beaches assessed using predictive tools
Name | Beach | Model inputs | Model output | Leading agencies | Status
SwimCast | Beaches (four) in Lake County, Illinois | Air temperature, wind speed, wind direction, precipitation, relative humidity, lake stage, water temperature, water clarity, insolation, wave heights | Estimated Escherichia coli concentration | Lake County Health Department, Lakes Management Unit | In use
SwimCast | 63rd Street Beach, Chicago Park District, Illinois | Flow, rainfall, sunlight, temperature, turbidity, wave height, wind speed | Estimated E. coli concentration | Chicago Park District (with Remote Data, Inc.) | Applied for trial period
Nowcast | Lake Erie Beaches, Ohio | Day of year, lake level, rainfall, temperature, turbidity, wave height | Probability that water quality standard will be exceeded | U.S. Geological Survey | In use
Nowcast | Upper Lake Park, Ozaukee County, Wisconsin | Day of year, lake level, rainfall, temperature, turbidity, wave height | Probability that water quality standard will be exceeded | U.S. Geological Survey | In use
Project S.A.F.E. | Lake Michigan Beaches, Indiana | Gauge height, rainfall, chlorophyll a, turbidity, wind direction | Probability that water quality standard will be exceeded | Indiana Department of Environmental Management | In use previously and currently under revision
RainFlow | Upper Lake Park, Ozaukee County, Wisconsin | 24-hour rainfall, 48-hour rainfall, bacteria composite sample, lake conditions, turbidity, stream flow, stream velocity | Yes/No advisory based on water quality standard | Ozaukee County | Used before switching to Nowcast
South Shore Beach Model | South Shore Beach, Milwaukee, Wisconsin | Algae, chlorophyll a, conductivity, E. coli concentration from previous sample, temperature, wave direction, wave vector | Estimated E. coli concentration | City of Milwaukee | Under consideration
Flag Program | Charles River, Boston, Massachusetts | Rainfall, recent bacteria sample | Predicted concentration and probability of exceeding secondary standards | Charles River Watershed Association | In use
PhillyRiverCast | Schuylkill River, Philadelphia, Pennsylvania | Flow, rainfall, turbidity | Yes/No advisory based on water quality standard | Philadelphia Water Department | In use
BacteriALERT | Chattahoochee River near Atlanta, Georgia | Flow, turbidity | Low/High risk level | U.S. Geological Survey | In use
Stormwater Model | Horry County, South Carolina | Cumulative rainfall, current UV level, current weather, moon phase, preceding dry days, rainfall intensity | Estimated E. coli concentration | Horry County (with the University of South Carolina) | In use and being recalibrated
Stormwater Model | Fairhaven Beach, Lake Ontario, New York | Rainfall, turbidity, current speed, current direction | To be determined | USGS and New York State | Being developed
Stormwater Model | Sandy Point State Park, Chesapeake Bay, Maryland | Day, moon phase, rainfall, salinity, temperature, wind speed | Yes/No advisory based on water quality standard | National Oceanic and Atmospheric Administration & University of Maryland | Being developed
Receiver Operating Characteristic Curve Analysis for Boston Harbor Beaches | Constitution Beach in East Boston, Carson Beach in South Boston, Tenean Beach in Dorchester, and Wollaston Beach in Quincy | Antecedent rainfall, seasonality | Yes/No advisory based on water quality standard | Massachusetts Department of Conservation and Recreation and Massachusetts Water Resources Authority | In use
Unnamed | Little Arkansas River, Rattlesnake Creek, and Kansas River, Kansas | Turbidity, temperature, chlorophyll, dissolved oxygen, other water quality parameters | Estimated fecal coliform and nutrient concentrations | U.S. Geological Survey | Not in use

4.1.1 SwimCast (Lake Michigan Beaches, Illinois)
Several Lake Michigan beaches are using a program called SwimCast to predict Escherichia coli
(E. coli) densities. In Lake County, Illinois, the program is installed at several beaches, including
Forest Park-Lake Forest, Rosewood-Highland Park, and Waukegan Beach. Meteorological
equipment on a station in the lake measures air temperature, wind speed and direction,
precipitation, relative humidity, lake stage, water temperature and clarity, insolation (sunlight),
and wave height. The data are transferred to a data logger and used in an equation to predict the
E. coli density. Sampling is still performed at the beaches 4 days a week between May and
September; compared with those sampling results, predictions have been approximately
90 percent accurate. Lake County has been using the SwimCast system since 2004 and provides
daily data and beach notifications on its website.

Chicago Park District is also testing the SwimCast program. It announced SwimCast for a trial
period at 63rd Street Beach in 2008, and the District plans to apply the model once it is calibrated
well enough to provide 90 percent accuracy. Additional information on SwimCast's use in
Lake County, Illinois, is at www.lakecountyil.gov/Health/want/SwimCast.htm.

4.1.2 Nowcast (Lake Erie Beaches, Ohio)
The Ohio Nowcast provides advisory information based on predictive models for two Lake Erie
beaches (Huntington and Edgewater) and one recreational river site (Cuyahoga River at Jaite).
The models are uniquely fitted to the characteristics of each beach using multivariable linear
regression (further explanation is in Chapter 5). Water samples are collected daily and analyzed
for indicator microbes and other parameters to obtain inputs for the model. The inputs are
turbidity, rainfall, wave height, water temperature, day of the year, and lake level. The model
produces a probability of exceeding the E. coli standard of 235 colony forming units (CFU) per
100 milliliters (mL). The probability threshold is site-specific, based on historical data, and is set
by the beach manager for decision making. For example, a beach manager might decide to post
an advisory if there is a 25 percent probability of exceeding the water quality standard. During
model development, water quality sampling continued so that decision accuracy could be tested,
and the numbers of false positive and false negative predictions were calculated. The probability
threshold is reconsidered to maximize correct decisions (Francy 2006). The use of models at
other Lake Erie beaches has been investigated, and those beach models are in different phases of
development. Development of models for Villa Angela and Lakeshore was suspended because the
model results were not more accurate than those of the persistence model. At Maumee Bay State
Park, a model was developed with variables for turbidity and wind direction; it was validated
during 2010 and is available to include in the Ohio Nowcast in 2011. During 2010, data were
collected for model testing at Lakefront Park (Huron, Ohio), Mentor Headlands State Park, and
Fairport Harbor Lakefront Park. Further information is available at:
http://www.ohionowcast.info/nowcast technical.asp.
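The threshold-setting procedure described in this section can be sketched as follows: post an advisory when the modeled probability meets a manager-chosen threshold, count false positives and false negatives against sampling results, and compare candidate thresholds. The function names, candidate thresholds, and data are hypothetical. When candidates are listed in ascending order, ties go to the lower (more protective) threshold, which is a design choice of this sketch.

```python
def confusion_counts(probabilities, exceeded, threshold):
    """Count outcomes when an advisory is posted for probability >= threshold.

    Returns (true_pos, false_pos, true_neg, false_neg).
    """
    tp = fp = tn = fn = 0
    for p, e in zip(probabilities, exceeded):
        advisory = p >= threshold
        if advisory and e:
            tp += 1
        elif advisory:
            fp += 1   # advisory posted, standard not exceeded
        elif e:
            fn += 1   # no advisory, standard exceeded
        else:
            tn += 1
    return tp, fp, tn, fn


def best_probability_threshold(probabilities, exceeded, candidates):
    """Pick the candidate threshold with the most correct decisions (tp + tn)."""
    best_t, best_correct = None, -1
    for t in candidates:
        tp, fp, tn, fn = confusion_counts(probabilities, exceeded, t)
        if tp + tn > best_correct:
            best_t, best_correct = t, tp + tn
    return best_t, best_correct


# Hypothetical modeled probabilities and sampling outcomes for one season.
probs = [0.05, 0.12, 0.30, 0.22, 0.65, 0.80, 0.40]
exceed = [False, False, True, False, True, True, False]
print(best_probability_threshold(probs, exceed, [0.25, 0.50, 0.75]))
```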

4.1.3 Nowcast (Port Washington, Wisconsin)
In 2009, the Ozaukee County Public Health Department partnered with the Wisconsin Department
of Natural Resources to develop an operational Nowcast model at Upper Lake Park Beach in Port
Washington using Virtual Beach software. Developing the model took approximately 40 hours of
combined staff time and used data collected during the 2007 and 2008 beach seasons through
routine beach monitoring and data collection. Variables used in the model are
wave height, turbidity, 24- and 48-hour rainfall, stream flow, water and air temperature, and the
previous day's lab results on E. coli (when they are available). The model is used to predict E.
coli densities four days a week. Running the model takes approximately 5 minutes per day as
part of routine monitoring activities. County staff enter daily data for each of the explanatory
variables into the model and report the results (swimming advisory or not) on the Wisconsin
Beach Health website (www.wibeaches.us).

The model proved to be highly accurate, with a mean absolute error of 15 CFU/100 mL and an
overall R² of 0.62 (62 percent). Visual inspection of the model's performance confirmed that the
model was highly sensitive to small fluctuations in E. coli concentration during 2009.
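The two summary statistics quoted above can be computed as shown in this sketch, which applies their standard definitions. The data are hypothetical, and the source does not state whether the reported R² was computed on raw or log-transformed concentrations.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference, in the units of the data (e.g., CFU/100 mL)."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


observed = [10, 20, 30, 40]     # hypothetical E. coli, CFU/100 mL
predicted = [12, 18, 33, 37]
print(mean_absolute_error(observed, predicted))  # 2.5
print(round(r_squared(observed, predicted), 3))  # 0.948
```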

4.1.4 Project S.A.F.E. (Indiana)
Project S.A.F.E. (Swimming Advisory Forecast Estimate) is a statistical predictive model
applied to four beaches in Indiana (Lake Street, Marquette Park, Wells Street Beaches of Gary,
and Ogden Dunes in Portage Township). The model is specifically designed to include the
pollutant load coming from a significant outfall near the beaches (Burns Ditch). The model uses
characteristics of the individual beaches, wind direction, rainfall, chlorophyll a, turbidity, and
Burns Ditch gauge height. The model is run daily to obtain the predicted likelihood that the E.
coli concentration will exceed safe limits. The beach managers can use the probability to
determine if a beach should be closed or under advisory. USGS developed the Project S.A.F.E.
model (Whitman 2008).
The model is being applied in combination with a regional hydrodynamic model for Lake
Michigan developed by the National Oceanic and Atmospheric Administration (NOAA). The
NOAA model simulates, with a high degree of accuracy, the frequent changes in longshore current
direction in this region of Lake Michigan. Its output is combined with the Project S.A.F.E.
statistical model to produce more accurate beach water quality predictions than the Project
S.A.F.E. model alone.
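One way such stacking can work is to turn a hydrodynamic-model output into an explanatory variable for the statistical model. The sketch below projects a modeled current onto the shoreline and enters that alongshore component in a regression equation. It is not the Project S.A.F.E. or NOAA formulation; the direction conventions, shoreline bearing, coefficients, and function names are all hypothetical.

```python
import math


def alongshore_current(speed_m_s, toward_deg, shoreline_deg):
    """Project a modeled current onto the shoreline direction.

    `toward_deg` is the compass direction the current flows toward and
    `shoreline_deg` is the compass bearing along the shore chosen as positive.
    Positive results mean flow in the shoreline_deg direction.
    """
    return speed_m_s * math.cos(math.radians(toward_deg - shoreline_deg))


def stacked_log10_ecoli(log10_turbidity, alongshore_m_s, coef=(1.0, 0.7, 2.5)):
    """Hypothetical statistical equation that uses a hydrodynamic-model
    output (alongshore current) as one of its predictors."""
    b0, b_turb, b_current = coef
    return b0 + b_turb * log10_turbidity + b_current * alongshore_m_s


u = alongshore_current(0.20, 210.0, 240.0)  # hypothetical hydrodynamic model output
print(round(u, 3), round(stacked_log10_ecoli(1.2, u), 3))
```

A positive coefficient on the alongshore term would encode the idea that flow from the direction of a known outfall raises expected densities; the sign and size used here are illustrative only.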

Project S.A.F.E. has been under revision for several years and is expected to be put back in use
soon. Additional information on Project S.A.F.E. is at
www.glsc.usgs.gov/main.php?content=research_projectSAFE about&title=Project%20S.A.F.E.
0&menu=research initiatives_projectSAFE.

4.1.5 RainFlow (Port Washington, Wisconsin)
Before Nowcast was adopted at Upper Lake Park in Port Washington, Wisconsin, bacteria levels
were estimated daily using a model named RainFlow (City of Port Washington 2007). Port
Washington has no stormwater controls, and the nearest source of runoff affecting the beach is
Valley Creek. The model used the velocity and volume of water passing through Valley Creek in
combination with recent rainfall data and a turbidity reading at the beach to provide a
notification recommendation. Ozaukee County staff measured the Valley Creek parameters and
turbidity; daily rainfall was measured at the Port Washington Wastewater Treatment Plant
(WWTP) just south of the beach. Based on historical observations and predictions, overall model
accuracy was 90 percent. The model was validated with daily composite samples. In the 2008
season, the model correctly indicated whether an advisory was needed 94 percent of the time.
Upper Lake Park now uses Nowcast, as described earlier.

4.1.6 South Shore Beach Model (Milwaukee, Wisconsin)
In Milwaukee, the city attempted to use models for Bradford Beach and South Shore Beach. The
city abandoned the Bradford Beach model because the maximum sensitivity that could be
obtained for estimating the indicator concentration was only 80 percent. The city is considering
the South Shore Beach Model. That model would require weekly calibration and an E. coli
concentration determined within the previous 24 hours, but none of the nearby laboratories are
open 7 days a week. The city is also confronting other issues, such as equipment costs,
sensitivity, and available staff. However, EPA has successfully modeled South Shore Beach, and
details about that effort are in Volume II of this report.

4.1.7 Charles River Flag Program, Massachusetts
The Charles River Watershed Association has been using the Flag Program since 1998 to
estimate and communicate the potential risks associated with recreational activities in the river
each day. The association samples bacteria approximately twice each week and uses the results,
along with current rainfall data, in the model estimates. Model estimates are based on ordinary
least squares and logistic regression models (explained further in Chapter 5). The model
predictions are posted online for nine sites, and four sites along the river have a water quality
flag to communicate conditions to recreational users. Most of the river is designated for secondary
contact recreation (boating), and the fecal coliform bacteria target is based on that use. Red flags are raised
at the four monitored stations if the probability that the river exceeds boating standards is equal
to or greater than 50 percent. If the probability is less than 50 percent, a blue flag is raised.
Yellow flags are used when water quality is uncertain or when other factors could be affecting
boater safety (e.g., cyanobacteria). Red flags are normally raised only after heavy
rainfall.
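The flag rule above can be sketched in a few lines of Python. This is a minimal illustration, not the association's model: the logistic form follows the text's mention of logistic regression, but the single predictor (recent rainfall in inches) and the coefficients are hypothetical, and the assumption that a yellow flag takes precedence over the probability rule is ours.

```python
import math


def exceedance_probability(intercept: float, coefs: list[float], x: list[float]) -> float:
    """Logistic regression: P(exceed) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))


def flag_color(p_exceed: float, uncertain: bool = False) -> str:
    """Map an exceedance probability to a flag color.

    Assumption: a yellow flag (uncertain water quality or another hazard such as
    cyanobacteria) takes precedence over the probability rule.
    """
    if uncertain:
        return "yellow"
    # Red at a 50 percent or greater probability of exceeding the boating standard.
    return "red" if p_exceed >= 0.5 else "blue"


# Hypothetical coefficients: intercept -2.0, +1.5 per inch of recent rainfall.
B0, B_RAIN = -2.0, [1.5]
print(flag_color(exceedance_probability(B0, B_RAIN, [2.0])))  # -> red
print(flag_color(exceedance_probability(B0, B_RAIN, [0.0])))  # -> blue
```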

For more information about and updates to the program, see
www.crwa.org/water_quality/daily/daily.html.

4.1.8 PhillyRiverCast (Philadelphia, Pennsylvania)
The PhillyRiverCast is a Web-based water quality forecasting system developed by the
Philadelphia Water Department to provide the public with information on the status of the
Schuylkill River. The model uses real-time turbidity, flow, and rainfall data to predict fecal
coliform concentration. It runs automatically and updates the ratings every hour on the basis of
the estimated current fecal coliform concentration. The website also explains what recreational
activities are considered safe for the estimated bacteria levels (Maimone et al. 2007). For more
information and updates, see www.phillyrivercast.org/; for information about how the model was
created, see www.phillyrivercast.org/Navhowcreated.aspx.

4.1.9 BacteriALERT (Chattahoochee River, Georgia)
The Chattahoochee River BacteriALERT program works similarly to the PhillyRiverCast
system, using real-time turbidity and flow readings. Unlike the PhillyRiverCast system,
BacteriALERT does not use rainfall. Using current flow as an input captures the influence
of rain on bacteria levels, as well as most regular operations of the upstream Buford Dam. The model
also runs automatically and updates recreational users via a website every hour. The final
posting by BacteriALERT is a risk level for exposure to harmful bacteria and organisms (none,
low, or high), based on the estimated E. coli concentration. A low risk is posted for an estimated E. coli
concentration below 177 CFU/100 mL, and a high risk is posted if E. coli counts are estimated to
be more than 235 CFU/100 mL.
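The two cutoffs above can be expressed as a simple classification of the estimated E. coli concentration. This sketch uses only the values given in the text; because the text does not say how estimates between 177 and 235 CFU/100 mL are posted, the function returns None for that band rather than guessing, and the "none" category is not modeled.

```python
from typing import Optional

LOW_RISK_BELOW = 177.0   # CFU/100 mL: estimates below this are posted as low risk
HIGH_RISK_ABOVE = 235.0  # CFU/100 mL: estimates above this are posted as high risk


def risk_level(estimated_ecoli: float) -> Optional[str]:
    """Classify an estimated E. coli concentration (CFU/100 mL) into a posted risk level."""
    if estimated_ecoli < LOW_RISK_BELOW:
        return "low"
    if estimated_ecoli > HIGH_RISK_ABOVE:
        return "high"
    return None  # 177-235 CFU/100 mL: category not specified in this description


for estimate in (90.0, 200.0, 410.0):
    print(estimate, risk_level(estimate))
```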

The BacteriALERT program is a partnership between state and federal agencies and
nongovernmental organizations. It was first tested on the Chattahoochee River. Additional details
on the entire Chattahoochee River project are at
http://ga2.er.usgs.gov/bacteria/SummaryIntroduction.cfm.

4.1.10 USGS Model (Kansas)
The USGS used in-stream water quality monitoring results and regression equations to estimate
real-time bacteria and nutrient concentrations at two stations on the Little Arkansas River and
one station each on the Kansas River and Rattlesnake Creek. Stream gauges were installed at each
location. The stations monitor turbidity, water temperature, specific conductance, dissolved
oxygen (DO), pH, and total chlorophyll. The water quality data were used to develop a
relationship between fecal coliform and the water quality parameters that could be measured
in-stream. Each regression equation was specific to its stream. The purpose of the model was to
calculate accurate loads for various parameters, not to support public health advisories. Turbidity and
seasonality were significant variables for the bacteria estimates at all locations. No information
is available to confirm whether the model is currently applied. It is possible that the model is used at such
a local scale that it is not well publicized. Christensen et al. (2001) describe their experience with
the model.
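A minimal sketch of this kind of surrogate regression, fit by ordinary least squares, follows. The model form (log10 of fecal coliform regressed on log10 turbidity plus a sine/cosine seasonal cycle) is an assumption chosen to reflect the finding that turbidity and seasonality were significant; it is not the published Kansas equations, and the data are synthetic. The solver uses the normal equations in plain Python so the example has no dependencies; a statistics package (or numpy.linalg.lstsq) is more numerically robust for real data.

```python
import math


def design_row(turbidity_ntu: float, day_of_year: int) -> list[float]:
    """Predictors: intercept, log10(turbidity), and a sine/cosine annual cycle."""
    angle = 2.0 * math.pi * day_of_year / 365.25
    return [1.0, math.log10(turbidity_ntu), math.sin(angle), math.cos(angle)]


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows: list[list[float]], y: list[float]) -> list[float]:
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic example: generate log10(fecal coliform) from known coefficients, then refit.
true_b = [1.0, 0.9, 0.3, -0.2]
days = list(range(10, 360, 30))
turbidity = [5, 12, 30, 8, 50, 20, 3, 70, 15, 40, 9, 25]
rows = [design_row(t, d) for t, d in zip(turbidity, days)]
log_fc = [sum(b * x for b, x in zip(true_b, row)) for row in rows]
print([round(b, 3) for b in fit_ols(rows, log_fc)])
```

Because the synthetic data contain no noise, the fit should recover the generating coefficients; with real samples, the residual error would be reported alongside the coefficients.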

4.1.11 Stormwater Model (Horry County, South Carolina)
A model used in South Carolina was developed with assistance from the Public Health
Department of the University of South Carolina. The model combines two separate
methods, each of which estimates the bacteria concentration in the water each morning, mainly
using cumulative rainfall, rain intensity, the number of preceding dry days, current weather
conditions, and tide information based on moon phase. One model uses multivariable linear
regression to predict a bacteria concentration. The other uses the classification and regression
tree (CART) method to estimate the range (high, medium, or low) in which the bacteria
concentration will fall. The estimated concentration range and the estimated concentration are
combined to approximate a third bacteria concentration, called the Ensemble prediction. The
beach manager uses all three outputs to determine the necessary notification level.
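The text does not say how the regression estimate and the CART range are combined. As one hypothetical way to form an ensemble value, the sketch below keeps the regression estimate when it falls inside the CART-predicted range and otherwise moves it to the nearest edge of that range. The class bounds (35 and 104 enterococci CFU/100 mL) are illustrative assumptions, not values from the Horry County model.

```python
import math

# Hypothetical class bounds (CFU/100 mL); not taken from the South Carolina model.
CLASS_BOUNDS = {
    "low": (0.0, 35.0),
    "medium": (35.0, 104.0),
    "high": (104.0, math.inf),
}


def ensemble_estimate(regression_cfu: float, cart_class: str) -> float:
    """Hypothetical ensemble: clip the regression estimate into the CART-predicted range."""
    lower, upper = CLASS_BOUNDS[cart_class]
    return min(max(regression_cfu, lower), upper)


# The manager would then review all three values: regression, CART class, and ensemble.
for reg, cls in ((60.0, "medium"), (20.0, "high"), (500.0, "low")):
    print(reg, cls, ensemble_estimate(reg, cls))
```

Clipping is used here only because it is easy to follow; averaging the regression estimate with a representative value for the CART class would be an equally plausible hypothetical rule.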

The model series is applied to 10 beaches in Horry and Georgetown counties. The model automatically
retrieves rainfall data from rain gauges at each beach and independently inputs weather and tidal
information. Data are continuously added to the model, which is constantly recalibrated. A more
intensive recalibration is underway to adjust to infrastructure changes. Recently, a stormwater
outfall pipe was extended farther into the ocean at one of the beaches, which is expected to
significantly affect the model's calibration. The beaches have a standard, permanent warning of
health risks while swimming, meaning there is always some level of notification on the beaches.

4.1.12 Stormwater Model (Fairhaven Beach, New York)
New York also attempted modeling at Fairhaven Beach on Lake Ontario, motivated by
exceedances in 2005 and 2006. The managers were considering statistical predictive models that
would use rainfall, turbidity, and current speed and direction as independent variables. Acoustic
Doppler Current Profiler equipment would be used to measure the currents. The following
2 years had fewer bacteria exceedances; therefore, there was less demand for the
effort.

4.1.13 Stormwater Model (Sandy Point State Park, Maryland)
In the spring of 2006, Sandy Point State Park in Maryland received several days of heavy
rainfall, which closed the beach for the first time in recent memory. The managers of the
park, which is on the Chesapeake Bay shoreline, are now developing a model to regularly estimate
the concentration of bacteria in the beach water. The model developers collaborated with their
South Carolina counterparts. Similar to the South Carolina models, multivariable linear regression,
CART, and ensemble methods will be used to develop a trio of results.

Intensive sampling was performed in 2007 and 2008 but, unfortunately, the sampling protocol
was recently changed, and sampling will need to be repeated using the new procedure. The
earlier sampling and model development demonstrated strong effects on bacteria from
temperature and salinity.

The model is expected to be complete in the next 2 to 3 years. The managers carefully chose the
software (GIS and R) for the model so that the model could be used by other agencies for
bacteria prediction. The managers intend for the model to be applied at other Chesapeake Bay
beaches in the future.

4.1.14 Virtual Beach Manager Toolset (Various Locations)
Virtual Beach is a set of decision support software tools developed to help local beach managers
decide when beaches should be closed because of predicted high levels of FIB.
EPA's laboratory in Athens, Georgia, is developing the tools in support of the BEACH Act. One
primary component of Virtual Beach is a data exploration and model-building tool that facilitates
developing a multivariable linear regression equation for predicting FIB densities at a beach on
the basis of environmental data such as wave height and water temperature. Another function of
Virtual Beach is to take the best-fit multivariable linear regression model for a data set and
automatically pull in (from Internet sources) distributed data for the significant independent
variables of the model. The model user can also enter decision criteria that
maximize correct predictions. Virtual Beach should benefit the beach-going public by helping
beach managers make more accurate and timely beach notification decisions. Virtual Beach's
testing and deployment were done in collaboration with several Great Lakes states and
organizations (Wolfe et al. 2008). Virtual Beach is thoroughly documented in Volume II,
Chapter 1 of this report.
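A decision criterion of this kind can be chosen by scanning candidate thresholds on the model's predictions and keeping the one that yields the most correct advisory decisions on historical data. The sketch below is a hypothetical illustration of that idea, not Virtual Beach's own code; the data are invented, and 235 CFU/100 mL is used as the water quality standard.

```python
import math


def count_correct(predicted: list[float], observed: list[float],
                  decision_threshold: float, standard: float) -> int:
    """Count days on which 'advise if prediction > threshold' matched 'observed > standard'."""
    return sum((p > decision_threshold) == (o > standard)
               for p, o in zip(predicted, observed))


def best_decision_threshold(predicted: list[float], observed: list[float],
                            standard: float) -> float:
    """Return the candidate threshold with the most correct decisions (lowest wins ties)."""
    candidates = [-math.inf] + sorted(set(predicted))  # -inf means "always advise"
    return max(candidates,
               key=lambda t: count_correct(predicted, observed, t, standard))


predicted = [100.0, 150.0, 200.0, 250.0, 300.0, 400.0]  # model estimates, CFU/100 mL
observed = [90.0, 180.0, 260.0, 240.0, 310.0, 500.0]    # later sample results
threshold = best_decision_threshold(predicted, observed, standard=235.0)
print(threshold, count_correct(predicted, observed, threshold, 235.0))
```

In this invented example, the best decision threshold (150) sits below the 235 standard, which illustrates why a decision criterion can be worth setting separately from the standard itself.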

4.1.15 Receiver Operating Characteristic Curve Modeling in Boston Harbor Beaches (Boston, Massachusetts)
In 1996, the Massachusetts Department of Conservation and Recreation and the Massachusetts Water
Resources Authority began a study to intensively monitor a subset of beaches to better
understand variability in water quality and to develop predictive tools for making timely beach
decisions. Beach selection was based in part on the number and variety of urban pollution
sources, including storm drains, CSOs, illicit sewer connections, boats, and animals (e.g., birds,
dogs). Four urban beaches were selected: Constitution Beach in East Boston, Carson Beach in
South Boston, Tenean Beach in Dorchester, and Wollaston Beach in Quincy.

The Department of Conservation and Recreation and the Massachusetts Water Resources Authority
teamed up with a Harvard School of Public Health biostatistician to create multiple linear
regression models. The models incorporate tide, rainfall, sunlight, temperature, days since last
rain, and wind strength and direction. However, the models could not explain 30 to 40
percent of the variability in bacteria counts.

Subsequently, they used Receiver Operating Characteristic (ROC) curves as a prediction tool.
ROC curves were developed in the 1940s to interpret radar signals during World War II.
Beginning in the 1970s, they were recognized as useful for interpreting
medical test results. ROC curves have the advantage of being simple to use: they compare the
effectiveness of several swimming advisory triggers, including the previous day's Enterococcus
levels and rainfall. However, the analysis requires daily monitoring for a prolonged period.

ROC curve analysis distinguishes between two water quality conditions: swimmable and not
swimmable. It evaluates the overall ability of an indicator variable to correctly
classify beach water quality as suitable or unsuitable for swimming, and it allows direct
comparison of different indicator variables by a common metric. It also helps identify
the maximum threshold value of the indicator variable that produces a desired true positive rate
and false positive rate.
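The ROC calculation can be sketched in plain Python. This is a generic illustration on hypothetical data, not the Boston Harbor analysis: the indicator here is antecedent rainfall in inches, a day is classified as not swimmable when its rainfall is at or above the threshold, and the common comparison metric is taken to be the area under the curve (AUC).

```python
import math


def roc_points(scores: list[float], unsafe: list[bool]) -> list[tuple[float, float, float]]:
    """Return (false positive rate, true positive rate, threshold) points, where a day is
    classified as not swimmable when its indicator score is >= the threshold."""
    pos = sum(unsafe)
    neg = len(unsafe) - pos
    points = [(0.0, 0.0, math.inf)]  # threshold above every score: nothing flagged
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, u in zip(scores, unsafe) if s >= t and u)
        fp = sum(1 for s, u in zip(scores, unsafe) if s >= t and not u)
        points.append((fp / neg, tp / pos, t))
    return points


def area_under_curve(points: list[tuple[float, float, float]]) -> float:
    """Trapezoidal area under the ROC curve (1.0 = perfect, 0.5 = no better than chance)."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0, _), (x1, y1, _) in zip(points, points[1:]))


def max_threshold_for_tpr(points: list[tuple[float, float, float]], target_tpr: float) -> float:
    """Highest threshold that still reaches the target true positive rate."""
    return max(t for _, tpr, t in points if tpr >= target_tpr)


# Hypothetical data: antecedent rainfall (in) and whether the beach was later unsafe.
rain = [0.0, 0.1, 0.2, 0.5, 0.8, 1.2]
unsafe = [False, False, True, False, True, True]
pts = roc_points(rain, unsafe)
print(round(area_under_curve(pts), 3), max_threshold_for_tpr(pts, 1.0))  # 0.889 0.2
```

Choosing the highest threshold that meets the target true positive rate keeps the false positive rate as low as possible for that target, which matches the "maximum threshold value" described above.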

In the Boston Harbor analysis, the researchers concluded that the previous day's Enterococcus
levels are an inadequate indicator for making daily beach-use decisions at every beach; antecedent
rainfall is usually a more accurate indicator and is available in real time (Morrison et al. 2003).

4.2 OTHER PREDICTIVE TOOLS

4.2.1 Rain Threshold Levels
For many beaches, rainfall intensity correlates with observations of poor water quality. Agencies
use historical data to identify the relationship between rainfall amount and bacteria levels and
then apply a rainfall threshold above which an advisory is issued for the beach. How rainfall
thresholds are determined differs among states and localities.

Stormwater runoff is a primary pathway by which bacteria loads reach waterbodies and beaches.
The amount of stormwater generated depends on the characteristics of the drainage area and the
amount and intensity of rainfall. When significant rainfall occurs in a short period, more runoff is
produced, and that runoff can carry harmful pollutants with it.

Many beach managers can directly relate the concentration of bacteria to the amount of rain
received in nearby areas. Using historical data, localities can observe a relationship
between rainfall and resulting bacteria densities. Managers can then use that relationship to
identify the rainfall intensity (or threshold) that is likely to cause exceedances of water quality
standards and the length of time the standards will remain exceeded.
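One simple way to turn historical data into a threshold is to compute, for each candidate rainfall amount, how often the standard was exceeded on days with at least that much rain, and then pick the smallest candidate that meets a target exceedance rate. The sketch below assumes daily rainfall totals paired with sample results; the data, the candidate list, and the target rate are hypothetical policy choices, not values from any program described here.

```python
from typing import Optional


def exceedance_rate_at_or_above(rain_cm: list[float], exceeded: list[bool],
                                threshold_cm: float) -> float:
    """Fraction of historical days with rain >= threshold on which the standard was exceeded."""
    flagged = [e for r, e in zip(rain_cm, exceeded) if r >= threshold_cm]
    return sum(flagged) / len(flagged) if flagged else 0.0


def lowest_threshold(rain_cm: list[float], exceeded: list[bool],
                     target_rate: float, candidates: list[float]) -> Optional[float]:
    """Smallest candidate threshold whose historical exceedance rate meets the target."""
    for t in sorted(candidates):
        if exceedance_rate_at_or_above(rain_cm, exceeded, t) >= target_rate:
            return t
    return None  # no candidate is predictive enough


# Hypothetical history: daily rainfall (cm) and whether the standard was exceeded.
rain_cm = [0.0, 0.2, 0.4, 0.6, 1.0, 1.5, 2.6, 3.0]
exceeded = [False, False, False, True, False, True, True, True]
print(lowest_threshold(rain_cm, exceeded, 0.75, [0.5, 1.0, 2.5]))  # 0.5
print(lowest_threshold(rain_cm, exceeded, 0.9, [0.5, 1.0, 2.5]))   # 2.5
```

A higher target rate produces a higher, more conservative threshold (fewer unnecessary advisories, but more missed exceedances), which is the trade-off each jurisdiction resolves differently.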

Table 4-3 lists the places that use rainfall measures to directly predict the need for recreational
water notifications. Each jurisdiction or local beach can apply its rainfall
threshold differently. The sections below discuss how managers are using rain thresholds to
trigger advisories in anticipation of poor water quality.
Table 4-3. Beaches assessed using rainfall thresholds

Location | Rainfall threshold | Notification period | Notification | Area covered by threshold
California | 0.25 centimeters (cm) | 3 days | General advisory | Los Angeles County beaches
California | 0.5 cm | 3 days | General advisory | Orange County beaches
California | 0.5 cm in 24 hours | 3 days | General advisory | San Diego County beaches
Delaware | 7.5 cm in 24 hours | Until clean sample | No-swimming advisory | All beaches
Hawaii | Flash flood warning | 3 days | Brown water warning | Whole island or state (a)
New Jersey | 2.5 cm in 24 hours; 7.1 cm in 24 hours | 24 hours; 48 hours | Closed | Six beaches
New York City, New York | 0.5 cm in 6 hours; 1.0 cm in 24 hours (beach dependent) | 24-48 hours (beach dependent) | Yellow advisory level | All city beaches
Milwaukee, Wisconsin | 2.5 cm in 24 hours | 48 hours | Yellow or red advisory level | All (five beaches)
Scotland | Beach specific | Beach specific | No-swimming advisory | Individual beach

(a) Island-wide or state-wide, depending on weather patterns

California
Several counties in California preemptively issue recreational notifications at their beaches using
a rainfall threshold. Little rain falls through most of the year in Southern California;
consequently, even a light storm can produce runoff with high concentrations of pollutants.
The rainfall thresholds are set much lower than in the rest of the country, normally in the range of
tenths of an inch. Most counties post beach advisories for 72 hours after the rain threshold has
been met.

Connecticut
Using compiled bacterial analyses to predict water quality when certain conditions are observed
provides a way to establish a proactive public health policy. Kuntz and Murray (2009) evaluated
geometric mean bacteria levels under various conditions as possible predictors of water quality
exceedances, including the amount of rain in previous days, wind direction and speed, tides and
high tide height, water temperature, drought or flood conditions for the season, different materials
coming into the swimming areas, and the location and amount of any sewage spills. Only three
conditions showed statistical significance (chi-squared p < 0.0001):

    •   Rain events of one inch or more in a 24-hour period under normal weather conditions
    •   Rain events of more than one-half inch in a 24-hour period under drought conditions
    •   Floatable material from distant sewage spills (i.e., grease balls) present at a
       beach

Such evaluations make it easy to develop a public health policy that restricts swimming
when certain identified conditions are present, without waiting for sampling results to prove that a
problem exists.
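The significance test reported for these conditions can be sketched for a single condition as a 2 x 2 contingency table of sampling days. The counts below are hypothetical, not Kuntz and Murray's data, and the statistic omits the Yates continuity correction; SciPy's scipy.stats.chi2_contingency applies that correction to 2 x 2 tables by default, so its results would differ slightly.

```python
import math


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared test (1 degree of freedom, no continuity correction).

    Table layout:            exceedance   no exceedance
        condition present        a              b
        condition absent         c              d
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts of sampling days for "1 in or more of rain in the previous 24 hours".
stat, p = chi_squared_2x2(a=18, b=4, c=6, d=72)
print(round(stat, 1), p < 0.0001)  # 51.7 True
```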
Delaware
Delaware developed its rainfall threshold levels in 1993 through intensive local sampling at
several locations and still uses them. Marine waters and coastal beaches are not affected by
rainfall or stormwater as much as the inland lakes and bays are; therefore, the rain threshold is
relatively high compared to those of other states. The threshold level for Delaware marine waters and
coastal beaches is 3 inches (in), and a notification lasts for 24 hours. The inland waters of
Delaware have much lower rain thresholds; for those waters, the closure length and rain
threshold are determined individually for each lake.

Hawaii
Unlike most states using a rainfall threshold, Hawaii does not have a specified rainfall intensity
for which a notification is automatically issued. A Brown Water Warning is automatically posted
if a flash flood warning is issued by the National Weather Service (NWS) and is posted for all
beaches on the island for which the flash flood warning applies. At other times of heavy rain (or
steady rain over several days), a Brown Water Warning might be posted depending on staff
assessments of the weather and visual observations. For most Brown Water Warnings, a beach
advisory is issued for 3 days after the posting. However, notifications can last from a few days to
more than a week, depending on whether the rain continues, the amount of silt discharged, and
the water currents.

New Jersey
New Jersey has six beaches that close automatically if a rain threshold is reached. Four of the
beaches are affected by a pond outfall pipe, and the other two are small river beaches
affected by stormwater runoff and a marina. Five of the six beaches close for 24 hours if
2.5 cm (1 in) of rain falls within a 24-hour period, and for 48 hours if 7.1 cm (2.8 in) falls within
a 24-hour period. The thresholds of the four beaches affected by the pond outfall pipe are being
reevaluated because the pipe has been extended 300 feet into the ocean. The extension
could lessen the effect of the stormwater discharge and, consequently, raise the rain threshold.
The two beaches affected by the small river input are not affected by the extension of the
outfall pipe.

Most beaches in New Jersey do not have a rainfall threshold level for notifications and rely on
traditional sampling to determine levels of potential risk.
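The two-tier New Jersey rule is a lookup from 24-hour rainfall to closure length. In this sketch, the tier values come from the text; treating "reaching" a threshold as rainfall greater than or equal to it is an assumption.

```python
# (minimum 24-hour rainfall in cm, closure length in hours), from the New Jersey text.
NJ_TIERS = [(2.5, 24), (7.1, 48)]


def closure_hours(rain_24h_cm: float, tiers: list[tuple[float, int]] = NJ_TIERS) -> int:
    """Return the closure length for the highest tier reached, or 0 if no tier is reached."""
    for min_cm, hours in sorted(tiers, reverse=True):  # check the largest threshold first
        if rain_24h_cm >= min_cm:
            return hours
    return 0


for rain in (1.0, 2.5, 5.0, 8.0):
    print(rain, closure_hours(rain))
```

Keeping the tiers as data rather than hard-coded conditions makes it straightforward to update them if the reevaluation after the outfall extension raises the thresholds.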

New York City, New York
In New York City's complex coastal hydrologic setting, which is affected by tides, river flow,
stormwater, and sewage treatment outfalls, existing regional models have helped determine the
times and locations at which recreational water quality standards are likely to be exceeded because of
rainfall and associated stormwater and CSO bypasses. The New York City Department of Health
and Mental Hygiene uses predetermined rainfall limits and notification durations that are set
each year for each of its beaches. The department develops the rainfall limits from a multiyear
analysis of data from the New York/New Jersey Harbor Pathogens Model, combined with
sampling data. The rainfall limits and notification durations were tested and validated in the
development phase, and the department reevaluates and updates them each year as needed. The
Regional Bypass Model provides the department information about the effects of CSOs, sewage
pipe breaks or diversions, and the resulting closure time needed. The predetermined rainfall
limits are considered to be conservative, and sampling is conducted weekly during the recreation
season. The New York/New Jersey Harbor Pathogens Model is also used to develop total
maximum daily loads in the New York/New Jersey Harbor.

The 2009 rainfall limits and notification durations are shown in Table 4-4.

Table 4-4. New York City wet-weather advisory information for 2009
Beach
South Beach, Midland Beach, Manhattan Beach,
Kingsborough CC
Orchard Beach
Coney Island
Gerritsen Beach, Whitestone Booster
All Bronx Private Beaches
Douglaston Manor
Rainfall limit
(inches)
1.5-2
>2.5
>2.5
>2.5
0.3-0.6
>0.6
0.6-2.5
>2.5
0.3-0.6
0.6-2.5
>2.5
Duration of notification
(hours)
12
24
24
12
18
40
36
48
30
60
72
Milwaukee, Wisconsin
Five public city beaches in Milwaukee are on Lake Michigan, and each beach varies greatly in
terms of water quality conditions and characteristics. The water quality near the northern beaches
fluctuates daily, whereas the water quality at the southern beaches varies less frequently. The
variations are influenced by a combination of nonpoint source pollution, stormwater outfalls,
CSOs, and the hydrodynamic characteristics of the lake near the beaches. A statistical model of
bacterial densities was attempted at the southern beaches in Milwaukee, but it is no longer being
used because of poor sensitivity and inadequate funding.

After frequent observations of stormwater outfalls, a rainfall threshold level was established for
all beaches. All beaches operate on a threshold of one inch (2.5 cm) of rainfall within 24 hours
(based on NWS data, 7 a.m.-7 a.m. accumulation), which results in a 2-day notification.
Milwaukee uses standard signs designed by the Wisconsin Department of Natural Resources to
post notifications at its beaches, which are assessed daily. If a sewage diversion occurs, the
beaches are closed for 4 days.
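The Milwaukee rule can be sketched as a daily check at 7 a.m. This sketch assumes hourly rainfall records whose timestamps mark the end of each hour. Because the text does not say when the 4-day sewage-diversion closure starts, the sketch counts both periods from the 7 a.m. assessment, which is an assumption. The data are hypothetical.

```python
from datetime import datetime, timedelta

THRESHOLD_CM = 2.5                     # one inch of rain within 24 hours
RAIN_NOTIFICATION = timedelta(days=2)  # 2-day notification after the threshold is met
DIVERSION_CLOSURE = timedelta(days=4)  # closure after a sewage diversion


def rain_accumulation(hourly: list[tuple[datetime, float]], window_end: datetime) -> float:
    """Sum hourly rainfall (cm) stamped in the 24 hours ending at window_end (e.g., 7 a.m.)."""
    start = window_end - timedelta(hours=24)
    return sum(cm for stamp, cm in hourly if start < stamp <= window_end)


def beach_actions(hourly: list[tuple[datetime, float]], window_end: datetime,
                  sewage_diversion: bool = False) -> list[tuple[str, datetime]]:
    """Return (action, end time) pairs triggered at the daily 7 a.m. assessment."""
    actions = []
    if rain_accumulation(hourly, window_end) >= THRESHOLD_CM:
        actions.append(("rain notification", window_end + RAIN_NOTIFICATION))
    if sewage_diversion:
        actions.append(("closure", window_end + DIVERSION_CLOSURE))
    return actions


seven_am = datetime(2009, 7, 15, 7)
wet_day = [(datetime(2009, 7, 14, 8) + timedelta(hours=h), 0.2) for h in range(24)]
print(beach_actions(wet_day, seven_am))
```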

Scotland
The Scottish Environment Protection Agency has developed and runs a real-time water quality
prediction tool for 10 beaches throughout Scotland. The tool has been in use since 2004. It
uses a set of site-specific criteria for rainfall and river flow to predict water quality on the basis
of historical data. Real-time predictions against the current European Union Bathing Water
Directive were correct or precautionary on 99 percent of days, and they were correct for 82 percent of
compliance samples during the 2007 season and 81 percent of compliance samples during the 2008
season (McPhail and Stidson 2009). The revised 2006 directive sets out more stringent bathing
water quality standards that require increased model performance. New predictive tools are being
developed using decision tree statistical software and are intended to replace the existing
prediction tool by 2012. The agency plans to extend the system to additional beaches (about 15
more) by 2012. For more information, visit www.sepa.org.uk/water/bathing_waters.aspx.

4.2.2 Notification Protocol
This section discusses the primary types of notification protocols that are used to make beach
notification decisions.

Maine
Maine uses sampling and a risk-based assessment matrix (Maine State Planning Office 2004) to
determine beach conditions and the likelihood that swimmers will be infected. The Maine Healthy
Beach Program is in the early stages of training beach managers and community members to
assess and monitor beaches. As the program progresses, beach managers and community
members will develop rain thresholds to apply to their beaches. Notifications are determined
using the assessment matrix, which is tailored to the needs of each beach. The matrix is similar to
a sanitary survey: the assessor looks for certain beach characteristics and pollution sources
and adds or subtracts points according to conditions. The total score places the beach in a
category that determines the appropriate action.
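The points-based logic of such a matrix can be sketched as follows. Everything specific here is hypothetical: the scoring items, point values, score cutoffs, and action names are invented for illustration, because the actual Maine matrix is tailored to each beach and is not reproduced in this report.

```python
import math

# Hypothetical scoring items and point values; not Maine's actual matrix.
POINTS = {
    "rain_above_local_threshold": 3,
    "stormwater_outfall_flowing": 2,
    "recent_sample_exceedance": 3,
    "visible_animal_waste": 1,
    "good_flushing": -1,  # conditions can also remove points
}

# Hypothetical categories, checked from the highest minimum score down.
CATEGORIES = [(6, "post advisory"), (3, "resample and inspect"), (-math.inf, "no action")]


def assess(observed: set[str]) -> tuple[int, str]:
    """Total the points for the observed conditions and return (score, action)."""
    score = sum(POINTS[item] for item in observed)
    for min_score, action in CATEGORIES:
        if score >= min_score:
            return score, action
    raise ValueError("unreachable: the last category accepts any score")


print(assess({"rain_above_local_threshold", "stormwater_outfall_flowing"}))
```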

Rhode Island
Rhode Island has implemented notification protocols at two of its highest-priority beaches. The
state used more than 10 years of water quality data as a foundation for protocol development.
The data focus on significant rain events (0.5 in), bacteria monitoring data (criteria exceedances),
monitoring frequency, flushing rates, and total closure days. Rhode Island will use the protocols
to streamline the beach advisory and notification process.

Washington
Washington uses a flowchart (see Figure 4-1) in addition to regular sampling to assess the
conditions of its beaches. Each county develops a process specific to the beaches in its area.
Factors influencing the managers' closure decisions include the sampling method, site history,
visual inspection, consultation with the beach coordinator, and beach characteristics such as
activity/usage frequency (Tiers 1 through 3). The Washington BEACH program also receives
information from the state shellfish program, when its model results for shellfish beds indicate
that nearby beach areas might be affected.
[Figure: Washington BEACH Program, Recommended Decision Process for Notification. The chart opens with this statement: "Management decisions for public health and safety at recreational beaches should be based on specific water contact activities, usage, shoreline or sanitary surveys including site history and identification of possible impacts from pollution sources. Water quality monitoring for fecal contamination can be an additional tool used to investigate possible fecal contamination. It is important to communicate risk to the public reflective of actual beach conditions." Paths start from "Beach is regularly sampled" and "Beach is not sampled." Decision boxes ask whether the average of samples (or resamples) is below 104 enterococci/100 mL or below 276 enterococci/100 mL; whether there is any reason to doubt the accuracy of the samples (one result high and the remainder near background levels, or extreme low tide); and whether a sewage event (such as a CSO, pipe rupture, or mechanical failure) occurs or 3 illnesses are reported at one beach. Action boxes include "Resample and visually inspect beach for possible pollution," "Caution," "Post signs, complete press release," and "Remove signs."
Posted for monitoring results: Post a permanent advisory when monitoring results indicate a chronic problem or the seasonal geomean is over 35 MPN/100 mL. Remove signs once the pollution source is addressed and results indicate background bacteria levels.
Posted for sewage event: Remove signs after the problem is fixed and sufficient time passes to allow for dilution or flushing. Consider a permanent advisory if a beach has a chronic pollution source.
Posted for illnesses: Examine beach facilities and swimming area for possible reasons for illness. Refer to the sanitary/shoreline survey. Remove signs when the source is identified and addressed. It is important not to open the beach until all sanitary concerns are addressed.]
Source: Washington Department of Ecology

Figure 4-1. Flow chart used in Washington as part of its notification protocol.
4.3 DETERMINISTIC AND COMBINATION MODELS

Regional Bypass Model (New York and New Jersey)
Information from the New York/New Jersey Regional Bypass Model is used in combination with
monitoring data and historical data to set rain threshold levels and to make beach advisory
decisions at many New York City beaches. Each year, the net result of tidal and current effects,
combined with stormwater outfall data after a rain event, is considered in determining a rain
threshold level for each beach. The model can also be used to determine the effects of a sewage
pipe break or diversion. The model is considered conservative, and sampling is conducted
weekly during the recreation season to verify that each rain threshold level is protective.

120-Hour Forecasting Model (Michigan)
The beach water quality forecasting model (a decision support system) is being developed to test
the ability to predict beach water quality 120 hours into the future using available parameters
from NOAA's deterministic forecasting models and forecast data sets (Great Lakes
Environmental Research Laboratory [GLERL]/Great Lakes Coastal Forecasting System and the
NOAA/NWS National Digital Forecast Database). The model developed for an individual beach
using deterministic parameters will be run in an operational forecasting setting by the
NOAA/NWS Weather Forecast Office where the beach is located, with forecast information
provided to the county health department responsible for managing the beach. All parameters
used by the model are forecast by NWS and NOAA-GLERL out to 120 hours in the future. Examples
include surface and bottom current speed and direction near the beach, wave height, sunlight,
wind gustiness, dew point, cloud cover, and rainfall. Data needed from the beach manager are the
E. coli measurements and time of sampling.

NOAA's GLERL, in Ann Arbor, Michigan, has been developing deterministic forecasting
methods that incorporate process modeling (river and lake dynamics) for Grand River, Michigan,
and Burns Ditch, Indiana (http://www.glerl.noaa.gov/res/glcfs/gh/ and
http://www.glerl.noaa.gov/res/glcfs/bd/) and forecast input variables (rainfall, wind velocity and
direction, and wave height). GLERL is continuing to develop the river and lake model in the
Clinton River watershed in Michigan and Lake St. Clair. The lake model will include
unstructured variable grid patterns to allow for effective modeling of a complex shoreline.
GLERL will advance basic science by modeling the deposition and resuspension of suspended
solids in the near-shore zone using resuspension potential based on shear stress and the Grant-
Madsen boundary layer model. Resuspension potential can be substituted for turbidity
measurements, a key parameter in many Nowcast models.

5  Developing a Beach Notification Statistical Model

Section 2.1 introduces statistical modeling for beach notification decisions. This chapter provides
more details on the elements required for developing a statistical model, on the basis of a review
of available literature. Volume II of this report gives a more detailed discussion of data
sources, techniques for refining models, advanced statistical methodologies, and specific
applications of Virtual Beach software.


5.1 CONSIDERATIONS FOR DEVELOPING A STATISTICAL MODEL
To develop a statistical model, the beach manager needs an existing monitoring program, a basic
knowledge of statistics, and statistical software (Francy et al. 2006). Equipment costs for data
collection and initial model development are typically not much more than are required for a
beach monitoring program, and much of the data required for statistical models are available
from other agencies or are easily measured by field staff.  Once a model proves to be useful, a
beach manager can invest in more expensive equipment to measure environmental conditions in
real time.

A statistical software package (such as SAS), Virtual Beach, or Excel can construct a
multivariable linear regression equation using a data set containing the necessary data. The result
of a multivariable linear regression statistical analysis will be an equation of this generic form:

$$Y = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + \varepsilon$$

Where $Y$, the dependent variable, is an FIB measurement; $X_1$, $X_2$, and $X_3$ are independent
variables such as turbidity or rainfall amount; $b_1$, $b_2$, and $b_3$ are regression coefficients;
$b_0$ is the intercept of the model; and $\varepsilon$ is random variation or unexplained error. The
errors $\varepsilon$ are assumed to be independently and identically distributed according to some
distribution (frequently assumed to be normal) with a mean of zero and a standard deviation of
$\sigma_\varepsilon$. If $Y$ and the $X$s do not have linear relationships, $Y$, the $X$s, or both can be
transformed using the common (base-10) logarithm, the natural logarithm, the square root, the
square, or other transformations so that $Y$ and each $X$ have an approximately linear relationship.
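The regression described above can be fit by ordinary least squares in any of the packages mentioned. As a minimal sketch in Python, assuming NumPy is available, the following fits the generic three-variable equation. The function names and the synthetic data (log10 turbidity, 48-hour rainfall, and wave height as predictors of log10 E. coli) are illustrative assumptions, not values from any beach.

```python
import numpy as np


def fit_mlr(X, y):
    """Fit Y = b0 + b1*X1 + ... + bk*Xk + e by ordinary least squares.

    X is an (n, k) array of independent variables and y an (n,) array of
    responses (e.g., log10 FIB densities). Returns [b0, b1, ..., bk].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that the first coefficient is the intercept b0.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, X):
    """Predicted response b0 + b1*X1 + ... + bk*Xk for each row of X."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]


# Synthetic illustration (not monitoring data): log10 E. coli driven by
# log10 turbidity, 48-hour rainfall, and wave height, plus random error.
rng = np.random.default_rng(seed=0)
n = 200
turbidity = rng.uniform(1.0, 50.0, n)   # NTU
rain_48h = rng.uniform(0.0, 2.0, n)     # inches
wave_ht = rng.uniform(0.0, 1.5, n)      # meters
X = np.column_stack([np.log10(turbidity), rain_48h, wave_ht])
y = 0.8 + 1.1 * X[:, 0] + 0.5 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(0.0, 0.05, n)

coef = fit_mlr(X, y)   # close to the generating values [0.8, 1.1, 0.5, 0.3]
```

Because the synthetic data were generated from known coefficients, the fit can be checked against them; with real monitoring data the coefficients are unknown, and the model has to be judged against independent observations.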

Often, models are refined and improved as new data are gathered (Frick et al. 2008). In Ohio, for
example, splitting the swimming season into two separate periods (early summer and late summer)
has been found to produce the best predictive models (Francy and Darner 2007). The approach of
developing separate sub-models for subsets of the data has also been applied at Huntington
State Beach in California, where pathogen dynamics during the wet and dry seasons can be
driven by different environmental factors (Boehm et al. 2007). When the model changes over
time (i.e., the independent variables that are important for predicting water quality at a site change
over time), statistical techniques such as weighted regression (giving the most recent observations
the greatest weight) and machine learning might be needed. The actual causal factors behind such
temporal changes could include watershed development, seasonal differences, climate change,
changes in large-scale hydrologic patterns, and so forth. The developers of Virtual Beach
software have achieved favorable results at certain beaches by using a rolling data set, that is, by
developing predictive models using only the most recent 90 days of data, as described in Volume
II of this report.
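The two ideas above, weighting recent observations more heavily and refitting on a rolling 90-day window, can be sketched as follows. This is an illustration under stated assumptions, not the Virtual Beach implementation: the exponential-decay weighting and its 30-day half-life are hypothetical choices, and sampling dates are assumed to be stored as day numbers.

```python
import numpy as np


def recency_weights(days, half_life_days=30.0):
    """Exponential-decay weights: an observation half_life_days older than the
    newest observation gets half of its weight (half-life is hypothetical)."""
    days = np.asarray(days, dtype=float)
    age = days.max() - days
    return 0.5 ** (age / half_life_days)


def fit_weighted_mlr(X, y, w):
    """Weighted least squares: minimizes sum(w_i * residual_i**2).

    Scaling each row of the design matrix and the response by sqrt(w_i)
    turns the weighted problem into an ordinary least-squares problem.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    sw = np.sqrt(np.asarray(w, dtype=float))
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return coef


def rolling_window_mask(days, window_days=90):
    """True for observations within the most recent window_days of the record."""
    days = np.asarray(days, dtype=float)
    return days > days.max() - window_days


# Usage sketch (days, X, y are the analyst's own arrays):
# mask = rolling_window_mask(days)                        # 90-day rolling data set
# coef_recent = fit_weighted_mlr(X[mask], y[mask], np.ones(mask.sum()))
# coef_weighted = fit_weighted_mlr(X, y, recency_weights(days))
```

The rolling window discards older data entirely, while the weighted fit keeps all data but lets older observations fade; which works better is an empirical question for each beach.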

5.2 SELECTING VARIABLES FOR A STATISTICAL MODEL
Many independent variables might be included in a statistical model. These include physical
hydrologic measurements (such as turbidity and air/water temperature), chemical parameters
(such as DO, pH, and specific conductivity), biological parameters (chlorophyll a), meteorological
and hydrologic variables (rainfall, solar irradiance, stream discharge), beach characteristics
(number of birds and bathers), and pollution inputs (stormwater, sewage). The following sections
describe potentially useful independent variables and explain why they might be included in a
statistical model for a beach program.

5.2.1 Physical
Turbidity can be increased by stormwater input or stream inflow, wind speed and direction, wave
activity, swimmer activity, and other factors. Some of these factors might be associated with
input of pollution (input of stormwater or stream flow), resuspension of bottom sediments (which
might or might not be associated with higher indicator counts), or both (for example, swimmer
activity). No matter the cause of increased turbidity, if a correlation to FIB levels exists, it is
usually a positive correlation.

Water temperature can also be important in assessing the persistence of FIB in the environment,
because some FIB do not tolerate extremely high or low temperatures. Unusual water temperature
stratification or large changes in water temperature can indicate important water
inputs that could carry high FIB loads to the beach (e.g., stormwater or stream inflow).

Sunlight intensity (solar irradiance) can also be an important independent variable because
some FIB are sensitive to sunlight and might not survive high levels of exposure.

5.2.2 Chemical
Conductivity is highly correlated with the concentration of dissolved solids in the water column.
In a freshwater environment, elevated conductivity could be associated with runoff or effluent
from a POTW. In a marine environment, changes in conductivity might be associated with input
from a freshwater tributary, POTW effluent, or tidal stage. DO is measured at some beaches and
can be associated with a variety of pollution sources. Also, increases in dissolved organic matter
and UV absorption coefficients can provide an indication of FIB contamination from tributaries
near a beach. pH has also sometimes proven to be a useful predictor of FIB levels. Fluctuations
in those or other chemical parameters can indicate surface water inputs to a beach and a potential
source of FIB.

5.2.3 Meteorological and Hydrologic
Rainfall data have historically been important in developing predictive tools for beach
management. For some beaches, characteristics of antecedent rainfall are the primary inputs
needed to predict beach water quality. In addition to rainfall volume, the intensity of rain and the
number of antecedent dry or wet days are also useful. The rain data can come from specific
locations in a watershed affecting a beach, from the beach itself, or from some other monitored rain
gauge for which data are available.
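Antecedent rainfall is usually supplied to a model as a handful of derived variables. A minimal sketch, assuming a record of daily rainfall totals keyed by date; the variable names and the 0.01-inch cutoff for a "dry" day are hypothetical choices. Rainfall intensity would require hourly or finer data, which this sketch does not use.

```python
from datetime import date, timedelta


def antecedent_rain_features(daily_rain, sample_day, dry_threshold=0.01):
    """Summarize rainfall before a sampling day.

    daily_rain maps datetime.date -> daily rainfall total (inches). Days missing
    from the record count as zero in the totals, but they end the dry-day count,
    because a dry spell cannot be confirmed without data.
    """
    def rain_on(day):
        return daily_rain.get(day, 0.0)

    # Totals for the 1, 2, and 3 calendar days before the sample.
    prev = [rain_on(sample_day - timedelta(days=k)) for k in (1, 2, 3)]

    # Count consecutive recorded dry days immediately before the sample.
    dry_days = 0
    day = sample_day - timedelta(days=1)
    while day in daily_rain and daily_rain[day] < dry_threshold:
        dry_days += 1
        day -= timedelta(days=1)

    return {
        "rain_prev_1d": prev[0],
        "rain_prev_2d": prev[0] + prev[1],
        "rain_prev_3d": sum(prev),
        "antecedent_dry_days": dry_days,
    }


rain = {date(2010, 7, 1): 0.0, date(2010, 7, 2): 0.0,
        date(2010, 7, 3): 0.5, date(2010, 7, 4): 0.2}
features = antecedent_rain_features(rain, date(2010, 7, 5))
```

Each derived variable then becomes a candidate column in the regression, and a rain threshold can be explored against any one of them.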

Often, a threshold level of rainfall exists beyond which elevated bacteria counts are likely. The
threshold varies from site to site and from region to region and is determined through sampling
and data analysis. Developing rainfall threshold levels is discussed in Chapter 6.

Conditions that affect surface-water runoff from rainfall include amount and intensity of rainfall,
land use and land cover, saturation level of soil, stormwater management systems and retention
ponds, and other factors. Intense rainfall, leading to overland flows, can erode soil and stream
sediments and transport entrained material, including animal feces.

Wind Speed and Direction
Wind speed and direction can play a crucial role in transporting FIB from a potential source
location to a beach. Wind also strongly influences wave formation. Waves are the main source of
energy that causes beaches to change in size, shape, and sediment type, and they move
debris between the beach and the offshore zone. The three main characteristics of
waves are their height, their wavelength, and the direction from which they approach. Wave action
can resuspend bacteria in bottom sediments or sand, increasing FIB levels in the
adjacent waters. For example, studies at beaches along the southern shore of Lake Michigan have
shown that E. coli densities in the sands of the swash zone are as high as, or higher than, those in
the water column (Whitman et al. 1999). When storm winds generate waves and drive them onto
beaches, the foreshore sand is disturbed and stored bacteria are released into the water, raising
E. coli densities above the allowable threshold for full-body contact (Whitman et al.
1999; Haack et al. 2002). In such cases, the sand acts as a reservoir of FIB that might or
might not be accompanied by other fecal constituents.

Current Magnitude and Direction
Several studies have shown that the magnitude and direction (alongshore and cross-shore
components) of currents strongly influence FIB levels at beaches (Thupaki et al. 2010). A
longshore or littoral current runs parallel to the shore as a result of waves breaking at an angle on
the shore or as a result of larger hydrologic processes. The speed and direction of the currents
can be critical parameters that explain the transport of FIB from a nearby source to the beach.
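Because transport toward or along a beach depends on direction as well as magnitude, wind and current observations are often converted into cross-shore and alongshore components before they enter a regression. A sketch, assuming compass bearings in degrees; the function names and the example shoreline orientation are hypothetical. Winds are conventionally reported by the direction they blow from, so they are converted first; current directions are commonly reported as the direction of flow, but the convention of each data source should be checked.

```python
import math


def toward_direction(from_direction_deg):
    """Convert a meteorological 'from' direction (as wind is usually reported)
    to the compass direction the air is moving toward."""
    return (from_direction_deg + 180.0) % 360.0


def shore_components(speed, toward_deg, onshore_bearing_deg):
    """Split a wind or current vector into cross-shore and alongshore parts.

    toward_deg: compass direction (degrees clockwise from north) the flow is
        moving toward.
    onshore_bearing_deg: compass direction pointing from the water toward the
        beach, perpendicular to the local shoreline.
    Returns (cross_shore, alongshore): cross_shore > 0 is onshore; alongshore > 0
    points toward onshore_bearing_deg + 90 degrees.
    """
    rel = math.radians(toward_deg - onshore_bearing_deg)
    return speed * math.cos(rel), speed * math.sin(rel)


# Hypothetical beach on a west shore, so onshore flow heads west (270 degrees).
cross, along = shore_components(10.0, toward_direction(90.0), 270.0)  # wind from the east
```

An east wind at this hypothetical beach is entirely onshore, while a north wind becomes a purely alongshore (southward, hence negative) component.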

Tide/Moon Phase
Depending on the location of the beach, the tidal phase can affect water quality. Tidal
information is easy to find and could be a useful independent variable in a statistical model.
Incoming tides are associated with onshore currents, which tend to prevent pollutants from
flowing seaward. Tidal flushing of an embayment might occur, moving pollutants away from
beach areas. In some cases, however, tidal flushing can be inhibited (e.g., by physical barriers or
structures). Tidal activity can therefore either increase or decrease FIB levels. The greater range
of spring tides has been shown to be associated with increased indicator (enterococci) densities
at 60 beaches in Southern California (Boehm and Weisberg 2005). Likely sources of the indicator
bacteria include groundwater discharges from the beach face as well as beach materials
containing FIB, such as wrack and bird feces on sand newly inundated by spring tides. High tides
also have been associated with elevated levels of FIB at beaches, presumably from resuspension
of FIB from contaminated beach sands into the water column (Shibata et al. 2004).
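Tide predictions are best taken from a tide table or gauge for the nearest station, but moon phase, which governs the spring-neap cycle, can be computed directly from the sampling date. A rough sketch, assuming the mean synodic month (about 29.53 days) and a reference new moon on January 6, 2000, 18:14 UTC; the result approximates the true phase to within about a day and is not a tide prediction.

```python
from datetime import datetime, timedelta, timezone

SYNODIC_MONTH_DAYS = 29.530588853  # mean length of a lunation
REFERENCE_NEW_MOON = datetime(2000, 1, 6, 18, 14, tzinfo=timezone.utc)


def moon_phase_fraction(when):
    """Approximate position in the lunar cycle: 0.0 = new moon, 0.5 = full moon.

    `when` must be timezone-aware. Uses the mean synodic month, so the result
    can be off by up to about a day relative to the true phase.
    """
    elapsed_days = (when - REFERENCE_NEW_MOON).total_seconds() / 86400.0
    return (elapsed_days / SYNODIC_MONTH_DAYS) % 1.0


def days_from_new_or_full(when):
    """Approximate days to the nearest new or full moon (near which spring tides occur)."""
    r = moon_phase_fraction(when) % 0.5
    return min(r, 0.5 - r) * SYNODIC_MONTH_DAYS


sample_time = datetime(2010, 7, 15, 14, 0, tzinfo=timezone.utc)
phase = moon_phase_fraction(sample_time)
```

The distance to the nearest new or full moon is often easier to use in a regression than the raw phase, because it places both spring-tide peaks at the same end of the scale.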

River Flow
Increases in river flow are typically associated with rain events and runoff, and they could be
indicative of high pollutant loads. If the beach area is along the river itself, higher flows are
typically correlated with higher FIB levels. Flow rate has an effect on the travel time of
pollutants moving from a source to a beach and would therefore affect the timing of a potential
associated notification.
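To line up an upstream flow record with a beach sample, the travel time can be estimated from the distance and mean velocity. A minimal sketch, assuming plug flow at a constant mean velocity (no dispersion or storage); the 12-km distance and 0.5-m/s velocity in the example are hypothetical. Because velocity itself rises with flow, the estimate should be recomputed for high-flow conditions.

```python
from datetime import datetime, timedelta


def travel_time_hours(distance_km, mean_velocity_m_s):
    """Plug-flow travel time from an upstream point to the beach (no dispersion)."""
    if mean_velocity_m_s <= 0:
        raise ValueError("mean velocity must be positive")
    return distance_km * 1000.0 / mean_velocity_m_s / 3600.0


def upstream_reading_time(sample_time, distance_km, mean_velocity_m_s):
    """Time at which water arriving at the beach at sample_time passed the upstream point."""
    return sample_time - timedelta(hours=travel_time_hours(distance_km, mean_velocity_m_s))


# Hypothetical gauge 12 km upstream with a mean velocity of 0.5 m/s:
# about 6.7 hours of travel, so a noon sample pairs with the 05:20 flow reading.
t_read = upstream_reading_time(datetime(2010, 7, 1, 12, 0), 12.0, 0.5)
```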

River Stage
Wet weather is well known to be a key factor in short-term pollution episodes with elevated FIB
densities. A rise in river stage is associated with rain, and river stage is related to river discharge
and velocity. River stage has frequently been found to correlate with bacteria densities or with
the likelihood of exceeding the bacteria standard. At 10 beaches in Scotland, a predictive tool has
been developed that uses rainfall and river flow to predict FIB densities (McPhail and Stidson
2009).

Lake Stage
A rise in lake stage can indicate that recent rainfall has increased the volume of the lake.
Rainfall and stormwater flow into a lake are usually associated with increased FIB levels.
Inundation of shoreline areas that are not normally under water (and resuspension of the bacteria
harbored in their sediments) can lead to decreased water quality.

Groundwater
Groundwater flow into beach water can carry FIB and enteric viruses from nearby septic systems
or leaking wastewater infrastructure. A study at a California beach showed that microbes can be
transported in groundwater through sediment pore spaces (Boehm et al. 2004). Groundwater flow
into the system could also dilute FIB concentrations.

5.2.4 Other

Physical Location of the Beach (Bay or Shore)
The geographic setting of a beach should be considered when developing a statistically based
model. Knowledge of the location of potential FIB sources and of the hydrologic attributes of the
waterbody, as documented in a beach sanitary survey, is the basis of beach management,
including the use of statistical models.

Sampling Methods
Factors such as sampling location, sample depth, and time of sample collection need to be
considered for data collection when developing a predictive tool. EPA's Environmental
Monitoring for Public Access and Community Tracking (EMPACT) study (USEPA 2005)
examines relationships between FIB measurements (using EPA-approved culture methods) and
such factors. Because of the predictable differences in microbial counts with spatial and temporal
factors, choosing a consistent sampling strategy is important.

EPA found that at three of the five beaches studied, there was no statistically significant
difference in bacteria densities among samples collected at different points parallel to the
beachfront (spanning a distance of 60 meters), as long as the samples came from water of the
same depth.

The greatest single determinant of bacteria densities was found to be the depth zone (distance
from the shoreline at which the sample was collected). Bacterial densities became substantially
lower as one moved from ankle-deep to knee-deep to chest-deep water. That has important
implications for sample design and for public health. However, the study also found no
significant difference in indicator levels among samples that were taken at different depths below
the surface, such as between those taken 0.3 meter beneath the surface and those taken near the
bottom.

EPA observed significant declines in indicator densities from the morning to the afternoon (9:00
a.m. to 2:00 p.m.) at four of the five beaches investigated in the EMPACT study. That effect was
seen only on sunny days at one freshwater beach, but it was observed to be independent of
sunshine at three others—a freshwater beach and two marine beaches. Indicator levels at the
remaining beach, a West Coast marine environment, tended to be very low at all times.

The EMPACT study was conducted using culture methods. Similar data using qPCR methods
are preliminary and not yet available.

Pollution Inputs
Some pollution sources are measurable and could be included in a statistical model. They
include agricultural runoff volume during a rain event, stormwater flows, CSO/SSO discharges,
the number of swimmers/bathers, and the presence of sediment, measured as turbidity or total
suspended solids.

5.3 COLLECTING DATA FOR A STATISTICAL MODEL
For beach water quality modeling, free, publicly available meteorological data from a nearby
airport or weather station will typically form the base or foundation of the independent variable
data set. Collecting on-site hydrological (e.g., current direction, gauge height), meteorological
(e.g., rainfall, solar irradiation, air temperature), or water chemistry (e.g., turbidity, water
temperature, DO, specific conductivity) data requires additional personnel and financial
resources. The benefits gained by having on-site data should be considered against the costs of
deploying, maintaining, and collecting data from on-site equipment. Volume II, Chapter 2 details
an example of such a comparative analysis at South Shore Beach, Milwaukee. Improvements in
model accuracy from having additional on-site data are likely to be beach-specific. Microbial
data must be collected at the beach. Given the day-to-day variability of microbial data,
measurements should be made at least several times per week over a period of months
to years to develop reliable models (see Frick et al. 2008 and Volume II of this report).

Several options for data collection are available, depending on the needs and resources of the
prediction efforts. Automatic samplers are desirable for frequent and regular sampling but might
not be useful (except in special situations) for collecting microbial samples because of required
holding times and sample degradation. Automated sampling equipment (such as ISCO samplers)
has been used to collect water samples for microbial measurements in streams and rivers
following rainfall events. Collection by such samplers is initiated automatically by signals from
rainfall sensors. Sensors, meters, data loggers, and telemetry can be used to transmit
instantaneous readings to a network system or to record readings at regular intervals when staff
are unavailable. Typical USGS gauging stations, including those used in the Kansas and Georgia
applications, report instantaneous water stage measurements at 15-minute or 1-hour intervals.
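Gauges that report every 15 minutes or every hour produce far more readings than there are beach samples, so the readings are usually condensed into a few values per sample. The sketch below assumes readings arrive as (timestamp, stage) pairs; the 24-hour window and the three summary variables are illustrative choices, not a prescribed set.

```python
from datetime import datetime, timedelta


def stage_features(readings, sample_time, window_hours=24):
    """Summarize gauge-height readings in the window before a beach sample.

    readings: iterable of (datetime, stage) pairs, e.g., 15-minute or hourly data.
    Returns None if no readings fall in the window.
    """
    start = sample_time - timedelta(hours=window_hours)
    window = sorted((t, s) for t, s in readings if start <= t <= sample_time)
    if not window:
        return None
    latest = window[-1][1]
    return {
        "stage_at_sample": latest,                 # most recent reading
        "max_stage": max(s for _, s in window),    # peak within the window
        "stage_change": latest - window[0][1],     # rise (+) or fall (-)
    }


readings = [
    (datetime(2010, 6, 29, 0, 0), 1.0),    # outside the 24-hour window
    (datetime(2010, 7, 1, 0, 0), 2.0),
    (datetime(2010, 7, 1, 6, 0), 3.5),
    (datetime(2010, 7, 1, 11, 45), 3.0),
    (datetime(2010, 7, 1, 12, 15), 2.9),   # after the sample, so excluded
]
feats = stage_features(readings, datetime(2010, 7, 1, 12, 0))
```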

Maintaining a full weather station at the beach or at another location for the purpose of
developing and operating a predictive tool has been done at several locations in the Great Lakes.
In Lake County, Illinois, for example, the data are transmitted via satellite to the office. For
information about automated sampling and remote transmittal of data, see Volume II of this
report. Maintaining a local river stage gauge is feasible and has relatively low cost. Data can be
transmitted via satellite to the office. Managers can intensify sampling efforts for model setup or
calibration, and then lessen them as a prediction tool becomes established and proven to be
reliable.

The following list summarizes data collection parameters likely to be most useful for developing
a predictive tool.

   •   Possible for automated collection
       -  Stream or river:  flow, velocity, stage (gauge height)
       -  Waterbody: current speed and direction (often measured using Acoustic Doppler
          Current Profiler equipment), tidal phases, swimmers, wave height, lake level,
          underwater light sensors
       -  Weather: air temperature, wind speed and direction, precipitation, relative humidity,
          lake stage, water temperature and clarity, solar irradiance (sunlight intensity), and
          wave height
       -  Water characteristics: water quality (data sondes), turbidity,  salinity, conductivity,
          temperature, DO, pH
       -  Season, day of year, and moon phase are all easily obtainable

   •   Field observations
       -  Number of swimmers and animals on beach and in swim area
       -  Chlorophyll a
       -  Bacterial indicator
       -  Surface water flow conditions (are there visible changes in surface water flow?)

5.3.1 Widely Used Data and Data Sources

5.3.1.1 National Weather Service Weather Station Data
Weather data are frequently available and easily downloaded from local stations and airports in
various formats and for different variables. Such meteorological data can be used successfully
for statistical predictive tool development and implementation. The location of a weather station
relative to the beach is important: the correlation between conditions measured at the weather
station and conditions at the beach weakens as the distance between them increases. To find the
weather station closest to a beach, use http://lwf.ncdc.noaa.gov/oa/climate/stationlocator.html.
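When several candidate stations are available, the distance from the beach to each can be computed from their coordinates. A minimal sketch using the haversine great-circle formula; the station names and coordinates are hypothetical. Distance is only one consideration, because a nearby inland station can still differ markedly from conditions on the shore.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(beach_lat, beach_lon, stations):
    """stations maps a station name to (lat, lon); returns (name, distance_km)."""
    return min(
        ((name, haversine_km(beach_lat, beach_lon, lat, lon))
         for name, (lat, lon) in stations.items()),
        key=lambda item: item[1],
    )


# Hypothetical stations and beach location.
stations = {"STATION_A": (43.0, -87.9), "STATION_B": (42.0, -87.8)}
name, dist = nearest_station(42.95, -87.85, stations)
```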

Some variables that influence the survival of FIB, such as insolation (solar radiation), are
not collected at some public weather stations and would therefore have to be collected locally.

Many municipalities prefer to have site-specific weather data collection equipment at the beach
to obtain more reliable predictive models. Especially on large lakes and coastlines, weather
conditions on the shore can be dramatically different from those inland.

Historical data on water quality and microbial indicator levels are not as common, but they are
very useful if available. A historical data set is useful only if it includes bacterial indicator levels
coupled with other measurements and if it is of consistent and acceptable quality.

5.3.1.2 U.S. Geological Survey Stream Gaging Station Data
The USGS maintains a network of stream gauging stations around the country. To see if a stream
is in the network, visit http://waterdata.usgs.gov/nwis/rt. Often, historical data can be
downloaded from that site. Data at the stations might include results from occasional bacterial
sampling, but they rarely include results from an established, periodic bacterial sampling
regime unless the station was designated for FIB monitoring under a special project. In some
cases, station records also include meteorological data and, occasionally, water chemistry
sampling data such as pH, specific conductance, and DO.

Streamflow data include river stage and a calculated river discharge (flow volume per unit time).
Either of those parameters could prove useful as an independent variable.

Real-time streamflow data typically are recorded at either 15- or 60-minute
intervals, stored on-site, and then transmitted to USGS offices every 1 to 4 hours, depending on
the data relay technique. Recording and transmission can be more frequent during high-flow
events. Data from real-time sites are relayed to USGS offices via satellite, telephone, or
radio and are available for viewing within minutes of arrival.

5.3.1.3 Data from Sanitary Survey Investigations
Sanitary survey information can help beach managers synthesize all contributing beach and
watershed information—including water quality data, pollutant source data, and land use data—
so that sources of pollution can be identified.

Sanitary survey investigations will contribute to the knowledge of the hydrologic setting of a
beach and can result in information about which data are most important for developing a
predictive tool.

Beach sanitary surveys consist of collecting information on the sources of water and water
pollution that contribute to a beach from its watershed. Depending on the level of detail, the
survey can include collecting and analyzing already available information, or it could include
new daily measurements and field observations. A sanitary survey provides a documented
historical record of beach and watershed water quality. It serves as a baseline snapshot against
which future beach and watershed assessments can be compared, and it enables beach managers
to perform long-range water quality and resource planning. An official sanitary survey that has been well
documented and validated can provide information for prioritizing funds used to remediate and
eliminate pollution sources. The information in the survey can benefit stormwater program
managers, wastewater facility managers, local elected officials, local planning authorities,
academic researchers, and other beach and water quality professionals.

The sanitary survey information collected on Great Lakes beaches has been useful to the Great
Lakes states for developing statistical models. That information consists of measurements of
turbidity, nearby stream discharge measurements, longshore currents, and regular observations of
beach activity. It is important that these observations be paired with microbial samples, because
the microbial data serve as the response variable when developing a statistical model.

Data collected for a beach sanitary survey can facilitate the development of a predictive tool;
however, the purpose of a beach sanitary survey is different from that of a data-collection effort
specifically designed for statistical modeling. For that reason, data collected for a beach sanitary
survey might not always be useful in predictive modeling.

Information from a sanitary survey can be used to develop a predictive tool in several ways:

• If the pollution source seems to be mostly urban or stormwater runoff or CSOs,
developing a rain threshold level might work well and not require as many resources as a
statistical model.

• If leaky septic systems abound, sewage overflows occur, or other known sources of
human fecal material are present, managers will want to account for such factors in the
model or notification protocol.

• If weather conditions (rainfall, wind speed, intensity of sunlight) or water currents affect
bacteria levels at the sampling stations, managers should include such conditions in the
data-collection efforts for predictive tool development.

5.4 ENSURING DATA QUALITY
Developing and implementing a Quality Assurance Plan is recommended for any beach monitoring
program, and managers should develop one when assembling a data set for an effective,
reliable predictive tool. Decisions about which data are of usable quality are made when
developing the Quality Assurance Plan, which is then followed throughout the project.

Examples of details covered in a Quality Assurance Plan include the following:

• Laboratory methods (laboratories employ different methods, and their results will vary)

• Laboratory certification requirements

• Sampling protocol (sampling time of day, sampling depth)

• Field sampling procedures and schedule

• Replicate sampling procedure and schedule

• Procedures for data collected by an auto sampler

• Data processing procedures and documentation

• Quality assurance/quality control procedures

• Information on using historical data (when acquiring and using historical data, the quality
of the data is important if the data will be compared with newly collected samples)

Data quality guidelines should apply to all data collected, including turbidity, water chemistry,
stream gaging measurements, and microbiological sampling data. However, microbiological samples
carry more inherent uncertainty, which arises from the sample collection location, the sample
collection method, and the analytical method. For that reason, duplicate sampling or composite
sampling protocols can be incorporated into the sampling procedures.

Immediately after data collection, the data should be examined to identify any observations
with high leverage and any outliers. Points with high leverage can greatly affect the fitted
model coefficients. An outlier, on the other hand, is an observation that differs markedly
from the other points in a data set, possibly because of a data entry error. Influential points
have high leverage and are also outliers.

Figure 5-1 demonstrates those concepts in terms of a simple linear regression of Y on a single X
variable. Point A is an outlier. It does not fit the trend mapped by the rest of the data. However,
the X value of Point A is very near the mean value of X across the data set, so it would have very
little influence on the slope of the fitted regression line. It could pull the line up, thereby
influencing the value of the intercept but not the slope. Point B has high leverage, because its X
value is much greater than the X values of the other observations. However, it is in line with the
trend mapped by the other observations, so it has little influence on the slope or intercept of a
fitted regression line. Point C is the most influential point in the data set. Not only does it have
high leverage, but it also is an outlier, meaning it does not fit the trend mapped by the rest of the
data set. If one includes Point C in the analysis, it would have a large effect on the regression
model fit to the data. That observation should be examined closely to determine if it is indeed a
valid data point.

[Figure 5-1: scatter plot of Y versus X; image not reproduced]
Figure 5-1. Points A, B, and C represent an outlier, a high-leverage point,
and an influential point, respectively, in a regression context.

Several statistical techniques are useful for detecting such types of points:

• Index plot of residuals: This is the simplest way to visually identify outliers. The residual
measures how far off the general trend of the data a given data point lies.

• Index plot of leverages: The average leverage of a data point is p/n, where p is the
number of parameters and n is the number of observations in the data set. As a general
rule, any leverage greater than 2p/n is relatively large and should be investigated further.

• Studentized (either internal or external) residuals: The advantage of studentized residuals
over ordinary (raw) residuals is that they have been scaled to have equal variance. An
internally studentized residual is based on the residual of a given observation when that
observation is included in the data set. An externally studentized residual is based on the
residual of a given observation when that observation is removed from the data set before
a regression line is fit.

• Cook's distance: This statistic combines the internally studentized residual and leverage
value for a data point and, thus, can be used to identify influential points.

• DFFITS: This statistic is similar to Cook's distance, but it is based on the externally
studentized residual and as such can more easily identify highly influential points.

Observations identified as extreme values using the above techniques should be noted for further
investigation. If highly influential points or extreme outliers are confirmed as stemming from bad
data, they should be removed to maximize model accuracy and efficiency.
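The diagnostics listed above can all be computed from a single least-squares fit. The sketch below assumes NumPy and a model with an intercept; the 2p/n leverage flag follows the rule of thumb in the text, while cutoffs for the other statistics are left to the analyst. The example data are invented to mimic the pattern in Figure 5-1.

```python
import numpy as np


def regression_diagnostics(X, y):
    """Leverage, studentized residuals, Cook's distance, and DFFITS for an
    ordinary least-squares fit of y on X (with an intercept)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])   # design matrix
    p = A.shape[1]                         # number of parameters, incl. intercept
    Q, _ = np.linalg.qr(A)                 # orthonormal basis for the columns of A
    h = np.sum(Q ** 2, axis=1)             # leverages: diagonal of the hat matrix
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    e = y - A @ coef                       # ordinary residuals
    sse = float(e @ e)
    s2 = sse / (n - p)
    internal = e / np.sqrt(s2 * (1.0 - h))
    # Error variance with observation i deleted, obtained without refitting.
    s2_del = (sse - e ** 2 / (1.0 - h)) / (n - p - 1)
    external = e / np.sqrt(s2_del * (1.0 - h))
    cooks_d = internal ** 2 * h / (p * (1.0 - h))
    dffits = external * np.sqrt(h / (1.0 - h))
    return {
        "leverage": h,
        "high_leverage": h > 2.0 * p / n,   # the 2p/n rule from the text
        "internal": internal,
        "external": external,
        "cooks_d": cooks_d,
        "dffits": dffits,
    }


# Invented data: ten points on a trend plus one point (like Point C) that has
# high leverage and lies far off the trend.
x = np.append(np.arange(1.0, 11.0), 30.0)
noise = [0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.2, -0.2, 0.1, -0.1]
y = np.append(2.0 * np.arange(1.0, 11.0) + noise, 10.0)
diag = regression_diagnostics(x, y)
```

In this example the last point is the only one flagged for leverage and also has the largest Cook's distance and DFFITS. A commonly used screening cutoff for DFFITS is |DFFITS| > 2√(p/n), but flagged points still need the substantive review described above before any are removed.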

5.5 CONDUCTING EXPLORATORY DATA ANALYSIS
Developers of statistical models should answer a number of key questions using exploratory data
analysis before beginning the actual model development. They include the following:

• Are there outliers or high leverage observations, as detailed in the previous section?

• Is the relationship between the FIB densities and the independent variables linear? If not,
consider a transformation of any of the explanatory variables to linearize the relationship,
or consider omitting the variable.

• Are any pairs of independent variables highly correlated? Such co-linearity can lead to
later problems with regression analyses.

• Which explanatory variables have strong univariate associations with FIB densities? In
multivariable linear regression, the association of each independent variable is
measured in the presence of the other independent variables in the model. Some
independent variables have a strong relationship to the response on their own, but once
other independent variables are added to a model, their additional usefulness can be
minimal. Likewise, two independent variables could interact in such a way that neither is
strongly related to the response alone, but together they explain the response well when both
appear in a regression model. The lesson is to interpret univariate correlations between the
response and any independent variables carefully.

• Was the strength of the univariate correlation between the response and independent
variables consistent through time?
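The co-linearity question above can be screened with a pairwise correlation matrix of the candidate independent variables. A sketch assuming NumPy; the 0.8 cutoff is an illustrative choice rather than a standard, and pairwise screening will not catch co-linearity that involves three or more variables at once.

```python
import numpy as np


def correlated_pairs(data, names, threshold=0.8):
    """Return (name_i, name_j, r) for pairs of independent variables whose
    Pearson correlation has magnitude >= threshold.

    data is an (n, k) array whose columns are the k independent variables.
    """
    r = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) >= threshold:
                pairs.append((names[i], names[j], round(float(r[i, j]), 3)))
    return pairs
```

For each flagged pair, the variable that is cheaper to measure or easier to interpret is usually the one kept.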

A statistical model is developed using a water quality data set that includes FIB concentrations
(the dependent variable) and an assortment of independent variables. For each beach, summarize
the data as they are collected so that errors can be quickly identified and corrected. Because of
their wide range, bacterial densities are generally log10-transformed before data analysis
(Francy 2006) to make their distribution closer to normal. If data are available for one year or
more, start by summarizing the data for each year and for all years combined. Include the
median, minimum, and maximum bacterial indicator concentration and the number of days the
standard was exceeded. Simple relationships to potential explanatory variables might begin to
emerge.
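The yearly summary described above can be sketched in a few lines of Python. This is an illustrative sketch rather than a prescribed procedure: the `(year, density)` record layout, the function name `summarize_by_year`, and the use of 235 CFU/100 mL (a single-sample E. coli value) as the standard are assumptions for the example, and one sample per day is assumed so that exceeding samples equal exceedance days.

```python
import math
from collections import defaultdict
from statistics import median

# Assumed single-sample standard (CFU/100 mL), used here only for illustration.
STANDARD = 235.0


def summarize_by_year(records, standard=STANDARD):
    """Summarize log10-transformed FIB densities for each year and for all years combined.

    records: iterable of (year, density) pairs; densities must be positive
    (how non-detects are handled is a site-specific decision not shown here).
    """
    groups = defaultdict(list)
    for year, density in records:
        groups[year].append(density)
        groups["all years"].append(density)

    summary = {}
    for key, densities in groups.items():
        logs = [math.log10(d) for d in densities]
        summary[key] = {
            "n": len(densities),
            "median_log10": median(logs),
            "min_log10": min(logs),
            "max_log10": max(logs),
            # One sample per day is assumed, so exceeding samples = exceedance days.
            "exceedance_days": sum(d > standard for d in densities),
        }
    return summary


# Hypothetical E. coli densities (CFU/100 mL) from two swimming seasons.
records = [(2008, 10.0), (2008, 1000.0), (2008, 100.0), (2009, 300.0), (2009, 30.0)]
for key, stats in summarize_by_year(records).items():
    print(key, stats)
```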

If data are available for less than one year, they can still be analyzed, especially if samples have
been taken frequently. Keep in mind that relationships between variables and bacteria densities
might not be apparent with a smaller data set. The correspondence between the predictions of the
statistical model and actual observations at the beach is the final proof of model efficacy and
reliability. The model is best evaluated using a data set other than the one used to develop the
model (Frick et al. 2008; Boehm et al. 2007). Developers of Virtual Beach used data sets as
small as 25-30 data points over a period of 60 days, or approximately a single swimming season
(Frick et al. 2008). Evaluation of models is covered in Volume II of this report.

Next, examine scatter plots of all measured independent variables versus bacteria densities. If a
continuous linear relationship is apparent in the scatter plot, the constituent could be useful as a
predictive variable. If the relationship is continuous but nonlinear, try transforming the variable
using a second-order polynomial, square root, logarithm, or inverse. If a linear relationship is still
not apparent after transformation, consider expressing the variable in categories or omitting the
variable from consideration in the model. For variables that might not be continuous (e.g., cloud
cover, wave height), use box plots to summarize the distribution of responses in each category.
Analyze plots by year and for all years combined (Francy 2006).
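One way to compare candidate transformations is to compute, for each one, the Pearson correlation between the transformed variable and log10 FIB density and see which gives the strongest linear association. The sketch below assumes hypothetical streamflow data and uses illustrative helper names (`pearson_r`, `rank_transforms`). The second-order polynomial is left out because it enters a regression as two terms (x and x²) rather than as a single transformed column. A ranking like this supplements the scatter plots but does not replace them.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Candidate single-column transformations of a strictly positive explanatory variable.
TRANSFORMS = {
    "none": lambda v: v,
    "square root": math.sqrt,
    "log10": math.log10,
    "inverse": lambda v: 1.0 / v,
}


def rank_transforms(x, log_fib):
    """Return (name, r) pairs sorted by strength of linear association with log10 FIB."""
    scores = [(name, pearson_r([f(v) for v in x], log_fib)) for name, f in TRANSFORMS.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical data: streamflow and log10 E. coli density on four sampling days.
flow = [1.0, 10.0, 100.0, 1000.0]
log_ecoli = [0.5, 1.5, 2.5, 3.5]
print(rank_transforms(flow, log_ecoli)[0])
```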

Before the multivariable linear regression model development phase proceeds, check the data set
to determine whether it includes explanatory variables that are strongly related to each other
(co-linearity). If such pairs of independent variables exist, consider using only one of each pair
in a multiple linear regression model, because high co-linearity among the independent variables
leads to poorly estimated regression coefficients. A general rule is to be cautious about
correlation coefficients whose absolute value exceeds 0.80. Often, the choice of which variable
in the pair to retain comes down to deciding which one is easiest or cheapest to measure or
interpret.
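The co-linearity screen can be automated by computing the correlation for every pair of explanatory variables and flagging pairs whose absolute value exceeds 0.80. This is a minimal sketch. The variable names and values are hypothetical, and the helper names (`pearson_r`, `collinear_pairs`) are illustrative. Deciding which member of a flagged pair to keep is still left to the analyst.

```python
import itertools
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(variables, threshold=0.80):
    """Return (name_a, name_b, r) for every pair of explanatory variables with |r| > threshold."""
    flagged = []
    for a, b in itertools.combinations(sorted(variables), 2):
        r = pearson_r(variables[a], variables[b])
        if abs(r) > threshold:
            flagged.append((a, b, r))
    return flagged


# Hypothetical explanatory variables measured on the same five days.
variables = {
    "turbidity": [1.0, 2.0, 3.0, 4.0, 5.0],
    "rain_24h": [2.0, 4.0, 6.0, 8.0, 10.5],
    "wave_height": [0.3, 0.1, 0.4, 0.1, 0.5],
}
print(collinear_pairs(variables))
```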

Although not discussed in this volume, Boehm et al. (2007) have provided evidence that the
partial least squares modeling approach does not require the independent variables to be
uncorrelated. Additional work is required to determine whether the partial least squares approach
can be generalized.

5.6 DEVELOPING YOUR STATISTICAL MODEL
This section is a compilation of published and unpublished methods of developing a statistical
model for predicting beach advisories. Many of the recommendations come from publications of
the USGS in Ohio and from the EPA developers of Virtual Beach. After constructing a set of
independent variables, a statistical software package such as Excel, Virtual Beach, or SAS can be
used to begin developing and evaluating a multiple linear regression model.

Several metrics can be used to measure model goodness-of-fit or explanatory power. The
coefficient of determination, R², was historically used as the primary determinant of model
fitness. The R² value summarizes the proportion of the variability in the response variable that
can be attributed to the variability in the independent variables. Values of R² can vary from 0 (no
variability explained) to 1 (all variability explained).
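For reference, the standard definition for a least-squares model with an intercept is shown below. Here $y_i$ are the n observed (e.g., log10-transformed) FIB densities, $\hat{y}_i$ the fitted values, and $\bar{y}$ the mean of the observations. The numerator is the unexplained (residual) variability and the denominator the total variability.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}
```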

Statisticians recognized a flaw in R², however: it never decreases as more parameters are added
to the model. At some point, a model becomes over-parameterized, meaning the ratio of data
observations to explanatory parameters is too low. An over-parameterized model can closely fit a
set of training data (i.e., the data used to generate the model), but it predicts new observations
outside the original data poorly. In essence, it is too tightly tailored to the training data.

To counteract that phenomenon, statisticians developed new metrics that include a penalty for
adding parameters. The adjusted R² is one such measure, as are Mallows' Cp (Mallows 1973;
Frick et al. 2008), Akaike's Information Criterion (AIC) (Akaike 1974), and the Bayesian
Information Criterion (BIC) (Schwarz 1978). Those metrics reward explained variability in the
response variable while penalizing the number of parameters in the model, thus guarding against
over-fitting. If the modeler adds a parameter to an existing model, and the parameter does little to
reduce the unexplained variability in the response variable, the metrics will identify the parameter
as relatively useless. In comparing two models using Mallows' Cp, AIC, or BIC, the model with
the smaller value has a better fit to the data, the same fit with fewer independent variables, or
both. The metrics vary in how severely they penalize additional parameters. In general, the order
from least to most severe is as follows:

adjusted R² < Mallows' Cp < AIC < BIC

There is also a corrected AIC (AICc) (McQuarrie and Tsai 1998), which falls between the AIC and
BIC in terms of penalty severity. Model developers who choose the adjusted R² as their selection
criterion will likely end up with larger models than if they had used the BIC.
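The metrics above can be computed from an ordinary least-squares fit, as in the sketch below. It assumes Gaussian errors and uses common textbook forms in which p counts the regression coefficients including the intercept. Mallows' Cp uses the residual mean square of the largest (full) candidate model, and the additive constants of AIC and BIC are dropped. Statistical packages differ in such constants and in whether the error variance is counted as a parameter, so models should only be compared within one convention. The data and the names `fit_rss` and `selection_metrics` are hypothetical.

```python
import numpy as np


def fit_rss(X, y):
    """OLS with an intercept; returns (residual sum of squares, number of coefficients p)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return float(resid @ resid), X1.shape[1]


def selection_metrics(X, y, sigma2_full):
    """Goodness-of-fit metrics for one candidate model.

    sigma2_full: residual mean square of the full candidate model (used by Mallows' Cp).
    AIC, AICc, and BIC drop additive constants, so only differences between models
    fitted to the same y are meaningful.
    """
    n = len(y)
    rss, p = fit_rss(X, y)
    tss = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - rss / tss
    aic = n * np.log(rss / n) + 2 * p
    return {
        "R2": r2,
        "adj_R2": 1.0 - (1.0 - r2) * (n - 1) / (n - p),
        "Cp": rss / sigma2_full - n + 2 * p,
        "AIC": aic,
        "AICc": aic + 2 * p * (p + 1) / (n - p - 1),
        "BIC": n * np.log(rss / n) + p * np.log(n),
    }


# Hypothetical data: log10 FIB driven by rainfall; "noise" is an irrelevant candidate.
rng = np.random.default_rng(0)
n = 30
rain = rng.gamma(2.0, 0.5, n)
noise = rng.normal(size=n)
y = 1.0 + 0.8 * rain + rng.normal(scale=0.3, size=n)

X_full = np.column_stack([rain, noise])
rss_full, p_full = fit_rss(X_full, y)
sigma2 = rss_full / (n - p_full)

for name, X in {"rain": rain, "rain + noise": X_full}.items():
    print(name, {k: round(float(v), 3) for k, v in selection_metrics(X, y, sigma2).items()})
```

For the full model, Cp equals p by construction, which is why Cp is usually read by comparing each smaller model's value with its own p.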

In the model-selection process, modelers should consider the selection algorithm in addition to
the selection criteria discussed above. The choices include backward elimination, forward
selection, and the stepwise procedure.

• Backward elimination: Start with the full model including all predictors, then remove the
predictor with the highest p value, provided it exceeds the significance threshold (usually 0.05).
The process is repeated until all remaining p values are less than the significance threshold.

• Forward selection: Start with no variables in the model, then add predictors one at a time on
the basis of their p values. At each step, add the predictor with the lowest p value, provided it
is less than the significance threshold. The process continues until no new predictors can be
added.

• Stepwise procedure: This is a combination of the backward elimination and forward
selection algorithms. At each step, a variable can be added or removed; any removed
variable still has a chance to reenter the model.
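The sketch below implements forward selection. It uses AIC rather than p values as the entry criterion, which is one of the criterion and algorithm pairings described in this section. The candidate with the largest AIC reduction is added at each step, and selection stops when no remaining candidate lowers AIC. Backward elimination and the stepwise procedure follow the same pattern, either removing variables or trying both additions and removals at each step. The candidate names, data, and function names are hypothetical.

```python
import numpy as np


def aic(columns, y):
    """Gaussian AIC (constants dropped) for OLS with an intercept plus the given columns."""
    n = len(y)
    X1 = np.column_stack([np.ones(n)] + list(columns))
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss = float(np.sum((y - X1 @ beta) ** 2))
    return n * np.log(rss / n) + 2 * X1.shape[1]


def forward_selection(candidates, y):
    """Add, one at a time, the candidate that lowers AIC the most; stop when none does."""
    chosen = []
    best = aic([], y)  # intercept-only model
    while True:
        remaining = [name for name in candidates if name not in chosen]
        if not remaining:
            break
        trial = {name: aic([candidates[c] for c in chosen + [name]], y) for name in remaining}
        name = min(trial, key=trial.get)
        if trial[name] >= best:
            break
        chosen.append(name)
        best = trial[name]
    return chosen, best


# Hypothetical candidates; only rain_24h and turbidity actually drive the response.
rng = np.random.default_rng(1)
n = 60
candidates = {
    "rain_24h": rng.gamma(2.0, 1.0, n),
    "turbidity": rng.normal(size=n),
    "wave_height": rng.normal(size=n),
}
y = 0.5 + 0.8 * candidates["rain_24h"] + 0.6 * candidates["turbidity"] + rng.normal(scale=0.2, size=n)
print(forward_selection(candidates, y))
```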

Models can be ranked according to any one of the criteria using one of the model-selection
algorithms, and the best models are then selected for further examination.

5.7 ASSESSING AND REFINING YOUR STATISTICAL MODEL
Given a candidate best model, it is important to analyze the residuals of that model to ensure an
important assumption of regression analysis is being met. Multivariable linear regression
assumes that the residuals are independent and identically distributed. A plot of the model's
residuals versus the fitted values of the response should indicate no obvious pattern (Figure 5-2).
[Plot: residuals versus fitted response, with no discernible pattern]
Figure 5-2. An ideal residual plot for a multiple linear regression model
showing no discernible pattern among the residuals.

If the residuals show non-constant variance (Figure 5-3), the Box-Cox transformation of the
response can be used.
[Plot: residuals versus fitted response, with spread increasing as the fitted response increases]
Figure 5-3. A plot of regression residuals that shows indications of
heteroscedasticity and the need for a Box-Cox transformation.

The Box-Cox method transforms the (positive) response $y$ into $y^{(\lambda)} = (y^{\lambda} - 1)/\lambda$,
with $y^{(0)} = \ln y$, where the value of $\lambda$ is determined by an iterative algorithm (Box and Cox
1964). The goal of the procedure is to find $\lambda$ such that the model residuals have equal
variance across the range of the fitted values (i.e., they appear as in Figure 5-2, rather than as in
Figure 5-3, where variance increases as the fitted response gets larger).
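A common way to choose $\lambda$ is to maximize the Box-Cox profile log-likelihood of the regression. The sketch below does this with a simple grid search, substituted here for the iterative algorithm mentioned above. It assumes a strictly positive response, such as raw densities rather than log10 values, which can be negative. The grid range (-2 to 2), the data, and the function names are assumptions for illustration.

```python
import numpy as np


def boxcox(y, lam):
    """Box-Cox transform of a strictly positive array (natural log when lambda is 0)."""
    return np.log(y) if abs(lam) < 1e-12 else (y ** lam - 1.0) / lam


def profile_loglik(y, X, lam):
    """Profile log-likelihood of lambda for OLS with an intercept (constants dropped)."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])
    z = boxcox(y, lam)
    beta, *_ = np.linalg.lstsq(X1, z, rcond=None)
    rss = float(np.sum((z - X1 @ beta) ** 2))
    # The (lambda - 1) * sum(log y) term is the Jacobian of the transformation.
    return -0.5 * n * np.log(rss / n) + (lam - 1.0) * float(np.sum(np.log(y)))


def choose_lambda(y, X, grid=None):
    """Grid search for the lambda that maximizes the profile log-likelihood."""
    if grid is None:
        grid = np.linspace(-2.0, 2.0, 81)
    scores = [profile_loglik(y, X, lam) for lam in grid]
    return float(grid[int(np.argmax(scores))])


# Hypothetical response whose logarithm is linear in x, so lambda near 0 is expected.
rng = np.random.default_rng(42)
x = np.linspace(0.0, 1.0, 50)
y = np.exp(1.0 + 2.0 * x + rng.normal(scale=0.1, size=50))
print(choose_lambda(y, x))
```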

Once the residuals of the candidate model are shown to be independent and identically
distributed, the next step is to quantitatively assess the accuracy of model predictions. Such an
assessment would ideally use data other than those used to develop the model. The data used to
generate the model are called the training data set, while the testing data set is used to assess
model predictive accuracy.

Because the statistical model will be used for making public health decisions, the most critical
assessment question asks whether the model makes correct decisions about posting beach
notifications (a notification is defined as either an advisory or a closure). In a multivariable linear
regression, the total proportion of variability in the response explained by all independent
variables in the model is expressed as R²; if the model is applied to the testing data set, the
resulting R² can serve as a measure of the model's predictive strength. Another test of model
performance is to examine the number of Type I and Type II errors. A recommendation to close a
beach when bacteria densities do not actually exceed the threshold (a Type I error, or false
positive result) would be considered conservative from a public health point of view. However,
false positives deprive would-be swimmers of the enjoyment and use of the beach, can have
adverse economic effects on business owners in near-beach locations, and can erode public
confidence in public health decisions. Type II errors (or false negatives) result in beaches being
open (or not having a notification in place) when bacterial levels actually exceed the standard.
By evaluating how many false positive and false negative predictions a model produces (or the
percentage of errors compared to correct predictions), the analyst can begin to determine whether
the model is good enough for reliable use.
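Tallying decisions against observations is simple bookkeeping. The sketch below assumes that predictions and observations have already been reduced to True/False values meaning "standard exceeded" (or "notification posted"). The function name and the example sequences are hypothetical.

```python
def decision_errors(predicted, observed):
    """Tally notification decisions against observed exceedances.

    predicted, observed: sequences of booleans (True = exceedance predicted / observed).
    """
    counts = {"true_positive": 0, "false_positive": 0, "false_negative": 0, "true_negative": 0}
    for pred, obs in zip(predicted, observed):
        if pred and obs:
            counts["true_positive"] += 1
        elif pred:
            counts["false_positive"] += 1  # Type I: notification posted, standard not exceeded
        elif obs:
            counts["false_negative"] += 1  # Type II: no notification, standard exceeded
        else:
            counts["true_negative"] += 1
    total = sum(counts.values())
    correct = counts["true_positive"] + counts["true_negative"]
    counts["percent_correct"] = 100.0 * correct / total
    return counts


# Hypothetical testing-set results for five beach days.
predicted = [True, True, False, False, True]
observed = [True, False, True, False, True]
print(decision_errors(predicted, observed))
```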

In addition to evaluating model performance on the basis of a comparison of model predictions
with known measurements, the model's performance should also be compared against a basic
and commonly applied method used for assessing recreational water-quality—the persistence
model. That model uses the most recent bacterial measurement (typically from the previous day)
to predict today's water quality.
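The persistence benchmark can be scored in the same terms as the statistical model. In the sketch below, yesterday's sample decides whether today's notification is posted. It assumes consecutive daily samples with no gaps and an illustrative 235 CFU/100 mL standard. The function name and data are hypothetical.

```python
# Assumed single-sample standard (CFU/100 mL), for illustration only.
STANDARD = 235.0


def persistence_errors(daily_densities, standard=STANDARD):
    """Score the persistence model: yesterday's result decides today's notification."""
    counts = {"correct": 0, "false_positive": 0, "false_negative": 0}
    for yesterday, today in zip(daily_densities, daily_densities[1:]):
        predicted = yesterday > standard
        observed = today > standard
        if predicted == observed:
            counts["correct"] += 1
        elif predicted:
            counts["false_positive"] += 1
        else:
            counts["false_negative"] += 1
    return counts


# Hypothetical consecutive daily E. coli results (CFU/100 mL).
print(persistence_errors([100.0, 300.0, 50.0, 400.0, 500.0]))
```

The same counts for the statistical model over the same days allow a direct comparison. The first day has no persistence prediction, so that day is dropped from both tallies.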

Factors that influence the fate and transport of FIB to a specific beach site can change over time.
Land use changes, degradation of or improvements to infrastructure, new near-site sources, and
other conditions can shift the underlying processes driving FIB densities. Even without obvious
changes in the watershed or near-site geography, model performance should be monitored
continually for signs of degradation over time.

5.8 IMPLEMENTING YOUR STATISTICAL MODEL
How are the outputs from predictive tools used in decision making? Model developers and beach
managers, who consider protecting public health their top priority, continue to creatively address
that important question.

Model outputs can be estimated FIB densities, a probability that the water quality standard will
be exceeded, or a daily notification status that is to be posted. Chapter 4 provides an overview of
several beach models in different stages of use. In the case of the Nowcast model developed by
the USGS in Ohio, a probability threshold is set for decision making (e.g., at 25 percent
probability of exceeding the water quality standard, close the beach). Such probability thresholds
can be adjusted and refined to maximize the number of correct predictions or to minimize false
negative or false positive outcomes. At those Ohio sites, water quality sampling continued during
model development so that decision accuracy could be tested. Given the low threshold
(25 percent rather than 50 percent or higher), an emphasis was clearly placed on minimizing
false negatives so that public health would be protected. The process of model development,
implementation, validation, monitoring, and refinement applies not only to multivariable linear
regression but also to other types of models, and it provides the groundwork for implementing
any predictive tool.
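The source does not state how the Ohio Nowcast computes its exceedance probability. One possible way to turn a regression prediction into such a probability is sketched below. It assumes normally distributed residuals on the log10 scale with a known residual standard error and an illustrative 235 CFU/100 mL standard, and it then applies a notification threshold such as the 25 percent example above. The function names and numbers are hypothetical.

```python
import math

# Assumed single-sample standard (CFU/100 mL), for illustration only.
STANDARD = 235.0


def prob_exceed(pred_log10, resid_se, standard=STANDARD):
    """P(observed log10 density > log10 standard), assuming normal residuals on the log10 scale."""
    z = (math.log10(standard) - pred_log10) / resid_se
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))  # 1 - standard normal CDF at z


def notify(pred_log10, resid_se, threshold=0.25, standard=STANDARD):
    """Post a notification when the exceedance probability reaches the chosen threshold."""
    return prob_exceed(pred_log10, resid_se, standard) >= threshold


# Prediction 0.25 log units below the standard, residual standard error 0.5 (hypothetical).
pred = math.log10(STANDARD) - 0.25
print(round(prob_exceed(pred, 0.5), 3), notify(pred, 0.5), notify(pred, 0.5, threshold=0.5))
```

Lowering the threshold from 0.5 to 0.25 posts a notification for this prediction, which trades more false positives for fewer false negatives.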

5.9 ADDITIONAL RESOURCES
Procedures for developing a statistical model are outlined in the USGS document, Procedures
for Developing Models to Predict Exceedances of Recreational Water Quality Standards at
Coastal Beaches (Francy and Darner 2006). Donna Francy and others with the USGS in Ohio
have published work regarding the Nowcast system of predicting and forecasting, as have
Richard Whitman and others with the USGS in Indiana, regarding their work in Indiana and
Illinois. Olyphant and Pfister (2005) also published work on the SwimCast model in Lake
County, Illinois. Those are all good sources of information. Another good source on developing
a statistical model is Chapter 9—Nowcasting recreational marine environments (Boehm et al.
2007)—in Statistical Framework for Recreational Water Quality Criteria and Monitoring
(Wymer 2007). That chapter discusses predictive variables that should be considered in marine
environments, and statistical methods other than multivariable linear regression models (e.g.,
Partial Least Squares Regression) that should be considered. Language conventions used in this
report might differ slightly from other publications. Those resources will provide more technical
and detailed information than is covered in this report.

6 Developing Rain Threshold Levels and a Rain Notification Protocol

Many beach managers have noticed a relationship between the concentration of FIB at a beach
and the amount of rain received in nearby areas. That relationship can be quantified as an amount
or intensity of rainfall (a threshold level) that is likely to cause exceedances of water quality
standards at a beach, and the length of time over which the standards will be exceeded. Rain
threshold levels can be used as the basis for a beach notification.

Beach managers can also develop a series of questions, or a decision tree, considering factors
other than rainfall, to guide beach notifications. Such evaluations use water quality sampling,
rainfall data, and other environmental factors that could influence the FIB levels, such as
proximity to pollution sources, wind direction, visual observations, or other information specific
to the region or beach. In this document, that process is referred to as developing a notification
protocol.

Exceedance of a predetermined rain threshold level might be one piece of data that is considered
in a notification protocol, or it might be the only piece of information considered in a notification
protocol.

Guidelines developed by the beach manager should produce consistent, repeatable decisions that
can be tested to determine whether a beach should be placed under a notification after a rainfall
event. The guidelines might also recommend how long the notification should persist and when
follow-up sampling for FIB should occur. Some examples of places that have notification
protocols are given in Section 4.2.

The process of developing a rain threshold and notification protocol has three steps:

1. Collecting data for rain threshold levels and notification protocols
2. Developing a rain threshold level
3. Developing a beach notification protocol

6.1 COLLECTING DATA FOR RAIN THRESHOLD LEVELS AND NOTIFICATION PROTOCOLS
To develop rain threshold levels and notification protocols, a large amount of site-specific data
on rainfall amounts and FIB sampling results is needed. The investigator must either install a
local rain gauge or select a rainfall station that adequately represents local conditions affecting
the beach. That technique, at its foundation, assumes that the magnitude and duration of rainfall
determine surface water runoff and, thus, FIB loadings to the beach. Data from several rain
gauges can be compared before choosing the one whose data best relate to on-site FIB measures.

Key rainfall characteristics when developing rain threshold levels include the following:

• Amount of rainfall
• Storm duration

• Intervening periods expressed in dry days

• Lag time between a recorded rainfall event and the response at the receiving beach

• The seasons or times of year when the beach receives the most use

Meteorological stations commonly collect and report daily or hourly data. Hourly data are
preferred, especially for small- to medium-sized watersheds. Additional factors that can be
incorporated into the decision protocol include river or lake stage, tide, and current information.

When initiating development of a rain threshold level, it is important to understand wastewater
and stormwater infrastructure affecting a beach. Some good questions to ask are as follows:

• Is the stormwater collection system combined with the sanitary wastewater collection system
and routed to a WWTP? If so, what is the level of treatment?

• What is the capacity of the WWTP?

• How often is that capacity exceeded and what amount of rainfall causes that exceedance?

If an amount of rainfall produces a stormwater volume that exceeds the capacity of the WWTP,
parts of the treatment process can be bypassed. In watersheds where stormwater is not routed
through a WWTP, stream surges can be evident after even small rain events, transporting
contamination accumulated on the land surface since the last rain. Flow from a combination of
such infrastructure scenarios can contribute to FIB loadings at a beach. It is important to also
determine if the impacts of rainfall and loadings have a strong seasonal component and, if so,
why that might be the case.

FIB data supporting the development of rain threshold levels are generated from water column
densities obtained from ambient or targeted monitoring programs. FIB densities can be used in
the analyses as direct observations or summarized as geometric means. Transforming FIB
observations before developing regression models or exceedance analyses can allow direct
comparison to state water quality standards for recreational uses. A rain threshold level can be
developed for one or several FIB. Fecal coliforms, E. coli, and enterococci are common
indicators used in those models.

For relatively small watersheds, it is common to use a single rainfall station selected to be
representative of storm conditions experienced by the upstream drainage area. The investigator
selecting a representative rainfall station takes into consideration its location in the watershed
and its ability to capture the dominant rainfall events (magnitude and duration) that could
generate relatively high storm runoff volumes and transport FIB loadings to the beach. For
example, in Delaware, the Department of Natural Resources and Environmental Control selected
a rainfall station because of its central location in the watershed and strong statistical correlation
with observed FIB densities at the beach site of interest (Delaware Department of Natural
Resources and Environmental Control 1998).

When dealing with nonpoint-source-dominated systems, antecedent rainfall conditions can be
very significant factors in explaining the relationship between rainfall and FIB densities. Kuntz
(1998) found higher FIB densities during periods of low rainfall or near-drought conditions than
during seasons of normal rainfall. However, Ackerman and Weisberg (2003) found that an
antecedent dry period in Southern California had a minimal effect on FIB levels, given the same
storm intensity. Storm event duration might also be a key factor in explaining rainfall-water
quality relationships. For a watershed in Delaware, an examination of the relationship between
cumulative rainfall over two different durations (24 and 72 hours) and FIB densities shows that
the 24-hour cumulative rainfall data yield a statistically stronger relationship than the 72-hour
cumulative rainfall data (Delaware Department of Natural Resources and Environmental Control
1997).
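A comparison like the Delaware 24-hour versus 72-hour analysis can be sketched as follows. Cumulative rainfall is summed over each window before every sample, and each window's totals are correlated with log10 FIB densities. The sketch assumes an hourly rainfall series indexed by hour and windows that end just before the sampling hour. The series, sample times, and function names are hypothetical.

```python
import math


def cumulative_rain(hourly_rain, sample_hour, window_hours):
    """Total rainfall in the window_hours hours before sample_hour (sample hour excluded)."""
    start = max(0, sample_hour - window_hours)
    return sum(hourly_rain[start:sample_hour])


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def compare_windows(hourly_rain, samples, windows=(24, 72)):
    """Correlation of log10 FIB with cumulative rainfall for each candidate window.

    samples: (sample_hour, log10_fib) pairs, with sample_hour indexing hourly_rain.
    """
    log_fib = [v for _, v in samples]
    return {
        w: pearson_r([cumulative_rain(hourly_rain, h, w) for h, _ in samples], log_fib)
        for w in windows
    }


# Hypothetical hourly rainfall (inches) with four storms, and five samples.
hourly = [0.0] * 200
hourly[30], hourly[90], hourly[140], hourly[160] = 2.0, 1.0, 0.5, 3.0
samples = [(40, 2.0), (100, 1.5), (150, 1.25), (170, 2.5), (199, 1.0)]
print(compare_windows(hourly, samples))
```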

6.2 DEVELOPING A RAIN THRESHOLD LEVEL

6.2.1 Frequency of Exceedance Analysis
Frequency of exceedance analysis is a rainfall-based method used to develop rain threshold
levels (also called rainfall-based alert curves). A rain threshold level is the smallest amount of
rainfall likely to result in an exceedance of the water quality standard. Observing or forecasting
that amount of rainfall would trigger a beach notification. The method can be applied only where
historical rainfall data and corresponding FIB data exist. After a relationship between rainfall
amounts and FIB densities has been established, the next step is developing guidelines or a
decision protocol for a beach notification.

Analyzing rainfall data by storm events and identifying a representative data set yields storm
characteristics to consider in developing a rain threshold level (e.g., station location, storm
duration, intensity, antecedent conditions). Once a representative data set has been obtained,
divide the total amount of rainfall over a certain period into segments that range from no rainfall
to an upper limit representative of the rainfall record, type of storms, and season. For each
rainfall volume category, compare the observed FIB measurements with the water quality standard.
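The rainfall-category comparison can be sketched as a table of exceedance fractions per category. The sketch then picks the lowest category at which a chosen fraction of samples exceeds the standard. The category limits, the 50 percent default, the 235 CFU/100 mL standard, the data, and the function names are all assumptions for illustration. The source does not prescribe a particular fraction, and the rule assumes exceedances become more frequent as rainfall increases.

```python
# Assumed single-sample standard (CFU/100 mL), for illustration only.
STANDARD = 235.0


def exceedance_by_rain_bin(events, bin_edges, standard=STANDARD):
    """Fraction of FIB samples exceeding the standard within each rainfall category.

    events: (rainfall_inches, fib_density) pairs; bin_edges: ascending category limits.
    """
    table = []
    for lo, hi in zip(bin_edges, bin_edges[1:]):
        in_bin = [fib for rain, fib in events if lo <= rain < hi]
        n_exceed = sum(fib > standard for fib in in_bin)
        fraction = n_exceed / len(in_bin) if in_bin else None
        table.append(((lo, hi), len(in_bin), fraction))
    return table


def rain_threshold(table, exceed_fraction=0.5):
    """Lower limit of the first rainfall category whose exceedance fraction reaches exceed_fraction."""
    for (lo, _hi), _n, fraction in table:
        if fraction is not None and fraction >= exceed_fraction:
            return lo
    return None


# Hypothetical storm events: (24-hour rainfall in inches, E. coli in CFU/100 mL).
events = [(0.0, 50), (0.05, 120), (0.2, 300), (0.3, 100), (0.6, 400), (0.8, 900), (1.5, 1200)]
table = exceedance_by_rain_bin(events, [0.0, 0.1, 0.5, 1.0, 10.0])
for row in table:
    print(row)
print("rain threshold (inches):", rain_threshold(table))
```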

As with any model, the rain threshold level should be validated by testing predictions at beach
locations of concern. Those validation exercises will aid in selecting the most appropriate rain
threshold level.

6.2.2 Regression Modeling
Another way to develop a rain threshold level is using a simple linear regression (a single
independent variable) relating FIB levels to rainfall amounts. Rainfall-based regression models
require relatively large monitoring data sets of both rainfall and FIB densities. The basics of that
process are described in Chapter 5.
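With a simple linear regression of log10 FIB density on rainfall, a rain threshold level can be read off by solving for the rainfall at which the fitted line reaches the log10 of the standard. The sketch below assumes a 235 CFU/100 mL standard and hypothetical data, and its function names are illustrative. At that rainfall the predicted density equals the standard, so, assuming symmetric residuals, exceedance is roughly a 50/50 outcome. A more protective threshold would be set lower.

```python
import math

# Assumed single-sample standard (CFU/100 mL), for illustration only.
STANDARD = 235.0


def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1


def rain_threshold_from_regression(rain, fib, standard=STANDARD):
    """Rainfall at which the fitted log10 FIB density reaches log10(standard)."""
    b0, b1 = fit_line(rain, [math.log10(v) for v in fib])
    if b1 <= 0:
        return None  # no positive rainfall effect, so the line defines no threshold
    return (math.log10(standard) - b0) / b1


# Hypothetical paired observations: rainfall (inches) and E. coli (CFU/100 mL).
rain = [0.0, 1.0, 2.0, 3.0]
fib = [10, 100, 1000, 10000]
print(round(rain_threshold_from_regression(rain, fib), 3))
```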

6.3 DEVELOPING A NOTIFICATION PROTOCOL
The rain threshold level is used along with other site-specific information to develop the
notification protocol for a beach. Examples of ancillary information to consider include
observational data (e.g., indications of a WWTP bypass), communications with WWTP managers,
wind direction, tidal phase, river stage, longshore current direction, and season. A decision
protocol will typically specify the conditions requiring a beach advisory or warning, a closure, or
increased water testing.
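A notification protocol can be written down as an explicit, repeatable decision function, which makes it easy to test against past events. The rules below are purely hypothetical and illustrate the form only: the 1.0-inch rain threshold, the half-threshold rule for onshore wind, and the treatment of a reported WWTP bypass are not recommendations from this document. A real protocol would use the site's own threshold and ancillary factors.

```python
from dataclasses import dataclass


@dataclass
class Conditions:
    rain_24h_in: float         # 24-hour rainfall, inches
    wwtp_bypass_reported: bool  # e.g., from communications with WWTP managers
    onshore_wind: bool          # wind direction pushing water toward the beach


def notification_decision(c, rain_threshold_in=1.0):
    """Return 'closure', 'advisory', 'increased sampling', or 'no action' (hypothetical rules)."""
    if c.wwtp_bypass_reported:
        return "closure"
    if c.rain_24h_in >= rain_threshold_in:
        return "advisory"
    if c.rain_24h_in >= 0.5 * rain_threshold_in and c.onshore_wind:
        return "increased sampling"
    return "no action"


print(notification_decision(Conditions(rain_24h_in=0.6, wwtp_bypass_reported=False, onshore_wind=True)))
```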

Because of the seasonality of recreational activities and rainstorm characteristics, rainfall
threshold levels and notification protocols can be developed for targeted seasons. Developing
predictive tools for various seasons can significantly enhance the predictive capability of the
tool. Milwaukee developed beach closure rules on the basis of an analysis of fecal coliform and
E. coli densities collected daily (Monday-Friday) during the June-September season (City of
Milwaukee Health Department 1998). Another example of this is Ohio's model for beaches on
Lake Erie. One model is applied for early summer, and another model has been found to be most
effective for late summer (Francy 2009).

Consider whether the decision protocol is still in the information collection phase or in the
implementation phase. In the information collection phase, decision protocol performance
should be tested frequently and the protocol adjusted as needed. In the implementation phase,
performance testing does not need to be as frequent unless conditions have changed significantly.
Once in the implementation phase, establish a schedule for testing and reevaluating the
decision protocol and for recording performance results.

7 Common Challenges and Obstacles

A review of predictive tools for beach notifications reveals different challenges for each beach.
Several beach managers have expressed concern that predictive statistical models are
cost-prohibitive because they require a commitment of resources (data collection, software,
expertise) and there is no guarantee that a useful predictive tool will be produced. Efforts to
develop statistical models in some locations have been abandoned. Table 7-1 compiles the
challenges and problems reported with predictive tools.

Table 7-1. Issues concerning statistical predictive models and tools

Model setup
• Difficulty in achieving good calibration, degree of accuracy, or correct predictions needed
• Determining necessary inputs
• Establishing nearby rain gauges
• Intensive sampling to explore and establish statistical correlations
• Additional sampling required for validation period
• Necessary monitoring equipment
• Computer programs for analysis (statistics software package)
• Long period for setup
• Unknown outcome of model accuracy before necessary funding
• Water quality standard exceedances inconsistent or sporadic, not enough data for the times
  when FIB levels are high

Model application
• Monitoring equipment for inputs requires maintenance
• Challenges placing equipment in secure and meaningful location
• Accuracy of prediction and accuracy of analytical method
• Staff is still needed to take samples
• Prediction accuracy varies depending on environmental (weather) conditions
• Model recalibration is necessary for changes in infrastructure and land development

Administrative concerns
• Knowledgeable staff are needed to understand and run the model
• Expertise is needed for model development and maintenance
• Equipment, sampling, staff costs
• Public confidence
• Staff time used in sampling procedures
• Communication and storage of data, if data loggers are used

Collective experience suggests that statistical modeling will improve when paired with a better
understanding of the relationships between weather conditions and bacteria residence time, source
concentration, and water flow directions and hydrology at the beach and in the contributing
watershed. Managers of the most successful models discussed in Chapter 4 have documented and
understood pollutant sources and how they are manifested at their beaches (e.g., Valley Creek at
Port Washington Beach, stormwater outfalls in New York City). A good understanding of the
important sources and relevant fate and transport mechanisms can greatly improve model
prediction accuracy, especially if that knowledge is used to direct data collection efforts. In some
cases, developing a statistical model leads to identification of control measures or correction of
infrastructure problems. Even though complex physical, biological, and chemical processes are
being represented by a relatively simple statistical relationship, model accuracy can be improved
if the important environmental processes are considered, leading to the monitoring and collection
of useful explanatory variables. Rain threshold and river stage models have proven track records
because clear and direct relationships exist between measurable inputs (e.g., rain, turbidity) and
measured FIB densities.

8 Beyond Statistical Modeling
The theme of Volumes I and II of this report is using statistical models and other predictive tools
to produce timely estimates of water quality at beaches.

Different types of statistical models are used successfully at many beaches. Development of
deterministic models has also advanced. Deterministic models apply algorithms that represent
natural processes, such as sediment resuspension, as those processes are currently understood.
Deterministic models typically require specialized expertise to implement and would likely be
challenging for local beach managers or public health agencies to set up, run, and bring to the
accuracy and reliability needed for protecting human health.

An important distinction between statistical models and deterministic models is that in a
statistical model, the mechanism linking water quality to a predictive variable does not have to be
understood. In deterministic models, the system being modeled must be at least partially
understood, because the models work by applying algorithms that reflect natural processes.
Combining the predictive performance of statistical models with the ability of deterministic
models to account for complex, changing circumstances can improve the use of statistical models
for making accurate and timely predictions of high bacteria levels or water quality standard
exceedances.

Researchers are making advances in predictive tool development throughout the United States.
The advances are in the areas of data collection and telemetry, data sources, selecting
independent variables, quantifying natural processes, statistical methods, computing
technologies, and understanding hydrological influences. Components of models are being
advanced and fine-tuned at such a rapid pace that it is difficult to capture all developments in this
chapter.

8.1 FORECAST MODELING
The prospect of forecasting beach water quality days in advance would take predictive modeling a
step further. Incorporating forecasting technologies into beach water quality decisions would
further increase the utility of predictive modeling to the beach-going public.

Weather forecasting is a core activity of NOAA and is one that is itself based on the use of many
models, including deterministic predictive meteorological models. NOAA forecasts include
predictions for many of the variables known to directly or indirectly influence water quality at
beaches (cloud cover, precipitation, and wind speed and direction). Using available forecasts for
current speed and direction, water and air temperature, wave height, precipitation, and stream or
river stage, combined with knowledge from other models, it is hoped that beach water quality will
soon be forecast in the Great Lakes. NOAA is collaborating with USGS, EPA, and state agencies to
develop a system for water quality forecasts at any swimming beach. The system will involve
weather predictions from NWS, hydrodynamic predictions from the Great Lakes Environmental
Research Laboratory, and site-specific factors for individual beaches (Schwab and Bedford
1994). For more details on the system, visit
http://www.glerl.noaa.gov/res/Centers/HumanHealth/nearshore.html.

8.2 REGIONAL MODELING
In general, statistical models and rain threshold level determinations are beach-specific. It would
be beneficial if the techniques could apply to a larger geographic area such as a shoreline of
several miles or more. On the southern shore of Lake Michigan, NOAA's regional model is
being combined with locally collected beach information to enhance predictive capabilities on a
regional level.


        8.2.1 Great Lakes Finite Element Nested Models
In the mid-1990s, NOAA developed a 5-kilometer-scale finite element model of all five Great
Lakes. The model was calibrated to yield real-time predictions of three-dimensional water
particle velocity; the three-dimensional temperature field; the water level distribution;
wind-wave height, length, period, and direction; and the resuspension, transport, and deposition
of bottom sediments on the basis of wave and current conditions. Inputs to the model were
provided by a satellite feed of NOAA weather data (Schwab and Bedford 1994).

NOAA, USGS, Ohio State University, and EPA researchers have applied and calibrated 100-meter
grid models nested within NOAA's 5-kilometer-scale finite element model of Lake Michigan at
three locations. The nested models resolve the effects of stream discharge plumes on nearby
beaches in greater detail. Water quality at Grand Haven State Park and other beaches near the
mouth of the Grand River is overwhelmingly influenced by the direction of the plume from that
major river, which drains a watershed of more than 5,000 square miles. In that setting, the
nested 100-meter grid provides real-time updates on the trajectory of the discharge plume four
times a day (Schwab and Bedford 1994).
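One common way to turn modeled currents into a plume trajectory is Lagrangian particle tracking. Parcels released at the river mouth are stepped forward through the predicted current field. The sketch below is a simplified illustration and not the NOAA nested-grid model. It uses forward-Euler steps, ignores dispersion, and treats the hypothetical `velocity_at` function as a stand-in for gridded model output.

```python
from typing import Callable, List, Tuple

# velocity_at(x_m, y_m, t_s) -> (u, v) in m/s (east, north); in practice this
# would interpolate currents from a hydrodynamic model grid (assumption).
VelocityField = Callable[[float, float, float], Tuple[float, float]]


def track_particle(x0: float, y0: float, velocity_at: VelocityField,
                   dt_s: float, n_steps: int) -> List[Tuple[float, float]]:
    """Forward-Euler trajectory of one water parcel released at (x0, y0)."""
    path = [(x0, y0)]
    x, y = x0, y0
    for step in range(n_steps):
        u, v = velocity_at(x, y, step * dt_s)
        x += u * dt_s
        y += v * dt_s
        path.append((x, y))
    return path


def plume_positions(release_points, velocity_at: VelocityField,
                    dt_s: float, n_steps: int):
    """Final positions of several parcels released near the river mouth."""
    return [track_particle(x, y, velocity_at, dt_s, n_steps)[-1]
            for x, y in release_points]
```

For example, a uniform northward current of 0.1 m/s moves a parcel about 2,160 m over the six hours between four-times-daily updates (36 steps of 600 s).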

Several statistical models for southern Lake Michigan beaches use outputs from this model, such as
wave information and current direction, as predictor variables. They show how a deterministic
model can supply input data to a statistical model.
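As an illustration of turning deterministic model output into a regression predictor, the sketch below converts a modeled current speed and direction into a signed alongshore component. The compass convention (direction the current flows toward) and the shoreline bearing are assumptions. The statistical models mentioned above may encode current direction differently.

```python
import math


def alongshore_component(speed_m_s: float, direction_toward_deg: float,
                         shore_bearing_deg: float) -> float:
    """Signed alongshore current speed for use as a regression predictor.

    direction_toward_deg: compass direction the current flows toward
        (0 = north, 90 = east); an assumed convention.
    shore_bearing_deg: compass bearing of the shoreline in the direction
        taken as positive alongshore (a hypothetical site parameter).
    """
    angle = math.radians(direction_toward_deg - shore_bearing_deg)
    return speed_m_s * math.cos(angle)
```

A single signed value keeps the regression linear in the predictor. Raw compass angles would not work this way, because 359 degrees and 1 degree are nearly the same direction.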


8.3 HYDRODYNAMIC AND FATE AND TRANSPORT MODELING
Nevers and Boehm (2010) provide an overview of using deterministic models to predict FIB
densities in surface waters. They underscore the value of fate and transport models for
improving the understanding of poorly defined mechanisms that cause observed variations in
water quality. Such models would be most useful at sites whose characteristics produce weak or
counterintuitive empirical relationships, or where statistical models (including decision tree
models) fail to achieve satisfactory results. The authors also provide guidance on parameterizing
many mechanisms that apply to fate and transport modeling of fecal indicators, including
advective flux, dispersion, inactivation, growth, predation, adsorption/desorption,
deposition/resuspension, and loading. Their work also discusses empirical statistical models such
as those described in Chapter 5 of this report.
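The mechanisms listed above can be assembled into a mass balance. The sketch below is a deliberately simplified, well-mixed "box" version for a nearshore zone and is not the formulation of Nevers and Boehm (2010). Loading and sediment resuspension add FIB. Flushing by advective exchange, first-order inactivation, and settling remove them. Growth, predation, and adsorption/desorption are omitted, and all parameter names and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical well-mixed nearshore control volume."""
    volume_m3: float
    area_m2: float           # bottom area, used for settling and resuspension
    flushing_m3_s: float     # advective exchange with the open water
    k_inact_per_s: float     # first-order inactivation rate
    settling_m_s: float      # deposition velocity of particle-associated FIB
    resusp_cfu_m2_s: float   # resuspension flux from bottom sediments


def dC_dt(box: Box, c: float, load_cfu_s: float) -> float:
    """Rate of change of FIB concentration (CFU/m3 per second)."""
    gains = load_cfu_s + box.resusp_cfu_m2_s * box.area_m2
    losses = (box.flushing_m3_s + box.k_inact_per_s * box.volume_m3
              + box.settling_m_s * box.area_m2) * c
    return (gains - losses) / box.volume_m3


def steady_state(box: Box, load_cfu_s: float) -> float:
    """Concentration at which gains balance losses (dC/dt = 0)."""
    gains = load_cfu_s + box.resusp_cfu_m2_s * box.area_m2
    return gains / (box.flushing_m3_s + box.k_inact_per_s * box.volume_m3
                    + box.settling_m_s * box.area_m2)


def simulate(box: Box, c0: float, load_cfu_s: float,
             dt_s: float, n_steps: int) -> float:
    """Explicit Euler integration from initial concentration c0."""
    c = c0
    for _ in range(n_steps):
        c += dC_dt(box, c, load_cfu_s) * dt_s
    return c
```

Even at this level of simplification, the model shows how each mechanism enters as a separate term. Each term can then be parameterized or tested against observations on its own.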


         8.3.1  Hobie Beach
At Hobie Beach, near Miami, Florida, researchers used a deterministic model to investigate
contamination problems for which no apparent cause had been identified. Elevated indicator
bacteria densities had been a recurring problem at the site, even though no specific point source
of pollution had been identified. Researchers at the University of Miami (Zhu 2009) applied a
predictive numerical model of water column indicator densities to this nonpoint source
recreational marine beach. In the first of two phases, the model was used to investigate
microbial processes and source functions, and to relate observed indicator densities to identified
sources in historical data sets. The microbial process model combined hydrodynamic equations
with advection-diffusion equations for transport and mixing. It calculated the reduction in
culturable indicator bacteria densities as a first-order decay process based solely on sunlight
inactivation.
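Transport by advection and diffusion combined with first-order, sunlight-driven decay can be illustrated in one dimension. The governing equation is ∂C/∂t + u ∂C/∂x = D ∂²C/∂x² - k(t)C. This is a minimal explicit finite-difference sketch and not the Zhu (2009) model. The half-sine daylight curve, the maximum decay rate, the grid, and the boundary conditions are all assumptions.

```python
import math
from typing import List


def sunlight_k(t_s: float, k_max_per_s: float) -> float:
    """Hypothetical decay rate (1/s): half-sine daylight 06:00-18:00, zero at night.

    t_s is seconds since midnight.
    """
    hours = (t_s / 3600.0) % 24.0
    light = max(0.0, math.sin(math.pi * (hours - 6.0) / 12.0))
    return k_max_per_s * light


def step(c: List[float], u: float, d: float, k: float,
         dx: float, dt: float) -> List[float]:
    """One explicit step of dC/dt = -u dC/dx + D d2C/dx2 - k C, with u >= 0."""
    new = c[:]
    for i in range(1, len(c) - 1):
        advection = -u * (c[i] - c[i - 1]) / dx                 # first-order upwind
        diffusion = d * (c[i + 1] - 2.0 * c[i] + c[i - 1]) / dx ** 2
        new[i] = c[i] + dt * (advection + diffusion - k * c[i])  # first-order decay
    # new[0] keeps the upstream (inflow) concentration fixed.
    new[-1] = new[-2]                                            # zero-gradient outflow
    return new


def simulate(c0: List[float], u: float, d: float, dx: float, dt: float,
             n_steps: int, t0_s: float = 0.0,
             k_max_per_s: float = 1e-4) -> List[float]:
    """Run the explicit scheme, checking a sufficient stability condition first."""
    if u < 0:
        raise ValueError("this sketch assumes u >= 0 (flow in the +x direction)")
    if u * dt / dx + 2.0 * d * dt / dx ** 2 > 1.0 or k_max_per_s * dt >= 1.0:
        raise ValueError("time step too large for the explicit scheme")
    c = list(c0)
    for n in range(n_steps):
        c = step(c, u, d, sunlight_k(t0_s + n * dt, k_max_per_s), dx, dt)
    return c
```

Because decay is tied to sunlight, the same source input produces higher densities when it arrives at night than at midday. A single constant decay rate cannot capture that contrast.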

Quantitative estimates of enterococci loadings from human shedding and animal (avian and canine)
fecal inputs were integrated with beach-use data to estimate the strength and timing of each
source. Model simulations illustrate the transient concentration plumes associated with heavy
bather use and animal fecal input events. Model outputs also include current vectors at varying
tidal stages and flow conditions. The model outputs showed that the high FIB densities at the
beach came from a nearby dog beach. That finding has clear implications for managing the beach
so that water quality problems can be avoided. Deterministic models such as this might be
applicable in a variety of settings without requiring an extensive data history (Zhu 2009).
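The loading step described above is essentially bookkeeping. Hourly counts from beach-use observations are multiplied by per-event source strengths, giving a time series that a transport model can release into the water. The source categories and per-event enterococci values below are hypothetical placeholders, not the estimates used by Zhu (2009).

```python
from typing import Dict, List

# Hypothetical enterococci released per event (CFU); placeholders only.
SHEDDING_CFU: Dict[str, float] = {
    "bather_entry": 1.0e6,
    "dog_fecal_event": 1.0e9,
    "bird_fecal_event": 1.0e7,
}


def hourly_loads(counts_by_hour: List[Dict[str, int]]) -> List[float]:
    """Total enterococci loading (CFU per hour) from hourly beach-use counts."""
    loads = []
    for counts in counts_by_hour:
        total = 0.0
        for source, n in counts.items():
            total += n * SHEDDING_CFU[source]
        loads.append(total)
    return loads


def dominant_source(counts: Dict[str, int]) -> str:
    """Source category contributing the largest load in one hour."""
    return max(counts, key=lambda s: counts[s] * SHEDDING_CFU[s])
```

Keeping loads separated by source is what allows model output to be traced back to a particular input, such as an adjacent dog beach.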


         8.3.2  Other
The Nevers and Boehm (2010) report also addresses the use of deterministic model functions in
quantitative microbial risk assessments (QMRAs). QMRAs stack deterministic models, combining
fate and transport models with models of the infectivity of pathogens for which epidemiological
data are not available.
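The stacking can be sketched in a few lines. A fate and transport model supplies a pathogen concentration. An assumed ingestion volume converts that concentration to a dose, and a dose-response function converts the dose to a probability of infection. The exponential and approximate beta-Poisson forms below are standard QMRA dose-response models. The concentration, ingestion volume, and parameter values used with them are hypothetical and are not drawn from Nevers and Boehm (2010).

```python
import math


def dose(conc_per_L: float, ingested_mL: float) -> float:
    """Organisms ingested: concentration (per liter) times volume swallowed."""
    return conc_per_L * ingested_mL / 1000.0


def p_infection_exponential(d: float, r: float) -> float:
    """Exponential dose-response: P = 1 - exp(-r * d)."""
    return -math.expm1(-r * d)  # expm1 keeps precision for small r * d


def p_infection_beta_poisson(d: float, alpha: float, beta: float) -> float:
    """Approximate beta-Poisson dose-response: P = 1 - (1 + d / beta) ** -alpha."""
    return 1.0 - (1.0 + d / beta) ** (-alpha)


def risk_from_fate_model(predicted_conc_per_L: float, ingested_mL: float,
                         r: float) -> float:
    """Chain a fate-and-transport prediction to an infection probability."""
    return p_infection_exponential(dose(predicted_conc_per_L, ingested_mL), r)
```

Each stage can be swapped independently. For example, a different transport model or a beta-Poisson response can be substituted without changing the rest of the chain.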


8.4 NEW USE OF EXISTING DATA AND INNOVATIVE ANALYSIS


         8.4.1  Use of Hydrography (NHDPlus Network) and Land Use Data
In a project conducted in 2007-2009, Research Triangle Institute (RTI 2007), under contract to
EPA, integrated outputs calculated from NHDPlus geospatial data with a statistical model to
relate watershed and waterbody characteristics to possible sources of pathogens that can affect
water quality at a beach. RTI developed a multivariable linear regression model that relates
characteristics of a river flowing into the ocean near a beach to water quality at that beach,
which is influenced by the river's outfall. The model incorporates explanatory variables from
NHDPlus (for example, the elevation-based catchment for each flowline in the stream network and
cumulative drainage area characteristics), calculated time of travel from potential watershed
sources, and publicly available meteorological, marine, beach, and flow data.
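The time-of-travel variable can be sketched as a walk down the flow network. Starting at the flowline that holds a potential source, each reach's length divided by its mean velocity is added until the river mouth is reached. The dictionary layout, the reach identifiers, and the choice to count full reach lengths are assumptions for illustration. This is not RTI's procedure.

```python
from typing import Dict, Optional


def time_of_travel_h(start: str, downstream: Dict[str, Optional[str]],
                     length_km: Dict[str, float],
                     velocity_m_s: Dict[str, float], outlet: str) -> float:
    """Hours for water to travel from the top of `start` to the end of `outlet`.

    downstream maps each reach to the next reach downstream (None at the mouth).
    Full reach lengths are counted, i.e., the source is assumed to enter at the
    upstream end of its reach (a simplifying assumption).
    """
    total_s = 0.0
    reach: Optional[str] = start
    visited = set()
    while reach is not None:
        if reach in visited:
            raise ValueError("loop detected in flow network")
        visited.add(reach)
        total_s += length_km[reach] * 1000.0 / velocity_m_s[reach]
        if reach == outlet:
            return total_s / 3600.0
        reach = downstream.get(reach)
    raise ValueError("outlet is not downstream of the start reach")
```

For example, a source three reaches above the mouth, with 3.6, 7.2, and 1.8 km reaches flowing at 1.0, 0.5, and 0.5 m/s, gives a travel time of 6 hours. That value can then be used to lag rainfall or source indicators in the regression.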

The project employs an SAS-based regression system to perform multivariable linear regression
analysis. The system was applied and tested at several sites for which adequate data sets were
available:

• Santa Barbara, California (coastal)
• Sunset Beach, Oregon (coastal)
• Little Calumet Watershed, Indiana (freshwater)
• Huntington Beach, Ohio (freshwater; also a Virtual Beach site)
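Multivariable linear regression of the kind RTI ran in SAS can be sketched with ordinary least squares. The sketch below is a plain-Python stand-in and not RTI's SAS system. It solves the normal equations by Gaussian elimination. In practice the response is usually a log-transformed FIB density, and a statistics package with a numerically stabler solver (for example, QR decomposition) would be used.

```python
from typing import List, Sequence


def ols_fit(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Coefficients [b0, b1, ..., bp] for y = b0 + b1*x1 + ... + bp*xp."""
    rows = [[1.0] + list(r) for r in X]          # prepend intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    a = [xtx[i] + [xty[i]] for i in range(p)]    # augmented matrix [X'X | X'y]
    for col in range(p):                         # elimination, partial pivoting
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, p):
            factor = a[r][col] / a[col][col]
            for c in range(col, p + 1):
                a[r][c] -= factor * a[col][c]
    b = [0.0] * p                                # back substitution
    for i in range(p - 1, -1, -1):
        b[i] = (a[i][p] - sum(a[i][j] * b[j] for j in range(i + 1, p))) / a[i][i]
    return b


def ols_predict(b: Sequence[float], x: Sequence[float]) -> float:
    """Predicted response for one row of predictor values."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))
```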

RTI's approach achieved results comparable to those obtained by applying statistical relationships
to other available data with other statistical regression software. As with most predictive
models, the best results in terms of correct predictions were obtained with data from intensively
monitored sites over short periods. Including time-of-travel information from potential watershed
sources did not improve model performance as measured by R² in most settings. Even so, the
models performed well in terms of false negatives, false positives, and correct predictions while
using publicly available data.
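The two kinds of performance measures mentioned above can be computed as follows. R² summarizes how closely predicted densities track observed ones. Counts of correct predictions, false positives, and false negatives measure how well the model supports the actual advisory decision. This is a generic sketch. The exceedance threshold passed in is an assumption and should be the applicable water quality standard.

```python
from typing import Dict, Sequence


def classification_counts(observed: Sequence[float], predicted: Sequence[float],
                          threshold: float) -> Dict[str, int]:
    """Tally advisory decisions implied by predictions against observations."""
    counts = {"correct": 0, "false_positive": 0, "false_negative": 0}
    for obs, pred in zip(observed, predicted):
        obs_exceed = obs > threshold
        pred_exceed = pred > threshold
        if obs_exceed == pred_exceed:
            counts["correct"] += 1
        elif pred_exceed:
            counts["false_positive"] += 1   # advisory issued, water was acceptable
        else:
            counts["false_negative"] += 1   # no advisory, water exceeded standard
    return counts


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

A model can have a modest R² yet few false negatives. This is why the two measures are reported separately.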

Routine FIB monitoring data were made available to the model developers; no data were collected
specifically to develop the predictive model. RTI's approach is a good example of broadening the
range of potential predictive variables for regression models by using publicly available data
from NHDPlus. Using watershed land use data and calculated river hydrology characteristics as
empirical model inputs is an innovative approach.
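NHDPlus provides cumulative drainage-area attributes, but the idea behind such watershed-level inputs can be sketched directly. Each local catchment's land use is added to its own flowline and to every flowline downstream of it. The network, the single "urban" land-use class, and the areas are hypothetical, and the network is assumed to be free of loops.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple


def accumulate_upstream(downstream: Dict[str, Optional[str]],
                        local_area_km2: Dict[str, float],
                        local_urban_km2: Dict[str, float]
                        ) -> Dict[str, Tuple[float, float]]:
    """Total drainage area and urban fraction for every flowline.

    downstream maps each reach to the next reach downstream (None at the mouth).
    """
    total_area: Dict[str, float] = defaultdict(float)
    total_urban: Dict[str, float] = defaultdict(float)
    for reach in local_area_km2:
        node: Optional[str] = reach
        while node is not None:              # push this catchment downstream
            total_area[node] += local_area_km2[reach]
            total_urban[node] += local_urban_km2[reach]
            node = downstream.get(node)
    return {r: (total_area[r], total_urban[r] / total_area[r]) for r in total_area}
```

The resulting fraction at the river mouth (or at the flowline nearest the beach) is the kind of value that could enter a regression as a candidate predictor.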

8.5 NEURAL NETWORKS AND GENETIC ALGORITHMS
An artificial neural network (ANN) is a software construct that partially mimics the workings of
a biological neural network. ANNs are often applied as nonlinear statistical data modeling tools,
either to model relationships between inputs and outputs or to find patterns in data. The
technique is often useful when the relationships between inputs and outputs are complex and not
clearly understood. An ANN learns these relationships from example data by means of a learning
algorithm.
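A learning algorithm in its simplest form can be sketched with a network of one hidden layer trained by backpropagation and gradient descent. This is a toy illustration, not the networks used in the studies cited in this section. The layer size, learning rate, and two scaled inputs (imagined here as rainfall and wave height) are assumptions.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyANN:
    """One hidden layer of sigmoid units and one sigmoid output unit."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Return hidden activations and the output (probability of exceedance)."""
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, out

    def loss(self, data) -> float:
        """Mean cross-entropy between outputs and 0/1 targets."""
        eps = 1e-12
        total = 0.0
        for x, t in data:
            _, p = self.forward(x)
            total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
        return total / len(data)

    def train(self, data, epochs: int = 2000, lr: float = 0.5) -> None:
        """Backpropagation with per-sample gradient descent."""
        for _ in range(epochs):
            for x, t in data:
                h, out = self.forward(x)
                d_out = out - t                  # gradient at output pre-activation
                for j, hj in enumerate(h):
                    # Error passed back to hidden unit j (uses w2[j] before update).
                    d_h = d_out * self.w2[j] * hj * (1.0 - hj)
                    self.w2[j] -= lr * d_out * hj
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * d_h * xi
                    self.b1[j] -= lr * d_h
                self.b2 -= lr * d_out
```

The network is never told the rule that generated the targets. It infers the relationship from examples, which is the property that makes ANNs attractive when mechanisms are poorly understood and risky when training data are unbalanced.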

ANNs have been used in a handful of studies to predict pathogens and pathogen indicators in
recreational beach and watershed surface waters. He and He (2008) successfully used ANNs to
predict FIB at marine recreational beaches receiving watershed baseflow and stormwater runoff
in Southern California. Mas and Ahlfeld (2007) observed that ANNs performed better than
ordinary least squares and binary logistic regression methods for predicting surface water fecal
coliform concentrations in a mixed land use watershed. Jin and Englande (2006) used ANNs and
logistic regression to predict swimmability for a brackish waterbody. They observed that ANNs
performed better than logistic regression, especially when conditions were not safe for swimming.
However, the ANNs correctly forecast not-safe-to-swim conditions only 53.9 percent of the time.
Jin and Englande note that this poor performance was probably because most of the data used to
develop the model were collected during safe-to-swim conditions.

Genetic algorithms are search methods inspired by evolutionary biology. They are based on
mechanisms that nature uses in the evolution of species, such as inheritance, selection,
crossover, and mutation. Although genetic algorithms cannot be used directly to model beach
pathogens, they can be used to evaluate and select models developed by other modeling
techniques. For example, as the number of independent variables increases, the number of
possible models to be evaluated by multivariable linear regression increases geometrically
(doubling with each added variable), which degrades computer performance. In such cases,
genetic algorithms can assist multivariable linear regression. Rather than evaluating every
possible model, a genetic algorithm intelligently selects which models to evaluate, so fewer
models need evaluation. The objective of a genetic algorithm is to find a near-best model rather
than to guarantee finding the best model. The newer version of Virtual Beach uses genetic
algorithms to assist multivariable linear regression by reducing the number of models to be
evaluated when the number of independent variables is large.
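With p candidate variables there are 2^p - 1 possible non-empty models, so exhaustive search quickly becomes impractical: 20 variables already give more than a million. The sketch below shows a bare-bones genetic algorithm for this selection problem. Each candidate model is a list of include/exclude flags. Tournament selection, one-point crossover, and bit-flip mutation produce each new generation, and the best model found so far is always carried forward. This is not the Virtual Beach implementation. The fitness function is supplied by the caller (in practice a criterion such as adjusted R² or AIC from a fitted regression), and the population size, generation count, and rates are assumptions.

```python
import random
from typing import Callable, Dict, List, Tuple

Mask = List[bool]  # mask[i] is True when candidate variable i is in the model


def genetic_select(n_vars: int, fitness: Callable[[Mask], float],
                   pop_size: int = 30, generations: int = 60,
                   crossover_rate: float = 0.9, mutation_rate: float = 0.05,
                   seed: int = 0) -> Tuple[Mask, int]:
    """Return the best variable subset found and the number of models scored.

    Higher fitness is better. Requires n_vars >= 2.
    """
    rng = random.Random(seed)
    cache: Dict[Tuple[bool, ...], float] = {}

    def score(mask: Mask) -> float:
        key = tuple(mask)
        if key not in cache:              # each distinct model is scored once
            cache[key] = fitness(mask)
        return cache[key]

    pop = [[rng.random() < 0.5 for _ in range(n_vars)] for _ in range(pop_size)]
    best = max(pop, key=score)

    def tournament() -> Mask:
        a, b = rng.sample(pop, 2)         # selection: fitter of two survives
        return a if score(a) >= score(b) else b

    for _ in range(generations):
        children = [best[:]]              # elitism: never lose the best model
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_vars)       # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Mutation: occasionally add or drop a variable.
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=score)
    return best, len(cache)
```

With the defaults, at most 30 + 60 × 29 = 1,770 distinct models are scored regardless of p. Exhaustive search needs 2^p - 1, so the saving grows rapidly as variables are added.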

9 References
Ackerman, D., and S.B. Weisberg. 2003. Relationship between rainfall and beach bacterial
concentrations on Santa Monica Bay beaches. Journal of Water and Health 1(2):85-89.

Akaike, H. 1974. A new look at the statistical model identification. IEEE Transactions on
Automatic Control 19(6):716-723.

Bae, H.K., B.H. Olson, K.L. Hsu, and S. Sorooshian. 2010. Classification and Regression Tree
(CART) analysis for indicator bacteria concentration prediction for a California coastal
area. Water Science and Technology 61(2):545-553.

Boehm, A.B., S.B. Grant, J.H. Kim, S.L. Mowbray, C.D. McGee, C.D. Clark, D.M. Foley, and
D.E. Wellman. 2002. Decadal and shorter period variability and surf zone water quality
at Huntington Beach, California. Environmental Science and Technology 36(18):3885-
3892.

Boehm, A.B., G.G. Shellenbarger, and A. Paytan. 2004. Groundwater discharge: Potential
association with fecal indicator bacteria in the surf zone. Environmental Science and
Technology 38(13):3558-3566.

Boehm, A.B., and S.B. Weisberg. 2005. Tidal forcing of enterococci at marine recreational beaches
at fortnightly and semidiurnal frequencies. Environmental Science and Technology
39(15):5575-5583.

Boehm, A.B., R.L. Whitman, M.B. Nevers, D. Hou, and S.B. Weisberg. 2007. Nowcasting
recreational water quality. In Statistical Framework for Recreational Water Quality
Criteria and Monitoring, ed. L. Wymer. Wiley-Interscience, Chichester, West Sussex,
England.

Box, G., and D. Cox. 1964. An analysis of transformations. Journal of the Royal Statistical
Society, Series B 26(2):211-252.

Burton, G.A., D. Gunnison, and G.R. Lanza. 1987. Survival of pathogenic bacteria in various
freshwater sediments. Applied and Environmental Microbiology 53(4):633-638.

Christensen, V.G., A.C. Ziegler, and X. Jian. 2001. Continuous turbidity monitoring and
regression analysis to estimate total suspended solids and fecal coliform bacteria loads in
real time. In Proceedings of the Seventh Federal Interagency Sedimentation Conference,
March 25-29, 2001, Reno, NV. Subcommittee on Sedimentation, vol. 1, pp. III-94 to III-101.

City of Port Washington, Wisconsin. 2007. Annual Sanitary Survey Report: Comprehensive
sanitary survey of Upper Lake Park Beach, Port Washington, WI 53074. Ozaukee
County Planning, Resources and Land Management, Ozaukee County Public Health
Department.
. Accessed
March 2009.

Delaware Department of Natural Resources and Environmental Control. 1997. Swimming
(Primary Body Contact) Water Quality Attainability for Priority Watersheds in Sussex
County. Delaware Department of Natural Resources and Environmental Control, Dover,
DE.

Delaware Department of Natural Resources and Environmental Control. 1998. 1998
Recreational Water Guidelines, Swimming. Delaware Department of Natural Resources
and Environmental Control, Dover, DE.

FISRWG (Federal Interagency Stream Restoration Working Group). 1998. Stream Corridor
Restoration: Principles, Processes, and Practices. GPO Item No. 0120-A; SuDocs No. A
57.6/2:EN 3/PT.653. ISBN-0-934213-59-3. Federal Interagency Stream Restoration
Working Group, United States.

Francy, D. 2006. Status and Future of Predictive Modeling at Beaches. U.S. Geological Survey,
Ohio Water Science Center. Presented at the National Beach Conference, October 11-13,
2006. . Accessed
March 2009.

Francy, D. 2009. Use of predictive models and rapid methods to nowcast bacteria levels at
coastal beaches. Aquatic Ecosystem Health and Management 12(2):177-182.

Francy, D.S., and R.A. Darner. 2006. Procedures for Developing Models to Predict Exceedances of
Recreational Water Quality Standards at Coastal Beaches. Techniques and Methods 6-B5.
U.S. Geological Survey, Reston, VA.

Francy, D.S., and R.A. Darner. 2007. Nowcasting Beach Advisories at Ohio Lake Erie Beaches.
Open-File Report 2007-1427. U.S. Geological Survey, Reston, VA.

Francy, D.S., R.A. Darner, and E. Bertke. 2006. Models for Predicting Recreational Water Quality at
Lake Erie Beaches. SIR 2006-5192. U.S. Geological Survey, Reston, VA, and U.S.
Department of the Interior, Washington, DC.

Frick, W.E., Z. Ge, and R.G. Zepp. 2008. Nowcasting and forecasting concentrations of
biological contaminants at beaches: A feasibility and case study. Environmental Science
and Technology 42(13):4818-4824.

Haak, S.K., L.R. Fogarty, and C. Wright. 2002. Environmental Influences on Numbers of E.
coli and Enterococci in Beach Water, Grand Traverse Bay, Michigan. 2002 Great Lakes
Beach Conference, Chicago, IL.

He, L., and Z. He. 2008. Water quality prediction of marine recreational beaches receiving
watershed baseflow and stormwater runoff in Southern California, USA. Water Research
42(10-11):2563-2573.

Jin, G., and A.J. Englande Jr. 2006. Prediction of swimmability in a brackish water body.
Management of Environmental Quality: An International Journal 17(2):197-208.

Johnson, E.A. 2007. Predictive modeling of Enterococcus concentrations at South Carolina
Tier 1 Beaches. M.S. thesis. University of South Carolina, Department of Environmental
Health Sciences, Columbia, SC.

Kuntz, I.E. 1998. Non-point Sources of Bacteria at Beaches. City of Stamford Health
       Department, Stamford, CT.

Kuntz, J., and R. Murray. 2009. Predictability of swimming prohibitions by observational
       parameters: A proactive public health policy, Stamford, Connecticut. Journal of
       Environmental Health 72(1):17-22.

Lipp, E.K., R. Kurz, R. Vincent, C. Rodriguez-Palacios, S.R. Farrah, and J.B. Rose. 2001. The
       effects of seasonal variability and weather on microbial fecal pollution and enteric
       pathogens in a subtropical estuary. Estuaries 24(2):266-276.

Maimone, M., C.S. Crockett, and W.E. Cesanek. 2007. PhillyRiverCast: A real-time bacteria
       forecasting model and Web application for the Schuylkill River. Journal of Water
       Resources Planning and Management 133(6):542-549.

Maine State Planning Office. 2004. Maine's Health Coastal Beaches Risk Assessment Matrix.
       Guidelines for Beach Closures (3/04). Maine State Planning Office, Maine Coastal
       Program, Augusta, ME. .
       Accessed March 2009.

Mallows, C.L. 1973. Some comments on Cp. Technometrics 15:661-675.

Mas, D.M.L., and D.P. Ahlfeld. 2007. Comparing artificial neural networks and regression
       models for predicting faecal coliform concentrations. Hydrological Sciences Journal/
       Journal des Sciences Hydrologiques 52(4):713-731.

McPhail, C.D., and R.T. Stidson. 2009. Bathing water signage and predictive water quality
       models in Scotland. Aquatic Ecosystem Health and Management 12(2):183-186.

McQuarrie, A., and C. Tsai. 1998. Regression and Time Series Model Selection. World
       Scientific, Singapore.

Morrison, A.M., K. Coughlin, J.P. Shine, B.A. Coull, and A.C. Rex. 2003. Receiver operating
       characteristic curve analysis of beach water quality indicator variables. Applied and
       Environmental Microbiology 69(11):6405-6411.

Nevers, M.B., and R.L. Whitman. 2005. Nowcast modeling of Escherichia coli concentrations at
       multiple urban beaches of southern Lake Michigan. Water Research 39(20):5250-5260.

Nevers, M.B., and A.B. Boehm. 2010. Modeling fate and transport of fecal bacteria in surface
       water. In The Fecal Bacteria, ed. M.J. Sadowsky and R.L. Whitman, pp. 165-188. ASM
       Press, Washington, DC.

Olyphant, G.A., and R.L. Whitman. 2004. Elements of a predictive model for determining beach
       closures on a real time basis: The case of 63rd Street Beach Chicago. Environmental
       Monitoring and Assessment 98(1-3):175-190.

Olyphant, G.A., and M. Pfister. 2005. SwimCast: Its physical and statistical basis. In
       Proceedings of the Joint Conference—Lake Michigan: State of the Lake and the Great
       Lakes Beach Association, Green Bay, WI, November 2-3, 2005.

RTI (Research Triangle Institute). 2007. Regression Modeling with NHDPlus to Address Beach
Advisories for Pathogens. Report under EPA Contract Number 68-C-02-110. Research
Triangle Institute, Research Triangle Park, NC.

Schiff, K.C., J. Morton, and S.B. Weisberg. 2003. Retrospective evaluation of shoreline water
quality along Santa Monica Bay beaches. Marine Environmental Research 56(1-2):245-253.

Schueler, T. 1999. Microbes and Urban Watersheds: Concentrations, Sources, and Pathways.
Watershed Protection Techniques 3(1):554-565.

Schwab, D.J., and K.W. Bedford. 1994. The Great Lakes Forecasting System. In Coastal and
Estuarine Studies: Coastal Ocean Prediction, ed. C.N.K. Mooers. American Geophysical
Union, Washington, DC.

Schwarz, G. 1978. Estimating the dimension of a model. Annals of Statistics 6(2):461-464.

Sherer, B.M., R. Miner, J.A. Moore, and J.C. Buckhouse. 1992. Indicator bacteria survival in
stream sediments. Journal of Environmental Quality 21(4):591-595.

Shibata, T., H.M. Solo-Gabriele, L.E. Fleming, and S. Elmir. 2004. Monitoring marine
recreational water quality using multiple microbial indicators in an urban tropical
environment. Water Research 38(13):3119-3131.

Thomann, R.V., and J.A. Mueller. 1987. Principles of Surface Water Quality Modeling and
Control. Harper & Row, New York.

Thupaki, P., M.S. Phanikumar, D. Beletsky, D.J. Schwab, M.B. Nevers, and R.L. Whitman.
2010. Budget analysis of Escherichia coli at a southern Lake Michigan beach based on
three-dimensional transport modeling. Environmental Science and Technology
44(3):1010-1016.

USEPA (U.S. Environmental Protection Agency). 1999. Review of Potential Modeling Tools and
Approaches to Support the BEACH Program. EPA-823-R-99-002. U.S. Environmental
Protection Agency, Office of Science and Technology, Washington, DC.

USEPA (U.S. Environmental Protection Agency). 2002. National Beach Guidance and Required
Performance Criteria for Grants. EPA-823-B-02-004. U.S. Environmental Protection
Agency, Office of Science and Technology, Washington, DC.

USEPA (U.S. Environmental Protection Agency). 2005. The EMPACT Beaches Project: Results
from a Study on Microbiological Monitoring in Recreational Waters. EPA-600-R-04-
023. U.S. Environmental Protection Agency, Washington, DC.

USEPA (U.S. Environmental Protection Agency). 2007. Report of the Experts' Scientific
Workshop on Critical Needs for the Development of New or Revised Recreational Water
Quality Criteria. EPA-823-R-07-006. U.S. Environmental Protection Agency, Office of
Water and Office of Research and Development, Washington, DC.

Whitman, R.L., M. Becker-Nevers, and P.J. Gerovac. 1999. Interaction of ambient conditions and
fecal coliform bacteria in Southern Lake Michigan beach waters: Monitoring program
implications. Natural Areas Journal 19:166-171.

Whitman, R.L. 2008. About Project S.A.F.E. U.S. Geological Survey, Great Lakes Science
       Center.
       . Accessed March 2009.

Wolfe, K.L., W.E. Frick, R.G. Zepp, M. Molina, M.J. Cyterski, and R.S. Parmar. 2008. Virtual
       Beach Manager Toolset. Presented at the 7th Annual Surface Water Monitoring and
       Standards Meeting (SWiMS), Chicago, IL, March 18-20, 2008.

Wymer, L. 2007. Statistical Framework for Recreational Water Quality Criteria and
       Monitoring. Wiley-Interscience, Chichester, West Sussex, England.

Zhu, X. 2009. Modeling Microbial Water Quality at a Non-Point Source Subtropical Beach.
       Master's thesis, University of Miami, Coral Gables, FL.