United States Environmental Protection Agency

Six Key Steps for Developing and Using Predictive Tools at Your Beach

U.S. Environmental Protection Agency
Office of Water
March 8, 2016
820-R-16-001

Foreword

This non-technical guide was developed by the U.S. Environmental Protection Agency (EPA) to provide local government officials, beach managers, health department personnel, and others with basic information on how to develop predictive tools in the context of an overall beach monitoring and notification program. Five case studies are presented toward the end of this document as examples of how predictive tools have been developed and used at actual beaches. Readers seeking more in-depth design and implementation information are encouraged to review the sources used to develop this document as well as various online resources provided by EPA and other agencies.

Front cover photos, starting upper left and moving clockwise:
• Little boy enjoying the waves, ©istockphoto.com.
• Water quality notification sign, USEPA.
• qPCR analysis, City of Racine Health Department.
• Lake Superior, Michigan, upper peninsula, ©istockphoto.com.
• Miami Beach, ©istockphoto.com.

Case study images are courtesy of the Chicago Parks District, Charles River Watershed Association, Milwaukee Department of City Development, Milwaukee County Parks, University of Wisconsin Zilber School of Public Health, and City of Racine Health Department.

Contents
Foreword
Acronym List
Introduction
  The Time-Lag Problem
  Predictive Tools
  Developing a Predictive Model
Step 1: Evaluate the Appropriateness of a FIB Predictive Tool
  Introduction to Step 1
  Is There a Need for a Predictive Tool?
  Are Beach Characteristics Compatible with Predictive Tools?
  Are There Sufficient Historical Data to Develop and Test a Predictive Tool?
  Are There Funding and Other Resources Available to Develop, Operate, Maintain, and Update a Predictive Tool?
    Personnel and Technical Experts
    Data Collection
    Monitoring Equipment and Supplies
    Modeling and Statistical Software
    Model Evaluation Over Time
Step 2: Identify Variables and Collect Data
  Introduction to Step 2
  Key Attributes of Variable Data Sets
  FIB Density
  Independent Variables
    Variables Relating to Bacteria Movement through the Drainage Area
    Variables Relating to Bacteria Movement through the Receiving Water
    Variables Relating to the Fate of Bacteria in the Swimming Area
    Variables Relating to Activities and Conditions at the Beach
Step 3: Perform Exploratory Data Analysis
  Introduction to Step 3
  Virtual Beach Software
  Data Management
  Characterize the FIB and Independent Variable Data Sets
    Box Plots
    Outliers
    Comparing Data Distributions among Variable Subsets
  Examine the Relationship between FIB and Independent Variables
    Scatterplots
    Variable Transformation
    Creation of New Variables
    Correlation among Independent Variables
    Analysis of Variance for Categorical Variables
Step 4: Develop and Test a Predictive Model
  Introduction to Step 4
  Data Sets
  Reducing Errors
  Virtual Beach
  Models that Do Not Meet Performance Goals
  Exceedance Probability Threshold
Step 5: Integrate the Predictive Tool into a Beach Monitoring and Notification Program
  Introduction to Step 5
  Frequency of Running the Model
  Notification Protocols
  Types of Beach Notifications
    Beach Advisories
    Beach Closings
    Preemptive Advisories
    Permanent Advisories
  Public Communication
    Public Education
    Public Outreach
  Other Uses for Predictive Models
Step 6: Evaluate the Predictive Tool over Time
  Introduction to Step 6
  Changes to the Fate and Transport of FIB
  Changes to Data Sources
  Changes to Your Beach Program
Bibliography
Case Studies

Figures
Figure 1. Using sampling and culture analysis to make a beach notification decision
Figure 2. Using predictive modeling to make a beach notification decision
Figure 3. Box plot attributes
Figure 4. Box plots of E. coli density sorted by wind direction
Figure 5. Comparison of E. coli density over a four-year period
Figure 6. Scatterplots of E. coli vs. rainfall without transformation (A) and with a log-transformation (B)
Figure 7. Plot of persistence model results of 2005 data (adapted from Francy and Darner 2006)
Figure 8. Plot of predictive model results of 2005 data (adapted from Francy and Darner 2006)
Figure 9. Plot of predictive model results of 2005 data expressed as exceedance probability threshold (adapted from Francy and Darner 2006)
Figure 10. Notification protocol for a beach program that uses sampling results and a predictive model to make notification decisions
Figure 11. Notification protocol for a beach that uses only model results to make notification decisions

Tables
Table 1. Beaufort Wind Scale
Table 2. Independent variables used in final statistical models from case studies

Acronym List
ANN Artificial Neural Network
ANOVA Analysis of Variance
CART Classification and Regression Tree
CESN Coastal Environmental Sensing Network
CFU Colony Forming Unit
CPD Chicago Parks District
CRWA Charles River Watershed Association
CSO Combined Sewer Overflow
DOY Day of the Year
EDA Exploratory Data Analysis
EMPACT Environmental Monitoring for Public Access and Community Tracking
EnDDaT Environmental Data Discovery and Transformation
EPA U.S. Environmental Protection Agency
FIB Fecal Indicator Bacteria
GBM Gradient Boosting Machine
GIS Geographic Information System
GLRI Great Lakes Restoration Initiative
MHD Milwaukee Health Department
MLR Multivariable Linear Regression
MPN Most Probable Number
NDBC National Data Buoy Center
NOAA National Oceanic and Atmospheric Administration
NWIS National Water Information System
NWS National Weather Service
OLS Ordinary Least Squares
PLS Partial Least Squares
QA Quality Assurance
QAPP Quality Assurance Project Plan
QC Quality Control
qPCR Quantitative Polymerase Chain Reaction
SCDHEC South Carolina Department of Health and Environmental Control
SSO Sanitary Sewer Overflow
TMDL Total Maximum Daily Load
USACE U.S. Army Corps of Engineers
USGS U.S. Geological Survey
UTC Coordinated Universal Time
UV Ultraviolet
UWM University of Wisconsin-Milwaukee
VB Virtual Beach
VB3 Virtual Beach Version 3
WDNR Wisconsin Department of Natural Resources
WWTP Wastewater Treatment Plant

Introduction

Even the most pristine waters contain a variety of microscopic organisms. Most of them are harmless, but a small portion can cause illness in humans, including gastroenteritis; eye, ear, and throat infections; hepatitis; and giardiasis. Generally, disease-causing (pathogenic) organisms encountered at swimming beaches originate from the feces of humans and warm-blooded animals and are carried into recreational waters by stormwater runoff. Monitoring directly for pathogens in recreational waters is currently impractical for a number of reasons, including the difficulty of identifying which pathogens are present, the need to filter large volumes of water to isolate enough organisms to measure, and the high cost of analytical methods.
Fortunately, some types of nonpathogenic fecal bacteria are transported along with disease-causing microbes. Known generically as "fecal indicator bacteria" (FIB), they exist in far greater numbers than pathogens and are easier to isolate and enumerate in the laboratory. Consequently, FIB can serve as markers for the potential presence of pathogens. EPA currently recommends two types of FIB for use in beach monitoring programs: enterococci and Escherichia coli (E. coli). Either type can be used at freshwater beaches, and enterococci are recommended for marine water. State beach programs issue a swimming advisory or close a beach (a beach notification) when densities of these bacteria exceed a beach notification threshold based on EPA's national criteria recommendation or on a site-specific water quality standard. Information on EPA's recommended water quality criteria is provided in the National Beach Guidance and Required Performance Criteria for Grants (the National Beach Guidance) at http://www.epa.gov/sites/production/files/2014-07/documents/beach-guidance-final-2014.pdf.

Key Resources on Predictive Tools
• Predictive Tools for Beach Notification. Volume I, Review and Technical Protocol (USEPA 2010a)
• Predictive Modeling at Beaches. Volume II, Predictive Tools for Beach Notification (USEPA 2010b)
• Developing and Implementing Predictive Models for Estimating Recreational Water Quality at Great Lakes Beaches (Francy et al. 2013a)
• Virtual Beach 3.0.4: User's Guide (Cyterski et al. 2013)
• Accessing Online Data for Building and Evaluating Real-Time Models to Predict Beach Water Quality (Mednick 2009)
• Report of the Experts Scientific Workshop on Critical Research Needs for the Development of New or Revised Recreational Water Quality Criteria (USEPA 2007)
• Beach Water Quality Decision Support System (Rockwell et al. 2013)

The Time-Lag Problem

At first glance, the process for determining when beach water is safe for swimming seems fairly straightforward. If laboratory results indicate FIB densities above the state water quality standard or other threshold value, a beach notification is issued. If FIB densities are below the threshold value, no action is taken (Figure 1).

8:00 a.m. Collect FIB sample
10:00 a.m. Deliver sample to lab for culture analysis
7:00 a.m. the next day: Receive sample results
7:00 a.m. the next day: Make beach notification decision
Figure 1. Using sampling and culture analysis to make a beach notification decision.

Underlying this beach notification system is the assumption that FIB densities do not change (i.e., they persist) between the time a water sample is taken and the time the laboratory results are known (usually 18 to 24 hours for culture methods, which are the methods most often used). At some beaches, this "persistence model" is valid, especially when natural or artificial barriers restrict water movement at the beach. At many open water beaches, however, studies have shown that FIB density can fluctuate significantly over relatively short periods of time. This variability can lead to undesirable scenarios such as the following:

Beach water is sampled on Monday. Results obtained on Tuesday indicate that FIB density was above the state standard, so the beach manager issues an advisory. On Wednesday, results of follow-up samples taken on Tuesday reveal that FIB density was back to normal and the water was actually safe for swimming (i.e., Monday's FIB levels did not persist into Tuesday). Consequently, Tuesday, a perfectly good beach day, was lost. Monday's swimmers, on the other hand, were exposed to high levels of FIB and potentially unhealthy levels of pathogens.
None of these consequences are good: (1) Monday's swimmers might have swum in contaminated water, (2) beachgoers might have lost recreational time on Tuesday, and (3) area businesses might have suffered economic losses due to the lack of customers.

The 18- to 24-hour time lag can be a problem.

Predictive Tools

The time-lag problem of culture analysis and the shortcomings of the persistence model have led to the development of tools that predict whether the applicable water quality standard has been or is likely to be exceeded, so that beach notifications can be issued in a more timely way. When integrated properly into a beach notification program, these tools can provide an early warning of potentially unsafe swimming conditions.

This guide presents an overview of how to develop a predictive tool for your beach program. It focuses mainly on implementation activities and issues rather than on technical details.

In most instances, the "tool" is actually a mathematical equation or "model" designed to produce one of two types of output: (1) a FIB density prediction, or (2) a probability prediction that expresses the chance that an applicable water quality standard or notification threshold will be exceeded (e.g., "There is a 60 percent chance that the standard or threshold will be exceeded."). Beach managers can use either output type to "trigger" a beach notification. Throughout this document, when "bacteria density" is mentioned as the model output, assume that it includes "exceedance probability" as an alternative form of output, unless indicated otherwise.

Figure 2 shows the timeline for a beach program using predictive modeling. The time required to make the beach notification decision is significantly shorter than in the scenario shown in Figure 1.

8:00 a.m. Collect model input variables (e.g., rainfall, turbidity)
8:30 a.m. Run predictive model
8:45 a.m. Interpret results and compare them to the water quality standard (WQS) or other notification threshold
9:00 a.m. Make beach notification decision
Figure 2. Using predictive modeling to make a beach notification decision.

In addition to improving the timeliness of beach notifications, predictive models can supplement an existing monitoring program, which can reduce sampling needs and improve the accuracy of identifying notification days. For example, if FIB sampling occurs only once or twice a week because of resource constraints, predictive models can provide information for timely notification on the other days.

Developing a Predictive Model

This document presents six basic steps that an interdisciplinary project team (the Beach Team) might take to analyze, develop, implement, and evaluate the success of a predictive model. Each step is discussed in a separate section.
• Step 1: Evaluate the Appropriateness of a Predictive Tool. This section outlines factors that your Beach Team should consider before proceeding with a modeling project. The team should assess the degree of risk to the public from swimming at the beach, confirm that the historical FIB data needed to develop the model exist, identify any beach conditions or attributes that are not compatible with FIB modeling, and evaluate whether sufficient resources are available locally to support model development, operation, and maintenance.
• Step 2: Identify Variables and Collect Data. This section introduces independent variables that influence the movement of bacteria from their sources, through the drainage system and receiving water, and into the swimming area of a beach. It offers insights into which independent variables might serve as the best candidates for modeling FIB at a beach.
• Step 3: Perform Exploratory Data Analysis.
Once a set of candidate independent variables is selected, the variables must be statistically evaluated to see how well they correlate with FIB densities. Results from exploratory data analysis further refine the list of candidate variables.
• Step 4: Develop and Test the Predictive Tool. Models can range from simple to complex. This section begins with a discussion of rainfall-based models that need only one independent variable to develop and run. The discussion continues with modeling using multiple variables and concludes with techniques for testing the model.
• Step 5: Integrate the Predictive Tool into a Beach Monitoring and Notification Program. Predictive tools are one component of an overall beach program. To successfully integrate a model into a beach monitoring program, your Beach Team should develop protocols for collecting input data, running the model, and using model results.
• Step 6: Evaluate the Predictive Tool over Time. To ensure that model output remains accurate and relevant as beach conditions change, your Beach Team should evaluate the model's accuracy at least annually.

This document concludes with a series of case studies that illustrate various ways that predictive models have been developed and implemented. The following case studies helped inform this guidance:
• The Grand Strand, South Carolina. The South Carolina Department of Health and Environmental Control (SCDHEC) developed a stormwater model to predict FIB densities at South Carolina state beaches. This case study highlights the limitations of monitoring equipment and the value of collaboration and technology.
• Charles River, Massachusetts. The Charles River Watershed Association (CRWA) worked with Tufts University to develop a statistical model to predict water quality in the Lower Charles River Basin. CRWA's experience highlights the importance of model simplicity and the availability of real-time data when resources are limited.
• Chicago, Illinois. The Chicago Parks District (CPD) developed a predictive model in 2011 with the assistance of the U.S. Geological Survey (USGS). CPD's experience emphasizes the need for comprehensive knowledge of the beach environment as well as adequate funding and technical resources to collect data and conduct statistical analyses.
• Racine, Wisconsin. The City of Racine and the Wisconsin Department of Natural Resources developed NOWCAST statistical models for Racine's two beaches using EPA's Virtual Beach (VB) software. Racine's experience illustrates the importance of a robust data set and the advantages of reinforcing a model with other beach monitoring components.
• South Shore Beach, Wisconsin. With assistance from the University of Indiana and USGS, the Milwaukee Health Department (MHD) developed a statistical model for three of its public beaches based on 24-hour rainfall data and the previous 24 hours' bacterial sampling data. MHD's experience shows that a model can be a good fit for a local public health department.

Step 1: Evaluate the Appropriateness of a FIB Predictive Tool

Introduction to Step 1

While a predictive tool might provide a substantial benefit to some beach programs, your Beach Team should first carefully consider the following questions to make sure that a predictive tool is right for your beach:
a. Is there a need for a predictive tool?
b. Are beach characteristics compatible with predictive tools?
c. Are there sufficient historical data to develop and test a predictive tool?
d. Are funding and personnel experienced with model development and maintenance available to develop, operate, maintain, and update a predictive tool?

Is There a Need for a Predictive Tool?
One of the first things your Beach Team should evaluate is whether a predictive tool is needed. Remember that the main purpose of a predictive tool is to predict whether the applicable water quality standard has been or is likely to be exceeded. This applies both during the time lag before culture results are available on a sampling day and on nonsampling days. Using time series analyses, EPA reports that bacteria levels at the beach can change over relatively short periods of time (USEPA 2010c). If FIB density at your beach is known to persist for 24 hours or more, however, predictions are less important, and traditional water sampling and laboratory analysis alone might adequately protect swimmer health. Other situations in which a predictive tool might not be needed include (1) beaches that are sampled daily using rapid methods, and (2) beaches that never, or hardly ever, exceed applicable recreational water quality standards.

Given several beaches to manage and limited budgets, your Beach Team will likely rank its beaches according to factors such as the potential risk to human health from pathogens and the level of beach use. These rankings (described further in Chapter 3 of EPA's 2014 National Beach Guidance) can also help identify the beaches that could benefit most from predictive models.

Are Beach Characteristics Compatible with Predictive Tools?

Beaches that make the best candidates for predictive tools are located in environmental settings that are themselves predictable. A good candidate beach operates under a fairly constant range of "normal" conditions that, when processed through a predictive tool, should yield a good estimate of FIB levels. The tool operates like an "if...then" statement: if a given set of these conditions occurs, then the tool returns a specific FIB density.
Importantly, most predictive tools are developed using historical data that, in effect, describe and define the norm in terms of both the conditions and the predicted value. Once the tool is in operation, it might not yield accurate results if it is presented with conditions outside the norm. In that case, the team might have to revisit which conditions are predictive of FIB density.

Beaches that might not be good candidates for predictive tools are usually those subject to a wide or frequently changing set of conditions and disturbances that affect FIB density, making "normal" difficult to define and characterize. These conditions might include frequent spills or illicit discharges, or periodic visits by large flocks of birds. Some open ocean beaches are not good candidates for modeling simply because of the sheer complexity of the meteorological conditions, tidal patterns, offshore currents, and other factors that affect them.

Are There Sufficient Historical Data to Develop and Test a Predictive Tool?

Access to a sufficient amount of historical FIB density data and corresponding data describing a variety of environmental conditions (i.e., independent variable data) is crucial for developing and testing predictive models. EPA recommends having at least 50 observations, but 100 or more is preferable (USEPA 2010b). Ideally, the observations should represent the range of conditions experienced at the beach and include data collected in normal seasons, drier-than-normal seasons, and wetter-than-normal seasons. This is rarely the case, but the closer you can get to this ideal, the more robust your model will be.

An important part of the model development process is testing the model. Francy et al. (2013a) recommend that you collect data for at least three seasons, then use two seasons' data as the training dataset and one season's data as the testing dataset.
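The season-based split described above can be expressed as a short Python sketch. It is illustrative only and is not an EPA or Virtual Beach procedure. The `Observation` fields, the helper names, and the choice to check the 50/100 observation guidance against the training set are all hypothetical. The sketch also flags a model input that falls outside the range seen in training, which is one simple way to notice the "conditions outside the norm" discussed earlier in this step.

```python
from dataclasses import dataclass

# Observation counts from the guidance above: at least 50, preferably 100 or more.
MIN_OBSERVATIONS = 50
PREFERRED_OBSERVATIONS = 100


@dataclass
class Observation:
    """One FIB sample paired with one independent variable (hypothetical fields)."""
    season: int             # beach season label (e.g., 1, 2, 3)
    rainfall_24h_mm: float  # example independent variable
    ecoli_cfu_100ml: float  # dependent variable: FIB density


def split_by_season(observations, test_season):
    """Hold out one whole season for testing; train on all other seasons."""
    seasons = {obs.season for obs in observations}
    if len(seasons) < 3:
        raise ValueError(f"need at least three seasons of data, found {len(seasons)}")
    if test_season not in seasons:
        raise ValueError(f"no observations for test season {test_season}")
    training = [obs for obs in observations if obs.season != test_season]
    testing = [obs for obs in observations if obs.season == test_season]
    return training, testing


def sample_size_status(dataset):
    """Compare a dataset's size against the 50/100 observation guidance."""
    n = len(dataset)
    if n < MIN_OBSERVATIONS:
        return "below minimum"
    if n < PREFERRED_OBSERVATIONS:
        return "meets minimum"
    return "meets preferred"


def outside_training_range(training, rainfall_mm):
    """True if an input value lies outside the range observed in training."""
    values = [obs.rainfall_24h_mm for obs in training]
    return not (min(values) <= rainfall_mm <= max(values))


# Illustrative, made-up records: 40 samples in each of three seasons.
records = [
    Observation(season=s, rainfall_24h_mm=float(i % 10), ecoli_cfu_100ml=100.0 + i)
    for s in (1, 2, 3)
    for i in range(40)
]
train, test = split_by_season(records, test_season=3)
print(len(train), len(test))                # 80 40
print(sample_size_status(train))            # meets minimum
print(outside_training_range(train, 25.0))  # True
```

Holding out an entire season, rather than randomly chosen days, tests the model against a period it has never seen. This more closely resembles how the model will be used once it is in operation.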
Checklist of Beach and Program Characteristics Compatible with Modeling
• The beach operates under a constant range of "normal" conditions.
• Exceedances of beach notification threshold values occur occasionally but are not a chronic problem.
• FIB densities change over relatively short periods of time (time-lag problem).
• A sufficient amount of historical FIB and independent variable data exists.
• Funding for personnel and technical experts is available.
• Monitoring equipment is available.
• Computer equipment and software are available.

A more complete discussion of independent variable data, along with tips on how to collect them, is provided in step 2. For preliminary purposes, however, your Beach Team should investigate FIB density data, rainfall data, and data on factors that affect water movement in and around the beach (i.e., wind and wave direction and magnitude). Water quality data such as turbidity and water temperature are also important at some beaches, as are data on near-shore sources of fecal pollution (e.g., birds). Sources of data include federal agencies (e.g., the National Weather Service (NWS) and USGS) as well as various state and local agencies. Beach sanitary surveys are a particularly valuable resource, especially if they are conducted daily (see http://www.epa.gov/beach-tech/beach-sanitary-surveys for sanitary surveys developed by EPA). Sanitary surveys provide site-specific data that correspond exactly to the time a FIB sample is collected.

If a minimum of three seasons' worth of historical data is not readily available, your Beach Team might need to collect more data before developing the model. Step 2 provides more information on data collection.

Are There Funding and Other Resources Available to Develop, Operate, Maintain, and Update a Predictive Tool?
The development of a predictive model is just the first phase of an overall predictive modeling program. Once the model is developed, there are a variety of costs associated with operating and maintaining it. Most local agencies responsible for beach programs (usually city or county public health departments) have limited staff time, technical expertise, and funding available for projects. Consequently, resources and costs must be carefully planned and budgeted. Major costs to consider include:
• Personnel and technical experts.
• Data collection.
• Monitoring equipment and supplies.
• Modeling and statistical software.
• Model evaluation over time.

Personnel and Technical Experts

Your Beach Team needs the right combination of staff to develop and implement a predictive tool. The most important staff are the following:
• Field staff, to conduct sampling and maintain equipment.
• Modeler/statistician, to analyze data and develop, validate, and refine the model.
• Beach manager, to integrate the model into your beach program and conduct public outreach.

It can be helpful to collaborate with others, such as universities, federal agencies, and state or local governments (see Collaboration with Others text box). They can be excellent resources, especially when the technical knowledge of a statistician is required.

Data Collection

In addition to gathering historical data, the team will need to continue collecting data from the same sources to run the model once it is implemented. Data collection is discussed in more detail in step 2. If the data source changes for any of the model's variables, or if significant alterations occur to the beach and surrounding area, the model will need to be recalibrated (see step 6).

Collaboration with Others

Partnerships have produced a number of successful modeling programs. For example, USGS has played a major role in modeling efforts in the Great Lakes region. USGS offers extensive resources, expertise, and comprehensive knowledge of watersheds (including beaches) and can provide in-depth statistical tools and statisticians to run them.

Local universities can be another highly valuable resource. Graduate students from ecology, biology, and environmental science and engineering departments might be available to assist with water quality monitoring, sampling, and even model development. Such a partnership can benefit both sides. Students have the opportunity to apply their research to a real-world scenario, and the beach program gains low-cost sampling and monitoring. In addition, universities often have their own monitoring equipment, laboratories, and even statistical software that can be shared.

Some beach programs have models that began as part of graduate theses and dissertations. For example, SCDHEC developed its model with the help of a graduate student at the University of South Carolina who used it as part of a master's thesis. CRWA's predictive model was also developed as part of a master's thesis by a student at Tufts University. These collaborative efforts proved highly advantageous, providing a wealth of knowledge and expertise as well as significant cost savings.

Monitoring Equipment and Supplies

Even when there is an abundance of data for your beach from external sources, using monitoring equipment such as data sondes, flow meters, and rain gauges might provide you with more accurate data. The main drawback of this equipment is that it can be expensive to purchase and maintain, especially when it must be placed in harsh environments and exposed to weather, waves, sand, and vandalism.
Sufficient funding and staff to maintain and repair equipment are necessary. As described in the case studies, both MHD and SCDHEC stopped using data sondes and rain gauges because of their high maintenance costs, but were able to develop successful models using other data sources.

Modeling and Statistical Software

Some models are simple enough to run in a basic Excel spreadsheet, with no additional software required. Statistical software can also be purchased, but it might carry licensing costs. EPA developed VB, a free model-building software program (described in more detail in steps 3 and 4) that enables beach managers and others to develop or update models using statistical techniques. The software is user-friendly; however, preparing data for input into the modeling software requires considerable time and expertise. Visit http://www.epa.gov/exposure-assessment-models/virtual-beach-vb for more information about VB. For information on how to manage your data set for use in VB, see step 3.

If the answers to the four questions asked in the introduction to this step are "Yes," you are in a good position to move forward with the development of a predictive model. You might instead determine that a predictive model is not needed or that your beach is not a good candidate for a modeling project. In that case, consider working with your public health officials to adjust the current monitoring program so that it focuses on times when conditions favor high FIB densities. If your answer to question c is "No," you might need to collect additional data to build a historical database for future model development. If your answer to question d is "No," consider exploring the potential of collaborating with others interested in model development (see "Collaboration with Others" text box).
If that option is not available, there are other ways to increase the level of public health protection at beaches, including the use of sanitary surveys and preemptive advisories.

Model Evaluation Over Time

Once your model has been developed, it must be maintained to keep it running properly and performing as expected. This process is covered in step 6.

Step 2: Identify Variables and Collect Data

Introduction to Step 2

After your Beach Team has determined that a predictive model is appropriate for a beach, it can proceed to Step 2. In this step, the team identifies candidate independent variables for use in model building and collects a set of high-quality historical data for those variables and for FIB density. See the Independent Variables section of this step for a list of independent variables.

To predict an exceedance of a water quality standard, your Beach Team must first identify environmental conditions that likely affect the levels of bacteria at the beach. In the context of predictive modeling, those conditions are the "independent variables." In this step, you are trying to identify and collect data for the independent variables that exhibit the strongest statistical relationship with the dependent variable, FIB density.

It is important to keep in mind, however, that a strong statistical association should not be interpreted as reflecting actual causal mechanisms for an observed elevation of FIB densities. The association is based only on the correlation of past observations of independent variables with FIB density. When such associations are evident, further scientific investigation can inform beach managers of the nature of the association. It can also improve their understanding of future occurrences and of how much weight to give them.

Key Attributes of Variable Data Sets

For model building, the variable data sets should possess the following basic characteristics.
• An adequate amount of data to develop (training data set) and validate (testing data set) the model. EPA recommends at least 50 observations for model development, but 100 or more are preferred. There are several ways of partitioning the available data into training and testing data sets. One common way is to collect three seasons of data, then designate two seasons as the training data set and the third as the testing data set. You can construct a model with fewer observations, but its accuracy, and therefore its performance, might suffer.
• High-quality data, including quality assurance documentation. Ideally, a quality assurance plan exists that describes data collection methods, protocols, and procedures. Depending on the variable, the plan might include laboratory methods; field sampling protocols, including metadata (e.g., sampling time and sample depth); and data processing procedures.
• Easily collected or obtained data. Because predictive models are often run daily, all input data must be obtainable quickly. Automatic samplers with data transmission capabilities, and data that are easily downloaded from government agencies' websites (e.g., NWS and USGS), are good data sources. The "ease of collection" requirement will likely eliminate many otherwise promising candidate variables. In some cases, a more easily collected surrogate variable might convey similar information; in other cases, your Beach Team will have to abandon the variable and look elsewhere. In general, locally collected data are preferred over data obtained from external sources. If external data are used, it is preferable that their collection methods be subject to good QA/QC (e.g., USGS or NWS data).
• Consistent procedures for collecting data for pre- and post-model development.
Independent variable data have two functions: (1) they are used to develop the predictive model, and (2) they are used as input variables to run the model. When you use historical data to build a model, you assume that the methods used to collect and report those data will remain in place for collecting future model input data. Consistency is key: you cannot mix and match data sources for the same variable without re-validating the model.
• Temporally relevant independent variable and FIB data. FIB sampling at swimming beaches usually occurs early in the morning. Some independent variable data are collected at the time of sampling. Other data relate to conditions that occurred before the sample collection time, such as cumulative rainfall over the previous 12 hours. As you determine your independent variables, keep them temporally relevant to the sample time. If the sample was collected at 8:00 a.m., make sure that your independent variables are also based on 8:00 a.m. or on an earlier time chosen from your knowledge of stream effects and runoff.

Quality Assurance and Quality Control
EPA's National Beach Guidance (2014) provides important information and recommendations on primary data collection to ensure that all observations, samples, and measurements are properly and consistently collected and processed. Specifically, the Agency recommends developing a quality assurance project plan (QAPP) to ensure that collected data are complete, accurate, and suitable for the intended purpose. Essentially, the QAPP serves as a blueprint for collection activities and for quality assurance (QA) and quality control (QC) procedures. The plan should also include detailed descriptions of standard operating procedures and staff training requirements.

Some Basic Bacteria Facts
FIB are very small, immobile single-celled organisms. They have to be physically transported from point to point by some mechanism.
Usually this mechanism is moving water.
Life expectancy of individual cells outside their natural environment is usually short, around 2-5 days, and many stressors can shorten it further.
FIB can survive and even multiply for some time in sediments and algal mats, from which they can easily be stirred up and resuspended in the overlying water.

FIB Density
It is important that FIB density measurements be taken at a consistent location, depth, and time. Sample collection, handling procedures, and analytical methods must be consistent as well. This section lists EPA recommendations concerning FIB sampling and analysis. They are presented not to encourage immediate changes to data collection procedures (which would disrupt consistency), but as background for better interpreting FIB density data that have already been collected.
• Sample Location. Sample sites should be located where the greatest recreational use occurs. Features that might directly affect the movement of FIB to and from the beach, such as outfalls and jetties, should also be taken into account.
• Sample Depth. Samples should generally be taken in approximately knee- to waist-deep water unless that depth poses a safety risk to the sampler (e.g., powerful waves). The sample should be drawn 0.5-1 foot below the surface. Samples taken in shallower water might not accurately represent ambient FIB density because bacteria are resuspended from the sediments.
• Sample Time. Early-morning samples are generally considered the best for beach monitoring programs because FIB densities are usually highest at that time. The sampling time should be consistent from day to day because FIB density can change fairly quickly in response to increasing sunlight intensity, temperature, and other environmental conditions.
EPA's National Beach Guidance includes a detailed discussion of event-scale, diurnal, and tidal variability (USEPA 2014).
• Sample Frequency. For the purpose of developing a predictive model, the more samples the better. Most beach programs sample high-priority beaches at least once a week during the swimming season. In general, a model becomes more robust as more FIB data are collected and matched with independent variable data. The Report of the Experts Scientific Workshop on Critical Research Needs for the Development of New or Revised Recreational Water Quality Criteria recommends collecting data four or five times a week, covering a variety of sampling events, to capture temporal variability (USEPA 2007). If resource constraints limit FIB sampling to once or twice a week, predictive models can provide information for timely notifications on the other days.
• Sample Processing and Analysis. Your Beach Team should consult the state for the proper procedures and QA/QC requirements, including holding times, for collecting, handling, and analyzing water samples. EPA has approved a number of analytical methods for culture analyses of recreational waters (40 CFR part 136). In addition, EPA has validated quantitative polymerase chain reaction (qPCR) methods for measuring water quality at beaches (see http://www.epa.gov/cwa-methods/other-clean-water-act-test-methods-microbiological). Standard Methods for the Examination of Water and Wastewater (APHA 1998) is also a source of valuable information.

FIB density data are usually reported as colony forming units (CFU) or most probable number (MPN). CFU is a measurement based on a direct count of bacterial colonies grown on Petri plates with substrate media after water samples are passed through membrane filters. MPN tests involve multiple tubes that are allowed to ferment over time.
Probability formulas are applied to the number of tubes that produce a positive reaction, and an FIB density estimate is calculated. EPA has approved methods for both types of analysis, and either is acceptable for modeling. The key, however, is consistency. If the type of analysis has changed for your beach, construct (or reconstruct) your model using only data generated by the current analytical method.

Sources of Bacteria
Human Sources
Some older cities have combined sewer systems that convey both sanitary wastewater and stormwater in one piping system. During periods of significant rainfall, the capacity of the combined sewer might be exceeded. When this happens, the excess mixture of sanitary wastewater and stormwater is discharged at combined sewer overflow (CSO) points, typically to rivers and streams. During dry weather, human-derived bacteria usually cause a problem at beaches only if septic systems in the area fail or if wastewater pipes are compromised or illegally connected to storm drains.
Animal Sources
In urban and suburban landscapes, animal-derived bacteria and other pollutants tend to collect on impervious surfaces. Sources typically include dogs and cats; waterfowl such as geese, gulls, and ducks; and scavenger species such as raccoons, rats, and pigeons. At the beginning of a storm, the initial runoff sweeps up most of the deposited fecal matter and quickly carries it into the drainage network. Known as the "first flush" phenomenon, this flow typically has significantly higher bacteria concentrations than the flows that follow as the storm lingers. In general, the amount of first-flush pollutants available for transport is a function of the number of dry days since the previous storm. Animal-derived bacteria can also be transported from feedlots, barnyards, and other confined-animal facilities in the drainage area.
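The first-flush relationship above is one reason "antecedent dry days" appears among commonly used model parameters. The following minimal Python sketch shows one way to compute it. It assumes daily rainfall totals keyed by date and a hypothetical 0.1-inch threshold for calling a day "wet." Neither the threshold nor the function name comes from this guide, so adapt both to your own data and local practice.

```python
from datetime import date, timedelta

# Hypothetical threshold (inches): a day with at least this much rain counts as "wet".
WET_DAY_THRESHOLD_IN = 0.1


def antecedent_dry_days(daily_rain, sample_day, max_lookback=60):
    """Count consecutive dry days immediately before sample_day.

    daily_rain: dict mapping datetime.date -> 24-hour rainfall total (inches).
    Returns None if a day is missing from the record before a wet day is found,
    so that gaps are flagged instead of silently counted as dry.
    The count is capped at max_lookback days.
    """
    count = 0
    day = sample_day - timedelta(days=1)
    for _ in range(max_lookback):
        rain = daily_rain.get(day)
        if rain is None:
            return None  # gap in the rainfall record
        if rain >= WET_DAY_THRESHOLD_IN:
            return count  # most recent wet day found
        count += 1
        day -= timedelta(days=1)
    return count


rain = {
    date(2015, 6, 1): 0.50,
    date(2015, 6, 2): 0.00,
    date(2015, 6, 3): 0.05,
    date(2015, 6, 4): 0.00,
}
print(antecedent_dry_days(rain, date(2015, 6, 5)))  # -> 3
```

Returning None for a gap, rather than treating a missing day as dry, keeps an incomplete rainfall record from quietly inflating the dry-day count.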
qPCR
Newer analytical technologies have accelerated the timeliness of laboratory results. One such method is quantitative polymerase chain reaction (qPCR), which quantifies a targeted genetic sequence from both viable and nonviable forms of the indicator bacteria. Because the method does not require culturing live bacteria, analysis can be completed in less time, within 2-4 hours of the laboratory receiving the sample. Although both qPCR and bacteria culture methods report FIB density, they derive it in significantly different ways. Their results should not be combined when building and operating a predictive model.

Common Parameters Used in Models
• Parameters relating to sources of FIB at the beach
  - Beach attendance
  - Bather counts
  - Dog counts
  - Bird counts
• Parameters relating to movement of FIB through the drainage area
  - Cumulative rainfall
  - Antecedent dry days
  - Stream discharge
  - Stream stage
• Parameters relating to movement of FIB in receiving waters
  - Current speed
  - Current direction
  - Current A- and O-components (created by VB)
  - Wind speed
  - Wind direction
  - Wind A- and O-components (created by VB)
  - Water level
  - Barometric pressure
• Parameters relating to the fate of FIB at the beach
  - Solar irradiance
  - Air temperature
  - Water temperature
  - Cloud cover
  - Dew point
  - Day of year (ordered number)
  - Turbidity
  - Conductivity
  - Wave height
  - Wave direction
  - Wave A- and O-components
  - Chlorophyll
  - Dissolved oxygen

Independent Variables
Independent variables relate, directly or indirectly, to environmental conditions. To choose the best candidate variables, your Beach Team should become familiar with the likely sources of bacteria that affect the beach, how the bacteria are transported to the beach area, and the conditions that tend to increase or decrease FIB density in the swimming area.
A sanitary survey is a useful way to collect this information (see the Data Sources text box on page 22 for more details on sanitary surveys), and the results can serve as a valuable starting point for selecting candidate independent variables. Independent variables can be roughly categorized into four groups:
• Variables relating to bacteria movement through the drainage area.
• Variables relating to bacteria movement through the receiving water.
• Variables relating to the fate of bacteria in the swimming area.
• Variables relating to activities and conditions at the beach.
Other good sources of guidance on selecting variables include:
• Predictive Tools for Beach Notification. Volume I, Review and Technical Protocol (USEPA 2010a).
• Predictive Modeling at Beaches. Volume II, Predictive Tools for Beach Notification (USEPA 2010b).
• Procedures for Developing Models to Predict Exceedances of Recreational Water Quality Standards at Coastal Beaches: U.S. Geological Survey Techniques and Methods 6-B5 (Francy and Darner 2006).

Variables Relating to Bacteria Movement through the Drainage Area
The amount, intensity, and duration of a rain event determine the timing and amount of runoff and the extent of water movement in the drainage area. Because runoff is the primary transport mechanism for both human- and animal-derived bacteria, rainfall is usually a very important independent variable for FIB predictive modeling. Your Beach Team's analysis of the drainage network and the potential sources of bacteria within it should help identify the specific rainfall statistics to consider for the predictive model. The most common choice is cumulative rainfall over a specific period before the FIB sample time (e.g., the 6, 24, or 48 hours preceding the sample).
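The following minimal Python sketch makes the cumulative rainfall window concrete. It sums hourly rainfall records over the N hours before an FIB sample and can optionally shift that window back by a lag. It assumes each hourly total is stamped at the end of its hour. That is an assumption about the data format, not something this guide specifies. The function name and sample values are illustrative only. Because the window always ends at or before the sample time, the resulting variable stays temporally relevant. The last line computes the 24-hour totals at 0-, 24-, and 48-hour lags, which are the building blocks of the weighted 3-day statistic discussed next.

```python
from datetime import datetime, timedelta


def cumulative_rainfall(hourly_rain, sample_time, window_hours, lag_hours=0):
    """Total rainfall in a window of window_hours ending lag_hours before sample_time.

    hourly_rain: iterable of (timestamp, inches) pairs. Each timestamp is assumed
    to mark the END of a one-hour accumulation period; check the convention used
    by your rain gauge or data provider.
    """
    window_end = sample_time - timedelta(hours=lag_hours)
    window_start = window_end - timedelta(hours=window_hours)
    return sum(inches for t, inches in hourly_rain if window_start < t <= window_end)


# Illustrative records for a beach sampled at 8:00 a.m. on June 5.
hourly_rain = [
    (datetime(2015, 6, 3, 10), 1.0),
    (datetime(2015, 6, 4, 12), 0.5),
    (datetime(2015, 6, 5, 7), 0.2),
    (datetime(2015, 6, 5, 9), 0.3),  # after the sample time: never counted
]
sample_time = datetime(2015, 6, 5, 8)

for hours in (6, 24, 48):
    print(hours, round(cumulative_rainfall(hourly_rain, sample_time, hours), 2))

# 24-hour totals at 0-, 24-, and 48-hour lags (day 1, day 2, day 3 before sampling).
daily_totals = [cumulative_rainfall(hourly_rain, sample_time, 24, lag) for lag in (0, 24, 48)]
```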
Further analysis might lead you to create an even better variable by assigning more importance, or "weight," to segments within a chosen time range. For example, Francy and Darner (2006) created a weighted 3-day rainfall statistic by assigning the most weight to the 24-hour rainfall total immediately before the sample time and progressively lesser weights to the totals for the one and two days before sampling:

$$R_w = w_1 R_{\mathrm{Day1}} + w_2 R_{\mathrm{Day2}} + w_3 R_{\mathrm{Day3}}, \qquad w_1 > w_2 > w_3$$

where:
R_w = weighted cumulative rainfall variable
R_Day1 = 24-hour total rainfall, 0-hour lag
R_Day2 = 24-hour total rainfall, 24-hour lag
R_Day3 = 24-hour total rainfall, 48-hour lag
w_1, w_2, w_3 = weights (see Francy and Darner 2006 for the values used)

Rainfall data can be collected locally using a rain gauge or obtained from an external source such as the NWS. Some water and wastewater utilities operate rain gauges near beaches and might be good sources of data. Locally collected data might correlate better with actual conditions at the beach; however, operating and maintaining rain gauges can be challenging. Data available on the Internet are easy to download and use but might not adequately characterize local conditions.

Because drainage flow is a direct result of rainfall, stream gauge data might also prove valuable as independent variables. This applies if one or more streams in the drainage network have monitoring gauges that provide daily or hourly measurements of discharge or of the height of the water surface (i.e., stage or gauge height).

Variables Relating to Bacteria Movement through the Receiving Water
The endpoints of the drainage networks are typically stream mouths or drainage outfall structures that discharge into a lake, river, estuary, or ocean (the "receiving waters").
When outfalls are not located directly on the beach, bacteria contained in the discharge must be transported from the outfall, through the receiving water, and to the beach to cause unhealthy conditions for swimmers. In addition to this lateral movement from outfall to beach, bacteria residing in sand or sediments can move vertically into the water column when the sand or sediment is stirred up.

Wind, waves, and water currents are usually the three most important independent variables associated with the movement of bacteria through the receiving water. All three can be characterized by direction and magnitude measurements. In general, "continuous variables" (those with numeric values) are preferred over "categorical variables" (those with labels as values), but both forms have been used successfully in predictive models. Routine sanitary surveys tend to collect either (1) simple point measurements (continuous variables) of wind speed and direction, current speed and direction, and wave height, or (2) categorical descriptions of wind and wave attributes. The Beaufort Wind Scale, developed in 1805 by Sir Francis Beaufort of the U.K. Royal Navy, is an example of a categorical approach to measuring wind and waves (Table 1).

Table 1. Beaufort Wind Scale
Wind (knots) | Classification | On the Water
Less than 1 | Calm | Sea surface smooth and mirror-like
1-3 | Light Air | Scaly ripples, no foam crests
4-6 | Light Breeze | Small wavelets, crests glassy, no breaking
7-10 | Gentle Breeze | Large wavelets, crests begin to break, scattered whitecaps
11-16 | Moderate Breeze | Small waves 1-4 ft becoming longer, numerous whitecaps
17-21 | Fresh Breeze | Moderate waves 4-8 ft taking longer form, many whitecaps, some spray
22-27 | Strong Breeze | Larger waves 8-13 ft, whitecaps common, more spray
28-33 | Near Gale | Sea heaps up, waves 13-19 ft, white foam streaks off breakers
34-40 | Gale | Moderately high (18-25 ft) waves of greater length, edges of crests begin to break into spindrift, foam blown in streaks
41-47 | Strong Gale | High waves (23-32 ft), sea begins to roll, dense streaks of foam, spray may reduce visibility
48-55 | Storm | Very high waves (29-41 ft) with overhanging crests, sea white with densely blown foam, heavy rolling, lowered visibility
56-63 | Violent Storm | Exceptionally high (37-52 ft) waves, foam patches cover sea, visibility more reduced
64+ | Hurricane | Air filled with foam, waves over 45 ft, sea completely white with driving spray, visibility greatly reduced

Automated collection of wind, wave, and current data offers several advantages over manual collection: (1) the data are more easily obtained, (2) automation eliminates the subjectivity associated with manual measurements, and (3) if data are recorded continuously, antecedent variables can be constructed (e.g., average wind speed over the previous 24 hours). The most convenient source of wind, wave, and current data is the National Data Buoy Center. This agency is part of the NWS and maintains a network of 90 buoys and 60 coastal stations that collect hourly data on wind speed and direction and wave height; some also collect data on currents.

Tides also create currents that can affect FIB density in beach areas. At some beaches, incoming tides tend to keep FIB in residence, while outgoing tides can flush them away. Although the effect is very site-specific, the tidal cycle might be an important independent variable at some ocean beaches.

Man-made structures such as jetties, groins, piers, breakwaters, and seawalls can affect FIB movement through the receiving water at some beaches.
Those structures can enclose all or part of a beach, restricting water circulation between the beach and open water. Several studies have reported higher FIB densities in those situations because the lack of flushing retains bacteria.

Variables Relating to the Fate of Bacteria in the Swimming Area
Bacteria residing in the receiving water, including in the swimming area, are subject to many conditions that can increase or decrease their presence in the water column. One of the more important stressors of bacteria is sunlight, specifically ultraviolet (UV) light. Exposure causes bacteria to die off, which is why FIB densities are usually greater in the early morning, before the sun rises higher in the sky.

Waterfowl as a Pollution Source
Gulls and other waterfowl are often a source of fecal contamination at beaches, particularly in the Great Lakes. Hansen et al. (2011) concluded that waterfowl, including Canada geese, ring-billed gulls, and mallard ducks, were the primary source of E. coli contamination at beaches near Duluth, Minnesota, and Superior, Wisconsin. Chicago and Racine have also correlated gull populations at their beaches with FIB densities in beach water samples (Converse et al. 2012; Whitman and Nevers 2003; Hartmann et al. 2013). Chicago has reduced the number of gulls at its beaches by managing their nests.

Turbidity is a common measurement and is often found to be an important independent variable for predictive modeling. It is essentially the cloudiness of the water, as defined by a measurement of scattered light. Turbidity is generally caused by a combination of suspended solids, colloidal matter, and algae. Cloud cover also affects the amount of light penetrating the water column and is sometimes used as an inverse surrogate for UV light. Staining of the water by tannins also affects light penetration.
Light attenuation is not the only reason turbidity is valuable as an independent variable. Perhaps more importantly, stormwater runoff carries a load of suspended solids, silt, and other material. Consequently, outfall discharge during and after storms is usually more turbid than the receiving water, so turbid water tends to move in tandem with outfall bacteria. Other parameters associated with stormwater runoff, such as total suspended solids, salinity, and conductivity, can also serve as independent variables.

Suspended solids can also remove FIB from the water column through sedimentation. Individual bacterial cells are very small (some are only a micron in length) and easily remain suspended in water. However, they can adsorb onto sediment particles, which increases their effective weight and their chances of settling to the bottom. Once in the sediments, they can remain viable and be resuspended in the water column by any number of turbulent forces, including waves and even swimmer activity.

Variables Relating to Activities and Conditions at the Beach
The variables described in the previous subsections relate to sources of FIB that (1) originate in the drainage area and are subsequently transported through the drainage network and receiving waters to the beach, or (2) have settled from the water column into the sediments. At some beaches, however, significant sources of bacteria in or immediately adjacent to the beach can cause high FIB densities within the swimming area. For example, resident populations of gulls and Canada geese have been identified as important contributors to bacteria loads at some beaches.

Table 2 lists the independent variables included in the final models for the five case studies. These variables were useful in making timely beach notification decisions.
Models must be developed beach by beach using site-specific data. This is shown by the variety of independent variables used in the case studies and by the number of variables used in similar models (Francy et al. 2013b).

Table 2. Independent variables used in final statistical models from case studies.
Location | Beaches | Independent Variables Used in Final Model
Chicago Parks District | Montrose Beach, Oak Street Beach, Foster Beach, 63rd Street Beach, and Calumet Beach | 6-hour rainfall, 4-hour wave period, 6-hour solar radiation, 48-hour rainfall, 6-hour longshore wind, onshore wind, turbidity
Charles River Watershed Association | Lower Charles River Basin from Watertown to Boston Harbor | Rainfall volume, river flow, and wind
Milwaukee, Wisconsin | Bradford Beach, McKinley Beach, and South Shore Beach | 24-hour rainfall, previous 24-hour E. coli sampling, pH, conductivity, wave height, water temperature
Horry County, South Carolina | Grand Strand beaches | Cumulative rainfall, rainfall intensity, preceding dry days, weather (e.g., wind speed), tides and lunar phase data, current and salinity
Racine, Wisconsin | North Beach, Zoo Beach | Water temperature, air temperature, seagull counts, dog counts, wildlife counts, wave height and intensity, water clarity, sky conditions, color changes, odor, algae amount, algae type, bather load (in, out, and total), longshore current and components, wind direction and speed, stream discharge, pollution discharge, rainfall (24-, 48-, and 72-hour), day of year, season, lake levels, and previous day's E. coli values

Sand and Algal Mats
Sand in the wave-washed zone of a beach can be a potential source of fecal contamination (Alm et al. 2003). Beach sand can support large densities of FIB for prolonged periods, independent of lake, human, or animal input (Whitman et al. 2014). Other research has examined the presence of FIB in algal mats along beaches. Whitman et al.
(2003) found that Cladophora can provide a secondary habitat for FIB that could affect water quality in affected Great Lakes swimming areas.

Data Sources
National Oceanic and Atmospheric Administration (NOAA)/NWS Weather Station Data
• NWS airport weather data (e.g., rainfall, temperature, cloud cover, wind speed and direction) are frequently available and easily downloaded.
• NOAA maintains a network of buoys, tidal stations, and satellite measurements that provide data on tides, currents, wind, cloud cover, and other marine characteristics (http://tidesandcurrents.noaa.gov).
• Additional data are available from NWS (e.g., forecast maps, radar, river/lake levels, rainfall, air quality, and past weather) (http://www.weather.gov).
USGS
• USGS provides continuous real-time water quality data, including streamflow, water temperature, conductivity, pH, dissolved oxygen, turbidity, and runoff (http://water.usgs.gov/data).
• USGS supports the National Water Information System (NWIS), which includes data from more than 1.5 million sites, some in operation for more than 100 years (http://waterdata.usgs.gov/nwis).
Sanitary Surveys
Sanitary surveys are an excellent source of information on site characteristics that can support the development of predictive models. The surveys provide detailed environmental data, including the following observational variables that could be translated into predictive variables for a model:
• Number of swimmers/bathers.
• Boat traffic.
• Wildlife and domestic animals.
• Debris and litter.
• Presence of algae.
• Infrastructure (e.g., parking lots, storm drains, WWTPs).
EPA has developed two beach sanitary survey tools, one for marine beaches and one for Great Lakes beaches. They help beach managers evaluate all contributing beach and watershed information, including water quality data, pollution source data, and land use data (http://www.epa.gov/beach-tech/beach-sanitary-surveys).

Step 3: Perform Exploratory Data Analysis
After selecting and collecting high-quality FIB and independent variable data sets, your Beach Team is ready to proceed to exploratory data analysis (EDA).

Introduction to Step 3
The primary purpose of EDA is to explore the relationships between the independent variables and FIB density and to identify the best candidate variables for model development. Another purpose is to assess two fundamental assumptions of the statistical models described in this guidance: (1) the data sets represent the normal range of conditions expected in the future, and (2) FIB density and the independent variables are linearly related. Your Beach Team should consider working with a statistician who can provide expertise during EDA.

EDA is valuable because it deepens your Beach Team's knowledge of the relationships between FIB density and the various drainage area, receiving water, and fate independent variables. This knowledge is crucial for integrating predictive modeling into an overall beach program.

The purpose of this section is to provide an overview of the approach to exploratory analysis, not a thorough discussion of techniques or evaluations. Further information can be found at http://www3.epa.gov/caddis/da_exploratory_0.html.
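A common first screen for the second assumption (linearity) is the correlation between each candidate variable and FIB density. The following minimal Python sketch ranks candidates by the absolute value of Pearson's r against log10-transformed FIB density. The log10 transform is a common convention but is an assumption here, and the function names and data are hypothetical. FIB values reported as zero or below detection need a substitution rule before taking logs. Pearson's r is only a rough ranking: it can miss nonlinear patterns and be distorted by outliers, so each pair should still be inspected with a scatterplot.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of the same length, with n >= 3")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant variable has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def rank_candidates(fib_density, candidates):
    """Rank candidate variables by |r| against log10(FIB density), strongest first.

    fib_density: FIB densities (e.g., CFU/100 mL); all values must be > 0.
    candidates: dict mapping variable name -> values paired with fib_density.
    """
    log_fib = [math.log10(v) for v in fib_density]
    scores = {name: pearson_r(vals, log_fib) for name, vals in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical paired observations (one row per sampling event).
fib = [10, 100, 1000, 10000]
candidates = {
    "rain_24h": [0.0, 0.5, 1.0, 1.5],
    "wind_speed": [3.0, 1.0, 3.0, 1.0],
    "water_temp": [20.0, 19.0, 18.0, 17.0],
}
for name, r in rank_candidates(fib, candidates):
    print(f"{name}: r = {r:+.2f}")
```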
Virtual Beach Software
Your Beach Team will need specialized computer software for many of the data processing and EDA tasks described in this step, as well as for the model development and testing activities described in the next step. EPA's Virtual Beach (VB) software package is specifically designed for constructing site-specific FIB prediction models at freshwater and marine beaches. Created for use by beach managers and researchers, VB includes a variety of EDA techniques, including the basic ones described in this section. Many free and proprietary statistical packages with EDA capabilities are available. VB, however, allows predictive beach modelers to seamlessly integrate all the components needed for preparing and analyzing data and for building and testing models. VB also includes an integrated mapping component that determines the geographic orientation of the beach. It helps the user resolve wind and current speed and direction into alongshore and onshore/offshore components. VB does not replace the need to work with someone knowledgeable in data management and analysis, so your Beach Team should ensure that it has staff with the appropriate skills.

For more information about VB and its capabilities, including how to download a free copy, visit http://www.epa.gov/exposure-assessment-models/virtual-beach-vb. You can also visit http://www.seagrant.wisc.edu/home/Default.aspx?tabid=646#Training for predictive modeling workshop presentations, webinars on accessing online data, and step-by-step modules on VB.

[Screenshot: EPA Virtual Beach (VB) web page. It describes VB as a software package for developing site-specific statistical models that predict pathogen indicator levels at recreational beaches. VB is intended primarily for beach managers who make beach closure decisions, but it is also useful to researchers, scientists, engineers, and students.]

Data Management
Data management is an important part of the model development process. Before data can be uploaded to VB or other modeling software, they must be manipulated and formatted properly. This can be a fairly complex and time-consuming process, and enlisting the help of data processing experts is often necessary. Keep in mind that each observation in the independent variable data sets must pair with one, and only one, FIB density measurement. Some beaches collect multiple samples at about the same time and record all of them in a database. In that case, you can take the geometric mean of those samples and use it as your data point.

As with any data-driven analysis, variable data must be checked carefully, and any errors or anomalies must be corrected before the data are entered into any analytical software, including VB. Basic things to watch for include missing data, improperly recorded information, invalid data cells, and other potential formatting problems. Data formatting and structure must meet all of the input standards and requirements of the software. For example, empty data cells are not permitted in VB; you need to either fill them with valid values or delete the affected observation from the data set.
VB includes a component that helps users check input data. It can go through a spreadsheet cell by cell looking for blanks as well as non-numeric or user-specified values. If a bad cell or value is identified, the user is given an opportunity to fix it. Other data checks can include:

Linking FIB observations with independent variable data. A key challenge in developing the input data sheet for VB is selecting only those data that are temporally linked to the FIB observations. The challenge is further complicated if you are also creating antecedent variables (for example, rainfall in the 24 hours before sampling) from those data. There are several ways to accomplish this data manipulation task, both within and outside of the VB input file. Mednick (2009) describes a system for joining various data tables into one master table using Microsoft Access database software.

Numerical conversion of categorical variables. VB requires that every categorical variable label be given a numerical designation. Ordinal variables can simply be converted to a continuous-like numerical variable. For example, turbidity observations can be coded as Clear = 1, Slightly Turbid = 2, Turbid = 3, and Opaque = 4. These codes look like numbers, but they are still categorical values, so most summary statistics (e.g., mean values) do not apply. VB lets the user flag categorical variables to prevent the creation of inappropriate summary statistics and variable transformations (e.g., natural log or square root).

Data entry errors. Your Beach Team should put in place data management QA oversight and QC procedures that cover the transfer and manipulation of data, such as into the VB input data sheet. After data have been transferred to the data sheet, review them for errors and anomalies. Excel 2013 and later versions include a Quick Analysis tool. It lets you select data and instantly create statistics and charts that help identify problems.

NOAA and USGS have developed tools to help automate downloading data from online sources and compiling them into a single data sheet:
• NOAA-PROCESSNOAA. Accesses, compiles, and processes wind speed and direction (instantaneous and previous 24 hours) and rainfall totals for 24-hour windows at lag times of 1, 2, and 3 days. It can also display data graphically and weight rainfall variables. To access the tool, visit http://pubs.usgs.gov/sir/2013/5166/pdf/sir2013-5166_appendix2.pdf.
• USGS-Environmental Data Discovery and Transformation (EnDDaT). Accesses, compiles, and processes data from a variety of sources, including NWS, the National Data Buoy Center (NDBC), and NWIS. EnDDaT can be used to compile historical data in a single worksheet for model development. It can also create real-time datasets for direct import into VB for model operation. To access the tool, visit http://cida.usgs.gov/enddat/.

[Photos: beach water quality notification signs]

Placeholders for unmeasured values. Some data sets, especially those downloaded from online sources, use numeric placeholders (e.g., 999) to indicate unmeasured data. You need to identify and replace these values or delete the affected observations. Deletion is required because empty data cells are not permitted in many model-building programs, including VB.

Unit errors. Most numeric data are reported with units of measurement, and the units can vary among sources. They must be converted to common units for model development. The most common conversions are from English to metric units, or vice versa. For modeling purposes, the unit chosen is less important than using the same unit consistently. Include unit information in the column title.

Date/time errors. Some data sets downloaded from online sources list the date by its numerical day of the year (DOY, 1-366) and/or the time in Coordinated Universal Time (UTC). Some data sources use "Zulu time" or "Greenwich Mean Time" instead. These data need to be converted to a single time zone and to the date/time format selected for the input data sheet. A UTC conversion tool can be found at http://www.noaanews.noaa.gov/hurricanes/zulu-utc.html. A DOY conversion tool can be found at http://www.ngs.noaa.gov/GRD/GPS/DOC/dov/dov.html.

FIB data inconsistencies. FIB density data present a special case. Laboratories have minimum detection limits for FIB. As a result, data sets will sometimes have category-type entries mixed in with numerical entries (e.g., < 10 CFU/100 milliliters (mL)). In this case, your Beach Team must decide how to handle the "below detection limit" entries so that the variable is continuous. Typically, one of three options is chosen: (1) use the detection limit value, (2) use one-half the detection limit value, or (3) use zero. Too many detection limit substitutions, however, might compromise the integrity of the FIB density data set.
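Several of the checks and conversions above can be sketched as small Python helpers. This is a minimal illustration, not VB's data-check component. The placeholder codes, the fixed UTC offset (which ignores daylight saving time), and all function names are hypothetical and should be adapted to your own data sources.

```python
from datetime import date, datetime, timedelta, timezone

# Hypothetical placeholder codes; confirm the codes each data source actually uses.
PLACEHOLDERS = {"999", "9999", "-999"}
# Ordinal coding from the turbidity example in the text.
TURBIDITY_CODES = {"Clear": 1, "Slightly Turbid": 2, "Turbid": 3, "Opaque": 4}
MM_PER_INCH = 25.4


def check_cell(raw):
    """Return a problem label for a raw spreadsheet cell, or None if it is usable."""
    text = raw.strip()
    if text == "":
        return "blank"
    if text in PLACEHOLDERS:
        return "placeholder"
    try:
        float(text)
    except ValueError:
        return "non-numeric"
    return None


def parse_fib(raw, option="half"):
    """Convert a FIB entry such as '<10' using one of the three substitution options."""
    text = raw.strip()
    if text.startswith("<"):
        limit = float(text[1:])
        substitutes = {"limit": limit, "half": limit / 2, "zero": 0.0}
        return substitutes[option]
    return float(text)


def doy_to_date(year, doy):
    """Convert a day-of-year number (1-366) to a calendar date."""
    return date(year, 1, 1) + timedelta(days=doy - 1)


def utc_to_local(utc_dt, offset_hours):
    """Shift a naive UTC timestamp to a fixed local offset (no daylight saving)."""
    local_zone = timezone(timedelta(hours=offset_hours))
    return utc_dt.replace(tzinfo=timezone.utc).astimezone(local_zone)


def inches_to_mm(inches):
    """Convert rainfall from inches to millimeters."""
    return inches * MM_PER_INCH


print(check_cell(" "), check_cell("999"), check_cell("n/a"), check_cell("12.5"))
print(parse_fib("<10", "half"), TURBIDITY_CODES["Turbid"])
print(doy_to_date(2015, 182), utc_to_local(datetime(2015, 7, 1, 16, 0), -5))
```

Which detection-limit option to pass to `parse_fib` is a team decision. Note that the "zero" option produces values that cannot be log-transformed later.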
Multiple stations or sites. Some beaches have one sampling site; others use multiple sites. For modeling purposes, however, you need just one density measurement to represent the beach as a whole for each sampling event. At beaches with multiple sites, some sampling schemes produce a composite sample made up of subsamples taken from each station at approximately the same time. In that case, use the composite sample measurement as your FIB observation. Other programs analyze samples from multiple stations individually, resulting in multiple FIB data points for an event. A common approach in that case is to calculate the geometric mean of the samples and use it as your FIB observation. Occasionally, you might come across duplicate samples taken from the same station for QA or other purposes. In that case, using the average of the two samples would be appropriate. A more conservative approach is to use the higher value as your observation.

Characterize the FIB and Independent Variable Data Sets

EDA usually begins with an examination of the distribution of each data set. Take the center of the data distribution to represent the most typical, or "normal," condition (the signal). Then examine the spread of the data around that center (the noise). At a minimum, make informal inferences about the range of environmental circumstances and conditions that produced the variation.

Box Plots

Box plots are an effective way to summarize data distributions. An example of a box plot is presented in Figure 3. You can generate box plots in VB as well as in other statistical software. The box is drawn along the Y-axis. Its bottom and top edges represent the lower and upper quartiles of the ordered data set (the 25th and 75th percentiles, respectively). The median is displayed as a horizontal line inside the box. The distance between the two quartiles is called the "interquartile range."
Vertical lines (whiskers) extend from the quartile lines to represent the data above and below the interquartile range. Traditionally, the whiskers terminate with a short horizontal line that represents the highest and lowest data points of the distribution.

Figure 3. Box plot attributes (labeled elements: 75th percentile, median, 25th percentile, interquartile range, largest non-outlier, smallest non-outlier, outlier).

By visually inspecting the box plots, your Beach Team can observe:
• Outliers: extreme values in the data set that should be investigated.
• Median: the central tendency of the data.
• Spread: the variability in the data set in relation to the median. Smaller spreads are generally better for modeling than larger spreads. The interquartile range indicates the spread of the middle half of the data set.
• Symmetry and skewness: the variability of the data set on either side of the median. A symmetric data set shows the median in the middle of the box. A skewed data set displays the median closer to one edge of the box, indicating that the spread is greater for the data on the other side of the median line.

If the data are skewed and contain outliers, the interquartile range is often a better measure of variability than the standard deviation. Unlike the standard deviation, it is not inflated by extreme values.

Your Beach Team might find that some data sets are difficult to plot and characterize because the data range over several orders of magnitude. FIB densities, in particular, often range from very low (< 10 CFU per 100 mL) to very high (> 10,000 CFU per 100 mL). Data with ranges like these need to be transformed so that the distribution is more symmetric and the results are easier to graph, observe, and interpret. The logarithm is the favored transformation for this purpose.
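The box plot quantities above, and the effect of a log transformation on symmetry, can be computed directly. This sketch uses Python's standard `statistics.quantiles` with the "inclusive" method, which is only one of several quartile conventions. VB or other software may use a different convention and report slightly different quartiles. The E. coli values are made up for illustration.

```python
import math
import statistics


def box_plot_summary(values):
    """Median, lower/upper quartiles, and interquartile range (IQR)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}


# Hypothetical E. coli densities (CFU/100 mL) spanning several orders of magnitude
ecoli = [5, 12, 20, 35, 60, 110, 240, 900, 4800, 12000]

raw = box_plot_summary(ecoli)
logged = box_plot_summary([math.log10(v) for v in ecoli])  # log10 requires values > 0

print("raw:", raw)
print("log10:", logged)
```

On the raw scale, the median (85) sits far closer to the lower quartile than to the upper one, which indicates strong skew. After the log10 transformation, the median moves much nearer the center of the box.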
As mentioned earlier in this step, categorical variables have values that function as labels rather than numbers. Therefore, they do not form an ordered numerical data set and cannot be box-plotted in the same manner as continuous data. FIB density data, however, can be box-plotted by the categories of an independent variable. The resulting plots show how FIB levels differ among the categories of the independent variable (Figure 4).

Figure 4. Box plots of E. coli density (CFU/100 mL) sorted by wind direction (north, east, south, west), with the notification threshold value (235 CFU/100 mL) and outliers marked.

Outliers

An "outlier" is a data point that lies outside the overall pattern of a distribution. Sometimes outliers result from a faulty measurement or a data entry error. In other cases, the data might be correctly measured, but the measurement or sampling occurred under unusual circumstances or conditions. This can be especially significant at beaches with infrequent but predictable exceedances, such as after a heavy rain event. In still other cases, the outlier is a legitimate data point that, while uncommon, can be considered within the normal range of conditions. Because of this uncertainty, your Beach Team should always try to identify the reason for, or cause of, an outlier.

Legitimate outliers can be displayed in the box plot as data points that lie beyond a redefined minimum or maximum line. The box is constructed as usual, but invisible "fences" are added at the tails of the distribution. These fences mark the boundary between what is and is not an outlier. The fences are usually placed 1.5 times the interquartile range below the lower quartile and above the upper quartile.
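Grouping FIB densities by the categories of an independent variable, as in Figure 4, is a simple split-and-summarize step. The sketch below uses hypothetical wind-direction labels and E. coli values. For each category, it reports the sample count, the median, and the number of samples above the 235 CFU/100 mL notification threshold used in this guide's figures.

```python
import statistics
from collections import defaultdict

THRESHOLD = 235  # CFU/100 mL notification threshold used in the figures


def summarize_by_category(records):
    """Group (category, density) pairs and summarize FIB density per category."""
    groups = defaultdict(list)
    for category, density in records:
        groups[category].append(density)
    return {
        category: {
            "n": len(values),
            "median": statistics.median(values),
            "above_threshold": sum(v > THRESHOLD for v in values),
        }
        for category, values in groups.items()
    }


# Hypothetical observations: (wind direction, E. coli CFU/100 mL)
records = [("North", 40), ("North", 300), ("North", 90),
           ("East", 15), ("East", 25),
           ("South", 500), ("South", 1200), ("South", 180),
           ("West", 60)]
for direction, summary in summarize_by_category(records).items():
    print(direction, summary)
```

With real data, the per-category lists produced here are the same groups you would pass to a box plot or to an analysis of variance.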
Some analysts further categorize outliers as either mild or extreme. To do this, the analyst adds an outer fence beyond the initial (now inner) fence. The outer fence is placed 3.0 times the interquartile range below the lower quartile and above the upper quartile. Any data point that lies between the inner and outer fences is designated a mild outlier, and any point beyond the outer fence is considered an extreme outlier.

Comparing Data Distributions among Variable Subsets

As mentioned in the introduction to this step, a fundamental assumption of predictive models is that the data used to build them represent normal conditions that are expected to continue into the future. One way to check this assumption, at least for the collected data, is to construct a time-series plot (Figure 5). If data levels seem to change in certain time periods, your Beach Team might also prepare box plots for temporal subsets of the data set. These plots help analyze year-to-year and/or season-to-season variation. By comparing box plots side by side, the team might note a significant shift of one subset compared to the others.

Figure 5. Comparison of E. coli density over a four-year period (box plots for 2012, 2013, 2014, and 2015, with the notification threshold value of 235 CFU/100 mL and outliers marked).

If your Beach Team notes a significant shift of one data subset compared to the others, it should investigate why the shift is occurring. In some cases, this exercise could lead to separate predictive models for the spring and summer seasons. It could also lead to incorporating a "time of season" variable into the predictive model.
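The inner and outer fences described above can be computed from the quartiles. This sketch applies them to hypothetical log10 E. coli values. Whether to apply fences on the log scale or the raw scale is a choice your team would make; log10 is assumed here. As in any quartile calculation, the "inclusive" method is one convention among several.

```python
import statistics


def classify_outliers(values):
    """Flag mild and extreme outliers using inner (1.5 x IQR) and outer (3.0 x IQR) fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)
    flagged = []
    for v in values:
        if v < outer[0] or v > outer[1]:
            flagged.append((v, "extreme"))
        elif v < inner[0] or v > inner[1]:
            flagged.append((v, "mild"))
    return inner, outer, flagged


# Hypothetical log10 E. coli values (CFU/100 mL)
log_ecoli = [1.0, 1.2, 1.3, 1.5, 1.6, 1.8, 2.0, 2.1, 3.2, 4.5]
inner, outer, flagged = classify_outliers(log_ecoli)
print("inner fences:", inner)
print("outer fences:", outer)
print("flagged:", flagged)
```

A flagged point is a prompt to investigate, not an instruction to delete it. As noted above, a legitimate outlier may reflect a real, if uncommon, condition at the beach.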
In other cases, examining subset distributions might indicate that an entire season's data are suspect. This could happen because different sample collection protocols or equipment were used. It could also happen because an important change in environmental conditions created a new normal for that time period.

Examine the Relationship between FIB and Independent Variables

Once your Beach Team is familiar with the data sets, outliers have been explained, and bad data have been removed, the team can begin examining the relationship between FIB densities and the independent variables. The main purpose of this exercise is to document linear correlations between FIB density and the independent variables. A linear relationship is another key assumption of statistical predictive models.

Scatterplots

The "scatterplot" is a graphical technique that plots a dependent variable (FIB density) against one independent variable, with one point per paired observation. When data points cluster in a nonrandom pattern along an imaginary straight line, a linear relationship exists. The strength of the linear association is measured by Pearson's correlation coefficient (r). Its value ranges from -1.0 to 1.0, where -1.0 is a perfect inverse correlation, 0.0 is no linear correlation, and 1.0 is a perfect positive correlation. The closer the absolute value is to 1.0, the stronger the association between the two variables.

Your Beach Team should keep in mind that a scatterplot might reveal a strong association between the dependent and independent variables. That association does not automatically mean a cause-and-effect mechanism is at work. A definitive connection of that kind must be established through other means. The only finding from a scatterplot analysis is the correlation between the two data sets.

[Photo credit: Ryan Hagerty/USFWS]

Figure 6. Scatterplots of E. coli density (CFU/100 mL) vs. rainfall (inches) without transformation (A) and with a log transformation (B).

Variable Transformation

If the relationship is nonlinear, your Beach Team should consider transforming the data to improve linearity. FIB data, for example, are almost always transformed to a logarithmic scale. Figure 6 illustrates how a log10 transformation of E. coli data improves linearity. VB provides several transformation options, including base-10 logarithm, natural logarithm, square, and square root.

Creation of New Variables

Your Beach Team might want to explore manipulating or combining variables to improve linearity or to make a variable more meaningful. This might include:
• Creating a new composite variable by summing, multiplying, or averaging data when multiple sites measure the same variable (e.g., multiple FIB sampling sites in the swimming area or multiple rain gauges in the drainage area).
• Creating a new weighted composite variable by giving additional weight to selected components of the same variable (e.g., a cumulative 3-day rainfall total in which the most recent 24-hour period receives a higher weight than the preceding 24-hour periods).

VB allows you to create new variables using sums, maxima, minima, means, or products. It also allows you to define the beach orientation and break down wind, current, and wave direction and magnitude (speed or height) data into alongshore and onshore/offshore components. These types of data are often valuable independent variables in situations where a major outfall is located near the beach.
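Both kinds of derived variables above can be sketched in Python. The rainfall weights (3, 2, 1) are purely hypothetical and would need to be chosen and tested for your beach. The wind decomposition assumes a meteorological "from" direction and a beach-normal bearing pointing from the shore out over the water. These sign conventions are an assumption for illustration and are not necessarily the ones VB uses.

```python
import math


def weighted_rainfall(daily_totals, weights=(3, 2, 1)):
    """Weighted cumulative rainfall; daily_totals[0] is the most recent 24 hours.

    The default weights are hypothetical; choose and test your own.
    """
    if len(daily_totals) != len(weights):
        raise ValueError("one weight per daily total")
    return sum(w * r for w, r in zip(weights, daily_totals))


def wind_components(speed, direction_from_deg, beach_normal_deg):
    """Split wind into onshore (+) / offshore (-) and alongshore components.

    direction_from_deg: compass direction the wind blows FROM.
    beach_normal_deg: compass bearing pointing from the beach out over the water.
    Positive alongshore = wind coming from the right of an observer facing the water.
    """
    angle = math.radians(direction_from_deg - beach_normal_deg)
    onshore = speed * math.cos(angle)
    alongshore = speed * math.sin(angle)
    return onshore, alongshore


print(weighted_rainfall([0.5, 0.2, 1.0]))  # inches, most recent day first
print(wind_components(8.0, 45, 90))        # NE wind at an east-facing beach
```

Using cosine and sine of the angle between the wind and the beach normal means a wind blowing straight off the water counts fully as onshore, and a wind parallel to the shore counts fully as alongshore.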
Correlation among Independent Variables

Sometimes combinations of independent variables do not work well together in a predictive model. This frequently occurs when two independent variables are highly correlated with each other (a condition known as collinearity). Therefore, your Beach Team should examine the relationships among independent variables during EDA and identify any strong correlations. These correlations might become important when the model is developed in step 4. Where there is a strong correlation, your Beach Team might consider keeping one variable and discarding the other. This decision is easier if data for one of the variables are more convenient and/or less expensive to collect.

Analysis of Variance for Categorical Variables

The relationship between a categorical independent variable and FIB density cannot be represented in a scatterplot, with r values calculated, in the same manner as for continuous data. As noted above, you can visually detect categorical influences on density by using box plots of FIB density grouped by category. Your Beach Team can also use the analysis of variance (ANOVA) statistical technique to determine whether mean FIB densities differ significantly among the categories.

Step 4: Develop and Test a Predictive Model

After your Beach Team has completed the EDA and selected a set of independent variables that correlate with FIB density, it can proceed to developing and testing the predictive model.

Introduction to Step 4

Most predictive models in use today are based on linear regression, a statistical method that assumes a linear (straight-line) relationship between variables. Linear regression can be used to predict a dependent variable (in this case, FIB density) from one or more independent variable measurements.
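To make the idea concrete, the sketch below computes Pearson's r (the statistic used with the Step 3 scatterplots). It then fits a simple linear regression by ordinary least squares to hypothetical rainfall and log10 E. coli values. Finally, it back-transforms one prediction and compares it to the 235 CFU/100 mL threshold used in this guide's examples. This illustrates the arithmetic only. It is not VB's model-building procedure, and the data and the 1.2-inch rainfall input are made up.

```python
import math

THRESHOLD = 235  # CFU/100 mL notification threshold used in this guide's examples


def pearson_r(x, y):
    """Pearson's correlation coefficient r for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_simple_regression(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


# Hypothetical training data: rainfall (inches) and log10 E. coli (CFU/100 mL)
rain_in = [0.0, 0.1, 0.4, 0.8, 1.5, 2.2]
log_ecoli = [1.2, 1.4, 1.9, 2.3, 2.9, 3.4]

r = pearson_r(rain_in, log_ecoli)
b0, b1 = fit_simple_regression(rain_in, log_ecoli)
predicted = 10 ** (b0 + b1 * 1.2)  # back-transform the prediction for 1.2 in. of rain
print(f"r = {r:.3f}; log10(E. coli) = {b0:.2f} + {b1:.2f} * rainfall")
print(f"predicted {predicted:.0f} CFU/100 mL:",
      "advisory" if predicted > THRESHOLD else "no advisory")
```

Because the model is fitted on the log10 scale, the prediction must be raised to the power of 10 before it is compared with a threshold expressed in CFU/100 mL.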
A model that uses only one independent variable is generally described as a "simple linear regression" model. A model using two or more independent variables is called a "multivariable linear regression" (MLR) model. In either case, the model itself is simply an equation. The dependent variable is on one side of the equals sign. The other side contains an intercept plus each independent variable multiplied by its coefficient. Conceptually, you plug in the measured values of the independent variables and calculate a predicted FIB density. You can then compare the predicted FIB density to a state water quality standard or other threshold value and make a decision about beach notification actions (e.g., to issue a swimming advisory or close the beach).

Three key elements are necessary for producing an effective predictive model:
• Using high-quality data sets to develop and test candidate models.
• Reducing error and increasing the predictive power of the model as much as possible.
• Choosing an appropriate software package.

Data Sets

The importance of using high-quality dependent and independent variable data sets for model development and testing cannot be overemphasized. A sufficient amount of good empirical data is necessary for an effective and reliable model. As mentioned in step 2, a rule of thumb is to collect at least three years' worth of historical data that represent conditions likely to occur in the future. Then, use two of those years' data to develop the model (the training data set) and one year's data to assess the model's predictive accuracy (the testing data set).

Reducing Errors

Recall that the correlation coefficient r was used with the EDA scatterplots to measure the linear association between one independent variable and FIB density.
In the context of modeling, r is also a measure of (1) the scatter (variability) of the data points around the regression line and (2) the power of the independent variable to correctly predict the value of the dependent variable. Unexplained variability can often be reduced by adding more independent variables to the model. This makes sense if you think back to how bacteria move from land-based sources, through the drainage network and the receiving water, to the beach. Rainfall, wind, currents, sunlight, and other factors work in combination to influence both the journey and the fate of bacterial cells. Adding independent variables makes model development more complex, but the result is usually more accurate predictions of FIB density.

Virtual Beach

The material presented in this section focuses on VB's traditional MLR method of model building. The current version of the VB software is version 3 (VB3), released in December 2014. For more complete information about MLR and the other modeling methods available in VB3, consult Virtual Beach 3.0.4: User's Guide (http://www.epa.gov/sites/production/files/2015-02/documents/vb3_manual_3.0.4.pdf) (Cyterski et al. 2013).

In general, model building in VB3 involves searching for the combination of independent variables that produces the most accurate FIB density predictions. VB3 automates the model-building process, but you must still make important decisions about model construction and testing. These include the method used to build the model, the number of variables to include, and the evaluation criteria used to judge model fitness. Unless you or another member of the team is familiar with VB3, you will probably need to consult someone who has used it to help you with these decisions.
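VB3 carries out this search internally. To illustrate what searching over variable combinations involves, the sketch below fits an ordinary least-squares model to every combination of candidate variables. It keeps the best model of each size. Ranking by adjusted R² is an assumption made for this example, since VB3 offers several fitness criteria. The variable names and data are hypothetical. The hand-written linear algebra (normal equations solved by Gauss-Jordan elimination) stands in for a proper statistics library.

```python
from itertools import combinations


def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(columns, y):
    """Least-squares coefficients (intercept first) from the normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    return coef, fitted


def adjusted_r2(y, fitted, n_vars):
    """Adjusted R^2, which penalizes a model for each added variable."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, fitted))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - (ss_res / (n - n_vars - 1)) / (ss_tot / (n - 1))


def exhaustive_search(data, y, max_vars):
    """Best variable combination of each size, ranked by adjusted R^2."""
    best = {}
    for size in range(1, max_vars + 1):
        for names in combinations(sorted(data), size):
            _, fitted = fit_ols([data[name] for name in names], y)
            score = adjusted_r2(y, fitted, size)
            if size not in best or score > best[size][1]:
                best[size] = (names, score)
    return best


# Hypothetical candidate variables and log10 E. coli responses
data = {
    "rainfall": [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5],
    "water_temp": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0],
    "wave_height": [0.2, 0.4, 0.1, 0.3, 0.5, 0.2, 0.6, 0.3],
}
y = [1.05, 1.20, 1.53, 1.72, 2.00, 2.30, 2.45, 2.75]
for size, (names, score) in exhaustive_search(data, y, max_vars=3).items():
    print(size, names, round(score, 4))
```

The number of combinations grows as 2^k - 1 for k candidate variables. That growth is why exhaustive searches are usually capped at a maximum number of variables.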
You can also visit http://www.seagrant.wisc.edu/home/Default.aspx?tabid=646#Training for predictive modeling workshop presentations, webinars on accessing online data, and step-by-step modules on VB.

Model Building

VB3 offers two general methods for selecting variables for the model. One is called the "genetic algorithm." It is a stepwise procedure that adds or removes independent variables based on their level of statistical significance. It retains the most significant variables and discards the least significant.

A more comprehensive approach to model building is called "exhaustive search." It measures the goodness of fit of every possible combination of the chosen independent variables. The search begins with single-variable models and works up to the model with all the variables included. The best model for each number of variables, up to a defined maximum, is then identified based on goodness-of-fit statistics (e.g., the best 2-variable model, the best 3-variable model).

VB3 provides a variety of criteria for evaluating model fitness. In addition, the software can recommend how many variables are optimal for the model and determine whether collinearity among independent variables is a problem. Once model building is completed, VB3 presents the 10 best models based on your chosen evaluation criteria. You then evaluate these candidates using one or more of the metrics described in detail in the Virtual Beach 3.0.4: User's Guide (Cyterski et al. 2013). Based on the results, you select a final model and begin the process of model validation.

Model Validation

The objective of model validation is to determine whether your final model is good enough to use in your beach program.
Keep in mind that your model's output is used to help officials make timely beach management decisions, including issuing a swimming advisory or closing the beach. These decisions are not taken lightly because they affect public health and safety. They also affect related community concerns, such as the local economy and public perceptions of the safety of local recreational waters.

How you determine whether your model is good enough is up to you. You may have been relying on the previous day's sampling results to make beach notification decisions. If so, you probably want your predictive model to perform at least better than this "persistence model" approach. You can define "how much better" by setting performance goals and testing whether your predictive model meets or exceeds them. If it passes this test, you can consider the model validated and acceptable to use.

Below is a four-step method for validating a model using a performance goals approach:
1. Generate evaluation statistics for the persistence model using a testing dataset. Common evaluation statistics are overall accuracy, specificity, and sensitivity (described in more detail below). They are generated in this first step for the persistence model and again in the third step for the predictive model.
2. Set performance goals for your predictive model based on the persistence model's evaluation statistics.
3. Generate evaluation statistics for the predictive model using the same testing dataset.
4. Compare the evaluation statistics of the two models and determine the percentage-point increase (or decrease) of the predictive model relative to the persistence model.

This approach to model validation is illustrated below using the work of Francy and Darner (2006).
They developed an MLR predictive model for Huntington Beach, Ohio, a beach on Lake Erie, using a training dataset collected during the 2000-2004 beach seasons. The beach notification threshold value is an E. coli density of 235 CFU/100 mL. The explanatory variables in their model were wave height, weighted rainfall in the previous 48 hours, and log10 turbidity. Data collected in the 2005 beach season were used as the testing dataset.

Forecasting
Future directions that EPA considers likely for predictive tools for beach notification include forecasting beach water quality conditions a day or more ahead. Researchers are also attempting to develop models applicable to more than one beach or to a region of shoreline.

Generate evaluation statistics for the persistence model

Using Francy and Darner's testing dataset, Figure 7 plots the persistence model results. That is, it plots observed E. coli densities (X-axis) vs. E. coli densities measured the previous day (Y-axis). The quadrants displayed in the graph are defined by vertical and horizontal lines set at the beach notification threshold value of 235 CFU/100 mL. The numbers in parentheses are the numbers of plot points that fall in each quadrant. The distinguishing characteristics of each quadrant are:
• Upper left quadrant. Data points in this quadrant have observed E. coli densities below the threshold value, but the model predicts that they will exceed the threshold value. This is known as a "false positive," or Type 1 error.
• Upper right quadrant. Data points in this quadrant have observed E. coli densities above the threshold value, and the model correctly predicts that they will exceed the threshold value. "Sensitivity" is the percentage of all observed exceedance data points that fall in this quadrant.
• Lower left quadrant. Data points in this quadrant have observed E. coli densities below the threshold value, and the model correctly predicts that they will not exceed the threshold value. "Specificity" is the percentage of all observed non-exceedance data points that fall in this quadrant.
• Lower right quadrant. Data points in this quadrant have observed E. coli densities above the threshold value, but the model predicts that they will not exceed the threshold value. This is known as a "false negative," or Type 2 error.

Figure 7. Plot of persistence model results for 2005 data: E. coli density measured the previous day (Y-axis) vs. observed E. coli density (X-axis), both in CFU/100 mL, with the notification threshold value (235 CFU/100 mL) marked. Number of responses per quadrant: false positive (4), correct nonexceedance (31), correct exceedance (0), false negative (6). (Adapted from Francy and Darner 2006.)

Of the four quadrants, plot points in the lower right quadrant (Type 2 errors) are the most troubling. For these points, the persistence model predicts that the water is safe for swimming when, in fact, it is unsafe because FIB densities exceed the beach notification threshold value. The performance statistics for the persistence model are:
• (Overall) Accuracy = 75.6%
• Specificity = 88.6%
• Sensitivity = 0.0%

Set performance goals for the predictive model

There is no standard formula for setting performance goals. You must use your judgment in the context of the goals and objectives of your beach program. If you have been relying on the persistence model approach to make notification decisions, you will want your predictive model to perform better than the persistence model. Francy et al. (2013a) suggest a goal of at least 5 percentage points better for accuracy, specificity, and/or sensitivity.
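The three statistics follow directly from the quadrant counts. This minimal sketch uses a hypothetical function name. It reproduces the persistence model statistics from the Figure 7 counts and then applies the 5-percentage-point rule of thumb as a starting point for goals. The goals Francy and Darner actually chose, listed next, were rounded and, for sensitivity, set well above the baseline.

```python
def evaluation_stats(fp, tn, tp, fn):
    """Accuracy, specificity, and sensitivity (percent) from quadrant counts."""
    total = fp + tn + tp + fn
    return {
        "accuracy": 100 * (tp + tn) / total,
        "specificity": 100 * tn / (tn + fp) if tn + fp else float("nan"),
        "sensitivity": 100 * tp / (tp + fn) if tp + fn else float("nan"),
    }


# Quadrant counts read from Figure 7 (persistence model, 2005 testing data)
persistence = evaluation_stats(fp=4, tn=31, tp=0, fn=6)
goals = {name: value + 5 for name, value in persistence.items()}  # +5-point rule of thumb
for name in persistence:
    print(f"{name}: {persistence[name]:.1f}% (goal at +5 points: {goals[name]:.1f}%)")
```

The same function can be reused in the third validation step by passing in the predictive model's quadrant counts.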
As discussed above, the sensitivity statistic is especially important because it reflects Type 2 errors: every false negative lowers sensitivity. Consequently, if you want to take a conservative approach to protecting public health, you may want to set your sensitivity performance goal as high as practicable. For the Huntington Beach example, Francy and Darner used the persistence model evaluation statistics as a baseline and chose the following performance goals for model validation:
• Accuracy goal: at least 81%
• Specificity goal: at least 94%
• Sensitivity goal: at least 50%

Generate evaluation statistics for the predictive model and determine whether your performance goals are met

Once you have established performance goals, you can test your predictive model to see whether it meets them. Again using Francy and Darner's 2005 testing dataset, Figure 8 plots observed E. coli densities vs. the E. coli densities predicted by the 2000-2004 model. The evaluation statistics derived from this plot are:
• Accuracy = 88.0% (exceeds performance goal)
• Specificity = 95.2% (exceeds performance goal)
• Sensitivity = 50.0% (meets performance goal)

In this example, the Francy and Darner 2000-2004 model passes the performance goal test and can be considered good enough to use in a beach notification decision support system.

Figure 8. Plot of predictive model results for 2005 data: predicted vs. observed E. coli density (CFU/100 mL), with the notification threshold value (235 CFU/100 mL) marked. Number of responses per quadrant: false positive (2), correct nonexceedance (40), correct exceedance (4), false negative (4). (Adapted from Francy and Darner 2006.)

Models that Do Not Meet Performance Goals

Throughout this guide, we have optimistically assumed that you are on the path toward creating a successful model. Unfortunately, this is not always the case.
If your model does not meet your performance goals, there are some things you can do to try to improve it. For example, you could revisit Step 2, identify new independent variables, and try rebuilding your model; or you could segregate your data set and create sub-models that may individually offer better predictive capabilities than one overall model. Another approach is to consider one or more of the alternative predictive tools described in the text box titled Alternatives to MLR Modeling.

Exceedance Probability Threshold

VB3 provides the ability to express a FIB density prediction in terms of a probability that a defined notification threshold value will be exceeded. Predictions in this form have some advantages over a FIB density output:
• They explicitly convey that there is uncertainty associated with the model prediction.
• They give you the flexibility to select a specific exceedance probability, rather than a density number, to function as the beach notification threshold value.

If you choose exceedance probability as your model output, you must define a specific probability percentage to function as a notification threshold value. In general, try to select the lowest (most conservative) exceedance probability threshold that produces the most correct responses and the fewest false negative responses. Recall that false negatives (Type 2 errors) are especially troubling because the model is predicting the water is safe for swimming when, in fact, the water is unsafe.

Continuing with the Huntington Beach 2000-2004 model example, Francy and Darner (2006) concluded that a threshold probability of 29 percent would provide the best balance of correct responses and false negative responses. Figure 9 plots the predicted probability of exceedance against the observed E. coli density for the 2005 testing data set.
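The threshold-selection guidance above can be sketched as a scan over candidate probability thresholds. This is one possible interpretation, not the procedure Francy and Darner (2006) used: the hypothetical `pick_threshold` helper ranks candidates by most correct responses, then fewest false negatives, then lowest value. The historical data below are made up for illustration, and the 235 CFU/100 mL standard is taken from the example.

```python
from typing import NamedTuple

NOTIFICATION_THRESHOLD_CFU = 235  # state standard in the example, CFU/100 mL


class Sample(NamedTuple):
    exceed_prob_pct: float  # model output: probability (%) of exceeding the standard
    observed_cfu: float     # observed E. coli density, CFU/100 mL


def score_threshold(samples: list[Sample], prob_threshold: float) -> tuple[int, int]:
    """Return (correct responses, false negatives) for one probability threshold.

    A notification is "issued" when the predicted probability is at or above
    prob_threshold; an observed exceedance is a density above the standard.
    """
    correct = false_neg = 0
    for s in samples:
        predicted = s.exceed_prob_pct >= prob_threshold
        observed = s.observed_cfu > NOTIFICATION_THRESHOLD_CFU
        if predicted == observed:
            correct += 1
        elif observed:  # exceedance the model missed (Type 2 error)
            false_neg += 1
    return correct, false_neg


def pick_threshold(samples: list[Sample], candidates: list[float]) -> float:
    """Most correct responses first, then fewest false negatives, then lowest value."""
    def rank(t: float) -> tuple[int, int, float]:
        correct, false_neg = score_threshold(samples, t)
        return (-correct, false_neg, t)
    return min(candidates, key=rank)


# Hypothetical historical data for illustration only (not from the example).
history = [Sample(5, 40), Sample(12, 90), Sample(18, 300), Sample(22, 120),
           Sample(31, 410), Sample(35, 150), Sample(48, 620), Sample(60, 200),
           Sample(75, 980)]
candidates = [10, 20, 30, 40, 50]
for t in candidates:
    print(t, score_threshold(history, t))
print("chosen:", pick_threshold(history, candidates))  # chosen: 30
```

With these made-up data, 30 and 40 percent tie on correct responses (6 each), and 30 percent wins because it has fewer false negatives. A program that weights false negatives even more heavily could swap the first two ranking keys, in which case the 10 percent threshold (no false negatives, 5 correct) would be chosen instead.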
The quadrants in Figure 9 are defined by the state standard of 235 CFU/100 mL (vertical line) and the probability of exceedance threshold of 29 percent (horizontal line). The performance statistics from this plot are:
• Accuracy = 82.0 percent
• Specificity = 88.1 percent
• Sensitivity = 50.0 percent

[Figure 9: predicted probability of exceedance, in percent (vertical axis), versus observed E. coli in CFU/100 mL (horizontal axis), with the 29-percent threshold and the notification threshold value (235 CFU/100 mL). Number of responses: False Positive (5), Correct Nonexceedance (37), Correct Exceedance (4), False Negative (4).]
Figure 9. Plot of predictive model results of 2005 data expressed as exceedance probability threshold (adapted from Francy and Darner 2006).

Using this approach, you can establish a beach management protocol that requires the issuance of a notification if the model predicts a probability of exceedance of 29 percent or greater.

Alternatives to MLR Modeling

MLR models are a popular predictive tool used by beach programs, but they are not useful or appropriate for all beaches. If for some reason MLR modeling is not right for your beach, you can explore other alternatives, including:

Rainfall Alerts
This predictive tool is based exclusively on the positive correlation that sometimes exists between rainfall and FIB densities: as rainfall totals increase and contaminated runoff reaches the receiving water, there is a predictable corresponding increase in FIB density at the beach. By factoring in a beach notification threshold, you can predict exceedances of the threshold using a combination of storm duration and cumulative precipitation data.
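A rainfall alert of this kind can be sketched as a set of cumulative-rainfall triggers over fixed storm-duration windows. This is a minimal illustration, not a method prescribed by this guide: the `RainRule` type, the `rainfall_alert` function, and both trigger values (0.5 inch in 6 hours, 1 inch in 24 hours) are hypothetical, and real triggers would have to be derived from your own historical rainfall and FIB data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RainRule:
    window_hours: int    # storm duration considered, counted back from decision time
    min_total_in: float  # cumulative rainfall (inches) that triggers an alert


# Hypothetical rules for illustration; real values would come from your own
# analysis of historical rainfall and FIB data.
RULES = (RainRule(window_hours=6, min_total_in=0.5),
         RainRule(window_hours=24, min_total_in=1.0))


def rainfall_alert(hourly_rain_in: list[float],
                   rules: tuple[RainRule, ...] = RULES) -> bool:
    """Return True if any rule's cumulative-rainfall trigger is met.

    hourly_rain_in holds hourly rainfall totals in inches, oldest first,
    ending with the hour just before the notification decision.
    """
    for rule in rules:
        recent = hourly_rain_in[-rule.window_hours:]  # last N hours (or fewer)
        if sum(recent) >= rule.min_total_in:
            return True
    return False


steady_drizzle = [0.03] * 24                    # about 0.72 in over 24 hours
cloudburst = [0.0] * 20 + [0.2, 0.2, 0.1, 0.1]  # 0.6 in within the last 4 hours
print(rainfall_alert(steady_drizzle))  # False
print(rainfall_alert(cloudburst))      # True
```

Checking several window lengths lets both a short, intense storm and a long, steady one trigger an alert, which is one way to combine storm duration with cumulative precipitation.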
Rainfall-based thresholds are derived by simple regression or a frequency of exceedance analysis. They represent the oldest approach to predictive modeling and are actively used at many beaches in the U.S.

Partial Least Squares Models
Partial least squares (PLS) regression models can be used as an alternative to MLR models if there are a large number of independent variables that are not well understood, have poor linear correlation with the response variable, or exhibit multicollinearity. The primary objective of PLS regression remains the same as MLR: a model that accurately predicts FIB concentration given a set of independent variables. What makes PLS different is how it handles those variables: rather than fitting each one directly, it combines them into a smaller set of uncorrelated components that capture the variation in the independent variables most related to FIB density. VB3 includes PLS regression as an optional modeling approach, and it is described in detail in Virtual Beach 3.0.4: User's Guide (Cyterski et al. 2013).

Decision Trees
In general, decision trees work best when FIB levels are primarily influenced by only a few factors. They are basically a series of yes/no questions concerning conditions that influence FIB density. The "tree" is typically portrayed visually as a flow chart with binary decision node "branches." The questions with the highest importance generally appear at the top of the tree. By moving down the tree and answering the set of ordered questions, you are ultimately led to a beach notification classification; in the simplest form, either "issue a notification" or "don't issue a notification." Decision trees range from simple to complex, depending on the number of decision nodes and classification endpoints.

Gradient Boosting Machine
The "gradient boosting machine" (GBM) is a computerized approach to constructing a large set of simple decision trees for making FIB predictions; the trees are built in sequence, each one fitted to correct the errors of the trees before it.
Similar to PLS regression, it is an alternative to MLR if there are a large number of independent variables that might not be well understood, have poor linear correlation with the response variable, or exhibit multicollinearity. VB3 includes GBM as an optional modeling approach, and it is described in detail in Virtual Beach 3.0.4: User's Guide (Cyterski et al. 2013).

Artificial Neural Network
An "artificial neural network" is software that attempts to mimic the working of a biological neural network. Still in the research phase, it is potentially another alternative for dealing with a large number of independent variables that might not be well understood, have poor linear correlation with the response variable, or exhibit multicollinearity. The technique incorporates an algorithm that allows it to "learn" relationships between inputs and outputs.

Step 5: Integrate the Predictive Tool into a Beach Monitoring and Notification Program

Introduction to Step 5

Once your Beach Team has developed the predictive tool, they have to integrate it into their beach monitoring and notification program. Model outputs will typically be either estimated FIB levels or a probability that the beach notification threshold will be exceeded. The method your Beach Team uses to integrate the model will depend on several things, including the model's accuracy and the availability of resources. Some questions to ask as you consider an integration strategy include:
• How will you use model results to determine beach notifications?
• Will advisories be posted based solely on model results or on a combination of models and sampling?
• What do you do if the model predicts an exceedance?
• What do you do if sampling results and model results conflict?
• Will you verify model results before posting advisories?
• Will you use a model to remove an advisory or reopen a beach?
• How often will the model be used during the beach season?
• Will you run the model on weekdays and weekends?
• What time of day will you run the model?

As you can see, you must consider many factors when deciding how best to integrate predictive tools into your beach monitoring and notification program. EPA recommends that you use a predictive tool to complement traditional monitoring. A predictive tool cannot completely replace sampling, but it might allow you to reduce the frequency of sampling. Data from culture samples can be used as a basis for models that provide timely results in a cost-effective manner. Predictive tools might also be useful in developing or adapting routine monitoring programs to focus sampling efforts when conditions (e.g., rain events) correlate with high FIB levels.

You might choose to issue a beach notification if the model predicts an exceedance of a beach notification threshold, if sampling results are above the threshold, or both. If you use model results in conjunction with sampling results, consider what to do if the model predictions and sampling results conflict. Once you have issued a beach notification, you must decide how you will lift it. Will you rerun the model with more current data? Will you collect additional samples? The National Beach Guidance (USEPA 2014) recommends lifting actions that were imposed based on the output of a predictive model after an additional model run estimates that water quality conditions have improved to within acceptable parameters.

Frequency of Running the Model

Your Beach Team must decide how often to run the model. Consider the resources available to collect data, run the model, and post results.
Running the model daily might be ideal, but it is not always practical. You might want to have results available on weekends, when the most people are using the beach; however, you might not have staff available to collect the data and run the model. Many beach programs that use predictive models run them on weekdays, and some also run them on weekends.

Notification Protocols

As you consider all the factors that are important in determining beach notifications, you will use them to develop a protocol for making beach notification decisions. "Notification protocol" is a general term used to describe a set of questions or decision points that a beach manager routinely uses to determine whether to issue a notification or close a beach. Notification protocols can be simple or complex, but should include all of the decisions that your Beach Team needs to make after collecting samples or running a predictive model. The protocol can include the decisions needed after a pollution event (CSO or SSO discharge) or hazardous conditions (e.g., strong rip currents, red tide) are discovered that might affect whether the beach should be open, closed, or under an advisory. An example of a notification protocol for a beach that uses sampling results and a predictive model is shown in Figure 10.

[Figure 10 flowchart. 8:00 a.m. (the previous day): collect sample and send to lab for analysis of FIB. 8:00 a.m. (day of sample collection): collect input variables and run model. Decision points: Did a pollution event (CSO, harmful algal bloom) occur? Does the previous day's sample result exceed the beach notification threshold? Does the model predict exceedance of the beach notification threshold? Outcomes: Beach is open; Yes: Issue advisory.]
Figure 10. Notification protocol for a beach program that uses sampling results and a predictive model to make notification decisions.
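The Figure 10 protocol can be expressed as a short decision function. This is a minimal sketch, assuming (as the flowchart's "Yes: Issue advisory" branch suggests) that a "yes" at any decision point leads to an advisory and that "exceeds" means strictly above the threshold. The `DailyInputs` type, the message strings, and the 235 CFU/100 mL default (the state standard used in the Huntington Beach example) are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class DailyInputs:
    pollution_event: bool            # e.g., CSO discharge or harmful algal bloom
    previous_day_sample_cfu: float   # previous day's lab result, CFU/100 mL
    model_predicts_exceedance: bool  # result of today's 8:00 a.m. model run


def notification_decision(d: DailyInputs, threshold_cfu: float = 235.0) -> str:
    """Walk the Figure 10 questions in order; any "yes" leads to an advisory."""
    if d.pollution_event:
        return "Issue advisory: pollution event"
    if d.previous_day_sample_cfu > threshold_cfu:
        return "Issue advisory: previous day's sample exceeded threshold"
    if d.model_predicts_exceedance:
        return "Issue advisory: model predicts exceedance"
    return "Beach is open"


# Sample result is low, but today's model run predicts an exceedance.
print(notification_decision(DailyInputs(False, 120.0, True)))
# Issue advisory: model predicts exceedance

# No pollution event, low sample result, no predicted exceedance.
print(notification_decision(DailyInputs(False, 120.0, False)))
# Beach is open
```

Returning the reason along with the decision makes it easier to audit later which part of the protocol triggered each advisory, which is useful when evaluating the model over time.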
Some beach program managers might choose to use the predictive model alone to make decisions on notification actions, without considering sampling results. In that case, sample results might be used only to verify that the model is making accurate predictions and to recalibrate or update the model over time. An example of a notification protocol for this approach is shown in Figure 11, which is much simpler than the protocol shown in Figure 10.

[Figure 11 flowchart. 8:00 a.m. (day of sample collection): collect input variables and run model. Decision points: Does the model predict exceedance of the beach notification threshold? Did a pollution event (CSO, harmful algal bloom) occur? Outcomes: Beach is open; Issue advisory.]
Figure 11. Notification protocol for a beach that uses only model results to make notification decisions.

You should also explore whether you need different notification protocols for different seasons or for different parts of the beach season (e.g., if there is a dry part and a wet part to the beach season). In the case study for the South Shore model, the MHD found that their Environmental Monitoring for Public Access and Community Tracking (EMPACT) model was less accurate as the beach season progressed, suggesting there was some level of seasonality or unidentified influences on water quality between the beginning and the end of the beach season.

Types of Beach Notifications

A beach advisory is the most common beach notification based on the use of a predictive tool. However, the following types of notifications might be appropriate at certain times.

Beach Advisories

When a model predicts the exceedance of a water quality standard, many beach managers issue a beach advisory, which warns beachgoers that the FIB density is likely above the water quality standard and that swimming and wading are not recommended.
Beach Closings

Modeling results might lead you to decide that water quality conditions are poor enough to warrant closing the beach rather than issuing an advisory. If you close your beach, you might choose to continue running your model regularly to determine when FIB levels are low enough to reopen, thereby minimizing the number of closure days.

Preemptive Advisories

The exploratory data analysis will give you a good idea of what events (such as heavy rainfall or CSOs) are correlated with higher FIB levels at your beach; as a result, you might decide to issue preemptive advisories or closures based on those events. For example, if you know that a 1-inch rainfall generally causes an exceedance of the notification threshold and the weather forecast is calling for more than 1 inch of rain overnight, a preemptive advisory could be issued based on what you already know about rain events and exceedances. You would not need to run the model to issue a preemptive advisory or closure.

Permanent Advisories

Some beach managers issue permanent advisories when a certain type of event is highly correlated with elevated FIB levels. A predictive tool can help determine whether a permanent advisory is necessary. For example, if FIB levels often exceed water quality standards after almost any amount of rainfall, you might choose to issue a permanent advisory that swimming should be avoided for a certain period after any rainfall has occurred.

Public Communication

The predictive tool development process does not necessarily require public involvement.
Much of the process involves scientific and technical expertise and centers on the staff and resources of state and local agencies and public health departments. Even so, predictive modeling stems from the need to protect public health, and much can be gained from involving the public.

Public Education

Public education is an important part of the outreach process. Outreach often involves teaching the public about beach health and safety: what an advisory means, what health risks exist, and what precautions should be taken. When you are using a predictive model, you also need to explain the model to the public. Some general questions and answers useful for public education include:

What is a predictive model?
Predictive models are a means of predicting or forecasting water quality conditions in the absence of a current water sample. Beach managers assess previous sampling data to determine which factors affect water quality. The model uses these factors to estimate water quality under current conditions.

Why use a model?
Predictive models are most useful for increasing the timeliness of beach notifications, conserving resources by reducing sampling, and improving the accuracy of identifying notification days when added to the existing monitoring program.

How accurate are models?
The accuracy of a model depends on the data on which it is based and on local conditions. A thorough understanding of the beach environment and a strong data set can support accurate and reliable models. Models should be routinely verified and validated by sampling and laboratory analysis, and continuously updated based on sampling results.

How does the model change postings and advisories?
With the use of a model, postings and advisories can be updated more frequently and provide real-time estimates of water quality at beaches.

Does this mean samples are no longer collected and analyzed?
Water samples are still collected regularly and analyzed for FIB, both to determine water quality and to verify and update the model.

How does this improve public health protection?
Beach managers are able to predict water quality and post advisories in a more timely manner to prevent illnesses associated with recreating in waters with high densities of bacteria or pathogens.

The Ohio Nowcast webpage (http://www.ohionowcast.info/index.asp) is a great example of an outreach website and includes detailed information for the public, such as:
• Where Nowcast is used: detailed maps.
• How Nowcast works.
• How Nowcast performs.
• Accuracy of Nowcast for each beach.
• List of variables used to make predictions.
• List of current advisories.
• FAQs.

Public Outreach

Public outreach involves directly communicating with the public about beach health and safety. You should consider whether notifications and advisories are easily accessible and whether you are effectively communicating key information. The National Beach Guidance (USEPA 2014) discusses a number of possible formats for conducting outreach. The Chicago Park District has an especially good outreach program, which includes a public education campaign and a Beach Ambassadors program (see the case study for more information).

Other Uses for Predictive Models

A predictive model might provide other benefits to a beach program besides being used for notifications. For example, the Michigan Department of Natural Resources uses beach models as a tool to identify and remediate sources of contamination to assist with Total Maximum Daily Load (TMDL) development for beaches.
Step 6: Evaluate the Predictive Tool over Time

Introduction to Step 6

You should plan to evaluate your model periodically to verify that the performance goals are being met. Many programs choose to assess model accuracy at the end of the beach season. Any significant decrease in performance might signal that environmental conditions that affect FIB density have changed; for example, the season might have been unusually wet or dry. In this case, you might want to conduct more exploratory data analysis (Step 3) and build a new model (Step 4) using the past season's data as part of the historical database. Your rebuilt model may or may not include the same explanatory variables. The overall goal is to keep the model current with the environmental conditions that affect FIB density at the beach.

Several of the case studies at the end of this guide describe situations that required officials to adjust their model in response to changing conditions or circumstances:
• South Carolina Department of Health and Environmental Control updated their stormwater model by using radar data from NexRad instead of the data they previously obtained from rain gauges.
• Milwaukee Health Department is updating their Nowcast model by collaborating with a new partner for their local expertise and using improved data at three beach sites instead of at the one beach where the model was initially used. They hope to automate data integration, translation, and loading to improve the efficiency of their model.
• The City of Racine plans to update their model every year to ensure it is still predictive. They also will continue to evaluate whether they can decrease monitoring frequency.
• Charles River Watershed Association has continued to enhance its model over the past 15 years, continually looks for additional parameters that may improve model predictions, and has a future goal of real-time data collection for a real-time model.

Changes to the Fate and Transport of FIB

The predictive tools described in this guidance assume that the relationships between FIB and the environmental conditions associated with the explanatory variables remain constant over time. This is almost never the case, however, because landscapes and human activities change over time and may affect bacteria sources and their movement through the drainage area. An annual sanitary survey of your beach would likely capture many of these changes. Some of the factors that affect FIB sources and movement include the following:
• Land use alterations.
• Infrastructure changes (e.g., repairs to leaky sewer lines).
• Changes to bounding structures (e.g., jetties, breaker walls, piers).
• Changes in pollutant sources (e.g., increase or decrease in algal blooms or presence of wildlife).

All of these factors can cause shifts in the underlying processes influencing FIB densities at your beach.

Changes to Data Sources

Step 2 included a discussion of some of the key attributes of the data needed to build and operate the model to make same-day FIB predictions. In general, independent variable data need to be collected in a manner consistent with the historical data used to build the model. Additionally, data collected locally are preferred over data obtained from external or online sources, primarily because your model is site-specific.
In reality, however, your choice of data sources is often driven by the availability of funding and resources. Using data readily available online is much less expensive and resource-intensive than deploying and maintaining your own system of rain gauges, weather stations, water quality sondes, and other equipment. For example, USGS is working with VB developers to make a variety of explanatory data collected by Federal agencies easier for users to access and process using the EnDDaT online system.

Ozaukee County, Wisconsin
The Ozaukee County Public Health Department developed a model for a lake. In 2012, they experienced unusual weather conditions: no rain fell, and the lake temperatures were very warm. The biology and ecology of the lake changed, and the nearshore environment became the source of high FIB densities. Advisories were issued for about one third of the 2012 beach season, and the model was found to be only 60 percent accurate. A revised model would only be useful if these conditions become a trend.

As described in the Stormwater Model (Horry County, South Carolina) case study, the SCDHEC initially used rainfall data collected at local rain gauges for their predictive models, but over time they switched to using NexRad data, which eliminated the need for updates to and maintenance of the rain gauges while also improving the timeliness and accuracy of the model. In other cases, a beach program might have originally used data from NWS but plans to install local rain gauges to get more accurate rainfall measurements for their beach. The MHD initially collected data for its predictive model using a sonde, but because of high maintenance costs, they chose to use NWS rainfall data and the previous day's E.
coli concentrations, along with sanitary survey data, which provided additional insight on weather, rainfall, algae content, litter, and wildlife. If the data source changes, you will need to collect enough data to rebuild your model (see the recommendations on the amount of data in Step 1), because the relationships between the independent variables and FIB are likely to differ from those in the original model.

Changes to Your Beach Program

The needs of your beach program and the availability of resources can also change over time. You will need to reevaluate your beach program and its need for a predictive tool and assess whether you have the resources to meet that need. You also should evaluate your notification protocol over time to make sure it is still appropriate for making the best decisions about beach notifications. For example, if model results are highly accurate, a beach program that initially used both sampling results and modeling results to make beach notification decisions might decide to rely solely on modeling results for their beach. In that case, they might limit sampling to the days on which the model predicts an exceedance of the water quality standard or other notification threshold.

Bibliography

Alm, E.W., J. Burke, and A. Spain. 2003. Fecal indicator bacteria are abundant in wet sand at freshwater beaches. Water Research 37:3978-3982.
APHA (American Public Health Association). 1998. Standard Methods for the Examination of Water and Wastewater, 20th ed. American Public Health Association, Washington, DC.
Biedrzycki, Paul, Disease Control and Environmental Health, City of Milwaukee Health Department. 2012-2013. Personal communication.
Boehm, A.B., R.L. Whitman, M.B. Nevers, D. Hou, and S.B. Weisberg. 2007. Nowcasting recreational water quality. In Statistical Framework for Recreational Water Quality Criteria and Monitoring, ed. L. Wymer. Wiley-Interscience, Chichester, West Sussex, England.
Breitenbach, Cathy, Chicago Park District. 2012. Personal communication.
Briggs, Shannon, Michigan Department of Environmental Quality. 2012. Personal communication.
Brooks, W.R., M.N. Fienen, and S.R. Corsi. 2013. Partial least squares for efficient models of fecal indicator bacteria on Great Lakes beaches. Journal of Environmental Management 114:470-475.
Charles River Watershed Association. Charles River Water Quality Notification Flagging Program. http://www.crwa.org/field-science/water-quality-notification.
Chicago Park District. 2012. Chicago Park District Improves Beach Monitoring for 2012 Season. http://www.chicagoparkdistrict.com/chicago-park-district-improves-beach-monitoring-for-2012-season.
Cicero, K. The 10 Best Beaches for Families: 2011. Parents Magazine. June 2011. Accessed January 22, 2013. http://www.parents.com.
Clark, J., M. Hortobagyi, and K.B. Yancey. Just for Summer: 51 Great American Beaches. USA Today. March 27, 2012. Accessed January 22, 2013. http://travel.usatoday.com.
Converse, R.R., J.L. Kinzelman, E.A. Sams, E. Hudgens, A.P. Dufour, H. Ryu, J.W. Santo-Domingo, C.A. Kelty, O.C. Shanks, S.D. Siefring, R.A. Haugland, and T.J. Wade. 2012. Dramatic Improvements in Beach Water Quality Following Gull Removal. Environmental Science and Technology 46:10206-10213.
Cyterski, M., W. Brooks, M. Galvin, K. Wolfe, R. Carvin, T. Roddick, M. Fienen, and S. Corsi. 2013. Virtual Beach 3.0.4: User's Guide. National Exposure Research Laboratory, U.S. Environmental Protection Agency, Athens, GA and U.S. Geological Survey, Middleton, WI.
Eleria, A., and R.M. Vogel. 2005. Predicting fecal coliform bacteria levels in the Charles River, Massachusetts, USA. Journal of the American Water Resources Association. No. 03111. October 2005.
Francy, D. 2009. Use of predictive models and rapid methods to nowcast bacteria levels at coastal beaches. Aquatic Ecosystem Health and Management 12(2):177-182.
Francy, D.S., and R.A. Darner. 2006. Procedures for Developing Models to Predict Exceedances of Recreational Water Quality Standards at Coastal Beaches. U.S. Geological Survey Techniques and Methods 6-B5, 34 p.
Francy, D.S., A.M.G. Brady, R.B. Carvin, S.R. Corsi, L.M. Fuller, J.H. Harrison, B.A. Hayhurst, J. Lant, M.B. Nevers, P.J. Terrio, and T.M. Zimmerman. 2013a. Developing and Implementing Predictive Models for Estimating Recreational Water Quality at Great Lakes Beaches. Scientific Investigations Report 2013-5166. U.S. Geological Survey, Reston, VA. Accessed March 2015. http://pubs.usgs.gov/sir/2013/5166/pdf/sir2013-5166.pdf.
Francy, D.S., E.A. Stelzer, J.W. Duris, A.M.G. Brady, and J.H. Harrison. 2013b. Predictive Models for Escherichia coli Concentrations at Inland Lake Beaches and Relationship of Model Variables to Pathogen Detection. USGS Staff-Published Research. Paper 706.
Fulton, Jeff. No date. Public Beaches in Chicago. USA Today. http://traveltips.usatoday.com/public-beaches-chicago-53741.html.
Hansen, D.L., S. Ishii, M.J. Sadowsky, and R.E. Hicks. 2011. Waterfowl abundance does not predict the dominant avian source. Journal of Environmental Quality 40:1924-1931.
Hartmann, J.W., S.F. Beckerman, R.M. Engeman, and T.W. Seamans. 2013. Report to the City of Chicago on Conflicts with Ring-billed Gulls and the 2012 Integrated Ring-billed Gull Damage Management Project. USDA National Wildlife Research Center, Staff Publications. Paper 1145.
Helsel, D.R., and R.M. Hirsch. 2002. Statistical Methods in Water Resources. Elsevier Publishing.
Hou, D., S.J.M. Rabinovici, and A.B. Boehm. 2006. Enterococci Predictions from Partial Least Squares Regression Models in Conjunction with a Single-Sample Standard Improve the Efficacy of Beach Management Advisories. Environmental Science and Technology 40(6):1737-1743.
Kesteloot, K., A. Azizan, R. Whitman, and M. Nevers. 2012-2013. New recreational water testing alternatives. Park Science 29(2).
Kinzelman, Julie, City of Racine. 2012-2013. Personal communication.
Kurdas, Stephan, City of Racine. 2012-2013. Personal communication.
Mas, D.M.L., and K. Baker. Fuss and O'Neill. BIT Guidance for Developing Predictive Models for Ontario Beaches. Ontario Ministry of the Environment. Toronto, Ontario, Canada. February 2011.
Mednick, A.C. 2009. Accessing Online Data for Building and Evaluating Real-Time Models to Predict Beach Water Quality. Publication PUB-SS-1063. Wisconsin Department of Natural Resources, Madison, WI. Accessed March 2015. http://dnr.wi.gov/files/PDF/pubs/ss/SS1063.pdf.
Mednick, Adam, Wisconsin Department of Natural Resources. 2012. Personal communication.
NRDC (Natural Resources Defense Council). Testing the Waters: South Carolina. http://www.nrdc.org/water/oceans/ttw/sc.asp.
Olyphant, G.A., and R.L. Whitman. 2004. Elements of a predictive model for determining beach closures on a real time basis: The case of 63rd Street Beach Chicago. Environmental Monitoring and Assessment 98(1-3):175-190.
Our 7 Top Midwest City Beaches. Midwest Living Magazine. July-August 2010. Accessed January 22, 2013. http://www.midwestliving.com.
Porter, Dwayne, University of South Carolina. 2012. Personal communication.
Rockwell, D., K. Campbell, G. Lang, D. Schwab, G. Mann, and R. Wagenmaker. 2013. Beach Water Quality Decision Support System. Technical Memorandum GLERL-156. National Oceanic and Atmospheric Administration, Ann Arbor, MI. Accessed March 2015. http://www.glerl.noaa.gov/ftp/publications/tech_reports/glerl-156/tm-156.pdf.
Rockwell, David, University of Michigan. 2012. Personal communication.
Schwab, D.J., and K.W. Bedford. 1994. The Great Lakes Forecasting System. In Coastal and Estuarine Studies: Coastal Ocean Prediction, ed. C.N.K. Mooers. American Geophysical Union, Washington, DC.
Seltman, H.J. 2013. Experimental Design and Analysis, Chapter 4 Exploratory Data Analysis. June 10, 2013.
South Carolina Department of Health and Environmental Control. Beach Monitoring Program. http://www.scdhec.gov/HomeAndEnvironment/Pollution/DHECPollutionMonitoringServices/BeachMonitoring/.
Southeast Coastal Ocean Observing Regional Association. Water Quality Observations and Models Help Managers Make Decisions on Issuing Swim Advisories. www.secoora.org.
Torrens, Sean, South Carolina Department of Health and Environmental Control. 2012-2013. Personal communication.
USEPA (U.S. Environmental Protection Agency). 1999. Action Plan for Beaches and Recreational Waters. EPA 600/R-98-079. U.S. Environmental Protection Agency, Office of Research and Development and Office of Water, Washington, DC.
USEPA (U.S. Environmental Protection Agency). 1999. Review of Potential Modeling Tools and Approaches to Support the BEACH Program. EPA-823-R-99-002. U.S. Environmental Protection Agency, Office of Science and Technology, Washington, DC.
USEPA (U.S. Environmental Protection Agency). 2002. Time-Relevant Beach and Recreational Water Quality Monitoring and Reporting. United States Environmental Protection Agency, Office of Research and Development, National Risk Management Research Laboratory. EPA/625/R-02/017. October 2002. Cincinnati, Ohio. http://www.scdhec.gov/HomeAndEnvironment/Water/SwimSafety/
USEPA (U.S. Environmental Protection Agency). 2007. Report of the Experts Scientific Workshop on Critical Research Needs for the Development of New or Revised Recreational Water Quality Criteria. EPA 823-R-07-006. U.S. Environmental Protection Agency, Office of Water, Office of Research and Development. Airlie Center, Warrenton, Virginia.
USEPA (U.S. Environmental Protection Agency). 2010a. Predictive Tools for Beach Notification. Volume I: Review and Technical Protocol. EPA-823-R-10-003. U.S. Environmental Protection Agency, Office of Water, Washington, DC.
USEPA (U.S. Environmental Protection Agency). 2010b. Predictive Modeling at Beaches. Volume II: Predictive Tools for Beach Notification. EPA-600-R-10-176. U.S. Environmental Protection Agency, National Exposure Research Laboratory, Athens, Georgia.
USEPA (U.S. Environmental Protection Agency). 2010c. Sampling and Consideration of Variability (Temporal and Spatial) for Monitoring of Recreational Waters. EPA-823-R-10-005. U.S. Environmental Protection Agency, Office of Water, Washington, DC. Accessed March 2015. http://www.epa.gov/sites/production/files/2015-11/documents/sampling-consideration-recreational-waters.pdf.
USEPA (U.S. Environmental Protection Agency). 2012. Recreational Water Quality Criteria. EPA 820-F-12-058. U.S. Environmental Protection Agency, Office of Water, Washington, DC.
USEPA (U.S. Environmental Protection Agency). 2014. National Beach Guidance and Required Performance Criteria for Grants. EPA-823-B-14-001. U.S. Environmental Protection Agency, Office of Water, Washington, DC.
Whitman, R.L., and M.B. Nevers. 2003. Foreshore Sand as a Source of Escherichia coli in Nearshore Water of a Lake Michigan Beach. Applied and Environmental Microbiology 69(9):5555-5562.
Whitman, R.L., D.A. Shively, H. Pawlik, M.B. Nevers and M.N. Byappanahalli. 2003.
Occurrence of Escherichia coli and Enterococci in Cladophora (Chlorophyta) in Nearshore Water and Beach Sand of Lake Michigan. Applied and Environmental Microbiology 69(8):4714-4719. Whitman, R.L., V.J. Harwood, T.A. Edge, M.B. Nevers, M. Byappanahalli, K. Vijayavel, J. Brandao, M.J. Sadowsky, E.W. Aim, A. Crowe, D. Ferguson, Z. Ge, E. Halliday, J. Kinzelman, G. Kleinheinz, K. Przybyla- Kelly, C. Staley, Z. Staley, and H. Solo-Gabriele. 2014. Microbes in beach sands: integrating environment, ecology and public health. Rev Environ Sci Biotechnol 13:329-368. Wood, Julie, Charles River Watershed Association. 2012-2013. Personal communication. Ziegler, Dan, Ozaukee County Public Health Department. 2012. Personal communication. ------- This page intentioanlly left blank. - ------- Case Study The South Shore Beach Model (Milwaukee, Wisconsin) 61 Introduction South Shore Beach is in Milwaukee, Wisconsin's South Shore Park on the western shore of Lake Michigan. South Shore Beach is a public beach with 150 meters of sandy shoreline within the South Shore Marina (owned and operated by the South Shore Yacht Club). A 20-meter embankment separates the sandy beach area from a cobble/pebble beach area that has a high-sloping shore (South Shore Rocky Area). The entire beach and marina area is partially enclosed by a breakwall, approximately 300 meters offshore, which limits wave action, water circulation, and exchange with the outer harbor. The beach is a few kilometers south of Milwaukee Harbor and the Milwaukee Metropolitan Sewerage District Jones Island Water Reclamation Facility. Three rivers- Milwaukee, Menomonee, and Kinnickinnic—reach Milwauke Bay Lake Michigan a confluence prior to discharging to Lake Michigan inside the Milwaukee Harbor breakwall. Visitors to Milwaukee's beaches on hot summer weekend days exceed 1,000 persons for all three public beaches combined: Bradford Beach, McKinley Beach, and South Shore Beach. 
South Shore Beach is home to a number of waterfowl and shorebirds because of its proximity to a public park and related green space. The beach also experiences blooms of the alga Cladophora, which is native to Lake Michigan and nearshore environments. In 1998 the City of Milwaukee Health Department (MHD) decided to develop a beach water quality predictive model to (1) improve water quality forecasting at the public beaches and (2) improve water quality advisories and related messaging to beachgoers when elevated bacteria levels make the water unsafe for swimming or other contact. In 2005 MHD implemented a different predictive model, and variations of that model are still in use today.

Water Quality
South Shore Beach has a history of poor water quality due to elevated fecal bacteria levels. Potential sources of fecal bacteria contamination include combined sewer overflows (CSOs); urban, suburban, and agricultural runoff from the Milwaukee River Basin; runoff from impervious surfaces, including South Shore Park parking lots, sidewalks and roadways, and marina infrastructure such as docks, slips, and boats; and domestic and wild animal populations, including flocks of Canada geese, gulls, and other waterfowl. The beach is directly adjacent to the South Shore Yacht Club and a small paved parking area that drains into the lake.

[Photo: Milwaukee Harbor (USACE).]

The Natural Resources Defense Council has included South Shore Beach several times on its list of the 10 dirtiest beaches in the United States. One possible contributor to the water quality problem may be the offshore breakwall (stone jetty), which was designed to block wave action and protect the lakefront from erosion. Unfortunately, it also limits the exchange of open-lake water with the shallow beach area.
Pollution that enters this relatively stagnant water through runoff near or around the beach is therefore not readily flushed out. To reduce pollution entering the lake, Milwaukee County installed a trench drain and rain garden along a parking lot near the beach, but these practices were ineffective. As a possible long-term solution for improving beach water quality during the summer season, the county is considering relocating the beach 100 yards south, to the other side of the breakwall.

Model Development
MHD has used two different models over time, the EMPACT model and the Nowcast model. Each is described in its own subsection.

EMPACT Model
In 1998 MHD developed a statistical model for three of its public beaches using funding awarded through the U.S. Environmental Protection Agency (EPA) Environmental Monitoring for Public Access and Community Tracking (EMPACT) grant. The model is based on 24-hour rainfall data and E. coli results (MPN/100 mL) from the previous 24-hour sampling period, which were the two most predictive variables. The University of Indiana and the U.S. Geological Survey (USGS) helped MHD develop the model. Key factors in selecting which beach model to further develop and refine were the amount of funding and the availability of technical support (for both data management and model development) that could be leveraged to improve predictive water quality outcomes. The EMPACT program significantly helped MHD take advantage of new technologies to provide reliable, accurate, near-real-time environmental risk information to the public.

When developing the model in 1998, MHD was initially excited about the opportunity to try new technology for improving the accuracy of water quality advisories; however, the project posed many unanticipated technical and maintenance challenges. To collect data for the model, USGS used a sonde.
A sonde is a water quality monitoring instrument that can measure numerous parameters, including temperature, conductivity, salinity, dissolved oxygen, pH, turbidity, and depth. The harsh lake environment proved unsuitable for long-term deployment of the instrument. Furthermore, MHD did not have sufficient internal capability or resources to manage the many sampling, data analysis, and routine equipment maintenance tasks. In addition, budget and staff cuts made the model too complex for a local public health agency with limited environmental health funding to sustain.

[Photo: South Shore Boat Park.]

Eventually MHD exhausted all funding and related external technical support, and it stopped using the EMPACT model as its primary predictive model at the end of the 2004 beach season. The EMPACT project nevertheless provided valuable insight into the challenges of developing cost-effective, sustainable predictive water quality models at the local level for Lake Michigan public beaches.

Nowcast Model
After the 2004 beach season, MHD decided that a simpler Nowcast model would be more effective and discontinued use of the EMPACT model. For the Nowcast model, the development team selected a single public beach in Milwaukee (South Shore Beach), where the monitoring equipment could be located near a secure power source, protected against vandalism, and shielded from harsh environmental conditions. South Shore Beach also traditionally recorded the highest fecal indicator bacteria (FIB) counts and therefore represented the highest potential health risk to the public during the typical beach season (June-August). It took MHD approximately 6 to 8 months to develop the Nowcast model, which was ready for use by the start of the 2005 beach season.
If the model were developed today, MHD could build it more efficiently, because better statistical and modeling software is more widely available and less costly to the end user.

Data and Variables
EMPACT Model
The initial variables MHD considered for the EMPACT model included total rainfall for the previous 24 hours, pH, conductivity, wave height, water temperature, and Escherichia coli densities from the previous 24-hour sampling period. MHD deployed a sonde in the water near the beach to collect real-time water quality data and used National Weather Service (NWS) data, which rely on geographically dispersed city weather stations and gauges, to derive daily rainfall totals. In addition, sanitary surveys (typically conducted annually by MHD) were useful in identifying and describing the site-specific attributes and pollution influences at each of the Milwaukee public beaches.

MHD used regression analysis to determine which independent variables were most strongly associated with, or predictive of, elevated E. coli counts at the public beaches on a seasonal basis. Predictive variables differed between beaches, but rainfall data were used in determining water quality advisories at all three. Total rainfall over the previous 24 hours was an important predictive variable in all three beach models, primarily because of contributions from (1) CSOs and diversions associated with wastewater treatment plant operations, (2) sanitary sewer cross-connections and infiltration, and (3) stormwater runoff. MHD continued collecting selected physical and chemical water quality data for use in beach water quality modeling through 2004.
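The regression step described above can be illustrated with a small, self-contained sketch. It is not MHD's actual model: the daily records are hypothetical, the helper name `fit_ols` is invented for illustration, and it assumes the common practice of log10-transforming E. coli densities before fitting. It fits an ordinary least squares equation of the form log10(E. coli today) = b0 + b1 × (previous 24-hour rainfall) + b2 × log10(previous E. coli result), using the two variables the EMPACT model relied on.

```python
from math import log10


def fit_ols(predictors, response):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    Solves the normal equations (X^T X) b = X^T y with Gauss-Jordan
    elimination, which is adequate for a few predictors.
    Returns the coefficient list [b0, b1, ..., bk].
    """
    X = [[1.0, *row] for row in predictors]  # prepend an intercept column
    n, k = len(X), len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * response[r] for r in range(n))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: move the row with the largest entry in this column up
        pivot = max(range(col, k), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical daily records: (previous 24-hour rainfall in inches,
# previous day's E. coli result, today's E. coli result), per 100 mL.
records = [
    (0.00, 40, 35), (0.10, 35, 60), (0.75, 60, 900),
    (0.20, 900, 300), (0.00, 300, 120), (1.10, 120, 1500),
]
X = [(rain, log10(prev)) for rain, prev, _ in records]
y = [log10(today) for _, _, today in records]
b0, b_rain, b_prev = fit_ols(X, y)

# Predicted E. coli for a day with 0.6 inch of rain after a result of 80
predicted = 10 ** (b0 + b_rain * 0.6 + b_prev * log10(80))
```

In practice a statistical package would normally do the fitting; the point of the sketch is that each day's prediction is just the fitted equation evaluated on that morning's rainfall total and the latest sample result, and the back-transformed value can then be compared with the advisory threshold.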
Beach design varies greatly and can determine both the magnitude and the duration of a pollution event (how much pollution enters and how long the beach takes to recover naturally). More specifically, at Milwaukee beaches, total rainfall was most highly correlated with bacterial contamination, and most predictive of water quality exceedances, when it exceeded one-half inch and occurred early in the beach season (June). Raw and summarized data were available daily or on request through a public website. MHD collected the data electronically via the sonde and, after review and analysis, posted them to the website for use by academic and research entities, the general public, and other interested parties (e.g., media and environmental groups).

Nowcast Model
The Nowcast model that MHD developed after the 2004 beach season is based primarily on regional precipitation, using the previous 24-hour rainfall total.

[Screenshot: Wisconsin Beach Health Advisory website.]
[Photo: Bradford Beach.]

The MHD model development team continues to explore and identify markers for nonpoint sources of pollution, including chemical biomarkers in stormwater discharge (e.g., caffeine and triclosan derivatives).
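The previous-24-hour rainfall total that drives both Milwaukee models, and the one-half-inch level noted above, can be computed from timestamped gauge readings as in the following sketch. The data layout (a list of timestamp and depth pairs), the function names, and the sample readings are hypothetical. The June check only records the early-season pattern the text describes; it is not a documented MHD rule.

```python
from datetime import datetime, timedelta

RAIN_THRESHOLD_IN = 0.5  # one-half inch, as described for Milwaukee beaches


def rainfall_total(readings, end, hours=24):
    """Sum rainfall depths (inches) with timestamps in (end - hours, end].

    readings: iterable of (datetime, depth_in_inches) pairs, for example
    hourly gauge totals (the data source and format are assumptions).
    """
    start = end - timedelta(hours=hours)
    return sum(depth for ts, depth in readings if start < ts <= end)


def rain_screen(readings, end, threshold=RAIN_THRESHOLD_IN):
    """Hypothetical screening summary for the morning of `end`."""
    total = rainfall_total(readings, end)
    return {
        "rain_24h_in": total,
        "above_threshold": total > threshold,
        "early_season": end.month == 6,  # June, per the text
    }


readings = [
    (datetime(2015, 6, 13, 8), 0.40),  # more than 24 hours before `end`
    (datetime(2015, 6, 14, 18), 0.20),
    (datetime(2015, 6, 15, 2), 0.35),
    (datetime(2015, 6, 15, 9), 0.10),
]
screen = rain_screen(readings, end=datetime(2015, 6, 15, 10))
```

Here only the three readings within 24 hours of 10:00 on June 15 count, so the total is about 0.65 inch, which is above the half-inch level. The same `hours` parameter can produce longer antecedent windows if a model needs them.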
Avian and waterfowl populations, as well as algae, have been noted but have not been particularly predictive of microbial contamination of public health significance. Cladophora blooms, however, have increased at each public beach over the past decade, causing primarily nuisance and aesthetic problems (e.g., objectionable odor and water discoloration). USGS and the Wisconsin Department of Natural Resources support all data collection and statistical analyses needed to develop and implement the Nowcast model. Most recently, MHD partnered with research faculty at the newly formed Zilber School of Public Health at the University of Wisconsin-Milwaukee (UWM) to improve Nowcast modeling and to identify other indicators of water contamination that predict, or are directly associated with, adverse public health effects.

Model Implementation
MHD used the EMPACT model exclusively for beach water quality advisory decisions from 1998 through 2004. However, the model predicted exceedances with a maximum accuracy of 60 to 70 percent, and only at one beach; at the remaining two beaches, accuracy often approached only 50 percent. MHD also noted that the model's predictive accuracy tended to decline at each beach as the summer progressed, which suggests some seasonality, or unidentified influences on water quality, between the early-season and late-season monitoring periods. As a result, MHD's confidence in sustaining the model diminished over time. MHD assessed the model's effectiveness by examining its sensitivity and specificity. The criterion for issuing advisories was exceedance of EPA's single-sample maximum or geometric mean threshold for E. coli, expressed in MPN/100 mL.

MHD uses the Nowcast model output for beach advisories.
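The evaluation described for the EMPACT model (flag an observed exceedance when a sample exceeds the single-sample maximum or the geometric mean threshold, then score the model's calls by sensitivity and specificity) can be sketched as follows. The thresholds are assumptions: 235 and 126 E. coli per 100 mL are EPA's 1986 single-sample maximum for designated beaches and geometric mean criterion, used here only for illustration. The geometric mean is taken over a simplified rolling window of the current and previous four samples rather than the sampling period a state standard specifies. Sample results and model calls are hypothetical.

```python
from math import exp, log

# Illustrative thresholds per 100 mL (assumption): EPA's 1986 E. coli
# criteria for a designated bathing beach. Substitute your program's values.
SINGLE_SAMPLE_MAX = 235.0
GEOMETRIC_MEAN_MAX = 126.0


def geometric_mean(values):
    """Geometric mean of positive bacteria densities."""
    vals = list(values)
    return exp(sum(log(v) for v in vals) / len(vals))


def observed_exceedance(sample, recent_samples=()):
    """True if the sample exceeds the single-sample maximum, or if the
    geometric mean of the recent samples plus this one exceeds the
    geometric mean threshold (a simplified rolling window)."""
    window = list(recent_samples) + [sample]
    return sample > SINGLE_SAMPLE_MAX or geometric_mean(window) > GEOMETRIC_MEAN_MAX


def sensitivity_specificity(predicted, observed):
    """Sensitivity = TP / (TP + FN): share of actual exceedances the model caught.
    Specificity = TN / (TN + FP): share of safe days the model called safe."""
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)
    fn = sum(1 for p, o in pairs if not p and o)
    tn = sum(1 for p, o in pairs if not p and not o)
    fp = sum(1 for p, o in pairs if p and not o)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


samples = [40, 300, 90, 150, 500]  # hypothetical daily E. coli results
observed = [
    observed_exceedance(s, samples[max(0, i - 4):i]) for i, s in enumerate(samples)
]
predicted = [False, True, True, False, True]  # hypothetical model calls
sens, spec = sensitivity_specificity(predicted, observed)
```

With these inputs, the observed exceedances fall on the second and fifth days. The model catches both (sensitivity 1.0) but also flags one safe day (specificity about 0.67). Reporting both numbers, rather than a single accuracy figure, shows whether a model errs toward false alarms or toward missed exceedances.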
Because model results are still less than optimal in predictive value, MHD relies on long-term data trends and overall environmental conditions (i.e., water temperature, multiple days of bacterial sampling results, and heavy rainfall) to refine the issuance of water quality advisories. MHD posts advisories for 24-hour intervals and uses the model and trending to determine when advisories can be lifted. MHD would like to see more visible, meaningful, and informative signs posted at each beach, including explicit messaging about illness risk and prevention. However, some key community policymakers and stakeholders (beach operators) are concerned that signs would detract from beach ambiance, tourism, and patron use. Current beach water quality advisory signage therefore remains limited in size, placement, and content.

Model Cost
The initial cost to develop the EMPACT model was in the range of $50,000-$75,000. The most costly aspects were siting, maintaining, and refining the beach sonde, because of the harsh Lake Michigan environment and MHD's lack of in-house capacity and expertise in this area. Overall, the model did not prove cost effective, due in large part to the cost of maintaining the sonde. Annual maintenance costs for the sonde ranged from $5,000 to $10,000, and equipment replacement and upgrades cost an additional $20,000-$50,000 every 2 years. Milwaukee's beach program currently has a budget of around $50,000. MHD is no longer using the sonde and has saved additional money by partnering with the Zilber School of Public Health at UWM, whose graduate students do some of the sampling and data collection. They have even been able to increase the sampling frequency to 5 times per week at each public beach over the season.
This represents a marked improvement over the 1-2 samples per week that had been collected since 2006.

Issues Encountered
For the EMPACT model, the sonde equipment was placed in a very harsh environment and required weekly maintenance. Security and data feed issues added to the challenges. MHD relied on external sources for maintenance and replaced equipment more often than originally anticipated. Beyond the problems with the sonde, MHD lacked sufficient funding to refine and sustain use of the model. The only statistical software MHD had in-house (Epi Info) was designed primarily for tracking the spread of communicable disease and managing outbreaks, so it was not useful for developing an environmental predictive model for a beach water quality monitoring program. MHD needed readily available, easy-to-use software with basic comparative analysis capabilities. Most public health agencies do not have such resources in-house, nor the technical familiarity to use them effectively. This often creates a knowledge gap and vulnerability in environmental data collection, analysis, trending, and interpretation.

The EMPACT model was not piloted or tested before implementation. In hindsight, MHD should have presented the model to the regional beach stakeholder group for feedback and beta testing. A more thorough comparison with other available models and methods during implementation would also have been helpful. Ultimately, MHD staff did not have sufficient knowledge and expertise to design, develop, implement, and evaluate a cost-effective, sustainable model.

Moving Forward
MHD has developed Nowcast models for each of the three public beaches in Milwaukee.
MHD developed the Bradford Beach Nowcast model in partnership with the Zilber School of Public Health at UWM and is working with Dr. Todd Miller and graduate students to conduct seasonal field sampling and monitoring. The MHD/UWM team collected water samples from Lake Michigan at three beach sites (Bradford, McKinley, and South Shore) from early June until late August 2015. UWM and MHD analyzed these samples for E. coli, and UWM also measured fecal coliform levels. In addition to fecal indicator bacteria, Dr. Miller's study is examining chemical markers in wastewater, specifically the identification of wastewater bacteria involved in degrading triclocarban. This approach has been shown to be very effective in predicting FIB exceedances at beaches. Dr. Miller is also examining temporal fluctuations in E. coli sampling, since morning and afternoon results may differ markedly. This collaboration has leveraged available local expertise and improved understanding of beach water quality as it relates to protecting community health.

The MHD/UWM team also recorded environmental conditions, including weather, rainfall, algae, litter, and wildlife, at each beach on every date they collected water samples. They will continue to translate and load the data into a database for long-term storage, analysis, and forecasting. Further work will automate data integration, translation, and loading. The team will explore developing a website with an appropriate secure interface to give researchers, other government agencies, and members of the public access to elements of the database and forecasting framework. The team continues to use the USGS EnDDaT service, NWS data, and sanitary surveys periodically conducted by MHD.
They believe that these readily available inputs will yield a more cost-effective model for MHD to use in determining seasonal water quality advisories at each public beach. The team no longer uses the sonde equipment, which has significantly reduced maintenance costs; it currently uses rainfall data and the previous day's E. coli concentrations. The new modeling effort was expanded to all three beaches in 2014, although Bradford Beach continues to receive significant attention. Bradford Beach is very popular, supports various recreational activities, including national volleyball tournaments, and has numerous beachfront attractions, including a pavilion, beachfront tiki bars, and recreational equipment.

Finally, the team hopes to refine the predictive model and generate more hypotheses about the contribution of various intermittent pollution sources at each public beach. For example, they have determined that birds and algal blooms were not particularly relevant factors at every beach, and that chemical markers in wastewater, along with sub-daily fluctuations in E. coli concentrations, may be more important in future predictive modeling. In 2016, MHD plans to partner with Dr. Todd Miller to pilot buoys equipped with various water quality sensors at each beach. They will evaluate whether the buoys can collect data relevant to beach water quality more rapidly, and they will use those data to refine existing models and improve predictive accuracy.

Advice and Lessons Learned
In 1998, the EMPACT beach predictive model that MHD developed and used was cutting-edge because it attempted to identify key environmental variables other than rainfall that would help predict elevated E. coli levels at three public beaches in Milwaukee.
It also pioneered the collection and analysis of selected real-time physical and chemical characteristics of beach water for use by local public health authorities in deciding whether to post beach water quality advisories. However, the model did not improve predictive accuracy over simply using the previous 24-hour rainfall, nor was it cost effective. The project did, however, provide valuable information about the unique characteristics of each beach site in Milwaukee, and it led MHD to continue exploring the more scientific, evidence-based approaches that are important to successfully developing, testing, implementing, and evaluating future predictive models.

Overall, the struggles with the initial EMPACT model had a major impact on MHD's beach monitoring program, but MHD does not regret the process because of how much it learned. MHD understands that citizens expect it to protect public health; therefore, it needs tools that provide the best available information and meet the needs of each community. The model must be a good fit for the local public health department. In MHD's case, this meant a simple, low-maintenance, user-friendly model that allows it to share accurate health information with the public. Earning and keeping the public's trust is very important, so false positives and other errors must be minimized.

Paul Biedrzycki of MHD also offers the following advice to fellow beach managers:
1. Conduct a broad stakeholder planning and review process.
2. Review evidence-based best practices from other jurisdictions and research studies.
3. Build "buy-in" from local policymakers for resource allocation (program funding).
4. Develop quality assurance and quality control criteria.
5. Anticipate resources needed for sustainability.
6. Conduct independent evaluation and review.
7. Conduct a thorough piloting/testing phase before implementation.

In general, local public health departments have increasingly limited resources for extensive or comprehensive environmental health assessments. The public health sector is expected to face significant budget cuts at the local, state, and federal levels in the near future. While sustainability and green movements have provided some moderate additional community resources, governments are not growing, and state budgets and revenue sharing with local governments are shrinking. Collaboration and information sharing among entities are therefore essential if recreational water quality monitoring programs are to continue. Partnerships between and within states, and among a diverse group of stakeholders (e.g., environmental groups, universities, community organizations, and federal agencies), must be fostered and encouraged.

References
USEPA (U.S. Environmental Protection Agency). 2010b. Predictive Modeling at Beaches. Volume II: Predictive Tools for Beach Notification. EPA-600-R-10-176. U.S. Environmental Protection Agency, National Exposure Research Laboratory, Athens, Georgia.
Biedrzycki, Paul, Disease Control and Environmental Health, City of Milwaukee Health Department. 2012. Personal communication.

Case Study: Charles River Watershed Association Flag Program (Boston, Massachusetts)

Introduction
The Charles River, which flows about 80 miles from Hopkinton, Massachusetts, to its terminus in Boston Harbor, is one of the busiest recreational rivers in the country. On a typical summer weekend, the river attracts tens of thousands of people in a large and often colorful array of vessels, including canoes, kayaks, dragon boats, sailboats, fishing boats, and rowing shells.
Unfortunately, given the urban development along the river (it runs through 23 cities and towns), a variety of pollution sources, including combined sewer overflows (CSOs), cause water quality problems, especially in the Lower Basin, the approximately 9-mile stretch from the Watertown Dam to the New Charles River Dam. In 1998, the Charles River Watershed Association (CRWA) initiated a flag program, flying color-coded flags to alert people to water quality conditions in the Charles River Lower Basin. This case study explores CRWA's efforts to build a scientific foundation for the flag program by developing a water quality model.

Water Quality
In 1995 the U.S. Environmental Protection Agency (EPA) established the Clean Charles Initiative to restore the Charles River and make it fishable and swimmable. Much progress has been made, thanks to the collaborative efforts of EPA; other federal, state, and local government agencies; nonprofit groups; private institutions; and the public. However, more work remains. Stormwater runoff and CSOs remain a particular concern. While water quality is usually sufficient for boating and other secondary contact activities, swimming and other activities involving continuous full-body contact are not recommended because bacteria levels exceed primary contact standards.

[Photo: The Charles River.]

Model Development
CRWA was founded in 1965 to spearhead projects aimed at cleaning up the Charles River. Conditions improved over time, allowing more people to use the river safely for secondary contact recreation; however, the river remained impaired for bacteria, especially during wet weather.
Therefore, in 1998 CRWA, in a joint project with Tufts University and with funding from EPA, began developing a statistical model that predicts the likelihood of a violation of the state boating standard in the Lower Charles River Basin. One of the project's goals was to forecast and publicize daily water quality conditions. The Lower Charles River Basin does not have a swimming beach, but it is the busiest section of the river, and secondary contact recreation there continues to expand.

CRWA initially developed two different statistical models and adopted the one that performed better. It took a few years to build a data set of indicator bacteria results large enough to develop the model. A former staff member developed the original model as part of a master's thesis at Tufts University. To select the model's variables, the project team reviewed the literature on similar projects, with the major constraint that the data used had to be readily available on a daily basis. The best predictive variables were rainfall volume, river flow, and wind. The project team used the ordinary least squares (OLS) method in Minitab to develop the regression model, and they used Microsoft Excel to run the equation daily. In 2009, a CRWA intern who had recently received a master's degree, overseen by Julie Wood, updated the model to account for changes in the availability of real-time data and for the switch from fecal coliform to Escherichia coli as the primary indicator bacteria in state water quality standards.

Model Implementation
In 1998 CRWA began flying color-coded flags to alert people about water quality conditions in the Charles River Lower Basin.
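The daily step described above, evaluating the fitted OLS equation and expressing the result as a likelihood of violating the boating standard, can be sketched as follows. The source does not say how CRWA converts its OLS output into a probability. This sketch assumes one common approach: treat the regression errors in log10 units as normally distributed with a known residual standard deviation. The predicted value, residual standard deviation, and function names are hypothetical. CRWA flies a red flag when the probability of exceeding the 630 cfu/100 mL boating standard is 50 percent or greater; the algae and CSO conditions that also set flag colors are left out.

```python
from math import log10
from statistics import NormalDist

BOATING_STANDARD = 630.0  # E. coli cfu/100 mL, the state boating standard


def exceedance_probability(predicted_log10, residual_sd, standard=BOATING_STANDARD):
    """Probability that the true log10 density exceeds log10(standard),
    assuming normally distributed regression errors in log10 units."""
    z = (log10(standard) - predicted_log10) / residual_sd
    return 1.0 - NormalDist().cdf(z)


def bacteria_flag(probability):
    """Bacteria-only part of the flag rule: red at 50 percent or more.
    The algae and CSO conditions that also trigger flags are not modeled."""
    return "red" if probability >= 0.5 else "blue"


# Hypothetical daily run: the fitted OLS equation predicts log10(E. coli) = 2.6
# and the model's residual standard deviation is 0.45 log10 units.
p = exceedance_probability(2.6, 0.45)
print(f"P(exceed) = {p:.2f} -> {bacteria_flag(p)} flag")
```

A prediction exactly at the standard gives a probability of 0.5, so the 50 percent rule amounts to flying a red flag whenever the model's central estimate reaches 630 cfu/100 mL. The residual standard deviation controls how quickly the probability moves away from 0.5 as the prediction moves away from the standard.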
Flown from July through October at select shore locations between Watertown and Boston Harbor, CRWA flags inform boaters about E. coli bacteria levels and blue-green algae blooms. Specifically:
• A blue flag indicates CRWA's forecast that the likelihood of bacteria exceeding the boating standard is less than 50 percent and that a blue-green algae bloom is not present.
• A yellow flag indicates that health risks are possible, but data are inconclusive to predict risks with certainty. Yellow flags are flown when signs of a blue-green algae bloom are present but the actual human health risk is unconfirmed or unknown.
• A red flag means that the probability of the river exceeding the boating standard is equal to or greater than 50 percent, or that a health risk is present because of a confirmed blue-green algae bloom. Red flags are also flown for 48 hours following a reported CSO. (Unfortunately, only 1 of the 11 active CSOs in the Charles River Lower Basin provides real-time overflow notifications.)
The decision on which color flag to fly is based on the results of a mathematical model that uses rainfall and other weather factors, along with river conditions, to estimate the probability of the river exceeding the state secondary contact recreation (boating) standard of 630 E. coli colony forming units per 100 milliliters of water (cfu/100 mL). In addition to running the model, CRWA collects weekly water samples to help verify model predictions and to add to its database of water quality information. Over the past 15 years, CRWA has continued to enhance the model; water sampling has confirmed an accuracy rate of about 90 percent for predicting water quality violations. The program provides daily advisory information and allows river users to make more informed decisions about recreating on the river on that day. The program is not used for enforcement actions, and the river is never closed to the public on the basis of model results. The model-generated advisory information continues to be communicated to the general public through color-coded flags and through email, CRWA's website, Twitter, and a telephone hotline. Eleven facilities fly the color-coded flags along the river, providing a valuable public service. These facilities include yacht clubs, boating centers, canoe and kayak outfitters, and Harvard University's famed Weld Boathouse.
[Photo: Red advisory flag indicating potential health risks.]
[Image: Charles River Watershed Association Flag Program website.]
Model Costs
Key costs for model development included labor and sample analyses. Labor to collect and compile online data was the most significant cost. In some cases, older weather data had to be purchased. Collecting free data and organizing it into a usable format can be time-consuming, especially when it must be formatted to work with a specific statistical software package. Collecting and analyzing E. coli samples also required staff time and lab costs of about $30 per sample. CRWA collects four samples once or twice each week to verify its model predictions. Before implementing the model, CRWA monitored at least twice a week and as often as daily. Since implementation, CRWA has been able to reduce monitoring frequency to weekly when funding is limited. CRWA believes that the cost of the model is offset by the value of the daily water quality notifications for public health and safety.
Issues Encountered
Challenges associated with model development included the following:
• Choosing input variables that were easily available daily. These include rainfall volume (previous 24, 48, 72, and 168 hours), wind speed, time since a specific rainfall volume (more than 0.01 inch; more than 0.1 inch), flow, and solar radiation. These data are available from the National Oceanic and Atmospheric Administration (NOAA) or the U.S. Geological Survey.
• Building a database of E. coli concentrations for model calibration and verification.
• Meeting the needs of all users.
• Working with a limited budget.
The biggest challenge that CRWA faced in the development phase was the availability of data to test predictive factors. CRWA did not collect any real-time data, so it could use only what was available on the Web. Consequently, CRWA had to rely on other organizations to continue collecting the data and publishing it in a timely manner. Data availability continues to be a challenge in the implementation phase. The model is run every morning at 8 a.m. during the recreational season when data are available. CSO discharge data are usually not collected; CSOs are not part of the statistical model, but any available discharge information is incorporated into the notification protocol. Developing and employing a model is time-consuming. CRWA runs its model Monday through Friday from July through October. On Friday afternoons, CRWA provides a weekend forecast using model simulations based on weather predictions. CRWA has discussed running the model on weekends and has done so on occasion; however, this is logistically challenging because most of the staff work only Monday through Friday. Because the model is run only once a day, around 8 a.m., its utility is limited for the many river users (primarily scullers) who are on the river in the early morning. Additionally, the model is not updated throughout the day, although in reality water quality conditions change continuously. Finally, because the model is not run on weekends, accurate information is not available to weekend users.
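The daily input variables listed above (rainfall over the previous 24, 48, 72, and 168 hours, and time since rainfall above 0.01 or 0.1 inch) can be derived from an hourly rainfall record. The sketch below is a minimal illustration, not CRWA's actual spreadsheet. It assumes rainfall arrives as a hypothetical dictionary of hourly totals in inches keyed by the start of each hour, and it applies the rainfall thresholds to hourly totals; the source does not say whether the thresholds refer to hourly or daily amounts, so that choice is an assumption.

```python
from datetime import datetime, timedelta


def antecedent_rainfall(hourly_rain, now, windows_h=(24, 48, 72, 168)):
    """Total rainfall (inches) in each look-back window ending at `now`.

    hourly_rain: hypothetical input format, a dict mapping the start time
    of each hour (datetime) to that hour's rainfall total in inches.
    """
    totals = {}
    for window in windows_h:
        start = now - timedelta(hours=window)
        totals[f"rain_{window}h"] = sum(
            depth for hour, depth in hourly_rain.items() if start <= hour < now
        )
    return totals


def hours_since_rain_above(hourly_rain, now, threshold_in):
    """Hours from the start of the most recent hour with rainfall above
    `threshold_in` inches to `now`; None if no such hour is on record.

    Assumption: the threshold is applied to hourly totals.
    """
    wet_hours = [hour for hour, depth in hourly_rain.items()
                 if depth > threshold_in and hour < now]
    if not wet_hours:
        return None
    return (now - max(wet_hours)).total_seconds() / 3600.0


# Usage with made-up rainfall values for an 8 a.m. model run.
now = datetime(2012, 7, 10, 8)
rain = {now - timedelta(hours=h): depth
        for h, depth in [(3, 0.05), (30, 0.20), (100, 0.40)]}
features = antecedent_rainfall(rain, now)
features["hours_since_0.01in"] = hours_since_rain_above(rain, now, 0.01)
features["hours_since_0.1in"] = hours_since_rain_above(rain, now, 0.1)
print(features)
```

Computing every feature from the same hourly record keeps the inputs consistent whether the model is run once each morning or on an automated schedule.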
[Photo: Weld Boathouse at Harvard University flying a blue flag indicating suitable boating conditions.]
Moving Forward
In 2012 CRWA added two boathouse locations where flags are flown (12 sites total) to provide more complete coverage of the area. CSOs are a major challenge to maintaining the river's water quality. Under the CSO control plan for Boston, some CSOs may remain in the long term; some have added primary treatment and notification, but several have not. A goal and priority for CRWA is to continue to reduce CSOs significantly and to notify the public promptly when CSO discharges occur. Recreation continues to expand in the watershed and might include swimming in the future if water quality improves. Real-time modeling is expected to help document improving water quality and to serve as a notification tool for water-based activities in the Charles River. CRWA is collaborating with the Coastal Environmental Sensing Network (CESN) at the University of Massachusetts Boston. CESN established a real-time weather station and wrote a program that continuously feeds data from the weather station, along with flow data, into the model. The station went online in August 2012; the group has verified data starting in September 2012. So far, the group has eight overlapping sampling points with weather station data for October and three for September. The group completed the analysis of overlapping data during the 2013 season. Running the model with inputs from this new weather/water quality station is working well. Model accuracy using inputs from this station has improved compared with the current system because the model is automatically updated every hour based on the most recent data.
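At each update, an automated run has to turn the model's prediction into an exceedance probability and then into a flag color using the blue, yellow, and red rules described earlier in this case study. The sketch below shows one way to do that. It assumes, as is common for bacteria regression models but not confirmed for CRWA's model, that the regression predicts log10(E. coli) and that its residuals are normally distributed, so the probability of exceeding the 630 cfu/100 mL boating standard follows from the normal distribution. The residual standard deviation and predicted values in the usage lines are hypothetical, not CRWA model parameters.

```python
import math

BOATING_STANDARD = 630.0  # E. coli cfu/100 mL, state boating standard


def exceedance_probability(predicted_log10, residual_sd,
                           standard=BOATING_STANDARD):
    """Probability that E. coli exceeds `standard`.

    Assumes (hypothetically) that the regression predicts log10(E. coli)
    and that its residuals are normal with standard deviation residual_sd.
    """
    z = (math.log10(standard) - predicted_log10) / residual_sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # equals 1 - Phi(z)


def flag_color(p_exceed, bloom_confirmed=False, bloom_signs=False,
               hours_since_reported_cso=None):
    """Map model output and field observations to a flag color,
    following the red / yellow / blue rules described in this case study."""
    recent_cso = (hours_since_reported_cso is not None
                  and hours_since_reported_cso < 48)
    if p_exceed >= 0.5 or bloom_confirmed or recent_cso:
        return "red"
    if bloom_signs:  # bloom suspected but health risk unconfirmed
        return "yellow"
    return "blue"


# Usage with hypothetical model outputs (not CRWA parameters).
p_wet = exceedance_probability(predicted_log10=3.1, residual_sd=0.5)
p_dry = exceedance_probability(predicted_log10=2.0, residual_sd=0.5)
print(round(p_wet, 2), flag_color(p_wet))
print(round(p_dry, 2), flag_color(p_dry))
print(flag_color(p_dry, hours_since_reported_cso=12))
```

Keeping the probability calculation separate from the flag rules means the same functions could serve a once-a-day run or an hourly job fed by a weather station.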
Although CRWA does not have additional resources to put toward real-time data collection, the group would like to develop a real-time model; continuing to collaborate with the university will make this goal more realistic. CRWA also hopes to add other parameters, such as turbidity, to the model. Because the Charles River hosts a wide variety of recreational activities, a real-time model would notify the public of water quality conditions more effectively. For example, water quality forecasts go out at 9 a.m. (based on NOAA updates at 8 a.m.), but rowers are on the water at 5 a.m., well before any water quality notifications are available. Real-time forecasting capabilities would greatly improve the program. Unfortunately, the long-term outlook for the project depends on the resources CESN and CRWA have available to maintain the weather station and the real-time data feed to the model.
Advice and Lessons Learned
In light of the experience and success of CRWA's modeling efforts, Julie Wood of CRWA recommends that beach managers "go for it" with regard to developing their own models. The model does not have to be complicated; a simple regression model can be effective in many systems to broadly predict possible risk. It is also important to consider the availability of staff to run the model and post notifications, since that affects how often the model can be run. Running the model on weekends can be especially challenging. Overall, resources are a major factor when developing and implementing predictive models. Based on their experience with the CESN station, the CRWA team recommends that model developers select a model that can be automated and run continuously in real time based on data readily available on the Web.
You will still need staff to collect samples to verify the forecasts, but you will not need staff to run the model. This type of automated model can run every day of the week and early in the morning, providing water quality predictions based on the most current data. This would help meet public expectations for real-time nowcasting at very fine timescales.
References
Charles River Watershed Association. Charles River Water Quality Notification Flagging Program. http://www.crwa.org/field-science/water-quality-notification.
Eleria, A. and R.M. Vogel. 2005. Predicting fecal coliform bacteria levels in the Charles River, Massachusetts, USA. Journal of the American Water Resources Association. No. 03111. October 2005.
Wood, Julie. Charles River Watershed Association. Personal communication. 2012-2013.
Case Study: Chicago Park District Beach Modeling (Chicago, Illinois)
Introduction
Chicago's 26 miles of shoreline along Lake Michigan provide residents and visitors with many water-based recreational opportunities. Especially popular are a series of 24 beaches owned and managed by the Chicago Park District (CPD). Over 20 million people visit these beaches each year between Memorial Day weekend and Labor Day to swim and enjoy the sand, sun, and scenery. CPD's mission with these beaches, as with all its parks, is to provide a customer-focused experience that prioritizes and responds to the safety and needs of children and families. To help provide a safe beach environment, CPD developed a system of colored flags to communicate swim status at the beaches. A green flag means that weather conditions and water quality are good and swimming is permitted. A yellow flag indicates that swimming is permitted but beachgoers are cautioned that weather conditions are unpredictable and/or water quality does not meet state swimming standards.
A red flag indicates that swimming is not permitted because weather or water quality is causing unsafe or dangerous conditions. In general, the lifeguards stationed at each beach are responsible for monitoring weather conditions and changing swim status when necessary. However, while beachgoers can usually recognize unsafe weather conditions such as high waves and lightning, unsafe water quality conditions are not nearly as obvious. Currently, CPD's decision to change swim status due to water quality is based on two complementary approaches: (1) analysis of water samples and (2) a computer model that uses weather, hydrology, and water condition data to predict real-time water quality.
[Map: Chicago Park District beaches along Lake Michigan, including Montrose Beach, Washington Park, Jackson Park, and 63rd Street Beach.]
Water Quality
Most water quality problems found at CPD's beaches can be linked to nonpoint sources of pollution originating in the small watersheds along the shoreline. Runoff from roadways, parklands, and other nearshore land areas collects and drains to the lake through a network of stormwater outfalls. Chicago's human sewage is not directed into Lake Michigan except during extreme storm events, when the locks that separate the Chicago River system from Lake Michigan are opened to minimize or prevent flooding. CPD believes that the relatively large resident gull and Canada goose populations are among the most significant contributors to the pollution load at the beaches. In response, the District has initiated various programs to discourage their presence, including prohibiting feeding and using uniformed border collies to chase birds off the beaches.
[Photo: Uniformed border collie chasing birds off the beach.]
As at most actively managed freshwater beaches, CPD routinely collects water samples and has them analyzed in a laboratory for E. coli. Samples are collected at each beach every Monday through Friday during the swimming season. Additional samples are collected on weekends if weekday results show high levels of E. coli. CPD's sampling program follows U.S. Environmental Protection Agency (EPA) guidelines and protocols for water collection and laboratory analysis of E. coli concentration. The bacteria culture process (Colilert method) takes 18-24 hours to complete; consequently, sample results are not available until the day after the samples are taken. If E. coli levels are above the state's water quality standard of 235 CFU/100 mL, the water is considered unsafe for swimming. CPD then notifies the public through its website and other outlets, posts an advisory at the beach, and changes the swim status flags. Fortunately, when E. coli levels are above the 235 CFU/100 mL standard, the next day's sample results are usually below it. This is partly because the large open shoreline encourages water circulation between shore waters and deeper offshore waters, so bacteria that enter most beach areas during and after storms are dispersed and flushed away from nearshore areas fairly rapidly. However, beaches that are sheltered in an embayment or protected by piers or seawalls often do not circulate their water as freely and sometimes experience more persistent high bacteria levels, with swimming advisories lasting multiple days. The fact that high FIB levels at most Chicago beaches last only a day underscores the problem of having at least an 18-hour lag between sample collection and laboratory results.
Beachgoers unknowingly swim in water with high FIB levels on the day the water sample is collected, and are then advised not to swim the following day, when levels are usually safe based on the analysis of that day's sample. This lag-time problem led CPD to explore developing a predictive mathematical model so that beach managers could make more timely decisions about swim status and thus better protect the health of the beach-going public.
Model Development
CPD began the predictive modeling project in 2011 with the assistance of the U.S. Geological Survey (USGS) and a $243,000 Great Lakes Restoration Initiative (GLRI) grant. Together the agencies selected a group of weather-related parameters that could potentially be incorporated into the model. They then developed and deployed buoys for in-water measurements and pole-mounted weather stations near the beaches to monitor atmospheric conditions. Given resource limitations, CPD decided to initially focus on Chicago beaches that (1) most frequently exceeded the E. coli criteria and (2) had the highest beach attendance. Five beaches were eventually selected for the modeling exercise, ranging from the largest beach in size (Montrose Beach) to one of the city's most popular (Oak Street Beach). The other three were Foster Beach, 63rd Street Beach, and Calumet Beach. All five are primarily affected by nonpoint sources of contamination and have had E. coli exceedance rates between 8 and 15 percent (the percentage of days when the mean of two samples exceeds 235 CFU/100 mL) over the last few years. Attendance at these beaches ranged from approximately 100,000 to several million visitors per swimming season.
[Photo: Foster Beach.]
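The exceedance-rate definition and the time-lag problem described above can be made concrete with a few lines of code. This sketch uses hypothetical sample values, not CPD data. It assumes the "mean of two samples" is an arithmetic mean, and it simulates the culture-method lag by basing each day's advisory on the previous day's result.

```python
STANDARD = 235.0  # E. coli CFU/100 mL, the standard cited above


def exceedance_days(sample_pairs, standard=STANDARD):
    """True for each day whose mean of two samples exceeds the standard.

    Assumption: "mean" is taken as the arithmetic mean.
    """
    return [(a + b) / 2.0 > standard for a, b in sample_pairs]


def exceedance_rate(sample_pairs, standard=STANDARD):
    """Percentage of sampled days that exceeded the standard."""
    days = exceedance_days(sample_pairs, standard)
    return 100.0 * sum(days) / len(days)


def lagged_advisory_errors(sample_pairs, standard=STANDARD):
    """Simulate advisories that rely on the previous day's culture result.

    Returns (advisories posted on days that met the standard,
             exceedance days with no advisory posted).
    The first day is skipped because no earlier result exists.
    """
    days = exceedance_days(sample_pairs, standard)
    false_alarms = missed = 0
    for yesterday, today in zip(days, days[1:]):
        if yesterday and not today:
            false_alarms += 1
        elif today and not yesterday:
            missed += 1
    return false_alarms, missed


# Hypothetical ten days of paired samples with two one-day spikes.
pairs = [(40, 60), (120, 90), (500, 310), (80, 100), (30, 50),
         (60, 70), (900, 400), (110, 90), (45, 55), (70, 80)]
print(exceedance_rate(pairs))          # percent of exceedance days
print(lagged_advisory_errors(pairs))   # (false alarms, missed exceedances)
```

With one-day spikes like these, a next-day advisory misses every exceedance day and is posted on every clean day that follows one, which is the pattern CPD describes.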
Model Development Key Components
Technical and Financial Resources
USGS was instrumental in getting the project off the ground. USGS helped select the monitoring equipment, trained staff to use and maintain it, provided guidance on developing the model, and performed statistical analyses. The Lake County Health Department, which already has experience implementing a predictive model program, also provided expertise during model development. In addition, several presentations at the Great Lakes Beach Association conferences provided CPD staff with options and a variety of potential methods for developing the model.
[Photo: Installing monitoring equipment at Chicago beaches.]
The models were developed using multivariate regression analysis. USGS selected variables by identifying those that fit best statistically. USGS considered including gull counts but found that this information was difficult to use and implement in the context of the model. CPD used its own resources to deploy the monitoring equipment, including scuba divers, electricians, and heavy equipment such as boats and a bucket truck for installing the weather stations on light poles. If CPD had contracted out the installation, costs would have increased significantly. Currently CPD funds data collection and equipment maintenance but continues to rely on USGS to perform statistical analyses. CPD could hire contractors to complete this work, but few would have the necessary depth of understanding of Lake Michigan ecology. CPD spent approximately a year and a half developing the first models and expects to change and improve them with additional data in the future. CPD initially anticipated needing two years of data to develop working models because results depend strongly on the weather.
The Chicago area has very different beach seasons from year to year; therefore, a larger data set will help improve the model's accuracy.
Data Resources
When developing the model, CPD relied on daily weather and water quality data, along with water quality data collected as part of CPD's existing beach monitoring program. CPD also considered data collected during daily sanitary surveys for model development. USGS explored whether other data sources, such as those from the National Oceanic and Atmospheric Administration (NOAA), might be useful, but did not use NOAA or other external data because these sources did not work as well. For example, NOAA data come from farther offshore, whereas Chicago's beaches are man-made and have many structures in place, so they require detailed on-site data.
[Image: "Come Out and Play!" Chicago Park District beach notification website.]
Model Implementation
Public Involvement
CPD did not involve the public during the initial phases of model development because the information was too technical. However, CPD conducted significant public outreach to inform people about implementation efforts. All data were made available to the public via the CPD website. CPD also posted information about how the model worked, how advisories work, what changes would occur, and how these changes would improve public health. There was a lot of media interest, which gave CPD the opportunity for interviews with numerous newspapers and news stations. Although the public could submit questions and comments via the website or a hotline, CPD received little feedback, except occasionally when there were unusual data or equipment malfunctions.
[Image: Chicago Park District web page for Calumet Beach showing swim status, the model forecast for today, and the most recent test result.]
Model Output and Validation
The key variables CPD used for these models include the following:
• Air temperature.
• 6-hour solar radiation.
• 4-hour wave period.
• Longshore (NNW) wind.
• 6-hour longshore (NW) wind.
• 6-hour rainfall.
• 48-hour rainfall.
• 4-hour log wave period.
• Day of year.
• 4-hour onshore wind.
• 4-hour log turbidity.
• 4-hour log wave height.
Each model used a different combination of these variables. CPD conducted routine sampling throughout the 2012 beach season to collect data for validating the models, comparing actual sample results with modeled results to check the models' accuracy. CPD reports predicted values and will continue to refine the models over the next several beach seasons. Although model accuracy fluctuated between years, CPD is confident that the advisories they issued on the basis of modeled results were more accurate than they would have been without the model. With regard to confidence in model results, CPD remains "cautiously optimistic." CPD is assessing the effectiveness of the model by evaluating whether more Type 1 and Type 2 errors would have been generated by relying only on traditional water testing and waiting 24 hours for results. Currently, if the model predicts a bacteria level over 235 CFU/100 mL, CPD issues an advisory. CPD also posts the most recent lab results from traditional water testing at each beach. If the test results and the model do not agree, CPD uses the model to determine the advisory status.
Implementation
CPD began using the model in 2012 to make management decisions on notification actions. They monitor all beaches every weekday.
They also monitor on weekend days following a Friday exceedance or when the model predicts an exceedance on the weekend. CPD runs the models at 9:00 a.m. and issues advisories by 9:30 a.m. If the model shows no exceedance, CPD posts a green flag. The public can view both model results and sampling values at the beach, on the website, or by calling a hotline.
Model Costs
The $243,000 GLRI grant provided the bulk of the financial resources for the project. CPD also set aside $50,000 in its capital budget to help purchase equipment in the first year (2011) and $25,000 in capital funds to increase the amount of equipment in 2012. In addition, CPD spent about $120,000 in 2011 on water sampling at all Chicago beaches. Most of these costs would have been incurred without the modeling project; the extra sampling for modeling cost about $15,000. The most costly aspects of the modeling process were the purchase of equipment and USGS support. Equipment costs were approximately $70,000. Monthly bills for cellular data were about $3,000, covering data transmitted by eight cellular modems. Obtaining water quality data (FIB testing results) did not cost extra because this work would have been done regardless of the models. For reference, however, lab costs for water quality sampling were about $100,000, and personnel costs for water sampling were approximately $20,000 annually. The grant was funded in the fall of 2010 and continued through 2013. A large portion of the funds was used to purchase and install the equipment and for USGS statistical analysis. Some grant funds remain; these will be used to offset ongoing costs (maintenance, statistical analyses, etc.). Currently, CPD relies on internal funding, which could decrease in the future. When determining the overall cost effectiveness of the model, CPD concluded that they would save money only if sampling were reduced.
CPD does not currently plan to reduce sampling; however, if BEACH Act funding is cut, sampling would be affected significantly because fewer resources would be available. For CPD, the bottom line is, "How do you put a price on better information?"
[Photo: Montrose Avenue dog beach.]
Issues Encountered
CPD had their share of issues with field equipment, including equipment damaged by rough weather. They adjusted the anchoring scheme for the buoys, which helped, and have eliminated some buoys, although some equipment issues continue and the buoys are expensive to maintain. Looking back, CPD might have selected a different anchoring system to ensure that the equipment remained in place.
Moving Forward
CPD intends to keep moving forward with their models. CPD has already invested over $75,000 of department funding in the modeling program, which shows confidence in the model's effectiveness. CPD has expanded modeling to other beaches since 2011, and for the 2015 season predictive models were used at all 24 of the city's beaches. In addition, they have substantial resources going into mitigation practices. They are also developing better information and methods to address non-anthropogenic sources of bacteria such as shorebirds. During the initial year of data collection, CPD increased sampling frequency to twice per day at the modeled beaches. As of 2015, USGS and Michigan State University have helped validate and update the models annually. Some beaches with higher exceedance rates have been difficult to model, so CPD is prioritizing rapid methods at these beaches. CPD also conducts public outreach about beach water quality. They implemented a new texting service that allows beachgoers to text the name of their beach to a dedicated number and receive an automatic response with the current beach conditions.
A public education campaign encourages people not to litter or feed wildlife, since waste from seagulls and geese has been shown to be a major source of fecal bacteria in the water. The campaign also includes signage on Chicago public transit, posters at beaches, and a large mural at one of Chicago's busiest beaches. A new Beach Ambassadors program uses direct public outreach to ask beachgoers not to litter or feed wildlife, and expanded programming in CPD's summer day camp educates kids on what they can do to keep the water clean. Finally, CPD is working to reduce bacteria sources directly. New grooming equipment removes debris and exposes wet sand to sunlight, killing bacteria. At beaches with a history of problems from seagull waste, CPD uses dog handlers and trained border collies to chase the gulls from the beach. This project has significantly reduced the number of days on which FIB levels exceeded water quality standards.
Advice and Lessons Learned
Sanitary survey data were tested in 2010, but it was determined that more accurate and timely (buoy-based) data were needed for the models. While daily sanitary survey data are helpful for monitoring operations such as garbage collection and beach grooming, and for keeping track of pollution sources, survey parameters are not included in the models; the models are all based on data collected by sensors. The success of a model depends on a number of factors. For CPD, the most important factor was the presence of nonpoint versus point sources of pollution. You need comprehensive knowledge of the beach before you can successfully develop a predictive model. Cathy Breitenbach of CPD noted that CPD is a large jurisdiction with many resources.
They were able to do all equipment maintenance and monitoring in-house and did not have to hire or rely on outside support. Without this capacity, they would not have been as successful, especially considering that Chicago is a big city with a large beach-going population to protect. Other agencies that want to develop a model must have access to the funding and technical resources necessary to collect data and conduct statistical analyses. If their jurisdiction is small, however, they can likely develop and implement a predictive model at a lower cost.
References
Breitenbach, Cathy. Chicago Park District. Personal interview. 2012.
Chicago Park District. 2012. Chicago Park District Improves Beach Monitoring for 2012 Season. http://www.chicagoparkdistrict.com/chicago-park-district-improves-beach-monitoring-for-2012-season.
Fulton, Jeff. No date. Public Beaches in Chicago. USA Today. http://traveltips.usatoday.com/public-beaches-chicago-53741.html.
Case Study: City of Racine Nowcast Model (Racine, Wisconsin)
Introduction
How does a small coastal Wisconsin city of about 79,000 residents reel in a "Best Beach in the State" title? One reason might be its cutting-edge approach to staying on top of water quality. The City of Racine, on Lake Michigan between Milwaukee and Chicago, manages two popular swimming beaches, North Beach and Zoo Beach. At 50 acres, North Beach is the larger of the two. In 2012, USA Today named it the best beach in Wisconsin, joining 50 other beaches similarly selected from the other states and the District of Columbia. This honor adds to a long list of accolades for North Beach, including a Top 10 Family Friendly Beach designation by Parents magazine in 2011 and a place on Midwest Living magazine's Top City Beaches list in 2010. North Beach has medium- to fine-grained sand and is groomed to remove trash and aerate the sand. The swim area has a fairly shallow slope (2 to 5 percent), and the beach has a 1 to 1.5 percent slope toward the water.
A harbor break wall increases swimming safety by keeping waves in check. The beach face is kept at a steep grade to prevent waves from spilling over the berm crest. The city maintains restrooms, a bathhouse, a concession stand, and an adjacent playground, and hires lifeguards to ensure public safety. Weekend visitor numbers can exceed 11,000, and daily attendance averages up to 2,200 people during the swimming season.
[Map: North Beach and Zoo Beach on Lake Michigan in Racine, with Douglas Park, Lake View Park, and North Beach Park; scale bar to 0.5 miles.]
Zoo Beach, adjacent to and north of North Beach, is smaller and less developed and attracts fewer beachgoers than North Beach. Named for the adjacent Racine Zoo, it has fewer access points and amenities. Lifeguards are on duty only on weekends. The swim area has a steep drop-off and no break wall, so wave action is more intense. As a result of these differences, Zoo Beach offers visitors a quality beach experience with beautiful views of Lake Michigan in a more peaceful, less populated setting.
Water Quality
Racine's beachgoers have not always enjoyed the high water quality their beaches have today. For example, in 2003 North Beach was under a no-swimming advisory for 34 days because of high fecal indicator bacteria counts. On several of these days the beach was closed entirely. That same year, Zoo Beach had notifications issued on 29 days. Since the swimming season in Wisconsin is approximately three months long, these problems resulted in a loss of almost 40 percent of Racine's potential beach days in 2003. In response, the city began a campaign to address the point and nonpoint sources of fecal pollution affecting its beaches.
[Photo: Sampling at North Beach.]
Sanitary surveys proved to be important tools in helping city officials identify pollution sources and plan mitigation projects such as wetland construction, dune restoration, and improved beach grooming practices. The results of these efforts were outstanding, especially in reducing the number of beach advisories and closings. North Beach was closed or under a swimming advisory on only one day in 2010 and on only three days in 2011. Zoo Beach had four notifications in 2010 and five in 2011. This increase in safe-swimming days is clear evidence of the power of active beach management.
Model Development
With beach clean-up efforts underway, Racine focused on the time-lag problem associated with traditional culture-based beach monitoring. The city explored two options for dealing with this dilemma. The first was testing a new method of measuring Escherichia coli (E. coli) concentrations: quantitative polymerase chain reaction (qPCR). Instead of growing and counting bacterial colonies in culture, qPCR identifies and quantifies bacterial genetic sequences, so it yields more timely results. Laboratories can report qPCR results on the same day the sample is taken, in most cases within three hours of sample collection, allowing more rapid determinations of beach water quality for swimmers' safety.
Racine also explored using mathematical models to predict beach water quality. An accurate model would provide a basis for issuing preemptive notifications in advance of water sampling, allowing city officials to take an even more conservative approach to swimmers' safety. Racine officials believe that the daily use of models, supported by daily beach survey data and verified by qPCR monitoring, will be the cornerstone of their future beach monitoring program.
Statistical models were developed for Racine's two beaches using the U.S. Environmental Protection Agency's (EPA's) Virtual Beach (VB) software (v2.0 to v2.2). The Wisconsin Department of Natural Resources (WDNR) Science Services assisted throughout the model development process. WDNR coordinates Wisconsin's beach monitoring program and administers the BEACH Act grants for the state's 193 public beaches along 55 miles of Lake Superior and Lake Michigan coastline. Because WDNR staff had expertise in developing models for the state's many public beaches, they were well equipped to offer guidance to the City of Racine. WDNR's support proved invaluable as its staff pulled together various data sources, including data from older and recently developed models.
The "real-time" predictive models developed for the project, called Nowcast models, use multiple linear regression and other statistical procedures to evaluate the relationships between measured FIB concentrations in the water and certain meteorological factors and onshore and near-shore conditions associated with water quality. The current models, developed by the city in conjunction with WDNR, produce two outputs: a predicted E. coli concentration and a predicted probability of exceedance.
Model Development Key Components
The keys to developing a good model are selecting the proper set of component variables and ensuring that staff have the necessary skills. In the initial development phase, in 2010, Racine examined a diverse set of variables for potential use in the model.
Variables included water temperature, air temperature, seagull counts, dog counts, wildlife counts, wave height and intensity, water clarity, sky conditions (i.e., cloud cover), water color changes, odor, algae amount, algae type, bather load (in, out, and total), longshore current direction and speed, wind direction and speed, stream discharge, pollution discharge, rainfall (24-, 48-, and 72-hour totals) and other precipitation records, day of year, season, lake levels, and the previous day's E. coli values. Initially, all variables were included because most of them could plausibly influence local water quality. The project team then reduced the number of variables by conducting correlation analyses, and the model was developed using the variables with the strongest associations with FIB concentrations.
Important data sources for model development included the U.S. Geological Survey's (USGS') real-time data viewer, Racine Water and Wastewater Utilities, the Great Lakes Observing System (GLCFS Nowcast 2D), local weather station data, and National Oceanic and Atmospheric Administration (NOAA) buoy data. Staff also obtained data from routine sanitary surveys housed on the Wisconsin Beach Health website (hosted by the USGS). Exploratory data analyses revealed that the sanitary survey data were especially valuable; algae presence and water clarity, for example, proved to be good predictors of high FIB levels at some locations.
Racine's beaches proved to be good candidates for modeling because they have large, consistent databases of FIB concentrations and fairly predictable storm-related pollution incidents that resulted in advisories. Because North Beach is sampled at least five times per week, model developers could frequently compare model results with actual FIB concentrations.
By 2011 the VB software (VB v2.1) was fully developed, and the city built an operational Nowcast model for North Beach. Key variables selected for the model included rainfall, wave height, longshore current vectors, stream discharge, water clarity, and sky conditions. Racine conducted a pilot test using both qPCR and a culture-based method for measuring E. coli concentrations. This preparatory step was important because it allowed the city to compare model predictions with laboratory results and validate the model using real-time data. The results were very encouraging: the model predicted E. coli concentrations with 91 percent accuracy relative to culture-based results and 98 percent accuracy relative to qPCR-based results.
In 2012 Racine built new models (using VB v2.2) for North and Zoo Beaches. The new models included a greater proportion of Web-captured data than the 2011 model, which relied heavily on beach sanitary survey data collected locally. By developing two different types of models, Racine was able to determine whether the amount and types of field data collected could be reduced or eliminated as a cost-saving measure. In the new models, wave height was the most predictive variable at Zoo Beach. As developed, the Zoo Beach model required significantly less locally collected field data to run than the 2011 model, and results have been encouraging. However, the city found that the 2011 North Beach model (which included several beach sanitary survey parameters) was more robust than the 2012 model.
Model Implementation
Before developing the Nowcast models, the City of Racine used the persistence model (i.e., the previous day's culture-based results) to make beach notification decisions. In 2011 the city used the Nowcast model in combination with laboratory-based methods to support management decisions. Even when the model predicted an exceedance of the E. coli water quality standard, the city did not use the model alone to make notification decisions.
Instead, the city developed a set of guidelines for making notification decisions. For example, it issues a preemptive advisory in advance of laboratory results if the probability of exceedance is greater than 10 percent and the predicted E. coli concentration exceeds 50 colony-forming units (CFU) per 100 milliliters of water. Each beach monitoring component (sanitary survey, Nowcast model, culture-based testing, and qPCR) is designed and applied to complement and reinforce the others, generating timely, accurate results and a better understanding of the conditions and variables that accelerate FIB growth in water and create unsafe swimming conditions.
In June 2012 Racine, Wisconsin, became the first municipality in the nation to base notification decisions on qPCR results. Alongside the qPCR assay, Racine also ran the model at North and Zoo Beaches every weekday, and on weekends when an advisory or closure extended sampling and sanitary survey data collection into the weekend. The qPCR results, sanitary survey information, model estimates, and staff judgment are all considered when deciding whether to issue a beach advisory or closure.
Model Costs
The city did not incur additional data collection costs for model development and implementation; the necessary data were already being collected routinely using equipment already in place. The costs of developing the Nowcast model were mostly for labor. Staff needed a basic understanding of statistics, an intuitive ability to manipulate data, and a working knowledge of factors affecting local water quality.
The development team usually consisted of two laboratory staff members, with support and guidance from staff at the WDNR Science Services division. In some cases, WDNR staff took a more active role, developing models in coordination with laboratory personnel. Labor costs included the time it took staff to retrieve, format, and assess the data and to build, train, and revise the operational version of the initial model. Staff needed several months to collect and format data for the initial model. Model costs were minimal, but one person was needed to work through the modeling process. The newer VB software reduced model development time, but data evaluation and model development still required a week or more. The daily cost to run the model is minimal; most of the cost lies in data collection and processing (i.e., the initial effort required to build the model, run correlations, and perform statistics), which occurred over several days.
Importantly, EPA's continued improvement of the VB software allows for more rapid statistical model development and simplifies applying the model for the end user. Newer versions of the software not only provide quantified results but also add an exceedance probability, providing another dimension to beach management decisions. The time spent running the model is only a fraction of the time spent on routine culture-based monitoring. Once all the routine sanitary survey data are available, the model takes approximately five minutes to run, significantly faster than laboratory sample analyses, which require at least 2 hours and up to 18 hours, depending on the method used.
Issues Encountered
Although the overall model predictions have been very accurate, the City of Racine encountered a few issues.
During the model development phase, the city had trouble building a robust dataset because of the large amount of data required. Missing and incorrectly entered data often caused empty cells or incorrect predictions; the city resolved the problem by comparing the electronic data against the original hard copies.
Another issue involved estimating E. coli concentrations. North Beach typically has very few advisories, so building a model to predict those exceedances was difficult. Because advisory dates were so few and far between, they could be identified as statistical outliers (i.e., sample results numerically distant from the rest of the data), and it was sometimes difficult to decide which data should be culled. Once these decisions were made, implementation was far less problematic. City of Racine laboratory staff noted, "Where there are few exceedances, we sometimes remove them as statistical outliers, but we have to be careful doing so because if we leave them out, we are essentially excluding event-based data."
The city sometimes had issues with data retrieval. Occasionally, online data were unavailable for running the model, or data were unusable because of a reporting error. In those cases, the city had to find a comparable data source. For example, if rainfall data were unavailable from the local airport, the city used precipitation records from the local wastewater treatment plant (a comparable distance from the beach) to make an initial estimate. Once precipitation records from the airport became available, staff re-ran the model using the amended data from the original source.
Moving Forward
The City of Racine validates the model by comparing model results to monitoring (culture and qPCR) results.
The city considers the model successful because of the low number of Type I and Type II errors found when evaluating beach management decisions at the end of the beach season. The city ran the 2011 (VB v2.1) and 2012 (VB v2.2) models side by side to compare the results and determine which model was most appropriate for each beach. It will continue to evaluate its models every year to ensure they remain predictive, since major changes can occur at beaches and the weather varies significantly from year to year. The data collection methods and variables used in the model have not changed. The city compared the 2011 model to newer model iterations and found that incorporating additional years of data has not made any significant improvements.
As of 2015, the city is using both monitoring and modeling results to make beach notification decisions. Because the model results have shown high accuracy, monitoring could be reduced in the future. However, staff would still need to visit the beaches regularly to complete the routine beach sanitary survey form, which includes data elements necessary to run the model. In the future, the city plans to focus more on cost efficiency, and Nowcast models will likely play a large role in this endeavor. Eventually, model results might be the primary means for making beach notification decisions, with laboratory analyses restricted to days when exceedances are predicted. With qPCR this can be accomplished in near real time, striking a balance between public health protection and maximum utility of recreational beaches.
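The end-of-season check described above can be sketched by applying Racine's preemptive-advisory guideline (exceedance probability greater than 10 percent and predicted E. coli greater than 50 CFU/100 mL) to each day's model output and comparing the resulting decisions with laboratory results. The daily records and the 235 CFU/100 mL standard used to classify observed exceedances are hypothetical. The sketch assumes the common convention that a Type I error is a false positive (an unnecessary advisory) and a Type II error is a false negative (a missed exceedance).

```python
from dataclasses import dataclass


@dataclass
class DayRecord:
    predicted_cfu: float  # model-predicted E. coli, CFU/100 mL
    p_exceed: float       # model exceedance probability, 0 to 1
    observed_cfu: float   # laboratory result (culture or qPCR), CFU/100 mL


def preemptive_advisory(predicted_cfu, p_exceed, p_cutoff=0.10, conc_cutoff=50.0):
    """Guideline from the case study: both conditions must hold."""
    return p_exceed > p_cutoff and predicted_cfu > conc_cutoff


def evaluate_season(records, standard_cfu):
    """Tally correct decisions and Type I / Type II errors against observed results."""
    tally = {"correct": 0, "type_i": 0, "type_ii": 0}
    for day in records:
        advised = preemptive_advisory(day.predicted_cfu, day.p_exceed)
        exceeded = day.observed_cfu > standard_cfu
        if advised and not exceeded:
            tally["type_i"] += 1   # false positive: advisory was not needed
        elif exceeded and not advised:
            tally["type_ii"] += 1  # false negative: exceedance was missed
        else:
            tally["correct"] += 1
    tally["accuracy"] = tally["correct"] / len(records)
    return tally


# Hypothetical season of model outputs paired with lab results.
season = [
    DayRecord(30, 0.02, 20),
    DayRecord(60, 0.15, 300),
    DayRecord(80, 0.25, 90),
    DayRecord(40, 0.08, 400),
    DayRecord(500, 0.70, 800),
]
print(evaluate_season(season, standard_cfu=235))
```

The "correct" count lumps together correctly issued advisories and correctly open beaches; a fuller evaluation would likely report those separately.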
Advice and Lessons Learned
To assist others planning to develop a predictive model, the City of Racine shared these lessons learned:
• Partner with agencies or universities that have software expertise and experience with predictive models.
• Build your model using easily retrievable data, and collect data consistently and in sufficient quantity. You can't compare apples to oranges (e.g., wave height estimated at the beach might not be equivalent to data retrieved from a NOAA buoy). It is often best to collect your own data rather than rely on someone else's.
• Have a robust data set; at least two seasons' worth of data is preferable.
• Use sanitary surveys to identify pollution sources as well as gaps in model performance. One season may have a dominant variable that wasn't previously accounted for. Sanitary survey data should be collected consistently on each day that sampling occurs.
• Evaluate your dataset before building the model. Modelers sometimes expect an unreasonably high R-squared value without much knowledge of their data. As a result, they might spend unnecessary time finding, acquiring, compiling, formatting, and reviewing additional data that might not significantly improve model performance. A flow chart of FIB inputs and outputs at your beach can help. Not all beaches have a single driving force, and beaches with unique situations might require evaluative criteria before model development to improve the chances of success.
• Predictive variables can carry a lot of background noise from the many observations not related to pollution events. The City of Racine improved model performance by applying a rainfall threshold to reduce the size of the dataset.
• Examine interactions between variables, not just individual variables. For example, wind direction might not be predictive on its own, but wind direction combined with speed might be (e.g., onshore winds exceeding a velocity threshold).
• Determine how best to represent your data (i.e., as quantitative, qualitative, categorical, or binary).
• Discuss the threshold for exceedance probabilities during the implementation phase. Depending on the model, the probability of exceedance might be lower or higher than expected given the model's concentration estimate.
• Have comparable backup data sources for your model inputs.
• Be realistic about model outputs and combine the results with experience. Does the model output match what your experience tells you? How should you expect these environmental conditions to affect local water quality? Your model needs to make sense.
• Validate the model periodically because ambient conditions might change.
References
Cicero, K. The 10 Best Beaches for Families: 2011. Parents Magazine. June 2011. Accessed January 22, 2013. http://www.parents.com.
Clark, J., Hortobagyi, M., and Yancey, K.B. Just for Summer: 51 Great American Beaches. USA Today. March 27, 2012. Accessed January 22, 2013. http://travel.usatoday.com.
Kinzelman, Julie. City of Racine. Personal communication.
Kurdas, Stephan. City of Racine. Personal communication.
Our 7 Top Midwest City Beaches. Midwest Living Magazine. July-August 2010. Accessed January 22, 2013. http://www.midwestliving.com.

Case Study
The Stormwater and NexRad Rainfall Models (Horry County, South Carolina)
Introduction
Horry County, South Carolina, has 180 miles of coastline containing a series of beaches, the most famous being Myrtle Beach, part of the stretch known as the "Grand Strand." Its beaches attract more than 13 million visitors each year. Like all public beaches in the state, Grand Strand beaches are regularly monitored for fecal indicator bacteria by the South Carolina Department of Health and Environmental Control (SCDHEC), in conjunction with local governments.
The goal of the monitoring program is to allow the public to make informed decisions about their recreational activities and any potential for swimming-associated health effects.
[Map: Horry County coastline along the Atlantic Ocean]
Water Quality
The water quality of Grand Strand beaches is typically very good. However, during and after heavy rainstorms, stormwater discharges occasionally cause bacteria levels to rise above state water quality standards, prompting SCDHEC to issue swimming advisories. To minimize the impact of stormwater on these beaches, some Grand Strand communities have extended their stormwater outfall structures farther into the ocean to discharge runoff into deeper waters, away from swimming areas. In 2011 Myrtle Beach completed a project at 4th Avenue North that consolidated nine nearshore stormwater drainage pipes into one large pipe that runs underneath the seabed and empties into the Atlantic Ocean more than 1,000 feet from shore. Similar projects have been completed at 7th Avenue South in North Myrtle Beach and at Deep Head Swash in Myrtle Beach. These and other infrastructure investments have significantly reduced fecal indicator bacteria levels at Grand Strand beaches.
Model Development
Stormwater Model
In 2007 SCDHEC developed a model, as part of a staff member's master's thesis project, to predict fecal indicator bacteria levels at South Carolina state beaches. To be adopted and applied by SCDHEC, the model needed to be simple to operate and provide reliable results. The effort evolved to include a project team consisting of the local health department, SCDHEC, and University of South Carolina (USC) professors with expertise in geostatistical modeling, database management, and geographic information systems (GIS).
The team chose to develop models using information from the popular Grand Strand beaches. These are Tier 1 beaches (the highest priority beaches because of high risk, high use, or both) and were best suited for modeling because they have direct stormwater input and high numbers of bathers. Tier 2 beaches typically had very few exceedances and few bathers. SCDHEC and the project team used various statistical methods, a literature review, and professional judgment to determine which variables to include, and rainfall emerged as the most important predictive variable in the initial statistical models. A Multiple Linear Regression (MLR) model and a Classification and Regression Tree (CART) model were developed and run separately for each sample site. To improve prediction, SCDHEC developed an ensemble forecast (a statistical approach that uses results from multiple models) by combining, for each sample site or section of beach, the MLR result (an estimated fecal indicator bacteria level) with the CART result (the range of expected fecal indicator bacteria levels). Combining these two results gave SCDHEC a third estimate of the fecal indicator bacteria level, called the Ensemble prediction. Beach managers could use all three model outputs to determine the advisory level needed to protect public health in different areas.
NexRad Rainfall Model
In 2011 SCDHEC began collaborating with USC and the University of Maryland to develop an updated version of the Stormwater model (the NexRad rainfall model) that would not require the use of expensive rainfall equipment.
The project entailed enhancing a user application with new models and developing an automated, database-driven tool that would estimate bacteria levels and visualize model results, allowing SCDHEC to better predict and analyze bacteria-related public health threats. Led by Dr. Dwayne Porter of USC, the project built on previous efforts and incorporated new models that use radar-based rainfall estimates. These radar data improved existing tools by (1) allowing spatial estimates to be averaged over a watershed area instead of relying on point estimates and (2) allowing automated integration of remotely sensed data, eliminating the need for SCDHEC's costly rain gauge network.
The NexRad rainfall model essentially combined the MLR, CART, and ensemble techniques into one modeling user interface and added a new element: Next-Generation Radar (NexRad) data, compiled from a network of high-resolution Doppler weather radars operated by the National Oceanic and Atmospheric Administration's (NOAA's) National Weather Service (NWS). The goal of using NexRad is to have data as close to real time as possible. As of 2013, the NexRad data included only rainfall; however, the development team planned to consider other variables such as sunlight, temperature, salinity, and the number of preceding dry days. The development team is still determining the best data sources. As of 2015, USC still uses sanitary surveys, although they can be time-consuming.
The NexRad rainfall model also used GIS polygons of individual watersheds, created by overlaying piping diagrams of the stormwater systems provided by the area's individual municipalities. The polygons are overlaid to create mini-watersheds, which are used to determine how much rain falls on each beach site.
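The watershed averaging that radar data make possible can be sketched as an area-weighted mean: each radar cell's accumulated rainfall is weighted by the area of that cell lying inside a beach site's mini-watershed polygon. This is a minimal illustration, not the USC implementation; it assumes the cell-polygon overlap areas have already been computed in a GIS, and the cell values and units are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RadarCell:
    hourly_rain_in: list[float]  # radar-estimated rainfall for each hour in the window, inches
    overlap_km2: float           # area of this cell lying inside the mini-watershed


def watershed_rainfall(cells):
    """Area-weighted mean of each cell's accumulated rainfall over the window."""
    total_area = sum(cell.overlap_km2 for cell in cells)
    if total_area <= 0:
        raise ValueError("no radar cells overlap this watershed")
    weighted = sum(sum(cell.hourly_rain_in) * cell.overlap_km2 for cell in cells)
    return weighted / total_area


# Hypothetical mini-watershed for one beach sample site.
site_cells = [
    RadarCell(hourly_rain_in=[0.1, 0.3, 0.1], overlap_km2=2.0),
    RadarCell(hourly_rain_in=[0.6, 0.4, 0.0], overlap_km2=1.0),
]
print(round(watershed_rainfall(site_cells), 3))
```

A single rain gauge, by contrast, reports only the rainfall at its own location, which may not represent the whole drainage area feeding a beach.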
SCDHEC tested the NexRad rainfall model in several counties during the 2012 beach swimming season (May 15 through October 15) and used model results as one of several tools in deciding whether to issue swimming advisories. Predicted exceedances of water quality standards are expressed as High, Medium, or Low (using the MLR and CART model predictions), but the model can also provide actual predicted FIB levels.
Data
SCDHEC used historical water quality data to develop and validate the Stormwater model in 2007. The data and variables considered included cumulative rainfall, rainfall intensity, number of preceding dry days, wind speed and direction, tide and lunar phase data, water currents, and salinity. The water quality data were collected by the SCDHEC beach monitoring program, rainfall data were collected by a system of gauges installed at several locations, and wind speed and direction data were obtained online from NOAA.
In 2011, USC began developing the NexRad rainfall model based on the assimilation and integration of multiple data sources, including field programs (bacteria density, salinity, air and water temperature, tide, weather); observing systems (rainfall, currents, salinity, wind); and remote sensing models (salinity, air and water temperature, rainfall, currents, wave activity). SCDHEC provided the bacteria density data. All other data for the NexRad rainfall model came from a variety of sources, including the NWS, the National Estuarine Research Reserve System, and the Southeast Coastal Ocean Observing Regional Association's Integrated Ocean Observing System (IOOS).
Model Output and Validation
To validate the Stormwater model, USC compared the MLR predictions with actual sample results twice a month. In general, the Stormwater model predicted exceedances of the water quality standard with above-average accuracy; however, SCDHEC did not sample after rain events at sites where acceptable water quality was predicted.
Therefore, an unknown number of false negatives (exceedances the model failed to predict) might have occurred. In 2005 SCDHEC ended the data collection used to validate the Stormwater model. Officials felt that the post-2005 changes (i.e., the offshore stormwater outfall pipe discussed above and new infiltration pits and ultraviolet disinfection systems) drastically changed the environment; therefore, the model was no longer relevant, since it was based on data collected before these changes occurred.
The NexRad rainfall model worked better for some beaches than others. USC performed receiver operating characteristic (ROC) analyses to determine the frequency of Type I and Type II errors, and USC staff assessed model effectiveness by cross-referencing sample results against MLR predictions. SCDHEC issued an advisory if the MLR model calculated fecal indicator bacteria levels greater than 103 colony-forming units (CFU) per 100 milliliters (mL) of water, if the CART model calculated High, or if the MLR model calculated a concentration greater than 74 CFU/100 mL and the CART model calculated Medium at the same site. USC validates the NexRad rainfall model using VB's toolbox for model development and validation, which has made model updates and validation fairly easy as long as data are available. This tool allows the user to compare model predictions against actual monitoring data.
Model Implementation
When implementing the Stormwater model during the 2007 to 2009 beach seasons, SCDHEC discovered that the effect of rainfall and other variables differed by beach site. Consequently, the agency decided to model each beach site independently (i.e., using a different statistical model for each station or section of beach) to provide the most accurate information.
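The advisory rule described under Model Output and Validation combines the two site-specific model outputs. The following is a minimal sketch, assuming each site's MLR output is a concentration in CFU/100 mL and its CART output is one of the labels High, Medium, or Low. The thresholds are the ones stated in the text, while the site names and model outputs are hypothetical.

```python
VALID_LEVELS = {"Low", "Medium", "High"}


def advisory_needed(mlr_cfu, cart_level, upper_cfu=103.0, lower_cfu=74.0):
    """Return True if the combined MLR and CART outputs call for a swimming advisory."""
    if cart_level not in VALID_LEVELS:
        raise ValueError(f"unexpected CART level: {cart_level!r}")
    return (
        mlr_cfu > upper_cfu                                  # MLR alone is high enough
        or cart_level == "High"                              # CART alone says High
        or (mlr_cfu > lower_cfu and cart_level == "Medium")  # both moderately elevated
    )


# Hypothetical outputs from separate site-specific models: (MLR CFU/100 mL, CART level).
site_outputs = {
    "site_A": (120.0, "Low"),
    "site_B": (40.0, "High"),
    "site_C": (80.0, "Medium"),
    "site_D": (80.0, "Low"),
}
for site, (mlr_cfu, cart_level) in site_outputs.items():
    print(site, "advisory" if advisory_needed(mlr_cfu, cart_level) else "no advisory")
```

Keeping the thresholds as parameters would make it straightforward to tune them per site, in line with SCDHEC's decision to model each site independently.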
SCDHEC applied the Stormwater model to 10 beaches in Horry and Georgetown counties. The model was designed to extract rainfall data from rain gauges at each beach and to input weather and tidal information independently. These data were continuously added to the model, which was constantly recalibrated, although a more intensive recalibration was needed to adjust to the infrastructure changes. When developing the NexRad rainfall model in 2011, USC found that combining the separate, sample site-specific models (MLR, CART, and Ensemble) into one user interface was fairly easy. As of 2015, USC makes the daily model results available via email, a Web interface, and a phone application. SCDHEC publishes advisory information at www.howsthebeach.org. In 2012, SCDHEC used the NexRad model to implement the initial suite of preemptive beach swimming advisory models as a tool to determine when an advisory should be issued in Horry County, South Carolina. Because of program management changes, SCDHEC did not continue using the model for advisory decisions in Horry County after 2012.
Issues Encountered
The Stormwater model was used during the 2007 to 2009 beach seasons to make beach management decisions, but little or no modeling was performed in 2010 and 2011. In 2010, SCDHEC changed the advisory program and began placing permanent advisory signs at beach sites that routinely exceeded the state water quality standards (e.g., stormwater outfalls and swashes). The remaining sites were not modeled, either because of historically low enterococci counts or because they never exceeded water quality standards, even after a rainfall event (other than a tropical storm). This, coupled with drier seasons, meant that the Stormwater model was used infrequently, if at all, in most locations.
Of the 43 sites monitored, SCDHEC placed permanent signs at 29, stating, "Caution, swimming not advised, high bacteria counts, refrain from fishing and wading, do not put head below water, no swimming within 200 feet of sign." SCDHEC still monitored all 43 sites but did not want to invest in new monitoring equipment to support the modeling when many sites already had permanent signs. In addition, the outdated rainfall measurement equipment began failing and was not compatible with new computers, and the entire system was expensive to replace, at an estimated cost of $20,000 to $30,000. Learning from SCDHEC's experience, beach managers should be aware of the limitations of hardware and equipment. In South Carolina, the tipping buckets used to gather rain data required a significant amount of maintenance and time to keep clean and running. Replacement parts for all types of equipment can be costly, equipment can become obsolete as newer technology replaces it, and equipment can be difficult to maintain with limited staff and resources.
Model Costs
The initial cost to develop the Stormwater model in 2007 was low, essentially the cost of a graduate student's time. Once the model was operational, costs increased because a series of 11 rain gauges needed to be installed and maintained. Unfortunately, SCDHEC budget cuts reduced the resources and staff available to perform maintenance. Using data collected during the 2007 to 2009 beach seasons, the agency was able to target sites with frequent exceedances and reduce monitoring and maintenance of the rain gauges at other sites, further reducing costs. This continued until permanent signs were put in place at beach sites where the water quality standards were routinely exceeded, and the agency eventually stopped using the rain gauges altogether. The primary costs for the 2011 NexRad model are development and continual model updates.
There were no costs associated with model implementation because the model data were obtained for free.

Moving Forward

The NexRad rainfall model eliminated the need for updates and maintenance of the rain gauge network; improved timeliness by providing robust decision support well in advance of verification by biological sample cultures; and improved accuracy by providing reliable forecasts of beach hazards that would merit closures, while reducing false positives. These models are some of the first marine Enterococcus models and some of the first to use CART models. They are transferable to other swimming beaches in the southeastern United States that experience similar weather and water circulation patterns and have stormwater runoff as the most significant pollution source. In the future, the scientists who developed the model hope to increase buoy and radar coverage to improve the spatial resolution of the data and to assess the use of the model for predicting salinity and currents. USC's Dr. Dwayne Porter advises other beach programs that "you do not want to shortchange the modeling effort, but simpler is often better." Sean Torrens with SCDHEC encourages beach managers to collaborate with others, such as graduate students and universities, and to research what others are doing to avoid reinventing the wheel.

References

NRDC (Natural Resources Defense Council). Testing the Waters: South Carolina. http://www.nrdc.org/water/oceans/ttw/sc.asp.

Porter, Dwayne, University of South Carolina. Personal communication.

South Carolina Department of Health and Environmental Control. Beach Monitoring Program. http://www.scdhec.gov/HomeAndEnvironment/Water/SwimSafety/.

Southeast Coastal Ocean Observing Regional Association. Water Quality Observations and Models Help Managers Make Decisions on Issuing Swim Advisories. www.secoora.org.

Torrens, Sean, South Carolina Department of Health and Environmental Control. Personal communication.