Water Security Toolkit User Manual
Version 1.1
                              .. n,        n ,
                             = No Detection  Positive Detection

-------
WATER SECURITY TOOLKIT USER MANUAL:
            VERSION 1.1

-------
Disclaimer
The U.S. Environmental Protection Agency (EPA) through its Office of Research and Development funded
and collaborated in the research described here under an Interagency Agreement  (IA # DW8992192801)
with the Department of Energy's Sandia National Laboratories.  The material in this document has been
subject to Agency technical and policy review, and approved for publication as an  EPA report.  The views
expressed by individual authors, however, are their own, and do not necessarily reflect those of the U.S.
Environmental Protection Agency. Mention of trade names, products, or services  does not convey official
EPA approval, endorsement, or recommendation.
NOTICE: This report was prepared as an account of work sponsored by an agency of the United States Gov-
ernment. Accordingly, the United States Government retains a nonexclusive, royalty-free license to publish
or reproduce the published form of this contribution, or allow others to do so for United States Government
purposes.  Neither Sandia Corporation, the United States Government, nor any agency thereof,  nor any of
their employees makes any warranty, express or implied,  or assumes any legal liability or responsibility for
the  accuracy, completeness, or usefulness of any information, apparatus, product,  or process disclosed, or
represents that its use would not infringe privately-owned rights. Reference herein to any specific commer-
cial product, process, or service by trade name, trademark, manufacturer, or otherwise does not  necessarily
constitute or imply its endorsement,  recommendation, or favoring by Sandia Corporation, the United States
Government, or any  agency thereof.  The views and opinions expressed herein  do  not necessarily state or
reflect those of Sandia Corporation, the United States Government or  any agency thereof.
Questions concerning this document or its application should be addressed to:
 Terra Haxton
 National Homeland Security Research Center                              .
 Office of Research and Development                                    •&
 U.S. Environmental Protection Agency                                  3
 Cincinnati, OH 45268                                                  ^ <-
                                                                                   ,
 Haxton.Terra@epamail.epa.gov                                          <^          ,p
 513-569-7810

-------
License  Notice
The Water Security Toolkit (WST) v.1.1

Copyright © 2012 Sandia Corporation.  Under the terms of Contract  DE-AC04-94AL85000, there is a
non-exclusive license for use of this work by or on behalf of the U.S. government.
    Aero
    argparse
    Boost
    Coopr
    Coverage
    Distribute
    EPANET
    EPANET-ERD
    EPANET-MSX
    gcovr
    GRASP
This software is distributed under the Revised BSD License (see below).  In addition, WST leverages a
variety of third-party software packages, which have separate licensing policies:
                      Revised BSD License
                      Python Software Foundation License
                      Boost Software License
                      Revised BSD License
                      BSD License
                      Python Software Foundation License / Zope Public License
                      Public Domain
                      Revised BSD License
                      GNU Lesser General Public License (LGPL) v.3
                      Revised BSD License
                      AT&T  Commercial License for noncommercial use; includes randomsample and
                      sideconstraints executable files
    LZMA SDK      Public Domain
    nose              GNU Lesser General Public License (LGPL) v.2.1
    ordereddict       MIT License
    pip               MIT License
    PLY              BSD License
    PyEPANET      Revised BSD License
    Pyro              MIT License
    PyUtilib          Revised BSD License
    PyYAML         MIT License
    runpy2           Python Software Foundation License
    setuptools        Python Software Foundation License / Zope Public License
    six                MIT License
    Tiny XML        zlib License
    unittest2          BSD License
    Utilib             Revised BSD License
    virtualenv        MIT License
    Vol               Common  Public License
    vpykit            Revised BSD License
Additionally, some precompiled  WST binary distributions might bundle other third-party executables files:
    Coliny   Revised BSD License (part of Aero project)
    Dakota  GNU Lesser General Public License (LGPL) v.2.1
    PICO    Revised BSD License (part of Aero project)

-------
Revised BSD License

Redistribution and use in source and binary forms, with or without modification, are permitted provided
that the following conditions are met:

   •  Redistributions of source code must retain the above copyright notice, this list of conditions and the
     following disclaimer.

   •  Redistributions in binary form must reproduce the above copyright notice, this list of conditions and
     the following disclaimer in the documentation and/or other materials provided with the distribution.

   •  Neither the name of Sandia National Laboratories nor Sandia Corporation nor the names of its con-
     tributors may be used to endorse or promote products derived from this software without specific prior
     written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IM-
PLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION BE LIABLE FOR ANY DIRECT,
INDIRECT,  INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUD-
ING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
                                           in

-------
Acknowledgements
The National Homeland Security Research Center (NHSRC) would like to acknowledge the following orga-
nizations and individuals for their support in the development of the Water Security Toolkit User Manual,
and in the development and testing of the Water Security Toolkit software.

U.S. EPA Office of Research and Development - NHSRC

    Terra Haxton
    Robert Janke
    Regan Murray

Sandia  National Laboratories (IA # DW8992192801)

    David  Hart
    William Hart
    Katherine Klise
    Cynthia Phillips
    John Siirola

Argonne  National  Laboratory

    Thomas Taxon

Texas AfeM University

    Gabe Hackebeil
    Carl Laird
    Angelica Mann
    Shawn McGee
    Arpan Seth
NHSRC  would also like to acknowledge the following individuals  for their  previous contributions to the
development of the TEVA-SPOT toolkit software: Jonathan  Berry (Sandia National Laboratories),  Erik
Boman (Sandia National Laboratories), Jean-Paul Watson (Sandia National Laboratories), Lee Ann Riesen
(Sandia National Laboratories) and James Uber (University of Cincinnati).
                                             IV

-------
Acronyms
    ATUS    American Time-Use Survey
    BLAS    Basic linear algebra sub-routines
    CFU      Colony-forming unit
    CVAR    Conditional value at risk
    CWS     Contamination warning system
    EA       Evolutionary algorithm
    EDS      Event detection system
    EPA      U.S. Environmental Protection Agency
    EC       Extent of Contamination
    ERD      EPANET results database file
    GLPK    GNU Linear Programming Kit
    GRASP  Greedy randomized adaptive sampling process
    INP      EPANET input file
    LP       Linear program
    MC       Mass consumed
    MILP    Mixed integer linear program
    MIP      Mixed integer program
    MSX     Multi-species extension for EPANET
    NFD      Number of failed detections
    NS       Number of sensors
    NZD      Non-zero demand
    PD       Population dosed
    PE       Population exposed
    PK       Population killed
    TAI      Threat assessment input file
    TCE      Tailed-conditioned expectation
    TD       Time to detection
    TEC      Timed extent of contamination
    TEVA    Threat ensemble vulnerability assessment
    TSB      Tryptic soy broth
    TSG      Threat scenario generation file
    TSI       Threat simulation input file
    VAR      Value at  risk
    VC       Volume consumed
    WST     Water Security Toolkit
    YML     YAML configuration file format for WST

-------
Symbols
Notation
{,}
e
V
E
\


Definition
set brackets
is an element of
for all
summation
set minus
given
cardinality
Example
{1,2,3} means a set containing the values 1,2, and 3.
s (E S means that s is an element of the set S.
s = 1 V s (E S means that the statement s = 1 is true for all s in
set S.
E™=1 si means si + s2 + • • • + sn.
S \ T means the set that contains all those elements of S that are
not in set T.
is used to define conditional probability. P(s\t] means the prob-
ability of s occurring given that t occurs.
Cardinality of a set is the number of elements of the set. If set S
= {2,4,6}, then \S\ = 3.
                          VI

-------
Contents
1  Introduction                                                                               1

2  Getting Started                                                                            4
   2.1   Obtaining the Water Security Toolkit   	   4
   2.2   Dependencies ol the Water Security Toolkit	   4
   2.3   Installing the Water Security Toolkit Binary Distributions	   6
   2.4   Compiling the Water Security Toolkit  Source Code	   6
        2.4.1  Obtaining the Water Security Toolkit Source Code	   7
        2.4.2  Configuring the Python Virtual Environment	   7
        2.4.3  Building the C++ Executable Files   	   7
   2.5   Basic Usage of the Water Security Toolkit	   8
   2.6   Verifying Installation of the Water Security Toolkit	   8
   2.7   Uninstalling the Water Security Toolkit	   9

3  Contaminant Transport                                                                   11
   3.1   Hydraulic and Water Quality Analysis	  11
        3.1.1  EPANET  and EPANET-MSX	  11
        3.1.2  Merlion	  12
   3.2   Contaminant Transport Scenarios	  12
   3.3   tevasim Subcommand	  13
        3.3.1  Configuration File	  13
        3.3.2  Configuration Options	  13
        3.3.3  Subcommand Output  	  15
   3.4   Contaminant Transport Examples  	  15
        3.4.1  Example 1	  16
        3.4.2  Example 2	  17

4  Impact Assessment                                                                        18
   4.1   Impact Metrics	  19

                                               vii

-------
   4.2  Humand Health Impact Model  	  21
        4.2.1  Population  	  22
        4.2.2  Cumulative Dose	  22
        4.2.3  Response 	  23
        4.2.4  Disease Progression Model	  24
   4.3  sim2Impact Subcommand	  24
        4.3.1  Configuration File	  24
        4.3.2  Configuration Options	  25
        4.3.3  Subcommand Output  	  26
   4.4  Impact Assessment Examples	  26
        4.4.1  Example 1	  26
        4.4.2  Example 2	  27
        4.4.3  Example 3	  27

5  Sensor Placement                                                                         29
   5.1  Sensor Placement Formulations	  29
   5.2  Sensor Placement Solvers  	  30
   5.3  sp Subcommand	  31
        5.3.1  Configuration File	  32
        5.3.2  Configuration Options	  32
        5.3.3  Subcommand Output  	  37
   5.4  Sensor Placement Examples	  40
        5.4.1  Example 1	  40
        5.4.2  Example 2	  43
        5.4.3  Example 3	  45

6  Hydrant Flushing Locations                                                              47
   6.1  Flushing Formulation   	  48
   6.2  Flushing Solvers  	  49
        6.2.1  Evolutionary Algorithm	  49
        6.2.2  Network Solver	  49
   6.3  flushing Subcommand	  50
        6.3.1  Configuration File	  50
        6.3.2  Configuration Options	  50
        6.3.3  Subcommand Output  	  55
   6.4  Flushing Response Examples	  57
                                               vm

-------
        6.4.1  Example 1	   57
        6.4.2  Example 2	   60

7  Booster Station Placement                                                                62
   7.1   Booster Placement Using Multi-species Reaction	   63
        7.1.1  Booster MSX Solvers	   64
        7.1.2  booster_msx Subcommand	   64
              7.1.2.1   Configuration File	   65
              7.1.2.2   Configuration Options	   65
              7.1.2.3   Subcommand Output	   70
   7.2   Booster Placement Using Neutralization or Limiting Reagent Reaction	   70
        7.2.1  Neutralization NEUTRAL Formulation  	   70
        7.2.2  Limiting Reagent LIMIT Formulation	   71
        7.2.3  Booster MIP Solvers	   72
        7.2.4  booster_mip Subcommand	   72
              7.2.4.1   Configuration File	   73
              7.2.4.2   Configuration Options	   74
              7.2.4.3   Subcommand Output	   78
   7.3   Booster Placement Examples	   78
        7.3.1  Example 1	   78
        7.3.2  Example 2	   79

8  Source Inversion                                                                           81
   8.1   Source Inversion Formulations	   82
        8.1.1  MIP Formulations	   82
        8.1.2  Bayesian Probability Based Formulation	   84
   8.2   Source Inversion Solvers	   84
   8.3   inversion Subcommand  	   84
        8.3.1  Configuration File	   85
        8.3.2  Configuration Options	   85
        8.3.3  Subcommand Output  	   87
   8.4   Source Inversion Example	   88
        8.4.1  Example 1	   88
        8.4.2  Example 2	   89

9  Grab Sampling                                                                            92
   9.1   Grab Sampling Formulation	   92

                                               ix

-------
   9.2   Grab Sampling Solvers	   93
   9.3   grabsample Subcommand	   93
        9.3.1  Configuration File	   93
        9.3.2  Configuration Options	   94
        9.3.3  Subcommand Output  	   97
   9.4   Grab Sampling Example	   97

10 Advanced Topics and Case Studies                                                       99
   10.1  Merlion Water Quality Model	   99
   10.2  Average-case Sensor Placement	101
        10.2.1 Computing a Bound on the Best Sensor Placement Value	101
        10.2.2 Managing Sensor  Placement Locations	103
        10.2.3 Limited-Memory Sensor Placement Techniques	104
        10.2.4 Evaluating a Sensor Placement	105

11 File Formats                                                                             108
   11.1  Configuration File  	108
   11.2  Cost File  	110
   11.3  ERD File	110
   11.4  Impact File	110
   11.5  Imperfect Junction Class File	Ill
   11.6  Imperfect Sensor Class File	Ill
   11.7  Measurements File	112
   11.8  Nodemap File	112
   11.9  Scenariomap File	112
   ll.lOSensor Placement File	113
   11.11TAI File	113
   11.12TSG File  	115
   11.13TSI File	116
   11.14Weight File	117

12 Executable Files                                                                         118
   12.1  evalsensor	118
        12.1.1 Usage	118
        12.1.2 Options	118
        12.1.3 Arguments  	119
   12.2  filter_impacts	120

-------
        12.2.1 Usage	120
        12.2.2 Options	120
        12.2.3 Arguments  	120
   12.3 scenarioAggr  	121
        12.3.1 Usage	121
        12.3.2 Options	121
        12.3.3 Arguments  	121
   12.4 spotSkeleton  	122
        12.4.1 Usage	122
        12.4.2 Arguments  	122

References                                                                               123
                                               XI

-------
List  of Figures
   2.1   Output from the tevasim subcommand	   10

   3.1   Contaminant transport simulation flowchart	   11
   3.2   The tevasim configuration template file	   13
   3.3   Layout of the EPANET Example Network 3	   16
   3.4   Example TSG  contamination scenario file	   16
   3.5   The tevasim configuration file for example 1	   17
   3.6   The tevasim configuration file for example 2	   17

   4.1   Impact assessment flowchart	   18
   4.2   The sim2Impact configuration template file	   25
   4.3   The sim2Impact configuration file for example 1	   27
   4.4   The sim2Impact configuration file for example 2	   27
   4.5   The sim2Impact configuration file for example 3	   28

   5.1   Sensor placement flowchart	   29
   5.2   The sp configuration template file	   39
   5.3   The sp configuration file for example 1	   40
   5.4   The JSON output file for example 1	   41
   5.5   The sp screen  output for example 1	   42
   5.6   The sp configuration file for example 2	   43
   5.7   The sp screen  output for example 2	   44
   5.8   The sp configuration file for example 3	   45
   5.9   The sp screen  output for example 3	   46

   6.1   Flushing response simulation  flowchart	   48
   6.2   The flushing configuration template file	   56
   6.3   The flushing configuration file for example 1	   58
   6.4   The flushing JSON output file for example 1	   59
                                              xn

-------
6.5   The flushing configuration file for example 2	   60
6.6   The flushing JSON output file for example 2	   61

7.1   Multi-species reaction booster placement flowchart	   63
7.2   MIP booster placement flowchart	   63
7.3   The boosterjnsx configuration template file	   65
7.4   The boosterjnip configuration template file	   73
7.5   The boosterjnip configuration file for example 1	   79
7.6   The boosterjnsx configuration file for example 2	   80

8.1   Contamination source inversion flowchart	   81
8.2   Two different injection profiles used by the formulation variations	   83
8.3   The inversion configuration template file	   85
8.4   The inversion configuration file for example 1	   88
8.5   The inversion JSON output file for example 1	   90
8.6   The inversion configuration file for example 2	   91
8.7   The inversion JSON output file for example 2	   91

9.1   Grab sampling flowchart	   92
9.2   The grabsample  configuration template file	   94
9.3   The grabsample  configuration file for example 1	   98
9.4   The grabsample  JSON output for example 1	   98

10.1  Illustration of the origin tracking algorithm	100
10.2  The sp configuration file using the GLPK solver to compute a lower bound	101
10.3  The sp JSON file with the lower bound from the GLPK solver	102
10.4  The sp configuration file using the Lagrangian solver to compute a bound	102
10.5  The sp JSON file with the lower bound from the Lagrangian solver	103
10.6  The sp configuration file using the Lagrangian solver and the compute bound option	104
10.7  The evalsensor  example output	106
10.8  The evalsensor  output using sensor failure probabilities	107
                                           xm

-------
Chapter  1

Introduction
An abundant supply of safe, high-quality drinking water is critical to modern industrialized societies. At
home, water is used for drinking, cooking, washing clothes, and bathing.  At work, water is used to oper-
ate restaurants, hospitals, and manufacturing plants.  In our  communities, water is used for fighting fires.
Consequently,  contamination of drinking water infrastructure could severely impact the public health and
economic vitality of a community. The distributed physical layout of drinking water systems  makes them
inherently vulnerable to a variety of incidents, such as terrorist attacks, accidents and even natural disasters.
The physical destruction of water infrastructure can disrupt water  service to communities, and particularly
key facilities such as hospitals, power stations, and military installations.  Similarly, contamination with
deadly agents could result in large numbers of illnesses and fatalities.
Since the events of September 11, 2001, water utilities have had increasing concerns about the possibility of
harm to our water quality due to an accidental or intentional contamination incident within a distribution
network. The U.S. EPA's Response Protocol Toolbox (EPA, 2004) provides recommendations on actions
that water utilities  can  take to minimize potential impacts to consumers following a contamination threat
or incident. Detection and consequence management are major steps in this protocol. EPA has developed
modeling and  simulation tools to assist in the detection of contamination incidents in  water distribution
networks.  The  Threat Ensemble Vulnerability Assessment-Sensor Placement Optimization Tool, or TEVA-
SPOT  (EPA, 2011), identifies the optimal placement of online  water quality monitoring sensors to detect
contamination incidents. Another EPA developed  tool to assist in detection is the CANARY event detection
system (Hart and McKenna, 2012), which analyzes water quality data from the sensors and identifies periods
of anomalous water quality. These tools work together to help form  a contamination warning system (CWS).
The overall goal of a CWS is to detect contamination incidents in time to reduce potential public health and
economic consequences.  The current  terminology for a CWS is a  water  quality surveillance and  response
system. For more information on CWS, see U.S. EPA Water Security Initiative (EPA, 2013b).
Should a CWS detect the presence of a contaminant in a water  distribution network, consequence manage-
ment must be employed. Decision-making tools that assist water utilities in evaluating and planning various
response strategies  are needed to support rapid response to contamination incidents.  The  Water Security
Toolkit (WST)  is a suite of tools that help provide the information necessary to make good decisions resulting
in the minimization of further human exposure to contaminants, and the maximization of the  effectiveness
of intervention  strategies. WST is intended to assist in:

   • Planning  response actions to natural disasters and terrorist attacks,

   • Developing consequence management plans,

   • Informing large-scale exercises/training,

   • Planning  response actions to  address traditional utility challenges, such as pipe breaks and water
     quality problems, and

-------
   • Evaluating implications of different response strategies.

For water utilities with hydraulic modeling expertise,  WST combined with EPANET-RTX (EPA,  20f3a;
Hatchett et al., 2011; Janke et al.,  2011) could use data from CANARY, other sensor stations, and field
investigations to optimize and implement response actions in real-time.

WST assists in the evaluation of multiple response actions in order to select the most beneficial consequence
management strategy. It includes hydraulic and water quality modeling software and optimization method-
ologies to identify: (1) sensor locations to detect contamination, (2) locations in the network in which the
contamination was introduced, (3) hydrants to remove contaminated water from the distribution system, (4)
locations in the network to inject decontamination agents to inactivate, remove, or destroy contaminants,
(5) locations in the network to take grab sample to confirm contamination or cleanup, and (6) valves to close
in order to isolate contaminated areas of the network.
This user manual describes the different components of WST.  It is also available as a  Sandia Report Klise
et al. (2013b). The manual contains one chapter on each of the water security tools:

   • Contaminant transport

   • Impact assessment

   • Sensor placement

   • Hydrant flushing locations

   • Booster placement

   • Source inversion

   • Grab sampling

Another chapter discusses advanced topics and provides case studies. WST uses YAML  format configuration
files  to supply input  parameters to each water security tool. Additional information on the YAML  format
can be found in File  Formats Section 11.1.

The  contaminant transport simulation,  impact assessment and sensor placement optimization tools were
all developed as part of the TEVA-SPOT  Toolkit (EPA,  2011).   All functionality in TEVA-SPOT  has
been replicated in WST using new, user friendly YAML  format configuration files.  WST  builds upon the
simulation and optimization  framework  of TEVA-SPOT and adds  several new features.   These features
were all developed to model  possible response action plans once  contamination has been  detected in the
system. These action plans include redirecting flow by either opening hydrants or closing  valves, injecting
decontaminant to inactivate biological agents, and using sensor measurements to identify possible  source
locations.
The  main data requirement to use WST is a calibrated water utility network model. Additional input data
is dependent on the WST application. This includes information on the simulated contamination incident(s)
(e.g., type, location(s),  amount), the impact metric (e.g.,  extent of contamination, population exposed),
and  the response actions (e.g., flushing hydrants, injecting disinfectant). To optimize a  response  action,
additional information on the potential locations for water quality sensors, hydrants to flush, valves to close,
disinfectant booster  stations, and manual grab samples is needed.  The  operating characteristics of these
different  response actions are  also  required, such  as  the detection limits of the water  quality sensors,  the
rate  and duration that  hydrants can be flushed, the control settings for injecting disinfectant at booster
stations,  and the number of manual grab samples that  can be taken at the same time.  More details  on the
data requirements are provided in the chapter describing the specific water security  tool. In addition, each
chapter has example applications. All examples are included with WST and can be found in the examples
folder.  These examples use simple networks and data files that are also distributed with WST. The examples

-------
shown in this user manual are all executed on a Linux computer, so the CPU time for each example might
not be the same on computers with different operating systems.

-------
Chapter 2

Getting  Started
This chapter provides information on downloading and installing the Water Security Toolkit (WST). WST
is an open source toolkit for modeling and analyzing water distribution systems to minimize the potential
impact of contamination incidents.

2.1    Obtaining the Water  Security Toolkit

WST is distributed by Sandia National Laboratories in both source and pre-built binary forms through the
World  Wide Web at https://software, sandia.gov/trac/wst. Prom the main WST  web page, click the
Download link at the very top of the main page. The Download page has options to download the WST source
code as well as  pre-built binary packages  for 32- and 64-bit Windows,  and 64-bit Linux.  For most  users,
installing pre-built binary versions of WST is recommended. Along with formal releases, a VOTD (version
of the day)  build can also be downloaded.  This package is an  automated build of the current development
branch of the code meant to facilitate rapid dissemination of research developments to interested partners.
VOTD builds should be considered  bleeding edge and might frequently be unstable; as such, general users
are discouraged from relying on them for any production analyses.

Alternatively, the WST source code can be checked out directly from the master Subversion version con-
trol system through https://software, sandia.gov/svn/teva/wst/.  In particular, the mainline  trunk
development branch can be  checked out  with
   svn checkout https://soft¥are.sandia.gov/svn/teva/¥st/¥st/trunk ¥st
Individual releases are archived in the https://software, sandia.gov/svn/teva/wst/wst/tags directory.
The repository contains references to external repositories (notably, the EPANET repository on SourceForge).
Please note that your local network  configuration might require site-specific Subversion proxy settings, as
well as network access to https://software, sandia.gov and https ://epanet. svn. sourceforge.net.

2.2   Dependencies of the  Water Security Toolkit

WST is a collection of Python and  compiled C++ software.  It has dependencies on several third-party
software packages. First and foremost, a Python interpreter must be installed. WST is currently compatible
with Python 2.6 or 2.7. Python 3.x is not yet supported. Python is available from http://python.org/.
The WST source code and binary distributions bundle several additional Python packages, including:
    Coopr
        A collection of open-source optimization-related Python packages that  support  a diverse set of
        optimization capabilities for formulating and analyzing optimization models. Coopr in  turn bundles
        several third-party dependency libraries:
        argparse

-------
             A Python command line argument parsing utility.
        coverage
             A Python utility for capturing and reporting code coverage.
        distribute
             A Python utility for building and installing Python packages.
        gcovr
             A utility for parsing and reporting GCOV code coverage reports.
        nose
             A Python test-harness driver.
        ordereddict
             A utility that back-ports ordered dictionaries to Python 2.6.
        pip
             A Python utility for installing Python packages.
        ply
             A general parser-lexer.
        pyro
             A utility for managing distributed Python execution.
        runpy2
             A utility that back-ports runpy functionality to Python 2.4.
        setuptools
             A Python utility for building and installing Python packages.
        six
             A utility that provides a portable interface to Python 2.x and 3.x.
        unittest2
             A utility that back-ports unittest functionality from Python 2.7 to 2.3-2.6.
        virtualenv
             A utility for creating virtual Python environments.
    PyUtilib
        A collection of Python utilities, including the testing harness used in WST.
    PyEPANET
        Python wrappers for the EPANET 2.0 Programmers Toolkit.
    PyYAML
        A YAML parser and emitter  for Python.
WST subcommands can leverage numerous third-party programs that are not available in the WST source
code:

    AMPL
        A commercial algebraic modeling environment, available from http://www.ampl.com/.
    CBC
        An open-source mixed-integer linear  programming solver,  available from https://projects.
        coin-or.org/Cbc.  The COIN Binary Project provides pre-compiled binaries through the CoinAll
        distribution, available from https://projects. coin-or.org/CoinBinary.
    CPLEX
        A commercial mixed-integer linear programming solver, available  from http://www.ibm.com/
        software/integration/optimization/cplex-optimizer/.
    Coliny
        An open-source package that provides algorithms for model transformation  and black-box opti-
        mization, available as the Acro-coliny project from https://software, sandia.gov/trac/acro/.
    Dakota
        An open-source package that provides algorithms for black-box optimization, sensitivity analysis,

-------
        surrogate modeling and uncertainty quantification, available from http://dakota.sandia.gov/.
        For Windows users, the 5.1 MinGW build is recommended.
    GLPK
        An open-source mixed-integer linear programming solver, available from http://www.gnu.org/
        software/glpk/. Pre-compiled binary distributions are available as part of most UNIX-like oper-
        ating systems. The GLPK for Windows Project provides pre-compiled Windows binaries, available
        from http://winglpk.sourceforge.net/.
    Gurobi
        A commercial mixed-integer linear programming solver, available from http://www.gurobi.com/.
    PICO
        An open-source mixed-integer linear programming solver, available as the Acro-pico project  from
        https://software.sandia.gov/trac/acro/.
Please refer to the individual projects' documentation for licensing, pricing and installation information.

2.3   Installing the Water Security Toolkit  Binary Distributions

Precompiled binary distributions (for Windows and Linux platforms) are distributed as a single ZIP archive.
Installing WST requires unzipping the archive to any location on  the hard drive.  The archive will create
several folders  within the main WST folder:
    bin       The compiled WST programs and driver utilities
    dist      Third-party dependencies
    doc      WST documentation (including this guide)
    etc       Common data, including model files and example EPANET networks
    examples Files associated with the WST subcommand examples

The main WST executable is located in the wst/bin directory. WST can be executed by typing the full path
to the executable (e.g., C:\WST-l.l\bin\wst.exe on Windows or wst-1.I/bin/wst on Linux) or by adding
the wst-1.1/bin directory to the system PATH variable and calling wst from the command line prompt. The
first time the wst command is used a message about configuring Python packages for first use is displayed.
This process is normal and could take several minutes depending on the system.

2.4   Compiling the Water  Security Toolkit Source Code

Compiling  WST  from the source code is an advanced topic and targeted only at potential developers. It
assumes familiarity with compilers and build terminology. General users  are strongly recommended to use
the pre-built binary packages  whenever possible.

Compiling  WST from the source code  uses the Python VirtualEnv package  to set  up a virtual Python
environment within the WST the source code distribution. The Python components  of WST  are installed
into this virtual environment to better insulate WST from the rest of the particular environment (and vice-
versa). The compiled (C++) binary executable files are installed into a bin directory within the source code
distribution. Currently, WST does not support out-of-source builds.

While WST can be compiled from the  source code for Windows and Linux operating systems, Windows
users  are recommended to leverage the pre-built binary distributions.  WST can be compiled for Linux using
a 3-step process:

   1.  Obtain the WST source code

   2.  Configure the Python virtual environment

   3.  Build the C++ executable files

-------
2.4.1   Obtaining the Water Security Toolkit Source Code

The  WST source code can either be extracted from a downloaded source zip/tar archive or checked out
directly from the repository using Subversion. The following directions assume that the source code is in the
wst-1.1  directory.

2.4.2   Configuring the Python Virtual Environment

The  Python virtual environment is automatically configured  by the setup  command distributed in the
top-level directory of the source code distribution:
   cd -/wst-1.1
   ./setup
This configures WST using the system's default Python interpreter and the bundled versions of the Python
dependencies.  A different version of Python can be used with WST by specifing it explicitly when running
the setup command:
   cd ~/¥st-l.1
   python2.7 ./setup
WST can also be configured with the  bleeding edge (trunk)  versions of the key Python dependencies by
specifing the -trunk option:
   cd ~/¥st-l.1
   ./setup —trunk
Setup configures the Python virtual environment within the wst/python directory (e.g.,/wst-l.I/python).
The  virtual interpreter  and the main wst  command both reside in wst/python/bin directory (e.g.,/wst-
1.1/python/bin/wst).  If only  a single  virtual environment is going to be on  the machine,  adding the
wst/python/bin directory to the system PATH variable is recommended.  Alternatively, the Ibin  and
Ipython commands  (installed into wst/python/bin) can be used to correctly locate local binaries and the
local virtual python interpreter. To run the wst command from anywhere under the main WST directory,
use the  Ibin wst command. Similarly,  to run the local python (virtual environment) interpreter, use the
Ipython command. It is safe to copy both Ibin and Ipython to other directories (e.g.,/bin).

2.4.3   Building the C++ Executable Files

WST relies on  the GNU Autotools to manage the build process  for compiled executables.  In particular,
Autoconf version 2.60 or newer  must be installed on the system along with a relatively new C++ compiler
and  linker  (e.g., gcc >= 3.4).  The build  process follows the normal autoreconf - configure - make
sequence:
   cd ~/¥st-l.1
   ./setup
   autoreconf  -v -i -f
   ./configure
   make
It is not recommended to use the make  install command. The resulting compiled binaries reside in wst/bin,
and are easily accessed from anywhere under the main WST directory using the Ibin command.
This process could be simplified by using the main setup command:
   cd ~/
   ./setup build

-------
2.5   Basic Usage of the Water Security Toolkit

The main command line structure to execute a WST subcommand is the following:
   wst SUBCOMMAND 
where SUBCOMMAND is the one of subcommands available under the wst command and configfile is the
configuration file associated with the specified subcommand. The subcommands include the following:

   • tevasim

   • sim2Impact

   • sp

   • flushing

   • booster_msx

   • booster_mip

   • inversion

   • grabsample

Each subcommand is described in more detail in Chapters 3 through 9.

In addition, the --help option prints information about the different subcommand options available.
   ¥st —help
Each subcommand has the option to generate a template configuration file by using the following command
line:
   wst SUBCOMMAND —template 
where configfile is the name of the template configuration file created for the specified SUBCOMMAND.

2.6   Verifying Installation of the Water Security Toolkit

An example using one of the WST subcommands can be used to verify the proper installation of WST. This
example uses the WST subcommand tevasim, which is documented in Chapter 3.

   1. A template configuration file for the tevasim subcommand can be generated using the following com-
     mand line, in which verify-wst.yml is the template configuration file to be created:
        ¥st tevasim —template verify-¥st.yml
     This example assumes that the wst/bin directory was added to the PATH variable.  If the path was
     not modified, the wst command would be replaced with the full path to the main WST script (e.g.,
     C:\WST-l.l\bin\wst) in this and all subsequent commands.

   2. The EPANET input file for the  example network (NetS.inp)  needs to be copied from the  wst/ex-
     amples/Net3 directory to the current working directory, since it is the network file referenced in the
     generated template file. On Windows (assuming WST is installed to C:\WST-1.1), the command line
     to copy this file is the following:
        copy C:\WST-1.l\examples\Net3\Net3.inp
     On Linux (assuming WST is installed to ~/wst-l.l), the command line to copy this file is the following:

-------
        cp ~/¥st-l.l/examples/Net3/Net3.inp
   3. The tevasim subcommand using this example is executed with the following command line:
        ¥st tevasim verify-¥st.yml
     This runs the tevasim subcommand and produces the output shown in Figure 2.1.

2.7   Uninstalling the Water Security Toolkit

As WST does not rely on a formal installer, uninstalling WST only requires deleting the main WST directory
(regardless if the pre-built binaries were installed or WST was built from the source  code). If the wst/bin
and/or wst/python/bin directories were added to the system PATH variable, these entries should be removed
also.

-------
THREAT ENSEMBLE VULN
(T E V
THREAT SI
Version
T E V A is using Epanet release 20012
Initializing Epanet
Computing network hydraulics at time
Computing network water quality:
Scenario 00001, injection nodes
Scenario 00002, injection nodes
Scenario 00003, injection nodes
Scenario 00004, injection nodes
Scenario 00005, injection nodes
Scenario 00006, injection nodes
Scenario 00007, injection nodes
Scenario 00008, injection nodes
Scenario 00009, injection nodes
Scenario 00010, injection nodes
Scenario 00011, injection nodes
Scenario 00012, injection nodes
Scenario 00013, injection nodes
Scenario 00014, injection nodes
Scenario 00015, injection nodes
Scenario 00016, injection nodes
Scenario 00017, injection nodes
Scenario 00018, injection nodes
Scenario 00019, injection nodes
Scenario 00020, injection nodes
Scenario 00021, injection nodes
Scenario 00022, injection nodes
Scenario 00023, injection nodes
Scenario 00024, injection nodes
Scenario 00025, injection nodes
Scenario 00026, injection nodes
Scenario 00027, injection nodes
Scenario 00028, injection nodes
Scenario 00029, injection nodes
Scenario 00030, injection nodes
Scenario 00031, injection nodes
Scenario 00032, injection nodes
Scenario 00033, injection nodes
Scenario 00034, injection nodes
Scenario 00035, injection nodes
Scenario 00036, injection nodes
Scenario 00037, injection nodes
Scenario 00038, injection nodes
Scenario 00039, injection nodes
Scenario 00040, injection nodes
Scenario 00041, injection nodes
Scenario 00042, injection nodes
Scenario 00043, injection nodes
Scenario 00044, injection nodes
Scenario 00045, injection nodes
Scenario 00046, injection nodes
Scenario 00047, injection nodes
Scenario 00048, injection nodes
Scenario 00049, injection nodes
Scenario 00050, injection nodes
Scenario 00051, injection nodes
Scenario 00052, injection nodes
Scenario 00053, injection nodes
Scenario 00054, injection nodes
Scenario 00055, injection nodes
Scenario 00056, injection nodes
Scenario 00057, injection nodes
Scenario 00058, injection nodes
Scenario 00059, injection nodes
Species generated/saved:
Species idx stored?
species 0 stored
Elapsed time for quality simulations
T E V A Normal Termination
E R A B I
A)
M U L A T
1.2


= 48:00

15
35
101
103
105
107
109
111
113
115
117
119
121
123
125
127
131
139
141
143
145
147
149
151
153
157
159
161
163
166
167
171
177
185
189
191
193
197
199
201
203
205
207
209
211
213
215
217
219
225
229
231
237
239
243
247
251
253
255



( seconds ) :

L I T Y

0 R





, time48 :
time48:
, time48 :
time48:
time48:
, time48 :
time48:
, time48 :
time48:
, time48 :
time48:
, time48 :
time48:
time48:
, time48 :
time48:
, time48 :
time48:
, time48 :
time48:
, time48 :
, time48 :
time48:
, time48 :
time48:
, time48 :
time48:
, time48 :
time48:
, time48 :
, time48 :
time48:
, time48 :
time48:
, time48 :
time48:
, time48 :
time48:
time48:
, time48 :
time48:
, time48 :
time48:
, time48 :
time48:
, time48 :
, time48 :
time48:
, time48 :
time48:
, time48 :
time48:
, time48 :
time48:
, time48 :
, time48 :
time48:
, time48 :
time48:



1 .OOOOOOe

ANALYSIS







00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00



+ 00

Figure 2.1: Output from the tevasim subcommand.



                      10

-------
Chapter 3

Contaminant  Transport
This chapter describes how to simulate contamination incidents in a water distribution network, which is
one of the first steps before designing a water quality sensor network or evaluating response actions to  a
contamination incident.  The tevasim subcommand simulates the hydraulics and contaminant transport
within a water distribution network model, which consists of pipe, node, pump,  valve, storage tank and
reservoir components. The tevasim subcommand uses the hydraulic engine from EPANET to solve the flow
continuity and headless equations (Rossman, 2000).  Water quality simulations are calculated using either
EPANET (Rossman, 2000), EPANET-MSX (Shang et al., 2011), or Merlion (Mann et al., 2012a). To increase
efficiency when simulating a large ensemble of contamination incidents, the tevasim subcommand uses  a
single hydraulic simulation to simulate an ensemble of water quality simulations.  A flowchart representation
of the tevasim subcommand  is shown in Figure  3.1.  The utility network model is defined by an EPANET
compatible network models (INP format) in WST. The  simulation input is supplied through the tevasim
WST configuration file.
                               Utility Network / /  Simulation
                                 Model    /  / 	Input
                                       Contaminant
                                         Transport
                                        Threat Ensemble
                                         Database
                      Figure 3.1: Contaminant transport simulation flowchart.


3.1   Hydraulic and Water  Quality Analysis

Three water quality simulators, EPANET, EPANET-MSX and Merlion, are used within WST. These simu-
lators are explained in more detail in the following subsections.

3.1.1  EPANET and EPANET-MSX

EPANET performs extended-period simulation of the hydraulic and water quality behavior within pressurized
pipe networks.  These models can evaluate the expected flow in  water  distribution systems, and model
the transport of contaminants and related chemical interactions.  The multi-species extension, EPANET-
MSX, is also included in  WST to simulate  contamination incidents using multi-species reactions.  Any

                                               11

-------
reaction dynamics between chemical and/or biological species (e.g., chemical-chemical, chemical-biological,
or biological-biological) can be modeled and simulated using EPANET-MSX. EPANET-MSX can be used
in sensor network design and booster station placement.  More specifics on these applications can be found
in Chapters 5  and 7.  Additional information on EPANET can be found at http://www.epa.gov/nrmrl/
wswrd/dw/epanet. html and in the EPANET user manual (Rossman, 2000).  Additional information  on
EPANET-MSX can be found in the EPANET-MSX user  manual (Shang et al.,  2011).

3.1.2  Merlion

The  tevasim  subcommand also includes  a water quality modeling framework  called Merlion.  Unlike
EPANET, Merlion does not model bulk or wall  reactions.  Given hydraulic information from simulation
packages like EPANET or experimental data, Merlion models  the transport of a substance as it spreads
through the water distribution system based on the network dynamic flow patterns. Merlion first formulates
a linear water  quality  model with explicit all-to-all mapping (inputs include injections at all possible nodes
and time steps, and outputs include concentrations at all  possible nodes and  time  steps).  This model is
then easily used for  forward tracing simulations by first specifying the injection profile and then solving the
system for the  network concentration profile. The linear model also can be embedded within other numerical
applications or for analysis in many security applications. Using Merlion in the tevasim subcommand can
be faster for multi-scenario simulation; however, it is also more memory intensive.  Merlion is also used in
the tevasim subcommand to identify booster station locations,  contaminant source injection locations and
manual grab sample locations.  More specifics about these applications are in  Chapters 7, 8 and 9. More
information on Merlion can be found in Section 10.1 and  (Mann et al., 2012a).

3.2    Contaminant Transport Scenarios

Contaminant transport scenarios can be defined directly in a WST configuration file or by using a TSI or TSG
file. These options are set in the [scenario] block of the configuratin file for all of the WST subcommands
that  require scenarios.

The recommended approach is to define the contamination transport scenarios directly in the scenario block
of WST configuration file. The options that must be set  are the location, type, strength, species (required
only  for EPANET-MSX), and start and end times for the contamination scenarios.  The injection location
can be specified by a list of EPANET node IDs, or by  the key words NZD (non-zero demand nodes) or
ALL (all nodes) to create an ensemble of contaminant  scenarios. The injection type can be CONCEN,
MASS, FLOWPACED, or SETPOINT as defined in the EPANET user manual(Rossman, 2000).  CONCEN
represents the  concentration of an external source entering a node and applies only when the node has a net
negative demand.  MASS, FLOWPACED, and SETPOINT represent  booster sources, where a contaminant
is injected directly into the network regardless of nodal demand. A  MASS source type  adds  a fixed mass
flow  to that resulting  from inflow  to the node, while a FLOWPACED booster adds a fixed concentration
to the  resultant inflow concentration at  the node.  A SETPOINT booster fixes the concentration  leaving
the node as long as the inflow concentration was below  the setpoint.  The strength of a MASS source is
in units of mass flow per minute, while CONCEN, FLOWPACED, and SETPOINT  sources are in units of
concentration  (mass per volume).  The configuration file  defines injection time in  minutes and strength in
mg/L or mg/min depending on the injection type.

Alternatively, the contamination transport scenarios can be defined using a TSI or TSG file.  Each line of
a TSI file specifies a single  contaminant scenario by listing the injection location, type, species  (required
only  for EPANET-MSX), strength, and time frame.  Each scenario can include multiple injection locations
and multiple injection species with unique injection strengths and time frames. This format allows for the
greatest flexibility in combined scenario options. For more detail on  the TSI file, see File Formats Section
11.13.
The  TSG file  is a short hand format of the more  detailed TSI file.  Multiple injection locations  can  be
specified on a single line.  All permutations of the combined locations are used to create multiple scenarios.

                                                12

-------
Each line of the TSG file is limited to a single injection species and time frame.  For more detail on the TSI
file, see File Formats Section 11.12. The TSI and TSG files specify the injection time frame in seconds and
the strength units depend on in the INP network model file units.
If a TSI file is specified in the WST  configuration file, the TSG file and location, type, strength, species,
start time, and end time options are overridden.  If a TSG file is specified in the WST configuration file,
the location, type, strength, species, start time, and end time options specified  in the configuration file are
overridden.

3.3   tevasim Subcommand

The tevasim subcommand is executed using the following command line:
   ¥st tevasim 
where conf igf ile is a WST configuration file in the YAML format.
The --help option prints information about this subcommand, such as usage, arguments, and a brief de-
scription:
   ¥st tevasim —help
3.3.1   Configuration File

The tevasim subcommand generates a template configuration file using the following command line:
   ¥st tevasim —template 
The tevasim WST template configuration file is shown in Figure 3.2. Brief descriptions of the options are
included in the template after the # sign.
   # tevasim configuration template
   network:
     epanet file: NetS.inp  # EPANET network file name
   scenario:
     location:  [NZD]        # Injection location: ALL, NZD or EPANET ID
     type: MASS            # Injection type: MASS, CONCEN, FLOMPACED, or SETPOINT
     strength:  100.0        # Injection strength [mg/min or mg/L depending on type]
     species: null          # Injection species, required for EPANET-MSX
     start time: 0          # Injection start time [min]
     end time:  1440         # Injection end time [min]
     tsg file:  null         # TSG file name, overrides injection parameters above
     tsi file:  null         # TSI file name, overrides TSG file
     msx file:  null         # Multi-species extension file name
     msx species: null      # MSX species to save
     merlion: false         # Use Merlion as WQ simulator, true or false
   configure:
     output prefix: Net3    # Output file prefix
     debug:  0              # Debugging level, default = 0
                          Figure 3.2: The tevasim configuration template file.

3.3.2   Configuration Options

Full descriptions of the WST configuration options used by the tevasim subcommand are listed below.
network
     epanet file
          The name of the EPANET input (INP) file that defines the water distribution network model.

                                                  13

-------
          Required input.
scenario
     location
          A list that  describes the injection locations for the contamination scenarios.  The options are:
          (1) ALL, which denotes all nodes (excluding  tanks and reservoirs) as contamination injection
          locations; (2) NZD,  which  denotes all nodes with non-zero demands as contamination injection
          locations; or (3) an  EPANET node ID, which identifies the node where contamination is intro-
          duced.  This allows easy specification of single or multiple contamination scenarios.
          Required input unless a TSG or TSI file is specified.
     type
          The injection type for the contamination scenarios.  The options are MASS, CONCEN, FLOW-
          PACED or SETPOINT. See the EPANET manual for additional information about source types
          (Rossman, 2000).
          Required input unless a TSG or TSI file is specified.
     strength
          The amount of contaminant injected into the network for the contamination scenarios. If the type
          option is MASS, then the units for the strength are in mg/min. If the  type option is CONCEN,
          FLOWPACED or SETPOINT, then units are in mg/L.
          Required input unless a TSG or TSI file is specified.
     species
          The name of the contaminant species injected into the network. This the name of a single species.
          It is required when using EPANET-MSX, since multiple species might be simulated, but only one
          is injected into the network.  For cases where multiple contaminants are injected, a TSI file is
          needed.
          Required input for EPANET-MSX unless a TSG or TSI file is specified.
     start time
          The injection start time that defines when the contaminant injection begins. The time is given in
          minutes and is measured from the start of the simulation.  For example, a value of 60 represents
          an injection that  starts at hour 1 of the simulation.
          Required input unless a TSG or TSI file is specified.
     end time
          The injection end time that defines when the contaminant injection stops. The time is given in
          minutes and is measured from the start of the simulation. For example, a value of 120 represents
          an injection that  ends at hour 2 of the simulation.
          Required input unless a TSG or TSI file is specified.
     tsg file
          The name of the TSG scenario file that defines the ensemble of contamination scenarios to be
          simulated. Specifying a TSG file will override the location, type, strength, species,  start and end
          times options specified in the WST configuration file. The TSG file  format is documented in File
          Formats Section 11.12.
          Optional input.
     tsi file
          The name of the  TSI scenario file that defines the ensemble of contamination scenarios to be
          simulated. Specifying a TSI file will override the TSG file, as well as the location, type, strength,
          species,  start and end time options specified in the WST configuration file. The TSI file  format
          is documented in File Formats Section 11.13.
          Optional input.
                                                14

-------
     msx file
          The name of the EPANET-MSX multi-species file that defines the multi-species reactions to be
          simulated using EPANET-MSX.
          Required input for EPANET-MSX.
     msx species
          The name of the MSX species whose concentration profile will be saved by the EPANET-MSX
          simulation and used for later calculations.
          Required input for EPANET-MSX.
     merlion
          A flag (true  or false) to indicate if the Merlion water quality simulator should be used. If an MSX
          file is provided, EPANET-MSX will be used.
          Required input, default = false.
configure
     output prefix
          The prefix used for all output files.
          Required input.
     debug
          The debugging level that indicates the amount of debugging information to output to the screen.
          Optional input, default = 0 (lowest level).

3.3.3  Subcommand Output

The  _tevasim.json file contains a summary of the tevasim  results. This file includes the
name of the EPANET report file and the name of the binary ERD database file, the run date, and CPU
time. These files are described below.

   • EPANET report:  This  file provides information on the EPANET simulations. The EPANET  report
     file format is described in Appendix C.3 of the EPANET Users Manual (Rossman, 2000).

   • ERD  database: The database contains the simulation results, and is stored in four files: header file,
     index file, hydraulics file, and a water quality file. The ERD database format is described in Geib et al.
     (2011). This files is not intended to be read by users but rather is read  by other WST subcommands.

3.4   Contaminant Transport Examples

An EPANET network model  (INP format) and a configuration file are required to run the tevasim subcom-
mand.

The  tevasim template configuration file uses the EPANET Example Network 3 (NetS.inp).  The network is
shown in Figure  3.3. Net3 contains 92 junctions, 2 reservoirs, and 3 tanks.  This network has 59 non-zero
demand (NZD) nodes.  The network file is setup to run a 48 hour hydraulic and water quality simulation.  A
1 hour hydraulic time step  and 5 minute water quality time step are used.

The  scenario ensemble  in the tevasim template configuration file defines a contaminant injection from each
NZD node,  with a MASS injection of 100 mg/L, starting at time 0 and injecting for 24 hours (1440 minutes).

To define scenarios that start and stop at multiple times, a TSG file can be used to define the scenario set.
Figure 3.4 shows the example TSG file,  NetS.tsg,  in which the contamination scenarios are injected at all
NZD nodes starting at 12 am, 6 am, 12  pm, and 6 pm for  a total of 236 scenarios.  Each injection lasts 24
hours and injects a contaminant at 100 mg/L.
                                               15

-------
                       Figure 3.3: Layout of the EPANET Example Network 3.
; 24-hour events, 12am
NZD
NZD
NZD
NZD
MASS 100
MASS 100
MASS 100
MASS 100
6am, 12pm, and 6pm
0
21600
43200
64800

86400
108000
129600
151200
                        Figure 3.4: Example TSG contamination scenario file.
3.4.1  Example 1
The first example uses the NetS.inp network file, the contamination scenario set defined by NetS.tsg and the
Merlion water quality model.  The configuration file, tevasim_exl.yml, for this example is shown in Figure
3.5.
The example can be executed using the following command line:
   ¥st tevaslm tevaslm_exl.yml
                                                16

-------
   network:
     epanet file:  Net3/Net3. inp
   scenario:
     location:  null
     type: null
     strength:  null
     species: null
     start time: null
     end time:  null
     tsg file:  Net3/Net3.tsg
     tsi file:  null
     msx file:  null
     msx species:  null
     merlion: true
   configure:
     output prefix: tevasim_exl/Net3
     debug:  0
                        Figure 3.5:  The tevasim configuration file for example 1.
3.4.2   Example 2
The second example uses EPANET-MSX to simulate the transport of multiple contaminants. For a multi-
species simulation, a MSX file and the MSX species must be added to the tevasim configuration file.  The
MSX species  is the species whose concentration profile will be saved by EPANET-MSX to be used for future
calculations.  The MSX species  can be different than the species which is injected into the network.  The
configuration file, tevasim_ex2.yml, for this example is shown in Figure 3.6. The example uses the NetS.inp
network  file  and the MSX file,  Net3_EColi_TSB.msx, which simulates the reaction dynamics  between
Escherichiacoli, chlorine and a tryptic soy broth  (TSB), a nutrient broth that helps to grow bacteria. For
more information about this specific reaction dynamics, see the  E. co/*-TSB model described  in Murray
et al. (2011).  In this example, both the species and MSX species are the same.
   network:
     epanet file:  Net3/Net3. inp
   scenario:
     location:  ['15']
     type: MASS
     strength:  5.77e8
     species: EColi
     start time: 0
     end time:  360
     tsg file:  null
     tsi file:  null
     msx file:  Net3/Net3_EColi_TSB.msx
     msx species:  EColi
     merlion: false
   configure:
     output prefix: tevasim_ex2/Net3_EColi_TSB
     debug:  0
                        Figure 3.6:  The tevasim configuration file for example 2.

The example can be executed using the following command line:
   ¥st tevasim tevasim_ex2.yml
To simulate the simultaneous  injection of two species, a TSI  file is needed.  For example, the TSI file,
Net3_EColi_TSB.tsi, defines the simultaneous injection of E.  coli and  TSB at multiple locations within
Net3.

                                                  17

-------
Chapter 4

Impact  Assessment
The potential consequences of individual contamination scenarios can be quantified using results from the
contaminant transport simulations and a variety of impact assessment metrics.  The sim2Impact subcom-
mand performs impact assessment using the output threat ensemble database  (ERD)  from the tevasim
subcommand. This analysis provides all necessary network statistics for sensor nework design (described in
Chapter 5) as well as response actions, such as flushing hydrants and boosting disinfectant  (described in
Chapters 6 and 7, respectively).

A flowchart representation of the sim2Impact subcommand is shown in Figure  4.1. The threat ensemble
database (ERD) is the output  from  the tevasim subcommand and is required  input for the sim2Impact
subcommand. The consequences input parameters are supplied through the sim2Impact WST configuration
file. Additional input data that  describes the exposure and dose response models of a particular contaminant
is required if a  human health impact metric is used.  These models are defined by parameters listed in a
threat assessment input  (TAI)  file.  More details on the TAI file are  provided in the File Formats  Section
11.11.
                              Threat Ensemble  / / Consequences
                                 Database   / /	Input
                                           Impact
                                         Assessment
                                          Impact File
                              Figure 4.1: Impact assessment flowchart.

Several impact metrics are included in the sim2Impact subcommand to reflect different criteria that decision
makers could use in sensor  network  design or response actions.  These metrics include:  population dosed
(PD), population exposed (PE), population killed  (PK),  extent of contamination (EC), timed extent of
contamination (TEC), mass consumed (MC), volume consumed (VC), time to detection (TD), and number
of failed detections (NFD).  The equations used to  compute the impact metrics are listed in Section 4.1.
Impact metrics are calculated at discrete time steps for a given contamination scenario. The discrete time
steps are defined by the reporting time step and the duration of the water quality simulation.
Human health impacts (PD, PE and  PK) can be estimated by combining the water quality simulations with
exposure models. Contaminant-specific data are needed to accurately estimate the health endpoints.  For

                                                18

-------
many contaminants, reliable data are lacking, and the ensuing uncertainty in the results must be understood.
More information on the human health impacts is provided in Section 4.2.

4.1    Impact Metrics

Impact assessment results are calculated and stored in an impact file. This file is not typically read by a
WST user, rather it is read by a sensor or response optimization routine. For each contamination scenario,
the impact file contains a list of  all the locations (nodes) in the network where a sensor  might detect
contamination from a specific scenario.   Nodes that do not detect contamination are not included in  the
impact file for that specific scenario.  For each node that detects contamination, the impact file contains the
detection time and consequence at that time, as measured by one of the impact metrics.
The  impact file is used as input for sensor placement optimization and during the optimization process of
response actions, such as flushing hydrants and boosting disinfectant. When calculating impacts, a detection
threshold can be specified such that contaminants are only detected above a specified concentration limit (the
default limit is zero).  Second, a response time can be specified in the  sim2Impact configuration file, which
accounts for the time needed to verify the presence of contamination (e.g., by field investigation), inform the
public, and/or initate flushing, booster disinfection, or other response action (the default response time is
zero). The contamination impact is computed at the time when the response has been initiated (the detection
time plus response time), which is  called  the effective response time. Finally, a detection confidence can be
defined, which specifies the number of sensors that must detect contamination from any given scenario before
it is  considered to be detected, at which time the impacts are calculated (the default is 1 sensor).
The  impact file contains  four  columns of information:

   •  Column  1 contains the contamination scenario number, a

   •  Column 2 contains the node location where contamination was detected, i

   •  Column 3 contains the effective response time in minutes, T/

   •  Column 4 contains the impact at the effective response time as measured by a specified metric,  da^

The  impact file is documented in the File Formats Section 11.4. The impact metric, da^, is used directly in
the sensor placement formulation (Equation 5.1), the flushing formulation, and the booster formulation.
The  effective response time at node i, T/, is calculated using the following equation:

            T[ = min (t : \Cn^ > detection limit|  > detection confidence)AT + response time        (4.1)

where Cn detection limit| is the number of node, time step pairs
where contaminant was  detected above the detection limit, this includes detection at node i).

In the impact file, the impact at the end of the simulation time is included for each contamination scenario.
Essentially, this is the impact if contamination was not detected at any node location, and is often referred
to as the dummy sensor location. For this entry, i is set to -1, T/ is the time at the end of the water quality
simulation, and da^ is the impact at the end of the simulation.

The  impact, da^ can be computed using one of the following metrics:  PD, PE,  PK, EC, TEC, MC, VC,
TD and NFD. These metrics  are defined in the following equations. In the equations, the effective response
time step for node i, t^,  equals T//AT, and subscripts n, t and p are used to reference a specific node, time
step  and person, respectivly.

                                                 19

-------
PDa,i, populationd dosed, is the total number of individuals that received a cumulative dose of con-
taminant above a specified threshold for scenario a when contamination is detected at node i:

                       N POpn                     ( -,   ,ri         1    ,1    111
                      V^V-        u    r       J1   udnpt>.> dose threshold
              PDa,i =  > . > . 5n,P,t>. where 5npt( = <          'p' *                            4.2
                      ^_/ ^_/   >*- > *                 o   otherwise
                      n=l p=l                     I.
where N is the number of nodes in  the network, popn is the population at node n calculated  using
Equation 4.11, dn calculated using Equation 4.18.  The variables, 7n>t/
and -Dn>t'., are the infected and diseased state, respectively, at  node n at the effective response time
step t^ computed from the disease progression model described in Section 4.2.4.
       population killed,  is the number of individuals killed by a contaminant for scenario a when
contamination is detected at node i:

                                                N
                                       PKaji = Y, En,t'.                                   (4.4)
                                               n=l
where N is the number of nodes in the network and Fn^/ is the fatality state at node n at the effective
response time step t^ computed from the disease progression model described in Section 4.2.4.

ECaii, extent of contamination, is the length of contaminated pipe for scenario a when contamination
is detected at node i:

                        -r-^                        I 1   if Cn ti >  detection limit
               ECai =  >  Lnt'dnti where ont/  = \         ' *                              (4.5)
                        ^—'    ' *  '  *         ' *      o   otherwise
                       n=l                        (.
where N is the number of nodes in the network, Ln^>. is the length of all pipes connected to node n with
flow starting at node n at the effective response time step t'i and Cn^. is the contaminant concentration
at node n  at the effective response  time step t^.  An entire pipe is considered contaminated if the
contaminant enters the pipe.

TECaii, timed extent of contamination, is the length of contaminated pipe for scenario a when con-
tamination is detected at node i and includes the length of contaminated pipe each time a node is
contaminated, up to the time when contamination is detected:

                            +'                     ,
                                                   \ 1    if Cn t > detection limit
                                n,tSn,t where 5nt = < n        ' .                           (4.6)
                        n=i t=i                     1°    otherwise
where N is the number of nodes in the network,  Ln>t is the length of all pipes connected to node n
with flow starting at node n at time step t and Cr^t is  the contaminant concentration at node n at
time step t. An entire pipe is considered contaminated if the contaminant enters the pipe.  This metric
is only intended to be used within the flushing response optimization routine, which

                                           20

-------
        Ca  detection limit
                   VCaj, time to detection, is the time from the beginning of scenario a until contamination is first
     detected at a node i.

                                    TDaji = T/ -  injection start time                           (4.9)

     where T/ is the effective response time.

     NFDa^, number of failed detections, is a binary value to indicate the detection of scenario a at node
                                      I 1   if scenario a is not detected at node i
                                     1 0   otherwise

     where the total impact is given a value of 1 if scenario a is not detected at node i or the value of 0 if
     scenario a is detected at node i. Since the impact file only lists nodes which detect scenarios, all node,
     time pairs have a total impact of 0, except for the dummy location (i = -1), which is given a value of
     1.


4.2    Humand Health Impact Model

The human health impact model is used to compute PD, PE  and PK. In order to calculate these metrics,
an estimate of the  population ingesting water and the  cumulative dose and response for each individual  at
each node is required. A disease progression model is used to compute the population susceptible, infected,
diseased and killed given a cumulative dose  of contaminant. Input parameters for the human health impact
model are stored in a TAI file. The TAI file format is described in Section 11.11. Additional information on
human health impact models can be found in the EPA compendium report (Murray et al., 2010).
                                                21

-------
4.2.1  Population

The  population at each network node can either be defined explicitly in the TAI  file using a population
file or calculated based on the demand at each node. The population file has one line per node. Each line
contains the node ID followed by the population value for that node.  For the demand-based calculation, it
is assumed that all water leaving the network is consumed by the population. Therefore, the population at
node n, popn, is computed using the following equation:

                                            POpn = %                                       (4.11)
                                                    it

where ~qn is the average volume of water  consumed at node n per day and R is the average volume of water
consumed per capita per day.  The variable, R, is set in the TAI file.  A USGS report provides usage rates
by state and gives a nationwide average of 179 gallons per capita per day  (U.S. Geological Survey, 2004).
Often 200 gallons per capita  per day is  used  for R. The  population is assumed to be constant over time.
The water consumption rate is more than just the ingestion of water by people since it includes all uses of
water, such as domestic, commercial, industrial, agricultural, and others.

4.2.2  Cumulative Dose

At each node, the total number of people potentially ingesting water is given by popn.  In order to compute the
cumulative dose, additional information is needed, including when and how much a person drinks. Ingestion
timing and volume models are used to make this calculation.  Additional information on the ingestion and
volume models can be found in Davis and Janke (2008). Three different ingestion timing models are available:

   • Demand-based (D24): assumes that tap water is ingested at every time step in an amount  proportional
     to the total water demand at that node.

   • Fixed  (F5):  assumes that tap water is ingested at five fixed times during a day.  These times are set
     to the typical starting times for the three major meals on weekdays  (7:00,  12:00, and 18:00) and times
     halfway between these meals (9:30  and  15:00).

   • Probabilistic (P5):  also assumes that tap water is ingested at five times per day at major meals and
     halfway between them,  but  it uses a probabilistic approach to determine meal times.  This is based
     on data from the American Time-Use Survey (ATUS)  (Bureau of  Labor Statistics and  U.S. Census
     Bureau, 2005).

In addition, there are two ingestion volume models:

   • Mean (M): assumes the same average quantity of tap water is ingested by all individuals  in the popu-
     lation who consume tap water.

   • Probabilistic (P): uses a probabilistic approach to estimate the volume ingested by individual people.

The ingestion timing model and the volume model are set in the TAI  file. The D24  ingestion timing model
is used only with the M volume model. Either the M or P volume models can be used with the F5 and P5
timing models. The volume models are used to determine a per capita ingestion volume,  Vntpy  in liters/day
for each person p at node n.  When using the M volume model,  Vn

tty to calculate cumulative dose for each time step. When using the D24 ingestion timing model, Vntl>tt is related to the demand at that node. The fraction of demand water that is ingested at node n at time step t, considering the entire length of the simulation, is


-------
computed by:
                                     Pn,t =   natya	nstepsAT
                                           Et=l   1n,t
where qnjj is calculated using Equation 4.13 or Equation 4.14 and Cnj  is the contaminant  concentration in
the water at node  n at time step j as predicted by the water quality simulations. Cumulative dose is given
in number of organisms or mass in milligrams.

4.2.3  Response

Dose-response functions are used to predict the percentage of the population that might experience a par-
ticular health outcome after receiving  a specific cumulative dose. Two dose-response functions, r(dntl>tt), are
available in  the sim2Impact subcommand:

   • Log-Probit model:
                                      r(dn,p,t) = ^(/3ln(dntpttLD50))                            (4.16)
     where $ is the cumulative distribution function of a standard  normal random variable, (3 is related  to
     the slope of the curve, LD5Q (or /_D50 for biological agents) is the dose at which 50% of the exposed
     population would die and  dntl>tt is calculated  using Equation  4.15. The parameters j3 and  LD50 are
     set in  the TAI file.

   • Generic logistic  function:
                                          + mf.   r,,p,t                     ^                 ^^
     where a,  m, r\ and  T are  function coefficients used to fit the model to  available data and dn,p,t is
     calculated using Equation 4.15.  The parameters a, m, r\ and T are set in the TAI file.

The average response, rn^, of the population at node n at time step t is calculated by:
                                                 EpOpn  / T   \
                                                 ;Ji r(dn^t)
                                        rn,t = — - -                                   (4.18)
where popn  is calculated using Equation 4.11 and r(dn^pjt) is calculated using Equation 4.16 or Equation
4.17.

                                                 23

-------
4.2.4  Disease Progression Model

To track how the population at each node responds to a specified contaminant over time, a disease progression
model is used.  Given the percentage of people at each node who would become ill after being exposed to
the contaminant, disease transmission models predict how the disease  would progress over time.  Disease
models are used to predict the number of people at each node susceptible to illness from the contaminant
(S), exposed to a lethal or infectious dose (/), experiencing symptoms of disease (D) and either recovering
(R) or being fatally impacted (F).  These equations assume that that  the recovered population does not
rejoin the susceptible  population. These quantities are predicted at each node over time according to the
following differential equations:
                                        — = XS-aI                                         (4.20)
                                         at

                                        ^-=aI-(a + v)D                                   (4.21)
                                        (MJ

                                        ^ = VD                                             (4.22)
                                        at

                                        — = aD                                             (4.23)
                                        OiV

where A is the per capita rate of infection, a is the per capita rate at which infected move to diseased, a
is the per capita disease induced untreated death rate, and v is the per capita recovery rate, or the rate at
which diseased moved to recovered or fatal states.

The infection rate, A, is given by:
                                         •    —                                                -
                                                  at   on t
where r(dntptt) is calculated using Equation 4.16 or Equation 4.17 and S is calculated by Equation 4.19.
In the TAI file, the LATENCY TIME is the inverse of a, the FATALITY RATE is a, and the FATALITY
TIME is the inverse of v. For more detail on the disease  progression model and health impacts, see Murray
et al. (2006).

4.3   sim2Impact  Subcommand

The sim2Impact subcommand is executed using the following command line:
   ¥st sim2Impact 
where conf igf ile is a WST configuration file in the YAML format.
The  --help option prints information about  this subcommand, such as usage, arguments, and a brief de-
scription:
   ¥st sim2Impact —help
4.3.1  Configuration File

The  sim2Impact subcommand generates a template configuration file using the following command line:
   ¥st sim2Impact —template 
The sim2Impact WST configuration template is shown in Figure 4.2.  Brief descriptions of the options are
included in the configuration template after the # sign.
                                                24

-------
   #  sim2Impact  configuration template
   impact:
     erd file:  [NetS.erd]     # ERD database file name
     metric:  [MC]            # Impact metric
     tai file: null          # Health impact file name,  required for public health metrics
     response time: 0        # Time [min] needed to respond
     detection limit: [0.0]   # Thresholds needed to perform detection
     detection confidence:  1  # Number of sensors for detection
     msx species: null       # MSX species used to compute impact
   configure:
     output prefix: Net3     # Output file prefix
     debug: 0               # Debugging level, default  = 0
                        Figure 4.2: The sim2Impact configuration template file.


4.3.2   Configuration Options

Full descriptions of the WST configuration options used by the sim2Impact subcommand are listed below.
impact
     erd file
          The name of the ERD database file that contains the contaminant transport simulation results.
          It is created by running the tevasim subcommand.  Multiple ERD files (entered as a list) can be
          combined to generate a single impact file. This can be used to combine simulation results from
          different types of contaminants, in which the ERD files were generated from different TSG files.
          Required input.
     metric
          The impact metric used to compute the impact file.  Options include EC, MC, NFD, PD, PE,  PK,
          TD, TEC or VC. One impact file is created for each metric selected. These metrics are defined
          in  Section 4.1.
          Required input.
     tai file
          The name of the TAI file that contains health impact information.  The TAI file format is docu-
          mented in File Formats Section 11.11.
          Required input if a public health metric is used (PD, PE or PK).
     response time
          The number of  minutes that are needed to respond to  the detection of a contaminant.  This
          represents the time that it takes a water utility to stop the spread of the contaminant in the
          network and eliminate the  consumption of contaminated water.  As the response time  increases,
          the impact increases because the contaminant  affects the network for a greater length of time.
          Required input,  default = 0 minutes.
     detection limit
          The concentration thresholds needed to perform detection with a sensor.  There must be one
          threshold  for each ERD file.  The units of these detection limits depend  on the units of the
          contaminant simulated for  each ERD  file (e.g., number of cells of a biological  agent). The units
          for detection limit are the same as for the MASS  values that are specified in the TSG file.
          Required input,  default = 0.
     detection confidence
          The number of sensors that must detect an incident before the impacts are calculated.
          Required input,  default = 1 sensor.
     msx species

                                                 25

-------
          The name of the MSX species tracked in the EPANET-MSX  simulation.  This parameter is
          required for multi-species contamination incidents created by tevasim subcommand.
          Required input for EPANET-MSX, default = first species listed in the ERD file
configure
     output prefix
          The prefix used for all output files.
          Required input.
     debug
          The debugging level that indicates the amount of debugging information to output to the screen.
          Optional input, default = 0 (lowest level).

4.3.3  Subcommand Output

The  _sim2Impact.json file  contains  a summary of the sim2Impact subcommand results.
This file includes the name  of impact file(s), ID file(s), nodemap file, and scenario map file, the  run date,
and CPU time.  These files are described below.

   • Impact file:  One impact file is generated  for each of the impact metrics specified.  The file contains
     the observed impact at each location where a contamination scenario could be observed by a potential
     sensor. This file is not intended to be read by users, but it is used later for sensor placement or other
     response optimization. The impact file is documented in the File Formats Section 11.4.

   • ID file: For each impact file (e.g., wst_Net3_mc.impact ), a corresponding ID file is generated to map
     the location IDs back  to the  network node labels.  This file is not intended to be read by users, since
     it is used internally by the software code.

   • Nodemap  file: The nodemap file maps sensor placement IDs  to the network node labels (defined by
     EPANET). This file is  not intended to be read by users, since it is used internally by the software code.
     The nodemap file is documented in the File  Formats Section 11.8.

   • Scenario map file: The scenario  map file maps contamination scenario IDs to the network node labels
     (defined by EPANET). This file is not intended to be read by users, since it is used internally by the
     software code. The scenario map file is documented in the File Formats Section 11.9.

4.4   Impact Assessment Examples

After simulating the fate and transport of contaminants in a water distribution network, the output can
be used to quantify the impacts of the contamination incidents.  An ERD file and a configuration file are
required to run  the  sim2Impact subcommand. In the following examples, the EPANET Example Network
3 is used. The output database, NetS.erd,  from the first tevasim subcommand example is used to compute
the impact assessments.

4.4.1  Example 1

Figure 4.3 shows the configuration file, sim2Impact_exl.yml, for the first sim2Impact subcomannd example.
This example computes an  impact assessment, based on NetS.erd, for the mass  consumed (MC), volume
consumed  (VC), extent of contamination (EC), time to detection (TD), number of failed detections (NFD),
and population  exposed (PE) impact metrics.  The TAI file, Net3_bio.tai, is added to define the human
health impact for a biological contaminant. The response time, detection limit, and detection confidence are
all set at the default values  (i.e., 0, 0, 1, respectively).

The example can be executed using the following command line:

                                               26

-------
   impact:
     erd file:  [Net3/Net3.erd]
     metric: [MC, VC, EC,  TD, NFD, PE]
     tai file:  Net3/Net3_bio.tai
     response time: 0
     detection limit: [0.0]
     detection confidence:  1
     msx species: null
   configure:
     output prefix: sim2Impact_exl/Net3
     debug:  0
                      Figure 4.3: The sim2Impact configuration file for example 1.
   ¥st sim2Impact  sim2Impact_exl.yml
For each impact metric, an impact file (e.g.,  Net3_pe.impact) and a corresponding ID file is generated
(e.g., Net3_pe.impact.id). For each contamination scenario (shown in column 1 after a two line header), the
impact file contains a list of nodes in the network (column 2) where a sensor might detect that contamination.
For each such node, the impact file contains the detection time (column 3) and the total impact (column 4)
given a sensor at that node  is the first to detect contamination from that scenario.

4.4.2  Example 2

The second example using the sim2Impact subcommand investigates the effect of changing the response time
and detection limit on a specific impact metric, MC. The example 2 configuration file, sim2Impact_ex2.yml,
is shown in Figure 4.4. This example uses a response  time of 60 minute response time and a  detection limit
of 0.1. Note that the units for detection limit are the same as for the mass values specified in the TSG file.
   # sim2Impact configuration template
   impact:
     erd file:  [Net3/Net3.erd]
     metric:  [MC]
     tai file:  null
     response time: 60
     detection limit: [0.1]
     detection confidence:  1
     msx species: null
   configure:
     output prefix: sim2Impact_ex2/Net3
     debug:  0
                      Figure 4.4: The sim2Impact configuration file for example 2.

The example can be executed using the following command line:
   ¥st sim2Impact  sim2Impact_ex2.yml
4.4.3   Example 3

The sim2Impact example 3 calculates the impact associated with multi-species contamination incidents.
The msx species option specifies which species concentration profile to use to calculate impact metrics. This
option is required for multi-species contamination scenarios  created by the tevasim subcommand. The
configuration file for the multi-species example, sim2Impact_ex3.yml, is shown in Figure 4.5.  This example
uses an ERD file created  by EPANET-MSX and computes the MC  impact metric for the E. coli species.
The response time, detection limit, and detection confidence are all set  at their default values.

                                                 27

-------
   impact:
     erd file:  [Net3/Net3_EColi_TSB.erd]
     metric:  [MC]
     tai file: null
     response time: 0
     detection limit: [0.0]
     detection confidence:  1
     msx species: EColi
   configure:
     output prefix: sim2Impact_ex3/Net3
     debug: 0
                       Figure 4.5: The sim2Impact configuration file for example 3.


The example can be executed using the following command line:
   ¥st  sim2Impact sim2Impact_ex3.yml
                                                    28

-------
Chapter 5

Sensor  Placement
The  sp subcommand optimizes the location of sensors  in a water distribution network to  minimize  the
impact of potential contamination  incidents.  The sp subcommand has a rich  interface that supports a
variety of optimization formulations, and it integrates a  wide range of optimization solvers.  The standard
sensor placement problem is to minimize the expected impact over an ensemble of scenarios while limiting
the number of potential sensors. An impact file is used to define the ensemble of contamination scenarios.
By default, sensors can be placed at any node in the network, but the sp subcommand can  easily specify
fixed locations and infeasible locations.
A  flowchart representation of the sp subcommand is shown in Figure 5.1.  The required input for the  the
sp subcommand is an impact file and sensor characteristics. The impact file is created by the sim2Impact
subcommand. Multiple impact files can be used as input to the sensor placement problem. The other required
input is the sensor characteristics. These characteristics are supplied through the sp WST configuration file
as well as additional files that provide details on the cost and failure rates of the sensors.
                                T    iT-i      /  /     Sensor
                                Impact File   / /    ,     ....
                                           / /    Characteristics
                                          Sensor
                                        Placement
                                      Sensor Locations
                                     	/

                              Figure 5.1: Sensor placement flowchart.


5.1   Sensor Placement Formulations

The most widely studied sensor placement formulation for a contamination warning system (CWS) design
is to minimize the expected impact of an  ensemble of contamination incidents given a fixed number of
sensors.  This formulation has also become the standard formulation in the sp subcommand, because it can
be effectively used to select sensor placements in large water distribution networks.
                                               29

-------
A mixed-integer programming (MIP)  formulation for expected- impact sensor placement is (eSP):

                                            daixai                                               (5.1)
                        minimize
Si e {0,1}
0 < xai < 1
                                                                    Va e A

                                                                    Va e A i e £0
                                                                    VieL
                                                                    Va e A *
                                                                                                (5.2)

                                                                                                (5.3)
                                                                                                (5.4)

                                                                                                (5.5)
                                                                                                (5.6)
This MIP minimizes the expected impact of a set of contamination incidents defined by A. For each incident
a (E A, aa is the weight of incident a, frequently a probability.  This formulation integrates contamination
simulation results, which are reported at a set of locations from the full set, denoted L, where a location refers
to a network node.  For each incident a, jCa  C L is the set of locations that can be contaminated by a.  Thus,
a sensor at a location i (E Ca can detect contamination from incident a at the time contamination first arrives
at location i. Each incident is witnessed by the first sensor to see it.  For each incident a (E A and location
i & Ca, dai defines the impact of the  contamination incident  a if it is witnessed  by location i.  This impact
metric assumes that as soon as  a sensor witnesses contamination, then any further contamination impacts
are mitigated (perhaps after  a suitable delay that accounts for the response time of the water utility). The
Si variables  indicate where sensors are placed in the network; Q is  the cost of placing a sensor at location i,
and p is the budget.
The xai variables indicate whether incident a  is witnessed by a sensor at location  i. A given set  of sensors
might not be able to witness  all contamination incidents. To account for this, L contains a dummy location,
q. This dummy location is in all subsets Ca. If the dummy location witnesses an incident, it generally means
that no real sensor can detect that incident. The impact for this location is the impact  of the contamination
incident after the entire contaminant transport simulation has  finished, which  estimates the  impact that
would occur without an online sensor network. The impact  of a dummy detection is no smaller than all
other impacts for each incident, so the witness variable xai for the dummy will only be  selected  if no sensors
have been placed that can detect this incident with smaller impact.
Berry et al.  (2006) describe eSP, and they note that this formulation is identical to the well-known p-median
facility location problem (Mirchandani and Francis, 1990) when Q = 1. In the p-median problem, p facilities
(e.g., central warehouses) are to be located on m potential sites such that the  sum of distances dai between
each of n customers (e.g., retail outlets) and the nearest facility i is minimized.  In comparing eSP and
p- median  problems,  there is equivalence between (1) sensors and facilities,  (2) contamination incidents and
customers, and (3) contamination impacts and distances. While  eSP allows placement  of at most  p sensors,
p-median  formulations generally enforce placement of all p facilities; in practice,  the distinction is irrelevant
unless p approaches the number of possible locations.

5.2  Sensor Placement Solvers

The sp  subcommand performs  optimization using a solver specified in the configuration file. All of the
solvers supported by the sp subcommand are practical for small-sized water distribution networks, and the
heuristic solvers can find sensor  placements for very large networks.

The sp subcommand interfaces with a variety of external solvers that can be used to perform sensor place-
ment. Several different MIP solvers can be used to find a globally optimal solution for  the eSP  MIP formu-
lation. However, this might be a computationally expensive process (especially for  large problems), and the
size of the MIP  formulation can become prohibitively large in some cases. A  variety of public-domain and
                                                 30

-------
commercial solvers can be used by the sp subcommand, including GLPK, CBC, PICO, CPLEX, GUROBI
and XPRESS.

The GRASP heuristic performs sensor placement optimization without explicitly creating a MIP formulation.
Thus, this solver uses much less memory,  and it usually runs very quickly.  Although the GRASP heuristic
does not guarantee that a globally  optimal  solution is found, it has  proven effective at finding optimal
solutions to a variety of large-scale applications. Two different implementations of the GRASP  solvers can
be used: an ATT commercial solver (att_grasp) or an open-source implementation of this solver (snl_grasp).
The Lagrangian heuristic uses the structure  of the p-median MIP formulation (eSP) to  find near-optimal
solutions while  computing a lower bound on the best possible solution.

5.3    sp Subcommand

The sp subcommand is executed with the following command:
   ¥st sp 
where conf igf ile is a WST configuration file in the YAML format.

The --help option prints information about this subcommand, such as usage, arguments, and a brief de-
scription:
   ¥st sp —help
Two other options can be used to print help information.  The --help-problems option prints a table of
the different types of optimization problems that can be solved with the sp subcommand. For example, the
following is a description of the standard problem solved by the sp subcommand:
   default, p-median, average-case perfect-sensor

   mean obj
   0  constraints
   perfect
   single obj
   1  stage
   exact
The first row lists the different names that can be used to specify this problem type. The six rows following
the dashed lines are different characteristics of this problem:  the  objective is a mean impact  value, the
problem has no side-constraints, the sensors detect perfectly (i.e., without errors), a single objective is used,
this is a single-stage problem, and an exact solution is desired. These six characteristics are used by the sp
subcommand to verify the suitability of solvers that are specified for optimization.
The --help-solvers option prints a table of the different solvers that can be applied to perform optimiza-
tion. For example, the following is a description of the solvers that  can be used to optimize  average-case
perfect-sensor problems  (which is  the default):
Problem Type
average-case perfect-sensor
average-case perfect-sensor
average-case perfect-sensor
average-case perfect-sensor
average-case perfect-sensor
average-case perfect-sensor
average-case perfect-sensor
average-case perfect-sensor
average-case perfect-sensor
average-case perfect-sensor
Solver
*att_grasp
*cbc
*cplex
*glpk
gurobi
*lagrangian
pico
pico
*snl_grasp
xpress
Modeling Language
none
pyomo
pyomo
pyomo
pyomo
none
ampl
pyomo
none
pyomo
                                                 31

-------
Solvers highlighted with an asterisk are available in the current installation of WST. The modeling language
indicates whether AMPL  (Fourer et al., 2002), Pyomo  (Hart et al., 2012), or neither is used to solve sensor
placement optimization problem.

5.3.1  Configuration File

The sp subcommand generates a template configuration file using the following command line:
   ¥st sp —template 
The sp WST template configuration file is shown in Figure 5.2. Brief descriptions of the options are included
in the template after the # character.

5.3.2  Configuration Options

Full descriptions of the WST configuration options used by the sp subcommand are listed below.

impact data
     name
          The name of the impact block that is used in the objective or constraint block.
          Required input.
     impact file
          The name of the impact file that is created by sim2Impact and contains the detection time and
          the total impact given a sensor at that node is the first to detect contamination from that scenario.
          The impact file format is documented in File Formats Section 11.4.
          Required input.
     nodemap file
          The name of the nodemap file that is created by sim2Impact and maps sensor placement ids to
          the network node labels. The nodemap file format  is documented in File Formats Section  11.8.
          Required input.
     weight file
          The name of the weight file that specifies  the  weights for contamination incidents.  This file
          supports  the optimization of weighted impact  metrics. The weight file  format is documented in
          File Formats Section 11.14.
          Optional  input, by default, incidents are optimized with weight 1.
     directory
          The name of the directory where the impact file, nodemap file and weight file are located.
          Optional  input, default = working directory.
cost
     name
          The name of the cost block that is used in the objective  or constraint block.
          Optional  input.
     cost file
          The name of the cost file that contains the costs for the installation of sensors throughout the dis-
          tribution network. This file contains EPANET ID/cost pairs.  The cost file format is documented
          in File Formats Section 11.2.
          Optional  input.
     directory
          The name of the directory where the cost  file is located.

-------
          Optional input, default = working directory.
objective
     name
          The name of the objective block that is used in sensor placement.
          Required input.
     goal
          The objective of the optimization process that defines what is going to minimized. The options
          are the name of the impact block, the name of the cost block, the number of sensors (NS) or the
          number of failed detections (NFD).
          Required input.
     statistic
          The objective  statistic.   The options are MEAN, MEDIAN, VAR, TCE,  CVAR, TOTAL or
          WORST. For example, MEAN will minimize the mean impacts over all  of the contamination
          scenarios, while WORST  will only minimize the worst impacts from the ensemble of contamina-
          tion scenarios.
          Required input.
     gamma
          The value of gamma that specifies the fraction of the distribution of impacts that will be used to
          compute the VAR, CVAR and TCE statistics.  Gamma is assumed to be in the interval(0,1], It
          can  be interpreted as specifying the 100(*)gamma percent of the worst contamination incidents
          that are used for these calculations.
          Required input for VAR or CVAR objective statistics, default =  0.05.
constraint
     name
          The name of the constraint block that is used in sensor placement.
          Required input.
     goal
          The constraint goal. The options are the name of the impact block name, the name  of the cost
          block, the number of sensors (NS) or the number of failed detections (NFD).
          Required input.
     statistic
          The constraint statistic.  The  options  are MEAN, MEDIAN, VAR, TCE,  CVAR, TOTAL or
          WORST.
          Required input.
     gamma
          The value of gamma that specifies the fraction of the distribution of impacts that will be used to
          compute the VAR, CVAR and TCE statistics.  Gamma is assumed to be in the interval(0,1], It
          can  be interpreted as specifying the 100(*)gamma percent of the worst contamination incidents
          that are used for these calculations.
          Required input for VAR or CVAR objective statistics, default =  0.05.
     bound
          The upper bound on the constraint.
          Optional input.
     scenario
aggregate
     name

                                                33

-------
          The name of the aggregation block that is used in sensor placement.
          Optional input.
     type
          The type of aggregation used to reduce the size  of the sensor placement problem.  The options
          are THRESHOLD, PERCENT or RATIO.
          THRESHOLD is used to  aggregate similar impacts by specifying a goal and a value. This is
          used to reduce the total size of the sensor placement formulation (for large problems).  Solutions
          generated with non-zero thresholds are not  guaranteed to be globally optimal.
          PERCENT is an alternative method to compute the aggregation threshold in which the value (of
          the goal-value pair) is a double between 0.0 and  1.0. Over all contamination incidents, compute
          the maximum difference, d, between the impact of the contamination incident if it is not detected
          and the impact if it is detected at the earliest possible  feasible location and  set the aggregation
          threshold to d (*) aggregation percent. If both THRESHOLD and PERCENT are set to valid
          values, then PERCENT takes priority.
          RATIO is also specified with a goal-value pair in  which value is a double between 0.0 and  1.0.
          Optional input.
     goal
          The aggregation goal for the aggregation type.
          Optional input.
     value
          The aggregation value for the aggregation type. If the aggregation type is PERCENT or RATIO,
          then this value is a double between 0.0 and 1.0.
          Optional input.
     conserve memory
          The maximum number of impact files that should be read into memory at any one time. This
          option allows impact files to be processed in a memory conserving mode if location aggregation
          is chosen and the original  impact files are very large. For  example, a conserve memory value of
          10000 requests that while  original impact files are being processed into smaller aggregated files,
          no more than 10000 impacts should be read into  memory  at any one time.
          Optional input, default = zero to turn off this option.
     distinguish detection
          A  goal for  which aggregation should not allow incidents to become  trivial.  If the aggregation
          threshold is so large that all locations, including  the dummy, would form a single superlocation,
          this forces the dummy to be in a superlocation by itself. Thus, the sensor placement will  distin-
          guish between detecting and not detecting. This option can be listed multiple times,  to specify
          multiple goals.
          Optional input, default = 0.
     disable aggregation
          Disable aggregation for this goal, even at value zero,  which would incur no error. Each witness
          incident will be in a separate superlocation. This option can be listed multiple times,  to specify
          multiple goals. ALL can be used to specify all goals.
          Optional input, default = 0.
imperfect
     name
          The name of the imperfect block that is used  in sensor placement.
          Optional input.


                                                34

-------
     sensor class file
          The name of the imperfect sensor class file that defines the detection probabilities for all sensor
          categories. It is used with the imperfect-sensor model and must be specified in conjunction with a
          imperfect junction class file. The imperfect sensor class file format is documented in File Formats
          Section 11.6.
          Optional input.
     junction class file
          The name of the imperfect junction class file that defines a sensor category for each network node.
          It is used with the imperfect-sensor model and must be specified in conjunction with a imperfect
          sensor class file. The imperfect junction class file format  is documented in File Formats Section
          11.5.
          Optional input.
     directory
          The name of the directory where the imperfect junction class and sensor class files are located.
          Optional input.
sensor placement
     type
          The sensor placement problem type.  The command wst sp  -help-problems provides a list of
          problem types for sensor placement.  For example, average-case perfect-sensor is the standard
          problem type for sensor placement, since it uses the mean statistic, zero constraints, single objec-
          tive and perfect sensors.
          Required option, default = average-case perfect-sensor.
     modeling language
          The modeling language to generate the sensor placement  optimization problem.  The options are
          NONE, PYOMO or AMPL.
          Required input, default = NONE.
     objective
          The name of the objective block used in sensor placement.
          Required input.
     constraint
          The name of the constraint block used in sensor placement.
          Required input.
     imperfect
          The name of the imperfect block used in sensor placement.
          Optional input.
     aggregate
          The name of the aggregate block used in sensor placement.
          Optional input.
     compute bound
          A flag to indicate if bounds should be computed on the sensor  placement solution. The options
          are true or false.
          Optional input, default = false.
     presolve
          A flag to indicate if the sensor placement  problem should be presolved.  The options are true or
          false.
          Optional input, default = true.


                                                35

-------
     compute greedy ranking
          A flag to indicate if a greedy ranking of the sensor locations should be calculated.  The options
          are true or false.
          Optional input, default = false.
     location
          feasible nodes
              A list that defines nodes that can be  considered for  the sensor placement problem.  The
              options are:  (1) ALL, which specifies all nodes as feasible sensor locations; (2)  NZD, which
              specifies all non-zero demand nodes as  feasible sensor  locations; (3) NONE, which specifies
              no nodes as feasible sensor locations; (4) a list of EPANET node IDs, which identifies specific
              nodes as feasible sensor locations; or (5) a filename, which is a space or comma separated file
              containing a list of specific nodes as feasible sensor locations.
              Required input, default = ALL.
          infeasible nodes
              A list that defines nodes that cannot be considered for the  sensor placement problem.  The
              options are:  (1) ALL, which specifies all nodes as infeasible sensor locations; (2)  NZD, which
              specifies non-zero  demand nodes as infeasible sensor locations; (3) NONE, which specifies no
              nodes as infeasible sensor locations; (4)  a list of EPANET node IDs, which identifies specific
              nodes as infeasible sensor locations; or  (5)  a filename,  which is a space or comma separated
              file containing a list of specific nodes as infeasible sensor locations.
              Optional input, default = NONE.
          fixed nodes
              A list that defines nodes that are already sensor locations. The options are: (1) ALL, which
              specifies all nodes as fixed sensor locations; (2) NZD, which specifies non-zero  demand nodes
              as fixed sensor locations; (3)  NONE, which specifies no nodes as fixed sensor locations;  (4) a
              list of EPANET node IDs, which identifies specific nodes as fixed sensor locations; or  (5) a
              filename, which is a space or comma separated file containing a list of specific nodes as  fixed
              sensor locations.
              Optional input, default = NONE.
          unfixed nodes
              A list that defines nodes that are unfixed sensor locations. The options are: (1) ALL, which
              specifies all  nodes as unfixed sensor locations;  (2) NZD, which specifies non-zero demand
              nodes as unfixed  sensor locations;  (3)  NONE,  which  specifies no nodes as unfixed sensor
              locations; (4) a list of EPANET node IDs, which identifies  specific nodes as  unfixed sensor
              locations; or (5) a filename, which is a space or comma separated file containing a list of
              specific nodes as unfixed sensor locations.
              Optional input, default = NONE.
solver
     type
          The solver type.
          Required input.
     options
          A list of options  associated with a specific solver type.
          Optional input.
     logfile
          The name of a file to output the results of the solver.
          Optional input.


                                                36

-------
     verbose
          The solver verbosity level.
          Optional input, default = 0 (lowest level).
     initial points
          nodes
              A list of node locations (EPANET IDs) to begin the optimization process.
              Optional input.
          pipes
              A list of pipe locations (EPANET IDs) to begin the optimization process.
              Optional input.
configure
     output prefix
          The prefix used for all output files.
          Required input.
     debug
          The debugging level that indicates the amount of debugging information to output to the screen.
          Optional input, default = 0 (lowest level).

5.3.3  Subcommand Output

The sp subcommand outputs the sensor placement results in a variety of methods. Some of the solvers used
for sensor placement create temporary output files.  The main sensor placement results are summarized in a
_sp.json output file. This file includes the sensor locations (listed as EPANET node ID and
as an integer node ID which is  used in the sensor placement formulation), and objective value, the run date,
and CPU time. Additionally, the lower and upper bound on the objective is reported (for some solvers) and
the problem type, solver type and modeling language is recorded.

The  sp subcommand also outputs information to  the screen, some of which is repeated in  the JSON file.
Screen output  includes the following data:

   • Objective:  The impact metric value achieved with the sensor network design.

   • Lower bound: The lower bound on the impact metric with the sensor network design.

   • Upper bound:  The upper bound on the impact metric with the sensor network design.

   • Solutions: The internal node indices used by sp for the sensor network design.

   • Locations:  The EPANET node labels for the sensor placement locations.

   • Sensor placement ID: An integer ID used to distinguish the sensor network design.

   • Number of sensors: The number of sensors in the sensor network design.

   • Total cost: The cost of the sensor network design, which could be nonzero if cost data is provided.

   • Sensor node IDs: The internal node indices used by sp  for the sensor network design. The same as
     Solutions.

   • Sensor junctions:  The EPANET node labels for  the  sensor placement locations.  The same  as
     Locations.

   • Impact  file: The name of the impact file  used in the sensor network design

                                               37

-------
   • Number of events:  The number of contamination scenarios that were simulated.

The  performance of the sensor network design is summarized for each impact data file specified in the
configuration file. The impact statistics included are:

   • Min  impact:  The minimum impact over  all contamination incidents simulated.  Assuming that a
     sensor protects the node at which it is placed, this statistic will typically be zero.

   • Mean impact: The mean (or average) impact over all contamination incidents simulated.

   • Lower quartile impact: 25% of the contamination incidents, weighted by their likelihood, have an
     impact value less than this quartile.

   • Median impact:  50% of the contamination incidents simulated, weighted by their likelihood, have
     an impact value less than this quartile.

   • Upper quartileimpact:  75% of the contamination incidents simulated, weighted by their likelihood,
     have an impact value  less than this quartile.

   • Value at Risk (VaR): VaR is the minimum value for which 100 * (1 — /3)% of the contamination
     incidents simulated have a smaller impact, in which (3 is a user-defined percentage between 0.0 < (3 <
     1.0.

   • TCE: The tailed-conditioned expectation (TCE) is the mean value of the impacts that are greater
     than or equal to VaR.

   • Worst impact: The  worst impact over all  contamination incidents simulated.

If the [compute  greedy ranking] option is used, a greedy sensor placement is also included in the screen
output.  This places  sensors one-at-a-time at the locations in the optimal sensor network design by consecu-
tively minimizing the mean impact of placing each sensor. This analysis gives a sense of the relative priorities
for these sensors.  The greedy ranking is listed in terms of the sensor node IDs used by the sp subcommand
and not the EPANET node IDs.
                                                38

-------
# sp configuration template
impact data:

    name:  impact1
    impact file: Net3_mc.impact
    nodemap file: Net3.nodemap
    weight file: null
    directory:  null
cost:

    name:  null
    cost file:  null
    directory:  null
objective:

    name:  obj1
    goal:  impact1
    statistic:  MEAN
    gamma: 0.05
constraint:

    name:  const 1
    goal:  NS
    statistic:  TOTAL
    gamma: 0.05
    bound: 5.0
    scenario: []
aggregate:

    name:  null
    type:  null
    goal:  null
    value: null
    conserve memory: 0
    distinguish detection:  0
    disable aggregation:  [0]
imperfect:

    name:  null
    sensor class file: null
    junction class file:  null
    directory:  null
sensor placement:

    type:  default
    modeling language: NONE
    objective:  objl
    constraint:  constl
    imperfect:  null
    aggregate:  null
    compute bound: false
    presolve: true
    compute greedy ranking: false

    location:

        feasible nodes: NZD
        infeasible nodes:  NONE
        fixed nodes: NONE
        unfixed nodes: NONE
solver:
  type: snl_grasp
  options:
  logfile: null
  verbose: 0
  initial points: []
configure:
  output prefix: Net3
# Impact block name
# Impact file name
# Nodemap file name
# Weight file name
# Impact data directory
# Cost block name
# Cost file name
# Cost data directory
# Objective block name
# Optimization objective
# Objective statistic
# Gamma, required with statistics VAR or CVAR
# Constraint block name
# Constraint goal
# Constraint statistic
# Gamma, required with statistics VAR or CVAR
# Constraint upper bound
# Aggregation block name
# Aggregation type: THRESHOLD,  PERCENT or RATIO
# Aggregation goal
# Aggregation value
# Aggregation conserve memory
# Detection goal
# Aggregation disable aggregation
# Imperfect block name
# Imperfect sensor class file
# Imperfect junction class file
# Imperfect file directory
# Sensor placement problem type
# Modeling language:  NONE, PYOMO or AMPL,  default = NONE
# Objective block name used in sensor placement
# Constraint block name used in sensor placement
# Imperfect block name used in sensor placement
# Aggregate block name used in sensor placement
# Compute bounds: true or false, default = false
# Presolve problem: true or false,  default = true
# Compute greedy ranking of sensor locations, default =
#   false
# Feasible sensor nodes
# Infeasible sensor nodes
# Fixed sensor nodes
# Unfixed sensor nodes

# Solver type
# A dictionary of solver options
# Redirect solver output to a logfile
# Solver verbosity level
# Output file prg
                            Figure 5.2: The sp configuration template file.

-------
5.4   Sensor Placement Examples

The following examples illustrate common ways that the sp subcommand can be used. Additional examples
using the sp subcommand are provided in Chapter 10.

5.4.1  Example 1

The first example uses the configuration file,  sp_exl.yml, shown in Figure 5.3. It specifies the impact file
as Net3_ec.impact created by the sim2Impact subcommand using the extent of contamination (EC) impact
metric and EPANET Example Network 3. The objective is to minimize the mean EC over all contamination
incidents simulated while limiting the number of sensors  (NS) to no more than 5. The solver is the GLPK
mixed-integer programming (MIP) solver, which finds globally optimal sensor placements. In addition, the
greedy ranking  option is used.
   impact data:
   - name:  impact 1
    impact file: Net3_ec.impact
    nodemap file: NetS.nodemap
    directory: Net3
   objective:
   - name:  obj1
    goal:  impact 1
    statistic: MEAN
   constraint:
   - name:  constl
    goal:  NS
    statistic: TOTAL
    bound:  5.0
   sensor placement:
    type:  default
    objective: objl
    constraint: constl
    presolve: True
    compute greedy ranking: True
   solver:
    type:  glpk
    options:  {}
    logfile:  null
    verbose:  0
   configure:
    output prefix: sp_exl/Net3
    debug:  0
                          Figure 5.3: The sp configuration file for example 1.

The sp subcommand is executed using the following command line:
   ¥st sp sp_exl.yml
The sp subcommand for example 1 generates the JSON output file, sp_exl.json,  which summarizes the
sensor placement results (Figure 5.4). The sensor network design places sensors at nodes 113, 121, 141, 163
and 209 to achieve an objective value of approximately 8655 of pipe feet contaminanted.
The screen output for the first sp subcommand example is shown in Figure 5.5. It displays the same sensor
network design and impact value as in the JSON file.  It also includes  the greedy ranking of the sensor
network design, in which a sensor at node 163 would provide the greatest reduction in the impact followed
by sensors at nodes 209, 113, 141 and 121.
                                                 40

-------
"solver type":  "glpk" ,
"run date":  "2013-08-22  14:37:40"
"problem type":  "default",
"lower bound" :  8655.806351,
"CPU time":  11.658955812454224,
"EPANET node ID":  [
  [
    "113",
    "121",
    "141",
    "163",
    "209"
"modeling language":  "pyomo"
"upper bound" :  8655.806351,
"objective": 8655.81,
"node ID" :  [
  [
    16,
    21,
    28,
    38,
    65
                         Figure 5.4:  The JSON output file for example 1.
                                                41

-------
WARNING: found CBC version (2.5.0.0)
WARNING: consider upgrading CBC
Gamma = 0.05
                             < 2.7;  ASL support disabled.
Optimization Results
Objective:   8655.81
Lower Bound: 8655.806351
Upper Bound: 8655.806351

Solutions:  [[16, 21, 28, 38, 65]]
Locations:  [['113', '121', '141',  '163',  '209']]
Sensor placement id:
Number of sensors:
Total cost:
Sensor node IDs:
Sensor junctions:

Impact File:
Number of event s:
Min impact:
Mean impact:
Lower quartile impact:
Median impact:
Upper quartile impact:
Value at Risk (VaR) (
TCE                 (
Max impact:
                    51996
                    5
                    0
                    16 21 28 38 65
                    113 121 141 163 209

                    Net3/Net3_ec.impact
                    236
                    0.0000
                    8655.8064
                    0.0000
                    7110.0000
                    12444.0000
               57,):  27269.0000
                    29853.9750
                    36740.0000
Greedy ordering of sensors:  Net3/Net3_ec.impact
-1
38
65
16
28
21
47126.3322
23998.0814
16138.4225
11534.0903
9821.7386
8655.8064
                           Figure 5.5: The sp screen output for example 1.

-------
5.4.2   Example 2

The sp subcommand can also be configured to evaluate a sensor network design on a variety of different
objectives. The second example uses the configuration file, sp_ex2.yml, shown in Figure 5.6. Several impact
files, Net3_ec.impact and Net3_mc.impact, are defined in the impact block of the WST configuration file.
The objective of the sensor placement optimization is to minimize the mean EC  impact metric. The optimal
sensor network design is then evaluated against the mass consumed (MC) impact metric.
   impact data:
   - name: ec
     impact file:  Net3_ec.impact
     nodemap file: NetS.nodemap
     directory:  Net3
   - name: me
     impact file:  Net3_mc.impact
     nodemap file: NetS.nodemap
     directory:  Net3
   objective:
   - name: obj1
     goal: ec
     statistic:  MEAN
   constraint:
   - name: constl
     goal: NS
     statistic:  TOTAL
     bound:  5.0
   sensor placement:
     type: default
     objective:  objl
     constraint: constl
     presolve: True
     compute greedy ranking:  True
   solver:
     type: glpk
     options: {}
     logfile: null
     verbose: 0
   configure:
     output prefix: sp_ex2/Net3
     debug:  0
                           Figure 5.6:  The sp configuration file for example 2.

The sp subcommand is executed using  the following command line:
   ¥st  sp sp_ex2.yml
Each of these impact files are used when evaluating the sensor placement, and a greedy sensor placement is
generated for each (Figure 5.7). The sensor network design optimized for EC results in a mean MC impact
of 56320.  The  greedy ranking  of the sensors is different for the two different impact metrics.  A sensor at
node 209 would be the first sensor placed for the MC impact metric compared to a sensor at node 163 for
the EC impact metric.
                                                  43

-------
WARNING: found CBC version
WARNING: consider upgrading
Gamma = 0.05
Optimization Results
Objective: 8655.81
Lower Bound: 8655.806351
Upper Bound: 8655.806351
Solutions: [[16, 21, 28, 38
Locations: [['113', '121',
Sensor placement id:
Number of sensors :
Total cost:
Sensor node IDs :
Sensor junctions:
Impact File :
Number of event s :
Min impact :
Mean impact :
Lower quartile impact:
Median impact :
Upper quartile impact:
Value at Risk (VaR) ( 57.) :
TCE ( 57.) :
Max impact :
Impact File :
Number of event s :
Min impact :
Mean impact :
Lower quartile impact:
Median impact :
Upper quartile impact:
Value at Risk (VaR) ( 57.) :
TCE ( 57.) :
Max impact :

Greedy ordering of sensors :
-1 47126.3322
38 23998.0814
65 16138.4225
16 11534.0903
28 9821.7386
21 8655.8064
Greedy ordering of sensors :
-1 136858.7347
65 71509.1322
28 56685.0096
16 56409.8878
38 56329.1362
21 56320.3850
(2.5.0.0) < 2.7; ASL support disabled.
CBC





, 65]]
'141', '163', '209']]
52211
5
0
16 21 28 38 65
113 121 141 163 209
Net3/Net3_ec . impact
236
0.0000
8655.8064
0.0000
7110.0000
12444.0000
27269.0000
29853.9750
36740 . 0000
Net3/Net3_mc . impact
236
0.0000
56320.3850
307.6690
4200 . 0900
137021.0000
143999.0000
143999.0000
143999.0000

Net3/Net3_ec . impact






Net3/Net3_mc . impact






Figure 5.7: The sp screen output for example 2.
                     44

-------
5.4.3   Example 3

The third example illustrates the use of a heuristic solver for sensor placement.  A greedy randomized
adaptive sampling process (GRASP) heuristic iteratively applies local search to adaptively sample locations.
Two GRASP heuristics are provided in WST. The AT&T GRASP solver is based on the AT&T popstar
software and it can be used for research purposes.  The SNL GRASP solver  is a new implementation of the
GRASP algorithm. The configuration file, sp_ex3.yml, for example 3 is shown in Figure 5.8.  The sensor
placement parameters are the same as in example 1, besides the SNL GRASP solver is used instead of GLPK.
   impact data:
   - name:  impact 1
    impact file: Net3_ec.impact
    nodemap file: NetS.nodemap
    directory: Net3
   objective:
   - name:  obj1
    goal:  impact 1
    statistic: MEAN
   constraint:
   - name:  constl
    goal:  NS
    statistic: TOTAL
    bound:  5.0
   sensor placement:
    type:  default
    objective: objl
    constraint: constl
    presolve: True
    compute greedy ranking: True
   solver:
    type:  snl_grasp
    options: {}
    logfile: null
    verbose: 0
   configure:
    output prefix: sp_ex3/Net3
    debug:  0
                          Figure 5.8: The sp configuration file for example 3.

The sp subcommand is executed using the following command line:
   ¥st sp sp_ex3.yml
The screen output for example 3 is shown in Figure 5.9. The SNL GRASP heuristic solver finds the same
solution found by the GLPK solver in example 1.  In addition, the GRASP solver found another solution
during search with the same performance. Since the GRASP solvers are heuristics, they are not guaranteed
to find sensor placements with the globally optimal value.  However, these solvers have proven capable of
finding optimal or near-optimal solutions even for large sensor placement problems. While the GRASP solver
in example 1 provided an upper and lower bound on the value of the solution, the GRASP solver does not
generate these bounds since it is a heuristic.
                                                 45

-------
WARNING: found CBC version (2.5.0.0)
WARNING: consider upgrading CBC
Gamma = 0.05
                             < 2.7;  ASL support disabled.
Optimization Results
Objective:   8655.81
Upper Bound: 8655.81

Solutions:  [[16, 19, 28, 47, 66],  [16,  21,  28,  38,  65]]
Locations:  [['113', '119', '141',  '181',  '211'],  ['113',  '121',  '141'
                                                               '163',  '209']]
Sensor placement id:
Number of sensors:
Total cost:
Sensor node IDs:
Sensor junctions:

Impact File:
Number of event s:
Min impact:
Mean impact:
Lower quartile impact:
Median impact:
Upper quartile impact:
Value at Risk (VaR) (
TCE                 (
Max impact:
                    52011
                    5
                    0
                    16 21 28 38 65
                    113 121 141 163 209

                    Net3/Net3_ec.impact
                    236
                    0.0000
                    8655.8064
                    0.0000
                    7110.0000
                    12444.0000
               57,):  27269.0000
                    29853.9750
                    36740.0000
Greedy ordering of sensors:  Net3/Net3_ec.impact
-1
38
65
16
28
21
47126.3322
23998.0814
16138.4225
11534.0903
9821.7386
8655.8064
                           Figure 5.9: The sp screen output for example 3.
                                                   46

-------
Chapter 6

Hydrant  Flushing Locations
A common operational approach that water utilities use to address water quality concerns is flushing, which
is the purging of water from the distribution network via a fire hydrant or blow-off port.  Many utilities flush
water mains following maintenance work or in response to customer complaints. Flushing can remove the
sources of poor water quality (e.g., pipe corrosion, bio-film microorganisms), as well as loose or suspended
material that has accumulated in low-flow portions or dead-ends of the distribution system. It is a response
option that can be undertaken relatively quickly after an contamination incident, and it can be made more
efficient through the careful selection of where to implement flushing  activities. This chapter describes the
flushing subcommand in  WST that  assists in the identification of effective hydrant locations to flush in
order to remove contaminated water and the valves to close in order to direct the contaminated water towards
the hydrants.

A flowchart representation of the flushing subcommand is shown in Figure 6.1. The flushing subcommand
employs an iterative process that combines contaminant transport, impact assessment and optimization. The
optimization process identifies a set of flushing activities that are simulated in the contaminant transport
process and evaluated based upon the impact assessment process. Since the flushing  subcommand relies
on the  tevasim and sim2Impact subcommands, their required input is also  required for the flushing
subcommand. In addition,  the sensor network design used to detect the contamination  incident(s) and the
flushing characterstics are  required inputs.  The  utlity network model is defined by a  EPANET INP file,
while the rest of the input can be specified in  the flushing WST configuration file.
                                               47

-------
         Utility Network
           Model
Simulation
  Input
                 Contaminant
                   Transport
           Consequences
              Input
                  Threat Ensemble
                   Database
             Impact
            Assessment
   Sensor
Placements,
  Flushing
Characteristics
                                          Impact File
                                       Flushing
                                     Optimization
                                                                     Flushing
                                                                     Locations
                          Figure 6.1: Flushing response simulation flowchart.


6.1    Flushing Formulation

The  flushing problem formulation can be summarized as selecting a  set of hydrant locations to flush and
valves  to close that minimizes the average impact of all contamination incidents given a set  of potential
hydrant locations to flush and valves to close. The mathematical formulation can be written as follows:
                           minimize — ^ da,max
                     subject to     2^ yh < Hmax
                                    heH
                                    ]T
                                    vev
                                                                        Vh&H
                                                                        Vv e v
                                                                  (6.1)

                                                                  (6.2)

                                                                  (6.3)

                                                                  (6.4)
                                                                  (6.5)
where A represents a set of contamination incidents, damax defines the maximum impact of the contamination
incident a, H represents the set of potential hydrant locations, yh is a binary variable which is 1 if node h
is selected as a flushing location, Hmax is the  maximum number of hydrant locations, V represents the set
of potential valve locations, yv is a binary variable which is 1 if node v is selected as a valving location, and
Vmax is the maximum number of valve locations. The maximum impact of a contamination incident, da^max.f
is the total impact  across the entire network  at the end of the simulation assuming that the contaminant
was not detected by a sensor, and no interventions to reduce impacts were implemented. This value is found
in the -1 entry of the impact file.
For this problem, hydrants are assumed to be located at any user-defined nodes in the network. In addition,
valves are assumed  to be located on any user-defined pipes in the network.
                                                 48

-------
6.2   Flushing Solvers

The flushing problem is solved through an iterative optimization process which selects different sets of hydrant
and valve locations and evaluates their effectiveness in minimizing the impact of a set  of contamination
incidents. Two optimization methods, an evolutionary algorithm and a network solver, are available in WST
to solve this problem. Each solver is explained in more detail in the following subsections.

6.2.1   Evolutionary Algorithm

The evolutionary algorithm (EA) included with DAKOTA, Coliny EA, is used in the optimization routine for
the flushing subcommand.  Additional information on DAKOTA/Coliny solvers can be found at http://
dakota. sandia.gov/docs/dakota/5.2/html-ref/index.html and in the DAKOTA user manual (Adams
et al., 2013).
To design an EA, the parameter space for the optimization problem is first encoded into a string of numbers.
This encoded representation of the  problem is called a genetic string, where each element of the genetic
string represents one parameter.  When the EA is used with the flushing subcommand, the parameter
space is defined by the number of flushing and valve closure locations. Each location is assigned a sequential
integer that represents a feasible location within the EPANET network. The final EA solution is translated
to represent EPANET node/pipe IDs.
The  EA has several solver options that define how the EA evolves.  These options can be set  in the
[solver] [options]  sections  of the flushing WST configuration file and are  specificto the Coliny EA
solver.    The EA  evolves an  initial genetic strings of size [population_size]  that is set  based on
[initialization_type]  using the following steps:

   f. Evaluation:  Evaluate the solution for each genetic string.  This involves function calls to the tevasim
     and sim2Impact subcommands for each string to define the  impact  value.

   2. Breeding: Select two members of the population based on fitness.  The probability of selection is based
     on [fitness_type].

   3. Crossover: Crossover two members based on [crossover_type]  and [crossover_rate] .

   4. Mutation: Mutate the two members based on [mutation_type] and [mutation_rate].

   5. Steps 2-4  are repeated until the entire population has been changed.

   6. Replacement: After a new population is created, the old population is replaced by the current popu-
     lation while  keeping the highest ranked string (elitist  = f replacement option).

Steps 1-6 are repeated until [max_iterations] or  [max_function_evaluations] criteria is met.

6.2.2   Network  Solver

The network solver used in WST is a network-constrained, derivative-free local search optimization algorithm.
It is a discrete analog-to-pattern search. The allowable moves are to adjacent nodes (or pipes), rather than
moves in the continuous space. This approach provides local refinement  of candidate solutions.  The valid
moves include removing a node (or pipe) location and replacing it with one anywhere in the network.  Two
forms of the network solver can be used: with and without initial starting  points. The initial starting points
are node (or pipe) locations in the network in which the algorithm should begin  its  local search. If these
points are not  supplied to the  algorithm, then it reduces to a greedy placement algorithm. Convergence is
met when no remaining moves improve  the solution.
                                                49

-------
6.3   flushing  Subcommand

The flushing subcommand is executed using the following command line:
   ¥st flushing 
where conf igf ile is a WST configuration file in the YAML format.
The --help  option prints information about this subcommand, such as usage, arguments, and a brief de-
scription:
   ¥st flushing —help
6.3.1  Configuration File

The flushing subcommand generates a template configuration file using the following command line:
   ¥st flushing —template 
The flushing template configuration file is shown in Figure 6.2.  Brief descriptions of the options are included
in the template after the # sign.

6.3.2  Configuration Options

Full descriptions of the WST configuration options used by the flushing subcommand are listed below.

network
     epanet file
          The name of the EPANET input (INP) file that defines the water distribution network model.
          Required input.
scenario
     location
          A list that  describes the injection locations  for the contamination scenarios.  The options are:
          (1) ALL, which denotes all nodes (excluding tanks and reservoirs) as contamination injection
          locations; (2) NZD, which  denotes all nodes with non-zero demands as contamination injection
          locations; or (3) an EPANET  node ID,  which identifies the node where contamination is intro-
          duced. This allows easy specification of single or multiple contamination scenarios.
          Required input unless a TSG or TSI file is specified.
     type
          The injection type for the contamination scenarios.  The options are MASS,  CONCEN, FLOW-
          PACED  or SETPOINT. See the EPANET manual for additional information about source types
          (Rossman, 2000).
          Required input unless a TSG or TSI file is specified.
     strength
          The amount of contaminant injected into the  network for the contamination scenarios.  If the type
          option is  MASS, then the units for the strength are in mg/min. If the type option is  CONCEN,
          FLOWPACED or SETPOINT, then units are in mg/L.
          Required input unless a TSG or TSI file is specified.
     species
          The name of the contaminant species injected into the network. This the name of a single species.
          It is required when using EPANET-MSX, since multiple species might be simulated, but only one
          is injected into the network.  For cases  where multiple contaminants are injected, a TSI file is
          needed.


                                                50

-------
          Required input for EPANET-MSX unless a TSG or TSI file is specified.
     start time
          The injection start time that defines when the contaminant injection begins.  The time is given in
          minutes and is measured from the start of the simulation.  For example, a value of 60 represents
          an injection that starts at hour 1 of the simulation.
          Required input unless  a TSG or TSI file is specified.
     end time
          The injection end time that defines when the contaminant injection stops. The time is given in
          minutes and is measured from the start of the simulation. For example, a value of 120 represents
          an injection that ends  at hour 2 of the simulation.
          Required input unless  a TSG or TSI file is specified.
     tsg file
          The name of the TSG scenario file that defines the ensemble of contamination scenarios to be
          simulated. Specifying a TSG file will override the location, type, strength, species, start and end
          times options specified in the WST configuration file.  The TSG file format is documented in File
          Formats Section 11.12.
          Optional input.
     tsi file
          The name of the TSI  scenario file that defines the ensemble of contamination  scenarios to be
          simulated. Specifying a TSI file will override the TSG file, as well as the location, type,  strength,
          species, start and end  time options specified in the WST configuration file.  The TSI file  format
          is documented in File Formats Section 11.13.
          Optional input.
     msx file
          The name of the EPANET-MSX multi-species file that defines the multi-species reactions to be
          simulated using EPANET-MSX.
          Required input for EPANET-MSX.
     msx species
          The name of the MSX species  whose concentration profile will be saved by the  EPANET-MSX
          simulation and used for later calculations.
          Required input for EPANET-MSX.
     merlion
          A flag (true or false) to indicate if the Merlion water quality simulator should be used. If an MSX
          file is provided, EPANET-MSX will be used.
          Required input, default = false.
impact
     erd file
          The name of the ERD database file that contains the contaminant transport simulation results.
          It is created by running the tevasim subcommand. Multiple ERD files (entered as a list)  can be
          combined to generate a single impact file. This can be used to combine simulation results  from
          different types of contaminants, in which the ERD files were  generated from different TSG files.
          Required input.
     metric
          The impact metric used to compute the impact file.  Options include EC, MC, NFD, PD, PE, PK,
          TD, TEC or VC. One impact file is  created for each metric  selected.  These metrics are defined
          in Section 4.1.
          Required input.

                                                51

-------
     tai file
          The name of the TAI file that contains health impact information. The TAI file format is docu-
          mented in File Formats Section 11.11.
          Required input if a public health metric is used (PD, PE or PK).
     response time
          The number of minutes that are needed to respond to the  detection of a  contaminant.  This
          represents the time that  it takes a water utility to  stop the spread of the  contaminant in the
          network and eliminate the consumption of contaminated water. As the response time increases,
          the impact increases because the  contaminant affects the network for a greater length of time.
          Required input, default = 0 minutes.
     detection limit
          The concentration  thresholds needed to perform detection with  a sensor.  There must be one
          threshold  for each  ERD  file. The units of these detection limits depend on the units of the
          contaminant simulated for each ERD file (e.g., number  of cells of a biological agent).  The units
          for detection limit are the same as for the MASS values that are specified in  the TSG file.
          Required input, default = 0.
     detection confidence
          The number of sensors that must detect  an incident before the impacts are calculated.
          Required input, default = 1 sensor.
     msx species
          The name of the MSX  species  tracked  in  the EPANET-MSX simulation.   This parameter is
          required for multi-species contamination  incidents created by  tevasim subcommand.
          Required input for  EPANET-MSX, default = first species listed in the ERD  file
flushing
     detection
          The sensor network design used to detect contamination scenarios. The sensor locations are used
          to compute a detection time for each contamination scenario.  The options are a list of EPANET
          node IDs or a file name which contains a list of EPANET node IDs.
          Required input.
     flush nodes
          feasible nodes
              A list that defines the nodes  in the network that can be flushed. The options are: (1) ALL,
              which specifies all nodes as feasible flushing locations; (2) NZD, which specifies all non-zero
              demand nodes  as feasible flushing locations; (3) NONE, which specifies no nodes as feasible
              flushing locations; (4) a list of EPANET node IDs, which identifies specific nodes as feasible
              flushing locations; or (5) a filename, which is a space or  comma separated file containing a
              list of specific nodes as feasible flushing  locations.
              Required input, default = ALL.
          infeasible nodes
              A list that defines the nodes in the  network that  cannot be  flushed.  The options are:  (1)
              ALL,  which  specifies all nodes as  infeasible flushing locations;  (2) NZD, which specifies all
              non-zero demand nodes as infeasible flushing locations; (3) NONE, which specifies no nodes
              as infeasible flushing locations; (4)  a list  of EPANET node IDs, which identifies specific nodes
              as infeasible  flushing locations; or (5) a filename, which is a  space or comma separated  file
              containing a list of specific nodes as  infeasible flushing locations.
              Optional input, default = NONE.
          max nodes

-------
         The maximum number of node locations that can be flushed simulatenously in the network.
         The value is a nonnegative integer or a list of nonnegative integers. When a list is specified, the
         optimization will be performed for each number in this list. For example, a value of 3 means
         that a maximum of 3 node will be identified as flushing locations during the optimization
         process.
         Required input.
     rate
         The flushing rate for each node location to be flushed. A new demand pattern will be created
         using this rate for the node. The value is a nonnegative integer. For example,  a value of 800
         means that an additional demand of 800 gpm is applied at a  particular node.  This rate is
         applied to  all flushing locations identified in the optimization process.
         Required input.
     response time
         The time in minutes  between the detection of a contamination incident and the start of
         flushing. The value is a nonnegative integer.  For example,  a value of 120 represents a 120
         minutes or a 2 hour delay between the time of detection and the start of flushing.
         Required input.
     duration
         The length of time in minutes that flushing will be simulated at a particular node.  The value
         is a nonnegative integer. For example, a value of 240 means that flushing would be simulated
         at a particular node for 4 hours. This duration is applied to all flushing locations identified
         in the optimization process.
         Required input.
close valves
     feasible pipes
         A list that defines the pipes in the network that can be closed.  The options  are:  (1) ALL,
         which specifies all pipes as feasible pipes to close; (2) DIAM  min max, which specifies all
         pipes with a specific diameter as feasible pipes to close; (3) NONE, which specifies no pipes
         as feasible pipes to close; (4)  a list of EPANET pipe IDs, which identifies  specific pipes as
         feasible pipes to close; or  (5) a filename, which is a space or comma separated  file containing
         a list of specific pipes as feasible pipes to close.
         Required input, default = ALL.
     infeasible pipes
         A list that defines the pipes in the network that cannot be closed.  The options are:  (1) ALL,
         which specifies all pipes as infeasible  pipes to close; (2) DIAM min max, which specifies all
         pipes with a specific diameter as infeasible pipes to close; (3)  NONE, which specifies no pipes
         as infeasible pipes to close; (4) a list of EPANET  pipe IDs, which identifies specific pipes as
         infeasible pipes to close; or (5) a filename, which is a space or comma separated file containing
         a list of specific pipes as infeasible pipes to close.
         Optional input, default = NONE.
     max pipes
         The maximum number of pipes that can be closed simulatenously in the network.  The value
         must be a  nonnegative integer or a list of nonnegative integers. When a list is specified, the
         optimization will be performed for each number in this list. For example, a value of 2 means
         that a maximum of 2 pipes to close will be identified during the optimization process.
         Required input.
     response time
         The time in minutes between the detection of a contamination incident and closing pipes. The

                                            53

-------
              value is a nonnegative integer. For example, a value of 120 would represent a 120 minutes or
              a 2 hour delay between the time of detection and the start of pipe closures.
              Required input.
solver
     type
          The solver  type.
          Required input.
     options
          A list of options associated with a specific solver type.
          Optional input.
     logfile
          The name of a file to output the results of the solver.
          Optional input.
     verbose
          The solver  verbosity level.
          Optional input, default = 0 (lowest level).
     initial points
          nodes
              A list of node locations (EPANET IDs) to begin the optimization process.
              Optional input.
          pipes
              A list of pipe locations (EPANET IDs) to begin the optimization process.
              Optional input.
configure
     output  prefix
          The prefix  used for all output files.
          Required input.
     debug
          The debugging level that indicates the amount of debugging information to output to the screen.
          Optional input, default = 0 (lowest level).
In addition to these standard WST configuration options, the solver block can define specific options for the
solver selected. The solver options should be modified according to the specific optimization problem. If the
options are not set in the solver block, then the default values for these options are used. The two solvers
available in the flushing subcommand each have their own options. The EA solver has numerous options
which can be defined.  Additional on the options available for the EA solver can found in the DAKOTA user
manual (Adams et al., 2013). An example of the EA solver options are listed below.
   solver:
    type: coliny_ea
    options:
      crossover_rate:  0.8
      crossover_type:  uniform
      fitness_type: linear_rank
      initialization_type: unique_random
      max_function_evaluations:  30000
      max_iterations:  1000
      mutation_rate: 1
      mutation_type: offset_uniform
      population_size: 50
                                                 54

-------
      seed:  11011011
The network solver has two options that can be set in the solver block of the configuration file.
   solver:
    type: collny:StateMachlneLS
    options:
      verbosity: 2
      max_fcn_evaluations: 0
6.3.3  Subcommand  Output

The _flushing.json file contains a summary of the flushing subcommand results. This file
includes an optimized set of node locations to flush (EPANET node IDs), an optimized set of pipe locations
to close (EPANET pipe IDs), the final impact metric, the detection time of the contamination incident(s)
based on the sensor network design (in minutes from the start of the simulation), the run date, and CPU
time.
                                                55

-------
# flushing configuration template
network:
  epanet  file: NetS.inp
scenario:
  location: [NZD]
  type: MASS
  strength: 100.0
  species: null
  start time:  0
  end time: 1440
  tsg file: null
  tsi file: null
  msx file: null
  msx species: null
  merlion: false
impact:
  erd file: null
  metric:  [PE]
  tai file: Net3_bio.tai
  response time: 0
  detection limit: [0.0]
  detection confidence: 1
  msx species: null
flushing:
  detection:  [111, 127, 179]
  flush nodes:
    feasible nodes: NZD
    infeasible nodes:  NONE
    max nodes: 2
    rate: 800.0
    response time: 0.0
    duration:  480.0
  close valves:
    feasible pipes: ALL
    infeasible pipes:  NONE
    max pipes: 0
    response time: 0.0
solver:
  type: coliny:StateMachineLS
  options:
  logfile: null
  verbose: 0
  initial points:   []
configure:
  output  prefix: Net3
  debug:  0
# EPANET network file name

# Injection location: ALL, NZD or EPANET ID
# Injection type:  MASS,  CONCEN,  FLOMPACED,  or SETPOINT
# Injection strength [mg/min or mg/L depending on type]
# Injection species, required for EPANET-MSX
# Injection start  time [min]
# Injection end time [min]
# TSG file name, overrides injection parameters above
# TSI file name, overrides TSG file
# Multi-species extension file name
# MSX species to save
# Use Merlion as WQ simulator, true or false

# ERD database file name
# Impact metric
# Health impact file name, required for public health  metrics
# Time [min] needed to respond
# Thresholds needed to perform detection
# Number of sensors for detection
# MSX species used to compute impact

# Sensor locations to detect contamination scenarios

# Feasible flushing nodes
# Infeasible flushing nodes
# Maximum number of nodes to flush
# Flushing rate [gallons/min]
# Time [min] between detection and flushing
# Flushing duration [min]

# Feasible pipes to close
# Infeasible pipes to close
# Maximum number of pipes to close
# Time [min] between detection and closing pipes

# Solver type
# A dictionary of  solver options
# Redirect solver output to a logfile
# Solver verbosity level
# Output file prefix
# Debugging level, default = 0
                        Figure 6.2:  The flushing configuration template file.
                                                   56

-------
6.4   Flushing  Response Examples

To demonstrate the two different solvers available in the flushing subcommand, two examples are presented.
Both examples have the same characteristics in terms of the contamination scenario and flushing parameters.
EPANET Example  Network 3 (NetS.inp) is the network used and the contamination scenario is an hour long
injection at node fOf beginning at hour 3 in the simulation. A maximum of three hydrants can be flushed for
a duration of eight  hours at a rate 800 gal/min.  The impact metric being minimized is population exposed
(PE).

6.4.1  Example 1

The first example uses the EA solver and the configuration file, flushing_exl.yml, is shown in Figure 6.3.

The example can be executed using  the following command line:
   ¥st flushing flushing_exl.yml
The JSON output file, flushing_exf .json, for example 1 is shown in Figure 6.4. The EA selected to flush
nodes f If, 193 and 197 for a PE impact value of 5226. The CPU time was 409f seconds or a little over an
hour.
                                               57

-------
network:
  epanet  file:  Net3/Net3. inp
scenario:
  location:  [101]
  type: MASS
  strength:  1.450000e+010
  species: null
  start time:  180
  end time:  240
  tsg file:  null
  tsi file:  null
  msx file:  null
  msx species:  null
  merlion: false
impact:
  erd file:  null
  metric: [PE]
  tai file:  Net3/Net3_bio.tai
  response time: 0
  detection limit: [0.0]
  detection confidence:  1
  msx species:  null
flushing:
  detection: [111, 127,  179]
  flush nodes:
    feasible nodes: NZD
    infeasible  nodes:  NONE
    max nodes:  3
    rate: 800.0
    response time: 0.0
    duration:  480.0
  close valves:
    feasible pipes: NONE
    infeasible  pipes:  NONE
    max pipes:  0
    response time: 0.0
solver:
  type: coliny_ea
  options:
    crossover_rate: 0.8
    crossover_type: uniform
    fitness_type:  linear_rank
    initialization_type: unique_random
    max_function_evaluations:  2
    max_iterations: 2
    mutation_rate: 1
    mutation_type: offset_uniform
    population_size:  2
    seed: 11011011
  logfile: null
  verbose: 0
configure:
  output  prefix: flushing_exl/Net3
  debug:  0
                      Figure 6.3: The flushing configuration file for example 1.
                                                   58

-------
{
  "run date":  "2013-07-01 09:30:30"
  "CPU time":  4091.9778118133545,
  "detection time": [
    220

  "pipes  to close": [] ,
  "nodes  to flush": [
    "197",
    "111",
    "193"

  "final  metric": 5226.35
                     Figure 6.4:  The flushing JSON output file for example  1.
                                                 59

-------
6.4.2   Example 2

The second  example uses the  network  solver  without  initial  points  and  the configuration file,  flush-
ing_ex2.yml, is shown in Figure 6.5.
   network:
     epanet file: Net3/Net3. inp
   scenario:
     location: [101]
     type: MASS
     strength: 1.450000e+010
     species: null
     start time:  180
     end time: 240
     tsg file: null
     tsi file: null
     msx file: null
     msx species: null
     merlion: false
   impact:
     erd file: null
     metric:  [PE]
     tai file: Net3/Net3_bio.tai
     response time: 0
     detection limit: [0.0]
     detection confidence:  1
     msx species: null
   flushing:
     detection:  [111, 127,  179]
     flush nodes:
       feasible nodes: NZD
       infeasible nodes:  NONE
       max nodes: 3
       rate: 800.0
       response time: 0.0
       duration:  480.0
     close valves:
       feasible pipes: NONE
       infeasible pipes:  NONE
       max pipes: 0
       response time: 0.0
   solver:
     type: coliny:StateMachineLS
     options:
       verbosity: 2
       max_fcn_evaluations:  0
     logfile: null
     verbose: 0
   configure:
     output prefix: flushing_ex2/Net3
     debug: 0
                        Figure 6.5: The flushing configuration file for example 2.

The example can be executed using the following command line:
   ¥st  flushing fIushing_ex2.yml
The JSON output file, flushing_ex2.json, for example 2 is shown in Figure 6.6.  The network solver selected
to flush nodes 101,  103 and  109 for  a PE impact  metric of 4919.   The  CPU  time was  126 seconds or
approximately 2 minutes.
Examining the output files from the two examples shows that the optimization solvers identified different

                                                   60

-------
   {
     "run date": "2013-07-01 09:27:48"
     "CPU time": 126.31300497055054,
     "detection time":  [
       220

     "pipes to close":  [] ,
     "nodes to flush":  [
       "101",
       "103",
       "109"

     "final metric": 4918.76
                       Figure 6.6:  The flushing JSON output file for example 2.


solutions.  As EAs are not guaranteed to find the optimal solution, these results are not unexpected.  In
addition, the CPU times to obtain the solutions are quite different. The EA solution took about 68 minutes
to obtained, while the network solver solution was achieved in approximately 2 minutes.
                                                   61

-------
Chapter 7

Booster  Station  Placement
Disinfection booster stations are commonly used throughout water distribution networks to maintain drinking
water standards.  Disinfectants degrade as  they move through the system  and booster stations,  designed
to inject disinfectant at strategic  locations, help maintain  residual  levels.  Booster stations can also be
placed with water security objectives in mind. WST includes two booster subcommands, booster_msx and
boosterjnip that are designed to place booster stations to minimize the impact of a contamination incident.
These subcommands use different approaches to model the reaction dynamics between a contaminant and
disinfectant.
The booster_msx subcommand uses EPANET-MSX to simulate the reaction dynamics between the con-
taminant and disinfectant. A flowchart representation of the booster_msx subcommand is shown in Figure
7.1.  The booster_msx subcommand employs an iterative process that combines contaminant transport,
impact assessment and optimization. The optimization process identifies a  set of booster station locations
where disinfectant is injected. The contaminant and disinfecant injections and their reaction dynamics  are
simulated in  the contaminant transport process and the effectiveness of the booster injections are evaluated
based upon the impact assessment  process.  Since the boosterjnsx subcommand relies on the tevasim and
sim2Impact  subcommands,  their required input  is also required for  the booster_msx subcommand.  Ad-
ditionally,  the  sensor network design used to detect the contamination incident(s) and the booster station
characterstics are  required inputs.  The utlity network model is defined by  a EPANET  INP file, while  the
rest of the input can be specified in the booster_msx WST configuration file.
The booster_mip  subcommand uses the linear water quality model Merlion and assumes the reaction dynam-
ics between the contaminant and disinfectant can be approximated by  a neutralization  or limiting reagent
reaction model. A flowchart representation of the booster_mip subcommand is shown  in Figure 7.2. The
utlity network  model is defined by a EPANET INP file. Additional input speficied in the booster_mip WST
configuration file are the contamination scenarios, the sensor network design used to detect the contamination
incident(s) and the booster station characterstics.

-------
         Utility Network
           Model
Simulation
  Input
                 Contaminant
                   Transport
           Consequences
              Input
                  Threat Ensemble
                   Database
              Impact
            Assessment
         Sensor
      Placement
       Booster
    Characteristics
                                          Impact File
                                   Booster Placement
                                      Optimization
                                                                      Booster
                                                                     Locations
                     Figure 7.1: Multi-species reaction booster placement flowchart.
             Utility Network
               Model
    Simulation
      Input
   Sensor
Placement
   Booster
Characteristics,
                     Contaminant
                       Transport
                               Booster Placement
                                 Optimization
                                                                  Booster
                                                                 Locations
                             Figure 7.2: MIP booster placement flowchart.
7.1    Booster Placement  Using Multi-species Reaction

The  booster_msx subcommand  uses  optimization  methods  integrated with EPANET-MSX to evaluate
booster placements using multi-species reaction dynamics between the contaminant and disinfectant.  The
booster station placement problem formulation selects a set of booster station locations that minimizes of
the average impact of all contamination incidents given a set of potential booster stations that inject  a
disinfecting agent. The mathematical formulation can be written as follows:

                           minimize  —-  >   d
                      subject to      2_^ yb < Bmax
                                     beB
                                    yb e {o, 1}
                                          Vb&B
                                                 63

-------
where A represents a set of contamination incidents, damax defines the maximum impact of the contamination
incident a, B represents the set of potential booster station locations, y^ is a binary variable which is 1 if node
6 is selected as a booster station location, and Bmax is the maximum number of booster station  locations.
The maximum impact of a contamination incident,  dajmax, is the total impact across the entire network at
the end of the simulation assuming that the contaminant was not detected by a sensor, and no interventions
to reduce impacts were implemented.  This value is found in the -1 entry of the impact file.  For this problem,
it is assumed that booster stations can be located at any user-defined nodes in the network.

7.1.1  Booster MSX Solvers

The multi-species booster station placement problem is solved  through an  iterative optimization process
which selects different sets of booster station locations and evaluates their effectiveness in minimizing the
impact of a set of contamination incidents. The booster_msx subcommand uses the  evolutionary  algorithm
(EA)  included with DAKOTA, Coliny EA.  Additional information on DAKOTA/Coliny solvers can be
found at http://dakota.sandia.gOv/docs/dakota/5.2/html-ref/index.html and in the DAKOTA user
manual (Adams et al., 2013).

To design an EA, the parameter space for the optimization problem is first encoded into a string of numbers.
This encoded representation of the problem is called a genetic  string, where each  element  of the genetic
string represents one parameter. When the EA is used with boosterjnsx the parameter space  is defined
by the number of booster station locations. Each location is assigned a sequential integer that represents a
feasible location within the  EPANET network. The final EA solution  is translated  to represent  EPANET
node IDs.
The EA  has several solver options that define  how the EA evolves.  These options can  be set in the
[solver] [options] sections  of the  booster_msx WST configuration file and are  specific  to the Coliny
EA solver.  The EA  evolves  an initial genetic strings of size  [population_size] that is set  based on
[initialization_type] using the following steps:

   1. Evaluation:  Evaluate the solution for each genetic string.  This  involves function calls to the tevasim
     and sim2Impact subcommands  for each string to define the fitness score.

   2. Breeding:  Select two members of the population based on fitness.  The probability of selection is based
     on  [fitness_type].

   3. Crossover: Crossover two members based on [crossover_type] and [crossover_rate].

   4. Mutation: Mutate the two members based on [mutation_type] and [mutation_rate].

   5. Steps 2-4 are repeated until the entire  population has been changed.

   6. Replacement: After a  new population is created,  the old population  is replaced by the  current popu-
     lation while  keeping the highest ranked string (elitist  = 1 replacement option).

Steps 1-6 are repeated until [max_iterations] or [max_function_evaluations] criteria is  met.

7.1.2  booster_msx Subcommand

The booster_msx subcommand is executed using the following command line:
   ¥st booster_msx 
where conf igf ile is a WST configuration file in the YAML format.
The  --help option prints information about this subcommand, such as usage,  arguments, and a brief de-
scription:
   ¥st booster_msx —help
                                                64

-------
7.1.2.1   Configuration  File

The booster_msx subcommand generates a template configuration file using the following command line:
   ¥st booster_msx —template 
The booster_msx template configuration file is shown in Figure 7.3.
included in the template after the # sign.
                                        Brief descriptions of the options  are
   # booster_msx configuration
   network:
     epanet  file: NetS.inp
   scenario:
     location:  ['101']
     type: MASS
     strength:  100.0
     species: BIO
     start time: 0
     end time:  1440
     tsg file: null
     tsi file: null
     msx file: Net3_bio.msx
     msx species: BIO
     merlion: false
   impact:
     erd file: null
     metric:  [MC]
     tai file: null
     response time: 0
     detection limit: [0.0]
     detection confidence: 1
     msx species: BIO
   booster msx:
     detection:  [111, 127, 179]
     toxin species: BIO
     decon species: CLF
     feasible nodes: ALL
     infeasible nodes: NONE
     max boosters: 2
     type: FLOWPACED
     strength: 4.0
     response time: 0.0
     duration: 600.0
   solver:
     type: coliny_ea
     options:
     logfile: null
     verbose: 0
     initial points:  []
   configure:
     output  prefix: Net3
     debug:  0
template

  # EPANET  network file name

  # Injection  location: ALL, NZD or EPANET ID
  # Injection  type: MASS, CONCEN,  FLOMPACED, or SETPOINT
  # Injection  strength [mg/min or mg/L depending on type]
  # Injection  species, required for EPANET-MSX
  # Injection  start time [min]
  # Injection  end time [min]
  # TSG file name, overrides injection parameters above
  # TSI file name, overrides TSG file
  # Multi-species extension file name
  # MSX species to save
  # Use Merlion as WQ simulator, true or false

  # ERD database file name
  # Impact  metric
  # Health  impact file name, required for public health metrics
  # Time [min] needed to respond
  # Thresholds needed to perform detection
  # Number  of  sensors for detection
  # MSX species used to compute impact

  # Sensor  locations to detect contamination scenarios
  # Toxin species injected in each contaminant scenario
  # Decontaminant injected from booster station
  # Feasible booster nodes
  # Infeasible booster nodes
  # Maximum number of booster stations
  # Booster source type:  FLOWPACED
  # Booster source strength [mg/L]
  # Time [min] between detection and booster injection
  # Time [min] for booster injection

  # Solver  type
  # A dictionary of solver options
  # Redirect solver output to a logfile
  # Solver  verbosity level
  # Output  file prefix
  # Debugging  level, default = 0
                         Figure 7.3: The booster_msx configuration template file.

7.1.2.2   Configuration  Options

Full descriptions of the WST configuration options used by the boosterjnsx subcommand are listed below.
network
      epanet file
           The name of the EPANET input (INP) file that defines the water distribution network model.
                                                    65

-------
          Required input.
scenario
     location
          A list that  describes the injection locations for the contamination scenarios.  The options are:
          (1) ALL, which denotes all nodes (excluding  tanks and reservoirs) as contamination injection
          locations; (2) NZD,  which  denotes all nodes with non-zero demands as contamination injection
          locations; or (3) an  EPANET node ID, which identifies the node where contamination is intro-
          duced.  This allows easy specification of single or multiple contamination scenarios.
          Required input unless a TSG or TSI file is specified.
     type
          The injection type for the contamination scenarios.  The options are MASS, CONCEN, FLOW-
          PACED or SETPOINT. See the EPANET manual for additional information about source types
          (Rossman, 2000).
          Required input unless a TSG or TSI file is specified.
     strength
          The amount of contaminant injected into the network for the contamination scenarios. If the type
          option is MASS, then the units for the strength are in mg/min. If the  type option is CONCEN,
          FLOWPACED or SETPOINT, then units are in mg/L.
          Required input unless a TSG or TSI file is specified.
     species
          The name of the contaminant species injected into the network. This the name of a single species.
          It is required when using EPANET-MSX, since multiple species might be simulated, but only one
          is injected into the network.  For cases where multiple contaminants are injected, a TSI file is
          needed.
          Required input for EPANET-MSX unless a TSG or TSI file is specified.
     start time
          The injection start time that defines when the contaminant injection begins. The time is given in
          minutes and is measured from the start of the simulation.  For example, a value of 60 represents
          an injection that  starts at hour 1 of the simulation.
          Required input unless a TSG or TSI file is specified.
     end time
          The injection end time that defines when the contaminant injection stops. The time is given in
          minutes and is measured from the start of the simulation. For example, a value of 120 represents
          an injection that  ends at hour 2 of the simulation.
          Required input unless a TSG or TSI file is specified.
     tsg file
          The name of the TSG scenario file that defines the ensemble of contamination scenarios to be
          simulated. Specifying a TSG file will override the location, type, strength, species,  start and end
          times options specified in the WST configuration file. The TSG file  format is documented in File
          Formats Section 11.12.
          Optional input.
     tsi file
          The name of the  TSI scenario file that defines the ensemble of contamination scenarios to be
          simulated. Specifying a TSI file will override the TSG file, as well as the location, type, strength,
          species,  start and end time options specified in the WST configuration file. The TSI file  format
          is documented in File Formats Section 11.13.
          Optional input.
                                                66

-------
     msx file
          The name of the EPANET-MSX multi-species file that defines the multi-species reactions to be
          simulated using EPANET-MSX.
          Required input for EPANET-MSX.
     msx species
          The name of the MSX species whose concentration profile will be saved by the EPANET-MSX
          simulation and used for later calculations.
          Required input for EPANET-MSX.
     merlion
          A flag (true or false) to indicate if the Merlion water quality simulator should be used. If an MSX
          file is provided, EPANET-MSX will be used.
          Required input, default = false.
impact
     erd file
          The name of the ERD database file that contains the contaminant transport simulation results.
          It is created by running the tevasim subcommand.  Multiple ERD files (entered as a list) can be
          combined to generate a single impact file. This can be used to combine simulation results from
          different types of contaminants, in which the ERD files were  generated from different TSG files.
          Required input.
     metric
          The impact metric used to compute the impact file. Options include EC, MC, NFD, PD, PE, PK,
          TD, TEC or VC. One impact file is  created for each metric  selected. These metrics are defined
          in Section 4.1.
          Required input.
     tai file
          The name of the TAI file that contains health impact information.  The TAI file format is docu-
          mented in File Formats Section 11.11.
          Required input if a public health metric is used (PD, PE or PK).
     response time
          The number of minutes that are needed to respond to the detection of a contaminant. This
          represents the time  that it  takes a water utility  to stop the spread of the contaminant in  the
          network and eliminate the  consumption of contaminated water. As the response time increases,
          the impact increases because the contaminant affects the network for a greater length of time.
          Required input, default = 0 minutes.
     detection limit
          The concentration thresholds needed to perform detection with a sensor.  There must be  one
          threshold for each ERD file.  The units of these detection limits depend on  the units of  the
          contaminant simulated for  each ERD file (e.g., number of cells of a biological agent).  The units
          for detection limit are the same as for the MASS values that are specified in the TSG file.
          Required input, default = 0.
     detection confidence
          The number of sensors that must detect an incident before the impacts are calculated.
          Required input, default = 1 sensor.
     msx species
          The name of the MSX species  tracked in the EPANET-MSX simulation.  This parameter is
          required for multi-species contamination incidents created by tevasim subcommand.
                                                67

-------
          Required input for EPANET-MSX, default = first species listed in the ERD file
booster msx
     detection
          The sensor network design used to detect contamination scenarios. The sensor locations are used
          to compute a detection time for each contamination scenario in the TSG file.  The options are a
          list of EPANET node IDs or a file name which contains a list of EPANET node IDs.
          Required input.
     toxin species
          The name of the contaminant species that is  injected in each contamination scenario. This is the
          species that interacts with the injected disinfecant and whose impact is going to be minimized.
          Required input.
     decon species
          The name of the decontaminant or disinfectant species that is injected  from the booster stations.
          Required input.
     feasible nodes
          A list that defines nodes that can be considered for the booster station placement problem. The
          options  are:  (1) ALL, which specifies all nodes as feasible  booster station locations;  (2) NZD,
          which specifies all non-zero demand nodes as feasible booster station locations; (3) NONE, which
          specifies no nodes as feasible booster station locations; (4)  a list of EPANET node IDs, which
          identifies specific nodes as feasible booster station locations; or (5) a filename, which is a space
          or comma separated file containing a list of specific nodes as feasible booster station locations.
          Required input, default = ALL.
     infeasible nodes
          A list that defines nodes that cannot be considered for the booster station placement problem.
          The options are:  (1) ALL, which specifies all nodes as infeasible booster station  locations; (2)
          NZD, which specifies non-zero demand nodes as infeasible booster station locations; (3) NONE,
          which specifies no nodes as infeasible booster station locations; (4) a list of EPANET node IDs,
          which identifies specific nodes as infeasible booster station locations; or (5) a filename, which is
          a space  or comma separated  file containing  a list of specific nodes as  infeasible booster  station
          locations.
          Optional input, default = NONE.
     max boosters
          The maximum number of booster stations that can be  placed in the network.  The value must be
          a nonnegative integer or a list of nonnegative integers.  When a list is specified, the optimization
          will be performed for each number in this list.
          Required input.
     type
          The injection type for the disinfectant at the booster stations. The option is FLOWPACED. See
          the EPANET manual for additional information about source types Rossman (2000).
          Required input.
     strength
          The amount of disinfectant injected into the network from the booster stations. For the source
          type FLOWPACED, the strength units are in mg/L.
          Required input.
     response time
          The time in minutes between the detection of a contamination incident and the start of injecting
          disinfectants from the booster stations. The  value is a nonnegative integer. For example,  a value


                                                68

-------
          of 120 represents a 120 minutes or a 2 hour delay between the time of detection and the start of
          booster injections.
          Required input.
     duration
          The length of time in minutes that disinfectant will be injected at the booster stations during the
          simulation. The value is a nonnegative integer. For example, a value of 240 means that a booster
          would simulate injection of disinfectant at a particular node for 4 hours. This duration is applied
          to all booster station locations identified in the optimization process.
          Required input.
solver
     type
          The solver type.
          Required input.
     options
          A list of options associated with a specific solver type.
          Optional input.
     logfile
          The name of a file  to output the results of the solver.
          Optional input.
     verbose
          The solver verbosity level.
          Optional input, default = 0 (lowest level).
     initial points
          nodes
              A list of node locations (EPANET  IDs) to begin the optimization process.
              Optional input.
          pipes
              A list of pipe locations (EPANET IDs) to begin the optimization process.
              Optional input.
configure
     output prefix
          The prefix used for all output files.
          Required input.
     debug
          The debugging level that indicates the amount of debugging information to output to the screen.
          Optional input, default = 0 (lowest level).

In addition  to these standard WST configuration options, the solver block can define  specific options for
the solver selected.  The solver options  should be  modified according to the specific  optimization problem.
If the options are not set in the solver block, then the default values for these options are used. The EA
solver available in the booster_msx subcommand has numerous options which can be defined. Additional
on the options available for the EA solver can found in the DAKOTA user manual (Adams et al., 2013). An
example of the EA solver options are listed below.
   solver:
    type: coliny_ea
    options:
      crossover_rate:  0.8
                                                69

-------
      crossover_type: uniform
      fitness_type:  linear_rank
      initialization_type: unique_random
      max_function_evaluations: 30000
      max_iterations: 1000
      mutation_rate:  1
      mutation_type:  offset_uniform
      population_size: 50
      seed:  11011011
7.1.2.3  Subcommand Output

The _booster_msx.json file contains a summary of the booster_msx subcommand results.
This file includes an optimized list of node locations (EPANET node IDs) to inject the disinfectant, the final
impact metric, the detection time based on the sensor placement locations, the run date, and CPU time.

7.2    Booster Placement Using Neutralization  or Limiting Reagent Reaction

If the  contaminant or the disinfectant are present in the water  distribution system in large excess quantities
to each other, the booster station placement problem can be formulated as a mixed-integer program (MIP).
The booster_mip subcommand uses this MIP to determine optimal node locations of booster stations used
for responding to contamination incidents.
Two separate model formulations are available within the booster_mip subcommand. These are referred to
as the neutralization (NEUTRAL) formulation and the limiting reagent (LIMIT) formulation.  Each  has a
unique set of advantages that can be leveraged depending on the needs of the user.  The NEUTRAL formu-
lation models the idealized situation in which the disinfecting agent (e.g., chlorine)  immediately inactivates
any amount of contaminant it comes into contact with; the amount of disinfectant available for  inactivation
is not considered.  This allows for a highly  compact model  formulation, which  is tractable for application
to both large water distribution systems and large scenario  ensembles. The placements resulting  from the
NEUTRAL formulation represent booster station locations that are optimal when the amount of disinfectant
required to perform inactivation is small and the time required for complete inactivation is fast.
The LIMIT formulation models the case where  the disinfectant is consumed  during the reaction  with the
contaminant. This is more realistic than the NEUTRAL formulation in that the optimal solution  is highly
dependent on the amount of disinfectant injected by the booster stations. However, the model still assumes
that upon mixing,  the reaction quickly continues until either the contaminant or disinfectant is completely
consumed.  The LIMIT formulation has an input parameter,  sigma, which indicates the ratio of disinfectant
to contaminant consumed in the reaction.  This input parameter can be  adjusted to approximate a  more
realistic pairing of  specific disinfectant and contaminant species (e.g., chlorine and E. coli).

The optimization is performed over an ensemble of contamination scenarios. The booster_mip subcommand
uses Merlion to perform water quality  simulations, which are used to generate the necessary data for the
optimization formulation. The amount of time required for simulations can differ depending on the problem
formulation selected by the user (e.g., LIMIT or NEUTRAL).

7.2.1   Neutralization NEUTRAL  Formulation

The NEUTRAL formulation is as follows:
                                                70

-------
     minimize
               aeA   neN teT
Subject tO
                                 « 53 53^n>*>amn>*>a
                                  neN teT

                           Sntt!a > 1 - 53 ybZn,t,a,b
                                      beB

                            53 2/6 - Bmax
                            beB
                           0 < Sn,t,a < 1
                           2/5 e {0,1}
where mn t a = cntadnt
Vn e AT, t e T, a e
                                                    Vn e AT, t e T, a e
                                                    V5e B
where A represents the set of scenarios, N defines the set of network nodes, T represents the set of time steps,
B defines the set of potential booster station locations, aa is the probability of scenario a, mn^,a and cnit,a
are  the mass and concentration, respectively, of contaminant leaving node n at time step t for scenario a,
dnjt is the demand for the corresponding node and time step, and Bmax is the maximum number of booster
stations. The continuous variable, 5n^a, indicates whether the  contaminant is available for consumption at
node n and time step t for scenario a and the binary variable, yt,, is 1 if node b is selected as a booster station
location.  In addition, Zn^,a,b is determined  from the pre-computed booster  station simulations.  These
simulations  determine  the node-time pairs that are neutralized based on the specific contaminant scenario
and feasible booster station locations. The parameter Znj^aj, is equal to  1 only  if a booster station installed
at node 6 neutralizes the contaminant leaving node n and time  step t for scenario a,  otherwise, Zn^,a,b is 0.
Equation 7.4 is the  objective function,  which minimizes the  mass consumed across all nodes, for every
scenario, for every time  step  in the simulation.  Equation  7.5  ensures that 5ntt
             cn,t,a) cn,t,a  —
              con   > n
             ' n,t,a  — u
             yb e {o, 1}
                                        o
                                        u
                              (7.10)
                              (7.11)
                              (7.12)
                              (7.13)

                              (7.14)

                              (7.15)
                              (7.16)
                              (7.17)
                                                                  V6eB

where A represents the set of scenarios, N defines the set of network nodes, T represents the set of time

                                                 71

-------
steps, B defines the set of potential booster locations, aa is the probability of scenario a, dnj is the demand
at node n and time step t, rn
where conf igf ile is a WST configuration file in the YAML format.

-------
The --help option prints information about this subcommand, such as usage, arguments,  and a brief de-
scription:
   ¥st booster_mip  —help
7.2.4.1   Configuration File

The booster_mip subcommand generates a template configuration file using the following command line:
   ¥st booster_mip  —template 
The booster_mip template configuration file  is shown in Figure
included in the template after the # sign.
                                 7.4.  Brief descriptions of the options  are
   # booster_mip configuration template
   network:
     epanet  file:  NetS.inp
   scenario:
     location:  null
     type: null
     strength:  null
     species: null
     start time:  null
     end time:  null
     tsg file:  NetS.tsg
     tsi file:  null
     msx file:  null
     msx species:  null
     merlion: false
   booster mip:
     detection:  [111, 127, 179]
     model type:  NEUTRAL
     model format: PYOMO
     stoichiometric ratio: [0.0]
     objective:  MC
     toxin decay coefficient: 0
     decon decay coefficient: 0
     feasible nodes: ALL
     infeasible nodes: NONE
     max boosters:  [2]
     type: FLOWPACED
     strength:  4.0
     response time: 0.0
     duration:  600.0
     evaluate:  false
   solver:
     type: glpk
     options:
     logfile: null
     verbose: 0
     initial points:  []
   configure:
     output  prefix: Net3
     debug:  0
   boostersim:
   eventDetection:
   boosterimpact:
# EPANET network  file name

# Injection location: ALL, NZD or EPANET ID
# Injection type: MASS, CONCEN, FLOWPACED, or SETPOINT
# Injection strength  [mg/min or mg/L depending on type]
# Injection species, required for EPANET-MSX
# Injection start time  [min]
# Injection end time  [min]
# TSG file name,  overrides injection parameters above
# TSI file name,  overrides TSG file
# Multi-species extension file name
# MSX species  to  save
# Use Merlion  as  WQ simulator, true or false

# Sensor locations to detect contamination scenarios
# Booster model type: NEUTRAL or LIMIT
# Booster optimization model: AMPL or PYOMO
# Stoichiometric  ratio  [decon/toxin], LIMIT model only
# Objective to minimize
# Toxin decay  coeffienct: None, INP or number
# Decontaminant decay coefficient: None, INP or number
# Feasible booster nodes
# Infeasbile booster nodes
# Maximum number  of booster stations
# Booster source  type: MASS or FLOWPACED
# Booster source  strength  [mg/min or mg/L depending on type]
# Time [min] between detection and booster injection
# Time [min] for  booster injection
# Evaluate booster placement: true or false,  default =  false

# Solver type
# A dictionary of solver options
# Redirect solver output to a logfile
# Solver verbosity level
# Output file prefix
# Debugging level, default = 0
                         Figure 7.4: The booster_mip configuration template file.
                                                     73

-------
7.2.4.2  Configuration Options

Full descriptions of the WST configuration options used by the boosterjnip subcommand are listed below.
network
     epanet file
          The name of the EPANET input (INP) file that defines the water distribution network model.
          Required input.
scenario
     location
          A list that describes the injection locations for the contamination scenarios.  The options are:
          (1) ALL, which denotes all nodes (excluding tanks and reservoirs) as  contamination injection
          locations; (2)  NZD, which  denotes all nodes with non-zero demands as  contamination injection
          locations; or (3) an EPANET node ID, which identifies the node where contamination is intro-
          duced.  This allows easy specification of single or multiple contamination scenarios.
          Required input unless a TSG or TSI file is specified.
     type
          The injection  type for the contamination scenarios. The options are MASS, CONCEN, FLOW-
          PACED or SETPOINT. See the EPANET manual for additional information about source types
          (Rossman, 2000).
          Required input unless a TSG or TSI file is specified.
     strength
          The amount of contaminant injected into the network for the contamination scenarios.  If the type
          option is MASS, then the units for the strength are in mg/min. If the type option is  CONCEN,
          FLOWPACED or  SETPOINT, then units are in mg/L.
          Required input unless a TSG or TSI file is specified.
     species
          The name of the contaminant species injected into the network. This the  name of a single species.
          It is required when using EPANET-MSX, since multiple species might be simulated, but only one
          is injected into the network.  For cases where multiple contaminants are  injected, a TSI file is
          needed.
          Required input for EPANET-MSX unless a TSG or TSI file is specified.
     start time
          The injection start time that defines when the contaminant injection begins. The time is given in
          minutes and is measured from the start of the simulation. For example,  a  value of 60 represents
          an injection that starts at hour 1 of the simulation.
          Required input unless a TSG or TSI file is specified.
     end time
          The injection  end  time that defines when the contaminant injection stops. The time is given in
          minutes and is measured from the start of the simulation. For example, a value of 120 represents
          an injection that ends at hour 2 of the simulation.
          Required input unless a TSG or TSI file is specified.
     tsg file
          The name of the TSG scenario file that  defines the ensemble of contamination scenarios to be
          simulated. Specifying a TSG file will override the location,  type, strength,  species, start and end
          times options specified in the WST configuration file.  The TSG file format is documented in File
          Formats Section 11.12.
          Optional input.

                                                74

-------
     tsi file
          The name of the TSI scenario file that defines the ensemble of contamination  scenarios  to be
          simulated. Specifying a TSI file will override the TSG file, as well as the location, type, strength,
          species, start and end time options specified in the WST configuration file.  The TSI file format
          is documented in File Formats Section 11.13.
          Optional input.
     msx file
          The name of the EPANET-MSX multi-species file that defines the multi-species reactions to be
          simulated using  EPANET-MSX.
          Required input for EPANET-MSX.
     msx species
          The name of the MSX species whose concentration profile will be saved by the  EPANET-MSX
          simulation and used for later calculations.
          Required input for EPANET-MSX.
     merlion
          A flag (true or false) to indicate if the Merlion water quality simulator should be used. If an MSX
          file is provided, EPANET-MSX will be used.
          Required input,  default = false.
booster mip
     detection
          The sensor network design used to detect contamination scenarios. The sensor locations are used
          to compute a detection time for each contamination scenario in the TSG file. The options are a
          list of EPANET node IDs or a file name which contains a list of EPANET node IDs.
          Required input.
     model type
          The model type  used to determine optimal booster station locations. Options include NEUTRAL
          (complete neutralization) or LIMIT (limiting reagent).
          Required input,  default = NEUTRAL.
     model format
          The modeling language used to  build the  formulation specified by the model type option.  The
          options are AMPL  and PYOMO. AMPL is a third party package that must be installed by the
          user if this option is specified.  PYOMO is an open source software package that is distributed
          with WST.
          Required input,  default = PYOMO.
     stoichiometric ratio
          The stoichiometric ratio used by the limiting reagent model (LIMIT) which represents the amount
          of disinfectant used for each mass unit of toxin inactivated by  the neutralization reaction. This
          can be a number or a list of numbers greater than 0.0. When a list is specified, the optimization
          will be performed for each  number  in this list.  As the stoichiometric ratio approaches 0, the
          LIMIT model converges to the NEUTRAL model.
          Required input if the model type = LIMIT.
     objective
          The impact metric used to place the booster stations. In the current version, this parameter must
          be set to MC (mass of toxin consumed through the node demands).
          Required input,  default = MC.
     toxin decay coefficient
          The contaminant (toxin) decay coefficient. The options are (1) None, which runs the simulations

                                               75

-------
     without first-order decay, (2) INP, which runs the simulations with first-order decay using the
     coefficient specified in the EPANET INP file or  (3) a number, which runs the simulation with
     first-order decay and the specified first-order decay coefficient in units of (f/min) (overrides the
     decay coefficient in the EPANET  INP file).  If the EPANET time units are not in minutes, the
     value is converted.
     Required input, default = 0.
decon decay coefficient
     The disinfectant (decontaminant)  decay coefficient.  The options are (1) None, which runs the
     simulations without first-order decay, (2) INP, which runs the simulations with first-order decay
     using the coefficient specified in the EPANET INP file or (3) a number, which runs the simulation
     with first-order  decay and the specified first-order decay coefficient in units  of (1/min) (overrides
     the  decay coefficient in the EPANET INP file). If the EPANET time units are not  in minutes,
     the value is converted.
     Required input, default = 0.
feasible  nodes
     A list that defines nodes that can  be considered for the booster station placement problem. The
     options are:  (1) ALL, which specifies all nodes as feasible booster station locations; (2) NZD,
     which specifies all non-zero demand nodes as feasible booster station locations; (3) NONE, which
     specifies no nodes as feasible booster station locations; (4) a list of EPANET node IDs, which
     identifies specific nodes as feasible booster station locations;  or (5) a filename, which is a space
     or comma separated file containing a list of specific nodes as feasible booster station  locations.
     Required input, default = ALL.
infeasible nodes
     A list that defines nodes that  cannot be considered for the booster station placement problem.
     The options are:  (1) ALL, which specifies  all nodes as infeasible booster  station locations; (2)
     NZD,  which specifies non-zero demand nodes as infeasible booster station locations;  (3) NONE,
     which specifies no nodes as infeasible booster station locations; (4) a list of EPANET node IDs,
     which identifies specific nodes  as infeasible booster station locations; or (5) a  filename, which is
     a space or comma separated file containing a list of specific  nodes as infeasible booster  station
     locations.
     Optional input, default = NONE.
max boosters
     The maximum number of booster  stations that can be placed in the network. The value must be
     a nonnegative integer or a list of nonnegative integers.  When a list is specified, the optimization
     will  be performed for each number in this list.
     Required input.
type
     The injection type for the disinfectant at the booster stations.  The options  are MASS or FLOW-
     PACED. See the EPANET manual for additional information about source types Rossman  (2000).
     Required input.
strength
     The amount of disinfectant injected into the network from the booster stations.  If the source
     type option is MASS, then the units for the strength are in mg/min. If the source type option is
     FLOWPACED, then units are in mg/L.
     Required input.
response time
     The time in minutes between the detection of a contamination incident and the start of injecting
     disinfectants from the booster stations. The value is a nonnegative integer.  For example,  a value
                                           76

-------
          of 120 represents a 120 minutes or a 2 hour delay between the time of detection and the start of
          booster injections.
          Required input.
     duration
          The length of time in minutes that disinfectant will be injected at the booster stations during the
          simulation. The value is a nonnegative integer. For example, a value of 240 means that a booster
          would simulate injection of disinfectant at a particular node for 4 hours. This duration is applied
          to all booster station locations identified in the optimization process.
          Required input.
     evaluate
          The option to evaluate the  booster  station placement created from the optimization process.
          Optional input, default = false.
solver
     type
          The solver type.
          Required input.
     options
          A list of options associated with a specific solver type.
          Optional input.
     logfile
          The name of a file  to output the results of the solver.
          Optional input.
     verbose
          The solver verbosity level.
          Optional input, default = 0 (lowest level).
     initial points
          nodes
              A list of node locations (EPANET IDs) to begin the optimization process.
              Optional  input.
          pipes
              A list of pipe locations (EPANET IDs) to begin the optimization process.
              Optional  input.
configure
     output prefix
          The prefix used for all output files.
          Required input.
     debug
          The debugging level that indicates the amount of debugging information to output to the screen.
          Optional input, default = 0 (lowest level).
In addition to these standard  WST configuration options, the solver block can define specific  options for the
solver selected.  The solver options should be modified according to the specific optimization problem. The
boosterjnip subcommand recognizes the following options in the solver  block of the configuration file:
   solver:
    type: glpsol
    options:
                                                77

-------
      mipgap: 0.01
The solver block above shows an example of using the public domain solver GLPK (glpsol) with the LP-file
(Ip) interface available in the modeling language Pyomo. A common option available with MIP solvers is
mipgap, which  is used to balance the quality of the solution found by the solver with the time taken to
obtain the solution.

7.2.4.3  Subcommand Output

The -booster_mip-.json file, where  is an integer starting at 1, contains
a summary of the booster_mip subcommand results. If more than one booster station design is requested
in the  WST configuration file, the  suffix is incrementally increased each time to create multiple
JSON files. This output file includes which node locations to inject disinfectant (EPANET IDs), details on
scenarios that are detected  and not detected, the run date, and CPU time.

7.3    Booster Placement Examples

Two booster station placement examples are provided. The first example determines booster station place-
ment assuming  complete inactivation of the contaminant, and the second example evaluates this placement
in terms of a more realistic reaction dyanamic between the contaminant and the disinfectant. The examples
use the EPANET Example Network 3 INP file, NetS.inp. A  contaminantion scenario ensemble is defined
using  all  NZD nodes and a biological contaminant injection of 5.77e8 CFU/min (colony forming units per
minute),  starting at time 0 and continuing for 6 hours.  Sensors located at nodes 15, 35, 219, and 253 are
used to detect each contamination scenario and initiate the booster response action. Booster stations inject
disinfectant at  4 mg/L for 12 hours after detection, since  no additional  response time is added between
detection and booster station operation.

7.3.1   Example  1

The first example uses the booster_mip subcommand and  the  NEUTRAL approach.  The model  for-
mat is PYOMO and the  solver is  GLPK. These parameter options  are listed in  the configuration  file,
booster_mip_exl.yml, shown in Figure 7.5. The maximum number of booster stations is listed as an array
to indicate that five  booster  station designs should be  created, using  2, 4, 6, 8,  and 10 as the maximum
number of booster stations to  place in the network. This notation uses the generated model files to efficiently
solve for more than one design.  The feasible booster station locations are limited to NZD nodes.
The example  can be executed using the following command line:
   ¥st booster_mip booster_mip_exl.yml
Since five booster station designs were requested, five JSON output files with the results are produced. The
file Net3_booster_mip-3. json contains results for placing six booster stations in the network.  The JSON
files  list the booster station locations under the header booster nodes. The injected mass (mass injected
grams), the mass consumed before booster stations were initiated (pre-booster mass consumed grams), and
the total mass consumed at  the end of the simulation  (mass consumed grams)  are also listed in the JSON
file. This information can be used to evaluate the effectiveness of the booster stations on each contamination
injection scenario.
     "booster nodes":  [
      "101",
      "141",
      "171",
      "215",
      "219",
      "255"
                                                78

-------
   network:
     epanet file: Net3/Net3. inp
   scenario:
     location:  [NZD]
     type: MASS
     strength:  5.77e8
     start time: 0
     end time:  360
   booster mip:
     detection: ['15', '35',  '219', '253']
     model type: NEUTRAL
     model format: PYOMO
     stoichiometric ratio:  0
     objective: MC
     toxin decay coefficient: 0
     decon decay coefficient: 0
     feasible nodes: NZD
     infeasible nodes: NONE
     fixed nodes:  []
     max boosters:  [2,4,6,8,10]
     type: FLOMPACED
     strength:  4
     response time: 0
     duration:  1440
     evaluate:  false
   solver:
     type: glpk
     options:  -Q
     logfile:  null
     verbose:  0
   configure:
     output prefix: booster_mip_exl/Net3
     debug:  0
                     Figure 7.5: The booster_mip configuration file for example 1.


Booster station placement using the LIMIT approach requires a few minor modifications to the WST con-
figuration file above.  The model type option is changed from NEUTRAL to LIMIT and the sigma option
is set to a value greater than 0.  The sigma option represents the stoichiometric ratio, which is a unit less
value that defines the unit mass of disinfectant needed to inactivate a unit mass of contaminant. For exam-
ple, a stoichiometric ratio of 0.01 specifies that 0.01 mg of a disinfectant is needed to inactivate 1 CFU of
contaminant.

7.3.2  Example 2

Since both the NEUTRAL and LIMIT formulations use simplifying assumptions  to model the reaction
dynamics between a contaminant and a disinfectant, it is often useful to evaluate a booster station design from
the MIP methods using a more complex multi-species reaction model through the booster_msx subcommand.
While the booster_msx subcommand coupled with an EPANET-MSX model can be used to optimize booster
station placement, the number  of function evaluations required for convergence often makes this process
infeasible.
The multi-species reaction equations in the Net3_EColi_TSB.msx file describe the inactivation of E. coli by
chlorine and  the reaction of E. coli and chlorine with the nutrient broth (TSB) (Murray et al., 2011). The
contamination scenarios are setup using the TSI file, Net3_EColi_TSB.tsi. This file defines the same E. coli
injection as in the NEUTRAL approach, but includes a TSB injection at all NZD nodes in the  network as
well.  The configuration file, booster_msx_exl.yml, is shown in  Figure  7.6 to evaluate  a  booster station
design from the NEUTRAL approach.

-------
   network:
     epanet file: Net3/Net3. inp
   scenario:
     tsi  file: Net3/Net3_EColi_TSB.tsi
     msx  file: Net3/Net3_EColi_TSB.msx
     msx  species: EColi
   impact:
     erd  file: null
     metric:  [MC]
     tai  file: null
     response time: 0
     detection limit: [0.0]
     detection confidence:  1
     msx  species: EColi
   booster msx:
     detection:  ['15', '35',  '219',  '253']
     toxin species: EColi
     decon species: CL
     feasible nodes:  ['101','141','171','215','219','255']
     infeasible nodes: NONE
     max  boosters: 6
     type: FLOWPACED
     strength: 4.0
     response time: 0.0
     duration: 720
   solver:
     type: EVALUATE
     options: -Q
     verbose: True
   configure:
     output prefix: booster_msx_exl/Net3
     debug: 0
                      Figure 7.6: The booster_msx configuration file for example 2.


The example can be executed using the following command line:
   ¥st  booster_mip booster_msx_exl.yml
This analysis indicates that the booster stations placed with the assumption that the disinfectant com-
pletely inactivates the contaminant underestimates  the mass  consumed given a more realistic disinfectant
and contaminant reaction dynamics as represented in the the E. coli-TSE model.
                                                    80

-------
Chapter 8

Source Inversion
If a contamination incident is detected by a water utility, it will be important to determine the time and
location where the contaminant injection occurred. Once this information is available, the current extent
of contamination within the network can be estimated and appropriate control and cleanup strategies can
be devised in order to protect the population.  The inversion subcommand included in WST is designed
to calculate  a list of possible injection nodes and times given  a set of measurments that could come from
manual grab samples and/or an event detection system (EDS).

Three major challenges  associated with source inversion calculations are addressed by this subcommand.
First, due to the currently available  water  quality sensor technology,  only a discrete yes/no indication of
contamination is available from  these sensors.  Therefore, the source inversion algorithms provided through
this subcommand are designed to work with binary measurements.  Second, measurement information avail-
able  from a  sparse set of fixed sensors might not be sufficient to narrow down the possible contamination
nodes to  a tractable number.  One strategy is to get additional measurements in the form of manual grab
samples from locations  around  the  network to help the source inversion calculations  better identify the
contamination location(s). The  location of these manual grab  samples can be carefully selected to provide
maximum distinguishability between the possible incidents by using the grabsample subcommand (described
in Chapter 9.  The source inversion algorithms in WST can use measurement information  from both fixed
sensors and  manual grab samples.  Finally, source inversion needs to happen in real-time. The provided
algorithms benefit  from  using the EPANET hydraulics model  and  the Merlion water quality model, along
with additional model reductions that help improve computational performance.  A flowchart representation
of the inversion subcommand is shown in Figure 8.1. The utility network model is defined  by an EPANET
compatible network models (INP format) in WST. The sensor/EDS measurements are  supplied through a
measurements  file (See File Formats  Section 11.7).  Additional details on the source inversion approach to
identify the contaminant injection location(s) is supplied by the user in the WST configuration file.
                                 Utility Network  / /  ..
                                    -it j i     //  Measurements
                                    Model
                                         Source Inversion
                                         Likely Injection
                                            Scenarios
                        Figure 8.1: Contamination source inversion flowchart.
                                                81

-------
8.1    Source Inversion Formulations

The  source inversion module contains two different source inversion formulations, one is a Mixed Integer
Programming (MIP) formulation while the other is based on Bayesian probability calculations. The following
subsections provide brief descriptions of these formulations.  For more detailed information, refer to Mann
et al. (2012b).

8.1.1  MIP Formulations

This formulation assumes that a field sensor  (or manual grab sample) would yield a positive measurement
if the contaminant concentration is above a certain positive threshold concentration and a negative mea-
surement if it is below a certain negative threshold concentration. Therefore, if a sensor measurement (or
a manual grab sample) yields a positive measurement, any corresponding calculated concentration from the
water quality model above the positive threshold is deemed to be a perfect fit with this measurement data.
Therefore, while constructing an objective for estimation, only calculated concentrations below this positive
threshold should be penalized. Likewise,  if a sensor (or manual grab sample) yields a negative measurement,
only the corresponding calculated concentration above the negative threshold should be penalized. The base
MIP formulation is presented followed by descriptions of three  additional  variations that perform  source
inversion under different assumptions. The base MIP formulation for discrete measurements can be selected
using the formulation option of MIP_discrete in the inversion block of the inversion WST configuration
file.
minimize
                             y,  ne9n,t +  / _,
                           (n,t)es_         (n,t)es+
               subject to   Gcnj =
                           0 < mnf < Byn

                           /  j yn _; J-max
                                                     neg
                                                     Vn e N, t e T
                                                     VneN,t eT
                                                     yn e {0,1}

                                                     V(n,t) e S_
                                                     V(n,t) e S+.
(8.1)
(8.5)
(8.6)
where N is the set of all nodes, T is the set of all time steps, S_ represents the set containing the node-time
step pairs where the discrete measurement is a negative detection, and S+ defines the set containing the node-
time step pairs where the discrete measurement is a positive detection. The variables G and D are matrices
from the linear water quality model. The variable cn_t is the calculated concentrations from the water quality
model at node n and time step t, m is the vector of unknown time-discretized contaminant injection profile
over all  node and time  steps, and mn^ is an  element  in the rn/j vector representing unknown mass injected
at node n and time step t. A binary variable, yn, indicates  contaminant injection at node n if yn=l, and
B is a reasonable upper bound on  the contaminant injection mass flow rate mnjt.  The variable Imax is the
maximum number of possible injection locations. The user supplies two thresholds, rneg  and rpos, which can
be used as  concentration set-points indicating the presence or absence of contaminant.  The variable negnj
is the non-negative difference between the modeled concentration  cn>t  and the user supplied threshold Tneg
for node-time step pairs belonging to S_.  The variable posn^ is the non-negative difference between the
modeled concentration  cnj and the user supplied threshold rpos for node-time step  pairs belonging to S+.
Equation 8.1 is the  MIP objective, which minimizes the mismatch between the discrete measurements and
their corresponding concentrations calculated from the  model given the detection thresholds. Equation 8.2
is the embedded linear  water quality model (Merlion). Note that the reduced version of this model given by
equation (ref) is used in place of the full model. Equation 8.3  is the big-M constraint that enforces bound on
the maximum mass flow rate of the injections. Equation 8.4 is the maximum number of injections constraint,

-------
while Equation 8.5 and Equation 8.6 are used to enforce negn^ andposn_t as non-negative differences between
modeled concentration and threshold for positive and negative measurements, rpos and rneg, respectively.

The source inversion problem is ill-posed with non-unique solutions.  To tackle this issue, the inversion
subcommand solves the problem multiple times, each time adding integer cuts to not include previously found
solutions until the objective at the solution has reduced significantly. Therefore, the final solution generated
by the inversion subcommand contains a list of objective values for each solve and the corresponding source
node that was identified for that particular solve. Because of the spatial and temporal diversity of the possible
contaminant  injection profiles,  obtaining reasonable solutions from the base formulation  (in the presence
of limited data)  can be  challenging.   Therefore, depending  on the possible contaminant injection profile
assumptions, additional restrictions can be applied to reduce the size of the search space. Figure 8.2 shows
the different  injection profile restrictions that  are supported through the different formulation variations.
When limited and/or less frequent measurement data is available  (e.g., only manual grab samples), the Step
or the No Decrease formulation variations are  recommended.
                        200
                        150
                        100
                         50
                          0123456789
                                                 Time


              Figure 8.2: Two different injection profiles used by the formulation variations.

The first variation of this formulation adds a constraint to allow for the case where the mass flow rate of the
contaminant can stay the same or increase with time. The No Decrease profile shown in Figure 8.2 gives an
example of this kind of injection. This variant of the formulation can be run using the formulation option
of MIP   discrete  nd in the inversion block.
                                                          Vn e N, t e T : j ^ 0
The second variation includes two additional constraints:
                                                                 Vne N,t eT
                                                                 Vne N,t eT
This variant requires the injection  to be a single step of any calculated strength S and can be run using
the formulation option of MIP_discrete_step in the inversion block. The Step profile shown in Figure 8.2
illustrates an example of this type of injection.
                                                 83

-------
The  third variation is a Linear Program (LP) that can be solved very quickly in the case where a single
incident node is sought. The LP can be solved efficiently by enumerating the binary variables. In this case,
each node is checked separately but fixing its corresponding binary variable to 1, and the remaining to zero.
This LP is solved for each of the nodes and the objective values are compared.  The LP variant of this
algorithm can be  run using the formulation option of LP_discrete in the inversion block.

8.1.2   Bayesian Probability Based Formulation

This formulation  calculates the probability of a node being the true injection node using Bayes rule:
where contamination incident  i is an injection at a node and at a particular time step and P(i\m) is the
probability of an incident  i given a set of measurements  m.  Here, P(i) is the prior probability of an
incident. This formulation assumes that only a single injection incident is possible, and therefore it uses an
uniform prior  of l/(all possible incidents).  Since it is difficult to estimate P(m)  (the  prior probability  of
a measurement), this calculation is substituted by obtaining the P(i\m) for all possible incidents and then
normalizing them to 1. Finally, P(m\i)  is the probability of  a measurement given an injection incident. It is
calculated using the following equation:
                               P(m\l) = (1 - pf)™atch(i)p™eas-match(i)                          (gll)

where, pf is the probability of measurement failure, meas is the total number of measurements and match(i)
is the number of discrete  measurements that match the discrete concentrations obtained by simulating
incident i. Note that calculating the discrete concentration profile obtained by simulating an incident requires
a threshold that is specified by using the negative threshold option of the inversion block of the inversion
configuration file.

After calculating the  normalized probability P(i\m) for  all  incidents, only the ones having a  probability
above a confidence limit are reported as the set of likely incidents. This confidence limit is by default set to
95% (0.95) and can be changed by using the confidence option of the inversion block.

8.2  Source Inversion Solvers

The inversion subcommand can be run using the two different algorithms, MIP or Bayesian.  The Bayesian
probability based algorithm only requires simulating incidents  and calculating  probabilities and does not
require  any optimization solvers.  The MIP algorithm requires standard MIP solvers to perform source
inversion.  The solvers recognized by the  inversion subcommand are the same as the ones recognized by
boosterjnip subcommand (See Section 7.2.3 for more details). The algorithm are referred to as optimization
(MIP) and Bayesian within the inversion WST configuration file.

8.3  inversion Subcommand

The inversion subcommand is executed using the following command line:
   ¥st inversion 
where conf igf ile is a WST configuration file in the YAML format.

The  --help option prints information about this subcommand, such as usage, arguments, and a brief de-
scription:
   ¥st inversion —help
                                                 84

-------
8.3.1   Configuration File

The inversion subcommand generates a template configuration file using the following command line:
   ¥st  inversion —template 
The inversion template configuration file is  shown in Figure  8.3.  Brief descriptions of the  options  are
included in the template after the # sign.
   # inversion configuration template
   network:
     epanet file:  NetS.inp         # EPANET network file name
   measurements:
     grab samples:  measures.dat    # Measurements file name
   inversion:
     algorithm: optimization       # Source inversion algorithm:  optimization or bayesian
     formulation:  MIP_discrete_nd  # Optimization formulation type,  optimization only
     model format:  PYOMO           # Source inversion optimization formulation:  AMPL or PYOMO
     horizon: 1440.0              # Amount of past measurement data to use (min)
     num injections: 1.0           # No. of possible injections
     measurement  failure: 0.05     # Probability that a sensor fails
     positive threshold:  100.0     # Sensor threshold for positive contamination measurement
     negative threshold:  0.1       # Sensor threshold for negative contamination measurement
     feasible nodes: null          # Feasible source nodes
     candidate threshold: null     # Objective cut-off for candidate nodes.
     confidence:  null             # Probability confidence for candidate nodes.
     output impact nodes: false    # Print likely injection nodes file
   solver:
     type: glpk                   # Solver type
     options:                     # A  dictionary of solver options
     logfile: null                # Redirect solver output to a logfile
     verbose: 0                   # Solver verbosity level
     initial points: []
   configure:
     output prefix: Net3           # Output file prefix
     debug: 0                     # Debugging level, default = 0
                         Figure 8.3:  The inversion configuration template file.

8.3.2   Configuration Options

Full descriptions of the WST configuration options used by the inversion subcommand are listed below.
network
     epanet file
          The name of the EPANET input (INP) file that defines the water distribution network model.
          Required input.
measurements
     grab samples
          The name of the file that contains all the measurements from the manual grab samples and the
          fixed sensors. The measurement file  format is documented in File Formats Section 11.7.
          Required input.
inversion
     algorithm
          The algorithm used to perform source inversion. The options are optimization or bayesian. The
          optimization algorithm requires  AMPL or PYOMO along with  a MIP solver.   The bayesian
          algorithm uses  Bayes'  Rule to update probability of a particular node being the contaminant

                                                  85

-------
     source node.
     Required input, default = optimization.
formulation
     The  formulation  used by  the  optimization  algorithm.   The options are LP_discrete  (dis-
     crete LP), MIP_discrete (discrete MIP), MIP_discrete_nd (discrete MIP with no decrease) or
     MIP_discrete_step (discrete MIP for step injection).
     Required input for optimization algorithm, default = MIP_discrete.
model format
     The modeling language used to build the formulation specified by  the formulation option. The
     options are AMPL and PYOMO. AMPL is a third party package that must be installed by the
     user if this option is specified. PYOMO is an open source software package that is distributed
     with WST.
     Required input for optimization algorithm, default = PYOMO.
horizon
     The minutes over which the past measurement data is used for source inversion. It is calculated
     backwards from the latest measurement time in the measurements file. If the horizon is longer than
     the time between the latest measurement and simulation start time, then all the measurements
     are used for source inversion.
     Required input, default = None (Start of simulation).
num injections
     The number of possible injections to consider  when performing inversion. Multiple injections are
     only  supported by the MIP formulation.  This value must be set to 1 for the LP model or the
     probability algorithm.
     Required input for optimization algorithm, default = 1.
measurement failure
     The probability that a sensors gives an incorrect reading. Must be between 0 and 1.
     Required input for the bayesian algorithm, default = 0.05.
positive threshold
     The concentration threshold used by the sensors to flag a positive detection measurement.  This
     is a parameter in the optimization algorithm (Equation 8.6).
     Required input for optimization algorithm, default = 100 mg/L.
negative threshold
     The concentration threshold used by the sensors to flag a negative detection measurement.  This
     is a parameter in the optimization algorithm (Equation 8.5).
     Required input for optimization algorithm, default = 0.1 mg/L.
feasible nodes
     A list that defines nodes that can be considered for the source inversion problem. The options are:
     (1) ALL, which specifies all nodes as feasible  source locations;  (2) NZD, which specifies all non-
     zero demand nodes as feasible source locations; (3)  a list of EPANET node IDs, which identifies
     specific nodes as feasible source locations; or (4) a filename, which is a space or comma separated
     file containing a list of specific nodes as feasible source locations.
     Optional input.
candidate threshold
     The objective cut-off value for candidate contamination  incidents  using the optimization  algo-
     rithm. The objective value represents the likelihood of a particular node being the injection node
     (See Equation 8.13). The objective values are normalized to 1 and only the nodes having  their
     objective values greater or equal to the threshold are reported in  the inversion results.
                                           86

-------
          Required input for optimization algorithm, default = 0.20.
     confidence
          The probability cut-off value for candidate contamination incidents using the bayesian algorithm.
          The value is between 0 and f.
          Required for the bayesian algorithm, default = 0.95.
     output impact nodes
          A option to output a Likely_Nodes.dat  file that contains only  the  node IDs of the possible
          contaminant injection nodes obtained from the inversion subcommand. This file can be used as
          the feasible nodes for the next iteration of the inversion subcommand to only consider this set
          of possible contaminant injection nodes.
          Optional input, default = false.
solver
     type
          The solver type.
          Required input.
     options
          A list of options associated with a specific solver type.
          Optional input.
     logfile
          The name of a file to output the results of the solver.
          Optional input.
     verbose
          The solver verbosity level.
          Optional input, default = 0 (lowest level).
     initial points
          nodes
             A list of node locations (EPANET IDs) to begin the  optimization  process.
             Optional input.
          pipes
             A list of pipe locations (EPANET IDs) to begin the optimization process.
             Optional input.
configure
     output prefix
          The prefix used for all output files.
          Required input.
     debug
          The debugging level that indicates the amount of debugging information to output to the screen.
          Optional input, default = 0 (lowest level).

8.3.3  Subcommand Output

The  _inversion.json  file contains a summary of the inversion subcommand results. This
file includes a list of node locations along with their objective or probability values, in which a higher value
indicates a higher likelihood of that node being the true contaminant injection node. The injection profiles
(start and end times and the strength) for each possible  injection node are also included along with the run
date and CPU time for inversion. The inversion subcommand also outputs an _profiles.tsg

                                                87

-------
file that contains the list of likely injection profiles in the TSG file format (See File Formats Section 11.12).
As discussed in Chapter 9, this TSG file can be directly used as an input by the grabsample subcommand.

8.4    Source Inversion Example

Two examples illustrating the use of different source inversion formulations are presented.  In the first example
the optimization formulation is used to solve a source inversion problem, while the second example uses the
Bayesian probability formulation to solve the same problem.

8.4.1  Example 1

An EPANET network model  (INP format) and  a measurements file (See File Formats Section 11.7) are
required  to  run the inversion subcommand.   The configuration file, inversion_exl.yml, shown in  Fig-
ure 8.4 is used  to  identify the possible contaminant source locations.  The MIP optimization formulation,
MIP_discrete_step, is used  for this example. The EPANET Example Network 3 INP file, NetS.inp, is used
as the network file.  The network runs a two day hydraulic and water quality simulation.  The example
also uses the Net3_MEASURES.dat file,  which contains the sensor measurements based upon simulating
an injection at  node 151 from 8 hours and until  24 hours and converting concentrations at five optimally
selected fixed sensor locations into binary  measurements.
   network:
     epanet  file: Net3/Net3. inp
   measurements:
     grab samples: Net37Net3_MEASURES.dat
   inversion:
     algorithm: optimization
     formulation: MIP_discrete_step
     model format: PYOMO
     horizon:  null
     num injections: 1.0
     measurement failure:  0.05
     positive  threshold: 100.0
     negative  threshold: 0.1
     feasible  nodes: null
     candidate threshold:  0.2
     confidence: 0.95
     output  impact nodes:  false
   solver:
     type: 'glpk'
     options:
     logfile:  null
     verbose:  0
     initial points: []
   configure:
     output  prefix: inversion_exl/Net3
     debug:  0
                      Figure 8.4: The inversion configuration file for example 1.

The example can be executed using the following command line:
   ¥st inversion inversion_exl.yml
The results are contained in the file Net3_inversion_results.json.  A section of this results file is shown in
Figure 8.5. The results contain a list of sets where each set contains - possible contaminant source node in
the "Nodes" list (which contains a node "Name" and a "Profile"), CPU computation time in seconds, and the
"Objective" value corresponding to the solution which identifies that node as the source node.  The "Objective"
value  for each candidate  node n in  the results file is related to the objective of the MIP formulations 8.1.1
(Equation 8.1). The objective calculated from the MIP formulation is transformed such that it is normalized

                                                 88

-------
to 1 and a higher value mean a higher likelihood of a node being the source node.  This transformation is
done by the following equations:
                                                    OR T
               INV  NORM  OBJn = l --      -"  '" T,                  VneN          (8.12)
                   ~~      ~~            max(FORM_OBJ)                                   v    '
                                        INV  NORM  OBJn                                    .
                                      max(INV_NORM_OBJ)

where FORM_OBJn (Formulation Objective) is the objective value as calculated from the MIP formulation
Equation 8.1 when node n is identified as the most likely node, INV_NORM_OBJn is an intermediate
variable that represents 1 minus  the normalized formulation objective, and Objectiven is the normalized
form of the INV_NORM_OBJn which is reported in the inversion subcommand results file.
This results file only contains the list of possible contaminant source nodes that have an "Objective" (as calcu-
lated by 8.13) greater than the Candidate threshold provided in the inversion block of the WST configuration
file.
For this example scenario, the inversion  subcommand is able to correctly identify node  151 as one of the
two most likely source nodes.  This means that given the current measurement information available, both
nodes 151 and 149 are equally likely. Further  measurements  can be obtained from selected  grab sampling
locations that can help in distinguishing between these two potential source nodes.

8.4.2  Example 2

In this example, the Bayesian probability formulation is used to solve the same problem described in Example
1. The configuration file, inversion_ex2.yml, shown in Figure 8.6, is used to run this example. The bayesian
formulation is selected by using the algorithm option in the inversion block.
The example can be executed using the following command line:
   ¥st inversion inversion_ex2.yml
The results  are contained in the file  Net3_inversion_results.json  shown in Figure 8.7.  The  "Objective"
value reported in this file corresponds  to the probability value calculated by Equation 8.10. The probability
algorithm is also able to correctly identify node 151 as the one of the three most probable source nodes along
with nodes 149 and 153.
                                                 89

-------
"Objective":  1.0,
"Nodes" :  [
  {
    "Profile" :  [
        "Start":  39900.0,
        " Strength" :  100000 . 0 ,
        "Stop":  172800.0
    "Name":  "149"
"CPU time":  293.88805079460144
"Objective":  1.0,
"Nodes" :  [
  {
    "Profile" :  [
      {
        "Start": 24900.0,
        " Strength" :  100000 . 0 ,
        "Stop":  172800.0
    "Name":  "151"
"CPU time":  293.88805890083313
"Objective":  1.0,
"Nodes" :  [
  {
    "Profile" :  [
      {
        "Start": 33900.0,
        " Strength" :  100000 . 0 ,
        "Stop":  172800.0
    "Name":  "153"
"CPU time":  293.8880648612976
"Objective":  0.96231012730554,
"Nodes" :  [
  {
    "Profile" :  [
      {
        "Start":  16500.0,
        "Strength":  2281.5155571,
        "Stop":  172800.0
    "Name":  "125"
"CPU time":  293.88806986808777
"Objective" :  0.27428129040296345,
"Nodes":  [

    "Profile" :  [
                 Figure 8.5: The inversion JSON output file for example 1.

-------
network:
  epanet  file:  Net3/Net3. inp
measurements:
  grab samples:  Net37Net3_MEASURES.dat
inversion:
  algorithm: bayesian
  formulation:  MIP_discrete_step
  model format:  PYOMO
  horizon:  null
  num injections:  1.0
  measurement  failure: 0.05
  positive  threshold:  100.0
  negative  threshold:  0.1
  feasible  nodes:  null
  candidate threshold: 0.2
  confidence:  0.95
  output  impact nodes: false
solver:
  type: 'gurobi'
  options:
  logfile:  null
  verbose:  0
  initial points:  []
configure:
  output  prefix:  inversion_ex2/Net3
  debug:  0
                     Figure 8.6:  The inversion configuration file for example 2
        "Nodes":  [
            {
                "Name":  "149"
            }
        ],
        "Objective": 0.333107


        "Nodes":  [
            {
                "Name":  "151"
            }
        ],
        "Objective": 0.333107


        "Nodes":  [
            {
                "Name":  "153"
            }
        ],
        "Objective": 0.333107
                     Figure 8.7:  The inversion JSON output file for example 2.
                                                  91

-------
Chapter 9

Grab  Sampling
When source inversion is performed following initial detection of a contamination incident, it is likely that
the identified set of possible injection locations is fairly large due to the limited measurement information
available at the early stages of detection. As time progresses, more measurements become available to help
decrease the number of possible injection locations.  It is possible to optain additional measurements in the
form of grab samples from optimally selected locations that can help in quickly narrowing down the set of
likely incedent locations when source inversion calculations are performed again using these  grab sample
measurements.  The grabsample subcommand can  be used to identify optimal  grab sample locations that
could help reduce the number of possible injection locations identified from the inversion subcommand.
A flowchart representation of the grabsample subcommand is shown in Figure 9.1. The required input for
the grabsample subcommand include a utility network model specified with an EPANET compatible input
file (INP) and a list of likely injection scenarios.
                                                        Utility Network
                                                            Model
                              ikely Injection
                               Scenarios
Grab Sampling
                                                        Grab Sample
                                                         Locations
                                Figure 9.1: Grab sampling flowchart.


9.1    Grab Sampling Formulation

Considering two possible contamination incidents i and j, if a particular sample location is impacted by
incident i,  but not impacted by incident j,  then this sample location is able to distinguish between the
two incidents. The grabsample subcommand can be used to identify grab sample locations that maximize
the number of pairwise distinguishable incidents in a list of possible contamination incidents. The _profiles.tsg obtained from the inversion calculations contains a list of possible injection locations.
The  data sets required by the  optimization formulation below  are obtained by  simulating each possible
incident using the EPANET hydraulics model and the Merlion water quality model (Mann et al.,  2012b).

-------
The grab sampling problem formulation is:
                     maximize
                              (i,j)ePE
                   subject to   V"^  sn > d^                         V (i, j) (E PE

                              ^sn< Smax + \F\                                             (9.3)

                              sn e {0,1}                            Vn e G                    (9.4)
                              sn = l                                Vn e F                    (9.5)
                              0
where conf igf ile is a WST configuration file in the YAML format.

The  --help option prints information about this subcommand, such as usage, arguments, and a brief de-
scription:
   ¥st grabsample —help
9.3.1  Configuration File

The grabsample subcommand generates a template configuration file using the following command line:
   ¥st grabsample —template 
The grabsample template configuration file is shown in Figure 9.2.  Brief descriptions of the options are
included in the template after the # sign.  The grabsample subcommend requires likely scenarios, which are
set in the  [scenario] block. These scenarios must be defined using a TSG file or by specifying the scenario

                                                93

-------
location, type, strength, start and stop time (see Section 3.2 for more information on defining scenarios).
In general,  the TSG file created by the inversion subcommand will be used to  define likely scenarios.
EPANET-MSX options are not used by grabsample and Merlion is always used as the water quality model.
   # grabsample configuration
   network:
     epanet file:  NetS.inp
   scenario:
     location: null
     type: null
     strength: null
     species: null
     start time: null
     end time: null
     tsg file: null
     tsi file: null
     msx file: null
     msx species:  null
     merlion: false
   grabsample:
     model format: PYOMO
     sample time:  720.0
     threshold: null
     fixed sensors: null
     feasible nodes: null
     num samples:  null
     greedy selection: false
   solver:
     type: glpk
     options:
     logfile: null
     verbose: 0
     initial points:  []
   configure:
     output prefix: Net3
     debug: 0
template

# EPANET network file name

# Injection location: ALL, NZD or EPANET ID
# Injection type: MASS, CONCEN, FLOMPACED, or SETPOINT
# Injection strength [mg/min or mg/L depending on type]
# Injection species, required for EPANET-MSX
# Injection start time  [min]
# Injection end time [min]
# TSG file name, overrides injection parameters above
# TSI file name, overrides TSG file
# Multi-species extension file name
# MSX species to save
# Use Merlion as WQ simulator, true or false

# Grab sample model format: AMPL or PYOMO
# Sampling time (min)
# Contamination threshold. Default 0.001
# Fixed sensor nodes
# Feasible sampling nodes
# Maximum number of grab samples, default = 1
# Perform greedy selection. No optimization

# Solver type
# A dictionary of solver options
# Redirect solver output to a logfile
# Solver verbosity level
# Output file prefix
# Debugging level,  default = 0
                         Figure 9.2: The grabsample configuration template file.


9.3.2   Configuration Options

Full descriptions of the WST configuration options used by the grabsample subcommand are listed below.
network
     epanet file
          The name of the EPANET input (INP) file that defines the water distribution network model.
          Required input.
scenario
     location
          A list that describes  the injection locations for the contamination scenarios.  The options are:
          (1) ALL, which  denotes all nodes (excluding tanks and reservoirs) as contamination injection
          locations;  (2) NZD, which  denotes all nodes with non-zero demands as contamination injection
          locations;  or (3)  an EPANET node ID, which identifies the node where contamination is intro-
          duced. This allows easy specification of single or multiple contamination scenarios.
          Required input unless a TSG or TSI file is specified.
     type
          The injection type for the contamination scenarios. The options are MASS, CONCEN, FLOW-
                                                   94

-------
     PACED or SETPOINT. See the EPANET manual for additional information about source types
     (Rossman, 2000).
     Required input unless a TSG or TSI file is specified.
strength
     The amount of contaminant injected into the network for the contamination scenarios. If the type
     option is MASS, then the units for the strength are in mg/min. If the type option is CONCEN,
     FLOWPACED or SETPOINT, then units are in mg/L.
     Required input unless a TSG or TSI file is specified.
species
     The name of the contaminant species injected into the network. This the name of a single species.
     It is required when using EPANET-MSX, since multiple species might be simulated, but only one
     is injected into the network.  For cases where multiple contaminants are injected, a TSI file is
     needed.
     Required input for EPANET-MSX unless a TSG or TSI file is specified.
start time
     The injection start time that defines when the contaminant injection begins. The time is given in
     minutes and is measured from the start of the simulation. For example,  a value of 60 represents
     an injection that starts at hour 1 of the simulation.
     Required input unless a TSG or TSI file is specified.
end time
     The injection  end time that defines when the contaminant injection stops.  The time is given in
     minutes and is measured from the start of the simulation.  For example, a value of 120 represents
     an injection that ends at hour 2 of the simulation.
     Required input unless a TSG or TSI file is specified.
tsg file
     The name of the TSG scenario file that  defines the ensemble of contamination scenarios to be
     simulated. Specifying a TSG file will override the location, type, strength, species, start and end
     times options specified in the WST configuration file.  The TSG file format is documented in File
     Formats Section 11.12.
     Optional input.
tsi file
     The name of  the TSI scenario file that defines the ensemble of  contamination scenarios to be
     simulated. Specifying a TSI file will override the TSG file, as well  as the location, type, strength,
     species, start and end time options specified in the WST configuration file. The TSI file format
     is documented in File Formats Section  11.13.
     Optional input.
msx file
     The name of the EPANET-MSX multi-species file that defines the multi-species reactions to be
     simulated using EPANET-MSX.
     Required input for EPANET-MSX.
msx species
     The name of the MSX species whose concentration profile will be saved by the EPANET-MSX
     simulation and used for later calculations.
     Required input for EPANET-MSX.
merlion
     A flag (true or false) to indicate if the Merlion water quality simulator should be used. If an MSX
     file is provided, EPANET-MSX will be used.
                                          95

-------
          Required input, default = false.
grabsample
     model format
          The modeling language used to build the formulation specified by the model format option.  The
          options are AMPL and PYOMO. AMPL is a third party package that must be installed by the
          user if this option is specified. PYOMO is  an open source software package that is distributed
          with WST.
          Required input, default = PYOMO.
     sample time
          The time at which the manual grab sample is should be taken. The algorithm determines the
          best possible manual grab sample location(s) based upon this time.  Units: Minutes from the
          simulation start time in the EPANET INP file.
          Required input.
     threshold
          This threshold determines wheather or not an incident impacts a candidate sample location.
          Required input, default = 0.001.
     fixed sensors
          A list  that defines nodes that are already fixed continuous sensor locations. The options are: (1)
          ALL, which specifies all nodes as fixed sensor locations; (2) NZD, which specifies non-zero demand
          nodes as fixed sensor locations; (3) NONE, which specifies no nodes as fixed sensor locations; (4)
          a list  of EPANET node IDs, which identifies specific nodes as fixed  sensor locations;  or (5) a
          filename, which is  a space or comma separated file containing a list  of specific nodes as fixed
          sensor locations.
          Optional input.
     feasible nodes
          A list  that defines nodes that can be considered as potential sampling locations for the optimal
          sample location problem. The options are: (1) ALL, which specifies all  nodes as feasible sampling
          locations; (2) NZD, which specifies all non-zero demand  nodes as feasible sampling locations; (3)
          a list of EPANET node IDs, which identifies specific nodes as feasible  sampling locations; or (4)
          a filename, which is a space or comma separated file containing a list of specific nodes as feasible
          sampling locations.
          Optional input.
     num samples
          The maximum number of locations that can be sampled  at one time. This is usually equal to the
          number of sampling teams that are available.
          Required input, default = 1.
     greedy selection
          The option to  select manual grab sample locations based upon a greedy search. This does not
          require any optimization.
          Optional input.
solver
     type
          The solver type.
          Required input.
     options
          A list  of options associated with a specific solver type.
                                                96

-------
          Optional input.
     logfile
          The name of a file to output the results of the solver.
          Optional input.
     verbose
          The solver verbosity level.
          Optional input, default = 0 (lowest level).
     initial points
          nodes
              A list of node locations (EPANET IDs) to begin the optimization process.
              Optional input.
          pipes
              A list of pipe locations (EPANET IDs) to begin the optimization process.
              Optional input.
configure
     output prefix
          The prefix used for all output files.
          Required input.
     debug
          The debugging level  that indicates the amount of debugging information to output to the screen.
          Optional input, default = 0 (lowest level).

9.3.3  Subcommand Output

The _grabsample.json file contains a summary of the grabsample subcommand results. This
file includes a  list of node locations (EPANET IDs) to take grab samples along  with their corresponding
downstream pipe  (EPANET IDs).  In addition, a few important options set in  the configuration file  are
listed.  The objective, which represents the number of pairwise incidents that will be distinguished by  the
grab samples, is also listed in the JSON file. Run date and CPU time are included in the JSON file.

9.4    Grab  Sampling Example

An EPANET network model (INP format) and  a file containing a list of possible injection scenarios (e.g., a
TSG file which is generated by  the inversion module) to run  the grabsample subcommand.  The configu-
ration file for this example,  grabsample_exl.yml, is shown in Figure 9.3. The EPANET Example Network
3 INP  file, NetS.inp, is used for this  example.  A list of eight  equally likely contamination injection loca-
tions are  defined in the Net3_gs_profiles.tsg file.  The information in this file was obtained  by simulating a
contaminant injection at node 251 at  24 hours,  detecting the injection at 30.5 hours by using a set of fixed
sensor locations defined in the Net3_grabsample_fixed_sensors file and identifying possible contamination
injection  locations  by using the inversion subcommand with the Bayesian algorithm  defined in Section
8.1.2. The sample time is set to 1890 minutes (31.5 hours), since it is  assumed that it takes 60 minutes to
perform source inversion and obtain the samples (including travel time). The maximum number of samples
that  can be taken is two.
                                                97

-------
   network:
     epanet file: Net3/Net3. inp
   scenario:
     location: null
     type: null
     strength: null
     species: null
     start time: null
     end time: null
     tsg file: Net3/Net3_gs_profile.tsg
     tsi file: null
     msx file: null
     msx species: null
     merlion: false
   grabsample:
     model format: PYOMO
     sample time: 1890.0
     threshold: null
     fixed sensors: Net3/Net3_grabsample_fixed_sensors
     feasible nodes: null
     num samples: 2
     greedy selection: false
   solver:
     type: glpk
     options: {}
     logfile: null
     verbose: 0
     initial points: []
   configure:
     output prefix: grabsample_exl/Net3
     debug: 0
                       Figure 9.3: The grabsample configuration file for example 1.
The example can be executed using the following command:
   ¥st  grabsample grabsample_exl.yml
The results are available in the Net3_grabsample_results.json, which is shown in Figure 9.4.  The grab
sampling locations identified are nodes 241 and 251. Twenty three pairwise incidents will be distinguished
after taking the samples at these locations. To reiterate the configuration parameters, the sampling time is
1890 minutes and the maximum number of sampling locations is two.
       "Nodes": [
           "241",
           "251"
       ],
       "objective": 23.0,
       "sampleCount":  2.0,
       "sampleTime": 1890.0,
       "threshold": null
                        Figure 9.4:  The grabsample JSON output for example 1.
                                                   98

-------
Chapter 10

Advanced Topics  and  Case  Studies
This chapter provides more background information on the Merlion water quality model and discusses a few
of the more advanced topics for the sensor placement problem.

10.1   Merlion Water  Quality Model

The Merlion water quality model formulation ensures mass balances at all junctions, pipes and tanks. The
following mass balance equations describe the transport of a species inside the network.  For simplicity,
complete instantaneous mixing is assumed for the tanks, and plug flow is assumed for the pipes.
                                                                                            101
                               ier°(t)
                                       Qi(t)cl(t) + mn(t)
53  Qi(t)-
                                                      53
                                   at
Cn(t), Vne ST   (10.2)
                                                                                           (10.3)
where cn and mn denotes the concentration and mass injected, respectively. The variable J is a set of all
junctions, ST is a set of all storage tanks, and P is a set of all pipes.  The variable Q denotes volumetric
flowrates that are pre-calculated using EPANET and are assumed to be piecewise  constant for each time
interval.  The flowrate of a known external source entering a node is also pre-calculated and is denoted by
Qfxt .  The variable  F° represents the set of all pipes with flow going  away from  node n.  Similarly,  F^
represents the set of all pipes with flow coming into node n.

Equation 10.1 represents a set of algebraic equations dependent on time alone and Equation 10.2 represents
a set of ODEs also  dependent on time alone.  Therefore, these two equations can  be discretized in time.
However, discretizing Equation 10.3, which are PDEs, in both time and space would lead to a prohibitively
large model. Instead, these pipe balance PDEs are replaced using  an origin-tracking algorithm.  This al-
gorithm is based on the Lagrangian method; however, instead of tracking concentration values as packets
of water moving through  the network, the origin-tracking algorithm tracks the originating node and time

                                               99

-------
slop of each packet, as it enters a pipe (sec Figure 10.1).  Once the water packet exits the pipe, equations
are written relating the concentration of the pipe inlet and outlet to the concentration of connected nodes
based on time delay.  These time  delay expressions are formulated  for each pipe independently.  Therefore,
the algorithm scales favorably for a large water distribution  system having a linear computational cost as
the size of the network increases.
              Timestep


               t = 1



               t = 3



               t = 5
                                        Flow direction
Equations
                         Figure 10.1:  Illustration of the origin tracking algorithm.


The time delay expressions are included in the mass balance equations to form a large but very sparse linear
system relating input injections  (m)  from all nodes and time  steps  to output concentrations (c) from all
nodes  and time steps.
                                                                                                      14)
Unlike black  box simulations, this linear  model  can be  extended  and embedded inside other numerical
applications.  For example, the water quality model can be embedded inside  a mathematical programming
formulation for applications like booster placement, source inversion and optimal grab sampling.

 \liii li >i mul.ii Mr.1 i h<  Inn ,u ^ ~ii in  pi i lui nun1-1 ,i 11 ,n in1-1  -iniiil.il inn i1- ~l i ,i i1-1 hi |i M" ,u 11  Fif-l  >ui iiiji < I h >n
I'M ill h  i ;?M i1-  ^1" i ilii 11  Thi n  I hi  i^ti-l'ini-l,iili>M'i|,uii|lin.illt l> H I -i il\i 11  Im I hi  in I'  i M I  11 tin i nl til h 'ii
piu|]|e i  Thi1- [>ii M i  -,-, i- l.i-l  ,u n 11  i n tin iji i II h ii nl  'i lii n -.imiil.UniL' ,1 |,n IM> i HM ml ill  i il li,n my -imiilril h >n~
In tin-1 ,i-e the -"-tt-111 i- l.ii ti iiized i ail e  md H 1 'HI k-i >b e i- peili amed ti a H.II h -imuldtii ai   T > »et additii .11 il
^pHHilup ,1 tlllnlHil "-uhel i^  ,d--i) pli'Aldi-'l tint t.ikn1-  id^Ultlue i it  tllH -tlllitlllH ul the llliH,il  ^v-tHln  1,'
p^imntinu nntux (4  mtu L IT-HI  tinnuuLu ^huh IHHIMAH^  the need t a  ,ur l,n tuiiiHtiun  Ihe tulnied
^I'hni  il^n utilize^ tliH Bi-n  Lmeu  llgebi \ Sub-itnitme- iBLA^i libi.u*' tu ppitnim multiple  Imk^Mbe^
1 i Mlle^p, illdlllU t'i multiple lllleiTli'll  -< en,illi >~) lllule eftl, lent!'   Jrul  iddltlnllal llll'illlH tl'ill  ill'lilt Melllull
refer to Mann et al. (2012a).

-------
10.2   Average-case Sensor Placement

The sp subcommmand can design a sensor network for contamination warning systems (CWSs)  using a
variety of different optimization formulations. The most widely studied sensor placement formulation for
CWS design is to minimize the expected impact of an ensemble of contamination incidents given a sensor
budget. This formulation has also become the standard formulation for sp, because it can be effectively used
to select sensor placements in large water distribution networks.  This chapter provides a variety of examples
that illustrate  the application of the sp subcommand for this optimization formulation. Sensor placement
formulation and  examples illustrating common use of sp are included in Chapter 5.

10.2.1   Computing  a Bound on  the Best Sensor Placement Value

The GLPK solver provides upper and lower bounds on the value of the final solution. This is true for all of
the MIP solvers  used by WST.  For large water distribution systems, it might be prohibitively expensive to
perform optimization with a MIP solver. However, these solvers  can still be used to compute a lower bound.
The configuration file shown in Figure 10.2 defines a sensor placement problem with the compute bound
option set to true in the problem block. This option indicates that the goal for the optimizer is to compute
a lower bound on the globally optimal  solution, rather than finding a sensor  placement. All other  options
are those previously defined in example 3 of the Sensor Placement Examples (See Section 5.4).
   impact data:
   - name:  impact 1
    impact file: Net3_ec.impact
    nodemap file: NetS.nodemap
    directory: Net3
   objective:
   - name:  obj1
    goal:  impact 1
    statistic: MEAN
   constraint:
   - name:  constl
    goal:  NS
    statistic: TOTAL
    bound:  5.0
   problem:
    type:  default
    objective: objl
    constraint: constl
    presolve: True
    compute bound: True
    compute greedy ranking: False
   solver:
    type:  glpk
    options:  {}
    logfile:  null
    verbose:  0
   configure:
    output prefix: sp_bound/Net3
    debug:  0
         Figure 10.2:  The sp configuration file using the GLPK solver to compute a lower bound.

The JSON output file in Figure 10.3 contains the lower bound value.  This is the same value as the solution
generated by the  GRASP heuristic in example 3 of the Sensor Placement Examples (See Section 5.4). In
this manner, a MIP solver can be used to evaluate whether a heuristic sensor placement is near-optimal. In
addition, the JSON file list the options, such as solver type, problem type and modeling language, that were
used to create the results. Other information contained in the JSON  file are the run date and CPU time.
                                                101

-------
     "solver type":  "glpk",
     "run date":  "2013-08-22 15:38:54"
     "problem type":  "default",
     "lower bound":  8655.806351,
     "CPU time":  12.888122081756592,
     "EPANET node ID":  [] ,
     "modeling language": "pyomo",
     "upper bound":  null,
     "objective": null,
     "node ID":  []
               Figure 10.3: The sp JSON file with the lower bound from the GLPK solver.


The Lagrangian  heuristic leverages the structure of the eSP  model  (Equations 5.1)  to  guide  its search.
Specifically, this  heuristic computes the optimal values for the integer relaxation of eSP  and then applies
a randomized rounding technique. As a consequence, this heuristic can also be used to compute the same
bound as a MIP  solver.  The configuration file in Figure  10.4 uses the Lagrangian solver  to determine the
sensor placement for example 3 of the Sensor Placement Examples (See Section 5.4).
   impact data:
   - name: impact 1
     impact file: Net3_ec.impact
     nodemap file: NetS.nodemap
     directory: Net3
   objective:
   - name: obj1
     goal: impact 1
     statistic: MEAN
   constraint:
   - name: constl
     goal: NS
     statistic: TOTAL
     bound:  5.0
   problem:
     type: default
     objective: objl
     constraint: constl
     presolve: True
     compute bound: False
     compute greedy ranking: False
   solver:
     type: lagrangian
     options: {}
     logfile: null
     verbose: 0
   configure:
     output prefix: sp_bound_lag/Net3
     debug:  0
         Figure  10.4: The sp configuration file using the Lagrangian solver to compute a bound.

The JSON output file in Figure  10.5 shows the results of of sensor placement using the Lagrangian solver for
example 3  of the Sensor Placement Examples  (See Section 5.4). It contains the sensor locations (EPANET
IDs),  the objective value (the impact metric value), the lower bound on this objective as well as the upper
bound,  which is the same as the objective value.  In addition, the JSON file list the options,  such as solver
type,  problem type and modeling language, that were used to create the results.  Other information contained
in the JSON file are the run date and CPU time. The sensor locations identified are nodes 139, 161, 191


                                                 102

-------
and 208.  The mean extent of contamination (EC) impact for this design is approximately 9889 pipe feet
contaminated. The lower bound is approximately 9819 pipe feet contaminated, so the sensor network design
provided was not the optimal solution.
     "solver type":  "lagrangian" ,
     "run date":  "2013-08-22 15:40:39"
     "problem type": "default",
     "lower bound" :  9819.335391,
     "CPU time":  0.36084604263305664,
     "EPANET node ID": [
       [
        "139",
        "161",
        "191",
        "208"
     "modeling language": "none"
     "upper bound": 9889.90378,
     "objective":  9889.90378,
     "node ID" :  [
       [
        27,
        37,
        53,
        64
            Figure 10.5:  The sp JSON file with the lower bound from the Lagrangian solver.

As with  MIP  solvers,  the Lagrangian solver can also be used to simply compute this lower bound.  The
configuration file in Figure 10.6 shows an example of using the compute bound option in the problem block
with the Lagrangian solver.

10.2.2   Managing Sensor Placement Locations

By default, the sp subcommand assumes that all node locations in a water distribution network are feasible
sensor locations.  In practice, this is a generous assumption, since sensors cannot be practically installed in
many locations without significant cost and inconvenience.  The location block in the configuration file is
used to specify options for declaring feasible and infeasible node locations in the network.  Additionally, the
location block can be used to declare node locations as fixed, where a sensor must be placed, and unfixed,
where a fixed sensor is removed.
A location block consists of a list of declarations that are interpreted in their order within  the configuration
file.  Each declaration consists of a dictionary with a single key, whose value is either a string or list of
EPANET node IDs.  For example, the following location block declares a list of infeasible node locations:
   location:
   - infeasible nodes:
      - 119
      - 141
      - 193
      - 207
      - 239
To demonstrate the effect of declaring some nodes as infeasible sensor locations, example 1 of the Sensor
Placement Examples (See Section 5.4)  can be used.  The  solution from this example placed sensors  at

                                                 103

-------
   impact data:
   - name:  impact 1
    impact file: Net3_ec.impact
    nodemap file: NetS.nodemap
    directory:  Net3
   objective:
   - name:  obj1
    goal:  impact 1
    statistic:  MEAN
   constraint:
   - name:  constl
    goal:  NS
    statistic:  TOTAL
    bound:  5.0
   problem:
    type:  default
    objective:  objl
    constraint:  constl
    presolve: True
    compute bound: True
    compute greedy ranking: False
   solver:
    type:  lagrangian
    options: {}
    logfile: null
    verbose: 0
   configure:
    output prefix: sp_bound_only_lag/Net3
    debug:  0
    Figure 10.6: The sp configuration file using the Lagrangian solver and the compute bound option.


nodes 119, 141, 193,  207 and 239 and the mean extent of contamination (EC) for this sensor design was
8499. If these nodes were listed as infeasible sensor locations (as shown in the location block above) in the
configuration file, the new sensor locations are nodes 115, 151, 185, 211 and 262. The mean EC for this new
solution is 9169, which is expected since this sensor network design would be less effective than the design
when any location could have been selected.

10.2.3   Limited-Memory Sensor Placement Techniques

Controlling memory usage is a critical issue for solving large sensor placement formulations.  This is a
particular challenge for  MIP  methods, but both the GRASP and Lagrangian heuristics can have difficultly
solving very large  problems on a typical workstation. A variety of mechanisms have  been integrated into
the sp subcommand to  reduce the problem representation size while preserving the structure of the sensor
placement problem. These techniques include: scenario aggregation and filtering, feasible locations, witness
aggregation, skeletonization and memory management.

Scenario aggregation is one possible  strategy to reduce memory. The scenarioAggr executable compresses
the impact file while preserving the fundamental structure of the impact file and  it  is appropriate when
optimizing for mean performance objectives. See the Executable Files Section 12.3 for  more information on
s c enar i oAggr.

Filtering impacts can also reduce memory requirements for sensor placement.  The f ilter_impacts exe-
cutable can limit the sensor placement to only consider contamination incidents that are sufficiently bad in
the worst-case. See the Executable Files Section 12.2 for more information on f ilter_impacts.

Another  strategy is to  limit  the number  of feasible  sensor placements, using the location block option
described in Section (10.2.2.  Eliminating feasible locations reduces the problem representation used by the
sp solvers.

                                                 104

-------
Witness aggregation techniques can be used to limit the size of the sensor placement formulation.  This term
refers to the variables in the MIP formulation that witness a contamination incident. By default, variables
that witness contamination incidents with the same impact  are aggregated, and this typically reduces the
MIP constraint matrix by a significant  amount. Further reductions can be performed with more  aggressive
aggregations.
To demonstrate witness aggregation, an aggregate block is added to the sp configuration file. In addition, the
aggregate block name (aggl) is added to the problem block of the configuration file. An example aggregate
block sets the aggregation type option to PERCENT and the value option to 0.125.
   aggregate:
   - name: aggl
    type: PERCENT
    goal: impact 1
    value: 0.125
    conserve  memory: 0
    distinguish detection: 0
    disable aggregation:  [0]
The following table illustrates the use of the two  witness aggregation options when optimizing the mean
extent of contamination:  aggregation type  = PERCENT  and aggregation type =  RATIO. The RATIO
aggregation type can be used with distinguish detection option to help with aggregation.  The second line
of data in this table is the default aggregation, which has about half as many non-zero values in the MIP
constraint matrix.  Both the percent and ratio aggregation strategies effectively reduce the problem size while
finding near-optimal solutions.
Aggregation Type
None
PERCENT
PERCENT
RATIO
Aggregation Value
NA
0.0
0.125
0.125
Binary Variables
97
97
97
97
MIP Nonzeros
220736
119607
49576
12437
Solution Value
8525
8525
9513
10991
Another option to reduce the memory requirement for sensor placement is to reduce the size  of the net-
work model through skeletonization.  Skeletonization groups neighboring nodes based on the topology of
the network and pipe attributes.  The skeletonization code, spotSkeleton, provides techniques for branch
trimming, series pipe merging and parallel pipe merging.  Skeletonization eliminates pipes and  nodes that
have little effect on the overall hydraulics of the system. This effectively contracts a connected piece of the
network into a single node, called a supernode. The Executable Files Section 12.4 provides more information
on on spotSkeleton.  Skeletonized networks can be used to  define geographic proximity in a two-tiered
sensor placement approach for large network models (Klise et al., 2013a).

Additionally,  the GRASP heuristic has several options for  controlling how memory is managed.  The grasp-
representation solver option can be used to control how the local search steps are performed.  By default,
a dense matrix is precomputed to perform local search steps quickly, but a sparse matrix can  be used to
perform local search with less memory.  Also, the GRASP heuristic can be configured to use the local disk to
store this matrix. It should be noted that the Lagrangian  heuristic requires less memory than the GRASP
heuristic, and thus similar techniques have not been developed for it.

10.2.4  Evaluating a Sensor Placement

Sensor placements can be evaluated based on an impact assessment  of possible contaminant incidents. The
evalsensor executable measures the performance of each sensor placement with respect to the set of possible
contamination locations.  This analysis assumes that probabilities have been assigned to these contamination
locations.  If  no probabilities are given, then uniform probabilities  are used. The evalsensor  executable
takes sensor placements in a sensor placement file and evaluates them using data from one or more impact
files. Sensor placement files are generated using the sp subcommand, and the file format is described in File
                                                105

-------
Formats Section 11.10.  Impact files are generated using the sim2Impact subcommand, and the file format
is described in the File Formats Section 11.4. Additional information on evalsensor can be found in the
Executable Files Section 12.1.

The following example demonstrates the use of evalsensor using the sensor network design from example
1 of the Sensor Placement Examples  Section 5.4.  The evalsensor command for this example is executed
using the following command:
   evalsensor —nodemap=Net3.nodemap Net3_ec.sensors Net3_ec.impact Net3_mc.impact
This example generates output shown in Figure 10.7.

Sensor placement id:
Number of sensors :
Total cost:
Sensor node IDs :
Sensor junctions:
Impact File:
Number of event s :
Min impact :
Mean impact :
Lower quartile impact:
Median impact :
Upper quartile impact:
Value at Risk (VaR) ( 57.) :
TCE ( 57.) :
Max impact :
Impact File:
Number of event s :
Min impact :
Mean impact :
Lower quartile impact:
Median impact :
Upper quartile impact:
Value at Risk (VaR) ( 57.) :
TCE ( 57.) :
Max impact :


23112
5
0
17 19 24 65 88
115 119 127 209 267
Net3_ec. impact
236
0.0000
9951.5669
1650.0000
9694.0000
15044.8000
25485.0000
27992.4667
33110.0000
Net3_mc . impact
236
0.0000
70806.1310
503.9170
83999 . 3000
143984.0000
143999.0000
144049 . 5000
144143.0000

                            Figure 10.7:  The evalsensor example output.

The evalsensors command can also evaluate a sensor placement in the case where sensors can fail, and there
is some small number of different classes of sensors (grouped by false negative probability). This information
is specified by an imperfect sensor class file  and an imperfect junction class file, which are defined in the File
Formats  Sections 11.6 and 11.5, respectively.  The imperfect sensor class (sc) file, NetS.imperfectsc, specifies
different  categories  of sensor failures.  Sensors of class 1 have a false negative probability of 25%,  sensors of
class 2 have a probability of 50%, class 3 have a 75% probability, and class 4 100%.
1  0.25
2  0.50
3  0.75
41.0
Once failure classes are defined, the nodes of the network are assigned to failure classes by using a imperfect
junction class (jc) file. The beginning of the imperfect junction class file Net3.imperfect]c is shown below.
1  1
                                                 106

-------
2 1
3 1
4
5
6 1
7 1
9 1
10  1
Given the junction classes, evalsensor is used to determine the expected impact of a sensor placement, given
that sensors might fail. The following command line executes evalsensor with specified failure probabilities:
   evalsensor —nodemap=Net3.nodemap —sc-probabilities=Net3.imperfectsc \
       —scs=Net3.imperfectjc Net3_ec.sensors  Net3_ec.impact
This
example generates output shown in Figure 10.8.
   Sensor placement id:
   Number of sensors:
   Total cost:
   Sensor node IDs:
   Sensor junctions:

   Impact File:
   Number of event s:
   Min impact:
   Mean impact:
   Lower quartile impact:
   Median impact:
   Upper quartile impact:
   Value at Risk (VaR)  (
   TCE                (
   Max impact:
                    57.) :
                    57.) :
23112
5
0
17 19 24  65 88
115 119 127 209 267

Net3_ec.impact
236
0.0000
16472.1977
5270.0000
13440.0000
24199.0000
49232.5172
52421.7876
55814.4578
                  Figure 10.8:  The evalsensor output using sensor failure probabilities.

The mean extent of contamination impact changes dramatically if sensors are allowed to fail.  The original
solution was misleading  if sensors fail according to the assigned  probabilities.  With sensor failures,  the
expected impact is much higher.
                                                   107

-------
Chapter 11

File  Formats
This chapter describes the different file formats used by WST, including a brief description, format,  the
associated subcommand(s), and any additional details.

11.1   Configuration File
   • Description: Input configuration file for all WST subcommands.
   • Format: YAML
   • Created by: Template input configuation files can be created using the -template option from each
     subcommand in WST.
   • Used by: WST
   • Details: The input configuration files for WST are stored in the YAML file format.  YAML is a
     human-readable  file format that is well suited for storing hierachical information.  This information
     can be easily parsed and stored as common data types such as strings, scalars, lists and dictonaries by
     a range of programming languages. WST uses PyYAML to parse YAML files into Python data types.
     Basic YAML format specifications are listed below:
       — Each element of a YAML file is a key, value pair seperated by a colon (key:value).
       — The key is the name given to the element, and the value is the data for that element.
       — The hierarchy of YAML files is maintained by outline indentation.
       — The number of spaces used to indent  an element in the YAML file must be consistent across all
          parallel elements.
       — Nested data elements must be indented further than their preceding level.
       — Tab indentation is not recommended.
       — Optional blank lines can be added for readability.
       — Comments begin with the number sign (#) and must be separated from a key:value pair by space.
          Comments can start anywhere but are limited to a single line.
       — Strings do not require quotation (unless they could be cast as a number) and can contain spaces.
       — Lists are indicated with square brackets or hyphens. When using square backets, the list is comma
          seperated. When using hyphens, each entry of the list is on a new line.
       — Dictonaries are indicated with indentation or curly brackets.
       — PyYAML automatically casts datatypes.  For example, [123] is read as a list with a single integer
          value,  "123" is read as a string, 123 is  read as an integer, and 123.0 is read as a float.

                                               108

-------
Additional information on YAML files can be found on the official YAML website http://www.yaml.
org.
Select aspects of the flushing subcommand template configuration file are used as an example of the
format of a YAML file.  The full the flushing subcommand template configuration file is shown  in
Figure 6.2. In the template, the top level key, denoted flushing, contains  the following data:
   flushing:
     detection:  [111, 127, 179]   # Sensor locations to detect contamination scenarios
     flush nodes:
      feasible  nodes: NZD       # Feasible flushing nodes
      infeasible  nodes: NONE    # Infeasible flushing nodes
      max nodes:  2              # Maximum number of nodes to flush
      rate:  800.0              # Flushing rate [gallons/min]
      response  time: 0.0        # Time [min] between detection and flushing
      duration: 480.0           # Flushing duration [min]
     close valves:
      feasible  pipes: ALL       # Feasible pipes to close
      infeasible  pipes: NONE    # Infeasible pipes to close
      max pipes:  0              # Maximum number of pipes to close
      response  time: 0.0        # Time [min] between detection and closing pipes
This subset of the the flushing subcommand template is refered to as the flushing block. Instead of
a single value assigned to flushing, the value is a dictonary containing a nested stucture of additional
key:value pairs.
The keys detection,  flush nodes and close valves options are all second level keys inside the flushing
block.  The location of these keys is often specified using the notation flushing detection, flushing flush
nodes  and  flushing close valves.  The second level keys must be indented using the same number of
spaces and they must have unique names.  The key flushing detection is assign to a list.  Lists can
be specified in  one of two ways, using  square brackets or using hyphen.  The following two formats
(seperated  by —) are equivalent.
   flushing:
     detection:  [111, 127, 179] # square bracket notation, comma seperated

   flushing:
     detection:  #  hyphen notation, new line  for each entry
     -  Ill
     -  127
     -  179
All template configuration files use square brackets to indicate where a list of input values can be used.
If these input values include keywords, like NZD for non-zero demand nodes, this information is listed
in the comment or in the user manual documentation for that  specific YAML input parameter.
The options flushing flush nodes and flushing close valve are both assigned dictonaries. The data inside
these second level keys contain additional key:value pairs.  All keys within these dictonaries must be
indented using  the same number of spaces and have unique names. Two key:value pairs in the flushing
flush nodes option are listed below:
   flushing:
     flush nodes:
      feasible nodes: NZD
      max nodes:  2
Here, the flushing flush nodes feasible nodes option is set to the string NZD and the flushing flush nodes
max nodes option is set to the integer 2.  Note that there are two third order keys named response
time, however they are used in different nested structures, as shown below:
   flushing:
     flush nodes:
      response time: 0.0
                                            109

-------
          close valve:
           response time: 0.0
     YAML files can be written in a compact format that uses curly brackets to represent the hierarchical
     indentation. While this avoids issues with space indentation, this format is more difficult for the user
     to read. For example, the following two examples (seperated by —) are equivalent:
        flushing:
          detection:  [111, 127, 179]

        {'flushing':  {'detection':  [111,  127, 179]}}
11.2   Cost File

   • Description: Specifies the costs for installing sensors at different nodes throughout a network.

   • Format: ASCII

   • Created by:  WST user

   • Used by: sp

   • Details: Each line of this file has the format:
            
     Nodes not explicitly enumerated in this file are assumed to have a cost of zero unless the ID '_
     default	' is specified. For example to specify that all un-enumerated nodes have a cost of 1.0:
            default  1.0
11.3   ERD File

   • Description: Provides a compact representation of all contamination scenario simulation results.

   • Format: binary

   • Created by:  tevasim

   • Used by: sim2Impact

   • Details: The simulation data generator produces four output files containing the results  of all con-
     tamination simulation scenarios. The database files include an index file (index.erd), a hydraulics file
     (hyd.erd) and  a water quality file (qual.erd). The files are unformatted binary file in order to save disk
     space and computation time. They are not readable using an ordinary text editor.

   • Note: The ERD file format replaced  the TSO and SDX file formats, created by previous versions
     of tevasim, to extend the  capability of tevasim for multi-species simulation  using  EPANET-MSX.
     While the tevasim subcommand  produces only  ERD files  (even for single species  simulation), the
     sim2Impact subcommand accepts  both ERD and TSO file formats.


11.4   Impact File

   • Description: Provides the impact of  all  contamination incidents at every point that it is witnessed
     through a water distribution network.

   • Format: ASCII


                                                110

-------
     Created by:  sim2Impact

     Used by:  sp  and evalsensor

     Details: The first line of an impact file contains the number of incidents.  The second line specifies
     the number of delays (always 1) and the delay time in minutes. Subsequent lines have the format
              
     The scenario index is the index of contamination scenarios that were simulated. A scenario index maps
     to a line in the network scenariomap file, which is defined in Section 11.9. The node index is the index
     of a witness location for the incident. A node index maps to a line in  the network nodemap file, which
     is  defined in Section 11.8.  The time of detection is in minutes.  The value of the impacts are in the
     corresponding units for each  impact metric.  The different impact  metrics in each line correspond to
     the different delays that have been computed.

11.5   Imperfect Junction  Class File

   • Description:  Provides  the  mapping  from EPANET node IDs to  failure classes of different false-
     negative probabilities.

   • Format:  ASCII

   • Created  by:  WST user

   • Used by: sp

   • Details:  The imperfect junction class file provides the mapping from EPANET node IDs to failure
     classes of different false-negative  probabilities as  defined in a imperfect sensor class  file (See Section
     11.6 for information on imperfect sensor class files). The format of this file is:
           •Cnode id> 
           •Cnode id> 
     For example, node 1 is of class 2, node 2 is of class 1 and node 3 is of class 1:
           1 2
           2 1
           3 1
11.6   Imperfect Sensor Class  File

   • Description: Contains false negative probabilities for different types of sensors.

   . Format: ASCII

   • Created by: WST user

   • Used by:  sp

   • Details: The file has format:
           •Cclass id>    
-------
          1 0.25
          2 0.5
11.7   Measurements File

   • Description:  Contains a list of measurements along with their corresponding time  and EPANET
     node ID.

   • Format: ASCII

   • Created by: WST user or a water quality event detection system or a data acquisition system

   • Used by: inversion

   • Details: Each line of this file has the format:
           

-------
     Details:  Each line of this file has the format:
              
                               
     The node index maps to the network nodemap file as described in  Section 11.8, and the EPANET
     ID provides this information. The source type is the injection mode for an scenario,  e.g., flow-paced
     or fixed-concentration.  The scenario source start  and stop times are in minutes, and  these values
     are relative to the start of the EPANET simulation.  The source strength is the concentration  of
     contaminant at the injection source.

11.10   Sensor Placement File

   • Description:  Describes one or more sensor placements.

   • Format: ASCII

   • Created By:  sp

   • Used By: evalsensor
Details: Lines in a sensor placement file that begin with the
Otherwise, lines of this file have the format
                                                                 character are assumed to be comments.
             
     The sensor placement ID is used to identify sensor placements in the file. Sensor placements could have
     differing numbers of sensors, so each line contains this information. The node indices map to values in
     the nodemap file described in Section 11.8.
11.11    TAI  File

   • Description:  Describes the information needed for assessing health impacts.

   • Format: ASCII

   • Created by: WST user

   • Used by: sim2Impact

   • Details: This  file is required for health impact metrics, such as population exposed, population dosed,
     and populationd killed. The following example can be copied  directly into a text editor.
         ;  THREAT ASSESSMENT  INPUT (TAI)  FILE
          USAGE: teva-assess.exe 
          Data items explained below
          UPPERCASE items are non-modifiable keywords
          lowercase items are user-supplied values
          I  indicates a selection
          INPUT-OUTPUT
          * TSONAME      - The location of the ERD or TSO file containing the results
                          to analyze.  This value is ignored when used  in sim2Impact
                          to specify parameters for the pe, pk, or pd  metrics.
          * TAONAME      - Name of threat assessment output (TAO) file. This value
                          is ignored when used in sim2Impact to specify parameters
                          for the pe,  pk, or pd metrics.
          * SPECIES_NAME - The species  name to analyze. This is optional - if it is not
                          specified, the first species will be used. If the database
                          was generated by EPANET-MSX, the value MUST  match one of
                          the species  specified in the MSX input file. If the database
                                                  113

-------
                   ¥as generated by EPANET, the value should be "species" if tevasim
                   generated the database.  If TEVA-SPOT generated the database,
                   the value MUST match the name spefied in the GUI.
  * THRESHOLD    - The concentration threshold. All concentrations below
                   value are set to 0.  Only used in the the threatassess
                   executable, not in sim2Impact
TSONAME       charstring
TAONAME       charstring
SPECIES_NAME  charstring
THRESHOLD     value
  DOSE-RESPONSE PARAMETERS
  * A - Function coefficient
  * M - Function coefficient
  * N - Function coefficient
  * TAU - Function coefficient
  * BODYMASS - Exposed individual body mass (kg)
  * NORMALIZE - Dose in mg/kg (YES) or mg (NO)
  * BETA - Beta value for probit dose response model
  * LD50 - LD50 or ID50 value for the agent being studied
  * TYPE - Either PROBIT or OLD depending on the dose response equation
           to be used. If it is PROBIT, only the LD50 and BETA values
           need to be specified, and if it is OLD, the A, M, N, and TAU
           values need to be specified. The BODYMASS and NORMALIZE
           apply to both equations.
DR:A          value
DR:M          value
DR:N          value
DR:TAU        value
BODYMASS      value
NORMALIZE     YES I NO
DR:BETA       value
DR:LD50       value
DR:TYPE       probit I  old
  DISEASE MODEL
  * LATENCYTIME - Time from exposed to symptoms (hours)
  * FATALITYTIME - Time from symptoms till death (hours)
  * FATALITYRATE - Fraction of exposed population that die
LATENCYTIME   value
FATALITYTIME  value
FATALITYRATE  value
  EXPOSURE MODEL
  * DOSETYPE      - TOTAL = Total ingested mass
  * INGESTIONTYPE - DEMAND = Ingestion probability proportional to demand
                    ATUS RANDOM = ATUS ingestion model,  random volume
                                  selection from volume  curve
                    ATUS MEAN   = ATUS ingestion model,  mean volume of value
                    FIXED5 RANDOM = 5 fixed ingestion times (7AM,  9:30AM, Noon,  3PM,  6PM),
                                  random volume selection from volume curve
                    FIXED5 MEAN = 5 fixed ingestion times (7AM, 9:30AM, Noon,  3PM,  6PM),
                                  mean volume of value
  * INGESTIONRATE - Volumetric ingestion rate (Liters/day) - used for DEMAND,
                    ATUS MEAN and FIXED5 MEAN
DOSETYPE      TOTAL
INGESTIONTYPE DEMAND I  ATUS RANDOM I  ATUS MEAN I  FIXED5 RANDOM I  FIXED5 MEAN
INGESTIONRATE value

;  POPULATION MODEL
;  * POPULATION FILE  - File name that contains the node-based
                                             114

-------
        POPULATION
                              population.  The format of the file is  simply
                              one line per node ¥ith the node ID and the
                              population value for that node.
          * POPULATION DEMAND  - Per capita usage rate (flow units/person).
                                The population ¥ill be estimated by  demand.
                      FILE  I DEMAND  value
          DOSE OVER THRESHOLD MODEL
          * DOSE_THRESHOLDS - The dose over each threshold specified will be
                             computed and output to the TAO file.
          * DOSE_BINDATA  - Specifies the endpoints of bins to tally  the number
                          of people ¥ith a dose between the t¥o endpoints.
                          Values can be either dose values or response values -
                          response values are converted to dose values using the
                          dose-response data specified in this file and are indicated
                          on this line by the keyword RESPONSE. Dose values are
                          IDENTIFIED by the keyword DOSE.
        DOSE_THRESHOLDS  valuel  ...  value_n
        DOSE_BINDATA (DOSE  I RESPONSE) valuel
value n
11.12    TSG File
     Description:  Specifies how an ensemble  of EPANET contamination scenario simulations will be
     performed.

     Format: ASCII

     Created by:  WST user

     Used by: tevasim

     Details: Each line of a TSG file specifies injection location(s), species (optional), injection mass, and
     the injection time-frame:
                
     If  is included, the tevasim subcommand uses EPANET-MSX. The simulation data generator
     uses the specifications in the TSG file to construct  a separate threat simulation input (TSI) file that
     describes each individual contamination scenario in the ensemble.  Each  line in the TSG file uses a
     simple language that is expanded to define the ensemble.  The entire ensemble is  comprised of the
     cumulative effect of all lines in the TSG file. The TSG file is  an optional file, since the ensemble of
     contamination scenarios can be specified in the  configuration file.
             

        :       A  label that describes  the ith source location of an N-source ensemble.
                      This can be either:  1)  An EPANET node ID identifying one node
                      where the contaminant is introduced, 2)  ALL,  denoting all nodes
                         (excluding tanks and reservoirs), 3) NZD,  denoting all nodes with
                         non-zero demands. This simple language allows easy specification of
                         single- or multi-source ensembles.  [Character strings]
        :     The source type, one of: MASS, CONCEN,  FLOWPACED, SETPOINT (see EPANET
                      manual for information about these types of water quality sources).
                      [Character string]
        :  The character ID of  the water quality species  added by the source.  This
                      parameter must be omitted when using executables built from the standard
                      EPANET distribution. [Character string]
        :  The strength of the  contaminant source  (see EPANET documentation for
                                 the various source types).
        :       The time, in seconds, measured from the start  of simulation, when the
                                                   115

-------
        :
        Examples:
contaminant  injection is started.  [Integer]
The time,  in seconds, measured from the start of simulation, when the
contaminant  injection is stopped.  [Integer]
        One scenario ¥ith a single  injection at node ID 10,  mass rate of 5 mg/min of species
        SPECIE1, start time of  0, and stop time of 1000:
        10     MASS    SPECIE1    5    0     1000

        Multiple scenarios ¥ith single injections at all  non-zero demand nodes:
        NZD   MASS    SPECIE1    5     0     1000

        Multiple scenarios ¥ith t¥o injections, one at node  ID  10, and the other at all
        non-zero demand nodes  (NZD):
        10     NZD     MASS     SPECIE1   5     0     1000

        Multiple scenarios ¥ith three injections, at all  combinations of all nodes
        (if there are N nodes,  this ¥ould generate N3 scenarios for the ensemble):
        ALL     ALL     ALL  MASS   SPECIE1   5     0     1000

        Note: this language ¥ill generate scenarios ¥ith  repeat instances of injection node
        locations (e.g., ALL   ALL  ¥ould generate one scenario for node i and j, and another
        identical one for node  j and i). Also, it ¥ill generate multi-source scenarios ¥ith
        the same node repeated.  In  this latter case, the  source mass rate at the repeated
        node is the mass rate  specified in .
11.13   TSI File
      Description:  Specifies  how an ensemble of EPANET contamination scenario simulations  will be
      performed.

      Format:  ASCII

      Created  by: tevasim or WST user

      Used by: tevasim

      Details:  The TSI file is generated as output from the tevasim subcommand and would not normally
      be used, but it is available after the run for reviewing each scenario that was generated for the ensemble.
      The TSG  file is essentially a  short hand for generation of the more cumbersome TSI file.  Each record
      in the TSI input file specifies  the  unique attributes of  one contamination scenario.  The number of
      scenarios  does not have a restriction.
               
              
                                                                              ..
        :              EPANET ID identifying the ith node ¥here the contaminant  is
                                introduced.  [Character string]
        :           The EPANET source type index of  the ith contaminant source.
                                Each EPANET source type is associated ¥ith an integer index
                                (see EPANET toolkit documentation for reference).  [Integer]
        :         The EPANET species index of the  ith contaminant source.  [Integer]
        :           The strength of the ith contaminant source (see EPANET
                                documentation for description of sources).  This value
                                represents the product of contaminant flo¥ rate and
                                concentration.  [Float]
        :               The time, in seconds, measured from the start of simulation,
                                ¥hen the ith contaminant injection is started.  [Integer]
        :                The time, in seconds, measured from the start of simulation,
                                ¥hen the ith contaminant injection is stopped.  [Integer]
                                                    116

-------
        One water quality simulation ¥ill be run for each  scenario specified  in the threat
        simulation input (TSI) file.  For each such simulation, the source  associated ¥ith each
        contaminant location , 1=1,,N ¥ill be activated as the specified type source,
        and all other water quality sources disabled.  If a source node is  specified in the
        EPANET input file, the baseline source strength and source type options ¥ill be ignored,
11.14    Weight File

   • Description: Specifies the weights for contamination incidents.

   . Format: ASCII

   • Created by: WST user

   • Used by: sp

   • Details: Each line of this file has the format:
            <¥eight>
     Scenarios not explicitly enumerated  in this file are assumed to have a weight of zero unless the ID
     	default	is specified.  For example, to specify that all un-enumerated scenarios have a weight of
     1.0:
            .default  1.0
                                                   117

-------
Chapter  12

Executable  Files
This chapter describes the  different executable files that  can be used outside of WST to evaluate differ-
ent sensor network designs, and to reduce the size of the sensor placement problem by filtering impacts,
aggregating impacts or skeletonizing the water distribution network model.

12.1   evalsensor

The evalsensor executable is used to compute information about the impact of contamination incidents for
one or more sensor network designs. The evalsensor executable takes a sensor network design in a sensor
placement file (see File Formats Section 11.10 for more detail) and evaluates them using data from an impact
file or  a list of impact files  (see File Formats Section 11.4).  This executable measures the performance of
each sensor network designs with respect to the set of possible contamination scenarios.
Section 10.2.4 provides more information and an  example  application of this executable.

12.1.1   Usage

Usage with a specific sensor network design:
      evalsensor [options...]     [...]
Usage without a sensor network design:
      evalsensor [options...]  none  [...]
12.1.2   Options
       —all-locs-feasible
       A boolean flag to indicate that all locations are treated as feasible.

       —costs=
       The name of a file that contains the cost information for each node in  the network.
           For more details about the cost file, see File Formats Section \ref{formats_costFile}.

       —debug
       A boolean flag to add output information about each incident.

       —format=
       The type of output that the evaluation ¥ill generate:
                 cout -  Generates output that is easily read,  (default)
                 xls -   Generates output that is easily imported into a MS Excel spreadsheet.
                 xml -   Generates an XML-formated output to communicate
                                with the TEVA-SPOT GUI. (not currently supported)

       —gamma=
       The fraction of the tail distribution used to compute the VaR and TCE


                                                  118

-------
            performance measures,  (default  is  0.05)

            -h,  —help
            A boolean flag to  display usage information.

            —incident-weights 
            The  name of a file that  contains the weights  of  the different contamination incidents.
            For  more details about the weights file,  see  File Formats Section \ref{formats_¥eightFile}.

            —nodemap=
            The  name of a file that  contains the node map information for translating sensor placement
            indices to EPANET  node IDs.  For more details  about the nodemap file, see File Formats
            Section \ref{formats_nodeFile}.

            -r,  —responseTime=
            The  number of minutes that are  needed  to  respond to the detection of
            contamination.  As  the response  time increases, the impact increases
            because the contaminant  affects the network for  a greater length of
            time.

            —sc-probabilities=
            The  name of a file that  contains the probability of detection for each sensor category.
            For  more details about the imperfect sensor class file, see File Formats Section
            \ref{formats_sensorClass}.

            —scs=
            The  name of a file that  contains the sensor category  information for each possible
            sensor location in the network.  For more  details about the imperfect junction class file,
            see  File Formats Section \ref{formats_junctionClass}.

            —version
            A boolean flag to  display version  information.

            Note:  Options like reponseTime  can be  specified  with  the syntax
            —responseTime 10.0 or —responseTime=10.0.
12.1.3  Arguments
        
-------
12.2   filter	impacts

The f ilter_impacts script filters out the low-impact incidents from an impact file. The f ilter_impacts
command reads an impact file, filters out the low-impact incidents, rescales the impact values, and writes
out another impact file.

12.2.1   Usage
      filter_impacts [options...]  
12.2.2   Options
        —threshold=
        The contamination  incidents ¥ith undetected impacts  above a specified threshold should be kept.

        —percent=
        The percentage of  contamination incidents ¥ith the worst undetected impact that should be kept.

        —num= 
        The number of contamination incidents ¥ith the worst undetected impact that should be kept.

        —rescale
        Rescale the impacts using a loglO scale.

        —round
        Round input values to the nearest integer.
12.2.3   Arguments
        
        The output impact file.
                                                   120

-------
12.3   scenarioAggr

The scenarioAggr  executable  takes  an  impact file and produces  an aggregated  impact file.   The
scenarioAggr executable reads an impact file, finds similar incidents, combines them, and writes out another
impact file. The convention is to append the string aggr to the output.

The following files are generated during the execution of scenarioAggr, assuming that the input was named
network, impact:

   • aggrnetwork.impact - The new impact file.

   • aggrnetwork.impact.prob - The probabilities of the aggregated incidents.  These are non-uniform,  so
     any solver must recognize incident probabilities.

Not all of the solvers available in the sp command can perform optimization with aggregated impact files.
In particular, the heuristic GRASP solver  does not currently support aggregation because it does not use
contamination incident  probabilities.  The Lagrangian and PICO solvers support contamination incident
aggregation. However, initial results suggest that although the number of contamination incidents is reduced
significantly, the number of impacts might not be, and solvers might not run much faster.

12.3.1  Usage
     scenarioAggr —numEvents= 
12.3.2   Options
      —numEvent s=
      The number of contamination incidents that should be aggregated.
12.3.3   Arguments
       
-------
12.4   spotSkeleton

The skeletonizer,  spotSkeleton, reduces the size of a network model by grouping neighboring nodes based
on the topology of the network and pipe diameter  threshold. Nodes that are grouped together form a new
node, often refered to as a supernode.  The spotSkeleton executable requires an EPANET INP network file
and a pipe diameter threshold. The executable creates a skeletonized EPANET INP network file and map
file. The map file defines the nodes that belong to  each supernode.

The spotSkeleton executable includes branch trimming, series pipe merging and parallel pipe merging.
A pipe  diameter  threshold determines candidate pipes  for skeleton steps.  The spotSkeleton executable
maintains pipes and  nodes with hydraulic  controls as  it creates  the skeletonized network.  It performs
series and parallel pipe merges if both pipes are below  the  pipe diameter threshold, calculating hydraulic
equivalency for each merge based on the average pipe roughness of the joining pipes. For all  merge steps,
the larger diameter pipe is retained. For a series pipe merge,  demands (and associated demand patterns) are
redistributed to the nearest node.  Branch trimming removes deadend  pipes smaller than the pipe diameter
threshold and redistributes demands  (and associated demand patterns) to the remaining junction.  The
spotSkeleton executable repeats these steps until  no further pipes can be removed from the network. The
spotSkeleton executable creates an EPANET-compatible skeletonized network INP file and a map file that
contains the mapping of original network model nodes into skeletonized supernodes.
Under these skeletonization steps, there is a limit to how much a network can be reduced based on its topology,
e.g., number  of deadend pipes, or pipes in series and  parallel.  For  example,  sections  of the network with a
loop,  or  grid, structure will not reduce under these skeleton steps. Additionally, the number of hydraulic
controls influences skeletonization, as all pipes and nodes associated with these features are preserved.
Commercial skeletonization codes include Haestad Skelebrator, MWHSoft  H2OMAP, and MWHSoft In-
fo Water. To  validate the  spotSkeleton executable, its output was compared to the output of  MWHSoft
H2OMAP and  Infowater.  Input parameters were  chosen to match spotSkeleton options.  Pipe diameter
thresholds of 8  inches, 12  inches and 16 inches were tested using two large networks.  MWHSoft  and WST
skeletonizers  were compared using the Jaccard index. The Jaccard index measures similarity between two
sets by dividing the intersection of the two sets by the union of the two sets.  In this case, the intersection
is the number of pipes that are either both removed or not removed by the two skeletonizers, and the union
is the number of all pipes in  the  original network.  If the  two skeletonizers  define  the same supernodes,
the Jaccard index equals 1. Skeletonized networks  from the MWHSoft and WST skeletonizers  resulted in a
Jaccard index between 0.93 and 0.95. Thus, the spotSkeleton executable is believed to be a good  substitute
for  commercial  skeletonizers.

12.4.1   Usage
      spotSkeleton     
12.4.2   Arguments
       •Cinput  inp file>
       The input EPANET INP file to be skeletonized.

       
       The pipe diameter threshold that determines which pipes might be skeletonized.

       •Coutput inp file>
       The output EPANET INP file created after skeletonization.

       •Coutput map file>
       The output map file that contains the mapping of original network nodes to
       skeletonized supernodes.

-------
References
Adams,  B. M., Ebeida, M.  S., Eldred, M.  S., Jakeman,  J. D., Swiler,  L.  P., Bohnhoff,  W.  J., Dai-
  bey, K. R.,  Eddy, J.  P., Hu, K.  T., and  Vigil,  D.  M. (2013).  Dakota, a multilevel parallel object-
  oriented framework for design optimization, parameter estimation,  uncertainty quantifcation, and sen-
  sitivity analysis.  Technical Report SAND2010-2183,  Sandia National Laboratories.  Available at http:
  //dakota.sandia.gov/documentation.html.

Berry, J., Hart, W.  E., Phillips, C. E.,  Uber, J. G., and Watson, J.-P. (2006).  Sensor placement in mu-
  nicipal water networks with temporal integer programming models.  J.  Water Resources Planning  and
  Management, 132(4) :218-224.

Bureau of Labor Statistics  and U.S. Census Bureau (2005). American  time use survey user's guide 2003 to
  2004.  Technical report, Washington DC. Available at http://www.bls.gov/news.release/archives/
  atus_09142004.pdf.

Davis, M. and  Janke, R. (2008). Importance of exposure model in estimating impacts when a water distri-
  bution system is contaminated. J.  Water Resources Planning and Management, 134(5):449-456.

EPA, U. S. (2004).  Response protocol toolbox:  planning for and responding  to drinking water contami-
  nation threats and incidents.  Technical report, U.S. Environmental  Protection Agency, Office of Water,
  Office of Ground Water  and  Drinking Water, Washington, D.C.  Available at  http://water.epa.gov/
  infrastructure/watersecurity/upload/2004_ll_24_rptb_response_guidelines.pdf.

EPA, U. S. (20ff).  Teva-spot  toolkit and user's manual. Technical  report,  U.S. Environmental Protec-
  tion Agency. Available at http://cfpub.epa.gov/si/si_public_file_download.cfm?p_download_id=
  514412.

EPA, U. S. (20f3a). Epanet-rtx, real-time extension for the epanet toolkit, at  open water analytics. Technical
  report, U.S.  Environmental Protection Agency.  Available at http://openwateranalytics.github.io/
  epanet-rtx/index.html.

EPA, U. S. (20f3b).  Water security initiative.  Technical report, U.S. Environmental Protection Agency.
  Available at  http://water.epa.gov/infrastructure/watersecurity/lawregs/initiative.cfm.

Fourer,  R., Gay, D. M., and Kernighan, B. W.  (2002).  AMPL: A Modeling  Language for Mathematical
  Programming. Brooks/Cole, Pacific Grove, CA, 2nd edition.

Geib, C., Taxon, T., and Hatchett, S. (2011). Epanet results database (erd), user's guide, version 1.00.00.
  Technical report.

Hart,  D. B. and McKenna, S. A.  (2012).   Canary  user's manual,  version 4.3.2.   Technical Report
  EPA/600/R/08/040B, Washington, D,C.: U.S. Environmental Protection Agency.  Available at http:
  //cfpub.epa.gov/si/si_public_file_download.cfm?p_download_id=513254.

Hart, W. E., Laird,  C., Watson, J., and Woodruff, D. (2012). Pyomo  - Optimization Modeling in Python.
  Springer, 1st edition.

                                              123

-------
Hatchett,  S., Uber, J., Boccelli,  D., Haxton,  T., Janke, R., Kramer, A., Matracia,  A., and  Panguluri, S.
  (2011).  Real-time distribution system modeling:  development,  application, and  insights.  In In Proc.
  Eleventh International Conference on Computing and Control for the Water Industry, Sept.  2011, Exeter,
  UK.

Janke, R.,  Morley, K., Uber, J.,  and Haxton, T. (2011).  Real-time modeling for water  distribution sys-
  tem operation:  Integrating security developed technologies with normal operations.  In  In Proc. AWWA
  Distribution Systems Symposium and Water Security Conference, Sept. 2011, Nashville, TN.

Klise, K., Phillips, C., and Janke, R. (2013a).  Two-tiered sensor placement using skeletonized water distri-
  bution network models. Journal of Infrastructure Systems, (10.1061/(ASCE)IS.1943-555X.0000156).

Klise, K., Siirola,  J., Hart, D., Hart, W., Phillips, C., Haxton, T., Murray, R., Janke, R., Taxon, T., Laird,
  C., Seth, A., Hackebeil, G., McGee, S., and Mann, A. (2013b). Water security toolkit user manual version
  1.1. Technical Report SAND2013-8346 P, Sandia National Laboratories.

Mann, A., Hackebeil,  G.,  and Laird, C. (2012a).   Explicit water  quality model  generation  and rapid
  multi-scenario simulation.  J.  Water Resources Planning and Management,  (10.1061/(ASCE)WR.1943-
  5452.0000278).

Mann, A., McKenna, S., Hart, W., and Laird, C. (2012b). Real-time inversion in large-scale water networks
  using discrete measurements.  Computers & Chemical Engineering, 37:143-151.

Mirchandani, P. and Francis, R., editors (1990). Discrete Location Theory. John Wiley and Sons, Toronto,
  Canada.

Murray, R., Adcock,  N., Rice, G., Uber,  J., and Hatchett, S.  (2011). Predicting pathogen survival when
  introduced into a water distribution system with growth medium.  In Proc.  Water Quality Technology
  Conference, Nov. 2011, Phoenix, AZ.

Murray, R., Haxton, T.,  Janke,  R., Hart, W.  E., Berry, J.,  and Phillips, C.  (2010).  Sensor network design
  for drinking water contamination warning systems: A compendium of research results  and case studies
  using the teva-spot software. Technical Report EPA/600/R-09/141, Cincinnati, OH Office of Research and
  Development, National Homeland Security  Research Center, Water Infrastructure Protection.  Available
  at http://cfpub.epa.gov/si/si_public_file_download.cfm?p_download_id=498251.

Murray, R., Uber, J., and Janke, R. (2006). Model for estimating acute health impacts from consumption of
  contaminated drinking water. J. Water Resources Planning and Management: Special Issue on Drinking
  Water Distribution Systems Security, 132(4):293-299.

Rossman, L. A. (2000). Epanet 2 users manual. Technical report, Environmental Protection Agency. Available
  at http://nepis.epa.gov/Adobe/PDF/P1007WWU.pdf.

Shang, F., Uber, J., and Rossman, L. (2011). Epanet mutli-species extension user's manual. Technical Report
  EPA/600/S-07/021, USEPA. Available at http://cf pub. epa.gov/si/si_public_f ile_download. cfm?
  p_download_id=500759.

U.S. Geological Survey (2004). Estimated  use of water in the united states in 2000.  Technical report, U.S.
  Geological Survey.  Available at  http://pubs.usgs.gov/circ/2004/circl268/pdf/circularl268.pdf.
                                                124

-------
United States
Environmental Protection
Agency
PRESORTED STANDARD
 POSTAGE & FEES PAID
         EPA
   PERMIT NO. G-35
Office of Research and Development (8101 R)
Washington, DC 20460

Official Business
Penalty for Private Use
$300

-------