EPA/600/R-13/201 | November 2013 | www.epa.gov/ord
United States
Environmental Protection
Agency
  Sandia National Laboratories
           CANARY Training Tutorials
                         CANARY
Office of Research and Development
National Homeland Security Research Center

-------
Acknowledgements

The National Homeland Security Research Center (NHSRC) would like to acknowledge the
following organizations and individuals for their support in the development of the CANARY
Training Tutorials:

U.S. EPA Office of Research and Development, NHSRC
      Terra Haxton
      Regan Murray
      Jennifer Hagar (ORISE Fellow)

U.S. EPA Office of Water, Water Security Division
      Steve Allgeier
      Katie Umberg

Sandia National Laboratories (IADW-89-92291401)
      Samantha Cafferky
      David Hart
      Sean Hollister
      Sean McKenna
Questions concerning this document or its application should be addressed to:

      Terra Haxton
      USEPA/NHSRC (NG 16)
      26 W. Martin Luther King Drive
      Cincinnati, OH 45268
      (513)569-7810
      haxton.terra@epa.gov

-------
Disclaimer

The U.S. Environmental Protection Agency (EPA) through its Office of Research and
Development funded and collaborated in the research described here under Inter-Agency
Agreement DW-89-92291401 with the Department of Energy's Sandia National Laboratories.
This document has been subjected to the Agency's review and has been approved for
publication. EPA does not endorse the purchase or sale of any commercial products or services.

This report was prepared as an account of work sponsored by an agency of the United States
Government. Accordingly, the United States Government retains a nonexclusive, royalty free
license to publish or reproduce the published form of this contribution, or allow others to do so
for United States Government purposes.

Sandia Corporation, the United States Government, any agency thereof, and their employees do
not make any warranty, express or implied, or assume any legal liability or responsibility for the
accuracy, completeness, or usefulness of any information, apparatus, product, or process
disclosed,  or represents that its use would not infringe privately-owned rights. Reference herein
to any specific commercial product, process, or service by trade name, trademark, manufacturer,
or otherwise does not necessarily constitute or imply its endorsement, recommendation, or
favoring by Sandia Corporation, the United States Government, or any agency thereof. The
views and opinions expressed herein do not necessarily state or reflect those of Sandia
Corporation, the United States Government or any agency thereof.

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia
Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S.
Department of Energy's National Nuclear Security Administration under contract DE-AC04-
94AL85000.

-------
Table  of Contents

Acknowledgements	2
Table of Contents	4
List of Figures	5
List of Tables	8
List of Acronyms	9
1. Introduction	10
2. Configuration File Tutorial	13
    2.1 Editing the Configuration File	13
    2.2 Valid Range	16
    2.3 Evaluation Type	19
    2.4 History Window	25
    2.5 Outlier Threshold	28
    2.6 Event Threshold	30
    2.7 Multiple Locations	31
3. Optimizing CANARY Configurations Tutorial	36
    3.1 Multiple Algorithms	36
    3.2 Consensus Algorithm	50
    3.3 Binomial Event Discriminator	57
4. Database Driven Input/Output Tutorial	60
    4.1 Obtaining JDBC Driver	60
    4.2 Modifying Configuration File to Use Databases	62
5. Composite Signals Tutorials	69
    5.1 Suppressing an Alarm After a Calibration Event	69
    5.2 Integrating Pump Flow Operational Data into Event Detection	77
    5.3 Integrating Tank Level Operational Information Into Event Detection	88
References	97
Appendix A: Frequently Asked Questions (FAQs)	98
Appendix B: Binomial Distribution Function Exercise	101
Appendix C: Configuration File Quick Reference	107
Appendix D: File Types	116
Glossary	117

-------
List of Figures

Figure 1: YML file example viewed inUltraEdit	14
Figure 2: YML file example viewed in WordPad	15
Figure 3: YML file example viewed inNotePad++	16
Figure 4: Signals configuration section with default valid range parameters	17
Figure 5: Signal plots produced using the default valid range parameters	18
Figure 6: Signals configuration section with the changed valid range parameters	19
Figure 7: Signal plots produced using the valid range parameters in Figure 6	19
Figure 8: Signals configuration section with the evaluation type parameter as OP	20
Figure 9: Operational signal plot	20
Figure 10: Signals configuration section with the evaluation type parameter as WQ	21
Figure 11: Water quality signal plot	21
Figure 12: Signals configuration section with the evaluation type parameter as CAL	21
Figure 13: Output graph without calibration signal	22
Figure 14: Signals configuration section with the evaluation type parameter switched to OP	23
Figure 15: Output graph with the evaluation type parameter switched to OP	24
Figure 16: Algorithms configuration section with a history window parameter of 144	25
Figure 17: Output graph produced when the history window parameter is 144	26
Figure 18: Algorithms configuration section with a history window parameter of 36	27
Figure 19: Output graph produced when the history window parameter is 36	28
Figure 20: Algorithms configuration section with an outlier threshold parameter of 0.8	29
Figure 21: Event probability plot produced when the outlier threshold parameter is 0.8	29
Figure 22: Algorithms configuration section with an outlier threshold parameter of 0.99	30
Figure 23: Event probability plot produced when the outlier threshold parameter is 0.99	30
Figure 24: Algorithms configuration section with an event threshold parameter of 0.85	30
Figure 25: Event probability plot produced when the event threshold parameter is 0.85	31
Figure 26: Algorithms configuration section with an event threshold parameter of 0.99	31
Figure 27: Event probability plot produced when the event threshold parameter is 0.99	31
Figure 28: Monitoring stations configuration section	33
Figure 29: Output EDSD files	33
Figure 30: Station A output graph	34
Figure 31: Station B output graph	35
Figure 32: Algorithms configuration section with a history window parameter of 36	38
Figure 33: Monitoring stations section with multiple algorithms enabled	39
Figure 34: Output graph when the history window parameter is 36	40
Figure 3 5: Algorithms configuration section with a history window parameter of 72	41
Figure 36: Output graph when the history window parameter is 72	42
Figure 37: Algorithms configuration section with a history window parameter of 108	44
Figure 38: Output graph when the history window parameter is 108	45
Figure 39: Algorithms configuration section with a history window parameter of 144	46
Figure 40: Output graph when the history window parameter is 144	47
Figure 41: Algorithms configuration section with a history window parameter of 180	48
Figure 42: Output graph when the history window parameter is 180	49
Figure 43: CANARY total events graph	50
Figure 44: Signals configuration section	52
CANARY Training Tutorials                                                       Page 5

-------
Figure 45: Algorithms configuration section with the consensus algorithm defined	52
Figure 46: Monitoring configuration section with algorithm B2 activated	53
Figure 47: Output graph produced using the set point algorithm, SPPE	54
Figure 48: Algorithms configuration section with three algorithms defined	55
Figure 49: Monitoring stations configuration section with the consensus algorithm enabled	55
Figure 50: Output graph produced when using the consensus algorithm	56
Figure 51: Algorithms configuration section with a BED window parameter of 20	57
Figure 52: Algorithms configuration section with a BED window parameter of 6	58
Figure 53: Output graph produced using a BED window parameter of 20	58
Figure 54: Output graph produced using a BED window parameter of 6	59
Figure 55: JDBC driver zip file location	60
Figure 56: Unzipping JDBC driver file	61
Figure 57: Unzipped file location	61
Figure 58: Copying JAR files	62
Figure 59: New JAR file location	62
Figure 60: Data sources configuration section with the type parameter set as database	63
Figure 61: Data  sources configuration section with the location parameter listing URL of
database	64
Figure 62: Data sources configuration section with the timestep options parameters	65
Figure 63: Data sources configuration section with the database options parameters	66
Figure 64: Data sources configuration section with database options login parameters	67
Figure 65: Data sources configuration section with  the optional database options parameter,
input format	68
Figure 66: Output graph produced for January 3, 2011	70
Figure 67: Output plot produced for January 6, 2011	71
Figure 68: Signals configuration section for original calibration signal, RAW_CAL	71
Figure 69: Signals configuration section for composite calibration signal	72
Figure 70: Algorithms and monitoring stations configuration sections	73
Figure 71: Output graph produced with composite signal	74
Figure 72: Signals configuration section for CAL_TJJVIE_OUT_CTFD_A	75
Figure 73: Signals configuration section for FINAL_CAL_CTFD	76
Figure 74: Output graph produced with alarm suppression composite signal enabled	77
Figure 75: Monitoring stations configuration section	78
Figure 76: Output graph produced of Station D data with a calibration period and three alarms. 79
Figure 77: Monitoring stations configuration section with three pump signals for Station D	80
Figure 78: Output graph produced with the additional three operational signals	81
Figure 79: Signals configuration section with added composite signals	82
Figure 80: Output graph produced with three operation signals and event probability	83
Figure 81: Signals configuration section with changed calibration signal	83
Figure 82: Signals configuration section with the composite calibration signal	84
Figure 83: Output plots produced using composite calibration signal	85
Figure 84: Signals configuration section with the modified composite signal	86
Figure 85: Output plots produced using modified composite signal	86
Figure 86: Output plots produced using modified composite signal for January 25th only	87
Figure 87: Signals configuration section with final composite signal	87
Figure 88: Output plots produced using final composite signal for entire week	88
Figure 89: Output plots produced using final composite signal for January 25th	88
CANARY Training Tutorials                                                       Page 6

-------
Figure 90: Monitoring stations configuration section	89
Figure 91: Output graph produced using initial configuration file	90
Figure 92: Signals configuration section defining composite signal, REL_TANK_LVL	91
Figure 93: Signals configuration section defining composite signal, TANK_CL_CHANGE	92
Figure 94: Monitoring stations configuration section composite signal enabled	93
Figure 95: Output graph produced when using composite signal	94
Figure 96: Signals configuration section with the modified TANK_CL_CHANGE signal	95
Figure 97: Output graph produced when using modified signal	96
Figure 98: Example of comments within a configuration file	98
Figure 99: NFAILURES column using a NTRIALS of 20	102
Figure 100: Binomial distribution function using a NTRIALS of 20 and a PFAIL of 0.5	103
Figure 101: Probability of event values for a NTRIALS of 20	104
Figure 102: Probability of event graph using a NTRIALS of 20	105
Figure 103: Probability of event values for a NTRIALS of 6	105
Figure 104: Probability of event graph using aNTRIALS of 6	106
Figure 105: Example of canary section using BATCH mode	108
Figure 106: Example of canary section using a database connection	108
Figure 107: Example of timing options section	109
Figure 108: Example of data sources section using a CSV file	Ill
Figure 109: Example of signals section using a chlorine signal	113
Figure 110: Example of algorithms section using the LPCF algorithm	114
Figure 111: Example of monitoring stations section	115
CANARY Training Tutorials                                                    Page 7

-------
List of Tables
Table 1: Key Parts of CANARY Training Tutorials	10
Table 2: Folder Directories for CANARY Training Tutorials	11
Table 3: CANARY Configuration File Sections	107
Table 4: Input Parameters for canary Section of the CANARY Configuration File	107
Table 5: Input Parameters for timing options Section of the CANARY Configuration File	108
Table 6: Input Parameters for data sources Section of the CANARY Configuration File	109
Table 7: Input Parameters for signals Section of the CANARY Configuration File	112
Table 8: Input Parameters for algorithms Section of the CANARY Configuration File	113
Table 9: Input Parameters for monitoring stations Section of the CANARY Configuration File
	114
CANARY Training Tutorials                                                      Page 8

-------
List of Acronyms

ALM        Alarm signal
CAL         Calibration signal
BED         Binomial Event Discriminator
CANARY    Name of the CANARY software, not an acronym!
CAVE       Consensus AVErage, one of the consensus algorithms contained in CANARY
CL2         Residual or free chlorine concentration in water (water quality parameter)
CMAX       Consensus MAXimum, one of the consensus algorithms contained in CANARY
COND       Specific conductivity of water (water quality parameter)
CSV         Comma-Separated Value data file
EDS         Event Detection System
EDSC        File extension for CANARY pattern matching cluster file (e.g., cluster_file.edsc)
EDSD        File extension for CANARY output file (e.g., output_file.edsd)
EDSY        File extension for CANARY configuration file (e.g., config_file.edsy)
EPA         Environmental Protection Agency
FAQ         Frequently Asked Questions
LPCF        Linear Prediction Coefficient Filter, one of the algorithms contained in CANARY
MVNN       Multivariate Nearest Neighbor, one of the algorithms contained in CANARY
ORP         Oxidation reduction potential
PH          pH is a measure of the acidity of water (water quality parameter)
SCADA      Supervisory Control and Data Acquisition
SPPE        Set point proximity exponential distribution algorithm contained in CANARY
TEMP        Temperature of water (water quality parameter)
TOC         Total organic carbon concentration in water (water quality parameter)
TURB        Turbidity in water (water quality parameter)
YML        File extension for CANARY configuration file (e.g., config_file.yml)
CANARY Training Tutorials
Page 9

-------
1.  Introduction
The CANARY event detection software was developed to enhance the detection of contaminants
in drinking water. Many drinking water utilities collect real-time data from sensors located
throughout the water distribution network. The sensors measure water quality parameters such as
pH, residual chlorine, total organic carbon, and specific conductance. CANARY analyzes these
data rapidly to identify anomalous or abnormal periods of water quality that might indicate
contamination incidents. CANARY has operated in several water utilities around the world since
it was released publicly in 2009. The software has also been used as a research platform for
testing and developing new capabilities for event detection system (EDS) algorithms and sensor
performance. CANARY Training Tutorials provide practical examples of the full range of
CANARY'S capabilities. Some basic functions are used each time the software is used, and
others are used mainly in specific situations. Each example is documented with a step-by-step
approach using text and screen-capture figures illustrating the changes made to the configuration
file parameters and discussing the impact of those changes on the CANARY output. Table 1 lists
all of the key sections of the CANARY Training Tutorials.

Table 1: Key Parts of CANARY Training Tutorials
Section 2
Section 3
Section 4
Section 5
Appendix A
Appendix B
Appendix C
Appendix D
Glossary
Specifying parameters for the CANARY configuration file
Optimizing CANARY parameter values for maximum performance
sensor monitoring stations
at multi-
Running CANARY in real-time with input from a database
Creating composite signals in CANARY based on combinations of sensor
data
Frequently Asked Questions (FAQs)
Binomial Distribution Function Exercise
Configuration File Quick Reference
File Types
Definitions of commonly used terms in CANARY
All of the input and output files used in CANARY Training Tutorials are available to download
from the same location as this document and the CANARY Trac site
(http s: //software. sandi a. gov/trac/canary/downl oader/downl oad/category/6) .The
CANARY_Training_Tutorials.zip file contains two folders, Tutorial_Files and Tutorial_Results.
The Tutorial_Files folder contains the input files for the examples covered in the CANARY
Training Tutorials and can be used to replicate the examples. The Tutorial_Results folder
contains the input and output for these examples and can be used to verify the example results.
This folder is Read-only. The main directories of these folders are listed in Table 2.
CANARY Training Tutorials
Page 10

-------
Table 2: Folder Directories for CANARY Training Tutorials
Main Directory in
Tutorial Files
Configuration Tutorial
Optimizing Tutorials
Database_Tutorial
Composite_Tutorials
Subdirectories in
Tutorial Files
Valid_Range\Default
Valid Range\Change
Evaluation_Type\Temp_wq
Evaluation_Type\Temp_op
Evaluation Type\Cal cal
Evaluation Type\Cal op
History WindowMnitial
History Window\Change
Outlier ThresholdMnitial
Outlier Threshold\Change
Event ThresholdMnitial
Event Threshold\Change
Multiple_Locations
Multiple_Algorithms\HW_3 6
Multiple AlgorithmsMTW 72
Multiple_Algorithms\HW_l 08
Multiple_Algorithms\HW_l 44
Multiple AlgorithmsMTW 108
Consensus AlgorithmMnitial
Consensus AlgorithmUoin
BED\Window 6
BED\Window_20

Composite_Signals_l\Initial
Composite_Signals_l\Flip_Cal
Composite Signals l\Suppress
Composite Signals 2\Initial
Composite_Signals_2\Step 1
Composite_Signals_2\Step 2
Composite Signals 2\Step 3
Composite Signals 2\Step 4
Composite Signals 2\Step 5
Composite_Signals_3\Initial
Composite Signals 3\Step 1
Composite Signals 3\Step 2
Corresponding Training
Tutorials Section
2.2 Valid Range
2.3 Evaluation Type
2.4 History Window
2.5 Outlier Threshold
2.6 Event Threshold
2.7 Multiple Locations
3.1 Multiple Algorithms
3.2 Consensus Algorithm
3.3 Binomial Event
Discriminator
4.1 Obtaining JDBC
Driver, 4.2 Modifying
Configuration File to Use
Databases
5.1 Suppressing an Alarm
After a Calibration Event
5.2 Integrating Pump
Flow Operational Data
into Event Detection
5.3 Integrating Tank
Level Operational
Information Into Event
Detection
The user is encouraged to examine other existing CANARY software documentation:
CANARY Training Tutorials
Page 11

-------
   •   CANARY Quick Start Guide describes how to install and run CANARY (U.S. EPA 2012).

   •   CANARY User's Manual provides a concise description of the configuration file and the
       parameters used in the event detection algorithms (Hart and McKenna 2012).

   •   Water Quality Event Detection Systems for Drinking Water Contamination Warning
       Systems (Murray et al. 2010) defines the motivation for the development of CANARY
       and the theory underlying the software's mathematical and statistical algorithms.

  Presentations from CANARY webinars are available in Adobe PDF format at this site:
  https://software.sandia.gov/trac/canary/downloader/download/category/6. Users will be
  required to fill out a short registration form.
CANARY Training Tutorials                                                     Page 12

-------
2. Configuration File Tutorial
The configuration file is one of the required CANARY input files. It specifies all the parameter
values needed to run CANARY for a particular analysis. In addition, the software requires a data
source: either a comma-separated value (CSV) file that contains the sensor data, or a link to a
database containing the data (for many water utilities, part of a Supervisory Control and Data
Acquisition (SCADA) system).

The parameters specified in the configuration file communicate specific details about the type of
data source to be used, the length of the analysis and other timing options, the water quality,
operational and alarm/calibration signals contained in the data sources, and the algorithms and
associated parameter values used to analyze the data.

This section provides a tutorial designed to increase familiarity with the CANARY configuration
file and the parameters contained within it. In particular, these tutorials examine several of the
specific parameters that need to be defined for a CANARY run in the signals, algorithms, or
monitoring stations sections of the configuration file.

   •   In the signals section of the configuration file, the tutorials examine the valid range of the
       water quality (WQ), operational (OP), and calibration (CAL) signals and the type of an
       input signal.
   •   For the algorithms section of the configuration file, the tutorial examines the size of the
       history window, the outlier threshold, and the event threshold. The last tutorial in this
       section explains how to add multiple monitoring stations to a single configuration file
       under the monitoring stations section.

The associated files are found in the "Tutorial_Files\Configuration_Tutorial\" directory.

Additional information on the configuration file and these parameters is provided in Appendix
D: Configuration File Quick Reference of this document and Section 5 of the CANARY User's
Manual (Hart and McKenna 2012).

2.1 Editing the Configuration File
Configuration files are written in the YML markup language. YML files are commonly used for
software configuration because their format allows for a large amount  of detailed information to
be read and edited. YML files are  formatted with a single parameter value defined on each line,
specified by the parameter name and the parameter value. Groups of parameters that go together
to jointly define functionality within CANARY are indicated by indented lines. Descriptive
comments are added following a number sign (#) and are ignored by CANARY. The
configuration files end with either an ".yml" (YML) or ".edsy" (EDSY) extension.1 The EDSY
extension hooks the file into the right-click abilities of the Windows ® operating system;
otherwise it is equivalent to the YML extension.
 This is a change from previous CANARY configuration files that were written in the XML markup language and ended with an
   ".edsx" (EDSX) extension. However, CANARY is backward compatible and can read EDSXfiles and translate into EDSY files.
CANARY Training Tutorials                                                       Page 13

-------
Any text editor can be used to view and edit the CANARY configuration file. An example
configuration file is shown using three different editors: UltraEdit, WordPad, and Notepad++ in
Figure 1, Figure 2,  and Figure 3,  respectively. The formatting of the configuration files,
including the indentations, is easy to see in all three editors. The Notepad++ editor recognizes
the YML format and automatically uses different colors to highlight different components of the
configuration file, for example, comments are shown in green.

The file shown in each figure is stationB_VR_CL.yml and is found in the directory
"Tutorial_Files\Configuration_Tutorial\Valid_Range." As shown, the file specifies that this
CANARY run is in batch mode using historical data contained in a CSV file. The signals to be
analyzed by CANARY include residual chlorine (CL2) and conductivity (COND) data collected
every 20 minutes from 02/21/2006 - 04/30/2006.

   	 B CANARY Config File- Valid Range Exercise
   canary:
     run mode:  BATCH
     control type: INTERNAL
     control messenger: null
     driver files: null

   # Enter the time step options below
   timing options:
     dynamic start-stop:  no
     date-time format: mm/dd/yyyy HH:HH:SS
     date-time start:  02/21/2006 00:00:00
     date-time stop:   04/30/2006 23:40:00
     data interval: 00:20:00
     message interval: 00:00:01

   » Enter the list of data sources below
   data sourc e s:
   - id: stationb  in
     type      :  csv
     location   :  Tutorial Station B.csv
     enabled    :  yes
     timestep options:
      field: "TIME STEP"
   # Enter the list of SCADA/composite signals/parameters below
   signals:
   - id: TEST_CL
     SCADA tag: B_CL2_VAL
     evaluation type: wq
     parameter type:  CL2
     ignore changes:  none
     data options: # DATA
      precision: 0.0035
      units: !Hg/L'
      valid range: [-.inf,  .inf]
      set points: [-.inf, ,inf]

   - id: TEST_COND
     SCADA tag: B_COND_VAL
     evaluation type: wq
     parameter type:  COND
     ignore changes:  none
     data options: # DATA
 •i"    precision: 1


Figure 1: YML file example viewed in UltraEdit.
CANARY Training Tutorials                                                            Page 14

-------
  	 # CANARY  Config File- Valid Range Exercise

  canary:
    run mode: BATCH
    control  type:  INTERNAL
    control  messenger: null
    driver files:  null

  $ Enter the time step options below
  timing options:
    dynamic  start-stop: no
    date-time format: ram/dd/yyyy HH:HH:SS
    date-time start:  02/21/2006 00:00:00
    date-time stop;   04/30/2006 23:40:00
    data interval: 00:20:00
    message  interval: 00:00:01

  # Enter the list of data sources below
  data sources:
  - id: stat ionfo__in
    type       : csv
    location   : Tutorial_3tation_B.csv
    enabled     : yes
    timestep options:
      field: "TIKE STEP"
  # Enter  the  list of SCAD A/composite signals/parameters below
  signals:
  - id:  TEST_CL
    SCABA  tag: B__CL2__VAL
    evaluation type: wq
    parameter  type: CL2
    ignore  changes: none
    data options: # DATA
      precision: 0.0035
      units:  f Hg/L'
      valid range: [-.inf,  .inf]
      set  points:  [-.inf, .inf]

  - id:  TEST__COND
    SCADA  tag: B_COND_VAL
    evaluation type: wq
    parameter  type: COMD
    ignore  changes: none
    data options: # DATA
      precision:  1
      units:  ' l\mu}S/etn'
      valid range: {-.inf,  .inf]
      set  points:  [-.inf, .inf]
Figure 2: YML file example viewed in WordPad.
CANARY Training Tutorials                                                                   Page 15

-------
      canary:
        run mode:  BATCH
        control type: INTERNAL
        control messenger: null
        driver files: null
      timing options:
        dynamic start-stop: off
        date-time format:  mm/dd/yyyy HH:MM:SS
        date-time start:  02/21/2006 00:00:00
        date-time stop:   04/30/2006 23:40:00
        data interval: 00:20:00
        message interval:  00:00:01
      dat a sources:
      - id:  stationb__in
        type      : csv
        location   : Tuto rial_Stat.ion_
        enabled    : yes
        i imestep options:
          field: "TIME STEP"
      signals:
      - id:  TEST_CL
        SCADA tag: B_CL2_VAL
evaluat ion type:
parameter t ype :
ignore changes :
dat a options :
precision:
tin it s : * Mcf/ L '
valid range : [
set point s : [ -
wq
CL2
none



-. inf,
.inf,






.inf]
.inf]
Figure 3: YML file example viewed in NotePad++.
2.2 Valid Range
This tutorial examines the valid range parameter which defines a range of acceptable values for
each input data signal in the configuration file. This parameter also provides the minimum and
maximum bounds on the y-axis of CANARY'S graphical output. However, if the data is within a
smaller range, the graphical output adjusts to this smaller range on the y-axis (i.e., the graph
zooms in on the data) in order examine the smaller fluctuations in the data. Values outside the
valid range are ignored  by CANARY. This parameter can be used to indicate when data should
be ignored as a result of sensor or data transmission malfunctions. Thus, the valid range
parameter value should  be selected with care.

If the valid range parameter is omitted or left blank, then the values are set to the default,  [-.inf,
.inf], where .inf stands for infinity. This default range does not restrict signal values and so all
values will be read and analyzed by CANARY. However, the default can allow for unrealistic
values to be included in the analysis. For example, the pH concentration is always above or equal
to zero and less than 14, then, the valid range might be defined as [0, 14].
CANARY Training Tutorials                                                        Page 16

-------
Figure 4 shows part of the stationB_VR_CL.yml configuration file found in the
"Tutorial_Files\Configuration_Tutorial\Valid_Range\Default" directory. For the CL2, COND
and PH signals defined, the valid range parameter is set to the default value of [-.inf, .inf]. When
this configuration file is run in CANARY, the y-axis of the graphical output for each signal is
automatically scaled to the minimum and maximum value of that signal within the time period of
the plot, as shown in Figure 5 (the blue dots/lines represent the water quality data that
contributed to the identification of an event at that time). In this  case, chlorine varies between 0
and 3 mg/L, while pH varies from 7.2 to 8. This automatic scaling might work for many signals,
but sometimes the scale of the plot might be too broad to provide details of the signal
fluctuations. The valid range parameter can be used to adjust the scale of the plots. However,
reduction in the range might cause some data to be ignored in the CANARY analysis.

 signals:
 -  id: TEST_CL
   SCADA tag;  B_CL2_VAL
   evaluation type: wq
   parameter type: CL2
   ignore changes: none
   data options:
    precision:
    unit s :  T Mg  LT
    valid  range:  [-.inf, .inf]
    set points:  [-.inf,  .inf]

 -  id: TEST_COND
   SCADft tag:  B_COND_VAL
   evaluation type: wq
   parameter type: COND
   ignore changes: none
   data options:    <
    precision:  ;
    units:  '{\mu)S/cm1
    valid  range:  [-.inf, .inf]
    set points:  [-.inf,  .inf]

 -  id: TEST^PH
   SCADA tag:  B_PH_VAL
   evaluation type: wq
   parameter type: PH
   ignore changes: none
   data options:
    precision:  ;. 01
    units:  T pHT
    valid  range:  [-.inf, .inf]
    set points:  [-.inf,  .inf]

Figure 4: Signals configuration section with default valid range parameters.
CANARY Training Tutorials                                                        Page 17

-------
                         StationB 2006-03-28 00:00:00 to 2006-04-03 23:40:00
      BCL2 3
    CL2(Mg/L) 2
           1
          28-Mar      29-Mar      30-Mar      31-Mar      01-Apr       02-Apr       03-Apr       04-Apr
      BPH  8
    PH (pH) 7|
          1A
          7.2
28-Mar      29-Mar      30-Mar      31-Mar      01-Apr      02-Apr
                                                                      03-Apr
04-Apr
Figure 5: Signal plots produced using the default valid range parameters.
Figure 6 shows part of the signals section of the stationB_VR_CL_.yml configuration file found
in the "Tutorial_Files\Configuration_Tutorial\Valid_Range\Change" directory. The valid range
parameter for the CL2 signal was changed from [-.inf, .inf] to [0.5, .inf], and from [-.inf., .inf] to
[7.0, 7.5] for the PH signal. After running the changed YML file in CANARY, the results are
shown in Figure 7. When the data exceeds the valid range, the plot is marked with a pink triangle
on the upper or lower boundary according to which value was exceeded. Multiple pink triangles
are shown in Figure 7; in order to capture all of the data in the plot, the valid range parameter
needs to be broadened. It is useful to recall that the valid range parameter value needs to be
selected with care, because it determines which data is analyzed by CANARY and which data is
included in the graphical output.

Another feature of the CANARY graphing output is shown in Figure 5 and Figure 7; when all
signals drop to a value of zero, the  station is considered offline, the data is not analyzed, and
these values are not show in the plot. In Figure 5 and Figure 7, the one hour of missing data early
on April 2n  is evident by the small gaps in the plots.
CANARY Training Tutorials
  Page 18

-------
 signals:
I-  id:  TEST_CL
   SCADA tag:  B_CL2_VAL
   evaluation  type:  wq
   parameter type:  CL2
   ignore changes:  none
t   data options:  #  DATA
     prec is ion:  0.003
     units:  'Mg/L'
     valid range:  [0.5,  .inf]
     set points:  [-.inf,  .inf]

I-  id:  TEST_COND
   SCADA tag:  B_COHD_VAL
   evaluation  type:  wq
   parameter type:  COND
   ignore changes:  none
[   dat a opt ions:  #  DATA
     precision:
     units:  '{\mu)S/cm'
     valid range:  [-.inf,  .inf]
     set points:  [-.inf,  .inf]

I-  id:  TEST_PH
   SCADA tag:  B_PH_VAL
   evaluation  type:  wq
   parameter type:  EH
   ignore changes:  none
I   data options:  #  DATA
     prec i s ion:  J. 01
     units:  TpH'
     valid range:  [7.0,  7.5]
     set points:  [-.inf,  .inf]
Figure 6: Signals configuration section with the changed valid range parameters.
                          StationB 2006-03-28 00:00:00 to 2006-04-03 23:40:00
       BCL2
    CL2 (Mg/L)
      BPH
    PH (pH)
                                     ~T
                                                                       03-Apr
04-Apr
          28-Mar      29-Mar      30-Mar      31-Mar
          7.2 -
          28-Mar      29-Mar      30-Mar      31-Mar       01-Apr       02-Apr       03-Apr       04-Apr
Figure 7: Signal plots produced using the valid range parameters in Figure 6.

2.3 Evaluation Type
This tutorial examines the evaluation type parameter which classifies input signals into one of
four different types: water quality (WQ), operational (OP), alarm (ALM) or calibration (CAL)
signals. Any signal must be classified into one of these types, but typically, water quality signals
(e.g., pH, residual chlorine,  total organic carbon, specific conductance) are those that directly
CANARY Training Tutorials
   Page 19

-------
measure an aspect of the quality of the water, although pressure, valve status, or other
measurable quantity could be labeled as a WQ type signal and CANARY could use the signal to
see if an event has occurred. Operational signals typically include water temperature, water
levels in tanks, pressures, flow rates, valve settings, or pump operations. Alarm signals generally
indicate a performance or maintenance issue (e.g., the sensor is  out of reagents or a membrane
needs cleaning). A calibration signal denotes a period in which all sensors at the monitoring
station are currently undergoing calibration or maintenance.

The types of signals are treated differently in the event detection calculations. Events are
identified based on WQ signals only; OP, ALM, and CAL signals cannot be used to  identify
events. OP signals can be used indirectly to identify events as part of a composite signal or as
part of pattern matching. ALM or CAL signals are used to indicate when water quality data
should be ignored.

The different signals are also treated differently in the output graphs. When the evaluation type
parameter is set as OP, the y-axis label on the plot is shown in green, indicating it is  an
operational signal and not used to identify events. When the evaluation type parameter is set as
CAL, the plot is not shown in the output graph. As an example,  in the StationB_wq.yml file
located in the "Tutorial_Files\Configuration_Tutorial\Evaluation_Type\Temp_op" directory,
temperature is defined as an operational signal. Figure 8 shows part of the signals section of the
YML file and the temperature plot is shown in Figure 9 with the green y-axis label given to all
OP signals.

- id: TESTJTEMP
  SCADA tag: B_TEMP_VAL
  evaluation type: op
  parameter type: TEMP
  ignore changes: none
  data options:  # DATA
    precision:
    units:  'AoF'
    valid range: [32, .inf]
    set points:   [-.inf,  .inf]

Figure 8: Signals configuration section with the evaluation type parameter as OP.
    BTEMP 62
   TEMP (°F) 60
          58
          56
          28-Feb      01-Mar      02-Mar     03-Mar      04-Mar      05-Mar     06-Mar      07-Mar

Figure 9: Operational signal plot.

If the evaluation type parameter is set to WQ as in Figure 10, the color of the y-axis label
changes from green to black in the  output plot (Figure 11) indicating that the data is included in
the detection analysis. This change was made to signals section of the StationB_wq.yml file
located in the directory, "Tutorial_Files\Configuration_Tutorial\Evaluation_Type\Temp_wq."
CANARY Training Tutorials                                                       Page 20

-------
- id: TEST_TBMP
  SCADA tag:  B_TEMP_VAL
  evaluation type: wq
  parameter type:  TEMP
  ignore changes:  none
  data options: #  DATA
    precision:   .
    units:  IAoFT
    valid range:  [32,  .inf]
    set points:  [-.inf,  .inf]
Figure 10: Signals configuration section with the evaluation type parameter as WQ.
    BTEMP 62
   TEMP(°F) 60
          58
          56
          28-Feb
                    01-Mar
                              02-Mar
                                       03-Mar
                                                 04-Mar
                                                           05-Mar
                                                                     06-Mar
                                                                               07-Mar
Figure 11: Water quality signal plot.

Signals defined as calibration (CAL) or alarm (ALM) are not displayed in the output graphs. In
Figure 12, the calibration signal CAL_StationB is defined as evaluation type CAL in the
StationB_cal-to-op.yml file in the
"Tutorial_Files\Configuration_Tutorial\Evaluation_Type\Cal_cal" directory. For alarm or
calibration signals, the alarm options parameters must also be defined as opposed to the data
options parameters for water quality signals (Figure 10) and operational signals (Figure 8).
Figure 13 shows the output graph, which does not include the  calibration signal CAL_StationB
as calibration signals are not plotted. This figure shows all of the water quality and operational
signal plots as well as the event probability plot. The probability of event is calculated with
CANARY using algorithms and parameter values specified in the configuration file.

 -  id:  CAL_StationB
   SCADA tag: CAL_StationB
   evaluation type:  cal
   parameter type: QUALITY
   ignore changes: none
   alarm options:  #ALARM
     value when  active:  1

Figure 12: Signals configuration section with the evaluation type parameter as CAL.
CANARY Training Tutorials
Page 21

-------
              xicr
       BCL2 2
    CL2(Mg/L) „

            -2
       BPH 7.4
     PH (pH)
           7.2
    TEMP(°F) 60
            58
    BCOND
 COND (|iS/cm)
     BTURB 0.7
  TURB(NTU) H|
           04
           0.3
   B PLNT OP
   PRES (PSI)
            70 -
   B PLNT OP 18
   FLOW(gpm) 16

            14
    BTOC
  TOC(ppb)
           28-Feb
                             StationB 2006-02-28 00:00:00 to 2006-03-06 23:40:00
           28-Feb       01-Mar       02-Mar       03-Mar      04-Mar       05-Mar       06-Mar
           28-Feb       01-Mar       02-Mar       03-Mar      04-Mar       05-Mar       06-Mar
           28-Feb       01-Mar       02-Mar       03-Mar      04-Mar       05-Mar       06-Mar
                       01-Mar
                                  02-Mar
                                             03-Mar
                                                        04-Mar
                                                                    05-Mar
                                                                               06-Mar
                                                                                          07-Mar
           28-Feb       01-Mar       02-Mar       03-Mar      04-Mar       05-Mar       06-Mar
                                                                                          07-Mar
                                                                                          07-Mar
           28-Feb       01-Mar       02-Mar       03-Mar      04-Mar       05-Mar       06-Mar
           28-Feb       01-Mar       02-Mar       03-Mar      04-Mar       05-Mar       06-Mar       07-Mar
           28-Feb       01-Mar       02-Mar       03-Mar      04-Mar       05-Mar       06-Mar       07-Mar
          10.32 i	,	1	,	,	1	,	
          10.3
           28-Feb       01-Mar       02-Mar       03-Mar      04-Mar       05-Mar       06-Mar       07-Mar
            1 ,-	:	:	:	:	:	:	:
                                                                                          07-Mar
Figure 13: Output graph without calibration signal.


In Figure  14, the evaluation type parameter for the CAL_StationB signal is changed to OP in the
StationB_cal-to-op.yml file in the directory
"Tutorial_Files\Configuration_Tutorial\Evaluation_Type\Cal_op." Figure 15 displays the output
graph with the CAL_StationB signal at the top in green. Note the calibration is zero during this
one week period as there are no calibration  events during this week.
CANARY Training Tutorials
Page 22

-------
 - id: CAL_StationB
  SCftDA tag: CAL_StationB
  evaluation type:  op
  parameter type: QUALITY
  ignore changes: none
  data options: # DATA
    precision:  J.0001
    units:   ''
    valid range: [-.inf,  .inf]
    set points:  [-.inf,  .inf]
Figure 14: Signals configuration section with the evaluation type parameter switched to OP.
CANARY Training Tutorials                                                      Page 23

-------
   CAL Stations 1
     QUALITY
             x10
       BCL2
    CL2 (Mg/L)
           -2
           28-Feb
       BPH
     PH (pH)
     BTEMP 62
    TEMP(°F) 60
           58
           56
           28-Feb
    BCOND
COND (|iS/cm)
           28-Feb
     B TURB n -
  TURB(NTU) p
           28-Feb
   B PLNT OP 75
   PRES (PSI)
           70
   B PLNT OP
   FLOW(gpm)
    BTOC 10.32
  TOC(ppb)
          10.3
           28-Feb

           28-Feb
                            StationB 2006-02-28 00:00:00 to 2006-03-06 23:40:00
                      01-Mar       02-Mar       03-Mar       04-Mar       05-Mar       06-Mar
           28-Feb       01-Mar       02-Mar       03-Mar       04-Mar       05-Mar       06-Mar
                      01-Mar       02-Mar       03-Mar       04-Mar
                                                                  05-Mar       06-Mar
                      01-Mar       02-Mar       03-Mar       04-Mar       05-Mar       06-Mar
                      01-Mar       02-Mar       03-Mar       04-Mar       05-Mar       06-Mar
           28-Feb       01-Mar       02-Mar       03-Mar       04-Mar       05-Mar       06-Mar
           14
           28-Feb       01-Mar       02-Mar       03-Mar       04-Mar       05-Mar       06-Mar
                      01-Mar       02-Mar       03-Mar       04-Mar       05-Mar       06-Mar
                      01-Mar       02-Mar       03-Mar       04-Mar       05-Mar       06-Mar
           28-Feb       01-Mar       02-Mar       03-Mar       04-Mar       05-Mar       06-Mar       07-Mar
             x103
                                                                                        07-Mar
                                                                                        07-Mar
                                                                                        07-Mar
                                                                                        07-Mar
                                                                                        07-Mar
                                                                                        07-Mar
                                                                                        07-Mar
                                                                                        07-Mar
                                                                                        07-Mar
Figure 15: Output graph with the evaluation type parameter switched to OP.


A simple, useful trick to verify that a calibration signal is performing as expected is to
temporarily change the evaluation type parameter from CAL to OP and examine the output
graph. After verification, the evaluation type parameter needs to be switched back to CAL. This
trick is especially useful when constructing composite signals as discussed later in this
document. It is important to note that multiple CAL signals can be defined in a configuration file,
but only one can be enabled for each station. If multiple calibration signals are needed, the
signals must be changed to operational signals (OP) and combined using a calibration composite
signal (CAL). Section 0 describes the process in more detail.
CANARY Training Tutorials
Page 24

-------
2.4 History Window
This tutorial examines the history window parameter which specifies the number of previous data
points used to predict the next value of a water quality signal. This is a parameter in the
algorithms section of the configuration file that determines how much historical data is used by
CANARY at any given time step, to make a prediction about the next time step. The history
window is a fixed length but moves forward in time: for example, if the history window
parameter is two days, on Monday, CANARY will use data from Saturday and Sunday to make
predictions, and on Tuesday, CANARY will use data from Sunday and Monday.

The value assigned to the history window parameter is applied to all water quality signals
included in the monitoring station definition in the configuration file. In general, there is  an
optimal value for the history window parameter value that yields more accurate predictions.
When applying CANARY to multiple water utility monitoring stations, a window size of 1.5 to
2.0 days has proven to be the most accurate; smaller or larger values resulted in decreased
accuracy. For many water quality signals, there are diurnal patterns of variability, and so  a
history window parameter value of one day or more makes sense. While including a large
number of days might seem like it would increase accuracy, for many water quality signals, data
from weeks or months ago do not add  much additional value to the analysis. For the example
shown in Figure 16, the time steps are 20 minutes long and thus a history window parameter
value of 144 is equivalent to two days.

 algorithms:
I-  id:  test
   type:  LPCF
   history window: J ''
   outlier threshold:
   event  threshold:
   event  timeout: "...
   event  window save:  '-MI
I   BED:
     window:  -_~
     outlier  probability:  J,E

Figure 16: Algorithms configuration section with a history window parameter of 144.

Figure 17 shows the results of running CANARY with the stationB_.yml file contained in the
"Tutorial_Files\Configuration_Tutorial\History_Window\HW_144" directory. The plot shows
one week of CANARY results. The label "LPCF nh = 144," shown on the side of the probability
of event plot, defines the algorithm used (LPCF) and the number of time steps in the history
window (nh = 144). Two events, on March 23r  and March 27*, are identified by the blue lines
(when the probability of an event as predicted by LPCF exceeds the threshold value). The first
event is due to an increase in the turbidity signal of station B (B TURB), while the second is due
to  increases in both the turbidity (B TURB) and chlorine (B CL2) signals. The probability of an
event also rises above zero at several other times during this week but does not exceed the
threshold and so is not labeled an event.
CANARY Training Tutorials                                                      Page 25

-------
     BCL2 0.02
  CL2 (Mg/L) „ Q1

            0
       BPH
     PH (pH) 7.4

           7.2k
     BTEMP 58
    TEMP (°F) 56
           54
    B COND 255
COND ftiS/cm) 25Q

          245
     BTURB
  TURB (NTU)
           0.5
   B PLNT OP 72
   PRES (PSI) 7Q
           68
   B PLNT OP
  FLOW(gpm) 15
           10
    BTOC
  TOC(ppb)
         10.29
           21-Mar
                           StationB 2006-03-21 00:00:00 to 2006-03-27 23:40:00
                     22-Mar
                                23-Mar
                                           24-Mar
                                                     25-Mar
                                                                26-Mar
           21-Mar      22-Mar      23-Mar       24-Mar      25-Mar      26-Mar       27-Mar
           21-Mar      22-Mar      23-Mar       24-Mar      25-Mar      26-Mar       27-Mar      28-Mar
           21-Mar      22-Mar      23-Mar       24-Mar      25-Mar      26-Mar       27-Mar
                                                                                     28-Mar
           21-Mar      22-Mar      23-Mar       24-Mar      25-Mar      26-Mar       27-Mar      28-Mar
            1
           21-Mar      22-Mar      23-Mar       24-Mar      25-Mar      26-Mar       27-Mar
                                                                                     28-Mar
           21-Mar      22-Mar      23-Mar       24-Mar      25-Mar      26-Mar       27-Mar
                                                                                     28-Mar
           21-Mar      22-Mar      23-Mar       24-Mar      25-Mar      26-Mar       27-Mar      28-Mar
         10.31
           21-Mar      22-Mar      23-Mar       24-Mar      25-Mar      26-Mar       27-Mar      28-Mar
            1 ,-	:	:	,	:	:	:	:	,,	:
                                                                           27-Mar
                                                                                     28-Mar
Figure 17: Output graph produced when the history window parameter is 144.

In order to evaluate the impact of the history window parameter on the results, the value is
reduced from 144 to 36 (2 days to 12 hours) and the analysis is redone. Figure 18 highlights the
change made to the configuration file, stationB_.yml file contained in the
"Tutorial_Files\Configuration_Tutorial\History_Window\HW_36" directory. All other
parameters within the configuration file are kept the same. Figure 19 shows the results of running
CANARY on the modified YML file. Instead of identifying 2 events, 12 events are now
identified. With the small history window, CANARY is not able to accurately predict future
water quality signals. Every new data value is far from CANARY'S prediction and so the
algorithms identify multiple events. This illustrates the importance of selecting appropriate
values for each of the parameters in the configuration file: a CANARY user might assume these
12 events indicate real water quality anomalies instead of realizing that the configuration
parameters were not selected appropriately. A history window parameter value of half a day is
CANARY Training Tutorials
Page 26

-------
not appropriate for this monitoring station.

 algorithms:
 -  id:  test
   type:  LPCF
   history window:
   outlier threshold:  0.8
   event  threshold:  0.85
   event  t imeout:
   event  window s ave:  3 0
   BED: #BED
    window:  >
    outlier probability:
Figure 18: Algorithms configuration section with a history window parameter of 36.
CANARY Training Tutorials                                                     Page 27

-------
                           StationE 2006-03-21 00:00:00 to 2006-03-27 23:40:00
     B CL2 0.02
  CL2 (Mg/L) „ m
       BPH
     PH (pH) 7.4

           7.2 -.
           21-Mar      22-Mar      23-Mar       24-Mar      25-Mar      26-Mar       27-Mar
     BTEMP 58
    TEMP (°F) 56
           54
    B COND 255
COND ((iS/cm)

          245
     BTURB
  TURB (NTU)
   B PLNT OP 72
   PRES (PSI) 7Q

           68
   B PLNT OP
  FLOW(gpm) 15
           10
    BTOC
  TOC(ppb)
         10.29
           21-Mar      22-Mar      23-Mar       24-Mar      25-Mar      26-Mar       27-Mar
     LLCO
           21-Mar      22-Mar      23-Mar       24-Mar      25-Mar      26-Mar       27-Mar      28-Mar
                                                                                     28-Mar
           21-Mar      22-Mar      23-Mar       24-Mar      25-Mar      26-Mar       27-Mar      28-Mar
           21-Mar      22-Mar      23-Mar       24-Mar      25-Mar      26-Mar       27-Mar
           21-Mar      22-Mar      23-Mar       24-Mar      25-Mar      26-Mar       27-Mar      28-Mar
           21-Mar      22-Mar      23-Mar       24-Mar      25-Mar      26-Mar       27-Mar      28-Mar
               s	„	^—.	_	^_^—^_xr
           21-Mar      22-Mar      23-Mar       24-Mar      25-Mar      26-Mar       27-Mar      28-Mar
         10.31 i	1	1	1	1	1	1	
                                                                                     28-Mar
           21-Mar      22-Mar      23-Mar       24-Mar      25-Mar      26-Mar       27-Mar      28-Mar

Figure 19: Output graph produced when the history window parameter is 36.
2.5 Outlier Threshold
This tutorial examines the outlier threshold which determines if an observed data value is
considered to be an outlier or within the normal background range of values. The residual is the
difference between the predicted water quality value and the observed water quality value at a
single time step. If the absolute value of the residual is larger than the outlier threshold
parameter, it is an outlier. The  outlier threshold parameter is defined in units of standard
deviations. Comparison of the  size of the residual to the outlier threshold parameter is done at
every time step for every water quality signal and the maximum residual value across all signals
is retained for comparison to the outlier threshold parameter. The algorithms section of the
initial configuration file is shown in Figure 20, with the outlier threshold parameter of 0.80
highlighted. This is part of the  stationB_.yml file in the directory
"Tutorial_Files\Configuration_Tutorial\Outlier_Threshold\Initial." The resulting event
CANARY Training Tutorials
Page 28

-------
probability plot is shown in Figure 21, where two distinct events are identified.
 algorithms:
I-  id:  test
   type:  LPCF
   history window:  144
   outlier threshold:
   event threshold:  1.. L j
   event timeout:  12
   event window  save:  30
I   BED:  # BED
     window:  c
     outlier probability:  D.5
Figure 20: Algorithms configuration section with an outlier threshold parameter of 0.8.
          1
         0.5
          04-Apr      05-Apr      06-Apr      07-Apr      08-Apr      09-Apr      10-Apr      11-Apr

Figure 21: Event probability plot produced when the outlier threshold parameter is 0.8.

Increasing the value of the outlier threshold parameter will reduce the number of time steps
classified as outliers and, thus, the number of events, which makes the event detection algorithm
less sensitive in terms of detecting significant changes in the water quality signal. In the
stationB_OT.99_.yml file in the directory
"Tutorial_Files\Configuration_Tutorial\Outlier_Threshold\Change," the value of the outlier
threshold parameter is increased from 0.8 to 0.99 as highlighted in Figure 22. The probability of
an event plot produced from running CANARY with this increased value is shown in Figure 23.
Only one event, on April 8* , is identified with this increased threshold value compared to the
three events identified with the smaller threshold value.

Changing the outlier threshold and the history window parameter values affects the sensitivity of
CANARY at a specified monitoring station. As a general guide, the history window parameter
should be set large enough to include two days of previous data and the outlier threshold
parameter should be adjusted to obtain the desired event detection sensitivity at the monitoring
station. Typically, the outlier threshold parameter will be near 1.0. Additional examples of
adjusting the outlier threshold and history window parameters are covered later in this document.
CANARY Training Tutorials                                                       Page 29

-------
algorithms:
 -  id:  test
   type:  LPCF
   history window:  144
   outlier threshold:
   event  threshold:  J.85
   event  t imeout:  1::
   event  window save:  30
   BED:  # BED
    window:  r,
    outlier probability: 0.5
Figure 22: Algorithms configuration section with an outlier threshold parameter of 0.99.
     °- "^
     -1 c

0.5
04-
OB ; M :
n
n i n i i J I rti n
Apr 05- Apr 06- Apr 07- Apr 08- Apr 09-Apr

	 h-jl~"!
1 J 1 1
10-Apr 11-/
Figure 23: Event probability plot produced when the outlier threshold parameter is 0.99.
2.6 Event Threshold
This tutorial examines the event threshold parameter which defines the maximum event
probability that must be exceeded before a group of outliers is identified as an event. Note the
outlier threshold parameter is related to a single outlier value, while the event threshold
parameter is for a group of outliers. For the initial run, the event threshold parameter is set to
0.85 (highlighted in Figure 24 which shows the algorithms section of the station_et.yml file in
the "Tutorial_Files\Configuration_Tutorial\ Event_Threshold\Initial" directory).

 algorithms:
]-  id:  test
   type:  LPCF
   history window:  144
   outlier threshold: 0.8
   event  threshold:  O.E
   event  timeout:
   event  window  s ave: 3
1   BED:  fi BED
     window: c
     outlier probability: 0.5

Figure 24: Algorithms configuration section with an event threshold parameter of 0.85.

Figure 25 shows event probabilities for one week of data using an event threshold parameter of
0.85. A total of eight distinct events were identified during this week.
CANARY Training Tutorials                                                      Page 30

-------
  &?.-
  -I *
       11-Apr       12-Apr       13-Apr      14-Apr      15-Apr      16-Apr      17-Apr      18-Apr

Figure 25: Event probability plot produced when the event threshold parameter is 0.85.


When the event threshold parameter is increased to 0.99, as highlighted in Figure 26, only six
events are identified in Figure 27, two less than at the previous setting. As the threshold is
increased, CANARY is less sensitive, detecting fewer events. This illustrates the importance of
selecting the appropriate value for the event threshold parameter with the desired detection
sensitivity.

 algorithms:
I-  id:  test
   type: LPCF
   history window:  144
   outlier threshold:  0.8
   event threshold:  0.
   event t imeout:
   event window s ave:
I   BED:  # BED
     window:  6
     outlier probability:  0.5

Figure 26: Algorithms configuration section with an event threshold parameter of 0.99.



      „;
      '                                                  U/\,
      11-Apr      12-Apr      13-Apr       14-Apr      15-Apr      16-Apr      17-Apr       18-Apr

Figure 27: Event probability plot produced when the event threshold parameter is 0.99.
2.7 Multiple Locations
Multiple monitoring stations can be included in a single configuration file. This is common in
online mode when multiple stations are run from a single launch of CANARY. For example, a
water utility might have five monitoring stations that contain three sensors each (pH, CL2, and
COND). All of the monitoring data is transmitted wirelessly to the utility's SCADA database. A
single CANARY configuration file can be created that defines all five stations and CANARY
can be run in online mode analyzing the data from all five stations simultaneously. Defining
multiple monitoring stations in a single configuration file is less common for offline (BATCH)
mode, but it can be useful if the data for multiple stations are stored in a single CSV formatted
file.

To combine two or more  stations in a single configuration file, the two station's configuration
details must be combined. If a configuration file already exists for each station, this process
consists of adding one of the configuration files into the other. When combining the
CANARY Training Tutorials                                                      Page 31

-------
configuration files, both stations need to be listed under the monitoring stations configuration
section, all input signals for both stations need to be listed with the correct names, and each
station needs to be assigned an algorithm.

Figure 28 shows the example configuration file, MultiStation.yml found in the "Tutorial_Files\
Configuration_Tutorial\Multiple_Locations" directory. This file defines two stations that both
use the same test algorithm; Station B has more input signals than Station A.

When a configuration file with multiple monitoring stations is run, multiple EDSD output files
are created; one for each station and one with results from all of the stations. Figure 29 shows the
output EDSD files created when using the configuration file shown in Figure 28.

The single EDSD file with the combined results can be used to graph each station separately.
The EDSD files generated are specific to each station and so are the resulting graphs. Figure 30
and Figure 31 are examples of graphs for Stations A and B, respectively. Since the number of
signals within each station is different, the resulting number of plots is also different. The graphs
are from the same week and both show variability in their respective signals and different
number of events.
CANARY Training Tutorials                                                       Page 32

-------
 monitoring stations:
I-  id:  StationB
   station id number:
   station tag name:  StationB
   location id number:
   enabled:  yes
I   inputs:
     -  id:  stationB_in
I   outputs:
     -  id:  stationA_out
   signals:
     -  id:  PRESS_CAt
     -  id:  B_CHLORINE1
     -  id:  B_PH
     -  id:  B_ORP
     -  id:  B_TEMP
     -  id:  B_COHD
     -  id:  B_PRES3
     -  id:  DEL_ERE3S
   algorithms:
     -  id:  test

I-  id:  StationA
   station id number:
   station tag name:  StationA
   location id number:  !
   enabled:  yes
I   inputs:
     -  id:  stationA_in
   outputs:
     -  id:  stationA_out
!   signals:
     -  id:  CAL_StationA
     -  id:  A_CHLORINE
     -  id:  A_PH
     -  id:  AJTEMP
     -  id:  A_COND
     -  id:  A_TOC
I   algorithms:
     -  id:  test
Figure 28: Monitoring stations configuration section.
Name
BB Multistation . edsd
NMMyltiStation.5tataonA.edsd
H Multistation . StationB , edsd
Size Type
354 KB CANARY Data File
80 KB CANARY Data File
347KB CANARY Data Fie
Figure 29: Output EDSD files.
CANARY Training Tutorials                                                        Page 33

-------
                                  StationA 2010-02-01 00:00:00 to 2010-02-07 23:50:00
     A CHLORINE  1-5
       CL2 (NTU)   1
                0.5
           APH   7

         PH(PH)6.8
A TEMPERATURE
       TEMP (C)   „


               -0.1
 A CONDUCTIVITY
    COND (pS/cm)
           ATOC
        TOC (ppm)
                01-Feb       02-Feb       03-Feb       04-Feb        05-Feb       06-Feb       07-Feb       08-Feb
                01-Feb       02-Feb       03-Feb       04-Feb        05-Feb       06-Feb       07-Feb       08-Feb
                0.1 i	1	1	1	1	1	1	
                01-Feb       02-Feb       03-Feb       04-Feb        05-Feb       06-Feb       07-Feb       08-Feb
100 -
                01-Feb       02-Feb       03-Feb       04-Feb        05-Feb       06-Feb       07-Feb       08-Feb
                01-Feb       02-Feb       03-Feb       04-Feb        05-Feb       06-Feb       07-Feb       08-Feb
                01-Feb       02-Feb       03-Feb       04-Feb        05-Feb       06-Feb       07-Feb       08-Feb

Figure 30: Station A output graph.
CANARY Training Tutorials
                                                                                             Page 34

-------
                                  StationB 2010-02-01 00:00:00 to 2010-02-07 23:50:00
     BCHLORINE1 2
        CL2 (NTU)
           BPH
         PH (pH)
 BTEMPERATURE 15
        TEMP (C)
B CONDUCTIVITY
   COND(nS/cm) 280
     B PRESSURE 3
      PRESS (PSI) 2
                 1
                 0
     DEL PRESS
 Pressure Change
                  (WvAmlLhW^mjviW'liAjj:
Figure 31:  Station B output graph.
                01-Feb       02-Feb       03-Feb       04-Feb        05-Feb       06-Feb       07-Feb        08-Feb
                 7h	!	
                01-Feb       02-Feb       03-Feb       04-Feb        05-Feb       06-Feb       07-Feb        08-Feb
                01-Feb       02-Feb       03-Feb       04-Feb        05-Feb       06-Feb       07-Feb        08-Feb
                01-Feb       02-Feb       03-Feb       04-Feb        05-Feb       06-Feb       07-Feb        08-Feb
                01-Feb       02-Feb       03-Feb       04-Feb        05-Feb       06-Feb       07-Feb        08-Feb
                01-Feb       02-Feb       03-Feb       04-Feb        05-Feb       06-Feb       07-Feb        08-Feb
                01-Feb       02-Feb       03-Feb       04-Feb        05-Feb       06-Feb       07-Feb        08-Feb
                01-Feb       02-Feb       03-Feb       04-Feb        05-Feb       06-Feb       07-Feb        08-Feb
CANARY Training Tutorials
Page 35

-------
3. Optimizing  CANARY Configurations Tutorial
Optimization of CANARY configuration parameters is an iterative process to determine the ideal
set of configuration parameters. Typically, the goal of optimizing the configuration parameters is
to minimize false positive detections while maximizing detection of true events. False positive
detections occur when CANARY identifies an event that the user does not consider to be a real
event. The configuration parameters can be adjusted to minimize false positives if historical data
is available that contains no real events. True detection occurs when CANARY detects an event
that the user has determined is a real event. In order to configure CANARY to maximize
detection of true events, data containing real events must be available. In most cases, data
containing multiple large-scale real incidents is not available; however, data containing events
such as pipe breaks or other routine water quality events should be used to help maximize the
true detection rate.

Determining the ideal set of parameters often requires a number of offline runs on historical data
with different parameter settings. The results are compared to determine which parameters are
best for each monitoring station. The computational burden of these runs can be decreased by
defining different algorithms, each with a different parameterization, in the same configuration
file and using a single CANARY run to obtain results.

Typically, the optimization process is  conducted for each monitoring station separately,  and
involves:
   •   selecting the most appropriate algorithm (e.g., LPCF, MVNN, or one of the consensus
       algorithms)
   •   selecting the ideal parameter values for the algorithm
   •   adjusting the Binomial Event Discriminator (BED) parameters to increase/decrease the
       sensitivity of the alarm detection as well as to adjust the delay between the onset of an
       event and the  identification of that event.

Algorithms are discussed in Section 2.4.2 of the CANARY User's Manual (Hart and McKenna
2012).  The files used  for the tutorials in this section are located in the
"Tutorial_Files\Optimizing_Tutorials" directory.

3.1 Multiple Algorithms
This tutorial examines different combinations of the history window and outlier threshold
parameters for the LPCF algorithm at a single monitoring station location. The goal is to select
values  of the history window and outlier threshold parameters that minimize false positives and
maximize detection at this monitoring station. The LPCF algorithm type has already been
selected for use, and the BED parameters have been set (BED parameters are discussed further in
Section 3.3 and Appendix B of this document, and Section 2.5.1 of the CANARY User's Manual
(Hart and McKenna 2012). For this tutorial, the outlier threshold parameter is varied from 0.6 to
1.0 and the history window parameter is varied from 36 to 180.

For this example, it is unknown if the data over this time period contains true water quality
events. It is assumed that the data set is representative of normal background conditions with no
true water quality events. Therefore, in this example, the combination of parameter values that
CANARY Training Tutorials                                                      Page 36

-------
minimizes the number of detected events should be selected in order to minimize the number of
false positives. Ideally, if data with real events were available for this example, the true detection
rate could be maximized. For more information about optimizing configuration,  see Murray et al.
2010.

For the first exercise, the configuration file, stationB_multialgorithm_FIW36.yml, and the
historical data file, Tutorial_Station_B.csv, are located in the directory
"Tutorial_Files\Optimizing_Tutorials\Multiple_Algorithms\HW_36". To begin,  the history
window parameter is set to 36 time steps. The data interval in the timing options  section of the
YML file is set to 20 minutes; therefore, 36 time steps is the equivalent of 12 hours. Figure 32
shows the algorithms section of the YML file where the history window parameter is set to 36
and five test algorithms are defined with id parameters of testl through tests. The only
differences in the test algorithms  are the values of the outlier threshold parameter, which range
from 0.60 (algorithm testl) to 1.0 (algorithm testS). Four of these definitions are shown in Figure
32 (algorithm testS is not shown). The monitoring stations section of the YML with the five
algorithms enabled is shown in Figure 33.
CANARY Training Tutorials                                                       Page 37

-------
 algorithms:
|- id:  testl
   type:  LPCF
   history window:
   outlier  threshold:
   event  threshold: 0.85
   event  timeout:  12
   event  window save:
1   BED: n BED
    window:
    outlier probability: 0.5

|- id:  test2
   type:  LPCF
   history window:  :I
   outlier  threshold:
   event  threshold: 0.85
   event  timeout:
   event  window save: 30
   BED: n BED
    window:
    outlier probability: 0.5

I- id:  test3
   type:  LPCF
   history window:
   outlier  threshold:
   event  threshold: 0.85
   event  timeout:  12
   event  window save: 30
I   BED: # BED
    window: 8
    outlier probability: 0.5

I- id:  test4
   type:  LPCF
   history window:
   outlier  threshold: 0.9
   event  threshold: 0.85
   event  timeout:  lj
   event  window save: 30
I   BED: # BED
    window: 6
    outlier probability: 0.5
Figure 32: Algorithms configuration section with a history window parameter of 36.
CANARY Training Tutorials                                                         Page 3 8

-------
 monitoring  stations:
 I-  id:  Stations
   station id number:
   station tag name: Stations
   loc at ion  id number:
   enabled:  yes
   input s:
     -  id: stationb_in
   outputs:
   signals:
     -  id: CAL_StationB
     -  id: TEST_CL
     -  id: TEST_PH
     -  id: TEST_TEMP
     -  id: TEST_COND
     -  id: TEST_TURB
     -  id: TEST_PRES_PLNT
     -  id: TEST_FLOW_PLNT
 I    -  id: TEST_TOC
       cluster: no
   algorithms:
     -  id: testl
     -  id: test2
     -  id: test3
     -  id: test4
     -  id: tests

Figure 33: Monitoring stations section with multiple algorithms enabled.


Figure 34 shows the output graph when CANARY is run on this YML file. The graphical output
produced is  similar to the case with a single algorithm, except that an additional plot is added to
the bottom of the graph for each of the five test algorithms. The outlier threshold parameter
increases from algorithms testl to tests (top to bottom in the figure). As the threshold increases,
fewer data points are considered to be outliers, and thus fewer events are detected. Thus the
sensitivity of the detection decreases as the outlier threshold parameter increases.
CANARY Training Tutorials                                                        Page 39

-------
             X10
                             StationB 2006-02-28 00:00:00 to 2006-03-06 23:40:00
       BCL2 2
    CL2(Mg/L) 0|

           -2'
       B PH 7.4
     PH (pH)
    TEMP (°F) 60
           58
    BCOND
COND (|iS/cm)
     B TURB 0.7
  TURB (NTU) g-|
   B PINT OP 18
  FLOW(gpm) 16
    BTOC
  TOC(ppb)
r
\
                                                         \
LL
-------
Tutorial_Station_B.csv, are located in the directory
"Tutorial_Files\Optimizing_Tutorials\Multiple_Algorithms\HW_72". Figure 35 shows the
algorithms section of the configuration file with the history window parameter set to 72 and the
five corresponding algorithms definitions.
 algorithms:
I- id:  testl
   type:  LPCF
   history window:
   outlier threshold: 0.6
   event  threshold: 0.85
   event  timeout:  i-
   event  window  save: 30
I   BED: « BED
     window:
     outlier probability: 0.5

I- id:  test2
   type:  LPCF
   history window:
   outlier threshold: 0.7
   event  threshold: 0.85
   event  timeout:
   event  window  save: 30
|   BED: « BED
     window:
     outlier probability:  '.'

I- id:  tests
   type:  LPCF
   history window:  72
   outlier threshold: 0.8
   event  threshold: 0.85
   event  timeout:  :
   event  window  save: 30
I   BED: # BED
     window:
     outlier probability: 0.5

I- id:  test4
   type:  LPCF
   history window:  72
   outlier threshold: 0.9
   event  threshold: 0.85
   event  timeout;
   event  window  save:
I   BED: «t BEE
     window:  6
     outlier probability: 0.5
Figure 35: Algorithms configuration section with a history window parameter of 72.

Figure 36 shows the output graph when the history window parameter is 72. Note the decreased
CANARY Training Tutorials                                                      Page 41

-------
sensitivity of the event detection relative to the previous exercise when the history window
parameter was equal to 36 (Figure 34). For this larger history window parameter, only three
events are identified at the two highest outlier thresholds parameters (bottom two plots).
                            Stations 2006-02-28 00:00:00 to 2006-03-06 23:40:00
       B CL2  2
    CL2(Mg/L)  O
           28-Feb       01-Mar       02-Mar      03-Mar      04-Mar      05-Mar       06-Mar       07-Mar
       BPH 7.4
     PH (pH)
           28-Feb       01-Mar       02-Mar      03-Mar      04-Mar       05-Mar       06-Mar       07-Mar
    BCOND
COND (|iS/cm)
                                                                  ^yvtr:
           28-Feb       01-Mar       02-Mar      03-Mar      04-Mar      05-Mar       06-Mar       07-Mar
     BTURB 0.7
  TURB(NTU) 0|
   B PINT OP
   PRES (PSI)
           28-Feb       01-Mar       02-Mar      03-Mar      04-Mar      05-Mar       06-Mar       07-Mar
           75 F
           70
           28-Feb       01-Mar       02-Mar      03-Mar      04-Mar      05-Mar       06-Mar       07-Mar
   B PINT OP 18
  FLOW(gpm) 16

           14
    BTOC
  TOC(ppb)
           28-Feb       01-Mar       02-Mar      03-Mar      04-Mar      05-Mar       06-Mar       07-Mar
         10-32 I	1	1	1	1	1	1	
          10.3
           28-Feb       01-Mar       02-Mar      03-Mar      04-Mar      05-Mar       06-Mar       07-Mar
     LLC-I
     LLOM
           28-Feb       01-Mar       02-Mar      03-Mar      04-Mar      05-Mar       06-Mar       07-Mar
            1 ,
           28-Feb       01-Mar       02-Mar      03-Mar      04-Mar      05-Mar       06-Mar       07-Mar
            1 ,
                                                                                      /L
     LLCN
           28-Feb       01-Mar       02-Mar      03-Mar      04-Mar      05-Mar       06-Mar       07-Mar
            1 ,
                                                                                   fl
           28-Feb       01-Mar       02-Mar      03-Mar      04-Mar       05-Mar       06-Mar       07-Mar
           28-Feb       01-Mar       02-Mar      03-Mar      04-Mar      05-Mar       06-Mar       07-Mar


Figure 36: Output graph when the history window parameter is  72.
CANARY Training Tutorials
Page 42

-------
For the third exercise, the history window parameter is set to 108 time steps (or 1.5 days). The
configuration file, stationB_multialgorithm_HW108.yml, and the historical data file,
Tutorial_Station_B.csv, are located in the directory
"Tutorial_Files\Optimizing_Tutorials\Multiple_Algorithms\HW_108". Figure 37 shows the
algorithms section of the YML file with a history window parameter set to 108 and the five
corresponding algorithms definitions.
CANARY Training Tutorials                                                      Page 43

-------
 algorithms:
 - id:  testl
   type:  LPCF
   history window:  108
   outlier threshold:
   event  threshold:  0.35
   event  timeout:  11
   event  window save:
j   BED:  ft BED
     window:  e
     outlier  probability:

1- id:  test2
   type!  LPCF
   history window:  108
   outlier threshold:  .
   event  threshold:  0.85
   event  timeout:  I
   event  window save:
   BED:  # BED
     window:
     outlier  probability:  0.5

]- id:  testa
   type:  LPCF
   history window:
   outlier threshold:  0.6
   event  threshold:  0.85
   event  timeout:  U
   event  window save:
;   BED:  n BED
     window:
     outlier  probability:  0.?

 - id:  test.4
   type:  LPCF
   history window:  108
   outlier threshold:
   event  threshold:  0.85
   event  timeout:  :
   event  window save:
;   BED:  n BED
     window:
     outlier  probability:  0.5
Figure 37: Algorithms configuration section with a history window parameter of 108.

Figure 38 shows the corresponding output graph. The longer history window parameter leads to
even fewer events detected (Figure 38), but the decrease is not as dramatic as seen for the initial
increase in the history window parameter from 36 to 72 time steps.
CANARY Training Tutorials                                                     Page 44

-------
             X10
       BCL2
    CL2 (Mg/L)
       B PH 7.4
     PH (pH)
           7.2
    BCOND
COND (|iS/cm)
     BTURB 0.7
  TURB (NTU) jj-6
   BPLNTOP 18
  FLOW(gpm) 16

           14
    BTOC
  TOC(ppb)
           «
                            StationB 2006-02-28 00:00:00 to 2006-03-06 23:40:00
           28-Feb       01-Mar       02-Mar       03-Mar       04-Mar       05-Mar       06-Mar       07-Mar
           28-Feb       01-Mar       02-Mar       03-Mar       04-Mar       05-Mar       06-Mar       07-Mar
           28-Feb       01-Mar       02-Mar       03-Mar       04-Mar       05-Mar       06-Mar       07-Mar
           28-Feb       01-Mar       02-Mar       03-Mar       04-Mar       05-Mar       06-Mar       07-Mar

                                                                  05-Mar       06-Mar       07-Mar
           28-Feb       01-Mar       02-Mar       03-Mar       04-M
           28-Feb       01-Mar       02-Mar       03-Mar       04-Mar       05-Mar       06-Mar       07-Mar
           28-Feb       01-Mar       02-Mar       03-Mar       04-Mar       05-Mar       06-Mar       07-Mar
         10.32 i	,	,	,	,	,	,	
          10.3
           28-Feb       01-Mar       02-Mar       03-Mar       04-Mar       05-Mar       06-Mar       07-Mar
            1
                                                                                      A
           28-Feb       01-Mar       02-Mar       03-Mar       04-Mar       05-Mar       06-Mar       07-Mar
            1 ,
           28-Feb       01-Mar       02-Mar       03-Mar       04-Mar       05-Mar       06-Mar       07-Mar
            1 ,
           28-Feb       01-Mar       02-Mar       03-Mar       04-Mar       05-Mar       06-Mar       07-Mar
            1 ,
           28-Feb       01-Mar       02-Mar       03-Mar       04-Mar       05-Mar       06-Mar       07-Mar
            1 ,
           28-Feb       01-Mar       02-Mar       03-Mar       04-Mar       05-Mar       06-Mar       07-Mar


Figure 38: Output graph when the history window parameter is 108.


For the fourth exercise, the history window parameter is set to 144 (2 days). The configuration
file, stationB_multialgorithm_HW144.yml, and the historical data file, Tutorial_Station_B.csv,
are located in the "Tutorial_Files\Optimizing_Tutorials\Multiple_Algorithms\HW_144"
CANARY Training Tutorials
Page 45

-------
directory. Figure 39 shows the algorithms section of the configuration file with a history window
parameter of 144 and the five corresponding algorithms definitions.
 algorithms:
 - id: testl
  type: LPCF
  history window:  i-.-i
  outlier threshold:
  event threshold:
  event timeout:
  event window save:
  BED: » BED
    window:
    outlier probability: 0.5

 - id: test2
  type: LPCF
  history window:
  outlier threshold: 0.7
  ev ent thr e shold:
  event timeout:  12
  event window save: 30
  BED: # BED
    window:
    out lift probability: 0.5

 - id: testa
  type: LPCF
  history window:   ;•!
  outlier threshold: 0.3
  event threshold:
  event timeout:
  event window save: 30
  BED: o BE I'
    window:
    outlier probability: 0.5

 - id: test"!
  type: LPCF
  history window:    :
  outlier threshold: 0.9
  event threshold: O.S5
  event timeout:  11
  event window save:
  BED: n BED
    window:
    outlier probability: 0.5
Figure 39: Algorithms configuration section with a history window parameter of 144.

Figure 40 shows the output graph when the history window parameter is 144 time steps using
five different outlier threshold parameters. As the outlier threshold parameter increases, the
number and duration of events decreases.
CANARY Training Tutorials                                                        Page 46

-------
                             StationB 2006-02-28 00:00:00 to 2006-03-06 23:40:00
       BCL2
    CL2(Mg/L)
           28-Feb       01-Mar       02-Mar       03-Mar       04-Mar       05-Mar      06-Mar       07-Mar
       BPH 7.4
     PH (pH)
           28-Feb       01-Mar       02-Mar       03-Mar       04-Mar       05-Mar      06-Mar       07-Mar
           28-Feb       01-Mar       02-Mar       03-Mar       04-Mar       05-Mar      06-Mar       07-Mar
    BCOND
COND (nS/cm)
     B TURB 0 7
  TURB(NTU) °-6
           28-Feb       01-Mar       02-Mar       03-Mar       04-Mar       05-Mar      06-Mar       07-Mar
           28-Feb       01-Mar       02-Mar       03-Mar       04-Mar       05-Mar      06-Mar       07-Mar
           28-Feb       01-Mar       02-Mar       03-Mar       04-Mar       05-Mar      06-Mar       07-Mar
   BPLNTOP 18
  FLOW(gpm) 16

           14
    BTOC
  TOC(ppb)
           28-Feb       01-Mar       02-Mar       03-Mar       04-Mar       05-Mar      06-Mar       07-Mar
         10.32 |	1	,	1	1	,	,	
          10.3
           28-Feb       01-Mar       02-Mar       03-Mar       04-Mar       05-Mar      06-Mar       07-Mar
            1
r 	 D;i" ' i 	 F\T\ 	 r 	
: ; r- iR ; n ;
rj_jijii! A i
\ A
I II
1
A I
           28-Feb       01-Mar       02-Mar       03-Mar       04-Mar       05-Mar      06-Mar       07-Mar
           28-Feb       01-Mar       02-Mar       03-Mar       04-Mar       05-Mar      06-Mar       07-Mar
            1 r 	,	:	:	:	:	,	
           28-Feb       01-Mar
            1 ,-
                                  02-Mar       03-Mar       04-Mar       05-Mar      06-Mar       07-Mar
           28-Feb       01-Mar       02-Mar       03-Mar       04-Mar       05-Mar      06-Mar       07-Mar
           28-Feb       01-Mar       02-Mar       03-Mar       04-Mar       05-Mar      06-Mar       07-Mar


Figure 40:  Output graph when the history window parameter is 144.


For the fifth exercise, the history window parameter is set to 180 time steps (2.5 days). The
configuration file, stationB_multialgorithm_HW180.yml, and the historical data file,
Tutorial_Station_B.csv, are located in the directory
CANARY Training Tutorials
Page 47

-------
"Tutorial_Files\Optimizing_Tutorials\Multiple_Algorithms\HW_180". Figure 41 shows the
algorithms section of the configuration file with a history window parameter of 180 and the five
corresponding algorithms definitions.
 algorithms:
 -  id:  testl
   type: LPCF
   history window: 180
   outlier threshold: 0.6
   event threshold: 0.85
   event timeout:
   event window save:
   BED: P BED
    window:
    outlier probability:

 -  id:  test2
   type: LPCF
   history window: 180
   outlier threshold: 0.7
   event threshold: 0.85
   event timeout:  ::
   event window save: 30
   BED: # BED
    window: 8
    outlier probability: 0.5

 -  id:  tests
   type: LPCF
   history window: 180
   outlier threshold: 0.3
   event threshold:
   event timeout:
   event window save:
   BED: H BE I'
    window:
    outlier probability: 0.5

 -  id:  test3
   type: LPCF
   history window: 180
   outlier threshold: 0.9
   event threshold: 0.85
   event timeout:  12
   event window save: 30
   BED: # BED
    window:
    outlier probability: 0.5
Figure 41: Algorithms configuration section with a history window parameter of 180.

Figure 42 shows the output graph when the history window parameter is 180 time steps. Note
that there is only one event identified for all five algorithms.
CANARY Training Tutorials                                                      Page 48

-------
                             StationB 2006-02-28 00:00:00 to 2006-03-06 23:40:00
       BCL2 2
    CL2(Mg/L) 0

            -2
       BPH 7.4
     PH (pH)
    BCOND
COND (p-S/cm)
     B TURB 0 7
  TURB(NTU) 0|
                  .^^^^
   B PINT OP 18
  FLOW(gpm) 16

           14
    BTOC
  TOC(ppb)
                                          nni
                                         r1 \  1  \i
      CL II
      SI
           28-Feb       01-Mar       02-Mar       03-Mar      04-Mar       05-Mar       06-Mar
           28-Feb      01-Mar       02-Mar       03-Mar      04-Mar       05-Mar       06-Mar
           28-Feb      01-Mar       02-Mar       03-Mar      04-Mar       05-Mar       06-Mar      07-Mar
           28-Feb      01-Mar       02-Mar       03-Mar      04-Mar       05-Mar       06-Mar      07-Mar
           28-Feb       01-Mar       02-Mar       03-Mar      04-Mar       05-Mar       06-Mar      07-Mar
           28-Feb      01-Mar       02-Mar       03-Mar      04-Mar       05-Mar       06-Mar      07-Mar
           28-Feb      01-Mar       02-Mar       03-Mar      04-Mar       05-Mar       06-Mar      07-Mar
         10-32 i	,	,	1	1	,	,	
          10.3
           28-Feb      01-Mar       02-Mar       03-Mar      04-Mar       05-Mar       06-Mar      07-Mar
            1
           28-Feb      01-Mar       02-Mar       03-Mar      04-Mar       05-Mar       06-Mar      07-Mar
           28-Feb      01-Mar       02-Mar       03-Mar      04-Mar       05-Mar       06-Mar      07-Mar
            1 r	:	i	:	:	:	•	
           28-Feb      01-Mar       02-Mar       03-Mar      04-Mar       05-Mar       06-Mar      07-Mar
            1 ,-           ,           ;           :            :                      ,            ;
           28-Feb      01-Mar       02-Mar       03-Mar      04-Mar       05-Mar       06-Mar
           °-5
           28-Feb      01-Mar       02-Mar       03-Mar      04-Mar       05-Mar       06-Mar      07-Mar

Figure 42: Output graph when the history window parameter is 180.


By analyzing the number of events detected for the combinations of history window and outlier
threshold parameter values, the sensitivity of CANARY to these parameters for this monitoring
CANARY Training Tutorials
Page 49

-------
station can be examined. Figure 43 shows the number of events identified for each combination
of history window and outlier threshold parameters for the data set from a single monitoring
station over a period of 69 days. Increasing both the history window and outlier threshold
parameters decrease the number of events detected. The decrease in the number of events is a
nearly linear function of the outlier threshold parameter, while the events decrease more rapidly
with an increasing history window parameter. If the data set is representative of normal
background conditions with no true water quality events, then the combination of parameter
values that minimizes the number of detected events should be selected. This would serve to
minimize the number of false positives. However, since data with real events is not available for
this test, it might be more effective to allow for a slightly larger number of detected events in
order to ensure detection is sensitive enough to detect true hazardous contamination events. For
more information about optimizing configuration, see Murray et al. 2010.
               0.6
                       °'7      0.8
                    outlier threshold
0.9
Figure 43: CANARY total events graph.


3.2 Consensus Algorithm
In some cases, the combination of two event detection algorithms provides improved results
relative to a single event detection algorithm. As an example, the LPCF algorithm identifies
significant changes in the relative value of a signal, but it does not provide information on the
actual value of the signal. In other words, the LPCF algorithm does not care what the data value
is, only how quickly it changes. Under the LPCF algorithm, a water quality signal could
gradually decrease from one to zero over a week without any events being detected by
CANARY. In contrast to the LPCF algorithm, a set-point algorithm detects events if the actual
values of the water quality signal exceed upper or lower bounds. However, it cannot detect
changes in water quality, no matter how sudden, that do not exceed either the set-point
thresholds (minimum or maximum). The consensus algorithm feature within CANARY allows
CANARY Training Tutorials
                                      Page 50

-------
both algorithms to be used simultaneously. Two consensus algorithm options (CAVE, which
averages event probabilities, and CMAX, which takes the maximum event probability) are
available within CANARY (see details in Section 2.4.3.5 the CANARY User's Manual (Hart
andMcKenna2012)).

This tutorial examines the application of the CMAX consensus algorithm, which combines the
set-point algorithm, SPPE, with the LPCF algorithm. The configuration files CLDY Initial.yml
and CLDY Join.yml are used for this tutorial and are located in the directory
"Tutorial_Files\Optimizing_Tutorials\Consensus_Algorithm". The monitoring station data used
in this tutorial is contained in the data file, CLDY train.csv., and includes six primary water
quality parameters: residual chlorine (CL2), specific conductivity (COND), pH (PH),
temperature (TEMP), turbidity (TURB), and total organic carbon (TOC). The data were
collected at a two minute sampling frequency. An alarm status for each sensor, indicating a
nonfunctioning sensor, was recorded. In addition, a large number of signals regarding the status
of pumps and valves along with flow rates, tank levels, and pressures were recorded. A
calibration signal for the entire station was provided.

The SPPE algorithm is chosen for the set-point algorithm  (see Section 2.4.2.3  of the CANARY
User's Manual (Hart and McKenna 2012) for additional details). In a set-point algorithm, as a
water quality signal value gets closer to either the minimum or maximum set-point value, the
probability of an event becomes closer to 1.0. The difference between the water  quality signal
value and either set-point value is measured in units of the precision parameter that is part of
each signal definition. Figure 44 shows the signal section  of the CLDY_initial.yml file in which
three signals (CL2, COND, and PH)  are defined each with distinct precision settings (0.005, 1,
and 0.01).  Set-point algorithms require the set points parameter to be set in the signal definition.
This parameter defines the minimum and maximum set-point values, and the default parameter
values are [-.inf., .inf]. The set-point data values for the TEST_CL and the TEST_PH signals are
shown in Figure 44, which are distinct from the valid range parameter data values. The
remaining signals, including TEST_COND, are set at the default set-point values of [-.inf, .inf],
and therefore do not contribute to the set point algorithm.
CANARY Training Tutorials                                                      Page 51

-------
 signals:
 - id: TEST_CL
  SCADA tag: CLDY_CL2X_VAL
  evaluation type: wq
  parameter type: CL2
  ignore changes: none
  data options:
    precision: 0,00 5
    units:  'mg/L'
    valid range: [0.0, 4]
    set points:  [0.2, 1.5]

 - id: TEST_COHD
  SCADA tag: CLDY_COMD_VAL
  evaluation type: wq
  parameter type: COMD
  ignore changes: none
  data options:
    precision: 1
    units:  ' {\niu}S/cm1
    valid range: [300, 500]
    set points:  [-.inf,  .inf]

 - id: TEST_PH
  SCADA tag: CLDY_PHXX_VAL
  evaluation type: wq
  parameter type: PH
  ignore changes: none
  data options:
    precision: 0.01
    units:  'pH1
    valid range: [6, 10]
    set points:  [8, 9.5]

Figure 44: Signals configuration section.
The algorithms section of the CLDY_initial.yml file is shown in Figure 45. One algorithm is
defined: B2 is the SPPE set-point algorithm with a history window parameter of 10 and an outlier
threshold parameter of 80. Figure 46 shows the monitoring stations section of the configuration
file which identifies the signals and algorithm to use in the analysis.
 algorithms:
 - id: B2
   type: SPPE
   history window:  i
   outlier threshold:
   event threshold:
   event timeout:  '
   event window save:
Figure 45: Algorithms configuration section with the consensus algorithm defined.
CANARY Training Tutorials                                                      Page 52

-------
 monitoring  stations:
 > id: StationD
   station id number:
   station tag name: StationD
   location  id number:    \
enabled
inputs :
- id:
outputs
: yes
stationd
in
signals :
-
-
-
-
-
-
-
-
-
id:
id:
id:
id:
id:
id:
id:
id:
id:
TEST
TEST
TEST
TEST
TEST
TEST
TEST_
TEST
TEST
CL
COND
PH
TEMP
TOC
RES
TANK
TANK
CAL





CL
_ELEV
FLOW

algorithms :
-
id:
B2


Figure 46: Monitoring configuration section with algorithm B2 activated.
Figure 47 shows the output graph from running this YML file in CANARY. During this one
week, the algorithm identifies two water quality events; in both cases, TEST_PH exceeds its
upper set point value. The event probability plot at the bottom of Figure 47 indicates other times
where the signals get close to one of the set-point values but do not exceed the set point to trigger
an event. The pink triangle in the TEST_PH plot shows when the values exceed the valid range,
here set to be [6, 10].
CANARY Training Tutorials                                                     Page 53

-------
                           StationD 2008-08-15 00:00:00 to 2008-08-21 23:54:00
     CLDYCL2X  '
      CL2 (Mg/L)  1

             0.5
    CLDYCOND 460
   COND (|iS/cm) 440
      CLDY TOCX
       TOC (ppm) 4
              2
  CLDY TANK CL2X
      CL2 (Mgi)
CLDY TANK ELEV
    ELEVffeet) 1020
  CLDY TANK FLOW
     FLOW(gpm) „
  CLDY OSXX ALM
      OPS (gpm)  1

             0.5
        LUO
        &TU °-5
        COc
              0
             15-Aug      16-Aug
             15-Aug      16-Aug      17-Aug      18-Aug     19-Aug      20-Aug     21-Aug      22-Aug
             15-Aug      16-Aug      17-Aug      18-Aug     19-Aug      20-Aug     21-Aug      22-Aug
             15-Aug      16-Aug      17-Aug      18-Aug     19-Aug      20-Aug     21-Aug      22-Aug
             15-Aug      16-Aug      17-Aug      18-Aug     19-Aug      20-Aug     21-Aug      22-Aug
             15-Aug      16-Aug      17-Aug      18-Aug     19-Aug      20-Aug     21-Aug      22-Aug
             0.5
             15-Aug      16-Aug      17-Aug      18-Aug     19-Aug      20-Aug     21-Aug      22-Aug
             15-Aug      16-Aug      17-Aug      18-Aug     19-Aug      20-Aug     21-Aug      22-Aug
              5 i	1	1	1	
             15-Aug      16-Aug      17-Aug      18-Aug     19-Aug      20-Aug     21-Aug      22-Aug
             15	1	1	1	1	1	1	
             15-Aug      16-Aug      17-Aug      18-Aug     19-Aug      20-Aug     21-Aug      22-Aug
              1 ,-	
                                17-Aug      18-Aug     19-Aug      20-Aug     21-Aug      22-Aug
Figure 47: Output graph produced using the set point algorithm, SPPE.

In the configuration file, CLDY_Join.yml found in the directory "\Tutorial_Files\
Optimizing_Tutorials\Consensus_Algorithm\Join", as shown in Figure 48, the algorithms section
defines three algorithms: Bl is the LPCF algorithm; B2 is the SPPE set-point algorithm; JOIN is
the consensus algorithm of type  CMAX that combines the first two algorithms together. CMAX
retains the maximum event probability between two algorithms. To activate the consensus
algorithm in the analysis, the algorithm's name, JOIN, is listed under the algorithms parameter in
the monitoring stations section (Figure 49), as shown in CLDY_Join.yml.
CANARY Training Tutorials
Page 54

-------
algorithms:
- id: Bl
  type: LPCF
  history window:  700
  outlier threshold: 0.8
  event threshold:  0.995
  event timeout:  ir
  event window save:
  BED:
    window:
    outlier  probability: 0.5

- id: B2
  type: SPPE
  history window:  !
  outlier threshold:
  event threshold:  J.995
  event timeout:
  event window save:

- id: JOIN
  type: CMAX
  history window:  10
  outlier threshold:
  event threshold:  1.995
  event timeout:  30
  event window save:
  use algorithm inputs:
    - id: Bl
    - id: B2
Figure 48: Algorithms configuration section with three algorithms defined.
 monitoring stations:
]- id:  StationD
   station id number:
   station tag name: StationD
   location id number:
   enabled:  yes
]   inputs:
     -  id:  stationd_in
   outputs:
   signals:
     -  id:  TEST_CL
     -  id:  TEST_COND
     -  id:  TEST_PH
     -  id:  TEST_TEHP
     -  id:  TESTJTOC
     -  id:  TEST_RES_CL
     -  id:  TEST_TANK_ELEV
     -  id:  TE5T_TANK_FLO0
     -  id:  TEST_CAL
   algorithms:
     -  id:  Bl
     -  id:  B2
     -  id:  JOIN
Figure 49: Monitoring stations configuration section with the consensus algorithm enabled.
CANARY Training Tutorials                                                           Page 55

-------
Figure 50 shows the output graph with the results from all three algorithms for the same week as
shown in Figure 47. The event probability is calculated by each individual algorithm, as well as
the combined algorithm. The LPCF algorithm identifies multiple events while the set point
algorithm identifies only one event. The consensus algorithm used here, CMAX, combines the
results from both algorithms and identifies the events identified by both separate algorithms.

In summary, the consensus algorithm allows the simultaneous use of multiple algorithms for
water quality monitoring. This feature combines the strengths of different algorithms (e.g., set
points for absolute signal values and LPCF for relative changes in the signal values) to improve
confidence in the event detection results.
     CLDYCL2X
     CL2IMg/U
            '5
            0.5
      CLDY TOCX
      TOC (ppm) 4
             2
            15-Aug
             6-
  CLDY TANK CL2X
     CL2lMg/l_|
            0.5
            15-Aug
CLDY TANK ELEV
    ELEV(feet) 1020
  CLDY TANK FLOW
     FLOW(gpm) „
  CLDY OSXX ALM
     OPS (gpmi
                         StationD 2008-08-15 00-00:00 to 2008-08-21 23:54:00
            15-Aug     16-Aug     17-Aug     18-Aug     19-Aug     20-Aug     21-Aug     22-Aug
            15-Aug     16-Aug     17-Aug     18-Aug      19-Aug     20-Aug     21-Aug     22-Aug
            15-Aug     16-Aug     17-Aug     18-Aug     19-Aug     20-Aug     21-Aug     22-Aug
                     16-Aug     17-Aug     18-Aug     19-Aug     20-Aug     21-Aug     22-Aug
                                                                 /^Wr ^w—^
            15-Aug     16-Aug     17-Aug     18-Aug     19-Aug     20-Aug     21-Aug     22-Aug
             11	
                     16-Aug     17-Aug     18-Aug
                                                         20-Aug     21-Aug     22-Aug
            15-Aug     16-Aug     17-Aug     18-Aug     19-Aug     20-Aug     21-Aug     22-Aug
             5 i	1	1	1	1	1	1	
            15-Aug     16-Aug     17-Aug     18-Aug     19-Aug     20-Aug     21-Aug     22-Aug
            1.5
            05
            15-Aug     16-Aug     17-Aug     18-Aug
                                                19-Aug
                                                         20-Aug     21-Aug     22-Aug
            15-Aug     16-Aug     17-Aug     18-Aug     19-Aug     20-Aug     21-Aug     22-Aug

        £?        —'*> :

        (flc*
                                                                           22-Aug
            15-Aug     16-Aug     17-Aug     18-Aug     19-Aug     20-Aug     21-Aug
            15-Aug     16-Aug     17-Aug     18-Aug     19-Aug     20-Aug     21-Aug     22-Aug

Figure 50: Output graph produced when using the consensus algorithm.
CANARY Training Tutorials
Page 56

-------
3.3 Binomial Event Discriminator
Typical utility water quality data has significant, short-lived spikes in the values, which often last
only one or two time steps. Some of these spikes are due to errors in the sensors or in the
transmission of data; in most cases, they do not indicate a true water quality event. In order to
ignore these spikes, the Binomial Event Discriminator (BED) function within CANARY
aggregates evidence of an event (i.e., outliers) over multiple consecutive time steps before
identifying an event.

The BED is based on a binomial failure model that looks at the number of outliers
(NFAILURES) within a certain number of time steps in a user defined window (NTRIALS-
window) given a fixed probability of an outlier occurring at any time step (see Appendix B:
Binomial Distribution Function Exercise in this document for more details). If there are more
outliers within the window than would be predicted, then this increases the likelihood that an
event is occurring. A CANARY user can adjust the BED window parameter to increase or
decrease the sensitivity of detection.

The BED function works in conjunction with another algorithm. For example, Figure 51  shows
the algorithms section of StationB.yml found in the "\Tutorial_Files\Optimizing_Tutorials\BED"
directory. The test algorithm is defined to be of type LPCF and uses the BED function with a
window of 20 time steps and an outlier probability of 0.5. The LPCF algorithm is used to
identify outliers and then BED calculates the continuous probability that an event has occurred at
that time step and ensures that enough outliers are present within the window before identifying
an event. If the BED probability is greater than the algorithm event threshold parameter, then
CANARY identifies an event.

This tutorial examines the BED function within CANARY by considering the influence of the
BED window parameter. By changing  the BED window parameter, the user can increase or
decrease CANARY'S sensitivity. Figure 51 shows the algorithms section of StationB.yml found
in the "\Tutorial_Files\Optimizing_Tutorials\BED\Window_20" directory and Figure 52 shows
the algorithms section of StationB.yml found in the
"\Tutorial_Files\Optimizing_Tutorials\BED\Window_6" directory.  These files define the BED
window parameter as 20 and 6 time steps long, respectively. The output graphs for each
configuration are shown in Figure 53 and Figure 54.

algorithms:
- id: test
  type: LPCF
  history window: I•
  outlier  threshold:
  event threshold:
  event t inieout : 1 _
  event window  s ave: _ L
  BED:
    window:   J
    outlier probability:  :. , f

Figure 51: Algorithms configuration section with a BED window parameter of 20.
CANARY Training Tutorials                                                      Page 57

-------
 algorithms:
I- id:  test
   type:  LPCF
   history window:  144
   outlier threshold:  0.8
   event  threshold:  L . J [.
   event  timeout:
   event  window save:
I   BED:  # BED
     window:  6
     outlier probability:  J.[
Figure 52: Algorithms configuration section with a BED window parameter of 6.
       B CL2 32
   CL2 (MgVL)  3
            2.8
            2.6
            25-Apr
       B PH 7.6
     PH (PH) 7 4

            7.2
     BTEMP
    TEMP (°F)
     BCOND
COND (p.S/cm)
25-Apr
70
60
50
40
            25-Apr
     BTURB
  TURB (NTU) 0.2

            0.1
            25-Apr
   B PLNT OP 72
   PRES (PSI) 70
            68
            66
   B PLNT OP 35
   FLOW(gpm) 30
            25
            20

            25-Apr
                              StationB 2006-04-25 00:00:00 to 2006-04-30 23:00:00
            26-Apr
            26-Apr
                        26-Apr
27-Apr
28-Apr
29-Apr
30-Apr
                        26-Apr       27-Apr
                       — I    •       I =
            2 8-Apr
            26-Apr       27-Apr
            28-Apr
27-Apr
28-Apr
29-Apr
                       27-Apr
            28-Apr
            29-Apr
            30-Apr
            25-Apr        26-Apr
                         J _ /
                       27-Apr
            28-Apr
            26-Apr
27-Apr
28-Apr
29-Apr
01-May       02-May
30-Apr       01-May
30-Apr       01-May
            29-Apr       30-Apr       01-May       02-May
           —-1	—	—i—	=-——]
            25-Apr        26-Apr       27-Apr       28-Apr        29-Apr       30-Apr       01-May       02-May
            29-Apr       30-Apr       01-May       02-May
            02-May
            25-Apr        26-Apr       27-Apr       28-Apr        29-Apr       30-Apr       01-May       02-May
           01-May       02-May
            29-Apr       30-Apr       01-May       02-May
            02-May
Figure 53: Output graph produced using a BED window parameter of 20.
CANARY Training Tutorials
                                                                                     Page 58

-------
                           StationB 2006-04-25 00:00:00 to 2006-04-30 23:00:00
      BCL2 3.2
   CL2 (Mg/L)  3
          2.8
          2.6
      B PH 7.6
     PH (pH) 7 4

          7.2
     BTEMP
   TEMP (°F) 69
           68
    BCOND
COND (|iS/cm)
     BTURB
  TURB (NTU) 0.2

          0.1
   B PINT OP 72
   PRES(PSI) 70
           68
           66
   B PINT OP 35
  FLOW(gpm) 30
           25
           20
   TOC (ppb)  '
          8.8
          8.6
           25-Apr      26-Apr       27-Apr       28-Apr       29-Apr      30-Apr       01-May       02-May
           25-Apr      26-Apr       27-Apr       28-Apr       29-Apr      30-Apr       01-May       02-May
           25-Apr      26-Apr       27-Apr       28-Apr       29-Apr      30-Apr       01-May       02-May
           25-Apr      26-Apr       27-Apr       28-Apr       29-Apr      30-Apr       01-May       02-May
           25-Apr      26-Apr       27-Apr       28-Apr       29-Apr      30-Apr       01-May       02-May
           25-Apr      26-Apr       27-Apr       28-Apr       29-Apr      30-Apr       01-May       02-May
           25-Apr      26-Apr       27-Apr       28-Apr       29-Apr      30-Apr       01-May       02-May
           25-Apr      26-Apr       27-Apr       28-Apr       29-Apr      30-Apr       01-May       02-May
           25-Apr      26-Apr       27-Apr       28-Apr       29-Apr      30-Apr       01-May       02-May

Figure 54: Output graph produced using a BED window parameter of 6.

With a BED window parameter of 20, Figure 53 shows fewer events than for a BED window
parameter of 6 as in Figure 54. A shorter BED window parameter leads to faster event
determinations, but also results in more events being identified, some of which might be false
positives. A shorter BED window parameter leads to faster event determination, because fewer
outliers are needed in order to classify a group of outliers as an event.

The tradeoff between faster determination and increased false positives is found in nearly all
types of event detection systems. For each monitoring station and situation, the appropriate time
to event determination and number of detections must be determined. Changing the values of the
BED window, outlier threshold, and event threshold parameters gives the user the flexibility to
set the CANARY sensitivity for any monitoring station.
CANARY Training Tutorials
Page 59

-------
4.  Database Driven Input/Output Tutorial
This tutorial provides step-by-step instructions for connecting CANARY to a database. For most
online applications of CANARY, a database that contains water quality signals (along with any
associated operational and/or alarm signals) will need to be accessible to CANARY. This tutorial
assumes that the user has some knowledge and experience in using databases and connecting
databases to external programs. The configuration file used in this section,
example_SQLServer_2008_by_cols.yml, is located in the "Tutorial_Files\Database_Tutorial"
directory.

4.1 Obtaining JDBC Driver
    •   Download the JDBC driver for the user's specific database. These files enable CANARY
       to communicate directly with the database, and they are typically available from your
       database vendor's website. In Figure 55,  the file sqljdbc 3.0.1301.101 enu was
       downloaded and saved to a directory on the user's computer. This file is the JDBC driver
       file in a zipped format. This file should be unzipped within this same directory (as shown
       in Figure 56).
  W Favorites
   El Desktcp
   * Downloads
   !fi Recent Places

  ^ Libraries
   [J Documents
   , Music
   ^ Pictures
   a Videos

  *& Homegroup

  ;•• Computer
   & Local Disk (CO
   0 Data (E:)
   ^ Documents (\\vt
   V KOs (\\vbcHsrv) i

  *>lp Network
   •fc VBOXSVR
Documents library
Connect To SCADA
                  Arrange by: Folder T
 I example,SQLServer_2008_by_cols
 | example_SQLServer_2008_by_rows
8/1/201112:11 PM  EDSYFJIe
8/1/201112:11 PM  EDSYFile
-3 sqljdbc_3.0.1301.101_er
                              7/19/2011 3:15 PM  Applies
      SqIjdbO.Q.l301.10l_enu Date modified: 7/19/2011 3:15 PM
      Application          Size: 3.60 MB
                                  Date created: 8/15/2011 2:53 PM
Figure 55: JDBC driver zip file location.
CANARY Training Tutorials
                                                                       Page 60

-------
1 = II 13 |fBl
d^ j-9 It > Libraries > Documents > MyCANARY > Connect To SCADA T 1 ** 1 1 ^orch p
Organize - B Open Share with - New folder £ •* EB ®
^4" Favorites
• Desktop
4 Downloads
^ Recent Places
^ Libraries
J Documents
., Music
t Pictures
g Videos
*^ HomEgroup
j*i Computer
& Local Disk (C)
C3 Data (EO
4* Documents (\\vt
^ ISOs (\\ybcncsrv) I
% Network
rj¥ VBOXSVR
Documents libra
Connect To SCADA
Name
11 example_SQLServer_2
@ examp!e_SQLSeTver_2
(3! sqljdbcJ.OJ.301.101j
ry
Date modrfie
I08_by_cols 8/1/2011 12;]
WinZip Self-Extractor - sqljdbcj.01301101_enu.exe \mr2m{
To un?ip all tiles in this self-extractor tile lo the 1 UrcJn 1
specihed lulder piei: the Unzip button.
Run WinZip
Unzip to folder:
J7 Overwrite files without prompting Atj(J|jf
Help |

Arrange by: Folder T
d Type Size
1 PM EDSY File S KB
1 PM EDSY File 8KB
5 PM Application 3,654 KB
1 n-f| sqljdbcj .0.130 1.10 l_enu Date modified: 7/19/2011 3:15 PM Date created: 8/15/2011 2:53 PM
| mt| Application Size: 3.60 MB
Figure 56: Unzipping JDBC driver file.
       Access the folder with the unzipped files (Figure 57) and then select the desired database
       JAR file.
r^irss
gjY )^| * « Documents > MyCANARY > Connect To SCADA > Microsoft SQL Server JDBC Driver 3.0 > ' ] *t [ | • «'••• P
Organize » Share with » New folder ^EE ^ Ei ®
>T Favorites
K Desktop
4. Downloads
'^1 Recent Places
^ Libraries
-| Documents
Jl Mu,ic
S Pictures
H Videos
«»»***,
;^ Computer
£i Local Disk (CO
^j Data (E:)
^ Documents (\\vt
V ISOs (\\vboxsrv) i





E





Documents library Fo|dw,
Microsoft SQL Server iDBC Driver 3.0
Name Date modified Type Size
sqljdbcj.O 8/15/2011 2:55 PM Filefolder










*V Network
:¥ VBOXSVR
i 1 item
T


Figure 57: Unzipped file location.


   •   Find and copy the JAR file(s) located in the unzipped JDBC driver folder (Figure 58).
       For this example, two JAR files need to be copied.
CANARY Training Tutorials
Page 61

-------
SES
gSt ,* « My CANARY > Connect To SCADA > Microsoft SQL Server JDEC Driver 3.0 > sqljdbcj.0 > enu > ' ^ I ^eTr^e p
Organize ' [^j Open
"5^" Favorites
• Desktop
* Downleads
£i Recent Places

,^al Libraries
ijj Documents
J> Music





'%• Computer
& Local Disk (CO
a Data (E:)
^" Documents (\\vt
i^- ISOs (\Wboxsrv) I




















4jp Network
-fc VBOXSVR

^ 2 items se ectec

Share with ^ New folder J^ <" gj >Q
Documents library
J Arrange by; Folder T
enu


auth 8A5/2Q11 2:55 PM File folder
help 8/15/2011 2:55 PM File folder
xa 8/15/2011 2:55 PM File folder
Q install 4/19/2010 10:08 AM Text Document 2 KB
H license 4/19/2010 10:08 AM Text Document 11 KB
!_, release 4/19/2010 10:08 AM Text Document 5 KB
^ sqljdbc —
5qljdbc4










open
Share with *
Send to *
Cut

Copy
Create shortcut
Delete
Rename

Properties
4/19/2010 10:08 AM Executable Jar File 505 KB
4/19/2010 10;Q8 AM Executable Jar File 525 KB










Date modified: 4/19/2010 10:08 AM Date created: 4/19/2010 10:08 AM
Size; 1.00 MB
Figure 58: Copying JAR files.
       Paste the JAR file(s) to the \Program Files\CANARY\lib directory (Figure 59).
                                                               I ° II B B
^^ j-9\ > Computer > Local Disk (O) > Programmes >
Organize - [^ Open New folder
. Favorites -, Name
• Desktop
* Downloads
^ Recent Places

^ Libraries
[3 Documents
J) Music
fc. Pictures
B Videcf

*s^ Hcmegroup

# Computer
£i Local Disk (C:)
j_j Data (E;)
^> Documents (\\vt
V ISOs (\\vboxsrvj i
_ CanerysCore
1. commons-math-2.0
,_ commons-math-2.0.zip.asc
Q commons- math-2.0-zip.md5
^ common5-math-2.0-javadoc
|i*j commons- math -2.0 -sources
,_ LICENSE
E _ NOTICE
,*j snakeyaml-1,7
| a sqljdbc
CANARY > lib > - | 4f \ | : p
SB - 13 »
Date modified Type Size
8/1/2011 3:12 PM Executable Jir File 81KB
3/31/2011 9:35 AM Executable Jar File 742KB
3/31/2011 9:35 AM Compressed (zipp... 10,434 KB
3/31/2011 9:35 AM ASC File 1 KB
3/31/2011 9:35 AM M05 File 1 KB
3/31/2011 9:35 AM ExecutableJarFile 4,454KB
3/31/2011 935 AM Executable Jar File 952KB
3/31/2011 9:35 AM Text Document 19KB
3/31/2011 9-35 AM Text Document 3 KB
7/7/2011 9:04 PM Executable Jar File 248 KB
4/19/2010 10:08 AM Executablejar File 506KB
[ H sql)dbc4 4/19/2010 10:08 AM Executable Jar File 525 KB





<3ji Network
;¥ VBOXSVR
2 items selected Date modified: 4/19/2010 10:08 AM
Size: 1.00 MB







Date created: 8/15/2011 2:57 PM
Figure 59: New JAR file location.


4.2 Modifying Configuration File to Use Databases
The configuration file tells CANARY that the input data is found within a database and several
options need to be defined. Open the configuration (YML) file in a text editor application.
   •   Go to the data sources section near the top of the file and set the type parameter to DB
       (see the highlighted section in Figure 60).
CANARY Training Tutorials
Page 62

-------
  # CANARY  Config File - Database driven tutorial example

  canary:
    run mode: BATCH
    control type: INTERNAL
    control messenger: null
    driver  files: null

  # Enter the time step options below
  timing options:
    dynamic start-stop: off
    date-time format: mm/dd/yyyy HH:MM:3S
    date-time start: 02/01/2008 00:00:00
    date-time stop: 02/28/2009 23:58:00
    data interval: 00:02:00
    message interval: 00:00:01

  # Enter the list of data sources below
  data sources:
    - id: database_sqlserver
      type:  db
      location:  jdbc:sqlserver://127.0.0.1;instanceName=SQLEXPRESS;databaseName=WDS_SIM
      enabled: yes
      timestep options:
        field: "TIME_STEP"
        format:  "120"
        conversion function: "CONVERT(datetime,"
 [j]    database options :
        time drift:
        JDEC2 class name: com.microsoft.sqlserver.jdbc.SQLServerDataSource
        input table: "dbo.test_station_d"
        input format: row based
        output table: "dbo.test_output_d"
 [Jl      login info:
         prompt for login: no
         username: sa
         password: SYSTEM

Figure 60: Data sources configuration section with the type parameter set as database.


    •   Set the location parameter to the user's database URL location (Figure 61). The format
        for this line is jdbc:(database vendor identifier)://(IP address); (database  specific options).
CANARY Training Tutorials                                                            Page 63

-------
  # CANARY Config File - Database driven tutorial example

  canary:
    run mode:  BATCH
    control type: INTERNAL
    control messenger: null
    driver files: null

  # Enter the  time step options below
  timing options:
    dynamic start-stop: off
    date-time  format: mra/dd/yyyy HH:MM:SS
    date-time  start: 02/01/2008 00:00:00
    date-time  stop: 02/28/200B 23:58:00
    data interval: 00:02:00
    message interval: 00:00:01

  # Enter the  list of data sources below
  data sources:
    - id: database_sqlserver
     type:  db
     location:  jdbc:sqlserver://127.0.0.1;instanceName=SQLEXERESS;databaseName=»DS_SIM
     enabled: yes
     timestep options:
       field: "TIME_STEP"
       format:  "120"
       conversion function: "CONVERT(datetime,"
     database options:
       time drift: 0
       JDBC2  class name: com.microsoft.sqlserver.jdbc.SQLServerDataSource
       input  table: "dbo.test_station_d"
       input  format: row based
       output table: "dbo.test_output_d"
       login info:
         prompt for login: no
         username: sa
         password: SYSTEM

Figure 61: Data sources configuration section with the location parameter listing URL of
database.


    •   Set the timestep options parameters (Figure 62). The timestep options parameters are
       field, format, and conversion function. 1]\Q field parameter is typically "Time_Step" and
       it must correspond to the name of the time step column within the database table. The
       format parameter is  defined by the database and it should match the value in the date-
       time format parameter in the timing options section. The conversion function parameter is
       specified by the database and should include any preceding commas or open parenthesis.
CANARY Training Tutorials                                                           Page 64

-------
  # CANARY Config File - Database driven tutorial example

  canary:
    run mode: BATCH
    control type: INTERNAL
    control messenger: null
    driver files: null

  # Enter the time step options below
  timing options:
    dynamic start-stop: off
    date-time format:  mm/dd/yyyy HH:MM:SS
    date-time start: 02/01/2008 00:00:00
    date-time stop: 02/28/2008 23:58:00
    data interval: 00:02:00
    message interval:  00:00:01

  # Enter the list of  data sources below
  data sources:
 E-J  - id: database_sqlserver
     type:  db
     location:  jdbc: sqlsetrver ://127.0.0.1; instanceName=SQLEXPRESS; databaseName=WDS_S IM
     enabled: yes
 Ej]   timestep options:
       field: "TIME_STEP"
       format:  "120"
       conversion function: "CONVERT(datetime, "
 Ej]   database options :
       time drift:
       JDBC2 class name: com.microsoft.sqlserver.jdbc.SQLServerDataSource
       input table: "dbo.test_station_d"
       input format:  row based
       output table:  "dbo.test_output_d"
       login info:
         prompt for login: no
         username: sa
         password: SYSTEM

Figure 62: Data sources configuration section with the timestep options parameters.


    •    Set the  database options parameters (Figure 63). The database options include many
       parameters, but the main ones which need to be defined are time drift, JDBC2 class
       name, input  table, and output table. The time drift parameter is database and computer
        specific and  is the difference in time between the local computer time and the time on the
       database machine in days. The JDBC2 class name parameter is a Java class name
       referenced in the JDBC database documentation and it will always have DataSource in
       the last part of the file name. The input table and output table parameters are database
        specific and  user defined.
CANARY Training Tutorials                                                          Page 65

-------
  # CANARY Config File - Database driven tutorial example

  canary:
    run mode:  BATCH
    control type: INTERNAL
    control messenger: null
    driver files: null

  # Enter the  time step options below
  timing options:
    dynamic start-stop: off
    date-time  format: mm/dd/yyyy HH:MM:SS
    date-time  start: 02/01/2008 00:00:00
    date-time  stop: 02/28/2008 23:58:00
    data interval: 00:02:00
    message interval: 00:00:01

  » Enter the  list of data sources below
  data sources:
    - id: database_sqlserver
     type:  db
     location:  jdbc: sqlserver:: //127 .0 .0 .1; instanceName=SQLEXPRES S; databaseName=WDS_SIM
     enabled: yes
     timestep options:
       field: "TIME_STEP"
       format:  "120"
       conversion function:  "CONVERT(datetime, "
     database options:
       time drift:
       JDBC2  class name: com.microsoft.sqlserver.jdbc.SQLServerDataSource
       input  table: "dbo.test_station_d"
       input  format: row based
       output table: "dbo.test_output_d"
       login  info:
         prompt for login: no
         username: sa
         password: SYSTEM

Figure 63: Data sources configuration section with the database options parameters.


    •   Under the database options parameter, set the login info parameters. These parameters
       arepromptfor login, username, andpassword. If the prompt for login parameter is set to
       yes, then the username and password parameters are not required and can be removed
       from the configuration file. If the parameters remain, then the parameter values must be
       blank. If prompt for login is set to no, the user must define the username andpassword
       parameters (Figure  64).
CANARY Training Tutorials                                                          Page 66

-------
  # CANARY Config File - Database driven tutorial example

  canary:
   run mode:  BATCH
   control type: INTERNAL
   control messenger: null
   driver files: null

   Enter the  time step options below
  timing options:
   dynamic start-stop: off
   date-time  format: mm/dd/yyyy HH:MM:SS
   date-time  start: 02/01/2008 00:00:00
   date-time  stop: 02/28/2008 23:58:00
   data interval: 00:02:00
   message interval: 00:00:01

  # Enter the  list of data sources below
  data sources:
   - id: database_sqlserver
     type:  db
     location:  jdbc:sqlserver://12'7.0.0.1; instanceName=SQLEXPRESS;databaseNanie=WDS_SIM
     enabled: yes
     timestep options:
       field: "TIME_STEP"
       format:  "120"
       conversion function:  "CONVERT(datetime,"
     database options:
       time drift:
       JDBC2  class name:  com.microsoft.sqlserver.jdbc.SQLServerDataSource
       input  table: "dbo.test_station_d"
       input  format: row based
       output table: "dbo.test_output_d"
       login  info:
         prompt for login: no
         username: sa
         password: SYSTEM

Figure 64: Data sources configuration section with database options login parameters.


   •   Additional optional parameters can be set by the user under the database options
       parameter. These options include input format and output format. If the input format
       parameter is omitted, then the format is column based (for other options, see Section 5.3
       of the CANARY User's Manual  (Hart and McKenna 2012)). The example in Figure 65
       shows the option for row based input. If the output format parameter is omitted, then the
       default format of the database table is used.
CANARY Training Tutorials                                                          Page 67

-------
  # CANARY Config File -  Database driven tutorial example

  canary:
    run mode: BATCH
    control type: INTERNAL
    control messenger:  null
    driver files: null

  # Enter the time step options below
  timing options:
    dynamic start-stop: off
    date-time format:  min/dd/yyyy HH:MM:SS
    date-time start:  02/01/2008 00:00:00
    date-time stop: 02/28/2008 23:53:00
    data interval: 00:02:00
    message interval:  00:00:01

  # Enter the list of data sources below
  data  sources:
 FJ  - id: database_sqlserver
     type: db
     location:   jdbc:sqlserver://127.0.0.1; instanceName=SO.LEXPRES S;databaseName=(JDS_SIM
     enabled: yes
     timestep options:
       field: "TIME_STEP"
       format:   "120"
       conversion function:  "CONVERT(datetime,"
     database options:
       time drift:
       ODBC2 class name: com.microsoft.sqlserver.jdbc.SQLServerDataSource
       input table:  "dbo.test_station_d"
       input format:  row based
       output table:  "dbo.test_output_d"
 [J]     login info:
         prompt for login: no
         username: sa
         password: SYSTEM

Figure 65: Data sources configuration section with the optional database options parameter,
input format.


    •   To test the database connection, save and run the configuration file in CANARY.
CANARY Training Tutorials                                                               Page 68

-------
5. Composite Signals Tutorials
The composite signals capability allows users to combine and modify input signals in order to
enhance detection. Composite signals are created through simple mathematical operations and
other logic statements. They can be combinations of water quality, operational, and/or calibration
signals. Composite signals can be used to create new calibration signals or to integrate
operational information into the event detection process.

Three tutorials that highlight the development of different composite signals are provided. The
files associated with these tutorials are located in the "Tutorial_Files\Composite_Tutorials"
directory.  The three tutorials are:

   •   Creation of a calibration signal to suppress alarms for a fixed time period after a
       calibration event.
   •   Integration of flow data from a single pump into a composite signal.
   •   Integration of tank levels and tank outlet water quality into a composite signal.

5.1 Suppressing an Alarm After a Calibration  Event
Alarms after a calibration event are a common issue in utilities when there is a significant
difference in the water quality values before and after calibrating the sensor(s). Suppressing
alarms for a set time period following a calibration event allows the algorithm data windows to
be filled with new data prior to re-starting online event detection. A composite signal  can be
defined to extend  the signal calibration time period by a set amount. This tutorial demonstrates
this approach.

In this tutorial, the input data is contained in the file, CTFD_caltest_mod.csv, which is located in
the "Tutorial_Files\Composite_Tutorials\Composite_Signals_l\Initial" directory. This file
contains data from a single monitoring station that collected data every two minutes for six water
quality signals: residual chlorine (CL2), specific conductivity (COND), oxidation reduction
potential (ORP), pH (PH), temperature (TEMP), and total organic carbon (TOC). A calibration
signal is applied to all sensors at the monitoring station in a 0/1  format. For this calibration
signal, 0 indicates calibration and 1 indicates normal operations. Note that this logic is the
reverse of that typically used in distribution systems.

Figure 66  shows the water quality signal plots as well as the CANARY LPCF detection plot for a
single day, January 3, 2011, based on running the configuration file,
CTFD_composite_A_mod.yml. The YML file specifies the LPCF algorithm with a history
window parameter of 1080 time steps (1.5 days) and an event threshold parameter of 0.90 in the
algorithms section. It also specifies the six signals to be included in the analysis in the
monitoring stations section, as well as the calibration signal RAW_CAL. The calibration alarm
is turned on around 10:30 AM on January 3, 2011, and stays on until 11:04 AM as shown by the
green bars in Figure 66. The TOC value drops to zero quickly and stays there throughout the rest
of the day. Since the calibration signal was used as part of the analysis, the calibration periods
are ignored and CANARY does not identify an event during this period.

In contrast, Figure 67 shows the water quality signal plots and detection results for January 6,
CANARY Training Tutorials                                                      Page 69

-------
2011. In this case, the calibration alarm is on from 9:44 AM to 10:32 AM. The TOC signal takes
about 14 minutes before it starts to return to normal values, first bouncing up rapidly to a value
near 3 and then stabilizing down to a value near 1. During this transition period, CANARY
identifies an event. As the calibration period has ended, this data is not ignored by CANARY.
                              CTFD 2011-01-03 00:00:00 to 2011-01-03 23:58:00
 DSTCWSHTCHCL2XV
         CL2(mg/L)
                  00  01 02 03  04 05 06  07  08 09  10  11 12 13  14 15 16  17  18 19  20  21 22 23  00
    DST CWS HTCH V
       COND (|iS/cm)
400

200
                  00  01 02 03  04 05 06  07  08 09  10  11 12 13  14 15 16  17  18 19  20  21 22 23  00
DST CWS HTCH ORPx V 600
         ORP (ppm) 400
                200
                  0
                  00  01 02 03  04 05 06  07  08 09  10  11 12 13  14 15 16  17  18 19  20  21 22 23  00
  DST CWS HTCH PHxx V 8
            PH(PH) 6
                 4
                 2
                 0
                  00  01 02 03  04 05 06  07  08 09  10  11 12 13  14 15 16  17  18 19  20  21 22 23  00
     DST CWS HTCH V
         TEMP(deg) 10
                 5
                  00  01 02 03  04 05 06  07  08 09  10  11 12 13  14 15 16  17  18 19  20  21 22 23  00
  DST CWS SIEV TOCx V  1
         TOC (mg/L)
                  00  01 02 03  04 05 06  07  08 09  10  11 12 13  14 15 16  17  18 19  20  21 22 23  00
             o    1
            LLCO
            y?  0.5
                  00  01 02 03  04 05 06  07  08 09  10  11 12 13  14 15 16  17  18 19  20  21 22 23  00
Figure 66: Output graph produced for January 3, 2011.
CANARY Training Tutorials
                                                                           Page 70

-------
                               CTFD 2011 -01 -06 00:00:00 to 2011 -01 -06 23:54:00
 DST CWS HTCH CL2x V
          CL2(mg/L)
                   00 01  02  03  04 05 06 07  08  09  10 11  12  13  14 15 16 17  18  19 20 21  22  23  00
    DST CWS HTCH V
                 400

                 200
                   00 01  02  03  04 05 06 07  08  09  10 11  12  13  14 15 16 17  18  19 20 21  22  23  00
DST CWS HTCH ORPx V 600
         ORP (ppm) 400
                 200
                  0
                   00 01  02  03  04 05 06 07  08  09  10 11  12  13  14 15 16 17  18  19 20 21  22  23  00
  DST CWS HTCH PHxx V 8
             PH (PH) 6

                  2
                  0
                   00 01  02  03  04 05 06 07  08  09  10 11  12  13  14 15 16 17  18  19 20 21  22  23  00
     DST CWS HTCH V 15
         TEMP(deg) 10
                  5
                   00 01  02  03  04 05 06 07  08  09  10 11  12  13  14 15 16 17  18  19 20 21  22  23  00
   DST CWS SIEV TOCx V
          TOC(mg/L) 2
                  1
                   00 01  02  03  04 05 06 07  08  09  10 11  12  13  14 15 16 17  18  19 20 21  22  23  00
              O   1
            LLOO
                 0.5
              !=   0
                   00 01  02  03  04 05 06 07  08  09  10 11  12  13  14 15 16 17  18  19 20 21  22  23  00

Figure 67: Output plot produced for January 6, 2011.


This event is clearly a byproduct of the calibration period and should not be identified as a real
event. Either users can use their expert judgement to determine that this is a false alarm and
ignore it, or they can attempt to use another feature of CANARY to suppress these types of
events that follow calibration periods. In what follows, the second option is demonstrated.

In order to suppress events identified following calibration periods, a new composite signal is
created by extending the original calibration signal, RAW_CAL, by thirty time steps. A portion
of the signals section of the YML  file is shown in Figure 68 where the calibration signal is
defined.

 -  id:  RAW_CAL
   SCADA tag: DST_CHS_CTFD_MON_OSXX_CMD
   evaluation type:  cal
   parameter type:  Raw CAL
   ignore changes:  none
   alarm options:
    value when  active:

Figure 68: Signals configuration section for original calibration signal, RAW_CAL.
CANARY Training Tutorials
Page 71

-------
A new composite signal is created in the signals section of the YML file to reverse the
calibration flags, so that 0 indicates normal operations and 1 indicates calibration (Figure 69). At
each time step, the  composite rule subtracts 1.0 from the RAW_CAL signal and then takes the
absolute value of the resulting number. Note that the signal evaluation type parameter is set to
OP for two reasons. First, so that the signal can be graphed and second, so a new composite
signal can be defined as the calibration signal. The new YML file, CTFD_caltest_mod.csv, is
found in the "Tutorial_Files\Composite_Tutorials\Composite_Signals_l\Flip_CAL" directory.

- id: FLIP_CAL
  SCADfl tag: FLIP_CAL
  evaluation type:  op
  parameter type:  Flipped Calib
  ignore changes:  none
  data options:  #  DATA
    precision:  L.I
    units:  "
    valid range: [0.0,  1.0]
    set points:  [-.inf,  .inf]
  composite rules:  I
    §RAW_CAL[0]
    (1)

    abs

Figure 69: Signals configuration section for composite calibration signal.
CANARY Training Tutorials                                                      Page 72

-------
# Enter the list of event detection algorithms below
algorithms:
- id: CTFD_ALG
  type: LPCF
  history window: 1090
  outlier threshold:
  event threshold: 0.95
  event timeout:
  event window  save--
  BED:
    window:
    outlier probability:  u.S
t Enter the list of monitoring stations below
monitoring stations:
- id:  CTFD
  station id number:
  station tag name: CTFD
  location id number:
  enabled: yes
  Inputs:
    -  id: files_in
  outputs:
  signals:
    -  id: RAH_CAL
    -  id: FLIP_CAL
    -  id: CTFD_H2O>:HTCH_CL2x_V
    -  id: CTFD_H2OXHTCH_COND_V
    -  id: CTFD_H2OxHTCH_ORPx_V
    -  id: CTFD_H20xHTCH_PHxx_V
    -  id: CTFD_H20xHTCH_TEMP_V
    -  id: CTFD_H2OxSIEV_TOCx_V
  algorithms:
    -  id: CTFD ALG
Figure 70: Algorithms and monitoring stations configuration sections.

The RAW_CAL and FLIP_CAL signals are plotted in Figure 71 with the rest of the data by
switching the signal evaluation type parameter from CAL to OP. Since these two signals are now
operational (OP) signals, they have green y-axis labels to distinguish them from the water quality
(WQ) signals. Note that these signals must be included in the monitoring stations section of the
YML file in order to appear on the graphs, see Figure 70. The newly created composite signal is
working correctly, since the FLIP_CAL signal is a mirror image of the RAW_CAL signal.
CANARY Training Tutorials                                                            Page 73

-------
                             CTFD 2011-01-06 00:00:00 to 2011-01-06 23:54:00
DST CWS MON OSxx CMD
          Raw CAL
                  00 01 02 03 04 05 06 07 08 09 10  11 12  13 14  15 16  17 16  19 20  21 22  23 00
          FLIP CAL
        Flipped Calib
  DST CWS HTCH CL2x V
         CL2 (mg/L)



-n •;
1 1 1 I 1 1 1 1 ! 1 1 1 1 ! 1 1 1 1 ! 1 1 1 1
: ; ! : : : ; ; FT [- : ; ; : : : ; ! : : : \
': - ' '. '- 1 - 1 : : : : : : : :
i i i i I i i i i I i i i i i i i i i i i i i
                  00 01 02 03 04 05 06 07 08 09 10  11 12  13 14  15 16  17 18  19 20  21 22  23 00
                 '
                  00 01 02 03 04 05 06 07 08 09 10  11 12  13 14  15 16  17 18  19 20  21 22  23 00
     DST CWS HTCH V 520
       COND (|j.S/cm)
                500
                  00 01 02 03 04 05 06 07 08 09 10  11 12  13 14  15 16  17 18  19 20  21 22  23 00
 DST CWS HTCH ORPxV
         ORP(ppm) 700
                  00 01 02 03 04 05 06 07 08 09 10  11 12  13 14  15 16  17 18  19 20  21 22  23 00
  DST CWS HTCH PHxx V g 4
           PH (PH)
                9.2
                  00 01 02 03 04 05 06 07 08 09 10  11 12  13 14  15 16  17 18  19 20  21 22  23 00
     DST CWS HTCH V 15
         TEMP(deg) ™
                12
                11
                  00 01 02 03 04 05 06 07 08 09 10  11 12  13 14  15 16  17 18  19 20  21 22  23 00
   DST CWS SIEV TOCx V
          TOC(mg/L) 2
                 1
                  00 01 02 03 04 05 06 07 08 09 10  11 12  13 14  15 16  17 18  19 20  21 22  23 00
             O   1
            LLCO

                "
             !=   0
                  00 01 02 03 04 05 06 07 08 09 10  11 12  13 14  15 16  17 18  19 20  21 22  23 00

Figure 71: Output graph produced with composite signal.

To suppress alarms for a user-specified time period after calibration, a new composite signal that
uses the FLIP_CAL signal and stays as a 1 for 30 time steps beyond the most recent calibration
event can be used. The new YML file is found in the
"Tutorial_Files\Composite_Tutorials\Composite_Signals_l\Suppress" directory. Because of a
constraint that limits the total number of parenthetical statements within a composite signal to 16
(32 pairs), the composite signal definition needs to be broken down into smaller parts to cover all
30 times steps. The first composite signal  covers the first 10 time steps, the second covers the
second 10 time steps, and the third covers the third 10 time steps. In Figure 72, the first of three
composite signals that provide the basis of alarm suppression over 30 consecutive time steps is
defined.
CANARY Training Tutorials
Page 74

-------
- id:  CAL_TIHE_OUT_CTFD_A
  SCSDJl tag: CAL_TIHE_OUT_CTFD_A
  evaluation type:  op
  parameter type:  Calibration Time Out
  ignore changes:  none
  data options:  »  DATA
    precision:  I
    units:  ' '
    valid range:  [-.inf, .inf]
    set points:  [-.inf, .inf]
  composite rules:  I
    @FLIP_CBL[0]
    OFLIP CBL[1]
    3FLIP_CBL[0]
    @FLIP_CSL[2]

    abs
    max
    @FLIP_CaL[0]
    9FLIP_CHL[3J

    abs
    max
    (5FLIPCBL[0]
    |JFLIP_C31L [4]

    abs
    max
    @FLIP_CHL[0]
    @FLIP_C»L[3]

    abs
    max
    @FLIP_CBL[0]
    @FLIP CBL[6]

    abs
    max
    @FLIP_CftL[0]
    @FLIP_C»L[7]

    abs
    max
    OFLIP CJIL[0]
    (§FLIP C»L[8]

    abs
    max
    i?FLIP CAL[0]
Figure 72: Signals configuration section for CAL_TIME_OUT_CTFD_A.

Composite signals use a memory stack and only two numbers can be on the stack at any one
time. In plain English, the logic above says:

    •   Subtract the previous value of FLIP_CAL (@ FLI P_CAL [ 1 ] ) from the current value of
CANARY Training Tutorials                                                          Page 75

-------
      FLIP_CAL (@FLIP_CAL [ 0 ]) and take the absolute value of the result. One value is on
      the stack.
   •  Subtract the second previous value of FLIP_CAL (@FLIP_CAL [ 2 ] ) from the current
      value of FLIP_CAL and take the absolute value of the result. Two values are on the
      stack.
   •  Take the maximum of the two values on the stack. This leaves a single value, the
      maximum, on the stack.
   •  Subtract the third previous value of FLIP_CAL (@ FLI P_CAL [ 3 ] ) from the current
      value of FLIP_CAL and take the absolute value of the result. Two values are again on the
      stack.
   •  Take the maximum of the two values on the stack. This leaves a single value, the
      maximum, on the stack.
   •  Continue the same steps through the 10th previous value (@ FL I P_CAL [ 10 ] ). A single,
      final value on the stack is the maximum across all 10 comparisons.

The composite signal definition could also be written as the following equation:
                         max {\FLIP G4L(0)  -  FLIP CAL(i}\}
                         i=i-io      ~               ~

Two additional composite signals are created using the same exact commands given in Figure 72
except the numbers in the shift are 11-20 for the composite signal CAL_TIME_OUT_CTFD_B
and 21-30 for the composite signal CAL_TIME_OUT_CTFD_C. These three composite signals
track the maximum difference between the value of FLIP_CAL at the current time step and the
value of FLIP_CAL at time steps ranging from 1 to 30 time steps prior to the current time step.
The final  composite signal, FINAL_CAL_CTFD, combines and retains the maximum value of
any of these three, 10 time-step long composite signals (Figure 73). Note that this is the only
signal in the YML file with an evaluation type parameter value set to CAL, and that this signal
was included in the monitoring stations section of the YML file in order to be included in the
detection analysis.

 - id:  FINAL_CAL_CTFD
  SCADA tag: FINAL_CAL_CTFD
  evaluation type:  cal
  parameter type: Final Calibration
  ignore  changes: none
  alarm options:  #  ALARM
    value when active: 1
  composite rules:  I
    @ CALJT IME_OUT_CTFD_R [ 0 ]
    @CAL_TZME_OUT_CTFD_B [0]
    max
    @ CAL_T IME_OUT_CTFD_C [ 0 ]
    max

Figure 73: Signals configuration section for FINAL_CAL_CTFD.

The results from running the configuration file with the new calibration signal are shown in
Figure 74, in which periods of alarm suppression due to calibration are shown in green. By
comparing Figure 71 and Figure 74, a few differences can be seen. The previously detected event
CANARY Training Tutorials                                                    Page 76

-------
after the calibration period due to TOC suddenly increasing just before 10:00 AM is no longer
detected, since the event occurs during a period of alarm suppression extending thirty time steps
after calibration.
                             CTFD 2011-01-06 00:00:00 to 2011-01-06 23:54:00
 DSTCWSHTCHCL2xV
         CL2(mg/L)
                 00 01 02  03 04  05 06  07 08  09 10 11  12 13  14 15  16 17  18 19  20 21  22 23 00
    DST CWS HTCH V
      COND (jiS/cm)
400
200
                 00 01 02  03 04  05 06  07 08  09 10 11  12 13  14 15  16 17  18 19  20 21  22 23 00
DST CWS HTCH ORPx V 600
        ORP (ppm) 400
               200
                 0
                 00 01 02  03 04  05 06  07 08  09 10 11  12 13  14 15  16 17  18 19  20 21  22 23 00
  DST CWS HTCH PHxx V 8
            PH(PH) 6
                 2
                 0
     DST CWS HTCH V
         TEMP(deg) 10
                 5
                 00 01 02  03 04  05 06  07 08  09 10 11  12 13  14 15  16 17  18 19  20 21  22 23 00
                15
                 00 01 02  03 04  05 06  07 08  09 10 11  12 13  14 15  16 17  18 19  20 21  22 23 00
  DST CWS SIEV TOCx V  1
        TOC (mg/L)

                 0
                 00 01 02  03 04  05 06  07 08  09 10 11  12 13  14 15  16 17  18 19  20 21  22 23 00
            O   1
           LLCO
             s=   0
                 00 01 02  03 04  05 06  07 08  09 10 11  12 13  14 15  16 17  18 19  20 21  22 23 00

Figure 74: Output graph produced with alarm suppression composite signal enabled.
5.2 Integrating Pump Flow Operational Data into Event Detection
This tutorial demonstrates the creation of composite signals that integrate available operational
data into the event detection process. Often water quality is monitored at the same locations that
control operations in the distribution network (e.g., a pump station, tank, or valve). Operational
alterations could cause changes in the water quality at the co-located monitoring station. If the
changes in the operational control are recorded and available through the SCADA system, then
they can be integrated into the event detection process.

The files associated with this tutorial are located in the
"Tutorial_Files\Composite_Tutorials\Composite_Signals_2" directory. The  data for this tutorial
are found in the StationD_train.csv file which contains data from a four-month period in 2008.
This monitoring station, Station D, recorded a number of primary water quality parameters every
two minutes including: residual  chlorine (CL2), specific conductivity (COND), pH (PH),
temperature (TEMP), turbidity (TURB), and total organic carbon (TOC). The  SCADA system at
this site provides a calibration signal in a 0/1 format to indicate calibration time periods. The
calibration signal applies to all sensors at the monitoring station.
CANARY Training Tutorials
                                                                     Page 77

-------
This configuration file for this tutorial is called, StationD_Initial.yml. The monitoring stations
section of the configuration file is shown in Figure 75 with seven signals included and one
algorithm, Bl. Algorithm Bl is of type LPCF with a history window of 1080 time steps (1.5
days) and an event threshold of 0.995.

 monitoring stations:
I-  id:  StationD
   station  id number:
   station  tag name: StationD
   loc at ion id number:  -1
   enabled:  yes
I   input s:
     - id:  stationd_in
   outputs:
I   signals:
     - id:  TEST_CAL
     - id:  TEST_CL
     - id:  TEST_PH
     - id:  TESTJTEMP
     - id:  TEST_COND
     - id:  TESTJTURB
     - id:  TESTJTOC
I   algorithms:
     - id:  Bl

Figure 75: Monitoring stations configuration section.


The results for the second week of the analysis (1/23-1/29/2008) are shown in Figure 76. The
bright green signal values (green boxes in Figure 76) in the afternoon of January 29th indicate a
short calibration event at this station as indentified by the TEST_CAL signal, which defines time
periods of manual calibration. CANARY ignores water quality signals from this station during
the indicated period of calibration.

During this two-week period, three events are identified based on changes in the CL2 and PH
data. These alarms are considered to be false positives by the water utility as they are relatively
small changes, fall within the normal range for these parameters, and occur on a regular basis
throughout the four month period. It is hypothesized that the changes in the water quality values
are caused by routine operational changes at this monitoring station.
CANARY Training Tutorials                                                      Page 78

-------
                           StationD 2008-01-23 00:00:00 to 2008-01-29 23:54:00
           23-Jan      24-Jan       25-Jan       26-Jan      27-Jan       28-Jan      29-Jan       30-Jan
      DPH
     PH(pH) 8.6

          8.4
           23-Jan      24-Jan       25-Jan       26-Jan      27-Jan       28-Jan      29-Jan
      D TEMP 4
    TEMP (°C)
    DCOND
COND (iiSfcm)
          320
    D TURB 0.1
 TURB (NTU)
     DTOC 0.8
   TOC(ppm) °-6
          0.4
          0.2
           0
       o
     LLCO
          0.5
                                                                                    30-Jan
           23-Jan      24-Jan       25-Jan       26-Jan      27-Jan       28-Jan      29-Jan       30-Jan
          325,	,	,	,	,	,	,	
                                                                                _m_J
           23-Jan      24-Jan       25-Jan       26-Jan      27-Jan       28-Jan      29-Jan       30-Jan
                                                                                ..u
           23-Jan      24-Jan       25-Jan       26-Jan      27-Jan       28-Jan      29-Jan       30-Jan
           23-Jan      24-Jan       25-Jan       26-Jan      27-Jan       28-Jan      29-Jan       30-Jan
           23-Jan      24-Jan       25-Jan       26-Jan      27-Jan       28-Jan      29-Jan       30-Jan

Figure 76: Output graph produced of Station D data with a calibration period and three
alarms.

This monitoring station has a number of additional signals, such as, several pump signals and
other water quality signals from mains that feed into this station. The monitoring stations section
of the configuration file, StationD_Stepl.yml, was altered to allow for the display of the three
pump signals (Figure 77). By plotting these signals, a relationship between water quality signals,
operational signals, and the identified events can be determined.
CANARY Training Tutorials
Page 79

-------
 monitoring stations:
 -  id: StationD
   station id number:
   station tag name: StationD
   location id number:
   enabled: yes
 I  input s:
     - id:  stationd_in
   outputs:
   signals:
     - id:  TEST_CAL
     - id:  TEST_CL
     - id:  TEST_PH
     - id:  TEST_TEMP
     - id:  TEST_COND
     - id:  TESTJTURB
     - id:  TEST_TOC
     - id:  TEST_PUMP_CONN_FLOW
     - id:  TEST_PUMP1_FLOW
     - id:  TEST_PUMP2_FLOW
   algorithms:
     - id:  Bl

Figure 77: Monitoring stations configuration section with three pump signals for Station D.


Figure 78  shows the same data as Figure 76 with the addition of the three pump signals. The
operational signals are shown with green labels on the y-axis. Examination of Figure 78 shows
that the water quality events might be linked to one or more of the changes in pumping rates.
Although  it is not exactly clear which pump could be responsible for the water quality events, it
appears that changes in the pump on inlet 2, the D PUMP IN2 FLOW signal in Figure 78, might
be causing the events. The timing of the identified events appears to correspond to large changes
in this signal. In order to test this hypothesis, composite signals are added to the configuration
file that will tell CANARY to ignore data during periods of large changes in the D PUMP IN2
FLOW signal.
CANARY Training Tutorials                                                       Page 80

-------
        DCL2
     CL2 (Mgfl.)
       OPH
      PH (pH) 9
     DTEMP
    TEMP (°C)
     DCOND
  COND I
          200
     D TURB 0 2
   TURB (NTU) 0 15
          0.1
          0.05
       DTOC
     TDC (ppm)
  D PUMP FLOW
    FLOW(gpm)
D PUMP IN1 FLOW
    FLOW (gpm]
D PUMP IN2 FLOW
    FLOW (gpini
                       StationD 2008-01-23 00:00:00 to 2008-01-29 23:54:00
           23-Jan     24-Jan     26-Jan     26-Jan     27-Jan     2S-Jan     29-Jan     30-Jan
           10,
           23-Jan     24-Jan     25-Jan     26-Jan     27-Jan     28-Jan     29-Jan     30-Jan
 23-Jan     24-Jan     25-Jan     26-Jan     27-Jan     28-Jan     29-Jan      30-Jan

400 -	•	;	:	i	!	i	
           23-Jan     24-Jan     25-Jan     26-Jan     27-Jan     28-Jan     29-Jan
           23-Jan     24-Jan     25-Jan     26-Jan     27-Jan     28-Jan     29-Jan     30-Jan
           5,	1	1	,	1	1	1	
           23-Jan     24-Jan     26-Jan     26-Jan     27-Jan     28-Jan     29-Jan     30-Jan
           40,	1	1	,	1	1	1	
               ~L
O
           23-Jan     24-Jan     25-Jan     26-Jan     27-Jan     28-Jan     29-Jan     30-Jan
           40,—
       (rim   n   finnnt h-imiiri
           23-Jan     24-Jan     25-Jan     26-Jan     27-Jan     28-Jan     25-Jan     30-Jan
           SO
1
_.. IL_ 	 : _ JL JL.__:___

~ _ .1 L 	
           23-Jan     24-Jan     25-Jan     26-Jan     27-Jan     28-Jan     29-Jan     30-Jan
           1r	
       J^ 0.5
           23-Jan     24-Jan     25-Jan     26-Jan     27-Jan     28-Jan     29-Jan
                                                                    30-Jan
Figure 78: Output graph produced with the additional three operational signals.

In the StationD_Step2.yml file in the
"Tutorial_Files\Composite_Tutorials\Composite_Signals_2\Step 2" directory, a new composite
signal is created called CMB_PUMP_FLOW. This new signal measures changes in the flow rate
over the last two time steps.  Specifically, it is the maximum value of the absolute differences in
the pump flow (TEST_PUMP2_FLOW) between the current time step [0] and the previous time
step  [1] or the current time step [0] and two time steps prior [2]. This composite signal is defined
as an operational signal (OP).

Another composite signal is created called CAL_PUMP_FLOW and it compares the output of
CMB_PUMP_FLOW (e.g., the change in the flow rate) at the current time to the constant value
of 5 gpm. The greater than or equal operator (ge) returns a  1 if the value of
CMB_PUMP_FLOW[0] is greater than or equal to 5 gpm and 0 otherwise. This signal is defined
as an operational signal so that it is visible on CANARY'S  output graph and to verify that is
CANARY Training Tutorials
                                                                        Page 81

-------
working correctly. These additions to the file are shown in Figure 79.
 - id: CMB_PUMP_FLOW
  SCftDA tag: CMB_PUMP_FLOW
  evaluation type:  op
  parameter type:  Max_4Min_Change
  ignore changes:  none
  data options:
    precision:
    units:  ''
    valid range: [-.inf, .inf]
    set points: [-.inf, .inf]
  composite rules:  I
    @TEST_PUME2_FLOW [0 ]
    @ T E S T_P UME 2_F1OW [ 1 ]

    abs
    @TEST_PUMP2_FLOW [0 ]
    @TEST_PUMP2_FLOW[2]

    abs
    max

 - id: CAL_PUMP_FLOW
  SCADA tag: CAL_PUMP_FLOW
  evaluation type:  op
  parameter type:  Pump2_Change
  ignore changes:  none
  data options: # DATA
    precision:
    unit s :  T T
    valid range: [-.inf, .inf]
    set points: [-.inf, .inf]
  composite rules:  |
    9 CMB_P UMP_FLOW [ 0 ]
    (5)
Figure 79: Signals configuration section with added composite signals.

Figure 80 shows the three operational signals and the resulting probability of an event after
running the YML file in CANARY. The top signal, D PUMP IN2 FLOW, is the original
operational signal from the SCADA system. The second signal is the CMB_PUMP_FLOW
composite signal, which calculates the maximum change in the D PUMP IN2 FLOW over the
past two time steps (4 minutes). The last signal is the CAL_PUMP_FLOW composite signal,
which has a value of 1 when changes in CMB_PUMP_FLOW exceed 5 gpm. From examination
of Figure 80, all signals appear to be working as designed. In addition, the CMB_PUMP_FLOW
signal value is typically near zero, but can range up to 30 gpm when there are sudden changes in
the D PUMP IN2 FLOW signal.
CANARY Training Tutorials                                                      Page 82

-------
D PUMP IN2 FLOW 50
FLOW (gpm)
0
23-
CMB PUMP FLOW 30
Max 4Min Change 20
10
0
23-
CAL PUMP FLOW 1
Pump2 Change Q 5
0
23-
0 1
LLCO
CL° 0.5
c* 0
23-
iH
	 ! 	 r
i
tJt=3 	 ! 	

1


~
"


Jan 24-Jan 25-Jan 26-Jan 27-Jan 28-Jan 29-Jan 30-Jan
_..


i i
	



....


Jan 24-Jan 25-Jan 26-Jan 27-Jan 28-Jan 29-Jan 30-Jan
-


i
Jan 24-Jan 25-Jan
-

j

26-J
i
1
1
an 27-Jan
i
i




28-Jan 29-Jan 30-Jan
Jan 24-Jan 25-Jan 26-Jan 27-Jan 28-Jan 29-Jan 30-Jan
Figure 80: Output graph produced with three operation signals and event probability.


If it seems reasonable to ignore data during periods of large changes in the operational signal
from one of the pumps, then another composite signal can be defined. This new signal,
CAL_ALL, is defined to be a calibration signal which will trick CANARY into ignoring data at
certain time periods. The new signal is created by modifying the existing calibration signal,
TEST_CAL, whenever the composite signal CAL_PUMP_FLOW is non-zero. Only one
calibration signal can be assigned per station, but this calibration signal can be a composite of
other signals. The original TEST_CAL signal is combined with the newly created
CAL_PUMP_FLOW into  a final calibration signal, CAL_ALL.

Since CAL_ALL is going to be the new calibration signal for Station D, the original TEST_CAL
signal's evaluation type parameter needs to be changed from CAL to OP. The alarm options
parameters of the original TEST_CAL calibration signal do not apply for an operational signal
and should not be used. These changes are made to the StationD_Step3.yml  configuration file
"Tutorial_Files\Composite_Tutorials\Composite_Signals_2\Step 3" directory. Figure 81 shows
the signals section of the YML file in which alarm  options parameters have  been commented;
alternatively, these three lines could be deleted from the configuration file.

-  id: TEST_CAL
   SCflDA tag: D_CAL
   evaluation type:  op
   parameter type:  CAL
   ignore changes:  none
   data options:  #  DATA
    precision:
    units: ''
    valid range:  [-.inf,  .inf]
    set points:  [-.inf,  .inf]
#  alarm options:  # ALARM
#   value when active: 0
#   scope: # entire station

Figure 81: Signals configuration section with changed calibration signal.
CANARY Training Tutorials
Page 83

-------
The CAL_ALL signal definition is shown in Figure 82 and it has the required alarm options
parameters. In addition, the CAL_ALL signal needs to be added to the monitoring station section
of the configuration file. The composite rule steps for this signal are:

   •   Subtract one from the current value of the TEST_CAL signal (@TEST_CAL [ 0 ] ) and
       take the absolute value of the result. This converts the original TEST_CAL signal to 1
       when active and 0 in the background.
   •   Add the result to the current value of the CAL_PUMP_FLOW signal
       (@CAL_PUMP_FLOW [ 0 ] ) and compare to zero. If the final result is true (sum > 0), the
       signal value for this time step will be 1; otherwise it will be 0.

I-  id:  CAL_ALL
   SCADA tag: CAL_ALL
   evaluation type:  cal
   parameter type:  Aggregate_CAL
   ignore changes:  none
I   alarm options:  # ALARM
     value when active:   1
I   composite rules:  I
     @TEST_CAL[0]
     (1)

     abs
     @ CAL_PUMP_FLOW [ 0 ]
     gt

Figure 82: Signals configuration section with the composite calibration signal.

Figure 83 shows the plots for the two inputs to the C AL_ALL signal and the probability of event
after running CANARY with this new composite calibration signal. The calibration signal used
in this analysis, CAL_ALL, is not plotted, but the calibration events it identifies are marked in
green. The three water quality events are still  identified by CANARY; thus, either these events
are not related to the pump activity, or the timing of the events is slightly out of step with the
changes in the water quality that caused CANARY to alarm. An additional look at the data in
Figure 78 shows that events are occurring roughly one hour (30 time steps) after the change in
status of D PUMP IN2 FLOW. This information can be included into the composite signal by
increasing the length of the calibration signal.
CANARY Training Tutorials                                                     Page 84

-------
CAL PUMP FLOW
Pump2 Change „
-1
23-
DCAL
CAL 1
0
.1




i


'



i i I i
Jan
24-Jan 25-Jan
26-Jan 27-Jan
28-Jan 29-Jan

30-Jan
III!
-

i
j

i
i

	 I 	 _

           23-Jan
                     24-Jan
                               25-Jan
                                        26-Jan
                                                  27-Jan
                                                            28-Jan
                                                                      29-Jan
                                                                               30-Jan
        O   1
      LLCO
           0.5
        c   0
           23-Jan
                     24-Jan
                               25-Jan
                                        26-Jan
                                                  27-Jan
                                                            28-Jan
                                                                      29-Jan
                                                                               30-Jan
Figure 83: Output plots produced using composite calibration signal.

These changes to the CMB_PUMP_FLOW composite signal were made in the configuration file,
StationD_Step4.yml, found in the
"Tutorial_Files\Composite_Tutorials\Composite_Signals_2\Step 4" directory. Figure 84 displays
the modified CMB_PUMP_FLOW signal in the YML file. This change looks for differences
between the current time step and 30 through 34 time steps (60 to 68 minutes) prior to the current
time step. The results of this change are shown in Figure 85. This change in the delay of the
calibration signal to an hour or more after the change in the pump status generates a calibration
signal during the time of the first two events and so they are no longer identified as events.
However, the third event occurs before a calibration period and thus is not clearly associated
with the changes in the pump operations.

This tutorial accomplishes the goal of using operational data to decrease false positive detection
of water quality events. The disadvantage of using composite signals to generate calibration
signals is that additional time steps are classified as being in calibration and all water quality
changes during those periods, not just those caused by the operational change, will  go unnoticed.
However, by limiting the period of calibration within the composite signal, the number of time
steps considered to be in calibration can be minimized.
CANARY Training Tutorials
Page 85

-------
 -  id: CMB_PUMP_FLOW
   SCADA tag: CMB_PUMP_FLOW
   evaluation type: op
   parameter type: Max_4Min_Change
   ignore changes: none
   data options:  # DATA
    precision:
    units:  "
    valid range:  [-.inf,  . inf ]
    set points:  [-.inf, .inf]
   composite rules: I
    @ TE ST_PUMP 2_FLOW [ 0 ]
    @TEST_PUMP2_FLOW [30 ]

    abs
    3 TE ST_PUMP 2_FLOW [ 0 ]
    @TEST_PUMP2_FLOW [31]

    abs
    @ TE ST_PUMP 2_FLOW [ 0 ]
    @TEST_PUMP2_FLOW [32 ]

    abs
    8 TE ST_PUMP 2_FLOW [ 0 ]
    @TEST_PUMP2_FLOW [33 ]

    abs
    8 TE ST_PUMP 2_FLOW [ 0 ]
    @TEST_PUMP2_FLOW [34 ]

    abs
Figure 84: Signals configuration section with the modified composite signal.
FLOW
hange g
-1
23-
DCAL 2
CAL 1
0
_1
JU
Jan





I
24-Jan 25-Jan
I
i
i



26-Jan 27-Jan
I
i
28-Jan



29-Jan



30-Jan
I I I !
-

j
I
j


i

	 I ..._

           23-Jan
                     24-Jan
                               25-Jan
                                         26-Jan
                                                   27-Jan
                                                             28-Jan
                                                                       29-Jan
                                                                                 30-Jan
        O   1
      LLCO
      0°  0.5
      LJ- ii
        C   0
           23-Jan
                     24-Jan
                               25-Jan
                                         26-Jan
                                                   27-Jan
                                                             28-Jan
                                                                       29-Jan
                                                                                 30-Jan
Figure 85: Output plots produced using modified composite signal.

Figure 86 shows the same data as Figure 85, but at a finer time resolution of only one day
(January 25th). The periods of calibration are just over an hour long due to the additional delay of
60-68 minutes. A final change is made to the CMB_PUMP_FLOW to examine the differences in
the pumping rates between 25 time steps prior to the current time and the same 30-34 time steps
CANARY Training Tutorials
Page 86

-------
prior to the current time as used previously (StationD_Step5.yml). These changes are shown in
Figure 87 and the associated files are found in the directory,
"Tutorial_Files\Composite_Tutorials\Composite_Signals_2\Step 5".
CAL PUMP FLOW
  Pump2 Change «
1 I l ! I 1 1 I 1 ! 1 I I 1
: : '• • : '•'•': '• '-
\ '•'-:•'- . : • '•
i i i 1 1 i 1 i 1 1 i 1 i i
!l i
i i i
! |
i i i i i
             00  01 02 03 04 05 06  07  08  09  10  11  12  13  14  15  16 17 18 19 20  21  22  23
       DCAL
        CAL "'
            0
            -1
i   i   i
             00  01 02 03 04 05 06  07  08  09  10  11  12  13  14  15  16 17 18 19 20  21  22  23  00
        O   1
      LLCO
           0.5
        c   0
             00  01 02 03 04 05 06  07  08  09  10  11  12  13  14  15 16 17 18 19 20  21  22  23  00
Figure 86: Output plots produced using modified composite signal for January 25th only.
   id:  CMB_PUMP_FLOW
   SCADA tag:  CMB_PUMP_FLOW
   evaluation type:  op
   parameter type:  Max_4Min_Change
   ignore changes:  none
   data options:  # DATA
     precision:  '
     units:  "
     valid range:  [-.inf,  .inf]
     set points:  [-.inf,  .inf]
   composite rules:  |
     f?TEST_PUMP2_FLOW [2 5 ]
     iiTEST_PDMP2_FLOW [30 ]

     abs
     9 TE ST_P UMP 2_FLOW [ 2 5 ]
     @IEST_PUMP2_FLOW[31]

     abs
     (3TEST_PUMP2_FLOW[25]
     9TEST_PUMP2_FLOW[32]

     abs
     @TEST PUMP2  FLOW[25]
     @TEST_PUMP 2_FLOW [33 ]

     abs
     8 TE ST_P UMP 2_FLOW [25]
     STEST_PUMP2_FLOW[34]

     abs
     max
Figure 87: Signals configuration section with final composite signal.
CANARY Training Tutorials
                                                                   Page 87

-------
The results from this final composite signal modification are shown for the entire week and
January 25th in Figure 88 and Figure 89, respectively. The total number of time steps within the
total calibration period is cut from 34 down to 9. The change in the size of the calibration period
is most clearly seen in Figure 89. The timing of the delay might not be quite optimal, as there are
still some non-zero event probabilities in the lowest graphs, but these levels are below the event
threshold needed to define a water quality event.
CAL PUMP FLOW
Pump2 Change n
-1
23-
DCAL 2
CAL 1
0
-1
23-
0 1
LLCO
U° 0.5
i"*
c o
23-




Jan
1 1
i
24-Jan 25-Jan
I J


i

i
26-Jan 27-Jan 28-Jan
')
i
29-Jan



30-Jan
Illll
—

Jan
_...

Jan
i
24-Jan 25-Jan
c- 1-2 : :

24-Jan 25-Jan


i
26-Jan 27-Jan
I
i
28-Jan
i
29-Jan
	 i 	 :

30-Jan

26-Jan 27-Jan 28-Jan
29-Jan
30-Jan
Figure 88: Output plots produced using final composite signal for entire week.
CAL PUMP FLOW
Pump2 Change Q
-1
0
DCAL 2
CAL 1
0
-1
0
0 1
LLOO
"° 0.5
i",
c: n
i i i i i i i ii
i i i i i i i i i i i i i i i i
!
i I i i i i
0 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 0
i i l i i i i i i i i i i
i i i i i i i i I i i i i i i
I i i i i
0 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 0
i i i i i i i i i i i i i i i I'M
i i i i i i
            00  01  02 03 04  05  06  07 08 09  10  11  12 13 14  15  16  17 18 19  20  21  22 23 00

Figure 89: Output plots produced using final composite signal for January 25th.

Additional adjustments to the composite signals can be made to increase or decrease the delay
and/or length of the calibration signal created to meet additional needs. Also, information from
other operational data streams can be integrated into the water quality event detection process
following this example.

5.3 Integrating Tank Level  Operational Information Into Event Detection
This tutorial demonstrates how composite signals integrate operational data into the event
detection process as a method for reducing false positives by using operational data to create
simulated calibration time periods. While this is similar to the previous tutorial, the approach is
different.
CANARY Training Tutorials
Page 88

-------
The data used in this tutorial are found in the file, CLDY_train.csv, found in the
"Tutorial_Files\Composite_Tutorials\Composite_Signals_3" directory. The file contains data
from a single water quality monitoring station which includes six primary water quality
parameters collected every two minutes over a six-week period. The water quality parameters are
residual chlorine (CL2), specific conductivity (COND), pH (PH), temperature (TEMP), turbidity
(TURB), and total organic carbon (TOC). An alarm status for each sensor, which indicates that
the data is unreliable, is also included. A calibration signal for the entire station is provided.


The data at this location is highly variable. Operational data might or might not be useful to help
reduce the false positive event detection rate. Therefore, the data file also provides a large
number of signals regarding the status of pumps and valves, along with flow rates, tank levels,
and pressures, measured from other locations in the same water distribution system. In addition,
a secondary chlorine sensor monitors the residual chlorine levels at the tank outlet that feeds
water into the main where the water quality monitoring station is located.


The large amount of operational data and the lack of knowledge regarding the correlation
between the operations and the water quality at this monitoring station make this a complicated
problem. A large number of possible  combinations of operational signals could be used to
suppress false alarms.  A subset of these possible combinations is examined here, with the
understanding that other combinations could provide results of equal or improved quality. The
configuration file for this tutorial is CLDY_Initial.yml and is found in the
"Tutorial_Files\Composite_Tutorials\Composite_Signals_3\Initial" directory. A set of
parameters for the LPCF and BED algorithms was developed to detect events. The monitoring
stations configuration section is shown in the Figure 90.

 monitoring stations:
]-  id: StationD
   station  id number:
   station  tag name: StationD
   location id number:
   enabled: yes
   inputs:
    - id:  stationd_in
   outputs:
]   signals:
    - id:  TEST_CL
    - id:  TEST_COND
    - id:  TEST_PH
    - id:  TESTJTEME
    - id:  TEST_TOC
    - id:  TEST_RES_CL
    - id:  TEST_TANK_ELEV
    - id:  TEST_TANK_FLOB
    - id:  TEST_CAL
    - id:  TEST_UPS_ALM
   algorithms :
    - id:  Bl

Figure 90: Monitoring stations configuration section.


Figure 91  shows the output produced by CANARY with this configuration file. Five distinct
events in the first week of the analysis (08/08-08/14/2008) are identified with multiple water
CANARY Training Tutorials                                                       Page 89

-------
quality signals contributing to each event. No calibration events were defined during this time
period, so there are no bright green time bars on the graph. The TEST_UPS_ALM signal is the
only calibration signal in the data set (not shown). Three operation signals that might be
correlated with the water quality changes are shown, including the CLDY_TANK_FLOW,
CLDY_TANK_ELEV and CLDY_OSXX_ALM signals. The CLDY_TANK_FLOW signal
does not provide any useful information for this week. The CLDY_TANK_CL2X signal is the
residual chlorine level coming out of the tank into the main and  it might provide some useful
information on reducing event detections. Additionally, the fluctuations in the
CLDY_TANK_ELEV signal could be of use.
     CLDYCL2X
      CL2 (Mg/L)  1
             0.5
              0'	
             08-Aug

    CLDYCOND 5°°
   COND (jiS/cm) 400

             300
             08-Aug

     CLDYPHXX 1°
        PH(pH) g
       CLDYTEMP
        TEMP (°C]
      CLDYTOCX
       TOC(ppm) 5
  CLDY TANK CL2X
      CL2 (Mg/L)
             0.5
CLDYTANKELEV 1025
     ELEV(feet) 1020
            1015
             08-Aug

  CLDY TANK FLOW 5
     FLOW(gpm) „ 	
             08-Aug

  CLDYOSXXALM 1-!?
     OPS(gpm) 0-5 ......
              0 -
        h§
 O1	
08-Aug
                           StationD 2008-08-08 00:00:00 to 2008-08-14 23:58:00
                           	1	1	1	1	r
              0	1	
             08-Aug      09-Aug

              ' '     "*"    '
            i	i	i	i	n       	1
          10-Aug      11-Aug      12-Aug     13-Aug      14-Aug      15-Aug
              0	i	i	
             08-Aug      09-Aug      10-Aug
_^	\__
09-Aug      10-Aug
                             11-Aug
                                                      i	v,	\j	|
                                                    12-Aug      13-Aug      14-Aug     15-Aug
09-Aug      10-Aug
                             11-Aug
                                                    12-Aug      13-Aug
14-Aug
15-Aug
             .0.5!	I	I	I	I	I	I	I
             08-Aug      09-Aug      10-Aug     11-Aug      12-Aug      13-Aug      14-Aug     15-Aug
              1 r	:	 I:	:	II	•	:	i	:	

                         i	H   I I	I   i    II
                       09-Aug      10-Aug     11-Aug      12-Aug


Figure 91: Output graph produced using initial configuration file.
                                                                i i    mi   i
                                                              13-Aug      14-Aug
                                                           15-Aug
CANARY Training Tutorials
                                                              Page 90

-------
The signals section of the configuration file, CLDY_Stepl.yml, found in the directory
"Tutorial_Files\Composite_Tutorials\Composite_Signals_3\Stepl" was altered to include two
new composite signals. The first new composite signal, REL_TANK_LVL, normalizes the water
elevation level of the tank to a value between 0 and 1. The second new composite signal,
TANK_CL_CHANGE, identifies significant changes within the tank outlet residual chlorine
value over the previous two time steps. The signals sections of the YML file for these signals are
shown in Figure 92 and Figure 93.

I- id: REL_TANK_LVL
  SCADA tag:  REL_TANK_LVL
  evaluation  type:  op
  parameter type: Calibration Time Out
  ignore changes: none
  data options:  # DATA
    precision: 1
    units:  T'
    valid range:  [0.0, 1.0]
    set points:   [-.inf,  . inf]
  composite rules:  |
    l3TEST_TANK_ELEV [0]
    (1.009630e+003)

    <1.027150e+003)
    (1.009630e+003)
Figure 92: Signals configuration section defining composite signal, REL_TANK_LVL.

In order to build the composite signal that defines the relative tank level, it is necessary to know
the minimum (1009.630 ft) and maximum (1027.150 ft) tank levels. The relative tank level is the
current value of the tank level (TEST_TANK_ELEV[OJ) minus the minimum value (1009.630)
divided by the full tank level range (1027.150-1009.630). This transforms the REL_TANK_LVL
into a value between 0 and 1.
CANARY Training Tutorials                                                     Page 91

-------
I- id: TANK_CL_CHANGE
  SCADA tag: TANK_CL_CHANGE
  evaluation type:  op
  parameter type:  Tank Chlorine Change
  ignore changes:  none
# alarm options:  #ALARM
#   value when active: 1
#   scope: # entire station
  data options:  * DATA
    precision: J.ILL!
    unit s : T T
    valid range:  [-.inf,  .inf]
    set points:  [-.inf,  .inf]
  composite rules:  |
    @TEST_RES_CL[0]
    @TEST_RES_CL[1]

    abs
    (i TE S T_RE S_CL [ 0 ]
    (iTEST_RES_CL [2 ]

    abs
    @TEST_RES_CL[0]
    @TEST_RES_CL[3]

    abs
    max
    (1.200000e-001)
    qt
Figure 93: Signals configuration section defining composite signal, TANK_CL_CHANGE.

The TANK_CL_CHANGE signal identifies the maximum absolute value of the difference in the
TEST_RES_CL signal between the current time step and any of the previous three time steps.
This maximum value is then compared to 0.12 and any value greater is a 1, otherwise it is a 0.
Thus, TANK_CL_CHANGE is equal to 1 when large changes have recently occurred in the
chlorine residual. This composite signal is of evaluation  type CAL and therefore the alarm
options parameter information is included in the signal definition. However, since the signal
needs to be viewed in the output graph, the evaluation type parameter is set to OP and the alarm
options parameter information is commented out with the # symbols.

Additional changes to the monitoring stations section of the configuration file include removing
the two operational signals, TEST_TANK_FLOW and CLDY_OSXX_ALM, which do not
provide any useful information about the events during this time period. Also, the TEST_TOC
signal does not appear to be functioning properly during  this time period. In the signals section,
the TEST_RES_CL signal is converted from the evaluation type parameter of WQ to OP for
graphing purposes.  The two new composite signals are added to the monitoring stations section
of the configuration file as shown in Figure 94. The main goal at this point is to plot these signals
to look for a relationship between them and the observed events.
CANARY Training Tutorials                                                     Page 92

-------
 monitoring stations:
]- id:  StationD
   station id number:
   station tag name:  StationD
   loc at ion id number:
   enabled: yes
1   input s:
     - id:  stationd_in
   outputs:
'.   signals:
     - id:  TEST_CL
     - id:  TEST_COND
     - id:  TEST_PH
     - id:  TEST_TEMP
     - id:  TEST_RES_CL
     - id:  REL_TANK_LVL
     - id:  TANK_CL_CHANGE
     - id:  TEST_UPS_ALM
!   algorithms:
     - id:  Bl

Figure 94: Monitoring stations configuration section composite signal enabled.


Figure 95 shows the new output graph with the additional composite signals. Examination of the
figure shows that the changes in the tank outflow residual chlorine appear to be indicators of
events. The relationship between the relative tank level and the events is less clear; however,
relative tank levels at or near 1.0 could be the cause of some events. The total number of events
for this week is now eight, since the removal of the TOC signal has caused some events that
were combined previously to now show up as individual events.
CANARY Training Tutorials                                                    Page 93

-------
                             StationD 2008-08-08 00:00:00 to 2008-08-14 23:58:00
       CLDY CL2X
        CL2 (Mg/L)
               1.5
               0.5
      CLDY COND
     COND (nS/cm) 40Q
       CLDY PHXX
          PH(pH)
   CLDY TANK CL2X
        CL2 (Mg/L)
     REL TANK LVL
  Calibration Time Out
   TANK CL CHANGE
Tank Chlorine Change
               08-Aug     09-Aug      10-Aug      11-Aug      12-Aug      13-Aug      14-Aug      15-Aug
               08-Aug     09-Aug      10-Aug      11-Aug      12-Aug      13-Aug      14-Aug      15-Aug
               08-Aug     09-Aug      10-Aug      11-Aug      12-Aug      13-Aug      14-Aug      15-Aug
               08-Aug     09-Aug      10-Aug      11-Aug      12-Aug      13-Aug      14-Aug      15-Aug
               08-Aug     09-Aug      10-Aug      11-Aug      12-Aug      13-Aug      14-Aug      15-Aug
               08-Aug     09-Aug      10-Aug      11-Aug      12-Aug      13-Aug      14-Aug      15-Aug
               08-Aug     09-Aug      10-Aug      11-Aug      12-Aug      13-Aug      14-Aug      15-Aug
               08-Aug     09-Aug      10-Aug      11-Aug      12-Aug      13-Aug      14-Aug      15-Aug
Figure 95: Output graph produced when using composite signal.


In order to test the hypotheses that the water quality events might be linked to changes in the
tank chlorine residual, the TANK_CL_CHANGE signal's evaluation type parameter is converted
back to CAL. In addition, the three lines that previously commented out when the evaluation
type parameter was OP need to be uncommented.  The revised signals section of the
CLDY_Step2.yml file found in the directory
"Tutorial_Files\Composite_Tutorials\Composite_Signals_3 \Step2" for the
TANK_CL_CHANGE signal is shown in Figure 96. The relative tank level is left as an
operation signal and so does not affect the CANARY analysis directly. The monitoring stations
section does not need to be changed, since the TANK_CL_CHANGE signal is already defined.
CANARY Training Tutorials
Page 94

-------
  id: TANK_CL_CHANGE
  SCADA tag: TANK_CL_CHAN6E
  evaluation type: cal
  parameter type: Tank Chlorine Change
  ignore changes: none
  alarm options:  * ALARM
    value when active:
  composite rules: I
    |JTEST_RES_CL [0]
    9TEST_RES_CL[1]

    abs
    @TEST_RES_CL [0]
    @TEST_RES_CL[2]

    abs
    @TEST_RES_CL [0]
    9TEST_RES_CL [3]


    max
    <1.200000e-001)
    gt
Figure 96: Signals configuration section with the modified TANK_CL_CHANGE signal.

Figure 97 shows the new output graph produced when using the TANK_CL_CHANGE as a
calibration signal. Multiple calibration events occur based on this new composite signal and are
shown in green. Only two water quality events are identified. Examination of these two events
shows that the first one, on August 11* , is caused by a short-lived change in the TEST_COND
and TEST_PH signals and is not correlated with changes in the residual chlorine level out of the
tank (TEST_RES_CL signal). In addition, the temperature signal, TEST_TEMP, changes at this
same time. The second event, at the end of August 12*, is associated with changes in the
TEST_RES_CL signal. The TANK_CL_CHANGE composite signal could be modified to be
longer in order to remove this event.
The relative tank level, as defined in the REL_TANK_LVL signal, is not a direct predictor of
events. A relationship does exist between periods of tank filling and sudden changes in the
TEST_RES_CL values, but the current calibration signal, TANK_CL_CHANGE, already uses
the TEST_RES_CL values directly, and the REL_TANK_LVL signal would only add redundant
information to the current calibration signal. For this reason, the REL_TANK_LVL signal is not
added to the calibration process.
CANARY Training Tutorials                                                    Page 95

-------
      CLDY CL2X
      CL2 (Mg/L)
              1.5
              0.5
     CLDY COND
    COND (MS/cm) 400
              08-Aug
              500
              300
      CLDYPHXX
         PH(pH)
08-Aug
10,	
               8
              08-Aug
  CLDY TANK CL2X
      CL2 (Mg/L)
    REL TANK LVL
Calibration Time Out
                             StationD 2008-08-08 00:00:00 to 2008-08-14 23:58:00
09-Aug
                        09-Aug
09-Aug
10-Aug
11-Aug
12-Aug
13-Aug
14-Aug
          10-Aug
          11-Aug
          12-Aug
          13-Aug
          14-Aug
10-Aug
11-Aug
12-Aug
13-Aug
14-Aug
              08-Aug      09-Aug      10-Aug      11-Aug      12-Aug      13-Aug

Figure 97: Output graph produced when using modified signal.
                                                   14-Aug
                                                                       15-Aug
          15-Aug
                                                                       15-Aug
                                                                                     15-Aug
                                                                                     15-Aug
                                                                                     15-Aug
                                                  15-Aug
In summary, adding a calibration signal that used the residual chlorine level of a tank outflow
was effective in reducing unwanted water quality events for this monitoring station downstream
of the tank. This tutorial also shows how to use two constant elevation values to define the
relative water tank level. Additional information, such as the rate of change (slope) of the tank
level could be calculated within a composite signal and added to the event detection process, but
that level of sophistication is not needed here.
CANARY Training Tutorials
                                                                Page 96

-------
References

Hart, D. B. and McKenna, S. A. 2012. CANARY User's Manual, Version 4.3.2. Washington,
D,C.: U.S. Environmental Protection Agency. EPA/600/R/08/040B.

McKenna, S. A., Vugrin, E. D., Hart, D. B, and Aumer, R. A. 2013. Multivariate Trajectory
Clustering for False Positive Reduction in Online Event Detection. Journal of Water Resources
Planning and Management. 139(1): 3-12.

Murray, R., Haxton, T., McKenna, S., Hart, D., Klise, K., Koch, M., Vugrin, E., Martin, S.,
Wilson, M., and Cruz, V. 2010. Water Quality Event Detection Systems for Drinking Water
Contamination Warning Systems: Development, Testing, and Application of CANARY.
Washington, D,C.: U.S. Environmental Protection Agency. EPA/600/R-10/036.

U.S. EPA. 2012. CANARY Quick Start Guide. Washington, D.C.: U.S. Environmental Protection
Agency. EPA/600/R-12/010.
CANARY Training Tutorials                                                    Page 97

-------
Appendix A: Frequently Asked Questions (FAQs)
This section provides answers to frequently asked questions from each of section of CANARY
Training Tutorials.

CONFIGURATION FILE FAQS

Q. What are the spacing requirements within an YMLfile? What conventions need to be paid
attention to when editing the configuration file?

A. Extra lines should be removed, and nothing should be indented using tabs. Indentation should
only be done using spaces.

Q. Can explanatory notes be added to the configuration file?

A. Yes. Notes are added as comments within YML formatted files. Comments are denoted with a
leading # on each line in the configuration file. Two types of comments are shown in Figure 98.
The # alarm options is a comment denoting that the alarm options parameter is a heading in the
YML for alarm information. Another comment is # entire station, which identifies that the scope
parameter with no value is applied to the entire station. The TANK_CL_CHANGE  signal was
originally defined as a calibration signal and as such it has the alarm options parameter
information. The signal was changed to evaluation type of OP and the alarm options parameters
are not needed. Rather than removing these parameters, as they might be used again in the future,
the entire alarm options parameter block is commented out by adding a # to the start of each line.
     id: TANK_CL_CHAHGE
     KCADR tag: TANK_CL_CHANGE
     esvs lust Ion typ*^: op
     param^t^r typ^: Tank Chlorine
     Ignore changers: non^
     dat a opt Ions;
       precision:   .   :
       unit s:  *T
       valid jtdtKfe:  [ - . Inf, .inf]
       set point H:  [ -.inf,  .inf]
     composite rules: j
       MI;! jrf s < i [ i; j
       MS! J:f :', '• I [ 1 ]
Figure 98: Example of comments within a configuration file.
CANARY Training Tutorials                                                   Page 98

-------
OPTIMIZING CANARY CONFIGURATION FILE FAQS

Q. Will multiple locations be on the same graph?

A. No. The EDSD files generated are specific to each station and so are the resulting graphs.
This feature allows one configuration file to be used for all the stations, but the results are kept
separate on a station by station basis.

Q. What if BED is not used? Will there still be values on an event probability plot in the output
graphs?

A. If BED is turned off, the resulting probability is the absolute value of the largest residual
across all sensors for that time step. It is no longer an actual probability, but a residual with
values greater than 1.0. The status column will now indicate an event for every time step where
the probability is greater than event threshold parameter.

This is a little bit of apples versus oranges in that CANARY is now comparing a residual value
against a probability threshold, but it makes sense in that whatever is called probability of event
should be consistently compared against the value of event threshold parameter. This is why the
organization of the configuration file has the parameter definition for event threshold parameter
outside of the BED parameters.

If a true probability of an event is wanted when using the LPCF or MVNN algorithms, BED
must be used. The SPPE and SPPB algorithms will produce a true probability of event value
without using the BED. The probability of an event is calculated by the BED using the approach
detailed in Murray  et al. (2010).

DATABASE DRIVEN INPUT/OUTPUT FAQS

Q. What do positive or negative values of the time drift parameter mean?

A. The time drift parameter accounts for the discrepancies between the clocks on two different
systems. Positive values mean that the computer running CANARY has a clock value behind the
database (e.g.,  CANARY'S clock says  12:30 but the database says 12:35). A negative value
means that CANARY'S clock is ahead of the database.  Because of latency in writing data values
to a database, it is a good idea to set this value so that CANARY is looking for new data a
minute or two after the database writes the data - this way there is less likely to be data loss
because of missed reads.

Q. What are the units of the time drift parameter?

A. The time drift parameter is defined in fractions of a day. It is specified in decimal format.

Q. My database JDBC driver has several classes, which class is the right one?

A. The class will have DataSource in its name, probably at the end of the class name. Pool time
classes are okay. The main thing to ensure is that this is a JDBC2 DataSource class, which
CANARY Training Tutorials                                                      Page 99

-------
should be listed in the class ancestors.

Q. My database vendor has several different JAR files on their website, which one do I
download?

A. This depends on which version of the database is being used. For example, Oracle™ has
different  drivers for versions 10 and 11 of their full database and different versions for their free
software  as well. Many of the newest installations will have a *jdbc*.jar file somewhere in the
installed Program Files folder, but when in doubt, ask the database administrator, the contractor,
or employee in charge of maintaining the database. Again, CANARY needs the JDBC2 drivers.

Q. How do I set the URL for my database?

A. The address will look similar to the addresses in the examples. However, this is very database
specific, and not every example can be shown. When in doubt, talk to the local database contact.

Q. / do not want my username and password in the configuration file. Does this mean I have to
type it in  every time I use CANARY?

A. Yes. The user will either need to protect the configuration file with access controls or enter a
password every time, because CANARY does not have encryption capabilities. Even with access
controls,  it is always a good idea to make the CANARY user have the least privileges possible to
protect both systems and the database.

COMPOSITE SIGNALS FAQS

Q. What are the mathematical operations that can be used in composite signals?

A. A description of the permissible operations and the symbols that define them are included in
the CANARY User's Manual (Hart and McKenna 2012).

Q. When  should I use pattern matching with trajectory cluster ing versus using composite signals
to decrease the events caused by water quality patterns?

A. If operational data are available that provide cues to when a water quality event is about to
happen, then composite signals are generally easier to implement and refine. When operational
data are not available, or not closely tied to the change in water quality (as can occur when the
operational change and the monitoring station are not located together), then pattern matching
can be implemented.
CANARY Training Tutorials                                                     Page 100

-------
Appendix B:  Binomial Distribution Function  Exercise
A useful exercise to help understand how CANARY'S BED parameters interact is to examine the
effect of the different parameters within a spreadsheet by using the binomial distribution
function. The BED is based on a binomial failure model that looks at the number of failures
(NFAILURES) within a certain number of trials (NTRIALS) given that the chance of a failure
occurring in any single trial (PFAIL) is constant. In terms of the CANARY event detection
process, NFAILURE is an outlier, NTRIALS is the number of time steps within a user-defined
window, and PFAIL is the chance of an outlier occurring at any time step. If there are more
outliers within the window than would be predicted, then this increases the likelihood that an
event is occurring.

Two parameters control the sensitivity of the event detection when using the BED, the size of the
BED window and the BED event threshold. While the event threshold parameter is not directly
one of the BED parameters, it uses the output of the BED to determine when an event occurs. By
changing the BED window parameter, the user can increase or decrease CANARY'S sensitivity.
For this tutorial, Microsoft® Office Excel® was used; however, any spreadsheet software that
includes a binomial  distribution function can be used. The spreadsheet function used to compute
the probability of an event is the following:

Probability of event = BINOMDIST(NFAILURES, NTRIALS, PFAIL, CDF)

Where the relationship between the parameters in the spreadsheet function and those in
CANARY are:
NTRIALS = BED window
PFAIL = BED outlier probability
CDF = 1.0 (Not an input  to CANARY)
NFAILURES = The number of outlier time steps that occur within the BED window. This
number is calculated within CANARY.

For this tutorial, complete the following steps:

   •  Open a Microsoft® Office Excel® file and make a column that contains the numbers 0 to
      20. This column will represent the NFAILURES (number of outliers) in the function
      (Figure 99).
CANARY Training Tutorials                                                  Page 101

-------
        Al
              B
                            D
                                                                             K
          1
          2
          3
          4
          5
          6
          7
8
9




















  Select destination and press ENTER or choose Paste
Figure 99: NFAILURES column using a NTRIALS of 20.

   •   In the Bl cell, type the following function: = BINOMDIST(A1, 20, 0.5, 1) (Figure 100).
       This function uses the Column A values as the number of outliers within the BED
       window.
CANARY Training Tutorials
Page 102

-------
                             = BINOMDIST(Alr 20, 0.5,1)
                                     E      F      G
2
3
4
5
6
-j
S
9
10
11
12
13
14
15
16
17
IS
19
20
21
22
23
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20






























































































































































































































     K H | sheetl  SheetZ .- Sheets /
  Ready
Figure 100: Binomial distribution function using a NTRIALS of 20 and a PFAIL of 0.5.
       Copy the Bl cell to cells B2 through B21 (Figure 101).
CANARY Training Tutorials
Page 103

-------
CB22 - (^ £
A
B
C

D
E
F
G
H
1
J
K

          0  9.53674E-07
          l|  2.00272E-05
          2  0.000201225
          3  0.001288414
          4  0.005908966
          5  0.020694733
          6  0.057659149
          7  0.131587982
          8  0.251722336
          9  0.411901474
          10  0.5SS09S526
          11  0.748277664
          12  0.868412018
          13  0.942340851
          14  0.979305267
          15  0.994091034
          16  0.998711586
          17  0.999798775
          18  0.999979973
          19  0.999999046
          20       ~~l[
          Sheetl J Sheet!,.  Sheet3
  Ready
Figure 101: Probability of event values for a NTRIALS of 20.

The Column B values are the calculated probability of an event, which increase as the number of
outlier time steps within the BED window increases, until it becomes 100% at 20 outliers. The
probability of event values approach 1.0 asymptotically; even with 15 of the time steps in the
BED window being outliers, the probability of event value is already above 0.99. A graph of this
can be seen in Figure 102. The sensitivity of the event detection can also be modified by using
different event threshold parameter values. For example, if the event threshold parameter is set to
0.85, 12 or more outliers (failures) within the BED window parameter are needed to cause an
event as shown in Figure 102. If the event threshold parameter is increased to 0.99, and all other
parameter values are held constant, 15 or more outliers within the window are needed to cause an
event.
CANARY Training Tutorials
Page 104

-------
          2  4
                    8  10 12 14 16 18 20
                   NFAILURES
Figure 102: Probability of event graph using a NTRIALS of 20.

To increase the sensitivity of the BED function, the size of NTRIALS (BED window) is
decreased. The previous steps are repeated using a NTRIALS of 6 instead of 20. The function is
the following = BINOMDIST(A1, 6, 0.5, 1). The probability of event values using a NTRIALS
of 6 is shown in (Figure 103).
        E22

                B
           0  9.53674E-07
           1  2.00272E-05
           2  0.000201225
           3  0.0012SS414
                                    0 0.015625
           4  0.005908966
           5  0.020694733
           6  0.057659149
           7  0.1315879S2
           8  0.251722336
           9  0.411901474
          10  0.5SS09S526
          11  0.748277664
          12  0.868412018
          13  0.942340851
          14  0.979305267
          15  0.994091034
          16  0.99S7115S6
          17  0.999798775
          IS  0.999979973
          19  0.999999046
          20          1
 1 0.109375
 2  0.34375
 3  0.65625
 4 0.890625
 5 0.984375

 6r     X
 7F#NUM!
 &W#NUM!
 9r#NUM!
10 r #NUM!
lir#NUM!
14ffNUM!
15r#NUM!
17#NUM!
18 r #NUM!
19rttNUM!
  Select destination and press ENTER or choose Paste
Figure 103: Probability of event values for a NTRIALS of 6.
CANARY Training Tutorials
                                                  Page 105

-------
With a shorter NTRIALS (BED window) of 6, the probability of an event increases much faster
than with a NTRIALS of 20. Using event threshold parameters of 0.85 and 0.99 would now
cause an event after four or five outliers. As seen in Figure 104, a 100% probability is reached at
6 outliers instead of 20. Having more outliers than the length of the window is not possible (i.e.,
7 outliers out of 6 time steps is impossible); therefore, the #NUM error appears in the lower cells
of Column E.
    i
   0.9
   0.8
   o,
   0.6
  • 0.5
   0.4
   0.3
   0.2
   0.1
       0  1 2 3 4  5  6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
                         NFAILURES
Figure 104: Probability of event graph using a NTRIALS of 6.
CANARY Training Tutorials
Page 106

-------
Appendix C: Configuration File Quick Reference
This section is intended to provide a quick reference to create or edit CANARY configuration
files. The input parameters associated with each section of the configuration file along and
section examples based on the tutorial files are provided. Example configuration files can be
found in the examples folder created after installing CANARY (.. .\My CANARY\examples\), or
in the tutorial archive (ZIP) that corresponds to CANARY Training Tutorials. For more
information regarding CANARY configuration files,  see Sections 5  and Appendix A of the
CANARY User's Manual (Hart and McKenna 2012).

The main sections of the CANARY configuration file are shown in Table 3. The parameters for
each section along with their options, syntax,  and additional information are shown in following
tables. Defaults, if applicable, are in bold in the Options column. If a parameter is optional, then
a checkmark is in the Optional column.


Table 3: CANARY Configuration File Sections

  Section	Contents	
  canary              Basic information about the CANARY run

  timing options       Date  and time range of information, data intervals

  data sources         Input file or database information

  signals              Lists  the signal information from the input file or database

  algorithms          Specifies the event detection algorithm

  monitoring stations   Identifies the monitoring stations, signals, and algorithms for analysis

The first section of the CANARY configuration file is the canary section. This section provides
the basic information on how a CANARY analysis is executed. Table 4 shows the input
parameters for this section. Section 5.1 of the CANARY User's Manual (Hart and McKenna
2012) provides more details on these parameters.


Table 4: Input Parameters  for canary Section of the CANARY Configuration File
Input Parameters   Options	Optional
  run mode:         BATCH, REALTIME, EDDIES
  control type:      INTERNAL, EXTERNAL, EDDIES
  control            null (for INTERNAL control type), else  defines data sources    /
  messenger:	for EXTERNAL or EDDIES control type	
                   Specify a restart file to pre-populate data (e.g., continue.edsd).   /
  11QP f^ontinilP*                          r^r^r^          \  o :>             /    ^/
  use continue.      Used primarily with REALTD^E	

  data provided:     NEW VALUES, ALL VALUES                            ^
  driver files:       null, or specify any driver files  required to run. Precede each
CANARY Training Tutorials                                                   Page 107

-------
 Input Parameters   Options	Optional
                    entry with a minus (-) symbol.


 Figure 105 shows a basic example of a canary section using BATCH mode. Figure 106
 highlights the use of the driver files parameter for connecting CANARY to a database. Note the
 indentation levels for each type of input and the use of the minus (-) symbol to reference the
 location of the database specific driver file. The required database driver file must be
 downloaded from the database developer's website or installation CD. Its location on your
 machine must also be specified. In this example, it is  stored in the lib subfolder within
 CANARY'S install location. This was done for convenience. Section 4 of CANARY Training
 Tutorials provides more information on connecting CANARY with databases.

  canary:
    run mode: BATCH
    control type: INTERNAL
    control messenger: null
    driver  files: null
 Figure 105: Example of canary section using BATCH mode.

  canary:
    run mode: BATCH
    control type:  INTERNAL
    control  messenger: null
    driver  files:
      - c:\Program Files (x36)\CANARY\11b\database_spec1fIc-bln.jar

 Figure 106: Example of canary section using a database connection.

 The second section of the CANARY configuration file is the timing options section. This section
 provides the date and time information for the data to be analyzed. Table 5 shows the input
 parameters for this section. Section 5.2 of the CANARY User's Manual (Hart and McKenna
 2012) provides more details on the timing options section.

 Table 5: Input Parameters for timing options Section of the CANARY Configuration File
 Input Parameters	Options	Optional
   dynamic start-stop:     off (usually used by BATCH and EDDIES)
	on (usually used in REALTIME)	
   date-time format:       Common formats:
                        US standard - mm/dd/yyyy HH:MM AM
                        European Std - dd/mm/yyyy HH:MM:SS
                        ISO Std - yyyymmddTHHMMSS
                        ODBC Database canonical
	- yyyy-mm-dd HH:MM:SS	
   date-time start:         Input start time of analysis
   date-time stop:         Input stop time of analysis
   data interval:           HH:MM:SS Format (e.g.,  5 minutes = 00:05:00)
 CANARY Training Tutorials                                                   Page 108

-------
Input Parameters
    Options
Optional
  message interval:
    Time to wait for input, or sleep time. Generally, should be
    smaller than data interval.
Figure 107 contains an example of the timing options section. This example is requesting
analysis from May 28th 2013 to July 1st 2013 at a data interval of 2 minutes.


 # Enter the time  step options below
 timing options:
   dynamic start-stop:  off
   date-time start:  2013-05-28 00:00:00
   date-time stop:  2013-07-01 00:00:00
   date-time format:  yyyy-mm-dd HH:MM:ss
   data interval:  00:02:00
   message interval:  00:00:10

Figure 107: Example of timing options section.



The third section of the CANARY configuration file is the data sources section, which provides

the input and output information for a CANARY analysis. Table 6 shows the input parameters

for this section. Section 5.3 of the CANARY User's Manual (Hart and McKenna 2012) provides
more details on the data sources section.
Table 6: Input Parameters for data sources Section of the CANARY Configuration File

Input Parameters   Options	Optional
  -id:
Internal identifier. Text string. No spaces are allowed (use
dash or underscore) and field is case sensitive.	
  type:
CSV, DB (or JDBC), EDDIES, XML
  location:
File name or URL of data source (can contain ".A" or full
path to direct to local files not located in the same folder as
the YML)	
  enabled:
                    yes or no
  configFile:
Replaced by database options:
  timestep options:   field:
                    Specify the column header of the
                    date/time information
                    format:
                    Can be omitted if it matches the global
                    date format
                    conversion
                    function:
                    Required to convert string formats in
                    databases
  database options:   time drift:
                    Units of fractional day (e.g.,
                    0.75=18hr)	
                    JDBC2 class name:   Java class inside the driver
                    input table:
                    Specify the table or view inside the
                    database that contains the input data
                    input format:
                    default, row-based or custom (case
                    sensitive)	
CANARY Training Tutorials
                                                         Page 109

-------
Input Parameters   Options
                                      Optional
                    input fields:
       time-step: (field in which the
       time stamp of the new data is
       listed)
       parameter tag: (field in which
       the SCADA tag is listed)
       parameter value: (field in
       which the SCADA value is
       listed)
       parameter quality: (field in
       which the quality flag might be
       listed. This must be a string that
       is either "Normal" or "Bad"
       (case insensitive).)	
                    output table:
 Specify the location of the table inside
 the database into which the CANARY
 results will be placed	
                    output format:
 default, extended, custom (case
 sensitive)	
                    output fields:
 Starts the output fields subsection.
 Can be turned off with no or false.
 custom - no default field names are
 provided for any field
 default or extended can use the
 commands below to override default
 fields
    •   write conditions: (all)
    •   time-step: (field for time/date
        information)
    •   instance id: (defines the field
        for the CANARY instance ID
        that created the values)
    •   station id: (station definition
        that provided the results)
    •   algorithm id: (field for the
        algorithm that generated the
        results)
    •   parameter tag: (single value of
        the SCADA parameter tag that
        caused the event)
    •   parameter residual:  (field for
        the residual of the parameter.
        Omit unless specifying
        parameter tag.)
    •   parameter type: (field that
	contains the parameter type.
CANARY Training Tutorials
                                       Page 110

-------
Input Parameters   Options
                                                             Optional
                                               Only valid when using
                                               parameter tag.)
                                               event code: (field for the
                                               integer event code)
                                               event probability: (field for the
                                               event probability)
                                               contributing parameters: (field
                                               for whitespace delimited
                                               contributing parameters)
                                               comments: (field for analysis
                                               comments)
                                               pattern match id: (field for the
                                               pattern with the highest match
                                               pattern)
                                               pattern match probability: (field
                                               for the probability of highest
                                               match probability)
                                               secondary match id: (field for
                                               the pattern with the 2nd highest
                                               match pattern)
                                               secondary match probability:
                                               (field for the probability of 2"
                                               highest match probability)
                                               tertiary match id: (field for the
                                               pattern with the 3r highest
                                               match probability)
                                               tertiary match probability:
                                               (field for the probability of 3r
                                               highest match pattern)
                                                        •>nd
                                                        >rd
                    login info:
                        prompt for login: yes or no
                        username:
                        password:	
Figure 108 shows an example of the data sources section for a CSV data file. Section 4 of
CANARY Training Tutorials provides more information regarding database connections.
 # Enter the  list of data sources below
 data sources:
 - Id: stat1onb_1n
   type
   location
   enabled
   tirnestep options:
     field: "TIME_STEP"
CSV
Tutorial_station_B.csv
yes
Figure 108: Example of data sources section using a CSV file.
CANARY Training Tutorials
                                                              Page 111

-------
The fourth section of the CANARY configuration file is the signals section, which defines the
data signals for a CANARY analysis. Table 7 shows the input parameters for this section. Each
signal must begin with the id parameter and subsequent lines should match the indentation of this
parameter. Section 5.4 of the CANARY User's Manual (Hart and McKenna 2012) provides
more details on the signals section.
Table 7: Input Parameters for signals Section of the CANARY Configuration File
Input Parameters  Options	Optional
  -id:
 Case sensitive string. Cannot include spaces or symbols. Must
 be unique and cannot be a portion of another id. (e.g.,
 'CL_PUMP1' and 'CL_PUMP1_1' will produce an error, but
 'CL  PUMP  1  r will not.)	
  SCADA tag:       Signal name given in the CSV or database file. Must match the
                    column header, database field, or the database table value.
  evaluation type:
 WQ (water quality)
 OP (operations) - Can also contain calibration data, if multiple
              calibration signals occur in one station
 ALM (alarm)
 CAL (calibration) - Only one per station	
  parameter type:
 Short text used as axis label for CANARY plots (e.g., pH or
 CL2). In EDDIES mode, they must match EDDIES definitions.
  ignore changes:   all, increases, decreases, both, none
  data options:
 Needed for all water quality (WQ) and operational (OP) signal
 types	
 precision:       Specify the noise threshold for the signal. Tied
	to sensor precision.	
                Label for CANARY plots. Recognizes LaTeX
         	character strings (e.g., |\mu} = u).	
                   units:
                   valid range:     Values outside the valid range are treated as
                                   originating from a faulty sensor and are
                                   ignored.  Two unit vector (e.g., [-.inf, .inf] or
                  	[0, 2]).	
                   set points:       Used for event detection. If either value is
                                   outside the valid range then the set point alarm
 	does not occur. Two unit vector.	
  alarm options:     Used with alarm (ALM) and calibration (CAL) signals
                   value when active:
                    1 or 0 (The majority of utilities define
                    normal operations as 0 and
                    alarm/calibration as 1.  The opposite can
                    also be handled.)	
                    scope:
                    SCADA Tag. Specifies which signal (WQ
                    or OP types) the alarm signal applies.
                    Blank for CAL.
CANARY Training Tutorials
                                                           Page 112

-------
 Input Parameters   Options	Optional
   composite rules:   Defines a composite signal. Should be followed by a pipe
                    command ( ). Commands use Reverse Polish Notation (RPN).
                    Limited to about 12 calculations per 'composite rule:'
	command.	

 Figure 109 contains a portion of the signals section. This specific example is for a chlorine signal
 that is stored under the SCADA tag parameter B_CL2_VAL but will be displayed in CANARY
 as TEST_CL (id parameter).

 # Enter  the list of scADA/composite signals/parameters  below
 signals:
 - id:  TEST_CL
   SCADA  tag:  B_CL2_VAL
   e v al LI at i o n  type: wq
   parameter type: ci_2
   ignore changes: none
   data options :
      precision: 0.0035
      units:  'Mg/L'
      valid  range:  [-.inf,  .inf]
      set  points:  [-.inf,  .inf]

 Figure 109: Example of signals section using a chlorine signal.

 The fifth section of the CANARY configuration file is the algorithms section, which defines the
 algorithms to be used for a CANARY analysis. Table 8 shows the input parameters for this
 section. Each algorithm must begin with the id parameter and subsequent lines should match the
 indentation of this parameter. Section 5.5  of the CANARY User's Manual (Hart and McKenna
 2012) provides  more details on the algorithms section.

 Table 8: Input Parameters for algorithms Section of the CANARY Configuration File
 Input Parameters	Options	Optional
   - id:                    Internal identifier. Text string. No spaces are allowed
                          (use dash or underscore) and field is case sensitive. Same
	indent behavior as the signals section.	
   type:                   LPCF, MVNN, SPPE, SPPB, JAVA, CAVE, CMAX
   history window:         Number of prior time steps to include in history.
                          Rule of thumb is to include 1.5 to 2 days of previous data
                          (e.g., 1 day = 1440 minutes => 1440/2 minutes/time step
	= 720).	
   outlier threshold:         Number of standard deviations of prediction error that
                          must be met or exceeded before declaring a value an
	outlier	
   event threshold:         Specify the probability of an event that must be exceeded
	prior to declaring a series of outliers an event	
   event timeout:           Number of consecutive time steps after an event is
	detected before the alarm is automatically silenced	
 CANARY Training Tutorials                                                   Page 113

-------
 Input Parameters	Options	Optional
  event window save:      Amount of time to be saved prior to an event being
                          reported. Used only for plotting the identified events in
	post processing.	
  BED:                   window:         Size of the binomial window. Must be
                                          less than history window. Typical
                         	values are 4 to 18 steps.	
                          outlier           Should be left at 0.5. The probability of
	probability:	failure for each binomial trial.	
  external algorithm       Can only be used if external algorithm has been created.
  class:                   It is the class name of the external algorithm being         S
	called.	
  external algorithm       Defines the external algorithm configuration. The XML    /
  config:	configuration should be added here in double-quotes.	
  cluster library file:       File name of the clustering data file to use with the         ^
	algorithm	
  use algorithm inputs:     Specifies which algorithm to use in the CAVE or CMAX
                          consensus algorithms. List of algorithm ids. Each line
	starting with a (-) symbol.	

 Figure 110 displays an algorithms section example that defines the algorithm as LPCF using the
 BED option.

 #  Enter  the list of event  detection algorithms  below
 algorithms:
 -  id: test
    type:  LPCF
    history window: 144
    outlier threshold: 0.8
    event  threshold: 0.85
    event  timeout: 12
    event  window save: 30
    BED:
      window:  6
      outlier probability:  0.5

 Figure 110: Example of algorithms section using the LPCF algorithm.

 The last section of the CANARY configuration file is the monitoring stations section. This
 section defines the monitoring station location, the signals, and the algorithms to be used for a
 CANARY analysis. Table 9 shows the input parameters for this section. Each algorithm must
 begin with the id parameter and subsequent lines should match the indentation of this parameter.
 Section 5.6 of the CANARY User's Manual (Hart and McKenna 2012) provides more details on
 the monitoring stations section.

 Table 9: Input Parameters for monitoring stations Section of the CANARY Configuration
 File
 Input Parameters	Options	Optional
 CANARY Training Tutorials                                                    Page 114

-------
Input Parameters
Options
                                     Optional
  -id:
Basis for naming output files. Text string. No spaces are
allowed (use dash or underscore) and field is case
sensitive. Same indent behavior as the signals section.
  location id number:
Integer. User-defined physical location.
  station tag name:
Defines a station tag name for use in the
LOCATION ID field of output tables. If omitted the
station id string will be used instead.	
  station id number:
Configuration number for the station. Can be used to
differentiate between two different substations or two
different algorithms using the same signal data.	
  enabled:
yes or no. This allows one configuration file to have
multiple analysis steps configured for a data stream, but
only run a subset at any given time.	
  inputs:
'-id:' (Must match the id from the data sources section)
  outputs:
'-id:' (Must match the id from the data sources section)    •/
  signals:
'-id:'
Must match the id in the signals
section
                         cluster:
                                         yes or no
  algorithms:
'-id:'
Must match the id in the algorithms
section
                         cluster:
                                         yes or no
Figure 111 shows a monitoring stations section example. Each of the id parameters refers to the
internal names that correspond to those in the data sources, signals and algorithms sections.
 # Enter the list of monitoring  stations below
 monitoring stations:
 - Id:  stations
   station Id number:
   station tag name: stations
   location la number: -1
   enabled: yes
   Inputs:
     -  Id:  stat1onb_1n
   outputs:
   signals:
     -  Id:  CAL_stat1onB
     -  Id:  TEST_CL
     -  Id:  TEST_PH
     -  Id:  TEST_TEMP
     -  Id:  TEST_CGND
     -  Id:  TEST_TURB
     -  Id:  TEST_PRES_PLNT
     -  Id:  TEST_FLOW_PLNT
     -  Id:  TEST_TGC
       cluster: no
   algorlthms:
     -  Id:  test
Figure 111: Example of monitoring stations section.
CANARY Training Tutorials
                                                    Page 115

-------
Appendix D: File Types

Input files

   •   .yml or .edsy - file that specifies the configuration details; double click this type of file to
       run CANARY, or right-click and choose "Edit" to open the configuration editor in a text
       editor. Older configuration files were written in XML format and ended in EDSX. These
       older files can still be read by CANARY. The EDSY extension hooks the file into the
       right-click abilities of the Windows® operating system; otherwise it is equivalent to the
       YML

   •   csv - file that provides input data for analysis from a spreadsheet. If the data is to be read
       from a database, then the YML file must contain instructions for accessing the database.

Output files

   •   out.yml - copy of the YML configuration file which was run.

   •   heartbeatdat - file that details the specific date and time CANARY was run.

   •   .edsd - output files for each monitoring station and one that summarizes all of the output;
       graphed when double clicked.

   •   status.log - file that tracks the history of all the actions taken at each time step.

   •   station.log - file that tracks the  history of the CANARY analysis on each day.

   •   CONTROL.msg.log - file that tracks the history of messages passed between analysis
       outputs and Control,  the part of the code that controls when actions are executed;  will
       open in a text editor when double-clicked.

   •   Test Station.Summary.txt - summary  of the CANARY run that logs each detected
       event and summarizes the inputs to the analysis.

   •   .png - image file used by CANARY that displays graphed data. This file is created by
       right clicking the EDSD file and selecting Graph Data.
CANARY Training Tutorials                                                     Page 116

-------
Glossary
   •   baseline-change - A long-term sustained change in the average behavior of a signal. The
       length of time that is required to be considered a baseline-change is definable by the user.
       The initial change will trigger an event alarm, however after the user-defined length of
       time the alarms will be suppressed for two reasons: (1) The CANARY alarm is no longer
       needed because operators have identified the cause or initiated action to find the cause; or
       (2) because enough time has passed, operators would like CANARY to begin using the
       new baseline-value to calculate future events. Not all baseline-changes last long enough
       to be considered baseline-changes.

   •   binomial event discriminator (BED) - A statistical algorithm that is used to determine
       how many outliers are needed in a given time-frame to constitute an event alarm.
       Parameters for this algorithm are specified by the user. (See Section 3.3 and Appendix B
       of this document, or the CANARY User's Manual Section 2.5.1 for more information.)

   •   clustering - A process of identifying normal changes in water quality patterns. Cluster
       files are used to keep a library of pattern information to identify "normal events" and
       decrease false alarms. Typically one or more operation signals provide useful information
       to a clustering algorithm.

   •   data-interval - (also called sampling interval) The frequency at which CANARY
       expects data signals.  A common data interval is two minutes (data interval: 00:02:00).
       Only data that is most recent, or matches the data-interval best, will be used by
       CANARY for analysis.

   •   evaluation type - This parameter defines the type of data signal. The options are WQ
       (water-quality), OP (operations), ALM (data alarm), and CAL (calibration). Only signals
       defined as WQ are used to detect events. Only one CAL signal can be used by each
       station. To use multiple calibration signals, define the signals as OP signals, and combine
       them into a single CAL signal using a composite signal.

   •   event - A period of sustained anomalous activity (i.e., a number of outliers occurring
       within a short period of time). The binomial event discriminator is used to determine the
       number of outliers in a given time-frame that will trigger an event alarm signal.

   •   false alarm - An event alarm signaled by CANARY that does not correlate to an actual
       event. Alarms may be caused due to calibration events, operational changes within the
       system or configuration parameters that are too sensitive.

   •   monitoring station - A set of sensors and their time series signals that are used together
       for event detection. These are made up of sensor hardware that is all located in the same
       physical testing site. Generally, a monitoring station includes some water quality signals,
       and may also transmit operations or alarm signals.

   •   normalization window - The user-definable time-frame in which short-term statistical
CANARY Training Tutorials                                                    Page 117

-------
       calculations are made. Data within the normalization window is normalized to have a
       mean of zero and a standard deviation of 1.0. The duty cycle of a given location should
       be considered when specifying the normalization window. For example, a site below a
       tank that fills and drains on a daily schedule may need a window size of slightly longer
       than one day, while an in-network location may only need three to six hours of history.

   •   outlier - A data point at a single time step that is anomalous (i.e., deviates by more than
       a threshold value) relative to the background or predicted behavior for that signal at a
       given time step. Multiple outliers in a given time-frame may constitute an event.

   •   precision - The precision limit of sensor hardware. Used by CANARY to help reduce
       false alarms by telling CANARY the smallest change that can be reported by  a sensor
       (e.g., a change from one time step to another of one (1)  in a  conductivity signal does not
       have the same implications as a change of one in a pH signal).

   •   probability threshold (TB) - The probability of an event that must be exceeded before an
       event is signaled by CANARY.

   •   residual - The value calculated by CANARY that is equal to the difference between the
       measured and predicted water quality signal value at a single time step. The absolute
       value of this calculated residual is compared to the threshold value to determine if the
       data point is an outlier.

   •   signal - A data stream, usually from a SCADA system. Signals may include water
       quality (WQ) data (e.g., residual or free chlorine concentration) and operations (OP) data
       (e.g., tank levels, or flow rates). Sensor hardware may also provide error data  to the
       SCADA controller, which can be used by CANARY as an alarm (ALM) signal.

   •   sub-station - A monitoring station that is physically located at the same place as other
       monitoring stations. CANARY uses sub-stations to differentiate testing locations. For
       example, if a main supply tank has two outlet pipes that are  being monitored;  each outlet
       would be considered its own sub-station.

   •   threshold  (TA) - A value used to determine if a data point is considered an outlier by
       CANARY. This value is calculated relative to residuals calculated for that signal.
       Threshold  has units of standard deviation, a, not the units of the raw signal data. The
       standard deviation is calculated within a normalization window. This allows outlier
       determinations for all signal data to be made using a single threshold value.

   •   water quality pattern - A recurring trend in one or more water quality signals that
       would normally be significant enough to trigger an alarm, but which is considered by the
       user to be a "normal event". Such patterns could be caused by routine operations such as
       daily demand changes, pumps or plants turning on or off, or water treatment activities. If
       a pattern is regular enough, it can be identified,  stored in a library, and used as a
       recognized pattern to help eliminate false alarms associated  with normal activities.
CANARY Training Tutorials                                                     Page 118

-------
United States
Environmental Protection
Agency
PRESORTED STANDARD
 POSTAGE & FEES PAID
         EPA
   PERMIT NO. G-35
Office of Research and Development (8101R)
Washington, DC 20460

Official Business
Penalty for Private Use
$300

-------