EPA
EPA/600/R-13/029 | May 2013 | www.epa.gov/research
United States Environmental Protection Agency

Environmental Data Management in Support of Sharing Data and Management

Office of Research and Development

                                                                  EPA/600/R-13/029
                                                                          May 2013
                              FINAL REPORT
 ENVIRONMENTAL DATA MANAGEMENT IN SUPPORT OF SHARING DATA
                              AND MANAGEMENT
                                         by

    Dr. Lilit Yeghiazarian, Mr. Amr Safwat, Mr. Allen Teklitz, and Mr. Donald Morehead,
          Pegasus Technical Services, 46 E. Hollister Street, Cincinnati, OH 45219
 School of Energy, Environmental, Biological  and Medical Engineering Program, University of
                           Cincinnati, Cincinnati, OH 45221
                     Dr. Tim Whiteaker and Dr. David R. Maidment
          Pegasus Technical Services, 46 E. Hollister Street, Cincinnati, OH 45219
  Center for Research in Water Resources, University of Texas at Austin, Austin, TX 78712

                                  Dr. Elly P.H. Best,
Water Supply and Water Resources Division, National Risk Management Research Laboratory,
                           EPA/ORD, Cincinnati, OH 45268
                              Contract No. EP-C-11-006
                                  Task Order No. 41

                                  Dr. Elly P.H. Best,
                                 Task Order Manager
                          Water Quality Management Branch,
                      Water Supply and Water Resources Division
                    National Risk Management Research Laboratory
                          Office of Research and Development
                         U.S. Environmental Protection Agency
                 EPA's National Risk Management Research Laboratory,
                 Andrew W. Breidenbach Environmental Research Center
                                26 W M.L. King Drive,
                                Cincinnati, OH 45268

                                    DISCLAIMER
The U.S. Environmental Protection Agency (EPA), through its Office of Research and
Development, funded and managed, or partially funded and collaborated in, the research
described herein under Task Order (TO) 0041 of Contract No. EP-C-11-006 to Pegasus
Technical Services. This document has been reviewed in accordance with U.S. Environmental
Protection Agency (EPA) policy and approved for publication. The views expressed in this
report are those of the author[s] and do not necessarily reflect the views or policies of EPA.
Mention of trade names or commercial products does not constitute endorsement or
recommendation for use.  The quality of secondary data referenced in this document was not
independently evaluated by EPA and Pegasus.

                                      ABSTRACT

A data management system (DMS) was developed, tested and demonstrated to store and manage
water  quality  and  quantity  (WQ2)  data  pertaining  to  U.S.  Environmental  Protection
Agency/Office  of Research and Development (EPA/ORD)  research projects in standardized
formats. This approach was taken to facilitate accessibility, sharing, integration, and use of this
information for simple calculations and inclusion into models by EPA and other users to  inform
water management  decisions. The  objectives of this  project were:  (1) Build  a previously
identified hydrologic information system (HIS) Observations Data Model (ODM) and convert
existing water quality and quantity data, generated by EPA/ORD-research projects,  into this
format; (2) Develop, test and demonstrate a scalable DMS based on HIS protocols for storage
and management of geo-referenced data; and (3) Enable targeted users to access, exchange and
integrate geospatially referenced WQ2 information, and to conduct relatively simple calculations
and/or run models to inform water management decisions at various watershed scales.  The HIS
developed by the Consortium of Universities for the Advancement of Hydrologic Science Inc.
(CUAHSI) was used in this project. The CUAHSI HIS ODM is widely accepted in the
United States, is compatible with the National Water Quality Portal, and is gaining international
acceptance as the standard for water data. This HIS is web-based, and uses the WaterML format
for sharing hydrologic time series data.

Within the U.S., many organizations  measure WQ2  and various biological parameters. Despite
the  fact that this information is routinely made available  to the public,  the  difficulties  in
identifying data sources, syntactic and semantic heterogeneity across data formats and  metadata
make the discovery, access and interpretation  of data challenging  for research and other
stakeholder  communities. EPA-ORD seeks to alleviate these challenges  and plans  to use
CUAHSI's HIS and other tools to integrate  time series data generated within its local, regional
and national watershed projects with data collected by collaborators, and to share these data with
other interested parties.

This report presents  the results of exploring,  implementing and amending the CUAHSI approach
to storing and managing data of (1) an  ongoing WQ2 monitoring effort in the East Fork of the
Little Miami River (EFLMR) watershed in Ohio; and (2) a completed WQ2 monitoring effort in
the Shepherd Creek watershed near Cincinnati, Ohio. Data included in the new DMS pertained to
physicochemical parameters of discrete water samples,  water-immersed sensors, and discrete
stream-substrate samples.  Data on biological parameters of water-immersed sensors and discrete
stream-substrate samples were not included.  The functionality of the DMS was demonstrated by
estimating an example nitrogen (N) TMDL using a Load Duration Curve (LDC) approach based
on data retrieved from the EFLMR DMS and other databases. All data were accessed, explored
and visualized through the HIS HydroDesktop tool.

                              ACKNOWLEDGMENTS
This report has been prepared with input from the research team, which includes Pegasus
Technical Services; the School of Energy, Environmental, Biological and Medical Engineering
Program, University of Cincinnati; the Center for Research in Water Resources, University of
Texas at Austin; and the Water Quality Management Branch (WQMB)/Water Supply and Water
Resources Division (WSWRD)/National Risk Management Research Laboratory (NRMRL)/
Office of Research and Development (ORD) of the U.S. Environmental Protection Agency
(EPA).

Technical lead, direction and coordination for this project were provided by Dr. Elly P.H. Best,
EPA/ORD/NRMRL/WSWRD/WQMB.

The authors would like to thank Dr. Joel Allen, EPA/ORD/NRMRL/WSWRD/WQMB and Dr.
Yusuf Mohamoud, EPA/ORD/NERL/ERD for providing valuable written comments on the final
report.

Special appreciation is given to the contact person of the case study presented in this report, Dr.
Christopher Nietch, EPA/ORD/NRMRL/WSWRD/WQMB, contact persons of the EPA/ORD
Office of Science and Information Management (OSIM) who advised on the handling of the
EPA firewall, Mr. David Lyons, Mrs. Ann Vega and Mrs. Bhagya Subramanian, Cincinnati, and
Dr. Guoxiang Yang, ORISE Research Associate at EPA/ORD/NRMRL/WSWRD/WQMB who
supported the report preparation.

                               EXECUTIVE SUMMARY

Many organizations within the U.S. measure water quantity and quality (WQ2) variables such as
precipitation, streamflow, water quality parameters (pH, nutrients, etc.), and various biological
parameters. Results of data analyses are typically published in reports and journal articles. These
publications often include technical details of data collection and metadata. Data values are often
made available as  files that can be  retrieved  from public websites.  Discovery,  access and
interpretation by research and stakeholder communities are challenging because of syntactic and
semantic heterogeneity across data formats and metadata. Pilot projects  that render information
on WQ2 variables, values, and metadata readily identifiable, accessible, sharable and usable in
analytical and modeling applications that accept standardized formats for direct interpretation,
can play a vital role in facilitating and accelerating collaborative and environmental modeling
activities within EPA-ORD and EPA, as well as between EPA and other agencies and organizations.
The  pilot project  described in this  report was conducted  to evaluate a web-based,  highly
standardized  and  widely  accepted  hydrologic information  system (HIS)  for storage and
management of EPA-ORD regional watershed WQ2 data.
The main goal of the project was to explore a previously identified HIS for storage of WQ2 data
of an ongoing EPA-ORD watershed project, to develop, test and demonstrate the subsequent use
of these data for  calculation of a total maximum daily load (TMDL), and  to demonstrate
potential access and use by targeted users. In addition, a wider implementation potential within
EPA-ORD was explored.
The  HIS developed by the Consortium of Universities  for the  Advancement  of  Hydrologic
Science Inc. (CUAHSI) was used in this project.  This HIS provides a database schema called the
Observations Data Model (ODM) for consistent storage and management of observations data
and associated metadata, including data elements required by the WaterML format.  ODM is
widely accepted in the United States, is compatible with the National Water Quality Portal, and
is gaining international acceptance as the gold standard for water data. The CUAHSI HIS is
web-based and uses the WaterML format for sharing hydrologic time series data.
WQ2 data from an ongoing watershed management research project in the East Fork Watershed
of the  Little Miami River in Ohio (EFWLMR), pertaining to 2010,  served as the  initial data
source.  The DMS work was initiated by downloading and installing a blank ODM database
(subsequently called 'DMS') from the CUAHSI-HIS website onto an EPA-ORD server located
within the EPA-ORD firewall. Of all data categories explored, the physico-chemical data from
discrete water samples, the sensor data  from water-immersed sensors for  physico-chemical
variables, and the  physico-chemical data from discrete stream substrate (sediment) samples were
committed to the  ODM. The biological monitoring data, largely on benthic macrofauna, could
not be loaded because  no data loader is currently  available to which  the data may be mapped.
The  data from the discrete water samples were  reformatted and mapped to the ODM Data
Loader. Additional columns were inserted into their original Excel sheet for Sources and Sites,
and  the information  required  to populate these  columns was  generated: for  Sources,
organization/agency collecting the data; for Sites, site-name, latitude, longitude, and elevation. In
addition, the column headers were  edited to match the ODM  vocabulary, particularly the
Variable Code. For each existing Excel sheet, a second sheet was added to contain the metadata.
After reformatting, all data were  committed to the  DMS using the ODM Data Loader.  The data
from  the water-immersed  sensors were committed directly to  the DMS  using  the ODM

Streaming Data Loader. The data from the discrete sediment samples were also reformatted, and
then mapped to a modified version of the ODM Streaming Data Loader. Besides editing and
addition of the same information as for the discrete water  samples, additional columns were
inserted into their original Excel sheet for the sediment categories. The consistency of the data
committed to the DMS was validated by a three-step process, including verification of the data
category, of the ODM log file, and of the actual data committed to the ODM database (DMS).
The same procedure was used  to  commit water quality monitoring data  of the  completed
Shepherd Creek project to the ODM database, and, thus, also add this new database to the DMS.

The functionality  of the DMS was demonstrated  with  an example nitrogen  (N)  TMDL
calculation using the Load Duration Curve  (LDC) approach. Data were retrieved from the DMS
and other databases and accessed, explored and visualized through the HIS HydroDesktop tool.
In this case HydroDesktop accessed the DMS data via a link to the test publication of the EFW
Database, for which CUAHSI provides a CodePlex sandbox. When implementation of the
recently developed DMS is approved for use within EPA-ORD and funding for the DMS to enter
its 'Production Phase' (involving dedicated server testing, operation and maintenance) is
generated, publication of the EFW Database in the CUAHSI HIS Central Web Service Registry
(and possibly other websites) can be accomplished. The latter step would enable access and use
by users inside and outside EPA/ORD, and may greatly facilitate and accelerate collaboration.

                                     CONTENTS

DISCLAIMER
ABSTRACT
ACKNOWLEDGMENTS
EXECUTIVE SUMMARY
ACRONYMS AND ABBREVIATIONS
1.0: INTRODUCTION
    1.1  Project Background
    1.2  Project Objectives
    1.3  Report Outline
2.0: DATA SOURCES
3.0: DMS ARCHITECTURE AND USE FOR EXCHANGE AND INTEGRATION OF WQ2
    INFORMATION
    3.1 General Guidelines
         3.1.1. Hardware and Software Requirements
         3.1.2. Review of CUAHSI HIS Data Publication Tools and Service
         3.1.3. ODM Structure
         3.1.4. Installing a Blank ODM Database
         3.1.5. Requirements for Input Files to ODM Data Loader: Transforming Data into
                CUAHSI Format
         3.1.6. Loading Metadata and Data Values into ODM
         3.1.7. Working With ODM Data
    3.2. Working with East Fork Watershed (EFW) Data
         3.2.1. Description of Physico-chemical Data Water File
         3.2.2. Description of Sediment Data File
         3.2.3. Loading Sites to the ODM
         3.2.4. Loading Sources to the ODM
         3.2.5. Loading Variables to the ODM
         3.2.6. Loading SampleMediumCV to the ODM
         3.2.7. Loading VariableNameCV to the ODM
         3.2.8. Loading SampleTypeCV to the ODM
         3.2.9. Loading Variables to the ODM
         3.2.10. Loading DataValues to the ODM
         3.2.11. Loading Sensor Data from Water-immersed Sensors for Physico-chemical
                Variables to the ODM
         3.2.12. Loading Sediment Data to the ODM
         3.2.13. Data Validation
    3.3  Working with Shepherd's Creek Data
4.0: DATA PUBLICATION
    4.1. Data Publication with WaterOneFlow
    4.2. Publishing New Watershed Data
         4.2.1. DMS Development (and Testing) Phase (Figure 4-1)
         4.2.2. DMS Demonstration Phase (Figure 4-2)
         4.2.3. DMS Production Phase (Figure 4-3)
5.0: USING THE DMS: EXCHANGE AND INTEGRATION OF WQ2 INFORMATION
    5.1. Using HydroDesktop
    5.2. Using HydroR
    5.3. Using HydroExcel
6.0: DATA ANALYSIS TO INFORM DECISION-MAKING
    6.1. Area of Interest and Problem Identification
    6.2. Data Management
    6.3. Data Exploration
    6.4. Data Analysis Using Excel
7.0: CRITICAL GAPS OF CURRENT APPROACHES
8.0: RECOMMENDATIONS FOR IMPROVING UPON EXISTING APPROACHES
9.0: REFERENCES
                                        FIGURES
Figure 3-1.   Observations Data Model (ODM) Schema, from Horsburgh et al. 2009
Figure 3-2.   Physico-chemical Data Water 2010, Excel sheet
Figure 3-3.   Physico-chemical Data Water 2010, with New Formatted Data Values Spreadsheet
Figure 3-4.   Physico-chemical Data Sediment 2010, with New Formatted Data Values Spreadsheet
Figure 3-5.   Steps in Data Loading to the ODM
Figure 3-6.   Loading Sites to the ODM
Figure 3-7.   Loading Sources to the ODM
Figure 3-8.   Loading Variables to the ODM
Figure 3-9.   Loading 2010 Nutrient Data Values
Figure 3-10.  ODM Streaming Data Loader
Figure 3-11.  Modified Data Loader
Figure 3-12.  New Testing Tab Added to the ODM Data Loader
Figure 3-13.  Data Validation Using ODM Tools
Figure 4-1.   DMS Development Phase
Figure 4-2.   DMS Demonstration Phase
Figure 4-3.   Publication DMS in Demonstration Phase
Figure 4-4.   DMS Production Phase
Figure 6-1.   The East Fork of the Little Miami River Watershed, with Point and Non-point Sources for
              Pollutant Loadings Indicated (from Nietch et al. 2010)
Figure 6-2.   Monitoring Stations for the East Fork of the Little Miami River Watershed
Figure 6-3.   Time Series of the Flow Data at Perintown, OH, Showing Seasonality and Variation by
              Hydrological Year
Figure 6-4.   Box and Whisker Plot of the Flow Data at Perintown, OH, Showing a Maximum in March
              and Minimum in September
Figure 6-5.   Summary Statistics of the Flow Data at Perintown, OH, Showing Variability in the Data
Figure 6-6.   Flow at Perintown, OH, in 2010
Figure 6-7.   Nitrogen Time Series at Perintown, OH, in 2010
Figure 6-8.   Computed Nitrogen Loading Curve of the East Fork Watershed in 2010
Figure 6-9.   Flow Duration Curve of the East Fork Watershed
Figure 6-10.  Nitrogen Load Duration Curve of the East Fork Watershed
Figure 6-11.  Nitrogen Load Duration Curve of the East Fork Watershed, with Difference Between the
              Averages of Observed and Target Loads, Respectively, Marked
                                   TABLES
Table 6-1.    Load Duration Curve Estimated Nitrogen Loads and Overall Reductions Needed to Meet
              the Proposed Dissolved Inorganic Nitrogen TMDL for the Management of Rivers and
              Streams < 1,300 km2 (with 10% margin of safety)
                                 APPENDICES

Appendix A:  WORKSHOP: PUBLISHING DATA WITH THE CUAHSI HYDROLOGIC
             INFORMATION SYSTEM (60 pp)
Appendix B:  CUAHSI COMMUNITY OBSERVATIONS DATA MODEL (ODM) VERSION
             1.1 DESIGN SPECIFICATIONS (54 pp)
Appendix C:  DESCRIPTION OF ESF DATA LOADED TO THE ODM (11 pp)
Appendix D:  DESCRIPTION OF SC DATA LOADED TO THE ODM (2 pp)
Appendix E:  HYDROLOGY OF JACOB'S WELL SPRING. A TUTORIAL FOR USING
             HYDRODESKTOP TO DISCOVER AND ACCESS WATER DATA.
             PRESENTED AT THE UNIVERSITY OF CINCINNATI, SEPTEMBER 6, 2011
             (32 pp)

                       ACRONYMS AND ABBREVIATIONS
ADC         Application Deployment Checklist
cfs           cubic feet per second
CUAHSI     Consortium of Universities for the Advancement of Hydrologic Science Inc.
DIN         Dissolved Inorganic Nitrogen
DMS         Data Management System
DMZ         Demilitarized Zone
DWTP       Drinking Water Treatment Plant
EFLMR      East Fork of the Little Miami River
EFW         East Fork Watershed
EPA         United States Environmental Protection Agency (=USEPA)
ERD         Ecosystem Research Division
FDC         Flow Duration Curve
GIS          Geographic Information System
HIS          Hydrologic Information System
LTER        Long Term Ecological Research
Mac         Macintosh
MGD         Million Gallons per Day
NERL        National Exposure Research Laboratory
NRMRL     National Risk Management Research Laboratory
NWIS        National Water Information System
ODM         Observations Data Model
ORD         Office of Research and Development
OSIM        Office of Science and Information Management
PC           Personal Computer
QA          Quality Assurance
RTP         Research Triangle Park
sa           system administrator
SC           Shepherd's Creek
STORET     STORage and RETrieval system for water quality data
TMDL       Total Maximum Daily Load
TO           Task Order
UC           University of Cincinnati
USACE       United States Army Corps of Engineers
USGS        United States Geological Survey
WQ2         water quality and quantity
WQMB      Water Quality Management Branch
WSWRD     Water Supply and Water Resources Division
WWTP      Waste Water Treatment Plant

                               1.0:      INTRODUCTION
1.1    Project Background
Within the U.S., many organizations measure water quantity and quality (WQ2) variables such
as precipitation, streamflow, water quality parameters (pH, nutrients, etc.), and various biological
parameters. Results of data analyses are typically published in reports and journal articles. These
publications often include technical details of data collection and metadata. Data values are often
made available as files that  can  be  retrieved  from public  websites. However,  syntactic  and
semantic heterogeneity across  data formats and metadata make  the discovery,  access  and
interpretation challenging for research and stakeholder communities.
To address these issues, the  Consortium of Universities  for the  Advancement of Hydrologic
Science Inc. (CUAHSI) has developed a web-based Hydrologic Information System (HIS) that
uses the WaterML format for sharing hydrologic time series  data  (www.cuahsi.org).  The
CUAHSI HIS also provides a database schema called the Observations Data Model (ODM) for
consistent storage and management of observations data and associated metadata, including  data
elements required by WaterML. The CUAHSI model has been widely accepted in the United
States and is gaining international acceptance as the standard for water data, having been adopted
by over 150 organizations, including universities, utilities, agencies, businesses, and state and
local governments. An
overview  of web-based interactions facilitating the sharing of hydrological data enabled by
working with CUAHSI-HIS is provided in Appendix A.
EPA-ORD plans to use CUAHSI's HIS  and other tools to integrate time series data generated
within its local,  regional  and national  watershed projects  with data collected  by EPA's
collaborators, and to share these data with other interested parties.  The significance of this effort
is that the standardization and integration of data are expected to greatly facilitate collaborative
and environmental modeling activities within EPA-ORD and EPA, as well as between EPA and
other agencies and organizations in the  new Safe and Sustainable  Water  Research Program's
focus areas  including  Sustainable  Water  Resource Flows  and  Sustainable Natural  and
Engineered Water Infrastructure Systems.
1.2    Project Objectives
This project has three objectives:

Objective 1. Build a CUAHSI ODM and convert existing WQ2 data into the CUAHSI format;

Objective 2. Develop, test and demonstrate a scalable Data Management System (DMS) based
on CUAHSI protocols for storage and management of geo-referenced discrete data generated by
ongoing watershed management research in the East Fork Watershed of the Little Miami River
in Ohio (EFWLMR), as well as newly initiated research at other locations;

Objective 3. Enable targeted users to access,  exchange and integrate geospatially referenced
WQ2 information, and conduct relatively simple calculations and/or run models to inform water
management decisions at various watershed scales.

The results of this project are subject to the Quality Assurance Project Plan ID No. W-15812
(Approval date: 11/21/2011).

1.3       Report Outline
The remainder of the report is organized as follows. Section 2 describes data sources used in the
project and those projected for use in the future. Section 3 outlines the approach taken by the
team to convert the existing WQ2 data into the CUAHSI format. This section meets the first
objective of the project. Section 4 describes data publication phases, with a focus on the current,
demonstration phase.  Section 5 describes the architecture of the DMS developed for the
EFWLMR data, and its use  for demonstration purposes. Sections 4 and 5  meet the second
objective. Section 6 explains how users can retrieve data from the DMS and upload their own,
and  describes the computer codes developed by  the team  to  inform  watershed management
decision-making. Section 6 meets the third objective of the project.  Sections 7 and 8 identify
critical gaps in current approaches and data management and discuss the  feasibility of further
improvement.

The contents of these sections are briefly outlined below:

2.0    Data Sources
       This section  outlines the data sources used in the project and those projected for use in
       the future.
3.0    DMS Architecture and Use for Exchange and Integration of WQ2 Information
       The CUAHSI-HIS data publication tools are reviewed. The existing WQ2 data and
       metadata were translated into terminology used by the CUAHSI ODM; these translation
       methods are outlined in this section. The data were then loaded into ODM. All steps to
       install the required software and load the data are outlined.
4.0    Data Publication
       Steps needed to publish an ODM database are outlined. Three data publication phases are
       presented: development,  demonstration and production. The data publication is currently
       in the demonstration  phase, whereby a copy of the  dataset is housed on a designated
       server at the University of Cincinnati (UC). This web-service can  be accessed by EPA
       and  non-EPA users  who  have access to  the Internet,  by  searching in  CUAHSI
       HydroDesktop for "EPA - East Fork Watershed" in Ohio.
5.0    Using The DMS: Exchange and Integration of WQ2 Information
       This section describes tools that enable data visualization and analysis. It includes a
       discussion of HydroDesktop, HydroR and HydroExcel.
6.0    Data Analysis to Inform Decision-Making
       This section  contains the description of data retrieval  by users from the DMS, uploading
       data  of their  own, and the computer codes developed by the team to inform the decision-
       making process of watershed management.

7.0    Critical Gaps of Current Approaches
       This section contains the overview and synthesis of information provided in Sections 3-6,
       and identification of the critical gaps in the approaches.
8.0    Recommendations for Improving upon Existing Approaches
       This section focuses on feasibility assessment for improvement upon the best  current
       approaches and discusses an implementation plan.

                                2.0:       DATA SOURCES
The  first source of data is the EPA-ORD/NRMRL's ongoing WQ2 monitoring effort in the
EFW. This monitoring program has been in place since 2005.  The resulting data include (1)
physico-chemical data from water samples; (2) sensor data from water-immersed sensors for
physico-chemical variables; (3) physico-chemical data from stream-substrate (sediment)
samples; (4) sensor data from water-immersed sensors for biological variables; and (5) biological
data  from  stream substrate samples. In particular, data collected in the year 2010 were selected
for upload into the ODM. All  data originating from this data source are subject to the Quality
Assurance Project Plan ID No 634-Q-2-0. Approved data may become available for public use
approximately 2 years after collection. For example, a 2-year lag-period between data collection
and public availability is customary for  networks  such as the Long Term Ecological  Research
(LTER) Network.
Other potential  data  sources  are related to  projects within the NRMRL Cincinnati Cross-
Laboratory Green Infrastructure Study, where water quality monitoring is conducted. Among the
latter projects, those on Shepherd's Creek, Quebec Heights,  and Pervious Pavement, all located
in Cincinnati, OH, served as potential additional data sources.

The monitoring data pertaining to the Shepherd's  Creek project, located near Cincinnati, Ohio,
served as the second data source. This monitoring program was in place from 2004 to 2011.
The resulting data included physico-chemical data from water grab samples. All data originating
from this data source  are subject to the Quality Assurance Project Plan ID No S-10386-JA-2-0
(Approval date: 09/02/2011).

           3.0:       DMS ARCHITECTURE AND USE FOR EXCHANGE AND
                          INTEGRATION OF WQ2 INFORMATION

In this  section the hardware and software  requirements  to  create a HydroServer for  data
publishing are outlined. The DMS architecture is presented as well as how it can be used for data
sharing. This section is organized as follows. The first part,  General Guidelines, is a condensed
guide to steps that need to be taken to publish data through a  HIS HydroServer. Detailed step-by-
step  instructions are given in  Appendix A, which  is a tutorial from a workshop taught at the
University of Cincinnati  by Dr. Timothy Whiteaker. The  workshop  was conducted using  a
simplified dataset 'RawData', available electronically as part of this report to facilitate  user
learning. The second part, Working with East Fork Watershed (EFW) Data, is a detailed, step-
by-step description of work performed with EFW data collected in 2010 and published through
CUAHSI HIS. The third part, Working with Shepherd's Creek (SC) Data, is a description of the
work performed with SC data collected in the period 2004-2011 and published through CUAHSI
HIS.

3.1 General Guidelines

3.1.1. Hardware and Software Requirements.
The minimal hardware requirement is a Personal Computer (PC) with 4GB RAM, 500 GB hard
disk, 2.7 GHz processor. Macintosh (Mac) users can use the CUAHSI software through a virtual
machine.
The software includes:
   •   Windows 7, XP or Vista
   •   Microsoft Internet Information Services (IIS) 7 — comes with Windows 7, but may need
       to be enabled. Enable ASP.NET.
   •   .NET Framework 2.0 SP2, 3.5 SP1 (free)
   •   Microsoft SQL Server 2008 R2 (commercial) or SQL Server 2008 R2 Express (free)
       o  Be sure to install the version with SQL Management Studio (aka Database and
          Management Tools)
       o  Install with these options:
             -   Install with Mixed Mode Authentication.
              -   Specify an 'sa' (system administrator) password that you will remember.
   •   HIS software (free)
        o  ODM Data Loader - http://his.cuahsi.org/odmdataloader.html
       o  ODM Streaming Data Loader - http://his.cuahsi.org/odmsdl.html
       o  ODM Tools - http://his.cuahsi.org/odmtools.html
   •   Additional software (not required for HydroServer)
       o  Microsoft Office 2010 (32-bit version)
       o  Google Earth (free) - http://earth.google.com/download-earth.html
       o  HydroObjects (free) - http://his.cuahsi.org/hydroobjects.html
       o  HydroExcel (free) - http://his.cuahsi.org/hydroexcel.html
       o  HydroDesktop (free) - http://his.cuahsi.org/hydrodesktop.html

3.1.2. Review of CUAHSI HIS Data Publication Tools and Service.

The CUAHSI HIS provides Web services, tools, standards and procedures that enhance access to
more and better data for hydrologic analysis. HIS  software is free and available on the HIS
website at http://his.cuahsi.org. A variety of HIS software applications have been built to serve
several types of users and scenarios, from data users to data publishers to educators and
developers. These include HydroServer, the ODM, WaterOneFlow, HIS Central, and
HydroDesktop. HydroServer allows users to publish their own data. It includes
software for publishing observations data with a WaterOneFlow Web service, but it also includes
a website and supporting components for geospatial and temporal data visualization. ODM is a
data model for the storage and retrieval of hydrologic observations in a relational database.
WaterOneFlow is a Web service that facilitates automated and programmatic access to data. HIS
Central is a website maintained by the CUAHSI HIS team where users can register their
WaterOneFlow Web service. The service then becomes discoverable along with dozens of other
Web services already registered with the system, including the EPA Storage and
Retrieval System for water quality data (STORET) and the USGS National Water Information
System (NWIS). This makes HIS Central the largest single catalog of the nation's water data.
HydroDesktop is a free and open source Geographic Information System (GIS) that allows the
user to explore and download Internet data published on HydroServers.

3.1.3. ODM Structure.

ODM is a data model for the storage and retrieval  of hydrologic observations in a relational
database (Figure 3-1). The purpose for such a database is to store hydrologic observations data in
a system designed to optimize data retrieval for integrated analysis of information collected by
multiple investigators. It is intended to provide a standard format to aid in the effective sharing
of information between investigators and to allow analysis of information from disparate sources
both within a single study area or hydrologic observatory and across hydrologic observatories
and regions. The observations data model is designed to store hydrologic observations and
sufficient ancillary information (metadata) about the data values to provide traceable heritage
from raw measurements to usable information allowing them to be unambiguously  interpreted
and used. A relational database format is used to provide querying capability to allow data
retrieval supporting diverse analyses. A generic template for the observations database is referred
to as the Observations Data Model (ODM). The specifics of the ODM are documented in
Tarboton et al., 2008. The current ODM design specifications can be found in Appendix B of
this report.
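
As a concrete illustration of this relational design, the following query (a sketch only; the table
and column names follow the ODM 1.1 design specifications in Appendix B) traces each stored
value back to its site, variable, units, method, and source metadata:

    SELECT dv.LocalDateTime, dv.DataValue,
           s.SiteName, v.VariableName, u.UnitsName,
           m.MethodDescription, src.Organization
    FROM DataValues dv
    JOIN Sites s ON s.SiteID = dv.SiteID
    JOIN Variables v ON v.VariableID = dv.VariableID
    JOIN Units u ON u.UnitsID = v.VariableUnitsID
    JOIN Methods m ON m.MethodID = dv.MethodID
    JOIN Sources src ON src.SourceID = dv.SourceID;

Because every value carries these foreign keys, data from disparate investigators can be queried
through one uniform schema.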

3.1.4. Installing a Blank ODM Database.

The first step is to download and install a blank ODM database from http://his.cuahsi.org. The
ODM database then needs to be attached to the SQL Server. The controlled vocabularies within
the database are then updated; this step helps to ensure that the user's terminology is consistent
with the peer-reviewed vocabulary of terms maintained by CUAHSI.
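
For instance, under SQL Server 2008 the downloaded blank database can be attached with a
statement like the following (a sketch; the database name and file paths are placeholders for
whatever the local installation uses):

    -- Attach the blank ODM data and log files downloaded from http://his.cuahsi.org
    CREATE DATABASE OD
        ON (FILENAME = 'C:\ODM\OD.mdf'),
           (FILENAME = 'C:\ODM\OD_log.ldf')
        FOR ATTACH;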

  3.1.5. Requirements for Input Files to ODM Data Loader: Transforming Data into
  CUAHSI Format.

User data files need to be transformed into files that are formatted for loading into an ODM
database using the ODM Data Loader. The minimum pieces of information, stored in separate
files and needed to describe data in the ODM, are listed below (an illustrative Sites file follows
the list):

          •       Sites
          •       Sources
          •       Variables
          •       DataValues
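
For illustration, a minimal Sites input file might look like the following (the site code, name,
and coordinates are hypothetical; the column pattern follows the sites file later shown in
Figure 3-6):

    SiteCode,Longitude,Latitude,SiteName
    XYZ,-84.20000,39.10000,Example Creek at Example Rd crossing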
[Figure 3-1 is a schema diagram: the central DataValues table is linked by primary and foreign
keys to tables for monitoring site locations (Sites), Sources and ISO metadata, Variables and
their controlled vocabularies, data collection Methods, Samples and LabMethods, offset types,
Data Qualifiers, QualityControlLevels, categorical data, and value Groups.]

      Figure 3-1.    Observations Data Model (ODM) Schema, from Horsburgh et al. 2009

  3.1.6. Loading Metadata and Data Values into ODM.

  After the Sites, Sources and Variables metadata and data values are transformed into files
  acceptable by the ODM Loader, they are committed to the ODM database. In addition to data
recorded in Excel spreadsheets, text files, etc., it is also possible to load data from telemetry
  systems. Appendix A describes how to use the Streaming Data Loader for that purpose.

3.1.7. Working With ODM Data.

Once data are loaded into the ODM database, they can be queried, visualized and analyzed.
ODM Tools are used for this purpose. They also allow the user to derive new time series from
example by applying a function to the original data.
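
As a sketch of the kind of derivation ODM Tools automates, a daily-average series could
equally be computed directly against the database (the variable code is one of the EFW codes
introduced in Section 3.2.9; CAST(... AS DATE) requires SQL Server 2008 or later):

    -- Daily mean of a raw series, derived from the stored observations
    SELECT CAST(dv.LocalDateTime AS DATE) AS ObservationDay,
           AVG(dv.DataValue) AS DailyMean
    FROM DataValues dv
    JOIN Variables v ON v.VariableID = dv.VariableID
    WHERE v.VariableCode = 'TNH4-SW'
    GROUP BY CAST(dv.LocalDateTime AS DATE)
    ORDER BY ObservationDay;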

3.2. Working with East Fork Watershed (EFW) Data
A list of the data files pertaining to the EFW which were accommodated into the DMS is shown
below. The data for the water-immersed sensors for biological variables were not available. The
biological data  from the stream-substrate samples could not be accommodated because HIS is
not programmed to store biological data and it does not carry a biological data loader yet.
1. Physico-chemical data water and sediment
      a.  2010_ESF-EFWS_NutrientData_ODM.xls
      b.  2010_ESF-EFWS_NutrientData_ODM_QC.xls

2. Sensor data from water-immersed  sensors for physico-chemical variables
      a.  CEC 2009-2010.csv
      b.  CLC 2009-2010.csv
      c.  FMR 2009-2010.csv
      d.  HST 2009-2010.csv
      e.  HWR 2009-2010.csv
      f.  KRT 2009-2010.csv
      g.  LRC 2009-2010.csv
      h.  OWT 2009-2010.csv
      i.  SHA 2009-2010.csv
      j.  SLT 2009-2010.csv
      k.  UHL 2009-2010.csv
      l.  USR 2009-2010.csv

3. Sediment data
      a.  FieldSedimentFractions_8-05_7-09_update91409.xls

All files were formatted to conform with the CUAHSI  standard before they were loaded to the
ODM database. Most of the column names in those files were renamed.
3.2.1. Description of Physico-chemical Data Water File.
The nutrient data file contained a DATA and a METADATA spreadsheet (Figure 3-2; Figure 3-
3). The DATA spreadsheet contained information about sampling date and time, site location,
sampling method, variables, and concentrations. Each column in the DATA spreadsheet is
explained in detail under the METADATA spreadsheet. The original METADATA spreadsheet
has been modified to such an extent that it can be used as a source for lookup tables. The new
spreadsheet named Data Values has been added to this file. This spreadsheet pulls out all the
information from the two spreadsheets DATA and METADATA and creates a CUAHSI

formatted Data Value file. This Data Values spreadsheet was copied to a new Excel file and saved
as a csv file.
[Figure 3-2 is a screenshot of the DATA spreadsheet, with columns for site id, cross id,
replicate, matrix, type, analyte name, autosampler cup number, analyte detection date and time,
manual dilution factor, method, peak area, peak height, peak concentration, units, and peak
concentration corrected for dilution factor.]

               Figure 3-2.   Physico-chemical Data Water 2010, Excel sheet
[Figure 3-3 is a screenshot of the new Data Values spreadsheet, with CUAHSI-formatted
columns including Date/Time, SiteCode, DataValue, VariableCode, LabSampleCode,
LabMethodID, method description, SourceID, QualityControlLevelID, CensorCode, offset
fields, and UTC offset.]

     Figure 3-3.   Physico-chemical Data Water 2010, with New Formatted Data Values
                                          Spreadsheet

3.2.2. Description of Sediment Data File.
The sediment data file contains Fractions and a METADATA spreadsheet. The Fractions
spreadsheet contains information about sampling date and time, site location, fraction sizes, and
weights. The METADATA spreadsheet has been modified and holds all the Offset Descriptions
(Figure 3-4). A new spreadsheet named Variables has been created which holds information for
each of the newly created variables. For each fraction size a new spreadsheet has been added to
this file. In addition, another spreadsheet named All Data Values has been created. This
spreadsheet gathers all the information from each class size and METADATA spreadsheet and
creates a CUAHSI formatted Data Value file. This Data Values spreadsheet was copied to a new
Excel file and saved as a csv file.
[Figure 3-4 is a screenshot of the sediment workbook, with columns for collection date,
placement date, sample id, channel and cross-channel ids, elapsed time of deployment, matrix,
tray weights, and initial and final wet and dry sediment weights.]

    Figure 3-4.   Physico-chemical Data Sediment 2010, with New Formatted Data Values
                                         Spreadsheet
Figure 3-5 shows the steps taken to load Sources, Sites, Variables and Data Values to the ODM
database. It also shows a screenshot of the ODM Data Loader and the  ODM Streaming Data
Loader.
[Figure 3-5 diagram: Sources, Sites, Variables, and Chemical/Physical/Biological Data Values
are loaded in sequence.]

                     Figure 3-5.    Steps in Data Loading to the ODM
3.2.3. Loading Sites to the ODM
[Figure 3-6 is a screenshot of the ODM Data Loader with the sites csv file open, showing the
columns SiteCode, Longitude, Latitude, and Site Name for EFW stations such as SLT, SHC,
HST, and KRT.]

                         Figure 3-6.   Loading Sites to the ODM

A csv file with all the sampling sites within the EFW was created according to the CUAHSI
standard (Figure 3-6). The sampling sites file contains detailed information about site name,
code, and location. The site csv file was uploaded to the DMS using the ODM Data Loader.

3.2.4. Loading Sources to the ODM.
[Figure 3-7 is a screenshot of the ODM Data Loader with the sources csv file open, showing the
columns Organization (e.g., CC-OEQ, USACE, EPA), Source Description, and Source Link.]

                        Figure 3-7.   Loading Sources to the ODM

A csv file with information about data sources used in this project was created and loaded to the
DMS using the ODM Data Loader (Figure 3-7). This file contains information about the
different organizations which manage the different sampling sites within the EFW. It also contains
information about a contact person for each of those sampling sites.

3.2.5. Loading Variables to the ODM.
It is sometimes necessary to update the controlled vocabularies. The controlled vocabularies
(CVs) in ODM are implemented as tables.  For example, the list of valid variable names is stored
in the VariableNameCV table (Figure 3-8). To ensure consistent descriptions of user data, the
data must use terminology from these CV tables.  CUAHSI maintains a master list of CVs that
can be synced with an ODM database; however, sometimes it is necessary to add new, project
specific terms that aren't in the master CV. This can be done using the tools included with SQL
Server.
[Figure 3-8 is a screenshot of the ODM Data Loader with the variables csv file open, showing
VariableCode entries (e.g., NH4, NH4-Urease, NO2-3, TN) and their Variable Names.]

                       Figure 3-8.   Loading Variables to the ODM

Before new variables could be loaded to the ODM, new vocabularies had to be added to the
VariableNameCV table. The new vocabulary was added to the ODM by using SQL statements
inside SQL Server.
SQL statements: Insert into VariableNameCV Values ('Urea, total', 'Total Urea') and Insert into
VariableNameCV Values ('Urea, dissolved (filtered)', 'Total Urea')

3.2.6. Loading SampleMediumCV to the ODM.
New vocabularies had to be created and added to the SampleMediumCV table. The new
vocabulary was added to the ODM by using SQL statements inside SQL Server.
SQL statements: Insert into SampleMediumCV values ('Atmospheric Deposition', 'AD'), Insert
into SampleMediumCV values ('Intergravel', 'IG'), Insert into SampleMediumCV values ('Waste
Water', 'WW'), Insert into SampleMediumCV values ('Deionized Water', 'DI'), Insert into
SampleMediumCV values ('Drinking Water', 'DW') and Insert into SampleMediumCV
values ('Periphyton', 'BP')

3.2.7. Loading VariableNameCV to the ODM.
New vocabularies had to be created and added to the VariableNameCV table. The new
vocabulary was added to the ODM by using SQL statements inside SQL Server.
SQL statements: Insert into VariableNameCV values ('Ammonium analyzed as an endpoint to
24hr Urease Assay', 'Ammonium analyzed as an endpoint to 24hr Urease Assay'), Insert into
VariableNameCV values ('Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen', 'Nitrogen (NO2) +
Nitrate (NO3) Nitrogen') and Insert into VariableNameCV values ('Nitrogen, nitrite (NO2)
nitrogen', 'Nitrite (NO2) Nitrogen')

3.2.8. Loading SampleTypeCV to the ODM.
New vocabularies had to be created and added to the SampleTypeCV table. The new vocabulary
was added to the ODM by using SQL statements inside SQL Server.
SQL statements: Insert into SampleTypeCV values ('1', 'Duplicate sample No. 1'), Insert into
SampleTypeCV values ('2', 'Duplicate sample No. 2'), Insert into SampleTypeCV values ('3',
'Duplicate sample No. 3')

3.2.9. Loading Variables to the ODM.
A variable csv file had to be created and formatted according to the CUAHSI standard. All the
newly created variables include a unique VariableCode that can be used to specify a variable of
interest. The VariableCode starts with the variable abbreviation, followed by the sample medium
abbreviation, to distinguish each variable by the location where it was collected. For example,
TNH4-SW stands for 'Nitrogen, NH3 + NH4' collected as a 'Surface Water' (SW) sample.
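
By way of illustration, a row following this convention could also be inserted directly (a sketch
only; the column list follows the ODM 1.1 Variables table in Appendix B, and the units IDs,
time support, and data type values are assumptions, not values from the project files):

    INSERT INTO Variables (VariableCode, VariableName, Speciation, VariableUnitsID,
                           SampleMedium, ValueType, IsRegular, TimeSupport, TimeUnitsID,
                           DataType, GeneralCategory, NoDataValue)
    VALUES ('TNH4-SW', 'Nitrogen, NH3 + NH4', 'N', 199,
            'Surface Water', 'Sample', 0, 0, 104,
            'Sporadic', 'Water Quality', -9999);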

3.2.10. Loading Data Values to  the ODM.
The DataValues were created from the 2010_ESF-EFWS_NutrientData.xls file. A new
spreadsheet was created and all relevant DataValues column names were added. This spreadsheet
pulls values from the DATA spreadsheet and the METADATA spreadsheet and writes each value
to the correct column in the DataValues spreadsheet. This spreadsheet had to be copied and
pasted into a new Excel  document. This document was then saved as a csv file (Figure 3-9).

Alternatively, the data can be saved using Save As CSV directly from the modified
NutrientData.xls file without having to save as a separate Excel file first. The ODM Data Loader
was used to load the Data Values to the ODM.
[Figure 3-9 is a screenshot of the ODM Data Loader with the 2010 nutrient Data Values csv file
open, showing the columns Date, Time, SiteCode, DataValue, VariableCode, and SampleType.]

                     Figure 3-9.   Loading 2010 Nutrient Data Values

3.2.11. Loading Sensor Data from Water-immersed Sensors for Physico-chemical Variables
to the ODM.

The sensor water data files were uploaded to the ODM using the ODM Streaming Data Loader
(Figure 3-10). All sensor data account for daylight saving time. First, a new csv file with all sensor
data variables was created. A description of each column in the sensor data files can be found in
Appendix C. While loading the sensor data to the ODM database, the variables were
mapped to the newly created variables in the ODM.
[Figure 3-10 is a screenshot of the ODM Streaming Data Loader with sensor data files from the
EFW sites mapped for loading.]

                        Figure 3-10.  ODM Streaming Data Loader
3.2.12. Loading Sediment Data to the ODM.
The ODM Streaming Data Loader was used to load the sediment data. New vocabularies were
added to the VariableNameCV table. The following SQL statement was used inside SQL Server:

Insert into VariableNameCV Values ('Gravel Tray, total sediment weight (g)', 'Gravel Tray, total
sediment weight (g)'), Insert into VariableNameCV Values ('Gravel Tray, total sediment weight
sum of fractions (g)', 'Gravel Tray, total sediment weight sum of fractions (g)'), Insert into
VariableNameCV Values ('Gravel Tray, > 2mm (g)', 'Gravel Tray, > 2mm (g)'), Insert into
VariableNameCV Values ('Gravel Tray, 250um - 2mm (g)', 'Gravel Tray, 250um - 2mm (g)'),
Insert into VariableNameCV Values ('Gravel Tray, 250um - 2mm LOI % (w/w)', 'Gravel Tray,
250um - 2mm LOI % (w/w)'), Insert into VariableNameCV Values ('Gravel Tray, 1.2um -
250um (g)', 'Gravel Tray, 1.2um - 250um (g)') and Insert into VariableNameCV Values ('Gravel
Tray, 1.2um - 250um LOI % (w/w)', 'Gravel Tray, 1.2um - 250um LOI % (w/w)')

New sediment variables were created and loaded to the ODM. Sediment Datavalues were loaded
to the ODM using a modified version of the ODM Data Loader (Figure 3-11).
[Figure 3-11 is a screenshot of the modified ODM Data Loader with the 2007-2009 field
sediments csv file open, showing source columns (Date, Time, SiteCode, DataValue,
VariableCode, LabSampleCode, LabMethodID, Method Description, OffsetValue) mapped to
target database columns before committing 1001 rows of DataValues.]

                           Figure 3-11.  Modified Data Loader

3.2.13. Data Validation.
[Figure 3-12 is a screenshot of the new Testing tab executing the SQL statement 'Select * from
Units' and returning the 317 rows of the Units table (UnitsName, UnitsType, UnitsAbbreviation).]

              Figure 3-12.  New Testing Tab Added to the ODM Data Loader

Three steps were followed to validate the consistency of any data committed to the DMS. In the
first step, a new tab named Testing was created in the modified ODM Data Loader (Figure 3-12).
This tab allows a quick check and validation of any category of data which has been loaded to
the ODM; in Figure 3-12, the 'Units' table was queried. In the second step, the ODM log files
were checked. The ODM data loaders keep a log file which contains information about what type
of data has been loaded, the loading date and time, and any errors which occurred during the
loading process. The data loaded using the different ODM data loaders were validated after each
file had been loaded to the ODM database, by quantitatively verifying that the database contents
were consistent with the source files. The third step included verification that the data values
committed to the ODM for a specific location and time were actually in the database, using ODM
Tools, a freely downloadable application. This tool enables querying the DMS and verifying the
results returned (Figure 3-13) by comparison with the data before loading into the ODM.
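
As an illustration of such a spot check, a query like the following (a hypothetical example
against the ODM 1.1 schema; the site code and date are placeholders) counts and bounds the
values committed for one site on one day:

    -- Hypothetical validation query against the ODM 1.1 schema:
    -- count and bound the DataValues committed for one site on one day.
    SELECT s.SiteCode,
           v.VariableCode,
           COUNT(*)         AS NumValues,
           MIN(d.DataValue) AS MinValue,
           MAX(d.DataValue) AS MaxValue
    FROM DataValues d
    JOIN Sites s     ON s.SiteID = d.SiteID
    JOIN Variables v ON v.VariableID = d.VariableID
    WHERE s.SiteCode = 'OWT'                       -- site of interest
      AND d.LocalDateTime >= '2007-04-24'
      AND d.LocalDateTime <  '2007-04-25'
    GROUP BY s.SiteCode, v.VariableCode;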
                                           11

-------
[Screenshot of ODM Tools: on the Query tab, data are queried by site (e.g., ESF - Experimental
Stream Facility Weather Station; HST - Heiserman Stream at Rt 50 crossing), by variable (e.g.,
Temp - Temperature; TN-DW - Nitrogen, total; TNH4-AD - Nitrogen, NH3 + NH4), and by
general category, sample medium, value type, data type, quality control level, method, source,
and time period; the matching metadata and data can then be exported.]
                    Figure 3-13.  Data Validation Using ODM Tools


3.3       Working with Shepherd's Creek Data
A procedure similar to that used for the EFW data was used to accommodate the SC data into the
DMS. In this case only one file type was included, i.e., Excel sheets containing physico-chemical
data on water grab samples collected during a 7-year period (2004-2011).
                                         12

-------
                             4.0:       DATA PUBLICATION

This section of the report outlines how to publish data through HIS Central and consists of two
parts. Part one, Data Publication with WaterOneFlow, describes general concepts, requirements,
and steps to be taken for data publication; detailed instructions are given in Appendix A. Part
two, Publishing New Watershed Data, outlines the specific phases of publishing the EFW data
with regard to EPA ORD IT security requirements.

4.1. Data Publication with WaterOneFlow
WaterOneFlow Web services are used to share data with others online.  WaterOneFlow defines a
standard set of queries and a standard output format for accessing data, regardless of whether the
data are accessed internally from an ODM  database, some other database, or even through
another website. Additionally, WaterOneFlow provides a layer of security over the database,
which makes it less susceptible to hackers than exposing the database itself to public access.
Appendix A provides step-by-step instructions on how to publish the data with WaterOneFlow.
The main steps include:

    1. Create a SQL Server account that the Web service will use to access the database (a sketch
       follows this list).
    2. Install the WaterOneFlow Web service on the computer.
    3. Configure the service.
    4. Check the result.
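
For step 1, a minimal T-SQL sketch (the login name 'webclient' follows the convention used in
Appendix A; the password and database name are placeholders):

    -- Sketch: create a read-only SQL Server account for the Web service.
    CREATE LOGIN webclient WITH PASSWORD = 'UseAStrongPasswordHere';
    USE MyWaterData;
    CREATE USER webclient FOR LOGIN webclient;
    EXEC sp_addrolemember 'db_datareader', 'webclient';  -- read-only access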

Upon the completion of these steps, the service must be registered at HIS Central to make it
discoverable online. More than just a listing of WaterOneFlow services, HIS Central performs
the following functions:

    •   Provides detailed information about the service, including contact information, abstract,
       and areal extent of the sites.
    •   Supports translation from the variables to an ontology of common hydrologic concepts.
       This facilitates easy search for variables in the service, especially by those who are not
       familiar with what the service has to offer.
    •   Maintains a catalog of sites and variables available in all registered services, enabling fast
       search for data from multiple data providers.

After the service has been registered, HIS Central needs to harvest the data. HIS Central keeps a
catalog of all sites and variables, which enables fast searching across all registered services. HIS
Central creates this catalog by calling various methods of the WaterOneFlow Web service; this
is called data harvesting. When the database developer requests a data harvest, an HIS Central
administrator is notified, who then triggers the harvest with the click of a button. Once the
service is in place, harvesting occurs automatically every week.
The next step is to map the  variables into the CUAHSI HIS ontology.  This must be performed
for each variable to  ensure that variable  names  comply with  CUAHSI  ontologies  and  are,
                                           13

-------
therefore, recognized. Mapping has to be performed by the database developer at this time.
Finally, HydroDesktop is used to test whether data publication has been successful.
4.2. Publishing New Watershed Data
The three phases of publication of a new database are development/testing, demonstration, and
production. The team has completed the development/testing and demonstration phases.

4.2.1. DMS Development (and Testing) Phase (Figure 4-1):
1.  An ORD-HIS Development Database, an ORD-HIS Test Database, and an ORD-HIS
    Production Database have been set up by EPA OSEVI inside the EPA firewall. During the
    development and testing phase, this approach gives data loaders, application developers, and
    testers three uniformly formatted databases in which to develop, stage, clean, quality-assure
    (QA), and test data before they are moved to the production database (which will eventually
    be accessible to the public).
    DMS Project HIS Database Administrators will have direct access to the three databases
    (development, test, and production) on the ORD Enterprise SQL Server. They will also
    manage access for other ORD or EPA users (e.g., read only, load data, or modify records).
2.  EPA/ORD users may submit a request for the installation and use of CUAHSI's open-source
    HydroDesktop software, which was recently approved for use within the Agency. Only users
    inside the EPA firewall can access the EFW and SC databases during the development phase.
3.  OSEVI will also begin setting up an EPA HydroServer that will function within EPA's
    firewall until the DMS team is ready to make it publicly accessible. The HydroServer will
    read data records from the SQL database file on the EPA/ORD Enterprise SQL Server. The
    original ESF and SC databases currently also reside on this EPA/ORD HydroServer at RTP,
    which is accessible from within the EPA/ORD firewall (through a SQL Server account that
    has permission to view those databases, using ODM Tools; point of contact: Dr. Elly P.H.
    Best).

4.2.2. DMS Demonstration Phase (Figure 4-2):
For demonstration of the EFW and SC databases by University of Cincinnati (UC) contractors,
copies on a designated UC server are used until the production phase is complete. These Web
services can be accessed by EPA and non-EPA users with Internet access through the HIS Data
Service List at the following link: http://hiscentral.cuahsi.org/pub_services.aspx.
By clicking on the title 'EPA - East Fork Watershed in Ohio', the individual service page of the
EFW database comes up (http://hiscentral.cuahsi.org/pub_network.aspx?n=264). The data of this
database can be accessed via CUAHSI's HydroDesktop application.
By clicking on the title 'Shepherd Creek Watershed Ohio', the individual service page of the
Shepherd Creek database comes up (http://hiscentral.cuahsi.org/pub_network.aspx?n=216).
                                          14

-------
The data of this database can also be accessed via CUAHSI's HydroDesktop application.
Publication of both databases is visualized in Figure 4-3.

4.2.3. DMS Production Phase (Figure 4-4):
1.  Once ORD researchers are ready for the production phase (i.e., data are loaded on the
    production database and funding has been secured), they will begin the Application
    Deployment Checklist (ADC), or another more recently adopted EPA process, to move the
    EPA HydroServer to the demilitarized zone (DMZ, hosted by the EPA National Computing
    Center) so it will be fully accessible by HIS Central.
2.  ORD and other EPA users with the HydroDesktop client installed will access the internet-
   based public HIS Central  metadata catalog through the Agency Firewall.  They will retrieve
   data records from the EPA HydroServer and from other (public) HydroServers as well.
   Access to the EPA HydroServer will be through the internal DMZ firewall router, while
   access to other public HydroServers will be through the Agency Firewall.
3.  Public (non-EPA) clients who retrieve pointers to ORD water data from HIS Central queries
   will access the EPA HydroServer through the Agency internet/DMZ router.  The EPA
   HydroServer will retrieve requested water data records from the internal ORD-HIS  database
   and pass them back to the external client.
4.  Offsite (external to the Agency Firewall) access to the ORD-HIS databases will only be
   possible through the EPA HydroServer, and will be limited to those types of access provided
   by the  HydroServer (i.e., only retrieval of water data records from the ORD-HIS database).
   Offsite ORD-HIS Database Administrators or other offsite users with requirements  to
   directly access the ORD-HIS databases (e.g., approved external data loaders) will have to use
   Agency AAA/F5 connection to access the ORD-HIS databases.  (An additional 'jump box'
   may be necessary for offsite direct access to the ORD Enterprise SQL Server and the ORD-
   HIS databases.  This may have to be developed as required).
By placing the HIS database on the ORD Enterprise SQL Server, ORD researchers are relieved
of the tasks of maintaining the SQL Server. ORD researchers (or designated staff) remain the
data owner and Database Administrator of the databases. OSEVI (or designated staff) manages
and maintains the SQL Server itself, including normal server backups. Reliable backup of SQL
database files will require arrangement and coordination with the Database Administrator.
Additional backup may also be needed and will be determined later.
                                          15

-------
[Diagram: during the development phase, the EPA HydroServer for public data publication is not
yet exposed to the Internet; the ORD Enterprise SQL Server, the SQL client with ODM Data
Loader and database management tools used by the EPA-ORD EFW database manager, and the
HydroDesktop EPA clients of EPA-ORD users all operate within the EPA-ORD internal
network.]
                       Figure 4-1.  DMS Development Phase
                                      16

-------
[Diagram: during the demonstration phase, HydroDesktop public clients (non-EPA) and HIS
Central (data discovery, metadata catalog, web-services API) operate on the Internet, while the
ORD Enterprise SQL Server and the SQL client with ODM Data Loader and database
management tools, used by the EPA-ORD EFW database manager and UC contractors, remain
on the EPA-ORD internal network.]
                               Figure 4-2.   DMS Demonstration Phase
                                                   17

-------
[Screenshots of the HIS Central service pages: 'EPA - East Fork Watershed in Ohio' (EPA-EFW),
described as test data from the East Fork Watershed in Ohio that are not quality controlled, and
'Shepherd Creek Watershed Ohio' (EPA-SC), described as test data from the Shepherd Creek
Watershed in Ohio that are quality controlled.]
            Figure 4-3.    Publication DMS in Demonstration Phase
                                            18

-------
[Diagram: during the production phase, HydroDesktop public clients (non-EPA) and HIS Central
access the EPA HydroServer through EPA's security DMZ, while the ORD Enterprise SQL
Server and the SQL client with ODM Data Loader and database management tools, used by the
EPA-ORD EFW database manager, remain on the EPA-ORD internal network.]
                                   Figure 4-4.   DMS Production Phase
                                                     19

-------
        5.0:      USING THE DMS: EXCHANGE AND INTEGRATION OF WQ2
                                      INFORMATION

This section describes tools that enable data visualization and analysis. It includes a discussion of
HydroDesktop, HydroR and HydroExcel. Details on HydroDesktop and HydroR are contained in
Appendix D, and on HydroExcel in Appendix A.

5.1. Using  HydroDesktop
HydroDesktop is a desktop application for discovering, accessing, and analyzing time series data
from WaterOneFlow services. HydroDesktop  can be used to search for hydrologic data, select
and download data, visualize time series, label  features, delineate watersheds, and explore data.

5.2. Using  HydroR
The HydroR plug-in is an interface between HydroDesktop and the R statistical software.
HydroR makes it easy to give R access to the data downloaded with HydroDesktop, allowing
HydroDesktop users to apply the advanced statistical analysis and plotting capabilities of R.

5.3. Using  HydroExcel
HydroExcel  is  an Excel  spreadsheet  customized with  macros for  accessing data  from  a
WaterOneFlow Web service. It is thus possible to extract data from the published  Web  service
from within an Excel spreadsheet.
                                          20

-------
             6.0:      DATA ANALYSIS TO INFORM DECISION-MAKING
6.1. Area of Interest and Problem Identification
In this section a case study is described to illustrate how users may access, explore and retrieve
WQ2 data from the DMS,  combine them  with their own data,  conduct analyses and use the
results to inform watershed management decision-making.
The East Fork of the Little Miami River (EFLMR) watershed is a large Midwestern watershed,
covering approximately 1,295 km2 (500 mi2) and discharging into the Little Miami River (LMR),
a National Scenic River. The LMR is a tributary of the Ohio River. The headwaters of the
EFLMR are located in the rural counties of Clinton, Highland, and Brown, and the confluence
with the LMR lies in suburban Clermont County. Dam construction by the U.S. Army Corps of
Engineers (USACE) in 1975 created an 8.74 km2 (3.38 mi2) reservoir (Harsha Lake or East Fork
Lake). The dam and reservoir provide flood control, recreation opportunities, and a drinking
water source for Clermont County. The EFLMR has been listed as an impaired water by the
State of Ohio since 2006, and has been designated for Total Maximum Daily Load (TMDL)
identification (EFLMR Watershed Collaborative, 2007; hereafter, the Collaborative). The watershed has a
mixed agricultural and urban  land  use,  and harbors several  wastewater treatment plants
(WWTPs) and a drinking water treatment plant (DWTP). Issues of concern for the water quality
of this watershed include stormwater,  wastewater,  and agricultural runoff, which may contribute
to increased loading of N and elevated aqueous N  concentrations;  and  drinking water treatability
(Nietch et al. 2010).  A level of  1.1 mg/L for dissolved inorganic  nitrogen (DIN)  has been
suggested as  a water  quality standard for the management of rivers and  streams <1,300 km2
(Miltner 2010).  In this case study, a procedure is described in which a N-TMDL for the EFLMR
is computed using 1.1 mg DIN/L as a target level.  The TMDL will be calculated for the pour
point of the watershed, where the EFLMR discharges into the LMR. The layout of the watershed
is depicted in Figure 6-1.

6.2. Data Management
Data management, sharing, exchangeability, and accessibility were identified as major problems
for  effective communication and analysis  of  information  by the  Collaborative  in which
EPA/ORD participates.  Many WQ2 data are currently collected, analyzed, and stored in similar
ways by members of the Collaborative.  Therefore, it is expected  that  a scalable DMS based  on
CUAHSI protocols for storage and management of georeferenced discrete data pertaining to the
EFW  collected by EPA will  serve as an example and greatly facilitate effective communication
and analysis of information in the near future (see sections 2-5 of this report).

6.3. Data Exploration
HydroDesktop was used to  explore and identify  the geographical area of the EFW, select it,
identify  the  monitoring stations  in  the  watershed, and  visualize this information as  a
geographically referenced map (Figure 6-2).
                                          21

-------
                    [Figure 6-1.  Layout of the East Fork Watershed]

-------
[HydroDesktop screenshots (Figures 6-2 and 6-3): monitoring stations of the East Fork
Watershed visualized on a georeferenced map, and daily discharge values at Perintown, OH,
selected for download.]
-------
[Box and whisker plot: Discharge of East Fork Watershed at Perintown (OH); discharge (cfs) on
the y-axis versus Time (month), January through December, on the x-axis.]

   Figure 6-4.   Box and Whisker Plot of the Flow Data at Perintown, OH, Showing a
                   Maximum in March and Minimum in September
Figure 6-5.   Summary Statistics of the Flow Data at Perintown, OH, Showing Variability
                                    in the Data
                                        24

-------
6.4. Data Analysis Using Excel
Cumulative flow (in mega gallons per day, MGD) was calculated from measured daily flow (in
cubic feet per second, cfs) and the desired period of time (in days, d). Cumulative flow over a
year is commonly used as the basis for estimation of pollutant loads and/or TMDLs (Figure 6-6).
Cumulative flow is calculated using equation (1):

    Q_d = Σ_{i=1}^{d} Q_i                                                  (1)

where:
Q_d    =     cumulative flow on day d
Q_i    =     flow on day i
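
Equation (1) can also be computed outside of Excel; for example, the following sketch (T-SQL
against an ODM-style DataValues table; the VariableID is a placeholder) accumulates daily
flows with a correlated subquery:

    -- Sketch: cumulative flow per equation (1) from daily flow values.
    SELECT d.LocalDateTime,
           d.DataValue AS DailyFlow,
           (SELECT SUM(d2.DataValue)
            FROM DataValues d2
            WHERE d2.VariableID = d.VariableID
              AND d2.LocalDateTime <= d.LocalDateTime) AS CumulativeFlow
    FROM DataValues d
    WHERE d.VariableID = 1   -- hypothetical discharge VariableID
    ORDER BY d.LocalDateTime;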
[Plot: Perintown Flow 2010, daily flow and cumulative flow from 1/1/2010 through 12/17/2010.]
                     Figure 6-6.   Flow at Perintown, OH, in 2010

In addition to the flow data, data on nitrogenous compounds collected by the Collaborative at
the EFW station close to Perintown, the pour point, were selected.
                                         25

-------
[Plot: Nitrogen Time Series, 1/1/2010 through 12/17/2010.]
              Figure 6-7.   Nitrogen Time Series at Perintown, OH, in 2010

Cumulative load is calculated as follows:

    L_i = Q_i * C_i                                                        (2)

where:
L_i    =     load at time i
Q_i    =     flow at time i
C_i    =     concentration at time i

Daily cumulative load is then:

    L_d = Σ_{i=1}^{d} Q_i * C_i * Δt                                       (3)

where:
L_d    =     cumulative load on day d
Q_i    =     flow at day i
C_i    =     concentration at day i
Δt     =     the dimensionless time step from day d to day d+1 at which concentrations were
             measured

Cumulative target load is calculated analogously, substituting the target concentration for the
measured concentration:

    Lc_d = Σ_{i=1}^{d} Q_i * C_t * Δt                                      (4)

where:
Lc_d   =     cumulative target load on day d
C_t    =     the target concentration (here, 1.1 mg DIN/L)
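
A minimal sketch of equation (2) in SQL (pairing flow and concentration values measured at the
same site and time in an ODM-style DataValues table; the VariableIDs are placeholders):

    -- Sketch: instantaneous loads per equation (2), L_i = Q_i * C_i.
    SELECT q.LocalDateTime,
           q.DataValue * c.DataValue AS LoadValue   -- unit conversion omitted
    FROM DataValues q
    JOIN DataValues c
      ON  c.SiteID = q.SiteID
      AND c.LocalDateTime = q.LocalDateTime
    WHERE q.VariableID = 1    -- hypothetical flow VariableID
      AND c.VariableID = 2;   -- hypothetical N concentration VariableID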
-------
[Plot: Nitrogen Loading Curve for 2010, showing the cumulative observed N load and the
cumulative target N load with MOS.]


   Figure 6-8.   Computed Nitrogen Loading Curve of the East Fork Watershed in 2010

The flow duration curve (FDC) is a plot that shows the percentage of time that flow in a stream
is likely to equal or exceed a specified value of interest.  Flow duration curves are developed
from historic flow data at the site  and provide a snapshot of the flow record at that location over
a certain period of time.

A duration curve is computed according to the following equation:

    p(x) = 1 - P(x)                                                        (5)

where:
p(x)   =     exceedance probability of event x
P(x)   =     probability of event x
The FDC for the pour point of the EFLMR was computed over the period from 2005 through
2010. It showed that the flow at this station is at least 100 cfs 55% of the time (Figure 6-9).
Flow duration curves are often segmented into flow regimes to describe varying hydrologic
conditions at that site (Cleland, 2003).
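
A sketch of how the exceedance percentiles behind an FDC might be computed in SQL (using
the Weibull plotting position rank/(n+1); the VariableID is a placeholder):

    -- Sketch: flow exceedance percentiles for a duration curve.
    SELECT Flow,
           CAST(rnk AS FLOAT) / (cnt + 1) AS ExceedanceProbability
    FROM (SELECT DataValue AS Flow,
                 RANK() OVER (ORDER BY DataValue DESC) AS rnk,
                 COUNT(*) OVER ()                      AS cnt
          FROM DataValues
          WHERE VariableID = 1) t   -- hypothetical discharge VariableID
    ORDER BY ExceedanceProbability;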
                                           27

-------
[Plot: Flow Duration Curve; flow versus flow exceedance percentile, with flow regimes from
high flow to low flow marked.]

             Figure 6-9.    Flow Duration Curve of the East Fork Watershed

Flow duration  curves  have  several applications  in the  water resources  field (Vogel  and
Fennessey 1995; Johnson et al. 2009),  including water quality analysis through load duration
curves. Load duration curves are developed from and work similarly to flow duration curves.
Instead of addressing flows, however, load duration curves address the likelihood of equaling or
exceeding a given pollutant load at a  given  location. Curves are computed by applying the
concept of mass loading which combines water  quality  and flow information to quantify the
pollutant load contributed to a point on a stream by the watershed that lies above it.
The mass loading is then calculated by:

    L_i = Q_i * C_i                                                        (6)

where:
L_i    =     mass loading of input at time i, in mass/time
Q_i    =     input flow at time i, in volume/time
C_i    =     pollutant concentration at time i, in mass/volume
Load duration curves can give insights into various aspects of pollutant loading, such as patterns
in loading under various flow conditions, impacts of point versus non-point sources, and effects
of best management practices (Cleland 2002; Cleland 2003; USEPA 2007). Over the last 10
years load duration curves have been widely used in the calculation of TMDLs.

Using load duration curves to calculate TMDLs is governed by the following equation:

    TMDL = Σ WLA + Σ LA + MOS                                              (7)

where
                                          28

-------
WLA   =      the waste load allocation (point sources)
LA    =      the load allocation (non-point sources)
MOS  =      the margin of safety

The Clean Water Act requires that a TMDL include a margin of safety (MOS) to account for
any lack of knowledge concerning the relationship between load and waste load allocations and
water quality.  According to EPA guidance the MOS may be implicit (i.e., incorporated into the
TMDL through  conservative assumptions in the analysis) or explicit (i.e., expressed in  the
TMDL as loadings set aside for the MOS). Commonly a MOS of 5-10% of the TMDL is used.
However, a 20% MOS may be used for the load duration curve (LDC) method where a relatively
low number of data points is available for the analysis.
For this EFW  case,  first a target curve was computed from the target concentration (of 1.1  mg
N/L) according to equation (7) using a 10% MOS. The resulting curve is the maximum pollutant
load that can be experienced at the  site,  based on previous flow conditions, while still meeting
the water quality standard  (i.e., the TMDL; Figure 6-10,  blue  line).  Subsequently,  the  water
quality data monitored at the nearby station were converted to loads  and plotted  in the same
graph  (Figure 6-10; red dots). This example  computation is limited to monitored N values
accessible via the DMS, representing non-point  sources (LA), and, therefore,  the estimated N
load is probably less than with point-source contributions  (WLA) included. Because the  target
curve represents the water quality standard, points falling above the curve are out of compliance,
and points below the curve are in compliance. The difference between the observed and target
loads is then computed to reveal the overall load reduction required to meet the water quality
standard. The required load reduction can be found by inspecting the flow regimes pertaining to
the observed and target loads, respectively. The percent load reduction is calculated by
subtracting the average target load with MOS from the average observed load and dividing by
the average observed load (Figure 6-11).
Expressing the  load  reductions  numerically  and  by flow regime  provides  more  specific
information  on the  circumstances  under which  TMDLs are exceeded,  and may, thus,  guide
measures and  actions  required to avoid TMDL exceedance (Table 6-1). To take the MOS into
consideration,  the target load has to  be decreased by 10% for this calculation. In this watershed,
the observed N load remained below the target load under high flow conditions in 2010, despite
transporting the greatest N load, since ample water was available for dilution. In contrast,  under
dry as well as low-flow conditions the observed N load greatly exceeded the target load, even
though the daily loads transported were more than ten times lower than at high flow. In this
case, therefore, measures alleviating water quality impairments under dry and low-flow
conditions may have to be taken. In the current TMDL example computation only N from non-point sources was
included. To reduce the N loading from non-point sources,  nutrient management measures such
as reduced fertilizer rates, conservation tillage, cover crops, and on-site runoff treatment may be
considered. Regulatory action recommending lower effluent concentrations for N pertains only
to pollutants from point sources.

This case study demonstrated the functionality of the DMS by estimating an example nitrogen
(N) TMDL using a load duration curve (LDC) approach. Data were retrieved from the DMS and
other databases, and were accessed, explored, and visualized through the HIS application
HydroDesktop. In this case HydroDesktop accessed the DMS data via a link to the test
publication of the EFW Database, for which CUAHSI provides a CodePlex sandbox. When the
implementation of the recently  developed DMS is  approved  for use within EPA-ORD and

                                          29

-------
funding for the DMS is generated to enter into  the  'Production Phase'  (involving dedicated
server testing, operating and maintenance), the publication of the EFW Database in the CUAHSI
HIS Services Central Web  Service Registry (and possibly other websites) can be accomplished.
The latter  step would enable access and use by  users  within and outside EPA/ORD, and is
expected to greatly facilitate and accelerate collaboration.


[Log-scale plot: Nitrogen Load Duration Curve; target sample load (line) and observed loads
(points), 1.0E+00 to 1.0E+05, versus flow exceedance percentile from 0.0 to 1.0.]
         Figure 6-10.  Nitrogen Load Duration Curve of the East Fork Watershed
[Log-scale plot: Nitrogen Load Duration Curve with target sample load and observed loads
versus flow exceedance percentile from 0.0 to 1.0.]
 Figure 6-11.  Nitrogen Load Duration Curve of the East Fork Watershed, with Difference
        Between the Averages of Observed and Target Loads, Respectively, Marked

                                            30

-------
 Table 6-1.    Load Duration Curve Estimated Nitrogen Loads and Overall Reductions
Needed to Meet the Proposed Dissolved Inorganic Nitrogen TMDL for the Management of
             Rivers and Streams <1,300 km2 (with 10% margin of safety)
                               Nitrogen Reductions
Flow        Hydrologic    Number    Average     Average    Average      % Load
Percentile  Condition     of        Observed    Target     Target       Reduction w. MOS
Range       Class         Samples   Load        Load       Load w. MOS  = (Obs - Target
                                    (kg/day)    (kg/day)   (kg/day)     w. MOS)/Obs
0-0.1       High flow         5     6399.39     7608.86    6847.98       0.0
0.1-0.4     Moist            17     1951.91     1964.72    1768.25       9.4
            conditions
0.4-0.6     Mid-range        10      553.82      369.53     332.58      39.9
            conditions
0.6-0.9     Dry               7      268.14      141.61     127.45      53.5
            conditions
0.9-1.0     Low flow         17      335.88       93.27      83.95      75.0
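
As an arithmetic check on the last column: for the low-flow regime, the required reduction is
(335.88 - 83.95) / 335.88 = 75.0%.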
                                      31

-------
                7.0:      CRITICAL GAPS OF CURRENT APPROACHES

The EFW biological monitoring data include data attributes that do not map to attributes within
ODM.  Because access to the data within CUAHSI HIS is provided by Web services which
connect to the ODM database, this presents a number of challenges. The ODM database will not
store these attributes. Even if the database were modified to accommodate these attributes, the
ODM data loaders would not load these attributes because they have not been programmed  to
recognize them. Even if the data were somehow successfully loaded into the database, the
remaining cyberinfrastructure, from the WaterOneFlow Web service to HIS Central to
HydroDesktop, would not recognize these attributes, and so they would not be communicated to
end users. Recommendations for addressing these challenges are described in the next section.
                                          32

-------
         8.0:      RECOMMENDATIONS FOR IMPROVING UPON EXISTING
                                       APPROACHES

Because CUAHSI HIS is an open-source system, modifications to its software and data models
can be made when they fall short of project needs. For example, to address the issues with
handling biological data described in the previous section, the following steps could be taken:
    1.  Using SQL Server, modify the ODM database to include the additional biological
        attributes (a sketch follows this list).
    2.  Modify the ODM data loaders to load these attributes. Modifications can be made by
       downloading the source code from CUAHSI, modifying the code, and compiling the
       code.
    3.  While it is not possible to modify the HIS Central application since it is maintained at
       CUAHSI, users within the EPA firewall can still get access to the data by making direct
       connections to the ODM database. These connections would allow for the transfer of the
       additional biological attributes.
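
As an illustration of step 1, a minimal sketch (the column name is hypothetical; the actual
attributes depend on the biological data at hand):

    -- Sketch: add an illustrative biological attribute to the ODM DataValues table.
    ALTER TABLE DataValues ADD TaxonomicName NVARCHAR(255) NULL;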
A balance must be struck between making these modifications within the DMS team and relying
on CUAHSI to perform the work. In some cases, the modifications may prove useful to the
broader CUAHSI community outside the scope of the DMS project. CUAHSI provides a
number of mechanisms by which the DMS team can contribute to this broader community:
    1.  When modifying source code, instead of maintaining only a local copy of the code,
       commit a branch with the code to CUAHSI's source code repository.  The two
       repositories of interest are:
          a.  HydroServer http://hydroserver.codeplex.com/
           b.  HydroDesktop http://hydrodesktop.codeplex.com/
    2.  When issues are encountered or to open discussions with the CUAHSI team and
       community, use the issue tracker and discussion forums located on the websites linked
       above.
    3.  When terms must be added to the controlled vocabularies in the project's ODM database,
       if these terms might be useful to the broader community, submit edits to CUAHSI's
       master controlled vocabularies at http://his.cuahsi.org/mastercvreg.html.
                                          33

-------
                                9.0:       REFERENCES

Cleland, B. 2002. "TMDL Development from the 'Bottom Up' - Part II: Using duration curves
   to connect the pieces," In: National TMDL Science and Policy 2002 - WEF Specialty
   Conference. Water Environment Federation, Phoenix, Arizona.

Cleland, B. 2003. "TMDL Development from the 'Bottom Up' - Part III: Duration curves and
   wet-weather assessment," In: National TMDL Science and Policy 2003 - WEF Specialty
   Conference, Chicago, Illinois, p. 27.

Horsburgh, J.S., D.G. Tarboton, M. Piasecki, D.R. Maidment, I. Zaslavsky, D. Valentine, and
   T. Whitenack. 2009. "An Integrated System for Publishing Environmental Observations
   Data," Environmental Modelling and Software, 24, 879-888.

Horsburgh, J.S., D.G. Tarboton, D.R. Maidment, and I. Zaslavsky. 2008. "A Relational Model
   for Environmental and Water Resources Data," Water Resources Research, 44, W05406, 12
   p.

Johnson, S.L., T. Whiteaker, and D.R. Maidment. 2009. "A Tool for Automated Load Duration
   Curve Creation," Journal of the American Water Resources Association, vol. 45, No 3: 654-
   663.

Miltner, R.J. 2010. "A Method and Rationale for Deriving Nutrient Criteria for Small Rivers and
   Streams in Ohio," Environmental Management. DOI 10.1007/s00267-010-9439-9.

Nietch, C., M. Elovitz, E. Heiser, H. Thurston, L. Underwood, H. Lubbers, D. Macke, E. Best, P.
   Braasch, R. McClatchey, D. Brown, and M. Heberling. 2010. "Linking sources to stress
   dynamics in drinking and recreational waters in a mixed-use watershed in Southwestern Ohio
   with a multi-agency cooperative," Society of Environmental Toxicology and Chemistry North
   American 31st Annual Meeting, Portland, OR, November 7-11, 2010.

Tarboton, D.G., J.S. Horsburgh, and D.R. Maidment. 2008. "CUAHSI Community Observations
   Data Model (ODM) Version 1.1 Design Specifications," CUAHSI.

USEPA (U.S. Environmental Protection Agency). 2007. "An approach for using load duration
   curves in developing TMDLs," U.S. Environmental Protection Agency: Office of Wetlands,
   Oceans, & Watersheds, Washington, D.C., 68 p.

Vogel, R.M. and N.M. Fennessey. 1995. "Flow Duration Curves II: A Review of Applications in
   Water Resources Planning," Water Resources Bulletin 31(6): 1029-1039.
                                          34

-------
This page intentionally left blank.
               35

-------
             APPENDIX A:
            CUAHSI
             Universities Allied for Water Research
Workshop: Publishing Data with the CUAHSI Hydrologic Information
               System

             September 7, 2011
                by:

          Dr. Tim Whiteaker (twhit@mail.utexas.edu)
          Center for Research in Water Resources
           The University of Texas at Austin
                 A-1

-------
Distribution
Copyright © 2011, Consortium of Universities for the Advancement of Hydrologic Science, Inc.
All rights reserved.

Funding and Acknowledgements

Funding for this document was provided by the Consortium of Universities for the Advancement of
Hydrologic Science, Inc. (CUAHSI) under NSF Grant No. EAR-0622374.  In addition, much input and
feedback has been received from the CUAHSI Hydrologic Information System development team. Their
contribution is acknowledged here.
                                         A-2

-------
Table of Contents
Introduction	4

    Goals of the Workshop	4
    Workshop Requirements	5
    About the Workshop Data	6
    Review of CUAHSI-HIS Data Publication Tools	6
    Workshop Outline	8
Translating and Loading Data into ODM	9

    Installing a Blank ODM Database	9
    Getting To  Know Your Data	13
    Creating a Sites File	14
    Creating a Sources File	16
    Creating a Variables File	18
    Loading Sites, Sources, and Variables into ODM	20
    Creating and Loading a Data Values File	21
    Using the Streaming Data Loader	23
    For Advanced Participants	31
Working with ODM Data	32

    For Advanced Participants	37
Publishing an ODM Database with WaterOneFlow	38

    Creating the Webclient SQL Server Account	38
    Installing a WaterOneFlow Web Service	40
    Testing the Web Service in a Web Browser	45
    Testing the Web Service with HydroExcel	46
    For Advanced Participants	48
Registering Your Service at HIS Central	49

    Adding Your WaterOneFlow Web Service to HIS Central	49
    Viewing Your Data in HydroDesktop	54
Appendix A: Uninstallation Instructions	59

    HIS Central	59
    WaterOneFlow	59
    Streaming Data Loader Scheduled Task	59
    ODM Database	60
    CUAHSI-HIS and Related Software	60
                                            A-3

-------
                                         Introduction

This document provides steps to complete the hands-on training portion of a workshop that teaches how
to publish water observations  data using the CUAHSI Hydrologic Information System  (HIS).  This
involves loading data into an Observations Data Model (ODM) database, exposing the data in a secure
and  standard way online via WaterOneFlow Web services, and  making  the  data  discoverable  by
registering the Web service with HIS Central.

For background information on CUAHSI or HIS, please refer to presentation materials provided at the
workshop, or the HIS website at http://his.cuahsi.org.

Why publish data online?
Sometimes people ask what the motivation is for using HIS to publish data. When the HIS team interacts
with HIS users, here are the most common reasons those users give for why they want to publish their
data:
    •  Academics
           o   Recognition of work
           o   Data publication is mandated by the funding agency
           o  To support science in the US and promote collaboration
    •  Agencies
           o  Standardizing data access (both internally and externally)
           o  Time savings in developing a publication system
           o   Public benefit with publication
           o   Return on investment - people can get the data themselves without requiring a
              "middle-man"
           o  Get all the state or regional data "together"

Additionally, HIS just makes it easy for users to discover and access data.  This is traditionally a pretty
big time sink for users, so let's be good citizens and make their lives (and ours) easier!

If you have  other reasons for publishing data, please let us  know!  Fulfilling your needs is a primary
driver in the  future development of HIS.

Goals of the Workshop
This workshop seeks to introduce you to the  HIS data publication process.  As there are numerous
avenues for publishing data, which often depend on a given user's available software and system setup,
this workshop does not seek to teach  every technique that can be  used for publishing data. Rather, the
workshop communicates the basic concepts  of data publication, which can then be applied by you to fit
your specific needs or environment. After completing the workshop, you should be able to:
    •  Understand what kind of data can be stored in an Observations Data Model database
    •  Translate your observations data and metadata into terminology used by the Observations Data
       Model
    •  Load your data into an Observations Data Model database
    •  Publish your data with a WaterOneFlow Web  service
    •  Register your Web service with HIS Central so that others can discover it
    •  Access your data in a number  of ways using HIS software:


                                             A-4

-------
           o   Direct database connection with ODM Tools
           o   Direct Web service access with HydroExcel
           o   Discover and access with HydroDesktop
    •   Learn more about HIS using the HIS website at http://his.cuahsi.org

Workshop Requirements
Computers and sample data were prepared ahead of time for the workshop.  However, all HIS software is
free, and so the configuration below can also be applied to your own computer if you have licenses for the
commercial software used, such as the operating system.  The system outlined below closely resembles
the HydroServer system described at http://his.cuahsi.org/hydroserver.html, but without website and
spatial data hosting capabilities.
  Note
  The instructions in this manual are written assuming your computer is configured like the one described below.
Hardware:
    •   PC, 4 GB RAM, 500 GB hard disk, 2.7 GHz processor
    •   Network setup allowing external access to Web services installed on local PC

Software:
    •   Windows 7
           o   Microsoft Internet Information Services (IIS) 7 — comes with Windows 7, but may need
               to be enabled.
                  •   Enable ASP.NET.
    •   .NET Framework 2.0 SP2, 3.5 SP1 (free)
    •   Microsoft SQL Server 2008 R2 (commercial) or SQL Server 2008 R2 Express (free)
           o   Be sure to install the version with SQL Management Studio (aka Database and
               Management Tools)
           o   Install with these options:
                  •   Install with Mixed Mode Authentication.
                  •   Specify the "sa" (system administrator) password that you will remember.
    •   HIS software (free)
           o   ODM Data Loader- http://his.cuahsi.org/odmdataloader.html
           o   ODM Streaming Data Loader - http://his.cuahsi.org/odmsdl.html
           o   ODM Tools - http://his.cuahsi.org/odmtools.html
    •   Additional software (not required for HydroServer)
           o   Microsoft Office 2010 (32-bit version)
           o   Google Earth (free) - http://earth.google.com/download-earth.html
           o   HydroObjects (free) - http://his.cuahsi.org/hydroobjects.html
           o   HydroExcel (free) - http://his.cuahsi.org/hydroexcel.html
           o   HydroDesktop (free) - http://his.cuahsi.org/hydrodesktop.html

Data (located on your Desktop in a folder called Workshop):

                                             A-5

-------
    •  Raw data files of water quality time series
    •  Metadata text file describing the data
    •  Solution files, for reference or for use if workshop steps cannot be completed successfully
    •  Image files that will be associated with your data

User (You):
    •  Basic knowledge of how to operate a computer and use the internet
    •  Very basic notions of database concepts such as the terms "table" and "field"
    •  Rudimentary understanding of hydrology and hydrologic data

About the Workshop Data
In this workshop, you will publish time series of water quality data measured for the Lake Champlain
Long-term Water Quality and Biological Monitoring Project. The data include measurements of nitrogen,
phosphorus, temperature, total suspended solids,  and  chlorophyll a, taken from  1992 to 2007.  More
information about this project can be found at
http://www.anr.state.vt.us/dec/waterq/lakes/htm/lp_longterm.htm.

Each workshop computer has been installed with "raw" data files of the  water quality  observations
described above.  Some modifications have already been made to the raw files to facilitate data loading.
We wanted you to get a sense of how to transform data, without overburdening you with easy-but-tedious
data operations. The raw data files include site locations, time series  of water quality observations, and
metadata.  These files are located on your Desktop in Workshop\RawData. During the workshop, you
will transform the raw data files so that they are in a form that can be loaded into an Observations Data
Model database.  As a  contingency plan in case you  are unable to  complete the data transformation
process, transformed files have been generated for you and are located in
Workshop\SolutionFiles\TransformedData.

Review of CUAHSI-HIS Data Publication Tools
The CUAHSI Hydrologic Information System provides Web services, tools, standards and procedures
that enhance access to more and better data for hydrologic analysis.  HIS software is free and available on
the HIS website at http://his.cuahsi.org. A variety of HIS software applications have been built to serve
several types of users and scenarios, from data users to data publishers to educators and developers.

HydroServer
In this workshop, you'll be playing the role of the data publisher. This means you have some hydrologic
observations data that you've collected, and you'd like to publish that data on the Web in a standard way
so that others can easily access and use  it.  To facilitate data publication, HIS offers HydroServer
(http://his.cuahsi.org/hydroserver.html), which is really just a bundle of HIS software designed for data
publication that operates on a Windows computer.

HydroServer includes software for publishing observations data with a WaterOneFlow Web service, but it
also includes  a website and supporting components for geospatial and temporal data visualization. These
website components are not required for data publication and are not covered in this workshop.  The focus
of this workshop is on storing observations data in an Observations Data Model database and publishing
the data with a WaterOneFlow Web service.

Observations Data Model
                                             A-6

-------
The Observations Data Model (ODM) (http://his.cuahsi.org/odmdatabases.html) is a data model for the
storage and retrieval of hydrologic observations in a relational database. An ODM database stores data
and sufficient ancillary information (metadata) about the data values to provide traceable heritage from
raw measurements to  usable information allowing them to be unambiguously interpreted and used. A
relational database format is used to provide querying capability to allow data retrieval supporting diverse
analyses. To learn all  the details of ODM, read the design specifications document on the website linked
above.

Data can be loaded into an ODM database using a number of tools, including free HIS software.  For
loading static data files for what is generally a one-time process, the free ODM Data Loader is used.
These data files are usually the result of a study or project that has been completed and will not need
periodic updating.  The ODM Data Loader is optimized for data with one data value per row.  For data
that are  continuously  updated, such as data streaming in from sensors in the field, use the free ODM
Streaming Data Loader.  The Streaming Data Loader is optimized for data with one time step per row.
For more complex data loading tasks, SQL Server Integration Services is one of many software packages
up to the task.  However, these software packages are typically not free and require  significant training to
learn how to use them. For this workshop, you'll gain experience with both ODM data loaders.

You'll load most of the data in this workshop using the ODM Data Loader. The ODM Data Loader reads
input  files that are formatted much like the tables in ODM.  For example, if you want to  load site
locations into ODM, you could prepare a spreadsheet called  "sites.xls" with column headings that use
(roughly) the same names as fields from the Sites table in ODM (names are not case sensitive). In some
cases, you can load data for more than one ODM table from a single input file by simply appending
additional columns to the data in the file. This prevents you from having to create an input file for every
table in  ODM, which  would be quite tedious since relational  databases tend to have many associations
across many tables.  A document describing the  input format required by the ODM Data Loader can  be
found at http://his.cuahsi.org/odmdataloader.html.

Once data are loaded into an ODM database, you can examine the data using  the free ODM Tools.  Or, if
you have knowledge of SQL, you can write your own queries within SQL  Management Studio. If the
data look good, you can publish the data with a WaterOneFlow Web service.
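
For instance, a simple hand-written query (a sketch against the ODM schema) can summarize
what has been loaded for each variable:

    -- Sketch: per-variable summary of loaded values.
    SELECT v.VariableName,
           COUNT(*)              AS NumValues,
           MIN(dv.LocalDateTime) AS FirstValue,
           MAX(dv.LocalDateTime) AS LastValue
    FROM DataValues dv
    JOIN Variables v ON v.VariableID = dv.VariableID
    GROUP BY v.VariableName;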

WaterOneFlow
A challenge in querying and interpreting data from disparate data sources is that each data source not only
has its own method for asking for data, but also its own format for delivering the requested data to the
user.  WaterOneFlow overcomes this by providing a single query interface and a standard output format
called WaterML, which is an XML language for the communication of water data.  WaterOneFlow is a
Web service, which facilitates automated and programmatic access to the data. This is an advancement
beyond simply publishing data on a web page, which can require complicated  and  often error-prone
screen scraping and parsing.

A WaterOneFlow Web service is available that hooks directly into an ODM database to publish data from
that database. However, WaterOneFlow Web services  can also be written to support internal data formats
other than ODM.  This means that no matter what data storage mechanism  you choose to use, you can
still publish your data in a standard way with WaterOneFlow.

HIS Central
Once your data are published, there's still the issue of data discovery. How do people learn about your
data? That's where HIS Central comes in. HIS  Central is a website maintained by the CUAHSI HIS
team where you can register your WaterOneFlow Web service. Your service then becomes discoverable
along with dozens of other Web services already registered with the system, including services for EPA

                                             A-7

-------
and the USGS National Water Information System. This makes HIS Central the largest single catalog of
the nation's water data.

HydroExcel and HydroDesktop
Now it's time to briefly play the role of the data user. Once data are published with HIS, how do people
access the data? To help data users get started with HIS, several free applications or application
extensions are available on the HIS website, geared towards application environments most commonly
used by hydrologists such as Microsoft Excel. HydroExcel is an Excel spreadsheet customized with
macros for accessing data from a WaterOneFlow Web service. HydroDesktop is a free and open source
Geographic Information System (GIS) designed from the ground up to work with HIS. In the workshop,
you'll use both HydroExcel and HydroDesktop to verify that you have successfully published your data
with WaterOneFlow.

Workshop Outline
The workshop begins with presentations and demonstrations by the HIS team to familiarize the audience
with HIS. Contact the workshop administrator to check for availability of these materials.  The hands-on
training portion of the workshop leads the audience through the data publication process with these key
steps:
    1.  Translate raw data for loading into ODM.
    2.  Load data into an ODM database.
    3.  Expose database content online via WaterOneFlow Web service.
    4.  Register Web service with  HIS Central to enable data discovery by external users.
                                             A-8

-------
                          Translating and Loading Data into ODM

To load data into ODM, you'll be using a tool called the ODM Data Loader.  The ODM Data Loader
loads data from comma delimited files (.csv) or Microsoft Excel 2003 files (.xls) that have a one-
row header of ODM field names, followed by the data in subsequent rows. When loading
data from Excel, the data should be located in a worksheet that has the same name as the file.  More about
these  data  formats  can  be  found  in the  documentation  for  the  data  loader available  at
http://his.cuahsi.org/odmdataloader.html. Generally, the  fields  in the input files conform to the table
structure of ODM, with some flexibility for specifying alternative information for database generated IDs.
You'll see how this works during the workshop.
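
For a flavor of the input format, a sketch of a minimal sites file is shown below. The column
headings follow the ODM Sites table, and the values are hypothetical; consult the data loader
documentation linked above for the exact headings it accepts:

    SiteCode,SiteName,Latitude,Longitude,LatLongDatumID
    VT01,Lake Champlain Station 1,44.47,-73.22,2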

Most likely, your data are not exactly in the same format as what the ODM Data Loader is expecting.  For
example, instead of using the ODM terminology "SiteCode", you may call the unique ID for each of your
observation sites a "StationID". Also, you may need to do some leg work to look up information such as
the horizontal datum associated with the latitude and longitude  coordinates of your site, which is one of
the pieces of information that ODM requires. Translating your data and metadata into ODM terminology
may seem a bit tedious, but this exercise is actually very valuable as it helps  you to fully understand
ODM as well as your own data, and in the end you will have a database that richly describes your data.

In the interest  of time, much of the data translation work has been performed for you.  However, some
items still remain untranslated and are just dying  to have a talented hydro hero  like you perform the
transformation and save the day!

Installing a Blank ODM Database

An understanding of your own data as well as the Observations Data Model is essential before attempting
to transform your raw data into inputs for the ODM Data Loader.  To help you in this process, ODM
includes some predefined terms  called controlled vocabularies that  you  can  choose  from when
populating its tables. Let's grab an ODM database from the HIS website and see some of these terms for
ourselves.

To attach a blank ODM database:
    1.   In a web browser, navigate to http://his.cuahsi.org.
    2.   Under  Quick Links  on the right, click ODM Database.
    3.   Click the link to download the ODM 1.1 Blank SQL Server Schema Database.
    4.   Unzip the contents of the downloaded file into the SQL Server data folder, e.g., C:\Program
        Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA. The download includes
        both a database file (.mdf) and a log file (.ldf) that tracks transactions made to the database.

You'll now attach this database to SQL  Server, and eventually you will load the workshop data into this
database.

  Tip
  SQL Server is the database software installed on the workshop computers.  "Attaching a database" to SQL Server
  basically means letting SQL Server know about your database so that its software can work with it.


    5.   Start SQL Server Management Studio.
    6.   Make sure the Server name is (local).
  Note
  If you are using SQL Server Express, then wherever you see (local) in this document, you should use
  (local)\SQLExpress instead.
    7.  Click Connect to log into SQL Server.
  Tip
  SQL Server Management Studio is an application that lets you manage, view, and execute queries on your
  databases.
    8.   In the Object Explorer on the top left, right-click Databases and click Attach (Figure 0-1).
              Figure 0-1 SQL Server Management Studio is used to attach databases to SQL Server

    9.   In the Attach Databases dialog that opens, click Add.
    10.  Navigate to the folder where you saved your database, select OD.mdf, and click OK.
    11.  Change the "Attach As" name to MyWaterData (Figure 0-2), and click OK. (You can change the
        name to whatever you want, but let's all use MyWaterData as the name for this workshop
        exercise.)
                   Figure 0-2 You can assign a database name of your choice in SQL Server
    12. Click OK if prompted about full-text catalogs.
    13. In the Object Explorer, click the plus sign to expand Databases. You should now see your
       MyWaterData database.

Now that the database is attached, let's look at its tables.

To explore the database you just attached:
    1.  In the Object Explorer, expand MyWaterData, and then Tables, to see a list of tables in this
       blank ODM database.
    2.  Open the DataValues table by right-clicking it and then clicking Select Top 1000 Rows. This
       table stores time series values. Each  row stores a single datetime, a single value, and metadata
       about that value. The table is currently blank, but you'll fix that later on!
    3.  Notice that there is a field in this table called SiteID. Rather than repeat the latitude, longitude,
        and other site information with every row in the DataValues table related to a given site, ODM
        keeps the database compact by using the SiteID to locate a matching row in the Sites table
        where the site details are stored. These relationships between tables are used extensively
        throughout ODM, leveraging the power of a relational database (see the example query after
        these steps).
    4.   Click the X near the top right of the query that was opened to close the query and the table. Be
        sure not to click the X in the blue title bar for the application, or you will close Management
        Studio!
    5.   Open the Sites table to see the information describing each site.  Close the table when you are
        finished looking at it.
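
To see this relationship in action once data are loaded, you can run a join in SQL Server Management
Studio. The query below is a minimal sketch, assuming the ODM 1.1 table and field names used in this
workshop (DataValues, Sites, SiteID):

   -- Pair each data value with the name of the site where it was measured.
   SELECT TOP 10 dv.LocalDateTime, dv.DataValue, s.SiteName
   FROM DataValues dv
   INNER JOIN Sites s ON dv.SiteID = s.SiteID;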

So far, the tables that you've been looking at are empty.  Next, you'll open some tables that have already
been populated with values that will be useful as you load data.
    6.   Open the VariableNameCV table. The letters "CV" at the end of the table name indicate that
        this is a controlled vocabulary table.  Only terms from this table are used to describe variable
        names. This helps to standardize the terminology used to describe data across multiple ODM
        databases (Figure 0-3).
              Term                            Definition
              Momentum flux                   Momentum flux
              N, albuminoid                   Albuminoid Nitrogen
              Net heat flux                   Outgoing rate of heat energy transfer minus the inco...
              Nickel                          Nickel (Ni)
              Nitrogen, Dissolved Inorganic   Dissolved inorganic nitrogen
              Nitrogen, Dissolved Organic     Dissolved Organic Nitrogen
              Nitrogen, gas                   Gaseous Nitrogen (N2)
       Figure 0-3 The VariableNameCV table is a controlled vocabulary of terms to use when naming variables

  Tip
  A level of data integrity is enforced through the use of controlled vocabularies (CVs) within ODM.  If a field uses a
  CV, then only terms from that CV can be entered into that field. This way, the database uses consistent
  terminology. If there is a CV term that you need which is not already in ODM, you can add it. Just remember to
  add the term to the CV table first, and then load your data.
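
For example, a new term could be added to the variable names CV with a simple INSERT statement.
This is a sketch only; the term shown is hypothetical, and the Term and Definition fields are those
shown in Figure 0-3:

   -- Add a hypothetical custom term to the CV table before loading data that uses it.
   INSERT INTO VariableNameCV (Term, Definition)
   VALUES ('Nitrogen, mineral', 'Mineral nitrogen');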

    7.   Open the SampleMediumCV table.  This table has terms used to describe the sample medium
         in which a measurement was taken.
    8.   Continue opening and browsing the tables of ODM. When you are ready to move on, close the
        tables and minimize SQL Server Management Studio. You can leave Management Studio open
        as we'll use it again later.

The controlled vocabularies were originally conceived by the HIS team and are updated by users like you
who submit change requests or additions to the CV website at
http://his.cuahsi.org/mastercvreg/cv11.aspx. As this list is updated, the CVs in your database may
become out of date. ODM Tools makes updating your CVs a snap. You'll use ODM Tools again
later to analyze the data you load, but for now let's use the tools just to update the CVs in your database.
The ODM Tools software is a free download from the HIS website at http://his.cuahsi.org/odmtools.html.
The tools have already been installed on the workshop computers.

To update controlled vocabularies using the ODM Tools:
    1.  Start the ODM Tools. When you run the ODM Tools for the first time, it will prompt you for a
       database connection.
    2.  Input these values into the New Database Connection dialog and then click Save Changes.
           a.   Server Address: (local)
           b.   Database Name: MyWaterData
           c.   Server User ID: sa
           d.   Server Password: [enter the password for the server user ID here]
    3.  Dismiss the message indicating that the connection was successful.

  Tip
  The sa account is an administrator account for SQL Server. It has all the privileges we need for this workshop and
  more. For security in your production environment, you may want to establish a SQL Server login with lower
  permissions and call it ODMToolsUser, for example, so that your users do not need to know the administrator
  login and password. Steps for creating a login are provided later in this manual.


When the tools open, the interface looks a bit empty. That's because you haven't loaded data into the
database yet.  However, we can still update the controlled vocabularies.
    4.  Click Tools | CV Update.

A dialog opens showing you a comparison between the master controlled vocabulary list maintained by
CUAHSI (on the left) and your local CV (on the right). You can change the CV that you are looking at
using the drop down box in the top left corner of the dialog.
    5.  Change to the Sample Medium CV.

As you can see (at the time this document was written), the master list has been updated since the last
time the ODM blank database was created for the HIS website.
    6.  Click Update Local CV.

Your local CV now indicates that terms have been  added but not committed.
    7.  Click Apply. This commits the changes to your local database.

  Tip
  If you don't see the word "Apply" after clicking Update Local CV, then your database is already up-to-
  date.

    8.  Repeat this process to update the  remaining CVs in the list.
    9.  Once the CVs have been updated,  close the ODM Tools.

Now that you've had a hands-on introduction  to ODM, let's get to know the raw data that you'll be
loading into it.
Getting To Know Your Data
Let's now  take  a  look  at  the raw data  for  the workshop,  available  on  your  computer  at
Workshop\RawData. These data files were created from real data extracted from the Lake Champlain
Long-term Water Quality and Biological Monitoring Project. Note that the data have been slightly
reformatted to facilitate use in this workshop, but much of the terminology used by the data is untouched,
giving you a sense of what it takes to transform real data for loading into HIS.

Purpose of the Data
Some aspects of the data have been modified for use in the workshop. These data are provided solely for use in
the HIS workshop, and are not intended to be used for real analysis or decision making.

You should see three files in the RawData folder:
    •   sampling_sites.xls - Excel spreadsheet with locations for all sampling sites
    •   LCM_Data.xls - Excel spreadsheet with  the time series of water quality for all sites and variables
    •   metadata.txt - Text file with metadata describing your data

Sites
Open sampling_sites.xls.   Each row in this file represents a single sampling site  and  includes the
following information:
    •   StationID - Unique internal identifier for a site
    •   StationName - Name of the site
    •   Latitude - Latitude of the site in decimal degrees
    •   Longitude - Longitude of the site in decimal degrees
    •   County - The county that the site is in
    •   State - The state that the site is in

Time Series
Open LCM_Data.xls. Each row in this file  represents a single water quality measurement at a particular
site at a particular point in time.  By looking at the dates and times of the measurements, it appears that
the data were taken sporadically through time. The spreadsheet includes the following information:
    •   Station - Unique internal identifier for a site
    •   Date - Date that a measurement occurred
    •   Time - Time that a measurement occurred
    •   Depth - Depth at which the measurement was taken
    •   Test - Code for the variable being measured
    •   Result - Value of the variable at the  given date  and time
    •   Method - Method used to determine the value

Metadata
Open metadata.txt. If this data provider wanted to share the data, the information in the sites and time
series files alone may not be enough to fully  describe the data.  Therefore, metadata such as the content of
metadata.txt are often provided alongside the actual data files.  In this file, you can find information about
the nature of the study, the variables involved,  and the data source.  You'll use this information to help
load the data into HIS.
Transformed Output
From these files describing sites, time  series, and metadata, you will create transformed files that are
formatted for loading into an ODM database using the ODM Data Loader.  The transformed files that you
will create are:
    •  Sites
    •  Sources
    •  Variables
    •  DataValues

These are basically the minimum pieces of information you need to describe your data in ODM. Of
course, there  are plenty of additional types of information you can load into an ODM database to more
fully describe your data, but for the purposes of this workshop, you'll just load the above items.

Before performing the transformation, it's imperative that you familiarize yourself with the structure of
the Observations Data Model (http://his.cuahsi.org/odmdatabases.html) and the requirements for input
files to the ODM Data Loader (http://his.cuahsi.org/odmdataloader.html).  Review the information in
"Review of CUAHSI-HIS Data Publication Tools," the workshop presentation  materials, and the online
materials linked above  for more information.   The  following  sections describe  the transformation
procedure.

Creating a Sites File
Information about your sampling sites is contained in the RawData\sampling_sites.xls file.  You'll
transform this to a TransformedData\sites.csv file.  The fields in the transformed file include:
    •  SiteCode
    •  SiteName
    •  Latitude
    •  Longitude
    •  County
    •  SiteState
    •  LatLongDatumSRSName

These are some of the field names the ODM Data Loader expects to see when loading information about
sites. The first six fields match very well with the  data from sampling_sites.xls.  Note that  you will be
adding one field that isn't in sampling_sites.xls: LatLongDatumSRSName. The ODM requires the datum
to be stored with the latitude and longitude coordinates of a site.  Luckily, you can find that data in the
metadata.txt file.  The metadata indicates that the datum used is WGS84.  There is already a record for
this datum in the SpatialReferences table of ODM (Figure 0-4). The record has a SpatialReferencelD of
3, and an SRSName value of WGS84.
                                SpatialReferenceID    SRSID    SRSName
                                0                     0        Unknown
                                1                     4267     NAD27
                                2                     4269     NAD83
                                3                     4326     WGS84


      Figure 0-4 The WGS84 datum is among the list of coordinate systems in the ODM SpatialReferences table
Being a relational database, the ODM Sites table is expecting a numerical datum ID (in this case, the
number 3) to accompany each site record. However, it's easier for us humans to interpret text rather than
numbers  during the data translation process, which is why the  ODM Data Loader allows you to use
"WGS84" instead of the number 3 to refer to your datum.  The ODM Data Loader will make sure the
LatLongDatumSRSName refers to an SRSName in ODM's SpatialReferences table before finalizing the
data loading operation.  This is one of the advantages of using the ODM Data Loader for loading data - it
performs some quality control and maintains integrity of relationships between the tables of ODM during
loading.
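
You can verify this for yourself in SQL Server Management Studio. The query below is a minimal
sketch, assuming the SpatialReferences field names shown in Figure 0-4:

   -- Look up the record that the data loader will resolve "WGS84" to.
   SELECT SpatialReferenceID, SRSID, SRSName
   FROM SpatialReferences
   WHERE SRSName = 'WGS84';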

You'll be following this procedure of referencing information in your ODM database a lot as you figure
out how to translate your data to ODM. A quick summary of the procedure is:
    1.  You need to know how  to describe some aspect of your data, such as the datum.
    2.  You check the ODM database to see if a table already exists to describe that item.
    3.  You find a match for your item in the ODM table, and use the matching term from ODM as you
       build your translated data files for eventual loading into the ODM database.

If you don't find a matching item in ODM, you can add it. If you think the item should have been in
ODM's CVs in the first place, then you can petition to have the item added to the Master Controlled
Vocabularies at http://his.cuahsi.org/mastercvreg/cv11.aspx, although updating those CVs is beyond the
scope of this workshop.

Details of how to map from the  raw file to the transformed file are below.
                                  Table 0-1 Mapping Raw Data to Sites

     Transformed Field        Raw Data Field (from sampling_sites.xls)
     SiteCode                 StationID
     SiteName                 StationName
     Latitude                 Latitude
     Longitude                Longitude
     County                   County
     SiteState                State
     LatLongDatumSRSName      Use "WGS84" from ODM SpatialReferences table
  Tip
  The ODM Data Loader ignores case in the field names, so Longitude and LONGITUDE are both valid field names.
  Tip
  If you have any trouble creating the transformed files, you might find it helpful to refer to the solution files in
  Workshop\SolutionFiles\TransformedData.


To create the transformed sites file:
    1.  Open Workshop\RawData\sampling_sites.xls.
    2.  Save the file in the Workshop\TransformedData folder as sites.csv.  Be sure to select CSV
       (Comma delimited) (*.csv) from the Save as type drop down box as you save the file.
ODM Data Loader Best Practice - Use CSV Files
The ODM Data Loader can work with both comma delimited (.csv) files and Microsoft Excel 2003 (.xls) files.
However, the author has found that sometimes Excel cell formatting can cause an incorrect interpretation of the
data. Therefore, the author recommends saving the transformed files as comma delimited text files, which contain
no instructions about how data should be formatted.

    3.   After saving, click Yes if prompted to keep the workbook in CSV format.

Transforming the sites file will be very easy. You'll start by renaming some fields, and then add one new
field.  Note that you must not misspell any of the field names,  or else the ODM Data Loader will not
recognize the field.
    4.   Rename the following fields:
           a.   StationID to SiteCode
           b.  StationName to SiteName
           c.   State to SiteState
    5.   Add a field called "LatLongDatumSRSName" (without  quotes) and calculate all values to be
        "WGS84" (without quotes).
    6.   Save the file.  If prompted about keeping the workbook in CSV format, click Yes.
    7.   Close the file.  If prompted about saving changes to the file, click No. You just saved them, so
        you should be fine.
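
After these steps, the first lines of sites.csv should look something like the fragment below (the data
row shown here is illustrative rather than taken from the workshop files):

   SiteCode,SiteName,Latitude,Longitude,County,SiteState,LatLongDatumSRSName
   99,Example Bay,44.50,-73.30,Example County,VT,WGS84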

Great job!  Creating the sites file was a snap since the raw data already had a sites file to begin with.  In
addition to sites, ODM also keeps a table of data sources.  Think of a data source as a group, agency, or
institution that operates monitoring sites.  Next you'll create a sources file that defines the data source
behind your data.

Creating a Sources  File
Transforming the sites file was pretty easy since the raw data included a sites spreadsheet that you could
start from. However,  a file for data sources is not present among the raw data files.  Instead, you'll find
the information you need to describe the  data source in the RawData\metadata.txt file.  This file is
largely extracted from http://www.anr.state.vt.us/dec/waterq/lakes/htm/lp_longterm.htm, albeit condensed
a bit for workshop brevity.

The fields in the sources file that you will create include:
    •  Organization
    •  SourceDescription
    •  SourceLink
    •  ContactName
    •  Phone
    •  Email
    •  Address
    •  City
    •  SourceState
    •  ZipCode
    •  Citation
    •  TopicCategory
    •  Title
    •  Abstract
    •  ProfileVersion
    •  MetadataLink

The information in the transformed file will actually be used to  insert data into two tables:  Sources and
ISOMetadata.   The  ODM Data  Loader  will  use  the last  five  columns to create an  entry  in the
ISOMetadata table, and relate the information back to your new entry in the Sources table.  It knows the
appropriate place to put your data in the database.
Details of how to map from the raw file to the transformed file are below.
                                Table 0-2 Mapping Metadata to Sources

  Transformed Field   Metadata Location   Value
  Organization        Source Details      VT Department of Environmental Conservation
  SourceDescription   Source Details      The Vermont Department of Environmental Conservation's
                                          mission is to preserve, enhance, restore and conserve
                                          Vermont's natural resources and protect human health for
                                          the benefit of this and future generations.
  SourceLink          Source Details      http://www.anr.state.vt.us/dec/dec.htm
  ContactName         Source Details      Jill Data
  Phone               Source Details      555-555-5555
  Email               Source Details      jill.data@champlain.com
  Address             Source Details      123 Main Street
  City                Source Details      Datatown
  SourceState         Source Details      VT
  ZipCode             Source Details      15671
  Citation            Source Details      P. Stangel, A. Shambaugh, F. Dunlap, (1992-2008),
                                          "Sixteen years of water-quality data collection on Lake
                                          Champlain, Vermont and New York, United States"
  TopicCategory       Read from ODM       inlandWaters
                      TopicCategoryCV
  Title               Header              Lake Champlain Long-Term Monitoring Program data
  Abstract            STUDY DESCRIPTION   The Long-Term Water Quality and Biological Monitoring
                                          Project for Lake Champlain has been in operation since
                                          1992. The project is conducted by the Vermont Department
                                          of Environmental Conservation (DEC) and the New York
                                          State Department of Environmental Conservation with
                                          funding provided by the Lake Champlain Basin Program and
                                          the two states. The monitoring network includes 15 lake
                                          stations representing major lake segments with distinct
                                          physical and water quality characteristics.
  ProfileVersion      n/a                 Unknown
  MetadataLink        Header              http://www.anr.state.vt.us/dec/waterq/lakes/htm/lp_longterm.htm
To create the sources file:
    1.  Using Excel, create a new file in Workshop\TransformedData named sources.csv. Be sure to
       save the file as a comma delimited file. Click OK if prompted about saving only the active sheet
       and Yes if prompted about keeping the worksheet in this format.

    2.  In row 1, type the following column names:
           a.  Organization
           b.  SourceDescription
           c.  SourceLink
           d.  ContactName
           e.  Phone
           f.   Email
           g.   Address
           h.   City
           i.   SourceState
           j.   ZipCode
           k.   Citation
           l.   TopicCategory
           m.  Title
           n.   Abstract
           o.   ProfileVersion
           p.   MetadataLink
    3.   Locate the pertinent information from the metadata file as described in Table 0-2 above and
        enter it into row 2 of the worksheet.
  Note
  If you're feeling lazy, you could just copy the data from the solution file. However, looking up information tucked
  away in metadata is very typical in the data loading process, so don't deny yourself the enriching experience of at
  least looking up a few of those fields!
    4.  Save and close the file.

Wow, can you believe we're already half-way through the data transformation process? Two down, two
to go!  Not even the strictest controlled vocabulary can contain my excitement!  Was that too much?
Maybe I should apply some database constraints to myself. Ok, seriously, I'm finished now.

At this point, you might be so fired up about data translation that you're already thinking about how to
work with that file of water quality time series, but first let's define your variables so that ODM knows
what kind of time series data you have.
Creating a Variables File
Your data represent several water quality time series variables. Some information about these variables
can be found in the metadata.txt file. Often you'll have to do a bit of legwork to fill in the rest, in order to
fully describe your data in ODM.

As  a brief summary, measurements for nitrogen, phosphorus, temperature, total suspended solids, and
chlorophyll a were taken sporadically in time.  Some metadata about these variables can be  found in the
Variables section of the metadata text file.

The fields in the variables file that you will create include:
    •  VariableCode
    •  VariableName
    •  Speciation
    •  VariableUnitsName
    •  SampleMedium
    •  ValueType
    •  IsRegular
    •  TimeSupport
    •  TimeUnitsName
    •  DataType
    •  GeneralCategory
    •  NoDataValue

VariableName,  Speciation, SampleMedium, ValueType, DataType,  and GeneralCategory  must  all
conform to terms in ODM controlled vocabularies.  This can actually make your life easier because all
you have to do is pick the CV term that best describes your variable.

The  ODM Data Loader will use the abbreviation for the variable units name to match the variable to a
unit in the Units table.  This is another example of how the ODM Data Loader performs integrity checks
on the data during the loading process.
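
If you want to check which units are available, you can query the Units table directly. The query below
is a sketch, assuming the ODM 1.1 Units field names (UnitsID, UnitsName, UnitsAbbreviation):

   -- Find candidate units whose names mention "per liter".
   SELECT UnitsID, UnitsName, UnitsAbbreviation
   FROM Units
   WHERE UnitsName LIKE '%per liter%';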

Details of how to supply values in the transformed file are shown in Table 0-3 below. The example values
are for the nitrogen variable.
                      Table 0-3 Values for the Variables File (nitrogen example)

  Transformed Field   Value                  Notes
  VariableCode        TN                     The term used for nitrogen, from metadata.txt
  VariableName        Nitrogen, total        From VariableNameCV
  Speciation          N                      From SpeciationCV
  VariableUnitsName   micrograms per liter   From UnitsName field of Units table, matches
                                             units from metadata.txt
  SampleMedium        Surface Water          From SampleMediumCV
  ValueType           Sample                 From ValueTypeCV
  IsRegular           FALSE                  These are instantaneous measurements made
                                             irregularly through time
  TimeSupport         0                      Use "0" when instantaneous measurements are
                                             recorded
  TimeUnitsName       Day                    The value doesn't really matter for
                                             instantaneous data, as long as it matches text
                                             in the UnitsName field of the Units table
  DataType            Sporadic               From DataTypeCV
  GeneralCategory     Water Quality          From the GeneralCategoryCV
  NoDataValue         -9999                  From metadata.txt
To create the variables file:
    1.  In the interest of time, a file named variables.csv has already been created for you in the
       Workshop\TransformedData folder. This file has records for four out of five variables. You'll
       add the fifth variable, nitrogen. Open the file with Excel.
    2.  In row 6, fill in the values for the nitrogen variable.  For reference, see Table 0-3.
    3.  Save and close the file.

Nice job!  Three out of four input files for the ODM Data Loader have now been created.  Before you
work on the last file, let's go ahead and load the other data into the database.
Loading Sites, Sources, and Variables into ODM
Before a single time  series value  is loaded into an ODM database,  you must have  already loaded
information that describes the time series, e.g., the sites, sources, and variables data that you just finished
preparing.  As you load time series values, the ODM Data Loader will look for the other information
related to the time series to make sure it is being appropriately described.  If something is missing, the
ODM Data Loader will pop up a friendly message basically saying something to the effect of, "Hi there.
I see you're loading data, but you haven't told me what the  data represent or how to describe it in the
database." This is one way the data loader helps to ensure the integrity of your data.

So without further ado, let's run the ODM Data Loader  on those transformed files  that you've been
working on.
    1.  Open the ODM Data Loader.
    2.  Enter the connection information using the same information as you did for the ODM Tools.
    3.  Click Save Changes.
    4.  Dismiss the message indicating that the connection was successful.

With the connection set, you are now ready to open the transformed data files and commit them to the
database.
    5.  Click Open.
    6.  Navigate to and open the Workshop\TransformedData\sites.csv file. The ODM Data Loader
        previews the file (Figure 0-5). As indicated in the bottom left corner, the application has
        recognized that you are loading sites information. It does this based on the field names in the
        input file you selected.
                                  Figure 0-5 Loading sites into ODM
    7.  Click Commit File to write the records to the database. After a moment, the user interface is
        cleared, indicating that the operation has completed.
8.   Repeat steps 5-7, this time loading sources.csv.
9.   Repeat steps 5-7, this time loading variables.csv.
10. Minimize the data loader when it has finished.
You'll use the data loader again in a moment to load the time series values. For now, take a look inside
the database to see that the data were loaded successfully.

    11. Restore SQL Server Management Studio. If you closed it, please open and connect to it again.
    12. In the MyWaterData database, open the Sites table.

You should now  see your sites in the table (Figure 0-6).  If the table is still  blank, try closing and
reopening the table to refresh it.
                        Figure 0-6 Sites information successfully loaded into ODM
Notice the LatLongDatumID field.  The ODM Data Loader automatically used the abbreviation for the
datum that you provided in the transformed sites file and matched it up with the datum ID from the
SpatialReferences table in ODM.
    13. Open the Sources table. Notice that the data source was automatically assigned a SourceID of
        1. Also notice that the Title, Abstract, etc., that you entered earlier have been replaced with a
        single MetadataID value.  If you open the ISOMetadata table, you'll find the additional
        metadata. The data loader is flexible enough to allow you to load sources and metadata
        information from a single file, or from two separate files if you had chosen to do so. (See the
        example queries after these steps.)
    14. Open the Variables table to see the result of data loading.
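
If you prefer to verify these results with SQL, the two queries below are minimal sketches, assuming the
ODM 1.1 field names mentioned above (LatLongDatumID, SpatialReferenceID, and MetadataID):

   -- Confirm that each site was linked to the WGS84 datum record.
   SELECT s.SiteCode, s.SiteName, sr.SRSName
   FROM Sites s
   INNER JOIN SpatialReferences sr ON s.LatLongDatumID = sr.SpatialReferenceID;

   -- Confirm that the source record was related to its ISO metadata entry.
   SELECT src.Organization, iso.Title, iso.Abstract
   FROM Sources src
   INNER JOIN ISOMetadata iso ON src.MetadataID = iso.MetadataID;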

With this information in the database, you can now load the time series values.
Creating and Loading a Data Values File
The actual time series of water quality data is stored in raw form at
Workshop\RawData\LCM_Data.xls.  Now that you've loaded metadata about the time series into your
ODM database, you're ready to load the time series values themselves.

The fields in the transformed data values file that you will create include:
    •   SiteCode
    •   LocalDateTime
    •   UTCOffset
    •   OffsetValue
    •   OffsetUnitsName
    •   OffsetDescription
    •   VariableCode
    •   DataValue
    •   MethodDescription
    •   SourceID
    •   QualityControlLevelID
    •   CensorCode
Many of these fields do not have an equivalent in the LCM_Data.xls file. The original data source may
not have conceived that their data would be fully described in an ODM database!  For some of the
additional fields, you will find matching terms or IDs in the ODM database. For example, the ODM Data
Loader will match the SiteCode and VariableCode to items in the Sites and Variables tables, which have
already been loaded. You'll use the SourceID from the Sources table that matches the record you added
earlier.  Similarly, you can look in the Methods and QualityControlLevels tables to get an idea of what to
enter for MethodID and QualityControlLevelID.

Details of how to supply values in the transformed file are shown below.
                              Table 0-4 Mapping Raw Data to Data Values

  Transformed Field      Raw Data Field           Notes
                         (from LCM_Data.xls)
  SiteCode               Station
  LocalDateTime          Date and Time            Concatenate the two fields to form a single
                                                  LocalDateTime field
  UTCOffset              "-5"                     From metadata.txt, all values are in Eastern
                                                  Standard Time, which is five hours behind
                                                  Coordinated Universal Time (UTC), hence the
                                                  value of -5 for the UTC offset
  OffsetValue            Depth
  OffsetUnitsName        "meter"                  From metadata.txt
  OffsetDescription      "Depth below water       From metadata.txt
                         surface"
  VariableCode           Test
  DataValue              Result
  MethodDescription      Method
  SourceID               e.g., "1"                This should be the value of the SourceID for
                                                  the source data you entered earlier
  QualityControlLevelID  "-9999"                  "-9999" indicates an unknown QC level, defined
                                                  in the QualityControlLevels table of ODM
  CensorCode             "nc"                     From CensorCodeCV table of ODM. "nc" is the
                                                  default value meaning "not censored."
Like the raw sites file earlier, you'll start with the raw time series file and modify it accordingly.

To create the data values file:
    1.  Open Workshop\RawData\LCM_Data.xls with Excel.
    2.  Save the file in the Workshop\TransformedData folder as datavalues.csv. Be sure to save the
       file as a comma delimited file.
    3.  Rename the following fields:
           a.   Station to SiteCode
           b.   Depth to OffsetValue
           c.   Test to VariableCode
           d.   Result to DataValue
           e.   Method to MethodDescription
    4.   Add a field called "LocalDateTime" and calculate all values to be the concatenation of the Date
         and Time fields. For example, in row 2, you would use the formula "=B2 + C2".
    5.   Add a field called "UTCOffset" and calculate all values to be "-5".
    6.   Add a field called "OffsetUnitsName" and calculate all values to be "meter".
    7.   Add a field called "OffsetDescription" and calculate all values to be "Depth below water surface".
    8.   Add a field called "SourceID" and calculate all values to be the SourceID that was generated
         when you created the source information in the database earlier.
    9.   Add a field called "QualityControlLevelID" and calculate all values to be "-9999".
    10. Add a field called "CensorCode" and calculate all values to be "nc".
    11. Save and close the file.

  Tip
  Quality control levels provide some confidence as to the amount of quality control performed on a dataset. A
  quality control  level of zero (0) indicates raw data, while a quality control level of one (1) indicates quality
  controlled data. The level of quality control for individual data values is not available for the workshop dataset,
  so you'll use a value of -9999 to indicate "unknown". For more information on quality control levels, see the
  ODM design specifications document at http://his.cuahsi.org/odmdatabases.html.
  Tip
  It's OK to leave the Date and Time columns in the file. Because those are not field names that the ODM Data
  Loader recognizes, it will just ignore the columns.
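
Putting the steps above together, a single transformed row in datavalues.csv might look like the line
below (illustrative values only; your actual SourceID, depth, and data value will come from your own
database and raw file):

   SiteCode,LocalDateTime,UTCOffset,OffsetValue,OffsetUnitsName,OffsetDescription,VariableCode,DataValue,MethodDescription,SourceID,QualityControlLevelID,CensorCode
   19,6/16/2011 10:50,-5,2,meter,Depth below water surface,TN,420,Composite sample,1,-9999,nc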


Now all that's left is to load the data values into ODM.
    12. Restore the ODM Data Loader. If you closed the program, please reopen it and connect to the
        MyWaterData database.
    13. Open the datavalues.csv file.
    14. Commit the data.
    15. When the data loader finishes, close the program and view the results in the DataValues table in
        SQL Server Management Studio.

That's it!  You've now finished loading all of the raw data into an ODM database.  If you are familiar
with SQL, you can now write queries or use other tools in SQL Server to work with the data.  But
just in case you are not a SQL pro, HIS has developed software called the ODM Tools, which can be used
to query and plot graphs of data in an ODM database.

Using the Streaming Data Loader

While the ODM  Data Loader is designed to facilitate one-time loading of archived data on disk, the
Streaming Data Loader is designed to run as a scheduled task to load data from sensors operating in the
field.  These sensors typically are  connected to a datalogger  that handles the recording of data from
sensors.  Data from the datalogger are sent via a telemetry  system to a  resource on your  computer
network, usually in the form of delimited text files.  As new data are  recorded, the text files are updated
with the latest values added to the end of the text file.  An alternate scheme is for the telemetry system to
create a new  text file for each data download.  In  either case, each  row in the text file  represents
measurements at a single time stamp, and the data values for each variable are stored in separate columns
within the file.  A given file is typically associated with only one monitoring site. For an example of this
kind of file, see the sdl.csv file in your Workshop\Streaming folder.  More information on the Streaming
Data Loader is available at http://his.cuahsi.org/odmsdl.html.
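
A datalogger file of this kind might begin like the fragment below (a hypothetical sketch; the column
layout mirrors the description above, and the values are illustrative):

   LocalDateTime,TN,TP
   5/7/2011 15:30,0.72,0.05
   6/16/2011 10:50,0.64,0.04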

In this portion of the exercise, you will use the Streaming Data Loader to load data for a new site that was
added to the  study. This site only measures nitrogen and phosphorus, and it has the characteristics in
Table 0-5.
                              Table 0-5 Streaming Data Loader Example Site

     Property         Value
     Site Code        50
     Site Name        Sunset Point
     Latitude         44.71
     Longitude        -73.25
     Datum            WGS84
     County           Grand Isle
     State            Vermont
     Sensor Method    Composite, unfiltered
     Sensor Depth     10 meters
     QC Level         Raw data
     Variables        TN and TP
Data for this site have already been prepared for you and are located in your Workshop\Streaming
folder.

To view the example datalogger files:
    1.   Browse to and open the sdl.csv file in your Workshop\Streaming folder.
    2.   Notice the columns for total nitrogen (TN) and total phosphorus (TP).
    3.   Close the file.

Next you will tell the Streaming Data Loader how to map from data in your file to the ODM database.

To map a data values file using the Streaming Data Loader:
    1.   Open the Streaming Data Loader Configuration Wizard by clicking Setup ODM SDL on your
        desktop, or by clicking the configuration wizard item in All Programs under CUAHSI HIS.
    2.   Click the Add button (the plus sign) at the top of the window.
    3.   In the dialog  that opens (Figure 0-7), click the browse button next to the Local File text box.
    4.   In the dialog  that opens, browse to and open the sdl.csv file in your Workshop\Streaming
        folder.
    5.   Leave the Run Every option to run once per minute. For demonstration purposes, this will
        ensure that the loader runs each time you tell it to run.
  Note - "Run Every" Frequency vs. Scheduled Task Frequency
  The "Run Every" option is like telling the data loader how long to rest between attempts to load data.  So if I set
  the "Run Every" option to 1 hour, and I run the data loader seven times within the first hour, then the  data
  loader will run the first time, but for the remaining six times it will say,  "Sorry, I'm not even going to look at your
  data files because I'm still resting."  Likewise, if I take a break for the next seven hours, the data loader won't do
  anything because I didn't tell it to run. If I tell it to run again on the seventh hour, it will say, "More than an hour
  has passed since I last ran, so I will look at your data files and load new data if present."

  You probably don't want to wait around every hour and then click to run the data loader. Fortunately, Windows
  can run the data loader for you as a scheduled task. You tell Windows how often to run the data loader and then
  Windows handles the rest.  Obviously, you wouldn't set up the scheduled task frequency to be finer than the
  "Run  Every" option or else the data  loader would sometimes tell Windows, "I'm still resting" and it won't do any
  work. You'll set up a scheduled task for the data loader later in this exercise.
    6.  Specify the connection string to your database as before.
    7.  Specify that there are column headers on row 1 and that data begin on row 2.
    8.  Click Next.
                    Figure 0-7 Specifying a file and database with the Streaming Data Loader
On the next page of the wizard, you see your data values and some more options to set.  You will tell the
data loader which column  stores your  datetimes,  and then map your TN and TP columns.  In this
example, your datetime column stores local datetimes.
    9.   In the bottom left, set the Time option as follows:
            a.  Choose the option for Local Date Time.
            b.  Select the LocalDateTime column in the drop down box.
            c.  Select -5 as the time zone. This site is five hours behind UTC.
            d.  Leave the DST box unchecked. The sensor at this site does not use daylight saving time.
    10. Near the bottom right, click the Add button to map a data value column.
    11. Choose TN to map the total nitrogen column and click Next.
You've told the data loader that the values in the TN column are time series values. Next you will tell the
data loader which site, variable, etc.,  are associated with values  in this column.  Notice that the data
loader is currently showing you a list of sites that it found in your database. You can choose from this list
or create a new site.  Since the Sunset Point site is not in the database yet, you will add it.
    12. Near the bottom right, click the Add button to add a new site to the database.
    13. Fill in the site parameters according to values in Table 0-5 (Figure 0-8) and click OK. If the
        information  isn't listed in the table, then you can leave it blank.
                                     Figure 0-8 Adding a New Site

The new site now shows up in the list of sites and is selected.
                                      Figure 0-9 Choosing a Site
    14. With the Sunset Point site selected, click Next.
    15. In the list of variables, select TN and click Next.
    16. Select the "composite, unfiltered" method and click Next.
    17. With the only source record selected, click Next.
    18. Choose the Depth below water surface offset type. Then at the bottom of the window, enter an
        offset value of 10 (Figure 0-10).  Click Next.
                                Figure 0-10 Choosing an Offset
19. Select the Raw data quality control level and click Finish.
20. Repeat the steps above to add the total phosphorus (TP) variable.
21. Once TP has been mapped (Figure 0-11), click Finish in the Add New File window.
                        Figure 0-11 Input File Mappings for Streaming Data Loader


The Streaming Data Loader Configuration Wizard now shows a record for the file you just mapped. The
configuration options that you specified in setting up this record are stored in an XML file located in your
AppData folder, e.g., C:\Users\USERNAME\AppData\Local\CUAHSI\StreamingDataLoader\1.1.2.  You
can edit this XML manually, which means you  could also write a script to automatically create these
configuration files. This would be useful if you had a very large network of streaming sensors and didn't
want to specify the configuration for each sensor file using the wizard.

 You can now execute the data loader by clicking the Run button in the configuration wizard.

To run the Streaming Data Loader from  the Configuration Wizard:
    1.  Click the Run button to execute the Streaming Data Loader.
    2.  After a moment, view the data values in your database using SQL Server Management Studio to
       see that they have been updated.

  Tip"
  To show just the values for Sunset Point, which was assigned a SitelD of 6 in your database, append the following
  WHERE clause to the DataValues query in SQL Server Management Studio and click Execute.

  WHERE  SitelD    6


To simulate new values arriving from a sensor, you'll now append the values from sdl-2.csv to sdl.csv.
The sdl-2 file contains values for August and September 2011.
    3.  Use a program such as Excel or Notepad to copy the values from sdl-2.csv, and append those
       values to sdl.csv.
    4.  Save sdl.csv.
    5.   Close sdl.csv and sdl-2.csv.
    6.   In the Streaming Data Loader Configuration Wizard, click the Run button.
    7.   After a moment, view the data values in your database to see that they have been updated.
    8.   Close the Streaming Data Loader Configuration Wizard.

The Streaming Data Loader saw that new data values had been  added to the sensor file (sdl.csv) and
updated the database accordingly. It did not reload the old data values even though they were still present
in the sensor file.

  Tip
  In the case that your monitoring and telemetry system creates a new streaming data file for each datalogger
  each time data is downloaded, you can connect to multiple local files containing data from the same datalogger
  by using wildcard characters (e.g., entering 'C:\StreamingData\ThisSite*.dat' will use all files within the
  C:\StreamingData folder that begin with 'ThisSite' and have a '.dat' extension). All of these files must be
  formatted exactly the same. The ODM SDL will scan each file for new data to load into the database each time
  the update is run.

Since the configuration seems to be working correctly with the Streaming Data Loader, you can now set it
up to run automatically as a scheduled task in Windows.

Update sdl.csv with even more data:
    1.   In your Workshop\Streaming folder, append the values from sdl-3.csv to sdl.csv.
    2.   Save sdl.csv.
    3.   Close sdl.csv and sdl-3.csv.

To  set up the Streaming Data Loader  to run as a scheduled task:
    1.   Start Task Scheduler (Start | All Programs  | Accessories | System Tools | Task Scheduler).
    2.   Task Scheduler tends to open in a smaller window than it should. Maximize Task Scheduler.
    3.   Click Action | Create Basic Task.
    4.   Enter Streaming Data Loader as the name, Runs ODM Streaming Data Loader as the
         Description, and click Next (Figure 0-12).
                           Figure 0-12 Creating a Basic Task with Task Scheduler

    5.   With Daily selected, click Next.
    6.   Choose a time a few minutes from now and click Next.
    7.   With Start a program selected, click Next.
    8.  Browse to the Streaming Data Loader executable (e.g., "C:\Program Files (x86)\CUAHSI
        HIS\ODM SDL 1.1.2\ODMSDL.exe") and click Next.
    9.  Click Finish.

After the time has passed for which the task was scheduled, check your database to see if the new values
were appended.  Once you've seen  that the task completed successfully,  you may want to adjust the
frequency of the task. Even though the  smallest time period available was Daily when creating the task,
you can specify a shorter time period  in the properties window for a task once it has been created.

To refine the frequency of a scheduled task's execution:
    1.  In Task Scheduler, click Refresh to make sure your task is visible.
    2.  In the Active Tasks section, double-click Streaming Data  Loader (Figure 0-13).
                                    Figure 0-13 Selecting a Scheduled Task
    3.  Click Action | Properties.
    4.  In the properties window that opens, click the Triggers tab.
    5.  Select the Daily trigger and click Edit.
    6.  Place a check next to Repeat task every. Leave the default of 1 hour selected.
    7.  Choose a duration of Indefinitely from the drop down list and click OK (Figure 0-14).
                                  Figure 0-14. Refining the Task Trigger

    8.  Click OK to close the properties window.

Congratulations! You are now a Streaming Data Loader expert!  In practice, you would schedule your
task to run as frequently as your data files are updated.

With data loaded into the database, you will now view the data using ODM Tools.

For Advanced  Participants
ODM Data Loader from the Command Line
To learn more about the ODM Data Loader, read the documentation at
http://his.cuahsi.org/odmdataloader.html. Did you know that the ODM Data Loader can be scripted?
Open the command prompt and give it a shot.

Work that SQL Magic
Familiar with SQL?  Try out some queries to explore your database, or experiment with creating table
views. Or, just look around the various tables to get a better sense of what's in ODM. More on ODM can
be found in the documentation at http://his.cuahsi.org/odmdatabases.html.
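
As a starting point, the statement below sketches one possible view, assuming the ODM 1.1 table and
field names used throughout this workshop (the view name SeriesSummary is just an example):

   -- Summarize each site/variable combination as a single row.
   CREATE VIEW SeriesSummary AS
   SELECT s.SiteName, v.VariableName, COUNT(*) AS ValueCount,
          MIN(dv.LocalDateTime) AS BeginDateTime,
          MAX(dv.LocalDateTime) AS EndDateTime
   FROM DataValues dv
   INNER JOIN Sites s ON dv.SiteID = s.SiteID
   INNER JOIN Variables v ON dv.VariableID = v.VariableID
   GROUP BY s.SiteName, v.VariableName;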
                                  Working with ODM Data

Now that data are loaded into an ODM database, how do we analyze them?  Lucky for us, the ODM Tools
are specifically designed to query and visualize data within an ODM database.  You  will use the ODM
Tools to examine the contents of the database that you have just created.

To examine your data with the ODM Tools:
    1.  Open the ODM Tools. Notice how the tools automatically connect to the last database you
       worked with.

The ODM Tools application opens with three tabs visible: Query, Visualize, and Edit. The Query tab is
selected by default.  On this tab, you specify various filters to search for time series in the ODM database.
Take a moment to review the query options.

Our data are fairly uniform in nature, i.e., they all have the same  data source, data type, sample medium,
etc. This limits the kinds of interesting queries we can perform with our data. But we can still query by
site, variable, number of observations, or time period.

Let's query for temperature data as described below.
    2.  Click the check box to Query by Variable (Figure 0-1).  This enables you to choose a variable
       from the variable list.
    3.  Select Temperature in the variables list by left-clicking on it (Figure 0-1).
    4.  Click the Query button in the bottom right corner (Figure 0-1).

The results of the query are shown at the bottom of the application window. You should see several items
there, where each one represents a time series of temperature at a particular site.
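
Behind the scenes, a query like this is roughly equivalent to the following SQL (a sketch assuming the
ODM 1.1 table and field names; ODM Tools may construct its actual query differently):

   -- Time series of temperature at Main Lake, ordered by time.
   SELECT dv.LocalDateTime, dv.DataValue, dv.OffsetValue
   FROM DataValues dv
   INNER JOIN Variables v ON dv.VariableID = v.VariableID
   INNER JOIN Sites s ON dv.SiteID = s.SiteID
   WHERE v.VariableName = 'Temperature'
     AND s.SiteName = 'Main Lake'
   ORDER BY dv.LocalDateTime;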

  Tip
  You can resize the ODM Tools window to show more query results at the bottom.


    5.  Find the time series of temperature for Main Lake (it's probably the first one in the list).  Right-
       click the data series (Figure 0-1).  In  the context menu, notice that you have options for plotting
       graphs, editing the data, viewing and exporting metadata, and exporting the data series itself as
       comma or tab delimited text.
    6.  In the context menu, click to View MetaData (Figure 0-1).
                              Figure 0-1 Querying data with the ODM Tools

Metadata for the time series are extracted from the database and transformed to XML (Figure 0-2).
                     
                     (The exported XML excerpt lists the variable code Temp, the variable name
                     Temperature, speciation Not Applicable, units degree celsius with abbreviation
                     degC, sample medium Surface Water, and value type Field Observation.)
                       Figure 0-2 Metadata exported from ODM using the ODM Tools

    7.   Right-click the data series again, and this time click to Export Single Data.
    8.   Save the file to disk at a location of your choosing.
    9.   When the data export is complete, dismiss the message box and open the data file. There are
        your data again, this time with internal identifiers and other pieces that the ODM Data Loader
        filled in for you when loading data.

  Tip
  You can select additional items to be included in the data export by clicking Tools |  Options.

    10. Close the metadata file and exported data file when you are finished  looking at them.

Now let's plot a graph of the data.
    11. Right-click the Main Lake data series, and click Plot.

You are brought to the Visualize tab, where the ODM Tools plot a graph of the data. Information about
the data series you are plotting is  shown at the bottom of the application window.  The plot has a very
"spiky" nature to  it. Let's take a closer look to see what's going on.
    12. Click and hold the left mouse button to draw a box on the plot around one of the "spikes" in the
        plot to zoom in to that section (Figure 0-3).
                           Figure 0-3 Left-click to zoom in on a plot in ODM Tools

Notice that several data points appear to occur at the same point in time (Figure  0-4). This is because
several measurements are taken at the same time, but at different depths within the water body.
              Figure 0-4 Plot of temperature data measured at different depths below the water surface
    13. Experiment with the other charting capabilities. You can change plot options, view summary
        statistics, show a probability plot, show a histogram, and show a box/whisker plot.  You can also
        interact with the plot by clicking on it. Left-click and drag a box to zoom in, and right-click to
        zoom out or copy and print the chart.

Finally, let's briefly take a look at the Edit tab.  You won't be doing any editing for this workshop, but
you can at least get a sense of what you can do with the ODM Tools.
    14. Click the Query tab.  Your previous results are still visible at the bottom.
    15. Right-click the Temperature time series at the bottom, and click Edit.

This brings you to the  Edit tab and pulls up the time series you selected.  Look at the options on the right.
You can change individual values or apply a data filter, perhaps to look for outliers.  Let's try that.
    16. Click the option to set a Value Change Threshold (Figure 0-5). This option is useful for locating
        values that differ greatly from the other values around them, which could indicate a sensor
        malfunction, human error in observation, or some other anomaly.
    17. Type a value for the Value Change Threshold  that will capture some of your data, just to see
        what happens. For my data, I used a value of 10 (Figure 0-5). Press ENTER after you have typed
        in your value to confirm it. The Apply Filter button should now be enabled.
    18. Click Apply Filter. Any values  that match the filter will be highlighted in red in the plot (Figure
        0-5). This shows us values that we may want to check for quality assurance. (A sketch of the
        filter's logic follows the figure.)
                              Figure 0-5 Applying a data filter with the ODM Tools
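The filter itself is easy to reason about: flag any observation whose value differs from its predecessor
by more than the threshold. Here is a minimal sketch of that logic in Python (the sample values are
invented, and ODM Tools' exact rule may differ in details such as how it treats gaps in time):

    def value_change_flags(values, threshold):
        """Return indices whose value differs from the previous value by more
        than threshold -- candidates for quality-assurance review."""
        return [
            i for i in range(1, len(values))
            if abs(values[i] - values[i - 1]) > threshold
        ]

    temps = [11.2, 11.5, 24.0, 11.4, 11.3, 0.5, 11.1]  # two spikes planted
    print(value_change_flags(temps, 10))  # -> [2, 3, 5, 6]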
Lastly, let's take a look at options for deriving new data series.
    19. Click Derive New Data Series.

In the window that opens, you'll see options for applying an algebraic equation, using a daily aggregate
function, creating attributes to describe the output, and more (Figure 0-6).  These functions are useful for
turning your raw information into knowledge products, as when gage height data for rivers are
converted to streamflow.
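For instance, the two most common derivations in that window amount to a daily aggregate and a linear
transform. A sketch with Python and pandas follows (the data and rating coefficients are invented, and
real stage-to-streamflow ratings are typically nonlinear and site-specific):

    import pandas as pd

    # A few raw 15-minute gage-height readings (meters), indexed by time.
    gage = pd.Series(
        [1.20, 1.22, 1.25, 1.31],
        index=pd.date_range("2007-09-17 10:00", periods=4, freq="15min"),
    )

    # Daily aggregate function: average gage height per day.
    daily_mean = gage.resample("D").mean()

    # Algebraic equation y = a*x + b, as in the derivation window.
    streamflow = 3.5 * gage + 0.2

    print(daily_mean)
    print(streamflow)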
                           Figure 0-6 Window to derive a new data series in ODM Tools

In your case, because the quality control level is unknown, you'd first have to create a quality controlled
data series for editing before the other options would be enabled. However, editing data series is beyond
the scope of this introductory workshop, so we'll leave the ODM Tools for now.
    20. Click Cancel to close the Derive A New Data Series window.
    21. Close the ODM Tools.

Congratulations!   You are now an expert in loading, querying, and visualizing data in an ODM database.
For some scientists, this may be  all that is  desired:  a means  of  storing and working with hydrologic
observations data.  However, there is often merit in sharing data with a larger community.  That's where
the WaterOneFlow Web  services and HIS Central come in.

Now that you have prepared an ODM database, you will publish the data with a WaterOneFlow Web
service to make it accessible online.

For Advanced  Participants
Continue playing around with ODM Tools.  Try some new queries and explore the options for plotting
graphs in more depth.
                     Publishing an ODM Database with WaterOneFlow
As far as your local setup goes, you are now ready for action with your ODM database.  But what if you
want  to  share the data with others  online?   That's  where WaterOneFlow  Web services  come in.
WaterOneFlow  defines a standard set of queries and a standard output format for accessing  data,
regardless of whether the data are accessed internally from an ODM database, some other database, or
even through another website.  Additionally, WaterOneFlow provides a layer of security  over  your
database which makes it less susceptible to hackers than exposing the database itself with public access.

Those who have their own database format for storing data must write their own WaterOneFlow
Web service to publish their data in HIS. However, you're in luck!  HIS includes a free WaterOneFlow
Web service specifically designed to work with an ODM database. This means you don't have to write a
single line of programming code for your service. You just have to set it up on your computer and tell it
to talk to your ODM database.

In this portion of the  workshop, you'll download and  install a WaterOneFlow Web service to work with
your ODM database.  The main steps are:
    1.  Create a SQL  Server account that your Web service will use to access your database.
    2.  Install the WaterOneFlow Web service on your computer.
    3.  Configure the service.
    4.  Check the result.

Creating the Webclient SQL Server Account
The WaterOneFlow Web service uses a SQL Server account to connect to your ODM database.  This
account should have fewer privileges than the one you used for ODM Tools because WaterOneFlow only
needs to read data from the database.  It does not perform any data creation, updating, or deletion.  It is
also the public interface to your data, so restricting what the service can do to your database is generally a
good idea. In the steps below, you will set up the SQL Server account for WaterOneFlow to use. For this
workshop, the account is  named "webclient." However, you are  free to name the account whatever you
choose in your production environment.

To set up the webclient account in SQL Server:
    1.  If it is not already open, open and connect to SQL Server  Management Studio.
    2.  In the Object Explorer on the left, expand Security | Logins (Figure 0-1).
                               Figure 0-1 Accessing logins for SQL Server

    3.  Right-click Logins and click New Login.
    4.  Type webclient as the name (Figure 0-2).
    5.  Select the SQL Server authentication option (Figure 0-2).
    6.  Type webclient as the password (Figure 0-2). Normally you'd want a more secure password, but
       for the purposes of this workshop, we're keeping things simple and easy to remember.
    7.  Uncheck the option to Enforce password policy (Figure 0-2).  Again, we're choosing to keep
       things simple for the workshop. You may want to enforce the SQL Server password policy for
       your own installation.
                              Figure 0-2 Creating a webclient SQL Server login
  Note
  Suppose you've already published one ODM database with WaterOneFlow, and have now created a second
  database that you want to publish. Since the webclient account was created when you published the first
  database, you do need to repeat those steps. You can start from the next step with User Mapping.
With the webclient login created, you will now add the MyWaterData database to the list of databases that
the login can access.
    8.   In the Select a page pane on the left, click User Mapping (Figure 0-3).
    9.   Place a check next to MyWaterData and click OK (Figure 0-3).
                           Figure 0-3 Allow the webclient to access the database
You're almost finished. The last thing to do is allow the webclient account to perform Select operations
on the database.  The Web  service needs this in order to properly query the database.  You allow this
permission on the properties page for the database itself.
    10.  In the Object Explorer on the left, expand Databases until you see your MyWaterData database.
    11.  Right-click MyWaterData and click Properties.
    12.  In the Select a page pane on the left, click Permissions (Figure 0-4). You should see the
        webclient account in the list of users for the database.
    13.  In the list of permissions, scroll to the bottom and place a check in the Grant column for Select
        (Figure 0-4).
    14.  Click OK to close the dialog (Figure 0-4). You may also close SQL Server Management Studio.
                  [Figure 0-4 Granting the webclient account Select permission on the database]
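If you script your deployments, the same account setup can be done with T-SQL rather than through
Management Studio. Here is a sketch mirroring the steps above; it assumes the workshop names, the
Python pyodbc package, and an administrator connection, and the exact statements should be adapted
to your environment.

    import pyodbc

    # Connect as an administrator; autocommit keeps DDL like CREATE LOGIN simple.
    conn = pyodbc.connect(
        "DRIVER={SQL Server};SERVER=(local);DATABASE=MyWaterData;"
        "Trusted_Connection=yes;",
        autocommit=True,
    )
    cur = conn.cursor()

    # Create the login (password policy disabled only for workshop convenience).
    cur.execute(
        "CREATE LOGIN webclient WITH PASSWORD = 'webclient', CHECK_POLICY = OFF;"
    )

    # Map the login into MyWaterData and grant read-only access.
    cur.execute("CREATE USER webclient FOR LOGIN webclient;")
    cur.execute("GRANT SELECT TO webclient;")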
The zip file  contains several  files  related to WaterOneFlow.  The Web service  application is in the
WebApp folder. You'll copy this folder to the default folder for web applications on your computer.
    5.   From the zip file, copy the WebApp folder to C:\Inetpub\wwwroot.
    6.   Rename the WebApp folder to MyDataService. You can name the folder whatever you want,
        but we'll use MyDataService for this exercise.

Now you'll use Internet Information Services (IIS) Manager to configure the service. This is a program you can use
to manage web applications on your server. To isolate WaterOneFlow from your other web applications,
you'll set up an application pool for WaterOneFlow to run in.  This pool will use the Classic managed
pipeline mode since this is what WaterOneFlow was built to use.

To  set up an application pool:
    1.   Open Internet Information Services (IIS) Manager.
    2.   In the Connections box on the left, expand the tree view until you see Application Pools.
    3.   Right-click Application Pools and click Add Application Pool (Figure 0-5).
                                  Figure 0-5 Adding an application pool

    4.   In the Add Application Pool dialog, specify these options (Figure 0-6):
           a.  Name: WaterOneFlow
           b.  .NET Framework version: .NET Framework v2.0
           c.  Managed pipeline mode: Classic
                             Figure 0-6 WaterOneFlow application pool settings

    5.   Click OK to close the dialog.


  Tip
  If you are creating multiple instances of the WaterOneFlow web services on your server because you have
  multiple ODM databases, you can reuse the WaterOneFlow application pool that you just created for those
  services.
Now you are ready to install the service and configure it.

To install the WaterOneFlow Web service:
    1.   In IIS, expand the tree view to Sites | Default Web Site |  MyDataService.  MyDataService is
        currently just a folder that IIS has found.  Next you'll tell IIS that this folder is actually a web
        application.
    2.   Right-click MyDataService and click Convert to Application.
    3.   In the Add Application dialog, click Select (Figure 0-7).
    4.   In the Select Application Pool dialog, choose WaterOneFlow (Figure 0-7).
                            Figure 0-7 Selecting the WaterOneFlow application pool
    5.  Click OK to close the Select Application Pool dialog.
    6.  Click OK to close the Add Application dialog.

With the service  installed, you'll next configure the service to tell it how to  connect to your ODM
database.

To configure the  WaterOneFlow Web service:
    1.  Left-click MyDataService to select it (Figure 0-8).
    2.  Under the ASP.NET icon group in the middle section of the IIS Manager window, double-click
        Connection Strings to open the connection strings editor (Figure 0-8).
                            Figure 0-8 Accessing service connection strings
3.  Double-click ODDB to set the database connection string.
4.  Set these parameters in the Edit Connection String dialog:
        a.   Choose SQL Server.
        b.   Server: (local)
        c.   Database: MyWaterData
        d.   Credentials: Choose to Specify credentials.
5.  Click Set to set the SQL Server credentials.
6.  Set these credentials (Figure 0-9):
        a.   User name: webclient
        b.   Password: [same as the webclient password you established earlier]
        c.   Confirm password: [same as above]
                       Figure 0-9 SQL Server credentials for WaterOneFlow to use

7.  Click OK to close the Set Credentials dialog. Your Edit Connection String dialog should now look
    similar to the one in Figure 0-10.
                  [Figure 0-10 Edit Connection String dialog]
In the Application Settings for the service, change the Value of the network setting to LakeChamplain
followed by your computer number, e.g., LakeChamplain27 (Figure 0-11). These numbers will help keep
your service distinct from that of other participants in this workshop.
                                  Figure 0-11 Setting the network name

    13. Click OK to close the Edit Application Setting dialog.
    14. Similarly, change the Value of vocabulary to VTDEC.

With the  service configured, you are now ready to test it.

Testing  the Web Service in a Web Browser

You'll test the service by exploring it in a web browser and by using HydroExcel.

To test the service in a web browser:
    1.  In a web browser, navigate to http://localhost/MyDataService.

  Tip"
  The keyword "localhost" tells your browser to look on your own computer for the web page that you are
  attempting to access.


The Web service displays this web page when accessed with a browser. Near the top of the page is a link
for Database Test Page that you can use to test the service's connection to your database.
    2.  Click Database Test Page.

In a  moment, you should see some example  data from your database (Figure 0-12). If you see an error
message, you can use the message to track down the cause of the error.
  This should display up to 10 sites:

      SiteCode    SiteName
      19          Main Lake2
      25          Malletts Bay2
      34          Northeast Arm2
      36          Isle LaMotte (off Grand Isle)2
      7           Port Henry Segment2

                              Figure 0-12 WaterOneFlow database test page
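You can run the same smoke test from a script. A small sketch with the Python requests package
follows (the URL matches this workshop's setup; adjust it if you named the application differently):

    import requests

    # Fetching the WSDL is a quick way to confirm the service is up and answering.
    resp = requests.get(
        "http://localhost/MyDataService/cuahsi_1_1.asmx?WSDL", timeout=10
    )
    resp.raise_for_status()

    # A healthy WaterOneFlow service returns an XML service description.
    print(resp.headers.get("Content-Type"))
    print(resp.text[:200])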
Excellent! The Web service should now be up and running. Let's test the service using HydroExcel.
Testing the Web Service with HydroExcel
At this point in the data publication process, you should be able to give someone the URL to your Web
service, and they should then be able to query data from it using any software that communicates with
Web  services.    CUAHSI  HIS  includes  free  software called HydroExcel  that lets  you  access
WaterOneFlow Web services from within a Microsoft Excel spreadsheet. In this portion of the workshop,
you'll use HydroExcel to extract data from your Web service.

To test the Web service with HydroExcel:
    1.   In a web browser, navigate to http://his.cuahsi.org.
    2.   Under Quick Links on the right, click HydroExcel.
    3.   Click the link to download the Microsoft Office 2007/2010 version.
    4.   For this exercise, save HydroExcel to your Desktop.
    5.   Open the file when it has finished downloading.  If prompted, click to enable editing and enable
        content.
  Note
  HydroExcel requires the free HydroObjects software to be installed. This software was installed on the workshop
  computers prior to the workshop. If you are working from a different computer, you can find the installation file
  at http://his.cuahsi.org/hydroobjects.html.
The worksheets in HydroExcel call methods from a WaterOneFlow Web service to query data and write
the result into the spreadsheet. For an in-depth tutorial on HydroExcel, see the software manual on the
HIS website.  For this workshop, we'll just do a quick test to download a list of sites and what variables
they have, and also a time series for a given variable at a given site.
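HydroExcel is only one SOAP client; any language that can read a WSDL can make the same calls. As an
illustration, here is a sketch using the Python zeep package (the endpoint, site, and variable codes match
this workshop, and the method signature follows the WaterOneFlow 1.1 specification; verify both against
your own WSDL):

    from zeep import Client

    # zeep reads the WSDL and generates the WaterOneFlow methods automatically.
    client = Client("http://localhost/MyDataService/cuahsi_1_1.asmx?WSDL")

    # GetValues(location, variable, startDate, endDate, authToken) returns a
    # WaterML description of the requested time series.
    waterml = client.service.GetValues(
        location="LakeChamplain27:19",  # network:SiteCode for Main Lake
        variable="VTDEC:Temp",          # vocabulary:VariableCode for temperature
        startDate="1992-05-07",
        endDate="2007-09-17",
        authToken="",
    )
    print(str(waterml)[:500])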
    6.  Activate the Data Source worksheet. (Click "Data Source" at the bottom of HydroExcel.)

On the Data  Source worksheet,  you tell HydroExcel which Web service  you want to work with by
inputting the  URL address of the service next to the  box that says "WSDL Location".  Some URLs are
already listed in the spreadsheet, but you will have to locate your own URL that points to the Web service
you just created.  Since you're still using the same computer on which the Web  service is installed, you
could just use the localhost URL.  However, let's use the actual IP address of your computer since that's
how other people will be connecting to it.
    7.  Locate your IP address. You can find your IP address from the command line (a scripted
        alternative appears after these steps):
           a.  Click Start.
           b.  Type cmd.exe and press ENTER.
           c.  In the window that opens, type ipconfig and press ENTER.
           d.  Write down the numbers that appear to the right of IP Address.  These numbers look
               something like "129.116.104.171". Note that this address will be different for each
               computer.
           e.  Close the command window.
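If you prefer to find the address programmatically, a common Python idiom is sketched below (it
assumes the machine has a route to the Internet; nothing is actually transmitted):

    import socket

    # Connecting a UDP socket sends no data, but it does force the operating
    # system to pick the outgoing interface, whose address we can read back.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.connect(("8.8.8.8", 80))
    print(s.getsockname()[0])  # e.g., 129.116.104.171
    s.close()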

From this point forward, if you see the text "[YOUR_IP]", please replace it with your IP address. For example, if you
are asked to enter the address "http://[YOUR_IP]/MyDataService", and your IP address is 129.116.104.171, please
enter "http://129.116.104.171/MyDataService".
    8.  In a web browser, navigate to http://[YOUR_IP]/MyDataService.
    9.  Click the link for the 1.1 version of the service.
    10. Click the link for Service Description. This takes you to the WSDL for your Web service.

  Tip
  The service WSDL (Web Services Description Language) is where your Web service defines what it can do and
  how programs can interact with it.  It is designed for programs to read, so don't worry if you can't make sense of
  it. When a program accesses your Web service, it will read the WSDL and know exactly how to send requests to
  it, and will also know what format of output to expect back.
    11. From the address bar of your web browser, copy the URL of the WSDL and paste it into the cell
       next to the cell that says WSDL Location on the Data Source worksheet of HydroExcel.
    12. Activate the Series Catalog worksheet.
    13. Change the option to Create and open KML file after download to TRUE.
    14. Click Get Series Catalog.

After a moment, your spreadsheet is updated with information about the sites and variables measured at
the sites.  Also, Google Earth opens to show the site locations.
    15. Browse around Google Earth to see your site locations.  Click on placemarks for sites to see
       information associated with the sites.
    16. Switch back to HydroExcel.
    17. Dismiss the message box indicating that the download is complete.

Take a look at the information in the Series Catalog.  You not only get information about the location of
sites in your database, but also the variables measured at those sites.  Notice that the start date, end date,
and number of records of time series observations are included with  each variable.  You'll use  this
information to  download a time series for one of the sites.
    18. Locate a site and variable for which you'd like to download time series data. For my
       screenshots, I'm going to get temperature data at Main Lake, so you may want to do the same.
    19. In the Series Catalog worksheet, right-click anywhere on the row for the site and variable that
       you want.
    20. In the context menu that opens, point to HydroExcel, and  then click to download the time
       series.

You are brought to the Time Series worksheet, where HydroExcel filled in the parameters to make the
request for the  data, called the Web service, and populated the result in the spreadsheet.
    21. Dismiss the message box indicating the download  is complete.

Notice there are several measurements  taken at a given datetime, but at different depths below the water
surface (Figure 0-13).
                          Figure 0-13 Time series returned from the Web service

Kudos to you!  You've come a long way, and now have successfully made your data available online
using WaterOneFlow.  Do you feel the magic in this moment?  I certainly do. You're almost finished
with the entire data publication process.  The last major step is to register your service with HIS Central
so that others can discover it.

For Advanced Participants

Fun with Pivot Tables
Have you worked with pivot tables in Excel before?  They  are nifty.  Suppose you want to show
temperature data 50 feet below the water surface in the lake.  On the Statistics and  Charts worksheet, you
can drag Offset to the Report Filter box, and then choose "50" after clicking on the  drop down arrow next
to Offset in the Pivot Table Field List.  Experiment with other ways of summarizing the data with pivot
tables.

By the way, pivot tables and charts are a native part of Excel functionality, and aren't just  limited to
HydroExcel.  You may find pivot tables useful in other spreadsheets that you've created.
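The same kind of summary is also easy to produce outside Excel. A sketch with Python and pandas
follows (the column names mirror the HydroExcel Time Series worksheet, and the values are invented
sample data):

    import pandas as pd

    # A few temperature observations at two depths (Offset).
    df = pd.DataFrame({
        "DateTime": pd.to_datetime(
            ["1992-05-07 11:45", "1992-05-07 11:45",
             "1992-05-20 14:50", "1992-05-20 14:50"]
        ),
        "Offset": [2, 50, 2, 50],
        "Value": [11.2, 6.8, 12.1, 7.0],
    })

    # Pivot: one row per timestamp, one column per depth -- the scripted
    # equivalent of dragging Offset into the Report Filter box.
    print(df.pivot_table(index="DateTime", columns="Offset", values="Value"))

    # Or filter to the 50-unit offset only, as in the workshop example.
    print(df[df["Offset"] == 50])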
                           Registering Your Service at HIS Central
Although your Web service is now online, there's still the question of how people will find out about it so
that they can use it.  That's where HIS Central comes in.  HIS Central is a special server maintained by
the HIS team  which  keeps a  catalog of  WaterOneFlow  Web  services.   When  you publish  a
WaterOneFlow Web service, you should register it with HIS Central to make it discoverable. More than
just a listing of WaterOneFlow services, HIS Central performs these functions:
    •   Provides detailed information about your service, including contact information, abstract, and
        areal extent of your sites.
    •   Supports translation from your variables to an ontology of common hydrologic concepts.  This
        facilitates easy searching for variables  in your service, especially by those who aren't familiar
        with what your service has to  offer.
    •   Maintains a catalog of sites and variables available in all registered services, enabling fast
        searching for data from multiple data providers.

The registration process involves these three key steps:
    1.  Add an entry at HIS Central for your Web service.
    2.  Tag variables from your service to the  hydrologic concept ontology.
    3.  Check that your sites show up in HydroDesktop.

Once you've completed these steps, the entire publication process will be complete!	
Workshop Links for HIS Central and HydroDesktop
Rather than register your service at the official HIS Central (http://hiscentral.cuahsi.org), you'll be using a special
sandbox version set up just for workshop  use at http://water.sdsc.edu/hiscentralsb/. That way, we don't clutter
the production system with datasets that  we're just using for demonstration purposes. When you're ready to
contribute to HIS with your own data and  services, then please use the official HIS Central link.

Adding Your WaterOneFlow Web Service to HIS Central
For this portion  of the workshop, you will add  an entry for your Web service with HIS Central.  You will
not only give HIS Central the URL to your service WSDL, but also supplementary information about the
service.
To register a WaterOneFlow Web service with HIS Central:
    1.  In a web browser, navigate to the workshop HIS Central at http://water.sdsc.edu/hiscentralsb/.
    2.  At the top of the page, click the link to Login.
    3.  Log in with your credentials.  If you are currently in the HIS workshop, your instructor will provide
       you with credentials. Otherwise, feel free to register your own free HIS Central account.
    4.  Once you are logged in, near the top of the page, click the link to Add Data Service.
    5.  Add service details (Figure 0-1):
           a.  To  help distinguish your service from others in the workshop, the service title should be
               unique among all registered services. Therefore, please enter "Lake Champlain
               Workshop" (without quotes) and your computer number as the Service Title, e.g., Lake
               Champlain Workshop 27.
           b.  Input the Network Name that you assigned to your service earlier, e.g.,
               LakeChamplain27.
            c.   Input your WSDL address, i.e.,
                http://[YOUR_IP]/MyDataService/cuahsi_1_1.asmx?WSDL, as the Service WSDL.
           d.  Click the link I have read and agree to the Data Service Agreement to see the data
               service agreement. The link opens in a new window.
           e.  Be  sure the box is checked next to the link for the data service agreement.
                        Figure 0-1 Registration page for a new service at HIS Central
    6.  Click Next. This brings you to the Data Service Details page.

You now have an entry for your service in the system at HIS Central. However, there are still some steps
to take before you make it public. Let's continue editing details of this service.
    7.  About half-way down the page, click Edit Details.

This brings you to a page that lets you edit the description of your service. For the workshop, we'll only
be adding a few items, but if you are a fast typist, feel free to add more!
    8.  Add some info to the details page, such  as the following:

           a.   Organization: HIS Workshop
           b.   Name: Workshop Participant
           c.   Citation: Doe, Jane (2011). Water quality monitoring data from the Lake Champlain
               Long-Term Monitoring Program. VT Department of Environmental Conservation.
           d.   Abstract: Here's a short abstract.
  Note
  When you register a service at HIS Central, please use the following citation format:
  Name (Year). Type of Data. Network. Collecting Organization.
    9.   Be sure to check the box next to Is service public. If you don't check this box, sites from your
        service will not show up in applications like HydroDesktop.
    10.  Click Update.
This brings you back to the Data Service Details page.
Now let's add images that will be associated with your service.  You'll add a logo for your organization
that users will see when they view details for your service at HIS Central, and a small icon that represents
a site location that will appear in the HydroDesktop map.
    11. Change the images for your service.
           a.  Click Change Images.
            b.  For the organization icon, browse to the Workshop\Images folder on your computer,
                and open the OrganizationIcon.gif file.
           c.  Click Upload to upload the image once you have located it. The web page will be
               updated with your image once the upload is complete.
            d.  For the map icon, browse to and add the MapIcon.jpg file, located in the same directory
               as the organization image.
           e.  Click Upload to upload the image.
           f.   Click Back.

At this point, you could add additional contacts, links, and descriptions, which would show up as part of
your service's details. This is not required for the workshop, so your data are now ready to be
harvested.

What's harvesting, you ask?  Recall that HIS Central keeps a catalog of all of your sites and variables,
which enables fast searching  across all registered  services.  HIS Central creates this catalog by calling
various methods from your WaterOneFlow Web  service.  This is called data harvesting.  When  you
request a data harvest, an HIS Central administrator is notified and will trigger a harvest of your data. To
keep the  system  from  being bogged down  by numerous harvest  requests,  only  an  HIS Central
administrator can trigger  a  harvest.   Harvesting  is essential, because HIS  Central must know what
variables are  available in your service before you can tag those variables to concepts in the hydrologic
ontology.
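In outline, a harvest is simply the catalog server walking your service's public methods. The sketch
below conveys the idea using the Python zeep package (the real HIS Central harvester is server-side
code you never see, and the GetSites signature shown follows the WaterOneFlow 1.1 specification, so
check it against your WSDL):

    from zeep import Client

    client = Client("http://localhost/MyDataService/cuahsi_1_1.asmx?WSDL")

    # An empty site list asks the service for every site it publishes.
    sites_waterml = client.service.GetSites(site=[], authToken="")

    # A harvester would parse the returned WaterML for site codes, then call
    # GetSiteInfo for each site to learn its variables and periods of record.
    print(str(sites_waterml)[:300])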

Normally you would now click the button to Request Data Harvest.  However, rather than wait  for an
administrator to trigger the harvest, you will trigger the harvesting yourself.  Your user account for the
workshop has been given sufficient privileges to trigger data harvests.
    12. Trigger a harvest for your service.

           a.  Near the top of the Data Service Details web page, click Harvest Data.
           b.  Click Begin Harvest. In a moment, a link will appear to view the harvest log.
            c.  Click the Harvest log link to view progress. The log will indicate when the harvest is
                complete. Refresh your browser to see any updates.

Now that the service is harvested, you can tag variables in the service to the ontology.
  Tip
  At this point, you may want to take a moment to stretch and relax.  Sometimes it takes a few minutes for the
  results of the harvest to be committed to HIS Central's catalog database.

    13. Click My Data Services.
    14. Click Details for your service.
    15. Click List Variables. This brings you to a page showing variables found in your service (Figure
        0-2). If you don't see your variables here, then the harvesting was not completed.
                       Figure 0-2 List of variables for a service registered at HIS Central
    16.  Click Map Variables.

The HydroTagger opens, showing the CUAHSI HIS ontology.  There are high level  concepts  in the
center, with branches increasing in specificity as they move out to the leaf concepts. The variables that
HIS Central found in your service are  listed on the bottom left.  Your mission here  is to map your
variables to the appropriate leaf concept.  Users search upon these concepts, not whatever wacky variable
names you used, when they search for data at HIS Central.
    17.  In your list of variables, click Select on the row for chlorophyll a. This adds that item to the
        Variable text box (Figure 0-3).
    18.  In the view of the ontology, click and drag with the mouse to locate the chlorophyll a concept
        leaf (Figure 0-3).  It falls under Hydrosphere |  Biological | Biological  community | Pigment |
        Chlorophyll | Chlorophyll a.
    19.  Double-click Chlorophyll a to add it to the Mapping text box at the bottom of the window
        (Figure 0-3).
    20.  Click Map (Figure 0-3).

  Tip
  If you can't find chlorophyll in the ontology, type chlorophyll in the Search box and click Search.  Small red arrows
  will indicate keywords that match your search. Note that the Search box is only visible when the Variable and
  Mapping text boxes are not populated.
                  [Figure 0-3 Mapping a variable to the ontology with the HydroTagger]
This brings you to the public details page for your service.  Notice that the service details you entered are
there, along with a map showing a box that defines the extent of your site locations (Figure 0-4).
            Figure 0-4 Public view of registered service at HIS Central (your images and details may vary)

Outstanding! Your service is now registered with HIS Central, and you've seen how others can navigate
to HIS Central and discover your service.  As a final  check,  let's find data from your service  in
HydroDesktop.

Viewing Your Data in HydroDesktop
The final step in the publication process is  to check that your service is properly registered by viewing
sites from your service in HydroDesktop.   HydroDesktop is a desktop  application for discovering,
accessing, and analyzing  time  series  data from  WaterOneFlow services.   Unlike  HydroExcel,
HydroDesktop  doesn't require prior knowledge of your service URL before  you can use it.  Instead,
HydroDesktop  searches HIS Central's catalog for locations of time series of interest,  giving you access to
ALL publicly available services registered at HIS Central.

You'll use HydroDesktop to submit a query that should include sites from your Web service in the result.
If you  see your sites, then the data publication process is completed!

To search for your sites in HydroDesktop:
    1.   Open HydroDesktop and create a new North America project.

As you can see, HydroDesktop is a Geographic Information System (GIS) integrated with CUAHSI-HIS.
Next you will tell  HydroDesktop to use the sandbox HIS Central  instead of the official version.  Recall
that you registered your service with the sandbox HIS  Central (the one for testing and workshops) and not
the official version.
    2.   In the ribbon, click the drop down arrow under the Search panel and then click Advanced
        Settings (Figure 0-5).
                            Figure 0-5 Accessing HydroDesktop advanced settings

    3.   In the Advanced Settings dialog, click Custom (Figure 0-6).
    4.   Enter the following URL into the box below Custom (Figure 0-6). This is the URL to the search
        services operated by the sandbox HIS Central in which you  registered your service.
        http://water.sdsc.edu/hiscentralsb/webservices/hiscentral.asmx
    5.   Click OK (Figure 0-6).
                                  Figure 0-6 Setting the HIS Central URL
Now you will locate Lake Champlain. This is where your sites are located.  An online basemap will be
useful in finding the location.
    6.   In the ribbon, in the Online Basemap panel, select ESRI World Topo from the drop down list.
    7.   Using the Zoom In tool, zoom in to the northeastern United States to find Lake Champlain on
        the border with Canada between Vermont and New York.
    8.   In the Search panel, click the Select by drawing Rectangle tool (Figure 0-7).
    9.   Click and drag to draw a box  around Lake Champlain (Figure 0-7).
                             Figure 0-7 Specifying the area to search
10.  In the Search panel, click the Keywords tab (Figure 0-8).
11.  Type nitrogen in the keyword box (Figure 0-8). Keywords related to nitrogen are automatically
    selected in the keyword list.
12.  Click Nitrogen, total (Figure 0-8). This is one of the variables included in your service.
13.  Click the Add button (Figure 0-8). The nitrogen keyword is added to the list of keywords to
    search.
14.  Click Run Search (Figure 0-8).
                  [Figure 0-8 Searching by keyword for nitrogen data]
You are invited to join the HIS team in developing the next generation of HIS software and methods to
further hydrologic information science.

This concludes the workshop.
                          Appendix A: Uninstallation Instructions
If you want to erase your footsteps from this workshop, here's how.
HIS Central
Only administrators can permanently delete services at HIS Central.  The workshop instructor will delete
all services registered during the workshop within a few weeks after the workshop has concluded.  If you
register another service in the future and would like it permanently deleted, feel free to email the
workshop instructor, who will ask the HIS Central administrator to delete the service for you.

In the meantime, you can hide the service so that others cannot see it.

To hide a service at HIS Central:
    1.  Log in to HIS Central.
    2.  Click My Data Services.
    3.  Click Details for the service that you want to hide.
    4.  Click Edit Details.
    5.  Uncheck Is service public.
    6.  Click Update.

WaterOneFlow
To uninstall your WaterOneFlow Web service from your server, you will remove it from IIS and then
delete the files on your computer.

To uninstall WaterOneFlow:
    1.  Open IIS.
    2.  Expand the tree view on the left to find MyDataService.
    3.  Right-click your service and click Remove. Click Yes if prompted to remove the application.
    4.  Optional: If you have no other services using the WaterOneFlow application pool, then in the
       tree view, click Application Pools, find WaterOneFlow, and click to Remove the WaterOneFlow
       application pool.
    5.  Close IIS.
    6.  Open Windows Explorer.
    7.  Navigate to your MyDataService folder.

Deleting the files is a two-step process.  First you delete the service files, since they hold a lock on the
supporting files.  Then you delete the supporting files.

To delete WaterOneFlow files:
    1.  In the MyDataService folder, delete all ASP.NET Server Page files. These are the files with the
       .aspx or .asmx extension.  (Hint: Sort by Type.)
    2.  Delete the MyDataService folder.

Streaming Data Loader Scheduled Task
To remove the scheduled task for the Streaming Data Loader:
    1.  Open Task Scheduler.

    2.  In the list of Active Tasks, double-click Streaming Data Loader.
    3.  Click Actions |  Delete. Click Yes to confirm. Close Task Scheduler.

ODM Database
To clean up your instance of SQL Server:
    1.  Log into SQL Server Management Studio.
    2.  In the Object Explorer, find your MyWaterData database.
    3.  Right-click MyWaterData and click Delete.  Click  OK to delete the database. This both detaches
       your database  and deletes the files from your computer.
    4.  Optional: If you have no other databases associated with your webclient SQL Server login, then
       you can delete it.
           a.   In the Object Explorer, expand Security | Logins to find your webclient login.
           b.   Right-click webclient and click Delete. Click OK to delete the login.
    5.  Repeat step 4 for any other ODM-related logins that you created.

CUAHSI-HIS and Related Software
To uninstall HIS software:
    1.  Open Programs and Features.
    2.  Uninstall the following software:
           a.   CUAHSI HydroObjects
           b.   Google Earth
           c.   HydroDesktop
           d.   Microsoft SQL Server or SQL Server Express (please see your system administrator first
               since this program would take a long time to reinstall if desired)
           e.   ODM Tools
           f.   ODM Data Loader
           g.   ODM Streaming Data  Loader
           h.   Optional: Microsoft Office (please see your system administrator first since this program
               is not free)
    3.  Close Programs and Features.
    4.  To uninstall HydroExcel, delete the spreadsheet file (HydroExcel115.xlsb).
    5.  Delete the HydroExcel115 folder, located in the same directory as the HydroExcel spreadsheet
        file. This folder contains KML generated during the Google Earth portion of the exercise.
                                       APPENDIX B:

         CUAHSI Community Observations Data Model (ODM)
                                       Version 1.1
                                 Design Specifications

                                           May 2008

                  David G. Tarboton1, Jeffery S. Horsburgh1, David R. Maidment2
Abstract
The CUAHSI Hydrologic Information System project is developing information technology infrastructure
to support hydrologic science.  One aspect of this is a data model for the storage  and retrieval of
hydrologic observations in a relational database.  The purpose for such a database is to  store hydrologic
observations data in a system designed to optimize  data retrieval for integrated analysis of information
collected by multiple investigators.  It is intended to provide a standard format to aid in the effective
sharing of information between investigators and to allow analysis of information from disparate sources
both within  a single study area or hydrologic observatory and  across hydrologic  observatories and
regions. The observations data model is designed to store hydrologic observations and  sufficient ancillary
information  (metadata) about the data values to provide traceable heritage from raw measurements to
usable  information allowing them to be unambiguously interpreted and used.   A relational database
format is used to provide querying  capability to allow  data retrieval supporting diverse analyses.  A
generic template for the observations database is presented.  This is referred to as the  Observations Data
Model (ODM).

Introduction

The  Consortium of Universities for the Advancement of  Hydrologic Science, Inc. (CUAHSI)  is an
organization  representing more  than 100  universities and  is sponsored by  the  National Science
Foundation to provide infrastructure and services to  advance the development of hydrologic science and
education in the United States. The CUAHSI Hydrologic Information System  (HIS) is being developed
as a geographically distributed network of hydrologic data sources and functions that are integrated using
web  services so that they function  as a connected whole.  One aspect of the CUAHSI HIS  is the
development of a standard database  schema for use in the storage of point observations  in a relational
database. This is referred to as the point Observations Data Model (ODM) and is intended to allow for
comprehensive analysis of information collected by multiple  investigators for varying purposes.  It is
intended to  expand the ability for data analysis by providing a standard format to  share data among
investigators and to facilitate  analysis of information from disparate sources both within  a single  study
area or hydrologic observatory and across hydrologic observatories and regions. The ODM is designed to
store hydrologic observations with sufficient ancillary information (metadata) about  the  data values to
provide traceable  heritage from raw  measurements to  usable information  allowing them  to  be
unambiguously interpreted and used.  Although designed specifically  with hydrologic observation data in
mind, this data model has  a simple and general structure that will also  accommodate a wide range of other
data, such as from other environmental observatories or observing networks.
1 Utah Water Research Laboratory, Utah State University
2 Center for Research in Water Resources, University of Texas at Austin

ODM uses a relational database format to allow for ease in querying and data retrieval in support of a
diverse range of analyses. Reliance on databases and tables within databases also provides the capability
to have the model scalable from the observations of a single investigator in a single project through the
multiple investigator communities associated with a hydrologic observatory and ultimately to the entire
set of observations available to the CUAHSI community.   ODM is focused on observations made at a
point. A relational database model with individual observations recorded as individual records (an atomic
model) has been chosen to provide maximum flexibility in data analysis through the ability to query and
select individual observation records. This approach carries the burden of record-level metadata, so it is
not appropriate for all variables that might be observed.  For example, individual pixel values in large
remotely sensed images or grids are inappropriate for this model.

This data model is presented as a generic template for a point observations database, without reference to
the specific implementation  in a database management system.  This is done so that the general design is
not limited to any  specific  proprietary software, although we expect that implementations will  take
advantage of capabilities of specific software. It should be possible to implement ODM in a variety of
relational database management systems, or even in a set of text tables or variable arrays in a computer
program.  However, to take full advantage  of the  relationships between  data elements, the  querying
capability of a relational database system is required.  By presenting the design at a general conceptual
level, we also avoid implementation specific detail on the format of how information is represented.  See
the discussion of Dates and Times under ODM features below for an example of the distinction between
general concepts and implementation specific details.

Version Information

ODM has evolved from an initial design presented at a CUAHSI workshop held in Austin during March,
2005  (Maidment, 2005)  that  was  then  widely reviewed with  comments being received  from 22
individuals (Tarboton, 2005). These reviews served as the basis for a redesign that was presented at a
CUAHSI workshop at Duke University during July 2005 and presented as part of the CUAHSI HIS status report
(Horsburgh et  al., 2005).  Following this presentation of the design, the data model was  reviewed and
commented on by a number of others,  including  the CLEANER (Collaborative  Large-scale Engineering
Analysis Network for Environmental Research) cyberinfrastructure committee.  Further versions of the
Observations Data Model were circulated in April, June, and October 2006; these versions documented the
changes made as the design evolved. The fundamental design, however, has not changed since the status
report presentation of the model (Horsburgh et  al., 2005)  but many table and field names have been
changed. Tables have also been added to give spatial reference information, metadata information, and to
define controlled vocabularies.  Version 1.0 of ODM, which was the first release version of ODM, has
been implemented and tested within the WATERS network of test bed sites  and was documented in
Water Resources Research (Horsburgh et al., 2008). This document describes the second release version
of the data model design, ODM Release Version 1.1, so named to correspond to the Version 1.1 release of
the CUAHSI HIS. This document supersedes the previous documents.
In general, the following changes have been made for Version 1.1:

    •  All integer IDs serving as the primary key for tables in ODM have been changed to auto
       number/identity fields.
    •  Text field lengths have been relaxed in some cases and have been standardized according to the
       following scheme: codes = 50 characters, terms = 255 characters, links = 500 characters,
       definitions/explanations = unlimited.
    •  Check constraints have been defined for the Latitude and Longitude fields in the Sites table
       (several of the new constraints are illustrated in the sketch following this list).
    •  Check constraints have been added to many of the fields in ODM to constrain the characters that
       are valid for those fields (see Appendix A for details).
    •  Relationships have been added between controlled vocabulary tables and the tables that contain
       the fields that they define. This  was done to more rigorously enforce the ODM controlled
       vocabularies.
    •  Unique constraints were placed  on both SiteCode in the Sites table and VariableCode in the
       Variables table.
    •  The controlled vocabulary was relaxed on the QualityControlLevels table to allow more detailed
       versioning of data series. A QualityControlLevelCode was also added to this table to facilitate
       this.
    •  A Citation field was added to the Sources table to provide a place for a formal citation for data in
       the database.
    •  A Speciation field was added to the Variables table. This provides a place to  store information
       about the speciation of chemistry observations. A SpeciationCV controlled vocabulary table was
       added to define this field.
    •  An ODMVersion table was added to store the version number of the database.
    •  The SeriesCatalog table has been updated based on the addition of the above fields.
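
A sketch of how several of these Version 1.1 changes might be expressed in a relational implementation,
reusing the SQLite connection from the sketch above (the constraint syntax varies by database system, the
table name SitesV11 is ours, and Appendix A remains the authoritative definition):

    # Illustrative only: an auto number/identity key, a unique constraint on
    # SiteCode, and check constraints on the coordinate fields.
    conn.executescript("""
        CREATE TABLE SitesV11 (
            SiteID    INTEGER PRIMARY KEY,
            SiteCode  TEXT NOT NULL UNIQUE,
            Latitude  REAL NOT NULL CHECK (Latitude BETWEEN -90 AND 90),
            Longitude REAL NOT NULL CHECK (Longitude BETWEEN -180 AND 180)
        );
    """)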
Hydrologic Observations

Many organizations and  individuals measure hydrologic  variables  such as  streamflow, water quality,
groundwater levels, and precipitation.  National databases such as USGS' National Water Information
System (NWIS) and USEPA's Storage and Retrieval (STORET) system contain a wealth of data,
but, in general, these national data repositories have different data formats, storage, and retrieval systems,
and combining data from disparate  sources can be difficult.  The problem is compounded when multiple
investigators are involved (as would be the case at proposed CUAHSI Hydrologic Observatories) because
everyone has their own way of storing and manipulating observational data.  There is a need within the
hydrologic community for an observations  database structure that presents observations from many
different sources and of many different types in a consistent format.

Hydrologic observations are identified by the following fundamental characteristics:

    •    The location at which the observations were made (space)
    •    The date and time at which the observations were made (time)
    •    The type of variable that was observed, such as streamflow, water surface elevation, water quality
        concentration, etc. (variable)

These three  fundamental characteristics may be represented as a data cube (Figure 1), where a particular
observed data value (D) is located as a function of where it was observed (L), its time of observation (T),
and what kind of variable  it is (V), thus forming D(L,T,V).
Figure 1.  A measured data value (D) is indexed by its spatial location (L), its time of measurement (T),
and what kind of variable it is (V).
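
In code, this indexing scheme can be pictured as a mapping from the (L, T, V) triplet to a value; a toy
Python illustration (the keys and value are hypothetical):

    # A data value D is retrieved by its location L, time T, and variable V.
    D = {
        ("Bear River Near Logan, UT", "2005-07-01 12:00", "Streamflow"): 12.7,
        ("Bear River Near Logan, UT", "2005-07-01 12:00", "Temperature"): 18.2,
    }
    value = D[("Bear River Near Logan, UT", "2005-07-01 12:00", "Streamflow")]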

In addition to these fundamental  characteristics, there  are many  other distinguishing attributes that
accompany observational data.  Many of these secondary attributes provide more information about the
three fundamental characteristics mentioned above. For example, the location of an observation can be
expressed as a text string (e.g., "Bear River Near Logan, UT") or as latitude and longitude coordinates
that accurately delineate the location of the observation. Other attributes can provide important context in
interpreting the observational data.  These include data qualifying comments and information about the
organization that collected the data. The fundamental design decisions associated with the ODM involve
choices as to how  much supporting information to include in the  database  and whether to store (and
potentially repeat) this information with each observation or save this information in separate tables with
key fields used to logically associate observation records with the associated information in the ancillary
tables. Table 1 presents the general attributes associated with a point observation that we judged should
be included in the generic ODM design.
Table 1.  ODM attributes associated with an observation

  Data Value: The observation value itself.
  Accuracy: Quantification of the measurement accuracy associated with the observation value.
  Date and Time: The date and time of the observation (including the time zone offset relative to UTC
    and a daylight saving time factor).
  Variable Name: The name of the physical, chemical, or biological quantity that the data value
    represents (e.g., streamflow, precipitation, temperature).
  Speciation: For concentration measurements, the species in which the concentration is expressed
    (e.g., as N, as NO3, or as NH4).
  Location: The location at which the observation was made (e.g., latitude and longitude).
  Units: The units (e.g., m or m3/s) and unit type (e.g., length or volume/time) associated with the
    variable.
  Interval: The interval over which each observation was collected or implicitly averaged by the
    measurement method, and whether the observations are regularly recorded on that interval.
  Offset: Distance from a reference point to the location at which the observation was made (e.g.,
    5 meters below the water surface).
  Offset Type/Reference Point: The reference point from which the offset to the measurement location
    was measured (e.g., water surface, stream bank, snow surface).
  Data Type: An indication of the kind of quantity being measured (e.g., a continuous, minimum,
    maximum, or cumulative measurement).
  Organization: The organization or entity providing the measurement.
  Censoring: An indication of whether the observation is censored or not.
  Data Qualifying Comments: Comments accompanying the data that can affect the way the data are used
    or interpreted (e.g., holding time exceeded, sample contaminated, provisional data subject to
    change).
  Analysis Procedure/Method: An indication of the method used to collect the observation (e.g.,
    dissolved oxygen by field probe or by Winkler titration), including the quality control and
    assurance to which it has been subject.
  Source: Information on the original source of the observation (e.g., a specific organization,
    agency, investigator, or third-party database).
  Sample Medium: The medium in which the sample was collected (e.g., water, air, sediment).
  Value Category: An indication of whether the data value represents an actual measurement, a
    calculated value, or the result of a model simulation.
Observations Data Model
The schema of the Observations Data Model is given in Figure 2. Appendix A gives details of each table
and each field in this generic data model schema. Appendix A serves as the data dictionary for the data
model and documents specific database constraints, data types, examples, and best practices. The primary
table that stores point observation values is the DataValues table at the center of the schema in Figure 2.
Logical relationships between fields in the data model are shown and serve to establish the connectivity
between the observation values and associated ancillary information. Details of the relationships  are
given in Table 2. Figure 2 shows each of the controlled vocabulary tables and their relationships to  the
table containing the field that they define. Controlled vocabulary tables are highlighted with red headers.
In Figure 2, each of the mandatory fields is shown in  bold text, whereas optional fields are shown in
regular text.
[Schema diagram: the DataValues table sits at the center of the schema and is linked to the ancillary
tables Sites, Variables, Units, OffsetTypes, Qualifiers, Methods, LabMethods, Samples, Sources,
ISOMetadata, QualityControlLevels, SpatialReferences, Groups, DerivedFrom, Categories, and
SeriesCatalog, and to the controlled vocabulary tables. Diagram key: (M) = field cannot be null;
(O) = field can be null; cardinalities are 1..1 (one and only one), 1..* (one or many), 0..* (zero or
many), and 0..1 (zero or one).]
Figure 2.  Observations Data Model schema.

Table 2.  Observations Data Model Logical Relationships

Relationships that define ancillary information about data values
  Table        Field                    Type    Field                    Table
  DataValues   SiteID                   *<->1   SiteID                   Sites
  DataValues   VariableID               *<->1   VariableID               Variables
  DataValues   OffsetTypeID             *<->1   OffsetTypeID             OffsetTypes
  DataValues   QualifierID              *<->1   QualifierID              Qualifiers
  DataValues   MethodID                 *<->1   MethodID                 Methods
  DataValues   SourceID                 *<->1   SourceID                 Sources
  DataValues   SampleID                 *<->1   SampleID                 Samples
  DataValues   QualityControlLevelID    *<->1   QualityControlLevelID    QualityControlLevels

Relationships that define derived from groups
  Table        Field            Type    Field            Table
  DataValues   DerivedFromID    *<->*   DerivedFromID    DerivedFrom
  DataValues   ValueID          1<->*   ValueID          DerivedFrom

Relationships that define groups
  Table               Field      Type    Field      Table
  DataValues          ValueID    1<->*   ValueID    Groups
  GroupDescriptions   GroupID    1<->*   GroupID    Groups

Relationships used to define categories for categorical data
  Table        Field         Type    Field         Table
  Variables    VariableID    1<->*   VariableID    Categories
  DataValues   DataValue     *<->1   DataValue     Categories

Relationships used to define the Units
  Table   Field      Type    Field              Table
  Units   UnitsID    1<->*   VariableUnitsID    Variables
  Units   UnitsID    1<->*   TimeUnitsID        Variables
  Units   UnitsID    1<->*   OffsetUnitsID      OffsetTypes

Relationship used to define the Sample Laboratory Methods
  Table        Field          Type    Field          Table
  LabMethods   LabMethodID    1<->*   LabMethodID    Samples

Relationships used to define the Spatial References
  Table               Field                 Type    Field                Table
  SpatialReferences   SpatialReferenceID    1<->*   LatLongDatumID       Sites
  SpatialReferences   SpatialReferenceID    1<->*   LocalProjectionID    Sites

Relationship used to define the ISOMetadata
  Table         Field         Type    Field         Table
  ISOMetadata   MetadataID    1<->*   MetadataID    Sources

Relationships used to define Controlled Vocabularies
  Table               Field   Type    Field              Table
  VerticalDatumCV     Term    1<->*   VerticalDatum      Sites
  SampleTypeCV        Term    1<->*   SampleType         Samples
  VariableNameCV      Term    1<->*   VariableName       Variables
  ValueTypeCV         Term    1<->*   ValueType          Variables
  DataTypeCV          Term    1<->*   DataType           Variables
  SampleMediumCV      Term    1<->*   SampleMedium       Variables
  SpeciationCV        Term    1<->*   Speciation         Variables
  GeneralCategoryCV   Term    1<->*   GeneralCategory    Variables
  TopicCategoryCV     Term    1<->*   TopicCategory      ISOMetadata
  CensorCodeCV        Term    1<->*   CensorCode         DataValues
Relationship type is indicated as One to One (1<->1), One to Many (1<->*), Many to One (*<->1), and
Many to Many (*<->*). The first set of relationships defines the links to tables that contain ancillary
information. They are used so that only compact (integer) identifiers are stored with each data value and
thus repeated many times, while the more voluminous ancillary information is stored to the side and not
repeated. The second set of relationships defines derived from groupings used to specify data values that
have been used to derive other data values. The third set of relationships defines logical groupings of data
values. The fourth set of relationships is used to specify the categories associated with categorical
variables. The fifth set of relationships is used to define the units. The sixth set of relationships associates
laboratory methods with samples. The seventh set of relationships associates sites with the Spatial
Reference System used to define the location. The eighth set of relationships associates project and dataset
level metadata with each data source. The last set of relationships defines the linkage between the
controlled vocabulary fields and the tables that store the acceptable terms for those fields. Details of
how these relationships work are given in the discussion of features of the data model design below.
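
For example, reconstructing human-readable records from the compact identifiers is a matter of joining
DataValues to its ancillary tables. A minimal sketch, continuing the SQLite example above and declaring
a reduced Variables table for the purpose (the field set and types are again our own):

    conn.execute("""
        CREATE TABLE Variables (
            VariableID   INTEGER PRIMARY KEY,
            VariableCode TEXT NOT NULL UNIQUE,
            VariableName TEXT NOT NULL
        )
    """)

    # Resolve the integer identifiers stored with each data value into the
    # more voluminous ancillary information stored to the side.
    rows = conn.execute("""
        SELECT dv.LocalDateTime, dv.DataValue, s.SiteName, v.VariableName
        FROM DataValues AS dv
        JOIN Sites     AS s ON s.SiteID = dv.SiteID
        JOIN Variables AS v ON v.VariableID = dv.VariableID
    """).fetchall()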

Features of the Observations Data Model Design

Geography
ODM is intended to be independent  of the geographical  representation of  the site locations.   The
geographic location of sites is specified through the Latitude, Longitude, and Elevation information in the
Sites table, and optionally local coordinates, which may be  in a standard geographic projection for the
study area or in a locally defined coordinate system specific to a study area.  Each site also has a unique
identifier, SiteID, which can be logically linked to one or more objects in a Geographic Information
System (GIS) data model. For example, Figure 3 depicts a one-to-one relationship between sites within
ODM and HydroPoints within the Arc Hydro Framework Data Model (Maidment, 2002) used to
represent objects in a digital watershed. In simple implementations, SiteID may have the same integer
value as the identifier for the associated GIS object, HydroID in this case. In more complex
implementations, and especially when multiple databases are merged into a single ODM, it may not be
possible to preserve the simple one-to-one relationship between SiteID and HydroID with each of these
fields holding the same integer identifier values. In these cases, where SiteID and HydroID are not the
same, a coupling table would be used to associate the ODM SiteIDs used to identify sites with HydroIDs
in the Arc Hydro data model.

SiteID must be unique within an instance of ODM. This could, for example, be achieved by assigning
SiteIDs from a master table. The linkage between SiteIDs and GIS object IDs is intended to be generic
and suitable for use with any geographic data model that includes information specifying the location of
sites.  For example, a linear referencing system  on a river network, such as the National Hydrography
Dataset, might be used to specify the location of a site on a river network. Addressing relative to specific
hydrologic objects through the SiteID field provides direct and specific location information necessary for
proper interpretation of data values. Information from direct addressing relative to hydrologic objects is
often of greater value to a user than the simple Latitude and Longitude information stored in the ODM
Sites table. For example, it is more useful to know that a stream gage is on a particular stream than
simply to know its latitude and longitude.
[Figure 3 shows the ODM Sites table (SiteID, SiteCode, SiteName, Latitude, Longitude, LatLongDatumID,
Elevation_m, VerticalDatumID, LocalX, LocalY, LocalProjectionID, PosAccuracy_m, State, County,
Comments) related through a coupling table to HydroJunction features (HydroID, HydroCode, ReachCode,
Name, LengthKm, LengthDown, FlowDir) of the Arc Hydro Framework Data Model.]

Figure 3.  Arc Hydro Framework Data Model and Observations Data Model related through the SiteID
field in the Sites table.
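
A coupling table of the kind described above can be very simple. A minimal sketch, continuing the
SQLite example (the table name and field comments are illustrative):

    # Associates ODM sites with Arc Hydro features for the case where SiteID
    # and HydroID cannot hold the same integer values.
    conn.execute("""
        CREATE TABLE CouplingTable (
            SiteID  INTEGER NOT NULL REFERENCES Sites (SiteID),
            HydroID INTEGER NOT NULL,  -- identifier of the Arc Hydro object
            PRIMARY KEY (SiteID, HydroID)
        )
    """)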

Series Catalog
A "data series" is an organizing principle used in ODM.  A data series consists of all the data values
associated with  a unique site, variable, method, source, and quality control level combination in  the
DataValues table.  The SeriesCatalog table lists data series, identifying each by a unique series identifier,
SeriesID.  This table is essentially a summary of many of the tables in the ODM and is not required to
maintain the integrity of the data. However, it serves to provide a listing of all the distinct series of data
values of  a specific variable at a specific site.  By doing so, this table provides a means by  which users
can execute most common data discovery queries (i.e., which variables have data at a site, etc.) without
the overhead of querying the entire DataValues table, which can become  quite large.
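
A typical data discovery query against the SeriesCatalog table might look like the following sketch (a
reduced SeriesCatalog is declared for illustration, and the site code is hypothetical):

    conn.execute("""
        CREATE TABLE SeriesCatalog (
            SeriesID      INTEGER PRIMARY KEY,
            SiteCode      TEXT,
            VariableCode  TEXT,
            VariableName  TEXT,
            BeginDateTime TEXT,
            EndDateTime   TEXT,
            ValueCount    INTEGER
        )
    """)

    # Which variables have data at a site, and over what period?
    rows = conn.execute("""
        SELECT VariableCode, VariableName, BeginDateTime, EndDateTime, ValueCount
        FROM SeriesCatalog
        WHERE SiteCode = ?
    """, ("EXAMPLE-SITE-01",)).fetchall()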

The SeriesCatalog table is also intended to support CUAHSI Web Service method queries such as
GetSiteInfo, which returns information about a monitoring site within an instance of the ODM including
the variables that have been measured at that site.  It should be noted that data series, as they are defined
here, do not distinguish between different series of the same variable at the same site but measured with
different offsets.   If for example temperature  was measured at  two different  offsets by two different
sensors at one site, both sets of data would fall  into one data series for the purposes of the SeriesCatalog
table.  In these cases, interpretation or analysis software will need to specifically examine and parse the
offsets by examining the offset associated with each data value.  The SeriesCatalog table does not do this
because its principal purpose is  data discovery, which we did not want to be overly complicated. The
SeriesCatalog table should be programmatically generated and modified as data are added to the database.

Accuracy
Each data value in the DataValues table has an associated attribute called ValueAccuracy. This is a
numeric value that quantifies the total measurement accuracy defined as the nearness of a measurement to
the true or standard value.  Since the true value is not known, the ValueAccuracy is estimated based on
knowledge of the instrument accuracy, measurement method, and operational  environment.   The
ValueAccuracy, which is  also called the uncertainty of the measurement, compounds the estimates of
both bias and precision errors.  Bias errors are generally fixed or systematic and cannot be determined
statistically, while precision errors are random, being generated  by the variability  in the measurement
system and operational environment.  Figure  4  illustrates the effects of these errors on a sample of
measurements. Bias errors are usually estimated through  specially designed experiments (calibrations).
The precision errors are determined using statistical analysis by quantifying the measurement scatter,
which is proportional to the standard deviation of the sample of repeated measurements.  The total error is
obtained by  the  root-sum-square of the estimates for  bias and  precision errors  involved in the
measurement. Figure 5 gives another illustration of the ValueAccuracy concept based on the analogy of a
target, where the bull's eye at the center represents the true value.
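
As a worked illustration of the root-sum-square combination (the numbers are hypothetical):

    # Total measurement uncertainty from estimates of bias and precision errors.
    bias = 0.02       # systematic error, e.g., estimated from calibration
    precision = 0.05  # random error, proportional to the sample standard deviation
    value_accuracy = (bias**2 + precision**2) ** 0.5  # approximately 0.054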

ValueAccuracy is a data value level attribute because it can change with each measurement, dependent on
the instrument or measurement protocol.  For example, if streamflow is measured using a V-notch weir, it
is actually the stage that is  measured, with accuracy limited  by the precision and bias of the depth
recording  instrument.  The  conversion to discharge through the  stage-discharge  relationship  results in
greater absolute error for larger discharges. Inclusion of the ValueAccuracy attribute, which will be blank
for many historic datasets because historically accuracy has not been recorded, adds  to the size  of data in
the ODM, but provides a way for factoring the accuracy associated with measurements into data analysis
and interpretation, a practice that should be encouraged.
[Figure 4 shows frequency distributions of repeated measurements relative to the true value for four
cases: (a) unbiased, precise, accurate; (b) biased, precise, inaccurate; (c) unbiased, imprecise,
inaccurate; (d) biased, imprecise, inaccurate.]
Figure 4.  Illustration of measurement error effect (Source: AIAA, 1995).
[Figure 5 shows three targets: accurate; low accuracy (biased); and low accuracy but precise.]

Figure 5.  Illustration of Accuracy versus Precision (adapted from Wikipedia,
http://en.wikipedia.org/wiki/Accuracy).
In designing  ODM, consideration was given to the suggestion by some reviewers to record bias and
precision separately, in addition to ValueAccuracy for each data value.  This has not been done at this
release in the interest of parsimony and also because quantifying these separate components of the error is
difficult. We suggest that for most measurements there should be the presumption that they are unbiased
and that ValueAccuracy quantifies the precision and accuracy in the judgment of the  investigator
responsible for collecting the data.  For cases where there is specific bias and precision information to
complement the ValueAccuracy attribute, this could be recorded in the ODM as a separate variable, e.g.
discharge precision, or temperature bias.  The groups and derived from features (see below) could be used
to associate these  variables  with their related observations.   For  measurements  that are known to be
biased, we suggest that the bias could be quantified by other reference measurements that should also be
placed in the database  and that a new set of corrected measurements that have  had the bias removed
should be added to the database at a higher quality control level.  These new measurements should have a
lower ValueAccuracy value to  reflect the improvement in accuracy by removal of the bias.  The method
and derived from information for these corrected measurements should give the bias removal method and
refer to the data used to quantify and remove the bias.

Offset
Each record in the DataValues table has two optional fields, OffsetValue and OffsetTypeID. These are
used to record the location of an observation relative to an appropriate datum, such as "depth below the
water surface" or "depth below or above the ground." The OffsetTypeID links an OffsetValue to a record
in the OffsetTypes table that gives the units and definition associated with the OffsetValue. This design only has
the capability to represent one  offset for each data value.  In  cases (which we expect to be rare) when
there are multiple  offsets (e.g.  distance in from a stream bank and depth below the surface) one of the
offsets will need to be distinguished as a separate variable.

Spatial Reference and Positional Accuracy
Unambiguous specification of the location of an observation site requires that the horizontal and vertical
datum used for latitude, longitude, and elevation be specified. The SpatialReferences table is provided for
this purpose to record the name and EPSG code of each Spatial Reference System used. EPSG codes  are
numeric codes associated with coordinate  system definitions published by the OGP  Surveying and
Positioning Committee (http://www.epsg.org/).  A non-standard Spatial  Reference System, such as,  for
example, a local grid at an experimental watershed, may be defined in the SpatialReferences table Notes
field.   The accuracy with which the location of a monitoring site is  known is quantified using the
PosAccuracy_m field in the Sites table.  This is a numeric value intended to specify the uncertainty (as a
standard deviation or root mean square error) in the spatial location information (latitude and longitude or
local coordinates) in meters.  Using a large number for PosAccuracy_m (e.g.  2000 m) accommodates
entry of data collected for a study area where the precise location where the observation was recorded is
not known.

DerivedFrom and Groups
The DerivedFrom and Groups tables fulfill the function of grouping data values for different purposes.
These are tables where the same identifier (DerivedFromID or GroupID) can appear multiple times in the
table, associated with different ValueIDs, thereby defining an associated group of records. In the
DerivedFrom table this is the sole purpose of the table, and each group so  defined is associated with a
record in the DataValues table (through the DerivedFromID field in that table). This record would have
been derived from the data values identified by the group.  The method of derivation would be given
through the methods table associated  with  the  data value.   This construct is useful, for example, to
identify the 96 15-minute unit streamflow values that go into the estimate of the mean daily streamflow.
Note that there is no limit to how many groups a data value may be associated with, and data values that
are derived from other data values may themselves belong to groups used to derive other data values (e.g.
the daily minimum flow over a month derived from daily values derived from 15 minute unit values).
Note also that a derived from group may have as few as one data value for the case where a data value is
derived  from a single more primitive data value (e.g., discharge from stage). Through this construct the
ODM has the capability to store raw observation values and information derived from raw observations,
while preserving the connection of each data value to its more primitive  raw measurement.
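
The 15-minute-to-daily-mean example can be sketched as follows (identifiers and values are hypothetical;
in a real ODM instance the group rows would be written to the DerivedFrom table and the mean stored as
a new DataValues record):

    import statistics

    # ValueIDs and discharge values of the 96 15-minute unit records.
    unit_records = {value_id: 10.0 + 0.01 * value_id for value_id in range(1, 97)}

    daily_mean = statistics.mean(unit_records.values())
    derived_from_id = 501  # hypothetical identifier for the derived-from group

    # One row per member value defines the group in the DerivedFrom table; the
    # daily mean references the group through its DerivedFromID field, and the
    # derivation method is described in the Methods table.
    derived_from_rows = [(derived_from_id, value_id) for value_id in unit_records]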

The GroupID relationship that appears in Table 2 is designated as one-to-many because there will be
many records in the Groups table that have the same GroupID, but different ValueIDs, that serve to define
the group. In Figure 2, the Group relationship is labeled 1..* at the DataValues table and 0..* at the
Groups table. This indicates that a group may comprise one or more data values and that a data value
may be included in 0 or more groups. Similarly, there will be many records in the DerivedFrom table that
have the same DerivedFromID, but different ValueIDs, that serve to define the group of data values from
which a data value is derived. Logically, a data value should not belong to the DerivedFrom group from
which it is derived. If the system can check this programmatically, this sort of circularity error can be
prevented.

The method description in the Methods table associated with a data  value that has a DerivedFromID
should describe  the method used for  deriving the particular data  value from other data values (e.g.
calculating discharge  from a number of velocity  measurements across a stream).  The relationship
between the  DataValues table DerivedFromID  field  and DerivedFrom table DerivedFromID field is
many-to-many (*<->*) because it can  occur that the same group  of data values is used to derive more
than one derived data value. In Figure 2, the AreDerivedFrom relationship between the DataValues and
DerivedFrom table actually depicts both relationships  between these tables  listed in Table 2.  The
AreDerivedFrom relationship is labeled 1..* at the DataValues table and 0..* at the  DerivedFrom table to
indicate that a derived from group may comprise 1 or more data values and that a data value may be a
member of 0 or more derived from groups.


Dates and Times
Unambiguous interpretation of date and time information requires specification of the time zone or offset
from universal time (UTC). A UTCOffset field is included in the DataValues table to ensure that local
times recorded in the database can be  referenced to standard time and to enable comparison of results
across databases that may store data values  collected in different time zones  (e.g. compare data values
from one hydrologic observatory to those collected at another hydrologic observatory located across the
country). A design choice here was to have UTCOffset as a record  level qualifier because even though
the time zone, and hence offset, is likely the same  for all measurements at a site, the offset may change
due to daylight saving time. Some investigators may run data loggers on UTC time, while others may use
local time adjusting for daylight saving time. To avoid the necessity to keep track of the system used, or
impose a system that might be cumbersome and lead to errors, we decided that if the offset was always
recorded, the precise time would be unambiguous and would reduce the chance for interpretation errors.
A field DateTimeUTC is also included as a record level attribute associated with each data value.  This
provides a consistent time for querying and sorting data values.  There is a level of redundancy between
LocalDateTime,  UTCOffset and  DateTimeUTC.  Only two are required to  calculate  the third.  For
simplicity and clarity we retain all three. A specific database implementation may choose to retain only
two and calculate the third on the fly. ODM data loaders should only require two of the quantities to be
input and should then calculate the third.
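
For instance, a loader given LocalDateTime and UTCOffset can compute DateTimeUTC directly; a
minimal sketch (values hypothetical):

    from datetime import datetime, timedelta

    local_date_time = datetime(2005, 7, 1, 12, 0, 0)
    utc_offset = -7.0  # hours relative to UTC

    # Local time = UTC time + offset, so UTC time = local time - offset.
    date_time_utc = local_date_time - timedelta(hours=utc_offset)
    # date_time_utc is 2005-07-01 19:00:00; any one of the three quantities
    # can likewise be recovered from the other two.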

The separation of the date  and time specification into two variables,  LocalDateTime and UTCOffset, in
the generic conceptual model may be handled differently within specific implementations. In one specific
implementation these may be grouped in one text field in standard (e.g., ISO 8601) format such as
YYYY-MM-DD hh:mm:ss.sss:UTCOffset (e.g., 2006-03-25 16:19:56.232:-7), while in another format the
date and time may be specified as the number of fractional days from an origin (e.g., Excel represents the
above date as the number 38801.6805 and allows the user to specify the format for display) with
UTCOffset as a separate attribute. In general we expect specific implementations to take advantage of the
representation  of date  time  objects provided  by the  implementation software,  but to  expose the
LocalDateTime and UTCOffset  to  users  so  that time  may  be  unambiguously interpreted.  In the
SeriesCatalog  table,  begin and  end times for each data series  are  represented by the  attributes
BeginDateTime, EndDateTime, BeginDateTimeUTC, and EndDateTimeUTC.  The  UTC offset may be
derived from the difference between the UTC and local times. Because local time may change (e.g. with
daylight savings) it is important during the derivation of the SeriesCatalog table that identification of the
first and last records be based on UTC time and that local times be read from the corresponding records,
rather than using a min or a max function on local times which can result in an error.

Support Scale
In interpreting data values that comprise a time series  it is important to know the scale  information
associated with the data values. Bloschl and Sivapalan (1995) review the important issues. Any set of
data values is quantified by a scale triplet comprising support, spacing, and extent as illustrated in Figure
6.

Figure 6. The scale triplet of measurements (a) extent, (b) spacing, (c) support (from Bloschl, 1996).

Extent is the  full  range over  which  the  measurements occur, spacing is the  spacing  between
measurements, and support is the averaging interval or footprint implicit in any measurement.  In ODM,
extent and spacing are properties of multiple measurements and are defined by the LocalDateTime or
DateTimeUTC associated with data values. We have included a field called TimeSupport in the
Variables table to explicitly quantify support. Figure 7 shows some of the implications associated with
support, spacing, and extent in the interpretation of time series data values.

Figure 7.  The effect of sampling for measurement scales not commensurate with the process scale:  (a)
spacing larger than the process scale causes aliasing in the data; (b) extent smaller than the process scale
causes a trend in the data; (c) support larger than the process scale causes excessive smoothing in the data
(adapted from Bloschl, 1996).

The concepts of scale described here apply in spatial as well as time dimensions.  However, TimeSupport
is only used to quantify  support in the time dimension.  The spatial support associated with a specific
measurement method needs to be given or implied in the methods description in the Methods table.  The
next section indicates how time support should be specified for the different types of data.
Data Types
In the ODM, the following data types are defined. These are specified by the DataType field in the
Variables table.

1.  Continuous data - the phenomenon, such as streamflow, Q(t), is specified at a particular instant in
    time and measured with sufficient frequency (small spacing) to be interpreted as a continuous
    record of the phenomenon. Time support may be specified as 0 if the measurements are
    instantaneous, or given a value that represents the time averaging inherent in the measurement
    method or device.

2.  Sporadic data - the phenomenon is sampled at a particular instant in time but with a frequency
    that is too coarse for interpreting the record as continuous. This would be the case when the
    spacing is significantly larger than the support and the time scale of fluctuation of the
    phenomenon, such as, for example, infrequent water quality samples. As for continuous data, time
    support may be specified as 0 if the measurements are instantaneous, or given a value that
    represents the time averaging inherent in the measurement method or device.

3.  Cumulative data - the data represent the cumulative value of a variable measured or calculated
    up to a given instant of time, such as cumulative volume of flow or cumulative precipitation:
    V(t) = ∫_0^t Q(τ) dτ, where τ represents time in the integration over the interval [0, t]. To
    unambiguously interpret cumulative data one needs to know the time origin. In the ODM we
    adopt the convention of using a cumulative record with a value of zero to initialize or reset
    cumulative data (a sketch following this list illustrates this convention). With this convention,
    cumulative data should be interpreted as the accumulation over the time interval between the
    date and time of the zero record and the current record at the same site position. Site position is
    defined by a unique combination of SiteID, VariableID, OffsetValue, and OffsetType. All four of
    these quantities comprise the unambiguous description of the position of an observation value,
    and there may be multiple time series associated with multiple observation positions (e.g.,
    redundant rain gauges with different offsets) at a location. The time support for a cumulative
    value should be specified as 0 if the measurement of the cumulative quantity is instantaneous, or
    given a value that represents the time averaging inherent in the measurement of the cumulative
    value at the end of the period of accumulation.

4.  Incremental data - the data value represents the incremental value of a variable over a time
    interval Δt, such as the incremental volume of flow or incremental precipitation:
    ΔV(t) = ∫_t^(t+Δt) Q(τ) dτ. As for cumulative data, unambiguous interpretation requires
    knowledge of the time increment. In the ODM we adopt the convention of using TimeSupport to
    specify the interval Δt, or the time interval to the next data value at the same position if
    TimeSupport is 0. This accommodates incremental type precipitation data that are only reported
    when the data value is non-zero, such as NCDC data. Such NCDC data are irregular, with the
    interpretation that precipitation is 0 if not reported, unless qualifying comments designate
    otherwise. See example E.4 below for an illustration of how NCDC precipitation data are
    accommodated in the ODM.

5.  Average data - the data value represents the average over a time interval, such as daily mean
    discharge or daily mean temperature: Q̄(t) = ΔV(t)/Δt. The averaging interval Δt is quantified by
    TimeSupport in the case of regular data (as quantified by the IsRegular field) and by the time
    interval from the previous data value at the same position for irregular data.

6.  Maximum data - the data value is the maximum value occurring at some time during a time
    interval, such as annual maximum discharge or a daily maximum air temperature. Again,
    unambiguous interpretation requires knowledge of the time interval. The ODM adopts the
    convention that the time interval is the TimeSupport for regular data and the time interval from
    the previous data value at the same position for irregular data.

7.  Minimum data - the data value is the minimum value occurring at some time during a time
    interval, such as the 7-day low flow for a year, or the daily minimum temperature. The time
    interval is defined similarly to Maximum data.

8.  Constant over interval data - the data value is a quantity that can be interpreted as constant over
    the time interval to the next measurement.
    9.  Categorical data - the data value is a categorical rather than continuous valued quantity.
       Mapping from data values to categories is through the Categories table.
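
The zero-reset convention for cumulative records (data type 3) and its incremental counterpart can be
illustrated with a short sketch in Python (times and values are hypothetical):

    # Cumulative precipitation record; a value of 0.0 initializes or resets the
    # accumulation rather than reporting an observation.
    record = [("00:00", 0.0), ("01:00", 1.2), ("02:00", 3.5),
              ("03:00", 0.0), ("04:00", 0.8)]

    # Each non-reset value accumulates since the preceding record, so successive
    # differences recover the incremental values.
    increments = [
        (t1, v1 - v0)
        for (_, v0), (t1, v1) in zip(record, record[1:])
        if v1 != 0.0
    ]
    # increments == [("01:00", 1.2), ("02:00", 2.3), ("04:00", 0.8)]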

We anticipate that additional data types such as median, standard deviation, variance, and others may
need to be added as users work with ODM.
Data types 4 to 8 above apply to data values that occur over an interval of time. The date and time
reported and entered into the ODM database with each interval data value is the beginning
time of the observation interval. This convention was adopted to be consistent with the way dates and
times are represented in most common database management systems.  It should be noted that using the
beginning of the interval is not consistent with  the time a data logger would log an observation value.
Care should be exercised in adding data to the ODM to ensure that the beginning of interval convention is
followed.

Time Series
A considerable portion of hydrologic observations data is in the form of time series. This was why the
initial model was based on the Arc Hydro Time Series Data Model.  The ODM design has not specifically
highlighted time series capabilities; nevertheless, the data model has inherited the key components from
the  Arc Hydro Time  Series Data Model to give  it time series capability.   In particular one variable
DataType is "Continuous," which is designed to indicate that the data values are collected with sufficient
frequency as to be interpreted as a smooth time series.  The IsRegular field  also facilitates time series
analysis because certain time series operations (e.g., Fourier analysis) presuppose regularly
sampled data. At first glance it may appear that there is redundancy between the IsRegular field and the
DataType "Continuous," but we chose to keep  these  separate because there  are  regularly sampled
quantities for  which it is not reasonable to interpret the data values as "Continuous."  For example,
monthly grab samples of water quality are not continuous, but are better categorized as having DataType
"Sporadic."  Note that ODM does not explicitly store the time interval between measurements, nor does it
indicate where a continuous series has data gaps. Both of these are  required for time  series analysis, but
are  inherently  not properties of single measurements.  The time interval is the time difference between
sequential regular measurements, something that can be  easily computed from date and time values by
analysis tools.  The inference of measurement gaps (and what to  do about them) from date and time
values we also regard as analysis  functionality left for a Hydrologic Analysis System to handle.
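As an illustration of that analysis-side functionality, the following sketch (Python with pandas; the column names mirror ODM fields, and the 15 minute nominal spacing is an assumption about the example series) computes the spacing between sequential measurements and flags gaps:

    import pandas as pd

    # One data series (single site, variable, method, source, and quality
    # control level), ordered by time; the 45 minute jump is a data gap.
    series = pd.DataFrame({
        "LocalDateTime": pd.to_datetime([
            "2006-05-01 00:00", "2006-05-01 00:15",
            "2006-05-01 00:30", "2006-05-01 01:15",
        ]),
        "DataValue": [743.0, 748.0, 742.0, 742.0],
    }).sort_values("LocalDateTime")

    # Spacing between sequential measurements (not stored in ODM).
    spacing = series["LocalDateTime"].diff()

    # For a nominally regular 15 minute series, any spacing larger than
    # the nominal interval marks the end of a gap in the record.
    nominal = pd.Timedelta(minutes=15)
    gaps = series.loc[spacing > nominal, "LocalDateTime"]
    print(list(gaps))  # [Timestamp('2006-05-01 01:15:00')]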


In ODM, categorical or ordinal variables are stored in the same table as continuous valued 'real' variables
through a numerical encoding of the categorical data value as a 'real' data value. The Categories table
then associates, for each variable, a data value with an associated category description. This is a
somewhat cumbersome construct because real valued quantities are being used as database keys, but we do
not see it as a significant shortcoming because, in our judgment, typically only a small fraction of
hydrologic observations will be categorical. The Categories table stores the categories associated with
categorical data values. If a Variable has a DataType of "Categorical" then the VariableID must match one
or more VariableIDs in Categories that define the mapping between DataValues and Categories. The
CategoryDescription field in the Categories table defines the category.
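Decoding a categorical series therefore requires joining on both VariableID and the numeric DataValue. A minimal sketch of such a query (SQL issued through Python's sqlite3; the database file and the VariableID of 45 are hypothetical, the latter echoing the Categories example in Appendix A):

    import sqlite3

    conn = sqlite3.connect("odm.db")  # hypothetical ODM database file

    # Translate each numeric value of a categorical variable into its
    # category description via the Categories table.
    rows = conn.execute(
        """SELECT dv.LocalDateTime, dv.DataValue, c.CategoryDescription
           FROM DataValues AS dv
           JOIN Categories AS c
             ON c.VariableID = dv.VariableID
            AND c.DataValue  = dv.DataValue
           WHERE dv.VariableID = 45  -- illustrative categorical variable
           ORDER BY dv.LocalDateTime"""
    ).fetchall()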


At first glance there may appear to be redundancy between the information in the Samples table and the
Methods table. However, the Samples table is intended to be used only where data values are derived
from a physical sample that is later analyzed in a laboratory (e.g., a water chemistry sample or biological
sample). The SampleID that links into the Samples table provides tracking of the specific physical
sample used to derive each measurement and, by reference to information in the LabMethods table, the
laboratory methods and protocols followed.   The Methods table  refers to the  method  of field  data
collection, which may specify "how" a physical  observation was made or collected (e.g.,  from an
automated sampler or collected manually), but is also used to specify the measurement method associated
with an  in-situ measurement  instrument such  as  a weir, turbidity sensor, dissolved oxygen sensor,
humidity sensor, or temperature sensor.
Each record in the DataValues table has an attribute called QualifierID that references the Qualifiers
table. Each QualifierID in the Qualifiers table has attributes QualifierCode and QualifierDescription that
provide qualifying information that can note anything unusual or problematic about individual
observations such as, for example, "holding time for analysis exceeded" or "incomplete or inexact daily
total." Specification of a QualifierID in the DataValues table is optional, with the inference that if a
QualifierID is not specified then the corresponding data value is not qualified.


Each data value in the DataValues table has an attribute called QualityControlLevelID that references the
QualityControlLevels table and is designed to record the level of quality control processing that the data
value has been subjected to, at the level of the data series. Quality control level is one of the attributes
(together with site, variable, method, and source) used to uniquely identify data series. Each quality
control level is uniquely identified by its QualityControlLevelID; however, each level also has a text
QualityControlLevelCode that, along with a Definition and Explanation, provides a more descriptive
encoding of the quality control level. The default quality control level system used by ODM applies
integer values between 0 and 4 (converted to text strings) as the QualityControlLevelCodes. Other
custom systems for QualityControlLevelCodes can be used (e.g., 0.1, 0.2 to represent raw data
progressing through a quality control work sequence, or text strings such as "Raw" or "Processed"). The
following 0-4 QualityControlLevelCode definitions are adapted from those used by other similar
systems, such as NASA, EarthScope, and AmeriFlux (e.g.,
http://ilrs.gsfc.nasa.gov/reports/ilrs_reports/9809_attach7a.html and
http://public.ornl.gov/ameriflux/available.shtml, accessed 3/6/2007), and are suggested so that CUAHSI
ODM is consistent with the practice of other data systems:

-   QualityControlLevelCode = "0" - Raw Data
    Raw data is defined as unprocessed data and data products that have not undergone quality control.
    Depending on the data type and data transmission system, raw data may be available within seconds
    or minutes after real time. Examples include real time precipitation, streamflow, and water quality
    measurements.

-   QualityControlLevelCode = "1" - Quality Controlled Data
    Quality controlled data have passed quality assurance procedures such as routine estimation of timing
    and sensor calibration or visual inspection and removal of obvious errors. An example is USGS
    published streamflow records following parsing through USGS quality control procedures.

-   QualityControlLevelCode = "2" - Derived Products
    Derived products require scientific and technical interpretation and include multiple-sensor data. An
    example might be basin average precipitation derived from rain gages using an interpolation
    procedure.

-   QualityControlLevelCode = "3" - Interpreted Products
    These products require researcher (PI) driven analysis and interpretation, model-based interpretation
    using other data and/or strong prior assumptions. An example is basin average precipitation derived
    from the combination of rain gages and radar return data.
-   QualityControlLevelCode = "4" - Knowledge Products
    These products require researcher (PI) driven scientific interpretation and multidisciplinary data
    integration and include model-based interpretation using other data and/or strong prior assumptions.
    An example is percentages of old or new water in a hydrograph inferred from an isotope analysis.

These definitions for quality control level are stored in the QualityControlLevels table.  These definitions
are recommended for use, but  users can  define  their  own quality control  level  system.   The
QualityControlLevels table is not a controlled  vocabulary, but specification of a quality control level for
each data value is required. Appendix B of this document provides a discussion of how to  handle data
versioning in terms of quality control levels (using the levels defined above), data series editing, and data
series creation.
ODM has been designed to contain all the core elements of the CUAHSI HIS metadata system
(http://www.cuahsi.org/his/metadata.html) required for compliance with evolving standards such as the
draft ISO 19115. In its design, the ODM embodies much record-, variable-, and site-level metadata.
Dataset and project level metadata required by these standards, such as TopicCategory, Title, and
Abstract, are included in a table called ISOMetadata linked to each data source.
The Methods, Sources, LabMethods, and ISOMetadata tables contain fields that can be used to store links
to source or reference information. At the general conceptual level of the ODM we do not specify how,
or in what form, these links to references or sources should be implemented. Options include using URLs
or storing entire documents in the database. If external URLs are used, it will be important, as the
database grows and is used over time, to ensure that the links or URLs included are stable. An alternative
to external links is to exploit the capability of modern databases to store entire digital documents, such as
an HTML or XML page, a PDF document, or a raw data file, within a field in the database. The links
could then refer to a separate table that actually contains this metadata, instead of housing it in a separate
digital library. There is some merit in this approach because any data exported in ODM format could then
carry with it the associated metadata required to completely define it, as well as the raw data from which
it is derived. However, it has the disadvantage of increasing (perhaps substantially) the size of the
database file containing the data and being distributed to users.
The following ODM tables require controlled vocabularies for their fields to maintain consistency and
avoid the use of synonyms that can lead to ambiguity:

    •    CensorCodeCV
    •    DataTypeCV
    •    GeneralCategoryCV
    •    SampleMediumCV
    •    SampleTypeCV
    •    SpatialReferences
    •    SpeciationCV
    •    TopicCategoryCV
    •    Units
    •    ValueTypeCV
    •    VariableNameCV
    •  VerticalDatumCV

The initial contents of these controlled vocabularies are specified in the Microsoft SQL Server 2005 blank
schema for the ODM. However, the ODM controlled vocabularies are dynamic. A central repository of
current ODM controlled vocabulary terms is maintained on the ODM website at
http://water.usu.edu/cuahsi/odm/, together with the most recent version of the ODM SQL Server 2005
blank schema, this design specifications document, and other tools for working with ODM. Users can
submit new terms for the controlled vocabularies and can request changes to existing terms using
functionality available on the ODM website. Functionality for updating local controlled vocabulary
tables with new terms from the central ODM controlled vocabulary repository is provided in the ODM
Tools software application, which is also available from the ODM website. The CUAHSI HIS team
welcomes input on the controlled vocabularies.

Examples
The following examples show the capability of ODM to store different types of point observations.  It is
not possible in examples such as these to present all of the field values for all the tables.  Because of this,
the examples present selected  fields  and tables chosen to  illustrate key capabilities of the data model.
Refer to Appendix A for the complete definition of table and field contents.

[Figure E.1. Excerpts from tables illustrating the population of ODM with streamflow gage height (stage)
and discharge data.]

Streamflow - Daily Average Discharge
Daily average streamflow is reported as an average of continuous 15 minute interval data values. Figure
E.2 shows excerpts from tables illustrating the population of ODM with both the continuous discharge
values and derived daily averages. The record giving the single daily average discharge, with a value of
722 ft3/s, in the DataValues table has a DerivedFromID of 100. This refers to multiple records in the
DerivedFrom table, with associated ValueIDs 97, 98, 99, ..., 113 shown. These refer to the specific 15
minute discharge values in the DataValues table that were used to derive the average daily discharge.
VariableID in the DataValues table identifies the appropriate record in the Variables table specifying that
this is a daily average discharge with units of ft3/s, from UnitsID referencing into the Units table.
MethodID in the DataValues table identifies the appropriate record in the Methods table specifying that
the method used to obtain this data value was daily averaging.
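A provenance query following this linkage might look like the sketch below (SQL issued through Python's sqlite3; the database file is hypothetical, while the DerivedFromID of 100 and the stored average of 722 ft3/s come from the example above):

    import sqlite3

    conn = sqlite3.connect("odm.db")  # hypothetical ODM database file

    # The daily average record carries DerivedFromID = 100; the DerivedFrom
    # table maps that group identifier to the ValueIDs (97, 98, ..., 113) of
    # the 15 minute discharge values used in the average.
    source_values = conn.execute(
        """SELECT dv.ValueID, dv.LocalDateTime, dv.DataValue
           FROM DerivedFrom AS df
           JOIN DataValues AS dv ON dv.ValueID = df.ValueID
           WHERE df.DerivedFromID = 100
           ORDER BY dv.LocalDateTime"""
    ).fetchall()

    # Recomputing the mean of the group should reproduce the stored daily
    # average (722 ft3/s in this example).
    mean = sum(v for _, _, v in source_values) / len(source_values)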

[Figure E.2. Excerpts from tables illustrating the population of ODM with continuous discharge values
and derived daily average discharge.]
[Figure E.3. Excerpts from tables illustrating the population of ODM with water chemistry data.]

NCDC Precipitation Data
Figure E.4 illustrates the representation of NCDC 15 minute precipitation data in ODM. The data
include 15 minute incremental data values as well as daily totals. Separate records in the Variables table
are used for the 15 minute and daily total values. These data are reported at irregular intervals and are
only logged for time periods in which precipitation is nonzero. This is accommodated by setting the
IsRegular attribute associated with the variable to "False" and specifying the TimeSupport value as 15 or
24 and the TimeUnits as "Minutes" or "Hours". The DataType of "Incremental" indicates that these are
incremental data values defined over the TimeSupport interval. The convention for incremental data (see
above) is that when the time support is specified, it gives the increment for irregular incremental data;
when the time support is specified as 0, the increment is from the previous data value at the same
position. Data qualifiers indicate periods where the data are missing. The method associated with each
precipitation variable documents the convention that zero precipitation periods are not logged in these
data acquired from NCDC. A data qualifier is also used to flag days where the precipitation total is
incomplete because the record is missing during part of the day.
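The daily total is thus the sum of the logged increments within each day. A minimal sketch (Python with pandas; the values echo the February 2, 2003 increments in Figure E.4, and treating -9999 as the NoDataValue is an assumption based on the convention recorded in the Variables table):

    import pandas as pd

    # 15 minute incremental precipitation; zero periods are not logged and
    # missing periods carry the NoDataValue (-9999) flagged by a qualifier.
    obs = pd.DataFrame({
        "LocalDateTime": pd.to_datetime([
            "2003-02-02 00:00", "2003-02-02 01:30", "2003-02-02 07:00",
            "2003-02-02 13:45", "2003-02-02 16:30",
        ]),
        "DataValue": [0.1, 0.1, 0.1, 0.2, 0.1],
    })

    # Daily total = sum of the logged increments within each day; unlogged
    # periods contribute zero by the documented convention.
    valid = obs[obs["DataValue"] != -9999]
    daily = valid.groupby(valid["LocalDateTime"].dt.date)["DataValue"].sum()
    print(daily)  # 2003-02-02 -> 0.6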
[Figure E.4. Excerpts from tables illustrating the population of the ODM with NCDC precipitation data.]

Groundwater Level Data
The following is an example of how groundwater level data can be stored in ODM. In this example, the
data values are the water table level relative to the ground surface, reported as negative values. This
example shows multiple data values of a single variable at a single site made by a single source that have
been quality controlled, as indicated by the QualityControlLevelID field in the DataValues table, which
references the QualityControlLevels table. The SiteID field in the DataValues table indicates the site in
the Sites table that gives the location information about the monitoring site. In this case, the elevation is
with respect to the NGVD29 datum as indicated in the VerticalDatum field, and latitude and longitude are
with respect to the NAD27 datum as indicated in the SpatialReferences table. The VariableID field in
the DataValues table references the appropriate record in the Variables table indicating information about
the variable. The SourceID field in the DataValues table references the appropriate record in the Sources
table giving information about the source of the data.
[Figure E.5. Excerpts from tables illustrating the population of the ODM with irregularly sampled
groundwater level data.]

Acknowledgements
This material is based upon work supported by the National  Science Foundation under Grant
Nos. EAR 0412975 and 0413265. Any opinions,  findings and  conclusions or recommendations
expressed in this material are those of the authors  and do not necessarily reflect  the views of the
National  Science Foundation (NSF).
References
AIAA (1995), Assessment of Wind Tunnel Data Uncertainty, American Institute of Aeronautics and
        Astronautics, AIAA S-071-1995.
Blöschl, G. (1996), Scale and Scaling in Hydrology, Habilitationsschrift, Wiener Mitteilungen Wasser
        Abwasser Gewässer, Wien, 346 p.
Blöschl, G. and M. Sivapalan (1995), "Scale Issues in Hydrological Modelling: A Review," Hydrological
        Processes, 9: 251-290.
Horsburgh, J. S., D. G. Tarboton and D. R. Maidment (2005), "A Community Data Model for
        Hydrologic Observations, Chapter 6," in Hydrologic Information System Status Report, Version
        1, edited by D. R. Maidment, p. 102-135, http://www.cuahsi.org/his/docs/HISStatusSept15.pdf
Horsburgh, J. S., D. G. Tarboton, D. R. Maidment, and I. Zaslavsky (2008), "A Relational Model for
        Environmental and Water Resources Data," Water Resources Research, Vol. 44, W05406,
        doi:10.1029/2007WR006392.
Maidment, D. R., ed. (2002), Arc Hydro GIS for Water Resources, ESRI Press, Redlands, CA, 203 p.
Maidment, D. R. (2005), "A Data Model for Hydrologic Observations," paper prepared for presentation
        at the CUAHSI Hydrologic Information Systems Symposium, University of Texas at Austin,
        March 7, 2005.
Tarboton, D. G. (2005), "Review of Proposed CUAHSI Hydrologic Information System Hydrologic
        Observations Data Model," Utah State University, May 5, 2005.
Appendix A. Observations Data Model Table and Field Structure


The following is a description of the tables in the observations data model: a listing of the fields contained
in each table, a description of the data contained in each field and its data type, examples of the
information to be stored in each field where appropriate, specific constraints imposed on each field, and
discussion of how each field should be populated. Values in the example column should not be
considered inclusive of all potential values, especially in the case of fields that require a controlled
vocabulary. We anticipate that these controlled vocabularies will need to be extended and adjusted.
Tables appear in alphabetical order.

Each table below includes a "Constraint" column.  The value in this column designates each field in the
table as one of the following:

Mandatory (M) - A value in this field is mandatory and cannot be NULL.
Optional (O) - A value in this field is optional and can be NULL.
Programmatically derived (P)  -  Inherits from the source  field.   The value in this field  should be
automatically populated as the result of a query and is not required to be input by the user.

Additional  constraints are documented where appropriate in the Constraint  column.   In addition, where
appropriate, each table contains  a "Default Value" column.  The value in this column is the default value
for the associated field. The default value specifies the convention that should be followed when a value
for the field is not specified.  Below each table  is a discussion of the rules and best practices that should
be used in populating each table  within ODM.

Table: Categories

The Categories table defines the categories for categorical variables. Records are required for variables
where DataType is specified as "Categorical." Multiple entries for each VariableID, with different
DataValues, provide the mapping from DataValue to CategoryDescription.
Field Name | Data Type | Description | Examples | Constraint
VariableID | Integer | Integer identifier that references the Variables record of a categorical variable. | 45 | M, Foreign key
DataValue | Real | Numeric value that defines the category. | 1.0 | M
CategoryDescription | Text (unlimited) | Definition of categorical variable value. | "Cloudy" | M
The following rules and best practices should be used in populating this table:
    1.  Although all of the fields in this table are mandatory, they need only be populated if categorical
       data are entered into the database.  If there are no categorical data in the DataValues table, this
       table will be empty.
    2.  This table should be populated before categorical data values are added to the DataValues table.

Table: CensorCodeCV

The CensorCodeCV table contains the controlled vocabulary  for censor codes.  Only values from the
Term field in this table can be used to populate the CensorCode field of the DataValues table.
Field Name | Data Type | Description | Examples | Constraint
Term | Text (255) | Controlled vocabulary for CensorCode. | "lt", "gt", "nc" | M, Unique, Primary key. Cannot contain tab, line feed, or carriage return characters.
Definition | Text (unlimited) | Definition of CensorCode controlled vocabulary term. The definition is optional if the term is self explanatory. | "less than", "greater than", "not censored" | O
This table is pre-populated within the ODM. Changes to this controlled vocabulary can be requested at
http://water.usu.edu/cuahsi/odm/.

Table: DataTypeCV

The  DataTypeCV table contains the controlled vocabulary for data types.  Only values from the Term
field in this table can be used to populate the DataType field in the Variables table.
Field Name | Data Type | Description | Examples | Constraint
Term | Text (255) | Controlled vocabulary for DataType. | "Continuous" | M, Unique, Primary key. Cannot contain tab, line feed, or carriage return characters.
Definition | Text (unlimited) | Definition of DataType controlled vocabulary term. The definition is optional if the term is self explanatory. | "A quantity specified at a particular instant in time measured with sufficient frequency (small spacing) to be interpreted as a continuous record of the phenomenon." | O
This table is pre-populated within the ODM. Changes to this controlled vocabulary can be requested at
http://water.usu.edu/cuahsi/odm/.

Table: DataValues

The DataValues table contains the actual data values.
Field Name | Data Type | Description | Example | Constraint | Default Value
ValueID | Integer, Identity | Unique integer identifier for each data value. | 43 | M, Unique, Primary key | (none)
DataValue | Real | The numeric value of the observation. For Categorical variables, a number is stored here; the Variables table has DataType as Categorical and the Categories table maps from the DataValue onto a CategoryDescription. | 34.5 | M | (none)
ValueAccuracy | Real | Numeric value that describes the measurement accuracy of the data value. If not given, it is interpreted as unknown. | 4 | O | NULL
LocalDateTime | Date/Time | Local date and time at which the data value was observed. Represented in an implementation specific format. | 9/4/2003 7:00:00 AM | M | (none)
UTCOffset | Real | Offset in hours from UTC time of the corresponding LocalDateTime value. | -7 | M | (none)
DateTimeUTC | Date/Time | Universal UTC date and time at which the data value was observed. Represented in an implementation specific format. | 9/4/2003 2:00:00 PM | M | (none)
SiteID | Integer | Integer identifier that references the site at which the observation was measured. This links data values to their locations in the Sites table. | 3 | M, Foreign key | (none)
VariableID | Integer | Integer identifier that references the variable that was measured. This links data values to their variable in the Variables table. | 5 | M, Foreign key | (none)
OffsetValue | Real | Distance from a datum or control point to the point at which a data value was observed. If not given, the OffsetValue is inferred to be 0, or not relevant/necessary. | 2.1 | O | NULL = No Offset
OffsetTypeID | Integer | Integer identifier that references the measurement offset type in the OffsetTypes table. | 3 | O, Foreign key | NULL = No Offset
CensorCode | Text (50) | Text indication of whether the data value is censored, from the CensorCodeCV controlled vocabulary. | "nc" | M, Foreign key | "nc" = Not Censored
QualifierID | Integer | Integer identifier that references the Qualifiers table. If NULL, the data value is inferred to not be qualified. | 4 | O, Foreign key | NULL
MethodID | Integer | Integer identifier that references the method used to generate the data value in the Methods table. | 3 | M, Foreign key | 0 = No method specified
SourceID | Integer | Integer identifier that references the record in the Sources table giving the source of the data value. | 5 | M, Foreign key | (none)
SampleID | Integer | Integer identifier that references into the Samples table. This is required only if the data value resulted from a physical sample processed in a lab. | 7 | O, Foreign key | NULL
DerivedFromID | Integer | Integer identifier for the derived from group of data values that the current data value is derived from. This refers to a group of derived from records in the DerivedFrom table. If NULL, the data value is inferred to not be derived from another data value. | 5 | O | NULL
QualityControlLevelID | Integer | Integer identifier giving the level of quality control that the value has been subjected to. This references the QualityControlLevels table. | 1 | M, Foreign key | -9999 = Unknown
The following rules and best practices should be used in populating this table:

    1.  ValueID is the primary key, is mandatory, and cannot be NULL. This field should be
        implemented as an autonumber/identity field. When data values are added to this table, a unique
        integer ValueID should be assigned to each data value by the database software such that the
        primary key constraint is not violated.
    2.  Each record in this table must be unique. This is enforced by a unique constraint across all of the
        fields in this table (excluding ValueID) so that duplicate records are avoided.
    3.  The LocalDateTime, UTCOffset, and DateTimeUTC must all be populated. Care must be taken
        to ensure that the correct UTCOffset is used, especially in areas that observe daylight saving time.
        If LocalDateTime and DateTimeUTC are given, the UTCOffset can be calculated as the
        difference between the two dates. If LocalDateTime and UTCOffset are given, DateTimeUTC
        can be calculated (see the sketch following this list).
    4.  SiteID must correspond to a valid SiteID from the Sites table. When adding data for a new site to
        the ODM, the Sites table should be populated prior to adding data values to the DataValues table.
    5.  VariableID must correspond to a valid VariableID from the Variables table. When adding data
        for a new variable to the ODM, the Variables table should be populated prior to adding data
        values for the new variable to the DataValues table.
    6.  OffsetValue and OffsetTypeID are optional because not all data values have an offset. Where no
        offset is used, both of these fields should be set to NULL indicating that the data values do not
        have an offset. Where an OffsetValue is specified, an OffsetTypeID must also be specified and it
        must refer to a valid OffsetTypeID in the OffsetTypes table. The OffsetTypes table should be
        populated prior to adding data values with a particular OffsetTypeID to the DataValues table.
    7.  CensorCode is mandatory and cannot be NULL. A default value of "nc" is used for this field.
        Only Terms from the CensorCodeCV table should be used to populate this field.
    8.  The QualifierID field is optional because not all data values have qualifiers. Where no qualifier
        applies, this field should be set to NULL. When a QualifierID is specified in this field it must
        refer to a valid QualifierID in the Qualifiers table. The Qualifiers table should be populated prior
        to adding data values with a particular QualifierID to the DataValues table.
    9.  MethodID must correspond to a valid MethodID from the Methods table and cannot be NULL. A
       default value of 0 is used in the case where no method is specified or the method used to create
       the observation is unknown.  The Methods table should be populated prior to adding data values
       with a particular MethodID to the DataValues table.
    10. SourceID must correspond to a valid SourceID from the Sources table and cannot be NULL. The
        Sources table should be populated prior to adding data values with a particular SourceID to the
        DataValues table.
    11. SampleID is optional and should only be populated if the data value was generated from a
        physical sample that was sent to a laboratory for analysis. The SampleID must correspond to a
        valid SampleID in the Samples table, and the Samples table should be populated prior to adding
        data values with a particular SampleID to the DataValues table.
    12. DerivedFromID is optional and should only be populated if the data value was derived from other
        data values that are also stored in the ODM database.
    13. QualityControlLevelID is mandatory, cannot be NULL, and must correspond to a valid
        QualityControlLevelID in the QualityControlLevels table. A default value of -9999 is used for
        this field in the event that the QualityControlLevelID is unknown. The QualityControlLevels
        table should be populated prior to adding data values with a particular QualityControlLevelID to
        the DataValues table.
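Rule 3 reduces to simple date arithmetic. The sketch below (plain Python datetime, using the example values from the table above) shows both directions of the calculation:

    from datetime import datetime, timedelta

    local = datetime(2003, 9, 4, 7, 0, 0)   # LocalDateTime: 9/4/2003 7:00:00 AM
    utc = datetime(2003, 9, 4, 14, 0, 0)    # DateTimeUTC:   9/4/2003 2:00:00 PM

    # Given LocalDateTime and DateTimeUTC, UTCOffset is their difference in hours.
    utc_offset = (local - utc).total_seconds() / 3600.0  # -7.0

    # Given LocalDateTime and UTCOffset, DateTimeUTC can be calculated.
    assert utc == local - timedelta(hours=utc_offset)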

Table: DerivedFrom

The DerivedFrom table contains the linkage between derived  data values and the data values that they
were derived from.
Field Name | Data Type | Description | Examples | Constraint
DerivedFromID | Integer | Integer identifying the group of data values from which a quantity is derived. | 5 | M
ValueID | Integer | Integer identifier referencing data values that comprise a group from which a quantity is derived. This corresponds to ValueID in the DataValues table. | 1, 2, 3, 4, 5 | M
The following rules and best practices should be used in populating this table:

    1.  Although all of the fields in this table are mandatory, they need only be populated if derived data
       values and the data values that they were derived from are entered into the database. If there are
       no derived data in the DataValues table, this table will be empty.

Table: GeneralCategoryCV

The GeneralCategoryCV table contains the controlled vocabulary for the general categories associated
with Variables. The GeneralCategory field in the Variables table can only be populated with values from
the Term field of this controlled vocabulary table.
Field Name | Data Type | Description | Examples | Constraint
Term | Text (255) | Controlled vocabulary for GeneralCategory. | "Hydrology" | M, Unique, Primary key. Cannot contain tab, line feed, or carriage return characters.
Definition | Text (unlimited) | Definition of GeneralCategory controlled vocabulary term. The definition is optional if the term is self explanatory. | "Data associated with hydrologic variables or processes." | O
This table is pre-populated within the ODM.  Changes to this controlled vocabulary can be requested at
http://water.usu.edu/cuahsi/odm/.

Table: GroupDescriptions

The GroupDescriptions table lists the descriptions for each of the groups of data values that have been
formed.
Field Name | Data Type | Description | Example | Constraint
GroupID | Integer, Identity | Unique integer identifier for each group of data values that has been formed. This also references GroupID in the Groups table. | 4 | M, Unique, Primary key
GroupDescription | Text (unlimited) | Text description of the group. | "Echo Reservoir Profile 7/7/2005" | O
The following rules and best practices should be used in populating this table:

    1.  This table will only be populated if groups of data values have been created in the ODM database.
    2.  The GroupID field is the primary key, must be a unique integer, and cannot be NULL. It should
       be implemented as an auto number/identity field.
    3.  The GroupDescription can be any text string that describes the group of observations.

Table: Groups

The Groups table lists the groups of data values that have been created and the data values that are within
each group.
Field Name | Data Type | Description | Example | Constraint
GroupID | Integer | Integer ID for each group of data values that has been formed. | 4 | M, Foreign key
ValueID | Integer | Integer identifier for each data value that belongs to a group. This corresponds to ValueID in the DataValues table. | 2, 3, 4 | M, Foreign key
The following rules and best practices should be used in populating this table:

    1.  This table will only be populated if groups of data values have been created in the ODM database.
    2.  The GroupID field must reference a valid GroupID from the GroupDescriptions table, and the
       GroupDescriptions table should be populated for a group prior to populating the Groups table.

Table: ISOMetadata

The  ISOMetadata table contains  dataset and project level metadata required by the CUAHSI HIS
metadata system (http://www.cuahsi.org/his/documentation.html) for compliance with standards such as
the draft ISO 19115 or ISO 8601.  The mandatory fields in this table must be populated to provide a
complete set of ISO compliant metadata in the database.
Field Name | Data Type | Description | Example | Constraint | Default Value
MetadataID | Integer, Identity | Unique integer ID for each metadata record. | 4 | M, Unique, Primary key | (none)
TopicCategory | Text (255) | Topic category keyword that gives the broad ISO19115 metadata topic category for data from this source. The controlled vocabulary of topic category keywords is given in the TopicCategoryCV table. | "inlandWaters" | M, Foreign key | "Unknown"
Title | Text (255) | Title of data from a specific data source. | (none) | M. Cannot contain tab, line feed, or carriage return characters. | "Unknown"
Abstract | Text (unlimited) | Abstract of data from a specific data source. | (none) | M | "Unknown"
ProfileVersion | Text (255) | Name of metadata profile used by the data source. | "ISO8601" | M. Cannot contain tab, line feed, or carriage return characters. | "Unknown"
MetadataLink | Text (500) | Link to additional metadata reference material. | (none) | O | NULL
The following rules and best practices should be used in populating this table:

    1.  The MetadataID field is the primary key, must be a unique integer, and cannot be NULL. This
        field should be implemented as an auto number/identity field.
    2.  All of the fields in this table are mandatory and cannot be NULL except for the MetadataLink
       field.
    3.  The TopicCategory field should only be populated with terms from the TopicCategoryCV table.
       The default controlled vocabulary term is "Unknown."
    4.  The Title field should be populated with a brief text description of what the referenced data
       represent. This field can be populated with "Unknown" if there is no title for the data.
    5.  The Abstract field should be populated with a more complete text description of the data that the
       metadata record references. This field can be populated with "Unknown" if there is no abstract
       for the data.
    6.  The ProfileVersion field should be populated with the version of the ISO metadata profile that is
       being used.  This field can be populated with "Unknown" if there is no profile version for the
       data.
    7.  One record with MetadataID = 0 should exist in this table, with TopicCategory, Title, Abstract,
        and ProfileVersion = "Unknown" and MetadataLink = NULL. This record should be the default
        value for sources with unknown/unspecified metadata.

Table: LabMethods

The LabMethods table contains descriptions of the laboratory methods used to analyze physical samples
for specific constituents.
Field Name | Data Type | Description | Example | Constraint | Default Value
LabMethodID | Integer, Identity | Unique integer identifier for each laboratory method. This is the key used by the Samples table to reference a laboratory method. | 6 | M, Unique, Primary key | (none)
LabName | Text (255) | Name of the laboratory responsible for processing the sample. | "USGS Atlanta Field Office" | M. Cannot contain tab, line feed, or carriage return characters. | "Unknown"
LabOrganization | Text (255) | Organization responsible for sample analysis. | "USGS" | M. Cannot contain tab, line feed, or carriage return characters. | "Unknown"
LabMethodName | Text (255) | Name of the method and protocols used for sample analysis. | "USEPA-365.1" | M. Cannot contain tab, line feed, or carriage return characters. | "Unknown"
LabMethodDescription | Text (unlimited) | Description of the method and protocols used for sample analysis. | "Processed through Model *** Mass Spectrometer" | M | "Unknown"
LabMethodLink | Text (500) | Link to additional reference material on the analysis method. | (none) | O | NULL
The following rules and best practices should be used when populating this table:
    1.  The LabMethodID field is the primary key, must be a unique integer, and cannot be NULL.  It
       should be implemented as an auto number/identity field.
    2.  All of the fields in this table are required and cannot be null except for the LabMethodLink.
    3.  The default value for all of the required fields except for the  LabMethodID is "Unknown."
    4.  A single record should exist in this table where the LabMethodID = 0 and the LabName,
        LabOrganization, LabMethodName, and LabMethodDescription fields are equal to "Unknown"
        and the LabMethodLink = NULL. This record should be used to identify samples in the Samples
        table for which nothing is known about the laboratory method used to analyze the sample.

Table: Methods

The Methods table lists the methods used to collect the data and any additional information about the
method.
Field Name | Data Type | Description | Example | Constraint | Default Value
MethodID | Integer, Identity | Unique integer ID for each method. | 5 | M, Unique, Primary key | (none)
MethodDescription | Text (unlimited) | Text description of each method. | "Specific conductance measured using a Hydrolab" or "Streamflow measured using a V notch weir with dimensions xxx" | M | (none)
MethodLink | Text (500) | Link to additional reference material on the method. | (none) | O | NULL
The following rules and best practices should be used when populating this table:

    1.  The MethodID field is the primary key, must be a unique integer, and cannot be NULL.
    2.  There is no default value for the MethodDescription field in this table. Rather, this table should
       contain a record with MethodID = 0, MethodDescription = "Unknown", and MethodLink =
       NULL. A MethodID of 0 should be used as the MethodID for any data values for which the
       method used to create the value is unknown (i.e., the default value for the MethodID field in the
       DataValues table is 0).
    3.  Methods should describe the manner in which the observation was collected (i.e., collected
       manually, or collected using an automated sampler) or measured (i.e., measured using a
       temperature sensor or measured using a turbidity sensor).  Details  about the specific sensor
       models and manufacturers can be included in the MethodDescription.
Table: ODMVersion

The ODMVersion table has a single record that records the version of the ODM database. This table
must contain a valid ODM version number. It is pre-populated and should not be edited.
Field Name | Data Type | Description | Example | Constraint
VersionNumber | Text (50) | String that lists the version of the ODM database. | "1.1" | M. Cannot contain tab, line feed, or carriage return characters.
Table: OffsetTypes

The OffsetTypes table lists full descriptive information for each of the measurement offsets.
Field Name | Data Type | Description | Example | Constraint
OffsetTypeID | Integer, Identity | Unique integer identifier that identifies the type of measurement offset. | 2 | M, Unique, Primary key
OffsetUnitsID | Integer | Integer identifier that references the record in the Units table giving the units of the OffsetValue. | 1 | M, Foreign key
OffsetDescription | Text (unlimited) | Full text description of the offset type. | "Below water surface", "Above Ground Level" | M
The following rules and best practices should be followed when populating this table:

    1.  Although all three fields in this table are mandatory, this table will only be populated if data
       values measured at an offset have been entered into the ODM database.
    2.  The OffsetTypelD field is the primary key, must be a unique integer, and cannot be NULL.  This
       field should be implemented as an auto number/identity field.
    3.  The OffsetUnitsID field should reference a valid ID from the UnitsID field in the Units table.
       Because the Units table is a controlled vocabulary, only units that already exist in the Units table
       can be used as the units of the offset.
    4.  The OffsetDescription field should be filled in with a complete text description of the offset that
       provides enough information to interpret the type of offset being used. For example, "Distance
       from stream bank" is ambiguous because it is not known which bank is being referred to.

Table: Qualifiers

The Qualifiers table contains data qualifying comments that accompany the data.
Field Name | Data Type | Description | Example | Constraint | Default Value
QualifierID | Integer, Identity | Unique integer identifying the data qualifier. | 3 | M, Unique, Primary key | (none)
QualifierCode | Text (50) | Text code used by the organization that collects the data. | "e" (for estimated), "a" (for approved), "p" (for provisional) | O. Cannot contain space, tab, line feed, or carriage return characters. | NULL
QualifierDescription | Text (unlimited) | Text of the data qualifying comment. | "Holding time for sample analysis exceeded" | M | (none)
This table will only be populated if data values that have data qualifying comments have been added to
the ODM database.  The following rules and best practices should be used when populating this table:

    1.  The QualifierlD field is the primary key,  must be a unique integer, and cannot be NULL. This
       field should be implemented as an auto number/identity field.

Table: QualityControlLevels

The QualityControlLevels table contains the quality control levels that are used for versioning data within
the database.
Field Name | Data Type | Description | Example | Constraint
QualityControlLevelID | Integer, Identity | Unique integer identifying the quality control level. | 0, 1, 2, 3, 4, 5 | M, Unique, Primary key
QualityControlLevelCode | Text (50) | Code used to identify the level of quality control to which data values have been subjected. | "1", "1.1", "Raw", "QC Checked" | M. Cannot contain tab, line feed, or carriage return characters.
Definition | Text (255) | Definition of quality control level. | "Raw Data", "Quality Controlled Data" | M. Cannot contain tab, line feed, or carriage return characters.
Explanation | Text (unlimited) | Explanation of quality control level. | "Raw data is defined as unprocessed data and data products that have not undergone quality control." | M
This table is pre-populated with quality control levels 0 through 4 within the ODM. The following rules
and best practices should be used when populating this table:
    1.  The QualityControlLevelID field is the primary key, must be a unique integer, and cannot be
        NULL. This field should be implemented as an auto number/identity field.
    2.  It is suggested that the pre-populated system of quality control level codes (i.e.,
        QualityControlLevelCodes 0 - 4) be used. If the pre-populated list is not sufficient, new quality
        control levels can be defined. A quality control level code of -9999 is suggested for data whose
        quality control level is unknown.
Table: SampleMediumCV

The SampleMediumCV table contains the controlled vocabulary for sample media.
Field Name | Data Type | Description | Examples | Constraint
Term | Text (255) | Controlled vocabulary for sample media. | "Surface Water" | M, Unique, Primary key. Cannot contain tab, line feed, or carriage return characters.
Definition | Text (unlimited) | Definition of sample media controlled vocabulary term. The definition is optional if the term is self explanatory. | "Sample taken from surface water such as a stream, river, lake, pond, reservoir, ocean, etc." | O
This table is pre-populated within the ODM. Changes to this controlled vocabulary can be requested at
http://water.usu.edu/cuahsi/odm/.
Table: Samples

The Samples table gives information about physical samples analyzed in a laboratory.
Field Name | Data Type | Description | Example | Constraint | Default Value
SampleID | Integer Identity | Unique integer identifier that identifies each physical sample. | 3 | M, Unique, Primary key | (none)
SampleType | Text (255) | Controlled vocabulary specifying the sample type from the SampleTypeCV table. | "FD", "PB", "SW", "Grab Sample" | M, Foreign key | "Unknown"
LabSampleCode | Text (50) | Code or label used to identify and track lab sample or sample container (e.g., bottle) during lab analysis. | "AB-123" | M, Unique; cannot contain tab, line feed, or carriage return characters | (none)
LabMethodID | Integer | Unique identifier for the laboratory method used to process the sample. This references the LabMethods table. | 4 | M, Foreign key | 0 = Nothing known about lab method
The following rules and best practices should be followed when populating this table:

    1.  This table will only be populated if data values associated with physical samples are added to the
       ODM database.
    2.  The SampleID field is the primary key, must be a unique integer, and cannot be NULL. This field
        should be implemented as an auto number/identity field.
    3.  The SampleType field should be populated using terms from the SampleTypeCV table.  Where
       the sample type is unknown, a default value of "Unknown" can be used.
    4.  The LabSampleCode should be  a unique text code used by the laboratory to identify the sample.
       This field is an alternate key for this table and should be unique.
    5.  The LabMethodID must reference a valid LabMethodID from the LabMethods table. The
       LabMethods table should be populated with the appropriate laboratory method information prior
       to adding records to this table that reference that laboratory method.  A default value of 0 for this
       field indicates that nothing is known about the laboratory method used to analyze the sample.
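
The sketch below illustrates rules 3 and 5 together: a grab sample whose laboratory method is not yet
known is recorded with the default LabMethodID of 0. The SQL syntax is assumed; the values are taken
from the example column above.

    -- Hypothetical sketch: a grab sample with an as-yet-unknown lab method.
    -- SampleID is an identity field and is generated automatically.
    INSERT INTO Samples (SampleType, LabSampleCode, LabMethodID)
    VALUES ('Grab Sample', 'AB-123', 0);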

Table: SampleTypeCV

The SampleTypeCV table contains the controlled vocabulary for sample type.
Field Name | Data Type | Description | Examples | Constraint
Term | Text (255) | Controlled vocabulary for sample type. | "FD", "PB", "Grab Sample" | M, Unique, Primary key; cannot contain tab, line feed, or carriage return characters
Definition | Text (unlimited) | Definition of sample type controlled vocabulary term. The definition is optional if the term is self explanatory. | "Foliage Digestion", "Precipitation Bulk" | O
This table is pre-populated within the ODM.  Changes to this controlled vocabulary can be requested at
http://water.usu.edu/cuahsi/odm/.

Table: SeriesCatalog

The SeriesCatalog table lists each separate data series in the database for the purposes of identifying or
displaying what data are available at each site and to speed simple queries without querying the main
DataValues table. Each data series is defined by a unique combination of SiteID, VariableID, MethodID,
SourceID, and QualityControlLevelID.
This entire table should be programmatically derived and should be updated every time data is added to
the database. Constraints on each field in the SeriesCatalog table are dependent upon the constraints on
the fields in the table from which those fields originated.
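
A minimal sketch of such a derivation is shown below. It assumes the DataValues field names described
in the main text of this document; in practice the query would also join the Sites, Variables, Methods,
Sources, and QualityControlLevels tables to fill the descriptive columns listed in the table that follows.

    -- Hypothetical sketch: rebuilding the series list from DataValues.
    -- Descriptive columns (SiteCode, VariableName, Organization, etc.) are
    -- omitted here for brevity.
    INSERT INTO SeriesCatalog (SiteID, VariableID, MethodID, SourceID,
                               QualityControlLevelID, BeginDateTime,
                               EndDateTime, ValueCount)
    SELECT SiteID, VariableID, MethodID, SourceID, QualityControlLevelID,
           MIN(LocalDateTime), MAX(LocalDateTime), COUNT(*)
    FROM DataValues
    GROUP BY SiteID, VariableID, MethodID, SourceID, QualityControlLevelID;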
Field Name | Data Type | Description | Example | Constraint
SeriesID | Integer Identity | Unique integer identifier for each data series. | 5 | P, Primary key
SiteID | Integer | Site identifier from the Sites table. | 7 | P
SiteCode | Text (50) | Site code used by organization that collects the data. | "1002000" | P
SiteName | Text (255) | Full text name of sampling site. | "Logan River" | P
VariableID | Integer | Integer identifier for each variable that references the Variables table. | 4 | P
VariableCode | Text (50) | Variable code used by the organization that collects the data. | "00060" | P
VariableName | Text (255) | Name of the variable from the Variables table. | "Temperature" | P
Speciation | Text (255) | Code used to identify how the data value is expressed (e.g., total phosphorus concentration expressed as P). This should be from the SpeciationCV controlled vocabulary table. | "P", "N", "NO3" | P
VariableUnitsID | Integer | Integer identifier that references the record in the Units table giving the units of the data value. | 5 | P
VariableUnitsName | Text (255) | Full text name of the variable units from the UnitsName field in the Units table. | "milligrams per liter" | P
SampleMedium | Text (255) | The medium of the sample. This should be from the SampleMediumCV controlled vocabulary table. | "Surface Water" | P
ValueType | Text (255) | Text value indicating what type of data value is being recorded. This should be from the ValueTypeCV controlled vocabulary table. | "Field Observation" | P
TimeSupport | Real | Numerical value that indicates the time support (or temporal footprint) of the data values. 0 is used to indicate data values that are instantaneous. Other values indicate the time over which the data values are implicitly or explicitly averaged or aggregated. | 0, 24 | P
TimeUnitsID | Integer | Integer identifier that references the record in the Units table giving the units of the time support. If TimeSupport is 0, indicating an instantaneous observation, a unit still needs to be given for completeness, although it is somewhat arbitrary. | 4 | P
TimeUnitsName | Text (255) | Full text name of the time support units from the UnitsName field in the Units table. | "hours" | P
DataType | Text (255) | Text value that identifies the data as one of several types from the DataTypeCV controlled vocabulary table. | "Continuous", "Instantaneous", "Cumulative", "Incremental", "Average", "Minimum", "Maximum", "Constant Over Interval", "Categorical" | P
GeneralCategory | Text (255) | General category of the variable from the GeneralCategoryCV table. | "Water Quality" | P
MethodID | Integer | Integer identifier that identifies the method used to generate the data values and references the Methods table. | 2 | P
MethodDescription | Text (unlimited) | Full text description of the method used to generate the data values. | "Specific conductance measured using a Hydrolab" or "Streamflow measured using a V notch weir with dimensions xxx" | P
SourceID | Integer | Integer identifier that identifies the source of the data values and references the Sources table. | 5 | P
Organization | Text (255) | Text description of the source organization from the Sources table. | "USGS" | P
SourceDescription | Text (unlimited) | Text description of the data source from the Sources table. | "Text file retrieved from the EPA STORET system indicating data originally from Utah Division of Water Quality" | P
Citation | Text (unlimited) | Text string that gives the citation to be used when the data from each source are referenced. | Slaughter, C. W., D. Marks, G. N. Flerchinger, S. S. Van Vactor and M. Burgess (2001), "Thirty-five years of research data collection at the Reynolds Creek Experimental Watershed, Idaho, United States," Water Resources Research, 37(11): 2819-2823. | P
QualityControlLevelID | Integer | Integer identifier that indicates the level of quality control that the data values have been subjected to. | 0, 1, 2, 3, 4 | P
QualityControlLevelCode | Text (50) | Code used to identify the level of quality control to which data values have been subjected. | "1", "1.1", "Raw", "QC Checked" | P
BeginDateTime | Date/Time | Date of the first data value in the series. To be programmatically updated if new records are added. | 9/4/2003 7:00:00 AM | P
EndDateTime | Date/Time | Date of the last data value in the series. To be programmatically updated if new records are added. | 9/4/2005 7:00:00 AM | P
BeginDateTimeUTC | Date/Time | Date of the first data value in the series in UTC. To be programmatically updated if new records are added. | 9/4/2003 2:00 PM | P
EndDateTimeUTC | Date/Time | Date of the last data value in the series in UTC. To be programmatically updated if new records are added. | 9/4/2003 2:00 PM | P
ValueCount | Integer | The number of data values in the series identified by the combination of the SiteID, VariableID, MethodID, SourceID, and QualityControlLevelID fields. To be programmatically updated if new records are added. | 50 | P
Table:  Sites
The Sites table provides information giving the spatial location at which data values have been collected.
Field Name | Data Type | Description | Example | Constraint | Default Value
SiteID | Integer Identity | Unique identifier for each sampling location. | 37 | M, Unique, Primary key | (none)
SiteCode | Text (50) | Code used by organization that collects the data to identify the site. | "10109000" (USGS gage number) | M, Unique; allows only characters in the range of A-Z (case insensitive), 0-9, period ".", dash "-", and underscore "_" | (none)
SiteName | Text (255) | Full name of the sampling site. | "LOGAN RIVER ABOVE STATE DAM, NEAR LOGAN, UT" | M; cannot contain tab, line feed, or carriage return characters | (none)
Latitude | Real | Latitude in decimal degrees. | 45.32 | M (>= -90 AND <= 90) | (none)
Longitude | Real | Longitude in decimal degrees. East positive, West negative. | -100.47 | M (>= -180 AND <= 360) | (none)
LatLongDatumID | Integer | Identifier that references the Spatial Reference System of the latitude and longitude coordinates in the SpatialReferences table. | 1 | M, Foreign key | 0 = Unknown
Elevation_m | Real | Elevation of sampling location (in m). If this is not provided it needs to be obtained programmatically from a DEM based on location information. | 1432 | O | NULL
VerticalDatum | Text (255) | Vertical datum of the elevation. Controlled vocabulary from VerticalDatumCV. | "NAVD88" | O, Foreign key | NULL
LocalX | Real | Local projection X coordinate. | 456700 | O | NULL
LocalY | Real | Local projection Y coordinate. | 232000 | O | NULL
LocalProjectionID | Integer | Identifier that references the Spatial Reference System of the local coordinates in the SpatialReferences table. This field is required if local coordinates are given. | 7 | O, Foreign key | NULL
PosAccuracy_m | Real | Value giving the accuracy with which the positional information is specified in meters. | 100 | O | NULL
State | Text (255) | Name of state in which the monitoring site is located. | "Utah" | O; cannot contain tab, line feed, or carriage return characters | NULL
County | Text (255) | Name of county in which the monitoring site is located. | "Cache" | O; cannot contain tab, line feed, or carriage return characters | NULL
Comments | Text (unlimited) | Comments related to the site. | (none) | O | NULL
The following rules and best practices should be followed when populating this table:

    1.  The SiteID field is the primary key, must be a unique integer, and cannot be NULL. This field
        should be implemented as an auto number/identity field.
    2.  The SiteCode field must contain a text code that uniquely identifies each site. The values in this
        field should be unique and can be an alternate key for the table. SiteCodes cannot contain any
        characters other than A-Z (case insensitive), 0-9, period ".", dash "-", and underscore "_".
    3.  The LatLongDatumID must reference a valid SpatialReferenceID from the SpatialReferences
        controlled vocabulary table. If the datum is unknown, a default value of 0 is used.
    4.  If the Elevation_m field is populated with a numeric value, a value must be specified in the
        VerticalDatum field. The VerticalDatum field can only be populated using terms from the
        VerticalDatumCV table. If the vertical datum is unknown, a value of "Unknown" is used.
    5.  If the LocalX and LocalY fields are populated with numeric values, a value must be specified in
        the LocalProjectionID field. The LocalProjectionID must reference a valid SpatialReferenceID
        from the SpatialReferences controlled vocabulary table. If the spatial reference system of the
        local coordinates is unknown, a default value of 0 is used.
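
The sketch below illustrates rules 3 and 4 with the example values from the table above: an unknown
horizontal datum is recorded as LatLongDatumID = 0, and because Elevation_m is populated, a
VerticalDatumCV term is supplied. The SQL syntax is assumed and the values are illustrative.

    -- Hypothetical sketch: adding a site whose horizontal datum is unknown
    -- (LatLongDatumID = 0) but whose elevation and vertical datum are known.
    INSERT INTO Sites (SiteCode, SiteName, Latitude, Longitude,
                       LatLongDatumID, Elevation_m, VerticalDatum)
    VALUES ('10109000', 'LOGAN RIVER ABOVE STATE DAM, NEAR LOGAN, UT',
            45.32, -100.47, 0, 1432, 'NAVD88');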

Table: Sources

The Sources table lists the original sources of the data, providing information sufficient to retrieve and
reconstruct the data value  from the original data files if necessary.
Field Name | Data Type | Description | Example | Constraint | Default Value
SourceID | Integer Identity | Unique integer identifier that identifies each data source. | 5 | M, Unique, Primary key | (none)
Organization | Text (255) | Name of the organization that collected the data. This should be the agency or organization that collected the data, even if it came out of a database consolidated from many sources such as STORET. | "Utah Division of Water Quality" | M; cannot contain tab, line feed, or carriage return characters | (none)
SourceDescription | Text (unlimited) | Full text description of the source of the data. | "Text file retrieved from the EPA STORET system indicating data originally from Utah Division of Water Quality" | M | (none)
SourceLink | Text (500) | Link that can be pointed at the original data file and/or associated metadata stored in the digital library, or URL of the data source. | (none) | O | NULL
ContactName | Text (255) | Name of the contact person for the data source. | "Jane Adams" | M; cannot contain tab, line feed, or carriage return characters | "Unknown"
Phone | Text (255) | Phone number for the contact person. | "435-797-0000" | M; cannot contain tab, line feed, or carriage return characters | "Unknown"
Email | Text (255) | Email address for the contact person. | "Jane.Adams@dwq.ut" | M; cannot contain tab, line feed, or carriage return characters | "Unknown"
Address | Text (255) | Street address for the contact person. | "45 Main Street" | M; cannot contain tab, line feed, or carriage return characters | "Unknown"
City | Text (255) | City in which the contact person is located. | "Salt Lake City" | M; cannot contain tab, line feed, or carriage return characters | "Unknown"
State | Text (255) | State in which the contact person is located. Use two letter abbreviations for US. For other countries give the full country name. | "UT" | M; cannot contain tab, line feed, or carriage return characters | "Unknown"
ZipCode | Text (255) | US Zip Code or country postal code. | "82323" | M; cannot contain tab, line feed, or carriage return characters | "Unknown"
Citation | Text (unlimited) | Text string that gives the citation to be used when the data from each source are referenced. | "Data collected by USU as part of the Little Bear River Test Bed Project" | M | "Unknown"
MetadataID | Integer | Integer identifier referencing the record in the ISOMetadata table for this source. | 5 | M, Foreign key | 0 = Unknown or uninitialized metadata
The following rules and best practices should be followed when populating this table:

    1.  The SourceID field is the primary key, must be a unique integer, and cannot be NULL. This field
        should be implemented as an auto number/identity field.
    2.  The Organization field should contain a text description of the agency or organization that created
        the data.
    3.  The SourceDescription field should contain a more detailed description of where the data was
        actually obtained.
    4.  A default value of "Unknown" may be used for the source contact information fields in the event
        that this information is not known.
    5.  Each source must be associated with a metadata record in the ISOMetadata table. As such, the
        MetadataID must reference a valid MetadataID from the ISOMetadata table. The ISOMetadata
        table should be populated with an appropriate record prior to adding a source to the Sources table.
        A default MetadataID of 0 can be used for a source with unknown or uninitialized metadata.
    6.  Use the Citation field to record the text that you would like others to use when they are
        referencing your data. Where available, journal citations are encouraged to promote the correct
        crediting for use of data.
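
Rules 4 and 5 are illustrated by the hypothetical sketch below, in which the contact fields are not known
and no metadata record has been prepared yet. The SQL syntax is assumed; the organization and
description values come from the example column above.

    -- Hypothetical sketch: a source with unknown contact information
    -- ('Unknown' defaults) and unknown/uninitialized metadata (MetadataID 0).
    INSERT INTO Sources (Organization, SourceDescription, ContactName, Phone,
                         Email, Address, City, State, ZipCode, Citation,
                         MetadataID)
    VALUES ('Utah Division of Water Quality',
            'Text file retrieved from the EPA STORET system',
            'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown',
            'Unknown', 'Unknown', 0);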

Table: SpatialReferences

The SpatialReferences table provides information about the Spatial Reference Systems used for latitude
and longitude as well as local coordinate systems in the Sites table. This table is a controlled vocabulary.
                                           B-46

-------
Field Name | Data Type | Description | Example | Constraint
SpatialReferenceID | Integer Identity | Unique integer identifier for each Spatial Reference System. | 37 | M, Unique, Primary key
SRSID | Integer | Integer identifier for the Spatial Reference System from http://www.epsg.org/. | 4269 | O
SRSName | Text (255) | Name of the Spatial Reference System. | "NAD83" | M; cannot contain tab, line feed, or carriage return characters
IsGeographic | Boolean | Value that indicates whether the spatial reference system uses geographic coordinates (i.e., latitude and longitude) or not. | "True", "False" | O
Notes | Text (unlimited) | Descriptive information about the Spatial Reference System. This field would be used to define a non-standard study area specific system if necessary and would contain a description of the local projection information. Where possible, this should refer to a standard projection, in which case latitude and longitude can be determined from local projection information. If the local grid system is non-standard then latitude and longitude need to be included too. | (none) | O
This table is pre-populated within the ODM. Changes to this controlled vocabulary can be requested at
http://water.usu.edu/cuahsi/odm/.

Table: SpeciationCV

The SpeciationCV table contains the controlled vocabulary for the Speciation field in the Variables table.
Field Name | Data Type | Description | Examples | Constraint
Term | Text (255) | Controlled vocabulary for Speciation. | "P" | M, Unique, Primary key; cannot contain tab, line feed, or carriage return characters
Definition | Text (unlimited) | Definition of Speciation controlled vocabulary term. The definition is optional if the term is self explanatory. | "Expressed as phosphorus" | O
This table is pre-populated within the ODM. Changes to this controlled vocabulary can be requested at
http://water.usu.edu/cuahsi/odm/.
                                           B-47

-------
Table: TopicCategoryCV

The TopicCategoryCV table contains the controlled vocabulary for the ISOMetadata topic categories.
Field Name | Data Type | Description | Examples | Constraint
Term | Text (255) | Controlled vocabulary for TopicCategory. | "InlandWaters" | M, Unique, Primary key; cannot contain tab, line feed, or carriage return characters
Definition | Text (unlimited) | Definition of TopicCategory controlled vocabulary term. The definition is optional if the term is self explanatory. | "Data associated with inland waters" | O
This table is pre-populated within the ODM.  Changes to this controlled vocabulary can be requested at
http://water.usu.edu/cuahsi/odm/.

Table: Units

The Units table gives the Units and UnitsType associated with variables, time support, and offsets.  This
is a controlled vocabulary table.
Field Name | Data Type | Description | Example | Constraint
UnitsID | Integer Identity | Unique integer identifier that identifies each unit. | 6 | M, Unique, Primary key
UnitsName | Text (255) | Full text name of the units. | "Milligrams Per Liter" | M; cannot contain tab, line feed, or carriage return characters
UnitsType | Text (255) | Text value that specifies the dimensions of the units. | "Length", "Time", "Mass" | M; cannot contain tab, line feed, or carriage return characters
UnitsAbbreviation | Text (255) | Text abbreviation for the units. | "mg/L" | M; cannot contain tab, line feed, or carriage return characters
This table is pre-populated within the ODM.  Changes to this controlled vocabulary can be requested at
http://water.usu.edu/cuahsi/odm/.
                                            B-48

-------
Table: ValueTypeCV

The ValueTypeCV table contains the controlled vocabulary for the ValueType field in the Variables and
SeriesCatalog tables.
Field Name | Data Type | Description | Examples | Constraint
Term | Text (255) | Controlled vocabulary for ValueType. | "Field Observation" | M, Unique, Primary key; cannot contain tab, line feed, or carriage return characters
Definition | Text (unlimited) | Definition of the ValueType controlled vocabulary term. The definition is optional if the term is self explanatory. | "Observation of a variable using a field instrument" | O
This table is pre-populated within the ODM.  Changes to this controlled vocabulary can be requested at
http://water.usu.edu/cuahsi/odm/.

Table: VariableNameCV

The VariableNameCV table contains the controlled vocabulary for the VariableName field in the
Variables and SeriesCatalog tables.
Field Name | Data Type | Description | Examples | Constraint
Term | Text (255) | Controlled vocabulary for Variable names. | "Temperature", "Discharge", "Precipitation" | M, Unique, Primary key; cannot contain tab, line feed, or carriage return characters
Definition | Text (unlimited) | Definition of the VariableName controlled vocabulary term. The definition is optional if the term is self explanatory. | (none) | O
This table is pre-populated within the ODM.  Changes to this controlled vocabulary can be requested at
http://water.usu.edu/cuahsi/odm/.

Table: Variables

The Variables table lists the full descriptive information about what variables have been measured.
                                           B-49

-------
Field Name | Data Type | Description | Example | Constraint | Default Value
VariableID | Integer Identity | Unique integer identifier for each variable. | 6 | M, Unique, Primary key | (none)
VariableCode | Text (50) | Text code used by the organization that collects the data to identify the variable. | "00060" used by USGS for discharge | M, Unique; allows only characters in the range of A-Z (case insensitive), 0-9, period ".", dash "-", and underscore "_" | (none)
VariableName | Text (255) | Full text name of the variable that was measured, observed, modeled, etc. This should be from the VariableNameCV controlled vocabulary table. | "Discharge" | M, Foreign key | (none)
Speciation | Text (255) | Text code used to identify how the data value is expressed (e.g., total phosphorus concentration expressed as P). This should be from the SpeciationCV controlled vocabulary table. | "P", "N", "NO3" | M, Foreign key | "Not Applicable"
VariableUnitsID | Integer | Integer identifier that references the record in the Units table giving the units of the data values associated with the variable. | 4 | M, Foreign key | (none)
SampleMedium | Text (255) | The medium in which the sample or observation was taken or made. This should be from the SampleMediumCV controlled vocabulary table. | "Surface Water", "Sediment", "Fish Tissue" | M, Foreign key | "Unknown"
ValueType | Text (255) | Text value indicating what type of data value is being recorded. This should be from the ValueTypeCV controlled vocabulary table. | "Field Observation", "Laboratory Observation", "Model Simulation Results" | M, Foreign key | "Unknown"
IsRegular | Boolean | Value that indicates whether the data values are from a regularly sampled time series. | "True", "False" | M | "False"
TimeSupport | Real | Numerical value that indicates the time support (or temporal footprint) of the data values. 0 is used to indicate data values that are instantaneous. Other values indicate the time over which the data values are implicitly or explicitly averaged or aggregated. | 0, 24 | M | 0 = Assumes instantaneous samples where no other information is available
TimeUnitsID | Integer | Integer identifier that references the record in the Units table giving the units of the time support. If TimeSupport is 0, indicating an instantaneous observation, a unit still needs to be given for completeness, although it is somewhat arbitrary. | 4 | M, Foreign key | 103 = hours
DataType | Text (255) | Text value that identifies the data values as one of several types from the DataTypeCV controlled vocabulary table. | "Continuous", "Sporadic", "Cumulative", "Incremental", "Average", "Minimum", "Maximum", "Constant Over Interval", "Categorical" | M, Foreign key | "Unknown"
GeneralCategory | Text (255) | General category of the data values from the GeneralCategoryCV controlled vocabulary table. | "Climate", "Water Quality", "Groundwater Quality" | M, Foreign key | "Unknown"
NoDataValue | Real | Numeric value used to encode no data values for this variable. | -9999 | M | -9999
The following rules and best practices should be followed when populating this table:

    1.  The VariableID field is the primary key, must be a unique integer, and cannot be NULL. This
        field should be implemented as an auto number/identity field.
    2.  The VariableCode field must be unique and serves as an alternate key for this table. Variable
        codes can be arbitrary, or they can use an organized system. VariableCodes cannot contain any
        characters other than A-Z (case insensitive), 0-9, period ".", dash "-", and underscore "_".
    3.  The VariableName field must reference a valid Term from the VariableNameCV controlled
        vocabulary table.
    4.  The Speciation field must reference a valid Term from the SpeciationCV controlled vocabulary
        table. A default value of "Not Applicable" is used where speciation does not apply. If the
        speciation is unknown, a value of "Unknown" can be used.
    5.  The VariableUnitsID field must reference a valid UnitsID from the Units controlled
        vocabulary table.
    6.  Only terms from the SampleMediumCV table can be used to populate the SampleMedium field.
        A default value of "Unknown" is used where the sample medium is unknown.
    7.  Only terms from the ValueTypeCV table can be used to populate the ValueType field. A default
        value of "Unknown" is used where the value type is unknown.
    8.  The default for the TimeSupport field is 0, which corresponds to instantaneous values. If the
        TimeSupport field is set to a value other than 0, an appropriate TimeUnitsID must be specified.
        The TimeUnitsID field can only reference valid UnitsID values from the Units controlled
        vocabulary table. If the TimeSupport field is set to 0, any time units can be used (e.g., seconds,
        minutes, hours); however, a default value of 103 has been used, which corresponds with hours.
    9.  Only terms from the DataTypeCV table can be used to populate the DataType field. A default
        value of "Unknown" can be used where the data type is unknown.
    10. Only terms from the GeneralCategoryCV table can be used to populate the GeneralCategory
        field. A default value of "Unknown" can be used where the general category is unknown.
    11. The NoDataValue should be set such that it will never conflict with a real observation value. For
        example, a NoDataValue of -9999 is valid for water temperature because we would never expect
        to measure a water temperature of -9999. The default value for this field is -9999.
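
Pulling these rules together, the hypothetical sketch below adds a discharge variable using the
documented defaults: an instantaneous TimeSupport of 0 with the default TimeUnitsID of 103 (hours)
and a NoDataValue of -9999. The SQL syntax is assumed, and the UnitsID value of 4 is taken from the
example column above.

    -- Hypothetical sketch: a discharge variable recorded with the documented
    -- defaults. IsRegular is a Boolean field (1 = True in this sketch).
    INSERT INTO Variables (VariableCode, VariableName, Speciation,
                           VariableUnitsID, SampleMedium, ValueType, IsRegular,
                           TimeSupport, TimeUnitsID, DataType, GeneralCategory,
                           NoDataValue)
    VALUES ('00060', 'Discharge', 'Not Applicable', 4, 'Surface Water',
            'Field Observation', 1, 0, 103, 'Continuous', 'Water Quality',
            -9999);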

Table: VerticalDatumCV

The VerticalDatumCV table contains the controlled vocabulary for the VerticalDatum field in the Sites
table.
Field Name | Data Type | Description | Examples | Constraint
Term | Text (255) | Controlled vocabulary for VerticalDatum. | "NAVD88" | M, Unique, Primary key; cannot contain tab, line feed, or carriage return characters
Definition | Text (unlimited) | Definition of the VerticalDatum controlled vocabulary term. The definition is optional if the term is self explanatory. | "North American Vertical Datum of 1988" | O
This table is pre-populated within the ODM. Changes to this controlled vocabulary can be requested at
http://water.usu.edu/cuahsi/odm/.
Appendix B. Data Versioning Within ODM


The main text of this document focuses on how ODM is structured to store observations data. It does not
address how to manage edits to data stored within ODM. Software applications based on ODM will have
functionality that allows data managers and database administrators to modify, delete, or otherwise edit
data stored within ODM. In addition, these software tools will provide functionality to create derived
datasets, i.e., datasets that are calculated or derived from data already stored in ODM (e.g., calculating a
time series of discharge from a time series of stage, or a time series of daily average temperature from a
time series of hourly observations). The purpose of this appendix is to clarify how data editing and
versioning can be managed within the ODM schema.

Data Series Defined

In order to fully grasp the concepts that follow, the idea of a "data series" in the context of ODM must be
clarified. A "data series" is an organizing principle within ODM. A data series consists of all of the data
values associated with a unique combination of site, variable, method, source, and quality control level.
An example of the full specification for a data series is: "all of the raw, unchecked
(QualityControlLevel) water temperature (Variable) values measured in the Logan River near Logan, UT
(Site) using a field temperature sensor (Method) by Utah State University (Source)." Each record in the
SeriesCatalog table of ODM represents a unique data series.
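
In query terms, retrieving a single data series therefore means filtering the DataValues table on all five
identifiers at once, as in the hypothetical sketch below (the key values are illustrative):

    -- Hypothetical sketch: one data series = one unique combination of the
    -- five identifying keys in DataValues.
    SELECT LocalDateTime, DataValue
    FROM DataValues
    WHERE SiteID = 7 AND VariableID = 4 AND MethodID = 2
      AND SourceID = 5 AND QualityControlLevelID = 0
    ORDER BY LocalDateTime;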

Rules for Editing and Deriving Data Series in  ODM

The following rules are suggested so that versioning of and edits to data series can be managed within the
ODM schema.  Software applications that work with ODM should follow these rules.  These rules are
based on  the default set of Quality Control Levels that are distributed with the ODM blank schema.

    1.  Data versioning should be done at the data series level - Within ODM, the concept of data
        versioning is related to the quality control level. Quality control level is a data series level
        attribute, and as such, changes to the quality control level should occur at the data series level
        rather than at the individual value level. For example, if an investigator wished to create a quality
        controlled Level 1 data series from a raw Level 0 data series, he/she should first make a copy of
        the raw Level 0 data series and then perform any edits and adjustments required in the quality
        control process to the copy. The edited copy then becomes the Level 1 data series, and the Level
        0 data series is preserved intact.
    2.  Data series with a QualityControlLevelCode of 0 cannot be edited - Level 0 data series represent
        raw data from sensors (e.g., stage measured by a water level recorder) or other products derived
        from raw data (e.g., discharge that is programmatically derived from stage before the stage values
        have been quality controlled). By definition, Level 0 data have not been quality controlled and
        may contain significant errors and bad values. However, Level 0 data series represent the source
        from which all other derived data series are based, and as such should remain intact for archive
        purposes. Level 0 data series should not be used for analysis unless no other adequate options are
        available, and only if the user is aware that the data are raw. Level 0 data series can be removed
        entirely from the database, but only by removing the entire data series.
    3.  Only one QualityControlLevel 0 data series can exist for a Site, Variable, and Method
        combination - Only one raw data series for a Site, Variable, and Method combination can exist
        within an ODM database. If multiple sensors are measuring the same variable at the same site,
        the method description would have to distinguish between the two.
    4.  Only one QualityControlLevel 1 data series can exist for each Site, Variable, and Method
        combination - Once a Level 0 data series has been loaded to the database, a Level 1 data series
        can be "derived" from that Level 0 data series. This is done by first making a copy of the Level 0
        data series, second changing the QualityControlLevel of the copy to 1, and last doing any
        necessary filtering or editing required so that the Level 1 data series is acceptable as quality
        controlled. In most cases, the majority of the values within a Level 0 data series and its
        corresponding Level 1 data series will remain the same. However, where instruments
        malfunction or other conditions are present that affect the raw data values, Level 0 values may be
        deleted, adjusted, or otherwise edited in creating the Level 1 data series.
    5.  Any edits to a data series are saved to that data series - Level 0 data cannot be edited. With
        Levels 1 or higher, however, software applications should be allowed to edit and delete values.
        Each time an edit is made, the result should overwrite the previous value within a data series. In
        other words, edits should not create new data series; they should modify an existing one. This
        will be true even where edits are done within multiple editing sessions. The editing software
        should record the method or basis for any data edits in appropriate method records.
    6.  Data series of Level 2 or higher can only be created from data series of Level 1 or higher -
        Derived data series of Level 2 or higher can only be created from data series of Level 1 or higher.
        If a user wishes to create a derived data series from a Level 0 data series (such as discharge from
        raw, unchecked stage values), that derived data series would also be Level 0.
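
As an illustration of rules 1 and 4, the hypothetical sketch below copies a Level 0 series to create its
Level 1 counterpart; the column list is abbreviated relative to the full DataValues table described in the
main text, and the key values are illustrative. Edits required by the quality control process would then be
applied to the Level 1 copy only.

    -- Hypothetical sketch: deriving a Level 1 series by copying the Level 0
    -- values and stamping the copy with QualityControlLevelID = 1.
    -- (Column list abbreviated; ODM's DataValues table has more fields.)
    INSERT INTO DataValues (DataValue, LocalDateTime, UTCOffset, DateTimeUTC,
                            SiteID, VariableID, MethodID, SourceID,
                            QualityControlLevelID)
    SELECT DataValue, LocalDateTime, UTCOffset, DateTimeUTC,
           SiteID, VariableID, MethodID, SourceID,
           1  -- the new quality control level
    FROM DataValues
    WHERE SiteID = 7 AND VariableID = 4 AND MethodID = 2
      AND SourceID = 5 AND QualityControlLevelID = 0;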
        APPENDIX C: DESCRIPTION OF ESF DATA LOADED TO THE ODM

                                      Sources
Loaded:      Sources have been successfully loaded to the ESF DB.
Comments:   Contacts had to be updated.
Total: Four sources
                                        Sites
Loaded:      Sites have been successfully loaded to the ESF DB.
Comments:   Sites had to be added. Elevations have been updated (Google Earth).
Total:  128 Sites.
SiteCode | SiteName | Latitude | Longitude | Elevation_m
UHL | Upper Hall Run at Clough Pike | 39.08526 | -84.295 | 270.34
SLT | South Lucy Tributary at Apple Rd | 39.04119 | -84.2129 | 254.8
SHC | Shaylor Crossing at County Lift Station in Shaylor Crossing Subdivision | 39.08114 | -84.2319 | 249.92
HST | Heiserman Stream at Rt 50 crossing and Experimental Stream Facility Intake | 39.14839 | -84.2565 | 159.4
KRT | Tributary to Kain Run at Rt 276 crossing | 39.06425 | -84.0719 | 262.72
SHA | South Harsha Tributary at Bantam Rd crossing State Park | 39.0072 | -84.1368 | 248.4
AYR | Aveys Run at Cincinnati Nature Center's "Stream Site B" | 39.12321 | -84.2463 | 185
OWT | Owensville Tributary at County lift station off of St. Louis Rd | 39.12775 | -84.1364 | 243.52
USR | Upper Salt Run at County lift station off of Shepard Rd. | 39.11801 | -84.2596 | 191.1
HWR | Howard Run at Biddings Rd Crossing | 39.12367 | -84.0077 | 263.64
FMR | Fourmile Run at Elick Rd crossing | 39.05586 | -84.1681 | 182.26
HLR | Hall Run at Roundbottom Rd crossing | 39.14039 | -84.259 | 158.49
LRC | Lucy Run at Steve Wilson's property off of SR222 | 39.05701 | -84.1795 | 182.57
GRR | Grassy Fork at Glancey-Marathon Rd crossing | 39.13286 | -84.0152 | 262.12
SHR | Shaylor Run at Bethel Rd Crossing | 39.11834 | -84.2163 | 169.16
CLC | Cloverlick Creek in State Park at Austin Rd off of 133 | 38.98984 | -84.0594 | 244.54
STC | Stonelick Creek at RT 50 crossing | 39.12203 | -84.1986 | 163.97
EFG | East Fork River @ Morgan Rd | 39.2191 | -83.9156 | 279.18
EFY | East Fork River @ Fayetteville | 39.18764 | -83.9381 | 27.95
EFM | East Fork River @ Marathon | 39.13799 | -84.0029 | 261.2
EFB | East Fork River @ Blue Sky Parkway | 39.11526 | -84.0246 | 258.46
ELI | East Fork River @ Williamsburg | 39.05272 | -84.0515 | 240.78
DWT | Drinking Water Treatment Plant | 39.05131 | -84.1361 | 212.74
DAM | East Fork River Below East Fork Lake Dam | 39.02614 | -84.1482 | 186.53
EFK | East Fork River @ Experimental Stream Facility | 39.14608 | -84.2519 | 154.53
EFC | East Fork Lake @ Confluence w/ Little Miami River | 39.15574 | -84.2884 | 150.56
ESF | Experimental Stream Facility Weather Station | 39.14723 | -84.2547 | 159.71
LEF | Lower East Fork Waste Water Treatment Plant Effluent | 39.14676 | -84.2559 | 158.18
2EFR11205 | Poplar Creek and Ohio 232, Bethel O. Quad. | 38.96139 | -84.1067 | 253.58
2EFR21001 | East Fork Lake, Cloverlick Inflow Tributary above Poplar Creek | 38.99917 | -84.0858 | 219.14
2EFR10002 | EFLMR at Ohio 32 Bridge at Williamsburg, USGS Gage, Williamsburg, O. Quad (Equivalent to EPA "ELI" and CCOEQ "EFRM34.8") | 39.0525 | -84.0506 | 240.78
2EFR20004 | East Fork Lake, EFLMR Main Inflow Tributary, Williamsburg, O. Quad | 39.01999 | -84.1311 | 216.09
2EFR20022 | East Fork Lake, New Site 2010 to account for mixing of major inflows | 39.01722 | -84.1017 | 207.56
2EFR20026 | East Fork Lake, New Site 2010 upstream point of 'narrows' | 39.01611 | -84.1114 | 215.18
2EFR20024 | East Fork Lake, at mile 24, 0.5mi S of Elk Lick, Ohio, on Batavia, OH Quad. At cross section from STA 20024 to mouth of Slabcamp Run, 2/10 of length from left bank | 39.02 | -84.1311 | 199.94
2EFR20025 | East Fork Lake, New Site 2010, Cove section North of STA 2EFR20024 | 39.02833 | -84.1319 | 195.06
2EFR20001 | Located on East Fork Lake at the Log Boom. (Lake Side of "DAM" near USACE Flow Control Structure) | 39.02917 | -84.0867 | 191.41
2EFR10000 | EFLMR at CO. RD. Bridge 0.1 Miles off Elk Lick Road, Batavia, O. Quad. (Near EPA "DAM" Site) | 39.02611 | -84.1478 | 186.22
EFL | East Fork Lake near Bob McEwen Drinking Water Treatment Plant Intake (Sampled from bridge to intake structure) | 39.03695 | -84.1387 | 212.74
CEC | Cemetery Creek | 39.08222 | -84.1767 | 193.54
1IJ00060001 | Ohio Asphaltic Limestone Corp; Final effluent discharge from sedimentation basin to an unnamed tributary of Turtle Creek | 39.24972 | -83.6814 | 354.77
1IJ00060002 | Ohio Asphaltic Limestone Corp; Final effluent discharge from sedimentation basin to an unnamed tributary of Dodson Creek | 39.24306 | -83.6825 | 338.92
1IN00287001 | Hanson Aggregates Midwest - Highland Quarry Industrial Discharge Facility | 39.19417 | -83.7222 | 315.45
1PB00105001 | Village of Lynchburg Municipal WWTP, Final Effluent | 39.23806 | -83.8003 | 297.47
1PB00105801 | Village of Lynchburg Municipal WWTP, Upstream Monitoring (Pearl St. Covered Bridge) | 39.24306 | -83.7953 | 299.3
1PB00105901 | Village of Lynchburg Municipal WWTP, Downstream Monitoring | 39.2375 | -83.8006 | 297.17
1PG00100001 | Rolling Acres Municipal WWTP Final effluent discharge to an unnamed tributary of Dodson Creek. Monitoring just after dechlorination. | 39.22222 | -83.6856 | 344.41
2EFR11204 | Poplar Creek and St Route 125 | 38.97379 | -84.1058 | 247.5
2EFR20005 | East Fork Lake near combined confluence of Cloverlick and Poplar Creek's inflows | 39.02336 | -84.1013 | 218.8
2EFR21000 | East Fork Lake near Cabin Run Inflow | 39.01527 | -84.0984 | 208.2
2EFR23001 | East Fork Lake near Slabcamp Run Inflow | 39.03919 | -84.1364 | 212.4
5MILECR0.5 | Five Mile Creek @ Bluesky; upstream from Bluesky Pkwy | 39.1136 | -84.0203 | 264.9
BARNS1.9 | Barnes Run @ Bethel-Concord; Bethel Concord Rd | 39.0082 | -84.0733 | 249.92
BRUSH0.3 | Brushy Fork RM 0.3 at Titus Road | 39.143 | -84.1447 | 199.02
CABIN1.5 | Cabin Run; At Campground Loop G | 39.0327 | -84.1019 | 237.7
CLOV5.1 | Clover Creek RM 1.9 at St. Rt. 133 | 38.9856 | -84.0528 | 250.84
CWL | Cornwell Farm, stream crossing at driveway to Stahl's Property | 39.18389 | -84.0133 | 286.19
DODSN0.1 | Dodson Creek RM 0.1 at mouth; Dodson Creek @ Crampton Rd. | 39.22285 | -83.8127 | 294.12
DWT-0 | Inside Bob McEwen Drinking Water Treatment Plant, Pumped from inside intake structure on lake | 39.05169 | -84.1356 | 261.2
DWT-PCD | Drinking Water Post Chlorine Disinfection, before discharge to distribution system | 39.05169 | -84.1356 | 262.4
DWT-PFL | Effluent of all filters currently operating | 39.05169 | -84.1356 | 262.1
DWT-PMN | Drinking Water Treatment Plant - Post Manganese addition before coagulant is added | 39.05169 | -84.1356 | 261.5
DWT-SET | DWT - effluent from settling basin | 39.05169 | -84.1356 | 261.8
E01 | Mesocosm 1 in the Experimental Stream Facility, 1003 US Route 50, Milford, OH 45150 | 39.14718 | -84.2548 | 159.7
E02 | Mesocosm 2 in the Experimental Stream Facility, 1003 US Route 50, Milford, OH 45150 | 39.14718 | -84.2548 | 159.7
E03 | Mesocosm 3 in the Experimental Stream Facility, 1003 US Route 50, Milford, OH 45150 | 39.14718 | -84.2548 | 159.7
E04 | Mesocosm 4 in the Experimental Stream Facility, 1003 US Route 50, Milford, OH 45150 | 39.14718 | -84.2548 | 159.7
E05 | Mesocosm 5 in the Experimental Stream Facility, 1003 US Route 50, Milford, OH 45150 | 39.14718 | -84.2548 | 159.7
E06 | Mesocosm 6 in the Experimental Stream Facility, 1003 US Route 50, Milford, OH 45150 | 39.14718 | -84.2548 | 159.7
E07 | Mesocosm 7 in the Experimental Stream Facility, 1003 US Route 50, Milford, OH 45150 | 39.14718 | -84.2548 | 159.7
E08 | Mesocosm 8 in the Experimental Stream Facility, 1003 US Route 50, Milford, OH 45150 | 39.14718 | -84.2548 | 159.7
EFRM0.7 | EFLMR RM 0.7 at S. Milford Rd. (Same as EPA EFC site) | 39.1533 | -84.2975 | 151.48
EFRM15.6 | East Fork @ State Route 222 | 39.0624 | -84.1789 | 176.47
EFRM34.8 | EFLMR @ Main st. autosampler; Williamsburg Main Street Bridge | 39.0525 | -84.0504 | 240.5
EFRM44.1 | EFLMR RM 44.1 at Blue Sky Park Rd. | 39.1158 | -84.025 | 259.98
EFRM60.6 | EFLMR RM 60.6 at US 50 Fayetteville Lift Station | 39.1865 | -83.9372 | 271.56
EFRM70.1 | East Fork @ Wise Road; Clinton Co Wise Road Bridge | 39.22513 | -83.8255 | 298.38
EFRM75.3 | East Fork @ Canada Rd; Canada Road Bridge | 39.2731 | -83.7812 | 303.87
EFRM9.1 | East Fork @ Stonelick-Olive Branch; upstream Stonelick-Olive branch bridge | 39.11869 | -84.2086 | 163.06
ERS | East Fork River Supply Monitoring Station in the Experimental Stream Facility, 1003 US Route 50, Milford, OH 45150 | 39.14718 | -84.2548 | 159.7
GRSSY0.2 | Grassy Fork @ GC-Marathon Rd autosampler; Glancy Corner-Marathon Road | 39.1329 | -84.0153 | 262.1
GRSSY3.0 | Grassy Fork Marathon Edenton; 5764 Marathon-Edenton Road | 39.1664 | -84.0092 | 274.31
GRSSY3.2 | Grassy Fork @ 131 Brown County; river left tributary, U.S. 131, Brown County | 39.1741 | -83.9982 | 278.88
HALL0.2 | Hall Run @ Roundbottom Rd | 39.1401 | -84.2593 | 158.8
HWRD0.4 | Howard Run @ Burdsall; Burdsall Road | 39.1239 | -84.0073 | 264.6
NLT | North Lucy Tributary at confluence with South Lucy Tributary | 39.04121 | -84.212 | 250.5
NWT | Cedarville Rd stream crossing north of Main St., Newtonsville OH | 39.185 | -84.095 | 265.47
PLEAS0.2 | Pleasant Run Hutchinson Prop; Hutchinson Road off SR 133 | 39.1084 | -84.0374 | 258.8
POPLR2.1 | Poplar Creek @ Macedonia; Macedonia Road Bridge | 38.982 | -84.101 | 237.4
SHYLR1.7 | Shayler Run @ Baldwin Rd | 39.1183 | -84.2163 | 169.2
SLAB0.5 | Slabcamp Run accessed via Greenbriar Rd; trailhead across from NE WTP pond | 39.0511 | -84.129 | 245.7
ST13.4 | Stonelick Creek RM 13.4 at 727 | 39.2097 | -84.0889 | 262.42
ST17.3 | Stonelick Creek @ Stonelick Tr; u/s from Stonelick Tr Private Drive | 39.232 | -84.044 | 268.2
ST5.7 | Stonelick Creek RM 0.9 at Anstaett Rd. | 39.1429 | -84.1492 | 195.67
STEFLMR | Stonelick Creek at US 50 | 39.1222 | -84.1991 | 164
TBS | Twin Bridges Road, stream crossing just east of horse camp | 39.03806 | -84.1203 | 252.97
ULREY1.05 | Ulrey Run @ St. Rt. 125; u/s from St. Rt. 125 | 39.0017 | -84.1515 | 239.3
1IN00116601 | USEPA Experimental Stream Fac | 39.1475 | -84.2553 | 160
1IN00123902 | CECOS International Inc | 39.12694 | -84.0519 | 271
1PA00005001 | New Vienna WWTP | 39.32889 | -83.7033 | 336
1PB00001001 | Batavia WWTP | 39.08222 | -84.1767 | 170
1PB00034001 | Williamsburg WWTP | 39.05667 | -84.0464 | 241
1PC00005001 | Milford STP | 39.16444 | -84.2836 | 153
1PD00024001 | Fayetteville Perry Twp WWTP | 39.18361 | -83.9392 | 272
1PH00031001 | Martinsville-Midland WWTP | 39.32333 | -83.8694 | 297
1PK00010001 | Middle East Fork Regional WWTP | 39.08861 | -84.1872 | 170
1PN00000002 | US DOA William H Harsha Lake | 39.02639 | -84.145 | 249
1PP00020001 | Stonelick State Park Campgrounds | 39.22444 | -84.0578 | 269
1PR00116001 | Cincinnati Nature Center - Rowe Woods Recreation | 39.12778 | -84.2472 | 233
1PT00077001 | Clermont NE Local Schools WWTP | 39.12806 | -84.1072 | 256
1PV00002001 | Holly Towne MHP | 39.00722 | -84.1783 | 262
1PV00009001 | Orchard Lake MHP | 39.19111 | -84.2464 | 246
1PV00034001 | Forest Creek MHP | 38.99083 | -84.1589 | 255
1PV00074001 | Royal Hills MHP | 39.18 | -84.2511 | 232
1PX00059001 | Locust Ridge Nursing Home Inc | 38.9875 | -84.0225 | 274
1PZ00029001 | Snow Hill Country Club | 39.34917 | -83.7158 | 335
1PN00000001 | US DOA William H Harsha Lake | 39.22 | -84.06 | 272
2PD00024002 | South Pleasant Street Lift Station Combined Sewer Overflow | 40 | -85 | -9999
DAMCS | Located on East Fork Lake at the Log Boom. (Lake Side of "DAM" near USACE Flow Control Structure) | 39.02 | -84.1536 | NULL
DRI | Site managed by Jake Beaulieu | 40 | -85 | NULL
DWF | Drinking Water Treatment Plant, Post Filtration, Sampled by Macke for Elovitz | 39.0369 | -84.1387 | NULL
GWS | ESF Ground Water Source | 39.14718 | -84.2548 | NULL
REF | Site managed by Jake Beaulieu | 40 | -85 | NULL
ROA | Site managed by Jake Beaulieu | 40 | -85 | NULL
TMC | Site managed by Jake Beaulieu | 40 | -85 | NULL
QAC | Quality Assurance Quality Control Sample | 39.14718 | -81.2548 | NULL
GWT | Grailville Treatment Wetland | 40 | -85 | NULL
FF | ?? | 40 | -85 | NULL
              Physico-chemical Data Water and Sediment - EXCEL Sheets
Variables
Loaded:
The variables listed below have been  loaded and mapped to the existing ODM vocabulary.
VariableCode is a unique identifier and consists of the AnalyteName-Matrix.
VariableCode | VariableName | Speciation
DRP-SW | Phosphorus, orthophosphate | P
NH4-SW | Nitrogen, NH4 | N
NH4-Urease-SW | Ammonium analyzed as an endpoint to 24hr Urease Assay | N
NO2-SW | Nitrogen, nitrite (NO2) nitrogen | N
NO2-3-SW | Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen | N
TDN-SW | Nitrogen, total dissolved | N
TDP-SW | Phosphorus, total dissolved | P
TN-SW | Nitrogen, total | N
TNH4-SW | Nitrogen, NH3 + NH4 | N
TNO2-SW | Nitrogen, nitrite (NO2) nitrogen | N
TNO2-3-SW | Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen | N
TOC-SW | Carbon, total organic | C
TP-SW | Phosphorus, total | P
TRP-SW | Phosphorus, orthophosphate dissolved | P
TUREA-SW | Urea, total | N
UREA-SW | Urea, dissolved (filtered) | N
DOC-SW | Carbon, dissolved organic | C
DOC-AD | Carbon, total organic | C
DOC-BP | Carbon, dissolved organic | C
DOC-DW | Carbon, dissolved organic | C
DOC-IG | Carbon, dissolved organic | C
DOC-WW | Carbon, dissolved organic | C
DRP-AD | Phosphorus, orthophosphate | P
DRP-BP | Phosphorus, orthophosphate | P
DRP-DW | Phosphorus, orthophosphate | P
DRP-IG | Phosphorus, orthophosphate | P
DRP-WW | Phosphorus, orthophosphate | P
NH4-AD | Nitrogen, NH4 | N
NH4-BP | Nitrogen, NH4 | N
NH4-DW | Nitrogen, NH4 | N
NH4-IG | Nitrogen, NH4 | N
NH4-Urease-AD | Ammonium analyzed as an endpoint to 24hr Urease Assay | N
NH4-Urease-BP | Ammonium analyzed as an endpoint to 24hr Urease Assay | N
NH4-Urease-DW | Ammonium analyzed as an endpoint to 24hr Urease Assay | N
NH4-Urease-IG | Ammonium analyzed as an endpoint to 24hr Urease Assay | N
NH4-Urease-WW | Ammonium analyzed as an endpoint to 24hr Urease Assay | N
NH4-WW | Nitrogen, NH4 | N
NO2-3-AD | Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen | N
NO2-3-BP | Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen | N
NO2-3-DW | Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen | N
NO2-3-IG | Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen | N
NO2-3-WW | Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen | N
NO2-AD | Nitrogen, nitrite (NO2) nitrogen | N
NO2-BP | Nitrogen, nitrite (NO2) nitrogen | N
NO2-DW | Nitrogen, nitrite (NO2) nitrogen | N
NO2-IG | Nitrogen, nitrite (NO2) nitrogen | N
NO2-WW | Nitrogen, nitrite (NO2) nitrogen | N
TDN-AD | Nitrogen, total dissolved | N
TDN-BP | Nitrogen, total dissolved | N
TDN-DW | Nitrogen, total dissolved | N
TDN-IG | Nitrogen, total dissolved | N
TDN-WW | Nitrogen, total dissolved | N
TDP-AD | Phosphorus, total dissolved | P
TDP-BP | Phosphorus, total dissolved | P
TDP-DW | Phosphorus, total dissolved | P
TDP-IG | Phosphorus, total dissolved | P
TDP-WW | Phosphorus, total dissolved | P
TN-AD | Nitrogen, total | N
TN-BP | Nitrogen, total | N
TN-DW | Nitrogen, total | N
TNH4-AD | Nitrogen, NH3 + NH4 | N
TNH4-BP | Nitrogen, NH3 + NH4 | N
TNH4-DW | Nitrogen, NH3 + NH4 | N
TNH4-IG | Nitrogen, NH3 + NH4 | N
TNH4-WW | Nitrogen, NH3 + NH4 | N
TN-IG | Nitrogen, total | N
TNO2-3-AD | Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen | N
TNO2-3-BP | Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen | N
TNO2-3-DW | Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen | N
TNO2-3-IG | Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen | N
TNO2-3-WW | Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen | N
TNO2-AD | Nitrogen, nitrite (NO2) nitrogen | N
TNO2-BP | Nitrogen, nitrite (NO2) nitrogen | N
TNO2-DW | Nitrogen, nitrite (NO2) nitrogen | N
TNO2-IG | Nitrogen, nitrite (NO2) nitrogen | N
TNO2-WW | Nitrogen, nitrite (NO2) nitrogen | N
TN-WW | Nitrogen, total | N
TOC-AD | Carbon, total organic | C
TOC-BP | Carbon, total organic | C
TOC-DW | Carbon, total organic | C
TOC-IG | Carbon, total organic | C
TOC-WW | Carbon, total organic | C
TP-AD | Phosphorus, total | P
TP-BP | Phosphorus, total | P
TP-DW | Phosphorus, total | P
TP-IG | Phosphorus, total | P
TP-WW | Phosphorus, total | P
TRP-AD | Phosphorus, orthophosphate dissolved | P
TRP-BP | Phosphorus, orthophosphate dissolved | P
TRP-DW | Phosphorus, orthophosphate dissolved | P
TRP-IG | Phosphorus, orthophosphate dissolved | P
TRP-WW | Phosphorus, orthophosphate dissolved | P
TUREA-AD | Urea, total | N
TUREA-BP | Urea, total | N
TUREA-DW | Urea, total | N
TUREA-IG | Urea, total | N
TUREA-WW | Urea, total | N
UREA-AD | Urea, dissolved (filtered) | N
UREA-BP | Urea, dissolved (filtered) | N
UREA-DW | Urea, dissolved (filtered) | N
UREA-IG | Urea, dissolved (filtered) | N
UREA-WW | Urea, dissolved (filtered) | N
TNH4-SWNR | Nitrogen, NH3 + NH4 | N
TUREA-SWNR | Urea, total | N
TN-SWNR | Nitrogen, total | N
TP-SWNR | Phosphorus, total | P
TUREA-GW | Urea, total | N
TRP-GW | Phosphorus, orthophosphate dissolved | P
TNH4-GW | Nitrogen, NH3 + NH4 | N
TNO2-DI | Nitrogen, nitrite (NO2) nitrogen | N
TNO2-3-SWNR | Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen | N
TNO2-SWNR | Nitrogen, nitrite (NO2) nitrogen | N
TRP-SWNR | Phosphorus, orthophosphate dissolved | P
TNO2-3-GW | Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen | N
TN-GW | Nitrogen, total | N
TP-GW | Phosphorus, total | P
Comments:    1.  The variables in the Excel sheets had to be changed to enable them to be
                 committed to the Database.
             2.  Specifications, VariableUnitsID and ValueType had to be checked.
             3.  VariableNameCV. The following new variable terms have been loaded to this
                 table: urea, dissolved (filtered) and urea, total.
             4.  File: ESF-DataValues-final.csv

OffsetTypes
Loaded:      The following new offset terms have been added:

             Tray 1.1A, Tray 1.2A, Tray 1.3A, Tray 1.1B, Tray 1.2B, Tray 1.3B,
             Tray 2.1A, Tray 2.2A, Tray 2.3A, Tray 2.1B, Tray 2.2B, Tray 2.3B,
             Tray 3.1A, Tray 3.2A, Tray 3.3A, Tray 3.1B, Tray 3.2B, Tray 3.3B,
             Tray 4.1A, Tray 4.2A, Tray 4.3A, Tray 4.1B, Tray 4.2B, Tray 4.3B,
             Tray 5.1A, Tray 5.2A, Tray 5.3A, Tray 5.1B, Tray 5.2B, Tray 5.3B,
             Center, Left, Right, Below Surface Water, Below Surface
           Sensor Data Water-immersed Sensors for Physico-chemical Variables

The sensor data files listed below were obtained from http://66.161.146.122/ESF Field Data/:

File | Date Modified | Size (bytes)
CEC.csv | Friday, July 08, 2011 1:34 PM | 1549191
CLC.csv | Friday, July 08, 2011 1:10 PM | 880313
FMR.csv | Friday, July 08, 2011 1:00 PM | 74762
HST.csv | Friday, July 08, 2011 1:43 PM | 14801139
HWR.csv | Friday, July 08, 2011 1:10 PM | 476831
KRT.csv | Friday, July 08, 2011 12:57 PM | 4335011
LRC.csv | Friday, July 08, 2011 1:00 PM | 4382015
OWT.csv | Friday, July 08, 2011 12:58 PM | 3695128
SHA.csv | Friday, July 08, 2011 12:58 PM | 4439957
SHC.csv | Friday, July 08, 2011 12:56 PM | 4352695
SHR.csv | Friday, July 08, 2011 1:02 PM | 1482854
SLT.csv | Friday, July 08, 2011 12:55 PM | 3915199
UHL.csv | Friday, July 08, 2011 12:55 PM | 4415080
USR.csv | Friday, July 08, 2011 12:59 PM | 3998020

Comments:    For all the sensor data files, a column was added that combines date and
             time, and a second column with OffsetTypeID (3, which describes Below
             Surface Water in ft).
Variables
Loaded:      The sensor data variables listed below have been loaded to the ODM.

VariableCode | VariableName | VariableUnitsID | ValueType | TimeSupport | TimeUnitsID | DataType | GeneralCategory
Temp | Temperature | 96 | Field Observation | 10 | 104 | Continuous | Water Quality
SPCOND | Specific conductance | 192 | Field Observation | 10 | 104 | Continuous | Water Quality
DoSat | Oxygen, dissolved percent of saturation | 1 | Field Observation | 5 | 104 | Average | Water Quality
DO | Oxygen, dissolved | 199 | Field Observation | 5 | 104 | Continuous | Water Quality
PH | pH | 309 | Field Observation | 5 | 104 | Continuous | Water Quality
ORP | Reduction potential | 169 | Field Observation | 5 | 104 | Continuous | Water Quality
TURB | Turbidity | 221 | Field Observation | 5 | 104 | Continuous | Water Quality
  APPENDIX D: DESCRIPTION OF SHEPHERD'S CREEK DATA LOADED TO THE
                                       ODM
                         Sources
Loaded:      Sources have been successfully loaded to the Shepherd Creek DB.
Comments:    Contacts had to be updated.
Total:

                           Sites
Loaded:      Sites have been successfully loaded to the SC DB.
Comments:    Sites had to be added. Elevations have been updated.
Total:       6 Sites.
SiteCode | SiteName | Latitude | Longitude | Elevation_m
CON | ConS(Catch) | 39.17329 | -84.5797 | 215
PWR | Pwr2(Sub1) | 39.18222 | -84.5814 | 242
DRI | Dri3(Sub2) | 39.1773 | -84.5785 | 225
ROA | Roa4(Sub3) | 39.17879 | -84.576 | 233
URB | Urb6(Sub4) | 39.18235 | -84.5746 | 242
REF7 | Ref7(Sub5) | 39.17649 | -84.5781 | 223
                      Physico-chemical Data Water - Excel Sheet
Variables
Loaded:
The variables listed below have been loaded  and mapped to the  existing ODM vocabulary.
VariableCode is a unique identifier and consists of the AnalyteName-Matrix.
VariableCode | VariableName | Speciation
TP | Phosphorus, total | P
Temp | Temperature | Not Applicable
TN | Nitrogen, total | N
Ecoli | E-coli | Not Applicable
Turb | Turbidity | Not Applicable
Alk | Alkalinity, carbonate | CaCO3
DOC | Carbon, dissolved organic | Unknown
TOC | Carbon, total organic | Unknown
Cl | Chloride | Cl
Br | Bromide | Br
NO3 | Nitrogen, nitrate (NO3) | NO3
SO4 | Sulfate | SO4
DIN | Nitrogen, dissolved inorganic | Unknown
TKN | Nitrogen, total kjeldahl | Unknown
NH3-N | Nitrogen, NH3 | N
TDP | Phosphorus, total dissolved | Unknown
Al | Aluminum, dissolved | Al
Ca | Calcium | Ca
Cu | Copper, dissolved | Cu
Fe | Iron, dissolved | Fe
Mg | Magnesium | Mg
Mn | Manganese, dissolved | Mn
K | Potassium | K
Na | Sodium | Na
Zn | Zinc, dissolved | Zn
oPO4 | Phosphorus, orthophosphate | PO4
SSC | Suspended Sediment Concentration | Not Applicable
ZnTR | Zinc, total reactive | Zn
MnTR | Manganese, total reactive | Mn
FeTR | Iron, total reactive | Fe
CuTR | Copper, total reactive | Cu
AlTR | Aluminum, total reactive | Al
Comments:    1.  The variables in the Excel sheets had to be changed to enable them to be
                 committed to the Database.
             2.  Specifications, VariableUnitsID and ValueType had to be checked.
             3.  VariableNameCV. The following new variable terms have been loaded to this
                 table: Zinc, total reactive; Manganese, total reactive; Iron, total reactive;
                 Copper, total reactive; Aluminum, total reactive.
             4.  File: ShepherdCreek-DataValues-final.csv
          APPENDIX E: HYDROLOGY OF JACOB'S WELL SPRING

   A tutorial for using HydroDesktop to discover and access water data

              Presented at the University of Cincinnati

                       September 6, 2011

                              by:

          Dr. Tim Whiteaker (twhit@mail.utexas.edu)
          Center for Research in Water Resources
           The University of Texas at Austin

                              and

                     Dr. David Tarboton
                     Utah State University
Distribution
Copyright © 2011, The University of Texas at Austin.
All rights reserved.

Funding
Funding for this document was provided by the Consortium of Universities for the Advancement of
Hydrologic Science, Inc. (CUAHSI) under National Science Foundation Grant No. EAR-0622374. In
addition, much input and feedback has been received from the CUAHSI Hydrologic Information System
development team. Their contribution is acknowledged here.
Table of Contents
Introduction	4

    About Jacob's Well Spring	4
    Goals and Objectives	6
    Computer and Data Requirements	6
    Participating in the Open Source Community	7
Exercise Procedure	8

    Getting To Know HydroDesktop	8
    Creating a Project	9
    Searching for Hydrologic Data	10
    Selecting Data for Download	13
    Downloading Data	15
    Visualizing Time Series Data	16
    Labeling Features	18
    Delineating Watersheds	20
    Searching for Additional Data	21
    Adding Data to a Theme	22
    Exporting Data	23
Advanced: Analysis with R	25

    Enabling the HydroR Plug-in	25
    Plotting a  Graph with R	26
    Analyzing Flow in Jacob's Well Spring	27
Appendix A: R Scripts	30

    Script 1: Preparing Inputs for Flow Analysis	30
    Script 2: Computing Surface Water Flow Fraction	30
References	32
                                            E-3

-------
                                        Introduction

CUAHSI-HIS enables sharing of water data
The Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) Hydrologic
Information  Systems  project (CUAHSI-HIS) has devoted itself to improving access to time series of
water data.  Towards that end, CUAHSI-HIS has developed standards for sharing data that make it easier
to ask for water data and interpret what comes back from a given data  source.   CUAHSI-HIS  also
maintains a  catalog of data available  from organizations that use  CUAHSI-HIS standards, essentially
serving as a  search engine for water data. The result is a universal mechanism for accessing time series
data, greatly simplifying the typically laborious task of getting the data you need to do your analysis.  But
how do people who are unfamiliar  with  CUAHSI-HIS  standards  use this system?  That's  where
HydroDesktop comes  in.

HydroDesktop uses CUAHSI-HIS to help you find water data
HydroDesktop is a free and open source Geographic Information  Systems  (GIS) application that helps
you discover, use, and manage hydrologic time series data published with CUAHSI-HIS. It handles the
details of how to work with CUAHSI-HIS so that you don't have to. HydroDesktop includes data query,
download, visualization, graphing,  analysis and modeling capabilities. The result is a spatially-enabled
system that facilitates  the aggregation of observational data describing the water environment.

Let's use HydroDesktop to learn about Jacob's Well Spring
This document presents an exercise that  shows how to use HydroDesktop to find water data for Jacob's
Well Spring  in Texas. With some simple analysis, you will compare characteristics  of this groundwater-
dominated system with those of a nearby river.  During the exercise, you will learn about some of the
most commonly-used  tools in HydroDesktop.

Related Links:
HydroDesktop - http://hydrodesktop.codeplex.com/
CUAHSI Hydrologic Information System - http://his.cuahsi.org/

About Jacob's Well Spring
The underwater cave  known as Jacob's  Well emerges in Hays County, Texas, at Jacob's  Well  Spring
where it serves as one of the primary sources of water for Cypress  Creek, which later flows into the
Blanco River. The  clear, crisp water cools down many Texans as it moves through  the Blue Hole
swimming area near Wimberley,  Texas.   This  karst spring has  been impacted in recent  years by
development in Hays  County and increasing demands  on the Middle Trinity Aquifer (Davidson, 2008).
                                            E-4

-------
                        Figure 0-1 Jacob's Well Spring (San Marcos Local News, 2009)
In 2005, a monitoring station was installed at Jacob's Well Spring 18 meters below the ground surface,
reporting flow and temperature conditions at  15-minute intervals.  The data for this station are accessible
via the US Geological Survey's National Water Information System (USGS NWIS).
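
As an aside, the same NWIS records can also be retrieved directly in R with the USGS dataRetrieval
package, outside of HydroDesktop. The sketch below is illustrative only: the site number shown is
assumed to be the Jacob's Well Spring station and should be verified against NWIS, and the
daily-mean column name follows the package's default naming convention.

# Sketch: pulling daily discharge for the 2010 water year straight from NWIS
library(dataRetrieval)
site <- "08170990"   # assumed NWIS number for Jacobs Well Spg nr Wimberley, TX; verify before use
flow <- readNWISdv(site, parameterCd = "00060",        # 00060 = discharge, cfs
                   startDate = "2009-10-01", endDate = "2010-09-30")
plot(flow$Date, flow$X_00060_00003, type = "l",
     xlab = "Date", ylab = "Discharge (cfs)")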
                      Figure 0-2 Cross-sectional diagram of Jacob's Well (Davidson, 2008)
                                                E-5

-------
            Figure 0-3 Jacob's Well Spring Monitoring Station (United States Geological Survey, 2007)
For more information about the spring, please read the 2008 Master's thesis of Sarah Cain Davidson from
The University of Texas at Austin.

Goals and Objectives
The goal of this exercise is to introduce you to the tools and functions available in HydroDesktop that
allow you to search for and synthesize hydrologic time series data in an area of interest. This exercise will
teach you how to find and obtain data for Jacob's Well Spring in Texas and compare data characteristics
using the analysis capabilities of HydroDesktop.

Objectives for this exercise include:
    •  Find streamflow and temperature data for Jacob's Well Spring in Texas.
    •  Identify useful time series and download them.
    •  Visualize time series data in graphs.
    •  Export time series data for use in other programs.

Computer and Data Requirements
At the time of this writing, HydroDesktop is still in beta stages, and thus changes are made frequently to
fix  bugs and enhance the  software.  Therefore, it is  recommended that you install the  version  of
HydroDesktop that was used to prepare this exercise: Version 1.2.591 Beta Release. This version is only
compatible with a Windows operating system such as Windows XP or Windows 7. You will also need
an Internet connection since you will be accessing  online resources to download time series data.

To install HydroDesktop:
       1.  In a Web browser, navigate to http://hydrodesktop.codeplex.com/.
       2.  Click the Downloads link near the top left of the page.
       3.  Find the link for the 1.2.601 Beta Release installer and click it.
       4.  Read the license and agree to it.
       5.  Save and run the installer, accepting all defaults. The installer will guide you through the
           rest.
                                             E-6

-------
An advanced portion of the exercise involves using the R statistical package within HydroDesktop.  R is a
separate program from HydroDesktop, so you will need to install it if it is not already installed on your
computer. It is not included with the HydroDesktop installation.

To install R:
       1.  In a Web browser, navigate to http://www.r-project.org/.
       2.  Click the download R link.
       3.  Click the link for a download site near your current location.
       4.  Click the link for your operating system (most likely Windows if you are using
           HydroDesktop).
       5.  Click the link for the base installation.
       6.  Click the link to Download  R.
       7.  Run the setup file and follow the instructions to complete the installation.

Participating in the Open Source Community

HydroDesktop is an open source product, which means that anyone can see the source code used to create
the program and contribute to its development. Even if you aren't a programmer, you can still participate
in the discussion forums and post bugs or feature requests in the issue tracker.

The  home for HydroDesktop is on CodePlex, a Web  site  for open source software.   To add to the
discussions or post  a bug, you  must first register for your free CodePlex account.  Once you have a
CodePlex account, you can log in at http://hydrodesktop.codeplex.com/ and start contributing. The
community is really what drives open source  software development, so this is an exciting opportunity to
make your voice heard.

You are encouraged to provide feedback on any issue or problem you may encounter throughout this
exercise.   Feel free  to utilize online resources such as the issue tracker on the HydroDesktop  Web site
when providing feedback.  In this exercise you'll learn how to access  these resources directly through
HydroDesktop.
                                              E-7

-------
                                     Exercise Procedure
Suppose you live in Hays County in Texas, and for years you have enjoyed taking a dip in the Blue Hole
swimming area along Cypress Creek during hot Texas summers.  As population growth and increased
groundwater pumping threaten Jacob's Well  Spring, the primary source of water for Cypress Creek, you
decide to learn more about this valuable  resource.  In this exercise, you'll use HydroDesktop to find
temperature data and see how it compares to a nearby river.	
IMPORTANT
At the time of this writing, HydroDesktop is still in the beta stages of software development and thus still contains
bugs.  We are working hard to fix these bugs, but you may want to closely and carefully follow the exercise
procedure in the meantime in order to minimize bugs encountered.

Getting To Know HydroDesktop
Let's  open HydroDesktop and get to know  its user interface.
       1.   Open HydroDesktop (Start | All Programs | CUAHSI HIS | HydroDesktop | HydroDesktop).
       2.   Choose to create a new North  America project and click OK.

Take  a moment to  explore the user interface. As you can see, HydroDesktop looks much like a typical
GIS  interface. It  supports  complex layer symbologies, access to  online map services, and custom
programmed tools and plugins. It even comes with some basemap shapefile data which are already  added
to the map.  What sets HydroDesktop apart from other GIS applications is the ability to query for
hydrologic time series data.

Notice that HydroDesktop presents many of its controls on a ribbon, much like modern versions of
Microsoft Office.  The ribbon is  organized into tabs which contain groups of buttons and tools. There is
also an orb for accessing basic functions like saving and printing.
                 [Figure: The HydroDesktop user interface, with the HydroDesktop Orb Button highlighted]
-------
If you have comments or issues as you work through this exercise, you can find helpful resources on the
Help tab.  The buttons on this tab let you view documentation, jump to the discussion forums or issue
tracker, email for help, or submit a comment.
       3.  Click the Help tab in the ribbon to view the buttons available on that tab.
       4.  Click the Issues button to open the issue tracker on the HydroDesktop Web site.
                         Figure 0-2 Using the Help Tab To Open the Issue Tracker

       5.  Close the Issue Tracker Web page.

Creating a Project
HydroDesktop manages your  work within projects.   A HydroDesktop project file  (.hdprj) contains
information about what geospatial layers you have in your map and how those layers are symbolized.
These layers are stored in shapefiles, a widely available GIS data format. The shapefiles such as state
boundaries  that  are   included  with HydroDesktop   are  located  in  its  installation folder,  e.g.,
C:\Program Files\CUAHSI HIS\HydroDesktop\maps\BaseData.

The HydroDesktop project file also connects your work to a database (.sqlite file) where temporal data are
stored. This is where the time series data that you download through  HydroDesktop  are  saved.  A
relational database is much more efficient at storing time series data than shapefiles, and HydroDesktop
uses a free database called SQLite for this purpose.
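
Because the project database is an ordinary SQLite file, it can also be opened directly outside
HydroDesktop, for example from R with the RSQLite package. The sketch below is illustrative only and
assumes a project database named springs.sqlite (created in the steps that follow) containing the
ODM tables, including the DataValues table used later in this exercise.

# Sketch: opening a HydroDesktop project database directly with RSQLite.
# Adjust the file path to wherever the project database was saved.
library(DBI)
library(RSQLite)
con <- dbConnect(RSQLite::SQLite(), "springs.sqlite")
dbListTables(con)                      # list the tables in the database
head(dbReadTable(con, "DataValues"))   # peek at the stored time series values
dbDisconnect(con)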

You can create projects to organize your work, and you can save the project so that you can open it again
later.  When you first open HydroDesktop, it sets up a clean map and loads the default system database.
In order  to better manage the work in  this exercise, you will give this project a name and  save it
somewhere meaningful to you.

To save the project and database:
       1.  Click the HydroDesktop Orb button.
       2.  In the Orb menu, click Save Project.
       3.  Choose a location to save your project such as your desktop.
       4.  Name the project springs and then click Save.

The text  in the title bar of the HydroDesktop window  should now include the name of your project.
HydroDesktop has also created a database for your project named "springs.sqlite." This database is saved
in the same location as your project file. You are now ready to work within your newly saved project and
database.
                                              E-9

-------
Searching for Hydrologic Data

When searching for data in HydroDesktop, you can specify the following filters: region of interest, time
period of interest, data source and variables of interest. HydroDesktop then searches the CUAHSI-HIS
national catalog of known time  series data to find locations of time  series that match your search.
Locations of time series data that match your search are presented in the map.  These results include
information that HydroDesktop can use to connect to each individual data provider for data access. You
can further filter the results and then choose which  data you want to actually download and store in your
database.

When you save data to your database, it is stored as a theme.  A theme is a collection of hydrologic time
series data that share a common  relationship. A theme can be anything from a geographic space (e.g.,
Texas, Colorado) to a hydrologic event (e.g., flood, hurricane) to a combination of both (e.g., Texas
Flood).  Simply put, a theme organizes a collection of related time series. HydroDesktop can save data to
a new theme or append data to an existing theme. The workflow for finding data and saving it to a theme
is shown in Figure 0-3.


     [Workflow diagram: Start; Select Region (where); Select Variable(s) (what); Select Time Period
     (when); Select Service(s) (who); Filter Results; End]
                          Figure 0-3 Workflow for Searching for Hydrologic Data

In this exercise, you will locate streamflow and temperature data for one water year near Jacob's Well
Spring.  The county boundary for Hays County is included in the U.S. Counties layer that is already in the
map.  You'll use this boundary to restrict the area being searched.
        1.  Click the Home tab in the ribbon to activate it.
        2.  In the Search Panel on the right, under the Area tab, choose U.S. Counties in the list of
           Active Layers.  The map activates this layer while the Search Panel shows the fields in this
           layer.
        3.  Under Select Search Parameter, scroll through the names until you find Hays, TX. Click
           Hays, TX to select it. The map zooms to the county and highlights it in blue.
                                             E-10

-------
                                  Figure 0-4 Choosing a Search Area
 Tip
 As you build your query, the current search parameters are shown in the Search Summary at the bottom of the
 Search Panel.  To give yourself more room to work, you can hide the Search Summary by clicking the Hide Search
 Summary button.  Click the button again to show the Search Summary once it is hidden.
Next you will tell HydroDesktop the date range of time series that you want.  For this exercise, search for
data available in the 2010 water year (i.e., 10/1/2009 to 9/30/2010).
        4.  In the Search Panel, click the Options tab to activate it.
        5.  Specify a Start Date of 10/1/2009 and an End Date of 9/30/2010. You can click and type
            the numbers in directly, or you can click the drop down arrow next to the date to open an
            interactive calendar.

 Tip"
 On the Options tab, you can also select specific Web services (i.e., data sources) to query. The default is to
 search all Web services.

Next you will tell HydroDesktop what hydrologic variables you want.  To help you in this regard,
HydroDesktop employs a list of official CUAHSI-HIS keywords for hydrologic variables. Data providers
use this list when registering with CUAHSI-HIS.  This is a lot easier than typing whatever term the data
provider may be using internally (e.g., 00060 for USGS streamflow).
        6.  In the Search Panel, click the Keywords tab to activate it.
        7.  Start typing "streamflow" in the Keywords text box. The list of keywords automatically
            selects keywords that match your search.
        8.  Click Streamflow in the list of keywords.
                                               E-11

-------
                       Figure 0-5 Choosing a Hydrologic Variable Keyword to Search

When you click Streamflow, the keyword list automatically jumps to the term "Discharge, stream."  It
just so happens that "Discharge, stream" is the official keyword for what we think of as streamflow.
However, the keyword list also includes synonyms like "streamflow" to make it easier to find the variable
you're after.  To the right of the keyword list, you can see where stream discharge fits within the overall
hierarchy of hydrologic variables.

Now that you've identified the right keyword, you'll tell HydroDesktop to add that keyword to the list of
variables for which it will search.
        9.  In the Search Panel, click the Add button to add "Discharge, stream" into the Selected
            Keywords box.

Now you will repeat the process to add the water temperature keyword.
       10. In the Keywords text box, start typing "temperature."
       11. Click Temperature, water in the list of keywords.
       12. In the Search Panel, click the Add button to add "Temperature, water" into the Selected
           Keywords box.

With search parameters set, you will now tell HydroDesktop to run the search for data.
       13. In the Search Panel, click  Run Search.

When you run  a search, HydroDesktop asks the CUAHSI-HIS  national catalog for descriptions of time
series that match your search criteria.  At this point, your software is using a remote online resource and
bringing back data to display in your map. After HydroDesktop has finished searching for time series, it
displays the locations of time series that fit your search in a map layer called "Search Results." Note that
there will be several "dots" at a single location in the map if the site represented by a given dot measures
more than one time series that matches your search.  In other words, each dot in the map represents a time
series of data. Different symbols for the dots indicate different data sources.
                                             E-12

-------
          Figure 0-6 Locations of Streamflow and Water Temperature Observations in Hays County, TX

While the search may have seemed fast, remember that your map is only showing where time series of
interest are located, and that you haven't actually downloaded any time series values yet. Now you can
begin to refine these  search results to locate time series that you actually want to download and save to
your database.

Selecting Data for Download
To help you identify time series of interest, HydroDesktop includes data and tools to give you a spatial
context for the data.   One of these is the ability to display online basemaps  from ESRI, Bing, and
OpenStreetMap. These are beautiful cartographic maps cached at multiple scales which are accessed in
real time as  you move around in the HydroDesktop map. For this exercise, you will enable the ESRI
Hydro Base Map. This map  shows a nice blend of hydrologic features and administrative boundaries.

To enable the basemap:
       1.  On the Home tab in the ribbon, find the Online Basemap group.  Click the drop down list of
           basemaps and choose ESRI Hydro Base Map.
                                       ESRI Hydro Base Map
                                        Opacity 100 -

                                           Online Basemap
                                Figure 0-7 Enabling an Online Basemap

In addition to providing spatial context, you can see that this basemap can help you produce a more
aesthetically pleasing printed map.

For this exercise, you will work with the sites and variables shown in Table 0-1.  You will select the
features that  represent these time series so that HydroDesktop knows which time series you want to
download. These time series are located at or near Jacob's Well Spring.
                       Table 0-1 Selected Time Series in the San Marcos River Basin
SiteName                               VarName                               DataType
Blanco Rv nr Kyle, TX                  Discharge, cubic feet per second      Average
Jacobs Well Spg nr Wimberley, TX       Discharge, cubic feet per second      Average
Jacobs Well Spg nr Wimberley, TX       Temperature, water, degrees Celsius   Average
Blanco Rv at Halifax Rch nr Kyle, TX   Temperature, water, degrees Celsius   Average
                                             E-13

-------
To select time series for download:
       1.  In the Map Contents on the left, right-click on the Search Results layer name and click View
           Attributes.
                         Figure 0-8 Viewing the Search Results Layer Attributes
The Attribute Table Editor opens showing you descriptions of time series in the Search Results layer.
You can scroll through the table and resize columns to see the information. Some key columns to note
are:
    •   SiteName - The name of the monitoring point where the time series is recorded.
    •   VarName - The name of the variable represented by the time series.
    •   DataType - Some sites report several statistics  of data.  For example, at Jacob's Well Spring, you
       can find minimum, maximum, and average streamflow values computed on daily time step.
    •   StartDate, EndDate, ValueCount - These fields give you a sense of the overall period of record for
       a time series and the number of values for its period of record.

As indicated in Table 0-1, you will focus on average streamflow and temperature for Jacob's Well Spring,
and you'll  also choose a couple of time series from nearby sites for comparison.
       2.   In the Attribute Table Editor,  use the values in the SiteName, VarName, and DataType
           columns to  locate rows that match the values in Table 0-1.  While holding down the CTRL
           key, left click on these rows to select them.
                                            E-14

-------
                                 Figure 0-9 Selecting Time Series for Download

        3.  Close the Attribute Table Editor.

With the items selected, you are ready to download the data.

Downloading Data

Recall that when you save data, it is organized into a Theme and stored in the database associated with
your HydroDesktop project. You'll now download the data that you selected.

To download data:
        1.  In the Search Panel, in the Results tab, type "Hays County Data" in the New Theme text box,
            and make sure New Theme is selected.
        2.  Click Download Data.
                                         Figure 0-10 Saving a Theme

A download manager opens to show progress of the download.
                                                  E-15

-------
                                   Figure 0-11 Download Manager

        3.  Dismiss the message box and hide the download manager when it is finished.

Once the download is complete, the new theme is shown under the Themes group in the Map Contents.  If
you create additional themes, they will also appear in the Themes group.

Now that you've downloaded the data, you can view the data in both tabular and graph form.

Visualizing Time Series Data
HydroDesktop takes  a series-centric view of temporal data, meaning that it provides access to the data at
the time series level. An example of a time series is all  of the temperature values measured at a certain
point on the Blanco River. Let's take a look at the time series that you just downloaded.

To visualize time series data in HydroDesktop:
        1.  Click the Graph tab in the ribbon to activate it.

On the left you will see a list of all the time series in your database. You could use filters to restrict the
time series that are shown, but you only retrieved a handful in this exercise so it's fine to leave the default
view.
        2.  Click to place a check next to Temperature at the Blanco River at Halifax Ranch.

From the graph you  can see how the temperature in the water changes with the seasons throughout the
year. Now let's compare this time series with the one for Jacob's Well Spring.
        3.  Place a check next to Temperature at Jacob's Well Spring.

HydroDesktop  allows  you  to visualize  multiple  time series  on the  same  graph.  The plot  axes
automatically adjust  to fit your data.  In this example, there  is a dramatic difference between the two
temperature time series. The one for Jacob's Well Spring shows much less variation throughout the year
than the one for the Blanco River at Halifax Ranch.
                                              E-16

-------
                     [Figure: Graphing the two water temperature time series]
-------

                        Figure 0-13 Examining Changes in Flow and Temperature

       7.  Uncheck the two temperature time series, and place a check next to Discharge at the
           Blanco River near Kyle.

The flow of the Blanco River dwarfs that of Jacob's Well Spring, but you can still see increases in flow in
the Blanco River at about the same time  as those observed in the spring.  Also notice the peak flow in
September 2010.  This flow is the result of Tropical Storm Hermine as it swept through Texas.

Labeling Features
Are you  wondering where the time series that you just graphed are  located?  In this portion of the
exercise,  you will add labels to the map to identify site locations.
       1.  Click the Home tab in the ribbon to activate it.
       2.  In the Map Layers, uncheck Search Results to turn off that layer.  Now you should only see
           the three sites for which you downloaded data.
       3.  In the Map Layers, right-click on the Hays County Data layer name and click Labeling |  Label
           Setup.
                                            E-18

-------
                           Figure 0-14 Accessing Label Setup
4.  In the Feature Labeler dialog, in the list of Field Names, double-click SiteName to use that
    field for the labels. [SiteName] will be added to the text box near the bottom of the dialog.
                         Figure 0-15 Choosing a Field for Labels
5.  Click Apply, and then click OK. Your map should now be labeled according to the settings
    you have just chosen.  You can clearly see Jacob's Well Spring and the two sites along the
    Blanco River.
                                       E-19

-------
                                    Figure 0-16 Labeling USGS Sites

Delineating Watersheds

At this point,  it might be nice to see precipitation data in this watershed for this water year.  You can
delineate watersheds for any river in the conterminous U.S. using a Web service provided by the EPA.
All you have to do is click on the desired watershed outlet location in the map, and then HydroDesktop
sends that point location to the  EPA service.  The service figures out which National Hydrography
Dataset (NHD) reach the clicked point is closest to, and then finds all catchments that the reach drains.
The catchments are merged into a single watershed and returned to HydroDesktop.

Note that the watershed returned is for the outlet of the entire reach, so if the point you clicked isn't at the
reach outlet, then the resulting watershed will include some additional area downstream of your clicked
point.  Thus,  this tool  is useful  for helping to identify an area of interest but  should not be used  to
determine watershed  parameters  such as area.  Future versions of the tool will support more precise
delineation.

In this portion of the exercise you will delineate a watershed for the area draining to  Jacob's Well Spring.
The watershed delineation tool is part of a HydroDesktop extension called EPA Delineation.

To  delineate a watershed for the area draining approximately to the Jacob's Well Spring location:
    1.   On the Home tab, in the EPA Tool panel, click Delineate to activate the delineation tool.
    2.   The tool prompts you for where to save the resulting  datasets. Accept the defaults by clicking
        OK.
    3.   Click on the site location for Jacob's Well Spring.

After a moment, the watershed is shown in the map.  The NHD  reaches flowing to  the point that you
clicked and the point itself are also shown.
                                   Figure 0-17 Delineated Watershed

If you didn't get the correct watershed delineated then you can activate the tool and try again. It's OK to
overwrite previous results.
  Note
  Recall that the watershed is actually delineated for the outlet of the nearest NHD reach, which happens to be
  very close to Jacob's Well Spring in this example.  Also be aware that the surface watershed you just delineated
  defines some but not all of the area contributing water to the aquifer for Jacob's Well Spring. However, the area
  will suffice for this exercise which  merely demonstrates how to delineate watersheds and use those watersheds
  to find data.
                                              E-20

-------
With the watershed delineated, now you're ready to search for data in this watershed.
Searching for Additional Data
You can append this data to your current theme by performing another search.  In this exercise, you'll
search for daily precipitation data in the watershed for Jacob's Well Spring.

To demonstrate how to choose  a particular data source for a  search, you will select to search for
precipitation data from the National Weather Service.
        1.  Click the Home tab in the ribbon to activate it if it is not already active.
        2.  In the Search Panel,  under the Area tab, choose watershed as the active layer.
        3.  Under Select Search Parameter, select the only item present.
                               Figure 0-18 Selecting a Watershed to Search
        4.  In the Options tab, click the Show Web Service Selection Panel checkbox.
        5.  Click Select None.
        6.  In the list of services, place a check next to NWS-WGRFC Daily Multi-sensor Precipitation
           Estimates. This service provides precipitation data from the National Weather Service West
           Gulf River Forecast Center.
                              Figure 0-19 Selecting a Single Service to Search

        7.  In the Search Panel, click the Keywords tab to activate it.
        8.  In the Selected Keywords list, select all keywords from the previous search.
                                              E-21

-------
        9.  Click the Remove button to remove those keywords from the search.
                             Figure 0-20 Removing Keywords from a Search
       10. In the Keywords text box, start typing "precipitation."
       11. Click Precipitation in the list of keywords.
       12. In the Search Panel, click the Add button to add "Precipitation" into the Selected Keywords
           box.
       13. In the Search Panel, click Run Search.

When the search finishes, you'll see some regularly-spaced dots over the watershed. These dots represent
the centroids of NEXRAD HRAP cells.  In other words, this Web service provides discrete  point
locations where you can basically sample this gridded rainfall dataset.  Each of these  points  is like a
virtual rain gauge. For this exercise, you'll just pick one near Jacob's Well Spring.
Adding  Data to a  Theme
       1.  In the ribbon, click the Select tool to activate it.
       2.  In the Map Layers,  make sure the Search Results layer is the only layer selected. The Select
           tool works with the selected layer in the list of map layers.
       3.  Draw a box around one or more precipitation sites to select time series for download.
                       Figure 0-21 Graphically Selecting a Time Series for Download
       4.  In the Search Panel, in the Results tab, select Hays County Data in the Existing Theme text
           box.
       5.  Click Download Data.  Hide the download manager once the download is complete.
                                             E-22

-------
The precipitation data are added to your theme. Now let's view the results.
       6.  Click the Graph tab.
       7.  Plot a graph with one of the precipitation time series you just downloaded and Discharge
           from Jacob's Well Spring.  Notice how quickly Jacob's Well Spring responds to rainfall
           events.
                        Figure 0-22 Rainfall and Streamflow at Jacob's Well Spring

While HydroDesktop does contain additional analysis capabilities, it can also export data to a text file for
use in other programs.

Exporting Data

HydroDesktop can  export data to a variety of output file types for further study and  analysis. For
example, you can export individual time series by placing a check next to them and then  right-clicking
them in Table or Graph view.  For this exercise, you will export all time series for an entire theme.

To export time series data for a theme:
       1.  Click the Table tab in the ribbon to activate it.
       2.  In the Data Export group, click Export.
                                             E-23

-------
                                      Figure 0-23 Exporting Data

This tool exports all data in a theme to a delimited text file. In the Export To Text File dialog, notice that
your theme name is already selected by default. You can also control the fields that are included in the
export and choose a delimiter.  For this  exercise, you will accept all defaults to produce a comma
delimited text file.
        3.  In the Export To Text File dialog, specify the output file location and name.
        4.  Click Export Data.
        5.  Close the Export To Text File dialog when it is finished.
                                   Figure 0-24 Setting Export Options

        6.  Find the file on your computer and open it to verify that the data were exported.
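
If you plan to continue the analysis in R, the exported file can be read back in with base R. The
sketch below assumes the comma-delimited defaults were accepted and that the file was saved as
HaysCountyData.csv (a hypothetical name); the available columns depend on the fields selected in the
Export To Text File dialog.

# Sketch: reading the exported theme back into R for further analysis.
# "HaysCountyData.csv" is a hypothetical file name; use the name you chose.
theme <- read.csv("HaysCountyData.csv", stringsAsFactors = FALSE)
str(theme)               # inspect the exported fields
unique(theme$SiteName)   # list the sites included in the theme (if exported)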


Congratulations!  With your theme data in hand, you have completed the exercise and learned how to
use HydroDesktop to discover and access water data. Feel free to experiment with other functionality
such as creating and printing a map, and be sure to give feedback using the Help tab. This concludes the
main portion of the exercise.  For an example of more advanced analysis, continue with the section
below to learn how to use the R statistical environment with HydroDesktop.
                                              E-24

-------
                                Advanced: Analysis with R
The work above has illustrated temperature, precipitation and discharge data and suggested that variations
in temperature in Jacob's Well Spring may be related to the mixing of surface and subsurface water
sources. In this section, you will use the HydroR plug-in in HydroDesktop to explore this phenomenon.
The HydroR plug-in provides an interface between HydroDesktop  and the free R statistical software
environment.
Enabling the HydroR Plug-in
Once R is installed, you will need to enable the HydroR plug-in in HydroDesktop.

To enable the HydroR plug-in:
       1.   Click the HydroDesktop Orb button.
       2.   In the Orb menu, click Extensions. In the extensions list, click HydroR. This adds a HydroR
           tab to the ribbon. You may have to hover the mouse on and off of Extensions to make the
           list open.
       3.   Click the HydroR tab in the ribbon to activate it.
       4.   In the HydroR tab, click Start R.
       5.   If prompted for the path to R.exe, enter the path where it was installed on your computer.
           Note that R may include more than one R.exe file.  The one you typically need is in the
           bin\i386\ folder, as in C:\Program Files\R\R-2.13.0\bin\i386\R.exe. Click OK once the path
           is entered. HydroDesktop will remember this path the next time you use the HydroR
           extension.
       6.   HydroDesktop needs some additional  R libraries. If you are using the HydroR extension for
           the first time, you may be prompted for a CRAN  mirror to download these libraries. Select
           the mirror closest to your location and click OK.  The appropriate libraries are downloaded
           automatically.

R should now  start and give you a blank R script in the top panel and  R Console in the bottom  panel.
Standard R commands can be entered in the R Console. HydroR makes it easy to provide R access to the
data you have downloaded with HydroDesktop using the buttons on the ribbon in the HydroR tab.
                                            E-25

-------
                                      Figure 0-1 HydroR Layout
Plotting a Graph with R
To get familiar with how HydroR works, you'll plot a hydrograph for Jacob's Well Spring.

To plot a graph in HydroR:
        1.  In the HydroR tab, in the list of time series on the left, select a time series that you would
           like to import as a data frame into R. For this exercise, select discharge at Jacob's Well
           Spring.
        2.  In the ribbon, click Generate R code. The R code to get the selected data series is entered
           into the script.
        3.  Click Send All.  This sends the script text to the console  and executes it. The result is an
            object named data0, which contains a list of R data frames.
        4.  In the R Console, enter labels(data0) to see a list of data frames that make up data0.

These data frames are basically tables from your HydroDesktop database.  The key table we're interested
in for this exercise is the DataValues table.
        5.  To see the first six rows of DataValues, in the R Console, enter head(data0$DataValues)
            (R is case sensitive, so type the command exactly as it appears in this text). You may have to
            scroll up in the R Console to see the full result.

You'll use the LocalDateTime and DataValue columns to provide data for the graph.
                                               E-26

-------
        6.  To make it easier to access this streamflow time series, in the R Console, enter
            Q.jacobs=data0$DataValues. This assigns the DataValues data frame to a variable
            named Q.jacobs.
        7.  In the R Console, enter
            plot(Q.jacobs$LocalDateTime,Q.jacobs$DataValue,type="l") (the type is a
            lower case L, not a one). A time series plot of the data should appear in an R graphics
            window, demonstrating that the full capability of R is available to work with the data
            that has been imported.
                       Figure 0-2 Jacob's Well Spring Hydrograph Plotted Using R
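
Because the imported data frame is an ordinary R object, any base R function can be applied to it.
For instance, the short commands below (illustrative, using the Q.jacobs variable created in step 6)
summarize the discharge record before moving on.

# Sketch: quick summary statistics on the imported discharge series
summary(Q.jacobs$DataValue)     # min, quartiles, mean, max
range(Q.jacobs$LocalDateTime)   # period of record retrieved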

Let's add a title to this graph.
        8.  In the R Console, enter data0$Variable to see all attributes for the Variable table.
        9.  In the R Console, enter title(data0$Variable$VariableName) to add the variable
            name as the title for the plot.
       10. After verifying that the title was added, close the R graphics window showing the graph.

Analyzing Flow in Jacob's Well Spring

Let's use mixing theory to estimate the fractions of Jacob's Well Spring flow that are from the surface and
subsurface based on temperature. Assume the surface source has a temperature equal to the temperature in
the Blanco River.  Assume the groundwater source is at a fixed temperature.  The following equations then
apply.
        Energy Balance:      QT = Q1T1 + Q2T2
        Mass Balance:        Q = Q1 + Q2
where Q is discharge in Jacob's Well Spring, T is temperature in Jacob's Well Spring, T1 is the
temperature of the surface source (assumed equal to the Blanco River temperature), and T2 is the temperature
of the subsurface source (assumed constant and taken as the average of the last 60 days). Q1 and Q2 are the
unknown discharge contributions from surface and subsurface sources, respectively (Figure 0-3).
                                            E-27

-------
          Figure 0-3 Surface and subsurface contributions to Jacob's Well Spring outflow and temperature
These are two linear equations in the two unknowns Q1 and Q2, so they can be solved directly. Substituting
Q2 = Q - Q1 from the mass balance into the energy balance gives QT = Q1T1 + (Q - Q1)T2, which rearranges
to Q1(T1 - T2) = Q(T - T2).  The solution is

                      Q1/Q = (T - T2)/(T1 - T2)

For example, if T = 25, T1 = 30, and T2 = 20 (degrees C), then Q1/Q = (25 - 20)/(30 - 20) = 0.5, so half of
the spring flow would be inferred to come from the surface source.

The R scripts in Appendix A use data from HydroDesktop to solve this equation.  You'll assign the
relevant time series to simple variable names and then use the R scripts to plot a graph representing the
amount of flow in Jacob's Well Spring inferred as coming from the surface.
To use the R scripts to compute fractional flow:
        1.  In the same manner that you created the Q.jacobs variable above and assigned it to be the
            discharge at Jacob's Well Spring, create and assign the following R variables (remember, the
            variable names are case sensitive).  In other words, for each variable, clear the R script
            panel, select a series, generate the R code, send it to R, and assign the variable in the R
            Console.
                a.  Q.blanco - Discharge at the Blanco River near Kyle
                b.  t.blanco - Water temperature at the Blanco River at Halifax Ranch
                c.  t.jacobs - Water temperature at Jacob's Well Spring**

"""IMPORTANT
You may encounter a bug when generating R code for water temperature at Jacob's Well Spring. In the R script
panel, if the endDate is not "2010-09-30" then edit the script to use "2010-09-30" before sending the script to the
R Console.

        2.  Enter Script 1 found in Appendix A into the R Console to execute the script. This script
           prepares inputs for the analysis and plots graphs of the input temperature and flow.
        3.  Once you have reviewed the graphs of temperature and streamflow generated by Script 1,
           close the two R Graphics Windows containing the graphs.
                                             E-28

-------
       4.  Enter Script 2 found in Appendix A into the R Console to execute the script. This script
           smooths the temperature time series and then performs the analysis to determine the
           fraction of flow in Jacob's Well Spring from surface water.


The resulting graphs show smoothed temperature  time series and  the portion  of flow in Jacob's Well
Spring inferred to be from the surface (the red line in the graph).  Note that the analysis requires
differences between the assumed groundwater temperature and surface water temperature, so the graph
will be missing segments when those temperatures are nearly the same.
                            Figure 0-4 Fractional Flow in Jacob's Well Spring

Congratulations! You have completed the exercise and seen how advanced analysis environments such as
R can be integrated into HydroDesktop using the power of plug-ins. This concludes the advanced portion
of the exercise.
                                             E-29

-------
                           Appendix A: R Scripts

Script 1: Preparing Inputs for Flow Analysis
#  SCRIPT 1: PREPARING INPUTS FOR FLOW ANALYSIS

#  This code plots input time series of flow and temperature.
#  The code assumes the following variables have already been set to
#  the DataValues data frame for these time series:
#    Q.jacobs -- Discharge at Jacob's Well Spring
#    Q.blanco -- Discharge at the Blanco River near Kyle
#    t.jacobs -- Water temperature at Jacob's Well Spring
#    t.blanco -- Water temperature at the Blanco River at Halifax Ranch
#  The code handles intermittent missing values

#  Start one day earlier because queries seem to be based on UTC
DT = seq(from=as.Date("2009-09-30"),to=as.Date("2010-09-30"),by=1)

#  Align Blanco River temperature with the date vector
ind=match(t.blanco$LocalDateTime,DT)
T1=rep(NA,length(DT))
T1[ind]=t.blanco$DataValue

#  Align Jacob's Well temperature with the date vector
ind=match(t.jacobs$LocalDateTime,DT)
T=rep(NA,length(DT))
T[ind]=t.jacobs$DataValue
T2 = T[1]   #  The first value

#  Align Jacob's Well discharge with the date vector
ind=match(Q.jacobs$LocalDateTime,DT)
Q=rep(NA,length(DT))
Q[ind] = Q.jacobs$DataValue

#  Plot the input temperature and flow series
plot(DT,T1,type="l",ylab="T")
lines(DT,T,col=2)
legend("bottomright",c("T Blanco","T Jacobs"),col=c(1,2),lty=1)
windows()
plot(DT,Q,type="l")

Script 2: Computing Surface Water Flow Fraction
#  SCRIPT 2: COMPUTING SURFACE WATER FLOW FRACTION

#  This script solves the equation Q1/Q = (T-T2)/(T1-T2)
#  and plots a graph showing the portion of flow inferred
#  to be directly from surface water sources in Jacob's
#  Well Spring.
#  Before running this script, you must run SCRIPT 1:
#  PREPARING INPUTS FOR FLOW ANALYSIS

#  Smooth the Blanco River temperature data using lowess
ind=!is.na(T1)  #  array indices of unmissing values
T1l=lowess(DT[ind],T1[ind],f=0.1)
plot(DT,T1,type="l",ylab="Degrees C")
lines(T1l,col=2)
lines(DT,T,col=3)
legend("bottomright",c("Blanco T","Smoothed Blanco T","Jacobs T"),col=c(1:3),lty=1)
title("Temperatures")

#  Match the dates of the lowess output for use in calculations
ind=match(T1l$x,DT)
T1s=rep(NA,length(DT))
T1s[ind]=T1l$y

#  For results to be reasonable T1 and T2 have to be different.
#  Only evaluate answers when T1 and T2 differ by at least 3 degrees.
#  Also only accept positive answers.

#  Calculate T2 as the average over the last 60 days
T2p=c(rep(T[1],59),T)   #  pad with the first value so early averages exist
T2=rep(NA,length(DT))
for(i in 1:length(DT)) T2[i]=mean(T2p[i:(i+59)],na.rm=TRUE)  # 60-day window
Q1f=(T-T2)/(T1-T2)  # apply the mixing solution equation
# eliminate answers when the temperature difference is less than 3
indna=abs(T1-T2)<3
Q1f[indna]=NA
indna=Q1f< -0.05    # eliminate large negative values
Q1f[indna]=NA

#  Plot the results
windows()
plot(DT,Q,type="l",ylab="cfs",lwd=2)
lines(DT,Q*Q1f,col=2,lwd=2)
legend("topright",c("Flow","Inferred From Surface"),lty=1,col=c(1,2))
title("Jacobs Well Discharge")
                                  E-31

-------
                                             References
Davidson, S. C. (2008).  Hydrogeological characterization of baseflow to Jacob's Well spring, Hays County, Texas
        (Master's thesis). Retrieved November 2, 2010, from Hays Trinity Groundwater Conservation District Web
        site:  http://haysgroundwater.com/files/Documents/Davidson-08 thesis Cypress Crk Jacobs Well.pdf.
San Marcos Local News. (2009, March 10). Jacobs Well area to hold incorporation vote.  Retrieved November 2,
        2010, from San Marcos Local News Web site: http://www.newstreamz.com/2009/03/10/area-around-
        jacobs-well-to-hold-incorporation-election/.
United States Geological Survey. (2007, March 26). National Surface Water Conference and Hydroacoustics
        Workshop.  Retrieved November 1, 2010, from United States Geological Survey Web site:
        http://water.usgs.gov/osw/images/2007 photos/Hydroacoustics.html.
                                                E-32

-------