EPA/600/R-13/029 | May 2013 | www.epa.gov/research
United States Environmental Protection Agency
Environmental Data Management
in Support of Sharing Data
and Management
Office of Research and Development
-------
EPA/600/R-13/029
May 2013
FINAL REPORT
ENVIRONMENTAL DATA MANAGEMENT IN SUPPORT OF SHARING DATA
AND MANAGEMENT
by
Dr. Lilit Yeghiazarian, Mr. Amr Safwat, Mr. Allen Teklitz, and Mr. Donald Morehead,
Pegasus Technical Services, 46 E. Hollister Street, Cincinnati, OH 45219
School of Energy, Environmental, Biological and Medical Engineering Program, University of
Cincinnati, Cincinnati, OH 45221
Dr. Tim Whiteaker and Dr. David R. Maidment
Pegasus Technical Services, 46 E. Hollister Street, Cincinnati, OH 45219
Center for Research in Water Resources, University of Texas at Austin, Austin, TX 78712
Dr. Elly P.H. Best,
Water Supply and Water Resources Division, National Risk Management Research Laboratory,
EPA/ORD, Cincinnati, OH 45268
Contract No. EP-C-11-006
Task Order No. 41
Dr. Elly P.H. Best,
Task Order Manager
Water Quality Management Branch,
Water Supply and Water Resources Division
National Risk Management Research Laboratory
Office of Research and Development
U.S. Environmental Protection Agency
EPA's National Risk Management Research Laboratory,
Andrew W. Breidenbach Environmental Research Center
26 W M.L. King Drive,
Cincinnati, OH 45268
-------
DISCLAIMER
The U.S. Environmental Protection Agency (EPA), through its Office of Research and
Development, funded and managed, or partially funded and collaborated in, the research
described herein under Task Order (TO) 0041 of Contract No. EP-C-11-006 to Pegasus
Technical Services. This document has been reviewed in accordance with U.S. Environmental
Protection Agency (EPA) policy and approved for publication. The views expressed in this
report are those of the author(s) and do not necessarily reflect the views or policies of EPA.
Mention of trade names or commercial products does not constitute endorsement or
recommendation for use. The quality of secondary data referenced in this document was not
independently evaluated by EPA or Pegasus.
-------
ABSTRACT
A data management system (DMS) was developed, tested and demonstrated to store and manage
water quality and quantity (WQ2) data pertaining to U.S. Environmental Protection
Agency/Office of Research and Development (EPA/ORD) research projects in standardized
formats. This approach was taken to facilitate accessibility, sharing, integration, and use of this
information for simple calculations and inclusion into models by EPA and other users to inform
water management decisions. The objectives of this project were: (1) Build a previously
identified hydrologic information system (HIS) Observations Data Model (ODM) and convert
existing water quality and quantity data, generated by EPA/ORD-research projects, into this
format; (2) Develop, test and demonstrate a scalable DMS based on HIS protocols for storage
and management of geo-referenced data; and (3) Enable targeted users to access, exchange and
integrate geospatially referenced WQ2 information, and to conduct relatively simple calculations
and/or run models to inform water management decisions at various watershed scales. The HIS
developed by the Consortium of Universities for the Advancement of Hydrologic Science Inc.
(CUAHSI) was used in this project. The CUAHSI HIS ODM is widely accepted in the
United States, is compatible with the National Water Quality Portal, and is gaining international
acceptance as the standard for water data. This HIS is web-based, and uses the WaterML format
for sharing hydrologic time series data.
Within the U.S., many organizations measure WQ2 and various biological parameters. Despite
the fact that this information is routinely made available to the public, the difficulties in
identifying data sources, syntactic and semantic heterogeneity across data formats and metadata
make the discovery, access and interpretation of data challenging for research and other
stakeholder communities. EPA-ORD seeks to alleviate these challenges and plans to use
CUAHSI's HIS and other tools to integrate time series data generated within its local, regional
and national watershed projects with data collected by collaborators, and to share these data with
other interested parties.
This report presents the results of exploring, implementing and amending the CUAHSI approach
to storing and managing data of (1) an ongoing WQ2 monitoring effort in the East Fork of the
Little Miami River (EFLMR) watershed in Ohio; and (2) a completed WQ2 monitoring effort in
the Shepherd Creek watershed near Cincinnati, Ohio. Data included in the new DMS pertained to
physicochemical parameters of discrete water samples, water-immersed sensors, and discrete
stream-substrate samples. Data on biological parameters of water-immersed sensors and discrete
stream-substrate samples were not included. The functionality of the DMS was demonstrated by
estimating an example nitrogen (N) TMDL using a Load Duration Curve (LDC) approach based
on data retrieved from the EFLMR DMS and other databases. All data were accessed, explored
and visualized through the HIS HydroDesktop tool.
-------
ACKNOWLEDGMENTS
This report has been prepared with input from the research team, which includes Pegasus
Technical Services; the School of Energy, Environmental, Biological and Medical Engineering
Program, University of Cincinnati; the Center for Research in Water Resources, University of
Texas at Austin; and the Water Quality Management Branch (WQMB)/Water Supply and Water
Resources Division (WSWRD)/National Risk Management Research Laboratory (NRMRL)/
Office of Research and Development (ORD) of the U.S. Environmental Protection Agency
(EPA).
Technical lead, direction and coordination for this project were provided by Dr. Elly P.H. Best,
EPA/ORD/NRMRL/WSWRD/WQMB.
The authors would like to thank Dr. Joel Allen, EPA/ORD/NRMRL/WSWRD/WQMB and Dr.
Yusuf Mohamoud, EPA/ORD/NERL/ERD for providing valuable written comments on the final
report.
Special appreciation is given to the contact person of the case study presented in this report, Dr.
Christopher Nietch, EPA/ORD/NRMRL/WSWRD/WQMB, contact persons of the EPA/ORD
Office of Science and Information Management (OSIM) who advised on the handling of the
EPA fire-wall, Mr. David Lyons, Mrs. Ann Vega and Mrs. Bhagya Subramanian, Cincinnati, and
Dr. Guoxiang Yang, ORISE Research Associate at EPA/ORD/NRMRL/WSWRD/WQMB who
supported the report preparation.
-------
EXECUTIVE SUMMARY
Many organizations within the U.S. measure water quantity and quality (WQ2) variables such as
precipitation, streamflow, water quality parameters (pH, nutrients, etc.), and various biological
parameters. Results of data analyses are typically published in reports and journal articles. These
publications often include technical details of data collection and metadata. Data values are often
made available as files that can be retrieved from public websites. Discovery, access and
interpretation by research and stakeholder communities are challenging because of syntactic and
semantic heterogeneity across data formats and metadata. Pilot projects that render information
on WQ2 variables, values, and metadata readily identifiable, accessible, sharable and usable in
analytical and modeling applications that accept standardized formats for direct interpretation,
can play a vital role in facilitating and accelerating collaborative and environmental modeling
activities within EPA-ORD and EPA, as well as between EPA and other agencies and organizations.
The pilot project described in this report was conducted to evaluate a web-based, highly
standardized and widely accepted hydrologic information system (HIS) for storage and
management of EPA-ORD regional watershed WQ2 data.
The main goal of the project was to explore a previously identified HIS for storage of WQ2 data
of an ongoing EPA-ORD watershed project, to develop, test and demonstrate the subsequent use
of these data for calculation of a total maximum daily load (TMDL), and to demonstrate
potential access and use by targeted users. In addition, a wider implementation potential within
EPA-ORD was explored.
The HIS developed by the Consortium of Universities for the Advancement of Hydrologic
Science Inc. (CUAHSI) was used in this project. This HIS provides a database schema called the
Observations Data Model (ODM) for consistent storage and management of observations data
and associated metadata, including data elements required by the WaterML format. ODM is
widely accepted in the United States, it is compatible with the National Water Quality Portal, and
is gaining international acceptance as the gold standard for water data. This HIS is web-based, and uses the WaterML format
web-based and uses the WaterML format for sharing hydrologic time series data.
WQ2 data from an ongoing watershed management research project in the East Fork Watershed
of the Little Miami River in Ohio (EFWLMR), pertaining to 2010, served as the initial data
source. The DMS work was initiated by downloading and installing a blank ODM database
(subsequently called 'DMS') from the CUAHSI-HIS website onto an EPA-ORD server located
within the EPA-ORD firewall. Of all data categories explored, the physico-chemical data from
discrete water samples, the sensor data from water-immersed sensors for physico-chemical
variables, and the physico-chemical data from discrete stream substrate (sediment) samples were
committed to the ODM. The biological monitoring data, largely on benthic macrofauna, could
not be loaded because no data loader is currently available to which the data may be mapped.
The data from the discrete water samples were reformatted and mapped to the ODM Data
Loader. Additional columns were inserted into their original Excel sheet for Sources and Sites,
and the information required to populate these columns was generated: for Sources,
organization/agency collecting the data; for Sites, site-name, latitude, longitude, and elevation. In
addition, the column headers were edited to match the ODM vocabulary, particularly the
Variable Code. For each existing Excel sheet, a second sheet was added to contain the metadata.
After reformatting, all data were committed to the DMS using the ODM Data Loader. The data
from the water-immersed sensors were committed directly to the DMS using the ODM
-------
Streaming Data Loader. The data from the discrete sediment samples were also reformatted, and
then mapped to a modified version of the ODM Streaming Data Loader. Besides editing and
addition of the same information as for the discrete water samples, additional columns were
inserted into their original Excel sheet for the sediment categories. The consistency of the data
committed to the DMS was validated by a three-step process, including verification of the data
category, of the ODM log file, and of the actual data committed to the ODM database (DMS).
The same procedure was used to commit water quality monitoring data of the completed
Shepherd Creek project to the ODM database, and, thus, also add this new database to the DMS.
The functionality of the DMS was demonstrated with an example nitrogen (N) TMDL
calculation using the Load Duration Curve (LDC) approach. Data were retrieved from the DMS
and other databases and accessed, explored and visualized through the HIS HydroDesktop tool.
In this case HydroDesktop accessed the DMS data via a link to the test publication of the EFW
Database, for which CUAHSI provides a CodePlex sandbox. When implementation of the
recently developed DMS is approved for use within EPA-ORD and funding for the DMS to enter
into its 'Production Phase' (involving dedicated server testing, operation and maintenance) is
generated, publication of the EFW Database in the CUAHSI HIS Services Central Web Service
Registry (and possibly other websites) can be accomplished. The latter step would enable access
and use by users in- and outside EPA/ORD, and may greatly facilitate and accelerate
collaboration.
-------
CONTENTS
DISCLAIMER ii
ABSTRACT iii
ACKNOWLEDGMENTS iv
EXECUTIVE SUMMARY v
ACRONYMS AND ABBREVIATIONS x
1.0: INTRODUCTION 1
1.1 Project Background 1
1.2 Project Objectives 1
1.3 Report Outline 2
2.0: DATA SOURCES 1
3.0: DMS ARCHITECTURE AND USE FOR EXCHANGE AND INTEGRATION OF WQ2
INFORMATION 1
3.1 General Guidelines 1
3.1.1. Hardware and Software Requirements 1
3.1.2. Review of CUAHSI HIS Data Publication Tools and Service 2
3.1.3. ODM Structure 2
3.1.4. Installing a Blank ODM Database 2
3.1.5. Requirements for Input Files to ODM Data Loader: Transforming Data into CUAHSI
Format 3
3.1.6. Loading Metadata and Data Values into ODM 3
3.1.7. Working With ODM Data 4
3.2. Working with East Fork Watershed (EFW) Data 4
3.2.1. Description of Physico-chemical Data Water File 4
3.2.2. Description of Sediment Data File 5
3.2.3. Loading Sites to the ODM 6
3.2.4. Loading Sources to the ODM 7
3.2.5. Loading Variables to the ODM 7
3.2.6. Loading SampleMediumCV to the ODM 8
3.2.7. Loading VariableNameCV to the ODM 8
3.2.8. Loading SampleTypeCV to the ODM 8
3.2.9. Loading Variables to the ODM 8
3.2.10. Loading DataValues to the ODM 8
3.2.11. Loading Sensor Data from Water-immersed Sensors for Physico-chemical Variables to
the ODM 9
3.2.12. Loading Sediment Data to the ODM 9
3.2.13. Data Validation 11
3.3 Working with Shepherd's Creek Data 12
4.0: DATA PUBLICATION 13
4.1. Data Publication with WaterOneFlow 13
4.2. Publishing New Watershed Data 14
4.2.1. DMS Development (and Testing) Phase (Figure 4-1): 14
-------
4.2.2. DMS Demonstration Phase (Figure 4-2): 14
4.2.3. DMS Production Phase (Figure 4-3): 15
5.0: USING THE DMS: EXCHANGE AND INTEGRATION OF WQ2 INFORMATION 20
5.1. Using HydroDesktop 20
5.2. Using HydroR 20
5.3. Using HydroExcel 20
6.0: DATA ANALYSIS TO INFORM DECISION-MAKING 21
6.1. Area of Interest and Problem Identification 21
6.2. Data Management 21
6.3. Data Exploration 21
6.4. Data Analysis Using Excel 25
7.0: CRITICAL GAPS OF CURRENT APPROACHES 32
8.0: RECOMMENDATIONS FOR IMPROVING UPON EXISTING APPROACHES 33
9.0: REFERENCES 34
FIGURES
Figure 3-1. Observations Data Model (ODM) Schema, from Horsburgh et al. 2009 3
Figure 3-2. Physico-chemical Data Water 2010, Excel sheet 5
Figure 3-3. Physico-chemical Data Water 2010, with New Formatted Data Values Spreadsheet 5
Figure 3-4. Physico-chemical Data Sediment 2010, with New Formatted Data Values Spreadsheet 5
Figure 3-5. Steps in Data Loading to the ODM 6
Figure 3-6. Loading Sites to the ODM 6
Figure 3-7. Loading Sources to the ODM 7
Figure 3-8. Loading Variables to the ODM 7
Figure 3-9. Loading 2010 Nutrient Data Values 9
Figure 3-10. ODM Streaming Data Loader 9
Figure 3-11. Modified Data Loader 10
Figure 3-12. New Testing Tab Added to the ODM Data Loader 11
Figure 3-13. Data Validation Using ODM Tools 12
Figure 4-1. DMS Development Phase 16
Figure 4-2. DMS Demonstration Phase 17
Figure 4-3. Publication DMS in Demonstration Phase 18
Figure 4-4. DMS Production Phase 19
Figure 6-1. The East Fork of the Little Miami River Watershed, with Point and Non-point Sources for
Pollutant Loadings Indicated (from Nietch et al. 2010) 22
Figure 6-2. Monitoring Stations for the East Fork of the Little Miami River Watershed 23
Figure 6-3. Time Series of the Flow Data at Perintown, OH, Showing Seasonality and Variation by
Hydrological Year 23
Figure 6-4. Box and Whisker Plot of the Flow Data at Perintown, OH, Showing a Maximum in March
and Minimum in September 24
Figure 6-5. Summary Statistics of the Flow Data at Perintown, OH, Showing Variability in the Data 24
Figure 6-6. Flow at Perintown, OH, in 2010 25
Figure 6-7. Nitrogen Time Series at Perintown, OH, in 2010 26
Figure 6-8. Computed Nitrogen Loading Curve of the East Fork Watershed in 2010 27
Figure 6-9. Flow Duration Curve of the East Fork Watershed 28
Figure 6-10. Nitrogen Load Duration Curve of the East Fork Watershed 30
-------
Figure 6-11. Nitrogen Load Duration Curve of the East Fork Watershed, with Difference Between the
Averages of Observed and Target Loads, Respectively, Marked 30
TABLES
Table 6-1. Load Duration Curve Estimated Nitrogen Loads and Overall Reductions Needed to Meet
the Proposed Dissolved Inorganic Nitrogen TMDL for the Management of Rivers and
Streams < 1,300 km2 (with 10% margin of safety) 31
APPENDICES
Appendix A: WORKSHOP: PUBLISHING DATA WITH THE CUAHSI HYDROLOGIC
INFORMATION SYSTEM (60 pp)
Appendix B: CUAHSI COMMUNITY OBSERVATIONS DATA MODEL (ODM) VERSION
1.1 DESIGN SPECIFICATIONS (54 pp)
Appendix C: DESCRIPTION OF ESF DATA LOADED TO THE ODM (11 pp)
Appendix D: DESCRIPTION OF SC DATA LOADED TO THE ODM (2 pp)
Appendix E: HYDROLOGY OF JACOB'S WELL SPRING. A TUTORIAL FOR USING
HYDRODESKTOP TO DISCOVER AND ACCESS WATER DATA.
PRESENTED AT THE UNIVERSITY OF CINCINNATI, SEPTEMBER 6, 2011
(32 pp)
-------
ACRONYMS AND ABBREVIATIONS
ADC Application Deployment Checklist
cfs cubic feet per second
CUAHSI Consortium of Universities for the Advancement of Hydrologic Science Inc.
DIN Dissolved Inorganic Nitrogen
DMS Data Management System
DMZ Demilitarized Zone
DWTP Drinking Water Treatment Plant
EFLMR East Fork of the Little Miami River
EFW East Fork Watershed
EPA United States Environmental Protection Agency (=USEPA)
ERD Ecosystem Research Division
FDC Flow Duration Curve
GIS Geographic Information System
HIS Hydrologic Information System
LTER Long Term Ecological Research
Mac Macintosh
MGD Million Gallons per Day
NERL National Exposure Research Laboratory
NRMRL National Risk Management Research Laboratory
NWIS National Water Information System
ODM Observations Data Model
ORD Office of Research and Development
OSIM Office of Science and Information Management
PC Personal Computer
QA Quality Assurance
RTP Research Triangle Park
sa system administrator
SC Shepherd's Creek
STORET STORage and RETrieval system for water quality data
TMDL Total Maximum Daily Load
TO Task Order
UC University of Cincinnati
USACE United States Army Corps of Engineers
USGS United States Geological Survey
WQ2 water quality and quantity
WQMB Water Quality Management Branch
WSWRD Water Supply and Water Resources Division
WWTP Waste Water Treatment Plant
-------
1.0: INTRODUCTION
1.1 Project Background
Within the U.S., many organizations measure water quantity and quality (WQ2) variables such
as precipitation, streamflow, water quality parameters (pH, nutrients, etc.), and various biological
parameters. Results of data analyses are typically published in reports and journal articles. These
publications often include technical details of data collection and metadata. Data values are often
made available as files that can be retrieved from public websites. However, syntactic and
semantic heterogeneity across data formats and metadata make the discovery, access and
interpretation challenging for research and stakeholder communities.
To address these issues, the Consortium of Universities for the Advancement of Hydrologic
Science Inc. (CUAHSI) has developed a web-based Hydrologic Information System (HIS) that
uses the WaterML format for sharing hydrologic time series data (www.cuahsi.org). The
CUAHSI HIS also provides a database schema called the Observations Data Model (ODM) for
consistent storage and management of observations data and associated metadata, including data
elements required by WaterML. The CUAHSI model has been widely accepted in the United
States and is gaining international acceptance as the standard for water data, with over 150
participating organizations, including universities, utilities, agencies, businesses, and state and local governments. An
overview of web-based interactions facilitating the sharing of hydrological data enabled by
working with CUAHSI-HIS is provided in Appendix A.
EPA-ORD plans to use CUAHSI's HIS and other tools to integrate time series data generated
within its local, regional and national watershed projects with data collected by EPA's
collaborators, and to share these data with other interested parties. The significance of this effort
is that the standardization and integration of data are expected to greatly facilitate collaborative
and environmental modeling activities within EPA-ORD, EPA, as well as between EPA and
other agencies and organizations in the new Safe and Sustainable Water Research Program's
focus areas including Sustainable Water Resource Flows and Sustainable Natural and
Engineered Water Infrastructure Systems.
1.2 Project Objectives
This project has three objectives:
Objective 1. Build a CUAHSI ODM and convert existing WQ2 data into the CUAHSI format;
Objective 2. Develop, test and demonstrate a scalable Data Management System (DMS) based
on CUAHSI protocols for storage and management of geo-referenced discrete data generated by
ongoing watershed management research in the East Fork Watershed of the Little Miami River
in Ohio (EFWLMR), as well as newly initiated research at other locations;
-------
Objective 3. Enable targeted users to access, exchange and integrate geospatially referenced
WQ2 information, and conduct relatively simple calculations and/or run models to inform water
management decisions at various watershed scales.
The results of this project are subject to the Quality Assurance Project Plan ID No. W-15812
(Approval date: 11/21/2011).
1.3 Report Outline
The remainder of the report is organized as follows. Section 2 describes data sources used in the
project and those projected for use in the future. Section 3 outlines the approach taken by the
team to convert the existing WQ2 data into the CUAHSI format. This section meets the first
objective of the project. Section 4 describes data publication phases, with a focus on the current,
demonstration phase. Section 5 describes the architecture of the DMS developed for the
EFWLMR data, and its use for demonstration purposes. Sections 4 and 5 meet the second
objective. Section 6 explains how users can retrieve data from the DMS and upload their own,
and describes the computer codes developed by the team to inform watershed management
decision-making. Section 6 meets the third objective of the project. Sections 7 and 8 identify
critical gaps in current data management approaches and discuss the feasibility of further
improvement.
The contents of these sections are briefly outlined below:
2.0 Data Sources
This section outlines the data sources used in the project and those projected for use in
the future.
3.0 DMS Architecture and Use for Exchange and Integration of WQ2 Information
The CUAHSI-HIS data publication tools are reviewed. The existing WQ2 data and
metadata were translated into terminology used by the CUAHSI ODM; these translation
methods are outlined in this section. The data were then loaded into ODM. All steps to
install the required software and load the data are outlined.
4.0 Data Publication
Steps needed to publish an ODM database are outlined. Three data publication phases are
presented: development, demonstration and production. The data publication is currently
in the demonstration phase, whereby a copy of the dataset is housed on a designated
server at the University of Cincinnati (UC). This web-service can be accessed by EPA
and non-EPA users who have access to the Internet, by searching in CUAHSI
HydroDesktop for "EPA - East Fork Watershed" in Ohio.
5.0 Using The DMS: Exchange and Integration of WQ2 Information
This section describes tools that enable data visualization and analysis. It includes a
discussion of HydroDesktop, HydroR and HydroExcel.
6.0 Data Analysis to Inform Decision-Making
This section contains the description of data retrieval by users from the DMS, uploading
data of their own, and the computer codes developed by the team to inform the decision-
making process of watershed management.
-------
7.0 Critical Gaps of Current Approaches
This section contains the overview and synthesis of information provided in Sections 3-6,
and identification of the critical gaps in the approaches.
8.0 Recommendations for Improving upon Existing Approaches
This section focuses on feasibility assessment for improvement upon the best current
approaches and discusses an implementation plan.
-------
2.0: DATA SOURCES
The first source of data is the EPA-ORD/NRMRL's ongoing WQ2 monitoring effort in the
EFW. This monitoring program has been in place since 2005. The resulting data include (1)
physico-chemical data from water samples; (2) sensor data from water-immersed sensors for
physico-chemical variables; (3) physico-chemical data from stream-substrate (sediment)
samples; (4) sensor data from water-immersed sensors for biological variables; and (5) biological
data from stream substrate samples. In particular, data collected in the year 2010 were selected
for upload into the ODM. All data originating from this data source are subject to the Quality
Assurance Project Plan ID No. 634-Q-2-0. Approved data may become available for public use
approximately 2 years after collection. For example, a 2-year lag-period between data collection
and public availability is customary for networks such as the Long Term Ecological Research
(LTER) Network.
Other potential data sources are related to projects within the NRMRL Cincinnati Cross-
Laboratory Green Infrastructure Study, where water quality monitoring is conducted. Among the
latter projects, those on Shepherd's Creek, Quebec Heights, and Pervious Pavement, all located
in Cincinnati, OH, were considered as additional data sources.
The monitoring data pertaining to the Shepherd's Creek project, located near Cincinnati, Ohio,
served as the second data source. This monitoring program has been in place from 2004 to 2011.
The resulting data included physico-chemical data from water grab samples. All data originating
from this data source are subject to the Quality Assurance Project Plan ID No. S-10386-JA-2-0
(Approval date: 09/02/2011).
-------
3.0: DMS ARCHITECTURE AND USE FOR EXCHANGE AND
INTEGRATION OF WQ2 INFORMATION
In this section the hardware and software requirements to create a HydroServer for data
publishing are outlined. The DMS architecture is presented as well as how it can be used for data
sharing. This section is organized as follows. The first part, General Guidelines, is a condensed
guide to steps that need to be taken to publish data through a HIS HydroServer. Detailed step-by-
step instructions are given in Appendix A, which is a tutorial from a workshop taught at the
University of Cincinnati by Dr. Timothy Whiteaker. The workshop was conducted using a
simplified dataset 'RawData', available electronically as part of this report to facilitate user
learning. The second part, Working with East Fork Watershed (EFW) Data, is a detailed, step-
by-step description of work performed with EFW data collected in 2010 and published through
CUAHSI HIS. The third part, Working with Shepherd's Creek (SC) Data, is a description of the
work performed with SC data collected in the period 2004-2011 and published through CUAHSI
HIS.
3.1 General Guidelines
3.1.1. Hardware and Software Requirements.
The minimal hardware requirement is a Personal Computer (PC) with 4 GB RAM, a 500 GB hard
disk, and a 2.7 GHz processor. Macintosh (Mac) users can use the CUAHSI software through a virtual
machine.
The software includes:
• Windows 7, XP or Vista
• Microsoft Internet Information Services (IIS) 7 — comes with Windows 7, but may need
to be enabled. Enable ASP.NET.
• .NET Framework 2.0 SP2, 3.5 SP1 (free)
• Microsoft SQL Server 2008 R2 (commercial) or SQL Server 2008 R2 Express (free)
o Be sure to install the version with SQL Management Studio (aka Database and
Management Tools)
o Install with these options:
- Install with Mixed Mode Authentication.
- Specify the 'sa' (system administrator) password that you will remember.
• HIS software (free)
o ODM Data Loader - http://his.cuahsi.org/odmdataloader.html
o ODM Streaming Data Loader - http://his.cuahsi.org/odmsdl.html
o ODM Tools - http://his.cuahsi.org/odmtools.html
• Additional software (not required for HydroServer)
o Microsoft Office 2010 (32-bit version)
o Google Earth (free) - http://earth.google.com/download-earth.html
o HydroObjects (free) - http://his.cuahsi.org/hydroobjects.html
o HydroExcel (free) - http://his.cuahsi.org/hydroexcel.html
o HydroDesktop (free) - http://his.cuahsi.org/hydrodesktop.html
-------
3.1.2. Review of CUAHSI HIS Data Publication Tools and Service.
The CUAHSI HIS provides Web services, tools, standards and procedures that enhance access to
more and better data for hydrologic analysis. HIS software is free and available on the HIS
website at http://his.cuahsi.org. A variety of HIS software applications have been built to serve
several types of users and scenarios, from data users to data publishers to educators and
developers. These include HydroServer, ODM, WaterOneFlow, HIS Central, and
HydroDesktop. HydroServer allows users to publish their own data. It includes
software for publishing observations data with a WaterOneFlow Web service, but it also includes
a website and supporting components for geospatial and temporal data visualization. ODM is a
data model for the storage and retrieval of hydrologic observations in a relational database.
WaterOneFlow is a Web service that facilitates automated and programmatic access to data. HIS
Central is a website maintained by the CUAHSI HIS team where users can register their
WaterOneFlow Web service. The service then becomes discoverable along with dozens of other
Web services already registered with the system, including the services EPA Storage and
Retrieval System for water quality data (STORET) and the USGS National Water Information
System (NWIS). This makes HIS Central the largest single catalog of the nation's water data.
HydroDesktop is a free and open source Geographic Information System (GIS) that allows the
user to explore and download Internet data published on HydroServers.
3.1.3. ODM Structure.
ODM is a data model for the storage and retrieval of hydrologic observations in a relational
database (Figure 3-1). The purpose for such a database is to store hydrologic observations data in
a system designed to optimize data retrieval for integrated analysis of information collected by
multiple investigators. It is intended to provide a standard format to aid in the effective sharing
of information between investigators and to allow analysis of information from disparate sources
both within a single study area or hydrologic observatory and across hydrologic observatories
and regions. The observations data model is designed to store hydrologic observations and
sufficient ancillary information (metadata) about the data values to provide traceable heritage
from raw measurements to usable information allowing them to be unambiguously interpreted
and used. A relational database format is used to provide querying capability to allow data
retrieval supporting diverse analyses. A generic template for the observations database is referred
to as the Observations Data Model (ODM). The specifics of the ODM are documented in
Tarboton et al., 2008. The current ODM design specifications can be found in Appendix B of
this report.
3.1.4. Installing a Blank ODM Database.
The first step is to download and install a blank ODM database from http://his.cuahsi.org. The
ODM database then needs to be attached to the SQL Server. The controlled vocabularies within
the database are then updated; this step helps to ensure that the user's terminology is consistent
with the peer-reviewed vocabulary of terms maintained by CUAHSI.
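For illustration, attaching the database can be done in SQL Server Management Studio or with a
T-SQL statement along the following lines (a minimal sketch; the database name and file paths
are hypothetical and depend on where the blank ODM download was placed):

CREATE DATABASE OD
ON (FILENAME = 'C:\ODM\OD.mdf'),
(FILENAME = 'C:\ODM\OD_log.ldf')
FOR ATTACH;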
-------
3.1.5. Requirements for Input Files to ODM Data Loader: Transforming Data into
CUAHSI Format.
User data files need to be transformed into files that are formatted for loading into an ODM
database using the ODM Data Loader. The minimum pieces of information, stored in separate
files and needed to describe data in the ODM, are:
• Sites
• Sources
• Variables
• DataValues
Figure 3-1. Observations Data Model (ODM) Schema, from Horsburgh et al. 2009
3.1.6. Loading Metadata and Data Values into ODM.
After the Sites, Sources and Variables metadata and data values are transformed into files
acceptable by the ODM Loader, they are committed to the ODM database. In addition to data
recorded in Excel spreadsheets, text files, etc., it is also possible to load data from telemetry
systems. Appendix A describes how to use the Streaming Data Loader for that purpose.
-------
3.1.7. Working With ODM Data.
Once data are loaded into the ODM database, they can be queried, visualized and analyzed.
ODM Tools are used for this purpose. They also allow users to derive new time series from data, for
example by applying a function to the original data.
3.2. Working with East Fork Watershed (EFW) Data
A list of the data files pertaining to the EFW that were accommodated in the DMS is shown
below. The data for the water-immersed sensors for biological variables were not available. The
biological data from the stream-substrate samples could not be accommodated because HIS is
not programmed to store biological data and it does not carry a biological data loader yet.
1. Physico-chemical data water and sediment
a. 2010_ESF-EFWS_NutrientData_ODM.xls
b. 2010_ESF-EFWS_NutrientData_ODM_QC.xls
2. Sensor data from water-immersed sensors for physico-chemical variables
a. CEC 2009-2010.csv
b. CLC 2009-2010.csv
c. FMR 2009-2010.csv
d. HST 2009-2010.csv
e. HWR 2009-2010.csv
f. KRT 2009-2010.csv
g. LRC 2009-2010.csv
h. OWT 2009-2010.csv
i. SHA 2009-2010.csv
j. SLT 2009-2010.csv
k. UHL 2009-2010.csv
l. USR 2009-2010.csv
3. Sediment data
a. FieldSedimentFractions_8-05_7-09_update91409.xls
All files were formatted to conform with the CUAHSI standard before they were loaded to the
ODM database. Most of the column names in those files were renamed.
3.2.1. Description of Physico-chemical Data Water File.
The nutrient data file contained a DATA and a METADATA spreadsheet (Figure 3-2; Figure 3-
3). The DATA spreadsheet contained information about sampling date and time, site location,
sampling method, variables, and concentrations. Each column in the DATA spreadsheet is
explained in detail under the METADATA spreadsheet. The original METADATA spreadsheet
has been modified to such an extent that it can be used as a source for lookup tables. A new
spreadsheet named Data Values has been added to this file. This spreadsheet pulls out all the
information from the two spreadsheets DATA and METADATA and creates a CUAHSI
-------
formatted Data Value file. This Data Values spreadsheet was copied to a new Excel file and saved
as a csv file.
Figure 3-2. Physico-chemical Data Water 2010, Excel sheet
Figure 3-3. Physico-chemical Data Water 2010, with New Formatted Data Values
Spreadsheet
3.2.2. Description of Sediment Data File.
The sediment data file contains a Fractions and a METADATA spreadsheet. The Fractions
spreadsheet contains information about sampling date and time, site location, fraction sizes, and
weights. The METADATA spreadsheet has been modified and holds all the Offset Descriptions
(Figure 3-4). A new spreadsheet named Variables has been created which holds information for
each of the newly created variables. For each fraction size a new spreadsheet has been added to
this file. In addition, another spreadsheet named All Data Values has been created. This
spreadsheet gathers all the information from each class size and METADATA spreadsheet and
creates a CUAHSI formatted Data Value file. This Data Values spreadsheet was copied to a new
Excel file and saved as a csv file.
Figure 3-4. Physico-chemical Data Sediment 2010, with New Formatted Data Values Spreadsheet
-------
Figure 3-5 shows the steps taken to load Sources, Sites, Variables and Data Values to the ODM
database. It also shows a screenshot of the ODM Data Loader and the ODM Streaming Data
Loader.
Figure 3-5. Steps in Data Loading to the ODM
3.2.3. Loading Sites to the ODM
Figure 3-6. Loading Sites to the ODM
A csv file with all the sampling sites within the EFW was created according to the CUAHSI
standard (Figure 3-6). The sampling sites file contains detailed information about site name,
code, and location. The site csv file was uploaded to the DMS using the ODM Data Loader.
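For illustration, the first lines of such a sites file might look as follows (a sketch with rounded,
illustrative coordinates; the column headers follow the ODM Sites fields expected by the ODM
Data Loader, and LatLongDatumID references an entry in the SpatialReferences table):

SiteCode,SiteName,Latitude,Longitude,LatLongDatumID
UHL,Upper Hall Run,39.08,-84.29,2
SLT,South Lucy Tributary,39.04,-84.21,2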
-------
3.2.4. Loading Sources to the ODM.
Figure 3-7. Loading Sources to the ODM
A csv file with information about data sources used in this project was created and loaded to the
DMS using the ODM Data Loader (Figure 3-7). This file contains information about the
different organizations that manage the different sampling sites within the EFW. It also contains
information about a contact person for each of those sampling sites.
3.2.5. Loading Variables to the ODM.
It is sometimes necessary to update the controlled vocabularies. The controlled vocabularies
(CVs) in ODM are implemented as tables. For example, the list of valid variable names is stored
in the VariableNameCV table (Figure 3-8). To ensure consistent descriptions of user data, the
data must use terminology from these CV tables. CUAHSI maintains a master list of CVs that
can be synced with an ODM database; however, sometimes it is necessary to add new, project-
specific terms that are not in the master CV. This can be done using the tools included with SQL
Server.
Figure 3-8. Loading Variables to the ODM
-------
Before new variables could be loaded to the ODM new vocabularies had to be added to the
VariableNameCV table. The new vocabulary was added to the ODM by using SQL statements
inside SQL Server.
SQL statement: Insert into VariableNameCV Values ('Urea, total', 'Total Urea') and Insert into
VariableNameCV Values ('Urea, dissolved (filtered)', 'Total Urea')
3.2.6. Loading SampleMediumCV to the ODM.
New vocabularies had to be created and added to the SampleMediumCV table. The new
vocabulary was added to the ODM by using SQL statements inside SQL Server.
SQL statement: Insert into SampleMediumCV values('Atmospheric Deposition', 'AD'), Insert
into SampleMediumCV values('Intergravel', 'IG'), Insert into SampleMediumCV values('Waste
Water', 'WW'), Insert into SampleMediumCV values('Deionized Water', 'DI'), Insert into
SampleMediumCV values('Drinking Water', 'DW') and Insert into SampleMediumCV
values('Periphyton', 'BP')
3.2.7. Loading VariableNameCV to the ODM.
New vocabularies had to be created and added to the VariableNameCV table. The new
vocabulary was added to the ODM by using SQL statements inside SQL Server.
SQL statement: Insert into VariableNameCV values('Ammonium analyzed as an endpoint to
24hr Urease Assay', 'Ammonium analyzed as an endpoint to 24hr Urease Assay'), Insert into
VariableNameCV values('Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen', 'Nitrogen (NO2) +
Nitrate (NO3) Nitrogen') and Insert into VariableNameCV values('Nitrogen, nitrite (NO2)
nitrogen', 'Nitrite (NO2) Nitrogen')
3.2.8. Loading SampleTypeCV to the ODM.
New vocabularies had to be created and added to the SampleTypeCV table. The new vocabulary
was added to the ODM by using SQL statements inside SQL Server.
SQL statement: Insert into SampleTypeCV values('1', 'Duplicate sample No. 1'), Insert into
SampleTypeCV values('2', 'Duplicate sample No. 2') and Insert into SampleTypeCV values('3',
'Duplicate sample No. 3')
3.2.9. Loading Variables to the ODM.
A variable csv file had to be created and formatted according to the CUAHSI standard. All the
newly created variables include a unique VariableCode that can be used to specify a variable of
interest. The VariableCode starts with the variable abbreviation followed by the sample medium
abbreviation to distinguish each variable by the location where it was collected. For example,
TNH4-SW stands for 'Nitrogen, NH3 + NH4' collected as a 'Surface Water' (SW) sample.
3.2.10. Loading Data Values to the ODM.
The DataValues were created from the 2010_ESF-EFWS_NutrientData.xls file. A new
spreadsheet was created and all relevant DataValues Column names added. This spreadsheet
pulls values from the Data spreadsheet and the METADATA spreadsheet and writes each value
to the correct column in the DataValues spreadsheet. This spreadsheet had to be copied and
pasted into a new Excel document. This document was then saved as a csv file (Figure 3-9).
-------
Alternatively, the data can be saved using Save As CSV directly from the modified
NutrientData.xls file without having to save as a separate Excel file first. The ODM Data Loader
was used to load the Data Values to the ODM.
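For illustration, the first data row of the resulting csv file, as shown in Figure 3-9, reads:

Date,Time,SiteCode,DataValue,VariableCode,SampleType
11/8/2010,10:11:00 AM,CLC,7,TUREA-SW,1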
Figure 3-9. Loading 2010 Nutrient Data Values
3.2.11. Loading Sensor Data from Water-immersed Sensors for Physico-chemical Variables
to the ODM.
The sensor water data files were uploaded to the ODM using the ODM Streaming Data Loader
(Figure 3-10). All sensor data account for daylight saving time. First, a new csv file with all sensor
data variables was created. A description of each column in the sensor data files can be found
in Appendix C. While loading the sensor data to the ODM database, the variables were
mapped to the newly created variables in the ODM.
Figure 3-10. ODM Streaming Data Loader
3.2.12. Loading Sediment Data to the ODM.
The ODM Streaming Data Loader was used to load the sediment data. New vocabularies were
added to the VariableNameCV table. The following SQL statement was used inside SQL Server:
-------
Insert into VariableNameCV Values('Gravel Tray, total sediment weight (g)', 'Gravel Tray, total
sediment weight (g)'), Insert into VariableNameCV Values('Gravel Tray, total sediment weight
sum of fractions (g)', 'Gravel Tray, total sediment weight sum of fractions (g)'), Insert into
VariableNameCV Values('Gravel Tray, > 2mm (g)', 'Gravel Tray, > 2mm (g)'), Insert into
VariableNameCV Values('Gravel Tray, 250um - 2mm (g)', 'Gravel Tray, 250um - 2mm (g)'),
Insert into VariableNameCV Values('Gravel Tray, 250um - 2mm LOI % (w/w)', 'Gravel Tray,
250um - 2mm LOI % (w/w)'), Insert into VariableNameCV Values('Gravel Tray, 1.2um -
250um (g)', 'Gravel Tray, 1.2um - 250um (g)') and Insert into VariableNameCV Values('Gravel
Tray, 1.2um - 250um LOI % (w/w)', 'Gravel Tray, 1.2um - 250um LOI % (w/w)')
New sediment variables were created and loaded to the ODM. Sediment DataValues were loaded
to the ODM using a modified version of the ODM Data Loader (Figure 3-11).
Figure 3-11. Modified Data Loader
-------
3.2.13. Data Validation.
Figure 3-12. New Testing Tab Added to the ODM Data Loader
Three steps were followed to validate the consistency of any data committed to the DMS. In the
first step a new tab named Testing in the modified ODM Data Loader was created (Figure 3-12).
This tab allows for a quick check and validation of any category of data that has been loaded
to the ODM. In Figure 3-12 the Units table was queried using 'Select * from Units'. In the second step the ODM log files
were checked. The ODM data loaders keep a log file which contains information about what type
of data has been loaded, the loading date and time, and any errors which occurred during the
loading process. The data loaded using the different ODM data loaders were validated after each
file had been loaded to the ODM database. All data loaded to the ODM database were
quantitatively verified by the user for consistency with the data loaded to the ODM database. The
third step included verification that the data values committed to the ODM for a specific location
and time were actually in the database, using ODM Tools, a freely downloadable application.
This tool enables querying the DMS and verification of the results returned (Figure 3-13) by
comparison with the data before loading onto the ODM.
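For illustration, counts of committed values can be compared against the source files with simple
queries run from the Testing tab or SQL Server Management Studio (a sketch; the table and field
names are those of the ODM 1.1 schema):

Select count(*) from DataValues
Select s.SiteCode, v.VariableCode, count(*) as NumValues
from DataValues dv
join Sites s on s.SiteID = dv.SiteID
join Variables v on v.VariableID = dv.VariableID
group by s.SiteCode, v.VariableCode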
-------
Figure 3-13. Data Validation Using ODM Tools
3.3 Working with Shepherd's Creek Data
A procedure similar to that used for the EFW data was used to accommodate the SC data into the
DMS. In this case only one file type was included, i.e., Excel sheets containing physico-
chemical data in water grab samples, collected during a 7-year period (2004-2011).
-------
4.0: DATA PUBLICATION
This section of the report outlines how to publish data through HIS Central and consists of two
parts. Part one, Data Publication with WaterOneFlow, describes general concepts, requirements
and steps to be taken for data publication. Detailed instructions are given in Appendix A. Part
two, Publishing New Watershed Data, outlines specific phases towards publishing the EFW data
with regard to EPA-ORD IT security requirements.
4.1. Data Publication with WaterOneFlow
WaterOneFlow Web services are used to share data with others online. WaterOneFlow defines a
standard set of queries and a standard output format for accessing data, regardless of whether the
data are accessed internally from an ODM database, some other database, or even through
another website. Additionally, WaterOneFlow provides a layer of security over the database,
which makes it less susceptible to hackers than exposing the database itself with public access.
Appendix A provides step-by-step instructions on how to publish the data with WaterOneFlow.
The main steps include:
1. Create a SQL Server account that the Web service will use to access the database (a sketch follows this list).
2. Install the WaterOneFlow Web service on the computer.
3. Configure the service.
4. Check the result.
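As a sketch of step 1 in T-SQL (the login name and password here are hypothetical, and the
database name is whatever the ODM database was attached as), a read-only account can be
created as follows:

CREATE LOGIN WebClient WITH PASSWORD = 'StrongPasswordHere';
USE OD;
CREATE USER WebClient FOR LOGIN WebClient;
EXEC sp_addrolemember 'db_datareader', 'WebClient';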
Upon the completion of these steps, the service must be registered at HIS Central to make it
discoverable online. More than just a listing of WaterOneFlow services, HIS Central performs
the following functions:
• Provides detailed information about the service, including contact information, abstract,
and areal extent of the sites.
• Supports translation from the variables to an ontology of common hydrologic concepts.
This facilitates easy search for variables in the service, especially by those who are not
familiar with what the service has to offer.
• Maintains a catalog of sites and variables available in all registered services, enabling fast
search for data from multiple data providers.
After the service has been registered, HIS Central needs to harvest the data. HIS Central keeps a
catalog of all sites and variables, which enables fast searching across all registered services. HIS
Central creates this catalog by calling various methods from the WaterOneFlow Web service.
This is called data harvesting. When a data harvest is requested by the database developer, an
HIS Central administrator has to be notified, who then triggers the harvest with the click of a
button. Once the service is in place, harvesting occurs automatically every week.
The next step is to map the variables into the CUAHSI HIS ontology. This must be performed
for each variable to ensure that variable names comply with CUAHSI ontologies and are,
-------
therefore, recognized. Mapping has to be programmed by the database developer at this time.
Finally, HydroDesktop is used to test if data publication has been successful.
4.2. Publishing New Watershed Data
The three phases of publication of a new database are development/testing,
demonstration, and production. The team has completed the development/testing and
demonstration phases.
4.2.1. DMS Development (and Testing) Phase (Figure 4-1):
1. An ORD-HIS Development Database, an ORD-HIS Test Database, and an ORD-HIS
Production Database have been set up by EPA OSIM inside the EPA firewall. During the
development and testing phase, this approach gives data loaders, application developers, and
testers three uniformly formatted databases in which to develop, stage, clean, quality-
assure (QA), and test data before they are moved to the production database (which will
eventually be accessible to the public).
DMS Project HIS Database Administrators will have direct access to the three databases
(development, test, and production) on the ORD Enterprise SQL Server. The DMS Project
HIS Database Administrators will also manage access for other ORD or EPA users (e.g., read
only, load data, or modify records).
2. EPA/ORD users may submit a request for the installation and use of CUAHSI's open-
source HydroDesktop software, which was recently approved for use within the
Agency. Only users inside the EPA firewall can access the EFW and SC databases during
the development phase. For demonstration of the EFW and SC databases by the UC
contractors, copies on a UC server are used until the production phase is complete.
3. OSIM will also begin setting up an EPA HydroServer that will initially function within
EPA's firewall. This HydroServer will live inside the EPA firewall until the DMS team is
ready to make it publicly accessible. The HydroServer will read data records from the SQL
database file that is on the EPA/ORD Enterprise SQL server. The original ESF and SC
databases also currently reside on this EPA/ORD HydroServer at RTP, which is accessible
from within the EPA/ORD firewall (through a SQL Server account that has permission to
view those databases, using ODM Tools; point of contact: Dr. Elly P.H. Best).
4.2.2. DMS Demonstration Phase (Figure 4-2):
For demonstration of the EFW and SC databases by University of Cincinnati (UC) contractors,
copies on a designated UC server are used until the production phase is complete. These web-
services can be accessed by EPA and non-EPA users who have access to the Internet in the HIS
Data Service List by the following link: http://hiscentral.cuahsi.org/pub_services.aspx.
By clicking on the title 'EPA - East Fork Watershed in Ohio' the individual service page of the
EFW database comes up (http://hiscentral.cuahsi.org/pub_network.aspx?n=264). The data of this
database can be accessed via CUAHSI's Hydrodesktop application.
By clicking on the title 'Shepherd Creek Watershed Ohio' the individual service page of the
Shepherd Creek database comes up (http://hiscentral.cuahsi.org/pub network.aspx?n=216).
14
-------
The data of this database can also be accessed via CUAHSI's Hydrodesktop application.
Publication of both databases is visualized in Figure 4-3.
4.2.3. DMS Production Phase (Figure 4-4):
1. Once ORD researchers are ready for the production phase (i.e., data are loaded into the
production database and funding is secured), they will begin the Application Deployment
Checklist (ADC), or whatever process EPA has more recently adopted, to move the EPA
HydroServer to the demilitarized zone (DMZ, hosted by the EPA National Computing
Center) so that it will be fully accessible by HIS Central.
2. ORD and other EPA users with the HydroDesktop client installed will access the internet-
based public HIS Central metadata catalog through the Agency Firewall. They will retrieve
data records from the EPA HydroServer and from other (public) HydroServers as well.
Access to the EPA HydroServer will be through the internal DMZ firewall router, while
access to other public HydroServers will be through the Agency Firewall.
3. Public (non-EPA) clients who retrieve pointers to ORD water data from HIS Central queries
will access the EPA HydroServer through the Agency internet/DMZ router. The EPA
HydroServer will retrieve requested water data records from the internal ORD-HIS database
and pass them back to the external client.
4. Offsite (external to the Agency Firewall) access to the ORD-HIS databases will only be
possible through the EPA HydroServer, and will be limited to those types of access provided
by the HydroServer (i.e., only retrieval of water data records from the ORD-HIS database).
Offsite ORD-HIS Database Administrators or other offsite users with requirements to
directly access the ORD-HIS databases (e.g., approved external data loaders) will have to use
an Agency AAA/F5 connection to access the ORD-HIS databases. (An additional 'jump box'
may be necessary for offsite direct access to the ORD Enterprise SQL Server and the ORD-
HIS databases; this may have to be developed as required.)
By placing the HIS database on the ORD Enterprise SQL Server, ORD researchers are relieved
of all SQL Server maintenance tasks. ORD researchers (or designated staff) remain the data
owner and Database Administrator of the databases. OSEVI (or designated staff) manages and
maintains the SQL Server itself, including normal server backups. Reliable backup of SQL
database files will require arrangement and coordination with the Database Administrator.
Additional backup may also be needed and will be determined later.
Figure 4-1. DMS Development Phase
Figure 4-2. DMS Demonstration Phase
Figure 4-3. Publication of the DMS in the Demonstration Phase
Figure 4-4. DMS Production Phase
5.0: USING THE DMS: EXCHANGE AND INTEGRATION OF WQ2
INFORMATION
This section describes tools that enable data visualization and analysis. It includes a discussion of
HydroDesktop, HydroR and HydroExcel. Details on HydroDesktop and HydroR are contained in
Appendix D, and on HydroExcel in Appendix A.
5.1. Using HydroDesktop
HydroDesktop is a desktop application for discovering, accessing, and analyzing time series data
from WaterOneFlow services. HydroDesktop can be used to search for hydrologic data, select
and download data, visualize time series, label features, delineate watersheds, and explore data.
5.2. Using HydroR
The HydroR plug-in is an interface between HydroDesktop and the R statistical software.
HydroR makes it easy to pass data downloaded with HydroDesktop to R, giving HydroDesktop
users access to R's advanced statistical analysis and plotting capabilities.
5.3. Using HydroExcel
HydroExcel is an Excel spreadsheet customized with macros for accessing data from a
WaterOneFlow Web service. It is thus possible to extract data from the published Web service
from within an Excel spreadsheet.
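The same WaterOneFlow service that HydroExcel queries can also be called from a script. Below
is a minimal sketch, again assuming the zeep SOAP client and a placeholder endpoint; the site
and variable codes are illustrative and would normally come from the service's own site catalog.

    # Retrieve one time series from a WaterOneFlow 1.1 service (hypothetical endpoint).
    # Assumes the third-party 'zeep' SOAP client: pip install zeep
    from zeep import Client

    client = Client("http://example.org/odws/cuahsi_1_1.asmx?WSDL")  # placeholder URL

    # GetValues returns a WaterML document for one site, variable, and date range.
    waterml = client.service.GetValues(
        location="EPA-EFW:Perintown",    # network:site code (illustrative)
        variable="EPA-EFW:Discharge",    # vocabulary:variable code (illustrative)
        startDate="2010-01-01",
        endDate="2010-12-31",
        authToken="",
    )
    print(waterml[:500])  # first part of the returned WaterML XML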
6.0: DATA ANALYSIS TO INFORM DECISION-MAKING
6.1. Area of Interest and Problem Identification
In this section a case study is described to illustrate how users may access, explore and retrieve
WQ2 data from the DMS, combine them with their own data, conduct analyses and use the
results to inform watershed management decision-making.
The East Fork of the Little Miami River (EFLMR) watershed is a large Midwestern watershed,
covering approximately 1,295 km2 (500 mi2) and discharging into the Little Miami River (LMR),
a National Scenic River. The LMR is a tributary of the Ohio River. The headwaters of the
EFLMR are located in the rural counties of Clinton, Highland, and Brown, and the confluence
with the LMR is in suburban Clermont County. Dam construction by the U.S. Army Corps of
Engineers (USACE) in 1975 created an 8.74 km2 (3.38 mi2) reservoir (Harsha Lake, or East Fork Lake).
The dam and reservoir provide flood control, recreation opportunities, and a drinking water
source for Clermont County. The EFLMR has been listed as an impaired water by the State of
Ohio since 2006, and has been designated for Total Maximum Daily Load (TMDL)
identification (Collaborative EFLMR Watershed, 2007; hereafter 'the Collaborative'). The watershed has a
mixed agricultural and urban land use, and harbors several wastewater treatment plants
(WWTPs) and a drinking water treatment plant (DWTP). Issues of concern for the water quality
of this watershed include stormwater, wastewater, and agricultural runoff, which may contribute
to increased loading of N and elevated aqueous N concentrations; and drinking water treatability
(Nietch et al. 2010). A level of 1.1 mg/L for dissolved inorganic nitrogen (DIN) has been
suggested as a water quality standard for the management of rivers and streams draining <1,300 km2
(Miltner 2010). In this case study, a procedure is described in which a N-TMDL for the EFLMR
is computed using 1.1 mg DIN/L as a target level. The TMDL will be calculated for the pour
point of the watershed, where the EFLMR discharges into the LMR. The layout of the watershed
is depicted in Figure 6-1.
6.2. Data Management
Data management, sharing, exchangeability, and accessibility were identified as major problems
for effective communication and analysis of information by the Collaborative in which
EPA/ORD participates. Many WQ2 data are currently collected, analyzed, and stored in similar
ways by members of the Collaborative. Therefore, it is expected that a scalable DMS based on
CUAHSI protocols for storage and management of georeferenced discrete data pertaining to the
EFW collected by EPA will serve as an example and greatly facilitate effective communication
and analysis of information in the near future (see sections 2-5 of this report).
6.3. Data Exploration
HydroDesktop was used to explore and identify the geographical area of the EFW, select it,
identify the monitoring stations in the watershed, and visualize this information as a
geographically referenced map (Figure 6-2).
Figure 6-2. Monitoring Stations of the East Fork Watershed Displayed on a Georeferenced
Map in HydroDesktop
Figure 6-3. Discharge of the East Fork Watershed at Perintown (OH)
Figure 6-4. Box and Whisker Plot of the Flow Data at Perintown, OH, Showing a
Maximum in March and Minimum in September
Figure 6-5. Summary Statistics of the Flow Data at Perintown, OH, Showing Variability
in the Data
6.4. Data Analysis Using Excel
Cumulative flow (in million gallons per day, MGD) was calculated from measured daily flow (in
cubic feet per second, cfs) over a desired period of time (in days, d). Cumulative flow over a year
is commonly used as the basis for estimating pollutant loads and/or TMDLs (Figure 6-6).
Cumulative flow is calculated using equation (1):

Q_d = Σ_{i=1}^{d} Q_i    (1)

where:
Q_d = cumulative flow on day d
Q_i = flow on day i
Figure 6-6. Flow at Perintown, OH, in 2010
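A minimal sketch of equation (1) in Python, assuming the daily flows are already available as a
list in cfs; the flow values below are illustrative placeholders, and the cfs-to-MGD factor is the
standard unit conversion:

    # Cumulative flow per equation (1), converting daily mean flow from cfs to MGD.
    CFS_TO_MGD = 0.6463  # 1 cubic foot per second is about 0.6463 million gallons/day

    daily_flow_cfs = [1200.0, 950.0, 870.0, 2400.0, 1800.0]  # Q_i, one value per day

    cumulative_mgd = []
    running_total = 0.0
    for q in daily_flow_cfs:
        running_total += q * CFS_TO_MGD  # running sum Q_d = Q_1 + ... + Q_d
        cumulative_mgd.append(running_total)

    print(cumulative_mgd)  # cumulative flow through each day, in MGD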
In addition to the flow data, data on nitrogenous compounds collected by the Collaborative at the
EFW station close to Perintown, the pour point, were selected.
Figure 6-7. Nitrogen Time Series at Perintown, OH, in 2010
Cumulative load is calculated as follows. The instantaneous load is:

L_i = Q_i × C_i    (2)

where:
L_i = load at time i
Q_i = flow at time i
C_i = concentration at time i

Daily cumulative load is then:

L_d = Σ_{i=1}^{d} Q_i × C_i × Δt    (3)

where:
L_d = cumulative load on day d
Q_i = flow on day i
C_i = concentration on day i
Δt = the dimensionless time step from day d to day d+1 at which concentrations were measured
Cumulative target load is calculated analogously, substituting the target concentration C_t for the
observed concentration:

L^c_d = Σ_{i=1}^{d} Q_i × C_t × Δt    (4)
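A sketch of equations (2) through (4), assuming paired daily flow and concentration series; the
series are illustrative placeholders, and the unit factor converts cfs times mg/L into kg/day:

    # Loads per equations (2)-(4): flows in cfs, concentrations in mg/L, loads in kg/day.
    # 1 cfs = 2446.6 m^3/day and 1 mg/L = 1 g/m^3, so kg/day = 2.4466 * Q(cfs) * C(mg/L).
    CFS_MGL_TO_KG_DAY = 2.4466

    flow_cfs = [1200.0, 950.0, 870.0, 2400.0, 1800.0]  # Q_i (illustrative)
    conc_mg_l = [1.8, 1.4, 1.2, 2.6, 2.1]              # C_i (illustrative)
    target_mg_l = 1.1                                  # C_t, the DIN target level
    dt = 1.0                                           # time step between samples, days

    # Equation (2): instantaneous loads; equations (3) and (4): running sums.
    observed = [CFS_MGL_TO_KG_DAY * q * c * dt for q, c in zip(flow_cfs, conc_mg_l)]
    cum_observed = [sum(observed[:i + 1]) for i in range(len(observed))]
    target = [CFS_MGL_TO_KG_DAY * q * target_mg_l * dt for q in flow_cfs]
    cum_target = [sum(target[:i + 1]) for i in range(len(target))]

    print(cum_observed[-1], cum_target[-1])  # cumulative observed vs. target load, kg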
Figure 6-8. Computed Nitrogen Loading Curve of the East Fork Watershed in 2010
The flow duration curve (FDC) is a plot that shows the percentage of time that flow in a stream
is likely to equal or exceed a specified value of interest. Flow duration curves are developed
from historic flow data at the site and provide a snapshot of the flow record at that location over
a certain period of time.
A duration curve is computed according to the following equation:

p(x) = 1 - P(x)    (5)

where:
p(x) = exceedance probability of event x
P(x) = cumulative probability of event x
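In practice the exceedance probability is estimated from the ranked flow record. A sketch,
assuming daily flows in a list; the Weibull plotting position m/(n+1) used here is one common
choice among several:

    # Flow duration curve: exceedance probability of each observed flow value.
    # Uses the Weibull plotting position p = m / (n + 1), with m = rank (1 = largest).
    flows = [12.0, 85.0, 430.0, 95.0, 2100.0, 160.0, 58.0]  # illustrative daily flows, cfs

    ranked = sorted(flows, reverse=True)  # largest flow first
    n = len(ranked)
    fdc = [(m / (n + 1), q) for m, q in enumerate(ranked, start=1)]

    for p, q in fdc:
        print(f"exceeded {p:.2f} of the time -> {q:.0f} cfs")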
The FDC for the pour point of the EFLMR was computed for the period from 2005 through
2010. It showed that the flow at this station is at least 100 cfs 55% of the time (Figure
6-9). Flow duration curves are often segmented into flow regimes to describe varying hydrologic
conditions at that site (Cleland, 2003).
Figure 6-9. Flow Duration Curve of the East Fork Watershed
Flow duration curves have several applications in the water resources field (Vogel and
Fennessey 1995; Johnson et al. 2009), including water quality analysis through load duration
curves. Load duration curves are developed from and work similarly to flow duration curves.
Instead of addressing flows, however, load duration curves address the likelihood of equaling or
exceeding a given pollutant load at a given location. Curves are computed by applying the
concept of mass loading which combines water quality and flow information to quantify the
pollutant load contributed to a point on a stream by the watershed that lies above it.
The mass loading is then calculated by:

L_i = Q_i × C_i    (6)

where:
L_i = mass loading of input at time i, in mass/time
Q_i = input flow at time i, in volume/time
C_i = pollutant concentration at time i, in mass/volume.
Load duration curves can give insights into various aspects of pollutant loading, such as patterns in
loading under various flow conditions, impacts of point versus non-point sources, and effects of
best management practices (Cleland 2002; Cleland, 2003; USEPA, 2007). Over the last 10 years
load duration curves have been widely used in the calculation of TMDLs.
Using load duration curves to calculate TMDLs is governed by the following equation:

TMDL = Σ WLA + Σ LA + MOS    (7)

where
WLA = the waste load allocation (point sources)
LA = the load allocation (non-point sources)
MOS = the margin of safety
The Clean Water Act requires that a TMDL include a margin of safety (MOS) to account for
any lack of knowledge concerning the relationship between load and waste load allocations and
water quality. According to EPA guidance the MOS may be implicit (i.e., incorporated into the
TMDL through conservative assumptions in the analysis) or explicit (i.e., expressed in the
TMDL as loadings set aside for the MOS). Commonly a MOS of 5-10% of the TMDL is used.
However, a 20% MOS may be used for the load duration curve (LDC) method where a relatively
low number of data points is available for the analysis.
For this EFW case, first a target curve was computed from the target concentration (of 1.1 mg
N/L) according to equation (7) using a 10% MOS. The resulting curve is the maximum pollutant
load that can be experienced at the site, based on previous flow conditions, while still meeting
the water quality standard (i.e., the TMDL; Figure 6-10, blue line). Subsequently, the water
quality data monitored at the nearby station were converted to loads and plotted in the same
graph (Figure 6-10; red dots). This example computation is limited to monitored N values
accessible via the DMS, representing non-point sources (LA), and, therefore, the estimated N
load is probably less than with point-source contributions (WLA) included. Because the target
curve represents the water quality standard, points falling above the curve are out of compliance,
and points below the curve are in compliance. The difference between the observed and target
loads is then computed to reveal the overall load reduction required to meet the water quality
standard. The required load reduction can be found by inspecting the flow regimes pertaining to
the observed and target loads, respectively. The percent load reduction can be calculated by
subtracting the average target load from the average observed load and dividing by the average
observed load (Figure 6-11).
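The regime-by-regime reduction calculation behind Figure 6-11 and Table 6-1 can be sketched as
follows; the loads are taken directly from Table 6-1, and only three of the five regimes are shown:

    # Percent load reduction per flow regime, as in Table 6-1:
    # reduction = (avg observed - avg target w/ MOS) / avg observed, floored at zero.
    MOS = 0.10  # explicit 10% margin of safety

    # (regime, average observed load, average target load), both in kg/day (Table 6-1)
    regimes = [
        ("High flow", 6399.39, 7608.86),
        ("Moist conditions", 1951.91, 1964.72),
        ("Dry conditions", 268.14, 141.61),
    ]

    for name, observed, target in regimes:
        target_mos = target * (1.0 - MOS)  # decrease the target load by the MOS
        reduction = max(0.0, (observed - target_mos) / observed) * 100.0
        print(f"{name}: target w/ MOS = {target_mos:.2f} kg/day, "
              f"required reduction = {reduction:.1f}%")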
Expressing the load reductions numerically and by flow regime provides more specific
information on the circumstances under which TMDLs are exceeded, and may, thus, guide
measures and actions required to avoid TMDL exceedance (Table 6-1). To take the MOS into
consideration, the target load has to be decreased by 10% for this calculation. In this watershed,
the observed N load remained below the target load under high flow conditions in 2010, despite
transporting the greatest N load, since ample water was available for dilution. In contrast, under
dry as well as low-flow conditions the observed N load greatly exceeded the target load, despite
daily loads more than ten times lower than at high flow. In this case, therefore,
measures alleviating water quality impairments under dry and low-flow conditions may have to
be taken. In the current TMDL example computation only N from non-point sources was
included. To reduce the N loading from non-point sources, nutrient management measures such
as reduced fertilizer rates, conservation tillage, cover crops, and on-site runoff treatment may be
considered. Regulatory action recommending lower effluent concentrations for N pertains only to
pollutants from point sources.
This case study demonstrated the functionality of the DMS by estimating an example nitrogen
(N) TMDL using a Load Duration Curve (LDC) approach. Data were retrieved from the DMS
and other databases, and were accessed, explored, and visualized through the HIS application
HydroDesktop. In this case, HydroDesktop accessed the DMS data via a link to the test
publication of the EFW Database, for which CUAHSI provides a CodePlex sandbox. When the
implementation of the recently developed DMS is approved for use within EPA-ORD and
funding for the DMS is generated to enter into the 'Production Phase' (involving dedicated
server testing, operating and maintenance), the publication of the EFW Database in the CUAHSI
HIS Services Central Web Service Registry (and possibly other websites) can be accomplished.
The latter step would enable access and use by users within and outside EPA/ORD, and is
expected to greatly facilitate and accelerate collaboration.
Figure 6-10. Nitrogen Load Duration Curve of the East Fork Watershed
Figure 6-11. Nitrogen Load Duration Curve of the East Fork Watershed, with the Difference
Between the Averages of the Observed and Target Loads Marked
Table 6-1. Load Duration Curve Estimated Nitrogen Loads and Overall Reductions
Needed to Meet the Proposed Dissolved Inorganic Nitrogen TMDL for the Management of
Rivers and Streams <1,300 km2 (with 10% margin of safety)
Nitrogen Reductions

Flow Percentile Range | Hydrologic Condition Class | Number of Samples | Average Observed Load (kg/day) | Average Target Load (kg/day) | Average Target Load w. MOS (kg/day) | % Load Reduction w. MOS = (Obs - Target w. MOS)/Obs
0-0.1   | High flow            | 5  | 6399.39 | 7608.86 | 6847.98 | 0.0
0.1-0.4 | Moist conditions     | 17 | 1951.91 | 1964.72 | 1768.25 | 9.4
0.4-0.6 | Mid-range conditions | 10 | 553.82  | 369.53  | 332.58  | 39.9
0.6-0.9 | Dry conditions       | 7  | 268.14  | 141.61  | 127.45  | 53.5
0.9-1.0 | Low flow             | 17 | 335.88  | 93.27   | 83.95   | 75.0
7.0: CRITICAL GAPS OF CURRENT APPROACHES
The EFW biological monitoring data include data attributes that do not map to attributes within
ODM. Because access to the data within CUAHSI HIS is provided by Web services which
connect to the ODM database, this presents a number of challenges. The ODM database will not
store these attributes. Even if the database were modified to accommodate these attributes, the
ODM data loaders would not load these attributes because they have not been programmed to
recognize them. Even if the data are somehow successfully loaded into the database, the
remaining cyberinfrastructure from the WaterOneFlow Web service to HIS Central to
HydroDesktop, would not recognize these attributes and so they would not be communicated to
end users. Recommendations for addressing these challenges are described in the next section.
8.0: RECOMMENDATIONS FOR IMPROVING UPON EXISTING
APPROACHES
Because CUAHSI HIS is an open-source system, its software and data models can be modified
when they fall short of project needs. For example, to address the issues with handling
biological data described in the previous section, the following steps could be taken:
1. Using SQL Server, modify the ODM database to include the additional biological
attributes (see the sketch after this list).
2. Modify the ODM data loaders to load these attributes. Modifications can be made by
downloading the source code from CUAHSI, modifying the code, and compiling the
code.
3. While it is not possible to modify the HIS Central application since it is maintained at
CUAHSI, users within the EPA firewall can still get access to the data by making direct
connections to the ODM database. These connections would allow for the transfer of the
additional biological attributes.
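A minimal sketch of steps 1 and 3 follows, assuming a hypothetical biological attribute named
TaxonName is added to the DataValues table; the connection string, column name, and table
choice are illustrative and are not part of the ODM specification.

    # Add a hypothetical biological attribute to an ODM 1.1 database, then query it.
    # Assumes the third-party 'pyodbc' package and a local SQL Server instance.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=(local);DATABASE=MyWaterData;Trusted_Connection=yes;"
    )
    cursor = conn.cursor()

    # Step 1: extend the schema; the column is nullable so existing rows still load.
    cursor.execute("ALTER TABLE DataValues ADD TaxonName NVARCHAR(255) NULL;")
    conn.commit()

    # Step 3: clients inside the firewall can read the new attribute directly.
    for row in cursor.execute("SELECT TOP 5 ValueID, DataValue, TaxonName FROM DataValues;"):
        print(row.ValueID, row.DataValue, row.TaxonName)
    conn.close()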
A balance must be struck between making these modifications within the DMS team and relying
on CUAHSI to perform the work. In some cases, the modifications may prove useful to the
broader CUAHSI community outside the scope of the DMS project. CUAHSI provides a
number of mechanisms by which the DMS team can contribute to this broader community:
1. When modifying source code, instead of maintaining only a local copy of the code,
commit a branch with the code to CUAHSI's source code repository. The two
repositories of interest are:
a. HydroServer http://hydroserver.codeplex.com/
b. HydroDesktop http://hydrodesktop.codeplex.com/
2. When issues are encountered or to open discussions with the CUAHSI team and
community, use the issue tracker and discussion forums located on the websites linked
above.
3. When terms must be added to the controlled vocabularies in the project's ODM database,
if these terms might be useful to the broader community, submit edits to CUAHSI's
master controlled vocabularies at http://his.cuahsi.org/mastercvreg.html.
9.0: REFERENCES
Cleland, B. 2002. "TMDL Development from the 'Bottom Up' - Part II: Using duration curves
to connect the pieces," In: National TMDL Science and Policy 2002 - WEF Specialty
Conference. Water Environment Federation, Phoenix, Arizona.
Cleland, B. 2003. "TMDL Development from the 'Bottom Up' - Part III: Duration curves and
wet-weather assessment," In: National TMDL Science and Policy 2003 - WEF Specialty
Conference, Chicago, Illinois, p. 27.
Horsburgh, J.S., D.G. Tarboton, M. Piasecki, D.R. Maidment, I. Zaslavsky, D. Valentine, and
T. Whitenack. 2009. "An Integrated System for Publishing Environmental Observations,"
Environmental Modelling & Software, 24, 879-888.
Horsburgh, J.S., D.G. Tarboton, D.R. Maidment, and I. Zaslavsky. 2008. "A Relational Model
for Environmental and Water Resources Data," Water Resources Research, 44, W05406, 12 p.
Johnson, S.L., T. Whiteaker, and D.R. Maidment. 2009. "A Tool for Automated Load Duration
Curve Creation," Journal of the American Water Resources Association, 45(3): 654-663.
Miltner, R.J. 2010. "A Method and Rationale for Deriving Nutrient Criteria for Small Rivers and
Streams in Ohio," Environmental Management. DOI 10.1007/s00267-010-9439-9.
Nietch, C., M. Elovitz, E. Heiser, H. Thurston, L. Underwood, H. Lubbers, D. Macke, E. Best, P.
Braasch, R. McClatchey, D. Brown, and M. Heberling. 2010. "Linking sources to stress
dynamics in drinking and recreational waters in a mixed-use watershed in Southwestern Ohio
with a multi-agency cooperative," Society of Environmental Toxicology and Chemistry North
America 31st Annual Meeting, Portland, OR, November 7-11, 2010.
Tarboton, D.G., J.S. Horsburgh, and D.R. Maidment. 2008. "CUAHSI Community Observations
Data Model (ODM) Version 1.1 Design Specifications," CUAHSI.
USEPA (U.S. Environmental Protection Agency). 2007. "An approach for using load duration
curves in developing TMDLs," U.S. Environmental Protection Agency, Office of Wetlands,
Oceans, & Watersheds, Washington, D.C., 68 p.
Vogel, R.M. and N.M. Fennessey. 1995. "Flow Duration Curves II: A Review of Applications in
Water Resources Planning," Water Resources Bulletin, 31(6): 1029-1039.
APPENDIX A:
CUAHSI
Universities allied for water research
Workshop: Publishing Data with the CUAHSI Hydrologic Information
System
September 7, 2011
by:
Dr. Tim Whiteaker (twhit@mail.utexas.edu)
Center for Research in Water Resources
The University of Texas at Austin
Distribution
Copyright © 2011, Consortium of Universities for the Advancement of Hydrologic Science, Inc.
All rights reserved.
Funding and Acknowledgements
Funding for this document was provided by the Consortium of Universities for the Advancement of
Hydrologic Science, Inc. (CUAHSI) under NSF Grant No. EAR-0622374. In addition, much input and
feedback has been received from the CUAHSI Hydrologic Information System development team. Their
contribution is acknowledged here.
Table of Contents
Introduction 4
Goals of the Workshop 4
Workshop Requirements 5
About the Workshop Data 6
Review of CUAHSI-HIS Data Publication Tools 6
Workshop Outline 8
Translating and Loading Data into ODM 9
Installing a Blank ODM Database 9
Getting To Know Your Data 13
Creating a Sites File 14
Creating a Sources File 16
Creating a Variables File 18
Loading Sites, Sources, and Variables into ODM 20
Creating and Loading a Data Values File 21
Using the Streaming Data Loader 23
For Advanced Participants 31
Working with ODM Data 32
For Advanced Participants 37
Publishing an ODM Database with WaterOneFlow 38
Creating the Webclient SQL Server Account 38
Installing a WaterOneFlow Web Service 40
Testing the Web Service in a Web Browser 45
Testing the Web Service with HydroExcel 46
For Advanced Participants 48
Registering Your Service at HIS Central 49
Adding Your WaterOneFlow Web Service to HIS Central 49
Viewing Your Data in HydroDesktop 54
Appendix A: Uninstallation Instructions 59
HIS Central 59
WaterOneFlow 59
Streaming Data Loader Scheduled Task 59
ODM Database 60
CUAHSI-HIS and Related Software 60
Introduction
This document provides steps to complete the hands-on training portion of a workshop that teaches how
to publish water observations data using the CUAHSI Hydrologic Information System (HIS). This
involves loading data into an Observations Data Model (ODM) database, exposing the data in a secure
and standard way online via WaterOneFlow Web services, and making the data discoverable by
registering the Web service with HIS Central.
For background information on CUAHSI or HIS, please refer to presentation materials provided at the
workshop, or the HIS website at http://his.cuahsi.org.
Why publish data online?
Sometimes people ask what the motivation is for using HIS to publish data. When the HIS team interacts
with HIS users, here are the most common reasons those users give for why they want to publish their
data:
• Academics
o Recognition of work
o Data publication is mandated by the funding agency
o To support science in the US and promote collaboration
• Agencies
o Standardizing data access (both internally and externally)
o Time savings in developing a publication system
o Public benefit with publication
o Return on investment - people can get the data themselves without requiring a
"middle-man"
o Get all the state or regional data "together"
Additionally, HIS just makes it easy for users to discover and access data. This is traditionally a pretty
big time sink for users, so let's be good citizens and make their lives (and ours) easier!
If you have other reasons for publishing data, please let us know! Fulfilling your needs is a primary
driver in the future development of HIS.
Goals of the Workshop
This workshop seeks to introduce you to the HIS data publication process. As there are numerous
avenues for publishing data, which often depend on a given user's available software and system setup,
this workshop does not seek to teach every technique that can be used for publishing data. Rather, the
workshop communicates the basic concepts of data publication, which can then be applied by you to fit
your specific needs or environment. After completing the workshop, you should be able to:
• Understand what kind of data can be stored in an Observations Data Model database
• Translate your observations data and metadata into terminology used by the Observations Data
Model
• Load your data into an Observations Data Model database
• Publish your data with a WaterOneFlow Web service
• Register your Web service with HIS Central so that others can discover it
• Access your data in a number of ways using HIS software:
o Direct database connection with ODM Tools
o Direct Web service access with HydroExcel
o Discover and access with HydroDesktop
• Learn more about HIS using the HIS website at http://his.cuahsi.org
Workshop Requirements
Computers and sample data were prepared ahead of time for the workshop. However, all HIS software is
free, and so the configuration below can also be applied to your own computer if you have licenses for the
commercial software used, such as the operating system. The system outlined below closely resembles
the HydroServer system described at http://his.cuahsi.org/hydroserver.html, but without website and
spatial data hosting capabilities.
Note
The instructions in this manual are written assuming your computer is configured like the one described below.
Hardware:
• PC, 4 GB RAM, 500 GB hard disk, 2.7 GHz processor
• Network setup allowing external access to Web services installed on local PC
Software:
• Windows 7
o Microsoft Internet Information Services (IIS) 7 — comes with Windows 7, but may need
to be enabled.
• Enable ASP.NET.
• .NET Framework 2.0 SP2, 3.5 SP1 (free)
• Microsoft SQL Server 2008 R2 (commercial) or SQL Server 2008 R2 Express (free)
o Be sure to install the version with SQL Management Studio (aka Database and
Management Tools)
o Install with these options:
• Install with Mixed Mode Authentication.
• Specify the "sa" (system administrator) password that you will remember.
• HIS software (free)
o ODM Data Loader - http://his.cuahsi.org/odmdataloader.html
o ODM Streaming Data Loader - http://his.cuahsi.org/odmsdl.html
o ODM Tools - http://his.cuahsi.org/odmtools.html
• Additional software (not required for HydroServer)
o Microsoft Office 2010 (32-bit version)
o Google Earth (free) - http://earth.google.com/download-earth.html
o HydroObjects (free) - http://his.cuahsi.org/hydroobjects.html
o HydroExcel (free) - http://his.cuahsi.org/hydroexcel.html
o HydroDesktop (free) - http://his.cuahsi.org/hydrodesktop.html
Data (located on your Desktop in a folder called Workshop):
• Raw data files of water quality time series
• Metadata text file describing the data
• Solution files, for reference or for use if workshop steps cannot be completed successfully
• Image files that will be associated with your data
User (You):
• Basic knowledge of how to operate a computer and use the internet
• Very basic notions of database concepts such as the terms "table" and "field"
• Rudimentary understanding of hydrology and hydrologic data
About the Workshop Data
In this workshop, you will publish time series of water quality data measured for the Lake Champlain
Long-term Water Quality and Biological Monitoring Project. The data include measurements of nitrogen,
phosphorus, temperature, total suspended solids, and chlorophyll a, taken from 1992 to 2007. More
information about this project can be found at
http://www.anr.state.vt.us/dec/waterq/lakes/htm/lp_longterm.htm.
Each workshop computer has been installed with "raw" data files of the water quality observations
described above. Some modifications have already been made to the raw files to facilitate data loading.
We wanted you to get a sense of how to transform data, without overburdening you with easy-but-tedious
data operations. The raw data files include site locations, time series of water quality observations, and
metadata. These files are located on your Desktop in Workshop\RawData. During the workshop, you
will transform the raw data files so that they are in a form that can be loaded into an Observations Data
Model database. As a contingency plan in case you are unable to complete the data transformation
process, transformed files have been generated for you and are located in
Workshop\SolutionFiles\TransformedData.
Review of CUAHSI-HIS Data Publication Tools
The CUAHSI Hydrologic Information System provides Web services, tools, standards and procedures
that enhance access to more and better data for hydrologic analysis. HIS software is free and available on
the HIS website at http://his.cuahsi.org. A variety of HIS software applications have been built to serve
several types of users and scenarios, from data users to data publishers to educators and developers.
HydroServer
In this workshop, you'll be playing the role of the data publisher. This means you have some hydrologic
observations data that you've collected, and you'd like to publish that data on the Web in a standard way
so that others can easily access and use it. To facilitate data publication, HIS offers HydroServer
(http://his.cuahsi.org/hydroserver.html), which is really just a bundle of HIS software designed for data
publication that operates on a Windows computer.
HydroServer includes software for publishing observations data with a WaterOneFlow Web service, but it
also includes a website and supporting components for geospatial and temporal data visualization. These
website components are not required for data publication and are not covered in this workshop. The focus
of this workshop is on storing observations data in an Observations Data Model database and publishing
the data with a WaterOneFlow Web service.
Observations Data Model
The Observations Data Model (ODM) (http://his.cuahsi.org/odmdatabases.html) is a data model for the
storage and retrieval of hydrologic observations in a relational database. An ODM database stores data
and sufficient ancillary information (metadata) about the data values to provide traceable heritage from
raw measurements to usable information allowing them to be unambiguously interpreted and used. A
relational database format is used to provide querying capability to allow data retrieval supporting diverse
analyses. To learn all the details of ODM, read the design specifications document on the website linked
above.
Data can be loaded into an ODM database using a number of tools, including free HIS software. For
loading static data files for what is generally a one-time process, the free ODM Data Loader is used.
These data files are usually the result of a study or project that has been completed and will not need
periodic updating. The ODM Data Loader is optimized for data with one data value per row. For data
that are continuously updated, such as data streaming in from sensors in the field, use the free ODM
Streaming Data Loader. The Streaming Data Loader is optimized for data with one time step per row.
For more complex data loading tasks, SQL Server Integration Services is one of many software packages
up to the task. However, these software packages are typically not free and require significant training to
learn how to use them. For this workshop, you'll gain experience with both ODM data loaders.
You'll load most of the data in this workshop using the ODM Data Loader. The ODM Data Loader reads
input files that are formatted much like the tables in ODM. For example, if you want to load site
locations into ODM, you could prepare a spreadsheet called "sites.xls" with column headings that use
(roughly) the same names as fields from the Sites table in ODM (names are not case sensitive). In some
cases, you can load data for more than one ODM table from a single input file by simply appending
additional columns to the data in the file. This prevents you from having to create an input file for every
table in ODM, which would be quite tedious since relational databases tend to have many associations
across many tables. A document describing the input format required by the ODM Data Loader can be
found at http://his.cuahsi.org/odmdataloader.html.
Once data are loaded into an ODM database, you can examine the data using the free ODM Tools. Or, if
you have knowledge of SQL, you can write your own queries within SQL Management Studio. If the
data look good, you can publish the data with a WaterOneFlow Web service.
WaterOneFlow
A challenge in querying and interpreting data from disparate data sources is that each data source not only
has its own method for asking for data, but also its own format for delivering the requested data to the
user. WaterOneFlow overcomes this by providing a single query interface and a standard output format
called WaterML, which is an XML language for the communication of water data. WaterOneFlow is a
Web service, which facilitates automated and programmatic access to the data. This is an advancement
beyond simply publishing data on a web page, which can require complicated and often error-prone
screen scraping and parsing.
A WaterOneFlow Web service is available that hooks directly into an ODM database to publish data from
that database. However, WaterOneFlow Web services can also be written to support internal data formats
other than ODM. This means that no matter what data storage mechanism you choose to use, you can
still publish your data in a standard way with WaterOneFlow.
HIS Central
Once your data are published, there's still the issue of data discovery. How do people learn about your
data? That's where HIS Central comes in. HIS Central is a website maintained by the CUAHSI HIS
team where you can register your WaterOneFlow Web service. Your service then becomes discoverable
along with dozens of other Web services already registered with the system, including services for EPA
and the USGS National Water Information System. This makes HIS Central the largest single catalog of
the nation's water data.
HydroExcel and HydroDesktop
Now it's time to briefly play the role of the data user. Once data are published with HIS, how do people
access the data? To help data users get started with HIS, several free applications or application
extensions are available on the HIS website, geared towards application environments most commonly
used by hydrologists such as Microsoft Excel. HydroExcel is an Excel spreadsheet customized with
macros for accessing data from a WaterOneFlow Web service. HydroDesktop is a free and open source
Geographic Information System (GIS) designed from the ground up to work with HIS. In the workshop,
you'll use both HydroExcel and HydroDesktop to verify that you have successfully published your data
with WaterOneFlow.
Workshop Outline
The workshop begins with presentations and demonstrations by the HIS team to familiarize the audience
with HIS. Contact the workshop administrator to check for availability of these materials. The hands-on
training portion of the workshop leads the audience through the data publication process with these key
steps:
1. Translate raw data for loading into ODM.
2. Load data into an ODM database.
3. Expose database content online via WaterOneFlow Web service.
4. Register Web service with HIS Central to enable data discovery by external users.
Translating and Loading Data into ODM
To load data into ODM, you'll be using a tool called the ODM Data Loader. The ODM Data Loader
loads data from comma delimited files (.csv) or Microsoft Excel 2003 files (.xls) that have a one-row
header of ODM field names, followed by the data in subsequent rows. When loading
data from Excel, the data should be located in a worksheet that has the same name as the file. More about
these data formats can be found in the documentation for the data loader available at
http://his.cuahsi.org/odmdataloader.html. Generally, the fields in the input files conform to the table
structure of ODM, with some flexibility for specifying alternative information for database generated IDs.
You'll see how this works during the workshop.
Most likely, your data are not exactly in the same format as what the ODM Data Loader is expecting. For
example, instead of using the ODM terminology "SiteCode", you may call the unique ID for each of your
observation sites a "StationID". Also, you may need to do some leg work to look up information such as
the horizontal datum associated with the latitude and longitude coordinates of your site, which is one of
the pieces of information that ODM requires. Translating your data and metadata into ODM terminology
may seem a bit tedious, but this exercise is actually very valuable as it helps you to fully understand
ODM as well as your own data, and in the end you will have a database that richly describes your data.
In the interest of time, much of the data translation work has been performed for you. However, some
items still remain untranslated and are just dying to have a talented hydro hero like you perform the
transformation and save the day!
Installing a Blank ODM Database
An understanding of your own data as well as the Observations Data Model is essential before attempting
to transform your raw data into inputs for the ODM Data Loader. To help you in this process, ODM
includes some predefined terms called controlled vocabularies that you can choose from when
populating its tables. Let's grab an ODM database from the HIS website and see some of these terms for
ourselves.
To attach a blank ODM database:
1. In a web browser, navigate to http://his.cuahsi.org.
2. Under Quick Links on the right, click ODM Database.
3. Click the link to download the ODM 1.1 Blank SQL Server Schema Database.
4. Unzip the contents of the downloaded file into the SQL Server data folder, e.g., C:\Program
Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA. The download includes
both a database file (.mdf) and a log file (.ldf) that tracks transactions made to the database.
You'll now attach this database to SQL Server, and eventually you will load the workshop data into this
database.
Tip
SQL Server is the database software installed on the workshop computers. "Attaching a database" to SQL Server
basically means letting SQL Server know about your database so that its software can work with it.
5. Start SQL Server Management Studio.
6. Make sure the Server name is (local).
Note
If you are using SQL Server Express, then wherever you see (local) in this document, you should use
(local)\SQLExpress instead.
7. Click Connect to log into SQL Server.
Tip
SQL Server Management Studio is an application that lets you manage, view, and execute queries on your
databases.
8. In the Object Explorer on the top left, right-click Databases and click Attach (Figure 0-1).
Figure 0-1 SQL Server Management Studio is used to attach databases to SQL Server
9. In the Attach Databases dialog that opens, click Add.
10. Navigate to the folder where you saved your database, select OD.mdf, and click OK.
11. Change the "Attach As" name to MyWaterData (Figure 0-2), and click OK. (You can change the
name to whatever you want, but let's all use MyWaterData as the name for this workshop
exercise.)
Figure 0-2 You can assign a database name of your choice in SQL Server
12. Click OK if prompted about full-text catalogs.
13. In the Object Explorer, click the plus sign to expand Databases. You should now see your
MyWaterData database.
Now that the database is attached, let's look at its tables.
To explore the database you just attached:
1. In the Object Explorer, expand MyWaterData, and then Tables, to see a list of tables in this
blank ODM database.
2. Open the DataValues table by right-clicking it and then clicking Select Top 1000 Rows. This
table stores time series values. Each row stores a single datetime, a single value, and metadata
about that value. The table is currently blank, but you'll fix that later on!
3. Notice that there is a field in this table called SiteID. Rather than repeat the latitude, longitude,
and other site information with every row in the DataValues table related to a given site, ODM
keeps the database compact by using the SiteID to locate a matching row in the Sites table
where the site details are stored. These relationships between tables are used extensively
throughout ODM, leveraging the power of a relational database.
4. Click the X near the top right of the query that was opened to close the query and the table. Be
sure not to click the X in the blue title bar for the application, or you will close Management
Studio!
5. Open the Sites table to see the information describing each site. Close the table when you are
finished looking at it.
So far, the tables that you've been looking at are empty. Next, you'll open some tables that have already
been populated with values that will be useful as you load data.
6. Open the VariableNameCV table. The letters "CV" at the end of the table name indicate that
this is a controlled vocabulary table. Only terms from this table are used to describe variable
names. This helps to standardize the terminology used to describe data across multiple ODM
databases (Figure 0-3).
Term | Definition
Momentum flux | Momentum flux
N, albuminoid | Albuminoid Nitrogen
Net heat flux | Outgoing rate of heat energy transfer minus the incoming...
Nickel | Nickel (Ni)
Nitrogen, Dissolved Inorganic | Dissolved inorganic nitrogen
Nitrogen, Dissolved Organic | Dissolved Organic Nitrogen
Nitrogen, gas | Gaseous Nitrogen (N2)
Figure 0-3 The VariableNameCV table is a controlled vocabulary of terms to use when naming variables
Tip
A level of data integrity is enforced through the use of controlled vocabularies (CVs) within ODM. If a field uses a
CV, then only terms from that CV can be entered into that field. This way, the database uses consistent
terminology. If there is a CV term that you need which is not already in ODM, you can add it. Just remember to
add the term to the CV table first, and then load your data.
7. Open the SampleMediumCV table. This table has terms used to describe the sample medium in
which a measurement applies.
8. Continue opening and browsing the tables of ODM. When you are ready to move on, close the
tables and minimize SQL Server Management Studio. You can leave Management Studio open
as we'll use it again later.
The controlled vocabularies were originally conceived by the HIS team and are updated by users like you
who submit change requests or additions to the CV website at
http://his.cuahsi.org/mastercvreg/cv11.aspx. As this list is updated, the CVs in your database may
become out of date. The ODM Tools makes updating your CVs a snap. You'll use the ODM Tools again
later to analyze the data you load, but now let's use the tools just to update the CVs in your database.
The ODM Tools software is a free download from the HIS website at http://his.cuahsi.org/odmtools.html.
The tools have already been installed on the workshop computers.
To update controlled vocabularies using the ODM Tools:
1. Start the ODM Tools. When you run the ODM Tools for the first time, it will prompt you for a
database connection.
2. Input these values into the New Database Connection dialog and then click Save Changes.
a. Server Address: (local)
b. Database Name: MyWaterData
c. Server User ID: sa
d. Server Password: [enter the password for the server user ID here]
3. Dismiss the message indicating that the connection was successful.
Tip
The sa account is an administrator account for SQL Server. It has all the privileges we need for this workshop and
more. For security in your production environment, you may want to establish a SQL Server login with lower
permissions and call it ODMToolsUser, for example, so that your users do not need to know the administrator
login and password. Steps for creating a login are provided later in this manual.
When the tools open, the interface looks a bit empty. That's because you haven't loaded data into the
database yet. However, we can still update the controlled vocabularies.
4. Click Tools | CV Update.
A dialog opens showing you a comparison between the master controlled vocabulary list maintained by
CUAHSI (on the left) and your local CV (on the right). You can change the CV that you are looking at
using the drop down box in the top left corner of the dialog.
5. Change to the Sample Medium CV.
As you can see (at the time this document was written), the master list has been updated since the
last time the ODM blank database was created for the HIS website.
6. Click Update Local CV.
Your local CV now indicates that terms have been added but not committed.
7. Click Apply. This commits the changes to your local database.
Tip
If you don't see the word "Apply" after clicking Update Local CV, it means your database is already up-to-date.
8. Repeat this process to update the remaining CVs in the list.
9. Once the CVs have been updated, close the ODM Tools.
Now that you've had a hands-on introduction to ODM, let's get to know the raw data that you'll be
loading into it.
Getting To Know Your Data
Let's now take a look at the raw data for the workshop, available on your computer at
Workshop\RawData. These data files were created from real data extracted out of the Lake Champlain
Long-term Water Quality and Biological Monitoring Project. Note that the data have been slightly
reformatted to facilitate use in this workshop, but much of the terminology used by the data is untouched,
giving you a sense of what it takes to transform real data for loading into HIS.
Purpose of the Data
Some aspects of the data have been modified for use in the workshop. These data are provided solely for use in
the HIS workshop, and are not intended to be used for real analysis or decision making.
You should see three files in the RawData folder:
• sampling_sites.xls - Excel spreadsheet with locations for all sampling sites
• LCM_Data.xls - Excel spreadsheet with the time series of water quality for all sites and variables
• metadata.txt - Text file with metadata describing your data
Sites
Open sampling_sites.xls. Each row in this file represents a single sampling site and includes the
following information:
• StationID - Unique internal identifier for a site
• StationName - Name of the site
• Latitude - Latitude of the site in decimal degrees
• Longitude - Longitude of the site in decimal degrees
• County - The county that the site is in
• State - The state that the site is in
Time Series
Open LCM_Data.xls. Each row in this file represents a single water quality measurement at a particular
site at a particular point in time. By looking at the dates and times of the measurements, it appears that
the data were taken sporadically through time. The spreadsheet includes the following information:
• Station - Unique internal identifier for a site
• Date - Date that a measurement occurred
• Time - Time that a measurement occurred
• Depth - Depth at which the measurement was taken
• Test - Code for the variable being measured
• Result - Value of the variable at the given date and time
• Method - Method used to determine the value
Metadata
Open metadata.txt. If this data provider wanted to share the data, the information in the sites and time
series files alone may not be enough to fully describe the data. Therefore, metadata such as the content of
metadata.txt are often provided alongside the actual data files. In this file, you can find information about
the nature of the study, the variables involved, and the data source. You'll use this information to help
load the data into HIS.
Transformed Output
From these files describing sites, time series, and metadata, you will create transformed files that are
formatted for loading into an ODM database using the ODM Data Loader. The transformed files that you
will create are:
• Sites
• Sources
• Variables
• DataValues
These are basically the minimum pieces of information you need to describe your data in ODM. Of
course, there are plenty of additional types of information you can load into an ODM database to more
fully describe your data, but for the purposes of this workshop, you'll just load the above items.
Before performing the transformation, it's imperative that you familiarize yourself with the structure of
the Observations Data Model (http://his.cuahsi.org/odmdatabases.html), and the requirements for input
files to the ODM Data Loader (http://his.cuahsi.org/odmdataloader.html). Review the information in
"Review of CUAHSI-HIS Data Publication Tools," the workshop presentation materials, and the online
materials linked above for more information. The following sections describe the transformation
procedure.
Creating a Sites File
Information about your sampling sites is contained in the RawData\sampling_sites.xls file. You'll
transform this to a TransformedData\sites.csv file. The fields in the transformed file include:
• SiteCode
• SiteName
• Latitude
• Longitude
• County
• SiteState
• LatLongDatumSRSName
These are some of the field names the ODM Data Loader expects to see when loading information about
sites. The first six fields match very well with the data from sampling_sites.xls. Note that you will be
adding one field that isn't in sampling_sites.xls: LatLongDatumSRSName. The ODM requires the datum
to be stored with the latitude and longitude coordinates of a site. Luckily, you can find that data in the
metadata.txt file. The metadata indicates that the datum used is WGS84. There is already a record for
this datum in the SpatialReferences table of ODM (Figure 0-4). The record has a SpatialReferenceID of
3, and an SRSName value of WGS84.
SpatialReferenceID | SRSID | SRSName
0 | 0 | Unknown
1 | 4267 | NAD27
2 | 4269 | NAD83
Figure 0-4 The WGS84 datum is among the list of coordinate systems in the ODM SpatialReferences table
Being a relational database, the ODM Sites table is expecting a numerical datum ID (in this case, the
number 3) to accompany each site record. However, it's easier for us humans to interpret text rather than
numbers during the data translation process, which is why the ODM Data Loader allows you to use
"WGS84" instead of the number 3 to refer to your datum. The ODM Data Loader will make sure the
LatLongDatumSRSName refers to an SRSName in ODM's SpatialReferences table before finalizing the
data loading operation. This is one of the advantages of using the ODM Data Loader for loading data - it
performs some quality control and maintains integrity of relationships between the tables of ODM during
loading.
You'll be following this procedure of referencing information in your ODM database a lot as you figure
out how to translate your data to ODM. A quick summary of the procedure is:
1. You need to know how to describe some aspect of your data, such as the datum.
2. You check the ODM database to see if a table already exists to describe that item.
3. You find a match for your item in the ODM table, and use the matching term from ODM as you
build your translated data files for eventual loading into the ODM database.
If you don't find a matching item in ODM, you can add it. If you think the item should have been in
ODM's CVs in the first place, then you can petition to have the item added to the Master Controlled
Vocabularies at http://his.cuahsi.org/mastercvreg/cv11.aspx, although updating those CVs is beyond the
scope of this workshop.
Details of how to map from the raw file to the transformed file are below.
Table 0-1 Mapping Raw Data to Sites
Transformed Field | Raw Data Field (from sampling_sites.xls)
SiteCode | StationID
SiteName | StationName
Latitude | Latitude
Longitude | Longitude
County | County
SiteState | State
LatLongDatumSRSName | Use "WGS84" from ODM SpatialReferences table
Tip
The ODM Data Loader ignores case in the field names, so Longitude and LONGITUDE are both valid field names.
Tip
If you have any trouble creating the transformed files, you might find it helpful to refer to the solution files in
Workshop\SolutionFiles\TransformedData.
To create the transformed sites file:
1. Open Workshop\RawData\sampling_sites.xls.
2. Save the file in the Workshop\TransformedData folder as sites.csv. Be sure to select CSV
(Comma delimited) (*.csv) from the Save as type drop down box as you save the file.
ODM Data Loader Best Practice - Use CSV Files
The ODM Data Loader can work with both comma delimited (.csv) files and Microsoft Excel 2003 (.xls) files.
However, the author has found that sometimes Excel cell formatting can cause an incorrect interpretation of the
data. Therefore, the author recommends saving the transformed files as comma delimited text files, which contain
no instructions about how data should be formatted.
3. After saving, click Yes if prompted to keep the workbook in CSV format.
Transforming the sites file will be very easy. You'll start by renaming some fields, and then add one new
field. Note that you must not misspell any of the field names, or else the ODM Data Loader will not
recognize the field.
4. Rename the following fields:
a. StationID to SiteCode
b. StationName to SiteName
c. State to SiteState
5. Add a field called "LatLongDatumSRSName" (without quotes) and calculate all values to be
"WGS84" (without quotes).
6. Save the file. If prompted about keeping the workbook in CSV format, click Yes.
7. Close the file. If prompted about saving changes to the file, click No. You just saved them, so
you should be fine.
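If you prefer to script the transformation instead of editing in Excel, the steps above can be
sketched with the pandas library. This is an assumption on our part; pandas is not part of the
workshop software list, and reading .xls files also requires the xlrd package.

    # Scripted alternative to the Excel steps for creating sites.csv.
    # Assumes the third-party 'pandas' package (plus 'xlrd' for .xls files).
    import pandas as pd

    sites = pd.read_excel(r"Workshop\RawData\sampling_sites.xls")

    # Rename raw fields to the field names the ODM Data Loader expects.
    sites = sites.rename(columns={
        "StationID": "SiteCode",
        "StationName": "SiteName",
        "State": "SiteState",
    })

    # ODM requires the horizontal datum; metadata.txt says it is WGS84.
    sites["LatLongDatumSRSName"] = "WGS84"

    sites.to_csv(r"Workshop\TransformedData\sites.csv", index=False)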
Great job! Creating the sites file was a snap since the raw data already had a sites file to begin with. In
addition to sites, ODM also keeps a table of data sources. Think of a data source as a group, agency, or
institution that operates monitoring sites. Next you'll create a sources file that defines the data source
behind your data.
Creating a Sources File
Transforming the sites file was pretty easy since the raw data included a sites spreadsheet that you could
start from. However, a file for data sources is not present among the raw data files. Instead, you'll find
the information you need to describe the data source in the RawData\metadata.txt file. This file is
largely extracted from http://www.anr.state.vt.us/dec/waterq/lakes/htm/lp_longterm.htm, albeit
condensed a bit for workshop brevity.
The fields in the sources file that you will create include:
• Organization
• SourceDescription
• SourceLink
• ContactName
• Phone
• Email
• Address
• City
• SourceState
• ZipCode
• Citation
• TopicCategory
• Title
• Abstract
• ProfileVersion
• MetadataLink
The information in the transformed file will actually be used to insert data into two tables: Sources and
ISOMetadata. The ODM Data Loader will use the last five columns to create an entry in the
ISOMetadata table, and relate the information back to your new entry in the Sources table. It knows the
appropriate place to put your data in the database.
Details of how to map from the raw file to the transformed file are below.
Table 0-2 Mapping Metadata to Sources

Transformed Field   Metadata Location               Value
Organization        Source Details                  VT Department of Environmental Conservation
SourceDescription   Source Details                  The Vermont Department of Environmental Conservation's mission is to preserve, enhance, restore and conserve Vermont's natural resources and protect human health for the benefit of this and future generations.
SourceLink          Source Details                  http://www.anr.state.vt.us/dec/dec.htm
ContactName         Source Details                  Jill Data
Phone               Source Details                  555-555-5555
Email               Source Details                  jill.data@champlain.com
Address             Source Details                  123 Main Street
City                Source Details                  Datatown
SourceState         Source Details                  VT
ZipCode             Source Details                  15671
Citation            Source Details                  P. Stangel, A. Shambaugh, F. Dunlap, (1992-2008), "Sixteen years of water-quality data collection on Lake Champlain, Vermont and New York, United States"
TopicCategory       Read from ODM TopicCategoryCV   inlandWaters
Title               Header                          Lake Champlain Long-Term Monitoring Program data
Abstract            STUDY DESCRIPTION               The Long-Term Water Quality and Biological Monitoring Project for Lake Champlain has been in operation since 1992. The project is conducted by the Vermont Department of Environmental Conservation (DEC) and the New York State Department of Environmental Conservation with funding provided by the Lake Champlain Basin Program and the two states. The monitoring network includes 15 lake stations representing major lake segments with distinct physical and water quality characteristics.
ProfileVersion      n/a                             Unknown
MetadataLink        Header                          http://www.anr.state.vt.us/dec/waterq/lakes/htm/lp_longterm.htm
To create the sources file:
1. Using Excel, create a new file in Workshop\TransformedData named sources.csv. Be sure to
save the file as a comma delimited file. Click OK if prompted about saving only the active sheet
and Yes if prompted about keeping the worksheet in this format.
2. In row 1, type the following column names:
a. Organization
b. SourceDescription
c. SourceLink
d. ContactName
e. Phone
f. Email
g. Address
h. City
i. SourceState
j. ZipCode
k. Citation
l. TopicCategory
m. Title
n. Abstract
o. ProfileVersion
p. MetadataLink
3. Locate the pertinent information from the metadata file as described in Table 0-2 above and
enter it into row 2 of the worksheet.
Note
If you're feeling lazy, you could just copy the data from the solution file. However, looking up information tucked
away in metadata is very typical in the data loading process, so don't deny yourself the enriching experience of at
least looking up a few of those fields!
4. Save and close the file.
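By the way, if you ever need to generate a sources file programmatically, a sketch like the one below works. The field values come straight from Table 0-2, and only a few of the sixteen columns are shown; the rest follow the same pattern:

import csv

# A few of the source fields from Table 0-2; add the remaining columns the same way.
source = {
    "Organization": "VT Department of Environmental Conservation",
    "SourceLink": "http://www.anr.state.vt.us/dec/dec.htm",
    "ContactName": "Jill Data",
    "TopicCategory": "inlandWaters",  # must be a term from ODM's TopicCategoryCV
    "ProfileVersion": "Unknown",
}

with open("sources.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(source))
    writer.writeheader()
    writer.writerow(source)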
Wow, can you believe we're already half-way through the data transformation process? Two down, two
to go! Not even the strictest controlled vocabulary can contain my excitement! Was that too much?
Maybe I should apply some database constraints to myself. Ok, seriously, I'm finished now.
At this point, you might be so fired up about data translation that you're already thinking about how to
work with that file of water quality time series, but first let's define your variables so that ODM knows
what kind of time series data you have.
Creating a Variables File
Your data represent several water quality time series variables. Some information about these variables
can be found in the metadata.txt file. Often you'll have to do a bit of legwork to fill in the rest in order to
fully describe your data in ODM.
As a brief summary, measurements for nitrogen, phosphorus, temperature, total suspended solids, and
chlorophyll a were taken sporadically in time. Some metadata about these variables can be found in the
Variables section of the metadata text file.
The fields in the variables file that you will create include:
• VariableCode
• VariableName
• Speciation
• VariableUnitsName
• SampleMedium
• ValueType
• IsRegular
• TimeSupport
• TimeUnitsName
• DataType
• GeneralCategory
• NoDataValue
VariableName, Speciation, SampleMedium, ValueType, DataType, and GeneralCategory must all
conform to terms in ODM controlled vocabularies. This can actually make your life easier because all
you have to do is pick the CV term that best describes your variable.
The ODM Data Loader will use the abbreviation for the variable units name to match the variable to a
unit in the Units table. This is another example of how the ODM Data Loader performs integrity checks
on the data during the loading process.
Details of how to supply values in the transformed file are shown in the table below. The example values
are for the nitrogen variable.

Table 0-3 Values for the Nitrogen Variable

Transformed Field   Value                  Notes
VariableCode        TN                     The term used for nitrogen, from metadata.txt
VariableName        Nitrogen, total        From VariableNameCV
Speciation          N                      From SpeciationCV
VariableUnitsName   micrograms per liter   From UnitsName field of Units table, matches units from metadata.txt
SampleMedium        Surface Water          From SampleMediumCV
ValueType           Sample                 From ValueTypeCV
IsRegular           FALSE                  These are instantaneous measurements made irregularly through time
TimeSupport         0                      Use "0" when instantaneous measurements are recorded
TimeUnitsName       Day                    The value doesn't really matter for instantaneous data, as long as it matches text in the UnitsName field of the Units table
DataType            Sporadic               From DataTypeCV
GeneralCategory     Water Quality          From the GeneralCategoryCV
NoDataValue         -9999                  From metadata.txt
To create the variables file:
1. In the interest of time, a file named variables.csv has already been created for you in the
Workshop\TransformedData folder. This file has records for four out of five variables. You'll
add the fifth variable, nitrogen. Open the file with Excel.
2. In row 6, fill in the values for the nitrogen variable. For reference, see Table 0-3.
3. Save and close the file.
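For the script-minded, appending the nitrogen record is a quick job in Python too. This sketch assumes the columns in variables.csv appear in the order listed in Table 0-3:

import csv

# The nitrogen record from Table 0-3, in the same column order as the file's header row.
nitrogen = ["TN", "Nitrogen, total", "N", "micrograms per liter", "Surface Water",
            "Sample", "FALSE", "0", "Day", "Sporadic", "Water Quality", "-9999"]

with open("variables.csv", "a", newline="") as f:
    csv.writer(f).writerow(nitrogen)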
Nice job! Three out of four input files for the ODM Data Loader have now been created. Before you
work on the last file, let's go ahead and load the other data into the database.
Loading Sites, Sources, and Variables into ODM
Before a single time series value is loaded into an ODM database, you must have already loaded
information that describes the time series, e.g., the sites, sources, and variables data that you just finished
preparing. As you load time series values, the ODM Data Loader will look for the other information
related to the time series to make sure it is being appropriately described. If something is missing, the
ODM Data Loader will pop up a friendly message basically saying something to the effect of, "Hi there.
I see you're loading data, but you haven't told me what the data represent or how to describe it in the
database." This is one way the data loader helps to ensure the integrity of your data.
So without further ado, let's run the ODM Data Loader on those transformed files that you've been
working on.
1. Open the ODM Data Loader.
2. Enter the connection information using the same information as you did for the ODM Tools.
3. Click Save Changes.
4. Dismiss the message indicating that the connection was successful.
With the connection set, you are now ready to open the transformed data files and commit them to the
database.
5. Click Open.
6. Navigate to and open the Workshop\TransformedData\sites.csv file. The ODM Data Loader
previews the file (Figure 0-5). As indicated in the bottom left corner, the application has
recognized that you are loading sites information. It does this based on the field names in the
input file you selected.
Figure 0-5 Loading sites into ODM
7. Click Commit File to write the records to the database. After a moment, the user interface is
cleared indicating that the operation has completed.
8. Repeat steps 5-7, this time loading sources.csv.
9. Repeat steps 5-7, this time loading variables.csv.
10. Minimize the data loader when it has finished.
You'll use the data loader again in a moment to load the time series values. For now, take a look inside
the database to see that the data were loaded successfully.
11. Restore SQL Server Management Studio. If you closed it, please open and connect to it again.
12. In the MyWaterData database, open the Sites table.
You should now see your sites in the table (Figure 0-6). If the table is still blank, try closing and
reopening the table to refresh it.
Figure 0-6 Sites information successfully loaded into ODM
Notice the LatLongDatumID field. The ODM Data Loader automatically used the abbreviation for the
datum that you provided in the transformed sites file and matched it up with the datum ID from the
SpatialReferences table in ODM.
13. Open the Sources table. Notice that the data source was automatically assigned a SourceID of
1. Also notice that the Title, Abstract, etc., that you entered earlier have been replaced with a
single MetadataID value. If you open the ISOMetadata table, you'll find the additional
metadata. The data loader is flexible enough to allow you to load both sources and metadata
information from a single file, or from two separate files if you had chosen to do so.
14. Open the Variables table to see the result of data loading.
With this information in the database, you can now load the time series values.
Creating and Loading a Data Values File
The actual time series of water quality data is stored in raw form at
Workshop\RawData\LCM_Data.xls. Now that you've loaded metadata about the time series into your
ODM database, you're ready to load the time series values themselves.
The fields in the transformed data values file that you will create include:
• SiteCode
• LocalDateTime
• UTCOffset
• OffsetValue
• OffsetUnitsName
• OffsetDescription
• VariableCode
• DataValue
• MethodDescription
• SourceID
• QualityControlLevelID
• CensorCode
Many of these fields do not have an equivalent in the LCM_Data.xls file. The original data source may
not have conceived that their data would be fully described in an ODM database! For some of the
additional fields, you will find matching terms or IDs in the ODM database. For example, the ODM Data
Loader will match the SiteCode and VariableCode to items in the Sites and Variables tables, which have
already been loaded. You'll use the SourceID from the Sources table that matches the record you added
earlier. Similarly, you can look in the Methods and QualityControlLevels tables to get an idea of what to
enter for MethodID and QualityControlLevelID.
Details of how to supply values in the transformed file are shown below.

Table 0-4 Mapping Raw Data to Data Values

Transformed Field       Raw Data Field (from LCM_Data.xls)   Notes
SiteCode                Station
LocalDateTime           Date and Time                        Concatenate the two fields to form a single LocalDateTime field
UTCOffset               "-5"                                 From metadata.txt, all values are in Eastern Standard Time, which is five hours behind Coordinated Universal Time (UTC), hence the value of -5 for the UTC offset
OffsetValue             Depth
OffsetUnitsName         "meter"                              From metadata.txt
OffsetDescription       "Depth below water surface"          From metadata.txt
VariableCode            Test
DataValue               Result
MethodDescription       Method
SourceID                e.g., "1"                            This should be the value of the SourceID for the source data you entered earlier
QualityControlLevelID   "-9999"                              "-9999" indicates an unknown QC level, defined in the QualityControlLevels table of ODM
CensorCode              "nc"                                 From CensorCodeCV table of ODM. "nc" is the default value meaning "not censored."
Like the raw sites file earlier, you'll start with the raw time series file and modify it accordingly.
To create the data values file:
1. Open Workshop\RawData\LCM_Data.xls with Excel.
2. Save the file in the Workshop\TransformedData folder as datavalues.csv. Be sure to save the
file as a comma delimited file.
3. Rename the following fields:
a. Station to SiteCode
b. Depth to OffsetValue
c. Test to VariableCode
d. Result to DataValue
e. Method to MethodDescription
4. Add a field called "LocalDateTime" and calculate all values to be the concatenation of the Date
and Time fields. For example, in row 2, you would use the formula "=B2 + C2".
5. Add a field called "UTCOffset" and calculate all values to be "-5".
6. Add a field called OffsetUnitsName and calculate all values to be "meter".
7. Add a field called OffsetDescription and calculate all values to be "Depth below water surface".
8. Add a field called "SourceID" and calculate all values to be the SourceID that was generated
when you created the source information in the database earlier.
9. Add a field called "QualityControlLevelID" and calculate all values to be "-9999".
10. Add a field called "CensorCode" and calculate all values to be "nc".
11. Save and close the file.
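As with the sites file, this transformation is easy to script. The Python sketch below mirrors steps 3 through 10, assuming you've first saved the raw spreadsheet as LCM_Data.csv and that your SourceID is 1 (substitute the value generated in your own database). The raw Date and Time columns are left in the output on purpose; as noted in the tips below, the ODM Data Loader simply ignores field names it doesn't recognize.

import csv

RENAMES = {"Station": "SiteCode", "Depth": "OffsetValue", "Test": "VariableCode",
           "Result": "DataValue", "Method": "MethodDescription"}
CONSTANTS = {
    "UTCOffset": "-5",                                 # Eastern Standard Time
    "OffsetUnitsName": "meter",
    "OffsetDescription": "Depth below water surface",
    "SourceID": "1",                                   # the SourceID generated earlier
    "QualityControlLevelID": "-9999",                  # unknown QC level
    "CensorCode": "nc",                                # not censored
}

with open("LCM_Data.csv", newline="") as src, \
        open("datavalues.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    fields = [RENAMES.get(f, f) for f in reader.fieldnames] + ["LocalDateTime"] + list(CONSTANTS)
    writer = csv.DictWriter(dst, fieldnames=fields)
    writer.writeheader()
    for row in reader:
        out = {RENAMES.get(k, k): v for k, v in row.items()}
        out["LocalDateTime"] = f"{row['Date']} {row['Time']}"  # concatenate the raw Date and Time
        out.update(CONSTANTS)
        writer.writerow(out)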
Tip
Quality control levels provide some confidence as to the amount of quality control performed on a dataset. A
quality control level of zero (0) indicates raw data, while a quality control level of one (1) indicates quality
controlled data. The level of quality control for individual data values is not available for the workshop dataset,
so you'll use a value of -9999 to indicate "unknown". For more information on quality control levels, see the
ODM design specifications document at http://his.cuahsi.org/odmdatabases.html.
Tip
It's OK to leave the Date and Time columns in the file. Because those are not field names that the ODM Data
Loader recognizes, it will just ignore the columns.
Now all that's left is to load the data values into ODM.
12. Restore the ODM Data Loader. If you closed the program, please reopen it and connect to the
MyWaterData database.
13. Open the datavalues.csv file.
14. Commit the data.
15. When the data loader finishes, close the program and view the results in the DataValues table in
SQL Server Management Studio.
That's it! You've now finished loading all of the raw data into an ODM database. If you are familiar
with SQL, you can now write queries or use other tools in SQL Server to work with the data. But
just in case you are not a SQL pro, HIS has developed software called the ODM Tools, which can be used
to query and plot graphs of data in an ODM database.
Using the Streaming Data Loader
While the ODM Data Loader is designed to facilitate one-time loading of archived data on disk, the
Streaming Data Loader is designed to run as a scheduled task to load data from sensors operating in the
field. These sensors typically are connected to a datalogger that handles the recording of data from
sensors. Data from the datalogger are sent via a telemetry system to a resource on your computer
network, usually in the form of delimited text files. As new data are recorded, the text files are updated
with the latest values added to the end of the text file. An alternate scheme is for the telemetry system to
create a new text file for each data download. In either case, each row in the text file represents
measurements at a single time stamp, and the data values for each variable are stored in separate columns
within the file. A given file is typically associated with only one monitoring site. For an example of this
kind of file, see the sdl.csv file in your Workshop\Streaming folder. More information on the Streaming
Data Loader is at http://his.cuahsi.org/odmsdl.html.
In this portion of the exercise, you will use the Streaming Data Loader to load data for a new site that was
added to the study. This site only measures nitrogen and phosphorus, and it has the characteristics in
Table 0-5.
Table 0-5 Streaming Data Loader Example Site

Property        Value
Site Code       50
Site Name       Sunset Point
Latitude        44.71
Longitude       -73.25
Datum           WGS84
County          Grand Isle
State           Vermont
Sensor Method   Composite, unfiltered
Sensor Depth    10 meters
QC Level        Raw data
Variables       TN and TP
Data for this site have already been prepared for you and are located in your Workshop\Streaming
folder.
To view the example datalogger files:
1. Browse to and open the sdl.csv file in your Workshop\Streaming folder.
2. Notice the columns for total nitrogen (TN) and total phosphorus (TP).
3. Close the file.
Next you will tell the Streaming Data Loader how to map from data in your file to the ODM database.
To map a data values file using the Streaming Data Loader:
1. Open the Streaming Data Loader Configuration Wizard by clicking Setup ODM SDL on your
desktop, or by clicking the configuration wizard item in All Programs under CUAHSI HIS.
2. Click the Add button (the plus sign) at the top of the window.
3. In the dialog that opens (Figure 0-7), click the browse button next to the Local File text box.
4. In the dialog that opens, browse to and open the sdl.csv file in your Workshop\Streaming
folder.
5. Leave the Run Every option to run once per minute. For demonstration purposes, this will
ensure that the loader runs each time you tell it to run.
Note - "Run Every" Frequency vs. Scheduled Task Frequency
The "Run Every" option is like telling the data loader how long to rest between attempts to load data. So if I set
the "Run Every" option to 1 hour, and I run the data loader seven times within the first hour, then the data
loader will run the first time, but for the remaining six times it will say, "Sorry, I'm not even going to look at your
data files because I'm still resting." Likewise, if I take a break for the next seven hours, the data loader won't do
A-24
-------
anything because I didn't tell it to run. If I tell it to run again on the seventh hour, it will say, "More than an hour
has passed since I last ran, so I will look at your data files and load new data if present."
You probably don't want to wait around every hour and then click to run the data loader. Fortunately, Windows
can run the data loader for you as a scheduled task. You tell Windows how often to run the data loader and then
Windows handles the rest. Obviously, you wouldn't set up the scheduled task frequency to be finer than the
"Run Every" option or else the data loader would sometimes tell Windows, "I'm still resting" and it won't do any
work. You'll set up a scheduled task for the data loader later in this exercise.
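In other words, the loader throttles itself. Conceptually (this is plain Python for illustration, not the loader's actual code), the logic looks like:

from datetime import datetime, timedelta

RUN_EVERY = timedelta(hours=1)  # the "Run Every" setting
last_run = None

def attempt_load():
    """Skip the attempt if the rest interval hasn't elapsed since the last run."""
    global last_run
    now = datetime.now()
    if last_run is not None and now - last_run < RUN_EVERY:
        return  # "I'm still resting."
    last_run = now
    # ... scan the data files and load any new values here ...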
6. Specify the connection string to your database as before.
7. Specify that there are column headers on row 1 and that data begin on row 2.
8. Click Next.
Figure 0-7 Specifying a file and database with the Streaming Data Loader
On the next page of the wizard, you see your data values and some more options to set. You will tell the
data loader which column stores your datetimes, and then map your TN and TP columns. In this
example, your datetime column stores local datetimes.
9. In the bottom left, set the Time option as follows:
a. Choose the option for Local Date Time.
b. Select the LocalDateTime column in the drop down box.
c. Select -5 as the time zone. This site is five hours behind UTC time.
d. Leave the DST box unchecked. The sensor at this site does not use daylight savings time.
10. Near the bottom right, click the Add button to map a data value column.
11. Choose TN to map the total nitrogen column and click Next.
You've told the data loader that the values in the TN column are time series values. Next you will tell the
data loader which site, variable, etc., are associated with values in this column. Notice that the data
loader is currently showing you a list of sites that it found in your database. You can choose from this list
or create a new site. Since the Sunset Point site is not in the database yet, you will add it.
12. Near the bottom right, click the Add button to add a new site to the database.
13. Fill in the site parameters according to values in Table 0-5 (Figure 0-8) and click OK. If the
information isn't listed in the table, then you can leave it blank.
Figure 0-8 Adding a New Site
The new site now shows up in the list of sites and is selected.
Figure 0-9 Choosing a Site
14. With the Sunset Point site selected, click Next.
15. In the list of variables, select TN and click Next.
16. Select the "composite, unfiltered" method and click Next.
17. With the only source record selected, click Next.
18. Choose the Depth below water surface offset type. Then at the bottom of the window, enter an
offset value of 10 (Figure 0-10). Click Next.
Figure 0-10 Choosing an Offset
19. Select the Raw data quality control level and click Finish.
20. Repeat the steps above to add the total phosphorus (TP) variable.
21. Once TP has been mapped (Figure 0-11), click Finish in the Add New File window.
Figure 0-11 Input File Mappings for Streaming Data Loader
The Streaming Data Loader Configuration Wizard now shows a record for the file you just mapped. The
configuration options that you specified in setting up this record are stored in an XML file located in your
AppData folder, e.g., C:\Users\USERNAME\AppData\Local\CUAHSI\StreamingDataLoader\1.1.2. You
can edit this XML manually, which means you could also write a script to automatically create these
configuration files. This would be useful if you had a very large network of streaming sensors and didn't
want to specify the configuration for each sensor file using the wizard.
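For example, you could generate those configuration files with a small script. The XML schema isn't documented here, so the element and attribute names in this sketch are purely hypothetical placeholders; open one of the wizard-generated files first to learn the real structure, then adapt the script to match:

import xml.etree.ElementTree as ET

def make_config(csv_path, site_code, variable_code):
    # Hypothetical element/attribute names; replace them with the ones you see
    # in an actual wizard-generated configuration file.
    root = ET.Element("DataFile", attrib={"path": csv_path, "runEveryMinutes": "1"})
    mapping = ET.SubElement(root, "ColumnMapping")
    ET.SubElement(mapping, "Column",
                  attrib={"name": variable_code, "siteCode": site_code})
    return ET.ElementTree(root)

make_config(r"C:\Workshop\Streaming\sdl.csv", "50", "TN").write("sdl_config.xml")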
You can now execute the data loader by clicking the Run button in the configuration wizard.
To run the Streaming Data Loader from the Configuration Wizard:
1. Click the Run button to execute the Streaming Data Loader.
2. After a moment, view the data values in your database using SQL Server Management Studio to
see that they have been updated.
Tip"
To show just the values for Sunset Point, which was assigned a SitelD of 6 in your database, append the following
WHERE clause to the DataValues query in SQL Server Management Studio and click Execute.
WHERE SitelD 6
To simulate new values arriving from a sensor, you'll now append the values from sdl-2.csv into sdl.csv.
The sdl-2 file contains values for August and September, 2011.
3. Use a program such as Excel or Notepad to copy the values from sdl-2.csv, and append those
values to sdl.csv.
4. Save sdl.csv.
5. Close sdl.csv and sdl-2.csv.
6. In the Streaming Data Loader Configuration Wizard, click the Run button.
7. After a moment, view the data values in your database to see that they have been updated.
8. Close the Streaming Data Loader Configuration Wizard.
The Streaming Data Loader saw that new data values had been added to the sensor file (sdl.csv) and
updated the database accordingly. It did not reload the old data values even though they were still present
in the sensor file.
Tip
In the case that your monitoring and telemetry system creates a new streaming data file for each datalogger
each time data is downloaded, you can connect to multiple local files containing data from the same datalogger
by using wildcard characters (e.g., entering 'C:\StreamingData\ThisSite*.dat' will use all files within the
C:\StreamingData folder that begin with 'ThisSite' and have a '.dat' extension). All of these files must be
formatted exactly the same. The ODM SDL will scan each file for new data to load into the database each time
the update is run.
Since the configuration seems to be working correctly with the Streaming Data Loader, you can now set it
up to run automatically as a scheduled task in Windows.
Update sdl.csv with even more data:
1. In your Workshop\Streaming folder, append the values from sdl-3.csv to sdl.csv.
2. Save sdl.csv.
3. Close sdl.csv and sdl-3.csv.
To set up the Streaming Data Loader to run as a scheduled task:
1. Start Task Scheduler (Start | All Programs | Accessories | System Tools | Task Scheduler).
2. Task Scheduler tends to open in a smaller window than it should. Maximize Task Scheduler.
3. Click Action | Create Basic Task.
4. Enter Streaming Data Loader as the name, Runs ODM Streaming Data Loader as the
Description, and click Next (Figure 0-12).
Figure 0-12 Creating a Basic Task with Task Scheduler
5. With Daily selected, click Next.
6. Choose a time a few minutes from now and click Next.
7. With Start a program selected, click Next.
8. Browse to the Streaming Data Loader executable (e.g., "C:\Program Files (x86)\CUAHSI
HIS\ODM SDL 1.1.2\ODMSDL.exe") and click Next.
9. Click Finish.
After the time has passed for which the task was scheduled, check your database to see if the new values
were appended. Once you've seen that the task completed successfully, you may want to adjust the
frequency of the task. Even though the smallest time period available was Daily when creating the task,
you can specify a shorter time period in the properties window for a task once it has been created.
To refine the frequency of a scheduled task's execution:
1. In Task Scheduler, click Refresh to make sure your task is visible.
2. In the Active Tasks section, double-click Streaming Data Loader (Figure 0-13).
Figure 0-13 Selecting a Scheduled Task
3. Click Action | Properties.
4. In the properties window that opens, click the Triggers tab.
5. Select the Daily trigger and click Edit.
6. Place a check next to Repeat task every. Leave the default of 1 hour selected.
7. Choose a duration of Indefinitely from the drop down list and click OK (Figure 0-14).
Figure 0-14 Refining the Task Trigger
8. Click OK to close the properties window.
Congratulations! You are now a Streaming Data Loader expert! In practice, you would schedule your
task to run as frequently as your data files are updated.
With data loaded into the database, you will now view the data using ODM Tools.
For Advanced Participants
ODM Data Loader from the Command Line
To learn more about the ODM Data Loader, read the documentation at
http://his.cuahsi.org/odmdataloader.html. Did you know that the ODM Data Loader can be scripted?
Open the command prompt and give it a shot.
Work that SQL Magic
Familiar with SQL? Try out some queries to explore your database, or experiment in creating table
views. Or, just look around the various tables to get a better sense of what's in ODM. More on ODM can
be found in the documentation at http://his.cuahsi.org/odmdatabases.html.
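You can also run queries from a script. Here's a minimal sketch using the third-party pyodbc package (an assumption on our part; any SQL Server client library would do) with this workshop's connection details:

import pyodbc  # assumes pyodbc and a SQL Server ODBC driver are installed

conn = pyodbc.connect(
    "DRIVER={SQL Server};SERVER=(local);DATABASE=MyWaterData;UID=sa;PWD=YourPassword"
)
# Count the loaded observations per site.
sql = """
    SELECT s.SiteName, COUNT(*) AS NumValues
    FROM DataValues dv
    JOIN Sites s ON s.SiteID = dv.SiteID
    GROUP BY s.SiteName
"""
for row in conn.execute(sql):
    print(row.SiteName, row.NumValues)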
Working with ODM Data
Now that data are loaded into an ODM database, how do we analyze them? Lucky for us, the ODM Tools
are specifically designed to query and visualize data within an ODM database. You will use the ODM
Tools to examine the contents of the database that you have just created.
To examine your data with the ODM Tools:
1. Open the ODM Tools. Notice how the tools automatically connect to the last database you
worked with.
The ODM Tools application opens with three tabs visible: Query, Visualize, and Edit. The Query tab is
selected by default. On this tab, you specify various filters to search for time series in the ODM database.
Take a moment to review the query options.
Our data are fairly uniform in nature, i.e., they all have the same data source, data type, sample medium,
etc. This limits the kinds of interesting queries we can perform with our data. But we can still query by
site, variable, number of observations, or time period.
Let's query for temperature data as described below.
2. Click the check box to Query by Variable (Figure 0-1). This enables you to choose a variable
from the variable list.
3. Select Temperature in the variables list by left-clicking on it (Figure 0-1).
4. Click the Query button in the bottom right corner (Figure 0-1).
The results of the query are shown at the bottom of the application window. You should see several items
there, where each one represents a time series of temperature at a particular site.
Tip
You can resize the ODM Tools window to show more query results at the bottom.
5. Find the time series of temperature for Main Lake (it's probably the first one in the list). Right-
click the data series (Figure 0-1). In the context menu, notice that you have options for plotting
graphs, editing the data, viewing and exporting metadata, and exporting the data series itself as
comma or tab delimited text.
6. In the context menu, click to View MetaData (Figure 0-1).
Figure 0-1 Querying data with the ODM Tools
Metadata for the time series are extracted from the database and transformed to XML (Figure 0-2).
Figure 0-2 Metadata exported from ODM using the ODM Tools
7. Right-click the data series again, and this time click to Export Single Data.
8. Save the file to disk at a location of your choosing.
9. When the data export is complete, dismiss the message box and open the data file. There are
your data again, this time with internal identifiers and other pieces that the ODM Data Loader
filled in for you when loading data.
Tip
You can select additional items to be included in the data export by clicking Tools | Options.
10. Close the metadata file and exported data file when you are finished looking at them.
Now let's plot a graph of the data.
11. Right-click the Main Lake data series, and click Plot.
You are brought to the Visualize tab, where the ODM Tools plot a graph of the data. Information about
the data series you are plotting is shown at the bottom of the application window. The plot has a very
"spiky" nature to it. Let's take a closer look to see what's going on.
12. Using the left mouse button, click and drag to draw a box around one of the "spikes" in the plot
to zoom in to that section (Figure 0-3).
Figure 0-3 Left-click to zoom in on a plot in ODM Tools
Notice that several data points appear to occur at the same point in time (Figure 0-4). This is because
several measurements are taken at the same time, but at different depths within the water body.
Figure 0-4 Plot of temperature data measured at different depths below the water surface
13. Experiment with the other charting capabilities. You can change plot options, view summary
statistics, show a probability plot, show a histogram, and show a box/whisker plot. You can also
interact with the plot by clicking on it. Left-click and drag a box to zoom in, and right-click to
zoom out or copy and print the chart.
Finally, let's briefly take a look at the Edit tab. You won't be doing any editing for this workshop, but
you can at least get a sense of what you can do with the ODM Tools.
14. Click the Query tab. Your previous results are still visible at the bottom.
15. Right-click the Temperature time series at the bottom, and click Edit.
This brings you to the Edit tab and pulls up the time series you selected. Look at the options on the right.
You can change individual values or apply a data filter to perhaps look for outliers. Let's try that.
16. Click the option to set a Value Change Threshold (Figure 0-5). This option is useful for locating
values that differ greatly from the values around them, which could indicate a sensor
malfunction, human error in observation, or some other anomaly.
17. Type a value for the Value Change Threshold that will capture some of your data, just to see
what happens. For my data, I used a value of 10 (Figure 0-5). Press ENTER after you have typed
in your value to confirm it. The Apply Filter button should now be enabled.
18. Click Apply Filter. Any values that match the filter will be highlighted in red in the plot (Figure
0-5). This shows us values that we may want to check for quality assurance.
Figure 0-5 Applying a data filter with the ODM Tools
Lastly, let's take a look at options for deriving new data series.
19. Click Derive New Data Series.
In the window that opens, you'll see options for applying an algebraic equation, using a daily aggregate
function, creating attributes to describe the output, and more (Figure 0-6). These functions are useful for
turning your raw information into knowledge products, like when gage height data for rivers are
converted to streamflow.
Figure 0-6 Window to derive a new data series in ODM Tools
In your case, because the quality control level is unknown, you'd first have to create a quality controlled
data series for editing before the other options would be enabled. However, editing data series is beyond
the scope of this introductory workshop, so we'll leave the ODM Tools for now.
20. Click Cancel to close the Derive A New Data Series window.
21. Close the ODM Tools.
Congratulations! You are now an expert in loading, querying, and visualizing data in an ODM database.
For some scientists, this may be all that is desired: a means of storing and working with hydrologic
observations data. However, there is often merit in sharing data with a larger community. That's where
the WaterOneFlow Web services and HIS Central come in.
Now that you have prepared an ODM database, you will publish the data with a WaterOneFlow Web
service to make it accessible online.
For Advanced Participants
Continue playing around with ODM Tools. Try some new queries and deeply explore the options for
plotting graphs.
Publishing an ODM Database with WaterOneFlow
As far as your local setup goes, you are now ready for action with your ODM database. But what if you
want to share the data with others online? That's where WaterOneFlow Web services come in.
WaterOneFlow defines a standard set of queries and a standard output format for accessing data,
regardless of whether the data are accessed internally from an ODM database, some other database, or
even through another website. Additionally, WaterOneFlow provides a layer of security over your
database which makes it less susceptible to hackers than exposing the database itself with public access.
Those who have their own database format for storing data must write their own WaterOneFlow
Web service to publish their data in HIS. However, you're in luck! HIS includes a free WaterOneFlow
Web service specifically designed to work with an ODM database. This means you don't have to write a
single line of programming code for your service. You just have to set it up on your computer and tell it
to talk to your ODM database.
In this portion of the workshop, you'll download and install a WaterOneFlow Web service to work with
your ODM database. The main steps are:
1. Create a SQL Server account that your Web service will use to access your database.
2. Install the WaterOneFlow Web service on your computer.
3. Configure the service.
4. Check the result.
Creating the Webclient SQL Server Account
The WaterOneFlow Web service uses a SQL Server account to connect to your ODM database. This
account should have fewer privileges than the one you used for ODM Tools because WaterOneFlow only
needs to read data from the database. It does not perform any data creation, updating, or deletion. It is
also the public interface to your data, so restricting what the service can do to your database is generally a
good idea. In the steps below, you will set up the SQL Server account for WaterOneFlow to use. For this
workshop, the account is named "webclient." However, you are free to name the account whatever you
choose in your production environment.
To set up the webclient account in SQL Server:
1. If it is not already open, open and connect to SQL Server Management Studio.
2. In the Object Explorer on the left, expand Security | Logins (Figure 0-1).
Figure 0-1 Accessing logins for SQL Server
3. Right-click Logins and click New Login.
4. Type webclient as the name (Figure 0-2).
5. Select the SQL Server authentication option (Figure 0-2).
6. Type webclient as the password (Figure 0-2). Normally you'd want a more secure password, but
for the purposes of this workshop, we're keeping things simple and easy to remember.
7. Uncheck the option to Enforce password policy (Figure 0-2). Again, we're choosing to keep
things simple for the workshop. You may want to enforce the SQL Server password policy for
your own installation.
Figure 0-2 Creating a webclient SQL Server login
Note
Suppose you've already published one ODM database with WaterOneFlow, and have now created a second
database that you want to publish. Since the webclient account was created when you published the first
database, you do not need to repeat those steps. You can start from the next step with User Mapping.
With the webclient login created, you will now add the MyWaterData database to the list of databases that
the login can access.
8. In the Select a page pane on the left, click User Mapping (Figure 0-3).
9. Place a check next to MyWaterData and click OK (Figure 0-3).
Figure 0-3 Allow the webclient to access the database
You're almost finished. The last thing to do is allow the webclient account to perform Select operations
on the database. The Web service needs this in order to properly query the database. You allow this
permission on the properties page for the database itself.
10. In the Object Explorer on the left, expand Databases until you see your MyWaterData database.
11. Right-click MyWaterData and click Properties.
12. In the Select a page pane on the left, click Permissions (Figure 0-4). You should see the
webclient account in the list of users for the database.
13. In the list of permissions, scroll to the bottom and place a check in the Grant column for Select
(Figure 0-4).
14. Click OK to close the dialog (Figure 0-4). You may also close SQL Server Management Studio.
The zip file contains several files related to WaterOneFlow. The Web service application is in the
WebApp folder. You'll copy this folder to the default folder for web applications on your computer.
5. From the zip file, copy the WebApp folder to C:\lnetpub\wwwroot.
6. Rename the WebApp folder to MyDataService. You can name the folder whatever you want,
but we'll use MyDataService for this exercise.
Now you'll use Internet Information Services (IIS) Manager to configure the web application. This is a
program you can use to manage web applications on your server. To isolate WaterOneFlow from your
other web applications, you'll set up an application pool for WaterOneFlow to run in. This pool will use
the Classic managed pipeline mode, since this is what WaterOneFlow was built to use.
To set up an application pool:
1. Open Internet Information Services (IIS) Manager.
2. In the Connections box on the left, expand the tree view until you see Application Pools.
3. Right-click Application Pools and click Add Application Pool (Figure 0-5).
Figure 0-5 Adding an application pool
4. In the Add Application Pool dialog, specify these options (Figure 0-6):
a. Name: WaterOneFlow
b. .NET Framework version: .NET Framework v2.0
c. Managed pipeline mode: Classic
Figure 0-6 WaterOneFlow application pool settings
5. Click OK to close the dialog.
Tip
If you are creating multiple instances of the WaterOneFlow web services on your server because you have
multiple ODM databases, you can reuse the WaterOneFlow application pool that you just created for those
services.
Now you are ready to install the service and configure it.
To install the WaterOneFlow Web service:
1. In IIS, expand the tree view to Sites | Default Web Site | MyDataService. MyDataService is
currently just a folder that IIS has found. Next you'll tell IIS that this folder is actually a web
application.
2. Right-click MyDataService and click Convert to Application.
3. In the Add Application dialog, click Select (Figure 0-7).
4. In the Select Application Pool dialog, choose WaterOneFlow (Figure 0-7).
Figure 0-7 Selecting the WaterOneFlow application pool
5. Click OK to close the Select Application Pool dialog.
6. Click OK to close the Add Application dialog.
With the service installed, you'll next configure the service to tell it how to connect to your ODM
database.
To configure the WaterOneFlow Web service:
1. Left-click MyDataService to select it (Figure 0-8).
2. Under the ASP.NET icon group in the middle section of the IIS Manager window, double-click
Connection Strings to open the connection strings editor (Figure 0-8).
A-42
-------
Figure 0-8 Accessing service connection strings
3. Double-click ODDB to set the database connection string.
4. Set these parameters in the Edit Connection String dialog:
a. Choose SQL Server.
b. Server: (local)
c. Database: MyWaterData
d. Credentials: Choose to Specify credentials.
5. Click Set to set the SQL Server credentials.
6. Set these credentials (Figure 0-9):
a. User name: webclient
b. Password: [same as the webclient password you established earlier]
c. Confirm password: [same as above]
Figure 0-9 SQL Server credentials for WaterOneFlow to use
7. Click OK to close the Set Credentials dialog. Your Edit Connection String dialog should now look
similar to the one in Figure 0-10.
Figure 0-10 Edit Connection String dialog
12. Change the Value of the network setting to LakeChamplain followed by a few numbers of your
choosing, e.g., LakeChamplain27 (Figure 0-11). These numbers will help keep your service
distinct from that of other participants in this workshop.
Figure 0-11 Setting the network name
13. Click OK to close the Edit Application Setting dialog.
14. Similarly, change the Value of vocabulary to VTDEC.
With the service configured, you are now ready to test it.
Testing the Web Service in a Web Browser
You'll test the service by exploring it in a web browser and by using HydroExcel.
To test the service in a web browser:
1. In a web browser, navigate to http://localhost/MyDataService.
Tip"
The keyword "localhost" tells your browser to look on your own computer for the web page that you are
attempting to access.
The Web service displays this web page when accessed with a browser. Near the top of the page is a link
for Database Test Page that you can use to test the service's connection to your database.
2. Click Database Test Page.
In a moment, you should see some example data from your database (Figure 0-12). If you see an error
message, you can use the message to track down the cause of the error.
Figure 0-12 WaterOneFlow database test page
Excellent! The Web service should now be up and running. Let's test the service using HydroExcel.
Testing the Web Service with HydroExcel
At this point in the data publication process, you should be able to give someone the URL to your Web
service, and they should then be able to query data from it using any software that communicates with
Web services. CUAHSI HIS includes free software called HydroExcel that lets you access
WaterOneFlow Web services from within a Microsoft Excel spreadsheet. In this portion of the workshop,
you'll use HydroExcel to extract data from your Web service.
To test the Web service with HydroExcel:
1. In a web browser, navigate to http://his.cuahsi.org.
2. Under Quick Links on the right, click HydroExcel.
3. Click the link to download the Microsoft Office 2007/2010 version.
4. For this exercise, save HydroExcel to your Desktop.
5. Open the file when it has finished downloading. If prompted, click to enable editing and enable
content.
Note
HydroExcel requires the free HydroObjects software to be installed. This software was installed on the workshop
computers prior to the workshop. If you are working from a different computer, you can find the installation file
at http://his.cuahsi.org/hydroobjects.html.
The worksheets in HydroExcel call methods from a WaterOneFlow Web service to query data and write
the result into the spreadsheet. For an in depth tutorial on HydroExcel, see the software manual on the
HIS website. For this workshop, we'll just do a quick test to download a list of sites and what variables
they have, and also a time series for a given variable at a given site.
6. Activate the Data Source worksheet. (Click "Data Source" at the bottom of HydroExcel.)
On the Data Source worksheet, you tell HydroExcel which Web service you want to work with by
inputting the URL address of the service next to the box that says "WSDL Location". Some URLs are
already listed in the spreadsheet, but you will have to locate your own URL that points to the Web service
you just created. Since you're still using the same computer on which the Web service is installed, you
could just use the localhost URL. However, let's use the actual IP address of your computer since that's
how other people will be connecting to it.
7. Locate your IP address. You can find your IP address from the command line:
a. Click Start.
b. Type cmd.exe and press ENTER.
c. In the window that opens, type ipconfig and press ENTER.
d. Write down the numbers that appear to the right of IP Address. These numbers look
something like "129.116.104.171". Note that this address will be different for each
computer.
e. Close the command window.
From this point forward, if you see the text "[YOURJP]", please replace it with your IP address. For example, if you
are asked to enter the address "http://[YOUR_IP]/MyDataService", and your IP address is 129.116.104.171, please
enter "http://129.116.104.171/MyDataService".
8. In a web browser, navigate to http://[YOUR_IP]/MyDataService.
9. Click the link for the 1.1 version of the service.
10. Click the link for Service Description. This takes you to the WSDL for your Web service.
Tip
The service WSDL (Web Services Description Language) is where your Web service defines what it can do and
how programs can interact with it. It is designed for programs to read, so don't worry if you can't make sense of
it. When a program accesses your Web service, it will read the WSDL and know exactly how to send requests to
it, and will also know what format of output to expect back.
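To see what that looks like from code, here's a minimal sketch using the third-party zeep SOAP library. The service path, method name, and parameters follow the WaterOneFlow 1.1 conventions used in this workshop (sites addressed as network:SiteCode, variables as vocabulary:VariableCode), but verify them against your own WSDL before relying on this:

from zeep import Client  # third-party SOAP client: pip install zeep

# Paste the WSDL URL you copied from your browser; this path is illustrative.
client = Client("http://[YOUR_IP]/MyDataService/cuahsi_1_1.asmx?WSDL")

# WaterOneFlow addresses sites as network:SiteCode and variables as vocabulary:VariableCode.
waterml = client.service.GetValues(
    location="LakeChamplain27:19",
    variable="VTDEC:Temp",
    startDate="1992-05-07",
    endDate="2007-09-17",
    authToken="",
)
print(str(waterml)[:500])  # GetValues returns a WaterML (XML) response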
11. From the address bar of your web browser, copy the URL of the WSDL and paste it into the cell
next to the cell that says WSDL Location on the Data Source worksheet of HydroExcel.
12. Activate the Series Catalog worksheet.
13. Change the option to Create and open KML file after download to TRUE.
14. Click Get Series Catalog.
After a moment, your spreadsheet is updated with information about the sites and variables measured at
the sites. Also, Google Earth opens to show the site locations.
15. Browse around Google Earth to see your site locations. Click on placemarks for sites to see
information associated with the sites.
16. Switch back to HydroExcel.
17. Dismiss the message box indicating that the download is complete.
Take a look at the information in the Series Catalog. You not only get information about the location of
sites in your database, but also the variables measured at those sites. Notice that the start date, end date,
and number of records of time series observations are included with each variable. You'll use this
information to download a time series for one of the sites.
18. Locate a site and variable for which you'd like to download time series data. For my
screenshots, I'm going to get temperature data at Main Lake, so you may want to do the same.
19. In the Series Catalog worksheet, right-click anywhere on the row for the site and variable that
you want.
20. In the context menu that opens, point to HydroExcel, and then click to download the time
series.
You are brought to the Time Series worksheet, where HydroExcel filled in the parameters to make the
request for the data, called the Web service, and populated the result in the spreadsheet.
21. Dismiss the message box indicating the download is complete.
Notice there are several measurements taken at a given datetime, but at different depths below the water
surface (Figure 0-13).
Figure 0-13 Time series returned from the Web service
Kudos to you! You've come a long way, and now have successfully made your data available online
using WaterOneFlow. Do you feel the magic in this moment? I certainly do. You're almost finished
with the entire data publication process. The last major step is to register your service with HIS Central
so that others can discover it.
For Advanced Participants
Fun with Pivot Tables
Have you worked with pivot tables in Excel before? They are nifty. Suppose you want to show
temperature data 50 feet below the water surface in the lake. On the Statistics and Charts worksheet, you
can drag Offset to the Report Filter box, and then choose "50" after clicking on the drop down arrow next
to Offset in the Pivot Table Field List. Experiment with other ways of summarizing the data with pivot
tables.
By the way, pivot tables and charts are a native part of Excel functionality, and aren't just limited to
HydroExcel. You may find pivot tables useful in other spreadsheets that you've created.
Registering Your Service at HIS Central
Although your Web service is now online, there's still the question of how people will find out about it so
that they can use it. That's where HIS Central comes in. HIS Central is a special server maintained by
the HIS team which keeps a catalog of WaterOneFlow Web services. When you publish a
WaterOneFlow Web service, you should register it with HIS Central to make it discoverable. More than
just a listing of WaterOneFlow services, HIS Central performs these functions:
• Provides detailed information about your service, including contact information, abstract, and
areal extent of your sites.
• Supports translation from your variables to an ontology of common hydrologic concepts. This
facilitates easy searching for variables in your service, especially by those who aren't familiar
with what your service has to offer.
• Maintains a catalog of sites and variables available in all registered services, enabling fast
searching for data from multiple data providers.
The registration process involves these three key steps:
1. Add an entry at HIS Central for your Web service.
2. Tag variables from your service to the hydrologic concept ontology.
3. Check that your sites show up in HydroDesktop.
Once you've completed these steps, the entire publication process will be complete!
Workshop Links for HIS Central and HydroDesktop
Rather than register your service at the official HIS Central (http://hiscentral.cuahsi.org), you'll be using a special
sandbox version set up just for workshop use at http://water.sdsc.edu/hiscentralsb/. That way, we don't clutter
the production system with datasets that we're just using for demonstration purposes. When you're ready to
contribute to HIS with your own data and services, then please use the official HIS Central link.
Adding Your WaterOneFlow Web Service to HIS Central
For this portion of the workshop, you will add an entry for your Web service with HIS Central. You will
not only give HIS Central the URL to your service WSDL, but also supplementary information about the
service.
To register a WaterOneFlow Web service with HIS Central:
1. In a web browser, navigate to the workshop HIS Central at http://water.sdsc.edu/hiscentralsb/.
2. At the top of the page, click the link to Login.
3. Log in with your credentials. If you are currently in the HIS workshop, your instructor will provide
you with credentials. Otherwise, feel free to register your own free HIS Central account.
4. Once you are logged in, near the top of the page, click the link to Add Data Service.
5. Add service details (Figure 0-1):
a. To help distinguish your service from others in the workshop, the service title should be
unique among all registered services. Therefore, please enter "Lake Champlain
Workshop" (without quotes) and your computer number as the Service Title, e.g., Lake
Champlain Workshop 27.
b. Input the Network Name that you assigned to your service earlier, e.g.,
LakeChamplain27.
c. Input your WSDL address, i.e.,
http://[YOUR_IP]/MyDataService/cuahsi_1_1.asmx?WSDL, as the Service WSDL.
d. Click the link I have read and agree to the Data Service Agreement to see the data
service agreement. The link opens in a new window.
e. Be sure the box is checked next to the link for the data service agreement.
[Screenshot of the Register Data Service form: Service Title ("Lake Champlain Workshop 27", which will appear atop the page associated with your service), Network Name ("LakeChamplain27", a code unique across the system), Service WSDL (the full web URL to your webservice WSDL), and the checked Data Service Agreement box.]
Figure 0-1 Registration page for a new service at HIS Central
6. Click Next. This brings you to the Data Service Details page.
You now have an entry for your service in the system at HIS Central. However, there are still some steps
to take before you make it public. Let's continue editing details of this service.
7. About half-way down the page, click Edit Details.
This brings you to a page that lets you edit the description of your service. For the workshop, we'll only
be adding a few items, but if you are a fast typist, feel free to add more!
8. Add some info to the details page, such as the following:
a. Organization: HIS Workshop
b. Name: Workshop Participant
c. Citation: Doe, Jane (2011). Water quality monitoring data from the Lake Champlain
Long-Term Monitoring Program. VT Department of Environmental Conservation.
d. Abstract: Here's a short abstract.
Note
When you register a service at HIS Central, please use the following citation format:
Name (Year). Type of Data. Network. Collecting Organization.
9. Be sure to check the box next to Is service public. If you don't check this box, sites from your
service will not show up in applications like HydroDesktop.
10. Click Update.
This brings you back to the Data Service Details page.
Now let's add images that will be associated with your service. You'll add a logo for your organization
that users will see when they view details for your service at HIS Central, and a small icon that represents
a site location that will appear in the HydroDesktop map.
11. Change the images for your service.
a. Click Change Images.
b. For the organization icon, browse to the Workshop\Images folder on your computer,
and open the OrganizationIcon.gif file.
c. Click Upload to upload the image once you have located it. The web page will be
updated with your image once the upload is complete.
d. For the map icon, browse to and add the MapIcon.jpg file, located in the same directory
as the organization image.
e. Click Upload to upload the image.
f. Click Back.
At this point, you could add additional contacts, links, and descriptions, which would show up as part of
your service's details. This is not required for the workshop, so your data are now ready to be
harvested.
What's harvesting, you ask? Recall that HIS Central keeps a catalog of all of your sites and variables,
which enables fast searching across all registered services. HIS Central creates this catalog by calling
various methods from your WaterOneFlow Web service. This is called data harvesting. When you
request a data harvest, an HIS Central administrator is notified and will trigger a harvest of your data. To
keep the system from being bogged down by numerous harvest requests, only an HIS Central
administrator can trigger a harvest. Harvesting is essential, because HIS Central must know what
variables are available in your service before you can tag those variables to concepts in the hydrologic
ontology.
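Conceptually, a harvest is just a crawl of your service's WaterOneFlow methods. The sketch below illustrates the idea only; it is not HIS Central's actual code. It assumes the Python zeep SOAP library and reduces the WaterML parsing to extracting site codes.

import xml.etree.ElementTree as ET
import zeep

client = zeep.Client("http://129.116.104.171/MyDataService/cuahsi_1_1.asmx?WSDL")

# 1. Ask the service for all of its sites (returned as a WaterML document).
sites_waterml = client.service.GetSites([], "")

# 2. Pull the site codes out of the WaterML. Namespaces vary by service,
#    so match on the local element name.
root = ET.fromstring(sites_waterml.encode("utf-8"))
site_codes = {el.text for el in root.iter() if el.tag.endswith("siteCode")}

# 3. For each site, fetch the variables and series it offers, and catalog them.
catalog = {code: client.service.GetSiteInfo(code, "") for code in site_codes}
print("Harvested", len(catalog), "site(s)")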
Normally you would now click the button to Request Data Harvest. However, rather than wait for an
administrator to trigger the harvest, you will trigger the harvesting yourself. Your user account for the
workshop has been given sufficient privileges to trigger data harvests.
12. Trigger a harvest for your service.
a. Near the top of the Data Service Details web page, click Harvest Data.
b. Click Begin Harvest. In a moment, a link will appear to view the harvest log.
c. Click the Harvest log link to view progress. The log file will indicate when the harvest
has completed. Refresh your browser to see any updates.
Now that the service is harvested, you can tag variables in the service to the ontology.
Tip
At this point, you may want to take a moment to stretch and relax. Sometimes it takes a few minutes for the
results of the harvest to be committed to HIS Central's catalog database.
13. Click My Data Services.
14. Click Details for your service.
15. Click List Variables. This brings you to a page showing variables found in your service (Figure
0-2). If you don't see your variables here, then the harvesting was not completed.
[Screenshot: the list of variables found in the service (Variable Name, VariableCode, Units), showing Chlorophyll a with code VTDEC:Chla, along with a Map Variables button.]
Figure 0-2 List of variables for a service registered at HIS Central
16. Click Map Variables.
The HydroTagger opens, showing the CUAHSI HIS ontology. There are high level concepts in the
center, with branches increasing in specificity as they move out to the leaf concepts. The variables that
HIS Central found in your service are listed on the bottom left. Your mission here is to map your
variables to the appropriate leaf concept. When users search for data at HIS Central, they search on these
concepts, not on whatever variable names you happened to use.
17. In your list of variables, click Select on the row for chlorophyll a. This adds that item to the
Variable text box (Figure 0-3).
18. In the view of the ontology, click and drag with the mouse to locate the chlorophyll a concept
leaf (Figure 0-3). It falls under Hydrosphere | Biological | Biological community | Pigment |
Chlorophyll | Chlorophyll a.
19. Double-click Chlorophyll a to add it to the Mapping text box at the bottom of the window
(Figure 0-3).
20. Click Map (Figure 0-3).
Tip
If you can't find chlorophyll in the ontology, type chlorophyll in the Search box and click Search. Small red arrows
will indicate keywords that match your search. Note that the Search box is only visible when the Variable and
Mapping text boxes are not populated.
[Screenshot of the HydroTagger: the service's variables (chlorophyll a, temperature, total nitrogen, and total phosphorus, with codes such as vtdecchla and vtdectemp) are listed at the lower left; chlorophyll a appears in the Variable box and the Chlorophyll a concept in the Mapping box.]
Figure 0-3 Mapping a variable to the hydrologic ontology with HydroTagger
Repeat the mapping for the remaining variables in your service. When you are finished, navigate back to
the details for your service and click the link to view its public page.
This brings you to the public details page for your service. Notice that the service details you entered are
there, along with a map showing a box that defines the extent of your site locations (Figure 0-4).
[Screenshot of the public details page for "Lake Champlain Workshop 27" (HIS Workshop), showing the contact, abstract, and citation entered earlier.]
Figure 0-4 Public view of registered service at HIS Central (your images and details may vary)
Outstanding! Your service is now registered with HIS Central, and you've seen how others can navigate
to HIS Central and discover your service. As a final check, let's find data from your service in
HydroDesktop.
Viewing Your Data in HydroDesktop
The final step in the publication process is to check that your service is properly registered by viewing
sites from your service in HydroDesktop. HydroDesktop is a desktop application for discovering,
accessing, and analyzing time series data from WaterOneFlow services. Unlike HydroExcel,
HydroDesktop doesn't require prior knowledge of your service URL before you can use it. Instead,
HydroDesktop searches HIS Central's catalog for locations of time series of interest, giving you access to
ALL publicly available services registered at HIS Central.
You'll use HydroDesktop to submit a query that should include sites from your Web service in the result.
If you see your sites, then the data publication process is completed!
To search for your sites in HydroDesktop:
1. Open HydroDesktop and create a new North America project.
As you can see, HydroDesktop is a Geographic Information System (GIS) integrated with CUAHSI-HIS.
Next you will tell HydroDesktop to use the sandbox HIS Central instead of the official version. Recall
that you registered your service with the sandbox HIS Central (the one for testing and workshops) and not
the official version.
2. In the ribbon, click the drop down arrow under the Search panel and then click Advanced
Settings (Figure 0-5).
[Screenshot of the HydroDesktop ribbon, with the drop-down arrow under the Search panel opened to show Advanced Settings.]
Figure 0-5 Accessing HydroDesktop advanced settings
3. In the Advanced Settings dialog, click Custom (Figure 0-6).
4. Enter the following URL into the box below Custom (Figure 0-6). This is the URL to the search
services operated by the sandbox HIS Central in which you registered your service.
http://water.sdsc.edu/hiscentralsb/webservices/hiscentral.asmx
5. Click OK (Figure 0-6).
[Screenshot of the Advanced Settings dialog for specifying the HIS Central URL, with the Custom option selected and the custom URL entered.]
Figure 0-6 Setting the HIS Central URL
Now you will locate Lake Champlain. This is where your sites are located. An online basemap will be
useful in finding the location.
6. In the ribbon, in the Online Basemap panel, select ESRI World Topo from the drop down list.
7. Using the Zoom In tool, zoom in to the northeastern United States to find Lake Champlain on
the border with Canada between Vermont and New York.
8. In the Search panel, click the Select by drawing Rectangle tool (Figure 0-7).
9. Click and drag to draw a box around Lake Champlain (Figure 0-7).
Figure 0-7 Specifying the area to search
10. In the Search panel, click the Keywords tab (Figure 0-8).
11. Type nitrogen in the keyword box (Figure 0-8). Keywords related to nitrogen are automatically
selected in the keyword list.
12. Click Nitrogen, total (Figure 0-8). This is one of the variables included in your service.
13. Click the Add button (Figure 0-8). The nitrogen keyword is added to the list of keywords to
search.
14. Click Run Search (Figure 0-8).
[Screenshot of the Search panel's Keywords tab, with "nitrogen" typed in the keyword box and nitrogen-related keywords, including Nitrogen, total, shown in the keyword list.]
Figure 0-8 Searching by keyword in HydroDesktop
After a moment, sites matching your search appear on the map. If your sites are among the results, then
your service is properly registered and the data publication process is complete. Experiences from
workshops like this one also assist the HIS team in developing the next generation of HIS software and
methods to further hydrologic information science.
This concludes the workshop.
Appendix A: Uninstallation Instructions
If you want to erase your footsteps from this workshop, here's how.
HIS Central
Only administrators can permanently delete services at HIS Central. The workshop instructor will delete
all services registered during the workshop within a few weeks after the workshop has concluded. If you
register another service in the future and would like it permanently deleted, feel free to email the
workshop instructor, who will ask the HIS Central administrator to delete the service for you.
In the meantime, you can hide the service so that others cannot see it.
To hide a service at HIS Central:
1. Log in to HIS Central.
2. Click My Data Services.
3. Click Details for the service that you want to hide.
4. Click Edit Details.
5. Uncheck Is service public.
6. Click Update.
WaterOneFlow
To uninstall your WaterOneFlow Web service from your server, you will remove it from IIS and then
delete the files on your computer.
To uninstall WaterOneFlow:
1. Open IIS.
2. Expand the tree view on the left to find MyDataService.
3. Right-click your service and click Remove. Click Yes if prompted to remove the application.
4. Optional: If you have no other services using the WaterOneFlow application pool, then in the
tree view, click Application Pools, find WaterOneFlow, and click to Remove the WaterOneFlow
application pool.
5. Close IIS.
6. Open Windows Explorer.
7. Navigate to your MyDataService folder.
Deleting the files is a two-step process. First you delete the service files since they have a lock on
supporting files. Then you delete the supporting files.
To delete WaterOneFlow files:
1. In the MyDataService folder, delete all ASP.NET Server Page files. These are the files with the
.aspx or .asmx extension. (Hint: Sort by Type.)
2. Delete the MyDataService folder.
Streaming Data Loader Scheduled Task
To remove the scheduled task for the Streaming Data Loader:
1. Open Task Scheduler.
2. In the list of Active Tasks, double-click Streaming Data Loader.
3. Click Actions | Delete. Click Yes to confirm. Close Task Scheduler.
ODM Database
To clean up your instance of SQL Server:
1. Log into SQL Management Studio.
2. In the Object Explorer, find your MyWaterData database.
3. Right-click MyWaterData and click Delete. Click OK to delete the database. This both detaches
your database and deletes the files from your computer.
4. Optional: If you have no other databases associated with your webclient SQL Server login, then
you can delete it.
a. In the Object Explorer, expand Security | Logins to find your webclient login.
b. Right-click webclient and click Delete. Click OK to delete the login.
5. Repeat step 4 for any other ODM-related logins that you created.
CUAHSI-HIS and Related Software
To uninstall HIS software:
1. Open Programs and Features.
2. Uninstall the following software:
a. CUAHSI HydroObjects
b. Google Earth
c. HydroDesktop
d. Microsoft SQL Server or SQL Server Express (please see your system administrator first
since this program would take a long time to reinstall if desired)
e. ODM Tools
f. ODM Data Loader
g. ODM Streaming Data Loader
h. Optional: Microsoft Office (please see your system administrator first since this program
is not free)
3. Close Programs and Features.
4. To uninstall HydroExcel, delete the spreadsheet file (HydroExcel115.xlsb).
5. Delete the HydroExcel115 folder, located in the same directory as the HydroExcel spreadsheet
file. This folder contains KML generated during the Google Earth portion of the exercise.
APPENDIX B:
CUAHSI Community Observations Data Model (ODM)
Version 1.1
Design Specifications
May 2008
David G. Tarboton1, Jeffery S. Horsburgh1, David R. Maidment2
Abstract
The CUAHSI Hydrologic Information System project is developing information technology infrastructure
to support hydrologic science. One aspect of this is a data model for the storage and retrieval of
hydrologic observations in a relational database. The purpose for such a database is to store hydrologic
observations data in a system designed to optimize data retrieval for integrated analysis of information
collected by multiple investigators. It is intended to provide a standard format to aid in the effective
sharing of information between investigators and to allow analysis of information from disparate sources
both within a single study area or hydrologic observatory and across hydrologic observatories and
regions. The observations data model is designed to store hydrologic observations and sufficient ancillary
information (metadata) about the data values to provide traceable heritage from raw measurements to
usable information allowing them to be unambiguously interpreted and used. A relational database
format is used to provide querying capability to allow data retrieval supporting diverse analyses. A
generic template for the observations database is presented. This is referred to as the Observations Data
Model (ODM).
Introduction
The Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) is an
organization representing more than 100 universities and is sponsored by the National Science
Foundation to provide infrastructure and services to advance the development of hydrologic science and
education in the United States. The CUAHSI Hydrologic Information System (HIS) is being developed
as a geographically distributed network of hydrologic data sources and functions that are integrated using
web services so that they function as a connected whole. One aspect of the CUAHSI HIS is the
development of a standard database schema for use in the storage of point observations in a relational
database. This is referred to as the point Observations Data Model (ODM) and is intended to allow for
comprehensive analysis of information collected by multiple investigators for varying purposes. It is
intended to expand the ability for data analysis by providing a standard format to share data among
investigators and to facilitate analysis of information from disparate sources both within a single study
area or hydrologic observatory and across hydrologic observatories and regions. The ODM is designed to
store hydrologic observations with sufficient ancillary information (metadata) about the data values to
provide traceable heritage from raw measurements to usable information allowing them to be
unambiguously interpreted and used. Although designed specifically with hydrologic observation data in
mind, this data model has a simple and general structure that will also accommodate a wide range of other
data, such as from other environmental observatories or observing networks.
1 Utah Water Research Laboratory, Utah State University
2 Center for Research in Water Resources, University of Texas at Austin
ODM uses a relational database format to allow for ease in querying and data retrieval in support of a
diverse range of analyses. Reliance on databases and tables within databases also provides the capability
to have the model scalable from the observations of a single investigator in a single project through the
multiple investigator communities associated with a hydrologic observatory and ultimately to the entire
set of observations available to the CUAHSI community. ODM is focused on observations made at a
point. A relational database model with individual observations recorded as individual records (an atomic
model) has been chosen to provide maximum flexibility in data analysis through the ability to query and
select individual observation records. This approach carries the burden of record level metadata, so it is
not appropriate for all variables that might be observed. For example, individual pixel values in large
remotely sensed images or grids are inappropriate for this model.
This data model is presented as a generic template for a point observations database, without reference to
the specific implementation in a database management system. This is done so that the general design is
not limited to any specific proprietary software, although we expect that implementations will take
advantage of capabilities of specific software. It should be possible to implement ODM in a variety of
relational database management systems, or even in a set of text tables or variable arrays in a computer
program. However, to take full advantage of the relationships between data elements, the querying
capability of a relational database system is required. By presenting the design at a general conceptual
level, we also avoid implementation specific detail on the format of how information is represented. See
the discussion of Dates and Times under ODM features below for an example of the distinction between
general concepts and implementation specific details.
Version Information
ODM has evolved from an initial design presented at a CUAHSI workshop held in Austin during March,
2005 (Maidment, 2005) that was then widely reviewed with comments being received from 22
individuals (Tarboton, 2005). These reviews served as the basis for a redesign that was presented at a
CUAHSI workshop at Duke University during July, 2005 and presented as part of the CUAHSI HIS status report
(Horsburgh et al., 2005). Following this presentation of the design, the data model was reviewed and
commented on by a number of others, including the CLEANER (Collaborative Large-scale Engineering
Analysis Network for Environmental Research) cyberinfrastructure committee. Further versions of the
Observations Data Model were circulated in April, June and October 2006. These documented changes
made in the evolution of this design. The fundamental design, however, has not changed since the status
report presentation of the model (Horsburgh et al., 2005) but many table and field names have been
changed. Tables have also been added to give spatial reference information, metadata information, and to
define controlled vocabularies. Version 1.0 of ODM, which was the first release version of ODM, has
been implemented and tested within the WATERS network of test bed sites and was documented in
Water Resources Research (Horsburgh et al., 2008). This document describes the second release version
of the data model design, which has been named ODM Release Version 1.1, and has been so named to
correspond to the Version 1.1 release of the CUAHSI HIS. This document supersedes the previous
documents.
In general, the following changes have been made for Version 1.1:
• All integer IDs serving as the primary key for tables in ODM have been changed to auto
number/identity fields.
• Text field lengths have been relaxed in some cases and have been standardized according to the
following scheme: codes = 50 characters, terms = 255 characters, links = 500 characters,
definitions/explanations = unlimited.
• Check constraints have been defined for the Latitude and Longitude fields in the Sites table.
• Check constraints have been added to many of the fields in ODM to constrain the characters that
are valid for those fields (see Appendix A for details).
• Relationships have been added between controlled vocabulary tables and the tables that contain
the fields that they define. This was done to more rigorously enforce the ODM controlled
vocabularies.
• Unique constraints were placed on both SiteCode in the Sites table and VariableCode in the
Variables table.
• The controlled vocabulary was relaxed on the QualityControlLevels table to allow more detailed
versioning of data series. A QualityControlLevelCode was also added to this table to facilitate
this.
• A Citation field was added to the Sources table to provide a place for a formal citation for data in
the database.
• A Speciation field was added to the Variables table. This provides a place to store information
about the speciation of chemistry observations. A SpeciationCV controlled vocabulary table was
added to define this field.
• An ODMVersion table was added to store the version number of the database.
• The SeriesCatalog table has been updated based on the addition of the above fields.
Hydrologic Observations
Many organizations and individuals measure hydrologic variables such as streamflow, water quality,
groundwater levels, and precipitation. National databases such as USGS' National Water Information
System (NWIS) and USEPA's data Storage and Retrieval (STORET) system contain a wealth of data,
but, in general, these national data repositories have different data formats, storage, and retrieval systems,
and combining data from disparate sources can be difficult. The problem is compounded when multiple
investigators are involved (as would be the case at proposed CUAHSI Hydrologic Observatories) because
everyone has their own way of storing and manipulating observational data. There is a need within the
hydrologic community for an observations database structure that presents observations from many
different sources and of many different types in a consistent format.
Hydrologic observations are identified by the following fundamental characteristics:
• The location at which the observations were made (space)
• The date and time at which the observations were made (time)
• The type of variable that was observed, such as streamflow, water surface elevation, water quality
concentration, etc. (variable)
These three fundamental characteristics may be represented as a data cube (Figure 1), where a particular
observed data value (D) is located as a function of where it was observed (L), its time of observation (T),
and what kind of variable it is (V), thus forming D(L,T,V).
[Figure: a data cube with axes Space (L), Time (T), and Variables (V), locating a data value D.]
Figure 1. A measured data value (D) is indexed by its spatial location (L), its time of measurement (T),
and what kind of variable it is (V).
In addition to these fundamental characteristics, there are many other distinguishing attributes that
accompany observational data. Many of these secondary attributes provide more information about the
three fundamental characteristics mentioned above. For example, the location of an observation can be
expressed as a text string (e.g., "Bear River Near Logan, UT"), or as latitude and longitude coordinates
that accurately delineate the location of the observation. Other attributes can provide important context in
interpreting the observational data. These include data qualifying comments and information about the
organization that collected the data. The fundamental design decisions associated with the ODM involve
choices as to how much supporting information to include in the database and whether to store (and
potentially repeat) this information with each observation or save this information in separate tables with
key fields used to logically associate observation records with the associated information in the ancillary
tables. Table 1 presents the general attributes associated with a point observation that we judged should
be included in the generic ODM design.
Table 1. ODM attributes associated with an observation

Data Value: The observation value itself.
Accuracy: Quantification of the measurement accuracy associated with the observation value.
Date and Time: The date and time of the observation (including time zone offset relative to UTC and daylight savings time factor).
Variable Name: The name of the physical, chemical, or biological quantity that the data value represents (e.g. streamflow, precipitation, temperature).
Speciation: For concentration measurements, the species in which the concentration is expressed (e.g., as N, or as NO3, or as NH4).
Location: The location at which the observation was made (e.g. latitude and longitude).
Units: The units (e.g. m or m3/s) and unit type (e.g. length or volume/time) associated with the variable.
Interval: The interval over which each observation was collected or implicitly averaged by the measurement method, and whether the observations are regularly recorded on that interval.
Offset: Distance from a reference point to the location at which the observation was made (e.g. 5 meters below water surface).
Offset Type/Reference Point: The reference point from which the offset to the measurement location was measured (e.g. water surface, stream bank, snow surface).
Data Type: An indication of the kind of quantity being measured (e.g. a continuous, minimum, maximum, or cumulative measurement).
Organization: The organization or entity providing the measurement.
Censoring: An indication of whether the observation is censored or not.
Data Qualifying Comments: Comments accompanying the data that can affect the way the data is used or interpreted (e.g. holding time exceeded, sample contaminated, provisional data subject to change, etc.).
Analysis Procedure/Method: An indication of what method was used to collect the observation (e.g. dissolved oxygen by field probe or dissolved oxygen by Winkler titration), including the quality control and assurance that it has been subject to.
Source: Information on the original source of the observation (e.g. from a specific organization, agency, or investigator third-party database).
Sample Medium: The medium in which the sample was collected (e.g. water, air, sediment, etc.).
Value Category: An indication of whether the data value represents an actual measurement, a calculated value, or is the result of a model simulation.
Observations Data Model
The schema of the Observations Data Model is given in Figure 2. Appendix A gives details of each table
and each field in this generic data model schema. Appendix A serves as the data dictionary for the data
model and documents specific database constraints, data types, examples, and best practices. The primary
table that stores point observation values is the DataValues table at the center of the schema in Figure 2.
Logical relationships between fields in the data model are shown and serve to establish the connectivity
between the observation values and associated ancillary information. Details of the relationships are
given in Table 2. Figure 2 shows each of the controlled vocabulary tables and their relationships to the
table containing the field that they define. Controlled vocabulary tables are highlighted with red headers.
In Figure 2, each of the mandatory fields is shown in bold text, whereas optional fields are shown in
regular text.
[Schema diagram. The DataValues table sits at the center, with fields ValueID {PK}, DataValue, ValueAccuracy, LocalDateTime, UTCOffset, DateTimeUTC, SiteID {FK}, VariableID {FK}, OffsetValue, OffsetTypeID {FK}, CensorCode, QualifierID {FK}, MethodID {FK}, SourceID {FK}, SampleID {FK}, DerivedFromID, and QualityControlLevelID {FK}. It is linked to the ancillary tables (Sites, Variables, Units, OffsetTypes, Methods, LabMethods, Samples, Qualifiers, QualityControlLevels, Sources, ISOMetadata, SpatialReferences, and SeriesCatalog) and to the controlled vocabulary tables, which are highlighted with red headers. Mandatory fields are marked (M); optional fields are marked (O). Cardinalities are 1..1 (one and only one), 1..* (one or many), 0..* (zero or many), and 0..1 (zero or one).]
Figure 2. Observations Data Model schema.
Table 2. Observations Data Model Logical Relationships
(Each relationship is listed as Table.Field, relationship type, Table.Field.)

Relationships that define ancillary information about data values:
DataValues.SiteID *<->1 Sites.SiteID
DataValues.VariableID *<->1 Variables.VariableID
DataValues.OffsetTypeID *<->1 OffsetTypes.OffsetTypeID
DataValues.QualifierID *<->1 Qualifiers.QualifierID
DataValues.MethodID *<->1 Methods.MethodID
DataValues.SourceID *<->1 Sources.SourceID
DataValues.SampleID *<->1 Samples.SampleID
DataValues.QualityControlLevelID *<->1 QualityControlLevels.QualityControlLevelID

Relationships that define derived from groups:
DataValues.DerivedFromID *<->* DerivedFrom.DerivedFromID
DataValues.ValueID 1<->* DerivedFrom.ValueID

Relationships that define groups:
DataValues.ValueID 1<->* Groups.ValueID
GroupDescriptions.GroupID 1<->* Groups.GroupID

Relationships used to define categories for categorical data:
Variables.VariableID 1<->* Categories.VariableID
DataValues.DataValue *<->1 Categories.DataValue

Relationships used to define the Units:
Units.UnitsID 1<->* Variables.VariableUnitsID
Units.UnitsID 1<->* Variables.TimeUnitsID
Units.UnitsID 1<->* OffsetTypes.OffsetUnitsID

Relationship used to define the Sample Laboratory Methods:
LabMethods.LabMethodID 1<->* Samples.LabMethodID

Relationships used to define the Spatial References:
SpatialReferences.SpatialReferenceID 1<->* Sites.LatLongDatumID
SpatialReferences.SpatialReferenceID 1<->* Sites.LocalProjectionID

Relationship used to define the ISOMetadata:
ISOMetadata.MetadataID 1<->* Sources.MetadataID

Relationships used to define Controlled Vocabularies:
VerticalDatumCV.Term 1<->* Sites.VerticalDatum
SampleTypeCV.Term 1<->* Samples.SampleType
VariableNameCV.Term 1<->* Variables.VariableName
ValueTypeCV.Term 1<->* Variables.ValueType
DataTypeCV.Term 1<->* Variables.DataType
SampleMediumCV.Term 1<->* Variables.SampleMedium
SpeciationCV.Term 1<->* Variables.Speciation
GeneralCategoryCV.Term 1<->* Variables.GeneralCategory
TopicCategoryCV.Term 1<->* ISOMetadata.TopicCategory
CensorCodeCV.Term 1<->* DataValues.CensorCode
Relationship type is indicated as one-to-one (1<->1), one-to-many (1<->*), many-to-one (*<->1), and
many-to-many (*<->*). The first set of relationships defines the links to tables that contain ancillary
information. They are used so that only compact (integer) identifiers are stored with each data value and
thus repeated many times, while the more voluminous ancillary information is stored to the side and not
repeated. The second set of relationships defines derived from groupings used to specify data values that
have been used to derive other data values. The third set of relationships defines logical groupings of data
values. The fourth set of relationships is used to specify the categories associated with categorical
variables. The fifth set of relationships is used to define the units. The sixth set of relationships associates
laboratory methods with samples. The seventh set of relationships associates sites with the Spatial
Reference System used to define the location. The eighth set of relationships associates project and dataset
level metadata with each data source. The last set of relationships defines the linkage between the
controlled vocabulary fields and the tables that store the acceptable terms for those fields. Details of
how these relationships work are given in the discussion of features of the data model design below.
Features of the Observations Data Model Design
Geography
ODM is intended to be independent of the geographical representation of the site locations. The
geographic location of sites is specified through the Latitude, Longitude, and Elevation information in the
Sites table, and optionally local coordinates, which may be in a standard geographic projection for the
study area or in a locally defined coordinate system specific to a study area. Each site also has a unique
identifier, SiteID, which can be logically linked to one or more objects in a Geographic Information
System (GIS) data model. For example, Figure 3 depicts a one-to-one relationship between sites within
ODM and HydroPoints within the Arc Hydro Framework Data Model (Maidment, 2002) used to
represent objects in a digital watershed. In simple implementations, SiteID may have the same integer
value as the identifier for the associated GIS object, HydroID in this case. In more complex
implementations, and especially when multiple databases are merged into a single ODM, it may not be
possible to preserve the simple one-to-one relationship between SiteID and HydroID with each of these
fields holding the same integer identifier values. In these cases, where SiteID and HydroID are not the
same, a coupling table would be used to associate the ODM SiteIDs used to identify sites with HydroIDs
in the Arc Hydro data model.
SiteID must be unique within an instance of ODM. This could, for example, be achieved by assigning
SiteIDs from a master table. The linkage between SiteIDs and GIS object IDs is intended to be generic
and suitable for use with any geographic data model that includes information specifying the location of
sites. For example, a linear referencing system on a river network, such as the National Hydrography
Dataset, might be used to specify the location of a site on a river network. Addressing relative to specific
hydrologic objects through the SiteID field provides direct and specific location information necessary for
proper interpretation of data values. Information from direct addressing relative to hydrologic objects is
often of greater value to a user than the simple Latitude and Longitude information stored in the ODM
Sites table. For example, it is more useful to know that a stream gage is on such and such a stream rather
than simply its latitude and longitude.
[Schema diagram. The ODM Sites table (SiteID, SiteCode, SiteName, Latitude, Longitude, LatLongDatumID, Elevation_m, VerticalDatumID, LocalX, LocalY, LocalProjectionID, PosAccuracy_m, State, County, Comments) is related through a coupling table to the HydroJunction feature class (HydroID, HydroCode, ReachCode, Name, and related network attributes) of the Arc Hydro Framework Data Model, which also includes HydroEdge, HydroPoint, Watershed, and Waterbody features.]
Figure 3. Arc Hydro Framework Data Model and Observations Data Model related through the SiteID field
in the Sites table.
Series Catalog
A "data series" is an organizing principle used in ODM. A data series consists of all the data values
associated with a unique site, variable, method, source, and quality control level combination in the
DataValues table. The SeriesCatalog table lists data series, identifying each by a unique series identifier,
SeriesID. This table is essentially a summary of many of the tables in the ODM and is not required to
maintain the integrity of the data. However, it serves to provide a listing of all the distinct series of data
values of a specific variable at a specific site. By doing so, this table provides a means by which users
can execute most common data discovery queries (i.e., which variables have data at a site, etc.) without
the overhead of querying the entire DataValues table, which can become quite large.
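For example, the discovery question "which variables have data at this site, and over what period?" can be answered from SeriesCatalog alone. Below is a sketch using Python's built-in sqlite3 module; the database file name and site code are hypothetical, while the table and field names come from the ODM schema.

import sqlite3

conn = sqlite3.connect("MyWaterData.db")  # hypothetical ODM instance

# One small query against SeriesCatalog, instead of scanning DataValues:
rows = conn.execute(
    """SELECT VariableCode, VariableName, BeginDateTime, EndDateTime, ValueCount
         FROM SeriesCatalog
        WHERE SiteCode = ?""",
    ("LakeChamplain27:19",),   # illustrative site code
).fetchall()
for variable_code, name, begin, end, count in rows:
    print(f"{variable_code}: {name}, {begin} to {end} ({count} values)")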
The SeriesCatalog table is also intended to support CUAHSI Web Service method queries such as
GetSiteInfo, which returns information about a monitoring site within an instance of the ODM including
the variables that have been measured at that site. It should be noted that data series, as they are defined
here, do not distinguish between different series of the same variable at the same site but measured with
different offsets. If for example temperature was measured at two different offsets by two different
sensors at one site, both sets of data would fall into one data series for the purposes of the SeriesCatalog
table. In these cases, interpretation or analysis software will need to specifically examine and parse the
offsets by examining the offset associated with each data value. The SeriesCatalog table does not do this
because its principal purpose is data discovery, which we did not want to be overly complicated. The
SeriesCatalog table should be programmatically generated and modified as data are added to the database.
Accuracy
Each data value in the DataValues table has an associated attribute called ValueAccuracy. This is a
numeric value that quantifies the total measurement accuracy defined as the nearness of a measurement to
the true or standard value. Since the true value is not known, the ValueAccuracy is estimated based on
knowledge of the instrument accuracy, measurement method, and operational environment. The
ValueAccuracy, which is also called the uncertainty of the measurement, compounds the estimates of
both bias and precision errors. Bias errors are generally fixed or systematic and cannot be determined
statistically, while precision errors are random, being generated by the variability in the measurement
system and operational environment. Figure 4 illustrates the effects of these errors on a sample of
measurements. Bias errors are usually estimated through specially designed experiments (calibrations).
The precision errors are determined using statistical analysis by quantifying the measurement scatter,
which is proportional to the standard deviation of the sample of repeated measurements. The total error is
obtained by the root-sum-square of the estimates for bias and precision errors involved in the
measurement. Figure 5 gives another illustration of the ValueAccuracy concept based on the analogy of a
target, where the bull's eye at the center represents the true value.
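Stated as a formula, with b denoting the estimated bias error and p the estimated precision error (symbol names chosen here for illustration), the root-sum-square rule above gives a total error of u = \sqrt{b^2 + p^2}; for example, bias and precision estimates of 0.3 and 0.4 (in the units of the measurement) combine to a total error of 0.5.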
ValueAccuracy is a data value level attribute because it can change with each measurement, dependent on
the instrument or measurement protocol. For example, if streamflow is measured using a V-notch weir, it
is actually the stage that is measured, with accuracy limited by the precision and bias of the depth
recording instrument. The conversion to discharge through the stage-discharge relationship results in
greater absolute error for larger discharges. Inclusion of the ValueAccuracy attribute, which will be blank
for many historic datasets because historically accuracy has not been recorded, adds to the size of data in
the ODM, but provides a way for factoring the accuracy associated with measurements into data analysis
and interpretation, a practice that should be encouraged.
[Figure: four panels comparing the true value with the average of all measurements in a sample, illustrating (a) unbiased, precise, accurate; (b) biased, precise, inaccurate; (c) unbiased, imprecise, inaccurate; and (d) biased, imprecise, inaccurate measurements.]
Figure 4. Illustration of measurement error effect (Source: AIAA, 1995).
[Figure: target analogy with three panels: accurate; low accuracy; and low accuracy, but precise.]
Figure 5. Illustration of Accuracy versus Precision (adapted from Wikipedia,
http://en.wikipedia.org/wiki/Accuracy).
In designing ODM, consideration was given to the suggestion by some reviewers to record bias and
precision separately, in addition to ValueAccuracy for each data value. This has not been done at this
release in the interest of parsimony and also because quantifying these separate components of the error is
difficult. We suggest that for most measurements there should be the presumption that they are unbiased
and that ValueAccuracy quantifies the precision and accuracy in the judgment of the investigator
responsible for collecting the data. For cases where there is specific bias and precision information to
complement the ValueAccuracy attribute, this could be recorded in the ODM as a separate variable, e.g.
discharge precision, or temperature bias. The groups and derived from features (see below) could be used
to associate these variables with their related observations. For measurements that are known to be
biased, we suggest that the bias could be quantified by other reference measurements that should also be
placed in the database and that a new set of corrected measurements that have had the bias removed
should be added to the database at a higher quality control level. These new measurements should have a
lower ValueAccuracy value to reflect the improvement in accuracy by removal of the bias. The method
and derived from information for these corrected measurements should give the bias removal method and
refer to the data used to quantify and remove the bias.
Offset
Each record in the DataValues table has two optional fields, OffsetValue and OffsetTypeID. These are
used to record the location of an observation relative to an appropriate datum, such as "depth below the
water surface" or "depth below or above the ground." The OffsetTypeID links each OffsetValue to a record
in the OffsetTypes table that gives the units and definition associated with the OffsetValue. This design only has
the capability to represent one offset for each data value. In cases (which we expect to be rare) when
there are multiple offsets (e.g. distance in from a stream bank and depth below the surface) one of the
offsets will need to be distinguished as a separate variable.
Spatial Reference and Positional Accuracy
Unambiguous specification of the location of an observation site requires that the horizontal and vertical
datum used for latitude, longitude, and elevation be specified. The SpatialReferences table is provided for
this purpose to record the name and EPSG code of each Spatial Reference System used. EPSG codes are
numeric codes associated with coordinate system definitions published by the OGP Surveying and
Positioning Committee (http://www.epsg.org/). A non-standard Spatial Reference System, such as, for
example, a local grid at an experimental watershed, may be defined in the SpatialReferences table Notes
field. The accuracy with which the location of a monitoring site is known is quantified using the
PosAccuracy_m field in the Sites table. This is a numeric value intended to specify the uncertainty (as a
standard deviation or root mean square error) in the spatial location information (latitude and longitude or
local coordinates) in meters. Using a large number for PosAccuracy_m (e.g. 2000 m) accommodates
entry of data collected for a study area where the precise location at which the observation was recorded is
not known.
Groups and Derived From Groups
The DerivedFrom and Groups tables fulfill the function of grouping data values for different purposes.
These are tables where the same identifier (DerivedFromID or GroupID) can appear multiple times in the
table, associated with different ValueIDs, thereby defining an associated group of records. In the
DerivedFrom table this is the sole purpose of the table, and each group so defined is associated with a
record in the DataValues table (through the DerivedFromID field in that table). This record would have
been derived from the data values identified by the group. The method of derivation would be given
through the methods table associated with the data value. This construct is useful, for example, to
identify the 96 15-minute unit streamflow values that go into the estimate of the mean daily streamflow.
Note that there is no limit to how many groups a data value may be associated with, and data values that
are derived from other data values may themselves belong to groups used to derive other data values (e.g.
the daily minimum flow over a month derived from daily values derived from 15 minute unit values).
Note also that a derived from group may have as few as one data value for the case where a data value is
derived from a single more primitive data value (e.g., discharge from stage). Through this construct the
ODM has the capability to store raw observation values and information derived from raw observations,
while preserving the connection of each data value to its more primitive raw measurement.
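To make the 96-value example concrete, here is a self-contained sketch using Python's built-in sqlite3 module with a pared-down two-table subset of the ODM (only the fields needed for the illustration; the data values are fabricated).

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE DataValues (ValueID INTEGER PRIMARY KEY, "
            "DataValue REAL, LocalDateTime TEXT, DerivedFromID INTEGER)")
cur.execute("CREATE TABLE DerivedFrom (DerivedFromID INTEGER, ValueID INTEGER)")

# 96 fifteen-minute unit discharge values for one day (fabricated numbers).
for i in range(96):
    cur.execute("INSERT INTO DataValues (DataValue, LocalDateTime) VALUES (?, ?)",
                (10.0 + 0.01 * i, f"2006-03-25 {i // 4:02d}:{(i % 4) * 15:02d}"))

# Store the daily mean as a new data value that points at a DerivedFrom
# group; the group lists the 96 values the mean was derived from.
daily_mean = cur.execute("SELECT AVG(DataValue) FROM DataValues").fetchone()[0]
group_id = 1
cur.execute("INSERT INTO DataValues (DataValue, LocalDateTime, DerivedFromID) "
            "VALUES (?, ?, ?)", (daily_mean, "2006-03-25", group_id))
cur.executemany("INSERT INTO DerivedFrom (DerivedFromID, ValueID) VALUES (?, ?)",
                [(group_id, vid) for vid in range(1, 97)])

# The group preserves the traceable heritage back to the raw values.
n = cur.execute("SELECT COUNT(*) FROM DerivedFrom WHERE DerivedFromID = ?",
                (group_id,)).fetchone()[0]
print(f"Daily mean {daily_mean:.2f} derived from {n} unit values")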
The GroupID relationship that appears in Table 2 is designated as one-to-many because there will be
many records in the Groups table that have the same GroupID, but different ValueIDs, that serve to define
the group. In Figure 2, the Group relationship is labeled 1..* at the DataValues table and 0..* at the
Groups table. This indicates that a group may comprise one or more data values and that a data value
may be included in 0 or more groups. Similarly, there will be many records in the DerivedFrom table that
have the same DerivedFromID, but different ValueIDs, that serve to define the group of data values from
which a data value is derived. Logically, a data value should not belong to the DerivedFrom group from
which it is itself derived. If this can be programmatically checked by the system, then this sort of circularity
error could be prevented.
The method description in the Methods table associated with a data value that has a DerivedFromID
should describe the method used for deriving the particular data value from other data values (e.g.
calculating discharge from a number of velocity measurements across a stream). The relationship
between the DataValues table DerivedFromID field and DerivedFrom table DerivedFromID field is
many-to-many (*<->*) because it can occur that the same group of data values is used to derive more
than one derived data value. In Figure 2, the AreDerivedFrom relationship between the DataValues and
DerivedFrom tables actually depicts both relationships between these tables listed in Table 2. The
AreDerivedFrom relationship is labeled 1..* at the DataValues table and 0..* at the DerivedFrom table to
indicate that a derived from group may comprise 1 or more data values and that a data value may be a
member of 0 or more derived from groups.
Dates and Times
Unambiguous interpretation of date and time information requires specification of the time zone or offset
from universal time (UTC). A UTCOffset field is included in the DataValues table to ensure that local
times recorded in the database can be referenced to standard time and to enable comparison of results
across databases that may store data values collected in different time zones (e.g. compare data values
from one hydrologic observatory to those collected at another hydrologic observatory located across the
country). A design choice here was to have UTCOffset as a record level qualifier because even though
the time zone, and hence offset, is likely the same for all measurements at a site, the offset may change
due to daylight savings. Some investigators may run data loggers on UTC time, while others may use
local time adjusting for daylight saving time. To avoid the necessity to keep track of the system used, or
impose a system that might be cumbersome and lead to errors, we decided that if the offset was always
recorded, the precise time would be unambiguous and would reduce the chance for interpretation errors.
A field DateTimeUTC is also included as a record level attribute associated with each data value. This
provides a consistent time for querying and sorting data values. There is a level of redundancy between
LocalDateTime, UTCOffset and DateTimeUTC. Only two are required to calculate the third. For
simplicity and clarity we retain all three. A specific database implementation may choose to retain only
two and calculate the third on the fly. ODM data loaders should only require two of the quantities to be
input and should then calculate the third.
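A sketch of that loader rule in Python (an illustrative helper, not part of the ODM specification), treating UTCOffset as a signed number of hours:

from datetime import datetime, timedelta

def complete_time_fields(local_dt=None, utc_offset=None, utc_dt=None):
    """Given any two of (LocalDateTime, UTCOffset, DateTimeUTC), compute the third."""
    if utc_dt is None:
        utc_dt = local_dt - timedelta(hours=utc_offset)   # Local = UTC + offset
    elif local_dt is None:
        local_dt = utc_dt + timedelta(hours=utc_offset)
    else:
        utc_offset = (local_dt - utc_dt).total_seconds() / 3600.0
    return local_dt, utc_offset, utc_dt

# Example: a reading logged at 16:19:56 local time with a UTC offset of -7 hours.
print(complete_time_fields(local_dt=datetime(2006, 3, 25, 16, 19, 56),
                           utc_offset=-7.0))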
The separation of the date and time specification into two variables, LocalDateTime and UTCOffset, in
the generic conceptual model may be handled differently within specific implementations. In one specific
implementation these may be grouped in one text field in standard (e.g. ISO 8601) format such as
YYYY-MM-DD hh:mm:ss.sss:UTCOffset (e.g. 2006-03-25 16:19:56.232:-7), while in another format the date and
time may be specified as the number of fractional days from an origin (e.g. Excel represents the above
date as the following number 38801.6805 and allows the user to specify the format for display) with
UTCOffset as a separate attribute. In general we expect specific implementations to take advantage of the
representation of date time objects provided by the implementation software, but to expose the
LocalDateTime and UTCOffset to users so that time may be unambiguously interpreted. In the
SeriesCatalog table, begin and end times for each data series are represented by the attributes
BeginDateTime, EndDateTime, BeginDateTimeUTC, and EndDateTimeUTC. The UTC offset may be
derived from the difference between the UTC and local times. Because local time may change (e.g. with
daylight savings) it is important during the derivation of the SeriesCatalog table that identification of the
first and last records be based on UTC time and that local times be read from the corresponding records,
rather than using a min or a max function on local times which can result in an error.
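A sketch of the safe derivation, again using Python's sqlite3 against a hypothetical ODM instance (the SiteID and VariableID values are illustrative): order the records on DateTimeUTC and read the local time from those same records.

import sqlite3

conn = sqlite3.connect("MyWaterData.db")  # hypothetical ODM instance
query = ("SELECT LocalDateTime, DateTimeUTC FROM DataValues "
         "WHERE SiteID = ? AND VariableID = ? ORDER BY DateTimeUTC {} LIMIT 1")

# First and last records by UTC time; local times come from the same rows,
# never from MIN/MAX of LocalDateTime (which can be wrong across a
# daylight-saving change).
begin = conn.execute(query.format("ASC"), (1, 2)).fetchone()
end = conn.execute(query.format("DESC"), (1, 2)).fetchone()
print("BeginDateTime, BeginDateTimeUTC:", begin)
print("EndDateTime,   EndDateTimeUTC:  ", end)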
Support Scale
In interpreting data values that comprise a time series it is important to know the scale information
associated with the data values. Bloschl and Sivapalan (1995) review the important issues. Any set of
data values is quantified by a scale triplet comprising support, spacing, and extent as illustrated in Figure
6.
[Figure: three panels showing measurements along a length or time axis, illustrating (a) extent, (b) spacing, and (c) support.]
Figure 6. The scale triplet of measurements (a) extent, (b) spacing, (c) support (from Bloschl, 1996).
Extent is the full range over which the measurements occur, spacing is the spacing between
measurements, and support is the averaging interval or footprint implicit in any measurement. In ODM,
extent and spacing are properties of multiple measurements and are defined by the LocalDateTime or
DateTimeUTC associated with data values. We have included a field called TimeSupport in the
Variables table to explicitly quantify support. Figure 7 shows some of the implications associated with
support, spacing, and extent in the interpretation of time series data values.
Figure 7. The effect of sampling for measurement scales not commensurate with the process scale: (a)
spacing larger than the process scale causes aliasing in the data; (b) extent smaller than the process scale
causes a trend in the data; (c) support larger than the process scale causes excessive smoothing in the data
(adapted from Bloschl, 1996).
The concepts of scale described here apply in spatial as well as time dimensions. However, TimeSupport
is only used to quantify support in the time dimension. The spatial support associated with a specific
measurement method needs to be given or implied in the methods description in the Methods table. The
next section indicates how time support should be specified for the different types of data.
Data Types
In the ODM, the following data types are defined. These are specified by the DataType field in the
Variables table.
1. Continuous data - the phenomenon, such as streamflow, Q(t), is specified at a particular instant in
time and measured with sufficient frequency (small spacing) to be interpreted as a continuous
record of the phenomenon. Time support may be specified as 0 if the measurements are
instantaneous, or given a value that represents the time averaging inherent in the measurement
method or device.
2. Sporadic data - the phenomenon is sampled at a particular instant in time but with a frequency
that is too coarse for interpreting the record as continuous. This would be the case when the
spacing is significantly larger than the support and the time scale of fluctuation of the
phenomenon, such as, for example, infrequent water quality samples. As for continuous data, time
support may be specified as 0 if the measurements are instantaneous, or given a value that
represents the time averaging inherent in the measurement method or device.
3. Cumulative data - the data value represents the cumulative value of a variable measured or calculated
up to a given instant of time, such as cumulative volume of flow or cumulative precipitation:
$V(t) = \int_0^t Q(\tau)\, d\tau$, where $\tau$ represents time in the integration over the interval $[0, t]$. To
unambiguously interpret cumulative data one needs to know the time origin. In the ODM we
adopt the convention of using a cumulative record with a value of zero to initialize or reset
cumulative data. With this convention, cumulative data should be interpreted as the accumulation
over the time interval between the date and time of the zero record and the current record at the
same site position. Site position is defined by a unique combination of SiteID, VariableID,
OffsetValue, and OffsetType. All four of these quantities comprise the unambiguous description
of the position of an observation value, and there may be multiple time series associated with
multiple observation positions (e.g. redundant rain gauges with different offsets) at a location.
The time support for a cumulative value should be specified as 0 if the measurement of the
cumulative quantity is instantaneous, or given a value that represents the time averaging inherent
in the measurement of the cumulative value at the end of the period of accumulation.
4. Incremental data - the data value represents the incremental value of a variable over a time
interval $\Delta t$, such as the incremental volume of flow or incremental precipitation:
$\Delta V(t) = \int_t^{t+\Delta t} Q(\tau)\, d\tau$. As for cumulative data, unambiguous interpretation requires knowledge of
the time increment. In the ODM we adopt the convention of using TimeSupport to specify the
interval $\Delta t$, or the time interval to the next data value at the same position if TimeSupport is 0.
This accommodates incremental-type precipitation data that is only reported when the data value
is non-zero, such as NCDC data. Such NCDC data is irregular, with the interpretation that
precipitation is 0 if not reported, unless qualifying comments designate otherwise. See example
E.4 below for an illustration of how NCDC precipitation data is accommodated in the ODM.
5. Average data - the data value represents the average over a time interval, such as daily mean
discharge or daily mean temperature: $\bar{Q}(t) = \Delta V(t) / \Delta t$. The averaging interval is quantified by
TimeSupport in the case of regular data (as quantified by the IsRegular field) and by the time
interval from the previous data value at the same position for irregular data.
6. Maximum data - the data value is the maximum value occurring at some time during a time
interval, such as annual maximum discharge or a daily maximum air temperature. Again,
unambiguous interpretation requires knowledge of the time interval. The ODM adopts the
convention that the time interval is the TimeSupport for regular data and the time interval from
the previous data value at the same position for irregular data.
7. Minimum data - the data value is the minimum value occurring at some time during a time
interval, such as 7-day low flow for a year, or the daily minimum temperature. The time interval
is defined similarly to Maximum data.
8. Constant over interval data - the data value is a quantity that can be interpreted as constant over
the time interval to the next measurement.
9. Categorical data - the data value is a categorical rather than continuous valued quantity.
Mapping from data values to categories is through the Categories table.
We anticipate that additional data types such as median, standard deviation, variance, and others may
need to be added as users work with ODM.
Data types 4 to 8 above apply to data values that occur over an interval of time. The date and time
reported and entered into the ODM database for each interval data value is the beginning time of the
observation interval. This convention was adopted to be consistent with the way dates and times are
represented in most common database management systems. It should be noted that using the
beginning of the interval is not consistent with the time at which a data logger would log an observation
value. Care should be exercised when adding data to the ODM to ensure that the beginning-of-interval
convention is followed.
A considerable portion of hydrologic observations data is in the form of time series. This was why the
initial model was based on the Arc Hydro Time Series Data Model. The ODM design has not specifically
highlighted time series capabilities; nevertheless, the data model has inherited the key components from
the Arc Hydro Time Series Data Model to give it time series capability. In particular one variable
DataType is "Continuous," which is designed to indicate that the data values are collected with sufficient
frequency as to be interpreted as a smooth time series. The IsRegular field also facilitates time series
analysis because certain time series operations (e.g., Fourier analysis) are suited only to regularly
sampled data. At first glance it may appear that there is redundancy between the IsRegular field and the
DataType "Continuous," but we chose to keep these separate because there are regularly sampled
quantities for which it is not reasonable to interpret the data values as "Continuous." For example,
monthly grab samples of water quality are not continuous, but are better categorized as having DataType
"Sporadic." Note that ODM does not explicitly store the time interval between measurements, nor does it
indicate where a continuous series has data gaps. Both of these are required for time series analysis, but
are inherently not properties of single measurements. The time interval is the time difference between
sequential regular measurements, something that can be easily computed from date and time values by
analysis tools. The inference of measurement gaps (and what to do about them) from date and time
values we also regard as analysis functionality left for a Hydrologic Analysis System to handle.
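For example, an analysis tool could recover the spacing between sequential measurements, and hence
locate gaps, with a windowed query such as the following sketch. This assumes a platform that supports
the SQL LAG function (newer than the SQL Server 2005 schema mentioned later in this document), and
the site and variable identifiers shown are illustrative.

    -- Time difference between sequential measurements of a single series;
    -- rows where the difference exceeds the nominal spacing indicate gaps.
    SELECT ValueID, DateTimeUTC,
           DATEDIFF(MINUTE,
                    LAG(DateTimeUTC) OVER (ORDER BY DateTimeUTC),
                    DateTimeUTC) AS MinutesSincePrevious
    FROM DataValues
    WHERE SiteID = 1 AND VariableID = 2;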
In ODM, categorical or ordinal variables are stored in the same table as continuous valued 'real' variables
through a numerical encoding of the categorical data value as a 'real' data value. The Categories table
then associates, for each variable, a data value with an associated category description. This is a
somewhat cumbersome construct because real valued quantities are being used as database keys. We do
not see this as a significant shortcoming though, because typically, in our judgment, only a small fraction
of hydrologic observations will be categorical. The Categories table stores the categories associated with
categorical data values. If a Variable has a DataType of "Categorical," then its VariableID must
match one or more VariableIDs in the Categories table that define the mapping between DataValues and
categories. The CategoryDescription field in the Categories table defines the category.
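A minimal query sketch of this mapping, using the table and field names defined in Appendix A:

    -- Resolve numerically encoded categorical values to their categories.
    -- Note that the real-valued DataValue participates in the join,
    -- which is the construct discussed above.
    SELECT dv.ValueID, dv.DataValue, c.CategoryDescription
    FROM DataValues dv
    JOIN Categories c
      ON c.VariableID = dv.VariableID
     AND c.DataValue = dv.DataValue;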
At first glance there may appear to be redundancy between the information in the Samples table and the
Methods table. However, the Samples table is intended to be used only where data values are derived
from a physical sample that is later analyzed in a laboratory (e.g., a water chemistry sample or biological
sample). The SampleID that links into the Samples table provides tracking of the specific physical
sample used to derive each measurement and, by reference to information in the LabMethods table, the
laboratory methods and protocols followed. The Methods table refers to the method of field data
collection, which may specify "how" a physical observation was made or collected (e.g., from an
automated sampler or collected manually), but is also used to specify the measurement method associated
with an in-situ measurement instrument such as a weir, turbidity sensor, dissolved oxygen sensor,
humidity sensor, or temperature sensor.
Each record in the DataValues table has an attribute called QualifierID that references the Qualifiers
table. Each QualifierID in the Qualifiers table has attributes QualifierCode and QualifierDescription that
provide qualifying information noting anything unusual or problematic about individual observations,
for example "holding time for analysis exceeded" or "incomplete or inexact daily total." Specification of
a QualifierID in the DataValues table is optional, with the inference that if a QualifierID is not specified
then the corresponding data value is not qualified.
Each data value in the DataValues table has an attribute called QualityControlLevelID that references the
QualityControlLevels table and records the level of quality control processing that the data value has
been subjected to, at the level of its data series. Quality control level is one of the attributes (together
with site, variable, method, and source) used to uniquely identify data series. Each quality control level
is uniquely identified by its QualityControlLevelID; however, each level also has a text
QualityControlLevelCode that, along with a Definition and Explanation, provides a more descriptive
encoding of the quality control level. The default quality control level system used by ODM applies
integer values between 0 and 4 (converted to text strings) as the QualityControlLevelCodes. Other
custom systems for QualityControlLevelCodes can be used (e.g., 0.1, 0.2 to represent raw data
progressing through a quality control work sequence, or text strings such as "Raw" or "Processed"). The
following 0-4 QualityControlLevelCode definitions are adapted from those used by other similar
systems, such as NASA, EarthScope, and AmeriFlux (e.g.
http://ilrs.gsfc.nasa.gov/reports/ilrs_reports/9809_attach7a.html and
http://public.ornl.gov/ameriflux/available.shtml, accessed 3/6/2007) and are suggested so that CUAHSI
ODM is consistent with the practice of other data systems:
- QualityControlLevelCode = "0" - Raw Data
Raw data is defined as unprocessed data and data products that have not undergone quality control.
Depending on the data type and data transmission system, raw data may be available within seconds
or minutes after real-time. Examples include real-time precipitation, streamflow, and water quality
measurements.
- QualityControlLevelCode = "1" - Quality Controlled Data
Quality controlled data have passed quality assurance procedures such as routine estimation of timing
and sensor calibration or visual inspection and removal of obvious errors. An example is USGS
published streamflow records following parsing through USGS quality control procedures.
- QualityControlLevelCode = "2" - Derived Products
Derived products require scientific and technical interpretation and include multiple-sensor data. An
example might be basin average precipitation derived from rain gages using an interpolation
procedure.
- QualityControlLevelCode = "3" - Interpreted Products
These products require researcher (PI) driven analysis and interpretation, or model-based interpretation
using other data and/or strong prior assumptions. An example is basin average precipitation derived
from the combination of rain gages and radar return data.
QualityControlLevelCode = "4" -Knowledge Products
These products require researcher (PI) driven scientific interpretation and multidisciplinary data
integration and include model-based interpretation using other data and/or strong prior assumptions.
An example is percentages of old or new water in a hydrograph inferred from an isotope analysis.
These definitions for quality control level are stored in the QualityControlLevels table. These definitions
are recommended for use, but users can define their own quality control level system. The
QualityControlLevels table is not a controlled vocabulary, but specification of a quality control level for
each data value is required. Appendix B of this document provides a discussion of how to handle data
versioning in terms of quality control levels (using the levels defined above), data series editing, and data
series creation.
ODM has been designed to contain all the core elements of the CUAHSI HIS metadata system
(http://www.cuahsi.org/his/metadata.html) required for compliance with evolving standards such as the
draft ISO 19115. In its design, the ODM embodies much record, variable, and site level metadata.
Dataset and project level metadata required by these standards, such as TopicCategory, Title, and
Abstract, are included in a table called ISOMetadata linked to each data source.
The Methods, Sources, LabMethods, and ISOMetadata tables contain fields that can be used to store links
to source or reference information. At the general conceptual level of the ODM we do not specify how,
or in what form these links to references or sources should be implemented. Options include using URLs
or storing entire documents in the database. If external URLs are used it will be important as the database
grows and is used over time to ensure that links or URLs included are stable. An alternative approach to
external links is to exploit the capability of modern databases to store entire digital documents, such as an
html or xml page, PDF document, or raw data file, within a field in the database. The capability therefore
exists to instead have these links refer to a separate table that would actually contain this metadata
information, instead of housing it in a separate digital library. There is some merit in this because then
any data exported in ODM format could carry with it the associated metadata required to completely
define it, as well as the raw data from which it is derived. However, this has the disadvantage of
increasing (perhaps substantially) the size of the database file containing the data and being distributed to
users.
The following tables in the ODM are tables where controlled vocabularies for the fields are required to
maintain consistency and avoid the use of synonyms that can lead to ambiguity:
• CensorCodeCV
• DataTypeCV
• GeneralCategoryCV
• SampleMediumCV
• SampleTypeCV
• SpatialReferences
• SpeciationCV
• TopicCategoryCV
• Units
• ValueTypeCV
• VariableNameCV
• VerticalDatumCV
The initial contents of these controlled vocabularies are specified in the Microsoft SQL Server 2005 blank
schema for the ODM. However, the ODM controlled vocabularies are dynamic. A central repository of
current ODM controlled vocabulary terms is maintained on the ODM website at
http://water.usu.edu/cuahsi/odm/, together with the most recent version of the ODM SQL Server 2005
blank schema, this design specifications document, and other tools for working with ODM. Users can
submit new terms for the controlled vocabularies and can request changes to existing terms using
functionality available on the ODM website (http://water.usu.edu/cuahsi/odm/). Functionality for
updating local controlled vocabulary tables with new terms from the central ODM controlled vocabulary
repository is provided in the ODM Tools software application, which is also available from the ODM
website. The CUAHSI HIS team welcomes input on the controlled vocabularies.
Examples
The following examples show the capability of ODM to store different types of point observations. It is
not possible in examples such as these to present all of the field values for all the tables. Because of this,
the examples present selected fields and tables chosen to illustrate key capabilities of the data model.
Refer to Appendix A for the complete definition of table and field contents.
Figure E.1. Excerpts from tables illustrating the population of ODM with streamflow gage height (stage)
and discharge data.
Streamflow - Daily Average Discharge
Daily average streamflow is reported as an average of continuous 15 minute interval data values. Figure
E.2 shows excerpts from tables illustrating the population of ODM with both the continuous discharge
values and derived daily averages. The record giving the single daily average discharge with a value of
722 ft3/s in the DataValues table has a DerivedFromID of 100. This refers to multiple records in the
DerivedFrom table, with associated ValueIDs 97, 98, 99, ..., 113 shown. These refer to the specific 15
minute discharge values in the DataValues table used to derive the average daily discharge. VariableID
in the DataValues table identifies the appropriate record in the Variables table specifying that this is a
daily average discharge with units of ft3/s, with UnitsID referencing into the Units table. MethodID in
the DataValues table identifies the appropriate record in the Methods table specifying that the method
used to obtain this data value was daily averaging.
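This linkage can be traversed with a query such as the following sketch, using the DerivedFromID of 100
from this example:

    -- Retrieve the 15 minute discharge values from which the daily
    -- average value (DerivedFromID = 100) was derived.
    SELECT dv.ValueID, dv.DataValue, dv.LocalDateTime
    FROM DerivedFrom df
    JOIN DataValues dv ON dv.ValueID = df.ValueID
    WHERE df.DerivedFromID = 100;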
Figure E.2. Excerpts from tables illustrating the population of ODM with continuous discharge values
and derived daily average discharge.
Figure E.3. Excerpts from tables illustrating the population of ODM with water chemistry data.
NCDC Precipitation Data
Figure E.4 illustrates the representation of NCDC 15 minute precipitation data by ODM. The data
includes 15 minute incremental data values as well as daily totals. Separate records in the Variables table
are used for the 15 minute or daily total values. These data are reported at irregular intervals and only
logged for time periods for which precipitation is non-zero. This is accommodated by setting the
IsRegular attribute associated with the variable to "False" and specifying the TimeSupport value as 15 or
24 and the TimeUnits as "Minutes" or "Hours". The DataType of "Incremental" is used to indicate that
these are incremental data values defined over the TimeSupport interval. The convention for incremental
data (see above) is that when the time support is specified, it specifies the increment for irregular
incremental data. When time support is specified as 0 it means the increment is from the previous data
value at the same site position. Data qualifiers indicate periods where the data is missing. The method
associated with each precipitation variable documents the convention that zero precipitation periods are
not logged in this data acquired from NCDC. A data qualifier is also used to flag days where the
precipitation total is incomplete due to the record being missing during part of the day.
Figure E.4. Excerpts from tables illustrating the population of the ODM with NCDC Precipitation Data.
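Under this convention a daily total can also be reconstructed by summing the 15 minute increments within
each day, with unlogged periods contributing zero. The following is a sketch only: the VariableID is
illustrative, the date conversion assumes a platform newer than SQL Server 2005, and days flagged by the
missing-period qualifiers would still need separate handling.

    -- Sum irregular 15 minute incremental precipitation into daily totals;
    -- periods with no record contribute zero under the NCDC convention.
    SELECT CAST(LocalDateTime AS DATE) AS PrecipDate,
           SUM(DataValue) AS DailyTotal
    FROM DataValues
    WHERE VariableID = 7             -- illustrative: 15 minute precipitation
      AND DataValue <> -9999         -- exclude NoDataValue placeholders
    GROUP BY CAST(LocalDateTime AS DATE);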
Groundwater Level Data
The following is an example of how groundwater level data can be stored in ODM. In this example, the
data values are the water table level relative to the ground surface reported as negative values. This
example shows multiple data values of a single variable at a single site made by a single source that have
been quality controlled as indicated by the QualityControlLevellD field in the QualityControlLevels
table. The Site ID field in the DataValues table indicates the site in the Sites table that gives the location
information about the monitoring site. In this case, the elevation is with respect to the NGVD29 datum as
indicated in the VerticalDatum field, and latitude and longitude are with respect to the NAD27 datum as
indicated in the SpatialReferences table. The VariablelD field in the DataValues table references the
appropriate record in the Variables table indicating information about the variable. The SourcelD field in
the DataValues table references the appropriate record in the Sources table giving information about the
source of the data.
Figure E.5. Excerpts from tables illustrating the population of the ODM with irregularly sampled
groundwater level data.
Acknowledgements
This material is based upon work supported by the National Science Foundation under Grant
Nos. EAR 0412975 and 0413265. Any opinions, findings and conclusions or recommendations
expressed in this material are those of the authors and do not necessarily reflect the views of the
National Science Foundation (NSF).
References
AIAA, (1995), Assessment of Wind Tunnel Data Uncertainty, American Institute of Aeronautics and
Astronautics, AIAA S-071-1995.
Bloschl, G., (1996), Scale and Scaling in Hydrology, Habilitationsschrift, Wiener Mitteilungen Wasser
Abwasser Gewasser, Wien, 346 p.
Bloschl, G. and M. Sivapalan, (1995), "Scale Issues in Hydrological Modelling: A Review," Hydrological
Processes, 9: 251-290.
Horsburgh, J. S., D. G. Tarboton and D. R. Maidment, (2005), "A Community Data Model for
Hydrologic Observations, Chapter 6," in Hydrologic Information System Status Report, Version
1, Edited by D. R. Maidment, p. 102-135, http://www.cuahsi.org/his/docs/HISStatusSept15.pdf
Horsburgh, J. S., D. G. Tarboton, D. R. Maidment and I. Zaslavsky, (2008), "A Relational Model for
Environmental and Water Resources Data," Water Resources Research, Vol. 44, W05406,
doi:10.1029/2007WR006392.
Maidment, D. R., ed., (2002), Arc Hydro GIS for Water Resources, ESRI Press, Redlands, CA, 203 p.
Maidment, D. R., (2005), "A Data Model for Hydrologic Observations," paper prepared for presentation
at the CUAHSI Hydrologic Information Systems Symposium, University of Texas at Austin,
March 7, 2005.
Tarboton, D. G., (2005), "Review of Proposed CUAHSI Hydrologic Information System Hydrologic
Observations Data Model," Utah State University, May 5, 2005.
Appendix A. Observations Data Model Table and Field Structure
The following is a description of the tables in the observations data model, a listing of the fields contained
in each table, a description of the data contained in each field and its data type, examples of the
information to be stored in each field where appropriate, specific constraints imposed on each field, and
discussion on how each field should be populated. Values in the example column should not be
considered to be inclusive of all potential values, especially in the case of fields that require a controlled
vocabulary. We anticipate that these controlled vocabularies will need to be extended and adjusted.
Tables appear in alphabetical order.
Each table below includes a "Constraint" column. The value in this column designates each field in the
table as one of the following:
Mandatory (M) - A value in this field is mandatory and cannot be NULL.
Optional (O) - A value in this field is optional and can be NULL.
Programmatically derived (P) - Inherits from the source field. The value in this field should be
automatically populated as the result of a query and is not required to be input by the user.
Additional constraints are documented where appropriate in the Constraint column. In addition, where
appropriate, each table contains a "Default Value" column. The value in this column is the default value
for the associated field. The default value specifies the convention that should be followed when a value
for the field is not specified. Below each table is a discussion of the rules and best practices that should
be used in populating each table within ODM.
Table: Categories
The Categories table defines the categories for categorical variables. Records are required for variables
where DataType is specified as "Categorical." Multiple entries for each VariableID, with different
DataValues, provide the mapping from DataValue to CategoryDescription.
Field Name | Data Type | Description | Examples | Constraint
VariableID | Integer | Integer identifier that references the Variables record of a categorical variable. | 45 | M, foreign key
DataValue | Real | Numeric value that defines the category. | 1.0 | M
CategoryDescription | Text (unlimited) | Definition of the categorical variable value. | "Cloudy" | M
The following rules and best practices should be used in populating this table:
1. Although all of the fields in this table are mandatory, they need only be populated if categorical
data are entered into the database. If there are no categorical data in the DataValues table, this
table will be empty.
2. This table should be populated before categorical data values are added to the DataValues table.
Table: CensorCodeCV
The CensorCodeCV table contains the controlled vocabulary for censor codes. Only values from the
Term field in this table can be used to populate the CensorCode field of the DataValues table.
Field Name | Data Type | Description | Examples | Constraint
Term | Text (255) | Controlled vocabulary for CensorCode. | "lt", "gt", "nc" | M, unique, primary key; cannot contain tab, line feed, or carriage return characters
Definition | Text (unlimited) | Definition of the CensorCode controlled vocabulary term. The definition is optional if the term is self explanatory. | "less than", "greater than", "not censored" | O
This table is pre-populated within the ODM. Changes to this controlled vocabulary can be requested at
http://water.usu.edu/cuahsi/odm/.
Table: DataTypeCV
The DataTypeCV table contains the controlled vocabulary for data types. Only values from the Term
field in this table can be used to populate the DataType field in the Variables table.
Field Name | Data Type | Description | Examples | Constraint
Term | Text (255) | Controlled vocabulary for DataType. | "Continuous" | M, unique, primary key; cannot contain tab, line feed, or carriage return characters
Definition | Text (unlimited) | Definition of the DataType controlled vocabulary term. The definition is optional if the term is self explanatory. | "A quantity specified at a particular instant in time measured with sufficient frequency (small spacing) to be interpreted as a continuous record of the phenomenon." | O
This table is pre-populated within the ODM. Changes to this controlled vocabulary can be requested at
http://water.usu.edu/cuahsi/odm/.
Table: DataValues
The DataValues table contains the actual data values.
Field Name | Data Type | Description | Example | Constraint | Default Value
ValueID | Integer Identity | Unique integer identifier for each data value. | 43 | M, unique, primary key |
DataValue | Real | The numeric value of the observation. For categorical variables, a number is stored here; the Variables table has DataType as Categorical and the Categories table maps from the DataValue onto a CategoryDescription. | 34.5 | M |
ValueAccuracy | Real | Numeric value that describes the measurement accuracy of the data value. If not given, it is interpreted as unknown. | 4 | O | NULL
LocalDateTime | Date/Time | Local date and time at which the data value was observed. Represented in an implementation specific format. | 9/4/2003 7:00:00 AM | M |
UTCOffset | Real | Offset in hours from UTC time of the corresponding LocalDateTime value. | -7 | M |
DateTimeUTC | Date/Time | Universal (UTC) date and time at which the data value was observed. Represented in an implementation specific format. | 9/4/2003 2:00:00 PM | M |
SiteID | Integer | Integer identifier that references the site at which the observation was measured. This links data values to their locations in the Sites table. | 3 | M, foreign key |
VariableID | Integer | Integer identifier that references the variable that was measured. This links data values to their variable in the Variables table. | 5 | M, foreign key |
OffsetValue | Real | Distance from a datum or control point to the point at which a data value was observed. If not given, the OffsetValue is inferred to be 0, or not relevant/necessary. | 2.1 | O | NULL = no offset
OffsetTypeID | Integer | Integer identifier that references the measurement offset type in the OffsetTypes table. | 3 | O, foreign key | NULL = no offset
CensorCode | Text (50) | Text indication of whether the data value is censored, from the CensorCodeCV controlled vocabulary. | "nc" | M, foreign key | "nc" = not censored
QualifierID | Integer | Integer identifier that references the Qualifiers table. If NULL, the data value is inferred to not be qualified. | 4 | O, foreign key | NULL
MethodID | Integer | Integer identifier that references the method used to generate the data value in the Methods table. | 3 | M, foreign key | 0 = no method specified
SourceID | Integer | Integer identifier that references the record in the Sources table giving the source of the data value. | 5 | M, foreign key |
SampleID | Integer | Integer identifier that references into the Samples table. This is required only if the data value resulted from a physical sample processed in a lab. | 7 | O, foreign key | NULL
DerivedFromID | Integer | Integer identifier for the group of data values from which the current data value is derived. This refers to a group of records in the DerivedFrom table. If NULL, the data value is inferred to not be derived from another data value. | 5 | O | NULL
QualityControlLevelID | Integer | Integer identifier giving the level of quality control that the value has been subjected to. This references the QualityControlLevels table. | 1 | M, foreign key | -9999 = unknown
The following rules and best practices should be used in populating this table:
1. ValueID is the primary key, is mandatory, and cannot be NULL. This field should be
implemented as an autonumber/identity field. When data values are added to this table, a unique
integer ValueID should be assigned to each data value by the database software such that the
primary key constraint is not violated.
2. Each record in this table must be unique. This is enforced by a unique constraint across all of the
fields in this table (excluding ValueID) so that duplicate records are avoided.
3. The LocalDateTime, UTCOffset, and DateTimeUTC must all be populated. Care must be taken
to ensure that the correct UTCOffset is used, especially in areas that observe daylight saving time.
If LocalDateTime and DateTimeUTC are given, the UTCOffset can be calculated as the
difference between the two dates. If LocalDateTime and UTCOffset are given, DateTimeUTC
can be calculated.
4. SiteID must correspond to a valid SiteID from the Sites table. When adding data for a new site to
the ODM, the Sites table should be populated prior to adding data values to the DataValues table.
5. VariableID must correspond to a valid VariableID from the Variables table. When adding data
for a new variable to the ODM, the Variables table should be populated prior to adding data
values for the new variable to the DataValues table.
6. OffsetValue and OffsetTypeID are optional because not all data values have an offset. Where no
offset is used, both of these fields should be set to NULL, indicating that the data values do not
have an offset. Where an OffsetValue is specified, an OffsetTypeID must also be specified, and it
must refer to a valid OffsetTypeID in the OffsetTypes table. The OffsetTypes table should be
populated prior to adding data values with a particular OffsetTypeID to the DataValues table.
7. CensorCode is mandatory and cannot be NULL. A default value of "nc" is used for this field.
Only terms from the CensorCodeCV table should be used to populate this field.
8. The QualifierID field is optional because not all data values have qualifiers. Where no qualifier
applies, this field should be set to NULL. When a QualifierID is specified in this field it must
refer to a valid QualifierID in the Qualifiers table. The Qualifiers table should be populated prior
to adding data values with a particular QualifierID to the DataValues table.
9. MethodID must correspond to a valid MethodID from the Methods table and cannot be NULL. A
default value of 0 is used in the case where no method is specified or the method used to create
the observation is unknown. The Methods table should be populated prior to adding data values
with a particular MethodID to the DataValues table.
10. SourceID must correspond to a valid SourceID from the Sources table and cannot be NULL. The
Sources table should be populated prior to adding data values with a particular SourceID to the
DataValues table.
11. SampleID is optional and should only be populated if the data value was generated from a
physical sample that was sent to a laboratory for analysis. The SampleID must correspond to a
valid SampleID in the Samples table, and the Samples table should be populated prior to adding
data values with a particular SampleID to the DataValues table.
12. DerivedFromID is optional and should only be populated if the data value was derived from other
data values that are also stored in the ODM database.
13. QualityControlLevelID is mandatory, cannot be NULL, and must correspond to a valid
QualityControlLevelID in the QualityControlLevels table. A default value of -9999 is used for
this field in the event that the QualityControlLevelID is unknown. The QualityControlLevels
table should be populated prior to adding data values with a particular QualityControlLevelID to
the DataValues table.
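An insert that follows these rules might look like the following sketch. The identifier values are drawn
from the examples in this appendix and are hypothetical; each foreign key must already exist in its parent
table.

    -- ValueID is an identity field and is assigned by the database.
    INSERT INTO DataValues
        (DataValue, LocalDateTime, UTCOffset, DateTimeUTC,
         SiteID, VariableID, CensorCode, MethodID, SourceID,
         QualityControlLevelID)
    VALUES
        (34.5, '2003-09-04 07:00:00', -7, '2003-09-04 14:00:00',
         3, 5, 'nc', 3, 5, 1);
    -- OffsetValue, OffsetTypeID, QualifierID, SampleID, and DerivedFromID
    -- are omitted and default to NULL: no offset, unqualified, no physical
    -- sample, and not derived from other values.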
Table: DerivedFrom
The DerivedFrom table contains the linkage between derived data values and the data values that they
were derived from.
Field Name | Data Type | Description | Examples | Constraint
DerivedFromID | Integer | Integer identifying the group of data values from which a quantity is derived. | 5 | M
ValueID | Integer | Integer identifier referencing the data values that comprise a group from which a quantity is derived. This corresponds to ValueID in the DataValues table. | 1, 2, 3, 4, 5 | M
The following rules and best practices should be used in populating this table:
1. Although all of the fields in this table are mandatory, they need only be populated if derived data
values and the data values that they were derived from are entered into the database. If there are
no derived data in the DataValues table, this table will be empty.
Table: GeneralCategoryCV
The GeneralCategoryCV table contains the controlled vocabulary for the general categories associated
with variables. The GeneralCategory field in the Variables table can only be populated with values from
the Term field of this controlled vocabulary table.
Field Name | Data Type | Description | Examples | Constraint
Term | Text (255) | Controlled vocabulary for GeneralCategory. | "Hydrology" | M, unique, primary key; cannot contain tab, line feed, or carriage return characters
Definition | Text (unlimited) | Definition of the GeneralCategory controlled vocabulary term. The definition is optional if the term is self explanatory. | "Data associated with hydrologic variables or processes." | O
This table is pre-populated within the ODM. Changes to this controlled vocabulary can be requested at
http://water.usu.edu/cuahsi/odm/.
Table: GroupDescriptions
The GroupDescriptions table lists the descriptions for each of the groups of data values that have been
formed.
Field Name | Data Type | Description | Example | Constraint
GroupID | Integer Identity | Unique integer identifier for each group of data values that has been formed. This is also referenced by GroupID in the Groups table. | 4 | M, unique, primary key
GroupDescription | Text (unlimited) | Text description of the group. | "Echo Reservoir Profile 7/7/2005" | O
The following rules and best practices should be used in populating this table:
1. This table will only be populated if groups of data values have been created in the ODM database.
2. The GroupID field is the primary key, must be a unique integer, and cannot be NULL. It should
be implemented as an auto number/identity field.
3. The GroupDescription can be any text string that describes the group of observations.
Table: Groups
The Groups table lists the groups of data values that have been created and the data values that are within
each group.
Field Name | Data Type | Description | Example | Constraint
GroupID | Integer | Integer ID for each group of data values that has been formed. | 4 | M, foreign key
ValueID | Integer | Integer identifier for each data value that belongs to a group. This corresponds to ValueID in the DataValues table. | 2, 3, 4 | M, foreign key
The following rules and best practices should be used in populating this table:
1. This table will only be populated if groups of data values have been created in the ODM database.
2. The GroupID field must reference a valid GroupID from the GroupDescriptions table, and the
GroupDescriptions table should be populated for a group prior to populating the Groups table.
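For example, the members of a group can be listed together with the group description via a join such as
the following sketch (a GroupID of 1 corresponds to the reservoir profile group in example E.3):

    -- List all data values belonging to one group with its description.
    SELECT gd.GroupDescription, dv.ValueID, dv.DataValue
    FROM Groups g
    JOIN GroupDescriptions gd ON gd.GroupID = g.GroupID
    JOIN DataValues dv ON dv.ValueID = g.ValueID
    WHERE g.GroupID = 1;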
Table: ISOMetadata
The ISOMetadata table contains dataset and project level metadata required by the CUAHSI HIS
metadata system (http://www.cuahsi.org/his/documentation.html) for compliance with standards such as
the draft ISO 19115 or ISO 8601. The mandatory fields in this table must be populated to provide a
complete set of ISO compliant metadata in the database.
Field Name | Data Type | Description | Example | Constraint | Default Value
MetadataID | Integer Identity | Unique integer ID for each metadata record. | 4 | M, unique, primary key |
TopicCategory | Text (255) | Topic category keyword that gives the broad ISO 19115 metadata topic category for data from this source. The controlled vocabulary of topic category keywords is given in the TopicCategoryCV table. | "inlandWaters" | M, foreign key | "Unknown"
Title | Text (255) | Title of data from a specific data source. | | M; cannot contain tab, line feed, or carriage return characters | "Unknown"
Abstract | Text (unlimited) | Abstract of data from a specific data source. | | M | "Unknown"
ProfileVersion | Text (255) | Name of the metadata profile used by the data source. | "ISO8601" | M; cannot contain tab, line feed, or carriage return characters | "Unknown"
MetadataLink | Text (500) | Link to additional metadata reference material. | | O | NULL
The following rules and best practices should be used in populating this table:
1. The MetadatalD field is the primary key, must be a unique integer, and cannot be NULL. This
field should be implemented as an auto number/identity field.
2. All of the fields in this table are mandatory and cannot be NULL except for the MetadataLink
field.
3. The TopicCategory field should only be populated with terms from the TopicCategoryCV table.
The default controlled vocabulary term is "Unknown."
4. The Title field should be populated with a brief text description of what the referenced data
represent. This field can be populated with "Unknown" if there is no title for the data.
5. The Abstract field should be populated with a more complete text description of the data that the
metadata record references. This field can be populated with "Unknown" if there is no abstract
for the data.
6. The ProfileVersion field should be populated with the version of the ISO metadata profile that is
being used. This field can be populated with "Unknown" if there is no profile version for the
data.
7. One record with MetadataID = 0 should exist in this table, with TopicCategory, Title, Abstract,
and ProfileVersion = "Unknown" and MetadataLink = NULL. This record should be the default
value for sources with unknown/unspecified metadata.
Table: LabMethods
The LabMethods table contains descriptions of the laboratory methods used to analyze physical samples
for specific constituents.
Field Name | Data Type | Description | Example | Constraint | Default Value
LabMethodID | Integer Identity | Unique integer identifier for each laboratory method. This is the key used by the Samples table to reference a laboratory method. | 6 | M, unique, primary key |
LabName | Text (255) | Name of the laboratory responsible for processing the sample. | "USGS Atlanta Field Office" | M; cannot contain tab, line feed, or carriage return characters | "Unknown"
LabOrganization | Text (255) | Organization responsible for sample analysis. | "USGS" | M; cannot contain tab, line feed, or carriage return characters | "Unknown"
LabMethodName | Text (255) | Name of the method and protocols used for sample analysis. | "USEPA-365.1" | M; cannot contain tab, line feed, or carriage return characters | "Unknown"
LabMethodDescription | Text (unlimited) | Description of the method and protocols used for sample analysis. | "Processed through Model *** Mass Spectrometer" | M | "Unknown"
LabMethodLink | Text (500) | Link to additional reference material on the analysis method. | | O | NULL
The following rules and best practices should be used when populating this table:
1. The LabMethodID field is the primary key, must be a unique integer, and cannot be NULL. It
should be implemented as an auto number/identity field.
2. All of the fields in this table are required and cannot be null except for the LabMethodLink.
3. The default value for all of the required fields except for the LabMethodID is "Unknown."
4. A single record should exist in this table where the LabMethodID = 0 and the LabName,
LabOrganization, LabMethodName, and LabMethodDescription fields are equal to "Unknown"
and the LabMethodLink = NULL. This record should be used to identify samples in the Samples
table for which nothing is known about the laboratory method used to analyze the sample.
Table: Methods
The Methods table lists the methods used to collect the data and any additional information about the
method.
Field Name | Data Type | Description | Example | Constraint | Default Value
MethodID | Integer Identity | Unique integer ID for each method. | 5 | M, unique, primary key |
MethodDescription | Text (unlimited) | Text description of each method. | "Specific conductance measured using a Hydrolab" or "Streamflow measured using a V notch weir with dimensions xxx" | M |
MethodLink | Text (500) | Link to additional reference material on the method. | | O | NULL
The following rules and best practices should be used when populating this table:
1. The MethodID field is the primary key, must be a unique integer, and cannot be NULL.
2. There is no default value for the MethodDescription field in this table. Rather, this table should
contain a record with MethodID = 0, MethodDescription = "Unknown", and MethodLink =
NULL. A MethodID of 0 should be used as the MethodID for any data values for which the
method used to create the value is unknown (i.e., the default value for the MethodID field in the
DataValues table is 0).
3. Methods should describe the manner in which the observation was collected (i.e., collected
manually, or collected using an automated sampler) or measured (i.e., measured using a
temperature sensor or measured using a turbidity sensor). Details about the specific sensor
models and manufacturers can be included in the MethodDescription.
Table: ODM Version
The ODM Version table has a single record that records the version of the ODM database. This table
must contain a valid ODM version number. This table will be pre-populated and should not be edited.
Field Name | Data Type | Description | Example | Constraint
VersionNumber | Text (50) | String that lists the version of the ODM database. | "1.1" | M; cannot contain tab, line feed, or carriage return characters
Table: OffsetTypes
The OffsetTypes table lists full descriptive information for each of the measurement offsets.
Field Name | Data Type | Description | Example | Constraint
OffsetTypeID | Integer Identity | Unique integer identifier that identifies the type of measurement offset. | 2 | M, unique, primary key
OffsetUnitsID | Integer | Integer identifier that references the record in the Units table giving the units of the OffsetValue. | 1 | M, foreign key
OffsetDescription | Text (unlimited) | Full text description of the offset type. | "Below water surface", "Above Ground Level" | M
The following rules and best practices should be followed when populating this table:
1. Although all three fields in this table are mandatory, this table will only be populated if data
values measured at an offset have been entered into the ODM database.
2. The OffsetTypelD field is the primary key, must be a unique integer, and cannot be NULL. This
field should be implemented as an auto number/identity field.
3. The OffsetUnitsID field should reference a valid ID from the UnitsID field in the Units table.
Because the Units table is a controlled vocabulary, only units that already exist in the Units table
can be used as the units of the offset.
4. The OffsetDescription field should be filled in with a complete text description of the offset that
provides enough information to interpret the type of offset being used. For example, "Distance
from stream bank" is ambiguous because it is not known which bank is being referred to.
Table: Qualifiers
The Qualifiers table contains data qualifying comments that accompany the data.
Field Name | Data Type | Description | Example | Constraint | Default Value
QualifierID | Integer Identity | Unique integer identifying the data qualifier. | 3 | M, unique, primary key |
QualifierCode | Text (50) | Text code used by the organization that collects the data. | "e" (for estimated), "a" (for approved), "p" (for provisional) | O; cannot contain space, tab, line feed, or carriage return characters | NULL
QualifierDescription | Text (unlimited) | Text of the data qualifying comment. | "Holding time for sample analysis exceeded" | M |
This table will only be populated if data values that have data qualifying comments have been added to
the ODM database. The following rules and best practices should be used when populating this table:
1. The QualifierlD field is the primary key, must be a unique integer, and cannot be NULL. This
field should be implemented as an auto number/identity field.
Table: QualityControlLevels
The QualityControlLevels table contains the quality control levels that are used for versioning data within
the database.
Field Name | Data Type | Description | Example | Constraint
QualityControlLevelID | Integer Identity | Unique integer identifying the quality control level. | 0, 1, 2, 3, 4, 5 | M, unique, primary key
QualityControlLevelCode | Text (50) | Code used to identify the level of quality control to which data values have been subjected. | "1", "1.1", "Raw", "QC Checked" | M; cannot contain tab, line feed, or carriage return characters
Definition | Text (255) | Definition of the quality control level. | "Raw Data", "Quality Controlled Data" | M; cannot contain tab, line feed, or carriage return characters
Explanation | Text (unlimited) | Explanation of the quality control level. | "Raw data is defined as unprocessed data and data products that have not undergone quality control." | M
This table is pre-populated with quality control levels 0 through 4 within the ODM. The following rules
and best practices should be used when populating this table:
1. The QualityControlLevelID field is the primary key, must be a unique integer, and cannot be
NULL. This field should be implemented as an auto number/identity field.
2. It is suggested that the pre-populated system of quality control level codes (i.e.,
QualityControlLevelCodes 0 - 4) be used. If the pre-populated list is not sufficient, new quality
control levels can be defined. A quality control level code of -9999 is suggested for data whose
quality control level is unknown.
Table: SampleMediumCV
The SampleMediumCV table contains the controlled vocabulary for sample media.
Field Name | Data Type | Description | Examples | Constraint
Term | Text (255) | Controlled vocabulary for sample media. | "Surface Water" | M, unique, primary key; cannot contain tab, line feed, or carriage return characters
Definition | Text (unlimited) | Definition of the sample media controlled vocabulary term. The definition is optional if the term is self explanatory. | "Sample taken from surface water such as a stream, river, lake, pond, reservoir, ocean, etc." | O
This table is pre-populated within the ODM. Changes to this controlled vocabulary can be requested at
http://water.usu.edu/cuahsi/odm/.
Table: Samples
The Samples table gives information about physical samples analyzed in a laboratory.
Field Name | Data Type | Description | Example | Constraint | Default Value
SampleID | Integer Identity | Unique integer identifier that identifies each physical sample. | 3 | M, unique, primary key |
SampleType | Text (255) | Controlled vocabulary specifying the sample type from the SampleTypeCV table. | "FD", "PB", "SW", "Grab Sample" | M, foreign key | "Unknown"
LabSampleCode | Text (50) | Code or label used to identify and track the lab sample or sample container (e.g. bottle) during lab analysis. | "AB-123" | M, unique; cannot contain tab, line feed, or carriage return characters |
LabMethodID | Integer | Unique identifier for the laboratory method used to process the sample. This references the LabMethods table. | 4 | M, foreign key | 0 = nothing known about lab method
The following rules and best practices should be followed when populating this table:
1. This table will only be populated if data values associated with physical samples are added to the
ODM database.
2. The SampleID field is the primary key, must be a unique integer, and cannot be NULL. This field
should be implemented as an auto number/identity field.
3. The SampleType field should be populated using terms from the SampleTypeCV table. Where
the sample type is unknown, a default value of "Unknown" can be used.
4. The LabSampleCode should be a unique text code used by the laboratory to identify the sample.
This field is an alternate key for this table and should be unique.
5. The LabMethodID must reference a valid LabMethodID from the LabMethods table. The
LabMethods table should be populated with the appropriate laboratory method information prior
to adding records to this table that reference that laboratory method. A default value of 0 for this
field indicates that nothing is known about the laboratory method used to analyze the sample.
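Rule 5 implies an insert order: the referenced LabMethods record must exist before the sample row that points to it. A minimal sketch in generic SQL, using the example values from the table above (the LabMethodID of 4 is assumed to reference an existing LabMethods record):

```sql
-- SampleID is an auto number/identity field, so it is omitted.
-- Assumes a LabMethods record with LabMethodID = 4 was created first (rule 5).
INSERT INTO Samples (SampleType, LabSampleCode, LabMethodID)
VALUES ('Grab Sample', 'AB-123', 4);

-- If nothing is known about the laboratory method, the default of 0 is used,
-- and "Unknown" stands in for an unknown sample type (rule 3).
INSERT INTO Samples (SampleType, LabSampleCode, LabMethodID)
VALUES ('Unknown', 'AB-124', 0);
```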
Table: SampleTypeCV
The SampleTypeCV table contains the controlled vocabulary for sample type.
| Field Name | Data Type | Description | Examples | Constraint |
|---|---|---|---|---|
| Term | Text (255) | Controlled vocabulary for sample type. | "FD", "PB", "Grab Sample" | M, Unique, Primary key, Cannot contain tab, line feed, or carriage return characters |
| Definition | Text (unlimited) | Definition of sample type controlled vocabulary term. The definition is optional if the term is self-explanatory. | "Foliage Digestion", "Precipitation Bulk" | O |
This table is pre-populated within the ODM. Changes to this controlled vocabulary can be requested at
http://water.usu.edu/cuahsi/odm/.
Table: SeriesCatalog
The SeriesCatalog table lists each separate data series in the database for the purposes of identifying or
displaying what data are available at each site and to speed simple queries without querying the main
DataValues table. Unique data series are defined by unique combinations of SiteID,
VariableID, MethodID, SourceID, and QualityControlLevelID.
This entire table should be programmatically derived and should be updated every time data is added to
the database. Constraints on each field in the SeriesCatalog table are dependent upon the constraints on
the fields in the table from which those fields originated.
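As one illustration of what "programmatically derived" can mean, the distinct series with their value counts and date ranges can be computed directly from the DataValues table. The sketch below is simplified generic SQL; it assumes the DataValues field names described in the main text (e.g., LocalDateTime) and omits the joins needed to fill in the denormalized site, variable, method, and source text fields:

```sql
-- Simplified sketch: one row per unique data series, with its value count
-- and the local date/time span of its observations.
SELECT SiteID, VariableID, MethodID, SourceID, QualityControlLevelID,
       COUNT(*)           AS ValueCount,
       MIN(LocalDateTime) AS BeginDateTime,
       MAX(LocalDateTime) AS EndDateTime
FROM DataValues
GROUP BY SiteID, VariableID, MethodID, SourceID, QualityControlLevelID;
```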
| Field Name | Data Type | Description | Example | Constraint |
|---|---|---|---|---|
| SeriesID | Integer Identity | Unique integer identifier for each data series. | 5 | P, Primary key |
| SiteID | Integer | Site identifier from the Sites table. | 7 | P |
| SiteCode | Text (50) | Site code used by organization that collects the data. | "1002000" | P |
| SiteName | Text (255) | Full text name of sampling site. | "Logan River" | P |
| VariableID | Integer | Integer identifier for each variable that references the Variables table. | 4 | P |
| VariableCode | Text (50) | Variable code used by the organization that collects the data. | "00060" | P |
| VariableName | Text (255) | Name of the variable from the Variables table. | "Temperature" | P |
| Speciation | Text (255) | Code used to identify how the data value is expressed (e.g., total phosphorus concentration expressed as P). This should be from the SpeciationCV controlled vocabulary table. | "P", "N", "NO3" | P |
| VariableUnitsID | Integer | Integer identifier that references the record in the Units table giving the units of the data value. | 5 | P |
| VariableUnitsName | Text (255) | Full text name of the variable units from the UnitsName field in the Units table. | "milligrams per liter" | P |
| SampleMedium | Text (255) | The medium of the sample. This should be from the SampleMediumCV controlled vocabulary table. | "Surface Water" | P |
| ValueType | Text (255) | Text value indicating what type of data value is being recorded. This should be from the ValueTypeCV controlled vocabulary table. | "Field Observation" | P |
| TimeSupport | Real | Numerical value that indicates the time support (or temporal footprint) of the data values. 0 is used to indicate data values that are instantaneous. Other values indicate the time over which the data values are implicitly or explicitly averaged or aggregated. | 0, 24 | P |
| TimeUnitsID | Integer | Integer identifier that references the record in the Units table giving the units of the time support. If TimeSupport is 0, indicating an instantaneous observation, a unit still needs to be given for completeness, although it is somewhat arbitrary. | 4 | P |
| TimeUnitsName | Text (255) | Full text name of the time support units from the UnitsName field in the Units table. | "hours" | P |
| DataType | Text (255) | Text value that identifies the data as one of several types from the DataTypeCV controlled vocabulary table. | "Continuous", "Instantaneous", "Cumulative", "Incremental", "Average", "Minimum", "Maximum", "Constant Over Interval", "Categorical" | P |
| GeneralCategory | Text (255) | General category of the variable from the GeneralCategoryCV table. | "Water Quality" | P |
| MethodID | Integer | Integer identifier that identifies the method used to generate the data values and references the Methods table. | 2 | P |
| MethodDescription | Text (unlimited) | Full text description of the method used to generate the data values. | "Specific conductance measured using a Hydrolab" or "Streamflow measured using a V notch weir with dimensions xxx" | P |
| SourceID | Integer | Integer identifier that identifies the source of the data values and references the Sources table. | 5 | P |
| Organization | Text (255) | Text description of the source organization from the Sources table. | "USGS" | P |
| SourceDescription | Text (unlimited) | Text description of the data source from the Sources table. | "Text file retrieved from the EPA STORET system indicating data originally from Utah Division of Water Quality" | P |
| Citation | Text (unlimited) | Text string that gives the citation to be used when the data from each source are referenced. | "Slaughter, C. W., D. Marks, G. N. Flerchinger, S. S. Van Vactor and M. Burgess, (2001), 'Thirty-five years of research data collection at the Reynolds Creek Experimental Watershed, Idaho, United States,' Water Resources Research, 37(11): 2819-2823." | P |
| QualityControlLevelID | Integer | Integer identifier that indicates the level of quality control to which the data values have been subjected. | 0, 1, 2, 3, 4 | P |
| QualityControlLevelCode | Text (50) | Code used to identify the level of quality control to which data values have been subjected. | "1", "1.1", "Raw", "QC Checked" | P |
| BeginDateTime | Date/Time | Date of the first data value in the series. To be programmatically updated if new records are added. | 9/4/2003 7:00:00 AM | P |
| EndDateTime | Date/Time | Date of the last data value in the series. To be programmatically updated if new records are added. | 9/4/2005 7:00:00 AM | P |
| BeginDateTimeUTC | Date/Time | Date of the first data value in the series in UTC. To be programmatically updated if new records are added. | 9/4/2003 2:00 PM | P |
| EndDateTimeUTC | Date/Time | Date of the last data value in the series in UTC. To be programmatically updated if new records are added. | 9/4/2003 2:00 PM | P |
| ValueCount | Integer | The number of data values in the series identified by the combination of the SiteID, VariableID, MethodID, SourceID, and QualityControlLevelID fields. To be programmatically updated if new records are added. | 50 | P |
Table: Sites
The Sites table provides information giving the spatial location at which data values have been collected.
| Field Name | Data Type | Description | Example | Constraint | Default Value |
|---|---|---|---|---|---|
| SiteID | Integer Identity | Unique identifier for each sampling location. | 37 | M, Unique, Primary key | |
| SiteCode | Text (50) | Code used by organization that collects the data to identify the site. | "10109000" (USGS gage number) | M, Unique, Allows only characters in the range of A-Z (case insensitive), 0-9, ".", "-", and "_" | |
| SiteName | Text (255) | Full name of the sampling site. | "LOGAN RIVER ABOVE STATE DAM, NEAR LOGAN, UT" | M, Cannot contain tab, line feed, or carriage return characters | |
| Latitude | Real | Latitude in decimal degrees. | 45.32 | M (>= -90 AND <= 90) | |
| Longitude | Real | Longitude in decimal degrees. East positive, West negative. | -100.47 | M (>= -180 AND <= 360) | |
| LatLongDatumID | Integer | Identifier that references the Spatial Reference System of the latitude and longitude coordinates in the SpatialReferences table. | 1 | M, Foreign key | 0 = Unknown |
| Elevation_m | Real | Elevation of sampling location (in m). If this is not provided it needs to be obtained programmatically from a DEM based on location information. | 1432 | O | NULL |
| VerticalDatum | Text (255) | Vertical datum of the elevation. Controlled vocabulary from VerticalDatumCV. | "NAVD88" | O, Foreign key | NULL |
| LocalX | Real | Local projection X coordinate. | 456700 | O | NULL |
| LocalY | Real | Local projection Y coordinate. | 232000 | O | NULL |
| LocalProjectionID | Integer | Identifier that references the Spatial Reference System of the local coordinates in the SpatialReferences table. This field is required if local coordinates are given. | 7 | O, Foreign key | NULL |
| PosAccuracy_m | Real | Value giving the accuracy with which the positional information is specified in meters. | 100 | O | NULL |
| State | Text (255) | Name of state in which the monitoring site is located. | "Utah" | O, Cannot contain tab, line feed, or carriage return characters | NULL |
| County | Text (255) | Name of county in which the monitoring site is located. | "Cache" | O, Cannot contain tab, line feed, or carriage return characters | NULL |
| Comments | Text (unlimited) | Comments related to the site. | | O | NULL |
The following rules and best practices should be followed when populating this table:
1. The SiteID field is the primary key, must be a unique integer, and cannot be NULL. This field should be implemented as an auto number/identity field.
2. The SiteCode field must contain a text code that uniquely identifies each site. The values in this field should be unique and can be an alternate key for the table. SiteCodes cannot contain any characters other than A-Z (case insensitive), 0-9, period ".", dash "-", and underscore "_".
3. The LatLongDatumID must reference a valid SpatialReferenceID from the SpatialReferences controlled vocabulary table. If the datum is unknown, a default value of 0 is used.
4. If the Elevation_m field is populated with a numeric value, a value must be specified in the VerticalDatum field. The VerticalDatum field can only be populated using terms from the VerticalDatumCV table. If the vertical datum is unknown, a value of "Unknown" is used.
5. If the LocalX and LocalY fields are populated with numeric values, a value must be specified in the LocalProjectionID field. The LocalProjectionID must reference a valid SpatialReferenceID from the SpatialReferences controlled vocabulary table. If the spatial reference system of the local coordinates is unknown, a default value of 0 is used.
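A minimal sketch of a Sites insert that follows these rules (generic SQL; the coordinate and name values are the examples from the table above, and the LatLongDatumID of 0 stands for an unknown datum per rule 3):

```sql
-- SiteID is an auto number/identity field, so it is omitted.
INSERT INTO Sites (SiteCode, SiteName, Latitude, Longitude,
                   LatLongDatumID, Elevation_m, VerticalDatum)
VALUES ('10109000',
        'LOGAN RIVER ABOVE STATE DAM, NEAR LOGAN, UT',
        45.32, -100.47,
        0,          -- unknown datum (rule 3)
        1432,       -- because an elevation is given...
        'NAVD88');  -- ...a vertical datum must also be specified (rule 4)
```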
Table: Sources
The Sources table lists the original sources of the data, providing information sufficient to retrieve and
reconstruct the data value from the original data files if necessary.
| Field Name | Data Type | Description | Example | Constraint | Default Value |
|---|---|---|---|---|---|
| SourceID | Integer Identity | Unique integer identifier that identifies each data source. | 5 | M, Unique, Primary key | |
| Organization | Text (255) | Name of the organization that collected the data. This should be the agency or organization that collected the data, even if it came out of a database consolidated from many sources such as STORET. | "Utah Division of Water Quality" | M, Cannot contain tab, line feed, or carriage return characters | |
| SourceDescription | Text (unlimited) | Full text description of the source of the data. | "Text file retrieved from the EPA STORET system indicating data originally from Utah Division of Water Quality" | M | |
| SourceLink | Text (500) | Link that can be pointed at the original data file and/or associated metadata stored in the digital library, or URL of the data source. | | O | NULL |
| ContactName | Text (255) | Name of the contact person for the data source. | "Jane Adams" | M, Cannot contain tab, line feed, or carriage return characters | "Unknown" |
| Phone | Text (255) | Phone number for the contact person. | "435-797-0000" | M, Cannot contain tab, line feed, or carriage return characters | "Unknown" |
| Email | Text (255) | Email address for the contact person. | "Jane.Adams@dwq.ut" | M, Cannot contain tab, line feed, or carriage return characters | "Unknown" |
| Address | Text (255) | Street address for the contact person. | "45 Main Street" | M, Cannot contain tab, line feed, or carriage return characters | "Unknown" |
| City | Text (255) | City in which the contact person is located. | "Salt Lake City" | M, Cannot contain tab, line feed, or carriage return characters | "Unknown" |
| State | Text (255) | State in which the contact person is located. Use two-letter abbreviations for US. For other countries give the full country name. | "UT" | M, Cannot contain tab, line feed, or carriage return characters | "Unknown" |
| ZipCode | Text (255) | US Zip Code or country postal code. | "82323" | M, Cannot contain tab, line feed, or carriage return characters | "Unknown" |
| Citation | Text (unlimited) | Text string that gives the citation to be used when the data from each source are referenced. | "Data collected by USU as part of the Little Bear River Test Bed Project" | M | "Unknown" |
| MetadataID | Integer | Integer identifier referencing the record in the ISOMetadata table for this source. | 5 | M, Foreign key | 0 = Unknown or uninitialized metadata |
The following rules and best practices should be followed when populating this table:
1. The SourceID field is the primary key, must be a unique integer, and cannot be NULL. This field should be implemented as an auto number/identity field.
2. The Organization field should contain a text description of the agency or organization that created the data.
3. The SourceDescription field should contain a more detailed description of where the data were actually obtained.
4. A default value of "Unknown" may be used for the source contact information fields in the event that this information is not known.
5. Each source must be associated with a metadata record in the ISOMetadata table. As such, the MetadataID must reference a valid MetadataID from the ISOMetadata table. The ISOMetadata table should be populated with an appropriate record prior to adding a source to the Sources table. A default MetadataID of 0 can be used for a source with unknown or uninitialized metadata.
6. Use the Citation field to record the text that you would like others to use when they are referencing your data. Where available, journal citations are encouraged to promote the correct crediting for use of data.
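Rule 5 again dictates an insert order: either the ISOMetadata record is created first, or the default MetadataID of 0 is used. A minimal generic-SQL sketch using the illustrative contact details from the table above:

```sql
-- A source whose metadata has not yet been initialized (MetadataID = 0).
-- SourceID is an auto number/identity field, so it is omitted.
INSERT INTO Sources (Organization, SourceDescription, ContactName, Phone,
                     Email, Address, City, State, ZipCode, Citation, MetadataID)
VALUES ('Utah Division of Water Quality',
        'Text file retrieved from the EPA STORET system indicating data originally from Utah Division of Water Quality',
        'Jane Adams', '435-797-0000', 'Jane.Adams@dwq.ut',
        '45 Main Street', 'Salt Lake City', 'UT', '82323',
        'Data collected by USU as part of the Little Bear River Test Bed Project',
        0);  -- unknown or uninitialized metadata (rule 5)
```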
Table: SpatialReferences
The SpatialReferences table provides information about the Spatial Reference Systems used for latitude
and longitude as well as local coordinate systems in the Sites table. This table is a controlled vocabulary.
| Field Name | Data Type | Description | Example | Constraint |
|---|---|---|---|---|
| SpatialReferenceID | Integer Identity | Unique integer identifier for each Spatial Reference System. | 37 | M, Unique, Primary key |
| SRSID | Integer | Integer identifier for the Spatial Reference System from http://www.epsg.org/. | 4269 | O |
| SRSName | Text (255) | Name of the Spatial Reference System. | "NAD83" | M, Cannot contain tab, line feed, or carriage return characters |
| IsGeographic | Boolean | Value that indicates whether the spatial reference system uses geographic coordinates (i.e., latitude and longitude) or not. | "True", "False" | O |
| Notes | Text (unlimited) | Descriptive information about the Spatial Reference System. This field would be used to define a non-standard, study-area-specific system if necessary, and would contain a description of the local projection information. Where possible, this should refer to a standard projection, in which case latitude and longitude can be determined from local projection information. If the local grid system is non-standard, then latitude and longitude need to be included too. | | O |
This table is pre-populated within the ODM. Changes to this controlled vocabulary can be requested at
http://water.usu.edu/cuahsi/odm/.
Table: SpeciationCV
The SpeciationCV table contains the controlled vocabulary for the Speciation field in the Variables table.
| Field Name | Data Type | Description | Examples | Constraint |
|---|---|---|---|---|
| Term | Text (255) | Controlled vocabulary for Speciation. | "P" | M, Unique, Primary key, Cannot contain tab, line feed, or carriage return characters |
| Definition | Text (unlimited) | Definition of Speciation controlled vocabulary term. The definition is optional if the term is self-explanatory. | "Expressed as phosphorus" | O |
This table is pre-populated within the ODM. Changes to this controlled vocabulary can be requested at
http://water.usu.edu/cuahsi/odm/.
Table: TopicCategoryCV
The TopicCategoryCV table contains the controlled vocabulary for the ISOMetaData topic categories.
| Field Name | Data Type | Description | Examples | Constraint |
|---|---|---|---|---|
| Term | Text (255) | Controlled vocabulary for TopicCategory. | "InlandWaters" | M, Unique, Primary key, Cannot contain tab, line feed, or carriage return characters |
| Definition | Text (unlimited) | Definition of TopicCategory controlled vocabulary term. The definition is optional if the term is self-explanatory. | "Data associated with inland waters" | O |
This table is pre-populated within the ODM. Changes to this controlled vocabulary can be requested at
http://water.usu.edu/cuahsi/odm/.
Table: Units
The Units table gives the Units and UnitsType associated with variables, time support, and offsets. This
is a controlled vocabulary table.
| Field Name | Data Type | Description | Example | Constraint |
|---|---|---|---|---|
| UnitsID | Integer Identity | Unique integer identifier that identifies each unit. | 6 | M, Unique, Primary key |
| UnitsName | Text (255) | Full text name of the units. | "Milligrams Per Liter" | M, Cannot contain tab, line feed, or carriage return characters |
| UnitsType | Text (255) | Text value that specifies the dimensions of the units. | "Length", "Time", "Mass" | M, Cannot contain tab, line feed, or carriage return characters |
| UnitsAbbreviation | Text (255) | Text abbreviation for the units. | "mg/L" | M, Cannot contain tab, line feed, or carriage return characters |
This table is pre-populated within the ODM. Changes to this controlled vocabulary can be requested at
http://water.usu.edu/cuahsi/odm/.
Table: ValueTypeCV
The ValueTypeCV table contains the controlled vocabulary for the ValueType field in the Variables and
SeriesCatalog tables.
| Field Name | Data Type | Description | Examples | Constraint |
|---|---|---|---|---|
| Term | Text (255) | Controlled vocabulary for ValueType. | "Field Observation" | M, Unique, Primary key, Cannot contain tab, line feed, or carriage return characters |
| Definition | Text (unlimited) | Definition of the ValueType controlled vocabulary term. The definition is optional if the term is self-explanatory. | "Observation of a variable using a field instrument" | O |
This table is pre-populated within the ODM. Changes to this controlled vocabulary can be requested at
http://water.usu.edu/cuahsi/odm/.
Table: VariableNameCV
The VariableNameCV table contains the controlled vocabulary for the VariableName field in the
Variables and SeriesCatalog tables.
| Field Name | Data Type | Description | Examples | Constraint |
|---|---|---|---|---|
| Term | Text (255) | Controlled vocabulary for variable names. | "Temperature", "Discharge", "Precipitation" | M, Unique, Primary key, Cannot contain tab, line feed, or carriage return characters |
| Definition | Text (unlimited) | Definition of the VariableName controlled vocabulary term. The definition is optional if the term is self-explanatory. | | O |
This table is pre-populated within the ODM. Changes to this controlled vocabulary can be requested at
http://water.usu.edu/cuahsi/odm/.
Table: Variables
The Variables table lists the full descriptive information about what variables have been measured.
| Field Name | Data Type | Description | Example | Constraint | Default Value |
|---|---|---|---|---|---|
| VariableID | Integer Identity | Unique integer identifier for each variable. | 6 | M, Unique, Primary key | |
| VariableCode | Text (50) | Text code used by the organization that collects the data to identify the variable. | "00060" used by USGS for discharge | M, Unique, Allows only characters in the range of A-Z (case insensitive), 0-9, ".", "-", and "_" | |
| VariableName | Text (255) | Full text name of the variable that was measured, observed, modeled, etc. This should be from the VariableNameCV controlled vocabulary table. | "Discharge" | M, Foreign key | |
| Speciation | Text (255) | Text code used to identify how the data value is expressed (e.g., total phosphorus concentration expressed as P). This should be from the SpeciationCV controlled vocabulary table. | "P", "N", "NO3" | M, Foreign key | "Not Applicable" |
| VariableUnitsID | Integer | Integer identifier that references the record in the Units table giving the units of the data values associated with the variable. | 4 | M, Foreign key | |
| SampleMedium | Text (255) | The medium in which the sample or observation was taken or made. This should be from the SampleMediumCV controlled vocabulary table. | "Surface Water", "Sediment", "Fish Tissue" | M, Foreign key | "Unknown" |
| ValueType | Text (255) | Text value indicating what type of data value is being recorded. This should be from the ValueTypeCV controlled vocabulary table. | "Field Observation", "Laboratory Observation", "Model Simulation Results" | M, Foreign key | "Unknown" |
| IsRegular | Boolean | Value that indicates whether the data values are from a regularly sampled time series. | "True", "False" | M | "False" |
| TimeSupport | Real | Numerical value that indicates the time support (or temporal footprint) of the data values. 0 is used to indicate data values that are instantaneous. Other values indicate the time over which the data values are implicitly or explicitly averaged or aggregated. | 0, 24 | M | 0 = Assumes instantaneous samples where no other information is available |
| TimeUnitsID | Integer | Integer identifier that references the record in the Units table giving the units of the time support. If TimeSupport is 0, indicating an instantaneous observation, a unit still needs to be given for completeness, although it is somewhat arbitrary. | 4 | M, Foreign key | 103 = hours |
| DataType | Text (255) | Text value that identifies the data values as one of several types from the DataTypeCV controlled vocabulary table. | "Continuous", "Sporadic", "Cumulative", "Incremental", "Average", "Minimum", "Maximum", "Constant Over Interval", "Categorical" | M, Foreign key | "Unknown" |
| GeneralCategory | Text (255) | General category of the data values from the GeneralCategoryCV controlled vocabulary table. | "Climate", "Water Quality", "Groundwater Quality" | M, Foreign key | "Unknown" |
| NoDataValue | Real | Numeric value used to encode no data values for this variable. | -9999 | M | -9999 |
The following rules and best practices should be followed when populating this table:
1. The VariableID field is the primary key, must be a unique integer, and cannot be NULL. This field should be implemented as an auto number/identity field.
2. The VariableCode field must be unique and serves as an alternate key for this table. Variable codes can be arbitrary, or they can use an organized system. VariableCodes cannot contain any characters other than A-Z (case insensitive), 0-9, period ".", dash "-", and underscore "_".
3. The VariableName field must reference a valid Term from the VariableNameCV controlled vocabulary table.
4. The Speciation field must reference a valid Term from the SpeciationCV controlled vocabulary table. A default value of "Not Applicable" is used where speciation does not apply. If the speciation is unknown, a value of "Unknown" can be used.
5. The VariableUnitsID field must reference a valid UnitsID from the Units controlled vocabulary table.
6. Only terms from the SampleMediumCV table can be used to populate the SampleMedium field. A default value of "Unknown" is used where the sample medium is unknown.
7. Only terms from the ValueTypeCV table can be used to populate the ValueType field. A default value of "Unknown" is used where the value type is unknown.
8. The default for the TimeSupport field is 0, which corresponds to instantaneous values. If the TimeSupport field is set to a value other than 0, an appropriate TimeUnitsID must be specified. The TimeUnitsID field can only reference valid UnitsID values from the Units controlled vocabulary table. If the TimeSupport field is set to 0, any time units can be used (e.g., seconds, minutes, hours); however, a default value of 103 has been used, which corresponds to hours.
9. Only terms from the DataTypeCV table can be used to populate the DataType field. A default value of "Unknown" can be used where the data type is unknown.
10. Only terms from the GeneralCategoryCV table can be used to populate the GeneralCategory field. A default value of "Unknown" can be used where the general category is unknown.
11. The NoDataValue should be set such that it will never conflict with a real observation value. For example, a NoDataValue of -9999 is valid for water temperature because we would never expect to measure a water temperature of -9999. The default value for this field is -9999.
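Pulling these rules together, a typical Variables insert might look like the following sketch. It is generic SQL using the USGS discharge example from the table above; the UnitsID of 4 and the default TimeUnitsID of 103 are assumed to exist in the Units table, and IsRegular is written as 1 because Boolean fields are commonly stored as bits in SQL Server:

```sql
-- VariableID is an auto number/identity field, so it is omitted.
INSERT INTO Variables (VariableCode, VariableName, Speciation, VariableUnitsID,
                       SampleMedium, ValueType, IsRegular, TimeSupport,
                       TimeUnitsID, DataType, GeneralCategory, NoDataValue)
VALUES ('00060', 'Discharge',
        'Not Applicable',   -- speciation does not apply (rule 4)
        4,                  -- UnitsID of the measurement units (rule 5)
        'Surface Water', 'Field Observation',
        1,                  -- regularly sampled time series
        0, 103,             -- instantaneous values; 103 = hours (rule 8)
        'Continuous', 'Water Quality',
        -9999);             -- NoDataValue (rule 11)
```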
Table: VerticalDatumCV
The VerticalDatumCV table contains the controlled vocabulary for the VerticalDatum field in the Sites
table.
| Field Name | Data Type | Description | Examples | Constraint |
|---|---|---|---|---|
| Term | Text (255) | Controlled vocabulary for VerticalDatum. | "NAVD88" | M, Unique, Primary key, Cannot contain tab, line feed, or carriage return characters |
| Definition | Text (unlimited) | Definition of the VerticalDatum controlled vocabulary term. The definition is optional if the term is self-explanatory. | "North American Vertical Datum of 1988" | O |

This table is pre-populated within the ODM. Changes to this controlled vocabulary can be requested at http://water.usu.edu/cuahsi/odm/.
Appendix B. Data Versioning Within ODM
The main text of this document focuses on how ODM is structured to store observations data. It does not
address how to manage editing data stored within ODM. Software applications based on ODM will have
functionality that will allow data managers and database administrators to modify, delete, change, or
otherwise make edits to data stored within ODM. In addition, these software tools will provide
functionality to create derived datasets, or datasets that are calculated or derived from data already stored
in ODM (e.g., calculate a time series of discharge from a time series of stage, or calculate a time series of
daily average temperature from a time series of hourly observations). The purpose of this appendix is to
clarify how data editing and versioning can be managed within the ODM schema.
Data Series Defined
In order to fully grasp the concepts that follow, the idea of a "data series" in the context of ODM must be
clarified. A "data series" is an organizing principle that is present in the ODM. A data series consists of
all of the data values associated with a unique site, variable, method, source, and quality control level
combination. An example of the full specification for a data series is: "all of the raw unchecked
(QualityControlLevel) water temperature (Variable) values measured in the Logan River near Logan, UT
(Site) using a field temperature sensor (method) by Utah State University (Source)." Each record in the
SeriesCatalog table of ODM represents a unique data series.
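In database terms, retrieving one data series amounts to filtering DataValues on those five keys. A minimal sketch in generic SQL (the numeric IDs are hypothetical, and the field names follow the DataValues schema described in the main text of this document):

```sql
-- All values belonging to one data series: the unique combination of
-- site, variable, method, source, and quality control level.
SELECT LocalDateTime, DataValue
FROM DataValues
WHERE SiteID = 7
  AND VariableID = 4
  AND MethodID = 2
  AND SourceID = 5
  AND QualityControlLevelID = 0
ORDER BY LocalDateTime;
```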
Rules for Editing and Deriving Data Series in ODM
The following rules are suggested so that versioning of and edits to data series can be managed within the
ODM schema. Software applications that work with ODM should follow these rules. These rules are
based on the default set of Quality Control Levels that are distributed with the ODM blank schema.
1. Data versioning should be done at the data series level - Within ODM, the concept of data
versioning is related to the quality control level. Quality control level is a data series level
attribute, and as such, changes to the quality control level should occur at the data series level
rather than at the individual value level. For example, if an investigator wished to create a quality
controlled Level 1 data series from a raw Level 0 data series, he/she should first make a copy of
the raw Level 0 data series and then perform any edits and adjustments required in the quality
control process to the copy. The edited copy then becomes the Level 1 data series, and the Level
0 data series is preserved intact.
2. Data series with a QualityControlLevelCode of 0 cannot be edited - Level 0 data series represent
raw data from sensors (i.e., stage measured by a water level recorder) or other products derived
from raw data (i.e., discharge that is programmatically derived from stage before the stage values
have been quality controlled). By definition, Level 0 data have not been quality controlled and
may contain significant errors and bad values. However, Level 0 data series represent the source
from which all other derived data series are based, and as such should remain intact for archive
purposes. Level 0 data series should not be used for analysis unless no other adequate options are
available, and only if the user is aware that the data are raw. Level 0 data series can be removed
entirely from the database, but only by removing the entire data series.
3. Only one QualityControlLevel 0 data series can exist for a Site, Variable, and Method
combination - Only one raw data series for a Site, Variable, and Method combination can exist
within an ODM database. If multiple sensors are measuring the same variable at the same site,
the method description would have to distinguish between the two.
4. Only one QualityControlLevel 1 data series can exist for each Site, Variable, and Method
combination - Once a Level 0 data series has been loaded to the database, a Level 1 data series
can be "derived" from that Level 0 data series. This is done by first making a copy of the Level 0
data series, second changing the QualityControlLevel of the copy to 1, and last doing any
necessary filtering or editing required so that the Level 1 data series is acceptable as quality
controlled. In most cases, the majority of the values within a Level 0 data series and its
corresponding Level 1 data series will remain the same. However, where instruments
malfunction or other conditions are present that affect the raw data values, Level 0 values may be
deleted, adjusted, or otherwise edited in creating the Level 1 data series.
5. Any edits to a data series are saved to that data series - Level 0 data cannot be edited. With
Levels 1 or higher, however, software applications should be allowed to edit and delete values.
Each time an edit is made, the result should overwrite the previous value within a data series. In
other words, edits should not create new data series, they should modify an existing one. This
will be true even where edits are done within multiple editing sessions. The editing software
should record the method or basis for any data edits in appropriate method records.
6. Data series of Level 2 or higher can only be created from data series of Level 1 or higher -
Derived data series of Level 2 or higher can only be created from data series of Level 1 or higher.
If a user wishes to create a derived data series from a Level 0 data series (such as discharge from
raw, unchecked stage values) that derived data series would also be Level 0.
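The "copy, then edit the copy" workflow in rules 1 and 4 can be expressed directly in SQL. The sketch below is simplified generic SQL; the numeric IDs are hypothetical, the field names follow the DataValues schema described in the main text, and optional fields (offsets, qualifiers, samples, etc.) are omitted for brevity:

```sql
-- Hypothetical series keys: site 7, variable 4, method 2, source 5.
-- Copy the raw (Level 0) series to create the Level 1 series;
-- ValueID is an identity field and is assigned automatically.
INSERT INTO DataValues (DataValue, LocalDateTime, UTCOffset, DateTimeUTC,
                        SiteID, VariableID, MethodID, SourceID,
                        CensorCode, QualityControlLevelID)
SELECT DataValue, LocalDateTime, UTCOffset, DateTimeUTC,
       SiteID, VariableID, MethodID, SourceID,
       CensorCode, 1                      -- tag the copy as Level 1
FROM DataValues
WHERE SiteID = 7 AND VariableID = 4 AND MethodID = 2 AND SourceID = 5
  AND QualityControlLevelID = 0;

-- Subsequent quality control edits modify only the Level 1 rows,
-- leaving the Level 0 archive intact (rule 2).
```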
APPENDIX C: DESCRIPTION OF ESF DATA LOADED TO THE ODM
Sources
Loaded: Sources have been successfully loaded to the ESF DB.
Comments: Contacts had to be updated.
Total: Four sources
Sites
Loaded: Sites have been successfully loaded to the ESF DB.
Comments: Sites had to be added. Elevations have been updated (Google Earth).
Total: 128 Sites.
| SiteCode | SiteName | Latitude | Longitude | Elevation_m |
|---|---|---|---|---|
| UHL | Upper Hall Run at Clough Pike | 39.08526 | -84.295 | 270.34 |
| SLT | South Lucy Tributary at Apple Rd | 39.04119 | -84.2129 | 254.8 |
| SHC | Shaylor Crossing at County Lift Station in Shaylor Crossing Subdivision | 39.08114 | -84.2319 | 249.92 |
| HST | Heiserman Stream at Rt 50 crossing and Experimental Stream Facility Intake | 39.14839 | -84.2565 | 159.4 |
| KRT | Tributary to Kain Run at Rt 276 crossing | 39.06425 | -84.0719 | 262.72 |
| SHA | South Harsha Tributary at Bantam Rd crossing State Park | 39.0072 | -84.1368 | 248.4 |
| AYR | Aveys Run at Cincinnati Nature Center's "Stream Site B" | 39.12321 | -84.2463 | 185 |
| OWT | Owensville Tributary at County lift station off of St. Louis Rd | 39.12775 | -84.1364 | 243.52 |
| USR | Upper Salt Run at County lift station off of Shepard Rd. | 39.11801 | -84.2596 | 191.1 |
| HWR | Howard Run at Biddings Rd Crossing | 39.12367 | -84.0077 | 263.64 |
| FMR | Fourmile Run at Elick Rd crossing | 39.05586 | -84.1681 | 182.26 |
| HLR | Hall Run at Roundbottom Rd crossing | 39.14039 | -84.259 | 158.49 |
| LRC | Lucy Run at Steve Wilson's property off of SR222 | 39.05701 | -84.1795 | 182.57 |
| GRR | Grassy Fork at Glancey-Marathon Rd crossing | 39.13286 | -84.0152 | 262.12 |
| SHR | Shaylor Run at Bethel Rd Crossing | 39.11834 | -84.2163 | 169.16 |
| CLC | Cloverlick Creek in State Park at Austin Rd off of 133 | 38.98984 | -84.0594 | 244.54 |
| STC | Stonelick Creek at RT 50 crossing | 39.12203 | -84.1986 | 163.97 |
| EFG | East Fork River @ Morgan Rd | 39.2191 | -83.9156 | 279.18 |
| EFY | East Fork River @ Fayetteville | 39.18764 | -83.9381 | 27.95 |
| EFM | East Fork River @ Marathon | 39.13799 | -84.0029 | 261.2 |
| EFB | East Fork River @ Blue Sky Parkway | 39.11526 | -84.0246 | 258.46 |
| ELI | East Fork River @ Williamsburg | 39.05272 | -84.0515 | 240.78 |
| DWT | Drinking Water Treatment Plant | 39.05131 | -84.1361 | 212.74 |
| DAM | East Fork River Below East Fork Lake Dam | 39.02614 | -84.1482 | 186.53 |
| EFK | East Fork River @ Experimental Stream Facility | 39.14608 | -84.2519 | 154.53 |
| EFC | East Fork Lake @ Confluence w/ Little Miami River | 39.15574 | -84.2884 | 150.56 |
| ESF | Experimental Stream Facility Weather Station | 39.14723 | -84.2547 | 159.71 |
| LEF | Lower East Fork Waste Water Treatment Plant Effluent | 39.14676 | -84.2559 | 158.18 |
| 2EFR11205 | Poplar Creek and Ohio 232, Bethel O. Quad. | 38.96139 | -84.1067 | 253.58 |
| 2EFR21001 | East Fork Lake, Cloverlick Inflow Tributary above Poplar Creek | 38.99917 | -84.0858 | 219.14 |
| 2EFR10002 | EFLMR at Ohio 32 Bridge at Williamsburg, USGS Gage, Williamsburg, O. Quad (Equivalent to EPA "ELI" and CCOEQ "EFRM34.8") | 39.0525 | -84.0506 | 240.78 |
| 2EFR20004 | East Fork Lake, EFLMR Main Inflow Tributary, Williamsburg, O. Quad | 39.01999 | -84.1311 | 216.09 |
| 2EFR20022 | East Fork Lake, New Site 2010 to account for mixing of major inflows | 39.01722 | -84.1017 | 207.56 |
| 2EFR20026 | East Fork Lake, New Site 2010 upstream point of 'narrows' | 39.01611 | -84.1114 | 215.18 |
| 2EFR20024 | East Fork Lake, at mile 24, 0.5 mi S of Elk Lick, Ohio, on Batavia, OH Quad. At cross section from STA 20024 to mouth of Slabcamp Run, 2/10 of length from left bank | 39.02 | -84.1311 | 199.94 |
| 2EFR20025 | East Fork Lake, New Site 2010, Cove section North of STA 2EFR20024 | 39.02833 | -84.1319 | 195.06 |
| 2EFR20001 | Located on East Fork Lake at the Log Boom (Lake Side of "DAM" near USACE Flow Control Structure) | 39.02917 | -84.0867 | 191.41 |
| 2EFR10000 | EFLMR at Co. Rd. Bridge 0.1 Miles off Elk Lick Road, Batavia, O. Quad. (Near EPA "DAM" Site) | 39.02611 | -84.1478 | 186.22 |
| EFL | East Fork Lake near Bob McEwen Drinking Water Treatment Plant Intake (Sampled from bridge to intake structure) | 39.03695 | -84.1387 | 212.74 |
| CEC | Cemetery Creek | 39.08222 | -84.1767 | 193.54 |
| 1IJ00060001 | Ohio Asphaltic Limestone Corp; Final effluent discharge from sedimentation basin to an unnamed tributary of Turtle Creek | 39.24972 | -83.6814 | 354.77 |
| 1IJ00060002 | Ohio Asphaltic Limestone Corp; Final effluent discharge from sedimentation basin to an unnamed tributary of Dodson Creek | 39.24306 | -83.6825 | 338.92 |
| 1IN00287001 | Hanson Aggregates Midwest - Highland Quarry Industrial Discharge Facility | 39.19417 | -83.7222 | 315.45 |
| 1PB00105001 | Village of Lynchburg Municipal WWTP, Final Effluent | 39.23806 | -83.8003 | 297.47 |
| 1PB00105801 | Village of Lynchburg Municipal WWTP, Upstream Monitoring (Pearl St. Covered Bridge) | 39.24306 | -83.7953 | 299.3 |
| 1PB00105901 | Village of Lynchburg Municipal WWTP, Downstream Monitoring | 39.2375 | -83.8006 | 297.17 |
| 1PG00100001 | Rolling Acres Municipal WWTP Final effluent discharge to an unnamed tributary of Dodson Creek. Monitoring just after dechlorination. | 39.22222 | -83.6856 | 344.41 |
| 2EFR11204 | Poplar Creek and St Route 125 | 38.97379 | -84.1058 | 247.5 |
| 2EFR20005 | East Fork Lake near combined confluence of Cloverlick and Poplar Creek's inflows | 39.02336 | -84.1013 | 218.8 |
| 2EFR21000 | East Fork Lake near Cabin Run Inflow | 39.01527 | -84.0984 | 208.2 |
| 2EFR23001 | East Fork Lake near Slabcamp Run Inflow | 39.03919 | -84.1364 | 212.4 |
| 5MILECR0.5 | Five Mile Creek @ Bluesky; upstream from Bluesky Pkwy | 39.1136 | -84.0203 | 264.9 |
| BARNS1.9 | Barnes Run @ Bethel-Concord; Bethel Concord Rd | 39.0082 | -84.0733 | 249.92 |
| BRUSH0.3 | Brushy Fork RM 0.3 at Titus Road | 39.143 | -84.1447 | 199.02 |
| CABIN1.5 | Cabin Run; At Campground Loop G | 39.0327 | -84.1019 | 237.7 |
| CLOV5.1 | Clover Creek RM 1.9 at St. Rt. 133 | 38.9856 | -84.0528 | 250.84 |
| CWL | Cornwell Farm, stream crossing at driveway to Stahl's Property | 39.18389 | -84.0133 | 286.19 |
| DODSN0.1 | Dodson Creek RM 0.1 at mouth; Dodson Creek @ Crampton Rd. | 39.22285 | -83.8127 | 294.12 |
| DWT-0 | Inside Bob McEwen Drinking Water Treatment Plant, Pumped from inside intake structure on lake | 39.05169 | -84.1356 | 261.2 |
| DWT-PCD | Drinking Water Post Chlorine Disinfection, before discharge to distribution system | 39.05169 | -84.1356 | 262.4 |
| DWT-PFL | Effluent of all filters currently operating | 39.05169 | -84.1356 | 262.1 |
| DWT-PMN | Drinking Water Treatment Plant - Post Manganese addition before coagulant is added | 39.05169 | -84.1356 | 261.5 |
| DWT-SET | DWT - effluent from settling basin | 39.05169 | -84.1356 | 261.8 |
| E01 | Mesocosm 1 in the Experimental Stream Facility, 1003 US Route 50, Milford, OH 45150 | 39.14718 | -84.2548 | 159.7 |
| E02 | Mesocosm 2 in the Experimental Stream Facility, 1003 US Route 50, Milford, OH 45150 | 39.14718 | -84.2548 | 159.7 |
| E03 | Mesocosm 3 in the Experimental Stream Facility, 1003 US Route 50, Milford, OH 45150 | 39.14718 | -84.2548 | 159.7 |
| E04 | Mesocosm 4 in the Experimental Stream Facility, 1003 US Route 50, Milford, OH 45150 | 39.14718 | -84.2548 | 159.7 |
| E05 | Mesocosm 5 in the Experimental Stream Facility, 1003 US Route 50, Milford, OH 45150 | 39.14718 | -84.2548 | 159.7 |
| E06 | Mesocosm 6 in the Experimental Stream Facility, 1003 US Route 50, Milford, OH 45150 | 39.14718 | -84.2548 | 159.7 |
| E07 | Mesocosm 7 in the Experimental Stream Facility, 1003 US Route 50, Milford, OH 45150 | 39.14718 | -84.2548 | 159.7 |
| E08 | Mesocosm 8 in the Experimental Stream Facility, 1003 US Route 50, Milford, OH 45150 | 39.14718 | -84.2548 | 159.7 |
| EFRM0.7 | EFLMR RM 0.7 at S. Milford Rd. (Same as EPA EFC site 0) | 39.1533 | -84.2975 | 151.48 |
| EFRM15.6 | East Fork @ State Route 222 | 39.0624 | -84.1789 | 176.47 |
| EFRM34.8 | EFLMR @ Main St. autosampler; Williamsburg Main Street Bridge | 39.0525 | -84.0504 | 240.5 |
| EFRM44.1 | EFLMR RM 44.1 at Blue Sky Park Rd. | 39.1158 | -84.025 | 259.98 |
| EFRM60.6 | EFLMR RM 60.6 at US 50 Fayetteville Lift Station | 39.1865 | -83.9372 | 271.56 |
| EFRM70.1 | East Fork @ Wise Road; Clinton Co Wise Road Bridge | 39.22513 | -83.8255 | 298.38 |
| EFRM75.3 | East Fork @ Canada Rd; Canada Road Bridge | 39.2731 | -83.7812 | 303.87 |
| EFRM9.1 | East Fork @ Stonelick-Olive Branch; upstream Stonelick-Olive Branch bridge | 39.11869 | -84.2086 | 163.06 |
| ERS | East Fork River Supply Monitoring Station in the Experimental Stream Facility, 1003 US Route 50, Milford, OH 45150 | 39.14718 | -84.2548 | 159.7 |
| GRSSY0.2 | Grassy Fork @ GC-Marathon Rd autosampler; Glancy Corner-Marathon Road | 39.1329 | -84.0153 | 262.1 |
| GRSSY3.0 | Grassy Fork Marathon Edenton; 5764 Marathon-Edenton Road | 39.1664 | -84.0092 | 274.31 |
| GRSSY3.2 | Grassy Fork @ 131 Brown County; river left tributary, U.S. 131, Brown County | 39.1741 | -83.9982 | 278.88 |
| HALL0.2 | Hall Run @ Roundbottom Rd | 39.1401 | -84.2593 | 158.8 |
| HWRD0.4 | Howard Run @ Burdsall; Burdsall Road | 39.1239 | -84.0073 | 264.6 |
| NLT | North Lucy Tributary at confluence with South Lucy Tributary | 39.04121 | -84.212 | 250.5 |
| NWT | Cedarville Rd stream crossing north of Main St., Newtonsville OH | 39.185 | -84.095 | 265.47 |
| PLEAS0.2 | Pleasant Run Hutchinson Prop; Hutchinson Road off SR 133 | 39.1084 | -84.0374 | 258.8 |
| POPLR2.1 | Poplar Creek @ Macedonia; Macedonia Road Bridge | 38.982 | -84.101 | 237.4 |
| SHYLR1.7 | Shayler Run @ Baldwin Rd | 39.1183 | -84.2163 | 169.2 |
| SLAB0.5 | Slabcamp Run accessed via Greenbriar Rd; trailhead across from NE WTP pond | 39.0511 | -84.129 | 245.7 |
| ST13.4 | Stonelick Creek RM 13.4 at 727 | 39.2097 | -84.0889 | 262.42 |
| ST17.3 | Stonelick Creek @ Stonelick Tr; u/s from Stonelick Tr Private Drive | 39.232 | -84.044 | 268.2 |
| ST5.7 | Stonelick Creek RM 0.9 at Anstaett Rd. | 39.1429 | -84.1492 | 195.67 |
| STEFLMR | Stonelick Creek at US 50 | 39.1222 | -84.1991 | 164 |
| TBS | Twin Bridges Road, stream crossing just east of horse camp | 39.03806 | -84.1203 | 252.97 |
| ULREY1.5 | Ulrey Run @ St. Rt. 125; u/s from St. Rt. 125 | 39.0017 | -84.1515 | 239.3 |
| 1IN00116601 | USEPA Experimental Stream Fac | 39.1475 | -84.2553 | 160 |
| 1IN00123902 | CECOS International Inc | 39.12694 | -84.0519 | 271 |
| 1PA00005001 | New Vienna WWTP | 39.32889 | -83.7033 | 336 |
| 1PB00001001 | Batavia WWTP | 39.08222 | -84.1767 | 170 |
| 1PB00034001 | Williamsburg WWTP | 39.05667 | -84.0464 | 241 |
| 1PC00005001 | Milford STP | 39.16444 | -84.2836 | 153 |
| 1PD00024001 | Fayetteville Perry Twp WWTP | 39.18361 | -83.9392 | 272 |
| 1PH00031001 | Martinsville-Midland WWTP | 39.32333 | -83.8694 | 297 |
| 1PK00010001 | Middle East Fork Regional WWTP | 39.08861 | -84.1872 | 170 |
| 1PN00000002 | US DOA William H Harsha Lake | 39.02639 | -84.145 | 249 |
| 1PP00020001 | Stonelick State Park Campgrounds | 39.22444 | -84.0578 | 269 |
| 1PR00116001 | Cincinnati Nature Center - Rowe Woods Recreation | 39.12778 | -84.2472 | 233 |
| 1PT00077001 | Clermont NE Local Schools WWTP | 39.12806 | -84.1072 | 256 |
| 1PV00002001 | Holly Towne MHP | 39.00722 | -84.1783 | 262 |
| 1PV00009001 | Orchard Lake MHP | 39.19111 | -84.2464 | 246 |
| 1PV00034001 | Forest Creek MHP | 38.99083 | -84.1589 | 255 |
| 1PV00074001 | Royal Hills MHP | 39.18 | -84.2511 | 232 |
| 1PX00059001 | Locust Ridge Nursing Home Inc | 38.9875 | -84.0225 | 274 |
| 1PZ00029001 | Snow Hill Country Club | 39.34917 | -83.7158 | 335 |
| 1PN00000001 | US DOA William H Harsha Lake | 39.22 | -84.06 | 272 |
| 2PD00024002 | South Pleasant Street Lift Station Combined Sewer Overflow | 40 | -85 | -9999 |
| DAMCS | Located on East Fork Lake at the Log Boom (Lake Side of "DAM" near USACE Flow Control Structure) | 39.02 | -84.1536 | NULL |
| DRI | Site managed by Jake Beaulieu | 40 | -85 | NULL |
| DWF | Drinking Water Treatment Plant, Post Filtration, Sampled by Macke for Elovitz | 39.0369 | -84.1387 | NULL |
| GWS | ESF Ground Water Source | 39.14718 | -84.2548 | NULL |
| REF | Site managed by Jake Beaulieu | 40 | -85 | NULL |
| ROA | Site managed by Jake Beaulieu | 40 | -85 | NULL |
| TMC | Site managed by Jake Beaulieu | 40 | -85 | NULL |
| QAC | Quality Assurance Quality Control Sample | 39.14718 | -81.2548 | NULL |
| GWT | Grailville Treatment Wetland | 40 | -85 | NULL |
| FF | ?? | 40 | -85 | NULL |
Physico-chemical Data Water and Sediment - EXCEL Sheets
Variables
Loaded:
The variables listed below have been loaded and mapped to the existing ODM vocabulary.
VariableCode is a unique identifier and consists of the AnalyteName-Matrix.
| VariableCode | VariableName | Speciation |
|---|---|---|
| DRP-SW | Phosphorus, orthophosphate | P |
| NH4-SW | Nitrogen, NH4 | N |
| NH4-Urease-SW | Ammonium analyzed as an endpoint to 24hr Urease Assay | N |
| NO2-SW | Nitrogen, nitrite (NO2) nitrogen | N |
| NO2-3-SW | Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen | N |
| TDN-SW | Nitrogen, total dissolved | N |
| TDP-SW | Phosphorus, total dissolved | P |
| TN-SW | Nitrogen, total | N |
| TNH4-SW | Nitrogen, NH3 + NH4 | N |
| TNO2-SW | Nitrogen, nitrite (NO2) nitrogen | N |
| TNO2-3-SW | Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen | N |
| TOC-SW | Carbon, total organic | C |
| TP-SW | Phosphorus, total | P |
| TRP-SW | Phosphorus, orthophosphate dissolved | P |
| TUREA-SW | Urea, total | N |
| UREA-SW | Urea, dissolved (filtered) | N |
| DOC-SW | Carbon, dissolved organic | C |
| DOC-AD | Carbon, total organic | C |
| DOC-BP | Carbon, dissolved organic | C |
| DOC-DW | Carbon, dissolved organic | C |
| DOC-IG | Carbon, dissolved organic | C |
| DOC-WW | Carbon, dissolved organic | C |
| DRP-AD | Phosphorus, orthophosphate | P |
| DRP-BP | Phosphorus, orthophosphate | P |
| DRP-DW | Phosphorus, orthophosphate | P |
| DRP-IG | Phosphorus, orthophosphate | P |
| DRP-WW | Phosphorus, orthophosphate | P |
| NH4-AD | Nitrogen, NH4 | N |
| NH4-BP | Nitrogen, NH4 | N |
| NH4-DW | Nitrogen, NH4 | N |
| NH4-IG | Nitrogen, NH4 | N |
| NH4-Urease-AD | Ammonium analyzed as an endpoint to 24hr Urease Assay | N |
| NH4-Urease-BP | Ammonium analyzed as an endpoint to 24hr Urease Assay | N |
| NH4-Urease-DW | Ammonium analyzed as an endpoint to 24hr Urease Assay | N |
| NH4-Urease-IG | Ammonium analyzed as an endpoint to 24hr Urease Assay | N |
| NH4-Urease-WW | Ammonium analyzed as an endpoint to 24hr Urease Assay | N |
| NH4-WW | Nitrogen, NH4 | N |
| NO2-3-AD | Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen | N |
| NO2-3-BP | Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen | N |
| NO2-3-DW | Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen | N |
| NO2-3-IG | Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen | N |
| NO2-3-WW | Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen | N |
| NO2-AD | Nitrogen, nitrite (NO2) nitrogen | N |
| NO2-BP | Nitrogen, nitrite (NO2) nitrogen | N |
| NO2-DW | Nitrogen, nitrite (NO2) nitrogen | N |
| NO2-IG | Nitrogen, nitrite (NO2) nitrogen | N |
| NO2-WW | Nitrogen, nitrite (NO2) nitrogen | N |
| TDN-AD | Nitrogen, total dissolved | N |
| TDN-BP | Nitrogen, total dissolved | N |
| TDN-DW | Nitrogen, total dissolved | N |
| TDN-IG | Nitrogen, total dissolved | N |
| TDN-WW | Nitrogen, total dissolved | N |
| TDP-AD | Phosphorus, total dissolved | P |
| TDP-BP | Phosphorus, total dissolved | P |
| TDP-DW | Phosphorus, total dissolved | P |
| TDP-IG | Phosphorus, total dissolved | P |
| TDP-WW | Phosphorus, total dissolved | P |
| TN-AD | Nitrogen, total | N |
| TN-BP | Nitrogen, total | N |
| TN-DW | Nitrogen, total | N |
| TNH4-AD | Nitrogen, NH3 + NH4 | N |
| TNH4-BP | Nitrogen, NH3 + NH4 | N |
| TNH4-DW | Nitrogen, NH3 + NH4 | N |
| TNH4-IG | Nitrogen, NH3 + NH4 | N |
| TNH4-WW | Nitrogen, NH3 + NH4 | N |
| TN-IG | Nitrogen, total | N |
| TNO2-3-AD | Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen | N |
| TNO2-3-BP | Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen | N |
| TNO2-3-DW | Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen | N |
| TNO2-3-IG | Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen | N |
| TNO2-3-WW | Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen | N |
| TNO2-AD | Nitrogen, nitrite (NO2) nitrogen | N |
| TNO2-BP | Nitrogen, nitrite (NO2) nitrogen | N |
| TNO2-DW | Nitrogen, nitrite (NO2) nitrogen | N |
| TNO2-IG | Nitrogen, nitrite (NO2) nitrogen | N |
| TNO2-WW | Nitrogen, nitrite (NO2) nitrogen | N |
| TN-WW | Nitrogen, total | N |
| TOC-AD | Carbon, total organic | C |
| TOC-BP | Carbon, total organic | C |
| TOC-DW | Carbon, total organic | C |
| TOC-IG | Carbon, total organic | C |
| TOC-WW | Carbon, total organic | C |
| TP-AD | Phosphorus, total | P |
| TP-BP | Phosphorus, total | P |
| TP-DW | Phosphorus, total | P |
| TP-IG | Phosphorus, total | P |
| TP-WW | Phosphorus, total | P |
| TRP-AD | Phosphorus, orthophosphate dissolved | P |
| TRP-BP | Phosphorus, orthophosphate dissolved | P |
| TRP-DW | Phosphorus, orthophosphate dissolved | P |
| TRP-IG | Phosphorus, orthophosphate dissolved | P |
| TRP-WW | Phosphorus, orthophosphate dissolved | P |
| TUREA-AD | Urea, total | N |
| TUREA-BP | Urea, total | N |
| TUREA-DW | Urea, total | N |
| TUREA-IG | Urea, total | N |
| TUREA-WW | Urea, total | N |
| UREA-AD | Urea, dissolved (filtered) | N |
| UREA-BP | Urea, dissolved (filtered) | N |
| UREA-DW | Urea, dissolved (filtered) | N |
| UREA-IG | Urea, dissolved (filtered) | N |
| UREA-WW | Urea, dissolved (filtered) | N |
| TNH4-SWNR | Nitrogen, NH3 + NH4 | N |
| TUREA-SWNR | Urea, total | N |
| TN-SWNR | Nitrogen, total | N |
| TP-SWNR | Phosphorus, total | P |
| TUREA-GW | Urea, total | N |
| TRP-GW | Phosphorus, orthophosphate dissolved | P |
| TNH4-GW | Nitrogen, NH3 + NH4 | N |
| TNO2-DI | Nitrogen, nitrite (NO2) nitrogen | N |
| TNO2-3-SWNR | Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen | N |
| TNO2-SWNR | Nitrogen, nitrite (NO2) nitrogen | N |
| TRP-SWNR | Phosphorus, orthophosphate dissolved | P |
| TNO2-3-GW | Nitrogen, nitrite (NO2) + nitrate (NO3) nitrogen | N |
| TN-GW | Nitrogen, total | N |
| TP-GW | Phosphorus, total | P |
Comments:
1. The variables in the Excel sheets had to be changed to enable them to be committed to the database.
2. Speciation, VariableUnitsID and ValueType had to be checked.
3. VariableNameCV. The following new variable terms have been loaded to this table: urea, dissolved (filtered) and urea, total.
4. File: ESF-DataValues-final.csv

OffsetTypes
Loaded: The following new offset terms have been added:
Tray 1.1A, Tray 1.2A, Tray 1.3A, Tray 1.1B, Tray 1.2B, Tray 1.3B, Tray 2.1A, Tray 2.2A, Tray 2.3A, Tray 2.1B, Tray 2.2B, Tray 2.3B, Tray 3.1A, Tray 3.2A, Tray 3.3A, Tray 3.1B, Tray 3.2B, Tray 3.3B, Tray 4.1A, Tray 4.2A, Tray 4.3A, Tray 4.1B, Tray 4.2B, Tray 4.3B, Tray 5.1A, Tray 5.2A, Tray 5.3A, Tray 5.1B, Tray 5.2B, Tray 5.3B, Center, Left, Right, Below Surface Water, Below Surface
Sensor Data Water-immersed Sensors for Physico-chemical Variables
Loaded: The sensor data files listed below, retrieved from http://66.161.146.122/ESF Field Data/, have been loaded to the ODM.

| File | Last Modified | Size (bytes) |
|---|---|---|
| CEC.csv | Friday, July 08, 2011 1:34 PM | 1549191 |
| CLC.csv | Friday, July 08, 2011 1:10 PM | 880313 |
| FMR.csv | Friday, July 08, 2011 1:00 PM | 74762 |
| HST.csv | Friday, July 08, 2011 1:43 PM | 14801139 |
| HWR.csv | Friday, July 08, 2011 1:10 PM | 476831 |
| KRT.csv | Friday, July 08, 2011 12:57 PM | 4335011 |
| LRC.csv | Friday, July 08, 2011 1:00 PM | 4382015 |
| OWT.csv | Friday, July 08, 2011 12:58 PM | 3695128 |
| SHA.csv | Friday, July 08, 2011 12:58 PM | 4439957 |
| SHC.csv | Friday, July 08, 2011 12:56 PM | 4352695 |
| SHR.csv | Friday, July 08, 2011 1:02 PM | 1482854 |
| SLT.csv | Friday, July 08, 2011 12:55 PM | 3915199 |
| UHL.csv | Friday, July 08, 2011 12:55 PM | 4415080 |
| USR.csv | Friday, July 08, 2011 12:59 PM | 3998020 |

Comments: For all the sensor data files, a column that combines date and time and a second column with OffsetTypeID (3, which describes Below Surface Water in ft) were added.

The sensor data variables listed below have been loaded to the ODM.
Variables
Loaded:

| VariableCode | VariableName | VariableUnitsID | ValueType | TimeSupport | TimeUnitsID | DataType | GeneralCategory |
|---|---|---|---|---|---|---|---|
| Temp | Temperature | 96 | Field Observation | 10 | 104 | Continuous | Water Quality |
| SPCOND | Specific conductance | 192 | Field Observation | 10 | 104 | Continuous | Water Quality |
| DoSat | Oxygen, dissolved percent of saturation | 1 | Field Observation | 5 | 104 | Average | Water Quality |
| DO | Oxygen, dissolved | 199 | Field Observation | 5 | 104 | Continuous | Water Quality |
| PH | pH | 309 | Field Observation | 5 | 104 | Continuous | Water Quality |
| ORP | Reduction potential | 169 | Field Observation | 5 | 104 | Continuous | Water Quality |
| TURB | Turbidity | 221 | Field Observation | 5 | 104 | Continuous | Water Quality |
APPENDIX D: DESCRIPTION OF SHEPHERD'S CREEK DATA LOADED TO THE
ODM
Sources
Loaded: Sources have been successfully loaded to the Shepherd Creek DB.
Comments: Contacts had to be updated.
Total:

Sites
Loaded: Sites have been successfully loaded to the SC DB.
Comments: Sites had to be added. Elevations have been updated.
Total: 6 Sites.
| SiteCode | SiteName | Latitude | Longitude | Elevation_m |
|---|---|---|---|---|
| CON | ConS(Catch) | 39.17329 | -84.5797 | 215 |
| PWR | Pwr2(Sub1) | 39.18222 | -84.5814 | 242 |
| DRI | Dri3(Sub2) | 39.1773 | -84.5785 | 225 |
| ROA | Roa4(Sub3) | 39.17879 | -84.576 | 233 |
| URB | Urb6(Sub4) | 39.18235 | -84.5746 | 242 |
| REF7 | Ref7(Sub5) | 39.17649 | -84.5781 | 223 |
Physico-chemical Data Water - Excel Sheet
Variables
Loaded:
The variables listed below have been loaded and mapped to the existing ODM vocabulary.
VariableCode is a unique identifier and consists of the AnalyteName-Matrix.
| VariableCode | VariableName | Speciation |
|---|---|---|
| TP | Phosphorus, total | P |
| Temp | Temperature | Not Applicable |
| TN | Nitrogen, total | N |
| Ecoli | E-coli | Not Applicable |
| Turb | Turbidity | Not Applicable |
| Alk | Alkalinity, carbonate | CaCO3 |
| DOC | Carbon, dissolved organic | Unknown |
| TOC | Carbon, total organic | Unknown |
| Cl | Chloride | Cl |
| Br | Bromide | Br |
| NO3 | Nitrogen, nitrate (NO3) | NO3 |
| SO4 | Sulfate | SO4 |
| DIN | Nitrogen, dissolved inorganic | Unknown |
| TKN | Nitrogen, total kjeldahl | Unknown |
| NH3-N | Nitrogen, NH3 | N |
| TOP | Phosphorus, total dissolved | Unknown |
| Al | Aluminum, dissolved | Al |
| Ca | Calcium | Ca |
| Cu | Copper, dissolved | Cu |
| Fe | Iron, dissolved | Fe |
| Mg | Magnesium | Mg |
| Mn | Manganese, dissolved | Mn |
| K | Potassium | K |
| Na | Sodium | Na |
| Zn | Zinc, dissolved | Zn |
| oPO4 | Phosphorus, orthophosphate | PO4 |
| SSC | Suspended Sediment Concentration | Not Applicable |
| ZnTR | Zinc, total reactive | Zn |
| MnTR | Manganese, total reactive | Mn |
| FeTR | Iron, total reactive | Fe |
| CuTR | Copper, total reactive | Cu |
| AlTR | Aluminum, total reactive | Al |
Comments:
1. The variables in the Excel sheets had to be changed to enable them to be committed to the database.
2. Speciation, VariableUnitsID and ValueType had to be checked.
3. VariableNameCV. The following new variable terms have been loaded to this table: Zinc, total reactive; Manganese, total reactive; Iron, total reactive; Copper, total reactive; Aluminum, total reactive.
4. File: ShepherdCreek-DataValues-final.csv
APPENDIX E:
CUAHSI: Universities Allied for Water Research
HYDROLOGY OF
JACOB'S WELL SPRING
A tutorial for using HydroDesktop to discover and access water data
Presented at the University of Cincinnati
September 6, 2011
by:
Dr. Tim Whiteaker (twhit@mail.utexas.edu)
Center for Research in Water Resources
The University of Texas at Austin
and
Dr. David Tarboton
Utah State University
Distribution
Copyright © 2011, The University of Texas at Austin.
All rights reserved.
Funding
Funding for this document was provided by the Consortium of Universities for the Advancement of
Hydrologic Science, Inc. (CUAHSI) under National Science Foundation Grant No. EAR-0622374. In
addition, much input and feedback has been received from the CUAHSI Hydrologic Information System
development team. Their contribution is acknowledged here.
Table of Contents
Introduction 4
About Jacob's Well Spring 4
Goals and Objectives 6
Computer and Data Requirements 6
Participating in the Open Source Community 7
Exercise Procedure 8
Getting To Know HydroDesktop 8
Creating a Project 9
Searching for Hydrologic Data 10
Selecting Data for Download 13
Downloading Data 15
Visualizing Time Series Data 16
Labeling Features 18
Delineating Watersheds 20
Searching for Additional Data 21
Adding Data to a Theme 22
Exporting Data 23
Advanced: Analysis with R 25
Enabling the HydroR Plug-in 25
Plotting a Graph with R 26
Analyzing Flow in Jacob's Well Spring 27
Appendix A: R Scripts 30
Script 1: Preparing Inputs for Flow Analysis 30
Script 2: Computing Surface Water Flow Fraction 30
References 32
Introduction
CUAHSI-HIS enables sharing of water data
The Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) Hydrologic
Information Systems project (CUAHSI-HIS) has devoted itself to improving access to time series of
water data. Towards that end, CUAHSI-HIS has developed standards for sharing data that make it easier
to ask for water data and interpret what comes back from a given data source. CUAHSI-HIS also
maintains a catalog of data available from organizations that use CUAHSI-HIS standards, essentially
serving as a search engine for water data. The result is a universal mechanism for accessing time series
data, greatly simplifying the typically laborious task of getting the data you need to do your analysis. But
how do people who are unfamiliar with CUAHSI-HIS standards use this system? That's where
HydroDesktop comes in.
HydroDesktop uses CUAHSI-HIS to help you find water data
HydroDesktop is a free and open source Geographic Information Systems (GIS) application that helps
you discover, use, and manage hydrologic time series data published with CUAHSI-HIS. It handles the
details of how to work with CUAHSI-HIS so that you don't have to. HydroDesktop includes data query,
download, visualization, graphing, analysis and modeling capabilities. The result is a spatially-enabled
system that facilitates the aggregation of observational data describing the water environment.
Let's use HydroDesktop to learn about Jacob's Well Spring
This document presents an exercise that shows how to use HydroDesktop to find water data for Jacob's
Well Spring in Texas. With some simple analysis, you will compare characteristics of this groundwater-
dominated system with those of a nearby river. During the exercise, you will learn about some of the
most commonly-used tools in HydroDesktop.
Related Links:
HydroDesktop - http://hydrodesktop.codeplex.com/
CUAHSI Hydrologic Information System - http://his.cuahsi.org/
About Jacob's Well Spring
The underwater cave known as Jacob's Well emerges in Hays County, Texas, at Jacob's Well Spring
where it serves as one of the primary sources of water for Cypress Creek, which later flows into the
Blanco River. The clear, crisp water cools down many Texans as it moves through the Blue Hole
swimming area near Wimberley, Texas. This karst spring has been impacted in recent years by
development in Hays County and increasing demands on the Middle Trinity Aquifer (Davidson, 2008).
Figure 0-1 Jacob's Well Spring (San Marcos Local News, 2009)
In 2005, a monitoring station was installed at Jacob's Well Spring 18 meters below the ground surface,
reporting flow and temperature conditions at 15-minute intervals. The data for this station are accessible
via the US Geological Survey's National Water Information System (USGS NWIS).
[Figure: cross-sectional diagram with depths marked in meters, showing the location of the USGS gauging equipment]
Figure 0-2 Cross-sectional diagram of Jacob's Well (Davidson, 2008)
Figure 0-3 Jacob's Well Spring Monitoring Station (United States Geological Survey, 2007)
For more information about the spring, please read the 2008 Master's thesis of Sarah Cain Davidson from
The University of Texas at Austin.
Goals and Objectives
The goal of this exercise is to introduce you to the tools and functions available in HydroDesktop that
allow you to search for and synthesize hydrologic time series data in an area of interest. This exercise will
teach you how to find and obtain data for Jacob's Well Spring in Texas and compare data characteristics
using the analysis capabilities of HydroDesktop.
Objectives for this exercise include:
• Find streamflow and temperature data for Jacob's Well Spring in Texas.
• Identify useful time series and download them.
• Visualize time series data in graphs.
• Export time series data for use in other programs.
Computer and Data Requirements
At the time of this writing, HydroDesktop is still in beta, and thus changes are made frequently to
fix bugs and enhance the software. Therefore, it is recommended that you install the version of
HydroDesktop that was used to prepare this exercise: the 1.2.591 Beta Release. This version is only
compatible with a Windows operating system such as Windows XP or Windows 7. You will also need
an Internet connection since you will be accessing online resources to download time series data.
To install HydroDesktop:
1. In a Web browser, navigate to http://hydrodesktop.codeplex.com/.
2. Click the Downloads link near the top left of the page.
3. Find the link for the 1.2.591 Beta Release installer and click it.
4. Read the license and agree to it.
5. Save and run the installer, accepting all defaults. The installer will guide you through the
rest.
An advanced portion of the exercise involves using the R statistical package within HydroDesktop. R is a
separate program from HydroDesktop, so you will need to install it if it is not already installed on your
computer. It is not included with the HydroDesktop installation.
To install R:
1. In a Web browser, navigate to http://www.r-project.org/.
2. Click the download R link.
3. Click the link for a download site near your current location.
4. Click the link for your operating system (most likely Windows if you are using
HydroDesktop).
5. Click the link for the base installation.
6. Click the link to Download R.
7. Run the setup file and follow the instructions to complete the installation.
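If you would like to confirm that R installed correctly, you can open R and check its version from the
console. This quick check is an optional aside, not part of the HydroDesktop setup itself:

# Typed at the R console; prints the installed version, e.g., "R version 2.13.0 ..."
R.version.string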
Participating in the Open Source Community
HydroDesktop is an open source product, which means that anyone can see the source code used to create
the program and contribute to its development. Even if you aren't a programmer, you can still participate
in the discussion forums and post bugs or feature requests in the issue tracker.
The home for HydroDesktop is on CodePlex, a Web site for open source software. To add to the
discussions or post a bug, you must first register for your free CodePlex account. Once you have a
CodePlex account, you can log in at http://hydrodesktop.codeplex.com/ and start contributing. The
community is really what drives open source software development, so this is an exciting opportunity to
make your voice heard.
You are encouraged to provide feedback on any issue or problem you may encounter throughout this
exercise. Feel free to utilize online resources such as the issue tracker on the HydroDesktop Web site
when providing feedback. In this exercise you'll learn how to access these resources directly through
HydroDesktop.
Exercise Procedure
Suppose you live in Hays County in Texas, and for years you have enjoyed taking a dip in the Blue Hole
swimming area along Cypress Creek during hot Texas summers. As population growth and increased
groundwater pumping threaten Jacob's Well Spring, the primary source of water for Cypress Creek, you
decide to learn more about this valuable resource. In this exercise, you'll use HydroDesktop to find
temperature data and see how the spring compares to a nearby river.
IMPORTANT
At the time of this writing, HydroDesktop is still in the beta stages of software development and thus still contains
bugs. We are working hard to fix these bugs, but in the meantime you may want to follow the exercise procedure
closely to minimize the bugs you encounter.
Getting To Know HydroDesktop
Let's open HydroDesktop and get to know its user interface.
1. Open HydroDesktop (Start | All Programs | CUAHSI HIS | HydroDesktop | HydroDesktop).
2. Choose to create a new North America project and click OK.
Take a moment to explore the user interface. As you can see, HydroDesktop looks much like a typical
GIS interface. It supports complex layer symbologies, access to online map services, and custom
programmed tools and plugins. It even comes with some basemap shapefile data which are already added
to the map. What sets HydroDesktop apart from other GIS applications is the ability to query for
hydrologic time series data.
Notice that HydroDesktop presents many of its controls on a ribbon, much like modern versions of
Microsoft Office. The ribbon is organized into tabs which contain groups of buttons and tools. There is
also an orb for accessing basic functions like saving and printing.
Figure 0-1 The HydroDesktop Orb Button
If you have comments or issues as you work through this exercise, you can find helpful resources on the
Help tab. The buttons on this tab let you view documentation, jump to the discussion forums or issue
tracker, email for help, or submit a comment.
3. Click the Help tab in the ribbon to view the buttons available on that tab.
4. Click the Issues button to open the issue tracker on the HydroDesktop Web site.
Figure 0-2 Using the Help Tab to Open the Issue Tracker
5. Close the Issue Tracker Web page.
Creating a Project
HydroDesktop manages your work within projects. A HydroDesktop project file (.hdprj) contains
information about what geospatial layers you have in your map and how those layers are symbolized.
These layers are stored in shapefiles, a widely available GIS data format. The shapefiles such as state
boundaries that are included with HydroDesktop are located in its installation folder, e.g.,
C:\Program Files\CUAHSI HIS\HydroDesktop\maps\BaseData.
The HydroDesktop project file also connects your work to a database (.sqlite file) where temporal data are
stored. This is where the time series data that you download through HydroDesktop are saved. A
relational database is much more efficient at storing time series data than shapefiles, and HydroDesktop
uses a free database called SQLite for this purpose.
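If you are curious about what ends up inside this database, you can open it outside of HydroDesktop
with any SQLite client. The sketch below uses R with the RSQLite package; it is an optional aside, not
part of the exercise, and it assumes the project database is named springs.sqlite (as it will be once you
save the project below) and sits in your R working directory. The table names follow the Observations
Data Model that HydroDesktop uses; you will meet the DataValues table again in the advanced portion
of this exercise.

# Optional sketch: inspect the HydroDesktop project database with R.
# Assumes the RSQLite package is installed: install.packages("RSQLite")
library(RSQLite)
con <- dbConnect(SQLite(), "springs.sqlite")
dbListTables(con)                       # lists tables such as DataValues, Sites, Variables
values <- dbGetQuery(con, "SELECT * FROM DataValues")
head(values)                            # first rows of the downloaded time series values
dbDisconnect(con)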
You can create projects to organize your work, and you can save the project so that you can open it again
later. When you first open HydroDesktop, it sets up a clean map and loads the default system database.
In order to better manage the work in this exercise, you will give this project a name and save it
somewhere meaningful to you.
To save the project and database:
1. Click the HydroDesktop Orb button.
2. In the Orb menu, click Save Project.
3. Choose a location to save your project such as your desktop.
4. Name the project springs and then click Save.
The text in the title bar of the HydroDesktop window should now include the name of your project.
HydroDesktop has also created a database for your project named "springs.sqlite." This database is saved
in the same location as your project file. You are now ready to work within your newly saved project and
database.
Searching for Hydrologic Data
When searching for data in HydroDesktop, you can specify the following filters: region of interest, time
period of interest, data source and variables of interest. HydroDesktop then searches the CUAHSI-HIS
national catalog of known time series data to find locations of time series that match your search.
Locations of time series data that match your search are presented in the map. These results include
information that HydroDesktop can use to connect to each individual data provider for data access. You
can further filter the results and then choose which data you want to actually download and store in your
database.
When you save data to your database, it is stored as a theme. A theme is a collection of hydrologic time
series data that share a common relationship. A theme can be anything from a geographic space (e.g.,
Texas, Colorado) to a hydrologic event (e.g., flood, hurricane) to a combination of both (e.g., Texas
Flood). Simply put, a theme organizes a collection of related time series. HydroDesktop can save data to
a new theme or append data to an existing theme. The workflow for finding data and saving it to a theme
is shown in Figure 0-3.
Figure 0-3 Workflow for Searching for Hydrologic Data: Start → Select Region (where) → Select Variable(s) (what) → Select Time Period (when) → Select Service(s) (who) → Filter Results → End
In this exercise, you will locate streamflow and temperature data for one water year near Jacob's Well
Spring. The county boundary for Hays County is included in the U.S. Counties layer that is already in the
map. You'll use this boundary to restrict the area being searched.
1. Click the Home tab in the ribbon to activate it.
2. In the Search Panel on the right, under the Area tab, choose U.S. Counties in the list of
Active Layers. The map activates this layer while the Search Panel shows the fields in this
layer.
3. Under Select Search Parameter, scroll through the names until you find Hays, TX. Click
Hays, TX to select it. The map zooms to the county and highlights it in blue.
Figure 0-4 Choosing a Search Area
Tip
As you build your query, the current search parameters are shown in the Search Summary at the bottom of the
Search Panel. To give yourself more room to work, you can hide the Search Summary by clicking the Hide Search
Summary button. Click the button again to show the Search Summary once it is hidden.
Next you will tell HydroDesktop the date range of time series that you want. For this exercise, search for
data available in the 2010 water year (i.e., 10/1/2009 to 9/30/2010).
4. In the Search Panel, click the Options tab to activate it.
5. Specify a Start Date of 10/1/2009 and an End Date of 9/30/2010. You can click and type
the numbers in directly, or you can click the drop down arrow next to the date to open an
interactive calendar.
Tip
On the Options tab, you can also select specific Web services (i.e., data sources) to query. The default is to
search all Web services.
Next you will tell HydroDesktop what hydrologic variables you want. To help you in this regard,
HydroDesktop employs a list of official CUAHSI-HIS keywords for hydrologic variables. Data providers
use this list when registering with CUAHSI-HIS. This is a lot easier than typing whatever term the data
provider may be using internally (e.g., 00060 for USGS streamflow).
6. In the Search Panel, click the Keywords tab to activate it.
7. Start typing "streamflow" in the Keywords text box. The list of keywords automatically
selects keywords that match your search.
8. Click Streamflow in the list of keywords.
Figure 0-5 Choosing a Hydrologic Variable Keyword to Search
When you click Streamflow, the keyword list automatically jumps to the term "Discharge, stream." It
just so happens that "Discharge, stream" is the official keyword for what we think of as streamflow.
However, the keyword list also includes synonyms like "streamflow" to make it easier to find the variable
you're after. To the right of the keyword list, you can see where stream discharge fits within the overall
hierarchy of hydrologic variables.
Now that you've identified the right keyword, you'll tell HydroDesktop to add that keyword to the list of
variables for which it will search.
9. In the Search Panel, click the Add button to add "Discharge, stream" into the Selected
Keywords box.
Now you will repeat the process to add the water temperature keyword.
10. In the Keywords text box, start typing "temperature."
11. Click Temperature, water in the list of keywords.
12. In the Search Panel, click the Add button to add "Temperature, water" into the Selected
Keywords box.
With search parameters set, you will now tell HydroDesktop to run the search for data.
13. In the Search Panel, click Run Search.
When you run a search, HydroDesktop asks the CUAHSI-HIS national catalog for descriptions of time
series that match your search criteria. At this point, your software is using a remote online resource and
bringing back data to display in your map. After HydroDesktop has finished searching for time series, it
displays the locations of time series that fit your search in a map layer called "Search Results." Note that
there will be several "dots" at a single location in the map if the site represented by a given dot measures
more than one time series that matches your search. In other words, each dot in the map represents a time
series of data. Different symbols for the dots indicate different data sources.
Figure 0-6 Locations of Streamflow and Water Temperature Observations in Hays County, TX
While the search may have seemed fast, remember that your map is only showing where time series of
interest are located, and that you haven't actually downloaded any time series values yet. Now you can
begin to refine these search results to locate time series that you actually want to download and save to
your database.
Selecting Data for Download
To help you identify time series of interest, HydroDesktop includes data and tools to give you a spatial
context for the data. One of these is the ability to display online basemaps from ESRI, Bing, and
OpenStreetMap. These are beautiful cartographic maps cached at multiple scales which are accessed in
real time as you move around in the HydroDesktop map. For this exercise, you will enable the ESRI
Hydro Base Map. This map shows a nice blend of hydrologic features and administrative boundaries.
To enable the basemap:
1. On the Home tab in the ribbon, find the Online Basemap group. Click the drop down list of
basemaps and choose ESRI Hydro Base Map.
Figure 0-7 Enabling an Online Basemap
In addition to providing spatial context, you can see that this basemap can help you produce a more
aesthetically pleasing printed map.
For this exercise, you will work with the sites and variables shown in Table 0-1. You will select the
features that represent these time series so that HydroDesktop knows which time series you want to
download. These time series are located at or near Jacob's Well Spring.
Table 0-1 Selected Time Series in the San Marcos River Basin

SiteName                              VarName                              DataType
Blanco Rv nr Kyle, TX                 Discharge, cubic feet per second     Average
Jacobs Well Spg nr Wimberley, TX      Discharge, cubic feet per second     Average
Jacobs Well Spg nr Wimberley, TX      Temperature, water, degrees Celsius  Average
Blanco Rv at Halifax Rch nr Kyle, TX  Temperature, water, degrees Celsius  Average
To select time series for download:
1. In the Map Contents on the left, right-click on the Search Results layer name and click View
Attributes.
Figure 0-8 Viewing the Search Results Layer Attributes
The Attribute Table Editor opens showing you descriptions of time series in the Search Results layer.
You can scroll through the table and resize columns to see the information. Some key columns to note
are:
• SiteName - The name of the monitoring point where the time series is recorded.
• VarName - The name of the variable represented by the time series.
• DataType - Some sites report several statistics of the data. For example, at Jacob's Well Spring, you
can find minimum, maximum, and average streamflow values computed on a daily time step.
• StartDate, EndDate, ValueCount - These fields give you a sense of the overall period of record for
a time series and the number of values in that period of record.
As indicated in Table 0-1, you will focus on average streamflow and temperature for Jacob's Well Spring,
and you'll also choose a couple of time series from nearby sites for comparison.
2. In the Attribute Table Editor, use the values in the SiteName, VarName, and DataType
columns to locate rows that match the values in Table 0-1. While holding down the CTRL
key, left click on these rows to select them.
Figure 0-9 Selecting Time Series for Download
3. Close the Attribute Table Editor.
With the items selected, you are ready to download the data.
Downloading Data
Recall that when you save data, it is organized into a Theme and stored in the database associated with
your HydroDesktop project. You'll now download the data that you selected.
To download data:
1. In the Search Panel, in the Results tab, type "Hays County Data" in the New Theme text box,
and make sure New Theme is selected.
2. Click Download Data.
Figure 0-10 Saving a Theme
A download manager opens to show progress of the download.
Figure 0-11 Download Manager
3. Dismiss the message box and hide the download manager when it is finished.
Once the download is complete, the new theme is shown under the Themes group in the Map Contents. If
you create additional themes, they will also appear in the Themes group.
Now that you've downloaded the data, you can view the data in both tabular and graph form.
Visualizing Time Series Data
HydroDesktop takes a series-centric view of temporal data, meaning that it provides access to the data at
the time series level. An example of a time series is all of the temperature values measured at a certain
point on the Blanco River. Let's take a look at the time series that you just downloaded.
To visualize time series data in HydroDesktop:
1. Click the Graph tab in the ribbon to activate it.
On the left you will see a list of all the time series in your database. You could use filters to restrict the
time series that are shown, but you only retrieved a handful in this exercise so it's fine to leave the default
view.
2. Click to place a check next to Temperature at the Blanco River at Halifax Ranch.
From the graph you can see how the temperature in the water changes with the seasons throughout the
year. Now let's compare this time series with the one for Jacob's Well Spring.
3. Place a check next to Temperature at Jacob's Well Spring.
HydroDesktop allows you to visualize multiple time series on the same graph. The plot axes
automatically adjust to fit your data. In this example, there is a dramatic difference between the two
temperature time series. The one for Jacob's Well Spring shows much less variation throughout the year
than the one for the Blanco River at Halifax Ranch.
Figure 0-13 Examining Changes in Flow and Temperature (temperature drops in the spring coincide with increases in flow)
7. Uncheck the two temperature time series, and place a check next to Discharge at the
Blanco River near Kyle.
The flow of the Blanco River dwarfs that of Jacob's Well Spring, but you can still see increases in flow in
the Blanco River at about the same time as those observed in the spring. Also notice the peak flow in
September 2010, the result of Tropical Storm Hermine as it swept through Texas.
Labeling Features
Are you wondering where the time series that you just graphed are located? In this portion of the
exercise, you will add labels to the map to identify site locations.
1. Click the Home tab in the ribbon to activate it.
2. In the Map Layers, uncheck Search Results to turn off that layer. Now you should only see
the three sites for which you downloaded data.
3. In the Map Layers, right-click on the Hays County Data layer name and click Labeling | Label
Setup.
Figure 0-14 Accessing Label Setup
4. In the Feature Labeler dialog, in the list of Field Names, double-click SiteName to use that
field for the labels. [SiteName] will be added to the text box near the bottom of the dialog.
Figure 0-15 Choosing a Field for Labels
5. Click Apply, and then click OK. Your map should now be labeled according to the settings
you have just chosen. You can clearly see Jacob's Well Spring and the two sites along the
Blanco River.
Figure 0-16 Labeling USGS Sites
Delineating Watersheds
At this point, it might be nice to see precipitation data in this watershed for this water year. You can
delineate watersheds for any river in the conterminous U.S. using a Web service provided by the EPA.
All you have to do is click on the desired watershed outlet location in the map, and then HydroDesktop
sends that point location to the EPA service. The service figures out which National Hydrography
Dataset (NHD) reach the clicked point is closest to, and then finds all of the catchments draining to that
reach. The catchments are merged into a single watershed and returned to HydroDesktop.
Note that the watershed returned is for the outlet of the entire reach, so if the point you clicked isn't at the
reach outlet, then the resulting watershed will include some additional area downstream of your clicked
point. Thus, this tool is useful for helping to identify an area of interest but should not be used to
determine watershed parameters such as area. Future versions of the tool will support more precise
delineation.
In this portion of the exercise you will delineate a watershed for the area draining to Jacob's Well Spring.
The watershed delineation tool is part of a HydroDesktop extension called EPA Delineation.
To delineate a watershed for the area draining approximately to the Jacob's Well Spring location:
1. On the Home tab, in the EPA Tool panel, click Delineate to activate the delineation tool.
2. The tool prompts you for where to save the resulting datasets. Accept the defaults by clicking
OK.
3. Click on the site location for Jacob's Well Spring.
After a moment, the watershed is shown in the map. The NHD reaches flowing to the point that you
clicked and the point itself are also shown.
Figure 0-17 Delineated Watershed
If you didn't get the correct watershed delineated then you can activate the tool and try again. It's OK to
overwrite previous results.
Note
Recall that the watershed is actually delineated for the outlet of the nearest NHD reach, which happens to be
very close to Jacob's Well Spring in this example. Also be aware that the surface watershed you just delineated
defines some but not all of the area contributing water to the aquifer for Jacob's Well Spring. However, the area
will suffice for this exercise which merely demonstrates how to delineate watersheds and use those watersheds
to find data.
With the watershed delineated, now you're ready to search for data in this watershed.
Searching for Additional Data
You can append additional data to your current theme by performing another search. In this exercise, you'll
search for daily precipitation data in the watershed for Jacob's Well Spring.
To demonstrate how to choose a particular data source for a search, you will select to search for
precipitation data from the National Weather Service.
1. Click the Home tab in the ribbon to activate it if it is not already active.
2. In the Search Panel, under the Area tab, choose watershed as the active layer.
3. Under Select Search Parameter, select the only item present.
Figure 0-18 Selecting a Watershed to Search
4. In the Options tab, click the Show Web Service Selection Panel checkbox.
5. Click Select None.
6. In the list of services, place a check next to NWS-WGRFC Daily Multi-sensor Precipitation
Estimates. This service provides precipitation data from the National Weather Service West
Gulf River Forecast Center.
Figure 0-19 Selecting a Single Service to Search
7. In the Search Panel, click the Keywords tab to activate it.
8. In the Selected Keywords list, select all keywords from the previous search.
9. Click the Remove button to remove those keywords from the search.
Figure 0-20 Removing Keywords from a Search
10. In the Keywords text box, start typing "precipitation."
11. Click Precipitation in the list of keywords.
12. In the Search Panel, click the Add button to add "Precipitation" into the Selected Keywords
box.
13. In the Search Panel, click Run Search.
When the search finishes, you'll see some regularly spaced dots over the watershed. These dots represent
the centroids of NEXRAD HRAP cells. In other words, this Web service provides discrete point
locations where you can sample this gridded rainfall dataset. Each of these points is like a
virtual rain gauge. For this exercise, you'll just pick one near Jacob's Well Spring.
Adding Data to a Theme
1. In the ribbon, click the Select tool to activate it.
2. In the Map Layers, make sure the Search Results layer is the only layer selected. The Select
tool works with the selected layer in the list of map layers.
3. Draw a box around one or more precipitation sites to select time series for download.
Figure 0-21 Graphically Selecting a Time Series for Download
4. In the Search Panel, in the Results tab, select Existing Theme and choose Hays County Data
in the list.
5. Click Download Data. Hide the download manager once the download is complete.
The precipitation data are added to your theme. Now let's view the results.
6. Click the Graph tab.
7. Plot a graph with one of the precipitation time series you just downloaded and Discharge
from Jacob's Well Spring. Notice how quickly Jacob's Well Spring responds to rainfall
events.
Figure 0-22 Rainfall and Streamflow at Jacob's Well Spring
While HydroDesktop contains additional analysis capabilities of its own, it can also export data to a text file for
use in other programs.
Exporting Data
HydroDesktop can export data to a variety of output file types for further study and analysis. For
example, you can export individual time series by placing a check next to them and then right-clicking
them in Table or Graph view. For this exercise, you will export all time series for an entire theme.
To export time series data for a theme:
1. Click the Table tab in the ribbon to activate it.
2. In the Data Export group, click Export.
Figure 0-23 Exporting Data
This tool exports all data in a theme to a delimited text file. In the Export To Text File dialog, notice that
your theme name is already selected by default. You can also control the fields that are included in the
export and choose a delimiter. For this exercise, you will accept all defaults to produce a comma
delimited text file.
3. In the Export To Text File dialog, specify the output file location and name.
4. Click Export Data.
5. Close the Export To Text File dialog when it is finished.
Figure 0-24 Setting Export Options
6. Find the file on your computer and open it to verify that the data were exported.
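Because the advanced portion of this exercise uses R, it is worth noting that the exported file can be read
straight back into R (or any other analysis package). A minimal sketch, assuming you named the export
hays_county_data.csv and accepted the default comma delimiter; the file name is an assumption, not
something the dialog requires:

# Read the exported theme back into R (hypothetical file name)
exported <- read.csv("hays_county_data.csv")
str(exported)    # shows the exported fields and their types
nrow(exported)   # total number of data values across the theme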
Congratulations! With your theme data in hand, you have completed the exercise and learned how to
use HydroDesktop to discover and access water data. Feel free to experiment with other functionality
such as creating and printing a map, and be sure to give feedback using the Help tab. This concludes the
main portion of the exercise. For an example of more advanced analysis, continue with the section
below to learn how to use the R statistical environment with HydroDesktop.
Advanced: Analysis with R
The work above has illustrated temperature, precipitation and discharge data and suggested that variations
in temperature in Jacob's Well Spring may be related to the mixing of surface and subsurface water
sources. In this section, you will use the HydroR plug-in in HydroDesktop to explore this phenomenon.
The HydroR plug-in provides an interface between HydroDesktop and the free R statistical software
environment.
Enabling the HydroR Plug-in
Once R is installed, you will need to enable the HydroR plug-in in HydroDesktop.
To enable the HydroR plug-in:
1. Click the HydroDesktop Orb button.
2. In the Orb menu, click Extensions. In the extensions list, click HydroR. This adds a HydroR
tab to the ribbon. You may have to hover the mouse on and off of Extensions to make the
list open.
3. Click the HydroR tab in the ribbon to activate it.
4. In the HydroR tab, click Start R.
5. If prompted for the path to R.exe, enter the path where it was installed on your computer.
Note that R may include more than one R.exe file. The one you typically need is in the
bin\i386\ folder, as in C:\Program Files\R\R-2.13.0\bin\i386\R.exe. Click OK once the path
is entered. HydroDesktop will remember this path the next time you use the HydroR
extension.
6. HydroDesktop needs some additional R libraries. If you are using the HydroR extension for
the first time, you may be prompted for a CRAN mirror to download these libraries. Select
the mirror closest to your location and click OK. The appropriate libraries are downloaded
automatically.
R should now start and give you a blank R script in the top panel and R Console in the bottom panel.
Standard R commands can be entered in the R Console. HydroR makes it easy to provide R access to the
data you have downloaded with HydroDesktop using the buttons on the ribbon in the HydroR tab.
Figure 0-1 HydroR Layout
Plotting a Graph with R
To get familiar with how HydroR works, you'll plot a hydrograph for Jacob's Well Spring.
To plot a graph in HydroR:
1. In the HydroR tab, in the list of time series on the left, select a time series that you would
like to import as a data frame into R. For this exercise, select discharge at Jacob's Well
Spring.
2. In the ribbon, click Generate R code. The R code to get the selected data series is entered
into the script.
3. Click Send All. This sends the script text to the console and executes it. The result is an
object named data0, which contains a list of R data frames.
4. In the R Console, enter labels(data0) to see a list of data frames that make up data0.
These data frames are basically tables from your HydroDesktop database. The key table we're interested
in for this exercise is the DataValues table.
5. To see the first six rows of DataValues, in the R Console, enter head(data0$DataValues)
(R is case sensitive, so type the command exactly as it appears in this text). You may have to
scroll up in the R Console to see the full result.
You'll use the LocalDateTime and DataValue columns to provide data for the graph.
6. To make it easier to access this streamflow time series, in the R Console, enter
Q.jacobs=data0$DataValues. This assigns the DataValues data frame to a variable
named Q.jacobs.
7. In the R Console, enter
plot(Q.jacobs$LocalDateTime,Q.jacobs$DataValue,type="l") (the type is a
lower case L, not a one). A time series plot of the data should appear in an R graphics
window, demonstrating that the full capability of R is available to work with the data that
has been imported.
Figure 0-2 Jacob's Well Spring Hydrograph Plotted Using R
Let's add a title to this graph.
8. In the R Console, enter data0$Variable to see all attributes for the Variable table.
9. In the R Console, enter title(data0$Variable$VariableName) to add the variable
name as the title for the plot.
10. After verifying that the title was added, close the R graphics window showing the graph.
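As an aside, the graph you just made can also be written straight to an image file for use in a report.
This is standard base R rather than a HydroR feature, and the file name below is arbitrary:

# Write the hydrograph to a PNG file in the R working directory
png("jacobs_well_hydrograph.png", width=800, height=600)
plot(Q.jacobs$LocalDateTime, Q.jacobs$DataValue, type="l",
     xlab="Date", ylab="Discharge (cfs)")
title(data0$Variable$VariableName)
dev.off()  # close the device so the file is written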
Analyzing Flow in Jacob's Well Spring
Let's use mixing theory to estimate the fractions of Jacob's Well Spring flow that come from the surface
and the subsurface, based on temperature. Assume the surface source has a temperature equal to the
temperature in the Blanco River, and that the groundwater source is at a fixed temperature. The
following equations then apply.

Energy balance:  Q T = Q1 T1 + Q2 T2
Mass balance:    Q = Q1 + Q2

where Q is discharge in Jacob's Well Spring, T is temperature in Jacob's Well Spring, T1 is the
temperature of the surface source (assumed equal to the Blanco River temperature), and T2 is the
temperature of the subsurface source (assumed constant and taken as the average of the last 60 days).
Q1 and Q2 are the unknown discharge contributions from the surface and subsurface sources,
respectively (Figure 0-3).
Figure 0-3 Surface and subsurface contributions to Jacob's Well Spring outflow and temperature
These are two linear equations in two unknowns and are easily solved: substituting Q2 = Q - Q1 into the
energy balance gives Q T = Q1 T1 + (Q - Q1) T2, and rearranging yields

Q1/Q = (T - T2) / (T1 - T2)
The R scripts in Appendix A use data from HydroDesktop to solve this equation. You'll assign the
relevant time series to simple variable names and then use the R scripts to plot a graph representing the
amount of flow in Jacob's Well Spring inferred as coming from the surface.
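Before running the full scripts, it may help to work the mixing equation once with made-up numbers.
Suppose the spring is at T = 22 degrees C, the Blanco River (surface source) is at T1 = 30 degrees C, and
the groundwater source is at T2 = 21 degrees C. Then Q1/Q = (22 - 21)/(30 - 21) = 1/9, so about 11% of
the spring flow would be inferred to come from the surface. The same arithmetic in R:

# Worked example of the mixing solution with hypothetical temperatures (degrees C)
T  <- 22   # observed temperature at Jacob's Well Spring
T1 <- 30   # surface source temperature (Blanco River)
T2 <- 21   # subsurface source temperature (assumed constant)
(T - T2) / (T1 - T2)   # fraction of flow from the surface, about 0.11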
To use the R scripts to compute fractional flow:
1. In the same manner that you created the Q.jacobs variable above and assigned it to be the
discharge at Jacob's Well Spring, create and assign the following R variables (remember, the
variable names are case sensitive). In other words, for each variable, clear the R script
panel, select a series, generate the R code, send it to R, and assign the variable in the R
Console.
a. Q.blanco - Discharge at the Blanco River near Kyle
b. t.blanco - Water temperature at the Blanco River at Halifax Ranch
c. t.jacobs - Water temperature at Jacob's Well Spring (see the IMPORTANT note below)
IMPORTANT
You may encounter a bug when generating R code for water temperature at Jacob's Well Spring. In the R script
panel, if the endDate is not "2010-09-30" then edit the script to use "2010-09-30" before sending the script to the
R Console.
2. Enter Script 1 found in Appendix A into the R Console to execute the script. This script
prepares inputs for the analysis and plots graphs of the input temperature and flow.
3. Once you have reviewed the graphs of temperature and streamflow generated by Script 1,
close the two R Graphics Windows containing the graphs.
4. Enter Script 2 found in Appendix A into the R Console to execute the script. This script
smooths the temperature time series and then performs the analysis to determine the
fraction of flow in Jacob's Well Spring from surface water.
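If the lowess smoother is new to you, the short standalone snippet below (synthetic data, independent of
the exercise) shows what it does: it fits a smooth curve through noisy points, with the f parameter
controlling how wide a window of neighboring points is used.

# Standalone lowess demonstration on synthetic data
set.seed(1)
x <- 1:100
y <- sin(x/10) + rnorm(100, sd=0.3)  # a noisy sinusoidal signal
sm <- lowess(x, y, f=0.1)            # smaller f follows the data more closely
plot(x, y)                           # raw points
lines(sm, col=2, lwd=2)              # smoothed curve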
The resulting graphs show smoothed temperature time series and the portion of flow in Jacob's Well
Spring inferred to be from the surface (the red line in the graph). Note that the analysis requires
differences between the assumed groundwater temperature and surface water temperature, so the graph
will be missing segments when those temperatures are nearly the same.
Figure 0-4 Fractional Flow in Jacob's Well Spring
Congratulations! You have completed the exercise and seen how advanced analysis environments such as
R can be integrated into HydroDesktop using the power of plug-ins. This concludes the advanced portion
of the exercise.
Appendix A: R Scripts
Script 1: Preparing Inputs for Flow Analysis
# SCRIPT 1: PREPARING INPUTS FOR FLOW ANALYSIS
# This code plots input time series of flow and temperature.
# The code assumes the following variables have already been set to
# the DataValues data frame for these time series:
#   Q.jacobs -- Discharge at Jacob's Well Spring
#   Q.blanco -- Discharge at the Blanco River near Kyle
#   t.jacobs -- Water temperature at Jacob's Well Spring
#   t.blanco -- Water temperature at the Blanco River at Halifax Ranch
# The code handles intermittent missing values.

# Build a daily date vector for the 2010 water year.
# Start one day earlier because queries seem to be based on UTC.
DT = seq(from=as.Date("2009-09-30"), to=as.Date("2010-09-30"), by=1)

# Align Blanco River temperature (T1) with the date vector
ind = match(t.blanco$LocalDateTime, DT)
T1 = rep(NA, length(DT))
T1[ind] = t.blanco$DataValue

# Align Jacob's Well temperature (T) with the date vector
ind = match(t.jacobs$LocalDateTime, DT)
T = rep(NA, length(DT))
T[ind] = t.jacobs$DataValue
T2 = T[1]  # the first value

# Align Jacob's Well discharge (Q) with the date vector
ind = match(Q.jacobs$LocalDateTime, DT)
Q = rep(NA, length(DT))
Q[ind] = Q.jacobs$DataValue

# Plot the two temperature series, then the flow in a new window
plot(DT, T1, type="l", ylab="T")
lines(DT, T, col=2)
legend("bottomright", c("T Blanco","T Jacobs"), col=c(1,2), lty=1)
windows()
plot(DT, Q, type="l")
Script 2: Computing Surface Water Flow Fraction
# SCRIPT 2: COMPUTING SURFACE WATER FLOW FRACTION
# This script solves the equation Q1/Q = (T-T2)/(T1-T2)
# and plots a graph showing the portion of flow inferred
# to be directly from surface water sources in Jacob's
# Well Spring.
# Before running this script, you must run SCRIPT 1:
# PREPARING INPUTS FOR FLOW ANALYSIS

# Smooth the Blanco River temperature data using lowess
ind = !is.na(T1)  # array indices of non-missing values
T1l = lowess(DT[ind], T1[ind], f=0.1)
plot(DT, T1, type="l", ylab="Degrees C")
lines(T1l, col=2)
lines(DT, T, col=3)
legend("bottomright", c("Blanco T","Smoothed Blanco T","Jacobs T"), col=c(1:3), lty=1)
title("Temperatures")

# Match the dates of the lowess output for use in calculations
ind = match(T1l$x, DT)
T1s = rep(NA, length(DT))
T1s[ind] = T1l$y

# For results to be reasonable, T1 and T2 have to be different.
# Only evaluate answers when T1 and T2 differ by at least 3 degrees.
# Also only accept positive answers.
# Calculate T2 as the average over the last 60 days.
T2p = c(rep(T[1], 59), T)
T2 = rep(NA, length(DT))
for(i in 1:length(DT)) T2[i] = mean(T2p[i:(i+59)], na.rm=TRUE)

Q1f = (T - T2) / (T1 - T2)  # apply the mixing solution equation

# Eliminate answers when the temperature difference is less than 3 degrees
indna = abs(T1 - T2) < 3
Q1f[indna] = NA
indna = Q1f < -0.05  # eliminate large negative values
Q1f[indna] = NA

# Plot the results
windows()
plot(DT, Q, type="l", ylab="cfs", lwd=2)
lines(DT, Q*Q1f, col=2, lwd=2)
legend("topright", c("Flow","Inferred From Surface"), lty=1, col=c(1,2))
title("Jacobs Well Discharge")
References
Davidson, S. C. (2008). Hydrogeological characterization of baseflow to Jacob's Well spring, Hays County, Texas
(Master's thesis). Retrieved November 2, 2010, from Hays Trinity Groundwater Conservation District Web
site: http://haysgroundwater.com/files/Documents/Davidson-08_thesis_Cypress_Crk_Jacobs_Well.pdf.
San Marcos Local News. (2009, March 10). Jacobs Well area to hold incorporation vote. Retrieved November 2,
2010, from San Marcos Local News Web site: http://www.newstreamz.com/2009/03/10/area-around-
jacobs-well-to-hold-incorporation-election/.
United States Geological Survey. (2007, March 26). National Surface Water Conference and Hydroacoustics
Workshop. Retrieved November 1, 2010, from United States Geological Survey Web site:
http://water.usgs.gov/osw/images/2007_photos/Hydroacoustics.html.