Surface Waters Western Pilot Study: Ecologically Relevant Quantification of Streamflow Regimes in Western Streams


              United States
              Environmental Protection
              Agency
Office of Research and
Development
Washington, DC 20460
EPA 600/R-06/056
 December 2006
 www.epa.gov
              Environmental Monitoring and
              Assessment Program

-------
Ecologically Relevant Quantification of
Streamflow Regimes
in Western Streams
by
Valerie J. Kelly 1
Steven M. Jett2
1 U.S. Geological Survey
Oregon Water Science Center
3200 SW Jefferson Way
Corvallis OR 97333


2 INDUS Corporation
200 SW 35th Street
Corvallis OR 97333

-------
Notice

   This analysis was funded via an Interagency Agreement between the U.S. Environmental
Protection Agency (EPA) and the U.S. Geological Survey (DW-14-92156001-0), and in part by EPA
contract with the Facilities Administration and Information Resources II (Fair II) (Contract 68-W-01-
032).
   This work was conducted to support the Environmental Monitoring and Assessment Program
(EMAP). It has been subjected to peer and administrative review by the EPA, and approved for
publication as an EPA document.
   The correct citation for this document is:
Kelly, V.J. and S.M. Jett. 2006. Ecologically-relevant quantification of streamflow regimes in western
    streams. EPA/600/R-06/056. U.S. Environmental Protection Agency, Washington, D.C.


Disclaimer:  Reference herein to any specific commercial products, process, or service by trade name,
trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement,
recommendation, or favoring by the United States Government. The views and opinions of authors
expressed herein do not necessarily state or reflect those of the United States Government and shall not
be used for advertising or product endorsement purposes.

-------
Abstract

   This report describes the rationale for and application of a protocol for estimation of ecologically-
relevant streamflow metrics that quantify streamflow regime for ungaged sites subject to a range of
human impact. The analysis presented here is focused on sites sampled by the U.S. Environmental
Protection Agency as part of their Environmental Monitoring and Assessment Program in the upper
Missouri Basin and Oregon. Streamflow data are provided by the U.S. Geological Survey. Specific
guidance is provided for selection of gage sites, development of probabilistic frequency distributions
for annual peak- and 7-day low-flow events, and regionalization of the frequency curves based on
multivariate analysis of watershed characteristics. Evaluation of the uncertainty associated with the
various components of this protocol indicates that the results are reliable for the intended purpose of
hydrologic classification to support ecological analysis. They should not be considered suitable for
more standard water-resource evaluations that require greater precision, especially those focused on
management and forecasting of extreme flow conditions.

-------
Executive Summary

The purpose of this report is to summarize the regional analysis techniques used to estimate
ecologically relevant streamflow metrics that quantify the streamflow regime for sites sampled by the
U.S. Environmental Protection Agency's (EPA) Environmental Monitoring and Assessment Program
(EMAP) in the upper Missouri Basin and Oregon. The report is intended to serve as guidance for
application of these techniques to extend the scope of analysis to other areas.
Analysis of streamflow patterns is frequently based on annual streamflow frequency information
that, in the United States, is supplied primarily by data from the U.S. Geological Survey (USGS)
streamflow database. The necessary data are available only at sites where long-term gaging stations are
located, however, and many streams are ungaged. Regional frequency analysis provides a way to
estimate the frequency distributions for a variety of flood and low-flow metrics at ungaged sites, based
on pooled data from gaged sites within a homogeneous region (Riggs, 1973; Stedinger and others,
1993). In this study, an index procedure was used whereby the regional frequency curve is scaled by a
site-specific scaling factor (termed the "index flow," e.g., the mean peak or low-flow value for the site)
(Hosking and Wallis, 1993). This curve defines the dimensionless frequency distribution for the
region, from which specific quantile estimates for ungaged sites can be determined based on estimated
values for the index flow for those sites. The steps involved in regional index-flow analysis include the
following: (1) identification of homogeneous regions, (2) choice and estimation of a regional
frequency distribution, and (3) estimation of the index flow.
A critical assumption for any index-flow procedure is that the scaled frequency distributions for all
sites within the region are similar. Geographically contiguous regions have frequently been defined
according to physiographic and political boundaries, which do not always correspond to similarities in
hydrologic response (Simmers, 1975). For this analysis, a "region of influence" approach was used to
group sites according to basin features that are presumed to control streamflow characteristics
(Wiltshire, 1986). In this approach to regionalization, every site potentially has a unique set of basins
defined as its hydrological "region" that is not necessarily spatially contiguous (Burn, 1990a, 1990b;
Zrinji and Burn, 1994). These sites were selected from the correspondence between selected
hydrologic and watershed characteristics, as determined by canonical correlation analysis (CCA)
(Ribeiro-Corea and others, 1995). CCA provides canonical scores that reflect the correlation structure
between the two sets of variables for gaged sites. These scores, in turn, were used to determine the
associated score on the hydrologic vector from watershed data for individual ungaged sites where no
hydrologic data were available. An ellipsoidal region around each hydrologic score was then identified
with a defined level of confidence based on a chi-square distribution. This region contained the basins
that compose the corresponding so-called hydrologic neighborhood or site-specific region for the target
site.
A second important assumption of regional streamflow analysis is that watersheds with similar
attributes, especially regarding climate, topography, vegetation, soils, and geology, will exhibit similar
streamflow patterns (Riggs, 1973; Cunnane, 1988). The extent of human activity and water use in the
basin is also an important factor, especially related to low-flow conditions, although it has not often
been included in regional analysis because of the difficulty in quantifying these effects (Smakhtin,
2001). For this analysis, data quantifying the extent and type of human land and water use were
included in the watershed characterization to improve the identification of appropriate regions. Metrics
describing water use were derived from active withdrawal permits from State water resource agencies.
These did not represent estimates of use from actual measurements or surveys of reported use; they
simply describe the maximum allowable volumes of water that can be withdrawn within each
watershed. As a result, these were considered indices of potential water pressure from human use,
rather than precise descriptions of actual water use.
After identifying the appropriate hydrologic region for each site, the next step was to select an
appropriate frequency distribution to describe the regional frequency curves. Because of the focus on
extreme events, the generalized extreme value distribution (GEV), based on probability-weighted
moments (PWM), was selected for this analysis (Greenwood and others, 1979; Landwehr and Matalas,
1979; Hosking and others, 1985). This procedure is flexible and easy to implement, and has proven to
be especially reliable when regions are not homogeneous (Lettenmaier and others, 1987). Additionally,
the index flood PWM/GEV approach, based on hydrologic regions determined by canonical
correlation analysis, was found to give the best results in an inter-comparison study of different
procedures for rivers in Canada (GREHYS, 1996a, 1996b).
The final step was the estimation of the index flow for ungaged sites by regression against
appropriate watershed and climatic attributes (Riggs, 1973).
Any regional analysis is associated with an unavoidable level of uncertainty in the predictions,
which should be considered carefully to understand the limitations of the analysis. The evaluation of
errors associated with the streamflow metrics is, accordingly, a critical element of this protocol. The
initial sources of error include the selection of streamflow data and the subsequent application of the
GEV theoretical frequency distribution. These errors were minimized by trend analysis and setting
criteria for acceptable goodness of fit for the frequency analysis. Further errors are associated with the
regionalization process; these were minimized by careful selection of variables for CCA and stringent
rules for the definition of regions. Nonetheless, it is not possible to eliminate all uncertainty in the
predicted metrics, especially for sites such as these that are subject to a range of human activity that
cannot be precisely quantified. As a result, the estimates derived from this analysis should clearly not
be assumed appropriate for other types of water-resource evaluations, especially those focused on
management and forecasting of extreme flow conditions that require greater precision. Nonetheless,
they can be considered suitable for the stated purpose, that is, hydrologic classification of stream
systems to support ecological analysis.

-------
Table of Contents

Notice
Abstract
Executive Summary
Figures
Tables
Acknowledgements
Introduction
Purpose and Scope
Rationale for Approach
Methods
Data Assembly
USGS Streamflow Data
Basin Characteristics
Regionalization Process
Frequency Analysis
Canonical Correlation Analysis
Estimation of Metrics for EMAP sites
Reliability and Limitations
Summary and Conclusions
References
Appendix 1. Description of watershed characteristics used in regional frequency analysis
Appendix 2. Results from canonical correlation analysis, by aggregated large-scale region
Figures

Figure 1. Study area map for the upper Missouri River Basin, showing EMAP sites and USGS gages in
the context of large-scale aggregated ecoregions
Figure 2. Study area map for Oregon, showing EMAP sites and USGS gages in the context of large-scale
aggregated ecoregions
Figure 3. Sample of range for goodness of fit for theoretical frequency distributions for annual peak-
and 7-day low-flow data from USGS gages
Figure 4. Relation between canonical variates 1 and 2 for peak-flow frequency analysis, by aggregated
large-scale ecoregion
Figure 5. Relation between canonical variates 1 and 2 for 7-day low-flow analysis, by aggregated
large-scale ecoregion
Figure 6. Distribution of mean probabilities associated with regional assignments
Figure 7. Cross-validation results for peak-flow analysis (difference between metrics derived from
regional and individual frequency curves for gaged sites)
Figure 8. Cross-validation results for 7-day low-flow analysis (difference between metrics derived
from regional and individual frequency curves for gaged sites)
Figure 9. Standard errors for selected quantiles estimated for EMAP sites, by large-scale aggregated
ecoregion

Tables

Table 1. Aggregated ecoregion classification
Table 2. Definitions of ecologically relevant streamflow metrics
Table 3. Results of large-scale regional regression analysis to predict index flow
Table 4. Selected elements of the asymptotic covariance matrix
Acknowledgements

   The authors acknowledge both the research leadership of U.S. Environmental Protection Agency
(EPA), who conceived and implemented the EMAP program, and the streamflow gaging network
maintained by the U.S. Geological Survey (USGS). Additionally, much of the data upon which this
project depends is the product of a multitude of dedicated field crews and technicians over many years,
who are unnamed but much appreciated nonetheless.

-------
Introduction

   The U.S. Environmental Protection Agency (EPA) implemented the Environmental Monitoring
and Assessment Program (EMAP) in 1999 in 12 Western States to develop and demonstrate the tools
necessary to evaluate the ecological condition of aquatic resources in the Western U.S. (EMAP, 1997).
A primary goal for EMAP is to provide technical support and relevant information to regional, State,
and Tribal resource managers, so that environmental policies and decisions will be based on sound
scientific understanding. An important component for analysis, therefore, is the association between
ecological conditions, as measured by various indices of biotic integrity, and anthropogenic stressors.
This project was conceived to provide explicit hydrologic context to aid in interpretation of EMAP
data, context that is especially critical for streams in the West, where surface water regimes are highly
variable and under intense pressure from human activities.
   Streamflow characteristics integrate multiple watershed factors and are related to many structural
habitat features in streams, as well as to the functional organization of stream communities (Leopold,
1994; Ward and Stanford, 1983). As such, streamflow represents a critical component of temporal
variability in stream systems, and is considered to be a major organizing feature of physical habitat for
aquatic biota. An ecologically relevant description of the streamflow regime is provided by the
variability in streamflow from floods (e.g., the impact of increased water velocities and corresponding
substrate movement) and prolonged low-flow periods (especially intermittent or periodic zero-flow
conditions) (Poff and Ward, 1989). Metrics to describe patterns of these extreme events  can be
determined from streamflow frequency curves, and concisely represent various components of the
respective flow regime so that objective comparisons can be made between regimes for different sites.

Purpose and Scope

   The purpose of this report is to summarize the regional analysis techniques used to estimate
ecologically-relevant streamflow metrics in order to quantify the streamflow regime for  western
EMAP sites in the upper Missouri Basin and selected sites in Oregon (Figures 1 and 2).  The report is
intended to serve as  guidance for application of these techniques to extend the scope of analysis to
other areas.

-------
Figure 1. Study area map for the upper Missouri River Basin, showing EMAP sites and USGS
gages in the context of large-scale aggregated ecoregions.

Figure 2. Study area map for Oregon, showing EMAP sites and USGS
gages in the context of large-scale aggregated ecoregions.

-------
Rationale for Approach

Analysis of streamflow patterns is frequently based on annual streamflow frequency information
that, in the United States, is supplied primarily by data from the U.S. Geological Survey (USGS)
streamflow database. The necessary data are available only for sites where long-term gaging stations
are located, however, and many streams are ungaged. Regional frequency analysis provides a way to
estimate the frequency distributions for a variety of flood and low-flow metrics at ungaged sites, based
on pooled data from gaged sites within a homogeneous region (Riggs, 1973; Stedinger and others,
1993). In this study, an index procedure was used, whereby the regional frequency curve is scaled by a
site-specific scaling factor (termed the "index flow," e.g., the mean peak or low-flow value for the site)
(Hosking and Wallis, 1993). This curve defines the dimensionless frequency distribution for the
region, from which specific quantile estimates for ungaged sites can be determined on the basis of
estimated values for the index flow for those sites. The steps involved in regional index-flow analysis
include the following: (1) identification of homogeneous regions, (2) choice and estimation of a
regional frequency distribution, and (3) estimation of the index flow.
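
The scaling step of the index procedure can be illustrated with a minimal Python sketch. The function name, the dimensionless curve values, and the index flow below are hypothetical and are not taken from this study; the sketch only shows how a dimensionless regional curve, once available, would be converted to site-specific quantiles.

```python
def scale_regional_curve(index_flow, regional_curve):
    """Scale a dimensionless regional frequency curve by a site's index flow.

    regional_curve maps recurrence interval T (years) to the dimensionless
    quantile (flow divided by the index flow).
    """
    return {T: index_flow * q for T, q in regional_curve.items()}


# Hypothetical dimensionless regional curve and index flow (illustration only).
regional_curve = {2: 0.9, 10: 1.8, 100: 3.2}
site_quantiles = scale_regional_curve(250.0, regional_curve)
```

In this illustration an index flow of 250 cubic feet per second and a dimensionless 100-year quantile of 3.2 would give a 100-year estimate of about 800 cubic feet per second.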
A critical assumption for any index-flow procedure is that the scaled frequency distributions for all
sites within the region are similar. Geographically contiguous regions have frequently been defined
according to physiographic and political boundaries, which do not always correspond to similarities in
hydrologic response (Simmers, 1975). For this analysis, a "region of influence" approach was used to
group sites according to basin features that are presumed to control streamflow characteristics
(Wiltshire, 1986). In this approach to regionalization, every site potentially has a unique set of basins
defined as its hydrological "region" that is not necessarily spatially contiguous (Burn, 1990a, 1990b;
Zrinji and Burn, 1994). These sites are selected from the correspondence between selected hydrologic
and watershed characteristics, as determined by canonical correlation analysis (CCA) (Ribeiro-Corea
and others, 1995). CCA provides canonical scores that reflect the correlation structure between the two
sets of variables for gaged sites. These scores, in turn, can be used to determine the associated score on
the hydrologic vector from watershed data for individual ungaged sites where no hydrologic data were
available. An ellipsoidal region around each hydrologic score can then be identified with a defined
level of confidence based on a chi-squared distribution. This region contains the basins that compose
the corresponding so-called hydrologic neighborhood or site-specific region for the target site.
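
The neighborhood selection described above can be sketched as follows for the case of two canonical variates. This is an illustration under stated assumptions, not the procedure as implemented in this study: the covariance matrix of the hydrologic canonical scores is assumed to be supplied by the analyst, the function names and example scores are hypothetical, and the chi-squared quantile uses the closed form that holds only for 2 degrees of freedom.

```python
import math


def chi2_quantile_2dof(confidence):
    """Chi-squared quantile for 2 degrees of freedom (closed form)."""
    return -2.0 * math.log(1.0 - confidence)


def mahalanobis_sq_2d(point, center, cov):
    """Squared Mahalanobis distance between two points in two dimensions."""
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Apply the inverse of the 2x2 covariance matrix to (dx, dy).
    ix = (d * dx - b * dy) / det
    iy = (-c * dx + a * dy) / det
    return dx * ix + dy * iy


def hydrologic_neighborhood(target_score, gage_scores, cov, confidence=0.95):
    """Return gage ids whose scores fall inside the confidence ellipse."""
    limit = chi2_quantile_2dof(confidence)
    return [gage_id for gage_id, score in gage_scores.items()
            if mahalanobis_sq_2d(score, target_score, cov) <= limit]


# Hypothetical canonical scores for three gages and one ungaged target site.
gage_scores = {"A": (0.5, -0.2), "B": (3.0, 0.0), "C": (-1.0, 1.0)}
identity = ((1.0, 0.0), (0.0, 1.0))  # assumed covariance, illustration only
neighbors = hydrologic_neighborhood((0.0, 0.0), gage_scores, identity)
```

Raising the confidence level enlarges the ellipse and admits more gages into the neighborhood, at the cost of a less homogeneous region.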
A second critical assumption of regional streamflow analysis is that watersheds with similar attributes,
especially regarding climate, topography, vegetation, soils, and geology, will exhibit similar
streamflow patterns (Riggs, 1973; Cunnane, 1988). The extent of human activity and water use in the
basin is also an important factor, especially related to low-flow conditions, although it has not often
been included in regional analysis because of the difficulty in quantifying these effects (Smakhtin,
2001). For this analysis, data quantifying the extent and type of human land and water use were
included in the watershed characterization to improve the identification of appropriate regions. Metrics
describing water use were derived from active withdrawal permits from State water resource agencies.
These do not represent estimates of use from actual measurements or surveys of reported use; they
simply describe the maximum allowable volumes of water that can be withdrawn within each
watershed. As a result, these were considered indices of potential water pressure from human use,
rather than precise descriptions of actual water use.
After identifying the appropriate hydrologic region for each site, the next step is to select an
appropriate frequency distribution to describe the regional frequency curves. Because of the focus on
extreme events, the generalized extreme value distribution (GEV), based on probability-weighted
moments (PWM), was selected for this analysis (Greenwood and others, 1979; Landwehr and Matalas,
1979; Hosking and others, 1985). This procedure is flexible and easy to implement, and has proven to
be especially reliable when regions are not homogeneous (Lettenmaier and others, 1987). Additionally,
the index flood PWM/GEV approach, based on hydrologic regions determined by canonical
correlation analysis, was found to give the best results in an inter-comparison study of different
procedures for rivers in Canada (GREHYS, 1996a, 1996b).
The final step is the estimation of the index flow for ungaged sites by regression against
appropriate watershed and climatic attributes (Riggs, 1973).

Methods

Data Assembly

USGS Streamflow Data
Data for annual peak and daily mean streamflow were obtained from the online USGS National
Water Information System (NWISWeb; http://waterdata.usgs.gov/nwis/). For Oregon, only sites with
the entire watershed located within the State were included in the analysis; additionally, all sites in the
Klamath River Basin were excluded because of the lack of reliable data about human water use. For
the low-flow analysis, daily mean streamflow data were first subset to include data only for the
summer season (June-September); the annual 7-day minimum flow was then determined for each year.
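
A minimal Python sketch of the seasonal 7-day minimum calculation is given below. It assumes that the 7-day low flow is the minimum of the 7-day moving average of daily mean flows and that only windows falling entirely within June-September are used; neither detail is spelled out in this report, and the function name and example data are hypothetical.

```python
from datetime import date, timedelta


def annual_summer_7day_min(daily_flows, months=(6, 7, 8, 9), window=7):
    """Return {year: minimum 7-day mean flow} for the summer season.

    daily_flows maps datetime.date to daily mean streamflow. Only windows of
    `window` consecutive days that fall entirely within the season are used.
    """
    result = {}
    for start in sorted(daily_flows):
        days = [start + timedelta(days=k) for k in range(window)]
        if any(d not in daily_flows or d.month not in months
               or d.year != start.year for d in days):
            continue  # incomplete window or window outside the season
        mean_flow = sum(daily_flows[d] for d in days) / window
        if start.year not in result or mean_flow < result[start.year]:
            result[start.year] = mean_flow
    return result


# Synthetic record: 10 cfs from May through October, with one dry week in August.
flows = {}
day = date(2000, 5, 1)
while day <= date(2000, 10, 31):
    flows[day] = 10.0
    day += timedelta(days=1)
for k in range(7):
    flows[date(2000, 8, 1) + timedelta(days=k)] = 3.0
flows[date(2000, 5, 10)] = 0.1  # outside the season, so it is ignored
low = annual_summer_7day_min(flows)
```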
Once the data for both peak and 7-day minimum streamflow were compiled, the data were
evaluated for temporal trend as defined by Kendall's tau-b (p<0.01) (SAS, 1990). Trend analysis
proceeded in an iterative process in order to maximize the period of record. First the data for the entire
record were evaluated. For gages where no trend was observed, the entire dataset was included in the
regional analysis. For gages where a trend was observed, a subset of the data was evaluated again,
limited to the period 1960-2003. For gages where no trend was observed in the subset, data from this
period only were included in the regional analysis. Finally, for gages where a trend was still evident,
data were subset a second time to include the period 1980-2003. If no trend was observed, data from
this period only were included in the regional analysis.
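
The iterative screening can be sketched in Python as shown below. The sketch substitutes a normal approximation of the Kendall S statistic (with a tie correction and no continuity correction) for the SAS Kendall's tau-b test, so p-values may differ slightly from those used in this study. Returning no record when a trend persists after 1980 is an assumption, because the treatment of such gages is not stated above. All function names and the example series are hypothetical.

```python
import math


def kendall_trend_p(values):
    """Two-sided p-value for monotonic trend in a time-ordered series."""
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in counts.values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    if var_s <= 0:
        return 1.0
    z = s / math.sqrt(var_s)
    return math.erfc(abs(z) / math.sqrt(2.0))


def select_record(series, alpha=0.01, fallbacks=(1960, 1980), last_year=2003):
    """Return (year, flow) pairs for the longest trend-free period, or None.

    series maps year to the annual peak or 7-day minimum flow.
    """
    for start in (None,) + tuple(fallbacks):
        if start is None:
            subset = sorted(series.items())  # entire period of record
        else:
            subset = sorted((y, q) for y, q in series.items()
                            if start <= y <= last_year)
        if len(subset) > 2 and kendall_trend_p([q for _, q in subset]) >= alpha:
            return subset
    return None  # assumption: gage dropped if a trend remains after 1980


# Synthetic record: rising flows before 1960, stable flows afterward.
series = {year: float(year - 1930) for year in range(1930, 1960)}
series.update({year: 100.0 + (year % 2) for year in range(1960, 2004)})
kept = select_record(series)
```

For this synthetic record the full series shows a trend, so the retained period begins in 1960.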
Sites were further limited so that each had a minimum of 20 years of record between 1960 and
2003. Finally, only sites with sufficient nonzero flow over the relevant period (N > 5 where 7-day low
flow > 0.5 cubic feet per second) were included in the low-flow analysis. Based on this screening
process, the total number of suitable gages for peak-flow analysis was 301 in the upper Missouri Basin
and 179 in Oregon; the total number for 7-day low-flow analysis was 283 in the upper Missouri Basin
and 204 in Oregon.

Basin Characteristics
A large number of basin characteristics was obtained for each gaging station and EMAP site,
including drainage area, topography, precipitation, land use, soil characteristics, dominant ecoregion,
location of major dams, and human water pressure. Definitions of basin characteristics used in the
various components of this analysis are provided in Appendix 1. All basin characteristics were
extracted from GIS databases using Arc Macro Language programs written for Arc/Info
(Environmental Systems Research Institute, Inc., 1999). Drainage area was determined by digitizing
basin boundaries using 1:24,000 USGS topographic maps. Elevation and watershed slope were
determined from data from the USGS National Elevation Dataset (NED), with 30-meter resolution.
Annual precipitation was calculated as the sum of area-weighted estimates, based on raster
precipitation data for monthly average precipitation totals (1961-1990), with 2-km resolution (Daly
and others, 1994). Precipitation intensity metrics for selected recurrence intervals were calculated from
raster data, including both local (site-specific) and watershed-wide characteristics (USDC, 1961;
NOAA, 1973). Land use characteristics were determined by areal proportion of aggregated categories
defined by data from the National Land Cover Dataset (NLCD) (Vogelmann and others, 1998). Soil
characteristics were described by the sum of area-weighted values for the watershed, based on data
from the State Soil Geographic (STATSGO) database (Schwarz and Alexander, 1995). Dominant
ecoregion was defined by the Level III ecoregion with the largest area within the watershed (Omernik,
1987; USEPA, 2006). The location of major dams was provided by the U.S. Army Corps of Engineers
National Inventory of Dams (NID) (http://www.nicar.org/data/dams/, accessed March 13, 2006).
Characterization of human water pressure for watersheds was based on an inventory of active water-
permit information from State water-resource agencies. The focus for the water-right analysis was
limited to off stream use, that is, water withdrawn or diverted from a source, either surface water or
ground water. No  effort was made to distinguish consumptive from nonconsumptive use, or to estimate
quantities of return flow.
   An a priori large-scale regional classification for all sites was generated from aggregated Level III
ecoregions (areas previously identified and mapped as having similar physical features, climate,
vegetation, and soil characteristics) (Table 1) (Omernik, 1987; USEPA, 1998). It was presumed that
these large-scale regions are characterized by distinct associations of geomorphic and climatic
processes that determine the streamflow regime, and that these associations vary among the aggregated
regions across the study area. As a result, the CCA component of this analysis was conducted
separately for each aggregated region.
Table 1. Aggregated ecoregion classification

Aggregated ecoregion        Level III ecoregion

Mountain                    16 (Idaho Batholith)
                            17 (Middle Rockies)
                            21 (Southern Rockies)
                            41 (Canadian Rockies)

Plains                      25 (High Plains)
                            42 (Northwestern Glaciated Plains)
                            43 (Northwestern Great Plains)
                            44 (Nebraska Sand Hills)
                            46 (Northern Glaciated Plains)
                            47 (Western Corn Belt Plains)
                            48 (Lake Agassiz Plain)

Xeric                       10 (Columbia Plateau)
                            18 (Wyoming Basin)
                            80 (Northern Basin and Range)

Pacific Northwest (East)    9 (Eastern Cascades Slopes and Foothills)
                            11 (Blue Mountains)

Pacific Northwest (West)    1 (Coast Range)
                            3 (Willamette Valley)
                            4 (Cascades)
                            78 (Klamath Mountains)

-------
Regionalization Process

Frequency Analysis
   As previously discussed, frequency analysis of annual peak- and 7-day low-flow data from USGS
gages was based on an index-flow procedure, using PWM estimators of the GEV distribution (Hosking
and others, 1985). The GEV distribution for any random variable (x) is described by

$$F(x) = \exp\left\{-\left[1 - \frac{g\,(x - u)}{a}\right]^{1/g}\right\} \tag{1}$$

where $u$, $a$, and $g$ represent parameters of location, scale, and shape. For this analysis, the
probability-weighted moments ($M_j$) for each site were first determined as

$$M_j = \frac{1}{n}\sum_{i=1}^{n} p_i^{\,j}\,Q_i \quad \text{for } j = 0, 1, 2 \tag{2}$$

where $p_i = (i - 0.35)/n$ is the plotting position estimate of $F(Q)$ and $Q_i$ is the series of annual peak or
7-day minimum streamflow. For the peak-flow analysis, this series was ordered from lowest to highest
(i.e., $Q_1 \le Q_2 \le \dots \le Q_n$), so that $p_i$ represents $P_X(x)$, or the probability of an event equal to or smaller
than the designated value (Haan, 2002). For the low-flow analysis, the series was ordered in reverse,
from highest to lowest, effectively turning over the frequency curve so that the limit became a lower
one (Gordon and others, 1992). In this case, the plotting position ($p_i$) represents $1 - P_X(x)$, the
probability of an event greater than or equal to the designated value.

   Next, the PWMs for each site were normalized by their mean ($M_j^* = M_j / M_0$). The parameters of
the GEV distribution were estimated as follows:

$$c = \frac{2M_1^* - M_0^*}{3M_2^* - M_0^*} - \frac{\log 2}{\log 3} \tag{3a}$$

$$g = 7.8590\,c + 2.9554\,c^2 \tag{3b}$$

$$a = \frac{(2M_1^* - M_0^*)\,g}{\Gamma(1+g)\,(1 - 2^{-g})} \tag{3c}$$

$$u = M_0^* + \frac{a\,[\Gamma(1+g) - 1]}{g} \tag{3d}$$

   Finally, the selected quantiles (T-year flow events) of the GEV distribution were determined by

$$Q_T^* = u + \frac{a}{g}\left\{1 - \left[-\ln\left(1 - \frac{1}{T}\right)\right]^{g}\right\} \tag{4}$$

where $g \ne 0$.
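
The PWM/GEV calculation can be sketched in Python as follows. This is a minimal illustration that follows the published estimators of Hosking and others (1985); the function names and the sample of annual peaks are hypothetical, and the sketch is not the code used for this study.

```python
import math


def pwm(flows, descending=False):
    """Sample probability-weighted moments M0, M1, M2 with p_i = (i - 0.35)/n.

    Set descending=True to order the series from highest to lowest, as
    described for the low-flow analysis.
    """
    q = sorted(flows, reverse=descending)
    n = len(q)
    return [sum(((i - 0.35) / n) ** j * x for i, x in enumerate(q, start=1)) / n
            for j in range(3)]


def gev_from_pwm(m0, m1, m2):
    """GEV location u, scale a, and shape g from the first three PWMs."""
    c = (2 * m1 - m0) / (3 * m2 - m0) - math.log(2) / math.log(3)
    g = 7.8590 * c + 2.9554 * c ** 2
    a = (2 * m1 - m0) * g / (math.gamma(1 + g) * (1 - 2 ** (-g)))
    u = m0 + a * (math.gamma(1 + g) - 1) / g
    return u, a, g


def gev_quantile(u, a, g, T):
    """T-year event for nonexceedance probability 1 - 1/T (requires g != 0)."""
    return u + a * (1 - (-math.log(1 - 1 / T)) ** g) / g


# Hypothetical annual peak flows, in cubic feet per second (illustration only).
peaks = [120, 340, 210, 560, 180, 430, 290, 150, 700, 260, 390, 230]
m = pwm(peaks)
m_star = [value / m[0] for value in m]       # normalize by the mean (M0)
u, a, g = gev_from_pwm(*m_star)
q2_star = gev_quantile(u, a, g, 2)           # normalized 2-year peak
q100_star = gev_quantile(u, a, g, 100)       # normalized 100-year peak
```

Because the PWMs are normalized by the mean, the resulting quantiles are dimensionless and are converted to flow units by multiplying by the index flow.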
   In the 7-day low-flow analysis, the presence of zero flow values (< 0.5 cubic feet per second) was
dealt with by adjustment of probabilities based on the theorem of total probability (Haan, 2002). In
other words, it was assumed that all the probability was accounted for simply by the sum of the
probability of flow equal to zero plus the probability of flow greater than zero. On this basis, the
frequency distribution for each site was first determined for all 7-day low-flow values greater than
zero. The resulting probabilities were then adjusted by the fraction of nonzero values observed in the
data for that site, effectively shifting the frequency curve along the probability axis to reflect the
probability of zero flow (Gordon and others,  1992).
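
One way to write this adjustment, under the assumption that the fitted curve describes the probability of a 7-day low flow greater than or equal to a value $q$ given that flow is nonzero, is shown below. The symbols $k$, $n_{nz}$, and $n$ are introduced here for illustration and do not appear in the original analysis.

$$k = \frac{n_{nz}}{n}, \qquad P(Q = 0) = 1 - k, \qquad P(Q \ge q) = k \, P(Q \ge q \mid Q > 0) \quad \text{for } q > 0$$

Here $n_{nz}$ is the number of years with nonzero 7-day low flow and $n$ is the total number of years. For example, if 16 of 20 years have nonzero values, then $k = 0.8$, and a conditional probability of 0.5 corresponds to an adjusted probability of 0.4.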
    A probability plot correlation test was conducted to test whether the sample data from each gage
were drawn from the GEV distribution (Stedinger and others,  1993). This test is based upon the
correlation r between the sample data, ordered as described above, and the corresponding predicted
values based upon their plotting positions. Values of r close to 1 indicate a close correspondence
between the data and the theoretical distribution. For this study, a lower critical value for r of 0.95
(p ≈ 0.10) was selected as the cutoff for inclusion of data in the next stage of the analysis (Stedinger
and others, 1993). Selected probability plots are presented to describe the range of goodness of fit that
was observed for sites that satisfied this criterion (Figure 3).
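
The probability plot correlation calculation can be sketched in Python as shown below. The GEV parameters and the sample are hypothetical, the function names are illustrative, and the sketch covers only the ascending (peak-flow) ordering; it is not the implementation used in this study.

```python
import math


def gev_quantile_at(p, u, a, g):
    """GEV quantile for nonexceedance probability p (requires g != 0)."""
    return u + a * (1 - (-math.log(p)) ** g) / g


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def ppcc(flows, u, a, g):
    """Correlation between ordered data and GEV quantiles at (i - 0.35)/n."""
    q = sorted(flows)
    n = len(q)
    predicted = [gev_quantile_at((i - 0.35) / n, u, a, g)
                 for i in range(1, n + 1)]
    return pearson_r(q, predicted)


# Hypothetical sample and fitted parameters (illustration only).
peaks = [120, 340, 210, 560, 180, 430, 290, 150, 700, 260, 390, 230]
r = ppcc(peaks, 0.714, 0.381, -0.144)
passes_fit = r >= 0.95  # cutoff used in this study
```

Because the correlation is unchanged by a positive linear rescaling of the predicted values, the result depends on the shape parameter but not on the location or scale, so normalized parameters can be compared directly with flows in original units.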
[Figure 3 panels: two probability plots titled "Annual peak-flow frequency distribution" and two titled
"Annual 7-day low-flow frequency distribution," each annotated with the station number and correlation
and plotted against the EV1 variate.]

Figure 3. Sample of range for goodness of fit for theoretical frequency distributions for annual
peak- and 7-day low-flow data from USGS gages.

-------
   Finally, hydrologic metrics were determined for each site, either directly from the frequency curve
or calculated from the time series data (Table 2).
Table 2. Definitions of ecologically relevant streamflow metrics.

Streamflow metric    Measurements used to define

Peak-flow regime
QP                   Mean annual peak flow magnitude
Q2P                  Index of bankfull flow (normalized 2-year peak flow)
Q100P                Index of peak-flow variability (normalized 100-year peak flow)
DP                   Index of dispersion (Q2P-Q100P)
JDP                  Peak-flow timing (mean Julian day for annual peak flow)
JDPCV                Variability in peak-flow timing (CV for mean Julian day)

Low-flow regime
QL                   Mean annual 7-day low-flow magnitude
Q2L                  Index of baseflow stability (normalized 7-day 2-year flow)
Q100L                Index of low-flow variability (normalized 7-day 100-year low flow)
DL                   Index of dispersion (Q2L-Q100L)
Pzero                Index of intermittency (percent of days/year with zero flow)
Q7Q10L               Important low-flow event (7-day 10-year low flow magnitude)
JDL                  Low-flow timing (mean Julian day, onset of annual 7-day low flow)
JDLCV                Variability in low-flow timing (CV for mean Julian day)

Canonical Correlation Analysis
    Separately for USGS sites within each large-scale region, canonical correlation was performed
between sets of watershed variables and sets of hydrologic metrics using SAS CANCORR (SAS,
1989) (Appendix 2, A-E). Logarithmic transformation was applied to selected variables (drainage area,
elevation, mean annual peak flow, and 7-day low flow magnitude) to improve normality of distribution
and linearity of relationship between variables. The hydrologic set varied slightly for different
components of the analysis. For peak-flow analysis, it was necessary to conduct the CCA separately
for timing metrics for some sites in order to restrict the number of significant canonical variates to a
minimum of 2 (p<0.0001). Nonetheless, the complete analysis included the same metrics for all sites: a
measure of scale (mean annual peak or 7-day low flow magnitude), measure of timing (mean Julian
day of peak or onset of 7-day low flow), and measures of flow variability (normalized values of peak
or 7-day low flow for selected recurrence intervals). Variability in timing (coefficient of variation of
mean Julian day) was included when it did not correlate with timing. A measure of intermittency
(proportion of 7-day low flow < 0.5 cubic feet per second) was also included in the low-flow analysis.
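
The canonical correlation step can be illustrated with a minimal sketch. It does not reproduce SAS CANCORR output; it only shows, under stated assumptions, how canonical correlations between a watershed set and a hydrologic set can be computed with NumPy. The variable names, the synthetic data, and the use of base-10 logarithms are all hypothetical.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between two variable sets (rows are sites)."""
    # Center each variable so correlations are taken about the mean.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the two sets.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx'Qy are the canonical correlations (descending).
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

# Synthetic, hypothetical data for 200 gaged sites.
rng = np.random.default_rng(0)
n_sites = 200
log_area = rng.normal(2.0, 0.8, n_sites)   # log10 drainage area (assumed)
log_elev = rng.normal(3.0, 0.3, n_sites)   # log10 elevation (assumed)
precip = rng.normal(1.0, 0.3, n_sites)     # precipitation index (assumed)
watershed = np.column_stack([log_area, log_elev, precip])

# Hydrologic set: log10 mean annual peak flow and a timing metric.
log_qp = 1.5 + 0.8 * log_area + rng.normal(0.0, 0.2, n_sites)
julian_day = 100.0 + 40.0 * (log_elev - 3.0) / 0.3 + rng.normal(0.0, 15.0, n_sites)
hydro = np.column_stack([log_qp, julian_day])

r = canonical_correlations(watershed, hydro)
print("canonical correlations:", r)
```

The number of canonical correlations equals the size of the smaller variable set, which is why two hydrologic variables yield two values here.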
    Variables for the watershed set were selected on the basis of correlation with nonmetric
multidimensional scaling (NMS) ordination of hydrologic metrics, using the Euclidean distance
measure (McCune and Mefford, 1999). Ordinations were performed separately for peak and low-flow
metrics, and separately for each large-scale region. All data were first relativized to the maximum
value to account for differences in scale. At least 15 iterations were used for each NMS run, based on
random starting coordinates. Each analysis was repeated several times to verify that the solution was
stable, and that the configuration represented a good fit with the data. The number of dimensions was
selected by plotting a measure of fit to the number of dimensions; a Monte Carlo randomization test
(minimum of 30 iterations) was conducted to evaluate whether the ordination axes were extracting
more variability than expected by chance (p< 0.05). Axes were rotated as appropriate to improve the
interpretability of the ordination.
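
The preprocessing that feeds the ordination can be sketched as follows. This is only an illustration of relativizing each metric to its maximum and computing Euclidean distances between sites; it is not the NMS ordination itself. The per-column relativization and the sample values are assumptions.

```python
import numpy as np

def relativize_by_max(data):
    """Divide each column (metric) by its maximum absolute value."""
    data = np.asarray(data, dtype=float)
    col_max = np.abs(data).max(axis=0)
    col_max[col_max == 0] = 1.0  # leave all-zero columns unchanged
    return data / col_max

def euclidean_distance_matrix(data):
    """Pairwise Euclidean distances between rows (sites)."""
    diff = data[:, None, :] - data[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical metrics for three sites: magnitude, normalized quantile, Julian day.
metrics = np.array([
    [1200.0, 0.8, 150.0],
    [300.0, 1.1, 95.0],
    [4500.0, 0.6, 160.0],
])
rel = relativize_by_max(metrics)
dist = euclidean_distance_matrix(rel)
print(dist)
```

Relativizing first keeps a large-magnitude metric such as flow from dominating the distance measure.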
    The relationship between the ordination scores for the hydrologic metrics and the associated
hydrologic and watershed characteristics were evaluated by examination of a joint plot. These plots
portray the direction and strength of the correlation between the two sets of variables; they provided
the justification for selecting nonredundant hydrologic metrics, as well as watershed variables that
showed the strongest linear relationship with the ordination structure of the hydrologic regime. These
always included some measure of topography (e.g., elevation or watershed slope), and some measure
of climate (e.g., mean annual precipitation or precipitation intensity); watershed scale was always
included, as measured by drainage area. Additionally, measures describing soil characteristics (i.e.,
water capacity or organic material), spatial location (e.g., longitude), and human water use (i.e., water
diversions) were important for some components of the analysis.
    The frequency analysis was limited to the first two canonical correlations, which were always
significant (p < .001) and together accounted for essentially all the variation (cumulative proportion
ranging from 0.90 to 1.0) (Appendix 2). The first canonical correlation was consistently high for all
large-scale regions (ranging from 0.82 to 0.95), and accounted for a significant proportion of
overlapping variance (0.70-0.91); the second ranged from 0.53 to 0.76 (overlapping variance 0.30-
0.58).  Similar results were obtained for the separate analysis of timing where that was done, although
generally most of the variation was explained by the first canonical correlation. These results indicate
that the first pair of canonical variates was always strongly related, while the second pair was generally
at least moderately to strongly related (Figures 4 and 5).

[Figure 4 panels: "Evaluation of relation between canonical variates" for Plains, Mountain, and Xeric USGS sites; horizontal axes: Watershed scores (variate 1) and Watershed scores (variate 2)]

Figure 4. Relation between canonical variates 1 and 2 for peak-flow frequency analysis,
by aggregated large-scale ecoregion: Plains, Mountains, and Xeric sites.

[Figure 4, continued, panels: "Evaluation of relation between canonical variates" for PNW_W and PNW_E USGS sites; horizontal axes: Watershed scores (variate 1) and Watershed scores (variate 2)]

Figure 4, continued. Relation between canonical variates 1 and 2 for peak-flow frequency
analysis by aggregated large-scale ecoregion: PNW-W and PNW-E sites.

[Figure 5 panels: "Evaluation of relation between canonical variates" for Plains, Mountain, and Xeric USGS sites; horizontal axes: Watershed scores (variate 1) and Watershed scores (variate 2)]

Figure 5. Relation between canonical variates 1 and 2 for 7-day low-flow analysis, by
aggregated large-scale ecoregion: Plains, Mountains, and Xeric sites.

[Figure 5, continued, panels: "Evaluation of relation between canonical variates" for PNW_W and PNW_E USGS sites; horizontal axes: Watershed scores (variate 1) and Watershed scores (variate 2)]

Figure 5, continued. Relation between canonical variates 1 and 2 for 7-day low-flow analysis,
by aggregated large-scale ecoregion: PNW-W and PNW-E sites.

    From the correlations between the variables and canonical variates shown in Appendix 2, the most
consistent pattern observed in the peak-flow frequency analysis was that the first canonical variate was
directly associated with both flow magnitude (QP) and drainage area (A). For all large-scale regions
outside the PNW, the first canonical variate was also directly associated with normalized 2-year peak
flow (Q2P). In addition to A, other watershed characteristics that were associated with the first
canonical variate included elevation (Emax, Emin, and Estd), precipitation intensity (PI and PIstd),
and soil characteristics (AWC, Clay, and Thick). The second canonical variate was more varied among
the regions, although generally complementary to the first. Timing metrics, when evaluated separately,
showed a strong and consistent relation with elevation.
    In the low-flow analysis, a similar pattern of strong association of flow magnitude (QL) and A with
the first canonical variate was observed for all sites. Other associated watershed variables included
elevation (Emax and Evar), land cover (H2Opct), precipitation (PI), and water use (Tdivert and
GWdivert). The second canonical variate was consistently associated with normalized 100-year low
flow (Q100L) and variability in timing of annual 7-day low flow (JDLCV) for all sites, and with timing
of onset (JDL) for all sites except the western PNW.

Estimation of Metrics for EMAP sites
    Based on output from the canonical correlation analysis of USGS sites, scores on the first two
canonical watershed variates were determined for each EMAP site using SAS SCORE (SAS, 1989).
USGS sites that were located far from the position of each EMAP site along the watershed variate
were identified by Mahalanobis distance, a multivariate distance measure that conforms to a chi-square
distribution. This distribution was evaluated for canonical watershed scores ($v_0$) for each EMAP site
along the first two canonical variates according to the following:

    $(w - \Lambda v_0)^{T} (I_p - \Lambda^{2})^{-1} (w - \Lambda v_0) \le \chi^{2}_{\alpha,p}$                    (5)
where $w$ is the score on the appropriate hydrologic variate for each USGS site, $\Lambda$ is the eigenvalue, or
squared canonical correlation between the pair of appropriate canonical variates, and $I_p$ is the $p \times p$
identity matrix (p = 2) (Ouarda and others, 2001). Regions were defined by 90% confidence when
possible, with a further requirement to contain a minimum number of sites. A sensitivity analysis was
conducted to compare the results of basing the regional definition on a minimum of 5 or 10 sites. No
significant differences were observed for the mean values for hydrologic metrics, although the
variability was increased when regions were based on the larger N. As a result, a minimum of 5 sites
was used in  order to focus the analysis on the most similar watersheds; for most regions, it was
necessary to tolerate lower confidence in order to obtain the minimum number of sites (Figure 6).
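
The site-selection logic described above can be sketched in code. This is one possible reading, not the report's implementation. It assumes the distance form given by Ouarda and others (2001), with a diagonal matrix of canonical correlations. It also assumes that the "confidence" of a region is the chi-square upper-tail probability of its most distant member, and that the nearest sites are added until the minimum count is met. The function name and sample scores are hypothetical.

```python
import numpy as np

def region_for_target(w, v0, canon_corr, confidence=0.90, min_sites=5):
    """Select gaged sites near a target site in canonical space.

    w          : (n_sites, 2) hydrologic canonical scores of gaged sites
    v0         : (2,) watershed canonical scores of the target site
    canon_corr : (2,) canonical correlations (assumed meaning of Lambda)
    """
    w = np.asarray(w, dtype=float)
    v0 = np.asarray(v0, dtype=float)
    lam = np.asarray(canon_corr, dtype=float)
    resid = w - lam * v0
    # Mahalanobis-type distance with diagonal covariance (1 - lambda^2).
    d2 = (resid ** 2 / (1.0 - lam ** 2)).sum(axis=1)
    # Chi-square upper-tail probability; this closed form holds only for p = 2.
    tail_prob = np.exp(-0.5 * d2)
    order = np.argsort(d2)
    n_inside = int((tail_prob >= confidence).sum())
    # Assumed rule: if too few sites qualify, accept lower probabilities.
    n_keep = max(n_inside, min(min_sites, order.size))
    chosen = order[:n_keep]
    return chosen, tail_prob[chosen]

lam = np.array([0.9, 0.6])
v0 = np.array([1.0, -0.5])
offsets = np.array([[0.0, 0.0], [0.01, 0.01], [0.05, 0.0],
                    [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [-2.0, 1.0]])
w = lam * v0 + offsets
chosen, probs = region_for_target(w, v0, lam)
print(chosen, probs.mean())
```

Under this reading, the mean of `probs` would correspond to the mean probability associated with a regional assignment.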
[Figure 6 panels: Peak-flow analysis and 7-day low-flow analysis; vertical axis from 0.0 to 1.0; categories MT, PL, PNW_E, PNW_W, XE]

Figure 6. Distribution of mean probabilities associated with regional
assignments.

   Once the EMAP-site-specific regions were determined, the regional PWMs were calculated as
weighted averages of the PWMs for the USGS sites within each region:

                                                                          (6)

where the denominator is the total number of years of record for the region. Regional average PWMs
were used to estimate the parameters of the GEV distribution based on Equation 3(a-d), and quantiles
of the regional GEV distribution were calculated using regional parameter values and Equation 4.
Streamflow metrics were derived directly from the regional frequency curve for each EMAP site, as
described in Table 2. Metrics describing timing (mean and CV for Julian day) of onset of 7-day low
flow were estimated from data for each site-specific region.
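
The record-length weighting described above can be illustrated with a short sketch. It assumes the commonly used unbiased sample estimator for the first three probability-weighted moments. The report's own PWM definitions, and any normalization by index flow, are not shown in this section, so those details are assumptions here.

```python
import numpy as np

def sample_pwms(x):
    """Unbiased sample PWMs b0, b1, b2 of an annual series (assumed estimator)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    j = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((j - 1) / (n - 1) * x) / n
    b2 = np.sum((j - 1) * (j - 2) / ((n - 1) * (n - 2)) * x) / n
    return np.array([b0, b1, b2])

def regional_pwms(series_by_site):
    """Average site PWMs, weighting each site by its years of record."""
    pwms = np.array([sample_pwms(s) for s in series_by_site])
    years = np.array([len(s) for s in series_by_site], dtype=float)
    # The denominator is the total number of years of record in the region.
    return (years[:, None] * pwms).sum(axis=0) / years.sum()

# Two hypothetical sites with 20 and 40 years of (constant) record.
region = [np.full(20, 1.0), np.full(40, 4.0)]
print(regional_pwms(region))
```

Weighting by years of record gives longer, presumably better estimated, series more influence on the regional curve.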
   A modification of this procedure was done for extreme low-flow conditions (Q2L) at sites when
"large" dams (i.e., with storage capacity greater than 30,000 acre feet) were located upstream. The
distinction, although somewhat arbitrary, was based on the presumed capability of this volume of
storage to significantly modify downstream low-flow conditions from what might be expected from
watershed characteristics. These dams were consistently located on gaged streams. The selected metric
was estimated for EMAP sites on these streams from the observed frequency distribution for the single
gaged site rather than the theoretical distribution derived from the site-specific region.
   Index flows (mean annual peak- and 7-day low-flow magnitude) were estimated by multiple
regressions for those sites with no gages suitably located on the same stream. Separate regression
equations were developed for each large-scale region; regression parameters are presented in Table 3.
For sites with gages on the same stream, the index flows were calculated by drainage-area (DA)
adjustment of the observed index flow for the gage (where the difference in drainage area was either
within 33 percent or less than 100 square miles).
Table 3. Results of large-scale regional regression analysis to predict index flow.

Large-scale ecoregion   Regression equation

Peak-flow analysis
  Plains      logQP = 0.648 + 0.495*logA + 0.0718*PI + 0.393*PIstd + 0.999*P - 0.720*logEmax
                      - 5.331*AWC - 0.079*Perm + 0.028*Long
  Mountain    logQP = 1.79 + 0.780*logA + 0.466*P - 12.3*AWC
  Xeric       logQP = -.286 + 0.840*logA + 2.45*P - 0.054*Perm
  PNW-east    logQP = 1.32 + 0.580*logA + 0.472*PI24std - 1.426*PIstd + 4.62*AWC - 0.067*Perm
  PNW-west    logQP = 1.40 + 0.987*logA + 0.491*P - 5.54*AWC + 0.140*OM - 0.118*Perm

Low-flow analysis
  Plains      logQL = -18.0 + 0.890*logA + 1.00*logEmax + 0.102*Long + 0.00002*Tdivert + 0.063*S
                      + 0.114*Wet + 0.010*Ag + 0.083*Perm + 0.312*I
  Mountain    logQL = -1.93 + 0.769*logA + 0.000046*SWdivert - 0.135*Wet - 0.046*Ag
                      - 0.172*PI24 + 0.548*I + 0.925*P + 0.002*Asp
  Xeric       logQL = -109 + 0.405*logA + 3.39*logEmax - 0.00028*Tdivert + 0.00032*SWdivert
                      + 0.119*Y - 0.010*R
  PNW-east    logQL = 2.24 + 0.714*logA - 0.141*W - 0.069*LL
  PNW-west    logQL = 27.1 + 1.25*logA + 0.853*logEmax - 0.258*Long - 0.027*S + 0.366*Wet
                      - 0.014*Ag - 0.031*Clay + 0.030*LL + 0.075*I

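
As an illustration of how an equation from Table 3 would be applied, the sketch below evaluates the Mountain peak-flow regression. The coefficients come from Table 3. The use of base-10 logarithms, the function name, and the input values are assumptions, and the variable units are not stated in this section.

```python
import math

def mountain_mean_annual_peak(area, precip, awc):
    """Mean annual peak flow from the Mountain peak-flow equation in Table 3.

    Assumes 'log' in Table 3 denotes log base 10 (not confirmed here).
    """
    log_qp = 1.79 + 0.780 * math.log10(area) + 0.466 * precip - 12.3 * awc
    return 10.0 ** log_qp

# Hypothetical watershed attributes, in whatever units the regression expects.
qp = mountain_mean_annual_peak(area=100.0, precip=1.0, awc=0.1)
print(qp)
```

Because the drainage-area exponent is 0.780, doubling the area raises the predicted index flow by a factor of 2 to the power 0.780 rather than by a factor of 2.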
   The proportion of intermittent 7-day low flow was estimated directly from the regional frequency
curve (rounded to nearest 5-year recurrence interval) for those sites with no gages nearby; where
suitable gage data was available, as defined above, the proportion of intermittent flow was estimated
by DA adjustment.
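
The drainage-area adjustment is not specified in detail in this section, so the following sketch makes two assumptions. The first is that flow scales with the ratio of drainage areas (exponent of 1 by default). The second is that the 33 percent criterion is measured relative to the gage's drainage area. Function names are hypothetical.

```python
def gage_is_suitable(area_site, area_gage, max_fraction=0.33, max_abs_diff=100.0):
    """True if the gage's drainage area is close enough to the site's.

    Criterion assumed: difference within 33 percent of the gage area,
    or less than 100 square miles.
    """
    diff = abs(area_site - area_gage)
    return diff <= max_fraction * area_gage or diff < max_abs_diff

def drainage_area_adjust(q_gage, area_site, area_gage, exponent=1.0):
    """Scale an observed index flow by the drainage-area ratio (assumed form)."""
    return q_gage * (area_site / area_gage) ** exponent

# Hypothetical gage: index flow 500 at 1,000 square miles; site at 1,200.
if gage_is_suitable(1200.0, 1000.0):
    q_site = drainage_area_adjust(500.0, 1200.0, 1000.0)
    print(q_site)
```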

-------
Reliability and  Limitations

   Any regional analysis is associated with an unavoidable level of uncertainty in the predictions,
which should be considered carefully in order to understand the limitations of the analysis. This
uncertainty arises in part because of the inherent difficulties in the probabilistic approach, especially
when focused on the long-term distribution of events from comparatively short-term data series.
Another source of error is the regionalization process itself, especially given the assumption of
sufficient similarity within regions that are defined by a small number of attributes. For this analysis,
additional uncertainty occurs as a result of the confounding and dynamic nature of a range of human
influence on streamflow regimes, and the challenge of quantifying that influence in a realistic and
relevant way. Accordingly, the evaluation of errors associated with the streamflow metrics is a critical
element of this protocol.
   The first component of the analysis subject to potential error is the selection of streamflow data and
the subsequent application of the GEV theoretical frequency distribution to describe the components of
the various flow regimes for USGS gaged sites. As previously mentioned, data were screened to
simultaneously  eliminate any temporal trend (p<0.01) and maximize the period of record, with a
minimum of 20 years since 1960. Additional screening was done to exclude sites where the probability
plot correlation test indicated a poor fit with the theoretical distribution (r<0.95) (Figure 3). These
screening measures allowed the basic assumptions of the distribution to be met by focusing the
analysis on a relatively "ideal"  set of sites. In the process, however, sites were excluded where
streamflow was most likely to be influenced by significant recent human activity, a major factor in
causing temporal trend. As a result, the metrics generated by this analysis are (by  definition) less
reliable for EMAP watersheds that are similar to the excluded gaged sites, presumably especially
regarding human impact. Because of the limitations of the frequency analysis itself, as well as the data
that are currently available for human water use, it is not possible to accurately describe the nature of
these errors nor the sites that may be most affected. Nonetheless, the fact that these metrics do not well
describe the effect of intense human modification of streamflow regime represents an  important source
of uncertainty in the analysis.
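
The goodness-of-fit screen mentioned above can be sketched as a probability plot correlation against the EV1 (Gumbel) reduced variate, the horizontal axis shown in Figure 3. The plotting-position formula and its constant are assumptions, and the report's test against the fitted distribution may differ in detail.

```python
import numpy as np

def ev1_probability_plot_correlation(annual_values, a=0.44):
    """Correlation between ordered data and the EV1 reduced variate."""
    x = np.sort(np.asarray(annual_values, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    # Plotting position (Gringorten-type constant a = 0.44 is an assumption).
    p = (i - a) / (n + 1 - 2 * a)
    # EV1 (Gumbel) reduced variate for each plotting position.
    y = -np.log(-np.log(p))
    return np.corrcoef(y, x)[0, 1]

# Synthetic 40-year annual peak series drawn from a Gumbel distribution.
rng = np.random.default_rng(1)
sample = 1000.0 + 300.0 * rng.gumbel(size=40)
r = ev1_probability_plot_correlation(sample)
keep_site = r >= 0.95  # screening threshold quoted in the text
print(r, keep_site)
```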
   Additional error is associated with the regionalization component of the analysis, including both
the correspondence between watershed and flow attributes determined for USGS gages by CCA,  and
the subsequent assignment of site-specific regions for EMAP sites based on watershed characteristics.
The uncertainty associated with CCA is described by the canonical correlation results  presented in
Appendix 2, and graphically by the relationship between canonical variates depicted in Figures 4 and
5.  As previously mentioned, the CCA results indicate that the two canonical variates together
explained most of the variability in the watershed and flow data (at least 90 percent, as measured by
the cumulative proportion extracted by the eigenvalues). A generally strong correspondence was
observed between the first pair of canonical variates for all large-scale regions, and a variable but less
strong relationship between the second pair. These results indicate a moderately high degree of
confidence is warranted for the characterization of the multivariate relationship between selected
hydrologic and watershed features. The analysis is limited, however, by the selection of variables that
are included,  and especially by the assumption of linearity and independence among them. While an
attempt was made to focus on nonredundant hydrologic metrics and watershed characteristics that are
strongly and linearly correlated, it is unrealistic to assume that the available data perfectly describe the
complex array of factors that determine streamflow regime.
   Further errors are associated with the process of region definition, and essentially represent the lack
of similarity in key watershed attributes between USGS gaged sites and EMAP sites. These are
described by the mean probabilities (P) associated with each site-specific region represented in
Figure 6, and reflect large differences in the range of confidence among the various flow analyses and
large-scale regions. For the peak-flow analysis, the highest level of confidence is shown for the Plains
sites (median P=.93), with most values > .90. The greatest range of uncertainty was observed for the
Mountain sites (median P=.60); most values for these sites were less than .70. Somewhat higher P
values were observed for the other regions (median P ranging from .62 to .81). Much the same pattern
was observed for the low-flow analysis for Plains sites (median P=.94), although the greatest
uncertainty for this analysis was associated with the western PNW sites (median P=.45); median P
values for the other large-scale regions ranged from .72 to .79. These results primarily reflect the
greater number of similar gaged sites available for analysis in the aggregated Plains region.
   An evaluation of the effect of these errors in the regionalization process on the metrics estimated
for EMAP sites was provided by two additional analyses. First, cross-validation analysis of data from
USGS gaged sites was conducted to evaluate how well frequency distributions derived from regions
defined by CCA compared with those derived from observed data. Watershed scores for USGS sites
were used to generate regions, and regional frequency curves were then derived from those regions in
the same manner used for EMAP sites. The differences between the metrics derived from these
regional curves and those determined from the individual frequency distributions are presented in
Figures 7 and 8. These results show generally quite close agreement between the two sets of metrics,
with median differences consistently close to 0. For peak-flow metrics, very small differences were
observed for the slope of the annual frequency curve (most values within ± .05), and Q2P (most values
within ± .2) (Figure 7). Larger differences were observed for Q100P, although most values were
within ± 2. Because the dispersion index (DP) was conservatively calculated as the difference between
these two metrics (Q100P-Q2P), it was associated with differences that were similarly small. For
metrics describing timing of peak-flow onset, differences for JDP were generally within ± 1-3 weeks
and for JDPCV generally within ± 1-2 weeks. Comparable results were observed for low-flow metrics
(Figure 8).
   Another measure of uncertainty in the metrics estimated for EMAP sites is the standard error of the
estimate for the selected quantiles, which were determined according to the method described in
Rosbjerg and Madsen (1995). The mean and variance of each T-year event estimator (i.e., Q/Qm) were
first approximated by:

                                                                                 (7)
                                    lg             g

    Var\xT} = — [wu + A(Aw22 + 2wl2) + B(Bw33 - 2wl3 - 2w23)]                        (8)

where a, g, and u are from equation 3(b-d), N is the total number of years of record for all sites within
the site-specific region, $K_T = -\ln(1 - 1/T)$, and A and B were determined as follows:


                                                                                 (9)


                       .    .                                                    (10)
                    g
[Figure 7 panels A, B, and C; horizontal axis: Aggregated Level III Ecoregion (MT, PL, PNW_E, PNW_W, XE)]

Figure 7. Cross-validation results for peak-flow analysis (difference between metrics
derived from regional and individual frequency curves for gaged sites). A, slope of
frequency curve; B, normalized 2-year peak flow; C, normalized 100-year peak flow.

[Figure 7, continued, panels D, E, and F; horizontal axis: Aggregated Level III Ecoregion (MT, PL, PNW_E, PNW_W, XE)]

Figure 7, continued. D, dispersion index (Q100-Q2); E, mean Julian day of annual
peak; F, CV of mean Julian day.

[Figure 8 panels A, B, and C; horizontal axis: Aggregated Level III Ecoregion (MT, PL, PNW_E, PNW_W, XE)]

Figure 8. Cross-validation results for 7-day low-flow analysis (difference between
metrics derived from regional and individual frequency curves for gaged sites). A, slope
of frequency curve; B, normalized 2-year 7-day low flow; C, normalized 100-year 7-day low flow.

[Figure 8, continued: panel D and following panels, by aggregated Level III ecoregion]

    The terms $w_{ij}$ are elements of the asymptotic covariance matrix of the PWM estimators of the GEV
parameters, and were derived by Hosking and others (1985) for several values of g; the values used in
this analysis were based on a mean value for g of -0.1 for the peak-flow analysis and 0.4 for the low-
flow analysis (Table 4) (cf. Rosbjerg and Madsen, 1995).
Table 4. Selected elements of the asymptotic covariance matrix

g       w11      w12       w13      w22      w23      w33
-0.1    1.2915   0.5104    0.3245   0.8440   0.2240   0.6815
 0.4    1.2433   -0.1205   0.3592   0.6368   0.3329   0.5880

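
To show where covariance elements of this kind enter a quantile variance, the following is a first-order (delta-method) sketch. It is not a restoration of the report's Equations 7 to 10. It assumes a GEV quantile function of the form shown on its first line, and a parameter covariance matrix that is scaled by the parameter $a$ in the usual way for PWM estimators. The leading factor and cross terms in the report's own equations may differ.

```latex
% Assumed GEV quantile function (g \neq 0):
x_T = u + \frac{a}{g}\left(1 - K_T^{\,g}\right),
\qquad K_T = -\ln\!\left(1 - \tfrac{1}{T}\right)

% Sensitivities of the quantile to each parameter:
\frac{\partial x_T}{\partial u} = 1, \qquad
\frac{\partial x_T}{\partial a} = \frac{1 - K_T^{\,g}}{g} \equiv A, \qquad
\frac{\partial x_T}{\partial g}
  = -a\left[\frac{1 - K_T^{\,g}}{g^{2}} + \frac{K_T^{\,g}\,\ln K_T}{g}\right]
  \equiv -a\,B

% Assumed per-observation covariance of (u, a, g):
% [ a^2 w_{11}, a^2 w_{12}, a w_{13} ; a^2 w_{12}, a^2 w_{22}, a w_{23} ; a w_{13}, a w_{23}, w_{33} ]
% First-order variance and the standard error for N years of record:
V_T \approx a^{2}\left[\,w_{11} + A\left(A\,w_{22} + 2\,w_{12}\right)
      + B\left(B\,w_{33} - 2\,w_{13} - 2A\,w_{23}\right)\right],
\qquad \mathrm{SE}\{x_T\} \approx \sqrt{V_T / N}
```

Under these assumptions, the $w_{ij}$ values in Table 4 would be substituted directly, with $A$ and $B$ evaluated at the mean value of g for each analysis.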
    Finally, the standard errors associated with each T-year event estimator (or quantile) were
determined as $\sqrt{\mathrm{Var}\{x_T\}}/\sqrt{N}$. The distributions of these errors for recurrence intervals of 2 years and
100 years for the peak- and low-flow analysis are plotted in Figure 9.
[Figure 9 panels: "Standard errors for quantile estimates" for A, two-year peak flow (Q2); B, one hundred-year peak flow (Q100); C, two-year 7-day low-flow (Q2); D, one hundred-year 7-day low-flow (Q100); horizontal axis: Aggregated Level III Ecoregion]

Figure 9. Standard errors for selected quantiles estimated for EMAP sites, by large-scale
aggregated ecoregion. A, Q2P; B, Q100P; C, Q2L; D, Q100L.

    These results indicate the greatest uncertainty is associated with estimating extreme peak flows,
most pronounced for Plains sites. These sites are also those with the highest interannual variability in
peak flow in general. Differences between the two quantile estimates ($E\{x_T\} - Q_T$) were also
determined to be uniformly positive in sign, which implies that quantile estimates from this analysis
may be biased low. Measures of Q100P, although uncertain, should nonetheless be considered to
represent a conservative estimate of peak-flow variability, especially for sites subject to extreme
variability in peak flow. In contrast, errors associated with low-flow quantile estimation (Q2L and
Q100L), as well as 2-year peak-flow quantiles (Q2P), were essentially nil. These results indicate that a
high level of confidence can be associated with these metrics.

Summary  and Conclusions

    The various components of this analysis represent a series of abstractions from observed data to
derived metrics. Because of the nature of the problem, that is, the description of streamflow regime for
sites without streamflow data, these metrics are necessarily based on a range of assumptions. While
founded on an empirical base and based on well-developed theoretical techniques, each step in the
analysis provides an opportunity for some level of uncertainty to enter into the  final result. These
uncertainties have been minimized to the greatest extent possible, and yet it is not possible to eliminate
them completely. There is even greater error associated with predicting metrics for sites such as these, which
are subject to a  range of human activity that can only be quantified in general terms. As a result, the
estimates derived from this analysis should clearly not be assumed appropriate  for other types of
water-resource evaluations, especially those focused on management and forecasting of extreme flow
conditions that require greater precision. Nonetheless, they can be considered suitable for the stated
purpose, that is, hydrologic classification of stream systems to support ecological analysis.
-------
References

Burn, D.H., 1990a. An appraisal of the "region of influence" approach to flood frequency analysis.
    Hydrological Sciences 35(2): 149-165.
Burn, D.H., 1990b. Evaluation of regional flood frequency analysis with a region of influence
    approach. Water Resources Research 26(10): 2257-2265.
Cunnane, C., 1988. Methods and merits of regional flood frequency analysis. Journal of Hydrology
    100: 269-290.
Daly, C., R.P. Neilson, and D.L. Phillips, 1994. A statistical-topographic model for mapping
    climatological precipitation over mountainous terrain.  Journal of Applied Meteorology 33: 140-
    158.
EMAP (Environmental Monitoring and Assessment Program), 1997. Research Plan 1997. U.S.
    Environmental Protection Agency, Washington, DC: 138 pp.
Environmental Systems Research Institute, Inc., 1999. Getting started with Arc/Info: Redlands, CA.
    Environmental Systems Research Institute, Inc., 230 p.
Gordon, N.D., T.A. McMahon,  and B.L. Finlayson,  1992. Stream hydrology, an introduction for
    ecologists. John Wiley & Sons, 526 p.
Greenwood, J.A., J.M. Landwehr, N.C. Matalas, and J.R. Wallis, 1979. Probability weighted moments:
    definition and relation to parameters of several distributions expressible in inverse form. Water
    Resources Research 15(5):  1049-1054.
GREHYS, 1996a. Presentation and review of some methods for regional flood frequency analysis.
    Journal of Hydrology 186: 63-84.
GREHYS, 1996b. Inter-comparison of regional flood frequency procedures for Canadian rivers.
    Journal of Hydrology 186: 85-103.
Haan, C.T., 2002. Statistical methods in hydrology. Iowa State Press, Ames, IA. 496 p.
Hosking, J.R.M., and J.R. Wallis, and E.F. Wood, 1985. An appraisal of the regional flood frequency
    procedure in the UK. Hydrological Sciences 30: 85-109.
Hosking, J.R.M., and J.R. Wallis, 1993. Some statistics useful in regional frequency analysis. Water
    Resources Research, 29(2): 271-281.
Landwehr, J.M., and N.C. Matalas, 1979. Probability weighted moments compared with some
    traditional techniques in estimating Gumbel parameters and quantiles. Water Resources Research
    15(5): 1055-1064.
Lettenmaier, D.P., J.R. Wallis, and E.F. Wood, 1987. Effect of regional heterogeneity on flood
    frequency estimation. Water Resources Research 23: 313-324.
Leopold, L.B., 1994. A view of the river. Harvard University Press, Cambridge MA.
Mccune, B., and M.J. Mefford,  1999. PC-ORD. Multivariate analysis of ecological data, version 4.
    MjM Software Design, Gleneden Beach, Oregon, USA.
NOAA (National Oceanographic and Atmospheric Administration), 1973. Atlas 2, Precipitation
    frequency atlas of the western United States. Data available online
    (http://hdsc.nws.noaa.gov/hdsc/pfds/index.html).
Omernik, J.M., 1987. Ecoregions of the conterminous United States (map supplement):  Annals of the
    Association of American Geographers 77(1): 118-125.

                                             24
-------
Ouarda, T.B.M.J., C. Girard, G.S. Cavadias, andB. Bobee, 2001. Regional flood frequency estimation
    with canonical correlation analysis. Journal of Hydrology 254: 157-173.
Poff, N.L., and J.V. Ward, 1989. Implications of streamflow variability and predictability for lotic
    community structure: a regional analysis of streamflow patterns. Can. J. Fish. Aquat. Sci. 46:
    1805-1817.
Ribeiro-Correa, J., G.S. Cavadias, B. Clement, and J . Rousselle, 1995. Identification of hydrological
    neighborhoods using a canonical correlation analysis. Journal of Hydrology 173: 71-89.
Riggs, H.C., 1973. Regional analyses of streamflow characteristics. Techniques of Water-Resources
    Investigations of the U.S. Geological Survey, book 4, chap. B3, 15 p.
SAS Institute Inc., 1989. SAS/STAT users guide, version 6, 4th edition, volume 1 and 2, Cary, NC.
    1,686 p.
	1990. SAS procedures guide, version 6, 3rd edition, Cary, NC. 705 p.
Schwarz, G.E. and Alexander, R.B., 1995. State soil geographic (STATSGO) data base for the
    conterminous United States. U.S. Geological Survey, Open-File Report 95-449. Data available
    online (http://water.usgs.gov/lookup/getspatial$ussoils).
Simmers, I, 1975. The use of regional hydrology concepts for spatial translation of streamflow data.
    International Association of Hydrological Sciences Publication 117: 109-117.
Smakhtin, V.U., 2001. Low flow hydrology: a review. Journal of Hydrology 240: 147-186.
Stedinger, J.R., R.M. Vogel, and E. Foufoula-Georgiou,  1993. Frequency analysis of extreme events,
    pp. 18.1-18.66 in D.R. Maidment (ed) Handbook of hydrology. McGraw-Hill, Inc.
USDC (U.S. Department of Commerce), 1961. East U.S. rainfall frequency, Technical Paper No. 40,
    U.S. Department of Commerce, Weather Bureau.
USEPA (U.S. Environmental Protection Agency), 1998, Level III Ecoregions of the Conterminous
    United States in BASINS: http://www.epa.gov/waterscience/basins/metadata/ecoreg.htm, accessed
    March  13, 2006.
USEPA, 2006. Level III ecoregions of the conterminous Unites States. URL
    http://www.epa.gOv/wed/pages/ecoregions/level_iii.htm, updated March 13, 2006, accessed March
    13, 2006)
Vogelmann, J.E., T.L. Sohl, P.V. Campbell, andD.M. Shaw, 1998. Regional land cover
    characterization using Landsat Thematic mapper data and ancillary data sources. Environmental
    Monitoring and Assessment 51 (1-2): 415-428.
Ward, J.V., and J.A. Stanford, 1983. The intermediate disturbance hypothesis: an explanation for biotic
    diversity patterns in lotic ecosystems, p. 347-356 in E.D. Fontaine, III and S.M. Bartell (ed.)
    Dynamics of lotic ecosystems. Ann Arbor Press, Ann Arbor, MI.
Wiltshire, S.E., 1986. Identification of homogeneous regions for flood frequency analysis. Journal of
    Hydrology 84: 287-302.
Zrinji, Z. and D.H. Burn, 1994. Flood frequency analysis for ungaged sites using a region of influence
    approach. Journal of Hydrology 153: 1-21.
                                             25
-------
Appendix 1. Description of watershed characteristics used in regional frequency analysis.

Basin characteristic                          Abbreviation  Description
Drainage area                                 A             Area of watershed (mi²)
Latitude                                      Lat           Latitude of site (degrees)
Longitude                                     Long          Longitude of site (degrees)
Maximum elevation                             E-max         Maximum elevation of the basin (m)
Minimum elevation                             E-min         Minimum elevation of the basin (m)
Mean elevation                                E-mean        Mean elevation of the basin (m)
Variability in elevation                      E-std         Standard deviation for mean elevation (m)
Watershed slope                               S             Mean watershed slope (%)
Aspect                                        Asp           Estimated aspect of watershed longest dimension
Percent agriculture                           Ag            Proportion of watershed covered by agricultural lands
                                                            (NLCD-81, NLCD-82, NLCD-83, NLCD-84, and NLCD-61) (%)
Percent range                                 R             Proportion of watershed covered by range land
                                                            (NLCD-51 and NLCD-71) (%)
Percent water                                 W             Proportion of watershed covered by open water (NLCD-11) (%)
Percent wetland                               Wet           Proportion of watershed covered by wetlands
                                                            (NLCD-91 and NLCD-92) (%)
Precipitation                                 P             Total annual precipitation (m)
Precipitation intensity                       I             Mean 2-year 24-hour precipitation intensity, at site
Maximum precipitation intensity               I-max         Maximum for watershed, 2-year 24-hour precipitation intensity
Peak precipitation intensity                  PI            100-year 6-hour precipitation intensity, at site
Variability in peak precipitation intensity   PIstd         Standard deviation for watershed mean, 100-year 6-hour
                                                            precipitation intensity
Soil water capacity                           AWC           Available water capacity of soil (inches/inch)
Soil clay content                             Clay          Clay content in soil (% of material less than 2 mm in size)
Soil liquid limit                             LL            Liquid limit of soil (% moisture by weight)
Soil organic material                         OM            Organic material in soil (% by weight)
Soil permeability                             Perm          Permeability of soil (inches/hour)
Soil thickness                                Thick         Cumulative thickness of all soil layers (inches)
Total water diversion                         Tdivert       Sum of all water permitted to be withdrawn from watershed (cfs)
Total surface water diversion                 SWdivert      Sum of all surface water permitted to be withdrawn from
                                                            watershed (cfs)
Yield                                         Y             Ratio of total water diversion / watershed area
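
Yield (Y) is the only characteristic in Appendix 1 that is derived from two others. The following is a minimal sketch of that ratio, assuming Tdivert is in cfs and drainage area is in square miles as listed above. The function name and the example watershed are hypothetical and are not taken from the report.

```python
def diversion_yield(total_diversion_cfs: float, drainage_area_mi2: float) -> float:
    """Return Y = Tdivert / A, in cfs per square mile (assumed units)."""
    if drainage_area_mi2 <= 0:
        raise ValueError("drainage area must be positive")
    return total_diversion_cfs / drainage_area_mi2


# Hypothetical watershed: 12 cfs of permitted diversion over 48 square miles.
example_yield = diversion_yield(12.0, 48.0)
print(example_yield)
```

Expressing diversion per unit area allows watersheds of different size to be compared on the same scale.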
-------
Appendix 2. Results from canonical correlation analysis, by aggregated large-scale region

[ns = not significant]

A. Plains

Parameter                                             Peak flow                       Low flow
N for sites                                           204                             182
N for variables                                       10                              9
Canonical variate pairs                               1               2               1               2
Canonical correlation                                 0.82            0.75            0.81            0.62
Overlapping variance (canonical R-square)             0.68            0.56            0.65            0.40
Cumulative proportion extracted by eigenvalue         0.61            0.98            0.73            0.98
Cumulative variance extracted for:
  Hydrologic variates
    Explained by own variables                        0.38            0.56            0.22            0.53
    By opposite variables                             0.26            0.36            0.14            0.26
  Watershed variates
    Explained by own variables                        0.41            0.71            0.31            0.68
    By opposite variables                             0.28            0.44            0.20            0.35
Correlations of variables and variates:               w/own   w/other w/own   w/other w/own   w/other w/own   w/other
  Hydrologic variables
    Index flow (QP or QL)                             0.67    0.55    0.69    0.52    0.94    0.76    ns      ns
    Mean Julian day for annual peak or 7-day low flow 0.87    0.72    -0.35   ns      ns      ns      0.91    0.58
    Variability in mean Julian day                    -0.60   -0.50   0.56    0.42
    Normalized 2-year flow                            0.37    0.31    ns      ns      ns      ns      0.41    ns
    Normalized 100-year flow                          -0.44   -0.36   ns      ns      ns      ns      -0.35   ns
    Percent intermittent flow                                                         -0.42   -0.34   ns      ns
  Watershed variables
    Drainage area (A)                                 0.60    0.49    0.78    0.58    0.84    0.68    0.53    0.33
    Maximum elevation (Emax)                          0.83    0.68    -0.54   -0.40
    Watershed slope (S)                                                               ns      ns      -0.92   -0.58
    100-6 precipitation intensity (PI)                -0.47   -0.39   0.59    0.44
    Variability in precipitation intensity (PIstd)    0.61    0.50    ns      ns
    Soil water capacity (AWC)                         -0.64   -0.53   0.46    0.35
    Soil liquid limit (LL)                                                            ns      ns      0.57    0.36
    Total water diversion (Tdivert)                                                   0.69    0.56    ns      ns
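
The columns of Table A are linked by standard identities of canonical correlation analysis, which can help when reading the remaining tables. The sketch below checks three of them against the Plains peak-flow values: the canonical R-square is the square of the canonical correlation; the "w/other" correlation is the "w/own" correlation multiplied by the canonical correlation; and the cumulative variance explained "by opposite variables" accumulates each pair's own-variable variance multiplied by that pair's canonical R-square. The function names are hypothetical, and treating the "by opposite variables" rows as redundancy values of this kind is an assumption that appears consistent with the tabulated numbers. It is not a statement of how the authors computed them.

```python
# Values transcribed from Table A (Plains, peak flow).
canonical_r = [0.82, 0.75]      # canonical correlation, pairs 1 and 2
own_cumulative = [0.38, 0.56]   # hydrologic variates, explained by own variables
index_flow_own = 0.67           # index flow vs. its own variate, pair 1


def canonical_r_square(r):
    """Overlapping variance shared by one pair of canonical variates."""
    return r * r


def cross_loading(own_loading, r):
    """Correlation of a variable with the opposite set's variate."""
    return own_loading * r


def cumulative_redundancy(own_cum, r_values):
    """Cumulative variance explained by the opposite set's variates."""
    total, previous, result = 0.0, 0.0, []
    for cum_own, r in zip(own_cum, r_values):
        # Variance added by this pair, scaled by the pair's R-square.
        total += (cum_own - previous) * r * r
        previous = cum_own
        result.append(total)
    return result


r_squares = [canonical_r_square(r) for r in canonical_r]
index_flow_other = cross_loading(index_flow_own, canonical_r[0])
redundancy = cumulative_redundancy(own_cumulative, canonical_r)

print([round(v, 2) for v in r_squares])   # table reports 0.68 and 0.56
print(round(index_flow_other, 2))         # table reports 0.55
print([round(v, 2) for v in redundancy])  # table reports 0.26 and 0.36
```

Because the tabulated inputs are rounded to two decimals, the recomputed values can differ from the reported ones by about 0.01.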
-------
B. Mountains

Parameter                                             Peak flow                       Low flow
N for sites                                           55                              60
N for variables                                       9                               8
Canonical variate pairs                               1               2               1               2
Canonical correlation                                 0.95            0.76            0.87            0.59
Overlapping variance (canonical R-square)             0.91            0.58            0.76            0.35
Cumulative proportion extracted by eigenvalue         0.83            0.95            0.81            0.95
Cumulative variance extracted for:
  Hydrologic variates
    Explained by own variables                        0.41            0.60            0.22            0.65
    By opposite variables                             0.37            0.48            0.17            0.32
  Watershed variates
    Explained by own variables                        0.55            0.72            0.46            0.66
    By opposite variables                             0.50            0.60            0.36            0.42
Correlations of variables and variates:               w/own   w/other w/own   w/other w/own   w/other w/own   w/other
  Hydrologic variables
    Index flow (QP or QL)                             0.82    0.78    0.50    0.38    0.89    0.78    ns      ns
    Mean Julian day for annual peak or 7-day low flow ns      ns      ns      ns      ns      ns      0.97    0.58
    Variability in mean Julian day                    -0.60   -0.57   0.42    0.32    ns      ns      -0.74   -0.44
    Normalized 2-year flow                            0.76    0.72    -0.56   -0.43
    Normalized 100-year flow                                                          ns      ns      -0.74   ns
  Watershed variables
    Drainage area (A)                                 0.58    0.55    0.76    0.58    0.93    0.81    ns      ns
    Longitude (Long)                                  0.79    0.76    ns      ns
    Maximum elevation (Emax)                          0.88    0.84    -0.41   -0.31   0.59    0.52    0.79    0.47
    Percent water (H2Opct)                                                            0.48    0.42    ns      ns
    100-6 precipitation intensity (PI)                -0.79   -0.75   ns      ns
    Soil clay content (Clay)                          -0.61   -0.58   ns      ns
    Total water diversion (Tdivert)                                                   0.63    0.55    -0.31   ns
-------
C. Xeric (values in brackets from separate CCA for timing metrics)

Parameter                                             Peak flow                       Low flow
N for sites                                           46                              63
N for variables                                       6       (5)                     9
Canonical variate pairs                               1               2               1               2
Canonical correlation                                 0.82    (0.93)  0.53    (0.25)  0.87            0.62
Overlapping variance (canonical R-square)             0.70    (0.87)  0.30    (0.08)  0.76            0.38
Cumulative proportion extracted by eigenvalue         0.84    (0.99)  1.00    (1.0)   0.81            0.97
Cumulative variance extracted for:
  Hydrologic variates
    Explained by own variables                        0.64    (0.91)  1.00    (1.0)   0.21            0.44
    By opposite variables                             0.45    (0.80)  0.55    (0.80)  0.16            0.25
  Watershed variates
    Explained by own variables                        0.52    (0.87)  0.78    (0.95)  0.49            0.72
    By opposite variables                             0.36    (0.76)  0.44    (0.77)  0.37            0.46
Correlations of variables and variates:               w/own   w/other w/own   w/other w/own   w/other w/own   w/other
  Hydrologic variables
    Index flow (QP or QL)                             1.00    0.83    ns      ns      0.90    0.78    ns      ns
    Mean Julian day for annual peak or 7-day low flow (0.93)  (0.87)  (-0.37) (ns)    ns      ns      0.57    0.35
    Variability in mean Julian day                    (-0.98) (-0.92) (ns)    (ns)    ns      ns      ns      ns
    Normalized 2-year flow                            0.53    0.45    0.84    0.46
    Normalized 100-year flow                                                          ns      ns      -0.55   -0.34
    Percent intermittent flow                                                         -0.46   -0.40   -0.68   -0.42
  Watershed variables
    Drainage area (A)                                 0.92    0.77    ns      ns      0.89    0.78    -0.40   ns
    Longitude (Long)                                  (-0.95) (-0.89) (ns)    (ns)
    Maximum elevation (Emax)                                                          0.55    0.48    0.79    0.49
    Mean elevation (Emean)                            (0.96)  (0.90)  (ns)    (ns)
    Variability in elevation (Estd)                   0.67    0.56    0.53    ns
    2-6 precipitation intensity (PI)                  0.71    0.59    0.33    ns
    2-24 maximum precipitation intensity (PImax)                                      0.61    0.53    ns      ns
    Soil organic material (OM)                        (-0.89) (-0.84) (0.35)  (ns)
    Soil thickness (Thick)                            0.51    0.42    0.77    0.42
    Total water diversion (Tdivert)                                                   0.71    0.62    -0.33   ns
-------
D. Pacific Northwest (west) (values in brackets from separate CCA for timing metrics)

Parameter                                             Peak flow                       Low flow
N for sites                                           102                             123
N for variables                                       6       (5)                     9
Canonical variate pairs                               1               2               1               2
Canonical correlation                                 0.90    (0.65)  0.54    (0.50)  0.91            0.62
Overlapping variance (canonical R-square)             0.81    (0.42)  0.29    (0.25)  0.82            0.38
Cumulative proportion extracted by eigenvalue         0.91    (0.69)  1.00    (1.00)  0.85            0.96
Cumulative variance extracted for:
  Hydrologic variates
    Explained by own variables                        0.51    (0.76)  1.00    (1.00)  0.26            0.47
    By opposite variables                             0.41    (0.32)  0.55    (0.38)  0.22            0.29
  Watershed variates
    Explained by own variables                        0.28    (0.42)  0.71    (0.87)  0.48            0.59
    By opposite variables                             0.23    (0.18)  0.35    (0.29)  0.40            0.44
Correlations of variables and variates:               w/own   w/other w/own   w/other w/own   w/other w/own   w/other
  Hydrologic variables
    Index flow (QP or QL)                             0.99    0.89    ns      ns      0.99    0.89    ns      ns
    Mean Julian day for annual peak or 7-day low flow (0.98)  (0.64)  (ns)    (ns)    ns      ns      ns      ns
    Variability in mean Julian day                    (-0.75) (-0.49) (0.66)  (0.33)  ns      ns      0.80    0.49
    Normalized 2-year flow                            ns      ns      0.97    0.52
    Normalized 100-year flow                                                          ns      ns      0.38    ns
  Watershed variables
    Drainage area (A)                                 0.97    0.87    ns      ns      0.94    0.85    ns      ns
    Latitude (Lat)                                    ns      ns      0.90    0.48
    Maximum elevation (Emax)                                                          0.71    0.64    0.52    0.32
    Minimum elevation (Emin)                          ns      ns      (-0.84) (-0.42)
    Mean elevation (Emean)                            ns      ns      -0.74   0.40
    Variability in mean elevation (Estd)                                              0.81    0.74    ns      ns
    Mean annual precipitation (Pmean)                 (0.94)  (0.61)  (0.35)  (ns)
    100-6 precipitation intensity (PI)                0.40    0.35    0.56    ns
    2-24 precipitation intensity (PI)                                                 -0.47   -0.43   -0.41   ns
    Soil clay content (Clay)                          (-0.63) (-0.41) (0.072) (0.36)
    Groundwater diversion (GWdivert)                                                  0.37    0.34    -0.32   ns
-------
E. Pacific Northwest (east) (values in brackets from separate CCA for timing metrics)

Parameter                                             Peak flow                       Low flow
N for sites                                           52                              59
N for variables                                       6       (5)                     9
Canonical variate pairs                               1               2               1               2
Canonical correlation                                 0.91    (0.81)  0.54    (0.72)  0.84            0.72
Overlapping variance (canonical R-square)             0.83    (0.66)  0.29    (0.51)  0.71            0.51
Cumulative proportion extracted by eigenvalue         0.92    (0.65)  1.00    (1.00)  0.63            0.90
Cumulative variance extracted for:
  Hydrologic variates
    Explained by own variables                        0.52    (0.52)  1.00    (1.00)  0.34            0.58
    By opposite variables                             0.43    (0.34)  0.57    (0.59)  0.24            0.36
  Watershed variates
    Explained by own variables                        0.42    (0.33)  0.68    (0.89)  0.42            0.71
    By opposite variables                             0.35    (0.22)  0.42    (0.50)  0.30            0.45
Correlations of variables and variates:               w/own   w/other w/own   w/other w/own   w/other w/own   w/other
  Hydrologic variables
    Index flow (QP or QL)                             0.99    0.91    ns      ns      0.98    0.83    ns      ns
    Mean Julian day for annual peak or 7-day low flow (ns)    (ns)    (0.95)  (0.68)  -0.39   -0.32   -0.32   ns
    Variability in mean Julian day                    (0.98)  (0.79)  (ns)    (ns)    ns      ns      0.68    0.48
    Normalized 2-year flow                            ns      ns      0.98    0.53
    Normalized 100-year flow                                                          -0.36   -0.30   -0.64   -0.46
  Watershed variables
    Drainage area (A)                                 0.90    0.82    ns      ns      0.81    0.68    -0.57   -0.41
    Minimum elevation (Emin)                          -0.77   -0.70   ns      ns
    Mean elevation (Emean)                            (-0.93) (-0.75) (ns)    (ns)    -0.55   -0.46   ns      ns
    100-6 precipitation intensity (PI)                (0.34)  ns      (0.85)  (0.61)
    2-24 precipitation intensity (PI)                                                 0.63    0.53    0.57    0.41
    Soil available water capacity (AWC)               ns      ns      0.91    0.49
    Soil permeability (Perm)                                                          ns      ns      0.87    0.62
    Soil thickness (Thick)                            (ns)    (ns)    (0.94)  (0.67)
    Total water diversion (Tdivert)                   0.52    0.47    0.39    ns
    Groundwater diversion (GWdivert)                                                  0.82    0.69    ns      ns
-------