September 2022
EPA-840-R-22003

Development and Evaluation of the Beta
Streamflow Duration Assessment Method
(SDAM) for the Great Plains (GP)


-------
Development and Evaluation of the Beta
Streamflow Duration Assessment Method
for the Great Plains

Data analysis supplement

Michele Eddy
RTI International
Research Triangle Park, NC 27709

Ken Fritz
Office of Research and Development
Cincinnati, OH 45268

Shannon Gross
RTI International
Fort Collins, CO 80528

Brian Topping
Office of Wetlands, Oceans, and Watersheds
Washington, DC 20004

Tracie-Lynn Nadeau
Office of Wetlands, Oceans, and Watersheds
Portland, OR 97205

Rachel Fertik Edgerton
Office of Wetlands, Oceans, and Watersheds
Washington, DC 20004

Julie Kelso, ORISE Fellow (former)
Office of Wetlands, Oceans, and Watersheds
Washington, DC 20004

This document has been reviewed in accordance with U.S. Environmental Protection Agency
policy and approved for publication. This report fulfills EPA QA requirements. The research for
the data was conducted under the Office of Water approved Quality Assurance Project Plan
"Streamflow Duration Assessment Method (SDAM) development in the Great Plains (GP) and
Western Mountains (WM)" which was given an ORD ID of J-WECD-0033408-QP-1-0. Any mention
of trade names, manufacturers or products does not imply an endorsement by the United States
Government or the U.S. Environmental Protection Agency. EPA and its employees do not endorse
any commercial products, services, or enterprises. Funding was provided under contracts
EP-C-17-001 and 68HERC21D0008 for data management and analysis, respectively, and EP-C-16-006
for data collection. The views expressed in this report are those of the authors and do not
necessarily represent the views or policies of the U.S. Environmental Protection Agency.

Suggested citation: Eddy, M., Gross, S., Fritz, K.M., Nadeau, T.-L., Topping, B., Fertik Edgerton, R.,
and Kelso, J. 2022. Development and Evaluation of the Beta Streamflow Duration Assessment
Method for the Great Plains. Document No. EPA-840-R-22003.

Introduction

Streamflow duration assessment methods (SDAMs) are rapid, field-based methods to determine
flow duration class at the reach scale. The development of a beta SDAM for the Northern and
Southern Great Plains regions (hereafter referred to as the GP) followed the conceptual
framework and process steps presented by Fritz and others (2020) to integrate the three key
components of an SDAM development study: hydrological data, indicators, and study reaches.

This supplemental document describes the data collection, data analysis, and evaluation steps
that resulted in the beta SDAM for the GP. It is available to inform public review and
comment on the beta method and serves as a companion to the beta SDAM GP for those
who are interested in more background on the development of the method and the underlying
data. For a complete description of the beta SDAM GP protocol, please see the User Manual
(https://www.epa.gov/system/files/documents/2022-09/beta-sdam-for-the-gp-user-manual.pdf).
The data used to develop the beta SDAM GP can be found at
https://doi.org/10.23719/1527943. For more information on the collaborative effort between
the U.S. Environmental Protection Agency (EPA) and the U.S. Army Corps of Engineers (Corps) to
develop regional SDAMs for nationwide coverage, please see
https://www.epa.gov/streamflow-duration-assessment.

Streamflow Duration Classes

Streamflow duration governs important ecosystem functions (such as support for aquatic life,
sediment transport, and biogeochemical processing rates), and streamflow duration classes are
often used to guide watershed management decisions, including assessing the applicability of
water quality standards. Our definitions of streamflow duration classes follow those used by
Nadeau (2015):

•	Ephemeral reaches flow only in direct response to precipitation. Water typically flows
only during and/or shortly after large precipitation events, the streambed is always
above the water table, and stormwater runoff is the primary water source.

•	Intermittent reaches contain sustained flowing water for only part of the year, typically
during the wet season, where the streambed may be below the water table or where
the snowmelt from surrounding uplands provides sustained flow. The flow may vary
greatly with stormwater runoff.

•	Perennial reaches contain flowing water continuously during a year of normal rainfall,
often with the streambed located below the water table for most of the year.
Groundwater typically supplies the baseflow for perennial reaches, but the baseflow
may also be supplemented by stormwater runoff or snowmelt.

For these definitions, a reach is a section of stream or river along which similar hydrologic
conditions exist (e.g., discharge, depth, velocity, or sediment transport dynamics) and
consistent drivers of hydrology are evident (e.g., slope, substrate, geomorphology, or
confinement). A channel is an area that is confined by banks and a bed and contains flowing
water (continuously or not).

Overview of the Beta Method for the Great Plains

The beta SDAM GP uses a small number of indicators to predict the streamflow duration class of
stream reaches. All indicators are measured during a single field visit. The beta SDAM GP results
in one of four possible classifications: ephemeral, intermittent, perennial, or at least intermittent.
The latter category occurs when an intermittent or perennial classification cannot be made with
high confidence, but an ephemeral classification can be ruled out.

The tool uses a machine learning model known as random forest (Figure 1). Random forest
models are increasingly common in the environmental sciences because of their superior
performance in handling complex relationships among indicators used to predict classifications.
This approach was previously used to develop regional SDAMs for the Pacific Northwest (PNW;
Nadeau et al. 2015, Nadeau 2015), Arid West (AW; Mazor et al. 2021a, Mazor et al. 2021b), and
Western Mountains (WM; Mazor et al. 2021c; Mazor et al. 2022).

[Figure 1 flowchart. Steps: (1) set aside 20% for testing; (2) sample from the original training
set with replacement to create independent subsamples; (3) build the trees on a random subset of
features; (4) aggregate decisions by majority voting; (5) final prediction (e.g., ephemeral).]

Figure 1. Random forest procedure used to determine a flow classification.
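
The procedure in Figure 1 can be illustrated with a minimal sketch. This is not the model used to build the beta SDAM GP: it is a toy, standard-library Python illustration in which each "tree" is a single-split stump, the data are invented, and all names (`fit_stump`, `fit_forest`, the two toy features) are hypothetical. It only shows the hold-out split, bootstrap sampling, random feature subsets, and majority voting.

```python
import random
from collections import Counter

def majority_vote(labels):
    """Most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, features):
    """One-split 'tree': pick the (feature, threshold) with the fewest training errors."""
    fallback = majority_vote([y for _, y in rows])
    best = (len(rows) + 1, features[0], float("inf"), fallback, fallback)
    for f in features:
        for t in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            if not right:
                continue  # a threshold at the maximum value splits nothing
            l_lab, r_lab = majority_vote(left), majority_vote(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best[1:]

def predict_stump(stump, x):
    f, t, l_lab, r_lab = stump
    return l_lab if x[f] <= t else r_lab

def fit_forest(rows, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    all_features = list(range(len(rows[0][0])))
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]        # bootstrap: sample with replacement
        features = rng.sample(all_features, n_features)  # random subset of features
        forest.append(fit_stump(sample, features))
    return forest

def predict_forest(forest, x):
    """Aggregate the trees' decisions by majority voting."""
    return majority_vote([predict_stump(s, x) for s in forest])

# Invented toy data: (percent of reach with surface flow, hydrophyte count) -> class
ephemeral = [(0, 0), (0, 0), (5, 1), (0, 0), (10, 0), (0, 1), (5, 0), (0, 0), (0, 1), (10, 0)]
wetter = [(60, 2), (80, 3), (100, 5), (90, 4), (70, 2), (100, 3), (50, 2), (100, 5), (85, 4), (95, 3)]
rows = [(x, "ephemeral") for x in ephemeral] + [(x, "at least intermittent") for x in wetter]

rng = random.Random(42)
rng.shuffle(rows)
n_test = len(rows) // 5                      # set aside 20% for testing
held_out, train = rows[:n_test], rows[n_test:]
forest = fit_forest(train, n_trees=25, n_features=1, seed=1)
print(predict_forest(forest, (0, 0)), "|", predict_forest(forest, (100, 5)))
```

A production analysis would use full decision trees from an established random forest library rather than stumps; the stump keeps the sketch short enough to read in one pass.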

Development of the Beta Great Plains SDAM

The specific data analysis steps described in this document follow the approach used to develop
and evaluate the beta SDAM WM (Mazor et al. 2022).

Study Area

The GP spans the central U.S. from Canada to Mexico and encompasses all or portions of 15 states
(Figure 2). It includes areas largely dominated by native prairie-type vegetation (tall, short, and
mixed grass) that generally receive less than 40 inches of precipitation a year. However,
significant forested areas are also found in the northeast part of the Northern GP region, where
average yearly rainfall totals are closer to the upper end of the range (30 to 40 inches). The GP
is divided into Northern and Southern GP regions based on the importance of snowmelt
to river discharge; the boundary between the two approximately follows the line south of which
mean annual snowfall is less than 0.7 m/y (<2 ft/y; Wohl et al. 2016). Ephemeral and intermittent
reaches may be found at any position within a watershed but are more common in smaller
headwaters, where flow accumulation is insufficient to sustain longer-duration flows. Ephemeral
and intermittent reaches are also generally more common in semi-arid parts of these regions,
where mean annual precipitation totals are lowest (10-20 inches), and evapotranspiration is
relatively high.

There are several large and/or growing metropolitan areas within or partially within the GP,
including Austin, Chicago, Dallas, Denver, Kansas City, Minneapolis, Milwaukee, and San Antonio.
Thus, there are places within the GP regions where the need for an SDAM in permitting and
management programs is particularly high. In addition, development associated with oil and
natural gas, as well as agricultural uses that may require more and/or modified water sources
due to climate change, occur across the GP (Vengosh et al. 2014, Perkin et al. 2017). Within a
portion of the Southern GP region, there is one SDAM currently in use, applicable to New Mexico
(New Mexico Environment Department [NMED] 2011).

[Figure 2 map. Labeled regions: Alaska, Pacific Northwest, Northern Great Plains, Northeast,
Western Mountains, Arid West, Southeast, Hawaii, Southern Great Plains.]

Figure 2. Map of SDAM study regions (based on Wohl et al. 2016). The beta SDAM GP applies to the Northern and Southern
Great Plains as shown.

Preparation and Candidate Indicators

At the outset of the project, we assembled a regional steering committee (RSC) consisting of
technical staff at Corps Districts and EPA Regional Offices in the GP region that manage
programs where streamflow duration information is often needed (e.g., Clean Water Act
programs, including permits and enforcement). RSC members were selected based on their
expertise in both scientific and programmatic elements relevant to streamflow duration
classification needs. The RSC served several functions in the development process, such as
reviewing technical products, facilitating connections with local experts, identifying resources
such as sources of hydrologic data, and providing input on the model selection.

We identified candidate indicators that were supported by the scientific literature (James et al.
2022) or used in the New Mexico SDAM (herein referred to as NM method; NMED 2011). In
addition, we included candidate indicators from the SDAM PNW (Nadeau 2015). Following
input from the RSC, these candidate indicators were then screened using the criteria described
by Fritz and others (2020), including:

Primary criteria

•	Consistency: Does the indicator consistently discriminate among flow duration classes
(e.g., demonstrated in multiple studies)?

•	Repeatability: Can different practitioners take similar measurements, given sufficient
training and standardization?

•	Defensibility: Does the indicator have a rational mechanistic relationship with flow
duration, as either a response or a driver?

•	Rapidness: Can the indicator be measured during a one-day reach-visit (even if
subsequent lab analyses are required)?

•	Objectivity: Does the indicator rely on objective (often quantitative) measures, as
opposed to subjective judgments of practitioners?

Secondary criteria

•	Robustness: Does human activity complicate indicator measurement or interpretation
(e.g., poor water quality may affect the expression of some biological indicators)?

•	Practicality: Can practitioners realistically sample the indicator with typical capacity,
skills, and resources?

Candidate indicators were included in the study (Table 1) if they: 1) met all the primary criteria;
2) met at least one of the secondary criteria; or 3) were included in the NM method (Level 1 only) to
facilitate comparison (because not all NM indicators met all primary criteria). Desktop
geospatial indicators (derived using a geographic information system and applicable spatial
datasets) that characterize mechanisms affecting flow duration and have been explored in
other flow duration classification tools (e.g., Eng et al. 2016, Jaeger et al. 2019, Mazor et al.
2021c) were also included in the analysis.

Table 1. Candidate indicators evaluated in the present study. Indicators with "NM" in the Origin column were measured following
the NM method protocol (NMED 2011) and indicators marked with "PNW" were measured following the PNW protocol (Nadeau
2015); other indicators (OTH) were measured with protocols developed for this study (USEPA 2019) and derived from sources
resulting from a literature review completed by James et al. (2022) or recommendations from the RSC. Asterisks (*) indicate
hydrologic indicators that are considered direct measures of water presence.

Candidate indicator	Description	Origin

Geomorphic indicators
Sinuosity	Visual estimate of the curviness of the stream channel	NM
Bankfull width	Width of the channel at bankfull height	PNW
Floodplain channel dimensions	Visual estimate of the extent of channel entrenchment and connectivity to the floodplain	NM
Particle size/stream substrate sorting	Visual estimate of the extent of evidence of substrate sorting within the channel	NM
Slope	Valley slope measured with a handheld clinometer	PNW
In-channel structure/riffle pool sequence	Visual estimate of the diversity and distinctiveness of riffles, pools, and other flow-based microhabitats	NM
Sediment deposition on plants and debris	Visual estimate of the extent of evidence of sediment deposition on plants and on debris within the floodplain	NM

Hydrologic indicators
Surface and subsurface flow*	Estimate of the percent of the reach-length with surface and subsurface flow	PNW
Isolated pools*	Number of pools in the channel without any connection to flowing surface water	PNW
Water in channel*	Visual estimate of the extent of surface flow in the channel	NM
Seeps and springs*	Presence/absence of springs or seeps within one-half channel width of the channel	NM
Hydric soils	Presence/absence of hydric soils within the channel, measured at up to 3 locations	NM
Soil moisture and texture*	Extent of soil saturation and texture measured at three locations in the channel	OTH
Woody jams	Number of woody jams within the channel	OTH

Biological indicators
Live and dead algal cover	Visual estimate of the percent of streambed covered by live or dead algal growth	OTH
Filamentous algal abundance	Estimate of the overall abundance of filamentous algae within the channel	NM
Stream shading	Percent shade-providing cover above the streambed measured with a densiometer at three locations	OTH
Hydrophytic plant species	Number of OBL or FACW-rated plants (as listed in Lichvar et al. 2016) growing within the channel or one half-channel width from the channel	PNW
Fish	Estimate of the overall abundance of fish (other than non-native mosquitofish) in the channel	NM
Aquatic invertebrates	Abundance and richness of aquatic invertebrate families collected from the channel	PNW
Aquatic invertebrates	Estimate of the overall abundance of aquatic invertebrates within the channel	NM
Amphibians	Estimate of the overall abundance of amphibians within the channel	NM
Mosses and liverworts	Visual estimate of the percent of streambed and banks covered by live or dead bryophytes or liverworts	OTH
Differences in vegetation (riparian corridor)	Visual estimate of the distinctiveness of vegetation in the riparian corridor compared to surrounding upland vegetation	NM
Absence of upland rooted plants in the streambed	Visual estimate of the extent of upland rooted plants growing within the streambed	NM
Presence of iron-oxidizing fungi or bacteria	Presence of oily sheens indicative of iron-oxidizing fungi or bacteria within the assessment reach	NM
Presence of aquatic or semi-aquatic snakes	Presence of aquatic or semi-aquatic snakes (e.g., most garter snake species) in the channel	PNW

Geospatial indicators
Elevation	Elevation above mean sea level	OTH
Long-term normal precipitation and temperature	30-y normal mean annual and monthly precipitation, and 30-y normal mean, maximum, and minimum annual temperature (PRISM climate data; Hart and Bell 2015)	OTH
Strata (location)	The four subregions or 'strata' into which the Northern and Southern Great Plains have been subdivided: Northern Prairie, Central Prairie, Upper Midwest, and Southern Plains	OTH
Baseflow Index (BFI)	The ratio of baseflow to total flow, expressed as a percentage and provided as a 1-kilometer raster grid for the conterminous U.S. (Wolock 2003)	OTH


Candidate Reach Identification and Data Collection

We had two objectives in selecting candidate reaches for this study: first, to include a sufficient
number of reaches in each streamflow duration class to characterize variability in indicator
measurements; and second, to select reaches representing the range of key natural and
disturbance gradients within the GP to support applicability of the method across anticipated
conditions. To support our goal of geographic representativeness, we subdivided the Northern
GP into 3 subregions or strata, based on EPA Level II Ecoregion boundaries (Omernik 1995). This
resulted in 4 strata: Central Prairie, Northern Prairie, Upper Midwest, and Southern Great
Plains. We aimed to select 290 stream-reaches (one assessed location per reach) with equal
representation of perennial, intermittent, and ephemeral flow duration classes among and
within the four GP strata (Figure 3).

To screen reaches for use in method development, we first compiled a list of 3566 candidate
study reaches based on existing hydrologic data records (e.g., U.S. Geological Survey (USGS)
stream gages, water presence loggers, wildlife cameras, field photos), published studies, and
interviews with local experts familiar with the specific reach's hydrology. Most of these
reaches (2945) were derived from the database of stream gages operated by the USGS, and
2298 (78%) of them were perennial. (Actual streamflow duration class was determined by
applying the flowchart in Figure 4, which was informed by existing definitions [Hedman and
Osterkamp 1982, Hewlett 1982].) Consequently, other sources were required to identify
candidate ephemeral and intermittent reaches. Another 621 candidate study reaches were

Figure 3: The four GP sub-strata; study reaches shov
-------
[Figure 4 flowchart. DOR > 328? If no: insufficient record. If yes: Zyear < 37? If yes: perennial.
If no: Zyear > 328? If yes: ephemeral. If no: Myear > 37? If yes: intermittent. If no: unclassified.]


Figure 4. Flowchart used to determine actual streamflow duration class of reaches based on continuous measures of water
presence (e.g., USGS streamgages). DOR: days of record. Zyear: Average number of dry days per year. Myear: Average length of
longest continuous wet period per year, in days. For USGS gages, at least 20 years of data were analyzed whenever possible
(Kelso and Fritz 2021).
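
The decision sequence in Figure 4 can be written as a short function. The sketch below is illustrative only: the function and argument names are hypothetical, and the pairing of each threshold with its outcome (Zyear < 37 with perennial, Zyear > 328 with ephemeral, Myear > 37 with intermittent) is an assumption read from the flowchart labels and should be checked against the original figure.

```python
def classify_reach(days_of_record, dry_days_per_year, longest_wet_period_days):
    """Classify a reach from continuous water-presence summaries.

    days_of_record          -> DOR in Figure 4
    dry_days_per_year       -> Zyear (average number of dry days per year)
    longest_wet_period_days -> Myear (average longest continuous wet period, days)

    Branch order and outcomes are an assumed reading of the flowchart.
    """
    if not days_of_record > 328:
        return "insufficient record"
    if dry_days_per_year < 37:
        return "perennial"
    if dry_days_per_year > 328:
        return "ephemeral"
    if longest_wet_period_days > 37:
        return "intermittent"
    return "unclassified"

# Hypothetical gage summaries, not values from the study dataset
for dor, zyear, myear in [(7300, 2, 360), (7300, 350, 4), (7300, 200, 90), (200, 0, 200)]:
    print(dor, zyear, myear, classify_reach(dor, zyear, myear))
```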

Of the 3566 candidate reaches, 293 study reaches were sampled from November 2019 to June
2021. These study reaches were parsed into 'instrumented' and 'single-visit' reaches1.
Instrumented reaches (183) were visited multiple times (up to four), and each had at least one
Stream Temperature, Intermittence, and Conductance (STIC; Chapin et al. 2014) logger
deployed, with 10% of instrumented reaches having duplicate data loggers. Instrumented
reaches generally had fewer existing lines of evidence to determine actual streamflow duration
classification before sampling; therefore, post-sampling reach classifications were reviewed in
light of the STIC logger data and hydrology indicator data that were direct measures of water
presence collected during each visit. For further details on STIC data loggers and their
verification/calibration, deployment, and data retrieval, see Schumacher and Fritz (2019).
Single-visit reaches (110) were visited once (with a 10% resample) and did not have loggers
deployed. Because actual streamflow duration classification of most single-visit reaches was
determined using existing data, these reaches generally had multiple direct flow duration data
sources. Ultimately, due to data loss from STIC loggers and other factors, actual streamflow
duration class at 42 reaches (35 instrumented and seven single-visit reaches) could not be
determined with confidence, and these reaches were excluded from the analysis used to develop the beta SDAM
GP. Of the 251 study reaches used to develop the beta SDAM GP, 71 were ephemeral, 100 were
intermittent, and 80 were perennial (Table 2).

1 These reaches were termed 'baseline' and 'validation', respectively, in prior beta SDAMs but have been renamed
for clarity.
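
The reach counts reported above can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated in the text; the variable names are illustrative.

```python
# Counts as reported in the text
sampled = {"instrumented": 183, "single-visit": 110}
excluded = {"instrumented": 35, "single-visit": 7}
used_by_class = {"ephemeral": 71, "intermittent": 100, "perennial": 80}

total_sampled = sum(sampled.values())      # reaches sampled, Nov 2019 to Jun 2021
total_excluded = sum(excluded.values())    # reaches whose class could not be determined
total_used = total_sampled - total_excluded

print(total_sampled, total_excluded, total_used, sum(used_by_class.values()))
```

The sampled total (293), the excluded total (42), and the per-class counts all agree with the 251 reaches used for development.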

Table 2. Distribution of reaches used to develop the beta SDAM GP. Instrumented reaches were visited up to four times and had
Stream Temperature, Intermittence, and Conductance loggers installed and single-visit reaches were visited once (rarely, twice)
and did not have loggers installed.

Class	Single-Visit Gaged	Single-Visit Preferred	Instrumented Gaged	Instrumented Preferred	Instrumented Acceptable	Total
Ephemeral	14	14	6	7	30	71
-Northern Prairie	3	6	0	0	9	10
-Upper Midwest	0	2	0	1	2	7
-Central Prairie	7	4	1	3	13	6
-Southern Plains	4	2	5	3	6	8
Intermittent	13	26	10	15	36	100
-Northern Prairie	4	3	4	2	6	19
-Upper Midwest	1	13	1	1	23	39
-Central Prairie	2	7	2	8	5	24
-Southern Plains	6	3	3	4	2	18
Perennial	32	4	23	9	12	80
-Northern Prairie	9	0	6	0	1	16
-Upper Midwest	8	1	5	6	9	29
-Central Prairie	8	1	7	1	1	18
-Southern Plains	7	2	5	2	1	17

During each field visit to a study reach, the suite of candidate indicators (Table 1) was
measured following the development protocol (USEPA 2019). This compilation of indicators
from a single field visit constitutes one reach sample (or observation) in terms of the analyses
described within this data analysis supplement. Surrounding land use may affect or disturb
streamflow duration indicators without substantially shifting flow duration at reaches (e.g.,
changes in water quality). Up to two predominant land use categories within a 100-m radius of
each study reach were noted on each field visit. If "urban" or "agriculture" was an identified
land use category, the sample was considered disturbed; otherwise, the sample was considered
not disturbed for comparisons of beta SDAM GP performance.
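
The disturbance rule above amounts to a simple membership test. The following sketch assumes that a sample counts as disturbed when either of its (up to two) recorded land use categories is "urban" or "agriculture"; the function name and the category spellings other than those two are hypothetical.

```python
DISTURBED_LAND_USES = {"urban", "agriculture"}

def is_disturbed(land_uses):
    """Return True if any recorded land use category marks the sample as disturbed.

    land_uses: up to two predominant categories noted within 100 m of the reach.
    Assumption: one matching category is enough to flag the sample.
    """
    if len(land_uses) > 2:
        raise ValueError("at most two predominant land use categories are recorded")
    return any(use.strip().lower() in DISTURBED_LAND_USES for use in land_uses)

print(is_disturbed(["Agriculture", "forest"]))   # flagged as disturbed
print(is_disturbed(["rangeland"]))               # not flagged
```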

Data analysis

Metric calculation

Candidate indicator data were used to create 95 candidate metrics, of which 52 were biological,
11 were geomorphological, ten were hydrologic (eight directly measured water presence, and
two were indirect measurements), and 22 were geospatial (Table 3).

Table 3. Candidate metrics evaluated for the development of the beta SDAM GP. Please see Appendix A for full definitions of candidate metrics. Asterisks (*) indicate hydrologic metrics that directly
measure the presence of water. Abbreviations in candidate metric names: EPT: Ephemeroptera, Plecoptera, and Trichoptera insect orders. GOLD: Gastropoda, Oligochaeta, and Diptera
invertebrate groups. OCH: Odonata, Coleoptera, and Heteroptera insect orders. For Type, the following categories apply: Ord: ordinal metrics. Cat: categorical metrics. Bin: binary metrics. Con:
continuous metrics. The following fields provide the screening criteria: PctDom: percent of reach samples with the most common value (typically zero). Min: minimum value. Max: maximum value.
Range: maximum possible value minus minimum possible value for the candidate metric. PvIvE: F-statistic from a comparison of mean values at perennial, intermittent, and ephemeral reaches.
EvNE: absolute t-statistic from a comparison of mean values at ephemeral and at least intermittent reaches. PvNP: absolute t-statistic from a comparison of mean values at perennial and non-perennial
reaches. PvIwet: absolute t-statistic from a comparison of mean values at flowing intermittent and perennial reaches. EvIdry: absolute t-statistic from a comparison of mean values at non-flowing
intermittent and ephemeral reaches. rf_MDA: variable importance from a random forest model, measured as mean decrease in accuracy. Screened: indicates if the metric passed or failed
screening criteria in Table 5. NA: not applicable.

Candidate metrics	Group	Type	PctDom	Min	Max	Range	PvIvE	EvNE	PvNP	PvIwet	EvIdry	rf_MDA	Screened
ai_present	Bio	Bin	64%	0	1	1	267.06	21.08	19.12	4.24	3.99	0.01	Pass
Algae_score	Bio	Ord	48%	0	3	3	126.58	15.63	12.90	4.95	3.27	0.01	Pass
algdead_cover_score	Bio	Ord	89%	0	3	3	11.17	4.83	3.49	2.35	1.29	0.00	Pass
algdead_noupstream_cover_score	Bio	Ord	89%	0	3	3	11.52	4.73	3.58	2.56	1.29	0.00	Pass
alglive_cover_score	Bio	Ord	52%	0	4	4	102.24	14.77	11.35	4.08	3.14	0.01	Pass
alglivedead_cover_score	Bio	Ord	50%	0	4	4	106.29	15.09	11.50	4.21	3.58	0.01	Pass
amphib_score	Bio	Bin	83%	0	1	1	17.82	8.11	2.43	0.62	2.11	0.00	Pass
BMI_score	Bio	Ord	43%	0	3	3	292.33	22.08	21.59	7.36	3.50	0.01	Pass
DifferencesInVegetation_score	Bio	Ord	27%	0	3	3	86.72	12.36	9.71	3.61	4.34	0.00	Pass
EPT_abundance	Bio	Con	57%	0	45	45	117.44	13.85	11.26	7.84	2.15	0.01	Pass
EPT_relabd	Bio	Con	57%	0	1	1	125.81	15.40	12.33	7.38	1.73	0.01	Pass
EPT_reltaxa	Bio	Con	57%	0	1	1	141.40	16.00	13.23	7.61	1.95	0.01	Pass
EPT_taxa	Bio	Con	57%	0	7	7	172.21	15.77	14.07	9.45	2.01	0.01	Pass
Fish_score	Bio	Ord	65%	0	3	3	116.37	14.52	12.18	6.49	0.03	0.00	Pass
fishabund_score2	Bio	Ord	67%	0	3	3	116.06	14.39	12.12	6.43	1.87	0.00	Pass
frogvoc_score	Bio	Bin	85%	0	1	1	10.63	5.90	1.80	1.01	2.02	0.00	Pass
GOLD_abundance	Bio	Con	51%	0	29	29	37.81	10.88	5.32	0.85	2.48	0.00	Pass
GOLD_relabd	Bio	Con	51%	0	1	1	30.34	9.27	2.77	2.63	1.29	0.00	Pass
GOLD_reltaxa	Bio	Con	51%	0	1	1	39.55	10.15	4.37	1.90	1.57	0.00	Pass
GOLD_taxa	Bio	Con	51%	0	5	5	75.47	13.95	8.56	2.18	2.54	0.00	Pass
GOLDOCH_relabd	Bio	Con	42%	0	1	1	54.00	11.44	3.20	2.99	3.64	0.01	Pass
GOLDOCH_reltaxa	Bio	Con	42%	0	1	1	76.36	13.53	5.28	2.25	3.98	0.01	Pass
hydrophytes_present	Bio	Ord	22%	0	8	8	116.68	15.60	10.60	3.77	5.15	0.00	Pass
hydrophytes_present_any	Bio	Bin	78%	0	1	1	116.21	11.87	10.45	2.02	5.71	0.00	Pass
hydrophytes_present_noflag	Bio	Ord	22%	0	8	8	117.79	15.88	10.53	3.59	5.26	0.01	Pass
iofb_score	Bio	Bin	89%	0	1.5	1.5	4.75	3.65	0.81	1.52	0.91	0.00	Pass
liverwort_cover_score	Bio	Ord	97%	0	3	3	5.35	2.40	2.44	2.24	0.55	0.00	Fail
mayfly_abundance	Bio	Con	64%	0	30	30	83.84	12.38	9.67	6.48	1.98	0.00	Pass
mayfly_gt6	Bio	Bin	81%	0	1	1	63.01	11.22	8.54	5.70	2.08	0.00	Pass
moss_cover_score	Bio	Ord	92%	0	3	3	18.09	6.27	4.47	3.18	1.27	0.00	Pass
Noninsect_abundance	Bio	Con	61%	0	30	30	31.19	10.38	4.58	0.64	2.58	0.00	Pass
Noninsect_relabund	Bio	Con	61%	0	1	1	23.53	8.35	2.48	1.79	1.87	0.00	Pass
Noninsect_reltaxa	Bio	Con	61%	0	1	1	29.39	9.00	3.36	1.45	1.97	0.00	Pass
Noninsect_taxa	Bio	Con	61%	0	4	4	50.62	12.11	6.61	1.64	2.57	0.00	Pass
OCH_abundance	Bio	Con	58%	0	29	29	17.45	6.94	1.97	0.47	3.97	0.00	Pass
OCH_relabd	Bio	Con	58%	0	1	1	18.54	6.81	1.46	0.99	3.84	0.00	Pass
OCH_reltaxa	Bio	Con	58%	0	1	1	31.55	9.29	2.78	0.91	4.05	0.00	Pass
OCH_taxa	Bio	Con	58%	0	6	6	39.62	11.03	4.90	0.74	4.14	0.00	Pass
PctShading	Bio	Con	32%	0	1	1	5.32	2.42	1.04	2.73	1.12	0.00	Pass
peren_present	Bio	Bin	72%	0	1	1	133.73	12.37	13.62	8.83	0.49	0.00	Pass
perennial_abundance	Bio	Con	72%	0	32	32	53.94	8.29	7.82	5.86	0.09	0.00	Pass
perennial_live_abundance	Bio	Con	72%	0	32	32	52.84	8.23	7.75	5.79	0.09	0.00	Pass
perennial_taxa	Bio	Con	72%	0	5	5	98.70	10.06	10.80	8.02	0.27	0.00	Pass
Richness	Bio	Con	36%	0	18	18	189.94	19.03	15.32	7.04	3.81	0.02	Pass
ripariancorr_score	Bio	Bin	70%	0	1	1	39.98	8.12	4.75	0.53	3.06	0.00	Pass
snake_score	Bio	Bin	98%	0	1	1	3.26	2.30	1.96	1.88	1.09	0.00	Fail
TotalAbundance	Bio	Con	36%	0	86	86	121.21	16.60	11.51	5.16	3.89	0.02	Pass
turt_score	Bio	Bin	95%	0	1	1	7.06	4.30	2.71	1.34	0.55	0.00	Fail
UplandRootedPlants_score	Bio	Ord	57%	0	3	3	180.91	15.56	16.32	4.54	3.13	0.01	Pass
vert_score	Bio	Bin	71%	0	1	1	32.79	10.33	3.80	0.58	3.03	0.00	Pass
vert_sumscore	Bio	Ord	79%	0	3	3	22.05	8.68	3.68	0.69	2.27	0.00	Pass
vertvoc_sumscore	Bio	Bin	71%	0	4	4	27.02	9.57	3.67	0.03	2.74	0.00	Pass
BankWidthMean	Geomorph	Con	2%	0.4	68.3	67.9	24.76	4.16	6.63	4.36	0.79	0.02	Pass
ChannelDimensions_score	Geomorph	Ord	57%	0	3	3	3.25	1.40	1.37	1.13	2.40	0.00	Pass
erosion_score	Geomorph	Bin	89%	0	1	1	0.18	0.53	0.01	0.56	0.19	0.00	Fail
floodplain_score	Geomorph	Bin	66%	0	1	1	2.37	1.39	0.87	1.41	0.79	0.00	Pass
RifflePoolSeq_score	Geomorph	Ord	30%	0	3	3	40.49	7.71	7.69	3.27	0.69	0.00	Pass
SedimentOnPlantsDebris_score	Geomorph	Ord	29%	0	1.5	1.5	30.88	7.25	5.91	0.80	0.41	0.00	Pass
Sinuosity_score	Geomorph	Ord	49%	0	3	3	15.96	6.01	1.40	1.05	4.16	0.00	Pass
Slope	Geomorph	Ord	40%	0	20	20	4.57	1.77	3.70	1.96	0.32	0.00	Pass
slope_gt10.5	Geomorph	Bin	98%	0	1	1	3.17	0.51	3.03	2.26	0.78	0.00	Fail
slope_gt16	Geomorph	Bin	100%	0	1	1	1.20	1.00	1.00	0.00	1.00	0.00	Fail
SubstrateSorting_score	Geomorph	Ord	33%	0	3	3	77.71	8.20	13.21	6.77	1.35	0.01	Pass
BFI	GIS	Con	5%	7	76	69	38.44	4.19	8.47	6.30	0.21	0.01	Pass
Elev_m	GIS	Con	3%	13	2643	2630	3.77	2.41	0.42	2.27	0.77	0.01	Pass
MeanSnowPersistence_01	GIS	Con	1%	0.000	52.789	52.789	34.02	8.76	5.23	2.18	4.83	0.01	Pass
MeanSnowPersistence_05	GIS	Con	1%	0.096	50.826	50.730	33.88	8.56	5.33	2.20	4.84	0.01	Pass
MeanSnowPersistence_10	GIS	Con	1%	0.074	51.522	51.448	34.02	8.57	5.34	2.23	4.71	0.01	Pass
ppt	GIS	Con	1%	287.21	1056.46	769.25	12.69	5.03	2.98	0.12	1.57	0.01	Pass
ppt.m01	GIS	Con	1%	6.33	70.31	63.97	6.84	1.75	3.63	2.30	1.35	0.01	Pass
ppt.m02	GIS	Con	1%	7.44	69.55	62.11	5.72	2.69	2.94	1.10	0.11	0.01	Pass
ppt.m03	GIS	Con	1%	9.38	90.06	80.68	3.52	1.73	2.61	0.97	0.89	0.01	Pass
ppt.m04	GIS	Con	1%	9.77	103.05	93.29	13.89	5.77	3.08	0.13	1.72	0.01	Pass
ppt.m05	GIS	Con	1%	25.45	152.61	127.15	7.95	4.07	1.83	1.00	1.22	0.01	Pass
ppt.m06	GIS	Con	1%	28.57	146.24	117.68	21.84	7.01	1.41	1.82	3.68	0.01	Pass
ppt.m07	GIS	Con	1%	25.22	123.55	98.33	19.77	6.77	1.49	1.08	4.24	0.01	Pass
ppt.m08	GIS	Con	1%	16.13	121.45	105.32	14.81	5.96	0.71	1.77	3.32	0.02	Pass
ppt.m09	GIS	Con	1%	16.68	130.63	113.95	11.64	4.66	3.01	0.48	1.56	0.01	Pass
ppt.m10	GIS	Con	1%	18.72	110.64	91.91	7.88	2.93	3.51	1.44	0.11	0.01	Pass
ppt.m11	GIS	Con	1%	9.53	76.10	66.57	8.14	2.74	3.74	1.90	0.43	0.01	Pass
ppt.m12	GIS	Con	1%	7.10	75.11	68.01	5.82	2.21	3.18	1.66	0.64	0.01	Pass
Strata	GIS	Cat	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	Pass
tmax	GIS	Con	2%	9.13	28.66	19.53	28.69	7.71	3.70	0.80	5.51	0.01	Pass
tmean	GIS	Con	2%	3.09	22.68	19.59	20.48	6.32	3.06	0.73	5.11	0.01	Pass
tmin	GIS	Con	2%	-2.98	17.26	20.24	12.34	4.77	2.28	0.64	4.43	0.01	Pass
IsolatedPools_number*	H2O (Direct)	Ord	88%	0	20	20	8.20	1.17	6.00	2.81	2.50	0.00	Pass
SoilMoist_MaxScore*	H2O (Direct)	Ord	79%	0	2	2	180.34	13.46	12.08	0.00	6.45	0.01	Pass
SoilMoist_MeanScore*	H2O (Direct)	Ord	79%	0	2	2	200.30	14.19	12.46	0.00	6.88	0.01	Pass
springs_score*	H2O (Direct)	Bin	94%	0	3	3	4.29	3.63	0.44	1.45	0.19	0.00	Pass
SurfaceFlow_pct*	H2O (Direct)	Ord	56%	0	100	100	465.35	32.48	23.62	3.89	3.00	0.04	Pass
SurfaceSubsurfaceFlow_pct*	H2O (Direct)	Ord	60%	0	100	100	456.57	32.26	22.38	2.19	3.05	0.03	Pass
WaterInChannel_score*	H2O (Direct)	Ord	46%	0	6	6	531.00	31.42	24.02	6.28	6.11	0.04	Pass
HydricSoils_score	H2O (Indirect)	Bin	78%	0	3	3	90.55	10.93	6.47	0.13	7.76	0.00	Pass
WoodyJams_number	H2O (Indirect)	Ord	85%	0	100	100	1.83	1.56	1.50	2.45	1.19	0.00	Pass


-------
Metric Screening

As an initial data exploration step, we visualized the relationships between actual streamflow
duration class (hereafter "flow class") and indicators by ordinating all 95 candidate metrics for
all samples in the dataset using nonmetric multidimensional scaling with Gower's distance
(Gower 1971). Convex hulls were drawn around each flow class to help visualize their
distributions in ordination space. In the ordination of all candidate metrics for Northern and
Southern GP samples, intermittent reaches overlapped with both ephemeral and perennial
reaches, whereas ephemeral and perennial reaches were more clearly separated from each
other (Figure 5). Axis 1 tended to separate reaches that were flowing from those that were dry
at the time of sample collection.

[Figure 5: ordination scatterplot. X-axis: MDS1. Legend: flow class (Eph, Int, Per) and condition at the time of sampling (Dry, Flowing).]

Figure 5. Beta SDAM GP candidate metric ordination.

Next, candidate metrics were evaluated using criteria for inclusion in the beta SDAM GP (Table
4):

•	Distribution statistic criterion: calculated as percent dominance of the most common
value (which was typically zero); all metrics had to meet this criterion.

•	Criteria measuring the responsiveness of metrics (i.e., ability to discriminate across flow
classes) included:

o A set of statistical comparisons of mean values at different subsets of reaches
(e.g., t-statistic from a comparison of metric values at perennial and
non-perennial reaches), as has been used in other studies (Hawkins et al. 2010, Cao
and Hawkins 2011, Mazor et al. 2016).
o A responsiveness statistic based on variable importance (specifically, mean
decrease in accuracy) from a random forest model to predict streamflow
duration class from all candidate metrics; the model was calibrated using the
default option from the randomForest function in the randomForest package in R
(Liaw and Wiener 2002).

Candidate metrics had to meet at least one responsiveness criterion, in addition to the
distribution criterion, to be considered in further analyses. The exception was Strata, the
metric representing the four strata among which the study reaches were geographically
distributed, which was included in further analyses without being screened. A total of 89 of the
95 candidate metrics were retained as screened metrics. Of the six metrics that failed, all but
one (erosion_scored) failed because their Percent Dominance (PctDom) scores were greater than
95%. Note that this evaluation was carried out using the testing dataset described in the next
section.

Table 4. Metric screening criteria. Metrics had to meet the distribution criterion and at least one responsiveness criterion to be
considered screened for further analysis.

Criterion	Threshold	Definition
Distribution criterion
% dominance of most common value	<95%	Frequency of most common value (typically, zero) in the development data set
Responsiveness criteria
PvIvE	F>2	F-statistic in a comparison of values at perennial versus intermittent versus ephemeral reaches
EvALI	t>2	t-statistic in a comparison of values at ephemeral versus at least intermittent reaches
PvNP	t>2	t-statistic in a comparison of values at perennial versus non-perennial reaches
PvIwet	t>2	t-statistic in a comparison of values at perennial versus flowing intermittent reaches
EvIdry	t>2	t-statistic in a comparison of values at ephemeral versus dry intermittent reaches
rf_MDA	Top quartile	Mean decrease accuracy (MDA) in a random forest model to predict perennial, intermittent, or ephemeral streamflow duration class
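The screening logic in Table 4 can be illustrated with a minimal sketch. It assumes a Welch (unequal-variance) t-statistic and compares its absolute value to the threshold; the report does not state which form of the t-test was used, and the function names and metric values below are hypothetical. The F-statistic (PvIvE) and random forest (rf_MDA) criteria are not shown.

```python
from collections import Counter
from math import sqrt
from statistics import mean, variance


def percent_dominance(values):
    """Share (0-100) of samples that take the single most common value."""
    counts = Counter(values)
    return 100.0 * counts.most_common(1)[0][1] / len(values)


def welch_t(group_a, group_b):
    """Two-sample t-statistic without assuming equal variances (an assumption)."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def passes_screen(values, responsiveness_flags, max_dominance=95.0):
    """Distribution criterion plus at least one responsiveness criterion."""
    return percent_dominance(values) < max_dominance and any(responsiveness_flags)


# Hypothetical metric values at perennial and non-perennial reaches.
perennial = [4, 5, 3, 6, 5, 4]
non_perennial = [0, 1, 0, 2, 0, 1]

t_pvnp = welch_t(perennial, non_perennial)
screened = passes_screen(perennial + non_perennial, [abs(t_pvnp) > 2])
print(f"PvNP t = {t_pvnp:.2f}, screened = {screened}")
```

A metric dominated by a single value (for example, 99 zeros in 100 samples) fails the distribution criterion regardless of how responsive it is.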

As in the development of previous SDAMs, direct measures of water were excluded from
further analysis. Metrics that directly measure water (e.g., soil moisture, number of isolated
pools, water in channel) can greatly increase performance. However, such metrics introduce
circularity (because water presence was used to confirm and update actual streamflow
duration classes in the development data set) and may degrade the ability of the SDAM to
perform well during atypical conditions, such as drought. See Mazor et al. (2021b) for a
discussion of the implications of including direct measures of water presence as an indicator in
SDAMs.

Data Preparation

Prior to method development, a portion of the data was withheld for use in final model testing.
Samples from 20% of the study reaches, balanced by Class and Strata, were withheld into a
"test" dataset. These samples were used to inform the final model selection and refinement, by
evaluating the model on novel reaches. Samples from the remaining 80% of the reaches were
used to develop (or "train") the model and are referred to hereafter as the training dataset.
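A reach-level split of this kind can be sketched as follows. This is an illustration only: the report does not describe the exact balancing procedure, so the sketch assumes that 20% of reaches are drawn at random within each Class-by-Strata group, and the site identifiers and group labels are hypothetical.

```python
import random
from collections import defaultdict


def split_sites(site_info, test_fraction=0.2, seed=0):
    """site_info maps site id -> (flow_class, stratum); returns (train, test) site sets."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for site, key in sorted(site_info.items()):
        groups[key].append(site)
    train, test = set(), set()
    for key in sorted(groups):
        sites = groups[key]
        rng.shuffle(sites)
        # Withhold the same fraction of sites from every class-by-stratum group.
        n_test = round(len(sites) * test_fraction)
        test.update(sites[:n_test])
        train.update(sites[n_test:])
    return train, test


# Hypothetical sites: 10 per combination of flow class and stratum.
site_info = {
    f"{cls}-{stratum}-{i}": (cls, stratum)
    for cls in ("E", "I", "P")
    for stratum in ("North", "South")
    for i in range(10)
}
train_sites, test_sites = split_sites(site_info)
print(len(train_sites), len(test_sites))
```

Splitting by reach rather than by sample keeps all visits to a withheld reach out of the training data, which is what makes the test reaches novel to the model.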

Repeat reach visits

Of the 251 reaches included in the GP dataset, each was visited between one and four times,
yielding a total of 692 samples. Figure 6 shows the distribution of repeat reach visits.

[Figure 6: bar chart of the number of reaches (y-axis) by number of visits (x-axis: 1 to 4).]

Figure 6. Distribution of number of visits across the 251 study reaches. Numbers inside of bars are the number of study sites with
1, 2, 3 or 4 visits.

To minimize bias, oversampling was performed on the training dataset (Figure 7). Oversampling
is a common preprocessing step that serves to give under-represented classes more visibility in
the data (Mohammed et al. 2020).



-------
[Figure 7: flow diagram. Raw data (columns: Site, VisitNo, x) are split by site. Test set (20%): set aside 20% of sites for testing; the model does not see any samples from these sites during development, and the performance of the random forest is calculated using the testing set (i.e., sites the model has never seen before). Oversampling (training set): oversample until each study site in the training set is represented by the same number of samples. If a site was visited once, repeat the sample 4x; if a site was visited twice, repeat each sample 2x; if a site was visited 3-4x, leave as-is.]

Figure 7. Oversampling process used for the training dataset. x is a hypothetical candidate indicator.

Oversampling was performed on the training dataset only (no manipulations were conducted
on the test dataset) and included the following steps:

•	If a reach was sampled one time, its sample was repeated four times.

•	If a reach was sampled twice, each sample was repeated two times.

•	If a reach was sampled three or four times, the samples were left as-is.

As a result of the oversampling process, each study reach had three or four samples in the
analysis for method development. The distribution of flow duration classes in the original
training dataset was preserved in the oversampled training dataset, and both matched the
distribution of flow duration classes within the testing dataset well (Figure 8). The augmented
(oversampled) training dataset, with 822 samples, was therefore used in the next step of
method development to select among the screened metrics.
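The oversampling rules above can be sketched in a few lines. The site labels and x values follow the hypothetical example in Figure 7; the function name and record layout are assumptions made for illustration.

```python
from collections import defaultdict


def oversample(samples):
    """Repeat samples so each site contributes three or four rows, per the rules above."""
    by_site = defaultdict(list)
    for sample in samples:
        by_site[sample["site"]].append(sample)
    augmented = []
    for visits in by_site.values():
        if len(visits) == 1:
            copies = 4  # one visit: repeat the sample four times
        elif len(visits) == 2:
            copies = 2  # two visits: repeat each sample twice
        else:
            copies = 1  # three or four visits: leave as-is
        for _ in range(copies):
            augmented.extend(visits)
    return augmented


# Training sites from the Figure 7 illustration (x is a hypothetical indicator).
training = [
    {"site": "JJ", "visit": 1, "x": 5.7},
    {"site": "KK", "visit": 1, "x": 3.1},
    {"site": "KK", "visit": 2, "x": 4.3},
    {"site": "MM", "visit": 1, "x": 1.2},
    {"site": "MM", "visit": 2, "x": 3.3},
    {"site": "MM", "visit": 3, "x": 6.2},
    {"site": "MM", "visit": 4, "x": 3.7},
]
augmented = oversample(training)
print(len(augmented))
```

Because whole samples are duplicated rather than synthesized, the flow class recorded for each sample is carried along unchanged.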



-------
[Figure 8: bar charts of the number of samples by Class (x-axis: E, I, P) in three panels (A, B, C); bars are labeled with sample counts and percentages.]

Figure 8: Distribution of ephemeral (E), intermittent (I), and perennial (P) classes in the (A) training dataset before oversampling,
(B) training dataset after oversampling, and the (C) testing dataset. Shown for each bar is the number of samples for a
streamflow duration class and the percent of samples within the datasets. A balanced distribution between classes is important
to mitigate against bias and improve model accuracy.

Metric selection

The screened metrics were reduced to a final set of metrics for the beta SDAM GP based on
their importance in random forest models using the Recursive Feature Elimination (RFE)
function in the R caret package (Kuhn 2020). Briefly, RFE is a form of stepwise selection in which
complex models (i.e., those based on many metrics) are calibrated, and simpler models are
considered incrementally by eliminating the least important metrics. Here, the most complex
model was considered first. Then, the five least important metrics were eliminated based on
their relative performance in the random forest model. This process was iterated until a
20-metric model was identified, after which only one metric was eliminated in each successive
step. The best performing model (highest accuracy in predicting true streamflow duration class)
was identified. Then, the simplest model (i.e., the one with the fewest metrics) with accuracy
within 1% of the model with the best accuracy was selected to identify the final set of metrics.
If the model selected by this approach had more than 20 metrics, the 20-metric model was
selected instead. For this analysis, accuracy on the training dataset was measured with Cohen's
Kappa statistic, a measure of accuracy that accounts for uneven distribution among the three
streamflow duration classes. Kappa can range from -1 to 1, where 0 indicates agreement
equivalent to chance, 1 indicates perfect agreement, and negative values indicate agreement
worse than chance. Because random forest models were used, the Out-of-Bag (OOB) error rate
is also provided. Random forests are built by bootstrap aggregation (bagging): each tree is fit to
a sample drawn with replacement from the training data, and the OOB error is the mean
prediction error on each training sample, computed using only the trees whose bootstrap
sample did not include it (James et al. 2013).
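The elimination schedule and the selection rule can be sketched as below. Several details are assumptions made for illustration: the final coarse step is shortened so that the schedule lands exactly on 20 metrics, "within 1%" is read as an absolute difference of 0.01 in Kappa, and the Kappa values in the usage example are hypothetical. The report's analysis used the RFE function in the R caret package, not this code.

```python
def candidate_sizes(n_metrics, step=5, fine_below=20):
    """Model sizes visited: drop `step` metrics at a time down to `fine_below`, then one at a time."""
    sizes, n = [], n_metrics
    while n > fine_below:
        sizes.append(n)
        # Assumption: the last coarse step is shortened to land on `fine_below`.
        n = max(n - step, fine_below)
    sizes.extend(range(min(n, fine_below), 0, -1))
    return sizes


def choose_size(kappa_by_size, tolerance=0.01, max_metrics=20):
    """Fewest metrics whose Kappa is within `tolerance` of the best, capped at `max_metrics`."""
    best = max(kappa_by_size.values())
    eligible = [n for n, kappa in kappa_by_size.items() if kappa >= best - tolerance]
    return min(min(eligible), max_metrics)


sizes = candidate_sizes(61)
# Hypothetical Kappa values for a handful of model sizes.
chosen = choose_size({61: 0.800, 20: 0.805, 11: 0.800, 10: 0.780})
print(sizes[:4], chosen)
```

Preferring the simplest model within a tolerance of the best trades a small amount of accuracy for fewer indicators to measure in the field.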

This modeling process (including RFE) was applied to the dataset to produce 10 models:

•	The entire Great Plains (Northern and Southern Great Plains) dataset (unstratified
model set)

•	Datasets for each stratum (stratified model sets): Central Prairie, Northern Prairie,
Southern Plains, and Upper Midwest (Figure 3)



-------
There are advantages and disadvantages to including geospatial metrics in an SDAM. Geospatial
metrics may improve SDAM performance but would require GIS analysis in the application of
the resulting method. See Mazor et al. (2021b) for a discussion of the implications of including
geospatial metrics in SDAMs.

The 10 models were compared to determine the degree of improved performance by the
inclusion of GIS metrics and strata-specific models. Model design characteristics and optimal
number of metrics selected by RFE are shown in Table 5, and the selected metrics for each
model are shown in Figure 9.

Table 5. Design characteristics of the 10 models. GIS: included geospatial metrics. # samples: number of samples used in model
training and testing. RFE OOB error rate: out-of-bag (OOB) error rate of the best model produced by recursive feature
elimination.

Model set	Stratum	# samples (training)	# samples (testing)	# metrics eligible	# metrics chosen	RFE OOB error rate
Unstratified models
Unstratified	Entire Great Plains	822	121	61	11	0.13
Unstratified GIS	Entire Great Plains	822	121	82	6	0.03
Models stratified by region
Stratified	Northern Prairie	174	18	61	20	0.20
Stratified	Southern Plains	180	29	61	9	0.10
Stratified	Upper Midwest	237	38	61	7	0.17
Stratified	Central Prairie	231	36	61	20	0.10
Stratified GIS	Northern Prairie	174	18	82	11	0.02
Stratified GIS	Southern Plains	180	29	82	18	0.07
Stratified GIS	Upper Midwest	237	38	82	13	0.01
Stratified GIS	Central Prairie	231	36	82	20	0.02

Biological metrics, particularly those based on aquatic invertebrates, were among the most
widely selected metrics across model sets (Figure 9). Among non-biological metrics, mean
bankfull width was the only frequently selected geomorphological metric.



-------
[Figure 9: tile plot titled "Metrics selected by RFE per model set". Rows list the screened metrics, grouped by metric type (e.g., Biological, Geomorphic); columns are the model sets. Shading indicates if 0, 1, 2, 3 or all 4 strata included a candidate metric in the model. The unstratified models include all 4 strata. Legend: Not Selected; Not Eligible; Selected - 1 Strata; Selected - 2 Strata; Selected - 3 Strata; Selected - 4 Strata.]

Figure 9. Screened metrics (left) selected by RFE for each model set (bottom). White tiles indicate that a screened metric was
ineligible for selection in that model set (e.g., Elev_m was ineligible for models that did not allow GIS metrics). X-axis labels refer
to model sets described in Table 5. Y-axis labels refer to screened metrics described in Table 4 and Appendix A.



-------
Preliminary model calibration and performance assessment

Random forest models were fit for each of the 10 models using the randomForest function in
the randomForest package in R (Liaw and Wiener 2002) using default parameters, except that
the number of trees was set to 1500 instead of the default 500.

Model performance evaluation focused on two aspects: accuracy and repeatability (Table 6 and
Figure 10). Accuracy was assessed by calculating the same comparisons used to evaluate metric
responsiveness during the metric screening phase (e.g., ephemeral versus at least intermittent
reaches [EvALI], perennial versus wet intermittent reaches [PvIwet], etc.; Table 4). Accuracy of a
model's ability to correctly distinguish among ephemeral, intermittent, and perennial
streamflow classes was assessed on both the training and testing datasets independently.
Training and testing measures were compared against each other to see if models validated
poorly (training dataset accuracy substantially higher than testing dataset accuracy), suggesting
that models may be overfit for the training reaches and not generally predictive for streamflow
duration classification. The performance of unstratified models was evaluated for individual
strata by examining results for reaches within the four strata separately.
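The accuracy comparisons can be sketched as the same percent-correct calculation applied after collapsing classes. The class labels, mapping names, and example predictions below are hypothetical; the subset comparisons (PvIwet and IvEdry) would apply the same function to only the flowing or only the dry samples.

```python
def accuracy(actual, predicted, collapse=None):
    """Percent of samples classified correctly after optionally merging classes."""
    collapse = collapse or {}
    pairs = [(collapse.get(a, a), collapse.get(p, p)) for a, p in zip(actual, predicted)]
    return 100.0 * sum(a == p for a, p in pairs) / len(pairs)


# Class mergers for the two-class comparisons.
EVALI = {"I": "ALI", "P": "ALI"}  # ephemeral versus at least intermittent
PVNP = {"E": "NP", "I": "NP"}     # perennial versus non-perennial

# Hypothetical actual and predicted classes for eight reach samples.
actual = ["E", "E", "I", "I", "P", "P", "I", "E"]
predicted = ["E", "I", "I", "P", "P", "P", "E", "E"]

pvive = accuracy(actual, predicted)
evali = accuracy(actual, predicted, EVALI)
pvnp = accuracy(actual, predicted, PVNP)
print(pvive, evali, pvnp)
```

Two-class accuracy is never lower than three-class accuracy for the same predictions, because confusing the two merged classes with each other no longer counts as an error.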

Repeatability, or precision, was assessed using data from the 158 reaches that were resampled
(Figure 6) and was calculated as the percent of reaches where model classifications from
repeated samples at the same reach were consistent (regardless of classification accuracy). Due
to the limited amount of data, repeatability was only assessed for the entire GP and not within
each stratum.
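A minimal sketch of the repeatability calculation follows, assuming one predicted class per sample; the reach identifiers and predictions are hypothetical.

```python
from collections import defaultdict


def repeatability(predictions):
    """Percent of revisited reaches whose samples all received the same predicted class."""
    by_reach = defaultdict(list)
    for reach, predicted_class in predictions:
        by_reach[reach].append(predicted_class)
    # Only reaches sampled more than once can be inconsistent.
    revisited = [classes for classes in by_reach.values() if len(classes) > 1]
    consistent = sum(len(set(classes)) == 1 for classes in revisited)
    return 100.0 * consistent / len(revisited)


# Hypothetical predictions: reach A is consistent, reach B is not, reach C has one visit.
example = [("A", "E"), ("A", "E"), ("B", "I"), ("B", "P"), ("C", "P")]
print(repeatability(example))
```

Actual flow class does not enter the calculation, so a model can be repeatable while still being wrong.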

The classification accuracy of the 10 models was also compared with that of existing SDAMs
for the PNW (Nadeau 2015), NM (NMED 2011), and AW (beta; Mazor et al. 2021a), as applied
to the GP dataset (Table 6 and Figure 10).

Table 6. Performance evaluation of the 10 RF model options developed for the GP and 3 existing SDAMs. PvIvE: Percent of reach
samples classified correctly as perennial, intermittent, or ephemeral. EvALI: Percent of reach samples classified correctly as
ephemeral or at least intermittent. PvNP: Percent of reach samples classified correctly as perennial or non-perennial. PvIwet:
Percent of flowing reach samples classified correctly as perennial or intermittent. IvEdry: Percent of dry reach samples correctly
classified as intermittent or ephemeral. Train: Result for training data. Test: Result for testing data. Model sets are described in
Table 5. AW: Results for the Beta SDAM AW. PNW: Results for the SDAM PNW. NM: Results for the SDAM NM.

Model set	PvIvE Train	PvIvE Test	EvALI Train	EvALI Test	PvNP Train	PvNP Test	PvIwet Train	PvIwet Test	IvEdry Train	IvEdry Test	Precision
Unstrat	87	74	93	89	93	84	89	75	85	73	87
Unstrat GIS	97	50	98	73	99	71	98	39	96	66	94
Strat	86	72	93	89	92	83	86	73	87	72	83
Strat GIS	97	69	98	93	99	76	98	59	97	79	92
AW	43	48	78	85	46	52	42	48	39	43	71
PNW	47	49	84	87	62	62	40	40	63	57	78
NM	55	54	84	87	68	66	55	52	56	52	86



-------


[Figure 10: dot plot with six panels (Accuracy PvIvE, Accuracy EvALI, Accuracy PvNP, Accuracy PvIwet, Accuracy IvEdry, Precision). Y-axis: model set (Unstrat, Unstrat GIS, Strat, Strat GIS, AW, PNW, NM). X-axis: Performance (0.0 to 1.0). Legend: Dataset (Testing, Training) and Strata (GreatPlains, Central, Northern, Southern, Upper).]

Figure 10. Performance of the various model sets evaluated within strata defined by sub-region. PvIvE: Proportion of reach
samples classified correctly as perennial, intermittent, or ephemeral. The y-axis labels on the left indicate the stratifications used
to develop the models (if any). EvALI: Proportion of reach samples classified correctly as ephemeral or at least intermittent.
PvNP: Proportion of reach samples classified correctly as perennial or non-perennial. PvIwet: Proportion of flowing reach samples
classified correctly as perennial or intermittent. IvEdry: Proportion of dry reach samples correctly classified as intermittent or
ephemeral. Model sets are described in Table 5. AW: Results for the Beta SDAM AW; PNW: Results for the SDAM PNW; NM: Results
for the SDAM NM.

Selection of the final model

SDAM models newly developed through the current effort using data from the GP had better
performance than previously developed SDAMs, confirming higher classification accuracy is
achieved through development of region-specific SDAMs.

Among the 10 models, performance was highest in the training dataset for the unstratified and
stratified model versions that included GIS metrics (Figure 10; Table 6). However, performance
of the models containing GIS data sharply decreased when evaluated against the testing
dataset, indicating that the GIS models were overfitting to the training dataset (Figure 11).
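The size of the drop can be read directly from Table 6. The short sketch below tabulates the difference between training and testing PvIvE accuracy for the four model sets; the values are copied from Table 6, and the variable names are illustrative.

```python
# PvIvE accuracy (percent) from Table 6: (training, testing).
pvive = {
    "Unstrat": (87, 74),
    "Unstrat GIS": (97, 50),
    "Strat": (86, 72),
    "Strat GIS": (97, 69),
}

# A large training-minus-testing gap suggests the model is overfit to the training reaches.
gaps = {model: train - test for model, (train, test) in pvive.items()}
for model, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{model}: {gap} percentage points")
```

The two GIS model sets show the largest gaps, while the two model sets without GIS metrics differ by 13 to 14 percentage points.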



-------
[Figure 11: two-panel bar chart (A, B). X-axis: Strata (Central, Northern, Southern, Upper). Legend: Dataset (Training, Testing). Bars are labeled with the percent of correctly classified samples.]

Figure 11: Accuracy of the (A) unstratified GP model without GIS metrics and (B) unstratified GP model with GIS metrics based on
training and testing datasets by strata (943 total observations). Numbers shown in bars are the percent of correctly classified
samples as perennial, intermittent, or ephemeral.

Between the stratified and unstratified models that did not include GIS metrics, performance
was similar and there was no clear best model (Figure 10; Table 6). Because the stratified
models did not show significant improvement (in accuracy on the training or testing datasets)
over a single model encompassing the entire Great Plains that included a strata metric,
separate models for each sub-region were deemed unnecessary. Thus, the decision, which was
affirmed by the RSC, was to use the unstratified model without GIS data.

Furthermore, the strength of the unstratified (no GIS) model increases when looking at the
ability of the model to accurately distinguish between ephemeral and at least intermittent
(EvALI; Figure 12) compared to distinguishing between all three classes (PvlvE; Figure 11).



-------
[Figure 12: bar chart. X-axis: Strata (Central, Northern, Southern, Upper). Legend: Dataset (Training, Testing). Bars are labeled with percent accuracy.]

Figure 12: Accuracy of the unstratified Great Plains model (no GIS) in distinguishing between Ephemeral and At Least
Intermittent for training and testing datasets by strata.

For these reasons, the unstratified model (no GIS) was selected as the beta SDAM GP to apply
to the GP.

Unstratified (no GIS) model description

Eleven metrics were selected via RFE for the unstratified (no GIS) model. The metrics are shown
in Figure 13 by their order of importance. Here, importance to the random forest model is
considered in two ways: (1) through mean decrease in accuracy and (2) through mean decrease
in Gini Index, which is a measure of node impurity, or how important the metric is in splitting
between different flow duration classes.



-------
[Figure 13: two-panel variable importance plot; metrics are listed in the order shown in each panel, top to bottom.
(A) MeanDecreaseAccuracy: BankWidthMean, Strata, PctShading, EPT_taxa, SubstrateSorting_score, Sinuosity_score, UplandRootedPlants_score, hydrophytes_present_noflag, ChannelDimensions_score, GOLDOCH_reltaxa, GOLDOCH_relabd.
(B) MeanDecreaseGini: BankWidthMean, GOLDOCH_relabd, GOLDOCH_reltaxa, EPT_taxa, PctShading, UplandRootedPlants_score, hydrophytes_present_noflag, Strata, SubstrateSorting_score, Sinuosity_score, ChannelDimensions_score.]

Figure 13: Metrics included in the unstratified (no GIS) model, by their order of importance. (A) Mean Decrease in Accuracy is the
relative loss in predictive performance when the particular variable is omitted from the model. (B) Mean Decrease in Gini: Gini
Index is a measure of node impurity or how important the variable is in splitting between different streamflow duration classes.

To evaluate the overall performance of the unstratified (no GIS) model, confusion matrices
were created for both training and testing datasets (Figure 14). Overall classification accuracy
was higher for ephemeral reach samples (training 89.2%, testing 90.3%) than for perennial
(training 86.6%, testing 74.3%) and intermittent reach samples (training 85.7%, testing 61.8%).
No perennial reach samples were misclassified as ephemeral in either testing or training
datasets; only two ephemeral reach samples were misclassified as perennial in the training
dataset. The unstratified (no GIS) model had similar misclassification predictions of intermittent
reach samples as ephemeral or perennial reaches in the testing and training datasets.
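A confusion matrix and the per-class accuracies quoted above can be computed as in the following sketch. The class labels follow the report (P, I, E), but the sample data and function names are hypothetical and do not reproduce Figure 14.

```python
from collections import Counter

CLASSES = ["P", "I", "E"]


def confusion_matrix(actual, predicted):
    """matrix[predicted_class][actual_class] = number of reach samples."""
    counts = Counter(zip(predicted, actual))
    return {p: {a: counts[(p, a)] for a in CLASSES} for p in CLASSES}


def per_class_accuracy(matrix):
    """Percent of samples of each actual class that were classified correctly."""
    result = {}
    for a in CLASSES:
        total = sum(matrix[p][a] for p in CLASSES)
        result[a] = 100.0 * matrix[a][a] / total if total else float("nan")
    return result


# Hypothetical reach samples: four perennial, four intermittent, two ephemeral.
actual = ["P"] * 4 + ["I"] * 4 + ["E"] * 2
predicted = ["P", "P", "P", "I", "I", "I", "E", "P", "E", "E"]

matrix = confusion_matrix(actual, predicted)
by_class = per_class_accuracy(matrix)
print(by_class)
```

The off-diagonal cells show the direction of each error, for example how many intermittent samples were predicted as ephemeral versus perennial.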



-------
[Figure 14: confusion matrices for the unstratified (no GIS) model on the training and testing datasets.]


-------
if a simpler alternative was available, and continuous metrics were converted to binary or
ordinal metrics based on visual interpretation of their distributions. (Binary and ordinal metrics
are typically more rapid to measure and easier to standardize than continuous metrics.)
Accuracy and repeatability measures were re-evaluated to ensure that overall model
performance was not substantially diminished by the modifications.

The suite of metrics of the selected model was iteratively refined while monitoring model
accuracy and repeatability. In each iteration, one or more metrics were either eliminated,
binned, or otherwise simplified. The impact of each iterative refinement on performance was
assessed, and the highest performing refined model was selected. Performance was assessed in
terms of three accuracy measures: PvIvE (i.e., proportion of reach samples classified correctly
as perennial, intermittent, or ephemeral), EvALI (i.e., proportion of reach samples classified
correctly as ephemeral or at least intermittent), and Cohen's Kappa, a measure of accuracy that
accounts for agreement expected by chance. Kappa can range from -1 to 1, where 0 indicates
agreement equivalent to chance, 1 indicates perfect agreement, and negative values indicate
agreement worse than chance.
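For reference, Cohen's Kappa compares observed agreement p_o with the agreement p_e expected by chance from the class frequencies: Kappa = (p_o - p_e) / (1 - p_e). The sketch below is a plain-Python illustration with hypothetical labels, not the routine used in the report's R analysis.

```python
from collections import Counter


def cohens_kappa(actual, predicted):
    """Cohen's Kappa for two equal-length label sequences (undefined if p_e equals 1)."""
    n = len(actual)
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_counts, predicted_counts = Counter(actual), Counter(predicted)
    # Chance agreement: product of the marginal class frequencies, summed over classes.
    expected = sum(actual_counts[c] * predicted_counts[c] for c in actual_counts) / (n * n)
    return (observed - expected) / (1 - expected)


perfect = cohens_kappa(["E", "I", "P", "E"], ["E", "I", "P", "E"])
chance = cohens_kappa(["E", "E", "I", "I"], ["E", "I", "E", "I"])
print(perfect, chance)
```

In the second call, half of the labels agree, which is exactly what the class frequencies would produce by chance, so Kappa is zero even though raw accuracy is 50%.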

Ten refinements of the unstratified (no GIS) model were performed and are summarized in
Table 7 and Figure 15. For example, one refinement made between Version 0 and Version 1 was
the binning of mean bankfull width from continuous data into binary data (<20 m and ≥20 m).



-------
Table 7. Ten model refinement versions of the statistically determined unstratified model without GIS metrics. Includes refinement descriptions, metrics included, and accuracy of refined models (PvIvE:
Percent of reach samples classified correctly as perennial, intermittent, or ephemeral; EvALI: Percent of reach samples classified correctly as ephemeral or at least intermittent) as measured using the
testing dataset. Bold metrics included in refined models identify the iterative metric refinements made to the previous model refinement version.

Version	Refinement	Metrics included	PvIvE	EvALI
Version 0	Unstratified, no GIS model (no refinements)	BankWidthMean; Strata; PctShading; EPT taxa; Substrate Sorting score; Sinuosity score; hydrophytes present; Upland Rooted Plants score; Channel Dimensions score; GOLDOCH reltaxa; GOLDOCH relabd	72.7	90.1
Version 1	Bin continuous variables into discrete groups	BankWidth binned; Strata; PctShading binned; EPT taxa binned; Substrate Sorting score; Sinuosity score; hydrophytes binned; Upland Rooted Plants score; Channel Dimensions score; GOLDOCH reltaxa binned	68.6	90.1
Version 2	GOLDOCH presence/absence	BankWidth binned; Strata; PctShading binned; EPT taxa binned; Substrate Sorting score; Sinuosity score; hydrophytes binned; Upland Rooted Plants score; Channel Dimensions score; GOLDOCH y/n	67.8	88.4
Version 3	GOLD presence/absence	BankWidth binned; Strata; PctShading binned; EPT taxa binned; Substrate Sorting score; Sinuosity score; hydrophytes binned; Upland Rooted Plants score; Channel Dimensions score; GOLD y/n	65.3	89.3
Version 4	OCH presence/absence	BankWidth binned; Strata; PctShading binned; EPT taxa binned; Substrate Sorting score; Sinuosity score; hydrophytes binned; Upland Rooted Plants score; Channel Dimensions score; OCH y/n	61.2	83.5
Version 5	GOLD and OCH presence/absence	BankWidth binned; Strata; PctShading binned; EPT taxa binned; Substrate Sorting score; Sinuosity score; hydrophytes binned; Upland Rooted Plants score; Channel Dimensions score; GOLD y/n; OCH y/n	65.3	88.4
Version 6	GOLDOCH abundance binned	BankWidth binned; Strata; PctShading binned; EPT taxa binned; Substrate Sorting score; Sinuosity score; hydrophytes binned; Upland Rooted Plants score; Channel Dimensions score; GOLDOCH relabd binned	66.1	88.4
Version 7	Without GOLDOCH variables	BankWidth binned; Strata; PctShading binned; EPT taxa binned; Substrate Sorting score; Sinuosity score; hydrophytes binned; Upland Rooted Plants score; Channel Dimensions score	62.8	84.3
Version 8	Upper, Northern and Central strata combined	BankWidth binned; Strata UNC; PctShading binned; EPT taxa binned; Substrate Sorting score; Sinuosity score; hydrophytes binned; Upland Rooted Plants score; Channel Dimensions score	68.6	87.6
Version 9	Southern, Northern and Central strata combined	BankWidth binned; Strata SNC; PctShading binned; EPT taxa binned; Substrate Sorting score; Sinuosity score; hydrophytes binned; Upland Rooted Plants score; Channel Dimensions score	62.8	84.3
Version 10	Upper and Northern strata combined	BankWidth binned; Strata UN; PctShading binned; EPT taxa binned; Substrate Sorting score; Sinuosity score; hydrophytes binned; Upland Rooted Plants score; Channel Dimensions score	62.8	86.8



-------
[Figure 15: line plot titled "Refinement of Chosen Model". Y-axis: value (0.00 to 1.00). Series: Accuracy, EvALI, Kappa. X-axis: the metric set of each refinement version (see Table 7).]

Figure 15. Impact of refinement of metric set on the model performance relative to the unstratified (no GIS) model using the training dataset. Each refinement description is relative to the description at Version 0
(unstratified, no GIS model). Black circles indicate the highest Accuracy, EvALI, and Kappa scores. Dashed lines show performance of the unstratified (no GIS) model.



-------
As shown by the decreasing performance lines in Figure 15, none of the attempted refinements
improved the performance of the unstratified (no GIS) model in terms of PvlvE accuracy, EvALI
accuracy, or Cohen's Kappa. However, the slight decrease in model predictive performance was
weighed against the relative advantages of simplifying field data collection. For this reason, the
two GOLDOCH metrics were removed due to the data collection effort required.

Final model selection

After consultation with the PDT and RSC, the final model selected was the Version 8 refinement
of the unstratified (no GIS) model. The Version 8 refinement differs from the unstratified (no
GIS) model as follows:

•	BankWidthMean, originally a continuous metric on the scale of 0.4 - 68.3 meters, was
binned into two discrete groups (less than 20m, greater than or equal to 20m) based on
visual interpretation of the metric distributions across ephemeral, intermittent, and
perennial classes, and through trial-and-error testing.

•	Strata, originally containing four strata, was simplified into the two Great Plains Regions:
the Southern Great Plains, and the Northern Great Plains (containing the Upper
Midwest, Northern Prairie, and Central Prairie strata).

•	Percent Shading, originally a continuous metric ranging from 0-100%, was binned into
discrete groups (less than 10% and greater than or equal to 10%) based on visual
interpretation of the metric distributions across ephemeral, intermittent, and perennial
classes, and through trial-and-error testing.

•	Number of EPT families ranged from zero to seven in the original dataset. This was
simplified in the refined model into two discrete groups (zero to one family, two or
more families). This metric binning was based on visual interpretation of the metric
distributions across streamflow duration classes and through trial-and-error testing.
However, the beta SDAM GP User Manual recommends enumerating up to five families,
if present, to provide redundancy.

•	Number of hydrophytic species recorded ranged from zero to eight species in the
original dataset. This was simplified in the refined model into two discrete groups (fewer
than two species, two or more species). This metric binning was based on visual
interpretation of the metric distributions across streamflow duration classes and
through trial-and-error testing. However, the beta SDAM GP User Manual recommends
enumerating up to five families, if present, to provide redundancy.

•	GOLDOCH_reltaxa and GOLDOCH_relabd were removed from the model.
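The simplifications listed above can be sketched in code. The following is a minimal Python illustration under stated assumptions, not the authors' R implementation: the function name, argument names, and output labels are hypothetical, and the strata strings are taken from the glossary in Appendix A. Only the cut points (20 m, 10%, two EPT families, two hydrophytic species) and the grouping of strata come from the text.

```python
# Minimal sketch of the Version 8 metric simplifications listed above.
# Cut points and the strata grouping come from the text; the function name,
# argument names, and output labels are hypothetical.

NORTHERN_STRATA = {"Upper Midwest", "Northern Prairie", "Central Prairie"}
SOUTHERN_STRATA = {"Southern Plains"}


def simplify_metrics(bank_width_m, strata, pct_shading, ept_families, hydrophyte_species):
    """Collapse the five refined metrics into their two-group versions."""
    if strata in NORTHERN_STRATA:
        region = "Northern Great Plains"
    elif strata in SOUTHERN_STRATA:
        region = "Southern Great Plains"
    else:
        raise ValueError(f"Unrecognized stratum: {strata!r}")

    return {
        "BankWidth": "<20 m" if bank_width_m < 20 else ">=20 m",
        "Region": region,
        "PctShading": "<10%" if pct_shading < 10 else ">=10%",
        "EPT": "0-1 families" if ept_families <= 1 else "2+ families",
        "Hydrophytes": "<2 species" if hydrophyte_species < 2 else "2+ species",
    }


# Hypothetical reach: 4.2 m wide, 35% shaded, 3 EPT families, 1 hydrophyte.
example = simplify_metrics(
    bank_width_m=4.2,
    strata="Upper Midwest",
    pct_shading=35,
    ept_families=3,
    hydrophyte_species=1,
)
print(example)
```

Because every refined metric becomes a two-level category, a field crew only needs to establish which side of each cut point a reach falls on.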

The performance of the Version 8 refined model is shown as confusion matrices (Figure 16).
Relative to the unstratified (no GIS) model, performance decreased on the training dataset
(Figure 15) but was similar on the testing dataset (Table 7).



-------
[Figure 16 image: confusion matrices in panels A and B, with actual class (P, I, E) on the x-axis and predicted class on the y-axis. Legends: Proportion (0.0 to 0.8); Predictions (Correct, Incorrect).]

Figure 16: Performance of the selected refined model based on (A) training (822 reach samples) and (B) testing (121 reach
samples) datasets. X-axis shows actual flow duration class and Y-axis shows predicted flow duration class. Blue diagonal
indicates correct predictions. P = perennial, I = intermittent, and E = ephemeral. Shading of boxes in matrices describes the
proportion of reach samples in each dataset.

Using the refined model, two reaches in the training dataset continued to be incorrectly
predicted as ephemeral when the correct classification was perennial during one of four visits
to the sites. In addition, two reaches in the training dataset were incorrectly predicted as
perennial when the correct classification was ephemeral during one of four visits to the sites.
The four sites were the following:

Reach Code   State   Strata     Actual   Predicted
TXSB14       TX      Southern   P        E
WIUB20       WI      Upper      P        E
WIUB37       WI      Upper      E        P
WYNB1        WY      Northern   E        P

No incorrect predictions between ephemeral and perennial occurred using the refined model
on the testing dataset.

Increased confidence required for classifications

Random forest models created for classification traditionally make assignments based on the
class that receives the highest number of votes by each "tree" in the forest. Thus, in a three-
way decision (ephemeral, intermittent, or perennial), the class with the most votes could
receive much less than a majority of all votes—as low as 34%. Given concern that such low-
confidence classifications may not provide sufficient defensibility for some management
decisions, approaches to distinguish between high- and low-confidence classifications were
explored.



-------
We explored increasing the minimum number of votes required to make a confident
classification from 30% to 100% by increments of 1% to understand the effect on classification.
When the selected refined model was applied to a novel test reach and a single class received a
sufficient percent of votes, then the reach was classified accordingly. If none met the minimum
but the combined percent of votes for intermittent and perennial classes exceeded the
minimum, then the reach was classified as at least intermittent. In all other cases, the reach
was classified as need more information. This decision framework reflects that distinguishing
between ephemeral and at least intermittent reaches is a high priority use of the beta SDAM
GP. The percent of reaches under each of the five possible classifications was calculated for each
minimum vote agreement threshold.
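The decision framework described above can be sketched as a small function. This is a minimal Python illustration, not the authors' R code: the function name and the dictionary of vote proportions are hypothetical, and the use of "greater than or equal to" when comparing against the minimum is an assumption, since the text does not say how a class sitting exactly at the threshold is handled.

```python
# Minimal sketch of the vote-threshold decision framework described above.
# Names are hypothetical; ">=" comparisons are an assumption.

def classify_from_votes(votes, min_proportion=0.5):
    """Classify a reach from random forest vote proportions.

    votes: dict with keys "E", "I", "P" whose values sum to 1.
    Returns "E", "I", "P", "ALI" (at least intermittent),
    or "NMI" (need more information).
    """
    top_class = max(votes, key=votes.get)
    if votes[top_class] >= min_proportion:
        return top_class  # a single class met the minimum
    if votes["I"] + votes["P"] >= min_proportion:
        return "ALI"  # intermittent and perennial combined met the minimum
    return "NMI"


# Hypothetical vote proportions for three reaches.
print(classify_from_votes({"E": 0.60, "I": 0.30, "P": 0.10}))
print(classify_from_votes({"E": 0.20, "I": 0.45, "P": 0.35}))
print(classify_from_votes({"E": 0.505, "I": 0.30, "P": 0.195}, min_proportion=0.51))
```

With a minimum of 0.5, a reach whose ephemeral votes fall short of the minimum necessarily has combined intermittent and perennial votes above it, so this sketch cannot return need more information at that threshold.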

At a minimum required proportion of votes of 0.5, only 3.5% of reach samples in the training
dataset (5% of reach samples in the test dataset) were classified as at least intermittent, and
none were classified as need more information (Figure 17). Classifications of at least intermittent
first appear with a minimum proportion of 0.37 in the training dataset (0.45 in the testing
dataset), whereas classifications of need more information appear at 0.51 in both the training
and testing datasets. Although it cannot be ruled out, it is unlikely that the beta SDAM GP will
result in a classification of need more information. Based on these results, the RSC
recommended a minimum proportion threshold of 0.5 for flow classification.

[Figure 17 image: panels Training and Testing, with the minimum proportion of votes (0.3 to 1.0) on the x-axis and the number of reaches on the y-axis. Legend: Classification (NMI, ALI, P, I, E).]

Figure 17. Influence of the minimum proportion of votes required to make a classification on n (the number of reaches in each
class). NMI: Need more information. ALI: At least intermittent. P: Perennial. I: Intermittent. E: Ephemeral. The vertical black line
represents a minimum proportion of required votes of 0.5, reflecting the final recommendation of the RSC. The two red lines
represent the proportion of votes that first result in classification of ALI (the lower line) or NMI (the upper line) for the dataset.

Evaluation of single indicators of at least intermittent flow

Single indicators can supersede a model classification of ephemeral, changing it to at least
intermittent. Single indicators provide technical benefits (i.e., improved accuracy) as well
as non-technical benefits, such as greater acceptance of the SDAM, given public understanding
of the role of streamflow duration in supporting biological organisms, and more rapid
determination of a flow classification. Single indicators are also used in other SDAMs (e.g., Nadeau
et al. 2015, Dorney and Russell 2018, Mazor et al. 2021a); for instance, indicators can include
the presence of fish, iron-oxidizing bacteria, hydric soils, and/or aquatic vertebrates
(amphibians and reptiles), among others.

We evaluated single indicators used in previous SDAMs. For each indicator, we quantified the
number of instances where its inclusion would correct a misclassification (i.e., the reach was
truly intermittent or perennial) and the number where it would introduce a mistake (i.e., the
reach was truly ephemeral). All single indicators investigated had minimal impact on
performance or introduced more errors than were corrected (Figure 18). Based on these
results, the RSC did not recommend including any of the evaluated single indicators in the beta
SDAM GP.
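The tally described above can be illustrated with a short sketch. It is a hypothetical Python example, not the analysis code from the repository: the function name and the tuple layout of each sample are assumptions. It relies only on the rule stated in the text, namely that a single indicator changes a model classification of ephemeral to at least intermittent.

```python
# Minimal sketch of tallying corrections and mistakes for one single indicator.
# The function name and the (actual, predicted, indicator_present) layout
# are hypothetical.

def tally_single_indicator(samples):
    """Return (corrections, mistakes) caused by applying a single indicator.

    samples: iterable of (actual, predicted, indicator_present), where
    actual and predicted are "E", "I", or "P".
    """
    corrections = 0
    mistakes = 0
    for actual, predicted, indicator_present in samples:
        if predicted != "E" or not indicator_present:
            continue  # the indicator only overrides ephemeral classifications
        if actual in ("I", "P"):
            corrections += 1  # a wrongly ephemeral prediction becomes ALI
        else:
            mistakes += 1  # a correct ephemeral prediction becomes ALI
    return corrections, mistakes


# Hypothetical samples for illustration only.
samples = [
    ("I", "E", True),   # correction
    ("P", "E", True),   # correction
    ("E", "E", True),   # mistake
    ("E", "E", False),  # indicator absent, unchanged
    ("I", "I", True),   # not predicted ephemeral, unchanged
]
print(tally_single_indicator(samples))
```

An indicator is worth adding only when corrections clearly outnumber mistakes, which is the comparison summarized in Figure 18.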

[Figure 18 image: number of samples changed (x-axis, 0 to 75) for each single indicator evaluated (y-axis), by dataset (Testing, Training) and type of change (Mistakes, Corrections). Indicators shown: Aquatic vertebrates (incl. frog calls); Aquatic vertebrates; Aquatic snakes; SDAM PNW single indicators; SDAM NM single indicators; Iron-oxidizing bacteria and fungi; Hydrophytes (3+ species); Hydrophytes (2+ species); Hydrophytes (any); Hydric soils; Fish or hydric soil or algae > 10%; Fish; EPT (5+); EPT (any); BMI; Amphibians (incl. frog calls); Aquatic amphibians; Algal cover > 10%.]

Figure 18. Influence of single indicators on performance of the refined model

Performance of the beta SDAM GP

Performance of the selected refined model (with a minimum proportion voting threshold of
0.5) for the beta SDAM GP is summarized in Table 8. The overall classification accuracy among
the three classes (perennial, intermittent, ephemeral) was 81% in the training dataset (and 68%
in the testing dataset), but this accuracy increased to 89% in the training dataset (and 87% in
the testing dataset) when only ephemeral versus at least intermittent classifications were
considered (i.e., both blue and green cells in Table 8 were treated as correct). Note that, after
applying the voting threshold, one of the two instances in the training dataset that incorrectly
predicted perennial when the correct classification was ephemeral (WYNB1) changed to a
prediction of at least intermittent.
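The two accuracy figures quoted above can be illustrated with a short sketch. This is a hypothetical Python example under stated assumptions, not the authors' code: the function names are invented, accuracy is computed as a simple fraction of reach visits, and a prediction of at least intermittent (ALI) is counted as correct in the three-class measure when the actual class is intermittent or perennial, which is one reading of the Table 8 caption.

```python
# Minimal sketch of the two accuracy measures (PvIvE and EvALI).
# Function names are hypothetical. Treating a predicted "ALI" as correct for an
# actual "I" or "P" in the three-class measure is an assumption.

def pvive_accuracy(actual, predicted):
    """Fraction of reach visits classified correctly among P, I, and E."""
    correct = sum(
        1
        for a, p in zip(actual, predicted)
        if p == a or (p == "ALI" and a in ("I", "P"))
    )
    return correct / len(actual)


def _collapse(label):
    """Collapse I, P, and ALI into a single at-least-intermittent class."""
    return "E" if label == "E" else "ALI"


def evali_accuracy(actual, predicted):
    """Fraction of reach visits correct for ephemeral vs. at least intermittent."""
    correct = sum(
        1 for a, p in zip(actual, predicted) if _collapse(a) == _collapse(p)
    )
    return correct / len(actual)


# Hypothetical labels for five reach visits.
actual = ["E", "I", "P", "I", "E"]
predicted = ["E", "P", "P", "ALI", "I"]
print(pvive_accuracy(actual, predicted), evali_accuracy(actual, predicted))
```

The second measure can only be equal to or higher than the first, because confusing intermittent with perennial is no longer counted as an error.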

Table 8. Classifications of the final version of the beta SDAM GP. Blue cells indicate correct classifications of perennial,
intermittent, at least intermittent and ephemeral reaches, whereas green cells indicate correct classifications of ephemeral
versus at least intermittent. Green numbers represent the reach visits with matching actual and predicted classes and red
numbers are reach visits with non-matching actual and predicted classes.

                              Actual streamflow duration class

Predicted      Ephemeral    Ephemeral   Intermittent   Intermittent   Perennial    Perennial
Class          (Training)   (Testing)   (Training)     (Testing)      (Training)   (Testing)

Ephemeral                               47             9              2            0
Intermittent   30           5           236            31             40           14
ALI            7            2           17             7              5            1
Perennial      1            0           29             8

The LandUse indicator was used to identify reaches that were disturbed (LandUse = urban or
agriculture, alone or in combination with any other land use category) or not disturbed
(LandUse does not include urban or agriculture) at the time of the site visit. In total, 133
individual reaches were identified as disturbed during at least one site visit, for a total of 229
disturbed samples (before augmentation). There were 192 (34%) and 37 (31%) disturbed
samples included in the training and testing datasets, respectively. These tallies and the
accuracy results provided below focus on the samples of the original dataset before
augmentation (n = 692).
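The disturbance rule described above can be sketched as follows. This is a hypothetical Python illustration: the report does not describe how LandUse is stored, so representing it as a list of category strings per visit is an assumption, as are the function names and the example reach identifiers.

```python
# Minimal sketch of flagging disturbed samples from the LandUse indicator.
# The list-of-strings representation, function names, and reach IDs
# are hypothetical.

DISTURBED_CATEGORIES = {"urban", "agriculture"}


def is_disturbed(land_use):
    """True if urban or agriculture appears, alone or with other categories."""
    return any(c.strip().lower() in DISTURBED_CATEGORIES for c in land_use)


def tally_disturbed(visits):
    """Return (number of disturbed reaches, number of disturbed samples).

    visits: iterable of (reach_id, land_use) pairs, one per site visit.
    A reach counts as disturbed if at least one of its visits is disturbed.
    """
    disturbed_reaches = set()
    disturbed_samples = 0
    for reach_id, land_use in visits:
        if is_disturbed(land_use):
            disturbed_reaches.add(reach_id)
            disturbed_samples += 1
    return len(disturbed_reaches), disturbed_samples


# Hypothetical visits for illustration only.
visits = [
    ("R1", ["Urban", "Forest"]),
    ("R1", ["Forest"]),
    ("R2", ["agriculture"]),
    ("R2", ["Agriculture"]),
    ("R3", ["Rangeland"]),
]
print(tally_disturbed(visits))
```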

Among the samples identified as disturbed by human activity in the training dataset, accuracy
among all classes was 76%, which improved to 86% when only ephemeral versus at least
intermittent classifications were considered. For samples in the training dataset that were not
disturbed, the accuracy values indicated similar performance to that of the disturbed sites (i.e.,
73% PvIvE and 84% EvALI).

For the samples in the testing dataset, the accuracy among all classes for disturbed sites was
78%, which improved to 89% when only ephemeral versus at least intermittent classifications
were considered. For samples in the testing dataset that were not disturbed, accuracy among
all classes was 64%, which improved to 86% when only ephemeral versus at least intermittent
classifications were considered.

Data and code availability

All data used to develop the method and R code used in analysis are available at the following
git repository: https://doi.org/10.23719/1527943



-------
Next steps

The beta SDAM GP is being made available for one year for public review and comment while
additional data at the study sites are collected through 2022, after which a final method will be
developed and released to replace the beta method.

Acknowledgements

The development of this method and supporting materials was guided by a regional steering
committee (RSC) consisting of representatives of federal regulatory agencies in the Great Plains
of the U.S.: Micah Bennett (U.S. Environmental Protection Agency [USEPA]—Region 5), Andrew
Blackburn (U.S. Army Corps of Engineers [USACE]—Great Lakes and Ohio Valley Division,
Chicago District), Kirsten Brown (USACE—Mississippi Valley Division, Rock Island District), Billy
Bunch (USEPA—Region 8), Gabrielle C. L. David (USACE—Engineer Research and Development
Center, Cold Regions Research and Engineering Laboratory), Gabriel DuPree (USEPA—Region 7),
Wayne Fitzpatrick (USACE—Southwestern Division, Galveston District), Jeremy Grauf (USACE—
Northwestern Division, Omaha District), Ed Hammer (USEPA—Region 5), Rachel Harrington
(USEPA—Region 8), Faye Healy (USACE—Mississippi Valley Division, St. Paul District), Shawn
Henderson (USEPA—Region 7), Rob Hoffman (USACE—Southwestern Division, Tulsa District),
Rose Kwok (USEPA—Headquarters), April Marcangeli (USACE—Mississippi Valley Division, St.
Paul District), Tunis McElwain (USACE—Headquarters), Elizabeth Shelton (USACE—
Southwestern Division, Galveston District), Chelsey Sherwood (USEPA—Region 6), Loribeth
Tanner (USEPA—Region 6), Kerryann Weaver (USEPA—Region 5), and Matt Wilson (USACE—
Headquarters).

We thank Abel Santana, Robert Butler, Duy Nguyen, Kristine Gesulga, and Anne Holt for
assistance with data management, and Abe Margo, Alex Martinez, Addison Ochs, Morgan
Proko, Alec Lambert, Zak Erickson, Alex Berryman, Jack Poole, Joe Kiel, Joe Klein, Jackson Bates,
Buck Meyer, Margaret O'Brien, Elliot Broder, Jason Glover, and James Treacy for assistance with
data collection. Amy James provided document editorial and formatting assistance.

Numerous researchers and land managers with local expertise assisted with the selection of
study reaches to calibrate the method: Tim Bonner, Jeffrey Brenkenridge, Taylor Dorn, Tim
Fallon, John Genet, Linda Hansen, Garret Hecker, Stephanie Kampf, Kort Kirkeby, Ji Yeow Law,
John Lyons, Kyle McLean, Miranda Meehan, Steve Robinson, Mateo Scoggins, Patrick Trier,
Linda Vance, Ross Vander Vorste, and Jason Zhang.

References

Cao, Y., and C. P. Hawkins. 2011. The comparability of bioassessments: a review of conceptual
and methodological issues. Journal of the North American Benthological Society 30: 680-701.

Chapin, T. P., A. S. Todd, and M. P. Zeigler. 2014. Robust, low-cost data loggers for stream
temperature, flow intermittency, and relative conductivity monitoring. Water Resources
Research 50: 6542-6548.



-------
Dorney, J., and P. Russell. 2018. North Carolina Division of Water Quality methodology for
identification of intermittent and perennial streams and their origins. Pages 273-279 in J.
Dorney, R. Savage, R. W. Tiner, and P. Adamus (eds.), Wetland and Stream Rapid Assessments.
Elsevier, San Diego, CA.

Eng, K., D. M. Wolock, and M. D. Dettinger. 2016. Sensitivity of intermittent streams to climate
variations in the USA. River Research and Applications 32: 885-895.

Fritz, K. M., T.-L. Nadeau, J. E. Kelso, W. S. Beck, R. D. Mazor, R. A. Harrington, and B. J. Topping.
2020. Classifying streamflow duration: The scientific basis and an operational framework for
method development. Water 12: 2545.

Gower, J.C. 1971. A general coefficient of similarity and some of its properties. Biometrics 27:
857-874.

Hart, E., and K. Bell. 2015. Prism: Access Data From The Oregon State Prism Climate Project.

Hawkins, C. P., Y. Cao, and B. Roper. 2010. Method of predicting reference condition biota
affects the performance and interpretation of ecological indices. Freshwater Biology 55: 1066-
1085.

Hedman, E. R., and Osterkamp, W.R. 1982. Stream Flow Characteristics Related to Channel
Geometry of Streams in Western United States. USGS Water-Supply Paper 2193, Washington,
DC. p. 17. DOI:10.3133/wsp2193.

Hewlett, J. D. 1982. Principles of Forest Hydrology; University of Georgia Press: Athens, GA,
USA, p. 192.

Jaeger, K.L., R. Sando, R. R. McShane, J. B. Dunham, D. P. Hockman-Wert, K. E. Kaiser, K. Hafen,
J. C. Risley, and K. W. Blasch. 2019. Probability of streamflow permanence model (PROSPER): a
spatially continuous model of annual streamflow permanence throughout the Pacific
Northwest. Journal of Hydrology X 2: 100005.

James, A., K. McCune, and R. Mazor. 2022. Review of Flow Duration Methods and Indicators of
Flow Duration in the Scientific Literature, Great Plains of the United States. Document No. EPA-
840-B-22006. 56 pp. (Available from:
https://www.epa.gov/system/files/documents/2022-09/FlowDurationLitReview-gp.pdf)

James, G., D. Witten, T. Hastie, and R. Tibshirani. 2013. An Introduction to Statistical Learning.
Springer, NY. 440 pp.

Kelso, J. E., and K. M. Fritz. 2021. Standard Operating Procedure: Processing Data and
Classifying Streamflow Duration Using Continuous Hydrologic Data. EPA Report J-WECD-ECB-
SOP-4425-O. Environmental Protection Agency, Cincinnati, OH. 25 pp.



-------
Kuhn, M. 2020. caret: Classification and Regression Training. (Available from:
https://cran.r-project.org/web/packages/caret/caret.pdf)

Liaw, A., and M. Wiener. 2002. Classification and regression by randomForest. R News 2: 18-22.

Mazor, R. D., A. C. Rehn, P. R. Ode, M. Engeln, K. C. Schiff, E. D. Stein, D. J. Gillett, D. B. Herbst,
and C. P. Hawkins. 2016. Bioassessment in complex environments: designing an index for
consistent meaning in different settings. Freshwater Science 35: 249-271.

Mazor, R. D., B. J. Topping, T.-L. Nadeau, K. M. Fritz, J. E. Kelso, R. A. Harrington, W. S. Beck, K.
McCune, H. Lowman, A. Aaron, R. Leidy, J. T. Robb, and G. C. L. David. 2021a. User Manual for a
Beta Streamflow Duration Assessment Method for the Arid West of the United States. Version
1.0. Document No. EPA 800-K-21001. 83 pp. (Available from:
https://www.epa.gov/sites/production/files/2021-03/documents/user_manual_beta_sdam_aw.pdf)

Mazor, R. D., B. J. Topping, T.-L. Nadeau, K. M. Fritz, J. E. Kelso, R. A. Harrington, W. S. Beck, K. S.
McCune, A. O. Allen, R. Leidy, J. T. Robb, and G. C. L. David. 2021b. Implementing an operational
framework to develop a streamflow duration assessment method: A case study from the Arid
West United States. Water 13: 3310.

Mazor, R. D., B. J. Topping, T.-L. Nadeau, K. M. Fritz, J. E. Kelso, R. A. Harrington, W. S. Beck, K.
McCune, A. Allen, R. Leidy, J. T. Robb, G. C. L. David, and L. Tanner. 2021c. User Manual for a
Beta Streamflow Duration Assessment Method for the Western Mountains of the United
States. Version 1.0. Document No. EPA-840-B-21008. 116 pp. (Available from:
https://www.epa.gov/system/files/documents/2021-12/beta-sdam-for-the-wm-user-manual.pdf)

Mazor, R.D., Fritz, K.M., Topping, B., Nadeau, T.-L., and Kelso, J. 2022. Development and
Evaluation of the Beta Streamflow Duration Assessment Method for the Western Mountains -
Data Supplement. Document No. EPA 840-R-22002. 38 pp. (Available from:
https://www.epa.gov/system/files/documents/2022-05/WM%20Data%20supplement_5-4-22%20FINAL.pdf)

Mohammed, R., J. Rawashdeh, and M. Abdullah. 2020. Machine learning with oversampling and
undersampling techniques: Overview study and experimental results. Pages 243-248 in
Proceedings of the 11th International Conference on Information and Communication Systems.
Irbid, Jordan 7-9 April 2020.

Nadeau, T.-L. 2015. Streamflow Duration Assessment Method for the Pacific Northwest. EPA
910-K-14-001, U.S. Environmental Protection Agency. 36 pp. (Available from:
https://www.epa.gov/sites/default/files/2016-01/documents/streamflow_duration_assessment_method_pacific_northwest_2015.pdf)



-------
Nadeau, T.-L., S. G. Leibowitz, P. J. Wigington, J. L. Ebersole, K. M. Fritz, R. A. Coulombe, R. L.
Comeleo, and K. A. Blocksom. 2015. Validation of rapid assessment methods to determine
streamflow duration classes in the Pacific Northwest, USA. Environmental Management 56: 34-
53.

New Mexico Environment Department (NMED). 2011. Hydrology Protocol for the
Determination of Uses Supported by Ephemeral, Intermittent, and Perennial Waters. Surface
Water Quality Bureau, New Mexico Environment Department, Albuquerque, NM. 35 pp.
(Available from:
https://www.env.nm.gov/surface-water-quality/wp-content/uploads/sites/25/2019/11/WQMP-CPP-Appendix-C-Hydrology-Protocol-20201023-APPROVED.pdf)

Omernik, J.M. 1995. Ecoregions: a framework for managing ecosystems. The George Wright
Forum 12: 35-50.

Perkin, J. S., K. B. Gido, J. A. Falke, K. D. Fausch, H. Crockett, E. R. Johnson, and J. Sanderson.
2017. Groundwater declines are linked to changes in Great Plains stream fish assemblages.
Proceedings of the National Academy of Sciences USA 114: 7373-7378.

Schumacher, C., and K. M. Fritz. 2019. Standard Operating Procedure: Verifying/Calibrating,
Deploying, Retrieving Stream Temperature, Intermittency, and Conductivity (STIC) Data
Loggers, and Downloading and Converting Data. EPA Report J-WECD-ECB-SOP-1016-02.
Environmental Protection Agency, Cincinnati, OH. 13 pp.

United States Environmental Protection Agency (USEPA). 2019. Flow duration protocol version
2.1. 38 pp.

Vengosh, A., R. B. Jackson, N. Warner, T. H. Darrah, and A. Kondash. 2014. A critical review of
the risks to water resources from shale gas development and hydraulic fracturing in the United
States. Environmental Science & Technology 48: 8334-8348.

Wohl, E., M. K. Mersel, A. O. Allen, K. M. Fritz, S. L. Kichefski, R. W. Lichvar, T.-L. Nadeau, B. J.
Topping, P. H. Trier, and F. B. Vanderbilt. 2016. Synthesizing the Scientific Foundation for
Ordinary High Water Mark Delineation in Fluvial Systems. Wetlands Regulatory Assistance
Program ERDC/CRREL SR-16-5, U.S. Army Corps of Engineers Engineer Research and
Development Center. 217 pp. (Available from: https://apps.dtic.mil/sti/pdfs/AD1025116.pdf)

Wolock, D. M. 2003. Base-flow index grid for the conterminous United States: U.S. Geological
Survey Open-File Report 03-263, digital dataset. (Available from:
https://water.usgs.gov/lookup/getspatial?bfi48grd)



-------
Appendix A: Glossary of Terms Used

Streamflow Class

Description

Ephemeral reaches

Flow only in direct response to precipitation. Water typically flows only during and/or
shortly after large precipitation events, the streambed is always above the water table,
and stormwater runoff is the primary water source.

Intermittent reaches

Contain sustained flowing water for only part of the year, typically during the wet season,
where the streambed may be below the water table or where the snowmelt from
surrounding uplands provides sustained flow. The flow may vary greatly with stormwater
runoff.

Perennial reaches

Contain flowing water continuously during a year of normal rainfall, often with the
streambed located below the water table for most of the year. Groundwater typically
supplies the baseflow for perennial reaches, but the baseflow may also be supplemented
by stormwater runoff or snowmelt.

At Least Intermittent (ALI)

Contain more than ephemeral flow, but whether they are intermittent or perennial cannot be
determined with high confidence.



Performance Measure

Description

PvIvE

Overall measure of accuracy. Ability of model to correctly classify between Perennial
versus Intermittent versus Ephemeral. Calculated as the percent of reach-visits classified
correctly (weighted by the number of visits per reach).

EvALI

Ability of model to correctly classify between Ephemeral and At Least Intermittent (I or P).
Calculated as the percent of reach-visits classified correctly (weighted by the number of
visits per reach).

Precision

For reaches that have multiple visits, are they consistently predicted correctly? Calculated
as the proportion of visits within a reach with the most frequent classification, averaged
across reaches.
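The Precision calculation described above can be sketched as follows. This is a minimal Python illustration of the stated formula (the proportion of visits within a reach that share the most frequent classification, averaged across reaches); the function name and the example reach identifiers are hypothetical.

```python
# Minimal sketch of the Precision measure defined above.
# Function name and example data are hypothetical.
from collections import Counter, defaultdict


def precision(reach_ids, predicted):
    """Average, across reaches, of the share of visits given the modal class."""
    by_reach = defaultdict(list)
    for reach_id, label in zip(reach_ids, predicted):
        by_reach[reach_id].append(label)

    proportions = []
    for labels in by_reach.values():
        modal_count = Counter(labels).most_common(1)[0][1]
        proportions.append(modal_count / len(labels))
    return sum(proportions) / len(proportions)


# Reach A: three of four visits agree (0.75); reach B: both visits agree (1.0).
reach_ids = ["A", "A", "A", "A", "B", "B"]
predicted = ["E", "E", "I", "E", "P", "P"]
print(precision(reach_ids, predicted))
```

As written, the measure captures how consistently a reach is classified across visits and does not compare predictions with the actual class.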



Dataset

Description

Training

A subset of 80% of the total reaches that was used for model development. This subset
was randomly selected, stratifying by strata (i.e., Southern, Central, Upper, and Northern),
and actual streamflow duration class (i.e., perennial, intermittent, and ephemeral).

Testing

A subset of 20% of the total reaches that was used for model testing and is independent
from the training reaches. This subset was randomly selected, stratifying by strata (i.e.,
Southern, Central, Upper, and Northern) and actual streamflow duration class (i.e.,
perennial, intermittent, and ephemeral).

Note: Data are divided by reach so that all visits at a single reach are included either in training or testing
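The split described above (80% training, 20% testing, assigned by reach and stratified by strata and actual class) can be sketched as follows. This is a hypothetical Python illustration using only the standard library, not the authors' R code; the function name, the input dictionary, the fixed seed, and the rounding of the test share within each group are all assumptions.

```python
# Minimal sketch of a reach-level, stratified 80/20 split.
# Names, seed, and rounding rule are hypothetical.
import random
from collections import defaultdict


def split_reaches(reaches, test_fraction=0.2, seed=0):
    """Split reach IDs into training and testing sets.

    reaches: dict mapping reach_id -> (strata, flow_class).
    Reaches are sampled within each (strata, flow_class) group, so every
    visit to a reach ends up on the same side of the split.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for reach_id, key in sorted(reaches.items()):
        groups[key].append(reach_id)

    train, test = set(), set()
    for key in sorted(groups):
        ids = groups[key]
        rng.shuffle(ids)
        n_test = round(len(ids) * test_fraction)
        test.update(ids[:n_test])
        train.update(ids[n_test:])
    return train, test


# Hypothetical reaches: ten per group in two groups.
reaches = {f"N{i}": ("Northern", "I") for i in range(10)}
reaches.update({f"S{i}": ("Southern", "E") for i in range(10)})
train_ids, test_ids = split_reaches(reaches)
print(len(train_ids), len(test_ids))
```

Visits are then assigned to a dataset by looking up their reach ID, which keeps the testing reaches independent of the training reaches.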

Candidate Metric

Description

Type

Selected by
RFE

Strata

SDAM subregions: Central Prairie, Upper Midwest, Northern
Prairie, and Southern Plains. Also used to distinguish the
Northern Great Plains and Southern Great Plains.

GIS

No

Algae_score (NM)

Are Filamentous Algae and/or periphyton present at the reach?
Higher scores indicate that algae were more prevalent and
easier to find in the reach.

Bio

(algae)

No

algdead_cover_score

Dead algal cover on the streambed within the study reach

Bio

(algae)

No

algdead_noupstream_cover_score

Are algae on the streambed within the study reach likely from
upstream source (i.e., dead mats deposited in downstream
reach)?

Bio

(algae)

No

alglive_cover_score

Live algal cover on the streambed within the study reach

Bio

(algae)

No


alglivedead_cover_score

Visual estimate of the percent of streambed covered by live or
dead algal growth

Bio

(algae)

No

ai_present (PNW)

Presence/absence of aquatic invertebrate within the sample
reach

Bio

(aquatic
inverts)

No

BMI_score (NM)

Benthic Macroinvertebrate (BMI) abundance. Higher scores
indicate that BMI were more prevalent and easier to find in the
reach.

Bio

(aquatic
inverts)

No

EPT_abundance

Abundance of mayflies, stoneflies, or caddisflies (i.e.,
Ephemeroptera, Plecoptera, Trichoptera, EPT)

Bio

(aquatic
inverts)

No

EPT_relabd

Relative abundance of EPT families

Bio

(aquatic
inverts)

No

EPT_reltaxa

Relative richness of EPT families

Bio

(aquatic
inverts)

No

EPT_taxa

Number of EPT families

Bio

(aquatic
inverts)

Yes

GOLD_abundance

Abundance of Gastropoda, Oligochaeta, and Diptera (GOLD)

Bio

(aquatic
inverts)

No

GOLD_relabd

Relative abundance of Gastropoda, Oligochaeta, and Diptera
(GOLD) taxa

Bio

(aquatic
inverts)

No

GOLD_reltaxa

Relative richness of Gastropoda, Oligochaeta, and Diptera
(GOLD) taxa

Bio

(aquatic
inverts)

No

GOLD_taxa

Number of Gastropoda, Oligochaeta, and Diptera (GOLD)
families

Bio

(aquatic
inverts)

No

GOLDOCH_relabd

Relative abundance of GOLD and OCH taxa

Bio

(aquatic
inverts)

No

GOLDOCH_reltaxa

Relative richness of GOLD and OCH taxa

Bio

(aquatic
inverts)

No

mayfly_abundance

Abundance of mayflies

Bio

(aquatic
inverts)

No

mayfly_gt6 (PNW)

Mayfly abundance greater than six

Bio

(aquatic
inverts)

No

Noninsect_abundance

Abundance of non-insect invertebrate taxa

Bio

(aquatic
inverts)

No

Noninsect_relabund

Relative abundance of non-insect invertebrate taxa

Bio

(aquatic
inverts)

No

Noninsect_reltaxa

Relative richness of non-insect invertebrate taxa

Bio

(aquatic
inverts)

No


Noninsect_taxa

Richness of non-insect invertebrate taxa

Bio

(aquatic
inverts)

No

OCH_abundance

Abundance of Odonata, Coleoptera, and Heteroptera (OCH)

Bio

(aquatic
inverts)

No

OCH_relabd

Relative abundance of Odonata, Coleoptera, and Heteroptera
(OCH) taxa

Bio

(aquatic
inverts)

No

OCH_reltaxa

Relative richness of Odonata, Coleoptera, and Heteroptera
(OCH) taxa

Bio

(aquatic
inverts)

No

OCH_taxa

Number of Odonata, Coleoptera, and Heteroptera (OCH)
families

Bio

(aquatic
inverts)

No

peren_present (PNW)

Presence/absence of perennial indicator invertebrate taxa
within the study reach

Bio

(aquatic
inverts)

No

perennial_abundance

Abundance of perennial invertebrate indicator taxa

Bio

(aquatic
inverts)

No

perennial_live_abundance

Abundance of perennial invertebrate indicator taxa (living
specimens only)

Bio

(aquatic
inverts)

No

perennial_taxa

Number of perennial invertebrate indicator taxa

Bio

(aquatic
inverts)

No

Richness

Total richness of aquatic invertebrate families

Bio

(aquatic
inverts)

No

TotalAbundance

Total abundance of aquatic invertebrates

Bio

(aquatic
inverts)

No

iofb_score (NM)

Presence/absence of iron-oxidizing bacteria and fungi.

Bio

(other)

No

liverwort_cover_score

Liverwort cover on the streambed. Higher scores indicate
higher liverwort cover on streambed.

Bio

(other)

No

moss_cover_score

Moss cover on the streambed. Higher scores indicate higher
moss cover on streambed.

Bio

(other)

No

DifferencesInVegetation_score
(NM)

Differences in vegetation between the riparian corridor and
adjacent uplands score. Higher scores indicate a more distinct
riparian corridor.

Bio
(veg)

No

hydrophytes_present

Number of hydrophytic plant species (FACW or OBL) observed
within the study reach channel and 1/2 channel width of the
stream on either bank

Bio
(veg)

No

hydrophytes_present_any
(PNW)

Presence/absence of hydrophytes within the study reach
channel and 1/2 channel width of the stream on either bank

Bio
(veg)

No

hydrophytes_present_noflag

Number of hydrophytic plant species (FACW or OBL) observed
within the study reach channel and 1/2 channel width of the
stream on either bank (excluding taxa with unusual
distributions flagged by the field crew)

Bio
(veg)

Yes

PctShading

Percent shading on the streambed.

Bio
(veg)

Yes


ripariancorr_score (PNW)

With/without distinctive vegetation in the riparian corridor
compared to surrounding upland vegetation.

Bio
(veg)

No

UplandRootedPlants_score
(NM)

Score for the absence of upland rooted plants from the streambed.
Higher scores indicate fewer upland plants in the streambed.

Bio
(veg)

Yes

amphib_score (PNW)

Detection of aquatic life stage(s) of amphibian(s) within the
study reach.

Bio

(verts)

No

Fish_score (NM)

Fish abundance score. Higher scores indicate that fish were
more prevalent and easier to find in the reach.

Bio

(verts)

No

fishabund_score2

When Mosquitofish are present, set to 0. Otherwise, use
Fish_score (which is the abundance of fish).

Bio

(verts)

No

frogvoc_score

Presence/absence of frog vocalizations

Bio

(verts)

No

snake_score (PNW)

Presence/absence of aquatic snakes within the study reach

Bio

(verts)

No

turt_score

Presence/absence of turtle(s) within the study reach

Bio

(verts)

No

vert_score

Presence/absence of aquatic vertebrates. max(snake_score,
amphib_score, turt_score, frogvoc_score)

Bio

(verts)

No

vert_sumscore

Number of aquatic vertebrate types present. (Sum of
snake_score, amphib_score, and turt_score)

Bio

(verts)

No

vertvoc_sumscore

Sum of (snake_score, amphib_score, turt_score, frogvoc_score)

Bio

(verts)

No

BankWidthMean

Mean of columns that start with 'Bankwidth'

Geom

Yes

ChannelDimensions_score (NM)

Scored channel entrenchment metric from the New Mexico
protocol; higher scores indicate less entrenchment and more
access to the floodplain. Higher scores indicate the channel was
less confined (had higher entrenchment ratios).

Geom

Yes

erosion_score (PNW)

Presence/absence of evidence of fluvial erosion (e.g., undercut
banks, scour marks, channel downcutting, channel incision)
and/or deposition (e.g., bars, recent deposits) within the study
reach channel

Geom

No

floodplain_score (PNW)

Presence/absence of a true floodplain at the reach

Geom

No

SedimentOnPlantsDebris_score
(NM)

Visual estimate of the extent of evidence of sediment
deposition on plants and on debris within the floodplain. Higher
scores indicate that sediment deposition was more prevalent
throughout the reach.

Geom

No

Sinuosity_score (NM)

Scored channel sinuosity. Higher scores indicate more sinuous
channels.

Geom

Yes

Slope

Reach slope as measured with a handheld clinometer

Geom

No

slope_gt10.5 (PNW)

Straightline reach slope as measured with a handheld
clinometer greater than or equal to 10.5%

Geom

No

slope_gt16 (PNW)

Straightline reach slope as measured with a handheld
clinometer greater than or equal to 16%

Geom

No

SubstrateSorting_score (NM)

Visual estimate of the extent of evidence of substrate sorting
within the channel. Higher scores indicate greater sorting of
substrate within the channel relative to surrounding uplands.

Geom

Yes

RifflePoolSeq_score (NM)

Visual estimate of the diversity and distinctiveness of riffles,
pools, and other flow-based microhabitats. Higher scores
indicate more distinctive riffles, pools, and other flow habitats
with clear transitions within the reach.

Geom

No

BFI

Base Flow Index: estimated percentage of total flow that is
attributed to groundwater discharge to streams, derived by
interpolating values from USGS stream gages

GIS

No


Elev_m

Watershed elevation retrieved from StreamCat database

GIS

No

MeanSnowPersistence_01

Mean snow persistence within a 1-km radius of the reach

GIS

No

MeanSnowPersistence_05

Mean snow persistence within a 5-km radius of the reach

GIS

No

MeanSnowPersistence_10

Mean snow persistence within a 10-km radius of the reach

GIS

No

ppt

Mean annual precipitation

GIS

No

ppt.m01

Mean January precipitation

GIS

No

ppt.m02

Mean February precipitation

GIS

No

ppt.m03

Mean March precipitation

GIS

No

ppt.m04

Mean April precipitation

GIS

No

ppt.m05

Mean May precipitation

GIS

No

ppt.m06

Mean June precipitation

GIS

No

ppt.m07

Mean July precipitation

GIS

No

ppt.m08

Mean August precipitation

GIS

No

ppt.m09

Mean September precipitation

GIS

No

ppt.m10

Mean October precipitation

GIS

No

ppt.m11

Mean November precipitation

GIS

No

ppt.m12

Mean December precipitation

GIS

No

tmax

Maximum annual temperature (PRISM 30-year normal)

GIS

No

tmean

Mean annual temperature (PRISM 30-year normal)

GIS

No

tmin

Minimum annual temperature (PRISM 30-year normal)

GIS

No

HydricSoils_score (NM)

Presence/absence of hydric soils within the study reach

Hydro

No

WoodyJams_number

Number of woody jams present within the study reach channel
(or up to 10 m outside of the study reach). Woody jams must
completely span the active channel and be in contact with the
streambed, contain at least 3 large pieces (>1 m long and >10
cm diameter), and cause sufficient blockage to disrupt flow of
water or sediment under flowing conditions.

Hydro

No

IsolatedPools_number (PNW)*

Number of pools (must have surface water) with no evidence of
surface water flow in or out

Hydro

No

SurfaceFlow_pct (PNW)*

Visual estimate of percentage of reach length that has flowing
surface water.

Hydro

No

SurfaceSubsurfaceFlow_pct
(PNW)*

Visual estimate of percentage of reach length that has flowing
surface water or sub-surface (hyporheic) flow

Hydro

No

SoilMoist_MaxScore*

Soil is qualitatively assessed for moisture level (saturated, partly
saturated, or dry) in three locations. This indicator uses the
wettest score out of the three.

Hydro

No

SoilMoist_MeanScore*

Soil is qualitatively assessed for moisture level (saturated, partly
saturated, or dry) in three locations. This indicator uses the
mean moisture score observed over all three locations.

Hydro

No

springs_score (NM)*

Scored abundance of seeps and/or springs within the sample
reach. Higher scores indicate larger numbers of seeps and/or
springs.

Hydro

No

WaterInChannel_score (NM)*

Scored surface water flow/presence in the sample reach.
Higher scores indicate channels with greater levels of surface
water flow/presence.

Hydro

No

Asterisks (*) indicate hydrologic metrics that directly measure the presence of water.
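
Several metrics in the table are simple derivations from field observations: the two soil moisture indicators summarize three qualitative observations, and the two PNW slope indicators are thresholds on clinometer slope. The following Python sketch illustrates those derivations only. The numeric coding of the moisture classes (dry = 0, partly saturated = 1, saturated = 2) and the function names are assumptions made for illustration and are not taken from this report.

```python
from statistics import mean

# Hypothetical numeric coding of the qualitative moisture classes.
# This coding is an assumption for illustration, not the report's scoring.
MOISTURE_CODES = {"dry": 0, "partly saturated": 1, "saturated": 2}


def soil_moisture_metrics(observations):
    """Return (max score, mean score) over the three soil moisture locations."""
    if len(observations) != 3:
        raise ValueError("expected observations from exactly three locations")
    scores = [MOISTURE_CODES[obs] for obs in observations]
    # The max corresponds to the wettest of the three locations;
    # the mean averages the score over all three locations.
    return max(scores), mean(scores)


def slope_threshold_flags(slope_pct):
    """Return (slope >= 10.5%, slope >= 16%) for a reach slope in percent."""
    return slope_pct >= 10.5, slope_pct >= 16


if __name__ == "__main__":
    print(soil_moisture_metrics(["dry", "saturated", "partly saturated"]))
    print(slope_threshold_flags(12.0))
```

Under this assumed coding, a reach with one saturated location receives the highest possible maximum score regardless of the other two locations, while the mean score still reflects the drier locations.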



-------