EPA
    United States
    Environmental Protection
    Agency
Developing Biological Indicators: Lessons Learned from Mid-Atlantic Streams
Introduction

To demonstrate how monitoring and assessment
at the regional scale could be achieved, the U.S.
Environmental Protection Agency (USEPA)
implemented the Mid-Atlantic Integrated Assessment
(MAIA) pilot study for surface waters. A primary
goal of this study was to define biological
indicators that could be used to assess
stream condition at the regional level.
During 1993-1996, hundreds of wadeable
stream sites throughout the mid-Atlantic
region were surveyed for water chemistry,
land use, riparian condition, and channel
morphology in an effort to better
understand how human influence alters
fish, benthic macroinvertebrate (e.g.,
aquatic insect) and periphyton (e.g., algae
and diatom) assemblages.

During the course of the study, researchers working independently
derived different approaches to data analysis and reported different
results regarding the relationships between human influence and
biological change. To build consensus among the scientists involved,
EPA sponsored a series of workshops to create a consistent approach
for testing and selecting biological indicators for fish,
macroinvertebrates and periphyton. This brochure summarizes a document
that presents issues from those workshops. Please refer to Developing
Biological Indicators: Lessons Learned from Mid-Atlantic Streams
(Fore 2003, EPA/903/R-03/003) in its entirety for a more in-depth
explanation of related challenges and conclusions:
http://www.epa.gov/bioindicators.

Goal of EPA-sponsored workshops:
To create a consistent approach for testing and selecting biological
indicators

Lessons learned from the MAIA study are outlined below and highlight
the steps involved in developing stream biological indicators. Efforts
to resolve aspects of sampling design, data collection and management,
correlating human impacts and survey data, testing and selecting
individual metrics, and final development and application of a
multi-metric index are presented here. This document is aimed at
agency scientists or managers tasked with implementing regional
monitoring programs.

                               Probabilistic sampling design was the best choice for MAIA
                                  Given the size of the sampling area and the scope of the questions
                                  asked for the MAIA study, most agreed that randomization of site
                                  selection was necessary to yield an unbiased estimate of conditions
                                  across the entire region (Figure 1). Unless one samples every site
                                  (census sampling), selecting sites randomly is the only method for
                                  inferring regional condition from a smaller set of sites.

                                  Why sample randomly? If site selection is random and all sites are
                                  sampled with an equal or known probability, then information from the
                                  sampled sites can be used to infer the condition of sites not sampled.
                                  Thus, results based on a random sample of sites can be scaled up to
                                  the entire population of sites within a region, as long as each site in the
                                  region could have been included in the sample.
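
The scaling-up step described above can be sketched in a few lines. This is a minimal illustration, not the MAIA estimation procedure: it assumes each sampled site has a known inclusion probability and weights each site by the inverse of that probability. The function name and the sample values are hypothetical.

```python
def weighted_proportion(sites):
    """Estimate the regional proportion of sites in poor condition.

    `sites` is a list of (inclusion_probability, is_poor) pairs.
    Each sampled site stands in for 1 / probability sites in the region.
    """
    weights = [1.0 / p for p, _ in sites]
    poor_weight = sum(w for w, (_, is_poor) in zip(weights, sites) if is_poor)
    return poor_weight / sum(weights)


# Hypothetical sample: two sites drawn with probability 0.01 and two with
# probability 0.05, so the first two sites carry five times more weight.
sample = [(0.01, True), (0.01, False), (0.05, True), (0.05, True)]
estimate = weighted_proportion(sample)
```

In this made-up sample, three of four sites are poor, but the weighted estimate is lower because the sites sampled at the lower rate represent more of the region.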
 Figure 1. MAIA probabilistic study
 design locations for fish.
     Lesson Learned:

     This design strategy met the scope of the study and
     circumvented the need to conduct census sampling,
     an otherwise costly venture.

-------
   Reference sites did not always meet criteria for reference condition
   Expectations for biological indicators are based on observed conditions at undisturbed or minimally
   disturbed locations. These reference sites are used to define reference conditions. The MAIA study used a
   two-pronged approach to select these sites:
                                             •  Researchers developed independent criteria, such as
                                                acid-neutralizing capacity (ANC) and nutrient loads
                                                (Table 1).
                                             •  Local biologists helped select sites based on their best
                                                professional judgment (BPJ) to ensure that the full
                                                range of site disturbance was included, not just "typical"
                                                site disturbance.
Table 1. Independent reference site criteria.
   All reference sites in the calibration dataset met
   ALL of these criteria:
                  ANC > 50 µeq/L
     Total Phosphorus < 20 µg/L
       Total Nitrogen < 750 µg/L
             Chloride < 100 µeq/L
              Sulfate < 400 µeq/L
       Mean RBP Score > 15
                                           Ironically, 73% (44 out of 60) of the BPJ sites failed to meet
                                           the independently established criteria.
                                                  Lesson Learned:

                                                  Best professional judgment should always be confirmed
                                                  with objective criteria in choosing reference sites.
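
One way to confirm a judgment-based site against the objective criteria is to screen it automatically. The sketch below uses the thresholds from Table 1; the field names and the candidate site values are hypothetical and chosen only for illustration.

```python
# Thresholds copied from Table 1; the dictionary keys are hypothetical names.
CRITERIA = {
    "anc_ueq_l": lambda v: v > 50,
    "total_p_ug_l": lambda v: v < 20,
    "total_n_ug_l": lambda v: v < 750,
    "chloride_ueq_l": lambda v: v < 100,
    "sulfate_ueq_l": lambda v: v < 400,
    "mean_rbp_score": lambda v: v > 15,
}


def failed_criteria(site):
    """Return the names of the Table 1 criteria that a site does not meet."""
    return [name for name, passes in CRITERIA.items() if not passes(site[name])]


# Hypothetical candidate site nominated by best professional judgment.
candidate = {
    "anc_ueq_l": 120,
    "total_p_ug_l": 35,
    "total_n_ug_l": 600,
    "chloride_ueq_l": 80,
    "sulfate_ueq_l": 450,
    "mean_rbp_score": 17,
}
failures = failed_criteria(candidate)
is_reference = not failures
```

Listing the failed criteria, rather than returning only pass or fail, shows why a nominated site was rejected.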
Perils of Data Management

  Different "names" for the same site caused confusion
  The MAIA data set was large and complex; therefore,
  it was not possible to put all the data in a single file.
  Consistent data identifiers, particularly site names,
  were extremely important for matching multiple
  files of related data. Although the initial information
  management strategy accounted for this need,
  inconsistencies arose in the completion of data fields,
  which in turn complicated data analysis.
                                                              Lesson Learned:

                                                              More information should have been
                                                              included within each data file to identify
                                                              unique sampling occasions. Spending
                                                              more time up front ensuring that data were
                                                              completely and correctly stored would
                                                              have saved considerable time spent trying
                                                              to repair or retrieve corrupted data.
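
A simple check for inconsistent site names can be run before files are merged. The sketch below is an assumption about how such a check might look: the site-name format, the normalization rules, and the example names are all hypothetical, not taken from the MAIA files.

```python
def normalize_site_id(raw):
    """Collapse case, spacing, and separator differences in a site name."""
    return raw.strip().upper().replace(" ", "").replace("_", "-")


def unmatched_ids(file_a_ids, file_b_ids):
    """Return normalized site names that appear in only one of two files."""
    a = {normalize_site_id(s) for s in file_a_ids}
    b = {normalize_site_id(s) for s in file_b_ids}
    return sorted(a ^ b)


# Hypothetical site names as they might appear in two related files.
chemistry_ids = ["MD-001", "md_002 ", "PA-010"]
fish_ids = ["md-001", "MD-002", "PA-011"]
problems = unmatched_ids(chemistry_ids, fish_ids)
```

Names that differ only in case or separators are matched; names left in `problems` need a person to resolve them.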

  Simple files were best
  Data analysis for MAIA involved multiple institutions and
  investigators using different statistical software. Posting
  files on an Internet server was the most practical approach
  to sharing files among so many remote users. Hosting a
  searchable relational database that included all the data was
  an option, but such databases were typically slow and difficult
  for the host to maintain. Because researchers were typically
  interested in a subset of data, smaller, simpler files with
  variables grouped according to topic worked best. The MAIA data
  had to be accessible to many remote users, who intended to
  manipulate the data within a variety of software.

   Lesson Learned:

   Rather than create a complicated
   database structure from which data
   would have to be exported for analysis,
   data files were kept simple from the
   beginning so that they could be easily
   downloaded from an EPA Internet site
   and quickly entered into the user's own
   statistical software.
  Original data must be archived
  The tendency was to lose track of original
  files with confusing formats when newer
  versions were created. For the MAIA study,
  referencing original files was the only way
  to catch major errors in later versions of the
  data.
                                                  Lesson Learned:

                                                  Original field or bench sheets must be archived along
                                                  with the first generation of electronic data in a way that
                                                  the data will not be changed or lost.
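
One possible way to protect first-generation electronic files is to make them read-only and record a checksum, so that any later change can be detected. This is a sketch under that assumption, not a description of what the MAIA team did; the file name and contents are hypothetical.

```python
import hashlib
import os
import stat
import tempfile


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_original(path):
    """Make the file read-only and return the checksum to record in a log."""
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return sha256_of(path)


# Demonstration with a throwaway file standing in for an original data file.
with tempfile.TemporaryDirectory() as folder:
    original = os.path.join(folder, "fish_counts_original.csv")  # hypothetical
    with open(original, "w") as handle:
        handle.write("site_id,species,count\nMD-001,redbreast sunfish,4\n")
    recorded = archive_original(original)
    # Later, before relying on the archive, confirm the file is unchanged.
    still_intact = sha256_of(original) == recorded
```

The checksum does not prevent loss, so archived files still need backups; it only makes silent changes detectable.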

-------
    Linking Human Disturbance to Biological Change

    Because biological systems are complex and human disturbance is multidimensional (e.g., differing
    types, sources, duration and intensity), single causes and mechanisms of impairment are difficult to
    isolate. As a result, much of the evidence for human degradation of natural resources is correlative. In
    such situations, although the path to causality (i.e., demonstrating cause and effect) is blocked by the
    inability to perform controlled experiments and use statistical inference, logical argument (or weight
    of evidence) constructed according to a recognized set of rules can be used instead (Table 2). In fact,
    this approach typically yields a stronger case because researchers consider alternative  explanations
    explicitly, rather than ignoring them.

    Results from the Mid-Atlantic illustrate how a causal argument can be constructed to support the
    idea that human disturbance causes biological change. Specifically, Figure 2 shows the strength of the
    correlation among reference site values versus sites impacted by acid  deposition. This information can
    be used to support Beyers' first criterion for constructing causal arguments (Table 2).
  Table 2.  Ten criteria for constructing causal arguments.
  (modified after Beyers, 1998)
       1. Strength: a large proportion of sampling units
        are affected in exposed areas compared with
        reference areas
      2. Consistency: the association has been observed at other
        times and places
      3. Specificity: the effect is diagnostic of exposure
      4. Temporality: exposure must precede the effect in time
      5. Dose-response: the intensity of the observed effect is
        related to the intensity of the exposure
      6. Plausibility: a plausible mechanism links cause and effect
      7. Evidence: a valid experiment provides strong evidence of
        causation
      8. Analogy: similar stressors cause similar effects
      9. Coherence: the causal hypothesis does not conflict with
        current knowledge
      10. Exposure: indicators of exposure must be found in affected
        organisms
        Beyers, D. W. 1998. Causal inference in environmental impact studies.
          Journal of the North American Benthological Society 17:367-373.
[Figure 2 graphic: index values (axis ticks 20 to 100) plotted for fish, invertebrates, and diatoms; one site category is labeled Reference.]
                                                         Addressing concerns about circular
                                                         reasoning

                                                         An argument based on circular reasoning is
                                                         one in which the conclusion is embedded in
                                                         the premise, as for example, in the statement,
                                                         "decline in mayfly taxa richness is a good
                                                         indicator of biological disturbance because
                                                         we find many types of mayflies at undisturbed
                                                         places." The concern is that the observed
                                                         correlation may be due to spurious correlation
                                                         with another underlying cause that drives  both
                                                         biology and patterns of human settlement, such
                                                         as elevation or watershed size.
Figure 2. Multimetric index values for fish, invertebrates, and
diatoms as a function of human disturbance. All index values
were higher at reference sites. Diatom index values were higher
than invertebrate index values for sites with acid deposition.
Lessons Learned:
In addition to the criteria in Table 2, a variety
of safeguards helped reduce the risk of drawing
unsubstantiated conclusions:
  •    Site selection was randomized
      across a large geographic area
      to ensure that the sample was
      representative of all possible sites.
  •    Measures of disturbance were
      selected independently of the
      biological metrics.
  •    Part of the  data set was reserved
      to independently validate the final
      indexes.
  •    All metrics  were tested for
      correlation  with multiple gradients of
      human disturbance.
  •    Potential confounding factors such
      as watershed area were explicitly
      tested.

-------
Patterns of human disturbance were complex

Dozens of variables related to water chemistry, metals, nutrients,
fish tissue contaminants, habitat, channel morphology,
geographic features, human census data, satellite land cover
and use, and specific point sources were included in the data
set. Hundreds more were derived from the data collected. The
hope was that such a complete record of human activity would
provide a clear picture of human influence and disturbance
within a watershed. In reality, disturbance measures were not
necessarily correlated with each other because not all activities
were present in every watershed. Consequently, one of the
primary challenges for the MAIA study was to determine which
variables most accurately characterized human influence.
Examination of a correlation matrix of all site condition
variables revealed correlative patterns among related variables.
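
A correlation matrix of this kind can be sketched with rank correlations. The example below computes Spearman's correlation (Pearson's r applied to ranks, with tied values sharing their average rank) for every pair of variables. The variable names and values are hypothetical, and the choice of Spearman's coefficient here is an assumption made for the sketch.

```python
def ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        average = (start + end) / 2 + 1
        for position in range(start, end + 1):
            result[order[position]] = average
        start = end + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values for five sites.
variables = {
    "chloride": [10, 40, 25, 80, 60],
    "turbidity": [2, 9, 4, 7, 12],
    "fish_index": [90, 75, 70, 30, 45],
}
matrix = {a: {b: spearman(x, y) for b, y in variables.items()}
          for a, x in variables.items()}
```

Scanning such a matrix for groups of variables that rise and fall together is one way to see which disturbance measures carry overlapping information.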
Lesson Learned:

A comprehensive study linking
the types of human activities
(e.g., mining or agriculture) with
their specific stressors (e.g.,
SO4 or nutrients) would have
been helpful in clarifying metric
response to disturbance. Such a
study would have provided a better
understanding of which measures
of disturbance tended to vary
together and which measures were
related to natural geographic or
landscape features.
Integrated measures of disturbance were better
predictors of human influence than single
measures

Measures of generalized disturbance
reflect multiple attributes of degradation
as opposed to singular stressors, such
as nitrogen levels or turbidity. Overall,
specific stressors tended to be more highly
correlated with integrative (or generalized)
measures of human disturbance than
they were with measures of only a single
aspect of disturbance. For example, four
individual measures (turbidity, pebble
size, riparian vegetation condition, and
riparian disturbance) were each correlated
with one or two of the others, but all four
were correlated with the Bryce et al. (1999)
index of disturbance, an integrated measure
of site condition.


Similarly for biological indicators, integrative
measures of disturbance, rather than specific
stressors, showed a higher correlation with
multimetric indexes for all three assemblages
(Table 3). One chemical measure, chloride,
was a strong indicator of general disturbance
and also highly correlated with all three
biological indexes.
 Lesson Learned:

 Measures of disturbance that integrate
 measures of site condition over multiple
 spatial scales tended to better capture the
 cumulative effects of human influence.
Table 3.  Spearman's correlation of three multimetric indexes
with selected measures of human disturbance. All correlation
coefficients were significant; only values > 0.3 (or < -0.3) are
shown.

Measure                                          Fish         Invertebrate   Diatom
N (total nitrogen)                               -0.45        -0.32          -0.54
P (total phosphorus)                                                         -0.61
NH4 (ammonia)                                    -0.33        -0.36          -0.32
ANC (acid-neutralizing capacity)                              -0.33          -0.53
SO4 (sulfate)                                    -0.34
Turb (turbidity)                                              -0.33          -0.39
%SF (% sand and fine sediments)                               -0.54          -0.39
PbSz (pebble size corrected for stream power)                  0.43           0.33
RVeg (riparian vegetation)                                                    0.30
RDist (riparian disturbance)                                  -0.35
RBP (rapid bioassessment/habitat protocol)                     0.42           0.36
CL (chloride)                                    -0.45        -0.31          -0.55
Bryce et al. [1999] disturbance categories       (illegible)  -0.57          -0.54
%Dist Total
         Bryce et al. developed a risk index that
     summarized the intensity of human disturbance
     in the watershed upstream of sampled reaches.
       The risk index integrated information from
      the regional, watershed and reach scale. Each
     watershed was scored from 1 to 5 representing
           minimal to high risk of impairment.

    Bryce, S. A., et al., 1999. Assessing relative risks to aquatic
     ecosystems: a mid-Appalachian case study. Journal of the
      American Water Resources Association. 35(1):23-36.

-------
Metric Testing
Potential measures are selected for inclusion
in a multimetric index if they are biologically
meaningful, consistently associated with human
disturbance, not redundant with other metrics, and
reliably and easily quantifiable from field samples.
With a large list of candidate metrics and a single
test for each, there was the possibility that candidates
would meet the criteria for metric selection because
of chance alone. However, multiple tests against
different measures of human disturbance avoided
this pitfall.

Simple criteria were used first to eliminate
candidate metrics
Simple statistical rules were developed to shorten
the long list of candidate metrics identified for each
assemblage. This first round of elimination  focused
on evaluating each metric's range of values, that
is, the ability of a metric to differentiate levels of
human disturbance.
    Lesson Learned:

    For Mid-Atlantic streams, candidate metrics
    with a narrow range of values were eliminated
    in favor of metrics with a broader range of values.
Statistical precision was no substitute for
correlation with disturbance
Signal-to-noise ratios estimate a measure's ability to
distinguish differences among sites from differences
within individual sites. If the variability of a candidate
metric within individual sites is higher than its
variability among all sites, then the measure is unlikely
to detect differences in biological condition among
sites (or differences at sites that change over time).

Although most metrics incorporated into multimetric
indexes have high signal-to-noise ratios (indicating
high precision), a high ratio alone  does not guarantee
that a candidate metric will be a meaningful indicator.
Metric values can be highly repeatable at individual
sites but still be unrelated to human disturbance.

Consider, for example, pool depth  and embeddedness.
The depth of a pool in a stream is often considered
an indicator of good fish habitat. Quality is  expected
to decline as erosion, dredging, and sedimentation
fill pools, creating a homogeneous  channel profile.
Embeddedness represents the proportion of the
stream reach filled with sand and fine sediments. For
the MAIA study, pool depth measures were very
precise, with a signal-to-noise ratio of 16. In contrast,
embeddedness measures revealed a signal-to-noise
ratio of 1.9, failing to meet the authors' suggested
minimum value of 2. Embeddedness, however,
showed a strong correlation with human disturbance.
Conversely, pool depth was precise but not related
to human disturbance. Embeddedness, though
less statistically precise, was the better indicator of
biological condition (Figure 3).
     Lesson Learned:

     Certainly statistical precision is a desirable
     property of a good metric, but statistical
     precision alone does not guarantee a
     predictable association with human disturbance.
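
A signal-to-noise ratio can be sketched from repeat visits. The example below makes a simplifying assumption: signal is the variance of site means, and noise is the average variance among repeat visits at the same site. A formal analysis would estimate variance components instead. The site labels and values are hypothetical.

```python
from statistics import mean, variance


def signal_to_noise(visits_by_site):
    """Ratio of among-site variance to average within-site variance."""
    site_means = [mean(v) for v in visits_by_site.values()]
    signal = variance(site_means)
    noise = mean([variance(v) for v in visits_by_site.values()])
    return signal / noise


# Hypothetical metric values from two visits at each of three sites.
visits = {
    "site_A": [9, 11],
    "site_B": [19, 21],
    "site_C": [29, 31],
}
ratio = signal_to_noise(visits)
meets_minimum = ratio >= 2  # minimum value of 2 suggested in the text above
```

A high ratio shows only that the metric is repeatable; whether it responds to human disturbance has to be tested separately.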

-------
  Metrics from different assemblages were
  eliminated for different reasons
  The list of plausible metrics proposed for testing
  in Mid-Atlantic streams included 58 for fish,
  120 for invertebrates, and 240 for periphyton.
  Most of the candidate metrics for periphyton
  represented untested hypotheses, whereas the
  other assemblage metrics had experienced a
  greater amount of testing. As Table 4 shows, fish
  metrics were eliminated for different reasons and
  at different frequencies than were invertebrate
  metrics.
       Lesson Learned:

       Across assemblages, metrics were
       selected and eliminated for different
       reasons.
  Table 4. Numbers of candidate metrics tested for
  MAIA's fish and invertebrate multimetric indexes
  and a summary of the reasons for which they were
  eliminated. This winnowing process resulted in
  fewer than 10 metrics included in the final indexes.

                                          Fish   Invertebrates
  Total # of candidate metrics              58        120
  Insufficient range                        13         20
  Poor signal/noise                          2         66
  Redundant                                  3         25
  Fail to correlate                         30          2
  Persistent correlation
    w/watershed area                         1          0
  Total # of metrics in final index          9          7
Development and Application of Multimetric Indexes
Multimetric indexes were created in part to fulfill a Clean Water Act mandate that all states develop
numeric criteria for assessing biological condition of water bodies. A multimetric index, as the name
implies, is a carefully constructed framework of multiple types of measurements. Once individual
metrics have been tested and selected for inclusion in a multimetric index, it is necessary to ensure the
index as a whole will offer a reliable and quantifiable indication of human disturbance.
         Biological criteria depend on
         the definition of reference sites
         In the same way reference sites
         are used to develop individual
         metrics, multimetric index
         values observed at reference, or
         minimally disturbed, sites are
         used by many states to define
         biological impairment. Currently,
         states vary both in the way they
         characterize reference condition
         and define deviation from
         reference condition. States also
         vary in their determinations of
         biological impairment thresholds.
Lesson Learned:

From the MAIA study we learned the importance
of having objective criteria to select reference
sites, a lesson relevant to all states as they
develop reference condition criteria and rules for
defining impairment. Additionally, it is important
to develop informative, defensible and consistent
thresholds from state to state.
                                                              Red breasted sunfish

-------
Patterns of index variability were similar
across assemblage types
Statistical precision is an important feature of
any monitoring tool because it determines the
ability of an indicator to detect change should
it occur. A highly variable indicator must show
a large change in value before the change
is statistically significant. Lack of sensitivity
translates into an inability to sound an alarm
that will protect resources from degradation.

Statistical power analysis can be used to
estimate the magnitude of change that
an indicator can detect. Results from two
commonly used statistical models for power
analysis (t-test and regression) indicated that
the MAIA multimetric indexes had adequate
precision to distinguish between two and five
categories of biological condition (such as
good, fair, poor) and could detect between
1.5% and 2.5% change per year after five years
of monitoring.

As shown in Table 5, indexes for each
assemblage differed in percentages of
"nuisance" variance, that is,  the amount of an
index's total variance that can be explained by
year-to-year differences, statistical outliers, and
measurement error. Site variance is associated
with biological condition—the higher the
percentage of site variance, the more precise
the index.
          Caddisfly larvae (Family Leptoceridae)
     Lesson Learned:

     Despite differences in how the statistical
     models ranked the three indexes, percentages
     of "nuisance" variance components were
     approximately similar across assemblages
     (13-20%).
Table 5. Components of variance expressed as a
percentage of the total variance for diatom, invertebrate,
and fish multimetric indexes. Variance associated with site
differences, year-to-year differences, site x year interaction,
and repeat visits within years are shown for each index.

Percentage of total variance
                                        Diatom   Invertebrates   Fish
Site (target variability)                80.4        83.3        86.8
"Nuisance" components:
  Year (year-to-year differences)         0           2.1         1.5
  Site x year (statistical outliers)      2.7         1.6         5.6
  Error [repeat visits]
    (measurement error)                  16.9        12.9         6.2
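
The 13-20% range quoted in the Lesson Learned can be checked by adding the three "nuisance" components for each index. The sketch below uses the percentages from Table 5; the variable and function names are illustrative only.

```python
# Percentages copied from Table 5: (site, year, site x year, error).
TABLE_5 = {
    "Diatom": (80.4, 0.0, 2.7, 16.9),
    "Invertebrates": (83.3, 2.1, 1.6, 12.9),
    "Fish": (86.8, 1.5, 5.6, 6.2),
}


def nuisance_percent(components):
    """Sum the non-site components: year, site x year, and error."""
    _site, year, site_by_year, error = components
    return round(year + site_by_year + error, 1)


nuisance = {name: nuisance_percent(parts) for name, parts in TABLE_5.items()}
```

The sums are 19.6% for diatoms, 16.6% for invertebrates, and 13.3% for fish, which fall within the quoted range.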




-------
                   U.S. Environmental Protection Agency
                   Mid-Atlantic Integrated Assessment
                   701 Mapes Road
                   Fort Meade, MD  20755-5350

                   EPA/903/F-06/001
                   January 2006
   Assemblages differed in their sensitivity to disturbance types
   The MAIA study concluded that any of the three assemblages could be used to monitor stream
   condition because multimetric indexes for all three assemblages could reliably distinguish degraded
   sites from sites with little or no human influence. However, each assemblage varied in its sensitivities
   to different types of disturbance. Figure 4 shows the relative sensitivity to disturbance conditions (or
   relative risk) of each assemblage. For example, fish showed less sensitivity to sedimentation effects than
   invertebrates or algae.
                                   Lesson Learned:

                                   Employing all three multimetric indexes to monitor stream
                                   condition yields the fullest range of information.


[Figure 4 graphic: three panels (FISH, MACROINVERTEBRATES, ALGAE) with Relative Risk on the horizontal axis, showing one bar per stressor: Non-Native Fish, Sedimentation, Large Wood, Riparian Habitat, Nitrogen, Phosphorus, Mine Drainage, Acidic Deposition, Acid Mine Drainage.]

                               Figure developed by John Stoddard based on data from the Mid-Atlantic Highlands Streams Assessment (EPA 2000, EPA/903/R-00/015)
Figure 4. A relative risk of 1.0 denotes "no stressor effect", and  stressors with confidence intervals lying
entirely above 1.0 (green bars) are statistically significant (one-sided p<=0.05). This figure shows relative risk
values for associations between biotic integrity (for each assemblage) and stressor condition (for each assessed
stressor). Length of bars is the increase in likelihood of encountering a poor ecological condition (based on biological
indicators) when the stressor is also ranked as poor.
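
The relative risk described in the caption can be illustrated with simple counts. This sketch assumes unweighted site counts in a two-by-two table. The analysis behind Figure 4 may have used survey weights, and the counts here are hypothetical.

```python
def relative_risk(poor_bio_poor_stressor, ok_bio_poor_stressor,
                  poor_bio_ok_stressor, ok_bio_ok_stressor):
    """Risk of poor biology where the stressor is poor, divided by the
    risk of poor biology where the stressor is not poor."""
    risk_with_stressor = poor_bio_poor_stressor / (
        poor_bio_poor_stressor + ok_bio_poor_stressor)
    risk_without_stressor = poor_bio_ok_stressor / (
        poor_bio_ok_stressor + ok_bio_ok_stressor)
    return risk_with_stressor / risk_without_stressor


# Hypothetical counts of sites in each cell of the two-by-two table.
rr = relative_risk(30, 20, 20, 80)
```

With these counts, poor biological condition occurs at 60% of sites where the stressor is poor and at 20% of the remaining sites, giving a relative risk of about 3. A value of 1.0 would mean no stressor effect.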

  Contact
  Wayne Davis
  Office of Environmental Information
  Environmental Analysis Division
  Mid-Atlantic Integrated Assessment
  701 Mapes Road
  Fort Meade, MD 20755-5350
  410-305-3030
  davis.wayne@epa.gov
  www.epa.gov/bioindicators
                                 Lou Reynolds
                                 USEPA Region 3
                                 EAID Freshwater Biology Team
                                 1060 Chapline Street
                                 Suite 303
                                 Wheeling, WV 26003-2995
                                 304-234-0244
                                 reynolds.louis@epa.gov
John Stoddard
USEPA Office of Research and
Development
NHEERL - Western Ecology Division
200 SW 35th Street
Corvallis, OR 97333
541-754-4441
stoddard.john@epa.gov
www.epa.gov/nheerl/arm
Printed on chlorine-free 100% recycled paper with
100% post-consumer fiber using vegetable-based ink.

-------