

-------

-------
                                                EPA/903/R-05/003
                                                       June 2006
   Proof of Concept for Integrating Bioassessment Results
     from Three State Probabilistic Monitoring Programs
             Prepared for:

             Wayne Davis
  U.S. Environmental Protection Agency
   Office of Environmental Information
   Mid-Atlantic Integrated Assessment
         Fort Meade, Maryland

 COMMITS Contract No.
              Prepared by:

              Versar, Inc.
          9200 Rumsey Road
    Columbia, Maryland 21045-1934

-------
Proof of Concept for Integrating Bioassessment
Results from Three State Probabilistic Monitoring Programs
                                    NOTICE

     This document has been reviewed and approved in accordance with U.S.
     Environmental Protection Agency policy. Mention of trade names, products, or
     services does not convey and should not be interpreted as conveying official EPA
     approval, endorsement, or recommendation for use.

     Funding was provided by the U.S. Environmental Protection Agency under U.S.
     Department of Commerce, Commerce Information Technical Solutions Contract
     No. 50-CMAA-900065.

     The appropriate citation for this report is:

     Southerland, M., Vølstad, J., Erb, L., Weber, E., Rogers, G. 2006. "Proof of
        Concept for Integrating Bioassessment Results from Three State Probabilistic
        Monitoring Programs". Report prepared for EPA under Contract No. 50-
        CMAA-900065. EPA/903/R-05/003. U.S. Environmental Protection Agency,
        Office of Environmental Information and Mid-Atlantic Integrated Assessment
        Program, Region 3, Ft. Meade, MD.
                          ACKNOWLEDGEMENTS

     Jason Hill (Virginia Department of Environmental Quality), Jeffrey Bailey (West
     Virginia Department of Environmental Protection), and Maryland Department of
     Natural Resources provided the data and program information needed to complete
     this study. Tony Olson (U.S. EPA) provided estimates of stream condition for
     Virginia and West Virginia. Wayne Davis, Jason Hill, Jeffrey Bailey, Maggie
     Passmore, Laura Gabanski, and Leska Fore provided valuable comments on the
     draft report. We greatly appreciate the efforts of Kate Kritcher Traut and Juanita
     Soto-Smith (Perot Systems Government Services, Inc.) for the technical editing
     and layout of this report.

-------
                              ABSTRACT

If data from state stream monitoring assessment programs can be integrated,
EPA will be able to obtain estimates of stream condition over larger regions.
We assessed the feasibility of integrating three probabilistic monitoring
programs—Maryland, Virginia, and West Virginia—and calculated a provisional
combined estimate of condition for the non-Coastal Plain region of these
states using multimetric indices. All three states had probability-based surveys
with similar sample frames (ranges of stream types and sizes) and benthic
macroinvertebrate collection procedures outside of the Coastal Plain. Virginia
and West Virginia used similar Stream Condition Indices (SCIs) where index
scores were derived from the range of values at all sample sites (with thresholds
for rating stream condition based on reference condition), while Maryland used
a Benthic Index of Biotic Integrity (B-IBI) with metric scores assigned relative
to reference condition (and thresholds based on the average of metric scores).
To compare the three index methods and establish a common benchmark, SCIs
were first calculated for Maryland sites using the Virginia and West Virginia
methods. The two SCIs produced nearly identical results on Maryland data
indicating that the Virginia and West Virginia methods were directly comparable.
The Maryland B-IBI had a more uniform distribution of scores than the SCIs and
was not directly comparable. The West Virginia procedure for selecting reference
sites included site-by-site best professional judgment (BPJ) exclusions that were
more restrictive, but which could not be reproduced for other states, so were not
included in the provisional integrated assessment. Application of each state's
reference criteria to Maryland data (excluding West Virginia's BPJ exclusions)
resulted in different suites of reference sites. However, the distributions of
reference sites selected were similar, suggesting the different reference sites
were of similar stream quality (comparably affected by human disturbance).
Using our example integration approach (and treating each state as a stratum)
and the 10th percentile of reference sites as a degradation threshold, we estimated
that approximately 39% of all streams in the non-Coastal Plain of the three
states would be classified as degraded for 1997-2003. Applying a threshold of
degradation derived from higher quality reference sites (e.g., those including West
Virginia's BPJ exclusions) would increase the proportion of streams designated as
degraded. We conclude that similar integrations at the level of stream condition
assessment will be possible even when data integration is problematic.

-------
                                CONTENTS


    Section 1: Introduction

    Section 2: Summary of State Programs
           2.1: Maryland
           2.2: Virginia
           2.3: West Virginia

    Section 3: Comparison of Sample Frames, Survey Designs,
               and Data Collection

    Section 4: Comparison of Indicators and Reference Conditions

    Section 5: Integration of Assessments from the Three States

    Section 6: Discussion and Recommendations

    Section 7: Literature Cited



                                 FIGURES

    Figure 1: Calculation of SCI scores

    Figure 2: Venn diagram of reference sites selected from
              Maryland data using the Virginia, West Virginia, and
              Maryland reference criteria

    Figure 3: Distribution of SCI scores for reference sites selected
              from Maryland data using Virginia, West Virginia, and
              Maryland methods

    Figure 4: Cumulative distribution of Maryland benthic IBI
              scores and Virginia and West Virginia SCI scores for
              Maryland streams

    Figure 5: Cumulative distribution of SCI scores for Maryland,
              Virginia, and West Virginia, and all three states
              combined

    Figure 6: Distribution of SCI scores for West Virginia reference
              sites selected using all criteria

    Figure 7: Cumulative distribution of SCI scores for all three
              states combined at two different thresholds

-------

                                  TABLES

    Table 1: Comparison of sample frames and survey designs used by
             stream monitoring programs in Maryland, Virginia, and
             West Virginia

    Table 2: Comparison of benthic sampling methods used by stream
             monitoring programs in Maryland, Virginia, and West
             Virginia

    Table 3: Metrics used by each state to create benthic
             macroinvertebrate IBI or SCI scores

    Table 4: Criteria used by Maryland, Virginia, and West Virginia to
             select reference sites for multimetric indices of
             biological condition based on benthic macroinvertebrates

    Table 5: Total Habitat score calculation used by Virginia and West
             Virginia for high and low gradient streams, along with
             the surrogate applied to Maryland

-------

-------
1. Introduction

The U.S. Environmental Protection Agency (EPA) is evaluating methods
for developing a national assessment of stream conditions. One method for
completing such an assessment is to conduct a regional survey of wadeable
streams using a probability-based design and standardized sampling protocols
such as the Environmental Monitoring and Assessment Program  (U.S. EPA
2002a). However, such an effort would be costly and partially redundant with the
stream assessment programs that individual states now conduct to characterize
water quality (U.S. EPA 2002b). If the methods and results from  these state
programs are similar enough, EPA may be able to use data already collected by
states to assess stream conditions over larger, multi-state regions.

Stream assessment programs from individual states must meet several
requirements before they can be used for larger scale assessments. First, each
program must address comparable sample frames, i.e., each state must sample a
stream network with a comparable range of stream sizes and types, over a similar
time period. Second, each state program must have a probability-based survey
design that allows unbiased area-wide estimates of stream condition to be made
with quantifiable precision. Third, each state must have reliable field collection
techniques, laboratory protocols, and quality assurance and control procedures that
ensure accurate data. Ideally, these methods will have documented performance
characteristics (U.S. EPA 2000, NWQMC 2001). Lastly, each state program must
have data to support a single metric, set of metrics, or model that can be used to
evaluate stream condition (i.e., an assessment endpoint or indicator). Multimetric
indices of biological assemblages that characterize stream condition as a single
value (Karr 1991, Barbour et al. 1995) are easily interpreted and are used by almost
every state (U.S. EPA 2002b).

Even if the above requirements are met, most state programs are still likely to
differ significantly in the field collection protocols and indicators they use. These
differences can prevent directly integrating data into a consolidated data set, but
they are unlikely to preclude combining results at the level of stream condition
assessment. That is, states may share assessment comparability but not data
comparability. For example, two states may employ different sampling gear so
that different invertebrate taxa are targeted (e.g., different numbers of mayflies
would be collected by each method at the same site), while the indicators used by
each state would rate the site in the same condition. This is the virtue of using the
deviation from reference condition to rate streams (i.e., indicators with different
scores at the same site will be similar distances from reference scores developed
for each indicator). Also, it is likely that states will have different survey designs;
however, if each is probability-based, the differences will not affect integration at
the assessment level, as each state is a de  facto stratum that can be combined into
a single estimate of the proportion of degraded stream miles.
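
The stratum combination described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each state supplies an estimated proportion of degraded stream miles, the standard error of that estimate, and the total stream miles in its sample frame. The function name, the dictionary keys, and all numbers are hypothetical and are not taken from this report.

```python
from math import sqrt


def combine_strata(strata):
    """Combine per-stratum estimates into one regional estimate.

    Each item in `strata` is a dict with (hypothetical) keys:
      miles : total stream miles in the stratum's sample frame
      p     : estimated proportion of degraded stream miles
      se    : standard error of that proportion
    Returns (regional proportion, regional standard error).
    """
    total_miles = sum(s["miles"] for s in strata)
    p_region = 0.0
    var_region = 0.0
    for s in strata:
        w = s["miles"] / total_miles           # stratum weight W_h
        p_region += w * s["p"]                 # sum of W_h * p_h
        var_region += (w ** 2) * s["se"] ** 2  # sum of W_h^2 * Var(p_h)
    return p_region, sqrt(var_region)


# Made-up inputs for two strata, for illustration only.
example = [
    {"miles": 1000.0, "p": 0.50, "se": 0.03},
    {"miles": 3000.0, "p": 0.30, "se": 0.04},
]
p, se = combine_strata(example)
```

The regional estimate is the stream-mile-weighted average of the stratum estimates. The variances add with squared weights on the assumption that the state surveys are independent of one another.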

-------
     Little previous work has been done to integrate results of state stream assessment
     programs over larger regions because few programs have conducted probabilistic
     surveys that meet the requirements for integration described above. However,
     Maryland, Virginia, and West Virginia each conducted probability-based surveys
     of wadeable streams that used benthic macroinvertebrate multimetric indices
     to evaluate stream condition statewide over 4- or 5-year periods from 1997 to
     2004. Here we assess the feasibility of integrating results from these programs for
     the non-Coastal Plain (Piedmont and Highland) regions of these states, and we
     describe the steps necessary to achieve a regional assessment of stream condition.
     Specifically, we

         • Compared the sample frames and survey designs of the
          three states;
         • Compared the benthic macroinvertebrate sampling
          methods of the three states;
         • Compared the construction and scoring of the multimetric
          indices of the three states (by applying all three indices to
          Maryland 2000-2004 data);
         • Compared the criteria used by the three states to select
          reference sites and the distribution of index scores for each
          state's reference sites (again, applied to Maryland data);
         • Combined the site results from all three states using
          comparable index scores to produce a regional (non-
          Coastal Plain) cumulative distribution of scores; and,
         • Applied example thresholds of degradation to estimate the
          regional proportion of stream miles rated as 'good,' 'fair,'
          or 'poor' (based on different assumptions about reference
          condition).
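
The last two steps in the list can be illustrated with a short Python sketch. It assumes each sampled site carries a weight equal to the stream length it represents (the inverse of its selection probability). The function names, scores, weights, and threshold below are hypothetical.

```python
def weighted_cdf(scores, weights):
    """Return (score, cumulative proportion of stream length) pairs, sorted by score."""
    total = float(sum(weights))
    running = 0.0
    cdf = []
    for score, weight in sorted(zip(scores, weights)):
        running += weight
        cdf.append((score, running / total))
    return cdf


def proportion_below(scores, weights, threshold):
    """Estimated proportion of stream length with an index score below `threshold`."""
    total = float(sum(weights))
    below = sum(w for s, w in zip(scores, weights) if s < threshold)
    return below / total


# Made-up index scores and site weights (stream miles represented by each site).
scores = [40.0, 55.0, 70.0, 85.0]
weights = [2.0, 1.0, 1.0, 4.0]

cdf = weighted_cdf(scores, weights)
degraded = proportion_below(scores, weights, threshold=60.0)
```

Reading the cumulative distribution at a chosen threshold gives the proportion of stream length rated below that threshold, so different thresholds can be compared without recomputing the site scores.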

-------
2. Summary of State Programs

The relevant components of the stream assessment programs of Maryland,
Virginia, and West Virginia are described below. Note that this section describes
the three state programs when this study was initiated. All three programs continue
to evolve and have already incorporated refinements that are not captured here.
Therefore, this study should be viewed as a demonstration of integration principles
and not a critique of individual state programs. A comparison of components from
each state program is provided for (1) sample frames and survey designs (Table 1),
(2) benthic macroinvertebrate sampling methods (Table 2), and (3) metrics used in
each state's multimetric index (Table 3).
 Table 1. Comparison of sample frames and survey designs used by stream
 monitoring programs in Maryland, Virginia, and West Virginia.

 Program          Maryland                     Virginia                     West Virginia
 Components       (2000-2004)                  (2000-2003)                  (1997-2001)
 ---------------  ---------------------------  ---------------------------  ---------------------------
 Sample Frame     U.S. Geological Survey       U.S. EPA RF3 reach file      U.S. EPA RF3 reach file
                  1:100,000 stream network;    1:100,000 stream network;    1:100,000 stream network;
                  1st through 4th order¹       1st through 6th order        1st through 5th order
                  streams                      streams                      streams

 Sample Unit      75-m reach                   30 to 400-m reaches          100-m reach

 Survey Design    Probabilistic                Probabilistic                Probabilistic
                  (lattice sampling)           (GRTS² design)               (GRTS design)

 Survey Density   Target minimum of 10 sites   Target 60 sites per stratum  150 sites per year;
                  per PSU³ plus 3-11           over 5 years with a minimum  750 sites statewide
                  additional samples for       of 50 sites per year;
                  PSUs with more than 100      250 sites statewide
                  stream miles;
                  approximately 300 sites
                  per year; 1,500 sites
                  statewide

 Revisits         Same 25 sentinel sites       One site per region chosen   No revisits
                  sampled each year            randomly from all sites
                                               sampled to date and
                                               revisited per year

 ¹ All states use Strahler (1957) stream order classifications
 ² Generalized random tessellation stratified (Stevens 1997, Stevens and Olsen 2004)
 ³ Primary sampling units

-------
 Table 2. Comparison of benthic sampling methods used by stream monitoring
 programs in Maryland, Virginia, and West Virginia.

                    MD                           VA (SCI non-Coastal Plain)   WV (SCI)
 -----------------  ---------------------------  ---------------------------  ---------------------------
 Benthic Field      600-µm, 0.3-m D-frame net;   600-µm, 0.3-m D-frame net    600-µm, 0.3-m D-frame net
 Sampling           20 jabs of 1 ft²             or 2 m² kick net¹;           or 2 m² kick net;
                                                 approximately 2 m² total     approximately 2 m² total,
                                                                              eight 0.25-m² individual
                                                                              kicks

 Field QA           Duplicate samples at 12 to   Duplicated 10% of            Duplicate samples at 12 to
                    15 sites per year (7% of     probabilistic sites during   15 sites per year
                    all sites)                   2001-2004²

 Benthic Habitat    Multi-habitat; primarily     Single or multi-habitat;     Single or multi-habitat
 Sampled            riffles but also rootwads/   one to three kicks per
                    woody debris/leaf packs,     riffle, multi-habitat
                    macrophytes, and undercut    samples when riffles were
                    banks                        rare; samples from
                                                 downstream half of reach

 Index Period       Approximately March 1 to     Approximately March 1 to     Mid-April through October
 for Benthos        May 1                        May 1

 Laboratory         Random subsample of          Random subsample of minimum  Random subsample of
 Methods            approximately 100 organisms  of 100 organisms or 4        approximately 200 organisms
                    based on grid cells;         quadrats based on grid       based on grid cells;
                    identification to genus or   cells (2-inch square grids   identification to family
                    lowest practical taxon       in a 50-quadrat box);        level
                    (chironomids/oligochaetes    identification to family
                    to family)                   level

 Laboratory QA      Resample and identification  No resampling                Resample and identification
                    of every 20th sample (7%)                                 of 5% of samples

 Benthic Indicator  Benthic Index of Biotic      Virginia Stream Condition    West Virginia Stream
                    Integrity on 1 to 5 scale    Index on 0-100 point scale   Condition Index on 0-100
                                                                              point scale

 ¹ Depending on the amount of riffle habitat available (Barbour et al. 1999)
 ² Jason Hill, Virginia Department of Environmental Quality, personal communication

-------
Table 3. Metrics used by each state to create benthic macroinvertebrate IBI or SCI
scores. Lines beginning with a "%" symbol indicate the percentage of that taxon in
the total sample. Complete descriptions of the Maryland B-IBIs, Virginia SCI, and
West Virginia SCI are reported in Southerland et al. (2005), Burton and Gerritsen
(2003), and Tetra Tech, Inc. (2000), respectively.

Metric types: Taxonomic Richness, Taxonomic Composition, Tolerance

MD (B-IBIs for Highlands and Eastern Piedmont)¹
    Number of taxa (genera)
    Number of EPT² taxa
    Number of Ephemeropteran taxa
    % Chironomidae
    % Clingers
    % Tanytarsini
    % Scrapers
    % Swimmers
    % Diptera
    % Intolerant to urban stressors
    % Tolerant taxa
    % Collectors

VA (SCI for non-Coastal Plain)
    Number of taxa (families)
    Number of EPT families
    % Ephemeroptera
    % Plecoptera + Trichoptera - Hydropsychidae
    % Chironomidae
    % Top 2 dominant taxa
    HBI³
    % Scrapers

WV (SCI)
    Number of taxa (families)
    Number of EPT families
    % EPT
    % Chironomidae
    % Top 2 dominant taxa
    HBI

  ¹ List of 10 metrics includes those found in either the Highlands B-IBI (8 metrics) or Eastern
    Piedmont B-IBI (6 metrics)

  ² Ephemeropteran, Plecopteran, or Trichopteran

  ³ The Hilsenhoff Biotic Index (HBI) was defined as abundance-weighted average tolerance of
    assemblage of organisms (family taxonomic level)
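
The HBI definition in the footnote above (an abundance-weighted average tolerance) can be written out as a short Python sketch. The function name is hypothetical, and the tolerance values and counts in the example are placeholders for illustration, not the values used by either state.

```python
def hilsenhoff_biotic_index(counts, tolerance):
    """Abundance-weighted mean tolerance value of the families in one sample.

    counts    : {family: number of organisms in the subsample}
    tolerance : {family: tolerance value}
    Families without a tolerance value are left out of the average (an assumption).
    """
    scored = {fam: n for fam, n in counts.items() if fam in tolerance}
    total = sum(scored.values())
    if total == 0:
        raise ValueError("no organisms with a tolerance value")
    return sum(n * tolerance[fam] for fam, n in scored.items()) / total


# Placeholder tolerance values and counts, for illustration only.
tolerance = {"Perlidae": 1, "Hydropsychidae": 4, "Chironomidae": 6}
counts = {"Perlidae": 10, "Hydropsychidae": 30, "Chironomidae": 60}

hbi = hilsenhoff_biotic_index(counts, tolerance)
```

Because the index is a weighted mean, a sample dominated by tolerant families is pulled toward the high end of the tolerance scale, which is why HBI is one of the metrics expected to increase with degradation.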

-------
     2.1  Maryland

     The Maryland Biological Stream Survey (MBSS) is a long-term program
     conducted by the Maryland Department of Natural Resources to assess the
     condition of the state's nontidal, freshwater streams (Klauda et al. 1998). Benthic
     macroinvertebrates are collected during spring each year as part of a larger
     sampling effort and used to calculate a Benthic Index of Biotic Integrity (B-IBI)
     for Maryland streams. The MBSS has completed two rounds of statewide
     sampling; the first  was conducted in 1995-1997 and the second in 2000-2004. In
     this study, we used data collected at 596 randomly selected, non-Coastal Plain
     sites sampled in 2000-2004 to assess stream condition. We also used data from
     144 reference sites sampled in 1995-2004.

     Sampling sites for  the second round of the MBSS were selected from a 1:100,000-
     scale stream network using a lattice sampling design (see Cochran 1977) to select
     watersheds randomly in time and space. Eighty-four primary sampling units
     (PSUs) consisting  of one or more Maryland 8-digit watersheds were sampled
     over 5 years (Roth et al. 2005). It is worth noting that Maryland's 138 8-digit
     watersheds, averaging 75 mi², are different from the 20 U.S. Geological Survey
     (USGS) 8-digit cataloging units in Maryland, which average 500 mi² (Roth et al.
     2002). In principle, the survey design supports the use of the Sen-Yates-Grundy
     variance estimator of statewide mean stream condition because the selection
     probability of any stream segment, and the joint selection probability of any pair
     of segments for the entire round, are known and greater than zero. Seventeen PSUs
     were sampled per year, and two randomly selected PSUs were sampled twice
     during the 5 years. Each PSU had a target minimum of 10 sites sampled, with
     an additional 3 to 11 sample sites allocated for PSUs with more than 100 stream
     miles. Streams were stratified into two groups within a PSU, 1st- or 2nd-order
     streams (Strahler 1957) and 3rd- or 4th-order streams, unless a stratum would have
     contained less than 10% of the stream miles in the PSU. In that case, sites were
     selected within the PSU using simple random sampling. The samples within each
     PSU were allocated proportionally to stream lengths in the strata, ensuring equal
     selection probability for all stream segments in the PSU. While sampling random
     PSUs twice in the lattice design provides temporal information, we pooled the
     samples for each PSU and analyzed the Maryland data using standard stratified
     random estimators (Cochran  1977) to simplify this analysis (see Berger 2004).
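
The within-PSU allocation rule described above can be sketched as follows. This is an illustration only: the function name and stratum labels are hypothetical, and the rounding rule is an assumption, since the text does not say how fractional allocations were resolved.

```python
def allocate_sites(n_sites, miles_low_order, miles_high_order, min_share=0.10):
    """Split a PSU's sites between 1st-2nd order and 3rd-4th order streams.

    Sites are allocated in proportion to stream miles. If either stratum holds
    less than `min_share` of the PSU's stream miles, the PSU is not stratified
    and all sites are drawn by simple random sampling.
    """
    total = miles_low_order + miles_high_order
    if total <= 0:
        raise ValueError("PSU has no stream miles")
    if min(miles_low_order, miles_high_order) / total < min_share:
        return {"unstratified": n_sites}
    # Assumed rounding: round the low-order share, give the remainder to high order.
    n_low = round(n_sites * miles_low_order / total)
    return {"low_order": n_low, "high_order": n_sites - n_low}


# Made-up stream lengths for two PSUs.
stratified = allocate_sites(10, miles_low_order=80.0, miles_high_order=20.0)
pooled = allocate_sites(10, miles_low_order=95.0, miles_high_order=5.0)
```

Proportional allocation gives every stream mile in the PSU the same chance of selection, which is what allows the pooled samples to be analyzed with standard stratified estimators.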

     Benthic macroinvertebrates were collected within 75-m sample segments during
     spring each year using 600-µm D-frame nets (Roth et al. 2005). Twenty kick
     net samples were taken at each site from riffle, rootwad, and leaf-pack habitats
     in approximate proportion to their abundance in the stream segment. For each
     segment, a random subsample of 100 organisms was identified to genus or
     the lowest practical taxon level. These data were used to calculate a B-IBI for
     Maryland streams.

-------
The B-IBI rates streams on a scale of 1 to 5 where scores of 4-5 represent good
condition, 3-3.9 represent fair, 2-2.9 represent poor, and 1-1.9 represent very poor.
In this study we used both the Highlands and Eastern Piedmont MBSS IBIs to
cover the non-Coastal Plain region of interest (Southerland et al. 2005). These
B-IBIs include 8 and 6 metrics, respectively, related to the number or percentage
of different invertebrate taxa in a sample (Table 3). Each metric was rated as a 1,
3, or 5 depending on how it compared to the distribution of scores from a set of
reference sites; these reference sites were selected from the entire MBSS dataset
using criteria for sites minimally affected by human activities and representative
of Maryland non-Coastal Plain streams. Each metric was scored as a 1 if its value
was less than the 10th percentile of the reference values, as a 3 if it was in the 10th
to 49th percentiles, and as a 5 if it was equal to or greater than the 50th percentile.
Metrics that were expected to increase with stream degradation were scored
conversely. The B-IBIs were calculated as the average  of the metric  scores. A
value of 3 is also used as the threshold of degradation for the B-IBI; a B-IBI of 3
corresponds to the 9th percentile of Maryland reference sites.
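
The metric scoring rule above can be expressed as a short Python sketch. The function names are hypothetical, the percentile convention (linear interpolation) is an assumption, and the handling of metrics that increase with degradation is an assumed mirror image of the stated rule, since the text says only that they were "scored conversely". The reference values are made up.

```python
def percentile(values, pct):
    """Percentile by linear interpolation between sorted values (assumed convention)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values supplied")
    pos = (len(ordered) - 1) * pct / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def score_metric(value, reference_values, increases_with_degradation=False):
    """Score one metric as 1, 3, or 5 against the reference-site distribution."""
    median = percentile(reference_values, 50)
    if increases_with_degradation:
        # Assumed mirror image: above the 90th percentile scores 1,
        # above the median scores 3, otherwise 5.
        if value > percentile(reference_values, 90):
            return 1
        return 3 if value > median else 5
    if value < percentile(reference_values, 10):
        return 1
    return 3 if value < median else 5


def b_ibi(metric_scores):
    """B-IBI as the average of the 1/3/5 metric scores."""
    return sum(metric_scores) / len(metric_scores)


def rating(index_score):
    """Condition classes on the 1 to 5 scale described in the text."""
    if index_score >= 4:
        return "good"
    if index_score >= 3:
        return "fair"
    if index_score >= 2:
        return "poor"
    return "very poor"


# Made-up reference values for a single metric (21 sites with values 1..21).
reference = list(range(1, 22))
low, mid, high = (score_metric(v, reference) for v in (2, 5, 11))
index_score = b_ibi([1, 3, 5, 5])
```

Scoring each metric against reference percentiles, rather than against the range seen at all sites, is the main structural difference between the B-IBI and the two SCIs.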

2.2  Virginia

The Virginia Department of Environmental Quality (VDEQ) biomonitoring and
assessment program samples fixed and randomly  chosen monitoring sites to
meet state and federal water quality monitoring requirements. We analyzed 180
randomly chosen 1st- to 6th-order, non-Coastal Plain streams sampled in 2001-2003
as part of the random portion of the survey known as ProbMon (VDEQ 2003).

The survey design used to select Virginia streams for sampling was a generalized
random tessellation stratified (GRTS) design (Stevens 1997, Stevens and Olsen
2004) chosen to ensure that a spatially balanced selection of streams was
achieved. This design selects sites randomly by Strahler stream order but assigns
a greater probability to selecting higher order streams. This ensures that higher
order streams are adequately represented in the sample despite constituting a
lower proportion of the total stream miles in the state. A target of 50 stream sites
was selected from throughout the state each year using EPA Reach File 3 (RF3)
overlaid onto a 1:100,000-scale topographic map.

Benthic macroinvertebrates were collected from stream reaches following EPA
Rapid Bioassessment Protocols (Barbour et al. 1999). Sample reaches had lengths
30 times the stream width to a maximum of 100 m for 3rd- or lower-order streams
and 400 m for 4th- or higher-order streams. Invertebrates were collected using
600-µm kick nets and sampling approximately 2 m² of riffle substrate. If no riffle
habitat was available, multi-habitat samples were collected with 600-µm D-frame
nets following Barbour et al. (1999). Either 100 organisms or 16 in² (103 cm²)
of sample material spread out onto a 200-in² (1,290-cm²) sampling tray were
enumerated and identified to family.

     Invertebrate data were used to calculate the Virginia SCI as described in Burton
     and Gerritsen (2003). The index consisted of 8 metrics related to the abundance,
     species composition, and environmental tolerance of invertebrates collected
     (Table 3). Each metric was standardized to a 100-point scale where 0 represented
     the worst condition observed, and 100 the best. A score corresponded to its rank
     between the 5th and 95th percentiles in the distribution of all data collected (not just
     the reference sites). Extreme values below the 5th percentile or greater than the 95th
     percentile were assigned 0 or 100, respectively (Figure 1). This practice eliminates
     the influence of outliers and reduces the effect that different datasets in the future
     will have on setting the SCI scores. The SCI was calculated as the average of the
     8 metrics. The SCI value at the 10th percentile of reference sites (adjusted downward
     5 SCI points for the variance in duplicate samples) is used as the threshold of
     degradation.
     Figure 1. Calculation of SCI scores. All sites (not just reference) are used to determine
     the range of metric scores. Metric scores are converted to 0-100 scale. For Virginia, the
     5th percentile of worst conditions represented 0, and the 95th percentile represented 100.
     For West Virginia, the procedure was the same except that the worst value represented
     0 rather than the 5th percentile. The SCI score was calculated as the mean of the
     standardized metrics.
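
The standardization summarized in Figure 1 can be sketched in Python as below. This is a hedged illustration rather than either state's procedure: it assumes a linear rescaling between the two bounding percentiles with clipping at 0 and 100, and it assumes that metrics which increase with degradation are handled by mirroring the bounds. The function names and data are hypothetical; the published methods are in Burton and Gerritsen (2003) and Tetra Tech, Inc. (2000).

```python
def percentile(values, pct):
    """Percentile by linear interpolation between sorted values (assumed convention)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values supplied")
    pos = (len(ordered) - 1) * pct / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def standardize_metric(value, all_site_values, low_pct=5, high_pct=95,
                       increases_with_degradation=False):
    """Rescale one metric value to 0-100 using the range observed at all sites.

    low_pct=5 follows the Virginia description; low_pct=0 follows the
    West Virginia description (worst observed value represents 0).
    """
    if increases_with_degradation:
        # Assumed mirroring: the high end of the distribution is the worst condition.
        worst = percentile(all_site_values, 100 - low_pct)
        best = percentile(all_site_values, 100 - high_pct)
    else:
        worst = percentile(all_site_values, low_pct)
        best = percentile(all_site_values, high_pct)
    if best == worst:
        raise ValueError("metric has no range across sites")
    scaled = 100.0 * (value - worst) / (best - worst)
    return max(0.0, min(100.0, scaled))  # clip extreme values to 0 or 100


def stream_condition_index(standardized_metrics):
    """SCI as the mean of the standardized metric scores."""
    return sum(standardized_metrics) / len(standardized_metrics)


# Made-up metric values observed at 101 sites (0, 1, ..., 100).
all_sites = list(range(0, 101))
va_mid = standardize_metric(50, all_sites)
va_low = standardize_metric(2, all_sites)
wv_mid = standardize_metric(50, all_sites, low_pct=0)
reversed_mid = standardize_metric(50, all_sites, increases_with_degradation=True)
```

With the same raw value, the two lower bounds give slightly different standardized scores, which is the kind of difference the comparison of the two SCIs on Maryland data is meant to quantify.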

-------
2.3 West Virginia

The methods applied by the Watershed Assessment Program of the West Virginia
Department of Environmental Protection (WVDEP) for assessing stream
condition were similar to those used by Virginia's program. Watersheds were
sampled on a rotating basis, completing a statewide survey during 1997-2001.
West Virginia used a GRTS survey design to sample streams, but sampled by
USGS 8-digit basins. The year of sampling was a stratum in this design and five
to seven basins were sampled each year. We used data collected at 716 randomly
selected sites from 1st- through 5th-order streams (WVDEP 2005). Invertebrate
data were collected according to EPA Rapid Bioassessment Protocols using
procedures that differed from those of Virginia by not including multihabitat
samples and by having a longer index period (April to October), as described by
Tetra Tech, Inc. (2000). Benthic macroinvertebrate data were used to calculate
the West Virginia SCI, which is similar to the Virginia SCI except that the index
consisted of two fewer metrics (Table 3). Note also that West Virginia scored
metrics on a scale of 0-100 based on the 0 to 95th percentiles (Figure 1) rather
than the 5th to 95th percentiles used by Virginia. The SCI value at the 5th percentile
of reference sites (adjusted downward 7.4 SCI points for the variance in duplicate
samples) is used as the threshold of degradation.
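
The two threshold rules (Virginia: 10th percentile of reference sites minus 5 SCI points; West Virginia: 5th percentile minus 7.4 SCI points) can be compared with a small Python sketch. The function name and the reference scores are hypothetical, and the percentile convention (the "inclusive" method of Python's statistics.quantiles) is an assumption.

```python
from statistics import quantiles


def degradation_threshold(reference_scores, pct, duplicate_adjustment):
    """SCI value at the `pct`th percentile of reference scores, lowered by the
    adjustment for variance in duplicate samples. `pct` must be an integer 1-99."""
    cut_points = quantiles(reference_scores, n=100, method="inclusive")
    return cut_points[pct - 1] - duplicate_adjustment


# Made-up SCI scores at 21 reference sites (60, 62, ..., 100).
reference_scores = list(range(60, 101, 2))

virginia_style = degradation_threshold(reference_scores, pct=10,
                                       duplicate_adjustment=5.0)
west_virginia_style = degradation_threshold(reference_scores, pct=5,
                                            duplicate_adjustment=7.4)
```

Applied to the same set of reference scores, the West Virginia rule gives the lower threshold because it uses both a lower percentile and a larger adjustment.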

-------
     3. Comparison of Sample Frames, Survey Designs, and Data Collection

     The sample frames, survey designs, and data collection used by the three state
     programs were somewhat different, but comparable (Table 1). Most importantly,
     all three states used probabilistic survey designs that supported unbiased
     estimates of means, totals, and proportions statewide with quantifiable precision.
     Differences in sample frames were small. Maryland sampled no 5th- or 6th-order
     streams, but these higher order streams constituted small fractions of the total
     samples in Virginia (4%) and West Virginia (5%). In addition, the sample frames
     of all three states were limited to wadeable streams on a 1:100,000-scale stream
     network, i.e., even the highest order streams sampled were wadeable and therefore
     not unusually large.

     The states used different stream segment lengths as sample units for field data
     collections. This difference was also expected to have little or no effect on
     comparability because the stream segment lengths sampled by Maryland (75 m)
     and West Virginia (100 m) were very similar, as were segments in the lower order
     streams in Virginia (based on 30 times stream width to a maximum of 100  m on
     1st- through 3rd-order streams). The vast majority of streams sampled in Virginia
     were of lower orders. Overall, the differences in sample frames of the three states
     were unlikely to cause large differences in assessment results, and did not preclude
     the calculation of an integrated estimate of stream condition.

     Benthic sampling methods were likewise similar among the three states (Table 2).
     All three states used D-frame nets to sample, and each focused on riffle habitat where
     the greatest diversity of invertebrates is expected (Barbour et al.  1999). Laboratory
     sorting procedures were generally similar, except that (1) Maryland sorted to
     a lower taxonomic level (genus) than the other states (which sorted to family)
     and (2) West Virginia sorted a larger sample (200 rather than 100 organisms). In
     an earlier Maryland study, Vølstad et al. (2003) determined that 200-organism
     subsamples improved the precision of mean B-IBI scores only marginally over
     100-organism subsamples. West Virginia sampled during the summer in addition
     to spring but did not observe appreciable variation between the two seasons at the
     family level (Tetra Tech, Inc. 2000). All programs included documented quality
     assurance procedures.

     Based on comparability among sample frames,  survey designs, and data
     collection, the sampling programs of the three states were suitable for conducting
     an integrated assessment of stream condition.

-------
4. Comparison of Indicators and Reference Conditions

As described earlier, the stream condition indices used by the three states were
developed independently. To ultimately integrate the results from the states,
we had to (1) obtain a common indicator of stream condition and (2) evaluate
its scores in the context of comparable reference conditions. First, we had to
determine if the SCIs from Virginia and West Virginia were directly comparable,
so that only the Maryland B-IBI (which was most dissimilar) needed to be
substituted  by an SCI. This would allow us to use the regional estimates already
calculated for each state in the final integration. To compare the SCIs, we needed
to use a single dataset, so we used Maryland data from 2000-2004 since they were
readily available to the investigators. We evaluated the two SCIs by calculating
both the Virginia and West Virginia SCI  scores for Maryland data, and comparing
them to each other. Specifically, the Maryland benthic macroinvertebrate data were
reduced from genus to family level identifications for each site because both SCIs
were based on family-level identification. The SCI scores were then calculated
both using the Virginia SCI (Burton and Gerritsen 2003) and the West Virginia
SCI  (Tetra Tech, Inc.  2000).
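The reduction from genus-level to family-level identifications described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: the lookup table, the function name `collapse_to_family`, and the sample counts are all hypothetical, and a real analysis would draw on a complete taxonomic reference table.

```python
from collections import Counter

# Hypothetical genus -> family lookup (illustration only); a real analysis
# would use a full taxonomic reference table.
GENUS_TO_FAMILY = {
    "Baetis": "Baetidae",
    "Acentrella": "Baetidae",
    "Hydropsyche": "Hydropsychidae",
}


def collapse_to_family(genus_counts, lookup):
    """Sum organism counts by family for one site's benthic sample."""
    family_counts = Counter()
    for genus, count in genus_counts.items():
        family_counts[lookup[genus]] += count
    return dict(family_counts)


# Hypothetical genus-level counts for a single site.
site_sample = {"Baetis": 12, "Acentrella": 5, "Hydropsyche": 30}
family_sample = collapse_to_family(site_sample, GENUS_TO_FAMILY)
```

Family-level metrics for either SCI would then be computed from `family_sample` rather than from the genus-level counts.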

Second, we had to adjust for the different reference conditions used by the three
states, which we did by applying each state's reference criteria to Maryland
data. Each state used  a combination of criteria that measured water chemistry,
instream habitat, and land use to select reference sites that represented streams
with minimal human  disturbance (Tables 4 and 5). All three states used 13
selection criteria, although the specific criteria used varied by state. Maryland
was  the only state to use percent of forested cover in the watershed or remoteness
from human development explicitly, while Virginia and West Virginia used more
instream habitat variables than Maryland. Virginia was the only state to use total
phosphorus. West Virginia used conductivity and fecal coliform as secondary
criteria (i.e., best professional judgment was used to confirm their importance),
and applied best professional judgment on a site-by-site basis to identify sites with
additional human  disturbance.

It is important to remember that both the B-IBI and the SCIs rate stream condition
relative to reference conditions, but in different ways. In the Maryland B-IBI,
each metric is scored relative to the distribution of reference sites and the average
of all metric values is the B-IBI score (see Southerland et al. 2005).  Because a
metric value of 3 denotes departure from reference, a B-IBI of less than 3 indicates
degradation. Thirteen of the 144 Maryland reference sites have scores below 3,
which represents the 9th percentile of candidate reference sites. SCI scores are
calculated by Virginia and West Virginia based on the distribution of component
metric values at all sites sampled (not on reference sites alone as is done for the
Maryland B-IBI).  The threshold for rating streams as  degraded is then applied to
the SCI scores as a percentile of SCI scores at reference sites (10th percentile of

Table 4. Criteria used by Maryland, Virginia, and West Virginia to select reference sites for
multimetric indices of biological condition based on benthic macroinvertebrates. As part of this
study, Stream Condition Indices (SCIs) were calculated for Maryland data using the Virginia
and West Virginia methods. Where Maryland data included Virginia and West Virginia reference
criteria variables, these criteria were applied directly; in  other cases, substitute Maryland
variables were used as reference criteria (see two right  columns); note that a variable may have
substituted for more than one criterion.

Chemical Criteria

| Criterion | Maryland* | Virginia* | West Virginia* | MD substitute for VA SCI | MD substitute for WV SCI |
|---|---|---|---|---|---|
| pH | > 6 | 6-9 | 6-9 | 6-9 | 6-9 |
| DO | > 4 ppm | > 6 mg/L | > 5 mg/L | > 6 mg/L | > 5 mg/L |
| ANC | > 50 µeq/L | Not used | Not used | - | - |
| Nitrate | < 4.2 mg/L | Not used | Not used | - | - |
| Conductivity | Not used | < 250 µmhos/cm | < 500 µmhos/cm [secondary criterion] | < 250 µmhos/cm | < 500 µmhos/cm [applied directly even though secondary criterion for WV] |
| Fecal Coliform | Not used | Not used | < 800 colonies/100 mL [secondary criterion] | | < 5% urban [based on strong relationship with urban land use] |
| TN | Not used | < 1.5 mg/L | Not used | < 1.5 mg/L | |
| TP | Not used | < 0.05 mg/L | Not used | < 0.05 mg/L | |

   * For more information on each state's metrics and criteria, see:

    Maryland: Paul et al. 2002; Paul et al. 2003; Roth et al. 2005

    Virginia: Burton 2003; VDEQ 2003, 2005

    West Virginia: Tetra Tech 2000; West Virginia Department of Environmental Protection 2006

Table 4. Continued

Habitat and Land Use Criteria

| Criterion | Maryland | Virginia | West Virginia | MD substitute for VA SCI | MD substitute for WV SCI |
|---|---|---|---|---|---|
| % Urban Development | < 5 | < 5 | Not used | < 5 | - |
| % Forested | > 35 | Not used | Not used | - | - |
| Modified Remoteness Rating | > 11 | Not used | Not used | - | - |
| Aesthetic/Trash Rating | > 11 | Not used | Not used | - | - |
| Instream Habitat Rating | > 11 | Not used | Not used | - | - |
| Anthropogenic Activities/Disturbances | Not used | BPJ¹ | BPJ¹ | Aesthetics rating > 11; < 5% urban | Aesthetics rating > 11; < 5% urban |
| Violations of State WQ Standards | Not used | Not used | No violations | - | Nitrate < 4.2 mg/L, ANC > 50 µeq/L, and < 5% urban |
| Non-point Pollution | Not used | Not used | None obvious | - | Nitrate < 4.2 mg/L, ANC > 50 µeq/L, and < 5% urban |
| Epifaunal Substrate Score | Not used | > 11 | > 11 | Instream habitat score > 11 | Instream habitat score > 11 |
| Channel Alteration Score | Not used | > 11 | > 11 | Channel alteration score > 11 and no channelization | Channel alteration score > 11 and no channelization |
| Sediment Deposition Score | Not used | > 11 | > 11 | Excluded sites with extensive bar formation | Excluded sites with extensive bar formation |
| Bank Disruptive Pressure Score | Not used | > 11 | > 6 | Converted bank stability score (> 11)² | Converted bank stability score (> 11) |
| Riparian Vegetated Buffer Width Score/m | > 30 m | > 11 | > 6 | > 11 (MD meters converted to 0-20 score) | > 6 (MD meters converted to 0-20 score) |
| Total Habitat Score | Not used | > 140 | > 130 | > 108 (scored as described in Table 5) | > 108 (scored as described in Table 5) |
| Point Source Discharge | No effluent discharge | Not used | Not used [except as part of BPJ] | - | - |
| Other | No channelization; no storm drains | Not used | Not used [except as part of BPJ] | - | - |

   ¹ Best professional judgment (BPJ) decisions by West Virginia were made in the field based on visual assessment
     informed by secondary criteria. Virginia used a less formal BPJ to eliminate sites with anthropogenic
     disturbance. Urban land use < 5% and Maryland aesthetics rating only partially capture this BPJ. See text for
     additional discussion.
   ² Paul et al. 2002

     Table 5. Total Habitat score calculation used by Virginia and West Virginia for high and
     low gradient streams, along with the surrogate applied to Maryland. Substitutions for
     variables not measured in Maryland are described in the far right column.

| Habitat variable | MD High and Low Gradient Streams | VA/WV High Gradient Streams | VA/WV Low Gradient Streams | MD substitution |
|---|---|---|---|---|
| Epifaunal Substrate Score | 0-20 score | 0-20 score | 0-20 score | Same |
| Sediment Deposition Score | Not used | 0-20 score | 0-20 score | Excluded MBSS sites with extensive bar formation |
| Channel Flow | Not used | 0-20 score | 0-20 score | - |
| Channel Alteration Score | 0-20 score | 0-20 score | 0-20 score | Used a Tetra Tech conversion (0-20) |
| Bank Disruptive Pressure Score | 0-20 score | 0-20 score | 0-20 score | Used Tetra Tech values to create scores based on SCI method |
| Riparian Veg (buffer) Zone Width Score | 0-20 score | 0-20 score | 0-20 score | Created a score based on SCI scoring method |
| Vegetation Protection | Not used | 0-20 score | 0-20 score | - |
| Embeddedness | 0-20 score | 0-20 score | Not used | Created a score based on SCI scoring method |
| Velocity/Depth | 0-20 score | 0-20 score | Not used | - |
| Frequency of Riffles | 0-20 score | 0-20 score | Not used | Used MBSS riffle quality score (0-20) |
| Pool Substrate Characterization | 0-20 score (single score shared with Pool Variability) | Not used | 0-20 score | Used MBSS pool quality score (0-20) in place of both pool substrate and pool variability |
| Pool Variability | (shared with Pool Substrate Characterization) | Not used | 0-20 score | (see Pool Substrate Characterization) |
| Channel Sinuosity | 0-20 score | Not used | 0-20 score | Created a score based on SCI scoring method |
| Total Possible Points | 180 | 200 | 200 | - |

Virginia and 5th percentile for West Virginia, both adjusted downward to account
for variability in duplicate samples) that also denotes departure from reference
condition. In this way, the threshold of degradation for SCIs is applied to a
specific point on the distribution of reference sites. For this reason, differences
in reference criteria did not affect the SCI scores calculated by Virginia and West
Virginia, allowing us to address comparability of reference condition after the SCI
scores were calculated.

The Maryland data included sample values for all  chemical criteria used by
Virginia and West Virginia, but did not include all  habitat variables used by these
states as reference criteria. Several variables related to habitat were approximated
using similar characteristics collected by the MBSS. We could not construct a
surrogate for the BPJ decisions made by West Virginia (by definition), so they
were not included in the reference criteria for these analyses (resulting in more
reference sites being selected). We will return to the issue of including BPJ in
reference criteria later in this report. As described  above, rating stream condition
with the SCIs is based on applying threshold scores that are percentiles of
reference sites, e.g., the boundary between "not degraded" and "degraded" stream
conditions. A standard percentile may be used to set this boundary (e.g., 10th
percentile for Virginia and 5th percentile for West Virginia) or a set index score
(based on average of reference-based metrics) that corresponds to a percentile
of reference may be used (e.g., 9th percentile for Maryland that corresponds to a
B-IBI of 3). All three states also use a confidence interval around the threshold
value for degradation at individual sites to make impairment decisions. These
confidence intervals are based on the variability in values from duplicate samples
within sites (i.e., the threshold for designating degradation is 5 to 8% lower
than the score corresponding to the percentiles listed above). For purposes of
illustration, we have used the exact percentile (e.g., 10th) as the threshold for
designating streams as not degraded (those with scores above the percentile) and
degraded (those with scores below the percentile), without accounting for the
uncertainty associated with sample variability within or between sites.
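The exact-percentile rating rule described above can be sketched as follows. This is an illustration under stated assumptions: the linear-interpolation percentile is one common convention and is not specified in this report, a score exactly equal to the threshold is treated here as not degraded, and the function names and all scores are hypothetical.

```python
def percentile(values, pct):
    """Percentile by linear interpolation between order statistics.

    The interpolation convention is an assumption; the report does not
    state which percentile definition was used.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile() needs at least one value")
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def rate_sites(site_scores, reference_scores, pct):
    """Rate each site against the pct-th percentile of reference scores."""
    threshold = percentile(reference_scores, pct)
    ratings = {
        site: ("degraded" if score < threshold else "not degraded")
        for site, score in site_scores.items()
    }
    return threshold, ratings


# Hypothetical reference-site and assessed-site SCI scores.
reference_scores = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
site_scores = {"A": 48.0, "B": 72.5, "C": 54.9}
threshold, ratings = rate_sites(site_scores, reference_scores, 10)
```

As in the text, no adjustment for duplicate-sample variability is applied; a confidence-interval adjustment would lower `threshold` before the comparison.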

We applied each state's reference criteria to the Maryland data,  resulting in three
different sets of reference sites for comparison (Figure 2). The reference sites
chosen from the Maryland data were most similar between the Virginia SCI
and West Virginia SCI (excluding BPJ) methods, with the 150 Virginia-method
reference sites forming a subset of the 209 West Virginia-method sites. This was
a result of the reference criteria being the same, except that the Virginia method
included total nitrogen and total phosphorus, as well as  slightly stricter criteria for
urban land use, bank disruptive pressure, and riparian buffer (Table 4). In contrast,
59 of the 144 reference sites (41%) selected using the Maryland B-IBI reference
selection method were not selected using the Virginia or West Virginia methods.
All but one of the 59 sites that met Maryland reference criteria—but were rejected
using the Virginia or West Virginia methods—did  not meet criteria related to
     instream physical habitat (specifically total habitat score, channel alteration, bank
     stability, or bar formation). Conversely, sites that were selected as reference using
     the Virginia or West Virginia methods—but were rejected using the Maryland
     method—did not meet Maryland criteria for remoteness (66 sites), percent of
     forested land cover (23), or riparian width (15). The reference sites selected by all
     three methods met the same chemical reference criteria.
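The way stricter criteria produce a nested set of reference sites can be illustrated with the chemical thresholds from Table 4. This is a partial sketch only: habitat and land use criteria are omitted, the pH range is assumed to be inclusive, and the site records, field names, and function names are hypothetical.

```python
def meets_va_chemical(site):
    """Virginia chemical criteria as listed in Table 4."""
    return (
        6 <= site["ph"] <= 9
        and site["do_mg_l"] > 6
        and site["cond_umhos_cm"] < 250
        and site["tn_mg_l"] < 1.5
        and site["tp_mg_l"] < 0.05
    )


def meets_wv_chemical(site):
    """West Virginia chemical criteria as listed in Table 4 (no BPJ)."""
    return (
        6 <= site["ph"] <= 9
        and site["do_mg_l"] > 5
        and site["cond_umhos_cm"] < 500
    )


# Hypothetical site records (not MBSS data).
sites = {
    "S01": {"ph": 7.2, "do_mg_l": 8.5, "cond_umhos_cm": 120, "tn_mg_l": 0.8, "tp_mg_l": 0.02},
    "S02": {"ph": 6.8, "do_mg_l": 5.5, "cond_umhos_cm": 300, "tn_mg_l": 0.9, "tp_mg_l": 0.03},
    "S03": {"ph": 5.4, "do_mg_l": 9.0, "cond_umhos_cm": 80, "tn_mg_l": 0.4, "tp_mg_l": 0.01},
    "S04": {"ph": 7.9, "do_mg_l": 7.0, "cond_umhos_cm": 200, "tn_mg_l": 2.1, "tp_mg_l": 0.02},
}

va_ref = {name for name, site in sites.items() if meets_va_chemical(site)}
wv_ref = {name for name, site in sites.items() if meets_wv_chemical(site)}

va_is_subset = va_ref <= wv_ref   # every VA-method site also passes WV
wv_only = wv_ref - va_ref         # sites passing WV but not VA criteria
```

Because every Virginia chemical threshold is at least as strict as the corresponding West Virginia threshold, any site passing the first set passes the second, which mirrors the subset relationship reported for the full criteria.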
     [Figure 2 image not reproduced. Legend: reference-selection method of Virginia (150 sites),
     West Virginia (209 sites), and Maryland (144 sites).]
     Figure 2. Venn diagram of reference sites selected from Maryland data using the Virginia,
     West Virginia, and Maryland reference criteria (described in Table 4). Numerals indicate
     the number of sites selected by all methods within the overlapping region.

     Despite these differences in the specific reference sites selected by the Maryland
     method compared to the Virginia and West Virginia methods, the distributions
     of reference site SCI scores (using Maryland data) were similar for all three
     reference-selection methods (Figure  3). Nonparametric comparisons of the
     distributions of SCI scores for reference sites using the Kolmogorov-Smirnov test
     (Zar 1999) did not reveal significant  differences among the three sets of reference
     sites using either the Virginia (D = 1.21, P = 0.11) or West Virginia (D = 0.68, P
     = 0.75)  method for calculating SCIs. These data suggest that applying any of the
     three methods for selecting reference sites characterized the same range of stream
     conditions, even though different sites were selected (i.e., these different reference
     sites had the same  stream quality). Note again that these analyses did not include
     the BPJ reference decisions of West Virginia.
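For readers unfamiliar with the comparison, the two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between two empirical cumulative distributions. The sketch below computes that raw statistic for hypothetical scores; it does not compute a P value. Note that the raw statistic is bounded by 1, so the D values reported above presumably follow a scaled convention from the statistical software used; that reading is an assumption, not something the report states.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Maximum absolute difference between two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The maximum gap can only occur at an observed value.
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical reference-site SCI scores from two selection methods.
ref_scores_method_1 = [62, 68, 71, 75, 80]
ref_scores_method_2 = [60, 68, 72, 75, 83]
d_stat = ks_statistic(ref_scores_method_1, ref_scores_method_2)
```

A small value of `d_stat` indicates that the two sets of reference scores follow similar distributions even when the sites themselves differ.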

[Figure 3 image not reproduced. Histogram panels of reference-site SCI scores; x-axis: SCI,
with bin midpoints from 22.5 to 82.5.]
Figure 3. Distribution of SCI scores for reference sites selected from Maryland data using
Virginia, West Virginia, and Maryland methods.

     The SCI scores for each Maryland site were then used to generate cumulative
     distribution functions for the non-Coastal Plain region of Maryland (Figure 4).
     For each SCI score observed, we estimated the proportion of stream miles with
     that score by applying a 1 for streams with that score and a 0 for all other streams.
     The cumulative distribution was calculated by summing the proportion of streams
     with an equal or lower value for each SCI score. The standard error (SE) of all
     proportions was also estimated using stratified sampling estimators on the 1 or 0
     scores for each stream, as described above.
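The indicator-variable calculation of the cumulative distribution can be sketched as follows. This is a simplified illustration: it treats each site as carrying a weight equal to the stream miles it represents (an assumption about how the design weights enter), it omits the standard error calculation, and the scores, weights, and function name are hypothetical.

```python
def cumulative_distribution(scores, weights):
    """Weighted proportion of stream miles at or below each observed score.

    Each site contributes an indicator of 1 if its score is at or below
    the value in question and 0 otherwise, multiplied by its weight.
    """
    total = sum(weights)
    cdf = []
    for x in sorted(set(scores)):
        at_or_below = sum(w for s, w in zip(scores, weights) if s <= x)
        cdf.append((x, at_or_below / total))
    return cdf


# Hypothetical SCI scores and the stream miles each site represents.
scores = [35.0, 50.0, 50.0, 80.0]
weights = [10.0, 20.0, 30.0, 40.0]
cdf = cumulative_distribution(scores, weights)
```

Reading the resulting curve at a degradation threshold gives the estimated proportion of stream miles rated as degraded.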

     The Virginia and West Virginia cumulative distribution curves of index scores on
     Maryland data were nearly identical (Figure 4) indicating that the SCI methods of
     these states characterized streams similarly. The correlation coefficient between
     Virginia and West Virginia SCI scores for Maryland data was r = 0.96. The
     correlation coefficient between West Virginia SCI and Maryland B-IBI scores for
     Maryland data was somewhat lower at r = 0.86.  Using the degradation thresholds
     of the 10th and 5th percentiles of reference sites for Virginia and West Virginia,
     respectively, the two SCI methods rated streams similarly, suggesting that about
     40% of Maryland streams were degraded. Although the Maryland B-IBI was not
     directly comparable to the SCIs, it rated 53% of streams as degraded using a
     threshold at the 9th percentile of reference sites (i.e., a B-IBI score below 3),
     roughly in line with the proportion rated as degraded by the SCIs.

     Because of the differences in the indicators and reference criteria used, the
     sampling results of the three states had to be adjusted before they could be
     combined into  an integrated assessment of stream condition. The differences in
     indicators were resolved by calculating the West Virginia SCI for Maryland data
     (eliminating the Maryland B-IBI) so that Maryland results could be combined
     directly with the West Virginia SCI for West Virginia data and the Virginia SCI
     for Virginia data (which was very similar to the West Virginia SCI and could
     be used in its original form). An additional adjustment was needed to address
     the different reference criteria among the three states. Specifically, comparable
     reference criteria had to be applied to data from  all three states and a threshold of
     degradation selected and its implications evaluated.

[Figure 4 image not reproduced. Three cumulative distribution panels for Maryland data, with
y-axis ticks from 0 to 100: A, by MD IBI (x-axis: IBI); B, by VA SCI (x-axis: SCI; annotated
"Degraded (43%)"); C, by WV SCI (x-axis: SCI, 0 to 100; annotated "Degraded (38%)").]
Figure 4. Cumulative distribution of Maryland benthic IBI scores (A), Virginia SCI
scores (B), and West Virginia SCI scores (C) for Maryland streams randomly sampled
during 2000-2003.

     5. Integration of Assessments from the Three States

     Given the comparability among the three states' methods (excluding the West
     Virginia BPJ reference decisions), we proceeded to integrate the results into
     a single estimate of stream condition using stratified sampling estimators that
     weighted each state estimate by the proportion of total stream miles contributed
     by each to the combined non-Coastal Plain region of the three states. The non-
     Coastal Plain stream  miles for each state were as follows: 5,946 miles (7.2% of
     the three state total) in Maryland; 47,920 miles (58.2%) in Virginia; and 28,510
     miles (34.6%) in West Virginia. Because the SCI methods produced such similar
     results, we compared the Virginia and West Virginia SCI distributions directly.
     For Maryland, we arbitrarily decided to apply the West Virginia SCI method to
     combine data. The combined cumulative distributions of scores from each state's
     sample of sites were  calculated as described above (Figure 5).
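The stream-mile weighting can be sketched as follows. The stream miles are those given above; everything else is an assumption for illustration. In particular, the state proportions and standard errors passed to the functions are placeholders rather than the estimates reported in this study, the function names are hypothetical, and the standard error formula assumes the state estimates are independent strata.

```python
import math

# Non-Coastal Plain stream miles reported in the text.
stream_miles = {"Maryland": 5946, "Virginia": 47920, "West Virginia": 28510}
total_miles = sum(stream_miles.values())
weights = {state: miles / total_miles for state, miles in stream_miles.items()}


def combine_estimates(estimates, weights):
    """Stream-mile-weighted mean of the state-level proportions."""
    return sum(weights[state] * estimates[state] for state in estimates)


def combine_standard_errors(standard_errors, weights):
    """SE of the weighted mean, assuming independent state strata."""
    return math.sqrt(
        sum((weights[s] * standard_errors[s]) ** 2 for s in standard_errors)
    )


# Placeholder inputs (NOT the report's estimates), as proportions.
placeholder_props = {"Maryland": 0.30, "Virginia": 0.50, "West Virginia": 0.20}
placeholder_ses = {"Maryland": 0.02, "Virginia": 0.05, "West Virginia": 0.01}

combined = combine_estimates(placeholder_props, weights)
combined_se = combine_standard_errors(placeholder_ses, weights)
```

The computed weights reproduce the percentages quoted in the text (7.2%, 58.2%, and 34.6%), and they show why the combined estimate is driven mainly by Virginia and West Virginia.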
     [Figure 5 image not reproduced. Panels A through D plot cumulative distributions against
     SCI (x-axis, 0 to 100); panel A is labeled "(WV SCI)" and annotated "Degraded (37%)".]
     Figure 5. Cumulative distribution of SCI scores for Maryland (using WV SCI method),
     Virginia, West Virginia, and all three states combined. Thresholds for categorizing
     "degraded" condition correspond to the 10th percentile of the distribution of reference
     SCI scores.

Because the distributions of reference SCI scores were similar among states
(Figure 3), we based our degradation thresholds for the assessment on percentiles
taken from the combined set of reference sites from all three states (including
reference sites that would have been eliminated by West Virginia BPJ).  Using the
10th percentile of reference sites as the threshold of degradation, the cumulative
distribution across states (Figure 5D) rated about 39% (SE= 4%) of the non-
Coastal Plain streams in Maryland, Virginia, and West Virginia as degraded.
Virginia had the greatest estimated proportion of degraded streams (63%, SE=8%;
Figure 5) and West Virginia the least (14%, SE=1%). Maryland streams were
intermediate, exhibiting an estimated 37% (SE=2%) degraded streams.  Figure 5
illustrates how all three states can be combined using a single reference condition
and threshold of degradation. Because West Virginia actually uses a stricter
reference condition that includes site-by-site BPJ, the proportion of West Virginia
streams rated as degraded is much higher when the reference sites with BPJ
exclusions are used. Including BPJ reduced the number of reference sites from
349 to 216 and shifted the distribution of reference site SCI scores toward higher
values (Figure 6). This shift effectively raises the threshold that streams have to
meet to be considered non-degraded. This analysis indicates that, under a common
reference condition, West Virginia has far fewer degraded streams than Maryland or
Virginia, a result that was not apparent using each state's independent assessments.
  [Figure 6 image not reproduced. Legend: filled symbol, excluding BPJ-eliminated sites
  (N = 216); open symbol, including BPJ-eliminated sites (N = 349). x-axis: SCI (midpoint
  of group), 15 to 95; y-axis ticks from 0.00 to 0.50.]
  Figure 6. Distribution of SCI scores for West Virginia reference sites selected using all
  criteria including best professional judgment (BPJ) of anthropogenic disturbance (more
  restrictive; i.e., fewer sites), and all criteria except BPJ (less restrictive; i.e., more sites).

     These higher degradation thresholds derived from the distribution of West Virginia
     reference sites selected using the additional BPJ decisions cannot be used in the
     combined non-Coastal Plain region of the three states, because these site-by-site
     decisions cannot be applied to the data from Maryland and Virginia. However,
     to illustrate the effect of using degradation thresholds based on higher quality
     reference sites (e.g., West Virginia's), we calculated the proportion of stream miles
     in the combined non-Coastal Plain region based on the 47th percentile of reference
     sites. This percentile was selected because the 47th percentile of West Virginia
     reference sites without BPJ exclusions corresponds to the 10th percentile of West
     Virginia reference sites using the BPJ exclusions. Therefore, the 47th percentile
     simulates the degradation threshold that would have been obtained if all three
     states used comparable BPJ exclusions. This higher threshold results in 64% of
     stream miles being rated as degraded (Figure 7). This difference in assessments
     indicates the importance of selecting a reference condition appropriate for water
     resource management goals.
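The percentile matching used above (finding which percentile of the larger reference set corresponds to the 10th percentile of the stricter set) can be sketched as follows. The scores are hypothetical and do not reproduce the 47th percentile reported here; the interpolation convention and the function names are likewise assumptions.

```python
def percentile(values, pct):
    """Percentile by linear interpolation (convention is an assumption)."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def percent_at_or_below(values, score):
    """Percentage of values that are less than or equal to score."""
    return 100.0 * sum(v <= score for v in values) / len(values)


# Hypothetical reference-site SCI scores.
refs_without_bpj = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]  # larger set
refs_with_bpj = [60, 65, 70, 75, 80, 85]                     # stricter subset

# Threshold implied by the stricter set, then its position in the larger set.
strict_threshold = percentile(refs_with_bpj, 10)
equivalent_pct = percent_at_or_below(refs_without_bpj, strict_threshold)
```

In this toy example the 10th percentile of the stricter set sits at the 50th percentile of the larger set, which is the same kind of correspondence as the 10th-to-47th mapping described in the text.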
     [Figure 7 image not reproduced. Two cumulative distribution panels of the combined SCI
     (x-axis: SCI, 0 to 100; y-axis ticks from 0 to 100): A, Combined SCI; B, Combined SCI
     with the higher threshold, annotated "Degraded (64%)".]
     Figure 7. Cumulative distribution of SCI scores for all three states combined (Maryland
     using WV SCI method, Virginia, West Virginia) at two different thresholds for categorizing
     stream condition based on percentiles of reference sites. Left figure shows "degraded"
     condition corresponding to the 10th percentile of the distribution of reference SCI scores
     (no BPJ). Right figure corresponds to the 47th percentile of reference scores (no BPJ) to
     simulate the effect that West Virginia BPJ would have had if applied to all states.

6. Discussion and Recommendations

This study demonstrates that state stream assessment programs can be integrated
over larger regions if all states have similar sample frames, probabilistic survey
designs, and comparable indicators (though differences in reference condition
must be adjusted for). This assessment integration was possible even though
differences in benthic macroinvertebrate collection procedures (related to
sampling gear, habitats sampled, and level of taxonomic identification) precluded
directly combining site data.

If the sample frames are different among the states, the integration must be
restricted to the overlapping stream scale so that the population of interest is
the same. In this study all three states used a 1:100,000-scale stream network.
Differences in survey  designs (if all designs are probability-based) do not affect
integration, as each state is a de facto stratum that can be combined into an overall
estimate.

The most challenging aspect of assessment integration is reconciling different
indicators of stream condition. While Virginia and West Virginia used similar SCIs
(metric scores based on the range of values at all sample sites and degradation
thresholds applied as percentiles of reference SCI scores), Maryland used
a conceptually different B-IBI (metric scores assigned relative to reference
condition and averaged to produce B-IBI scores). We  determined that the
distribution of site scores (using Maryland data only) was very similar for the
Virginia and West Virginia SCIs, but more uniform (i.e., a flatter line; see Figure
4) for the Maryland B-IBI, indicating the scores "stretched" more evenly across
the full range of sites. We also noted that the wider distribution of site scores in
the West Virginia data stretched the SCI scores relative to the Virginia  SCIs as a
result of their wider range of scores at all sites. These  differences in indicators
have resulted from independent indicator development in each state undertaken
to address each state's management objectives. Each indicator is based on sound
principles and serves its state's needs well. At the same time, such differences
create problems for integration. While the B-IBI may  perform better in Maryland,
integration required that one of the SCIs be substituted for the B-IBI so that
similar indicators could be used in all states. We calculated SCI scores (using the
West Virginia method) on Maryland data for the final  assessment integration.

In addition to calculating the same or similar indicators for all sites to be
integrated, a single reference condition must be used to set assessment thresholds.
Ideally, a single set of reference sites could be selected from all three states and
used to develop and rate the indicators. Such a project, however, is time and
resource intensive; another solution is to calibrate the  reference conditions of each
state on one set of sites (Maryland in this case). This requires a careful comparison
of reference criteria and use of surrogate variables where necessary.

     The West Virginia procedure for selecting reference sites included BPJ decisions
     that excluded additional sites with known human disturbances, and thus was
     more restrictive (i.e., fewer reference sites qualified). These BPJ decisions
     involved visual evaluations of candidate reference sites for signs of anthropogenic
     disturbances (e.g., surface mines) and nonpoint source pollution (e.g., livestock
     feedlots). In addition, BPJ was used to exclude sites with high conductivity and
     fecal coliform bacteria values when appropriate. These site-by-site decisions
     effectively set a higher standard of stream quality for West Virginia. These BPJ
     decisions could not be precisely defined (e.g., assigned standard values as is done
     with subjective habitat evaluations) and thus cannot (by definition) be replicated
     for other states. While including BPJ increases the confidence that West Virginia's
     reference sites are minimally disturbed (and may improve stream management), it
     is a barrier to integration.

     After excluding West Virginia's BPJ, we applied each state's reference criteria to
     Maryland data and still produced different suites of reference sites. However,  the
     distributions of scores from reference sites selected were similar, suggesting the
     different reference sites were of similar stream quality (comparably affected by
     human disturbance). Even though all reference criteria are incomplete surrogates
     of minimally disturbed condition, different reference criteria may be equally useful
     for selecting subsets of minimally disturbed streams.

     In contrast, inclusion of BPJ dramatically affected the assessment of stream
     condition, as expected, increasing the proportion of degraded streams in the
     non-Coastal Plain region of Maryland, Virginia, and West Virginia from 39% to
     64%. Again, this results from using higher quality reference sites, which
     effectively raises the threshold to a higher percentile of the larger set of
     reference sites (i.e., from the 10th to the 47th). Therefore, it is critical that the
     proportion of degraded streams in the entire non-Coastal Plain region of
     Maryland, Virginia, and West Virginia be determined using the same "yardstick"
     (i.e., similar SCIs with comparable reference conditions). Which yardstick is used
     depends on management objectives and the confidence that the reference sites
     are minimally disturbed.
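
     The effect of the reference set on the degradation threshold can be sketched
     in a few lines of Python. This is a minimal illustration, not the analysis
     used in this study: it assumes the threshold is a fixed percentile of the
     reference-site scores, it ignores survey design weights, and all scores,
     function names, and variable names are hypothetical.

```python
def percentile(scores, pct):
    """Percentile (0-100) of a list of scores, by linear interpolation."""
    ordered = sorted(scores)
    if not ordered:
        raise ValueError("need at least one score")
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def proportion_degraded(site_scores, reference_scores, pct=10):
    """Return (threshold, fraction of sites scoring below it)."""
    threshold = percentile(reference_scores, pct)
    below = sum(1 for score in site_scores if score < threshold)
    return threshold, below / len(site_scores)


# Hypothetical SCI scores, for illustration only (not data from this study).
reference_objective = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
reference_with_bpj = [75, 80, 85, 90, 95, 100]  # BPJ drops the lower-scoring sites
assessed_sites = [40, 52, 58, 63, 72, 78, 84, 91]

print(proportion_degraded(assessed_sites, reference_objective))
print(proportion_degraded(assessed_sites, reference_with_bpj))
```

     With these made-up scores, restricting the reference set raises the 10th
     percentile threshold from 55 to 77.5, and the share of assessed sites rated
     degraded rises from 0.25 to 0.625. The same sites receive different ratings
     only because the yardstick changed.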

This proof of concept study leads us to make the following recommendations for
integrating stream assessment results among state programs:

       •   Sample frames must be comparable; if different map scales are used,
          potentially expensive geographic information system (GIS) analysis
          may be needed to determine the overlapping populations of streams.
       •   Different survey designs may be combined if they are probability-
          based, since individual states are strata in calculations of regional
          estimates.
       •   Results from different biological sampling procedures can be
          integrated if reference-based indicators are used to summarize the
          results.
       •   The ratings of stream condition will depend on how indicators are
          linked to reference condition, so a common reference condition
          ("yardstick") must be used to set thresholds of degradation.
       •   A common reference condition requires the application of objective
          criteria for which there are appropriate variables or surrogates for all
          sites, i.e., BPJ decisions that are not codified as standard values are not
          repeatable and cannot be used in integration.
       •   Both (1) modifications to state programs to make them more
          comparable and (2) the analyses to integrate results that are
          significantly different require staff and financial resources; therefore,
          we recommend that states collaborate early in their development of
          stream assessment programs to facilitate future integration.
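
The second recommendation, treating each state as a stratum, can be illustrated
with a short Python sketch. It is a simplified example under stated assumptions:
sites are treated as a simple random sample within each state (the actual state
designs use unequal selection probabilities and need design-based variance
estimators), strata are weighted by stream length in the common sample frame, and
every number and name below is hypothetical.

```python
import math


def regional_proportion(strata):
    """Combine state-level results into one regional estimate.

    strata: list of (stream_km, n_sites, n_degraded), one tuple per state.
    Returns (estimated regional proportion degraded, standard error).
    """
    total_km = sum(km for km, _, _ in strata)
    estimate = 0.0
    variance = 0.0
    for km, n_sites, n_degraded in strata:
        weight = km / total_km          # stratum share of the target population
        p = n_degraded / n_sites        # within-state proportion degraded
        estimate += weight * p
        # Variance of a proportion under simple random sampling,
        # ignoring the finite population correction.
        variance += weight ** 2 * p * (1 - p) / (n_sites - 1)
    return estimate, math.sqrt(variance)


# Hypothetical stream lengths and site counts, for illustration only.
states = [
    (1000, 50, 20),
    (3000, 60, 30),
    (2000, 40, 10),
]
estimate, std_error = regional_proportion(states)
print(round(estimate, 3), round(std_error, 3))
```

Because each state contributes in proportion to its share of the stream
population, differences in sampling intensity among states do not bias the
regional estimate, provided every state's sample is probability-based and the
sample frames cover comparable populations.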

     7.  Literature Cited


     Barbour, M.T., Genitsen, I, Snyder, B.D. and Stribling, J.B. 1999. "Rapid
           Bioassessment Protocols for using streams and wadeable rivers:
           periphyton, benthic macroinvertebrates and fish." Second Edition.
           EPA/841'-B-99-002. U.S. Environmental Protection Agency, Office of
           Water, Washington, B.C.
     Barbour, M.T., Stribling, J.B. and Karr, J.R. 1995. "Multimetric approach for
           establishing biocriteria and measuring biological condition." In W.S. Davis
           and T.P. Simon (eds.) Biological Assessment and Criteria: Tools for Water
           Resource Planning and Decision Making. Lewis, Boca Raton, FL. Pages
           63-77.
     Berger, Y. 2004. "A simple variance estimator for unequal probability sampling
           without replacement." Journal of Applied Statistics 31:305-315.
     Burton, J., and Gerritsen, J. 2003. A stream condition index for Virginia
           non-Coastal Plain streams. Prepared by Tetra Tech, Inc., for U.S.
           Environmental Protection Agency Office of Science and Technology,
           Office of Water, Washington, DC, U.S. EPA Region 3 Environmental
           Services Division,  Wheeling, and WV Virginia Department of
           Environmental Quality, Richmond, VA. Available atwww.deq.virginia.gov/
           watermonitoring/pdf/vastrmcon.pdf
     Cochran, W.G. 1977. Sampling Techniques. Third ed. John Wiley and Sons, New
           York, NY.
     Karr, J.R. 1991. "Biological integrity: Along-neglected aspect of water resource
           management." Ecological Applications 1:66-84.
     Klauda, R., Kazyak, P., Stranko, S., Southerland, M., Roth, N. and Chaillou, J.
           1998. "The Maryland Biological Stream Survey: A state agency program
           to assess the impact of anthropogenic stresses on stream habitat quality and
           biota." Third EMAP Symposium, Albany, NY. Environmental Monitoring
           and Assessment 51:299-316.
     National Water Quality Monitoring Council (NWQMC). 2001. "Towards a
           Definition of Performance-Based  Laboratory Methods." National Water
           Quality Monitoring Council Technical Report 01	02. U.S. Geological
           Survey, Reston, VA.
     Paul, M.J., Stribling, J.B.,  Klauda, R., Kazyak, P., Southerland, M. and Roth,
           N. 2002. A physical habitat index for freshwater wadeable streams in
           Maryland. Prepared by Tetra Tech, Inc., Owings Mills, MD for Maryland
           Department of Natural Resources.
                                        26

-------
                                   Proof of Concept for Integrating Bioassessment
                        Results from Three State Probabilistic Monitoring Programs
Paul, M.J., Stribling, J.B., Klauda, R., Kazyak, P., Southerland, M., and Roth,
       N. 2003. Further Development of a Physical Habitat Index for Maryland
       Wadeable Freshwater Streams. Report prepared by Versar, Inc., Columbia,
       MD; Tetra Tech, Inc., Owings Mills, MD; and Maryland Department of
       Natural Resources. CBWP-MANTA-EA-03-04.
Roth, N., V01stad, L, Erb, L. and Weber, E. 2005. Maryland Biological Stream,
       Survey 2000-2004 Volume 6: Laboratory, Field, and Analytical Methods.
       DNR 12-0305-0108. Maryland Department of Natural Resources,
       Monitoring and Non-tidal Assessment Division, Annapolis,  MD.
Roth, N.E., V01stad, J.H., Mercuric, G., and Southerland, M.T. 2002. Biological
       Indicator Variability and Stream. Monitoring Program Integration: A
       Maryland Case Study. Prepared by Versar, Inc., Columbia, MD, for U.S.
       Environmental Protection Agency,  Office of Environmental  Information
       and the Mid-Atlantic Integrated Assessment Program.
Southerland, M., Rogers, G., Kline, M., Morgan, R., Boward, D., Kazyak, P. and
       Stranko, S. 2005. New Biological Indicators to Better Assess Maryland
       Streams (DRAFT). Prepared for Monitoring and Non-Tidal  Assessment
       Division, Maryland Department of Natural Resources, Annapolis, MD.
Stevens, D.L., Jr. 1997. "Variable density grid-based sampling designs for
       continuous spatial populations." Environmetrics 8:167-95.
Stevens, D.L., Jr. and Olsen, A.R. 2004. "Spatially balanced sampling of natural
       resources." Theory and Methods. Journal, of American Statistical
       Association 99:262-278.
Strahler, A.N. 1957. "Quantitative analysis of watershed geomorphology."
       Transactions of the American Geophysical Union 38:913-920.
Tetra Tech, Inc. 2000. A stream condition index for West Virginia wadeable
       streams. Prepared for U.S. EPA Region 3 Environmental Services
       Division, and U.S. EPA Office of Science and  Technology, Office of Water.
       Available at www.dep.state.wv.us//show_blob.cfm?id=536&name=WV-
       Index.pdf
WVDEP. 2006. Standard Operating Procedures. Watershed Branch. Jeffrey Bailey,
       primary author (Working Document).
U.S. Environmental Protection Agency (EPA). 2000. Guidance for the Data
       Quality Objectives Process (QA/G-4). EPA/600/R-96/055. Washington,
       DC.
U.S. EPA. 2002a. Research strategy, Environmental monitoring and assessment
       program. EPA 620-R/02-2002. Research Triangle Park, NC.
U.S. EPA. 2002b. "Summary of Biological Assessment Programs and Biocriteria
       Development for States, Tribes, Territories, and Interstate Commissions:
       Streams and Wadeable Rivers." EPA 822-R-02-048. Washington, DC.
                                   27

-------
Proof of Concept for Integrating Bioassessment
Results from Three State Probabilistic Monitoring Programs
     Virginia Department of Environmental Quality (VDEQ). 2003. "The quality of
           Virginia non-tidal streams: first year report." VDEQ Technical Bulletin
           WQA/2002-001. Richmond, VA. Available at http://www.deq.virginia.gov/
           water/probmon.pdf.
     VDEQ. 2005. "Using Probabilistic Monitoring Data to Validate the Virginia
           Stream Condition Index (DRAFT)." VDEQ Technical Bulletin
           WQA/2005-002. Richmond, VA. Available at http://www.deq.state.va.us/
           probmon/pdf/scival. pdf
     V01stad, J.H., Roth, N.E., Southerland, M.T. and Mercuric, G. 2003. "Pilot Study
           for Montgomery  County and Maryland DNR Data Integration: Comparison
           of Benthic Macroinvertebrate Sampling Protocols for Freshwater Streams."
           EPA/903/R-03/005. U.S. Environmental Protection Agency Region 3,
           Office of Environmental Information and Mid-Atlantic Integrated
           Assessment Program, Fort Meade, MD.
     West Virginia Department of Environmental Protection (WVDEP). 2005. West
           Virginia's enhanced water qualify monitoring strategy. Prepared by
           Watershed Branch, Division of Water and Waste Management. Available at
           www. dep. state. wv.us/Docs/8949_W.Va._Enhanced_Monitoring_Strategy.
           pdf
     WVDEP. 2006. Standard Operating Procedures. Watershed Branch. Jeffrey Bailey,
           primary author (Working Document)
     Zar, J.H. 1999. Biostatistical Analysis. 4th Edition. Prentice-Hall, Inc., Upper
           Saddle River, NJ.
                                        28
