United States Environmental Protection Agency
Environmental Research Laboratory, Corvallis, OR 97333
Research and Development
EPA/600/S3-86/019 April 1986
Project Summary

A Computerized System for the Evaluation of Aquatic Habitats Based on Environmental Requirements and Pollution Tolerance Associations of Resident Organisms

Clyde L. Dawson and Ronald A. Hellenthal
  The Environmental Requirements and Pollution Tolerance (ERAPT) system is a computerized retrieval and analysis system for environmental information on aquatic organisms. It can be used to predict organism assemblages based on environmental conditions, to describe environmental characteristics associated with plants and animals that inhabit specific sites, and to compare site characterizations with associated biological communities for inconsistencies resulting from organism misidentification or erroneous habitat information. It can also identify species that are particularly sensitive to specific sets of environmental conditions and can predict changes in species composition that are likely to result from environmental disturbance.
  The system has been developed for an IBM 370/3033* computer system. It can be used interactively to predict and evaluate aquatic habitats and organism assemblages. ERAPT computerized data bases contain 57,345 biological, physical, chemical, and distributional characteristics drawn from 883 sources for 1,691 species of North American diatoms, blue-green algae, the insect orders Ephemeroptera (mayflies) and Plecoptera (stoneflies), the dipteran family Chironomidae (midges), and the fishes of U.S. Environmental Protection Agency Region 5 (North Central U.S.).

*Mention of trademarks or commercial products does not constitute endorsement or recommendation for use.
  The system was tested by comparing its prediction of environmental conditions based on the dominant fish community with environmental characteristics obtained from aquatic habitat surveys for 64 sites on three river systems in Ohio. An overall successful prediction rate of 95% was achieved. Where disagreements between predicted and reported environmental characterizations occurred, they almost always were among very similar tolerance ranges or habitat types.
  This Project Summary was developed by EPA's Environmental Research Laboratory, Corvallis, OR, to announce key findings of the research project that is fully documented in a separate report of the same title (see Project Report ordering information at back).

Introduction
  Aquatic organisms have long been recognized as potentially useful indicators of habitat conditions and water quality. This is due to their ability to reflect conditions through time, to demonstrate the effects of disturbances after the environment has returned to apparently normal physical and chemical conditions, to integrate the effects of many different environmental factors and their interactions simultaneously, and to provide a living context for considerations of environmental quality. The basic requirements for indicator organisms are that they be sensitive to environmental changes and have life spans and generation lengths appropriate for use in environmental assessment: long enough to reflect intermittent or occasional disturbances, and short enough to subject sensitive life stages to adverse environmental conditions.
  Although organisms have been a fundamental component of aquatic habitat surveys, they play a relatively minor role in aquatic assessment. One reason may be the difficulty of comparing and summarizing information obtained by different researchers. This difficulty may arise because variability in the environmental requirements and habitat associations of some species makes them unsuitable as environmental indicators, or because methods of data collection or analysis are inconsistent. A second factor may be the diverse outlets for publication of these data, which often appear only in progress and summary reports that receive little dissemination. Another factor is the difficulty of developing quality control procedures for organism identifications. This has resulted in great variation in the quality and reliability of published associations between organisms and specific environmental conditions.
  Environmental data and reports vary widely in quality and reliability. Standardized methods for data acquisition and analysis, and quality control procedures, are reasonably well established for most biological, chemical, and physical environmental data. However, the association of specific organisms with these parameters rests on the reliability of species identifications performed by investigators of varying backgrounds, experience, and expertise. At present, the primary quality assurance mechanism for organism identifications is the competence and dedication of individual researchers. While independent verification of voucher specimens by taxonomic experts is possible, this procedure is time consuming and costly even on a small scale and, on a large scale, would significantly exceed our present resources of taxonomic experts.
  Publications providing summary information on the environmental relationships and tolerances of aquatic organisms also may be useful in identifying potential indicator organisms and in establishing tolerance levels of organisms to specific environmental factors. However, unless the reference publication survey is extremely exhaustive or very selective, it may be difficult to establish useful trends among the vast amounts of often contradictory information reviewed. Furthermore, the fixed format of printed reports severely limits the ways in which their data can be used. It is often difficult to determine which environmental requirements are consistent within a genus or higher taxonomic level and which environmental inconsistencies at the specific level could be due to errors in data collection or identification.
  A computerized storage, retrieval, and evaluation system for these data has been developed to enhance their usefulness by allowing queries based on taxonomic association, environmental parameters, or a combination of these factors. Since the environmental information for diverse taxonomic groups can be standardized, it is possible to answer questions concerning biological communities and their environments. A major application of this system is the development of lists of taxa associated with specific environmental conditions, which can serve as reference communities for environmental scientists. This system also can be used as a device for scrutinizing environmental data by comparing the known tolerances of organisms reported in monitoring and impact studies with the physical and chemical characteristics of the habitats from which they were collected. Discrepancies encountered in these comparisons are flagged as potential errors in data collection or specimen identification. Another quality control device possible with these data is comparing the known ecological parameters associated with combinations of taxa reported to be collected together at individual sites. Inconsistencies in the environmental requirements of these taxa also can be flagged as potential errors in identification.
  The Environmental Requirements and Pollution Tolerance (ERAPT) information retrieval and analysis system permits computerized prediction and evaluation of habitat characteristics and organism assemblages. ERAPT system data bases contain 57,660 parameter values representing environmental characterizations of 402 genera and 1,694 species of aquatic organisms in the diatoms, blue-green algae, the insect orders Ephemeroptera and Plecoptera, the dipteran family Chironomidae, and the fish species of U.S. Environmental Protection Agency (EPA) Region 5 (North Central U.S.). The environmental characterizations used by the ERAPT system have been derived from 883 data sources, primarily from the published literature (Table 1). This represents a mean of 34 environmental parameters from six data sources per species, with four species per genus. Information for the environmental characterization of diatoms, blue-green algae, chironomids, Ephemeroptera, and Plecoptera was obtained from the EPA Environmental Monitoring and Support Laboratory, Cincinnati, Ohio. Characterization of Region 5 fishes from primary and secondary literature sources was done as part of this project.
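The per-species means quoted above can be checked against the Table 1 totals. The short Python sketch below assumes that the "six data sources per species" figure is the Citations total divided by the Species total and that "parameters" corresponds to the Entries row; the report does not state exactly how the means were computed.

```python
# Per-group counts transcribed from Table 1: (genera, species, citations, entries).
TABLE_1 = {
    "BLGR": (50, 161, 2_289, 4_364),
    "DIAT": (50, 295, 2_726, 5_510),
    "CHIR": (74, 232, 529, 8_156),
    "EPHE": (60, 399, 1_207, 6_806),
    "PLEC": (86, 364, 1_767, 7_685),
    "FISH": (82, 243, 1_000, 25_139),
}

# Column totals across all six taxonomic groups.
genera, species, citations, entries = (sum(col) for col in zip(*TABLE_1.values()))

# Means quoted in the text (assumed definitions; see lead-in).
params_per_species = entries / species      # parameter values per species
sources_per_species = citations / species   # citations (sources) per species
species_per_genus = species / genera

print(f"totals: {genera} genera, {species} species, {citations} citations, {entries} entries")
print(f"{params_per_species:.1f} parameters and {sources_per_species:.1f} sources per species; "
      f"{species_per_genus:.1f} species per genus")
```

Rounded to whole numbers, these ratios give the 34, six, and four quoted in the text, and the column sums reproduce the Total column of Table 1.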

System Operation
  The ERAPT system consists of an integrated set of computer programs for the storage, retrieval, and manipulation of environmental requirements and pollution tolerance information on aquatic organisms. It has been developed for use on an IBM 370/3033 computer system at the University of Notre Dame. Data are stored and manipulated as hierarchically related environmental requirements and pollution tolerance categories representing tolerance ranges to specific pollutants or environmental conditions, geographic locations, general or specific habitat characteristics, and periods of appearance, emergence, or greatest abundance. At present the system uses 21 heading categories (stage, abundance, pollution tolerance, optimal growth period, reproductive season, reproductive behavior, feeding behavior, hardness, salinity, nutrients, degradable organics, pH, oxygen, temperature, turbidity, current, general habitat, specific habitat, bottom type, ecosystem regions, political geography) divided into 303 specific parameters. These categories and parameters were adapted from a set developed by the Aquatic Biology Section, EPA Environmental Monitoring and Support Laboratory (Cincinnati), for use with macroinvertebrates and diatoms, and were expanded for fish and other aquatic organisms as part of this project in collaboration with the staff of the EPA Corvallis Environmental Research Laboratory.
  Data for ERAPT are encoded on tabular forms, which can be produced by the system. These forms contain boxes corresponding to specific parameters, such as stage, feeding behavior, or tolerance to environmental conditions, for an individual species from a single reference source. The ERAPT system reads the tabular forms as digitized X-Y coordinate values corresponding to each mark on a form.
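Reading a form as X-Y coordinates amounts to deciding which box on the form each digitized mark falls inside. The Python sketch below illustrates that lookup under stated assumptions: the box coordinates, the `Box` class, and the `decode_marks` function are hypothetical, and the two parameter codes are borrowed from Figure 1 purely as labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """One box on a data-entry form, tied to a single parameter code."""
    code: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


# Hypothetical form layout; a real form would carry one box per parameter.
FORM_LAYOUT = [
    Box("SA:MESOHA", 10, 100, 30, 110),
    Box("SA:OLIGOH", 40, 100, 60, 110),
]


def decode_marks(marks, layout=FORM_LAYOUT):
    """Translate digitized (x, y) marks into parameter codes.

    Marks that fall outside every box are returned separately so they can be checked.
    """
    codes, unmatched = [], []
    for x, y in marks:
        hit = next((box.code for box in layout if box.contains(x, y)), None)
        if hit is None:
            unmatched.append((x, y))
        else:
            codes.append(hit)
    return codes, unmatched


codes, unmatched = decode_marks([(15, 105), (45, 102), (200, 5)])
print(codes, unmatched)
```

Returning stray marks instead of silently dropping them is one reasonable design choice, since a mark outside every box usually signals a misaligned form or a digitizing error.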

Table 1.    Summary totals for information in the ERAPT data bases.
          BLGR, Bluegreen algae; DIAT, Diatoms; CHIR, Chironomids;
          EPHE, Ephemeroptera; PLEC, Plecoptera; FISH, Fishes of EPA Region 5

                BLGR    DIAT    CHIR    EPHE    PLEC    FISH   Total
Authors          420      48      34     200     125      56     883
Records          221     342     306     459     450     446   2,224
Genera            50      50      74      60      86      82     402
Species          161     295     232     399     364     243   1,694
Citations      2,289   2,726     529   1,207   1,767   1,000   9,518
Entries        4,364   5,510   8,156   6,806   7,685  25,139  57,660

  The digitized data are standardized for the ERAPT system. During this process both direct and hierarchical correspondences between categories are established. For example, one set of data entry forms may contain the environmental category Mesosaprobic, whereas another may divide this category into alpha and beta ranges. During this step these parameters are associated so that a parameter shown at a lower level of the environmental category hierarchy is included at all upper levels. For example, the heading salinity is divided into 11 categories, with the categories Mesohalobous and Oligohalobous further subdivided into two and three parameters, respectively (Figure 1). This structure permits the ERAPT system to store, manipulate, and use environmental information differing in precision without sacrificing the most reliable data.
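The rule that a lower-level parameter is included at all upper levels can be expressed as an upward walk through a child-to-parent table. The Python sketch below uses part of the salinity hierarchy; the parent links follow the subdivisions described in the text, but the `PARENT` table, the `"Salinity"` root label, and the function name `with_ancestors` are illustrative assumptions.

```python
# Child -> parent links for part of the salinity heading (illustrative subset).
PARENT = {
    "alpha-Mesohalobous": "Mesohalobous",
    "beta-Mesohalobous": "Mesohalobous",
    "Halophilous": "Oligohalobous",
    "Haline indifferent": "Oligohalobous",
    "Halophobous": "Oligohalobous",
    "Mesohalobous": "Salinity",
    "Oligohalobous": "Salinity",
}


def with_ancestors(params: set[str]) -> set[str]:
    """Expand a source's parameters so each one is also recorded at every higher level."""
    expanded: set[str] = set()
    for p in params:
        node = p
        while node is not None:
            expanded.add(node)
            node = PARENT.get(node)  # None once the heading itself is reached
    return expanded


fine = with_ancestors({"alpha-Mesohalobous"})  # source using the finer alpha/beta split
coarse = with_ancestors({"Mesohalobous"})      # source using only the coarse category
print(sorted(fine), sorted(coarse))
```

A query for Mesohalobous matches both sources, while only the more precise source answers a query for alpha-Mesohalobous, so coarse and fine data coexist without either being discarded.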
  The next procedure is the creation of a searchable data base. At this point the various components of the system are linked. The taxonomic categories are hierarchically connected in a manner similar to that described previously for environmental headings and parameters. This enables queries at any level of the taxonomic hierarchy. The linking operation involves the association of the taxonomic names, environmental headings and parameters plus their definitions, author citations and references, environmental data, and a series of 2- to 8-character abbreviations that are used in the query and information retrieval process. The lowest taxonomic level directly accessible by the system is subspecies. However, data from individual sources are also available.
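Querying at any level of a linked taxonomic hierarchy can be sketched as resolving an abbreviation to a taxon and then collecting every lowest-level taxon beneath it. In the Python sketch below, the taxa, the abbreviations (`EPHEM`, `BAETIS`, `BAFLAV`), and the function names are hypothetical examples chosen for illustration, not entries from the ERAPT data bases.

```python
# Hypothetical child -> parent links and query abbreviations, for illustration only.
TAXON_PARENT = {
    "Baetidae": "Ephemeroptera",
    "Baetis": "Baetidae",
    "Baetis flavistriga": "Baetis",
    "Baetis intercalaris": "Baetis",
}
ABBREVIATIONS = {"EPHEM": "Ephemeroptera", "BAETIS": "Baetis", "BAFLAV": "Baetis flavistriga"}


def lowest_taxa_under(taxon: str) -> set[str]:
    """Return every taxon at or below `taxon` that has no children of its own."""
    children: dict[str, list[str]] = {}
    for child, parent in TAXON_PARENT.items():
        children.setdefault(parent, []).append(child)
    found, stack = set(), [taxon]
    while stack:
        current = stack.pop()
        kids = children.get(current)
        if kids:
            stack.extend(kids)
        else:
            found.add(current)
    return found


def query(abbreviation: str) -> set[str]:
    """Resolve a 2- to 8-character abbreviation, then collect the taxa beneath it."""
    return lowest_taxa_under(ABBREVIATIONS[abbreviation])


print(sorted(query("EPHEM")), sorted(query("BAFLAV")))
```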
Figure 1.    Hierarchy diagram for salinity in mg/l.
[Diagram not reproduced. Top-level categories: Poly- or Euhalobous, marine forms (SA:POLEUH); Mesohalobous, brackish-water forms (SA:MESOHA); Oligohalobous, freshwater forms (SA:OLIGOH). Lower-level parameters shown: Polyhalobous (SA:POPOLY), Euhalobous (SA:POEUHA), alpha-Mesohalobous (SA:MEALPH), beta-Mesohalobous, Halophilous, Haline Indifferent, Halophobous.]

  The system uses two summaries of the information contained in reference sources for each taxon: a logical sum and a logical product. The logical sum includes all environmental parameters given by any source for each taxon in the system. For example, if one investigator reported that a species occurs in lakes and another reported that the same species occurs in both lakes and streams, the logical sum for the general habitat category for that species would include both lakes and streams. The logical product includes only those parameters indicated by all sources containing environmental requirements or pollution tolerance information within each heading category for a taxon. Therefore, each product summary consists of those environmental parameters that are consistently associated with a taxon in the system. For the general habitat example given above, the product summary would show only lakes, since it was the only parameter indicated by all sources of information for the taxon. Inconsistencies among investigators reporting information about any heading category for a taxon may be obtained by calculating the logical difference (exclusive OR) between the sum and product summaries. In the general habitat example, streams would be considered an inconsistent parameter, since one source used the parameter and the other did not. For taxonomic categories above species, similar data summaries are maintained based on logical sums of the information for all lower taxonomic levels. The system also maintains the number of data sources that were summarized for each taxon and environmental category.
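The logical sum, logical product, and exclusive-OR difference map directly onto set union, intersection, and symmetric difference. A minimal Python sketch, assuming each source's report for one heading category is held as a set of parameter names; the function names `summarize` and `higher_taxon_sum` are hypothetical.

```python
def summarize(reports: list[set[str]]) -> tuple[set[str], set[str], set[str]]:
    """Summarize one heading category for one taxon.

    `reports` holds one parameter set per data source.
    Returns (logical sum, logical product, inconsistent parameters).
    """
    if not reports:
        return set(), set(), set()
    logical_sum = set().union(*reports)
    logical_product = set.intersection(*reports)
    inconsistent = logical_sum ^ logical_product  # exclusive OR of the two summaries
    return logical_sum, logical_product, inconsistent


def higher_taxon_sum(lower_sums: list[set[str]]) -> set[str]:
    """Summary for a genus or higher taxon: logical sum over all lower taxa."""
    return set().union(*lower_sums)


# General habitat example: one source reports lakes, another lakes and streams.
reports = [{"lakes"}, {"lakes", "streams"}]
total, product, inconsistent = summarize(reports)
print(sorted(total), sorted(product), sorted(inconsistent), len(reports))
```

Because the product is always a subset of the sum, the exclusive OR here equals the sum minus the product; `len(reports)` stands in for the source count the system keeps for each taxon and category.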
  Information retrieval and analysis are accomplished through an interactive program that uses simple commands for queries of the data base. These commands may be used to predict groups of organisms having a selected set of environmental characteristics. Species may be evaluated for consistency of associated environmental information, providing lists of potential indicators. Groups of organisms also may be defined using abbreviated taxonomic names for characterization and evaluation. Species or groups of species may be compared to determine the similarities and differences in their environmental requirements. Information may be selected by taxonomic level, number of sources, or groups of environmental parameters.
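A prediction query of this kind can be thought of as a subset test over each taxon's summary, and a comparison as set intersection and difference. The sketch below is a hypothetical illustration in Python, not the ERAPT command language; the species names, parameter codes, and function names are invented.

```python
# Hypothetical product summaries (consistently reported parameters) per species.
SUMMARIES = {
    "species A": {"pH:circumneutral", "current:fast", "bottom:gravel"},
    "species B": {"pH:circumneutral", "current:slow", "bottom:silt"},
    "species C": {"pH:circumneutral", "current:fast", "bottom:gravel", "bottom:sand"},
}


def predict(required: set[str]) -> list[str]:
    """Taxa whose summaries contain every requested environmental parameter."""
    return sorted(name for name, params in SUMMARIES.items() if required <= params)


def compare(a: str, b: str) -> dict[str, set[str]]:
    """Similarities and differences between two taxa's environmental summaries."""
    pa, pb = SUMMARIES[a], SUMMARIES[b]
    return {"shared": pa & pb, "only_first": pa - pb, "only_second": pb - pa}


print(predict({"current:fast", "bottom:gravel"}))
print(compare("species A", "species C"))
```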
  The program provides information on program commands, definitions of environmental retrieval codes, and a profile of the searches performed during the session. Individual search statements or complete terminal sessions may be saved and later recalled for use with data bases containing information on other groups of organisms. The program can retain an almost unlimited number of saved searches and can provide directories and lists of these upon request. The program permits customization for a variety of printing and nonprinting computer terminals and can be directed to pause between output pages. Input may come from sources other than a terminal, to permit other programs to interface directly with the system. Output may also be redirected to files or to a high-speed line printer.

Validation Methods
  Survey data for verification of the ERAPT system were obtained from the State of Ohio Environmental Protection Agency for 64 sites from 20 rivers, streams, and tributaries in the Cross Creek and Yellow Creek, Blanchard River, and Tuscarawas River systems in southeastern and northwestern Ohio. For each site, a list of fish species and abundances and descriptive information on the habitat were provided. This included the dissolved oxygen, temperature, pH, turbidity, bottom type, and general and specific habitat characteristics of most sites, with current, degradable organics, and pollution information provided for some of the sites.
  The five numerically dominant species at each site were entered into the ERAPT system and used to predict local environmental conditions. These species accounted for at least one half of the total fishes collected and, therefore, were taken to be representative of the environmental conditions present within the habitat sampled. The ERAPT system's compare function was used to identify the environmental characteristics common to each fish community. This information was then compared with the environmental description of each habitat.
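The selection and comparison steps can be sketched in Python as follows. This assumes the compare function intersects each dominant species' logical-sum summary, that is, it reports conditions every dominant species has been associated with; the report does not say which summary was used. Species labels, counts, and habitat sets are hypothetical.

```python
from collections import Counter


def dominant_species(counts: dict[str, int], n: int = 5) -> tuple[list[str], float]:
    """Return the n most abundant species and the fraction of all fish they represent."""
    top = Counter(counts).most_common(n)
    share = sum(c for _, c in top) / sum(counts.values())
    return [name for name, _ in top], share


def common_characteristics(species: list[str], summaries: dict[str, set[str]]) -> set[str]:
    """Environmental parameters shared by every species in the community."""
    return set.intersection(*(summaries[s] for s in species))


# Hypothetical catch at one site.
catch = {"A": 40, "B": 25, "C": 15, "D": 10, "E": 5, "F": 5}
names, share = dominant_species(catch)

# Hypothetical logical-sum summaries for the general habitat heading.
habitat = {
    "A": {"large river", "lake"},
    "B": {"large river", "small river"},
    "C": {"large river"},
    "D": {"large river", "lake", "pond"},
    "E": {"large river", "small river"},
}
print(names, share, common_characteristics(names, habitat))
```

Here the five dominant species make up 95% of the catch, above the one-half criterion used in the study, and the only general habitat they share is large river.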
  Habitat characterization information for each collecting site was transformed into ERAPT environmental parameter codes directly from field survey forms. These data then were compared with the environmental descriptions predicted by the system from the associated dominant fish community for each category of environmental characteristic. The compared data were considered to be in agreement if they shared at least one environmental parameter. For example, a general habitat predicted by the fish assemblage as lake or large river was considered to be in agreement with a habitat characterized as either large river or lake. Likewise, a community-predicted bottom type of gravel was considered to be in agreement with a habitat characterized as gravel, sand, and silt.
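The agreement rule amounts to a non-empty intersection between the predicted and observed parameter sets. A minimal Python sketch following the examples in the text; the function names and the third (mismatched) example are hypothetical.

```python
def in_agreement(predicted: set[str], observed: set[str]) -> bool:
    """True when prediction and field characterization share at least one parameter."""
    return not predicted.isdisjoint(observed)


def agreement_rate(pairs: list[tuple[set[str], set[str]]]) -> float:
    """Fraction of (predicted, observed) comparisons that agree."""
    return sum(in_agreement(p, o) for p, o in pairs) / len(pairs)


# The two examples from the text, plus a hypothetical mismatch.
examples = [
    ({"lake", "large river"}, {"large river"}),
    ({"gravel"}, {"gravel", "sand", "silt"}),
    ({"gravel"}, {"silt"}),
]
print([in_agreement(p, o) for p, o in examples], agreement_rate(examples))
```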

Validation Results
  In comparing 300 environmental characteristics obtained from field survey forms with those predicted by the ERAPT system from the dominant fish community, 285 showed agreement, for an overall successful prediction rate of 95% (Table 2). The successful prediction rates for the Cross Creek and Yellow Creek, Blanchard River, and Tuscarawas River systems were 95%, 97%, and 93%, respectively. There was 100% agreement between predicted and observed characterizations for temperature, current, and degradable organics in every river system for which these characteristics were evaluated. Those environmental parameters not in complete agreement between observed and predicted characterizations were general habitat (98%), dissolved oxygen (96%), specific habitat and bottom type (95%), and pH and turbidity (88%).
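The system-level rates can be recomputed from Table 2. In the Python sketch below, the number of sites evaluated is taken from the table, but the number of agreeing sites is back-calculated from the rounded percentages (for example, 81% of 16 sites taken as 13), so those agreement counts are an assumption.

```python
# (sites evaluated, sites in agreement) per characteristic; agreement counts back-calculated.
TABLE_2 = {
    "Cross & Yellow": {
        "DO": (13, 13), "Temperature": (18, 18), "pH": (16, 13), "Current": (2, 2),
        "Turbidity": (22, 18), "General habitat": (34, 34), "Bottom type": (33, 32),
        "Specific habitat": (31, 30),
    },
    "Tuscarawas": {
        "DO": (5, 5), "Temperature": (6, 6), "pH": (4, 4), "Current": (3, 3),
        "Turbidity": (2, 2), "General habitat": (11, 10), "Bottom type": (10, 8),
        "Organics": (3, 3), "Specific habitat": (10, 9), "Pollution": (2, 2),
    },
    "Blanchard": {
        "DO": (6, 5), "Temperature": (7, 7), "pH": (5, 5), "Turbidity": (8, 8),
        "General habitat": (17, 17), "Bottom type": (16, 16), "Specific habitat": (16, 15),
    },
}


def rate(rows) -> tuple[int, int, int]:
    """Return (sites, agreements, percent agreement rounded to a whole number)."""
    sites = sum(s for s, _ in rows)
    agree = sum(a for _, a in rows)
    return sites, agree, round(100 * agree / sites)


for system, rows in TABLE_2.items():
    print(system, rate(rows.values()))
all_rows = [row for rows in TABLE_2.values() for row in rows.values()]
print("Total", rate(all_rows))
```

The recomputed values (169 comparisons at 95%, 56 at 93%, 75 at 97%, and 300 at 95% overall with 285 agreements) match the rates reported in the text.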
  While there were some disagreements between predicted and reported environmental characterizations, they almost always were among very similar tolerance ranges or habitat types. Most of these disagreements were considered to be minor by the staff of the State of Ohio Environmental Protection Agency, given the precision of the original measurements. In addition, only one site had discrepancies between the observed and predicted characterizations of more than one kind of environmental parameter. This was site number 1 of Yellow Creek, where there were disagreements in both specific habitat and bottom type. Overall, ERAPT system predictions of habitat characteristics from the dominant fish assemblages closely approximated the information obtained from field surveys at all of the sites evaluated.

Future Development
  Use of environmental indicator organisms on a regional basis appears to be particularly promising. Assessment systems based on a sound knowledge of a local fauna would seem to have a high probability of success. Development of local or regional environmental data bases is within the capabilities of the ERAPT system. This will be greatly facilitated by development of microprocessor-based versions of the retrieval and analysis programs. Microprocessor implementation of the ERAPT system on computers such as the IBM PC and the Apple Macintosh is feasible, and a prototype microcomputer version of the retrieval and analysis program currently is being developed.
  An interesting capability of the ERAPT system is the potential to screen environmental data. Detection of apparent inconsistencies among the environmental tolerances of different organisms said to be collected together at a given site can be used both to flag errors in either data collection or organism identification and to evaluate the reliability of environmental data bases. Taxonomic experts could confirm the identifications of organisms flagged as inconsistent, providing a cost-effective means of evaluating the quality of biological information. When new environmental associations of organisms are found, the data bases can be updated.
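One simple way to detect such inconsistencies is to check, for each pair of taxa reported from the same site, whether their tolerance sets for a heading overlap at all. The Python sketch below assumes that rule; the report does not specify the exact test, and the taxa, tolerance values, and function name are hypothetical.

```python
from itertools import combinations


def flag_incompatible_pairs(
    site_taxa: list[str], tolerances: dict[str, set[str]]
) -> list[tuple[str, str]]:
    """Pairs of co-collected taxa whose tolerance sets for one heading do not overlap."""
    return [
        (a, b)
        for a, b in combinations(sorted(site_taxa), 2)
        if tolerances[a].isdisjoint(tolerances[b])
    ]


# Hypothetical dissolved-oxygen tolerances for three taxa reported from one site.
oxygen = {"taxon X": {"high"}, "taxon Y": {"high", "moderate"}, "taxon Z": {"low"}}
flags = flag_incompatible_pairs(["taxon Y", "taxon Z", "taxon X"], oxygen)
print(flags)
```

Because taxon Z appears in every flagged pair, it would be the natural candidate to send to a taxonomic expert for confirmation.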
  Since the ERAPT system is designed to accept input from files, it could be interfaced with ongoing environmental data collection projects to summarize and evaluate information and to scan for probable changes in environmental conditions. Interfacing programs could convert numerical environmental data and codes unique to local or regional data collection projects so that they can be used with the ERAPT system. Output from the ERAPT system also might be used by other programs for environmental decision making and evaluation. Sources of environmental information within the data bases also can be evaluated. It is possible to identify and eliminate sources that provide environmental information that frequently contradicts that supplied by other workers for the same organisms.
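A source's reliability might be scored by how often its reports conflict with those of other workers. The Python sketch below uses one possible definition: a report contradicts when it shares no parameter with all other sources combined for the same taxon and heading. This rule, any threshold for elimination, and the sample data are assumptions, not part of the report.

```python
from collections import defaultdict


def contradiction_rates(reports):
    """reports: iterable of (source, taxon, heading, parameters).

    A source's report counts as a contradiction when it shares no parameter with
    the combined reports of all other sources for the same taxon and heading.
    Returns {source: contradictions / comparisons}.
    """
    by_key = defaultdict(list)
    for source, taxon, heading, params in reports:
        by_key[(taxon, heading)].append((source, set(params)))

    tally = defaultdict(lambda: [0, 0])  # source -> [contradictions, comparisons]
    for entries in by_key.values():
        for i, (source, params) in enumerate(entries):
            others = [p for j, (_, p) in enumerate(entries) if j != i]
            if not others:
                continue  # nothing to compare against
            tally[source][1] += 1
            if params.isdisjoint(set().union(*others)):
                tally[source][0] += 1
    return {source: c / n for source, (c, n) in tally.items()}


# Hypothetical general habitat reports from three sources.
reports = [
    ("S1", "taxon 1", "general habitat", {"lakes"}),
    ("S2", "taxon 1", "general habitat", {"lakes", "streams"}),
    ("S3", "taxon 1", "general habitat", {"ponds"}),
    ("S1", "taxon 2", "general habitat", {"streams"}),
    ("S3", "taxon 2", "general habitat", {"streams"}),
]
rates = contradiction_rates(reports)
print(rates)
```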
  Because the ERAPT programs are independent of the kinds of data and organisms they process, the system has many applications in other areas of biological and environmental research. Alternate sets of biological categories or environmental parameters can be developed easily for terrestrial or marine environments or for microbial organisms or higher vertebrate groups. The system also has direct application to behavioral, physiological, and genetic organism information. Because of the structure and internal organization of the data used by the system, there is no limit to the amount of information that can be accumulated or queried for each kind of organism contained within a data base. System capacities are related only to the number of kinds of organisms and data categories that can be accessed and evaluated as a group, and are easily modified within programs. There is also no limit to the number of groups that can be developed. Programs permit transfer of search statements between organism groups if they have compatible data.

Table 2.    Results of ERAPT system validation study

                    Cross & Yellow      Tuscarawas          Blanchard           Total
                    No. of              No. of              No. of              No. of
River Systems       Sites  Agreement    Sites  Agreement    Sites  Agreement    Sites  Agreement
DO (mg/l)              13       100%        5       100%        6        83%       24        96%
Temperature            18       100%        6       100%        7       100%       31       100%
pH                     16        81%        4       100%        5       100%       25        88%
Current                 2       100%        3       100%        -          -        5       100%
Turbidity              22        82%        2       100%        8       100%       32        88%
General Habitat        34       100%       11        91%       17       100%       62        98%
Bottom Type            33        97%       10        80%       16       100%       59        95%
Organics                -          -        3       100%        -          -        3       100%
Specific Habitat       31        97%       10        90%       16        94%       57        95%
Pollution               -          -        2       100%        -          -        2       100%
   R. A. Hellenthal and C. L. Dawson are with the University of Notre Dame,
     Notre Dame, IN 46556.
   Phil Larson is the EPA Project Officer (see below).
   The complete report, entitled "A Computerized System for the Evaluation of
     Aquatic Habitats Based on Environmental Requirements and Pollution
     Tolerance Associations of Resident Organisms," (Order No. PB 86-167 343/AS;
     Cost: $16.95, subject to change) will be available only from:
          National Technical Information Service
          5285 Port Royal Road
          Springfield, VA 22161
          Telephone: 703-487-4650
   The EPA Project Officer can be contacted at:
          Environmental Research Laboratory
          U.S. Environmental Protection Agency
          Corvallis, OR 97333

U.S. GOVERNMENT PRINTING OFFICE: 1986/646 116/20811

Center for Environmental Research
Information
Cincinnati OH 45268