United States
Environmental Protection
Agency
Environmental Research
Laboratory
Corvallis, OR 97333
Research and Development
EPA/600/S3-86/019 April 1986
Project Summary
A Computerized System for the
Evaluation of Aquatic Habitats
Based on Environmental
Requirements and Pollution
Tolerance Associations of
Resident Organisms
Clyde L. Dawson and Ronald A. Hellenthal
The Environmental Requirements and
Pollution Tolerance (ERAPT) system is a
computerized retrieval and analysis system
for environmental information on aquatic
organisms. It can be used to predict
organism assemblages based on envir-
onmental conditions, to describe en-
vironmental characteristics associated
with plants and animals which inhabit
specific sites, and to compare site char-
acterizations with associated biological
communities for inconsistencies resulting
from organism misidentification or erron-
eous habitat information. It can also iden-
tify species that are particularly sensitive
to specific sets of environmental condi-
tions and can predict changes in species
composition that are likely to result from
environmental disturbance.
The system has been developed for an
IBM 370/3033* computer system. It can
be used interactively to predict and
evaluate aquatic habitats and organism
assemblages. ERAPT computerized data
bases contain 57,345 biological, physical,
chemical, and distributional characteristics
drawn from 883 sources for 1,691 species
of North American diatoms, blue-green
algae, the insect orders Ephemeroptera
(mayflies) and Plecoptera (stoneflies), the
dipteran family Chironomidae (midges),
and the fishes of U.S. Environmental
Protection Agency Region 5 (North Central
U.S.).
*Mention of trademarks or commercial products does
not constitute endorsement or recommendation for
use.
The system was tested by comparing its
prediction of environmental conditions
based on the dominant fish community
with environmental characteristics ob-
tained from aquatic habitat surveys for 64
sites on three river systems in Ohio. An
overall successful prediction rate of 95%
was achieved. Where disagreements be-
tween predicted and reported environmen-
tal characterizations occurred, they almost
always were among very similar tolerance
ranges or habitat types.
This Project Summary was developed
by EPA's Environmental Research Laboratory,
Corvallis, OR, to announce key findings
of the research project that is fully
documented in a separate report of the
same title (see Project Report ordering in-
formation at back).
Introduction
Aquatic organisms long have been
recognized as potentially useful indicators
of habitat conditions and water quality.
This is due to their ability to reflect con-
ditions through time, to demonstrate the
effects of disturbances after the environ-
ment has returned to apparently normal
physical and chemical conditions, to
integrate the effects of many different en-
vironmental factors and their interactions
simultaneously, and to provide a living
context for considerations of environmen-
tal quality. The basic requirements for
indicator organisms are that they must be
sensitive to environmental changes and
have life spans and generation lengths
which are appropriate for use in environ-
mental assessment, long enough to reflect
intermittent or occasional disturbances,
and short enough to subject sensitive life
stages to adverse environmental
conditions.
Although organisms have been a funda-
mental component of aquatic habitat
surveys, they play a relatively minor role
in aquatic assessment. One reason may be
the difficulties encountered in comparing
and summarizing information obtained by
different researchers. This may be due to
variability in environmental requirements
and habitat associations for some species
making them unsuitable as environmental
indicators or the result of inconsistent
methods of data collection or analysis. A
second factor may be the diverse outlets
for publication of these data, which often
appear only in progress and summary reports
that receive little dissemination. Another factor
is the difficulty in developing quality
control procedures for organism identifica-
tions. This has resulted in great variation
in the quality and reliability of published
associations between organisms and spe-
cific environmental conditions.
Environmental data and reports vary
widely in quality and reliability. Stand-
ardized methods for data acquisition and
analysis and quality control procedures are
reasonably well established for most bio-
logical, chemical, and physical environ-
mental data. However, the association of
specific organisms with these parameters
rests on the reliability of species identi-
fications performed by investigators of
varying backgrounds, experience, and ex-
pertise. At present, the primary quality
assurance mechanism for organism identi-
fications is the competence and dedica-
tion of individual researchers. While
independent verification of voucher
specimens by taxonomic experts is possi-
ble, this is a procedure which, even on a
small scale, is time consuming and costly
and, on a large scale, would significantly
exceed our present resource of taxonomic
experts.
Publications providing summary infor-
mation on the environmental relationships
and tolerances of aquatic organisms also
may be useful in identifying potential in-
dicator organisms and in establishing
tolerance levels of organisms to specific
environmental factors. However, unless
the reference publication survey is ex-
tremely exhaustive or very selective, it may
be difficult to establish useful trends
among the vast amounts of often con-
tradictory information reviewed. Further-
more, the fixed format of printed reports
severely limits the ways in which their data
can be used. It is often difficult to deter-
mine which environmental requirements
are consistent within a genus or higher
taxonomic level and which environmental
inconsistencies at the specific level could
be due to errors in data collection or
identification.
A computerized storage, retrieval and
evaluation system for these data has been
developed to enhance their usefulness by
allowing queries based on taxonomic
association, environmental parameters, or
a combination of these factors. Since the
environmental information for diverse tax-
onomic groups can be standardized, it is
possible to answer questions concerning
biological communities and their environ-
ments. A major application of this system
is the development of lists of taxa asso-
ciated with specific environmental con-
ditions which can serve as reference
communities for environmental scientists.
This system also can be used as a device
for scrutinizing environmental data by
comparing the known tolerances of
organisms reported in monitoring and im-
pact studies with the physical and
chemical characteristics of the habitats
from which they were collected. Dis-
crepancies encountered in these compar-
isons are flagged as potential errors in data
collection or specimen identification.
Another quality control device possible
with these data is comparing the known
ecological parameters associated with
combinations of taxa reported to be
collected together at individual sites. In-
consistencies in the environmental re-
quirements of these taxa also can be
flagged as potential errors in identification.
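The co-occurrence check described above can be illustrated with a minimal sketch. It assumes that the known parameters of each taxon under a single heading are held as a set. The taxon names, parameter labels, and the function name flag_inconsistent_pairs are hypothetical and are not taken from the ERAPT programs.

```python
from itertools import combinations


def flag_inconsistent_pairs(site_taxa, requirements):
    """Return pairs of co-reported taxa that share no parameter.

    site_taxa: names of taxa reported together at one site.
    requirements: dict mapping taxon name -> set of parameters
                  known for a single heading (for example, oxygen).
    """
    flagged = []
    for a, b in combinations(sorted(site_taxa), 2):
        ra = requirements.get(a, set())
        rb = requirements.get(b, set())
        # Taxa with no recorded information for the heading are skipped.
        if ra and rb and ra.isdisjoint(rb):
            flagged.append((a, b))
    return flagged


# Hypothetical example data, for illustration only.
oxygen = {
    "taxon_A": {"high"},
    "taxon_B": {"high", "moderate"},
    "taxon_C": {"low"},
}
pairs = flag_inconsistent_pairs(["taxon_A", "taxon_B", "taxon_C"], oxygen)
print(pairs)
```

A flagged pair indicates only that the recorded requirements do not overlap; whether the cause is a misidentification or a data collection error would still have to be determined by review.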
The Environmental Requirements and
Pollution Tolerance (ERAPT) information
retrieval and analysis system permits com-
puterized prediction and evaluation of
habitat characteristics and organism
assemblages. ERAPT system data bases
contain 57,660 parameter values repre-
senting environmental characterizations of
402 genera and 1,694 species of aquatic
organisms in the diatoms, blue-green
algae, the insect orders Ephemeroptera
and Plecoptera, the dipteran family
Chironomidae, and the fish species of U.S.
Environmental Protection Agency (EPA)
Region 5 (North Central U.S.). The en-
vironmental characterizations used by the
ERAPT system have been derived from
883 data sources, primarily from the
published literature (Table 1). This
represents a mean of 34 environmental
parameters from six data sources per
species, with four species per genus.
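The means quoted above can be checked against the totals reported in Table 1. The short calculation below is a sketch; it assumes that the "data sources per species" figure is derived from the Citations total, which the summary does not state explicitly.

```python
# Totals as reported in Table 1 of this summary.
entries = 57_660
species = 1_694
citations = 9_518
genera = 402

params_per_species = entries / species      # about 34.0
# Assumption: "data sources per species" is based on the Citations total.
sources_per_species = citations / species   # about 5.6
species_per_genus = species / genera        # about 4.2

print(round(params_per_species), round(sources_per_species), round(species_per_genus))
```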
Information for the environmental
characterization of diatoms, blue-green
algae, chironomids, Ephemeroptera and
Plecoptera was obtained from the EPA
Environmental Monitoring and Support
Laboratory, Cincinnati, Ohio. Characteriza-
tion of Region 5 fishes from primary and
secondary literature sources was done as
part of this project.
System Operation
The ERAPT system consists of an in-
tegrated set of computer programs for the
storage, retrieval, and manipulation of en-
vironmental requirements and pollution
tolerance information on aquatic
organisms. It has been developed for use
on an IBM 370/3033 computer system at
the University of Notre Dame. Data are
stored and manipulated as hierarchically
related environmental requirements and
pollution tolerance categories representing
tolerance ranges to specific pollutants or
environmental conditions, geographic
locations, general or specific habitat
characteristics, and periods of appearance,
emergence, or greatest abundance. At
present the system uses 21 heading
categories (stage, abundance, pollution
tolerance, optimal growth period, repro-
ductive season, reproductive behavior,
feeding behavior, hardness, salinity,
nutrients, degradable organics, pH, oxy-
gen, temperature, turbidity, current,
general habitat, specific habitat, bottom
type, ecosystem regions, political geo-
graphy) divided into 303 specific param-
eters. These categories and parameters
were adapted from a set developed by the
Aquatic Biology Section, EPA Environmen-
tal Monitoring and Support Laboratory
(Cincinnati) for use with macroinverte-
brates and diatoms, and were expanded
for fish and other aquatic organisms as
part of this project in collaboration with
the staff of the EPA Corvallis Environmen-
tal Research Laboratory.
Data for ERAPT are encoded on tabular
forms which can be produced by the
system. These forms contain boxes cor-
responding to specific parameters such as
stage, feeding behavior, or tolerance to environmental
conditions for an individual
species from a single reference source.
The ERAPT system reads the tabular
forms as digitized X-Y coordinate values
corresponding to each mark on a form.
Table 1. Summary totals for information in the ERAPT data bases.
BLGR, Blue-green algae; DIAT, Diatoms; CHIR, Chironomids;
EPHE, Ephemeroptera; PLEC, Plecoptera; FISH, Fishes of EPA Region 5
               BLGR    DIAT    CHIR    EPHE    PLEC    FISH   Total
Authors         420      48      34     200     125      56     883
Records         221     342     306     459     450     446   2,224
Genera           50      50      74      60      86      82     402
Species         161     295     232     399     364     243   1,694
Citations     2,289   2,726     529   1,207   1,767   1,000   9,518
Entries       4,364   5,510   8,156   6,806   7,685  25,139  57,660
The digitized data are standardized for
the ERAPT system. During this process
both direct and hierarchical correspond-
ences between categories are established.
For example, one set of data entry forms
may contain the environmental category
Mesosaprobic, whereas another may
divide this category into alpha and beta
ranges. During this step these parameters
are associated so that a parameter shown
at a lower level of the environmental
category hierarchy is included at all upper
levels. For example, the heading salinity is
divided into 11 categories with the cate-
gories Mesohalobous and Oligohalobous
further subdivided into two and three
parameters, respectively (Figure 1). This
structure permits the ERAPT system to
store, manipulate and use environmental
information differing in precision without
sacrificing the most reliable data.
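The inclusion of lower-level parameters at all upper levels can be sketched as a walk up a child-to-parent table. The example below is illustrative only: the salinity groupings follow the text and Figure 1, but the data structure and the function name with_ancestors are assumptions, not the ERAPT implementation.

```python
# Child -> parent links for part of the salinity heading, following
# the groupings described in the text and in Figure 1.
PARENT = {
    "alpha-Mesohalobous": "Mesohalobous",
    "beta-Mesohalobous": "Mesohalobous",
    "Halophilous": "Oligohalobous",
    "Haline Indifferent": "Oligohalobous",
    "Halophobous": "Oligohalobous",
}


def with_ancestors(params):
    """Return params plus every upper-level category they imply."""
    expanded = set()
    for p in params:
        while p is not None:
            expanded.add(p)
            p = PARENT.get(p)  # None once the top level is reached
    return expanded


coarse = with_ancestors({"Mesohalobous"})        # imprecise source
fine = with_ancestors({"alpha-Mesohalobous"})    # more precise source
print(sorted(coarse), sorted(fine))
```

Under this arrangement a query for Mesohalobous matches both records, while a query for alpha-Mesohalobous matches only the more precise one.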
The next procedure is the creation of a
searchable data base. At this point the
various components of the system are
linked. The taxonomic categories are
hierarchically connected in a manner similar
to that described previously for environmental
headings and parameters. This
enables queries at any level of the taxonomic
hierarchy. The linking operation
involves the association of the taxonomic
names, environmental headings and pa-
rameters plus their definitions, author cita-
tions and references, environmental data,
and a series of 2- to 8-character abbreviations
which are used in the query and
information retrieval process. The lowest
taxonomic level directly accessible by the
system is subspecies. However, data from
individual sources are also available.
The system uses two summaries of the
information contained in reference sources
for each taxon, a logical sum and logical
product. The logical sum includes all en-
vironmental parameters given by any
source for each taxon in the system. For
example, if one investigator reported that
a species occurs in streams and another
reported that the same species occurs in
both lakes and streams, the logical sum for
general habitat category for that species
would include both lakes and streams.
  Poly- or Euhalobous, marine forms (SA:POLEUH)
    Polyhalobous (SA:POPOLY)
    Euhalobous (SA:POEUHA)
  Mesohalobous, brackish-water forms (SA:MESOHA)
    α-Mesohalobous (SA:MEALPH)
    β-Mesohalobous
  Oligohalobous, freshwater forms (SA:OLIGOH)
    Halophilous
    Haline Indifferent
    Halophobous
Figure 1. Hierarchy diagram for salinity in mg/l.
The logical product includes only those parameters
indicated by all sources containing
environmental requirements or pollution
tolerance information within each heading
category for a taxon. Therefore, each
product summary consists of those en-
vironmental parameters which are con-
sistently associated with a taxon in the
system. For the general habitat example
given above, the product summary would
show only streams, since it was the only
parameter indicated by all sources of information
for the taxon. Inconsistencies
among investigators reporting information
about any heading category for a taxon
may be obtained by calculating the logical
difference (exclusive OR) between the
sum and product summaries. In the
general habitat example, lakes would be
considered an inconsistent parameter
since one source used the parameter and
the other did not. For taxonomic categories
above species, similar data summaries
are maintained based on logical
sums of the information for all lower tax-
onomic levels. The system also maintains
the number of data sources which were
summarized for each taxon and environ-
mental category.
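The three summaries correspond to ordinary set operations: union for the logical sum, intersection for the logical product, and symmetric difference for the exclusive OR. The following is a minimal sketch under the assumption that each source's report for one heading is held as a set; the function name summarize is hypothetical.

```python
from functools import reduce


def summarize(source_reports):
    """Summarize one heading for one taxon.

    source_reports: list of sets, one per data source. Sources with
    no information for the heading (empty sets) are ignored, so the
    product covers only sources that report on the heading.
    Returns (logical sum, logical product, inconsistent, n sources).
    """
    informative = [s for s in source_reports if s]
    if not informative:
        return set(), set(), set(), 0
    logical_sum = set().union(*informative)                   # any source
    logical_product = reduce(set.intersection, informative)   # all sources
    inconsistent = logical_sum ^ logical_product              # exclusive OR
    return logical_sum, logical_product, inconsistent, len(informative)


# One source reports streams; another reports lakes and streams.
reports = [{"streams"}, {"lakes", "streams"}]
total, product, inconsistent, n_sources = summarize(reports)
print(total, product, inconsistent, n_sources)
```

For these two reports the sum holds both habitats, the product holds only the habitat named by both sources, and the difference holds the habitat named by just one of them.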
Information retrieval and analysis are ac-
complished through an interactive pro-
gram which uses simple commands for
queries of the data base. These commands
may be used to predict groups of organ-
isms having a select set of environmental
characteristics. Species may be evaluated
for consistency of associated environmen-
tal information, providing lists of potential
indicators. Groups of organisms also may
be defined using abbreviated taxonomic
names for characterization and evaluation.
Species or groups of species may be com-
pared to determine the similarities and
differences in their environmental re-
quirements. Information may be selected
by taxonomic level, number of sources, or
groups of environmental parameters.
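A query of the kind described, selecting organisms by environmental characteristics and by number of sources, might be sketched as follows. The ERAPT command language itself is not reproduced in this summary, so the function select_taxa, the record layout, and the example data are all hypothetical.

```python
def select_taxa(summaries, required, min_sources=1):
    """Return taxa whose consistent parameters include all of `required`.

    summaries: dict mapping taxon -> {"product": set of parameters,
                                      "n_sources": int}
    """
    required = set(required)
    return sorted(
        taxon
        for taxon, info in summaries.items()
        if required <= info["product"] and info["n_sources"] >= min_sources
    )


# Hypothetical summaries, for illustration only.
summaries = {
    "species_1": {"product": {"streams", "gravel"}, "n_sources": 4},
    "species_2": {"product": {"streams", "silt"}, "n_sources": 2},
    "species_3": {"product": {"streams", "gravel"}, "n_sources": 1},
}
hits = select_taxa(summaries, {"streams", "gravel"}, min_sources=2)
print(hits)
```

Filtering on the product summary rather than the sum restricts the result to taxa that all sources associate with the requested conditions, which is one plausible way to build a list of candidate indicators.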
The program provides information on
program commands, definitions of en-
vironmental retrieval codes and a profile
of the searches performed during the
session. Individual search statements or
complete terminal sessions may be saved
and later recalled for use with data bases
containing information on other groups of
organisms. The program can retain an
almost unlimited number of saved
searches and can provide directories and
lists of these upon request. The program
will permit customization for a variety of
printing and nonprinting computer ter-
minals and can be directed to pause be-
tween output pages. Input may come from
sources other than a terminal to permit
other programs to interface directly with
the system. Output may also be redirected
to files or to a high-speed lineprinter.
Validation Methods
Survey data for verification of the
ERAPT system were obtained from the
State of Ohio Environmental Protection
Agency for 64 sites from 20 rivers,
streams, and tributaries in the Cross Creek
and Yellow Creek, Blanchard River, and
Tuscarawas River systems in southeastern
and northwestern Ohio. For each site a list
of fish species and abundances and
descriptive information on the habitat was
provided. This included the dissolved ox-
ygen, temperature, pH, turbidity, bottom
type, and general and specific habitat
characteristics of most sites, with current,
degradable organics, and pollution infor-
mation provided for some of the sites.
The five numerically dominant species
at each site were entered into the ERAPT
system and used to predict local environ-
mental conditions. These species ac-
counted for at least one half of the total
fishes collected and, therefore, were taken
to be representative of the environmental
conditions present within the habitat
sampled. The ERAPT system's compare
function was used to identify the environ-
mental characteristics common to each
fish community. This information was
used for comparison with the environmen-
tal description of each habitat.
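The selection of dominant species and the derivation of their shared characteristics can be sketched as below. This is an illustration under stated assumptions: the behavior of the ERAPT compare function is modeled here as a plain set intersection, and the species labels, counts, and parameters are hypothetical.

```python
def dominant_species(counts, n=5):
    """Return the n most abundant species and their share of the catch."""
    ranked = sorted(counts, key=counts.get, reverse=True)[:n]
    share = sum(counts[s] for s in ranked) / sum(counts.values())
    return ranked, share


def common_parameters(species, parameters):
    """Intersect the parameter sets of the given species (assumed model)."""
    sets = [parameters[s] for s in species]
    return set.intersection(*sets) if sets else set()


# Hypothetical catch and parameter data, for illustration only.
counts = {"sp1": 40, "sp2": 25, "sp3": 10, "sp4": 8, "sp5": 7, "sp6": 6, "sp7": 4}
parameters = {
    "sp1": {"large river", "lake"},
    "sp2": {"large river", "lake", "stream"},
    "sp3": {"large river"},
    "sp4": {"large river", "lake"},
    "sp5": {"large river", "stream"},
}
top, share = dominant_species(counts)
predicted = common_parameters(top, parameters)
print(top, round(share, 2), predicted)
```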
Habitat characterization information for
each collecting site was transformed into
ERAPT environmental parameter codes
directly from field survey forms. These
data then were compared with the envir-
onmental descriptions predicted by the
system from the associated dominant fish
community for each category of environ-
mental characteristic. The compared data
were considered to be in agreement if they
shared common environmental param-
eters. For example, a general habitat
predicted by the fish assemblage as lake
or large river was considered to be in
agreement with a habitat characterized as
either large river or lake. Likewise, a com-
munity predicted bottom type of gravel
was considered to be in agreement with
a habitat characterized as gravel, sand,
and silt.
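The agreement criterion stated above, that the two characterizations share at least one parameter, amounts to testing for a non-empty intersection. A minimal sketch follows; the function name in_agreement is hypothetical, and the third case is an invented mismatch added for contrast.

```python
def in_agreement(predicted, observed):
    """True when the two characterizations share at least one parameter."""
    return not set(predicted).isdisjoint(observed)


# The two examples given in the text.
habitat_ok = in_agreement({"lake", "large river"}, {"large river"})
bottom_ok = in_agreement({"gravel"}, {"gravel", "sand", "silt"})
# A hypothetical mismatch, for contrast.
mismatch = in_agreement({"gravel"}, {"sand", "silt"})
print(habitat_ok, bottom_ok, mismatch)
```

This criterion is lenient by design: a prediction naming several categories agrees with a site if any one of them was observed.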
Validation Results
In comparing 300 environmental
characteristics obtained from field survey
forms to those predicted by the ERAPT
system from the dominant fish communi-
ty, 285 showed agreement for an overall
successful prediction rate of 95% (Table
2). The successful prediction rates for the
Cross Creek and Yellow Creek, Blanchard
River, and Tuscarawas River systems were
95%, 97%, and 93%, respectively. There
was 100% agreement between predicted
and observed characterizations for
temperature, current, and degradable or-
ganics in all three river systems. Those
environmental parameters not in complete
agreement between observed and pre-
dicted characterizations were: general
habitat (98%), dissolved oxygen (96%),
specific habitat and bottom type (95%),
and pH and turbidity (88%).
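The overall rate can be reproduced from the Total columns of Table 2. The calculation below is a consistency check, not part of the original analysis; it assumes that the number of agreeing comparisons for each characteristic can be recovered by rounding the site count multiplied by the reported percentage.

```python
# (number of comparisons, percent agreement) from the Total columns
# of Table 2, one pair per environmental characteristic.
totals = {
    "dissolved oxygen": (24, 96),
    "temperature": (31, 100),
    "pH": (25, 88),
    "current": (5, 100),
    "turbidity": (32, 88),
    "general habitat": (62, 98),
    "bottom type": (59, 95),
    "degradable organics": (3, 100),
    "specific habitat": (57, 95),
    "pollution": (2, 100),
}

compared = sum(n for n, _ in totals.values())
# Assumption: agreeing counts are recovered by rounding n * percent / 100.
agreed = sum(round(n * pct / 100) for n, pct in totals.values())
rate = agreed / compared
print(compared, agreed, f"{rate:.0%}")
```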
While there were some disagreements
between predicted and reported environ-
mental characterizations, they almost
always were among very similar tolerance
ranges or habitat types. Most of these
disagreements were considered minor
by the staff of the State of Ohio Environmental
Protection Agency, given the
precision of the original measurements. In
addition, only one site had discrepancies
between the observed and predicted char-
acterizations of more than one kind of
environmental parameter. This was site
number 1 of Yellow Creek where there
were disagreements in both specific
habitat and bottom type. Overall, ERAPT
system predictions of habitat character-
istics from the dominant fish assemblages
closely approximated the information ob-
tained from field surveys at all of the sites
evaluated.
Future Development
Use of environmental indicator
organisms on a regional basis appears to
be particularly promising. Assessment
systems based on a sound knowledge of
a local fauna would seem to have a great
probability for success. Development of
local or regional environmental data bases
is within the capabilities of the ERAPT
system. This will be greatly facilitated by
development of microprocessor-based ver-
sions of the retrieval and analysis pro-
grams. Microprocessor implementation of
the ERAPT system on computers such as
the IBM PC and the Apple Macintosh is
feasible and a prototype microcomputer
version of the retrieval and analysis pro-
gram currently is being developed.
An interesting capability of the ERAPT
system is the potential to screen envir-
onmental data. Detection of apparent in-
consistencies among the environmental
tolerances of different organisms said to
be collected together at a given site can
be used both to flag errors in either data
collection or organism identifications and
to evaluate the reliability of environmental
data bases. Taxonomic experts could con-
firm the identifications of organisms
flagged as inconsistent, providing a cost-
effective means of evaluating the quality
of biological information. When new en-
vironmental associations of organisms are
found, the data bases can be updated.
Since the ERAPT system is designed to
accept input from files, it could be inter-
faced with ongoing environmental data
collection projects to summarize and
evaluate information and to scan for pro-
bable changes in environmental condi-
tions. Interfacing programs could convert
numerical environmental data and codes
unique to local or regional data collection
projects so that they can be used with the
ERAPT system. Output from the ERAPT
system also might be used by other pro-
grams for environmental decision making
and evaluation. Sources of environmental
information within the data bases also can
be evaluated. It is possible to identify and
eliminate sources that provide environ-
mental information which frequently con-
tradicts that supplied by other workers for
the same organisms.
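One way the evaluation of data sources could be approached is to score each source by how often its reported parameters are not backed by any other source for the same organism. The sketch below is a crude proxy for "contradiction" under that assumption; the scoring rule, the function name unsupported_rates, and the example references are hypothetical and are not described in the project report.

```python
def unsupported_rates(reports):
    """Score each source by the share of its parameters no one else reports.

    reports: dict mapping taxon -> dict mapping source -> set of
             parameters for one heading.
    Taxa documented by a single source are skipped, since there is
    nothing to compare against.
    """
    reported, unsupported = {}, {}
    for by_source in reports.values():
        if len(by_source) < 2:
            continue
        for source, params in by_source.items():
            others = set().union(
                *(p for s, p in by_source.items() if s != source)
            )
            reported[source] = reported.get(source, 0) + len(params)
            unsupported[source] = unsupported.get(source, 0) + len(params - others)
    return {s: unsupported[s] / n for s, n in reported.items() if n}


# Hypothetical reports, for illustration only.
reports = {
    "species_1": {"ref_A": {"streams"}, "ref_B": {"streams"}, "ref_C": {"lakes"}},
    "species_2": {"ref_A": {"gravel"}, "ref_C": {"silt"}},
}
rates = unsupported_rates(reports)
print(rates)
```

A high score would mark a source for review rather than automatic removal, since an unsupported report may also reflect a genuine new association.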
Because the ERAPT programs are in-
dependent of the kinds of data and
organisms which they can process, the
system has many applications in other
areas of biological and environmental
research. Alternate sets of biological
categories or environmental parameters
can be developed easily for terrestrial or
marine environments or for microbial
organisms or higher vertebrate groups.
The system also has direct application to
behavioral, physiological, and genetic
organism information. Because of the
structure and internal organization of the
data used by the system, there is no limit
to the amount of information which can
be accumulated or queried for each kind
of organism contained within a data base.
System capacities are related only to the
number of kinds of organisms and data
categories which can be accessed and
evaluated as a group, and are easily
modified within programs. There is also no
limit to the number of groups which can
be developed. Programs permit transfer of
search statements between organism
groups if they have compatible data.
Table 2. Results of ERAPT system validation study
River Systems      Cross & Yellow    Tuscarawas        Blanchard         Total
                   No. of            No. of            No. of            No. of
                   Sites  Agreement  Sites  Agreement  Sites  Agreement  Sites  Agreement
DO2 mg/l              13       100%      5       100%      6        83%     24        96%
Temperature           18       100%      6       100%      7       100%     31       100%
pH                    16        81%      4       100%      5       100%     25        88%
Current                2       100%      3       100%      -          -      5       100%
Turbidity             22        82%      2       100%      8       100%     32        88%
General Habitat       34       100%     11        91%     17       100%     62        98%
Bottom Type           33        97%     10        80%     16       100%     59        95%
Organics               -          -      3       100%      -          -      3       100%
Specific Habitat      31        97%     10        90%     16        94%     57        95%
Pollution              -          -      2       100%      -          -      2       100%
R. A. Hellenthal and C. L. Dawson are with the University of Notre Dame, Notre
Dame, IN 46556.
Phil Larson is the EPA Project Officer (see below).
The complete report, entitled "A Computerized System for the Evaluation of
Aquatic Habitats Based on Environmental Requirements and Pollution Toler-
ance Associations of Resident Organisms," (Order No. PB 86-167 343/AS;
Cost: $16.95, subject to change) will be available only from:
National Technical Information Service
5285 Port Royal Road
Springfield, VA 22161
Telephone: 703-487-4650
The EPA Project Officer can be contacted at:
Environmental Research Laboratory
U.S. Environmental Protection Agency
Corvallis, OR 97333
U.S. GOVERNMENT PRINTING OFFICE: 1986/646 116/20811
Center for Environmental Research
Information
Cincinnati OH 45268