United States Environmental Protection Agency
Environmental Research Laboratory
Corvallis, OR 97333
Research and Development
EPA/600/S3-86/019 April 1986

Project Summary

A Computerized System for the Evaluation of Aquatic Habitats Based on Environmental Requirements and Pollution Tolerance Associations of Resident Organisms

Clyde L. Dawson and Ronald A. Hellenthal

The Environmental Requirements and Pollution Tolerance (ERAPT) system is a computerized retrieval and analysis system for environmental information on aquatic organisms. It can be used to predict organism assemblages from environmental conditions, to describe the environmental characteristics associated with plants and animals that inhabit specific sites, and to check site characterizations against their associated biological communities for inconsistencies resulting from organism misidentification or erroneous habitat information. It can also identify species that are particularly sensitive to specific sets of environmental conditions and predict changes in species composition that are likely to result from environmental disturbance.

The system has been developed for an IBM 370/3033* computer system and can be used interactively to predict and evaluate aquatic habitats and organism assemblages. ERAPT computerized data bases contain 57,345 biological, physical, chemical, and distributional characteristics drawn from 883 sources for 1,691 species of North American diatoms, blue-green algae, the insect orders Ephemeroptera (mayflies) and Plecoptera (stoneflies), the dipteran family Chironomidae (midges), and the fishes of U.S. Environmental Protection Agency Region 5 (North Central U.S.).

*Mention of trademarks or commercial products does not constitute endorsement or recommendation for use.
The system was tested by comparing its predictions of environmental conditions, based on the dominant fish community, with environmental characteristics obtained from aquatic habitat surveys for 64 sites on three river systems in Ohio. An overall successful prediction rate of 95% was achieved. Where predicted and reported environmental characterizations disagreed, the disagreements almost always involved very similar tolerance ranges or habitat types.

This Project Summary was developed by EPA's Environmental Research Laboratory, Corvallis, OR, to announce key findings of the research project that is fully documented in a separate report of the same title (see Project Report ordering information at back).

Introduction

Aquatic organisms have long been recognized as potentially useful indicators of habitat conditions and water quality. This is due to their ability to reflect conditions through time, to demonstrate the effects of disturbances after the environment has returned to apparently normal physical and chemical conditions, to integrate the effects of many different environmental factors and their interactions simultaneously, and to provide a living context for considerations of environmental quality. The basic requirements for indicator organisms are that they be sensitive to environmental changes and have life spans and generation lengths appropriate for environmental assessment: long enough to reflect intermittent or occasional disturbances, and short enough that sensitive life stages are exposed to adverse environmental conditions.

Although organisms have been a fundamental component of aquatic habitat surveys, they play a relatively minor role in aquatic assessment. One reason may be the difficulties encountered in comparing and summarizing information obtained by different researchers.
These difficulties may arise from variability in the environmental requirements and habitat associations of some species, which makes those species unsuitable as environmental indicators, or from inconsistent methods of data collection or analysis. A second factor may be the diversity of publication outlets for these data, which often appear only in progress and summary reports that receive little distribution. Another factor is the difficulty of developing quality control procedures for organism identifications. This has resulted in great variation in the quality and reliability of published associations between organisms and specific environmental conditions.

Environmental data and reports vary widely in quality and reliability. Standardized methods for data acquisition and analysis, and quality control procedures, are reasonably well established for most biological, chemical, and physical environmental data. However, the association of specific organisms with these parameters rests on the reliability of species identifications performed by investigators of varying backgrounds, experience, and expertise. At present, the primary quality assurance mechanism for organism identifications is the competence and dedication of individual researchers. While independent verification of voucher specimens by taxonomic experts is possible, this procedure is time consuming and costly even on a small scale and, on a large scale, would far exceed the present supply of taxonomic experts.

Publications providing summary information on the environmental relationships and tolerances of aquatic organisms also may be useful in identifying potential indicator organisms and in establishing the tolerance levels of organisms to specific environmental factors. However, unless the survey of reference publications is extremely exhaustive or very selective, it may be difficult to establish useful trends among the vast amounts of often contradictory information reviewed. Furthermore, the fixed format of printed reports severely limits the ways in which their data can be used. It is often difficult to determine which environmental requirements are consistent within a genus or higher taxonomic level, and which inconsistencies at the species level could be due to errors in data collection or identification.

A computerized storage, retrieval, and evaluation system for these data has been developed to enhance their usefulness by allowing queries based on taxonomic association, environmental parameters, or a combination of these factors. Because the environmental information for diverse taxonomic groups can be standardized, it is possible to answer questions concerning biological communities and their environments. A major application of this system is the development of lists of taxa associated with specific environmental conditions, which can serve as reference communities for environmental scientists.

This system also can be used to scrutinize environmental data by comparing the known tolerances of organisms reported in monitoring and impact studies with the physical and chemical characteristics of the habitats from which they were collected. Discrepancies encountered in these comparisons are flagged as potential errors in data collection or specimen identification. Another possible quality control device is to compare the known ecological parameters associated with combinations of taxa reported to have been collected together at individual sites. Inconsistencies in the environmental requirements of these taxa also can be flagged as potential errors in identification.

The Environmental Requirements and Pollution Tolerance (ERAPT) information retrieval and analysis system permits computerized prediction and evaluation of habitat characteristics and organism assemblages.
ERAPT system data bases contain 57,660 parameter values representing environmental characterizations of 402 genera and 1,694 species of aquatic organisms in the diatoms, blue-green algae, the insect orders Ephemeroptera and Plecoptera, the dipteran family Chironomidae, and the fish species of U.S. Environmental Protection Agency (EPA) Region 5 (North Central U.S.). The environmental characterizations used by the ERAPT system have been derived from 883 data sources, primarily the published literature (Table 1). On average, this represents 34 environmental parameter values from about six data sources per species, and about four species per genus. Information for the environmental characterization of diatoms, blue-green algae, chironomids, Ephemeroptera, and Plecoptera was obtained from the EPA Environmental Monitoring and Support Laboratory, Cincinnati, Ohio. Characterization of Region 5 fishes from primary and secondary literature sources was done as part of this project.

System Operation

The ERAPT system consists of an integrated set of computer programs for the storage, retrieval, and manipulation of environmental requirements and pollution tolerance information on aquatic organisms. It has been developed for use on an IBM 370/3033 computer system at the University of Notre Dame. Data are stored and manipulated as hierarchically related environmental requirements and pollution tolerance categories representing tolerance ranges to specific pollutants or environmental conditions, geographic locations, general or specific habitat characteristics, and periods of appearance, emergence, or greatest abundance.
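The averages stated above (34 parameter values from about six sources per species, about four species per genus) can be cross-checked against the Table 1 totals. A minimal Python sketch, assuming that "data sources per species" corresponds to the Citations row of Table 1; the summary does not say which row was used:

```python
# Totals from Table 1 of this summary.
ENTRIES = 57_660    # environmental parameter values in the data bases
CITATIONS = 9_518   # source citations (assumed to be the "data sources" counted per species)
SPECIES = 1_694
GENERA = 402


def mean_per(total: int, units: int) -> float:
    """Average of `total` per unit, rounded to one decimal place."""
    return round(total / units, 1)


params_per_species = mean_per(ENTRIES, SPECIES)     # about 34
sources_per_species = mean_per(CITATIONS, SPECIES)  # about 5.6, i.e. roughly six
species_per_genus = mean_per(SPECIES, GENERA)       # about 4.2, i.e. roughly four

if __name__ == "__main__":
    print(params_per_species, sources_per_species, species_per_genus)
```

Running it prints 34.0 5.6 4.2, consistent with the rounded figures in the text.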
At present the system uses 21 heading categories (stage, abundance, pollution tolerance, optimal growth period, reproductive season, reproductive behavior, feeding behavior, hardness, salinity, nutrients, degradable organics, pH, oxygen, temperature, turbidity, current, general habitat, specific habitat, bottom type, ecosystem regions, and political geography) divided into 303 specific parameters. These categories and parameters were adapted from a set developed by the Aquatic Biology Section, EPA Environmental Monitoring and Support Laboratory (Cincinnati), for use with macroinvertebrates and diatoms, and were expanded for fish and other aquatic organisms as part of this project in collaboration with the staff of the EPA Corvallis Environmental Research Laboratory.

Data for ERAPT are encoded on tabular forms, which can be produced by the system. These forms contain boxes corresponding to specific parameters, such as stage, feeding behavior, or tolerance to environmental conditions, for an individual species from a single reference source. The ERAPT system reads the tabular forms as digitized X-Y coordinate values corresponding to each mark on a form.

Table 1. Summary totals for information in the ERAPT data bases. BLGR, blue-green algae; DIAT, diatoms; CHIR, chironomids; EPHE, Ephemeroptera; PLEC, Plecoptera; FISH, fishes of EPA Region 5.

            BLGR   DIAT   CHIR   EPHE   PLEC    FISH   Total
Authors      420     48     34    200    125      56     883
Records      221    342    306    459    450     446   2,224
Genera        50     50     74     60     86      82     402
Species      161    295    232    399    364     243   1,694
Citations  2,289  2,726    529  1,207  1,767   1,000   9,518
Entries    4,364  5,510  8,156  6,806  7,685  25,139  57,660

The digitized data are standardized for the ERAPT system. During this process both direct and hierarchical correspondences between categories are established. For example, one set of data entry forms may contain the environmental category Mesosaprobic, whereas another may divide this category into alpha and beta ranges.
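The summary does not say how a digitized mark is matched to a box on the form. A minimal sketch, assuming each box occupies a known, non-overlapping rectangle in digitizer coordinates; the layout, coordinates, and parameter codes below are hypothetical, not ERAPT's:

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

# Hypothetical form layout: parameter code -> rectangle occupied by its box.
FORM_LAYOUT: Dict[str, Box] = {
    "GH:LAKE": (10, 10, 20, 15),
    "GH:STREAM": (25, 10, 35, 15),
    "PH:ACID": (10, 20, 20, 25),
    "PH:ALKAL": (25, 20, 35, 25),
}


def marks_to_parameters(marks: List[Tuple[float, float]],
                        layout: Dict[str, Box] = FORM_LAYOUT) -> List[str]:
    """Translate digitized (x, y) marks into the parameter codes of the boxes
    they fall in (edges inclusive). Marks outside every box are ignored."""
    codes = []
    for x, y in marks:
        for code, (x0, y0, x1, y1) in layout.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                codes.append(code)
                break  # boxes are assumed not to overlap, so stop at the first match
    return codes
```

For example, `marks_to_parameters([(12, 12), (30, 22), (50, 50)])` returns `['GH:LAKE', 'PH:ALKAL']`; the stray mark at (50, 50) is dropped. A working reader would probably report such stray marks for review rather than discard them silently.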
During this standardization step, these parameters are associated so that a parameter shown at a lower level of the environmental category hierarchy is also included at all upper levels. For example, the heading salinity is divided into 11 categories, with the categories Mesohalobous and Oligohalobous further subdivided into two and three parameters, respectively (Figure 1). This structure permits the ERAPT system to store, manipulate, and use environmental information differing in precision without sacrificing the most reliable data.

The next procedure is the creation of a searchable data base. At this point the various components of the system are linked. The taxonomic categories are hierarchically connected in a manner similar to that described previously for environmental headings and parameters, which enables queries at any level of the taxonomic hierarchy. The linking operation involves the association of the taxonomic names, environmental headings and parameters plus their definitions, author citations and references, environmental data, and a series of 2- to 8-character abbreviations that are used in the query and information retrieval process. The lowest taxonomic level directly accessible by the system is the subspecies, although data from individual sources are also available.

The system uses two summaries of the information contained in reference sources for each taxon: a logical sum and a logical product. The logical sum includes all environmental parameters given by any source for each taxon in the system. For example, if one investigator reported that a species occurs in streams and another reported that the same species occurs in both lakes and streams, the logical sum for the general habitat category for that species would include both lakes and streams.
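The rule stated above, that a parameter recorded at a lower level also counts at every upper level, can be sketched as a walk up a table of parent categories. The sketch covers only the Mesohalobous and Oligohalobous branches of salinity, with subdivision names as read from Figure 1; ERAPT's exact names and retrieval codes are assumptions here:

```python
from typing import Dict, Optional, Set

# Parent of each salinity parameter (None marks a top-level category).
# Names follow the Figure 1 labels as read; retrieval codes are omitted.
SALINITY_PARENT: Dict[str, Optional[str]] = {
    "Mesohalobous": None,
    "alpha-Mesohalobous": "Mesohalobous",
    "beta-Mesohalobous": "Mesohalobous",
    "Oligohalobous": None,
    "Halophilous": "Oligohalobous",
    "Haline indifferent": "Oligohalobous",
    "Halophobous": "Oligohalobous",
}


def roll_up(parameters: Set[str],
            parent: Dict[str, Optional[str]] = SALINITY_PARENT) -> Set[str]:
    """Return the given parameters plus all of their ancestors."""
    result: Set[str] = set()
    for p in parameters:
        node: Optional[str] = p
        while node is not None:
            result.add(node)
            node = parent[node]
    return result


# A coarse source and a finer source now share the upper-level parameter.
coarse = roll_up({"Mesohalobous"})
fine = roll_up({"beta-Mesohalobous"})
```

After the roll-up, a source that reported only Mesohalobous and one that reported beta-Mesohalobous both contain Mesohalobous, so their information can be compared at that level without discarding the finer report.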
The logical product includes only those parameters indicated by all sources containing environmental requirements or pollution tolerance information within each heading category for a taxon. Therefore, each product summary consists of those environmental parameters that are consistently associated with a taxon in the system. For the general habitat example given above, the product summary would show only streams, since it was the only parameter indicated by all sources of information for the taxon. Inconsistencies among investigators reporting information about any heading category for a taxon may be obtained by calculating the logical difference (exclusive OR) between the sum and product summaries. In the general habitat example, lakes would be considered an inconsistent parameter, since one source used the parameter and the other did not. For taxonomic categories above species, similar data summaries are maintained based on logical sums of the information for all lower taxonomic levels. The system also maintains the number of data sources that were summarized for each taxon and environmental category.

Figure 1. Hierarchy diagram for salinity in mg/l: Poly- or Euhalobous, marine forms (SA:POLEUH), comprising Polyhalobous (SA:POPOLY) and Euhalobous (SA:POEUHA); Mesohalobous, brackish-water forms (SA:MESOHA), comprising α-Mesohalobous (SA:MEALPH) and β-Mesohalobous; Oligohalobous, freshwater forms (SA:OLIGOH), comprising Halophilous, Haline Indifferent, and Halophobous.

Information retrieval and analysis are accomplished through an interactive program that uses simple commands to query the data base. These commands may be used to predict groups of organisms having a selected set of environmental characteristics. Species may be evaluated for consistency of associated environmental information, providing lists of potential indicators. Groups of organisms also may be defined using abbreviated taxonomic names for characterization and evaluation.
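The logical sum, logical product, and exclusive-OR difference described above map naturally onto set union, intersection, and symmetric difference. A minimal Python sketch, assuming each source's report for one heading category is a set of parameter names; `predict_taxa` is a hypothetical stand-in for a query that lists organisms having a selected set of characteristics, not an actual ERAPT command:

```python
from typing import Dict, List, Set


def logical_sum(reports: List[Set[str]]) -> Set[str]:
    """Parameters given by any source (set union)."""
    return set().union(*reports)


def logical_product(reports: List[Set[str]]) -> Set[str]:
    """Parameters given by every source (set intersection)."""
    if not reports:
        return set()
    return set(reports[0]).intersection(*reports[1:])


def inconsistent(reports: List[Set[str]]) -> Set[str]:
    """Exclusive OR of sum and product: parameters some sources gave, others not."""
    return logical_sum(reports) ^ logical_product(reports)


def predict_taxa(summaries: Dict[str, Set[str]], wanted: Set[str]) -> List[str]:
    """Taxa whose summary contains every requested parameter, sorted by name."""
    return sorted(t for t, params in summaries.items() if wanted <= params)


# General habitat example from the text: one source reports streams,
# another reports lakes and streams. (A genus summary would, in the same
# way, be the logical sum of its species' summaries.)
habitat_reports = [{"streams"}, {"lakes", "streams"}]
habitat_sum = logical_sum(habitat_reports)          # lakes and streams
habitat_product = logical_product(habitat_reports)  # streams only
habitat_conflicts = inconsistent(habitat_reports)   # lakes only
```

Because the product is always a subset of the sum, the exclusive OR here equals the plain set difference between sum and product.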
Species or groups of species may be compared to determine the similarities and differences in their environmental requirements. Information may be selected by taxonomic level, number of sources, or groups of environmental parameters. The program provides information on program commands, definitions of environmental retrieval codes, and a profile of the searches performed during the session. Individual search statements or complete terminal sessions may be saved and later recalled for use with data bases containing information on other groups of organisms. The program can retain an almost unlimited number of saved searches and can provide directories and lists of these upon request. The program can be customized for a variety of printing and nonprinting computer terminals and can be directed to pause between output pages. Input may come from sources other than a terminal, permitting other programs to interface directly with the system. Output may also be redirected to files or to a high-speed line printer.

Validation Methods

Survey data for verification of the ERAPT system were obtained from the State of Ohio Environmental Protection Agency for 64 sites on 20 rivers, streams, and tributaries in the Cross Creek and Yellow Creek, Blanchard River, and Tuscarawas River systems in southeastern and northwestern Ohio. For each site, a list of fish species and abundances and descriptive information on the habitat were provided. This included the dissolved oxygen, temperature, pH, turbidity, bottom type, and general and specific habitat characteristics of most sites, with current, degradable organics, and pollution information provided for some of the sites. The five numerically dominant species at each site were entered into the ERAPT system and used to predict local environmental conditions.
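Selecting the five numerically dominant species at a site, and checking what share of the catch they represent, can be sketched with `collections.Counter`. The site counts and species labels below are hypothetical:

```python
from collections import Counter
from typing import Dict, List


def dominant_species(counts: Dict[str, int], n: int = 5) -> List[str]:
    """The n most abundant species at a site, most abundant first."""
    return [species for species, _ in Counter(counts).most_common(n)]


def dominant_share(counts: Dict[str, int], n: int = 5) -> float:
    """Fraction of all fish collected that belong to the n dominant species."""
    total = sum(counts.values())
    top = sum(c for _, c in Counter(counts).most_common(n))
    return top / total if total else 0.0


# Hypothetical catch at one site: species label -> number of fish collected.
site = {"sp_A": 40, "sp_B": 25, "sp_C": 10, "sp_D": 8,
        "sp_E": 6, "sp_F": 5, "sp_G": 4, "sp_H": 2}
```

Here `dominant_species(site)` returns `['sp_A', 'sp_B', 'sp_C', 'sp_D', 'sp_E']`, and `dominant_share(site)` is 0.89, above the one-half share the study relied on.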
These species accounted for at least one half of the total fishes collected and, therefore, were taken to be representative of the environmental conditions present within the habitat sampled. The ERAPT system's compare function was used to identify the environmental characteristics common to each fish community, and this information was compared with the environmental description of each habitat. Habitat characterization information for each collecting site was transformed into ERAPT environmental parameter codes directly from field survey forms. These data then were compared, for each category of environmental characteristic, with the environmental descriptions predicted by the system from the associated dominant fish community. The compared data were considered to be in agreement if they shared at least one environmental parameter. For example, a general habitat predicted by the fish assemblage as lake or large river was considered to be in agreement with a habitat characterized as either large river or lake. Likewise, a community-predicted bottom type of gravel was considered to be in agreement with a habitat characterized as gravel, sand, and silt.

Validation Results

In comparing 300 environmental characteristics obtained from field survey forms with those predicted by the ERAPT system from the dominant fish community, 285 showed agreement, for an overall successful prediction rate of 95% (Table 2). The successful prediction rates for the Cross Creek and Yellow Creek, Blanchard River, and Tuscarawas River systems were 95%, 97%, and 93%, respectively. There was 100% agreement between predicted and observed characterizations for temperature, current, and degradable organics in every river system for which these parameters were reported. The environmental parameters not in complete agreement between observed and predicted characterizations were general habitat (98%), dissolved oxygen (96%), specific habitat and bottom type (95% each), and pH and turbidity (88% each).
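The comparison procedure can be sketched in three parts: a community profile (here assumed to be the intersection of the dominant species' summaries, one plausible reading of the compare function), the agreement rule stated above (at least one shared parameter), and a recount of the Table 2 totals. The recount multiplies each parameter's number of sites by its rounded percentage, so it reconstructs the agreement counts rather than reading them from the report:

```python
from typing import Dict, List, Set, Tuple


def community_profile(species_params: List[Set[str]]) -> Set[str]:
    """Parameters shared by every dominant species (assumed compare behavior)."""
    if not species_params:
        return set()
    return set(species_params[0]).intersection(*species_params[1:])


def agrees(predicted: Set[str], observed: Set[str]) -> bool:
    """Agreement rule from the validation: at least one shared parameter."""
    return bool(predicted & observed)


# Table 2, Total columns: parameter -> (number of sites, percent agreement).
TABLE2_TOTALS: Dict[str, Tuple[int, int]] = {
    "dissolved oxygen": (24, 96), "temperature": (31, 100), "pH": (25, 88),
    "current": (5, 100), "turbidity": (32, 88), "general habitat": (62, 98),
    "bottom type": (59, 95), "degradable organics": (3, 100),
    "specific habitat": (57, 95), "pollution": (2, 100),
}


def overall_rate(table: Dict[str, Tuple[int, int]]) -> Tuple[int, int, float]:
    """Reconstructed agreeing comparisons, total comparisons, and overall rate."""
    compared = sum(n for n, _ in table.values())
    agreed = sum(round(n * pct / 100) for n, pct in table.values())
    return agreed, compared, agreed / compared
```

Both worked examples from the text pass the rule: `agrees({"lake", "large river"}, {"large river"})` and `agrees({"gravel"}, {"gravel", "sand", "silt"})` are true. `overall_rate(TABLE2_TOTALS)` yields 285 agreements out of 300 comparisons (0.95), matching the reported overall rate.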
While there were some disagreements between predicted and reported environmental characterizations, they almost always involved very similar tolerance ranges or habitat types. Staff of the State of Ohio Environmental Protection Agency considered most of these disagreements to be minor, given the precision of the original measurements. In addition, only one site had discrepancies between the observed and predicted characterizations of more than one kind of environmental parameter: site number 1 on Yellow Creek, where there were disagreements in both specific habitat and bottom type. Overall, ERAPT system predictions of habitat characteristics from the dominant fish assemblages closely approximated the information obtained from field surveys at all of the sites evaluated.

Future Development

Use of environmental indicator organisms on a regional basis appears to be particularly promising. Assessment systems based on a sound knowledge of a local fauna would seem to have a high probability of success. Development of local or regional environmental data bases is within the capabilities of the ERAPT system and will be greatly facilitated by microprocessor-based versions of the retrieval and analysis programs. Implementation of the ERAPT system on microcomputers such as the IBM PC and the Apple Macintosh is feasible, and a prototype microcomputer version of the retrieval and analysis program currently is being developed.

An interesting capability of the ERAPT system is its potential to screen environmental data. Detection of apparent inconsistencies among the environmental tolerances of different organisms said to have been collected together at a given site can be used both to flag errors in data collection or organism identification and to evaluate the reliability of environmental data bases.
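A minimal sketch of such screening, assuming one heading category at a time and a simple majority rule: a taxon is flagged when its reported parameters share nothing with more than half of the other taxa collected at the same site. The threshold, taxon labels, and oxygen categories are hypothetical, not taken from ERAPT:

```python
from typing import Dict, List, Set


def flag_inconsistent(site_taxa: Dict[str, Set[str]]) -> List[str]:
    """Taxa whose parameters share nothing with more than half of the
    other taxa reported from the same site (sorted by name)."""
    flagged = []
    for name, params in site_taxa.items():
        others = [p for other, p in site_taxa.items() if other != name]
        compatible = sum(1 for p in others if params & p)
        # Flag when fewer than half of the other taxa overlap with this one.
        if others and compatible * 2 < len(others):
            flagged.append(name)
    return sorted(flagged)


# Hypothetical oxygen tolerances of four taxa collected together.
site_oxygen = {
    "taxon_A": {"high"},
    "taxon_B": {"high", "moderate"},
    "taxon_C": {"moderate", "high"},
    "taxon_D": {"low"},
}
```

Here `flag_inconsistent(site_oxygen)` returns `['taxon_D']`, the one taxon whose tolerance overlaps with none of the others, making its identification (or the site data) the candidate for re-examination.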
Taxonomic experts could confirm the identifications of organisms flagged as inconsistent, providing a cost-effective means of evaluating the quality of biological information. When new environmental associations of organisms are found, the data bases can be updated. Since the ERAPT system is designed to accept input from files, it could be interfaced with ongoing environmental data collection projects to summarize and evaluate information and to scan for probable changes in environmental conditions. Interfacing programs could convert numerical environmental data and codes unique to local or regional data collection projects so that they can be used with the ERAPT system. Output from the ERAPT system also might be used by other programs for environmental decision making and evaluation. Sources of environmental information within the data bases also can be evaluated: it is possible to identify and eliminate sources whose environmental information frequently contradicts that supplied by other workers for the same organisms.

Because the ERAPT programs are independent of the kinds of data and organisms they process, the system has many applications in other areas of biological and environmental research. Alternate sets of biological categories or environmental parameters can easily be developed for terrestrial or marine environments, or for microbial organisms or higher vertebrate groups. The system also has direct application to behavioral, physiological, and genetic information about organisms. Because of the structure and internal organization of the data used by the system, there is no limit to the amount of information that can be accumulated or queried for each kind of organism contained within a data base. System capacities are limited only by the number of kinds of organisms and data categories that can be accessed and evaluated as a group, and these limits are easily modified within the programs.
There is also no limit to the number of groups that can be developed. Programs permit transfer of search statements between organism groups if they have compatible data.

Table 2. Results of ERAPT system validation study: number of sites compared and percent agreement between predicted and observed characterizations, by river system (- = not reported).

                        Cross & Yellow Tuscarawas     Blanchard      Total
                         Sites  Agree   Sites  Agree   Sites  Agree   Sites  Agree
Dissolved oxygen (mg/l)     13   100%       5   100%       6    83%      24    96%
Temperature                 18   100%       6   100%       7   100%      31   100%
pH                          16    81%       4   100%       5   100%      25    88%
Current                      2   100%       3   100%       -      -       5   100%
Turbidity                   22    82%       2   100%       8   100%      32    88%
General habitat             34   100%      11    91%      17   100%      62    98%
Bottom type                 33    97%      10    80%      16   100%      59    95%
Degradable organics          -      -       3   100%       -      -       3   100%
Specific habitat            31    97%      10    90%      16    94%      57    95%
Pollution                    -      -       2   100%       -      -       2   100%

R. A. Hellenthal and C. L. Dawson are with the University of Notre Dame, Notre Dame, IN 46556.
Phil Larson is the EPA Project Officer (see below).
The complete report, entitled "A Computerized System for the Evaluation of Aquatic Habitats Based on Environmental Requirements and Pollution Tolerance Associations of Resident Organisms," (Order No. PB 86-167 343/AS; Cost: $16.95, subject to change) will be available only from:
National Technical Information Service
5285 Port Royal Road
Springfield, VA 22161
Telephone: 703-487-4650
The EPA Project Officer can be contacted at:
Environmental Research Laboratory
U.S. Environmental Protection Agency
Corvallis, OR 97333

U.S. GOVERNMENT PRINTING OFFICE: 1986/646 116/20811

Center for Environmental Research Information, Cincinnati, OH 45268