Muscle Shoals AL 35660
United States
Environmental Protection
Agency
Effects Research
EPA GOO 1 8
March 1984
Research and Development
Consolidation of
Baseline Information,
Development of
Methodology, and
Investigation of Thermal
Impacts on Freshwater
Shellfish, Insects, and
Other Biota
Interagency
Energy/Environment
R&D Program Report
-------
CONSOLIDATION OF BASELINE INFORMATION,
DEVELOPMENT OF METHODOLOGY, AND
INVESTIGATION OF THERMAL IMPACTS
ON FRESHWATER SHELLFISH, INSECTS,
AND OTHER BIOTA
by
John S. Grossman and James R. Wright, Jr.
Office of Natural Resources
Tennessee Valley Authority
Knoxville, Tennessee 37902
Roger L. Kaesler
Department of Geology
University of Kansas
Lawrence, Kansas 66045
Interagency Agreement EPA-IAG-DS-E721
Project No. E-AP-80-BDR
Program Element No. INE-625A
Project Officer
Alfred Galli
Office of Environmental Processes and Effects Research
U.S. Environmental Protection Agency
Washington, DC 20460
Prepared for
Office of Environmental Processes and Effects Research
Office of Research and Development
U.S. Environmental Protection Agency
Washington, DC 20460
-------
DISCLAIMER
This report was prepared by the Tennessee Valley Authority and has been
reviewed by the Office of Research and Development, Energy and Air Division,
U.S. Environmental Protection Agency, and approved for publication. Although
the research described in this document has been funded wholly or in part by
the United States Environmental Protection Agency through Interagency
Agreement No. EPA-IAG-82-D-X0511 with TVA, it has not been subject to Agency
policy and peer review and therefore does not necessarily reflect the views of
the agency or the Tennessee Valley Authority and no official endorsement
should be inferred.
-------
ABSTRACT
A computerized information system was developed for storing, retrieving,
and analyzing data collected during limnological surveys. The system
accommodates 19 variables and uses the Statistical Analysis System as the
basic data-management system. To facilitate storage of information, a series
of hierarchical codes was developed. These codes not only reduced storage
requirements but also helped reduce computing costs.
When the information system had been developed, three analytical
procedures were tested, used in various forms, and evaluated as tools for
analyzing benthic macroinvertebrate data sets. The first two of these were
cluster analysis and ordination using nonmetric multidimensional scaling
(MDS). Tests of these two methods included, first, development of a rationale
for selecting methods of transforming the data and choosing coefficients of
similarity to express relationships between samples and, second, application
of the two methods to data from the Clinch River, Virginia, and the Cumberland
River, Tennessee. The resulting dendrograms from cluster analysis and
ordinations from multidimensional scaling were evaluated by preparing
analytical tables and dealing with small subsets of the total data set.
Measurements of species diversity from information theory were the third
analytical technique considered, again using benthic macroinvertebrate data.
First, the relative importance of diversity at the species level was compared
to components of diversity contributed by other categories in the taxonomic
hierarchy. Results indicated that identifications to species contributed
little information about the structure of the communities that discrimination
of genera had not already contributed. Second, the heuristic properties of
species diversity were used to evaluate two classifications stressing
functional morphology and trophic-functional relationships of benthic
invertebrates, independent of the taxonomic hierarchy. Both methods produced
results similar to ones obtained by cluster analysis, suggesting that they
merit further investigation.
-------
CONTENTS
Abstract iii
List of Figures vii
List of Tables xii
List of Abbreviations and Symbols xvi
Disclaimer ii
Acknowledgments xvii
1. Introduction 1
2. Conclusions and Recommendations 3
2.1 Introduction 3
2.2 Generic 3
2.3 Cluster Analysis 3
2.4 Ordination 4
2.5 Indices of Diversity 5
2.6 Final Statement 5
3. Development of the Information System 6
3.1 General Description of the Information System 6
3.2 Preparation of Data 6
3.3 Editing of Data 12
4. Methods 31
4.1 Description of Analytical Techniques 31
4.1.1 Cluster Analysis 31
4.1.2 Ordination by Nonmetric Multidimensional Scaling . . 34
4.1.3 Indices of Species Diversity 41
4.2 Description of Data Sets 42
4.2.1 Clinch River Data Set 42
4.2.2 Cumberland River Data Sets--1973 and 1975 44
5. Cluster Analysis 54
5.1 General Description 54
5.2 Analytical Procedures 54
5.2.1 Selection of Similarity Coefficients 54
5.2.2 Reducing Size of Data Matrices 57
5.2.3 Evaluation of Distortion 60
5.3 Results 60
5.3.1 Q-Mode Analysis 60
5.3.1.1 Clinch River Data Set 60
5.3.1.1.1 Presence-absence data 60
5.3.1.1.2 Quantitative data, counts of species 62
5.3.1.2 Cumberland River Data Set--1973 62
5.3.1.2.1 Substrate 62
5.3.1.2.2 Zoomacrobenthos 62
5.3.1.2.2.1 Presence-absence data 65
5.3.1.2.2.2 Quantitative data, counts of species 65
5.3.1.2.3 Summary 65
5.3.1.3 Cumberland River Data Set--1975 69
5.3.1.3.1 Substratum 69
5.3.1.3.2 Zoomacrobenthos 69
5.3.1.3.2.1 Presence-absence data 69
5.3.1.3.2.2 Quantitative data, species counts 70
5.3.1.3.4 Summary 70
5.3.2 R-Mode Analysis 73
5.3.2.1 Introduction 73
5.3.2.2 Clinch River Data Set 76
5.3.2.2.1 Presence-absence data 76
5.3.2.2.2 Quantitative data, species counts 85
5.3.2.2.3 Summary 85
5.3.2.3 Cumberland River Data Set--1973 85
5.3.2.3.1 Presence-absence data 85
5.3.2.3.2 Quantitative data, species counts 87
6. Ordination--Nonmetric Multidimensional Scaling 124
6.1 General Description 124
6.2 Analytical Procedures 124
6.3 Results 125
6.3.1 Q-Mode Analysis--Clinch River Data 125
6.3.1.1 Presence-Absence Data 125
6.3.1.2 Quantitative Data (Species Counts) .... 125
7. Diversity Indices 129
7.1 General Description 129
7.2 Analytical Procedures 130
7.3 Diversity of Samples from the Clinch River 134
7.3.1 Species Diversity 134
7.3.2 Hierarchical Diversity 136
8. Summary and Discussion 148
8.1 Introduction 148
8.2 Nature of the Ecosystems from which Data Bases were Selected 148
8.3 Methods 149
8.3.1 Relationships Between Methods 149
8.3.2 Data 150
8.3.2.1 Presence-Absence Data 150
8.3.2.2 Quantitative Data 151
8.3.3 Cluster Analysis 151
8.3.4 Ordination 153
8.3.5 Species Diversity and Hierarchical Diversity .... 153
References 154
-------
LIST OF FIGURES
Number Page
1 Flow diagram illustrating the computerized information
system (CIS) used for TVA's nonfisheries
biological data
2 Coding form used for keypunching field and laboratory
limnological data
3 One-dimensional Q-mode ordination of hypothetical
samples in Table 9 computed with nonmetric multidi-
mensional scaling, nine iterations (stress = 0.230) .... 36
4 Two-dimensional Q-mode ordination of hypothetical
samples in Table 9 computed with nonmetric multi-
dimensional scaling, one iteration (stress = 0.097) .... 37
5 Two-dimensional Q-mode ordination of hypothetical
samples in Table 9 computed with nonmetric multidimen-
sional scaling, 20 iterations (stress = 0.051) 37
6 Three-dimensional Q-mode ordination of hypothetical
samples in Table 9 computed with nonmetric multidi-
mensional scaling, 43 iterations (stress = 0.001) 39
7 Dendrogram computed from Q-mode cluster analysis of a
matrix of distance coefficients (Table 10) showing
faunal similarities between hypothetical samples in
Table 9 40
8 Map of the Clinch River in Virginia and Tennessee
showing the locations of stations sampled during the
1970 zoomacrobenthic survey 43
9 Stream discharge of the Clinch River at the United
States Geological Survey gauging station at Cleveland,
Virginia, June-August 1970 45
10 Location of stations in the vicinity of a power plant
on the Cumberland River 46
11 Dendrogram computed from cluster analysis of a matrix of
coefficients of cophenetic correlation showing similarity
between the various correlation and similarity matrices
in Tables 17 and 20 ($r_{cc}$ = 0.914) 56
12 Dendrogram computed from Q-mode cluster analysis of a
matrix of Jaccard's coefficients showing faunal simi-
larities between samples collected from the Clinch River
in 1970; data include total insect fauna 58
13 Dendrogram computed from Q-mode cluster analysis of a
matrix of Jaccard's coefficients showing faunal simi-
larities between 36 zoomacrobenthic samples collected
from stations 4, 7, 8, 9, 10, and 11 in the Clinch
River, 1970 ($r_{cc}$ = 0.93) 61
14 Dendrogram computed from Q-mode cluster analysis of a
matrix of correlation coefficients computed from propor-
tions of each phi size in the substrate after arcsine
transformation; shows similarity of substrate between
samples collected from the Cumberland River in 1973
($r_{cc}$ = 0.760) 63
15 Dendrogram computed from Q-mode cluster analysis of a
matrix of distance coefficients computed from propor-
tions of each phi size in the substrate after arcsine
transformation; shows similarity of substrate between
samples collected from the Cumberland River in 1973 .... 64
16 Dendrogram computed from Q-mode cluster analysis of a
matrix of Jaccard's coefficients showing faunal simi-
larities between samples collected from the Cumberland
River in 1973 ($r_{cc}$ = 0.852) 66
17 Dendrogram computed from Q-mode cluster analysis of a
matrix of distance coefficients computed from data
transformed with the square-root transformation; shows
faunal similarities between samples collected from the
Cumberland River in 1973 ($r_{cc}$ = 0.833) 67
18 Dendrogram computed from Q-mode cluster analysis of a
matrix of correlation coefficients computed from data
transformed with the square-root transformation; shows
faunal similarities between samples collected from the
Cumberland River in 1973 ($r_{cc}$ = 0.979) 68
19 Dendrogram computed from Q-mode cluster analysis of a
matrix of distance coefficients computed from data
that had been transformed by the square-root transfor-
mation and standardized by rows; shows faunal similar-
ities between samples collected from the Cumberland
River in 1975 71
20 Dendrogram computed from Q-mode cluster analysis of a
matrix of distance coefficients computed from data that
had been transformed by the square-root transformation
and standardized by rows; shows faunal similarities
between samples collected from the Cumberland River
in 1975 72
21 A truncated log-normal distribution fitted to a distri-
bution of species in an aquatic ecosystem not adversely
affected by environmental stress 74
22 Response of organisms to severe organic enrichment:
changes in types of organisms present, population
densities, and biological diversity 75
23 Dendrogram computed from R-mode cluster analysis of a
matrix of Jaccard's coefficients, showing distributional
similarities of taxa collected from stations on the
Clinch River unaffected by low-pH stress that resulted
from the 1970 spill of acid ($r_{cc}$ = 0.97) 77
24 Dendrogram computed from R-mode cluster analysis of a
matrix of simple matching coefficients, showing distri-
butional similarities of taxa collected from stations on
the Clinch River unaffected by low-pH stress that resulted
from the 1970 spill of acid ($r_{cc}$ = 0.91) 78
25 Dendrogram computed from R-mode cluster analysis of a
matrix of distance coefficients computed from data
that had been transformed by the square-root transfor-
mation; shows distributional similarities of taxa
collected from stations on the Clinch River unaffected
by low-pH stress that resulted from the 1970 spill of
acid ($r_{cc}$ = 0.92) 79
26 Dendrogram computed from R-mode cluster analysis of a
matrix of Jaccard's coefficients, showing distributional
similarities of taxa collected from stations on the Clinch
River affected by low-pH stress that resulted from the
1970 spill of acid ($r_{cc}$ = 0.97) 80
27 Dendrogram computed from R-mode cluster analysis of a
matrix of simple matching coefficients, showing distri-
butional similarities of taxa collected from stations
on the Clinch River affected by low-pH stress that
resulted from the 1970 spill of acid ($r_{cc}$ = 0.84) 81
28 Dendrogram computed from R-mode cluster analysis of a
matrix of distance coefficients computed from data
that had been transformed by the square-root transfor-
mation; shows distributional similarities of taxa
collected from stations on the Clinch River affected
by low-pH stress that resulted from the 1970 spill
of acid ($r_{cc}$ = 0.97) 82
29 Dendrogram computed from R-mode cluster analysis of a
matrix of Jaccard's coefficients, showing distributional
similarities of taxa collected from stations on the
Clinch River both affected and unaffected by the low-pH
stress that resulted from the 1970 spill of acid
($r_{cc}$ = 0.95). Only those taxa are included that comprise
10 percent or more of the total sample 83
30 Dendrogram computed from R-mode cluster analysis of a
matrix of distance coefficients computed from data that
had been transformed by the square-root transformation;
shows distributional similarities of taxa collected
from stations on the Clinch River both affected and
unaffected by the low-pH stress that resulted from the
1970 spill of acid ($r_{cc}$ = 0.97). Only those taxa that
comprise 10 percent or more of the total sample are
included 84
31 Dendrogram computed from R-mode cluster analysis of a
matrix of Jaccard's coefficients, showing distributional
similarities of taxa collected from the Cumberland
River in 1973 86
32 Dendrogram computed from R-mode cluster analysis of a
matrix of simple matching coefficients, showing distri-
butional similarities of taxa collected from the
Cumberland River in 1973 88
33 Dendrogram computed from R-mode cluster analysis of a
matrix of distance coefficients with data transformed
with the square-root transformation and standardized
by rows, showing distributional similarities of taxa
collected from the Cumberland River in 1973 89
34 Dendrogram computed from R-mode cluster analysis of a
matrix of distance coefficients with data transformed
by the square-root transformation, showing distributional
similarities of taxa collected from the Cumberland
River in 1973 90
35 Dendrogram computed from R-mode cluster analysis of a matrix
of correlation coefficients with data transformed by
the square-root transformation and standardized by rows,
showing distributional similarities of taxa collected
from the Cumberland River in 1973 91
36 Three-dimensional ordination by nonmetric multidimen-
sional scaling computed from distance coefficients
based on presence-absence data, showing faunal simi-
larities between samples collected from the Clinch
River in 1970 126
37 Three-dimensional ordination by nonmetric multidimen-
sional scaling computed from distance coefficients
based on counts of species, showing faunal similarities
between samples collected from the Clinch River in 1970 . 127
38 Species diversity (Brillouin's H) of 36 samples from the
Clinch River, 1970; contour interval 0.4 135
39 Species diversity (approximate index H") of 36 zoomacro-
benthic samples from the Clinch River, 1970 137
-------
LIST OF TABLES
Number Page
1 Numerical codes for rivers and streams in the Tennessee
Valley, and sources of organisms for bioassays 14
2.0 Codes for identifying methods of limnological sampling
and types of gear 15
2.1 Codes for types of sampling equipment and sampling material
used in artificial-substrate sampling 16
2.2 Codes for types of sampling equipment and sampling material
used in natural substrate removal and organism removal . . 17
2.3 Codes for types of sampling equipment used with emergence
traps and volume samplers 18
3 Codes for identifying location of limnological samples
on a river or reservoir transect 19
4 Alphabetic codes identifying site (or project) at which
sample was collected 20
5 Codes for identifying type of sample (community, parameter,
test) 21
6 Codes used to report data units 22
7 Codes for identifying basic type of habitat from which
sample was collected or hardness at which bioassay was
performed 29
8 Codes for recording instar or size of organisms
collected or used in bioassays 30
9 Hypothetical data showing proportions of 10 species at
12 stations 47
10 Matrix of distance coefficients computed from the data
in Table 9 after arcsine transformation 48
11 Generalized trophic, functional classification of zoo-
macrobenthic invertebrates 49
12 Hierarchical classification of the trophic-functional
role of organisms; includes only those categories that
occurred in samples 50
13 Hierarchical classification assigned for zoomacrobenthic
invertebrates based on functional morphology: head
position, body shape, and respiratory organs 51
14 Range and mean of physicochemical data for the Clinch
River, June to September 1970 52
15 Descriptions of habitats at stations 4, 7, 8, 9, 10,
and 11 on the Clinch River, 1970 53
16 Coefficients of correlation, distance, and similarity:
abbreviations, equations, and upper and lower limits ... 92
17 Contingency table (2 x 2) defining the terms a, b, c,
and d, as used in the equations in Table 13 94
18 Effect of the transformation $\log(X_{ij} + 1)$, where $X_{ij}$
is the abundance of the ith species in the jth sample . . 94
19 Effect of the transformation $\sqrt{X_{ij} + 0.5}$, where $X_{ij}$
is the abundance of the ith species in the jth sample . . 94
20 Labels of correlation and distance matrices with
various transformations 95
21 Matrix of coefficients of cophenetic correlation
computed between corresponding elements of 21 correlation
and similarity matrices 96
22 Coefficients of cophenetic correlation comparing
distance matrices and selected correlation and
similarity matrices 98
23 Number of taxa in each major taxonomic group in the
Clinch River (1970) before and after relative species
abundance was determined and rare taxa were eliminated . . 99
24 Twenty-nine taxa and their respective trophic codes
for the reduced Clinch River data set, 1970 100
25 Number of taxa with a relative abundance >0.01
divided by the total number of taxa per station,
Clinch River, 1970 101
26 Taxa with a relative abundance >0.01 as percent of the
total number of taxa per station, Clinch River, 1970 . . . 101
27 Total number of organisms per station, Clinch River,
1970 102
28 Twenty-nine taxa with relative abundance >0.01 as
percent of total number of organisms per station,
Clinch River, 1970 102
29 Analytical technique, type of comparison, and the number
and type of similarity coefficients used to analyze the
reduced (1970) Clinch River data set 103
30 Cophenetic correlation values ($r_{cc}$) for 24 dendrograms
computed from the Clinch River data set, June to
August 1970 104
31 Results of Q-mode cluster analysis of zoomacrobenthic
samples from the Clinch River, 1970; minimum level of
similarity used to define clusters = 0.62; Jaccard's
coefficient 105
32 Results of Q-mode cluster analysis of zoomacrobenthic
samples from the Clinch River, 1970; level of similarity
used to define clusters = 0.06; correlation
coefficient 6, $\sqrt{Y + 0.5}$ transformation and
standardization by rows 106
33 Results of Q-mode cluster analysis of zoomacrobenthic
samples from the Clinch River, 1970; level of similarity
used to define clusters = 1.2; distance coefficient
6, $\sqrt{Y + 0.5}$ transformation and standardization by rows . . 107
34 Results of Q-mode cluster analysis of zoomacrobenthic
samples from the Clinch River, 1970; level of similarity
used to define clusters = 4.8; distance coefficient 7,
$\sqrt{Y + 0.5}$ transformation and standardization by rows 108
35 Summary of results of Q-mode cluster analyses,
Cumberland River, 1973 109
36 Summary of results of Q-mode cluster analyses,
Cumberland River, 1975 111
37 Results of R-mode cluster analysis of Jaccard's ($S_J$)
and simple-matching ($S_{SM}$) coefficients, Clinch River
data set, 1970 113
38 Results of R-mode cluster analysis of distance
coefficients, Clinch River data set, 1970 114
39 Results of R-mode cluster analysis of $S_{SM}$ and $S_J$,
Clinch River stations unaffected by the 1970 pH stress . . 116
40 Trophic-functional codes for taxa clustered by R-mode
cluster analysis of Jaccard's ($S_J$) and simple-matching
($S_{SM}$) coefficients, Clinch River data set, 1970 117
41 Trophic-functional codes for taxa clustered by R-mode
cluster analysis of Dist 7 coefficients, Clinch River
data set, 1970 118
42 Trophic-functional codes for taxa clustered by R-mode
cluster analysis of simple-matching ($S_{SM}$) and Jaccard's
($S_J$) coefficients, Clinch River, 1970 119
43 Results of R-mode cluster analysis of $S_J$ and $S_{SM}$ after
reordering clusters according to trophic-functional
codes, Clinch River, 1970 120
44 Results of R-mode cluster analysis of Dist 7 coefficients
after reordering clusters according to trophic-functional
codes, Clinch River, 1970 122
45 Clusters of taxa defined by R-mode cluster analysis of
Jaccard's and the simple matching coefficients, Clinch
River, 1970 123
46 Species diversity (Brillouin's H) of 36 zoomacrobenthic
samples from the Clinch River, 1970 140
47 Species diversity (approximate index H") of 36 zoo-
macrobenthic samples from the Clinch River, 1970 141
48 Hierarchical taxonomic diversity (Brillouin's H) of
zoomacrobenthic samples from the Clinch River, 1970 .... 142
49 Component of species diversity (H) at each level in the
trophic-functional hierarchy for five samples or sub-
samples collected immediately after the acid spill on the
Clinch River, 1970 145
50 Percent of species diversity (H) contributed at each level
in the trophic-functional hierarchy for five samples or
subsamples collected immediately after the acid spill on
the Clinch River, 1970 146
51 Component of species diversity (H) at each level in the
head-body-respiratory functional morphology hierarchy for
five samples or subsamples collected immediately after the
acid spill on the Clinch River, 1970 147
-------
LIST OF ABBREVIATIONS AND SYMBOLS
CIS                 computerized information system
Corr                correlation coefficient; for use with species counts
ΔT                  temperature change, °C
d                   Wilhm-Dorris diversity index
Dist                distance coefficient; for use with species counts
DM/IS               data management/information system
DMS                 data management system
H                   Brillouin's index
MDS                 multidimensional scaling
NT                  number of taxa
NTSYS               Numerical Taxonomy System
Q-mode comparison   pairwise comparison between columns; used to
                    determine similarity between sites or stations
                    on basis of biotic assemblages present
$r_{cc}$            coefficient of cophenetic correlation
R-mode comparison   pairwise comparison between rows; used to
                    determine similarity between species assemblages
                    on basis of distribution among samples
SAS                 Statistical Analysis System
SCE                 standing crop estimates
$S_J$               Jaccard's coefficient; for use with
                    presence-absence data
$S_{SM}$            simple matching coefficient; for use with
                    presence-absence data
TF                  Trophic-Functional code
7LB&MS              Station 7, left bank and midstream
7RB                 Station 7, right bank
TSO                 time-sharing option
≥                   greater than or equal to
<                   less than
-------
ACKNOWLEDGMENTS
The authors express their appreciation to Dr. Ralph H. Brooks,
Mr. Billy G. Isom, and Dr. Harrison Hickey, Tennessee Valley Authority,
for their cooperation and support; Mr. Clinton W. Hall, Project Officer,
Environmental Protection Agency, for his patience and support; and
Ms. Rachel C. Strong for her drafting work. Acknowledgment is also
given to Dr. Brian Armitage, Mr. Clay Barr, and Ms. Sandra Emond of the
Tennessee Valley Authority, for their help in developing many of the
coding schemes for the computerized information system. Recognition is
also given to Dr. Cornelius Weber, Environmental Protection Agency, and
Mr. Tom Toole and Dr. Ken Tennessen, Tennessee Valley Authority, for
their assistance in updating the BIO-STORET species list.
-------
SECTION 1
INTRODUCTION
An essential part of achieving national self-sufficiency in energy is
minimizing the adverse environmental impacts that may accompany accelerated
development and increased use of energy resources. The Tennessee Valley
Authority (TVA) has been studying ways to evaluate and minimize these impacts
through the Federal Energy/Environment Research and Development Program,
coordinated by the Office of Energy, Minerals, and Industry (OEMI) of the
Environmental Protection Agency (EPA). This program is designed to (1) add
environmental objectivity and balance to the mission of the Department of
Energy, (2) prevent delays in development of energy resources that are caused
by inadequate environmental information, (3) develop cost-effective strategies
for pollution control, (4) promote transfer of energy-related environmental
information, and (5) project the impacts of future energy technologies
(Environmental Protection Agency 1976).
As part of the federal interagency agreement with EPA, TVA's applied
research program undertook a comprehensive evaluation of the impacts of
energy-related technologies on the aquatic environment. This milestone
report summarizes the work completed during the
first two years of the project on Task 1 (Information Systems Development) of
Subagreement 10 (Consolidation of Thermal Impacts on Freshwater Shellfish,
Insects, and Other Biota). The overall objective of Task 1 was to develop the
capability to measure and evaluate existing and expected environmental impacts
of energy-related technologies on important biotic assemblages (nonfish) in
the aquatic environment.
To accomplish this objective, the first priority was to develop a
computerized information system (CIS). This system was designed to be
inexpensive to operate, user oriented, adaptable for use in both routine
monitoring studies and research projects, capable of performing a variety of
analytical procedures, and able to interface with EPA's BIO-STORET system.
Using these criteria and the Statistical Analysis System (SAS)
(Barr et al., 1976) as the basic data management system, a CIS was developed
that could accommodate 19 different biotic and abiotic parameters. Since it
was important to conserve space while including information for each
parameter, a series of numeric and alphanumeric codes was developed. These
codes not only reduced storage requirements but also saved time and resources each
time the data were sorted, compiled, and analyzed. Additional software
packages were also adapted or developed to interface with the CIS. These
routines expanded the analytical and computer-graphics capabilities of the
system.
After the CIS had been developed, the next priority was to evaluate three
procedures used to analyze data from biological surveys. The first procedure
was the cluster analysis routine in the Numerical Taxonomy System (NTSYS)
written by F. J. Rohlf and associates. Two applications of cluster analysis
were considered: (1) Q-mode analysis, used to determine the similarity of
different samples or sampling stations on the basis of the species found in
each sample, and (2) R-mode analysis, used to identify associations of species
on the basis of their spatial and temporal distributions.
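The distinction between the two modes can be sketched with a small example. The data matrix below is hypothetical (rows are species, columns are samples, following the definitions in the List of Abbreviations and Symbols), Jaccard's coefficient is used as the similarity measure, and the function names are illustrative; none of this reproduces the NTSYS routines themselves.

```python
from itertools import combinations

# Hypothetical presence-absence matrix: rows are species, columns are samples.
species = ["sp_A", "sp_B", "sp_C", "sp_D"]
samples = ["st_1", "st_2", "st_3"]
matrix = [
    [1, 1, 0],  # sp_A
    [1, 1, 0],  # sp_B
    [0, 1, 1],  # sp_C
    [0, 0, 1],  # sp_D
]

def jaccard(u, v):
    """Jaccard's coefficient: shared presences / presences in either vector."""
    both = sum(1 for x, y in zip(u, v) if x and y)
    either = sum(1 for x, y in zip(u, v) if x or y)
    return both / either if either else 0.0

def pairwise(vectors, labels):
    """Similarity for every pair of labeled vectors."""
    return {(labels[i], labels[j]): jaccard(vectors[i], vectors[j])
            for i, j in combinations(range(len(labels)), 2)}

# Q-mode: compare columns (samples) on the basis of the species they contain.
columns = [list(col) for col in zip(*matrix)]
q_mode = pairwise(columns, samples)

# R-mode: compare rows (species) on the basis of their distribution among samples.
r_mode = pairwise(matrix, species)
```

The same data matrix serves both analyses; only the direction of comparison changes. Either set of pairwise similarities could then be passed to a clustering routine.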
The second analytical technique tested was nonmetric multidimensional
scaling (MDS), in which the information was presented in a scatter diagram
that was examined without first assuming that clusters were present. MDS was
examined as, and might be considered, an alternative to cluster analysis for
determining whether species form distinct biological assemblages.
The third analytical procedure considered was hierarchical diversity
analysis. This technique was an extension of the diversity measures commonly
used to summarize biological data from environmental surveys. It was used to
determine (1) the usefulness of two trophic-functional coding schemes
developed for zoomacrobenthic organisms and (2) the importance of species
diversity compared to diversities at higher taxonomic levels.
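A minimal sketch of how diversity can be partitioned among levels of a hierarchy, assuming Brillouin's index H (listed in the abbreviations) computed with natural logarithms and a two-level hierarchy of genera and species. The counts and names are hypothetical, and the procedure actually used in the report may differ in detail.

```python
from math import lgamma

def log_factorial(n):
    """Natural log of n!, via the log-gamma function."""
    return lgamma(n + 1)

def brillouin(counts):
    """Brillouin's H = (1/N) * ln(N! / prod(n_i!)); natural-log units assumed."""
    n_total = sum(counts)
    if n_total == 0:
        return 0.0
    return (log_factorial(n_total)
            - sum(log_factorial(c) for c in counts)) / n_total

# Hypothetical sample: counts of individuals per species, grouped by genus.
sample = {
    "genus_1": [30, 10],
    "genus_2": [25],
    "genus_3": [20, 10, 5],
}

all_species = [c for group in sample.values() for c in group]
genus_totals = [sum(group) for group in sample.values()]
n_total = sum(genus_totals)

h_species = brillouin(all_species)   # diversity at the species level
h_genus = brillouin(genus_totals)    # component contributed by genera
# Component added by distinguishing species within genera: the
# abundance-weighted mean of the within-genus diversities.
h_within = sum(sum(g) / n_total * brillouin(g) for g in sample.values())
```

With Brillouin's index the components are additive, so `h_species` equals `h_genus + h_within` up to rounding. A small `h_within` relative to `h_genus` would indicate that identification to species adds little information beyond identification to genus.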
-------
SECTION 2
CONCLUSIONS AND RECOMMENDATIONS
2.1 INTRODUCTION
The primary purpose of "Task 1: Information Systems Development" was to
develop a system for storing, retrieving, and analyzing data on aquatic
ecosystems. A secondary objective was to examine and evaluate various forms of
analytical techniques routinely used to summarize and predict the
environmental impacts of accelerated development of energy resources. This
report summarizes the system and provides tests and demonstrations of
quantitative procedures used to analyze large data sets. The methods tested
are cluster analysis, ordination, species diversity, and hierarchical
diversity analysis.
2.2 GENERIC
Throughout the study the paramount importance of pertinent and
representative data in assuring sound environmental interpretations was
evident. Although this point is intuitive, it is a prerequisite that is often
not achieved. One controllable and important part of assuring that sound data
will be available is to make sure that the right questions are asked.
Attempts at general-purpose monitoring should be abandoned, and statistically
trained ecologists familiar with sampling theory should be brought into study
teams from the very start. Moreover, extensive preliminary sampling should be
undertaken to discover subtleties of the ecosystem. During actual monitoring,
unessential parts of the preliminary sampling plan can be abandoned in favor
of more detailed sampling of critical areas. Cursory study of a few adequate
samples will provide greater insight than detailed examination of many
inadequate ones. Samples must not only be well located but also of sufficient
size that possible errors of sampling are minimized. For example, with small
sample sizes the likelihood that rare species may be missed by chance alone
increases. Thus, species may be present in some samples and absent from
others due to chance and not due to environmental reasons.
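The effect of sample size on the detection of rare species can be illustrated with a simple calculation. The sketch assumes that individuals are drawn independently and at random from a large community, which ignores the patchiness typical of benthic organisms; the relative abundance and sample sizes are hypothetical.

```python
def prob_missed(relative_abundance, sample_size):
    """Chance that a species is absent from a random sample of individuals.

    Assumes independent random draws from a large community.
    """
    return (1.0 - relative_abundance) ** sample_size

# A species making up 1 percent of the community (hypothetical value).
p = 0.01
for n in (25, 100, 500):
    print(f"n = {n:3d}: P(missed) = {prob_missed(p, n):.3f}")
```

Under these assumptions, a species at 1 percent relative abundance is more likely than not to be missed in a sample of 25 individuals, whereas a sample of 500 individuals almost always contains it.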
Biological data may be recorded as presence-absence data or quantitative
data. Presence-absence data may be obtained much more quickly and less
expensively, but information provided by differential abundances is lost. For
most purposes of applied aquatic ecology, quantitative data are preferred, but
more work needs to be done to assess the utility of presence-absence data.
Such data are often all that is available, and quick results may be essential
in times of acute environmental crisis.
2.3 CLUSTER ANALYSIS
Cluster analysis is a multivariate quantitative procedure that provides a
classification of samples. The method is often useful in environmental
surveys because it provides readily interpretable results in the form of a
tree-like graph called a dendrogram. Similarities among all samples in a
study are presented simultaneously in the dendrogram, although some
distortion may be introduced during the clustering procedure.
Fortunately, this distortion is measurable, and interpretation of dendrograms
with unacceptable levels of distortion can be avoided.
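One way to measure this distortion is the coefficient of cophenetic correlation: the correlation between the original pairwise distances and the levels at which the same pairs are joined in the dendrogram. The sketch below assumes average-linkage (UPGMA) clustering, which this section does not specify, and uses a hypothetical distance matrix; the function names are illustrative.

```python
from itertools import combinations
from math import sqrt

def upgma_cophenetic(dist):
    """Average-linkage (UPGMA) clustering of a symmetric distance matrix.

    Returns a dict mapping each pair (i, j), i < j, to the distance level
    at which items i and j are first joined in the dendrogram.
    """
    n = len(dist)
    clusters = {i: [i] for i in range(n)}   # cluster id -> member items
    cophenetic = {}
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest mean inter-item distance.
        best = None
        for a, b in combinations(sorted(clusters), 2):
            d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        level, a, b = best
        # Every cross-cluster pair is joined at this level.
        for i in clusters[a]:
            for j in clusters[b]:
                cophenetic[(min(i, j), max(i, j))] = level
        clusters[a] = clusters[a] + clusters.pop(b)
    return cophenetic

def pearson(x, y):
    """Product-moment correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical distances among four samples.
dist = [
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 3.0, 6.0],
    [4.0, 3.0, 0.0, 2.0],
    [5.0, 6.0, 2.0, 0.0],
]
coph = upgma_cophenetic(dist)
pairs = sorted(coph)
r_cc = pearson([dist[i][j] for i, j in pairs], [coph[p] for p in pairs])
```

A value of `r_cc` near 1 means the dendrogram preserves the original distances well; lower values signal that the clustering has distorted the relationships among samples.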
In Q-mode analyses, samples with similar faunas are grouped together. In
aquatic surveys involving a perturbation of the environment, one expects
samples from upstream control stations to be grouped with downstream controls
in the zone of recovery. Unfortunately, faunas at downstream stations are
sometimes inherently different from upstream ones because of changes in stream
gradient, discharge, substrate, and other factors.
R-mode cluster analyses indicate similarities among, or associations of,
species on the basis of their distribution and on the basis of abundance if
quantitative coefficients are used. Q-mode analyses have been used much more
frequently than R-mode in applied ecology, but R-mode analysis has promise as
a method of comparing faunal associations from stream to stream or basin to
basin.
A further disadvantage of cluster analysis and a further source of
distortion is that it forces samples into clusters whether or not such
clusters exist in nature. Although the total amount of distortion can be
measured, its effects cannot be determined precisely without time-consuming
study of the raw data.
For cluster analysis of presence-absence data, we recommend the use of
Jaccard's coefficient, which bases similarity between stations on only the
mutual presence of organisms at stations and not on their absences.
Quantitative data should be analyzed with correlation or distance coefficients
after the data have been transformed with a square-root transformation to
reduce inordinate effects of highly abundant species. Future research should
be directed toward evaluating use of presence-absence data.
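The two recommendations above can be illustrated with a minimal sketch. It is not the NTSYS implementation; the function names and the choice of Euclidean distance as the distance coefficient are assumptions made only for illustration.

```python
import math


def jaccard(station_a, station_b):
    """Jaccard's coefficient: shared taxa divided by taxa present at either station.

    Joint absences never enter the calculation, because only taxa present
    at one or both stations are counted.
    """
    a, b = set(station_a), set(station_b)
    union = a | b
    if not union:
        return 0.0  # assumed convention: two empty samples get similarity 0
    return len(a & b) / len(union)


def sqrt_transform(counts):
    """Square-root transform of abundances to damp highly abundant taxa."""
    return {taxon: math.sqrt(n) for taxon, n in counts.items()}


def euclidean_distance(x, y):
    """Euclidean distance between two (transformed) abundance records."""
    taxa = set(x) | set(y)
    return math.sqrt(sum((x.get(t, 0.0) - y.get(t, 0.0)) ** 2 for t in taxa))
```

For presence-absence data, each station is passed to `jaccard` as the set of taxa found there. For quantitative data, each station's counts are transformed first and the transformed records are then compared.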
2.4 ORDINATION
The results of ordination are presented as a scatter diagram in which
stations are plotted as points in a space defined by axes that represent the
faunal similarities of stations to each other. The goal of ordination is
similar to that of Q-mode cluster analysis because both are computed from
similarities among stations. It differs in that no a priori assumptions need
be made about the presence of clusters in the data. Although little use has
been made of ordination in applied aquatic ecology, it has potential as an
analytical tool. Streams are linear ecosystems, and to cluster stations along
such an environmental gradient may require forcing them into unrealistic
configurations. An ordination can let the investigator determine if clusters
exist.
Several methods of ordination are available. We have tested only
nonmetric multidimensional scaling, a technique that seems ideally suited to
data from aquatic environmental surveys. Because little use has been made of
ordination, we have evaluated it only against the results of cluster analysis.
In general, it seems to give comparable results, being especially sensitive to
inadequate data, from which it produces uninterpretable results. Future
research should stress additional study of nonmetric multidimensional scaling,
especially applications to presence-absence data.
2.5 INDICES OF DIVERSITY
An index of species diversity is a single statistic that expresses both
the number of species present and the evenness of distribution of organisms
among species. Unlike cluster analysis and ordination, it does not consider
which species are present. Thus, a sample collected in a zone of recovery
downstream from a source of pollutional stress may have a species diversity
equal to samples from upstream control stations, yet the species may be
altogether different. Because cluster analysis and ordination measure a
different aspect of community structure from species diversity, the methods
should be used in combination for best results.
The concept of species diversity has been misunderstood, misapplied, and
subsequently much maligned in the ecological literature. Too many
investigators have overlooked the nonuniqueness of the relationship between a
particular community structure and the index of species diversity computed
from it. Others have sought global values of species diversity to indicate
healthy or damaged ecosystems, overlooking the dependence of all such indices
on sample size. Nevertheless, species diversity is a useful tool for applied
aquatic ecologists. If used properly and not overinterpreted, it gives a
useful and efficient measure of community structure for communicating
information about the state of the ecosystem to nonbiologists.
Species diversity may be partitioned according to categories in the
taxonomic hierarchy to give diversity components contributed by orders,
families, genera, and species. For some purposes of applied aquatic ecology,
discrimination of genera may provide as much environmental information as
identification of species, with savings of time and money. Diversity may also
be partitioned according to other hierarchical classifications that consider
functional morphology and feeding strategies. Hierarchical diversity was
tested in all three ways in the report and was found to be useful.
Species diversity can be computed with a number of equations. We
recommend the use of Brillouin's equation from information theory. Other
equations from information theory are biased and often give misleading
results. Hierarchical diversity analysis should be applied in future studies
to help reduce the cost of aquatic environmental surveys.
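As a hedged illustration of Brillouin's equation, H = (ln N! - sum of ln n_i!) / N, where N is the total number of individuals and n_i is the number in species i, the sketch below computes the index from a list of counts. The use of natural logarithms is an assumption (other bases only rescale the value), and the function is not the user-written FORTRAN program described later in this report.

```python
import math


def brillouin(counts):
    """Brillouin diversity index H for a list of per-species counts.

    math.lgamma(n + 1) equals ln(n!), which avoids computing large factorials.
    """
    counts = [n for n in counts if n > 0]
    total = sum(counts)
    if total == 0:
        return 0.0  # assumed convention for an empty sample
    log_total_factorial = math.lgamma(total + 1)
    log_species_factorials = sum(math.lgamma(n + 1) for n in counts)
    return (log_total_factorial - log_species_factorials) / total
```

A sample containing a single species yields H = 0, and for a fixed number of individuals and species the value is largest when the individuals are spread evenly among the species.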
2.6 FINAL STATEMENT
Although most of the methods of applied aquatic ecology were derived from
theoretical ecology, the goals of the two sciences are not the same. It is
incumbent on applied ecologists to adapt and modify existing methods to suit
their needs and to address questions of urgency. At the same time, they must
continue to recognize their dependence on the groundwork laid by more
theoretically inclined ecologists.
-------
SECTION 3
DEVELOPMENT OF THE INFORMATION SYSTEM
3.1 GENERAL DESCRIPTION OF THE INFORMATION SYSTEM
When TVA initiated the research program on effects of energy use and
development, Task 1 (Information Systems Development) was responsible for
developing a system through which to measure and evaluate the impact of energy
technologies on biotic assemblages. To accomplish this task, a computerized
information system (CIS) was developed to accommodate biological data.
Criteria for design of the system were that it must interface with EPA's
BIO-STORET system, be inexpensive to operate, be user oriented, be adaptable
for use with both routine monitoring and research programs, and perform a
variety of analytical functions.
A flow diagram illustrating the CIS and steps involved in processing the
data is presented in Figure 1. In its present stage of development, the CIS
uses (1) the Statistical Analysis System (SAS) (Barr et al., 1976) to create
and manipulate data sets; print, sort, rank, and store data; and perform
analyses such as simple descriptive statistics, Duncan's multiple range test,
analysis of variance, correlation, probit analysis, and regression; (2) the
Numerical Taxonomy System (NTSYS) for cluster analysis and ordination with
nonmetric multidimensional scaling; (3) the MIT-SNAP programs (Hoaglin and
Welsch, 1975) for resistant regression and box plotting; (4) user-written
FORTRAN programs to calculate diversity indices; and (5) Tektronix software
for producing instantaneous hard-copy graphics.
3.2 PREPARATION OF DATA
To store and manipulate the large volume of information available within
TVA, a series of numeric and alphanumeric codes was developed to allow for
easier storage and retrieval of biological data; to save time and resources
each time the data were sorted, compiled, and analyzed; and to centralize
storage of environmental data.
An example of the standard coding form used to transfer field and laboratory
data into a format for keypunching is presented in Figure 2. The type of
program (macrobenthos, phytoplankton, or zooplankton), account number, job
number, data originator, address and phone number of originator, and sheet
number are recorded at the top of each sheet.
The standard coding form has 80 columns and 26 rows. The 80 columns are
divided into 19 data fields, with 1 to 16 columns per field. The data fields
correspond to the following variables:
-------
[Figure 1. Flow diagram of the CIS (diagram not reproduced). Legible labels:
RAW DATA; DATA PREPARATION; DATA CONVERSIONS; CODING; KEYPUNCHING; DATA
STORAGE; TAPE; DISC; CARDS.]
-------
Figure 2. Coding form used for keypunching field and laboratory limnological data.
[Form not reproduced. Legible header labels: Account No., Job No., Data
Originator, Program, Address, Sheet No., Phone (Ext.). Legible column
headings: River, Mile, Latitude/Longitude, Date, Gear Code, Collect. Number,
Depth, Temp., Time, Conc. Toxicant, Species Code, Number of Organ.]
-------
 1. river code                  11. sample type
 2. river mile                  12. replicate number
 3. latitude and longitude      13. reporting unit
 4. date                        14. habitat type
 5. gear code                   15. instar and size class
 6. sample location code        16. toxicant
 7. collection number           17. concentration of toxicant
 8. depth                       18. species code
 9. temperature                 19. number of organisms
10. time
The first data field consists of a two-digit numeric code for rivers and
streams in the Tennessee Valley (Table 1) and sources of cultures utilized in
bioassays.
Columns 3 to 7 contain the river-mile location of a sampling site. This
information specifies the distance a site is located upstream from the mouth
of the river and is usually obtained from a basin navigation map.
The latitude and longitude of a site are entered in columns 8 to 15.
These data are reported in degrees and minutes. The first four columns are
for latitude, and the second four columns are for longitude. For example, a
sample taken at latitude 36°19' and longitude 86°23' is entered as
"36198623."
Columns 16 to 21 specify the date of sampling or bioassay. The date is
entered numerically by year, month, and day with "730615" denoting June
15, 1973.
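The latitude-longitude and date fields described above can be decoded as in the following sketch. The function names are hypothetical, and the default century of 1900 is an assumption based on the report's example ("730615" for June 15, 1973); the coding form itself stores only a two-digit year.

```python
import datetime


def parse_lat_lon(field):
    """Split columns 8-15 into ((lat_deg, lat_min), (lon_deg, lon_min))."""
    if len(field) != 8 or not field.isdigit():
        raise ValueError("latitude-longitude field must be 8 digits")
    latitude = (int(field[0:2]), int(field[2:4]))
    longitude = (int(field[4:6]), int(field[6:8]))
    return latitude, longitude


def to_decimal_degrees(degrees, minutes):
    """Convert degrees and minutes to decimal degrees."""
    return degrees + minutes / 60.0


def parse_date(field, century=1900):
    """Decode columns 16-21 (YYMMDD); the century is an assumed default."""
    if len(field) != 6 or not field.isdigit():
        raise ValueError("date field must be 6 digits")
    year = century + int(field[0:2])
    return datetime.date(year, int(field[2:4]), int(field[4:6]))
```

With these helpers, "36198623" decodes to 36 degrees 19 minutes and 86 degrees 23 minutes, and "730615" decodes to June 15, 1973.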
The code identifying the type of equipment used to collect a sample is
entered in columns 22 to 26. This hierarchical gear code has six general
categories: undefined, artificial substrates, natural substrate removal,
direct organism removal, emergence traps, and volume samplers (Table 2). Five
of these categories are divided into subgroupings to accommodate information
such as sampler name, mesh size, or type of substratum used. Each sampling
method or device has a five-character alphanumeric code, starting with the
letter "A" and followed by four numbers. Generally, the first two digits
identify the type of sampling equipment, and the next two digits describe the
specific type of sampling material used in artificial substrate samplers
(Tables 2.1-2.3). For example, the four major categories of artificial
substrate samplers are baskets, trays, flat surfaces, and sterile indigenous
substrates; and one or more substrate materials can be used with each. Thus,
the code "A1210" denotes a tray sampler with a rock substratum. The code
"A1410" indicates that the sampler uses sterile, indigenous rocks (Table 2.1).
Table 2.2 lists the gear codes for natural substrate removal and organism
removal. The gear codes for natural substrate removal have five main
categories, three of which give specific information on the name or type of
sampler used. For example, the code "A2200" is the general code number for
corers, but "A2250" is the code for the Benthos 2170 gravity corer. Again,
the last two digits indicate the type of gear used.
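The hierarchical structure of the gear code can be sketched as a lookup that walks from the general category down to the specific code. The dictionary below holds only the handful of entries used as examples in the text (taken from Tables 2.0 to 2.2); a working version would need the full tables, and the function name is hypothetical.

```python
# Partial lookup, limited to the codes discussed in the text (Tables 2.0-2.2).
GEAR = {
    "A1000": "Artificial substrates",
    "A1200": "Trays",
    "A1210": "Rock",
    "A1400": "Sterile indigenous substrate",
    "A1410": "Rock",
    "A2000": "Natural substrate removal",
    "A2200": "Corers",
    "A2250": "Benthos 2170 gravity corer",
}


def gear_path(code):
    """Return the labels from the general category down to the given code."""
    if len(code) != 5 or code[0] != "A":
        raise ValueError("gear code must be 'A' followed by four characters")
    if code not in GEAR:
        raise KeyError(code)
    path, seen = [], set()
    # Zero out trailing positions to obtain each ancestor: A1000, A1200, A1210.
    for keep in range(2, 6):
        ancestor = code[:keep] + "0" * (5 - keep)
        if ancestor in GEAR and ancestor not in seen:
            seen.add(ancestor)
            path.append(GEAR[ancestor])
    return path
```

For "A1210" the path reads artificial substrates, trays, rock, which matches the description of a tray sampler with a rock substratum.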
Columns 27 and 28 contain a two-digit code (Table 3) that identifies the
distance a sample was collected from the riverbank. The codes "01" to "49"
indicate distance from the right bank (facing downstream), and the codes "50"
to "98" are coded distances from the left bank. Code "63," for example,
indicates that a sample was taken 25.1 to 30.0 m from the left bank, facing
downstream.
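Because codes 01 to 49 and 50 to 98 share the same 49 distance intervals (Table 3), the code can be decoded arithmetically. The sketch below rebuilds the interval bounds from the regular steps visible in Table 3; the function name is hypothetical.

```python
# Upper bounds (m) of the 49 intervals in Table 3, rebuilt from their regular steps.
UPPER = (
    [float(x) for x in range(1, 11)]             # 1.0 to 10.0 in 1-m steps
    + [float(x) for x in range(15, 51, 5)]       # 15.0 to 50.0 in 5-m steps
    + [75.0]
    + [float(x) for x in range(100, 201, 25)]    # 100.0 to 200.0 in 25-m steps
    + [float(x) for x in range(300, 1001, 100)]  # 300.0 to 1000.0 in 100-m steps
    + [float(x) for x in range(1250, 2501, 250)] # 1250.0 to 2500.0 in 250-m steps
    + [float(x) for x in range(3000, 8001, 500)] # 3000.0 to 8000.0 in 500-m steps
)


def decode_bank_code(code):
    """Return (bank, lower_m, upper_m) for a sample location code 01-98."""
    n = int(code)
    if not 1 <= n <= 98:
        raise ValueError("sample location code must be 01 to 98")
    bank = "right" if n <= 49 else "left"  # banks are named facing downstream
    index = (n - 1) % 49
    lower = 0.1 if index == 0 else round(UPPER[index - 1] + 0.1, 1)
    return bank, lower, UPPER[index]
```

Decoding "63" gives the left bank and the interval 25.1 to 30.0 m, in agreement with the example in the text.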
The collection number is a five-character alphanumeric code in columns 29
through 33. The first two characters identify the location or project at
which a sample was collected. The remaining three digits refer to
the 1st, 2nd, 3rd, . . . , or 999th time a sample was collected at the site.
Table 4 lists only the alphabetic prefix of the collection code.
The depth (in meters) at which a sample was collected is recorded in columns 34 to
37. For example, samples collected 3 and 24.5 m below the surface are
recorded as "03.0" and "24.5." The decimal point occupies a separate column.
If the depth occupies only three columns, a zero is placed on the left, as in
"03.0." A computer printout of 3.0 is obtained by right justifying all
integers.
The water temperature (in °C) at which the sample was collected or at
which the bioassay was performed is entered in columns 38 to 41. A temperature
of 21.2°C is entered as "21.2." Again, the decimal point is entered in a
separate column. If the temperature occupies only three columns, a zero is
placed to the left to fill the space; e.g., 9.6°C is entered as "09.6."
The time of day (0000 to 2400) a sample was collected, or the length of
time for a bioassay, is recorded in columns 42 through 45. A sample collected
at 2:45 p.m. is entered as "1445." A data entry for hour 48 of a bioassay is
entered as 0048.
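The zero-padded, four-column conventions for depth, temperature, and time can be sketched as follows. The function names are hypothetical, and the range checks are assumptions added so that a value too large for its field is rejected instead of overflowing the columns.

```python
def encode_decimal_field(value):
    """Encode depth (m) or temperature (deg C) as four columns, e.g. '03.0'."""
    text = f"{value:04.1f}"  # zero-padded, one decimal; the point takes a column
    if value < 0 or len(text) != 4:
        raise ValueError("value does not fit the four-column field")
    return text


def encode_clock_time(hour, minute):
    """Encode a time of day on the 24-hour clock, e.g. 2:45 p.m. -> '1445'."""
    valid = 0 <= hour <= 24 and 0 <= minute <= 59 and not (hour == 24 and minute)
    if not valid:
        raise ValueError("time of day must lie between 0000 and 2400")
    return f"{hour:02d}{minute:02d}"


def encode_bioassay_hours(hours):
    """Encode elapsed bioassay time in hours, e.g. hour 48 -> '0048'."""
    if not 0 <= hours <= 9999:
        raise ValueError("elapsed hours do not fit the four-column field")
    return f"{hours:04d}"
```

These reproduce the examples given in the text: "03.0" and "24.5" for depth, "09.6" for temperature, "1445" for 2:45 p.m., and "0048" for hour 48 of a bioassay.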
The code for sample type is a one-character alphabetic code entered in
column 46 (Table 5). This code identifies the type (community, parameter) of
sample or data collected or reported, such as periphyton, phytoplankton,
chlorophyll, or bioassay.
The replicate number is entered in columns 47 and 48. Because replicate
sampling and testing is done for most surveys and bioassays, it is important
to know not only how many replicates were collected but also which replicate one is
dealing with. This code refers to a given sample or replicate and not to the
total number collected. Thus, "06" refers to the sixth sample collected at a
particular sample station, and "10" refers to the tenth sample taken at the
same station.
The units in which the data are reported are coded as a two-character
alphanumeric code entered in columns 49 and 50. Table 6 lists the units
commonly used. These units are listed according to general categories of
area, chlorophyll pigments, percentages, productivity-respiration,
radiation-light, rate, flags and foul-ups, length, number, ratio, temperature,
time, turbidity, volume, weight, and zooplankton.
The code for habitat type or ecologic zone is entered in columns 51
and 52. This two-digit code identifies the habitat (ecologic zone and
dominant substratum) found at each site. In the case of bioassays, the code
indicates the range of hardness (Table 7).
Columns 53 to 55 contain a three-character code for instar and size
class. The first column refers to the stage of development or to the set of
characters used to express the stage of development of a specimen:
I--instar
S--general size class
L--length class
P--pupal stage
H--head capsule
A--immature
The next two columns contain a numeric code that represents size intervals in
millimeters (Table 8). Because instars are not always determined by measurements
of total body length, it is important to specify on the laboratory bench
sheet whether head capsule dimensions, wing pad length, or other measurements
were used to determine the instar. Examples of common codes are:
L48--length class: 100.0 to 125.0 mm
S07--size class: 0.61 to 0.70 mm
P00--pupal stage (interval code disregarded)
A00--nauplius immature (interval code disregarded)
H03--head capsule width: 0.21 to 0.30 mm
Additional bioassay information where appropriate is recorded in columns
56 through 59. Symbols from the periodic table or a coded listing of
compounds are entered in columns 56 and 57, and the concentration of the
toxicant is entered in columns 58 and 59. If data on a toxicant are recorded,
the units code entered in columns 49 and 50 gives the units of the
concentration of the toxicant. For example, a combination of "68" in columns
49 and 50 (units; see Table 6), "HG" in columns 56 and 57 (toxicant), and "05"
in columns 58 and 59 (concentration of toxicant) indicates that a
concentration of 5 µg/l mercury was used in the bioassay.
The next field refers to the 16-unit biological species code, which is
entered in columns 60 to 75. This code identifies eight taxonomic categories.
Columns 60 and 61 identify the phylum or division; columns 62 and 63 identify
class; columns 64 and 65 identify order; columns 66 and 67 identify family;
columns 68 to 70 identify genus; columns 71 to 73 identify species; and
columns 74 and 75 identify either the variety, form, or authority. The
numeric values are 01 to 99 for the two-digit fields and 001 to 999 for the
three-digit fields.
A typical 16-unit species code is "1801190100100100," which gives the
following information:
18 Phylum: Arthropoda
1801 Class: Crustacea
180119 Order: Branchiura
18011901 Family: Thalestridae
18011901001 Genus: Argulus sp.
18011901001001 Species: japonicus
1801190100100100 Authority: Thiele
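The breakdown above can be reproduced by slicing the 16-digit code into its fields. In this sketch the field widths follow the column assignments given in the text (two digits each for phylum through family and for the final field, three digits each for genus and species); the function and field names are hypothetical.

```python
# (field name, width in digits), in the order the fields occupy columns 60-75.
FIELDS = [
    ("phylum", 2),
    ("class", 2),
    ("order", 2),
    ("family", 2),
    ("genus", 3),
    ("species", 3),
    ("variety_form_or_authority", 2),
]


def split_species_code(code):
    """Return (parts, prefixes) for a 16-digit species code.

    parts maps each field to its own digits; prefixes maps each field to the
    cumulative code down to that level, as in the worked example.
    """
    if len(code) != 16 or not code.isdigit():
        raise ValueError("species code must be 16 digits")
    parts, prefixes, position = {}, {}, 0
    for name, width in FIELDS:
        parts[name] = code[position:position + width]
        position += width
        prefixes[name] = code[:position]
    return parts, prefixes
```

For "1801190100100100" the cumulative prefixes are "18", "1801", "180119", "18011901", "18011901001", "18011901001001", and the full code, matching the levels listed above.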
The codes for species found in the Tennessee Valley were compiled in the
Synoptic Catalog of Algae and Aquatic Invertebrates for the Tennessee Valley
included as Appendix A of this report.
The total number of organisms per data entry (row) is recorded in
columns 76 to 80. For field collections this number refers to the number of
organisms collected per data entry (sample or replicate). To indicate that 97
individuals of Chironomus tentans Fabricius were found per square meter, "NG"
is entered in the units field (columns 49 and 50), "1802111501602000" is
entered as the species code (columns 60 to 75), and "00097" is entered as the
number of organisms (columns 76 to 80). For recording bioassay results,
columns 76 to 80 are utilized to indicate the number of organisms killed and
the total number of organisms tested, respectively. For example, to indicate
that after 48 hours of an acute bioassay, 28 of 40 specimens of C. tentans had
died, "0048" is entered in columns 42-45, "H" is entered in column 46,
"1802111501602000" is entered in columns 60-75, "28" is entered in columns
76-77, column 78 is left blank, and "40" is entered in columns 79-80.
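The 80-column layout described in this section can be summarized as a fixed-width record parser. The sketch below is illustrative only: the field names are hypothetical, and treating both acute ("H") and chronic ("I") bioassay records as killed/tested pairs is an assumption, since the text gives an example only for an acute bioassay.

```python
# (field name, first column, last column); columns are 1-based and inclusive.
LAYOUT = [
    ("river_code", 1, 2), ("river_mile", 3, 7), ("lat_lon", 8, 15),
    ("date", 16, 21), ("gear_code", 22, 26), ("location_code", 27, 28),
    ("collection_number", 29, 33), ("depth", 34, 37), ("temperature", 38, 41),
    ("time", 42, 45), ("sample_type", 46, 46), ("replicate", 47, 48),
    ("units", 49, 50), ("habitat", 51, 52), ("instar_size", 53, 55),
    ("toxicant", 56, 57), ("toxicant_conc", 58, 59),
    ("species_code", 60, 75), ("number", 76, 80),
]


def parse_card(card):
    """Slice one 80-column record into its 19 fields (values kept as text)."""
    if len(card) > 80:
        raise ValueError("record is longer than 80 columns")
    card = card.ljust(80)  # pad short records with blanks
    return {name: card[first - 1:last] for name, first, last in LAYOUT}


def parse_number_field(record):
    """Interpret columns 76-80 as a count or as (killed, tested)."""
    raw = record["number"]
    if record["sample_type"] in ("H", "I"):  # assumption: both bioassay types
        return int(raw[0:2]), int(raw[3:5])  # columns 76-77 and 79-80
    return int(raw)
```

Applied to the two examples above, the field-collection record yields 97 and the bioassay record yields the pair (28, 40).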
3.3 EDITING OF DATA
After the biological data are coded and transcribed onto the coding form,
they are keypunched onto cards of a specified color, depending on the type of
data collected. For example, zooplankton data are punched on yellow cards,
zoomacrobenthic data on red cards, phytoplankton data on green cards,
carbon-14 data on orange cards, and chlorophyll data on white cards.
After the data have been punched, they are read into the computer where a
SAS program sorts and merges the data with another SAS data set that contains
the taxonomic name associated with each species code. If a data record does
not have a taxonomic name for a given code, an error message is printed. A
list of the data is then checked by the principal investigator to ensure that
all other requirements of the data have been met. If a record is incomplete,
new data are added by means of a user-written FORTRAN program. For instance,
if a zoomacrobenthic species was found in sample replicates 1, 5, 6, and 9 at
a site, the FORTRAN program would insert zeros for replicate numbers 2, 3, 4,
7, 8, and 10 if ten replicated samples had been taken at the station. Once
this is done, SAS can be used to sort, print, and perform statistical
analyses. Additional data can also be merged with the test data set at this
time.
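The zero-filling step performed by the user-written FORTRAN program can be illustrated with a short sketch. This is not the original program; the function name and the counts in the usage note are hypothetical.

```python
def fill_missing_replicates(counts_by_replicate, n_replicates):
    """Return counts for replicates 1..n_replicates, inserting 0 where absent.

    counts_by_replicate maps a replicate number to the count recorded for one
    species at one station; replicates in which the species was not found are
    simply missing from the mapping.
    """
    unexpected = [r for r in counts_by_replicate if not 1 <= r <= n_replicates]
    if unexpected:
        raise ValueError(f"replicate numbers out of range: {unexpected}")
    return {r: counts_by_replicate.get(r, 0) for r in range(1, n_replicates + 1)}
```

For a species recorded in replicates 1, 5, 6, and 9 of ten, for instance with counts `{1: 4, 5: 2, 6: 7, 9: 1}`, the result contains zeros for replicates 2, 3, 4, 7, 8, and 10, so that statistics computed afterward use all ten replicates.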
Once these steps have been completed, a formatted data set can be output
via a time-sharing option (TSO) or batch processed for use with other
programs. Programs used on TSO or submitted to batch from TSO are:
1. SPECLIST provides a species list. This list is checked to ensure
that all organisms listed were actually found in the study.
2. MATRIX provides a data matrix for use in NTSYS. NTSYS language
control cards are inserted on TSO, and the programs are submitted to
batch.
3. DIVER calculates the diversity index H'.
4. MIT-SNAP; median polish is used from this software package.
5. NTSYS provides 7 types of clustering, 7 types of ordination, 24 data
transformations, and 22 indices of similarity or dissimilarity.
6. User-written programs for species diversity (H) and hierarchical
diversity.
-------
TABLE 1. NUMERICAL CODES FOR RIVERS AND STREAMS IN THE TENNESSEE VALLEY,
AND SOURCES OF ORGANISMS FOR BIOASSAYS (UTILIZED IN COLUMNS 1
AND 2 OF CODING FORM—FIGURE 2)
Numerical   Alphabetical
code        code           Identification
00          TRM            Tennessee
01          ERM            Elk
02          HRM            Hiwassee
03          LTRM           Little Tennessee
04          FBRM           French Broad
05          HORM           Holston
06          HSRM           South Fork Holston
07          WRM            Watauga
08          CURM           Cumberland
09          CRM            Clinch
11          SRM            Stones
12          CFRM           Caney Fork
13          MRM            Mississippi
14          OHRM           Ohio
15          GRM            Green
16          DRM            Duck
17          OBRM           Obey
18          BRM            Buffalo
19          PRM            Powell
20          HNRM           North Fork Holston
21          HPRM           Harpeth
22          NRM            Nolichucky
23          PGRM           Pigeon
24          SQRM           Sequatchie
25          CHRM           Cheoah
26          NTRM           Nantahala
27          TGRM           Tuckasegee
28          NLRM           Nottely
29          TCRM           Toccoa
30          EMR            Emory
31          OCO            Ocoee
32          TELL           Tellico
33          CFM            Clear Fork River
34          TCM            Town Creek
35          CCM            Crooked Creek
36          YCM            Yellow Creek
37                         Research Station, Browns Ferry
38                         Rearing Ponds, NFDC, Muscle Shoals
39                         EPA Laboratory, Corvallis, OR
Used by other data storage systems in TVA.
-------
TABLE 2.0. CODES FOR IDENTIFYING METHODS OF LIMNOLOGICAL SAMPLING,
AND TYPES OF GEAR (UTILIZED IN COLUMNS 22-26 OF
CODING FORM—FIGURE 2)
A0000 Undefined
A1000 Artificial substrates
A2000 Natural substrate removal
A3000 Direct organism removal
A4000 Emergence traps
A5000 Volume samplers
-------
TABLE 2.1. CODES FOR TYPES OF SAMPLING EQUIPMENT AND SAMPLING
MATERIAL USED IN ARTIFICIAL-SUBSTRATE SAMPLING
A1000 Artificial substrate
A1100 Baskets
A1110 Rock
A1111 Rocks, bagged (basket collected in bag)
A1112 Rocks, unbagged (basket not collected in bag)
A1120 Leaf
A1130 Brush
A1140 Conservation webbing
A1150 Other synthetic material
A1160 Balls (porcelain or other material)
A1170 Concrete and pebble blocks
A1171 Concrete, pebble blocks, and conservation webbing
A1200 Trays
A1210 Rock
A1220 Pebbles
A1230 Sand
A1240 Silt
A1250 Clay
A1260 Mud
A1270 Conservation webbing
A1280 Other synthetic material
A1300 Flat surfaces
A1310 Multiplate samplers
A1320 Glass slides or plates
A1330 Plexiglass slides or plates
A1340 Plastic strips or sheets
A1350 Polyethylene plates
A1360 Polyurethane
A1370 Polystyrene
A1380 Iron plates
A1390 Ceramic tile or block
A13AA Wood
A13BB Cement tile or block
A1400 Sterile indigenous substrate
A1410 Rock
A1420 Wood
A1430 Aquatic vascular plants
-------
TABLE 2.2. CODES FOR TYPES OF SAMPLING EQUIPMENT AND SAMPLING
MATERIAL USED IN NATURAL SUBSTRATE REMOVAL AND ORGANISM REMOVAL
A2000 Natural substrate removal
A2100 Dredges
A2110 Ekman
A2120 Petersen
A2130 Ponar
A2140 Franklin-Anderson
A2150 Shipek
A2160 Dietz-LaFond
A2170 Orange peel dredge
A2180 Tonolli spiraling mud burrower
A2200 Corers
A2210 Vertical core sampler
A2211 Cork borer
A2212 Dendy inverted sampler
A2213 Phleger corer
A2214 Ewing piston corer
A2220 FRB multiple corer
A2230 Peat corer
A2240 Deep core sampler
A2250 Benthos 2170 gravity corer
A2260 Alpine 211 gravity corer
A2270 PVC pipe corer
A2280 Glass tube corer
A2290 Kajak corer
A2300 Area samplers
A2310 Surber square foot sampler
A2320 Wilding square foot sampler
A2330 Hess circular sampler
A2340 Neill sampler
A2350 Dome sampler
A2360 Diver-Actuated Sampler - Circular
A2361 Diver-Actuated Sampler - Square
A2400 Scrapes
A2500 Ooze suckers
A3000 Organism removal
A3100 Kick nets
A3200 Hand net sweeps
A3300 Drift-nets
A3400 Grab samples
A3500 Trawls
-------
TABLE 2.3. CODES FOR TYPES OF SAMPLING EQUIPMENT USED WITH
EMERGENCE TRAPS AND VOLUME SAMPLERS
A4000 Emergence traps
A4100 Light traps
A4110 Lantern-sheet method
A4120 Incandescent
A4130 Fluorescent
A4140 Black light (UV)
A4200 Submerged traps
A4300 Floating
A4400 Aerial net traps
A4500 Staked box traps
A4600 Hand net sweeps
A5000 Volume samplers
A5100 Juday traps
A5200 Kemmerer bottles
A5300 Van Dorn bottles
A5400 Clarke-Bumpus plankton sampler
A5500 Nansen bottle
A5600 Grab samples
A5700 Undefined
A5800 0.5-Meter tow net (#20 mesh)
-------
TABLE 3. CODES FOR IDENTIFYING LOCATION OF LIMNOLOGICAL SAMPLES
ON A GIVEN RIVER OR RESERVOIR TRANSECT (UTILIZED IN
COLUMNS 27 AND 28 OF CODING FORM—FIGURE 2)
Code: distance      Code: distance
from right bank     from left bank
(facing             (facing
downstream)         downstream)         Interval (m)
01                  50                     0.1 -    1.0
02                  51                     1.1 -    2.0
03                  52                     2.1 -    3.0
04                  53                     3.1 -    4.0
05                  54                     4.1 -    5.0
06                  55                     5.1 -    6.0
07                  56                     6.1 -    7.0
08                  57                     7.1 -    8.0
09                  58                     8.1 -    9.0
10                  59                     9.1 -   10.0
11                  60                    10.1 -   15.0
12                  61                    15.1 -   20.0
13                  62                    20.1 -   25.0
14                  63                    25.1 -   30.0
15                  64                    30.1 -   35.0
16                  65                    35.1 -   40.0
17                  66                    40.1 -   45.0
18                  67                    45.1 -   50.0
19                  68                    50.1 -   75.0
20                  69                    75.1 -  100.0
21                  70                   100.1 -  125.0
22                  71                   125.1 -  150.0
23                  72                   150.1 -  175.0
24                  73                   175.1 -  200.0
25                  74                   200.1 -  300.0
26                  75                   300.1 -  400.0
27                  76                   400.1 -  500.0
28                  77                   500.1 -  600.0
29                  78                   600.1 -  700.0
30                  79                   700.1 -  800.0
31                  80                   800.1 -  900.0
32                  81                   900.1 - 1000.0
33                  82                  1000.1 - 1250.0
34                  83                  1250.1 - 1500.0
35                  84                  1500.1 - 1750.0
36                  85                  1750.1 - 2000.0
37                  86                  2000.1 - 2250.0
38                  87                  2250.1 - 2500.0
39                  88                  2500.1 - 3000.0
40                  89                  3000.1 - 3500.0
41                  90                  3500.1 - 4000.0
42                  91                  4000.1 - 4500.0
43                  92                  4500.1 - 5000.0
44                  93                  5000.1 - 5500.0
45                  94                  5500.1 - 6000.0
46                  95                  6000.1 - 6500.0
47                  96                  6500.1 - 7000.0
48                  97                  7000.1 - 7500.0
49                  98                  7500.1 - 8000.0
-------
TABLE 4. ALPHABETIC CODES IDENTIFYING SITE (OR PROJECT) AT WHICH
SAMPLE WAS COLLECTED (UTILIZED IN
COLUMNS 29 AND 30 OF CODING FORM—FIGURE 2)
Site                                   Site
prefix                                 prefix
code    Site(a)                        code    Site(a)
AA      Raccoon Mountain (P)           BW      Chatuge (R)
AB      Bellefonte (N)                 BX      Nottely (R)
AC      Browns Ferry (N)               BY      Parksville (R)
AD      Hartsville (N)                 BZ      Blue Ridge (R)
AE      Sequoyah (N)                   CA      Hales Bar (R)
AF      Watts Bar (N)                  CB      Great Falls (R)
AG      Yellow Creek (N)               CC      Barkley (R)
AH      Watts Bar (FF)                 CD      Cheatham (R)
AI      Johnsonville (FF)              CE      Old Hickory (R)
AJ      Widows Creek (FF)              CF      Ocoee No. 1 (H)
AK      Shawnee (FF)                   CG      Wilbur (H)
AL      Kingston (FF)                  CH      Ocoee No. 2 (H)
AM      Colbert (FF)                   CI      Nolichucky (H)
AN      John Sevier (FF)               CJ      Great Falls (H)
AO      Gallatin (FF)                  CK      Wilson (H)
AP      Allen (FF)                     CL      Blue Ridge (H)
AQ      Paradise (FF)                  CM      Norris (H)
AR      Bull Run (FF)                  CN      Wheeler (H)
AS      Cumberland (FF)                CO      Pickwick (H)
AT      Kentucky (F)                   CP      Guntersville (H)
AU      Pickwick (R)                   CQ      Chickamauga (H)
AV      Wilson (R)                     CR      Hiwassee (H)
AW      Wheeler (R)                    CS      Cherokee (H)
AX      Guntersville (R)               CT      Watts Bar (H)
AY      Nickajack (R)                  CU      Nottely (H)
AZ      Chickamauga (R)                CV      Chatuge (H)
BA      Watts Bar (R)                  CW      Ocoee No. 3 (H)
BB      Fort Loudon (R)                CX      Appalachia (H)
BC      Melton Hill (R)                CY      Douglas (H)
BD      Norris (R)                     CZ      Fort Loudon (H)
BE      Douglas (R)                    DA      Kentucky (H)
BF      Nolichucky (R)                 DB      Fontana (H)
BG      Cherokee (R)                   DC      Watauga (H)
BH      Fort Patrick Henry (R)         DD      South Holston (H)
BI      Boone (R)                      DE      Boone (H)
BJ      Watauga (R)                    DF      Fort Patrick Henry (H)
BK      South Holston (R)              DG      Melton Hill (H)
BL      Tims Ford (R)                  DH      Nickajack (H)
BM      Tellico (R)                    DI      Tims Ford (H)
BN      Chilhowee (R)                  DJ      Clinch R. Breeder
BO      Calderwood (R)                 DK      Clinch R. Carbo Plant
BP      Cheoah (R)                     DL      Jamestown CH (S)
BQ      Fontana (R)                    DM      Jamestown CM (S)
BR      Santetlah (R)                  DN      Jamestown CA (S)
BS      Nantahala (R)                  DO      Jamestown LY (S)
BT      Thorpe (R)                     DP      Jamestown LB (S)
BU      Appalachia (R)                 DQ      Jamestown PM (S)
BV      Hiwassee (R)
(a) Abbreviations: P = pump storage facility; N = nuclear power plants; FF = fossil
fuel power plants; R = reservoir; H = hydro plants; and S = strip mine site.
-------
TABLE 5. CODES FOR IDENTIFYING TYPE OF SAMPLE (COMMUNITY, PARAMETER,
TEST) UTILIZED IN COLUMN 46 OF CODING FORM--FIGURE 2
Code Sample type
A Zoomacrobenthos
B Periphyton
C Phytoplankton
D Zooplankton
E Macrophyton
F Productivity, light bottle-dark bottle (oxygen)
G Productivity, carbon-14
H Bioassay, acute
I Bioassay, chronic
-------
TABLE 6. CODES UTILIZED TO REPORT DATA UNITS (COLUMNS 49 AND 50
ON CODING FORM--FIGURE 2)
Area
A0 Undefined
A1 Sq. micron
A2 Sq. millimeter
A3 Sq. centimeter
A4 Sq. meter
A5 Hectare
A6 Sq. kilometer
A7 Sq. inch
A8 Sq. foot
A9 Sq. yard
AA Acre
AB Sq. mile
Chlorophyll-pigments
C0 Undefined
C1 Micrograms active chlorophyll A/sq. centimeter
C2 Micrograms phaeophytin/sq. centimeter
C3 Micrograms chlorophyll A/sq. centimeter
C4 Micrograms chlorophyll B/sq. centimeter
C5 Micrograms chlorophyll C/sq. centimeter
C6 Micrograms beta-carotene/sq. centimeter
C7 Milligrams active chlorophyll A/sq. meter
C8 Milligrams phaeophytin/sq. meter
C9 Milligrams chlorophyll A/sq. meter
CA Milligrams chlorophyll B/sq. meter
CB Milligrams chlorophyll C/sq. meter
CC Milligrams beta-carotene/sq. meter
CD Milligrams active chlorophyll A/cubic meter
CE Milligrams phaeophytin/cubic meter
CF Milligrams chlorophyll A/cubic meter
CG Milligrams chlorophyll B/cubic meter
CH Milligrams chlorophyll C/cubic meter
CI Milligrams beta-carotene/cubic meter
CJ Milligrams chlorophyll A/liter
Percentages
X0 Undefined
X1 Percent abundance, numbers
X2 Percent abundance, biomass
X3 Percent efficiency
Productivity-respiration
P0 Undefined
P1 Milligrams ATP/cubic meter/day
P2 Milligrams ATP/cubic meter/hour
P3 Milligrams ATP/sq. meter/day
P4 Milligrams ATP/sq. meter/hour
P5 Milligrams C/cubic meter/day
P6 Milligrams C/cubic meter/hour
P7 Milligrams C/sq. meter/day
P8 Milligrams C/sq. meter/hour
P9 Milligrams CO2/cubic meter/day
PA Milligrams CO2/cubic meter/hour
PB Milligrams O2/cubic meter/day
PC Milligrams O2/cubic meter/hour
PD Milligrams protein/cubic meter/day
PE Milligrams protein/cubic meter/hour
PF Milligrams protein/sq. meter/day
PG Milligrams protein/sq. meter/hour
PH Grams C/cubic meter/day
PI Grams C/cubic meter/hour
PJ Grams C/sq. meter/day
PK Grams C/sq. meter/hour
PR Milligrams O2 uptake/gram fr. wt./day
PS Milligrams O2 uptake/gram fr. wt./hour
PT Milligrams O2 uptake/gram dry wt./day
PU Milligrams O2 uptake/gram dry wt./hour
Radiation-light
R0 Undefined
R1 Foot-candles
R2 Gram calories/centimeter square
R3 Langleys
R4 Lux
R5 Percent of surface illumination
Rate-(l)
10 Undefined
11 Millimeters/second
12 Centimeters/second
13 Meters/second
14 Millimeters/minute
15 Centimeters/minute
16 Meters/minute
17 Millimeters/hour
18 Centimeters/hour
19 Meters/hour
1A Kilometers/hour
1B Inches/second
1C Inches/minute
1D Inches/hour
1E Feet/second
1F Feet/minute
1G Feet/hour
1H Miles/hour
Rate-(v)
20 Undefined
21 Cubic feet/second
22 Cubic meters/second
Rate-(wt/a)
30 Undefined
31 Grams/sq. meter/day
32 Grams/sq. meter/hour
33 Milligrams/sq. meter/day
34 Milligrams/sq. meter/hour
Rate-(wt/vol)
40 Undefined
41 Grams/cubic meter/day
42 Grams/cubic meter/hour
43 Micrograms/liter/day
44 Micrograms/liter/hour
45 Milligrams/liter/day
46 Milligrams/liter/hour
Flags and foul-ups
F0 Undefined
F1 Sample lost during collection
F2 Sample lost during analysis
F3 Unable to access sampling station
F4 Unable to recover sample or substrate
Length
L0 Undefined
L1 Micron
L2 Millimeter
L3 Centimeter
L4 Meter
L5 Kilometer
L6 Inch
L7 Foot
L8 Yard
L9 Mile
LA International nautical mile
Number
N0 Undefined
N1 Number of colonies
N2 Number of eggs
N3 Number of exuviae
N4 Number of hatches
N5 Number of individuals
N6 Number/acre
N7 Number/cubic meter
N8 Number/gram dry wt.
N9 Number/gram ashfree wt.
NA Number/hectare
NB Number/liter
NC Number/milliliter
ND Number/sq. millimeter
NE Number/sq. centimeter
NF Number/sq. foot
NG Number/sq. meter
NH Number/sample
NI Number/cubic meter X 10
Ratio (wt/a)
50 Undefined
51 Grams/sq. meter
52 Kilograms/acre
53 Kilograms/hectare
54 Micrograms/sq. centimeter
55 Milligrams/sq. centimeter
56 Milligrams/sq. meter
57 Grams/sq. meter, ashfree
58 Milligrams/sq. centimeter, ashfree
59 Milligrams/sq. meter, ashfree
5A Grams/sq. meter, ash
5B Milligrams/sq. centimeter, ash
5C Milligrams/sq. meter, ash
5D Kilograms/channel
Ratio (wt/vol)
60 Undefined
61 Grams/cubic meter
62 Grams/liter
63 Kilograms/cubic meter
64 Micrograms/milliliter
65 Milligrams/cubic meter
66 Milligrams/liter
67 Milligrams/milliliter
68 Micrograms/liter
Temperature
D0 Undefined
D1 Degrees Celsius
D2 Degrees Fahrenheit
D3 Degrees Kelvin
D4 Degree-days
Time
T0 Undefined
T1 Day
T2 Hour
T3 Minute
T4 Month
T5 Second
T6 Week
T7 Year
26
-------
TABLE 6 (continued)
Turbidity
G0 Undefined
G1 Jackson turbidity units (JTU)
G2 Formazin turbidity units (FTU)
G3 Coleman nephlos units (CTU)
G4 Percentage transmittance (%T)
Volume
V0 Undefined
V1 Cubic micron
V2 Cubic millimeter
V3 Milliliter
V4 Liter
V5 Cubic meter
V6 Cubic kilometer
V7 Cubic inch
V8 Ounce
V9 Cubic foot
VA Cubic yard
VB Cubic mile
Weight
W0 Undefined
W1 Picogram
W2 Microgram
W3 Milligram
W4 Centigram
W5 Gram
W6 Kilogram
W7 Grain
W8 Ounce
W9 Pound
WA Ton
WB Milligram, ashfree
WC Gram, ashfree
WD Kilogram, ashfree
WE Milligram, ash
WF Gram, ash
WG Kilogram, ash
27
-------
TABLE 6 (continued)
Zooplankton
Z0 Undefined
Z1 No. females with 0 eggs per brood chamber
Z2 No. females with 1 egg per brood chamber
Z3 No. females with 2 eggs per brood chamber
Z4 No. females with 3 eggs per brood chamber
Z5 No. females with 4 eggs per brood chamber
Z6 No. females with 5 eggs per brood chamber
Z7 No. females with 6-7 eggs per brood chamber
Z8 No. females with 8-10 eggs per brood chamber
Z9 No. females with 11-15 eggs per brood chamber
ZA No. females with 16-20 eggs per brood chamber
ZB No. females with 21-25 eggs per brood chamber
ZC No. females with 26-30 eggs per brood chamber
ZD No. females with >30 eggs per brood chamber
28
-------
TABLE 7. CODES FOR IDENTIFYING BASIC TYPE OF HABITAT FROM WHICH
SAMPLE WAS COLLECTED, OR HARDNESS AT WHICH BIOASSAY WAS PERFORMED
(USED IN COLUMNS 51 AND 52 OF CODING FORM—FIGURE 2)
Code   Ecologic zone
1      Pool
2      Riffle
3      Profundal
4      Pelagic
5      Littoral
6      Sublittoral
7      Eulittoral
8      Abyssal
9      Channel
A      Overbank
B      Tail water
C      Supratidal
D      Intertidal
E      Subtidal
F      Splash
G      Intake canal
H      Discharge canal

Code   Substratum
1      Bedrock
2      Boulders
3      Rubble (small rocks)
4      Gravel
5      Sand
6      Silt
7      Clay
8      Marl
9      Organic detritus (unconsolidated)
A      Fibrous peat
B      Pulpy peat
C      Organic muck

Code   Water hardness (mg/l as CaCO3)
01     10
02     20
03     30
04     40
05     50
06     60
07     70
08     80
09     90
10     100
11     110
12     120
13     130
14     140
15     150
16     160
17     170
18     180
19     190
20     200
29
-------
TABLE 8. CODES FOR RECORDING INSTAR OR SIZE OF ORGANISMS
COLLECTED OR USED IN BIOASSAYS (UTILIZED IN COLUMNS 54 AND
55 OF CODING FORM—FIGURE 2)
Code  Interval (mm)        Code  Interval (mm)        Code  Interval (mm)
01    0.001 - 0.10         34    24.1 - 25.0          67    575.1 - 600.0
02    0.11 - 0.20          35    25.1 - 27.0          68    600.1 - 625.0
03    0.21 - 0.30          36    27.1 - 29.0          69    625.1 - 650.0
04    0.31 - 0.40          37    29.1 - 31.0          70    650.1 - 675.0
05    0.41 - 0.50          38    31.1 - 33.0          71    675.1 - 700.0
06    0.51 - 0.60          39    33.1 - 35.0          72    700.1 - 725.0
07    0.61 - 0.70          40    35.1 - 37.0          73    725.1 - 750.0
08    0.71 - 0.80          41    37.1 - 39.0          74    750.1 - 775.0
09    0.81 - 0.90          42    39.1 - 41.0          75    775.1 - 800.0
10    0.91 - 1.00          43    41.1 - 43.0          76    800.1 - 825.0
11    1.1 - 2.0            44    43.1 - 45.0          77    825.1 - 850.0
12    2.1 - 3.0            45    45.1 - 50.0          78    850.1 - 875.0
13    3.1 - 4.0            46    50.1 - 75.0          79    875.1 - 900.0
14    4.1 - 5.0            47    75.1 - 100.0         80    900.1 - 925.0
15    5.1 - 6.0            48    100.1 - 125.0        81    925.1 - 950.0
16    6.1 - 7.0            49    125.1 - 150.0        82    950.1 - 975.0
17    7.1 - 8.0            50    150.1 - 175.0        83    975.1 - 1000.0
18    8.1 - 9.0            51    175.1 - 200.0        84    1000.1 - 1100.0
19    9.1 - 10.0           52    200.1 - 225.0        85    1100.1 - 1200.0
20    10.1 - 11.0          53    225.1 - 250.0        86    1200.1 - 1300.0
21    11.1 - 12.0          54    250.1 - 275.0        87    1300.1 - 1400.0
22    12.1 - 13.0          55    275.1 - 300.0        88    1400.1 - 1500.0
23    13.1 - 14.0          56    300.1 - 325.0        89    1500.1 - 1600.0
24    14.1 - 15.0          57    325.1 - 350.0        90    1600.1 - 1700.0
25    15.1 - 16.0          58    350.1 - 375.0        91    1700.1 - 1800.0
26    16.1 - 17.0          59    375.1 - 400.0        92    1800.1 - 1900.0
27    17.1 - 18.0          60    400.1 - 425.0        93    1900.1 - 2000.0
28    18.1 - 19.0          61    425.1 - 450.0        94    2000.1 - 3000.0
29    19.1 - 20.0          62    450.1 - 475.0        95    3000.1 - 4000.0
30    20.1 - 21.0          63    475.1 - 500.0        96    4000.1 - 5000.0
31    21.1 - 22.0          64    500.1 - 525.0        97    5000.1 - 7500.0
32    22.1 - 23.0          65    525.1 - 550.0        98    7500.1 - 10000.0
33    23.1 - 24.0          66    550.1 - 575.0        99    >10000.1
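The size classes of Table 8 can be assigned mechanically from a measured length. The following Python sketch is illustrative only and is not part of the CIS: the names `upper_bounds` and `size_code` are hypothetical, and the sketch assumes that a length falling in a gap between printed intervals (for example, 0.105 mm) is assigned to the next higher class.

```python
from bisect import bisect_left

# Upper limits (mm) of codes 01 through 98, in order, as listed in Table 8.
upper_bounds = (
    [round(0.1 * k, 2) for k in range(1, 11)]      # 0.10 ... 1.00
    + [float(k) for k in range(2, 25)]             # 2.0 ... 24.0
    + [25.0]
    + [float(k) for k in range(27, 46, 2)]         # 27.0 ... 45.0
    + [50.0]
    + [float(k) for k in range(75, 1001, 25)]      # 75.0 ... 1000.0
    + [float(k) for k in range(1100, 2001, 100)]   # 1100.0 ... 2000.0
    + [3000.0, 4000.0, 5000.0, 7500.0, 10000.0]
)

def size_code(length_mm):
    """Return the two-digit Table 8 code for a length in millimeters."""
    if length_mm <= 0:
        raise ValueError("length must be positive")
    # Index of the first upper limit that is >= the length; lengths above
    # the last limit (10000.0 mm) fall through to code 99.
    index = bisect_left(upper_bounds, length_mm)
    return f"{index + 1:02d}"

print(size_code(0.05), size_code(26.0), size_code(20000.0))
```

A lookup on the sorted upper limits avoids writing out all 99 intervals and keeps the class boundaries in one place.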
30
-------
SECTION 4
METHODS
4.1 DESCRIPTION OF ANALYTICAL TECHNIQUES
After the CIS had been placed in operation, emphasis was initially placed
on analysis of selected sets of data representing radically different environ-
mental situations with which the staff had had experience. Three analytical
(exploratory) techniques were tested with these data. The first was cluster
analysis, which was used (1) to determine the similarity of selected sites
along two rivers on the basis of the biotic assemblages found in samples from
those sites (Q-mode) and (2) to identify biological communities or
associations of species (R-mode).
The second technique was an ordination procedure called nonmetric multi-
dimensional scaling (MDS). This technique was used because it allows "one to
examine a scatter diagram displaying a summary of the structure of the data
without having to first assume that clusters are present" (Rohlf 1970).
Ordination was thus an alternative to cluster analysis and served as a check
to help determine if stations or species actually formed distinct groups.
The third technique considered was an index of species diversity, which
concurrently examines the number of species and the uniformity or evenness of
distribution of individuals among species (Pielou 1969).
4.1.1 Cluster Analysis
Cluster analysis is a multivariate analytical technique used to consider
simultaneously all the data contained in a large data matrix. A frequent
application of this technique is to search for patterns in data, especially
data that do not meet assumptions of rigorous statistical methods.
When cluster analysis is used to analyze data from limnological surveys,
the data are tabulated by taxa for each station into a data matrix in which
rows are taxa and columns are stations at which samples were collected or, in
the case of multiple samples from the same station, the samples themselves.
The data can be presented as the number of individuals of each species in each
sample, presence or absence of species in samples, or ranked abundances of
species. A typical matrix for presence-absence data is shown below, in which
1 stands for presence and 0 stands for absence.
31
-------
              Station
Taxa    1    2    3    4    5
A       1    1    0    0    1
B       1    1    0    0    0
C       1    1    1    1    0
D       1    1    0    0    0
E       1    0    1    1    0
F       0    1    1    1    1
Once a data matrix has been compiled, a similarity or distance matrix is
computed that expresses the resemblance between each pair of samples or
species in the data matrix. Pairwise comparison between columns (samples) is
referred to as Q-mode analysis; comparison between rows (species) is called
R-mode analysis. Any of a variety of similarity, correlation, or distance
coefficients may be used to quantify the resemblance. One of the simplest is
Jaccard's coefficient, S_J (Jaccard 1908):
$$S_J = \frac{a}{a + b + c},$$
where, in Q-mode analysis,
a = number of taxa found at both stations,
b = number of taxa found at the first station and not
the second,
c = number of taxa found at the second station and not
the first.
The similarity matrix that results from Q-mode analysis of the hypothetical
data matrix above is:
Stations     1      2      3      4
    2       4/6
    3       2/6    2/6
    4       2/6    2/6    3/3
    5       1/6    2/5    1/4    1/4
The next step in the procedure is the clustering itself. One of the most
widely used techniques for clustering is the unweighted pair-group method
using arithmetic averages (UPGMA) (Sokal and Sneath, 1963). Using this
procedure the computer first seeks mutually closest resemblances in the
similarity or distance matrix. In our example the closest resemblances,
indicated by the highest similarity coefficients, are (1) station 1 with
station 2 (4/6 = 0.67) and (2) station 3 with station 4 (3/3 = 1.00). Note
that station 5's highest similarity (2/5) is with station 2 but that this is
not a mutually highest similarity because station 2 is more similar to station
1 than it is to station 5. After the mutually most similar pairs have been
found, the average similarities of these pairs with all other stations in the
matrix are computed, and the matrix is recomputed. Mutually highest pairs in the
new similarity matrix are sought, and the process is repeated until all
stations have joined a cluster. Continuing with our example:
Avg. sim. (1-2) with (3-4) = (1 w/ 3 + 1 w/ 4 + 2 w/ 3 + 2 w/ 4)/4
                           = (2/6 + 2/6 + 2/6 + 2/6)/4 = 0.33
Avg. sim. (1-2) with (5)   = (1 w/ 5 + 2 w/ 5)/2
                           = (1/6 + 2/5)/2 = 0.28
Avg. sim. (3-4) with (5)   = (3 w/ 5 + 4 w/ 5)/2
                           = (1/4 + 1/4)/2 = 0.25
The resulting recomputed similarity matrix, in which the mutually highest
similarity is 0.33, is:
Stations    1-2     3-4
  3-4       0.33
   5        0.28    0.25
Continuing:
Avg. sim. (1-2-3-4) with (5) = [(1-2) w/ 5 + (3-4) w/ 5]/2
= (0.28 + 0.25)/2 = 0.27
33
-------
The scale at the top of the figure indicates the level of average similarity
between stations and clusters of stations. For example, station 1 clusters
with station 2 at a similarity of 0.67; they in turn cluster with stations 3
and 4 at an average similarity of 0.33, and the four stations join station 5
at an average similarity of 0.27.
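The worked example can be reproduced with a short Python sketch. It is offered as an illustration under stated assumptions, not as the program used in this project: the function names `jaccard` and `upgma` are hypothetical, and the sketch joins one pair of clusters per pass (the pair with the highest average similarity) rather than all mutually closest pairs at once. For these data the two approaches give the same joining levels.

```python
from itertools import combinations

# Presence-absence data of the hypothetical example: station -> taxa present.
stations = {
    1: {"A", "B", "C", "D", "E"},
    2: {"A", "B", "C", "D", "F"},
    3: {"C", "E", "F"},
    4: {"C", "E", "F"},
    5: {"A", "F"},
}

def jaccard(x, y):
    """Jaccard similarity a / (a + b + c) between two sets of taxa."""
    union = len(x | y)
    return len(x & y) / union if union else 0.0

def upgma(items, sim):
    """Join clusters by highest average pairwise similarity.

    Returns a list of (cluster_1, cluster_2, joining_level) tuples.
    """
    def average(c1, c2):
        total = sum(sim[frozenset((i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [(i,) for i in items]
    merges = []
    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2), key=lambda pair: average(*pair))
        merges.append((c1, c2, average(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

sim = {
    frozenset((i, j)): jaccard(stations[i], stations[j])
    for i, j in combinations(stations, 2)
}
merges = upgma(list(stations), sim)
for c1, c2, level in merges:
    print(c1, "joins", c2, "at", round(level, 2))
```

The joining levels printed by the sketch correspond to the similarities of 1.00, 0.67, 0.33, and 0.27 derived by hand above.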
The last step in cluster analysis is to compute a coefficient of
cophenetic correlation (Sokal and Rohlf, 1962). This procedure, described by
Roback et al. (1969),
"is necessary because the clustering method involves averaging of simi-
larities in order to express the multidimensional Jaccard coefficient
matrix as a 2-dimensional, hierarchical relationship. Sokal and Rohlf
(1962) have developed a method of making this comparison in which
similarity values from the dendrogram are expressed as a matrix of
cophenetic values. A correlation coefficient is computed between this
matrix and the original matrix of coefficients of association. This
correlation coefficient, called the cophenetic correlation, is a measure
of the amount of distortion introduced by the clustering method. The
unweighted pair-group method commonly yields a higher cophenetic correla-
tion than other clustering methods."
The principal difficulty in using cluster analysis, particularly in the
study of lotic and semilotic systems, is that most agglomerative clustering
algorithms construct hyperspheroidal clusters. In some instances, opposite
sides of a stream may be less similar to each other than either side is to
upstream or downstream areas. In such situations, longitudinal environmental
gradients may exist rather than true clusters, and hyperspheres may be poor
descriptors. In essence, the use of cluster analysis superimposes hyper-
spheroidal clusters onto the natural system, whether or not such clusters
express the true similarities in the natural system (Kaesler, 1970).
If hyperspheroidal clusters do not exist, the use of cluster analysis
will result in distortion of the similarities among samples. Fortunately, the
overall amount of distortion introduced by clustering can be measured by the
coefficient of cophenetic correlation, r (Sokal and Rohlf, 1962; Farris,
1969; Kaesler, 1970). Use of the coefficient provides a means of quantifying
the uncertainties about applying cluster analysis to a particular set of data.
Values of r larger than 0.8 usually indicate that serious distortion has not
been introduced. The larger the matrix of similarity coefficients, the
greater the chance of introducing distortion of similarities that may go
undetected.
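As an illustration, the cophenetic correlation for the five-station example of this section can be computed as follows. The sketch assumes the ordinary product-moment (Pearson) correlation between the original Jaccard coefficients and the levels at which each pair of stations first joins in the dendrogram; the helper name `pearson` is hypothetical.

```python
from math import sqrt

# Original Jaccard similarities for the ten station pairs, in the order
# (1,2), (1,3), (2,3), (1,4), (2,4), (3,4), (1,5), (2,5), (3,5), (4,5).
original = [4/6, 2/6, 2/6, 2/6, 2/6, 3/3, 1/6, 2/5, 1/4, 1/4]

# Cophenetic values: the level at which each pair first joins a common cluster.
join_12 = 4/6      # stations 1 and 2
join_34 = 1.0      # stations 3 and 4
join_1234 = 1/3    # cluster 1-2 with cluster 3-4
join_all = 4/15    # station 5 with the other four (0.27 when rounded)
cophenetic = [join_12, join_1234, join_1234, join_1234, join_1234,
              join_34, join_all, join_all, join_all, join_all]

def pearson(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

r = pearson(original, cophenetic)
print(round(r, 3))
```

Because the example is small and strongly structured, the resulting value lies well above the 0.8 guideline mentioned above.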
4.1.2 Ordination by Nonmetric Multidimensional Scaling
Ordination is an exploratory analytical technique often used when dealing
with large sets of data. In aquatic ecology, ordination involves plotting
either the samples or the taxa found in the samples in a two- or
three-dimensional scatter diagram. The choice of axes on which the points are
plotted depends on the method of ordination selected. In the most successful
ordinations, either the samples are plotted in a space where the axes are
taxa, or the taxa are plotted in a space defined by the samples. The primary
advantage of ordination over cluster analysis is that it "allows one to
examine a scatter diagram displaying a summary of the structure of the data
without having to first assume that clusters are present" (Rohlf, 1970). In
this sense, ordination is a powerful alternative to cluster analysis.
Several techniques of ordination are available, including principal
component ordination, principal coordinate ordination, the Bray-Curtis polar
ordination technique (Whittaker, 1975), and nonmetric multidimensional scaling
(MDS) (Kruskal, 1964a & 1964b). Only MDS was dealt with in this project
because it seems to be ideally suited to the kinds of data obtained from
biological surveys. Specifically, MDS is computationally robust when data are
missing; it can accommodate quantitative, ranked, or presence-absence data;
and it can be used with any measure of correlation, similarity, or distance.
In contrast, principal component ordination is usually inappropriate with
missing data and operates only on matrices of correlation coefficients.
Principal coordinate ordination is not hampered by missing data and operates
on a distance matrix, but it gives results that are proportional to principal
component ordination when no data are missing. Thus, principal coordinate
ordination is likely to be redundant when used with principal component
ordination.
An important advantage of MDS, according to Sneath and Sokal (1973), is
that "it seems better than principal component analysis in giving balance
between the large intercluster distances and the fine differences between
members of a given cluster." That is, MDS is a reasonable compromise between
cluster analysis, which recognizes fine, intracluster distances, and principal
component ordination, which gives a better indication of large intercluster
distances than of fine intracluster distances.
The following discussion of MDS is based on the work of Green and
Carmone (1970) and Rohlf (1972). Principal component analysis, principal
coordinate analysis, and the ordinations developed from them operate in such a
way as to find dimensions or components that explain the greatest amount of
variance or scatter in the data. The first principal component explains the
most variance. The second principal component explains as much of the
remaining variance as possible, and it is perpendicular to and uncorrelated
with the first component. If any variance remains, additional components are
computed and arranged in the same manner described for the first two. In
practice, the investigator usually settles for three principal components
because three is the greatest number that can be expressed graphically
although Green (1979) has urged caution because much of the variation in a
data set "may be irrelevant to the purposes of the study."
Table 9 shows data on proportional faunal composition of twelve samples
from a hypothetical ecosystem containing ten species. Table 10 shows a
distance matrix computed from the data in Table 9 after they were transformed
by means of an arcsine transformation. The distance matrix is in the Q-mode
and thus shows similarities among samples based on their faunal composition.
Figure 3 shows an ordination that resulted from one-dimensional nonmetric
MDS, in which the stations are arrayed along a line. The stress equals 0.230,
which indicates a fit that is less than fair.
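The report does not state the exact form of the arcsine transformation or of the distance coefficient. The Python sketch below therefore rests on two assumptions, both flagged in the comments: each proportion p is transformed as arcsin(p) in radians, and distance is the square root of the mean squared difference over the ten species. Under these assumptions the sketch agrees with several entries of Table 10 to three decimals (for example, 0.037 for samples A and B and 0.032 for samples C and D); other choices may fit equally well.

```python
from math import asin, sqrt

# Proportions of the ten species in samples A through D (from Table 9).
table9 = {
    "A": [0.110, 0.090, 0.100, 0.100, 0.100, 0.100, 0.100, 0.100, 0.100, 0.100],
    "B": [0.200, 0.100, 0.100, 0.100, 0.100, 0.100, 0.100, 0.100, 0.050, 0.050],
    "C": [0.200, 0.200, 0.100, 0.100, 0.100, 0.100, 0.100, 0.100, 0.000, 0.000],
    "D": [0.200, 0.200, 0.100, 0.100, 0.100, 0.100, 0.050, 0.050, 0.050, 0.050],
}

def distance(x, y):
    """Average distance between two samples after an arcsine transformation.

    ASSUMPTION: the transformation is arcsin(p) in radians and the coefficient
    is the root of the mean squared difference; neither is stated in the text.
    """
    tx = [asin(p) for p in x]
    ty = [asin(p) for p in y]
    return sqrt(sum((a - b) ** 2 for a, b in zip(tx, ty)) / len(tx))

for first, second in [("A", "B"), ("B", "C"), ("C", "D")]:
    print(first, second, round(distance(table9[first], table9[second]), 3))
```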
Figures 4 and 5 show the results of nonmetric MDS in two dimensions. The
figures show ordinations, respectively, after 1 and 20 iterations. The stress
35
-------
Figure 3. One-dimensional Q-mode ordination of hypothetical samples in Table 9 computed
with nonmetric multidimensional scaling, nine iterations (stress = 0.230).
-------
Figure 4. Two-dimensional Q-mode ordination of hypothetical samples in
Table 9 computed with nonmetric multidimensional scaling, one iteration
(stress = 0.097).
Figure 5. Two-dimensional Q-mode ordination of hypothetical samples in
Table 9 computed with nonmetric multidimensional scaling, 20 iterations
(stress = 0.051).
37
-------
of these ordinations is, respectively, 0.097 and 0.051. In the write-up for
the NT-SYS programs, Rohlf evaluated stress as follows (modified from Kruskal,
1964a):
STRESS GOODNESS OF FIT
0.40 Poor
0.20 Fair
0.10 Good
0.05 Excellent
0.00 "Perfect"
Because our hypothetical data set is small and well structured, the
goodness-of-fit is good to excellent. As a result, little change in stress or
in the ordination itself is seen as the number of iterations increases.
Slight changes in the configuration of the ordination can be observed,
however, particularly within cluster ABCDEG and as station F gradually moves
away from that cluster.
The ordination in Figure 6 is nearly perfect. It is a three-dimensional
ordination in which the vertical axis has been greatly exaggerated. Note that
samples A, B, C, D, E, and G are grouped together and that sample F is close
on the two horizontal axes. The wide separation of sample F from samples A,
B, C, D, E, and G on the vertical axis results partly from the exaggerated
vertical scale and partly from its low similarity to the other samples
(Table 10). The other five samples (H, I, J, K, and L) are arrayed linearly
more than they are grouped into distinct clusters.
For comparison the results of a Q-mode cluster analysis of the same data
set (Tables 9 and 10) are shown in Figure 7. Samples A, B, C, D, E, and G
form a compact cluster, as do samples H and I and samples J, K, and L. Sample
F is similar to cluster ABCDEG, but it is not closely similar to all members
of that cluster. (Refer to Table 10 for actual, unaveraged similarities.)
The use of principal component and principal coordinate analysis in an
ecological context generally consists of representing the samples collected in
a space of reduced dimensionality (Q-mode) where the axes of the space are
composites of species that explain as much of the sample variance as possible.
Alternatively, one may plot the species found in the study in a space where
the axes are composites of samples (R-mode application). Interpretation of an
ordination based on either of these methods (principal component or principal
coordinate analysis) involves two steps: (1) the explanation of the
configuration of points (the arrangement of samples in a reduced species-space
or of the species in a reduced station-space), and (2)
reification of the axes and relating them to some important environmental
factor, such as degree of environmental stress, season, or stream gradient.
In general, the development of computational methods for multivariate analysis
is far ahead of the development of means of interpreting the results.
Nonmetric MDS takes an entirely different approach to ordination.
Similarities or distances between samples are treated on the ordinal scale;
that is, they are ordered from smallest to largest. As a result, a
configuration is found in which "rank order of (ratio scaled) distances best
produces the original input ranks. One tries to do this in the lowest
38
-------
Figure 6. Three-dimensional Q-mode ordination of hypothetical samples in Table 9 computed
with nonmetric multidimensional scaling, 43 iterations (stress = 0.001).
-------
Figure 7. Dendrogram computed from Q-mode cluster analysis of a matrix of
distance coefficients (Table 10) showing faunal similarities between
hypothetical samples in Table 9.
-------
dimensionality that produces a 'close enough' ordinal fit" (Green and Carmone,
1970). The important point is that MDS operates on ranked similarities and
distances rather than on actual similarities. Thus, MDS is perfectly
applicable to matrices of similarity and distance that are computed from
ranked data or even presence-absence data.
For a specified number of dimensions, chosen in advance by the investi-
gator, the computer programs "try to find a configuration of points whose
interpoint distances are monotone—that is, have the same (or possibly the
inverse) ranks as the input data" (Green and Carmone, 1970). The coordinates
of this new configuration are the values used in the ordination. In practice,
perfect configurations are unusual. The measure of departure from
monotonicity is called stress. The higher the stress, the less nearly perfect
the degree of monotonicity.
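For reference, the stress measure defined by Kruskal (1964a), often called stress formula 1, can be written as shown below. The report does not state which variant the programs used, so the expression is given as the standard definition rather than as the exact quantity computed in this project.
$$\text{stress} = \sqrt{\frac{\sum_{i<j} \left(d_{ij} - \hat{d}_{ij}\right)^2}{\sum_{i<j} d_{ij}^2}}$$
Here d_ij is the distance between points i and j in the ordination, and \hat{d}_ij is the value fitted to d_ij by monotone regression on the ranked input similarities or distances. Stress is zero when the interpoint distances are perfectly monotone with the input data.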
4.1.3 Indices of Species Diversity
Margalef (1956) proposed the use of indices derived in information theory
for analysis of multispecies communities. Such use is appropriate where
diversity is equated with the uncertainty that exists as to what species will
be found when a single organism is selected at random from the community. The
greater the number of species present in a community and the more even their
distribution, the greater the degree of uncertainty and, thus, the larger the
associated species diversity. Information content is a measure of uncertainty
and is thus a reasonable measure of diversity as well (Pielou, 1977; Kaesler
and Herricks, 1977). The three most commonly used indices of species diversity
from information theory are Shannon's index (H') (Shannon and Weaver, 1949),
the approximate index (H") (also called d by Wilhm and Dorris, 1968), and
Brillouin's (1962) index (H):
$$H' = -\sum_{i=1}^{s} p_i \log p_i$$
$$H'' = -\sum_{i=1}^{s} \frac{N_i}{N} \log_e \frac{N_i}{N}$$
$$H = \frac{1}{N} \log_e \frac{N!}{N_1!\, N_2! \cdots N_s!}$$
where p_i is the probability of selecting at random a member of the ith
species, N_i is the number of individuals of the ith species in a collection, N
is the total number of individuals of all species in a collection, and s is
the number of species. Logarithms to the base e are recommended, although
some authors have used the base 2 or the base 10 (Wilhm and Dorris, 1968). It
is necessary to specify which logarithmic base is used because the value of
diversity is quite base-dependent.
-------
Pielou (1966, 1969, 1975, 1977) and Kaesler and Herricks (1977), in the
context of applied aquatic biology, have stressed that Brillouin's index H is
the appropriate one for use with fully censused collections of organisms,
especially from biological surveys. H' cannot be used because our only
knowledge of the p_i's must come from samples. H", which uses data from
samples, is a biased estimator of H, always giving too high a value. H, on
the other hand, gives the actual species diversity of a fully censused
collection of organisms.
In the past, Brillouin's H has been difficult to compute because the
factorials involved usually become astronomically large, and even their
logarithms are difficult to handle. The ready availability of high-speed
digital computers, however, obviates any perceived need to use the approximate
diversity H".
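A minimal Python sketch of the two indices that can be computed from a collection is given below. It is illustrative only; the function names and the example counts are hypothetical. The logarithm of each factorial is obtained from the log-gamma function, since log_e N! = lgamma(N + 1), which avoids forming the very large factorials directly.

```python
from math import lgamma, log

def shannon_approx(counts):
    """Approximate index H'' = -sum (N_i/N) log_e (N_i/N)."""
    n = sum(counts)
    return -sum((c / n) * log(c / n) for c in counts if c > 0)

def brillouin(counts):
    """Brillouin's index H = (1/N) log_e [N! / (N_1! N_2! ... N_s!)]."""
    n = sum(counts)
    # lgamma(k + 1) equals log_e(k!), so no large factorial is ever formed.
    log_multinomial = lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)
    return log_multinomial / n

# Hypothetical collection: numbers of individuals of four species.
counts = [50, 30, 15, 5]
print(round(shannon_approx(counts), 3), round(brillouin(counts), 3))
```

For any collection with more than one species, the value returned by `brillouin` is smaller than that returned by `shannon_approx`, in keeping with the statement that H" overestimates H.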
In addition to the traditional use of Brillouin's equation for species
diversity, diversity was also partitioned into the components contributed at
several levels of three hierarchical classifications: the taxonomic hierarchy
and two proposed classification schemes, a trophic-functional hierarchy and a
head-body-respiratory functional morphological hierarchy
(Tables 11, 12, and 13).
4.2 DESCRIPTION OF DATA SETS
To evaluate and delineate the variations and the usefulness of the three
described analytical techniques, three representative sets of zoomacrobenthic
data were selected for testing. These are referred to as (1) the Clinch River
data set (acute stress); (2) the Cumberland River data set--1973 (chronic
stress, high flow); and (3) the Cumberland River data set--1975 (chronic
stress, low flow). Zoomacrobenthic data sets alone were utilized in part
because Kaesler and Cairns (1972) noted a great deal of redundancy among
information derived from the different groups of organisms that are commonly
studied as a part of biological surveys. They concluded that the distribution
of aquatic insects often is representative of the total biota in lotic
environments.
4.2.1 Clinch River Data Set
The Clinch River data set includes information on the fauna collected
immediately before and after an accidental release of concentrated sulfuric
acid. The spill resulted in an acute stress that killed an estimated 5300
fish. A cursory examination by the Virginia State Water Control Board
indicated that stream damage was confined to a 22-km section of the stream,
starting 1.5 km downstream from the power plant and extending to St. Paul,
Virginia (Soukup, 1970). This conclusion was later
substantiated by Crossman et al. (1973, 1974).
Six similar riffle-pool habitats were sampled: one upstream control
station, four stations in the affected area, and one station downstream from
St. Paul, Virginia (Figure 8). Station 4, 2.5 km upstream from the site of
the spill, served as a control station. Stations 7, 8, 9, and 10 were located
downstream from the power plant at 3- to 10-km intervals within the affected
area. Station 11 was located 11 km downstream from station 10 and was used to
substantiate whether the effects of the spill were restricted to the section
of the river from the power plant to St. Paul, Virginia.
42
-------
Figure 8. Map of the Clinch River in Virginia and Tennessee showing the
locations of stations sampled during the 1970 zoomacrobenthic
survey.
-------
Stations 4, 7, 8, 9, and 10 were sampled 12 to 48 hours before the spill
as part of a survey initiated the previous year that also included numerous
other upstream and downstream stations (Crossman et al., 1973). Stations 7,
8, 9, 10, and 11 were sampled again immediately after notification of the
spill to determine the extent of the damage to the zoomacrobenthic community.
Sampling continued at two-week intervals for the next 56 days at stations 7,
8, 9, and 10. Stations 4 and 11, the upstream and downstream control
stations, were sampled every four weeks. Physical and chemical data from the
power plant and in situ measurements taken in conjunction with biological
sampling are summarized in Table 14. Daily measurements of flow from a U.S.
Geological Survey (USGS) gauging station 7 km upstream from the power plant
are presented in Figure 9. Descriptions of habitats, including information on
station location, substratum, width, depth, stream gradient, and riparian
vegetation, are summarized in Table 15.
Water quality and stream discharge remained essentially unchanged during
the June-August study period (Table 14 and Figure 9). Habitats at each
station were also similar, as evidenced by the descriptive data summarized in
Table 15. The only major difference was found at station 7. Effluents from
the power plant channeled along the right bank at this station, resulting in
an alteration of the natural substratum.
4.2.2 Cumberland River Data Sets--1973 and 1975
Two sets of data were utilized from the Cumberland River (a
river-reservoir environment). The study area was the headwater region of Old
Hickory Reservoir (mean width, 0.9 km), with depths at the different stations
varying from 0.9 to 10 m. The major water user within the study area was a
steam electric generating plant located on the north bank of the reservoir
(Figure 10). Unlike the study area on the Clinch River, this site was
affected by the discharge from the power plant's once-through cooling system,
which was essentially constant through time. Data were collected from June to
October 1973, in January 1975, and from April to September 1975.
Several factors complicated the collection and analysis of the Cumberland
River data:
1. The size and shape of the thermal plume varied seasonally and
yearly, at times moving upstream over areas previously designated as
controls.
2. The flow in the river varied markedly from year to year. For
example, during the 1973 study, the median daily average flow did
not drop below 5000 cfs for any week, whereas in 1975, there were
nine weeks in which median weekly flows were less than 5000 cfs.
3. Some of the stations were located in shallow overbank areas of the
reservoir, whereas others were located in the channel (Figure 10).
Rather than sample all benthic habitats, sampling was limited to the
predominant silty clay substratum. An a priori assumption was made that the
response exhibited by the macrobenthic community inhabiting the silt-clay
substratum was representative of the total macrobenthic community.
44
-------
Figure 9. Stream discharge of the Clinch River at the United States Geological Survey
gauging station at Cleveland, Virginia, June-August 1970.
-------
Station   CRM      Depth
1         243.8    10-15 ft
2         242.5    8 ft
3         241.7    10 ft
4         241.7    30 ft
5         240.9    30 ft
6         240.9    10 ft
7         244.6    18-24 ft
Figure 10. Location of stations in the vicinity of a power plant on
the Cumberland River.
46
-------
TABLE 9. HYPOTHETICAL DATA SHOWING PROPORTIONS OF 10 SPECIES AT 12 STATIONS
Species                                        Station
number     A      B      C      D      E      F      G      H      I      J      K      L
1        0.110  0.200  0.200  0.200  0.250  0.250  0.300  0.400  0.500  0.600  0.750  0.910
2        0.090  0.100  0.200  0.200  0.250  0.250  0.100  0.050  0.050  0.100  0.050  0.010
3        0.100  0.100  0.100  0.100  0.150  0.250  0.100  0.050  0.050  0.100  0      0.010
4        0.100  0.100  0.100  0.100  0.050  0.250  0.100  0      0      0.100  0.050  0.010
5        0.100  0.100  0.100  0.100  0.050  0      0.100  0      0      0.050  0      0.010
6        0.100  0.100  0.100  0.100  0.050  0      0.100  0      0      0.010  0.050  0.010
7        0.100  0.100  0.100  0.050  0.050  0      0.100  0      0      0.010  0      0.010
8        0.100  0.100  0.100  0.050  0.050  0      0.100  0.050  0.050  0.010  0.050  0.010
9        0.100  0.050  0      0.050  0.050  0      0      0.050  0.050  0.010  0      0.010
10       0.100  0.050  0      0.050  0.050  0      0      0.400  0.300  0.010  0.050  0.010
For all subsequent computations, these data were transformed with an arcsine transformation.
-------
TABLE 10. MATRIX OF DISTANCE COEFFICIENTS COMPUTED FROM THE DATA IN TABLE 9
AFTER ARCSINE TRANSFORMATION
                                            Station
Station    A      B      C      D      E      F      G      H      I      J      K      L
A        0
B        0.037  0
C        0.064  0.039  0
D        0.055  0.039  0.032  0
E        0.082  0.064  0.050  0.039  0
F        0.124  0.108  0.096  0.090  0.082  0
G        0.076  0.040  0.046  0.056  0.068  0.106  0
H        0.154  0.149  0.169  0.153  0.147  0.186  0.152  0
I        0.162  0.147  0.163  0.151  0.141  0.178  0.138  0.049  0
J        0.181  0.150  0.153  0.149  0.138  0.150  0.119  0.154  0.111  0
K        0.244  0.215  0.219  0.217  0.206  0.225  0.184  0.182  0.134  0.080  0
L        0.337  0.308  0.312  0.310  0.297  0.312  0.276  0.265  0.219  0.166  0.098  0
-------
TABLE 11. GENERALIZED TROPHIC, FUNCTIONAL CLASSIFICATION OF
ZOOMACROBENTHIC INVERTEBRATES (ADAPTED FROM CUMMINS, 1973)
Level in
hierarchy   Name                 Subdivision
I           Functional group     Shredders (vascular plant tissues)
                                 Collectors (detrital materials)
                                 Grazers (Aufwuchs)
                                 Predators
                                 Parasites
II          Feeding mechanism    Chewers and miners
                                 Filterers (suspension feeders)
                                 Gatherers (sediment or deposit feeders)
                                 Scrapers
                                 Chewers and suckers
                                 Swallowers and chewers
                                 Piercers
                                 Attachers
III         Dependence           Obligate
                                 Facultative
IV          Food habit           Herbivory
                                 Detritivory
                                 Carnivory
                                 Omnivory
49
-------
TABLE 12. HIERARCHICAL CLASSIFICATION OF THE TROPHIC-FUNCTIONAL
ROLE OF ORGANISMS; INCLUDES ONLY THOSE CATEGORIES THAT
OCCURRED IN SAMPLES
Functional group   Feeding mechanism        Dependence      Food habit      Code
1-Shredders        1-Chewers & miners       1-Obligate      1-Herbivory     1111
                                                            2-Detritivory   1112
                                            2-Facultative   4-Omnivory      1124
2-Collectors       2-Filterers              1-Obligate      2-Detritivory   2212
                                            2-Facultative   4-Omnivory      2224
                   3-Gatherers              1-Obligate      2-Detritivory   2312
                                            2-Facultative   2-Detritivory   2322
                   6-Swallowers & chewers   1-Obligate      2-Detritivory   2612
3-Grazers          4-Scrapers               1-Obligate      1-Herbivory     3411
                                                            2-Detritivory   3412
                                            2-Facultative   4-Omnivory      3424
                   5-Chewers & suckers      1-Obligate      1-Herbivory     3511
                                            2-Facultative   4-Omnivory      3524
4-Predators        6-Swallowers & chewers   1-Obligate      3-Carnivory     4613
                                            2-Facultative   4-Omnivory      4624
                   7-Piercers               1-Obligate      3-Carnivory     4713
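Each four-digit code in Table 12 is formed by concatenating the digits of the four hierarchical levels. The Python sketch below decodes such a code; the function name `decode_trophic_code` is hypothetical, and the lookup tables include only the categories listed in Table 12.

```python
# Digit -> category for each level, limited to categories shown in Table 12.
FUNCTIONAL_GROUP = {1: "Shredders", 2: "Collectors", 3: "Grazers", 4: "Predators"}
FEEDING_MECHANISM = {
    1: "Chewers & miners",
    2: "Filterers",
    3: "Gatherers",
    4: "Scrapers",
    5: "Chewers & suckers",
    6: "Swallowers & chewers",
    7: "Piercers",
}
DEPENDENCE = {1: "Obligate", 2: "Facultative"}
FOOD_HABIT = {1: "Herbivory", 2: "Detritivory", 3: "Carnivory", 4: "Omnivory"}

def decode_trophic_code(code):
    """Split a four-digit trophic-functional code into its four levels."""
    if len(code) != 4 or not code.isdigit():
        raise ValueError("expected a four-digit code such as '2312'")
    group, feeding, dependence, habit = (int(ch) for ch in code)
    return (
        FUNCTIONAL_GROUP[group],
        FEEDING_MECHANISM[feeding],
        DEPENDENCE[dependence],
        FOOD_HABIT[habit],
    )

print(decode_trophic_code("2312"))
```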
50
-------
TABLE 13. HIERARCHICAL CLASSIFICATION AND NUMERICAL CODES ASSIGNED
FOR ZOOMACROBENTHIC INVERTEBRATES BASED ON FUNCTIONAL
MORPHOLOGY: HEAD POSITION, BODY SHAPE, AND RESPIRATORY ORGANS
Level in
hierarchy   Name                    Subdivision
I           Head position           Hypognathous
            (feeding category)      Prognathous
                                    Opisthorhynchous
                                    Vestigial or other
II          Body shape              Flattened irregular
            (current of stream)     Flattened oval
                                    Flattened elongate
                                    Compressed laterally
                                    Cylindrical
                                    Elongate
                                    Short, compact
                                    Fusiform
                                    Irregular
                                    Hemicylindrical or subtriangular
III         Respiratory organs      Simple filamentous gills
            (substratum)            Compound filamentous gills
                                    Platelike gills
                                    Operculate gills
                                    Leaflike gills or organs
                                    Respiratory dish
                                    Respiratory tube
                                    Spiracular gills
                                    Caudal chamber
                                    Plastron
                                    Body integument
                                    Tracheal respiration
51
-------
TABLE 14. RANGE AND MEAN OF PHYSICOCHEMICAL DATA FOR THE CLINCH RIVER,
JUNE TO SEPTEMBER 1970
Characteristic                     Range        Mean
pH                                 7.4-8.9      8.2
Temperature, °C                    14.4-28.3    21.1
Dissolved oxygen, mg/l             4.6-9.8      7.4
Total hardness, mg/l as CaCO3      127-198      155
Conductivity, µmhos/cm             150-305      246
52
-------
TABLE 15. DESCRIPTIONS OF HABITATS AT STATIONS 4, 7, 8, 9, 10, and
11 ON THE CLINCH RIVER,
                                                  Station
Characteristic               4        7          8        9        10       11
River kilometer              934      430        427      423      413      402
Mean depth, m                0.4      0.4        0.4      0.4      0.4      0.4
Mean width, m                60       65         25       60       65       60
Stream gradient, m/km        1.1      1.0        1.7      0.3      0.3      0.9
Composition of
substratum (percent)
  Bedrock                    5-40     5-60(c)    10-30    10-20    0-10     5-15
  Rubble                     40-75    20-75(c)   40-70    40-40    70-80    40-70
  Gravel                     10-20    10-15      10-15    10-15    5-20     15-30
  Sand                       10-15    10-15      10-15    5-10     0-5      10-15
Maximum rooted vegetation    <------- Restricted -------|------- Moderate ------->
Dominant streamside
vegetation                   W        W          C        C        W        C
Adapted from a classification developed by Pennak (1971).
Because each station was sampled along the right bank, left bank, and
midchannel, the composition is expressed as a range.
(c) The right bank area of station 7 was a solid, calcareous substratum.
Symbols: W = woodland; C = combination of woodland on one side of the
stream and brush with herbs and grasses on the other side.
53
-------
SECTION 5
CLUSTER ANALYSIS
5.1 GENERAL DESCRIPTION
Only in recent years has cluster analysis been applied in aquatic ecology
and problems related to water pollution. Cairns and Kaesler (1969, 1971),
Cairns et al. (1970), Roback et al. (1969), Kaesler and Cairns (1972), and
Kaesler et al. (1971) were the first to use cluster analysis to evaluate the
impact of a power plant on a river environment. In a similar study Crossman
et al. (1973, 1974) used cluster analysis to describe the response of
zoomacrobenthic communities to pH stress and the recovery of a stream from a
spill of hazardous materials. Stephenson and Dredge (1976) and Stephenson
et al. (1976) also used numerical classification and other methods of analysis
to estimate the impact of construction on estuarine fisheries and macrobenthic
communities.
The purpose of cluster analysis is to produce a classification that
expresses the degree of similarity between the items being classified. The
major application of cluster analysis in stream surveys has been to determine
the similarities between stations or samples on the basis of their contained
biotas. Such analysis is referred to as Q-mode analysis. The use of Q-mode
analysis, of course, is not limited to biological applications. It can also
be used to compare habitats on the basis of their physical and chemical
properties. One such application was Shannon's (1970) use of cluster analysis
to group 55 lakes in Florida according to water quality and trophic structure.
R-mode analysis quantifies the similarities between species on the basis
of their distribution among the samples studied. Little use has been made of
this heuristic technique, although ecologists have long recognized the need to
quantify associations of species (Forbes, 1907; Shelford, 1915). Buchanan and
Lighthart (1973), however, were able to associate water parcels in a eutrophic
lake with characteristic assemblages of species. Stephenson et al. (1972)
also used R-mode comparison to determine whether Petersen communities that can
be characterized by one or two dominant species actually exist in natural
systems.
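The distinction between Q-mode and R-mode analysis can be illustrated with a short Python sketch. It assumes a presence-absence data matrix with species as rows and samples as columns; the data and the function names are hypothetical, and Jaccard's coefficient is used only as a convenient example of a similarity measure.

```python
def jaccard(x, y):
    """Jaccard's coefficient for two presence-absence vectors of equal length."""
    both = sum(1 for p, q in zip(x, y) if p and q)
    mismatched = sum(1 for p, q in zip(x, y) if bool(p) != bool(q))
    # Assumption: two empty vectors are assigned a similarity of 0.
    return both / (both + mismatched) if (both + mismatched) else 0.0


def similarity_matrix(vectors):
    """Square matrix of pairwise Jaccard coefficients."""
    n = len(vectors)
    return [[jaccard(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]


# Hypothetical data: 4 species (rows) by 3 samples (columns).
data = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
]

# Q-mode: compare samples (columns) on the basis of the species they contain.
samples = [list(column) for column in zip(*data)]
q_mode = similarity_matrix(samples)   # 3 x 3

# R-mode: compare species (rows) on the basis of their distribution among samples.
r_mode = similarity_matrix(data)      # 4 x 4
```

The only difference between the two modes in this sketch is whether the columns or the rows of the data matrix are compared.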
5.2 ANALYTICAL PROCEDURES
5.2.1 Selection of Similarity Coefficients
When considering the use of cluster analysis for summarizing data, the
investigator must first select an appropriate similarity coefficient. In this
study 26 coefficients were tested to (1) evaluate their usefulness in
analyzing environmental data and (2) determine which ones were highly
correlated so that redundant expressions could be identified and duplication
eliminated. Three kinds of coefficients were tested. Pearson's
product-moment correlation coefficient and Sokal's (1961) average taxonomic
distance were used with species count data, while a number of similarity
coefficients were used with presence-absence data. Some of these coefficients
have been analyzed and compared previously (Simpson, 1960; Sokal and Sneath,
1963; Cheetham and Hazel, 1969) but never in the context of aquatic
ecology. Their equations are given in Table 16, where n is the number of
species, s_j is the standard deviation of the jth variable, s_jk is the square
root of the covariance of the jth and kth variables, X_ij is the value of the
ith species in the jth sample, and a, b, c, and d are terms from a 2 by 2
contingency table (Table 17).
In general, correlation coefficients give a high similarity for samples
with species present in the same proportions, whereas distance coefficients
give a high similarity (low distance) for samples with species present in the
same numbers. The similarity coefficients vary, depending on whether negative
matches are included. For a more detailed discussion of these coefficients,
the reader should consult Sokal and Sneath (1963).
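The contrast between correlation and distance coefficients can be illustrated with a minimal Python sketch. The counts are hypothetical, and the distance is assumed to take the commonly cited form of average taxonomic distance (the square root of the mean squared difference); the equations actually used in the study are those given in Table 16.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two samples (lists of species counts)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_taxonomic_distance(x, y):
    """Assumed form: square root of the mean squared difference over n species."""
    n = len(x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / n)


# Hypothetical counts for four species in three samples.
sample_a = [10, 20, 30, 40]
sample_b = [100, 200, 300, 400]   # same proportions as sample_a, ten times the numbers
sample_c = [12, 18, 33, 37]       # nearly the same numbers as sample_a

r_ab = pearson_r(sample_a, sample_b)
d_ab = average_taxonomic_distance(sample_a, sample_b)
d_ac = average_taxonomic_distance(sample_a, sample_c)
```

Samples a and b are perfectly correlated because their proportions are identical, yet they are far apart in distance; samples a and c, which contain similar numbers of each species, are separated by a much smaller distance.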
In addition to the different coefficients, several transformations of
quantitative data were evaluated: standardization by rows (species),
logarithm of the abundance plus 1, and square root of the abundance plus 0.5.
Standardization is the process of transforming each row in a data matrix by
subtracting the row mean and dividing by the row standard deviation. When a
matrix is standardized, each species has a mean abundance of 0 and a standard
deviation of 1. Of course, it is not possible to conceptualize a species with
a mean abundance of 0, because it would necessitate negative abundances to
offset the positive ones. The justification for standardization is that very
abundant species are given proportionally less weight.
The logarithmic transformation of species abundance is a drastic
procedure (Table 18). It has the net effect of reducing species abundances to
ranks and effectively eliminating the impact of very abundant species.
The square-root transformation is a compromise between the use of
untransformed data and the logarithmic transformation (Table 19). Throughout
the study, correlation and distance coefficients were computed on
untransformed and transformed data. In these computations, the coefficients
were labeled as shown in Table 20.
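The three transformations can be sketched in Python as follows. This is an illustration under stated assumptions rather than the program used in the study: the logarithm is assumed to be base 10, the standard deviation is assumed to be the sample (n - 1) form, and the function names and counts are hypothetical.

```python
import math
import statistics


def standardize_rows(matrix):
    """Subtract each row (species) mean and divide by the row standard deviation."""
    result = []
    for row in matrix:
        mean = statistics.mean(row)
        sd = statistics.stdev(row)  # assumption: sample standard deviation; fails if the row is constant
        result.append([(value - mean) / sd for value in row])
    return result


def log_transform(matrix):
    """Logarithm (assumed base 10) of the abundance plus 1."""
    return [[math.log10(value + 1) for value in row] for row in matrix]


def sqrt_transform(matrix):
    """Square root of the abundance plus 0.5."""
    return [[math.sqrt(value + 0.5) for value in row] for row in matrix]


# Hypothetical counts: 2 species (rows) by 3 samples (columns).
counts = [
    [0, 9, 99],
    [4, 4, 10],
]

standardized = standardize_rows(counts)
logged = log_transform(counts)
rooted = sqrt_transform(counts)
```

In this example the counts 0, 9, and 99 become approximately 0, 1, and 2 after the logarithmic transformation, which shows how strongly that transformation compresses the contribution of a very abundant species.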
To compare the various coefficients and transformations, matrices of
correlation, distance, and similarity were computed for 30 zoomacrobenthic
samples. The data were collected at six stations from June to October 1973 in
the vicinity of the Cumberland River power plant. All correlation and
similarity matrices were then compared by using the coefficient of cophenetic
correlation, r_cc (Sokal and Rohlf, 1962; Kaesler, 1970; Kaesler and Cairns,
1972; Kaesler et al., 1974). The r_cc is a product-moment correlation
coefficient computed between corresponding elements of the matrices. If the
matrices have identical or exactly proportional values, they are perfectly
correlated and r_cc = 1.0. If r_cc = -1.0, the values in the matrices show a
perfect negative correlation. After the r_cc's were computed, they were
arranged in a correlation matrix (Table 21). This matrix was clustered by
using the unweighted pair-group method with arithmetic averages (UPGMA), and a
dendrogram was constructed (Figure 11). The dendrogram shows the overall
similarities in the matrices and groups those coefficients that produce
55
-------
[Horizontal axis: correlation, 0.0 to 1.0.]
Figure 11. Dendrogram computed from cluster analysis of a matrix
of coefficients of cophenetic correlation showing
similarity between the various correlation and similarity
matrices in Tables 17 and 20 (r_cc = 0.914).
56
-------
similar matrices. The distance coefficients were not included because
distance is the opposite of similarity: a distance matrix is negatively
correlated with the similarity matrices that it most closely resembles.
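The comparison of two matrices by correlating their corresponding elements can be sketched as follows. The matrices are hypothetical, and the sketch assumes that only the elements below the diagonal are used, since the matrices are symmetric; the source does not state which elements were included.

```python
import math


def lower_triangle(matrix):
    """Elements below the diagonal of a square matrix, row by row."""
    return [matrix[i][j] for i in range(len(matrix)) for j in range(i)]


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def matrix_correlation(m1, m2):
    """Correlation between corresponding off-diagonal elements of two matrices."""
    return pearson_r(lower_triangle(m1), lower_triangle(m2))


# Hypothetical 3 x 3 similarity matrix and two matrices derived from it.
similarity = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
proportional = [[0.5 * value for value in row] for row in similarity]
distance = [[1.0 - value for value in row] for row in similarity]

r_same = matrix_correlation(similarity, proportional)
r_opposite = matrix_correlation(similarity, distance)
```

A matrix with exactly proportional values gives a correlation of +1, whereas a distance matrix derived from the same similarities gives -1, which is why distance matrices cannot be clustered together with similarity matrices on this basis.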
With 0.75 selected as an arbitrary limit for clustering, five distinct
clusters were formed (Figure 11): (1) Corr 1 and 7; (2) Corr 2, 4, and 6; (3)
S_SM, H, UN1, and RT; (4) S_J, UN2, D, OCH, K2, UN4, PHI, UN5, RR, and Y; and
(5) UN3 and K1. One coefficient was selected for further consideration from
each of the first four clusters. The low intracorrelations in cluster 5 and
the similarity of cluster 5 to clusters 3 and 4 suggested that it could be
ignored. The representatives selected from the first four clusters were
Corr 7 (square-root transformation, unstandardized data), Corr 6 (square-root
transformation, standardized data), S_SM (simple matching coefficient), and S_J
(Jaccard's coefficient).
The S_SM and S_J coefficients were chosen from their respective clusters
because of their simplicity and their relatively widespread use in analyzing
environmental data. Corr 7 was selected over Corr 1 because of the utility of
the square-root transformation. Although Corr 1 and 7 gave similar results,
it was possible that some species might be present in very large numbers in
other data sets, thus giving misleading results with the product-moment
correlation coefficient. Similarly, Corr 6 was selected over Corr 2 and 4
because of the square-root transformation. In this case, however, the
transformed data were standardized by rows before the correlation matrix was
computed.
A similar rationale was used to select representative distance
coefficients (Table 22). Dist 1 and 7 were highly intracorrelated, as were
Dist 2, 4, and 6. Dist 7 and 6 were selected as exemplars because of the
square-root transformation. The negative correlations between the distance
matrices and the correlation and similarity matrices are also presented in
Table 22.
5.2.2 Reducing Size of Data Matrices
After the similarity coefficients were selected, data matrices were
prepared for the Clinch and Cumberland Rivers data sets. In the case of the
Clinch River data set, it was necessary to reduce the size of the data matrix.
If all the 1970 Clinch River data had been included (i.e., stations 1
through 21 with right bank, left bank, and midchannel substations), the matrix
would have had 248 substations and 123 taxa. This matrix would not only have
been computationally unmanageable, but the resulting dendrogram would also have
been difficult to interpret (Figure 12). Two mechanisms were used to limit the
size of the data matrices: (1) partitioning the data set into subsets on the
basis of the season the stations were sampled and (2) reducing the data set to
those samples collected in the immediate vicinity of the spill, i.e.,
stations 4, 7, 8, 9, 10, and 11.
In order to reduce further the data set to manageable size, the number of
taxa needed to be reduced. Relative abundance of species was selected as the
criterion upon which to base this reduction. The rationale for using relative
abundance was based on Patrick's (1961) discussion on how to determine whether
a diatom species was an established resident or a temporary inhabitant of a
station. Patrick "considered those species established which were represented
57
-------
Figure 12. Dendrogram computed from Q-mode cluster analysis of a
matrix of Jaccard's coefficients showing faunal similarities
between samples collected from the Clinch River
in 1970; data include total insect fauna.
58
-------
by six or more specimens when 8,000 or more diatoms are counted." Since this
criterion appeared applicable to any biological community, relative species
abundance was selected as the most reasonable means of reducing the number of
species.
The formula used for computing relative abundance (RA) was

$$RA = N_i \Big/ \sum_{i=1}^{s} N_i ,$$

where $N_i$ = number of individuals of the ith species and
$\sum_{i=1}^{s} N_i$ = total number of organisms per station.
To determine the minimum acceptable value of RA, RA's were calculated for each
taxon collected at a station unaffected by the spill. Four different levels
were considered: 0.10, 0.05, 0.01, and 0.005. The 0.10 and 0.05 levels were too
exclusive. Emphasis was therefore placed on RA values of 0.01 and 0.005.
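The screening of taxa by relative abundance can be sketched in Python as follows. The taxon names and counts are hypothetical, the function names are illustrative, and the sketch assumes that a taxon is retained when its RA is greater than or equal to the chosen level.

```python
def relative_abundance(counts):
    """RA for each taxon: its count divided by the total count at the station."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def retained_taxa(counts, minimum_ra=0.01):
    """Taxa whose RA meets the minimum level (assumption: RA >= minimum_ra)."""
    ra = relative_abundance(counts)
    return sorted(taxon for taxon, value in ra.items() if value >= minimum_ra)


# Hypothetical counts of organisms at one station (total = 1,000).
station_counts = {"Taxon A": 600, "Taxon B": 350, "Taxon C": 45, "Taxon D": 5}

ra_values = relative_abundance(station_counts)
kept_at_001 = retained_taxa(station_counts, minimum_ra=0.01)
kept_at_005 = retained_taxa(station_counts, minimum_ra=0.05)
```

With these counts, the 0.05 level retains only the two dominant taxa, whereas the 0.01 level also retains Taxon C; Taxon D, with an RA of 0.005, is excluded at both levels.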
An RA of 0.01 reduced the number of taxa from 123 to 29 (Table 23).
Representatives of each major taxonomic group normally found in the stream
were present. To determine whether major functional groups remained
represented, trophic-functional designations were assigned (refer to
Tables 11 and 12, section 4.1.3) to each taxon. With the exception of the
shredders, every functional group had at least five representatives at the
0.01 level of discrimination (Table 24).
In addition to determining whether major functional groups were present,
numerical importance was also considered. Taxa with an RA >0.01 accounted for
38 to 64 percent of the total number of taxa found at each station (Tables 25
and 26) and 82 to 99 percent of all the organisms found at each site
(Tables 27 and 28). While species with an RA <0.01 were important in
determining total diversity, they were not considered a major component of the
stream's macrobenthic community and were, therefore, treated as outliers and
deleted from further cluster analyses.
At the RA = 0.005 level of discrimination, the number of taxa increased
from 29 to 37 taxa. Since this was an increase of only 8 taxa, the 0.005
level of discrimination was not considered further.
After the taxa to be included had been selected, three data matrices were
considered for the Clinch River data set:
1. Those stations affected by the pH stress--stations 7RB¹, 8, 9, and 10
after the spill on June 19, 1970.
2. Those stations unaffected by the spill--stations 4, 7LB&MS¹, and 11
throughout the study and stations 8, 9, and 10 before the spill.
3. A composite data matrix that considered each station sampled.
¹Station 7 was divided into 7RB (right bank) and 7LB&MS (left bank and
midstream) because effluents from the power plant flowed along the right
bank.
The analytical techniques, types of comparison, and similarity coefficients
used to analyze each data set and the number of dendrograms and scatter
diagrams computed are listed in Table 29.
5.2.3 Evaluation of Distortion
The amount of distortion introduced by the clustering procedure was
measured by the coefficient of cophenetic correlation (r_cc). If values were
>0.8, it was assumed that no serious distortion was introduced by the
clustering procedure.
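The measurement of distortion can be illustrated with a small, self-contained Python sketch of the unweighted pair-group method with arithmetic averages (UPGMA) followed by computation of the cophenetic correlation. This is a simplified illustration, not the program used in the study; the similarity matrix is hypothetical, and the sketch assumes that only elements below the diagonal enter the correlation.

```python
import math


def upgma_cophenetic(sim):
    """Cluster a symmetric similarity matrix by UPGMA; return the cophenetic matrix."""
    n = len(sim)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member indices
    # Between-cluster similarities, keyed by (smaller id, larger id).
    between = {(i, j): sim[i][j] for i in range(n) for j in range(i + 1, n)}
    coph = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    next_id = n
    while len(clusters) > 1:
        (a, b), level = max(between.items(), key=lambda item: item[1])
        # Every pair of items split between clusters a and b joins at this level.
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = level
        size_a, size_b = len(clusters[a]), len(clusters[b])
        merged = clusters.pop(a) + clusters.pop(b)
        updated = {pair: value for pair, value in between.items()
                   if a not in pair and b not in pair}
        for c in clusters:
            s_ac = between[(min(a, c), max(a, c))]
            s_bc = between[(min(b, c), max(b, c))]
            # Size-weighted mean equals the arithmetic average over all member pairs.
            updated[(c, next_id)] = (size_a * s_ac + size_b * s_bc) / (size_a + size_b)
        clusters[next_id] = merged
        between = updated
        next_id += 1
    return coph


def cophenetic_correlation(sim, coph):
    """Product-moment correlation between original and cophenetic similarities."""
    pairs = [(i, j) for i in range(len(sim)) for j in range(i)]
    x = [sim[i][j] for i, j in pairs]
    y = [coph[i][j] for i, j in pairs]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    sxx = sum((p - mean_x) ** 2 for p in x)
    syy = sum((q - mean_y) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical similarity matrix for four samples.
similarity = [
    [1.0, 0.9, 0.3, 0.2],
    [0.9, 1.0, 0.4, 0.3],
    [0.3, 0.4, 1.0, 0.8],
    [0.2, 0.3, 0.8, 1.0],
]
cophenetic = upgma_cophenetic(similarity)
r_cc = cophenetic_correlation(similarity, cophenetic)
acceptable = r_cc > 0.8
```

In this example the two tight pairs join at 0.9 and 0.8 and the pairs of pairs join at the average of the four cross similarities, so the dendrogram reproduces the original matrix closely and the resulting r_cc exceeds the 0.8 criterion.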
5.3 RESULTS
5.3.1 Q-Mode Analysis
5.3.1.1 Clinch River Data Set--Of the six Q-mode dendrograms developed from
the Clinch River data set, four had acceptable values of r_cc (>0.8) (Table 30).
5.3.1.1.1 Presence-absence data--The dendrogram from cluster analysis of
the matrix of Jaccard's coefficients had the least distortion of any of the
Q-mode dendrograms (r_cc = 0.93). Two clusters were formed at a similarity
level of 0.62 (Figure 13). The first cluster contained 22 samples, the second
had 6 samples, and eight samples were left unclustered. The first cluster
was dominated by samples from stations that were either unaffected by the
spill or had recovered from the pH stress (Table 31). The second cluster
consisted of samples from stations 8, 9, and 10, the stations impacted by the
spill.
A sequence of stream degradation and recovery is indicated by the
clusters in Table 31. Station 8, the downstream station nearest the power
plant and the most severely impacted site, had only three samples in the first
cluster, samples collected before the pH stress and in August, six and eight
weeks after the spill. Station 9, the next station downstream, had four
samples in the first cluster. They were the same as those for station 8, but
also included the late July sample collected four weeks after the spill. At
station 10, the farthest downstream station affected by the spill, only the
sample collected immediately after the pH stress was missing from the cluster.
This indicated that the farther downstream a station was located, the less
severe the impact and the faster its recovery.
The six samples in the second cluster are from stations 8, 9, and 10
after the spill. Their clustering indicates that their zoomacrobenthic
communities were similar immediately after the spill. The severity of the
impact and the time required for recovery depended on the station's proximity
to the site of the spill. This cluster also indicates that the Clinch River's
zoomacrobenthic community had a high natural resiliency or ability to recover
from an acute stress.
The eight unclustered samples were those collected from station 7RB and,
in August, from station 7LB&MS. As noted in the description of the sites, effluents
from the power plant were channeled along the right bank, resulting in a chronic
stress. As a result, the low similarities of samples from station 7RB to
other samples were not unexpected. The failure of samples from station 7LB&MS
60
-------
[Horizontal axis: similarity, 0.090 to 0.990.]
Figure 13. Dendrogram computed from Q-mode cluster analysis of a matrix of
Jaccard's coefficients showing faunal similarities between 36
zoomacrobenthic samples collected from stations 4, 7, 8, 9, 10,
and 11 in the Clinch River, 1970 (r_cc = 0.93).
61
-------
to cluster with the unaffected stations was not expected, however, and the
only possible explanation was an increase in the size of the discharge plume
during the period of extremely low stream flow in August.
The dendrogram resulting from use of the simple-matching coefficient
(S_SM) produced an unacceptable level of distortion (r_cc <0.8) and is not
discussed here.
5.3.1.1.2 Quantitative data, counts of species--The Corr 6 and Dist 6
dendrograms had acceptable r_cc's of 0.82 and 0.91. The resulting clusters,
however, had very low levels of similarity and were uninterpretable. For
example, at the 0.06 and 1.2 levels of similarity for the Corr 6 and Dist 6
dendrograms, samples from the upstream control station clustered with the
stations impacted by the spill (Tables 32 and 33).
The Dist 7 dendrogram had an r_cc of 0.80. Since it is computed from a
data matrix transformed by the square-root transformation, the level of
similarity was expected to be low. At a level of similarity of 4.8, three
clusters were formed (Table 34). The first cluster was dominated by 11
samples from the control stations and from stations 7 through 10 before the
spill. The second cluster consisted of 20 samples, from stations 7 through
10, with station 7LB&MS clustering with the most severely impacted stations.
These results were contrary to previous findings by Crossman (1973, 1974) and
suggest that dendrograms computed from correlation and distance coefficients
should be interpreted with caution.
5.3.1.2 Cumberland River Data Set—1973--
5.3.1.2.1 Substrate—Stations on the Cumberland River were established
in areas with sediments of similar texture. The intent was to sample only
substrates dominated by very fine sand to clay. Cluster analysis of the
particle size data using various coefficients showed similarity of the
substratum at the various stations.
Cluster analysis of a matrix of correlation coefficients (Figure 14),
however, shows changes in sediment composition with time, especially in
overbank areas (level of similarity 0.7). A higher level of similarity, 0.82,
shows the trend even more strongly with almost all samples collected in
September remaining unclustered, whereas samples collected in August tended to
join clusters. At both levels of similarity, it was evident that the
substratum was more consistent (1) in the channel than in overbank areas and
(2) in downstream areas than in upstream areas around the intake and discharge
canals.
Cluster analysis of a matrix of distance coefficients showed the same
trends (Figure 15): (1) August samples clustered; (2) September samples generally
remained unclustered, especially in overbank areas; and (3) downstream and
channel stations generally had more consistent substratum composition than
upstream and overbank stations.
5.3.1.2.2 Zoomacrobenthos--Six different similarity coefficients (two
presence-absence and four quantitative) were used to examine the total
macrobenthic data set for 1973.
62
-------
Figure 14. Dendrogram computed from Q-mode cluster analysis of a matrix of
correlation coefficients computed from proportions of each phi
size in the substrate after arcsine transformation; shows
similarity of substrate between samples collected from the Cumberland
River in 1973 (r_cc = 0.760).
63
-------
[Horizontal axis: distance, 6.0 to 0.0.]
Figure 15. Dendrogram computed from Q-mode cluster analysis of a matrix of distance
coefficients computed from proportions of each phi size in the substrate
after arcsine transformation; shows similarity of substrate between
samples collected from the Cumberland River in 1973.
64
-------
5.3.1.2.2.1 Presence-absence data--Jaccard's coefficient (S_J) produced a
dendrogram with an acceptably low level of distortion (r_cc = 0.85). At
similarity levels of 0.5 and 0.6, samples from stations 2 and 3 tended to form
clusters (Figure 16). These stations were located at the discharge and on the
overbank area below the discharge. Moreover, samples collected in September
and October tended to form clusters except for samples from stations 2 and 3.
The simple-matching coefficient (S_SM) produced higher similarities than
Jaccard's coefficient because a number of samples contained rare species
(i.e., species that did not occur in more than one or two other samples). The
large number of negative matches increased overall similarity. A more
thorough discussion of the results obtained with the simple-matching
coefficient is not warranted because it introduced an unacceptable level of
distortion (r_cc <0.8).
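The effect of negative matches can be shown with a short Python sketch. It uses the standard definitions of Jaccard's coefficient, a/(a + b + c), and the simple-matching coefficient, (a + d)/(a + b + c + d), where a, b, c, and d are the cells of the 2 by 2 contingency table; the two samples are hypothetical.

```python
def contingency(x, y):
    """Cells of the 2 by 2 table for two presence-absence vectors.

    a = present in both, b = present in x only,
    c = present in y only, d = absent from both (negative matches).
    """
    a = sum(1 for p, q in zip(x, y) if p and q)
    b = sum(1 for p, q in zip(x, y) if p and not q)
    c = sum(1 for p, q in zip(x, y) if not p and q)
    d = sum(1 for p, q in zip(x, y) if not p and not q)
    return a, b, c, d


def jaccard(x, y):
    """Jaccard's coefficient: negative matches are ignored (undefined if a + b + c = 0)."""
    a, b, c, _ = contingency(x, y)
    return a / (a + b + c)


def simple_matching(x, y):
    """Simple-matching coefficient: negative matches count as agreement."""
    a, b, c, d = contingency(x, y)
    return (a + d) / (a + b + c + d)


# Hypothetical samples scored for 20 taxa; 16 rare taxa are absent from both.
sample_x = [1, 1, 1, 0] + [0] * 16
sample_y = [1, 1, 0, 1] + [0] * 16

s_j = jaccard(sample_x, sample_y)
s_sm = simple_matching(sample_x, sample_y)
```

The two samples share two taxa and differ in two, so Jaccard's coefficient is 0.5; the 16 shared absences raise the simple-matching coefficient to 0.9.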
5.3.1.2.2.2 Quantitative data, counts of species—The distance
coefficient Dist 7 formed clusters of samples at a high level of similarity
and with an acceptably low level of distortion (Figure 17). Two groups of
samples appear as distinct clusters at a level of similarity of 2.3:
(1) samples collected from station 2 during the summer and from station 1
during the late summer and fall (the temperature at station 1 in late summer
and fall was similar to that at station 2 in early summer as a result of
warmer weather and possible upstream migration of the thermal plume); and
(2) samples collected at all downstream stations in the month of September.
At a higher level of similarity, samples from stations 1 and 2 remained
clustered, but two new clusters appeared: (1) samples from stations located
in overbank areas only and (2) samples predominantly from the channel stations
and entirely from the stations downstream from the plant.
With the correlation coefficient Corr 7, clusters were formed at
extremely high levels of similarity (Figure 18). At a level of 0.9, for
example, three clusters were formed. The first cluster comprised only samples
collected from stations 1 and 2 in June, a similarity that cannot be explained
by temperature alone because ΔT between these two stations in June was 5.6 °C.
The biotic similarity may have resulted from similarity of substrate. The
second cluster contained 57 percent of the samples. The only noticeable trend
was the complete absence of samples from September, during which time the
lowest flows and highest temperatures were recorded. Cluster 3 contained only
two samples (3-SEP and 3-OCT), indicating a distinct biota during the autumn
at station 3.
5.3.1.2.3 Summary--A thorough discussion of results obtained with Corr 6
and Dist 6 is not warranted because use of these two coefficients produced
unacceptable levels of distortion (r_cc <0.8). In addition, use of these
coefficients produced dendrograms with less structure and lower similarities
than those of their counterparts (Corr 7 and Dist 7), indicating that the
standardization of data may increase distortion and decrease the structure and
similarity level of dendrograms.
A summary of the cluster analyses for the 1973 data set is presented in
Table 35. Three of the six coefficients had acceptably low levels of
distortion (S_J, Dist 7, and Corr 7). They also produced dendrograms that were
readily interpretable, and all three indicated that both station location and
the month in which the samples were collected were causes for association.
65
-------
[Horizontal axis: similarity, 0.0 to 1.0.]
Figure 16. Dendrogram computed from Q-mode cluster analysis of a matrix of Jaccard's
coefficients showing faunal similarities between samples collected from
the Cumberland River in 1973 (r_cc = 0.852).
66
-------
[Figure 17. Dendrogram for the distance coefficient Dist 7; horizontal axis: distance, 6.0 to 0.0.]
-------
[Horizontal axis: Corr 7.]
Figure 18. Dendrogram computed from Q-mode cluster analysis of a matrix of
correlation coefficients computed from data transformed with the
square-root transformation; shows faunal similarities between samples collected
from the Cumberland River in 1973 (r_cc = 0.979).
68
-------
Two of the three coefficients whose use produced unacceptable levels of
distortion (Corr 6, Dist 6) also produced dendrograms that had low
similarities and were not readily interpretable. This result weighs against
the use of standardization in the analysis of ecological data.
The high degree of structure and similarity in the dendrogram resulting
from use of the simple-matching coefficient was due largely to its inclusion
of negative matches, a possible source of bias.
In this case, however, despite the possible bias and the introduction of
more distortion than Jaccard's coefficient, the stations associated at the
highest levels of similarity were nearly the same with these two
presence-absence coefficients (Table 35). The highly similar stations using
these presence-absence similarity coefficients were in many cases, however,
different from those that were associated at the highest levels by the various
quantitative coefficients.
5.3.1.3 Cumberland River Data Set—1975--
5.3.1.3.1 Substratum—Sediment data collected in 1975 from seven
stations on the Cumberland River were clustered through use of the Pearson
product-moment correlation coefficient. Moderate distortion was introduced
through the clustering procedure (r_cc = 0.76). At a similarity level of 0.40,
two large clusters were formed. Cluster 1 contains most of the samples and is
dominated by samples from stations 2, 3, and 6, indicating that sediments at
the downstream overbank stations were similar. Cluster 2 is dominated by
channel stations, suggesting that composition of substratum differed between
the overbank and channel sites. At higher levels of similarity, change in
sediment composition with time is suggested.
5.3.1.3.2 Zoomacrobenthos—
5.3.1.3.2.1 Presence-absence data--Jaccard's coefficient (S_J) produced a
dendrogram that clustered samples at low levels of similarity but with little
distortion (r_cc = 0.88). At a similarity level of 0.68, seven clusters were
formed. The first three clusters were dominated by samples collected in May,
the fourth cluster was dominated by samples collected in June and July, and
the last three clusters were dominated by samples collected in August and
September. These groupings suggest that faunal assemblages were influenced
more by seasonal factors than by their proximity to the thermal discharge.
This type of distribution is expected when environmental conditions are
the same at all stations. Examination of temperature data revealed that
temperatures differed very little between the control stations upstream and
the stations downstream from the power plant. Temperatures in May were
13 ± 1.5 °C at all stations, increasing to 26.5 ± 2 °C at all stations in
August.
The simple matching coefficient (S_SM) resulted in overall higher
similarities, but a great deal more distortion was introduced (r_cc = 0.70).
Because 0.80 had been established as the minimum r_cc value for an acceptable
dendrogram, interpretation of this dendrogram is limited to the general
observation that seasonal factors tended to influence similarities between
stations more than location.
69
-------
5.3.1.3.2.2 Quantitative data, species counts—Cluster analysis with
the distance coefficient Dist 6 resulted in a dendrogram with little
distortion (r_cc = 0.95). At a level of similarity of 0.84, one large cluster
and three small clusters were formed, with 11 samples remaining unclustered
(Figure 19). The large cluster was dominated by samples collected in May and
June and samples from station 3, whereas the unclustered samples were
collected predominantly at station 2 or in the month of September.
Only 47 taxa were collected either at station 3 or in May when the
diversity was low, whereas 78 taxa were collected at station 2 or in
September. Similarly, the mean number of organisms collected per square meter
in May was 1,325, while the mean number in September was 2,337. These facts
probably account for the separation of these sample groups.
Cluster analysis with the distance coefficient Dist 7 produced a
dendrogram with a stair-stepped appearance and little distortion of
similarities (r_cc = 0.84) (Figure 20). At a similarity level of 1.55, two
large clusters and five small clusters were formed. Both large clusters
contained samples from (1) at least four of the five months sampled, (2) both
overbank and channel stations, and (3) samples both upstream and downstream
from the plant. This arrangement seems to indicate that all the samples
collected were similar. At higher levels of similarity, however, we were able
to detect slight differences with time--a trend that was also noted in cluster
analysis of other coefficients.
Cluster analysis of coefficients Corr 6 and Corr 7 resulted in
dendrograms that contained an unacceptable amount of distortion and a
structure that made interpretations difficult. Neither temporal nor spatial
factors were shown to predominate in the formation of biotic associations.
5.3.1.3.4 Summary--Only three of the six coefficients tested, Jaccard's
coefficient and the distance coefficients Dist 6 and Dist 7, produced
dendrograms with acceptably low levels of distortion (r_cc >0.80) (Table 36).
Samples with highest similarities are nearly the same in all cluster analyses.
That is, some samples are closely similar regardless of which coefficient is
used. The principal difference among samples with a high similarity was that
the two presence-absence coefficients (S_J and S_SM) indicated a high similarity
between a channel and an overbank sample collected during May, which the
quantitative coefficients associated at a much lower level. Similarly, the
quantitative coefficients indicated high similarity for one pair of overbank
samples in May that the presence-absence coefficients clustered at a much
lower level. All three dendrograms with acceptable levels of distortion
indicated that the season during which the samples were collected was the
primary cause of association. Only the Jaccard coefficient (presence-absence
data only), however, failed to indicate that station location was a secondary
factor.
For both the 1973 and 1975 Cumberland River data sets, only three of the
six coefficients tested produced dendrograms with acceptably low levels of
distortion. The Jaccard coefficient and the Dist 7 (quantitative, distance,
square root transformation with no standardization) were two of the three that
were acceptable for both data sets. For the 1973 data set, the Corr 7
coefficient (quantitative, correlation, square root transformation with no
standardization) produced acceptably low distortion, while the Dist 6
-------
[Dendrogram (DIST 6): similarity axis from 2.240 to -0.210. Sample labels (station-month) are illegible in the source.]
Figure 19. Dendrogram computed from Q-mode cluster analysis of a matrix of distance
coefficients computed from data that had been transformed by the square-
root transformation and standardized by rows; shows faunal similarities
between samples collected from the Cumberland River in 1975.
-------
[Dendrogram (DIST 7): similarity axis from 3.300 to -0.200. Samples (station-month) in plotted order: 1-MAY, 2-MAY, 3-MAY, 6-MAY, 3-AUG, 1-JUN, 7-AUG, 7-MAY, 5-JUN, 5-JUL, 1-AUG, 4-AUG, 5-SEP, 4-MAY, 5-MAY, 1-JUL, 3-JUN, 3-JUL, 2-AUG, 3-SEP, 2-JUN, 5-AUG, 6-AUG, 7-SEP, 7-JUN, 7-JUL, 4-JUN, 4-JUL, 1-SEP, 6-JUN, 6-JUL, 2-JUL, 2-SEP, 4-SEP, 6-SEP.]
Figure 20. Dendrogram computed from Q-mode cluster analysis of a matrix of distance
coefficients computed from data that had been transformed by the square-
root transformation and standardized by rows; shows faunal similarities
between samples collected from the Cumberland River in 1975.
-------
coefficient (quantitative, distance, square root transformation with
standardization by rows) produced the third acceptably low level of
distortion.
For the 1973 data set, the principal cause of association (faunal
similarity) based on an interpretation of biological and physicochemical data
(including cluster analyses interpretation based on the formation of distinct
clusters) was the location of the sampling station. In some cases this
analysis differed from an analysis based solely on a listing of the highest
(i.e., highest similarity level) biological associations in the dendrogram.
For the 1975 data set, the principal cause of association based on an
interpretation of biological and physicochemical data was in every case
different from an interpretation based solely on a listing of the clusters
formed at the highest levels of similarity. While these lists identify the
most similar pair groups, they do not provide insight into community
structure, and for this reason we do not recommend the use of mere
similarity-level lists as a basis for interpretation of biological data.
5.3.2 R-Mode Analysis
5.3.2.1 Introduction--
The purpose of R-mode cluster analysis is to identify recurring groups of
species that form biological communities (i.e., associations). If the
biological communities are discrete, the cluster analysis will form distinct
clusters. If the communities intergrade, the results of cluster analysis will
show this.
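The distinction between the two modes can be sketched as follows. This is a minimal illustration only, assuming a small hypothetical presence-absence matrix with taxa as rows and samples as columns; the names `data`, `jaccard`, and `pairwise` are ours and do not come from the programs used in this study. R-mode analysis compares rows (taxa), and Q-mode analysis compares columns (samples).

```python
from itertools import combinations

def jaccard(u, v):
    """Jaccard's coefficient a / (a + b + c) for two 0/1 vectors."""
    a = sum(1 for x, y in zip(u, v) if x and y)
    mismatches = sum(1 for x, y in zip(u, v) if bool(x) != bool(y))
    return a / (a + mismatches) if (a + mismatches) else 0.0

def pairwise(vectors):
    """Similarity for every pair of labelled vectors."""
    return {(p, q): jaccard(vectors[p], vectors[q])
            for p, q in combinations(sorted(vectors), 2)}

# Hypothetical data: rows are taxa, columns are samples s1..s4.
samples = ["s1", "s2", "s3", "s4"]
data = {
    "Stenelmis":   [1, 1, 1, 0],
    "Optioservus": [1, 1, 1, 0],
    "Simulium":    [0, 0, 1, 1],
}

# R-mode: taxon-by-taxon similarities (compare rows).
r_mode = pairwise(data)

# Q-mode: sample-by-sample similarities (compare columns).
by_sample = {s: [row[i] for row in data.values()] for i, s in enumerate(samples)}
q_mode = pairwise(by_sample)
```

Either similarity matrix would then be passed to the clustering step; only the orientation of the comparison differs.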
Whether discrete communities exist has been the subject of numerous
ecological studies, and investigators are almost equally divided as to whether
their analyses support or contradict the community concept. In a series of
studies of rivers of the eastern and southeastern United States, Patrick
(1961, 1967) noted that "the number of species of the major groups of
organisms--that is, the algae, protozoa, other lower invertebrates, insects,
and fish--remain similar from season to season in the same stream. Likewise
in similar types of streams they are similar." In other words, the total
number of different kinds of organisms is remarkably constant from one system
to another. Patrick (1961, 1967) also observed that unstressed, healthy
systems usually consist of many species with only a few individuals and a few
species with many individuals. Plotting ranked abundance of species against
number of species characteristically gives a truncated log-normal curve
(Figure 21). Although such generalities do not prove the existence or
nonexistence of communities or even associations of species, they indicate
similarity among the biotic components of unpolluted streams.
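The grouping that underlies such a curve can be sketched as follows, assuming hypothetical abundance counts and assuming that each doubling class includes its lower bound (the report does not state how class boundaries were assigned). The function name `octave_counts` is illustrative.

```python
from collections import Counter
from math import floor, log2

def octave_counts(abundances):
    """Count species per doubling class: class k holds 2**k <= n < 2**(k + 1)."""
    return Counter(floor(log2(n)) for n in abundances if n >= 1)

# Hypothetical individuals-per-species counts for one collection.
abundances = [1, 1, 2, 3, 3, 5, 6, 7, 9, 12, 20, 40, 130, 600]
counts = octave_counts(abundances)

for k in sorted(counts):
    print(f"{2 ** k}-{2 ** (k + 1)} individuals: {counts[k]} species")
```

Plotting the number of species in each class against the class gives the histogram to which a truncated log-normal curve is fitted.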
A familiar work suggesting the existence of specific
species-environmental associations is the study of indicator organisms
developed in Europe by Kolkwitz and Marsson (1908, 1909). In their saprobian
system, Kolkwitz and Marsson associated certain species with different zones
or regions of water quality. As pointed out by Cairns et al. (1972), this
system "was a logical extrapolation of the niche concept (Hutchinson, 1957;
Parker and Turner, 1961). That is, each organism has a particular set of
environmental prerequisites essential to its survival." An illustration of
the type of information resulting from study of indicator organisms is shown in
Figure 22.
-------
[Histogram: x-axis, INDIVIDUALS PER SPECIES in doubling classes 1-2, 2-4, 4-8, 8-16, 16-32, 32-64, 64-128, 128-256, 256-512, 512-1024, 1024-2048, 2048-4096, 4096-8192, 8192-16384; y-axis scale from 10 to 35 (axis label illegible in the source).]
Figure 21. A truncated log-normal distribution fitted to a distribution of species in an aquatic
ecosystem not adversely affected by environmental stress.
-------
[Graphic not recoverable from the source; one zone label survives: ACTIVE DECOMPOSITION.]
Figure 22. Responses of organisms to severe organic enrichment: changes in types of
organisms present, population densities, and biological diversity.
-------
In general, studies indicate that structural pattern exists within fresh-
water communities. Problems confronting the ecologist are (1) identifying and
characterizing species associations and (2) identifying and understanding
environmental variables that control the presence of these associations.
Stephenson (1972) discussed the first problem in a paper describing the use of
computers to classify marine benthic communities. He identified three basic
characteristics or attributes that a species must have to belong to an
association. The first characteristic was dominance, which is usually
expressed as the number of organisms or biomass per unit area. The other
characteristics were constancy and fidelity. According to Stephenson (1972),
"A species is highly constant if it appears in all the samples or quadrants
within an association, but it need not be restricted to a single association.
Conversely, a species is highly faithful if it occurs in a single association,
but it need not occur within all the samples within the association."
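Stephenson's definitions are qualitative. One way to put numbers on them is sketched below, assuming that constancy is measured as the fraction of an association's samples in which a species occurs and fidelity as the fraction of a species' occurrences that fall within the association. Both measures, the function names, and the data are our assumptions for illustration and are not taken from Stephenson (1972).

```python
def constancy(presence, association_samples):
    """Fraction of the association's samples in which the species occurs."""
    hits = sum(1 for s in association_samples if presence.get(s, 0))
    return hits / len(association_samples)

def fidelity(presence, association_samples):
    """Fraction of the species' occurrences that fall inside the association."""
    inside = sum(1 for s in association_samples if presence.get(s, 0))
    total = sum(1 for v in presence.values() if v)
    return inside / total if total else 0.0

# Hypothetical presence (1) / absence (0) records for two species in six samples.
association = ["s1", "s2", "s3"]
widespread = {"s1": 1, "s2": 1, "s3": 1, "s4": 1, "s5": 1, "s6": 0}
restricted = {"s1": 1, "s2": 0, "s3": 0, "s4": 0, "s5": 0, "s6": 0}

for name, rec in (("widespread", widespread), ("restricted", restricted)):
    print(name, constancy(rec, association), fidelity(rec, association))
```

In this example the widespread species is highly constant but not faithful, and the restricted species is faithful but not constant, which is the contrast drawn in the quotation.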
Although the theory of community ecology is based on associations of
species, the discussion that follows is based on taxonomy at the generic level
or higher. Use of less refined taxonomy is necessitated by the difficulties
inherent in large aquatic ecological surveys where taxonomy must be based
solely on morphological observations of often limited size classes and some-
times damaged specimens. There also remain, of course, taxonomic
uncertainties and discrepancies in certain groups.
5.3.2.2 Clinch River Data Set--
To identify representative assemblages of taxa, R-mode clustering of the
Clinch River data was undertaken with six similarity coefficients--SJ, SSM,
Corr 6, Corr 7, Dist 6, and Dist 7. Because the complete Clinch River data
set included samples collected before and after the pH stress, it was divided
into affected and unaffected subsets. R-mode clustering was done on each
subset to determine (1) whether assemblages of taxa changed after the spill
and (2) whether keeping all 36 zoomacrobenthic samples in the same data matrix
changed the clusters of taxa identified in each subset.
The rcc for each dendrogram is listed in Table 27 (refer back to section
5.3.1.1). Because rcc values for Corr 6, Corr 7, and Dist 6 dendrograms were
less than 0.8, no discussion of these dendrograms is included here. The rcc
values for the SJ, SSM, and Dist 7 dendrograms were greater than 0.8, except
for one SSM dendrogram that had a value of 0.76.
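For readers unfamiliar with the cophenetic correlation, the sketch below shows one way it can be computed: the original pairwise values are correlated with the levels at which each pair is first joined in the dendrogram. The sketch assumes average-linkage (UPGMA) clustering of a distance matrix, which may not be the clustering method used for the dendrograms in this report, and the function names and data are hypothetical.

```python
from math import sqrt

def upgma_cophenetic(dist):
    """Cophenetic distances from average-linkage (UPGMA) clustering."""
    n = len(dist)
    clusters = {i: [i] for i in range(n)}
    coph = [[0.0] * n for _ in range(n)]
    next_id = n
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        ids = sorted(clusters)
        best = None
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = clusters[ids[x]], clusters[ids[y]]
                avg = sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))
                if best is None or avg < best[0]:
                    best = (avg, ids[x], ids[y])
        height, p, q = best
        # Every pair split between the two merged clusters joins at this level.
        for i in clusters[p]:
            for j in clusters[q]:
                coph[i][j] = coph[j][i] = height
        clusters[next_id] = clusters.pop(p) + clusters.pop(q)
        next_id += 1
    return coph

def cophenetic_correlation(dist):
    """Pearson correlation between original and cophenetic distances."""
    coph = upgma_cophenetic(dist)
    n = len(dist)
    xs = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    ys = [coph[i][j] for i in range(n) for j in range(i + 1, n)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical distances among three samples.
d = [[0, 2, 6],
     [2, 0, 4],
     [6, 4, 0]]
r_cc = cophenetic_correlation(d)
print(round(r_cc, 3))
```

A value near 1 means the dendrogram reproduces the original matrix with little distortion; the 0.8 threshold applied in this section is a working criterion of that kind.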
To interpret the results, one or two arbitrary levels of similarity were
selected for each dendrogram. The clusters or assemblages of taxa found at
each level of similarity were tabulated, and the tables compared. For eight
dendrograms (Figures 23 to 30) with an rcc > 0.8, 11 tables were formed, and
the dendrograms were grouped according to their overall similarity to each
other (Tables 37 through 39). Trophic-functional (TF) codes (refer to
Table 12) were then assigned to each taxon, and intercluster reordering by
numeric code was initiated. The TF clusters were then rearranged within each
dendrogram to show similarities among dendrograms (Tables 40 to 42). The last
step was to decode the TF clusters and list by scientific name (Tables 43 to
45).
5.3.2.2.1 Presence-absence data--In Table 39, four clusters of taxa were
identified at the 0.87 and 0.89 levels of similarity for the SJ and SSM
dendrograms computed from samples unaffected by the spill. Cluster 1 had
eight taxa, dominated by pH-tolerant beetles. Two major trophic-functional
-------
[Dendrogram: similarity axis from 0.015 to 1.065. Taxa in plotted order: STENELMIS, OPTIOSERVUS, DUBIRAPHIA, EPHEMERELLA, BAETIS, HYDROPSYCHE, PSEPHENUS HERRICKI, GONIOBASIS SPINELLA, STENONEMA, ISONYCHIA, GONIOBASIS CARINIFERA, ACRONEURIA, TRICORYTHODES, CHIRONOMIDAE, CORYDALUS CORNUTUS, CHEUMATOPSYCHE, POTAMANTHUS, ANCULOSA, ANCULOSA SUBGLOBOSA, HETAERINA, EPHORON, MICROCYLLOEPUS, HEMERODROMIA, PROMORESIA, SIMULIUM, PERLESTA PLACIDA, PLEUROCERIDAE, SPHAERIUM, HEPTAGENIA.]
Figure 23. Dendrogram computed from R-mode cluster analysis of a matrix of
Jaccard's coefficients, showing distributional similarities of
taxa collected from stations on the Clinch River unaffected by
low-pH stress that resulted from the 1970 spill of acid (rcc = 0.97).
-------
[Dendrogram (SSM): similarity axis from 0.369 to 0.999. Taxa in plotted order: STENELMIS, OPTIOSERVUS, DUBIRAPHIA, EPHEMERELLA, BAETIS, HYDROPSYCHE, PSEPHENUS HERRICKI, GONIOBASIS SPINELLA, STENONEMA, ISONYCHIA, GONIOBASIS CARINIFERA, ACRONEURIA, TRICORYTHODES, CHIRONOMIDAE, CORYDALUS CORNUTUS, CHEUMATOPSYCHE, POTAMANTHUS, ANCULOSA, ANCULOSA SUBGLOBOSA, EPHORON, HETAERINA, MICROCYLLOEPUS, PROMORESIA, HEMERODROMIA, SIMULIUM, PERLESTA PLACIDA, HEPTAGENIA, PLEUROCERIDAE, SPHAERIUM.]
Figure 24. Dendrogram computed from R-mode cluster analysis of a matrix of simple
matching coefficients, showing distributional similarities of taxa
collected from stations on the Clinch River unaffected by low-pH stress
that resulted from the 1970 spill of acid (rcc = 0.91).
-------
[Dendrogram (DIST 7): similarity axis from 13.800 to -0.200. Taxa in plotted order: STENELMIS, ISONYCHIA, STENONEMA, CHEUMATOPSYCHE, MICROCYLLOEPUS, HEMERODROMIA, HETAERINA, PROMORESIA, PLEUROCERIDAE, SPHAERIUM, PSEPHENUS HERRICKI, ANCULOSA SUBGLOBOSA, GONIOBASIS SPINELLA, ACRONEURIA, HEPTAGENIA, PERLESTA PLACIDA, EPHEMERELLA, CORYDALUS CORNUTUS, TRICORYTHODES, EPHORON, POTAMANTHUS, CHIRONOMIDAE, ANCULOSA, GONIOBASIS CARINIFERA, SIMULIUM, OPTIOSERVUS, BAETIS, DUBIRAPHIA, HYDROPSYCHE.]
Figure 25. Dendrogram computed from R-mode cluster analysis of a matrix of distance
coefficients computed from data that had been transformed by the square-
root transformation, showing distributional similarities of taxa collected
from stations on the Clinch River unaffected by low-pH stress that resulted
from the 1970 spill of acid (rcc = 0.92).
-------
[Dendrogram: similarity axis from 0.015 to 1.065. Taxa in plotted order: STENELMIS, OPTIOSERVUS, CHIRONOMIDAE, CORYDALUS CORNUTUS, MICROCYLLOEPUS, CHEUMATOPSYCHE, DUBIRAPHIA, HEMERODROMIA, HYDROPSYCHE, PROMORESIA, ACRONEURIA, GONIOBASIS SPINELLA, ISONYCHIA, BAETIS, TRICORYTHODES, HETAERINA, EPHEMERELLA, GONIOBASIS CARINIFERA, STENONEMA, SIMULIUM, ANCULOSA, PSEPHENUS HERRICKI, POTAMANTHUS, PERLESTA PLACIDA, ANCULOSA SUBGLOBOSA.]
Figure 26. Dendrogram computed from R-mode cluster analysis of a matrix of Jaccard's
coefficients, showing distributional similarities of taxa collected from
stations on the Clinch River affected by low-pH stress that resulted from
the 1970 spill of acid (rcc = 0.97).
-------
[Dendrogram: similarity axis from 0.408 to 0.968. Taxa in plotted order: STENELMIS, OPTIOSERVUS, MICROCYLLOEPUS, CORYDALUS CORNUTUS, CHIRONOMIDAE, DUBIRAPHIA, HEMERODROMIA, HYDROPSYCHE, CHEUMATOPSYCHE, PROMORESIA, ACRONEURIA, GONIOBASIS SPINELLA, ISONYCHIA, BAETIS, EPHEMERELLA, GONIOBASIS CARINIFERA, TRICORYTHODES, HETAERINA, PSEPHENUS HERRICKI, SIMULIUM, ANCULOSA, POTAMANTHUS, ANCULOSA SUBGLOBOSA, STENONEMA, PERLESTA PLACIDA.]
Figure 27. Dendrogram computed from R-mode cluster analysis of a matrix of simple
matching coefficients, showing distributional similarities of taxa
collected from stations on the Clinch River affected by low-pH stress
that resulted from the 1970 spill of acid (rcc = 0.84).
-------
[Dendrogram (DIST 7): similarity axis from 17.330 to -0.170. Taxa in plotted order: STENELMIS, OPTIOSERVUS, CHEUMATOPSYCHE, MICROCYLLOEPUS, CORYDALUS CORNUTUS, BAETIS, CHIRONOMIDAE, DUBIRAPHIA, PROMORESIA, HEMERODROMIA, PSEPHENUS HERRICKI, POTAMANTHUS, ANCULOSA SUBGLOBOSA, GONIOBASIS CARINIFERA, ANCULOSA, GONIOBASIS SPINELLA, STENONEMA, PERLESTA PLACIDA, EPHEMERELLA, TRICORYTHODES, HETAERINA, ACRONEURIA, ISONYCHIA, SIMULIUM, HYDROPSYCHE.]
Figure 28. Dendrogram computed from R-mode cluster analysis of a matrix of distance
coefficients computed from data that had been transformed by the square-
root transformation, showing distributional similarities of taxa collected
from stations on the Clinch River affected by low-pH stress that resulted
from the 1970 spill of acid (rcc = 0.97).
-------
[Dendrogram: similarity axis from 0.000 to 1.050. Taxa in plotted order: STENELMIS, OPTIOSERVUS, DUBIRAPHIA, HYDROPSYCHE, CHIRONOMIDAE, CORYDALUS CORNUTUS, CHEUMATOPSYCHE, ACRONEURIA, MICROCYLLOEPUS, PROMORESIA, HEMERODROMIA, GONIOBASIS SPINELLA, ISONYCHIA, BAETIS, EPHEMERELLA, GONIOBASIS CARINIFERA, TRICORYTHODES, HETAERINA, SIMULIUM, STENONEMA, ANCULOSA, ANCULOSA SUBGLOBOSA, PSEPHENUS HERRICKI, EPHORON, POTAMANTHUS, PERLESTA, PLEUROCERIDAE, SPHAERIUM, HEPTAGENIA.]
Figure 29. Dendrogram computed from R-mode cluster analysis of a matrix of Jaccard's
coefficients, showing distributional similarities of taxa collected from
stations on the Clinch River both affected and unaffected by the low-pH
stress that resulted from the 1970 spill of acid (rcc = 0.95). Only
those taxa are included that comprise 10 percent or more of the total
samples.
-------
[Dendrogram (DIST 7): similarity axis from 17.000 to -0.500. Taxa in plotted order: STENELMIS, OPTIOSERVUS, CHEUMATOPSYCHE, ISONYCHIA, MICROCYLLOEPUS, CORYDALUS CORNUTUS, CHIRONOMIDAE, BAETIS, DUBIRAPHIA, PROMORESIA, HEMERODROMIA, PSEPHENUS HERRICKI, ANCULOSA SUBGLOBOSA, PLEUROCERIDAE, SPHAERIUM, GONIOBASIS SPINELLA, HEPTAGENIA, HETAERINA, ACRONEURIA, PERLESTA PLACIDA, EPHEMERELLA, TRICORYTHODES, EPHORON, POTAMANTHUS, STENONEMA, ANCULOSA, GONIOBASIS CARINIFERA, SIMULIUM, HYDROPSYCHE.]
Figure 30. Dendrogram computed from R-mode cluster analysis of a matrix of distance
coefficients computed from data that had been transformed by the square-
root transformation, showing distributional similarities of taxa collected
from stations on the Clinch River both affected and unaffected by the
low-pH stress that resulted from the 1970 spill of acid (rcc = 0.97). Only
those taxa are included that comprise 10 percent or more of the total sample.
-------
groups were represented—collectors and grazers. Clusters 2 and 3 had four
and five taxa, respectively, but each cluster had three major functional
groups—collectors, grazers, and predators. This indicated that assemblages
of taxa represented in clusters 2 and 3 were more characteristic of the total
zoomacrobenthic community than those in cluster 1. The fourth cluster in
Table 39 consisted of two snail taxa representing one TF group.
Table 43 shows that several taxa clustered together (using
presence-absence coefficients), regardless of the level of similarity, type of
data considered, or size of the data matrix: (1) Optioservus-Stenelmis, (2)
Chironomidae-Corydalus cornutus, (3) Hydropsyche-Cheumatopsyche-Dubiraphia,
(4) Isonychia-Baetis, (5) Goniobasis carinifera-Ephemerella, and (6)
Tricorythodes-Hetaerina. These were identified as (1) grazers, (2)
grazer-predators, (3) collector-grazers, (4) collector-grazers, (5) grazers,
and (6) collector-predators.
The six clusters were evaluated with regard to constancy and fidelity.
5.3.2.2.2 Quantitative data, species counts—Associations of taxa based
on counts of individuals per taxon were more difficult to identify and less
discrete. Table 44 summarizes clusters of taxa computed with the distance
coefficient Dist 7. Only three taxa (Psephenus herricki, Goniobasis spinella,
and Hetaerina) met the criteria of constancy and fidelity. Unfortunately, the
information provided by this coefficient was of limited value, and other
quantitative techniques should be considered in future studies.
5.3.2.2.3 Summary--One justification for dividing the original data
matrix into subsets was to determine whether the associations of taxa found at
stations unaffected by the spill were different from those found at stations
affected by the spill. To answer this question, the associations identified
in Tables 39 and 43 were collated. Table 45 shows that species associations
after the spill were much smaller (two to three taxa per association) than
before the spill. The table also shows that Stenelmis-Optioservus and
Chironomidae-Corydalus cornutus were the only associations found in
dendrograms of both unaffected and affected stations. This indicates that
associations were influenced by the pH stress and that division of the
original data matrix of 36 samples into subsets was justified.
5.3.2.3 Cumberland River Data Set—1973--
5.3.2.3.1 Presence-absence data--Jaccard's coefficient SJ produced
clusters with low levels of similarity (Figure 31). At a similarity level of
0.27, four clusters were formed, with several taxa left unclustered. The
first cluster consisted of a heterogeneous association of nine taxa from three
phyla—five detrital collectors, three predators, and one grazer. This group
included one omnivore and almost equal numbers of detritivores and carnivores.
Cluster 2 contained a small but equally heterogeneous group of taxa displaying
three different food habits. The last two clusters consisted entirely of
wormlike taxa that are usually abundant only in streams that have been
organically enriched.
-------
[Dendrogram: similarity axis from -0.105 to 0.945. Taxon labels in plotted order: BRANCH., PROCLAD., C.(CRYP), HEXAGEN., LIM.SP.2, CHAOBOR., PENT.SPE, LIM.SP.1, CORBIUC., COELOTAN, PECTINAT, PENT.MON, CHIR.SP., LUM.SP.1, NEM.SP.1, CHIR.RIP, SMITTIA, TANYPUS, CHIR.TEN, HELOB., SPHAER., NEURECL.]
Figure 31. Dendrogram computed from R-mode cluster analysis of a matrix of Jaccard's
coefficients, showing distributional similarities of taxa collected from
the Cumberland River in 1973.
-------
Cluster analysis of the simple matching coefficient SSM produced clusters
with low levels of similarity (Figure 32). Two clusters were formed at a
level of similarity of 0.75. The first cluster comprised four taxa that were
also closely clustered with Jaccard's coefficient SJ. The second cluster was a
heterogeneous association of taxa that had been unclustered or placed in small
clusters by Jaccard's coefficient.
5.3.2.3.2 Quantitative data, species counts--The distance coefficient
Dist 6 (square-root transformation of data and standardization by rows)
produced clusters with low levels of similarity (Figure 33). Seven small
clusters were formed at a level of similarity of 1.11, with six taxa remaining
unclustered. Only one association had also occurred when presence-absence
coefficients were used.
The distance coefficient Dist 7 (square-root transformation of data)
produced a dendrogram in which almost all the taxa were grouped at the same
level of similarity and thus contained little useful information about
discrete associations (Figure 34).
The correlation coefficients Corr 6 and Corr 7 produced identical
matrices and dendrograms due to the standardization inherent in the use of the
product-moment correlation coefficient. Again, levels of similarity of
clusters were fairly low. At levels of 0.46 and 0.35, five and seven small
clusters, respectively, were formed (Figure 35). Some clusters contained
functionally and morphologically similar taxa, but others did not. Some
contained only frequently collected taxa, whereas others contained mixtures of
frequent and rare taxa. Very little ecological information that pertained to
the question of associations of taxa came from study of these dendrograms.
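The reason Corr 6 and Corr 7 coincide in R-mode can be checked numerically. The sketch below assumes that "standardization by rows" means rescaling each taxon's values to mean 0 and standard deviation 1; the data and function names are hypothetical. Because the product-moment coefficient already removes each variable's mean and scale, applying the standardization first leaves the coefficient unchanged.

```python
from math import sqrt

def pearson(u, v):
    """Product-moment correlation between two equal-length vectors."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    suu = sum((x - mu) ** 2 for x in u)
    svv = sum((y - mv) ** 2 for y in v)
    return suv / sqrt(suu * svv)

def standardize(row):
    """Rescale a row to mean 0 and standard deviation 1 (assumed definition)."""
    m = sum(row) / len(row)
    sd = sqrt(sum((x - m) ** 2 for x in row) / len(row))
    return [(x - m) / sd for x in row]

# Hypothetical square-root-transformed counts of two taxa in five samples.
taxon_a = [0.7, 3.2, 10.0, 2.3, 5.1]
taxon_b = [1.2, 2.9, 7.4, 0.7, 6.3]

r_unstandardized = pearson(taxon_a, taxon_b)                          # as for Corr 7
r_standardized = pearson(standardize(taxon_a), standardize(taxon_b))  # as for Corr 6
print(r_unstandardized, r_standardized)
```

The same argument does not apply to the distance coefficients, which is why Dist 6 and Dist 7 give different dendrograms.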
-------
[Dendrogram: similarity axis from 0.413 to 0.938. Taxon labels in plotted order: BRANCH., PROCLAD., C.(CRYP), HEXAGEN., CHAOBOR., PENT.SPE, LIM.SP.2, CHIR.RIP, CHIR.TAN, HELOB., NEURECL., SPHAER., TANYPUS, SMITTIA, NEM.SP.1, PENT.MON, COELOTAN, PECTINAT, CORBIUC., CHIR.SP., LUM.SP.1, LIM.SP.1.]
Figure 32. Dendrogram computed from R-mode cluster analysis of a matrix of simple
matching coefficients, showing distributional similarities of taxa
collected from the Cumberland River in 1973.
-------
[Dendrogram (DIST 6): similarity axis from 1.422 to 0.792. Most taxon labels are illegible in the source.]
Figure 33. Dendrogram computed from R-mode cluster analysis of a matrix of distance
coefficients with data transformed with the square-root transformation
and standardized by rows, showing distributional similarities of taxa
collected from the Cumberland River in 1973.
-------
[Dendrogram (DIST 7): similarity axis from 16.000 to -1.500. Taxon labels in plotted order: BRANCH., CHIR.RIP, CHIR.TAN, TANYPUS, HELOB., NEM.SP.1, PENT.MON, SPHAER., NEURECL., SMITTIA, COELOTAN, CHIR.SP., CORBIUC., PECTINAT, C.(CRYP), PENT.SPE, PROCLAD., HEXAGEN., LUM.SP.1, CHAOBOR., LIM.SP.1, LIM.SP.2.]
Figure 34. Dendrogram computed from R-mode cluster analysis of a matrix of distance
coefficients with data transformed by the square-root transformation,
showing distributional similarities of taxa collected from the Cumberland
River in 1973.
-------
[Dendrogram (CORR 6): similarity axis from -0.210 to 0.840. Taxon labels are illegible in the source.]
Figure 35. Dendrogram computed from R-mode cluster analysis of a matrix of correlation
coefficients with data transformed by the square-root transformation and
standardized by rows, showing distributional similarities of taxa collected
from the Cumberland River in 1973.
-------
TABLE 16. COEFFICIENTS OF CORRELATION, DISTANCE, AND SIMILARITY: ABBREVIATIONS, EQUATIONS,
AND UPPER AND LOWER LIMITS(a)
Coefficient                                       Abbreviation   Equation                                  Lower limit   Upper limit
Pearson product-moment correlation coefficient    Corr           r_jk = s_jk/(s_j s_k)                     -1            1
Sokal's average taxonomic distance coefficient    Dist           d_jk = sqrt[SUM_i (X_ij - X_ik)^2 / n]    0             ∞
Simple matching coefficient                       SSM            S_SM = (a + d)/n                          0             1
Unnamed coefficient 1                             UN1            S_UN1 = 2(a + d)/(n + a + d)              0             1
Rogers and Tanimoto's coefficient                 RT             S_RT = (a + d)/(a + d + 2b + 2c)          0             1
Unnamed coefficient 3                             UN3            S_UN3 = (a + d)/(b + c)                   0             ∞
Hamann                                            H              S_H = (a + d - b - c)/n                   0             1
Jaccard                                           SJ             S_J = a/(a + b + c)                       0             1
Russell and Rao                                   RR             S_RR = a/n                                0             1
Dice                                              D              S_D = 2a/(2a + b + c)                     0             1
Unnamed coefficient 2                             UN2            S_UN2 = a/(a + 2b + 2c)                   0             1
-------
TABLE 16 (continued)
Coefficient              Abbreviation   Equation                                                    Lower limit   Upper limit
Kulczynski 1             K1             S_K1 = a/(n_J + n_K - 2a)                                   0             ∞
Kulczynski 2             K2             S_K2 = 1/2(a/n_J) + 1/2(a/n_K)                              0             1
Unnamed coefficient 4    UN4            S_UN4 = 1/4(a/n_J) + 1/4(a/n_K) + 1/4(d/n_j) + 1/4(d/n_k)   0             1
Ochiai (Otsuka)          OCH            S_OCH = a/sqrt(n_J n_K)                                     0             1
Unnamed coefficient 5    UN5            S_UN5 = ad/sqrt(n_J n_K n_j n_k)                            0             1
Yule                     Y              S_Y = (ad - bc)/(ad + bc)                                   -1            1
Phi                      PHI            S_PHI = (ad - bc)/sqrt(n_J n_K n_j n_k)                     -1            1
(a) See Sokal and Sneath (1963) for further discussion.
-------
TABLE 17. CONTINGENCY TABLE (2 X 2) DEFINING THE TERMS
a, b, c, AND d AS USED IN THE EQUATIONS IN TABLE 13
                                        Sample k
Sample j           Species present                    Species absent
Species present    a (present in both)                c (present in j, absent from k)
Species absent     b (present in k, absent from j)    d (absent from both; negative match)
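As a worked illustration of how the terms a, b, c, and d feed the presence-absence coefficients, the sketch below computes several of them for two hypothetical samples. The function names and the data are ours; the formulas are the simple matching, Jaccard, Russell and Rao, Dice, Rogers and Tanimoto, and Hamann coefficients as listed in the coefficient table.

```python
def contingency(j, k):
    """Counts a, b, c, d for two 0/1 vectors, following the 2 x 2 table."""
    a = sum(1 for x, y in zip(j, k) if x and y)          # present in both
    b = sum(1 for x, y in zip(j, k) if not x and y)      # present in k, absent from j
    c = sum(1 for x, y in zip(j, k) if x and not y)      # present in j, absent from k
    d = sum(1 for x, y in zip(j, k) if not x and not y)  # absent from both
    return a, b, c, d

def coefficients(a, b, c, d):
    """Selected similarity coefficients; assumes a + b + c > 0."""
    n = a + b + c + d
    return {
        "SSM": (a + d) / n,
        "SJ": a / (a + b + c),
        "RR": a / n,
        "D": 2 * a / (2 * a + b + c),
        "RT": (a + d) / (a + d + 2 * b + 2 * c),
        "H": (a + d - b - c) / n,
    }

# Hypothetical species lists for two samples (1 = present, 0 = absent).
sample_j = [1, 1, 1, 0, 0, 1, 0, 0]
sample_k = [1, 1, 0, 1, 0, 0, 0, 0]
a, b, c, d = contingency(sample_j, sample_k)
result = coefficients(a, b, c, d)
print((a, b, c, d), result)
```

Note that the simple matching coefficient counts negative matches (d) as agreement, whereas Jaccard's coefficient ignores them; this is the main reason the two can rank the same pair of samples differently.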
TABLE 18. EFFECT OF THE TRANSFORMATION LOG (X_ij + 1),
WHERE X_ij IS THE ABUNDANCE OF THE ith SPECIES IN THE jth SAMPLE
Abundance    Value after transformation
0            0
9            1
99           2
999          3
9999         4
TABLE 19. EFFECT OF THE TRANSFORMATION SQRT(X_ij + 0.5), WHERE X_ij
IS THE ABUNDANCE OF THE ith SPECIES IN THE jth SAMPLE
Abundance    Value after transformation
0            0.707
10           3.240
100          10.025
1000         31.631
10000        100.003
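The two transformations can be reproduced with a few lines of code. This is a minimal sketch; the base-10 logarithm is inferred from the tabulated values (an abundance of 9 transforms to 1), and the function names are illustrative.

```python
from math import log10, sqrt

def log_transform(x):
    """log10(X + 1), the transformation illustrated in Table 18."""
    return log10(x + 1)

def sqrt_transform(x):
    """sqrt(X + 0.5), the transformation illustrated in Table 19."""
    return sqrt(x + 0.5)

for x in (0, 9, 99, 999, 9999):
    print(x, round(log_transform(x), 3))

for x in (0, 10, 100, 1000, 10000):
    print(x, round(sqrt_transform(x), 3))
```

Both transformations compress the range of abundant taxa; the logarithm does so much more strongly than the square root, as a comparison of the two tables shows.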
-------
TABLE 20. LABELS OF CORRELATION AND DISTANCE MATRICES
WITH VARIOUS TRANSFORMATIONS
Transformation                                         Correlation    Distance
None (raw data)                                        Corr 1         Dist 1
Standardization by rows                                Corr 2         Dist 2
Log transformation, standardization by rows            Corr 4         Dist 4
Square root transformation, standardization by rows    Corr 6         Dist 6
Square root transformation, no standardization         Corr 7         Dist 7
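The difference between the Dist 6 and Dist 7 variants can be sketched as follows for a Q-mode comparison of two samples. The sketch assumes that the square-root transformation is sqrt(X + 0.5), that "standardization by rows" rescales each taxon (row) to mean 0 and standard deviation 1, and that the distance is the average taxonomic distance sqrt(SUM (X_ij - X_ik)^2 / n). The counts and function names are hypothetical.

```python
from math import sqrt

def sqrt_transform(row):
    """Apply sqrt(X + 0.5) to every value in a row."""
    return [sqrt(x + 0.5) for x in row]

def standardize(row):
    """Rescale a row to mean 0 and standard deviation 1 (assumed definition)."""
    m = sum(row) / len(row)
    sd = sqrt(sum((x - m) ** 2 for x in row) / len(row))
    return [(x - m) / sd for x in row]

def taxonomic_distance(u, v):
    """Average taxonomic distance: sqrt(sum((u_i - v_i)**2) / n)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(u, v)) / len(u))

def column(rows, idx):
    """Extract one sample (column) from a taxa-by-samples matrix."""
    return [r[idx] for r in rows]

# Hypothetical counts: rows are taxa, columns are three samples.
counts = [[0, 10, 100],
          [100, 10, 0],
          [10, 0, 10]]

rows7 = [sqrt_transform(r) for r in counts]   # square root, no standardization
rows6 = [standardize(r) for r in rows7]       # square root, standardized by rows

dist7 = taxonomic_distance(column(rows7, 0), column(rows7, 1))
dist6 = taxonomic_distance(column(rows6, 0), column(rows6, 1))
print(round(dist7, 3), round(dist6, 3))
```

Without standardization the abundant taxa dominate the distance; after standardization every taxon contributes on the same scale.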
95
-------
TABLE 21. MATRIX OF COEFFICIENTS OF COPHENETIC CORRELATION COMPUTED BETWEEN CORRESPONDING
ELEMENTS OF 21 CORRELATION AND SIMILARITY MATRICES(a)
          Corr 1  Corr 2  Corr 4  Corr 6  Corr 7  SSM     UN1     RT      UN3     H       SJ
Corr 1    1.000
Corr 2    0.021   1.000
Corr 4    0.137   0.786   1.000
Corr 6    0.086   0.925   0.943   1.000
Corr 7    0.988   0.055   0.202   0.139   1.000
SSM       0.121   0.494   0.646   0.571   0.171   1.000
UN1       0.116   0.478   0.622   0.550   0.165   0.996   1.000
RT        0.127   0.509   0.668   0.590   0.176   0.994   0.981   1.000
UN3       0.129   0.446   0.600   0.525   0.163   0.759   0.718   0.809   1.000
H         0.121   0.494   0.646   0.571   0.171   1.000   0.996   0.994   0.759   1.000
SJ        0.224   0.283   0.537   0.421   0.282   0.700   0.684   0.712   0.599   0.700   1.000
RR        0.223   0.138   0.348   0.257   0.273   0.424   0.418   0.427   0.338   0.424   0.916
D         0.240   0.253   0.508   0.389   0.300   0.668   0.658   0.673   0.532   0.668   0.987
UN2       0.211   0.302   0.554   0.441   0.266   0.711   0.689   0.732   0.662   0.711   0.989
K1        0.169   0.286   0.507   0.409   0.208   0.611   0.577   0.654   0.805   0.611   0.804
K2        0.336   0.237   0.531   0.386   0.407   0.648   0.636   0.654   0.521   0.648   0.900
UN4       0.289   0.348   0.631   0.491   0.360   0.828   0.817   0.832   0.651   0.828   0.858
OCH       0.282   0.253   0.529   0.397   0.348   0.674   0.663   0.679   0.538   0.674   0.972
UN5       0.251   0.339   0.618   0.484   0.316   0.812   0.796   0.822   0.674   0.812   0.958
Y         0.344   0.211   0.486   0.341   0.415   0.619   0.619   0.608   0.417   0.619   0.678
PHI       0.270   0.351   0.637   0.498   0.341   0.834   0.832   0.837   0.657   0.834   0.895
(a) Abbreviations defined in Table 20.
-------
TABLE 21 (continued)
          RR      D       UN2     K1      K2      UN4     OCH     UN5     Y       PHI
RR        1.000
D         0.910   1.000
UN2       0.889   0.955   1.000
K1        0.705   0.731   0.871   1.000
K2        0.813   0.916   0.870   0.669   1.000
UN4       0.667   0.856   0.845   0.697   0.951   1.000
OCH       0.889   0.987   0.940   0.719   0.968   0.911   1.000
UN5       0.787   0.953   0.943   0.765   0.941   0.957   0.968   1.000
Y         0.546   0.721   0.633   0.445   0.906   0.912   0.809   0.811   1.000
PHI       0.718   0.891   0.880   0.707   0.951   0.995   0.933   0.975   0.886   1.000
-------
TABLE 22. COEFFICIENTS OF COPHENETIC CORRELATION COMPARING
DISTANCE MATRICES AND SELECTED CORRELATION AND SIMILARITY MATRICES(a)
          Dist 1    Dist 2    Dist 4    Dist 6    Dist 7
Dist 1    1.000
Dist 2    0.332     1.000
Dist 4    0.227     0.878     1.000
Dist 6    0.326     0.978     0.951     1.000
Dist 7    0.890     0.416     0.320     0.431     1.000
Corr 6, Corr 7: the values -0.276 and -0.514 appear in these rows; their row and column positions are not recoverable from the source.
SSM       -0.157    -0.412    -0.680    -0.532    -0.190
SJ        0.093     -0.100    -0.358    -0.210    -0.060
(a) Data collected from Cumberland River (Old Hickory Reservoir) in 1973.
98
-------
TABLE 23. NUMBER OF TAXA IN EACH MAJOR TAXONOMIC GROUP IN THE
CLINCH RIVER (1970) BEFORE AND AFTER RELATIVE SPECIES
ABUNDANCE WAS DETERMINED AND RARE TAXA WERE ELIMINATED
                           Number of taxa per group
Major taxonomic group      All taxa    Rare taxa eliminated
Amphipoda                  1
Annelida                   3
Coleoptera                 22          6
Decapoda                   4
Diptera                    13          3
Ephemeroptera              15          8
Gastropoda                 14          5
Hemiptera                  1
Hydracarina                1
Lepidoptera                2
Megaloptera                3           1
Odonata                    14          1
Pelecypoda                 6           1
Plecoptera                 7           2
Trichoptera                17          2
Total                      123         29
Based on a relative abundance (RA) < 0.01.
99
-------
TABLE 24. TWENTY-NINE TAXA AND THEIR RESPECTIVE TROPHIC CODES
FOR THE REDUCED CLINCH RIVER DATA SET, 1970
Taxon                  Trophic code*    Taxon                    Trophic code*
Stenelmis sp.          3411             Tricorythodes sp.        2312
Microcylloepus sp.     3411             Baetis sp.               3424
Optioservus sp.        3411             Anculosa sp.             3411
Dubiraphia sp.         3411             Anculosa subglobosa      3411
Promoresia sp.         3411             Pleuroceridae            3411
Psephenus herricki     3412             Goniobasis spinella      3411
Chironomidae           3524             Goniobasis carinifera    3411
Simulium sp.           2212             Corydalus cornutus       4613
Hemerodromia sp.       4713             Hetaerina sp.            4613
Ephoron sp.            2612             Sphaerium sp.            2212
Potamanthus sp.        2612             Perlesta placida         4624
Stenonema sp.          3424             Acroneuria sp.           4624
Heptagenia sp.         3424             Hydropsyche sp.          2224
Isonychia sp.          2224             Cheumatopsyche sp.       2224
Ephemerella sp.        3424
*Trophic codes are listed in Tables 11 and 12 (section 4.1.3).
100
-------
TABLE 25. NUMBER OF TAXA WITH A RELATIVE ABUNDANCE >0.01
DIVIDED BY THE TOTAL NUMBER OF TAXA PER STATION,
CLINCH RIVER, 1970
Station
number
4
7
8
9
10
11
Early
June
28/64
24/54
24/47
25/48
26/50
26/49
Late
June
24/46
16/28
12/24
19/39
Early
July
23/38
16/25
13/30
19/35
Late
July
27/72
24/43
16/30
17/36
20/39
22/56
Early
August
20/43
19/33
16/29
22/48
Late
August
25/57
25/50
15/29
18/35
21/43
23/44
TABLE 26. TAXA WITH A RELATIVE ABUNDANCE >0.01 AS PERCENT OF
THE TOTAL NUMBER OF TAXA PER STATION, CLINCH RIVER, 1970
Station
number
4
7
8
9
10
11
Early
June
44
44
51
52
52
53
Late
June
52
57
50
49
Early
July
61
64
43
54
Late
July
38
56
53
47
51
39
Early
August
47
58
55
46
Late
August
44
50
52
51
49
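The percentages in Table 26 follow directly from the counts in Table 25. As a check, the sketch below recomputes the Early June column from the Early June ratios; the function name is illustrative, and rounding to the nearest whole percent is assumed.

```python
def percent_retained(retained, total):
    """Taxa with relative abundance > 0.01 as a rounded percentage of all taxa."""
    return round(100 * retained / total)

# Early June entries from Table 25 (retained taxa, total taxa),
# in station order 4, 7, 8, 9, 10, 11.
early_june = [(28, 64), (24, 54), (24, 47), (25, 48), (26, 50), (26, 49)]
percentages = [percent_retained(r, t) for r, t in early_june]
print(percentages)
```

The result matches the Early June column of Table 26 (44, 44, 51, 52, 52, 53).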
101
-------
TABLE 27. TOTAL NUMBER OF ORGANISMS PER STATION,
CLINCH RIVER. 1970
Station
number
4
7
8
9
10
11
Early
June
2693
1830
1465
1738
3857
1645
Late
June
853
703
936
1186
Early
July
989
901
1191
2189
Late
July
3847
1636
2618
5042
5614
2723
Early
August
562
656
1010
2945
Late
August
2522
912
397
812
2351
1557
TABLE 28. TWENTY-NINE TAXA WITH RELATIVE ABUNDANCE >0.01
AS PERCENT OF TOTAL NUMBER OF ORGANISMS PER STATION,
CLINCH RIVER, 1970
Station
number
4
7
8
9
10
11
Early
June
95
95
96
96
97
91
Late
June
90
97
97
97
Early
July
93
97
97
98
Late
July
93
88
98
97
99
96
Early
August
84
89
96
97
Late
August
93
88
82
89
90
86
102
-------
TABLE 29. ANALYTICAL TECHNIQUE, TYPE OF COMPARISON, AND THE
NUMBER AND TYPE OF SIMILARITY COEFFICIENTS USED TO ANALYZE
THE REDUCED 1970 CLINCH RIVER DATA SET
                       Cluster analysis(a)    Nonmetric multidimensional scaling(b)
                       Q-mode    R-mode       Q-mode    R-mode
Unaffected stations    -         6            -         -
Affected stations      -         6            -         -
All stations           6         6            2         -
Total dendrograms = 24                        Total scatter diagrams = 2
(a) Six coefficients were used: two correlation, two
distance, and two presence-absence coefficients.
(b) Two coefficients were used: one distance and one
presence-absence coefficient.
103
-------
TABLE 30. COPHENETIC CORRELATION VALUES (rcc) FOR 24 DENDROGRAMS
COMPUTED FROM THE CLINCH RIVER DATA SET, JUNE TO AUGUST 1970
                       SJ      SSM     Corr 6   Corr 7   Dist 6   Dist 7
R-mode
Affected stations      0.97    0.84    0.73     0.63     0.76     0.97
Unaffected stations    0.97    0.91    0.63     0.63     0.69     0.92
All stations           0.95    0.76    0.75     0.75     0.78     0.97
Q-mode
All stations           0.93    0.77    0.82     0.70     0.91     0.80
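The screening of dendrograms by cophenetic correlation can be expressed as a simple filter. The sketch below uses the R-mode values from the table above, reading the six columns as SJ, SSM, Corr 6, Corr 7, Dist 6, and Dist 7 (an ordering inferred from the rcc values quoted in the captions of Figures 23 to 30); the names `COEFFS`, `R_MODE`, and `acceptable` are illustrative.

```python
# R-mode cophenetic correlations; column order is an inference (see lead-in).
COEFFS = ["SJ", "SSM", "Corr 6", "Corr 7", "Dist 6", "Dist 7"]
R_MODE = {
    "Affected stations":   [0.97, 0.84, 0.73, 0.63, 0.76, 0.97],
    "Unaffected stations": [0.97, 0.91, 0.63, 0.63, 0.69, 0.92],
    "All stations":        [0.95, 0.76, 0.75, 0.75, 0.78, 0.97],
}

def acceptable(table, threshold=0.8):
    """Return (subset, coefficient, r_cc) for dendrograms above the threshold."""
    return [(subset, name, r)
            for subset, values in table.items()
            for name, r in zip(COEFFS, values)
            if r > threshold]

kept = acceptable(R_MODE)
for subset, name, r in kept:
    print(f"{subset}: {name} (rcc = {r})")
```

Eight R-mode dendrograms pass the 0.8 criterion, which agrees with the eight dendrograms shown in Figures 23 to 30.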
104
-------
TABLE 31. RESULTS OF Q-MODE CLUSTER ANALYSIS OF ZOOMACROBENTHIC
SAMPLES FROM THE CLINCH RIVER, 1970: MINIMUM LEVEL OF
SIMILARITY USED TO DEFINE CLUSTERS = 0.62; JACCARD'S COEFFICIENT
Station
Sampling Interval 4 7LB&MS 7RB 8 9 10 11
Cluster 1
Before pH stress XX XXX
Immediately after pH stress X X
2 weeks after X X
4 weeks after XX XXX
6 weeks after XXX
8 weeks after X X X X X
Cluster 2
Before pH stress
Immediately after pH stress XXX
2 weeks after X X
4 weeks after X
6 weeks after
8 weeks after
Unclustered stations
Before pH stress X
Immediately after pH stress X
2 weeks after X
4 weeks after X
6 weeks after X X
8 weeks after X X
-------
TABLE 32. RESULTS OF Q-MODE CLUSTER ANALYSIS OF ZOOMACROBENTHIC
SAMPLES FROM THE CLINCH RIVER, 1970: LEVEL OF SIMILARITY
USED TO DEFINE CLUSTERS = 0.06; CORRELATION COEFFICIENT 6,
√(Y + 0.5) TRANSFORMATION AND STANDARDIZATION BY ROWS
Station
Date 4 7LB&MS 7RB 8 9 10 11
Cluster 1
Before pH stress XXX
Immediately after pH stress X X
2 weeks after X X
4 weeks after XXX
6 weeks after X X
8 weeks after XXX
Cluster 2
Before pH stress X
Immediately after pH stress X X
2 weeks after X X
4 weeks after
6 weeks after
8 weeks after
Cluster 3
Before pH stress
Immediately after pH stress
2 weeks after X
4 weeks after X X X X
6 weeks after XX X
8 weeks after X X X X
Cluster 4
Before pH stress X X
Immediately after pH stress X
2 weeks after
4 weeks after
6 weeks after
8 weeks after
-------
TABLE 33. RESULTS OF Q-MODE CLUSTER ANALYSIS OF ZOOMACROBENTHIC
SAMPLES FROM THE CLINCH RIVER, 1970: LEVEL OF SIMILARITY
USED TO DEFINE CLUSTERS = 1.2; DISTANCE COEFFICIENT 6,
√(Y + 0.5) TRANSFORMATION AND STANDARDIZATION BY ROWS
Station
Date 4 7LB&MS 7RB 8 9 10 11
Cluster 1
Before pH stress
Immediately after pH stress
2 weeks after
4 weeks after
6 weeks after
8 weeks after
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
Clusters 2 and 4 and unclustered stations (U)
Before pH stress U
Immediately after pH stress
2 weeks after
4 weeks after
6 weeks after
8 weeks after
2
2
2
2
U
U
4
4
Cluster 3
Before pH stress
Immediately after pH stress
2 weeks after
4 weeks after
6 weeks after
8 weeks after
X
X
X
X
-------
TABLE 34. RESULTS OF Q-MODE CLUSTER ANALYSIS OF ZOOMACROBENTHIC
SAMPLES FROM THE CLINCH RIVER, 1970: LEVEL OF SIMILARITY
USED TO DEFINE CLUSTERS = 4.8; DISTANCE COEFFICIENT 7,
√(Y + 0.5) TRANSFORMATION AND STANDARDIZATION BY ROWS
Station
Date 4 7LB&MS 7RB 8 9 10 11
Cluster 1
Before pH stress XX XXX
Immediately after pH stress
2 weeks after
4 weeks after X
6 weeks after X
8 weeks after X X
Cluster 2
Before pH stress X
Immediately after pH stress X X X X X
2 weeks after X XXX
4 weeks after X X
6 weeks after X XXX
8 weeks after X XXX
Cluster 3 and unclustered stations
Before pH stress
Immediately after pH stress
2 weeks after X
4 weeks after X X U U
6 weeks after
8 weeks after
-------
TABLE 35. SUMMARY OF RESULTS OF Q-MODE CLUSTER ANALYSES, CUMBERLAND RIVER, 1973
Coefficients
Characteristics
rcc
Similarity level
of clusters
Interpretable
structure
Highest
associations
Principal causes
of association
(in order of
importance)
SJ
0.85
1
good
4 Oct
5 Oct
3 Jun
6 Jun
1 Jul
5 Jul
1 Oct
6 Oct
6 Aug
4 Oct
5 Oct
Station
location and
month sampled
SSM
0.64
3
good
2 Jun
6 Jul
3 Jun
6 Jun
4 Oct
5 Oct
1 Oct
6 Oct
1 Jul
5 Jul
Station
location
and month
sampled
Dist 6
0.62
1
poor
6 Jul
2 Aug
2 Oct
6 Jul
2 Jul
2 Oct
6 Jul
2 Aug
2 Jul
2 Oct
6 Jul
2 Aug
3 Sep
1 Oct
6 Oct
Station
location
Dist 7
0.83
2
good
1 Oct
6 Oct
2 Jul
2 Oct
6 Jul
2 Oct
3 Jul
5 Jul
4 Aug
5 Oct
Station
location and
month sampled
Corr 6
0.62
1
poor
1 Oct
6 Oct
6 Jul
2 Aug
1 Jun
2 Jun
4 Oct
5 Oct
3 Jun
5 Aug
Station
location and
month sampled
Corr 7
0.98
3
fair
3 Jul
5 Jul
1 Oct
6 Oct
3 Jun
6 Jul
4 Oct
5 Oct
2 Aug
5 Aug
Month
sampled and
station
location
as indicated
by analysis of
biological and
chemical data
-------
TABLE 35 (continued)
Characteristic: principal causes of association as indicated solely by the
highest biological associations
Coefficient   Principal causes
SJ            Month sampled and station location
SSM           Month sampled and station location
Dist 6        Station location
Dist 7        Station location and month sampled
Corr 6        Month sampled and station location
Corr 7        Month sampled and station location
a1 = distinct clusters form only at relatively low similarity levels;
2 = distinct clusters form at intermediate similarity levels;
3 = distinct clusters form at relatively high similarity levels.
bResults of cluster analysis were weighed heavily in this evaluation.
cAssociations listed in characteristic four of this table.
-------
TABLE 36. SUMMARY OF RESULTS OF Q-MODE CLUSTER ANALYSES, CUMBERLAND RIVER, 1975
Coefficients
Characteristics
rcc
Similarity level
of clusters
Interpretable
structure
Highest
associations
SJ
0.88
1
good
3 Jun
3 Jul
4 Jun
4 Jul
5 Jun
5 Jul
6 Jun
6 Jul
7 Jun
7 Jul
2 May
5 May
SSM
0.70
2
good
3 Jun
3 Jul
4 Jun
4 Jul
5 Jun
5 Jul
6 Jun
6 Jul
7 Jun
7 Jul
2 May
5 May
Dist 6
0.95
2
poor
3 Jun
3 Jul
4 Jun
4 Jul
5 Jun
5 Jul
6 Jun
6 Jul
7 Jun
7 Jul
3 May
6 May
Dist 7
0.84
2
fair
3 Jun
3 Jul
4 Jun
4 Jul
5 Jun
5 Jul
6 Jun
6 Jul
7 Jun
7 Jul
3 May
6 May
Corr 6
0.73
1
poor
3 Jun
3 Jul
4 Jun
4 Jul
5 Jun
5 Jul
6 Jun
6 Jul
7 Jun
7 Jul
3 May
3 Aug
Corr 7
0.74
2
poor
3 Jun
3 Jul
4 Jun
4 Jul
5 Jun
5 Jul
6 Jun
6 Jul
7 Jun
7 Jul
1 May
3 May
-------
TABLE 36 (continued)
Coefficients
Characteristics Sow
SM
Principal cause(s) Month in
of association which col-
(in order of lected
importance) as
indicated by
analysis of both
biological and
physicochemical
data
Principal cause(s) Station
of association location
(in order of and month
importance) as in which
indicated solely collected
by the highest
biological
associations
Dist
Month in
which col-
lected
and station
location
Station
location
and month
in which
collected
6 Dist
Month in
which col-
lected and
station
location
Station
location
and month
in which
collected
7 Corr
Month in
which col-
lected
and
station
location
Station
location
and month
in which
collected
6
Month in
which col-
lected and
station
location
Station
location
and month
in which
collected
Corr 7
Month in which
collected and
station loca-
tion
Station location
and month in
which collected
a1 = distinct clusters form only at relatively low similarity levels;
2 = intermediate;
3 = distinct clusters form at relatively high similarity levels.
bResults of cluster analysis were included in this evaluation.
cAssociations listed in characteristic four of this table.
-------
TABLE 37. RESULTS OF R-MODE CLUSTER ANALYSIS OF JACCARD'S (SJ) AND SIMPLE-MATCHING (SSM)
COEFFICIENTS, CLINCH RIVER DATA SET, 1970
CATEGORY OF STATIONS INCLUDED (COEFFICIENT, SIMILARITY LEVEL AT WHICH CLUSTERS FORMED)
Affected
(SJ, 0.88)
Affected
(SSM, 0.88)
All stations
(SJ, 0.77)
Affected
(SSM, 0.71)
Affected
(SJ, 0.58)
Stenelmis
Optioservus
Chironomidae
Corydalus cornutus
Stenelmis
Optioservus
Microcylloepus
Corydalus cornutus
Chironomidae
Stenelmis
Optioservus
Dubiraphia
Hydropsyche
Chironomidae
Corydalus cornutus
Cheumatopsyche
Acroneuria
Microcylloepus
Promoresia
Hemerodromia
Stenelmis
Optioservus
Microcylloepus
Corydalus cornutus
Chironomidae
Dubiraphia
Hemerodromia
Hydropsyche
Cheumatopsyche
Promoresia
Acroneuria
Isonychia
Baetis
Stenelmis
Optioservus
Chironomidae
Corydalus cornutus
Microcylloepus
Cheumatopsyche
Dubiraphia
Hemerodromia
Hydropsyche
Promoresia
Acroneuria
Goniobasis spinella
Isonychia
Baetis
Tricorythodes
Hetaerina
Microcylloepus
Cheumatopsyche
Dubiraphia
Hemerodromia
Dubiraphia
Hemerodromia
Hydropsyche
Cheumatopsyche
Isonychia
Baetis
Ephemerella
Goniobasis carinifera
Tricorythodes
Hetaerina
Ephemerella
Goniobasis carinifera
Tricorythodes
Hetaerina
Ephemerella Simulium
Goniobasis carinifera Anculosa
Tricorythodes
Hetaerina
-------
TABLE 38. RESULTS OF R-MODE CLUSTER ANALYSIS OF DISTANCE COEFFICIENTS, CLINCH RIVER DATA SET, 1970
CATEGORY OF STATIONS INCLUDED (COEFFICIENT, SIMILARITY LEVEL AT WHICH CLUSTERS FORMED)
Affected
(Dist 7, 1.8)
Affected
(Dist 7, 5.3)
All stations
(Dist 7, 4.2)
Unaffected
(Dist 7, 2.3)
Promoresia
Hemerodromia
Psephenus herricki
Potamanthus
Anculosa subglobosa
Goniobasis carinifera
Anculosa
Goniobasis spinella
Stenonema
Perlesta placida
Ephemerella
Tricorythodes
Hetaerina
Stenelmis
Optioservus
Microcylloepus
Corydalus cornutus
Baetis
Chironomidae
Dubiraphia
Promoresia
Hemerodromia
Psephenus herricki
Potamanthus
Anculosa subglobosa
Goniobasis carinifera
Anculosa
Goniobasis spinella
Stenonema
Microcylloepus
Corydalus cornutus
Chironomidae
Baetis
Promoresia
Hemerodromia
Psephenus herricki
Anculosa subglobosa
Pleuroceridae
Sphaerium
Goniobasis spinella
Heptagenia
Hetaerina
Acroneuria
Perlesta placida
Ephemerella
Tricorythodes
Ephoron
Microcylloepus
Hemerodromia
Hetaerina
Promoresia
Pleuroceridae
Sphaerium
Psephenus herricki
Anculosa subglobosa
Goniobasis spinella
Acroneuria
Ephemerella
Corydalus cornutus
-------
TABLE 38 (continued)
Affected Affected All stations Unaffected
(Dist 7, 1.8) (Dist 7, 5.3) (Dist 7, 4.2) (Dist 7, 2.3)
Perlesta placida Potamanthus
Ephemerella
Tricorythodes
Hetaerina Stenonema
Acroneuria Anculosa
Goniobasis carinifera
-------
TABLE 39. RESULTS OF R-MODE CLUSTER ANALYSIS OF SSM AND SJ,
CLINCH RIVER STATIONS UNAFFECTED BY THE 1970 pH STRESS
CATEGORY OF STATIONS INCLUDED (COEFFICIENT, SIMILARITY LEVEL AT WHICH CLUSTERS FORMED)
Unaffected (SSM, 0.89) Unaffected (SJ, 0.87)
Stenelmis Stenelmis
Optioservus Optioservus
Dubiraphia Dubiraphia
Ephemerella Ephemerella
Baetis Baetis
Hydropsyche Hydropsyche
Psephenus herricki Psephenus herricki
Goniobasis spinella Goniobasis spinella
Stenonema Stenonema
Isonychia Isonychia
Goniobasis carinifera Goniobasis carinifera
Acroneuria Acroneuria
Tricorythodes Tricorythodes
Chironomidae Chironomidae
Corydalus cornutus Corydalus cornutus
Cheumatopsyche Cheumatopsyche
Potamanthus Potamanthus
Anculosa Anculosa
Anculosa subglobosa Anculosa subglobosa
-------
TABLE 40. TROPHIC-FUNCTIONAL CODES FOR TAXA CLUSTERED BY R-MODE CLUSTER ANALYSIS OF
JACCARD'S (SJ) AND SIMPLE-MATCHING (SSM) COEFFICIENTS, CLINCH RIVER DATA SET, 1970
CATEGORY OF STATIONS INCLUDED (COEFFICIENT, SIMILARITY LEVEL AT WHICH CLUSTERS FORMED)
Affected
(SJ, 0.88)
3524
4613
2224
3411
4713
Affected
(SSM, 0.88)
3411
3524
4613
2224
3411
4713
2312
3424
All stations
(Sj, 0.77)
2224
3411
3524
4613
4624
3411
4713
2224
3424
3411
3424
2313
4613
Affected
(SSM, 0.71)
2224
3411
3524
4613
4624
4713
2224
3424
2312
3411
3424
4613
2212
3411
Affected
(Sj, 0.58)
2224
3411
3524
4613
4624
4713
2224
2312
3424
4613
3411
3424
-------
TABLE 41. TROPHIC-FUNCTIONAL CODES FOR TAXA CLUSTERED BY R-MODE CLUSTER ANALYSIS
OF DIST 7 COEFFICIENTS CLINCH RIVER DATA SET, 1970
CATEGORY OF STATIONS INCLUDED (COEFFICIENT, SIMILARITY LEVEL AT WHICH CLUSTERS FORMED)
Affected
(Dist 7, 1.8)
3411
4713
2312
2612
3411
3412
3424
4613
4624
Affected
(Dist 7, 5.3)
3411
2312
2612
3411
3412
3424
3524
4613
4624
4713
All stations
(Dist 7, 4.2)
3411
3424
3524
4613
2212
2312
2612
3411
3412
3424
4613
4624
4713
3411
3424
Unaffected
(Dist 7, 2.3)
2212
3411
3412
4613
4624
4713
3424
4613
-------
TABLE 42. TROPHIC-FUNCTIONAL CODES FOR TAXA
CLUSTERED BY R-MODE CLUSTER ANALYSIS OF
SIMPLE-MATCHING (SSM) AND JACCARD'S (SJ)
COEFFICIENTS, CLINCH RIVER, 1970
Unaffected^a (SJ^b, 0.87^c and SSM^b, 0.89^c)
2224
3411
3412
3424
2224
2312
3411
3424
4624
2224
2612
3524
4613
3411
aStations unaffected by spill.
bClustering coefficient utilized.
cSimilarity level at which clusters formed.
-------
TABLE 43. RESULTS OF R-MODE CLUSTER ANALYSIS OF SJ AND SSM AFTER REORDERING CLUSTERS
ACCORDING TO TROPHIC-FUNCTIONAL CODES, CLINCH RIVER, 1970
CATEGORY OF STATIONS INCLUDED (COEFFICIENT, SIMILARITY LEVEL AT WHICH CLUSTERS FORMED)
Affected
(SJ, 0.88)
Optioservus
Stenelmis
Chironomidae
Corydalus cornutus
Cheumatopsyche
Hydropsyche
Dubiraphia
Microcylloepus
Hemerodromia
Affected
(SSM, 0.88)
Optioservus
Stenelmis
Microcylloepus
Chironomidae
Corydalus cornutus
Cheumatopsyche
Hydropsyche
Dubiraphia
Hemerodromia
All stations
(Sj, 0.77)
Cheumatopsyche
Hydropsyche
Dubiraphia
Optioservus
Stenelmis
Chironomidae
Corydalus cornutus
Acroneuria
Microcylloepus
Promoresia
Hemerodromia
Isonychia
Baetis
Affected
(SSM, 0.71)
Cheumatopsyche
Hydropsyche
Dubiraphia
Microcylloepus
Optioservus
Promoresia
Stenelmis
Chironomidae
Corydalus cornutus
Acroneuria
Hemerodromia
Isonychia
Baetis
Affected
(Sj, 0.58)
Cheumatopsyche
Hydropsyche
Dubiraphia
Microcylloepus
Optioservus
Promoresia
Stenelmis
Goniobasis spinella
Chironomidae
Corydalus cornutus
Acroneuria
Hemerodromia
Isonychia
Tricorythodes
Baetis
Hetaerina
-------
TABLE 43 (continued)
Affected
(SJ, 0.88)
Affected
(SSM, 0.88)
All stations
(SJ, 0.77)
Affected
(SSM, 0.71)
Affected
(SJ, 0.58)
Goniobasis carinifera
Ephemerella
Tricorythodes
Goniobasis carinifera
Ephemerella
Hetaerina
Goniobasis carinifera
Ephemerella
Tricorythodes
Hetaerina
Tricorythodes
Hetaerina
Simulium
Anculosa
-------
TABLE 44. RESULTS OF R-MODE CLUSTER ANALYSIS OF DIST 7 COEFFICIENTS AFTER REORDERING
CLUSTERS ACCORDING TO TROPHIC-FUNCTIONAL CODES, CLINCH RIVER, 1970
CATEGORY OF STATIONS INCLUDED (COEFFICIENT, SIMILARITY LEVEL AT WHICH CLUSTERS FORMED)
Affected
(Dist 7, 1.8)
Affected
(Dist 7, 5.3)
All stations
(Dist 7, 4.2)
Unaffected
(Dist 7, 2.3)
Promoresia
Hemerodromia
Tricorythodes
Potamanthus
Anculosa
Anculosa subglobosa
Goniobasis carinifera
Goniobasis spinella
Psephenus herricki
Ephemerella
Stenonema
Hetaerina
Perlesta placida
Optioservus
Stenelmis
Tricorythodes
Potamanthus
Dubiraphia
Microcylloepus
Promoresia
Anculosa
Anculosa subglobosa
Goniobasis carinifera
Goniobasis spinella
Psephenus herricki
Baetis
Ephemerella
Stenonema
Chironomidae
Corydalus cornutus
Hetaerina
Acroneuria
Perlesta placida
Hemerodromia
Microcylloepus
Baetis
Chironomidae
Corydalus cornutus
Sphaerium
Tricorythodes
Ephoron
Potamanthus
Promoresia
Anculosa subglobosa
Goniobasis spinella
Pleuroceridae
Psephenus herricki
Ephemerella
Heptagenia
Hetaerina
Acroneuria
Perlesta placida
Hemerodromia
Sphaerium
Microcylloepus
Promoresia
Anculosa subglobosa
Goniobasis spinella
Pleuroceridae
Psephenus herricki
Hetaerina
Acroneuria
Hemerodromia
Ephemerella
Corydalus cornutus
-------
TABLE 45. CLUSTERS OF TAXA DEFINED BY R-MODE CLUSTER ANALYSIS OF JACCARD'S AND THE
SIMPLE-MATCHING COEFFICIENTS, CLINCH RIVER, 1970
Unaffected stations Affected stations
Stenelmis Stenelmis
Optioservus Optioservus
Dubiraphia
Ephemerella Hydropsyche
Baetis Cheumatopsyche
Hydropsyche Dubiraphia
Psephenus herricki
Goniobasis spinella Isonychia
Baetis
Stenonema
Isonychia Goniobasis carinifera
Acroneuria Ephemerella
Tricorythodes
Chironomidae Chironomidae
Corydalus cornutus Corydalus cornutus
Cheumatopsyche
Potamanthus Tricorythodes
Hetaerina
Anculosa
Anculosa subglobosa
-------
SECTION 6
ORDINATION--NONMETRIC MULTIDIMENSIONAL SCALING
6.1 GENERAL DESCRIPTION
Ordination is an analytical technique often used when dealing with large
sets of data. In aquatic ecology, ordination involves plotting either the
samples or the species found in the samples in a two- or three-dimensional
diagram. The choice of axes on which the points are plotted depends on the
method of ordination selected. In most ordinations, either samples are
plotted in a space where the axes are species or species are plotted in a
space defined by the samples. The primary advantage of ordination over
cluster analysis is that it "allows one to examine a scatter diagram
displaying a summary of the structure of the data without having to first
assume that clusters are present" (Rohlf, 1970). In this sense, ordination is
a powerful alternative to cluster analysis.
Several techniques of ordination are available, including principal
component ordination, principal coordinate ordination, the Bray-Curtis polar
ordination technique (Whittaker, 1975), and nonmetric multidimensional
scaling (MDS) (Kruskal, 1964a and 1964b). Only nonmetric MDS is dealt with in
this study because it seems ideally suited to the kinds of data obtained from
biological surveys. Specifically, nonmetric MDS is computationally robust
when data are missing; it can accommodate quantitative, ranked, or
presence-absence data; and it can be used with any kind of measure of
correlation, similarity, or distance.
6.2 ANALYTICAL PROCEDURES
In nonmetric MDS, similarities or distances between samples are treated
on the ordinal scale; that is, they are ordered from smallest to largest. A
configuration is then found in which "rank order of (ratio scaled) distances
best produces the original input ranks. One tries to do this in the lowest
dimensionality that produces a 'close enough' ordinal fit" (Green and
Carmone, 1970). The important point is that the MDS program operates on
ranked similarities and distances rather than on actual similarities. Thus,
nonmetric MDS is perfectly applicable to matrices of similarity and distance
computed from ranked data or even presence-absence data.
For a specified number of dimensions chosen in advance by the
investigator, the computer programs "try to find a configuration of points
whose interpoint distances are monotone—that is, have the same (or possibly
the inverse) ranks as the input data" (Green and Carmone, 1970). The
coordinates of this new configuration are the values used in the ordination.
In practice, perfect configurations are unusual. The measure of departure
from monotonicity is called stress. The higher the stress, the less nearly
perfect the degree of monotonicity. In general, the stress decreases with the
number of dimensions and the number of iterations used in the analysis.
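
The idea of stress can be made concrete with a short sketch. The code below is an illustration only, not the program used in this study: it assumes Kruskal's (1964a) stress formula 1, Euclidean interpoint distances, and a pool-adjacent-violators monotone regression, and all function names are hypothetical.

```python
import math


def pairwise_distances(points):
    """Euclidean distances between all pairs (i < j) of configuration points."""
    n = len(points)
    return [math.dist(points[i], points[j])
            for i in range(n) for j in range(i + 1, n)]


def monotone_fit(values):
    """Pool-adjacent-violators: least-squares nondecreasing fit to a sequence."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the block means break the nondecreasing order.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def kruskal_stress(dissimilarities, points):
    """Stress (formula 1) of a configuration against the input dissimilarities.

    `dissimilarities` holds one value per pair (i < j), in the same order as
    pairwise_distances(points). Only their rank order enters the calculation.
    """
    d = pairwise_distances(points)
    order = sorted(range(len(d)), key=lambda k: dissimilarities[k])
    d_sorted = [d[k] for k in order]      # distances in order of input ranks
    d_hat = monotone_fit(d_sorted)        # closest monotone sequence
    numerator = sum((a - b) ** 2 for a, b in zip(d_sorted, d_hat))
    denominator = sum(a ** 2 for a in d_sorted)
    return math.sqrt(numerator / denominator)
```

A configuration whose distances have exactly the same ranks as the input dissimilarities yields a stress of zero. Any rank reversal raises the value.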
-------
6.3 RESULTS
6.3.1 Q-Mode Analysis—Clinch River Data
6.3.1.1 Presence-Absence Data—
Two applications of nonmetric MDS, one to presence-absence data and one to
quantitative data, were tested as alternatives to cluster analysis. Figure 36 shows
the Q-mode ordination of presence-absence data from 36 zoomacrobenthic samples
from the Clinch River. In this three-dimensional display, one large, fairly
tightly grouped set of 16 samples lies near the right-center portion of the
diagram, indicating the faunal similarity of these samples. This group of
samples is quite similar to the first cluster in Table 31, which shows the
results of cluster analysis of the matrix of Jaccard's coefficients (SJ).
Samples outside the main group in Figure 36 are from station 8, six and eight
weeks after the spill; station 9, four, six, and eight weeks after the spill;
and station 10, two weeks after the spill.
Samples collected at stations 8, 9, and 10 after the spill tended to form
individual groupings near the upper center of Figure 36. For example, samples
collected at station 9 four, six, and eight weeks after the spill grouped
together. Since the biological communities at these stations were exposed to
different pH conditions for varying lengths of time, they were expected to
stand alone, not to cluster together.
Samples from substation 7RB did not form any distinct pattern, but were
scattered along the left side of Figure 36. This agreed with the results
obtained from Q-mode clustering, where samples from 7RB did not cluster at the
0.62 level of similarity (see Table 31). Samples from 7LB&MS, collected
six and eight weeks after the spill, were also widely separated. The lack of
similarity of these samples with the other 34 samples also agreed with the
Q-mode clustering results presented in Table 31.
6.3.1.2 Quantitative Data (Species Counts)—
When species counts were used in nonmetric MDS, the 36 zoomacrobenthic
samples appeared to be randomly scattered (Figure 37). Only one recognizable
group of stations was present. This cluster, near the upper left margin of
the diagram, consisted primarily of samples from substation 7RB. Two samples
from 7LB&MS, collected six and eight weeks after the spill, were also in this
cluster, indicating that the left bank and midstream sections of station 7
were similar to the right bank.
The results of cluster analysis of the distance coefficient Dist 7
(Table 34) were compared with Figure 37 to determine any similarities between
clustering and the ordination of quantitative data. At a level of similarity
of 4.8, three clusters were formed in the Dist 7 dendrogram: (1) 11 samples
from either the control stations (stations 4 and 11) or stations 8, 9, and 10
before the spill; (2) 20 samples from stations 7 through 10, including station
7LB&MS; and (3) samples from stations 8 and 9, four weeks after the spill, and
station 10, two weeks after the spill. Figure 37 has only one poorly defined
group of nine stations: station 8, immediately after the spill and two and
eight weeks after the spill; station 9, immediately after the spill and two,
six, and eight weeks after the spill; station 7LB&MS, immediately after the
spill and two weeks after the spill; and station 11, six weeks after the
spill. It could be equated with the second cluster in Table 34, but only in a
general sense. No other clusters were found in Figure 37 that would compare
favorably with those noted in Table 34, indicating that cluster analysis
tended to group stations that nonmetric MDS did not.
-------
Figure 36. Three-dimensional ordination by nonmetric multidimensional scaling computed from distance
coefficients based on presence-absence data, showing faunal similarities between samples
collected from the Clinch River in 1970.
Legend (sample numbers): 27-32, before spill; 33-36, immediately after spill; 47-50, 2 weeks after;
51-56, 4 weeks after; 58-61, 6 weeks after; 62-67, 8 weeks after.
NOTE: 7R = 7RB; 7LM = 7LB&MS
-------
Figure 37. Three-dimensional ordination by nonmetric multidimensional scaling computed from distance
coefficients based on counts of species, showing faunal similarities between samples collected
from the Clinch River in 1970.
Legend (sample numbers): 27-32, before spill; 33-36, immediately after spill; 47-50, 2 weeks after;
51-56, 4 weeks after; 58-61, 6 weeks after; 62-67, 8 weeks after.
NOTE: 7R = 7RB; 7LM = 7LB&MS
-------
SECTION 7
DIVERSITY INDICES
7.1 GENERAL DESCRIPTION
The third kind of analysis of community structure considered was the use
of indices of species diversity and indices of diversity at higher taxonomic
levels. A diversity index is a statistic that combines information on both
the number of species in an assemblage and the evenness of the distribution of
individuals among those species (Pielou, 1969). A high index of species
diversity results from the presence of many species with nearly even
abundances; a low index results from a few species in an assemblage dominated
by one species. A diversity index of intermediate value can result either
from a few species with nearly even distributions or from many species with
uneven distributions. Therein lies the method's greatest weakness: a given
value of species diversity can result from assemblages with quite
different distributions of species. A second weakness is that, unlike cluster
analysis and ordination, which operate on similarity coefficients, diversity
indices are not affected by the species present, only by their numbers. Thus,
although quite different communities can be compared, equal diversity does not
imply equal tolerance to potential environmental impacts.
Failure to recognize the importance of these two inherent weaknesses has
led to widespread misuse of diversity indices and misunderstanding of the
concept of diversity. As a result, an extensive literature exists on the
disadvantages of their use and the drawbacks of the concept (see e.g.,
Hamilton, 1975; Hedgpeth, 1973). Diversity indices remain, however, as one
tool available for ecologists. Like any tool they can be misused, and getting
the most out of them requires an understanding of their limitations. Like
many other tools, they are most valuable when used in conjunction with other
information or tools. The concept of diversity has been widely applied in all
branches of ecology (Woodwell, 1970), and its use for evaluating potentially
impacted communities in applied aquatic ecology is particularly noteworthy.
Former EPA administrator R. E. Train (1973) has stated that "for top
management and general public policy development, monitoring data must be
shaped into easy-to-understand indices that aggregate data into understandable
forms. Failure to do so will result in suboptimum achievement of goals at
much greater expense." Whitten (1975) has concluded that "the most suitable
means of analyzing community structure for the purpose of pollution assessment
appears to be the diversity index."
Although most of the theory of diversity of communities has been based on
species diversity, applied ecologists often have to enumerate organisms at
higher taxonomic levels because of the difficulty of identifying species (see
earlier discussions, section 5.3.2.1) and, consequently, the high cost of
doing so. Results of such analyses, while internally consistent within a
study, are by no means comparable from study to study. Therefore, the search
for absolute values of diversity is unrealistic and inconsistent with the
purpose of applied aquatic ecology. Nonetheless, in the examples that follow,
we continue to refer to species diversity as a concept although the diversity
values are in fact sometimes based on undifferentiated higher taxa, especially
for such groups as oligochaetes and chironomids.
7.2 ANALYTICAL PROCEDURES
Measures of species diversity can be grouped into four general types:
(1) species richness; (2) indices that assume a particular distribution, for
example Williams' α index (Fisher et al., 1943); (3) indices measuring
probability of encounter, such as Simpson's (1949) index and the sequential
comparison index (SCI) (Cairns et al., 1968); and (4) indices adapted from
information theory, such as Shannon's index (Shannon and Weaver, 1949), Brillouin's
index (Brillouin, 1962), and the approximate index (e.g., Wilhm and
Dorris, 1968).
Species richness (S) is simply the number of species present in a sample
or community. This measure of diversity has the same drawbacks as the use of
presence-absence data in cluster analysis and ordination. It makes no
distinction, for example, between a sample with 1 individual each of species A
and B, a sample with 1 individual of species A and 1000 of species B, and a
sample with 1000 of each species. Moreover, it is highly dependent on sample
size.
Dennison and Hay (1967) presented an equation that allows one to compute
the number of individuals one must collect to be certain (with a given
probability of error) of collecting species that comprise a given proportion
of the community.
$$N = \frac{\log E}{\log(1 - p)}$$
where E = the probability of error the investigator is willing to accept,
p = the proportion each target species comprises of the community, and N = the
number of individuals to be collected. For example, to be 95 percent certain
(i.e., 5 percent error: E = 0.05) of collecting species that comprise 1
percent of the community (p = 0.01),
$$N = \frac{\log 0.05}{\log 0.99} \approx 298,$$
so roughly 300 individuals must be collected.
The use of this equation shows that rather large samples are needed to
assure that even relatively common species are not missed by chance alone and
casts doubt on the value of species richness as a measure of species diversity
in applied aquatic ecology.
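
The required sample size is easy to tabulate for other proportions. The following is a minimal sketch of the Dennison and Hay equation as given above; the function name and the rounding up to a whole individual are assumptions made for illustration.

```python
import math


def individuals_needed(error, proportion):
    """N = log(E) / log(1 - p), rounded up to a whole number of individuals."""
    if not (0.0 < error < 1.0 and 0.0 < proportion < 1.0):
        raise ValueError("error and proportion must lie strictly between 0 and 1")
    return math.ceil(math.log(error) / math.log(1.0 - proportion))


# Sample sizes for a 5 percent probability of error and progressively
# rarer target species.
for p in (0.10, 0.05, 0.01):
    print(p, individuals_needed(0.05, p))
```

The ratio does not depend on the base of the logarithm, so natural logarithms are used here for convenience.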
Williams' α index (Fisher et al., 1943) is based on the assumption that
the abundances of organisms in the community being studied fit a logarithmic
series. Williams (1950) proposed that the parameter of such distributions be
used as an index of diversity of the community. The parameter α is an
intrinsic property of communities, unaffected by sample size, that is
proportional to the number of species present:
$$S = \alpha \log_e\left(1 + \frac{N}{\alpha}\right)$$
where α = the parameter of the logarithmic series used to measure species
diversity, N = the number of individuals in the sample, and S = the number of
species in the sample. Thus, although α is a property of the community that
is independent of sample size, since its estimation is based on number of
individuals and species in a sample, the size of the sample is of paramount
importance. This index can be applied only if species abundance fits a
logarithmic series, which may be impossible to determine when only a few
species are present. Thus, as Pielou (1969) has pointed out, "α is unsuitable
as a diversity index unless the collection at hand has many species and also
unless abundances form a logarithmic series."
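
Because the logarithmic-series equation cannot be rearranged to give α directly, α has to be found numerically from S and N. The sketch below uses simple bisection; it is one possible approach rather than the procedure used in this study, and the function names and tolerance are hypothetical.

```python
import math


def expected_species(alpha, n_individuals):
    """Right-hand side of the log-series relation: alpha * ln(1 + N / alpha)."""
    return alpha * math.log(1.0 + n_individuals / alpha)


def williams_alpha(n_species, n_individuals, tol=1e-10):
    """Solve S = alpha * ln(1 + N / alpha) for alpha by bisection.

    A solution exists only when 0 < S < N, because the right-hand side
    increases from 0 toward N as alpha grows.
    """
    if not (0 < n_species < n_individuals):
        raise ValueError("requires 0 < S < N")
    lo, hi = 1e-12, 1.0
    # Expand the upper bracket until it predicts at least S species.
    while expected_species(hi, n_individuals) < n_species:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if expected_species(mid, n_individuals) < n_species:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

For a fixed number of individuals, a sample with more species gives a larger α.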
Simpson's (1949) index was derived from probability theory and answers
the question "What is the probability that two specimens picked at random from
a community of infinite size are the same species?" If a species i is present
in the community in the proportion p_i, the probability of selecting two
individuals of species i at random is the joint probability p_i^2. Simpson's
equation is:
$$D = 1 - \sum_{i=1}^{S} p_i^2$$
where S = the number of species, p_i = the proportion of individuals in the ith
species, and D = the diversity. As with some of the other indices discussed
below, the correct computation of Simpson's diversity requires a fully
censused community, which never occurs in practical ecology. Moreover, Krebs
(1972) has noted that Simpson's index gives relatively little weight to rare
species, a characteristic that is undesirable where rare species have great
impact on the community, as is the case with predators.
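
A minimal sketch of the calculation follows. It substitutes sample proportions for the community proportions p_i, which, as noted above, is only an approximation unless the community is fully censused; the function name is hypothetical.

```python
def simpson_diversity(counts):
    """D = 1 - sum(p_i^2), with p_i taken as each species' share of the counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one individual")
    return 1.0 - sum((n / total) ** 2 for n in counts)


# The three two-species samples used earlier to illustrate species richness.
for counts in ([1, 1], [1, 1000], [1000, 1000]):
    print(counts, simpson_diversity(counts))
```

Unlike species richness, the index separates the strongly dominated sample (1 and 1000 individuals) from the two evenly divided ones, which give the same value.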
A problem universally associated with the use of diversity indices has
been the time and level of professional expertise required for taxonomic
identification of organisms in the sample. Cairns et al. (1968) developed the
sequential comparison index (SCI) to overcome this problem. Given a random
sample of specimens, A_1, A_2, ..., A_N, sequential comparisons are made between
organisms (e.g., A_1 vs. A_2, A_2 vs. A_3), and the number of runs (sequences of
consecutive specimens of the same species) is determined. The ratio of number of runs
divided by (N + 1), where N = the number of organisms, is the measure of
diversity. Patil and Taillie (1976) have demonstrated that, with a minor
correction for bias, the SCI becomes an unbiased estimator of Simpson's index
derived from probability theory.
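
The run-counting step can be sketched as follows. The denominator follows the description given in the text above; the function names and the `denominator_offset` parameter are hypothetical and are included only so that other conventions can be tried.

```python
def count_runs(labels):
    """Number of runs: maximal sequences of consecutive identical labels."""
    runs = 0
    previous = object()  # sentinel that equals no real label
    for label in labels:
        if label != previous:
            runs += 1
            previous = label
    return runs


def sequential_comparison_index(labels, denominator_offset=1):
    """Runs divided by (N + offset), where N is the number of specimens."""
    n = len(labels)
    if n == 0:
        raise ValueError("at least one specimen is required")
    return count_runs(labels) / (n + denominator_offset)
```

Only a same-or-different decision between neighboring specimens is needed, so the labels can be arbitrary tags rather than taxonomic names.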
Margalef (1956) proposed analysis of mixed-species communities by methods
derived from information theory. In this sense, diversity is equated with the
uncertainty that exists when a single organism is selected at random from the
community. The more species present in a community and the more even their
distribution, the greater the uncertainty and the larger the species
diversity. Information content is a measure of uncertainty and is a
reasonable measure of diversity. The most commonly employed diversity indices
derived from information theory are Shannon's index (H'), Brillouin's index
(H), and the approximate index (H"). Pielou (1966, 1967, 1969, 1974, 1975,
1977) has elucidated the theory behind the use of these indices in ecology,
and Kaesler and Herricks (1977) and Kaesler et al. (1978), drawing on her
work, have evaluated their use in applied aquatic ecology.
Shannon and Weaver (1949) introduced the following equation for the
information content per symbol of a code made up of S different symbols, each
with a probability of occurrence of p_i:
$$H' = -\sum_{i=1}^{S} p_i \log_e p_i$$
In an ecological context, S = the number of species in a conceptually infinite
community and p_i = the proportion the ith species comprises of the community.
Note that this equation is not intended for use with sample data but rather
requires knowledge of the proportions each species comprises of the community,
information that is never available to the applied aquatic ecologist. In
their evaluation of this index of diversity, Kaesler and Herricks (1977)
wrote: "The primary problem with the use of this equation is that in the real
world it is usually impossible to define or to sample randomly from the
conceptually infinite population. How, for example, would one define the
limits of a community occupying the polluted reaches of a stream in which
downstream recovery and accommodation, seasonal change, and recruitment due to
stream drift are important factors? Moreover, the probabilities of occurrence
or the proportion of each species in the community can never be known, and a
reasonably precise estimation may require unreasonably large samples,
especially for rare species."
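As a minimal sketch of Shannon's equation, the following Python fragment evaluates H' for a community whose proportions are assumed to be known exactly, which, as noted above, is never the case for field data. The function name shannon_index and the two example communities are illustrative assumptions and are not taken from the study.

```python
import math

def shannon_index(proportions):
    """Return H' = -sum(p_i * ln p_i) for known community proportions p_i."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    # A species with p_i = 0 contributes nothing (limit of p ln p as p -> 0).
    return -sum(p * math.log(p) for p in proportions if p > 0)

# Hypothetical communities: four equally common species vs. one dominant species.
even = shannon_index([0.25, 0.25, 0.25, 0.25])
uneven = shannon_index([0.85, 0.05, 0.05, 0.05])
```

For the even community H' equals ln 4, the maximum for four species; the uneven community gives a smaller value, which illustrates the dependence of the index on evenness as well as on the number of species.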
Brillouin's equation gives the information content per symbol of a
message and is thus based on samples rather than conceptually infinite
communities:
$$H = \frac{1}{N} \log_e \frac{N!}{N_1!\, N_2! \cdots N_S!}$$
In an ecological context, S = the number of species in a sample, N = the number of
organisms in the sample, and N_i = the number of organisms belonging to the ith
species. Kaesler and Herricks (1977) wrote: "Brillouin's H is the species
diversity per individual of a collection in which all N specimens have been
assigned to one of s species and counted to give the N_i's. This equation has
not been popular, partly because the factorials involved often become
astronomically large, but with readily available, high-speed computers, and tables
of logarithms of factorials it need no longer be avoided. Moreover, it is
important because it, and not Shannon's equation, gives the actual diversity
of a fully censused collection of organisms. It is not a statistical estimate
but an actual measurement of the diversity of the working ecologist's basic
unit—the sample."
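Because the factorials in Brillouin's equation grow quickly, a practical calculation works with logarithms of factorials. The sketch below, in Python, relies on the standard-library function math.lgamma, for which lgamma(n + 1) = ln(n!). The function name brillouin_index and the example counts are hypothetical and serve only as an illustration.

```python
import math

def brillouin_index(counts):
    """Return Brillouin's H = (1/N) * ln(N! / (N_1! N_2! ... N_S!))."""
    counts = [n for n in counts if n > 0]
    total = sum(counts)
    if total == 0:
        raise ValueError("sample contains no organisms")
    # lgamma(n + 1) equals ln(n!), so no large factorial is ever formed.
    log_multinomial = math.lgamma(total + 1) - sum(math.lgamma(n + 1) for n in counts)
    return log_multinomial / total

# Hypothetical fully censused sample: counts of individuals per species.
h_small = brillouin_index([2, 2])
h_big = brillouin_index([5000, 3000, 2000])
```

For the two-species sample the value is ln(6)/4, since 4!/(2! 2!) = 6. The second call shows that counts in the thousands pose no numerical difficulty.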
The approximate index is given by:
$$H'' = -\sum_{i=1}^{S} \frac{N_i}{N} \log_e \frac{N_i}{N}$$
where S = the number of species in a sample and N_i/N = the proportion of the
ith species in the sample. Kaesler and Herricks (1977) have pointed out that
the approximate index "is the one that is used most often in practice. It
resembles Shannon's equations, but the probabilities of occurrence of species
have been replaced by their proportion in the sample, N_i/N. In spite of its
popularity, the approximate index has some serious drawbacks that have not
always been thoroughly appreciated by some of its users.
"Pielou (1966) has shown that this equation can be used for two quite
different purposes. Unfortunately, it suits neither purpose particularly
well. First, it has been used to estimate Shannon's diversity, H', when the
N_i's are sample values. H" is, in fact, a maximum likelihood estimator of H',
but it is also a biased estimator. An appropriate correction term has been
derived, but in order to apply it, it is necessary to know s, the number of
species in the community (in the universe, not the sample) (Basharin, 1959).
It is, of course, not possible to know the number of species in a real
community without making a complete census of the community, which, even if it
were possible, would not be consistent with the purposes of environmental
protection. Wilhm (1968) has pointed out that for a finite s the bias
approaches zero as the number of individuals in the sample, N, approaches
infinity; and Peet (1974) has suggested that the bias is small for most
ecological applications. However, when equal-effort sampling methods are
used, one is not assured of large samples, particularly from heavily polluted
stations. Thus, the very samples in which the applied ecologist is most
interested will in many instances be those in which the bias is most severe.
"Second, H" has been used as a substitute for Brillouin's H to
approximate (not estimate in a statistical sense) the reputedly cumbersome H
in order to avoid computing factorials. This is done when the N_i's are
regarded as population values. The equation for H"...can be derived from
Brillouin's equation,...but the derivation is valid only when all the N_i's
are...very large indeed...(Pielou, 1969). The reason for this stipulation of
very large N_i's is that the derivation of [H"]...from Brillouin's
equation...depends on the substitution of N(ln N - 1) for ln N! in...[H]. For
small values of N, this substitution is simply not warranted. Especially with
equal-effort sampling, one cannot be sure that all the N_i's will be very
large, and again the low diversity samples that are usually of greatest
interest to applied ecologists will be incorrect—in this case with
artificially high values because H" is always larger than H.
"Wilhm (1968) has argued that H seldom applies to ecological measures of
diversity because the N_i's "...are rarely population values and must be
estimated from the sample." But, as was pointed out earlier, this is entirely
a matter of defining the population (universe). If ecological samples are
regarded as fully-censused collections, then H ... gives the exact diversity
not an estimate or approximate. Because it is usually not possible to define
the limits of a population, especially in stream surveys, nor to sample at
random from it, we judge that it is most appropriate to regard a sample as a
message from the ecosystem. That message has an information content or
diversity that is appropriately computed with Brillouin's equation..."
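The relation between H" and H described in the quoted passage can be illustrated numerically. The sketch below (Python; the function names and counts are hypothetical, chosen only for illustration) computes both indices for a small collection and for one with the same proportions but counts 1000 times larger.

```python
import math

def brillouin_index(counts):
    """Brillouin's H = (1/N) * ln(N! / (N_1! ... N_S!)), via log-gamma."""
    counts = [n for n in counts if n > 0]
    total = sum(counts)
    return (math.lgamma(total + 1) - sum(math.lgamma(n + 1) for n in counts)) / total

def approximate_index(counts):
    """Approximate index H'' = -sum((N_i/N) * ln(N_i/N))."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)

# Hypothetical collections with identical proportions (0.6, 0.3, 0.1).
small = [6, 3, 1]
large = [6000, 3000, 1000]
gap_small = approximate_index(small) - brillouin_index(small)
gap_large = approximate_index(large) - brillouin_index(large)
```

With these counts H" exceeds H in both cases. The difference is roughly 0.22 for the small collection and on the order of 0.001 for the large one, which is consistent with the stipulation that the substitution is warranted only when all the N_i's are very large.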
Pielou (1967) has shown that diversity indices from information theory
may be studied at each level in the taxonomic hierarchy and that the
components at various levels are independent and additive. Thus, for example,
one may study diversity of orders, families within orders, genera within
families, and species within genera. Furthermore, one may devise other
schemes of classification that provide greater ecological insight than the
taxonomic hierarchy. In this study two nontaxonomic hierarchical
classifications were used in addition to the taxonomic hierarchy. The first,
adapted from Cummins' (1973) work, emphasized the trophic-functional (TF) role
of each organism in the stream ecosystem. The major categories used in the
classification are functional group, feeding mechanism, dependence, and
principal food habit (Table 11). To adapt this scheme to our needs, a
numerical code was assigned to each designation in a category and a hierarchy
established (Table 12). The second classification scheme was based on the
observations of Steinmann (1907, 1908) and Hynes (1960, 1970), who related
anatomical and behavioral adaptations of benthic invertebrates to habitat
preference and the organism's ability to obtain food. This system used three
categories that stressed functional morphology: head position, H; general
body shape, B; and type and shape of respiratory organs, R (Table 13).
7.3 DIVERSITY OF SAMPLES FROM THE CLINCH RIVER
Species diversity of 36 samples collected in 1970 from the Clinch River
was computed using Brillouin's H and, for comparison, the approximate index
H".
7.3.1 Species Diversity
Results of diversity analysis with Brillouin's index (H) are summarized
in Figure 38, in which time in days is plotted along the abscissa and stations
along the ordinate. The ordinate was complicated somewhat by dividing
station 7 into substations 7RB (right bank) and 7LB&MS (left bank and
midstream). Both substations were at the same river kilometer, but 7LB&MS was
plotted closer to station 4 because it was unaffected by the spill and
day-to-day operation of the power plant. Substation 7RB, on the other hand,
was strongly affected by both because the effluent from the power plant flowed
along the right bank.
H was rounded to one decimal place, values were plotted, and diversity
was contoured at an interval of 0.4 to illustrate geographic and temporal
differences between stations or groups of stations. The most discernible
differences involved the periods before and after the spill and substations
7LB&MS and 7RB (Figure 38). The contours of diversity near these samples are
closely spaced, indicating a pronounced change in community structure.
After the spill, diversity decreased to between 1.6 and 1.8 at the
affected stations. This decrease was followed by an increase in diversity
over the next 60 days until H equaled or approximated the values found before
the pH stress. Examination of the samples contained within each contour line
indicated a sequence of recovery for the affected stations. Biological
recovery through time began at the farthest downstream station and proceeded
upstream. This is contrary to the sequence postulated by some investigators,
who claim that biological recovery begins at the site closest to healthy
sites. Although their hypothesis may explain the recovery of some streams, it
does not in this instance. In this study, biological recovery depended more
on the severity of the initial damage than on a site's proximity to unaffected
tributaries or headwater areas.

Figure 38. Species diversity (Brillouin's H) of 36 zoomacrobenthic samples
from the Clinch River, 1970; contour interval 0.4. (Axes: time in days on the
abscissa, stations on the ordinate.)

Values of H tabulated in Table 46 form groups that are remarkably similar
to those obtained from clustering presence-absence data with Jaccard's
coefficient (Table 31). In both analyses, the upstream and downstream control
stations (stations 4 and 11) are similar. Moreover, samples from stations 8,
9, and 10 collected before the pH stress are similar to the control stations.
After the spill, the most severely impacted sites tended to cluster together,
as evidenced by the cluster containing station 8, immediately after the spill
and two and four weeks later; station 9, immediately after the spill and two
weeks later; and station 10, immediately after the spill.
Results using the approximate index H" (Figure 39, Table 47) are similar,
but H" appears to depend more on sample size than other investigators have
thought (Wilhm and Dorris, 1968).
7.3.2 Hierarchical Diversity
It is important to remember that species diversity is a single statistic
that cannot provide a complete picture of community structure. The statistic
does, however, provide a useful quantitative measure of environmental
conditions, and it has characteristics that allow it to be used as a heuristic
tool. Pielou (1967, 1969, 1975) has demonstrated the possibility of
partitioning species diversity into additive components that express the
amount of diversity contributed by each level or component of a taxonomic
hierarchy.
The use of hierarchical diversity as a heuristic tool enables one to
determine whether additional insight can be gained from species-level
determinations in biomonitoring programs. For example, if only one species of
each genus is found in a community, the component of diversity due to species
within genera equals zero. Similarly, if only one genus of each family is
found, the component of diversity due to genera within families also equals
zero. Thus, under some circumstances, costly and time-consuming species
determinations may be of limited value, and environmental assessment can be
based on discrimination of higher taxa rather than identification of lower
ones.
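The additive partitioning described above can be sketched for Brillouin's index as follows. This is a minimal illustration in Python, not the program used in the study: the record layout (order, family, genus, count), the function names, and the counts are all assumptions. Each component is the log-multinomial term for the taxa at one level, grouped within their parent taxon and divided by the total number of organisms N.

```python
import math
from collections import defaultdict

def log_multinomial(counts):
    """ln(N! / (N_1! ... N_k!)) computed with log-gamma."""
    return math.lgamma(sum(counts) + 1) - sum(math.lgamma(n + 1) for n in counts)

def hierarchical_components(records):
    """records: iterable of (order, family, genus, count) tuples.

    Returns the Brillouin components for orders, families within orders,
    and genera within families, followed by their sum.
    """
    order_n = defaultdict(int)
    family_n = defaultdict(int)
    genus_n = defaultdict(int)
    for order, family, genus, count in records:
        order_n[order] += count
        family_n[(order, family)] += count
        genus_n[(order, family, genus)] += count
    total = sum(order_n.values())

    # Group the counts of child taxa under their parent taxon.
    families_in_order = defaultdict(list)
    for (order, _family), n in family_n.items():
        families_in_order[order].append(n)
    genera_in_family = defaultdict(list)
    for (order, family, _genus), n in genus_n.items():
        genera_in_family[(order, family)].append(n)

    h_order = log_multinomial(list(order_n.values())) / total
    h_family = sum(log_multinomial(c) for c in families_in_order.values()) / total
    h_genus = sum(log_multinomial(c) for c in genera_in_family.values()) / total
    return h_order, h_family, h_genus, h_order + h_family + h_genus

# Hypothetical sample; the counts are invented for illustration.
sample = [
    ("Ephemeroptera", "Baetidae", "Baetis", 12),
    ("Ephemeroptera", "Heptageniidae", "Stenonema", 8),
    ("Ephemeroptera", "Heptageniidae", "Heptagenia", 5),
    ("Trichoptera", "Hydropsychidae", "Cheumatopsyche", 10),
    ("Trichoptera", "Hydropsychidae", "Hydropsyche", 7),
    ("Diptera", "Chironomidae", "Chironomus", 3),
]
h_order, h_family, h_genus, h_total = hierarchical_components(sample)
```

For Brillouin's index the three components sum exactly to the H computed directly from the genus counts, and a sample with only one genus in each family gives a genus-within-family component of zero, as stated in the text.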
To test the usefulness of hierarchical diversity, the Clinch River data
set was partitioned into the taxonomic categories order, family, and genus.
Diversity was calculated for each substation, i.e., the left-bank, right-bank, and
midchannel sections of the stream (Table 48). For the control stations,
stations 4 and 11, the component of diversity for each taxonomic category
decreased from order to family to genus.
At stations affected by day-to-day operation of the power plant and the
acid spill, however, the trend of decreasing diversity within the taxonomic
hierarchy was not found. At the stations impacted by the power plant, the
component of diversity for genera within families contributed more to H than
did families within orders.
In previous studies of the Clinch River, Grossman et al. (1973) noted
that station 8 was chronically stressed by the power plant and was the most
severely impacted station after the pH stress. Stress not only caused a
reduction in the number of taxa, but it changed the relative abundance of each
species. As a result, diversity decreased. At substation 8LB the diversity
component for genera (0.52) before the spill was higher than that for families
(0.33), but less than that for orders (1.44) (total = 2.29). Immediately
after the spill, the difference between familial and generic diversity was
even more pronounced: 0.57 (genera), 0.00 (families), 0.79 (orders), and
total = 1.36.

Figure 39. Species diversity (approximate index H") of 36 zoomacrobenthic
samples from the Clinch River, 1970.

This example also suggests that each component of diversity in a
taxonomic hierarchy may be a useful indicator of stressed vs. unstressed
communities. Note, however, that the diversity for each category in the
taxonomic hierarchy did not decrease in proportion to the overall decrease in
H. And, if the Clinch River data set is representative, one can conclude that
in unstressed, healthy ecosystems the combined components of diversity for
higher taxonomic categories (order and family) contribute more to measures of
diversity than genera and species.
A second test of the usefulness of hierarchical diversity deals with
functional morphology. If all species in a genus have the same functional
role in the ecosystem, how important is their diversity to the overall
stability of the system when compared with the diversity of other taxa? To
answer this question, two hierarchical classifications that bypassed the
classical taxonomic hierarchy were developed.
The first classification was modified from the system proposed by Cummins
(1973) (refer to Tables 11 through 13, section 4.1.3). It stresses the
functional group, feeding mechanism, dependence, and the principal food habit of
each species. In adapting it to our needs, numerical codes were assigned to
each designation in a category (Table 12) and a hierarchy was developed that
is admittedly artificial (Tables 11 and 13). Hierarchical diversity was
computed for five samples or subsamples from the Clinch River immediately
after the spill (Table 49). Station 7 was divided into substation 7LB&MS,
unaffected by the spill and day-to-day operation of the power plant, and
substation 7RB, strongly affected by both. Stations 8, 9, and 10 were located
in the zone of mixing downstream from the spill site. Stations 7LB&MS and 10
had similar numbers of taxa, but quite different components of diversity at
all levels. Other than number of taxa, samples from stations 7RB, 8, 9, and
10 were quite similar.
In addition to summarizing diversity for various levels in each
hierarchical classification, Table 49 gives the standard deviations. In this
example, the standard deviation was a convenient measure of scatter of the
diversity components at each level. For the TF classification, the standard
deviation for the control station was much larger than that for any other
stations. The high standard deviation found at the control substation
resulted primarily from the large component of diversity contributed by
species within food habit. Also, the component contributed by dependence
within feeding mechanism was appreciably larger for the control than for the
affected stations. Substation 7RB had a higher diversity component
contributed by feeding mechanism within functional group (Table 50).
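The summary statistics discussed above can be reproduced from the additive components. The sketch below (Python) uses the components tabulated for substation 7LB&MS in Table 49. It assumes that the tabulated standard deviation is the sample standard deviation (divisor n - 1) of the five components; the report does not state this, but the assumption reproduces the tabulated value for this row. The variable names are illustrative.

```python
import statistics

# Additive components for substation 7LB&MS, copied from Table 49.
components = {
    "functional group": 0.90,
    "feeding mechanism within functional group": 0.36,
    "dependence within feeding mechanism": 0.29,
    "food habit within dependence": 0.03,
    "species within food habit": 1.16,
}

total_h = sum(components.values())  # additive components sum to H
percent = {level: 100 * value / total_h for level, value in components.items()}
# Assumption: scatter is measured by the sample standard deviation (n - 1).
scatter = statistics.stdev(list(components.values()))
```

Rounded to two decimal places, total_h and scatter match the H (2.74) and standard deviation (0.47) given for this substation in Table 49. The percentages are close to, but not identical with, those in Table 50, presumably because the tabulated components are rounded.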
The second classification (HBR) resulted from the work of Steinmann
(1907, 1908) and Hynes (1960, 1970), who related anatomical and behavioral
adaptations of benthic invertebrates to conditions found in lotic
environments. Three categories stress functional morphology, i.e., an
organism's functional role in the ecosystem and its habitat preference as
reflected by its morphology: head position, body shape, and type and shape of
respiratory organs (refer to Table 13).
The relative contribution of each component of diversity did not vary
appreciably from sample to sample as one proceeded from one category to the
next (Table 51). More importantly, each component within the hierarchy
contributed to the overall diversity, suggesting that the HBR
(head-body-respiratory) classification scheme warrants further investigation.
In the hierarchical diversity study, different classification schemes
were used. Hynes (1970) stated, "It should be borne in mind that these are,
to a considerable extent, artificial, and that any one species or higher taxon
may display several types of adaptation. Alas, some of the phenomena to be
discussed are not strictly adaptation; they are changes brought about by life
in this peculiar environment." Although it is still too early to discern
whether some of the observed trends are real, it appears that classifications
emphasizing functional relationships merit further consideration and that
hierarchical diversity is a useful heuristic technique.
TABLE 46. SPECIES DIVERSITY (BRILLOUIN'S H) OF 36
ZOOMACROBENTHIC SAMPLES FROM THE CLINCH RIVER, 1970
Date
7LB&MS
7RB
10
11
Group I—H values >2.1
Before pH stress
Immediately after
pH stress
2 weeks after
4 weeks after
6 weeks after
8 weeks after
X
X
X
X
X
X
X
X
X
X
X
X
X
Group II—H values >1.7 to <2.1
Before pH stress
Immediately after
pH stress
2 weeks after
4 weeks after
6 weeks after
8 weeks after
X
X
X
X
X
X
X
X
X
Group III—H values >0.9 to <1.7
Before pH stress
Immediately after
pH stress
2 weeks after
4 weeks after
6 weeks after
8 weeks after
X
X
X
X
X
X
X
X
X
TABLE 47. SPECIES DIVERSITY (APPROXIMATE INDEX H") OF 36
ZOOMACROBENTHIC SAMPLES FROM THE CLINCH RIVER, 1970
Date
7LB&MS
7RB
10
11
Group I—d values >3.0
Before pH stress X X
Immediately after
pH stress X
2 weeks after X
4 weeks after X X
6 weeks after X
8 weeks after X X
X
X
X
X
X
X
X
X
X
X
X
Group II—d values >1 to <3.0
Before pH stress
Immediately after
pH stress
2 weeks after
4 weeks after
6 weeks after
8 weeks after
X
X
X
X
X
X
X
X
X
X
X
X
X
TABLE 48. HIERARCHICAL TAXONOMIC DIVERSITY (BRILLOUIN'S H) OF
ZOOMACROBENTHIC SAMPLES FROM THE CLINCH RIVER, 1970

                                   Component
Sample  Period               Order   Family  Genus    H
4 RB    Before pH stress     1.19    0.97    0.21     2.37
4 LB    Before pH stress     1.55    0.56    0.39     2.50
4 MS    Before pH stress     1.32    0.42    0.32     2.06
4 RB    4 weeks after        1.32    0.75    0.55     2.62
4 LB    4 weeks after        1.28    0.59    0.32     2.19
4 MS    4 weeks after        1.42    0.54    0.44     2.40
4 RB    8 weeks after        1.30    0.63    0.24     2.17
4 LB    8 weeks after        1.20    0.64    0.33     2.17
4 MS    8 weeks after        1.37    0.39    0.35     2.11
7 RB    Before pH stress     1.41    0.47    0.14     2.02
7 LB    Before pH stress     1.37    0.92    0.41     2.70
7 MS    Before pH stress     1.09    1.09    0.25     2.43
7 RB    Immediately after    1.21    0.08    0.50a    1.79
7 LB    Immediately after    1.20    0.96    0.29     2.45
7 MS    Immediately after    1.27    0.40    0.52a    2.19
7 RB    2 weeks after        0.63    0.06    0.53a    1.22
7 LB    2 weeks after        1.35    0.76    0.33     2.44
7 MS    2 weeks after        1.37    0.41    0.45     2.23
7 RB    4 weeks after        0.82    0.10    0.40a    1.32
7 LB    4 weeks after        1.21    0.74    0.31     2.26
7 MS    4 weeks after        1.34    0.43    0.34     2.11
7 RB    6 weeks after        0.60    0.12    0.17a    0.89
7 LB    6 weeks after        0.83    0.99    0.26     2.08
7 MS    6 weeks after        1.31    0.61    0.19     2.11
7 RB    8 weeks after        1.20    0.18    0.34a    1.72
7 LB    8 weeks after        1.26    0.87    0.24     2.37
7 MS    8 weeks after        1.35    0.72    0.22     2.29
8 RB    Before pH stress     1.37    0.29    0.53a    2.19
8 LB    Before pH stress     1.44    0.33    0.52a    2.29
8 MS    Before pH stress     1.39    0.42    0.54a    2.35

TABLE 48 (continued)

                                   Component
Sample  Period               Order   Family  Genus    H
8 RB    Immediately after    0.87    0.22    0.44     1.33
8 LB    Immediately after    0.79    0.00    0.57a    1.36
8 MS    Immediately after    0.84    0.07    0.52a    1.43
8 RB    2 weeks after        0.61    0.05    0.95a    1.61
8 LB    2 weeks after        0.62    0.04    0.94a    1.60
8 MS    2 weeks after        0.49    0.01    0.89a    1.39
8 RB    4 weeks after        1.32    0.75    0.55     2.62
8 LB    4 weeks after        1.28    0.59    0.32     2.19
8 MS    4 weeks after        1.42    0.54    0.44     2.40
8 RB    6 weeks after        0.56    0.05    0.62a    1.23
8 LB    6 weeks after        1.01    0.12    0.65a    1.78
8 MS    6 weeks after        0.86    0.07    0.64a    1.57
8 RB    8 weeks after        1.19    0.25    0.42a    1.86
8 LB    8 weeks after        1.33    0.33    0.17     1.83
8 MS    8 weeks after        1.15    0.16    0.40a    1.71
9 RB    Before pH stress     1.41    0.33    0.64a    2.38
9 LB    Before pH stress     1.40    0.69    0.38     2.47
9 MS    Before pH stress     1.21    0.19    0.44a    1.84
9 RB    Immediately after    0.67    0.03    0.65a    1.35
9 LB    Immediately after    0.43    0.13    0.83a    1.39
9 MS    Immediately after    0.85    0.06    0.71a    1.62
9 RB    2 weeks after        0.28    0.03    0.75a    1.06
9 LB    2 weeks after        0.97    0.09    0.81a    1.87
9 MS    2 weeks after        0.71    0.01    0.80a    1.52
9 RB    4 weeks after        0.97    0.14    0.69a    1.80
9 LB    4 weeks after        0.90    0.14    0.53a    1.57
9 MS    4 weeks after        1.07    0.17    0.60a    1.84
9 RB    6 weeks after        1.39    0.12    0.39a    1.90
9 LB    6 weeks after        1.29    0.20    0.44a    1.93
9 MS    6 weeks after        1.31    0.05    0.16a    1.52
9 RB    8 weeks after        1.38    0.17    0.50a    2.05
9 LB    8 weeks after        1.58    0.31    0.14     2.03
9 MS    8 weeks after        1.56    0.24    0.10     1.90

TABLE 48 (continued)

                                   Component
Sample  Period               Order   Family  Genus    H
10 RB   Before pH stress     1.20    0.29    0.38a    1.87
10 LB   Before pH stress     1.46    0.62    0.31     2.39
10 MS   Before pH stress     1.32    0.36    0.35     2.03
10 RB   Immediately after    1.11    0.04    0.44a    1.59
10 LB   Immediately after    1.25    0.06    0.48a    1.79
10 MS   Immediately after    0.86    0.03    0.37a    1.26
10 RB   2 weeks after        1.19    0.18    0.40a    1.77
10 LB   2 weeks after        1.20    0.20    0.33a    1.73
10 MS   2 weeks after        1.33    0.39    0.33     2.05
10 RB   4 weeks after        1.28    0.19    0.35a    1.82
10 LB   4 weeks after        1.15    0.20    0.33a    1.68
10 MS   4 weeks after        1.14    0.11    0.34a    1.59
10 RB   6 weeks after        1.36    0.15    0.50a    2.01
10 LB   6 weeks after        1.24    0.19    0.34a    1.77
10 MS   6 weeks after        1.23    0.22    0.27a    1.72
10 RB   8 weeks after        1.42    0.24    0.46a    2.12
10 LB   8 weeks after        1.63    0.35    0.29     2.27
10 MS   8 weeks after        1.36    0.39    0.16     1.91
11 RB   Immediately after    1.34    0.71    0.24     2.29
11 LB   Immediately after    1.12    0.78    0.26     2.16
11 MS   Immediately after    1.24    0.51    0.35     2.10
11 RB   4 weeks after        1.17    0.49    0.36     2.02
11 LB   4 weeks after        1.22    0.46    0.39     2.07
11 MS   4 weeks after        1.30    0.57    0.37     2.24
11 RB   8 weeks after        1.37    0.85    0.17     2.38
11 LB   8 weeks after        1.42    0.63    0.18     2.23
11 MS   8 weeks after        1.33    0.67    0.26     2.26

a The generic component of diversity contributed more to H than the
familial component of diversity.

TABLE 49. COMPONENT OF SPECIES DIVERSITY (H) AT EACH LEVEL IN THE TROPHIC-FUNCTIONAL HIERARCHY
FOR FIVE SAMPLES OR SUBSAMPLES COLLECTED IMMEDIATELY AFTER THE ACID SPILL ON THE
CLINCH RIVER, 1970

               Additive component
                           Feeding mechanism                     Food habit
               Functional  within functional  Dependence within  within      Species within        Standard
Station  Taxa  group       group              feeding mechanism  dependence  food habit      H     deviation
7LB&MS   44    0.90        0.36               0.29               0.03        1.16            2.74  0.47
7RB      12    0.70        0.32               0.12               0.00        0.58            1.72  0.30
8        28    0.82        0.09               0.04               0.00        0.65            1.60  0.38
9        24    0.78        0.12               0.02               0.03        0.74            1.69  0.39
10       39    0.89        0.09               0.03               0.01        0.54            1.65  0.38

TABLE 50. PERCENT OF SPECIES DIVERSITY (H) CONTRIBUTED AT EACH LEVEL IN THE TROPHIC-FUNCTIONAL
HIERARCHY FOR FIVE SAMPLES OR SUBSAMPLES COLLECTED IMMEDIATELY AFTER THE ACID SPILL
ON THE CLINCH RIVER, 1970

               Additive component
                           Feeding mechanism                     Food habit
               Functional  within functional  Dependence within  within      Species within
Station  Taxa  group       group              feeding mechanism  dependence  food habit
7LB&MS   44    32.7        13.0               10.6               1.0         42.6
7RB      12    40.9        18.2               6.9                0.0         34.0
8        28    51.2        5.9                2.3                0.0         40.6
9        24    46.0        7.0                1.4                1.9         43.7
10       39    53.9        5.6                7.6                0.1         32.8

TABLE 51. COMPONENT OF SPECIES DIVERSITY (H) AT EACH LEVEL IN THE HEAD-BODY-RESPIRATORY
FUNCTIONAL MORPHOLOGY HIERARCHY FOR FIVE SAMPLES OR SUBSAMPLES COLLECTED
IMMEDIATELY AFTER THE ACID SPILL ON THE CLINCH RIVER, 1970

         Additive component
         Head      Body shape within  Respiratory organ  Species within           Standard
Station  position  head position      within body shape  respiratory organ  H     deviation
7LB&MS   0.67      1.29               0.84               0.61               2.74  0.87
7RB      0.63      0.29               0.36               0.44               1.72  0.47
8        0.70      0.19               0.09               0.62               0.60  1.39
9        0.69      0.19               0.08               0.73               1.69  0.44
10       0.64      0.43               0.09               0.46               1.62  0.40

SECTION 8
SUMMARY AND DISCUSSION
8.1 INTRODUCTION
Different methods of analysis were used with varying results. The
purpose of this section is to evaluate the results obtained, to assess the
different levels of interpretability, and to make general recommendations
about the application of these methods and the results one can expect from
them.
8.2 NATURE OF THE ECOSYSTEMS FROM WHICH DATA BASES WERE SELECTED
A point that must be stressed is that the nature of the ecosystem and the
nature of the data collected from the ecosystem determine both the methods to
be used in analyzing the data and the ease with which the results can be
interpreted. For example, the Clinch River is a shallow, rapidly flowing
stream with a normally diverse benthic community that was adversely impacted
by an acute pH stress. Numerous large samples were collected at closely-timed
intervals both before and after the acute pH stress. Like many streams, it
could be regarded as a linear system, in which contemporaneous samples from
adjacent stations are expected to be similar. The Cumberland River, on the
other hand, was studied in an impounded area with deep, slowly moving water
that received chronic thermal stress. Although both high- and low-flow
conditions were studied, the velocity was never as great as that of the Clinch River.
Moreover, although temperatures in the range observed in the Cumberland River are
common in nature and numerous warm-water organisms are adapted to them, the low pH
stress received by the Clinch River was a shock that few organisms were able
to tolerate. Finally, the Cumberland River was not a linear system, and the
thermal plume (potential stress) moved upstream during times of low flow.
The depth of water in the Clinch River was much less than the depth in
the Cumberland River, where benthic stations were located at depths varying
from 2.5 to 9 meters. Samples from the Clinch River contained more
individuals and greater diversities than samples from the Cumberland River.
This difference stems in part from more extensive sampling in the Clinch and
in part from real differences between the two ecosystems. The relative
uniformity of the substratum sampled in the Cumberland River provided fewer
data from fewer habitats and niches. Depth reduced the impact of the stress
in the Cumberland, whereas all macroinvertebrates in the Clinch River
downstream from the site of the acid spill were exposed to the low pH shock.
The net effect of the differences between the ecosystems, and in the kind and
degree of stress, was that the Clinch River ecosystem was more
understandable and definable than the Cumberland River system.
While major environmental changes are evident with coarse methods of
analysis, subtle impacts may go undetected even with elaborately designed
studies. This is especially true when unexpected factors, such as upstream
flow of the plume, inhibit the effectiveness of the analytical protocol
established for the study. In our examples, the Clinch River was studied in
greater detail than necessary to determine initial impact and recovery. The
Cumberland River was probably not sampled adequately in view of the problem
caused by upstream flow. The important point is that it is virtually
impossible to detect most subtleties or the lack of them before the fact.
Thus, the adequacy of a sampling program may be in doubt until after some of
the data have been analyzed. Preliminary studies are too rarely conducted. As
a result, judgmental errors may be made that limit the value of the study or
seriously overextend resources.
8.3 METHODS
8.3.1 Relationships Between Methods
Cluster analysis and ordination of data are intended to serve the same
purpose when used in the Q-mode, i.e., to express similarities between samples
from different stations. Similarities between samples are based on their
faunal content with species given equal weight a priori. Because cluster
analysis will force samples into clusters whether or not clusters exist in
nature, it may be unsuited for analysis of strictly linear (e.g., riverine)
systems. When numerous samples are collected through time, however, as was
done in study of the Clinch River, the strict linearity of the system is
overcome due to the imposition of temporal trends. In such cases, cluster
analysis may be a useful analytical tool. For river-reservoir systems, such
as the Cumberland River, where overbank areas and channel areas may have high
intracorrelations regardless of their position along the river, cluster
analysis may be entirely appropriate.
Throughout this report we have stressed cluster analysis and have
deemphasized ordination because the application and interpretation of cluster
analysis to problems of applied aquatic ecology are much better established.
Specifically, procedures have not been developed and tested for limiting the
size of data sets to be studied by ordination and for excluding rare species
that may be present in some samples and absent from others due to chance
alone. In spite of being somewhat more difficult to interpret, ordination
methods such as nonmetric multidimensional scaling can show everything cluster
analysis shows without some of the inherent disadvantages. Ordination methods
merit further investigation by ecologists.
Analysis of species diversity serves a different purpose from cluster
analysis and ordination. Indices of species diversity do not consider what
species are present. They are based on the number of species present and the
evenness of the distribution of individuals among species. Thus, species
diversity is a measure of community structure that is independent of the
particular community being studied. It is tempting to compute indices of
diversity from numerous communities and to compare the state of health of the
ecosystems by comparing indices. Three aspects of communities suggest that
this approach must be used with caution if at all. First, not all natural
communities have high diversity. For example, undisturbed communities of
macroinvertebrates from rapidly flowing streams are more diverse than those
from slower moving streams and reservoirs. The difference in diversity is not
a difference in health of the ecosystems, although it may bear on their
resiliency. Second, undisturbed upstream communities may be as diverse as
undisturbed downstream ones, but the faunas may be quite different. Third, a
diverse community that is tolerant to heated water or other pollutional stress
may become established. Such a community may be described as perched if it
is faunally different from upstream communities that would ordinarily
contribute to repopulation of the area. An incremental increase in
pollutional stress or the temporary removal of stress may decimate the perched
community and leave it degraded since no source of stress-tolerant species
exists upstream. None of these three points can be considered adequately by
the rote comparison of species diversity alone.
8.3.2 Data
The kind and nature of data dictate the type of procedures that should be
used for a study, the conclusions that can be drawn, and the confidence one
has in those conclusions. The most important aspect of planning any
ecological study is to make sure the data collected are adequate for answering
the question at hand. Sampling must have a well-defined purpose, and the more
clearly that purpose is stated, the greater the likelihood that the data will be
adequate for it. In Section 7 we dealt briefly with
problems that may arise from collecting too small a sample. Other aspects of
sampling design should also be considered to develop sampling protocols that
produce data of the desired quality and quantity.
8.3.2.1 Presence-Absence Data--
The use of presence-absence data has one serious drawback: all
information contributed by the differential abundances of organisms is lost.
For some purposes, however, the benefits to be gained may outweigh the
disadvantages. First, presence-absence data can be obtained much more
quickly, and large samples can be processed at less expense. Second,
depending on the purpose of the study, presence-absence data may give all the
information needed. It was also determined that quantitative data collected
during a cursory survey or poorly documented field work should be used only as
presence-absence data. Note also that a wide selection of coefficients of
similarity is available for comparing stations and assembling groups of
species by cluster analyses.
Of the numerous coefficients available for cluster analyses of
presence-absence data, two were tested: Jaccard's coefficient, which omits
negative matches, and the simple matching coefficient, which regards negative
matches (the absence of a species from two stations) as contributing to
similarity. Both coefficients range from 0 to 1. In practice, however,
Jaccard's coefficients are lower than simple matching coefficients because
data from aquatic environmental surveys are often characterized by a large
number of species with few individuals that differ from station to station.
As a result, mutual absences (negative matches) may indicate more similarity
between samples than is warranted. To minimize this effect we recommend
Jaccard's coefficient except for those rare cases where mutual absences
clearly result from the same cause. If, however, a species is absent from one
station because the temperature is too high and absent from another because
the substrate is unsuitable, the simple matching coefficient may cause an
investigator to draw an erroneous conclusion. If Jaccard's coefficients are
uniformly low, one should be wary of randomness or absence of species by
chance alone due to sampling error.
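The difference between the two coefficients can be illustrated with a minimal sketch. This is not the program used in this study; the function names and the station records below are hypothetical and chosen only to show how negative matches enter the calculation.

```python
def match_counts(a, b):
    """Return (both present, only in a, only in b, both absent) for two 0/1 lists."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    only_a = sum(1 for x, y in zip(a, b) if x and not y)
    only_b = sum(1 for x, y in zip(a, b) if y and not x)
    neither = sum(1 for x, y in zip(a, b) if not x and not y)
    return both, only_a, only_b, neither


def jaccard(a, b):
    """Jaccard's coefficient: positive matches over species found at either station."""
    both, only_a, only_b, _ = match_counts(a, b)
    found = both + only_a + only_b
    return both / found if found else 0.0


def simple_matching(a, b):
    """Simple matching coefficient: positive and negative matches over all species."""
    both, only_a, only_b, neither = match_counts(a, b)
    return (both + neither) / (both + only_a + only_b + neither)


# Hypothetical presence-absence records for ten species at two stations.
station_1 = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
station_2 = [1, 0, 1, 0, 0, 0, 0, 0, 1, 0]

print(jaccard(station_1, station_2))          # 2 shared of 4 species found
print(simple_matching(station_1, station_2))  # 2 shared + 6 mutual absences of 10
```

For these records Jaccard's coefficient is 0.5 and the simple matching coefficient is 0.8; the six mutual absences account for the entire difference.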
8.3.2.2 Quantitative Data--
Quantitative data usually consist of the numbers of individuals per
species in a sample. Sometimes not all groups of organisms are distinguished
at the species level, so the counts may be expressed as numbers of individuals
at higher taxonomic levels. Specifically, oligochaetes and chironomids are
difficult, costly, and time-consuming to identify to species and are often
lumped. Quantitative data may also be reported as ranked abundances or as
proportions of each species or higher taxon in a sample.
Individual species counts are sometimes transformed before analyses are
performed. Transformations are usually intended to reduce the quantitative
impact of highly abundant species. Ranking data is a kind of transformation
that is roughly equivalent to using a logarithmic transformation. Either
procedure is drastic. The logarithmic transformation (base 10), for example,
changes abundances of 1, 10, 100, and 1000 to 0, 1, 2, and 3. The square-root
transformation, $\sqrt{y + 0.5}$, is less drastic, changing abundances of 1,
10, 100, and 1000, respectively, to 1.22, 3.24, 10.02, and 31.63. This
transformation worked well with all three data sets studied. Finally, counts
may be standardized, whereby a data matrix of t samples (columns) and n species (rows)
is operated on, such that each element in a row has the row mean subtracted
from it and is divided by the row standard deviation to give a new row mean
of 0 and a new row standard deviation of 1. This procedure is effective in
removing inordinate effects of species that are very abundant throughout a
study. Use of data with species abundances expressed as proportions was not
examined in this report. Because of problems that may arise with proportions,
the arcsine transformation, $\arcsin\sqrt{p}$, is recommended for proportional data.
We suggest that use of proportional data be tested in the future.
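The transformations described above can be sketched in a few lines. This is an illustration only, not the code used for the analyses; the function names are hypothetical, and the use of the sample standard deviation (n - 1 divisor) in the standardization is an assumption, since the text does not say which divisor was used.

```python
import math
from statistics import mean, stdev


def log_transform(y):
    """Base-10 logarithm; undefined for a count of zero."""
    return math.log10(y)


def sqrt_transform(y):
    """Square-root transformation with the 0.5 offset given in the text."""
    return math.sqrt(y + 0.5)


def arcsine_transform(p):
    """Arcsine square-root transformation of a proportion, in radians."""
    return math.asin(math.sqrt(p))


def standardize_row(row):
    """Rescale one species (row) to mean 0 and standard deviation 1.

    Assumption: the sample standard deviation (n - 1 divisor) is used.
    A species with identical counts in every sample cannot be standardized.
    """
    m, s = mean(row), stdev(row)
    return [(x - m) / s for x in row]


counts = [1, 10, 100, 1000]
logged = [round(log_transform(y), 2) for y in counts]
rooted = [round(sqrt_transform(y), 2) for y in counts]
z = standardize_row([2, 4, 6, 8])   # hypothetical counts of one species

print(logged)
print(rooted)
```

The rounded values reproduce the figures quoted in the text for the logarithmic and square-root transformations.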
Two coefficients for use in cluster analysis of quantitative data were
tested: the Pearson product-moment correlation coefficient and Sokal's
taxonomic distance coefficient. In general, the correlation coefficient shows
high similarities between samples with species present in the same relative
proportions, whereas the distance coefficient shows high similarity (low
distance) between samples with species present in the same absolute
abundances. Either coefficient produced useful results with data transformed
by the square-root transformation and unstandardized. Of course, since the
two coefficients measure different things, the results were different; and, in
general, cluster analysis of correlation coefficients proved to be more
readily interpretable.
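The contrast between the two coefficients can be seen in a small sketch. The distance below is written as the average taxonomic distance (Euclidean distance divided by the square root of the number of species), which is our assumption about the form of Sokal's coefficient; the sample counts are hypothetical and, for simplicity, untransformed.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def taxonomic_distance(x, y):
    """Average taxonomic distance (assumed form): root mean squared difference."""
    n = len(x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / n)


# Hypothetical counts of three species in two samples. Sample B has the
# same relative proportions as sample A but twice the absolute abundance.
sample_a = [1, 10, 100]
sample_b = [2, 20, 200]

r = pearson_r(sample_a, sample_b)
d = taxonomic_distance(sample_a, sample_b)
print(r, d)
```

Here the correlation is essentially 1 because the proportions agree, while the distance is far from 0 because the absolute abundances differ.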
8.3.3 Cluster Analysis
Cluster analysis was computed in two modes: Q-mode, which shows
similarities among samples on the basis of their contained species; and
R-mode, which shows similarities among species on the basis of their
distributions and abundances in samples. Q-mode cluster analysis has been
used much more than R-mode in applied aquatic ecology. Results of Q-mode
analysis are applicable only to the stream or reservoir from which data have
been collected and clustered. R-mode analysis, on the other hand, has promise
as a method of comparing faunal associations from stream to stream and from
basin to basin.
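In computational terms the two modes differ only in which dimension of the data matrix is compared. The sketch below assumes the layout described in Section 8.3.2.2 (species as rows, samples as columns) and uses Jaccard's coefficient on a small hypothetical presence-absence matrix; the names are illustrative.

```python
def jaccard(a, b):
    """Jaccard's coefficient for two 0/1 lists of equal length."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def similarity_matrix(vectors):
    """Square matrix of similarities among all pairs of vectors."""
    return [[jaccard(u, v) for v in vectors] for u in vectors]


# Hypothetical data matrix: rows are species, columns are samples.
data = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 1],
]

by_sample = [list(col) for col in zip(*data)]  # transpose: one list per sample

q_mode = similarity_matrix(by_sample)  # samples compared by contained species
r_mode = similarity_matrix(data)       # species compared by their distributions

print(len(q_mode), len(r_mode))
```

The Q-mode matrix has one row per sample and the R-mode matrix one row per species; either matrix can then be passed to the same clustering procedure.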
Fortunately, the distortion introduced during clustering can be assessed
by the coefficient of cophenetic correlation (r). Cluster analysis not
accompanied by such a coefficient is suspect, and we have used an r ≥ 0.8 as
the criterion for consideration in data interpretation. Once it has been
determined that distortion is acceptably low, the overall shape and the levels
of clustering of a dendrogram provide useful information about the data being
clustered and the reliability of interpretations. Ideally a dendrogram should
have several good, tight clusters that are distinct from other clusters or
samples in the dendrogram (see, e.g., Figures 11 and 15). Such dendrograms
have sufficient structure that they may be easily interpreted, depending, of
course, on the samples that comprise the clusters. An example of a dendrogram
in which the similarities are all rather low and of about the same value is
Figure 12. One interpretation is that similarities among samples are actually
low and more or less equal throughout the study. A common cause of such
clusters, however, is randomness. In Figure 12, for example, all species were
included in the data matrix, and no attempt was made to remove rare species
that were present in some samples and, perhaps, absent from others by chance
alone due to sampling error. When such data are used with Jaccard's
coefficient, which gives lower similarities than the simple matching
coefficient because it ignores negative matches, the resulting similarities
can have a large, unknowable component of
randomness. Another cause of the shape of such dendrograms can be lack of
natural clusters. Recall that cluster analysis forces samples into clusters
whether or not such clusters exist in nature. If the system sampled is
strictly linear with sequential similarities such as might be expected in a
stream, the resultant dendrogram can lack structure. (To use cluster analysis
with such data is analogous to trying to logically subdivide a piece of
string. In such cases ordination is called for as an alternative to cluster
analysis.) Other dendrograms have a stair-stepped arrangement and shape (see,
e.g., Figure 31). Such a shape usually implies the lack of real clusters in
nature and again calls for ordination.
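The cophenetic correlation can be sketched as follows. The clustering shown is average linkage on the original similarities (the unweighted pair-group method), which is an assumption on our part; the similarity matrix is hypothetical, and the function names are illustrative rather than those of any program used in this study. The coefficient is the Pearson correlation between the original similarities and the levels at which each pair of samples is first joined in the dendrogram.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson product-moment correlation between two lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cophenetic_values(sim):
    """Average-linkage clustering of a similarity matrix.

    Returns a dict mapping each pair (i, j), i < j, to the similarity
    level at which samples i and j are first placed in the same cluster.
    """
    clusters = {i: [i] for i in range(len(sim))}  # cluster id -> members
    next_id = len(sim)
    coph = {}

    def avg_sim(a, b):
        total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
        return total / (len(clusters[a]) * len(clusters[b]))

    while len(clusters) > 1:
        # Join the two clusters with the highest average similarity.
        a, b = max(combinations(clusters, 2), key=lambda pair: avg_sim(*pair))
        level = avg_sim(a, b)
        for i in clusters[a]:
            for j in clusters[b]:
                coph[(min(i, j), max(i, j))] = level
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return coph


def cophenetic_correlation(sim):
    """Correlation between original and dendrogram-implied similarities."""
    coph = cophenetic_values(sim)
    pairs = sorted(coph)
    return pearson_r([sim[i][j] for i, j in pairs], [coph[p] for p in pairs])


# Hypothetical similarities among four samples forming two tight pairs.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]

coph = cophenetic_values(sim)
r_coph = cophenetic_correlation(sim)
print(r_coph)
```

A value near 1 indicates that the dendrogram distorts the original similarity matrix very little, as is the case for this well-structured example.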
In our examples, we have usually chosen a single level of similarity for
comparison of all clusters in a dendrogram. In practice, there is no
compelling reason why more than one level cannot be chosen, with a different
level of similarity used in different parts of the dendrogram. Samples from
the high-gradient, upstream portion of a stream, for example, may generally be
more similar to each other than samples from the low-gradient, downstream
portion. Similarly, samples collected in winter may generally be more similar
to each other than samples collected in summer. As with analysis of species
diversity, a thorough understanding of the ecosystem and a grasp of the
questions to which answers are being sought are required, not mere cookbook
application of complex quantitative analytical techniques.
In some dendrograms, especially those with low overall similarities, a
number of samples may be left unclustered at any reasonable level of
similarity. In Figure 19, for example, six of the lower eight samples in the
dendrogram are unclustered. Lack of clustering can result from real
dissimilarity or can be an artifact of clustering. One should refer to the
original similarity matrix to check the values and, if necessary, go back to
the data matrix to see which species contributed to the anomalous lack of
clustering of some samples.
Cluster analysis is best for expressing similarities at the tips of
dendrograms (smaller pair groups, higher similarity levels) and is less
reliable for interpretation of intercluster similarities (lower similarity
levels). Reliability is lost because the clustering algorithm averages
similarities at each step in the clustering procedure. If one is interested
in intergroup similarities, one is well advised to use principal component
ordination, which was not considered in this report.
8.3.4 Ordination
As was pointed out earlier, ordination has been little used in applied
aquatic ecology, and our evaluation of it here is mainly in terms of how well
it agrees with the results of cluster analysis. In general, however,
ordination is a more widely applicable technique than cluster analysis because
no a priori assumptions need be made about clusters in the data. Such
ordination techniques as principal component analysis are well suited for
showing intercluster similarities rather than fine details of intracluster
similarities. Nonmetric multidimensional scaling, the ordination technique
tested here, is a compromise between principal component analysis and cluster
analysis. We have found that when no structure exists in a data matrix because
of small sample sizes or randomness, nonmetric multidimensional scaling is
quick to show it by producing a totally meaningless and uninterpretable
ordination. We recommend
that more work be done with ordination of other data sets to test the
applicability of nonmetric multidimensional scaling to data from aquatic
surveys.
8.3.5 Species Diversity and Hierarchical Diversity
Because species diversity is meant to provide a different measure of
community structure than cluster analysis or ordination, we suggest that it be
used in conjunction with the other methods. Indices of species diversity have
been widely misused, and a vast literature discusses the disadvantages of
using them. The principal difficulty stems from the nonuniqueness of a given
value of species diversity. Thus, a moderate value of species diversity may
result from a wide range of community structures. Communities with many
species with few individuals each, a moderate number of species with a
moderate number of individuals each, and few species with many individuals
each could all give the same value of species diversity, depending on the
sampling and evenness of distribution of individuals among species. With such
latitude, the method must of course be used with care. In particular, the
search for absolute, global values of species diversity to indicate healthy or
damaged ecosystems is wrong in principle. Nevertheless, species diversity of
samples from within a stream, reservoir, or drainage basin can be compared,
especially if used with care and in conjunction with cluster analysis or
ordination.
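The nonuniqueness of a diversity value can be demonstrated numerically. The sketch below uses Shannon's formula with natural logarithms as a stand-in for an information-theoretic index; that choice, the function names, and the two communities are assumptions made for illustration. It searches for an eight-species community dominated by one species that has the same diversity as a community of four equally abundant species.

```python
import math


def shannon(abundances):
    """Shannon diversity (natural logarithms) from counts or proportions."""
    total = sum(abundances)
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


def dominated_community(p, s):
    """Proportions for s species: one has proportion p, the rest share 1 - p."""
    return [p] + [(1 - p) / (s - 1)] * (s - 1)


# Community A: four equally abundant species.
target = shannon([25, 25, 25, 25])

# Community B: eight species, one dominant. Diversity falls steadily as the
# dominant proportion p rises above 1/8, so bisection locates the value of p
# at which B has the same diversity as A.
lo, hi = 1 / 8, 0.999
for _ in range(60):
    mid = (lo + hi) / 2
    if shannon(dominated_community(mid, 8)) > target:
        lo = mid   # still more diverse than A: increase dominance
    else:
        hi = mid
p = (lo + hi) / 2
uneven = dominated_community(p, 8)

print(target, shannon(uneven), p)
```

Two communities that differ in both number of species and evenness thus yield the same index value, which is why the index should not be interpreted in isolation.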
Hierarchical diversity has been little used in applied aquatic ecology
except by authors of this report and their coauthors. We recommend that it be
further explored because of its value in showing ways in which the high cost
of aquatic environmental surveys can be reduced.
-------
REFERENCES
Barr, A. J., J. H. Goodnight, J. P. Sall, and J. T. Helwig. 1976. A User's
Guide to SAS-76. SAS Institute, Inc., Raleigh, North Carolina.
Sparks Press. 329 pp.
Basharin, G. P. 1959. On a Statistical Estimate for the Entropy of a
Sequence of Independent Random Variables. Theory Probab. Its Appl.
4:333-36.
Brillouin, L. 1962. Science and Information Theory. 2nd ed. New York:
Academic Press. 347 pp.
Buchanan, R. J. and B. Lighthart. 1973. Indicator Phytoplankton Communities:
A Cluster Analysis Approach. B. C. Prov. Mus. Nat. Hist. Anthropol. Rep.
6:1-10.
Cairns, John, Jr., D. W. Albaugh, F. Busey, and M. D. Chanay. 1968. The
Sequential Comparison Index—A Simplified Method for Nonbiologists to
Estimate Relative Differences in Biological Diversity in Stream Pollution
Studies. J. Water Pollut. Control Fed. 40: 1607-1613.
Cairns, J., Jr., and R. L. Kaesler. 1969. Cluster Analysis of Potomac River
Survey Stations Based on Protozoan Presence-Absence Data. Hydrobiol.
34:414-32.
Cairns, J., Jr., and R. L. Kaesler. 1971. Cluster Analysis of Fish in a
Portion of the Upper Potomac River. Trans. Am. Fish Soc. 100:750-56.
Cairns, J., Jr., R. L. Kaesler, and R. Patrick. 1970. Occurrence and
Distribution of Diatoms and Other Algae in the Upper Potomac River.
Nat. Acad. Nat. Sci. Philadelphia 436:1-12.
Cairns, J., Jr., G. R. Lanza, and B. C. Parker. 1972. Pollution Related
Structural and Functional Changes in Aquatic Communities with Emphasis on
Freshwater Algae and Protozoa. Proc. Acad. Nat. Sci. Philadelphia
124(5):79-127.
Cheetham, A. H. and J. E. Hazel. 1969. Binary (Presence-Absence) Similarity
Coefficients. J. Paleontol. 43(5):1130-36.
Crossman, J. S., J. Cairns, Jr., and R. L. Kaesler. 1973. Aquatic
Invertebrate Recovery in the Clinch River Following Hazardous Spills and
Floods. Va. Polytech. Inst. State Univ. Water Resour. Res. Cent. Bull.
63:1-56.
Crossman, J. S., R. L. Kaesler, and J. Cairns, Jr. 1974. The Use of Cluster
Analysis in the Assessment of Spills of Hazardous Materials. Am.
Midl. Nat. 92(1):94-114.
Cummins, K. W. 1973. Trophic Relations of Aquatic Insects. Annu.
Rev. Entomol. 18:183-206.
Dennison, J. M. and W. W. Hay. 1967. Estimating the Needed Sampling
Area for Subaquatic Ecologic Studies. J. Paleontol. 41:706-708.
Environmental Protection Agency. 1976. Federal Interagency Energy/
Environment Research and Development Program, Status Report II.
Office of Energy, Minerals, and Industry, Office of Research and
Development, Washington, DC, 1976.
Farris, J. S. 1969. On the Cophenetic Correlation Coefficient. Syst.
Zool. 18:279-85.
Fisher, R. A., A. S. Corbet, and C. B. Williams. 1943. The Relation
Between the Number of Species and the Number of Individuals in a
Random Sample of an Animal Population. J. Anim. Ecol. 12:42-58.
Forbes, S. A. 1907. On the Local Distribution of Certain Illinois
Fishes: An Essay in Statistical Ecology. Bull. Ill. State Lab.
Nat. Hist. 7:273-303.
Green, P. E. and F. J. Carmone. 1970. Multidimensional Scaling
and Related Techniques in Marketing Analysis. Boston, Mass.:
Allyn and Bacon, Inc. 203 pp.
Green, R. H., Jr. 1979. Sampling Design and Statistical Methods for
Environmental Biologists. Wiley-Interscience, New York, 257 pp.
Hamilton, M. A. 1975. Indexes of Diversity and Redundancy. J. Water
Pollut. Control Fed. 47:630-32.
Hedgpeth, Joel. 1973. Temperature Relationships of Near Shore Oceanic
and Estuarine Communities. In Effects and Methods of Control of
Thermal Discharges, pp. 1271-1431. Serial No. 93-14, Pt. 3,
Washington, DC: U.S. Govt. Printing Office.
Hoaglin, David C. and Roy E. Welch. 1975. MIT-SNAP: An Interactive Data
Analysis System. MIT. 60 pp.
Hurlbert, S. H. 1971. The Nonconcept of Species Diversity: A Critique
and Alternative Parameters. Ecology 52:577-86.
Hutchinson, G. E. 1957. Concluding Remarks, Cold Spring Harbor Symposium.
Quant. Biol. 22:415-27.
Hynes, H.B.N. 1960. Biology of Polluted Waters. Liverpool:
Liverpool Univ. Press. 202 pp.
Hynes, H.B.N. 1970. The Ecology of Running Waters. Toronto:
Univ. of Toronto Press. 555 pp.
Jaccard, P. 1908. Nouvelles Recherches sur la Distribution Florale.
Soc. Vaudoise Sci. Natur. Bull. 44:233-70.
Kaesler, R. L. 1970. The Cophenetic Correlation Coefficient in
Paleoecology. Bull. Geol. Soc. Am. 81:1261-66.
Kaesler, R. L. and J. Cairns, Jr. 1972. Cluster Analysis of Data
from Limnological Surveys of the Upper Potomac River. Am. Midl.
Nat. 88:56-67.
Kaesler, R. L., J. Cairns, Jr., and J. M. Bates. 1971. Cluster
Analysis of Noninsect Macroinvertebrates of the Upper Potomac
River. Hydrobiol. 37(2):173-81.
Kaesler, R. L., J. Cairns, Jr., and J. S. Crossman. 1974. Redundancy
in Data from Stream Surveys. Water Res. 8:637-42.
Kaesler, R. L. and E. E. Herricks. 1977. Analysis of Data from
Biological Surveys of Streams: Diversity and Sample Size. Water
Res. Bull. 13(1):125-35.
Kaesler, R. L., E. E. Herricks, and J. S. Crossman. 1978. Uses of
Indices of Diversity and Hierarchical Diversity in Stream Surveys.
In ASTM Symposium on Quantitative and Statistical Analyses of
Biological Data for the Assessment of Water and Wastewater Quality,
Minneapolis, Minn. (June 20-21, 1977).
Kolkwitz, R. and M. Marsson. 1908. Ecology of Plant Saprobia. Rep.
Ger. Bot. Soc. 26a:505-19.
Kolkwitz, R. and M. Marsson. 1909. Ecology of Animal Saprobia. Int.
Rev. Hydrobiol. Hydrogeogr. 2:126-52.
Krebs, C. J. 1972. Ecology: The Experimental Analysis of Distribution
and Abundance. New York: Harper and Row. 694 pp.
Kruskal, J. B. 1964a. Multidimensional Scaling by Optimizing Goodness
of Fit to a Nonmetric Hypothesis. Psychometrika. 29:1-27.
Kruskal, J. B. 1964b. Nonmetric Multidimensional Scaling: A Numerical
Method. Psychometrika. 29:115-29.
Margalef, R. 1956. Informacion y Diversidad Especifica en las Comunidades
de Organismos. Invest. Pesq. 3:99-106.
Parker, B. C. and B. L. Turner. 1961. Operational Niches and Community-
Interaction Values as Determined from In Vitro Studies of Some Soil Algae.
Evolution. 15(2):228-238.
Patil, G. P. and C. Taillie. 1976. Ecological Diversity: Concepts,
Indices, and Applications. Proc. 9th Int. Biom. Conf. 2:383-411.
Patrick, R. 1961. A Study of the Numbers and Kinds of Species Found
in Rivers in Eastern United States. Proc. Acad. Nat. Sci.
Philadelphia 113(10):215-58.
Patrick, R. 1967. Natural and Abnormal Communities of Aquatic Life
in Rivers. Bull. S. C. Acad. Sci. 29:19-28.
Peet, R. K. 1974. The Measurement of Species Diversity. Annu. Rev. Ecol.
Syst. 5:285-307.
Pennak, R. W. 1971. Toward a Classification of Lotic Habitats.
Hydrobiol. 30:321-334.
Pielou, E. C. 1966a. The Measurement of Diversity in Different Types
of Biological Collection. J. Theor. Biol. 13:131-44.
Pielou, E. C. 1967. The Use of Information Theory in the Study of the
Diversity of Biological Population. Proc. 5th Berkeley Symp. Math.
Stat. Probab. 4:163-77.
Pielou, E. C. 1969. An Introduction to Mathematical Ecology. New York:
John Wiley & Sons. 286 pp.
Pielou, E. C. 1974. Population and Community Ecology: Principles
and Methods. New York: Gordon and Breach. 424 pp.
Pielou, E. C. 1975. Ecological Diversity. New York: Wiley-Interscience.
165 pp.
Pielou, E. C. 1977. Mathematical Ecology. Wiley-Interscience, New
York, 311 pp.
Roback, S. S., J. Cairns, Jr., and R. L. Kaesler. 1969. Cluster
Analysis of Occurrence and Distribution of Insect Species in a
Portion of the Potomac River. Hydrobiol. 34:484-502.
Rohlf, F. J. 1970. Adaptive Hierarchical Cluster Schemes. Syst. Zool.
19:58-82.
Rohlf, F. J. 1972. An Empirical Comparison of Three Ordination
Techniques in Numerical Taxonomy. Syst. Zool. 21:271-280.
Shannon, C. E. and W. Weaver. 1949. The Mathematical Theory of
Communication. Urbana: Univ. of Illinois Press.
Shannon, E. E. 1970. Eutrophication-Trophic State Relationships in
North and Central Florida Lakes. Ph.D. Thesis, Univ. of Florida.
258 pp.
Shelford, V. E. 1915. Principles and Problems of Ecology as Illustrated
by Animals. J. Ecol. 3:1-23.
Simpson, E. H. 1949. Measurement of Diversity. Nature 163:688.
Simpson, G. G. 1960. Notes on the Measurement of Faunal Resemblance.
Am. J. Sci. 258a:300-11.
Sneath, P.H.A. and R. Sokal. 1973. Numerical Taxonomy.
San Francisco: W. H. Freeman and Co. 573 pp.
Sokal, R. R. 1961. Distance as a Measure of Taxonomic Similarity.
Syst. Zool. 10:70-79.
Sokal, R. R. and F. J. Rohlf. 1962. The Comparison of Dendrograms by
Objective Methods. Taxon 11(2):33-40.
Sokal, R. R. and P.H.A. Sneath. 1963. Principles of Numerical Taxonomy.
San Francisco: W. H. Freeman and Co. 359 pp.
Soukup, J. F. 1970. Fish Kill # 70-025, Clinch River, Carbo,
Russell County. Unpublished.
Steinmann, P. 1907. Die Tierwelt der Gebirgsbache: Eine
Faunistisch-Biologische Studie. Ann. Biol. Lacustre 2:30-150.
Steinmann, P. 1908. Die Tierwelt der Gebirgsbache. Arch. Hydrobiol.
3:266-73.
Stephenson, W. 1972. The Use of Computers in Classifying Marine
Bottom Communities. Oceanogr. South Pac. 31:463-73.
Stephenson, W. and M.C.L. Dredge. 1976. Numerical Analysis of Fish
Catches from Serpentine Creek. Proc. R. Soc. 87:33-43.
Stephenson, W., Y. I. Raphael, and S. D. Cook. 1976. The Macrobenthos
of Bramble Bay, Moreton Bay, Queensland. Mem. Qd. Mus. 17(3):425-47.
Stephenson, W., W. T. Williams, and S. D. Cook. 1972. Computer Analysis
of Petersen's Original Data on Bottom Communities. Ecol. Monogr.
42(4):387-415.
Train, Russell E. 1973. Address to the National Conference on Managing
the Environment. U.S. EPA.
Whittaker, R. H. 1975. Communities and Ecosystems. 2nd ed. New York:
Macmillan. pp. 124-25.
Whitton, B. A., ed. 1975. River Ecology. Berkeley: Univ. of Calif.
Press. 725 pp.
Wilhm, J. L. 1968. Use of Biomass Units in Shannon's Formula. Ecology
49:153-156.
Wilhm, J. L. and T. C. Dorris. 1968. Biological Parameters for Water
Quality Criteria. Bioscience 18(6):477-81.
Williams, C. B. 1950. The Application of the Logarithmic Series to the
Frequency of Occurrence of Plant Species in Quadrats. J. Ecol.
38:107-138.
Woodwell, G. M. 1970. Effects of Pollution on the Structure and
Physiology of Ecosystems. Science 168:429-33.
-------
TECHNICAL REPORT DATA
(Please read Instructions on the reverse before completing)
1. REPORT NO.
EPA-600/7-84-042
3. RECIPIENT'S ACCESSION NO.
4. TITLE AND SUBTITLE
Consolidation of Baseline Information, Development of
Methodology, and Investigation of Thermal Impacts on
Freshwater Shellfish, Insects, and Other Biota
5. REPORT DATE
March 1984
6. PERFORMING ORGANIZATION CODE
7. AUTHOR(S)
John S. Grossman, James R. Wright, Jr., and
Roger L. Kaesler
8. PERFORMING ORGANIZATION REPORT NO.
TVA/EP-78/09
9. PERFORMING ORGANIZATION NAME AND ADDRESS
Office of Natural Resources
Tennessee Valley Authority
Knoxville, Tennessee 37902
10. PROGRAM ELEMENT NO.
INE-625A
11. CONTRACT/GRANT NO.
EPA-IAG-DS-E721
12. SPONSORING AGENCY NAME AND ADDRESS
Office of Environmental Processes and Effects Research
Office of Research and Development
U.S. Environmental Protection Agency
Washington, DC 20460
13. TYPE OF REPORT AND PERIOD COVERED
14. SPONSORING AGENCY CODE
EPA/600/16
15. SUPPLEMENTARY NOTES
This project is part of the EPA planned and coordinated Federal Interagency Energy/
Environment Research and Development Program.
16. ABSTRACT
A computerized information system was developed for storing, retrieving, and
analyzing data collected during limnological surveys. To facilitate storage of
information, a series of hierarchical codes was developed. These codes not only
reduced storage requirements, but also helped reduce computing costs.
The information system utilized three analytical procedures: cluster analysis,
ordination using nonmetric multidimensional scaling (MDS), and measurement of
species diversity from information theory.
Results indicated that identification to species contributed little information
about the structure of communities that discrimination of genera had not already
provided.
The heuristic properties of species diversity were used to evaluate two
classifications stressing functional morphology and trophic-functional relationships
of benthic invertebrates, independent of the taxonomic hierarchy. Both methods
produced results similar to ones obtained by cluster analysis, suggesting that
they merit further investigation.
17. KEY WORDS AND DOCUMENT ANALYSIS
a. DESCRIPTORS
Ecology, Environment, Hydrology, Methodology,
Limnology, Information Systems
b. IDENTIFIERS/OPEN ENDED TERMS
Control Technology: Thermal, Nuclear, Coal
Effects: Environmental, Nuclear, Coal
c. COSATI Field/Group
6F, 8A
18. DISTRIBUTION STATEMENT
Release to public
19. SECURITY CLASS (This Report)
Unclassified
20. SECURITY CLASS (This page)
Unclassified
21. NO. OF PAGES
159
22. PRICE
EPA Form 2220-1 (9-73)
-------