VARIABILITY OF GLNPO ZOOPLANKTON DATA
Variability of Crustacean Zooplankton
Data Generated by the Great Lakes
National Program Office's Annual
Water Quality Survey
Richard P. Barbiero
DynCorp Science and Engineering Group
1359 West Elmdale Avenue
Suite #2
Chicago, Illinois 60660
Prepared for:
United States Environmental Protection Agency
Great Lakes National Program Office
77 West Jackson Boulevard
Chicago, Illinois 60604
Louis Blume, Project Officer
September 2003
-------
Acknowledgements
This report was prepared under the direction of Louis Blume, Project Officer, Great Lakes Na-
tional Program Office (EPA Contract No. 68-C-01-091). Assistance with ANOVA calculations
was provided by Ken Miller, DynCorp Science and Engineering Group.
-------
Summary
A Data Quality Objective (DQO) has been developed by the Great Lakes National Program Office
(GLNPO) to ensure that data collected from their Water Quality Surveys are of suitable quality to
provide decision makers with sufficient certainty to make educated ecological management deci-
sions. The current GLNPO DQO states that data quality should be sufficient for there to be an
80% chance of detecting a 20% change, at the 90% confidence level, between current and historical
measurements of a variable made in a particular lake during a particular season.
This report determines the extent to which zooplankton data comply with the GLNPO DQO, and
assesses the relative contribution of different sources of variability to the overall uncertainty of
zooplankton data. The most important findings are summarized below:
• Data quality of zooplankton data falls far short of the current DQO. In only 3
of 184 cases examined was the DQO criterion met.
• Minimum detectable differences for the major taxonomic groups and the most
common species were largely between 40 and 190%.
• Estimates of cladoceran densities were most variable; estimates of calanoid co-
pepod densities were least variable.
• It is unclear if the current data quality is sufficient to detect ecologically
important trends. A recent study shows that in at least some cases it is (Barbiero and
Tuchman, in press).
• Relatively little variability is due to analyst error in counting/identification.
• About 25% of variability is introduced during the field sampling and/or labo-
ratory subsampling stages.
• The majority of uncertainty in zooplankton data is due to station-to-station
(within basin) variability. Reducing this source of variability would entail in-
creasing the number of sampling stations.
• The most practical way to reduce variability is to ensure proper functioning/
reading of the flow meter.
• Since the variability introduced into the analysis by subsampling in the labora-
tory is unknown, a study quantifying this source of uncertainty could point to
further means of reducing variability.
• An appropriate QC criterion for relative species composition of duplicate labo-
ratory analyses, using the PSc index, is 0.92.
• An appropriate RPD QC criterion for total organism counts in duplicate labo-
ratory analyses is 4%.
-------
1 Introduction
1.1 GLNPO water quality survey
The Great Lakes National Program Office
(GLNPO) of the U.S. EPA has been involved
in regular surveillance monitoring of the open
waters of the Laurentian Great Lakes since
1983. This surveillance monitoring is meant
to satisfy the provisions of the Great Lakes
Water Quality Agreement (International Joint
Commission 1978), which calls for periodic
monitoring of the lakes to evaluate the effec-
tiveness of pollution control/reduction strate-
gies in the Great Lakes, recognize emerging
problems, and identify the need for new or re-
vised strategies and further research. Accord-
ing to GLNPO (2003), the water quality sur-
veys have been specifically designed to:
• Focus on key physical, chemical, and
biological indicators of lake health
• Evaluate the health of each lake under
different conditions (stratified and un-
stratified)
• Allow for real-time detection of signifi-
cant changes in water quality, as indi-
cated by significant changes in one or
more parameters
• Provide data that can be compared
from year to year
• Provide data to support decisions re-
garding the need for further study or
new pollution control strategies
In order to ensure that data collected from
GLNPO's water quality surveys fulfill these
requirements, a data quality objective (DQO)
has been developed to be applied to all water
quality survey data. Management of data qual-
ity is an important aspect of the larger mission
of the water quality surveys, and requires an
understanding both of the overall magnitude
of variability, and of the relative contributions
of individual components of sample collection
and analysis to total variability. More funda-
mentally, it is also necessary that the DQO be
sufficiently explicit to enable its unambiguous
application to water quality survey data, and
that it be appropriate to the type of data col-
lected by the water quality survey.
Recognition of the importance of open water
planktonic communities in the overall assessment
of ecosystem health led to the inclusion
of sampling for zooplankton communities
at the inception of the monitoring program.
However, data generated from the sampling
of biological communities pose special
challenges for the application of the DQO and
for assessments of variability. DQOs are typically
developed in relation to chemical variables,
which are characteristically univariate,
unlike biological community data, which are
multivariate. It is important, therefore, to assess
both the extent to which the DQO is applicable
to biological data, and whether or not
those data satisfy the DQO.
1.2 Objectives of study
The overall purpose of the present study was
to provide an assessment of the variability of
data generated by GLNPO's zooplankton
monitoring program. The specific goals of the
study were several fold:
1. To determine the minimum detectable
differences under the current sampling
regime;
2. To determine if the current level of ef-
fort satisfies the GLNPO DQO;
3. To determine the relative contribution
to overall variability of different stages
of sample collection and analysis;
4. To determine appropriate analysis crite-
ria for duplicate laboratory (QC) analy-
ses.
In addition, the applicability of the current
DQO to zooplankton data is discussed.
-------
2 GLNPO's Data Quality
Objectives
2.1 Current ambiguities
The assessment of lake health using data gen-
erated from the water quality survey requires
that sufficient data quality be obtained to per-
mit detection of 'significant' changes in the
variables under consideration. For the pur-
poses of the water quality surveys, GLNPO
has defined a significant change as a 20% dif-
ference between current and 'historical' meas-
urements, made for a particular variable in a
particular lake during a particular season. The
DQO for GLNPO's water quality survey is
stated as the ability to "collect measurements
that will yield an 80% chance of detecting a
change of 20% or more within a particular
lake and season, at the 90% confidence
level" (p. 15; GLNPO 2003). This formula-
tion of the DQO, however, contains several
ambiguities, particularly as it relates to multi-
variate data such as that generated from zoo-
plankton analyses. First, as currently stated
the DQO does not indicate what the detection
target of a 20% change is in relation to. Else-
where in the same document, both a compari-
son to 'historical' values (p. 15; GLNPO 2003)
and comparisons between two years (p. 27;
GLNPO 2003) are referred to. As pointed
out elsewhere (Barbiero, 2003), the detection
of a change between a given season's data and
'historical' values can be variously interpreted
to mean a change in relation to the previous
year's data, a change in relation to a pooling of
all previous years' data, or a change in relation
to any previous year's data. An additional
possible interpretation of the DQO would be
to permit the detection of a trend in historical
data, although this would not seem to be com-
pletely consistent with its current formulation.
The DQO also appears to be at variance
with the basic statistical design of the water
quality surveys, in that the target change is de-
fined in the DQO on a lake-wide basis, while
the statistical design of the survey is based on
replication at the level of two or three homo-
geneous basins within each lake (p. 27,
GLNPO 2003). This can be accommodated
by employing a stratified statistical design
in assessing changes in variables, i.e., by first
computing the values of each variable on a ba-
sin-wide basis, and then combining those esti-
mates in proportion to how much of the lake
each basin accounts for to arrive at a lake-wide
estimate. Under this scenario, variance would
also have to be calculated proportionately. In-
terpreting the DQO in this way, however, as-
sumes that changes can only take place on a
lake-wide basis. In a case where the timing
and/or magnitude of change differed from ba-
sin to basin, as for instance might be expected
in Lake Erie where differences in morphome-
try result in vast differences in the chemical
and biological characteristics of the three ba-
sins, limiting the detection of changes to a
lake-wide basis could obscure changes taking
place only within a given basin.
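
As a minimal sketch of the stratified calculation described above, the following Python combines basin-level means into a lake-wide mean, weighting each basin by its share of the lake, and propagates the variance using the squared weights. The function name, the weights and the density values are hypothetical and serve only as illustration; the weights actually appropriate for each lake are not given here.

```python
from statistics import mean, variance

def stratified_estimate(basins):
    """basins: list of (weight, values) pairs.

    weight is the basin's share of the lake (weights should sum to 1);
    values are the station-level measurements within that basin.
    Returns (lake-wide mean, variance of the lake-wide mean).
    """
    lake_mean = 0.0
    lake_var = 0.0
    for weight, values in basins:
        n = len(values)
        lake_mean += weight * mean(values)
        # The variance of a basin mean is s^2 / n; it enters with the squared weight.
        lake_var += weight ** 2 * variance(values) / n
    return lake_mean, lake_var

# Hypothetical two-basin lake, each basin covering half of the lake.
example = [
    (0.5, [10.0, 12.0, 14.0]),
    (0.5, [20.0, 22.0, 24.0]),
]
lake_mean, lake_var = stratified_estimate(example)
```

Because the weights are squared in the variance term, a highly variable basin that covers a small share of the lake contributes little to the lake-wide variance, which is one reason a lake-wide test can miss a change confined to one basin.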
While it is not within the scope of this re-
port to clarify the ambiguities of the current
GLNPO DQO, in order to apply it to the
zooplankton data, some assumptions had to
be made concerning its interpretation. For the
purposes of this report, the DQO was assumed
to denote the requirement of an 80%
chance of detecting a 20% change between
two years within a given basin for a particular
season at the 90% confidence level.
2.2 Application to multivariate data
Resolving the ambiguities in the current for-
mulation of the DQO is theoretically possible.
More fundamental difficulties exist, however,
in the application of the DQO to data gener-
ated by the zooplankton sampling program.
As with all data generated by the biological
monitoring program, zooplankton data are
multivariate. Each sample, rather than producing
a single value associated with a single
variate, will produce values associated with a
varying number of variates. Variates here cor-
respond to the different species identified in
each sample, and values associated with those
variates correspond both to the densities of
those species, and to their biomass. The vari-
ates produced by a sample will not necessarily
be consistent within groups of replicates, nor
will they even necessarily be the same between
replicate analyses of the same sample. Theo-
retically, then, the DQO could apply to each
individual variate (i.e., species) identified
within a sample. A given sample could there-
fore be called upon to satisfy as many DQOs
as there are species within that sample, which
in the case of zooplankton could be expected
to vary between several and several dozen. In
addition, it might be of interest to assess
changes in broader taxonomic categories of
organisms, for example to assess changes at
the taxonomic level of order or suborder (e.g.,
cladocerans, calanoid copepods, etc.), or to
assess changes in various functional groups
(e.g., grazers, predators, etc.), or indeed to track
changes in total zooplankton density or bio-
mass.
One problem, therefore, arising from the
multivariate nature of zooplankton data is de-
ciding upon the variate(s) of interest. It is
likely that changes in the populations of some
species, or certain groupings of species, are of
little inherent ecological interest, and therefore
do not need to be subject to the DQO. Also,
the statistical difficulties associated with esti-
mating the abundances of species that typically
occur in very small numbers might preclude
their ability to conform to the DQO.
A more fundamental problem exists, how-
ever, if community-level attributes of the zoo-
plankton data are of interest. Examination of
overall community structure often reveals
changes that are not apparent from examina-
tion of individual species (Yan et al., 1996),
and could provide a more relevant measure of
ecosystem health. In this instance, defining an
appropriate metric, and quantifying the vari-
ability associated with that metric, becomes
highly problematic. Changes in community
structure are typically quantified using multi-
variate techniques, but metrics derived from
such techniques are often not easily converti-
ble into a single number, nor are there univer-
sally accepted methods of quantifying the vari-
ance of such metrics, and they thus would not
be easily amenable to assessment in terms of
the current DQO. There are currently no
guidelines in place to enable the application of
the GLNPO DQO to multivariate community
level data.
3 Zooplankton Program
3.1 Overview
GLNPO's regular surveillance monitoring of
the open waters of the Laurentian Great Lakes
began in 1983. Initially, only the open waters
of Lakes Michigan, Huron and Erie were in-
cluded in GLNPO's monitoring program. In
1986, monitoring of Lake Ontario was added,
and in 1992, Lake Superior was included.
Sampling protocols have undergone some
changes since the beginning of the program.
In 1983 and 1984, two vertical zooplankton
tows were taken at each site with a 63-µm
mesh net: one from 2 m above the bottom to
the surface, and a second from 20 m to the
surface (Makarewicz, 1987; Makarewicz,
1988). In 1985, the deeper tow was apparently
discontinued (Makarewicz and Bertram, 1991),
leaving just the 20-m tow. Concerns about
the representativeness of samples collected
from just the upper 20 m of the water column
led to a further change in the zooplankton
sampling protocol. Starting in the summer of
1997, a second tow was added to the sampling
regime. This tow was taken from a depth of
100 m, or 2 m from the bottom, whichever
was shallower. Unlike previous deep tows, the
100-m tows were taken using a net with a larger
mesh size (153-µm) to prevent clogging
and to reduce the pressure wave created by the
net during sampling. Also, the time of day at
which the tows were taken was recorded from
1996 on, something which had not been done
earlier.
There are two main consequences of taking
zooplankton tows from relatively shallow
depths. In species that undergo diurnal verti-
cal migration, 20-m tows taken during the day,
when such species are typically below the
epilimnion, can result in an underestimation of
abundances. This would lead both to unrepre-
sentative samples, and also to an increase in
both inter- and intra-annual variability. If rep-
licate sites are sampled at different times of
day during a cruise, as is often the case, intra-
annual variability would increase, while if sites
are visited at different times of day from year
to year, as is also likely, this would result in an
increase in apparent inter-annual variability.
Secondly, populations of deeper-living zoo-
plankton that rarely migrate above 20 m
would be consistently underestimated in 20-m
tows, whether taken during the day or at night.
Because of the problems inherent in the interpretation
of shallow, 63-µm mesh tows, emphasis
in this report will be on the deeper,
153-µm mesh tows.
3.2 Field methods
Currently, two sampling tows are performed at
each station. The first tow is taken from 20 meters below
the water surface using a 63-µm mesh net. The
second tow is a 'full' water column tow, to 2
meters above the bottom of the lake or 100 m,
whichever is less, using a 153-µm mesh net. If
the station depth is less than 20 m, both tows
are taken from one meter above the bottom.
Tows are taken with a 0.5-m diameter conical
net (D:L=1:3) equipped with a flowmeter.
Once on station, the biology technician resets
the flowmeter dials to zero, and has the winch
operator lower the net so the rim of the net is
at the surface of the water. The net is then lowered
to the appropriate depth as indicated by a
winch meter on deck, and raised at a constant
speed (at or close to 0.5 meter/second)
until the rim of the net is approximately at eye
level. Upon retrieval the flowmeter dials are
read and the net is rinsed with a hose from the
outside to wash all of the organisms off of the
net cloth inside and into the sample bucket.
The sample is concentrated into the sample
bucket, which is then detached from the net
and its contents rinsed and poured three times
into a pre-labeled 500-mL sample bottle. The
organisms are then narcotized with soda water
and preserved with sucrose formalin solution.
Triplicate tows of each depth are taken at the
master stations.
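
For orientation, the nominal volume swept by a tow can be sketched from the net geometry given above (0.5-m mouth diameter) and the tow depth. This is only a geometric approximation under the assumption of 100% filtration efficiency; in practice the flowmeter reading would be used to correct it, and the flowmeter calibration is not given here, so it is not modeled. Function names are hypothetical.

```python
import math

NET_DIAMETER_M = 0.5  # net mouth diameter stated in the field methods

def nominal_tow_volume_m3(tow_depth_m, diameter_m=NET_DIAMETER_M):
    """Volume of the cylinder swept by the net mouth over a vertical tow."""
    mouth_area = math.pi * (diameter_m / 2) ** 2
    return mouth_area * tow_depth_m

def density_per_m3(organisms_in_sample, volume_m3):
    """Organisms per cubic metre for a whole-sample count."""
    return organisms_in_sample / volume_m3

vol_100 = nominal_tow_volume_m3(100)  # deep (153-µm mesh) tow, about 19.6 m^3
vol_20 = nominal_tow_volume_m3(20)    # shallow (63-µm mesh) tow, about 3.9 m^3
```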
3.3 Laboratory methods
Microcrustacea are examined in four stratified
aliquots under a stereoscopic microscope. The
sample is subsampled using a Folsom plank-
ton splitter, with half of each split set aside,
and the other half returned to the splitter to be
split again. Successive splits are made until the
last 2 subsamples contain between 200 and
400 microcrustaceans each (not including nau-
plii). In total, four subsamples are examined
and enumerated. Each is removed, in turn,
with a condensing tube and placed in a circular
counting chamber. All microcrustaceans
within each subsample are identified and enu-
merated under a stereozoom microscope. The
four subsamples are: the final two, most dilute
subsamples which contain 200-400 organisms,
in which all microcrustaceans are examined
and enumerated; a third subsample equal in
fraction to the sum of the first two subsam-
ples, which is examined for subdominant taxa
(taxa enumerated less than 40 times in the first
two subsamples combined); and a fourth subsample
equal in fraction to the sum of the first
three from which rare taxa are enumerated. In
general, ten percent of all samples analyzed are
analyzed in duplicate by a second analyst. If a
given lake/cruise has less than 10 samples, at
least one sample from that data set is also ana-
lyzed in duplicate. Duplicate analyses are per-
formed after subsamples are placed into the
counting chamber, and thus quantify variation
associated with enumeration and identifica-
tion, but not with subsampling.
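
The splitting scheme described above can be sketched as follows. The code assumes that each pass through the splitter halves the remaining material exactly, and that the analyst has a rough estimate of the total number of microcrustaceans in the sample; both are simplifying assumptions, and the function names are hypothetical.

```python
from fractions import Fraction

def splits_needed(estimated_total, upper=400):
    """Smallest number of halvings that brings each final half to at most `upper` organisms."""
    k = 0
    while estimated_total / 2 ** k > upper:
        k += 1
    return k

def subsample_fractions(k):
    """Fractions of the whole sample represented by the four examined subsamples after k splits.

    The first two are the final, most dilute halves; the third equals the sum
    of the first two; the fourth equals the sum of the first three.
    """
    first = second = Fraction(1, 2 ** k)
    third = first + second
    fourth = first + second + third
    return [first, second, third, fourth]

k = splits_needed(10000)          # hypothetical sample of about 10,000 organisms
fractions = subsample_fractions(k)
```

Under these assumptions, the four subsamples together cover eight times the fraction of a single final split, which is why subdominant and rare taxa are sought in the larger, less dilute fractions.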
4 Sources of Variability
4.1 Levels of replication
The statistical design of the zooplankton pro-
gram follows that of the broader water quality
monitoring program, with each lake divided
into statistically homogeneous basins (Fig. 1;
Table 1). Within each basin stations function
as replicates, and provide an indication of
large-scale spatial heterogeneity. Each basin
contains a master station, usually located at the
deepest point in the basin, at which triplicate
zooplankton tows are taken. These tows
function as field replicates and are meant to
quantify the variability within each station associated
with sample collection, including variability
associated with lowering and raising the
net, the angle and actual (as opposed to nomi-
nal) depth of the tow, the functioning/reading
of the flow meter, and the washing of the net
bucket contents into the collection bottle.
These field replicates also capture the variabil-
ity due to smaller scale zooplankton patchi-
ness.
In the laboratory each sample is subsampled,
and subsamples from successive dilutions are
counted to ensure accurate estimation of rarer
species. There is no replication at this stage,
so there is no way to estimate the amount of
error introduced into the analysis by subsampling.
The entire contents of each of four
subsamples are placed successively into the
microscope chamber and identified and enumerated
by the analyst. A second analyst provides
duplicate counts and identifications of
10% of the samples. Duplicate analyses capture
variability associated with species identifications
and with the counting of animals
within the chambers. These duplicate analyses
are conducted after the subsamples are placed
into the counting chambers, so as noted, no
estimate of subsampling variability is possible.
A summary of the main sources of variation is
given in Table 2, along with the measures currently
in place to estimate their magnitude.

Table 1. Assignment of GLNPO water quality survey stations to homogeneous basins within the
five Laurentian Great Lakes.

Lake      Basin          Stations
Michigan  southern lake  MI 11, MI 17, MI 18, MI 19, MI 23
          central lake   MI 27, MI 32, MI 34
          northern lake  MI 40, MI 41, MI 47
Huron     northern lake  HU 45, HU 48, HU 53, HU 54, HU 61
          central lake   HU 32, HU 37, HU 38
          southern lake  HU 06, HU 09, HU 12, HU 15, HU 27, HU 93
Erie      western lake   ER 58, ER 59, ER 60, ER 61, ER 91, ER 92
          central lake   ER 30, ER 31, ER 32, ER 36, ER 37, ER 38,
                         ER 42, ER 43, ER 73, ER 78
          eastern lake   ER 09, ER 10, ER 15, ER 63
Ontario   western lake   ON 12, ON 25, ON 33, ON 41
          eastern lake   ON 49, ON 55, ON 60, ON 63
Superior  western lake   SU 15, SU 16, SU 17, SU 18, SU 19
          central lake   SU 06, SU 07, SU 08, SU 09, SU 10, SU 11,
                         SU 12, SU 13, SU 14
          eastern lake   SU 01, SU 02, SU 03, SU 04, SU 05

Fig. 1. Locations of GLNPO's water quality survey (WQS) sampling stations within homogeneous
basins, as defined by 2003 quality assurance program plan. Master stations indicated in red.
4.2 Compliance with DQO
Assessing the degree to which the current
sampling effort satisfies the DQO required
that some assumptions be made in order to
resolve the ambiguities in the DQO pointed
out in Section 2.1. As stated earlier, it was as-
sumed that the DQO required data of ade-
quate quality to permit an 80% chance of de-
tecting a 20% change in a given variable be-
tween two years within a given basin and season
with 90% confidence. Basins were defined
according to GLNPO (2003) as listed in Table
1.
Assessment of such a change can be accomplished
with a two-sample t-test. Therefore,
determination of the minimum detectable dif-
ference currently permitted by the data can be
computed using the following formula:
where:
s_p^2 = sample estimate of pooled population
variance; and
δ = the minimum detectable difference.
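
For illustration, a commonly used expression for the minimum detectable difference of a two-sample t-test with n observations per group is

δ = sqrt(2 s_p^2 / n) × (t_α + t_β)

where t_α is the two-sided critical value at the chosen confidence level and t_β the one-sided value for the chosen power. That this is the exact form applied in this report is an assumption. The Python sketch below further replaces the t quantiles with normal quantiles, a large-sample approximation that understates δ when n is small; the function names and the example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def min_detectable_difference(pooled_var, n, alpha=0.10, power=0.80):
    """Approximate minimum detectable difference between two group means.

    pooled_var: pooled variance estimate (s_p^2)
    n: observations per group (e.g. stations per basin in each year)
    alpha: 0.10 corresponds to the 90% confidence level of the DQO
    power: 0.80 corresponds to the 80% chance of detection in the DQO
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = z.inv_cdf(power)
    return sqrt(2 * pooled_var / n) * (z_alpha + z_beta)

def mdd_percent(cv, n, alpha=0.10, power=0.80):
    """Minimum detectable difference as a percentage of the mean,
    given the coefficient of variation cv (standard deviation / mean)."""
    return 100 * min_detectable_difference(cv ** 2, n, alpha, power)

# Hypothetical basin: coefficient of variation 0.5, four stations per year.
example = mdd_percent(cv=0.5, n=4)  # roughly 88% of the mean
```

Under this approximation, the DQO target of a 20% detectable change is met only when `mdd_percent` returns 20 or less, which requires either a small coefficient of variation or many stations per basin.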
It was also necessary to make some assumptions
about which variates should be subject
to the DQO. In this report, the following major
taxonomic groupings were assessed: total
cladocerans, total adult cyclopoids, total
cyclopoid copepodites, total adult calanoids,
total calanoid copepodites and total crustaceans
excluding nauplii. In all cases, density
rather than biomass was used. Groups constituting
less than 20% of total density for any
basin/season combination were excluded
from the analysis. In addition, minimum detectable
differences were calculated for several
of the most common species. These included
the cladocerans Daphnia galeata mendotae and
Bosmina longirostris, the cyclopoid copepod Diacyclops
thomasi, and the calanoid copepods Leptodiaptomus
minutus and Leptodiaptomus ashlandi.
Only data generated from the deeper, 153-µm
mesh tows were assessed. Estimates of variance
were calculated from 1998 data using
only regular field samples.

Table 2. Sources of variability in zooplankton analysis.

Source of Variability                       Current Measure
Within-Basin Spatial Heterogeneity          Replicate Stations Within Basin
Sample Collection, Small-Scale Patchiness   Replicate Field Tows at Master Stations
Sub-Sampling                                None
Laboratory Analysis                         Duplicate QC Counts on 10% of Samples
4.3 Sources of variability
There are problems posed in trying to assess
the variability of multivariate data. Conven-
tional indices of dispersion, e.g., standard de-
viation, interquartile range, etc., are strictly
speaking not applicable to multivariate data,
and therefore if used must be applied either to
broad summations of the data (e.g., total num-
bers of crustaceans, total numbers of cladocer-
ans, etc.), or must be calculated separately for
each individual variate (i.e., each taxonomic
group). This results in a multitude of esti-
mates of variability for each sample, the exact
number of which depends upon the number
of species encountered in that sample. The
collective interpretation for a given sample of
these estimates of variability is problematic.
Alternatively, recourse can be made to mul-
tivariate techniques. A number of different
numerical techniques have been developed in
ecology to quantify degrees of identity be-
tween pairs or groups of samples which treat
this multivariate data as a whole. Among
these techniques, measures of similarity seek
to provide objective measures of the degree of
identity in the structure of two communities.
Typically these indices involve summing up
the differences in the abundances or bio-
volumes of individual species between two
samples/sites, which reduces these differences
to a single number scaled between 0 and 1.
The inverse of these measures, i.e., dissimilar-
ity, can also be computed to quantify the dis-
tance of two samples from each other. Where
a number of samples are assumed to represent
the same 'population' (used here in a statistical
sense), then the calculation of a matrix of
similarity values between these samples can be
used to represent the degree of variability
among those replicate samples. While this ap-
proach has the dual advantage of treating mul-
tivariate data in its entirety, and of reducing
comparisons between samples to a single
number, the drawbacks are that these tech-
niques, when used as measures of variability,
are not strictly comparable with more standard
methods, and furthermore, the characteristics
(e.g., expected distributions) of the numbers
generated by these comparisons are not fully
defined, as is the case with, for example, esti-
mates of parametric variance. Also, when
more than two samples are compared, the re-
sulting similarity comparisons produce a ma-
trix of values rather than a single value, and
thus the necessity of reducing these to a single
number remains. Unlike the multiplicity of
variances produced by analyzing each variate
separately by more conventional means,
though, the values of a similarity matrix all es-
timate the same thing, namely the degree of
dispersion amongst a set of replicates. In spite
of these drawbacks, the benefits provided by a
technique capable of fully comparing sets of
multivariate data recommend its use in the
present context.
Here, both approaches (i.e., calculation of
parametric variance on individual variates and
comparison of samples using similarity indi-
ces) were used to assess the variability of
GLNPO's zooplankton data. While these two
methods are complementary, their results are
also largely incommensurate, quantitatively,
and this should be borne in mind when inter-
preting the results presented here.
4.3.1 ANOVA analyses
To assess the relative contributions of the
various stages of sample collection and analy-
sis outlined in Table 2 to the overall variability
of zooplankton data, analyses of variance were
conducted. The sample analysis scheme of
the zooplankton program can be thought of as
being comprised of a number of hierarchical
stages. Within each lake, basins have been de-
fined by GLNPO to be statistically homoge-
neous regions. Within basins, stations serve as
replicates. Multiple tows, performed at master
stations, in turn serve as subsamples within
those stations. Duplicate laboratory analyses,
finally, serve as 'subsamples' of sample analy-
sis. The variance associated with each of these
hierarchical levels can be estimated using a
multi-factor nested analysis of variance
(ANOVA). The theoretical factor structure of
the GLNPO zooplankton data is illustrated in
Fig. 2.
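
To make the idea of nested variance components concrete, the sketch below estimates the components for a single level of nesting (for example, replicate tows within stations) by the method of moments. It handles balanced designs only, which the actual GLNPO data are not, and it is not the procedure used for the analyses reported here; names and data are hypothetical.

```python
from statistics import mean

def nested_variance_components(groups):
    """Method-of-moments variance components for a balanced one-level nested design.

    groups: list of equal-length lists, e.g. replicate tows for each station.
    Returns (between_group_component, within_group_component).
    """
    a = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch handles balanced designs only")
    group_means = [mean(g) for g in groups]
    grand_mean = mean(group_means)
    ms_between = n * sum((m - grand_mean) ** 2 for m in group_means) / (a - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (a * (n - 1))
    # E[MS_between] = sigma_within^2 + n * sigma_between^2; negative estimates are set to 0.
    between = max(0.0, (ms_between - ms_within) / n)
    return between, ms_within

between, within = nested_variance_components([[1.0, 3.0], [5.0, 7.0]])
```

Extending this to the full hierarchy (basin, station, field replicate, laboratory duplicate) repeats the same subtraction of mean squares at each level.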
In fact, though, the zooplankton data present
an extremely unbalanced statistical design.
Field replicates are only nested within
one station per basin (the master station), and
duplicate laboratory analyses are conducted,
on average, on only one sample per lake, and
are rarely nested within field replicates. This
both complicates the calculation of the
ANOVA, and can also lead to anomalous re-
sults. Specifically, an unusually high degree of
variability in a single pair of analyses at one
level of replication (e.g., laboratory duplicate
analysis) can mask the variability in the next
higher level of subsampling (e.g., field replica-
tion).
ANOVA analyses were carried out on six
variates: total adult calanoids, total calanoid
copepodites, total adult cyclopoid copepods,
total cyclopoid copepodites, total cladocerans,
and total crustaceans, exclusive of nauplii.
Only data generated from the deeper, 153-µm
mesh nets were used. Data were natural log
transformed prior to analysis; where zeros occurred
in the data, 1 was added to all values
prior to transformation. Separate analyses
were conducted for the two years examined
(1998, 1999) and the two seasons (spring,
summer). Cladocerans were not analyzed in
spring samples due to low numbers. In all, a
total of 22 analyses were performed.

Fig. 2. Illustration of factor structure for hierarchical analysis of variance of GLNPO zooplankton
data for hypothetical two basin lake. FD indicates field replicate.
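
The transformation rule and the count of analyses can be sketched in a few lines of Python. The sketch assumes that the decision to add 1 is made for the whole set of values passed to the function; how the data were grouped for that decision is not stated, so this is an assumption, and the function name is hypothetical.

```python
import math

def log_transform(values):
    """Natural log transform; if any value is zero, 1 is added to every value first."""
    offset = 1.0 if any(v == 0 for v in values) else 0.0
    return [math.log(v + offset) for v in values]

with_zero = log_transform([0.0, 9.0])     # 1 is added to both values
without_zero = log_transform([1.0, 10.0]) # no offset needed

# Six variates x two years x two seasons, minus cladocerans in spring of each year.
n_analyses = 6 * 2 * 2 - 1 * 2
```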
Sources of variation included between basin
variance, between station (within basin), between
field replicate (within station) and between
laboratory duplicate (within field replicate)
variance. The structure of the analysis
assumed that the amount of variance contrib-
uted by each factor was similar for all levels of
that factor, so that, for instance, between sta-
tion variability was similar within all basins.
However, it was noted that the variability be-
tween stations in the western and central ba-
sins of Lake Erie was extremely high. In order
that this not exert an undue influence, these
two basins were removed from the analysis.
The magnitude of the different variance components
was computed as a percentage of the
total variance minus between basin variance,
i.e., variance components were calculated as a
percentage of within basin variance.
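
As a small worked example of this scaling, the sketch below drops the between-basin component and expresses the remaining components as percentages of their sum. The component values and key names are hypothetical and are not results from this report.

```python
def within_basin_percentages(components):
    """Express each variance component as a percentage of within-basin variance.

    components: dict of variance components; the 'basin' entry is excluded
    from the denominator, as described in the text.
    """
    within = {name: value for name, value in components.items() if name != "basin"}
    total = sum(within.values())
    return {name: 100 * value / total for name, value in within.items()}

# Hypothetical variance components on the log scale.
hypothetical = {
    "basin": 4.0,
    "station": 6.0,
    "field_replicate": 3.0,
    "lab_duplicate": 1.0,
}
shares = within_basin_percentages(hypothetical)
```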
This approach can provide information
about the amount of variability involved in es-
timating densities of major taxonomic groups.
However, it cannot address variability in esti-
mates of species composition. This distinc-
tion should be borne in mind when interpret-
ing the results. If the species composition of
the zooplankton community within a basin is
consistent from site to site, but the total num-
bers of organisms vary widely, an ANOVA
will indicate high levels of variability. On the
other hand, if the species composition of the
community is vastly different from site to site,
but densities of individuals are similar within
each broad taxonomic category, then an
ANOVA will indicate low variability.
4.3.2 Similarity analyses
As indicated earlier, special problems are
posed in trying to quantify the variability of
multivariate data. While the data can be sum-
marized by broad taxonomic category into a
smaller number of individual variates, and
variance calculated using univariate methods
as outlined above, this approach will not be
able to detect compositional shifts at lower
taxonomic levels, and thus cannot give a true
picture of variability at the community level.
It is desirable, instead, to use a measure of
variability that can simultaneously compare all
the variates within samples, and which can
produce a single number to quantify the de-
gree to which the samples diverge.
The approach adopted here involves meas-
ures of similarity/dissimilarity. These meas-
ures compare two multivariate samples and
produce a single number indicating to what
extent the two samples share the same species,
and optionally to what extent those species are
present in similar densities in the two samples.
It is important to bear in mind that a similarity
value is the result of a comparison between two
samples. To compare a set of replicates, then,
each replicate must be compared with each
other replicate, and a matrix of similarity val-
ues obtained, from which some measure of
central tendency (e.g., median, mean) can be
computed. Thus for N samples, [N(N-1)]/2
comparisons would be performed.
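
The all-pairs comparison can be sketched as below. The similarity function used here is a simple presence/absence overlap chosen only to keep the example short; it is a stand-in, not one of the indices used in this report, and all names and sample contents are hypothetical.

```python
from itertools import combinations
from statistics import median


def shared_fraction(sample_x, sample_y):
    """Toy presence/absence similarity: shared species / all species."""
    union = sample_x | sample_y
    return len(sample_x & sample_y) / len(union) if union else 1.0


def pairwise_summary(samples, similarity):
    """Compare every sample with every other sample and take the median."""
    values = [similarity(x, y) for x, y in combinations(samples, 2)]
    return values, median(values)


# Three hypothetical replicates, each a set of species codes.
replicates = [{"a", "b"}, {"a", "b"}, {"a", "c"}]
values, central = pairwise_summary(replicates, shared_fraction)
```

For N = 3 replicates this yields 3(3-1)/2 = 3 comparisons, one for each unordered pair.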
The primary differences between most simi-
larity indices have to do with whether each
species will be compared on the basis of pres-
ence/absence, relative abundance, or absolute
abundance. Where relative abundances are
compared, the similarity measure will be sensi-
tive to differences in species composition, but
not to variability associated with estimating
overall densities. Where absolute abundances
are used, variability in both species composi-
tion and densities will be quantified with the
similarity measure. Using both types of simi-
larity measures in tandem, therefore, provides
a means of assessing whether the variability
between two samples is due primarily to dif-
ferences in species composition, or differences
in densities.
Of the similarity measures based on com-
parisons of relative abundances, one of the
most intuitive and most commonly used is the
Percentage Similarity of Community (PSc) in-
dex of Whittaker (1952; Whittaker and Fair-
banks, 1958). As suggested by its name, this
index compares percent abundances of species
in two samples. Therefore, if two samples
have vastly differing total numbers of indi-
viduals, but the species within each sample
contribute exactly the same proportion of in-
dividuals, then the PSc index will indicate that
the two samples are identical. The index is
calculated as:
$$PS_c = 1 - 0.5 \sum_{i=1}^{K} |a_i - b_i|$$
where a_i and b_i are, for a given species i, the
relative proportions of the total samples A and B,
respectively, which that species represents.
The absolute value of their difference is then
summed over all K species. Two samples in
which all species are present in identical pro-
portions will result in a score of 1 (or 100%),
while two samples sharing no species in com-
mon will produce a score of 0.
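
A minimal sketch of the PSc calculation follows, assuming the usual form of the index (one minus half the summed absolute differences in proportions) and samples stored as dictionaries of species counts. The function name and example counts are hypothetical.

```python
def psc(sample_a, sample_b):
    """Percentage Similarity of Community on a 0-1 scale.

    Each sample maps a species name to its abundance. Abundances are
    converted to proportions of the sample total before comparison.
    """
    total_a = sum(sample_a.values())
    total_b = sum(sample_b.values())
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each sample must contain at least one organism")
    species = set(sample_a) | set(sample_b)
    summed_diff = sum(
        abs(sample_a.get(s, 0) / total_a - sample_b.get(s, 0) / total_b)
        for s in species
    )
    return 1.0 - 0.5 * summed_diff


# Same proportions, tenfold difference in totals.
same_proportions = psc({"x": 10, "y": 30}, {"x": 100, "y": 300})
# No species in common.
no_overlap = psc({"x": 5}, {"y": 7})
```

The first comparison illustrates the point made above: a tenfold difference in total numbers has no effect on the score when proportions match.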
Another widely used index, but one which
compares absolute abundances of species in
two samples, is the so-called Bray-Curtis in-
dex. Originally developed by Kulczynski
(1927), and subsequently modified by Motyka
et al. (1950), this index provides a number
from 0 (no species in common) to 1.0
(identical samples) similar to that of
Whittaker's PSc index. The index is calculated
as:
$$C = \frac{2W}{a + b}$$
where a = the sum of all species abundances
in one sample, b = the sum of all species
abundances in the other sample, W = the
smaller of the two abundances for each
species, summed over all species. In this report,
this index will be referred to as C, in accor-
dance with its presentation in Motyka et al.
(1950). When these two indices are used to-
gether, they can provide both qualitative (i.e.,
relative) and quantitative information about
the similarity of two samples. Specifically,
when C values are substantially lower than PSc
values, this indicates that differences between
the two samples derive at least in part from
differences in absolute numbers of individuals
in the two samples. Where the two values are
substantially the same, then differences be-
tween the two samples are due primarily to
differences in species composition.
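
The joint reading of the two indices can be sketched as follows. The example assumes samples stored as dictionaries of species abundances and assumes the usual form of PSc (one minus half the summed absolute differences in proportions); function names and counts are hypothetical.

```python
def bray_curtis_c(sample_a, sample_b):
    """C = 2W / (a + b), computed on absolute abundances."""
    a = sum(sample_a.values())
    b = sum(sample_b.values())
    if a + b <= 0:
        raise ValueError("samples must not both be empty")
    species = set(sample_a) | set(sample_b)
    # W: the smaller of the two abundances for each species, summed.
    w = sum(min(sample_a.get(s, 0), sample_b.get(s, 0)) for s in species)
    return 2.0 * w / (a + b)


def psc(sample_a, sample_b):
    """PSc on a 0-1 scale, computed on relative abundances."""
    total_a = sum(sample_a.values())
    total_b = sum(sample_b.values())
    species = set(sample_a) | set(sample_b)
    return 1.0 - 0.5 * sum(
        abs(sample_a.get(s, 0) / total_a - sample_b.get(s, 0) / total_b)
        for s in species
    )


# Identical composition, tenfold difference in density.
sparse = {"x": 10, "y": 30}
dense = {"x": 100, "y": 300}
c_value = bray_curtis_c(sparse, dense)
psc_value = psc(sparse, dense)
```

Here PSc reports identical samples while C is far lower, which is the pattern the text attributes to differences in absolute numbers rather than in species composition.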
To quantify levels of variability associated
with natural variation and different sample
collection/analysis activities, similarity matri-
ces were computed between samples taken
within each basin (separated by season and
mesh size), between sets of field replicates,
and between duplicate laboratory analyses.
Separate matrices were generated for spring
and summer, and 63- and 153-µm mesh tows.
Differences in similarity values generated
from the two different measures, as well as
differences in values from each measure due
to season and mesh size, were assessed using a
Mann-Whitney rank sum test. While it would
have been preferable to use a multifactor
ANOVA to assess all factors simultaneously,
no transformation was found that could stabi-
lize variance and ameliorate the non-normality
of the data, and formulations for a non-
parametric, multifactor ANOVA type test
could not be found.
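
For readers unfamiliar with the rank sum test, a minimal standard-library sketch is given below. It assumes a large-sample normal approximation with no tie or continuity correction, and it reports the rank sum of the first group; which group's rank sum a statistics package labels T is a convention, so this should not be read as the exact procedure used for this report. Function names and data are hypothetical.

```python
import math


def rank_sum_test(group_x, group_y):
    """Return (rank sum of group_x, two-sided p) for a rank sum test.

    Tied values receive the average of the ranks they span. The p-value
    uses a normal approximation without tie or continuity correction.
    """
    pooled = sorted(
        (value, label)
        for label, values in enumerate((group_x, group_y))
        for value in values
    )
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        rank_sum_x += average_rank * sum(
            1 for k in range(i, j) if pooled[k][1] == 0
        )
        i = j
    n1, n2 = len(group_x), len(group_y)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return rank_sum_x, p_two_sided


# Hypothetical similarity values for two groups.
t_stat, p_value = rank_sum_test([0.97, 0.95, 0.93], [0.86, 0.78, 0.63])
```

With samples this small an exact test would normally be preferred; the approximation is shown only because it keeps the sketch short.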
To estimate the relative contributions of
within basin spatial heterogeneity, sample col-
lection, and laboratory analysis to the variabil-
ity of the data, similarity values were con-
verted to dissimilarity values by subtracting
them from 1. To determine the relative mag-
nitudes of each source of uncertainty, the
mean dissimilarity associated with each stage
was subtracted from that of the previous
stage. For example, to determine the amount
of dissimilarity contributed by sample collec-
tion, the dissimilarity estimates generated from
QC analyses were subtracted from those gen-
erated from field replicates. Likewise, an esti-
mate of the amount of dissimilarity contrib-
uted by site to site variability was obtained by
subtracting the dissimilarity of field replicates
from within-basin dissimilarity values.
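
The successive subtraction can be sketched as below. The input similarities are illustrative values only (they are not the means used in this report), and the function name and dictionary keys are hypothetical.

```python
def stage_contributions(mean_similarity):
    """Split within-basin dissimilarity into per-stage contributions.

    `mean_similarity` holds mean similarity values (0-1) for duplicate
    laboratory analyses ("lab"), field replicates ("field"), and
    within-basin comparisons ("basin"). Returns, for each stage, a tuple
    of (dissimilarity contributed, percent of within-basin dissimilarity).
    """
    lab = 1.0 - mean_similarity["lab"]
    field = 1.0 - mean_similarity["field"]
    basin = 1.0 - mean_similarity["basin"]
    if basin <= 0:
        raise ValueError("within-basin dissimilarity must be positive")
    parts = {
        "lab_analysis": lab,
        "sample_collection": field - lab,
        "between_station": basin - field,
    }
    # The three parts sum to the within-basin dissimilarity.
    return {
        name: (value, 100.0 * value / basin) for name, value in parts.items()
    }


illustrative = {"lab": 0.97, "field": 0.93, "basin": 0.79}
contributions = stage_contributions(illustrative)
```

Each stage is credited only with the dissimilarity it adds beyond the stage nested inside it, so the percentages sum to 100.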
Results
5.1 Minimum detectable differences
The percent minimum detectable differences
for total crustaceans ranged between 31%
(southern basin of Lake Michigan, spring) and
176% (western basin of Lake Erie, spring),
with a median of 63% (Fig. 3). For this re-
sponse variable, no basin/season met the
DQO. The highest values were seen in Lake
Erie, although all lakes had at least one value
approaching or exceeding 100%. For these
basin/seasons, therefore, the current sampling
regime would have an 80% chance of detecting
a change in total crustacean density with
90% confidence only if that change constituted
at least a doubling in density. Percent
minimum detectable differences for total
cladocerans could only be assessed for summer
samples, due to low numbers in spring
samples. These were substantially higher than
for total crustaceans, with basin-wide values
ranging from 44% (eastern basin of Lake Ontario)
to 262% (northern basin of Lake Michigan),
and an overall median value of 143%
(Fig. 3). Again, no basin met the DQO
requirements.
Fig. 3. Percent minimum detectable differences for total crustaceans and total cladocerans. Vari-
ances calculated from 1998 data.
[Bars: spring and summer, by basin within each lake; reference line at 20% difference.]
Fig. 4. Percent minimum detectable differences for adult calanoid copepods, immature
(copepodite) calanoid copepods, cyclopoid copepods and immature (copepodite) cyclopoid
copepods.
[Bars: spring and summer, by basin within each lake; reference line at 20% difference.]
Minimum detectable differences were lower
for both adult and immature calanoid cope-
pods (Fig. 4), and this was probably due at
least in part to the great numbers of these in-
dividuals found at most sites. Median percent
minimum detectable differences were 59%
and 60%, respectively, for these groups. Per-
cent minimum detectable differences for
cyclopoids were intermediate between clado-
cerans and calanoids, again probably due in
part to their relative abundances (Fig. 4). Me-
dian percent minimum detectable differences
for adult and immature cyclopoids were 86%
and 96%, respectively. Among the copepod
groups, the DQO was met in only two cases:
calanoid immatures in the central basin of
Lake Michigan in the summer and cyclopoid
immatures in the eastern basin of Lake Ontario
in the spring. Overall, percent minimum
detectable differences were highest for
cladocerans, lowest for calanoids, and intermediate
for cyclopoids (Fig. 5).
Fig. 5. Percent minimum detectable differences for major taxonomic groups. CLA = total
cladocerans; CAL = total adult calanoids; CALIM = total calanoid copepodites; CYC = total adult
cyclopoids; CYCIM = total cyclopoid copepodites; TOT = total crustaceans, exclusive of nauplii.
Boxes indicate 25th and 75th percentiles; whiskers denote 10th and 90th percentiles; lines denote
median; symbols denote outliers.
Fig. 6. Percent minimum detectable differences for the most common adult calanoid copepod
species.
[Panels: Leptodiaptomus ashlandi, Leptodiaptomus minutus, Limnocalanus macrurus, Diacyclops
thomasi. Bars: spring and summer, by basin; reference line at 20% difference.]
Fig. 7. Percent minimum detectable differences for the most common cladoceran species.
[Panels: D. galeata mendotae, Bosmina longirostris. Bars: spring and summer, by basin; reference
line at 20% difference.]
Of the six individual species examined, in
only one case was the DQO requirement met
(L. ashlandi, Lake Michigan, northern basin,
summer). Percent minimum detectable differ-
ences ranged from 12% to 256%, with an
overall median of 94% (Figs 6 and 7). This
suggests that, on average, the density of a spe-
cies would have to double from one year to
another in order for the current sampling re-
gime to be able to detect the change as statisti-
cally significant. Overall, the two cladocerans
(D. galeata mendotae and B. longirostris) had
higher percent minimum detectable differ-
ences than the copepods examined. As with
the larger taxonomic groupings, there were no
clear lake to lake differences in percent mini-
mum detectable differences for the individual
species.
When considered in aggregate on the basis
of lake basin, minimum detectable differences
were consistently higher in the western basin
of Lake Erie than in the other basins (Fig. 8).
The eastern basin of Lake Superior and the
southern basin of Lake Huron exhibited con-
sistently low minimum detectable differences.
Aside from these instances, though, percent
minimum detectable differences were highly
variable, and clear basin-to-basin differences
were not seen.
Fig. 8. Percent minimum detectable differences by basin. Box plots as in Fig. 5.
5.2 Sources of variability of zooplankton
data - ANOVA analyses
In almost all cases, the largest source of vari-
ance in the estimation of within-basin abun-
dances of major taxonomic groups was associ-
ated with between-station variability (Table 3).
This contributed from 23% (summer, 1999,
total crustaceans) to nearly 95% (summer,
1998, adult cyclopoids) of the within-basin
variance. On average, between-station vari-
ance made up about 70% of the total within
basin variance. This suggests that large scale
spatial heterogeneity in abundances is the
main source of uncertainty in developing ba-
sin-wide estimates of crustacean abundances.
Variances associated with field replicates
contributed on average 26% to total within-
basin variance, and ranged from 2.4%
(summer 1998, cyclopoids) to 76.7% (summer
1999, total crustaceans). It should be borne in
mind that since replicates are not taken at the
point of subsampling in the laboratory, vari-
ances calculated from field replicates would
also incorporate that variance component.
The least amount of variance was contributed
by duplicate laboratory (QC) analyses, which
on average contributed less than 4% of total
within-basin variance. Relatively few duplicate
laboratory analyses are carried out, so a single
aberrant count can have a large impact on
this analysis. This was the case in Summer,
1998, when one set of duplicate laboratory
analyses from Lake Ontario yielded highly
divergent estimates of immature copepod
densities. This resulted in both unusually inflated
variance estimates for laboratory duplicates for
immature calanoids and immature cyclopoids,
and anomalously low error estimates of field
replicate variance for those two variates.
Table 3. Relative contributions of different sources of variance to the estimation of within-basin
abundances of major taxonomic groupings, as determined by multi-stage hierarchical ANOVA.
Variance Comp        Cal      Cal Imm  Cla      Cyc      Cyc Imm  Total
Spring 1998
  Between Station    42.8%    80.7%    -        81.5%    76.2%    83.6%
  Field Reps         47.9%    17.8%    -        17.9%    15.6%    16.1%
  Lab Dups            9.3%     1.5%    -         0.6%     8.2%     0.4%
Summer 1998
  Between Station    48.7%    51.8%    80.7%    94.9%    87.1%    70.3%
  Field Reps         50.0%    19.8%    19.2%     2.4%     0.0%    25.3%
  Lab Dups            1.3%    28.4%     0.1%     2.7%    12.9%     4.3%
Spring 1999
  Between Station    69.3%    78.6%    -        79.6%    79.2%    78.5%
  Field Reps         28.0%    20.1%    -        19.7%    17.0%    21.3%
  Lab Dups            2.7%     1.3%    -         0.7%     3.9%     0.1%
Summer 1999
  Between Station    64.8%    41.6%    82.5%    72.8%    79.0%    23.2%
  Field Reps         34.9%    58.2%    17.5%    26.3%    20.1%    76.7%
  Lab Dups            0.4%     0.2%     0.0%     0.8%     0.9%     0.1%
Cal = total adult calanoids; Cal Imm = total calanoid copepodites; Cla = total cladocerans;
Cyc = total adult cyclopoids; Cyc Imm = total cyclopoid copepodites; Total = total
crustaceans, exclusive of nauplii.
In summary, then, it appears that the major-
ity of uncertainty involved in the estimation of
crustacean abundances, at least viewed at the
level of order and suborder, results from large
scale (i.e., station to station) spatial heteroge-
neity, while a relatively minor amount is due to
inaccuracies in counting on the part of labora-
tory analysts. Somewhat less than one third
comes from errors associated with sample col-
lection and/or subsampling in the laboratory.
Given the broad taxonomic groupings used in
this analysis, error due to taxonomic inaccura-
cies would not be included in these estimates
of variance.
5.3 Sources of variability of zooplankton
data - similarity analyses
5.3.1 Duplicate laboratory (QC) analyses
A total of 74 sets of duplicate laboratory (QC)
analyses from both 63-µm and 153-µm mesh
nets, and both spring and summer cruises,
were assessed, using both PSc and C similarity
indices. Data were from 1998 and 1999, the
only two years for which full datasets of
153-µm mesh net tows are currently available. It
will be recalled that these values quantify the
similarity between tabulated species composi-
tion estimates generated by two different ana-
lysts counting the same sample, subsequent to
sample splitting. Similarity values using both
measures were uniformly high (Table 4); 95%
of PSc values were above 0.91, while 95% of
C values were above 0.88. Median values for
both measures were 0.97. Statistically significant
differences (α = 0.05) between lakes,
mesh size or season were not found, which
suggests that taxonomic difficulties are not
more marked in any given lake or season, or
for shallow or deeper tows. Differences be-
tween similarity values calculated using PSc
and C also were not apparent. Such differ-
ences would arise from discrepancies in abso-
lute counts of organisms, and the absence of
differences between the two measures indi-
cates that analysts have little trouble consis-
tently counting all of the organisms in the
counting chamber, a conclusion also sup-
ported by the ANOVA results. Subsamples
are chosen specifically to ensure a relatively
narrow range of individual organisms in the
counting chambers - generally between 200
and 400 - so large discrepancies in counts of
individuals would not be expected.
QC limits have as yet not been agreed upon
for zooplankton analyses. Based on the present
analysis, if duplicate QC counts are compared
using the PSc index, a value of at least 0.92
should be expected in 90% of cases. It is
therefore suggested that this value be adopted
as a QC limit. This limit should be applicable
to both 63-µm and 153-µm mesh tows taken
during both spring and summer. QC criteria
based on PSc values would guard against taxo-
nomic errors, but not enumeration errors.
When all QC analyses from 1998 and 1999
are examined, the majority of discrepancies
between total counts of organisms resulting
from duplicate laboratory analyses are less
than 2% of the average of the two counts
(Table 5). In 90% of cases, differences between
duplicate counts were no greater than just over
4% of the average of the two counts. It is
therefore recommended that a relative percent
difference of 4% be adopted as a criterion for
total organism counts of duplicate QC analyses,
with those analyses exceeding this limit
subject to recounts by both analysts.
Table 4. Percentiles of Whittaker PSc and C similarity values for comparisons between pairs of
duplicate laboratory (QC) analyses. Data were from 1998 and 1999, and include data from both
spring and summer cruises and both 63- and 153-µm mesh net tows.
Percentile   PSc Similarity   C Similarity
95th         0.99             0.99
75th         0.98             0.98
50th         0.97             0.97
25th         0.96             0.95
5th          0.91             0.88
Table 5. Percentiles of relative discrepancies in counts of total organisms between duplicate
laboratory analyses. Relative discrepancies (Δ Count %) are calculated as [absolute(count#1-count#2)/
average(count#1, count#2)]*100. Data are from 1998 and 1999, and include both spring and
summer samples, as well as 63-µm and 153-µm mesh tows.
Percentile   Δ Count (%)
95th         5.80%
75th         2.69%
50th         1.53%
25th         0.57%
5th          0.10%
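
The relative discrepancy defined in the Table 5 caption, together with the recommended 4% recount criterion, can be sketched as below. The function names and the example counts are hypothetical.

```python
def relative_count_discrepancy(count_1, count_2):
    """Absolute difference of two counts as a percent of their average."""
    average = (count_1 + count_2) / 2.0
    if average <= 0:
        raise ValueError("at least one count must be positive")
    return abs(count_1 - count_2) / average * 100.0


def needs_recount(count_1, count_2, limit_pct=4.0):
    """True when duplicate counts differ by more than the QC limit."""
    return relative_count_discrepancy(count_1, count_2) > limit_pct


# Hypothetical duplicate counts of total organisms.
close_pair = needs_recount(300, 312)
distant_pair = needs_recount(300, 320)
```

The limit is left as a parameter so that a different criterion can be substituted without changing the calculation.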
5.3.2 Field replicates
PSc similarity values between field replicates
were on average quite high, with 90% of all
values ranging between 0.84 and 0.97, and an
overall median of 0.93 (Table 6, Fig. 9). This
range is not dramatically lower than similarity
values of QC samples, and indicates that rela-
tively little variability is introduced during the
sampling process as far as relative proportions
of taxa are concerned. PSc similarity between
field replicates taken during the summer
cruises was slightly lower than similarity of
spring field replicates, and this difference,
though slight, was statistically significant
(Table 7). No systematic differences were
found between tows using different mesh
sizes (i.e., deep and shallow tows) (Table 8).
Table 6. Percentiles of Whittaker PSc and C similarity values for comparisons between field
replicate analyses. Data were from 1998 and 1999, and include both spring and summer cruises and
both 63- and 153-µm mesh net tows.
Percentile   PSc Similarity   C Similarity
95th         0.97             0.95
75th         0.95             0.91
50th         0.93             0.86
25th         0.90             0.78
5th          0.84             0.63
Fig. 9. PSc similarity values between field replicates collected in 1998 and 1999. Bars indicate
means; triangles indicate minimum and maximum values for each set of comparisons.
Comparisons between 63-µm mesh tows are left (lighter) bars, comparisons between 153-µm tows are right
(darker) bars.
[Panels: Spring, Summer; bars by basin within each lake.]
Table 7. Results of Mann-Whitney rank sum
test comparing effects of season on values of
PSc similarity comparisons between field
replicates.
Group    Median  25%    75%
Spring   0.935   0.900  0.950
Summer   0.920   0.890  0.940
T = 26492.0  P = 0.009
Table 8. Results of Mann-Whitney rank sum
test comparing effects of mesh size on values
of PSc similarity comparisons between field
replicates.
Group    Median  25%    75%
153 µm   0.928   0.899  0.949
63 µm    0.928   0.894  0.943
T = 25041.0  P = 0.432
Similarity values between field replicates
calculated using the C index were substantially
lower than PSc values (Fig. 10, Table 6); this
difference was highly statistically significant
(Table 9). C values also exhibited a broader range
than PSc values, and in particular contained
more extremely low values. The difference in
similarity values calculated by the two indices
indicates that field replicates are more variable
in their estimates of zooplankton densities,
while being relatively consistent in their esti-
mates of percent contributions of individual
species.
Table 9. Results of Mann-Whitney rank sum
test between PSc and C similarity values.
Group    Median  25%    75%
C        0.860   0.780  0.910
PSc      0.930   0.900  0.950
T = 69913.0  P < 0.001
Fig. 10. C similarity values between field replicates collected in 1998 and 1999. Bars indicate
means; triangles indicate minimum and maximum values for each set of comparisons.
Comparisons between 63-µm mesh tows are left (lighter) bars, comparisons between 153-µm tows are right
(darker) bars.
[Panels: Spring, Summer; bars by basin within each lake.]
Variability in the estimation of densities be-
tween field replicates can result from a num-
ber of possible factors. Zooplankton patchi-
ness on the spatial scale of the replicate tows -
a scale dependent upon how much the vessel
drifts between replicate tows - would intro-
duce variability into density estimates. Vari-
ability could also result from inaccuracies in
flowmeter readings, due either to malfunction
or to misreading on the part of the technician,
or it can be due to differences between repli-
cates in the angles at which the net is towed.
To test these last two possibilities, regressions
were run between the minimum C index val-
ues within each set of field replicate compari-
sons and the maximum angle of the net for
those field replicates, the maximum difference
in net angle among the field replicates, the
maximum relative difference in flowmeter
readings amongst the three field replicates,
and the depth specific maximum relative dif-
ference in flowmeter readings amongst the
three field replicates. These latter two
independent variables were calculated as follows:
$$\frac{\text{max flowmeter} - \text{min flowmeter}}{\text{max flowmeter} + \text{min flowmeter}}$$
$$\frac{\text{max flowmeter}}{\text{depth}} - \frac{\text{min flowmeter}}{\text{depth}}$$
Prior to analysis, C values were transformed
using an arcsin square root transformation to
normalize the data. After transformation, data
met assumptions of normality and homosce-
dasticity.
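
The two preparatory steps for the regression can be sketched as below. The relative difference is written as (max - min) / (max + min), which is an assumption about how the formula above should be read; the function names and flowmeter readings are hypothetical.

```python
import math


def relative_flow_difference(readings):
    """Relative difference between the largest and smallest readings.

    Assumes the form (max - min) / (max + min).
    """
    highest, lowest = max(readings), min(readings)
    if highest + lowest <= 0:
        raise ValueError("flowmeter readings must be positive")
    return (highest - lowest) / (highest + lowest)


def arcsin_sqrt(similarity):
    """Arcsine square root transformation of a value between 0 and 1."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must lie between 0 and 1")
    return math.asin(math.sqrt(similarity))


# Hypothetical flowmeter readings from three field replicates.
rel_diff = relative_flow_difference([100, 110, 120])
transformed = arcsin_sqrt(0.86)
```

The transformation maps similarity values from the interval 0 to 1 onto 0 to pi/2 radians, stretching values near the ends of the range.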
No relationship was found between C values
and net angle. However, a highly significant
relationship was found between C values and
differences in flowmeter readings between
field replicates (Table 10). This relationship
explained slightly less than a third of the vari-
ance in C values. A similar relationship was
found when depth specific flowmeter values
were examined. Therefore, it appears that a
portion of the variability involved in sample
collection is due to inconsistencies in flow-
meter readings amongst the field replicates.
As noted, this could result from variability in
the meter itself, or from inconsistencies in
reading the meter.
The majority of variance in C values, how-
ever, was not accounted for by flowmeter
readings. This points to patchiness of zoo-
plankton populations, other aspects of sample
handling, such as washing the net, decanting
into bottles, etc., or variance associated with
subsampling in the laboratory as potential
major sources of variance for this stage of the
analysis.
Table 10. Regression results of C values and maximum relative difference in flow meter readings
between field replicates.
Arcsin sqrt(C) = 1.166 - (0.380 * Relative diff. in flowmeter readings)
                Coefficient  Std. Error  t      P
Constant        1.166        0.0186      62.7   <0.001
Rel diff flow   -0.380       0.0555      -6.8   <0.001
Analysis of Variance:  N = 104
                DF    SS      MS      F      P
Regression      1     0.837   0.837   46.8   <0.001
Residual        102   1.823   0.0179
Total           103   2.661   0.0258
Adj R2 = 0.308
Table 11. Results of Mann-Whitney rank sum
test comparing effects of season on values of
C similarity comparisons between field
replicates.
Group    Median  25%    75%
Spring   0.870   0.810  0.925
Summer   0.840   0.760  0.900
T = 27036.0  P = 0.001
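
To see what the fitted relationship in Table 10 implies on the original similarity scale, the regression can be back-transformed as sketched below. The coefficients are those reported in Table 10; the function name and the example relative differences are hypothetical, and the back-transformation is valid only while the fitted value stays between 0 and pi/2.

```python
import math


def predicted_c(rel_diff, intercept=1.166, slope=-0.380):
    """Back-transform the Table 10 regression to a predicted C value.

    The regression is fitted to arcsin(sqrt(C)), so the prediction is
    sin(fitted value) squared.
    """
    fitted = intercept + slope * rel_diff
    if not 0.0 <= fitted <= math.pi / 2.0:
        raise ValueError("fitted value outside the range of the transform")
    return math.sin(fitted) ** 2


# Predicted similarity with identical readings and with a large discrepancy.
c_no_difference = predicted_c(0.0)
c_large_difference = predicted_c(0.5)
```

Since the adjusted R2 is about 0.31, such predictions describe only the average trend; most of the variance in C is left unexplained by flowmeter readings.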
As with PSc values, there was a significant,
though somewhat slight, difference between C
similarity values generated from spring and
summer cruises, with the latter slightly lower
on average than the former (Table 11). This
was probably due at least in part to the greater
species diversity seen in the summer. A
significant difference was also found between
mesh sizes, with the smaller mesh size (i.e.,
shallower tows) showing somewhat greater
variability between field replicates, as
measured by the C index (Table 12). This
difference, though, was not entirely consistent
across all basins.
Table 12. Results of Mann-Whitney rank sum
test comparing effects of mesh size on values
of C similarity comparisons between field
replicates.
Group    Median  25%    75%
63 µm    0.830   0.755  0.895
153 µm   0.880   0.820  0.920
T = 21390.0  P < 0.001
5.3.3 Between station
Within-basin similarity values were only
calculated from samples collected with the deeper,
153-µm mesh tows. These similarity values
should theoretically provide an estimate of the
error contributed to the data from within-basin
spatial heterogeneity (in addition to sample
collection and analysis). However, the
shallower 63-µm mesh tows would also include
variation due to vertical migration, since
it is likely that some stations within a basin
would be visited at different times during the
diurnal cycle of at least some species. In order
not to confound these two potential sources
of variation, therefore, only deeper tows were
considered.
Table 13. Percentiles of similarity values for within-basin samples.
          Percentile   PSc Similarity   C Similarity
Total     95th         0.93             0.90
          75th         0.87             0.80
          50th         0.79             0.69
          25th         0.69             0.54
          5th          0.47             0.25
Spring    95th         0.94             0.92
          75th         0.90             0.83
          50th         0.85             0.74
          25th         0.76             0.60
          5th          0.58             0.22
Summer    95th         0.89             0.84
          75th         0.83             0.75
          50th         0.74             0.64
          25th         0.61             0.51
          5th          0.36             0.27
Table 14. Results of Mann-Whitney rank sum
test comparing PSc and C similarity values.
Group    Median  25%    75%
C        0.690   0.540  0.800
PSc      0.790   0.690  0.872
T = 416276.0  P < 0.001
As with the field replicate samples, within
basin PSc similarity values were higher than C
values (Table 13); this difference was statisti-
cally significant (Table 14). The differences
between these two measures were more pro-
nounced for within basin comparisons than
for the field replicates, suggesting that, as
might be expected, differences in crustacean
densities varied more from site to site than
within a site. Half of PSc values were between
0.69 and 0.87, while half of Bray-Curtis values
ranged between 0.54 and 0.80.
For both measures, similarity values of
comparisons made during the spring were
statistically significantly higher than those of summer
comparisons (Tables 15 and 16). During the
spring, over 75% of PSc values were
above 0.75, a value often taken to indicate
samples taken from the same community.
Fully half were above 0.85. In contrast, less
than half of summer PSc values met the 0.75
criterion. The high values in the spring are
most likely reflective of the extremely limited
species composition of spring samples. For
example, average numbers of crustacean taxa
per site ranged between 5 and 8 for the five
lakes in spring, 1999. C values were lower
than PSc values for both seasons (Table 13).
Somewhat less than half of spring values were
above 0.75, while only a quarter of summer
values met or exceeded that value. The
differences between the two indices were more
pronounced in spring than in summer, which
again indicates that within-basin species
composition was more variable in summer.
Values in the central and western basins of
Lake Erie were notably lower than those for
other basins, and this was apparent for both
PSc and Bray-Curtis values, indicating that
both species composition and densities varied
greatly within these two basins (Figs 11 and
12). Consistent differences were not apparent
in other basins.
Table 15. Results of Mann-Whitney rank sum
test comparing effects of season on values of
PSc similarity for within-basin comparisons.
Group    Median  25%    75%
Spring   0.850   0.761  0.900
Summer   0.740   0.610  0.828
T = 157938.0  P < 0.001
Table 16. Results of Mann-Whitney rank sum
test comparing effects of season on values of
Bray-Curtis similarity for within-basin
comparisons.
Group    Median  25%    75%
Spring   0.740   0.600  0.835
Summer   0.645   0.510  0.750
T = 144516.0  P < 0.001
Fig. 11. PSc similarity values for within-basin comparisons. Data from 1998 and
1999; 153-µm mesh tows. Boxes as in Fig. 5.
[Panels: Spring, Summer; boxes by basin within each lake.]
Fig. 12. C similarity values for within-basin comparisons. Data from 1998 and
1999; 153-µm mesh tows. Boxes as in Fig. 5.
[Panels: Spring, Summer; boxes by basin within each lake.]
5.3.4 Relative contribution of different
sources of error
An idea of the relative contribution of the
various sources of error can be gained by
comparing PSc and C values from the various
stages of the analysis. Since what is of interest
here is variability, it is more convenient to ex-
press these values as dissimilarity values, rather
than similarity values. This is accomplished by
simply taking their complement (i.e., 1-PSc; 1-C).
As noted, the amount of uncertainty result-
ing from laboratory analyses is relatively slight.
As measured by dissimilarity this averaged
0.03 (i.e., 1-0.97) for comparisons of spring
samples made by both indices, and 0.04 for
summer samples (Figs 13 and 14). Values
were essentially the same whether relative spe-
cies composition or actual densities are con-
sidered (i.e., when examining PSc or C values).
The amount of dissimilarity resulting from
sample collection is only slightly higher than
that from sample analysis when relative spe-
cies composition is considered. In other
words, estimates of relative species composi-
tion appear to be fairly robust for each par-
ticular site. Again, it is important to remem-
ber that variability due to sub-sampling is not
captured by replicate QC analyses, and would
therefore be incorporated into dissimilarity
values from field replicates. When considered
in terms of absolute abundances, however, the
contribution of sampling variability increases
notably. During spring, on average, it is over
three times higher than that of sample analy-
sis, while in summer it is three and a half times
greater than that of sample analysis (Fig. 14).
This indicates that the greatest introduction of
variability during sample collection is in esti-
mation of absolute densities of organisms,
while estimates of the relative proportions of
constituent species are relatively robust.
In all cases the greatest amount of dissimi-
larity was a result of site to site variation
(Table 17). Even when just relative abun-
dances are compared, site to site variation
contributes more dissimilarity than both labo-
ratory analysis and sample collection com-
bined in spring, while this contribution is close
to double that of laboratory analysis plus sam-
ple collection in summer. When absolute den-
sities (i.e., C values) are considered, the contri-
bution of site to site variation to dissimilarity
doubles in spring, but during summer is essen-
tially the same as that of relative proportions
of species, indicating that there are substantial
differences in species composition from site to
site within a basin during the summer, while
during spring the majority of dissimilarity re-
sults from site to site differences in densities.
In all cases, though, dissimilarity values were
lower when measured using the PSc index.
The relatively low site to site variability in
species composition in spring is consistent with
the highly restricted species richness of most
spring communities. As was pointed out
previously, site to site variability was
particularly high in the western and central
basins of Lake Erie, with regard to both species
composition and species densities. While such
variation was high at different times in other
basins, such effects were not consistently noted.

Table 17. Relative contributions of different sources of variability to overall within-basin
dissimilarity, as measured by both PSc and C indices.

Variance Component    PSc                 C
                      Spring    Summer    Spring    Summer
Between Station       51%       64%       54%       45%
Field Reps            27%       19%       35%       42%
Lab Dups              23%       17%       10%       14%

[Fig. 13 graphic not reproduced. Recoverable labels: a panel labeled PSc; Spring and Summer
plots; x-axis "Level of Replication" with categories Basin, Field, and Lab; y-axis ticks from
25 to 100.]
Fig. 13. Comparison of laboratory, field, and basin replicate similarity values. Data for 153-µm
mesh tows, 1998 and 1999. Boxes as in Fig. 6.

[Fig. 14 graphic not reproduced. Recoverable labels: a panel labeled PSc; legend entries Lab
Reps, Field Reps, and Basin Reps; x-axis of basins grouped by lake (SU, MI, HU, and ON
legible).]
Fig. 14. Contribution to variability (as quantified by dissimilarity) of between-site heterogeneity,
sample collection, and laboratory analysis.
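One way percentages like those in Table 17 could be derived is by differencing mean dissimilarities across the nested levels of replication. The sketch below assumes (this is not confirmed by the text) that laboratory duplicates capture analysis error alone, field replicates capture analysis plus collection, and between-station comparisons capture all three sources. The function name and the input values are hypothetical, chosen only to resemble the magnitudes discussed above.

```python
def partition_dissimilarity(lab, field, basin):
    """Split mean between-station dissimilarity into nested shares.

    lab, field, basin: mean dissimilarity (1 - similarity) between
    laboratory duplicates, field replicates, and stations in a basin.
    Assumes lab <= field <= basin, i.e. each level adds to the one below.
    """
    if basin <= 0 or not 0 <= lab <= field <= basin:
        raise ValueError("expected 0 <= lab <= field <= basin, basin > 0")
    return {
        "Lab Dups": lab / basin,
        "Field Reps": (field - lab) / basin,
        "Between Station": (basin - field) / basin,
    }


# Hypothetical mean dissimilarities for one season and one index.
shares = partition_dissimilarity(lab=0.03, field=0.07, basin=0.14)
for source, share in shares.items():
    print(f"{source:16s}{share:5.0%}")
```

Because the shares are expressed relative to total within-basin dissimilarity, a small absolute laboratory dissimilarity can still account for a sizeable percentage when the total is itself small.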
The error resulting from laboratory analyses
as a percentage of overall dissimilarity (Table
17) was much higher than the error compo-
nent of laboratory analyses estimated by
ANOVA (Table 3). In the latter case, this
source of error rarely exceeded a few percent
of total within-basin variability, while
dissimilarity values of duplicate laboratory
analyses were approximately 10 to 25% of total
within-basin dissimilarity. This in all likelihood
does not represent greater variability in the
taxonomic aspect of laboratory analyses (as
quantified by dissimilarity values), but rather
indicates that there is less overall variability
in taxonomic composition than in estimates of
density, so that laboratory error accounts for a
larger share of the total. A direct comparison of
the estimates of sources of error from ANOVA
and dissimilarity analyses is not possible,
however, since these two types of analysis yield
quantitatively incommensurate results.
Discussion
6.1 DQO
Of the 184 cases examined in this study, mini-
mum detectable differences satisfied the crite-
rion set by the DQO in only 3 instances.
While exhibiting a wide range of values,
minimum detectable differences generally ranged
between 40% and 190%. This means that, for the
current sampling regime to detect a change in
the densities of major crustacean groups, those
densities would in some cases have to nearly
triple. Minimum detectable differences were
highest for cladocerans, a group of
particular ecological and management interest
given their importance as fish food items. Of
the regions examined, minimum detectable
differences were particularly high in the west-
ern basin of Lake Erie, an area that is typically
subject to high spatial heterogeneity.
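The DQO (an 80% chance of detecting a 20% change at the 90% confidence level) can be related to replication through a standard normal-approximation power calculation for comparing two means. This is a sketch only: the report's own procedure for computing minimum detectable differences is described elsewhere and may differ, and the function names, coefficient of variation, and sample sizes below are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_sum(confidence=0.90, power=0.80):
    """Sum of the two-sided critical value and the power quantile."""
    alpha = 1 - confidence
    norm = NormalDist()
    return norm.inv_cdf(1 - alpha / 2) + norm.inv_cdf(power)


def min_detectable_difference(cv, n, confidence=0.90, power=0.80):
    """Relative MDD between two means, each based on n samples with
    coefficient of variation cv (normal approximation)."""
    return z_sum(confidence, power) * cv * sqrt(2 / n)


def samples_needed(cv, target=0.20, confidence=0.90, power=0.80):
    """Samples per sampling period needed for the MDD to reach the target."""
    return ceil(2 * (z_sum(confidence, power) * cv / target) ** 2)


cv = 0.5  # hypothetical coefficient of variation among stations
for n in (4, 8, 16):
    print(f"n = {n:2d}: MDD = {min_detectable_difference(cv, n):.0%}")
print("samples for a 20% MDD:", samples_needed(cv))
```

The normal approximation is optimistic for the small numbers of stations typical of a basin; a calculation based on the t distribution would give somewhat larger minimum detectable differences.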
Although the current level of sampling effort
clearly does not satisfy the DQO, is it adequate
to detect ecologically significant changes? Is
normal interannual variability greater than the
DQO criterion of 20%? Unfortunately, GLNPO
does not currently possess the data necessary to
address these questions. Only two years of data
collected with 153-µm mesh nets are available at
present, so statements about year to year
variability cannot be made with any confidence.
While over 15 years of data collected with the
63-µm mesh net are available, as pointed out
above, interannual variability in these data is
confounded with variability due to diurnal
vertical migration. However, a recent study
(Barbiero and Tuchman, in press) was able to
detect significant changes in the densities of
many cladoceran species, as estimated by 63-µm
mesh tows, resulting from the invasion of an
exotic zooplankton predator in the mid-1980s.
These changes were in many cases quite drastic,
though, and it is unclear whether less
substantial, but still ecologically significant,
changes would be detectable under the current
sampling regime.
A more fundamental shortcoming of the
current DQO is that it does not afford a
means of assessing community-level data qual-
ity. Data quality can only be assessed indi-
vidually for each of the numerous variates that
collectively make up each zooplankton sam-
ple.
In spite of falling far short of the DQO, the
current sampling program is apparently suc-
cessful at measuring community structure,
though somewhat less successful at measuring
community size. Overall, relative (i.e., PSc)
similarity values for within-basin comparisons
were high, with most comparisons exceeding
Engelberg's (1987) criterion for identical
communities of 0.60. C similarity values were
always lower, though this difference narrowed
in the spring, compared to summer. This indi-
cates that community structure can be as-
sessed with some confidence, somewhat more
so in the spring than in summer due to the re-
stricted species richness during the former sea-
son.
Both the lowest similarity values, and the
highest variability of similarity values, were ob-
served in the western and central basins of
Lake Erie. Because of the morphometry of
these basins and the relatively high inflow (in
comparison to volume) entering the western
basin, these areas exhibit substantial spatial
heterogeneity in many variables, so the high
variability of zooplankton data is not unex-
pected.
6.2 Sources of variation
Both ANOVA analyses and similarity meas-
ures indicate that relatively little uncertainty is
contributed by the final stages of zooplankton
analysis. The amount of variance contributed
to estimation of numbers of broad taxonomic
categories, as estimated by ANOVA, averaged
less than 5%. The amount of dissimilarity
contributed by this stage of the analysis was
about 3-4%, although this represented on av-
erage 20% and 12% of the total measured
within basin dissimilarity for the PSc and C
indices, respectively. The low variance com-
ponent of this part of the analysis is not com-
pletely surprising. Duplicate counts are per-
formed after subsamples are taken and placed
in counting chambers, so discrepancies in
counts would arise strictly from miscounts,
rather than differences in the numbers of or-
ganisms contained in different subsamples.
The counting chamber contains a circular
groove which allows the sample to be enumer-
ated essentially along a continuous transect,
with most of the width of the transect remain-
ing within the field of vision of the micro-
scope. While discrepancies in identifications
might occur between analysts, the low PSc dis-
similarity values suggest that this does not
happen frequently, which again is not surpris-
ing given the limited species diversity of most
zooplankton samples, and the tendency for
samples to be dominated by a small number of
the half dozen common species.
All three measures of variance suggest that a
substantial minority of the total within-basin
uncertainty in the zooplankton data, from about
one quarter to two fifths, is apparent in field
replicates. Variance between field replicates
contributed about 25% of within-basin
variability, according to the ANOVA analysis.
Dissimilarity between field replicates
contributed, on average, 23% of total
within-basin PSc dissimilarity and 39% of total
within-basin C dissimilarity. Included in this
component of variance is small-scale patchi-
ness, uncertainty associated with sample col-
lection, and also uncertainty resulting from
subsampling in the laboratory.
Station-to-station variability within a basin
contributed the most variance, as quantified
by all three measures, which indicates zoo-
plankton communities vary considerably
within the nominally homogeneous basins.
More of this variability appears to be a result
of differences in densities, rather than differ-
ences in species composition from station to
station. Station-to-station variability
contributed 70% of the total within-basin
variability measured by ANOVA, which specifically
quantifies variance in densities, while
station-to-station differences contributed 45-64%
of total within-basin dissimilarity (Table 17),
which takes into account differences in species
composition. A comparison of PSc and C values
suggests that during the spring, most of this
variability was a result of differences in
abundances, since C dissimilarity values were
substantially and consistently higher than PSc
dissimilarity values in this season. However,
during summer, the relatively high PSc
dissimilarity values and the lack of a
substantial difference between PSc and C values
indicate that species composition also varied
from station to station within basins.
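The ANOVA-based percentages quoted above are variance components. As a minimal sketch of how such components can be estimated, the example below applies the method of moments to a balanced two-level design (field replicates nested within stations). It is not the report's actual ANOVA, which also includes a laboratory level and is documented elsewhere; the function name and the densities are hypothetical.

```python
from statistics import mean


def variance_components(stations):
    """Method-of-moments variance components for a balanced design.

    stations: list of lists, one inner list of field-replicate densities
    per station, all of the same length (at least two replicates).
    Returns the fractions of total variance attributable to differences
    between stations and between field replicates within a station.
    """
    a, n = len(stations), len(stations[0])
    if a < 2 or n < 2 or any(len(s) != n for s in stations):
        raise ValueError("need a balanced design, >= 2 stations and replicates")
    station_means = [mean(s) for s in stations]
    grand_mean = mean(station_means)
    ms_between = n * sum((m - grand_mean) ** 2 for m in station_means) / (a - 1)
    ms_within = sum(
        (y - m) ** 2 for s, m in zip(stations, station_means) for y in s
    ) / (a * (n - 1))
    var_rep = ms_within
    var_station = max((ms_between - ms_within) / n, 0.0)  # truncate negatives
    total = var_station + var_rep
    if total == 0:
        raise ValueError("all observations are identical")
    return {"station": var_station / total, "field replicate": var_rep / total}


# Hypothetical densities: three stations, two field replicates each.
basin = [[10, 12], [20, 22], [30, 32]]
for source, share in variance_components(basin).items():
    print(f"{source:16s}{share:5.0%}")
```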
6.3 Controlling variation
The major source of variation in GLNPO's
zooplankton data appears to be basin-scale
spatial heterogeneity. The most appropriate
way of reducing this source of variability
would be to increase the number of stations
within each basin. It is recognized that this is
probably not a feasible alternative.
The error associated with field replicates
contributed substantially less variability, but
this stage of data generation offers more real-
istic possibilities for reducing overall variance.
As mentioned, this variance component in-
cludes uncertainty due to subsampling in the
laboratory, in addition to the uncertainty in-
volved in sample collection and small scale
spatial heterogeneity. Regression results sug-
gest that a significant amount of uncertainty in
this stage is associated with variations in flow
meter readings between field replicates. This
source of variability can be reduced by ensur-
ing that all flow meters are in a good state of
repair through a regular schedule of mainte-
nance. Anomalous readings should be recog-
nized by field personnel and should result in
replacement of faulty meters. Records of me-
ter-specific calibrations should be kept on ship
so that large divergences from past calibration
factors can be recognized. It is also necessary
that field personnel be properly trained to en-
sure that meters are read correctly and that po-
tential problems with meters are recognized
early and addressed appropriately.
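The recommendation to compare new calibrations against each meter's past record could be implemented as a simple shipboard check. The sketch below is illustrative only: the function names, the 10% tolerance, and the calibration factors are hypothetical and do not come from GLNPO's standard operating procedure.

```python
from statistics import mean


def calibration_divergence(past_factors, new_factor):
    """Relative departure of a new calibration factor from the mean of
    the meter's past calibration factors."""
    baseline = mean(past_factors)
    return (new_factor - baseline) / baseline


def needs_attention(past_factors, new_factor, tolerance=0.10):
    """True when the departure exceeds the tolerance (hypothetical 10%)."""
    return abs(calibration_divergence(past_factors, new_factor)) > tolerance


# Hypothetical calibration history for one meter (arbitrary units).
history = [0.262, 0.268, 0.265, 0.266]
for factor in (0.267, 0.310):
    status = "check meter" if needs_attention(history, factor) else "ok"
    print(f"factor {factor:.3f}: {status}")
```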
Other actions can be taken in the field to re-
duce the level of uncertainty introduced at this
stage of data generation. Field personnel
should ensure that zooplankton nets are thor-
oughly rinsed before decanting contents into
sample bottles. They should also exercise care
in ensuring that both net speed and depth are
kept as close to those specified in the standard
operating procedure as possible. The most
difficult element of field sampling to control is
typically the angle of the net. Interestingly, no
relationship was found between variability in
net angle between field replicates and levels of
dissimilarity, which suggests that the impact of
net angle on uncertainty might be relatively
slight.
Since replicate analyses are not conducted
on subsamples taken in the laboratory, the
amount of variability contributed by this stage
of analysis is unknown. Instead, the variability
contributed by subsampling is included in esti-
mates of between field replicate variance.
Since subsampling represents a source of un-
certainty that is particularly amenable to inves-
tigator control, it would be helpful to know
how substantial it is. This could be accom-
plished by analyzing duplicate splits of a single
sample. Ways of reducing uncertainty due to
subsampling include ensuring that the sample
is completely homogenized prior to splitting in
the Folsom splitter, and ensuring that all or-
ganisms are subsequently transferred to the
counting chamber.
References
Barbiero, R.P. 2003. Application of the Great Lakes National Program Office's Data Quality
Objective to Benthos Data Generated by the Annual Water Quality Survey. US EPA, GLNPO,
Chicago, IL.
Barbiero, R.P. and M.L. Tuchman. 2003. Changes in the crustacean communities of Lakes
Michigan, Huron and Erie following the invasion of the predatory cladoceran Bythotrephes
cederstroemi. Can. J. Fish. Aq. Sci. (in press).
Engelberg, K. 1987. Die Diatomeen-Zönose in einem Mittelgebirgsbach und die Abgrenzung
jahreszeitlicher Aspekte mit Hilfe der Dominanz-Identität. Arch. Hydrobiol. 110:217-236.
GLNPO. 2003. Sampling and Analytical Procedures for GLNPO's Open Lake Water Quality
Survey of the Great Lakes. EPA 905-R-03-002, U.S. EPA, GLNPO, Chicago, IL.
Kulczynski, S. 1927. Die Pflanzenassoziation der Pieninen. Internat. Acad. Polon. Sci., Lettr. Bull.,
Classe Sci. Math. et Nat., ser. B. Sci. Nat. Suppl. 2:1927:57-203.
Makarewicz, J.C. 1987. Phytoplankton and zooplankton composition, abundance and distribution:
Lake Erie, Lake Huron and Lake Michigan - 1983. Volume 1 and 2. U.S. Environmental
Protection Agency. EPA-905/2-87-002. 183 p.
Makarewicz, J.C. 1988. Phytoplankton and zooplankton composition, abundance and distribution:
Lakes Erie, Huron and Michigan - 1984. U.S. Environmental Protection Agency.
EPA-905/3-88-001.
Makarewicz, J.C. and P. Bertram. 1991. Phytoplankton and zooplankton composition, abundance
and distribution: Lakes Erie, Huron and Michigan - 1985. U.S. Environmental Protection
Agency. EPA-905/3-85-003.
Motyka, J., B. Dobrzanski and S. Zawadzki. 1950. Preliminary studies on meadows in the southeast
of the province Lublin. Univ. Mariae Curie-Sklodowska Ann. Sect. E. 5 (13):367-447.
Whittaker, R.H. 1952. A study of summer foliage insect communities in the Great Smoky Moun-
tains. Ecol. Monogr. 22:6-31.
Whittaker, R.H. and C.W. Fairbanks. 1958. A study of copepod communities in the Columbia
Basin, Southeastern Washington. Ecology 39:46-63.
Yan, N.D., W. Keller, N.M. Scully, D.R.S. Lean and P.J. Dillon. 1996. Increased UV-B penetration
in a lake owing to drought-induced acidification. Nature. 381:141-143.